Finding closest node(s) with label, through specific relationships & directions

I have a use case where I have a specific node as a starting point, and I want to find the closest nodes (with a specific label), when only traversing specific relationships in specified direction. I've made some progress but would appreciate any suggestions on a more elegant approach.

The diagram below shows an example data set. This is just a subset of the overall model and there are many other relationships and nodes/labels.

The data model is basically document meta data that consists of a (:DocVersion), which [;CONTAINS] a hierarchy of (:Clause). The (:Applicability) is meta data that can be associated with (:DocVersion) or (:Clause)

Starting at any Clause, I want to return the closest Applicability node. The closest can be directly related to the starting Clause, or it could be found by traversing up the hierarchy like (c0:Clause)<-[:CONTAINS]-(c1:Clause)<-[:CONTAINS]-(:DocVersion) hierarchy.

Some examples of the starting point and expected output:

Starting Clause

Expected Output

D1V1 C1

D1V1 A1a, D1V1 A2b

D1V1 C2

D1V1 A2

D1V1 C1.1

D1V1 A1a, D1V1 A2b

D1V1 C1.2.1.1

D1V1 A5a, D1V1A5b

Sample Data
This script will create the sample data above.

// Create Sample Doc1

MERGE (dv:DocVersion {name:'D1V1'})

MERGE (dv)-[:CONTAINS]->(c1:Clause {name:'D1V1 C1'})

MERGE (c1)-[:CONTAINS]->(c1_1:Clause {name:'D1V1 C1.1'})

MERGE (c1_1)-[:CONTAINS]->(c1_1_1:Clause {name:'D1V1 C1.1.1'})

MERGE (c1)-[:CONTAINS]->(c1_2:Clause {name:'D1V1 C1.2'})

MERGE (c1_2)-[:CONTAINS]->(c1_2_1:Clause {name:'D1V1 C1.2.1'})

MERGE (c1_2_1)-[:CONTAINS]->(c1_2_1_1:Clause {name:'D1V1 C1.2.1.1'})

MERGE (c1_2_1)-[:CONTAINS]->(c1_2_1_2:Clause {name:'D1V1 C1.2.1.2'})

MERGE (dv)-[:CONTAINS]->(c2:Clause {name:'D1V1 C2'})

MERGE (c2)-[:CONTAINS]->(c2_1:Clause {name:'D1V1 C2.1'})

MERGE (dv)-[:CONTAINS]->(c3:Clause {name:'D1V1 C3'})

MERGE (c3)-[:CONTAINS]->(c3_1:Clause {name:'D1V1 C3.1'})

MERGE (dv)-[:APPLIES_TO]->(a1a:Applicability {name:'D1V1 A1a'})

MERGE (dv)-[:APPLIES_TO]->(a1b:Applicability {name:'D1V1 A1b'})

MERGE (c2)-[:APPLIES_TO]->(a2:Applicability {name:'D1V1 A2'})

MERGE (c3)-[:APPLIES_TO]->(a3:Applicability {name:'D1V1 A3'})

MERGE (c1_1_1)-[:APPLIES_TO]->(a4:Applicability {name:'D1V1 A4'})

MERGE (c1_2_1)-[:APPLIES_TO]->(a5a:Applicability {name:'D1V1 A5a'})

MERGE (c1_2_1)-[:APPLIES_TO]->(a5b:Applicability {name:'D1V1 A5b'})

Currently I have 2 approaches that get close to what I'm looking for. They both return a superset of what I'm wanting.

Method 1: plain cypher
This returns ALL the possible (:Applicability) and the number of relationships. What I want is just the ones that have the shortest distance.

WITH "D1V1 C1.2.1" as Cname

MATCH (c:Clause {name:Cname})

// Get all possibly applicability nodes

MATCH p=(a:Applicability)<-[:APPLIES_TO]-(n)-[:CONTAINS*0..10]->(c)

// For each path, get the steps to applicability node

UNWIND p as pn

// return applicability nodes and steps

WITH a, length(p) as steps

RETURN a.name, steps

ORDER BY steps ASC

Method 2: using apoc
This returns a set of paths and lengths which would then need to be parsed to see which path includes (:Applicability), and which are the closest.

// Get clause applicability (apoc)

MATCH (c:Clause {name: "D1V1 C1.2.1"})

CALL apoc.path.expand(c, "<CONTAINS|APPLIES_TO>", "Clause|DocVersion|Applicability", 1, 5)

YIELD path

RETURN path, length(path) AS hops

ORDER BY hops;

Try without having any maxLevel. This should give your expected results.

match (a:Clause) where a.name = "D1V1 C2.1"

CALL apoc.path.spanningTree(a,{labelFilter:'Clause|DocVersion|Applicability',relationshipFilter:'<CONTAINS|APPLIES_TO>'}) YIELD path

return path

match (a:Clause) where a.name = "D1V1 C2.1"

CALL apoc.path.spanningTree(a,{labelFilter:'Clause|DocVersion|Applicability',relationshipFilter:'<CONTAINS|APPLIES_TO>',maxLevel:2}) YIELD path

return path

Screen Shot 2022-06-07 at 2.22.17 PM.png

Thanks. I think the issue with this approach is the maxLevel will be different depending on the starting point.

For example in this slightly modified example, the maxLevel needs to be 3.

match (a:Clause) where a.name = "D1V1 C1.1"
CALL apoc.path.spanningTree(a,{labelFilter:'Clause|DocVersion|Applicability', relationshipFilter:'<CONTAINS|APPLIES_TO>',maxLevel:3}) YIELD path
return path

But using 3 for a different start node results in more Applicability nodes than expected.

match (a:Clause) where a.name = "D1V1 C2.1"
CALL apoc.path.spanningTree(a,{labelFilter:'Clause|DocVersion|Applicability', relationshipFilter:'<CONTAINS|APPLIES_TO>',maxLevel:3}) YIELD path
return path

kurt_slater_0-1654634849534.png

Try this:

match (a:Clause) where a.name = "D1V1 C1"

CALL apoc.path.spanningTree(a,{labelFilter:'DocVersion|Applicability', relationshipFilter:'CONTAINS|APPLIES_TO',maxLevel:2}) YIELD path

return path

Try this modification to your cypher approach:

WITH "D1V1 C1.2.1" as Cname
MATCH (c:Clause {name:Cname})
MATCH p=(a:Applicability)<-[:APPLIES_TO]-(n)-[:CONTAINS*0..10]->(c)
WITH collect({a:a, length: length(p)}) as paths, min(length(p)) as minLength
unwind [i in paths where i.length = minLength] as result
return result.a as a, result.length as length

Screen Shot 2022-06-07 at 3.13.48 PM.png

Thanks! This does seem to provide the expected output, at least for a single starting point. I'll be curious to see how performant it is with a larger data set. My concern is the step that gets all the paths (which could be hundreds), then finds the minLength.

Give it a try to see. I am not sure it will be too bad. You have a defined anchor node to start with; make sure there is an index on the search property. Also, your graph looks like a tree that does not have crossovers and any looping, so the expansion should not be too bad. Let us know how it works.