I have a use case where I have a specific node as a starting point, and I want to find the closest nodes (with a specific label), when only traversing specific relationships in specified direction. I've made some progress but would appreciate any suggestions on a more elegant approach.
The diagram below shows an example data set. This is just a subset of the overall model and there are many other relationships and nodes/labels.
The data model is basically document meta data that consists of a (:DocVersion), which [;CONTAINS] a hierarchy of (:Clause). The (:Applicability) is meta data that can be associated with (:DocVersion) or (:Clause)
Starting at any Clause, I want to return the closest Applicability node. The closest can be directly related to the starting Clause, or it could be found by traversing up the hierarchy like (c0:Clause)<-[:CONTAINS]-(c1:Clause)<-[:CONTAINS]-(:DocVersion) hierarchy.
Some examples of the starting point and expected output:
Starting Clause
Expected Output
D1V1 C1
D1V1 A1a, D1V1 A2b
D1V1 C2
D1V1 A2
D1V1 C1.1
D1V1 A1a, D1V1 A2b
D1V1 C1.2.1.1
D1V1 A5a, D1V1A5b
Sample Data
This script will create the sample data above.
// Create Sample Doc1
MERGE (dv:DocVersion {name:'D1V1'})
MERGE (dv)-[:CONTAINS]->(c1:Clause {name:'D1V1 C1'})
MERGE (c1)-[:CONTAINS]->(c1_1:Clause {name:'D1V1 C1.1'})
MERGE (c1_1)-[:CONTAINS]->(c1_1_1:Clause {name:'D1V1 C1.1.1'})
MERGE (c1)-[:CONTAINS]->(c1_2:Clause {name:'D1V1 C1.2'})
MERGE (c1_2)-[:CONTAINS]->(c1_2_1:Clause {name:'D1V1 C1.2.1'})
MERGE (c1_2_1)-[:CONTAINS]->(c1_2_1_1:Clause {name:'D1V1 C1.2.1.1'})
MERGE (c1_2_1)-[:CONTAINS]->(c1_2_1_2:Clause {name:'D1V1 C1.2.1.2'})
MERGE (dv)-[:CONTAINS]->(c2:Clause {name:'D1V1 C2'})
MERGE (c2)-[:CONTAINS]->(c2_1:Clause {name:'D1V1 C2.1'})
MERGE (dv)-[:CONTAINS]->(c3:Clause {name:'D1V1 C3'})
MERGE (c3)-[:CONTAINS]->(c3_1:Clause {name:'D1V1 C3.1'})
MERGE (dv)-[:APPLIES_TO]->(a1a:Applicability {name:'D1V1 A1a'})
MERGE (dv)-[:APPLIES_TO]->(a1b:Applicability {name:'D1V1 A1b'})
MERGE (c2)-[:APPLIES_TO]->(a2:Applicability {name:'D1V1 A2'})
MERGE (c3)-[:APPLIES_TO]->(a3:Applicability {name:'D1V1 A3'})
MERGE (c1_1_1)-[:APPLIES_TO]->(a4:Applicability {name:'D1V1 A4'})
MERGE (c1_2_1)-[:APPLIES_TO]->(a5a:Applicability {name:'D1V1 A5a'})
MERGE (c1_2_1)-[:APPLIES_TO]->(a5b:Applicability {name:'D1V1 A5b'})
Currently I have 2 approaches that get close to what I'm looking for. They both return a superset of what I'm wanting.
Method 1: plain cypher
This returns ALL the possible (:Applicability) and the number of relationships. What I want is just the ones that have the shortest distance.
WITH "D1V1 C1.2.1" as Cname
MATCH (c:Clause {name:Cname})
// Get all possibly applicability nodes
MATCH p=(a:Applicability)<-[:APPLIES_TO]-(n)-[:CONTAINS*0..10]->(c)
// For each path, get the steps to applicability node
UNWIND p as pn
// return applicability nodes and steps
WITH a, length(p) as steps
RETURN a.name, steps
ORDER BY steps ASC
Method 2: using apoc
This returns a set of paths and lengths which would then need to be parsed to see which path includes (:Applicability), and which are the closest.
// Get clause applicability (apoc)
MATCH (c:Clause {name: "D1V1 C1.2.1"})
CALL apoc.path.expand(c, "<CONTAINS|APPLIES_TO>", "Clause|DocVersion|Applicability", 1, 5)
YIELD path
RETURN path, length(path) AS hops
ORDER BY hops;




