Finding closest node(s) with label, through specific relationships & directions

kurt_slater · June 7, 2022, 5:13pm

I have a use case where I have a specific node as a starting point, and I want to find the closest nodes (with a specific label), when only traversing specific relationships in specified direction. I've made some progress but would appreciate any suggestions on a more elegant approach.

The diagram below shows an example data set. This is just a subset of the overall model and there are many other relationships and nodes/labels.

The data model is basically document meta data that consists of a (:DocVersion), which [;CONTAINS] a hierarchy of (:Clause). The (:Applicability) is meta data that can be associated with (:DocVersion) or (:Clause)

Starting at any Clause, I want to return the closest Applicability node. The closest can be directly related to the starting Clause, or it could be found by traversing up the hierarchy like (c0:Clause)<-[:CONTAINS]-(c1:Clause)<-[:CONTAINS]-(:DocVersion) hierarchy.

Some examples of the starting point and expected output:

Starting Clause

Expected Output

D1V1 C1

D1V1 A1a, D1V1 A2b

D1V1 C2

D1V1 A2

D1V1 C1.1

D1V1 A1a, D1V1 A2b

D1V1 C1.2.1.1

D1V1 A5a, D1V1A5b

Sample Data
This script will create the sample data above.

// Create Sample Doc1

MERGE (dv:DocVersion {name:'D1V1'})

MERGE (dv)-[:CONTAINS]->(c1:Clause {name:'D1V1 C1'})

MERGE (c1)-[:CONTAINS]->(c1_1:Clause {name:'D1V1 C1.1'})

MERGE (c1_1)-[:CONTAINS]->(c1_1_1:Clause {name:'D1V1 C1.1.1'})

MERGE (c1)-[:CONTAINS]->(c1_2:Clause {name:'D1V1 C1.2'})

MERGE (c1_2)-[:CONTAINS]->(c1_2_1:Clause {name:'D1V1 C1.2.1'})

MERGE (c1_2_1)-[:CONTAINS]->(c1_2_1_1:Clause {name:'D1V1 C1.2.1.1'})

MERGE (c1_2_1)-[:CONTAINS]->(c1_2_1_2:Clause {name:'D1V1 C1.2.1.2'})

MERGE (dv)-[:CONTAINS]->(c2:Clause {name:'D1V1 C2'})

MERGE (c2)-[:CONTAINS]->(c2_1:Clause {name:'D1V1 C2.1'})

MERGE (dv)-[:CONTAINS]->(c3:Clause {name:'D1V1 C3'})

MERGE (c3)-[:CONTAINS]->(c3_1:Clause {name:'D1V1 C3.1'})

MERGE (dv)-[:APPLIES_TO]->(a1a:Applicability {name:'D1V1 A1a'})

MERGE (dv)-[:APPLIES_TO]->(a1b:Applicability {name:'D1V1 A1b'})

MERGE (c2)-[:APPLIES_TO]->(a2:Applicability {name:'D1V1 A2'})

MERGE (c3)-[:APPLIES_TO]->(a3:Applicability {name:'D1V1 A3'})

MERGE (c1_1_1)-[:APPLIES_TO]->(a4:Applicability {name:'D1V1 A4'})

MERGE (c1_2_1)-[:APPLIES_TO]->(a5a:Applicability {name:'D1V1 A5a'})

MERGE (c1_2_1)-[:APPLIES_TO]->(a5b:Applicability {name:'D1V1 A5b'})

Currently I have 2 approaches that get close to what I'm looking for. They both return a superset of what I'm wanting.

Method 1: plain cypher
This returns ALL the possible (:Applicability) and the number of relationships. What I want is just the ones that have the shortest distance.

WITH "D1V1 C1.2.1" as Cname

MATCH (c:Clause {name:Cname})

// Get all possibly applicability nodes

MATCH p=(a:Applicability)<-[:APPLIES_TO]-(n)-[:CONTAINS*0..10]->(c)

// For each path, get the steps to applicability node

UNWIND p as pn

// return applicability nodes and steps

WITH a, length(p) as steps

RETURN a.name, steps

ORDER BY steps ASC

Method 2: using apoc
This returns a set of paths and lengths which would then need to be parsed to see which path includes (:Applicability), and which are the closest.

// Get clause applicability (apoc)

MATCH (c:Clause {name: "D1V1 C1.2.1"})

CALL apoc.path.expand(c, "<CONTAINS|APPLIES_TO>", "Clause|DocVersion|Applicability", 1, 5)

YIELD path

RETURN path, length(path) AS hops

ORDER BY hops;

ameyasoft · June 7, 2022, 9:28pm

Try without having any maxLevel. This should give your expected results.

match (a:Clause) where a.name = "D1V1 C2.1"

CALL apoc.path.spanningTree(a,{labelFilter:'Clause|DocVersion|Applicability',relationshipFilter:'<CONTAINS|APPLIES_TO>'}) YIELD path

return path

ameyasoft · June 7, 2022, 9:24pm

match (a:Clause) where a.name = "D1V1 C2.1"

CALL apoc.path.spanningTree(a,{labelFilter:'Clause|DocVersion|Applicability',relationshipFilter:'<CONTAINS|APPLIES_TO>',maxLevel:2}) YIELD path

return path

Screen Shot 2022-06-07 at 2.22.17 PM.png

kurt_slater · June 7, 2022, 8:53pm

Thanks. I think the issue with this approach is the maxLevel will be different depending on the starting point.

For example in this slightly modified example, the maxLevel needs to be 3.

match (a:Clause) where a.name = "D1V1 C1.1"
CALL apoc.path.spanningTree(a,{labelFilter:'Clause|DocVersion|Applicability', relationshipFilter:'<CONTAINS|APPLIES_TO>',maxLevel:3}) YIELD path
return path

But using 3 for a different start node results in more Applicability nodes than expected.

match (a:Clause) where a.name = "D1V1 C2.1"
CALL apoc.path.spanningTree(a,{labelFilter:'Clause|DocVersion|Applicability', relationshipFilter:'<CONTAINS|APPLIES_TO>',maxLevel:3}) YIELD path
return path

ameyasoft · June 7, 2022, 8:25pm

Try this:

match (a:Clause) where a.name = "D1V1 C1"

CALL apoc.path.spanningTree(a,{labelFilter:'DocVersion|Applicability', relationshipFilter:'CONTAINS|APPLIES_TO',maxLevel:2}) YIELD path

return path

glilienfield · June 7, 2022, 7:16pm

Try this modification to your cypher approach:

WITH "D1V1 C1.2.1" as Cname
MATCH (c:Clause {name:Cname})
MATCH p=(a:Applicability)<-[:APPLIES_TO]-(n)-[:CONTAINS*0..10]->(c)
WITH collect({a:a, length: length(p)}) as paths, min(length(p)) as minLength
unwind [i in paths where i.length = minLength] as result
return result.a as a, result.length as length

Screen Shot 2022-06-07 at 3.13.48 PM.png

kurt_slater · June 7, 2022, 8:35pm

Thanks! This does seem to provide the expected output, at least for a single starting point. I'll be curious to see how performant it is with a larger data set. My concern is the step that gets all the paths (which could be hundreds), then finds the minLength.

glilienfield · June 7, 2022, 9:22pm

Give it a try to see. I am not sure it will be too bad. You have a defined anchor node to start with; make sure there is an index on the search property. Also, your graph looks like a tree that does not have crossovers and any looping, so the expansion should not be too bad. Let us know how it works.

Topic		Replies	Views
The closest node with label Cypher	10	305	May 8, 2023
Finding the closest node with a certain label from a given node Cypher shortest-path	3	1715	April 20, 2020
Possible paths from a node without a destination node Neo4j Graph Platform	14	885	May 21, 2020
Length pattern. Find nodes at a certain distance from another node Cypher	3	614	October 27, 2021
How to get only the closest neighbor in the pattern -[:rel * 0..i]->? Cypher shortest-path	1	1455	September 24, 2019

Take the Course Then Join The Aura Agent Hackathon

Finding closest node(s) with label, through specific relationships & directions

Related topics

Take the Course Then Join
The Aura Agent Hackathon