Shortest Path between nodes excluding nodes with a certain label

andyking27 · January 20, 2021, 1:01pm

We have a large network stored in v3.4.8 that incorporates nodes representing Companies and People. The edges between the nodes represent Appointments (i.e. Person 1 works at Company A).
Some of the People nodes are actually companies who function as if they are People (and are stored in the graph with a label of 'Is Company' = 1).

We are calculating the shortest path between companies using the built-in shortestPath cypher function -
match (start:Company {id:1}),(end:Company {id:100}), p = shortestPath((start)-[:APPOINTMENT*]-(end)) return *

What we want to do now is return the shortest path that doesn't involve a People node where 'Is Company' = 1.

I have tried using queries like where ALL(x in nodes(p) WHERE x.Is Company = 0) but performance is awful even with an index on 'Is Company'.

tard_gabriel · January 20, 2021, 6:55pm

WHERE NONE (n IN nodes(p) WHERE n.Is Company = 1) ?

OR SOMETHING DIFFERENT IN PURE OLD SCHOOL CYPHER
DO NOT FORGET TO INDICATE A RELATIONSHIP DIRECTION IF YOU CAN ( OR TIME WILL DOUBLE )

MATCH path = (s:Company {id:1})-[:APPOINTMENT*]->(e:Company {id:100})
RETURN path AS Path, length(path) AS Length ORDER BY Length

Not quite perfect this last one but can give you a kind of debugger.

Or you can use the PROFILE clause on the shortestPath one to see what is happening there.

andyking27 · January 28, 2021, 9:22am

We got the following cypher to work -
match (start:Company {id:1}),(end:Company {id:100}), p = shortestPath((start)-[:APPOINTMENT*]-(end)) WHERE ALL(x in nodes(p) where x.'Is Company' = 0 OR x.'Is Company' IS NULL) return *

However, we also have a query similar to the following -
match (start:Company {id:1}),(end:Person), p = shortestPath((start)-[:APPOINTMENT*]-(end)) where end.id IN [<list of comma separated ids>] return * order by length(p) limit 1 which is designed to get the shortest path from a user-defined list of people nodes. This list has no limit on the number of entries. When trying to apply the same logic from the first query the performance is awful and, depending on the length of the list, can cause the server to fall over. The query is -
match (start:Company {id:1}),(end:Person), p = shortestPath((start)-[:APPOINTMENT*]-(end)) where ALL(x in nodes(p) where x.'Is Company' = 0 OR x.'Is Company' IS NULL) and end.id IN [<list of comma separated ids>] return * order by length(p) limit 1

Can anyone suggest any alternatives where the performance is not as bad? There are indexes on the node.id labels and the 'Is Company' label.

clem · January 28, 2021, 5:50pm

I think the problem you are falling into is that the number of nodes matched (end:Person) is huge. I think you need to filter out the end matches. If you do PROFILE on the query, you'll see what I mean.

I'll confess that I haven't used shortestPath(), so I'm not 100% sure that this will work, but I tried this somewhat analogous (and simpler) query with the movie DB and it worked:

MATCH(start:Person) WHERE start.name = "Rob Reiner"
// limited possible ends:
MATCH (end:Person) WHERE end.name STARTS WITH "Michael" 
WITH shortestPath((start)-[*]-(end)) AS p
RETURN p ORDER BY length(p)

So, I suggest trying something like this (although I'm almost sure I got the syntax screwed up after the WITH.)

MATCH (start:Company {id:1})
MATCH (end:Person) WHERE end.id IN [<list of comma separated ids>]
// the following part needs fixing!
WITH shortestPath((start)-[:APPOINTMENT*]-(end)) where ALL(x in nodes(p) where x.'Is Company' = 0 OR x.'Is Company' IS NULL) 
 return * order by length(p) limit 1

It would be helpful if this documentation includes complex conditionals using WHERE for start and end nodes plus relationships:

(I gave some feedback on this doc page.).

andrew_bowman · February 2, 2021, 3:41am

It sounds like you're trying to find which of those persons with the given ids is closest to the company.

If you have APOC Procedures you can use the path expander to discover this. We will need to pre-match to the persons and collect() them into a list, and submit them as the end nodes of the expansion, and that we only want 1 result. Because breadth-first expansion is used, it will find the single closest result. Keep in mind though that this only expands in one direction, unlike the shortestPath() function.

That said, the path expander procs don't do filtering of properties during expansion, but it can blacklist labels, including compound labels. So if the People/Company nodes are labeled as both :Person and :Company, then we can blacklist :Person:Company nodes easily.

MATCH (end:Person)
WHERE end.id IN [2, 5]
WITH collect(end) as endPersons
MATCH (start:Company {id:1})
CALL apoc.path.subgraphNodes(start, {relationshipFilter:'APPOINTMENT', endNodes:endPersons, limit:1, labelFilter:'-Person:Company'}) YIELD node
RETURN node

Note that this will traverse any other kind of node, include just :Person nodes, or just :Company nodes. You can tweak the labelFilter if you require additional restrictions.

Topic		Replies	Views
Shortest Path between nodes excluding nodes with a certain label Neo4j Graph Platform migrated	1	214	June 4, 2022
Excluding certain paths from allShortestPaths Cypher shortest-path	6	943	May 1, 2022
Get shortest path between nodes going via specific node Cypher cypher , shortest-path	1	541	November 12, 2020
Find the shortest Path in a graph excluding a relationship with a certain paramater Graph Data Science / Graph Analytics shortest-path	0	1046	May 22, 2019
Restricted shortest path: include nodes with certain properties in the shortest found path Neo4j Graph Platform migrated	0	217	January 10, 2023

Take the Course Then Join The Aura Agent Hackathon

Shortest Path between nodes excluding nodes with a certain label

Related topics

Take the Course Then Join
The Aura Agent Hackathon