How to capture intersecting paths?

pborah88 · February 19, 2020, 4:14pm

I have a graph that looks like this -

It's the same type of node interacting with one another. (:DataSet)-[:INTERACTS_WITH*]->(:DataSet)

I want to write a query that returns the starting nodes of the longest paths.

so far I have -

match p=(d:DataSet)-[r:INTERACTS_WITH*]->(d1:DataSet)
return nodes(p), length(p)
order by length(p) desc

And this kind of does the job but I see that the max path length is 8

CLEARLY that is not the length of the longest path. If I remove direction from the query, it becomes very expensive and unscalable. What am I doing wrong?

andrew_bowman · February 19, 2020, 7:11pm

Removing the direction in your pattern would likely get you the answer you're after. Are you after the single longest path? If so you may want to add a LIMIT 1 to the end of your query, or otherwise look for a different way to get the top results (if you want all paths with the same longest path length). apoc.agg.maxItems() from APOC Procedures may do the trick.

Also, are there any particular restrictions you want on the paths explored? Do you want to ensure the nodes in each path never repeat (never looping back)?

pborah88 · February 19, 2020, 7:35pm

Removing the direction is not an option as then it becomes a very very expensive query even with a total of only 4500 nodes.
I want to understand how the limit works in this case because for this query with a direction, the limit really affects how intersecting paths are returned. For example, for the neovis graph I posted above, the query is -

match p=(d:DataSet)-[r:INTERACTS_WITH*]->(d1:DataSet)
return p
order by length(p) desc limit 400

The image is pretty much the whole graph and it sure isn't 400 nodes!
Lastly, I tried the query with a limit of 20 and removing the direction. It times out. All node names of all types are indexed but not constrained.

andrew_bowman · February 19, 2020, 7:54pm

LIMIT refers to rows. In your case, a row is a separate distinct path resulting from your MATCH.

The current query with LIMIT 400 means that only 400 paths are being returned (the top 400 paths by path length), and the nodes visualized are those that make up those 400 paths.

A LIMIT of 20 removing direction means that the top 20 paths will be returned. However in order to know if you have the top 20 paths, all paths of that pattern must be found first and evaluated (since you have a LIMIT following ORDER BY that should make it a Top operation so it should go easier on the heap, since only 20 paths need to be kept at a time). Another potential problem is that you have no upper bound on your var-length path. It is possible that the bottleneck is simply finding the sheer number of all possible permutations of paths of any length. You might consider adding upper bounds and possibly count() (without the ordering) to see what kind of numbers you're working with for the number of paths involved.

pborah88 · February 19, 2020, 10:07pm

So is 1 path = 1 hop? If not, I don't see 400 paths in that graph. Actually on a closer look, I don't see 400 paths in there anyway. In the neovis graph I mean.

andrew_bowman · February 19, 2020, 10:41pm

No. You don't have an upper bound on your pattern -[r:INTERACTS_WITH*]->, so 1 path can be a minimum of 1 hop, but no maximum. Paths can overlap, so you might have 1 path of 1 hop from node A to node B, but have another path of 2 hops from node A to node B to node C. Etc.

The paths collectively form your graph, and there is very likely intersection between the nodes and relationships comprising the paths.

If you want to get the number of paths, then check this:

match p=(d:DataSet)-[r:INTERACTS_WITH*]->(d1:DataSet)
return count(p) as pathCount

pborah88 · February 19, 2020, 10:58pm

Okay now it makes sense. 640 nodes on the lefthand side and 779 nodes on the righthand side form 2733 paths. So there are definitely overlaps. But now I'm even more curious as to the details of what the limit 400 is comprised of in the output.

andrew_bowman · February 19, 2020, 11:04pm

LIMIT always applies to the number of rows being returned.

In this query:

match p=(d:DataSet)-[r:INTERACTS_WITH*]->(d1:DataSet) return count(p) as pathCount

Since I'm using an aggregation without a grouping key, this will return a single row with the count, so with a limit, without a limit, it doesn't matter, the result will be on a single row.

In your original query:

match p=(d:DataSet)-[r:INTERACTS_WITH*]->(d1:DataSet)
return p
order by length(p) desc limit 400

since you're returning p (and you're not using aggregations) each row will have a full path (which is an ordered series of nodes connected by relationships), each of which could be comprised of a potentially unlimited (but not infinite) number of nodes and relationships, since no upper bound is provided.

tony.chiboucas · February 21, 2020, 7:35pm

An aside, worth considering:

Your dataset has cycles (loops). The longest path is of length ∞.

pborah88 · February 21, 2020, 8:12pm

Great. Thank you for pointing that out. So again, just trying to understand how neo4j works.... How come then, it's not going into an infinite loop when the query executes? And how come it comes back with a result that says longest path length = 8?

andrew_bowman · February 21, 2020, 10:33pm

Cypher uses a type of uniqueness behavior called relationship isomorphism (also referred as RELATIONSHIP_PATH uniqueness within the traversal API), which means that per path, relationships must be unique, they cannot be reused. By definition an infinite path requires the ability to traverse the same relationships over and over, so with Cypher alone you cannot get caught in an infinite loop.

Topic		Replies	Views
Getting paths of any length or long paths does not work Cypher performance	9	1561	May 9, 2020
Limit on paths Cypher	2	349	November 5, 2021
Using max function in a Cypher query causes two different paths of the same length to merge in the output Newbie Questions cypher	2	1353	December 3, 2019
Query to return all possible paths return infinite no of paths Cypher performance , paths	12	1129	March 7, 2022
Break shortestPath loop having "WHERE property IN [..]" Cypher cypher , shortest-path	14	775	May 5, 2021

How to capture intersecting paths?

Related topics