A more efficient way to traverse recursive relationships

max.pimm · April 5, 2024, 11:18am

I am new to Cypher. We are attempting to query our data and running into performance problems.

We have a recursive structure that relates events via identifiers. A single event has multiple identifiers and each identifier can appear in multiple events

Something like this

(:Event)-[]-(:Id)-[]-(:Event )-[]-(:Id)-[]-(:Event )...

We are looking for a query that can give us all the events that are connected via identifiers.

We've tried writing something like this

MATCH (e:Event)-[*0..]-(e2:Event) RETURN e2

We have an index that allows us to filter the start positions for the traversal efficiently but the recursive nature of the traversal means that this query is timing out. If we limit the recursion to a small number is completes but returns lots of results.

It looks like it is collecting all possible paths from the start event to any other event, including itself and counting paths multiple times.

I guess that in order to make it efficient I need to prevent it from revisiting nodes that it has already traversed.

Other that remodelling my data how can I make this query more efficient?

hakan.lofqvist1 · April 5, 2024, 12:48pm

To get "fewer" results it may be as simple as:

MATCH (e:Event)-[*1..1000]-(e2:Event) RETURN e2
WHERE NOT EXISTS {  (e2)-->() } //Make sure e2 is end of chain 
RETURN e2 limit 10

Also consider using relationship type and direction of the relationship in your query.

Look at some examples here: Concepts - Cypher Manual if you need to do more advanced matching of the "repeating pattern"

max.pimm · April 5, 2024, 2:47pm

Thanks for your reply but I want all the nodes in the chain including the end node.

Between Event and Id there is a HAS_ID relationship from Event to Id however this is the only relationship in the graph so specifying it doesn't filter out anything else.

It thought about using the direction but when these exist in a chain Event->Id<-Event->Id<-Event.... I don't see how I can use the direction to restrict the query to visiting each node once.

I have found apoc.neighbors.tohop which performs better but still not fast enough for what I need.

hakan.lofqvist1 · April 5, 2024, 3:39pm

Do like this:

MATCH pattern=(start:Event{id:123123})-[*1..1000]-(end:Event) 
WHERE NOT EXISTS {  (end)-->() } //Make sure end is end of chain 
RETURN pattern limit 10

Because I suspect you are getting "fragments" of the pattern => many rows returned that is just "waste".

Or, let me try to formulate it as a question back to you. Do you want to find all these paths:

(:Event)-[]-(:Id)-[]-(:Event )
(:Event)-[]-(:Id)-[]-(:Event )-[]-(:Id)-[]-(:Event )
(:Event)-[]-(:Id)-[]-(:Event )-[]-(:Id)-[]-(:Event )-[]-(:Id)-[]-(:Event )

or are you rather just interested in the last one? I.e. the path that takes you from start to finish.

hakan.lofqvist1 · April 5, 2024, 3:49pm

I am guessing a lot today, but I think this is what you are after:

// Create sample data
create (:Event{id:"start"})-[:HAS]->(:Id)<-[:HAS]-(:Event )-[:HAS]->(:Id)<-[:HAS]-(:Event )-[:HAS]->(:Id)<-[:HAS]-(:Event{id:"end"})

// Find the whole chain "start" to "end" by starting from the "start" node (unknown end node)
match pattern= (start:Event{id:"start"}) ( ()-[:HAS]->(:Id)<-[:HAS]-() ) {1,100} (end:Event)
where count { (end)-[:HAS]->() } = 1 // Last event only has one HAS relationship 
return pattern

max.pimm · April 8, 2024, 8:10pm

Good guesses but I think haven't given enough information.

It's not so much a chain as a network of Event and Id nodes.

In my model an Event is connected to multiple Id nodes and each Id node is connected to multiple events. By starting with any Event I need to find all the Events is is transitively connected to.

The Event and Id nodes tend to be very well connected so even with 100 Event and 100 Id nodes there may be close to 100 * 100 connections there are many possible ways to traverse this graph and closed loops. I'm not sure how the Neo4J traversal works but I am guessing that it doesn't keep track of visited nodes which is what makes this query slow.

max.pimm · April 8, 2024, 8:44pm

Perhaps the Weakly Connected Components algorithm is what I'm looking for. Except I'm using AuraDB which doesn't seem to have the GDS plugin.

v2belleville · April 17, 2024, 2:50pm

hi, if you think GDS can help you, you can have an AuraDS instance here Neo4j Deployment Center - Graph Database & Analytics

but if you want to match a pattern with cypher, we are going to need know more about your data for someone to be able to figure a strategy. As you pointed out, graph traversal quickly gets combinatorially big. only the specifics of the data can help reduce search space

If you want to know more about how the Cypher planner get things done, I recommend you read this Slow Cypher Statements and How to Fix Them | by Adam Cowley | Neo4j Developer Blog | Medium

Topic		Replies	Views
Yet Another Repeat Until Question (YARUQ) Cypher	2	399	December 9, 2020
Find nodes in for a relationship recursively (deep relation) Cypher cypher	3	3819	September 16, 2020
Traverse graph recursively Cypher cypher , operations , relationship , neo4j-desktop	1	1279	October 20, 2020
Effective way to get all the nodes and relationships between two nodes Cypher performance , cypher	2	1313	September 24, 2023
Recursive Cypher query following relations with a given direction and property Cypher cypher	0	1898	February 4, 2019

A more efficient way to traverse recursive relationships

Related topics