A more efficient way to traverse recursive relationships

I am new to Cypher. We are attempting to query our data and running into performance problems.

We have a recursive structure that relates events via identifiers. A single event has multiple identifiers and each identifier can appear in multiple events

Something like this

(:Event)-[]-(:Id)-[]-(:Event )-[]-(:Id)-[]-(:Event )...

We are looking for a query that can give us all the events that are connected via identifiers.

We've tried writing something like this

MATCH (e:Event)-[*0..]-(e2:Event) RETURN e2

We have an index that allows us to filter the start positions for the traversal efficiently but the recursive nature of the traversal means that this query is timing out. If we limit the recursion to a small number is completes but returns lots of results.

It looks like it is collecting all possible paths from the start event to any other event, including itself and counting paths multiple times.

I guess that in order to make it efficient I need to prevent it from revisiting nodes that it has already traversed.

Other that remodelling my data how can I make this query more efficient?

To get "fewer" results it may be as simple as:

MATCH (e:Event)-[*1..1000]-(e2:Event) RETURN e2
WHERE NOT EXISTS {  (e2)-->() } //Make sure e2 is end of chain 
RETURN e2 limit 10

Also consider using relationship type and direction of the relationship in your query.

Look at some examples here: Concepts - Cypher Manual if you need to do more advanced matching of the "repeating pattern"

Thanks for your reply but I want all the nodes in the chain including the end node.

Between Event and Id there is a HAS_ID relationship from Event to Id however this is the only relationship in the graph so specifying it doesn't filter out anything else.

It thought about using the direction but when these exist in a chain Event->Id<-Event->Id<-Event.... I don't see how I can use the direction to restrict the query to visiting each node once.

I have found apoc.neighbors.tohop which performs better but still not fast enough for what I need.

Do like this:

MATCH pattern=(start:Event{id:123123})-[*1..1000]-(end:Event) 
WHERE NOT EXISTS {  (end)-->() } //Make sure end is end of chain 
RETURN pattern limit 10

Because I suspect you are getting "fragments" of the pattern => many rows returned that is just "waste".

Or, let me try to formulate it as a question back to you. Do you want to find all these paths:

(:Event)-[]-(:Id)-[]-(:Event )
(:Event)-[]-(:Id)-[]-(:Event )-[]-(:Id)-[]-(:Event )
(:Event)-[]-(:Id)-[]-(:Event )-[]-(:Id)-[]-(:Event )-[]-(:Id)-[]-(:Event )

or are you rather just interested in the last one? I.e. the path that takes you from start to finish.

I am guessing a lot today, but I think this is what you are after:

// Create sample data
create (:Event{id:"start"})-[:HAS]->(:Id)<-[:HAS]-(:Event )-[:HAS]->(:Id)<-[:HAS]-(:Event )-[:HAS]->(:Id)<-[:HAS]-(:Event{id:"end"})

// Find the whole chain "start" to "end" by starting from the "start" node (unknown end node)
match pattern= (start:Event{id:"start"}) ( ()-[:HAS]->(:Id)<-[:HAS]-() ) {1,100} (end:Event)
where count { (end)-[:HAS]->() } = 1 // Last event only has one HAS relationship 
return pattern

Good guesses but I think haven't given enough information.

It's not so much a chain as a network of Event and Id nodes.

In my model an Event is connected to multiple Id nodes and each Id node is connected to multiple events. By starting with any Event I need to find all the Events is is transitively connected to.

The Event and Id nodes tend to be very well connected so even with 100 Event and 100 Id nodes there may be close to 100 * 100 connections there are many possible ways to traverse this graph and closed loops. I'm not sure how the Neo4J traversal works but I am guessing that it doesn't keep track of visited nodes which is what makes this query slow.

Perhaps the Weakly Connected Components algorithm is what I'm looking for. Except I'm using AuraDB which doesn't seem to have the GDS plugin.

hi, if you think GDS can help you, you can have an AuraDS instance here Neo4j Deployment Center - Graph Database & Analytics

but if you want to match a pattern with cypher, we are going to need know more about your data for someone to be able to figure a strategy. As you pointed out, graph traversal quickly gets combinatorially big. only the specifics of the data can help reduce search space

If you want to know more about how the Cypher planner get things done, I recommend you read this Slow Cypher Statements and How to Fix Them | by Adam Cowley | Neo4j Developer Blog | Medium