I am new to Cypher. We are attempting to query our data and running into performance problems.
We have a recursive structure that relates events via identifiers. A single event has multiple identifiers, and each identifier can appear in multiple events.
We are looking for a query that can give us all the events that are connected via identifiers.
We've tried writing something like this:
MATCH (e:Event)-[*0..]-(e2:Event) RETURN e2
We have an index that allows us to filter the start positions for the traversal efficiently, but the recursive nature of the traversal means that this query is timing out. If we limit the recursion to a small number it completes but returns lots of results.
It looks like it is collecting all possible paths from the start event to any other event, including itself, and counting paths multiple times.
I guess that in order to make it efficient I need to prevent it from revisiting nodes that it has already traversed.
Other than remodelling my data, how can I make this query more efficient?
Thanks for your reply but I want all the nodes in the chain including the end node.
Between Event and Id there is a HAS_ID relationship from Event to Id. However, this is the only relationship in the graph, so specifying it doesn't filter out anything else.
I thought about using the direction, but these exist in a chain Event->Id<-Event->Id<-Event.... I don't see how I can use the direction to restrict the query to visiting each node once.
I have found apoc.neighbors.tohop, which performs better but is still not fast enough for what I need.
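For comparison, here is a minimal sketch using apoc.path.subgraphNodes, which expands with node-global uniqueness, so each node is expanded at most once. It assumes the APOC library is installed, that Event nodes carry an indexed id property, and that $eventId is a hypothetical query parameter. It has not been run against this dataset, so treat it as a starting point rather than a confirmed fix.

```cypher
// Assumed: Event nodes have an indexed id property; $eventId is a hypothetical parameter.
MATCH (start:Event {id: $eventId})
CALL apoc.path.subgraphNodes(start, {
  relationshipFilter: 'HAS_ID',  // follow HAS_ID in either direction
  labelFilter: '>Event'          // traverse through Id nodes, but only return Event nodes
}) YIELD node
RETURN node AS event
```

If the connected set turns out to be very large, the maxLevel and limit config options can cap the expansion.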
I am guessing a lot today, but I think this is what you are after:
// Create sample data
create (:Event{id:"start"})-[:HAS]->(:Id)<-[:HAS]-(:Event )-[:HAS]->(:Id)<-[:HAS]-(:Event )-[:HAS]->(:Id)<-[:HAS]-(:Event{id:"end"})
// Find the whole chain "start" to "end" by starting from the "start" node (unknown end node)
match pattern= (start:Event{id:"start"}) ( ()-[:HAS]->(:Id)<-[:HAS]-() ) {1,100} (end:Event)
where count { (end)-[:HAS]->() } = 1 // Last event only has one HAS relationship
return pattern
Good guesses, but I think I haven't given enough information.
It's not so much a chain as a network of Event and Id nodes.
In my model an Event is connected to multiple Id nodes and each Id node is connected to multiple Events. Starting with any Event, I need to find all the Events it is transitively connected to.
The Event and Id nodes tend to be very well connected, so even with 100 Event and 100 Id nodes there may be close to 100 * 100 connections. There are many possible ways to traverse this graph, and many closed loops. I'm not sure how the Neo4j traversal works, but I am guessing that it doesn't keep track of visited nodes, which is what makes this query slow.
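On the guess about visited nodes: Cypher's variable-length matching only guarantees that a relationship is not repeated within a single path, so it enumerates every distinct path rather than every distinct node. One thing that may be worth trying, as a sketch rather than a guaranteed fix, is to bound the depth and return only DISTINCT end nodes without binding the path, which gives the planner the option of a pruning variable-length expand. The id property, the $eventId parameter and the bound of 20 are assumptions for illustration.

```cypher
// Assumed: Event nodes have an indexed id property; $eventId is a hypothetical parameter.
// No path variable is bound and only distinct end nodes are returned,
// so the planner may prune paths that arrive at an already-seen node.
MATCH (e:Event {id: $eventId})-[:HAS_ID*0..20]-(e2:Event)
RETURN DISTINCT e2
```

Running the query with PROFILE should show whether a pruning operator such as VarLengthExpand(Pruning) was actually chosen. If the plan shows a plain VarLengthExpand(All), the query is still enumerating every path.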
But if you want to match a pattern with Cypher, we are going to need to know more about your data before someone can figure out a strategy. As you pointed out, graph traversal quickly gets combinatorially big. Only the specifics of the data can help reduce the search space.
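One way to share those specifics is a quick degree profile. This is a rough sketch that assumes the Event, Id and HAS_ID names mentioned earlier in the thread. It counts how many Id nodes are shared by how many Events, which gives a feel for how fast the traversal fans out.

```cypher
// For each Id node, count the Events attached to it, then summarise the distribution.
MATCH (i:Id)
WITH i, count { (i)<-[:HAS_ID]-(:Event) } AS eventsPerId
RETURN eventsPerId, count(i) AS ids
ORDER BY eventsPerId DESC
LIMIT 20
```

A handful of Id nodes with very high counts would suggest that a few hub identifiers are responsible for most of the blow-up.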