cancel
Showing results for 
Search instead for 
Did you mean: 

Join the community at Nodes 2022, our free virtual event on November 16 - 17.

Neo4j Cypher query execution optimization

myshareit
Node Clone

I have the following Cypher query:

 
MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Decision:Profile ) 
MATCH (childD)-[:EMPLOYMENT_AS]->(root2:Employment ) 
WHERE root2.id IN ([1]) WITH DISTINCT childD, dg, rdgd  MATCH path3=(root3:Location )-[:CONTAINS*0..]->(descendant3:Location) 
WHERE (descendant3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]) 
OR root3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])) 
UNWIND nodes(path3) AS pathNode3 
WITH childD, dg, rdgd, COLLECT(DISTINCT pathNode3) AS pathNodes3 
MATCH (childD)-[:LOCATED_IN]->(pathNode3) 
WHERE pathNode3 IN pathNodes3 
RETURN DISTINCT childD
 
which runs slowly:
 
plan(2).png
 
Could you please help to optimize this query? I need to check that childD variable :LOCATED_IN Locations (with provided IDs) or their descendants 
2 REPLIES 2

dana_canzano
Neo4j
Neo4j

@myshareit 

Is there an index on :Location and property id?

 

 

glilienfield
Ninja
Ninja

In addition to the index improvements recommended by @dana_canzano, I think you can also optimize the query. The way I read it, you are performing the same 'match' query to get the 'path3' for each 'childD, dg, rdgd' combination. This is because a Cartesian product is computed when you have the 'with' followed by the 'match'. I removed that by collecting all the 'childD' nodes and passing this with the 'with.' Since the collection result in only one row, only one 'match' is executed. The result is a row for each 'path3', with the same 'children' list appended to each row. I also removed the passing of 'rd' and 'rdgd', since they were not utilized in the rest of the query. 

What it seems you are returning is the list of all childD nodes that have an existing relationships with any of the nodes along any of the path3 rows that are returned. The refactored query should give you that. 

Sorry, I did have any test data to validate it gives the correct results. 

MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Decision:Profile ) 
MATCH (childD)-[:EMPLOYMENT_AS]->(root2:Employment ) 
WHERE root2.id IN ([1]) 
WITH collect(DISTINCT childD) as children
MATCH path3=(root3:Location )-[:CONTAINS*0..]->(descendant3:Location) 
WHERE (descendant3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]) 
OR root3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])) 
UNWIND nodes(path3) AS pathNode3
WITH distinct pathNode3, children
UNWIND children as child
MATCH (child)-[:LOCATED_IN]->(pathNode3) 
RETURN distinct child