Query slow performance

erenkzly · March 1, 2024, 2:48pm

Hello, I am trying to understand why the following query takes a long time to return results:

PROFILE
WITH ['frace_paris', 'spain_barcelona', 'spain_madrid'] AS cities,
     'turkey_istanbul' AS startCity
MATCH p = (l1:Location {name: 'turkey_istanbul', day: '2024/03/18'})-[:DEPARTURE]->(f1:Flight)-[:ARRIVAL]->(l2:Location)-[:NEXT*3..3]->(l3:Location)
  -[:DEPARTURE]->(f2:Flight)-[:ARRIVAL]->(l4:Location)-[:NEXT*3..3]->(l5:Location)
  -[:DEPARTURE]->(f3:Flight)-[:ARRIVAL]->(l6:Location {name: 'turkey_istanbul'})
WHERE l2.name IN cities AND l4.name IN cities AND l3.name IN cities AND l5.name IN cities
RETURN f1.price + f2.price + f3.price AS totalPrice
ORDER BY totalPrice
LIMIT 10

arne.fischereit · March 1, 2024, 3:12pm

Welcome @erenkzly to the Neo4j community,

The profile that you attached shows that the query engine started with l1, ploughed through just short of 30 million intermediate results, and spent a lot of time on filters that could only be evaluated after reaching the end of the pattern at l6.

So, in essence: your query contains a long pattern that places very few constraints on the nodes in the middle.

Does that help?

erenkzly · March 1, 2024, 3:27pm

Thanks for the reply. I have two range indexes on the Location label, one on the name property and one on the day property. That is why I expected the results to be filtered quickly.

arne.fischereit · March 14, 2024, 5:24pm

But only the end nodes of the pattern can be obtained from these indexes.

There are generally two ways we could approach this pattern:

Using a JOIN

In this scenario, we start from both sides, produce all combinations and see whether we meet on some middle node.

This is the approach that relational databases usually choose. If "all combinations" sounds like a very large number, that is exactly the point: it will probably not perform very well.
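If you want to see what the join variant costs on your data, a planner hint can request it. The following is only a sketch: it reuses the query from the first post unchanged apart from the `USING JOIN ON` hint, and the choice of l4 as the join node is an arbitrary assumption for illustration. The planner may still produce a different plan shape than you expect, so compare the PROFILE output with the original.

```cypher
PROFILE
WITH ['frace_paris', 'spain_barcelona', 'spain_madrid'] AS cities
MATCH (l1:Location {name: 'turkey_istanbul', day: '2024/03/18'})-[:DEPARTURE]->(f1:Flight)-[:ARRIVAL]->(l2:Location)-[:NEXT*3..3]->(l3:Location)
  -[:DEPARTURE]->(f2:Flight)-[:ARRIVAL]->(l4:Location)-[:NEXT*3..3]->(l5:Location)
  -[:DEPARTURE]->(f3:Flight)-[:ARRIVAL]->(l6:Location {name: 'turkey_istanbul'})
// Assumption: l4 is picked as the meeting point purely for illustration.
USING JOIN ON l4
WHERE l2.name IN cities AND l4.name IN cities AND l3.name IN cities AND l5.name IN cities
RETURN f1.price + f2.price + f3.price AS totalPrice
ORDER BY totalPrice
LIMIT 10
```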

Using Expands

In this scenario, we start from one side and only explore the relationships that are bound to the node(s) that we have encountered so far, to check in the end whether we reached the right node.

This scenario only uses the index on one side but tends to produce far fewer intermediate results to process.
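One way to watch the expansion grow is to profile only a prefix of the pattern and count the rows it produces. The sketch below covers just the first flight out of the start node; the `USING INDEX` hint assumes that the range index on Location(name) mentioned earlier in the thread exists, and `firstLegRows` is a name made up for this example.

```cypher
PROFILE
MATCH (l1:Location {name: 'turkey_istanbul', day: '2024/03/18'})-[:DEPARTURE]->(f1:Flight)-[:ARRIVAL]->(l2:Location)
// Assumption: a range index on :Location(name) exists, as described above.
USING INDEX l1:Location(name)
// Count the rows produced by the first leg only.
RETURN count(*) AS firstLegRows
```

Extending the pattern one leg at a time and comparing the row counts shows at which step the intermediate results start to multiply.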

Conclusion

Neo4j's execution engine considers both alternatives and chooses the one it estimates to be more performant. "Quickly" is relative to the other available ways of answering your query.
With large patterns, indexes only provide a quick starting point, but we still need to explore all relationships. What you can do, however, is provide more information about the nodes and relationships within the pattern (for example, any restrictions on their properties), so that we can potentially discard some of the intermediate results early.
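As a rough sketch of what "more information within the pattern" could look like, the query can be written in stages so that each city filter sits next to the leg it restricts. This is an illustration, not a guaranteed fix: the planner is still free to reorder operations, and splitting one MATCH into several relaxes the rule that a relationship may appear only once per MATCH, which should not matter here if every leg uses different relationships.

```cypher
PROFILE
WITH ['frace_paris', 'spain_barcelona', 'spain_madrid'] AS cities
// Leg 1: first flight from the start city; restrict the arrival city right away.
MATCH (l1:Location {name: 'turkey_istanbul', day: '2024/03/18'})-[:DEPARTURE]->(f1:Flight)-[:ARRIVAL]->(l2:Location)
WHERE l2.name IN cities
// Leg 2: three NEXT hops, then the second flight.
MATCH (l2)-[:NEXT*3..3]->(l3:Location)-[:DEPARTURE]->(f2:Flight)-[:ARRIVAL]->(l4:Location)
WHERE l3.name IN cities AND l4.name IN cities
// Leg 3: three NEXT hops, then the flight back to the start city.
MATCH (l4)-[:NEXT*3..3]->(l5:Location)-[:DEPARTURE]->(f3:Flight)-[:ARRIVAL]->(l6:Location {name: 'turkey_istanbul'})
WHERE l5.name IN cities
RETURN f1.price + f2.price + f3.price AS totalPrice
ORDER BY totalPrice
LIMIT 10
```

Any further restriction that holds in your data model, such as a known day for the return flight, could be added to the relevant leg in the same way.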
