I upgraded my Neo4j docker container from
and I'm now facing huge performance issues when running a simple variable-length path traversal.
A query like the following:
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b:Blob) WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9' RETURN count(b)
This query takes around 200 seconds to execute, for a count of only 300 blobs.
I don't recall having this many performance issues with the previous version.
When I SSH into my instance, I see the CPU pegged at 100% for the whole time the query is executing.
Is that expected?
Here is the profile of the query:
Cypher version: CYPHER 4.4, planner: COST, runtime: INTERPRETED. 353729021 total db hits in 201249 ms.
All I want to do is to reach all the children of this
And for information, the maximum path length is ...
Any tips?
Is something wrong with my query, or with the planner here?
I just tried running the same query (on a different data set) on AuraDB:
Started streaming 1 records in less than 1 ms and completed after 94 ms.
It works fine.
But the query planner is different:
What's happening on my Docker instance?
This looks like some buggy planner behavior. Can you let us know how many :Tree nodes are in your db, how many :Blob nodes, how many :HAS_CHILD_BLOB relationships, and how many :HAS_CHILD_TREE relationships?
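If it helps, counts like these can be gathered with plain count queries. This is only a sketch using the labels and relationship types from your query; the column aliases are made up for readability.

```cypher
// Number of nodes per label used in the slow query.
MATCH (t:Tree) RETURN count(t) AS treeNodes;
MATCH (b:Blob) RETURN count(b) AS blobNodes;

// Number of relationships per type used in the variable-length pattern.
MATCH ()-[r:HAS_CHILD_BLOB]->() RETURN count(r) AS childBlobRels;
MATCH ()-[r:HAS_CHILD_TREE]->() RETURN count(r) AS childTreeRels;
```

Run them one at a time, or as a multi-statement script in cypher-shell.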
In the meantime, see if this performs better.
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b) WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9' WITH t, b, 1 as ignored WHERE b:Blob RETURN count(b)
We want to see a plan like the second one you posted, one that only expands from your starting t node and doesn't perform any label scans or Cartesian product operations.
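To compare plans without waiting for the full run, you can prefix the query with EXPLAIN, which returns the plan without executing it. The sketch below also checks for an index on :Tree(hash). That part is an assumption on my side: I don't know whether your Docker instance already has such an index, and the index name tree_hash is hypothetical. A missing index is one possible reason the planner would not anchor on t, not a confirmed diagnosis.

```cypher
// Show the plan only; the query itself is not executed.
EXPLAIN
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
WITH t, b, 1 AS ignored
WHERE b:Blob
RETURN count(b);

// List the indexes that currently exist in this database.
SHOW INDEXES;

// Assumption: no index on :Tree(hash) exists yet.
// The name tree_hash is hypothetical; pick whatever fits your schema.
CREATE INDEX tree_hash IF NOT EXISTS FOR (t:Tree) ON (t.hash);
```

If the EXPLAIN output starts from a seek or filter on t and then a single variable-length expand, that matches the plan shape we are after.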