
Neo4j CPU 100% for a simple variable-length path traversal

Wenzel

Hi,

I upgraded my Neo4j Docker container from 4.2.4 to 4.4.2 today,
and I'm facing huge performance issues when running a simple variable-length path traversal.

A query like the following:

MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b:Blob)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
RETURN count(b)

This query takes around 200 seconds to execute, and the count it returns is only 300 blobs.
I don't recall having such performance issues with the previous version.

When I SSH into my instance, I see the CPU stuck at 100% the whole time the query is executing.
Is that expected?

Here is the profile of the query:

Cypher version: CYPHER 4.4, planner: COST, runtime: INTERPRETED. 353729021 total db hits in 201249 ms.

All I want to do is reach all the children of this Tree, recursively.

For reference, the maximum path length is only 11:

Any tips?
Is something wrong with my query, or with the planner?

I just tried running the same query (on a different data set) on AuraDB:

Started streaming 1 records in less than 1 ms and completed after 94 ms.

It works fine.
But the query plan is different:

What's happening on my Docker instance?

Thanks!


dana_canzano
Neo4j

@Wenzel

If you prefix the query with

CYPHER runtime=pipelined

do you get better results?
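For clarity, the prefix goes in front of the rest of the statement. A sketch of the full query with it, assuming an Enterprise Edition instance (the pipelined runtime is not available in Community Edition):

```cypher
CYPHER runtime=pipelined
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b:Blob)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
// Same traversal as the original query; only the CYPHER prefix line is new
RETURN count(b)
```

Profiling this version and checking the `runtime:` field in the output (INTERPRETED in the profile above) shows whether the option actually took effect.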

Hi @dana.canzano, and thank you very much for your answer!

The instance where the performance issue occurs runs Community Edition, so I can't test your query-tuning suggestion: the pipelined runtime is only available in Enterprise Edition.

This looks like buggy planner behavior. Can you let us know how many :Tree nodes are in your db, how many :Blob nodes, how many :HAS_CHILD_BLOB relationships, and how many :HAS_CHILD_TREE relationships?
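One way to collect all four numbers in a single statement is a sketch like the following, using uncorrelated `CALL {}` subqueries (supported in the 4.4 version used here). Each subquery aggregates without a grouping key, so it returns exactly one row even if a count is zero, and the final RETURN combines them into one row:

```cypher
// Each subquery counts one label or relationship type independently
CALL { MATCH (t:Tree) RETURN count(t) AS trees }
CALL { MATCH (b:Blob) RETURN count(b) AS blobs }
CALL { MATCH ()-[r:HAS_CHILD_BLOB]->() RETURN count(r) AS childBlobRels }
CALL { MATCH ()-[r:HAS_CHILD_TREE]->() RETURN count(r) AS childTreeRels }
RETURN trees, blobs, childBlobRels, childTreeRels
```

Running the four MATCH ... RETURN count(...) lines as separate queries works just as well; the single-statement form simply keeps the numbers together.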

In the meantime, see if this performs better.

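MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
// b is left unlabeled above, so the planner has no :Blob label to start a scan from.
// The extra projection (1 AS ignored) is meant to keep the planner from folding this WITH away,
// so the :Blob check below is applied after expanding from t instead of being planned as a label scan.
WITH t, b, 1 AS ignored
WHERE b:Blob
RETURN count(b)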

We want to see a plan like the second one you posted, that only expands from your starting t node and doesn't perform any label scans or cartesian product operations.
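Since the slow version takes minutes to finish, it can help to check the plan first with `EXPLAIN`, which shows the planned operators without running the query (unlike `PROFILE`, which executes it and reports db hits like the profile posted above). A sketch for the rewritten query:

```cypher
EXPLAIN
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
// EXPLAIN only plans the query; nothing here is executed
WITH t, b, 1 AS ignored
WHERE b:Blob
RETURN count(b)
```

In a good plan, the variable-length expansion (a `VarLengthExpand` operator) starts from `t`, and there is no `NodeByLabelScan` on `:Blob` and no `CartesianProduct`. If those operators still appear, the planner is still choosing the expensive strategy.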