Neo4j CPU 100% for a simple variable-length path traversal

Wenzel · December 23, 2021, 7:49pm

Hi,

I upgraded my Neo4j Docker container from 4.2.4 to 4.4.2 today,
and I'm now seeing huge performance issues on a simple variable-length path traversal.

A query like the following:

MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b:Blob)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
RETURN count(b)

This query takes around 200 seconds to execute, for a count of only 300 blobs.
I don't recall having performance issues like this with the previous version.

When I SSH into my instance, I see the CPU pinned at 100% for as long as the query is running.
Is that expected?

Here is the profile of the query:

Cypher version: CYPHER 4.4, planner: COST, runtime: INTERPRETED. 353729021 total db hits in 201249 ms.

All I want to do is to reach all the children of this Tree, recursively.

For information, the maximum path length is only 11:

Any tips?
Is something wrong with my query, or with the planner?

I just tried to run the same query (on a different data set), on AuraDB:

Started streaming 1 records in less than 1 ms and completed after 94 ms.

It works fine.
But the query plan is different:

What's happening on my Docker instance?

Thanks !

dana_canzano · January 1, 2022, 2:20pm

@Wenzel

If you prefix the query with

CYPHER runtime=pipelined

do you get better results?
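For reference, a sketch of what the prefixed query would look like, using the query from the first post. The pipelined runtime is only available in Enterprise Edition, so this is not expected to work on a Community instance.

```cypher
CYPHER runtime=pipelined
// Same traversal as in the original post; only the runtime option above is new.
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b:Blob)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
RETURN count(b)
```

The option changes which runtime executes the plan, not necessarily which plan the planner picks, so the plan is still worth inspecting afterwards.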

Wenzel · January 2, 2022, 8:43pm

Hi @dana_canzano, and thank you very much for your answer!

The performance issue occurs on the Community Edition, so I can't test your tuning suggestion: the pipelined runtime is only available in the Enterprise Edition.

andrew_bowman · January 5, 2022, 1:57am

This looks like some buggy planner behavior. Can you let us know how many :Tree nodes are in your db, how many :Blob nodes, how many :HAS_CHILD_BLOB relationships, and how many :HAS_CHILD_TREE relationships?
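A minimal sketch of queries that would produce these four counts. The label and relationship type names are the ones from the query in the first post; the result aliases are only illustrative. Run each statement separately.

```cypher
// Node counts per label
MATCH (t:Tree) RETURN count(t) AS trees;
MATCH (b:Blob) RETURN count(b) AS blobs;

// Relationship counts per type
MATCH ()-[r:HAS_CHILD_BLOB]->() RETURN count(r) AS childBlobRels;
MATCH ()-[r:HAS_CHILD_TREE]->() RETURN count(r) AS childTreeRels;
```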

In the meantime, see if this performs better.

MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
WITH t, b, 1 as ignored
WHERE b:Blob
RETURN count(b)

We want to see a plan like the second one you posted, one that only expands from your starting t node and doesn't perform any label scans or cartesian product operations.
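One way to check which kind of plan a query gets, without waiting for it to finish, is to prefix it with EXPLAIN, which returns the plan without executing the query. A sketch using the original query from the first post:

```cypher
// EXPLAIN only plans the query; nothing is executed, so this returns quickly
// even when the actual query would run for minutes.
EXPLAIN
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b:Blob)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
RETURN count(b)
```

In the resulting plan, operators such as CartesianProduct or a NodeByLabelScan on :Blob would indicate the problematic shape, while a single lookup of t followed by VarLengthExpand(All) is the shape to hope for. Swapping EXPLAIN for PROFILE executes the query and additionally reports db hits per operator.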

Wenzel · July 2, 2024, 10:43pm

Hi @andrew_bowman ,

I realized that I never followed up on this topic, and now I'm facing the same problem two years later:

github.com/neo4j/neo4j

CPU Overload on Cypher Query Post-Update to Neo4j 5.20

opened 08:49PM - 02 Jul 24 UTC

Wenzel

bug

## Guidelines
- Neo4j version: 5.20
- Operating system: Ubuntu 22.04
- API/…Driver: Neo4j browser

## Example bug report
I discovered that upon updating my database from Neo4j v5.15 to v5.20, a specific Cypher query that used to complete within a few seconds now hangs in the database and never returns, while keeping the CPU at 100%. In both cases, the indexes are the same.

### Neo4j 5.15
`Started streaming 95432 records after 241 ms and completed after 2416 ms, displaying first 1000 rows.`

### Neo4j 5.20
Never returns (even after one hour). Nothing suspicious in the log output (no Java stack traces, exceptions, or log messages that would explain it).

This is problematic: I would have preferred a heap memory overflow or a crash, so that the query at least returns. Now I have to expect my programs to hang whenever they perform a Cypher query.

### Steps to reproduce
I can share the relevant dataset as well as the query if a maintainer wishes to reproduce this issue. One thing to note: I used the `EXPLAIN` keyword to obtain the execution plan, and for the same query it differs between the two versions: with 5.20, a **cartesian product** is inserted. I believe this is the key issue here.

### Expected behavior
The query should have completed in under 30 seconds.

### Actual behavior
The query never returns and the CPU is maxed out at 100% forever.

When I run this simple query on Neo4j 5.21:

MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b:Blob)
WHERE t.hash = $root_hash
RETURN count(b)

I still get a query plan including a cartesian product.

And with the solution you proposed (WITH t, b, 1 as ignored), the cartesian product is gone and the query executes as expected.
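Written out in full with the $root_hash parameter, the workaround combines the two queries already posted in this thread:

```cypher
// Expand from the root without constraining the label of the end node...
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b)
WHERE t.hash = $root_hash
// ...then carry an extra dummy column through WITH, so the :Blob check
// is applied as a filter after the expansion.
WITH t, b, 1 AS ignored
WHERE b:Blob
RETURN count(b)
```

The explanation in the comments is an assumption about why this helps: the dummy column presumably stops the planner from folding the WITH away and treating :Blob as a second starting point. In Neo4j Browser, the parameter can be set beforehand with `:param root_hash => '...'`.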

Do you still think it might be a bug in the planner?

As described in the GitHub issue, I wasn't facing these issues with Neo4j 5.15.

For example, the same query on 5.15 gives me this plan:

How can I get a reliable execution plan, without a cartesian product, to traverse all the blobs under a specific Tree?
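One variant that might be worth comparing is to resolve the root in its own MATCH and to cap the expansion depth. This is an untested sketch, not a confirmed fix. It assumes an index on :Tree(hash) exists, and the bound of 11 is the maximum path length reported for the 2021 data set, which may not hold for the current one.

```cypher
// Step 1: resolve the root first (assumes an index on :Tree(hash)).
MATCH (t:Tree)
WHERE t.hash = $root_hash
// Step 2: expand only from that root. The *1..11 bound is an assumption
// taken from the maximum path length mentioned earlier in the thread.
WITH t
MATCH (t)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*1..11]->(b:Blob)
RETURN count(b)
```

The planner is still free to choose a different plan, so the result should be checked with EXPLAIN on each Neo4j version before relying on it.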

andrew_bowman · July 3, 2024, 11:33am

This goes beyond query tuning and requires a check from our engineers.

You have a workaround for now. But to determine whether the planner's choice is a bug, or otherwise something our engineers can change in the planner code, the GitHub issue is the better place to get an answer, as you'll have the attention of our Cypher engineers there.
