How to make multi level path traversal faster

apoc
performance
configuration
cypher

(Raj) #1

I want to find all the paths from the leaf node(E) to the root node(A).
Not for any specific node so no id or filed filter here.

The data model is as shown in the screenshot.

I used a basic Cypher query to find the paths:

MATCH path=(:A)-[:USE*]->(:E) RETURN path

This is to return all the paths from A to E: It took 254 seconds to return 18k paths.

I also want to return paths from B, C, D to E. I have written this query for that

MATCH path=(n)-[:USE*]->(:E) RETURN path

How can I improve this traversal to return results in 10-15 seconds?

System Configurations:
System memory is 16GB
Storage: SSD

Current Neo4j Conf:
heap size: min-3GB max-3GB
database size: 1.5GB


(Andrew Bowman) #2

We'll need some information here.

Can you do a PROFILE of the query, expand all elements of the query plan, and add it here?

If we assume you're working with a tree, then it will be much faster if we start from :E nodes (leaves) rather than root nodes (since there is only one possible path from a leaf up to a root), and this is probably where the query is being planned suboptimally. If there are few roots and many leaves, the planner is likely using a label scan of :A first, which won't execute well (the planner doesn't understand this is a tree structure, and that for this particular use case it would be more efficient to start with :E nodes).

We need to provide a hint to the planner to start at :E nodes. We can do this through a scan hint.

Try this:

MATCH path=(:A)-[:USE*]->(e:E) 
USING SCAN e:E
RETURN path

(Raj) #3

@andrew.bowman I tried with USING SCAN, there is no change.

When I run the PROFILE query (with or without USING SCAN takes same time- 6 to 14 seconds), it also returns the result i.e. all the paths 18k.
I think there is some issue with Neo4j browser, it's trying to plot the graph and due large size it's taking time. So I am trying from program to get the results.

Here is the plan:
plan