Poor Performance on Consuming/Returning Millions of Rows

tom.russell · January 18, 2021, 8:51pm

We see very slow performance for some of our queries that return many results (hundreds of thousands to millions of rows). Analyzing the PROFILE output, it seems that consuming the results is what eats up the time, not the querying itself. We tried reducing the overhead of the returned data by collecting the property into a single row instead of returning millions of rows with one property each. However, this approach has limitations, and I am wondering if there is any insight into what else can be done. The following is a general representation of the query. The pattern match does not seem to be the bottleneck, and the time until results are available increases only slightly in the second version, since we wrapped the property in a collect() aggregation.

Neo4j Version: 3.5
Is there any improvement on the transfer/consumption of results in 4.X or is that primarily a bandwidth and driver issue?

Basic Return Version:
Query: MATCH (a:Label1)-[:relation*]->(b:Label2) RETURN b.Property as result
Available: 13s
Consumed: 189.5s
Screenshot of Actual PROFILE

Collection Return Version:
Query: MATCH (a:Label1)-[:relation*]->(b:Label2) RETURN collect(b.Property) as result
Available: 17s
Consumed: 0.15s
Screenshot of Actual PROFILE
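
A possible middle ground between the two versions above is to return the collected values in fixed-size batches, so the result is neither millions of single-value rows nor one very large list. The following is only a sketch: the batch size of 10000 is an arbitrary assumption, and it has not been measured against the timings reported here.

```cypher
// Sketch only: collect the property once, then emit it as batches.
// The batch size (10000) is an assumed value; tune it for your data.
MATCH (a:Label1)-[:relation*]->(b:Label2)
WITH collect(b.Property) AS props
// range(start, end, step) yields the start index of each batch
UNWIND range(0, size(props) - 1, 10000) AS i
// list slicing is end-exclusive, so each row holds up to 10000 values
RETURN props[i..i + 10000] AS batch
```

Each returned row is then a list that the client can process as a unit, which keeps the row count low while bounding the size of any single value.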

clem · January 18, 2021, 9:09pm

One obvious (sorry) question: do you really need [:relation*] (unbounded depth), or can you limit the search depth? E.g. [:relation*..5] (at most 5 hops).

As mentioned here:

(nodeA)-[:RELTYPE*]->(nodeB)

Retrieve all paths of any length with the relationship, :RELTYPE from nodeA to nodeB or from nodeB to nodeA and beyond. This is usually a very expensive query so you should place limits on how many nodes are retrieved:
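
As an illustration of placing such limits (a sketch using the labels from the original post, not the example from the quoted documentation), the traversal depth and the number of returned rows can both be bounded. The hop range 1..5 and the limit of 1000 are arbitrary assumptions.

```cypher
// Sketch only: label, relationship, and property names come from the original post.
// *1..5 restricts the traversal to between 1 and 5 hops (assumed bound).
MATCH (a:Label1)-[:relation*1..5]->(b:Label2)
RETURN b.Property AS result
// LIMIT caps the number of rows returned (assumed value)
LIMIT 1000
```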

tom.russell · January 18, 2021, 10:20pm

We do have constructors/parameterized queries that introduce depth limits (which I omitted originally, as I did not expect that to be a factor). Given what I have researched, I do not expect changes to the pattern-match part of the query to have an impact. In a quick test, introducing the depth limit gave similar result-availability times and did not affect consumption time at all for either version of the query.
