Slow Aggregation when dealing with 1M+ node

1033516561 · May 29, 2023, 8:17am

My neo4j server config:

version: 4.3.3
dbms.memory.pagecache.size: 13G
dbms.memory.heap.initial_size: 13G
dbms.memory.heap.max_size: 13G

Now i want to do a simple aggregation on nodes, no extra filters

match (n:OrderLine{tenantId:'TENANT-A', type:'OrderLine'}) return n.status, avg(n.value)

I want to get all nodes by tenantId and type, which is around 1,880,000 (1M+ OrderLine nodes)
And then do an aggregation, group by status field and avg/sum function

It costs around 5-7s , and my target is under 3 seconds.

The PROFILE result looks like above png.

My question is:

Why it costs so much when facing 1M+ nodes
Is there any solution to reduce the time code?

michael.hunger · May 30, 2023, 7:51am

Do you have any indexes? e.g. on tenantId ?
remove the type property.
Upgrade your version to the latest if possible, the runtime gets faster with every version
if you don't use enterprise edition which has a better runtime
do you have high iops/a fast NVMe SSD? For the initial run you're mostly measuring disk speed.
if you want to speed it up more, you can add an index on the fields
then your EXPLAIN or PROFILE should show the more efficient plan

create index orderline_tenant for (o:OrderLine) on (o.tenantId);

// or compound index
create index orderline_tenant_status_value for (o:OrderLine) on (o.tenantId, o.status, o.value);

unwind range(1,10000) as id create (o:OrderLine {id:id, tenantId:'TENANT-A',status:toInteger(rand()*10), value:round(rand()*100,2)});

profile
match (n:OrderLine{tenantId:'TENANT-A'}) 
// indication for the index to fetch the values
where n.status is not null and n.value is not null
return n.status, avg(n.value), sum(n.value);

1033516561 · May 31, 2023, 2:54am

yeah i have created a compound index for tenant+type, you can see it in my images of PROFILE cypher.

Topic		Replies	Views
Very slow performance when aggregate on node property Cypher performance	21	4693	December 21, 2020
My count() query is too slow Cypher cypher , counts	6	1462	June 18, 2021
Very slow performance when aggregate on node property Neo4j Graph Platform performance , migrated , cypher-tagged	2	318	March 5, 2023
Improving the performance of a cypher query Neo4j Graph Platform	15	722	October 26, 2020
How to Aggregate calculation of data faster? Cypher cypher	3	1115	March 3, 2019

Get Certified in June!

Slow Aggregation when dealing with 1M+ node

Related topics