Slow Aggregation when dealing with 1M+ node

Slow Aggregation when dealing with 1M+ node

My neo4j server config:

version: 4.3.3
dbms.memory.pagecache.size: 13G
dbms.memory.heap.initial_size: 13G
dbms.memory.heap.max_size: 13G

Now i want to do a simple aggregation on nodes, no extra filters

match (n:OrderLine{tenantId:'TENANT-A', type:'OrderLine'}) return n.status, avg(n.value)

I want to get all nodes by tenantId and type, which is around 1,880,000 (1M+ OrderLine nodes)
And then do an aggregation, group by status field and avg/sum function

It costs around 5-7s , and my target is under 3 seconds.


The PROFILE result looks like above png.

My question is:

  • Why it costs so much when facing 1M+ nodes
  • Is there any solution to reduce the time code?
  1. Do you have any indexes? e.g. on tenantId ?
  2. remove the type property.
  3. Upgrade your version to the latest if possible, the runtime gets faster with every version
  4. if you don't use enterprise edition which has a better runtime
  5. do you have high iops/a fast NVMe SSD? For the initial run you're mostly measuring disk speed.
  6. if you want to speed it up more, you can add an index on the fields
  7. then your EXPLAIN or PROFILE should show the more efficient plan

create index orderline_tenant for (o:OrderLine) on (o.tenantId);

// or compound index
create index orderline_tenant_status_value for (o:OrderLine) on (o.tenantId, o.status, o.value);

unwind range(1,10000) as id create (o:OrderLine {id:id, tenantId:'TENANT-A',status:toInteger(rand()*10), value:round(rand()*100,2)});

profile
match (n:OrderLine{tenantId:'TENANT-A'}) 
// indication for the index to fetch the values
where n.status is not null and n.value is not null
return n.status, avg(n.value), sum(n.value);

yeah i have created a compound index for tenant+type, you can see it in my images of PROFILE cypher.