Slow Aggregation when dealing with 1M+ node
My neo4j server config:
version: 4.3.3
dbms.memory.pagecache.size: 13G
dbms.memory.heap.initial_size: 13G
dbms.memory.heap.max_size: 13G
Now i want to do a simple aggregation on nodes, no extra filters
match (n:OrderLine{tenantId:'TENANT-A', type:'OrderLine'}) return n.status, avg(n.value)
I want to get all nodes by tenantId and type, which is around 1,880,000 (1M+ OrderLine nodes)
And then do an aggregation, group by status
field and avg/sum
function
It costs around 5-7s , and my target is under 3 seconds.
The PROFILE result looks like above png.
My question is:
- Why it costs so much when facing 1M+ nodes
- Is there any solution to reduce the time code?
- Do you have any indexes? e.g. on tenantId ?
- remove the
type
property.
- Upgrade your version to the latest if possible, the runtime gets faster with every version
- if you don't use enterprise edition which has a better runtime
- do you have high iops/a fast NVMe SSD? For the initial run you're mostly measuring disk speed.
- if you want to speed it up more, you can add an index on the fields
- then your EXPLAIN or PROFILE should show the more efficient plan
create index orderline_tenant for (o:OrderLine) on (o.tenantId);
// or compound index
create index orderline_tenant_status_value for (o:OrderLine) on (o.tenantId, o.status, o.value);
unwind range(1,10000) as id create (o:OrderLine {id:id, tenantId:'TENANT-A',status:toInteger(rand()*10), value:round(rand()*100,2)});
profile
match (n:OrderLine{tenantId:'TENANT-A'})
// indication for the index to fetch the values
where n.status is not null and n.value is not null
return n.status, avg(n.value), sum(n.value);
yeah i have created a compound index for tenant+type, you can see it in my images of PROFILE cypher.