We have a Neo4j database running with ~44 million nodes and ~200 million relationships.
Here is some background information:
- Neo4j version: 5.15
- database size: 37 GB
- OS CPU: 4 vCPUs
- OS RAM: 32 GB
- memory configuration:

```
server.memory.heap.initial_size=13000m
server.memory.heap.max_size=13000m
server.memory.pagecache.size=18000m
```
Our users are suffering from poor performance when querying the database, especially when calculating aggregations (e.g. count or sum). We are considering different scaling options to improve overall query speed.
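For reference, the slow queries look roughly like this (the labels, relationship type, and properties below are illustrative placeholders, not our actual schema):

```cypher
// Illustrative aggregation over a large relationship set;
// our real queries count/sum across tens of millions of rows like this.
MATCH (c:Customer)-[t:PURCHASED]->(:Product)
WHERE t.date >= date('2023-01-01')
RETURN c.region  AS region,
       count(t)  AS purchases,
       sum(t.amount) AS total
ORDER BY total DESC;
```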
Scale out:
We have done some testing on our side to scale out the database and form a cluster.
We created an EKS cluster with 3 Neo4j nodes.
OS and database configurations are the same as the existing environment. When running the same slow query, the performance is almost the same, or even worse (and I am not sure why).
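For completeness, the cluster was formed with settings along these lines in each server's neo4j.conf (hostnames are placeholders for our EKS pod DNS names; the topology counts reflect our test, not a recommendation):

```
# neo4j.conf fragment, same on each of the 3 servers (hostnames are placeholders)
server.cluster.system_database_mode=PRIMARY
dbms.cluster.discovery.endpoints=neo4j-0:5000,neo4j-1:5000,neo4j-2:5000
initial.dbms.default_primaries_count=3
initial.dbms.default_secondaries_count=0
```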
My question is:
Does Neo4j clustering ever improve query speed? The official cluster introduction says:
"Scale: Servers hosting databases in secondary mode provide a massively scalable platform for graph queries that enables very large graph workloads to be executed in a widely distributed topology."
I thought it would work like a Hadoop/Spark architecture, which scales out and distributes the query workload across multiple workers/nodes so queries can run in parallel?
Scale up:
For this we will just increase the resources (right now we are using AWS EC2 r6i.xlarge); we will be testing this option soon.
I'd appreciate it if anyone could share experience or best practices in scaling and improving query speed in general. Thanks.
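For example, moving one size up to r6i.2xlarge (8 vCPU / 64 GB), we would retune memory roughly as follows, aiming to fit the whole ~37 GB store in the page cache (these numbers are a rough, untested plan):

```
# planned neo4j.conf for a 64 GB host (untested estimate)
server.memory.heap.initial_size=16g
server.memory.heap.max_size=16g
# page cache sized to hold the full ~37 GB store with headroom
server.memory.pagecache.size=40g
```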