Hello team,
We've run into a weird problem with memory consumption in our cluster setup.
Important note: the problem described below is specific to the cluster; when we run the same setup on a single (beefier) node, it does not manifest at all.
Environment info:
- Neo4j Enterprise 4.2;
- the cluster consists of 5 members: 3 cores and 2 read replicas;
- each member is an Azure VM with 16 GB RAM and 4 cores (memory settings are sketched right after this list);
- load intensity is very low, averaging a few queries per second and peaking (rarely, and not correlated with the problem) at a few dozen per second, mostly reads;
- average CPU is 8-9%, with peaks at 25%;
- the graph is around 400,000 nodes and 2,000,000 edges.
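For reference, the memory-related neo4j.conf knobs in play are the standard Neo4j 4.x ones; the sketch below only shows the kind of split we mean on a 16 GB member, with illustrative values rather than our exact configuration:

```
# Illustrative split of a 16 GB member, not our exact values
# Fixed-size heap to avoid resize pauses
dbms.memory.heap.initial_size=5g
dbms.memory.heap.max_size=5g
# Page cache for the store files
dbms.memory.pagecache.size=6g
# Optional hard cap on transaction (native) memory, available since 4.1
#dbms.memory.transaction.global_max_size=1g
# The remainder (roughly 4 GB) is left for the OS, native allocations and cluster overhead
```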
The problem is that memory usage on each member of the cluster grows steadily until the system OOM killer terminates the JVM.
The growth rate differs from node to node, but the pattern is always the same.
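In case it helps to correlate the growth with JVM behaviour, GC logging can be switched on with the standard Neo4j 4.x settings; this is only the generic toggle with default-ish rotation values, not a claim about what our logs show:

```
# Standard Neo4j 4.x GC logging, written to logs/gc.log
dbms.logs.gc.enabled=true
dbms.logs.gc.rotation.keep_number=5
dbms.logs.gc.rotation.size=20m
```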
Here is the memory consumption pattern:
All our queries are profiled, optimized, and parameterized, so they are eligible for plan caching. Tuning the query plan cache parameters didn't help.
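To illustrate what we mean by parameterized (the label, relationship type, and property names below are made up for the example, not from our actual model):

```
// Parameterized form: the value arrives as $accountId, so a single cached plan is reused
MATCH (a:Account {id: $accountId})-[:OWNS]->(d:Device)
RETURN d.id;

// Literal-embedding form we avoid: each distinct value would produce its own plan cache entry
MATCH (a:Account {id: 12345})-[:OWNS]->(d:Device)
RETURN d.id;
```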
I've now taken a JVM heap dump from one of the members that was maxing out on memory and analyzed it with a memory analysis tool. I don't know whether it is of any help, but here is some info:
Any hints on where to look would be greatly appreciated. We really want to keep going with the cluster setup.