Neo4j GC Problems


(Marko Coric) #1

In December 2017 we started a huge project, and since then our database has grown every day (it is at 40GB at the moment).

ID Allocation
Node ID: 70322184
Property ID: 362545374
Relationship ID: 84805115
Relationship Type ID: 49

We have a server with 16 vCPUs and 48GB of RAM, configured as follows:

16GB Memory Heap (initial and max)
24GB Page Cache
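
In neo4j.conf terms, and assuming the Neo4j 3.x setting names (the thread does not state the exact version), that sizing would look roughly like this sketch:

```properties
# Heap: initial and max set to the same value so the JVM never resizes it
dbms.memory.heap.initial_size=16g
dbms.memory.heap.max_size=16g

# Page cache: off-heap memory used to cache the store files
dbms.memory.pagecache.size=24g
```

Together these reserve 40GB of the 48GB, leaving the remainder for the operating system and other JVM overhead.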

During this period we have been monitoring the Neo4j server with Zabbix, and the graphs are not promising. We have been forced to restart the server every 2-3 months (at most) because Old GC goes crazy (last night it took around 120 sec). While monitoring the server we also noticed some Mixed GC activity (Young/Old generation space), but after a few days it stops completely. The Old generation space then just keeps growing until it fills about 90% of the heap, and only after that does Old GC kick in.

I have read a lot of documentation on how to tune G1GC and analyzed the GC logs, but with no success. Here are the relevant parts of the setup:

dbms.jvm.additional=-XX:MaxGCPauseMillis=750
dbms.jvm.additional=-XX:G1MixedGCLiveThresholdPercent=60
dbms.jvm.additional=-XX:G1HeapWastePercent=3
dbms.jvm.additional=-XX:G1MixedGCCountTarget=16
dbms.jvm.additional=-XX:+ParallelRefProcEnabled

I tried tuning "G1MixedGCLiveThresholdPercent", "G1HeapWastePercent" and "G1MixedGCCountTarget", but with no success. For the new setup I will enable:

dbms.jvm.additional=-XX:G1NewSizePercent=25
dbms.jvm.additional=-XX:G1MaxNewSizePercent=50

to try to force the Young/Old ratio.
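
One thing worth checking, assuming the server runs on a JDK 8 HotSpot JVM (not confirmed in this thread): there, G1NewSizePercent and G1MaxNewSizePercent, as well as G1MixedGCLiveThresholdPercent from the list above, are experimental options. The JVM only accepts them when experimental options are unlocked. The stock neo4j.conf may already contain the unlock line, so the sketch below only shows the ordering to look for:

```properties
# Unlock experimental JVM options first (assumption: JDK 8 HotSpot)
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions

# Experimental G1 options, listed after the unlock flag
dbms.jvm.additional=-XX:G1NewSizePercent=25
dbms.jvm.additional=-XX:G1MaxNewSizePercent=50
```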

Does anyone have an idea what we are missing here? At the moment we are out of ideas. Also, do the additional JVM settings actually affect the Neo4j server, or are they just useless?


(Michael Hunger) #3

Can you share more details on your read and write workloads? Ideally by enabling the query log.

Also, for larger graphs you need to grow the page cache.
Usually you shouldn't need to do GC tuning; G1GC works pretty well.

But in general you should restart your server regularly anyway to apply upgrades/patches.

Please also enable GC logging in neo4j.conf and share the GC logs over a few days.
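
For reference, GC logging in Neo4j 3.x is controlled by a few neo4j.conf settings. A minimal sketch, where the rotation values are illustrative rather than recommendations:

```properties
# Write JVM garbage collection events to a GC log in the logs directory
dbms.logs.gc.enabled=true

# Keep a handful of rotated GC log files (illustrative value)
dbms.logs.gc.rotation.keep_number=5

# Rotate each GC log once it reaches this size (illustrative value)
dbms.logs.gc.rotation.size=20m
```

The settings are read at startup, so a restart is needed before the logs start to accumulate.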


(Stefan Armbruster) #5

Additionally: are you using any add-on libraries?


(Marko Coric) #6

I don't think I'm able to turn on query logging in Community Edition. Also, we are not able to move to Enterprise Edition. We have a lot of query tuning to do, because I found that some Cypher queries do not work with the default Enterprise Edition plan (this was already reported in the Slack group a month ago). About the workload, here are the last three GC logs (the third is the current one):
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMDkvMTIvLS1nYy5sb2cuMi0tNy0zNi0yOA==
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMDkvMTIvLS1nYy5sb2cuMy0tNy01MC0xOA==
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMDkvMTIvLS1nYy5sb2cuNC5jdXJyZW50LS03LTUxLTM4

A 24GB page cache should be enough, because we faced the same problem at a 2GB DB size and face it now at 40GB.

Yup, I know that. But we need the DB to have 100% uptime between major releases (which is when we plan short downtimes).

Sure, I can provide you with the last 5 logs.

Also, the database runs on a virtual machine (I think it's VMware). Could that be a problem?

Just the APOC package, nothing special. And APOC functions are used in only a few queries.


(Stefan Armbruster) #7

I have experienced lengthy VM pauses, e.g. when the hypervisor performs a VM snapshot. In debug.log this gets reported as application threads being stopped for xxxxx ms.