Neo4j server slow startup time

After importing 17 million nodes and 85 million relationships into the Neo4j database (150gb), starting up and shutting down the Neo4j database is much slower. Start up now takes over 15 minutes as compared to 4 seconds prior to the data being imported:

2019-07-11 21:13:13.616+0000 INFO Starting...
2019-07-11 21:13:16.049+0000 INFO Bolt enabled on 9.42.117.142:7687.
2019-07-11 21:13:17.611+0000 INFO Started.
2019-07-11 21:13:18.790+0000 INFO Remote interface available at http://9.42.117.142:7474/
2019-07-12 03:41:05.896+0000 INFO Neo4j Server shutdown initiated by request
2019-07-12 03:41:05.951+0000 INFO Stopping...
2019-07-12 03:41:06.005+0000 INFO Stopped.
nohup: ignoring input
2019-07-15 15:53:34.281+0000 INFO ======== Neo4j 3.5.3 ========
2019-07-15 15:53:34.302+0000 INFO Starting...
2019-07-15 16:09:01.795+0000 INFO Bolt enabled on 9.42.117.142:7687.
2019-07-15 16:09:05.829+0000 INFO Started.
2019-07-15 16:09:06.999+0000 INFO Remote interface available at http://9.42.117.142:7474/

  1. Is this slow start up time to be expected? This is a small subset of what we expect to import into the database, so if increasing the amount of data in the database causes start up times to slow down linearly, this would be a significant weakness of the database.

  2. If this is not expected, what can I do to debug why its starting up so slowly?

Neo4j Version: Community 3.5.3
Platform: Red Hat Enterprise Linux Server 7.6 (Maipo)

Thank you,
Peter

does your logs\debug.log provide detail on the slowness from 2019-07-15 15:53 to 2019-07-15 16:09

@dana.canzano There was nothing that stood out to me that provided any detail on slowness. I would attach the debug.log here, but I guess attachments aren't allowed and the text is too large to fit here.

Is there a debug option to see what is it exactly doing during this period of time?

how long does this line report

2019-06-26 14:48:04.547+0000 INFO [o.n.k.i.p.PageCacheWarmer] Page cache warmup completed. 0 pages loaded. Duration: 26ms.

in your instance?
Are you seeing a number of lines similar to

2019-06-26 14:50:02.274+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=102, gcTime=0, gcCount=0}

or a line similar to

2019-07-10 11:15:35.784+0000 INFO [o.n.k.NeoStoreDataSource] Recovery required from position

@dana.canzano I don't see any line with "PageCacheWarmer" or "Recovery required from position" in it.

However, I do see "VmPauseMonitorComponent" in these two lines:

2019-07-15 16:09:02.915+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=862, gcTime=933, gcCount=1}
2019-07-15 16:09:04.163+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=446, gcTime=525, gcCount=1}

ok... the VmPauseMonitorComponent would be indicative of a Java Garbage Collection event.

How much RAM is on the machine? if this is a linux environment please return free -m

Also, how have you defined the heap and pagecache in the conf/neo4j.conf. Again if linux return the results of grep dbms.memory conf/neo4j.conf

@dana.canzano There is 32GB of RAM on the machine. Is this enough memory or is more recommended? Here is the output of the commands, I set the memory size to what is recommended by the neo4j-admin command.

free -m

          total        used        free      shared  buff/cache   available

Mem: 32146 18890 4121 15 9135 12738
Swap: 16127 372 15755

grep dbms.memory conf/neo4j.conf

dbms.memory.heap.initial_size=12000m
dbms.memory.heap.max_size=12000m
dbms.memory.pagecache.size=12100m

thanks. so it looks as if initial and max heap are set at 12GB and pagecache is also at 12GB thus consuming up to 24G of the total 32G of RAM.
maybe lat question ;). what is the value returned by the line in the debug.log which indicates


2019-06-26 14:48:04.145+0000 INFO [o.n.i.d.DiagnosticsManager]   Total size of mapped files: 4

@dana.canzano Here is the output:

Total size of mapped files: 37.04 GB

What is the size of mapped files referring to?

Total size of mapped files: 37.04 GB

represents the total amount of data which is eligible to be put in the pagecache, provided it is sufficiently large enough. given a pagecache of 12GB and mapped files of 37 GB, this would indicate at most 1/3 of your database is in RAM. So as a starter, regardless of startup time, your system is not optimally configured in terms of RAM. In a perfect world your should have RAM > (heap + 'Total size of mapped files: ` + 1-2GB for general OS operations). But as you only have 32GB of RAM this is not optimal.

However this still does not explain startup time degradation. There was a fix in 3.5.6 which addressed a significant memory allocation for query stats. Maybe this has something to do with the slowdown. any opportunity to upgrade to 3.5.6

@dana.canzano Okay I will work on getting more hardware for the server so it can run optimally.

Yes we can upgrade as this is prototyping server. To upgrade, it looks like a can unzip a new download, update the configuration settings, copy the database directory, and set the dbms.allow_upgrade=true. When upgrade is done, shut down and set dbms.allow_upgrade=false and restart the server. This sound about right to you?

correct for upgrade steps. the complete steps are detailed at https://neo4j.com/docs/operations-manual/current/upgrade/

@dana.canzano I've upgraded to 3.5.7 and the startup/shutdown time is still slow as before.

Any other thoughts on where to look?

Out of curiosity what vendor/version of java are you using ,?

@dana.canzano I'm using 1.8.0

neo4j-community-3.5.7# java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)

hmm... ok. had you been running IBM JDK maybe, just maybe I might suggest this is the source as anecdotally we have seen slower peformance here though I dont have specifics. I'm still a bit stumped without debug.log

@dana.canzano for sure I'm not using the IBM JDK, I recall it had some issue running with the IBM JDK if I'm not mistaken.

This is from the debug output:

2019-07-15 21:30:16.770+0000 INFO [o.n.i.d.DiagnosticsManager] JVM information:
2019-07-15 21:30:16.770+0000 INFO [o.n.i.d.DiagnosticsManager] VM Name: OpenJDK 64-Bit Server VM
2019-07-15 21:30:16.770+0000 INFO [o.n.i.d.DiagnosticsManager] VM Vendor: Oracle Corporation
2019-07-15 21:30:16.770+0000 INFO [o.n.i.d.DiagnosticsManager] VM Version: 25.212-b04
2019-07-15 21:30:16.771+0000 INFO [o.n.i.d.DiagnosticsManager] JIT compiler: HotSpot 64-Bit Tiered Compilers
2019-07-15 21:30:16.771+0000 INFO [o.n.i.d.DiagnosticsManager] VM Arguments: [-Xms12000m, -Xmx12000m, -XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -Djdk.tls.ephemeralDHKeySize=2048, -Djdk.tls.rejectClientInitiatedRenegotiation=true, -Dunsupported.dbms.udc.source=tarball, -Dfile.encoding=UTF-8]

@dana.canzano can I email you the debug.log or share it somehow via a dropbox link?