I have a CSV file which contains 50 million records. I am trying to create nodes in Neo4j with the query below, which uses apoc.periodic.iterate and apoc.load.csv with a batch size of 100000 and parallel=true.
I am running it on a machine with 90 GB of RAM, and I have configured the settings below in neo4j.conf.
I am using the Python Neo4j driver to run the query. It is taking a very long time to create the nodes: around 1.6 million nodes were created in 2 days, and the process is still running.
Am I missing something? How much time does it generally take to create 50 million nodes, each with 3 properties?
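For reference, a typical shape for this kind of batched load looks like the following. This is a sketch only: the file name, label (`Person`), and property names (`id`, `name`, `age`) are placeholders, not the actual ones from the CSV.

```cypher
// Outer statement streams CSV rows; inner statement runs per batch.
// parallel: true is safe here because plain CREATE on distinct rows
// does not contend on locks the way MERGE on shared nodes can.
CALL apoc.periodic.iterate(
  "CALL apoc.load.csv('file:///nodes.csv') YIELD map AS row RETURN row",
  "CREATE (n:Person {id: row.id, name: row.name, age: row.age})",
  {batchSize: 100000, parallel: true}
);
```

One thing worth checking: if the inner statement uses MERGE rather than CREATE and there is no index on the merged property, every MERGE falls back to a full label scan, which would explain throughput collapsing as the node count grows.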
For performance reasons, creating a schema index on the label or property is highly recommended when using MERGE. See Create, show, and delete indexes for more information.
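Concretely, if the load merges on a unique `id` property, the recommended index would look like this (label and property names are illustrative, matching the sketch above rather than the real schema):

```cypher
// Create the index before starting the load, so each MERGE
// becomes an index seek instead of a label scan.
CREATE INDEX person_id IF NOT EXISTS FOR (n:Person) ON (n.id);
```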
JVM Heap
The JVM heap is a separate dynamic memory allocation that Neo4j uses to store instantiated Java objects. The memory for the Java objects is managed automatically by a garbage collector. Particularly important is that the garbage collector automatically handles the deletion of unused objects. For more information on how the garbage collector works and how to tune it, see Tuning of the garbage collector.
The heap memory size is determined by the parameters server.memory.heap.initial_size and server.memory.heap.max_size.
It is recommended to set these two parameters to the same value to avoid unwanted full garbage collection pauses.
and specifically: "It is recommended to set these two parameters to the same value."
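Putting that advice into neo4j.conf on a 90 GB machine might look like the fragment below. The actual numbers are illustrative, not a tuned recommendation; the key points are that initial and max heap match, and that an explicit page cache size is set so the store files stay in memory during the load.

```properties
# Pin initial and max heap to the same value to avoid full-GC pauses
server.memory.heap.initial_size=24g
server.memory.heap.max_size=24g

# Page cache holds the graph store files; size it explicitly
# rather than relying on the default heuristic
server.memory.pagecache.size=48g
```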