I have a CSV file which contains 50 million records. I am trying to create nodes in Neo4j with the query below, which uses apoc.periodic.iterate and apoc.load.csv with a batch size of 100000 and parallel=true.
I am running it on a machine with 90 GB of RAM, and I have configured the settings below in neo4j.conf.
I am using the Python Neo4j driver to run the query. It is taking a very long time to create the nodes: around 1.6 million nodes were created in 2 days, and the process is still running.
Am I missing something? How much time does it generally take to create 50 million nodes, each with 3 properties?
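For reference, a typical shape for this kind of batched load looks like the following. This is a sketch only: the file name, label (`Person`), and property names (`id`, `name`, `age`) are placeholders, not the actual ones from the CSV.

```cypher
// Outer statement streams CSV rows; inner statement runs per batch.
// parallel: true is safe here because plain CREATE on distinct rows
// does not contend on locks the way MERGE on shared nodes can.
CALL apoc.periodic.iterate(
  "CALL apoc.load.csv('file:///nodes.csv') YIELD map AS row RETURN row",
  "CREATE (n:Person {id: row.id, name: row.name, age: row.age})",
  {batchSize: 100000, parallel: true}
);
```

One thing worth checking: if the inner statement uses MERGE rather than CREATE and there is no index on the merged property, every MERGE falls back to a full label scan, which would explain throughput collapsing as the node count grows.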
For performance reasons, creating a schema index on the label or property is highly recommended when using MERGE. See Create, show, and delete indexes for more information.
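Concretely, if the load merges on a unique `id` property, the recommended index would look like this (label and property names are illustrative, matching the sketch above rather than the real schema):

```cypher
// Create the index before starting the load, so each MERGE
// becomes an index seek instead of a label scan.
CREATE INDEX person_id IF NOT EXISTS FOR (n:Person) ON (n.id);
```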
JVM Heap
The JVM heap is a separate dynamic memory allocation that Neo4j uses to store instantiated Java objects. The memory for the Java objects is managed automatically by a garbage collector. Particularly important is that the garbage collector automatically handles the deletion of unused objects. For more information on how the garbage collector works and how to tune it, see Tuning of the garbage collector.
The heap memory size is determined by the parameters server.memory.heap.initial_size and server.memory.heap.max_size.
It is recommended to set these two parameters to the same value to avoid unwanted full garbage collection pauses.
and specifically: "It is recommended to set these two parameters to the same value."
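Putting that advice into neo4j.conf on a 90 GB machine might look like the fragment below. The actual numbers are illustrative, not a tuned recommendation; the key points are that initial and max heap match, and that an explicit page cache size is set so the store files stay in memory during the load.

```properties
# Pin initial and max heap to the same value to avoid full-GC pauses
server.memory.heap.initial_size=24g
server.memory.heap.max_size=24g

# Page cache holds the graph store files; size it explicitly
# rather than relying on the default heuristic
server.memory.pagecache.size=48g
```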