Hi,
I'm trying to load a graph with 5M nodes into the database. For various reasons I'm RAM constrained and can allocate only 4G to the database. I'm performing transactions in which I create batches of 500 nodes.
At the beginning I get a throughput of about 2000 nodes created per second, but after loading around 250k nodes the performance collapses and I eventually get an OutOfMemory error.
I've tried various memory settings; this is the latest one:
----
dbms.memory.heap.initial_size=1g
dbms.memory.heap.max_size=1g
dbms.memory.pagecache.size=1400m
# Limit the amount of memory that all of the running transactions can consume.
dbms.memory.transaction.global_max_size=500m
# Limit the amount of memory that a single transaction can consume.
dbms.memory.transaction.max_size=500m
# Transaction state location. It is recommended to use ON_HEAP.
dbms.tx_state.memory_allocation=OFF_HEAP
dbms.tx_state.max_off_heap_memory=400m
----
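Adding these up gives roughly 1g heap + 1.4g page cache + 0.4g off-heap transaction state ≈ 2.8G, which should leave some headroom for JVM overhead and the OS within the 4G budget.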
I don't understand why it is not possible to scale. Is there more caching taking place? Is there a memory leak somewhere? Fundamentally, my queries are independent of each other (just node creations), so the complexity should not increase with the number of nodes...
As a follow-up, I've tried pausing the creation of the graph (by putting a breakpoint in the debugger). Once the program that performs the queries was paused, I stopped and restarted Neo4j.
This solves the problem, and I'm able to load 250k more nodes... until it gets slow again. So there is apparently a leak somewhere... am I forgetting something after the transaction is run? Should I close something? Here's my code:
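In outline it looks like this (simplified to a minimal sketch with the Java driver; makeBatch(), the :Node label, and the connection details stand in for the real logic):

----
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

import java.util.ArrayList;
import java.util.List;

public class Loader {
    private static final int TOTAL_NODES = 5_000_000;
    private static final int BATCH_SIZE = 500;

    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"))) {
            for (int offset = 0; offset < TOTAL_NODES; offset += BATCH_SIZE) {
                // Build one CREATE statement per node, with the values written
                // directly into the query text.
                StringBuilder cypher = new StringBuilder();
                for (long id : makeBatch(offset, BATCH_SIZE)) {
                    cypher.append("CREATE (:Node {id: ").append(id).append("})\n");
                }
                // One session and one write transaction per batch;
                // try-with-resources closes the session after each batch.
                try (Session session = driver.session()) {
                    session.writeTransaction(tx -> tx.run(cypher.toString()).consume());
                }
            }
        }
    }

    // Placeholder for the real batch-building logic.
    private static List<Long> makeBatch(int offset, int size) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            ids.add((long) (offset + i));
        }
        return ids;
    }
}
----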
Try running your transactions with parameters instead of building the query string with the values embedded in it. That way Neo4j will plan your query just once. Also, do you have all the indexes needed to perform your task?
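For example, something like this (a rough sketch along the same lines, reusing the connection details, label, and property name from the snippet above as placeholders):

----
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

import java.util.List;
import java.util.Map;

public class ParameterizedBatch {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // The node values travel as a query parameter, so the query text never
            // changes and Neo4j can reuse the same plan for every batch.
            List<Map<String, Object>> rows = List.of(
                    Map.of("id", 1L),
                    Map.of("id", 2L));
            Map<String, Object> params = Map.of("rows", rows);
            session.writeTransaction(tx -> tx.run(
                    "UNWIND $rows AS row CREATE (n:Node) SET n = row",
                    params).consume());
        }
    }
}
----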