
Recommended memory config for importing 10GB dataset with 16GB RAM

manish_giri_me
Node Clone

I'm trying to import a 10GB dataset into Neo4j using Cypher. There are 8 node types and 23 relationship types.

So far, I've tried this on 2 different machines, each with 16GB RAM. On the first machine, I was able to import all nodes successfully, but as soon as I tried to import the first relationship type, I kept getting this error after every second import:

2019-04-16 00:18:56.438+0000 ERROR Client triggered an unexpected error [Neo.DatabaseError.Transaction.TransactionStartFailed]: The database has encountered a critical error, and needs to be restarted. Please see database logs for more details.

This was the memory config on that machine at the time of the error:

dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=8G
dbms.memory.pagecache.size=1G

I increased the page cache size to 2GB (and lowered the maximum heap to 6GB) and tried again, but got the same error:

dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=6G
dbms.memory.pagecache.size=2G

After importing the nodes, the Neo4j database seems to be about 24GB, and each time this error occurs, I am forced to restart it. For some reason, with a database this large, every startup stalls in the "Initiating metrics..." phase for a long time (more than 20 minutes). I have an open issue on GitHub about this.
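The 24GB figure is worth comparing with the page cache settings above. Neo4j's page cache holds store files in memory, so with 1G or 2G configured only a small slice of the store can be cached at once and the rest has to come from disk. A back-of-the-envelope sketch, assuming the whole 24GB counts as store files (if transaction logs are included in that figure, the real store is somewhat smaller):

```python
GB = 1024**3

def pagecache_coverage(pagecache_bytes: int, store_bytes: int) -> float:
    """Fraction of the store files the page cache can hold at once (at most 1.0)."""
    if store_bytes <= 0:
        raise ValueError("store size must be positive")
    return min(1.0, pagecache_bytes / store_bytes)

store_size = 24 * GB  # database size reported after importing the nodes
for pagecache_gb in (1, 2):
    share = pagecache_coverage(pagecache_gb * GB, store_size)
    print(f"{pagecache_gb}G page cache holds {share:.1%} of a 24GB store")
```

With either setting, well over 90% of the store cannot stay cached, which is consistent with imports slowing down as the database grows.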

With 24 relationship types and 7 CSV files to import for each type, if I have to restart the database after every second import, I'll never finish the imports.
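To put that in numbers, a rough estimate using the figures in this paragraph (24 relationship types, 7 files each, a crash every second import, and the 20-minute startup as a lower bound); the count of 23 types at the top of the post changes the result only slightly:

```python
from typing import Tuple

def restart_estimate(rel_types: int, files_per_type: int,
                     imports_per_restart: int, restart_minutes: float) -> Tuple[int, int, float]:
    """Return (total imports, restarts needed, hours spent only on restarts)."""
    total_imports = rel_types * files_per_type
    # One restart after every `imports_per_restart` imports (ceiling division).
    restarts = -(-total_imports // imports_per_restart)
    hours = restarts * restart_minutes / 60
    return total_imports, restarts, hours

print(restart_estimate(24, 7, 2, 20))  # (168, 84, 28.0)
```

That is at least 28 hours spent just waiting for restarts, before counting the imports themselves.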

On the second machine, I tried this config:

dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=4G
dbms.memory.pagecache.size=50G
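A quick sanity check on these numbers: the maximum heap and the page cache are separate allocations, and together they should fit in physical RAM with room left for the operating system and Neo4j's other native memory. A minimal sketch that reads the three settings from config text and reports the remaining headroom; it only handles the plain `k`/`m`/`g` size suffixes used in this thread and deliberately ignores off-heap usage, so treat a small positive result as still too tight:

```python
import re
from typing import Dict

# Multipliers for the size suffixes used in the configs above (case-insensitive).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

def parse_size(value: str) -> int:
    """Convert a neo4j.conf-style size such as '4G' or '512m' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kKmMgG]?)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]

def parse_conf(text: str) -> Dict[str, str]:
    """Collect key=value pairs, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def memory_headroom(conf_text: str, ram_bytes: int) -> int:
    """RAM left once the maximum heap and the full page cache are claimed.

    A negative value means these two settings alone exceed physical memory.
    """
    settings = parse_conf(conf_text)
    heap = parse_size(settings["dbms.memory.heap.max_size"])
    pagecache = parse_size(settings["dbms.memory.pagecache.size"])
    return ram_bytes - heap - pagecache

GB = 1024**3
second_machine = """
dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=4G
dbms.memory.pagecache.size=50G
"""
print(memory_headroom(second_machine, 16 * GB) // GB)  # -38
```

On a 16GB machine, the 50G page cache setting asks for 38GB more than the machine has, before the OS gets anything.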

I had only managed to import one CSV file for the first node type when I noticed that the second import seemed to have hung or crashed. It had been running for more than 30 minutes, while the first CSV import took only 6 minutes. So I terminated the query and closed Neo4j Browser, but when I restarted it, I got this error:

ServiceUnavailable: Failed to establish connection in 5000ms

After researching this error, I came across this troubleshooting guide, which says:

A common reason why this error occurs is that your Neo4j instance is under heavy load. For example if you're running a query that is soon going to result in an Out of Memory error, it would be possible to run into this error.

I'm all out of ideas now. I'm fairly sure the memory config is the culprit in both cases, but as you can see, I've tried different combinations of heap size and page cache size and still get one error or another every single time.

Is there a recommended configuration that I should be using? I found a similar question here, but it's unanswered.

2 REPLIES

In general, you can use bin/neo4j-admin memrec

which will probably recommend about 6G of page cache and 8G of heap.
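`neo4j-admin memrec` prints its recommendation as neo4j.conf-style `key=value` lines mixed with explanatory comments. A small sketch for carrying those values into an existing config, assuming the output has been saved as text; the sample below only mirrors the 6G/8G estimate above and is not real memrec output:

```python
from typing import Dict

def recommended_settings(memrec_output: str) -> Dict[str, str]:
    """Keep only the uncommented dbms.memory.* key=value lines."""
    settings = {}
    for line in memrec_output.splitlines():
        line = line.strip()
        if line.startswith("dbms.memory.") and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings

def apply_settings(conf_text: str, overrides: Dict[str, str]) -> str:
    """Replace matching active keys in config text; append keys not already set."""
    remaining = dict(overrides)
    out = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if not line.lstrip().startswith("#") and "=" in line and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)  # comments and unrelated settings stay untouched
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"

# Illustrative values only (matching the estimate above, not an actual memrec run).
memrec_output = """
# comment lines like this one are ignored
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g
dbms.memory.pagecache.size=6g
"""
current_conf = ("dbms.memory.heap.initial_size=4G\n"
                "dbms.memory.heap.max_size=4G\n"
                "dbms.memory.pagecache.size=50G\n")
print(apply_settings(current_conf, recommended_settings(memrec_output)))
```

Commented-out defaults are left in place and the active setting is appended, so the file never ends up with two live values for the same key.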

What's more important: how do you import your data?

Can you please share either your neo4j-admin import call or your Cypher statements?
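If this is an initial load into an empty database, `neo4j-admin import` is usually the faster route: it builds the store offline from CSV files instead of running Cypher transactions. Its relationship files use a header row with `:START_ID`, `:END_ID` and, optionally, `:TYPE`. A minimal sketch that writes such a file with Python's `csv` module; the IDs and the `LIKES` type are hypothetical and would have to match the `:ID` values in the node files:

```python
import csv
import io

def write_relationship_csv(rows, out) -> None:
    """Write (start_id, end_id, rel_type) tuples in neo4j-admin import relationship format."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    writer.writerows(rows)

buffer = io.StringIO()
# Hypothetical IDs and relationship type, for illustration only.
write_relationship_csv([("u1", "p7", "LIKES"), ("u2", "p7", "LIKES")], buffer)
print(buffer.getvalue())
```

For a real file, pass `open("rels.csv", "w", newline="")` instead of the `StringIO` buffer. Since the tool only creates new databases, nodes and relationships would all have to be imported in the same run.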

epurcell
Node

I'm getting similar out-of-memory issues with Neo4j Desktop 4.1.1:
There is not enough memory to perform the current task.
Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf'
I'm trying to create relationships for a group of 700K nodes. Here's the Cypher code:

MATCH (s:Station), (t:Trip)
WHERE s.stationId = t.endStationId
CREATE (s)<-[:TERMINATED]-(t)
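As written, the statement creates all of the TERMINATED relationships in a single transaction, and Neo4j keeps a transaction's uncommitted changes in memory until it commits, so hundreds of thousands of new relationships at once can exhaust the heap. A common workaround is to commit in batches. A minimal sketch of the client-side batching logic, assuming a hypothetical unique `tripId` property on `:Trip` nodes (not mentioned in the post); the driver call that would run each batch in its own write transaction is left out:

```python
from typing import Iterable, Iterator, List

# Hypothetical batched version of the query above; assumes :Trip nodes have a unique tripId.
BATCH_QUERY = """
UNWIND $tripIds AS id
MATCH (t:Trip {tripId: id})
MATCH (s:Station {stationId: t.endStationId})
CREATE (s)<-[:TERMINATED]-(t)
"""

def batches(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

trip_ids = range(700_000)  # stand-in for the real trip IDs
params = [{"tripIds": b} for b in batches(trip_ids, 10_000)]
print(len(params))  # 70
```

Indexes on `:Station(stationId)` and the assumed `:Trip(tripId)` keep each lookup cheap. If APOC is installed, `apoc.periodic.iterate` does the same batching server-side.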

I'm on a Dell laptop with 16GB of RAM. What would you suggest?
