I'm trying to import a 10GB dataset into Neo4j using Cypher. There are 8 node types and 23 relationship types.
So far, I've tried this on 2 different machines, each with 16GB RAM. On the first machine, I was able to successfully import all nodes, but as soon as I tried to import the first relationship type, I kept getting this error after every 2nd import -
2019-04-16 00:18:56.438+0000 ERROR Client triggered an unexpected error [Neo.DatabaseError.Transaction.TransactionStartFailed]: The database has encountered a critical error, and needs to be restarted. Please see database logs for more details.
This was the memory config on that machine at the time of the error -
dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=8G
dbms.memory.pagecache.size=1G
I increased the page cache size to 2GB (and lowered the max heap to 6GB) and tried again, but still got the same error -
dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=6G
dbms.memory.pagecache.size=2G
After importing the nodes, the Neo4j DB size is about 24GB, and each time this error occurs, I am forced to restart the DB. With a DB this large, every startup also stalls for more than 20 minutes in the "Initiating metrics..." phase, and I don't know why. I have an open issue on GitHub about this.
With 24 relationship types, and 7 CSV files to import for each type, if I have to restart the DB after every 2nd import, I'll never finish the imports.
On the second machine, I tried this config:
dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=4G
dbms.memory.pagecache.size=50G
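As a sanity check on the three configs above, here is a rough Python sketch that adds the max heap and the page cache and compares the total with the 16GB of RAM on each machine. The helper names (`parse_size`, `memory_budget`) and the 1G set aside for the OS are my own assumptions for illustration, not anything Neo4j provides or recommends.

```python
import re

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value):
    """Convert a size such as '4G' or '512m' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG])", value.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2).upper()]


def memory_budget(conf_text, ram="16G", os_reserve="1G"):
    """Return the RAM (in GiB) left after max heap, page cache and an OS reserve.

    os_reserve is an assumed placeholder value, not a Neo4j recommendation.
    A negative result means the settings ask for more memory than the machine has.
    """
    # Settings may be separated by spaces or newlines; split() handles both.
    settings = dict(item.split("=", 1) for item in conf_text.split())
    heap = parse_size(settings["dbms.memory.heap.max_size"])
    cache = parse_size(settings["dbms.memory.pagecache.size"])
    spare = parse_size(ram) - parse_size(os_reserve) - heap - cache
    return spare / UNITS["G"]


configs = {
    "machine 1, first try": "dbms.memory.heap.initial_size=4G "
                            "dbms.memory.heap.max_size=8G "
                            "dbms.memory.pagecache.size=1G",
    "machine 1, second try": "dbms.memory.heap.initial_size=4G "
                             "dbms.memory.heap.max_size=6G "
                             "dbms.memory.pagecache.size=2G",
    "machine 2": "dbms.memory.heap.initial_size=4G "
                 "dbms.memory.heap.max_size=4G "
                 "dbms.memory.pagecache.size=50G",
}

for name, text in configs.items():
    print(f"{name}: {memory_budget(text):+.0f}G spare")
```

Under those assumptions, the two configs from the first machine fit within 16GB, while the 50G page cache on the second machine overshoots the available RAM by a wide margin.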
I had only managed to import one CSV file for the first node type when I noticed that the second import seemed to have hung or crashed. It had been running for more than 30 minutes, while the first CSV import took only 6 minutes. So I terminated the query and closed Neo4j Browser, but when I restarted it, I got this error -
ServiceUnavailable: Failed to establish connection in 5000ms
After researching this error, I came across this troubleshooting guide that says -
A common reason why this error occurs is that your Neo4j instance is under heavy load. For example if you're running a query that is soon going to result in an Out of Memory error, it would be possible to run into this error.
I'm all out of ideas now. I'm pretty sure the memory config is the culprit in both cases, but as you can see, I've tried different combinations of heap size and page cache size, and I still get some error or other every single time.
Is there a recommended configuration that I should be using? I found a similar question here, but it's unanswered.
- neo4j version, desktop version, browser version: 3.5.4/3.5.3, 1.1.18, 3.2.19
- what kind of API / driver do you use: REST/Cypher
- a sample of the data you want to import: https://www.dropbox.com/s/3s73jjb6i6zgbgg/comment_1_0.csv?dl=1
- which plugins / extensions / procedures do you use: none