Neo4j import error: "There is insufficient memory for the Java Runtime Environment to continue" - 2.3 TB dataset

performance
neo4j-import
cloud

(Benjamin Squire) #1

I am trying to import 2.3 TB of data on an EC2 instance with 8 cores, 120 GB RAM, and an 8 TB SSD. I have been able to load smaller datasets but am now scaling up to a larger one. The command to invoke the import is:

~/../../../usr/bin/neo4j-admin import  \
--nodes "import/uids-header.csv,import/uid_no.*"  \
--nodes "import/age-header.csv,import/age_no.*"  \
--nodes "import/gender-header.csv,import/gender_no.*"  \
--nodes "import/ip-header.csv,import/ip_no.*"  \
--nodes "import/device-header.csv,import/device_no.*"  \
--nodes "import/os-header.csv,import/os_no.*"  \
--nodes "import/browser-header.csv,import/browser_no.*"  \
--nodes "import/identitylink-header.csv,import/idlink_no.*"  \
--nodes "import/opti-header.csv,import/opti_no.*"  \
--nodes "import/bluekai-header.csv,import/bk_no.*"  \
--nodes "import/acxiom-header.csv,import/axm_no.*"  \
--nodes "import/adobe-header.csv,import/adb_no.*"  \
--nodes "import/lr-header.csv,import/lr_no.*"  \
--nodes "import/viant-header.csv,import/vnt_no.*"  \
--nodes "import/ga-header.csv,import/ggl_no.*"  \
--nodes "import/segment-header.csv,import/seg_no.*"  \
--nodes "import/email-header.csv,import/email_no.*"  \
--nodes "import/country-header.csv,import/cntry_no.*"  \
--nodes "import/citystate-header.csv,import/city_no.*"   \
--relationships:OBSERVED_WITH "import/rels-header.csv,import/opti_li.*,import/idlink_li.*,import/bk_li.*,import/axm_li.*,import/adb_li.*,import/lr_li.*,import/vnt_li.*,import/ggl_li.*,import/seg_li.*,import/email_li.*"  \
--relationships:VISITED_ON "import/rels-header.csv,import/device_li.*,import/os_li.*,import/browser_li.*"  \
--relationships:VISITED_FROM "import/rels-header.csv,import/city_li.*,import/cntry_li.*,import/ip_li.*"  \
--relationships:IDENTIFIED_AS "import/rels-header.csv,import/gender_li.*,import/age_li.*"  \
--ignore-duplicate-nodes=true  \
--ignore-missing-nodes=true  \
--delimiter="~"  \
--max-memory=95%
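(For later readers: `--max-memory` only governs the import tool's off-heap buffers; the JVM heap is sized separately. A sketch of capping both, assuming the 3.x `neo4j-admin` wrapper, which reads the `HEAP_SIZE` environment variable; the values are illustrative, not a recommendation:)

```shell
# Illustrative config only: cap the JVM heap so the OS keeps room for the
# off-heap buffers that --max-memory allocates, instead of 95% plus a default heap.
export HEAP_SIZE=20g                    # heap for neo4j-admin itself (assumed wrapper variable)
neo4j-admin import --nodes ... --max-memory=80%
```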

Please provide the following information if you ran into a more serious issue:

  • neo4j version: Community 3.4.9
  • neo4j.log and debug.log
    There is insufficient memory for the Java Runtime Environment to continue.
    Native memory allocation (mmap) failed to map 224919552 bytes for committing reserved memory.
    Possible reasons:
      - The system is out of physical RAM or swap space
      - In 32 bit mode, the process size limit was hit
    Possible solutions:
      - Reduce memory load on the system
      - Increase physical memory or swap space
      - Check if swap backing store is full
      - Use 64 bit Java on a 64 bit OS
      - Decrease Java heap size (-Xmx/-Xms)
      - Decrease number of Java threads
      - Decrease Java thread stack sizes (-Xss)
      - Set larger code cache with -XX:ReservedCodeCacheSize=
    This output file may be truncated or incomplete.

Out of Memory Error (os_linux.cpp:2657), pid=7928, tid=0x00007fbd2e13c700

JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13)
Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 compressed oops)
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again


(Stefan Armbruster) #2

--ignore-duplicate-nodes is known to slow down the import and to have a much higher memory footprint. If possible, try to get rid of duplicates before running the import.
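For exact duplicates this can be a one-line pre-pass; a minimal sketch (file names and sample data are illustrative, not from this thread):

```shell
# Remove fully identical lines from a node CSV before import, so
# --ignore-duplicate-nodes is no longer needed for exact duplicates.
# Headers live in separate files here, so the data files can be sorted freely.
printf 'u1~NYC\nu2~LA\nu1~NYC\n' > nodes.csv    # sample '~'-delimited data
sort -u nodes.csv > nodes.dedup.csv             # keep one copy of each line
```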


(Benjamin Squire) #3

That would not be possible given the data: some of the IDs across the different node sets inherently share the same value, which caused an error and forced me to ignore duplicate nodes.
I am handling this per Max Demarzi's suggestion here: https://maxdemarzi.com/2012/02/28/batch-importer-part-2/
I have used row_number sequentially on the tables of the different node sets to ensure a unique numerical ID, which I will load into the DB with --id-type=ACTUAL, and I will no longer use --ignore-duplicate-nodes. Hopefully this reduces the memory needed and speeds up the import, as it was taking 4.5 hours just to hit a brick wall.
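The row_number idea can also be done outside the database; a hedged shell sketch, with awk standing in for the warehouse-side row_number (file names and data are made up). With --id-type=ACTUAL the generated IDs must be unique across all node files, so each subsequent file would start at a different offset:

```shell
# Prepend a sequential numeric id to each row, emulating row_number().
# A later node file would pass offset = (rows already numbered so far)
# so the ids stay unique across ALL node files.
printf 'alice\nbob\ncarol\n' > names.csv                       # sample node values
awk -v offset=0 '{ print (NR - 1 + offset) "~" $0 }' names.csv > names.ids.csv
```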


(Michael Hunger) #4

Are there other processes running that consume memory? 120 GB might also be a bit on the low side.
Can you share the output of the tool?


(Mpviolet) #5

Does it fail before even starting the import, or midway through? Can you include the printout from the import run? How much heap did you give it?


(Benjamin Squire) #6

@michael.hunger there are no other processes running. I am reformatting the data per Max's suggestion to see whether ordering it with the actual node ID, via a row_number over the distinct values, might help.

@mpviolet it fails about 45% into stage 1/4 (node import). It took around 4.5 hours to hit the error; it never reached the relationship stage.


(Benjamin Squire) #7

Had a critical realization: in my CSV export from Redshift I failed to perform a DISTINCT on one of the node sets, namely citystate. This meant I had duplicates in the range of 5.4 billion, since every record had a city recorded. That broke the import with ignore-duplicates, as stated previously.
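A cheap sanity check for this class of mistake, before kicking off a multi-hour import; sketch with illustrative sample data:

```shell
# Count how many distinct lines occur more than once in a node file,
# and show the heaviest offenders, before starting the import.
printf 'nyc\nla\nnyc\nnyc\n' > city.csv        # sample with one repeated value
sort city.csv | uniq -d | wc -l                # distinct lines that repeat
sort city.csv | uniq -c | sort -rn | head -3   # most-duplicated values first
```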

Key discovery: make sure to double-check your data before loading it with the import process :frowning:


(Benjamin Squire) #8

Despite fixing the UNLOAD command in Redshift and double-checking my data, I found that for 2.2 TB of data, 120 GB RAM is not enough. I extended to 244 GB RAM and it is now on stage 3/4 (relationship linking) after a load time of around 10 hours. I did follow Max's advice about using ACTUAL IDs; not sure if it sped up the process any, but at least it has almost loaded the 33 billion nodes, which is the max limit of Neo4j Community.


(Benjamin Squire) #9

IMPORT DONE in 18h 51m 44s 165ms.
Imported:
7553667978 nodes
29805914822 relationships
18671681291 properties
Peak memory usage: 92.45 GB

Thanks for everyone's help