How can I load a very large dataset with limited memory?

JDC · October 9, 2022, 2:28pm

I am importing several TB of CSV data into Neo4j for a project I have been working on. I have enough fast storage for the estimated 6.6TiB, but the machine has only 32GB of memory, and the import tool suggests 203GB to complete the import.

When I run the import, I see the following output (I assume it exited because it ran out of memory). Is there any way to import this large dataset with the limited memory I have? If not, could it be done with the ~128GB maximum that this machine's motherboard supports?

Available resources:
  Total machine memory: 30.73GiB
  Free machine memory: 14.92GiB
  Max heap memory : 6.828GiB
  Processors: 16
  Configured max memory: 21.51GiB
  High-IO: true

WARNING: estimated number of nodes 37583174424 may exceed capacity 34359738367 of selected record format
WARNING: 14.62GiB memory may not be sufficient to complete this import. Suggested memory distribution is:
heap size: 5.026GiB
minimum free and available memory excluding heap size: 202.6GiB
Import starting 2022-10-08 19:01:43.942+0000
  Estimated number of nodes: 15.14 G
  Estimated number of node properties: 97.72 G
  Estimated number of relationships: 37.58 G
  Estimated number of relationship properties: 0.00 
  Estimated disk space usage: 6.598TiB
  Estimated required memory usage: 202.6GiB

(1/4) Node import 2022-10-08 19:01:43.953+0000
  Estimated number of nodes: 15.14 G
  Estimated disk space usage: 5.436TiB
  Estimated required memory usage: 202.6GiB
.......... .......... .......... .......... ..........   5% ∆1h 38m 2s 867ms
neo4j@79d2b0538617:~/import$
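
The figures in the output above can be sanity-checked with a little arithmetic. The sketch below only uses numbers copied from the log. The per-node figure and the 128GB projection assume the memory estimate scales roughly linearly with the node count, which the log does not confirm, so treat them as rough guides rather than what the import tool actually computes.

```python
# Figures copied from the import tool's output above.
GIB = 2**30

warned_node_estimate = 37_583_174_424    # first WARNING line
record_format_capacity = 34_359_738_367  # first WARNING line
estimated_nodes = 15.14e9                # "Estimated number of nodes: 15.14 G"
required_memory_gib = 202.6              # "Estimated required memory usage"
upgraded_memory_gib = 128                # motherboard maximum mentioned in the post


def bytes_per_node(memory_gib, nodes):
    """Memory estimate divided evenly across nodes (assumes linear scaling)."""
    return memory_gib * GIB / nodes


def nodes_that_fit(memory_gib, per_node_bytes):
    """Node count whose estimate would fit in memory_gib (same assumption)."""
    return memory_gib * GIB / per_node_bytes


# The capacity in the warning is exactly 2**35 - 1.
capacity_is_35_bit = record_format_capacity == 2**35 - 1

# The warned "number of nodes" rounds to 37.58 G, which matches the
# relationship estimate in the log rather than the 15.14 G node estimate.
warned_in_billions = round(warned_node_estimate / 1e9, 2)

per_node = bytes_per_node(required_memory_gib, estimated_nodes)
fit_at_upgrade = nodes_that_fit(upgraded_memory_gib, per_node)

print(f"capacity == 2**35 - 1: {capacity_is_35_bit}")
print(f"warned estimate: {warned_in_billions} G")
print(f"estimated memory per node: {per_node:.1f} bytes")
print(f"nodes fitting in {upgraded_memory_gib} GiB: {fit_at_upgrade / 1e9:.2f} G")
```

Under that assumption, even 128GiB devoted entirely to the import would cover only about 63% of the 15.14 G nodes, and in practice not all of a machine's memory is available to the import.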

JDC · October 9, 2022, 8:24pm

As a side note: if I have to reduce the amount of data I am importing to fit within memory constraints, what will have the biggest impact? Removing nodes, edges, attributes, or something different? Thanks

glilienfield · October 10, 2022, 5:53am

Can you just partition the data into smaller chunks and import them separately?
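
A minimal sketch of the partitioning step, assuming each input CSV has a single header row that should be repeated in every chunk. The function name `split_csv`, the chunk file naming, and the demo data are all hypothetical. It streams the file row by row, so memory use stays flat regardless of file size. It does not decide how the chunks are loaded afterwards, and it does not keep relationships in the same chunk as the nodes they reference.

```python
import csv
import tempfile
from pathlib import Path


def split_csv(source, out_dir, rows_per_chunk):
    """Stream `source` into numbered chunk files, repeating the header in each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    handle = None
    writer = None
    try:
        with open(source, newline="") as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:
                return chunk_paths  # empty input, nothing to split
            for index, row in enumerate(reader):
                # Start a new chunk file every `rows_per_chunk` data rows.
                if index % rows_per_chunk == 0:
                    if handle is not None:
                        handle.close()
                    path = out_dir / f"chunk_{len(chunk_paths):05d}.csv"
                    handle = open(path, "w", newline="")
                    writer = csv.writer(handle)
                    writer.writerow(header)
                    chunk_paths.append(path)
                writer.writerow(row)
    finally:
        if handle is not None:
            handle.close()
    return chunk_paths


# Tiny demo on made-up data: 5 rows split into chunks of 2 rows each.
demo_dir = Path(tempfile.mkdtemp())
demo_source = demo_dir / "nodes.csv"
demo_source.write_text("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")
chunks = split_csv(demo_source, demo_dir / "chunks", rows_per_chunk=2)
print([p.name for p in chunks])
```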

JDC · October 10, 2022, 1:57pm

I thought I could only use admin import once since it overwrites the graph? Should I look into using another import tool?

john.stegeman · October 10, 2022, 7:08pm

This is correct *today*

JDC · October 11, 2022, 12:26am

This may be a bad idea, but I have added 240GB of swap on an SSD. It is my boot drive, which is probably even more ill-advised, but the NVMe ZFS pool that holds the database will see a great many writes, and I am already cutting it a bit close on storage space. I will check tomorrow to see where it's at.
