
How can I load a very large dataset with limited memory?


I am importing several TB of CSV data into Neo4j for a project I have been working on. I have enough fast storage for the estimated 6.6 TiB; however, the machine has only 32 GB of memory, and the import tool is suggesting 203 GB to complete the import.

When I run the import, I see the following output (I assume it exited because it ran out of memory). Is there any way I can import this large dataset with the limited memory I have? Or, failing that, with the ~128 GB maximum that this machine's motherboard supports?


Available resources:
  Total machine memory: 30.73GiB
  Free machine memory: 14.92GiB
  Max heap memory : 6.828GiB
  Processors: 16
  Configured max memory: 21.51GiB
  High-IO: true

WARNING: estimated number of nodes 37583174424 may exceed capacity 34359738367 of selected record format
WARNING: 14.62GiB memory may not be sufficient to complete this import. Suggested memory distribution is:
heap size: 5.026GiB
minimum free and available memory excluding heap size: 202.6GiB
Import starting 2022-10-08 19:01:43.942+0000
  Estimated number of nodes: 15.14 G
  Estimated number of node properties: 97.72 G
  Estimated number of relationships: 37.58 G
  Estimated number of relationship properties: 0.00 
  Estimated disk space usage: 6.598TiB
  Estimated required memory usage: 202.6GiB

(1/4) Node import 2022-10-08 19:01:43.953+0000
  Estimated number of nodes: 15.14 G
  Estimated disk space usage: 5.436TiB
  Estimated required memory usage: 202.6GiB
.......... .......... .......... .......... ..........   5% ∆1h 38m 2s 867ms
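For reference, the offline importer lets you cap how much memory it tries to use, and the node-count warning points at the record format. A minimal sketch, assuming Neo4j 4.x (flag and setting names taken from that version's docs — verify against `neo4j-admin import --help` for your version; the `--nodes`/`--relationships` paths are placeholders):

```shell
# Keep the JVM heap small and cap the importer's off-heap usage
# (HEAP_SIZE and --max-memory per the Neo4j 4.x docs; verify for your version).
export HEAP_SIZE=5G
neo4j-admin import \
    --max-memory=20G \
    --high-io=true \
    --nodes=... \
    --relationships=...

# The "may exceed capacity 34359738367" warning is the standard record
# format's 2^35-1 node limit; the high_limit format raises it. In
# neo4j.conf (4.x setting name):
#   dbms.record_format=high_limit
```

As I understand it, capping `--max-memory` well below the suggested 202.6 GiB makes the importer fall back to slower, more disk-heavy strategies rather than fail outright — so expect a much longer run, not necessarily a crash.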






As a side note: if I have to reduce the amount of data I am importing to fit within memory constraints, what will have the biggest impact? Removing nodes, edges, or attributes, or something else? Thanks 🙂
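For scale, dividing the suggested memory by the estimated node count hints at what dominates: the importer's node-ID mapping grows roughly with the number of nodes (properties are largely streamed to disk), so fewer nodes — or numeric IDs via `--id-type`, if your IDs are currently strings — should help far more than dropping attributes. A back-of-envelope check using the numbers from the log above:

```shell
# 202.6 GiB suggested memory spread over ~15.14 billion estimated nodes
awk 'BEGIN { printf "%.1f bytes/node\n", 202.6 * 1024^3 / 15.14e9 }'
# -> 14.4 bytes/node
```

A steady ~14 bytes per node is consistent with a per-node ID mapping dominating the estimate, rather than anything proportional to property volume.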

Can you just partition the data into smaller chunks and import them separately?
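Splitting the CSV files themselves is easy with coreutils, though with the offline importer all chunks still have to go into a single run (it builds the whole store in one pass), so splitting alone doesn't reduce the peak memory need. A small demo — the file names here are made up:

```shell
# Demo on a tiny file; for the real dataset, point these at your CSVs.
printf 'id,name\n1,alice\n2,bob\n3,carol\n' > nodes.csv

# Keep the header in its own file -- neo4j-admin import accepts a header
# file followed by data files, e.g.:
#   --nodes=nodes_header.csv,nodes_part_aa,nodes_part_ab,...
head -n 1 nodes.csv > nodes_header.csv
tail -n +2 nodes.csv | split -l 2 - nodes_part_

ls nodes_part_*    # two chunk files: nodes_part_aa nodes_part_ab
```

For the real dataset you would use a much larger `-l` chunk size (e.g. a few million lines) and repeat per node/relationship type.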

I thought I could only use `neo4j-admin import` once, since it overwrites the graph? Should I look into using another import tool?


This may be a bad idea, but I have added 240 GB of swap on an SSD (my boot drive, which is probably even more ill-advised, but the NVMe ZFS pool the database lives on will see a great many writes, and I am cutting it close on storage space as it is). I will check tomorrow to see where it's at.
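For anyone following along, the usual swap-file recipe looks like this (needs root; swap files on ZFS are generally discouraged, so the boot SSD is arguably the right place for it). Keep in mind the importer's ID mapping is random-access, so swap on an SSD will be far slower than real RAM:

```shell
# Run as root. Creates and enables a 240G swap file on the boot drive.
fallocate -l 240G /swapfile   # if fallocate is unsupported: dd if=/dev/zero of=/swapfile bs=1M count=245760
chmod 600 /swapfile           # swap files must not be world-readable
mkswap /swapfile
swapon /swapfile
swapon --show                 # confirm it is active
```

To make it survive a reboot you would also add a `/swapfile none swap sw 0 0` line to `/etc/fstab`.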
