I am importing several TB of CSV data into Neo4J for a project I have been working on. I have enough fast storage for the estimated 6.6TiB, however the machine has only 32GB of memory, and the import tool is suggesting 203GB to complete the import.
When I run the import, I see the following (I assume it exited because it ran out of memory). Is there any way I can import this large dataset with the limited amount of memory I have? Or if not with the limited amount of memory I have, with the maximum ~128GB that the motherboard this machine can support.
Available resources:
Total machine memory: 30.73GiB
Free machine memory: 14.92GiB
Max heap memory : 6.828GiB
Processors: 16
Configured max memory: 21.51GiB
High-IO: true
WARNING: estimated number of nodes 37583174424 may exceed capacity 34359738367 of selected record format
WARNING: 14.62GiB memory may not be sufficient to complete this import. Suggested memory distribution is:
heap size: 5.026GiB
minimum free and available memory excluding heap size: 202.6GiB
Import starting 2022-10-08 19:01:43.942+0000
Estimated number of nodes: 15.14 G
Estimated number of node properties: 97.72 G
Estimated number of relationships: 37.58 G
Estimated number of relationship properties: 0.00
Estimated disk space usage: 6.598TiB
Estimated required memory usage: 202.6GiB
(1/4) Node import 2022-10-08 19:01:43.953+0000
Estimated number of nodes: 15.14 G
Estimated disk space usage: 5.436TiB
Estimated required memory usage: 202.6GiB
.......... .......... .......... .......... .......... 5% ∆1h 38m 2s 867ms
neo4j@79d2b0538617:~/import$