Data Loading

Hi All,

I am using Neo4j Desktop. The hardware configuration of the system is as follows:
Storage: 128 GB

I tried loading 5.3 billion nodes into the system, but it failed with a "heap size exceeded" error. I tried again after increasing the maximum heap size to 6 GB, but it failed with the same error.

I also tried loading the same 5.3 billion records as relationships between the nodes. The load command had been running for an hour and a half when I terminated it, and no relationships were created.

It would be great if I could get answers to the following questions regarding performance:

  1. How long will it take to load 5.3 billion nodes?
  2. How long will it take to load 5.3 billion relationships?
  3. What query performance (data retrieval, aggregation, etc.) can be expected in both of the above scenarios?
  4. What hardware configuration is required to meet the above performance goals?

Thanks & Regards,

Hi @vinayak.bali!

I would recommend using apoc.periodic.iterate() to load your data in transactional batches, optionally in parallel. Each batch is committed in its own transaction, so the heap memory it used can be released after every batch instead of accumulating across the whole load.

An example of usage is:

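CALL apoc.periodic.iterate(
  // Outer statement: stream the rows of the "company" table; apoc.load.jdbc yields each row as a map named `row`
  'CALL apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company") YIELD row RETURN row',
  // Inner statement: runs once per row, creating a node and copying all columns onto it as properties
  'CREATE (p:Person) SET p += row',
  // Commit every 10,000 rows; parallel batches are fine here because each node is created independently
  { batchSize:10000, parallel:true})
YIELD batches, total
RETURN batches, total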
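
For the relationship load mentioned in the question, a similar pattern could look like the sketch below. It is only an illustration under assumptions: the `works_with` table, its `from_id` and `to_id` columns, the `id` property on `Person`, and the `WORKS_WITH` relationship type are all hypothetical names, not taken from your data. It creates an index first so that each MATCH is an index lookup rather than a label scan, and it sets parallel:false because concurrent batches that create relationships on the same nodes can block or deadlock each other on node locks.

```cypher
// Hypothetical index on the assumed lookup property, so each MATCH below is an index seek
CREATE INDEX person_id IF NOT EXISTS FOR (p:Person) ON (p.id);

CALL apoc.periodic.iterate(
  // Outer statement: stream rows from the assumed "works_with" table
  'CALL apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","works_with") YIELD row RETURN row',
  // Inner statement: look up both end nodes and connect them
  'MATCH (a:Person {id: row.from_id}) MATCH (b:Person {id: row.to_id}) CREATE (a)-[:WORKS_WITH]->(b)',
  // Sequential batches to avoid lock contention between transactions
  { batchSize:10000, parallel:false})
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```

Returning errorMessages makes it easier to see whether any batch failed while the rest of the load carried on.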