Hello
I am running Neo4j Community Edition on EC2 (an r5.16xlarge instance) and am trying to load data from S3 buckets into it.
I have a number of CSV files (each with 1M records) that I need to load into Neo4j. I started with plain LOAD CSV and, after reading a few topics on this forum, switched to apoc.load.csv driven by apoc.periodic.iterate, but even this is taking a lot of time. A simplified sketch of the LOAD CSV version is below, followed by my current query.
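The LOAD CSV version was shaped roughly like this (the S3 URL, labels, and properties are made-up placeholders, not my real schema):

// placeholder sketch of the initial LOAD CSV approach
LOAD CSV WITH HEADERS FROM 'https://my-bucket.s3.amazonaws.com/part-0001.csv' AS row
MERGE (c:Customer {id: row.customer_id})
MERGE (p:Product {sku: row.sku})
MERGE (c)-[:PURCHASED]->(p);

My current query looks something like this: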
// the file path is supplied as a parameter and passed to the inner statements via the params config
CALL apoc.periodic.iterate('
  CALL apoc.load.csv($file_path) YIELD map AS row RETURN row
','
  MERGE ....
  MERGE ....
  MERGE ...
  ...
  ...
  ...
  ...
  ...
', {batchSize:10000, parallel:true, params:{file_path:$file_path}});
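To give an idea of the shape of the inner statement, it is a chain of MERGEs roughly like the following (again, the labels and properties here are placeholders, not my actual schema):

// placeholder sketch only: several node MERGEs followed by relationship MERGEs per row
MERGE (c:Customer {id: row.customer_id})
MERGE (p:Product {sku: row.sku})
MERGE (s:Store {code: row.store_code})
MERGE (c)-[:PURCHASED {at: row.purchased_at}]->(p)
MERGE (p)-[:AVAILABLE_AT]->(s)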
As seen above, the query contains a lot of MERGE operations. Loading even 10K records takes more than a minute, and I need to load millions of records per minute. On the forum, someone suggested that I try neo4j-admin import, but as far as I understand that only works for an initial bulk import into an empty database, and for my use case I need to mutate the existing graph with new data every hour.
I have tried switching to EC2 instance types with more memory and CPU, but that did not help. Please suggest how I should go about this.
Thank you!