Hello, this is my first topic in the Neo4j community, and I am still learning Neo4j. I am trying to load data from CSV files into a Neo4j graph database, and I have written a Python script for that. Some of my CSV files are large (3.2 GB or more) and contain roughly 50 million rows or more. I did a bulk import first and it worked well, but I need to load data into an existing database, so I switched to LOAD CSV. Because the data is very large, I am using the APOC library (version 3.5.0.4) for its parallel features. My current Cypher query is:
CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "file:///relcashoutTest.csv" AS row RETURN row
','
MATCH (a:CUSTOMER {WALLETID: row.CUSTOMER})
MATCH (c:AGENT {WALLETID: row.AGENT})
MERGE (a)-[r:CASHOUT]->(c)
RETURN count(*)
', {batchSize: 1000, iterateList: true, parallel: true})
This query is for a single relationship type (CASHOUT), but I have others; in my Python script I build the query dynamically for each one. The good news is that node creation works properly and takes around 105 seconds. My problem is with building the relationships between nodes.

My Amazon instance has 32 CPU cores and 240 GB of RAM. I have observed that the parallelism works fine at first, but after a while it no longer uses all cores; in my case it stays between 2 and 7 cores. I printed some statistics: creating 10 relationships takes 39 seconds. Yesterday I ran the relationship query above for 8 hours and did not get any output.

I am confused about whether constraints and indexes would help here; I assumed they would not, because of the read/write trade-off. My Python script with this query works fine for small data sets. My Neo4j version is 3.5.8.

Kindly help me solve this problem. Thank you in advance.
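For reference, here is a minimal sketch of how a query like the one above could be generated per relationship type from Python. It is not the actual script: the function name `build_rel_query` and its parameters are hypothetical, and it only builds the query string without connecting to the database.

```python
def build_rel_query(csv_file, rel_type, start_label, start_col,
                    end_label, end_col, key="WALLETID",
                    batch_size=1000, parallel=True):
    """Build an apoc.periodic.iterate call for one relationship type.

    All names here are illustrative assumptions; only the shape of the
    generated Cypher follows the CASHOUT query in the post.
    """
    # Outer statement: stream rows from the CSV file.
    outer = (
        f'LOAD CSV WITH HEADERS FROM "file:///{csv_file}" AS row RETURN row'
    )
    # Inner statement: look up both end nodes, then merge the relationship.
    inner = (
        f"MATCH (a:{start_label} {{{key}: row.{start_col}}}) "
        f"MATCH (c:{end_label} {{{key}: row.{end_col}}}) "
        f"MERGE (a)-[r:{rel_type}]->(c)"
    )
    # Cypher booleans are lowercase, unlike Python's True/False.
    options = (
        f"{{batchSize: {batch_size}, iterateList: true, "
        f"parallel: {str(parallel).lower()}}}"
    )
    return f"CALL apoc.periodic.iterate('{outer}', '{inner}', {options})"


# Reproduces the shape of the CASHOUT query from the post.
query = build_rel_query(
    csv_file="relcashoutTest.csv",
    rel_type="CASHOUT",
    start_label="CUSTOMER", start_col="CUSTOMER",
    end_label="AGENT", end_col="AGENT",
)
print(query)
```

Calling the same function with a different file, relationship type, labels, and columns gives the query for each of the other relationships.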