I am new to Neo4j. I have a problem statement where I need to import a phone-call dataset. The dataset is in CSV format with the following columns: Source, Target, Timestamp, and Duration. For a similar small dataset (5,000 rows), I created nodes for the source and target ids (with a uniqueness constraint so each id appears only once) and a CALLS relationship with the timestamp and duration as properties.
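The uniqueness constraint was created roughly like this (sketching from memory; the Person label and pid property are the ones used in the query below):

CREATE CONSTRAINT ON (p:Person) ASSERT p.pid IS UNIQUE;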
Now, in the large dataset, I have ~70 million rows and 200,000 nodes. I have a separate CSV with the node ids, from which I already created the nodes. I didn't completely understand how the bulk import tool works, so I wrote a Python script to split my CSV into 70 CSVs of 1 million rows each (saved as calls_0, calls_1, ..., calls_69). I then manually ran a Cypher query, changing the filename each time. It worked well for the first 10 files, but then I noticed that after importing the relationships from a file, the import got slower for the next one. It is now taking almost 25 minutes to import a single file.
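For what it's worth, the node-creation step looked roughly like this (the filename and the id header here are placeholders, not my real ones):

:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///calls/nodes.csv' AS line
MERGE (:Person {pid: toInteger(line.id)});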
Can someone link me to an efficient and easy way of doing it?
Here is the query:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///calls/calls_28.csv' AS line
WITH toInteger(line.Source) AS Source,
datetime(replace(line.Timestamp, ' ', 'T')) AS time,
toInteger(line.Target) AS Target,
toInteger(line.Duration) AS Duration
MATCH (p1:Person {pid: Source})
MATCH (p2:Person {pid: Target})
MERGE (p1)-[rel:CALLS {time: time, duration: Duration}]->(p2)
RETURN count(rel)
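If it helps with diagnosing the slowdown, I can check whether the node lookups are actually using the constraint's index with something like this (the pid value is just a placeholder):

PROFILE
MATCH (p:Person {pid: 12345})
RETURN p;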
Note: I am using Neo4j 4.0.3