Import of 3M data into neo4j with relationship leads to out of memory

cypher
import
knowledge-base

(Sachin Kamath) #1

Hi Team,

I'm using Neo4j 2.3 and am running into issues while importing data.

I have been trying to import 3 different CSV files, the third of which contains 3M records. Creating the relationships between file 1 and file 2 succeeded, as there was not much data. Creating the relationships between file 2 and file 3 did not. The data structure is as follows.

File 1 (two headers), sample data:

HEADER1,HEADER2
'A', 'B'

File 2 (two headers); the values in HEADER2 are common to file 1 and file 2:

HEADER2,HEADER3
'B','C'

File 3 (four headers); the values in HEADER3 are common to file 2 and file 3. This file has 3M records:

HEADER3,HEADER4,HEADER5,HEADER6
'C','D','E','F'

I want to create relationships between nodes as follows: If a :file1 node has a HEADER2 property that is equal to a :file2 node's HEADER2 property, then a relationship should be created between those nodes. And a relationship should be created similarly between :file2/:file3 nodes using their HEADER3 properties.

I'm using the following query to load the data, but the JVM runs out of memory because of the very large number of relationships being created:

USING PERIODIC COMMIT 2000
LOAD CSV WITH HEADERS FROM "file:///D:/file3.csv" as csvline
MATCH (file2:file2 {HEADER3: csvline.HEADER3})
create (file3:file3 {HEADER3: csvline.HEADER3, HEADER4: toString(csvline.HEADER4), HEADER5: toString(csvline.HEADER5), HEADER6: csvline.HEADER6})
CREATE (file2)-[:HAS_SERVICE]->(file3)

Any pointers on how to import this data would be helpful!


(Andrew Bowman) #2

Is there any particular reason why you're using Neo4j 2.3? That's many years out of date.

It would help to know what indexes you have in the database.
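
For context, an index on the property used in the MATCH is what lets each CSV row find its :file2 node without scanning every node with that label. A minimal sketch, assuming the label and property names from the question and the index syntax of the 2.x/3.x era (later versions use a different form, noted in the comment):

```cypher
// Index the lookup key used by MATCH (file2:file2 {HEADER3: ...}).
// Legacy syntax, as accepted by Neo4j 2.3:
CREATE INDEX ON :file2(HEADER3);

// Newer releases express the same index as:
// CREATE INDEX FOR (n:file2) ON (n.HEADER3);
```

In neo4j-shell, the `schema` command lists the indexes that already exist.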


(Michael Hunger) #3

Because you MATCH and CREATE on the same label + properties, the planner turns the query into an Eager query, which disables your periodic commit.

Use MERGE (...) ON CREATE SET ... instead.

And upgrade to a recent version :)

MATCH (file2:file2 {HEADER3: csvline.HEADER3})
create (file3:file3 {HEADER3: csvline.HEADER3, HEADER4: toString(csvline.HEADER4), HEADER5: toString(csvline.HEADER5), HEADER6: csvline.HEADER6})
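
A rough sketch of what the MERGE ... ON CREATE SET shape could look like for this import. It assumes, purely for illustration, that HEADER3 together with HEADER4 identifies a row of file 3; the thread does not say which columns are unique, so the merge key needs adjusting to the real data. Whether the planner still inserts an Eager step depends on the version, so it is worth checking the plan first.

```cypher
// Prefix the statement with EXPLAIN first (the query is planned but not run)
// and look for an "Eager" operator in the resulting plan.
USING PERIODIC COMMIT 2000
LOAD CSV WITH HEADERS FROM "file:///D:/file3.csv" AS csvline
MATCH (file2:file2 {HEADER3: csvline.HEADER3})
// ASSUMPTION: (HEADER3, HEADER4) is the identifying key of a file3 row.
MERGE (file3:file3 {HEADER3: csvline.HEADER3, HEADER4: toString(csvline.HEADER4)})
  ON CREATE SET file3.HEADER5 = toString(csvline.HEADER5),
                file3.HEADER6 = csvline.HEADER6
// MERGE keeps the relationship unique if the import is run more than once.
MERGE (file2)-[:HAS_SERVICE]->(file3)
```

If the assumed key is not actually unique, rows that share it collapse into a single :file3 node, so the choice of merge properties matters more than the rest of the query.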

(Ron van Weverwijk) #4

I can suggest this read: https://markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/

This helped me understand the issue.
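
A common way to sidestep Eager in this kind of import is to split the work into passes, so that no single statement both reads and creates nodes. The sketch below is only that, a sketch: the path "file:///D:/file2.csv" is a guess (the question only shows the path of file 3), and an index on :file3(HEADER3) is assumed to exist before the second pass runs.

```cypher
// Pass 1: create the :file3 nodes only, with no MATCH in the statement.
USING PERIODIC COMMIT 2000
LOAD CSV WITH HEADERS FROM "file:///D:/file3.csv" AS csvline
CREATE (:file3 {HEADER3: csvline.HEADER3,
                HEADER4: toString(csvline.HEADER4),
                HEADER5: toString(csvline.HEADER5),
                HEADER6: csvline.HEADER6});

// Pass 2: connect existing nodes, driven by the smaller file.
// ASSUMPTION: file 2 lives at this path; adjust to the real location.
USING PERIODIC COMMIT 2000
LOAD CSV WITH HEADERS FROM "file:///D:/file2.csv" AS csvline
MATCH (file2:file2 {HEADER3: csvline.HEADER3})
MATCH (file3:file3 {HEADER3: csvline.HEADER3})
MERGE (file2)-[:HAS_SERVICE]->(file3);
```

The two statements are run separately. Note one difference from the original query: pass 1 creates a :file3 node for every row, including rows whose HEADER3 has no matching :file2 node.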


(Sachin Kamath) #5

Thank you all for the responses! I will change the query, test it, and post the results.