I just started using Neo4j to visualize data. I am currently stuck on creating relationships. I have two CSV files: one with the nodes and all their attributes, and one with all the relationships.
The CSV with all nodes (Object.csv) was successfully loaded into Neo4j with the following columns:
Name, ID
I want to create relationships between these nodes using the following CSV (object-object-link.csv):
Unid (relationship id), parentid, childid.
I want to join the parentid and the childid from the relationship CSV on the ID from the node CSV. How should I proceed? There are 51k relationships.
You can do this across multiple CSVs as long as you've got some way to uniquely identify the nodes you wish to connect the relationships to.
Whilst this article is about performant LOAD CSV, it gives a great example of how you'd look up existing nodes as part of LOAD CSV in order to connect them: Neo4j: Cypher - Avoiding the Eager | Mark Needham
Let's say our Nodes.csv file, which contains people, looks something like this...
id, name
1, Bob
2, Jane
Let's say our Relationships.csv file, which contains who knows who, looks something like this...
id_from, id_to
1, 2
We do our first pass which creates our person nodes, i.e.
LOAD CSV WITH HEADERS FROM "file:///Nodes.csv" AS row
CREATE (:Person {id:row.id, name:row.name});
Then, we're going to use the id property as a look-up so that we can join the nodes together, i.e.
LOAD CSV WITH HEADERS FROM "file:///Relationships.csv" AS row
//look up the two nodes we want to connect up
MATCH (p1:Person {id:row.id_from}), (p2:Person {id:row.id_to})
//now create a relationship between them
CREATE (p1)-[:KNOWS]->(p2);
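Applied to the files from the original question, the same look-up pattern might look like the sketch below. The label Object, the property name ID, the relationship type PARENT_OF and the index name object_id are assumptions for illustration, since the thread does not say how the nodes were labelled. It also assumes object-object-link.csv has a header row of Unid,parentid,childid with no spaces, and a Neo4j version that supports the CREATE INDEX ... FOR ... ON syntax (4.x or later).

```cypher
// Assumption: nodes from Object.csv carry the label :Object and a property ID.
// An index on the look-up property keeps each MATCH from scanning every node.
CREATE INDEX object_id IF NOT EXISTS FOR (o:Object) ON (o.ID);

// Assumption: header row is "Unid,parentid,childid" (no spaces after commas).
LOAD CSV WITH HEADERS FROM "file:///object-object-link.csv" AS row
// look up the parent and child nodes by their ID
MATCH (parent:Object {ID: row.parentid})
MATCH (child:Object {ID: row.childid})
// PARENT_OF is a hypothetical relationship type; unid keeps the relationship id
CREATE (parent)-[:PARENT_OF {unid: row.Unid}]->(child);
```

LOAD CSV reads every field as a string, so the look-up only matches if ID was also stored as a string when the nodes were loaded. If it was converted with toInteger() at that point, the same conversion would be needed on row.parentid and row.childid.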
Hi @lju,
Thanks for this response. I am facing a similar issue and had already come to a solution similar to your proposal.
While it works for small data, it's not optimal for large datasets (100M+ nodes, 1B+ relationships).
If you're looking at that size of data and it's an initial load, I would heartily suggest you use the offline batch importer! It will load data of that size significantly faster than an online, transactional process (which LOAD CSV is).
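Where an online load is still required (for example, when adding to a database that is already in use), committing in batches is one way to keep transaction memory bounded. The following is a sketch only, assuming Neo4j 4.4 or later, where CALL { ... } IN TRANSACTIONS is available; the batch size of 10000 is an arbitrary illustrative value.

```cypher
// In Neo4j Browser this statement needs to be prefixed with :auto,
// because it must run in an implicit (auto-commit) transaction.
LOAD CSV WITH HEADERS FROM "file:///Relationships.csv" AS row
CALL {
  WITH row
  // same look-up as before, executed once per CSV row
  MATCH (p1:Person {id: row.id_from}), (p2:Person {id: row.id_to})
  CREATE (p1)-[:KNOWS]->(p2)
} IN TRANSACTIONS OF 10000 ROWS;
```

On older versions, USING PERIODIC COMMIT placed before LOAD CSV served a similar purpose. Either way, an index on :Person(id) matters far more at this scale than the batch size does.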
A. To make this example work, you need to remove the spaces from the .csv files.
B. (EDIT): I had a typo. I fixed the query and it now works.
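If editing the files is not convenient, the spaces can also be handled in the query. The sketch below first inspects how the header row was parsed, then loads the relationships by column position and trims each value. It assumes the column order id_from, id_to shown in the sample file and that the Person nodes already exist with string ids.

```cypher
// Inspect what LOAD CSV actually parsed: a header written as ", id_to"
// may show up as a key with a leading space.
LOAD CSV WITH HEADERS FROM "file:///Relationships.csv" AS row
RETURN keys(row) AS headers, row LIMIT 5;

// Alternative: read rows as lists, skip the header line, and trim each value
// so that " 2" matches the id "2" stored on the node.
LOAD CSV FROM "file:///Relationships.csv" AS line
WITH line SKIP 1
MATCH (p1:Person {id: trim(line[0])}), (p2:Person {id: trim(line[1])})
CREATE (p1)-[:KNOWS]->(p2);
```

Trimming in the query leaves the source files untouched, at the cost of referring to columns by index rather than by name.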
I was having problems getting this to work when the nodes are of different types:
owner.csv
id,name
1,Bob
pet.csv
id,petname
2,Rover
LOAD CSV WITH HEADERS FROM "file:///pet.csv" AS row
CREATE (:pet {id:row.id, petname:row.petname});
LOAD CSV WITH HEADERS FROM "file:///owner.csv" AS row
CREATE (:owner {id:row.id, name:row.name});
and using the same relationship file:
LOAD CSV WITH HEADERS FROM "file:///Relationships.csv" AS row
//look up the two nodes we want to connect up
MATCH (p1:owner {id:row.id_from}), (p2:pet {id:row.id_to})
//now create a relationship between them
CREATE (p1)-[:OWNS]->(p2);
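A quick way to confirm that the relationships were actually created is to match the pattern back out. This is a minimal check, assuming the owner and pet nodes and the OWNS relationships were loaded with the queries above.

```cypher
// List every owner together with the pets they own
MATCH (o:owner)-[:OWNS]->(p:pet)
RETURN o.name AS owner, p.petname AS pet;

// Count the relationships to compare against the number of rows in the CSV
MATCH (:owner)-[r:OWNS]->(:pet)
RETURN count(r) AS ownsCount;
```

If the count is lower than the number of rows in Relationships.csv, some look-ups matched nothing, which usually points at stray spaces in the file or a mismatch between the label or property name in the MATCH and the one used when the nodes were created.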