Create relationship from CSV on existing nodes

Hi,

I just started using Neo4j to visualize data, and I am currently stuck on creating relationships. I have two CSV files: one with the nodes and all their attributes, and one with all the relationships.

The CSV with all nodes (Object.csv) was successfully loaded into Neo4j with the following columns:
Name, ID

I want to create relationships on these nodes with the following CSV (object-object-link.csv):
Unid (relationship id), parentid, childid.

I want to join the parentid and the childid from the relationship CSV on the ID from the node CSV. How should I proceed? There are 51k relationships.

Thank you.

Hi!

You can do this across multiple CSVs as long as you've got some way to uniquely identify the nodes you wish to connect the relationships to.

Whilst this article is about making LOAD CSV performant, it gives a great example of how you'd look up existing nodes as part of LOAD CSV in order to connect them: Neo4j: Cypher - Avoiding the Eager | Mark Needham

I hope this helps!

Thank you for your response. The IDs of the nodes are unique, and each one corresponds to either a parentid or a childid in the relationship CSV.

I tried to read through the link but I can't see what the connection is to my question, might be because of my inexperience though.

No problem at all, here's an example:

Let's say our Nodes.csv file, which contains people, looks something like this...

id, name
1, Bob
2, Jane

Let's say our Relationships.csv file, which contains who knows who, looks something like this...

id_from, id_to
1, 2

We do our first pass which creates our person nodes, i.e.

LOAD CSV WITH HEADERS FROM "file:///Nodes.csv" AS row
CREATE (:Person {id:row.id, name:row.name});

Then, we're going to use the id property as a look-up so that we can join the nodes together, i.e.

LOAD CSV WITH HEADERS FROM "file:///Relationships.csv" AS row
//look up the two nodes we want to connect up
MATCH (p1:Person {id:row.id_from}), (p2:Person {id:row.id_to})
//now create a relationship between them
CREATE (p1)-[:KNOWS]->(p2);
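
Applied to the files from the original question, the same pattern might look roughly like the sketch below. Several names here are assumptions, since the thread does not state them: the label Object, the node property ID, the relationship type PARENT_OF, and the exact header names Unid, parentid and childid. The index line uses Neo4j 4.x-or-later syntax and is only there so that each MATCH is a lookup rather than a scan over all nodes. LOAD CSV reads every value as a string, so if the IDs were stored as integers the lookups would need toInteger(row.parentid) and toInteger(row.childid) instead.

```cypher
// Assumption: nodes were loaded as (:Object {ID: ..., Name: ...}); adjust to your own label and property
CREATE INDEX object_id IF NOT EXISTS FOR (o:Object) ON (o.ID);

LOAD CSV WITH HEADERS FROM "file:///object-object-link.csv" AS row
// look up the parent and child nodes by their ID
MATCH (parent:Object {ID: row.parentid})
MATCH (child:Object {ID: row.childid})
// PARENT_OF is a hypothetical relationship type; the relationship id is kept as a property
CREATE (parent)-[:PARENT_OF {unid: row.Unid}]->(child);
```

Rows whose parentid or childid does not match any node are skipped silently, so it is worth comparing the number of relationships created against the 51k rows in the file.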

Does that make sense?


Hi @lju,
Thanks for this response. I am facing a similar issue and had already come to a solution similar to your proposal.
While it works for small data, it's not optimal for large datasets (100M+ nodes, 1B+ relationships).

Any proposals?

Hi @DanielGittx
Sorry I missed your post!

If you're looking at that size of data and it's an initial load, I would heartily suggest you use the offline batch importer! It will load data of that size significantly faster than an online, transactional process (which LOAD CSV is).

You can find out more here.

A. To make this example work, you need to remove the spaces from the .csv files.
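
If editing the files is inconvenient, one possible alternative is to leave the spaces in place and trim the values inside the query. The sketch below assumes the same Nodes.csv and Relationships.csv as in the example above; it reads each row as a list instead of using WITH HEADERS, so the spaces in the header row do not matter either.

```cypher
// Read rows as lists (no WITH HEADERS) and skip the header row manually
LOAD CSV FROM "file:///Nodes.csv" AS line
WITH line SKIP 1
// trim() strips the leading space left by ", " separators
CREATE (:Person {id: trim(line[0]), name: trim(line[1])});

LOAD CSV FROM "file:///Relationships.csv" AS line
WITH line SKIP 1
// look up both nodes using the trimmed ids
MATCH (p1:Person {id: trim(line[0])}), (p2:Person {id: trim(line[1])})
CREATE (p1)-[:KNOWS]->(p2);
```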

B. (EDIT): I had a typo. I fixed the query and it now works.

I was having problems getting this to work when the nodes are of different types:

owner.csv

id,name
1,Bob

pet.csv

id,petname
2,Rover

LOAD CSV WITH HEADERS FROM "file:///pet.csv" AS row
CREATE (:pet {id:row.id, petname:row.petname});

LOAD CSV WITH HEADERS FROM "file:///owner.csv" AS row
CREATE (:owner {id:row.id, name:row.name});

and using the same relationship file:

LOAD CSV WITH HEADERS FROM "file:///Relationships.csv" AS row
//look up the two nodes we want to connect up
MATCH (p1:owner {id:row.id_from}), (p2:pet {id:row.id_to})
//now create a relationship between them
CREATE (p1)-[:OWNS]->(p2);
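
A quick way to confirm that the relationships were actually created is to query them back. This is only a sanity check and assumes the owner and pet labels and the OWNS type used above.

```cypher
// List each owner together with the pets they are linked to
MATCH (o:owner)-[:OWNS]->(p:pet)
RETURN o.name AS owner, p.petname AS pet;
```

If this returns nothing, the MATCH in the relationship query probably found no nodes, which usually points to stray spaces in the CSV values or to ids stored with a different type than the one being looked up.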