New to Neo4j: Looking for feedback on setting correct relationships between entities

Hi all,

I am new to Neo4j or even graph databases for that matter. I am struggling with building my first graph. I would be very if the community would like to assist me with this basic question. I can not get it to work when following the instructions: How-To: Import CSV Data with Neo4j Desktop - Developer Guides

My case also involves three csv-files (two entities and one relationship) The csv's have the following layout:

CSV1: entity1_id, entity1_name
CSV2: entity2_id, entity2_name
CSV3: id, entity1_id, entity2_id (many:many relationship)

I would create a relationship using CSV3, however the result when running the following cypher code:


LOAD CSV WITH HEADERS FROM 'file:///CSV3.csv' AS row
WITH toInteger(row.id) AS relation_id, toInteger(row.entity1_id) AS entity1_id, toInteger(row.entity2_id) AS entity2_id
MATCH (e1:CSV1 {entity1_id:entity1_id})
MATCH (e2:CSV2 {entity2_id:entity2_id})
MERGE (e1)-[rel:CONTAINS]->(e2)
RETURN count(rel);

This yield 0 relationships. Could someone help me out on setting the correct relationships between the entities?

Thanks in advance!

Hi,

Based on your query you are trying to MATCH on the nodes you want to connect. The question I have is do they exist yet? If not, try switching to use MERGE instead of match. The relationships are likely not being created if the nodes don't exist.

Hi Michael,

Thank you for your prompt response. I used something along the following Cypher to construct the nodes (according to the mentioned link).

LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
WITH toInteger(row.orderID) AS orderId, datetime(replace(row.orderDate,' ','T')) AS orderDate, row.shipCountry AS country
RETURN orderId, orderDate, country
LIMIT 5;

I assume this constructs a node? Or am I being ignorant?

Unfortunately it looks like you never actually created the nodes; you only returned the values from the CSV to the browser. You need to use CREATE or MERGE to create the nodes. Something like:

LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
WITH toIntegeer(row.orderID) AS orderId,
     datetime(replace(row.orderDate(, ' ', 'T'))  AS orderDate
     row.shipCountry AS country
CREATE (order:Order {orderId: orderId}) // MERGE would also work here assuming each order number is unique
SET order.orderDate = orderDate,
    order.country = country

One things to consider with converting the order ID to an integer is that it will remove leading zeros. That may or many not be an issue depending on how you want to use that property. Just something to think about.

Thank you! Can I ask you a final question? My script:

// Set constraints on unique keys of files.
CREATE CONSTRAINT UniqueInformatiedomeinGroep ON (idg:Informatiedomeingroep) ASSERT idg.id IS UNIQUE;
CREATE CONSTRAINT UniqueInformatiedomein ON (id:Informatiedomein) ASSERT id.id IS UNIQUE;
CREATE CONSTRAINT UniqueBedrijfsactiviteit ON (ba:Bedrijfsactiviteit) ASSERT ba.id IS UNIQUE;
CREATE CONSTRAINT UniqueInformatieObject ON (io:Informatieobject) ASSERT io.id IS UNIQUE;

// Load informatiedomein_groepen nodes.
LOAD CSV WITH HEADERS FROM 'file:///informatiedomein_groepen.csv' AS row
WITH toInteger(row.id) AS groepId
   , row.groep         AS groepNaam
MERGE (idg:Informatiedomeingroep {groepId: groepId})
  SET idg.groepNaam = groepNaam;

// Load informatiedomeinen nodes.
LOAD CSV WITH HEADERS FROM 'file:///informatiedomeinen.csv' AS row
WITH toInteger(row.id)             AS informatiedomeinId
   , row.informatiedomein          AS informatiedomeinNaam
   , row.informatiedomein_groep_id AS informatiedomeinGroepId
MERGE (id:Informatiedomein {informatiedomeinId: informatiedomeinId})
  SET id.informatiedomeinNaam = informatiedomeinNaam
    , id.informatiedomeinGroepId = informatiedomeinGroepId;

// Load bedrijfsactiviteiten nodes.
LOAD CSV WITH HEADERS FROM 'file:///bedrijfsactiviteiten.csv' AS row
WITH toInteger(row.id)                  AS bedrijfsactiviteitId
   , row.bedrijfsactiviteit             AS bedrijfsactiviteitNaam
   , toInteger(row.informatiedomein_id) AS informatiedomeinId
MERGE (ba:Bedrijfsactiviteit {bedrijfsactiviteitId: bedrijfsactiviteitId})
  SET ba.bedrijfsactiviteitNaam = bedrijfsactiviteitNaam
    , ba.informatiedomeinId = informatiedomeinId;

// Load informatieobjecten nodes.
LOAD CSV WITH HEADERS FROM 'file:///informatieobjecten.csv' AS row
WITH toInteger(row.id)                  AS informatieobjectId
   , row.informatieobject               AS informatieobjectNaam
   , toInteger(row.informatiedomeinId)  AS informatiedomeinId
MERGE (io:Informatieobject {informatieobjectId: informatieobjectId})
  SET io.informatieobjectNaam = informatieobjectNaam
    , io.informatiedomeinId = informatiedomeinId;

// Create relationships.
MATCH (io:Informatieobject)
    , (ba:Bedrijfsactiviteit)
    , (idg:Informatiedomeingroep)
    , (id:Informatiedomein)
WHERE idg.groepId = id.informatiedomeinGroepId
  AND id.informatiedomeinId = ba.informatiedomeinId
  AND id.informatiedomeinId = io.informatiedomeinId
CREATE  (idg)-[cid:CONTAINS_ID]->(id)
      , (id)-[cba:CONTAINS_BA]->(ba)
      , (id)-[cio:CONTAINS_IO]->(io)
RETURN idg, id, ba, io;

However, the relationships returns 0 changes, what am I doing wrong?

It's a bit tough to follow with how you have labeled everything. But I would say try creating the relationships in 3 separate queries and see if that helps.

Have you queried the nodes after each node creating step to see if the properties are as you intended?

Run this query and check the results:

MATCH (io:Informatieobject)
    , (ba:Bedrijfsactiviteit)
    , (idg:Informatiedomeingroep)
    , (id:Informatiedomein)
WHERE idg.groepId = id.informatiedomeinGroepId
  AND id.informatiedomeinId = ba.informatiedomeinId
  AND id.informatiedomeinId = io.informatiedomeinId
  
RETURN io, ba, idg, id  

Hi,

I got the query to work. After mckenzma's advice on breaking the relationships up in three separate queries, I was able to debug the script. After inspecting the properties of the nodes that I trying to match, I saw that one of the properties was a different datatype. Which I overlooked. I am grateful for your assistance!

I am glad you were able to figure it out!

I tend to try and break up node and relationship creation into separate queries for that reason. It becomes easier to debug errors and issues. Especially for larger data sets, I have learned that running a query to create a set of nodes and then running a query to connect those nodes can be more performant than trying to do it at the same time. And it becomes easier to potentially run only parts of the queries you need to when/if something changes.

Good luck!

My recommendation, especially for Neo4J newbies is take it step by step and try on very small files that are samples of your data.

  1. do a LOAD.CSV with file CSV1. Make sure you have the nodes you want (with MATCH)
    2 do a LOAD.CSV with file CSV2. Again, make sure you have the nodes you want.