Creating relationships across multiple CSV's

cypher
csv

(Michael Cohen) #1

I'm looking to import data from three different csv's with headers and create relationships between certain nodes. The CSV data looks like this:

activities.csv
activity activityID stakeholder process lifecycle
submit budget aei89302.13 budget analyst AHA application
review budget aei89301.14 program officer AHA application

problems.csv
problem activity activityID
lengthy wait time submit budget aei89302.13
manual data entry submit budget aei89302.13
no clear rules review budget aei89302.14

systems.csv
system activity activityID
email submit budget aei89302.13
excel review budget aei89302.14

Here's my code so far:

CREATE CONSTRAINT ON(a:Activity) ASSERT a.activity_id IS UNIQUE;
CREATE CONSTRAINT ON(s:Stakeholder) ASSERT s.name IS UNIQUE;
CREATE CONSTRAINT ON(p:Process) ASSERT p.name IS UNIQUE;
CREATE CONSTRAINT ON(pr:Problem) ASSERT pr.activity_ID IS UNIQUE;
CREATE CONSTRAINT ON(sy:System) ASSERT sy.activity_ID IS UNIQUE;

LOAD CSV WITH HEADERS FROM "file:///activities.csv" as line
WITH line
MERGE (a:Activity {name:line.activity, id:line.activity_id, lifecycle:line.lifecycle, activity_type:line.activity_type})
MERGE (s:Stakeholder {name :line.stakeholder})
MERGE (p:Process {name: line.process})
CREATE (s)-[:responsible_for]->(a)-[:belongs_to]->(p);

LOAD CSV WITH HEADERS FROM "file:///problems.csv" as line2
WITH line2
CREATE (pr:Problem {name: line2.problem, activity_id:line2.activity_id});
MATCH (pr:Problem), (a:Activity)
WHERE pr.id = a.id
MERGE (a)-[:has_problem] -> (pr);

LOAD CSV WITH HEADERS FROM "file///systems.csv" as line3
WITH line3
CREATE (sy:System {name:line3.system, id:line3.activity_id})
MATCH (a:Activity), (sy:System)
WHERE a.id = sy.id
MERGE (a)-[:has_system] -> (sy);

When I run this, cypher has no problem with the constraint statements or the first load_csv statement, but when it comes to the second load csv (for Problems), I get the following error:

Neo.ClientError.Schema.ConstraintValidationFailed: Node(17781) already exists with label Problem and property activity_id = '50ec26ddc31.17'

I have a feeling that the issue is with how I'm writing the constraints, but I'm not 100% sure.


(Ameyasoft) #2

The problem is in defining the constraints and node properties. There is a mismatch between the two,

  1. CREATE CONSTRAINT ON(a:Activity) ASSERT a.activity_id IS UNIQUE;
    In a.Activity node the property is defined as 'id' and hence the mismatch.
    In the constraint it's a.activity_id and in the node it's a.id.
    Replace 'a.activity_id' in the constraint with 'id'.

  2. CREATE CONSTRAINT ON(pr:Problem) ASSERT pr.activity_ID IS UNIQUE;
    CREATE CONSTRAINT ON(sy:System) ASSERT sy.activity_ID IS UNIQUE;

Here it's case difference. Replace 'pr.activity_ID' with 'pr.activity_id' and 'sy.activity_ID' with 'sy.activity_id'

Make these changes and try.


(Michael Cohen) #3

Thanks! I made those changes but it's still giving me an error after running the first CSV statement, "Neo.ClientError.Schema.ConstraintValidationFailed: Node(96) already exists with label Activity and property id = 'aaec265e3f3.1"

I've changed all the constraints so that they're referencing (node).id or (node).name - I don't THINK they should be reference (node).activity_id because 'activity_id' is just the header in the CSV, but the key for the property is id.

Here's the code I'm currently running:

CREATE CONSTRAINT ON(a:Activity) ASSERT a.id IS UNIQUE;
CREATE CONSTRAINT ON(s:Stakeholder) ASSERT s.name IS UNIQUE;
CREATE CONSTRAINT ON(p:Process) ASSERT p.name IS UNIQUE;
CREATE CONSTRAINT ON(pr:Problem) ASSERT pr.name IS UNIQUE;
CREATE CONSTRAINT ON(sy:System) ASSERT sy.id IS UNIQUE;

LOAD CSV WITH HEADERS FROM "file:///activities_full.csv" as line
WITH line
CREATE (a:Activity {name:line.activity, id:line.activity_id, lifecycle:line.lifecycle, activity_type:line.activity_type})
MERGE (s:Stakeholder {name :line.stakeholder})
MERGE (p:Process {name: line.process})
MERGE (s)-[:responsible_for]->(a)-[:belongs_to]->(p);

LOAD CSV WITH HEADERS FROM "file:///problems.csv" as line2
WITH line2
MERGE (pr:Problem {name: line2.problem, id:line2.activity_id});
MATCH (pr:Problem), (a:Activity)
WHERE pr.id = a.id
MERGE (a)-[:has_problem] -> (pr);

LOAD CSV WITH HEADERS FROM "file///systems.csv" as line3
WITH line3
CREATE (sy:System {name:line3.system, id:line3.activity_id})
MATCH (a:Activity), (sy:System)
WHERE a.id = sy.id
MERGE (a)-[:has_system] -> (sy);


(Michael Cohen) #4

I tweaked the code and got the following to work:

CREATE CONSTRAINT ON(a:Activity) ASSERT a.activity_id IS UNIQUE;
CREATE CONSTRAINT ON(s:Stakeholder) ASSERT s.name IS UNIQUE;
CREATE CONSTRAINT ON(p:Process) ASSERT p.name IS UNIQUE;
CREATE CONSTRAINT ON(pr:Problem) ASSERT pr.activity_id IS UNIQUE;
CREATE CONSTRAINT ON(sy:System) ASSERT sy.activity_id IS UNIQUE;

LOAD CSV WITH HEADERS FROM "file:///activities_full.csv" as line
WITH line
CREATE (a:Activity {name:line.activity, id:line.activity_id, lifecycle:line.lifecycle, activity_type:line.activity_type})
MERGE (s:Stakeholder {name :line.stakeholder})
MERGE (p:Process {name: line.process})
MERGE (s)-[:responsible_for]->(a)-[:belongs_to]->(p);

LOAD CSV WITH HEADERS FROM "file:///problems.csv" as line2
WITH line2
MERGE (pr:Problem {name: line2.problem, id:line2.activity_id});
MATCH (pr:Problem), (a:Activity)
WHERE pr.id = a.id
MERGE (a)-[:has_problem] -> (pr);

LOAD CSV WITH HEADERS FROM "file:///systems.csv" as line3
WITH line3
MERGE (sy:System {name: line3.system, id:line3.activity_id});
MATCH (sy:System), (a:Activity)
WHERE sy.id = a.id
MERGE (a)-[:has_system] -> (sy);