How to Split a Row While importing CSV with Cypher

mochalla · August 5, 2022, 7:03am

Hi,

I want to create a database from a CSV file using LOAD CSV. A column called Problem contains a unique value in each cell. So I'm going to generate a node from each one of the values, belonging to the p: product label.
Two other columns -Patient and Medication- may have a unique value or multiple values in each of their cells (and, sometimes, are empty).
So, I want to split cells containing multiple values in Problem Column in order to generate separated nodes. Nodes from the column patient will belong to the label p.patient, and nodes from the Medication column will belong to the label m.medication.

Test Data: Drive Link

I am successful with this cypher query to load and transform data except splitting the row which has multi value.

CREATE CONSTRAINT FOR (p:Patient) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT FOR (p:Problem) REQUIRE p.name IS UNIQUE;
CREATE CONSTRAINT FOR (t:Test) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT FOR (m:Medication) REQUIRE m.name IS UNIQUE;

// Patient id, Problem, Test, Medication.
LOAD CSV WITH HEADERS FROM
'file:///tathasthu1.csv' AS row
WITH row

CREATE (patient:Patient {id: row.Patient})
MERGE (problem:Problem {name: row.Problem})
MERGE (test:Test {name: row.Test})
MERGE (medication:Medication {name: row.Medication})

CREATE (patient)-[:HAS]->(problem)
CREATE (patient)-[:PERFORMED]->(test)
CREATE (test)-[:TAKEN_FOR]->(problem)
CREATE (patient)-[:TAKES]->(medication)
CREATE (medication)-[:CAN_TREAT]->(problem)
;

--
CREATE CONSTRAINT FOR (s:Subproblem) REQUIRE s.name is UNIQUE;
LOAD CSV WITH HEADERS FROM
'file:///tathasthu1.csv' AS row
WITH row WHERE NOT row.Subproblem IS null
MATCH (problem:Problem {name: row.Problem})
MATCH (medication:Medication {name: row.Medication})
MERGE(subproblem:Subproblem {name: row.Subproblem})
CREATE (subproblem)-[:IN_CATEGORY]->(problem)
CREATE (medication)-[:SIDE_EFFECT]->(subproblem)
SET subproblem.name = row.Subproblem

Now I have to split Glaucoma and Chronic Pain value row into two nodes, because as per test data, vision loss is due to chronic pain but not to Glaucoma as mentioned in the Details column.

Help is appreciated. Thank you.

cobra · August 5, 2022, 8:20am

These queries should do what you want:

// Create constraints
CREATE CONSTRAINT FOR (p:Patient) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT FOR (p:Problem) REQUIRE p.name IS UNIQUE;
CREATE CONSTRAINT FOR (t:Test) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT FOR (m:Medication) REQUIRE m.name IS UNIQUE;
CREATE CONSTRAINT FOR (s:Subproblem) REQUIRE s.name is UNIQUE;

// Create nodes
LOAD CSV WITH HEADERS FROM 'file:///tathasthu1.csv' AS row
WITH row
MERGE (:Patient {id: row.Patient})
MERGE (:Test {name: row.Test})
MERGE (:Medication {name: row.Medication})
WITH row
CALL apoc.do.when(row.Subproblem IS NOT NULL,
    'MERGE (:Subproblem {name: row.Subproblem}) RETURN row', 
    'RETURN row', 
    {row: row}
)
YIELD value
WITH split(row.Problem, ", ") AS problems
UNWIND problems AS problem
MERGE (:Problem {name: problem});

// Create relationships
LOAD CSV WITH HEADERS FROM 'file:///tathasthu1.csv' AS row
WITH row
MATCH (p:Patient {id: row.Patient})
MATCH (t:Test {name: row.Test})
MATCH (m:Medication {name: row.Medication})
CREATE (p)-[:PERFORMED]->(t)
CREATE (p)-[:TAKES]->(m)
WITH row, p, t, m
CALL apoc.do.when(row.Subproblem IS NOT NULL,
    'MATCH (s:Subproblem {name: row.Subproblem}) CREATE (m)-[:SIDE_EFFECT]->(s) RETURN s', 
    'RETURN null AS s', 
    {row: row, m: m}
)
YIELD value
WITH split(row.Problem, ", ") AS problems, value.s AS subproblem, p, t, m
UNWIND problems AS problem
MATCH (prob:Problem {name: problem})
CREATE (p)-[:HAS]->(prob)
CREATE (t)-[:TAKEN_FOR]->(prob)
CREATE (m)-[:CAN_TREAT]->(prob)
WITH prob, subproblem
WHERE subproblem IS NOT NULL
CREATE (subproblem)-[:IN_CATEGORY]->(prob);

Regards,
Cobra

cobra · August 5, 2022, 7:17am

Hello

In your example, which cell in your CSV has multiple values?

Regards,
Cobra

mochalla · August 5, 2022, 8:40am

I am getting this error!

mochalla · August 5, 2022, 7:31am

Patient B0021 with Problem. Cell number: B2

cobra · August 5, 2022, 8:48am

Sorry, you must install APOC plugin on your database.

cobra · August 5, 2022, 7:59am

All columns except Patient one can have multiple values?

mochalla · August 5, 2022, 9:14am

Thank you so much. This worked.

Could you help me to understand the following:
1. Why is yield used?
2. How cypher is reading prop as problem because not in any clause you mentioned the same.

MATCH (prob:Problem {name: problem})

3. If i want the subproblem to be connected to one problem, but to both split values, how to do that!
Thanks.

cobra · August 5, 2022, 9:22am

YIELD is used to get result from procedure, in your case it's apoc.do.when(). The value is a dictionary composed of what is in the return clause of apoc.do.when().
prob and problem are both variables, prob is the variable that contains the node and problem is the variable that contains the name that I get from the UNWIND clause.

mochalla · August 5, 2022, 9:28am

Thank you.

3. If i want the subproblem to be connected to one problem, but not to both split values, then may I know how to do that?

cobra · August 5, 2022, 9:42am

Do you have a rule that works all the time? For example always connect to the first problem.

Otherwise, I will need more details about your use case.

mochalla · August 5, 2022, 1:06pm

No there no such rule. It depends on the problem and subproblem!
Kindly look at the following image, here you see Glaucoma doesn't have a relationship to sub problem, but chronic pain does have a relationship to sub problem.
So, now with the Split, the values in the problem row are separated. But, what if I want to connect one value from the same cell to the sub problem but not the other one!

cobra · August 5, 2022, 1:27pm

Well I think you will have create 2 CSVs, one for nodes and the other one for relationships. Since it's specific, you will have to define them before to load them in the database.

mochalla · August 8, 2022, 11:54am

Instead of 2 CSV's I have created a separate column for each and added more rows. Now in this file (FILE), you can see there are one or two null values in each column.

// Create constraints
CREATE CONSTRAINT FOR (p:Patient) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT FOR (d:Disease) REQUIRE d.name IS UNIQUE;
CREATE CONSTRAINT FOR (t:Test) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT FOR (m:Medication) REQUIRE m.name IS UNIQUE;
CREATE CONSTRAINT FOR (s:Sub_disease) REQUIRE s.name is UNIQUE;
CREATE CONSTRAINT FOR (dm:Medication_details) REQUIRE dm.name is UNIQUE;
CREATE CONSTRAINT FOR (td:Test_details) REQUIRE td.name is UNIQUE;

// Create nodes
LOAD CSV WITH HEADERS FROM 'file:///healthcare.csv' AS row
WITH row
MERGE (:Patient {id: row.Patient})
MERGE (:Test {name: row.Test})
MERGE (:Medication {name: row.Medication})
MERGE (:Disease {name: row.Disease});
WITH row
CALL apoc.do.when(row.Sub_disease IS NOT NULL,
'MERGE (:Sub_disease {name: row.Sub_disease}) RETURN row',
'RETURN row',
{row: row}
CALL apoc.do.when(row.Detailmed is NOT NULL,
'MERGE (:Detailmed {name: row.Medical_details}) RETURN row',
'RETURN row',
{row: row}
CALL apoc.do.when(row.Test_details is NOT NULL,
'MERGE (:Test_details {name: row.Test_details}) RETURN row',
'RETURN row',
{row: row}
)
YIELD value

// Create relationships
LOAD CSV WITH HEADERS FROM 'file:///healthcare.csv' AS row
WITH row
MATCH (p:Patient {id: row.Patient})
MATCH (t:Test {name: row.Test})
MATCH (m:Medication {name: row.Medication})
MATCH (dm:Medical_details {name: row.Medica_details})
MATCH (tm:Test_detials {name: row.Test_details})

CREATE (p)-[:HAS]->(d)
CREATE (p)-[:PERFORMED]->(t)
CREATE (p)-[:TAKES]->(m)
WITH row, p, t, m,
CALL apoc.do.when(row.Sub_disease IS NOT NULL,
'MATCH (s:Sub_disease {name: row.Sub_diease}) CREATE (s)-[:CAN_BE_ASSOCIATED_WITH]->(d) RETURN s',
'RETURN null AS s',
{row: row, d: d}
CALL apoc.do.when(row.Medical_detials is NOT NULL,
'MATCH (:Medical_details {name: row.Medical_detials}) CREATE (m)-[:IS_A]->(dm) RETURN dm',
'RETURN null AS dm',
{row: row, m: m}
CALL apoc.do.when(row.Test_details is NOT NULL,
'MATCH (:Testdetails {name: row.Test_details}) CREATE (td)-[:IS_A_TEST_OF]->(t) RETURN td',
'RETURN null AS td',
{row: row, td: td}
)
YIELD value
CREATE (t)-[:TAKEN_FOR]->(d)
CREATE (m)-[:CAN_TREAT]->(d)
;
I have shared example of one patient about how I want to connect.

Following is the error I am getting,

Thanks in advance.

cobra · August 8, 2022, 11:57am

I guess you have a problem of copy-paste. The beginning of the query is missing.

mochalla · August 8, 2022, 12:17pm

I have the one I pasted. I think I am missing something in my query!

cobra · August 8, 2022, 1:15pm

I will see but I cannot access your new file

mochalla · August 8, 2022, 5:15pm

Apologies. Here is the File

cobra · August 9, 2022, 8:15am

// Create constraints
CREATE CONSTRAINT FOR (p:Patient) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT FOR (d:Disease) REQUIRE d.name IS UNIQUE;
CREATE CONSTRAINT FOR (t:Test) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT FOR (m:Medication) REQUIRE m.name IS UNIQUE;
CREATE CONSTRAINT FOR (s:SubDisease) REQUIRE s.name is UNIQUE;
CREATE CONSTRAINT FOR (dm:MedicationDetails) REQUIRE dm.name is UNIQUE;
CREATE CONSTRAINT FOR (td:TestDetails) REQUIRE td.name is UNIQUE;

// Create nodes
LOAD CSV WITH HEADERS FROM 'file:///healthcare.csv' AS row 
WITH row 
MERGE (:Patient {id: row.Patient}) 
MERGE (:Test {name: row.Test}) 
MERGE (:Medication {name: row.Medication}) 
MERGE (:Disease {name: row.Disease}) 
WITH row
CALL apoc.do.when(
	row.Subdisease IS NOT NULL, 
	'MERGE (:SubDisease {name: row.Subdisease}) RETURN row', 
	'RETURN row', 
	{row: row}
) 
YIELD value
WITH row
CALL apoc.do.when(
	row.Detailmed IS NOT NULL, 
	'MERGE (:MedicationDetails {name: row.Detailmed}) RETURN row', 
	'RETURN row', 
	{row: row}
)
YIELD value
WITH row
CALL apoc.do.when(
	row.Testdetails IS NOT NULL, 
	'MERGE (:TestDetails {name: row.Testdetails}) RETURN row', 
	'RETURN row', 
	{row: row}
)
YIELD value
RETURN row;

// Create relationships
LOAD CSV WITH HEADERS FROM 'file:///healthcare.csv' AS row 
WITH row 
MATCH (p:Patient {id: row.Patient}) 
MATCH (t:Test {name: row.Test}) 
MATCH (m:Medication {name: row.Medication}) 
MATCH (d:Disease {name: row.Disease}) 
MERGE (p)-[:PERFORMED]->(t) 
MERGE (p)-[:TAKES]->(m) 
MERGE (p)-[:HAS]->(d) 
MERGE (t)-[:TAKEN_FOR]->(d) 
MERGE (m)-[:CAN_TREAT]->(d) 
WITH row, p, t, m, d 
CALL apoc.do.when(
	row.Detailmed IS NOT NULL, 
	'MATCH (n:MedicationDetails {name: row.Detailmed}) MERGE (m)-[:IS_A]->(n) RETURN row', 
	'RETURN row', 
	{row: row, m: m}
)
YIELD value
WITH row, p, t, m, d
CALL apoc.do.when(
	row.Testdetails IS NOT NULL, 
	'MATCH (n:TestDetails {name: row.Testdetails}) MERGE (n)-[:IS_A_TEST_OF]->(t) RETURN row', 
	'RETURN row', 
	{row: row, t: t}
)
YIELD value
WITH row, p, t, m, d
CALL apoc.do.when(
	row.Subdisease IS NOT NULL, 
	'MATCH (s:SubDisease {name: row.Subdisease}) MERGE (s)-[:CAN_BE_ASSOCIATED_WITH]->(d) RETURN row', 
	'RETURN row',
	{row: row, d: d}
)
YIELD value
RETURN row;

Regards,
Cobra

cobra · August 9, 2022, 7:19am

There are a lot of typos in your queries and column names are not corresponding to what you write in the queries... You could at least have a look at the documentation...

Topic		Replies	Views
"How to Split a Row While importing CSV with Cypher" was flagged as spam Cypher	1	172	March 13, 2023
Splitting multi-valued cells when importing CSV files Cypher	9	2513	January 29, 2020
How to Split a ROW into two labels or nodes while importing CSV file General migrated	3	164	September 21, 2022
Split multi value csv colomn into nodes and create relationship of those values Cypher cypher	9	71	August 19, 2024
How do I split the data in the column while loading in neo4j using neo4j-admin import? parsing issue Import / Export import , neo4j-import	2	1091	December 30, 2020

Get Certified in June!

How to Split a Row While importing CSV with Cypher

Related topics