Split multi value csv colomn into nodes and create relationship of those values

arthcras · August 12, 2024, 12:38am

I’am looking to split multiple datapoints of one csv colomm into 3 nodes and create a next relation. The case is as follows the header of the csv file is named Variable; the data is separated with | as follows “Emissions|CO2|AFOLU” any suggestion how to split this into Nodes and then create a next relation between those nodes are welcome and appreciated.
The next step is to link this variable to a specific case in another colomn of the csv file
As follows
Case 1 - emission's|co2|afolu
Case 2 - emission’s/ price |afolu
Any further ideas are appreciated

glilienfield · August 12, 2024, 1:47pm

Try this:

with "Emissions|CO2|AFOLU" as input
unwind split(input, "|") as prop
create(n:NodeLabel{type: prop})
with collect(n) as nodes
unwind range(0,size(nodes)-2) as index
with nodes[index] as startNode, nodes[index+1] as endNode
create (startNode)-[r:REL_TYPE]->(endNode)
return *

arthcras · August 12, 2024, 5:09pm

Cool!! Thank you
Rgds arthur

glilienfield · August 17, 2024, 2:54am

I am not sure I understand your ask, but I did something that almost gets the graph. I am not sure how you get a relationship from AFOLU to Emissions.

Here are two approaches that do the same thing. One uses sequential code by passing along case2 through the creation of the case1 nodes. The second avoids the need to pass the case2 list by using call subqueries.

Solution with "with" clauses

with 
    "Emissions|CO2|AFOLU" as case1,
    "Emissions|Price|AFOLU" as case2
//case1
unwind split(case1, "|") as prop
merge(n:NodeLabel{type: prop})
with collect(n) as nodes, case2
unwind range(0,size(nodes)-2) as index
with nodes[index] as startNode, nodes[index+1] as endNode, case2
merge (startNode)-[r:REL_TYPE]->(endNode)
//case2
with distinct case2
unwind split(case2, "|") as prop
merge(m:NodeLabel{type: prop})
with collect(m) as nodes
unwind range(0,size(nodes)-2) as index
with nodes[index] as startNode, nodes[index+1] as endNode
merge (startNode)-[r:REL_TYPE]->(endNode)

Solution with call subqueries:

with 
    "Emissions|CO2|AFOLU" as case1,
    "Emissions|Price|AFOLU" as case2
call {
    with case1
    unwind split(case1, "|") as prop
    merge(n:NodeLabel{type: prop})
    with collect(n) as nodes
    unwind range(0,size(nodes)-2) as index
    with nodes[index] as startNode, nodes[index+1] as endNode
    merge (startNode)-[r:REL_TYPE]->(endNode)
}
call {
    with case2
    unwind split(case2, "|") as prop
    merge(m:NodeLabel{type: prop})
    with collect(m) as nodes
    unwind range(0,size(nodes)-2) as index
    with nodes[index] as startNode, nodes[index+1] as endNode
    merge (startNode)-[r:REL_TYPE]->(endNode)
}

Is this what you wanted?

arthcras · August 17, 2024, 10:51am

Hi Gary: thanks for your input so far, i believe we are almost there (i have attached a picture of the csv file for reference); now we have to two path visualized; the first one emissions, co2, afolu and second one: emissions, price and afolu. The one item pending is to create a relation named "Linked_to" to case 1 and case 2; thus the nodes of path 1 should be linked to case 1 and the nodes of path 2 should be linked to case 2. The result will be amongst that node "emissions" is then linked to case 1 and 2 and the same goes for the nodes afolu. CO2 is only linked to case 1 and Price to case 2.
With regard to the suggested solution; what about if you have for example 100 different variable combinations case 1 case 2... case 100, should you have equal sub_queries or is there an alternative.

glilienfield · August 17, 2024, 11:46am

Actually, since they will be rows in a csv file, you can do the following to process any number of rows. The example has two rows for 'row', but you would replace the beginning 'unwind' I used to set test data with a "load csv" with your real data.

unwind [
    "Emissions|CO2|AFOLU",
    "Emissions|Price|AFOLU"
] as row
unwind split(row, "|") as prop
merge(n:NodeLabel{type: prop})
with collect(n) as nodes
unwind range(0,size(nodes)-2) as index
with nodes[index] as startNode, nodes[index+1] as endNode
merge (startNode)-[r:REL_TYPE]->(endNode)
return *

Sorry, but I still don't understand what a case is and how cases are related. Understanding will help me write a query.

Also, can you explain how you will query this data? It may help me understand the data model, and maybe propose an improvement.

glilienfield · August 17, 2024, 12:04pm

So, I kinda thought I figured out what you wanted, but I am not sure any longer. I added a "Case" node to represent each case, then linked the paths to them. If you want to keep the path nodes unique, then you can't tell which path belongs to each case (as shown below):

unwind [
    {path: "case1", values: "Emissions|CO2|AFOLU"},
    {path: "case2", values: "Emissions|Price|AFOLU"}
] as row
unwind split(row.values, "|") as prop
merge(n:NodeLabel{type: prop})
with row, collect(n) as nodes
create(p:Case{name: row.path})
with p, nodes, head(nodes) as firstNode
merge (p)-[:LINKED_TO]->(firstNode)
with nodes
unwind range(0,size(nodes)-2) as index
with nodes[index] as startNode, nodes[index+1] as endNode
merge (startNode)-[r:REL_TYPE]->(endNode)

The alternative is to create new path nodes, in which case you know which nodes below to each case (as shown below):

unwind [
    {path: "case1", values: "Emissions|CO2|AFOLU"},
    {path: "case2", values: "Emissions|Price|AFOLU"}
] as row
unwind split(row.values, "|") as prop
create(n:NodeLabel{type: prop})
with row, collect(n) as nodes
create(p:Case{name: row.path})
with p, nodes, head(nodes) as firstNode
merge (p)-[:LINKED_TO]->(firstNode)
with nodes
unwind range(0,size(nodes)-2) as index
with nodes[index] as startNode, nodes[index+1] as endNode
merge (startNode)-[r:REL_TYPE]->(endNode)

Are either of these moving in the correct direction?

Finally, if you want to have the relationship type defined in the spreadsheet and vary by row, then you will have to use an APOC procedure to create the relationships dynamically. Cypher does not have a mechanism to set the relationship's type using a parameter.

arthcras · August 17, 2024, 12:19pm

Hi Gary
Thanks again and the right direction!. Each variable of the rows are now related to the cases
Rgds

arthcras · August 19, 2024, 12:09am

Hi Gary
Have replaced the case now for a scenario node and have added an additional model node
This result that the paths are is longer Scenario - model - variable. No sure of the first and secondNode approach is to right approach any suggestions are welcome
Cheers

unwind [

{path: "scenario 1", model: “model 3”,values: "Emissions|CO2|AFOLU"},

{path: "scenario 2", model “model 2”, values: "Emissions|Price|AFOLU"}

{path: "scenario 3", model “model 2”, values: "Emissions|Investment|AFOLU"}

] as row

unwind split(row.values, "|") as prop

create(n:Variabele{type: prop})

with row, collect(n) as nodes

create(p:Scenario{name: row.path})

create (z:Model{name:row.path})

with p, nodes, head(nodes) as firstNode

with z, nodes, head(nodes) as secondNode

merge (p(- [LINKED_TO]->(firstNode)

with nodes

merge (z)- [FOR]-> (secondNode)

unwind range(0,size(nodes)-2) as index

with nodes[index] as startNode, nodes[index+1] as endNode

merge (startNode)-[r:REL_TYPE]->(endNode)

glilienfield · August 19, 2024, 1:37am

The query is a little off. Can you provide a diagram of the result? I will make the query match. A nice tool for neo4j data modeling is arrows:

In case you just want the nodes linked together..

unwind [
    {path: "scenario 1", model: "model 3", values: "Emissions|CO2|AFOLU"},
    {path: "scenario 2", model: "model 2", values: "Emissions|Price|AFOLU"},
    {path: "scenario 3", model: "model 2", values: "Emissions|Investment|AFOLU"}
] as row
//
// Create all the nodes
//
unwind split(row.values, "|") as prop
create(n:Variabele{type: prop})
with row, collect(n) as nodes
create(p:Scenario{name: row.path})
create(z:Model{name: row.model})
//
// Create the relationships
//
with p, z, nodes, head(nodes) as startNode
merge (p)-[:LINKED_TO]->(z)
merge (z)-[:FOR]->(startNode)
with nodes
unwind range(0,size(nodes)-2) as index
with nodes[index] as startNode, nodes[index+1] as endNode
merge (startNode)-[r:NEXT_VALUE]->(endNode)

Topic		Replies	Views
How to Split a Row While importing CSV with Cypher Neo4j Graph Platform migrated	27	293	August 10, 2022
Splitting multi-valued cells when importing CSV files Cypher	9	2490	January 29, 2020
Multiple Nodes in 1 column as target Cypher cypher	8	239	March 2, 2022
Create relationship based on column value from CSV Cypher querying , cypher	3	683	July 13, 2021
Multiple Relationships in one csv with arrays in the relationship column Newbie Questions cypher	5	514	May 19, 2023

Split multi value csv colomn into nodes and create relationship of those values

Related topics