Need to map Row to Column when they are refering to the interconnection of the same objects

joemalone1 · April 27, 2021, 5:49pm

I am trying to under stand how I would map/ interconnect row data to headers.

Below is a sample of the data and a drawing off what I would think the output would look.

	A	B	C	D
A	N	2	3	4
B	5	N	7	8
C	9	0	N	2
D	3	4	5	N

ameyasoft · April 28, 2021, 6:15pm

Here is my solution:

I created a csv file that matches with your data.
Col1,A,B,C,D
A,N,2,3,4
B,5,N,7,8
C,9,0,N,2
D,3,4,5,N

Step 1: As you are creating nodes with labels from column headers, I first created these nodes.

LOAD CSV WITH HEADERS from 'file:///joem.csv' as row 
with row limit 1 
with keys(row) as k1
CALL apoc.create.node(["Headers"], {name: k1[1]})
CALL apoc.create.node(["Headers"], {name: k1[2]})
CALL apoc.create.node(["Headers"], {name: k1[3]})
CALL apoc.create.node(["Headers"], {name: k1[4]})

Created nodes with names A, B, C, D

Step 2: 

LOAD CSV WITH HEADERS from 'file:///joem.csv' as row 
with row 
with keys(row) as k1, row
match(a:Headers) where a.name = k1[1]
match(b:Headers) where b.name = k1[2]
match(c:Headers) where c.name = k1[3]
match(d:Headers) where d.name = k1[4]

//case when row.Col1 = k1[1]
FOREACH(ignoreMe IN CASE WHEN row.Col1 = k1[1] THEN [1] ELSE [] END |

merge (a)-[:ROW_WISE {val: toInteger(row.B)}]->(b)
merge (a)-[:ROW_WISE {val: toInteger(row.C)}]->(c)
merge (a)-[:ROW_WISE {val: toInteger(row.D)}]->(d)

)


//case when row.Col1 = k1[2]
FOREACH(ignoreMe IN CASE WHEN row.Col1 = k1[2] THEN [1] ELSE [] END |

merge (a)-[:COLUMN_WISE {val: toInteger(row.A)}]->(b)
merge (c)-[:COLUMN_WISE {val: toInteger(row.C)}]->(b)
merge (d)-[:COLUMN__WISE {val: toInteger(row.D)}]->(b)

)

//case when row.Col1 = k1[3]
FOREACH(ignoreMe IN CASE WHEN row.Col1 = k1[3] THEN [1] ELSE [] END |

merge (a)-[:COLUMN_WISE {val: toInteger(row.A)}]->(c)
merge (b)-[:COLUMN_WISE {val: toInteger(row.B)}]->(c)
merge (d)-[:COLUMN_WISE {val: toInteger(row.D)}]->(c)

)

//case when row.Col1 = k1[4]
FOREACH(ignoreMe IN CASE WHEN row.Col1 = k1[4] THEN [1] ELSE [] END |

merge (a)-[:COLUMN_WISE {val: toInteger(row.A)}]->(d)
merge (b)-[:COLUMN_WISE {val: toInteger(row.B)}]->(d)
merge (c)-[:COLUMN_WISE {val: toInteger(row.C)}]->(d)

)

Set 12 properties, created 12 relationships, completed after 18 ms.

Result:

andrew_bowman · April 28, 2021, 7:43pm

While this is an interesting way to load data, it isn't standard, and doesn't scale well, since for each additional node involved you would have to add a new column and update all rows. Ideally, when the number of fields being set isn't growing, only the number of nodes needing to be connected or changed, your CSV should only need to change by adding additional rows. The number of columns should remain fixed.

A much easier approach is with a CSV that has a 3 columns: start, end, and value. Then you just build the CSV, each row representing a particular relationships between two nodes, and the value to use.

Your load query becomes much easier. Assuming that your nodes are already in the graph and this is just about creating relationships between them, and assuming we use :Node labels and :REL relationships (since that wasn't provided), and numeric ids for the nodes and numeric values for the values on relationships, a load of the relationships would be :

LOAD CSV WITH HEADERS from 'file:///myNewData.csv' as row 
MATCH (start:Node {id:toInteger(row.start)}), (end:Node {id:toInteger(row.end)})
MERGE (start)-[:REL {value:toInteger(row.value)}]->(end)

For a sufficiently large CSV you would want to use USING PERIODIC COMMIT LOAD CSV instead.

joemalone1 · April 28, 2021, 8:44pm

Wow thank you. I was afraid it would look as it did and the complexity is going to make me review how I am approaching my data set, as it is much larger.

ameyasoft · April 29, 2021, 6:42am

While this is an interesting way to load data, it isn't standard, and doesn't scale well:

It would be better to provide your model to this problem to prove that my model does not scale up.

andrew_bowman · April 29, 2021, 6:48am

My comment about scaling was more about CSV creation and maintenance (my prior answer should have been a reply to the original question, not your proposed solution). I wasn't looking much at your query for handling that strange CSV, I was more baffled about the CSV formatting.

The point I was trying to make is that for an arbitrary number of nodes to connect, you have to keep adding columns to the CSV, and fill out those columns across all rows.

But that's not how a CSV should be built out. If no additional properties are being added or removed for the load, then the addition of new nodes to connect should only require adding new rows to the CSV, not new columns, and not backfilling for new columns either.

ameyasoft · April 29, 2021, 6:55am

I agree with your standard scenarios. If a client wants to use Neo4j in scenarios other than standard scenarios, from business perspective you want to say yes or no.

Topic		Replies	Views
Loading CSV with Cypher with certain column names as properties Cypher cypher	2	1052	November 26, 2019
Relating nodes in csv with column names in Neo4J Neo4j Graph Platform apoc , cypher , operations , neo4j-desktop	7	351	January 2, 2022
Row's value with create Cypher	4	364	April 20, 2021
Load multiple CSV rows into same node Cypher cypher , import , neo4j-desktop	9	538	November 8, 2021
Load CSV into Neo4j and create nodes out of each row Cypher apoc , cypher , neo4j-desktop	2	964	January 1, 2022

Get Certified in June!

Need to map Row to Column when they are refering to the interconnection of the same objects

Related topics