Better way to merge/match loading from csv

norbert · November 13, 2018, 8:29am

I am reading data from a file and want to create nodes and the relationships between them. Either of the two nodes in a relationship might not exist yet.

I solved this by:

LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
  MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MERGE (b:TypeB { name: row.name }) ;

LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
  MATCH (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MATCH (b:TypeB { name: row.name }) 
  MERGE (a) -[r:is_instance]-> (b) ;

This loads the csv two times, which is not optimal.

I have the following cases:

(a), (b), (r) do not exist: in this case I could simply have used one single MERGE command.
(b) exists, (a), (r) do not exist: in this case I would need a MATCH against (b) followed by a MERGE of the relationship.
(a), (b), (r) all exist: a simple MERGE, or a MATCH against all three, would be fine.

The second variant (MATCH against (b)) fails to create anything in the first case, because (b) does not exist yet and the MATCH returns no rows.

Is there a way to avoid looping over the CSV twice?

Thanks

andrew_bowman · November 13, 2018, 5:12pm

Sure, just use MERGE for all 3:

LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
  MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MERGE (b:TypeB { name: row.name })
  MERGE (a) -[r:is_instance]-> (b) ;

norbert · November 14, 2018, 5:37am

Thanks. I guess that should be without the ; after the second MERGE.

I had somehow assumed that the results of the first two MERGE clauses would not yet be available to the third one.

Thanks for the quick response, and sorry for the rather stupid question.

andrew_bowman · November 14, 2018, 11:28pm

My mistake on the ;, and you're welcome!

No stupid questions here, learning is full of trial and error.

jsmccrumb · November 16, 2018, 8:50pm

I think doing it all in one go will introduce an EAGER operation (run the query through EXPLAIN to check). With a large CSV this could lead to problems, so it might be better to iterate over the file twice. Just something to keep an eye on; if loading everything at once works, then no worries.

andrew_bowman · November 16, 2018, 10:35pm

In this case, since the nodes have two different labels, no Eager will be introduced.

jsmccrumb · November 17, 2018, 2:26am

Thanks for the clarification, good to know!

norbert · November 17, 2018, 3:40am

I don't know about EAGER, sorry, but I added USING PERIODIC COMMIT 1000 before each load anyway.
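
For reference, a minimal sketch of how that might look with the single-pass query from above. It assumes the Neo4j 3.x syntax used in the rest of this thread and the same placeholder file name; newer releases rename toInt to toInteger and handle batched commits differently, so treat this as illustrative only:

```cypher
// Commit after every 1000 rows instead of keeping the whole
// file in a single transaction.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
  MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MERGE (b:TypeB { name: row.name })
  MERGE (a) -[r:is_instance]-> (b) ;
```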

jsmccrumb · November 17, 2018, 1:29pm

I was mistaken: you only get EAGER if the nodes have the same label. If you run:

EXPLAIN
LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
MERGE (b:TypeB { name: row.name })
MERGE (a) -[r:is_instance]-> (b) ;

in the browser, it comes back with the query plan. If you see a line that just says "EAGER" when loading a CSV, it's bad news. You should be good.
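
For contrast, here is a hypothetical variant where both nodes share a label, which is the situation where I would expect an Eager step to show up. The `parent_name` column is made up for illustration, and the exact plan may differ between Neo4j versions:

```cypher
// Hypothetical: both MERGE clauses target :TypeA, so the planner may
// place an Eager operator between them. `parent_name` is an assumed column.
EXPLAIN
LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
MERGE (a:TypeA { name: row.name })
MERGE (b:TypeA { name: row.parent_name })
MERGE (a) -[r:is_instance]-> (b) ;
```

If Eager does appear there, splitting the load into one pass for the nodes and a second pass for the relationships, as in the original post, is one way to get rid of it.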
