Better way to merge/match loading from csv

cypher
merge

(Norbert Preining) #1

I am reading data from a file and want to create nodes and relations between them. There is a chance that either of the two nodes of the relation are not existing by now.

I solved this by:

LOAD CSV WITH HEADERS FROM "foobar.csv" AS ROW
  MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MERGE (b:TypeB { name: row.name }) ;

LOAD CSV WITH HEADERS FROM "foobar.csv" AS ROW
  MATCH (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MATCH (b:TypeB { name: row.name }) 
  MERGE (a) -[r:is_instance]-> (b) ;

This loads the csv two times, which is not optimal.

I have the following cases:

  • (a), (b), (r) do not exists: In this case I could have simply used one single MERGE command
  • (b) exists, (a), (r) do not exists: In this case I would need a MATCH against (b) followed by a merge of the relation
  • (a), (b), (r) exists: a simple merge or a match against all three all would be fine.

Using the second variant that MATCH against (b) fails to create anything in the first case, because there is no MATCH at all.

Is there any other way to avoid using the double loop over csv?

Thanks


(Andrew Bowman) #2

Sure, just use MERGE for all 3:

LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
  MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MERGE (b:TypeB { name: row.name })
  MERGE (a) -[r:is_instance]-> (b) ;

(Norbert Preining) #3

Thanks, - I guess without the ; after the second MERGE.

I somehow assumed that the first two MERGE will not be ready for the third one.

Thanks for the quick response, and sorry for the rather stupid question


(Andrew Bowman) #4

My mistake on the ;, and you're welcome!

No stupid questions here, learning is full of trial and error.


(Jacob McCrumb) #5

I think doing it all in one go will cause an EAGER (run it through an EXPLAIN). If you have a large CSV this could lead to problems. Might be better to iterate over the file twice. Just something to keep an eye on, if it works loading all at once then no worries.


(Andrew Bowman) #6

In this case since the nodes are of two different types, no Eager will be introduced.


(Jacob McCrumb) #7

Thanks for the clarification, good to know!


(Norbert Preining) #8

I don't know about EAGER, sorry, but I anyway added a USING PERIODIC COMMIT 1000 before each load.


(Jacob McCrumb) #9

I was mistaken, you only have EAGER if the nodes had the same label. If you run:

EXPLAIN
LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
MERGE (b:TypeB { name: row.name })
MERGE (a) -[r:is_instance]-> (b) ;

in the browser it comes back with the "query plan". If you see a line that just says "EAGER" when loading a CSV, its bad news. You should be good :slight_smile: