Using MERGE w/ ON CREATE SET vs MERGE

Hi new here. I am following the guide (Tutorial: Import Relational Data Into Neo4j - Developer Guides) on importing CSV files into the graph database using the Northwind products.csv file as an example.

The sample code provided in the guide is:

/ LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
MERGE (product:Product {productID: row.ProductID})
ON CREATE SET product.productName = row.ProductName, product.unitPrice = toFloat(row.UnitPrice); /
using this code takes appox. 27 ms on my machine to create the nodes and properties

I've created the exact same result using MERGE only:

/LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
MERGE (product:Product {productID: row.ProductID, productName:row.ProductName, unitPrice:toFloat(row.UnitPrice)}) / using this code takes appox. 60 ms on my machine to create the nodes and properties

Is there an advantage or best practice in using one over the other or is it pretty much the same thing?

Thx Josh

1 Like

the 2 statements are slightly different in that the first simply is asking to find a node with label :Product and which has a productID of row.ProductID. The 2nd statement is asking to find a node with label :Product and which has a productID of row.ProductID AND also has a productName equal to row.ProductName AND unitPrice equal to row.UnitPrice

Thx for the reply Dana after reading your comment it seems to me my confusion was in the MERGE clause. I was starting with a blank database so I kept thinking MERGE was only to create new nodes; instead of MERGE as trying to first find if those nodes with properties exist in my current database, and if not, create them.

So how is the row.ProductName assigned in the line ON CREATE SET product.productName = row.ProductName? Wouldn't the node (e.g. with ID 101) get the row.ProductName that is on the same row as ID 101? Which is the same as

right?

Are there examples of how these these two queries would lead to differen results? I don't think I fully understand it still.

Thanks.

Hi @rogeryu

The ID is just a native internal number neo4j assigns to any node created. When any node is created neo4j automatically assigns an internal "id" number. So as example, if you start with a blank empty database, and create the first node, the internal id number assigned could be 0. Then if you create a second node, it could be assigned 1. This internal id has nothing to do with the information you are trying to import from a csv file.

As for the example importing the products.csv file:

/LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
MERGE (product:Product {productID: row.ProductID})
ON CREATE SET product.productName = row.ProductName, product.unitPrice = toFloat(row.UnitPrice);/

//when importing, row.ProductID looks for ProductID on the spreadsheet, and will create productID property for the node and assign it 1.

Hopefully that helps.

@moomoofarm
To compare to SQL, using MERGE in Cypher has the effect of applying a key constraint, is that right?

I'm going through the Graph Academy at the moment and one of the queries is:

LOAD CSV WITH HEADERS FROM 'https://r.neo4j.com/flights_2019_1k' AS row
MERGE (origin:Airport {code: row.Origin})
MERGE (destination:Airport {code: row.Dest})
MERGE (origin)-[connection:CONNECTED_TO {
  airline: row.UniqueCarrier,
  flightNumber: row.FlightNum,
  date: toInteger(row.Year) + '-' + toInteger(row.Month) + '-' + toInteger(row.DayofMonth)}]->(destination)
ON CREATE SET connection.departure = toInteger(row.CRSDepTime), connection.arrival = toInteger(row.CRSArrTime)

And I was just wondering why some fields are MERGEd and others are SET. Seems like it's the concept of ensuring that there is no duplication for certain combinations of fields. Am I on the right track?

@rogeryu

Hi Roger, using MERGE is not the same as applying a key constraint. MERGE checks to see if a pattern already exists in your graph. If it exists, it will not create the pattern. If it does not exist, then it will create the pattern. For example assuming Bob is already created in the graph:

MERGE(:Student {studentid:1000, name:"Bob"}) //This will not create Bob again since it is already in the graph. However, if you copy and paste with out setting a key constraint, CREATE(:Student {studentid:1000, name:"Bob"}) will create another Bob with the same information making it a duplicate.

To set a key constraint you need to do something like this:
CREATE CONSTRAINT ON (s:Student) ASSERT s.studentid IS UNIQUE

Once the constraint is set, you will not be able to make another node with studentid:1000 as it is taken by Bob.

It depends on the use case scenario, but in the example you provided the ON CREATE SET makes sense and is used in conjunction in this case with MERGE since if the above nodes do not exist, then create it along with setting the departure and arrival.

But basically yes at times, I use MERGE to make sure there are not duplicate nodes. But it might be better like setting a constraint on a property like the above example of student to make sure there are no duplicate ids. Really depends on your use case but definitely more than one way to go about getting an answer.

I hope this helps.

To elaborate further, MERGE is not the same as an upsert. The key thing to keep in mind is what @moomoofarm mentioned in his reply:

If it exists , it will not create the pattern. If it does not exist , then it will create the pattern.

I'll add on to his reply with some additional examples.

Assume we do this:

MERGE(:Student {studentid:1000, name:"Bob"})

Now we have a :Student node with the given id with the name of "Bob". But let's say we now know Bob's hair color, and want to add on to the node. If we try this:

MERGE(:Student {studentid:1000, name:"Bob", haircolor:"Brown"})

Instead of updating the Bob node, this will find that there is no such node with all of the given properties, so it will create a brand new node with the given properties. This will mean that there will be two nodes with the same studentid and name, but only one will have the haircolor property.

If we wanted to update a potentially existing node, we should only MERGE on the properties that uniquely identify the node, and then use either SET, ON CREATE SET, or ON MATCH SET to set any additional properties, depending on if there are any restrictions for which situation the other properties should be set:

MERGE(s:Student {studentid:1000, name:"Bob"})
SET s.haircolor = "Brown"
1 Like

Ok, that clears things up‼ Thank you so much!