I am wondering how this following query works. We start with an empty DB.
WITH [{a:1, b:null}, {a:1, b:null}] AS rows
UNWIND rows AS row
MERGE (t:Test {a:row.a, b:coalesce(row.b, 'none')}) SET t.b=row.b
Added 1 label, created 1 node, set 3 properties, completed after 3 ms.
MATCH (t:Test) return t
╒═══════╕
│"t" │
╞═══════╡
│{"a":1}│
└───────┘
Why does this only create one node?
I would assume the following:
First row MERGE --> (t:Test {a:row.a, b:'none'}) does not exist
Create node (t:Test {a:row.a, b:'none'})
Set b = null --> now node is (t:Test {a:row.a})
Second row MERGE --> (t:Test {a:row.a, b:'none'}) does not exist
Create node (t:Test {a:row.a, b:'none'})
Set b = null --> now node is (t:Test {a:row.a})
--> At the end we have two nodes (t:Test {a:row.a})
Is coalesce or the SET doing here something different?
Note that if I do the two rows as separate queries (one per row) then the result is exactly as I expect and describe above. If the rows are unwinded in one query it's different.
If you EXPLAIN or PROFILE the query, you'll see there's an Eager operation in the query plan. This changes how the query will execute
Without an Eager, its execution matches how you described, executing row by row until completion.
With an Eager, the planner has recognized that this type of query will cause the ordering of row processing to have an impact on the results, something we generally want to avoid. The planner will then favor a mode of execution which minimizes the chances for this to happen.
Instead of executing the entirety of the query per row, the planner will instead apply each subsequent operation across all rows, then repeat for the next operation.
So what happens in this case, after the UNWIND, the MERGE will execute for both rows, which will create the node for the first row, and match to that newly created node for the second row. Then SET is applied for both rows, each of them operating on that single node.