Now that I look at this, I think we've identified the root of the slowdown, including within the original query with MERGE.
The problem isn't with locks, or double-checked locking (okay it's KIND of related, but not the heart of this).
The problem is the direction the planner is choosing to test whether the nodes are connected (which is part of the MERGE operation...twice, because of the double-checked locking), and how it isn't coping with the changes to the degree of the relationship as the relationships are added and the node becomes dense.
Again, at the start, b
has degree 0, so the planner uses an expand from b
to a
to see if the nodes are connected. And that's just fine until enough relationships are added that b
has become dense, and the cost for this check becomes increasingly expensive. The more dense it gets, the worse it performs for the next row.
So there are a few ways to solve this.
In my first (viable and working) solution with collections and the list membership check (not using subtract()) we avoided the expansion problem completely. This is probably a great general solution that should work in all cases.
In the second efficient solution (using apoc.nodes.connected()
) the function allowed the query to choose at runtime, on a row-by-row basis, which node to expand from to test for the check, so it avoids the cost of expanding from the now-dense node b
. This should be a good solution as long as both nodes are not dense.
And just to prove what's going on...if you create your nodes, but add some relationships on b
at the very start, and use the query with MERGE from the start (with a slight tweak), you should see it should only take a couple seconds, because the planner seems to choose to expand from a
nodes rather than the b
node, avoiding being bitten as b
becomes dense:
unwind range(1,200000) as id
create (:Dense1 {id:toString(id), key:12345})
now seed some relationships on b
match (b:Dense1{id:"1"})
unwind range(1,100) as id
create (:Dummy)-[:PARENT_OF]->(b)
and use the MERGE query
profile
MATCH (b:Dense1) where b.id = "1"
MATCH (a:Dense1) WHERE a.key = b.key and a.id <> "1"
MERGE (a)-[:PARENT_OF]->(b)
I'll see what our Cypher engineers can use from this.