Hello all,
I would be extremely grateful if someone could shed some light on the behavior I get when running the simple experiments described below. They are about concurrent access in transactions and what I get does not match the documented behavior.
The scenario
Consider the following query:
MATCH(reg:Registry)
SET reg.counter = reg.counter + 1
MERGE (d:Dummy {Id: reg.counter})<-[:Owns]-(reg)
Before running the query, the DB is emptied, then initialized with a unique Registry node whose counter property is set to 0.
When running two instances of this query on an empty database, in one transaction each, I see that a deadlock blocks one of the two transactions.
Here is the output of the program:
> dotnet run
info: sandbox[0]
Routing table is updated => RoutingTable{database=neo4j, routers=[neo4j://127.0.0.1:7687/], writers=[neo4j://127.0.0.1:7687/], readers=[neo4j://127.0.0.1:7687/], expiresAfter=300s}
info: sandbox[0]
Routing table is updated => RoutingTable{database=neo4j, routers=[neo4j://127.0.0.1:7687/], writers=[neo4j://127.0.0.1:7687/], readers=[neo4j://127.0.0.1:7687/], expiresAfter=300s}
info: sandbox[0]
Routing table is updated => RoutingTable{database=neo4j, routers=[neo4j://127.0.0.1:7687/], writers=[neo4j://127.0.0.1:7687/], readers=[neo4j://127.0.0.1:7687/], expiresAfter=300s}
warn: sandbox[0]
Transaction failed and will be retried in 00:00:00.9965516 ms.
Neo4j.Driver.TransientException: ForsetiClient[transactionId=2511, clientId=77] can't acquire EXCLUSIVE NODE(34401) because it would form this deadlock wait cycle:
NODE(34401)-[EXCLUSIVE_OWNER]->(tx:2512)-[WAITING_FOR_EXCLUSIVE]->(NODE_RELATIONSHIP_GROUP_DELETE(34401))-[EXCLUSIVE_OWNER]->(tx:2511)-[WAITING_FOR_EXCLUSIVE]->(NODE(34401))
at Neo4j.Driver.Internal.MessageHandling.ResponsePipelineError.EnsureThrown()
at Neo4j.Driver.Internal.Result.ResultCursorBuilder.ConsumeAsync()
at Neo4j.Driver.Internal.AsyncTransaction.DiscardUnconsumed()
at Neo4j.Driver.Internal.AsyncTransaction.CommitAsync()
at Neo4j.Driver.Internal.AsyncTransaction.CommitAsync()
at Neo4j.Driver.Internal.AsyncSession.<>c__DisplayClass52_0`1.<<RunTransactionAsync>b__1>d.MoveNext()
--- End of stack trace from previous location ---
at Neo4j.Driver.Internal.AsyncSession.<>c__DisplayClass52_0`1.<<RunTransactionAsync>b__1>d.MoveNext()
--- End of stack trace from previous location ---
at Neo4j.Driver.Internal.AsyncRetryLogic.RetryAsync[T](Func`1 runTxAsyncFunc)
The blocked transaction is retried by the .NET driver and eventually succeeds. The final content of the DB looks correct:
@neo4j> MATCH(n) RETURN n;
+--------------------------+
| n |
+--------------------------+
| (:Registry {counter: 2}) |
| (:Dummy {Id: 1}) |
| (:Dummy {Id: 2}) |
+--------------------------+
The execution plan looks like so:
@neo4j> PROFILE MATCH(reg:Registry)
SET reg.counter = reg.counter + 1
MERGE (d:Dummy {Id: reg.counter})<-[:Owns]-(reg);
+--------------------------------------------------------------------------------------------------+
| Plan | Statement | Version | Planner | Runtime | Time | DbHits | Rows | Memory (Bytes) |
+--------------------------------------------------------------------------------------------------+
| "PROFILE" | "WRITE_ONLY" | "5" | "COST" | "SLOTTED" | 17 | 18 | 0 | 64 |
+--------------------------------------------------------------------------------------------------+
Cypher 5
Planner COST
Runtime SLOTTED
Runtime version 2025.12
+------------------+----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| Operator | Id | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses |
+------------------+----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +ProduceResults | 0 | | 10 | 0 | 0 | 0 | 0/0 |
| | +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +EmptyResult | 1 | | 10 | 0 | 0 | | 0/0 |
| | +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Apply | 2 | | 10 | 1 | 0 | | 0/0 |
| |\ +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +LockingMerge | 3 | CREATE (d:Dummy {Id: cache[reg.counter]}), (d)<-[anon_0:Owns]-(reg), LOCK(reg) | 10 | 1 | 4 | | 0/0 |
| | | +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +Filter | 4 | d.Id = cache[reg.counter] AND d:Dummy | 0 | 0 | 4 | | 0/0 |
| | | +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +Expand(All) | 5 | (reg)-[anon_0:Owns]->(d) | 3 | 4 | 6 | | 0/0 |
| | | +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +Argument | 6 | reg | 10 | 2 | 0 | | 0/0 |
| | +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +SetProperty | 7 | reg.counter = reg.counter + $autoint_0 | 10 | 1 | 2 | | 0/0 |
| | +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +NodeByLabelScan | 8 | reg:Registry | 10 | 1 | 2 | | 0/0 |
+------------------+----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
Interestingly enough, when inserting a SET clause on the registry node, the deadlock seems to go away:
MATCH(reg:Registry)
SET reg._dummy_ = true
REMOVE reg._dummy_
SET reg.counter = reg.counter + 1
MERGE (d:Dummy {{Id: reg.counter}})<-[:Owns]-(reg)
Here is the full code of the program to reproduce the problem (with the query being the first described in this post):
public static class Program
{
public static async Task Main()
{
Test test = new();
await test.DoAsync().ConfigureAwait(false);
}
}
public class Test
{
private N4j.IDriver _driver;
private ILoggerFactory _loggerFactory = LoggerFactory.Create(b =>
{
b.AddConsole();
});
private ILogger _logger;
public Test()
{
_logger = _loggerFactory.CreateLogger("sandbox");
_driver = N4j.GraphDatabase.Driver
(
new Uri("neo4j://127.0.0.1:7687"),
N4j.AuthTokens.None,
b =>
{
b.WithLogger(new Logger(_logger));
}
);
}
public async Task DoAsync()
{
{
using N4j.IAsyncSession session = _driver.AsyncSession();
await session.ExecuteWriteAsync(InitDbAsync).ConfigureAwait(false);
}
// Run multiple concurrent node creation queries, in one transaction each.
await Task.WhenAll(
Enumerable
.Range(0, 2)
.Select(async i =>
{
using N4j.IAsyncSession session = _driver.AsyncSession();
await session.ExecuteWriteAsync(
runner => CreateDummyAsync(runner, i),
o => o.WithTimeout(TimeSpan.FromSeconds(120))
).ConfigureAwait(false);
})
).ConfigureAwait(false);
}
private async Task InitDbAsync(N4j.IAsyncQueryRunner runner)
{
// Clear the DB and create a fresh registry node.
await runner.RunAsync(
@$"CALL() {{
MATCH (d:Dummy|Registry)
DETACH DELETE (d)
}}
CREATE (:Registry {{counter: 0}})"
).ConfigureAwait(false);
}
private static async Task CreateDummyAsync(N4j.IAsyncQueryRunner runner, int index)
{
N4j.IResultCursor cursor = await runner.RunAsync(
@$"MATCH(reg:Registry)
SET reg.counter = reg.counter + 1
MERGE (d:Dummy {{Id: reg.counter}})<-[:Owns]-(reg)"
).ConfigureAwait(false);
}
}
[System.Diagnostics.CodeAnalysis.SuppressMessage("Usage", "CA1848: Use the LoggerMessage delegates")]
[System.Diagnostics.CodeAnalysis.SuppressMessage("Usage", "CA2254: Template should be a static expression")]
internal sealed class Logger(ILogger inner) : global::Neo4j.Driver.ILogger
{
public void Error(Exception cause, string message, params object[] args)
=> Log(LogLevel.Error, cause, message, args);
public void Warn(Exception cause, string message, params object[] args)
=> Log(LogLevel.Warning, cause, message, args);
public void Info(string message, params object[] args)
=> Log(LogLevel.Information, null, message, args);
public void Debug(string message, params object[] args)
=> Log(LogLevel.Debug, null, message, args);
public void Trace(string message, params object[] args)
=> Log(LogLevel.Trace, null, message, args);
private void Log(LogLevel level, Exception? cause, string message, params object[] args)
{
inner.Log(level, new EventId(), cause, message, args);
}
public bool IsTraceEnabled() => inner.IsEnabled(LogLevel.Trace);
public bool IsDebugEnabled() => inner.IsEnabled(LogLevel.Debug);
}
Why I am asking for help
I don't understand why there is a deadlock in this scenario. It harms performance by a good deal in a real life scenario where hundreds of similar queries compete with one another, so I would like to get rid of it. I also want to understand why the query is problematic to educate myself. Adding a SET reg._dummy_ = true line seems to fix the problem but I would feel more confident that this is a real fix if I could understand the very nature of the issue.
What the documentation says
My query seems to match the example detailed here. According to the documented behavior, the SET reg.counter = reg.counter + 1 line should require the transaction to acquire a write lock on the reg node prior to its execution. The documentation is not very precise on that point but my guess is the acquired lock is of type NODE as described in this table.
The MERGE clause involves the creation of a relationship, so the doc states that prior to its execution, the transaction must acquire a write lock on the sync node. Based on the output, the table of internal locks and more random findings on the internet, I guess the internal locks required are of type NODE and NODE_RELATIONSHIP_GROUP_DELETE.
Unless I'm mistaken, the documented behavior could not lead to a deadlock. In my scenario with two concurrent transactions, the first to reach the SET reg.counter = reg.counter + 1 line wins. Since it acquires a NODE lock, the other has to wait until that first transaction completes. Once it completes, the second transaction resumes without any issues.
More poking in the dark
I read the documentation page on operators which is helpful to understand the meaning of the execution plan's binary tree.
The output I get suggests that the deadlock occurs because one transaction holds the NODE lock while the other holds the NODE_RELATIONSHIP_GROUP_DELETE lock. When looking a the execution plan, the only explanation I can imagine is the following chain of events:
- Transaction A (Tx A) gets to the
SetPropertyoperator first. It acquires aNODE(reg)lock which prevents Transaction B (Tx B) from traversing this operator yet - One way or another, Tx B, while waiting for Tx A to release its
NODE(reg)lock manages to preemptively acquire theNODE_RELATIONSHIP_GROUP_DELETE(reg)lock that it will need when it reaches theLockingMergeoperator. Maybe because theApplyoperator start pulling data from its right hand side when the left hand side is blocked. I insist that this is only a mere guess, not supported by any of the pieces of documentation I could set my eyes on. - Tx A continues its execution and blocks when trying to acquire the
NODE_RELATIONSHIP_GROUP_DELETE(reg)lock, because Tx B already holds it. The two transactions are in a deadlock because each holds one of the two locks required to move on.
But even this bogus reasoning would not explain why the additional SET reg._dummy_ = true line fixes the problem. I am still at a loss.
To debug a little further, I added an apoc.util.sleep(5000) line right before the MERGE clause, so that I have time to run SHOW TRANSACTIONS before the first transaction completes the sleep. That confirms that both transactions have respectively 3 and 2 locks at that point:
@neo4j> SHOW TRANSACTIONS YIELD transactionId, currentQueryId, currentQuery, status, currentQueryStatus, statusDetails, activeLockCount;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| transactionId | currentQueryId | currentQuery | status | currentQueryStatus | statusDetails | activeLockCount |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "neo4j-transaction-2794" | "query-5927" | "SHOW TRANSACTIONS YIELD transactionId, currentQueryId, currentQuery, status, currentQueryStatus, statusDetails, activeLockCount" | "Running" | "running" | "" | 0 |
| "neo4j-transaction-2793" | "query-5926" | "MATCH(reg:Registry) | "Blocked by: [neo4j-transaction-2792]" | "waiting" | "" | 2 |
| | \ SET reg.counter = reg.counter + 1 | | | | |
| | \ WITH reg | | | | |
| | \ CALL apoc.util.sleep(5000) | | | | |
| | \ MERGE (d:Dummy {Id: reg.counter})<-[:Owns]-(reg)" | | | | |
| "neo4j-transaction-2792" | "query-5925" | "MATCH(reg:Registry) | "Running" | "running" | "" | 3 |
| | \ SET reg.counter = reg.counter + 1 | | | | |
| | \ WITH reg | | | | |
| | \ CALL apoc.util.sleep(5000) | |
| | |
| | \ MERGE (d:Dummy {Id: reg.counter})<-[:Owns]-(reg)" | |
| | |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Thanks!