Help needed to understand deadlock in concurrent transactions

Hello all,

I would be extremely grateful if someone could shed some light on the behavior I get when running the simple experiments described below. They are about concurrent access in transactions and what I get does not match the documented behavior.

The scenario

Consider the following query:

MATCH(reg:Registry)
SET reg.counter = reg.counter + 1
MERGE (d:Dummy {Id: reg.counter})<-[:Owns]-(reg)

Before running the query, the DB is emptied, then initialized with a unique Registry node whose counter property is set to 0.
When running two instances of this query on an empty database, in one transaction each, I see that a deadlock blocks one of the two transactions.

Here is the output of the program:

> dotnet run
info: sandbox[0]
      Routing table is updated => RoutingTable{database=neo4j, routers=[neo4j://127.0.0.1:7687/], writers=[neo4j://127.0.0.1:7687/], readers=[neo4j://127.0.0.1:7687/], expiresAfter=300s}
info: sandbox[0]
      Routing table is updated => RoutingTable{database=neo4j, routers=[neo4j://127.0.0.1:7687/], writers=[neo4j://127.0.0.1:7687/], readers=[neo4j://127.0.0.1:7687/], expiresAfter=300s}
info: sandbox[0]
      Routing table is updated => RoutingTable{database=neo4j, routers=[neo4j://127.0.0.1:7687/], writers=[neo4j://127.0.0.1:7687/], readers=[neo4j://127.0.0.1:7687/], expiresAfter=300s}
warn: sandbox[0]
      Transaction failed and will be retried in 00:00:00.9965516 ms.
      Neo4j.Driver.TransientException: ForsetiClient[transactionId=2511, clientId=77] can't acquire EXCLUSIVE NODE(34401) because it would form this deadlock wait cycle:
NODE(34401)-[EXCLUSIVE_OWNER]->(tx:2512)-[WAITING_FOR_EXCLUSIVE]->(NODE_RELATIONSHIP_GROUP_DELETE(34401))-[EXCLUSIVE_OWNER]->(tx:2511)-[WAITING_FOR_EXCLUSIVE]->(NODE(34401))
         at Neo4j.Driver.Internal.MessageHandling.ResponsePipelineError.EnsureThrown()
         at Neo4j.Driver.Internal.Result.ResultCursorBuilder.ConsumeAsync()
         at Neo4j.Driver.Internal.AsyncTransaction.DiscardUnconsumed()
         at Neo4j.Driver.Internal.AsyncTransaction.CommitAsync()
         at Neo4j.Driver.Internal.AsyncTransaction.CommitAsync()
         at Neo4j.Driver.Internal.AsyncSession.<>c__DisplayClass52_0`1.<<RunTransactionAsync>b__1>d.MoveNext()
      --- End of stack trace from previous location ---
         at Neo4j.Driver.Internal.AsyncSession.<>c__DisplayClass52_0`1.<<RunTransactionAsync>b__1>d.MoveNext()
      --- End of stack trace from previous location ---
         at Neo4j.Driver.Internal.AsyncRetryLogic.RetryAsync[T](Func`1 runTxAsyncFunc)

The blocked transaction is retried by the .NET driver and eventually succeeds. The final content of the DB looks correct:

@neo4j> MATCH(n) RETURN n;
+--------------------------+
| n                        |
+--------------------------+
| (:Registry {counter: 2}) |
| (:Dummy {Id: 1})         |
| (:Dummy {Id: 2})         |
+--------------------------+

The execution plan looks like so:

@neo4j> PROFILE MATCH(reg:Registry)
        SET reg.counter = reg.counter + 1
        MERGE (d:Dummy {Id: reg.counter})<-[:Owns]-(reg);
+--------------------------------------------------------------------------------------------------+
| Plan      | Statement    | Version | Planner | Runtime   | Time | DbHits | Rows | Memory (Bytes) |
+--------------------------------------------------------------------------------------------------+
| "PROFILE" | "WRITE_ONLY" | "5"     | "COST"  | "SLOTTED" | 17   | 18     | 0    | 64             |
+--------------------------------------------------------------------------------------------------+


Cypher 5

Planner COST

Runtime SLOTTED

Runtime version 2025.12

+------------------+----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| Operator         | Id | Details                                                                        | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses |
+------------------+----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +ProduceResults  |  0 |                                                                                |             10 |    0 |       0 |              0 |                    0/0 |
| |                +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +EmptyResult     |  1 |                                                                                |             10 |    0 |       0 |                |                    0/0 |
| |                +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +Apply           |  2 |                                                                                |             10 |    1 |       0 |                |                    0/0 |
| |\               +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +LockingMerge  |  3 | CREATE (d:Dummy {Id: cache[reg.counter]}), (d)<-[anon_0:Owns]-(reg), LOCK(reg) |             10 |    1 |       4 |                |                    0/0 |
| | |              +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +Filter        |  4 | d.Id = cache[reg.counter] AND d:Dummy                                          |              0 |    0 |       4 |                |                    0/0 |
| | |              +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +Expand(All)   |  5 | (reg)-[anon_0:Owns]->(d)                                                       |              3 |    4 |       6 |                |                    0/0 |
| | |              +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| | +Argument      |  6 | reg                                                                            |             10 |    2 |       0 |                |                    0/0 |
| |                +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +SetProperty     |  7 | reg.counter = reg.counter + $autoint_0                                         |             10 |    1 |       2 |                |                    0/0 |
| |                +----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+
| +NodeByLabelScan |  8 | reg:Registry                                                                   |             10 |    1 |       2 |                |                    0/0 |
+------------------+----+--------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+

Interestingly enough, when inserting a SET clause on the registry node, the deadlock seems to go away:

MATCH(reg:Registry)
SET reg._dummy_ = true
REMOVE reg._dummy_
SET reg.counter = reg.counter + 1
MERGE (d:Dummy {{Id: reg.counter}})<-[:Owns]-(reg)

Here is the full code of the program to reproduce the problem (with the query being the first described in this post):

public static class Program
{
    public static async Task Main()
    {
        Test test = new();
        await test.DoAsync().ConfigureAwait(false);
    }
}

public class Test
{
    private N4j.IDriver _driver;
    private ILoggerFactory _loggerFactory = LoggerFactory.Create(b =>
    {
        b.AddConsole();
    });
    private ILogger _logger;
    public Test()
    {
        _logger = _loggerFactory.CreateLogger("sandbox");
        _driver = N4j.GraphDatabase.Driver
        (
            new Uri("neo4j://127.0.0.1:7687"),
            N4j.AuthTokens.None,
            b =>
            {
                b.WithLogger(new Logger(_logger));
            }
        );
    }

    public async Task DoAsync()
    {
        {
            using N4j.IAsyncSession session = _driver.AsyncSession();

            await session.ExecuteWriteAsync(InitDbAsync).ConfigureAwait(false);
        }

        // Run multiple concurrent node creation queries, in one transaction each.
        await Task.WhenAll(
            Enumerable
            .Range(0, 2)
            .Select(async i =>
            {
                using N4j.IAsyncSession session = _driver.AsyncSession();
                await session.ExecuteWriteAsync(
                    runner => CreateDummyAsync(runner, i),
                    o => o.WithTimeout(TimeSpan.FromSeconds(120))
                ).ConfigureAwait(false);
            })
        ).ConfigureAwait(false);
    }

    private async Task InitDbAsync(N4j.IAsyncQueryRunner runner)
    {
        // Clear the DB and create a fresh registry node.
        await runner.RunAsync(
            @$"CALL() {{
                 MATCH (d:Dummy|Registry)
                 DETACH DELETE (d)
               }}
               CREATE (:Registry {{counter: 0}})"
        ).ConfigureAwait(false);
    }

    private static async Task CreateDummyAsync(N4j.IAsyncQueryRunner runner, int index)
    {
        N4j.IResultCursor cursor = await runner.RunAsync(
            @$"MATCH(reg:Registry)
               SET reg.counter = reg.counter + 1
               MERGE (d:Dummy {{Id: reg.counter}})<-[:Owns]-(reg)"
        ).ConfigureAwait(false);
    }
}

[System.Diagnostics.CodeAnalysis.SuppressMessage("Usage", "CA1848: Use the LoggerMessage delegates")]
[System.Diagnostics.CodeAnalysis.SuppressMessage("Usage", "CA2254: Template should be a static expression")]
internal sealed class Logger(ILogger inner) : global::Neo4j.Driver.ILogger
{
    public void Error(Exception cause, string message, params object[] args)
        => Log(LogLevel.Error, cause, message, args);

    public void Warn(Exception cause, string message, params object[] args)
        => Log(LogLevel.Warning, cause, message, args);

    public void Info(string message, params object[] args)
        => Log(LogLevel.Information, null, message, args);

    public void Debug(string message, params object[] args)
        => Log(LogLevel.Debug, null, message, args);

    public void Trace(string message, params object[] args)
        => Log(LogLevel.Trace, null, message, args);

    private void Log(LogLevel level, Exception? cause, string message, params object[] args)
    {
        inner.Log(level, new EventId(), cause, message, args);
    }

    public bool IsTraceEnabled() => inner.IsEnabled(LogLevel.Trace);

    public bool IsDebugEnabled() => inner.IsEnabled(LogLevel.Debug);
}

Why I am asking for help

I don't understand why there is a deadlock in this scenario. It harms performance by a good deal in a real life scenario where hundreds of similar queries compete with one another, so I would like to get rid of it. I also want to understand why the query is problematic to educate myself. Adding a SET reg._dummy_ = true line seems to fix the problem but I would feel more confident that this is a real fix if I could understand the very nature of the issue.

What the documentation says

My query seems to match the example detailed here. According to the documented behavior, the SET reg.counter = reg.counter + 1 line should require the transaction to acquire a write lock on the reg node prior to its execution. The documentation is not very precise on that point but my guess is the acquired lock is of type NODE as described in this table.

The MERGE clause involves the creation of a relationship, so the doc states that prior to its execution, the transaction must acquire a write lock on the sync node. Based on the output, the table of internal locks and more random findings on the internet, I guess the internal locks required are of type NODE and NODE_RELATIONSHIP_GROUP_DELETE.

Unless I'm mistaken, the documented behavior could not lead to a deadlock. In my scenario with two concurrent transactions, the first to reach the SET reg.counter = reg.counter + 1 line wins. Since it acquires a NODE lock, the other has to wait until that first transaction completes. Once it completes, the second transaction resumes without any issues.

More poking in the dark

I read the documentation page on operators which is helpful to understand the meaning of the execution plan's binary tree.

The output I get suggests that the deadlock occurs because one transaction holds the NODE lock while the other holds the NODE_RELATIONSHIP_GROUP_DELETE lock. When looking a the execution plan, the only explanation I can imagine is the following chain of events:

  • Transaction A (Tx A) gets to the SetProperty operator first. It acquires a NODE(reg) lock which prevents Transaction B (Tx B) from traversing this operator yet
  • One way or another, Tx B, while waiting for Tx A to release its NODE(reg) lock manages to preemptively acquire the NODE_RELATIONSHIP_GROUP_DELETE(reg) lock that it will need when it reaches the LockingMerge operator. Maybe because the Apply operator start pulling data from its right hand side when the left hand side is blocked. I insist that this is only a mere guess, not supported by any of the pieces of documentation I could set my eyes on.
  • Tx A continues its execution and blocks when trying to acquire the NODE_RELATIONSHIP_GROUP_DELETE(reg) lock, because Tx B already holds it. The two transactions are in a deadlock because each holds one of the two locks required to move on.

But even this bogus reasoning would not explain why the additional SET reg._dummy_ = true line fixes the problem. I am still at a loss.

To debug a little further, I added an apoc.util.sleep(5000) line right before the MERGE clause, so that I have time to run SHOW TRANSACTIONS before the first transaction completes the sleep. That confirms that both transactions have respectively 3 and 2 locks at that point:

@neo4j> SHOW TRANSACTIONS YIELD transactionId, currentQueryId, currentQuery, status, currentQueryStatus, statusDetails, activeLockCount;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| transactionId            | currentQueryId | currentQuery                                                                                                                      | status                                 | currentQueryStatus | statusDetails | activeLockCount |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "neo4j-transaction-2794" | "query-5927"   | "SHOW TRANSACTIONS YIELD transactionId, currentQueryId, currentQuery, status, currentQueryStatus, statusDetails, activeLockCount" | "Running"                              | "running"          | ""            | 0               |
| "neo4j-transaction-2793" | "query-5926"   | "MATCH(reg:Registry)                                                                                                              | "Blocked by: [neo4j-transaction-2792]" | "waiting"          | ""            | 2               |
|                          |                \                SET reg.counter = reg.counter + 1                                                                                  |                                        |                    |               |                 |
|                          |                \                WITH reg                                                                                                           |                                        |                    |               |                 |
|                          |                \                CALL apoc.util.sleep(5000)                                                                                         |                                        |                    |               |                 |
|                          |                \                MERGE (d:Dummy {Id: reg.counter})<-[:Owns]-(reg)"                                                                  |                                        |                    |               |                 |
| "neo4j-transaction-2792" | "query-5925"   | "MATCH(reg:Registry)                                                                                                              | "Running"                              | "running"          | ""            | 3               |
|                          |                \                SET reg.counter = reg.counter + 1                                                                                  |                                        |                    |               |                 |
|                          |                \                WITH reg                                                                                                           |                                        |                    |               |                 |
|                          |                \                CALL apoc.util.sleep(5000)                                                                                         |                                        |
|                          |                |
|                          |                \                MERGE (d:Dummy {Id: reg.counter})<-[:Owns]-(reg)"                                                                  |                                        |
|                          |                |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Thanks!

Actually reading it again, it looks like a bug and because of execution timings the introduction of dummy might not be causing deadlock. Maybe if you tried that one with more than 2 (e.g. 20) you could maybe surface a deadlock as well to confirm it is a timing issue.

@dana_canzano - one to look at.

Thanks for taking the time to read my lengthy post and replying @joshcornejo

Maybe if you tried that one with more than 2 (e.g. 20) you could maybe surface a deadlock as well

I ran this query in 200 concurrent transactions (with Enumerable.Range(0, 200) in my test program):

MATCH(reg:Registry)
SET reg._dummy_ = true
REMOVE reg._dummy_
SET reg.counter = reg.counter + 1
// I enabled this later to monitor lock count. Read on.
// WITH reg
// CALL apoc.util.sleep(5000)
MERGE (d:Dummy {{Id: reg.counter}})<-[:Owns]-(reg)

That consistently runs successfully in less than 2s. I retried multiple times and never got a deadlock.

Note that when uncommenting the apoc.util.sleep(5000) call, I could see that transactions that are waiting only hold 1 lock, as opposed to the 2 locks they have when not setting the _dummy_ property. There is definitely something different going on internally.

I hope that answers your suggestion. Otherwise, let me know.

Thanks!

Can you try:

MATCH(reg:Registry)
SET reg._dummy_ = NOT (reg._dummy_)

Sure. I ran this exact query in 200 concurrent transactions. No deadlock.

Then I ran

MATCH(reg:Registry)
SET reg._dummy_ = NOT (reg._dummy_)
SET reg.counter = reg.counter + 1
MERGE (d:Dummy {{Id: reg.counter}})<-[:Owns]-(reg)

and got more interesting results:

  • I can run that in 20 concurrent transactions without encountering any deadlocks
  • When running it in 30 concurrent transactions, I see 7 failures because of deadlocks

Is that helpful?

It will be to the Neo4J staff debugging.

when you set reg._dummy_ originally it was a 1 way op ( = true), then now you have a (get value => negate value => set value) that is similar to the reg.counter = reg.counter + 1