Locking a unique sequential Integer property On Create

Hey Team,

I am trying to create what is essentially an auto-increment ID on new nodes.

Requirements:

  1. Should be human readable (we are using an integer)
  2. Should be unique to the Node Label
  3. Should be sequential (auto-increment)

I first tried to "reserve" the next ID in line with a query that looks like this:

CALL {
  MATCH (n:Foo)
  WITH MAX(n.id)+1 as next_number
  RETURN next_number
}
WITH next_number
CREATE (new:Foo {id: next_number})
WITH *
SET
  new.uuid = apoc.create.uuid(),
  new._deleted = false,
  new.created_date = DATETIME()
RETURN new

There are uniqueness constraints on Foo.id and Foo.uuid.
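(For reference, the constraints were created with something like the following; the constraint names here are mine, not the real ones:)

CREATE CONSTRAINT foo_id_unique FOR (n:Foo) REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT foo_uuid_unique FOR (n:Foo) REQUIRE n.uuid IS UNIQUE;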

The issue is that this query can be run many times, even concurrently, so I end up with many Neo.ClientError.Schema.ConstraintValidationFailed errors.

I have the query nested in some code that performs a number of retries when it sees that error, to try and eventually get a valid Node.

Unfortunately, it still often fails after several retries (conflicting with other queries that are being run at the same time).

I am looking for a better/more reliable way to do this.

The best I can come up with is to first create/reserve all the nodes, ensuring each one has a unique ID, and THEN perform the rest of the work in parallel processes.

Is there not a better/simpler way in Neo4j to just grab the next available number upon the creation of a Node? (sort of like how it does it for the internal id())

Sounds like the proverbial gapless sequence... The only way to get one in a multi-user database is to serialize ID creation. The issue you have is that the new node with your next ID is not yet committed, so another concurrent user can generate the same ID.

Note: This isn't unique to Neo4j - you can visit AskTom to see people talking about this in Oracle 20 years ago.

Bottom line - if those 3 things are actual hard requirements, then you are destined to have a single-threaded path for your code.


Thanks! I suspected as much.
And yes, not unique to Neo4j, but some other DBs do have an auto-increment feature. I was just hoping I had missed it somehow :)

Your solution will not work, as you can have a race condition where any number of threads match on Foo and calculate the same max value before each creates a new Foo with the next number. This is because you have not done anything to serialize the calculation of the next sequence number.

What I have done is create a node for each sequence generator I need. I then query for this node, update its sequence number, and return the new value. I do this in a transaction, and I lock the node with a 'SET' operation before I do anything else. This locks the node, making all threads serialize behind the lock if there is a race condition among multiple threads. I have done this in a custom procedure for my application because I need it for other custom procedures, but I have created an example with some JUnit tests to show you the approach and the concurrency issue.

Here is a snippet of code with the concurrency issue. This is running in a test container. First I delete all data and then create a new 'UniqueId' node, which is used to track the sequence for invoice numbers. The sequence is seeded with a starting id of 100. In the test method, I create 1000 tasks, each trying to get the next sequence number by matching on the node with the name 'invoice', calculating the next number in the sequence, and updating the node. The 'with' clause does not lock the node, allowing any number of threads to read the same value of id, thus calculating the same value of id+1 and persisting (overwriting) and returning the same value.

    @BeforeEach
    void setUpdata() {
        try(final Session session = driver.session()) {
            session.run("MATCH(n:UniqueId) DELETE n");
            session.run("CREATE(:UniqueId {name:'invoice', id:100})");
        }
    }

    @Test
    public void testConcurrencyWithCallable() throws InterruptedException {
        List<Future<Long>> futures = new ArrayList<>();
        Callable<Long> callable = () -> {
            try (final Session session = driver.session()) {
                return session.executeWrite(tx -> {
                    Result result = tx.run("""
                                MATCH (id:UniqueId{name:'invoice'})
                                WITH id, id.id as x
                                SET id.id = x + 1
                                RETURN id.id as id"""
                    );
                    return result.single().get("id").asLong();
                });
            }
        };
        for (int i = 0; i < 1000; i++) {
            futures.add(threadPool.submit(callable));
        }

        List<Long> ids = new ArrayList<>();
        futures.forEach(x -> {
            try {
                Long id = x.get();
                ids.add(id);
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        });

        ids.sort(Comparator.comparingLong(x->x));
        ids.forEach(System.out::println);

        // shut down first, then wait; awaitTermination before shutdown would block for the full timeout
        threadPool.shutdown();
        threadPool.awaitTermination(10, TimeUnit.SECONDS);
    }

Following is the beginning of the sequence of ids returned. As you can see, there are a lot of duplicates because multiple threads concurrently read the same id value in the 'with' clause.

100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
100
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
101
102
102
102
102
102
102
102
102
102

To fix this issue, we need to lock the UniqueId node with the name 'invoice', so we can calculate the next sequence number sequentially. To do this, we immediately execute a 'set' operation so the node is locked, then calculate the next value by adding 1. Below is the new query to get the next sequence number and update the node in a serialized fashion.

MATCH (id:UniqueId{name:'invoice'})
SET id.id = id.id + 1
RETURN id.id as id

The following is the sequence of id's with the updated query:

101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128

You can do something like this in a transaction, which will serialize the update of this node, giving you a sequence generator without duplicates.

MATCH (n:Foo{name:'sequence_name'})
SET n.next_number = n.next_number + 1
//Set other attributes
RETURN n.next_number
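Applied back to the original question, the sequence update and the node creation can even be combined in a single transaction. This is only a sketch: it assumes a dedicated :Sequence node seeded with a next_number property, and reuses the property names from the first post.

MATCH (s:Sequence {name:'foo'})
SET s.next_number = s.next_number + 1
WITH s.next_number AS next_number
CREATE (new:Foo {id: next_number})
SET new.uuid = apoc.create.uuid(),
    new._deleted = false,
    new.created_date = datetime()
RETURN new

Because the SET on the :Sequence node takes its write lock first, concurrent transactions serialize there, and the uniqueness constraint on Foo.id should no longer be violated.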

If you need to do other stuff before calculating the next number, you can set a temporary property to lock the node, do the other needed processing, then remove the temporary property at the end of the query.

MATCH (n:Foo{name:'sequence_name'})
SET n._LOCK = true
//do other stuff
SET n.next_number = n.next_number + 1
REMOVE n._LOCK
RETURN n.next_number

An interesting implementation would be to wrap this code in an APOC custom procedure so you can call it as a procedure when needed, eliminating the need to duplicate the code in each query that needs a sequence number.

Here is an example. Create the custom procedure in the 'system' database. Execute :use system to switch to the system database, and :use neo4j to switch back.

CALL apoc.custom.installProcedure(
    'getNextSequenceNumber(sequence::STRING) :: (nextValue::INT)', 
    '
     MATCH (n:Sequence{name:$sequence})
     SET n.next_number = n.next_number + 1
     RETURN n.next_number as nextValue
    ',
    'neo4j',
    'write'
)

After creating the Sequence node with name equal to 'sequence_name' and an initial next_number value of 100, you can get the next number in the sequence as follows:
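The seed node itself can be created like this (name and starting value as described above):

CREATE (:Sequence {name:'sequence_name', next_number:100})

Since the procedure increments before returning, the first call will yield 101.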

call custom.getNextSequenceNumber('sequence_name') yield nextValue
return nextValue

You can create any number of sequence nodes; just use a different name value that represents what the sequence is for, then call the custom procedure with that sequence's name.

If you could relax your requirement on sequential numbers a little, you could use apoc.atomic.add with your sequence node. This will allow you to minimize the blocking of your concurrent processes and still get unique numbers with greater throughput overall. The only caveat is that this updates the number on the sequence node in a separate transaction, so if your transaction fails and rolls back, or the reserved number is never used, that number is lost and you will have a gap in your sequence.
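A sketch of that approach, reusing the :Sequence node from above (apoc.atomic.add also accepts an optional retry count as a fourth argument):

MATCH (s:Sequence {name:'invoice'})
CALL apoc.atomic.add(s, 'next_number', 1) YIELD newValue
RETURN newValue AS next_number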