How to create a unique index for a property > 8k in Neo4j 5+?

I want to create a unique index on a label/property combination where the property values can be longer than ~8k.

A simple

CREATE CONSTRAINT my_unique_index FOR (r:Chemical) REQUIRE (r.fingerprint) IS UNIQUE;

does not work when supplying a property value > 8k; I'm getting

Neo.DatabaseError.Statement.ExecutionFailed
Property value is too large to index, please see index documentation for limitations. Index: Index( id=3, name='my_unique_index', type='RANGE', schema=(:Chemical {fingerprint}), indexProvider='range-1.0', owningConstraint=4 ), element id: 4:6d6230f8-5a72-4075-babf-3f1ce5fc8d80:10, property size: 18011.

I fully understand that range-1.0 index provider has limitations. I could use CREATE TEXT INDEX ... for building an index to support longer strings.
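For reference, a plain (non-unique) text index in Neo4j 5 can be declared like this (index name is just a placeholder):

```
CREATE TEXT INDEX chemical_fingerprint_text IF NOT EXISTS
FOR (c:Chemical) ON (c.fingerprint);
```

But that only gives lookup support, not uniqueness enforcement.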

Is there a way to use text-2.0 index provider for a unique constraint?

Back in Neo4j 4.x this was perfectly possible using:

call db.createUniquePropertyConstraint('my_unique_constraint', ['Chemical'], ['fingerprint'], 'lucene+native-3.0')

So far I couldn't find a way to reach feature parity between Neo4j 4.x and Neo4j 5.x (and higher) for this scenario. Any hints?

Maybe just split your fingerprint into f1 and f2 (I have never tested this, but I hope we can game the system this way).

CREATE CONSTRAINT my_unique_index FOR (r:Chemical) REQUIRE (r.f1, r.f2) IS UNIQUE;

I hope you are well :hugs:

EDIT: This will not work; the ~8k limit is shared between the keys :sad_but_relieved_face:


If this synopsis from Claude is correct:

  • Index Acceleration: Uniqueness constraints benefit from Neo4j's index capabilities
  • Cache Impact: Constraint metadata is kept in Neo4j's schema cache
  • Execution Cost: Existence constraints add minimal overhead, while uniqueness constraints add index lookup costs

And you have a lot of Chemical nodes, you might be penalising your performance by having extremely large property values. So, would it not be possible to store a hash of those keys instead (either calculated in your code before inserting, or via a procedure), using SHA-256 or BLAKE3 depending on your needs:

hashFingerprint = blake3(fingerprint)

and then

CREATE CONSTRAINT hashIndex FOR (r:Chemical) REQUIRE (r.hashFingerprint) IS UNIQUE;
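A minimal Python sketch of the client-side hashing idea (the property name mirrors the Cypher above; SHA-256 from the standard library stands in for BLAKE3, which would need a third-party package):

```python
import hashlib

def fingerprint_hash(fingerprint: str) -> str:
    """Reduce an arbitrarily long fingerprint to a fixed 64-character
    hex key that fits well under the ~8k range-index limit."""
    return hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()

# Even a fingerprint far beyond 8k maps to a 64-character key.
long_fp = "C1=CC=CC=C1" * 2000   # stand-in for a real >8k fingerprint
key = fingerprint_hash(long_fp)
print(len(key))  # 64
```

You would then write `hashFingerprint` alongside `fingerprint` on every insert and let the unique constraint live on the hash.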

I see some internal chatter about the index provider (sounds like an "I would not hold my breath" situation). So finding a way of reducing the size of the fingerprints is likely the only option, or some other hack that keeps a shorter key mapped to the fingerprint somewhere else.


Thanks for your answers. The easiest way would indeed be to use an APOC trigger that hashes the long property value and stores it in a secondary property carrying the unique constraint.
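Such a trigger could look roughly like this (a sketch only; the trigger and property names are placeholders, `apoc.trigger.install` must be run against the system database, and APOC triggers need `apoc.trigger.enabled=true`):

```
CALL apoc.trigger.install(
  'neo4j',              // target database (assumption)
  'hash_fingerprint',
  "UNWIND $createdNodes AS n
   WITH n WHERE n:Chemical AND n.fingerprint IS NOT NULL
   SET n.hashFingerprint = apoc.util.sha256([n.fingerprint])",
  {phase: 'before'}     // run before commit so the hash lands in the same tx
);
```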

An alternative would be to use a text index (aka the text-2.0 provider) on the given property and use MERGE in a single-threaded way to prevent race conditions.
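The MERGE variant might look like this (assuming a text index on :Chemical(fingerprint) to back the equality lookup, and a single writer so two concurrent transactions cannot both miss the match):

```
MERGE (c:Chemical {fingerprint: $fingerprint})
ON CREATE SET c.createdAt = datetime()
RETURN c;
```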

Does index provider lucene+native-3.0 still exist under the hood? If so, I could check the old source code of the db.createUniquePropertyConstraint procedure and try to reimplement it as a custom procedure.

Sorry, lucene+native-3.0 was removed in Neo4j 5. The BTREE index was overloaded trying to handle too much, which led to splitting it into RANGE, POINT, and TEXT indexes.

We are moving away from public index providers to focus on index types; providers have effectively become versions, and we want people using the latest version of an index type. The public surface for specifying an index provider has already been removed in the latest version of Cypher, and it will be ignored in many other existing cases.

It would be best to use a long, high-entropy hash of the relevant data you wish to have uniqueness over, potentially salted or affixed with a relevant UUID for that data, to help minimise collisions.
