ID() function deprecated? How to replace easily?

olivier_sibuet · May 17, 2023, 10:43am

The ID() function is quite a basic function in neo4j, and I am very surprised to see that it is likely to be deprecated soon in version 5... We have used the ID() function for several years, as it remains an efficient way to find nodes, e.g. referential data or specific nodes.
If the ID() function is deprecated, then we will have thousands of cypher queries to update in our app. I would like to ask (and make sure) if in future neo4j versions:

the existing internal identifier (ID) of a node will be kept in the database (and remain constant)?
if yes, will it still be in a 'number' format, e.g. 234354 (especially for nodes created before deprecation)?
how would you suggest we adapt all our queries? Is there another way to access the id value (today this is a number)? E.g. could I simply replace 'ID(' everywhere in my code with something else? Is there an apoc function which replaces the id() function? I hope you can help and provide us with a cheap and time/cost-effective way to manage this deprecation.
Thanks a lot! cheers!

cobra · May 17, 2023, 2:11pm

Hello @olivier_sibuet

With Neo4j 5, you should use the elementId() function. You can find more information here and here.

Best regards,
Cobra

olivier_sibuet · May 17, 2023, 3:21pm

Thanks cobra,

Quick question: in my app, I need to check whether a node corresponds to a specific value of the old ID(node).
Do you think I can use this to extract the "old" internal id and check whether it is equal to a specific "old" value? (Below, n is a node.)
toInteger(apoc.text.split(elementId(n), ":")[2])

I have checked my whole database and I can see that
toInteger(apoc.text.split(elementId(n), ":")[2]) = ID(n) holds on all nodes of my database. But might this change in the future?

cobra · May 17, 2023, 3:39pm

I'm sorry, I don't fully understand the question, but you should know that the internal ID of Neo4j must never be used. elementId() should give you the ID you need.

olivier_sibuet · May 18, 2023, 11:27am

Thanks Cobra for taking time to answer. From our perspective, using the id is really an efficient way to access nodes!

The really bad thing is that elementId() does not return the same value as id(). The value returned by elementId() is neither readable nor user-friendly, unfortunately... The id() format is also much more convenient than the elementId() format.

Of course we know that the id can be reallocated, but if we use only one database, and if some nodes are never deleted, then we are sure that these nodes will always keep their original id. Also, this id is guaranteed to be unique within our database, on a cluster!

I can imagine that a lot of neo4j clients are going to be very disappointed by such a deprecation. The impact on our development will be super EXPENSIVE (maybe hundreds of man-days of stupid work). Do you know the reason for deprecating such a useful function as ID()?
Is there still a chance that it will not get deprecated?
So far, I am testing 5.7 and fortunately, it still works.

Cheers

dana_canzano · May 18, 2023, 4:25pm

@olivier_sibuet
Doesn't make it less right or more wrong, but:

Keep in mind that Neo4j reuses its internal IDs when nodes and relationships are deleted. This means that applications using, and relying on, internal Neo4j IDs, are brittle or at risk of making mistakes. It is therefore recommended to use application-generated IDs instead.

olivier_sibuet · May 19, 2023, 6:37am

Thanks Dana. Of course we do pay attention to this, and you are right to remind everybody of it.
As I said, we have some nodes which are never deleted (we are sure about that). This is why we use id() so many times in our queries. I will be happy to show your team our data model if you wish.

Have a good day

cobra · May 21, 2023, 10:01am

Hello @olivier_sibuet

I think you can go with toInteger(apoc.text.split(elementId(n), ":")[2]) = ID(n), but I can't guarantee that it will not change in the future. You should go with the randomUUID() function.
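
A minimal sketch of that suggestion, assuming a hypothetical `Person` label and a hypothetical `uuid` property (neither comes from this thread): backfill an application-generated ID once, then look nodes up by that property instead of by id().

```cypher
// Backfill: give every Person node that has no uuid yet a generated one.
// `Person` and `uuid` are illustrative names, not taken from the original posts.
MATCH (n:Person)
WHERE n.uuid IS NULL
SET n.uuid = randomUUID();

// Lookup by the application-generated ID, passed in as the $uuid parameter.
MATCH (n:Person {uuid: $uuid})
RETURN n;
```

On a large graph the backfill would probably need to be batched, and the lookup only becomes fast once the property is indexed or has a unique constraint.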

Best regards,
Cobra

stephanie.ranft · May 21, 2023, 11:45pm

Hey @cobra,

Wouldn't it be simpler to use toInteger(split(elementId(n), ":")[-1]), or is there a reason you're using the apoc function?

cobra · May 22, 2023, 11:24am

Hey @stephanie.ranft

I just took the command line from the previous message but it's better to use the split() function of the Cypher language.
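
For reference, the database-wide check described earlier could be written with plain split() roughly as below. This is only a sketch: it assumes the elementId() layout seen in this thread (the numeric id after the last colon), which is not a documented guarantee.

```cypher
// Count nodes whose trailing elementId() segment differs from the legacy id().
// Relies on an undocumented string layout, so treat the result as a one-off audit only.
MATCH (n)
WHERE toInteger(split(elementId(n), ":")[-1]) <> id(n)
RETURN count(n) AS mismatches;
```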

Best regards,
Cobra

olivier_sibuet · May 22, 2023, 1:14pm

Thank you both! At night, now I just dream that the original ID() function will just NEVER be deprecated, as the impact on our side is so huge and expensive!...

Does anybody know why this will be deprecated?

PS: If the deprecation is confirmed, I'm afraid we will just keep the current neo4j enterprise version 5 as it is, and we will never upgrade it... It's really part of our foundation/principles for all the cypher queries that we developed during 3 years (in order to get efficiency, without any additional index).

dana_canzano · May 22, 2023, 1:49pm

@olivier_sibuet

cypher queries that we developed during 3 years (in order to get efficiency, without any additional index)

But why? Why is an index simply not an option? If I were to hazard a guess, of all the Neo4j installs I would think 85% or more are using indexes rather than id() to perform fast lookups.

dana_canzano · May 22, 2023, 2:08pm

@olivier_sibuet

also if for example I run

 profile match (n) where elementid(n)='4:48793436-bb10-42f5-95ff-f3755fd97a2f:1001' return n;

we do see a plan which uses

Planner COST

Runtime PIPELINED

Runtime version 5.7

Batch size 128

+----------------------+----+--------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+
| Operator             | Id | Details                              | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline      |
+----------------------+----+--------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+
| +ProduceResults      |  0 | n                                    |              1 |    1 |       2 |                |                    1/0 |     0.402 |               |
| |                    +----+--------------------------------------+----------------+------+---------+----------------+------------------------+-----------+               |
| +NodeByElementIdSeek |  1 | n WHERE elementId(n) = $autostring_0 |              1 |    1 |       1 |            120 |                    1/0 |     1.946 | In Pipeline 0 |
+----------------------+----+--------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------+

Total database accesses: 3, total allocated memory: 184

and I presume NodeByElementIdSeek is the equivalent of the NodeByIdSeek operator that appears when filtering on id(n) = ...

olivier_sibuet · May 22, 2023, 3:11pm

Thanks Dana!

You are right, creating an index will always be an option. Unfortunately, the impact will be huge for us to upgrade our code and we have no time for this in coming months.

Also, as we had a few issues in the past when trying to generate unique functional IDs on a cluster with a large volume of parallel requests, we simply decided to keep the internal ID. But we were still on our learning curve at that time...

Here is an example of such an issue: on a cluster, with hundreds of cypher queries run in parallel, we occasionally had duplicate functional ID numbers generated. This was due (I believe) to the time needed to replicate the WRITE cluster node to all the READ cluster nodes. We had several ways to solve this issue, but the simplest one was just to rely on the neo4j internal ID, which is unique for sure. Also, ID() has a user-friendly format... (the elementId format unfortunately is not really easy to "read" ;-)

dana_canzano · May 22, 2023, 3:27pm

@olivier_sibuet
I'll agree the value of id() is easier to read than elementId(), but wouldn't one, for example, write

match (n:Person {name: 'Dana'}) with id(n) as internal  match .....

and wouldn't this switch to

match (n:Person {name: 'Dana'}) with elementId(n) as internal  match .....

olivier_sibuet · May 23, 2023, 6:45am

Most of our queries would be like this:
Match (s:Shop)<-[r1:IN_SHOP]-(o:Order)<-[r0:ORDER]-(n:Person) where id(n) = 12345 and r0.orderDate = date() and ID(s) = 34566
return ID(o) as orderReferenceId

@Dana, do you know why the neo4j team really wants to deprecate the id() function? (It's just up to us, as clients of neo4j, to know the limits of using the id() function.)

michael.hunger · May 23, 2023, 9:05am

There is a lot of history here.

As id() represents the internal storage offset in Neo4j, it should never have leaked into the API or even been exposed to users in the first place, but it did, and it was fast.

It comes with a lot of drawbacks though. The most pressing one is reuse after deletion, so your id can point to a different entity after a while. The second is that in a clustered setup with shards/multi-db it is not possible to derive from an id which database the entity came from and route queries correctly back to that database.

All those things plus a bit more were the intention behind adding elementId()

The original proposal was just to change the data type of id() from long to string and make it an opaque value. But that would have been even more devastating, so we avoided it by adding a new function that allows migration.

And you should make no assumptions about the internal format of elementId(); it can and will change without notice. It is meant to be an opaque string.

The correct way as Dana points out is to use "business-identifiers" (think user-id, ISBN, email address, oauth-key, SSN) and create a unique constraint per label + key(s) for efficient lookups.
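
As a rough illustration of that advice, here is a sketch using Neo4j 5 constraint syntax. The `Person` label, the `email` property and the constraint name are only examples, not taken from anyone's actual model.

```cypher
// Unique constraint on a business identifier (Neo4j 5 syntax).
// The constraint is backed by an index, so lookups on :Person(email) can use it.
CREATE CONSTRAINT person_email_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.email IS UNIQUE;

// Lookup by the business identifier, supplied as the $email parameter.
MATCH (p:Person {email: $email})
RETURN p;
```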

Btw. you should also use parameters and not literal values in your statements!
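
For instance, the shop/order query posted above might look roughly like this with parameters and elementId(). The parameter names `$personElementId` and `$shopElementId` are made up for the example.

```cypher
// Same pattern as the earlier query, with the literal ids replaced by parameters.
// The parameters hold elementId() strings previously returned by the database.
MATCH (s:Shop)<-[:IN_SHOP]-(o:Order)<-[r0:ORDER]-(n:Person)
WHERE elementId(n) = $personElementId
  AND elementId(s) = $shopElementId
  AND r0.orderDate = date()
RETURN elementId(o) AS orderReferenceId;
```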

olivier_sibuet · May 23, 2023, 10:23am

Thanks Michael, very honored to get your support as well. Now crystal clear. Thank you all.

Btw. as we just use one database, with nodes never deleted, we were not facing all these issues. But I can now understand where you are going, and the reasons. If id() really disappears in version 6.0, and if we need to upgrade, we will invest and rewrite hundreds/thousands of queries. Understood.

Best regards,

Olivier

PS : and you are right, we use parameters ;-) (it was just to give you an example of an integer id)

dimitri1 · May 23, 2023, 7:52pm

Hi all,

While I understand that the ID() function should be used with caution, quite a few apoc functions return the ID of the node. For instance, apoc.refactor.cloneSubgraph returns the input node as an integer: apoc.refactor.cloneSubgraph - APOC Documentation

How to match the input node when the ID() function is no longer to be used?

olivier_sibuet · July 6, 2023, 10:00am

Replacing the id() calls that we made in thousands of queries is becoming a real nightmare…
