Fast with Properties, Slow with elementId() — Why? Fixable?

Hi community,

I'm experiencing an unexpected performance issue with a simple Neo4j query.

The following query runs very slowly because it appears to expand all relationships of the Language node, even though only one of them connects to the Article:

PROFILE MATCH (article:Article)-[r:IN_LANGUAGE]->(language:Language)
WHERE elementId(article) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:261210970'
  AND elementId(language) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:1074513'
RETURN r

However, using an indexed property, it does what I would expect:

PROFILE MATCH (article:Article)-[r:IN_LANGUAGE]->(language:Language)
WHERE article.uuid = '57686315-e27e-42bd-bbef-3736f9d896f9'
  AND language.uuid = 'af8bd5c02b8211e8b26f020bc29461c8'
RETURN r

Unfortunately, my use case requires the use of node element IDs.

Does anyone know why this is happening, and whether I can optimize this query? I'm guessing that internally elementId() or id() works differently from indexed properties, but this behavior still looks very weird to me.

Thanks for your help!

I don't think elementId() is hashed the same way as an index, and I would suggest you figure out a way of migrating away from elementId(), or you might start seeing weird behaviours.

Quote:

There are important considerations to bear in mind when using elementId():

  1. Every node and relationship is guaranteed an element ID. This ID is unique among both nodes and relationships across all databases in the same DBMS within the scope of a single transaction. However, no guarantees are given regarding the order of the returned ID values or the length of the ID STRING values. Outside of the scope of a single transaction, no guarantees are given about the mapping between ID values and elements.
  2. Neo4j reuses its internal IDs when nodes and relationships are deleted. Applications relying on internal Neo4j IDs are, as a result, brittle and can be inaccurate. It is therefore recommended to use application-generated IDs.
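
As a rough sketch of what application-generated IDs can look like in Cypher: the constraint name article_uuid below is made up for illustration, and the uuid property is the one already used in the question. On a large graph the backfill would probably need to be batched rather than run as one transaction.

```cypher
// Hypothetical constraint name; enforces uniqueness and backs lookups with an index.
CREATE CONSTRAINT article_uuid IF NOT EXISTS
FOR (a:Article) REQUIRE a.uuid IS UNIQUE;

// Backfill nodes that don't have an ID yet (randomUUID() is a built-in Cypher function).
MATCH (a:Article)
WHERE a.uuid IS NULL
SET a.uuid = randomUUID();
```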

@joseraul

I'm in 1000% agreement with @joshcornejo's response, simply because what is documented is documented and true.

However, which version of Neo4j did you encounter this on?

Could you send the PROFILE plan of the faster query for comparison?
We'll want to double-check if the slowdown is actually because of the lookup (doubtful) or some other aspect of how the query is planned which we may be able to change.

Specifically, in the PROFILE plan for the slower query which you provided, it looks like it's starting from the :Language node and expanding to all articles in that language before filtering down.

That would naturally be slow, but expanding in the opposite direction should be fast. We would want to nudge the planner to start by matching on the :Article node, and expanding to the languages that it's in, and filtering down to the desired node.

You can use a SKIP 0 to create a barrier to the planner, where it will have to solve patterns prior to the SKIP before it can proceed. By putting the filtering of the :Language node on the other side of the SKIP, it should cause the planner to expand from the article node.

Give this a try:

PROFILE
MATCH (article:Article)-[r:IN_LANGUAGE]->(language:Language)
WHERE elementId(article) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:261210970'
WITH article, r, language
SKIP 0
WHERE elementId(language) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:1074513'
RETURN r

It may be expanding and filtering instead of doing a NodeByElementIdSeek, but it should still be faster due to the cheaper expansion.
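
For reference, here is a parameterized variant of the same barrier idea. This is only a sketch: $articleId and $languageId are assumed parameter names (not from the original post), and the barrier is placed right after the article lookup so that the article has to be resolved before the relationship pattern is solved. The planner may still pick a different plan depending on version and statistics, so it's worth confirming with PROFILE.

```cypher
// Sketch only: $articleId and $languageId are assumed parameter names.
PROFILE
MATCH (article:Article)
WHERE elementId(article) = $articleId
// SKIP 0 acts as a barrier: the article is resolved before the pattern below is solved.
WITH article
SKIP 0
MATCH (article)-[r:IN_LANGUAGE]->(language:Language)
WHERE elementId(language) = $languageId
RETURN r
```

Passing the IDs as parameters instead of literals also lets the same cached plan be reused across different articles and languages.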

Thanks to all for the reply!

Yessss, we will definitely make the transition, but at the moment I need to stick with IDs. The version of our Neo4j instance is 5.26.1.

@andrew_bowman The PROFILE for the faster query is:

As you said, in the slower query it looks to me like it is "choosing" the slow path, the one that has more relationships.

I just tried your suggestion and it definitely works fast, thanks man!