I'm experiencing an unexpected performance issue with a simple Neo4j query.
The following query runs very slowly because it tries to find all relationships of the Language node (there is only one relationship to the Article in question):
PROFILE MATCH (article:Article)-[r:IN_LANGUAGE]->(language:Language)
WHERE elementId(article) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:261210970'
AND elementId(language) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:1074513'
RETURN r
However, when I use an indexed property instead, it does what I would expect:
PROFILE MATCH (article:Article)-[r:IN_LANGUAGE]->(language:Language)
WHERE article.uuid = '57686315-e27e-42bd-bbef-3736f9d896f9'
AND language.uuid = 'af8bd5c02b8211e8b26f020bc29461c8'
RETURN r
Unfortunately, my use case requires the use of node element IDs.
Does anyone know why this is happening, and whether I can optimize this query? I'm guessing that elementId() or id() works differently internally from an indexed property lookup, but this behavior still looks very strange to me.
I don't think elementId() is looked up the same way as an indexed property, and I would suggest you find a way of migrating away from elementId(), or you might start seeing weird behaviours.
Quote:
There are important considerations to bear in mind when using elementId():
1. Every node and relationship is guaranteed an element ID. This ID is unique among both nodes and relationships across all databases in the same DBMS within the scope of a single transaction. However, no guarantees are given regarding the order of the returned ID values or the length of the ID STRING values. Outside of the scope of a single transaction, no guarantees are given about the mapping between ID values and elements.
2. Neo4j reuses its internal IDs when nodes and relationships are deleted. Applications relying on internal Neo4j IDs are, as a result, brittle and can be inaccurate. It is therefore recommended to use application-generated IDs.
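If you do decide to migrate, a minimal sketch would be to backfill an application-generated id and enforce uniqueness, roughly like this (the property and constraint names are just examples):
// Backfill an application-generated id on Article nodes that lack one
MATCH (a:Article)
WHERE a.uuid IS NULL
SET a.uuid = randomUUID();
// Enforce uniqueness, which also gives you an index-backed lookup
CREATE CONSTRAINT article_uuid_unique IF NOT EXISTS
FOR (a:Article) REQUIRE a.uuid IS UNIQUE;
The same pattern would apply to your :Language nodes.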
Could you send the PROFILE plan of the faster query for comparison?
We'll want to double-check whether the slowdown is actually caused by the lookup (doubtful) or by some other aspect of how the query is planned, which we may be able to change.
Specifically, in the PROFILE plan you provided for the slower query, it looks like the planner starts from the :Language node and expands to all articles in that language before filtering down.
That would naturally be slow, but expanding in the opposite direction should be fast. We want to nudge the planner to start by matching on the :Article node, expand to the languages it's in, and then filter down to the desired node.
You can use a SKIP 0 to create a barrier for the planner: it has to solve the patterns before the SKIP before it can proceed. By moving the filter on the :Language node to the other side of the SKIP, you should cause the planner to expand from the article node.
Give this a try:
PROFILE
MATCH (article:Article)-[r:IN_LANGUAGE]->(language:Language)
WHERE elementId(article) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:261210970'
WITH article, r, language
SKIP 0
WHERE elementId(language) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:1074513'
RETURN r
It may be expanding and filtering instead of doing a NodeByElementIdSeek, but it should still be faster due to the cheaper expansion.
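If the plan still doesn't look right, another variation worth profiling is to anchor both nodes in their own MATCH clauses before matching the relationship between them. Depending on the Neo4j version and planner, this may produce a NodeByElementIdSeek on each side, so treat it as something to PROFILE rather than a guaranteed win:
PROFILE
MATCH (article:Article)
WHERE elementId(article) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:261210970'
MATCH (language:Language)
WHERE elementId(language) = '4:a5ad2f22-47ff-4fcc-a208-358646d5b3ec:1074513'
MATCH (article)-[r:IN_LANGUAGE]->(language)
RETURN r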