Performance of query with DISTINCT and unique index

pska752 · July 29, 2024, 10:03pm

Hi Neo4j Community,

I have a question about query performance and the use of indices. I have performed the query

PROFILE MATCH (p:Provider) WHERE
EXISTS(p.service_provider) AND EXISTS(p.sourceID) AND EXISTS(p.valid_until)
RETURN DISTINCT(p.service_provider)

and I have got a uniqueness constraint on p.service_provider. I would have thought that the query planner picks up on this and makes use of the unique index and notices that the DISTINCT is actually redundant here. The query planner looks as follows:

However, when I got rid of the uniqueness constraint and just created a (NONUNIQUE) index on p.service_provider the query planner seems to do exactly the same.

Do I maybe have to rewrite the query somehow that in the situation where I have the uniqueness constraint (and hence a UNIQUE index) that the query planner picks up on this or does this just not happen in Neo4j (so far)?

I found similar questions about indexing and query performance online, but nothing that seems to address exactly this.

Thanks for letting me know.

Best,
Philipp

jennifer_reif · August 26, 2024, 8:38pm

Hello, Philipp! Thanks for reaching out.

I think the issue here is that your query is checking whether the property exists, not whether the value is unique or not. So your query will not hit the index because it's not checking the value of the property. An existence constraint would be more appropriate here, although then your query isn't needed because the constraint will ensure the property exists and fail if it doesn't. Documentation on each constraint type is here: Constraints - Cypher Manual.

Let me know if you have more questions on this!

Cheers,
Jennifer Reif

pska752 · August 26, 2024, 10:19pm

Hi Jennifer,

Thanks a lot for getting back to me. I am still left a little confused I have to say.

Even when just executing

PROFILE MATCH (p:Provider)
RETURN DISTINCT(p.service_provider)

then the query plan looks the same, whether I have a unique constraint on p.service_provider or not. How would I have to go about things that the redundant check for DISTINCT will be omitted, i.e. the query could be sped up?

I know that in this instance I could just leave out the DISTINCT as the property has to be unique due to the constraint that is enforced. However, say I have a database and I am not aware of which constraints are enforced, etc. how would such a query have to be written so the redundant DISTINCT check won't be executed?

Thanks for letting me know, much appreciated

jennifer_reif · August 27, 2024, 6:02pm

Got a bit more detail on this.....Cypher planner doesn't eliminate Distinct in this case. It does in some other cases in Neo4j 5.x, but nothing in 4.x.

However, your query's plan is using OrderedDistinct, which is much more efficient (improving query performance). For unique values coming from an index, OrderedDistinct will only need to keep a single row in memory, whereas plain Distinct must accumulate all rows in memory. The easiest way to get rid of OrderedDistinct is not to use DISTINCT in the query when you know that the values are unique. This method does require some knowledge of constraints and db operations, though.

pska752 · August 29, 2024, 10:07pm

Hi Jennifer,

Thanks a lot for the detailed explanation, and in particular about the difference between OrderedDistinct and (regular) Distinct in the query planner.

That helps a lot

Topic		Replies	Views
Any ideas on speeding up a query with DISTINCT and LIMIT already in place Cypher cypher	13	1245	September 20, 2019
My first post and question: Is DISTINCT going to hurt my query performance in the long run? Modeling performance , cypher , relationship	2	321	February 17, 2021
Not able to use index to get distinct values Cypher	2	987	July 1, 2019
Query Optimization - We wanna exclude the "distinct" part Cypher performance , cypher , operations	1	823	January 31, 2019
Sub-optimal query plan? Cypher performance	4	304	August 8, 2023

July Summer Fun!

Performance of query with DISTINCT and unique index

Related topics