I have a question about query performance and the use of indices. I have performed the query
PROFILE MATCH (p:Provider) WHERE
EXISTS(p.service_provider) AND EXISTS(p.sourceID) AND EXISTS(p.valid_until)
RETURN DISTINCT(p.service_provider)
and I have got a uniqueness constraint on p.service_provider. I would have thought that the query planner picks up on this and makes use of the unique index and notices that the DISTINCT is actually redundant here. The query planner looks as follows:
However, when I got rid of the uniqueness constraint and just created a (NONUNIQUE) index on p.service_provider the query planner seems to do exactly the same.
Do I maybe have to rewrite the query somehow that in the situation where I have the uniqueness constraint (and hence a UNIQUE index) that the query planner picks up on this or does this just not happen in Neo4j (so far)?
I found similar questions about indexing and query performance online, but nothing that seems to address exactly this.
I think the issue here is that your query is checking whether the property exists, not whether the value is unique or not. So your query will not hit the index because it's not checking the value of the property. An existence constraint would be more appropriate here, although then your query isn't needed because the constraint will ensure the property exists and fail if it doesn't. Documentation on each constraint type is here: Constraints - Cypher Manual.
Thanks a lot for getting back to me. I am still left a little confused I have to say.
Even when just executing
PROFILE MATCH (p:Provider)
RETURN DISTINCT(p.service_provider)
then the query plan looks the same, whether I have a unique constraint on p.service_provider or not. How would I have to go about things that the redundant check for DISTINCT will be omitted, i.e. the query could be sped up?
I know that in this instance I could just leave out the DISTINCT as the property has to be unique due to the constraint that is enforced. However, say I have a database and I am not aware of which constraints are enforced, etc. how would such a query have to be written so the redundant DISTINCT check won't be executed?
Got a bit more detail on this.....Cypher planner doesn't eliminate Distinct in this case. It does in some other cases in Neo4j 5.x, but nothing in 4.x.
However, your query's plan is using OrderedDistinct, which is much more efficient (improving query performance). For unique values coming from an index, OrderedDistinct will only need to keep a single row in memory, whereas plain Distinct must accumulate all rows in memory. The easiest way to get rid of OrderedDistinct is not to use DISTINCT in the query when you know that the values are unique. This method does require some knowledge of constraints and db operations, though.