I am wondering about the performance of the following string comparison in where clause:
Case 1:
MATCH (p: Person)
WHERE NOT p.name='Person 1'
AND NOT p.name='Person 2'
RETURN p
Case 2:
MATCH (p: Person)
WHERE NOT (p.name='Person 1' OR p.name='Person 2')
RETURN p
Case 3:
MATCH (p: Person)
WHERE NOT p.name IN ['Person 1', 'Person 2']
RETURN p
I have tried on my database but the result is not united, sometimes case 3 gets be best response time and db hits, sometimes the best is case 1 and case 2.
Can you please make this clear for me?
My case is more complicated and this is my example for the comparison. I have to consider this because my the property, that I need to filter out, is rarely used and it's not so selective (it has less then 10 different values).
Have you tried using the keyword EXPLAIN and PROFILE on before your query?
EXPLAIN will not run the query, but show you the predicted cost of the query
PROFILE will run the query and show the cost of the query
In the your case, the Case 1 & Case 2 should perform with around the same cost. Basically it would scan all node label Person two times, one time for 'Person 1' the second time for 'Person 2'.
In Case 3 it will only scan all the node label Person once, checking both string 'Person 1' and 'Person 2' in one go.
The query Case 3 with the IN keyword would perform even better in comparison with the other two when you actually have more strings to compare (e.g p.name IN ['Person 1', 'Person 2, 'Person 3']