How to NOT use an index?

chris3 · June 24, 2019, 4:02pm

Hello everyone!

I have been using Neo4j for a while now but just realized that I have a performance problem with the following query (simplified):

MATCH (u:User)<-[:U]-(:Profits)-[:DP]->(dp:DailyProfit) WHERE ID(u)=12345 AND dp.localDate>='20190601' AND dp.localDate<='20190624' RETURN dp

(FYI Result = only 24 DailyProfit nodes, one per day)

Instead of using the index on the ID, getting about 500 dp nodes and then filtering those on the localDate property, the query planner uses a NodeIndexSeekByRange on all of the 500k (!) DailyProfit nodes in the graph ... which takes about 1.5 seconds (!)

BTW the query with just the localDate>=x (and not a range >= and <=) takes 18ms (milli seconds that is!) because it does exactly what I would expect: Start with a single User node and filter the ~500 DailyProfit nodes...

Can I tell the query planner to start with the index on the ID of the user and filter the very small result with a range filter somehow?

Or any other suggestions to optimize this so that the planner doesn't always start with the range seek just because it thinks that is the fastest way (because of course it isn't since there are about 500 DailyProfits nodes per user but 500k in the entire DB)?

THX!

andrew_bowman · June 24, 2019, 4:34pm

So this is a frustrating thing, the planner REALLY wants to use that index seek by range. Even if I massage the query to perform that lookup by id, it still wants to use an index seek by range and use a hash join between the results.

I'll raise this with the Cypher team, but in the meantime we can throw something into the query which will cause the planner to only use the localDate predicates as a filter and not for lookup. Try this out, verify it with EXPLAIN:

MATCH (u:User)<-[:U]-(:Profits)-[:DP]->(dp:DailyProfit) 
WHERE ID(u)=12345 
WITH u, dp, true as ignored // introducing a new variable to throw off the planner
WHERE dp.localDate>='20190601' AND dp.localDate<='20190624' 
RETURN dp

chris3 · June 24, 2019, 5:06pm

Hey Andrew,
I have tried all sorts of stuff ... but your solution is genius AND it works!
THX!!!
Cheers, Chris

PS: I was thinking along the lines of USING INDEX u:User(*ID*) ... or NOT USING INDEX dp:DailyProfit(localDate) ... or some way of telling the planner where to start since I know that it would be smarter to start at the user-index...

andrew_bowman · June 24, 2019, 5:09pm

By default the planner SHOULD be preferring the graph id lookup (it isn't an index lookup so we can't use a hint for it) over all the others.

And in the case where I could get it to use the graph id lookup, it just couldn't let go of the index range scan on the others, so it would still be inefficient, so I had to resort to trickery.

I've raised the suggestion with our Cypher team on introducing a new planner hint to disable a lookup of any type on a node, so only expansions to that node would be possible. We'll see if that gets picked up later.

Topic		Replies	Views
How do I get this query to use an index? Cypher performance , cypher	7	355	February 16, 2023
Index is not always used Cypher	9	2196	November 13, 2018
Cypher: Alias length >4 chars makes query 25x slower Cypher performance , migrated , planner	8	363	April 27, 2023
How to get query plan and how to create and use an index on id? Cypher	10	526	August 4, 2020
Why is a projection killing index query performance? Cypher performance	4	226	April 20, 2023

Take the Course Then Join The Aura Agent Hackathon

How to NOT use an index?

Related topics

Take the Course Then Join
The Aura Agent Hackathon