Why is a projection killing index query performance?

nordli · April 20, 2023, 10:14am

In a query over a BTREE index, I introduced a map projection to only return the properties I was interested in, but got an unexpected hit in performance:

I have a (Node) BTREE index over the label "Node" on the single property "id"

They query was originally

MATCH (n:Node)
  WHERE n.id > $bookmark
RETURN n
  ORDER BY n.id
  LIMIT $limit

and used the index for ordering and sorting.

However, changing it to

MATCH (n:Node)
  WHERE n.id > $bookmark
RETURN n{.id, .other_field}
  ORDER BY n.id
  LIMIT $limit

severely impacted the performance.

The query planner showed that the first query has operations NodeUniqueIndexSeekByRange -> Limit -> ProduceResults, while the second has operations NodeUniqueIndexSeekByRange -> Projection -> Projection -> Top -> ProduceResults.

Thus, it looks like the planner doesn't look into the projection and see that it can be done after ordering.

There are simple work-arounds (like moving the ORDER BY before RETURN, or returning the fields directly instead of using a projection), but I think this should be filed as a bug in the query planner.

We are using Neo4j 4.4.18-enterprise.

bennu_neo · April 20, 2023, 10:44am

Hi @nordli ,

Can you try something like

MATCH (n:Node)
WHERE n.id > $bookmark
WITH n ORDER BY n.id
LIMIT $limit
RETURN n{.id, .other_field}

?

Bennu

nordli · April 20, 2023, 11:51am

Yes, that works, and was one of my suggested workarounds. I just thought it was very surprising that such a minor change had such a large impact.

As a developer having written many Cypher queries, I didn't really think of re-profiling my query after (effectively) changing it from

MATCH (n:Node) WHERE n.id > $bookmark
RETURN n.id
  ORDER BY n.id LIMIT $limit

to

MATCH (n:Node) WHERE n.id > $bookmark
RETURN n{.id}
  ORDER BY n.id LIMIT $limit

Luckily, we had a medium sized data set in testing and an observant tester who thought the query was getting significantly slower. On that particular label run time went from 2ms to 1802ms. On the largest node set we have, run time goes from 2ms to 39000ms..

For medium sized (or slowly growing) data sets it's not that easy to discover, so do other people think it's worth to file a bug against the query planner?

bennu_neo · April 20, 2023, 12:44pm

Hi @nordli !

You are completely right about this. I just reported this internally and it will be considered as a future enhancement on the planner.

Thanks for reporting this!

nordli · April 20, 2023, 12:52pm

Thanks for picking it up!

Topic		Replies	Views
ORDER BY not using index Neo4j Graph Platform performance	3	307	June 8, 2023
Cypher: Alias length >4 chars makes query 25x slower Cypher performance , migrated , planner	8	363	April 27, 2023
Extremely slow query when profile looks very good? Cypher	11	4167	October 3, 2019
How to NOT use an index? Cypher	3	1484	June 24, 2019
How to speed up ORDER BY Cypher	10	1252	July 2, 2020

Take the Course Then Join The Aura Agent Hackathon

Why is a projection killing index query performance?

Related topics

Take the Course Then Join
The Aura Agent Hackathon