Why is a projection killing index query performance?

In a query over a BTREE index, I introduced a map projection to only return the properties I was interested in, but got an unexpected hit in performance:

I have a (Node) BTREE index over the label "Node" on the single property "id"

They query was originally

MATCH (n:Node)
  WHERE n.id > $bookmark
RETURN n
  ORDER BY n.id
  LIMIT $limit

and used the index for ordering and sorting.

However, changing it to

MATCH (n:Node)
  WHERE n.id > $bookmark
RETURN n{.id, .other_field}
  ORDER BY n.id
  LIMIT $limit

severely impacted the performance.

The query planner showed that the first query has operations NodeUniqueIndexSeekByRange -> Limit -> ProduceResults, while the second has operations NodeUniqueIndexSeekByRange -> Projection -> Projection -> Top -> ProduceResults.

Thus, it looks like the planner doesn't look into the projection and see that it can be done after ordering.

There are simple work-arounds (like moving the ORDER BY before RETURN, or returning the fields directly instead of using a projection), but I think this should be filed as a bug in the query planner.

We are using Neo4j 4.4.18-enterprise.

Hi @nordli ,

Can you try something like

MATCH (n:Node)
WHERE n.id > $bookmark
WITH n ORDER BY n.id
LIMIT $limit
RETURN n{.id, .other_field}
  

?

Bennu

Yes, that works, and was one of my suggested workarounds. I just thought it was very surprising that such a minor change had such a large impact.

As a developer having written many Cypher queries, I didn't really think of re-profiling my query after (effectively) changing it from

MATCH (n:Node) WHERE n.id > $bookmark
RETURN n.id
  ORDER BY n.id LIMIT $limit

to

MATCH (n:Node) WHERE n.id > $bookmark
RETURN n{.id}
  ORDER BY n.id LIMIT $limit

Luckily, we had a medium sized data set in testing and an observant tester who thought the query was getting significantly slower. On that particular label run time went from 2ms to 1802ms. On the largest node set we have, run time goes from 2ms to 39000ms..

For medium sized (or slowly growing) data sets it's not that easy to discover, so do other people think it's worth to file a bug against the query planner?

Hi @nordli !

You are completely right about this. I just reported this internally and it will be considered as a future enhancement on the planner.

Thanks for reporting this!

Thanks for picking it up! :slight_smile: