In a query over a BTREE index, I introduced a map projection to only return the properties I was interested in, but got an unexpected hit in performance:
I have a (Node) BTREE index over the label "Node" on the single property "id"
They query was originally
MATCH (n:Node)
WHERE n.id > $bookmark
RETURN n
ORDER BY n.id
LIMIT $limit
and used the index for ordering and sorting.
However, changing it to
MATCH (n:Node)
WHERE n.id > $bookmark
RETURN n{.id, .other_field}
ORDER BY n.id
LIMIT $limit
severely impacted the performance.
The query planner showed that the first query has operations NodeUniqueIndexSeekByRange -> Limit -> ProduceResults, while the second has operations NodeUniqueIndexSeekByRange -> Projection -> Projection -> Top -> ProduceResults.
Thus, it looks like the planner doesn't look into the projection and see that it can be done after ordering.
There are simple work-arounds (like moving the ORDER BY before RETURN, or returning the fields directly instead of using a projection), but I think this should be filed as a bug in the query planner.
Yes, that works, and was one of my suggested workarounds. I just thought it was very surprising that such a minor change had such a large impact.
As a developer having written many Cypher queries, I didn't really think of re-profiling my query after (effectively) changing it from
MATCH (n:Node) WHERE n.id > $bookmark
RETURN n.id
ORDER BY n.id LIMIT $limit
to
MATCH (n:Node) WHERE n.id > $bookmark
RETURN n{.id}
ORDER BY n.id LIMIT $limit
Luckily, we had a medium sized data set in testing and an observant tester who thought the query was getting significantly slower. On that particular label run time went from 2ms to 1802ms. On the largest node set we have, run time goes from 2ms to 39000ms..
For medium sized (or slowly growing) data sets it's not that easy to discover, so do other people think it's worth to file a bug against the query planner?