I have a Cypher query:
MATCH (x:My_Label)
WHERE x.DateProperty.epochMillis <= 1679443200000
RETURN count(x)
- There are approximately 7M nodes in my DB.
- My_Label refers to approximately 500k nodes.
- Of the 500k nodes, about 45k match the `WHERE` clause.
- The query takes about 10 seconds to run from cold and about 4 seconds on reruns.
- The query planner shows 500k db hits for stage 1 of the query (NodeByLabelScan), but then shows 28M db hits for stage 2 (Filter); see the `PROFILE` sketch after this list.
- The server has 128 GB of RAM and only runs Neo4j.
- The server is not under any particular load.
- Each node has a few small properties on it.
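
For reference, the per-stage db hit numbers above come from profiling the query, roughly like this (a sketch; as noted further down, some names in the real plan have been altered):

```cypher
// Profile the counting query to get per-operator rows and db hits
PROFILE
MATCH (x:My_Label)
WHERE x.DateProperty.epochMillis <= 1679443200000
RETURN count(x)
```

(`EXPLAIN` would show the plan without executing it; the db hit counts quoted above are from an actual `PROFILE` run.)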
So here are my questions:
- With all that free RAM, I'd expect the initial result set of 500k nodes to be loaded into memory. Is this the case? If not, why not? (See the page cache sizing check after these questions.)
- Why are we seeing 28M db hits for a filter on a single property? If we have a 500k node dataset and we need to compare a property on each of those nodes, surely that's only 1M db hits.
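
In case it's relevant to the memory question, this is a minimal way to inspect the page cache sizing. It assumes Neo4j 4.x, where the relevant setting is `dbms.memory.pagecache.size` (Neo4j 5 renames it to `server.memory.pagecache.size` and exposes it via `SHOW SETTINGS`):

```cypher
// Assumes Neo4j 4.x: list the configured page cache size
CALL dbms.listConfig("dbms.memory.pagecache.size")
YIELD name, value
RETURN name, value
```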
I've redacted and altered a couple of bits of info from the plan, but this is essentially what I'm seeing:
Help plix?