I have a Cypher query:
MATCH (x:My_Label)
WHERE x.DateProperty.epochMillis <= 1679443200000
RETURN count(x)
- There are approximately 7M nodes in my DB.
- My_Label refers to approximately 500k nodes.
- Of the 500k nodes, about 45k match the `WHERE` clause.
- The query takes about 10 seconds to run from cold and about 4 seconds on reruns.
- The query planner shows 500k db hits for stage 1 of the query (NodeByLabelScan), but then shows 28M db hits for stage 2 (Filter); see the `PROFILE` sketch after this list.
- The server has 128 GB of RAM and only runs Neo4j.
- The server is not under any particular load.
- Each node has a few small properties on it.
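
For reference, the per-stage db hit numbers above come from profiling the query, roughly like this (a sketch; as noted further down, some names in the real plan have been altered):

```cypher
// Profile the counting query to get per-operator rows and db hits
PROFILE
MATCH (x:My_Label)
WHERE x.DateProperty.epochMillis <= 1679443200000
RETURN count(x)
```

(`EXPLAIN` would show the plan without executing it; the db hit counts quoted above are from an actual `PROFILE` run.)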
So here are my questions:
- With all that free RAM, I'd expect the initial result set of 500k nodes to be loaded into memory. Is this the case? If not, why not? (See the page cache sizing check after these questions.)
- Why are we seeing 28M db hits for a filter on a single property? If we have a 500k node dataset and we need to compare a property on each of those nodes, surely that's only 1M db hits.
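
In case it's relevant to the memory question, this is a minimal way to inspect the page cache sizing. It assumes Neo4j 4.x, where the relevant setting is `dbms.memory.pagecache.size` (Neo4j 5 renames it to `server.memory.pagecache.size` and exposes it via `SHOW SETTINGS`):

```cypher
// Assumes Neo4j 4.x: list the configured page cache size
CALL dbms.listConfig("dbms.memory.pagecache.size")
YIELD name, value
RETURN name, value
```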
I've redacted and altered a couple of bits of info from the plan, but this is essentially what I'm seeing:
Help plix?