DB hits increase a lot when added a WHERE condition

cuneyttyler · January 12, 2023, 8:07am

I have the following query:

PROFILE MATCH(e:Entity {name:'Mona Lisa'})-[rel]->(e2:Entity) 
WITH *, collect( e2.name) as e2Nodes 
MATCH(e2)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity) WHERE e2ins.wikidata_id in ['Q5']
WITH *, collect( e2ins.name) as e2InsNodes
return   e2Nodes,e2InsNodes

It has 1k db hits and it's plan is as below:

plan (3).png

When I add a WHERE condition to this query as follows:

PROFILE MATCH(e:Entity {name:'Mona Lisa'})-[rel]->(e2:Entity) 
WITH *, collect( e2.name) as e2Nodes 
MATCH(e2)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity) WHERE e2ins.wikidata_id in ['Q5']
WITH *, collect( e2ins.name) as e2InsNodes
return   e2Nodes,e2InsNodes

It's db hit count increases to 100k and the plan is as below:

Basically I'm returning entities(e2) have an incoming edge from 'Mona Lisa'. This is 1k db hits. Then I fetch entities with some specific relation(...P31) of these e2 entities. The total number of results is 227. However, there are 100k hits in between the steps when I filtered results by some attribute. Why is this the case?

Thanks.

NOTE: This applies to the following two queries below:

PROFILE MATCH(e:Entity {name:'Mona Lisa'})-[rel]->(e2:Entity)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity)  
return   e2.name,e2ins.wikidata_id

PROFILE MATCH(e:Entity {name:'Mona Lisa'})-[rel]->(e2:Entity)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity) WHERE e2ins.wikidata_id in ['Q5']
return   e2.name,e2ins.wikidata_id

cuneyttyler · January 12, 2023, 8:50am

As I've figured out, when you put a WHERE condition - no matter the start node is specified by an index lookup - it also fetches those nodes with the specified conditions in the where clause at the beginning of the query.

If the first query is collected beforehand, than it works as expected with few db hits.

PROFILE MATCH(e:Entity{name: 'Mona Lisa'})-[rel]->(e2:Entity)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity) WITH e,e2,collect(e2ins) as e2InsNodes  UNWIND e2InsNodes as e2ins WITH * WHERE e2ins.wikidata_id = 'Q5' 
return  distinct e2.name,e2ins.wikidata_id

I'm not sure about the reason of this implementation and causes of it but I think this types of queries should be handled in a way that they start from the start node, expand and filter queries by WHERE clause. It would be a nice thing to have.

anthapu · January 13, 2023, 12:07am

In the new query planner it tries to optimize for speed rather than memory and db hits. If a node has an unique index it tries to get those nodes and traverse from both sides and reach the middle. This can have higher db hits, but the total time query taken can get improved.

I have talked about this aspect in my book Graph Data Processing with Cypher .

Topic		Replies	Views
About db hits of Neo4j Cypher performance , cypher	2	1597	December 13, 2019
Understanding the terms in the Profile operators Newbie Questions	8	1555	August 25, 2020
Why so many DB hits? Cypher performance , cypher	2	286	March 23, 2023
How to reduce the db hit during the CYPHER execution General performance	4	1300	February 4, 2020
Why so high db hits in ProduceResults? Cypher	4	886	December 17, 2020

DB hits increase a lot when added a WHERE condition

Related topics