Hello,
We're using neo4j 4.4.23 and have the following simplified model (I can't share the real model but the structure is the same):
node labels: Person { name }
, Location
, Food
relationship types: VISITED
, ATE
indexes: CREATE BTREE INDEX idx_person_name FOR (p:Person) ON (p.name)
We use the following two queries of similar structure to audit our data:
Q1 (distinct locations visited by different persons with the same name):
MATCH (p:Person)-[:VISITED]->(l:Location)<-[:VISITED]-(p2:Person)
WHERE p.name=p2.name and ID(p) <> ID(p2)
RETURN COUNT(distinct l)
Q2 (distinct foods eaten by different persons with the same name):
MATCH (p:Person)-[:ATE]->(f:Food)<-[:ATE]-(p2:Person)
WHERE p.name=p2.name and ID(p) <> ID(p2)
RETURN COUNT(distinct f)
Q1 takes some time to complete (15-20 minutes) but Q2 crushes the database. Oddly, they have different query plans despite having the same structure and referencing the same indexed property. Q1 scans persons once and then follows relationships from p1
to p2
whereas Q2 double scans all persons, follows relationships on both sides until f
and finally does a join:
Note: I had to edit the label names and relationship types since I can't share the real data model.
Some data statistics :
- about 15M
Person
nodes - about 70M
Location
nodes - about 20M
Food
nodes - about 30M
VISITED
relationships - about 50M
ATE
relationships
Since the orders of magnitude are the same (tens of millions), I would have expected the execution plans to be the same, but maybe the differences between the numbers are enough to trigger different execution plans.
But it also looks to me that Q2's execution plan is sub-optimal (scanning parties two times and then doing a join instead of following relationships), is it really? I have little knowledge of neo4j's query execution mechanics.
We tried to recompute database statistics and to manually replan cached execution plans but it did not help. We also tried the dp
query planner. Any ideas of something we would be missing or anything we should try? How can we hint the query planner to choose a plan for Q2 similar to Q1's?
Thank you for your help.