Sub-optimal query plan?

markarasev · August 3, 2023, 4:18pm

Hello,

We're using neo4j 4.4.23 and have the following simplified model (I can't share the real model but the structure is the same):

node labels: Person { name }, Location, Food
relationship types: VISITED, ATE
indexes: CREATE BTREE INDEX idx_person_name FOR (p:Person) ON (p.name)

We use the following two queries of similar structure to audit our data:

Q1 (distinct locations visited by different persons with the same name):

MATCH (p:Person)-[:VISITED]->(l:Location)<-[:VISITED]-(p2:Person)
WHERE p.name=p2.name and ID(p) <> ID(p2)
RETURN COUNT(distinct l)

Q2 (distinct foods eaten by different persons with the same name):

MATCH (p:Person)-[:ATE]->(f:Food)<-[:ATE]-(p2:Person)
WHERE p.name=p2.name and ID(p) <> ID(p2)
RETURN COUNT(distinct f)

Q1 takes some time to complete (15-20 minutes) but Q2 crushes the database. Oddly, they have different query plans despite having the same structure and referencing the same indexed property. Q1 scans persons once and then follows relationships from p1 to p2 whereas Q2 double scans all persons, follows relationships on both sides until f and finally does a join:

Note: I had to edit the label names and relationship types since I can't share the real data model.

Some data statistics :

about 15M Person nodes
about 70M Location nodes
about 20M Food nodes
about 30M VISITED relationships
about 50M ATE relationships

Since the orders of magnitude are the same (tens of millions), I would have expected the execution plans to be the same, but maybe the differences between the numbers are enough to trigger different execution plans.

But it also looks to me that Q2's execution plan is sub-optimal (scanning parties two times and then doing a join instead of following relationships), is it really? I have little knowledge of neo4j's query execution mechanics.

We tried to recompute database statistics and to manually replan cached execution plans but it did not help. We also tried the dp query planner. Any ideas of something we would be missing or anything we should try? How can we hint the query planner to choose a plan for Q2 similar to Q1's?

Thank you for your help.

ameyasoft · August 3, 2023, 10:04pm

Try this:

MATCH (p:Person)-[:VISITED]->(l:Location)
WITH distinct p.name as persons,  count(distinct l.location) as place
RETURN persons, place order by place desc

MATCH (p:Person)-[:ATE]->(f:Food)
WITH distinct p.name as persons,  count(distinct f.food) as food
RETURN persons, food order by food desc

markarasev · August 7, 2023, 11:48am

Hello, thank you for your response.

These queries seem not to return the total number of distinct locations visited (or foods eaten) by different persons with the same name as Q1 and Q2 do. They seem to return counts of distinct locations visited (or foods eaten) by groups of people with the same name instead. And some locations/foods may be shared by different groups of people with the same name, so even adding a WITH sum(place) AS total does not give the same result as Q1.

bennu_neo · August 7, 2023, 8:20pm

Hi @markarasev ,

This is an interesting topic. Just in order to get an idea on your data, what's the result of:

MATCH (p:Person)
WHERE EXISTS{(pro:Person) WHERE pro.name = p.name and pro <> p } 
WITH distinct p.name as name
RETURN COUNT(name)

markarasev · August 8, 2023, 9:32am

Hello @bennu_neo,

The result of this query is 1197831.
If I understand correctly, it is the number of names shared by at least two Persons, right?

Topic		Replies	Views
Cypher query execution is not effectively Cypher performance , cypher	2	320	July 27, 2021
Query Performance for Label Matching Cypher	3	291	November 25, 2021
How to optimize a query that will be executed millions of times under different parameters Drivers & Stacks querying	3	637	October 4, 2019
Questions about my query model Cypher querying , optimization , cypher , subquery	13	128	March 6, 2025
Friend-of-a-friend of many degrees query in Neo4j Cypher performance	3	732	May 7, 2023

Get Certified in June!

Sub-optimal query plan?

Related topics