Exercise 3: Cypher Query Tuning with Neo4j 4.0

terryfranklin82 · November 4, 2020, 11:16pm

According to the answer on slide 8 of the browser guide, the correct way to introduce pattern comprehension into this query:

PROFILE
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE $relYear1 <= m.releaseYear <= $relYear2 AND a.born > $yearBorn
RETURN a.name AS actor, a.born AS born,
collect(DISTINCT m.title) AS titles
ORDER BY actor

is like so:

PROFILE
MATCH (a:Actor)
WHERE a.born > $yearBorn
RETURN a.name AS actor, a.born AS born,
[(a)-->(x) WHERE $relYear1 <= x.releaseYear <= $relYear2 | x.title] AS titles
ORDER BY actor

This generates 95036 db hits in 49 ms. My questions are:

1. Why is there seemingly no performance penalty from using [(a)-->(x).. rather than [(a)-[:ACTED_IN]->(m:Movie)..? Does pattern comprehension really not care that no relationship type or node label is included?
2. If you swap to the more explicit syntax of [(a)-[:ACTED_IN]->(m:Movie).., you must include WITH DISTINCT a before the RETURN clause, otherwise there are duplicate rows for each actor. Why is this?
3. If you use the documented syntax of [(a)-->(x).., it's possible to get a slightly faster execution time (by a few ms) by adding a WITH DISTINCT a clause first, but strangely this increases db hits to 98273. Can anyone explain why?
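
For reference, the variant described in the second question might look like the sketch below. It is assembled from the fragments quoted above, not taken from the browser guide, and it assumes the $yearBorn, $relYear1 and $relYear2 parameters are already set as in the exercise.

```cypher
// Sketch only: explicit relationship type and label inside the
// pattern comprehension, with WITH DISTINCT a before the RETURN.
PROFILE
MATCH (a:Actor)
WHERE a.born > $yearBorn
WITH DISTINCT a
RETURN a.name AS actor, a.born AS born,
[(a)-[:ACTED_IN]->(m:Movie) WHERE $relYear1 <= m.releaseYear <= $relYear2 | m.title] AS titles
ORDER BY actor
```

Profiling this next to the (a)-->(x) version on the same database should make the db hit comparison concrete.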

elaine_rosenber · November 5, 2020, 8:20pm

Hello @terryfranklin82 ,

(1) I see slightly worse performance using the relationship type for the pattern comprehension. Using this unnamed type syntax is definitely better. Of course, things like this could change in future releases.

(2) I do not see duplicate rows. Is it possible that you have loaded the data more than once?

(3) I am getting the same number of db hits with or without WITH DISTINCT.
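
One possible way to check whether the data was loaded more than once is to look for actor names that appear on several nodes. This is a minimal sketch, assuming actor names are meant to be unique in this dataset, which the exercise does not state; the aliases name and copies are only illustrative.

```cypher
// Assumption: each actor name should appear on exactly one :Actor node.
MATCH (a:Actor)
WITH a.name AS name, count(*) AS copies
WHERE copies > 1
RETURN name, copies
ORDER BY copies DESC
LIMIT 10
```

An empty result would suggest the Actor nodes were not duplicated, although it says nothing about duplicated relationships.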

I am using a 4.1.3 database.

Elaine

andrew_bowman · November 16, 2020, 3:51pm

Keep in mind there are very few relationships in the movies graph. Because of that, not specifying the relationship type can be about the same or even better performance than with the type specified.

In a more complex or realistic graph with many relationship types, it will almost always be better to include the relationship type, for both correctness and efficiency.
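
To get a feel for how much extra work an untyped pattern such as (a)-->(x) might do on a particular graph, one option is to count the outgoing relationships of Actor nodes by type. This is a rough sketch; relType and rels are illustrative aliases, not names from the exercise.

```cypher
// Count outgoing relationships from :Actor nodes, grouped by type.
MATCH (:Actor)-[r]->()
RETURN type(r) AS relType, count(r) AS rels
ORDER BY rels DESC
```

If nearly all of the rows belong to ACTED_IN, the untyped pattern traverses little beyond what the typed one does, which would be consistent with the similar numbers reported above.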
