Exercise 3: Cypher Query Tuning with Neo4j 4.0

The answer on slide 8 of the browser guide indicates that the correct way to introduce pattern comprehension into this query:

PROFILE
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE $relYear1 <= m.releaseYear <= $relYear2 AND a.born > $yearBorn
RETURN a.name AS actor, a.born AS born,
collect(DISTINCT m.title) AS titles ORDER BY actor

is like so:

PROFILE
MATCH (a:Actor)
WHERE a.born > $yearBorn
RETURN a.name AS actor, a.born AS born,
[(a)-->(x) WHERE $relYear1 <= x.releaseYear <= $relYear2 | x.title] AS titles ORDER BY actor

This generates 95036 db hits in 49 ms. My questions are:

  1. Why is there seemingly no performance penalty from using [(a)-->(x).. rather than [(a)-[:ACTED_IN]->(m:Movie)..? Does pattern comprehension really not care whether the relationship type and node labels are omitted like this?

  2. If you swap to the more explicit syntax of [(a)-[:ACTED_IN]->(m:Movie).., then you must include WITH DISTINCT a before the RETURN clause; otherwise there are duplicate rows for each actor. Why is this?

  3. If you use the documented syntax of [(a)-->(x).., it's possible to get a slightly faster execution time (a few ms) by adding a WITH DISTINCT a clause first, but strangely this increases db hits to 98273. Can anyone explain why?
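The full queries behind questions 2 and 3 are not posted. Assuming the question 3 variant differs from the guide's answer only by the added clause, it would look something like this (a sketch, not necessarily the exact query that was run):

```cypher
PROFILE
MATCH (a:Actor)
WHERE a.born > $yearBorn
// Added step from question 3: keep each matched Actor node once before RETURN
WITH DISTINCT a
RETURN a.name AS actor, a.born AS born,
[(a)-->(x) WHERE $relYear1 <= x.releaseYear <= $relYear2 | x.title] AS titles ORDER BY actor
```

Running this and the guide's version with the same parameter values and comparing the two PROFILE plans operator by operator shows where the extra db hits are reported.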

Hello @terryfranklin82 ,

(1) I see slightly worse performance when using the relationship type in the pattern comprehension. Using the untyped relationship syntax is definitely better here. Of course, this could change in future releases.

(2) I do not see duplicate rows. Is it possible that you have loaded the data more than once?
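One way to check whether the data was loaded more than once is to look for Actor nodes that share a name and birth year. This is a sketch that assumes name plus born identifies an actor in this dataset; two genuinely different people with the same name and birth year would also be reported:

```cypher
// Count Actor nodes that share the same name and birth year
MATCH (a:Actor)
WITH a.name AS name, a.born AS born, count(*) AS copies
WHERE copies > 1
RETURN name, born, copies
ORDER BY copies DESC, name
```

An empty result means there are no duplicated Actor nodes of this kind.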

(3) I am getting the same number of db hits with or without WITH DISTINCT a.

I am using a 4.1.3 database.

Elaine

Keep in mind that there are very few relationship types in the movies graph. Because of that, not specifying the relationship type can give about the same or even better performance than specifying it.

In a more complex or realistic graph with many relationship types, it will almost always be better to include the relationship type, both for correctness and for efficiency.
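For comparison, here is a sketch of the typed form from question 2, assuming it is the guide's answer with only the pattern inside the comprehension changed. Because the MATCH clause binds each Actor node once and the comprehension builds one list per row, this form should return one row per actor without WITH DISTINCT a. One row per actor-movie pair would be expected if the MATCH itself still expanded the ACTED_IN relationships, as in the original query.

```cypher
PROFILE
MATCH (a:Actor)
WHERE a.born > $yearBorn
// One row per Actor node; the comprehension collects that actor's matching movie titles into a list
RETURN a.name AS actor, a.born AS born,
[(a)-[:ACTED_IN]->(m:Movie) WHERE $relYear1 <= m.releaseYear <= $relYear2 | m.title] AS titles
ORDER BY actor
```

Profiling this next to the untyped version on the same database and parameters shows whether the type filter pays off for a particular graph.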