The answer on slide 8 of the browser guide indicates that the correct way to introduce pattern comprehension to this query:
PROFILE
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE $relYear1 <= m.releaseYear <= $relYear2 AND a.born > $yearBorn
RETURN a.name as actor, a.born as born,
collect(DISTINCT m.title) AS titles ORDER BY actor
is like so:
PROFILE
MATCH (a:Actor)
WHERE a.born > $yearBorn
RETURN a.name AS actor, a.born AS born,
[(a)-->(x) WHERE $relYear1 <= x.releaseYear <= $relYear2 | x.title] AS titles ORDER BY actor
This generates 95036 db hits in 49 ms. My questions are:
- Why is there seemingly no performance penalty from using [(a)-->(x).. rather than [(a)-[:ACTED_IN]->(m:Movie)..? Does pattern comprehension really not care that the relationship type and node label are omitted like this?
- If you do swap to the more explicit syntax of [(a)-[:ACTED_IN]->(m:Movie).., then you must include WITH DISTINCT a before the RETURN clause, otherwise there are duplicate rows for each actor. Why is this?
- If you use the documented syntax of [(a)-->(x).., it's possible to get a slightly faster execution time (by a few ms) by adding a WITH DISTINCT a clause first, but strangely this increases db hits to 98273. Can anyone explain why?
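
For reference, here is roughly what the two variants from the second and third questions look like when written out in full. This is only a sketch: placing WITH DISTINCT a directly after the WHERE clause is an assumption about where the clause goes, and the parameters ($yearBorn, $relYear1, $relYear2) are the same ones used in the queries above.

```cypher
// Variant from the second question: explicit relationship type and node label
// inside the pattern comprehension, with WITH DISTINCT a before RETURN.
PROFILE
MATCH (a:Actor)
WHERE a.born > $yearBorn
WITH DISTINCT a  // assumed placement: directly after the WHERE clause
RETURN a.name AS actor, a.born AS born,
       [(a)-[:ACTED_IN]->(m:Movie) WHERE $relYear1 <= m.releaseYear <= $relYear2 | m.title] AS titles
ORDER BY actor;

// Variant from the third question: the documented untyped pattern,
// again with WITH DISTINCT a added before RETURN.
PROFILE
MATCH (a:Actor)
WHERE a.born > $yearBorn
WITH DISTINCT a  // assumed placement: directly after the WHERE clause
RETURN a.name AS actor, a.born AS born,
       [(a)-->(x) WHERE $relYear1 <= x.releaseYear <= $relYear2 | x.title] AS titles
ORDER BY actor;
```

Running each with PROFILE and comparing the operators in the two plans should show where the extra db hits are counted.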