Hey,
an inconsistency I've encountered while doing exercise 5 on top of the movies db.
first query:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
WITH count(a) as numMovies, collect(m.title) as movies,a
WHERE numMovies = 5
RETURN a.name, movies
resulting in a table with 3 rows.
second query:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
WITH count(a) as numMovies, m, a
WHERE numMovies = 5
RETURN a.name, collect(m.title) as movies
returning no records.
the only difference is that I've declared the collect(m.title) in the with and not in the return.
There may be a convention in node4j that I'm not familiar with that may explain this but I would love your help to understand this situation.
Cypher doesn't have a group by like SQL. When you use aggregating functions, anything in your return clause that is not aggregated becomes part of the grouping.
In your second query WITH clause, you included both m and a. That means that you are grouping by both the move and the person. Since there is only one ACTED_IN relationship connecting an actor to an individual movie, numMovies is always equal to 1. You can run the following query to see why your WHERE clause is filtering out all of the data.
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN count(a) as numMovies, m, a
ORDER BY a.name
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(a) AS numMovies, collect(m.title) AS movies
WHERE numMovies = 5
RETURN a.name, movies
vs
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(m) AS numMovies, collect(m.title) AS movies //notice the count(m)
WHERE numMovies = 5
RETURN a.name, movies
Both queries return the same result.
For me it's more intuitive to say count(m) to find number of movies than count(a).
But the tutorials use count(a).
Why is count(a) a better practice than count(m)?
Thanks.
Since it is a simple single step count(a) or count(m) both will give the distinct
(a)-->(m) paths. So either one will give same result. Once you have longer pattern things can get different.
count(a) is not better practice than count(m) here, you're right.
Given the context provided by the alias being used here (numMovies), the query should be favoring count(m). The results may not change, but it's important I think to align the context with what you're doing, especially if we have to alter the query to count something related to movies, or to count(DISTINCT m). If it was left as a, then such an alternation may overlook the fact that we're aggregating on the wrong variable.