Inconsistent results using WITH in query

Hey,
an inconsistency I've encountered while doing exercise 5 on top of the movies db.

first query:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
WITH count(a) as numMovies, collect(m.title) as movies,a
WHERE numMovies = 5
RETURN a.name, movies

resulting in a table with 3 rows.

second query:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
WITH count(a) as numMovies, m, a
WHERE numMovies = 5
RETURN a.name, collect(m.title) as movies

returning no records.

the only difference is that I've declared the collect(m.title) in the with and not in the return.
There may be a convention in node4j that I'm not familiar with that may explain this but I would love your help to understand this situation.

thanks in advance!

Hi there,

Cypher doesn't have a group by like SQL. When you use aggregating functions, anything in your return clause that is not aggregated becomes part of the grouping.

In your second query WITH clause, you included both m and a. That means that you are grouping by both the move and the person. Since there is only one ACTED_IN relationship connecting an actor to an individual movie, numMovies is always equal to 1. You can run the following query to see why your WHERE clause is filtering out all of the data.

MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN count(a) as numMovies, m, a
ORDER BY a.name

Best wishes,
Nathan

1 Like

Hi,

I am trying to understand the difference of

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH  a, count(a) AS numMovies, collect(m.title) AS movies
WHERE numMovies = 5
RETURN a.name, movies

vs

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH  a, count(m) AS numMovies, collect(m.title) AS movies //notice the count(m)
WHERE numMovies = 5
RETURN a.name, movies

Both queries return the same result.
For me it's more intuitive to say count(m) to find number of movies than count(a).
But the tutorials use count(a).
Why is count(a) a better practice than count(m)?
Thanks.

Hi Alex,

You are correct that count(a), count(m), and count(*) will return the same result for this query. As far as I know, one is not better than the other.

You might find this article helpful if you need very fast counts. https://neo4j.com/developer/kb/fast-counts-using-the-count-store/ The examples are from the movie database.

Best wishes,
Nathan

1 Like

In this pattern

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)

Since it is a simple single step count(a) or count(m) both will give the distinct
(a)-->(m) paths. So either one will give same result. Once you have longer pattern things can get different.

count(a) is not better practice than count(m) here, you're right.

Given the context provided by the alias being used here (numMovies), the query should be favoring count(m). The results may not change, but it's important I think to align the context with what you're doing, especially if we have to alter the query to count something related to movies, or to count(DISTINCT m). If it was left as a, then such an alternation may overlook the fact that we're aggregating on the wrong variable.

1 Like