Inconsistent results using WITH in query

orisbaum · March 23, 2020, 3:41pm

Hey,
I've encountered an inconsistency while doing exercise 5 on the movies db.

first query:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
WITH count(a) as numMovies, collect(m.title) as movies,a
WHERE numMovies = 5
RETURN a.name, movies

resulting in a table with 3 rows.

second query:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
WITH count(a) as numMovies, m, a
WHERE numMovies = 5
RETURN a.name, collect(m.title) as movies

returning no records.

The only difference is that I've put collect(m.title) in the WITH clause rather than in the RETURN clause.
There may be a convention in Neo4j that I'm not familiar with that explains this, but I would love your help understanding the situation.

thanks in advance!

nsmith_piano · March 23, 2020, 5:32pm

Hi there,

Cypher doesn't have an explicit GROUP BY like SQL. When you use aggregating functions, anything in the same WITH or RETURN clause that is not aggregated becomes part of the grouping key.

In the WITH clause of your second query, you included both m and a. That means you are grouping by both the movie and the person. Since there is only one ACTED_IN relationship connecting an actor to an individual movie, numMovies is always equal to 1. You can run the following query to see why your WHERE clause filters out all of the data.

MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN count(a) as numMovies, m, a
ORDER BY a.name
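
The same grouping rule can be sketched without the movies db. This is only an illustration under stated assumptions: the inline rows and the names actor and title are hypothetical stand-ins for the (a)-[:ACTED_IN]->(m) matches, not data from the exercise.

```cypher
// Hypothetical inline rows standing in for (a)-[:ACTED_IN]->(m) matches.
UNWIND [
  {actor: 'A', title: 'M1'},
  {actor: 'A', title: 'M2'},
  {actor: 'B', title: 'M1'}
] AS row
// Grouping key is the actor only: one output row per actor,
// so numMovies is the number of input rows for that actor.
WITH row.actor AS actor, count(*) AS numMovies, collect(row.title) AS movies
RETURN actor, numMovies, movies;

// Same hypothetical rows, but the title is now a grouping key too.
UNWIND [
  {actor: 'A', title: 'M1'},
  {actor: 'A', title: 'M2'},
  {actor: 'B', title: 'M1'}
] AS row
// Every (actor, title) pair is its own group, so numMovies is 1 in every row.
WITH row.actor AS actor, row.title AS title, count(*) AS numMovies
RETURN actor, title, numMovies;
```

In the first statement a filter such as WHERE numMovies = 2 can match; in the second, any filter other than numMovies = 1 removes every row, which is what happens in the second query of the question.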

Best wishes,
Nathan

admin3 · April 15, 2020, 10:31am

Hi,

I am trying to understand the difference between

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH  a, count(a) AS numMovies, collect(m.title) AS movies
WHERE numMovies = 5
RETURN a.name, movies

vs

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH  a, count(m) AS numMovies, collect(m.title) AS movies //notice the count(m)
WHERE numMovies = 5
RETURN a.name, movies

Both queries return the same result.
For me it's more intuitive to say count(m) to find number of movies than count(a).
But the tutorials use count(a).
Why is count(a) a better practice than count(m)?
Thanks.

nsmith_piano · April 15, 2020, 2:09pm

Hi Alex,

You are correct that count(a), count(m), and count(*) will return the same result for this query. As far as I know, one is not better than the other.

You might find this Knowledge Base article helpful if you need very fast counts: "Fast counts using the count store". The examples are from the movie database.

Best wishes,
Nathan

anthapu · April 15, 2020, 2:15pm

In this pattern

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)

Since it is a simple single-step pattern, count(a) and count(m) both count the distinct (a)-->(m) paths, so either one gives the same result. Once you have a longer pattern, the results can differ.

andrew_bowman · April 23, 2020, 1:50am

count(a) is not better practice than count(m) here, you're right.

Given the context provided by the alias used here (numMovies), the query should favor count(m). The results may not change, but I think it's important to align the variable with what you're actually counting, especially if we later have to alter the query to count something related to movies, or to use count(DISTINCT m). If it were left as a, such an alteration might overlook the fact that we're aggregating on the wrong variable.
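
As a rough illustration of where count(DISTINCT m) starts to matter, here is a minimal sketch with hypothetical inline rows. They imitate what a longer pattern, for example one that also matches each movie's directors, might produce: one row per path, so the same movie can appear more than once for an actor. The names actor, title, and director are assumptions made for the example.

```cypher
// Hypothetical rows: movie M1 appears twice for actor A
// because it has two directors in this made-up data.
UNWIND [
  {actor: 'A', title: 'M1', director: 'D1'},
  {actor: 'A', title: 'M1', director: 'D2'},
  {actor: 'A', title: 'M2', director: 'D3'}
] AS row
WITH row.actor AS actor,
     count(row.title) AS numRows,             // counts every row: 3
     count(DISTINCT row.title) AS numMovies   // counts unique titles: 2
RETURN actor, numRows, numMovies;
```

With a plain count, the alias numMovies would silently report 3 here, which is why naming the counted variable after what it really counts makes the switch to DISTINCT easier to get right.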
