Aggregate Results vs Un-aggregated

rcfro2 · September 22, 2018, 12:22am

Why specifically does the M in the following queries return unexpected results ? The one I am looking for is the results of the second query. But the first returns results by movie and actor. Is the WITH statement issuing some sort of "grouping" function here ?

match (a:Person) -[r:ACTED_IN]-> (m:Movie)
with a, m, collect(m.title) as Movies,sum(size(r.roles)) as RoleCount, collect(r.roles) as Roles
where RoleCount > 3
return
Movies,
a.name as Actor,
Roles

vs

match (a:Person) -[r:ACTED_IN]-> (m:Movie)
with
a,
collect(m.title) as Movies,
sum(size(r.roles)) as RoleCount,
collect(r.roles) as Roles
where RoleCount > 3
return
Movies,
a.name as Actor,
Roles

Benoit · September 22, 2018, 2:56pm

Hi rcfro2,

In Cypher there is no GROUP BY like in SQL.
When you are using an aggregate function, the aggregation is done on the variables that precede your aggregation function.

So in your case :

Query 1 : aggregation is done on the couple a and m
Query 2 : only on a

That's why you have different results.

Cheers

rcfro2 · September 22, 2018, 5:17pm

Ok. So in the first it assumes im aggregating by both a,m. Then how would you describe the WITH clause? Is it somewhere between Group By and Select statement in SQL?

Benoit · September 22, 2018, 9:48pm

The WITH creates a result (like a return) and the under script will be executed for each row.

rcfro2 · September 22, 2018, 10:06pm

Hm well I understand that it creates a result...I am trying to understand it in the context of another language as well to place it better. Can I use WITH to "group by" or look at the results in two different ways? See below:

match (p:Person)-[r:ACTED_IN]->(m)
with count(m.title) as Total, p.name as Name, collect(r.earnings) as R,collect(p.name) as N, r as Ro
where Total >2
with N, sum(Ro.earnings) as SUM
where SUM > 1000
return N, SUM

I would think so but it's not quite reading in the way I would think - it returns no results. I would think that it first passes through all actors thats that have acted in more than 2 movies, then sums the earnings of the grouping of actors and returns a list of actors and a sum where the grouping sum is greater than 1000.

The real question is whether I can have multiple WITH statements within the same query.

Benoit · September 23, 2018, 1:47am

Hi

You can have multiple WITH in the same query there is no problem !

In your example, what is weird is that you are doing an aggregation, and then ask for a specific item : count(m.title) as Total, p.name as Name. So here, it's not easy to predict the result.

rcfro2 · September 23, 2018, 2:05am

Well, Im trying to first filter (where) on the total number of movies, then I use the WITH clause again to filter on earnings of the set of actors who have acted in 2 or more movies. It returns no results. Hm...

tom.geudens · September 23, 2018, 6:15am

I may misunderstand what your trying to accomplish but here's what I think your basic query is (I changed variable names to explain what they mean) ...

# match pattern
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
# aggregation on Person
WITH p, count(m.title) AS numOfMoviesPersonActedIn, sum(r.earnings) AS whatThePersonEarned
# filter (= HAVING)
WHERE numOfMoviesPersonActedIn > 2 AND whatThePersonEarned > 1000
# resultset
RETURN p.name, whatThePersonEarned

That will answer the question give me the name and earnings of actors that played in more than two movies and made more than 1000.

If you merely want the total sum of the earnings (of all actors that play in more than two movies and made more than 1000), the return statement changes to ...
RETURN sum(whatThePersonEarned) AS whatAllThoseActorsEarnedTogether

Does that make sense ? Every WITH and/or RETURN statement is in effect a SELECT with an implicit GROUP BY (read: it will GROUP BY on anything you are not aggregating/collection in the statement).

Hope this helps.

Regards,
Tom

rcfro2 · September 23, 2018, 6:19am

Tom,

thanks again! SO I realize I can do it in one WITH statement. But I was trying to use WITH twice in one query and see how it affects the results.

thanks,

tom.geudens · September 23, 2018, 6:34am

Hello,

You can obviously do that. The trick about that is - quite often - to collect in a first WITH, then use UNWIND to expand again, for example ...

# pattern
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
# aggregation across everything (and we need to collect the persons or we lose them)
WITH sum(r.earnings) AS totalEarnings, collect(DISTINCT p) AS theActors
# expand the set again
UNWIND theActors as oneActor
# match again for one person
MATCH (oneActor)-[r:ACTED_IN]->(m:Movie)
# aggregate for one person
WITH totalEarnings, oneActor,sum(r.earnings) AS whatThePersonEarned
# extra filter
WHERE whatThePersonEarned > 1000
# result set making use of both levels of aggregation ...
RETURN oneActor.name, whatThePersonEarned, whatThePersonEarned / totalEarnings AS pieceOfTheTotalPie

Now, I just shook that out of my sleeve as an example and didn't run it myself (so there may be errors in it) ... but that's the general idea ...

Regards,
Tom

Topic		Replies	Views
Using WITH clause along with an aggregating function like count() Cypher	3	1814	May 28, 2021
Inconsistent results using WITH in query Newbie Questions	5	680	April 23, 2020
Inconsistent Query Behavior with WITH Clause - Potential Bug Report Cypher	6	236	November 16, 2023
What does WITH do? Graph Academy Question Graph Academy & Certifications migrated	2	234	June 9, 2022
Tutorial on WITH Clause Cypher cypher	1	739	October 18, 2019

Aggregate Results vs Un-aggregated

Related topics