Aggregate Results vs Un-aggregated


(Rcfro2) #1

Why specifically does the M in the following queries return unexpected results ? The one I am looking for is the results of the second query. But the first returns results by movie and actor. Is the WITH statement issuing some sort of "grouping" function here ?

match (a:Person) -[r:ACTED_IN]-> (m:Movie)
with a, m, collect(m.title) as Movies,sum(size(r.roles)) as RoleCount, collect(r.roles) as Roles
where RoleCount > 3
return
Movies,
a.name as Actor,
Roles

vs

match (a:Person) -[r:ACTED_IN]-> (m:Movie)
with
a,
collect(m.title) as Movies,
sum(size(r.roles)) as RoleCount,
collect(r.roles) as Roles
where RoleCount > 3
return
Movies,
a.name as Actor,
Roles


(Benoit Simard) #2

Hi rcfro2,

In Cypher there is no GROUP BY like in SQL.
When you are using an aggregate function, the aggregation is done on the variables that precede your aggregation function.

So in your case :

  • Query 1 : aggregation is done on the couple a and m
  • Query 2 : only on a

That's why you have different results.

Cheers


(Rcfro2) #3

Ok. So in the first it assumes im aggregating by both a,m. Then how would you describe the WITH clause? Is it somewhere between Group By and Select statement in SQL?


(Benoit Simard) #4

The WITH creates a result (like a return) and the under script will be executed for each row.


(Rcfro2) #5

Hm well I understand that it creates a result...I am trying to understand it in the context of another language as well to place it better. Can I use WITH to "group by" or look at the results in two different ways? See below:

match (p:Person)-[r:ACTED_IN]->(m)
with count(m.title) as Total, p.name as Name, collect(r.earnings) as R,collect(p.name) as N, r as Ro
where Total >2
with N, sum(Ro.earnings) as SUM
where SUM > 1000
return N, SUM

I would think so but it's not quite reading in the way I would think - it returns no results. I would think that it first passes through all actors thats that have acted in more than 2 movies, then sums the earnings of the grouping of actors and returns a list of actors and a sum where the grouping sum is greater than 1000.

The real question is whether I can have multiple WITH statements within the same query.


(Benoit Simard) #6

Hi

You can have multiple WITH in the same query there is no problem !

In your example, what is weird is that you are doing an aggregation, and then ask for a specific item : count(m.title) as Total, p.name as Name. So here, it's not easy to predict the result.


(Rcfro2) #7

Well, Im trying to first filter (where) on the total number of movies, then I use the WITH clause again to filter on earnings of the set of actors who have acted in 2 or more movies. It returns no results. Hm...


(Tom) #8

I may misunderstand what your trying to accomplish but here's what I think your basic query is (I changed variable names to explain what they mean) ...

# match pattern
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
# aggregation on Person
WITH p, count(m.title) AS numOfMoviesPersonActedIn, sum(r.earnings) AS whatThePersonEarned
# filter (= HAVING)
WHERE numOfMoviesPersonActedIn > 2 AND whatThePersonEarned > 1000
# resultset
RETURN p.name, whatThePersonEarned

That will answer the question give me the name and earnings of actors that played in more than two movies and made more than 1000.

If you merely want the total sum of the earnings (of all actors that play in more than two movies and made more than 1000), the return statement changes to ...
RETURN sum(whatThePersonEarned) AS whatAllThoseActorsEarnedTogether

Does that make sense ? Every WITH and/or RETURN statement is in effect a SELECT with an implicit GROUP BY (read: it will GROUP BY on anything you are not aggregating/collection in the statement).

Hope this helps.

Regards,
Tom


(Rcfro2) #9

Tom,

thanks again! SO I realize I can do it in one WITH statement. But I was trying to use WITH twice in one query and see how it affects the results.

thanks,


(Tom) #10

Hello,

You can obviously do that. The trick about that is - quite often - to collect in a first WITH, then use UNWIND to expand again, for example ...

# pattern
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
# aggregation across everything (and we need to collect the persons or we lose them)
WITH sum(r.earnings) AS totalEarnings, collect(DISTINCT p) AS theActors
# expand the set again
UNWIND theActors as oneActor
# match again for one person
MATCH (oneActor)-[r:ACTED_IN]->(m:Movie)
# aggregate for one person
WITH totalEarnings, oneActor,sum(r.earnings) AS whatThePersonEarned
# extra filter
WHERE whatThePersonEarned > 1000
# result set making use of both levels of aggregation ...
RETURN oneActor.name, whatThePersonEarned, whatThePersonEarned / totalEarnings AS pieceOfTheTotalPie

Now, I just shook that out of my sleeve as an example and didn't run it myself (so there may be errors in it) ... but that's the general idea ...

Regards,
Tom