Why can't I use a conditional on an aggregating function?

One thing that's not clear is the scope of the aggregation, the grouping key is not clear if we were to use it in a WHERE clause.

Would it be asking for the average of all persons who acted in all movies (so the same average would display on every result row)? Would persons be counted multiple times because they act in multiple movies, and should that duplication be captured in the average? Or is it asking for the average per movie? What if the pattern was more complicated, or there were more variables in scope?

Consider this MATCH and WHERE clause:

MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)-[:ATTENDED_COLLEGE_AT]->(college:College)
WHERE AVG(p.born) > 1967 

How could we tell what the grouping key is here? Is it per movie? Per college? Neither, and you want the average of all persons who acted in a movie and attended college? What if they acted in multiple movies, or attended multiple colleges?

When we restrict aggregations to WITH clauses, the grouping key becomes much clearer, as in the workaround you cited: you're getting, per movie, the average born year of that movie's actors, and then filtering that row based on whether the average is over 1967.

3 Likes