Head's Up! Site migration is underway. Expect disruption to service on Thursday, Feb. 9!
β05-27-2021 01:35 PM
I thought this would work:
(get movies where the average birth year of the actors is greater than a certain value)
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WHERE AVG(p.born) > 1967
RETURN m.title, AVG(p.born)
but I get the (obscure) error:
Invalid use of aggregating function avg(...) in this context (line 2, column 7 (offset: 46))
"WHERE AVG(p.born) > 1967
^
I can work around it with this (which is a bit uglier)
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WITH m, AVG(p.born) AS avgBorn
WHERE avgBorn > 1967
RETURN m.title, avgBorn
β05-27-2021 01:40 PM
β05-27-2021 01:54 PM
Yes, I know that. What I want to understand is the logic as to why the first one doesn't work.
β05-27-2021 04:24 PM
One thing that's not clear is the scope of the aggregation, the grouping key is not clear if we were to use it in a WHERE clause.
Would it be asking for the average of all persons who acted in all movies (so the same average would display on every result row)? Would persons be counted multiple times because they act in multiple movies, and should that duplication be captured in the average? Or is it asking for the average per movie? What if the pattern was more complicated, or there were more variables in scope?
Consider this MATCH and WHERE clause:
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)-[:ATTENDED_COLLEGE_AT]->(college:College)
WHERE AVG(p.born) > 1967
How could we tell what the grouping key is here? Is it per movie? Per college? Neither, and you want the average of all persons who acted in a movie and attended college? What if they acted in multiple movies, or attended multiple colleges?
When we restrict aggregations to WITH clauses, the grouping key becomes much clearer, as in the workaround you cited: you're getting, per movie, the average born year of that movie's actors, and then filtering that row based on whether the average is over 1967.
β05-27-2021 05:56 PM
Thanks.... this explanation was using and a little trickier than I had imagined.
I also found this explanation: equivalent having clause. (I always found the SQL HAVING
somewhat cryptic as well.)
I guess what made this tough for me is the error msg wasn't very informative...
So, is it that the GROUP BY
concept is implicit in Cypher?
Thanks so much for the explanation!
β05-27-2021 06:13 PM
For fun, I made an some other interesting queries:
Find movies who's average actor's birth year is less or equal than the average from the movie 'The Matrix', (but not including 'The Matrix')
MATCH (m1:Movie{title:'The Matrix'})<-[:ACTED_IN]-(p1:Person)
WITH AVG(p1.born) AS matrixAvg, m1
MATCH (m2:Movie)<-[:ACTED_IN]-(p2:Person)
WITH AVG(p2.born) AS bornAvg, matrixAvg, m2, m1
WHERE bornAvg <= matrixAvg AND id(m2) <> id(m1)
RETURN m2.title, bornAvg
Or yet still more interesting, is to get the actor's average age for a movie at the time the movie was released and select for the average age less than the average age for 'The Matrix".
MATCH (m1:Movie{title:'The Matrix'})<-[:ACTED_IN]-(p1:Person)
WITH m1.released - AVG(p1.born) AS matrixAgeAvg, m1
MATCH (m2:Movie)<-[:ACTED_IN]-(p2:Person)
WITH m2.released - AVG(p2.born) AS movieAgeAvg, matrixAgeAvg, m2, m1
WHERE movieAgeAvg <= matrixAgeAvg AND id(m2) <> id(m1)
RETURN m2.title, m2.released, movieAgeAvg
β05-31-2021 09:13 PM
Hello @clem
Thanks @andrew.bowman for the precise answer
Unless you want to create your own graph database engine some questions cannot be entirely understand by a Cypher user, and it's the point of having a short and sweet language as Cypher.
But roughly, Cypher works on two data domains at the same time, the graph itself, let's describe it as a bunch of physical links in the memory, and the projected data from this graph who are rows with data in it, you can visualise it as a table.
Some clauses or function are build to work with the graph domain and others are build to work with the projected data. If you think about it carefully, it doesn't make any sens to use avg in the graph domain because the data doesn't exists as a list or rows of something in the graph domain.
Read and try to understand the behaviour of the clause MATCH and what's a graph and so you will be able to understand the answer of @andrew.bowman
β06-01-2021 09:35 AM
Thanks for the insight.
After years of mainly using procedural languages, thinking in declaratively can be a bit tricky. Adding Graph thinking on top of that adds an additional dimension.
I certainly find Cypher more intuitive than SQL but every so often, I get tripped up on the finer points.
All the sessions of the conference are now available online