How does groupby work implicitly with more than one column?

technerd · March 12, 2022, 5:48am

I have written the query below to retrieve each movie title and its corresponding rating:

    match (rvwr:Person)-[r:REVIEWED]->(m:Movie)
    where m.released > 2003
    return m.title, avg(r.rating) as rating

I understand that the grouping is done implicitly on m.title.

But how does the grouping work if there is more than one column, as below?

    match (rvwr:Person)-[r:REVIEWED]->(m:Movie)
    where m.released > 2003
    return m.title, rvwr.born, avg(r.rating) as rating

glilienfield · March 12, 2022, 10:06am

The result of your query is a set of rows, with values for rvwr, r, and m. These values can repeat on several lines because a reviewer can review many movies and a movie can be reviewed by many reviewers.

When you use an aggregate function in a WITH or RETURN clause, every expression in that clause that is not inside an aggregate function becomes part of the grouping key.

In your example, every combination of m.title and rvwr.born would be a group and the average would be computed over the subset of records with the same combination of m.title and rvwr.born.

To visualize this, run the two queries below. The first will output the data grouped, so you can see the values of r.rating that will be included in the averages for each group.

The second query applies the average, reducing each group of records with the same combination of m.title and rvwr.born to a single row whose average is computed over the r.rating values shown by the first query.

    match (rvwr:Person)-[r:REVIEWED]->(m:Movie)
    where m.released > 2003
    return m.title, rvwr.born, r.rating
    order by m.title, rvwr.born

    match (rvwr:Person)-[r:REVIEWED]->(m:Movie)
    where m.released > 2003
    return m.title, rvwr.born, avg(r.rating) as rating
    order by m.title, rvwr.born
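
If you want to see the same effect without the movie dataset, here is a minimal sketch that uses made-up inline rows in place of the matched pattern. The titles, birth years, and ratings are assumptions for illustration only. It returns collect() next to avg() so each output row shows which ratings went into its average.

```cypher
// Hypothetical inline rows standing in for the (rvwr, r, m) matches;
// the titles, birth years, and ratings are invented for illustration.
UNWIND [
  {title: 'Movie A', born: 1960, rating: 60},
  {title: 'Movie A', born: 1960, rating: 80},
  {title: 'Movie A', born: 1975, rating: 90},
  {title: 'Movie B', born: 1960, rating: 50}
] AS row
// title and born are not inside an aggregate function,
// so together they form the grouping key.
RETURN row.title AS title,
       row.born AS born,
       collect(row.rating) AS ratings,  // the values in each group
       avg(row.rating) AS rating        // the average over that group
ORDER BY title, born
```

With these four input rows there are three distinct (title, born) combinations, so the query should return three rows. The two 'Movie A' / 1960 rows collapse into one row whose average covers both ratings.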

Note: what I described is explicit grouping. There is also something called implicit grouping, but that feature has been deprecated, and you will see warnings in Neo4j Browser when it occurs. You can refactor your query to make the grouping explicit, which also makes it easier to understand.

This may help too:

venkat1 · March 12, 2022, 10:35am

Nice explanation, Gary Lilienfield
