Understanding grouping key in aggregations

BairDev · June 7, 2021, 10:39am

I have a currently small database with the following relevant part:

            (:Address)
        /
(:User)
        \
            (:UsedMeasure) // multiple of these!!!

The relevant properties here are: User.createdAt (DateTime), Address.addressCityCode (String), UsedMeasure.reductionAvg (Number).

Now I want to list the User with Date (createdAt), its city code and the sum of the reductionAvg (divided by 2, but that does not matter) of all their UsedMeasures.

But when I use a query like this:

MATCH (u:User {isBetUser: true})-->(m:UsedMeasure) WITH u, m, u.createdAt.epochMillis AS date MATCH (u)-->(a:Address) RETURN ROUND((SUM(m.reductionAvg) / 2), 1) AS sum, a.addressCityCode AS PLZ, SUM(m.reductionAvg), apoc.date.format(date, 'ms', 'dd.MM.yyyy') as Datum;

the result is grouped by the Datum, which makes sense due to:

Grouping keys are non-aggregate expressions, that are used to group the values going into the aggregate functions.

How can I get exactly one row per user without adding ID(user) to the RETURN list? (I don't want to get the id here, the data will be sent away in a CSV file.) Is there any way to add a different grouping key to the (ROUND(SUM(m.reductionAvg) / 2), 1) part of the query without returning it?

andrew_bowman · June 7, 2021, 7:00pm

Yes, a key understanding here is that you do not have to do all your aggregations in the RETURN, you can use WITH clauses to perform work like aggregations, and perform later matches from them afterward:

MATCH (u:User {isBetUser: true})-->(m:UsedMeasure) 
WITH u, ROUND((SUM(m.reductionAvg) / 2), 1) AS sum, SUM(m.reductionAvg) as sumReductionAvg
MATCH (u)-->(a:Address)
RETURN u, apoc.date.format(u.createdAt.epochMillis, 'ms', 'dd.MM.yyyy') AS Datum, a.addressCityCode AS PLZ, sum, sumReductionAvg

So here, we get the sums with respect to each u node, and only after we perform the aggregations do we MATCH out to the rest and project out the properties we need.

BairDev · June 11, 2021, 11:17am

Thanks for your suggestion, it is way more easy than my former solution.

"MATCH (u:User {isBetUser: true})-->(m:UsedMeasure) WHERE u.createdAt > datetime($fromDate) " +
    "WITH {id: ID(u), sum: ROUND(SUM(m.reductionAvg) / 2, 1), betCode: COALESCE(u.betCode, '')} AS userMeas UNWIND userMeas as uM " +
    "MATCH (u)-->(a:Address) WHERE ID(u) = uM.id WITH u.createdAt.epochMillis AS date, uM, a" +
    " RETURN apoc.date.format(date, 'ms', 'dd.MM.yyyy') as Date," +
    " uM.sum AS `Some Column Title`, a.addressCityCode as PLZ, " +
    "uM.betCode AS `KLIMAWETTE - Code (optional)`"

So UNWIND might never be useful for processing MATCH results.

andrew_bowman · June 11, 2021, 6:45pm

I wouldn't say that. UNWIND can be very useful, it's just a tool, you will need to develop experience to understand when that particular tool is needed. Just think of it like the opposite of collect(), it takes elements of a list, and emits a row for each of them. There are cases where a FOREACH can't be used, so an UNWIND might be the next thing to try. Or a case where you had to collect something, and need to expand from the nodes collected, so you need to UNWIND them back to rows first.

Topic		Replies	Views
Aggregation, Returns, & Functions Neo4j Website	0	1633	August 5, 2020
Cypher to group data General migrated	3	208	January 24, 2023
Group by date and sum some results Cypher	5	7999	September 26, 2019
Why can't I use a conditional on an aggregating function? Cypher cypher	7	1087	June 1, 2021
I do not understand why I have this result Cypher	14	534	September 25, 2021

July Summer Fun!

Understanding grouping key in aggregations

Related topics