To clarify, what's happening is that when you perform an aggregation (like collect() or count()), the non-aggregation variables present in the WITH (or RETURN) clause become the grouping key, which provides context for the aggregation, and also makes the grouping key distinct.
For example, if we had a month
variable and a days
variable, and we only had two months (January and February), with 31 days in January and 28 days in February modeled as (:Month)-[:HAS_DAY]->(:Day)
, and we performed the following:
MATCH (month:Month)-[:HAS_DAY]->(day:Day)
RETURN count(day) as dayCount
This would return a single row with a dayCount of 59, the count of all the day nodes found.
But if we changed the return to RETURN month, count(day) as dayCount
, then we would get two rows, one for January with a dayCount of 31, and another for February with a dayCount of 28.
If we kept day
as a variable like this: RETURN month, day, count(day) as dayCount
we would get 59 rows back, since for each row we need to output a distinct month and day, and the dayCount would be 1 for each of those 59 rows (since per row there is only a single day, and the count is with respect to the month and the day, that forms the grouping key of the aggregation).
If you wanted the count of days of the month, yet wanted to keep each day on its own row, then you would need to collect() at the same time that you count(), then UNWIND back to rows afterward, as in Dana's suggestion:
MATCH (month:Month)-[:HAS_DAY]->(day:Day)
WITH month, collect(day) as days, count(day) as dayCount
UNWIND days as day
RETURN month, day, dayCount
At the time we aggregate, the only non-aggregation variable is month
. Days are collected with respect to the month, and the days are counted with respect to the month. Then we UNWIND the list of days back to rows, leaving us with the month, a day of that month, and the count of all days in that month.