Counting nodes based on date

I have a set of nodes with a creationDate property that is indexed, and I am trying to count how many nodes were created for every day in a given date range, including days for which none were created. To do this for the month of December, I tried:

// Query 1
MATCH (m:item)
WHERE datetime('2019-12-01') <= m.creationDate < datetime('2020-01-01')
RETURN date(m.creationDate) AS date, COUNT(*) AS count
ORDER BY date

However, this query does not return dates for which there are no nodes. I managed to get this working:

// Query 2
WITH date('2019-12-01') AS startDate, date('2019-12-31') AS endDate
WITH startDate, duration.inDays(startDate, endDate).days AS days
WITH [day in range(0, days) | startDate + duration({days: day})] AS dates
UNWIND dates AS date
MATCH (m:item)
WHERE date(m.creationDate) = date
RETURN date, COUNT(*) AS count

The problem is that query 2 takes 27 seconds, while query 1 takes 1 second. This makes sense–query 2 is essentially running query 1 once for each day in the range.

I know I can post-process the data in my code to fill in the gaps, but I would prefer not to have to do the extra step if possible. Is there a way to do this in Cypher without the time penalty of rerunning query 1 once for each day in the range?

just faced the same issue and got a solution:

MATCH (m:item)
return date(m.creationDate) as date,count(*) as total
UNION
WITH date('2022-06-01') AS startDate, date('2022-07-07') AS endDate
WITH startDate, duration.inDays(startDate, endDate).days AS days
WITH [day in range(0, days) | startDate + duration({days: day})] AS dates
UNWIND dates AS date
MATCH (m:item)
where date(m.creationDate)<>date
return date,0 as total

This seems to work. It may look a little convoluted, but it does not require a second search to fill in the missing dates. Instead it does a little processing to get the missing dates and creates a map using a apoc procedure. I borrowed @Umer's logic for generating all the dates between two given dates.

with date('2020-01-01') AS startDate, date('2020-01-10') AS endDate
with duration.inDays(startDate, endDate).days AS days, startDate, endDate
with [day in range(0, days) | toString(startDate + duration({days: day}))] AS allDates, startDate, endDate
match (s:Sample) where startDate <= s.creationDate <= endDate
with toString(s.creationDate) as date, count(s) as dateCounts, allDates
with collect(date) as dateList, collect(dateCounts) as countList, allDates
with [i in allDates where not i in dateList | i] as zeroDates, dateList, countList
with zeroDates, [i in zeroDates | 0] as zeros, dateList, countList
return apoc.map.fromLists(dateList + zeroDates, countList + zeros) as result

Test Data (I used @daveb's script):

unwind(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-09', '2020-01-09', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-09']) as dt 
create (:Sample { creationDate: date(dt) });

Result:

{
  "2020-01-10": 0,
  "2020-01-01": 2,
  "2020-01-02": 2,
  "2020-01-03": 2,
  "2020-01-04": 0,
  "2020-01-05": 0,
  "2020-01-06": 2,
  "2020-01-07": 0,
  "2020-01-08": 0,
  "2020-01-09": 3
}

I refactored the query using a different apoc function. I think it is slightly less complicated as a result.

with date('2020-01-01') AS startDate, date('2020-01-10') AS endDate
WITH duration.inDays(startDate, endDate).days AS days, startDate, endDate
WITH [day in range(0, days) | toString(startDate + duration({days: day}))] AS allDates, startDate, endDate
match (s:Sample) where startDate <= s.creationDate <= endDate
with toString(s.creationDate) as date, count(s) as dateCounts, allDates
with collect([date, dateCounts]) as dateCounts, collect(date) as datesWithCounts, allDates
with [i in allDates where not i in datesWithCounts | [i, 0]] as datesWithZeroCounts, dateCounts
return apoc.map.fromPairs(dateCounts + datesWithZeroCounts) as result