Counting nodes based on date

mventimi · February 1, 2020, 2:58am

I have a set of nodes with a creationDate property that is indexed, and I am trying to count how many nodes were created for every day in a given date range, including days for which none were created. To do this for the month of December, I tried:

// Query 1
MATCH (m:item)
WHERE datetime('2019-12-01') <= m.creationDate < datetime('2020-01-01')
RETURN date(m.creationDate) AS date, COUNT(*) AS count
ORDER BY date

However, this query does not return dates for which there are no nodes. I managed to get this working:

// Query 2
WITH date('2019-12-01') AS startDate, date('2019-12-31') AS endDate
WITH startDate, duration.inDays(startDate, endDate).days AS days
WITH [day in range(0, days) | startDate + duration({days: day})] AS dates
UNWIND dates AS date
MATCH (m:item)
WHERE date(m.creationDate) = date
RETURN date, COUNT(*) AS count

The problem is that query 2 takes 27 seconds, while query 1 takes 1 second. This makes sense–query 2 is essentially running query 1 once for each day in the range.

I know I can post-process the data in my code to fill in the gaps, but I would prefer not to have to do the extra step if possible. Is there a way to do this in Cypher without the time penalty of rerunning query 1 once for each day in the range?

david_allen · February 1, 2020, 1:17pm

unwind(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03']) as dt 
create (:Sample { creationDate: datetime(dt) });

(Note that these are datetimes in there and not dates)

match (s:Sample) 
return distinct(date(s.creationDate)) as date, count(s) as nodesWithThisDate

Yields

"2020-01-01"    2
"2020-01-02"    1
"2020-01-03"    1

If there's a node in there with a null, it will show up and count nulls.

mventimi · February 2, 2020, 5:20am

Thanks for the response David!

In your example, if I changed the first statement to be:

unwind(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-04']) as dt 
create (:Sample { creationDate: datetime(dt) });

then run the same query you give, the results would be:

"2020-01-01"    2
"2020-01-02"    1
"2020-01-04"    1

The results I need are:

"2020-01-01"    2
"2020-01-02"    1
"2020-01-03"    0
"2020-01-04"    1

In a relational database, this would be an OUTER JOIN operation between the aggregated data and a list of dates. In Neo4j, I'm not sure if an analogous operation is possible. Query 2 isn't the same–it's more of an iterated approach. OPTIONAL MATCH seems like the closest thing to OUTER JOIN, but

The only option I see (other than post-processing) is to create "date bin" nodes for each possible creation date:

UNWIND (['2019-12-01', '2019-12-02', ..., '2019-12-31') AS dt
CREATE (:DateBin {date: date(dt)})

Then I can establish relationships between those nodes and the item nodes:

MATCH (m:item)
WITH m
MATCH (n:DateBin {date: date(m.creationDate})
MERGE (m)-[:CREATED_ON]->(n)

Then the report can be run using those nodes:

MATCH (n:DateBin)
WHERE date('2019-12-01') <= n.date < date('2020-01-01')
WITH n
OPTIONAL MATCH (m:item)-[:CREATED_ON]->(n)
RETURN n.date AS date, COUNT(m) AS count
ORDER BY date

(I haven't actually tried this. Apologies if there are errors; I just wanted to illustrate the idea.)

The only real downside to this approach is the additional structure needed in the database to support these queries. Any chance there's an easier way that isn't O(n) on the size of the date range?

Topic		Replies	Views
Counting nodes based on date Neo4j Graph Platform migrated	3	101	July 15, 2022
Getting the node counts between 10 minutes periods Cypher cypher	1	226	October 15, 2021
Cypher for counting node property that meet certain conditions Cypher	4	290	April 19, 2021
Count and nodes for each day of week and each hour Cypher	3	609	March 21, 2021
Creating date nodes by looping through a start date and an end date Procedures & APOC	3	865	October 22, 2020

Counting nodes based on date

Related topics