I'm struggling to query a list of ids and years. I need to count distinct ids per year,
but also need distinct ids over the whole list.
Simple example
id year
11111 2000
22222 2000
11111 2000
33333 2001
33333 2001
33333 2002
44444 2002
becomes (via distinct count)
year uniqueIds
2000 2
2001 1
2002 2
but I also need to know there were 4 unique ids in the entire set, from the same query. The list takes time to assemble and I'm trying to avoid a second query.
Zach
The challenge is that you want to process the rows in two entirely different ways. Cypher is not really a scripting language with branching and controls capabilities. You basically have to process the stream of rows.
Here is a solution if you don't mind collecting the data, so the data can be processed separately with two call subqueries. The first 'with' and 'unwind' are a means of getting your test data in. The first step is to collect the data so a single row can be sent to each call subquery. Each subquery then calculates the information you want. The results are joined together in the outer query.
with [
{id: 11111, year: 2000},
{id: 22222, year: 2000},
{id: 11111, year: 2000},
{id: 33333, year: 2001},
{id: 33333, year: 2001},
{id: 33333, year: 2002},
{id: 44444, year: 2002}
] as inputs
unwind inputs as input
with collect(input) as data
call {
with data
unwind data as x
with x.year as year, count(distinct x.id) as count
return collect({year: year, uniqueIds: count}) as yearData
}
call {
with data
unwind data as x
return count(distinct x.id) as uniqueIds
}
return yearData, uniqueIds