@Joel, thanks for your reply.
Here is an attempt at a minimal example to show, from my side, what inhibits the performance. I will give you the actual full query I am aiming for further below, but first, to answer your last question, here is a somewhat simplified example.
MATCH (s:Session {countryoforigin:"CH"})
RETURN
s.countryoforigin AS countryoforigin,
apoc.temporal.format(s.sessionstarttime,"YYYY-'W'ww") AS activity_week,
count(DISTINCT s) AS session_count
This returns something like the following:
╒═════════════════╤═══════════════╤═══════════════╕
│"countryoforigin"│"activity_week"│"session_count"│
╞═════════════════╪═══════════════╪═══════════════╡
│"CH"             │"2020-W24"     │xxxx12         │
├─────────────────┼───────────────┼───────────────┤
│"CH"             │"2020-W18"     │xxxx66         │
├─────────────────┼───────────────┼───────────────┤
│"CH"             │"2020-W22"     │xxxx29         │
├─────────────────┼───────────────┼───────────────┤
│"CH"             │"2020-W23"     │xxxx83         │
╘═════════════════╧═══════════════╧═══════════════╛
I crossed out the real results in the session_count column.
My final goal is to make the following query feasible with, let us say, up to 1 billion Session nodes.
// Weekly Active User Analysis
MATCH (u:User)-[:IS_IN_COHORT]-(c:CohortWeekly) WHERE c.cohort >= "2020-W01"
MATCH (c)-[:IS_IN_COHORT]-(u)-[g:GAVE_UB_CONSENT]-(d:Device)
WITH u,d,c,count(g) AS total
MATCH (u)-[:IS_IN_COHORT]-(c)
MATCH (c)-[:IS_IN_COHORT]-(u)-[g:GAVE_UB_CONSENT]-(d)
MATCH (u)-[:GAVE_UB_CONSENT]-(d)-[:IS_SESSION_OF_DEVICE]-(s:Session)
RETURN
s.countryoforigin AS countryoforigin,
c.cohort AS cohort_week,
c.total_user_count AS total_user_count,
total AS users_with_ub_consent,
apoc.temporal.format(s.sessionstarttime,"YYYY-'W'ww") as activity_week,
count(DISTINCT u) AS active_users,
(toFloat(count(DISTINCT u.userid))/total)*100 AS active_rate_percent,
count(s) AS session_count
ORDER BY
countryoforigin,
cohort_week,
activity_week
But since even the simple query does not run well, I have no chance of getting the larger query to run and finish.
I am basically wondering how to handle queries where the aggregation seems too big to fit into memory.
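For what it is worth, one idea I have been toying with is to precompute the week label as a stored property in batches, so that the aggregation later only groups by plain, indexable properties instead of calling apoc.temporal.format on every row. This is purely a sketch, not something I have tested; the property name activity_week, the batch settings, and the index name below are just assumptions on my side:

// Sketch: write the week label once, in batches, via apoc.periodic.iterate
CALL apoc.periodic.iterate(
  "MATCH (s:Session) WHERE s.activity_week IS NULL RETURN s",
  "SET s.activity_week = apoc.temporal.format(s.sessionstarttime, \"YYYY-'W'ww\")",
  {batchSize: 10000, parallel: false}
);

// Then (assuming Neo4j 4.x index syntax) a composite index on the grouping keys
CREATE INDEX session_country_week FOR (s:Session) ON (s.countryoforigin, s.activity_week);

// ...so the minimal aggregation only touches stored properties
MATCH (s:Session {countryoforigin:"CH"})
RETURN
s.countryoforigin AS countryoforigin,
s.activity_week AS activity_week,
count(DISTINCT s) AS session_count

Is something along these lines the right direction, or is there a better pattern for aggregations at this scale?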