Optimising nested COLLECT aggregations in Cypher Query

ciaran.byrne1 · May 6, 2021, 9:42pm

My graph represents Backlogs, Stories, Sprints, Tasks, Users, and Resources. A Backlog can have Stories, which can have Tasks, which can be associated with Sprints and Users. I would like to return a nested structure of Backlog -> Stories -> Tasks, and also show the Sprint and Users associated with each Task.

For a small Backlog with 11 Stories and 40 Tasks, I'm getting 1,972 db hits taking 1208ms.
That seems quite sluggish, and I think the issue is with the nested COLLECT aggregations. I'm hoping someone might have some suggestions to improve performance.

My query is below along with the profiled execution plan

MATCH (backlog:Backlog{id:'2957822f34504862ae51e8f69980a15f'})
OPTIONAL MATCH (backlog)-[:BACKLOG_HAS_STORY]->(story)
OPTIONAL MATCH (story)-[:STORY_HAS_TASK]->(task)
OPTIONAL MATCH (task)-[:TASK_HAS_RESOURCE]->(resource)
OPTIONAL MATCH (sprint)-[:SPRINT_HAS_TASK]->(task)
OPTIONAL MATCH (owner)-[:ACCOUNT_OWNS_STORY]->(story)
WITH backlog,story,COLLECT(resource{.*}) as resources, task, sprint,owner
WITH backlog,story,COLLECT(task{.*,resources:resources,sprint:sprint{.*}}) as tasks,owner
WITH backlog, COLLECT(story{.*,owner:owner{.id,.firstName,.lastName},tasks: tasks}) as stories
RETURN backlog{.*,stories:stories}

andrew_bowman · May 7, 2021, 1:24am

In this case it will be more convenient to use pattern comprehensions. A pattern comprehension is like doing an optional match and collecting the results in one step, and it avoids the cardinality issues that you're likely seeing as a result of your back-to-back optional matches.
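
As a rough way to see the cardinality effect, one could count the rows the chained optional matches produce before any aggregation runs. This is only a diagnostic sketch that reuses the labels, relationship types, and id from the original query; it is not part of the fix, and the actual count depends on the data.

```cypher
// Diagnostic sketch: each OPTIONAL MATCH is evaluated once per existing row,
// so a task with several resources (or sprints) multiplies the row count
// that the later COLLECT steps have to process.
MATCH (backlog:Backlog {id: '2957822f34504862ae51e8f69980a15f'})
OPTIONAL MATCH (backlog)-[:BACKLOG_HAS_STORY]->(story)
OPTIONAL MATCH (story)-[:STORY_HAS_TASK]->(task)
OPTIONAL MATCH (task)-[:TASK_HAS_RESOURCE]->(resource)
OPTIONAL MATCH (sprint)-[:SPRINT_HAS_TASK]->(task)
OPTIONAL MATCH (owner)-[:ACCOUNT_OWNS_STORY]->(story)
// Number of intermediate rows before anything is collected
RETURN count(*) AS rowsBeforeAggregation
```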

Maybe something like this:

MATCH (backlog:Backlog{id:'2957822f34504862ae51e8f69980a15f'})
OPTIONAL MATCH (backlog)-[:BACKLOG_HAS_STORY]->(story)
WITH backlog, story, 
 [(story)-[:STORY_HAS_TASK]->(task) | task {.*, resources:[(task)-[:TASK_HAS_RESOURCE]->(resource) | resource {.*}], 
  sprint:[(sprint)-[:SPRINT_HAS_TASK]->(task) | sprint {.*}]}] as tasks
WITH backlog, story {.*, owner:[(owner)-[:ACCOUNT_OWNS_STORY]->(story) | owner {.id, .firstName, .lastName}], tasks} as story
WITH backlog, collect(story) as stories
RETURN backlog {.*, stories} as backlog

Technically, you could skip the OPTIONAL MATCH from backlog to stories and make that pattern the top-level pattern comprehension instead; that's up to you.
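
A minimal sketch of that fully nested variant, assuming the same labels, relationship types, and properties as the query above. It has not been profiled against the original data, so treat it as an illustration rather than a measured improvement.

```cypher
// Sketch: the backlog-to-story pattern becomes the outermost pattern
// comprehension, so no OPTIONAL MATCH or collect() is needed.
MATCH (backlog:Backlog {id: '2957822f34504862ae51e8f69980a15f'})
RETURN backlog {
  .*,
  // One map per story; an empty list if the backlog has no stories
  stories: [(backlog)-[:BACKLOG_HAS_STORY]->(story) | story {
    .*,
    owner: [(owner)-[:ACCOUNT_OWNS_STORY]->(story) | owner {.id, .firstName, .lastName}],
    // One map per task, each with its own nested resources and sprints
    tasks: [(story)-[:STORY_HAS_TASK]->(task) | task {
      .*,
      resources: [(task)-[:TASK_HAS_RESOURCE]->(resource) | resource {.*}],
      sprint: [(sprint)-[:SPRINT_HAS_TASK]->(task) | sprint {.*}]
    }]
  }]
} AS backlog
```

As in the version above, owner and sprint come back as lists, because a pattern comprehension always returns a list even when there is at most one match.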

ciaran.byrne1 · May 7, 2021, 6:06am

Wow! I had no idea about pattern comprehensions. It reduced my query down to 8ms and is far more elegant. Thanks so much @andrew_bowman.
