Collect intermediate nodes from a variable-length query for further processing

Think of a process flow or a precedence/successor network. There are many activities in the overall process flow performed by many different groups. Given a certain group, if they have rework to do, they will impact many of the downstream activities and thus the groups that perform those activities. I can easily write a query similar to the one shown below:

MATCH p = (g:Group {id:'XXX'})-[:PERFORMS]->(a1:Activity)-[:PRECEDES*]->(a2:Activity {id:'Finish'})
RETURN p

When I do this, I get the expected graph, showing all the activities the initial group is responsible for and how those activity nodes lead along a variable-length path to the terminal a2 node. In my example, about 50 activities are returned in total, including three "first level" activities and the final activity (all of which were already known).
What I want is to collect the 46 intermediate activities that were "hidden", find which (g2:Group)-[:PERFORMS]-> those (hidden:Activity) nodes, and RETURN g2.name, hidden.id.

Can someone help with the additional Cypher or recommend an APOC procedure to help me out? I feel like this should be pretty simple, but the solution escapes me.

Hello @FourMoBro,

You can use the nodes() function to get the list of nodes:

MATCH p = (g:Group {id:'XXX'})-[:PERFORMS]->(a1:Activity)-[:PRECEDES*]->(a2:Activity {id:'Finish'})
RETURN p, [n IN nodes(p) WHERE n:Activity] AS activities

Then you can directly work with this list to only keep the nodes you want.
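For instance, one way to narrow the list is to slice it by position. This is only a sketch: it assumes every matched path has the shape [g, a1, ..., a2], so the first two nodes and the last one are the already-known ones. The alias intermediateIds is made up for illustration.

```cypher
MATCH p = (g:Group {id:'XXX'})-[:PERFORMS]->(a1:Activity)-[:PRECEDES*]->(a2:Activity {id:'Finish'})
// nodes(p) is [g, a1, ..., a2]; the slice [2..-1] drops g, a1 and the final a2
RETURN [n IN nodes(p)[2..-1] | n.id] AS intermediateIds
```

This returns one list per path, so the same activity can show up in several rows. A first-level activity that also sits in the middle of another path would still be included.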

Regards,
Cobra

Thanks @Cobra
I knew it was simple. Here is the final query:

MATCH p = (g:Group {id:'XXX'})-[:PERFORMS]->(a1:Activity)-[:PRECEDES*]->(a2:Activity {id:'Finish'})
WITH [n IN nodes(p) WHERE n:Activity] AS Activities
UNWIND Activities AS a3
MATCH (g:Group)-[:PERFORMS]->(a3)
RETURN g.id, collect(DISTINCT a3.id)
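
Note that this query also returns the first-level activities and Finish, since the list holds every Activity on each path. If only the "hidden" ones are wanted, with g2.name as in the original question, a possible variant is to bind the intermediate node directly in the pattern. This is an untested sketch; the aliases groupName and hiddenActivityIds are made up, and on a large graph two variable-length segments may be slow.

```cypher
// hidden must have at least one PRECEDES hop before it and one after it,
// so the first-level start of the path and Finish itself are not bound to it
MATCH (:Group {id:'XXX'})-[:PERFORMS]->(:Activity)-[:PRECEDES*]->(hidden:Activity)-[:PRECEDES*]->(:Activity {id:'Finish'})
// the same activity can be reached along many paths, so deduplicate first
WITH DISTINCT hidden
MATCH (g2:Group)-[:PERFORMS]->(hidden)
RETURN g2.name AS groupName, collect(hidden.id) AS hiddenActivityIds
```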