apoc.path.subgraphAll not behaving as expected with endNodes parameter

I have been using the apoc.path.subgraphAll procedure to find nodes reachable according to certain label/relationship criteria from a given starting node and in general this has been working as expected. However when I specify endNodes the returned set of nodes and relationships only includes nodes that are included in the endNodes parameter.

Why does subgraphAll behave like this? The documentation would suggest it should return the entire expansion which includes all possible paths to these end nodes?

When using the "spanningTree" procedure I get something similar to what I want, although a spanning tree by definition returns the least number of required returned and so some relationships might not be returned when using this.

Finally I tried using expandConfig which seems to return what I wanted without the potential for missing relationships that the spanningTree procedure causes. But I can't see what's substantively different about the settings for expandConfig and subgraphAll in this case.

This expertly drawn diagram illustrates my question:
image

Why does specifying the set of blue nodes as the endNodes property in a subgraphAll procedure call with the red node as the starting node return only the blue nodes circled red instead of the nodes along the entire expansion circled in green?

Hi Andrew,

Thanks for your question. Can you provide your code that shows how you are querying using subgraphAll? My thought is that your subgraphAll query is lacking a YIELD statement.

When using apoc.path.subgraphAll with endNodes, you need to include a YIELD statement to specify what data you want to return from the procedure.

The YIELD statement allows you to specify the properties and elements you want to include in the output, such as the nodes and relationships in the subgraph, the path between the start and end nodes, and any other properties or data that you want to include.

Here's an example of how to use apoc.path.subgraphAll with endNodes and a YIELD statement:

MATCH (start:Label1)-[rel:REL_TYPE*]->(end:Label2)
CALL apoc.path.subgraphAll(start, {labelFilter: 'Label1|Label2', relationshipFilter: 'REL_TYPE', endNodes: [end], limit: 1000})
YIELD nodes, relationships, path
RETURN nodes, relationships, path

In this example, we are using MATCH to specify the starting and ending nodes, and then calling apoc.path.subgraphAll to find the subgraph between them. The YIELD statement specifies that we want to return the nodes, relationships, and path elements from the subgraph.

Note that the YIELD statement is required when using apoc.path.subgraphAll with endNodes, because it determines what data will be returned from the procedure. If you don't include a YIELD statement when calling apoc.path.subgraphAll with endNodes, the procedure will still execute and find the subgraph between the start and end nodes, but it will not return any data.

Without a YIELD statement, the subgraph will be created and stored in memory, but it will not be returned to the user or used in any subsequent queries. This means that you won't be able to use the subgraph in further processing or analysis, and the results of the procedure won't be visible in the Neo4j Browser or any other output.

By using a YIELD statement afterapoc.path.subgraphAll with endNodes you are able to access and use the results of the procedure in subsequent queries and analysis.

Try rerunning your query with the included 'YIELD' and let us know how it goes.

My query does not lack a yield statement - it doesn't return no data it returns only a subset of the data I am expecting as I explained. I do not have 'path' in my yield as the APOC documentation only specifies nodes and relationships as outputs of subgraphAll and when I tried including path after your reply it just returns an error stating that path is "Unknown procedure output".

MATCH (m:LABEL {property1: 'value1a'})
WITH COLLECT(m) AS ms
MATCH (n:LABEL {property1:'value1b', property2: 'value2'})
CALL apoc.path.subgraphAll(n, {labelFilter: 'LABEL', relationshipFilter: '<', endNodes: ms})
YIELD nodes, relationships
RETURN nodes, relationships

Relating this example query back to my sketch in my original post the MATCH m clause would match all the blue nodes and the MATCH n would match the red node only. The query including the endNodes parameter would only return the two blue nodes reachable from the red node (circled in red), not everything circled in green as expected. Without the endNodes parameter it would return everything circled in green and the yellow node as expected.

Edit: For clarity, as I already mentioned, the following query does return what I expect the above query should return:

MATCH (m:LABEL {property1: 'value1a'})
WITH COLLECT(m) AS ms
MATCH (n:LABEL {property1:'value1b', property2: 'value2'})
CALL apoc.path.expandConfig(n, {labelFilter: 'LABEL', relationshipFilter: '<', endNodes: ms})
YIELD path
RETURN path

Based on the APOC documentation I don't understand why these two queries don't behave in the same way?

Apologies for the typo and include "path" in the earlier code. I see clearly now from your code that it is not a YIELD issue at all.

That said, let me ask a follow up questions. Do each of the relationships from red to purple and purple to blue meet the constraints of your query?