Set upper bound for node and edge count in a apoc.path.spanningTree query

Consider the following query:

MATCH (root:N) 
CALL apoc.path.spanningTree(
    root, 
    {
        maxLevel: 3,
        limit: 10000
    }) 
YIELD path 
WITH root, nodes(path) AS pathNodes, relationships(path) AS pathRels 
WHERE size(pathNodes) <= 10 AND size(pathRels) <= 200 
LIMIT 500 
RETURN [root] AS root, [n IN pathNodes WHERE n <> root] AS nodes, pathRels AS relationships

With this query, I am trying to get a spanning tree and limit the number of nodes to a max of 10 and edges to a maximum of 200. However, this ends up returning 500 nodes and 499 edges.

I would appreciate any thoughts on how I can correct this query.

With that query, you'll get all the nodes that match the conditions, up to LIMIT 500

Seems you have more than 500 nodes that match.

That is correct; do you have any thoughts on how I should modify it?

If your query is returning too many, then I would think you need either an anchoring point (identify nodes to start?) or more refinements ("WHERE XXX").

I executed it in my Graph and it basically returned everything too.

Do you mean this section would not have any effects?

Your initial match has only a label constraint, so you are calling the spanning tree for each N node. You WITH clause contains the 'root' node, so the WHERE clause and LIMIT clause is applied to each 'root' node; thus you can get way more than 500 nodes.

Is this what you want, or do you want to apply this query to just one 'root' node at a time? What is your desired output?

To one root node at a time, and limit the number of nodes and edges.

Currently, I always get 500 nodes and 499 edges, which is driven by LIMIT 500 .

If I understand, you want to run this query for one specific root node. If true, you need to add a constraint to your match clause that looks for N nodes.

No, that won't be what I need.

So, imagine a densely connected graph, where for any given root node, there is a path to every other node in the graph, where some are reachable with 1 hop, some with 10 hops, and with 3 hops, you can reach 80% of the nodes in the graph.

For a given root node, how would you get its neighbors at a 3-hop distance---knowing that if you do not limit, you will end up retrieving 80% of the nodes in your graph---returning a given maximum number of nodes and edges?

Imagine you have a graph like this:

(A)->(B)->(C)->(D)->(E)->(F)->(A)

which is the "root" node where you start?

If you think "give me every one that is 2 hops away":

(A)->(C)
(B)->(D)
(C)->(E)
(E)->(A)
(F)->(B)

You can see that you'll get more if you are "between 2 and 3" and even more "2 and 4", etc.

You need to provide a starting point to the query - otherwise your result is correct (you're retrieving all paths that match)

The spanningTree procedure will return all paths starting from the root node passed. This includes all the intermediate paths too, not just the max length paths. For example, if you have a path a->b->c->d, you will get a->b, a->b->c, and a->b->c->d for a single spanningTree call passing it 'a'. In your query, you are returning the nodes/relationships for each of those paths. As such, you will have many subpaths.

Is this what you want? If not, do you want just the whole paths or do you want just want the collection of nodes connected to the root node by 1 to 3 hops ([b, c, d] in the example)?

1 Like