Loop until subqueries does not make any more changes?

in short the question is whether CYPHER has a way to replace the for like loop we can build with UNWIND and range() by a while like loop that terminates when the inside subquery makes no change anymore

The context I would use this in is that I have a graph with a specific shape by construction : it has two types of relationships : child and jump. There is a "pseudo" root, with children that have been build to represent main areas of the graph (VGsubgraph)
The process is seeded by setting the root children VGsubgraph properties to a list with just their own name in it. It then propagates those VGsubgraph values down the child relationship. Some nodes have several VGsubgraph values, but I know the traversal will end because before adding a new value, I check it's not already in the list. So after a certain number of iterations, the query will make no change anymore.

unwind range(0,30) as i
call {
    match (n:thg)-[:child|typed {Relation:1}]->(child:thg) where
    n.VGsubgraph<>[] and n.VGsubgraph<> child.VGsubgraph and
    any(val in n.VGsubgraph where not apoc.coll.contains(child.VGsubgraph,val))
    set child.VGsubgraph=child.VGsubgraph+[val in n.VGsubgraph where not val in child.VGsubgraph ] }

Given the current state of the graph, I know 30 iterations will be enough to terminate, but I'd rather write it with an end condition (no change) to make sure it will always work, even if more depth gets required.

(I'm doing it this way, because a pattern like (n:thg)-[:child|typed0..30 {Relation:1}]->(child:thg)
uses an awful lot of memory because it explores far too many paths. Propagating property value one hop at a time seems like the most efficient way to me, but suggestions welcome if you can think of another one)

Do you need the full paths or just the distinct end nodes?

You could check out the new Quantified Path Patterns that are more efficent than just a path expansion.

Hi Mickael, thanks for you reply.

I've looked into QPP and I feel I'm missing some things about it:

  1. syntactically it feels like there is a non explicit relation in the pattern
    image

does that mean the last node before (first_tx) is identified with the first (tx_i) in the pattern (same for last node in pattern and first node after ) ?

  1. in terms of what difference it make for the planner / why it is faster

But from what I gathered, it still explores ALL possible paths

When in the case described above, it is about spreading a value, one hop at time. No need for the whole path from the initial anchor, for each set.
just recursively matching

(n:thg)-[:child|typed {Relation:1}]->(child:thg)

where n has the value and child does not, until no change are made, seems like a more efficient strategy to me

the above query works fine, it is just that I have to arbitrarily set a number of iteration, when "no change in db by sub-query" would be more generic

Hi Véronique,

Have you given a try to apoc.periodic.commit
Basically, this procedure runs a statement while the value of limit is not 0.
If you are able to count changes, you have your while loop.

To do so, we need to slightly refactor the statement inside the loop. It needs to add 1 to the value of limit only when the SET statement actually does something, which means when the list of values to append is not empty.

It could look like this (not tested):

CALL apoc.periodic.commit('MATCH (n:thg)-[:child|typed {Relation:1}]->(child:thg)
    WHERE n.VGsubgraph<>[] and n.VGsubgraph<> child.VGsubgraph
        AND any(val in n.VGsubgraph WHERE NOT apoc.coll.contains(child.VGsubgraph,val))
    WITH n, child, [val IN n.VGsubgraph WHERE NOT val IN child.VGsubgraph ] AS to_append
    WHERE to_append <> []
    SET child.VGsubgraph=child.VGsubgraph + to_append        
    WITH count(child) as limit
    RETURN limit',
    {});

[edit] Here is a (maybe) simpler query that does basically the same:

CALL apoc.periodic.commit('
MATCH (n:thg)-[:child|jump {Relation:1}]->(child:thg)
UNWIND n.VGsubgraph AS val
WITH DISTINCT val, child
WHERE NOT val IN child.VGsubgraph
WITH child, COLLECT (val) AS vals
SET child.VGsubgraph = child.VGsubgraph + vals
WITH count(child) AS limit
RETURN limit'
,{})

hello Pierre, thanks for you answer (good to see you here ! :slightly_smiling_face: )

Never thought of using apoc.periodic.commit like this !! so clever ! (for the record, it is also in the current version of apoc, doc page here apoc.periodic.commit - APOC Documentation

now I'm wondering why you say the second option is (maybe) simpler?
what is it you like better ?

as a matter of fact it does run a little quicker (about 17s to 23s)
I'd like to know why, because if I profile runs, they are more rows and more db hit in the second option
this is for the first run

I thought maybe it is because for later runs the tendency is reversed, but if I read PROFILE right, it remains the same (see this pdf for details

also, now that you are here Pierre :grin:, if you could shed some light on my two above questions about QPP, it would be very helpful !

I've marked the solution for the initial question, but if someone has complementary info about my above questions concerning QPP or the difference between the 2 queries Pierre suggested, feel free to follow up !

Hi Véronique,

By simpler, I meant readable, apoc-free and Zen :person_in_lotus_position:
Basically, both are doing the same so I would have expected similar perf.
There may be a time overhead when you call apoc procedures but I'm only guessing :man_shrugging:

I forgot to answer the part about QPPs.

  1. For the first question, the answer is yes.
    It you have a pattern like
    ...(before)((x)...(y)){quantifier}(after)

The binding will be done like this:

before <---> x[0]
x[i] <---> y[i-1] when i >= 1
y[-1] <---> after

In case of a match with zero iteration of the repeating pattern:

before <---> after

I like to draw bindings as pairs of half-circles:
...◖(◗<--○-->◖){q}◗...

  1. Second question. I'd say QPPs are good when you need to filter inside the repeating pattern to prune the search space. QPPs enable you to name everything you need to tell the planner what you need.

I see no obvious way to leverage them in your case because you're writing logic for a given node depends on too many paths and that's where the complexity hides.

Last thing. If you absolutely need to make this query faster, you should consider writing a label-propagation-like procedure using the Pregel API.