Loop until subqueries does not make any more changes?

v2belleville · April 17, 2024, 2:08pm

in short the question is whether CYPHER has a way to replace the for like loop we can build with UNWIND and range() by a while like loop that terminates when the inside subquery makes no change anymore

The context I would use this in is that I have a graph with a specific shape by construction : it has two types of relationships : child and jump. There is a "pseudo" root, with children that have been build to represent main areas of the graph (VGsubgraph)
The process is seeded by setting the root children VGsubgraph properties to a list with just their own name in it. It then propagates those VGsubgraph values down the child relationship. Some nodes have several VGsubgraph values, but I know the traversal will end because before adding a new value, I check it's not already in the list. So after a certain number of iterations, the query will make no change anymore.

unwind range(0,30) as i
call {
    match (n:thg)-[:child|typed {Relation:1}]->(child:thg) where
    n.VGsubgraph<>[] and n.VGsubgraph<> child.VGsubgraph and
    any(val in n.VGsubgraph where not apoc.coll.contains(child.VGsubgraph,val))
    set child.VGsubgraph=child.VGsubgraph+[val in n.VGsubgraph where not val in child.VGsubgraph ] }

Given the current state of the graph, I know 30 iterations will be enough to terminate, but I'd rather write it with an end condition (no change) to make sure it will always work, even if more depth gets required.

(I'm doing it this way, because a pattern like (n:thg)-[:child|typed0..30 {Relation:1}]->(child:thg)
uses an awful lot of memory because it explores far too many paths. Propagating property value one hop at a time seems like the most efficient way to me, but suggestions welcome if you can think of another one)

michael.hunger · April 20, 2024, 11:02pm

Do you need the full paths or just the distinct end nodes?

You could check out the new Quantified Path Patterns that are more efficent than just a path expansion.

v2belleville · April 22, 2024, 3:17pm

Hi Mickael, thanks for you reply.

I've looked into QPP and I feel I'm missing some things about it:

syntactically it feels like there is a non explicit relation in the pattern

does that mean the last node before (first_tx) is identified with the first (tx_i) in the pattern (same for last node in pattern and first node after ) ?

in terms of what difference it make for the planner / why it is faster

But from what I gathered, it still explores ALL possible paths

When in the case described above, it is about spreading a value, one hop at time. No need for the whole path from the initial anchor, for each set.
just recursively matching

(n:thg)-[:child|typed {Relation:1}]->(child:thg)

where n has the value and child does not, until no change are made, seems like a more efficient strategy to me

the above query works fine, it is just that I have to arbitrarily set a number of iteration, when "no change in db by sub-query" would be more generic

pierre.halftermeyer · April 22, 2024, 7:47pm

Hi Véronique,

Have you given a try to apoc.periodic.commit
Basically, this procedure runs a statement while the value of limit is not 0.
If you are able to count changes, you have your while loop.

To do so, we need to slightly refactor the statement inside the loop. It needs to add 1 to the value of limit only when the SET statement actually does something, which means when the list of values to append is not empty.

It could look like this (not tested):

CALL apoc.periodic.commit('MATCH (n:thg)-[:child|typed {Relation:1}]->(child:thg)
    WHERE n.VGsubgraph<>[] and n.VGsubgraph<> child.VGsubgraph
        AND any(val in n.VGsubgraph WHERE NOT apoc.coll.contains(child.VGsubgraph,val))
    WITH n, child, [val IN n.VGsubgraph WHERE NOT val IN child.VGsubgraph ] AS to_append
    WHERE to_append <> []
    SET child.VGsubgraph=child.VGsubgraph + to_append        
    WITH count(child) as limit
    RETURN limit',
    {});

[edit] Here is a (maybe) simpler query that does basically the same:

CALL apoc.periodic.commit('
MATCH (n:thg)-[:child|jump {Relation:1}]->(child:thg)
UNWIND n.VGsubgraph AS val
WITH DISTINCT val, child
WHERE NOT val IN child.VGsubgraph
WITH child, COLLECT (val) AS vals
SET child.VGsubgraph = child.VGsubgraph + vals
WITH count(child) AS limit
RETURN limit'
,{})

v2belleville · April 24, 2024, 7:49am

hello Pierre, thanks for you answer (good to see you here ! )

Never thought of using apoc.periodic.commit like this !! so clever ! (for the record, it is also in the current version of apoc, doc page here apoc.periodic.commit - APOC Documentation

now I'm wondering why you say the second option is (maybe) simpler?
what is it you like better ?

as a matter of fact it does run a little quicker (about 17s to 23s)
I'd like to know why, because if I profile runs, they are more rows and more db hit in the second option
this is for the first run

I thought maybe it is because for later runs the tendency is reversed, but if I read PROFILE right, it remains the same (see this pdf for details

also, now that you are here Pierre , if you could shed some light on my two above questions about QPP, it would be very helpful !

v2belleville · April 29, 2024, 8:22am

I've marked the solution for the initial question, but if someone has complementary info about my above questions concerning QPP or the difference between the 2 queries Pierre suggested, feel free to follow up !

pierre.halftermeyer · April 30, 2024, 8:58pm

Hi Véronique,

By simpler, I meant readable, apoc-free and Zen
Basically, both are doing the same so I would have expected similar perf.
There may be a time overhead when you call apoc procedures but I'm only guessing

pierre.halftermeyer · April 30, 2024, 9:16pm

I forgot to answer the part about QPPs.

For the first question, the answer is yes.
It you have a pattern like
...(before)((x)...(y)){quantifier}(after)

The binding will be done like this:

before <---> x[0]
x[i] <---> y[i-1] when i >= 1
y[-1] <---> after

In case of a match with zero iteration of the repeating pattern:

before <---> after

I like to draw bindings as pairs of half-circles:
...◖(◗<--○-->◖){q}◗...

Second question. I'd say QPPs are good when you need to filter inside the repeating pattern to prune the search space. QPPs enable you to name everything you need to tell the planner what you need.

I see no obvious way to leverage them in your case because you're writing logic for a given node depends on too many paths and that's where the complexity hides.

pierre.halftermeyer · April 30, 2024, 9:23pm

Last thing. If you absolutely need to make this query faster, you should consider writing a label-propagation-like procedure using the Pregel API.

Topic		Replies	Views
Modification of apoc.subgraphAll procedure Procedures & APOC apoc , cypher	5	145	January 2, 2024
Difficult query not working Cypher operations	5	1072	May 21, 2019
Need Help In query Cypher apoc , cypher	2	909	November 20, 2019
Help re-writing query Cypher performance , cypher	7	958	March 11, 2019
Simple query returns (no records) Cypher apoc , cypher	3	155	May 2, 2023

Loop until subqueries does not make any more changes?

Related Topics