in short the question is whether CYPHER has a way to replace the for like loop we can build with UNWIND and range() by a while like loop that terminates when the inside subquery makes no change anymore
The context I would use this in is that I have a graph with a specific shape by construction : it has two types of relationships : child and jump. There is a "pseudo" root, with children that have been build to represent main areas of the graph (VGsubgraph)
The process is seeded by setting the root children VGsubgraph properties to a list with just their own name in it. It then propagates those VGsubgraph values down the child relationship. Some nodes have several VGsubgraph values, but I know the traversal will end because before adding a new value, I check it's not already in the list. So after a certain number of iterations, the query will make no change anymore.
unwind range(0,30) as i
call {
match (n:thg)-[:child|typed {Relation:1}]->(child:thg) where
n.VGsubgraph<>[] and n.VGsubgraph<> child.VGsubgraph and
any(val in n.VGsubgraph where not apoc.coll.contains(child.VGsubgraph,val))
set child.VGsubgraph=child.VGsubgraph+[val in n.VGsubgraph where not val in child.VGsubgraph ] }
Given the current state of the graph, I know 30 iterations will be enough to terminate, but I'd rather write it with an end condition (no change) to make sure it will always work, even if more depth gets required.
(I'm doing it this way, because a pattern like (n:thg)-[:child|typed0..30 {Relation:1}]->(child:thg)
uses an awful lot of memory because it explores far too many paths. Propagating property value one hop at a time seems like the most efficient way to me, but suggestions welcome if you can think of another one)
I've looked into QPP and I feel I'm missing some things about it:
syntactically it feels like there is a non explicit relation in the pattern
does that mean the last node before (first_tx) is identified with the first (tx_i) in the pattern (same for last node in pattern and first node after ) ?
in terms of what difference it make for the planner / why it is faster
But from what I gathered, it still explores ALL possible paths
When in the case described above, it is about spreading a value, one hop at time. No need for the whole path from the initial anchor, for each set.
just recursively matching
(n:thg)-[:child|typed {Relation:1}]->(child:thg)
where n has the value and child does not, until no change are made, seems like a more efficient strategy to me
the above query works fine, it is just that I have to arbitrarily set a number of iteration, when "no change in db by sub-query" would be more generic
Have you given a try to apoc.periodic.commit
Basically, this procedure runs a statement while the value of limit is not 0.
If you are able to count changes, you have your while loop.
To do so, we need to slightly refactor the statement inside the loop. It needs to add 1 to the value of limit only when the SET statement actually does something, which means when the list of values to append is not empty.
It could look like this (not tested):
CALL apoc.periodic.commit('MATCH (n:thg)-[:child|typed {Relation:1}]->(child:thg)
WHERE n.VGsubgraph<>[] and n.VGsubgraph<> child.VGsubgraph
AND any(val in n.VGsubgraph WHERE NOT apoc.coll.contains(child.VGsubgraph,val))
WITH n, child, [val IN n.VGsubgraph WHERE NOT val IN child.VGsubgraph ] AS to_append
WHERE to_append <> []
SET child.VGsubgraph=child.VGsubgraph + to_append
WITH count(child) as limit
RETURN limit',
{});
[edit] Here is a (maybe) simpler query that does basically the same:
CALL apoc.periodic.commit('
MATCH (n:thg)-[:child|jump {Relation:1}]->(child:thg)
UNWIND n.VGsubgraph AS val
WITH DISTINCT val, child
WHERE NOT val IN child.VGsubgraph
WITH child, COLLECT (val) AS vals
SET child.VGsubgraph = child.VGsubgraph + vals
WITH count(child) AS limit
RETURN limit'
,{})
hello Pierre, thanks for you answer (good to see you here ! )
Never thought of using apoc.periodic.commit like this !! so clever ! (for the record, it is also in the current version of apoc, doc page here apoc.periodic.commit - APOC Documentation
now I'm wondering why you say the second option is (maybe) simpler?
what is it you like better ?
as a matter of fact it does run a little quicker (about 17s to 23s)
I'd like to know why, because if I profile runs, they are more rows and more db hit in the second option
this is for the first run
I've marked the solution for the initial question, but if someone has complementary info about my above questions concerning QPP or the difference between the 2 queries Pierre suggested, feel free to follow up !
By simpler, I meant readable, apoc-free and Zen
Basically, both are doing the same so I would have expected similar perf.
There may be a time overhead when you call apoc procedures but I'm only guessing
For the first question, the answer is yes.
It you have a pattern like ...(before)((x)...(y)){quantifier}(after)
The binding will be done like this:
before <---> x[0]
x[i] <---> y[i-1] when i >= 1
y[-1] <---> after
In case of a match with zero iteration of the repeating pattern:
before <---> after
I like to draw bindings as pairs of half-circles: ...◖(◗<--○-->◖){q}◗...
Second question. I'd say QPPs are good when you need to filter inside the repeating pattern to prune the search space. QPPs enable you to name everything you need to tell the planner what you need.
I see no obvious way to leverage them in your case because you're writing logic for a given node depends on too many paths and that's where the complexity hides.