Unexpected behaviour writing an apoc trigger (or, trouble with iterative node generation)

apoc

(Gal) #1

Hello,

I am playing with APOC triggers and would like an explanation of
how Cypher and APOC behave in this particular case.

I start out with the following nodes:

CREATE (:Base:Element {layer: 0, position: 0})
CREATE (:Input {data: "111"})
CREATE (:BaseTemporaryValues)

This is the trigger code:

CALL apoc.trigger.add('createInputRelationshipStructure', "UNWIND apoc.trigger.propertiesByKey({assignedNodeProperties}, 'data') as input
    WITH input.node as i
        MATCH (previous:Element) WHERE NOT EXISTS((previous)-[:NEXT_LAYER]->())
        WITH i, previous limit 1
        MATCH (temp:BaseTemporaryValues)
        SET temp.position=1
        WITH *
        FOREACH ( element in split(i.data, '') |
            CREATE (e:Element {value: toInteger(element), position: temp.position, layer: previous.layer+1})
            SET temp.position = temp.position+1 )
        WITH temp, previous
            CALL apoc.cypher.doIt(\"UNWIND [1,2,3] as pos
                SET temp.next_position=CASE WHEN pos<3 THEN pos+1 ELSE 1 END
                WITH *
                MATCH (e1:Element {position: pos, layer: previous.layer+1})
                MATCH (e2:Element {position: temp.next_position, layer: previous.layer+1})
                MERGE (e1)-[:NEXT_VALUE]->(e2)
                RETURN 0\", {temp: temp, previous: previous}) yield value
        WITH temp, previous
            REMOVE temp.position
            REMOVE temp.next_position
        WITH temp, previous
            MATCH (e:Element {layer: previous.layer+1})
            MATCH (prev:Element {layer: previous.layer}) WHERE NOT EXISTS((prev)-[:NEXT_LAYER]->())
        CALL apoc.do.when(prev.position<>0, 'WITH prev, e WHERE prev.position=e.position
                                                     MERGE (prev)-[:NEXT_LAYER]->(e) RETURN 0',
                                                'MERGE (prev)-[:NEXT_LAYER]->(e) RETURN 0', 
                         {prev: prev, e: e}) yield value RETURN value",
    {phase: "after"})

This code does what was intended. However, I don't understand exactly why the following version doesn't:

CALL apoc.trigger.add('createInputRelationshipStructure', "UNWIND apoc.trigger.propertiesByKey({assignedNodeProperties}, 'data') as input
    WITH input.node as i
        MATCH (previous:Element) WHERE NOT EXISTS((previous)-[:NEXT_LAYER]->())
        MATCH (temp:BaseTemporaryValues)
        SET temp.position=1
        WITH *
        CALL apoc.cypher.doIt(\"UNWIND split(i.data, '') as element
            CREATE (e:Element {value: toInteger(element), position: temp.position, layer: previous.layer+1})
            SET temp.position = temp.position+1 
            RETURN 0\", {temp: temp, i: i, previous: previous}) yield value
        WITH temp, previous
            CALL apoc.cypher.doIt(\"UNWIND [1,2,3] as pos
                SET temp.next_position=CASE WHEN pos<3 THEN pos+1 ELSE 1 END
                WITH *
                MATCH (e1:Element {position: pos, layer: previous.layer+1})
                MATCH (e2:Element {position: temp.next_position, layer: previous.layer+1})
                MERGE (e1)-[:NEXT_VALUE]->(e2)
                RETURN 0\", {temp: temp, previous: previous}) yield value
        WITH temp, previous
            REMOVE temp.position
            REMOVE temp.next_position
        WITH previous
            MATCH (e:Element {layer: previous.layer+1})
        CALL apoc.do.when(previous.position<>0, 'WITH previous, e WHERE previous.position=e.position
                                                     MERGE (previous)-[:NEXT_LAYER]->(e) RETURN 0',
                                                'MERGE (previous)-[:NEXT_LAYER]->(e) RETURN 0', 
                         {previous: previous, e: e}) yield value RETURN value",
    {phase: "after"})

The intention is the following:
The data property is a 3-character string of digits. Each time this property is set, the trigger extends a chain that starts from the Base node, with each digit stored as its own node. Each digit node is connected to its right-hand neighbour via a :NEXT_VALUE relationship, and the chain is circular, so the :NEXT_VALUE neighbour of the rightmost digit is the leftmost digit. In addition, each time the input.data property is set, the previous digit nodes are connected to the new ones via a :NEXT_LAYER relationship.
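
The structure described above can be checked with a pair of inspection queries. This is only a sketch: it assumes the labels, properties, and relationship types used in the trigger code, and the column aliases are made up for readability. Run the two statements separately.

```cypher
// List every :NEXT_VALUE link per layer. For a correctly built layer
// the links should form a ring: 1 to 2, 2 to 3, and 3 back to 1.
MATCH (a:Element)-[:NEXT_VALUE]->(b:Element)
RETURN a.layer AS layer, a.position AS from_position, b.position AS to_position
ORDER BY layer, from_position;

// List the :NEXT_LAYER links that join one layer to the next.
MATCH (p:Element)-[:NEXT_LAYER]->(e:Element)
RETURN p.layer AS from_layer, p.position AS from_position, e.position AS to_position
ORDER BY from_layer, from_position, to_position;
```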

The question is the following:
Using the second form of the trigger generates additional nodes. The first time the data property is set, the trigger works correctly. The second time, it generates 3 nodes for each digit (with position properties x, x+3, x+6) instead of 1 node per digit. I fixed the code by limiting the result set of 'previous' nodes at the start of the trigger, but I do not understand why the number of nodes bound to the 'previous' variable would change anything, since none of the UNWIND clauses are based on that variable.

I hope the question is posed clearly enough. I am thankful for any explanations of this, as well as any corrections or optimizations you might think of.

Oh, and this is my first post. Hello everyone, my name is Gal. Hopefully I can soon start answering more questions than I ask :).


(Andrew Bowman) #2

Hi there, welcome!

For this one, I think you're missing a fundamental understanding of how UNWIND works, and how Cypher works in general.

UNWIND is not a looping mechanism. The end result may seem like one, but if you have a flawed understanding of how it works you may be bitten, like in this case.

UNWIND transforms elements of a list into rows. Cypher operations execute per row.

So if you UNWIND a list of 10 elements, you will have 10 rows. Any subsequent operations (MATCH, WITH, etc.) are executed per row. This is usually what you want, as you tend to UNWIND a collection before you do operations involving its elements (such as matching from unwound nodes using a MATCH pattern). But even if a later clause doesn't use the variable of an unwound element, it will still execute once for each row. The same holds in the other direction: if a MATCH has already produced several rows, a following UNWIND or procedure call runs once per row. That's where the duplication is coming from.
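
The row multiplication can be seen in isolation with two unrelated lists. This is a minimal sketch that touches no data; the variable names are arbitrary.

```cypher
// The first UNWIND produces 2 rows. The second UNWIND runs once per
// existing row, so the result has 2 x 3 = 6 rows, even though the
// second list never refers to the first.
UNWIND ['a', 'b'] AS letter
UNWIND [1, 2, 3] AS number
RETURN letter, number
```

Replacing the RETURN with a CREATE would create 6 nodes for the same reason: the write clause also executes once per row.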

See our knowledge base article on understanding Cypher cardinality for more info.

Since you want the block of Cypher near the beginning, which doesn't make use of i at all, to execute only once, move your UNWIND after that section, to just before your FOREACH. You may also want to reevaluate your query after you've had a chance to absorb the article's contents, to ensure you're using the correct approach.
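
As a rough sketch of how to check and control the row count, the queries below reuse the labels and the NEXT_LAYER pattern from the question. They are illustrations rather than a drop-in fix for the trigger, and previous_rows is just an illustrative alias. Run the two statements separately.

```cypher
// Diagnostic: how many rows does the 'previous' match produce?
// Every clause that follows the MATCH runs once per row counted here.
MATCH (previous:Element)
WHERE NOT (previous)-[:NEXT_LAYER]->()
RETURN count(previous) AS previous_rows;

// Collapse the match to a single row before expanding the digits,
// so the UNWIND yields one row per position instead of one per
// (previous, position) pair.
MATCH (previous:Element)
WHERE NOT (previous)-[:NEXT_LAYER]->()
WITH previous.layer AS layer LIMIT 1
UNWIND [1, 2, 3] AS position
RETURN layer, position;
```

If the first query reports more than one row, anything placed after that MATCH without a LIMIT or an aggregation in between will be repeated that many times.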


(Gal) #3

Thanks for the reply, the article is exactly what I needed. I wrote a new, somewhat prettier version of the query, but managed to get stuck again. This is the code:

CALL apoc.trigger.add('createInputRelationshipStructure', "UNWIND apoc.trigger.propertiesByKey({assignedNodeProperties}, 'data') as input
    WITH input.node as i
        MATCH (previous:Element) WHERE NOT exists((previous)-[:NEXT_LAYER]->())
        WITH i, previous.layer as layer LIMIT 1
        UNWIND [1,2,3] as position
            CREATE (e:Element {value: toInteger(split(i.data, '')[position-1]), position: position, layer: layer+1})
        WITH e
            CALL apoc.do.when(e.layer>1, 
                'MATCH (p:Element {layer: e.layer-1, position: e.position})
                 MERGE (p)-[:NEXT_LAYER]->(e)',
                'MATCH (p:Element {layer: e.layer-1})
                 MERGE (p)-[:NEXT_LAYER]->(e)',
                 {e:e}) yield value
        WITH e
            CALL apoc.do.when(e.position<3,
                'MATCH (e2:Element {position: e.position+1, layer: e.layer})
                 MERGE (e)-[:NEXT_VALUE]->(e2)',
                'MATCH (e2:Element {position: 1, layer: e.layer})
                 MERGE (e)-[:NEXT_VALUE]->(e2)',
                 {e: e}) yield value RETURN 1",
    {phase: "after"})

Everything works okay up until the last CALL, which does not execute. If I swap the two CALL clauses, only the element with position:3 connects to position:1, as if the procedure executes only once (and the second call does not execute at all). Running the query outside of the trigger and returning e before the CALL clauses returns 3 rows (using "WITH DISTINCT e" does the same). Running the following:

MATCH (i:Input)
MATCH (previous:Element) WHERE NOT exists((previous)-[:NEXT_LAYER]->())
        WITH i, previous.layer as layer LIMIT 1
        UNWIND [1,2,3] as position
            CREATE (e:Element {value: toInteger(split(i.data, '')[position-1]), position: position, layer: layer+1})    
        WITH e
             CALL apoc.do.when(e.layer>1, 
                'MATCH (p:Element {layer: e.layer-1, position: e.position})
                 MERGE (p)-[:NEXT_LAYER]->(e)',
                'MATCH (p:Element {layer: e.layer-1})
                 MERGE (p)-[:NEXT_LAYER]->(e)',
                 {e:e}) yield value RETURN 1

and then

MATCH (e:Element) WHERE NOT exists((e)-[:NEXT_LAYER]->())
            CALL apoc.do.when(e.position<3,
                'MATCH (e2:Element {position: e.position+1, layer: e.layer})
                 MERGE (e)-[:NEXT_VALUE]->(e2)',
                'MATCH (e2:Element {position: 1, layer: e.layer})
                 MERGE (e)-[:NEXT_VALUE]->(e2)',
                 {e: e}) yield value  RETURN 1

does what is expected (creates all the wanted relationships). This seems really inconsistent. My explanation for the single execution was that the call is executed once, with all 3 rows passed as the e parameter in the parameter list (thus executing the content of the call three times), and that the predicate is evaluated only once (maybe for the first or last row). But then why is that not true for the standalone query (the last one)?

I guess I still don't really understand UNWIND and rows... Does running WITH e RETURN e before the CALL clauses run the RETURN statement 3 times? Even so, even if there are redundant executions, shouldn't all the desired relationships still appear?

Thanks again for your help, I have already learned a lot from this.