Double unwinds in a cypher script

  • version: 4.1
  • neo4j-driver==1.7.2 (py lib)
  • question about UNWIND syntax
  • tags: cypher, syntax

Hello everyone,

I have a question about syntax. I have a script which executed as a transaction:

CREATE labalbla
// Do something

WITH $list1 AS list1
    UNWIND list1 AS l1
        MATCH (o:Output)
        WHERE o.prop=l1.prop
        MATCH (i:Input)
        WHERE i.prop=l1.prop
        MERGE (o)-[:REL]->(i)

WITH $list2 AS list2
    UNWIND list2 AS l2
        MATCH (o:Output)
        WHERE o.prop=l2.prop
        MATCH (i:Input)
        WHERE i.prop=l2.prop
        MERGE (o)-[:REL]->(i)

The second UNWIND will not executed. if I change order of the "unwinds" the second UNWIND will be executed. I think that the second UNWIND was wrapped in the first.

Question: how can I "separate" this unwind?

PS: cypher's syntax is not allowed me to limit UNWIND's area by BEGIN-END, {} or etc syntax's construction

Best regards,
Alex

In the second UNWIND,  where o.prop = l2.prop. Since you used l1.prop in the second UNWIND it's executing the query but produces no result. 

Sorry, it was my mistype in this sample. The original code is right (I fixed it in this sample). Thank u for your review

MATCH (i:Input)
WHERE o.prop=l1.prop

Is it o.prop or i.prop?

I fixed the mistype.

If you do a PROFILE of the query, you should see where the rows go to 0.

Since operations in Cypher execute per row, when you go to 0 rows (when all rows get filtered out, maybe because your MATCH didn't find anything or your WHERE filtered out all rows) nothing further will execute, unless you bring back a row, usually by use of an aggregation.

For example, if you add WITH count(*) as reset in between these two blocks, then even if all rows had been filtered out somewhere in the first half, you'll regain a row where reset = 0 (the count of 0 rows), which will allow subsequent operations to execute.

That said...be aware that when you use an aggregation in this way, no other variable must be present, so any variables you had previously will drop out of scope. Parameters are still addressable, though, so it should work in this case.

Also be aware that UNWIND is not some kind of structure (such as a FOREACH). All UNWIND does is take a list, and emit a row per element of that list. It's the reverse of a collect().

You can use either an aggregation, as mentioned above), or a WITH DISTINCT clause specifying the variables you want, to shrink cardinality back down.

Alternately you could use a subquery (in Neo4j 4.1) to contain the UNWIND and subsequent operations, though you'll still need to use an aggregation at the end to prevent increasing row cardinality. The advantage there is that the aggregation is constrained by the subquery, and doesn't have to operate across all rows of the query.

Many thanks for your explanation, but my question was about syntax: may I use two unwind. If I use two unwinds, the first unwind wrap the second. How can I separate two unwinds in one query?

I solved the problem the same:

query1 = """
        CREATE .....

        ....
### !!!The first unwind!!!  ###
        WITH $array AS array
            UNWIND array AS item
                MATCH (n1:NodeType1)
                WHERE n.prop1=item.prop1 AND n.prop2=item.prop2
                MATCH (n2:NodeType2)
                WHERE n.prop1=item.prop1 AND n.prop2=item.prop2
                MERGE (n1)-[:UNLOCK]->(n2)
        """
### !!! The second unwind !!! ###
    query2 = """
         WITH $array2 AS array2
            UNWIND array2 AS item
                MATCH (n1:NodeType1)
                WHERE n.prop1=item.prop1 AND n.prop2=item.prop2
                MATCH (n2:NodeType2)
                WHERE n.prop1=item.prop1 AND n.prop2=item.prop2
                MERGE (n1)-[:UNLOCK]->(n2)"""
    try:
        tx.run(query1, **params)
        tx.run(query2, **params)
        logger.info(f"Success")
    except ConstraintError as e:
        logger.info(f"Is already exists")
        tx.rollback()
        tx.close()
    except Exception as e:
        logger.exception("Transaction was failure")

You can use as many UNWINDs as you want freely, but again be aware that this isn't an iteration structure (there is no wrapping or anything going on), this is just a consequence of the way Cypher behaves: operations execute per row. UNWIND is just the opposite of a collect(), it takes every element in a list and outputs a row for it. It expands the row cardinality of the query (if you had 5 rows with lists of 4 each, then you would have 20 rows after the UNWIND, since the sizes of the list multiply times to the row).

What you're looking for is how to shrink the cardinality back after the UNWIND, and you can do this with either aggregations or usage of DISTINCT, depending on which variables you're using in your WITH clause.

If you were trying to combine these two queries into 1, for example, after the operations following the UNWIND of array, you could use this in between to reset cardinality:

WITH DISTINCT $array2 as array2 // there is only one distinct list, so rows reset to 1

or

WITH count(*) as ignored // resets rows to 1
WITH $array2 as array2

or similar.

If you're using a Neo4j verison >= 4.1.x, then you can use subqueries to isolate these, but you will still need to aggregate in some way before the return of the subquery, such as returning count(*) or something to reset rows to 1.

I try to use FOREACH too solve my problem, but I have an error: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid use of MATCH inside FOREACH (line 23, column 17 (offset: 765)) " MATCH (o:Output)

@andrew_bowman How can I use MATCH in FOREACH?

PS: I choiced UNWIND for my query, because UNWIND allows to use a side effect

FOREACH can't be used here because only updating clauses are allowed (so MATCH can't be used). You could use subqueries, but you'd have to still use the techniques I mentioned earlier to reset row cardinality near the end of the subquery.