Decoupling two queries

y-pankaj · June 17, 2021, 6:04pm

I want to run two independent queries as a single query. First I want to create nodes from the nodesData object, and then I want to create relationships from edgesData. Running the two queries one after the other gives the expected result, but combining them produces multiple edges between the same nodes.

session.run(
  // create nodes
  `WITH $nodesData as value
  UNWIND value as data
  CALL apoc.create.node([data.class], data)
  YIELD node

  // breakpoint: running the part above and then the part below
  // as separate queries gives the expected result
  WITH $edgesData as value
  UNWIND value AS data
  MATCH (n {newtId: data.source}), (m {newtId: data.target})
  WITH n, m, data
  CALL apoc.create.relationship(n, data.class, data, m)
  YIELD rel
  RETURN rel
  `,
  { nodesData: nodesData, edgesData: edgesData }
)

I suspect this is related to how data is carried over between the statements. Maybe I can't use `WITH $edgesData as value` right after `YIELD node`, and I should somehow drop the records before the `WITH $edgesData as value` statement. But I am not sure.

What's the issue here?
Also, if possible, please share a resource explaining how data is organised and carried between statements in Neo4j.
Thanks.

andrew_bowman · June 17, 2021, 8:37pm

The issue is one of cardinality: Cypher operations yield rows, and Cypher operations execute per row. This is critical to keep in mind: the data you generate (and the operations that execute!) in the second part of the query depend on the rows produced by the first part.

Since the first part of the query yields more than one row (one per created node), every subsequent operation executes once per row, redundantly. That needlessly multiplies not only the work being done but also the results yielded at the end: each entry in $edgesData is turned into a relationship once for every node row.
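To make the row multiplication concrete, here is a toy JavaScript model in which every clause is a function over an array of rows. This is a simplification of my own, not how Neo4j actually executes queries: it skips the MATCH and assumes every referenced node exists. The sample data and the helper names `unwind` and `combinedQuery` are hypothetical. With three nodes and two edges, each edge is created once per node row, so 3 × 2 = 6 relationships come out instead of 2.

```javascript
// Toy model (NOT Neo4j's executor): each clause receives a list of rows and runs
// once per row. The MATCH is skipped; every referenced node is assumed to exist.

// Hypothetical sample data shaped like $nodesData and $edgesData in the question.
const nodesData = [
  { class: "Gene", newtId: "a" },
  { class: "Gene", newtId: "b" },
  { class: "Gene", newtId: "c" },
];
const edgesData = [
  { class: "REGULATES", source: "a", target: "b" },
  { class: "REGULATES", source: "b", target: "c" },
];

// UNWIND list AS key: every incoming row fans out into one row per list element.
function unwind(rows, list, key) {
  return rows.flatMap((row) => list.map((item) => ({ ...row, [key]: item })));
}

// Models the combined query from the question.
function combinedQuery(nodes, edges) {
  const created = { nodes: [], rels: [] };
  let rows = [{}]; // a query starts from a single empty row

  rows = unwind(rows, nodes, "data");
  for (const row of rows) created.nodes.push(row.data); // apoc.create.node per row

  // WITH $edgesData AS value keeps only `value`, but still one row per created node.
  rows = rows.map(() => ({}));
  rows = unwind(rows, edges, "data");
  for (const row of rows) created.rels.push(row.data); // apoc.create.relationship per row

  return { nodeCount: created.nodes.length, relCount: created.rels.length };
}

console.log(JSON.stringify(combinedQuery(nodesData, edgesData)));
// {"nodeCount":3,"relCount":6}
```

The same model also shows the empty-parameter problem: `combinedQuery([], edgesData)` produces no node rows, so the edge part never runs and no relationships are created at all.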

So the question is, how do we make the two parts independent? We can aggregate: collect the created nodes (or count them into nodeCount), which resets the cardinality to a single row, so the operations in the second part of the query happen only once and nothing gets multiplied by the input rows. Then collect the created relationships (or count them into relCount) and return that if needed. For a clearer separation, and for protection against cases where either $nodesData or $edgesData is empty, wrap each part in a subquery (CALL subqueries require Neo4j 4.0 or later):

CALL {
  UNWIND $nodesData as data
  CALL apoc.create.node([data.class], data) YIELD node
  WITH count(node) as nodeCount // needed to protect against empty parameter list
  RETURN nodeCount // subqueries must return something
}
WITH nodeCount // only a single row at this point from the earlier aggregation
CALL {
  UNWIND $edgesData AS data  
  // use labels here! Without them this MATCH scans every node for each row, which will be really slow
  MATCH (n {newtId: data.source}), (m { newtId: data.target})  
  WITH n, m, data
  CALL apoc.create.relationship(n,data.class,data,m) YIELD rel  
  WITH count(rel) as relCount
  RETURN relCount
}
RETURN nodeCount, relCount
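Here is the same kind of toy JavaScript model applied to the subquery version (again a simplification, not Neo4j's executor; the MATCH is skipped and the helpers `unwind`, `callSubquery` and `fixedQuery` are hypothetical). Because each subquery ends in a `count()` aggregation, it hands exactly one row to whatever follows, so the edge-creating subquery runs only once.

```javascript
// Toy model (NOT Neo4j's executor) of the subquery version above. It skips the
// MATCH and assumes every source/target node it references exists.

// Hypothetical sample data shaped like $nodesData and $edgesData.
const nodesData = [
  { class: "Gene", newtId: "a" },
  { class: "Gene", newtId: "b" },
  { class: "Gene", newtId: "c" },
];
const edgesData = [
  { class: "REGULATES", source: "a", target: "b" },
  { class: "REGULATES", source: "b", target: "c" },
];

// UNWIND list AS key: every incoming row fans out into one row per list element.
function unwind(rows, list, key) {
  return rows.flatMap((row) => list.map((item) => ({ ...row, [key]: item })));
}

// CALL { ... } without imported variables: the body runs once per outer row and
// each row it returns is merged into that outer row.
function callSubquery(outerRows, body) {
  return outerRows.flatMap((outer) => body().map((inner) => ({ ...outer, ...inner })));
}

function fixedQuery(nodes, edges) {
  const created = { nodes: [], rels: [] };

  let rows = callSubquery([{}], () => {
    const inner = unwind([{}], nodes, "data");
    for (const row of inner) created.nodes.push(row.data); // apoc.create.node per row
    // WITH count(node) AS nodeCount: exactly one row, even if `nodes` is empty.
    return [{ nodeCount: inner.length }];
  });

  rows = callSubquery(rows, () => {
    const inner = unwind([{}], edges, "data");
    for (const row of inner) created.rels.push(row.data); // apoc.create.relationship per row
    return [{ relCount: inner.length }]; // WITH count(rel) AS relCount
  });

  return { rows, nodesCreated: created.nodes.length, relsCreated: created.rels.length };
}

console.log(JSON.stringify(fixedQuery(nodesData, edgesData).rows));
// [{"nodeCount":3,"relCount":2}]
console.log(JSON.stringify(fixedQuery([], edgesData).rows));
// [{"nodeCount":0,"relCount":2}]
```

The second call illustrates why the `count()` matters: even with an empty `nodesData`, the first subquery still yields one row (`nodeCount` 0), so the second subquery still runs instead of being skipped for lack of input rows.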
