Merging data : Deadlocks and performance balance in a heavily connected graph

Hello,

According to @glilienfield in this post UNWIND with inline variable, what am i doing wrong? i went using apoc.create.iterate to import approx 100MB of xmlFiles. It will generate 205000 nodes (8 different node labels) and 335542 relations (4 different types of relations). It tooks 1 hour and half with 202 failing queries for DEADLOCK (with NODE_RELATIONSHIP_GROUP_DELETE related error). Here is the message :

ForsetiClient[transactionId=15518, clientId=6471] can't acquire ExclusiveLock{owner=ForsetiClient[transactionId=15519, clientId=6507]} on NODE_RELATIONSHIP_GROUP_DELETE(36341) because holders of that lock are waiting for ForsetiClient[transactionId=15518, clientId=6471].\n Wait list:ExclusiveLock[\nClient[15519] waits for [ForsetiClient[transactionId=15516, clientId=6513],ForsetiClient[transactionId=15517, clientId=6503],ForsetiClient[transactionId=15518, clientId=6471]]]

First, i don't understand why i have errors related to DELETE operations. I didn't delete any node or relationship in my code. So i will explain my matter to see if someone has ideas to mitigate DEADLOCKS and improve performance.

On the fly, my application :

  • generates queries,
  • analyze them,
  • factorized them with parameters.

The final parameterized queries look like :

CALL apoc.periodic.iterate(
  "WITH $data AS data  UNWIND data as row RETURN row",
  "MERGE (nodea1:HypoComplexe:DOPLER:Maquette:ImportXML{id:row.params1_id})   
      ON CREATE SET nodea1 += row.properties2 
   MERGE (nodeb1:Hypothese:DOPLER:Maquette:ImportXML{id:row.params3_id})
      ON CREATE SET nodeb1 += row.properties4 
   MERGE (nodea1)-[nodea1relnodeb1:CONTAINS]->(nodeb1)",
   {batchSize:100, parallel:true,params:{data: $data}}
)
YIELD batch, operations;

I am calling them with this piece of Javascript code (it is dirty, i put everything to debug without nicely building my code) :

    queriesToRun.forEach(async (objectToRun,idQuery) =>{
      const session = this.driver.session();
      queryAnalysis = objectToRun;
      let queryExecTime = Date.now();
      let resultQuery = null;
      if(enablePerfLof)this.perfsAnalyser[queryAnalysis.id].push({step:'Deb Query', time:queryExecTime});
      try {
          const tx = session.beginTransaction();
          resultQuery = await tx.run(objectToRun.query,objectToRun.varRun);
          await tx.commit();

          const records = resultQuery.records
          for (let i = 0; i < records.length; i++) {
            console.dir(records[i],{depth:null})
          }
      } catch (e) {
        console.log('Req ERROR '+queryAnalysis.id+' : '+e+'\n\n');
        queryExecTime = Date.now();
        if(enablePerfLof)this.perfsAnalyser[queryAnalysis.id].push({step:'Error Query', time:(queryExecTime-(this.perfsAnalyser[queryAnalysis.id][0].time))});
      }finally{
      if(enablePerfLof)this.perfsAnalyser[queryAnalysis.id].push({step:'Close Query', time:(debExecTime-(this.perfsAnalyser[queryAnalysis.id][0].time))});
        console.dir(resultQuery.records,{depth:null})
        let debExecTime = new Date();
        console.log(debExecTime.toISOString());
        await session.close()
        debExecTime = Date.now();
        if(enablePerfLof)this.perfsAnalyser[queryAnalysis.id].push({step:'End Query',time:(debExecTime-(this.perfsAnalyser[queryAnalysis.id][0].time))});
      }
    });​

According to my debug output, the total import could be factorized in six types of queries, here is a table with number of query call for each parameterized query :

Query 1

Query 2

Query 3

Query 4

Query 5

Query 6

number of call

382

201

162252

514

2725

21

Fail(nCall)

NO

NO

YES(202)

NO

NO

NO

Here is the Query 6 with deadlocks :

CALL apoc.periodic.iterate(
   "WITH $data AS data UNWIND data as row RETURN row",
    "MERGE (nodea1:HypoComplexe:DOPLER:Maquette:ImportXML{id:row.params1_id})
        ON CREATE SET nodea1 += row.properties2 
      MERGE (nodeb1:Hypothese:DOPLER:Maquette:ImportXML{id:row.params3_id})
        ON CREATE SET nodeb1 += row.properties4
      MERGE (nodea1)-[nodea1relnodeb1:CONTAINS]->(nodeb1)
      MERGE (nodea2:Ouvrage:DOPLER:Maquette:ImportXML{name:row.params5_name})
      MERGE (nodea2)<-[nodea2relnodeb1:CONCERNS]-(nodeb1)",
      {batchSize:100, parallel:true,params:{data: $data}}
)
YIELD batch, operations;

Here is the situation with deadlocks Graph Image here :

  • node in the center (brown) is nodea1 in the query (Label HypoComplexe)
  • rose node is nodeb1 in the query (Label Hypothese)
  • blue node is nodea2 in the query (Label Ouvrage)

When the query executes, nodea1 may exists (He is created by Query 1 or Query 4 or is already in the database, created in a previous import). nodea2 and nodeb1 may exists with previous executions in the same import or in a previous import. So everything is in MERGE.
An instance of this query is called for each pattern of (Hypothese)-[:CONCERNS]->(Ouvrage), MERGE on nodea1(HypoComplexe) is to ensure with have a reference to bind nodeb1(Hypothese) with a relation with CONTAINS type.
There is 912 (Hypothese)-[:CONCERNS]->(Ouvrage) patterns in this case. Each couple require lock on brown node (HypoComplexe). I think this is were deadlocks occurs

How can i mitigate this?
I know setting parallel to false will do the trick. But it already takes 1h30 to import 260 000 nodes in parallel.
How to balance performances and deadlocks? I try to focus on Query 6 which represents most of my queries.
Is there any mechanism to manually place a lock on a node during a query to avoid such problems?
Is there any other strategy or workaround?
Maybe with trying merging a first time the Node nodea1(HypoComplexe) and after only MATCH clauses but it will be a nightmare to develop.

I am open to exchange on your ideas

Thank you