'ExclusiveLock errors' when writing to Neo4j with Spark


Two different teams have run into mysterious 'locking' errors when trying to write a graph to Neo4j with PySpark:

Caused by: org.apache.spark.SparkException: 
Job aborted due to stage failure: Task 15 in stage 13.0 failed 1 times, 
most recent failure: 
Lost task 15.0 in stage 13.0 (TID 1033, localhost, executor driver): 
ForsetiClient[1] can't acquire ExclusiveLock{owner=ForsetiClient[2]} on NODE(243),
 because holders of that lock are waiting for ForsetiClient[1].

We suspect it has something to do with unique node properties, and we've experimented with constraints without much luck so far.

Any ideas?


Not sure if this is still an issue for you, but it has to do with Neo4j's exclusive node locking when writing relationships, combined with parallelization in the Spark writer. If two Spark executors try to write relationships touching the same node in concurrent transactions, one transaction will fail to acquire the necessary exclusive lock on that node.

It kills performance, but the only workaround we've found so far is to write each batch through a single executor.
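A minimal sketch of that workaround, assuming you're using the Neo4j Connector for Apache Spark: `coalesce(1)` collapses the DataFrame to one partition, so relationship writes run as a single serialized task and never contend for the same node lock. The DataFrame source, connection URL, credentials, labels, and key columns below are placeholders, not from the original thread.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("neo4j-writer").getOrCreate()

# Placeholder source: a DataFrame of relationship rows with src_id/dst_id columns.
rels_df = spark.read.parquet("relationships.parquet")

# coalesce(1) forces all rows through one partition (a single writing task),
# so no two concurrent transactions can race for an ExclusiveLock on a node.
(rels_df.coalesce(1)
    .write
    .format("org.neo4j.spark.DataSource")
    .mode("Append")
    .option("url", "bolt://localhost:7687")                 # placeholder
    .option("authentication.basic.username", "neo4j")       # placeholder
    .option("authentication.basic.password", "secret")      # placeholder
    .option("relationship", "KNOWS")
    .option("relationship.save.strategy", "keys")
    .option("relationship.source.labels", ":Person")
    .option("relationship.source.node.keys", "src_id:id")
    .option("relationship.target.labels", ":Person")
    .option("relationship.target.node.keys", "dst_id:id")
    .save())
```

Note this trades away all write parallelism; an alternative worth testing is repartitioning by a node key so relationships sharing a node land in the same partition, but with hub-like nodes that can still deadlock across partitions.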
