Dealing with deadlocks when batch-creating relationships in parallel in Neo4j Community Edition

Hello,

I am trying to bulk import large volumes of data into a Neo4j database. The database includes hundreds of millions of nodes and even more relationships. Because I use the Community Edition of Neo4j, I cannot use the neo4j-admin import tool, so the only alternative I see is to import the data in batches through transactions with the Python driver. However, when concurrent threads create relationships in bulk, deadlocks occur. How can I resolve this so that concurrent threads can create relationships in large batches, even though deadlocks can arise on the nodes that the relationships connect? The Neo4j documentation says something about retrying the failed transactions; as I understand it, the logic for handling this is built into Neo4j's transactions. A few small pieces of code that show how to handle deadlocks when creating relationships would help a lot. Thanks in advance for your time.

1 REPLY

1. You can use neo4j-admin import with Community Edition (just shut down the server during the import).

2. At your volumes you don't need concurrent updates through the driver; just send batches of 50k updates as a parameter to your update statement. See https://medium.com/neo4j/5-tips-tricks-for-fast-batched-updates-of-graph-structures-with-neo4j-and-c...

3. If you really want to write concurrently through the driver, you need to update independent subgraphs (i.e. run a clustering algorithm on the data upfront, e.g. in networkx).
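
Since the question asked for code, here is a rough sketch of points 2 and 3 in Python. It is not a tested loader, and several things in it are assumptions: the label `Node`, the key property `id`, the relationship type `REL`, and the row shape `{"src": ..., "dst": ...}` are made up for illustration, and `driver` is expected to be a driver object from the official Neo4j Python driver, version 5.x (on 4.x the equivalent of `execute_write` is `write_transaction`). Each batch goes through a managed transaction, which the driver re-runs when the server reports a transient error such as a detected deadlock. For point 3, a plain union-find over connected components stands in for the clustering step that the reply suggests doing in networkx.

```python
from collections import defaultdict
from itertools import islice

# Hypothetical schema: label `Node`, key property `id`, relationship type `REL`.
CREATE_RELS = """
UNWIND $rows AS row
MATCH (a:Node {id: row.src})
MATCH (b:Node {id: row.dst})
MERGE (a)-[:REL]->(b)
"""


def chunked(rows, size=50_000):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def _write_batch(tx, rows):
    # The whole batch is sent as a single query parameter.
    tx.run(CREATE_RELS, rows=rows).consume()


def load_relationships(driver, rows, batch_size=50_000):
    """Write rows sequentially, one managed transaction per batch.

    Returns the number of batches written.
    """
    batches = 0
    with driver.session() as session:
        for batch in chunked(rows, batch_size):
            # Managed transaction: the driver retries transient failures.
            session.execute_write(_write_batch, batch)
            batches += 1
    return batches


def independent_groups(rows):
    """Group rows (a list) by connected component, so that no two
    groups touch the same node."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for row in rows:
        parent[find(row["src"])] = find(row["dst"])

    groups = defaultdict(list)
    for row in rows:
        groups[find(row["src"])].append(row)
    return list(groups.values())
```

If you do go concurrent, each group returned by `independent_groups` could be handed to its own worker thread, each with its own session, since no two groups share a node. Be aware that in a well-connected graph most relationships may fall into one giant component, in which case this split buys little and the sequential batched load from point 2 is the simpler route.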