Deleting Large Amount of Nodes Results in Quarantined Database

jennifer
Node

Hello everyone,


I am having problems with large-scale deletions in my database. Following the article "Large Delete Transaction Best Practices in Neo4j" in the Knowledge Base, I decided to use the following code to perform the deletions:

CALL apoc.periodic.iterate("MATCH (n:NodeTypeToBeDeleted) RETURN id(n) AS id", "MATCH (n) WHERE id(n) = id DETACH DELETE n", {batchSize: 10000})
  YIELD batches, total RETURN batches, total

However, whenever I run this command, it fails with an error that causes the database to be quarantined. The database reports the following error:

Failed to apply transaction: Transaction #1466 at log position LogPosition{logVersion=57, byteOffset=169956717} {started 2022-01-31 21:57:53.495+0000, committed 2022-01-31 21:57:54.181+0000, with 138588 commands in this transaction, lease -1, latest committed transaction id when started was 1465, additional header bytes: }


I'm a bit scared to run variations on this code, as I don't actually have permission to un-quarantine the database 🙂 Does anyone know why this happens and, if so, how to prevent the database from being quarantined while deleting a large number of nodes?
I'm running Neo4j version 4.4.2, Enterprise Edition, from the browser.
Thank you!
1 REPLY

This kind of outcome usually means that the nodes you're deleting per batch, plus the relationships attached to those nodes that also have to be deleted, add up to more than the heap can handle. That can lead to out-of-memory events, which can result in a quarantine. Supernodes, which may have a very large number of relationships per node, are the usual culprits here.
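If you want to check whether supernodes are involved before deleting anything, a read-only query along these lines lists the most heavily connected nodes of the label. This is only a sketch: it assumes Neo4j 4.x syntax (the `size((n)--())` pattern expression), and the `LIMIT 10` is an arbitrary choice.

```cypher
// Read-only: list the 10 nodes of the label with the most attached relationships.
// size((n)--()) counts relationships in both directions (Neo4j 4.x syntax).
MATCH (n:NodeTypeToBeDeleted)
RETURN id(n) AS id, size((n)--()) AS degree
ORDER BY degree DESC
LIMIT 10
```

A few nodes with degrees far above the rest would suggest that a 10,000-node batch can pull in many more relationship deletions than the batch size implies.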

A good way to handle this is to change your MATCH so that it targets the relationships attached to your to-be-deleted nodes, and delete those relationships in batches first. That way only 10k or so relationships are deleted per transaction, no matter how many are attached to a single node.

The outer (first) and inner (second) queries to pass to apoc.periodic.iterate would be:

"MATCH (n: NodeTypeToBeDeleted)-[r]-() return id(r) as id"

and

"MATCH ()-[r]->() WHERE id(r) = id DELETE r"

Once the relationships are deleted, you can go ahead with the deletion of the nodes themselves using the original batch delete query you were attempting.
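Put together, the two steps might look like the following. Treat it as a sketch rather than a tested recipe: the queries are the ones from this thread, while yielding `errorMessages` is my addition so that failed batches show up in the result. Run the two statements one at a time, and consider trying them on a copy of the data first.

```cypher
// Step 1: delete the relationships attached to the target nodes,
// 10,000 relationships per transaction.
CALL apoc.periodic.iterate(
  "MATCH (n:NodeTypeToBeDeleted)-[r]-() RETURN id(r) AS id",
  "MATCH ()-[r]->() WHERE id(r) = id DELETE r",
  {batchSize: 10000})
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;

// Step 2: the nodes no longer have relationships, so each batch
// now deletes roughly 10,000 nodes and nothing else.
CALL apoc.periodic.iterate(
  "MATCH (n:NodeTypeToBeDeleted) RETURN id(n) AS id",
  "MATCH (n) WHERE id(n) = id DETACH DELETE n",
  {batchSize: 10000})
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

If step 1 still strains the heap, lowering `batchSize` is the obvious knob to turn.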