Hi,
I'm observing some unexpected behavior related to memory usage when deleting a large number of relationships.
For demonstration purposes, let's say I'm deleting ~20 million relationships on a neo4j container with the max heap configured to 800 MB, using the following Cypher:
CALL apoc.periodic.iterate("MATCH (n:Node)-[r]->() RETURN r", "DELETE r", {batchSize:1000})
Observing the heap usage when this is running:
You can see there's linear growth until it eventually runs out of memory at 800 MB. After some profiling and reading, I concluded the reason could be that neo4j keeps an array of IDs of deleted relationships in memory until the transaction finishes. I would expect that memory to be released after every "batchSize" commit, but it appears to be held until the procedure call completes. Another possible reason is that the match statement accumulates results faster than we're able to delete them and keeps them in memory. It would be great if someone could confirm whether either of these is indeed the reason for the growth.
Now, I've found a way around that issue by doing another level of "batching" around the procedure call to allow neo4j to do whatever cleanup is needed between the calls:
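for hasMore := true; hasMore; {
	// committedOperations is one of the columns apoc.periodic.iterate yields
	res, err := session.Run("CALL apoc.periodic.iterate('MATCH (n:Node)-[r]->() RETURN r LIMIT 1000000', 'DELETE r', {batchSize:1000}) YIELD committedOperations RETURN committedOperations", nil)
	if err != nil { log.Fatal(err) }
	rec, err := res.Single()
	if err != nil { log.Fatal(err) }
	hasMore = rec.Values[0].(int64) >= 1000000
}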
This results in more stable memory usage and a successful delete.
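To make the stop condition of that outer loop easier to check without a database, here is a minimal sketch of the same logic with the APOC call hidden behind a function. The names chunkRunner, deleteInChunks and fakeStore are made up for illustration and are not part of the driver or APOC; fakeStore only simulates a store with a fixed number of relationships left.

```go
package main

import (
	"errors"
	"fmt"
)

// chunkRunner is a hypothetical hook: it runs one apoc.periodic.iterate call
// limited to `limit` relationships and returns its committedOperations value.
type chunkRunner func(limit int64) (int64, error)

// deleteInChunks repeats the call until a chunk commits fewer operations
// than the limit, which is the same condition as in the loop above.
// It returns the total committed operations and the number of calls made.
func deleteInChunks(run chunkRunner, limit int64) (int64, int, error) {
	if limit <= 0 {
		return 0, 0, errors.New("limit must be positive")
	}
	var total int64
	calls := 0
	for {
		committed, err := run(limit)
		calls++
		if err != nil {
			return total, calls, fmt.Errorf("chunk %d: %w", calls, err)
		}
		total += committed
		if committed < limit {
			return total, calls, nil
		}
	}
}

// fakeStore stands in for the database (an assumption for this sketch):
// each call "deletes" up to `limit` of the remaining relationships.
func fakeStore(remaining int64) chunkRunner {
	return func(limit int64) (int64, error) {
		n := limit
		if remaining < n {
			n = remaining
		}
		remaining -= n
		return n, nil
	}
}

func main() {
	total, calls, err := deleteInChunks(fakeStore(2_500_000), 1_000_000)
	fmt.Println(total, calls, err)
}
```

One thing this makes visible: a failed batch also lowers committedOperations, so the loop would stop early while relationships remain. Checking the failedOperations column as well might be a reasonable refinement.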
This had been working fine until I observed another OOM recently, this time even when using the improved approach. Since there could be other transactions running at the same time (not related to the nodes and relationships from the delete query), I tested the previous approach by running apoc.util.sleep() at the same time as the deletes are happening. This time, memory usage is again similar to the initial approach and results in OOM.
Could this mean that neo4j is holding on to something internally (presumably the IDs of deleted relationships) as long as there is any transaction running?
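For anyone who wants to reproduce the concurrent case, a rough sketch of how the two pieces could be started together is below. The run hook, the deleteAll callback and the reproduce function are hypothetical names for this sketch; in a real test each would execute through its own driver session. The 600000 ms sleep duration is an assumed value, not the one I used.

```go
package main

import (
	"fmt"
	"sync"
)

// sleepQuery keeps one unrelated transaction open while the delete runs.
// The duration (in milliseconds) is an assumption for this sketch.
const sleepQuery = "CALL apoc.util.sleep(600000)"

// runFunc is a hypothetical hook that executes one Cypher statement
// in its own session and returns when the statement has finished.
type runFunc func(cypher string) error

// reproduce starts the sleeping statement and the delete concurrently
// and waits for both, returning their errors separately.
func reproduce(run runFunc, deleteAll func() error) (deleteErr, sleepErr error) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		sleepErr = run(sleepQuery)
	}()
	go func() {
		defer wg.Done()
		deleteErr = deleteAll()
	}()
	wg.Wait()
	return deleteErr, sleepErr
}

func main() {
	// Stand-ins that only log; replace with real session calls to test.
	run := func(cypher string) error {
		fmt.Println("would run:", cypher)
		return nil
	}
	deleteAll := func() error {
		fmt.Println("would run the chunked delete loop")
		return nil
	}
	deleteErr, sleepErr := reproduce(run, deleteAll)
	fmt.Println(deleteErr, sleepErr)
}
```

Watching the heap while this runs, and again with the sleep removed, should show whether the open transaction is what makes the difference.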
Keep in mind that the heap size and the number of relationships are set this way so it's easier to reproduce. Initially, the same behaviour was observed when deleting 1.5 billion edges on a neo4j container with a 32 GB heap. Running neo4j 4.4.29 community and apoc 4.4.0.22.