Excessive transaction logging during DETACH DELETE

tms · May 10, 2022, 8:05pm

I'm using Neo4j Enterprise (v4.4.2) on an AWS EC2 instance running CentOS 7, and I'm having trouble cleaning up a database after a runaway query added about 8M excess labeled nodes.

I'm running a "DETACH DELETE" operation that removes those roughly 8M nodes in small batches (10,000 nodes each). I use this small batch size to avoid running out of memory.
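The actual query is not shown in this post. As a minimal sketch of this kind of batched cleanup, assuming each batch runs in its own transaction, the loop could look like the Python below. The label `ExcessNode`, the query text, and the `run_batch` helper are all hypothetical placeholders, not the real schema or code.

```python
from typing import Callable

# Hypothetical query: the label ExcessNode is a placeholder, not the real schema.
BATCH_QUERY = (
    "MATCH (n:ExcessNode) WITH n LIMIT $batch_size "
    "DETACH DELETE n RETURN count(*) AS deleted"
)


def delete_in_batches(run_batch: Callable[[str, int], int],
                      batch_size: int = 10_000) -> int:
    """Call run_batch until a batch deletes nothing; return the total deleted.

    run_batch(query, batch_size) is assumed to execute the query in its own
    transaction and return the value of the 'deleted' column.
    """
    total = 0
    while True:
        deleted = run_batch(BATCH_QUERY, batch_size)
        if deleted == 0:
            return total
        total += deleted
```

With the official Neo4j Python driver, `run_batch` could be a thin wrapper such as `lambda q, n: session.run(q, batch_size=n).single()["deleted"]`, since each `session.run` call is an auto-commit transaction.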

Although the query appears to be behaving as desired, Neo4j is filling the disk with a large transaction log file every minute or so.

Here is an excerpt from /var/log/neo4j/debug.log:

2022-05-10 19:32:35.273+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.131] version=130, last transaction in previous log=523146, rotation took 48 millis, started after 71159 millis.
2022-05-10 19:33:09.861+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.132] version=131, last transaction in previous log=523176, rotation took 86 millis, started after 34502 millis.
2022-05-10 19:33:44.477+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.133] version=132, last transaction in previous log=523206, rotation took 45 millis, started after 34571 millis.
2022-05-10 19:34:16.330+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.134] version=133, last transaction in previous log=523236, rotation took 44 millis, started after 31809 millis.
2022-05-10 19:34:49.175+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.135] version=134, last transaction in previous log=523266, rotation took 47 millis, started after 32798 millis.

The resulting files in the transactions subdirectory are many and large:

ls -l /var/lib/neo4j/data/transactions/covid-b
total 6341444
-rw-r--r-- 1 root root 176896 May 10 15:36 checkpoint.0
-rw-r--r-- 1 root root 281823579 May 10 15:20 neostore.transaction.db.120
-rw-r--r-- 1 root root 300188833 May 10 15:22 neostore.transaction.db.121
-rw-r--r-- 1 root root 300139578 May 10 15:23 neostore.transaction.db.122
-rw-r--r-- 1 root root 300247329 May 10 15:24 neostore.transaction.db.123
-rw-r--r-- 1 root root 300140025 May 10 15:25 neostore.transaction.db.124
-rw-r--r-- 1 root root 266161319 May 10 15:26 neostore.transaction.db.125
-rw-r--r-- 1 root root 300331081 May 10 15:28 neostore.transaction.db.126
-rw-r--r-- 1 root root 300511449 May 10 15:29 neostore.transaction.db.127
-rw-r--r-- 1 root root 299871860 May 10 15:30 neostore.transaction.db.128
-rw-r--r-- 1 root root 300844707 May 10 15:31 neostore.transaction.db.129
-rw-r--r-- 1 root root 263526666 May 10 15:32 neostore.transaction.db.130
-rw-r--r-- 1 root root 270276758 May 10 15:33 neostore.transaction.db.131
-rw-r--r-- 1 root root 270243754 May 10 15:33 neostore.transaction.db.132
-rw-r--r-- 1 root root 269995516 May 10 15:34 neostore.transaction.db.133
-rw-r--r-- 1 root root 270051214 May 10 15:34 neostore.transaction.db.134
-rw-r--r-- 1 root root 270246701 May 10 15:35 neostore.transaction.db.135
-rw-r--r-- 1 root root 270053812 May 10 15:35 neostore.transaction.db.136
-rw-r--r-- 1 root root 262469905 May 10 15:36 neostore.transaction.db.137
-rw-r--r-- 1 root root 262577377 May 10 15:37 neostore.transaction.db.138
-rw-r--r-- 1 root root 270839820 May 10 15:37 neostore.transaction.db.139
-rw-r--r-- 1 root root 270257299 May 10 15:38 neostore.transaction.db.140
-rw-r--r-- 1 root root 262144000 May 10 15:38 neostore.transaction.db.141

According to du -h, Neo4j filled this directory with more than 6G of logs in just 18 minutes.
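As a rough cross-check of these figures, the Python sketch below estimates how much log is written per transaction and per deleted node. It rests on two assumptions that the post does not confirm: that each committed transaction is exactly one 10,000-node batch, and that the `total` line of `ls -l` counts 1 KiB blocks.

```python
# All figures are copied from the debug.log excerpt and the ls -l listing above.
LOG_FILE_BYTES = 270_276_758   # size of neostore.transaction.db.131
TX_BEFORE = 523_146            # last transaction in the log before .131
TX_AFTER = 523_176             # last transaction in .131
BATCH_SIZE = 10_000            # nodes deleted per batch
LS_TOTAL_KIB = 6_341_444       # "total" line of ls -l, assumed to be 1 KiB blocks
ELAPSED_MINUTES = 18


def bytes_per_transaction(file_bytes: int, tx_before: int, tx_after: int) -> float:
    """Average log bytes written per committed transaction in one log file."""
    return file_bytes / (tx_after - tx_before)


def mib_per_minute(total_kib: int, minutes: float) -> float:
    """Average log growth rate in MiB per minute."""
    return total_kib / 1024 / minutes


per_tx = bytes_per_transaction(LOG_FILE_BYTES, TX_BEFORE, TX_AFTER)
# Assumption: each committed transaction is exactly one 10,000-node batch.
per_node = per_tx / BATCH_SIZE
rate = mib_per_minute(LS_TOTAL_KIB, ELAPSED_MINUTES)

print(f"{per_tx / 1e6:.1f} MB per transaction")
print(f"{per_node:.0f} bytes per deleted node")
print(f"{rate:.0f} MiB per minute")
```

Under those assumptions this works out to roughly 9 MB of log per transaction, about 900 bytes per deleted node (including its relationship), and about 344 MiB of new log per minute.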

What am I doing wrong and what should I do differently?

dana_canzano · May 10, 2022, 9:58pm

@tms

DETACH DELETE also removes relationships. Do you have dense nodes, i.e. nodes with, for example, 50k relationships? Deleting one of those may look like deleting 1 node, but it is really deleting 1 node and 50k relationships.

You can also influence transaction log retention via dbms.tx_log.rotation.retention_policy (see Configuration settings in the Operations Manual).
This can be set dynamically via CALL dbms.setConfigValue(); see https://neo4j.com/docs/operations-manual/current/configuration/dynamic-settings/
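A minimal sketch of the dynamic change, written in Python: the `run` helper is hypothetical (anything that executes one Cypher statement as an admin user), and the value `"1G size"` is only an example of the size-based policy format, not a recommendation for this database.

```python
from typing import Callable

SETTING = "dbms.tx_log.rotation.retention_policy"


def set_retention_policy(run: Callable[..., object], policy: str = "1G size") -> None:
    """Change the retention policy at runtime via dbms.setConfigValue.

    run(query, **params) is a hypothetical helper that executes one Cypher
    statement as an admin user; "1G size" is only an example value.
    """
    run("CALL dbms.setConfigValue($name, $value)", name=SETTING, value=policy)
```

With the official Neo4j Python driver, `session.run` could be passed as `run`. A value set this way is not written to neo4j.conf, so it would need to be reapplied or added to the config file to survive a restart.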

tms · May 10, 2022, 10:15pm

I don't think I have any "dense" nodes as you describe them.

Each deleted node (Datapoint) has a single labeled :DATASET relationship to an instance of another labeled node (Dataset). There are typically about 3K Datapoint instances bound to each Dataset, although one anomalous Dataset has many more than that.

I'm not familiar with the transaction-related settings and have not attempted to change any of them.

I'll read more about "Dynamic settings". I'm attempting to do a one-time patch of two databases.
