Neo4j uses logical deletes to achieve maximum performance and scalability. To understand how this appears to an operator of the database, let's take a simple case of loading data into Neo4j. When you start loading data, you can see that the nodes are stored in a file called neostore.nodestore.db. As you keep loading, the file keeps growing.
However, once you start deleting nodes, you can verify that the file neostore.nodestore.db does not shrink. In fact, not only does its size stay the same, but you will also see the file neostore.nodestore.db.**id** grow - and keep growing, one entry for every record deleted.
This happens because of id re-use. Deletes in Neo4j do not physically remove records; they just flip a bit that marks each record as no longer in use. The ids of deleted (but reusable) records are kept in neostore.nodestore.db.**id**. This means the neostore.nodestore.db.**id** file acts like a "recycle bin" that stores all the deleted ids.
Now that you've deleted the data, neostore.nodestore.db is the same size as before the delete, and the neostore.nodestore.db.**id** file is larger than before the delete operation. How do you reclaim this space?
When you start loading new data after the deletes, Neo4j starts handing out the ids recorded in neostore.nodestore.db.**id**. The neostore.nodestore.db file therefore does not grow, and the neostore.nodestore.db.**id** file starts shrinking until it is completely drained.
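The allocate, delete, and reuse cycle described above can be sketched as a simple free-list. This is an illustrative model only, not Neo4j's actual store code; the class and field names are invented for the example:

```python
class NodeStore:
    """Toy model of Neo4j id reuse: the store file never shrinks,
    deleted ids go to a free list (the .id file) and are reused first."""

    def __init__(self):
        self.records = []   # models neostore.nodestore.db (high-water mark)
        self.free_ids = []  # models neostore.nodestore.db.id (recycle bin)

    def create(self, props):
        if self.free_ids:               # reuse a deleted id first
            node_id = self.free_ids.pop()
            self.records[node_id] = props
        else:                           # otherwise grow the store file
            node_id = len(self.records)
            self.records.append(props)
        return node_id

    def delete(self, node_id):
        self.records[node_id] = None    # logical delete: the slot remains
        self.free_ids.append(node_id)   # id becomes eligible for reuse


store = NodeStore()
ids = [store.create({"n": i}) for i in range(5)]
store.delete(1)
store.delete(3)
assert len(store.records) == 5   # the store "file" did not shrink
assert store.free_ids == [1, 3]  # the ".id" file grew instead

store.create({"n": 99})          # new data reuses a recycled id
assert len(store.records) == 5   # store still did not grow
assert store.free_ids == [1]     # the ".id" file is draining
```

The two assertions after the delete mirror what you see on disk: the store file keeps its high-water-mark size, while the `.id` file records the holes.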
If you do not plan to add more nodes but still want to shrink the size of the database on disk, you can use the copy store utility (neo4j-admin copy). This utility reads an offline database, copies it into a new one, and leaves out the records that are no longer in use (as well as the lists of ids eligible for reuse).
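As a sketch, a compaction with Neo4j 4.x might look like the following. The database names are placeholders, and the source database must be offline before copying:

```shell
# Stop the server so the source database is offline
neo4j stop

# Copy the store into a new database, skipping records that are no
# longer in use and the lists of reusable ids
neo4j-admin copy --from-database=neo4j --to-database=neo4jcompact
```

After the copy, the new store files reflect only the live data; you would then switch the server over to the compacted database.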
Large deletes can also generate a lot of transaction logs. Be aware of this when doing mass delete operations - ironically, an operation meant to free disk space can temporarily consume a lot of it.
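One way to keep transaction logs bounded is to delete in batches rather than in one huge transaction. With Neo4j 4.4 this can be done with CALL ... IN TRANSACTIONS; the label, batch size, and credentials below are illustrative, and the statement must run in an implicit (auto-commit) transaction:

```shell
# Delete in batches of 10,000 nodes so no single transaction
# (and its log entries) grows unboundedly
cypher-shell -u neo4j -p <password> \
  "MATCH (n:Obsolete)
   CALL { WITH n DETACH DELETE n } IN TRANSACTIONS OF 10000 ROWS;"
```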
Deletes also seem to affect the size of dump files. A DB with about 18M nodes was producing dump files of about 2.5 GB. I deleted about 6M nodes, or about 33% of the data. The ID files in the DB only changed a small amount as described above. But the dump files are now nearly double their previous size, about 4.25 GB. Why is this? Can I expect the dump size to shrink as IDs are recycled?
I'm running Community Edition 4.4.3 on Mac.
Deleting connected nodes changes the sizes of two store files: neostore.nodestore.db.id and neostore.relationshipstore.db.id. The increase is much bigger for the neostore.relationshipstore.db.id file, which may be the reason your dump file grew to 4.25 GB. If possible, check the sizes of these two store files.
There are several files that did grow significantly when I deleted a lot of data. The net growth of these was about 0.483 GB, while the growth of the dump file was about 1.980 GB, or about 4x the file growth. Maybe there's some expansion as data converts from binary to ASCII in the dump, or something. It's not a problem, it's just surprising. It'll be interesting to see whether the dump size stabilizes or shrinks as ids are reused and the DB collects more data.