How deletes work in Neo4j

Neo4j uses logical deletes to achieve maximum performance and scalability. To understand how this appears to an operator of the database, let's take the simple case of loading data into Neo4j. As you load data, you can see the nodes being stored in a file called neostore.nodestore.db, and that file keeps growing as the load continues.

However, once you start deleting nodes, you can verify that neostore.nodestore.db does not shrink. In fact, not only does its size remain the same, but the file neostore.nodestore.db.**id** starts to grow - and keeps growing for every record deleted.

This happens because of id re-use. Deletes in Neo4j do not physically remove records; they just flip the record's in-use bit from in use to free. The deleted (but available-for-reuse) ids are kept in neostore.nodestore.db.**id**. This means the neostore.nodestore.db.**id** file acts sort of like a "recycle bin" that stores all the deleted ids.

So now you've deleted the data: neostore.nodestore.db is the same size as before the delete, and the neostore.nodestore.db.**id** file is larger than before the delete operation. How do you reclaim this space?

When you start loading new data after the deletes, Neo4j starts using the ids recorded in neostore.nodestore.db.**id** and thus the
neostore.nodestore.db file does not grow in size and the file neostore.nodestore.db.**id** starts decreasing until it's completely
empty.
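The id lifecycle described above can be sketched as a simple freelist. This is an illustrative toy model of the behaviour, not Neo4j's actual implementation; the class and method names are invented for the example:

```python
# Toy model of the id "recycle bin": the store file never shrinks on delete,
# and new writes drain the freelist before the store grows again.
class NodeStore:
    def __init__(self):
        self.records = []      # models neostore.nodestore.db: never shrinks
        self.free_ids = []     # models neostore.nodestore.db.id: deleted ids

    def create(self, data):
        if self.free_ids:                  # reuse a deleted id first
            node_id = self.free_ids.pop()
            self.records[node_id] = data
        else:                              # otherwise grow the store file
            node_id = len(self.records)
            self.records.append(data)
        return node_id

    def delete(self, node_id):
        # Logical delete: mark the record unused and remember its id.
        self.records[node_id] = None
        self.free_ids.append(node_id)


store = NodeStore()
ids = [store.create(f"node-{i}") for i in range(5)]
size_after_load = len(store.records)       # 5 records on "disk"

for node_id in ids[:3]:
    store.delete(node_id)
# The store file did not shrink; the .id file grew instead.
assert len(store.records) == size_after_load
assert len(store.free_ids) == 3

store.create("new-node")                   # drains the freelist;
assert len(store.free_ids) == 2            # the store file stays the same size
assert len(store.records) == size_after_load
```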

If you do not plan to add more nodes but still want to shrink the size of the database on disk, you can use the copy store utility. It reads an offline database and copies it to a new one, leaving behind the records that are no longer in use (and the lists of ids eligible for re-use).
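In Neo4j 4.x the copy store utility is exposed as `neo4j-admin copy`. A sketch of an invocation (the target database name here is a placeholder, and the source database must be offline before you run it):

```shell
# Stop the database first, then copy the store into a compacted new one,
# skipping unused records and the reclaimed-id lists.
neo4j-admin copy --from-database=neo4j --to-database=compacted
```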


Large deletes can also generate a lot of transaction logs. Be aware of this when doing mass delete operations; otherwise, ironically, a large delete can fill up your filesystem.
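One common mitigation is to split a mass delete into bounded batches, so each transaction (and hence each transaction-log entry) stays small. A minimal sketch of the batching logic; `delete_in_batches` and the `delete_batch` callback are invented names, and the callback stands in for whatever actually issues the delete (e.g. a Cypher statement via a driver):

```python
def delete_in_batches(ids, delete_batch, batch_size=10_000):
    """Call delete_batch on successive slices of ids; return the batch count."""
    batches = 0
    for start in range(0, len(ids), batch_size):
        delete_batch(ids[start:start + batch_size])
        batches += 1
    return batches


# Demo with an in-memory "delete": 25 ids in batches of 10 -> 3 batches.
deleted = []
n = delete_in_batches(list(range(25)), deleted.extend, batch_size=10)
assert n == 3                      # 10 + 10 + 5
assert len(deleted) == 25          # every id was deleted exactly once
```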

Deletes also seem to affect the size of dump files. A DB with about 18M nodes was producing dump files of about 2.5 GB. I deleted about 6M nodes, or about 33% of the data. The id files in the DB only changed a small amount, as described above. But the dump files are now nearly double their previous size, about 4.25 GB. Why is this? Can I expect the dump size to shrink as ids are recycled?

I'm running Community Edition 4.4.3 on Mac.

Deleting the connected nodes changes the sizes of two store files: neostore.nodestore.db.id and neostore.relationshipstore.db.id. The increase is much bigger for the neostore.relationshipstore.db.id file, which may be the reason your dump file grew to 4.25 GB. If possible, check the sizes of these two store files.

There are several files that did grow significantly when I deleted a lot of data:
  • propertystore.db.id
  • relationshiptypescanstore.db
  • relationshipstore.db.id
  • propertystore.db.strings.id
  • nodestore.db.id
  • labelscanstore.db

The net growth of these files was about 0.483 GB, while the growth of the dump file was about 1.980 GB, or about 4x the file growth. Maybe there's some expansion as data converts from binary to ASCII in the dump, or something. It's not a problem, just surprising. It will be interesting to see whether the dump size stabilizes or shrinks as ids are reused and the DB collects more data.
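The "about 4x" figure quoted above checks out arithmetically:

```python
# Ratio of dump-file growth to net store-file growth, from the numbers above.
store_growth_gb = 0.483
dump_growth_gb = 1.980
ratio = dump_growth_gb / store_growth_gb
assert round(ratio, 1) == 4.1      # roughly the "about 4x" in the text
```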