My graph contains 2.5 million nodes.
Current graph size is 58G.
Machine RAM = 24G.
We need to clean up the DB: we have to delete 10 million nodes along with their attached relationships. As documented, the graph size on disk will not shrink, because the ids are re-used.
Will this deletion improve query performance (read/write)?
Any suggestions for running store-utils to compact the DB on production systems?
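For reference, this is roughly how we plan to run the deletion, in batches so the transactions stay small (just a sketch, assuming the APOC library is installed; `:Obsolete` is a hypothetical label standing in for whatever actually identifies the nodes to remove):

```cypher
// Delete the target nodes plus their relationships in batches of 10k
// so a single huge transaction doesn't exhaust memory.
CALL apoc.periodic.iterate(
  "MATCH (n:Obsolete) RETURN n",
  "DETACH DELETE n",
  {batchSize: 10000, parallel: false}
);
```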
Actually, id re-use allows you to update your graph without the store size growing.
So your graph shouldn't grow if you are deleting/adding nodes over the course of a day.
If you do huge operations, you'd have to wait for or change the grace period for id-reuse.
Compaction will help with disk and memory use, especially if you have fragmentation and partially filled pages.
You can run store-utils on a copy/backup and then test it afterwards.
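To "test it afterwards", a simple sanity check is to run the same counts against the original store and the compacted copy and confirm they match before promoting the copy (a minimal sketch; add whatever label- or property-level checks matter for your data):

```cypher
// Run on both the original store and the compacted copy;
// the numbers should be identical before switching over.
MATCH (n) RETURN count(n) AS nodes;
MATCH ()-[r]->() RETURN count(r) AS relationships;
```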
Hi,
More than 40% of the nodes need to be deleted, so deletion is the only way to clean up.
We are setting up a causal cluster. How should we run store-utils for compaction in production?
Is there any chance of data mismatch in the cluster (core or read replica)?
And is deletion alone sufficient to reduce the RAM used by the DB, or will compaction help with this?
Thanks
I honestly think that you're best off with deletion and record-reuse.
You can reduce the grace period to something lower than 1hr, e.g. 1 to 10 minutes.
Then the records marked as unused for the deleted nodes and relationships will be re-used for the new data.
Hi Michael,
I didn't get the concept of the grace period. Is it a configuration setting that automatically reclaims the ids of deleted records?
After deleting the data, will the DB use less memory (RAM), given that the graph size stays the same?
Thanks
It never deletes records, it just re-uses them.
The grace period is for reuse.
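If you want to see whether the on-disk store actually shrinks after the cleanup and compaction, one quick way is APOC's store-monitoring procedure (a sketch; it assumes APOC is installed, and the exact fields returned depend on the APOC version):

```cypher
// Report on-disk store file sizes (node store, relationship store, etc.)
// before and after the cleanup, to compare total store size.
CALL apoc.monitor.store();
```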