Suppose we are provisioning data in Neo4j and taking backups periodically. Suppose that this afternoon the database crashed for some reason, and we restored the data from the last backup, which was taken yesterday. But some new data was provisioned this morning. If we restore yesterday's backup, today's data will be lost. Is there any way to recover today's newly created data as well?
No, not if it hasn't been backed up.
When data is created or updated, it is written to the pagecache (RAM) and also to the transaction logs.
At checkpoint time, which by default is every 15 minutes or every 100k transactions, we flush data from the pagecache/RAM to the final database files and then write a record to the transaction logs effectively saying 'we are good to here'. If you gracefully stop the database, we checkpoint in the same way.
If, however, the database comes down hard, i.e. crashes, a checkpoint clearly doesn't happen. Upon restart we read the transaction logs, see that the last record does not indicate 'we are good to here',
and so replay the transaction log to get the data into the final database files, and then update the transaction logs to indicate 'we are good to here'.
There is, however, no way to replay transaction logs manually/individually.
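The checkpoint trigger described above can be sketched roughly as follows. This is only an illustration: the constant names and the `checkpoint_due` function are hypothetical, and the thresholds are simply the defaults quoted above (15 minutes or 100k transactions), assuming a checkpoint is due as soon as either one is reached.

```python
# Hypothetical names; thresholds are the defaults quoted in the answer above.
CHECKPOINT_INTERVAL_SECONDS = 15 * 60
CHECKPOINT_INTERVAL_TXNS = 100_000


def checkpoint_due(seconds_since_last: float, txns_since_last: int) -> bool:
    """Return True when either the time or the transaction threshold is reached."""
    return (
        seconds_since_last >= CHECKPOINT_INTERVAL_SECONDS
        or txns_since_last >= CHECKPOINT_INTERVAL_TXNS
    )
```

For instance, `checkpoint_due(60, 10)` is false, while `checkpoint_due(900, 10)` and `checkpoint_due(60, 100_000)` are both true.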
Hi Dana
Thanks for your reply. But we are a bit confused.
You are saying that -
However upon restart we would read the transaction logs, see the last record does not indicate we are good to here
and as such we would replay the transaction log so as to finally get the data into the final database files
This means that all the data that is in the transaction logs will be recovered after the crash. So we will get back most of the data even if the database crashes.
But data that is committed but not yet in the transaction logs will not be recovered. Is that what you mean? So which happens first: is the data committed first, or are the transaction logs written first?
Is it possible to recover the data that is committed, but not yet in the transaction logs?
Thanks again for clarifying our doubts.
Nirmalya
Have you actually had the database crash and are wondering how to recover the data, or are you just wondering what you would do in such an event?
When, for example, you create a node at 09:00, i.e.
create (n:Person {name: 'nirmalya.sinha'});
the node is first created in RAM (i.e. the area described by dbms.memory.pagecache.size). The node creation is also written to the transaction logs. And the node creation is effectively committed from a database perspective, in that if another database user runs
match (n:Person {name: 'nirmalya.sinha'}) return n;
they would thus see the node.
At checkpoint time, for example at 09:15, we flush this 'new' data (i.e. this new node as well as any other new data) to data/databases//neostore.nodestore.db and also update the transaction log to indicate that everything is fully written to the database as of this point (i.e. 09:15).
Now if at 09:18 we create a new node, i.e.
create (n:Person {name: 'dana'});
the node is first created in RAM (i.e. the area described by dbms.memory.pagecache.size). The node creation is also written to the transaction logs. And the node creation is effectively committed from a database perspective, in that if another database user runs
match (n:Person {name: 'dana'}) return n;
they would see this node.
At 09:20, if we then 'crash', the last checkpoint was at 09:15 and this new node was created at 09:18, so the node creation is still in the transaction logs. Upon restart, Neo4j opens the transaction logs, sees that the last record was not a checkpoint, and replays all the transactions in the transaction log that came after the last checkpoint, i.e. the checkpoint at 09:15. Thus the 'dana' node is added back to the database, and specifically to data/databases//neostore.nodestore.db.
As per my original response:
There is, however, no way to replay transaction logs manually/individually.
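To make the 09:00 to 09:20 timeline concrete, here is a toy Python model of it. It is only a sketch of the behaviour described in this post, not Neo4j's actual implementation and not a way to replay real transaction logs; the `ToyDatabase` class and all of its attributes are hypothetical.

```python
class ToyDatabase:
    """Toy model of the write path described above (hypothetical, not Neo4j code)."""

    def __init__(self):
        self.store = set()       # stands in for the final database files on disk
        self.page_cache = set()  # stands in for pagecache/RAM, lost on a crash
        self.tx_log = []         # stands in for the transaction logs, survive a crash

    def create(self, name):
        # A committed write is recorded in the log and applied in RAM.
        self.tx_log.append(("create", name))
        self.page_cache.add(name)

    def checkpoint(self):
        # Flush RAM to the store, then record a 'we are good to here' marker.
        self.store |= self.page_cache
        self.tx_log.append(("checkpoint", None))

    def crash(self):
        # RAM contents vanish; the store files and the log remain.
        self.page_cache = set()

    def recover(self):
        # Find the last checkpoint marker and replay every create after it.
        last = max(
            (i for i, (kind, _) in enumerate(self.tx_log) if kind == "checkpoint"),
            default=-1,
        )
        replayed = [name for kind, name in self.tx_log[last + 1:] if kind == "create"]
        self.page_cache = set(self.store)
        self.page_cache.update(replayed)
        self.checkpoint()
        return replayed


db = ToyDatabase()
db.create("nirmalya.sinha")  # 09:00
db.checkpoint()              # 09:15
db.create("dana")            # 09:18
db.crash()                   # 09:20
replayed = db.recover()      # restart
```

In this model only the 'dana' creation is replayed on restart, because the 'nirmalya.sinha' node was already flushed to the store by the 09:15 checkpoint.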
Thanks a lot. Now everything is clear. So, to my understanding -
- If the DB crashes, all the data will be recovered after it is restarted. So we are safe.
- I also believe that if we create a Neo4j index, that is also stored in the transaction log and can also be recovered when the DB is started.
- We have deployed Neo4j in a Kubernetes cluster. Often one follower node fails and we restore it with the neo4j-admin restore command, using the backup taken on the master node. So in that case we will get only the backup data and not the data from the transaction logs.
Thanks