Scheduling Neo4j Backups while running in Kubernetes


M. David Allen

I was talking to a customer about this topic earlier today and realized we hadn't written anything up on it, so I thought I'd drop a short post here for others to find.

If you're running Neo4j on Kubernetes, whether via the Google Kubernetes Marketplace entry, the public Helm chart, or some other route, you probably have questions about how to perform common maintenance operations, such as backups, in Kubernetes.

As part of the Google Kubernetes Marketplace offering of Neo4j, we engineered a special container to take a backup and copy it to Google Storage, which you can find here:

This container can be adapted fairly easily, for example if you want to copy the backup to S3 instead of Google Storage. The README explains the concepts, which are straightforward: schedule a new container into your Kubernetes cluster that runs neo4j-admin backup, tars and gzips the backup set, and then sends it to the cloud storage provider of your choice.
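The three steps above (back up, tar/gzip, upload) can be sketched in Python roughly as follows. This is not the code inside the Marketplace container; it is a minimal illustration under stated assumptions: it uses the Neo4j 3.5-era neo4j-admin backup flags (--from, --backup-dir, --name) and the default backup port 6362, and it assumes the gsutil or aws CLI is installed in the image. The default paths and backup name are placeholders. Check the neo4j-admin documentation for your Neo4j version, since the backup options have changed between releases.

```python
"""Sketch of the backup -> tar/gzip -> upload flow (not the official container).

Assumptions: Neo4j 3.5-style `neo4j-admin backup` flags, and the `gsutil`
or `aws` CLI available on PATH. Paths and names below are placeholders.
"""
import subprocess
import tarfile
from pathlib import Path


def backup_command(host: str, backup_dir: str, name: str, port: int = 6362) -> list:
    # Neo4j 3.5-era flags; newer releases use different options.
    return [
        "neo4j-admin", "backup",
        f"--from={host}:{port}",
        f"--backup-dir={backup_dir}",
        f"--name={name}",
    ]


def tar_gzip(backup_set: Path, archive: Path) -> Path:
    # Package the whole backup set directory as a single .tar.gz file.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(backup_set, arcname=backup_set.name)
    return archive


def upload_command(archive: Path, bucket_url: str) -> list:
    # gs:// URLs go through gsutil; s3:// URLs go through the AWS CLI.
    if bucket_url.startswith("gs://"):
        return ["gsutil", "cp", str(archive), bucket_url]
    if bucket_url.startswith("s3://"):
        return ["aws", "s3", "cp", str(archive), bucket_url]
    raise ValueError(f"unsupported storage URL: {bucket_url}")


def run_backup(host: str, bucket_url: str,
               workdir: str = "/backups", name: str = "graph.db-backup") -> None:
    # Each step raises CalledProcessError on failure, so the Kubernetes Job fails visibly.
    subprocess.run(backup_command(host, workdir, name), check=True)
    archive = tar_gzip(Path(workdir) / name, Path(workdir) / f"{name}.tar.gz")
    subprocess.run(upload_command(archive, bucket_url), check=True)
```

Swapping Google Storage for S3 in this sketch only changes the bucket URL passed to run_backup, which is the kind of adaptation described above.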

This approach works well because Kubernetes lets you schedule CronJobs, so you could, for example, take a full backup weekly and incremental backups nightly.
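A weekly-full, nightly-incremental schedule could be expressed as two CronJob resources. The Python sketch below builds them as plain dictionaries and prints them as JSON, which kubectl apply accepts as well as YAML. The image name, resource names and the BACKUP_MODE environment variable are hypothetical placeholders; the real backup container may be configured differently. The apiVersion batch/v1 applies to Kubernetes 1.21 and later; older clusters use batch/v1beta1.

```python
"""Sketch: two Kubernetes CronJobs, weekly full and nightly incremental backups.

Hypothetical: the image name, resource names and the BACKUP_MODE env var are
placeholders; the actual backup container may be configured differently.
"""
import json


def backup_cronjob(name: str, schedule: str, mode: str, image: str) -> dict:
    return {
        "apiVersion": "batch/v1",  # batch/v1beta1 on Kubernetes older than 1.21
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,           # standard five-field cron syntax
            "concurrencyPolicy": "Forbid",  # never run two backups at the same time
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [{
                                "name": "neo4j-backup",
                                "image": image,
                                # Hypothetical variable the backup container would read.
                                "env": [{"name": "BACKUP_MODE", "value": mode}],
                            }],
                        }
                    }
                }
            },
        },
    }


IMAGE = "registry.example.com/neo4j-backup:latest"  # hypothetical image
jobs = [
    # Full backup on Sundays at 02:00.
    backup_cronjob("neo4j-backup-full", "0 2 * * 0", "full", IMAGE),
    # Incremental backups Monday through Saturday at 02:00.
    backup_cronjob("neo4j-backup-incremental", "0 2 * * 1-6", "incremental", IMAGE),
]

if __name__ == "__main__":
    # Usage: python cronjobs.py | kubectl apply -f -
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": jobs}, indent=2))
```

Keeping the two schedules on non-overlapping days, plus concurrencyPolicy Forbid, is one way to avoid a full and an incremental run colliding.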

In a future post, once that container is ready, I'll cover how to restore from these same backups. Taken together, backup and restore let you stand up new clusters from existing backups quickly.
