Does neo4j-admin restore do anything more than a file copy when seeding a CORE cluster from a backup?

backup

(Galatians) #1

I run a 3-server CORE cluster and regularly seed an identical cluster with the production data for testing purposes. I seed from a backup created with neo4j-admin backup. Until recently I always used the following procedure to seed the cluster:

  1. stop the cluster
  2. run neo4j-admin unbind --database=graph.db on each server
  3. delete graph.db on each server
  4. copy the backup directory to graph.db on each server
  5. start the cluster

Recently I started using neo4j-admin restore instead of just copying the backup directory to graph.db. Every time I do this, I run an MD5 digest command on the backup directory and on the resulting graph.db directory, and every time the two digests are identical.

So my question is: does neo4j-admin restore do anything other than copy the backup files?

I ask because my database is 120 GB, so it's much faster to scp the backup directly to graph.db on the other servers than to scp it and then run neo4j-admin restore.

This is the command I use to generate the digest for a whole directory. It only works if your current working directory is the directory you are generating the digest for:

find "." -type f -print0 | sort -z | xargs -r0 md5sum | md5sum
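The same pipeline can be wrapped so it runs from any working directory. This is only a minimal sketch, assuming GNU find, sort, xargs and md5sum; the function names dir_digest and same_tree are made up for illustration and are not part of neo4j-admin:

```shell
#!/usr/bin/env bash
# dir_digest DIR: print a single MD5 digest covering every file under DIR.
# The subshell keeps the caller's working directory unchanged.
dir_digest() {
    (
        cd "$1" || exit 1
        # Relative paths make the digest independent of where DIR lives;
        # LC_ALL=C gives the same sort order on every server.
        find . -type f -print0 | LC_ALL=C sort -z | xargs -r0 md5sum | md5sum | cut -d' ' -f1
    )
}

# same_tree A B: succeed only if both trees contain the same relative
# file names with the same contents. Fails if either directory is missing.
same_tree() {
    local a b
    a=$(dir_digest "$1") || return 1
    b=$(dir_digest "$2") || return 1
    [ "$a" = "$b" ]
}
```

For example, same_tree /var/neo4j/backup/3.4.9 /var/neo4j/data/databases/graph.db && echo identical. Note that this compares file names and contents only; ownership, permissions and empty directories are not part of the digest.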


(Galatians) #2

@david.allen do you have any insight on this?


(M. David Allen) #3

I can't answer authoritatively about restore, but I believe the backup tool that produces the set you'd be using performs various integrity checks that you'd miss if you just copied graph.db, so it's not the same as just copying the files. Another difference is that you can take backups while the database is online, whereas copying the underlying files would almost certainly give you a backup set with problems if your database was still running.

I wouldn't recommend copying the graph.db folder, because its contents are considered implementation details that could change over time. Neo4j-admin is going to stay supported, whereas in future versions copying the raw directory could break, as users aren't technically supposed to access the files on disk directly.


(Galatians) #4

I'm not sure you understood my question. I'm not copying graph.db; I'm only copying the backup folder that was created earlier with neo4j-admin backup, e.g.:

service neo4j stop
neo4j-admin unbind --database=graph.db
rm -r /var/neo4j/data/databases/graph.db
cp -r /var/neo4j/backup/3.4.9 /var/neo4j/data/databases/graph.db
service neo4j start

The cp step produces byte-for-byte identical results to neo4j-admin restore.