Can't backup remote database using Neo4J

I've been trying to remotely back up my Neo4J database for 2 days already and nothing works.

I run

sudo neo4j-admin backup --backup-dir=backup --name=graph.db-backup --from=1.1.1.1:1111 --timeout=50m

The files start to get saved and then are simply erased after a while, and there's no backup.

I tried setting --pagecache=16M and HEAP_SIZE, but it has no effect. Sometimes it just stalls; sometimes I get an error like this:

unexpected error: java.io.IOException: org.neo4j.com.ComException: Channel has been closed

The DB I'm backing up is Enterprise 3.3.3 and the one I'm backing up with is 3.5.14

Thank you for any help.

Is this the normal Neo4J behavior?

The two notable things that you're mentioning here are

  • You're backing up Neo4j 3.3 with Neo4j 3.5 tools. I believe there were some store upgrade changes between these versions, so I would not advise that. Have you tried using 3.3 tooling?
  • The concrete error that you've provided suggests there's a network interruption happening somewhere. It's tough to see what's going on without a full paste of the command's output and some knowledge of what's happening on the network between you and the database.

Ok, I'm trying with 3.3 and I get this error now:

command failed: Backup failed: Unexpected Exception

How can I find out what's happening really? Is there a log file for the neo4j-admin backup command?

And if that doesn't happen and the backup runs until the end, it automatically erases everything and nothing happens: the process stalls.

Honestly, I've been dealing with this for 5 days already. Is it supposed to be that hard to do a backup?

Just to clarify once again: the files are being copied to the temp-copy folder inside the folder (backup) where I am making the backup. But either there is an error during the backup, or, if the copy completes (judging by the size of temp-copy, I guess at the stage where it's supposed to finalize the "write"), it just erases everything, the process stalls, and everything disappears.

I'm doing a backup from a WebFaction server (production) to a Neo4J db hosted on AWS.

When I look at the log file in /var/log/neo4j/ there's no information there (just that the database has launched or not) and there is no other log I can access or I don't know where it is.

Could it be an issue with permissions of the folder where the backup is made?

Could it be I have to run neo4j-admin backup using systemctl?

Could you please provide some help on this because this topic is not so well documented in your manual and I think it's very important.

I tried it from another machine, locally, and it can go further but then gives us this error:

2020-05-11 19:20:23.480+0000 INFO [o.n.c.s.StoreCopyClient] Copying index/lucene/relationship/TO/_qmb8.si
2020-05-11 19:20:23.482+0000 INFO [o.n.c.s.StoreCopyClient] Copied index/lucene/relationship/TO/_qmb8.si 427.00 B
2020-05-11 19:20:23.482+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore
2020-05-11 19:20:23.483+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore 16.00 kB
2020-05-11 19:20:23.483+0000 INFO [o.n.c.s.StoreCopyClient] Done, copied 637 files
2020-05-11 19:20:23.590+0000 INFO [o.n.b.BackupService] Start receiving transactions from 3794881
2020-05-11 19:20:32.540+0000 INFO [o.n.b.BackupService] Finish receiving transactions at 3794881
2020-05-11 19:20:32.572+0000 INFO [o.n.b.BackupService] Start recovering store
command failed: Backup failed: Error starting org.neo4j.com.storecopy.ExternallyManagedPageCache$GraphDatabaseFactoryWithPageCacheFactory$1, /Volumes/Extreme SSD/Backup/main-graph.db-backup/temp-copy

Hi, you could try this program locally. I developed the utility:

https://github.com/jggomez/neo4j-backup

I hope it helps you.


Yes, thank you, @jggomez, I saw this, but I don't want to run it locally. Plus, your script uses the same neo4j-admin backup command internally, and that is not working for me.

I also want to get a conclusive answer from Neo4J engineers: does the backup, which is a feature (apart from clustering) setting the Enterprise version apart from Community, actually work? Or only in some cases and sometimes? And are the 3 pages of documentation that exist on it all there is to understand how it works?

I'm not new to Neo4J, but the 5 days I've spent trying to make this simple task of online backup work are the longest stretch I've been stuck with this technology so far, and my experience is that it's super buggy and unreliable, with not enough options and insufficient documentation.

Now, even when the process does run to completion (1 in 10 times), it gets to this stage:

2020-05-12 15:25:44.046+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore
2020-05-12 15:25:44.048+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore 16.00 kB
2020-05-12 15:25:44.049+0000 INFO [o.n.c.s.StoreCopyClient] Done, copied 642 files
2020-05-12 15:25:44.202+0000 INFO [o.n.b.BackupService] Start receiving transactions from 3796108
2020-05-12 15:25:46.884+0000 INFO [o.n.b.BackupService] Finish receiving transactions at 3796108
2020-05-12 15:25:46.920+0000 INFO [o.n.b.BackupService] Start recovering store

I get this error after:

command failed: Backup failed: Error starting org.neo4j.com.storecopy.ExternallyManagedPageCache$GraphDatabaseFactoryWithPageCacheFactory$1

I saw a post about it on https://github.com/neo4j/neo4j/issues/11992 and changed the max open files on my system, but that didn't help either.

The last records in the log on the server I am backing up are:

2020-05-12 15:25:42.811+0000 INFO [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 3796320 to [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.b], from [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.a].
2020-05-12 15:25:42.819+0000 INFO [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 3796320 to [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.b], from [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.a].
2020-05-12 15:25:47.723+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]:  Store flush completed
2020-05-12 15:25:47.723+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]:  Starting appending check point entry into the tx log...
2020-05-12 15:25:47.724+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]:  Appending check point entry into the tx log completed
2020-05-12 15:25:47.725+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]:  Check pointing completed
2020-05-12 15:25:47.725+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [898]:  Starting log pruning.
2020-05-12 15:25:47.728+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [898]:  Log pruning complete.
2020-05-12 15:30:25.788+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 204ms.

The local backup log (where I'm backing up to) doesn't have any errors.

@deemeetree it might be worth double-checking the file limit increase is in effect. IIRC we edited the neo4j-admin script to print ulimit -a. Also note that 65535 has not always been sufficient for us. I would try a much larger value to rule that out as a root cause.
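
A minimal sketch of that check, to be run in the same shell (and as the same user) that launches neo4j-admin. The 65535 threshold is simply the figure mentioned above, and the helper names limit_at_least and report_nofile are made up for illustration. Note that a command started through sudo may see different limits than your login shell does.

```shell
# Minimum number of open files wanted before starting the backup.
# 65535 is the figure from the thread; it is a starting point, not a guarantee.
REQUIRED_NOFILE="${REQUIRED_NOFILE:-65535}"

# Succeeds if the limit in $1 ("unlimited" or a number) is at least $2.
limit_at_least() {
  local current="$1" required="$2"
  [ "$current" = "unlimited" ] && return 0
  [ "$current" -ge "$required" ]
}

# Prints the soft and hard open-file limits of the current shell.
report_nofile() {
  echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

report_nofile
if limit_at_least "$(ulimit -Sn)" "$REQUIRED_NOFILE"; then
  echo "open-file limit looks sufficient"
else
  echo "open-file limit is below $REQUIRED_NOFILE; raise it in this shell before running neo4j-admin"
fi
```

If the soft limit is too low, `ulimit -n <value>` raises it for the current shell, up to the hard limit.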

Somebody from Neo4J, could you please respond and advise?

Can you set debug level for logging and then provide the log file(s)?

dbms.logs.debug.level=DEBUG

And set the env variable NEO4J_DEBUG to true

If you don't want to provide the log file(s) here, you can send them to the Intercom ticket you opened for this case also.

Elaine

Hi Elaine,

Thank you for responding. I'm actually communicating with you through DM on Twitter but I guess you are receiving it through Intercom, right?

Could you please tell me if I need to set this up as you advised above on the database I am backing up or on the remote system I'm using to do the backup? Or on both?

Thanks

To be sure, you are using the same version of Neo4j on the system you are backing up and the system where you are running neo4j-admin backup from, correct?

Since you are saying that it appears to do the backup and then the files disappear, I would say that what you need to look at is debugging the system from which you are executing the backup command. The server doesn't seem to be the problem.

That being said, you do not have a log file or debug log file on the system where you are running neo4j-admin backup, so perhaps setting the env variable will help you. Changing anything in neo4j.conf will not, as you do not use a local Neo4j instance to perform the backup.
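
Because the backup command leaves no log file of its own on the client, one option is to capture everything it prints. The following is a rough sketch, not an official procedure: run_logged is a hypothetical helper, backup.log is a placeholder file name, and the debug variable is assumed to be spelled NEO4J_DEBUG.

```shell
# Runs a command, appending its stdout and stderr to a log file,
# and returns the command's own exit status.
run_logged() {
  local log_file="$1"
  shift
  echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) $*" >> "$log_file"
  "$@" >> "$log_file" 2>&1
  local status=$?
  echo "=== exit status: $status" >> "$log_file"
  return "$status"
}

# Example invocation (not executed here), reusing the command from the first post.
# sudo usually does not pass environment variables through, so the variable is
# set via env inside the sudo call:
#
#   run_logged backup.log sudo env NEO4J_DEBUG=true neo4j-admin backup \
#     --backup-dir=backup --name=graph.db-backup --from=1.1.1.1:1111 --timeout=50m
```

The resulting file holds the full output of each attempt, including any stack trace behind a message such as "Unexpected Exception", and can be attached to a support ticket.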

Does the server that you want to back up need to be online 24x7? Another option you could try is to shut down the server that you want to back up and try neo4j-admin dump to at least get a dump file for the database.

Elaine

Hello Elaine,

The whole point of me switching to the enterprise version was to be able to do online backups. So I want to be able to do those.

Regarding the backup files: I already provided all the data from all the sources (both remote and local) above.

It looks like the backup feature in Neo4J Enterprise works really badly and is super unstable.

I guess I should just switch back to Community and do offline backups as before, right?

For now, can you back up on the same system as the server, just to make sure that the backup works locally? Then copy the backup files to a different system. This will at least enable you to back up your database without any interruption of service.
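
A sketch of that two-step approach, under several assumptions: the directory, backup name, and remote target are placeholders, the server is assumed to listen for backups on localhost:6362 (the usual default in Neo4j 3.x unless dbms.backup.address was changed), and rsync is just one way to copy the result. By default the script only prints the commands it would run.

```shell
# All values below are placeholders for illustration.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/neo4j}"
BACKUP_NAME="${BACKUP_NAME:-graph.db-backup}"
REMOTE_TARGET="${REMOTE_TARGET:-user@backup-host:/srv/neo4j-backups/}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Prints the command in dry-run mode, otherwise executes it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

backup_then_copy() {
  # Step 1: online backup taken on the database host itself.
  run neo4j-admin backup --backup-dir="$BACKUP_DIR" --name="$BACKUP_NAME" \
    --from=localhost:6362 || return 1
  # Step 2: copy the finished backup to another machine.
  run rsync -a "$BACKUP_DIR/$BACKUP_NAME/" "$REMOTE_TARGET$BACKUP_NAME/"
}

backup_then_copy
```

Setting DRY_RUN=0 executes the two steps; the copy only starts if the backup command exits successfully.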

Elaine

I can try to do it on the same system, but do you know if it's going to slow down my app / database drastically compared to a remote backup? And how can I ensure that doesn't happen? Thanks!

Also, is it possible to do an offline backup, then copy it to a remote location, and then do online incremental backups on that offline backup?

Also, I'm doing it locally and it just crashes my database with this message:

command failed: Backup failed: Unexpected Exception

Can you send the log file for this time-period where the local backup failed?

Elaine