The two notable things that you're mentioning here are:
You're backing up Neo4j 3.3 with Neo4j 3.5 tools. I believe there were some store upgrade changes between these versions, so I would not advise that. Have you tried using 3.3 tooling?
The concrete error that you've provided suggests there's a network interruption happening somewhere. It's tough to see what's happening without a full paste of the output of the command, and some knowledge of what's happening on the network between you and the database.
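One way to get the full paste asked for above is to run the backup through a small wrapper that records stdout, stderr, and the exit code in a single file. This is only a minimal Python sketch: the helper name `run_and_log`, the log file name, and the host and paths in the commented example are hypothetical, and the `neo4j-admin backup` options should be checked against the manual for your version.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, write its combined stdout/stderr to log_path, return the exit code."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error output into the same stream
        universal_newlines=True,
    )
    with open(log_path, "w") as log:
        log.write(proc.stdout)
        log.write("\nexit code: {}\n".format(proc.returncode))
    return proc.returncode


# Hypothetical usage (address and paths are placeholders):
# run_and_log(
#     ["neo4j-admin", "backup",
#      "--from=prod.example.com:6362",
#      "--backup-dir=/var/backups/neo4j",
#      "--name=graph.db-backup"],
#     "backup-output.log",
# )
```

The resulting log file can then be pasted into the thread as-is.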
Just to clarify once again: the files are being copied to the temp-copy folder inside the folder (backup) that I am backing up to. But either there is an error during the backup, or, once the copy is done (judging by the size of temp-copy, I guess at the stage where it's supposed to finalize the "write"), it just erases everything, the process stalls, and everything disappears.
I'm doing a backup from a WebFaction server (production) to a Neo4J db hosted on AWS.
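Since the backup crosses two hosting providers, a quick first check is whether the backup port on the source server is reachable at all from the machine running the backup. The sketch below assumes the backup listener is on port 6362 (the usual default in 3.x, but confirm `dbms.backup.address` in your neo4j.conf); the function name and the host in the comment are placeholders. A successful connect only shows the port is open, not that a long transfer will survive.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False


# Hypothetical usage:
# print(can_connect("prod.example.com", 6362))
```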
When I look at the log file in /var/log/neo4j/, there's no information there (just whether the database has launched or not), and there is no other log I can access, or I don't know where it is.
Could it be an issue with permissions of the folder where the backup is made?
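The permissions question can be tested directly by running a check as the same user that runs the backup. A minimal sketch, assuming Python is available on that machine; `check_backup_dir` is a hypothetical helper, not part of any Neo4j tooling:

```python
import os
import tempfile


def check_backup_dir(path):
    """Return a list of problems that would stop the current user writing into path."""
    problems = []
    if not os.path.isdir(path):
        problems.append("not a directory: {}".format(path))
        return problems
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append("no write/search permission on {}".format(path))
        return problems
    try:
        # Actually creating (and auto-deleting) a file is the most direct test
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append("cannot create files in {}: {}".format(path, exc))
    return problems
```

An empty list means this user can create files in the folder, which would point away from permissions as the cause.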
Could it be I have to run neo4j-admin backup using systemctl?
Could you please provide some help on this because this topic is not so well documented in your manual and I think it's very important.
Yes, thank you, @jggomez, I saw this, but I don't want it to run locally. Plus, your script uses the same neo4j-admin backup command, and that is not working for me.
I also want to get a conclusive answer from Neo4J engineers: does the backup, which is a feature (apart from clustering) setting the Enterprise version apart from Community, actually work? Or only in some cases and sometimes? And are the 3 pages of documentation that exist on it all there is to understand how it works?
I'm not new to Neo4J, but the 5 days I've spent trying to make this simple task of online backup work are the longest stretch I've had so far with this technology, and my experience is that it's super buggy and unreliable, with not enough options and insufficient documentation too.
The last records in the log on the server I am backing up are:
2020-05-12 15:25:42.811+0000 INFO [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 3796320 to [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.b], from [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.a].
2020-05-12 15:25:42.819+0000 INFO [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 3796320 to [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.b], from [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.a].
2020-05-12 15:25:47.723+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]: Store flush completed
2020-05-12 15:25:47.723+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]: Starting appending check point entry into the tx log...
2020-05-12 15:25:47.724+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]: Appending check point entry into the tx log completed
2020-05-12 15:25:47.725+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]: Check pointing completed
2020-05-12 15:25:47.725+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [898]: Starting log pruning.
2020-05-12 15:25:47.728+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [898]: Log pruning complete.
2020-05-12 15:30:25.788+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 204ms.
The local backup log (where I'm backing up to) doesn't have any errors.
@deemeetree it might be worth double-checking the file limit increase is in effect. IIRC we edited the neo4j-admin script to print ulimit -a. Also note that 65535 has not always been sufficient for us. I would try a much larger value to rule that out as a root cause.
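As an alternative to editing the neo4j-admin script, the open-file limit can be read from a process started in the same shell, since child processes inherit it. A minimal sketch for Linux or macOS using Python's standard `resource` module; the function names are made up for illustration, and a service started by systemd may have different limits from your login shell:

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_at_least(required):
    """True if the soft open-file limit is unlimited or >= required."""
    soft, _hard = open_file_limits()
    return soft == resource.RLIM_INFINITY or soft >= required


# Hypothetical usage:
# print(open_file_limits())
# print(limit_is_at_least(65535))
```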
Thank you for responding. I'm actually communicating with you through DM on Twitter but I guess you are receiving it through Intercom, right?
Could you please tell me if I need to set this up as you advised above on the database I am backing up or on the remote system I'm using to do the backup? Or on both?
To be sure, you are using the same version of Neo4j on the system you are backing up and the system where you are running neo4j-admin backup from, correct?
Since you are saying that it appears to do the backup and then the files disappear, I would say that what you need to look at is debugging the system where you are executing the backup command. The server doesn't seem to be the problem.
That being said, you do not have a log file or debug log file on the system where you are running neo4j-admin backup, so perhaps setting the env variable will help you. Changing anything in neo4j.conf will not, as you do not use a local Neo4j instance to perform the backup.
Does the server that you want to back up need to be online 24x7? Another option you could try is to shut down the server that you want to back up and try neo4j-admin dump to at least get a dump file for the database.
For now, can you back up on the same system as the server just to make sure that the backup works locally? Then copy the backup files to a different system. This will at least enable you to back up your database without any interruption of service.
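The two steps suggested here (back up on the database host, then move the result elsewhere) could be scripted roughly as follows. This is a sketch under assumptions: the `--from`, `--backup-dir`, and `--name` options and the localhost:6362 address should be verified against the manual for your version, and both helper names are hypothetical. The first function only builds the command; the second packs a finished backup folder into one archive that is easier to copy.

```python
import shutil


def local_backup_command(backup_dir, name, address="localhost:6362"):
    """Build the argument list for a backup taken on the database host itself."""
    return [
        "neo4j-admin", "backup",
        "--from={}".format(address),
        "--backup-dir={}".format(backup_dir),
        "--name={}".format(name),
    ]


def archive_backup(backup_dir, name, archive_base):
    """Pack backup_dir/name into archive_base.tar.gz and return the archive path."""
    return shutil.make_archive(
        archive_base, "gztar", root_dir=backup_dir, base_dir=name
    )
```

The command list can be passed to `subprocess.run`, and the resulting .tar.gz copied to the other system with whatever transfer tool you already use.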
I can try to do it on the same system, but do you know if it's going to slow down my app / database drastically compared to a remote backup? And how can I ensure that doesn't happen? Thanks!