Stuck Remote Neo4j Backups

We've started seeing sporadic "stuck" backups when using the Java library:

  // https://mvnrepository.com/artifact/org.neo4j/neo4j-backup
  implementation "org.neo4j:neo4j-backup:3.4.9"

Targeting a Neo4j Enterprise 3.4.3 server, we use this snippet:

  @Override
  public void performBackup(BackupConfig config) {
    Neo4jBackupConfig backupConfig = (Neo4jBackupConfig) config;
    // withTimeout() and gatheringForensics() configure the OnlineBackup instance
    // and return it, so call them before backup(); chained after backup(), they
    // have no effect on the backup that has already run.
    OnlineBackup result = getInstance(backupConfig)
        .withTimeout(Neo4jConstants.TIMEOUT_MS)
        .gatheringForensics(Neo4jConstants.GATHER_FORENSICS)
        .backup(createBackupDirectory(backupConfig), Neo4jConstants.VERIFY_BACKUP);
    if (result == null) {
      throw new AssertionError("Backup failed. Please see attached log.");
    }
    if (!result.isConsistent()) {
      throw new AssertionError("Backup is inconsistent. Please see attached log.");
    }
  }

  // In Neo4jConstants:
  public static final Boolean VERIFY_BACKUP = Boolean.TRUE;
  public static final Boolean GATHER_FORENSICS = Boolean.TRUE;
  // 5-minute timeout
  public static final Long TIMEOUT_MS = 300000L;
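Because a stuck backup otherwise blocks the hourly job indefinitely, one option is to run the backup on a separate thread, bound the total wall-clock time (copy plus verification), and dump all thread stacks if it overruns, so there is something to inspect besides the progress bar. This is a minimal JDK-only sketch; the BackupWatchdog class and the Runnable standing in for performBackup are hypothetical. Cancelling the future only interrupts the worker thread, and we have not verified whether the Neo4j backup or consistency check reacts to interruption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical watchdog: bounds the wall-clock time of a backup task and
// dumps all thread stacks if the task overruns its budget.
public final class BackupWatchdog {

  /** Returns true if the task finished in time, false if it timed out. */
  public static boolean runWithDeadline(Runnable backupTask, long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "backup-worker");
      t.setDaemon(true); // don't keep the JVM alive if the task never returns
      return t;
    });
    try {
      Future<?> future = executor.submit(backupTask);
      try {
        future.get(timeout, unit); // task failures surface as ExecutionException
        return true;
      } catch (TimeoutException e) {
        dumpThreads();
        future.cancel(true); // interrupts the worker; the task may ignore it
        return false;
      }
    } finally {
      executor.shutdownNow();
    }
  }

  // Prints every live thread's state and stack to stderr.
  static void dumpThreads() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    ThreadInfo[] infos = mx.dumpAllThreads(
        mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
    for (ThreadInfo info : infos) {
      System.err.print(info); // ThreadInfo.toString() shows at most 8 frames
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for performBackup(config): returns immediately.
    boolean ok = runWithDeadline(() -> { }, 5, TimeUnit.SECONDS);
    System.out.println("finished in time: " + ok);
  }
}
```

Wired into the job, the call would look like runWithDeadline(() -> performBackup(config), 10, TimeUnit.MINUTES); the 10-minute budget is a placeholder and should comfortably cover both the store copy and the verification.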

All we see in the stdout logs is the output below; the consistency check seems to stop without any corresponding memory or CPU spike:

  2020-01-18 12:41:42.192+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore
  2020-01-18 12:41:42.192+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore 8.00 kB
  2020-01-18 12:41:42.192+0000 INFO [o.n.c.s.StoreCopyClient] Done, copied 711 files
  2020-01-18 12:41:52.332+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_4[v0.A.9] record format
  2020-01-18 12:41:52.332+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_4[v0.A.9]
  ....................  10%
  ....................  20%
  ....................  30%
  ....................  40%
  ....................  50%
  ....................  60%
  ....................  70%
  ....................  80%
  ....................  90%
  ...................Checking node and relationship counts
  ....................  10%
  ....................  20%
  ....................  30%
  ....................  40%
  ....................  50%
  ....................  60%
  ....................  70%
  ....................  80%
  ....................  90%
  .................... 100%

Hello Mike,

How large is your backup?
The consistency check needs a fair amount of memory; if the backup is large, it can be run at a later stage or in a different environment.

Can you retry with this:

  public static final Boolean VERIFY_BACKUP = Boolean.FALSE;

You can then run the check manually with the built-in consistency checker tool, neo4j-admin check-consistency.
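If verification is taken out of the hourly backup (VERIFY_BACKUP = false), the check can run as a separate step, possibly on another machine. In 3.x, neo4j-admin check-consistency accepts --backup=<path> to check a backup directory instead of a live database. Below is a rough sketch that shells out to it from Java; the OfflineConsistencyCheck class, the neo4j-admin location, and the backup path are placeholders, and treating a non-zero exit code as "failed or inconsistent" is an assumption worth confirming against the 3.4 operations manual.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs "neo4j-admin check-consistency" against a backup
// directory as a step separate from the backup itself.
public final class OfflineConsistencyCheck {

  // Builds the command line; kept separate so it can be logged or inspected.
  static List<String> command(Path neo4jAdmin, Path backupDir) {
    return Arrays.asList(
        neo4jAdmin.toString(),
        "check-consistency",
        "--backup=" + backupDir);
  }

  /** Returns true if the tool exited with status 0 within the time limit. */
  static boolean run(Path neo4jAdmin, Path backupDir, long timeoutMinutes)
      throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command(neo4jAdmin, backupDir))
        .inheritIO() // stream the checker's progress output to our stdout/stderr
        .start();
    if (!process.waitFor(timeoutMinutes, TimeUnit.MINUTES)) {
      process.destroyForcibly(); // give up on a check that appears stuck
      return false;
    }
    return process.exitValue() == 0; // assumption: non-zero means failure
  }

  public static void main(String[] args) {
    // Placeholder paths; adjust to your installation and backup layout.
    System.out.println(command(
        Paths.get("/opt/neo4j/bin/neo4j-admin"),
        Paths.get("/backups/graph.db-backup")));
  }
}
```

Running the check this way also makes its CPU and memory use visible as a separate process, which may help tell a throttled checker apart from a hung one.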

Kind regards,
J

Hi Jéremie,

On average the backup is around 100 MB, and the tarball is a bit smaller than that. We give each backup job 2 GB of memory and aren't seeing any memory issues or OOM errors. I'll try turning off the consistency checker; we run it on every hourly backup to verify the backup's integrity. Looking at the underlying infrastructure, this might be a compute problem: burstable CPUs getting throttled.

Does the consistency checker use a lot of CPU that we're not accounting for?

Thanks,
Mike

You are using version 3.4.9, which will be out of support in a couple of weeks.
On large graphs, this can be an issue:

Among many other things, this was addressed and improved in the next major version.
It might be worth having a look.

We're investigating the migration path to 4.0. Is that what you mean by the next major version, or is there a smaller hop within the neo4j-backup release train? As a sanity check, we run the consistency checker on every backup before tarballing it and shipping it to S3.