Neo4j Consistency Check not terminating

Hi,

We are using Neo4j 3.4 Enterprise and perform online backups every 2 days.

Our data set has grown to 600 GB. Backups now take 4-6 hours, which is acceptable, but the consistency check does not terminate even after the process has been running for more than 2 days.

As the verbose log shows, there is no progress after the output below, although the process is still running:

2018-09-10 22:00:26.081+0000 INFO [o.n.c.ConsistencyCheckService] Counts:
  6071569156 skipCheck
  2165146025 missCheck
  4510949640 checked
  6071569156 correctSkipCheck
  959792671 skipBackup
  3660483406 overwrite
  60004953 activeCache
  60004953 clearCache
  996949595 relSourcePrevCheck
  910714774 relSourceNextCheck
  1722194037 relTargetPrevCheck
  881091234 relTargetNextCheck
  4617231658 forwardLinks
  5313816270 backLinks
  651470868 nullLinks
2018-09-10 22:00:26.084+0000 INFO [o.n.c.ConsistencyCheckService] Memory[used:414.96 MB, free:186.04 MB, total:601.00 MB, max:3.38 GB]
2018-09-10 22:00:26.084+0000 INFO [o.n.c.ConsistencyCheckService] Done in  11h 1m 11s 115ms
2018-09-10 22:35:43.578+0000 INFO [o.n.c.ConsistencyCheckService] === RelationshipGroupStore-RelGrp ===
2018-09-10 22:35:43.579+0000 INFO [o.n.c.ConsistencyCheckService] I/Os
RelationshipStore
  Reads: 36096769
  Random Reads: 34901868
  ScatterIndex: 96
NodeStore
  Reads: 28126427
  Random Reads: 25762015
  ScatterIndex: 91
RelationshipGroupStore
  Reads: 24644914
  Random Reads: 12712054
  ScatterIndex: 51

2018-09-10 22:35:43.581+0000 INFO [o.n.c.ConsistencyCheckService] Counts:
2018-09-10 22:35:43.584+0000 INFO [o.n.c.ConsistencyCheckService] Memory[used:395.51 MB, free:205.49 MB, total:601.00 MB, max:3.38 GB]
2018-09-10 22:35:43.585+0000 INFO [o.n.c.ConsistencyCheckService] Done in  35m 17s 501ms
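The I/O counters in this log can be summarised per store. For all three stores shown, the reported ScatterIndex equals the share of reads that were random, rounded down (for example 34901868 / 36096769 is about 96.7% for RelationshipStore, reported as 96). That is our reading of these numbers, not documented Neo4j behaviour, but it suggests almost all relationship and node store reads were random seeks. A minimal sketch, assuming the log keeps the layout shown above:

```shell
#!/usr/bin/env bash
# Summarise the "I/Os" section of a verbose check-consistency log.
# Assumption: each store name sits on its own line (e.g. "NodeStore"),
# followed by "Reads:" and "Random Reads:" lines, as in the log above.
io_summary() {
  awk '
    /^[A-Za-z]+Store[ \t]*$/              { store = $1 }
    /^[ \t]*Reads:/ && store              { reads[store] = $2 }
    /^[ \t]*Random Reads:/ && store       {
      # Percentage of reads that were random, rounded down.
      pct = (reads[store] + 0 > 0) ? int(100 * $3 / reads[store]) : 0
      printf "%s reads=%s random=%s random%%=%d\n", store, reads[store], $3, pct
      store = ""
    }
  ' "$@"
}
```

Run it as io_summary "$LOG_PATH", or pipe a log excerpt into it on stdin.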


We run our backups on an AWS m5.xlarge instance that is dedicated to backups.

These are the commands we are using:

"$NEO4J_ADMIN_PATH" backup --from="$NEO4J_DB" --backup-dir="$BACKUP_DIR" --name="$BACKUP_NAME" --fallback-to-full=true --check-consistency=false --pagecache=4G >> "$LOG_PATH"

echo "$(date +"%m.%d.%Y %H:%M:%S") INFO: Starting consistency check" >> "$LOG_PATH"

"$NEO4J_ADMIN_PATH" check-consistency --backup "$BACKUP_DIR/$BACKUP_NAME" --verbose true >> "$LOG_PATH"

Could you please provide some insight into why our backups, and specifically the consistency check, take so long?
Are there any additional configurations we could use to speed it up?

Thanks!

Consistency checking on a huge graph takes a lot of time. You can speed it up if your page cache is larger than your graph; otherwise the consistency checker performs a lot of seek operations on your disk. Locally attached SSDs help as well. AWS i3 instances seem to offer this (see the AWS announcement "Now Available: Amazon EC2 I3 Instances, next-generation Storage Optimized High I/O instances"); I've never used them myself.
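On an m5.xlarge (16 GiB of RAM) a 600 GB store cannot fit in the page cache, so in practice the goal is to give the check as much page cache as the machine can spare. The sketch below measures the backup, reads total RAM, picks the smaller of "whole store" and "RAM minus headroom", and writes dbms.pagecache.memory into a file passed via --additional-config. Assumptions: Linux and GNU du, that your check-consistency version honours dbms.pagecache.memory from that file (verify in the manual), and a HEADROOM_MIB of 5 GiB for the JVM heap (about 3.4 GB max in the log above) plus the OS, which is a guess, not a Neo4j default.

```shell
#!/usr/bin/env bash
# Sketch: give check-consistency as much page cache as this box can spare.
# Assumptions: Linux (/proc/meminfo), GNU du (-b = apparent size in bytes),
# and check-consistency honouring dbms.pagecache.memory from the file
# passed with --additional-config. HEADROOM_MIB is a guess, not a Neo4j value.
HEADROOM_MIB=${HEADROOM_MIB:-5120}

# Size of a file or directory tree in MiB, rounded up.
store_size_mib() {
  local bytes
  bytes=$(du -sb "$1" | cut -f1)
  echo $(( (bytes + 1048575) / 1048576 ))
}

# Total physical RAM in MiB.
total_ram_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# The whole store if it fits next to the headroom, otherwise what is left.
pagecache_mib() {
  local store_mib=$1 ram_mib=$2 headroom_mib=$3
  local avail=$(( ram_mib - headroom_mib ))
  if (( store_mib < avail )); then echo "$store_mib"; else echo "$avail"; fi
}

# Write the one-line config file consumed via --additional-config.
write_cc_config() {
  printf 'dbms.pagecache.memory=%sm\n' "$2" > "$1"
}

# Run the check only when the variables from the backup script are set.
if [[ -n "${NEO4J_ADMIN_PATH:-}" && -n "${BACKUP_DIR:-}" \
      && -n "${BACKUP_NAME:-}" && -n "${LOG_PATH:-}" ]]; then
  backup="$BACKUP_DIR/$BACKUP_NAME"
  mib=$(pagecache_mib "$(store_size_mib "$backup")" "$(total_ram_mib)" "$HEADROOM_MIB")
  conf=$(mktemp)
  write_cc_config "$conf" "$mib"
  "$NEO4J_ADMIN_PATH" check-consistency --backup="$backup" \
    --additional-config="$conf" --verbose=true >> "$LOG_PATH"
  rm -f "$conf"
fi
```

On a 16 GiB machine this comes out at around 10 GiB, still far below 600 GB, so many reads will still go to disk; faster local storage addresses the other half of the problem.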

You can also fine-tune which checks the consistency checker performs; see the --cc-* options at https://neo4j.com/docs/operations-manual/current/backup/perform-backup/
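For the standalone command used in the question, Neo4j 3.4's check-consistency takes --check-graph, --check-indexes, --check-label-scan-store and --check-property-owners, which mirror the backup command's --cc-* options. The sketch below builds flag lists for two hypothetical profiles: "quick" (graph store only) and "full" (everything enabled). The profile names are illustrative; verify the flag names and defaults against the manual for your exact version. A quick run trades coverage for time, since the skipped parts are simply not verified.

```shell
#!/usr/bin/env bash
# Sketch: print check-consistency flags, one per line, for a profile.
# "quick" and "full" are hypothetical profile names for this script.
cc_check_flags() {
  case "$1" in
    quick) # graph store only; skip index and label scan store checks
      printf '%s\n' --check-graph=true --check-indexes=false \
        --check-label-scan-store=false --check-property-owners=false ;;
    full)  # enable every check, including property owners
      printf '%s\n' --check-graph=true --check-indexes=true \
        --check-label-scan-store=true --check-property-owners=true ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Run the check only when the variables from the backup script are set.
if [[ -n "${NEO4J_ADMIN_PATH:-}" && -n "${BACKUP_DIR:-}" \
      && -n "${BACKUP_NAME:-}" && -n "${LOG_PATH:-}" ]]; then
  mapfile -t flags < <(cc_check_flags "${CC_PROFILE:-quick}")
  "$NEO4J_ADMIN_PATH" check-consistency --backup="$BACKUP_DIR/$BACKUP_NAME" \
    "${flags[@]}" --verbose=true >> "$LOG_PATH"
fi
```

Reading the flags into an array with mapfile keeps each flag a separate argument without relying on word splitting.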

Another strategy is to skip the consistency check during the regular backups and run it separately, e.g. only once per week.
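The backups in the question run every 2 days, so a fixed weekday would only line up with a backup every other week. A minimal sketch that instead runs the check when the last successful one is at least CC_MAX_AGE_DAYS old, tracked with a marker file, gives a roughly weekly check. CC_MARKER and CC_MAX_AGE_DAYS are hypothetical names for this script, not Neo4j settings, and it assumes GNU stat and that check-consistency exits non-zero when it fails.

```shell
#!/usr/bin/env bash
# Sketch: back up on every run, but run the consistency check only when the
# last successful check is at least CC_MAX_AGE_DAYS old. CC_MARKER and
# CC_MAX_AGE_DAYS are hypothetical names, not Neo4j settings.
# Assumes GNU stat (-c %Y = modification time in epoch seconds).
CC_MARKER=${CC_MARKER:-/var/tmp/neo4j-last-cc}
CC_MAX_AGE_DAYS=${CC_MAX_AGE_DAYS:-6}

# Succeeds if no check has been recorded yet, or the last one is too old.
cc_due() {
  local marker=$1 max_days=$2 age_s
  if [[ ! -e "$marker" ]]; then return 0; fi
  age_s=$(( $(date +%s) - $(stat -c %Y "$marker") ))
  (( age_s >= max_days * 86400 ))
}

if [[ -n "${NEO4J_ADMIN_PATH:-}" && -n "${NEO4J_DB:-}" && -n "${BACKUP_DIR:-}" \
      && -n "${BACKUP_NAME:-}" && -n "${LOG_PATH:-}" ]]; then
  "$NEO4J_ADMIN_PATH" backup --from="$NEO4J_DB" --backup-dir="$BACKUP_DIR" \
    --name="$BACKUP_NAME" --fallback-to-full=true --check-consistency=false \
    --pagecache=4G >> "$LOG_PATH"
  if cc_due "$CC_MARKER" "$CC_MAX_AGE_DAYS"; then
    # Touch the marker only if the check succeeds (assumed: non-zero exit on failure).
    "$NEO4J_ADMIN_PATH" check-consistency --backup="$BACKUP_DIR/$BACKUP_NAME" \
      --verbose=true >> "$LOG_PATH" && touch "$CC_MARKER"
  fi
fi
```

Because the marker is touched only after a successful check, a check that was killed or failed is attempted again on the next backup.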


Hey Stefan,

Thanks for the pointers.

In which scenarios does the consistency check fail?

Does the page cache setting apply only to the consistency check, or does it also affect the backup step itself?

Thanks!

I'm not 100% sure here, but I assume running a backup without consistency checking won't require a large page cache: you're just dumping the files to disk.

If you see the consistency checker getting stuck, you can grab a thread dump using jstack <pid> or kill -3 <pid>.
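A single dump shows where threads are; several dumps spaced a minute apart show whether they are stuck in the same frames. With kill -3 the JVM prints the dump to its own standard output, which in the backup script above is appended to $LOG_PATH; jstack prints to its caller instead, so the sketch below uses it. Assumptions: jstack from the JDK running Neo4j is on the PATH and is run as the same user as the JVM, the JVM command line contains "check-consistency", and DUMP_CMD is a hypothetical override (handy for testing).

```shell
#!/usr/bin/env bash
# Sketch: take several thread dumps of a running consistency check, spaced
# out in time, to see whether threads stay in the same frames.
# Assumptions: jstack is on the PATH and runs as the JVM's user; the JVM
# command line contains "check-consistency". DUMP_CMD is a hypothetical override.
DUMP_CMD=${DUMP_CMD:-jstack}

# PID of the newest process whose command line mentions check-consistency.
find_cc_pid() {
  pgrep -n -f 'check-consistency'
}

# take_dumps PID COUNT INTERVAL_SECONDS OUT_DIR
take_dumps() {
  local pid=$1 count=$2 interval=$3 out=$4 i
  mkdir -p "$out" || return 1
  for (( i = 1; i <= count; i++ )); do
    "$DUMP_CMD" "$pid" > "$out/threads-$pid-$i.txt" || return 1
    if (( i < count )); then sleep "$interval"; fi
  done
}

if [[ "${1:-}" == "run" ]]; then
  pid=$(find_cc_pid) || { echo "no check-consistency process found" >&2; exit 1; }
  take_dumps "$pid" 3 60 ./thread-dumps
fi
```

Invoke it with the argument run while the check is in progress; pgrep -n picks the newest match, which is the JVM if a wrapper shell also matches.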