Apoc.export consistency

apoc

(Yves Renard) #1

Hello !

I am using neo4j 3.5.0.

I am exporting my whole database with the following APOC command :
CALL apoc.export.cypher.all('/tmp/export.cypher',{format:'cypher-shell'});

I'm wondering about the consistency of this export: if a node or a link is created or modified in my database during the apoc.export.cypher.all call, will this creation or modification be included in my export?

(=> If the apoc.export includes modifications that have occurred during the export, it means that apoc.export is not consistent and that I have to close access to my database during the export.)

I'm unable to find this answer by myself. Any answer/help would be greatly appreciated !

Thanks in advance,

Yves.


(M. David Allen) #2

The short answer is no, or at least "not necessarily".

Here's why. First, you can see the implementation of how APOC does the export here: https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/3.4/src/main/java/apoc/export/cypher/MultiStatementCypherSubGraphExporter.java#L59

Neo4j clusters follow a model called "causal consistency" which you can read about here: https://neo4j.com/docs/operations-manual/current/clustering/introduction/#causal-consistency-explained

The short short version of that is that when you start a read transaction (and an export, or series of batches in an export would qualify) you are guaranteed to read all writes you caused, as well as all writes committed as of the bookmark point where you're querying.

But imagine this scenario: you have a large DB, and the export takes 5 minutes. You launch the export at time t0. 2 minutes later, a write comes in. Does that change show up in the export? The answer is probably not, because it occurred after the read transaction began. I didn't look too deeply under the covers, though. It's possible that the underlying implementation uses multiple read transactions, in which case that write might be included, but it would depend on timing and you should not assume that it will be.
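To make the "depends on timing" point concrete, here is a toy sketch in Python. It is not how APOC or Neo4j work internally; it just assumes a hypothetical export that reads a key-value store in batches, with one write landing between two batches. The function and variable names are invented for illustration.

```python
def export_in_batches(store, batch_size, write_at_batch, write):
    """Copy `store` batch by batch (toy model, not APOC's implementation).

    `write` is a (key, value) pair applied to the live store just before
    the batch with index `write_at_batch` is read.
    """
    keys = sorted(store)
    exported = {}
    for batch_no, start in enumerate(range(0, len(keys), batch_size)):
        if batch_no == write_at_batch:
            key, value = write
            store[key] = value  # concurrent modification arrives mid-export
        for k in keys[start:start + batch_size]:
            exported[k] = store[k]
    return exported


def make_store():
    # Six records, all in their "old" state before the export starts.
    return {i: "old" for i in range(6)}


# The write touches key 5, which has not been read yet: it ends up in the export.
late_key = export_in_batches(make_store(), 2, 1, (5, "new"))
# The write touches key 0, which was already read in batch 0: it is missed.
early_key = export_in_batches(make_store(), 2, 1, (0, "new"))

print(late_key[5], early_key[0])  # prints: new old
```

The same write, arriving at the same moment, is either included or missed depending only on whether the exporter had already passed that record.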

If you need this level of consistency, then what I think you're looking for is online backups, paired with an incremental backup strategy:

https://neo4j.com/docs/operations-manual/current/backup/performing/

Note that probably no approach will guarantee that every single write that occurred during the export is included. If it did, you could have the possibility of a backup that might never terminate. Imagine this: on a busy transactional system, you start an export. While it's running 10 new TXs come in. So you begin to export those. But while you're doing that, another 20 come in! Now you're falling behind, and if you insist on exporting everything, you may never terminate.
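The arithmetic behind that argument can be sketched with a small simulation. This is only an illustration with made-up rates, not a model of any real Neo4j tool: a hypothetical exporter handles a fixed number of transactions per tick while new ones keep arriving.

```python
def export_chasing_writes(initial_backlog, export_per_tick, arrivals_per_tick, max_ticks):
    """Return the tick at which the backlog empties, or None if it does not
    empty within max_ticks. New writes are added to the backlog every tick."""
    backlog = initial_backlog
    for tick in range(1, max_ticks + 1):
        backlog = max(0, backlog - export_per_tick)
        if backlog == 0:
            return tick
        backlog += arrivals_per_tick  # writes that came in during this tick
    return None


def export_with_cutoff(initial_backlog, export_per_tick):
    """Ticks needed when writes arriving after the start are ignored
    (ceiling division of backlog by export rate)."""
    return -(-initial_backlog // export_per_tick)


# Writes arrive faster than they are exported: the export never catches up.
print(export_chasing_writes(100, 10, 20, 1000))  # prints: None
# Writes arrive slower than they are exported: it terminates, but later.
print(export_chasing_writes(100, 10, 5, 1000))   # prints: 19
# Fixed cutoff at start time: always terminates.
print(export_with_cutoff(100, 10))               # prints: 10
```

This is why a backup or export has to pick a cutoff point: only a fixed cutoff gives a run time that does not depend on how busy the system is.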