"Not a valid Neo4j archive"

Hello

I have set up two separate Neo4j servers, both running version 4.4.0.

I am using neo4j-admin dump on one server to create an archive, and then neo4j-admin load on the other, to transfer specific databases between them.

The tool fails with "Not a valid Neo4j archive" when attempting to load the data into the second instance. I have come across this "fix" but it does not work.

I have also tried to list the contents of the backup archive via the gzip tool and a custom Python script and both give the same error, leading me to believe that indeed, the neo4j-admin tool is producing an invalid file.

There is also the question of which compressor neo4j-admin actually uses: is it zstd, gzip, or something else?

If I try to list the files with either the gzip or zstd command-line utilities, I get the same kind of error from both ("No, this is not a gzip file", "No, this is not a zstd file").
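One way to narrow this down is to look at the first few bytes of the archive directly. The sketch below only checks for the standard gzip and zstd magic numbers; the function names are made up for illustration, and it assumes nothing about any extra header neo4j-admin may or may not write before the compressed stream.

```python
# Minimal sketch: guess a file's compression format from its leading bytes.
# sniff_compression and sniff_file are hypothetical helper names, not part of Neo4j.

GZIP_MAGIC = b"\x1f\x8b"          # standard gzip member header
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # standard zstd frame magic (0xFD2FB528, little-endian)


def sniff_compression(header):
    """Return 'gzip', 'zstd', or 'unknown' based on the leading bytes."""
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"


def sniff_file(path, n=8):
    """Read the first n bytes of a file; return (format guess, header as hex)."""
    with open(path, "rb") as f:
        header = f.read(n)
    return sniff_compression(header), header.hex()
```

Calling sniff_file("./backup") on the dump would show whether it starts with either magic number, and the hex string shows what is there instead if it reports "unknown".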

Is it possible to get some clarity on these issues?

(There are no logs from the servers because they are shut down in both cases during the whole backup-restore process, but here is what --verbose from neo4j-admin says:)

org.neo4j.cli.CommandFailedException: Not a valid Neo4j archive: ./backup
        at org.neo4j.commandline.dbms.LoadCommand.load(LoadCommand.java:155)
        at org.neo4j.commandline.dbms.LoadCommand.execute(LoadCommand.java:85)
        at org.neo4j.cli.AbstractCommand.call(AbstractCommand.java:60)
        at org.neo4j.cli.AbstractCommand.call(AbstractCommand.java:30)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1743)
        at picocli.CommandLine.access$900(CommandLine.java:145)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2101)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2068)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1935)
        at picocli.CommandLine.execute(CommandLine.java:1864)
        at org.neo4j.cli.AdminTool.execute(AdminTool.java:78)
        at org.neo4j.cli.AdminTool.main(AdminTool.java:59)
Caused by: org.neo4j.dbms.archive.IncorrectFormat: ./backup
        at org.neo4j.dbms.archive.Loader.openArchiveIn(Loader.java:172)
        at org.neo4j.dbms.archive.Loader.load(Loader.java:74)
        at org.neo4j.commandline.dbms.LoadCommand.load(LoadCommand.java:131)
        ... 11 more
Caused by: java.io.IOException: Decompression error: Unknown frame descriptor
        at com.github.luben.zstd.ZstdInputStream.readInternal(ZstdInputStream.java:147)
        at com.github.luben.zstd.ZstdInputStream.read(ZstdInputStream.java:107)
        at java.base/java.io.FilterInputStream.read(FilterInputStream.java:107)
        at org.neo4j.dbms.archive.CompressionFormat$2.decompress(CompressionFormat.java:79)
        at org.neo4j.dbms.archive.CompressionFormat.decompress(CompressionFormat.java:148)
        at org.neo4j.dbms.archive.CompressionFormat.decompress(CompressionFormat.java:125)
        at org.neo4j.dbms.archive.Loader.openArchiveIn(Loader.java:156)
        ... 13 more
        Suppressed: java.util.zip.ZipException: Not in GZIP format
                at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:166)
                at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:80)
                at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:92)
                at org.neo4j.dbms.archive.CompressionFormat$1.decompress(CompressionFormat.java:52)
                at org.neo4j.dbms.archive.CompressionFormat.decompress(CompressionFormat.java:148)
                at org.neo4j.dbms.archive.CompressionFormat.decompress(CompressionFormat.java:132)
                ... 14 more

Any ideas on this one?

All the best
AA

This does appear to be an issue with version differences.

I don't know whether it's a general rule that snapshots from newer versions won't load into older database versions, but here is what I've observed to work and not work:

  • snapshot from 4.4.5 does not work with 4.3.6
  • snapshot from 4.4.5 works fine with 4.4.7

I got this working by setting up a new GCP instance, not from the public image but from an empty VM. After various setup steps (add instance to firewall group, create "neo4j" user, download my snapshot from GCS, download JDK 11 from Archived OpenJDK GA Releases, install JDK, download Neo4j community 4.4.7), I was able to load the snapshot taken earlier from 4.4.5 community.

$ neo4j-admin load --database=neo4j --force --from=snapshot.dump
Selecting JVM - Version:11.0.2+9, Name:OpenJDK 64-Bit Server VM, Vendor:Oracle Corporation
Done: 46 files, 1.416GiB processed.

I'm running into this issue while attempting to load a snapshot (taken from Community 4.4.5 running on Mac) into a GCP instance which was created with the latest public image (for 4.3.6; specifically, this: "neo4j-community-1-4-3-6-apoc").

I've verified that the md5 sums are the same on the originating machine and on the GCP VM where the load attempt runs, using md5 on the Mac and md5sum on the GCP instance; both return "cbccec60523ddbae1a016a19afd3b785".
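For anyone who wants to repeat the checksum comparison with the same tool on both machines, here is a minimal Python sketch using only the standard library. The helper name md5_of_file is made up for illustration; the result should match what md5 / md5sum print for the same file.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that a multi-GiB dump does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running md5_of_file("snapshot.dump") on both sides and comparing the strings rules out corruption in transit, which is what the matching sums above already indicate.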

Here's what a load attempt looks like on the GCP VM:

$ whoami
neo4j

$ /usr/share/neo4j/bin/neo4j-admin load --database=neo4j --force --from=snapshot.dump
Selecting JVM - Version:11.0.15, Name:OpenJDK 64-Bit Server VM, Vendor:Private Build
Not a valid Neo4j archive: snapshot.dump

Perhaps this is due to version differences? I think my next step is to upgrade the GCP instance from 4.3.6 to 4.4.7 and try the load once more. That seems like a straightforward idea, though the pre-built Neo4j install is spread across several directories, which doesn't match the layout the upgrade steps assume (https://neo4j.com/docs/upgrade-migration-guide/current/upgrade/upgrade-4.4/deployment-upgrading/), so it might take a bit of exploration and trial and error.

$ ls /usr/share/neo4j/
bin data lib logs run tools

$ ls /var/lib/neo4j/
certificates conf data import labs licensing logs metrics plugins

$ ls /etc/neo4j
neo4j.conf pre-neo4j.sh

Any other suggestions to try?