Neo4j admin incremental import into second DB within one DBMS

Hello all,

I'm running into an issue with the incremental import functionality of neo4j-admin in Neo4j 5.

I use the neo4j-admin tool for bulk data import into a Neo4j 5.5.0 DB.

The initial full import with the following command works like a charm, as it already did in the 4.x versions (the command differs slightly from 4.x, of course):

bin/neo4j-admin database import full --skip-bad-relationships --nodes=import/file1.csv --nodes=import/file2.csv ... --relationships=import/Relations.csv --ignore-empty-strings=true neo4j

for about 40 files in total. The initial import goes into the default neo4j DB.

Afterwards I followed the admin manual and created a second DB with:

CREATE DATABASE db0;

I activated this DB and created the required node property uniqueness constraints before running the import, as described in the manual section quoted further down.

After shutting the DB down, I ran the following incremental import from the terminal:

bin/neo4j-admin database import incremental --force --skip-bad-relationships --nodes=import/fileXYZ.csv --ignore-empty-strings=true db0

The header of this single file is extended as described in the manual:

Incremental import

When using incremental import, you must have node property uniqueness constraints in place for the property key and label combinations that form the primary key, or the uniquely identifiable nodes. For example, importing nodes with a Person label that are uniquely identified with a uuid property key, the format of the header should be uuid:ID{label:Person}.

The header of my first (ID) column looks like this: id:ID{label:myLabel_Name}

I created the node property uniqueness constraint described above for exactly this label.
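With around 40 labels, it may be easier to generate the constraint statements than to type them. The following is only a sketch: the helper name `uniqueness_constraint` and the `uniq_...` constraint naming are made up for illustration, and it assumes every label is keyed by a property called `id`, as in the header above. The statements it prints use the Neo4j 5 `CREATE CONSTRAINT ... REQUIRE ... IS UNIQUE` syntax and would be run against db0 before the DBMS is shut down for the import.

```python
import re


def uniqueness_constraint(label: str, prop: str = "id") -> str:
    """Return a Cypher statement for a node property uniqueness constraint."""
    # Hypothetical naming convention; the constraint name itself is arbitrary.
    name = re.sub(r"\W", "_", f"uniq_{label}_{prop}")
    # Backticks keep labels and property keys with unusual characters valid.
    safe_label = label.replace("`", "``")
    safe_prop = prop.replace("`", "``")
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:`{safe_label}`) REQUIRE n.`{safe_prop}` IS UNIQUE;"
    )


if __name__ == "__main__":
    # One statement per label that appears in an ID column header.
    for label in ["myLabel_Name"]:
        print(uniqueness_constraint(label))
```

The output can be pasted into Neo4j Browser or piped into cypher-shell.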

Here is my issue...

When I run the incremental import on the second DB within the DBMS for just one CSV file, it works perfectly fine. It creates the new nodes in the second (non-default) DB, and the run completes with no errors reported. All imported nodes are available in the second DB after starting the DBMS.

But when I run an incremental import command that covers more than one CSV file like this:

bin/neo4j-admin database import incremental --force --skip-bad-relationships --nodes=import/fileXYZ.csv --nodes=import/fileABC.csv --nodes=import/fileDEF.csv ... --relationships=import/RELATIONS.csv --ignore-empty-strings=true db0

I get the following error message:

Import error: Multiple indexes for group global id space
Caused by:Multiple indexes for group global id space
java.lang.IllegalStateException: Multiple indexes for group global id space
        at org.neo4j.util.Preconditions.checkState(Preconditions.java:181)
        at org.neo4j.internal.batchimport.input.csv.CsvInput.lambda$collectReferencedNodeSchemaFromHeader$12(CsvInput.java:417)
        at java.base/java.util.Optional.ifPresent(Optional.java:178)
        at org.neo4j.internal.batchimport.input.csv.CsvInput.collectReferencedNodeSchemaFromHeader(CsvInput.java:398)
        at org.neo4j.internal.batchimport.input.csv.CsvInput.referencedNodeSchema(CsvInput.java:384)
        at com.neo4j.internal.batchimport.ParallelIncrementalBatchImporter.prepare(ParallelIncrementalBatchImporter.java:343)
        at org.neo4j.importer.CsvImporter.doImport(CsvImporter.java:238)
        at org.neo4j.importer.CsvImporter.doImport(CsvImporter.java:182)
        at org.neo4j.importer.ImportCommand$Base.doExecute(ImportCommand.java:380)
        at org.neo4j.importer.ImportCommand$Incremental.execute(ImportCommand.java:532)
        at org.neo4j.cli.AbstractCommand.call(AbstractCommand.java:92)
        at org.neo4j.cli.AbstractCommand.call(AbstractCommand.java:37)
        at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
        at picocli.CommandLine.access$1500(CommandLine.java:148)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
        at picocli.CommandLine.execute(CommandLine.java:2170)
        at org.neo4j.cli.AdminTool.execute(AdminTool.java:94)
        at org.neo4j.cli.AdminTool.main(AdminTool.java:82)

Each CSV file carries nodes of a single label, and each file has a different label. The CSV file headers are defined exactly as described in the manual:

id:ID{label:"label matching the content of the file"}, ..., ...

and the matching index for each combination of id property and label is populated.

I think I'm missing something that is needed to make the incremental admin import handle multiple CSV files in one go.

Any ideas on how to get this going for multiple CSV files in one go are very welcome. I could run each CSV file through a separate import call, but for about 40 files that's a bit of work if it is needed on a regular basis.
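For reference, the one-call-per-file fallback could be scripted roughly as below. This is a sketch, not a tested solution: the function names are made up, the flags are copied unchanged from the single-file command above, and it covers node files only, since that is the only case I have seen working. The DB has to be shut down first, exactly as for the manual run.

```python
import subprocess

# Flags copied from the single-file incremental command in this post.
BASE_CMD = [
    "bin/neo4j-admin", "database", "import", "incremental",
    "--force", "--skip-bad-relationships", "--ignore-empty-strings=true",
]


def per_file_commands(node_files, database="db0"):
    """Build one incremental import command per node CSV file."""
    return [BASE_CMD + [f"--nodes={path}", database] for path in node_files]


def run_all(node_files, database="db0"):
    """Run the imports one after another and stop at the first failure."""
    for cmd in per_file_commands(node_files, database):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all(["import/fileXYZ.csv", "import/fileABC.csv"])` from the Neo4j home directory would then issue two separate imports into db0.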

Cheers

Krid

I found a (possible) workaround that helped in my case...

It appears to me that the incremental admin import has problems with multiple node CSV files that differ in structure and carry elements of different labels (mixed within the rows of the node CSV files, and mixed across start and end nodes in the relationship CSV file).

None of the options I tried worked (with or without grouped ID spaces, label information within the ID column, group-labeled ID column entries in the relationships CSV, etc.). I always ended up with more or less helpful error messages.

My solution to get this thing going:

I created one single node CSV file out of all my initially separate CSV files. Since each of the original files has a different column structure, I had to create as many unique columns as needed and leave the cells empty wherever a column does not apply to a row.

So I ended up with a huge, sparse, matrix-like structure within one CSV file.

The only requirement for going down this path is that each column has a single, unique type (which is the case for my import), so that the values are imported with the right type during the incremental import run.
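The merge step described above can be sketched like this. The function name `merge_node_csvs` and the default label `Foo` in the ID header are placeholders, not anything the importer prescribes. The sketch assumes that the first column of every input file is its ID column, that IDs do not collide across files, that equal header names always mean the same property with the same type, and that the label column is already present in the input files.

```python
import csv


def merge_node_csvs(in_paths, out_path, id_header="id:ID{label:Foo}"):
    """Merge node CSV files with different columns into one sparse CSV."""

    def read_header(reader):
        # Map each file's own ID column onto the shared ID header.
        header = next(reader, None)
        return None if header is None else [id_header] + header[1:]

    # Pass 1: collect the union of all column headers in first-seen order.
    columns = [id_header]
    for path in in_paths:
        with open(path, newline="", encoding="utf-8") as src:
            for name in read_header(csv.reader(src)) or []:
                if name not in columns:
                    columns.append(name)

    # Pass 2: stream the rows; columns a file does not have stay empty.
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns, restval="")
        writer.writeheader()
        for path in in_paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = read_header(reader)
                if header is None:
                    continue  # skip completely empty files
                for values in reader:
                    writer.writerow(dict(zip(header, values)))
    return columns
```

Reading every file twice keeps memory use flat, which matters more than speed for files of this size.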

Then I created one simple relationship csv file as defined for the admin import.

There is one more strange thing that needs to be done to get this whole thing flying. The incremental admin import requires "node property uniqueness constraints" to be set; otherwise it will not even start. But which constraint should you set if everything is now crammed into one single file with multiple labels?

The answer is: it doesn't matter. Just pick one, preferably for a label that is in your import, so you don't end up with an extra "chip" in the "Nodes" section in Neo4j Browser. But I noticed that in the end it makes no difference. You could even pick a name like "Foo" if you like. It appears that the import just checks that the constraint exists but doesn't do anything with it.

The only requirement is that the label for which the constraint is set is added to the first column header in the nodes.csv file, as id:ID{label:Foo}.

As long as the :LABEL column cell in the nodes CSV is set to whatever label applies to the specific row, the correct label is set in Neo4j during the import.

One more thing: set "--ignore-empty-strings=true" so that the empty cells of the sparse "matrix" are not imported as properties.

Then the whole thing worked as expected and incrementally imported all the CSV data into an existing Neo4j DB.

It is definitely strange behaviour. Maybe someone from Neo4j can shed some light on this issue. But for the time being I'm happy to be able to import my data into an existing DB using the incremental admin import.

Krid