Inconsistent database after neo4j-admin import in docker

cuenta_chou · November 12, 2020, 1:30pm

Dear all,
I am using Neo4j v4.1.3 on both Windows and Docker (I have tried both the Community and Enterprise editions), and I have the following issue:

If I execute neo4j-admin import with a set of CSV files (around 20 different files containing nodes and 1 with relationships, 8M nodes and 6M relationships in total) on my Windows machine, everything works fine and the data gets imported into a new database.
But when I execute the exact same command with the exact same CSV files on the Docker version, the import process finishes without errors but the data gets corrupted.
If I run neo4j-admin check-consistency after the import, I get tons of errors saying that the next property did not have this property as its previous record, and many other errors that prevent me from using the database.
The same product version, the same CSV files and the same command produce two completely different outcomes when run on two different platforms (Windows and Linux).
Any ideas?
Thanks

Joel · November 13, 2020, 9:40pm

Could you elaborate on the setup? Are you using the neo4j-admin inside Docker to create a database, or another copy somewhere else (e.g. installed on the host)? I use a multi-step process to avoid installing any Neo4j components on the host. I didn't see you mention that, so I guess you are using a separate Neo4j installation. Could you double-check that the versions are the same for both Docker and the neo4j-admin you are using?

cuenta_chou · November 23, 2020, 3:12pm

Hi,
Thanks for your message.
I am using neo4j-admin inside docker, that's right. This is my setup:
- I have a Docker container running the latest Enterprise image (v4.2.0), persisting the data, plugin and import folders to an external volume.
- I copy the CSV files into the import folder.
- I connect to the Docker container and run the neo4j-admin import command directly there (is there any other way of executing this command?).

The neo4j-admin version is the one contained in the Neo4j Docker image; both are v4.2.0.

The import process returns no errors. I am using the skip-duplicate-nodes and skip-bad-relationships parameters both on Windows and in Docker. My Windows machine has more resources available (32 GB RAM and 4 cores), while my Docker instance has 5 GB RAM and 2 cores.

So, after having run the exact same command on both servers, the output is the same and no critical errors are reported. But when I run the neo4j-admin check-consistency command, I get many critical errors on the docker instance reporting existing relationships with either source or destination node missing.
Any clues?
Many thanks.

Joel · November 23, 2020, 5:24pm

I've never used the consistency check; I guess it is primarily for checking backups?

Below is a snippet from my script that creates a Neo4j database with the utility inside Docker...
Note: EPHEMERALCONTAINER is not equal to CONTAINER, since one cannot use neo4j-admin import on the running database that Neo4j Docker immediately creates when it starts up.

docker exec -t \
  "${EPHEMERALCONTAINER}" bin/neo4j-admin import \
  --database="${CONTAINER}" \
  --verbose \
  --skip-bad-relationships=true \
  --skip-duplicate-nodes=true \
  --ignore-empty-strings=true \
  --normalize-types=true \
  --trim-strings=false \
  --delimiter "\t" \
  ...
  ...

I just tried to run a consistency check on my database right after creation like this (snippet below), but it throws an error: something about the index files not existing, and they probably don't yet, since this newly created database has never been in the "running" state. Now I have a bit of a catch-22. What I normally do is shut down the ephemeral Neo4j Docker instance and then start up the actual one (using the database just created), which would probably create the remaining files required. But one must run

neo4j stop

in order to run a consistency check, but docker will immediately exit if I do that.

How are you able to run the consistency check? Do you start up the database, then stop it, and use another separate Docker instance?

cuenta_chou · November 24, 2020, 6:53am

Hi, thanks for your prompt response.
Yes, this is the only way I have found to check the consistency. First I run the import to create the new database, then I start the database so that the indices are created as well, then I stop it, and finally I run the consistency check.
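
The start, stop, and check sequence described above could be scripted roughly as follows. This is only a sketch: the container name, volume name, image tag, and database name are placeholder assumptions, and it assumes the check is run from a one-off container that mounts the same data volume, so that the database is offline while it is checked. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: run the consistency check from a one-off container once the
# database container is stopped. All names below are hypothetical.
CONTAINER="${CONTAINER:-neo4j-main}"      # assumed name of the database container
DATA_VOLUME="${DATA_VOLUME:-neo4j_data}"  # assumed volume mounted at /data
IMAGE="${IMAGE:-neo4j:4.2.0}"             # assumed image tag
DATABASE="${DATABASE:-neo4j}"             # assumed database name

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

check_after_import() {
  # 1. Start the container once so the index files get created
  #    (in practice, wait until Neo4j is fully up before stopping it).
  run docker start "$CONTAINER"
  # 2. Stop it again; the check needs the database to be offline.
  run docker stop "$CONTAINER"
  # 3. Check the store from a throwaway container on the same volume.
  run docker run --rm --volume "$DATA_VOLUME:/data" "$IMAGE" \
    neo4j-admin check-consistency --database="$DATABASE"
}
```

Calling check_after_import with DRY_RUN=1 first is a cheap way to review the three commands. With the Enterprise image, the license acceptance environment variable would presumably also need to be passed to docker run.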
I have just checked: I am using the skip-duplicate-nodes and skip-bad-relationships options. I am not using ignore-empty-strings or any of the other ones. The relevant issue is that the exact same command run on Windows creates a consistent database, while the Docker version seems to introduce inconsistencies that corrupt the database.
I have double-checked the import.report that gets created with the import command and both in Windows and docker the number of reported errors while importing is exactly the same, so I'm guessing that the number of skipped records is the same.
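
Equal error counts do not guarantee that the same records were skipped, so comparing the two reports line by line may be more telling. A minimal sketch, assuming both reports have been copied to the same machine (the file names in the usage note are hypothetical); it ignores CR/LF differences and line order:

```shell
#!/bin/sh
# Sketch: compare two import reports line by line, ignoring CR/LF
# differences and line order.

# Remove carriage returns and sort so both reports are comparable.
normalize_report() {
  tr -d '\r' < "$1" | sort
}

# Print the number of lines that appear in only one of the two reports.
count_report_differences() {
  left="$(mktemp)"
  right="$(mktemp)"
  normalize_report "$1" > "$left"
  normalize_report "$2" > "$right"
  # comm -3 hides the lines common to both sorted inputs.
  comm -3 "$left" "$right" | wc -l | tr -d ' '
  rm -f "$left" "$right"
}
```

For example, count_report_differences import.report.windows import.report.docker prints 0 when both imports reported exactly the same lines.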
The main errors I keep on having on docker are:
-The source node is not in use
-The target node is not in use

Any clues?
Thanks!

Joel · November 24, 2020, 2:53pm

I'd like to quickly double-check something: when you start the database, have you looked inside to check it yourself? And have you checked that the import report file contains no errors? I drop my import report in the mounted /import folder with

--report-file=importreport.txt

Manually run queries to check that the node and relationship counts are correct and spot-check some example properties. Then, while the database is still running, check the Docker logs:

docker logs containername

I'm going to guess the database is fine?

cuenta_chou · November 26, 2020, 12:40pm

Hi,
If I start the database, everything works until I try to query some of the affected links. I am using a tool called Linkurious that consumes the database, and one of the first things to do after setting up the connection is to index the database. It is when I try to create this index that I get lots of errors and the whole process fails.
So... the database seems to be partially corrupted, and as long as you don't touch any of the affected nodes/links, there are no errors. The moment you hit one of the corrupted links/nodes, you get the errors. If you run the check-consistency command, you get a long list of errors.
My feeling is that, for some reason, the Docker version is interpreting some special characters in the CSV files slightly differently from the Windows one, so it does not skip bad relationships in the same way and ends up creating some of them.
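
One way to test that suspicion would be to look for bytes in the CSV files that the two platforms might treat differently. The sketch below only checks for carriage returns and a UTF-8 byte order mark; whether either is actually involved here is an assumption, not something established in this thread.

```shell
#!/bin/sh
# Sketch: flag CSV files that contain carriage returns or start with a
# UTF-8 byte order mark.

# Succeeds if the file contains at least one carriage return.
has_carriage_return() {
  cr="$(printf '\r')"
  grep -q "$cr" "$1"
}

# Succeeds if the file starts with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom() {
  first_bytes="$(od -An -tx1 -N3 "$1" | tr -d ' \n')"
  [ "$first_bytes" = "efbbbf" ]
}

# Print one line per finding for every file given.
scan_csv_files() {
  for f in "$@"; do
    if has_carriage_return "$f"; then echo "$f: contains CR characters"; fi
    if has_utf8_bom "$f"; then echo "$f: starts with UTF-8 BOM"; fi
  done
}
```

Running scan_csv_files on the files in the import folder prints nothing when no such bytes are found.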
I will try again using the parameters you suggested in your previous comment: --ignore-empty-strings and --trim-strings.
Thanks

Joel · November 29, 2020, 3:04am

It is possible there is a rare bug, but in my experience (with a lot of different types of dirty data), every time I had an issue with the data, neo4j-admin import threw errors either back to the console or into the import log. You may have done this already, but at times I've had to go through each step one at a time to make sure I don't miss any error messages. For good measure, I'd also check docker logs containername and the Neo4j logs inside the container (right after the import command). I map the logs folder to a volume, but if you haven't, you can bash into the container to check them like this:

docker exec -it containername bash
I have no name!@8cb123ebc39a:/var/lib/neo4j$ cd logs
I have no name!@8cb123ebc39a:/var/lib/neo4j/logs$ ls
debug.log  security.log
cat debug.log
...
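
Rather than reading the whole of debug.log, the interesting lines could be filtered out. A small sketch, assuming the log levels appear as the whole words ERROR and WARN in each line:

```shell
#!/bin/sh
# Sketch: list ERROR and WARN lines from a Neo4j log file.
# Assumes the level names appear as whole words in each log line.

# Print matching lines, each prefixed with its line number.
show_log_problems() {
  grep -n -w -E 'ERROR|WARN' "$1"
}

# Print how many lines mention ERROR or WARN.
count_log_problems() {
  grep -c -w -E 'ERROR|WARN' "$1"
}
```

Inside the container this would be called as, for example, show_log_problems /var/lib/neo4j/logs/debug.log.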

cuenta_chou · November 30, 2020, 3:03pm

Hi again,
I tried with the suggested options enabled:

--ignore-empty-strings=true
--normalize-types=true
--trim-strings=false

and this seems to have solved the issue. Why the Windows version worked fine without these parameters seems to be another discussion...
Many thanks for all the support!
Regards.
