Neo4j-admin import fails on specific ID duplicates

Hey there,

I ran into a problem while importing data with neo4j-admin import from CSV files that contain duplicate IDs.
The failure seems to depend on the length and/or the character set of the IDs. I ran several tests to find a pattern, but I can't see one. Maybe you can, or maybe there are other restrictions on IDs that I'm not aware of.

System:

neo4j 4.4.0 (Ubuntu Desktop AppImage)
VM Name: OpenJDK 64-Bit Server VM
VM Vendor: Azul Systems, Inc.
VM Version: 11.0.8+10-LTS
JIT compiler: HotSpot 64-Bit Tiered Compilers
VM Arguments: [-Xmx6291456k, -XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -XX:MaxInlineLevel=15, -XX:-UseBiasedLocking, -Djdk.nio.maxCachedBufferSize=262144, -Dio.netty.tryReflectionSetAccessible=true, -Djdk.tls.ephemeralDHKeySize=2048, -Djdk.tls.rejectClientInitiatedRenegotiation=true, -XX:FlightRecorderOptions=stackdepth=256, -XX:+UnlockDiagnosticVMOptions, -XX:+DebugNonSafepoints, -Dlog4j2.disable.jmx=true, -Dfile.encoding=UTF-8]

The command used for all tests:

./neo4j-admin import \
--verbose \
--skip-duplicate-nodes=true \
--nodes nodes.csv

The error that is thrown in all non-working cases:

Available resources:
  Total machine memory: 31.14GiB
  Free machine memory: 5.117GiB
  Max heap memory : 6.000GiB
  Processors: 12
  Configured max memory: 22.63GiB
  High-IO: true

Nodes, started 2021-12-16 07:55:24.521+0000
[*Nodes:?? 1.004GiB---------------------------------------------------------------------------]    0 ∆    0
Done in 28ms
Prepare node index, started 2021-12-16 07:55:24.556+0000
Critical error occurred! Shutting down the import...
[*RESOLVE (~2 collisions):1.004GiB------------------------------------------------------------]    0 ∆    0
Done in 61ms

IMPORT FAILED in 216ms. 
Data statistics is not available.
Peak memory usage: 1.004GiB
Import error: DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
Caused by:DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
org.neo4j.kernel.impl.store.InvalidRecordException: DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
        at org.neo4j.kernel.impl.store.record.RecordLoad.verify(RecordLoad.java:141)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.verifyAfterReading(CommonAbstractStore.java:1074)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.readRecordFromPage(CommonAbstractStore.java:897)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.readIntoRecord(CommonAbstractStore.java:850)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.getRecordByCursor(CommonAbstractStore.java:830)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.streamRecords(CommonAbstractStore.java:1003)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.getRecords(CommonAbstractStore.java:979)
        at org.neo4j.kernel.impl.store.PropertyStore.ensureHeavy(PropertyStore.java:300)
        at org.neo4j.kernel.impl.store.PropertyStore.getTextValueFor(PropertyStore.java:700)
        at org.neo4j.kernel.impl.store.PropertyType$9.value(PropertyType.java:129)
        at org.neo4j.kernel.impl.store.record.PropertyBlock.newPropertyValue(PropertyBlock.java:280)
        at org.neo4j.internal.batchimport.NodeInputIdPropertyLookup.lookupProperty(NodeInputIdPropertyLookup.java:59)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:527)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:263)
        at org.neo4j.internal.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
        at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:53)
        at java.base/java.lang.Thread.run(Thread.java:834)

WARNING Import failed. The store files in /home/dbr/.config/Neo4j Desktop/Application/relate-data/dbmss/dbms-49a06a75-bf53-4257-b193-a193c9e41556/data/databases/neo4j are left as they are, although they are likely in an unusable state. Starting a database on these store files will likely fail or observe inconsistent records so start at your own risk or delete the store manually
org.neo4j.kernel.impl.store.InvalidRecordException: DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
        at org.neo4j.kernel.impl.store.record.RecordLoad.verify(RecordLoad.java:141)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.verifyAfterReading(CommonAbstractStore.java:1074)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.readRecordFromPage(CommonAbstractStore.java:897)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.readIntoRecord(CommonAbstractStore.java:850)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.getRecordByCursor(CommonAbstractStore.java:830)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.streamRecords(CommonAbstractStore.java:1003)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.getRecords(CommonAbstractStore.java:979)
        at org.neo4j.kernel.impl.store.PropertyStore.ensureHeavy(PropertyStore.java:300)
        at org.neo4j.kernel.impl.store.PropertyStore.getTextValueFor(PropertyStore.java:700)
        at org.neo4j.kernel.impl.store.PropertyType$9.value(PropertyType.java:129)
        at org.neo4j.kernel.impl.store.record.PropertyBlock.newPropertyValue(PropertyBlock.java:280)
        at org.neo4j.internal.batchimport.NodeInputIdPropertyLookup.lookupProperty(NodeInputIdPropertyLookup.java:59)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:527)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:263)
        at org.neo4j.internal.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
        at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:53)
        at java.base/java.lang.Thread.run(Thread.java:834)

Test cases

As mentioned above, I tried to find a pattern, which led to some rather contrived test cases (test data):

Does not work (44 chars)

"id:ID","name",":LABEL"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr","Tom","Person"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr","Tina","Person"

Does work (43 chars)

"id:ID","name",":LABEL"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopq","Tom","Person"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopq","Tina","Person"

Does work (44 chars), so length alone is not the problem?

"id:ID","name",":LABEL"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does work (54 chars)

"id:ID","name",":LABEL"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does not work (55 chars)

"id:ID","name",":LABEL"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does work (54 chars) [a-f]

"id:ID","name",":LABEL"
"abcdefaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"abcdefaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does not work (54 chars) [a-g]

"id:ID","name",":LABEL"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does work (43 chars; 44 chars does not work) [a-g]

"id:ID","name",":LABEL"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"
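
In case anyone wants to probe other lengths and prefixes, here is a minimal Python sketch that produces test files in the same shape as the ones above. The helper names `make_id` and `build_nodes_csv` are made up for illustration and are not part of any Neo4j tooling. The sketch only generates CSV text; it does not run the import.

```python
import csv
import io


def make_id(prefix: str, length: int, fill: str = "a") -> str:
    """Pad `prefix` with `fill` until the ID is exactly `length` characters long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    if len(prefix) > length:
        raise ValueError("prefix is longer than the requested length")
    return prefix + fill * (length - len(prefix))


def build_nodes_csv(node_id: str, names=("Tom", "Tina"), label="Person") -> str:
    """Return CSV text in which every row uses the same :ID value (a duplicate)."""
    buf = io.StringIO()
    # QUOTE_ALL mirrors the fully quoted fields used in the test data above.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["id:ID", "name", ":LABEL"])
    for name in names:
        writer.writerow([node_id, name, label])
    return buf.getvalue()


if __name__ == "__main__":
    # Example: the [a-g] prefix at 44 characters, one of the failing cases above.
    print(build_nodes_csv(make_id("abcdefg", 44)), end="")
```

Writing the returned text to nodes.csv and rerunning the command from above for each length/prefix combination should make it easier to map out where the failures start.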

This looks like a bug related to the --skip-duplicate-nodes flag. Could you open an issue in the neo4j/neo4j repository on GitHub and link it back here?

Best,
ABK

GitHub issue: #12800
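
Until the issue is resolved, one possible workaround is to drop duplicate IDs from the CSV before importing, so the importer has no duplicates to resolve. This is an assumption on my part and has not been verified against the failing cases in this thread. The Python sketch below keeps the first row for each ID and skips later ones. The function name `drop_duplicate_ids` is made up for illustration, the `id:ID` header is taken from the test data above, and ID groups are not handled.

```python
import csv
import io


def drop_duplicate_ids(csv_text: str, id_column: str = "id:ID") -> str:
    """Return CSV text with only the first row kept for each value in `id_column`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    id_index = header.index(id_column)

    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)

    seen = set()  # all IDs are held in memory, so this suits moderately sized files
    for row in reader:
        if not row:
            continue  # skip blank lines
        node_id = row[id_index]
        if node_id in seen:
            continue  # a later duplicate: drop it
        seen.add(node_id)
        writer.writerow(row)
    return out.getvalue()
```

With the failing 44-character test file as input, only the "Tom" row would remain, and the deduplicated file could then be passed to --nodes in place of the original.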
