
Neo4j-admin import fails on specific ID duplicates

Hey there,

I ran into a problem while importing data with neo4j-admin import from CSV files that contain duplicate IDs.
It seems to be related to the length and/or the character set of the IDs used. I ran several tests to find a specific pattern, but I can't see one. Maybe you can, or maybe there are other restrictions on IDs that I'm not aware of.

System:

Neo4j 4.4.0 (Neo4j Desktop AppImage on Ubuntu)
VM Name: OpenJDK 64-Bit Server VM
VM Vendor: Azul Systems, Inc.
VM Version: 11.0.8+10-LTS
JIT compiler: HotSpot 64-Bit Tiered Compilers
VM Arguments: [-Xmx6291456k, -XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -XX:MaxInlineLevel=15, -XX:-UseBiasedLocking, -Djdk.nio.maxCachedBufferSize=262144, -Dio.netty.tryReflectionSetAccessible=true, -Djdk.tls.ephemeralDHKeySize=2048, -Djdk.tls.rejectClientInitiatedRenegotiation=true, -XX:FlightRecorderOptions=stackdepth=256, -XX:+UnlockDiagnosticVMOptions, -XX:+DebugNonSafepoints, -Dlog4j2.disable.jmx=true, -Dfile.encoding=UTF-8]

The command used for all tests:

./neo4j-admin import \
--verbose \
--skip-duplicate-nodes=true \
--nodes nodes.csv
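To rerun that same command against each test file in turn, a small Python harness could look like the sketch below. The helper names and the one-CSV-per-test directory layout are hypothetical; only the `neo4j-admin import` flags come from the command above. It assumes it is run from the directory containing `neo4j-admin`, that `neo4j-admin` exits with a non-zero status when the import fails, and that the target database is emptied between runs (which this sketch does not handle).

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_import_command(nodes_csv: Path, admin: str = "./neo4j-admin") -> List[str]:
    """Argument list for the import command used in all tests (hypothetical helper)."""
    return [admin, "import", "--verbose", "--skip-duplicate-nodes=true", "--nodes", str(nodes_csv)]


def import_succeeds(nodes_csv: Path) -> bool:
    """Run one import; assumes neo4j-admin exits non-zero when the import fails."""
    result = subprocess.run(build_import_command(nodes_csv), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python run_tests.py <directory with one CSV per test case>
    for csv_file in sorted(Path(sys.argv[1]).glob("*.csv")):
        print(f"{csv_file.name}: {'works' if import_succeeds(csv_file) else 'FAILS'}")
```

Passing the command as an argument list rather than a shell string avoids quoting problems with paths such as the Neo4j Desktop one, which contains a space.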

The output in all failing cases:

Available resources:
  Total machine memory: 31.14GiB
  Free machine memory: 5.117GiB
  Max heap memory : 6.000GiB
  Processors: 12
  Configured max memory: 22.63GiB
  High-IO: true

Nodes, started 2021-12-16 07:55:24.521+0000
[*Nodes:?? 1.004GiB---------------------------------------------------------------------------]    0 ∆    0
Done in 28ms
Prepare node index, started 2021-12-16 07:55:24.556+0000
Critical error occurred! Shutting down the import...
[*RESOLVE (~2 collisions):1.004GiB------------------------------------------------------------]    0 ∆    0
Done in 61ms

IMPORT FAILED in 216ms. 
Data statistics is not available.
Peak memory usage: 1.004GiB
Import error: DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
Caused by:DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
org.neo4j.kernel.impl.store.InvalidRecordException: DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
        at org.neo4j.kernel.impl.store.record.RecordLoad.verify(RecordLoad.java:141)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.verifyAfterReading(CommonAbstractStore.java:1074)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.readRecordFromPage(CommonAbstractStore.java:897)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.readIntoRecord(CommonAbstractStore.java:850)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.getRecordByCursor(CommonAbstractStore.java:830)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.streamRecords(CommonAbstractStore.java:1003)
        at org.neo4j.kernel.impl.store.CommonAbstractStore.getRecords(CommonAbstractStore.java:979)
        at org.neo4j.kernel.impl.store.PropertyStore.ensureHeavy(PropertyStore.java:300)
        at org.neo4j.kernel.impl.store.PropertyStore.getTextValueFor(PropertyStore.java:700)
        at org.neo4j.kernel.impl.store.PropertyType$9.value(PropertyType.java:129)
        at org.neo4j.kernel.impl.store.record.PropertyBlock.newPropertyValue(PropertyBlock.java:280)
        at org.neo4j.internal.batchimport.NodeInputIdPropertyLookup.lookupProperty(NodeInputIdPropertyLookup.java:59)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:527)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:263)
        at org.neo4j.internal.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
        at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:53)
        at java.base/java.lang.Thread.run(Thread.java:834)

WARNING Import failed. The store files in /home/dbr/.config/Neo4j Desktop/Application/relate-data/dbmss/dbms-49a06a75-bf53-4257-b193-a193c9e41556/data/databases/neo4j are left as they are, although they are likely in an unusable state. Starting a database on these store files will likely fail or observe inconsistent records so start at your own risk or delete the store manually

Test cases

As mentioned above, I tried to find a pattern, so here are some admittedly contrived test cases (test data):
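Rather than typing the long IDs by hand, the test files that follow can be regenerated with a short script. This is a sketch: the file names and output directory are hypothetical, the IDs are rebuilt from the lengths and character ranges stated in each heading, and every file contains two rows sharing one ID, exactly like the examples.

```python
import csv
from pathlib import Path
from string import ascii_lowercase

# (hypothetical file name, ID used for both rows, outcome reported in this post)
CASES = [
    ("01_az_44.csv", (ascii_lowercase * 2)[:44], "fails"),
    ("02_az_43.csv", (ascii_lowercase * 2)[:43], "works"),
    ("03_a_44.csv", "a" * 44, "works"),
    ("04_a_54.csv", "a" * 54, "works"),
    ("05_a_55.csv", "a" * 55, "fails"),
    ("06_af_54.csv", "abcdef" + "a" * 48, "works"),
    ("07_ag_54.csv", "abcdefg" + "a" * 47, "fails"),
    ("08_ag_43.csv", "abcdefg" + "a" * 36, "works"),
]


def write_case(path: Path, node_id: str) -> None:
    """Write a nodes file with two rows that share the same ID (a duplicate)."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerow(["id:ID", "name", ":LABEL"])
        writer.writerow([node_id, "Tom", "Person"])
        writer.writerow([node_id, "Tina", "Person"])


if __name__ == "__main__":
    out_dir = Path("import-tests")
    out_dir.mkdir(exist_ok=True)
    for name, node_id, outcome in CASES:
        write_case(out_dir / name, node_id)
        print(f"{name}: {len(node_id)} chars, reported to {outcome}")
```

`csv.QUOTE_ALL` reproduces the fully quoted style used in the examples.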

Does not work (44 chars)

"id:ID","name",":LABEL"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr","Tom","Person"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr","Tina","Person"

Does work (43 chars)

"id:ID","name",":LABEL"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopq","Tom","Person"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopq","Tina","Person"

Does work (44 chars, so length alone is not the problem?)

"id:ID","name",":LABEL"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does work (54 chars)

"id:ID","name",":LABEL"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does not work (55 chars)

"id:ID","name",":LABEL"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does work (54 chars) [a-f]

"id:ID","name",":LABEL"
"abcdefaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"abcdefaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does not work (54 chars) [a-g]

"id:ID","name",":LABEL"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"

Does work (43 chars; 44 chars does not work) [a-g]

"id:ID","name",":LABEL"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"
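Purely as a hypothesis for the issue report: suppose IDs made only of the characters a-f were packed at 4 bits per character and other all-lowercase IDs at 5 bits per character. Both widths are assumptions chosen for this illustration, not documented Neo4j behavior. The sketch below checks whether, under that assumption, the cases above (plus the failing 44-character [a-g] case from the last heading) split cleanly by size rather than by character count.

```python
# ASSUMED packing widths, used only to test whether a size threshold explains the results.
HEX_CHARS = frozenset("abcdef")
LOWER_CHARS = frozenset("abcdefghijklmnopqrstuvwxyz")
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def assumed_bits(node_id: str) -> int:
    """Bits needed for node_id under the assumed 4-bit (a-f) / 5-bit (a-z) packing."""
    chars = set(node_id)
    if chars <= HEX_CHARS:
        return 4 * len(node_id)
    if chars <= LOWER_CHARS:
        return 5 * len(node_id)
    raise ValueError("this sketch only covers lowercase test IDs")


# (ID, True if the import worked in the tests above)
CASES = [
    ((ALPHABET * 2)[:44], False),
    ((ALPHABET * 2)[:43], True),
    ("a" * 44, True),
    ("a" * 54, True),
    ("a" * 55, False),
    ("abcdef" + "a" * 48, True),
    ("abcdefg" + "a" * 47, False),
    ("abcdefg" + "a" * 36, True),
    ("abcdefg" + "a" * 37, False),
]

if __name__ == "__main__":
    for node_id, works in CASES:
        outcome = "works" if works else "fails"
        print(f"{len(node_id):>2} chars, {assumed_bits(node_id):>3} bits: {outcome}")
    max_ok = max(assumed_bits(i) for i, ok in CASES if ok)
    min_bad = min(assumed_bits(i) for i, ok in CASES if not ok)
    print(f"largest working: {max_ok} bits, smallest failing: {min_bad} bits")
```

Under these assumptions every working ID needs at most 216 bits and every failing one at least 220. That would fit the stack trace, where the failure occurs while the ID property is read back through a DynamicRecord, but it is only a lead to check, not a confirmed explanation.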

abk
Neo4j

This looks like a bug related to the --skip-duplicate-nodes flag. Could you open an issue on the neo4j/neo4j GitHub issue tracker and link it back here?

Best,
ABK

GitHub issue: #12800
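Since the bug appears to be tied to --skip-duplicate-nodes, one possible workaround until the issue is resolved is to drop duplicate IDs from the CSV before importing, so the flag is not needed. This is an untested sketch: the script name is hypothetical, it keeps the first row seen for each ID (an assumption about which row you want), it locates the ID column by the `:ID` type in the header, and it ignores ID spaces, i.e. it assumes a single node file and group.

```python
import csv
import sys
from typing import Iterable, Iterator, List


def id_column(header: List[str]) -> int:
    """Index of the first header field typed as ID, e.g. "id:ID" or ":ID(Person)"."""
    for index, field in enumerate(header):
        if ":" in field and field.rsplit(":", 1)[1].startswith("ID"):
            return index
    raise ValueError("no :ID column in header")


def drop_duplicate_ids(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield the header, then only the first row seen for each ID value."""
    it = iter(rows)
    header = next(it)
    yield header
    col = id_column(header)
    seen = set()
    for row in it:
        if row[col] not in seen:
            seen.add(row[col])
            yield row


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python dedupe_nodes.py nodes.csv > nodes-dedup.csv
    with open(sys.argv[1], newline="") as src:
        writer = csv.writer(sys.stdout, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerows(drop_duplicate_ids(csv.reader(src)))
```

The deduplicated file could then be imported with the original command minus `--skip-duplicate-nodes=true`; whether that avoids the failure has not been verified here.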
