Neo4j-admin import failed when header name contains colon(url)

Hi,

I am trying to upgrade to Neo4j 5.3.0 (latest).

But when I try to import CSV data using neo4j-admin import, it fails as shown below. This did not occur on 4.4.11.

org.neo4j.internal.batchimport.input.HeaderException: 
Unable to parse header, unknown property type 
'://example.com/property/name:string[]

It seems the URL property name causes the error because it contains a colon (:),

as each header field carries its information in the format <name>:<field_type>
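To show why the colon is ambiguous here, this is a small sketch using plain shell parameter expansion. It only illustrates the two possible split points in a <name>:<field_type> header. It is not the importer's actual parsing code, and the variable names are made up for the example.

```shell
#!/bin/sh
# Illustration only: NOT the importer's actual parsing code.
header='http://example.com/property/name:string[]'

# Split at the FIRST colon: the URL scheme ends up as the name.
first_name=${header%%:*}
first_type=${header#*:}

# Split at the LAST colon: the whole URL stays in the name.
last_name=${header%:*}
last_type=${header##*:}

printf 'first-colon split: name=%s type=%s\n' "$first_name" "$first_type"
printf 'last-colon split:  name=%s type=%s\n' "$last_name" "$last_type"
```

A first-colon split leaves everything after "http" as the type, which looks consistent with the "unknown property type" text in the error above. Only the last-colon split yields string[] as the type.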

Is there any way to import data successfully?

Thanks.

@june

Are you able to share the header line and 1-2 rows of CSV data (obfuscated if required)?

@june

To confirm, your last response is the first line (the header) of the CSV? And thus the expected end result is, for example, nodes created with a property named uri and a property named http://example.com/property/name?

@dana_canzano

Here it is.

uri:ID    http://example.com/property/name:string[]

Same for me. I tried to import using a column name with `:` in it, and it throws a similar exception to the one @june reported.

version 5.3.0 | "neo4j-admin database import full"

Reverted back to version 4.4.11 "neo4j-admin import" and I am able to ingest successfully.
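If reverting is not an option, one possible stopgap (my own assumption, not an official fix) is to rewrite the header so that only the final colon, the one before the type, remains. The sketch below uses a hypothetical helper, sanitize_field, for a single header field. Note that it changes the property key, so the property would have to be renamed back after the import.

```shell
#!/bin/sh
# Stopgap sketch, not an official fix: keep only the last colon of a
# header field by replacing every colon in the name part with "_".
sanitize_field() {
  field=$1
  case "$field" in
    *:*) name=${field%:*}; type=${field##*:} ;;   # split at the last colon
    *)   printf '%s\n' "$field"; return ;;        # no colon: leave untouched
  esac
  safe_name=$(printf '%s' "$name" | sed 's/:/_/g')
  printf '%s:%s\n' "$safe_name" "$type"
}

sanitize_field 'http://example.com/property/name:string[]'
# prints http_//example.com/property/name:string[]
sanitize_field 'uri:ID'
# prints uri:ID
```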

@june @shern2

I'm still missing something. For example, with 4.4.14, if I have a CSV with

uri:ID,myProp
1,abc
2,def

this imports successfully when running neo4j-admin import --database=proddb1 --nodes=sample.csv.

If however I change the header to

uri:ID,myProp:added
1,abc
2,def

and rerun effectively the same neo4j-admin import, it fails with

org.neo4j.internal.batchimport.input.HeaderException: Unable to parse header
        at org.neo4j.internal.batchimport.input.csv.DataFactories.parsePropertyType(DataFactories.java:520)
        at org.neo4j.internal.batchimport.input.csv.DataFactories$AbstractDefaultFileHeaderParser.propertyExtractor(DataFactories.java:318)
        at org.neo4j.internal.batchimport.input.csv.DataFactories$DefaultNodeFileHeaderParser.entry(DataFactories.java:432)
        at org.neo4j.internal.batchimport.input.csv.DataFactories$AbstractDefaultFileHeaderParser.lambda$new$0(DataFactories.java:246)
        at org.neo4j.internal.batchimport.input.csv.DataFactories.parseHeaderEntries(DataFactories.java:219)
        at org.neo4j.internal.batchimport.input.csv.DataFactories$AbstractDefaultFileHeaderParser.create(DataFactories.java:253)
        at org.neo4j.internal.batchimport.input.csv.CsvInput.verifyHeaders(CsvInput.java:143)
        at org.neo4j.internal.batchimport.input.csv.CsvInput.<init>(CsvInput.java:120)
        at org.neo4j.internal.batchimport.input.csv.CsvInput.<init>(CsvInput.java:98)
        at org.neo4j.importer.CsvImporter.doImport(CsvImporter.java:168)
        at org.neo4j.importer.ImportCommand.execute(ImportCommand.java:268)
        at org.neo4j.cli.AbstractCommand.call(AbstractCommand.java:71)
        at org.neo4j.cli.AbstractCommand.call(AbstractCommand.java:34)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at org.neo4j.cli.AdminTool.execute(AdminTool.java:93)
        at org.neo4j.cli.AdminTool.main(AdminTool.java:79)
Caused by: java.lang.IllegalArgumentException: 'added'
        at org.neo4j.csv.reader.Extractors.valueOf(Extractors.java:222)

This is somewhat expected, since https://neo4j.com/docs/cypher-manual/current/syntax/naming/#_naming_rules indicates

  • Symbols:

    • Names should not contain symbols, except for underscore, as in my_variable, or $ as the first character to denote a parameter, as given by $myParam.

but maybe I have misread/misunderstood your issue.
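To separate the two cases, here is a rough pre-flight check that looks only at the text after the last colon of a header field. The helper name check_field is hypothetical, and the list of known types is an assumption based on my reading of the import documentation, so it should be verified against the Neo4j version in use.

```shell
#!/bin/sh
# Hypothetical pre-flight check. The type list below is an assumption and
# should be verified against the import docs for your Neo4j version.
known='int long float double boolean byte short char string point date localtime time localdatetime datetime duration ID LABEL START_ID END_ID TYPE IGNORE'

# Prints "ok" when the text after the last colon is a known type, else "unknown".
check_field() {
  field=$1
  case "$field" in
    *:*) suffix=${field##*:} ;;   # text after the last colon
    *)   echo ok; return ;;       # no colon: plain property name
  esac
  suffix=${suffix%'[]'}           # drop an array marker, as in string[]
  suffix=${suffix%%'('*}          # drop an ID group, as in ID(Movie)
  case " $known " in
    *" $suffix "*) echo ok ;;
    *)             echo unknown ;;
  esac
}

for f in 'uri:ID' 'myProp' 'myProp:added' 'http://example.com/property/name:string[]'; do
  printf '%s -> %s\n' "$f" "$(check_field "$f")"
done
```

With this check, myProp:added is reported as unknown because added is not a type, whereas the URL header is reported as ok because its last segment is string[]. That suggests the two failures are different in nature.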

Are you able to share the header line and 1-2 rows of CSV data (obfuscated if required)?

@shern2

Thank you for providing these details and a reproducible case. I was able to demonstrate success under 4.4.14 but failure under 5.3.0. This has now been reported to Neo4j Engineering for further analysis.

Sample ingestion script:

'''

export NEO4J_USERNAME=neo4j
export NEO4J_PASSWORD=xxx
NEO_ROOTDIR='[dbms-dir]'

cd "${NEO_ROOTDIR}/import"

cat<< EOF > movies3-header.csv
movieId:ID,title,year:int,:LABEL,"new col's: (xxx) data:string"
EOF

cat<< EOF > movies3.csv
tt0133093,"The Matrix",1999,Movie,abc1
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel,2abc
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel,3abc
EOF

cat<< EOF > actors3-header.csv
personId:ID,name,:LABEL
EOF

cat<< EOF > actors3.csv
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
EOF

cat<< EOF > roles3-header.csv
:START_ID,role,:END_ID,:TYPE
EOF

cat<< EOF > roles3.csv
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
EOF

cd "${NEO_ROOTDIR}"
"${NEO_ROOTDIR}/bin/neo4j-admin" import \
--database=test-db \
--ignore-empty-strings \
--array-delimiter="|" \
--skip-bad-relationships=true \
--nodes=import/movies3-header.csv,import/movies3.csv \
--nodes=import/actors3-header.csv,import/actors3.csv \
--relationships=import/roles3-header.csv,import/roles3.csv \
2>&1 | tee ./${date_idx}/neoimport.log

"${NEO_ROOTDIR}/bin/cypher-shell" \
'CREATE DATABASE `test-db`'

"${NEO_ROOTDIR}/bin/cypher-shell" \
'DROP DATABASE `test-db`'

'''

Thanks @dana_canzano for your quick reply! Apologies for the delay.

In the same naming rules doc, it also states in a box that symbols are allowed. haha.

'''
Non-alphabetic characters, including numbers, symbols and whitespace characters, can be used in names, but must be escaped using backticks. For example: `^n`, `1first`, `$$n`, and `my variable has spaces`. Database names are an exception and may include dots without the need for escaping. For example: naming a database foo.bar.baz is perfectly valid.
'''

Let me prepare a sample ingestion script in a separate post.