I am trying to import a CSV file (using neo4j-admin import) that has a header row and uses pipe characters as delimiters. Some of the rows have commas in the fields (e.g., in an address field), which you would think would be irrelevant when the field delimiter is a pipe.
However, I'm seeing an error that makes me think that the import tool is either not considering pipes as delimiters, or maybe considering commas as delimiters in addition to the pipes.
I executed the following command:
neo4j-admin import --delimiter "|" --f scripts/load3.txt
and scripts/load3.txt contains:
--nodes:Type1 data/type1.csv
Here are the first 2 rows of data/type1.csv:
field1:string|field2:string|field3:string|field4:string|field5:string|field6:string|field7:string|field8:string|field9:string|field10:string|field11:int|field12:int|field13:string|field14:string|field15:string|field16:string|field17:ID(Type1-ID)|field18:string|field19:datetime|field20:string|field21:string|field22:string|field23:string|field24:string|field25:string|field26:string|field27:date|field28:string|field29:string|field30:string|field31:string|field32:string|field33:string|field34:string|field35:string|field36:string|field37:string|field38:string|field39:date|field40:string|field41:string
NV|TI|E2|OZ|N9JFNAWYZC7NISXE0C|Lanesboro|LC8O2AWVQB|""|VBYLX24ESGRT55KZ7N|""|26677|319|Stefany|82977|""|""|4208453280712768998|""|""|""|0259256186|""|7|0726014721|51451|8|2019-07-18|CA|A29FFO4YFVA20SKTRW|KUFFPM|W|""|1 655 964 2676|""|W7020CXL08|LCFFHU4RBZ6JSI5|1 057 256 5644|23|2019-07-18|597 CASTORO Canyon, Suite 3157, Boston, Virginia, 22713|""
This is the error message:
org.neo4j.unsafe.impl.batchimport.input.InputException: ERROR in input
data source: BufferedCharSeeker[source:/Users/xyz/data/type1.csv, position:321, line:0]
in field: field1:string|field2:string|field3:string|field4:string|field5:string|field6:string|field7:string|field8:string|field9:string|field10:string|field11:int|field12:int|field13:string|field14:string|field15:string|field16:string|field17:ID(Type1-ID)|field18:string|field19:datetime|field20:string|field21:string|field22:string|field23:string|field24:string|field25:string|field26:string|field27:date|field28:string|field29:string|field30:string|field31:string|field32:string|field33:string|field34:string|field35:string|field36:string|field37:string|field38:string|field39:date|field40:string|field41:string:2
for header: [field1:string|field2:string|field3:string|field4:string|field5:string|field6:string|field7:string|field8:string|field9:string|field10:string|field11:int|field12:int|field13:string|field14:string|field15:string|field16:string|field17:ID(Type1-ID)|field18:string|field19:datetime|field20:string|field21:string|field22:string|field23:string|field24:string|field25:string|field26:string|field27:date|field28:string|field29:string|field30:string|field31:string|field32:string|field33:string|field34:string|field35:string|field36:string|field37:string|field38:string|field39:date|field40:string|field41:string]
raw field value: Suite 3157
original error: Extra column not present in header on line 1 in /Users/xyz/data/type1.csv with value Suite 3157
at org.neo4j.unsafe.impl.batchimport.input.BadCollector$ExtraColumnsProblemReporter.exception(BadCollector.java:306)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collect(BadCollector.java:168)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collectExtraColumns(BadCollector.java:129)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:198)
at org.neo4j.unsafe.impl.batchimport.input.csv.LazyCsvInputChunk.next(LazyCsvInputChunk.java:96)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputChunkProxy.next(CsvInputChunkProxy.java:75)
at org.neo4j.unsafe.impl.batchimport.ExhaustingEntityImporterRunnable.run(ExhaustingEntityImporterRunnable.java:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:122)
Looks to me like it split the record on the pipes up to the point of the address, then it split on the commas in the address.
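As a sanity check on the data itself, here is a minimal Python sketch that splits a row once on pipes and once on commas. It uses Python's csv module, not Neo4j's parser, so it only approximates what the import tool does. The row is a shortened, hypothetical stand-in for the real 41-column record, and split_row is just a helper name I made up for illustration.

```python
import csv
import io

# Shortened, hypothetical stand-in for a data row: pipe-delimited,
# with commas inside the last (address) field.
row = 'NV|TI|""|597 CASTORO Canyon, Suite 3157, Boston, Virginia, 22713'

def split_row(line, delimiter):
    """Parse a single line with Python's csv module using the given delimiter."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))

by_pipe = split_row(row, "|")    # what I expect the importer to do
by_comma = split_row(row, ",")   # what a comma delimiter would do

print(len(by_pipe), by_pipe[-1])
print(len(by_comma), by_comma[1].strip())
```

Split on pipes, the address stays in one field. Split on commas, the second piece is "Suite 3157", which matches the raw field value reported in the error.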
Help me understand how to use neo4j-admin import correctly...
Thanks
Dave