Importing pipe-delimited CSV fails due to embedded commas?

dave1 · November 20, 2019, 9:09pm

I am trying to import a CSV file (using neo4j-admin import) that has a header row and uses pipe characters as delimiters. Some of the rows have commas in the fields (e.g., in an address field), which you would think would be irrelevant when the field delimiter is a pipe.

However, I'm seeing an error that makes me think that the import tool is either not considering pipes as delimiters, or maybe considering commas as delimiters in addition to the pipes.

I executed the following command:

neo4j-admin import --delimiter "|" --f scripts/load3.txt

and scripts/load3.txt contains:

--nodes:Type1 data/type1.csv

Here are the first 2 rows of data/type1.csv:

field1:string|field2:string|field3:string|field4:string|field5:string|field6:string|field7:string|field8:string|field9:string|field10:string|field11:int|field12:int|field13:string|field14:string|field15:string|field16:string|field17:ID(Type1-ID)|field18:string|field19:datetime|field20:string|field21:string|field22:string|field23:string|field24:string|field25:string|field26:string|field27:date|field28:string|field29:string|field30:string|field31:string|field32:string|field33:string|field34:string|field35:string|field36:string|field37:string|field38:string|field39:date|field40:string|field41:string
NV|TI|E2|OZ|N9JFNAWYZC7NISXE0C|Lanesboro|LC8O2AWVQB|""|VBYLX24ESGRT55KZ7N|""|26677|319|Stefany|82977|""|""|4208453280712768998|""|""|""|0259256186|""|7|0726014721|51451|8|2019-07-18|CA|A29FFO4YFVA20SKTRW|KUFFPM|W|""|1 655 964 2676|""|W7020CXL08|LCFFHU4RBZ6JSI5|1 057 256 5644|23|2019-07-18|597 CASTORO Canyon, Suite 3157, Boston, Virginia, 22713|""

This is the error message:

org.neo4j.unsafe.impl.batchimport.input.InputException: ERROR in input
  data source: BufferedCharSeeker[source:/Users/xyz/data/type1.csv, position:321, line:0]
  in field: field1:string|field2:string|field3:string|field4:string|field5:string|field6:string|field7:string|field8:string|field9:string|field10:string|field11:int|field12:int|field13:string|field14:string|field15:string|field16:string|field17:ID(Type1-ID)|field18:string|field19:datetime|field20:string|field21:string|field22:string|field23:string|field24:string|field25:string|field26:string|field27:date|field28:string|field29:string|field30:string|field31:string|field32:string|field33:string|field34:string|field35:string|field36:string|field37:string|field38:string|field39:date|field40:string|field41:string:2
  for header: [field1:string|field2:string|field3:string|field4:string|field5:string|field6:string|field7:string|field8:string|field9:string|field10:string|field11:int|field12:int|field13:string|field14:string|field15:string|field16:string|field17:ID(Type1-ID)|field18:string|field19:datetime|field20:string|field21:string|field22:string|field23:string|field24:string|field25:string|field26:string|field27:date|field28:string|field29:string|field30:string|field31:string|field32:string|field33:string|field34:string|field35:string|field36:string|field37:string|field38:string|field39:date|field40:string|field41:string]
  raw field value: Suite 3157
  original error: Extra column not present in header on line 1 in /Users/xyz/data/type1.csv with value Suite 3157
	at org.neo4j.unsafe.impl.batchimport.input.BadCollector$ExtraColumnsProblemReporter.exception(BadCollector.java:306)
	at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collect(BadCollector.java:168)
	at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collectExtraColumns(BadCollector.java:129)
	at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:198)
	at org.neo4j.unsafe.impl.batchimport.input.csv.LazyCsvInputChunk.next(LazyCsvInputChunk.java:96)
	at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputChunkProxy.next(CsvInputChunkProxy.java:75)
	at org.neo4j.unsafe.impl.batchimport.ExhaustingEntityImporterRunnable.run(ExhaustingEntityImporterRunnable.java:57)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:122)

Looks to me like it split the record on the pipes up to the point of the address, then it split on the commas in the address.
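One way to check that reading is to split a sample row with each candidate delimiter and compare field counts against the header. The sketch below uses Python's csv module, not the Neo4j parser, and the three-column header and row are a shortened, hypothetical stand-in for the real 41-column file (only the address value is taken from the post). The helper names are illustrative assumptions.

```python
import csv

# Shortened, hypothetical stand-in for data/type1.csv (the real file has 41 columns).
SAMPLE = [
    "field17:ID(Type1-ID)|field13:string|field40:string",
    "4208453280712768998|Stefany|597 CASTORO Canyon, Suite 3157, Boston, Virginia, 22713",
]


def split_rows(lines, delimiter):
    """Split every line on the given single-character delimiter."""
    return list(csv.reader(lines, delimiter=delimiter))


def width_mismatches(lines, delimiter):
    """Return (data_row_number, field_count) for rows wider or narrower than the header."""
    rows = split_rows(lines, delimiter)
    expected = len(rows[0])
    return [(i, len(row)) for i, row in enumerate(rows[1:], start=1) if len(row) != expected]


# With "|" the row matches the header; with "," the header collapses into a
# single field and the address spills into extra columns.
pipe_problems = width_mismatches(SAMPLE, "|")
comma_problems = width_mismatches(SAMPLE, ",")
first_extra_field = split_rows(SAMPLE, ",")[1][1].strip()
print(pipe_problems, comma_problems, first_extra_field)
```

If the tool were splitting on commas only, the whole header would be read as one field and the first extra column would be the text after the first comma in the address. That matches both the "in field" line of the error, which shows the entire header as a single field name, and the reported raw field value. It would suggest the pipe delimiter was not applied at all, rather than pipes and commas both being used.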

Help me understand how to use neo4j-admin import correctly...

Thanks

Dave

stefan.armbruster · November 21, 2019, 11:22am

Running this directly from command line just worked for me:

bin/neo4j-admin import --delimiter "|" --nodes:Type1 import/data1.csv

Have you tried moving --delimiter "|" into load3.txt? My suspicion is that when you use --f, all other options are ignored.
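If the suspicion about --f holds (it is not confirmed in this thread), another way to sidestep it is to skip the arguments file and pass everything inline, as in the command that worked above. A minimal sketch, assuming the file holds one or more whitespace-separated options per line; build_import_command is a hypothetical helper, not part of any Neo4j tooling:

```python
import shlex


def build_import_command(args_file_lines, delimiter="|", executable="neo4j-admin"):
    """Expand the lines of an arguments file into one inline argument list.

    The delimiter option is placed on the command line itself, so it does not
    depend on how the tool treats options given next to --f.
    """
    command = [executable, "import", "--delimiter", delimiter]
    for line in args_file_lines:
        # shlex.split honours shell-style quoting; blank lines add nothing.
        command.extend(shlex.split(line))
    return command


# Contents of scripts/load3.txt as shown in the question.
command = build_import_command(["--nodes:Type1 data/type1.csv"])
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run without a shell, which avoids the shell interpreting the pipe character.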
