I am trying to use neo4j-admin import
to populate a neo4j database with CSV input data. According to documentation, escaping quotation marks with \"
is not supported but my input has these and other formatting anomalies. Hence neo4j-admin import
obviously fails for input CSV
> neo4j-admin import --mode=csv --id-type=INTEGER \
> --high-io=true \
> --ignore-missing-nodes=true \
> --ignore-duplicate-nodes=true \
> --nodes:user="import/headers_users.csv,import/users.csv"
Neo4j version: 3.5.11
Importing the contents of these files into /data/databases/graph.db:
Nodes:
:user
/var/lib/neo4j/import/headers_users.csv
/var/lib/neo4j/import/users.csv
Available resources:
Total machine memory: 15.58 GB
Free machine memory: 598.36 MB
Max heap memory : 17.78 GB
Processors: 8
Configured max memory: -2120992358.00 B
High-IO: true
IMPORT FAILED in 97ms.
Data statistics is not available.
Peak memory usage: 0.00 B
Error in input data
Caused by:ERROR in input
data source: BufferedCharSeeker[source:/var/lib/neo4j/import/users.csv, position:91935, line:866]
in field: company:string:3
for header: [user_id:ID(user), login:string, company:string, created_at:string, type:string, fake:string, deleted:string, long:string, lat:string, country_code:string, state:string, city:string, location:string]
raw field value: yyeshua
original error: At /var/lib/neo4j/import/users.csv @ position 91935 - there's a field starting with a quote and whereas it ends that quote there seems to be characters in that field after that ending quote. That isn't supported. This is what I read: 'Universidad Pedagógica Nacional \"F'
My question is whether is it possible to skip or ignore poorly formatted rows of the CSV file for which neo4j-admin import
throws an error. No such option seems available in the docs. I understand that solutions exist using LOAD CSV
and that CSVs ought to be preprocessed prior to import. Since my use case can tolerate some missing data, I wanted to see if there was an option to ignore bad rows before I begin properly formatting the CSV input.