Showing results for 
Search instead for 
Did you mean: 

How to LOAD CSV from a csv file that has different amount of information for each line

Node Link


I have a csv file that I want to import but each line of the csv file contains data that i dont want to keep (need to be filtered) and length of each line is different (in one column might have two different information but it is still string). The csv file dont have header but it can be added using excel but i dont think its a good idea.

Example csv:
(id, typeA,typeB,typeC) --> header that i add in the csv

But some line, dont have type B, so it became like this
(id, typeA,typeC,1)

some are

Is there a way i can import this kind of csv without modifying the csv?


Node Link

Or if you guys know any tools that can sort the csv/excel file based on their number of columns,is very much appreciated.

You can load a CSV that doesn't have a header or even ignore the header by using a SKIP 1. e.g.

load csv from "" AS row
WHERE size(row) > 1

We could then choose to process the lines differently depending on how many items they contain. e.g. maybe if we have 4 items we know that they are id,typeA,typeB,typeC, but if we have 3 items then they are id,typeA,typeC.

Or do you have another way of determining which type each column represents?

I can also determine what a column present by their initals because we have a standard system in naming them such as LDN (London) is Place and BNG (router) is an Equipment. Is there a way we can create a block of name that we can fill with list of initials and classify them based on that ?

For my csv, each line means one connection.

typeA --connected--> typeB --connected--> typeC and Id represents identification of one connection/pathway. Therefore i also need to use MERGE for my csv.

I'm looking for way where we could identify the number of column that is not null, and from that we could the data identically based on the number of column for each row.

Please provide one data line with initials(LDN) and other data columns so that I can suggest a solution.

I didnt quite understand this at first, but this actually what im looking for. So for example if i only want to import data with only 4 column for each line, i need to put the condition "WHERE size(row) < 5" . Therefore, It will skip the line with >=5 column right Is it right ?

Yep exactly! And then you could run the same query again with a different WHERE clause to handle the rows with more columns

Graph Buddy

You need to do Data modeling exercise with Target Data Model in Mind and write your own transformation Logic in Cipher Queries.

Nodes 2022
NODES 2022, Neo4j Online Education Summit

On November 16 and 17 for 24 hours across all timezones, you’ll learn about best practices for beginners and experts alike.