Bulk Import limitations


(Bi Y Tcc K Ciu) #1

I'm trying to import 26 million rows from around 300 CSV files (using bash to run the bulk import command).

I've run into a limit on the number of CSVs I can reference before I'm told that the command is too long...

Also, some of those CSVs have null values in some columns. With the normal LOAD CSV I know how to deal with those, but with the bulk import (neo4j-admin.bat) I don't.

Any help would be appreciated.


(Michael Hunger) #2

You can use regular expressions for the file names:

e.g. --nodes:Person file-[0-9]+.csv.gz

Note that in a regexp you need to use .* instead of * for "any characters": . matches any single character, and * only repeats whatever precedes it.
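A quick way to check what such a pattern selects before starting a long import is to try it with Python's `re` module. This sketch assumes the tool matches the pattern against the whole file name (hence `re.fullmatch`); that is an assumption, not something confirmed in this thread, and the file names are made up. It also shows that `.` is itself a metacharacter, so escaping it as `\.` gives a stricter pattern, and that a glob-style `file-*` matches nothing here.

```python
import re

# Hypothetical file names, as they might appear in an export directory.
files = [
    "file-1.csv.gz",
    "file-42.csv.gz",
    "file-abc.csv.gz",
    "other-1.csv.gz",
    "file-7xcsvxgz",
]


def select(pattern: str, names: list[str]) -> list[str]:
    """Return the names that the whole pattern matches (fullmatch, not search)."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n)]


loose = select(r"file-[0-9]+.csv.gz", files)     # '.' matches ANY character
strict = select(r"file-[0-9]+\.csv\.gz", files)  # '\.' matches a literal dot
anything = select(r"file-.*", files)             # '.*' = any sequence of characters
glob_style = select(r"file-*", files)            # '*' only repeats the '-'

print(loose)
print(strict)
print(anything)
print(glob_style)
```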

The null values are skipped during import if the --ignore-empty-strings option is set to true:


--ignore-empty-strings <true/false>
        Whether or not empty string fields, i.e. "" from input source are ignored, i.e.
        treated as null. Default value: false
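To make the help text's meaning concrete, here is a small Python sketch that mimics it on a parsed row: with the option on, fields whose value is an empty string are left out of the property map (treated as null); with it off, they are kept as "". This only illustrates the described behaviour and is not how the import tool is implemented. The column names are hypothetical, and note that Python's `csv` module returns '' both for an empty field and for a quoted `""`.

```python
import csv
import io

# Hypothetical sample in the shape of an import file.
sample = 'id:ID,name,email\n1,Alice,alice@example.com\n2,Bob,""\n'


def to_properties(row: dict, ignore_empty_strings: bool) -> dict:
    """Mimic the documented meaning of --ignore-empty-strings:
    when true, empty string fields are treated as null, i.e. not stored."""
    if ignore_empty_strings:
        return {k: v for k, v in row.items() if v != ""}
    return dict(row)


rows = list(csv.DictReader(io.StringIO(sample)))
kept = [to_properties(r, ignore_empty_strings=False) for r in rows]
dropped = [to_properties(r, ignore_empty_strings=True) for r in rows]

print(kept[1])     # Bob keeps an empty "email" property
print(dropped[1])  # Bob has no "email" property at all
```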

(Bi Y Tcc K Ciu) #3

You're a kind man for answering another one of my questions, Michael :) Thanks, I'll give it a shot right now. (That saves me from having to adjust the data export out of Oracle, which I thought I'd have to labour through.)


(Michael Hunger) #4

It's best to try it out with a small subset first and only run the big one after all the kinks have been sorted out.

Saves a lot of waiting time :)
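One way to build that small subset is to copy each CSV's header line plus its first few data lines into a separate test directory. This is a minimal sketch under stated assumptions: plain (not gzipped) `*.csv` files, no field containing an embedded newline, and a hypothetical `make_subset` helper and file name. Lines are copied byte-for-byte so quoted `""` values survive unchanged, which matters when testing how empty values are handled.

```python
import itertools
import tempfile
from pathlib import Path


def make_subset(src_dir: Path, dst_dir: Path, rows: int = 1000) -> list[Path]:
    """Copy each *.csv in src_dir to dst_dir, keeping only the header line
    and the first `rows` data lines (byte-for-byte)."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.csv")):
        dst = dst_dir / src.name
        with src.open("rb") as fin, dst.open("wb") as fout:
            # islice stops after the header + `rows` lines; this assumes
            # no field contains an embedded newline
            fout.writelines(itertools.islice(fin, rows + 1))
        written.append(dst)
    return written


# Demo on a throwaway directory with a hypothetical c_contracts.csv.
with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp, "full")
    full.mkdir()
    body = "".join(f"{i},n{i}\n" for i in range(10))
    (full / "c_contracts.csv").write_text("id:ID,name\n" + body)
    out = make_subset(full, Path(tmp, "subset"), rows=3)
    demo_lines = out[0].read_text().splitlines()

print(demo_lines)
```

Pointing the import at the subset directory first surfaces header or quoting problems in seconds rather than after hours.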

Good luck


(Bi Y Tcc K Ciu) #5

What if I have a different file for each node label?

i.e. --nodes:c_contracts:c_payments ......

and each label is named after its CSV file?


(Michael Hunger) #6

Then you use multiple --nodes parameters.

See the documentation:

https://neo4j.com/docs/operations-manual/current/tutorial/import-tool/
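With one file per label, the repeated `--nodes` parameters can be generated rather than typed. A minimal sketch, assuming the label is simply the file's base name (so `c_contracts.csv` becomes `--nodes:c_contracts`) and reusing the `--nodes:Label file` form shown earlier in this thread; whether the file is separated by a space or `=` depends on the tool and version, so check its `--help`. The helper name and paths are hypothetical.

```python
from pathlib import Path


def nodes_args(csv_files: list[str]) -> list[str]:
    """Return one "--nodes:<Label> <file>" pair per CSV file, taking the
    label from the file's base name without its extension."""
    args = []
    for f in sorted(csv_files):
        label = Path(f).stem  # "import/c_contracts.csv" -> "c_contracts"
        args += [f"--nodes:{label}", f]
    return args


tables = ["import/c_payments.csv", "import/c_contracts.csv"]  # hypothetical paths
args = nodes_args(tables)
print(" ".join(args))
```

`Path.stem` only strips the last suffix, so gzipped `.csv.gz` files would need an extra step to get a clean label.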


(Bi Y Tcc K Ciu) #7

Hmm, the problem is that I hit a character limit in bash when I have 111 tables to import from Oracle (using multiple --nodes lines)...
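Whether a command really is too long can be checked by measuring it. The sketch below builds a hypothetical command with 111 `--nodes` pairs (made-up table names and paths) and sums the bytes of its arguments, each plus a terminating NUL. On POSIX systems this approximates what counts toward the kernel's `ARG_MAX` (reported by `os.sysconf("SC_ARG_MAX")`); the environment counts too, so treat the result as a lower bound. On Windows, a `.bat` wrapper such as neo4j-admin.bat runs under cmd.exe, which has its own much shorter limit of 8191 characters per command line.

```python
import os


def command_length(args: list[str]) -> int:
    """Approximate bytes the argument strings occupy when passed to a new
    process: each argument's UTF-8 bytes plus a terminating NUL."""
    return sum(len(a.encode()) + 1 for a in args)


# Hypothetical command with 111 tables, one --nodes pair each.
args = ["neo4j-admin", "import"]
for i in range(111):
    args += [f"--nodes:table_{i}", f"/data/export/oracle/table_{i}.csv"]

length = command_length(args)
# sysconf is only available on POSIX systems
limit = os.sysconf("SC_ARG_MAX") if hasattr(os, "sysconf") else None
print(length, limit)
```

Short paths like these stay small, but long absolute paths repeated 111 times grow the command quickly, which is another reason to prefer regex patterns or an arguments file.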


(Michael Hunger) #8

I don't know of such a limit in bash. Did you use the regexps for the files?

And you can put all the command line options into a file too:

--f <file name>
        File containing all arguments, used as an alternative to supplying all arguments
        on the command line directly. Each argument can be on a separate line or multiple
        arguments per line separated by space. Arguments containing spaces needs to be
        quoted. Supplying other arguments in addition to this file argument is not
        supported.
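Following the rules in that help text (one argument per line, quote arguments that contain spaces), the arguments file can be generated too. A minimal sketch: the `write_args_file` helper, the file name `import-args.txt`, and the paths are hypothetical, the option forms are the ones quoted in this thread, and it assumes no argument itself contains a double quote.

```python
import tempfile
from pathlib import Path


def write_args_file(args: list[str], path: Path) -> None:
    """Write one argument per line, wrapping any argument that contains a
    space in double quotes (assumes no argument contains a double quote)."""
    lines = [f'"{a}"' if " " in a else a for a in args]
    path.write_text("\n".join(lines) + "\n")


# Hypothetical arguments, reusing option forms quoted in this thread.
args = [
    "--nodes:c_contracts", "import/c_contracts.csv",
    "--nodes:c_payments", "import/My Exports/c_payments.csv",
    "--ignore-empty-strings", "true",
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "import-args.txt")
    write_args_file(args, target)
    written = target.read_text().splitlines()

print("\n".join(written))
```

Assuming the help text above is from neo4j-import, the tool would then be run with only that file, e.g. `neo4j-import --f import-args.txt`, since other arguments can't be combined with it.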

(Bi Y Tcc K Ciu) #9

Using the option --ignore-empty-strings=true, I get the message: unrecognized option:'ignore-empty-strings'

This option is also not mentioned in the documentation (which I've now looked at again, since I hadn't before writing the original post...).


(Michael Hunger) #10

Oh sorry, I always use neo4j-import, not neo4j-admin import :)


(Bi Y Tcc K Ciu) #11

Ah I see.

That's a deprecated tool and therefore no longer documented on neo4j.com (though it still shows up in some blogs).

It works on a test example for me, so yeah, that fixes my null-value issue with the bulk import (until neo4j-import gets removed), and the import using the arguments file fixes my other issue.

Awesome stuff.