Load large JSON file with newlines as separator

json

(Radke) #1

I'm trying to load a large (4.5 GB) JSON file into Neo4j. The file is in JSONL format, meaning each JSON object is on its own line. There are about 5.3 million entries.
I read about the apoc.load.*() functions but have a few questions:

Do I have to take care of periodic commits?
Can I split the file via apoc.load on the line endings?

Thanks in advance.


(Stefan Armbruster) #2

Hi Bert,

From my understanding, if the JSON file is essentially a list at the top level (and not a map), it is streamed, see https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/3.5/src/main/java/apoc/load/LoadJson.java#L60.

There is no periodic commit by default, but you can easily add one via apoc.periodic.iterate (untested code below, take care):

CALL apoc.periodic.iterate(
  "CALL apoc.load.json(....) YIELD value RETURN value",
  "CREATE (p:Person) SET p = $value // placeholder for your create/merge... statement that operates on every JSON list element - aka every value",
  {batchSize: 10000});

(Radke) #3

Thanks Stefan,

My problem is that the file is not proper JSON as a whole; rather, each line represents a JSON object. I will try some command line magic to turn this into a JSON array.
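For reference, that "command line magic" could be as simple as jq's slurp mode (`jq -s '.' in.jsonl > out.json`), but slurping a 4.5 GB file may itself be memory-hungry. A streaming Python sketch (hypothetical file paths, not from the thread) that wraps the lines into a JSON array without ever holding the whole file in memory:

```python
def jsonl_to_array(src_path, dst_path):
    """Convert a JSONL file (one JSON object per line) into a single JSON array.

    Streams line by line, so memory use stays constant regardless of file size.
    Assumes each non-empty line is already a valid JSON value.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.write("[")
        first = True
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if not first:
                dst.write(",")
            dst.write(line)
            first = False
        dst.write("]")
```

Note this only rearranges bytes; it does not validate the JSON on each line.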

Good to know that there are periodic commits.


(Radke) #4

Thanks again, the import of the German Handelsregister is running now. It will take some time, as it is over 4 GB of JSON with over 5,000,000 company entries.


(Radke) #5

Or not. The import is OOMing. Not enough heap space, even though I already increased it to 8 GB (dbms.memory.heap.max_size).

Looks like apoc.load.json($url) is not streaming and tries to load the whole file upfront.
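One possible workaround in that situation (my sketch, not what the thread did): split the JSONL file into several smaller JSON array files and load them one at a time, so no single apoc.load.json call has to materialize the full 4.5 GB. Paths, prefix, and chunk size below are illustrative assumptions:

```python
import json


def split_jsonl(src_path, dst_prefix, lines_per_chunk=500_000):
    """Split a JSONL file into numbered JSON array files.

    Each output file (dst_prefix-0000.json, dst_prefix-0001.json, ...)
    holds at most lines_per_chunk objects, small enough to load one by one.
    Returns the list of paths written.
    """
    chunk, idx, paths = [], 0, []
    with open(src_path) as src:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            chunk.append(json.loads(line))
            if len(chunk) == lines_per_chunk:
                paths.append(_flush(chunk, dst_prefix, idx))
                chunk, idx = [], idx + 1
    if chunk:  # write the final, possibly short, chunk
        paths.append(_flush(chunk, dst_prefix, idx))
    return paths


def _flush(chunk, dst_prefix, idx):
    path = f"{dst_prefix}-{idx:04d}.json"
    with open(path, "w") as dst:
        json.dump(chunk, dst)
    return path
```

Each resulting file is a proper top-level JSON array, which is the shape apoc.load.json streams best according to the earlier reply.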


(Radke) #6

Just to close this thread, I finally managed to conclude the import and wrote a bit about it: https://faboo.org/2019/02/handelregister-neo4j/

Thanks for the help.