Importing JSON into Neo4j from a file containing a list of JSON objects

apoc
import
json

(Parvez Hazari) #1

I have a file with one JSON object on each line.
Sample records from the file are below:

{"bookId": "1000027", "relatedBookIds": ["4330592", "4755603", "4330602", "1100247", "4330612", "3042379", "4330596", "4330610", "1100231", "4330606", "3440120", "999901413"]}
{"bookId": "1000029", "relatedBookIds": ["4330606", "4330592", "999931622", "4330576", "3273969", "4755989", "1100223", "4330588", "3339070", "999901411", "4755609", "4330602"]}

The file contains approximately a million JSON objects.

I was thinking of reading the file line by line and using apoc.load.json to load each JSON object. But considering the number of records, is there a better way to load the file?


(Michael Hunger) #2

Combine apoc.load.json with apoc.periodic.iterate

CALL apoc.periodic.iterate('
  CALL apoc.load.json("file:///path/to/file.json") YIELD value
','
  CREATE (n:Node) SET n += value
', {batchSize:10000})
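
If you also want to link each book to its related books, something like this should work (a sketch only; Book, RELATED_TO and the id property are illustrative names, adjust them to your model):

CALL apoc.periodic.iterate('
  CALL apoc.load.json("file:///path/to/file.json") YIELD value
','
  MERGE (b:Book {id: value.bookId})
  WITH b, value
  UNWIND value.relatedBookIds AS relatedId
  MERGE (r:Book {id: relatedId})
  MERGE (b)-[:RELATED_TO]->(r)
', {batchSize:10000})

For MERGE to stay fast at that volume, create a uniqueness constraint on :Book(id) first.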

(Parvez Hazari) #3

Thanks Michael for your reply.
Just to understand your suggestion: is this approach equivalent to "USING PERIODIC COMMIT 10000 LOAD CSV"?
I ask because I was planning to convert this file to CSV and then use LOAD CSV with periodic commit.
But if this approach matches the performance of LOAD CSV, I would rather skip the conversion and import directly from the JSON file.
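
For reference, the LOAD CSV variant I had in mind looks roughly like this (assuming I first convert each line into a row with a bookId column and a pipe-separated relatedBookIds column; the column names and separator are just my own choice):

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///path/to/file.csv" AS row
CREATE (n:Node)
SET n.bookId = row.bookId,
    n.relatedBookIds = split(row.relatedBookIds, "|")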

Also, to clarify my requirement: I have just one input file, which contains one flattened JSON object per line. So if the file has 10 lines, I have 10 separate JSON objects.
The example in my initial post can be read as a file containing 2 lines with 2 JSON objects.


(Michael Hunger) #4

Yes, exactly. The APOC version has some benefits over regular LOAD CSV in terms of batching and handling large datasets.
And yes, you will have one JSON object per line, each of which is turned into one row, aka "value".
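
A quick way to check, assuming the same file path, is to return a few rows first:

CALL apoc.load.json("file:///path/to/file.json") YIELD value
RETURN value.bookId, value.relatedBookIds
LIMIT 5

Each "value" is a map, e.g. {bookId: "1000027", relatedBookIds: [...]}.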


(Parvez Hazari) #5

Thanks Michael.
Let me try it out and I will update with the results.