JSON loading becomes slow after some time

Hello Everyone,

I am trying to load several thousand JSON files into Neo4j using a Python script and apoc.load.json.
Initially, for around 1,000 JSON files, my script works fine and loads data very fast, but after some time the loading becomes extremely slow.
The first 1,000 files took only 40 seconds, but after that each JSON file takes around 10 seconds.
Is there anything I can do to speed up this operation?

The overall size of all the JSON files is 750 MB, and I am using an 8 GB machine to load this data.

There are a number of things you can do to make sure your data imports go smoothly. Using periodic commit, MERGE-ing on just a single key property of your created nodes (and SET-ting the rest), and ensuring that you have enough memory allocated to the process are all contributing factors. Here's a link to the official docs: https://neo4j.com/docs/labs/apoc/current/import/load-json/
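As a concrete illustration, periodic-commit-style batching over a JSON file with APOC might look like the sketch below. The file name, `items` field, `Thing` label, and `id`/`name` properties are all hypothetical placeholders, not from this thread:

```cypher
// Batch the inner MERGE in groups of 500 rows per transaction.
// All names below are illustrative assumptions.
CALL apoc.periodic.iterate(
  'CALL apoc.load.json("file:///data.json") YIELD value
   UNWIND value.items AS item
   RETURN item',
  'MERGE (t:Thing {id: item.id})   // MERGE on the single key property...
   SET t.name = item.name',        // ...then SET the remaining properties
  {batchSize: 500}
)
```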

I have taken care of most of the suggestions you mention.
Periodic commit would not help much in my case: each JSON contains around 5k nodes and 7k relationships, and I have 14,000 JSON files, each with that much data. I am using a for loop to iterate over the files and passing each JSON to apoc.load.json.
Is there a different way to iterate, and what about the memory configuration?
For 700 MB of data, what properties do I need to define in neo4j.conf?

You can set the periodic commit batch size to a lower number, like 500. You could also convert the JSON into CSV, which I've seen recommended here: https://neo4j.com/blog/bulk-data-import-neo4j-3-0/. As for memory, there are a number of settings in neo4j.conf (heap size, page cache, etc.) you can increase to help with the load.
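For the memory settings, a possible starting point for an 8 GB machine might look like the fragment below. These values are assumptions to tune for your workload, not recommendations from this thread:

```
# neo4j.conf — illustrative starting values for an 8 GB machine
dbms.memory.heap.initial_size=2g
dbms.memory.heap.max_size=2g
dbms.memory.pagecache.size=2g
```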

Here's another load JSON resource: https://neo4j.com/blog/cypher-load-json-from-url/

Do you have the proper indexes created? That could explain the behavior you are seeing.

Without indexes, ingestion is fast while there is little data, but as the data grows, ingestion starts slowing down.

Does indexing cause the problem?
I created the indexes before loading. I am loading 3 types of labels and 2 types of relationships, and indexes are already created for these labels.

Yes.
As mentioned by @anthapu, indexes always add latency during a load. In fact, in DW projects, for all bulk loads the indexes are removed before the historical load and reapplied afterwards.

Updating indexes does not cause an issue. If your query cannot use an index, that can cause the queries to run slower.

Would it be possible to list the queries and the indexes you created, so we can see why they could be slower?

@intouch.vivek That's incorrect.

For large data loads using Cypher, if you do not have indexes it can take a lot of time, unless you are only using CREATE statements.

Index updates do not have much of a side effect, so creating indexes before you do the load is very important.

Yeah, sure.
But after your answer I tried without indexes as well, and the same issue still occurs.

Please find below the queries I am using.

call apoc.load.json("file:///Neo4j/blocks_set_001/'''+f_name+'''") YIELD value
MERGE (m:BLK {blk_id: value.id})
SET m.confirmations = value.confirmations,
    m.strippedsize = value.strippedsize,
    m.time = value.time,
    m.weight = value.weight,
    m.mediantime = value.mediantime,
    m.difficulty = value.difficulty
FOREACH (node IN value.tx + [value] |
    MERGE (e:TX {txid: node.id})
    SET e.tx_locktime = node.locktime,
        e.time = apoc.date.format(value.time, 's', 'yyyy/MM/dd HH:mm:ss'),
        e.tx_version = node.version,
        e.tx_size = node.size,
        e.tx_vsize = node.vsize,
        e.tx_weight = node.weight
    FOREACH (out IN node.vout |
        FOREACH (adrc IN out.key.addresses |
            MERGE (a:ADDRESS {address: adrc})
        )
    )
)
WITH value
UNWIND value.tx AS transaction
UNWIND transaction.vout AS out
MATCH (n:TX {txid: transaction.txid})
MATCH (m:ADDRESS {address: out.key.addresses[0]})
MERGE (n)-[r:OUT]->(m)
ON CREATE SET r.id = value.id,
              r.output_value = out.value,
              r.index_out = out.n

Please suggest what I should create indexes on, and what is wrong with the query.

@anthapu, indexes might not be the problem here.

However, as far as I know, an index is useful for data fetching, not for data ingestion.
At the time of ingestion, if an index is enabled, the DB has to maintain the index in addition to storing the data.

This depends on how you're ingesting the data. Anthapu noted correctly that if you're only using CREATE statements, then the indexes won't be useful here, and that's more in line with your thinking on this.

However, in the query given we can see MERGE operations on the nodes, and since MERGE is like a MATCH (plus a CREATE if no match is found), it does incorporate data fetching as well. Without indexes on the following, the import will get slower as more nodes are added:

:BLK(blk_id)
:TX(txid)
:ADDRESS(address)
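For reference, those three indexes can be created as follows, using the Neo4j 3.x syntax that matches the rest of this thread (Neo4j 4.x+ would use `CREATE INDEX ... FOR (n:Label) ON (n.prop)` instead):

```cypher
CREATE INDEX ON :BLK(blk_id);
CREATE INDEX ON :TX(txid);
CREATE INDEX ON :ADDRESS(address);
```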

I tried both ways but get the same result:
with indexes the loading is slow, and without indexes as well.
What am I doing wrong here?
Is memory causing the problem? I am running this script on 8 GB of RAM and using the default neo4j.conf file.

Hi @andrew.bowman @anthapu

I noticed that the nodes are not the problem here.
The slowness comes from the relationships. I separated the node and relationship logic and found that node ingestion is very fast (4 min for 200,000 nodes), but the relationships take much longer. Is there any way to speed up this process?
I am using the query below for relationship creation:

call apoc.load.json("file:///Neo4j/blocks_set_001/'''+f_name+'''") YIELD value
UNWIND value.tx AS transaction
UNWIND transaction.vout AS out
MATCH (n:TX {txid: transaction.txid})
MATCH (m:ADDRESS {address: out.key.addresses[0]})
MERGE (n)-[r:OUT]->(m)
ON CREATE SET r.id = value.id,
              r.output_value = out.value,
              r.index_out = out.n

Hi Kunal,
When you create a relationship, the nodes on both sides are locked to stay ACID compliant. So it is possible your requests are getting serialized if you are running them in parallel.
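One way to keep the relationship MERGEs in controlled, serialized batches is to wrap the query above in apoc.periodic.iterate with `parallel: false`. This is only a sketch based on the poster's query: the file name is a placeholder and the batch size is a guess to tune:

```cypher
CALL apoc.periodic.iterate(
  // Stream one row per (transaction, vout) pair
  'CALL apoc.load.json("file:///Neo4j/blocks_set_001/block.json") YIELD value
   UNWIND value.tx AS transaction
   UNWIND transaction.vout AS out
   RETURN value, transaction, out',
  // Create the :OUT relationships in serial batches to limit lock contention
  'MATCH (n:TX {txid: transaction.txid})
   MATCH (m:ADDRESS {address: out.key.addresses[0]})
   MERGE (n)-[r:OUT]->(m)
   ON CREATE SET r.id = value.id,
                 r.output_value = out.value,
                 r.index_out = out.n',
  {batchSize: 1000, parallel: false}
)
```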