How do I efficiently upload/import a large number of nodes (billions) every week into a Neo4j graph database?

setu.96.shah · February 15, 2024, 1:05am

We already have a Neo4j graph database set up with 1.25M nodes in it.
Going forward, we want to update the database with roughly a billion nodes every week.
What is the best way to do this?

We currently run our update jobs with MERGE queries, but that is not going to scale once the number of nodes reaches the millions or billions.
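A common way to make a MERGE-based job scale better is to send rows in batches and MERGE each batch with UNWIND, rather than running one query per node. Below is a minimal sketch that assumes the official neo4j Python driver (5.x, which provides Driver.execute_query), a hypothetical :Product label keyed on a hypothetical id property, and a batch size of 10,000 chosen only for illustration. The chunking is plain Python, so you can check it without a server.

```python
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical label/property names; adapt to your own schema.
# Each row is expected to look like {"id": ..., "props": {...}}.
UPSERT_QUERY = """
UNWIND $rows AS row
MERGE (n:Product {id: row.id})
SET n += row.props
"""


def batched(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def upsert_all(driver, rows: Iterable[dict], size: int = 10_000) -> int:
    """Send each batch as a single UNWIND ... MERGE query; return rows sent."""
    sent = 0
    for chunk in batched(rows, size):
        # Keyword arguments to execute_query become query parameters ($rows).
        driver.execute_query(UPSERT_QUERY, rows=chunk)
        sent += len(chunk)
    return sent
```

The batch size is a tuning knob. Larger batches mean fewer round trips but bigger transactions, so measure against your own data and heap settings.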

dana_canzano · February 15, 2024, 1:14am

@setu.96.shah

Does the node count grow without bound, or are you always updating 1 billion nodes every week, so the node count stays relatively fixed?

And how much RAM does your instance have?

Also, when using MERGE, be sure to create an index on the label/property:

For performance reasons, creating a schema index on the label or property is highly recommended when using MERGE. See Create, show, and delete indexes for more information.
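For the :Person(name) example discussed in this thread, the index can be created with Cypher along the lines of CREATE INDEX person_name IF NOT EXISTS FOR (n:Person) ON (n.name), which is the syntax recent Neo4j versions use. The sketch below wraps that statement in a small Python helper so you can reuse it for other label/property pairs, including composite ones. The helper and the index name person_name are hypothetical. It rejects identifiers that would need backtick quoting instead of trying to escape them.

```python
import re

# Plain identifiers only; anything else would need backtick quoting in Cypher.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_index_statement(name: str, label: str, *props: str) -> str:
    """Build a CREATE INDEX statement on one or more properties of a label."""
    if not props:
        raise ValueError("at least one property is required")
    for ident in (name, label, *props):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"identifier is invalid or needs quoting: {ident!r}")
    columns = ", ".join(f"n.{p}" for p in props)
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON ({columns})"
```

You would run the resulting string once, for example from cypher-shell or through the driver, before the MERGE jobs start.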

setu.96.shah · February 15, 2024, 8:14pm

Thank you for the reply @dana_canzano
The RAM for the instance is 115410136 kB (about 110 GiB).
From what we know now, the number of nodes to be updated every week will stay relatively constant, at or below 1B.

Thank you for the indexing resource. While I look into it, could you give some more information on whether indexing will help if we reach a point where we start accumulating new-node debt?

New-node debt: when the previous update job overruns its one-week window, we end up with a backlog of new nodes from the catalog that still need to be added. This backlog could spiral as each run falls further behind.
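The new-node debt described above behaves like a simple queue. If one weekly run can process at most C nodes but A new nodes arrive each week, the backlog grows by A - C per week whenever A > C. Here is a minimal sketch using made-up numbers, not measurements.

```python
def backlog_by_week(arrivals: list[int], capacity: int) -> list[int]:
    """Backlog left after each weekly run, given the nodes arriving each
    week and the maximum number of nodes one run can process."""
    backlog = 0
    history = []
    for arrived in arrivals:
        pending = backlog + arrived
        processed = min(pending, capacity)
        backlog = pending - processed
        history.append(backlog)
    return history


# Hypothetical: 1B nodes arrive weekly, but a run only gets through 900M.
print(backlog_by_week([1_000_000_000] * 4, 900_000_000))
# [100000000, 200000000, 300000000, 400000000]
```

Once capacity falls below the arrival rate, the backlog grows linearly and never recovers. The only fix is to raise the per-run capacity, which means making each node's update cheaper.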

Also, from Neo4j's standpoint, is it more effective to update 1B nodes weekly or a couple of million daily?
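A quick piece of arithmetic helps frame this question. Whatever the schedule, the sustained rate needed to absorb 1B nodes per week is the same. The sketch below computes it for one weekly run and for seven daily runs.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def required_rate(nodes: float, window_seconds: int) -> float:
    """Nodes per second needed to process `nodes` within the window."""
    return nodes / window_seconds


weekly = required_rate(1_000_000_000, SECONDS_PER_WEEK)
daily_batch = 1_000_000_000 / 7  # the same weekly volume split into 7 runs
daily = required_rate(daily_batch, SECONDS_PER_DAY)
print(f"{weekly:.1f} nodes/s weekly, {daily:.1f} nodes/s daily, "
      f"{daily_batch:,.0f} nodes/day")
# 1653.4 nodes/s weekly, 1653.4 nodes/s daily, 142,857,143 nodes/day
```

Daily runs therefore do not reduce the total work. They only make each run smaller, so a single overrun leaves a smaller backlog. Also note that "a couple of million" per day adds up to only about 14M per week. Keeping pace with 1B per week takes roughly 143M per day.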

Thanks kindly,
Setu

dana_canzano · February 15, 2024, 9:15pm

@setu.96.shah

Regarding MERGE and indexes: MERGE is effectively

find the node; if it exists, update it
if not, create the node

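That find-or-create behaviour can be modelled in a few lines of Python, using a dict to stand in for an index on the merge key. This is only an analogy for the semantics of MERGE ... SET, not how Neo4j implements it, and merge_node is a hypothetical name.

```python
def merge_node(store: dict, key, updates: dict) -> tuple[dict, bool]:
    """Find the node by key and update it; create it first if missing.
    Returns (node, created)."""
    node = store.get(key)          # "find a node"
    created = node is None
    if created:                    # "if not, create node"
        node = {"key": key}
        store[key] = node
    node.update(updates)           # the SET part applies in both cases
    return node, created


people = {}
merge_node(people, "setu.96.shah", {"status": "active"})   # creates
merge_node(people, "setu.96.shah", {"status": "active"})   # finds, updates
```

The find step is cheap here only because the dict is keyed on the merge property. Without that, every call would have to search all stored nodes.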
To find a node quickly, an index will help.
For example, say you have 1B nodes with the label :Person and you run:

MERGE (n:Person {name:'setu.96.shah'}) SET n.status='active';

Without an index on :Person(name), it will look at all 1B nodes to determine whether any :Person node has name = 'setu.96.shah', i.e. a full label scan over 1B nodes. If you create an index on :Person(name), we will look at far, far fewer than 1B nodes, because the index can tell us much faster whether such a node exists.
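To make the difference concrete, this toy sketch counts how many entries are examined when looking up one name with a full scan versus a binary search over sorted keys. The binary search stands in for a tree-structured index. The data is synthetic and the counts describe this toy model, not Neo4j itself. The point carries over, though: an indexed lookup does not touch every node.

```python
def scan_lookup(names: list[str], target: str) -> tuple[bool, int]:
    """Examine every entry, like a label scan with no usable index."""
    examined = 0
    found = False
    for name in names:
        examined += 1
        if name == target:
            found = True
    return found, examined


def index_lookup(sorted_names: list[str], target: str) -> tuple[bool, int]:
    """Binary search over sorted keys, standing in for a tree index."""
    lo, hi = 0, len(sorted_names)
    examined = 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if sorted_names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_names) and sorted_names[lo] == target
    return found, examined


# Zero-padded names are already in sorted order.
names = [f"user{i:07d}" for i in range(1_000_000)]
print(scan_lookup(names, "user0999999"))   # (True, 1000000)
print(index_lookup(names, "user0999999"))  # found after at most 20 comparisons
```

The gap widens as data grows. A scan grows linearly with the node count, while the sorted lookup grows with its logarithm, so 1B keys need only about 30 comparisons.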

setu.96.shah · February 18, 2024, 12:04am

Thank you for the great direction @dana_canzano
So far in testing, it is proving to be very helpful.

I do have a question about composite node range indexes on multiple properties: does adding more properties to the index make searches faster?

In general, which factors (parameters) of an index affect search performance, positively or negatively?
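As a general illustration (not Neo4j's implementation), you can picture a composite index as a sorted list of tuples. It is ordered by the first property, then the second, and so on. A lookup that fixes the leading properties can jump straight to the matching range, but a lookup on a later property alone cannot use that ordering and has to scan. Whether Neo4j's planner actually uses a given composite index depends on the query's predicates and the Neo4j version, so check your real query with EXPLAIN or PROFILE. The sample entries below are hypothetical.

```python
from bisect import bisect_left

# Hypothetical entries of a composite index on (name, status),
# kept sorted the way the index would be.
entries = sorted([
    ("alice", "active"), ("alice", "inactive"),
    ("bob", "active"), ("carol", "active"), ("carol", "inactive"),
])


def prefix_range(index: list[tuple], prefix: tuple) -> list[tuple]:
    """Entries whose leading fields equal `prefix`, located by binary search."""
    start = bisect_left(index, prefix)
    end = start
    while end < len(index) and index[end][:len(prefix)] == prefix:
        end += 1
    return index[start:end]


def scan_by_second(index: list[tuple], status: str) -> list[tuple]:
    """A predicate on the second field alone cannot use the order: full scan."""
    return [e for e in index if e[1] == status]


print(prefix_range(entries, ("carol",)))
# [('carol', 'active'), ('carol', 'inactive')]
```

In this model, extra properties help only queries that filter on them in index order. For every other query they just make each entry larger, and each write to an indexed property has to update the index as well.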

Thank you.
