We already have a Neo4j graph database set up with 1.25M nodes in it.
Going forward, I want to update the database with roughly a billion nodes every week.
What is the best way to do this?
We currently run update jobs using 'MERGE' queries, but that is not going to work when the number of nodes goes into the millions or billions.
@setu.96.shah
Does the node count grow without end? Or are we always updating 1 billion nodes every week, so that the node count remains relatively fixed?
And how much RAM does your instance have?
Also, when using MERGE, be sure to create an index on the label/property:
For performance reasons, creating a schema index on the label or property is highly recommended when using MERGE. See Create, show, and delete indexes for more information.
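As a rough illustration of the advice above, here is a minimal Python sketch that builds the index statement and a parameterized MERGE to go with it. The label `Person`, the property `name`, the generated index name, and the helper `index_statement` are all assumptions for illustration, not taken from your schema. The Cypher uses the `CREATE INDEX ... FOR ... ON` form available in Neo4j 4.x and later.

```python
def index_statement(label: str, prop: str) -> str:
    """Build a CREATE INDEX statement for one label/property pair.

    Label and property names cannot be passed as Cypher parameters, so they
    are interpolated here; only use trusted, hard-coded names.
    """
    index_name = f"{label.lower()}_{prop}"  # hypothetical naming scheme
    return (
        f"CREATE INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop})"
    )


# Parameterized MERGE that looks nodes up by the indexed property.
# $name and $status are query parameters supplied at run time.
MERGE_QUERY = (
    "MERGE (n:Person {name: $name}) "
    "SET n.status = $status"
)

if __name__ == "__main__":
    print(index_statement("Person", "name"))
    print(MERGE_QUERY)
```

The index should exist before the update job starts, so that each MERGE can use it for the lookup.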
Thank you for the reply, @dana_canzano.
The RAM for the instance is 115410136 kB (about 110 GiB).
From what we know now, the number of nodes to be updated every week will remain relatively constant at <=1B.
Thank you for the indexing resource. While I look into it, can you provide some more information on whether indexing will help if we reach a stage where we start going into new-node debt?
New-node debt: when an update job overruns its one-week window, we are left with a backlog (debt) of new nodes from the catalog that still need to be added. This might spiral into an ever-increasing debt of new nodes.
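To put rough numbers on this, here is a minimal sketch of the arithmetic. It assumes a constant 1B new or updated nodes arriving every week and a constant job throughput; the 1,500 nodes/s figure in the usage lines is made up for illustration, not a measurement. The function names are my own.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def required_rate(nodes_per_week: int) -> float:
    """Sustained nodes/second needed to finish one week's load within a week."""
    return nodes_per_week / SECONDS_PER_WEEK


def debt_after(weeks: int, nodes_per_week: int, achieved_rate: float) -> int:
    """Backlog after `weeks`, given a constant achieved rate in nodes/second."""
    capacity = int(achieved_rate * SECONDS_PER_WEEK)  # nodes processed per week
    debt = 0
    for _ in range(weeks):
        # Each week adds a full load; whatever exceeds capacity carries over.
        debt = max(0, debt + nodes_per_week - capacity)
    return debt


if __name__ == "__main__":
    print(round(required_rate(1_000_000_000)))    # about 1653 nodes/s
    print(debt_after(4, 1_000_000_000, 1500))     # 371200000 nodes behind
```

So the job has to sustain roughly 1,650 nodes per second around the clock just to break even; anything slower adds the same shortfall to the debt every week.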
Also, from the Neo4j standpoint, is it more effective to update 1B nodes weekly or a couple of million daily?
Thanks kindly,
Setu
@setu.96.shah
Regarding MERGE and indexes: MERGE is effectively
- find a node and, if it exists, update it
- if not, create the node
so an index helps with the "find" step.
For example, let's say you have 1B nodes with the label :Person and you run
merge (n:Person {name:'setu.96.shah'}) set n.status='active';
Without an index on :Person(name), it will look at all 1B nodes to determine whether any :Person node has name = 'setu.96.shah'. That is a full label scan over 1B nodes. If you index on :Person(name), we will look at far fewer than 1B nodes, because the index can tell us much faster whether such a node exists.
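The difference can be sketched in plain Python. This is only an analogy, not how Neo4j is implemented: a list stands in for the unindexed :Person nodes, a dict stands in for the index (real range indexes are tree-based rather than hash-based), and both function names are made up for the example.

```python
def merge_with_scan(nodes, name, status):
    """Find-or-create by checking every node; returns how many were examined."""
    examined = 0
    for node in nodes:
        examined += 1
        if node["name"] == name:
            node["status"] = status  # found: update in place
            return examined
    # Not found after looking at everything: create the node.
    nodes.append({"name": name, "status": status})
    return examined


def merge_with_index(index, nodes, name, status):
    """Find-or-create using a name -> node lookup instead of a scan."""
    node = index.get(name)  # single lookup, no pass over `nodes`
    if node is None:
        node = {"name": name}
        nodes.append(node)
        index[name] = node  # the index must be maintained on create
    node["status"] = status


if __name__ == "__main__":
    nodes = [{"name": f"user{i}"} for i in range(1000)]
    print(merge_with_scan(nodes, "setu.96.shah", "active"))  # examines all 1000

    index = {n["name"]: n for n in nodes}
    merge_with_index(index, nodes, "setu.96.shah", "inactive")
    print(nodes[-1]["status"])
```

Note that the index is not free: it has to be updated on every create, which is the usual trade-off between lookup speed and write cost.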
Thank you for the great direction, @dana_canzano.
In testing so far, it is proving to be very helpful.
I did have a question about a 'composite node range index on multiple properties': does adding more properties to the index make searching faster?
In general, what factors (parameters) in the indexing affect search performance, positively or negatively?
Thank you.