Memory and Performance implications of a multiple database deployment of neo4j 4

francofs · March 12, 2020, 9:42am

Hi, I have been digging around, trying to understand what the implications would be of having multiple databases as part of a segregated data tenancy business model.

To date we have approached this using labels within a single database. One advantage is that we can still connect the different tenancies to enrich the graph without compromising the data stewardship protection that labels provide.

With the arrival of Neo4j 4, a new possibility presented itself: hard separation of data in the form of separate databases, which can even be a compelling offering in a SaaS solution. We are considering it even though we would no longer be able to create direct relationships between tenancies, which can make our work more difficult if we want to provide features that depend on those relationships. Moreover, I am assuming that running graph algos across multiple databases won't be directly possible, which could create challenges when we want to extract value from a holistic view of the solution.

Now, with all of those considerations in mind, there are a few undocumented implications of going down that route (AFAIK, everything documented is here: Chapter 5. Manage databases):

How does this affect the number of open file descriptors on Linux? This is important to know because we have only one Neo4j instance, and it makes sense for it to stay that way for now and the near future. I expect we will have to maintain many databases under one instance, growing at a fairly rapid pace.
How does this impact memory consumption? Does having multiple databases add significant overhead? If so, under what conditions does this overhead increase? Do the memory considerations explained in 13.1. Memory configuration apply 100% per database, or does the instance itself account for the bulk of memory consumption?
What's the impact on performance? Considering that there will be I/O on a different file structure, possibly asynchronously, there are also implications for CPU usage during I/O operations. Are there any benchmarks already available?

All of these have a huge impact and I would love to make the most informed decision possible. So far I have been unable to find any online resources that explore these questions.
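
Lacking published figures, the file-descriptor question can at least be measured directly. Below is a minimal sketch, assuming a Linux host where `/proc/<pid>/fd` lists a process's open descriptors. The function name and the `proc_root` parameter are hypothetical, chosen for illustration, and nothing here comes from the Neo4j documentation:

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of file descriptors currently open in process `pid`.

    Assumes a Linux-style procfs layout. `proc_root` is an illustrative
    parameter so the function can be pointed at a different directory.
    Reading another user's process usually requires elevated privileges.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    # Each entry in the fd directory corresponds to one open descriptor.
    return len(os.listdir(fd_dir))
```

Sampling `count_open_fds(<neo4j pid>)` before and after a `CREATE DATABASE` would give a rough per-database delta on a given setup, which can then be compared against the limit reported by `ulimit -n`.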

Finally, what is the picture of labels vs. databases, resource-wise? I have a feeling that in practice, considering a single Neo4j instance on the same hardware, labels will always be more performant than multi-database. This is very important as we want to keep our infra costs low at this moment.
I already understand that using multiple databases is easier to scale and inherently more secure, so that is not a point I would like to discuss here.

Thanks for any contributions to the discussion.
Best Regards,

Fábio

YMA-MDL · April 28, 2021, 2:22pm

Hi Fabio, did you get any answer, maybe through other media/discussions?

francofs · April 30, 2021, 8:32am

Hi @YMA-MDL, @jim.webber replied to me in his 4.0 GA announcement: Introducing Neo4j Graph Database 4.0 [GA Release]

But it is still a bit difficult to grasp, as projecting shared vs. unshared resources is hard in an initial project phase where these are mostly unknowns and metrics are mostly guesswork.

What I would love to see are a few example scenarios where one approach has a clear advantage over the other. It would then be easier to correlate them with my use cases and to balance trade-offs more effectively (including functional ones that are a consequence of disconnected graphs).

Given that hardware comes in many flavors and sizes, being able to visualize at which point the "unshared" resources could become a bottleneck and start tipping the performance needle towards a single database would be most insightful.

So, to conclude: while this was answered, I still feel there should be a better guide on this area in the Neo4j docs.
