(Dutzu) #1


It has come to this point: how do we handle multi-tenancy?

Obvious choices:

  • Single-tenant: 1 instance per tenant (gets hairy once the number of tenants grows, and any data shared between tenants would take effort to keep in sync across all instances)
  • Tenant label: tedious and error-prone
  • Referenced Tenant node: perhaps a bit simpler to get your head around, but still tedious and error-prone

From what I saw in other threads, multi-tenancy is already on the roadmap, but it's probably still at least half a year away: Proper way to implement multi-tenancy on Neo4j

Another set of approaches is documented pretty nicely in this article:

I've also seen references suggesting that if we had used the Java OGM, Ruby with Neo4j.rb, or Gremlin (via PartitionStrategy) instead of Cypher, we could have achieved multi-tenancy.

Well, too little too late for us now. We have 4 NodeJS APIs that each communicate with Neo4j.

So, does anyone have a tip for us on how to approach this? Should we wait it out until Neo4j 4.0 is launched and until then deal with 1 db per tenant?

Is there any other "trick" we could use?

I was even thinking of something like authentication with different credentials per tenant, plus a trigger or something similar in Neo4j that would filter results depending on the user making the call.

From what I understood, it is possible to enforce this via subgraph access control.

Is it worth the hassle or is there a simpler, better way?

Thank you,

(M. David Allen) #2

It sounds to me like you've done your research and you have an accurate picture of the space -- and you're aware of most of the main options.

Note that in multi-tenancy setups, you're always creating some separation between graphs; the question is really just at what level. You can separate them at the label level, at the graph level within a single database (that's the feature that is coming in the next version of Neo4j), or at a physical level by putting them in different databases.

I think we've seen all of those approaches, each chosen according to how high a level of guaranteed separation is needed. Physical separation makes it as difficult as possible, if not impossible, for software errors in clients to reach the wrong data, while label or subgraph access control still does very well. So which you pick kinda depends on what level of assurance you need and how sensitive your data is. In regulated environments, for example, often nothing less than physical separation will do, in part because even the administrative folks behind the scenes need to be locked out of datasets they shouldn't see.

There's no "trick" per se. Only choices & tradeoffs. The easiest/simplest way with the lowest level of assurance is to apply a label for each graph to every node in that graph, and then ensure with your client software that all of your queries always constrain what they're looking for to that label at a minimum. E.g. you can have a :Graph1:Client and a :Graph2:Client but you never query for a :Client.
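As a rough illustration of the label approach, here is a minimal sketch of a helper that forces every node pattern to carry the tenant label. The function names and the identifier pattern are hypothetical and not part of any Neo4j API. Since Cypher does not accept labels as query parameters, the sketch validates identifiers before interpolating them into the query text.

```typescript
// Hypothetical helpers for illustration only; not part of the Neo4j driver.
// Assumption: tenant labels, node labels, and variables are plain identifiers.
const IDENTIFIER = /^[A-Za-z][A-Za-z0-9_]*$/;

function checkedIdentifier(value: string, kind: string): string {
  // Labels cannot be passed as query parameters, so reject anything
  // that could smuggle extra Cypher into the query text.
  if (!IDENTIFIER.test(value)) {
    throw new Error(`Invalid ${kind}: ${value}`);
  }
  return value;
}

// Builds a node pattern that always includes the tenant label,
// e.g. (c:Graph1:Client), so a bare :Client is never queried.
function tenantNode(tenant: string, variable: string, label: string): string {
  const t = checkedIdentifier(tenant, "tenant label");
  const v = checkedIdentifier(variable, "variable");
  const l = checkedIdentifier(label, "label");
  return `(${v}:${t}:${l})`;
}

// Property values still go through ordinary query parameters ($name).
const clientQuery =
  `MATCH ${tenantNode("Graph1", "c", "Client")} WHERE c.name = $name RETURN c`;
```

Routing every query through one helper like this keeps the tenant constraint in a single place, but it is still only as strong as the discipline of never writing a pattern by hand.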

The most complex/difficult method (but with the highest level of security assurance) will always be the physical separation. Everything else can be thought of as a midpoint on that continuum.

The key question is how much separation you need for your multi-tenancy and what you're willing to adopt to get it.

(Dhaks R) #3

Thanks for the well-researched question.

I too am looking for similar options, so I can't answer your question.
But I am thinking about different options.

Can we have multiple databases active at the same time in Neo4j? I came across the link below, but I am not seeing any official Neo4j documentation to go with its instructions. Any pointers appreciated.

(Dutzu) #4


We decided to opt for the complete separation of data. So we will have 1 instance per tenant, until you guys release the multi-tenancy feature later this year. If the timeline has changed, please tell me.

We have multiple microservices that connect to Neo4j, and those microservices are of course horizontally scaled.

What are your recommendations regarding performance optimizations in this scenario: connection pools, pool sizes, connection lifetime, etc.?
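To make the setup concrete, here is a minimal sketch of how each service instance might resolve a tenant to its own Neo4j instance. `TenantRegistry`, `Closable`, and the URI map are hypothetical names, and the factory is assumed to wrap whatever the driver uses to open a connection. The sketch assumes each handle owns its own connection pool, so every service replica ends up holding one pool per tenant it has served.

```typescript
// Hypothetical sketch; TenantRegistry is not part of the Neo4j driver API.
interface Closable {
  close(): Promise<void>;
}

class TenantRegistry<T extends Closable> {
  private readonly handles = new Map<string, T>();
  private readonly uris: Map<string, string>;
  private readonly create: (uri: string) => T;

  // uris: tenant id -> instance URI (assumed to come from configuration)
  // create: factory that opens a handle for a URI, e.g. a driver wrapper
  constructor(uris: Map<string, string>, create: (uri: string) => T) {
    this.uris = uris;
    this.create = create;
  }

  // Lazily creates one handle per tenant and reuses it afterwards.
  get(tenantId: string): T {
    const existing = this.handles.get(tenantId);
    if (existing !== undefined) {
      return existing;
    }
    const uri = this.uris.get(tenantId);
    if (uri === undefined) {
      throw new Error(`Unknown tenant: ${tenantId}`);
    }
    const handle = this.create(uri);
    this.handles.set(tenantId, handle);
    return handle;
  }

  // Closes every cached handle, e.g. on service shutdown.
  async closeAll(): Promise<void> {
    await Promise.all(Array.from(this.handles.values()).map((h) => h.close()));
    this.handles.clear();
  }
}
```

With this shape, the total number of open connections grows roughly with tenants × service replicas × pool size.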

Thank you