Multi-Tenancy on Neo4j

What's the latest on multi-tenancy?

Also, if I have two tenants who, in addition to their own data, need to share common data stored in a third graph, will each tenant's nodes be able to have relationships with nodes in the third graph?
For example, the common graph has Person nodes and the other graphs have Organization nodes; Persons are MEMBER_OF Organizations. I would not want to duplicate Persons in each organization graph.

You're JUST a bit early for getting the latest on this one. If you can hold on just a little while longer we'll have some interesting things to share about this.

I cannot speak to your question about multi-tenancy, except to describe what I currently do to enable that scenario in Neo4j 3.x. The beauty of being schema-less, and of leveraging what we can do now, is that I have three graphs in one database. For instance, one graph contains my 'Library' data and another my 'Language' data; these are two unrelated data sources that share a 'Common' graph composed of reference data used by both. All nodes carry one of these three labels (and edges are tagged to match), which acts as a schema identifier, since my data interacts with PostgreSQL. Each node also has a second label denoting its first degree of identity, such as 'Person', and each edge has its corresponding type as well. This makes the Cypher statements easier to manage and the data easier to work with. So, at the very least, I have three separate graphs in one instance of Neo4j. You could do something like this until Neo Tech comes up with an official approach that provides 'schema-like' structures in Neo4j.
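
A minimal sketch of the label-per-graph convention described above, assuming every node carries one schema label (Library, Language or Common) plus an identity label such as Person. The helper name scoped_match is hypothetical and only builds Cypher text; it relies on the fact that a node pattern with several labels only matches nodes carrying all of them.

```python
# Schema labels from the post; each node is assumed to carry exactly one.
SCHEMA_LABELS = {"Library", "Language", "Common"}


def scoped_match(schema: str, identity: str, var: str = "n") -> str:
    """Build a MATCH clause restricted to one logical graph (hypothetical helper).

    A pattern such as (p:Common:Person) matches only nodes that have both
    labels, so the schema label acts as the graph boundary inside the
    single database.
    """
    if schema not in SCHEMA_LABELS:
        raise ValueError(f"unknown schema label: {schema}")
    return f"MATCH ({var}:{schema}:{identity})"
```

For example, scoped_match("Common", "Person", "p") yields MATCH (p:Common:Person). Queries that span graphs simply name a different schema label on each end of the pattern.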

Thanks for your response. This is what I do now, except that my graphs represent different organizations, so there is a need for privacy. For example, where the common graph holds :Person nodes and a person belongs to multiple organizations, one could traverse into the nodes of both organizations by querying that member's relationships. Separate physical databases are not an option, since that would require updating every database whenever the common data changes. Let's see what comes forth from Neo.

Thanks Andrew, looking forward to it.

We use spring-data-neo4j and OGM, and have been using this extension successfully for several months to implement multi-tenancy on a single server instance: https://github.com/meistermeier/neo4j-ogm-label-extension

Essentially, it automatically appends an extra label to every node in your query. In your session config you can specify a hard-coded tenant id, or use a Supplier object to make it dynamic. It works well for most queries, although I have found a couple of cases where it does not (e.g. reusing the same variable name across WITH clauses, or inside an apoc.do.when). Otherwise it's pretty solid, and we have had it in production for several months.
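
To make the idea concrete, here is a deliberately naive sketch of "append a tenant label to every node pattern", with a callable standing in for the Supplier. It is not the extension's implementation (which isn't shown in this thread); the regex, the function name add_tenant_label and the TenantA label are all assumptions for illustration, and it only touches node patterns that already have a label.

```python
import re
from typing import Callable

# Start of a labelled node pattern, e.g. "(p:Person" or "(:Org:Unit".
# Relationship patterns use square brackets and are never matched,
# mirroring the note that only nodes receive the extra label.
_LABELLED_NODE = re.compile(r"\(\s*([A-Za-z_]\w*)?\s*((?::\w+)+)")
_SAFE_LABEL = re.compile(r"[A-Za-z_]\w*")


def add_tenant_label(cypher: str, tenant: Callable[[], str]) -> str:
    """Append the current tenant's label to every labelled node pattern.

    Hypothetical helper: `tenant` is called once per query, like a Supplier.
    """
    label = tenant()
    if not _SAFE_LABEL.fullmatch(label):
        raise ValueError(f"unsafe tenant label: {label!r}")

    def tag(m: "re.Match[str]") -> str:
        # Keep the variable and existing labels, then add the tenant label.
        return f"({m.group(1) or ''}{m.group(2)}:{label}"

    return _LABELLED_NODE.sub(tag, cypher)
```

With lambda: "TenantA", the query MATCH (p:Person)-[:MEMBER_OF]->(o:Organization) RETURN p, o becomes MATCH (p:Person:TenantA)-[:MEMBER_OF]->(o:Organization:TenantA) RETURN p, o. A bare (n) pattern is left alone, and text inside string literals (such as a query passed to apoc.do.when) would be rewritten blindly, which shows why plain text rewriting has edge cases.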

I do not believe it will work for the case you describe, however. If you are mixing non-tenant nodes with tenant-specific nodes in the same query, it's not smart enough to know which nodes should get the label and which should not (as far as I know, anyway). Also note that it does not add labels to relationships, only to nodes.
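
The mixed case could in principle be handled by exempting labels that belong to the shared graph. The sketch below is hypothetical (nothing in this thread says the extension offers it), reuses the same naive pattern matching, and assumes Person is the shared label. It is only a query-rewriting convenience, not a security boundary: anyone able to send raw Cypher bypasses it.

```python
import re
from typing import AbstractSet, Callable

# Start of a labelled node pattern, e.g. "(o:Organization".
_LABELLED_NODE = re.compile(r"\(\s*([A-Za-z_]\w*)?\s*((?::\w+)+)")
_SAFE_LABEL = re.compile(r"[A-Za-z_]\w*")


def add_tenant_label_except_shared(
    cypher: str, tenant: Callable[[], str], shared: AbstractSet[str]
) -> str:
    """Tag tenant-owned node patterns, leaving shared-data patterns unchanged.

    Hypothetical helper: `shared` lists labels of the common graph.
    """
    label = tenant()
    if not _SAFE_LABEL.fullmatch(label):
        raise ValueError(f"unsafe tenant label: {label!r}")

    def tag(m: "re.Match[str]") -> str:
        labels = m.group(2)  # e.g. ":Organization" or ":Person:Employee"
        if set(labels[1:].split(":")) & shared:
            return m.group(0)  # shared node: keep the pattern as written
        return f"({m.group(1) or ''}{labels}:{label}"

    return _LABELLED_NODE.sub(tag, cypher)
```

With shared={"Person"}, a traversal from a shared Person to (o:Organization) only reaches Organization nodes that also carry the current tenant's label, which speaks to the privacy concern raised earlier, but only for patterns that name a label.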

I can actually speak to SOME things, though not all, at this time.

We did publicly announce Neo4j 4.0 MR2, a pre-alpha release, and as such some 4.0 features have been publicly revealed.

We announced support for multiple databases per server (active and accessible at the same time), and you could use these to support multi-tenancy. However, the databases are separate from each other, and transactions are isolated between databases, so we do not allow relationships to connect across different databases.
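
A minimal sketch of the database-per-tenant layout, assuming tenant ids are mapped to database names by a hypothetical scheme. It only builds the CREATE DATABASE administration commands, which are run against the system database (multiple databases being, as far as I know, an Enterprise Edition feature). The naming restrictions in the comments are a conservative reading of the 4.x rules; check the Operations Manual for the exact ones.

```python
import re
from typing import Iterable, List


def tenant_database(tenant_id: str) -> str:
    """Derive a database name for a tenant (hypothetical naming scheme).

    Keeps only lowercase ASCII letters and digits and adds a "tenant"
    prefix, so the name starts with a letter, needs no backtick quoting
    in Cypher, and stays within a 3-63 character length.
    """
    cleaned = re.sub(r"[^a-z0-9]", "", tenant_id.lower())
    name = "tenant" + cleaned
    if not cleaned or len(name) > 63:
        raise ValueError(f"cannot derive a database name from {tenant_id!r}")
    return name


def provisioning_statements(tenant_ids: Iterable[str]) -> List[str]:
    """Administration commands, to be executed against the system database."""
    return [f"CREATE DATABASE {tenant_database(t)}" for t in tenant_ids]
```

Ids such as "Acme-Corp" and "acmecorp" collide under this scheme, so a real system would likely keep an explicit tenant-to-database mapping. And since relationships cannot cross databases, shared Person data would have to be copied into each tenant database or referenced by an id property.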

There are some additional things coming around this area that may be relevant to your use cases, stay tuned.

MR2 also introduced fine-grained security, applied through the new system database (a secondary db that is always present and used for administrative/security purposes with its own DDL for system operations).

You could implement multi-tenancy by configuring users and roles, so that tenants are represented as separate users/roles with differently configured permissions granting or denying what they can read, write, and traverse. With this approach all tenant data lives in the same database, so the common parts of the graph need no tenant-specific restrictions. You would need to make sure users and roles are configured correctly for every tenant, along with the permissions for each tenant role. You would also need to keep the labels and/or properties those permissions depend on current: if a label is what determines tenant visibility, you don't want to forget that label when adding or modifying data in your graph.
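
A sketch of what a per-tenant role might look like, assuming every tenant-owned node carries its tenant's label (for example :Organization:Acme) and shared nodes carry Person. The privilege syntax follows the 4.0 Cypher administration commands as I understand them (MATCH combines TRAVERSE and READ, and DENY takes precedence over GRANT); it may differ in the pre-release milestones, so treat it as an assumption. The role naming and helper name are hypothetical.

```python
from typing import Iterable, List, Sequence


def tenant_role_statements(
    tenant: str,
    all_tenants: Iterable[str],
    shared_labels: Sequence[str] = ("Person",),
    graph: str = "neo4j",
) -> List[str]:
    """Privilege commands for one tenant role, run against the system database.

    Hypothetical helper; the syntax is assumed from the 4.0 Cypher manual.
    """
    role = f"tenant_{tenant.lower()}"
    stmts = [
        f"CREATE ROLE {role}",
        f"GRANT ACCESS ON DATABASE {graph} TO {role}",
    ]
    # MATCH {*} grants traversal plus reading of all properties.
    for label in [*shared_labels, tenant]:
        stmts.append(f"GRANT MATCH {{*}} ON GRAPH {graph} NODES {label} TO {role}")
    stmts.append(f"GRANT MATCH {{*}} ON GRAPH {graph} RELATIONSHIPS * TO {role}")
    # Denies win over grants, so other tenants' nodes stay hidden even when
    # a traversal starts from a shared Person.
    for other in all_tenants:
        if other != tenant:
            stmts.append(f"DENY TRAVERSE ON GRAPH {graph} NODES {other} TO {role}")
    return stmts
```

Each tenant's login would then be given its role with GRANT ROLE tenant_acme TO <user>. The sketch only covers reads; write privileges would need their own statements.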

As noted, more things to talk about later, but this is what's available to look at right now. Feel free to go over the documentation linked on the MR2 site I provided above, and/or download the MR2 release (standalone or via the canary Neo4j Desktop release), that can get you started!

Thanks for that information Bob.

Andrew, my use case absolutely requires non-tenant-specific nodes, so I will be following your progress.

Awesome!

For my use case, I definitely need multi-tenancy with OGM, so I'm thrilled to hear more! :smiley:

Hello. Previously, with the milestone 2 release for 4.0, I talked a bit about multi-database and schema-based security. As mentioned, both can be used to implement a multi-tenancy system: one uses isolated databases running on the same server (multi-database), the other configures visibility and permissions for user roles, so the data still lives together in the same database but different roles have different views of it.

We've just made public our beta milestone 3 release for 4.0 (MR3), which also adds full documentation in the 4.0 Operations and Cypher manuals, and more.

The MR3 release also includes Neo4j Fabric, a converged platform that supports the storage, processing, analysis and management of data distributed and stored in multiple Neo4j databases.

Or put more simply, you can use Neo4j Fabric to make Cypher queries that can store and retrieve data in multiple federated and sharded graphs.

This can work in tandem with the previously announced multi-database feature, but is not limited to it.

So, for example, you could have a single Neo4j instance with multiple databases defined, and via configuration also add a Fabric database to the same instance, assigning any number of those databases as Fabric graphs that are queryable via the Fabric database. You can then connect to the Fabric database and run queries against the Fabric graphs (in parallel for reads, though for writes only a single Fabric graph can be written to in a transaction). The individual Fabric graphs can still be connected to and queried individually without any changes: being set up as Fabric graphs requires no changes to their configuration or operation, and they don't even need to be aware that they're being queried this way.
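
A sketch of the single-instance setup just described. It assembles the neo4j.conf lines that register databases as Fabric graphs and a read query fanned out over all of them. The setting names (fabric.database.name, fabric.graph.N.uri/.database/.name) and the graphIds()/graph() functions are how I recall the 4.0 Fabric documentation, so treat them as assumptions; the graph names, database names and the Organization label are hypothetical.

```python
from typing import List, Tuple


def fabric_config(fabric_db: str, graphs: List[Tuple[str, str, str]]) -> str:
    """neo4j.conf lines registering Fabric graphs (setting names assumed).

    graphs: (name, bolt_uri, database) triples. Local databases use the
    instance's own URI; a remote graph would only need a different URI.
    """
    lines = [f"fabric.database.name={fabric_db}"]
    for i, (name, uri, database) in enumerate(graphs):
        lines += [
            f"fabric.graph.{i}.uri={uri}",
            f"fabric.graph.{i}.database={database}",
            f"fabric.graph.{i}.name={name}",
        ]
    return "\n".join(lines)


def count_per_graph(fabric_db: str, label: str) -> str:
    """Read query fanned out over every registered Fabric graph."""
    return (
        f"UNWIND {fabric_db}.graphIds() AS gid\n"
        "CALL {\n"
        f"  USE {fabric_db}.graph(gid)\n"
        f"  MATCH (n:{label})\n"
        "  RETURN count(n) AS total\n"
        "}\n"
        "RETURN gid, total"
    )
```

The query would be sent to the Fabric database itself, while the underlying databases keep serving their own clients unchanged.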

The above is the simplest case, using a single Neo4j instance for both the Fabric database and the multiple Fabric graphs that can be queried upon.

But any graph that can be reached via Bolt can be used as a Fabric graph, whether it's local, as in the above example, or remote. You can use Neo4j Fabric to connect to and query remote graphs, whether they're remote standalone instances (potentially hosting multiple databases) or clustered databases, so you could use Fabric to issue queries across multiple clusters, or some mix of clusters and standalone instances.

This can support cases where the graphs being used have discrete kinds of data (Customers graph, Products graph, Sales graph) in a federated manner, or if you're sharding using multiple graphs for the same kind of data, and want a unified way to query upon that data.
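
Applied to the question that opened this thread, a federated layout would keep Person in a common graph and Organizations in tenant graphs. Relationships still cannot span graphs, so the two sides can only be joined by value. This sketch assumes a hypothetical model in which the tenant graph stores (:Member {personId})-[:MEMBER_OF]->(:Organization) and personId refers to Person.id in the common graph. Placing USE before the importing WITH in the correlated subquery reflects my reading of the 4.0 Fabric docs and should be checked.

```python
def members_with_orgs(fabric_db: str, common: str, tenant: str) -> str:
    """Correlated Fabric query joining a shared graph with a tenant graph.

    Hypothetical model: Person {id, name, email} lives in the common graph;
    the tenant graph refers to it through Member.personId.
    """
    return (
        "CALL {\n"
        f"  USE {fabric_db}.{common}\n"
        "  MATCH (p:Person {email: $email})\n"
        "  RETURN p.id AS personId, p.name AS name\n"
        "}\n"
        "CALL {\n"
        f"  USE {fabric_db}.{tenant}\n"
        "  WITH personId\n"
        "  MATCH (m:Member {personId: personId})-[:MEMBER_OF]->(o:Organization)\n"
        "  RETURN o.name AS org\n"
        "}\n"
        "RETURN name, org"
    )
```

Because each subquery targets one graph, a tenant's query never reaches another tenant's graph unless it names it, though the application still has to choose which tenant graph to use.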

In any of these cases, the instances or clusters do not need to be aware that they are being used as Fabric graphs, and no extra configuration or changes to usage are needed on their side to be used as such. Users can continue to connect to and query these individual instances and clusters normally.

Thanks for the update Andrew. I will review the release.

-Michael

I am interested in the multi database per server feature to use for multi tenancy. I have been going through the documentation for MR3 and I was having trouble finding examples of how to choose the database in Java. Finally I found this PDF from MR2: https://neo4j.com/wp-content/themes/neo4jweb/assets/images/Neo4j_EE_4.0_MR2_Doc.pdf

It has a good example of how to select your database in Java in section 5.2.1.

I was looking for a similar PDF for MR3 but could not find it; I found the MR2 PDF through a Google search, and I'm not sure how you would get to it by navigating the site. I would like more information on this topic for MR3. I'm also curious whether there might be a way to switch databases while using Spring Data repositories.

I found that the developer documentation for SDN/RX includes details on the Neo4jClient, which supports selecting the database to run your queries against (https://neo4j.com/developer/spring-data-neo4j-rx/#_usage). This is great for ad hoc queries. I would really like to be able to specify a database when I call a repository method, or to set the database in some kind of context that is used for each request. Are there any plans for propagating the database selection to the Spring Data repository model? Right now I'm considering using the Neo4jClient exclusively, since it supports database selection, but the code is not as clean as it would be if I could use the SDN/RX repositories.

Thanks for the update Andrew.
I am very interested in a native multi-tenancy approach with Neo4j, as we currently use an _owner property to distinguish who can interact with each node.

Take the example of a banking app, where Transactions should be visible only to the owner of the account. Currently I would do MATCH (t:Transaction {_owner: $ownerId}).
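
A small sketch of the owner-property approach, building the query and its parameters separately. Passing the owner id as a parameter rather than splicing it into the text avoids Cypher injection and lets the server reuse one cached plan for every owner. The helper name and the timestamp property are hypothetical.

```python
from typing import Any, Dict, Tuple


def owned_transactions(owner_id: str, limit: int = 100) -> Tuple[str, Dict[str, Any]]:
    """Return (query, parameters) listing one owner's transactions.

    Hypothetical helper; `t.timestamp` is an assumed property.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    query = (
        "MATCH (t:Transaction {_owner: $ownerId}) "
        "RETURN t ORDER BY t.timestamp DESC LIMIT $limit"
    )
    return query, {"ownerId": owner_id, "limit": limit}
```

With the Python driver the pair could be passed straight to session.run(query, params). An index on :Transaction(_owner) keeps this lookup cheap as the number of owners grows.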

My question is: in Neo4j 4.0, if we were to use a separate database for each bank account, would that be a scalable solution? An app could need thousands, if not millions, of databases. Is this a good approach? If not, what practice is commonly used to handle cases like the banking app described above?

Thank you all for all the help!

Hi Valerio,

I'll need to check on that one, but I don't believe we're aiming for that kind of scaling for multi-database. I think schema-based security may be a better fit, with roles added (and separate logins per role) for whatever read/write/traverse permissions make sense for the graph, based on labels and properties.

Andrew,

Is there a way to do Database Selection via configuring the new SDN/RX spring driver? I cannot find a property to facilitate this.

Presently the only way I can perform database selection is to build a SessionConfig and pass it to Driver.session(config). This will not work for Spring repositories, where Driver and Session creation is handled behind Spring's repository facade.

And, of course, the real goal is to have something like a SessionConfigFactory bean that produces a SessionConfig pointing to the database that serves the current request's tenant.
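
The "resolve the database from the current request" idea can be sketched outside Spring. This Python version is only an illustration of the pattern, not SDN/RX: a context variable holds the tenant bound by request-handling code, and a resolver turns it into session arguments. TENANT_DATABASES, tenant_scope and session_kwargs are hypothetical names. With the 4.x Python driver the result could be used as driver.session(**session_kwargs()); the Java counterpart would build SessionConfig.forDatabase(...).

```python
import contextvars
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical tenant -> database mapping; could come from configuration.
TENANT_DATABASES: Dict[str, str] = {"acme": "tenantacme", "globex": "tenantglobex"}

# Tenant bound to the current request (thread or asyncio task); None if unbound.
_current_tenant = contextvars.ContextVar("current_tenant", default=None)


@contextmanager
def tenant_scope(tenant: str) -> Iterator[None]:
    """Bind a tenant for the duration of one request handler."""
    token = _current_tenant.set(tenant)
    try:
        yield
    finally:
        _current_tenant.reset(token)  # restore whatever was bound before


def session_kwargs() -> Dict[str, str]:
    """Keyword arguments selecting the current tenant's database."""
    tenant = _current_tenant.get()
    if tenant is None:
        raise LookupError("no tenant bound to the current request")
    return {"database": TENANT_DATABASES[tenant]}
```

Because each thread and asyncio task sees its own context, concurrent requests do not leak tenants into each other, and forgetting to bind a tenant fails loudly instead of silently falling back to the default database.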

Thank you for taking the time to answer our questions. It's a big help!

--Scott

That's a good question, and I don't personally have the answers for you here, but I'd recommend asking as a new question on the SDN section of the forums, they may be able to give you a more detailed view of what the roadmap looks like here.

https://community.neo4j.com/c/drivers-stacks/spring-data-neo4j-ogm/30

Hi @drobinson, you will be pleased to hear that we added support for database selection to the higher-level abstractions as well. The next version will be out soon. https://github.com/neo4j/sdn-rx/commit/fd8068b24e5a80fc1ded6e5a884ddd6447de9c02