I need to build a multi-tenant platform using Neo4j with one tenant per database. We could reach 5k-6k tenants in the next 2-3 years. Each graph (database) will hold the company hierarchy and other relevant information, such as skills, interests, and blogs, for each employee. Each tenant would have between 500 and 5,000 employees. Would Neo4j be able to handle this design if I create, say, 40-50 databases per server instance? I will also need replication and failover. I would not need Fabric, as the queries won't span multiple databases. Can someone please point me in the right direction to achieve this scale and flexibility? Many thanks in advance.
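For a rough sense of scale, the figures in the question can be turned into an instance count. This is only back-of-the-envelope arithmetic, not a sizing recommendation; the function name and the replication factor of 3 are assumptions made for illustration and are not taken from the question.

```python
import math


def instances_needed(tenants: int, dbs_per_instance: int) -> int:
    """Server instances required when every tenant gets its own database."""
    return math.ceil(tenants / dbs_per_instance)


# Figures from the question: 5,000-6,000 tenants, 40-50 databases per instance.
fewest = instances_needed(5000, 50)  # fewest tenants, densest packing
most = instances_needed(6000, 40)    # most tenants, sparsest packing

# Assumption (not from the question): every instance is replicated
# 3 times for failover.
REPLICAS = 3

print(fewest, most)                        # 100 150
print(fewest * REPLICAS, most * REPLICAS)  # 300 450
```

So even before replication, the design implies on the order of 100 to 150 server instances.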
At the present time this would not be a good choice, as our multi-database features aren't meant to scale into the thousands. We're ensuring we can support several hundred, but for now thousands is out of the picture.
Also keep in mind that although we do support multiple databases, they ultimately share the same hardware. You would definitely want to use clustering, but it's important to consider the amount of traffic you need to serve as you scale up in tenants. At the higher end, that might require multiple clusters.
I had the exact same intent for using Neo4j in a SaaS solution that requires a property graph. I thought the ability to create multiple databases, one per customer, would be an ideal way to partition each customer's data so that it would not be commingled in one database.
Has there been any progress in Neo4j's performance with regard to having a thousand or more databases on one instance or cluster? Keep in mind that I don't believe the workload will be heavy for any single user. In any event, I assume I would have to manage different instances or clusters for sets of users if the number of customers grows beyond a practical limit anyway, regardless of whether they are all in one database or in individual databases.
If a single database hosting all my customers is the practical solution, would it be a good idea to partition the data using customer-unique labels applied to all of a customer's nodes? All queries would then have to include this customer-specific label, along with any other domain-specific label, to limit querying to only the nodes owned by that customer.
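To make the label idea concrete, here is a minimal sketch of a helper that builds a tenant-scoped Cypher query. The `Tenant_<id>` naming scheme, the function names, and the `$value` parameter are all hypothetical. The sketch puts the labels into the query text rather than passing them as query parameters, so it validates them first to keep arbitrary strings out of the query.

```python
import re

# Hypothetical rule: identifiers must start with a letter and contain
# only letters, digits, and underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def _check(identifier: str) -> str:
    """Reject anything that is not a plain identifier."""
    if not _IDENTIFIER.fullmatch(identifier):
        raise ValueError(f"unsafe identifier: {identifier!r}")
    return identifier


def tenant_label(tenant_id: str) -> str:
    """Hypothetical naming scheme for the customer-unique label."""
    return f"Tenant_{_check(tenant_id)}"


def scoped_match(tenant_id: str, domain_label: str, prop: str) -> str:
    """Build a query that requires both the tenant label and the domain label."""
    return (
        f"MATCH (n:{tenant_label(tenant_id)}:{_check(domain_label)}) "
        f"WHERE n.{_check(prop)} = $value RETURN n"
    )


print(scoped_match("acme", "Employee", "name"))
# MATCH (n:Tenant_acme:Employee) WHERE n.name = $value RETURN n
```

Routing every query through one helper like this means the tenant label cannot be forgotten in an individual query, which is the main risk of keeping all customers in one database.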
Would it also be practical to create indexes for each customer using the customer-unique label? The problem I see is that I then couldn't index on the domain-specific labels that I would usually search on, as an index can only have one label.
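As a sketch of what per-customer indexes would involve, the snippet below generates one `CREATE INDEX` statement per tenant label and property. The `Tenant_<id>` labels, the index names, and the property names are hypothetical, and the `IF NOT EXISTS` form assumes a reasonably recent Neo4j version. Each statement names exactly one label, which is the limitation described above.

```python
def tenant_index_statements(tenant_ids, properties):
    """Build one index statement per (tenant label, property) pair.

    Input validation is omitted for brevity; identifiers should be
    checked before being placed into query text.
    """
    statements = []
    for tenant_id in tenant_ids:
        label = f"Tenant_{tenant_id}"  # hypothetical naming scheme
        for prop in properties:
            statements.append(
                f"CREATE INDEX idx_{label}_{prop} IF NOT EXISTS "
                f"FOR (n:{label}) ON (n.{prop})"
            )
    return statements


stmts = tenant_index_statements(["acme", "globex"], ["name", "email"])
print(stmts[0])
# CREATE INDEX idx_Tenant_acme_name IF NOT EXISTS FOR (n:Tenant_acme) ON (n.name)
print(len(stmts))  # 4
```

The number of indexes is the number of tenants times the number of indexed properties, so 5,000 tenants with three indexed properties each would mean 15,000 indexes in the one database.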
Any thoughts or advice would be greatly appreciated.