Controlling schema in Neo4j

Software developers need access to the Neo4j database, but they should not be able to create anything they want, because we would end up with extra objects in the database that are not part of any agreed-upon schema. (It's a basic law of chaos theory that just hasn't been named yet.)

So I want to exert schema control to the full extent that Neo4j will allow. The database should enable developers and users to create only tokens (node labels, edge types, and properties) that are defined in the database schema. The database admin is solely in charge of the database schema.

My approach is to use roles, as shown in the image below. Only the db admin can create tokens that do not yet exist in the database; users and developers can only create objects that the db admin allows.

So if the development team needs a Person node label, the db admin creates an actual node and places a Person label on it. It really exists, and is fully accessible by queries in the database. That's fine for development but in production the db admin should not be creating an actual user. Yet it seems that this is necessary.

Is there a way for the db admin to define the allowable tokens without actually creating live instances of them in the database?

I am wide open for suggestions here including the suggestion of external tools. I'm using Neo4j Enterprise.

I am not sure that the required solution exists out of the box. But I have a few tips to try:

  1. If you are not under time pressure, you can wait for Neo4j 4.0, which brings Role-Based Access Control and Fine-Grained Security, so you will have more options for controlling access.

  2. There is an external solution for implementing this kind of custom access control and advanced schema enforcement. GraphAware has an extension where you can configure these kinds of things, but it is available only with their enterprise subscription.

Disclaimer: I am an ex-GraphAware employee.

Thanks szenyo, waiting for version 4.0 is doable and I respect anything coming out of GraphAware, but the problem I have is nuanced. I want someone with only the editor role to be able to create the very first instance of a new node label, edge type, or property. Instead, as the db admin I have to 'prime the pump' by literally creating the first instance of every one of these so that those with the editor role can create them.

Example: Say we need a node label for representing people in the database. A developer who has only 'editor' access tries to add non-existent label People to a node in the database--and it properly fails because the 'editor' role does not allow creation of new tokens. That's good.
So then I say, let's not name it People but Person, and then I use my 'admin' role to create a new node with a new label Person. It's not just a declaration or definition of an allowed label; it's a real functional node now living in the database. Ok, so now the developer can add the new Person label to any node she wishes. But the problem is that I had to create a real node in the database in order to create that label. I wish I could avoid adding a node myself.

I don't think version 4.0 has a remedy for this. It's not mentioned in the 4.0 documentation, and I don't see it mentioned in GraphAware's short description of its schema enforcement solution:

Schema Enforcement

This extension allows you to define and ascertain a pre-defined schema in the graph. Neo4j is said to be a schema-free database, however, it can sometimes be beneficial to define your rules and put constraints on what can and cannot be written to the database. An example of such constraint could be a rule that every person node has to have a name.

My thought for now is that when I create a new token I should create a primer node or edge with the new label, type, or property and also a time-to-live (expiration date), and coordinate closely with the development team to make sure they create at least one instance of the new token before the primer node/edge expires.

This primer node concept with an expiration date looks a bit over-engineered to me. I think it is harder to implement than solving the original problem. If you need very specific custom schema enforcement, with custom validation before every write transaction, it is better to implement the logic in a TransactionEventHandler class in Java. There you can check whatever you want, so you can implement your token concept this way. But to be honest, I am not a big fan of this type of customisation, where we force a conceptual change on a well-designed, best-of-breed solution that gives us a schema-free option.
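To make the handler idea more concrete, here is a minimal sketch of the allow-list check such a handler might run before each commit. It is written in Python only to show the logic; a real implementation would live in the Java handler's `beforeCommit` callback. The names `AllowedSchema`, `PendingWrite`, `find_violations`, and `before_commit` are hypothetical and not part of any Neo4j API. It also assumes developers hold a role that is allowed to create tokens at all, so that the handler becomes the gatekeeper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AllowedSchema:
    """Tokens the db admin has approved (hypothetical structure)."""
    labels: frozenset = frozenset()
    relationship_types: frozenset = frozenset()
    property_keys: frozenset = frozenset()


@dataclass
class PendingWrite:
    """Tokens touched by one write transaction (hypothetical structure)."""
    labels: set = field(default_factory=set)
    relationship_types: set = field(default_factory=set)
    property_keys: set = field(default_factory=set)


def find_violations(schema, write):
    """Return 'kind:token' strings for tokens outside the allow-list.

    Results are grouped by kind, and sorted by token name within each kind.
    """
    violations = []
    for kind in ("labels", "relationship_types", "property_keys"):
        unknown = getattr(write, kind) - getattr(schema, kind)
        violations.extend(f"{kind}:{token}" for token in sorted(unknown))
    return violations


def before_commit(schema, write):
    """Reject the write if it uses any token outside the allow-list."""
    violations = find_violations(schema, write)
    if violations:
        raise ValueError("tokens not in schema: " + ", ".join(violations))
```

With `Person` approved as a label, a write that uses `People` would be rejected, while the first real `Person` node could be created by anyone whose role permits writes, without the admin creating a node first.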

Thanks for your input @szenyo. I agree with you after all: I don't want any extra overhead on graph execution caused by permanently altering Neo4j, particularly since I am only trying to solve a deployment problem here. It's literally a one-off for the introduction of each new token, and that doesn't happen often in a production environment. So I will continue to coordinate closely with the development team, and when a new kind of token needs to be introduced I'll create a dummy object with that token. As soon as the dev team signals that they are able to create that token themselves in the graph, I'll remove the dummy object.
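For what it's worth, the create-then-remove routine could be scripted so the dummy object is easy to track. Below is a rough sketch in Python that only builds the Cypher strings; it does not talk to the database. The helper name `primer_statements` and the `$primerId` parameter are made up for illustration, and property keys are left out to keep it short. It assumes the admin runs the first statement, keeps the returned id, and runs the second statement once the dev team confirms they can use the token.

```python
def primer_statements(kind, token):
    """Return (create, cleanup) Cypher strings for a throwaway primer object.

    kind is 'label' or 'relationship_type'. The token is backtick-quoted so
    that unusual names remain valid Cypher identifiers.
    """
    if "`" in token:
        raise ValueError("backticks in token names are not supported here")
    quoted = f"`{token}`"
    if kind == "label":
        # One bare node carrying the new label.
        create = f"CREATE (n:{quoted}) RETURN id(n) AS primerId"
        cleanup = "MATCH (n) WHERE id(n) = $primerId DELETE n"
    elif kind == "relationship_type":
        # Two anonymous nodes joined by one relationship of the new type.
        create = f"CREATE ()-[r:{quoted}]->() RETURN id(r) AS primerId"
        cleanup = ("MATCH (a)-[r]->(b) WHERE id(r) = $primerId "
                   "DELETE r, a, b")
    else:
        raise ValueError(f"unsupported kind: {kind}")
    return create, cleanup
```

Matching on the returned id rather than on a marker property avoids introducing yet another property key into the schema just to tag the dummy object.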

Why not use an internal GraphQL service as your interface to the database? This way you get a controlled schema. If you just need raw CRUD manipulation, the GraphQL plugin for Neo4j will generate a schema for you, and a service as well, based on the existing state of your current database.
Alternatively, if you are starting from scratch, the neo4j-graphql-* libraries will automagically generate the necessary resolvers and the query and mutation type definitions; all you need to define is the GraphQL schema for them.