Permission-based Access Control over 6M Hierarchical Folder/File Nodes in Neo4j – Best Practices?

batasnewmind · June 16, 2025, 8:27pm

Hi all,

I'm working on a document management system that uses Neo4j to represent a hierarchical structure of folders and files (nodes labeled as StoreObject). The structure resembles a traditional file system:

Each StoreObject can either be a folder or a file.
The parent-child relationship is represented via [:Is_Parent] relationships.
Each node can optionally have a Permission node attached via an [:Is_Permission] relationship.
A Permission node includes userId, clientStoreId, and boolean flags like canRead.

We currently have 6 million nodes (and growing), and permissions can be added arbitrarily — meaning:

A user can be granted or denied access at any level (folder or file).
There is no strict inheritance rule: a parent may have canRead: false, while a child has canRead: true, or vice versa.
We still need to calculate "effective permissions" when rendering folder contents, especially in UIs.

The Main Challenges:

Performance

Traversing the tree and checking permissions at scale is becoming a bottleneck.
This is especially difficult when we need to compute recursive access (e.g. showing folders with a count of the accessible files beneath them).

Effective Permission Calculation

We need to compute what a user can access by:
- Traversing down the tree from a permitted folder.
- Taking into account that a child node might override parent permissions.

Query Example (Simplified)

MATCH (perm:Permission {id1: id1, id2: id2})
OPTIONAL MATCH (perm)<-[:Is_Permission]-(store:StoreObject)
WITH store
MATCH path = (store)-[:Is_Parent*0..]->(child:StoreObject)
OPTIONAL MATCH (child)<-[:Is_Permission]-(childPerm:Permission {userId: $userId, clientStoreId: $clientStoreId})
WHERE childPerm.canRead = true
RETURN DISTINCT child

Questions for the Community:

How would you model effective access control over hierarchical data at this scale?
Would you suggest caching strategies, precomputed paths, or flattened permission indexes?
Are there best practices for traversing large trees with permission constraints in Neo4j?
Would something like graph projections and GDS help in this case?

Any suggestions, design patterns, or real-world examples would be greatly appreciated!

Thanks in advance!

hakan.lofqvist1 · June 17, 2025, 6:03am

A broad question. Let me see if I can add some pointers that may help:

Model: My gut feeling is that you want to model this so that a path exists between a User and an Object whenever the user should have permissions on that object. Use explicit relationship types (shared_with, inherits, owns, ...). Your query then becomes a search for a path, which minimises property inspection.
Best practices for traversing: Respect the relative cost of accessing labels vs. relationships vs. properties. Use indexes to find the anchor nodes. Profile queries to see whether the query plan is optimal or needs hints.
Can GDS help: Maybe, by pre-computing communities to create some "boundaries".
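
As a rough sketch of the indexing and profiling advice above: the index name permission_user_store is made up for illustration, and the choice of userId plus clientStoreId as the anchoring properties is an assumption based on the Permission lookup in the original query.

```cypher
// Hypothetical composite index so the Permission lookup is an index seek
// rather than a label scan. The index name is illustrative only.
CREATE INDEX permission_user_store IF NOT EXISTS
FOR (p:Permission) ON (p.userId, p.clientStoreId);

// PROFILE runs the query and reports the actual plan with db hits per operator,
// which shows whether the index above is really used as the anchor.
PROFILE
MATCH (p:Permission {userId: $userId, clientStoreId: $clientStoreId})
RETURN count(p);
```

If the plan still starts from a label scan, that is the point at which a planner hint might be worth trying.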

Other things:

Your query example uses OPTIONAL MATCH; you probably want to change your model and query so that you can use WHERE EXISTS { pattern } instead.
-[:Is_Parent*0..]-> is unbounded; -[:Is_Parent]->{1,999} would be better, or even a quantified path pattern if there is something that can be checked to terminate the expansion early.
Caching is a rabbit hole; scale with secondary servers in the cluster instead.
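
To make the first two points concrete, here is one possible shape for the query. It is only a sketch: the id property and the $rootId parameter are hypothetical anchors, the Is_Permission direction (StoreObject to Permission) is an assumption because the original query uses both directions, and the quantified relationship syntax needs Neo4j 5.9 or later. The lower bound is 0 here to keep the start folder itself in the result, as *0.. did.

```cypher
// Anchor on one folder (hypothetical `id` property and $rootId parameter).
MATCH (root:StoreObject {id: $rootId})
// Bounded expansion down the tree instead of the unbounded *0..
MATCH (root)-[:Is_Parent]->{0,999}(child:StoreObject)
// EXISTS filters rows; the OPTIONAL MATCH in the original kept every child.
WHERE EXISTS {
  (child)-[:Is_Permission]->(:Permission {
    userId: $userId,
    clientStoreId: $clientStoreId,
    canRead: true
  })
}
RETURN DISTINCT child
```

Note that this only returns nodes carrying an explicit canRead: true for the user; it does not resolve inherited or overridden permissions.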

Final thought on the model: it usually helps to move from "what permissions does a user have on an object" to being explicit about what you want to ask: "does a user have read permission" vs. "does a user have write permission" vs. "does a user have share permission". Your application most likely does not need to check "what permissions"; it wants to check "can user x do y". Thinking in those terms may help you find improvements to the model (for example, shared_reader vs. shared_editor as relationship types that help when navigating the graph). I know you said you need "effective permissions" when rendering folder contents, especially in UIs. Hopefully it would help in that case too.
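
A minimal sketch of what a "can user x read y" check could look like under such a model. Everything here is assumed for illustration: the User label, the id properties, the $objectId parameter, and a shared_reader relationship pointing from the user to the folder or file it was granted on. It also ignores explicit denies further down the tree, which your model would still need to handle.

```cypher
// Hypothetical model: (:User)-[:shared_reader]->(:StoreObject) marks a read grant,
// and the folder tree is still expressed with Is_Parent.
MATCH (u:User {id: $userId}), (target:StoreObject {id: $objectId})
// True if the grant sits on the target itself or on any ancestor (0..999 hops up).
RETURN EXISTS {
  (u)-[:shared_reader]->(:StoreObject)-[:Is_Parent]->{0,999}(target)
} AS canRead
```

The check is a pure path lookup with no property inspection on Permission nodes, which is the point of encoding the permission in the relationship type.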

My comments are mostly "generic advice", I hope it helps you find some improvements. I also hope some others chime in.

joshcornejo · June 17, 2025, 6:23am

You are solving what is called "ReBAC" (Relationship-Based Access Control). You should have a look at OpenFGA if your only two operations on the objects are canRead and canWrite.
