Permission-based Access Control over 6M Hierarchical Folder/File Nodes in Neo4j – Best Practices?

Hi all,

I'm working on a document management system that uses Neo4j to represent a hierarchical structure of folders and files (nodes labeled as StoreObject). The structure resembles a traditional file system:

  • Each StoreObject can either be a folder or a file.
  • The parent-child relationship is represented via [:Is_Parent] relationships.
  • Each node can optionally have a Permission node attached via an [:Is_Permission] relationship.
  • A Permission node includes userId, clientStoreId, and boolean flags like canRead.

We currently have 6 million nodes (and growing), and permissions can be added arbitrarily — meaning:

  • A user can be granted or denied access at any level (folder or file).
  • There is no strict inheritance rule: a parent may have canRead: false, while a child has canRead: true, or vice versa.
  • We still need to calculate "effective permissions" when rendering folder contents, especially in UIs.
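
To make "effective permissions" concrete, here is a minimal Python sketch of one possible rule: the nearest explicit flag on the way up to the root decides, and a node with no explicit flag anywhere above it is denied. Both the rule and the default are assumptions (the post only says that children may override parents), and `parent_of` / `can_read` are hypothetical in-memory stand-ins for the Is_Parent and Is_Permission data of a single user.

```python
from typing import Dict, Optional

# Hypothetical stand-ins for the graph (not the real schema):
# parent_of maps a node to its parent (None for the root),
# can_read holds only the explicit canRead flags for one user.
parent_of: Dict[str, Optional[str]] = {
    "root": None,
    "hr": "root",
    "payroll": "hr",
    "slip.pdf": "payroll",
    "handbook.pdf": "hr",
}
can_read: Dict[str, bool] = {"hr": True, "payroll": False, "slip.pdf": True}


def effective_can_read(node: str) -> bool:
    """Walk up the parent chain; the nearest explicit flag decides."""
    current: Optional[str] = node
    while current is not None:
        if current in can_read:
            return can_read[current]
        current = parent_of[current]
    # Assumed default: no explicit permission on the node or any ancestor means deny.
    return False


print(effective_can_read("handbook.pdf"))  # True, inherited from "hr"
print(effective_can_read("payroll"))       # False, explicit deny
print(effective_can_read("slip.pdf"))      # True, explicit grant overrides "payroll"
print(effective_can_read("root"))          # False, nothing set anywhere
```

Whatever rule is actually intended, writing it down this explicitly first tends to make it easier to judge which parts can be pushed into the graph model.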

The Main Challenges:

  1. Performance
  • Traversing the tree and checking permissions at scale is becoming a bottleneck.
  • Especially difficult when needing to compute recursive access (e.g. show folders with count of accessible files beneath them).
  2. Effective Permission Calculation
  • We need to compute what a user can access by:
    • Traversing down the tree from a permitted folder.
    • Taking into account that a child node might override parent permissions.
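
The "count of accessible files beneath each folder" case can be sketched as a single pass over the tree instead of one traversal per folder. The Python below is only an illustration under assumptions: it uses a hypothetical in-memory `children` map (every folder is a key, everything else is a file), explicit flags for one user in `can_read`, and the rule that a node without an explicit flag inherits the effective flag of its parent, with deny as the default at the root.

```python
from typing import Dict, List

# Hypothetical sample data: folders are the keys of `children`; other names are files.
children: Dict[str, List[str]] = {
    "root": ["hr", "public"],
    "hr": ["payroll", "handbook.pdf"],
    "payroll": ["slip.pdf", "summary.xlsx"],
    "public": ["readme.txt"],
}
can_read: Dict[str, bool] = {"hr": True, "payroll": False, "slip.pdf": True}


def count_accessible_files(root: str) -> Dict[str, int]:
    """Return, for every folder, the number of readable files anywhere beneath it."""
    effective: Dict[str, bool] = {root: can_read.get(root, False)}
    parent: Dict[str, str] = {}
    order: List[str] = []

    # Pass 1 (top-down): resolve the effective flag of each node exactly once.
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for child in children.get(node, []):
            parent[child] = node
            # An explicit flag wins; otherwise inherit from the parent (assumption).
            effective[child] = can_read.get(child, effective[node])
            stack.append(child)

    # Pass 2 (bottom-up): descendants always appear after their ancestors in `order`,
    # so walking it in reverse adds every subtree total to its parent exactly once.
    counts: Dict[str, int] = {n: 0 for n in order if n in children}
    for node in reversed(order):
        subtotal = counts[node] if node in children else int(effective[node])
        if node in parent:
            counts[parent[node]] += subtotal
    return counts


print(count_accessible_files("root"))
```

Each node is visited a constant number of times, so the cost grows with the size of the subtree being rendered rather than with folders times descendants.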

Query Example (Simplified)

MATCH (perm:Permission {id1: $id1, id2: $id2})
OPTIONAL MATCH (perm)<-[:Is_Permission]-(store:StoreObject)
WITH store
MATCH path = (store)-[:Is_Parent*0..]->(child:StoreObject)
OPTIONAL MATCH (child)<-[:Is_Permission]-(childPerm:Permission {userId: $userId, clientStoreId: $clientStoreId})
WHERE childPerm.canRead = true
RETURN DISTINCT child

Questions for the Community:

  • How would you model effective access control over hierarchical data at this scale?
  • Would you suggest caching strategies, precomputed paths, or flattened permission indexes?
  • Are there best practices for traversing large trees with permission constraints in Neo4j?
  • Would something like graph projections and GDS help in this case?

Any suggestions, design patterns, or real-world examples would be greatly appreciated!

Thanks in advance!

A broad question. Let me see if I can add some pointers that may help:

  1. Model: My gut feeling is that you want to model this so that there is a path between User and Object whenever the user should have permissions on the object. Do it with clear relationship types (shared_with, inherits, owns, ...). Then you can turn your query into a search for a path and minimise property inspection in the query.
  2. Best practices for traversing: Respect the relative cost of accessing labels vs. relationships vs. properties. Index the properties you use to anchor the query. Profile your queries to see whether the query plan is optimal or needs hints.
  3. Can GDS help: Maybe, by pre-computing communities to create some "boundaries".
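
The "access is a path" idea from point 1 can be sketched outside the database as a plain reachability check. This is a toy Python illustration, not Neo4j code: the edge list, the relationship type names (`OWNS`, `READER_OF`, `CONTAINS`) and their directions are all hypothetical. It also only models grants, so explicit denies like canRead: false would presumably need separate handling.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical edges: (source, relationship type, target).
EDGES: List[Tuple[str, str, str]] = [
    ("alice", "OWNS", "projects"),
    ("bob", "READER_OF", "specs"),
    ("projects", "CONTAINS", "specs"),
    ("projects", "CONTAINS", "budget.xlsx"),
    ("specs", "CONTAINS", "api.md"),
    ("bob", "COMMENTED_ON", "budget.xlsx"),  # exists in the graph but grants nothing
]

# Only these relationship types are assumed to carry access.
ACCESS_TYPES: Set[str] = {"OWNS", "READER_OF", "CONTAINS"}


def build_adjacency(edges: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Keep only access-carrying relationships, indexed by source node."""
    adjacency: Dict[str, List[str]] = {}
    for source, rel_type, target in edges:
        if rel_type in ACCESS_TYPES:
            adjacency.setdefault(source, []).append(target)
    return adjacency


def has_access_path(user: str, obj: str, adjacency: Dict[str, List[str]]) -> bool:
    """Breadth-first search: access exists if some typed path links user to obj."""
    seen = {user}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if node == obj:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


ADJ = build_adjacency(EDGES)
print(has_access_path("alice", "api.md", ADJ))     # True: OWNS, CONTAINS, CONTAINS
print(has_access_path("bob", "api.md", ADJ))       # True: READER_OF, CONTAINS
print(has_access_path("bob", "budget.xlsx", ADJ))  # False: no access-carrying path
```

The check never reads a property; the relationship types alone decide, which is the point of modelling it this way.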

Other things:

  • Your query example uses OPTIONAL MATCH. As written, the WHERE belongs to the OPTIONAL MATCH, so children without a matching permission are still returned. You probably want to change your model and query so that you can use WHERE EXISTS { pattern } instead.
  • -[:Is_Parent*0..]-> is unbounded. Better would be a bounded form such as -[:Is_Parent]->{1,999}, or even a quantified path pattern with a predicate, if there is something that can be checked to terminate the expansion early.
  • Caching is a rabbit hole; scale with secondary servers in the cluster instead.
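
To show what a bounded expansion with early termination buys, here is a small Python sketch of the traversal logic only; it is not Cypher and not how Neo4j implements it. The `children` map and the `may_enter` predicate are hypothetical. One caveat that follows from the question: if a child can re-grant access below a denied parent, a deny on a folder is not a safe stop condition, so the predicate has to be something that holds for the whole subtree.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical folder tree.
children: Dict[str, List[str]] = {
    "root": ["a", "archive"],
    "a": ["a1"],
    "a1": ["deep.txt"],
    "archive": ["old.txt"],
}


def expand(
    start: str,
    max_depth: int,
    may_enter: Callable[[str], bool],
) -> Iterator[str]:
    """Yield descendants of `start` at depth 1..max_depth, skipping rejected subtrees."""
    stack = [(start, 0)]
    while stack:
        node, depth = stack.pop()
        if depth > 0:
            yield node
        if depth == max_depth:
            continue  # upper bound reached: do not expand further
        for child in children.get(node, []):
            if may_enter(child):  # early termination: a rejected child is never visited
                stack.append((child, depth + 1))


def not_archive(name: str) -> bool:
    return name != "archive"


print(sorted(expand("root", 2, not_archive)))  # ['a', 'a1']
print(sorted(expand("root", 3, not_archive)))  # ['a', 'a1', 'deep.txt']
```

Nothing under "archive" is ever touched, and nothing deeper than `max_depth` is expanded; those are the two savings the bullet above is after.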

Final thought on the model: it usually helps to go from "what permissions does a user have on an object" to being explicit about what you want to ask: "does a user have read permissions" vs "does a user have write permissions" vs "does a user have share permissions". Your application likely does not need to check "what permissions"; it wants to check "can user x do y". Thinking in those terms may help you find improvements to the model (for example, shared_reader vs shared_editor as relationship types that help navigate the graph). I know you said you need "effective permissions" when rendering folder contents, especially in UIs. Hopefully it would help in that case too.
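
A tiny Python sketch of the "can user x do y" framing, assuming hypothetical relationship types `SHARED_READER` and `SHARED_EDITOR` and assuming that an editor may also read. The grants are kept in a plain set here; in the graph they would be relationships between the user and the object.

```python
from typing import Dict, Set, Tuple

# Hypothetical grants: (user, relationship type, object).
GRANTS: Set[Tuple[str, str, str]] = {
    ("alice", "SHARED_EDITOR", "specs"),
    ("bob", "SHARED_READER", "specs"),
}

# Which relationship types answer which question (assumption: editors can also read).
ACTION_TO_TYPES: Dict[str, Set[str]] = {
    "read": {"SHARED_READER", "SHARED_EDITOR"},
    "write": {"SHARED_EDITOR"},
}


def can(user: str, action: str, obj: str) -> bool:
    """Answer one yes/no question instead of computing the full permission set."""
    return any((user, rel_type, obj) in GRANTS for rel_type in ACTION_TO_TYPES[action])


print(can("bob", "read", "specs"))     # True
print(can("bob", "write", "specs"))    # False
print(can("alice", "write", "specs"))  # True
```

Each question maps to a fixed, small set of relationship types, so the check becomes "does a relationship of one of these types exist" rather than "load the Permission node and inspect its flags".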

My comments are mostly "generic advice"; I hope they help you find some improvements. I also hope some others chime in.

You are solving what is called "ReBAC" (Relationship-Based Access Control). You should have a look at OpenFGA if your only two operations over the objects are canRead and canWrite.