Hi all,
I'm working on a document management system that uses Neo4j to represent a hierarchical structure of folders and files (nodes labeled StoreObject). The structure resembles a traditional file system:
- Each StoreObject can be either a folder or a file.
- The parent-child relationship is represented via [:Is_Parent] relationships.
- Each node can optionally have a Permission node attached via an [:Is_Permission] relationship.
- A Permission node includes userId, clientStoreId, and boolean flags like canRead.
We currently have 6 million nodes (and growing), and permissions can be added arbitrarily — meaning:
- A user can be granted or denied access at any level (folder or file).
- There is no strict inheritance rule: a parent may have canRead: false, while a child has canRead: true, or vice versa.
- We still need to calculate "effective permissions" when rendering folder contents, especially in UIs.
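To make "effective permissions" concrete, here is a minimal Python sketch of one possible reading of the rule. It runs in memory so it can be tried without a database. Two things in it are assumptions rather than settled behavior: the nearest explicit Permission on the path to the root wins, and a node with no Permission anywhere above it is denied. The names parent_of, explicit_can_read, and effective_can_read and the sample data are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical stand-in for [:Is_Parent]: StoreObject id -> parent id (None for the root).
parent_of: Dict[str, Optional[str]] = {
    "root": None,
    "docs": "root",
    "secret": "docs",
    "memo.txt": "secret",
    "readme.txt": "docs",
}

# canRead flags of the Permission nodes attached for ONE user.
# StoreObjects without a Permission node are simply absent.
explicit_can_read: Dict[str, bool] = {
    "docs": True,
    "secret": False,
    "memo.txt": True,
}


def effective_can_read(node_id: str) -> bool:
    """Walk up towards the root; the nearest explicit flag wins (assumed rule)."""
    current: Optional[str] = node_id
    while current is not None:
        if current in explicit_can_read:
            return explicit_can_read[current]
        current = parent_of[current]
    # Assumed default: no Permission anywhere on the path means no access.
    return False


print(effective_can_read("readme.txt"))  # True: inherited from docs
print(effective_can_read("secret"))      # False: explicit deny on the folder
print(effective_can_read("memo.txt"))    # True: the child overrides its parent
```

If the intended rule is different (for example, a deny on any ancestor always wins), only the loop body would change.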
The Main Challenges:
- Performance
  - Traversing the tree and checking permissions at scale is becoming a bottleneck.
  - Especially difficult when needing to compute recursive access (e.g. show folders with count of accessible files beneath them).
- Effective Permission Calculation
  - We need to compute what a user can access by:
    - Traversing down the tree from a permitted folder.
    - Taking into account that a child node might override parent permissions.
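The two challenges above combine into a single top-down pass, sketched here in Python on a tiny in-memory tree. It passes the inherited flag down, lets an explicit flag on a node override it, and sums readable files on the way back up. The override rule (an explicit canRead replaces the inherited value) is an assumption, and children, explicit_can_read, and accessible_file_counts are hypothetical names with made-up sample data.

```python
from typing import Dict, List

# Hypothetical sample tree: folder id -> child ids.
# Files have no entry; an empty folder would need an explicit empty list.
children: Dict[str, List[str]] = {
    "root": ["docs", "hr"],
    "docs": ["readme.txt", "secret"],
    "secret": ["memo.txt", "plan.txt"],
    "hr": ["salaries.xlsx"],
}

# canRead flags from Permission nodes for one user; missing key = no Permission.
explicit_can_read: Dict[str, bool] = {
    "docs": True,
    "secret": False,
    "memo.txt": True,
}


def accessible_file_counts(start: str, inherited: bool = False) -> Dict[str, int]:
    """Return {folder id: number of readable files anywhere beneath it}."""
    counts: Dict[str, int] = {}

    def visit(node: str, inherited_flag: bool) -> int:
        # An explicit flag on the node overrides whatever the parent passed down.
        can_read = explicit_can_read.get(node, inherited_flag)
        if node not in children:  # a file
            return 1 if can_read else 0
        total = sum(visit(child, can_read) for child in children[node])
        counts[node] = total
        return total

    visit(start, inherited)
    return counts


print(accessible_file_counts("root"))
```

For this sample data the result is 2 readable files under root and docs, 1 under secret (memo.txt overrides the folder's deny), and 0 under hr. The recursion is only for brevity; a tree with very deep nesting would need an explicit stack.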
Query Example (Simplified)
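// Simplified: this only returns children that carry an explicit canRead = true Permission;
// access inherited from a parent is not resolved here.
MATCH (perm:Permission {id1: $id1, id2: $id2})
MATCH (perm)<-[:Is_Permission]-(store:StoreObject)
MATCH (store)-[:Is_Parent*0..]->(child:StoreObject)
// A plain MATCH is used so the WHERE actually filters rows;
// with OPTIONAL MATCH every child would be returned regardless of canRead.
MATCH (child)<-[:Is_Permission]-(childPerm:Permission {userId: $userId, clientStoreId: $clientStoreId})
WHERE childPerm.canRead = true
RETURN DISTINCT child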
Questions for the Community:
- How would you model effective access control over hierarchical data at this scale?
- Would you suggest caching strategies, precomputed paths, or flattened permission indexes?
- Are there best practices for traversing large trees with permission constraints in Neo4j?
- Would something like graph projections and GDS help in this case?
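To show what I mean by a "flattened permission index" in the questions above, here is a rough Python sketch. It keeps one precomputed map from StoreObject id to effective canRead for a user, and when a Permission changes it recomputes only the subtree below that node. This is an illustration under assumptions, not something we run today: the override rule (an explicit flag replaces the inherited one), the default deny, and all names (effective, rebuild_subtree, set_permission) and sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample data for one user.
children: Dict[str, List[str]] = {
    "root": ["docs"],
    "docs": ["readme.txt", "secret"],
    "secret": ["memo.txt", "plan.txt"],
}
parent_of: Dict[str, Optional[str]] = {"root": None}
for folder, kids in children.items():
    for kid in kids:
        parent_of[kid] = folder

explicit_can_read: Dict[str, bool] = {"docs": True, "secret": False, "memo.txt": True}

# The flattened index: StoreObject id -> effective canRead for this user.
effective: Dict[str, bool] = {}


def rebuild_subtree(node: str) -> None:
    """Recompute the index for `node` and everything beneath it."""
    parent = parent_of[node]
    # Assumed default: deny when nothing is inherited.
    inherited = effective.get(parent, False) if parent is not None else False
    stack: List[Tuple[str, bool]] = [(node, inherited)]
    while stack:
        current, flag = stack.pop()
        flag = explicit_can_read.get(current, flag)  # explicit flag overrides
        effective[current] = flag
        stack.extend((child, flag) for child in children.get(current, []))


def set_permission(node: str, can_read: bool) -> None:
    """Change one Permission and refresh only the affected subtree."""
    explicit_can_read[node] = can_read
    rebuild_subtree(node)


rebuild_subtree("root")         # initial full build
set_permission("secret", True)  # later: only 'secret' and below are revisited
print(effective)
```

Reads become a single lookup, at the cost of one index per user and a subtree rewrite on every permission change, which is the trade-off I'm unsure about at 6 million nodes.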
Any suggestions, design patterns, or real-world examples would be greatly appreciated!
Thanks in advance!