Performance and efficiency comparison

Hi, I am working on my master's thesis and I have a few questions about performance and efficiency. My database holds data from an identity management system, which basically stores users and their roles.

My task is to add new information about when a role was used. I need to store every timestamp at which a specific user accessed a resource using that role.

  • Option one is to create a new node with a timestamp property and connect it to the user and role nodes.
  • Option two is to add an array of usage timestamps as a property on the relationship between the user and role nodes.
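
To make the two options concrete, here is a rough Cypher sketch of each. The labels, relationship types (`PERFORMED`, `USING`), property names (`id`, `name`, `timestamp`, `usedAt`), and values are all hypothetical, chosen only for illustration; the only thing taken from this thread is that a user is linked to a role by a `HAS` relationship.

```cypher
// Option one: each access becomes its own node linked to the user and the role.
// PERFORMED, USING, and the property names are assumptions, not an existing schema.
MATCH (u:User {id: 'u1'})-[:HAS]->(r:Role {name: 'admin'})
CREATE (u)-[:PERFORMED]->(:Access {timestamp: datetime()})-[:USING]->(r);

// Option two: append the timestamp to a list property on the existing relationship.
// coalesce() starts from an empty list the first time the property is written.
MATCH (u:User {id: 'u1'})-[h:HAS]->(r:Role {name: 'admin'})
SET h.usedAt = coalesce(h.usedAt, []) + datetime();
```

Option one adds a node and two relationships per access, while option two keeps the graph the same size and grows one property per user-role pair.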

Although I only have a small amount of data right now, I will later need to test my solution on a large dataset and run graph algorithms that traverse the whole graph.

My question: which option is more efficient and gives better performance for graph traversal?

The short answer is that you're likely going to be better off modeling each access as its own node (rather than as a set of properties on a relationship), because later you may want to assert properties about the access itself, and you'll want to take advantage of indexing.

So instead of:

(:Role)<-[:HAS]-(:User)-[:ACCESSED]->(:Resource)

You might consider:

(:Role)<-[:HAS]-(:User)-[:access]->(:Access { date, IP, etc })-[:resource]->(:Resource)
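
As a minimal sketch of how that model might be used, assuming Neo4j 4.x or later for the index syntax: the index name `access_date`, the `id` and `ip` properties, and all literal values below are placeholders invented for the example, not something from your dataset.

```cypher
// Index the access date so time-range lookups can use the index
// (hypothetical index name; syntax assumes Neo4j 4.x or later).
CREATE INDEX access_date IF NOT EXISTS FOR (a:Access) ON (a.date);

// Record one access as its own node between the user and the resource.
MATCH (u:User {id: 'u1'}), (res:Resource {id: 'res1'})
CREATE (u)-[:access]->(:Access {date: datetime(), ip: '10.0.0.1'})-[:resource]->(res);

// Find accesses on or after a given instant, oldest first.
MATCH (u:User)-[:access]->(a:Access)-[:resource]->(res:Resource)
WHERE a.date >= datetime('2021-01-01T00:00:00Z')
RETURN u.id, res.id, a.date
ORDER BY a.date;
```

Because each access is a node, the date can be indexed and other details such as the IP can sit next to it, which is harder to do cleanly with an array of timestamps on a relationship.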

For the long answer on why, and on what the tradeoffs are, see "Graph Data Modeling: All About Relationships" by David Allen on the Neo4j Developer Blog (Medium).