Graph Structure for Shared Facts & History of Connected Nodes

geronimo4j
Graph Buddy

Hi I'm hoping to see if the community might weigh in on a structure I'm looking at using. I'm trying to make a really scalable but really connected knowledge graph.

I currently have Movie nodes and then shared Widgets for tidbits of info. Simplified they might look like this:

(The Dark Knight)-[:HAS_DETAIL]->("Budget: 15000000")
(The Dark Knight)-[:HAS_DETAIL]->("Spoken Language: English")
(The Dark Knight)-[:HAS_DETAIL]->("Genre: Action")
(The Dark Knight)-[:HAS_DETAIL]->("Genre: Thriller") 
(The Dark Knight)-[:HAS_DETAIL]->("Genre: Superhero")

This works because i can have something like this:

(The Dark Knight)-[:HAS_DETAIL]->("Genre: Action") <-[:HAS_DETAIL]-(Tenet)

So two movies can share the same details, which makes the graph more connected and useful.

What I want to do next is level this up a bit, and have users able to add/modify these details (for whatever reason) and then also keep a history of all of the modifications.

To modify that I could just disconnect the nodes and add a new relationship to another Detail node and add the user's name and a timestamp to track when that node was added, but to keep track of the modifications, that breaks down because the relationship was where I would have been tracking the timestamp/username for that event.

For example:
(The Dark Knight)-[:HAS_DETAIL { added_by: "Dan", date_time: "11/30/2021" }]->("Genre: Action")

So this makes sense for all currently active connected relationships. For modifications (details that were removed) I could theoretically just try to make them inactive? E.g. if I remove this inaccurate genre:

(The Dark Knight)-[:HAS_DETAIL { added_by: "EvilPerson", date_time: "11/02/2021", active: false }]->("Genre: Comedy")

So to fetch the history of details I would fetch ALL relationships, to fetch active ones I would look for active: true. But this feels like it's a waste of a ton of relationships that would be stored for a very long time, and doesn't account for times when two nodes are connected, then separated, then reconnected because the relationship would be overwritten.

I hope this wasn't too confusing, but essentially I'm hoping to track the history of connected nodes (and I've checked a bunch of previous questions asked on here prior to asking), do people even keep nodes connected but have an "active"-like indicator in the relationship? Are there better ways to store histories of relationships?

1 REPLY 1

martin3
Node Clone

If you only have movies in the database, and want to keep track of modifications of movie details, and if all movies have the same kind of details, then maybe the relational database model would be the easiest model and method for you.

Then you could just store all the movies in one table, and have one column per detail. You could have another table for users, and a third table joining the users and movie table, for saving a history modifications of the movie table.

However, if you want to save information about relations between users and movies, or among users and among movies, then a graph database could be a good option.

NoSQL doesn't necessarily mean NotSQL, it just means not only SQL. SQL can also be OK

But if you want to do this in a graph database. Then I would also have User nodes and Modification nodes, maybe you could also have one sequential list of Timeline nodes for each history of modifications, or at least an Index property.