We have a use-case in our company, where we're using neo4j as the database and we're currently refactoring the application using it. We also have a one-time chance to change our datamodel and are not entirely sure, how to model this for most flexibility and performance.
Imagine you have an application, where you need to manage a lot of different reservations of different rooms/tools/... (e.g. an reservation on day 1 from 08:00 to 12:00 for meeting room 1). This in of itself wouldn't be that complicated, what makes it complex is that we have to provide different "view-levels" for managing those reservations:
Reservations are only for a few hours (e.g. 08:00 - 12:00) and only for one particular entity
Reservations are spanning several days (e.g. Monday 08:00 to Friday 12:00) for several entities
Reservations are spanning several weeks, months or even years
Users of the application should be able to manage reservations on those different view-levels. Users of the application can for example create a year-long reservation for a tool. Also the users should be able to look at the reservations on different view-levels, such that for example in the month view you see one rectangle per day (in a calendar-like view) and reservered entity, and it should be marked if there is ANY reserveration for the referenced entity and day (no matter if this reservation only lasts 5 minutes from 08:00 to 08:05). Conversely the user should also be able to create a week-long reservation in the month view and then switch to the day view and see the respective markings for the just created reservation.
This text will be blurred
We're currently debating on which data model we should choose for such a use-case, such that we can performantly support all operations (i.e. reading, creating, updating and deleting) on the different view-levels.
We basically came up with three different approaches:
Using time-trees similar to what is described here and here. However it seems that this approach was mainly motivated by the fact that neo4j didn't have indices on numerical properties (i.e. millisecond timestamps) and hence was not as performant when it came to searching. We're still evaluating it though, hence I'd like to hear your opinion on that.
Using numerical indices (for searching) and a very granular model for storing the reserverations. We'd store the reserverations in a "smallest" unit (e.g. 1 hour chunks) and then generate the reservations for other view-levels (e.g. weekly) by aggregating the more granular chunks (see visualization at the end of this post). This approach is our current favorite, as it is easy to implement and quite performant (we've already tried some ideas), but we're concerned with performance in case someone wants to create 100 year-long reservations at the same time (which would lead to a lot of nodes being created, like
100 x (365 * 24) = 876000 nodes if we use one-hour chunks as our smallest unit and assume 24/7 availability for reservations).
Using numerical indices and a "layered" approach for storing reservations. What I mean by that is that we would store the reservations on the level they were created (e.g. hour, day, week, ...) and then generate the "lower" (e.g. hours, days and weeks for a month-long reservation) and "upper" (e.g. weeks, months, years for a day-long reservation) reservations on the fly. This certainly has better performance in many use-cases as opposed to the second approach (e.g. when creating a year long reserveration it would only create one node) but would also introduce much more complexity code-wise. For example if someone creates a week-long reservation and then switches over to the week view to change to remove the reservation for a single day in the original timespan, we would need to split the weekly reservation up into several chunks. It just introduces a LOT of book-keeping logic, which we would like to avoid.
We also need to run analytics on these reserverations (e.g. how much is a certain room/tool reserverd throughout the year, how much capacity is left) and hence be able to query those in a performant manner, for example to check that nothing is reserved twice in a certain timespan. Also additional features like recurring reservations should be easily doable with our model.
We're already quite convinced by the second approach, but it would be really interesting to hear what others thinks of this solution and if there are other ways we haven't seen.
Thanks in advance! 🙂
PS: Debating whether neo4j is the right database/paradigm for this use-case is certainly interesting, but a switch to another database technology is not up for debate here, as it's already set and the current circumstances prohibit us from switching here.