Hi all,
I would appreciate your thoughts on how best to model the date of events.
A simplified schema of our graph is shown below. It includes the two options we are currently considering for modelling the date of an event.
For our purposes, an event is the date(time) at which a product arrived, was produced, was altered (e.g. packaged) or was dispatched.
Our goal is to find all products that were produced from "bad" input products, together with the customers who received them and the suppliers that delivered them. (This was discussed in "How to include multiple leave-nodes in a query that returns a path?".)
Now we want to include a date or date-range in the query.
Options:
- Option 1: The date can be modelled as a property on a node or relationship. NB: in this case we prefer the relationship, as that better describes a transition.
- Option 2: A product node is related to a day or date node in a time-tree.
Option 1 seems to be the more 'natural' solution, as the date is stored where you would expect it. We learned that Neo4j can now index properties on a relationship, so this could also be a sufficiently fast solution.
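To make option 1 concrete, here is a rough sketch of how we imagine the index and a date-range query, wrapped in Python so the parameters are easy to build. The `Product` and `Customer` labels, the `DISPATCHED` relationship type, the `date` property and the index name are placeholders and not our real schema, and the sketch assumes the property is stored as a Cypher `DATE` value.

```python
from datetime import date

# Hypothetical schema: the Product/Customer labels, the DISPATCHED type and
# the `date` property are assumptions for illustration only.
CREATE_INDEX = (
    "CREATE INDEX dispatched_date IF NOT EXISTS "
    "FOR ()-[r:DISPATCHED]-() ON (r.date)"
)

# Inclusive date-range filter on the relationship property.
RANGE_QUERY = (
    "MATCH (p:Product)-[r:DISPATCHED]->(c:Customer) "
    "WHERE r.date >= date($start) AND r.date <= date($end) "
    "RETURN p, c, r.date AS dispatched_on"
)


def range_params(start: date, end: date) -> dict:
    """Build the parameters for RANGE_QUERY as ISO-8601 strings."""
    if start > end:
        raise ValueError("start must not be after end")
    return {"start": start.isoformat(), "end": end.isoformat()}


params = range_params(date(2023, 1, 2), date(2023, 1, 6))
print(RANGE_QUERY)
print(params)
```

The query text and the parameters would then be handed to whatever driver is in use. Whether the planner actually picks the relationship index for this query is something we would still need to check with `EXPLAIN`.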
But we also read that in Neo4j it is faster to follow a relationship than to read a property, which led us to option 2: using a time-tree. The query becomes more verbose, but is probably faster.
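For comparison, this is roughly how we picture a date-range query against a time-tree. It is only a sketch under assumptions: the year/month/day layout, the `HAS_MONTH`, `HAS_DAY` and `PRODUCED_ON` relationship types and the `value` property are made-up names, and expanding the range into one entry per day is just one possible way to handle ranges.

```python
from datetime import date, timedelta

# Hypothetical time-tree layout (all labels and types are assumptions):
#   (:Year {value})-[:HAS_MONTH]->(:Month {value})-[:HAS_DAY]->(:Day {value})
#   (:Product)-[:PRODUCED_ON]->(:Day)
TIME_TREE_QUERY = (
    "UNWIND $days AS t "
    "MATCH (:Year {value: t.year})-[:HAS_MONTH]->(:Month {value: t.month})"
    "-[:HAS_DAY]->(d:Day {value: t.day}) "
    "MATCH (p:Product)-[:PRODUCED_ON]->(d) "
    "RETURN p, t"
)


def days_in_range(start: date, end: date) -> list:
    """Return one {year, month, day} map per calendar day, both ends included."""
    if start > end:
        raise ValueError("start must not be after end")
    count = (end - start).days + 1
    return [
        {"year": d.year, "month": d.month, "day": d.day}
        for d in (start + timedelta(days=i) for i in range(count))
    ]


days = days_in_range(date(2023, 1, 30), date(2023, 2, 2))
print(TIME_TREE_QUERY)
print(days)
```

The list would be passed as the `$days` parameter. Compared with a plain property filter, every day in the range costs an extra walk down the tree before the products are reached, which is the verbosity we mean.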
A drawback of option 2 may be that the day nodes become so-called super-nodes: approx. 100,000 to 200,000 products (all green nodes in the image) would be linked to a day node, for each working day of the year. Another drawback is that this approach requires some additional nodes, like the transport node, that we did not originally need.
The super-node problem can be mitigated (a bit) by adding hour nodes under each day node. But our business is only interested in queries at day level, so this may not be an improvement after all...
What is your experience with option 1 or 2? Or what are your thoughts?
And are there more alternatives?
