Pros and cons of different ways to model events?

Hi all,

I would appreciate your thoughts on how best to model the date of events.

A simplified schema of our graph is shown below. It includes the two options we are currently considering for modelling the date of an event.
For our purposes, an event is the date(time) at which a product arrived, was produced, was altered (e.g. packaged) or was dispatched.
Our goal is to find all products that were produced from "bad" input products, together with the customers who received them and the suppliers that delivered them (this was discussed in "How to include multiple leave-nodes in a query that returns a path?").
Now we want to include a date or date-range in the query.

Options:

  1. The date is modelled as a property on a node or a relationship.
    NB: in this case we prefer the relationship, as that better describes a transition.
  2. A product node is related to a day or date node in a time-tree.

Option 1 seems to be the more 'natural' solution, as the date is stored where you expect it. We learned that Neo4j can now index properties on a relationship, so this could also be a sufficiently fast solution.
But we also read that in Neo4j it is faster to follow a relationship than to read a property, which led us to option 2: using a time-tree. The query will become more verbose, but will probably be faster.
A drawback of option 2 may be that the day nodes will become so-called super-nodes (with approx. 100,000 to 200,000 products (all green nodes in the image) linked to a day node, for each working day of the year). Another drawback is that this approach requires some additional nodes, like the transport node, that we originally did not need.
The super-node problem can be mitigated (a bit) by adding hour nodes below each day node. But our business is only interested in queries at day level, so this may not be an improvement after all...
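
To make the comparison a bit more concrete, here is a rough in-memory sketch in Python. It is not Neo4j and says nothing about real query performance. The event data, the names (`EVENTS`, `products_option1`, `build_day_nodes`, `products_option2`) and the dates are all made up for illustration. It only shows that both options can answer the same date-range question, and where the many relationships per day node come from in option 2.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy data: (product, event type, target, event date).
EVENTS = [
    ("p1", "DISPATCHED", "customerA", date(2021, 3, 1)),
    ("p2", "DISPATCHED", "customerB", date(2021, 3, 2)),
    ("p3", "DISPATCHED", "customerA", date(2021, 3, 2)),
    ("p4", "DISPATCHED", "customerC", date(2021, 3, 5)),
]


def products_option1(events, start, end):
    """Option 1: the date is a property on the relationship; filter on it."""
    return sorted(p for p, _type, _target, d in events if start <= d <= end)


def build_day_nodes(events):
    """Option 2: link every product to a day node (a leaf of the time-tree)."""
    day_nodes = defaultdict(set)
    for p, _type, _target, d in events:
        day_nodes[d].add(p)
    return day_nodes


def products_option2(day_nodes, start, end):
    """Option 2: visit the day nodes in range and follow their relationships."""
    found = set()
    for day, products in day_nodes.items():
        if start <= day <= end:
            found |= products
    return sorted(found)


day_nodes = build_day_nodes(EVENTS)
start, end = date(2021, 3, 2), date(2021, 3, 5)

result1 = products_option1(EVENTS, start, end)
result2 = products_option2(day_nodes, start, end)

# Number of products attached to each day node. In the real graph this is
# the count that would reach 100,000 to 200,000 per working day.
degrees = {day: len(products) for day, products in day_nodes.items()}

print(result1, result2, degrees)
```

In this toy version option 1 has to look at every relationship unless an index helps, while option 2 only touches the day nodes in range but concentrates all products of a day on a single node.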

What is your experience with option 1 or 2? Or what are your thoughts?
And are there more alternatives?

Hi,

I'm not sure I understand that 2nd design, where date/times are "first class citizens" - for that, I would probably use a time-series database that is optimised for that type of problem.

IMHO:

  • you have the nodes anyway and you can just add properties to them (createdAt, dispatchedAt, packagedAt, etc.). I'm not sure what value you get from decoupling time as an entity.
  • you will probably have supernodes anyway - on a farm/supplier/crop/batch/customer level?

So I would say the 1st design makes more sense.

I think the answer depends on how you think you will query the graph. What paths will you take?

The time tree can be useful in some scenarios; you can even have a “Product Day” node if you want to reduce the density of your Day nodes. A time tree is helpful if many of your questions are along the lines of “given x happened on day z, what happened before/after?”.

If your questions are more like “how was the delivered transport sourced, and when?”, then you would benefit from having the time on the crop/batch/product … nodes instead.

Thanks both for your input. We have decided to check both options and are currently creating the data model and loading the database with sufficient data to perform some meaningful queries. The performance of the queries will be important to us.