If you are looking for the list of ordered events for a specific day, you could do that with the following query, using the model above. You could change the Date node to an Hour node if you had that relationship and wanted a single hour. That approach seems restrictive, though, as you could not get events that cross hour boundaries. Another approach is to perform a range search on an event timestamp. This should be fast if the timestamp is indexed. The code below gets the user's events on a specific date, orders them by time, and passes them along with a 'with' clause so they can be joined with other related data for additional analysis around the events.
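// find the user and the date of interest
match (u:User) where u.id = 1000
match (d:Date) where d.date = date('2022-01-02')
// the colon is required; without it, HAS_USER is a variable name, not a relationship type
match (u)<-[:HAS_USER]-(e:Event)-[:OCCURED_ON]->(d)
with e
order by e.timestamp
match (e)-->(x:OTHER_STUFF)
return e, x
// order again here, since row order is not guaranteed to survive the later match
order by e.timestamp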
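For the range-search alternative, here is a rough sketch. It assumes e.timestamp is stored as a datetime value and that the index name event_timestamp is free to use; both are my assumptions, not something I know about your model, and the syntax is for Neo4j 4.x or later.

```cypher
// Assumption: Event.timestamp holds a datetime; the index name is hypothetical.
CREATE INDEX event_timestamp IF NOT EXISTS
FOR (e:Event) ON (e.timestamp);

// All of the user's events on 2022-01-02, ordered by time.
// The half-open range [start, end) also works for windows that cross hour boundaries.
MATCH (u:User {id: 1000})<-[:HAS_USER]-(e:Event)
WHERE datetime('2022-01-02T00:00:00') <= e.timestamp < datetime('2022-01-03T00:00:00')
RETURN e
ORDER BY e.timestamp;
```

Whether the planner actually uses the index depends on where it chooses to start the match, so it is worth checking with EXPLAIN or PROFILE.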
The above is just an idea, but let's concentrate on your cypher. Sorry, I am still confused about the data model and your need. It sounds like you have a list of user events that are ingested each day and that this process is time consuming. I assumed each event had a time associated with it and that you were adding these events and wanted to form a 'linked list' of the events ordered by time. As such, don't you have a new relationship for each new event? If all the PRECEEDS relationships are an issue, can you do without them? You can get the order of the events from a timestamp (which should be indexed). Having them linked would be useful if you wanted to traverse the list of events from a specific event in either direction; otherwise, it seems an event timestamp would provide the functionality to order a list of events. Do you have a need to traverse a list of connected events?
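To illustrate the one case where the linked list does help, here is a sketch of walking forward from a single event. It assumes PRECEEDS points from the earlier event to the later one and that events carry an id property; the id value 42 and the 10-hop limit are placeholders I made up.

```cypher
// Assumptions: (earlier)-[:PRECEEDS]->(later); Event.id exists; 42 is a placeholder.
MATCH path = (start:Event {id: 42})-[:PRECEEDS*1..10]->(later:Event)
RETURN later
ORDER BY length(path);
```

If you never need this kind of traversal, ordering by the timestamp alone should be enough.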
I need to understand the query a little more in order to assist. I am confused about what you are trying to achieve. I assume each event has just one HAS_HOUR relationship to one Hour node, as an event happens within a specific hour. If true, the 'with' clause on line 4 will only have one row resulting from the match on line 2; therefore, the 'min' aggregation will be over just one Hour node and the result will be equal to the hour/minute associated with event 'e'. Since the 'with' clause is grouping by 'e' and the aggregation is on 't', the values of 'e2' and 't2' are not used and those matches can be removed. Is my understanding of the relationships between 'e' and 't' correct or incorrect?
The match on line 5 also confuses me. Here you are getting the pairs [e, e2] that are related by a PRECEEDS relationship, then constraining 'e2' to those nodes that have an hour/minute value greater than that of 'e'. I thought the fact that 'e' and 'e2' had a PRECEEDS relationship implied that 'e2' occurred after 'e', thus the 'where' clause would always be true and there is only one possible 'e2' event related to 'e' in this manner.
If this query is run in isolation, from a high level, it seems it gets each pair of [e, e2] nodes connected to each other, then returns the value of 'e' on line 4, then matches it to the same value of 'e2' that would have been matched on line 1, then deletes the relationship between the two. At the end, wouldn't all the PRECEEDS relationships be removed?
I think I am missing some understanding of your data model. I am glad to try to help if you want to take the time to get me to a shared understanding. I am sorry it is taking time and lots of questions.
Just a note: you need to divide the minutes by '60' instead of '100' if you want to convert the {hour, minute} pair to an hour value with a fraction that accounts for the minutes.
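A quick check of the arithmetic, using 9:30 as a made-up example value. Note the float divisor: in Cypher, dividing two integers gives an integer, so 30 / 60 would be 0.

```cypher
// 9:30 should become 9.5 hours.
WITH 9 AS h, 30 AS m
RETURN h + m / 60.0  AS hourWithFraction,  // 9.5
       h + m / 100.0 AS dividedBy100;      // 9.3, which understates the half hour
```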