Find "previous" and "next" node in an events database

I have an events database (that is still active where new events are added / deleted / updated) whereby for a new event that is added, I need to know the "previous" and "next" event so that I can add a "FOLLOWS" relationship between the events, thus, meaning i can "follow" an event to the next event to the next event etc such as:

(Basketball (3pm)) - FOLLOWS -> (Tennis (1pm)) - FOLLOWS -> (Music Concert (11am))

Events can be added in the future but can also be retrospectively updated if in the past.

An event has the following properties:

  • id
  • name
  • startTime (timestamp)
  • venueId

Each event has a "SHOWN_IN" relationship to a "venue", therefore a single venue has many events related to it.

(Football 10am) - SHOWN_IN -> (Venue 1)

There are many historical events for a venue, sometimes up to 15000 events in total, therefore performance is key.

I have tried to write a Cypher query that is quick (100ms) but with a large number of "hits", this is what i would like to resolve.

There are a few caveats i have to also consider:

  1. The previous or next event may not be within X days, there may be significant gaps between events, for example an event might be added 2 years in the future (such as big event) therefore the finding of the next or previous cannot be date bound
  2. If there is an event at the same time (due to a new event replacing an old event where the old event will be deleted) then this needs to be considered and must be in a "FOLLOWS" chain

The query i have so far is as follows:

MATCH(e:Event) WHERE e.id = "123"
MATCH(previous:Event) 
    WHERE previous.venueId = "abc" 
        AND previous.startTime < e.startTime ORDER BY previous.startTime DESC LIMIT 1
MATCH(next:Event) 
    WHERE next.venueId = "abc" 
         AND next.startTime > e.startTime ORDER BY next.startTime ASC LIMIT 1
CREATE (next)-[:FOLLOWS]->(e)
CREATE (e)-[:FOLLOWS]->(previous)

I have ran PROFILE on this query and i can see that the hits are very large, optimally i would like to see this < 100 hits for any size events in a venue if possible

It looks like you are inserting a new event in-between to existing events. If so, you can probably reduce the number of nodes tests by restricting your search to a pattern with connected nodes. Also, after inserting the new event between two existing connected events, don't you have to remove the existing FOLLOWS relationships, since it is no longer true?

The following query assumes an insertion scenario, i.e. connected events exist that are before and after the new event. Don't you have to also account for the case were a new event occurs after the last event in the chain and for the case of a historical event being added before all events in the chain? This could easily be handled if every time you add a new venue, you include two events connected to the venue that are connected by a FOLLOWS relationship and the start node has the earliest possible neo4j date for its time and the end node has the largest possible neo4j date for its time. In this way, every new event is always an insert.

MATCH(e:Event) WHERE e.id = "123"
MATCH(previous:Event)-[r:FOLLOWS]->(next:Event)
WHERE previous.venueId = "abc"
AND previous.startTime < e.startTime
AND next.venueId = "abc"
AND next.startTime > e.startTime
MERGE (next)-[:FOLLOWS]->(e)
MERGE (e)-[:FOLLOWS]->(previous)
WITH r
DELETE r

Sorry, I don't have test data, but it executes without error.

Hi, thanks for your reply, the existing data does NOT have existing FOLLOWS relationships, it is only now that we are looking to retrospectively add these, therefore, unfortunately, MATCH(previous:Event)-[r:FOLLOWS]->(next:Event) will not return any results straight off the bat

Got it, so are you going to run a script to remediate the data to add the FOLLOWS relationship between every Event node? If so, performance is not as critical, but you need a script to do that.

Have you considered adding an index to improve performance during the remediation? An index on the combination of Event label and startTime property should prove helpful.

Are seeking help with a remediation script to add the FOLLOWS relationships?

I see the following steps if that is what you are looking for:

  1. Get all the venues and associated events
  2. For each venue, loop through the pairs sequential events and add the FOLLOWS link between the pair

After remediation, you will need a query to add a new event for a given venue, by inserting it in the chain of events for the given venue?

Does this sound like what you are looking for?

thanks for your reply.

we are not planning on remediating the old data, we are planning on adding these FOLLOWS relationships adhoc (as we get changes on events (either inserts or updates).

We have added an index to improve the performance, specifically on the label and startTime which means that the query run time is ~150ms, however the number of "hits" for the query is still a lot due to the large result set (15000 events per venue). We get hundreds of events a minute so any query that has a large number of hits ends up backing up the service and causing throughput issues.

if you got that many and generated that frequently, can you insert a ':Date' node between venue and event to group events to specific date? Then you would only need to find the correct date and perform the insert only considering the events on the same day?