cancel
Showing results for 
Search instead for 
Did you mean: 

Data Model for Relationships in Time

flavio
Node

I am relatively new to data modelling using Neo4j and I'd like to get some suggestions on how to model relationships in time.
Let's say that John went to ACME school from 01/Aug/1983 to 30/Sep/1988.
Peter also went to ACME school from 01/Jan/1985 to 30/Oct/1990
Mary went to ACME school from 01/Oct/1988 to 30/Oct/1993
In this simple example, Peter shares time with both John and Mary at that school. But Mary doesn't share time with John.
I need to query hundreds of thousands (maybe a few million) of those records looking for overlapping periods between students with a minimum overlap of 180 days.
What would be the best way to model this data in order to have decent performance at query time?
Thanks!

1 REPLY 1

koji
Ninja
Ninja

Hi @flavio

This case is very interesting.

I added test data to the 3 data.

CREATE (acme:School {name:"ACME"})
CREATE (:Student {name:"John"})-[:WENT_TO {startDate:date("1983-08-01"), endDate:date("1988-09-30")}]->(acme)
CREATE (:Student {name:"Peter"})-[:WENT_TO {startDate:date("1985-01-01"), endDate:date("1990-10-30")}]->(acme)
CREATE (:Student {name:"Mary"})-[:WENT_TO {startDate:date("1988-10-01"), endDate:date("1993-10-30")}]->(acme)
// = 180
CREATE (:Student {name:"Student1"})-[:WENT_TO {startDate:date("1980-01-01"), endDate:date("1984-01-28")}]->(acme)
CREATE (:Student {name:"Student2"})-[:WENT_TO {startDate:date("1984-02-01"), endDate:date("1984-07-30")}]->(acme)
CREATE (:Student {name:"Student3"})-[:WENT_TO {startDate:date("1988-04-03"), endDate:date("1993-01-01")}]->(acme)
// < 180
CREATE (:Student {name:"Student4"})-[:WENT_TO {startDate:date("1980-01-01"), endDate:date("1984-01-27")}]->(acme)
CREATE (:Student {name:"Student5"})-[:WENT_TO {startDate:date("1984-02-01"), endDate:date("1984-06-30")}]->(acme)
CREATE (:Student {name:"Student6"})-[:WENT_TO {startDate:date("1988-04-04"), endDate:date("1993-01-01")}]->(acme)

This is the Cypher code for search.
Since there are only a few records, there is no index. If the number of records increases, it is better to add indexes to startDate and endDate.

WITH 180 AS overlap
MATCH (john:Student {name:"John"})-[johnDate:WENT_TO]->(school:School),
      (someone:Student)-[someoneDate:WENT_TO]->(school)
WHERE someoneDate.endDate > johnDate.startDate
  AND someoneDate.startDate < johnDate.endDate
  AND duration.inDays(johnDate.startDate, someoneDate.endDate).days >= overlap
  AND duration.inDays(someoneDate.startDate, johnDate.endDate).days >= overlap
  AND duration.inDays(someoneDate.startDate, someoneDate.endDate).days >= overlap
RETURN someone

John, Student 1, Student 2, and Student 3 were selected.

Nodes 2022
Nodes
NODES 2022, Neo4j Online Education Summit

On November 16 and 17 for 24 hours across all timezones, you’ll learn about best practices for beginners and experts alike.