Date list as relationship property - create a Student Journey

Dineshramk · January 15, 2023, 11:36am

I am building a time-series-based pattern identification project using Neo4j. Below is a sample schema of the graph I have created.

These are the distinct node counts per label:

Student: 30000, Sports: 45, Academics: 20, Extracurricular: 30

These are the relationship counts between the four labels:

STUDIES: 62000, PLAYS: 35000, PERFORMS: 41000

[Image: Screen Shot 2023-01-15 at 4.47.09 PM.png]

I would like to find patterns of students performing similar activities within a given period of time, and to predict which activities they may perform next.

I am trying to achieve a time-series-based model like the one below:

[Image: Screen Shot 2023-01-15 at 4.57.15 PM.png]

Doing this with a regular time-series approach is challenging because of the large number of nodes in my actual project.

Please suggest some achievable solutions using Neo4j and applicable GDS algorithms that I can implement for this problem.

glilienfield · January 15, 2023, 1:54pm

I think you are going to have difficulties with your data model, because you are storing the dates in a list. Instead, create a new relationship for each month a student participated in an activity, with the date as a relationship property. Recent versions of Neo4j (4.3 and later) support indexes on relationship properties, so you can leverage that to quickly find all interactions for a date or range of dates.
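As a minimal sketch of the indexing idea, assuming each relationship carries a single `date` property stored as a Cypher `Date` value and that activity nodes have a `name` property (the index names are made up for illustration). Relationship property indexes are declared per relationship type, so one index per type is needed:

```cypher
// One index per relationship type; the index names are illustrative only.
CREATE INDEX plays_date IF NOT EXISTS FOR ()-[r:PLAYS]-() ON (r.date);
CREATE INDEX studies_date IF NOT EXISTS FOR ()-[r:STUDIES]-() ON (r.date);
CREATE INDEX performs_date IF NOT EXISTS FOR ()-[r:PERFORMS]-() ON (r.date);

// All PLAYS interactions in January 2023 (half-open date range).
// The planner may choose the plays_date index for this predicate.
MATCH (s)-[r:PLAYS]->(e)
WHERE r.date >= date('2023-01-01') AND r.date < date('2023-02-01')
RETURN s.name AS student, e.name AS activity, r.date AS date
ORDER BY date;
```

Prefixing the query with `EXPLAIN` shows whether the index is actually picked up on your data.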

Addressing your timeline requirement would be easier with my suggested data model. You can get the timeline data for a specific user as follows.

match(u:User{id: 100})
match(u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
return u.name as student, r.date as date, collect(e.name) as activities
order by date

The above will return a row for each date, with a list of the activities the user participated in on that day.

Having the dates in a list will make them difficult to search and sort by.

This is just one option. There are others; the best one is whichever lets you answer your analytic questions.

Dineshramk · January 15, 2023, 5:28pm

Hi @glilienfield,

Thanks for the quick response. I have a couple of questions about your suggestion.

1. Can we have multiple relationships between the same two nodes, each with different property values (i.e., 'date')?

2. Does exploding the relationship property ('month') from a list into individual relationships affect the performance of the graph? And what is the maximum number of relationships allowed in the Community Edition?

Thanks in advance.

glilienfield · January 15, 2023, 7:55pm

Answers:

1. You can have as many relationships between the same two nodes as needed. They can be exactly identical too.
2. It will hurt performance in some scenarios and help in others. In your scenario, the Cypher needs to retrieve all the relationships of these types for a specific person and group them by date, so that you get the activities for each day. Below are queries for each data model. The trouble with the list model is searching and filtering by date in queries, as the list has to be iterated through each time to evaluate a filter predicate.

The best solution depends on your needs: choose whichever lets you answer your analytic questions efficiently.

Query for relationships with single date:

match(u:User{id: 100})
match(u)-[r:PERFORMS|STUDIES|PLAYS]->(e) 
return u.name as student, r.date as date, collect(e.name) as activities
order by date

Query for relationships with list of dates:

match(u:User{id: 100})
match(u)-[r:PERFORMS|STUDIES|PLAYS]->(e) 
with u, e, r
unwind r.dates as date
return u.name as student, date, collect(e.name) as activities
order by date
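Moving from the list model to the one-relationship-per-date model could be sketched as follows. This is only a sketch under assumptions: it assumes the existing relationships hold their dates in a `dates` list property, and it is shown for `PLAYS` only. Plain Cypher cannot parameterize the relationship type, so the same two statements would be repeated for `STUDIES` and `PERFORMS`.

```cypher
// Step 1: create one PLAYS relationship per entry in the dates list.
MATCH (u)-[r:PLAYS]->(e)
WHERE r.dates IS NOT NULL
UNWIND r.dates AS d
CREATE (u)-[:PLAYS {date: d}]->(e);

// Step 2: remove the original list-carrying relationships.
// The new relationships have no `dates` property, so they are not matched.
MATCH ()-[r:PLAYS]->()
WHERE r.dates IS NOT NULL
DELETE r;
```

It is worth trying this on a copy of the data first and comparing relationship counts before and after.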

Dineshramk · January 22, 2023, 6:01pm

Hi @glilienfield ,

Apologies for the delayed reply.

Thanks for your suggestions. I have recreated the data structure and modified the graph schema accordingly. I have also added a NEXT relationship between the events so that they form a chain.

As a next step, could you please point me to the GDS algorithms that best address the journey identification challenge?

Thanks a lot for your help!!

glilienfield · January 22, 2023, 6:35pm

Glad you have made progress. I have to apologize; I am not a GDS user, so I am not familiar with the algorithms. You can find them with the link below. Maybe the node similarity algorithm would be a place to start.

https://neo4j.com/docs/graph-data-science/current/algorithms/

I would expect you to project a filtered version of your graph that contains only the nodes and relationships that fall within your time frame.
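A rough sketch of that idea, not something I have run against this data: it assumes the `Student`, `Sports`, `Academics` and `Extracurricular` labels from the original post, a `date` property of type `Date` on each relationship, and a `name` property on students. The graph name `activity2022` and the date range are made up. It uses the legacy Cypher projection `gds.graph.project.cypher`, which is available in GDS 2.x but deprecated in later releases, so check the documentation for your version.

```cypher
// Project only Student -> activity links that fall inside the time frame.
CALL gds.graph.project.cypher(
  'activity2022',
  'MATCH (n) WHERE n:Student OR n:Sports OR n:Academics OR n:Extracurricular
   RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (s:Student)-[r:PLAYS|STUDIES|PERFORMS]->(a)
   WHERE r.date >= date("2022-01-01") AND r.date < date("2023-01-01")
   RETURN DISTINCT id(s) AS source, id(a) AS target'
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;

// Node similarity compares students by the activities they link to.
CALL gds.nodeSimilarity.stream('activity2022', {topK: 5})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS student1,
       gds.util.asNode(node2).name AS student2,
       similarity
ORDER BY similarity DESC
LIMIT 20;

// Free the in-memory projection when done.
CALL gds.graph.drop('activity2022');
```

The `DISTINCT` in the relationship query collapses repeated participation in the same activity into a single link, so each student is described by a set of activities.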

Dineshramk · January 22, 2023, 7:10pm

Thanks @glilienfield, I will explore the GDS algorithms and keep this thread updated with my work.

As a first step, I am taking hints from @alicia_frame's project:
https://github.com/AliciaFrame/GDS_Patient_Journey

cobra · January 27, 2023, 3:30pm

Hello @Dineshramk

To tell you which algorithm to use, I need to know what you want to do. What questions are you trying to answer?

Regards,
Cobra

Dineshramk · February 3, 2023, 3:41pm

@Cobra

I am trying to:

1. Detect communities of students with similar patterns of activities across multiple years.

2. Predict which students will register for an activity, based on similarity with other students' prior activities.

3. Rank activities based on the number of students taking part, leaving after a certain time, etc.

Kindly let me know if you require further details.
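For the third goal, a plain Cypher aggregation may already be a reasonable starting point before reaching for GDS. A minimal sketch, assuming one relationship per date with a `date` property and a `name` property on activity nodes. It covers participation counts only, not drop-off over time:

```cypher
// Rank activities by the number of distinct students taking part.
MATCH (s)-[r:PLAYS|STUDIES|PERFORMS]->(a)
RETURN labels(a)[0] AS category,
       a.name AS activity,
       count(DISTINCT s) AS students,
       min(r.date) AS firstSeen,
       max(r.date) AS lastSeen
ORDER BY students DESC;
```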
