Below is the relationship count formed between the 4 labels,
STUDIES:62000, PLAYS:35000, PERFORMS:41000
I would like to find a pattern of students performing similar activities in a set period of time and what will be the next set of activities they may perform.
I am trying to achieve a time series based model like below,
Doing the same with a regular time-series approach is challenging because of the large number of nodes in my actual project.
Please provide some achievable solutions using Neo4j and applicable GDS Algorithms that I can implement for the above problem.
I think you are going to have difficulties with your data model, because you are storing the dates in a list. Instead, create a new relationship for each month a student participated in an activity, with the date as a relationship property. Recent versions of Neo4j (4.3 and later) support indexes on relationship properties, so you can leverage that to find all interactions for a date or range of dates quickly.
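A minimal sketch of that model, assuming Neo4j 4.3 or later and that `date` is stored as a Cypher date value. The `Activity` label, the activity name 'Drama', and the index name `performs_date` are placeholders, since the thread does not name them.

```cypher
// One PERFORMS relationship per participation date (label and names are illustrative).
MATCH (u:User {id: 100}), (a:Activity {name: 'Drama'})
CREATE (u)-[:PERFORMS {date: date('2022-01-01')}]->(a);

// Relationship property index on the date; one index is needed per relationship type.
CREATE INDEX performs_date IF NOT EXISTS
FOR ()-[r:PERFORMS]-() ON (r.date);

// Range lookup over one month; the planner may use the index for this predicate.
MATCH (u:User)-[r:PERFORMS]->(a)
WHERE r.date >= date('2022-01-01') AND r.date < date('2022-02-01')
RETURN u.name AS student, r.date AS activityDate, a.name AS activity
ORDER BY activityDate;
```

The same index statement would have to be repeated for STUDIES and PLAYS if those relationships are filtered by date as well.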
Addressing your timeline requirement would be easier with my suggested data model. You can get the timeline data for a specific user as follows.
MATCH (u:User {id: 100})
MATCH (u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
RETURN u.name AS student, r.date AS date, collect(e.name) AS activities
ORDER BY date
The above will return a row for each date, with a list of the activities the user participated in on that date.
Having the dates in a list makes them difficult to search and sort by.
This is just one option. There are others; the best one is whichever lets you answer your analytic questions.
Thanks for the quick response. I have a couple of questions about your suggestion.
1. Can we have multiple relationships between the same two nodes, but with different properties (i.e., 'date')?
2. When exploding the relationship property ('month') from a list into individual relationships, does it affect the performance of the graph? And what is the maximum number of relationships in the Community Edition?
1. You can have as many relationships between the same two nodes as needed. They can even be exactly identical.
2. It will have a negative impact in some scenarios and a positive impact in others. In your scenario, you need Cypher that retrieves all the relationships of these types for a specific person and groups them by date, so you get the activities for each day. Below are solutions for each data model. The trouble with the list model is searching and filtering by date in queries, as the list has to be iterated through each time to evaluate a filter predicate.
The best solution depends on your needs: choose whichever lets you answer your analytic questions efficiently.
Query for relationships with single date:
MATCH (u:User {id: 100})
MATCH (u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
RETURN u.name AS student, r.date AS date, collect(e.name) AS activities
ORDER BY date
Query for relationships with list of dates:
MATCH (u:User {id: 100})
MATCH (u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
WITH u, e, r
UNWIND r.dates AS date
RETURN u.name AS student, date, collect(e.name) AS activities
ORDER BY date
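If you decide to move from the list model to one relationship per date, the conversion could look roughly like the sketch below. It assumes the existing relationships carry a list property named `dates`, as in the query above, and it shows only PERFORMS; the same statement would be repeated for STUDIES and PLAYS.

```cypher
// Create one PERFORMS relationship per entry in the old `dates` list,
// then delete the original list-carrying relationship.
MATCH (u:User)-[r:PERFORMS]->(e)
WHERE r.dates IS NOT NULL
UNWIND r.dates AS d
CREATE (u)-[:PERFORMS {date: d}]->(e)
WITH DISTINCT r
DELETE r;
```

A relationship whose `dates` list is empty produces no rows from UNWIND, so it is left in place and would need separate cleanup. On a graph with many relationships it is probably worth running this in batches rather than as a single transaction.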
Apologies for the delayed reply.
Thanks for your suggestions. I have recreated the data structure and modified the graph schema accordingly. I have also added a NEXT relationship between the events so that they form a chain.
As a next step, could you please point me to the GDS algorithms that best address the journey identification challenge?
Glad you have made progress. I have to apologize; I am not a GDS user, so I am not familiar with the algorithms. You can find them with the link below. Maybe the node similarity algorithm would be a place to start.
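As a starting point only, a Node Similarity run might look like the sketch below. It assumes GDS 2.x procedure names (`gds.graph.project`; 1.x used `gds.graph.create`), that the nodes have a `name` property as in the queries above, and a hypothetical projection name `studentActivities`.

```cypher
// Project all nodes and the three activity relationship types into memory.
CALL gds.graph.project(
  'studentActivities',
  '*',
  ['PERFORMS', 'STUDIES', 'PLAYS']
);

// Compare nodes by the overlap of their outgoing neighbours (the activities).
CALL gds.nodeSimilarity.stream('studentActivities', {topK: 5})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS student1,
       gds.util.asNode(node2).name AS student2,
       similarity
ORDER BY similarity DESC;

// Release the in-memory projection when done.
CALL gds.graph.drop('studentActivities');
```

This compares students by which activities they share and ignores the dates and the NEXT chain, so it addresses the "similar activities" part of the question rather than predicting the next activity.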