Hi,
I'm looking for advice on how to model user journeys on a website like Amazon. There are thousands of product pages a user can visit, and a user can have many sessions, where each session can consist of clicks and visits to many product pages.
The dataset is a time series of user visits that looks like this:
id     event    time_spent_seconds  session
user1  page1    1                   1
user1  payment  10                  1
user1  page3    2                   2
user1  page4    5                   2
user2  page4    1                   3
user2  page1    4                   3
user2  payment  3                   3
user2  logout   9                   3
user2  page5    4                   4
user2  page4    5                   4
user2  page5    6                   5
We would like to store this in a Neo4j database so that we can analyse it and create reports such as:
- Which actions are most common after a given starting point
- Based on past history, where the user is likely to go next
- The average amount of time spent on a given page and between two given actions
- Whether a particular page is causing friction in the user journey (e.g. the page is taking too long to load)
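To make the first and third reports concrete, here is a rough Python sketch of the numbers I would expect from the sample rows above. The function names (`next_action_counts`, `avg_time_per_event`) are just illustrative, and it assumes the rows are already in time order; the real thing would presumably be a Cypher query over the graph.

```python
from collections import Counter, defaultdict

# Sample rows from the table above: (id, event, time_spent_seconds, session)
rows = [
    ("user1", "page1", 1, 1),
    ("user1", "payment", 10, 1),
    ("user1", "page3", 2, 2),
    ("user1", "page4", 5, 2),
    ("user2", "page4", 1, 3),
    ("user2", "page1", 4, 3),
    ("user2", "payment", 3, 3),
    ("user2", "logout", 9, 3),
    ("user2", "page5", 4, 4),
    ("user2", "page4", 5, 4),
    ("user2", "page5", 6, 5),
]

def next_action_counts(rows):
    """For each event, count which event follows it within the same user session."""
    counts = defaultdict(Counter)
    for prev, curr in zip(rows, rows[1:]):
        # Only consecutive rows of the same user and session form a transition.
        if prev[0] == curr[0] and prev[3] == curr[3]:
            counts[prev[1]][curr[1]] += 1
    return counts

def avg_time_per_event(rows):
    """Average time_spent_seconds per event across all users and sessions."""
    totals = defaultdict(lambda: [0, 0])  # event -> [total seconds, visit count]
    for _, event, seconds, _ in rows:
        totals[event][0] += seconds
        totals[event][1] += 1
    return {event: total / n for event, (total, n) in totals.items()}

print(next_action_counts(rows)["page1"].most_common(1))
print(avg_time_per_event(rows)["payment"])
```

With the sample data, `payment` is the only action seen after `page1` (twice), and the average time on `payment` is 6.5 seconds.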
What is the best way to store this in a graph database? I'm thinking that each session of each individual user would be its own network (path) in the database.
Each [:CLICK] relationship would contain the link clicked, and the time spent on a page would be stored as a property of the corresponding (page) node.
(user) nodes would contain the details of the user, such as gender, phone number and location.
(user1)-[:CLICK]->(page1)-[:CLICK]->(payment)
(user1)-[:CLICK]->(page3)-[:CLICK]->(page4)
(user2)-[:CLICK]->(page4)-[:CLICK]->(page1)-[:CLICK]->(payment)-[:CLICK]->(logout)
(user2)-[:CLICK]->(page5)-[:CLICK]->(page4)
(user2)-[:CLICK]->(page4)-[:CLICK]->(page5)
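For what it's worth, this is roughly how I imagine deriving such chains from the rows of the table: a minimal Python sketch that uses only user1's rows and assumes the rows are sorted by time so that each session is contiguous. `session_paths` is a made-up helper name, and it only builds the pattern strings; it does not talk to Neo4j.

```python
from itertools import groupby

# user1's rows from the sample table: (id, event, time_spent_seconds, session)
rows = [
    ("user1", "page1", 1, 1),
    ("user1", "payment", 10, 1),
    ("user1", "page3", 2, 2),
    ("user1", "page4", 5, 2),
]

def session_paths(rows):
    """Build one (user)-[:CLICK]->(event)-... chain per (user, session).

    Assumes rows are sorted by time, so each session's rows are adjacent.
    """
    paths = []
    for (user, _session), group in groupby(rows, key=lambda r: (r[0], r[3])):
        nodes = [user] + [event for _, event, _, _ in group]
        paths.append("-[:CLICK]->".join(f"({name})" for name in nodes))
    return paths

for path in session_paths(rows):
    print(path)
```

For user1 this prints the same two chains as listed above, one per session.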
User actions such as scrolling on a page are not stored.
Is this a good design? What is the best way to store this in Neo4j so that analyses like the ones above are as efficient as possible?