Creating a network of user journeys

Hi,

I'm looking for advice on how to model user journeys on a website like Amazon. There are thousands of product pages a user can visit, and a user can have many sessions, each consisting of clicks and visits to many product pages.

The dataset has a time series of user visits that looks like this.

    id     event    time_spent_seconds  session
    user1  page1                     1        1
    user1  payment                  10        1
    user1  page3                     2        2
    user1  page4                     5        2
    user2  page4                     1        3
    user2  page1                     4        3
    user2  payment                   3        3
    user2  logout                    9        3
    user2  page5                     4        4
    user2  page4                     5        4
    user2  page5                     6        5

We would like to store this in a Neo4j database so that we can analyse it and create reports like:

  • Which actions were most common after a given starting point
  • Based on past history, where the user is likely to go next
  • Average amount of time spent on a given page and between two given actions
  • Whether a particular page is causing friction in the user journey (e.g. perhaps the page is taking too long to load)
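
To make the first and third reports concrete, here is a minimal plain-Python sketch over the sample rows above. It is only an illustration of what the reports compute, not a Neo4j query. It assumes the rows are already in chronological order within each session (the table has no timestamp column), and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Sample rows copied from the table above:
# (id, event, time_spent_seconds, session).
# Assumption: rows are in chronological order within each session.
ROWS = [
    ("user1", "page1", 1, 1),
    ("user1", "payment", 10, 1),
    ("user1", "page3", 2, 2),
    ("user1", "page4", 5, 2),
    ("user2", "page4", 1, 3),
    ("user2", "page1", 4, 3),
    ("user2", "payment", 3, 3),
    ("user2", "logout", 9, 3),
    ("user2", "page5", 4, 4),
    ("user2", "page4", 5, 4),
    ("user2", "page5", 6, 5),
]


def transitions(rows):
    """Yield (event, next_event) for consecutive rows of the same user and session."""
    for prev, curr in zip(rows, rows[1:]):
        same_user = prev[0] == curr[0]
        same_session = prev[3] == curr[3]
        if same_user and same_session:
            yield prev[1], curr[1]


def next_action_counts(rows, start):
    """Count which events directly follow `start` within a session."""
    return Counter(nxt for event, nxt in transitions(rows) if event == start)


def average_time_per_event(rows):
    """Average time_spent_seconds per event across all users and sessions."""
    totals = defaultdict(lambda: [0, 0])  # event -> [sum of seconds, visit count]
    for _, event, seconds, _ in rows:
        totals[event][0] += seconds
        totals[event][1] += 1
    return {event: total / count for event, (total, count) in totals.items()}


if __name__ == "__main__":
    print(next_action_counts(ROWS, "page1").most_common())
    print(average_time_per_event(ROWS))
```

Whatever graph model is chosen, it needs to make these two lookups cheap: walking from one event to the next within a session, and aggregating time spent per page.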

What is the best way to store this in a graph database? My thinking is that each session of each individual user would be one network in the database.
Each [:CLICK] relationship would store the link clicked, and the time spent on a page would be stored as a property of the corresponding (page) node.
(user) nodes would contain the details of the user, such as gender, phone number and location.

(user1)-[:CLICK]->(page1)-[:CLICK]->(payment)
(user1)-[:CLICK]->(page3)-[:CLICK]->(page4)
(user2)-[:CLICK]->(page4)-[:CLICK]->(page1)-[:CLICK]->(payment)-[:CLICK]->(logout)
(user2)-[:CLICK]->(page5)-[:CLICK]->(page4)
(user2)-[:CLICK]->(page4)-[:CLICK]->(page5)

User actions like scrolling on a page, etc., are not stored.

Is this a good design? What is the best way to store this in Neo4j so that analyses like the ones above are as efficient as possible?

Take a look at "Loading and analyzing Snowplow event data in Neo4j" (Snowplow).

This is great! Thanks a lot!

Is it possible to share a sample of what user_nodes.csv, page_nodes.csv and next_relationships.csv look like?

It shouldn't be a big deal. Users can have additional information; pages need their URL and the user id, so you can link them to the user. Once you have that, you can order the events by user and date, and use apoc.nodes.link to create a NEXT relationship between consecutive events.
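
As a rough illustration of the shape those three files could take, here is a small Python sketch that derives them from the sample table in the question. The column names (visit_id, from_visit_id, to_visit_id, and so on) are assumptions made up for this example, not the layout used in the Snowplow article, and since the sample has no date column the rows are assumed to be in chronological order already.

```python
import csv
import io

# Sample rows from the question: (id, event, time_spent_seconds, session).
# Assumption: rows are already ordered by user and time.
EVENTS = [
    ("user1", "page1", 1, 1),
    ("user1", "payment", 10, 1),
    ("user1", "page3", 2, 2),
    ("user1", "page4", 5, 2),
    ("user2", "page4", 1, 3),
    ("user2", "page1", 4, 3),
    ("user2", "payment", 3, 3),
    ("user2", "logout", 9, 3),
    ("user2", "page5", 4, 4),
    ("user2", "page4", 5, 4),
    ("user2", "page5", 6, 5),
]


def build_tables(events):
    """Return (users, pages, nexts) as lists of dicts, one dict per CSV row."""
    users = [{"user_id": uid} for uid in sorted({row[0] for row in events})]
    # One page row per visit, carrying the user id so it can be linked back.
    pages = [
        {
            "visit_id": index,
            "user_id": uid,
            "page": page,
            "time_spent_seconds": seconds,
            "session": session,
        }
        for index, (uid, page, seconds, session) in enumerate(events)
    ]
    # NEXT links only consecutive visits of the same user in the same session.
    nexts = [
        {"from_visit_id": a["visit_id"], "to_visit_id": b["visit_id"]}
        for a, b in zip(pages, pages[1:])
        if a["user_id"] == b["user_id"] and a["session"] == b["session"]
    ]
    return users, pages, nexts


def to_csv(rows):
    """Serialise a non-empty list of dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    users, pages, nexts = build_tables(EVENTS)
    print(to_csv(users))   # would be saved as user_nodes.csv
    print(to_csv(pages))   # would be saved as page_nodes.csv
    print(to_csv(nexts))   # would be saved as next_relationships.csv
```

If the events are loaded as nodes first, the last table is not strictly needed, because apoc.nodes.link can create the NEXT relationships from an ordered list of nodes inside the database.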

More in the documentation: Creating Data - APOC Extended Documentation

Sure, will try this out.

I was getting a bit confused reading the queries in the link you gave earlier, but it is clear now. I'll try apoc.nodes.link as well.

Thanks once again.

There was also a recent talk by @adam_cowley on the topic at NODES2021

Thanks Michael. This is very helpful!