Web Analytics model using Neo4j

I would like to depict the paths a user follows from one URL to another using Neo4j. I want to produce output like a Sankey diagram. The project I am working on requires output similar to Google Analytics (GA4). Is Neo4j a feasible option?

My current idea is to store each URL as a node and to connect two nodes with a relationship of type "PATH". The traversed path is stored as a relationship property with a value.

For example, if a user moves A->B->C->D, then the relationship between A and B has the property A_B:1, the relationship between B and C has the property AB_C:1, and the relationship between C and D has the property ABC_D:1.
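
In Cypher, recording that single A->B->C->D visit would look roughly like this. It is only a sketch of the model described above: the `label` property on UrlNode matches the query further down, and incrementing the value on each traversal is an assumption about how it is meant to be maintained.

```cypher
// Sketch only: create (or reuse) the four URL nodes and the PATH relationships.
MERGE (a:UrlNode {label: 'A'})
MERGE (b:UrlNode {label: 'B'})
MERGE (c:UrlNode {label: 'C'})
MERGE (d:UrlNode {label: 'D'})
MERGE (a)-[ab:PATH]->(b)
MERGE (b)-[bc:PATH]->(c)
MERGE (c)-[cd:PATH]->(d)
// The property key encodes the path walked so far.
// coalesce() treats a missing property as 0, so the first traversal stores 1.
SET ab.A_B   = coalesce(ab.A_B, 0) + 1,
    bc.AB_C  = coalesce(bc.AB_C, 0) + 1,
    cd.ABC_D = coalesce(cd.ABC_D, 0) + 1;
```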

How can I get the real-time flow directly from the nodes and relationships, given that I need to keep a record of the number of users who have travelled from one URL to the next? For example, if 10 users have landed on URL A, and from A 7 have moved to URL B and 3 have gone to URL C, how do I keep track of this? I have to depict multiple hops in the same way, and my data covers millions of users.

The above model does not give me the desired output when I query it for some edge cases, and for other cases the query time is too long.

The query that I am currently using is as follows:

MATCH (n:UrlNode {label: 'H'})-[r:PATH*]->(m:UrlNode)
WHERE ALL(rel IN r WHERE ANY(key IN keys(rel) WHERE key CONTAINS 'H'))
WITH n, m, r, [rel IN r | keys(rel)] AS allKeys
UNWIND allKeys AS keyList
UNWIND keyList AS key
WITH n, m, r, key
WHERE key CONTAINS 'H'
RETURN n, m, COLLECT(key) AS filteredkeys, r AS rels

Can someone suggest a better data model or a different query?
Any help will be greatly appreciated.

What I would do is create a User node, and then create relationships that include the user ID in the relationship type. I believe your queries should then be more performant, because you're not filtering on property values.

For instance, to find the path for a single user (u:User {id: 123}), you would have a query like the one below:

MATCH (u:User {id: 123})-[r:STARTED]->(n:UrlNode)-[r2:PATH_User_123*]->(n2:UrlNode)
RETURN *;

I would also try to limit the number of hops for any specific user, so that you aren't traversing to an unbounded depth.
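
As a rough sketch, a hop limit can be written directly into the variable-length pattern. The upper bound of 5 is an arbitrary value chosen for illustration; pick whatever depth your diagram actually needs.

```cypher
// Same single-user query, but the traversal stops after at most 5 PATH hops.
// *1..5 means "between 1 and 5 relationships"; 5 is an assumed limit.
MATCH (u:User {id: 123})-[r:STARTED]->(n:UrlNode)-[r2:PATH_User_123*1..5]->(n2:UrlNode)
RETURN *;
```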

If you want to look for multiple user paths, I might try a slightly different model and use a query something like below (adding an index to the relationship property, as well!):

MATCH (u:User {id: 123})-[r:STARTED]->(n:UrlNode)-[r2:PATH*]->(n2:UrlNode)
// r2 is a list of relationships here, so check the property on each element
WHERE ALL(rel IN r2 WHERE rel.userId = 123)
RETURN *;
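
To make that concrete, here is a minimal sketch of the index and of a per-hop count for the "10 users on A, 7 go to B, 3 go to C" case. It assumes each user's hop is stored as its own PATH relationship carrying a userId property, and that UrlNode has the `label` property used in the question. The index name path_user_id is made up for the example. Whether the planner actually uses the index for a given query is worth checking with EXPLAIN or PROFILE.

```cypher
// Index on the relationship property (relationship indexes need Neo4j 4.3 or later).
// path_user_id is a hypothetical name.
CREATE INDEX path_user_id IF NOT EXISTS
FOR ()-[r:PATH]-() ON (r.userId);

// Of the users who left URL 'A', how many went to each next URL?
// count(DISTINCT ...) counts a user once even if they made the same hop twice.
MATCH (a:UrlNode {label: 'A'})-[r:PATH]->(next:UrlNode)
RETURN next.label AS nextUrl, count(DISTINCT r.userId) AS users
ORDER BY users DESC;
```

Each returned row is one link out of A in the Sankey diagram, with users as its width. Repeating the query from each next URL gives the following hops.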