I would like to use Neo4j to depict the paths users follow from one URL to another, and produce output similar to a Sankey diagram. My project needs output like the one Google Analytics (GA4) provides. Is Neo4j a feasible option for this?
My current idea is to store each URL as a node and connect consecutive URLs with relationships of type "PATH". The path traversed so far is stored as a property on each relationship, with a value.
For example, if a user moves A->B->C->D, the relationship between A and B gets the property A_B:1, the relationship between B and C gets AB_C:1, and the relationship between C and D gets ABC_D:1.
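As a sketch of the model described above, one visit A->B->C->D could be recorded as follows. It assumes the UrlNode label and label property used in the query further down, and it also assumes (this is not stated explicitly) that each property value is a per-prefix user counter that is incremented on every visit:

```cypher
// Hypothetical example: record one user's visit A->B->C->D under the described model.
// Assumption: the property value is a per-prefix user counter, so it is incremented.
MERGE (a:UrlNode {label: 'A'})
MERGE (b:UrlNode {label: 'B'})
MERGE (c:UrlNode {label: 'C'})
MERGE (d:UrlNode {label: 'D'})
MERGE (a)-[r1:PATH]->(b)
MERGE (b)-[r2:PATH]->(c)
MERGE (c)-[r3:PATH]->(d)
// Each key encodes the URLs visited before the hop plus the hop's target.
SET r1.A_B = coalesce(r1.A_B, 0) + 1,
    r2.AB_C = coalesce(r2.AB_C, 0) + 1,
    r3.ABC_D = coalesce(r3.ABC_D, 0) + 1
```

Run once against an empty database, this yields exactly the properties from the example (A_B:1, AB_C:1, ABC_D:1). Under this scheme, every distinct prefix that reaches a hop becomes its own property key on that relationship.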
How can I get the flow in real time directly from the nodes and relationships, given that I need to keep a count of the users who travelled from each URL to the next? For example, if 10 users landed on URL A, and from A, 7 moved to URL B and 3 moved to URL C, how do I keep track of this? I need to depict multiple hops like this, and my data covers millions of users.
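The per-hop numbers in a Sankey diagram can be written as (step, source, target, users) rows. This makes the counting requirement concrete. The following self-contained query builds the hypothetical 10-user example in memory, reading no stored data, and aggregates it into those rows. The session lists are illustrative sample data, not a proposed storage format:

```cypher
// Hypothetical sample data: 10 sessions landing on A; 7 continue to B, 3 to C.
WITH [x IN range(1, 7) | ['A', 'B']] + [x IN range(1, 3) | ['A', 'C']] AS sessions
UNWIND sessions AS s
// One row per hop, tagged with the hop's position in the session.
UNWIND range(0, size(s) - 2) AS i
RETURN i AS step, s[i] AS source, s[i + 1] AS target, count(*) AS users
ORDER BY step, users DESC
```

This returns two rows: step 0, A to B with 7 users, and step 0, A to C with 3 users. Longer sessions such as ['A', 'B', 'D'] would also produce step 1 rows, which is the multi-hop shape described above.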
The model above does not give me the desired output for some edge cases, and for other cases the query takes too long.
This is the query I am currently using:
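```cypher
// Follow PATH relationships of one or more hops out of the node labelled 'H'
MATCH (n:UrlNode {label: 'H'})-[r:PATH*]->(m:UrlNode)
// Keep only paths where every hop has at least one property key containing 'H'
WHERE ALL(rel IN r WHERE ANY(key IN keys(rel) WHERE key CONTAINS 'H'))
// Flatten the property keys of every hop into one row per key
WITH n, m, r, [rel IN r | keys(rel)] AS allKeys
UNWIND allKeys AS keyList
UNWIND keyList AS key
WITH n, m, r, key
WHERE key CONTAINS 'H'
// Regroup the matching keys per (n, m, r)
RETURN n, m, COLLECT(key) AS filteredkeys, r AS rels
```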
Can someone suggest a better data model or a different query?
Any help would be greatly appreciated.