# Creating Sankey diagram - counting paths between nodes

(Seanfdnn) #1

Hello,

I have a model of the itineraries of many people who visit locations, and I want to create a Sankey diagram similar to https://maxdemarzi.com/2014/09/11/tracking-user-paths-in-an-ivr-with-neo4j/ that illustrates the frequency of paths followed from a given location.

I have three node types: Person, Visit, and Place.
Each Visit node points to the single Place where the visit happened.
Visit nodes are connected in a sequential chain by `FOLLOWED_BY` relationships, i.e., you can follow each person's itinerary by traversing the chain. At each Visit, however, you need to look up the associated Place for that visit.
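To make the model concrete, here is a minimal sketch of a sample graph with this shape. The `name` properties, the person and place names other than ATLANTIS, and the `MADE` relationship linking a Person to their first Visit are assumptions for illustration only; the `AT` and `FOLLOWED_BY` relationships are the ones used in this post.

```cypher
// Hypothetical sample data: two people, three places.
// :MADE and all names except ATLANTIS are illustrative assumptions.
CREATE (atl:Place {name: 'ATLANTIS'}),
       (eld:Place {name: 'EL DORADO'}),
       (sha:Place {name: 'SHANGRI-LA'}),
       (alice:Person {name: 'Alice'}),
       (bob:Person {name: 'Bob'}),
       // Alice's itinerary: ATLANTIS -> EL DORADO -> SHANGRI-LA
       (alice)-[:MADE]->(a1:Visit)-[:AT]->(atl),
       (a1)-[:FOLLOWED_BY]->(a2:Visit)-[:AT]->(eld),
       (a2)-[:FOLLOWED_BY]->(a3:Visit)-[:AT]->(sha),
       // Bob's itinerary: ATLANTIS -> EL DORADO
       (bob)-[:MADE]->(b1:Visit)-[:AT]->(atl),
       (b1)-[:FOLLOWED_BY]->(b2:Visit)-[:AT]->(eld)
```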

Additionally, I want to restrict the depth to X Places away from a specified origin Place.

I thought this would be straightforward. I can write a query that returns the nodes in question, but the aggregation that returns the count of paths between nodes is proving challenging.

The following query builds an itinerary from the data, but it does not give the count of paths between each pair of Places, which is what I need to populate a Sankey diagram. APOC's virtual nodes and graphs seem like they could be useful (i.e., return a graph of Places and relationships between them, with a property on each relationship holding the count of travelled paths), but I'm new to both Neo4j and APOC and am not sure whether that's a rabbit hole.

```
MATCH (root:Place)<-[:AT]-(v:Visit) WHERE root.name="ATLANTIS"
MATCH path=(v)-[:FOLLOWED_BY*0..5]->(:Visit)-[:AT]->(p:Place) RETURN v {itenerary: collect(p.name)}
```
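For comparison, the output a Sankey diagram needs is one row per (source Place, target Place) pair with a count. A rough, untested sketch of that aggregation is below. It assumes each Visit has exactly one `AT` relationship and that a "path" between two Places means two consecutive Visits in a `FOLLOWED_BY` chain. The `source`, `target`, and `weight` aliases are made-up names.

```cypher
// Sketch: count Place-to-Place transitions within 5 hops of the origin.
MATCH (:Place {name: 'ATLANTIS'})<-[:AT]-(start:Visit)
// Walk 0..4 hops to visit a, then one more hop to visit b (5 hops at most).
MATCH (start)-[:FOLLOWED_BY*0..4]->(a:Visit)-[:FOLLOWED_BY]->(b:Visit)
// De-duplicate in case a chain passes through the origin more than once.
WITH DISTINCT a, b
MATCH (a)-[:AT]->(src:Place), (b)-[:AT]->(dst:Place)
// Plain Cypher aggregation: src and dst are the implicit grouping keys.
RETURN src.name AS source, dst.name AS target, count(*) AS weight
ORDER BY weight DESC
```

This uses no APOC, only implicit grouping on the two Place names. If itineraries revisit Places, the result can contain cycles, so the hop depth may also need to be part of the grouping key before the rows are fed to a Sankey renderer.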

I am using Neo4j 3.5.1 with APOC installed.

Thank you for the help!
Sean

(Andrew Bowman) #2

Can you supply some data and a query to create a small sample graph to better illustrate the issue and the reason why the returned data doesn't cover what you need?