Modeling a sewer network

I have modeled our City's sewer network in Neo4j. Below describes the model, and after that describes my question. I do not include specific code here because I am fine talking in pseudo-code or generalizations.

I have represented manholes and sewer mains as their own kinds of labeled nodes in a graph. Any given manhole has a relationship with 0 or more sewer mains. 0 or more sewer mains may flow into the manhole and 0 or more sewer mains may flow out of the manhole. These are represented by directional relationships. If more than one sewer main flows out of a manhole, all but one of those sewer mains will be marked as a backup line. Each sewer main node has an attribute to describe how long it is, in feet. One manhole in the entire graph has the label of Head Works. It is where all sewer eventually ends up to be processed. (Note: You do NOT want to go into the Head Works building without a mask, regardless of COVID-19. :) )

The problem I want to solve is this: for each sewer main, I want to find how many feet of sewer main are downstream from it (including itself) on its shortest, directed path to Head Works. I do not want to traverse any lines that are marked as backup lines, but it's okay if the sewer main in question IS a backup line. I want this number of feet to become an attribute on the sewer line. I am not concerned with performance as this will be a rare task to repeat once the final data is generated.

Because this problem deals with accumulation, it is okay if the sewer mains closest to Head Works are first given their own length attribute as the accumulated length. Then every sewer main that flows into those sewer mains add their own length to the accumulated length, and so on in a recursive pattern. I just don't know how to do this, which is why I'm posting it to the experts on this forum.

If you would like me to post some sample Cypher that shows nodes and relationships, I will. Thank you for reading.

You're looking for a shortest path alorithm. There are a variety to choose from, each with their own benefits, but most of the differences are in performance in different kinds of networks.

Built into Neo4j is a shortestPath function which will likely suit your needs:

However, there are better tools in the Graph Data Science Library, which would even write the scores into the nodes and relationships for you.

Lastly, @michael.hunger wrote a great article on the subject:

1 Like

Thank you for your reply. I'm familiar with shortestPath, but what I wasn't sure of is how to traverse the graph to Head Works and not follow lines that are marked with the Backup attribute. It could be that the shortest path from one sewer main to Head Works goes down a backup line, but that's not the sewer water's normal path. I will read the articles you linked to, and I appreciate you taking the time to find them and share them with me.

I have begun to think that maybe the manhole nodes are unnecessary in the graph. I needed them for import (because the sewer mains knew which manhole was at the top of the line and which was at the bottom of the line), but I think I'll just create relationships between the sewer mains to make it easier to traverse.

If you design and build your graph with this query in mind, it should be very fast. Just remember that Cypher is best when avoiding negation. So, "all but one are marked backup" requires searching for a relationship which negates the backup relationship, requiring checking all rels. Instead, mark the one that is not backup as main or something similar, and search only for those. The query will be much faster.

MATCH (head:HeadWorks),(line:Line { id: 334 }), p = shortestPath((head)<-[:MAIN*]-(line))
RETURN p

You might want to keep those nodes, or adjust your import to create the graph you need. Being able to quickly regenerate your entire graph from source data, is more valuable than you know.

1 Like

THAT is a really good idea. I will attribute the lines that are not backup lines with something like NormalFlow: true. I've also wondered if going downstream from each sewer main is the correct path. Since I seek accumulation, I wonder if I should start from Head Works and go upstream, stopping at backup lines. That way, each successive main can use the previous main's accumulated length and add its own to the sum.

FYI: Since my last post, I rerouted sewer mains to have directed relationships with other sewer mains, and then deleted all the manhole nodes.

I also appreciate the links you sent me a few posts ago. I have begun reading them, but I do not understand the vernacular in the documentation, so it makes it difficult to figure out how to apply the principles and syntax to my particular environment and problem.

There are a great many courses out there, and a pretty active community. Really leveraging a graph db like Neo4j requires whole news ways of thinking, which means a whole new set of semantics.

Here's my favorite place to explore, and be lazy at the same time:

1 Like