Find longest non-repeating roadtrip path given distance budget

Hello all,

It's essentially a question of graph traversal control.
My issue isn't really about roadtrip but I'll use a metaphor to make it easier to explain and captures the essence of the issue.

I have a "undirected weighted disconnected cyclic" graph containing about 1M nodes, which we can call "City", and 1M relations, which we can call "ROADS", roads have a weight property "distance".

I'm trying to make a query, that will, given a starting City node and a maximum distance traveled called Fuel tank:

  1. Go to the unvisited city closest to any of our visited cities (we get "free teleport" at this step to all our visited cities)
  2. Deduct the Distance from Fuel tank.
  3. While Fuel Tank > 0 ==> Go to step 1
  • Return the ids of the visited City nodes (and optionally some extra info about the path).

I'm not sure if this is possible without going into the Java back-end, or having a script loop that generate the next Cypher query at each step given previous results.

All help appreciated, thanks.

Optional clarification P.S:
Example of step 1
Given we already traveled to City A, B and C.
For the next City, we look for the closest city to ANY of our A,B,C.
Let's say closest city is D and is closest to B. We would go to D and pay the "distance" between B and D.
At the next round we would do the same with Cities A,B,C and D.

So in short you want to traverse as many cities based on the fuel tank?

That is correct @intouch.vivek
Traverse as many distinct cities as possible given fuel tank. Keep track of our visited cities.

Keeping track of the visited cities and also the fuel spent makes it impossible to do with just a cypher query . you will have to go for procedures and implement the logic as a java code.

The whole thing can be easily implemented as a procedure and you can call this procedure in cypher by passing the required parameters to the procedure.

1 Like

While I'm hoping someone else comes in with at least a partial solution, I'm afraid you may well be right @ganesanmithun323 . Thanks.

I've just made an iterative script in Python with the neo4j driver to read result of each step and generate the next query. It works, but at nearly 1 second average processing per Starting City (and 0.005 minimum), it feels awfully slow. I'm sure it's possible to be orders of magnitude faster in properly prepared batches.

From my naive experience of fairly beginner in Neo4j, this use case doesn't seem that advanced or rare and I'm a little disappointed it's not covered by Cypher or APOC. Does that look like a feature request to anybody else?

We need to note that cypher is not a programming language but just a query language.

you need to try procedures , i am sure it will increase the speed drastically.

Hi balmix,

Could you please share a dataset.


Simpler way would be to traverse fixed depth say 10 levels and check the path cost (sum of distances). You could return all the paths where cost of the path is less than the pre-determined cost.

In Cypher this would be the way to implement. This may not give all the paths that are greater than the depth you specified.

If you want all the paths then procedure could be one option.

Cypher could be

MATCH path=(start:City {name:'test'}) -[r:ROADS*1..10]->(end)
WITH path, reduce(s = 0, , x IN relationships(path) | s + x.prop) as cost
WHERE cost < 1000 and cost > 800

This query will return all the paths from starting point that can be travelled with 800 miles minimum and less than 1000 miles.

1 Like

Very interesting use of relationships(path) and reduce.
It certainly is a solid partial result, and perhaps with tweaking I could get close to a result. Thanks @anthapu

There are still a few unsatisfied clause here, I'll be working on this to see how much closer I can get to a solution. Mainly, limited number of hops, doesn't seem like the "free teleport" clause is matched here, and similarly we can visit the same node multiple times given the graph is cyclic and undirected.

I'm thinking perhaps with a "COUNT(distinct UNWIND relationships(path)) = LENGTH(path)", we may ensure no double visits of node.

P.S: @intouch.vivek I'm sorry, this is sensitive data, I can share neither dataset nor exact schema. However I believe I gave a clear and concise description of the problem that is equivalent to the issue at hand, we're essentially looking at a map of roads and cities.

The reason we need to limit the paths in Cypher, because Cypher is path distinct. So it needs to traverse all the paths and in cyclical graph it becomes an issue.

Please look at apoc path expander.

You should play with config

uniqueness: 'NODE_GLOBAL'
1 Like