Is neo4j the best tool to find all possible multi-hop paths between two nodes

Hello,

I am building a tool that builds all possible paths ('routes') of a specified length on a given network, subject to some constraints.

Nodes are linked often with multiple parallel relationships, which differ by a property value (a departure and arrival time), and all relations in a route should be consistent in time. Moreover we only want the first connection that departs a node.
For example for (n1)-[r1:REL]->(n2)-[r2:REL]->(n3) we want that
r2.departure > r1. arrival, and then keep the r2 with the lowest departure time, such that we have one route for each n1.

Given this description, do you consider neo4j a good tool for this job? Could a relational database be a better fit?

Additionaly, I currently use a clause to group over all r1's, order by the r2.departures and then select the first r2.departure. Would you agree that that is the best way of querying this problem?

I am curious to your thoughts! Thank you in advance for replying.

Background info:

  • Running neo4j in docker, using the neo4j:latest image
  • Not using any plugins or extensions

Neo4j has some nice query constructs for this, Quantified path patterns: Variable length patterns - Cypher Manual

Do you already have a query that is giving you the results that you want? Can you share it, so other can comment and suggest potential improvements?

If neo4j is the best tool is too complex of a question, let's start with if we can write a query that executes well :wink:

1 Like

If it is always two hops, I would derive each segment separately.

If the length is variable, I would consider a custom procedure, as you will be able to traverse the segments and pick the best relationship in each hop.

@gilienfield what do you mean with deriving each segment separately - do you mean splitting up the nodes into segments, and then finding all paths for those segments in separate queries (maybe in parallel)?

@hakan.lofqvist1 I have tried to work a bit with the variable-lenght patterns, but failed to see how to create interrelational path conditions in those (e.g. 'the second segment should be after the first')

The query I am using for 2-length hops is as follows (and as you see there are some more conditions):

MATCH p = (n0:Station)-[f1:Line]->(n1:Station)-[f2:Line WHERE f2.departure > f1.arrival AND f2.departure < f1.arrival + duration("P7D")]->(n2:Station)
WHERE n0 <> n2 AND 
(
    (f1.registration = f2.registration AND NOT n1.loc in ['hub1', 'hub2'] AND f2.departure > f1.arrival + duration("PT30M") AND f2.departure < f1.arrival + duration("P1D"))
    OR
    (n1.loc in ['hub1', 'hub2'] AND f2.departure > f1.arrival + duration("PT3H"))
)
 AND
(
    f1.distance < 1481 OR
    f2.distance < 1481 OR
    abs(abs(f2.direction_initial - f1.direction_final) - 180) > 40*sqrt((
    (f2.distance + f1.distance - abs(f2.distance-f1.distance))/2 - 1481)
    /1481)
)
WITH n0, n1, n2, f1, f2 ORDER BY f2.departure
WITH n0, n1, n2, f1, f2.configuration AS f2c, HEAD(COLLECT(f2)) AS f2m
RETURN n0.loc || "-" || n1.loc || "-" || n2.loc AS route, f1, f2m

There are about 5k relationships in the current testing model, which turn into 28k two-hop relationships.

Hi Hakan, would you have a moment to again look at the query I posted? It would be really helpful to hear your thoughts!

I think it would be something like this:

MATCH p = (start:Station) ( 
    (n0:Station)-[f1:Line]->(n1:Station)-[f2:Line WHERE f2.departure > f1.arrival AND f2.departure < f1.arrival + duration("P7D")]->(n2:Station)
    WHERE n0 <> n2 <> start AND 
    (
        (f1.registration = f2.registration AND NOT n1.loc in ['hub1', 'hub2'] AND f2.departure > f1.arrival + duration("PT30M") AND f2.departure < f1.arrival + duration("P1D"))
        OR
        (n1.loc in ['hub1', 'hub2'] AND f2.departure > f1.arrival + duration("PT3H"))
    )
    AND
    (
        f1.distance < 1481 OR
        f2.distance < 1481 OR
        abs(abs(f2.direction_initial - f1.direction_final) - 180) > 40*sqrt((
        (f2.distance + f1.distance - abs(f2.distance-f1.distance))/2 - 1481)
        /1481)
    )
){1,2} (end:Station)
WITH p ORDER BY end.departure
RETURN p limit 1

Where {1,3} is the min/max number of times the pattern can repeat. I am worried if you may return to the start node. So I kept the condition WHERE n0 <> n2 <> start

Thank you! Question about this, the final output now is ordered by end.departure. End, however, is a node right? The departure is a property of the Line, not of the Station, so I cannot order by that. How can I access the departure property of the last Line? I've tried the relationships(p)[-1], which is yet to complete.
Also it seems that the time departure>arrival requirement is not satisfied between each flight element. Is there a good way to do that in a variable path like this?

ORDER BY relationships(p)[-1].departure should work. But I am also struggeling to piece togheter how we can get the conditions for f1 and f2 to match up for each "iteration" of the pattern. I am going to poke someone smart and get back to you.

My brain trust came up with this:

Post filter the paths with:

WHERE all(i in range(0, length(path)-2) WHERE
  relationships(path)[i+1].departure >   relationships(path)[i].arrival
) 

But that will not be as optimal as being able to filter inside the repeating pattern. If you study the example in the docs Variable length patterns - Cypher Manual they have made a nice trick where nodes represent the stops for a "time table".

I don't know if that would be an option, to change the model in a way so more properties lives on nodes instead of on the relationships? That may help both wirting the query as well as make it more efficient/faster.

Obvisouly, that depends on the size of the graph, maybe it is "fast enogh" just the way it is now?