I am building a tool that builds all possible paths ('routes') of a specified length on a given network, subject to some constraints.
Nodes are linked often with multiple parallel relationships, which differ by a property value (a departure and arrival time), and all relations in a route should be consistent in time. Moreover we only want the first connection that departs a node.
For example for (n1)-[r1:REL]->(n2)-[r2:REL]->(n3) we want that
r2.departure > r1. arrival, and then keep the r2 with the lowest departure time, such that we have one route for each n1.
Given this description, do you consider neo4j a good tool for this job? Could a relational database be a better fit?
Additionaly, I currently use a clause to group over all r1's, order by the r2.departures and then select the first r2.departure. Would you agree that that is the best way of querying this problem?
I am curious to your thoughts! Thank you in advance for replying.
Background info:
Running neo4j in docker, using the neo4j:latest image
If it is always two hops, I would derive each segment separately.
If the length is variable, I would consider a custom procedure, as you will be able to traverse the segments and pick the best relationship in each hop.
@gilienfield what do you mean with deriving each segment separately - do you mean splitting up the nodes into segments, and then finding all paths for those segments in separate queries (maybe in parallel)?
@hakan.lofqvist1 I have tried to work a bit with the variable-lenght patterns, but failed to see how to create interrelational path conditions in those (e.g. 'the second segment should be after the first')
The query I am using for 2-length hops is as follows (and as you see there are some more conditions):
MATCH p = (n0:Station)-[f1:Line]->(n1:Station)-[f2:Line WHERE f2.departure > f1.arrival AND f2.departure < f1.arrival + duration("P7D")]->(n2:Station)
WHERE n0 <> n2 AND
(
(f1.registration = f2.registration AND NOT n1.loc in ['hub1', 'hub2'] AND f2.departure > f1.arrival + duration("PT30M") AND f2.departure < f1.arrival + duration("P1D"))
OR
(n1.loc in ['hub1', 'hub2'] AND f2.departure > f1.arrival + duration("PT3H"))
)
AND
(
f1.distance < 1481 OR
f2.distance < 1481 OR
abs(abs(f2.direction_initial - f1.direction_final) - 180) > 40*sqrt((
(f2.distance + f1.distance - abs(f2.distance-f1.distance))/2 - 1481)
/1481)
)
WITH n0, n1, n2, f1, f2 ORDER BY f2.departure
WITH n0, n1, n2, f1, f2.configuration AS f2c, HEAD(COLLECT(f2)) AS f2m
RETURN n0.loc || "-" || n1.loc || "-" || n2.loc AS route, f1, f2m
There are about 5k relationships in the current testing model, which turn into 28k two-hop relationships.
MATCH p = (start:Station) (
(n0:Station)-[f1:Line]->(n1:Station)-[f2:Line WHERE f2.departure > f1.arrival AND f2.departure < f1.arrival + duration("P7D")]->(n2:Station)
WHERE n0 <> n2 <> start AND
(
(f1.registration = f2.registration AND NOT n1.loc in ['hub1', 'hub2'] AND f2.departure > f1.arrival + duration("PT30M") AND f2.departure < f1.arrival + duration("P1D"))
OR
(n1.loc in ['hub1', 'hub2'] AND f2.departure > f1.arrival + duration("PT3H"))
)
AND
(
f1.distance < 1481 OR
f2.distance < 1481 OR
abs(abs(f2.direction_initial - f1.direction_final) - 180) > 40*sqrt((
(f2.distance + f1.distance - abs(f2.distance-f1.distance))/2 - 1481)
/1481)
)
){1,2} (end:Station)
WITH p ORDER BY end.departure
RETURN p limit 1
Where {1,3} is the min/max number of times the pattern can repeat. I am worried if you may return to the start node. So I kept the condition WHERE n0 <> n2 <> start
Thank you! Question about this, the final output now is ordered by end.departure. End, however, is a node right? The departure is a property of the Line, not of the Station, so I cannot order by that. How can I access the departure property of the last Line? I've tried the relationships(p)[-1], which is yet to complete.
Also it seems that the time departure>arrival requirement is not satisfied between each flight element. Is there a good way to do that in a variable path like this?
ORDER BY relationships(p)[-1].departure should work. But I am also struggeling to piece togheter how we can get the conditions for f1 and f2 to match up for each "iteration" of the pattern. I am going to poke someone smart and get back to you.
WHERE all(i in range(0, length(path)-2) WHERE
relationships(path)[i+1].departure > relationships(path)[i].arrival
)
But that will not be as optimal as being able to filter inside the repeating pattern. If you study the example in the docs Variable length patterns - Cypher Manual they have made a nice trick where nodes represent the stops for a "time table".
I don't know if that would be an option, to change the model in a way so more properties lives on nodes instead of on the relationships? That may help both wirting the query as well as make it more efficient/faster.
Obvisouly, that depends on the size of the graph, maybe it is "fast enogh" just the way it is now?