How find shortest path in transport connection by using Cypher query over GTFS data?

I am new in neo4j, I created a graph following this steps, based on a data model from GTFS. I would like to find all the shortest indirect routes in the graph (with transfers). Data model of graph database contains 4 entities: Route, Trip, Stop, Stoptime. Here is a screenshot of db.scheme().

I tried to write query, with method allshortestpaths. But it returns me also trips which run in a different direction (VIR -> VBR)... another problem is the timing of that connection.

match (from:Stop {code:'VBR'}),(to:Stop {code:'VIR'})
with from,to
match p = allshortestpaths((from)-[*]-(to))
where NONE (x in relationships(p) where type(x)="OPERATES")
return p
limit 10;

Also tried this to get output for the right direction, but it doesn't work.

match p = allshortestpaths((from)-[*]->(to))

Could you help me, how to access to the transfer node (Station) when I am using allshortestpath method? I want to write a condition for timing and stop_sequence to be sure that's the right direction.

This is "pseudo-code", but I don't know how to write that query.

MATCH 
(from:Stop {code:'VBR'})--(st_from:Stoptime), 
(to:Stop {code:'VIR'})--(st_to:Stoptime), 
WHERE 
st_from.stop_sequence > (middlenode: Stoptime {stop_sequence}) 
AND (
middlenode: Stoptime {stop_sequence}) < st_to.stop_sequence
match (from:Stop {code:'VBR'})ā€”(st_from:Stoptime),(to:Stop {code:'VIR'})ā€”(st_to:Stoprime)
with from,to,st_from,st_to
match p = allshortestpaths((from)-[*]-(to)) 
where NONE (x in relationships(p) where type(x)="OPERATES")
AND ALL(node in nodes(p) WHERE node = from OR node = to OR (st_from.stop_sequence > node.stop_sequence
AND node.stop_sequence < st_to.stop_sequence) )
RETURN p
limit 10

Here node is all the nodes in the path. Itā€™s either from node or to node or the middle mode ...thatā€™s why the condition inside ALL predicates .
Hope it helps you :slight_smile: and welcome to the community :slightly_smiling_face:

Thanks for your answer, but when I use relationship (n)-[*]->(m) My original query doesn't work. I also tried your query and I got and error message:

Invalid input 'i': expected 'o/O' (line 6, column 2 (offset: 378))
"limit 5;"
  ^

But I don't understand what is wrong. I found only one spelling mistake in (to:Stop {code:'VIR'})ā€”(st_to:Stoptime)

it could mean , there are no paths in that direction ,
could you please the share the query for which you got the error

This is my query for that error:

match (from:Stop {code:'MMCT'})--(st_from:Stoptime),(to:Stop {code:'NDLS'})--(st_to:Stoptime)
with from,to,st_from,st_to
match p = allshortestpaths((from)-[*]->(to)) 
where NONE (x in relationships(p) where type(x)="OPERATES")
AND ALL(node in nodes(p) WHERE node = from OR node = to OR (st_from.stop_sequence < node.stop_sequence
AND node.stop_sequence < st_to.stop_sequence) )
limit 10;

You should not end the cypher query with semi colons

1 Like

Try removing the extra space between the two last end brackets:

AND node.stop_sequence < st_to.stop_sequence))

Space there doesnā€™t matter actually

The problem with this query is you basically have:

MATCH 
WITH
MATCH
WHERE
LIMIT

LIMIT can only follow WITH or RETURN. It's not valid just after a WHERE clause, you need to do something with the result, so you're likely missing the RETURN.

2 Likes

Man, i didnā€™t even notice it ..got distracted by that semi colon in the end :man_facepalming:t3:

Thanks, I did it, also deleted semicolon but now I have a different problem it's such slow, after waiting several minutes I get an error message that There is not enough memory to perform the current task. With compare of my first query on the top which is really fast and runs in 300ms. @ganesanmithun323 @ameyasoft @andrew.bowman any suggestions? (I raised Heap - size from 512MB to 4GB) Didn't help.

Hello, I still struggling to make this query work, I tried to take apart the query and understand to their parts. I realized that this query is not proper because of the method nodes() return list of nodes. But there are different types of nodes (Stop, Stoptime, Trip), so this condition is not correct. Because node as the variable does not contain only "Stop" value.

AND ALL(node in nodes(p) WHERE node = from OR node = to OR (st_from.stop_sequence < node.stop_sequence
AND node.stop_sequence < st_to.stop_sequence))

Could you please help me, @ganesanmithun323 @ameyasoft @andrew_bowman how to treat that the value in the node will contain only Stop nodes? Then the WHERE condition above could work.

@Martin_Pogadl , could you please just explain what you are expecting in the final output. Like, what node types you are expecting to be present in shortest path . That would be more helpful to help you here

And attach the image in the post rather than sharing the links..itā€™s really difficult ..most people will even ignore clicking those links

You can add a little more logic to allow non-Stop nodes, and to otherwise apply the predicate for stop_sequence

AND ALL(node in nodes(p) WHERE node = from OR node = to OR NOT node:Stop OR (st_from.stop_sequence < node.stop_sequence
AND node.stop_sequence < st_to.stop_sequence))

@ganesanmithun323 OK, I understand so this is my db.schema():


For better understanding, the node of Stoptime realizes a sequence of Stops through which train/bus (trip) is going, has attribute arrival (arr.) and departure (dep.) time. As you can see below:


What should be at the output?
I want to find the shortest path with transfers (T) from Stop (A) to Stop (B). It means that the query is able to return the whole path, with time attributes. (A)->(T)->(B).

Stop A (from) -> dep. time from Stop A -> Trip_id (number of train between A->T) -> arr. time to Stop T (transfer) -> Stop T -> dep. time from Stop T -> Trip_id (between T->B) -> arr. time to Stop B (to).

Conditions:

  • The path is only in one direction from A to B.
  • Stop_time of arr. time in Stop T is lower as dep.time from Stop T to Stop B

The query which I posted at the top of my post is running pretty fast, but return both direction connections of journey. Something like (B)->(T)->(A).

What is transfers here

For example when you look at this screenshot. Stop (A) = BCT ... Stop (T) = DDR (transfer) ... Stop (B) = NDLS ... It means that if you want get from Stop A to Stop B you have to get out from train at transfer station (T) in this case DDR.

Hi @Martin_Pogadl , thanks for the explanation .

match (from:Stop {code:'VBR'})ā€”(st_from:Stoptime),(to:Stop {code:'VIR'})ā€”(st_to:Stoprime)
with from,to,st_from,st_to
match p = allshortestpaths((from)-[*]-(to)) 
where NONE (x in relationships(p) where type(x)="OPERATES")
AND ALL(node in nodes(p) WHERE node = from OR node = to OR (NOT node :Stop AND 
st_from.stop_sequence > node.stop_sequence AND node.stop_sequence < st_to.stop_sequence) OR node :Stop ) 
RETURN p
LIMIT  10

i have handled the case when Transfer station comes in the path , but if you think 'Trip' nodes also can be part of shortest path , then just extend the part inside where clause like

where NONE (x in relationships(p) where type(x)="OPERATES")
AND ALL(node in nodes(p) WHERE node = from OR node = to OR (NOT node :Stop AND 
st_from.stop_sequence > node.stop_sequence AND node.stop_sequence < st_to.stop_sequence) OR node :Stop OR node :Trip ) 

i don't see 'OPERATES' relation in the schema , not sure why you are checking that in query .
Also, i am not able to understand the design properly . i do feel the design can be improved for your use case .

You are right, the design is not great for this model. "OPERATES" is deleted, i just leave it there instead of using WITH p. I make several tests and realised that shortestpath method work with number of hops from A to B. I need to add distance of transport. I think that using GTFS model in graph database is not optimal and will try find some better solution. Do you have any suggestion? How to create optimal model for this type of data?

We will need a brief explanation on data points you have to help you with designing the graph.
you can create separate post asking for design help actually rather than continuing here. More people will see if its post.