We are building a Journey Planner kind of application.
For the database, we are using almost same design mentioned in the below blog.
https://neo4j.com/blog/journey-planning-why-i-love-cypher/
The only change which differs from above design is, here we are creating CAN_TRAVEL_ONWARD_TO relationship between ALIGHT_AT LEG of one Service and BOARD_AT LEG of another Service by identifying some common STOPs between each two Service combination.
On entire data around 50-60 percent of LEG nodes having this CAN_TRAVEL_ONWARD_TO relationship.
We imported the data for one city, we got around 65000 nodes and 1700000 relationships created.
For small data below query is giving expected output, but on entire data it is taking too long(almost infinite loop).
// Finding nearest start and end point LOCATIONS for the given lat, longs
WITH point({latitude:17.3220253, longitude:78.4022942}) as start_position
MATCH (startPoint:LOCATION)
WHERE distance(startPoint.location,start_position)<=2000
WITH startPoint, point({latitude:17.5040161, longitude:78.31087889999999}) as end_position, start_position
MATCH (endPoint:LOCATION)
WHERE distance(endPoint.location,end_position)<=2000
WITH startPoint, endPoint, start_position, end_position
// Finding PATH
MATCH (startPoint)-[:CAN_BOARD]->(l1:LEGS)
MATCH (l2:LEGS)-[:CAN_ALIGHT]->(endPoint)
MATCH path = (startPoint)-[:CAN_BOARD]->(l1)-[r:CAN_TRAVEL_ONWARD_TO|:CAN_TRAVEL_ONWARD_FROM|:NEXT_LEG*0..]->(l2)-[:CAN_ALIGHT]->(endPoint)
// Finding current available Services based on time
MATCH p1 = (l1)<-[h1:HAS_LEG]-(b1:BUS)
WHERE h1.boardTimeInSec >=-46020 AND h1.boardTimeInSec <= 126780
MATCH p2 = (l2)<-[h2:HAS_LEG]-(b2:BUS)
// Getting list of legs having CAN_TRAVEL_ONWARD_TO relationships
WITH path,p1,b1, start_position, end_position, p2, [x IN nodes(path) WHERE (x:LEGS) and (x)-[:CAN_TRAVEL_ONWARD_TO]->(:LEGS) | (:BUS)-[:HAS_LEG]->(x)-[:CAN_TRAVEL_ONWARD_TO]->(:LEGS)<-[:HAS_LEG]-(:BUS)] AS legsList
UNWIND CASE WHEN legsList = THEN [null] ELSE legsList END as legsInPath
UNWIND legsInPath as legsInPath1
//Doing some calculations and returning the results
WITH path, collect(DISTINCT p1) as startLegOnePath, b1,p1, collect(DISTINCT p2) as endLegOnePath, legsInPath1, nodes(legsInPath1)[0] as CTOTHavingB1, relationships(legsInPath1)[0] as changableBeforeLeg, relationships(legsInPath1)[2] as changableAfterLeg,distance(nodes(path)[0].location,start_position) as startDist, distance(LAST(nodes(path)).location,end_position) as endDist,
REDUCE(dist = 0, n in [x in nodes(path) where (x:LEGS) | x] | dist+n.distance)+REDUCE(dist = 0, n in [x in relationships(path) where type(x)='CAN_TRAVEL_ONWARD_TO' | x] | dist+n.distance) as totalPathDistance, [x in nodes(path) where (x:LEGS) | x.busNo] as busList
WHERE CTOTHavingB1.tripNo = b1.tripNo AND changableBeforeLeg.alightTimeInSec<=changableAfterLeg.boardTimeInSec AND (changableAfterLeg.boardTimeInSec-changableBeforeLeg.alightTimeInSec)<1800
UNWIND busList as busList1
WITH path, relationships(p1) as legTime, legsInPath1, totalPathDistance, startDist, endDist, COLLECT(DISTINCT busList1) as busList
RETURN path, legTime, legsInPath1, totalPathDistance, startDist, endDist, busList ORDER BY startDist+endDist, totalPathDistance
Please suggest us if there are any changes needed in DB design level or suggest some optimizations in the query level.
Thank you.