Problem with relationships(path) of dijkstra algorithm

Hi guys, I have a problem with my neo4j project. I'm trying to return a path made by a dijkstra shortest path algorithm:

query_balanced = """
        MATCH (s:RoadJunction {id: '%s'}), (t:RoadJunction {id: '%s'})
        CALL gds.shortestPath.dijkstra.stream('subgraph_routing', {
            sourceNode: s,
            targetNode: t,
            relationshipWeightProperty: 'comfort_cost'
        })
        YIELD index, sourceNode, targetNode, totalCost, nodeIds, path
        WITH totalCost AS balanced_cost, nodeIds, path
        UNWIND relationships(path) AS r
        WITH nodeIds, balanced_cost, SUM(r.distance) AS total_distance
        WHERE total_distance <= 1.05 * %f
        RETURN nodeIds, balanced_cost, total_distance
        """ % (source, target, shortest_distance)

The problem is that after relationships(path) AS r when I calculate "total _distance" it gives me zero as result. I found that r.distance doesn't exists in any record of relatioships(path), there is only the property "cost". In my graph the relation ROUTE has the property distance, so I don't understand why when i use relationships(path) this doesn't return the correct properties.

Hi @doniyeah,

The issue arises from the fact that GDS returns only a virtual path i.e., the relationships in the path object are not existing in the Database, and are only created for the purpose of visualising the answer.

In our docs we explain this as follows

The Path objects contain the node objects and virtual relationships which have a cost property.

This means that they will unfortunately not contain the distance property.

So if you want to access it you will likely have to do match queries for each consecutive pair of nodes in path to find the relationship with the exact cost specified in the costs YIELD parameter (or the minimum between the two would suffice)

I hope that clears your confusion
Let us know if you need any help with the latter.

Best regards,
Ioannis.

Oh ok, thank you. I have one question: if there are more than one relation with that cost how can i choose the correct one? I mean, is there a way to iterate the nodeIds with something like "nodeId[i], nodeId[i+1]" so I'm sure to retrieve the correct relationship?

Hello @doniyeah

I would advise this modified version of your query:

MATCH (s:RoadJunction ), (t:RoadJunction )
CALL gds.shortestPath.dijkstra.stream('subgraph_routing', {
    sourceNode: s,
    targetNode: t,
    relationshipWeightProperty: 'comfort_cost'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, path
WITH totalCost AS balanced_cost, nodeIds, path
UNWIND relationships(path) AS r
CALL (r) {
  MATCH (source WHERE source = startNode(r))-[realR]->(target WHERE target = endNode(r))
  RETURN realR.distance AS distance LIMIT 1
}
WITH nodeIds, balanced_cost, SUM(distance) AS total_distance
WHERE total_distance <= 1.05 
RETURN nodeIds, balanced_cost, total_distance

In this query I'm making use of a CALL subquery to MATCH a relationship from the database between the source nodes of the virtual relationship from the GDS graph. Since GDS projections preserve nodes, but make it possible to abstract/invent/create relationships from wider patterns, there is no direct correlation between a relationship in the GDS graph and in the database. That is to repeat some of what @ioannis_panagio already said.

A consequence of that is that there is no guarantee that exactly 1 relationship will be found between the two endpoint nodes. There could be any number, greater or equal to zero. However, in your specific use case, it may well be the case that there is exactly 1.

In my query, I arbitrarily LIMIT the MATCH so as to get maximum 1 relationship. If the LIMIT is omitted, it may result in a sum aggregation that is greater than expected, as it sums over several parallel relationships. The LIMIT 1 is a very naive approach. Depending on your use case, better alternatives may exist. If you know that there are no parallel relationships, it is a non-issue. If you know that there are parallel relationships, but they are of different types, then adding the appropriate relationship type in the MATCH pattern could be useful.

If you know that there are parallel relationships, but with different distance property values, you may want to pick the minimum or maximum to use in the summation. To do that, you can use a min or max aggregation in the subquery. For example:

CALL (r) {
  MATCH (source WHERE source = startNode(r))-[realR]->(target WHERE target = endNode(r))
  RETURN min(realR.distance) AS distance
}

Hope this helps!
Mats