How to efficiently compute all-pairs Dijkstra shortest paths for a Louvain community in GDS?

I'm working on a road network analysis using Neo4j and GDS, and I need help with an efficient way to compute shortest paths between all "Location" (label) nodes that belong to a specific Louvain community.

What I did :

Graph projection:

CALL gds.graph.project(
'locGraph1',
['Location', 'OGLoc'],
{
CONNECTED_BY: {
type: 'CONNECTED_BY',
orientation: 'UNDIRECTED',
properties: 'Length'
}
}
)

MATCH (source:Location), (target:Location)

WHERE id(source) < id(target)

CALL gds.shortestPath.dijkstra.stream('locGraph1', {

sourceNode: id(source),

targetNode: id(target),

relationshipWeightProperty: 'Length'

})

YIELD totalCost

RETURN source.Node3ID AS from, target.Node3ID AS to, totalCost AS cost

…but this crashes with:

Neo.TransientError.General.MemoryPoolOutOfMemoryError

Even though I only want to do it for a single community, I have over 3,400 nodes which creates ~5.8 million pairs.

  • Is there a memory-efficient way to compute all-pairs Dijkstra paths within a community using GDS?

  • Can I use gds.beta.shortestPath.dijkstra.stream with sourceNodeIds in GDS 2.x? If not, what’s the recommended workaround?

  • Should I switch to a different algorithm (like allShortestPaths or allShortestPaths.deltaStepping)?

  • Would you recommend writing a custom loop outside GDS (via APOC or Python) — or is there a better built-in solution?

Total Nodes in Location are approx. -9K , but I tested on a small batch of 3K

Hi @surbhi.gupta2104,

I think your path query could be optimised significantly.

First of all you say that you want to compute distances between only nodes belong to a specific community.
This is currently not reflected in your query as you compute it from every pair of location nodes.
Where have you stored the community information for example?
If it written to the database as a node property you do extra filtering i.e., including additionally source.community = x AND target.community = x on the WHERE clause assuming for example it is named community.

You can also project (if the communities are in database) or filter (if you compute the communities on-the-spot) the graph to create a in-memory graph containg only the nodes in that specific community so that you do not have to use the WHERE clause i specified erlier.

In addition rather than running a source-target query between each pair, you can pass multiple targets so that only a single dijkstra query is executed per source node:

This can happen as follows

MATCH (source:Location)
WITH source, COLLECT {
MATCH (l: Location)
WHERE id(l)  > id(source)
RETURN l
} as targetNodes
CALL gds.shortestPath.dijkstra.stream(‘graph’,{sourceNode:source, targetNodes: targetNodes})

Again you can use optimization by specifying the specific community to avoid some searches.

  • That should hopefully give you some direction for your first question.
    As for the other ones,
  • sourceNodeIds does not seem to be supported in 2.x. Recommended workaround is probably doing as you have done or indeed trying apoc/python workarounds.
  • You can certinly try allShortestPaths, though that accepts no filtering an will compute every possible distance. As for delta stepping, that one is parallel and can be faster than dijkstra in large graphs, although your graphs are small and perhaps will not yield full benefits.

I hope that gives you some ways to go about improving your query. Let me know if you have any more questions.

Best regards,
Ioannis.

1 Like

Thank you for responding!

I did try adding the community to my query , but it seems to run endlessly!

def get_node3_ids(tx, community_id):
    result = tx.run("""
        MATCH (n:Location)
        WHERE n.Community = $community
        RETURN id(n) AS internalId, n.ID AS nodeid 
    """, community=community_id)
    return [(r["internalId"], r["nodeid"]) for r in result]
def dijkstra_between(tx, graph_name, source_id, target_id):
    result = tx.run("""
        CALL gds.shortestPath.dijkstra.stream($graph_name, {
            sourceNode: $source,
            targetNode: $target,
            relationshipWeightProperty: 'cost'
        })
        YIELD totalCost
        RETURN totalCost
    """, graph_name=graph_name, source=source_id, target=target_id)
    record = result.single()
    return record["totalCost"] if record else None

Is there any way to optimise this query?

Not sure if this would help.

It looks like you have 2 separate function calls, where you could just modify Ioannis' one and introduce the Community:

MATCH (source:Location { Community: { $community } )
  WITH source, COLLECT {
    MATCH (l: Location)
    WHERE id(l)  > id(source)
  RETURN l
} AS targetNodes
CALL gds.shortestPath.dijkstra.stream(‘graph’,{sourceNode:source, targetNodes: targetNodes})

PS. In general I think it is best practice to have labels with a capital letter (Location) and properties with camel (likeThis and community) so they are easily mapped by other developers as it is easy to point at what is an entity/node or a property of that node.

Hi again @surbhi.gupta2104 ,

Can you post the exact query that runs forever ?

To try the optimizations I have suggested with your current python queries what I think you should do is the following:

  • store only the r["internalId"] in the list
  • modify the djikstra_between method to accept source, target_ids where target_ids is the ist returned by get_node3_ids for the community of source_id (perhaps you can even call the get_node3_ids function inside)
  • replace targetNode with targetNodes in the dijkstra query to find for many candidates

Something among these lines, my python is a bit rusty so i hesitate to post the adaptations myself.

Let us know if that helps you or if you need any help trying these.

Best,
Ioannis.

The basic problem is :slight_smile: I have a spatial Data.

There are two types of Labels .

Label 1 : Node31
Label 2: Node91

The are connected to each other via Relationship CONNCETED_BY.

Node31->Node91
Node91->Node31
Node31->Node31

I need to find out the shortest path for each Node31, from a node91.id =167

each connceted_by has a property Length , which is the length of the road.

MATCH(source:Node91 {nodeid: 167}), (target:Node31)     
CALL gds.allShortestPaths.dijkstra.stream('myGraph44', {
    sourceNode: source,
    relationshipWeightProperty: 'Length'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
  RETURN 
                source.nodeidAS fromNode,
                target.node31idAS toNode,
                target.Latitude,
                target.Longitude,
                target.louvainCommunity,
                totalCost AS cost
                ORDER BY cost, fromNode,toNode
CALL gds.graph.project(
  'myGraph44',
  ['Node31', 'Node91'],
  {
    CONNECTED_BY: {
      type: 'CONNECTED_BY',
      orientation: 'UNDIRECTED',
      properties: ['Length']
    }
  }
)

@surbhi.gupta2104

I still think you should try to run the following that i mentioned in my first post (Adapted for new labels, hopefully i havent mistyped anything)

MATCH (source:Node91 {nodeid: 167})
WITH source, COLLECT {
MATCH (l:Node31)
RETURN l
} as targetNodes
CALL gds.shortestPath.dijkstra.stream(‘graph’,{sourceNode:source, targetNodes: targetNodes})

Notice that in the last query you have shared the target is not even used! Please give a go the query I've shared and let us know how it goes. This is the most optimal answer i can think currently of so if that still doesn't work, I will need to know to try and think something else :)

Best regards,
Ioannis.

Yes this will work with UNWIND.