How to efficiently compute all-pairs Dijkstra shortest paths for a Louvain community in GDS?

surbhi.gupta2104 · April 3, 2025, 7:39pm

I'm working on a road network analysis using Neo4j and GDS, and I need help with an efficient way to compute shortest paths between all "Location" (label) nodes that belong to a specific Louvain community.

What I did :

Graph projection:

CALL gds.graph.project(
'locGraph1',
['Location', 'OGLoc'],
{
CONNECTED_BY: {
type: 'CONNECTED_BY',
orientation: 'UNDIRECTED',
properties: 'Length'
}
}
)

MATCH (source:Location), (target:Location)

WHERE id(source) < id(target)

CALL gds.shortestPath.dijkstra.stream('locGraph1', {

sourceNode: id(source),

targetNode: id(target),

relationshipWeightProperty: 'Length'

})

YIELD totalCost

RETURN source.Node3ID AS from, target.Node3ID AS to, totalCost AS cost

…but this crashes with:

Neo.TransientError.General.MemoryPoolOutOfMemoryError

Even though I only want to do it for a single community, I have over 3,400 nodes which creates ~5.8 million pairs.

Is there a memory-efficient way to compute all-pairs Dijkstra paths within a community using GDS?
Can I use gds.beta.shortestPath.dijkstra.stream with sourceNodeIds in GDS 2.x? If not, what’s the recommended workaround?
Should I switch to a different algorithm (like allShortestPaths or allShortestPaths.deltaStepping)?
Would you recommend writing a custom loop outside GDS (via APOC or Python) — or is there a better built-in solution?

Total Nodes in Location are approx. -9K , but I tested on a small batch of 3K

ioannis_panagio · April 4, 2025, 8:12am

Hi @surbhi.gupta2104,

I think your path query could be optimised significantly.

First of all you say that you want to compute distances between only nodes belong to a specific community.
This is currently not reflected in your query as you compute it from every pair of location nodes.
Where have you stored the community information for example?
If it written to the database as a node property you do extra filtering i.e., including additionally source.community = x AND target.community = x on the WHERE clause assuming for example it is named community.

You can also project (if the communities are in database) or filter (if you compute the communities on-the-spot) the graph to create a in-memory graph containg only the nodes in that specific community so that you do not have to use the WHERE clause i specified erlier.

In addition rather than running a source-target query between each pair, you can pass multiple targets so that only a single dijkstra query is executed per source node:

This can happen as follows

MATCH (source:Location)
WITH source, COLLECT {
MATCH (l: Location)
WHERE id(l)  > id(source)
RETURN l
} as targetNodes
CALL gds.shortestPath.dijkstra.stream(‘graph’,{sourceNode:source, targetNodes: targetNodes})

Again you can use optimization by specifying the specific community to avoid some searches.

That should hopefully give you some direction for your first question.
As for the other ones,
sourceNodeIds does not seem to be supported in 2.x. Recommended workaround is probably doing as you have done or indeed trying apoc/python workarounds.
You can certinly try allShortestPaths, though that accepts no filtering an will compute every possible distance. As for delta stepping, that one is parallel and can be faster than dijkstra in large graphs, although your graphs are small and perhaps will not yield full benefits.

I hope that gives you some ways to go about improving your query. Let me know if you have any more questions.

Best regards,
Ioannis.

surbhi.gupta2104 · April 9, 2025, 5:20pm

Thank you for responding!

I did try adding the community to my query , but it seems to run endlessly!

def get_node3_ids(tx, community_id):
    result = tx.run("""
        MATCH (n:Location)
        WHERE n.Community = $community
        RETURN id(n) AS internalId, n.ID AS nodeid 
    """, community=community_id)
    return [(r["internalId"], r["nodeid"]) for r in result]

def dijkstra_between(tx, graph_name, source_id, target_id):
    result = tx.run("""
        CALL gds.shortestPath.dijkstra.stream($graph_name, {
            sourceNode: $source,
            targetNode: $target,
            relationshipWeightProperty: 'cost'
        })
        YIELD totalCost
        RETURN totalCost
    """, graph_name=graph_name, source=source_id, target=target_id)
    record = result.single()
    return record["totalCost"] if record else None

Is there any way to optimise this query?

joshcornejo · April 10, 2025, 6:47am

Not sure if this would help.

It looks like you have 2 separate function calls, where you could just modify Ioannis' one and introduce the Community:

MATCH (source:Location { Community: { $community } )
  WITH source, COLLECT {
    MATCH (l: Location)
    WHERE id(l)  > id(source)
  RETURN l
} AS targetNodes
CALL gds.shortestPath.dijkstra.stream(‘graph’,{sourceNode:source, targetNodes: targetNodes})

PS. In general I think it is best practice to have labels with a capital letter (Location) and properties with camel (likeThis and community) so they are easily mapped by other developers as it is easy to point at what is an entity/node or a property of that node.

ioannis_panagio · April 10, 2025, 12:10pm

Hi again @surbhi.gupta2104 ,

Can you post the exact query that runs forever ?

To try the optimizations I have suggested with your current python queries what I think you should do is the following:

store only the r["internalId"] in the list
modify the djikstra_between method to accept source, target_ids where target_ids is the ist returned by get_node3_ids for the community of source_id (perhaps you can even call the get_node3_ids function inside)
replace targetNode with targetNodes in the dijkstra query to find for many candidates

Something among these lines, my python is a bit rusty so i hesitate to post the adaptations myself.

Let us know if that helps you or if you need any help trying these.

Best,
Ioannis.

surbhi.gupta2104 · April 14, 2025, 9:57pm

The basic problem is I have a spatial Data.

There are two types of Labels .

Label 1 : Node31
Label 2: Node91

The are connected to each other via Relationship CONNCETED_BY.

Node31->Node91
Node91->Node31
Node31->Node31

I need to find out the shortest path for each Node31, from a node91.id =167

each connceted_by has a property Length , which is the length of the road.

MATCH(source:Node91 {nodeid: 167}), (target:Node31)     
CALL gds.allShortestPaths.dijkstra.stream('myGraph44', {
    sourceNode: source,
    relationshipWeightProperty: 'Length'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
  RETURN 
                source.nodeidAS fromNode,
                target.node31idAS toNode,
                target.Latitude,
                target.Longitude,
                target.louvainCommunity,
                totalCost AS cost
                ORDER BY cost, fromNode,toNode

CALL gds.graph.project(
  'myGraph44',
  ['Node31', 'Node91'],
  {
    CONNECTED_BY: {
      type: 'CONNECTED_BY',
      orientation: 'UNDIRECTED',
      properties: ['Length']
    }
  }
)

ioannis_panagio · April 15, 2025, 10:34am

@surbhi.gupta2104

I still think you should try to run the following that i mentioned in my first post (Adapted for new labels, hopefully i havent mistyped anything)

MATCH (source:Node91 {nodeid: 167})
WITH source, COLLECT {
MATCH (l:Node31)
RETURN l
} as targetNodes
CALL gds.shortestPath.dijkstra.stream(‘graph’,{sourceNode:source, targetNodes: targetNodes})

Notice that in the last query you have shared the target is not even used! Please give a go the query I've shared and let us know how it goes. This is the most optimal answer i can think currently of so if that still doesn't work, I will need to know to try and think something else :)

Best regards,
Ioannis.

surbhi.gupta2104 · April 15, 2025, 7:51pm

Yes this will work with UNWIND.

Topic		Replies	Views
Shortest Paths algorithms Graph Algorithms/Graph Data Science paths	5	1234	September 11, 2020
All pair shortest algorithm is taking very long time on very large data. Is there any way to optimise Cypher	0	247	February 6, 2021
Additional Algorithms for Graph Data Science Neo4j Graph Platform migrated	2	128	June 4, 2022
Does allshortestpath in gds can not outcome the links and nodes for each path? Neo4j Graph Platform	6	491	January 4, 2022
Algorithm developed using GDS core api gives different results for different graph projections Neo4j Graph Platform migrated	1	138	June 3, 2022

How to efficiently compute all-pairs Dijkstra shortest paths for a Louvain community in GDS?

Related topics