shortestPath query is taking a long time

dt1 · April 9, 2025, 12:09pm

I have 50 million nodes and 64 million relationships in a Neo4j database. I am using a shortestPath query (the third query below) to build an origin-destination matrix, but it takes a long time because of the large number of db hits.
What alternatives would reduce the query time?

Query used to Create Nodes with label "Point":

CALL apoc.periodic.iterate(
"
CALL apoc.load.csv('nodes.csv', {header:true,sep:',', ignore:['OBJECTID','CONNECTION_CNT'],
mapping:{
NODE_ID: {type:'int',name:'uid'},
X_COORD: {type:'float',name:'x'},
Y_COORD: {type:'float',name:'y'}
}
})
YIELD map as row
RETURN row
",
"
WITH row WHERE row.uid IS NOT NULL
CREATE (i:Point {uid: row.uid})
SET i.x=toFloat(row.x),
i.y = toFloat(row.y)
RETURN COUNT(*) as total
",
{batchSize:100000, iterateList:true, parallel:true}
)

Query used to create relationships of type "EDGE":
CALL apoc.periodic.iterate(
"
CALL apoc.load.csv('edges.csv', {header:true,sep:',',ignore:['POSTED_AVG_TRAVEL_TM'],
mapping:{
OBJECTID: {type:'int',name:'edgeId'},
FNODE: {type:'float',name:'u'},
TNODE: {type:'float',name:'v'},
TRAVEL_TM: {type:'float',name:'time'},
ROAD_LEN: {type:'float',name:'distance'}
}
})
YIELD map as edge
RETURN edge
",
"
WITH edge
WHERE edge.edgeId IS NOT NULL
MATCH (u:Point {uid: edge.u})
MATCH (v:Point {uid: edge.v})
CREATE (u)-[r:EDGE {edgeId: edge.edgeId}]->(v)
SET r.length = toFloat(edge.distance)
SET r.time = toFloat(edge.time)
RETURN COUNT(*) AS total
",
{batchSize:100000, iterateList:true, parallel:true}
)

Shortest path query that is taking a long time:

With 20 uids in the WITH list: 2 min 20 s
With 50 uids in the WITH list: 21 min

WITH [345920715, 345920716, 345920717, 345920718, 345920719, 345920720, 345920721, 345920722, 345920723, 345920724, 345920725, 345920726, 345920727, 345920728, 345920729, 345920730, 345920731, 345920732, 345920733, 345920734, 345920735, 345920736, 345920737, 345920738, 345920739, 345920740, 345920741, 345920742, 345920743, 345920744, 345920745, 345920746, 345920747, 345920748, 345920749, 345920750, 345920751, 345920752, 345920753, 345920754, 345920755, 345920756, 345920757, 345920758, 345920759, 345920760, 345920761, 345920762, 345920763, 345920764] AS uids
MATCH (from:Point), (to:Point)
WHERE from.uid IN uids AND to.uid IN uids
MATCH path = shortestPath((from)-[r:EDGE*]-(to))
WITH from.uid AS fromPoint, to.uid AS toPoint,path,
reduce(time = 0, r in relationships(path) | time + r.time) AS totalTime,
reduce(dist = 0, r in relationships(path) | dist + r.length) AS totalDistance
RETURN fromPoint, toPoint, totalDistance,totalTime
ORDER BY fromPoint, toPoint;

ioannis_panagio · April 9, 2025, 1:19pm

Hi @dt1,

You posted in the GDS section, so I have moved your thread to the more appropriate Cypher section in case someone knows how to optimize your query.

That being said, you are welcome to look into GDS's shortest path algorithms. They work on an in-memory graph, so once the graph is projected they could potentially run faster.

Let me know if you need any help with these GDS procedures.
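
As a rough sketch of that GDS route (an illustration under assumptions, not a tuned solution): the graph name 'roads' is hypothetical, the calls assume GDS 2.x is installed, and the projection assumes the server has enough memory for the whole graph. Note that Cypher's shortestPath() returns the path with the fewest relationships, whereas the Dijkstra call below minimises the summed time property, so the totals can differ from the original query. Depending on the GDS version, the target parameter may be named differently, so check the documentation for the installed release.

```cypher
// Step 1 (run once): project Point nodes and EDGE relationships into memory.
// UNDIRECTED mirrors the direction-less pattern used in the shortestPath query.
CALL gds.graph.project(
  'roads',                       // hypothetical graph name
  'Point',
  {EDGE: {orientation: 'UNDIRECTED', properties: ['time', 'length']}}
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;

// Step 2: one weighted source-target search per unordered pair of uids.
WITH [345920715, 345920716, 345920717] AS uids   // shortened list for illustration
MATCH (from:Point) WHERE from.uid IN uids
MATCH (to:Point) WHERE to.uid IN uids AND from.uid < to.uid
CALL gds.shortestPath.dijkstra.stream('roads', {
  sourceNode: from,
  targetNode: to,
  relationshipWeightProperty: 'time'
})
YIELD totalCost
RETURN from.uid AS fromPoint, to.uid AS toPoint, totalCost AS totalTime
ORDER BY fromPoint, toPoint;
```

This returns only the total travel time per pair. Whether it reaches the desired run time on a graph of this size is untested here and would need to be measured.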

Best regards,
Ioannis.

glilienfield · April 10, 2025, 12:41am

Try this:

WITH [345920715, 345920716, 345920717, 345920718, 345920719, 345920720, 345920721, 345920722, 345920723, 345920724, 345920725, 345920726, 345920727, 345920728, 345920729, 345920730, 345920731, 345920732, 345920733, 345920734, 345920735, 345920736, 345920737, 345920738, 345920739, 345920740, 345920741, 345920742, 345920743, 345920744, 345920745, 345920746, 345920747, 345920748, 345920749, 345920750, 345920751, 345920752, 345920753, 345920754, 345920755, 345920756, 345920757, 345920758, 345920759, 345920760, 345920761, 345920762, 345920763, 345920764] AS uids
UNWIND uids as uid
MATCH (n:Point{uid:uid})
WITH COLLECT(n) as Points
UNWIND Points as from
UNWIND Points as to
WITH from, to
WHERE from.uid < to.uid
MATCH path = shortestPath((from)-[r:EDGE*]-(to))
WITH from.uid AS fromPoint, to.uid AS toPoint,path,
reduce(time = 0, r in relationships(path) | time + r.time) AS totalTime,
reduce(dist = 0, r in relationships(path) | dist + r.length) AS totalDistance
RETURN fromPoint, toPoint, totalDistance,totalTime
ORDER BY fromPoint, toPoint;

dt1 · April 10, 2025, 12:42pm

Thanks @glilienfield for your reply. Earlier I was getting 2425 rows (50*49=2450) in the query result and it took 21 mins. Now the query above gives 1225 rows and takes 10 mins. Both the time and the row count are reduced by half.

Is there a way to bring the query time down to the range of 1 to 2 mins? Please suggest.

glilienfield · April 10, 2025, 1:05pm

You should get half the results, because I configured the query to calculate the shortest path for each pair only once rather than in both directions (Node A -> Node B and Node B -> Node A). The result should be the same either way, since you did not specify a relationship direction in your query.
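
The halving can be checked with a small standalone query. This is only a sketch: it uses the integers 1 to 50 as stand-ins for the 50 uids and does not touch the graph.

```cypher
// Count ordered pairs (a <> b) and unordered pairs (a < b) among 50 items.
WITH range(1, 50) AS ids
UNWIND ids AS a
UNWIND ids AS b
WITH a, b WHERE a <> b
RETURN count(*) AS orderedPairs,
       sum(CASE WHEN a < b THEN 1 ELSE 0 END) AS unorderedPairs;
```

With 50 items this gives 2450 ordered pairs and 1225 unordered pairs, which matches the 1225 rows reported for the rewritten query.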

Do you have an index defined on the 'uid' property for the Point label?

dt1 · April 11, 2025, 7:18am

Yes, an index is defined on the uid property of the Point label.
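
For anyone checking the same thing, here is a minimal sketch of how such an index can be created and verified. It assumes Neo4j 4.4 or later, and the index name point_uid is hypothetical.

```cypher
// Create the index only if it does not already exist.
CREATE INDEX point_uid IF NOT EXISTS
FOR (p:Point) ON (p.uid);

// List indexes; the entry should show state ONLINE before it is relied on.
SHOW INDEXES;

// Profile a lookup; the plan should contain NodeIndexSeek rather than NodeByLabelScan.
PROFILE MATCH (n:Point {uid: 345920715}) RETURN n;
```

If the profile shows an index seek for the endpoint lookups, the remaining cost is most likely in the path expansion itself rather than in finding the start and end nodes.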
