I have 50 million nodes and 64 million relationships in a Neo4j database. I am using a shortestPath query (the third query below) to build an origin-destination matrix, but it takes a long time because of too many db hits.
What alternatives are there to reduce the query time?
Query used to create nodes with the label "Point":
CALL apoc.periodic.iterate(
"
CALL apoc.load.csv('nodes.csv', {header:true,sep:',', ignore:['OBJECTID','CONNECTION_CNT'],
mapping:{
NODE_ID: {type:'int',name:'uid'},
X_COORD: {type:'float',name:'x'},
Y_COORD: {type:'float',name:'y'}
}
})
YIELD map as row
RETURN row
",
"
WITH row WHERE row.uid IS NOT NULL
CREATE (i:Point {uid: row.uid})
SET i.x = toFloat(row.x),
i.y = toFloat(row.y)
RETURN COUNT(*) as total
",
{batchSize:100000, iterateList:true, parallel:true}
)
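Both the relationship import and the shortest path query below look up Point nodes by uid, so those lookups would benefit from an index on that property. The post does not say whether one exists. A minimal sketch, assuming Neo4j 4.x or later syntax and a hypothetical index name point_uid:

```cypher
// Assumption: no index on :Point(uid) exists yet; the name point_uid is illustrative.
CREATE INDEX point_uid IF NOT EXISTS
FOR (p:Point) ON (p.uid);
```

Prefixing a query with PROFILE shows whether the plan uses an index seek or a label scan for these lookups.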
Query used to create relationships of type "EDGE":
CALL apoc.periodic.iterate(
"
CALL apoc.load.csv('edges.csv', {header:true,sep:',',ignore:['POSTED_AVG_TRAVEL_TM'],
mapping:{
OBJECTID: {type:'int',name:'edgeId'},
FNODE: {type:'float',name:'u'},
TNODE: {type:'float',name:'v'},
TRAVEL_TM: {type:'float',name:'time'},
ROAD_LEN: {type:'float',name:'distance'}
}
})
YIELD map as edge
RETURN edge
",
"
WITH edge
WHERE edge.edgeId IS NOT NULL
MATCH (u:Point {uid: edge.u})
MATCH (v:Point {uid: edge.v})
CREATE (u)-[r:EDGE {edgeId: edge.edgeId}]->(v)
SET r.length = toFloat(edge.distance)
SET r.time = toFloat(edge.time)
RETURN COUNT(*) AS total
",
{batchSize:100000, iterateList:true, parallel:true}
)
Shortest path query that is taking a long time:
With 20 uids in the WITH list it takes 2 min 20 s.
With 50 uids in the WITH list it takes 21 min.
WITH [345920715, 345920716, 345920717, 345920718, 345920719, 345920720, 345920721, 345920722, 345920723, 345920724, 345920725, 345920726, 345920727, 345920728, 345920729, 345920730, 345920731, 345920732, 345920733, 345920734, 345920735, 345920736, 345920737, 345920738, 345920739, 345920740, 345920741, 345920742, 345920743, 345920744, 345920745, 345920746, 345920747, 345920748, 345920749, 345920750, 345920751, 345920752, 345920753, 345920754, 345920755, 345920756, 345920757, 345920758, 345920759, 345920760, 345920761, 345920762, 345920763, 345920764] AS uids
MATCH (from:Point), (to:Point)
WHERE from.uid IN uids AND to.uid IN uids
MATCH path = shortestPath((from)-[r:EDGE*]-(to))
WITH from.uid AS fromPoint, to.uid AS toPoint,path,
reduce(time = 0, r in relationships(path) | time + r.time) AS totalTime,
reduce(dist = 0, r in relationships(path) | dist + r.length) AS totalDistance
RETURN fromPoint, toPoint, totalDistance,totalTime
ORDER BY fromPoint, toPoint;
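The query above pairs every listed uid with every other one in both directions, and with itself, so 50 uids produce 2,500 from/to combinations even though the path pattern is undirected. A minimal sketch, assuming the reverse direction of each pair can be filled in afterwards from the same result, that evaluates each unordered pair only once (the three uids are just the first entries of the list above):

```cypher
// Sketch only: each unordered pair is matched once via from.uid < to.uid,
// which also skips the from = to case.
WITH [345920715, 345920716, 345920717] AS uids
MATCH (from:Point) WHERE from.uid IN uids
MATCH (to:Point) WHERE to.uid IN uids AND from.uid < to.uid
MATCH path = shortestPath((from)-[:EDGE*]-(to))
RETURN from.uid AS fromPoint, to.uid AS toPoint,
       reduce(dist = 0.0, r IN relationships(path) | dist + r.length) AS totalDistance,
       reduce(time = 0.0, r IN relationships(path) | time + r.time) AS totalTime
ORDER BY fromPoint, toPoint;
```

Note that shortestPath minimizes the number of relationships, not the summed time or length, so the totals describe the fewest-hop path in both versions of the query.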