I'm trying to understand why below query is taking too long to retrieve results. I have mocked up the values used but the below query is right and is returning 40 records (a node has 8 diff values and z node has 5 diff values so total 40 combinations). It's taking 2.5 min to return those 40 records. Please let me know what the issue is here. I'm suspecting this to be Neo4j version and infrastructure we're using right now in production.
After the below query we have algo.kShortestPaths.stream so the whole thing together is taking more than 5 min. What do you suggest? Is there no other way where we can handle such combinations (a and z node combinations > 40) within 5 min?
Infrastructure details: Neo4j 3.5 community edition
2 separate datacenters, sync job - 64GB mem 16GB CPU 4 cores
Cypher Query:
MATCH (s:SiteNode {siteName: 'siteName1'})-[rl:CONNECTED_TO]-(a:EquipmentNode)
WHERE a.locationClli = s.siteName AND toUpper(a.networkType) = 'networkType1' AND NOT (toUpper(a.equipmentTid) CONTAINS 'TEST')
WITH a.equipmentTid AS tid_A
MATCH pp = (a:EquipmentNode)-[rel:CONNECTED_TO]-(a1:EquipmentNode)
WHERE a.equipmentTid = tid_A AND ALL( t IN relationships(pp)
WHERE t.type IN ['Type1'] AND (t.totalChannels > 0 AND t.totalChannelsUsed < t.totalChannels) AND t.networkId IN ['networkId1'] AND t.status IN ['status1', 'status2'] )
WITH a
MATCH (d:SiteNode {siteName: 'siteName2'})-[rl:CONNECTED_TO]-(z:EquipmentNode)
WHERE z.locationClli = d.siteName AND toUpper(z.networkType) = 'networkType2' AND NOT (toUpper(z.equipmentTid) CONTAINS 'TEST')
WITH z.equipmentTid AS tid_Z, a
MATCH pp = (z:EquipmentNode)-[rel:CONNECTED_TO]-(z1:EquipmentNode)
WHERE z.equipmentTid=tid_Z AND ALL(t IN relationships(pp)
WHERE t.type IN ['Type2'] AND (t.totalChannels > 0 AND t.totalChannelsUsed < t.totalChannels) AND t.networkId IN ['networkId2'] AND t.status IN ['status1', 'status2'])
WITH DISTINCT z, a CALL algo.kShortestPaths.stream(a, z, 5, 'mileage', {nodeQuery:'MATCH (n:EquipmentNode) RETURN id(n) AS id ', relationshipQuery:'MATCH p = (n:EquipmentNode)-[r:CONNECTED_TO]-(m:EquipmentNode) WHERE ALL ( Q IN relationships(p) WHERE Q.type IN ["type1"] AND Q.totalChannels > 0 AND (Q.totalChannelsUsed < Q.totalChannels) AND Q.networkId IN ["networkId1"] AND Q.status IN ["status1", "status2"] ) RETURN id(n) AS source, id(m) AS target, r.mileage AS weight, r.trailName AS linkName, r.networkId AS linkNetworkId ', maxDepth:80, direction:'BOTH', defaultValue:0.0, write: false , graph:'cypher', path: true}) YIELD path, index, nodeIds, costsWITH nodeIds, a, z, index, path, costs, [node IN algo.getNodesById(nodeIds) | node.locationClli] AS clliInRoutes, [node IN algo.getNodesById(nodeIds) | node.nodeId] AS tidsInRoutesUNWIND range(0, size(nodeIds)-2) AS idxWITH clliInRoutes, tidsInRoutes, algo.getNodeById(nodeIds[idx]) AS current, algo.getNodeById(nodeIds[idx+1]) AS next, path, a, z, index, nodeIds, costsWITH clliInRoutes, tidsInRoutes, path, current, next, a, z, index, nodeIds, costsMATCH pp=(current)-[t:CONNECTED_TO]-(next)WHERE ALL( t IN relationships(pp)WHERE t.type IN ['type1'] AND (t.totalChannels > 0 AND t.totalChannelsUsed < t.totalChannels) AND t.networkId IN ['networkId1'] AND t.status IN ['status1', 'status2'] )WITH clliInRoutes, tidsInRoutes, path AS p, a, z, index, nodeIds, costs, relationships(pp) AS connectionsUNWIND connections AS routesWITH clliInRoutes, tidsInRoutes, costs, index, nodeIds, routes, connections, p, a, z, size(nodeIds)-1 AS noOfLinks, reduce(acc = 0.0, cost IN costs | acc + cost) AS totalMileage, EXTRACT(n IN connections | n.trailName) AS shortestPath, a.nodeId AS startNode, z.nodeId AS endNode, a.locationClli AS startLocation, z.locationClli AS endLocation, REDUCE(COST=0.0, r IN connections | COST+toFloat(r.cost)) AS totalCost, 10^2 AS factorWITH clliInRoutes, tidsInRoutes, costs, index, p, nodeIds, routes, factor, totalMileage, totalCost, shortestPath, noOfLinks, startNode, endNode, startLocation, endLocationWHERE NOT (routes.trailName IN ['trail1','trail2'])RETURN
This query was built to handle small combinations upto 4 total a and z node combinations but today we might have combinations greater than 10 or 40 or 100 so this is timing out. I'm not sure if there's a better way to write the query to improve performance assuming the community edition is good enough for our case.