Recently I optimized a customer query(on a network graph) by switching from apoc.path.expandConfig() to a QPP. The performance improved significantly--from 5 secs to under a second.
Although both queries return identical results, I'm wondering if there's any more improvement or comment on this.
< Before >
CALL apoc.path.expandConfig(logical, { relationshipFilter: "ROUTE>|CONTAIN|-ASSIGN", labelFilter: "+Logical|Equipment", uniqueness: "RELATIONSHIP_PATH", filterStartNode: true, bfs: false, whitelistNodes: whitelist, endNodes: endNodes, minLevel: 8, maxLevel: 10 }) YIELD path
WITH path, requiredDestinations, s, [r IN relationships(path) WHERE type(r) = "ROUTE" | r.destination] AS allDestinations
WITH path, all(dests IN allDestinations WHERE any(reqDest IN requiredDestinations WHERE reqDest IN dests)) AS isPathValid WHERE isPathValid
return path
< After >
match path = any((logical)-[r where any(dst in r.destination where dst in requiredDestinations and type(r) = 'ROUTE') or type(r) = 'CONTAIN' and not type(r) = 'ASSIGN']-(le:Logical|Equipment where le in whitelist)
((le2:Logical|Equipment where le2 in whitelist)-[r2 where any(dst in r2.destination where dst in requiredDestinations and type(r2) = 'ROUTE') or type(r2) = 'CONTAIN' and not type(r2) = 'ASSIGN']-(le3:Logical|Equipment where le3 in whitelist)){6,8}
(le4:Logical|Equipment where le4 in whitelist)-[r3 where any(dst in r3.destination where dst in requiredDestinations and type(r3) = 'ROUTE') or type(r3) = 'CONTAIN' and not type(r3) = 'ASSIGN']-(end where end in endNodes))
return path
Since this is a fragment, the first thing I get to execute or explain:
Variable `requiredDestinations` not defined (line 1, column 83 (offset: 82))
The first obvious thing was to move the simpler and shorter short circuit operations (AND) to be evaluated first, I put it on Claude and that's pretty much what it did:
MATCH path = ANY(
(logical:Logical|Equipment WHERE logical IN whitelist)
-[r WHERE r.type IN ['ROUTE', 'CONTAIN']
AND r.type <> 'ASSIGN'
AND (r.type = 'CONTAIN' OR ANY(dst IN r.destination WHERE dst IN requiredDestinations))]->
(le:Logical|Equipment WHERE le IN whitelist)
(
(le2:Logical|Equipment WHERE le2 IN whitelist)
-[r2 WHERE r2.type IN ['ROUTE', 'CONTAIN']
AND r2.type <> 'ASSIGN'
AND (r2.type = 'CONTAIN' OR ANY(dst IN r2.destination WHERE dst IN requiredDestinations))]->
(le3:Logical|Equipment WHERE le3 IN whitelist)
){6,8}
(le4:Logical|Equipment WHERE le4 IN whitelist)
-[r3 WHERE r3.type IN ['ROUTE', 'CONTAIN']
AND r3.type <> 'ASSIGN'
AND (r3.type = 'CONTAIN' OR ANY(dst IN r3.destination WHERE dst IN requiredDestinations))]->
(end WHERE end IN endNodes)
)
RETURN path;
It even came up with a very different query that seems to have the same gist:
WITH requiredDestinations, whitelist, endNodes
MATCH (start:Logical|Equipment WHERE start IN whitelist)
MATCH (end WHERE end IN endNodes)
MATCH path = ANY(
(start)
-[r:ROUTE|CONTAIN WHERE r.type <> 'ASSIGN'
AND (r.type = 'CONTAIN' OR ANY(dst IN r.destination WHERE dst IN requiredDestinations))]->*{8,10}
(end)
)
WHERE ALL(n IN nodes(path)[1..-1] WHERE (n:Logical OR n:Equipment) AND n IN whitelist)
RETURN path;
"requiredDestinations" is a list of destination networks like x.x.x.x/xx
"logical" and "whitelist" are list of nodes that have already been filtered in the earlier steps.