Cartesian Products lie at the heart of pretty much every query I've been troubleshooting on our Neo servers over the last nine months, however I can't find a generic solution to this one. The query is produced by query builder code in response to user activity in the UI.
XX has been used to hide certain pieces of private data.
The following query uses multiple labels to avoid joins where the joined nodes aren't needed as part of the output.
PROFILE
MATCH (ph:PH_JP_XXXX:PH_OF_XXXX:PH_BU_XXXX:PH_EC_XXXX)-[:PH_OD]->(od:OD_PR_XXXX)
WHERE ph.PH_XXX_PhysicalDevices_XXXS.epochMillis >= {p0}
RETURN count(DISTINCT ph) AS Count_PH_Id
This produces a query which runs as follows:
Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 99032348 total db hits in 53971 ms
The most annoying thing about this problem is that if I simply reverse the expression of the query (without any changes to the logic), the planner chooses the correct plan, avoiding a cartesian product. Like this:
PROFILE
MATCH (od:OD_PR_XXXX)<-[:PH_OD]-(ph:PH_JP_XXXX:PH_OF_XXXX:PH_BU_XXXX:PH_EC_XXXX)
WHERE ph.PH_XXX_PhysicalDevices_XXXS.epochMillis >= {p0}
RETURN count(DISTINCT ph) AS Count_PH_Id
Producing a nice quick execution as follows:
Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 66466 total db hits in 172 ms
That's ~313 times faster just because the planner stopped being stupid.
Unfortunately because these queries are built dynamically, I can't manually tweak them. I have to find a generic way to tell the planner to avoid a cartesian join. I have tried using the USING JOIN ON
hint to no avail. I even resorted to executing part of the query, COLLECTing and then UNWINDing it to try and influence when filters occur, but (perhaps thankfully) this didn't work - it seemed to use up all available memory.
Is there any way to regain control of the query plan?