Yet another Cartesian Product problem

tinycode2 · May 11, 2020, 11:17am

Cartesian Products lie at the heart of pretty much every query I've been troubleshooting on our Neo servers over the last nine months, however I can't find a generic solution to this one. The query is produced by query builder code in response to user activity in the UI.

XX has been used to hide certain pieces of private data.

The following query uses multiple labels to avoid joins where the joined nodes aren't needed as part of the output.

PROFILE
MATCH (ph:PH_JP_XXXX:PH_OF_XXXX:PH_BU_XXXX:PH_EC_XXXX)-[:PH_OD]->(od:OD_PR_XXXX)
WHERE ph.PH_XXX_PhysicalDevices_XXXS.epochMillis >= {p0}
RETURN count(DISTINCT ph) AS Count_PH_Id

This produces a query which runs as follows:

Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 99032348 total db hits in 53971 ms

The most annoying thing about this problem is that if I simply reverse the expression of the query (without any changes to the logic), the planner chooses the correct plan, avoiding a cartesian product. Like this:

PROFILE
MATCH (od:OD_PR_XXXX)<-[:PH_OD]-(ph:PH_JP_XXXX:PH_OF_XXXX:PH_BU_XXXX:PH_EC_XXXX)
WHERE ph.PH_XXX_PhysicalDevices_XXXS.epochMillis >= {p0}
RETURN count(DISTINCT ph) AS Count_PH_Id

Producing a nice quick execution as follows:

Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 66466 total db hits in 172 ms

That's ~313 times faster just because the planner stopped being stupid.

Unfortunately because these queries are built dynamically, I can't manually tweak them. I have to find a generic way to tell the planner to avoid a cartesian join. I have tried using the USING JOIN ON hint to no avail. I even resorted to executing part of the query, COLLECTing and then UNWINDing it to try and influence when filters occur, but (perhaps thankfully) this didn't work - it seemed to use up all available memory.

Is there any way to regain control of the query plan?

andrew_bowman · May 11, 2020, 11:47am

Which version of Neo4j is being used here (we know it's 3.5.x, but which patch release)? I agree that's a horrible plan, we'll want to figure out why it's being generated.

Do you have an index on one of ph's labels and the PH_XXX_PhysicalDevices_XXXS property? Looks like this is a temporal type of some kind, but the WHERE clause here, a predicate against it's epochMillis, won't utilize an index, and needs to be changed such that you're comparing against another temporal type.

For example, if this is a dateTime type, then you could do something like:

WHERE ph.PH_XXX_PhysicalDevices_XXXS >= dateTime(epochMillis:{p0})

tinycode2 · May 11, 2020, 1:32pm

Version is 3.5.3 (enterprise)

If I take out the WHERE clause entirely, it exhibits the same behaviours as described above, only exacerbated by the increased size of the returned dataset. (But yes, the property is a dateTime type and we have indexes on it).

andrew_bowman · May 11, 2020, 5:55pm

Have you attempted on a newer patch version? 3.5.17 is the latest version of 3.5.x, there may be some planner fixes that may address the issue. At the least we could use that to rule out any already-fixed bugs.

goofballLogic · May 11, 2020, 6:23pm

We don't have a spare server within the protected production environment where this data is contractually obligated to be at rest. Patching is a non trivial operation which we would need to schedule for next weekend (or perhaps the one after).

goofballLogic · May 18, 2020, 7:56pm

After days of setting up a server in the production environment to test this upgrade I can confirm that the problem is exactly the same as before.

Topic		Replies	Views
Why does a single label massively alter my query plan? Cypher	15	905	March 2, 2020
This tutorial should explain how to avoid Cartesian Products Feedback & Requests	2	1574	November 26, 2020
Why is the neo4j query plan without cartesian product operation slower? Cypher querying , performance , operations , profile	0	814	July 8, 2019
Why does specifying the label of a node result in a Cartesian product? Cypher performance , cypher	7	279	October 19, 2023
Avoid cartesian product when create relationships Cypher cypher	10	4731	July 18, 2024

Yet another Cartesian Product problem

Related topics