Query Optimization taking more than an hour to laod

The query consumes more than an 30 min to 1.5 hour to load
MG contains around 15000 records
MG_Status around 21k

MATCH (m:MG{MARGIN_PLATFORM: 'ELEC'})-[:DETAILS(TS_END: 7258136400)]- (

ms: MG_Status) with m, ms MATCH (n:MG)

WHERE n.MARGIN_PLATFORM <> 'ZOBA'

with m, n, ms MATCH (n)-[:DETAILS(TS_END: 7258136400)]-(ns: MG_Status)

with m, ms, n, ns

Match p = ((m)-[:RELATED*2..3]-(n)) with m, ms, n, ns, p WHERE all (x IN relationships (p)

WHERE X.TS_END = 7258136400) and ANY (X IN nodes (p)

WHERE X.ACCOUNT_ID = 'CONSOLIDATE')

IS NULL return Distinct ms.MARGIN_GROUP_ID,

ms.MARGIN_PLATFORM, ms.SUB, ms.HOM ns.MARGIN_GROUP_ID, ns.MARGIN_PLATFORM, I

ns.SUB, ns.HOM

What shall be done for optimization to 15 min?

The performance issue is probably due to the query generating a Cartesian product of its results. For illustration purposes, the query’s flow can be summarized as follows:

Match (m:MG)—(ms:MG_Status)
Match (n:MG)—(ns:MG_Status)
Match path=(n)-[*2..3]-(m)
Return ms, ms

The first two match statements are not correlated, so you will generate a Cartesian product of their results. Assuming the first match generates N rows and the second match M rows, N x M rows will result from the first two queries. This result will have the results from the first match repeated for each result of the second match.

The result of the third match (path) is not used in your result, so this match’s effect is primarily to filter out the pairs (n,m) that are not related. I suggest using an EXISTS subquery here in a predicate to perform the filtering. Itadvantage of being more efficient and it will not create duplicate pairs (n,m) when multiple paths exists between n and m.

Finally, your results will return every combination of (ms, ns) that exists for a given pair (n,m). Is this really the result you are looking for?

I see one way of possibly optimizing this, but let’s validate the query is working as expected. Also, maybe we can rewrite it if you can explain the outcome you are trying to achieve.

BTW- the results of the second match will contain the results of the first match since ELEC <> ZOBA and the rest of the conditions are the same. For efficiency sake, you could add to the second match’s predicate to not allow ELEC, or you could eliminate the first query and extract the (m,ms) results from the second query’s results using list comprehension.

  1. I tried to shorten it

MATCH (m:MG{MARGIN_PLATFORM: 'ELEC'})-[:DETAILS(TS_END: 7258136400)]- (

ms: MG_Status) where n.MARGIN_PLATFORM <> 'ZOBA'

with m, ms MATCH (n:MG)-[:DETAILS(TS_END: 7258136400)]-(ns: MG_Status)

with m, ms, n, ns

Match p = ((m)-[:RELATED*2..3]-(n)) with m, ms, n, ns, p WHERE all (x IN relationships (p)

WHERE X.TS_END = 7258136400) and ANY (X IN nodes (p)

WHERE X.ACCOUNT_ID = 'CONSOLIDATE')

IS NULL return Distinct ms.MARGIN_GROUP_ID,

ms.MARGIN_PLATFORM, ms.SUB, ms.HOM ns.MARGIN_GROUP_ID, ns.MARGIN_PLATFORM, I

ns.SUB, ns.HOM

It took an hour approx.

  1. Also tried this :slight_smile:

Match p = ((m:MG)-[:RELATED*2..3]-(n:MG)) with m, ms, n, ns, p WHERE all (x IN relationships (p)

WHERE X.TS_END = 7258136400) and ANY (X IN nodes (p)

WHERE X.ACCOUNT_ID = 'CONSOLIDATE')

It took approx 35 min but then question is how to retrieve status details. That is also required

Any data not currently active has TS_END value different from above value 7258136400.

Did profiling on query the above(both) are taking a millions hits on db

Tried that it took forever for query to run. using exists ()

Hello hello!

Which Neo4j version are u using?