Query Optimization taking more than an hour to laod

itz.007.786 · November 14, 2024, 7:14am

The query consumes more than an 30 min to 1.5 hour to load
MG contains around 15000 records
MG_Status around 21k

MATCH (m:MG{MARGIN_PLATFORM: 'ELEC'})-[:DETAILS(TS_END: 7258136400)]- (

ms: MG_Status) with m, ms MATCH (n:MG)

WHERE n.MARGIN_PLATFORM <> 'ZOBA'

with m, n, ms MATCH (n)-[:DETAILS(TS_END: 7258136400)]-(ns: MG_Status)

with m, ms, n, ns

Match p = ((m)-[:RELATED*2..3]-(n)) with m, ms, n, ns, p WHERE all (x IN relationships (p)

WHERE X.TS_END = 7258136400) and ANY (X IN nodes (p)

WHERE X.ACCOUNT_ID = 'CONSOLIDATE')

IS NULL return Distinct ms.MARGIN_GROUP_ID,

ms.MARGIN_PLATFORM, ms.SUB, ms.HOM ns.MARGIN_GROUP_ID, ns.MARGIN_PLATFORM, I

ns.SUB, ns.HOM

What shall be done for optimization to 15 min?

glilienfield · November 15, 2024, 12:28pm

The performance issue is probably due to the query generating a Cartesian product of its results. For illustration purposes, the query’s flow can be summarized as follows:

Match (m:MG)—(ms:MG_Status)
Match (n:MG)—(ns:MG_Status)
Match path=(n)-[*2..3]-(m)
Return ms, ms

The first two match statements are not correlated, so you will generate a Cartesian product of their results. Assuming the first match generates N rows and the second match M rows, N x M rows will result from the first two queries. This result will have the results from the first match repeated for each result of the second match.

The result of the third match (path) is not used in your result, so this match’s effect is primarily to filter out the pairs (n,m) that are not related. I suggest using an EXISTS subquery here in a predicate to perform the filtering. Itadvantage of being more efficient and it will not create duplicate pairs (n,m) when multiple paths exists between n and m.

Finally, your results will return every combination of (ms, ns) that exists for a given pair (n,m). Is this really the result you are looking for?

I see one way of possibly optimizing this, but let’s validate the query is working as expected. Also, maybe we can rewrite it if you can explain the outcome you are trying to achieve.

BTW- the results of the second match will contain the results of the first match since ELEC <> ZOBA and the rest of the conditions are the same. For efficiency sake, you could add to the second match’s predicate to not allow ELEC, or you could eliminate the first query and extract the (m,ms) results from the second query’s results using list comprehension.

itz.007.786 · November 15, 2024, 3:46pm

I tried to shorten it

MATCH (m:MG{MARGIN_PLATFORM: 'ELEC'})-[:DETAILS(TS_END: 7258136400)]- (

ms: MG_Status) where n.MARGIN_PLATFORM <> 'ZOBA'

with m, ms MATCH (n:MG)-[:DETAILS(TS_END: 7258136400)]-(ns: MG_Status)

with m, ms, n, ns

Match p = ((m)-[:RELATED*2..3]-(n)) with m, ms, n, ns, p WHERE all (x IN relationships (p)

WHERE X.TS_END = 7258136400) and ANY (X IN nodes (p)

WHERE X.ACCOUNT_ID = 'CONSOLIDATE')

IS NULL return Distinct ms.MARGIN_GROUP_ID,

ms.MARGIN_PLATFORM, ms.SUB, ms.HOM ns.MARGIN_GROUP_ID, ns.MARGIN_PLATFORM, I

ns.SUB, ns.HOM

It took an hour approx.

Also tried this

Match p = ((m:MG)-[:RELATED*2..3]-(n:MG)) with m, ms, n, ns, p WHERE all (x IN relationships (p)

WHERE X.TS_END = 7258136400) and ANY (X IN nodes (p)

WHERE X.ACCOUNT_ID = 'CONSOLIDATE')

It took approx 35 min but then question is how to retrieve status details. That is also required

Any data not currently active has TS_END value different from above value 7258136400.

Did profiling on query the above(both) are taking a millions hits on db

itz.007.786 · November 18, 2024, 9:58am

Tried that it took forever for query to run. using exists ()

Bennu · November 29, 2024, 11:48am

Hello hello!

Which Neo4j version are u using?

Topic		Replies	Views
Help with optimizing query Cypher performance , browser , cypher , operations	3	416	September 21, 2021
Improving the performance of a cypher query Neo4j Graph Platform	15	692	October 26, 2020
Performance query Cypher	0	750	October 29, 2018
Using indexed nodes and simple queries my cypher queries are still taking around 500ms. Can this be further optimized? Looking for advice Cypher performance , cypher	5	409	December 18, 2020
Multiple matches performance drop Neo4j Graph Platform	4	786	January 29, 2019

Query Optimization taking more than an hour to laod

Related topics