Multiple match takes too long to execute due to node with a lot of connections

aurelien.facci · March 26, 2025, 10:21am

Hello,
We are trying (@julie.cavarroc) to optimize a cypher query. Here is the data model:

D is a central node. Every node D has nodes E, A and V.
Property P from node D is integer,
Property C from node A is integer
For one particular node D called D1 (node A called A1, node E called E1), I would like to retrieve all the other nodes D sharing the same node V and satisfying:

D1.P-50<D.P<D1.P+50,
A1.C = A.C,
D1.Date>D.Date>D1.Date-duration({hours:1})
To be noted, the node V can have more than 400000 other nodes D
Here is an example of a graph:

image999×702 34 KB

The next steps would be to calculate the JaroWinkler distance between E1 and all the nodes E linked to nodes D satisfying the above conditions.

Here is the code that I tried:

MATCH (e1:E)<-[:DE]-(d1:D)-[:DA]->(a1:A),(a2:A)<-[:DA]-(d2:D)-[:DE]->(e2:E)
WHERE  EXISTS((d1)-[:DV]->(:V)<-[:DV]-(d2)) 
	   AND  (a1.C=a2.C )
	   AND (d1.P-50<=d2.P<=d1.P+50)
       AND NOT EXISTS ((d1)-->(d2)) 
       AND d1.DATE>=d2.DATE>=d1.DATE - duration({hours:1}) 
       AND id(d1) <> id(d2) 
RETURN DISTINCT d1, count(d2)

What I saw from the EXPLAIN is that the two queries in the MATCH are too consuming. I would like to select only nodes D satisfying the conditions, not all the nodes D.

Also, I have the version neo4j 4 (and not v5).

Thank you in advance for your help.

joshcornejo · March 26, 2025, 10:43am

What is wrong with:

MATCH (e1:E)<-[:DE]-(d1:D)-[:DA]->(a1:A)<-[:DA]-(d2:D)-[:DE]->(e2:E)
WHERE  EXISTS((d1)-[:DV]->(:V)<-[:DV]-(d2)) 
	   AND (d1.P-50<=d2.P<=d1.P+50)
       AND NOT EXISTS ((d1)-->(d2)) 
       AND d1.DATE>=d2.DATE>=d1.DATE - duration({hours:1}) 
       AND id(d1) <> id(d2) 
RETURN DISTINCT d1, count(d2)

julie.cavarroc · March 26, 2025, 10:50am

The nodes D do not share necesseraly nodes A. Only the property A.C should be equal.

joshcornejo · March 26, 2025, 1:15pm

So your (D) can be connected to multiple (A), and you only want the connections where A.C is the same?

julie.cavarroc · March 26, 2025, 1:20pm

D is connected to only one A
Yes, I want conections where A.C is the same

joshcornejo · March 26, 2025, 1:41pm

If I get your query

Reordering the conditions:

MATCH (e1:E)<-[:DE]-(d1:D)-[:DA]->(a1:A),(a2:A)<-[:DA]-(d2:D)-[:DE]->(e2:E)
WHERE NOT EXISTS ((d1)-->(d2)) 
      AND id(d1) <> id(d2) 
      AND d1.DATE>=d2.DATE>=d1.DATE - duration({hours:1}) 
      AND (a1.C = a2.C) 
      AND EXISTS((d1)-[:DV]->(:V)<-[:DV]-(d2)) 
      AND (d1.P-50<=d2.P<=d1.P+50)
RETURN DISTINCT d1, count(d2)

Maybe other ways of organising them would make it even simpler?

julie.cavarroc · March 26, 2025, 4:39pm

Thank you for your help. I tried by reordering as mentionned but it does not work. Is there a way to filter only on nodes D satisfying the conditions ?
Here is my EXPLAIN anonymized

joshcornejo · March 27, 2025, 6:58am

What do you mean by "doesn't work"?

Your query's WHERE is just a sequence of AND, which should be commutative.

For example, these should yield the same result:

WHERE id(d1) <> id(d2) AND (a1.C = a2.C)

WHERE (a1.C = a2.C) AND id(d1) <> id(d2)

Froums · March 27, 2025, 1:32pm

Neo4 does not like SQL like queries with sets

Can you try with this ?

MATCH (d1)-[:DV]->(:V)<-[:DV]-(d2)
WHERE id(d1) <> id(d2)
AND NOT EXISTS ((d1)-->(d2))
AND (d1.P-50<=d2.P<=d1.P+50)
AND d1.DATE>=d2.DATE>=d1.DATE - duration({hours:1})
AND EXISTS (()<-[:DE]-(d1:D)-[:DA]->())
AND EXISTS (()<-[:DE]-(d2:D)-[:DA]->())
WITH d1, d2, head([(d1:D)-[:DA]->(a1) | a1]) as a1, head([(d2:D)-[:DA]->(a2) | a2]) as a2
WITH d1, d2, a1, a2
WHERE a1.C=a2.C
RETURN DISTINCT d1, count(d2)

Also Neo4j does not like when nodes contains thousands of relationships. It is definitely better for hops and relationships.

julie.cavarroc · March 27, 2025, 1:56pm

The request works in itself. However, it takes too much time to execute

julie.cavarroc · March 27, 2025, 2:20pm

Hello Farouk,

Thank you for your help. I tried, however, I have the same problem, it takes too much time to execute.
We found a way to bypass with the following query:

MATCH (e1:E1)<-[:DE]-(d1:D)-[:DA]->(a1:A)
	CALL{
		WITH  d1, a1
	    MATCH (a2:A)<-[:A]-(d2:D)-[:E]->(e2:E)
			WHERE  EXISTS((d1)-[:DV]->(:V)<-[:DV]-(d2)) 
		    AND  (a1.C= a2.C )
			AND (d1.P-50<=d2.P<=d1.P+50)
			AND NOT EXISTS ((d1)-[]->(d2)) 
	        AND d1.DATE>=d2.DATE>=d1.DATE- duration({hours:24}) AND id(d1) <> id(d2) 
	        RETURN d2
	        }
RETURN DISTINCT d1, collect(d2)

What do you think ?

joshcornejo · March 27, 2025, 2:27pm

How many nodes and relationships do you have?

Froums · March 27, 2025, 3:07pm

I would add a distinct on d1, a1

MATCH (e1:E1)<-[:DE]-(d1:D)-[:DA]->(a1:A)
WITH distinct d1, a1

Because you can have many paths and the query will execute as many time as you have e1, d1, a1 combinations.

My approach would be to filter as much as possible using labels, id and relationships and at the end filter on properties unless you have indexes.

Usually the idea is to give an entry point in the graph.

Check also the number of relationships per node or label.

Can you give us a profile or an explain plan ? Also if you want to test quickly add a limit. Its going to speed up your first test. When a query is too slow, it is painful to watch the spinner for ages

Topic		Replies	Views
Multiple match takes too long to execute Cypher	4	130	June 28, 2024
Extremely slow retrieval Neo4j Graph Platform performance	3	229	April 27, 2024
Writing out performant queries for n-relationship queries on a single node where n > 2 Cypher	20	3529	October 14, 2019
Multiple Node chain query by validity date Cypher cypher	8	528	October 16, 2020
Help with optimizing query Cypher performance , browser , cypher , operations	3	434	September 21, 2021

July Summer Fun!

Multiple match takes too long to execute due to node with a lot of connections

Related topics