Cypher query is taking too long when trying to find shared nodes between two separate nodes

UncleKenroy · July 26, 2021, 10:08pm

Hello I have a project that contains 109k nodes and 2.8 million connections. I have a place nodes, Visited edges and census block nodes. I am trying to find the top 10 places that have the most shared census block group nodes within a certain month.

Match (n:Place)<-[r:Visited{month: "April"}]-(c:CensusBG)-[b:Visited{month: "April"}]->(m:Place) 

return n.name, m.name, count(distinct c) as cnt

order by cnt desc limit 10

I tried running it over night but it still did not work and I had to restart my machine to get the database back up again.

andrew_bowman · July 27, 2021, 6:25am

This is a very large graph-wide query.

First, I think we'd recommend running this on Neo4j 4.3.x, and creating a relationship index on :Visted(month):

https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-search-performance/#administration-indexes-create-a-single-property-index-for-relationships

Also since your pattern is symmetric, it would be better to add a filtering so you don't see symmetric results (where it's the same two places and the same count, with just the places swapped for n and m).

You can add WHERE id(n) < id(m), that should do the trick.

That said, if the issues you're facing are heap-related, it would be best to use subqueries to divide up the work, which should make aggregations less of a threat to your heap. Something like this maybe:

MATCH (n:Place)
CALL {
 WITH n
 MATCH (n)<-[r:Visited{month: "April"}]-(c:CensusBG)-[b:Visited{month: "April"}]->(m:Place) 
 WHERE id(n) < id(m)
 RETURN n.name as name1, m.name as name2, count(c) as cnt
}
RETURN name1, name2, cnt
ORDER BY cnt DESC
LIMIT 10

This ensures the aggregations are performed with respect to only the matches from a single starting :Place node at a time, instead of executing over all possible paths in the graph, which was likely blowing your heap.

Topic		Replies	Views
Help with optimizing query Cypher performance , browser , cypher , operations	3	432	September 21, 2021
1000 queries takes time(as expected), how can i approach this this in a better way Neo4j Graph Platform performance , cypher	3	343	November 24, 2023
Can we improve this further: Optimize Expand Operation Cypher performance , union	9	1777	August 30, 2018
Slow Cypher Query Help Needed! Cypher performance	6	942	January 16, 2020
Optimizing simple queries for very large graph DB Cypher performance	12	1374	February 29, 2024

Get Certified in June!

Cypher query is taking too long when trying to find shared nodes between two separate nodes

Related topics