Hello Neo4J Community !
I encounter an issue of complexity recently and I don't know how to get past it.
My cypher code is the following :
MATCH (a:Aa)-[r_a:A]->(b:Bb)-[r_b:B]->(c:C), (c)<-[r_b1:B]-(b1:Bb)<-[r_a1:A]-(a1:Aa)
WHERE a.DATE>=a1.DATE>=a.DATE- duration({years:1}) AND a<>a1
RETURN a.id, count(distinct a1.id) as nb_1years
The idea behind it is that I want to count the number of Aa that are connected to c from a span of 1 year for each of the nodes in my database. The code is working correctly for a given node, but not for all the database, it runs indefinitely. I have tried to put it in a apoc.periodic.iterate, but it still does not work ...
The EXPLAIN returns this :
The issue seems to come from the NodeHashJoin which returns 801 millions rows. How can I improve this ?
Thanks for your help !
What is happening is the two paths patterns in your MATCH are being found, a Cartesian product being created, and a hash join computed to find pairs of paths where 'c' is equal. This is not efficient with a large dataset.
Is the value of 'id' unique for 'Aa' nodes? If so, the following should work:
MATCH (a:Aa)-[r_a:A]->(b:Bb)-[r_b:B]->(c:C)
RETURN a.id, COUNT {
MATCH (c)<-[r_b1:B]-(b1:Bb)<-[r_a1:A]-(a1:Aa)
WHERE a.DATE>=a1.DATE>=a.DATE- duration({years:1}) AND a<>a1
RETURN DISTINCT a1.id
} as nb_1years
Hello @glilienfield ,
First of all thank you for your response, it seems accurate.
From the response that you have given, I have now this issue :
It is working until the COUNT{ MATCH (c)...
.
We have the version 4.2.1. Maybe the COUNT{MATCH...}
does not exist on this version ?
Thank you again !
BTW : id is indeed a unique key
You are correct. The new specialized subqueries (count, collect, exists) I think are v5.x features.
Let's try this:
MATCH (a:Aa)-[r_a:A]->(b:Bb)-[r_b:B]->(c:C)
CALL {
WITH a, c
MATCH (c)<-[r_b1:B]-(b1:Bb)<-[r_a1:A]-(a1:Aa)
WHERE a.DATE>=a1.DATE>=a.DATE-duration({years:1}) AND a<>a1
RETURN COUNT(DISTINCT a1.id) as count
}
RETURN a.id, count
Thanks a lot @glilienfield ! It works like a charm
My database is a little bit big so it takes a little bit of time to run, but it works ! Really, thank you !
1 Like