Multiple match takes too long to execute

Hello Neo4J Community !

I encounter an issue of complexity recently and I don't know how to get past it.

My cypher code is the following :

MATCH (a:Aa)-[r_a:A]->(b:Bb)-[r_b:B]->(c:C), (c)<-[r_b1:B]-(b1:Bb)<-[r_a1:A]-(a1:Aa) 
WHERE a.DATE>=a1.DATE>=a.DATE- duration({years:1}) AND a<>a1 
RETURN a.id,  count(distinct a1.id) as nb_1years

The idea behind it is that I want to count the number of Aa that are connected to c from a span of 1 year for each of the nodes in my database. The code is working correctly for a given node, but not for all the database, it runs indefinitely. I have tried to put it in a apoc.periodic.iterate, but it still does not work ...

The EXPLAIN returns this :

The issue seems to come from the NodeHashJoin which returns 801 millions rows. How can I improve this ?

Thanks for your help !

What is happening is the two paths patterns in your MATCH are being found, a Cartesian product being created, and a hash join computed to find pairs of paths where 'c' is equal. This is not efficient with a large dataset.

Is the value of 'id' unique for 'Aa' nodes? If so, the following should work:

MATCH (a:Aa)-[r_a:A]->(b:Bb)-[r_b:B]->(c:C)
RETURN a.id, COUNT {
    MATCH (c)<-[r_b1:B]-(b1:Bb)<-[r_a1:A]-(a1:Aa) 
    WHERE a.DATE>=a1.DATE>=a.DATE- duration({years:1}) AND a<>a1
    RETURN DISTINCT a1.id
} as nb_1years

Hello @glilienfield ,

First of all thank you for your response, it seems accurate.
From the response that you have given, I have now this issue :
image
It is working until the COUNT{ MATCH (c)....
We have the version 4.2.1. Maybe the COUNT{MATCH...} does not exist on this version ?

Thank you again !

BTW : id is indeed a unique key

You are correct. The new specialized subqueries (count, collect, exists) I think are v5.x features.

Let's try this:

MATCH (a:Aa)-[r_a:A]->(b:Bb)-[r_b:B]->(c:C)
CALL {
    WITH a, c
    MATCH (c)<-[r_b1:B]-(b1:Bb)<-[r_a1:A]-(a1:Aa) 
    WHERE a.DATE>=a1.DATE>=a.DATE-duration({years:1}) AND a<>a1
    RETURN COUNT(DISTINCT a1.id) as count
}
RETURN a.id, count

Thanks a lot @glilienfield ! It works like a charm :slight_smile:

My database is a little bit big so it takes a little bit of time to run, but it works ! Really, thank you !

1 Like