Slow subsequent filtering query

jim_ruts
Node Link

Hi everyone.

We have been using Neo4j for a while now in our company, but we are struggling with some of our queries (mostly performance issues on counts).

I would like to explain what we are trying to achieve, and hopefully someone can help us figure it out.

The Setting
We have roughly 2 million :Product nodes with many relationships to other nodes such as :Retailer, :Universe, ...
e.g.:
(:Product)-[:HAS_RETAILER]->(:Retailer)
(:Product)-[:HAS_UNIVERSE]->(:Universe)
...

All nodes have at least one index (usually on an id field).
Also, a product can have multiple retailer links and multiple universe links.

Note: These two relationships are just two of many.

The Problem
Whenever we filter :Product nodes by a single relationship, the query is really fast.

// Note: the real query has the variables as query parameters
// not hard coded like this
match (p:Product)-[:HAS_RETAILER]->(r:Retailer)
where r.id in ["id1", "id2", ...]
return count(distinct p)

Returning a count of the distinct products takes a few hundred milliseconds.
The execution plan shows that the proper index is hit, and there is no issue with this whatsoever.

The problem, however, comes when we try to filter the products further, let's say by :Retailer and :Universe.
Below is an example of a query we try to run:

// Note: the real query has the variables as query parameters
// not hard coded like this
match (p:Product)-[:HAS_RETAILER]->(r:Retailer)
where r.id in ["id1", "id2", ...]
with distinct p
match (p)-[:HAS_UNIVERSE]->(u:Universe) 
where u.id in ["1", "2", ...]
return count(distinct p)

This query returns a lot slower (a few seconds).
Investigating the plan, it seems that no indexes are hit anymore once you reach a subsequent filter.
If the first filter returns 500,000 products, it seems the subsequent filter has to inspect all of them to narrow the result down.
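
For anyone who wants to reproduce the plan inspection, here is a rough sketch of the same query prefixed with PROFILE and written with query parameters. The parameter names $retailerIds and $universeIds are placeholders chosen for illustration; the real parameter names are not shown in this post.

```cypher
// Sketch only: $retailerIds and $universeIds are hypothetical parameter names.
// PROFILE runs the query and reports the operators used and the rows/db hits per step.
PROFILE
MATCH (p:Product)-[:HAS_RETAILER]->(r:Retailer)
WHERE r.id IN $retailerIds
WITH DISTINCT p
MATCH (p)-[:HAS_UNIVERSE]->(u:Universe)
WHERE u.id IN $universeIds
RETURN count(DISTINCT p)
```

In the resulting plan, the operator feeding the second MATCH is the place to check whether an index seek on :Universe appears or whether every product from the first step is expanded.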

I believe we must be doing something wrong here, but I cannot find a way to handle a situation like this and make the counts fast.

If we return the products themselves instead of the count and limit the result, it is still really fast. However, when paginating with skip and limit, the results come in slower and slower the higher the skip is, with response times comparable to those of a count.
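
To make the pagination case concrete, the paginated variant looks roughly like the sketch below. The parameters $retailerIds, $universeIds, $skip and $limit are hypothetical names, and ordering by p.id is an assumption added so that pages are stable; the actual sort key is not stated here.

```cypher
// Sketch only: all parameter names are placeholders, and ORDER BY p.id is assumed.
MATCH (p:Product)-[:HAS_RETAILER]->(r:Retailer)
WHERE r.id IN $retailerIds
WITH DISTINCT p
MATCH (p)-[:HAS_UNIVERSE]->(u:Universe)
WHERE u.id IN $universeIds
RETURN DISTINCT p
ORDER BY p.id
SKIP $skip   // rows before this offset still have to be produced, then discarded
LIMIT $limit
```

With a small $skip only a few matching products need to be found, but a large $skip forces most of the filtered set to be computed first, which would be consistent with the slowdown approaching the cost of the count.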

I am hoping someone has a solution for this or a good pattern for dealing with this kind of filtering.

Kind regards

2 REPLIES

bennu_neo
Neo4j

Hi @jim_ruts !

Before getting to the fun, may I ask which Neo4j version you are using?

Oh, y’all wanted a twist, ey?

Hey @bennu_neo,

thanks for replying.

We are using version 4.4. We use Aura Professional (8 GB memory, 2 CPUs, 16 GB storage) for 3 environments.