Counting rows performance

neo4j.com · December 24, 2021, 3:14pm

I'm trying to count the number of rows that Neo4j will return, but the count (or the query) is very slow.

Version 1 (70 sec):

MATCH (person:Person)-[:HAS_ORDER]->(order:Order)
WHERE order.timestamp >= 1632434400 AND size((order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(:Product)) <= 20
WITH order
MATCH (order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(product:Product)
RETURN COUNT(product);

Version 2 (68 sec):

MATCH (person:Person)-[:HAS_ORDER]->(order:Order)
WITH size((order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(:Product)) AS amount
WHERE order.timestamp >= 1632434400 AND amount <= 20
RETURN SUM(amount)

I'm using Neo4j 4.4 Community with about 800,000 orders and about 17,000,000 order lines.

Is there a more efficient way to count the rows?

These are the indexes:

CREATE INDEX idx_order_torder_id FOR (n:Order) ON (n.order_id);
CREATE INDEX idx_order_timestamp FOR (n:Order) ON (n.timestamp);
CREATE INDEX idx_person_person_id FOR (n:Person) ON (n.person_id);
CREATE INDEX idx_product_product_id FOR (n:Product) ON (n.product_id);

The number of rows is 4269011.

The EXPLAIN plan:

martin3 · December 24, 2021, 3:43pm

Maybe these changes can increase speed:

Is it possible to remove redundant data from the query (e.g. remove Person from the first MATCH)?
Maybe avoiding AND, and using several WITH clauses instead, will increase speed?
Maybe using the size() function on only the product node (or on a property of the product node) will increase speed?
Maybe matching on timestamp alone at the beginning will increase speed, since there may not be many nodes that meet this criterion (i.e. that have a higher timestamp). You already have an index on timestamp.

MATCH (order:Order)
WHERE order.timestamp >= 1632434400
WITH order
MATCH (order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(product:Product)
WITH size(product) AS amount
WHERE amount <= 20
RETURN SUM(amount)

I guess it depends on your data. Maybe you can also refactor your data to increase speed, like this:

MATCH (order)-[:HAS_PRODUCT]->(product:Product)

It seems like your data was imported from a relational database with a many-to-many join table, which may not be necessary in Neo4j.
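
A rough sketch of such a refactor, assuming it is acceptable to keep the existing order lines and add a direct relationship alongside them. MERGE creates at most one HAS_PRODUCT relationship per order and product, so a product that appears on several lines of the same order collapses into a single relationship, and counts over the new relationship may differ from the order-line counts. With around 17 million order lines this would probably need to run in batches; it has not been tested against this data set:

```cypher
// Hypothetical one-off migration: link each order directly to its products.
// The (order)-[:HAS_PRODUCT]->(product) relationship does not exist in the
// original model; it is the refactored shape suggested above.
MATCH (order:Order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(product:Product)
MERGE (order)-[:HAS_PRODUCT]->(product);
```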

neo4j.com · January 10, 2022, 3:14pm

Thank you for your answer, martin3. I tried your query but got an error because you can't use size() on a node (the product variable in this case). Instead of size, I used count. And because every order line has one product, I can skip traversing the relationship from order lines to products:

MATCH (order:Order) 
WHERE order.timestamp >= 1632434400 
WITH order 
MATCH (order)<-[:HAS_ORDER]-(orderLine:OrderLine) 
WITH COUNT(orderLine) as productCount 
WHERE productCount <= 20 
RETURN SUM(productCount);

This query took 0m17.342s
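
One caveat about the query above: WITH COUNT(orderLine) has no grouping key, so Cypher aggregates over all matched orders and produces a single total rather than one count per order, and the <= 20 filter is then applied to that total. A minimal sketch of a per-order version, assuming the intent is the same per-order limit as in Version 1 (not benchmarked here):

```cypher
MATCH (order:Order)
WHERE order.timestamp >= 1632434400
MATCH (order)<-[:HAS_ORDER]-(orderLine:OrderLine)
// Keeping `order` in the WITH makes it the grouping key,
// so COUNT runs once per order instead of once overall.
WITH order, COUNT(orderLine) AS productCount
WHERE productCount <= 20
RETURN SUM(productCount);
```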

But I managed to shave off a couple of seconds with the following query:

MATCH (order:Order) 
WHERE order.timestamp >= 1632434400
WITH order, size((order)<-[:HAS_ORDER]-(:OrderLine)) AS amount 
WHERE amount <= 20 
RETURN SUM(amount);

This query took 0m15.675s
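
For readers on Neo4j 5 or later: size() on a bare pattern, as used in the query above, is no longer accepted there. A sketch of the same query using a COUNT subquery instead, assuming the same labels and relationship types (timings will likely differ from the 4.4 numbers reported here):

```cypher
MATCH (order:Order)
WHERE order.timestamp >= 1632434400
// COUNT { ... } replaces size(pattern) from Neo4j 5.0 onward.
WITH COUNT { (order)<-[:HAS_ORDER]-(:OrderLine) } AS amount
WHERE amount <= 20
RETURN SUM(amount);
```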
