Hi everyone,
Here is my query for extracting the products for each customer:

match (p1:Customer)
with COLLECT(distinct p1.CustomerId) as ws
unwind ws as ws1
match (p:Customer {CustomerId: ws1})-[x:InteractsWith]->(pr:Product)<-[:HasProduct]-(c:Category)
with c.Category as Category, x.Date as date, p.CustomerId as Id
order by date desc
with collect(distinct Category)[..5] as category, Id
unwind category as w2
match (c:Category {Category: w2})-[:HasProduct]->(pr:Product)
return Id, w2, collect(pr.ProductId)[..3] as pid
There are 38,000 Customer nodes, 87,000 Product nodes, and 219 Category nodes.
The database is running with a max heap size of 15 GB and a page cache of about 4 GB.
When I change the second line of the query from "with COLLECT(distinct p1.CustomerId) as ws" to "with COLLECT(distinct p1.CustomerId)[..2500] as ws" (basically limiting it to 2,500 customers), the query takes around 4 minutes to execute. When I run it for all customers, I get a memory error.
I hope you can help me optimize the query.
Thank you
You're doing an unnecessary re-match to the :Customer nodes you already obtained. Just use p1, no need to rematch to p. This assumes CustomerId is unique per :Customer node (and if so you should have a uniqueness constraint on this).
Likewise, can we assume Category is unique per :Category node? I'm going with that assumption for my suggestion.
Also no need to collect and unwind there.
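In case those constraints aren't in place yet, here's a rough sketch of how they could be created (this uses the pre-4.4 constraint syntax and assumes the property names from your query):

CREATE CONSTRAINT ON (c:Customer) ASSERT c.CustomerId IS UNIQUE;
CREATE CONSTRAINT ON (c:Category) ASSERT c.Category IS UNIQUE;

Uniqueness constraints also create backing indexes, so lookups by those properties become index seeks rather than label scans.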
MATCH (p1:Customer)
WITH p1
LIMIT 2500
MATCH (p1)-[x:InteractsWith]->(pr:Product)<-[:HasProduct]-(c:Category)
WITH c, x.Date as date, p1
ORDER BY date DESC
WITH collect(distinct c)[..5] as categories, p1
WITH categories, p1.CustomerId as Id
UNWIND categories as c
MATCH (c)-[:HasProduct]->(pr:Product)
WITH Id, c, collect(pr)[..3] as products
RETURN Id, c.Category as w2, [product in products | product.ProductId] as pid
Thanks for sharing the query.
I tried executing the query you shared and the query I had shared, both with a limit of 2500. Here are my time logs.
Time logs for your query:
Start Execution: Sat May 16 02:10:30 2020
Stop Execution: Sat May 16 02:11:23 2020
Time taken = 53 seconds

Time logs for the query that I shared:
Start Execution: Sat May 16 02:13:11 2020
Stop Execution: Sat May 16 02:14:10 2020
Time taken = 61 seconds
Now if I set the limit to 5000, the elapsed time for the query was around 350 seconds, and then a connection error appeared and the database stopped.
I want to run the query for all 38,000 customers. Is that possible, or do I need to do it in a loop?
Hey @danny.oberoi
I have already set the initial and maximum heap size to 20 GB and 24 GB respectively, but it still doesn't run the query in one go for 38,000 customers. As of now I have executed it for 25,000 customers in one go, and it took ~7 minutes to complete.
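If you'd rather not keep pushing the heap higher, one option is to page through the Customer nodes from the client and run the same query once per batch. A rough sketch, assuming CustomerId works as a stable ordering key and that $offset and $batchSize are parameters the calling application supplies (those parameter names are just placeholders):

MATCH (p1:Customer)
WITH p1
ORDER BY p1.CustomerId
SKIP $offset LIMIT $batchSize
MATCH (p1)-[x:InteractsWith]->(pr:Product)<-[:HasProduct]-(c:Category)
WITH c, x.Date as date, p1
ORDER BY date DESC
WITH collect(distinct c)[..5] as categories, p1
WITH categories, p1.CustomerId as Id
UNWIND categories as c
MATCH (c)-[:HasProduct]->(pr:Product)
WITH Id, c, collect(pr)[..3] as products
RETURN Id, c.Category as w2, [product in products | product.ProductId] as pid

The calling code would loop, starting $offset at 0 and adding $batchSize (e.g. 2500) each iteration until no rows come back, so no single transaction has to hold results for all 38,000 customers at once.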