Cypher Query taking time

Hello friends,
I am using the query below to return a CSV of recommended products for each customer, using content-based filtering on each customer's recent browsing history. We look at the categories and prices of the products they have browsed and
recommend a list of products for each customer. There are around 38,000 customer nodes, and building the product list for every customer takes more than 1 hour.
Here is my query:

"
match(p1:Customer) where((p1) -[:InteractsWith]->(:Product))
with COLLECT({CustomerId: p1.CustomerId }) as ws
unwind ws as ws1
match(p1:Customer{CustomerId: ws1.CustomerId})-[x:InteractsWith]->(pr:Product),
(c:Category)-[:HasProduct]->(pr)
where not exists ((p1)-[:InteractedWith]->(pr))
with c.Category as Category, x.Date as date, p1.CustomerId as Id
order by date desc
with collect(Category)[..5] as category, Id
unwind category as w2
match(c:Category{Category: w2})-[:HasProduct]->(pr:Product)
return Id, w2, collect(pr.ProductId)[..3] as pid "
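To make the intent of the query above easier to check, here is a rough pure-Python sketch of the same logic on in-memory data. It is only an illustration: the function name `recommend`, the tuple layout, and the assumption that dates are ISO-formatted strings are mine, not part of the original post. It mirrors the query in one notable way: like `collect(Category)[..5]`, it keeps duplicate categories, so a customer who browsed one category five times gets only that category.

```python
from collections import defaultdict


def recommend(interactions, products_by_category, n_categories=5, n_products=3):
    """Mirror the Cypher query on plain Python data (hypothetical helper).

    interactions: iterable of (customer_id, category, date) tuples.
    products_by_category: dict mapping a category to a list of product ids.
    """
    # Newest interactions first, like ORDER BY date DESC.
    ordered = sorted(interactions, key=lambda row: row[2], reverse=True)

    # collect(Category)[..5] per customer: duplicates are NOT removed.
    recent = defaultdict(list)
    for customer_id, category, _date in ordered:
        if len(recent[customer_id]) < n_categories:
            recent[customer_id].append(category)

    # One result per (customer, category), like RETURN Id, w2, collect(...)[..3].
    rows = {}
    for customer_id, categories in recent.items():
        for category in categories:
            products = products_by_category.get(category, [])
            if products:  # a MATCH with no products would drop the row
                rows[(customer_id, category)] = products[:n_products]
    return rows
```

If distinct categories are what you actually want, the Cypher equivalent would be to deduplicate before slicing, which also reduces the number of rows fed into the final match.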

If I break the query into separate steps and run them from Python, the time taken is around 35 minutes.
Here is my Python code:
"
from py2neo import Graph
from tqdm import tqdm
import threading

#connecting database
graph = Graph('bolt://192.168.1.156:11002',user='neo4j',password='***')
#query for customers id and developing a list for customers
query_customer_id = 'match(p1:Customer) where((p1) -[:InteractsWith]->(:Product)) return p1.CustomerId'
customer = graph.run(query_customer_id).to_data_frame()

history_based_product = []  # assumed empty list; the initial value is missing from the post

#getting the category of recent browsing history category
def get_category_history(c):
    query_category_history = '''
    match(p1:Customer{CustomerId: "'''+c+'''"})-[x:InteractsWith]->(pr:Product),
    (c:Category)-[:HasProduct]->(pr)
    return x.Date as date,
    c.Category as product_category
    order by x.Date desc
    limit 20'''
    return graph.run(query_category_history).to_data_frame()

#getting product
def get_product_for_5(i):
    query_getting_product_from_category_for_5 = '''
    match(c:Category{Category: "'''+i+'''" })-[:HasProduct]->(pr:Product)
    return pr.ProductId as pid limit 3'''
    return graph.run(query_getting_product_from_category_for_5).to_data_frame().iloc[:,0].to_list()
"
Is there a problem with the query itself?
Please help me out with this.
Thank you
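One thing that may be worth trying on the Python side: the functions above build each query by string concatenation and are presumably called once per customer, which means roughly 38,000 round trips and no query-plan reuse. A minimal sketch of the alternative is to pass a batch of ids as a query parameter and `UNWIND` it. The names `chunked`, `BATCH_QUERY`, and `recent_categories` and the batch size of 1000 are my own assumptions; `run` is any callable that executes a query with keyword parameters.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Parameterised query: $ids is supplied by the driver, not pasted into the text.
BATCH_QUERY = """
UNWIND $ids AS id
MATCH (p1:Customer {CustomerId: id})-[x:InteractsWith]->(pr:Product)<-[:HasProduct]-(c:Category)
WITH id, c.Category AS category, x.Date AS date
ORDER BY date DESC
WITH id, collect(category)[..5] AS categories
RETURN id, categories
"""


def recent_categories(run, customer_ids, batch_size=1000):
    """Run BATCH_QUERY once per batch of customer ids and gather all records.

    run: callable taking (query, **params) and returning an iterable of records.
    """
    results = []
    for batch in chunked(customer_ids, batch_size):
        results.extend(run(BATCH_QUERY, ids=batch))
    return results
```

With py2neo this could be called as `recent_categories(lambda q, **p: graph.run(q, **p).data(), ids)`, where `ids` is the list of customer ids fetched earlier. Whether it actually helps depends on the data and on an index existing for `Customer.CustomerId`.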

Hi Shubham,

I see that the two versions are not equivalent: the Python version does not check the relationship condition where not exists ((p1)-[:InteractedWith]->(pr)).

You are using a Cartesian product; is there a specific reason for that?
I am not sure how much data your code is hitting, but 1 hour or 35 minutes is too much.

I am ready to brainstorm with you on the performance.
Although I am not sure how much it will help, please try the query below after creating indexes on the node properties used in the WHERE clause.

match(p1:Customer) -[:InteractsWith]->(:Product)
with COLLECT( p1.CustomerId ) as ws
unwind ws as ws1
match(p1:Customer{CustomerId: ws1.CustomerId})-[x:InteractsWith]->(pr:Product)
Optional Match (c:Category)-[:HasProduct]->(pr)
where not exists ((p1)-[:InteractedWith]->(pr))
with c.Category as Category, x.Date as date, p1.CustomerId as Id
order by date desc
with collect(Category)[..5] as category, Id
unwind category as w2
match(c:Category{Category: w2})-[:HasProduct]->(pr:Product)
return Id, w2, collect(pr.ProductId)[..3] as pid "


Hi @intouch_vivek, thanks for the support.

As you can see in the picture above, I have already created indexes on a few properties. I then tried running the query below:
"
match(p1:Customer) -[:InteractsWith]->(:Product)
with COLLECT({CustomerId: p1.CustomerId }) as ws
unwind ws as ws1
match(p1:Customer{CustomerId: ws1.CustomerId})-[x:InteractsWith]->(pr:Product)
optional match (c:Category)-[:HasProduct]->(pr)
with c.Category as Category, x.Date as date, p1.CustomerId as Id
order by date desc
with collect(Category)[..5] as category, Id
unwind category as w2
match(c:Category{Category: w2})-[:HasProduct]->(pr:Product)
return Id, w2, collect(pr.ProductId)[..3] as pid "
I removed the where not exists line. I had only added it so that products the customer has already interacted with do not appear again.
There are 38,000 customer nodes, so the query should return around 38000*5 rows. I ran the query and it showed up in the query list for up to 558 seconds; after that it no longer appeared there, and it did not return anything either.

As you can see, the query is still processing, but when I list the running queries it is not there.
image
I hope you can help me with this. Thank you again for the support.

Could you please try the query below:
match(p1:Customer) -[:InteractsWith]->(:Product)
with COLLECT(p1.CustomerId ) as ws
unwind ws as ws1
match(p1:Customer{CustomerId: ws1.CustomerId})-[x:InteractsWith]->(pr:Product)<-[:HasProduct]-(c:Category)
with c.Category as Category, x.Date as date, p1.CustomerId as Id
order by date desc
with collect(Category)[..5] as category, Id
unwind category as w2
match(c:Category{Category: w2})-[:HasProduct]->(pr:Product)
return Id, w2, collect(pr.ProductId)[..3] as pid

Hi @intouch_vivek, I tried executing the code above.
It returned the error Type mismatch: expected a map but was String("100000911"),
so I changed the line with COLLECT( p1.CustomerId ) as ws
to with COLLECT({CustomerId: p1.CustomerId }) as ws. It ran for about 500 seconds; when I checked the query log at that point, page hits were around 399065748. After that the query no longer appeared in the query log. It was still running in the terminal but did not return anything.
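For a sense of scale, the numbers quoted in this thread can be put side by side with some quick arithmetic. This uses only the figures reported above (38,000 customers, 5 categories each, about 399065748 page hits at roughly 500 seconds); since the query had not finished at that point, the per-customer figures are lower bounds, not measurements of the complete run.

```python
customers = 38_000
categories_per_customer = 5
page_hits = 399_065_748  # reported at about 500 seconds, query still running

# Upper bound on result rows: one row per (customer, category) pair.
expected_rows = customers * categories_per_customer
# Average page hits spent per customer and per expected row so far.
hits_per_customer = page_hits / customers
hits_per_row = page_hits / expected_rows

print(expected_rows)             # 190000
print(round(hits_per_customer))  # 10502
print(round(hits_per_row))       # 2100
```

More than ten thousand page hits per customer for a lookup that should touch a handful of interactions suggests the final match on Category is expanding to every product in each category before the `[..3]` slice is applied.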

Hi Shubham

Can we see it now?