Query taking a lot of time

mojojo7002 · May 17, 2020, 7:11am

Hello all,
I am using this query to extract products similar to a product in a certain category within a price range and it is taking a lot of time.

match(p1:Product)
with p1
limit 5000
match(p1)<-[:HasProduct]-(c:Category)-[:HasProduct]->(p:Product)
where tofloat(p.Price) > 0.5 * tofloat(p1.Price) and tofloat(p.Price) < 2 * tofloat(p1.Price)
with p1, collect(p)[..3] as products
return p1.ProductId, [product in products | product.ProductId] as pid

There are 87000 product nodes and 219 category nodes. With a limit of 5000, it is taking around 8 minutes to return.

Here is the explain plan for the query:

I have already created required indexes while creating the Graph DB itself. Snippet for the same is below:

cobra · May 17, 2020, 10:44am

Hello,

Did you add unique constraints on your nodes?

mojojo7002 · May 17, 2020, 1:09pm

Hello there,
I haven't added any constraints on my nodes (probably because I am not very sure about them). Maybe you can help me with that. I have already shown you the indexes I created.

Here is my query for the node and relationship creation with all the labels and their properties.

:auto using periodic commit 5000
load csv with headers from "file:///data.csv" as line
merge (s:State{name:line.State})
merge (p:Product{ProductId:line.ProductId,Price:tofloat(line.ProductPrice),ProductName:line.ProductName})
merge (c:Category{Category:line.Category})
merge (cr:CategoryRollUp{CategoryRollUp:line.CategoryRollUp})
merge (cu:Customer{CustomerId:line.CustomerId,Grouping:line.NewGrouping})
merge (cu)-[:InteractsWith{Date:date(line.TransactionDate),Month:line.Month,EventScore:tofloat(line.EventScore),Event:line.Event,Quantity:toInteger(line.QtySold)}]->(p)
merge (s)-[:HasUser]->(cu)
merge (c)-[:HasProduct]->(p)
merge (cr)-[:HasCategory]->(c)

I have tried creating a constraint on the Price property of the Product label, but that is certainly not unique, so it showed me an error.
Can you guide me in creating constraints for the same?

cobra · May 17, 2020, 1:29pm

A unique constraint must be on a property whose values are unique :) So for each node label, you must create an index and a unique constraint on such a property :)

You can use uuid to generate unique ID: https://neo4j.com/docs/labs/apoc/current/graph-updates/uuid/ if you don't have unique ID. But I can see you have a ProductId property, so you can do this:

Before creating nodes, you must create a unique constraint, for example:

CREATE CONSTRAINT ON (p:Product) ASSERT p.ProductId IS UNIQUE
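The same pattern could be applied to the other labels used in the load script. This is only a sketch in the Neo4j 4.0-era syntax used in this thread. It assumes that Category, CustomerId and name each identify their node uniquely, which the thread does not confirm. Constraint creation fails if the existing data already contains duplicates for the property.

```cypher
// Assumption: each property below uniquely identifies its node.
// Run these before loading data, one statement at a time.
CREATE CONSTRAINT ON (c:Category) ASSERT c.Category IS UNIQUE;
CREATE CONSTRAINT ON (cu:Customer) ASSERT cu.CustomerId IS UNIQUE;
CREATE CONSTRAINT ON (s:State) ASSERT s.name IS UNIQUE;
```

Afterwards, CALL db.constraints should list one entry per label.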

mojojo7002 · May 21, 2020, 7:52am

Hi there, I have created a unique ID constraint by following the link you shared, after adding a uuid property to the Product node. The code and screen grab for the same are here:

match(p1:Product)
SET p1.uuids = apoc.create.uuid()
return p1.uuids

But surprisingly the execution time is the same. Where am I going wrong?

cobra · May 21, 2020, 8:01am

I think it's because of your Cypher request, it's doing several Merge at the same time. On my projects, I have at least 2 requests:

one to merge nodes from a batch
one to merge relations from a batch

and I call them several times, and I can create thousands of nodes and relations in a few seconds :)
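A two-pass load along those lines might look like the sketch below. It reuses the file name and property names from the load script earlier in the thread. Merging on the key property only and setting the other properties with ON CREATE SET is an assumption about how the data should be keyed, not something confirmed by the original poster. Only Product and Category are shown; the other labels would follow the same shape.

```cypher
:auto using periodic commit 5000
// Pass 1: nodes only, merged on their key property.
load csv with headers from "file:///data.csv" as line
merge (p:Product {ProductId: line.ProductId})
  on create set p.Price = tofloat(line.ProductPrice),
                p.ProductName = line.ProductName
merge (c:Category {Category: line.Category});

:auto using periodic commit 5000
// Pass 2: relationships only, looking up the nodes created in pass 1.
load csv with headers from "file:///data.csv" as line
match (c:Category {Category: line.Category})
match (p:Product {ProductId: line.ProductId})
merge (c)-[:HasProduct]->(p);
```

Each statement is meant to be run separately. With unique constraints on ProductId and Category in place, the lookups in pass 2 can use the backing indexes instead of scanning all nodes of the label.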

Do you currently have all your data in the same CSV file, so that a single line gives you everything you need? How long does it take to create all the nodes and relationships?

mojojo7002 · May 21, 2020, 8:05am

I have no issue with the node and relationship creation query. With that query I am able to create around 100000 nodes and 1.7 million relationships in 9 minutes. The issue is with the following read query:

match(p1:Product)
with p1
limit 5000
match(p1)<-[:HasProduct]-(c:Category)-[:HasProduct]->(p:Product)
where tofloat(p.Price) > 0.5 * tofloat(p1.Price) and tofloat(p.Price) < 2 * tofloat(p1.Price)
with p1, collect(p)[..3] as products
return p1.ProductId, [product in products | product.ProductId] as pid

With only 5000 products it took around 15 minutes to return, and there are 87000 unique products.

cobra · May 21, 2020, 8:15am

Do you also have a unique constraint on Category?

I suppose the real request is:

MATCH (p1:Product)<-[:HasProduct]-(c:Category)-[:HasProduct]->(p:Product)
WHERE tofloat(p.Price) > 0.5 * tofloat(p1.Price)
AND tofloat(p.Price) < 2 * tofloat(p1.Price)
WITH p1, collect(p)[..3] AS products
RETURN p1.ProductId, [product in products | product.ProductId] AS pid
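One variation that may be worth trying is sketched below. It assumes Price is already stored as a float, as the load script does with tofloat(line.ProductPrice), so the per-row tofloat() calls can be dropped. It also collects only the ProductId values rather than whole nodes. Whether this changes the run time noticeably is not established in this thread.

```cypher
// Assumption: Product.Price is already a float, so no conversion is needed.
MATCH (p1:Product)
WITH p1 LIMIT 5000
MATCH (p1)<-[:HasProduct]-(:Category)-[:HasProduct]->(p:Product)
WHERE 0.5 * p1.Price < p.Price < 2 * p1.Price
WITH p1, collect(p.ProductId)[..3] AS pid
RETURN p1.ProductId, pid
```

The chained comparison is equivalent to the two separate conditions joined with AND.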

Can you show me the results of these requests:

CALL db.constraints
CALL db.schema.visualization()

mojojo7002 · May 21, 2020, 8:34am

Here are the results:

cobra · May 21, 2020, 8:37am

Your schema and your request are very simple, so that's weird. Did you try to increase the RAM of Neo4j in the neo4j.conf file?

mojojo7002 · May 21, 2020, 8:40am

I have already set the initial and maximum heap size to 20 GB and 24 GB respectively, but still it is not running the query. I have a 7-node cluster on the server, each node with similar specifications.

cobra · May 21, 2020, 8:42am

What do you mean by "a 7 node cluster on the server each with similar specifications"?

mojojo7002 · May 21, 2020, 8:43am

Probably this might help with your question:

cobra · May 21, 2020, 8:45am

Oh, you are using Causal Clustering, right?

mojojo7002 · May 21, 2020, 8:46am

Yes I am using Causal Clustering.

cobra · May 21, 2020, 8:48am

I haven't used it so far. Did you try to create a simple local database, without Causal Clustering, to test whether the same query takes the same time?

mojojo7002 · May 21, 2020, 8:49am

Yes, sure. I have created the same DB on the Desktop version too, and it is slow there as well.

cobra · May 21, 2020, 8:53am

Did you try PROFILE and EXPLAIN on your request? Can you show me the results with them?
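For reference, both keywords are simply prefixed to the query. EXPLAIN returns the planned operators without running anything, while PROFILE actually executes the query and reports rows and db hits per operator. The sketch below uses a smaller limit of 100 (an arbitrary choice for illustration) so that the profiled run finishes quickly.

```cypher
PROFILE
MATCH (p1:Product)
WITH p1 LIMIT 100
MATCH (p1)<-[:HasProduct]-(c:Category)-[:HasProduct]->(p:Product)
WHERE tofloat(p.Price) > 0.5 * tofloat(p1.Price)
  AND tofloat(p.Price) < 2 * tofloat(p1.Price)
WITH p1, collect(p)[..3] AS products
RETURN p1.ProductId, [product IN products | product.ProductId] AS pid
```

The operators with the highest db hits are usually the first place to look.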

mojojo7002 · May 21, 2020, 9:08am

The PROFILE is only for limit 100 since it was taking time for the whole DB.

cobra · May 21, 2020, 10:36am

You can force Neo4j to use an index: see "Planner hints and the USING keyword" in the Cypher Manual.
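A minimal sketch of an index hint follows. It assumes an index or unique constraint on Product(ProductId) exists, and $productId is a hypothetical parameter introduced only for illustration. A hint of this kind applies only when the WHERE clause filters on the hinted property, so it fits a lookup for a single product rather than the full scan over all products in the original query.

```cypher
// Assumption: an index on :Product(ProductId) exists; the query fails otherwise.
MATCH (p1:Product)
USING INDEX p1:Product(ProductId)
WHERE p1.ProductId = $productId
MATCH (p1)<-[:HasProduct]-(:Category)-[:HasProduct]->(p:Product)
WHERE tofloat(p.Price) > 0.5 * tofloat(p1.Price)
  AND tofloat(p.Price) < 2 * tofloat(p1.Price)
RETURN p1.ProductId, collect(p.ProductId)[..3] AS pid
```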

To be honest, beyond that I have no idea.
