cancel
Showing results for 
Search instead for 
Did you mean: 

Cypher query optimization

busymo16
Ninja
Ninja

Hello everyone,

Basically I have the following query:

MATCH (f:Family)-[:BELONGS_TO]-(pr:Product)
UNWIND range(0, 100) as id
WITH f, pr, [x in keys(pr)] as props
WHERE toLower(f.name) = toLower('family_name') AND toLower('specification') IN [word in pr[props[id]] | toLower(toString(word))]
RETURN DISTINCT f.name AS Family

With this query I want to find the family that the product belongs to based of a quality that the product has. The quality of the product is stored as a property value (in a list of values). When I use PROFILE, for the query above I get 622795 total db hits in 126ms. When I just reduce the range of UNWIND (0, 50), I do get 248773 total db hits in 69 ms.

Is there any better way how to check if a 'specification' is inside a list of property values of the Product node without knowing the property key and using UNWIND (because this seems expensive)??

Thank you in advance.

1 ACCEPTED SOLUTION

dana_canzano
Neo4j
Neo4j

2 things

  1. you usage of toLower(f.name) = toLower('family_name') is going to negate any usage of an index on :Family(name) if such an index existed. is your data such that there are :Family nodes with a name and some have a value of family name and some have a value of FamilY NaMe and some have a value of faMily NAME such that the only way to solve this is to do a toLower() on both sides of the expression?

  2. you can move that portion of the WHERE clause higher up for example

MATCH (f:Family)
WHERE toLower(f.name) = 'family_name'
MATCH (f)-[:BELONGS_TO]-(pr:Product)
UNWIND range(0, 100) as id
WITH f, pr, [x in keys(pr)] as props
WHERE 'specification' IN [word in pr[props[id]] | toLower(toString(word))]
RETURN DISTINCT f.name AS Family

and this still keeps the toLower() on both sides of the expression and again still negating any usage of an index, if such an index existed on :Family(name), but is your data such that you must you really use toLower ?
Additionally I removed changed WHERE toLower('specification') IN ....... since toLower is unnecessary since 'specification' is already in its lower form. The same with toLower('family_name');

View solution in original post

6 REPLIES 6

dana_canzano
Neo4j
Neo4j

2 things

  1. you usage of toLower(f.name) = toLower('family_name') is going to negate any usage of an index on :Family(name) if such an index existed. is your data such that there are :Family nodes with a name and some have a value of family name and some have a value of FamilY NaMe and some have a value of faMily NAME such that the only way to solve this is to do a toLower() on both sides of the expression?

  2. you can move that portion of the WHERE clause higher up for example

MATCH (f:Family)
WHERE toLower(f.name) = 'family_name'
MATCH (f)-[:BELONGS_TO]-(pr:Product)
UNWIND range(0, 100) as id
WITH f, pr, [x in keys(pr)] as props
WHERE 'specification' IN [word in pr[props[id]] | toLower(toString(word))]
RETURN DISTINCT f.name AS Family

and this still keeps the toLower() on both sides of the expression and again still negating any usage of an index, if such an index existed on :Family(name), but is your data such that you must you really use toLower ?
Additionally I removed changed WHERE toLower('specification') IN ....... since toLower is unnecessary since 'specification' is already in its lower form. The same with toLower('family_name');

Thank you for your answer. I can see some optimizations happening. Another question: this optimization can be one when AND operator in the WHERE clause but not when OR operator in WHERE clause right?

But let's say that I have this node (f:Family {'key1':'value1', 'key2':'value2'....}. Is there any way that I can do a where condition over value1, value2 without knowing key1, key2 and without using UNWIND?

correct. But I can see some optimizations happening. ????? how so? whereas the query used to take time XXXms it now takes ???ms ????

regarding

not sure i understand? you want to restrict for values value1 and value2 but you dont know if these values are with a property named X or Y or Z or ???

busymo16
Ninja
Ninja

From 110ms to 40ms

UNWIND range(0, 100) as id followed by WHERE 'specification' IN [word in pr[props[id]] | toLower(toString(word))] is costing me a lot of time, because it will take in consideration all numbers as id. I do that because I do not know the property(key) and I want to check over property(value)?

so its now down to 40ms and you looking to get it even faster?

And regarding

UNWIND range(0, 100) as id

this assumes every Node has 100 properties. Is that the case with your datamodel?

you could also do

UNWIND range(0, size ( keys(pr) )  ) as id

which would perform a range from 0 to the number of properties in a :Product node.

But here again still we are basically saying there is a :Product node and somewhere within of of the properties, any property, that its name is 'specification'.

Is there no relative 'model'? i.e. some :Product nodes may have a specification but record it in a property named SpeCification2021 other :Product nodes record the same data in a property named specification_2021 and others with specification-2021

Nope, basically this is very good. Thank you!!

Nope, I just have to accept everything and then try to find out which product CONTAIN that specification as a property(value), but for this, I do not know in which property(key) this might be true. The system has to be dynamic that means different specifications might come as an input and that is what I need to check with the properties.