How to optimize queries on array properties?

lingvisa · November 19, 2020, 8:56pm

For example,

MATCH (m:Product) 
WHERE 'moisturize' IN m._effect
RETURN m

In this query, '_effect' is a property of an array. When the query above returns non-empty results, it takes only 2ms, shown in the browser. And when the result set is empty, it takes over 100ms to complete. I already created on the array property '_effect'. Why doesn't the index have an effect on the IN keyword in the query, when the query returns nothing?

However, Profile shows that:
In my database, when result is empty, it did 84409 db hits for NodeByLabelScan for 84408 estimated row, then another 84408 hits for the IN filter.
When the result is not empty, the number of db hits are the same, 84409 and 84408. The difference is this number:

Cypher version: CYPHER 4.0, planner: COST, runtime: INTERPRETED. 168817 total db hits in 8 ms.
Cypher version: CYPHER 4.0, planner: COST, runtime: INTERPRETED. 168817 total db hits in 109 ms

So, the number of DB hits are the same, but the time to complete is very different and this is shown in the browser: 8 VS 109. Why is that?

dana_canzano · November 19, 2020, 9:38pm

this is effectively the same as Index on array which is to say that at this time we do not support indexing on array properties

lingvisa · November 19, 2020, 9:50pm

@dana_canzano I have a general question on cypher query and times spent on them. In my application, the time allocated to neo4j queries is only <20 milliseconds per request. However, it seems I saw a lot of cyphers when it is a little complex, it takes more than 100 mill-seconds. Is this the reality for db query or my model isn't well designed? Generally, for is it common for a 100 millsecond query to occur in real-time application? Real-time application I mean production database. Clusters don't help with single request performance. Right?

I believe neo4j have been used by a lot of people for production system, how do they deal with performance, latency issue? Is it a realistic expectation to optimize most queries to take less than 50 or even less milliseconds for neo4j?

dana_canzano · November 19, 2020, 9:55pm

What are your requirements? Is 100ms too long? Must all queries return in < 2ms ??

But also performance is a large domain not just of Neo4j but any database. Are you queries written in a manner that is to achieve performance? Do you have indexes were they would prove beneficial?
Also, performance is related to hardware. Is your storage using SSDs or spinning disks? Does the size of your graph exceed total RAM?

lingvisa · November 19, 2020, 10:03pm

For instance, if my QPS requirement is 500, does that mean in every millisecond, it has to complete 2 requests?

dana_canzano · November 19, 2020, 10:06pm

yes from a simple mathematical perspective thats correct.

lingvisa · November 19, 2020, 10:11pm

That seems too harsh, but it is a real requirement. DB query alone may exceed that limit. @dana_canzano any suggestion to achieve that? Assuming that my hardware and other software is optimized to the best extent, neo4j can also make sure most queries to complete in a few milliseconds?

dana_canzano · November 19, 2020, 10:18pm

That seems too harsh, but it is a real requirement

thats not a question i can answer. The requirement would need to be set by the end users. i.e. if for example you are doing real time stock trading analysis, maybe a 2ms response is required. Again its not something I can answer.

also regarding

Assuming that my hardware and other software is optimized to the best extent, neo4j can also make sure most queries to complete in a few milliseconds?

its not as if Neo4j can just make queries finish in a few milliseconds. again there are lots of variables in play here. i.e. the size of the graph, the complexity of the queries, he amount of data to be returned by a given query, the number of concurrent queries, as well as hardware

lingvisa · November 19, 2020, 10:25pm

@dana_canzano. This is my code to execute the query for users to call. Regarding 'concurrent queries', you mean clients write concurrent code to make requests, or I need to write concurrent cypher/driver code for neo4j?

 with self.neo4j_driver.session() as session:
            return session.read_transaction(self._query_by_node_property, properties, match_mode, name_expansion, full_node, node_type, channel, limit)

dana_canzano · November 19, 2020, 10:33pm

By concurrent queries Im referring to the number of queries in flight/running at the same time regardless of if those queries are reads or writes.

Let's say everyone ( where everyone = 50 people) arrives at the office at 9a and everyone logs in and everyone all at the same time issues a query to report the number of orders received yesterday broken down by hour. Those 50 queries come in all at the same time. Thus 50 concurrent queries

lingvisa · November 19, 2020, 10:35pm

So clients send concurrent requests to neo4j to get results. For the code example above, do I need to do something special to support concurrent requests either by coding or DB configuration, or it's inherently supported?

dana_canzano · November 19, 2020, 10:36pm

Nothing special. Are you seeing issues ???

lingvisa · November 19, 2020, 10:44pm

I haven't tested concurrent queries yet. Will do that later when I see normal requests are optimized to satisfactory level.

For QPS=500, it means as long as you complete 500 requests in one second, you are fine. It is NOT necessarily mean on average each query need to take 2 milliseconds. For concurrent queries, 500 QPS can be easily achieved than non-concurrent queries. If that' s the case, if one query takes 100 milliseconds, and the overall QPS=500, it still meet the requirements. Is that right? I haven't considered about relationship between concurrency and QPS.

naftali0011 · November 25, 2020, 8:30pm

וואלה חשבתי שאנחנו חברים ומיסתבר לי עכשיו שדווקא אתם שגרמתם לי לתקלות. אני נפתלי סרוסי שמפרגן לכם גם בכול מקום איפשרי. סך הכול אמרתי לכם שאיפשר ליהיות לשלום עם כולם. גם אם לא הישתמשתי באיזשהי אפליקציה של מיתחרים אחרים. זה לא אומר שהייתם צריכים לעשות ככה. אמי היקרה והקדושה הלכה לעולמה ליפני שבוע וחצי ואפילו שיתפתי אותכם בכאב שלי. אתם גוגל ואתם במיוחד עלי שאישרתי לכם ליפתוח הכול מימפות ועד אתם יודעים עד כמה. אז למה ככה? וואלה תקופה לא פשוטה גם ככה. המקרר שלי ריק. ואני נילחם שנים על חיי והתקופה האחרונה יש החמרה קצת. בכול מצב תעשו מה שבא לכם. בא לכם לנתק לי הכול תנתקו כי היתעייפתי מהסברים וחוסר חמלה מצידכם על אדם שמיתוך כול מה שאני עובר תמיד מישתדל לפרגן. גם אם היה קשה ולא היה לי ראש לכלום ולאף אחד. רוב הפעמים בהיתאמצות לא קטנה אבל סבבה הכול טוב ויהיה עוד יותר טוב
לילה טוב ותודה באמת תודה

clem · November 25, 2020, 10:22pm

I don't know if you're willing to redo your schema (i.e. not use lists in a property), but I believe you'd get a speed up if you created separate properties for each item instead of having a list of items.

E.g. m.effect_moisturize = true OR have no property effect_moisturize (Null). (I have the prefix "effect_" so that you can search on property names in case you need to, which I believe is possible...)

I think it might be faster because I think the lookup to see if a property exists or not is faster than see if a string is in a list.

Of course, this may not be what you want to do. (You might consider keeping both a list of strings and properties representing the strings in the list.)

Topic		Replies	Views
Is this query speed normal? Neo4j Graph Platform	5	594	October 23, 2020
Tuning of cypher queries for List operations Cypher optimization	3	340	September 2, 2021
Index on array Cypher cypher , index	5	4128	June 30, 2021
Using indexed nodes and simple queries my cypher queries are still taking around 500ms. Can this be further optimized? Looking for advice Cypher performance , cypher	5	409	December 18, 2020
How to optimize the query Cypher performance , cypher	0	442	February 6, 2020

How to optimize queries on array properties?

Related topics