Is there a more efficient version of this query?

lingvisa · May 23, 2020, 7:39pm

MATCH (n:Product) WHERE n.name =~ "(?i)Apple" OR n.type =~ "(?i)phone" OR n.model =~ "(?i)plus" RETURN n

Cobra · May 23, 2020, 8:12pm

I don't think you can make it more efficient.

Regards,
Cobra

a10554 · May 23, 2020, 10:05pm

Hi lingvisa,

I'm not an expert, but maybe you can explore indexes to improve the speed of the query.
https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-search-performance/index.html

I'm sure Maxime can help you on that if that's what you are looking for.

brgs
JAS

Cobra · May 23, 2020, 10:09pm

Yeah @a10554, you're right.

If you want an indexed property to be unique, like a product ID, use a unique constraint.
DOC: Constraints - Cypher Manual

Regards,
Cobra

lingvisa · May 24, 2020, 12:50am

Yes, Cobra and a10554:

I have indexes and constraints. The reason I asked this question was that my program suddenly slowed down a lot and I suspect it was because I made this change:
WHERE n.name =~ "(?i)" + name + " OR n.type =~ "(?i)" + type + " OR n.model =~ "(?i)" + model"

name, type and model are parameters passed into a function.

Originally, it was exact match like:
Where n.name =

But I don't think this change should slow things down that much. I need to investigate the change in speed further.

Cobra · May 24, 2020, 9:49am

Hello @lingvisa,

Indeed, regexes are faster. One more thing you can try is to build the regexes in the parameters rather than in the Cypher request.

Regards,
Cobra

a10554 · May 24, 2020, 10:35am

Hi lingvisa,

If you find that the slowdown is caused by the changes you made to the query, and you really want to speed things up, may I suggest you create name_2 and model_2 properties holding lowercase data, and then match them against your parameters, also in lowercase?
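
A minimal sketch of that idea, assuming the Product label from the original question and the same HTTP API statement format used elsewhere in this thread. The function names are made up for illustration; name_2 and model_2 are the redundant properties suggested above.

```python
def lowercase_backfill_statement():
    # One-off statement that copies a lowercased version of each value
    # into the redundant properties (name_2 and model_2).
    return {
        "statement": (
            "MATCH (n:Product) "
            "SET n.name_2 = toLower(n.name), n.model_2 = toLower(n.model)"
        ),
        "parameters": {},
    }


def lowercase_match_statement(name, model):
    # Exact-match lookup on the lowercased copies; the parameters are
    # lowercased on the client side so that both sides compare equal.
    return {
        "statement": (
            "MATCH (n:Product) "
            "WHERE n.name_2 = $name OR n.model_2 = $model "
            "RETURN n"
        ),
        "parameters": {"name": name.lower(), "model": model.lower()},
    }
```

Because the second statement uses plain equality, an ordinary index on name_2 and model_2 should be usable, which is the point of storing the redundant data.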

Cobra · May 24, 2020, 6:56pm

@lingvisa, why did you add + " after each condition?

This should normally be enough: WHERE n.name =~ "(?i)" + name OR n.type =~ "(?i)" + type OR n.model =~ "(?i)" + model

@a10554 I don't think that is good practice, and a regex that starts with (?i) is already case-insensitive.

Regards,
Cobra

a10554 · May 24, 2020, 6:59pm

@Cobra I was thinking that if there are millions of nodes, running a regex over all of them would be computationally intensive, I guess. Having seen redundant data used in business intelligence to achieve better performance, it doesn't shock me anymore. (See "star schema" from Kimball.)

Cobra · May 24, 2020, 7:03pm

If it's like other languages, and I think it is, regex will always be faster than anything else.

I see what you mean, but you can have both performance and a clean database. In this case, maybe when he loads the data he can format it correctly right away, to avoid doing this in the query. There are a lot of possible improvements around that.

lingvisa · May 24, 2020, 11:31pm

def get_nodes_by_name(channel, label, name, limit_count=500000):
    """
    Get node by matching  n.name, n.znname, n.enname As long as one matches, return
    """
    cypher = "MATCH (n:"  + label + ") WHERE n.channel = \"" + channel + "\" AND (n.name =~ \"(?i)" + name + "\" OR n.znname =~ \"(?i)" + name + "\" OR n.enname =~ \"(?i)" + name + "\") RETURN n LIMIT $nproduct"
    data = {
        "statements": [
            {
                "statement": cypher,
                "parameters": {
                    "name_value": name,
                    "nproduct": limit_count
                },
            }
        ]
    }
    r = requests.post(URL_V4, headers=headers, json=data)
    #print(label, name)
    rows = r.json()["results"][0]["data"]
    nodes = []

    for row in rows:
        node= row['row'][0]
        nodes.append((node))

    return nodes

Cobra, the above is one of my typical queries, composed through a function call. How would you improve it in this case?

Cobra · May 24, 2020, 11:54pm

@lingvisa You are using Py2neo, right?

You can pass channel and name as parameters:

cypher = "MATCH (n:"+label+") WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
data = {
    "statements": [
        {
            "statement": cypher,
            "parameters": {
                "channel": "(?i)" + channel,
                "name": "(?i)" + name,
                "nproduct": int(limit_count)
            },
        }
    ]
}

Moreover, if you don't have a lot of different labels, I advise you to use an if/else condition and write the label directly in the Cypher request (to avoid injection, for example). For this request I don't think there is any problem, but it's better to use good practices :)

For example, if you only have Product and Item labels:

label = str(label).lower()
if label == "product":
    cypher = "MATCH (n:Product) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
elif label == "item":
    cypher = "MATCH (n:Item) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
else:
    # Stop here: without a valid label, cypher would be left undefined.
    raise ValueError("Label invalid!")
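
The same check can also be sketched as a lookup table, which avoids repeating the query once per label. This is only an illustration: ALLOWED_LABELS and build_statement are made-up names, and the whitelist would need to list the labels that really exist in your graph.

```python
# Hypothetical whitelist: lookup key -> label exactly as written in the graph.
ALLOWED_LABELS = {"product": "Product", "item": "Item"}


def build_statement(label, channel, name, limit_count=500000):
    key = str(label).lower()
    if key not in ALLOWED_LABELS:
        # Reject anything that is not a known label, so arbitrary text
        # never ends up concatenated into the Cypher string.
        raise ValueError("Label invalid: " + repr(label))
    cypher = (
        "MATCH (n:" + ALLOWED_LABELS[key] + ") "
        "WHERE n.channel =~ $channel "
        "AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) "
        "RETURN n LIMIT $nproduct"
    )
    return {
        "statement": cypher,
        "parameters": {
            "channel": "(?i)" + channel,
            "name": "(?i)" + name,
            "nproduct": int(limit_count),
        },
    }
```

The returned dictionary is one entry for the "statements" list of the HTTP request body, so it can be posted the same way as in the function above.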

lingvisa · May 24, 2020, 11:59pm

Cobra, this looks a lot cleaner. Actually, I am using the original Neo4j HTTP API. It's my own function definition. I will use this syntax.

I have a few labels in a specific channel (domain). So the ifelse should work for better performance.

Cobra · May 25, 2020, 12:09am

Oh I see, I have never used the Neo4j HTTP API.

Parameters speed up your query because the database can recognize the request and reuse its cached plan.
Regexes make your query more robust because matching is case-insensitive for the channel and name parameters in your case.
You can cast limit_count to be sure you have an int and not something else.
The if/else on labels will prevent injection and unwanted behaviour.

These things will make your code better, cleaner and more resilient.

lingvisa · May 25, 2020, 12:20am

That makes sense. I have another query to improve in a similar fashion; it is composed very similarly but is used with the Neo4j driver API. I will ask about it later.

Cobra · May 25, 2020, 12:21am

No problem, I will be happy to help you.

I always use the Python Neo4j driver.

Regards,
Cobra

mojo2go · May 29, 2020, 7:26pm

I agree with @a10554 that regex usage is going to be slower than native match terms. I love regexes when they can get me out of a bind, or parse unstructured/semi-structured data, but I don’t think they benefit much from the index. I would guess that they are at least as slow as a CONTAINS, which has to check every character in a string. In contrast, STARTS WITH is fast and fully makes use of the index. I think ENDS WITH is finally benefiting from the index, via a trick where the search reverses the character sequence, so it is not as fast as STARTS WITH. But that’s a hint that you don’t have to make the match in a single step. You may be able to do an intermediate search that reduces the size of your matched objects, then use the surgical regex to make the final match.
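
A rough sketch of that two-step idea, assuming the Product label and name property from the original question and the HTTP API statement format used earlier in the thread. The function name is made up, and whether the planner actually uses an index for the STARTS WITH part depends on the indexes you have.

```python
def prefilter_then_regex(prefix, pattern, limit_count=100):
    # STARTS WITH narrows the candidates first (and may be served by an
    # index); the regex is then only evaluated on the remaining nodes.
    return {
        "statement": (
            "MATCH (n:Product) "
            "WHERE n.name STARTS WITH $prefix AND n.name =~ $pattern "
            "RETURN n LIMIT $limit"
        ),
        "parameters": {
            "prefix": prefix,
            "pattern": pattern,
            "limit": int(limit_count),
        },
    }
```

Note that STARTS WITH is case-sensitive, so the prefix has to match the stored casing, for example prefilter_then_regex("Apple", "(?i)apple.*plus").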

michael.hunger · May 30, 2020, 8:58am

You should use fulltext indexes, which are optimized for this, can handle multiple properties, and have full Lucene syntax support.

https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-full-text-search/
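
As a hedged sketch of what that could look like for the query discussed above: the index name productNames is made up, the property list is taken from the earlier function, and the creation call shown is the procedure form used by Neo4j 3.5/4.x (newer versions use a CREATE FULLTEXT INDEX statement instead, so check the linked documentation for your version).

```python
# Hypothetical index name, chosen only for this example.
INDEX_NAME = "productNames"

# One-off statement that creates a full-text index over three properties.
# Procedure form from Neo4j 3.5/4.x; the syntax differs in later versions.
CREATE_INDEX_STATEMENT = (
    'CALL db.index.fulltext.createNodeIndex("' + INDEX_NAME + '", '
    '["Product"], ["name", "znname", "enname"])'
)


def fulltext_statement(search_text, limit_count=100):
    # Query the index with a Lucene query string; every indexed property
    # is searched, so no OR over name/znname/enname is needed.
    return {
        "statement": (
            "CALL db.index.fulltext.queryNodes($index, $query) "
            "YIELD node, score "
            "RETURN node, score LIMIT $limit"
        ),
        "parameters": {
            "index": INDEX_NAME,
            "query": search_text,
            "limit": int(limit_count),
        },
    }
```

The search text is interpreted as Lucene syntax, so user input containing characters such as : or * would need escaping before being passed in.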

Cobra · May 30, 2020, 10:09am

Hello @michael.hunger

To use them, do the string properties have to be unique?

Regards,
Cobra

michael.hunger · May 30, 2020, 11:15am

No, they don’t have to be unique, and they can also be larger texts.
