Is there a more efficient version of this query?

MATCH (n:Product) WHERE n.name =~ "(?i)Apple" OR n.type =~ "(?!)phone" OR n.model =~ "(?i)plus" RETURN n

Hello @lingvisa,

I think you cannot do more efficient :smile:

Regards,
Cobra

Hi lingvisa,

i'm not an expert, but maybe you can explore indexes to improve speed of the query.
https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-search-performance/index.html

I'm sure Maxime can help you on that if that's what you are looking for.

brgs
JAS

1 Like

Yeah @a10554, you right :slight_smile:

If you want your index to be unique like a product id, use UNIQUE CONSTRANITS :slight_smile:
DOC: Constraints - Cypher Manual

Regards,
Cobra

Yes, Cobra and a10554:

I have indexes and constraints. The reason I asked this question was that my program suddenly slowed down a lot and I suspect it was because I made this change:
WHERE n.name =~ "(?i)" + name + " OR n.type =~ "(?!)" + type + " OR n.model =~ "(?i)" + model"

name, type and model are parameters passed into a function.

Originally, it was exact match like:
Where n.name =

But I don't think this change should cause slow down too much. I need to investigate more on the change of speed.

Hello @lingvisa,

Indeed, regexs are faster :slight_smile:, one more thing you can try is to build regexs in parameters and not in the cypher request :slight_smile:

Regards,
Cobra

Hi lingvisa,

if you find the slowdown is because of the changes you made to the query, and you really want to speed up things, may i suggest you create a name_2 and model_2 properties with lowercase data and then match them with your parameters also in lowercase ?

@lingvisa, why did you add + " after each conditions?

This should be enough normally? WHERE n.name =~ "(?i)" + name OR n.type =~ "(?!)" + type OR n.model =~ "(?i)" + model

@a10554 I don't think that is a good practise :smile: and the regex which starts by (?i) is case insensitive :slight_smile:

Regards,
Cobra

@cobra was thinking if there are millions of nodes, going through regex over those millions of nodes would be computacionaly intensive, i guess. Having seen some practices done in business intelligence to achieve better performances by having redundant info doesn't shock me anymore. (check "Star schema" from Kimball)

If it's like other languages and I think it is, regex will be always faster than anything :smile:

I see what you mean but you can have performance and a clean database :smile: In this case maybe when he load the data, he can format them directly correctly to avoid to do this in the query, there are lot of improvments around that could be possible :slight_smile:

def get_nodes_by_name(channel, label, name, limit_count=500000):
    """
    Get node by matching  n.name, n.znname, n.enname As long as one matches, return
    """
    cypher = "MATCH (n:"  + label + ") WHERE n.channel = \"" + channel + "\" AND (n.name =~ \"(?i)" + name + "\" OR n.znname =~ \"(?i)" + name + "\" OR n.enname =~ \"(?i)" + name + "\") RETURN n LIMIT $nproduct"
    data = {
        "statements": [
            {
                "statement": cypher,
                "parameters": {
                    "name_value": name,
                    "nproduct": limit_count
                },
            }
        ]
    }
    r = requests.post(URL_V4, headers=headers, json=data)
    #print(label, name)
    rows = r.json()["results"][0]["data"]
    nodes = []

    for row in rows:
        node= row['row'][0]
        nodes.append((node))

    return nodes

Cobra, The above is one of my typical queries composed through a function call. How would you improve it in this case?

@lingvisa You are using Py2neo right?

You can pass channel and name as parameters:

cypher = "MATCH (n:"+label+") WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
data = {
    "statements": [
        {
             "statement": cypher,
             "parameters": {
                 "channel": "(?i)" + channel,
                 "name": "(?i)" + name,
                 "nproduct": int(limit_count)
              },
         }
     ]
}

Moreover, if you don't have a lot of different Labels, I advice you to do a ifelse condition and to write directly the label in cypher request (to avoid injection for example), for this request, I don't think there is any problem but it's better to use good practices:)

For example if you have only a Product and item labels:

label = str(label).lower()
if label == "product":
    cypher = "MATCH (n:Product) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
elif label == "item":
    cypher = "MATCH (n:Item) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
else:
    print("Label invalid!")

Cobra, this looks a lot cleaner. Actually, I am using the original Neo4j HTTP API. It's my own function definition. I will use this syntax.

I have a few labels in a specific channel (domain). So the ifelse should work for better performance.

Oh I see, I never used the Neo4j HTTP API :smile:

  • the parameters are here to speed up your query because the database will recognize your request :slight_smile:
  • regexs make your query robustness because it is case insentive for channel and name parameters in your case :slight_smile:
  • you can cast the limit_count to be sure to have an int and not something else :slight_smile:
  • the ifelse on labels will prevent injection and unwanted behaviour :slight_smile:

These things will make your code better, cleaner and resilient :slight_smile:

That makes sense. I will have another query improvement in a similar fashion which is composed very similar but is used in a Neo4j Driver API scenario. I will ask later.

No problem, I will be happy to help you :slight_smile:

I always use Python Neo4j driver :smile:

Regards,
Cobra

I agree with @a10554 that regex useage is going to be slower than native match terms. I love regex’s when they can get me out of a bind, or parse unstructured/semi-structured data, but I do t think they benefit much from the index. I would guess that they are at least as slow as a CONTAINS, which has to check every character in a string. In contrast STARTS WITH is fast ant totally makes use of the index. I think ENDS WITH is finally benefiting from the index, via a trick where the the search reverses word character sequence...so not as fast as STARTS WITH. But that’s a hint that you don’t have to make the match in a single step. You may be able to do an intermediate search that reduces the size of your matched objects, then use the surgical regex to make the final match.

You should use fulltext indexes which are optimized for this can handle multiple properties and have full lucene syntax support

https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-full-text-search/

Hello @michael.hunger

To use them, string properties must be unique?

Regards,
Cobra

No they don’t have to be unique and can also be larger texts