Is there a more efficient version of this query?

MATCH (n:Product) WHERE n.name =~ "(?i)Apple" OR n.type =~ "(?!)phone" OR n.model =~ "(?i)plus" RETURN n

Hello @lingvisa,

I think you cannot do more efficient :smile:

Regards,
Cobra

Hi lingvisa,

i'm not an expert, but maybe you can explore indexes to improve speed of the query.
https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-search-performance/index.html

I'm sure Maxime can help you on that if that's what you are looking for.

brgs
JAS

1 Like

Yeah @a10554, you right :slight_smile:

If you want your index to be unique like a product id, use UNIQUE CONSTRANITS :slight_smile:
DOC: https://neo4j.com/docs/cypher-manual/current/administration/constraints/

Regards,
Cobra

Yes, Cobra and a10554:

I have indexes and constraints. The reason I asked this question was that my program suddenly slowed down a lot and I suspect it was because I made this change:
WHERE n.name =~ "(?i)" + name + " OR n.type =~ "(?!)" + type + " OR n.model =~ "(?i)" + model"

name, type and model are parameters passed into a function.

Originally, it was exact match like:
Where n.name =

But I don't think this change should cause slow down too much. I need to investigate more on the change of speed.

Hello @lingvisa,

Indeed, regexs are faster :slight_smile:, one more thing you can try is to build regexs in parameters and not in the cypher request :slight_smile:

Regards,
Cobra

Hi lingvisa,

if you find the slowdown is because of the changes you made to the query, and you really want to speed up things, may i suggest you create a name_2 and model_2 properties with lowercase data and then match them with your parameters also in lowercase ?

@lingvisa, why did you add + " after each conditions?

This should be enough normally? WHERE n.name =~ "(?i)" + name OR n.type =~ "(?!)" + type OR n.model =~ "(?i)" + model

@a10554 I don't think that is a good practise :smile: and the regex which starts by (?i) is case insensitive :slight_smile:

Regards,
Cobra

@cobra was thinking if there are millions of nodes, going through regex over those millions of nodes would be computacionaly intensive, i guess. Having seen some practices done in business intelligence to achieve better performances by having redundant info doesn't shock me anymore. (check "Star schema" from Kimball)

If it's like other languages and I think it is, regex will be always faster than anything :smile:

I see what you mean but you can have performance and a clean database :smile: In this case maybe when he load the data, he can format them directly correctly to avoid to do this in the query, there are lot of improvments around that could be possible :slight_smile:

def get_nodes_by_name(channel, label, name, limit_count=500000):
    """
    Get node by matching  n.name, n.znname, n.enname As long as one matches, return
    """
    cypher = "MATCH (n:"  + label + ") WHERE n.channel = \"" + channel + "\" AND (n.name =~ \"(?i)" + name + "\" OR n.znname =~ \"(?i)" + name + "\" OR n.enname =~ \"(?i)" + name + "\") RETURN n LIMIT $nproduct"
    data = {
        "statements": [
            {
                "statement": cypher,
                "parameters": {
                    "name_value": name,
                    "nproduct": limit_count
                },
            }
        ]
    }
    r = requests.post(URL_V4, headers=headers, json=data)
    #print(label, name)
    rows = r.json()["results"][0]["data"]
    nodes = []

    for row in rows:
        node= row['row'][0]
        nodes.append((node))

    return nodes

Cobra, The above is one of my typical queries composed through a function call. How would you improve it in this case?

@lingvisa You are using Py2neo right?

You can pass channel and name as parameters:

cypher = "MATCH (n:"+label+") WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
data = {
    "statements": [
        {
             "statement": cypher,
             "parameters": {
                 "channel": "(?i)" + channel,
                 "name": "(?i)" + name,
                 "nproduct": int(limit_count)
              },
         }
     ]
}

Moreover, if you don't have a lot of different Labels, I advice you to do a ifelse condition and to write directly the label in cypher request (to avoid injection for example), for this request, I don't think there is any problem but it's better to use good practices:)

For example if you have only a Product and item labels:

label = str(label).lower()
if label == "product":
    cypher = "MATCH (n:Product) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
elif label == "item":
    cypher = "MATCH (n:Item) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
else:
    print("Label invalid!")

Cobra, this looks a lot cleaner. Actually, I am using the original Neo4j HTTP API. It's my own function definition. I will use this syntax.

I have a few labels in a specific channel (domain). So the ifelse should work for better performance.

Oh I see, I never used the Neo4j HTTP API :smile:

  • the parameters are here to speed up your query because the database will recognize your request :slight_smile:
  • regexs make your query robustness because it is case insentive for channel and name parameters in your case :slight_smile:
  • you can cast the limit_count to be sure to have an int and not something else :slight_smile:
  • the ifelse on labels will prevent injection and unwanted behaviour :slight_smile:

These things will make your code better, cleaner and resilient :slight_smile:

That makes sense. I will have another query improvement in a similar fashion which is composed very similar but is used in a Neo4j Driver API scenario. I will ask later.

No problem, I will be happy to help you :slight_smile:

I always use Python Neo4j driver :smile:

Regards,
Cobra