Algo cosine similarity error

Hi all,

I am new to Cypher and Neo4j. We are running Neo4j 3.5.14. Our articles are nodes, with their metadata stored as properties. I am trying to apply graph algorithms to these nodes to create a "similar" link between articles, based on the cosine similarity of their embeddings property. I am able to do it in the browser, but I am having issues when I run the Cypher query from Python. Here is my query:

                    MATCH (a:article)-[r_0:has]-(k:keyword)-[q:has]-(b:article)
                    WHERE a._id = {article_id}
                    AND b.published > {window_left}
                    AND b.published < {window_right}
                    AND b.embeddings IS NOT NULL"
                    WITH a,b 
                    algo.similarity.cosine(a.embeddings,b.embeddings) as similarity
                    WHERE similarity > 0.8
                    MERGE (a)-[r:similar]-(b)
                     """ % (article['_id'],similarity_threshold,window_left,window_right)

We have keywords as nodes as well. Since the data volume is above 100k, I start from the first article's id, match the articles that share a keyword with it, and then compare their embeddings. If the similarity is above 0.8, I create a similar link between them and store the weight.
Is there a better way to do it? I am able to run this in the browser client but not through neobolt from Python.

Versions:
Neo4j - 3.5.14-enterprise
neobolt - 1.7.4

But what is the error?

Hi gabriel,

this is the error

Any help to resolve this would be appreciated. Thanks

I think you are missing a comma (",") after "WITH a,b"; it should read "WITH a,b,algo..."
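For reference, here is a minimal sketch of how the corrected query might look when sent from Python. Several things here are assumptions, not part of the original post: the `$name` parameter style (Neo4j 3.5 also accepts the older `{name}` form), the `weight` property name, the `similarity_threshold` parameter, the `WITH DISTINCT` step (to avoid scoring the same pair once per shared keyword), and the `build_params` helper. Note also that the original string applies `%` formatting to a query with no `%s` placeholders, which would likely raise a `TypeError` in Python on its own; passing a parameter dictionary avoids that.

```python
# Corrected Cypher: comma added after "WITH a, b", stray quote removed.
# Parameters are passed separately instead of using Python % formatting.
SIMILAR_QUERY = """
MATCH (a:article)-[:has]-(:keyword)-[:has]-(b:article)
WHERE a._id = $article_id
  AND b.published > $window_left
  AND b.published < $window_right
  AND b.embeddings IS NOT NULL
WITH DISTINCT a, b
WITH a, b, algo.similarity.cosine(a.embeddings, b.embeddings) AS similarity
WHERE similarity > $similarity_threshold
MERGE (a)-[r:similar]-(b)
SET r.weight = similarity
"""


def build_params(article, window_left, window_right, similarity_threshold=0.8):
    """Build the parameter dictionary for SIMILAR_QUERY (hypothetical helper)."""
    return {
        "article_id": article["_id"],
        "window_left": window_left,
        "window_right": window_right,
        "similarity_threshold": similarity_threshold,
    }
```

Assuming an open driver session named `session`, this could be run as `session.run(SIMILAR_QUERY, build_params(article, window_left, window_right))`.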

Hi gabriel, thank you. Kinda embarrassed, to be honest. 🙂
Is there any way to optimize the query? I see, as my data increases, it is getting slower. Would using would make a difference in speed?
Thank you

It's always better to think about ways to optimise your graph so that you can reuse calculations you have already done. But have you tried running your query in parallel?

Do take a look here, it's an interesting discussion: How best to do parallel processing
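One possible reading of "in parallel", sketched on the client side: run the per-article linking step for several article ids at once on a small thread pool. This is only an illustration under assumptions. `link_all` and `link_one` are hypothetical names; `link_one` stands for whatever function opens its own session and runs the similarity query for one article id. Concurrent writes that touch the same nodes can contend for locks, so the worker count is kept small here.

```python
from concurrent.futures import ThreadPoolExecutor


def link_all(article_ids, link_one, max_workers=4):
    """Call link_one(article_id) for every id on a thread pool.

    Results are returned in the same order as article_ids.
    link_one is a hypothetical callable that should open its own
    database session, since sessions should not be shared across threads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(link_one, article_ids))
```

Whether this actually helps depends on where the time goes; if most of it is spent inside a single large query, reducing the work per query (for example, skipping pairs that are already linked) may matter more than parallelism.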