Vector.similarity.cosine

rcasburn · April 2, 2024, 10:38pm

Hi there,

I have a question about the vector.similarity.cosine function that was added in Neo4j 5.18.

First, thanks for this, super helpful.

I'm not an expert in linear algebra, so I'm hoping someone can help me with my question.

The Cypher Manual page "Vector search indexes" documents the calculation Neo4j performs for cosine similarity:

0.5*(1+(dot product)/(len_a*len_b))
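For reference, here is a minimal Python sketch of that formula using only the standard library. The helper name `neo4j_style_cosine` is hypothetical: it mirrors the formula on the manual page, not Neo4j's actual implementation. Raising an error for zero-length vectors is an assumption of this sketch, not documented Neo4j behavior.

```python
import math


def neo4j_style_cosine(a: list[float], b: list[float]) -> float:
    """Hypothetical helper: 0.5 * (1 + dot(a, b) / (len_a * len_b))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    if len_a == 0 or len_b == 0:
        # Assumption of this sketch: cosine is undefined for zero vectors.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return 0.5 * (1 + dot / (len_a * len_b))


print(neo4j_style_cosine([1.0, 0.0], [1.0, 0.0]))   # 1.0 (same direction)
print(neo4j_style_cosine([1.0, 0.0], [0.0, 1.0]))   # 0.5 (orthogonal)
print(neo4j_style_cosine([1.0, 0.0], [-1.0, 0.0]))  # 0.0 (opposite direction)
```

The final step shifts and halves the result, so the returned score always lies in [0, 1] rather than the [-1, 1] range of a plain cosine.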

However, other cosine similarity implementations seem to compute it differently:

OpenAI used to have this utility function:
np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

SciPy has this function (SciPy v1.12.0 Manual):
scipy.spatial.distance.cosine, which returns the cosine distance 1 - (dot product)/(len_a*len_b)

So, my first question: are these all equivalent? (I know they are not numerically equivalent; I've already run into that. But will they always produce the same similarity ordering?)
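One way to check the ordering question is a minimal pure-Python sketch that ranks the same candidates three ways: by the raw OpenAI-style cosine, by the 0.5 * (1 + cosine) score from the Neo4j manual, and by a SciPy-style cosine distance (1 - cosine). The helper names and the small three-dimensional vectors are hypothetical stand-ins for real embeddings. Because a distance means "lower is closer", it is sorted ascending while the two similarities are sorted descending.

```python
import math


def raw_cosine(a, b):
    # OpenAI-style: dot(a, b) / (|a| * |b|), range [-1, 1]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neo4j_score(a, b):
    # Formula from the Cypher Manual: 0.5 * (1 + cosine), range [0, 1]
    return 0.5 * (1 + raw_cosine(a, b))


def cosine_distance(a, b):
    # SciPy-style distance: 1 - cosine, range [0, 2]
    return 1 - raw_cosine(a, b)


query = [0.2, 0.9, 0.4]
candidates = {  # hypothetical example vectors
    "doc_a": [0.1, 0.8, 0.5],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [-0.3, 0.5, 0.2],
    "doc_d": [-0.2, -0.9, -0.4],
}

# Similarities: higher is closer, so sort descending.
by_raw = sorted(candidates, key=lambda k: raw_cosine(query, candidates[k]), reverse=True)
by_neo4j = sorted(candidates, key=lambda k: neo4j_score(query, candidates[k]), reverse=True)
# Distance: lower is closer, so sort ascending.
by_distance = sorted(candidates, key=lambda k: cosine_distance(query, candidates[k]))

print(by_raw)                             # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
print(by_raw == by_neo4j == by_distance)  # True
```

All three agree because both transforms are strictly monotonic in the raw cosine. The only way they could disagree in practice is floating-point rounding collapsing two nearly identical scores into a tie.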

My second question: why the difference? Does it have broader implications I need to be aware of?
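One practical consequence is that any similarity threshold has to be converted between scales. The following is a minimal sketch, assuming the 0.5 * (1 + cosine) formula from the manual. The function names and the 0.8 cutoff are hypothetical.

```python
def raw_to_neo4j(raw_cosine: float) -> float:
    # Raw cosine in [-1, 1] -> score on the 0.5 * (1 + cos) scale in [0, 1].
    return 0.5 * (1 + raw_cosine)


def neo4j_to_raw(score: float) -> float:
    # Inverse mapping: score in [0, 1] -> raw cosine in [-1, 1].
    return 2 * score - 1


def neo4j_to_cosine_distance(score: float) -> float:
    # SciPy-style cosine distance is 1 - cos = 2 - 2 * score; lower means closer.
    return 1 - neo4j_to_raw(score)


raw_cutoff = 0.8  # hypothetical cutoff tuned on OpenAI-style cosine values
print(round(raw_to_neo4j(raw_cutoff), 6))        # 0.9
print(neo4j_to_raw(0.5))                         # 0.0 (orthogonal vectors)
print(round(neo4j_to_cosine_distance(0.9), 6))   # 0.2
```

Under this mapping, a cutoff of 0.8 on the raw scale becomes 0.9 on Neo4j's scale. A Neo4j score of 0.5 means orthogonal vectors, not a half-match.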

michael.hunger · April 22, 2024, 7:45am

Yes, the ordering will be the same within one system.
The only difference is that some implementations normalize the values (Neo4j's 0.5*(1 + cosine) maps them from [-1, 1] onto [0, 1]) and others don't.
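The reason the ordering survives can be written out in a few lines. Here c is the raw cosine dot(a, b)/(len_a*len_b), s is the Neo4j score, and d is the SciPy-style cosine distance. These symbols are introduced only for illustration.

```latex
c = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} \in [-1, 1],
\qquad s = \tfrac{1}{2}(1 + c) \in [0, 1],
\qquad d = 1 - c \in [0, 2]

\frac{ds}{dc} = \tfrac{1}{2} > 0,
\qquad \frac{dd}{dc} = -1 < 0

c_1 > c_2 \iff s_1 > s_2 \iff d_1 < d_2
```

The normalized score therefore ranks candidates exactly like the raw cosine, while a distance ranks them in reverse and has to be sorted ascending.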

