Hi there,
Question on the vector.similarity.cosine function that was added in 5.18.
First, thanks for this, super helpful.
I'm not an expert on linear algebra, so I'm hoping someone can help fill me in on my question.
This page documents the calculation that Neo4j performs for cosine similarity: Vector search indexes - Cypher Manual (neo4j.com)
0.5*(1+(dot product)/(len_a*len_b))
But, when I look at other cosine similarity implementations, it seems to be different:
OpenAI used to have this utility function:
np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
SciPy uses this function:
scipy.spatial.distance.cosine — SciPy v1.12.0 Manual
So, one question: Are these all equivalent? (I already know they are not numerically equivalent, since I ran into that. But will they always produce the same similarity ordering?)
Another question: Why the difference? Does it have broader implications that I need to be aware of?
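
For reference, here is a minimal sketch of how I compared the three formulas. It uses plain Python rather than NumPy or SciPy so it runs anywhere. The function names and the toy vectors are my own, and `scipy_style_distance` assumes that scipy.spatial.distance.cosine returns 1 minus the cosine similarity (a distance, so smaller means more similar).

```python
import math


def raw_cosine(a, b):
    """Plain cosine similarity, equivalent to
    np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)). Range: [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def neo4j_style(a, b):
    """Formula quoted from the Cypher Manual: 0.5 * (1 + cosine). Range: [0, 1]."""
    return 0.5 * (1.0 + raw_cosine(a, b))


def scipy_style_distance(a, b):
    """Assumption: SciPy's cosine is a distance, 1 - cosine. Range: [0, 2]."""
    return 1.0 - raw_cosine(a, b)


# Toy data (hypothetical): one query vector and four candidates.
query = [1.0, 0.0]
candidates = {
    "same": [2.0, 0.0],
    "diagonal": [1.0, 1.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}


def ranking(score, reverse):
    """Candidate names sorted from most similar to least similar."""
    return sorted(
        candidates,
        key=lambda name: score(query, candidates[name]),
        reverse=reverse,
    )


# Similarities rank descending; the distance ranks ascending.
by_raw = ranking(raw_cosine, reverse=True)
by_neo4j = ranking(neo4j_style, reverse=True)
by_scipy = ranking(scipy_style_distance, reverse=False)

print(by_raw)
print(by_neo4j)
print(by_scipy)
```

On these vectors all three give the same ranking, which is what I would expect if the Neo4j score is just the raw cosine rescaled from [-1, 1] to [0, 1], but I would appreciate confirmation that this holds in general.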