Why do the examples use REDUCE/COLLECT instead of just SUM?


(Tpphu) #1

Hi all, I'm a newbie.

I am learning from this guide:

https://guides.neo4j.com/sandbox/recommendations/index.html

In the section: Collaborative Filtering – Similarity Metrics - Cosine Distance

The guide uses the following Cypher query:

MATCH (p1:User {name: "Cynthia Freeman"})-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:User)
WITH COUNT(m) AS numbermovies, SUM(x.rating * y.rating) AS xyDotProduct,
SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
p1, p2 WHERE numbermovies > 10
RETURN p1.name, p2.name, xLength, yLength, xyDotProduct / (xLength * yLength) AS sim
ORDER BY sim DESC LIMIT 100;

I then tested another version that drops REDUCE/COLLECT and uses SUM instead:

MATCH (p1:User {name: "Cynthia Freeman"})-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:User)
WITH COUNT(m) AS numbermovies, SUM(x.rating * y.rating) AS xyDotProduct,
SQRT(SUM(x.rating^2)) AS xLength,
SQRT(SUM(y.rating^2)) AS yLength,
p1, p2 WHERE numbermovies > 10
RETURN p1.name, p2.name, xLength, yLength, xyDotProduct / (xLength * yLength) AS sim
ORDER BY sim DESC LIMIT 100;

I compared the two results and they are the same.

So I am confused about why the guide uses the more complex REDUCE/COLLECT syntax above and not the simpler SUM.


(Bratanic Tomaz) #2

If you want to simplify it even more, you can check out the graph algorithms library, where cosine similarity is exposed as a procedure. See the docs for more: https://neo4j.com/docs/graph-algorithms/current/algorithms/similarity-cosine/
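
On the original question: the two forms compute the same quantity. REDUCE over COLLECT(x.rating) adds up the squared ratings one element at a time, which is what SUM(x.rating^2) does in a single aggregation, so identical results are expected. A minimal sketch to check this without the sandbox data, assuming three made-up ratings (3.0, 4.0, 5.0) and hypothetical column names viaSum and viaReduce:

```cypher
// Made-up ratings standing in for x.rating; not taken from the sandbox data.
UNWIND [3.0, 4.0, 5.0] AS rating
RETURN
  // Sum of squares in one aggregation, as in the simplified query.
  SQRT(SUM(rating^2)) AS viaSum,
  // Collect the ratings into a list, then fold over it, as in the guide's query.
  SQRT(REDUCE(acc = 0.0, r IN COLLECT(rating) | acc + r^2)) AS viaReduce;
```

Both columns should hold the same value, the square root of 9 + 16 + 25 = 50. With other inputs the two could differ in the last decimal places if the floating-point additions happen in a different order, but that would not change the ranking in any meaningful way.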