I'm getting unexpected results when working through the Cosine Similarity examples in the documentation (Similarity functions - Neo4j Graph Data Science).
I'm using Neo4j developer edition 4.0.4 and GDS 1.3.
(1) The documentation seems to omit that you need a native projection to make the streaming examples work. You can do that either by passing nodeProjection:'*', relationshipProjection:'*' within the configuration map, or by using a pre-created named projection such as CALL gds.graph.create('blah', '*', '*') YIELD graphName, nodeCount, relationshipCount; but that should probably be shown in the documentation.
(2) The code as presented returns symmetric results, for example "Praveena" "Karin" 1.0 and "Karin" "Praveena" 1.0. The algorithm is symmetric, and the documented examples don't show these duplicate entries, but I don't see a way of removing them other than some sort of comparison on id(node), which is a bit ugly.
(3) The results for Zhen - Anya and Zhen - Karin seem unexpected to me. Both should return 0, as the pairs have no dimensions in common. However, while the documented example shows both returning 0, in my results Zhen - Anya gives 0 when streaming and Zhen - Karin has no result at all. Passing empty vectors (indicating no dimensions in common) into gds.alpha.similarity.cosine() also generates an error instead of the expected 0.
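For checking values by hand, here is a minimal sketch of the cosine calculation in plain Cypher, with no GDS calls. The two vectors are hypothetical and are not taken from the documentation data set. It is only a sanity check of the formula, not a description of how GDS computes the result internally.

```cypher
// Hypothetical score vectors of equal length (illustrative values only)
WITH [3.0, 4.0] AS a, [4.0, 3.0] AS b
WITH
  // Dot product over matching positions
  reduce(dot = 0.0, i IN range(0, size(a) - 1) | dot + a[i] * b[i]) AS dot,
  // Euclidean norm of each vector
  sqrt(reduce(s = 0.0, x IN a | s + x * x)) AS normA,
  sqrt(reduce(s = 0.0, x IN b | s + x * x)) AS normB
// Guard against a zero norm, where the ratio would be 0/0
RETURN CASE WHEN normA = 0 OR normB = 0 THEN null
            ELSE dot / (normA * normB) END AS similarity
```

With empty lists both norms are 0, so the formula itself is undefined for that case. The sketch returns null there, and whether 0 is the right convention is a separate choice.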
Handy queries:
// Person name and data being passed into gds.alpha.similarity.cosine.stream
MATCH (p:Person), (c:Cuisine)
OPTIONAL MATCH (p)-[likes:LIKES]->(c)
WITH p, {item:id(p), weights: collect(coalesce(likes.score, gds.util.NaN()))} AS userData
WITH p, collect(userData) AS data
RETURN p.name, data
The query that provides unexpected results compared to manual calculations and documented results:
MATCH (p:Person), (c:Cuisine)
OPTIONAL MATCH (p)-[likes:LIKES]->(c)
WITH {item:id(p), weights: collect(coalesce(likes.score, gds.util.NaN()))} AS userData
WITH collect(userData) AS data
CALL gds.alpha.similarity.cosine.stream({nodeProjection:'*', relationshipProjection:'*', data: data})
YIELD item1, item2, count1, count2, similarity
RETURN gds.util.asNode(item1).name AS from, gds.util.asNode(item2).name AS to, similarity
ORDER BY similarity DESC
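For (2), this is a sketch of the id comparison applied to the query above. It assumes item1 and item2 are the node ids passed in as item, and that the procedure returns every pair in both directions. If some pairs only come back in one direction, this filter would drop them.

```cypher
MATCH (p:Person), (c:Cuisine)
OPTIONAL MATCH (p)-[likes:LIKES]->(c)
WITH {item:id(p), weights: collect(coalesce(likes.score, gds.util.NaN()))} AS userData
WITH collect(userData) AS data
CALL gds.alpha.similarity.cosine.stream({nodeProjection:'*', relationshipProjection:'*', data: data})
YIELD item1, item2, similarity
// Keep one row per unordered pair by requiring the smaller id first
WHERE item1 < item2
RETURN gds.util.asNode(item1).name AS from, gds.util.asNode(item2).name AS to, similarity
ORDER BY similarity DESC
```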