K nearest neighbor with cypher and euclidian distance

mohiniiimo · June 17, 2023, 3:42pm

Hello everyone!

I am currently trying to make recommendations based on collaborative Filtering. I have books users and authors in my database. The user are connected to the books with RATED and ADDED TO LIST. The books and authors with WRITTEN_BY.

// k-nearest neighbor

MATCH (u1:User {user_id: 11927})-[r:RATED]->(b:Book)

WITH u1,r.rating AS rating_u1

MATCH (u1)-[r1:RATED]->(b:Book)<-[r2:RATED]-(u2)

WITH u1, u2, COLLECT({r1: r1, r2: r2}) AS ratings WHERE size(ratings) > 5

MATCH (u2)-[r:RATED]->(b:Book)

WITH u1, u2, r.rating AS rating_u2, ratings

UNWIND ratings AS r

WITH sqrt(sum((r.r1.rating - r.r2.rating)^2)) AS euclidean_distance,

u1, u2 WHERE euclidean_distance <> 0

WITH u1, u2, 1 / (1 + euclidean_distance) AS similarity

ORDER BY similarity DESC LIMIT 10

MATCH (u2)-[r:RATED]->(b:Book)

WHERE NOT EXISTS((u1)-[:RATED]->(b))

RETURN b.title, SUM(similarity * r.rating) AS score

ORDER BY score DESC LIMIT 25

With this cypher queries I want to compare each pair of users who have rated at least 5 common books. Then, I want to compare the 10 most similar users to the user with ID 11927. But there must be something wrong with my code, it is not doing what it should. Have you an idea how I could solve this? (without graph data science playground or AuraDS. Thank you

glilienfield · June 17, 2023, 9:55pm

You some unnecessary querying going on in your query. For instance, after your first match, you can start the next match from b. The size of the collection should be the same as the number of u2 nodes.

I think the KNN metric would be practical for a given node, but not if you want to find the similarity between each pair of nodes, as I think that would be inefficient in cypher.

As an alternative, you could measure the similarity between pairs of nodes using the Jaccard measure. This could be applied in your problem by counting the number of books reviewed in common between to reviewers, compared to all the books the two reviewers reviewed.

I think the following query derives this metric (I did it kinda quickly).

match (r1:Reviewer)-[:REVIEWED]->(b:Book)<-[:REVIEWED]-(r2:Reviewer)
where id(r1) < id(r2)
with r1, r2, count(b) as intersection_count, count{(r1)-[:REVIEWED]->(:Book)} as r1_count, count{(r2)-[:REVIEWED]->(:Book)} as r2_count
return *, toFloat(intersection_count) /  toFloat(r1_count + r2_count - intersection_count) as Jaccard_similarity

Test Data:

create(a0:Reviewer{id:0})
create(a1:Reviewer{id:1})
create(a2:Reviewer{id:2})
create(a3:Reviewer{id:3})
create(b0:Book{id:0})
create(b1:Book{id:1})
create(b2:Book{id:2})
create(b3:Book{id:3})
create(a0)-[:REVIEWED]->(b0)
create(a0)-[:REVIEWED]->(b1)
create(a1)-[:REVIEWED]->(b0)
create(a1)-[:REVIEWED]->(b1)
create(a1)-[:REVIEWED]->(b2)
create(a2)-[:REVIEWED]->(b1)
create(a2)-[:REVIEWED]->(b2)

Result:

You can add your constraint to only compare reviews that have at least 5 books reviewed in common by adding a 'where' condition.

match (r1:Reviewer)-[:REVIEWED]->(b:Book)<-[:REVIEWED]-(r2:Reviewer)
where id(r1) < id(r2)
with r1, r2, count(b) as intersection_count, count{(r1)-[:REVIEWED]->(:Book)} as r1_count, count{(r2)-[:REVIEWED]->(:Book)} as r2_count
where intersection_count > 4
return *, toFloat(intersection_count) /  toFloat(r1_count + r2_count - intersection_count) as Jaccard_similarity

Hope this helps.

mohiniiimo · June 18, 2023, 11:56am

Thank you very much!!!

Topic		Replies	Views
Can someone please explain this cypher query Cypher	7	472	May 10, 2020
Euclidian Distance Similarity Question AI, ML, NLP	3	894	October 31, 2019
Trying to write a Cypher query for node similarity to design a movie recommendation system Graph Algorithms/Graph Data Science cypher , gds	4	331	March 27, 2024
Need help with cypher query Cypher cypher	5	316	June 22, 2021
Building a similarity graph with Neo4j’s Approximate Nearest Neighbors Algorithm Neo4j Developer Blog Archive	0	1972	October 1, 2019

K nearest neighbor with cypher and euclidian distance

Related topics