Help me calculate similarity between users

nvqhuy21 · April 24, 2024, 3:17pm

Hello Community,

I'm working on a project using Neo4j to analyze the similarity between users based on their behaviors after completing courses. Each user will have a set of behavioral parameters (A, B, C, D), but the number of courses they complete may vary, leading to unequal lengths of each set. My goal is to compute the similarity between users based on these parameters.

I'd like to ask the community about methods or procedures that can be used to solve this problem in Neo4j. Specifically, I'm interested in how to calculate similarity (potentially using cosine similarity, Euclidean distance, Jaccard similarity, etc.) between users based on unequal sets of behavioral parameters.

I greatly appreciate any contributions and suggestions from the community!

Thank you for reading!

glilienfield · April 24, 2024, 5:29pm

You could try the Jaccard measure. Basically, count the number of things that are the same between two nodes divided by the total number of things you are comparing between the two nodes. The ratio is the similarity, ranging from zero to one (inclusive).

Does it make sense to count the number of parameters both nodes have in common, divided by the total number of unique parameters between the two?
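
For reference, a minimal sketch of the Jaccard ratio described above, written in plain Cypher with no APOC or GDS. The two lists are hypothetical values, and the sketch assumes each list has no duplicates and that at least one of them is non-empty:

```cypher
// Hypothetical behavior values for two users.
WITH ['a1', 'a2'] AS s1, ['a2', 'a3'] AS s2
// Elements present in both lists (the intersection).
WITH s1, s2, [x IN s1 WHERE x IN s2] AS common
// Union size = |s1| + |s2| - |common|, valid when neither list has duplicates.
RETURN toFloat(size(common)) / (size(s1) + size(s2) - size(common)) AS jaccard
```

Here one value is shared out of three distinct values, so the ratio is 1/3. Because the measure only looks at set membership, the two lists do not need to be the same length.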

ameyasoft · April 24, 2024, 10:07pm

Gary is right. Here is the APOC function to get the string similarity.

with "ABCDE" as n1, "ABCDEFGH" as n2
return (100 - toInteger(apoc.text.jaroWinklerDistance(n1, n2) * 100)) as similarity
Result: 93
If you remove 'H' from n2, the similarity = 95

nvqhuy21 · April 25, 2024, 1:49pm

Thank you @ameyasoft, but each of A, B, C, and D is an array.

Example:
User1 completed 2 courses, so this user's behaviors are A: [a1,a2], B:[b1,b2], C:[c1,c2], D:[d1,d2]
User2 completed 1 course, so this user's behaviors are A:[a2], B:[b1], C:[c2], D:[d1]

The problem is finding the similarity between 2 users using graph mining. If you have any suggestions for solving this problem, please let me know!
Thank you so much
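
One possible way to handle the example above is to compute a Jaccard ratio per behavior and then combine the four ratios. The sketch below is only an illustration under assumptions: the values (a1, b1, ...) are treated as categorical labels, each array has no duplicates, and the plain average used for the overall score is an arbitrary choice, not something prescribed in this thread. The maps u1 and u2 are hypothetical stand-ins for properties stored on the User nodes:

```cypher
// Hypothetical behavior arrays for User1 (2 courses) and User2 (1 course).
WITH {A: ['a1', 'a2'], B: ['b1', 'b2'], C: ['c1', 'c2'], D: ['d1', 'd2']} AS u1,
     {A: ['a2'], B: ['b1'], C: ['c2'], D: ['d1']} AS u2
// Visit each behavior in turn and pull out the two arrays to compare.
UNWIND ['A', 'B', 'C', 'D'] AS k
WITH k, u1[k] AS s1, u2[k] AS s2
// Number of values the two users share for this behavior.
WITH k, s1, s2, size([x IN s1 WHERE x IN s2]) AS common
// Jaccard = |intersection| / |union|; arrays of different lengths are fine.
WITH k, toFloat(common) / (size(s1) + size(s2) - common) AS jaccard
// Keep the per-behavior scores and average them into one overall score.
RETURN collect({behavior: k, jaccard: jaccard}) AS perBehavior, avg(jaccard) AS overall
```

For these two users, every behavior shares one value out of two distinct values, so each per-behavior ratio is 0.5 and the average is 0.5. If the arrays hold numeric measurements rather than labels, a distance-based measure such as cosine or Euclidean on aggregated values (for example, per-user means) may be a better fit than Jaccard.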

ameyasoft · April 25, 2024, 5:57pm

Created this sample:
merge (a:User  {name: "UserA", course1: "Completed", course2: "Completed"})
merge (a1:User {name: "UserB", course1: "Completed", course2: "Completed"})
merge (a2:User {name: "UserC", course1: "Incomplete", course2: "Completed"})
merge (a3:User {name: "UserD", course1: "NotStarted", course2: "Completed"})

Ran the similarity:
match (a:User) where a.name = "UserA"
with a, a.course1 as c1
match (b:User) where b.name <> a.name
with a, b, a.course1 as c1, b.course1 as c2
//convert the values to all lower case and also removes empty spaces and special characters.....
with a, b, apoc.text.clean(c1) as c1, apoc.text.clean(c2) as c2
with a, b, c1, c2, (100 - toInteger(apoc.text.jaroWinklerDistance(c1, c2) * 100)) as similarity
return a.name, c1, similarity, b.name, c2

Result:

Similarly, you should run this query for each behavior.
