I'm working on a project that uses Neo4j to analyze how similar users are, based on their behaviors after completing courses. Each user has a set of behavioral parameters (A, B, C, D). Users complete different numbers of courses, so the list behind each parameter has a different length from user to user. My goal is to compute the similarity between users based on these parameters.
I'd like to ask the community which methods or procedures in Neo4j could solve this problem. Specifically, I'm interested in how to calculate similarity (for example with cosine similarity, Euclidean distance, or Jaccard similarity) between users whose behavioral parameters are sets of unequal length.
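Cosine similarity and Euclidean distance both need vectors of the same length, while Jaccard works directly on sets. One common way to bridge that gap is to map both users' lists onto a shared vocabulary of values and count occurrences. The sketch below does this in plain Python (it is not a Neo4j API). It assumes each behavior value is a categorical token such as "a1", and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def to_count_vectors(values1, values2):
    """Map two lists of possibly different lengths onto one shared vocabulary,
    returning two equal-length count vectors."""
    vocab = sorted(set(values1) | set(values2))
    c1, c2 = Counter(values1), Counter(values2)
    return [c1[v] for v in vocab], [c2[v] for v in vocab]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (0.0 means identical)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


u, v = to_count_vectors(["a1", "a2"], ["a2"])
print(u, v)             # [1, 1] [0, 1]
print(cosine(u, v))     # ≈ 0.7071
print(euclidean(u, v))  # 1.0
```

Here the lists ["a1", "a2"] and ["a2"] become [1, 1] and [0, 1], so both measures can be applied even though the original lists had different lengths.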
I greatly appreciate any contributions and suggestions from the community!
You could try the Jaccard measure. Basically, count the things the two nodes have in common and divide by the total number of distinct things across both nodes. That ratio is the similarity, ranging from zero to one (inclusive).
Does it make sense to count the number of parameters both nodes have in common and divide by the total number of unique parameters across the two?
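That is the Jaccard index: the size of the intersection divided by the size of the union. A minimal plain-Python sketch follows. The function name is hypothetical, and treating two empty sets as identical is an assumption, not a fixed rule.

```python
def jaccard(set1, set2):
    """Jaccard index: |intersection| / |union|, a value from 0.0 to 1.0."""
    s1, s2 = set(set1), set(set2)
    union = s1 | s2
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(s1 & s2) / len(union)


print(jaccard({"x", "y", "z"}, {"y", "z", "w"}))  # 0.5 (2 shared out of 4 distinct)
```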
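Gary is right. Here is an APOC function that computes string similarity. apoc.text.jaroWinklerDistance returns a distance (0 means identical), so it is subtracted from 100 to get a percentage: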
with "ABCDE" as n1, "ABCDEFGH" as n2
return (100 - toInteger(apoc.text.jaroWinklerDistance(n1, n2) * 100)) as similarity
Result: 93. If you remove 'H' from n2, the similarity becomes 95.
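To show where 93 and 95 come from, here is a plain-Python sketch of the standard Jaro-Winkler similarity (prefix weight 0.1, common prefix capped at 4 characters). It assumes apoc.text.jaroWinklerDistance returns 1 minus this similarity, which is consistent with the numbers above. It is not APOC's source code, edge-case behavior may differ, and the helper names are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 1.0 for identical strings, 0.0 when nothing matches."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for a shared prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def similarity_percent(s1: str, s2: str) -> int:
    """Mirrors the Cypher above: 100 - toInteger(distance * 100), distance = 1 - JW."""
    distance = 1 - jaro_winkler(s1, s2)
    return 100 - int(distance * 100)


print(similarity_percent("ABCDE", "ABCDEFGH"))  # 93
print(similarity_percent("ABCDE", "ABCDEFG"))   # 95
```

For "ABCDE" vs "ABCDEFGH" all 5 characters match in order, so Jaro = (5/5 + 5/8 + 5/5) / 3 = 0.875, and the 4-character shared prefix lifts it to 0.925, i.e. a distance of 0.075, which truncates to 7.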
Thank you @ameyasoft, but each of A, B, C and D is an array, not a single string.
Example:
User1 completed 2 courses, so this user's behaviors are A: [a1, a2], B: [b1, b2], C: [c1, c2], D: [d1, d2]
User2 completed 1 course, so this user's behaviors are A: [a2], B: [b1], C: [c2], D: [d1]
The problem is finding the similarity between two users using graph mining. If you have any suggestions for solving this problem, please let me know!
Thank you so much
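One way to apply the Jaccard idea to this example, sketched in plain Python rather than Cypher, is to score each parameter separately and average the scores. An alternative is to flatten each user into a single set of (parameter, value) pairs. Both are assumptions about how the parameters should be weighted, and the function and variable names are hypothetical.

```python
def jaccard(values1, values2):
    """Jaccard index of two collections, treated as sets (1.0 if both are empty)."""
    s1, s2 = set(values1), set(values2)
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


def user_similarity(u1, u2, params=("A", "B", "C", "D")):
    """Average of the per-parameter Jaccard scores; list lengths may differ."""
    return sum(jaccard(u1.get(p, []), u2.get(p, [])) for p in params) / len(params)


def flattened_similarity(u1, u2):
    """Alternative: a single Jaccard over (parameter, value) pairs."""
    pairs1 = {(p, v) for p, vals in u1.items() for v in vals}
    pairs2 = {(p, v) for p, vals in u2.items() for v in vals}
    return jaccard(pairs1, pairs2)


user1 = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"], "D": ["d1", "d2"]}
user2 = {"A": ["a2"], "B": ["b1"], "C": ["c2"], "D": ["d1"]}
print(user_similarity(user1, user2))       # 0.5
print(flattened_similarity(user1, user2))  # 0.5
```

Both give 0.5 here, but they differ in general: averaging weights A, B, C and D equally, while the flattened version gives more weight to parameters with more distinct values.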
Created this sample:
merge (a:User {name: "UserA", course1: "Completed", course2: "Completed"})
merge (a1:User {name: "UserB", course1: "Completed", course2: "Completed"})
merge (a2:User {name: "UserC", course1: "Incomplete", course2: "Completed"})
merge (a3:User {name: "UserD", course1: "NotStarted", course2: "Completed"})
Then ran the similarity query:
match (a:User {name: "UserA"})
match (b:User) where b.name <> a.name
// apoc.text.clean converts the values to lower case and removes spaces and special characters
with a, b, apoc.text.clean(a.course1) as c1, apoc.text.clean(b.course1) as c2
// jaroWinklerDistance returns a distance (0 = identical), so subtract it from 100 for a percentage
with a, b, c1, c2, (100 - toInteger(apoc.text.jaroWinklerDistance(c1, c2) * 100)) as similarity
return a.name, c1, similarity, b.name, c2