Hey there!
I am building a simple social-network-style recommendation engine that scores users' connections to each other.
I have a question about adding scores to relationships:
Graph schema:
(User)-[:KNOWS_VIA_SCHOOL]->(User)
(User)-[:KNOWS_VIA_PHONE]->(User)
(User)-[:WORKED_TOGETHER]->(User)
(User)-[:TRAVELLED_TOGETHER]->(User)
How can I create a simple recommendation engine for the data above based on the following constraints:
- Each relationship should be given a score.
- For instance, if user A has a KNOWS_VIA_SCHOOL rel with another user B, it has a score of 3, and if user A has a KNOWS_VIA_PHONE rel with user B, it has a score of 2.
- If user A has two rels with user B, we should sum the scores of the rels. For example, user A and user B would now have a connection score of 5 (3 from KNOWS_VIA_SCHOOL and 2 from KNOWS_VIA_PHONE).
The plan I have so far is to query the different rels and collect the results into a list of [user, score] pairs, as seen below:
MATCH (user:User {id: $user_id})
OPTIONAL MATCH (user)-[:KNOWS_VIA_SCHOOL]->(schoolUser)
// Collect the [user, score] array
WITH user, collect([schoolUser, 3]) as usersSoFar
OPTIONAL MATCH (user)-[:KNOWS_VIA_PHONE]->(phoneUser)
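// Collect the [user, score] array -- note that the below code doesn't account for duplicates
// (aggregate first, then concatenate, so usersSoFar stays an explicit grouping key)
WITH user, usersSoFar, collect([phoneUser, 2]) AS phoneUsers
WITH user, usersSoFar + phoneUsers AS usersSoFar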
// return usersSoFar ordered by score
...etc
Notice that I used simple direct relationships for the example above, but the real data has multiple hops and is not as straightforward.
The issue here is that the usersSoFar collection now contains duplicates, and I can't find a way to merge them and sum their scores.
How it is now: [[{id: 1}, 3], [{id: 1}, 2], [{id: 2}, 2]]
How it should be: [[{id: 1}, 5], [{id: 2}, 2]]
// similar behavior to lodash.mergeBy
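One direction that might work for the merge, as an untested sketch: UNWIND the combined list back into rows and let sum() aggregate per user. The aliases phonePairs, pair and other are just illustrative names. The null filter is there because OPTIONAL MATCH can leave [null, score] pairs in the list when no relationship exists.

```cypher
MATCH (user:User {id: $user_id})
OPTIONAL MATCH (user)-[:KNOWS_VIA_SCHOOL]->(schoolUser)
WITH user, collect([schoolUser, 3]) AS usersSoFar
OPTIONAL MATCH (user)-[:KNOWS_VIA_PHONE]->(phoneUser)
WITH usersSoFar, collect([phoneUser, 2]) AS phonePairs
// Turn the combined list back into one row per [user, score] pair
UNWIND usersSoFar + phonePairs AS pair
WITH pair[0] AS other, pair[1] AS score
// Drop the [null, score] pairs left behind by OPTIONAL MATCH
WHERE other IS NOT NULL
// Grouping by `other` merges the duplicates; sum() adds up their scores
RETURN other, sum(score) AS score
ORDER BY score DESC
```

With the example data above, this should give one row per user, with user 1 at a score of 5 and user 2 at a score of 2.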
Questions:
- How can I merge the collections above properly?
- Is this approach the proper way to go? Or is there maybe a simpler way to score relationships in Cypher?
Notes:
This is a simplified version of the data; not all user relationships are direct. For example, sometimes the rels look like this: (User)--()--(User)
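For the indirect case, a shape that may be simpler than carrying a list around is to put each pattern in its own UNION ALL branch inside a subquery, return it with its score, and aggregate once at the end. This is an untested sketch that assumes Neo4j 4.1 or later, where a CALL { ... } subquery can import variables with a leading WITH. Only the two scores given above are used, and other is an illustrative alias. A multi-hop pattern such as (user)--()--(other:User) would be one more branch, with whatever score it should carry.

```cypher
MATCH (user:User {id: $user_id})
CALL {
  // Branch 1: direct school connections, scored 3
  WITH user
  MATCH (user)-[:KNOWS_VIA_SCHOOL]->(other:User)
  RETURN other, 3 AS score
  UNION ALL
  // Branch 2: direct phone connections, scored 2
  WITH user
  MATCH (user)-[:KNOWS_VIA_PHONE]->(other:User)
  RETURN other, 2 AS score
}
// One row per (other, score) hit; grouping by `other` sums across branches
RETURN other, sum(score) AS score
ORDER BY score DESC
```

UNION ALL rather than UNION matters here: UNION would deduplicate identical (other, score) rows before they can be summed.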