How to add scores to relationships of a recommendation engine

Hey there!

I am building a simple social network like recommendation engine that should score the users connections to each other.

I have a question about adding scores to relationships:

Graph schema:

(User)-[:KNOWS_VIA_SCHOOL]->(User)
(User)-[:KNOWS_VIA_PHONE]->(User)
(User)-[:WORKED_TOGETHER]->(User)
(User)-[:TRAVELLED_TOGETHER]->(User)

How can I create a simple recommendation engine for the data above based on the following constraints:

  • Each relationship should be given a score.
  • For instance, if a user A has a KNOWS_VIA_SCHOOL rel with another user B it has a score of 3, and if user A has a KNOWS_VIA_PHONE rel with user B it has a score of 2.
  • If user A has two rels with user B, we should sum the score of the rels. For example now user A and user B have a connection score of 5 (3 because of KNOWS_VIA_SCHOOL and 2 because of KNOWS_VIA_PHONE)

The plan I have so far is to query the different rels and collect them in collections as a tuple of [user, score] as seen below:

   Match(user: User { id: $user_id })

    OPTIONAL MATCH (user)-[:KNOWS_VIA_SCHOOL]->(schoolUser)
    // Collect the [user, score] array
    WITH user, collect([schoolUser, 3]) as usersSoFar

    OPTIONAL MATCH (user)-[:KNOWS_VIA_PHONE]->(phoneUser)
    // Collect the [user, score] array -- note that the below code doesn't account for duplicates
    WITH user, usersSoFar + collect([phoneUser, 2]) as usersSoFar

    // return usersSoFar ordered by score
...etc

Notice that I used simple direct relationships for the example below but the real data has multiple hops and is not as straighforward.

The issue here is that the usersSoFar collection now has duplicates that I can't merge together and sum the scores of.

How it is now: [[{id: 1}, score: 3], [{id: 1}, score: 2], [{id: 2}, score: 2]]
How it should be: [[{id: 1}, score: 5], [{id: 2}, score: 2]] // similar behavior to lodash.mergeBy

Questions:

  • How can I merge the collections above properly?
  • Is this approach the proper way to go? Or is there maybe a simpler way to score relationships in Cypher?


Notes:
This is a simplified version of the data, not all user relationships are direct. for example sometimes the rels are as follows (User)--()--(User)

One option would be to add a score value to the relationship.

Say you are creating KNOWS_VIA_SCHOOL between u1 and u2 you can set the property like this.

MERGE (u1)-[r:KNOWS_VIA_SCHOOL]->(u20
SET r.score=3

You can do the same for all the relationships.

Now to find the total score this way

MATCH(user: User { id: $user_id })
MATCH (user)-[r]->(other:User)
return u, other, sum(r.score) as score

This will give a score between user and other.

Other option is to have a separate relationship, says HAS_REL which contains the aggregated score between those users. You need to keep the scores updated when ever the relationships are created or removed.

Advantage of having this relationship is you can use only this relationship to traverse among users and do complex operations.

1 Like

@anthapu Thanks for the ideas!

Adding the score value to the relationship works fine in case of the query above due to the simple behavior of fetching via one pattern. The issue in my query is that not everything can be done on a single relationship level, so I might need to add another MATCH there and collect the results in a collection and UNWIND later making sure that I only leave distinct users.

eg:

MATCH(user: User { id: $user_id })
MATCH (user)-[r]->(other:User)
with user, collect([other, r]) as dataSoFar
MATCH (user)-[x]-()-[y]-(other)
with user dataSoFar + collect([other, y]) // here we choose the y rel not x since it's what matters
// here we are stuck with the collection above which might have the same user twice, with different rels eg: [[Tom, { score: 2 }], [Elona, { score: 3 }], [Tom, { score: 5 }]] -- Tom is duplicated here, he should be there once with a score of 7

The issue now is that the collection will have either:
– the user without the relationship, in which case I don't have the score anymore
– a tuple of the user and the relationship eg: [user, rel]. But in this case I have no way of summing the scores later


I think the idea of enriching the graph beforehand makes more sense. However how would 2nd and 3rd degree connections factor in with that? Would I have to create 2nd degree knows relationships every time 2 people connect?
I am not sure about how this would behave performance wise.

@A-Tokyo I am failing to understand your requirement. Why are you trying to manually look at each step?

Can you not use path using variable expand?

MATCH(user: User { id: $user_id })
MATCH p=(user)-[r*1..7]->(other:User)
WITH p, other, reduce(s = 0, x IN relationships(p) | s + x.score) as totalScore
return u, other, totalScore

This gives the whole path cost to reach other.

Also, it wold always be better to use directions in your query.

I am not trying to manually look at each step, I simply want a combination of different paths between users.

I think the path variable might work, have not tried it before. Will give it a try.

I can make do with the total cost of relationships in the path. The main point was that not all knows relationships are direct. Some of them have to be indirect.

The reason I try to look at each step is to combine the scores of all the ways one person might know the other.

The relationships are a lot, and I would like to either combine the scores or at least show the user the relationship label. eg: when knows_via_phone users are fetched we can show some label like You know Tony via phonebook. The graph is not enriched with direct rels before hand like mentioned before.

Thanks for the note! Direction was omitted on purpose for the sake of easiness

MATCH(user: User { id: $user_id })
MATCH p=(user)-[r*1..7]->(other:User)
WITH p, other, reduce(s = 0, x IN relationships(p) | s + x.score) as score
return u, other, sum(totalScore)

If you use this query you will get sum of all the paths no matter how you reach the other person.

This simple query

MATCH(user: User { id: $user_id })
MATCH (user)-[r]->(other:User)
return u, other, sum(r.score) as score

returns the sum of all the relationships the user has.

If you do want to have the types also you can change the query to

MATCH(user: User { id: $user_id })
MATCH (user)-[r]->(other:User)
return u, other, sum(r.score) as score, collect(type(r)) as relations

This will give the total score and how the users are related. You can use this information in the UI to show the message like you mentioned

For longer paths you might have to collect the paths in UI or calculate the sum in cypher and list of paths and decipher them in UI

Hi!
have you considered to use "personalized page rank" ?
We have developed a fast implementation for Neo4j: look here