Hi folks! I've got a toy problem I'd love to get some advice on -- I've got something working but get the feeling there may be a better approach. Here's the setup:
This database contains all of the eligible singles in a small city (i.e. we'll never have more than ~30,000 :Person nodes to worry about), and the goal is to pair these singles up with each other. The way I've modelled it is that certain :Quality nodes pair well together. John, a Person, has the Quality named "Funny", and that Quality pairs well with the "Sense of Humor" Quality. In an ideal world, we'd just do a pattern match like (single1:Person)-[:HAS]->(:Quality)-[:PAIRS_WITH]-(:Quality)<-[:HAS]-(single2:Person) and be done...
...but the singles are also superficial, and they have a type. So they've also got non-negotiable prerequisites for whom they'll date. Maybe John refuses to date a woman with green eyes. Or Abby will only date a person who owns a bicycle and a dog. (They can have multiple types, though: maybe somebody will date either a 6-footer with black hair OR a blue-eyed sports fan.)
The way I've modelled that is to bundle up everybody's prerequisites into :MustHave nodes. The :MustHaves are connected to :Attributes via :DESCRIBES relationships. So Abby's :MustHave would have two inbound links, one from the "Owns Dog" :Attribute and one from the "Owns Bike" :Attribute. The trouble I have is with the all-or-nothing nature of these. What I've been doing is counting up how many prerequisites are connected to a :MustHave node, doing a path expansion to find connected :Persons, and then doing some aggregation to confirm that the :Person in question has the same number of links to the :MustHave as the :MustHave has inbound links.
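In Cypher, that check is roughly the following (a simplified sketch: the :HAS relationship between a :Person and an :Attribute here is a stand-in for whatever the real relationship is called):

    // Count how many :Attribute nodes describe each :MustHave...
    MATCH (mh:MustHave)<-[:DESCRIBES]-(req:Attribute)
    WITH mh, count(req) AS requiredCount
    // ...then expand to people who have at least one of those attributes
    // and keep only the ones who have every single one of them.
    MATCH (mh)<-[:DESCRIBES]-(req:Attribute)<-[:HAS]-(candidate:Person)
    WITH mh, requiredCount, candidate, count(DISTINCT req) AS satisfiedCount
    WHERE satisfiedCount = requiredCount
    RETURN mh, collect(candidate) AS peopleWhoFitTheType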
So that's been working okay, but I think I'm falling victim to combinatorial blow-up as I search for these potential matches in the city. I figure that the paired :Quality nodes are a great place to start, since there's a pretty modest number of potential Quality combos, and we know that without paired qualities, there's no match at all. But when I profile the query, I'm seeing a huge number of potential rows & db hits from the operation that figures out who satisfies someone else's prerequisites. If I consider 8 :MustHaves, things execute in <1s. If I look at 60 :MustHaves, it takes around 20 seconds. This is with a toy dataset, so I think there's going to be a whole lot more :MustHaves coming as I add in more picky singles.
I think if all else fails, I can reduce my cardinality by only considering one :Person at a time, since they'll only have a few :MustHave nodes connected to them. I guess I'm worried that that may not be enough once I've fleshed out all the potential Quality pairings and get up to 30,000 :Person nodes.
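Roughly, that one-person-at-a-time version would look like the sketch below (the :REQUIRES relationship between a person and their own :MustHave, and :HAS between a person and an :Attribute, are stand-in names, and this only checks my prerequisites, not the candidate's mirror-image ones):

    // Anchor on a single person to keep cardinality down ($name is a parameter).
    MATCH (me:Person {name: $name})-[:REQUIRES]->(mh:MustHave)<-[:DESCRIBES]-(req:Attribute)
    WITH me, mh, collect(req) AS required
    // Start from the paired qualities, since no paired qualities means no match at all...
    MATCH (me)-[:HAS]->(:Quality)-[:PAIRS_WITH]-(:Quality)<-[:HAS]-(candidate:Person)
    // ...then apply the all-or-nothing prerequisite check for this :MustHave.
    WHERE candidate <> me
      AND all(a IN required WHERE (candidate)-[:HAS]->(a))
    RETURN DISTINCT candidate   // multiple :MustHaves behave like an OR here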
So here are the things I'm curious about:
1.) Would you model the :MustHaves in a different way? I like the idea of representing a "type" in the database, and being able to quickly see that Bob fits the "Tall Dark and Handsome" type.
2.) Do you think it's a big deal that I'm calculating the eligible singles on the fly each time? Would it make more sense to run these queries once, figure out who satisfies which :MustHaves, and draw a direct relationship from a person to the :MustHave (something like the sketch just after this list)? My goal is to keep the graph as sparse as possible, but I'm willing to junk it up a bit if there's some speed to be gained. My initial test didn't seem to make much difference, however, which makes me think it's the size of the possibility space and not the computation that's bogging me down.
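For concreteness, the precomputation I have in mind is a periodic write query along these lines (the :SATISFIES relationship name is just a placeholder):

    // Materialize the all-or-nothing check as a direct relationship so the
    // matching query can simply pattern-match (:Person)-[:SATISFIES]->(:MustHave).
    MATCH (mh:MustHave)<-[:DESCRIBES]-(req:Attribute)
    WITH mh, count(req) AS requiredCount
    MATCH (mh)<-[:DESCRIBES]-(req:Attribute)<-[:HAS]-(p:Person)
    WITH mh, requiredCount, p, count(DISTINCT req) AS satisfiedCount
    WHERE satisfiedCount = requiredCount
    MERGE (p)-[:SATISFIES]->(mh)

The read side would then just be the quality-pairing pattern plus a single (:Person)-[:SATISFIES]->(:MustHave) hop.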
Again, this is a toy problem and I'm very happy to take blue sky or highly vague advice. I'm just trying to develop my nose for code smells in the world of graph databases, really.