Hi folks! I've got a toy problem I'd love to get some advice on -- I've got something working but get the feeling there may be a better approach. Here's the setup:
This database contains all of the eligible singles in a small city (i.e. we'll never have more than ~30,000 :Person nodes to worry about), and the goal is to pair these singles up with each other. The way I've modelled it is that certain :Quality nodes pair well together. John, a Person, has the Quality named "Funny", and that Quality pairs well with the "Sense of Humor" Quality. In an ideal world, we'd just do a pattern match on (single1:Person)-[:HAS]->(:Quality)-[:PAIRS_WITH]-(:Quality)<-[:HAS]-(single2:Person) and be done...
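Written out, that ideal-world query would be something like this (a sketch; nothing beyond the labels and relationship types above is assumed, and the inequality is just there to drop self-matches):

```cypher
// Pair up singles whose qualities pair well together.
// Note the colons on (:Quality) -- without them, Cypher would treat
// "Quality" as a variable and bind both nodes to the same thing.
MATCH (single1:Person)-[:HAS]->(:Quality)-[:PAIRS_WITH]-(:Quality)<-[:HAS]-(single2:Person)
WHERE single1 <> single2
RETURN DISTINCT single1, single2
```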
...but the singles are also superficial, and they have a type. So they've also got non-negotiable prerequisites for whom they'll date. Maybe John refuses to date a woman with green eyes. Or Abby will only date a person who owns a bicycle and a dog. (They can have multiple types, though: maybe somebody will date either a 6-footer with black hair OR a blue-eyed sports fan.)
The way I've modelled that is to bundle up everybody's prerequisites into :MustHave nodes. The :MustHaves are connected to :Attribute nodes by :DESCRIBES relationships. So Abby's :MustHave would have two inbound links, one from the "Owns Dog" :Attribute and one from the "Owns Bike" :Attribute.

The trouble I have is with the all-or-nothing nature of these. What I've been doing is counting up how many prerequisites are connected to a :MustHave node, doing a path expansion to find connected :Person nodes, and then doing some aggregation to confirm that the :Person in question has the same number of links to the :MustHave as the :MustHave has inbound links.
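In Cypher, that all-or-nothing check looks roughly like this (a sketch: I'm assuming the edges run (:Attribute)-[:DESCRIBES]->(:MustHave) and that a :Person is linked to the :Attributes they possess via :HAS -- the directions and the person-to-attribute relationship type are assumptions, not gospel):

```cypher
// Count each :MustHave's prerequisites, then keep only candidates
// who hit every single one of them.
MATCH (mh:MustHave)<-[:DESCRIBES]-(attr:Attribute)
WITH mh, count(attr) AS required
MATCH (mh)<-[:DESCRIBES]-(:Attribute)<-[:HAS]-(candidate:Person)
WITH mh, required, candidate, count(*) AS satisfied
WHERE satisfied = required  // all-or-nothing: partial matches are dropped
RETURN mh, candidate
```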
So that's been working okay, but I think I'm falling victim to combinatorial blow-up as I search for these potential matches across the city. I figure that the paired :Quality nodes are a great place to start, since there's a pretty modest number of potential Quality combos, and we know that without paired qualities there's no match at all. But when I profile the query, I'm seeing a huge number of potential rows & db hits from the operation that figures out who satisfies someone else's prerequisites. If I consider 8 :MustHaves, things execute in <1s; if I look at 60 :MustHaves, it takes around 20 seconds. This is with a toy dataset, so I think there's going to be a whole lot more :MustHaves coming as I add in more picky singles.
I think if all else fails, I can reduce my cardinality by only considering one :Person at a time, since each person will only have a few :MustHave nodes connected to them. I guess I'm worried that that may not be enough once I've fleshed out all the potential Quality pairings and get up to 30,000 :Persons.
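The one-person-at-a-time version would look something like this (:REQUIRES is a name I'm making up here for whatever relationship links a :Person to their own :MustHaves, and the `name` lookup is illustrative):

```cypher
// Restrict the search to a single person's prerequisites to cut cardinality.
MATCH (me:Person {name: 'Abby'})-[:REQUIRES]->(mh:MustHave)<-[:DESCRIBES]-(attr:Attribute)
WITH me, mh, count(attr) AS required
MATCH (mh)<-[:DESCRIBES]-(:Attribute)<-[:HAS]-(candidate:Person)
WHERE candidate <> me
WITH mh, required, candidate, count(*) AS satisfied
WHERE satisfied = required
RETURN candidate
```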
So here are the things I'm curious about:
1.) Would you model the :MustHaves in a different way? I like the idea of representing a "type" in the database, and being able to quickly see that Bob fits the "Tall Dark and Handsome" type.
2.) Do you think it's a big deal that I'm calculating the eligible singles on the fly each time? Would it make more sense to run these queries once, figure out who satisfies which :MustHaves, and draw a direct relationship from a person to the :MustHave? My goal is to keep the graph as sparse as possible, but I'm willing to junk it up a bit if there's some speed to be gained. My initial test didn't seem to make much difference, however, which makes me think it's the size of the possibility space, and not the computation, that's bogging me down.
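For reference, the precomputation I tested was along these lines (:SATISFIES is just a name I picked for the materialized edge; directions assumed as before):

```cypher
// Precompute who satisfies which :MustHave and store it as a direct edge,
// so match-time queries can skip the counting entirely.
MATCH (mh:MustHave)<-[:DESCRIBES]-(attr:Attribute)
WITH mh, count(attr) AS required
MATCH (mh)<-[:DESCRIBES]-(:Attribute)<-[:HAS]-(p:Person)
WITH mh, required, p, count(*) AS satisfied
WHERE satisfied = required
MERGE (p)-[:SATISFIES]->(mh)  // idempotent: re-running won't duplicate edges
```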
Again, this is a toy problem and I'm very happy to take blue sky or highly vague advice. I'm just trying to develop my nose for code smells in the world of graph databases, really.