Hi folks! I've got a toy problem I'd love to get some advice on -- I've got something working but get the feeling there may be a better approach. Here's the setup:
This database contains all of the eligible singles in a small city (i.e. we'll never have more than ~30,000 :Person nodes to worry about), and the goal is to pair these singles up with each other. The way I've modelled it is that certain :Quality nodes pair well together. John, a Person, has the Quality named "Funny", and that Quality pairs well with the "Sense of Humor" Quality. In an ideal world, we'd just do a pattern match like (single1:Person)-[:HAS]->(:Quality)-[:PAIRS_WITH]-(:Quality)<-[:HAS]-(single2:Person) and be done...
...but the singles are also superficial, and they have a type. So they've also got non-negotiable prerequisites for whom they'll date. Maybe John refuses to date a woman with green eyes. Or Abby will only date a person who owns a bicycle and a dog. (They can have multiple types, though: maybe somebody will date either a 6-footer with black hair OR a blue-eyed sports fan.)
The way I've modelled that is to bundle up everybody's prerequisites into :MustHave nodes. The :MustHaves are connected to :Attributes via :DESCRIBES relationships. So Abby's :MustHave would have two inbound links, one from the "Owns Dog" :Attribute and one from the "Owns Bike" :Attribute. The trouble I have is with the all-or-nothing nature of these. What I've been doing is counting up how many prerequisites are connected to a :MustHave node, doing a path expansion to find connected :Persons, and then doing some aggregation to confirm that the :Person in question has the same number of links to the :MustHave as the :MustHave has inbound links.
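In Cypher, that check is roughly the following (a simplified sketch: the :HAS relationship between a :Person and an :Attribute here is a stand-in for whatever the real relationship is called):

    // Count how many :Attribute nodes describe each :MustHave...
    MATCH (mh:MustHave)<-[:DESCRIBES]-(req:Attribute)
    WITH mh, count(req) AS requiredCount
    // ...then expand to people who have at least one of those attributes
    // and keep only the ones who have every single one of them.
    MATCH (mh)<-[:DESCRIBES]-(req:Attribute)<-[:HAS]-(candidate:Person)
    WITH mh, requiredCount, candidate, count(DISTINCT req) AS satisfiedCount
    WHERE satisfiedCount = requiredCount
    RETURN mh, collect(candidate) AS peopleWhoFitTheType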
So that's been working okay, but I think I'm falling victim to combinatorial blow-up as I search for these potential matches in the city. I figure that the paired :Quality nodes are a great place to start, since there's a pretty modest number of potential Quality combos, and we know that without paired qualities, there's no match at all. But when I profile the query, I'm seeing a huge number of potential rows & db hits from the operation that figures out who satisfies someone else's prerequisites. If I consider 8 :MustHaves, things execute in <1s. If I look at 60 :MustHaves, it takes around 20 seconds. This is with a toy dataset, so I think there's going to be a whole lot more :MustHaves coming as I add in more picky singles.
I think if all else fails, I can reduce my cardinality by only considering one :Person at a time, since they'll only have a few :MustHave nodes connected to them. I guess I'm worried that that may not be enough once I've fleshed out all the potential Quality pairings and get up to 30,000 :Person nodes.
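Roughly, that one-person-at-a-time version would look like the sketch below (the :REQUIRES relationship between a person and their own :MustHave, and :HAS between a person and an :Attribute, are stand-in names, and this only checks my prerequisites, not the candidate's mirror-image ones):

    // Anchor on a single person to keep cardinality down ($name is a parameter).
    MATCH (me:Person {name: $name})-[:REQUIRES]->(mh:MustHave)<-[:DESCRIBES]-(req:Attribute)
    WITH me, mh, collect(req) AS required
    // Start from the paired qualities, since no paired qualities means no match at all...
    MATCH (me)-[:HAS]->(:Quality)-[:PAIRS_WITH]-(:Quality)<-[:HAS]-(candidate:Person)
    // ...then apply the all-or-nothing prerequisite check for this :MustHave.
    WHERE candidate <> me
      AND all(a IN required WHERE (candidate)-[:HAS]->(a))
    RETURN DISTINCT candidate   // multiple :MustHaves behave like an OR here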
So here are the things I'm curious about:
1.) Would you model the :MustHaves in a different way? I like the idea of representing a "type" in the database, and being able to quickly see that Bob fits the "Tall Dark and Handsome" type.
2.) Do you think it's a big deal that I'm calculating the eligible singles on the fly each time? Would it make more sense to run these queries once, figure out who satisfies which :MustHaves, and draw a direct relationship from a person to the :MustHave (something like the sketch just after this list)? My goal is to keep the graph as sparse as possible, but I'm willing to junk it up a bit if there's some speed to be gained. My initial test didn't seem to make much difference, however, which makes me think it's the size of the possibility space and not the computation that's bogging me down.
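For concreteness, the precomputation I have in mind is a periodic write query along these lines (the :SATISFIES relationship name is just a placeholder):

    // Materialize the all-or-nothing check as a direct relationship so the
    // matching query can simply pattern-match (:Person)-[:SATISFIES]->(:MustHave).
    MATCH (mh:MustHave)<-[:DESCRIBES]-(req:Attribute)
    WITH mh, count(req) AS requiredCount
    MATCH (mh)<-[:DESCRIBES]-(req:Attribute)<-[:HAS]-(p:Person)
    WITH mh, requiredCount, p, count(DISTINCT req) AS satisfiedCount
    WHERE satisfiedCount = requiredCount
    MERGE (p)-[:SATISFIES]->(mh)

The read side would then just be the quality-pairing pattern plus a single (:Person)-[:SATISFIES]->(:MustHave) hop.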
Again, this is a toy problem and I'm very happy to take blue sky or highly vague advice. I'm just trying to develop my nose for code smells in the world of graph databases, really.