I'm building software for animal breeders that lets them create a "hypothetical mating" between two animals so they can view the potential traits of the would-be offspring. We store very basic information in Neo4j right now -- just Animal nodes with an animal_id property and the species, plus HAS_SIRE and HAS_DAM relationships between the nodes, to model the lineage from a more complex database stored in an RDBMS.
Our current hypothetical mating tool creates an Animal node with a negative animal_id and deletes it at the end of each request (we create the temporary node and relationships -> run queries against the DB -> detach delete the temp node). Is there a better way? I just came across virtual nodes, relationships, and graphs in the APOC library, which look very similar to what I'm doing internally.
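For reference, the temporary-node workflow looks roughly like the following. This is a simplified sketch rather than our exact production queries: it reuses the example ids from the queries in this post, and the tmp, sire, and dam variable names are only illustrative.

```cypher
// 1. Create the temporary offspring (negative animal_id) and link it to its parents
MATCH (sire:Animal {animal_id: 333})
MATCH (dam:Animal {animal_id: 321})
CREATE (tmp:Animal {animal_id: -333321})
CREATE (tmp)-[:HAS_SIRE]->(sire)
CREATE (tmp)-[:HAS_DAM]->(dam);

// 2. Run the lineage queries against the temporary node
MATCH ped = (tmp:Animal {animal_id: -333321})-[:HAS_SIRE|HAS_DAM*0..4]->(:Animal)
RETURN ped;

// 3. Clean up: remove the temporary node together with its relationships
MATCH (tmp:Animal {animal_id: -333321})
DETACH DELETE tmp;
```

The downside is that the temporary node really exists in the database between steps 1 and 3, which is what prompted me to look for an alternative.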
I've been experimenting with virtual nodes and relationships with a little success, but can't quite get what I'm looking for yet. Here's a sample query that actually returns something (the hypothetical offspring with the sire, the dam, and the HAS_SIRE/HAS_DAM relationships):
MATCH (hypoSire:Animal {animal_id: 333})
MATCH (hypoDam:Animal {animal_id: 321})
WITH hypoSire, hypoDam, apoc.create.vNode(['Animal'], {animal_id:-333321}) AS hypo
CALL apoc.create.vRelationship(hypo,'HAS_SIRE',{},hypoSire) YIELD rel as hypoSireRel
CALL apoc.create.vRelationship(hypo,'HAS_DAM',{},hypoDam) YIELD rel as hypoDamRel
RETURN *;
Now I'd like to take this a little further and query a 4-generation family lineage with this. Something like the following:
MATCH (hypoSire:Animal {animal_id: 333})
MATCH (hypoDam:Animal {animal_id: 321})
WITH hypoSire, hypoDam, apoc.create.vNode(['Animal'], {animal_id:-333321}) AS hypo
CALL apoc.create.vRelationship(hypo,'HAS_SIRE',{},hypoSire) YIELD rel as hypoSireRel
CALL apoc.create.vRelationship(hypo,'HAS_DAM',{},hypoDam) YIELD rel as hypoDamRel
MATCH ped = (hypo)-[:HAS_SIRE|HAS_DAM*0..4]->(ancestor:Animal)
RETURN ped;
In my final MATCH I've also tried replacing the (hypo) variable with the newly-created virtual node (:Animal {animal_id: -333321}), but it always returns an empty result.
So a few questions...
- Am I on the right track with using virtual nodes/relationships to solve this problem?
- Would virtual graphs be a better solution? I'm not sure how to use them, but it seems like they might be.
- If nodes/relationships are the answer, could someone help point me in the right direction for my 2nd query?
- How does Neo4j handle indexes and constraints with virtual nodes? If I have two completely separate requests simultaneously creating a node with a duplicate animal_id (which has a unique index/constraint), will one of them throw an error?
Thanks in advance!