Hi,
I have an organisation chart modelled as a hierarchy in Neo, and I'm trying to find an efficient way of answering, "who reports to a given individual?"
Here are the rules:
- All Departments apart from the root are PART_OF exactly one parent Department
- People WORK_IN Departments; some people WORK_IN a Department as head
Here's an example:
People report to the next-highest head of a Department:
- Erin straightforwardly reports to Alice
- Bob also reports to Alice, because 'Widgets', the parent of Bob's Department, doesn't have a head, so he skips that one and reports to the head of parent of the 'Widgets' Department.
Here's the Cypher to create the above structure.
CREATE (ceo:Department {name: 'CEO'})
CREATE (w:Department {name: 'Widgets'})-[:PART_OF]->(ceo)
CREATE (sw:Department {name: 'Software Widgets'})-[:PART_OF]->(w)
CREATE (hw:Department {name: 'Hardware Widgets'})-[:PART_OF]->(w)
CREATE (rprs:Department {name: 'Rocket-Powered Roller Skates'})-[:PART_OF]->(hw)
CREATE (jpps:Department {name: 'Jet-Powered Pogo Sticks'})-[:PART_OF]->(hw)
CREATE (s:Department {name: 'Sales'})-[:PART_OF]->(ceo)
CREATE (:Person {name: 'Alice'})-[:WORKS_IN {head: true}]->(ceo)
CREATE (:Person {name: 'Bob'})-[:WORKS_IN {head: true}]->(sw)
CREATE (:Person {name: 'Charlie'})-[:WORKS_IN {head: true}]->(hw)
CREATE (:Person {name: 'Dave'})-[:WORKS_IN]->(hw)
CREATE (:Person {name: 'Erin'})-[:WORKS_IN]->(s)
I found it's possible given a Department to identify all descendent Departments up to the first with a head using the following Cypher:
MATCH (root:Department {name: 'CEO'})<-[:PART_OF*]-(d)
WHERE NOT EXISTS {
(root)-[:PART_OF*]-(intermediate)-[:PART_OF*]-(d), (intermediate)-[:WORKS_IN {head: true}]-()
}
RETURN root, d
This is more or less saying "find all the descendant Departments of root except those that can only be reached by passing through another Department that has a head."
I was pretty pleased with this, until I tried it on a large data set. It turns out that starting out from a root with a few hundred descendants, it's fairly slow, and when you get into the thousands, it's unusable.
It appears there are two things that are bad about my query: the MATCH generates all paths to all reachable dependants, and then the WHERE NOT EXISTS clause has work to do on each path that grows (to the square?) with the length of path in question.
I suspect there's a better way of writing this query - the MATCH really doesn't need to generate all those paths, because it can stop descending the hierarchy whenever it encounters a Department with a head, however, I don't know how to express that in Cypher. Perhaps there's a different way of approaching it entirely I haven't considered?
Can anyone help me with how to go about this?