Cypher query to get root node and it's levels

Hi guys, I need help with query to fetch the root nodes and its levels (from the root node)

Data Structure Format

Here each nodes are connected with [:HAS_CHILD] releationship. Each of the nodes have a property uniq_id which is used to relate them together

So I am trying to find the root nodes here. The query I am using is

MATCH (w:Node{uniq_id: <some_uniq_id>})
WHERE NOT (w)<-[:HAS_CHILD]-()
WITH COLLECT(w) AS roots
UNWIND roots as root
Return root as RootNodes, 0 as Level

Then using the root node to traverse the graph to gets its child levels

MATCH (w:Node {uniq_id: <uniq_id>)
WHERE NOT (w)<-[:HAS_CHILD]-()
WITH COLLECT(w) AS roots

UNWIND roots AS root
MATCH path=(root)-[:HAS_CHILD*]->(child)
WITH root, COLLECT(DISTINCT child) AS descendants, path
RETURN DISTINCT descendants, REDUCE(s = 0, r IN RELATIONSHIPS(path) | s + 1) AS level

But here both P1 and C3 are getting considered as the root nodes, but the use case is to get P1 as the root node and define its level 0, and then come down each depth and set a level to all the nodes (directly/indirectly)

So something like
p1 - level 0
c1,c2,c3 - level 1
gc - level 2

Note: There can be multiple valid root nodes as long as they are at the top level (level=0)

Not sure if there is any apoc function that can help with my use case here. Also, is there a better way to query my use case here.

Any help would be appreciated :slight_smile:

What are roots in your diagram depends on your definition of a root node. It makes sense to me that a root node should not have a parent node, i.e., it is not a child of any node. With this definition, both P1 and C3 qualify.

To me, it makes sense to define the level of a node on a subgraph as its distance from the subgraph’s root node. The root node being at the top is at level 0.

What is a subgraph? In your case, where you have hierarchical data related by a HAS_CHILD relationship, it makes sense to define a subgraph as a root node and the collection of nodes descending from the root node. With this definition, your graph has two subgraphs, one with P1 as the root and one with C3 as the root. P1’s subgraph consists of all nodes except C3 and C3’s subgraph is just C3 and GC.

Your conclusion that C3 is at level 1 seems to be a result of how you drew your diagram. You could have just as easily drawn C3 adjacent to P1, making it look like level zero.

I believe the correct answer is determined by your definitions, which depend on your use case.

It seems like you want to consider the graph as one graph. You can determine the level of C3 relative to P1, as you see it, by counting the number of hops when traversing from P1 to C3, where an outgoing relationship adds to the count and an incoming relationship subtracts from the count. Thus would be computed on the shortest path if multiple exist. Using this approach results in a level of 1 + 1 - 1 = 1.

BTW, in the first query you collect something and then you unwind it, thus returning what you collected. You should get identical results from this simplified version.

MATCH (w:Node{uniq_id: <some_uniq_id>})
WHERE NOT EXISTS( (w)<-[:HAS_CHILD]-() )
Return w as RootNodes, 0 as Level

Will you get multiple root nodes when your match statement contains the constraint for uniq_id? Instead, is the follow more accurate for determining the root nodes:

MATCH (w)
WHERE NOT EXISTS( (w)<-[:HAS_CHILD]-() )
Return w as RootNodes, 0 as Level

As a final note, I think you can replace your reduce operation with the size of the relationship list, as your reduce effecting is counting the number of elements. The following should be equivalent:

size( RELATIONSHIPS(path) )

Actually, I think it is also equalent to the length of the path.

length(path)

@glilienfield Thanks for the reply, apologies in the delay in response. I have a couple of followup questions.

Your conclusion that C3 is at level 1 seems to be a result of how you drew your diagram. You could have just as easily drawn C3 adjacent to P1, making it look like level zero.

Here, C3 is level 1 and not in the level of P1 because of the no of relationships. P1 has 2 [:HAS_CHILD] relationships: P1->C2, C2->GC whereas C3 has one from the leaf node (GC) so this makes it at the level of C1,C2, C3.
An anology here would be Grandfather, Parent, Child, Grandchild, so in this use case we are essentially trying to find the top level (level 0) nodes only(hirerachy wise) and not just nodes without a parent.

It seems like you want to consider the graph as one graph. You can determine the level of C3 relative to P1, as you see it, by counting the number of hops when traversing from P1 to C3, where an outgoing relationship adds to the count and an incoming relationship subtracts from the count. Thus would be computed on the shortest path if multiple exist. Using this approach results in a level of 1 + 1 - 1 = 1.

Here, could you share some more light in terms of implementation (or maybe query snippet) of how that would look like?
Reason being:

  1. In this example P1,C3 are obvious, but ideally these would be computed dynamically and so there can be multiple valid root nodes
  2. Once we compute the dynamic root nodes, we need to find the levels (which was the second query)

So in a whole, based out of the example shared, my response should look like:
P1: 0 level
C1, C2, C3: 1 level
GC: 2 level

Any help or direction would greatly be appreciated. I'm new to Cypher and Graph Concepts :slight_smile:

A level is a relative measure describing the distance of one entity in relationship to another entity. The levels of a collection of nodes has to be measured against the same reference node. I view your graph as a collection of nodes having parent-child relationships. Your graph has two root nodes (nodes without any parents). Each root node and all its descendant child nodes forms a subgraph. The level of a node in the subgraph is the distance it is from its root node. A node can have a different level for each subgraph it is a member of. For example, GS's level is two for the subgraph originating from P1, while it is one for the subgraph origination from C3.

You seem to have a different definition of level, since you are viewing the graph differently for your use case. If you consider your graph an example of a family tree from the perspective of P1, then C1 and C2 are children, GC is a grandchild, and C3 is a daughter or son in law. From your description, you are defining level zero as the reference node (P1), level two as the grandchild nodes, and level one as the children and their spouses.

The following query extracts the info you want:

match(p1:Node{name:'P1'})
return {
    zero: [p1.name],
    one: [(p1)-[:HAS_CHILD]->(c)|c.name] + [(p1)-[:HAS_CHILD*2]->()<-[:HAS_CHILD]-(c)|c.name],
    two: [(p1)-[:HAS_CHILD*2]->(c)|c.name]
} as levels