Find graphs with broken parent/child relationship

Hi there. Trying to formalize my issue and find the direction to the solution.

I have many graphs of parent/child relationships.

In a VALID graph:

  1. Child node has only one parent (HAS_PARENT relation)
  2. Parent node can have multiple children (HAS_CHILDS relation)
  3. Parent node has HAS_CHILD out and HAS_PARENT in relation pairs to the same child node
  4. Child has relations HAS_PARENT (out) and HAS_CHILD (in) pair relation from/to the same parent node
  5. Childs cannot be connected to another child
  6. Parent cannot be connected to another parent
  7. There should be only 1 level of relationship between parent and child nodes.

Not sure I did define the graph in a way it can be formalized. The idea is to find not valid graphs, and it looks like defining a valid graph is easier than defining an invalid one (because there can be many combinations of invalid relations).

I'll try to show it in the picture.

for the spanning tree by some property (account in that case)

MATCH (p:Listing {accountId: '.......'})
CALL apoc.path.spanningTree(p, {
    minLevel: 1,
    maxLevel: 5
})
YIELD path
RETURN path;


Asking for direction how can I detect invalid graphs. I was thinking about taking the spanning-tree query result, and programmatically iterating the nodes and checking for the constraints. for example:

  1. Finding the parent (where all his out relation are has_child and all the in relations are has_parent), and the
    If no such node, the graph is invalid
  2. For all other nodes, check if it has has_parent/has_child pair relation to the parent node, and no another relations.

But maybe there is a simpler solution just by using query only, or another by a transformation of the graph representation to be known problem with well know algorithm (kruskal, Dijkstra, whatever).

P.S.
Thanks!

Hello @spaizadv :slight_smile:

Do you have one graph by database of several graphs in the same database?

What I would do is write as many queries for each check to do, then apply those queries to each graph. I think it would be more efficient.

Regards,
Cobra

I agree with @Cobra. You can write individual queries to detect each of the invalid conditions you enumerated above. You can add remediate steps to each script since you will know the specific invalid condition that script is addressing.

As an example using the scenario you highlighted in your screenshot, you are looking for a node that has more than one incoming HAS_CHILD relationships. This represents a node with multiple parents.

match(n)<-[:HAS_CHILD]-(m)
with n, count(m) as noOfParents
where noOfParents > 1
return n

Based on your model, a node can also have multiple parents if it has more than one outgoing HAS_PARENT relationship, which is invalid. You can write a query to detect this relation as well.

Write a script for each invalid scenario.

Btw, if you want to create one query that identifies all invalid scenarios to give you a consolidated report, you can string them together using the UNION clause. The example below shows two; add as many as needed.

Call {
match(n)<-[:HAS_CHILD]-(m)
with n, count(m) as noOfParents
where noOfParents > 1
return id(n) as id , ‘multiple incoming HAS_CHILD relationships’ as cause
UNION
match(m)<-[:HAS_PARENT]-(n)
with n, count(m) as noOfParents
where noOfParents > 1
return id(n) as id , ‘multiple outgoing HAS_PARENT relationships’ as cause
}
return id, cause

Thanks. I could write conditions per case even without Neo4J. This is what I'm trying to avoid. There are much more "invalid" cases than "valid" cases.

In my head I looks like define some rules for valid case, and then just do A-B where A is a set of allgraphs and B is set of valid graphs.