Detecting cycles using Cypher


(Dirk Mahler) #1

While analyzing Java code using jQAssistant I'm frequently facing the problem to detect cycles between packages or modules. My current approach is using the following query:

MATCH
  (m1:Module)-[:DEPENDS_ON]->(m2:Module),
  cyclePath=shortestPath((m2)-[:DEPENDS_ON*]->(m1))
RETURN
  m1, nodes(cyclePath) as cycle

The query works well but the result is redundant: A cycle "a,[b,c,a]" is reported redundantly
i.e. "b,[c,a,b]" and "c,[a,b,c]".
It's clear why this happens - each involved module is a starting point for the cycle - but I'm currently looking for an efficient way to reduce this to just one result. Any ideas out there?

Cheers

Dirk


(Andrew Bowman) #2

Yes, you can find the lowest (or highest) id node in the collection of cycle nodes, then pick the result with that node as the first:

MATCH
  (m1:Module)-[:DEPENDS_ON]->(m2:Module),
  cyclePath=shortestPath((m2)-[:DEPENDS_ON*]->(m1))
WITH
  m1, nodes(cyclePath) as cycle
WHERE id(m1) = apoc.coll.max([node in cycle | id(node)])
RETURN m1, cycle

This uses a collection helper function from APOC Procedures.


(Michael Hunger) #3

Would be kinda cool to have a proc that would create normalized cycles from node + path
e.g. apoc.path.cycle(node, path)
by returning a path that always starts with the smallest id
then you could just use DISTINCT


(Dirk Mahler) #4

Wouldn't it be even better to have a proc to detect cycles returning them in normalized form? The input must somehow represent a collection of nodes and a relationship type to traverse.


(Michael Hunger) #5

Yes perhaps even Andrews expand procedures infrastructure could be used for that which are already quite powerful


(Paul Thomas) #6

Andrew,

I wasn't aware of apoc.coll.max, this is a nice use of it ...