Detecting cycles using Cypher

While analyzing Java code using jQAssistant I'm frequently facing the problem to detect cycles between packages or modules. My current approach is using the following query:

MATCH
  (m1:Module)-[:DEPENDS_ON]->(m2:Module),
  cyclePath=shortestPath((m2)-[:DEPENDS_ON*]->(m1))
RETURN
  m1, nodes(cyclePath) as cycle

The query works well but the result is redundant: A cycle "a,[b,c,a]" is reported redundantly
i.e. "b,[c,a,b]" and "c,[a,b,c]".
It's clear why this happens - each involved module is a starting point for the cycle - but I'm currently looking for an efficient way to reduce this to just one result. Any ideas out there?

Cheers

Dirk

Yes, you can find the lowest (or highest) id node in the collection of cycle nodes, then pick the result with that node as the first:

MATCH
  (m1:Module)-[:DEPENDS_ON]->(m2:Module),
  cyclePath=shortestPath((m2)-[:DEPENDS_ON*]->(m1))
WITH
  m1, nodes(cyclePath) as cycle
WHERE id(m1) = apoc.coll.max([node in cycle | id(node)])
RETURN m1, cycle

This uses a collection helper function from APOC Procedures.

1 Like

Would be kinda cool to have a proc that would create normalized cycles from node + path
e.g. apoc.path.cycle(node, path)
by returning a path that always starts with the smallest id
then you could just use DISTINCT

Wouldn't it be even better to have a proc to detect cycles returning them in normalized form? The input must somehow represent a collection of nodes and a relationship type to traverse.

1 Like

Yes perhaps even Andrews expand procedures infrastructure could be used for that which are already quite powerful

Andrew,

I wasn't aware of apoc.coll.max, this is a nice use of it ...

Suggest to add an additional

WHERE m1 <> m2

to avoid shortestPath error when a self-reference is present

Krid