Finding all connected nodes in a cyclic graph

harshit_cr07 · February 17, 2020, 7:02am

I have a graph with lot of cycles in it. I want to find all nodes connected to a specific node by any length of path.
I've tried this
match (a{firstName:"XYZ"})-[ * ]-(b) return distinct b
but i think this got stuck in a cycle.
So, is their any way to do this?
Also, i was curious to know that can we find path with longest length and having distinct nodes in a cyclic graph?

Joel · February 17, 2020, 7:30pm

In answer to your first question, you might want to come at it from another angle (if you are able to add properties onto the nodes). I believe what you are describing is counting the nodes in a subgraph that is disconnected from the rest of the nodes in a Neo4j db? There is at least one algorithm in apoc that can be used to label all nodes in every disconnected graph with a unique group number, once that is complete then a cypher to count the nodes in a group (all with the same group number) is quick and easy.

More information on a related thread

Doing this can be very useful to get a perspective on what is in a graph, the characteristics and number of disconnected subgraphs can provide insights.

intouch_vivek · February 17, 2020, 10:53pm

Hi Harshit,
Could you please share any dataset and your required pattern?
I tried with below

A	B
B	C
C	D
D	A
D	E
E	F
F	G

match path = (n:Person{name:'A'})-->(n:Person{name:'A'}) return path
and it gave me one cyclic path only.
A-B-C-D-A

match path = (n:Person{name:'E'})-->(n:Person{name:'E'}) return path
no path

harshit_cr07 · February 18, 2020, 5:46am

So, this is an example dataset. I want to know all connected nodes from a node (connection can be of any length) by any relationship
So, i have tried
match (a{firstName:"Vikas"})-[ * ]-(b) return distinct b.firstName
but this query does not give results as it may have got stuck in a cycle....i wanted to know a way to do this

andrew_bowman · February 18, 2020, 4:19pm

Just to clarify something, Cypher queries will never get caught in an infinite loop. Cypher uses a uniqueness property such that per path, a relationship may only be used once. By definition infinite loops must reuse relationships.

That said, Cypher can be overwhelmed by the permutation of all possible distinct paths (that don't repeat relationships), that's what's happening here. If you set an upper bound on your var length relationship (let's say *..8), and RETURN count(*), you might see a pretty huge humber of paths. Keep increasing that by one and getting the counts, and that should explain why your query is hanging. It's likely trying to process tens or hundreds of millions or even billions of possible paths, each distinct because they use a different permutation of relationships.

This is also why it's generally a bad idea to to have reciprocal relationships between two nodes if a single relationship is enough to capture the the relation...the sheer presence of a second relationship between every two related nodes vastly increases the permutations of distinct paths for queries like these.

As for solutions, if you have APOC Procedures installed you can use apoc.path.subgraphNodes(), which is designed to find distinct connected nodes from a start node. It uses a different kind of uniqueness than Cypher such that a node can only be visited once across all paths, that means it's not prone to the skyrocketing permutations of paths issue that's affecting your current query.

Joel · February 18, 2020, 5:08pm

Hi Andrew, Thank you for your post on path following. Are there any references that cover this in more detail?

One note on this to be sure everyone is aware, (and let me if I'm wrong Andrew) as I understand it a single path can cross over the same node multiple times, as long as it is following unique relationships. A nuance that might not be obvious at first, but is important to know.

andrew_bowman · February 18, 2020, 6:18pm

Yes, here's the section on uniqueness behavior in Cypher in the current docs, with some examples.

With this approach, you are correct that the same node can occur multiple times in the same path as long as every relationship in the path is unique to the path.

In the APOC docs there's a section on the different kinds of uniqueness that exist. These can only be changed/selected via the traversal API. The APOC path expanders are built using the traversal API, and you can of course build your own custom procs that use it too.

intouch_vivek · February 24, 2020, 2:49pm

Hi Harshit,

Dataset I used

Query I tried to get the nodes which are somehow connected with A
match path = (from:Person{name:"A"})-->(to:Person) return distinct apoc.map.mergeList(nodes(path))

andrew_bowman · February 24, 2020, 5:22pm

mergeList() isn't an aggregation function, so that likely wouldn't work (each path is on a separate row and likewise the path nodes). What you're doing is, per path, merging the properties of all of the nodes of that particular path together, so it's like you have one smushed properties map representing all the nodes of that path at once (and likely several of those properties are overwriting the others), which likely isn't what you want.

To get all nodes connected from your start node, you could use the path expander procedures:

If direction doesn't matter:

MATCH (from:Person{name:"A"})
CALL apoc.path.subgraphNodes(from, {labelFilter:'Person'}) YIELD node
RETURN node

This finds all :Person nodes at any distance via any relationship reachable from your start node. The nature of the procedure is such that the nodes yielded will be distinct.

intouch_vivek · February 24, 2020, 5:28pm

It is possible that mergeList will not give the desired ouput however when I tried subgraph it gave me all the nodes, irrespective whether i can reach from A or not .
Output of query you gave me is as below and you can see that although I cannot reach to I from A but even then it is coming

andrew_bowman · February 24, 2020, 6:11pm

Ah, I missed that direction matters to you, the query I gave would find you all connected nodes regardless of the direction of the relationships. We can add a relationship filter to get you the desired results:

MATCH (from:Person{name:"A"})
CALL apoc.path.subgraphNodes(from, {labelFilter:'Person', relationshipFilter:'>'}) YIELD node
RETURN node

For this simple graph, a small fix on your previous Cypher should work just as well:

match path = (from:Person{name:"A"})-->(to:Person)
return distinct to

If you want the nodes in a list rather than per row, then you can chang eyour return to return collect(distinct to) as nodes

Topic		Replies	Views
Finding all connected nodes in a cyclic graph Neo4j Graph Platform migrated	1	230	September 5, 2022
Find random path of length n between two nodes and remove cycles Cypher apoc , cypher	1	308	April 20, 2022
Finding paths between multiple nodes Cypher apoc	3	9746	May 13, 2020
Getting to a certain label Cypher	4	4002	May 7, 2019
"Appendix" sub-graph detection Neo4j Graph Platform migrated	6	197	October 10, 2022

Take the Course Then Join The Aura Agent Hackathon

Finding all connected nodes in a cyclic graph

Related topics

Take the Course Then Join
The Aura Agent Hackathon