Subgraph identification

I have a problem: my nodes are connected to one another through various relationships, forming dozens of small networks. How can I mark each subgraph? In other words, how do I find out how many subgraphs there are, and which nodes belong to each one? Thank you very much for your help!


You may want to check out the community detection graph algorithms that come as plugins to Neo4j.

When we talk about marking subgraphs, you would probably first restrict the graph to a partition of only certain node labels and relationship types. You'd then apply community detection to that partition, and it would assign each node a community ID (just an integer, really).

These algos take a bit of tuning, though, depending on how connected the communities are and how precise you want to be. Consider it like this: at one extreme, your entire graph is a single community. At the other extreme, every node is its own community, since nothing is quite like it. So you have to find a balance that fits your use case.

In particular, keep in mind that there are multiple community detection algos to choose from. Each one has a "Use cases" / "When to use this" section. Read those carefully before choosing.


First of all, thank you very much for your help! I think I may not have described my problem clearly. For example, there are 10,000 nodes; every 100 nodes are connected to each other to form a subgraph, and the subgraphs are not connected to one another. I need to mark each subgraph. I think what I need is the Connected Components algorithm. Is my choice correct? Thank you!


If the communities are not connected, then yes, I think connected components is probably what you want, to identify the different "islands".
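To make the "islands" idea concrete, here is a minimal union-find sketch in plain Python. It is only an illustration of what a connected components pass computes, not the Neo4j implementation; the function name `connected_components` and the toy node and edge lists are made up for this example.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find.

    Returns a dict mapping each component's root node to its members.
    (Hypothetical helper for illustration; not part of any Neo4j library.)
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two sets joined by each edge; direction is ignored.
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return dict(groups)


# Toy data: two islands, {a, b, c} and {d, e}.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]

components = connected_components(nodes, edges)
print("number of subgraphs:", len(components))
for root, members in components.items():
    print(root, members)
```

The number of keys answers "how many subgraphs", and each value lists the nodes of one subgraph. Note that the key is just whichever node ended up as the root, so it is an arbitrary label rather than a meaningful identifier.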

Thank you very much!

I have a problem when using the Connected Components algorithm. I entered the example from the official website:

CALL algo.unionFind.stream('User', 'FRIEND', {})
YIELD nodeId, setId

RETURN algo.asNode(nodeId).id AS user, setId

It displays this error:

Neo.ClientError.Statement.SyntaxError: Unknown function 'algo.asNode' (line 4, column 8 (offset: 78))
"RETURN algo.asNode(nodeId).id AS user, setId"

How do I deal with this error?
Thank you very much!

I have another problem when using the Connected Components algorithm. I entered the example from the official website:

CALL algo.unionFind.stream('User', 'FRIEND', {})
YIELD nodeId, setId

I ran it many times, and the generated setId values are different each time. Why is this?

I completely second this. I've built several apps where this is needed, e.g. a molecule made up of many atoms. "Just draw a circle around the entire subgraph and mark it as an entity."

I have to connect an aggregate molecule object to... what... the first atom? Which one is first? And then the Cypher query has to start from that, (molecule)-(atom)-[:BOND*]-(atom), instead of just grabbing the subgraph. APOC is not ideal, and it is not supported in OGMs yet.

@ruanlovelin @HashRocketSyntax Use WCC instead of apoc.subgraph: https://neo4j.com/docs/graph-data-science/1.0/algorithms/wcc/

Weakly Connected Components in the new GDS library supports consecutiveIds as an optional configuration parameter, which will label each of the components found with a consecutive integer. In addition, you can use seedProperty so that each time you run on your graph, you initialize from the existing community identifiers and re-use those labels.
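As a rough illustration of those two ideas (consecutive labels, and re-using labels from a previous run), here is a toy Python sketch. It is not how GDS implements either option, and it makes no claim that the two options can be combined in a single GDS call. The function `relabel` and the rule "keep the smallest previously seen id" are assumptions made up for this example.

```python
def relabel(components, seed=None):
    """Assign an integer id to every node, one id per component.

    components: list of lists of node names (one list per component).
    seed: optional dict of node -> id from an earlier run.
    (Hypothetical helper; the seeding rule here is an assumption.)
    """
    seed = seed or {}
    used = set(seed.values())
    labels = {}
    next_id = 0
    for members in components:
        known = [seed[m] for m in members if m in seed]
        if known:
            # Re-use an id already stored on a member of this component.
            cid = min(known)
        else:
            # Otherwise hand out the next integer not yet in use.
            while next_id in used:
                next_id += 1
            cid = next_id
            used.add(cid)
        for m in members:
            labels[m] = cid
    return labels


# First run: no seeds, so the components get 0 and 1.
first = relabel([["a", "b", "c"], ["d", "e"]])

# Second run: node f joined the {d, e} island and g is a new island.
# The components arrive in a different order, but old ids are kept.
second = relabel([["g"], ["d", "e", "f"], ["a", "b", "c"]], seed=first)
print(first)
print(second)
```

Without seeds, ids depend on the order in which components happen to be visited, which is one way labels can differ between runs even though the grouping is identical. With seeds, existing nodes keep their earlier id and only new islands receive fresh ones.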