Looking for advice on community detection and cluster splitting

werner.mueller · December 16, 2023, 1:15pm

Hello

I'm currently looking for a way to split up a large detected community, or a way to identify nodes that connect large sub-clusters on their own.

For some context:
The graph contains two node labels: Case and Person.

A Case changes a Person (or several Persons)
A Person Relates to another Person

There are around 30 million Cases and around 10 million Persons.

I've read through the available algorithms (Community detection - Neo4j Graph Data Science) and settled on "Weakly Connected Components" (WCC) and "Label Propagation" for now. The WCC algorithm splits the Person clusters apart the way I need it to. To build clusters, only the Person nodes are relevant. Currently I have a small app that copies the essential data into Neo4j to do these experiments.
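To illustrate what WCC computes on the Person->Person edges, here is a minimal plain-Python sketch using union-find. It is not the GDS implementation; the function name and the sample ids are made up for illustration.

```python
from collections import defaultdict


def weakly_connected_components(nodes, edges):
    """Group nodes into components, ignoring edge direction (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    components = defaultdict(set)
    for n in nodes:
        components[find(n)].add(n)
    return list(components.values())


# Hypothetical sample: five Persons, three "relates to" edges.
persons = ["p1", "p2", "p3", "p4", "p5"]
relates = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
clusters = weakly_connected_components(persons, relates)
```

Each returned set corresponds to one Person cluster in the sense used in this post.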

In the end, the Cases need to be sorted by their date (all Cases of one cluster together) so that the data can be transferred in parallel but still in the correct order. Business rules on Person would fail if nodes within the same cluster were transferred in parallel.
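The ordering requirement can be sketched as follows: every cluster gets its own date-sorted queue of Cases, and separate queues could then be processed in parallel. This is only a sketch under assumptions. The tuple layout, the `cluster_of` mapping and the function name are hypothetical, and it assumes that all Persons changed by one Case end up in the same cluster (otherwise it raises an error).

```python
from collections import defaultdict
from datetime import date


def cases_by_cluster(cases, cluster_of):
    """Return {cluster_id: [case_id, ...]} with each list sorted by case date.

    cases: iterable of (case_id, case_date, [person_id, ...])
    cluster_of: mapping person_id -> cluster_id (e.g. a WCC component id)
    """
    queues = defaultdict(list)
    for case_id, case_date, person_ids in cases:
        clusters = {cluster_of[p] for p in person_ids}
        if len(clusters) != 1:
            # Assumption violated: this Case touches zero or several clusters.
            raise ValueError(f"case {case_id} maps to clusters {clusters}")
        queues[clusters.pop()].append((case_date, case_id))
    # Sort by date (ties fall back to case_id) and drop the date again.
    return {c: [cid for _, cid in sorted(q)] for c, q in queues.items()}


# Hypothetical sample data.
cluster_of = {"p1": 0, "p2": 0, "p3": 1}
cases = [
    ("c1", date(2023, 3, 1), ["p1"]),
    ("c2", date(2023, 1, 1), ["p1", "p2"]),
    ("c3", date(2023, 2, 1), ["p3"]),
]
queues = cases_by_cluster(cases, cluster_of)
```

Within one queue the order is strictly sequential; only different queues are independent of each other.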

In an abstract view:

(properties and relations are all the same, but some are left out for a cleaner diagram)

As expected, there are several thousand of those Person clusters.
But to parallelise the transfer, it would have been nice to have similarly sized clusters or groups of clusters.
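Grouping clusters into batches of similar total size could look roughly like the greedy sketch below (largest cluster first, always onto the least loaded worker). The function name, the worker count and the sizes are invented for illustration; it is a heuristic, not an optimal partitioning.

```python
import heapq


def balance_clusters(cluster_sizes, workers):
    """Greedily assign clusters to workers, largest cluster first."""
    heap = [(0, w) for w in range(workers)]  # (current load, worker index)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(workers)}
    ordered = sorted(cluster_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for cluster_id, size in ordered:
        load, w = heapq.heappop(heap)  # least loaded worker so far
        assignment[w].append(cluster_id)
        heapq.heappush(heap, (load + size, w))
    return assignment


# Hypothetical sizes in percent of all Persons, with one dominant cluster.
sizes = {"a": 70, "b": 10, "c": 10, "d": 5, "e": 5}
groups = balance_clusters(sizes, workers=2)
loads = {w: sum(sizes[c] for c in ids) for w, ids in groups.items()}
```

With one cluster this dominant, the worker that receives it stays the bottleneck no matter how the small clusters are distributed, which is exactly the problem described next.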

Sadly the largest cluster contains around 70% of all Persons.

I now wonder if there is a way to identify nodes in that huge cluster that connect large sub-clusters on their own, so that I might be able to sort those sub-clusters for parallel transfer.
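One way to make this concrete: nodes of this kind resemble what graph theory calls articulation points (cut vertices), i.e. nodes whose removal disconnects their component. That is an interpretation of the question, not a confirmed solution. Below is a minimal plain-Python sketch of the classic DFS low-link approach on an undirected view of the Person graph; the helper names and the sample edges are hypothetical, and nothing here is a Neo4j GDS call.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency lists from (person, person) pairs."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return dict(adj)


def articulation_points(adj):
    """Return the set of nodes whose removal splits their component."""
    disc, low, result = {}, {}, set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # Non-root u is a cut vertex if v's subtree cannot reach above u.
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])  # back edge
        # The DFS root is a cut vertex only if it has several DFS children.
        if parent is None and children > 1:
            result.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return result


# Hypothetical sample: two triangles joined through the single node "x".
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "x"),
         ("x", "d"), ("d", "e"), ("e", "f"), ("f", "d")]
cut_nodes = articulation_points(build_adjacency(edges))
```

The recursion is fine for a toy graph but would exceed Python's recursion limit on millions of nodes, so a real run would need an iterative variant. The sketch also assumes at most one edge per Person pair. Finding such nodes does not by itself make any Person->Person edge safe to ignore.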

The Label Propagation algorithm splits up the clusters much further, but the result is hard for me to analyse.

I need to make sure that all Cases are transferred in the right order. If I transferred clusters in parallel that are connected by "bad" edges, a business rule that relies on Person order or existence would fail the whole process. So none of the Person->Person edges can really be ignored.

Having identified that large cluster, is there a way to identify such Person nodes?

Or am I using the wrong algorithms anyway?

Any hint is appreciated.
