Threshold member count in community detection

Hi,

I would like to constrain the community detection algorithms so that they write a value only if the community reaches a minimum size. For example: write a node property with the community id only if that community has at least 5 members; if the count is lower, the value written is null. The idea is to remove isolated nodes from the visualization.
Andy

Hi, I am making some progress, but I am not quite there yet.

I have the label propagation algorithm streaming the result:

CALL gds.labelPropagation.stream('self_cite', { consecutiveIds: true })
YIELD nodeId, communityId AS Community
RETURN Community, count(nodeId), collect(nodeId)
ORDER BY count(nodeId) DESC

This gets me to an intermediate stage. What I would like to do now is go through the result row by row, set the community property to the row's index (0 for the largest group, then 1, 2, ...), and set that value on each node in the row's list. I do not really need the community label provided by the algorithm, and the count is only there for sorting. I would also like to use the count field as a test, so that it must be greater than a certain value for the nodes to be labelled. How do I finish this off?

The reason for this is color coding within Bloom: if I reuse the same community id numbers (starting at 0), I can set a coloring rule that is robust across different runs.
Andy

See below:

Community count(nodeId) collect(nodeId)
2 292 [264125, 264096, 264132, 264081, 267488, 263696, 268648, 267382, 267499, 263953, 264245, 263949, 263983, 263788, 267783, 264204, 263967, 264123, 264255, 264001, 264170, 267525, 264218, 263889, 264290, 263455, 264018, 264288, 264185, 263973, 264067, 263956, 267801, 263955, 267400, 264410, 263799, 270966, 263961, 267224, 267813, 267574, 267689, 264138, 267729, 264316, 264071, 267667, 264349, 267346, 267785, 264306, 267586, 267662, 264280, 264317, 267735, 267513, 263501, 267803, 264265, 267606, 267607, 264168, 267576, 264162, 264201, 267747, 267344, 263461, 264069, 267392, 267787, 263965, 267615, 264334, 267546, 267605, 267622, 264356, 264217, 264269, 267627, 264326, 267807, 267585, 267428, 267941, 264211, 267589, 267569, 267683, 264073, 264122, 267802, 267369, 263836, 264637, 264166, 267551, 267750, 267769, 264048, 267621, 267743, 266663, 264223, 271932, 264064, 263867, 267579, 263988, 264140, 267455, 267721, 263846, 264261, 267565, 267819, 263978, 267399, 263469, 267751, 267356, 267534, 263475, 267415, 264252, 263918, 263926, 272351, 267406, 267726, 264180, 264314, 267348, 263833, 264154, 264107, 264273, 264008, 267566, 267447, 264292, 264392, 267577, 263827, 263468, 264004, 263452, 271030, 267637, 264160, 264058, 267754, 267365, 263446, 263456, 267347, 267463, 267357, 263924, 264105, 267471, 267432, 267770, 266654, 264093, 263474, 263454, 267593, 267456, 267599, 267633, 267676, 267557, 267578, 267872, 267417, 267379, 267604, 267467, 268126, 264012, 267571, 267360, 267404, 264167, 267727, 267226, 267765, 264236, 271234, 264311, 264127, 267736, 264036, 267359, 261766, 261577, 262543, 262520, 262907, 261442, 261379, 261946, 261066, 261189, 262895, 262479, 262925, 261700, 260814, 261346, 261392, 261418, 262018, 261786, 261307, 262825, 261374, 262076, 262201, 262798, 261240, 262610, 262656, 261229, 262359, 261219, 262766, 262122, 261651, 261896, 261574, 261149, 262780, 262465, 262280, 262244, 261257, 260839, 261571, 260884, 261918, 261589, 261305, 262367, 262350, 
261138, 261089, 260873, 261556, 261131, 262578, 261227, 262038, 261542, 261073, 262692, 262832, 261886, 262282, 261851, 262106, 262053, 262333, 261183, 261607, 262805, 262019, 261649, 261865, 261077, 262588, 261775, 262199, 261701, 261046, 260808, 262308, 261520, 261844, 261507, 261343, 261198, 262385, 261569, 261712, 262013, 262644, 261413]
5 37 [264654, 265459, 267908, 265034, 267894, 267916, 265669, 270231, 267895, 267011, 270230, 265641, 267909, 263684, 265053, 272196, 264984, 267151, 265649, 267897, 265611, 264831, 263690, 261958, 262192, 261777, 262475, 262670, 261196, 262625, 262746, 262737, 262335, 261488, 262506, 262878, 261755]
10 32 [264428, 264393, 264425, 264390, 264430, 264406, 264389, 264396, 264417, 264420, 264384, 264404, 264415, 264419, 264431, 264617, 264426, 264399, 264423, 264403, 264391, 264383, 264432, 264418, 268643, 264416, 264405, 262126, 261365, 260853, 262722, 262343]
26 22 [269228, 268888, 261643, 262102, 261888, 261265, 262594, 261867, 262900, 261325, 262767, 261347, 262483, 261596, 261266, 260827, 262030, 262109, 262740, 261959, 262720, 261926]
6 15 [268349, 262641, 262504, 261990, 261315, 261085, 261669, 262162, 261158, 261606, 262290, 262586, 262216, 261637, 262065]

Since GDS 1.6, if you use Louvain or WCC you can directly use the parameter minCommunitySize (Louvain) or minComponentSize (WCC): only community ids with a size greater than or equal to the given value are written to Neo4j (see Louvain - Neo4j Graph Data Science).

For Label Propagation, this will be added in the next release.
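
As a minimal sketch of that write-mode option, assuming the projected graph is named 'self_cite' as in Andy's query and using `community` as a purely illustrative property name, a Louvain call could look roughly like this:

```cypher
// Sketch only: write Louvain community ids for communities with >= 5 members.
// 'self_cite' is the graph name from the query above; `community` is an
// assumed (hypothetical) property name, pick whatever suits your model.
CALL gds.louvain.write('self_cite', {
  writeProperty: 'community',
  minCommunitySize: 5      // write mode only, GDS 1.6 or later
})
YIELD communityCount, nodePropertiesWritten
RETURN communityCount, nodePropertiesWritten
```

Nodes in communities below the threshold should simply end up without the property. For WCC the equivalent parameter is minComponentSize on gds.wcc.write.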

Hi Florentin,
I see it is an option in the write function. Can that be extended to stream?

Also, what would be really cool:
Have the consecutiveIds flag return ids sorted by member count. That way id 0 is always the community with the largest member count.

Andy

Hi Andy,
At the moment it is only supported for write mode.
I also like the idea of adding support for other modes such as stream, so maybe it will be possible in the future.
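
In the meantime, one possible workaround is to do the filtering and ranking in plain Cypher on top of the stream result. The following is a sketch rather than a tested recipe: it assumes the 'self_cite' graph from above, a threshold of 5 members, and a hypothetical property name `community`. It collects the members of each community, drops communities below the threshold, sorts the rest by size, and writes the position in that sorted list (0 = largest) to every member node.

```cypher
// Sketch: size-ranked community ids with a minimum size, built on stream mode.
CALL gds.labelPropagation.stream('self_cite')
YIELD nodeId, communityId
WITH communityId, collect(nodeId) AS members
WHERE size(members) >= 5                   // minimum community size
WITH members
ORDER BY size(members) DESC                // largest community first
WITH collect(members) AS ranked            // list of member lists, in size order
UNWIND range(0, size(ranked) - 1) AS idx   // idx 0 = largest community
UNWIND ranked[idx] AS nodeId
WITH gds.util.asNode(nodeId) AS n, idx
SET n.community = idx                      // `community` is an assumed property name
RETURN idx AS community, count(*) AS memberCount
ORDER BY community
```

Two caveats: communities of equal size have no guaranteed relative order, so their ids may swap between runs, and nodes labelled in an earlier run keep their old value unless the property is removed first (for example with MATCH (n) REMOVE n.community).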