I have been using GDS (2.8.0) to calculate graph clusters and have noticed an unexpected result. I was under the impression that the intermediate communities were always hierarchical ( so that all the "intermediateCommunityIds" for the nodes would form a Directed Acyclic Graph), but I have just run the Leiden algorithm and have identified a node whose intermediateCommunityIds are [162, 1, 30] and another whose ids are [10, 30, 1].
Clearly then, those communities 1 & 30 are forming a circular relationship.
Have I misunderstood what the Leiden algorithm is doing, is there some setting that I can use to prevent this occurring, or is this perfectly normal?
Hi @bruceweir1.bw,
This does look a little weird on first glance, but it can actually be totally normal behaviour,
let me explain:
Leiden works on a series of graphs as follows for graph G(i) we find communities C(i) and we create a new graph G(i+1) where each community of C(i) corresponds to a node in G(i+1).
Assuming C(i) consits of communities {0,10,100,..} for example in G(i) community 0 will get vertex id 0, 10 1, 100 2 etc. This means that there is no immediate commection between C(i), and C(i+1).
The intermediate cmmunities is just that intermediate: copies of C(i) for each i. Hence what you have described can possibly happen, communities 10,30 in second iteration are different than 10,30 in the third iteration.
Here is a small DAG example
C(2). uvjde
C(1). uv jde
C(0). u v j d e
The hierarchy dag creates a node for each intermediate community (in essence, contains all nodes for G(1),,G(last)). And there are links from directed edge from v to c_j(v):
here the links go u to uv, v to uv etc...
Feel free to ask for more, feel free to ask mroe if you have any other questions.
Best regards,
Ioannis.
Thanks for you response, Ioannis.
What I need to do is trace the community hierarchy at each iteration of Leiden. I.e, for the DAG example that you posted, I would need to know that community uvjde at iteration 3 has has parents uv and jde from iteration 2.
Since the community numbers are not consistent between iterations, it is there an easy way to do this, or is it just a question of tracing node membership for each iteration?
Yes, you would have to do manually unfortunately.
However there is a solution that should in theory work:
The Leiden algorithm comes with seedProperty
parameter.
This is used to give each node a starting community id in order to help the algorithm converge faster if you are already aware of some communities in the graph etc.
However it also has the effect that intermediate / as well as final communities are "mapped back" to the initial starting communities. So in iteration 3, community 10 contains all nodes of community 10 of iteration 2. However you would need to manually verify that C(3)[10] also contains there communities of C(2) that become merged in it, but I do not think this could be avoided anyway.
What you can do is create a new node property that gives to each node a distinct integer value. Then project a graph in gds with it, and use it one as seedProperty.
Then that might make it easier for you to follow along the movements between communities in iterations.
Let me know if that helps you or if you notice any weird behaviour there.
Best regards,
Ioannis.
Ah, thanks for the suggestions. In the interests of being able to work out what the heck the code is doing six months from now I think that I will go with the manual tracing of community relationships across iterations.