I'm trying to run the Louvain method of community detection on my database to group some of my nodes into clusters. However, I keep getting different results just by changing how I sort the output.
This is the result I get when I run the Louvain algorithm with no 'order by':
However, when I try to order my results by communityId, the communityId that was assigned to most of the nodes changes:
And when I try to order by nodeId instead, the communityId that was assigned changes again:
Can I get some help in understanding this please? I do understand that with the Louvain algorithm the initial community assignment may be random, and hence the end results could differ between runs. However, I'm only introducing or changing the sort order of the results here, not changing the algorithm or its hyperparameters in any substantial way. So I'm really puzzled as to how I could get different results in different runs.
I'm using Neo4j 4.3.1 with GDS 1.6.4, if that matters. Thank you!
Hi @wanderingcatto !
The reason is that the algorithm randomizes the community ids, and every CALL runs the algorithm again. Adding or changing ORDER BY does not alter the algorithm; it just means you are looking at a fresh run each time. The communities themselves should be almost the same if nothing else has changed. You can validate this by collecting all the members of one community, saving their ids in a separate text file, and then searching for those nodes in a new execution (you can add them to the WHERE clause) to see whether they still share a community, regardless of its id. Hope this helps!
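The check described above can also be done programmatically. Here is a minimal Python sketch, assuming you have exported each run as a mapping from nodeId to communityId; the function names and the sample ids are made up for illustration and are not part of GDS.

```python
from collections import defaultdict


def partition_of(assignment):
    """Turn {node_id: community_id} into a set of frozensets of node ids.

    The community ids themselves are discarded, so two runs that group
    the same nodes together compare equal even if the ids differ.
    """
    groups = defaultdict(set)
    for node_id, community_id in assignment.items():
        groups[community_id].add(node_id)
    return {frozenset(members) for members in groups.values()}


def same_communities(run_a, run_b):
    """True if both runs put exactly the same nodes together."""
    return partition_of(run_a) == partition_of(run_b)


# Hypothetical exported results: {nodeId: communityId}
run_1 = {1: 10, 2: 10, 3: 42, 4: 42}
run_2 = {1: 7, 2: 7, 3: 3, 4: 3}   # same grouping, different ids
run_3 = {1: 7, 2: 3, 3: 3, 4: 3}   # node 2 moved to the other community

print(same_communities(run_1, run_2))  # True
print(same_communities(run_1, run_3))  # False
```

If the first comparison is True for your real runs, only the ids changed; if it is False, the membership itself changed.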
Thanks a lot for your help! Perhaps I should have been clearer. My problem is that the communities themselves change when I run the algorithm in a different manner.
From domain knowledge, I know that the 4 particular nodes I'm looking at are actually related to each other. In one of my earlier trial runs, I managed to get two of the nodes into the same community by adjusting the tolerance level. However, I soon noticed that the community (and community id) changes with a slightly different run, and the two nodes that once belonged to the same community subsequently ended up in different communities. I haven't been able to reproduce the run where the two nodes were in the same community, which is why it isn't reflected in my screenshots.
Apologies that I can't show the actual relationships between my nodes or my graph schema, as it's sensitive data.
Louvain is a stochastic algorithm; every time you run it you'll get different community IDs, and it's possible that community assignments may change as well. It uses a greedy heuristic to maximize the modularity of the overall partition into communities, so different runs can settle on different partitions with similar modularity - see the Wikipedia page for more details.
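To make Louvain more reproducible you can use seeding: write the community IDs to the graph once, then pass that property as the seedProperty on later runs, so the algorithm starts from the existing assignment instead of from scratch.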
If your question is "why don't the communities align with my domain model", the issue may simply be that clustering based on modularity does not mirror what you expect. The nodes that you observe switching communities are likely only loosely connected to the communities they end up in. You could try other community detection algorithms (SLLPA will detect overlapping communities, and LPA is another very performant clustering algorithm) or use something like betweenness centrality to determine the structural roles of those nodes.
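To illustrate why a loosely connected node can switch communities between runs, here is a small Python sketch on a made-up toy graph (not your data): two triangles joined through a single bridge node. It computes the standard modularity score for both possible placements of the bridge node. The helper function and the graph are assumptions for illustration only, not GDS code.

```python
def modularity(edges, community_of):
    """Modularity Q of an undirected, unweighted graph for a given partition.

    Uses Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = {}      # community -> number of edges inside it
    total_degree = {}  # community -> sum of node degrees
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Each edge contributes one degree to each of its endpoints.
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Toy graph: triangle 0-1-2, triangle 3-4-5, and node 6 bridging 2 and 3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 6), (6, 3)]

bridge_left = {0: "a", 1: "a", 2: "a", 6: "a", 3: "b", 4: "b", 5: "b"}
bridge_right = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b", 6: "b"}

q_left = modularity(edges, bridge_left)
q_right = modularity(edges, bridge_right)
print(q_left, q_right)  # 0.3671875 0.3671875
```

Both placements score the same, so a modularity-maximizing heuristic has no reason to prefer one over the other, and which one you get can depend on the order in which nodes happen to be processed.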