How to ensure results consistency in Community Detection algorithms?

mlnrt · February 25, 2023, 12:33pm

Hello,
I am using the GDS Python client to run community detection algorithms on my Neo4j DB, which runs on my local Neo4j Desktop (see versions at the bottom).
The Neo4j documentation mentions features "like deterministic seeding for consistent result", but each time I run community detection algorithms like

Modularity Optimization
Leiden
Label Propagation

I get a different number of communities, and I do not see any parameter for fixing a seed to ensure consistent results from one run to another.
This also prevents me from searching for the algorithm hyperparameters that give the best modularity score.

Is there really a way to do deterministic seeding for those algorithms?
How can we find the best algorithm hyperparameters (e.g. maxLevels, gamma, theta, tolerance for the Leiden algorithm)?

Thank you in advance

Neo4j Desktop: 1.5.7
Neo4j DBMS: 5.4.0
APOC version: 5.4.1
GDS version: 2.3.1
Python Neo4j client: 5.4.0
Python GDS client: 1.6

alison.cossette · March 20, 2023, 9:30pm

You can check out the documentation here: "Running algorithms" (Neo4j Graph Data Science).

" seedProperty - String

Some algorithms can be calculated incrementally. This means that results from a previous execution can be taken into account, even though the graph has changed. The seedProperty parameter defines the node property that contains the seed value. Seeding can speed up computation and write times."

florentin_dorre · March 23, 2023, 10:13am

In addition, you can set the randomSeed to fix the randomness inside the algorithm.

mlnrt · March 30, 2023, 1:35pm

@alison.cossette, @florentin_dorre Thank you for the responses. Unfortunately, I don't see how this solves the problem. I can't find a randomSeed property in any of the Modularity Optimization, Leiden, or Label Propagation community detection algorithms.
What I am trying to achieve is that if someone takes my code and the raw data and reruns everything, they get consistent results. I don't see how seedProperty helps, since in that case there is no node property containing a seed value from a previous run.
I hope this clarifies my question.

florentin_dorre · April 13, 2023, 7:50am

@mlnrt Your requirement sounds a lot like the randomSeed parameter. Maybe there was some confusion: it's not a property but a parameter of the algorithm.
In our Leiden examples we do specify the randomSeed parameter (Leiden - Neo4j Graph Data Science).
However, you are correct that Modularity Optimization and Label Propagation do not support fixing the randomness.
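
As a rough illustration of the randomSeed suggestion, here is a minimal Python sketch that runs Leiden twice with the same seed and compares the community counts. The helper name `is_reproducible` is hypothetical. It assumes the Leiden stats endpoint is passed in as a callable (in GDS 2.3 this would presumably be `gds.beta.leiden.stats`; other versions may expose it under a different namespace) and that the result exposes a `communityCount` field. Setting `concurrency=1` is a conservative assumption on my part, not something confirmed in this thread.

```python
from typing import Any, Callable


def is_reproducible(leiden_stats: Callable[..., Any], graph: Any, seed: int = 42) -> bool:
    """Run Leiden twice with an identical config and compare community counts.

    `leiden_stats` is assumed to be the client's Leiden stats endpoint,
    e.g. gds.beta.leiden.stats in GDS 2.3 (assumption; check your version).
    """
    # randomSeed fixes the algorithm's internal randomness.
    # concurrency=1 is a cautious extra (assumption) to rule out
    # thread-scheduling effects.
    config = {"randomSeed": seed, "concurrency": 1}
    first = leiden_stats(graph, **config)
    second = leiden_stats(graph, **config)
    return bool(first["communityCount"] == second["communityCount"])


# Usage against a live database (connection details are placeholders):
# from graphdatascience import GraphDataScience
# gds = GraphDataScience("bolt://localhost:7687", auth=("neo4j", "password"))
# G = gds.graph.get("myGraph")  # an already projected in-memory graph
# print(is_reproducible(gds.beta.leiden.stats, G))
```

If the check returns False even with a fixed seed, the remaining variation is likely coming from somewhere other than the algorithm's random number generator, such as a changed projection.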

Regarding your question on how to find the best hyperparameters, I would recommend Nathan's talk: 70 - Interpreting the Results of Community Detection Algorithms - YouTube.
TL;DR: inside GDS we offer the Conductance and Modularity metrics.
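
For the hyperparameter search itself, one possible approach is a plain grid search over the Leiden parameters with a fixed randomSeed, keeping the configuration with the highest reported modularity. The sketch below is only an illustration: `grid_search_leiden` is a hypothetical helper, the stats endpoint is passed in as a callable (presumably `gds.beta.leiden.stats` in GDS 2.3), the result is assumed to expose a `modularity` field, and `concurrency=1` is again a cautious assumption.

```python
from itertools import product
from typing import Any, Callable, Dict, Optional, Sequence, Tuple


def grid_search_leiden(
    leiden_stats: Callable[..., Any],
    graph: Any,
    grid: Dict[str, Sequence[Any]],
    seed: int = 42,
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Try every combination in `grid` and return (best_params, best_modularity)."""
    best_params: Optional[Dict[str, Any]] = None
    best_modularity = float("-inf")
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # The same seed for every run, so differences come from the parameters only.
        result = leiden_stats(graph, randomSeed=seed, concurrency=1, **params)
        modularity = float(result["modularity"])
        if modularity > best_modularity:
            best_params, best_modularity = params, modularity
    return best_params, best_modularity


# Usage against a live database (G is an already projected graph):
# grid = {"gamma": [0.5, 1.0, 2.0], "theta": [0.001, 0.01], "maxLevels": [5, 10]}
# best, score = grid_search_leiden(gds.beta.leiden.stats, G, grid)
```

Modularity is only one lens on community quality, so it may be worth checking the winning configuration against conductance or domain knowledge before settling on it.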
