Two leaders in cluster after OutOfMemoryError

The default for the minimum core cluster size at runtime should be fine.

This is a bit of a confusing topic, but I'll try to summarize.

In causal clusters, you can think of cluster membership as a VIP list: only cluster members are allowed to take part in consensus operations, such as commits. The number of members on the list is used to calculate the majority quorum (for 3 members, the majority is 2; for 4 or 5 members, it is 3; and so on). A node can be a cluster member while being unreachable or offline, and it still counts as a cluster member when the majority is calculated.
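The majority rule above is "strictly more than half of the members on the list", whether those members are online or not. A minimal Python sketch of that arithmetic (the function name `majority` is only for illustration, not a Neo4j API):

```python
def majority(member_count: int) -> int:
    """Smallest number of members that is strictly more than half of the list."""
    return member_count // 2 + 1


for n in range(3, 8):
    print(n, majority(n))  # 3 2, 4 3, 5 3, 6 4, 7 4
```

Note that `member_count` is the size of the membership list, so an offline member still raises the threshold.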

The membership list can grow or shrink as members are added and removed. The thing to remember is that voting members in or out requires a majority quorum to succeed.

So if you have a 3-node cluster and add another node, a majority (2) must vote to add it to the membership list before it counts as a cluster member; once it does, the list has 4 members and the majority rises to 3. If a member then becomes unreachable or shuts down, it is not dropped automatically: a vote must first take place to remove its membership, so 3 nodes must be online to vote it out (after which the cluster size drops back to 3 and the majority becomes 2).
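The add/remove walkthrough above can be modelled as a toy state machine. This is a simplified sketch, not how Neo4j or Raft actually implement membership changes: the `Membership` class and its methods are hypothetical, and a "vote" is reduced to "enough members on the current list are reachable".

```python
def majority(member_count: int) -> int:
    return member_count // 2 + 1


class Membership:
    """Toy model: who is on the membership list, and who is currently reachable."""

    def __init__(self, members, online):
        self.members = set(members)
        self.online = set(online)

    def has_quorum(self) -> bool:
        # The threshold uses the whole list; only reachable members can actually vote.
        voters = self.members & self.online
        return len(voters) >= majority(len(self.members))

    def add(self, node: str) -> bool:
        # Voting a member in needs a majority of the current list.
        if not self.has_quorum():
            return False
        self.members.add(node)
        self.online.add(node)
        return True

    def remove(self, node: str) -> bool:
        # Voting a member out also needs a majority of the current list,
        # which still includes the member being removed.
        if node not in self.members or not self.has_quorum():
            return False
        self.members.discard(node)
        self.online.discard(node)
        return True


cluster = Membership({"A", "B", "C"}, online={"A", "B", "C"})
cluster.add("D")                        # 3 voters >= majority(3) == 2, so D joins
print(majority(len(cluster.members)))   # 3
cluster.online.discard("D")             # D shuts down but is still on the list
cluster.remove("D")                     # 3 voters >= majority(4) == 3, so D is voted out
print(majority(len(cluster.members)))   # 2
```

If a second node had gone offline before D was voted out, only 2 of the 4 listed members could vote, which is below the majority of 3, so the removal could not happen.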

The minimum core cluster size at runtime (default 3) means the membership list cannot shrink below 3 members. If one node of a 3-node cluster leaves or becomes unreachable, no vote is held to remove it, since that would shrink the membership list below the minimum. The node that left therefore still counts as a cluster member, even while it is offline.
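The floor described above can be added to the same kind of toy model. Again, this is only an illustrative sketch: `try_remove` and `MIN_SIZE_AT_RUNTIME` are hypothetical names, and the default of 3 is taken from the text above.

```python
def majority(member_count: int) -> int:
    return member_count // 2 + 1


MIN_SIZE_AT_RUNTIME = 3  # hypothetical name; 3 is the default mentioned above


def try_remove(members: set, online: set, node: str, min_size: int = MIN_SIZE_AT_RUNTIME) -> bool:
    """Remove `node` from `members` in place if a removal vote would be held and would pass."""
    if node not in members:
        return False
    if len(members) - 1 < min_size:
        return False  # would shrink the list below the floor, so no vote is held
    if len(members & online) < majority(len(members)):
        return False  # not enough reachable members to win the vote
    members.discard(node)
    return True


members = {"A", "B", "C", "D", "E"}
online = {"A", "B", "C"}                       # D and E have gone away
try_remove(members, online, "E")               # 5 -> 4 members
try_remove(members, online, "D")               # 4 -> 3 members
online.discard("C")                            # now C becomes unreachable too
print(try_remove(members, online, "C"))        # False: the list stays at 3
print(sorted(members), majority(len(members)))  # ['A', 'B', 'C'] 2
```

Because C stays on the list, the majority stays at 2, and A and B together can still reach it.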

This is also why, when you lose quorum (for example, 2 out of 3 cluster members offline), the only way to regain it is to bring back one of those offline members. You cannot add a brand new node to the cluster to recover, because quorum is required to vote in new cluster members.
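The recovery rule above follows directly from the quorum check: a node that is not on the list does not count, and getting it onto the list needs a vote that cannot succeed. A small sketch (the `has_quorum` helper is hypothetical, for illustration only):

```python
def majority(member_count: int) -> int:
    return member_count // 2 + 1


def has_quorum(members: set, online: set) -> bool:
    # Only nodes that are on the membership list count toward the majority.
    return len(members & online) >= majority(len(members))


members = {"A", "B", "C"}
online = {"A"}                                  # B and C are offline

print(has_quorum(members, online))              # False: 1 < 2
# A brand new node D is running, but it is not on the list, so it does not help,
# and it cannot be voted onto the list because voting itself needs quorum.
print(has_quorum(members, online | {"D"}))      # False
# Bringing back one of the original members restores the majority.
print(has_quorum(members, online | {"B"}))      # True
```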

Note that this is all by design in the Raft protocol, and this behavior is critical to maintaining data durability in the cluster. If brand new cluster members could be added after quorum was lost, that would be a very fast way to regain write capability, but it would risk scenarios that lose previous commits to the database.
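The durability argument rests on quorum overlap: any two majorities of the same membership list share at least one member, so a commit acknowledged by a majority is known to someone in every later majority. The sketch below checks that property for 3 members, then shows how the hypothetical shortcut of adding members without a quorum vote breaks it. It is a simplification of Raft (it ignores log comparison during elections, for instance), and the helper names are illustrative only.

```python
from itertools import combinations


def majority(member_count: int) -> int:
    return member_count // 2 + 1


def majorities(members):
    """Yield every subset of `members` that is large enough to be a majority."""
    needed = majority(len(members))
    for size in range(needed, len(members) + 1):
        for combo in combinations(sorted(members), size):
            yield frozenset(combo)


members = {"A", "B", "C"}
# Every pair of majorities of the same list overlaps in at least one member.
overlap_ok = all(a & b for a in majorities(members) for b in majorities(members))
print(overlap_ok)  # True

# Hypothetical unsafe shortcut: A and B acknowledged a commit, then both went offline.
acked = {"A", "B"}
# If D and E were added without a quorum vote, {C, D, E} would be a majority
# of the new 5-member list, yet none of its members ever saw that commit.
new_members = members | {"D", "E"}
new_majority = {"C", "D", "E"}
print(len(new_majority) >= majority(len(new_members)))  # True
print(new_majority & acked)                             # set()
```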
