Which types of clustering does Neo4j support?

Hello Team,

As I am learning Neo4j, I need to configure a Neo4j application in a cluster.
Please suggest which types of clustering can be achieved with Neo4j, for example from the options below:

  1. RED HAT CLUSTERING ( ACTIVE / ACTIVE 2 NODE CLUSTERING OR ACTIVE / PASSIVE 2 NODE CLUSTERING)

  2. VERITAS CLUSTERING ( ACTIVE / ACTIVE 2 NODE CLUSTERING OR ACTIVE / PASSIVE 2 NODE CLUSTERING)

  3. CAUSAL CLUSTERING ( ACTIVE / ACTIVE 2 NODE CLUSTERING OR ACTIVE / PASSIVE 2 NODE CLUSTERING)

Please also suggest which clustering approach is preferred for Enterprise Edition, and why.

Regards
AK

At this point in time Causal Clustering is the approach to use (the only other option is the legacy HA clustering approach, which is deprecated and will not be present in Neo4j 4.0 and up).

Causal Clustering is built on the Raft protocol: only a single member of the cluster is the leader at any time, and only the leader can accept writes. A majority quorum of online core cluster members is required for commit operations (including accepting new members into the cluster or voting out members that are no longer responsive). The formula M = 2F + 1 is used, where M is the core cluster size needed to tolerate F simultaneous faults (failed core members) while maintaining write capability.
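To make the M = 2F + 1 arithmetic concrete, here is a small Python sketch. The function names are my own for illustration and are not part of any Neo4j API; it only restates the formula above.

```python
def cores_needed(faults: int) -> int:
    """M = 2F + 1: core members needed to tolerate `faults` simultaneous failures."""
    return 2 * faults + 1


def quorum(cluster_size: int) -> int:
    """Smallest number of core members that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_faults(cluster_size: int) -> int:
    """Largest F for which cluster_size >= 2F + 1 still holds."""
    return (cluster_size - 1) // 2


if __name__ == "__main__":
    # Print quorum size and fault tolerance for core cluster sizes 1 to 5.
    for m in range(1, 6):
        print(f"cores={m} quorum={quorum(m)} tolerated_faults={tolerated_faults(m)}")
```

Note that going from 3 to 4 core members raises the quorum size but not the number of tolerated faults, which is why odd core cluster sizes are the usual choice.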

Read replica instances can be attached to the cluster that will not participate in Raft commit operations but will replicate transactions from the core cluster members, and these can be used for horizontal scaling for servicing read queries.

Hello Andrew,

So, per your recommendation, Causal Clustering is preferred over legacy HA clustering (VCS or Red Hat clustering), since Neo4j 4.x will not support legacy clustering.
If we use legacy clustering with a Neo4j version earlier than 4.x, what is your recommendation?
Please also share some basic concepts of Causal Clustering.
I also need some test cases for testing Causal Clustering with 2 or possibly 3 nodes.

Please share your valuable thoughts.

Regards
AK

I don't know much about VCS or RED HAT clustering, so I can't say anything about how similar our clustering is to these.

We recommend Causal Clustering in nearly all situations. The only practical reason not to use Causal Clustering is if you're using an embedded Neo4j deployment, as Causal Clustering does not yet support that.

Among the reasons to avoid the legacy HA clustering is that split brain (branching data) can occur in the presence of network partitions. In Causal Clustering this is impossible.

The documentation I linked should provide examples of Causal Clustering and the Raft protocol. We recommend using 3 core node clusters at a minimum. This would allow you to tolerate a single failure while maintaining quorum and write capability. If an additional node is lost then the cluster will lose quorum and will change into a read-only mode until one of the two offline nodes is brought back online.

While it is possible to configure the cluster to allow a 2-node cluster, you will not be able to tolerate any failures without losing quorum and write capability.
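As a rough way to reason about the 3-node and 2-node scenarios before testing a real cluster, here is a toy Python model. It is only a sketch: the `CoreCluster` class is hypothetical, it assumes the set of voting core members stays fixed, and it does not talk to Neo4j at all.

```python
class CoreCluster:
    """Toy model of a core cluster with a fixed number of voting members."""

    def __init__(self, size: int) -> None:
        self.size = size      # configured number of core members
        self.online = size    # members currently reachable

    def fail(self) -> None:
        """Take one core member offline, if any are left."""
        if self.online > 0:
            self.online -= 1

    def recover(self) -> None:
        """Bring one offline core member back."""
        if self.online < self.size:
            self.online += 1

    def mode(self) -> str:
        """Writes need a strict majority of the configured core members."""
        if self.online == 0:
            return "offline"
        return "read-write" if self.online > self.size // 2 else "read-only"


if __name__ == "__main__":
    three = CoreCluster(3)
    three.fail()
    print("3 cores, 1 failed:", three.mode())   # quorum (2 of 3) still held
    three.fail()
    print("3 cores, 2 failed:", three.mode())   # quorum lost
    three.recover()
    print("3 cores, 1 back:", three.mode())     # quorum restored

    two = CoreCluster(2)
    two.fail()
    print("2 cores, 1 failed:", two.mode())     # 1 of 2 is not a majority
```

The same sequence (stop one core, check writes, stop a second core, check writes, restart one) can serve as a checklist when exercising an actual cluster.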

"While it is possible to configure the cluster to allow a 2-node cluster, you will not be able to tolerate any failures without losing quorum and write capability."

Wouldn't it be possible to modify the Raft algorithm so that in case there are only 2 nodes then a majority is not required to select a leader, but one of the nodes is simply selected as the master, and the other one as a slave?

You can modify Raft for your own implementation if you decide to code your own clustering solution...but you cannot change how Neo4j implements Raft. Majority quorum is required for leader election, write capability, and vote-in/vote-out.

Also, just to point out: with 2 of 3 nodes online the cluster still has quorum. It's only when you lose one of those two nodes (or communication between them is interrupted) that quorum is lost and the cluster shifts into read-only mode until quorum is restored.

Note again that this behavior is not some arbitrary decision to try to work around; it is critical to maintaining durability and data integrity in the cluster. Drop that requirement, and you introduce possibilities for split-brain and data loss.

For example, assume a network partition cuts communication between the two nodes. Under your proposal, both can potentially declare themselves leader of their now single-node cluster and begin accepting writes. Depending on how queries are dispatched, you could have cases where you successfully commit on one node, then your next request goes to the other node, where your tx was never processed. And once both nodes get back in touch with each other, there is no means of resolving the inconsistencies; you will have to drop the data from one or the other. To avoid that messiness and to maintain data integrity, we stick with the Raft implementation.
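The partition scenario above can be sketched in a few lines of Python. This is an illustrative toy, not how Neo4j is implemented: the node names, transaction labels, and the `require_majority` switch are all made up for the example.

```python
def can_write(reachable: int, cluster_size: int, require_majority: bool) -> bool:
    """`reachable` counts the node itself plus the peers it can still contact."""
    if require_majority:
        return reachable > cluster_size // 2   # strict majority, as in Raft
    return reachable >= 1                      # proposed rule: any live node may lead


def simulate_partition(require_majority: bool) -> dict:
    """Two nodes start in sync, get partitioned, then each receives one write."""
    logs = {"A": ["tx1"], "B": ["tx1"]}
    # During the partition each node can reach only itself: 1 of 2 members.
    for node, tx in (("A", "tx2-from-A"), ("B", "tx2-from-B")):
        if can_write(reachable=1, cluster_size=2, require_majority=require_majority):
            logs[node].append(tx)
    return logs


if __name__ == "__main__":
    for rule in (True, False):
        logs = simulate_partition(require_majority=rule)
        diverged = logs["A"] != logs["B"]
        print(f"require_majority={rule}: logs={logs} diverged={diverged}")
```

With the majority rule both nodes refuse the writes and stay consistent; without it each node commits a different transaction, which is the split-brain case.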

Thank you for a very good answer! It makes sense. I didn't think about the split-brain potential.