Data storage among the Read Replicas


(Abhishek2271) #1

Hi Guys,

I am new to Neo4j and am finding the graph platform really interesting.
I went through the operations manual and the details about clustering, but I could not find how data is distributed among the read replicas.
The manual says that each read replica maintains a transaction log, so I guess each read replica stores all of the data. But does it make sense for all read replicas to hold all of the data? Or am I understanding this wrong?


(Andrew Bowman) #2

There isn't sharding here; all nodes in the cluster (including read replicas) contain a copy of the entire graph. The goal of the read replicas is to further scale out read transactions by distributing them among the available followers and read replica nodes.


(Abhishek2271) #3

Thanks a lot :slight_smile:
I guess that makes sense.
One more question: when the application sends a query to a Core Server, how does the system choose which core server (the leader) to hand the query over to? How is the leader selected? Is the leader the same for every instance? The manual says that a server is a leader for a particular instance, so I was a bit confused about how the leader is selected for a particular instance.

Also, does this mean that transactions are distributed across the core members as well? I thought that the core members were there only for redundancy of data.


(Andrew Bowman) #4

You may want to review our causal clustering docs if you haven't already. A causal cluster only has a single leader at a time, chosen by an election among the core nodes of the cluster (this uses the Raft protocol). The leader is the only node in the cluster allowed to process write queries. Transaction commits occur via the Raft protocol, which propagates them from the leader to the other nodes in the cluster.

Routing is only used with the bolt+routing protocol, and routing decisions happen at the driver level on the client, not on the server. The server does provide the routing table to the client/driver on initial connection, and updates the driver with routing table changes, but each client driver handles the actual routing (here's the documentation; we'll be surfacing relevant parts of this to the causal clustering docs so it's more visible).

In the client code, if a transaction or session is explicitly set as a READ transaction it will be routed to one of the follower or read replica nodes, and this covers the horizontal scaling for reads. Otherwise (by default or explicitly set as a WRITE transaction) the query will be routed to the leader.
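To make the client-side routing a bit more concrete, here is a toy sketch in Python. It is not the driver's actual code: `RoutingTable`, `ToyRouter`, the server names, and the round-robin choice among readers are all made up for illustration. It only shows the idea that the client holds a routing table and picks a server based on the access mode.

```python
import itertools
from dataclasses import dataclass


@dataclass
class RoutingTable:
    """Hypothetical stand-in for the routing table the server hands to the driver."""
    leader: str
    followers: list
    read_replicas: list


class ToyRouter:
    """Illustrative client-side router; real drivers are more involved."""

    def __init__(self, table):
        self.table = table
        # Assumption: plain round-robin over followers + read replicas.
        self._readers = itertools.cycle(table.followers + table.read_replicas)

    def route(self, access_mode="WRITE"):
        # READ work goes to a follower or read replica.
        if access_mode == "READ":
            return next(self._readers)
        # Default and explicit WRITE work goes to the leader.
        return self.table.leader


table = RoutingTable(
    leader="core1",
    followers=["core2", "core3"],
    read_replicas=["replica1"],
)
router = ToyRouter(table)
print(router.route())        # leader
print(router.route("READ"))  # one of the followers / read replicas
```

Note that no election happens per query here: the router just looks up the current leader in the table it already has.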

The core members participate in the Raft protocol, receiving graph updates from the leader, and also participating in leader elections if the current leader goes offline or needs to step down. So core nodes are all capable of becoming the leader should they win a Raft leader election. This allows the cluster to gracefully handle a number of node/leader failures: writes continue as long as a quorum of core nodes is maintained, and only reads are allowed if quorum is lost.
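As a rough illustration of the quorum rule, here is a small Python sketch. It assumes a fixed set of core members and the usual Raft majority (more than half of the cores); the function names are hypothetical and this ignores any runtime resizing of the cluster.

```python
def quorum_size(core_count):
    """Smallest number of core nodes that forms a majority."""
    return core_count // 2 + 1


def can_write(core_count, cores_online):
    """Writes need a majority of core nodes; read replicas do not count."""
    return cores_online >= quorum_size(core_count)


# A 3-core cluster tolerates one core failure, a 5-core cluster two.
for cores, online in [(3, 3), (3, 2), (3, 1), (5, 3), (5, 2)]:
    mode = "read-write" if can_write(cores, online) else "read-only"
    print(f"{online}/{cores} cores online: {mode}")
```

So with 3 cores you can lose 1 and keep writing, and with 5 cores you can lose 2.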


(Abhishek2271) #5

Thanks for clearing this up. I was mainly confused because I thought a leader election happened for every query from the client.
Thanks again for the answer.