We have an experimental Multi-DC Causal Cluster setup with two Cores.
For certain reasons the LEADER is pinned to one of them; the other is the FOLLOWER.
Our policies:
causal_clustering.load_balancing.config.server_policies.usa=\
groups(us);
causal_clustering.load_balancing.config.server_policies.europe=\
groups(eu);
We want each machine to connect to its local instance whenever possible, and to use the remote instance otherwise.
We are using the Node.js neo4j driver; the server version is 3.5.8 on both machines.
We do not use explicit transactions yet, but we do specify driver.session(neo4j.session.READ) for sessions used only for reads.
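To illustrate, here is a minimal sketch of how our sessions behave (the driver lines are shown as comments so the sketch runs standalone; credentials and hostnames are placeholders). In driver 1.x, neo4j.session.READ is just the string 'READ', which we mirror here:

```javascript
// Sketch only: mirrors neo4j-driver 1.x access-mode constants so it runs
// without the driver installed.
const READ = 'READ';
const WRITE = 'WRITE';

// With the real driver this would be roughly:
//   const neo4j = require('neo4j-driver').v1;
//   const driver = neo4j.driver('bolt+routing://10.10.1.1:7687?policy=usa',
//                               neo4j.auth.basic('user', 'password'));
//   const readSession = driver.session(neo4j.session.READ);

// Sessions opened without an access mode default to WRITE and are routed
// to the LEADER, which matches what we observe for unspecified queries.
function accessModeFor(mode) {
  return mode === READ ? READ : WRITE; // unspecified → WRITE → leader
}

console.log(accessModeFor(READ));      // READ
console.log(accessModeFor(undefined)); // WRITE
```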
This way, on the FOLLOWER machine, all READ-marked queries correctly go to the local FOLLOWER instance, while unspecified queries go to the LEADER instance (again correctly).
The problem is on the LEADER machine, where the READ queries apparently hit the remote instance (i.e., the FOLLOWER).
For now, I can work around the issue by disabling bolt routing on the LEADER instance. However, we plan to scale this up properly with more complicated nested regional policies, and I wonder whether we will run into similar issues then as well.
The bolt URIs we're using are
bolt+routing://<ip_of_local_instance>:7687?policy=usa
bolt+routing://<ip_of_local_instance>:7687?policy=europe
which, apart from the problem I described, work like a charm.
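The URIs follow a simple per-region pattern; a hypothetical helper (names and IPs from our setup, the function itself is just for illustration) would look like:

```javascript
// Hypothetical helper: build the per-region bolt+routing URI.
// The policy query parameter must match a server policy name from
// causal_clustering.load_balancing.config.server_policies.*
function routingUri(host, policy) {
  return `bolt+routing://${host}:7687?policy=${policy}`;
}

console.log(routingUri('10.10.1.1', 'usa'));    // bolt+routing://10.10.1.1:7687?policy=usa
console.log(routingUri('10.20.1.1', 'europe')); // bolt+routing://10.20.1.1:7687?policy=europe
```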
I went through all the docs twice and double- and triple-checked all our configs, the group assignments, and the policy definitions, but couldn't identify any mistake.
We suspect we are misunderstanding something fundamental about how routing works.
Our routing table looks like this (10.10.1.1 is the LEADER, 10.20.1.1 is the FOLLOWER):
ttl server.role server.addresses
300 "WRITE" ["10.10.1.1:7687"]
300 "READ" ["10.20.1.1:7687"]
300 "ROUTE" ["10.20.1.1:7687", "10.10.1.1:7687"]
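Note that in this table the READ row lists only the FOLLOWER. A simplified sketch of how a routing driver picks a server from such a table (the real driver load-balances across the row, but the candidate set is the same) shows why READ sessions from the LEADER machine end up on the remote FOLLOWER:

```javascript
// Simplified model of the routing table above (from our cluster).
const routingTable = {
  WRITE: ['10.10.1.1:7687'],
  READ:  ['10.20.1.1:7687'],
  ROUTE: ['10.20.1.1:7687', '10.10.1.1:7687'],
};

// Sketch: the driver only ever chooses among the addresses in the row
// matching the session's access mode. (Real drivers round-robin or pick
// the least-connected server; we just take the first candidate here.)
function pickServer(table, accessMode) {
  const candidates = table[accessMode];
  if (!candidates || candidates.length === 0) {
    throw new Error(`no servers available for ${accessMode}`);
  }
  return candidates[0];
}

console.log(pickServer(routingTable, 'READ'));  // 10.20.1.1:7687 — the FOLLOWER, even from the LEADER machine
console.log(pickServer(routingTable, 'WRITE')); // 10.10.1.1:7687 — the LEADER
```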
Thanks for any help; we are out of ideas.