We have an experimental Multi-DC Causal Cluster setup with two Cores.
For certain reasons the LEADER is pinned to one of them; the other is the FOLLOWER.
Our policies:
causal_clustering.load_balancing.config.server_policies.usa=\
groups(us);
causal_clustering.load_balancing.config.server_policies.europe=\
groups(eu);
We want each machine to connect to its local instance whenever possible, and to use the remote instance otherwise.
We are using the Node.js neo4j driver; the server version is 3.5.8 on both machines.
We do not use explicit transactions yet, but we do specify driver.session(neo4j.session.READ) for sessions used only for reads.
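To illustrate, here is a minimal sketch of how our sessions behave (the driver lines are shown as comments so the sketch runs standalone; credentials and hostnames are placeholders). In driver 1.x, neo4j.session.READ is just the string 'READ', which we mirror here:

```javascript
// Sketch only: mirrors neo4j-driver 1.x access-mode constants so it runs
// without the driver installed.
const READ = 'READ';
const WRITE = 'WRITE';

// With the real driver this would be roughly:
//   const neo4j = require('neo4j-driver').v1;
//   const driver = neo4j.driver('bolt+routing://10.10.1.1:7687?policy=usa',
//                               neo4j.auth.basic('user', 'password'));
//   const readSession = driver.session(neo4j.session.READ);

// Sessions opened without an access mode default to WRITE and are routed
// to the LEADER, which matches what we observe for unspecified queries.
function accessModeFor(mode) {
  return mode === READ ? READ : WRITE; // unspecified → WRITE → leader
}

console.log(accessModeFor(READ));      // READ
console.log(accessModeFor(undefined)); // WRITE
```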
This way, on the FOLLOWER machine, all READ-marked queries correctly go to the local FOLLOWER instance, while unspecified queries go to the LEADER instance (again correctly).
The problem is on the LEADER machine, where the READ queries apparently hit the remote instance (i.e., the FOLLOWER).
For now, I can work around the issue by disabling bolt routing on the LEADER instance. However, we plan to scale this up properly with more complicated nested regional policies, and I wonder whether we will run into similar issues then as well.
The bolt URIs we're using are
bolt+routing://<ip_of_local_instance>:7687?policy=usa
bolt+routing://<ip_of_local_instance>:7687?policy=europe
which, apart from the problem I described, work like a charm.
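The URIs follow a simple per-region pattern; a hypothetical helper (names and IPs from our setup, the function itself is just for illustration) would look like:

```javascript
// Hypothetical helper: build the per-region bolt+routing URI.
// The policy query parameter must match a server policy name from
// causal_clustering.load_balancing.config.server_policies.*
function routingUri(host, policy) {
  return `bolt+routing://${host}:7687?policy=${policy}`;
}

console.log(routingUri('10.10.1.1', 'usa'));    // bolt+routing://10.10.1.1:7687?policy=usa
console.log(routingUri('10.20.1.1', 'europe')); // bolt+routing://10.20.1.1:7687?policy=europe
```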
I went through all the docs twice and double- and triple-checked all our configs, the group assignments, and the policy definitions, but couldn't identify any mistake.
We suspect we are misunderstanding something fundamental about how routing works.
Our routing table looks like this (10.10.1.1 is the LEADER, 10.20.1.1 is the FOLLOWER):
ttl server.role server.addresses
300 "WRITE" ["10.10.1.1:7687"]
300 "READ" ["10.20.1.1:7687"]
300 "ROUTE" ["10.20.1.1:7687", "10.10.1.1:7687"]
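Note that in this table the READ row lists only the FOLLOWER. A simplified sketch of how a routing driver picks a server from such a table (the real driver load-balances across the row, but the candidate set is the same) shows why READ sessions from the LEADER machine end up on the remote FOLLOWER:

```javascript
// Simplified model of the routing table above (from our cluster).
const routingTable = {
  WRITE: ['10.10.1.1:7687'],
  READ:  ['10.20.1.1:7687'],
  ROUTE: ['10.20.1.1:7687', '10.10.1.1:7687'],
};

// Sketch: the driver only ever chooses among the addresses in the row
// matching the session's access mode. (Real drivers round-robin or pick
// the least-connected server; we just take the first candidate here.)
function pickServer(table, accessMode) {
  const candidates = table[accessMode];
  if (!candidates || candidates.length === 0) {
    throw new Error(`no servers available for ${accessMode}`);
  }
  return candidates[0];
}

console.log(pickServer(routingTable, 'READ'));  // 10.20.1.1:7687 — the FOLLOWER, even from the LEADER machine
console.log(pickServer(routingTable, 'WRITE')); // 10.10.1.1:7687 — the LEADER
```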
Thanks for any help; we are out of ideas.