Hi everyone. I'm a DevOps engineer at a startup, and I'm currently experimenting with an architecture for a Neo4j database. Basically, I want two Causal Clusters of 3 nodes each, only one of which is running at any given time (i.e. the two clusters, let's call them A and B, are mutually exclusive).

I also want to keep the data separate from the clusters' VMs, so I set up an NFS server that exports directories for the following locations: /var/lib/neo4j/data, /var/lib/neo4j/import, /var/lib/neo4j/logs, /var/lib/neo4j/metrics. On top of that, I want the data to be shared between the two clusters: if I turn cluster A off and turn cluster B on, I expect the data that was added while cluster A was on to still be there when I query through cluster B.

I built a proof of concept of this architecture, and at first it seemed to work: I logged into the DB with cypher-shell from cluster A, changed the password (since the data location had changed), added a node, no issues at all.

Then I turned cluster A off and cluster B on to see if it worked the way I wanted, and ran into problems. First, I couldn't connect to the DB with cypher-shell using the neo4j scheme; I had to fall back to bolt. I quickly checked whether the data added from cluster A was present, and to my delight it was. What happened next was very disturbing, though: in the output of :show databases, all the VMs in cluster B were FOLLOWER for all databases, which means no writes could be processed at all. I then turned off cluster B and turned cluster A back on to see if that was still working, but the situation was exactly the same (all followers, no writes allowed).

Can somebody help me figure out what's going on? Is there a way to trigger a leader election manually? Or is there a better solution that I'm not seeing? Is what I want to achieve even possible?
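In case it helps anyone reproduce the check, here is a minimal Python sketch of how the "is there a leader?" test could be scripted instead of eyeballing the role column. It assumes a Neo4j 4.x cluster, where SHOW DATABASES returns one row per database per member with name and role columns, and it uses the official neo4j Python driver. The function names, the URI and the sample rows are hypothetical placeholders I made up for illustration; they are not output from my clusters.

```python
from collections import defaultdict


def databases_without_leader(rows):
    """Return the sorted names of databases for which no member reports LEADER.

    `rows` is an iterable of dicts with at least "name" and "role" keys,
    mirroring the columns of SHOW DATABASES (assumption: Neo4j 4.x).
    """
    roles = defaultdict(set)
    for row in rows:
        roles[row["name"]].add(str(row["role"]).upper())
    return sorted(name for name, seen in roles.items() if "LEADER" not in seen)


def fetch_rows(uri, user, password):
    """Run SHOW DATABASES against the system database and return plain dicts.

    The driver is imported lazily so the helper above works without it.
    A bolt:// URI talks to a single member directly, which matters here
    because the neo4j:// scheme did not work for me.
    """
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session(database="system") as session:
            return [record.data() for record in session.run("SHOW DATABASES")]
    finally:
        driver.close()


if __name__ == "__main__":
    # Hypothetical rows for illustration only, NOT real output.
    sample = [
        {"name": "neo4j", "address": "vm1:7687", "role": "follower"},
        {"name": "neo4j", "address": "vm2:7687", "role": "follower"},
        {"name": "neo4j", "address": "vm3:7687", "role": "follower"},
        {"name": "system", "address": "vm1:7687", "role": "leader"},
        {"name": "system", "address": "vm2:7687", "role": "follower"},
        {"name": "system", "address": "vm3:7687", "role": "follower"},
    ]
    print(databases_without_leader(sample))  # prints ['neo4j']
```

Against a real cluster the idea would be to call something like databases_without_leader(fetch_rows("bolt://<one-of-the-VMs>:7687", "neo4j", "<password>")); an empty list would mean every database has a leader somewhere.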
I'm sorry for hitting you with a wall of text and no actual outputs or diagrams. I don't have time to draw the diagrams and fetch the logs right now, but I'll post them tomorrow if that would help in understanding the architecture I'm trying to create and in solving the problem.
Thanks in advance for any help with this.