Today our AWS-hosted Enterprise Causal Cluster hung (we couldn't run any queries against any of the three nodes). We spent 5 hours trying to recover it and eventually resorted to unbinding all nodes in the cluster. Unfortunately, the cluster is now failing to restart with errors like:
2021-10-12 23:59:40.066+0000 WARN [a.r.a.InboundHandshake$anon$2] Dropping Handshake Request from [akka://cc-discovery-actor-system@<redacted>.compute.internal:5000#-<numbers>] addressed to unknown local address [akka://cc-discovery-actor-system@<private_dns_name>:5000]. Local address is [akka://cc-discovery-actor-system@<redacted>.compute.internal:5000]. Check that the sending system uses the same address to contact recipient system as defined in the 'akka.remote.artery.canonical.hostname' of the recipient system. The name of the ActorSystem must also match.
Can anyone suggest what this means and which configuration to check?
The cluster was running fine until it hung. The nodes can resolve each other, but it seems like there's a hostname check that's rejecting cluster formation attempts.
What do the names
[akka://cc-discovery-actor-system@<redacted>.compute.internal:5000#-<numbers>]
and
[akka://cc-discovery-actor-system@<private_dns_name>:5000]
correspond to, where do they come from, and what might need changing so they align?
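
To make the mismatch easier to see, here is a minimal sketch of how I'm reading the warning. It is my own throwaway Python helper, not part of Neo4j or Akka. Going by the wording of the message, I'm assuming the first address is the sending node, the second is the address the sender used to reach this node, and the third is what this node considers its own address. The function name and the host names in the sample line are made up for illustration.

```python
import re

# Matches addresses of the form akka://<system>@<host>:<port>
# (any trailing "#<uid>" is ignored).
AKKA_ADDR = re.compile(
    r"akka://(?P<system>[^@\]]+)@(?P<host>[^:\]]+):(?P<port>\d+)"
)


def parse_handshake_warning(line):
    """Extract sender, addressed-to and local addresses from one warning line.

    Hypothetical helper: assumes the addresses appear in that order,
    as they do in the warning quoted above. Returns None if fewer
    than three addresses are found.
    """
    addrs = [m.groupdict() for m in AKKA_ADDR.finditer(line)]
    if len(addrs) < 3:
        return None
    sender, addressed_to, local = addrs[:3]
    return {
        "sender": sender,
        "addressed_to": addressed_to,
        "local": local,
        # The handshake is dropped when these two differ.
        "mismatch": addressed_to != local,
    }


# Sample line with invented host names, shaped like the real warning.
SAMPLE = (
    "WARN Dropping Handshake Request from "
    "[akka://cc-discovery-actor-system@ip-10-0-0-2.example.internal:5000#-123] "
    "addressed to unknown local address "
    "[akka://cc-discovery-actor-system@neo4j-1.example.internal:5000]. "
    "Local address is "
    "[akka://cc-discovery-actor-system@ip-10-0-0-1.example.internal:5000]."
)

if __name__ == "__main__":
    result = parse_handshake_warning(SAMPLE)
    print("addressed to:", result["addressed_to"]["host"])
    print("local       :", result["local"]["host"])
    print("mismatch    :", result["mismatch"])
```

In our real log the "addressed to" host is the private DNS name while the "local" host is the `.compute.internal` name, so the comparison fails in the same way as in this sample.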