😱 Read Replica Won't Start 😱

Hey folks. I have set up a causal cluster on GCloud using Terraform. We have a single read replica that runs backups.

On dev: Read replica starts normally.

On prod: it never reaches a fully started state.

Logs:

Sep 06 23:25:42 neo4j-prod-read-backup neo4j[12437]: 2022-09-06 23:25:42.562+0000 WARN The 'dbms.security.procedures.unrestricted' setting is overridden. Setting value changed from 'apoc.*' to 'apoc.*'.
Sep 06 23:25:42 neo4j-prod-read-backup neo4j[12437]: 2022-09-06 23:25:42.569+0000 WARN The 'dbms.security.procedures.allowlist' setting is overridden. Setting value changed from 'apoc.*' to 'apoc.*'.
Sep 06 23:25:42 neo4j-prod-read-backup neo4j[12437]: 2022-09-06 23:25:42.570+0000 INFO Note that since you did not explicitly set the port in causal_clustering.discovery_advertised_address Neo4j automatically set it to 5000 to match causal_clustering.discovery_listen_address. This behavior may change in the future and we recommend you to explicitly set it.
Sep 06 23:25:42 neo4j-prod-read-backup neo4j[12437]: 2022-09-06 23:25:42.575+0000 INFO Starting...
Sep 06 23:25:44 neo4j-prod-read-backup neo4j[12437]: 2022-09-06 23:25:44.241+0000 INFO This instance is ServerId{3f3a91ea} (3f3a91ea-a379-4684-9321-8ed4abbb6baf)
Sep 06 23:25:48 neo4j-prod-read-backup neo4j[12437]: 2022-09-06 23:25:48.690+0000 INFO ======== Neo4j 4.4.9 ========
^-- hangs here

The only difference I can see between the two envs is the IP addresses that get written to neo4j.conf via Terraform. I checked that the ports are open via nmap.

What can I do to debug further?

I'm a long-time Neo4j user, but I don't have much experience with Causal Clustering.

For the failing Akka discovery message below, I can both ping the host and reach it with nmap on port 5000:

2022-09-08 00:47:51.641+0000 ERROR [a.e.DummyClassForStringSources] Outbound message stream to [akka://cc-discovery-actor-system@10.142.15.230:5000] failed. Restarting it. The connection has been aborted
akka.stream.StreamTcpException: The connection has been aborted
2022-09-08 00:47:51.661+0000 WARN [a.c.c.ClusterClient] Receptionist reconnect not successful within Some(10 seconds) stopping cluster client
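
For anyone debugging the same symptom, here is a minimal sketch of a port check that goes a bit further than a single nmap on 5000. It assumes the default Neo4j 4.x cluster ports (5000 discovery, 6000 transactions, 7000 raft); adjust if your neo4j.conf overrides them. The `check_port` helper and the script name are made up for illustration, and the address is just the one from the error above. Since Akka discovery may also need the core members to connect back to the replica, it is probably worth running this from both sides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Neo4j): report whether a TCP port accepts
# connections. Relies on bash's /dev/tcp redirection and coreutils' `timeout`.
check_port() {
  local host="$1" port="$2"
  # The inner bash exits non-zero if the connection is refused;
  # `timeout` kills it after 3 seconds if packets are silently dropped.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open   ${host}:${port}"
  else
    echo "closed ${host}:${port}"
    return 1
  fi
}

# Usage (assumed script name): ./check_ports.sh 10.142.15.230
# Checks the assumed default cluster ports on the given host.
if [ "$#" -ge 1 ]; then
  for port in 5000 6000 7000; do
    check_port "$1" "$port"
  done
fi
```

A "closed" result after the full 3 seconds usually points at a firewall dropping packets, while an immediate "closed" suggests nothing is listening on that port.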

Got this solved. Turned out to be firewall rules.

Hey, did you use these commands?
firewall-cmd --zone=public --add-port=5000/tcp --permanent
firewall-cmd --reload