Error while configuring cluster in neo4j

Siddarth · November 26, 2022, 4:50am

I am creating a cluster with 3 primaries on AWS EC2 instances. I have opened all the cluster-related ports in the security group and made all the relevant changes in the configuration file, but I get the error below when configuring it. Please assist.

Caused by: com.neo4j.causalclustering.seeding.FailedValidationException: The seed validation failed with response [RemoteSeedValidationResponse{status=FAILURE, remote=10.185.168.40:6000}, RemoteSeedValidationResponse{status=FAILURE, remote=10.185.168.45:6000}]
at com.neo4j.causalclustering.seeding.MinimumNumberOfValidRemotesRule.check(MinimumNumberOfValidRemotesRule.java:29) ~[neo4j-causal-clustering-5.2.0.jar:5.2.0]
at com.neo4j.causalclustering.seeding.Validation.validate(Validation.java:89) ~[neo4j-causal-clustering-5.2.0.jar:5.2.0]
at com.neo4j.causalclustering.seeding.SeedValidationProcess.validateSeed(SeedValidationProcess.java:62) ~[neo4j-causal-clustering-5.2.0.jar:5.2.0]
at com.neo4j.causalclustering.seeding.SeedValidationLifecycle.start(SeedValidationLifecycle.java:35) ~[neo4j-causal-clustering-5.2.0.jar:5.2.0]
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:353) ~[neo4j-common-5.2.0.jar:5.2.0]
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:92) ~[neo4j-common-5.2.0.jar:5.2.0]
at com.neo4j.dbms.database.EnterpriseDatabase.start(EnterpriseDatabase.java:47) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.database.DatabaseManager.startDatabase(DatabaseManager.java:179) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.database.DatabaseManager.forSingleDatabase(DatabaseManager.java:242) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.database.DatabaseManager.startDatabase(DatabaseManager.java:140) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.Transition$Prepared.doTransitionAction(Transition.java:94) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.Transition$Prepared.doTransition(Transition.java:83) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:257) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:258) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:258) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.doTransitionSteps(DbmsReconciler.java:245) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.executeJob(DbmsReconciler.java:203) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.lambda$scheduleReconciliationJob$2(DbmsReconciler.java:187) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.ReconcilerJob.executeJob(ReconcilerJob.java:49) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.ReconcilerJobManager$ReconciliationWorker.run(ReconcilerJobManager.java:192) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at org.neo4j.kernel.impl.scheduler.ThreadPool.lambda$asCallable$1(ThreadPool.java:136) ~[neo4j-kernel-5.2.0.jar:5.2.0]
at org.neo4j.kernel.impl.scheduler.ThreadPool.lambda$submit$0(ThreadPool.java:108) ~[neo4j-kernel-5.2.0.jar:5.2.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
2022-11-25 18:32:06.728+0000 INFO Neo4j Server shutdown initiated by request
2022-11-25 18:32:06.729+0000 INFO Stopped.

john.stegeman · November 26, 2022, 3:31pm

@Siddarth, are you starting all 3 servers at once? You can't start just one in this case, as it will fail to reach a quorum.

john.stegeman · November 27, 2022, 2:09pm

@Siddarth, clustering uses more than just port 6000. You'll want to check these settings in your neo4j.conf:

server.discovery.listen_address
server.cluster.listen_address
server.cluster.raft.listen_address
server.routing.listen_address

The defaults are 5000, 6000, 7000, and 7688, respectively.
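A minimal Python sketch of that check, assuming a plain `key=value` neo4j.conf: it reports which port each of the four settings above would use, falling back to the defaults just listed when a setting is absent. The helper names (`parse_conf`, `effective_ports`) are hypothetical, treating a host-only value as "keep the default port" is an assumption, and IPv6 literals are not handled.

```python
# Defaults for the four clustering listen addresses, as listed in the thread.
DEFAULT_PORTS = {
    "server.discovery.listen_address": 5000,
    "server.cluster.listen_address": 6000,
    "server.cluster.raft.listen_address": 7000,
    "server.routing.listen_address": 7688,
}


def parse_conf(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def effective_ports(text):
    """Map each clustering setting to the port it would listen on."""
    settings = parse_conf(text)
    ports = {}
    for key, default in DEFAULT_PORTS.items():
        value = settings.get(key)
        # "0.0.0.0:5000" or ":5000" carries an explicit port; a missing
        # setting or a host-only value keeps the default (assumption).
        if value and ":" in value:
            ports[key] = int(value.rsplit(":", 1)[1])
        else:
            ports[key] = default
    return ports
```

Comparing the result across all three servers makes it easy to spot a node whose listener ended up on a different port than the one opened in the security group.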

Siddarth · November 27, 2022, 10:38am

Port 6000 has also been whitelisted at the firewall layer. I'm not sure what I am missing here; I followed the documentation.

Siddarth · November 27, 2022, 10:34am

Yes, I am starting all 3 instances at once.

john.stegeman · November 28, 2022, 2:46pm

Connection refused... that seems like a networking issue if those other servers are up.
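As a rule of thumb, "Connection refused" means the attempt reached the target host and was rejected because nothing was accepting connections on that address and port, whereas traffic blocked by an AWS security group is dropped silently and shows up as a timeout. A minimal sketch using only the Python standard library that tells the two apart; the `probe` helper is hypothetical.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or an error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process accepted the connection on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: typical of packets being silently dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or host unreachable.
        return f"error: {exc}"
```

Running it from each server against the other two, for example `probe("10.185.168.40", 5000)` for each of ports 5000, 6000, 7000 and 7688, while Neo4j is running in cluster mode would show whether the peers are actually listening at that moment.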

Siddarth · November 28, 2022, 12:23pm

@steggy Analyzing the debug logs further, I am getting the error below:

2022-11-28 12:15:40.601+0000 INFO [c.n.d.DbmsReconciler] Database 'system' is requested to transition from INITIAL{db=system/00000000} to STARTED{db=system/00000000} by Startup:0
2022-11-28 12:15:40.607+0000 INFO [c.n.d.d.DatabaseManager] Creating db='DatabaseId{00000000[system]}'. options='DatabaseOptions[settings={}, mode=RAFT]'
2022-11-28 12:15:40.689+0000 WARN [a.s.Materializer] [outbound connection to [akka://cc-discovery-actor-system@ip-10-185-168-56.ec2.internal:5000], control stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(ip-10-185-168-56.ec2.internal/:5000,None,List(),Some(5000 milliseconds),true)] failed because of java.net.ConnectException: Connection refused
2022-11-28 12:15:40.691+0000 WARN [a.s.Materializer] [outbound connection to [akka://cc-discovery-actor-system@ip-10-185-168-48.ec2.internal:5000], message stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(ip-10-185-168-48.ec2.internal/:5000,None,List(),Some(5000 milliseconds),true)] failed because of java.net.ConnectException: Connection refused
2022-11-28 12:15:40.695+0000 WARN [a.s.Materializer] [outbound connection to [akka://cc-discovery-actor-system@ip-10-185-168-48.ec2.internal:5000], control stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(ip-10-185-168-48.ec2.internal/:5000,None,List(),Some(5000 milliseconds),true)] failed because of java.net.ConnectException: Connection refused
2022-11-28 12:15:40.695+0000 WARN [a.s.Materializer] [outbound connection to [akka://cc-discovery-actor-system@ip-10-185-168-56.ec2.internal:5000], message stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(ip-10-185-168-56.ec2.internal/:5000,None,List(),Some(5000 milliseconds),true)] failed because of java.net.ConnectException: Connection refused
2022-11-28 12:15:40.792+0000 INFO [c.n.c.u.s.TypicallyConnectToRandomSecondaryStrategy] [system/00000000] Using upstream selection strategy typically-connect-to-random-secondary

Any idea what is causing this error?
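To see at a glance which peers refused the discovery connection, the WARN lines above can be reduced to unique host/port pairs. A minimal sketch, assuming the log format shown in this thread (the `Connect(host/ip:port, ...)` fragment, where the resolved IP part is empty here); the regex and the `refused_targets` name are illustrative, not part of Neo4j.

```python
import re

# Captures the host and port from "Connect(host/resolved-ip:port, ..." on lines
# that end in a "Connection refused" failure.
REFUSED = re.compile(r"Connect\(([^/,]+)/[^:,]*:(\d+),.*Connection refused")


def refused_targets(log_text):
    """Return sorted, de-duplicated (host, port) pairs that refused a connection."""
    targets = set()
    for line in log_text.splitlines():
        match = REFUSED.search(line)
        if match:
            targets.add((match.group(1), int(match.group(2))))
    return sorted(targets)
```

The resulting hostnames can then be compared with the addresses listed in `dbms.cluster.discovery.endpoints`.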

Siddarth · November 28, 2022, 5:34pm

@steggy Yes, all 3 servers are up and I have whitelisted the IPs in the security group. When running as a standalone instance, I am able to telnet to port 5000 and all the other ports. But when I make the changes in the configuration and start the cluster, I get this error. I'm not sure what is happening behind the scenes. I have also whitelisted the dynamic ports. There is no firewall running inside the servers either; the rules are handled in the security group alone.

Siddarth · November 28, 2022, 8:26am

@steggy Below is the configuration in my neo4j.conf file:

server.default_listen_address=0.0.0.0
server.discovery.listen_address=0.0.0.0:5000
dbms.cluster.discovery.endpoints=10.185.168.49:5000,10.185.168.40:5000,10.185.168.45:5000
server.cluster.listen_address=0.0.0.0:6000
server.cluster.raft.listen_address=0.0.0.0:7000
server.routing.listen_address=0.0.0.0:7688

Please correct me if I need to make any changes here. The Neo4j version is 5.2.0.
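`dbms.cluster.discovery.endpoints` takes a comma-separated list of `host:port` entries, as in the configuration above. A small sketch that splits such a value into `(host, port)` pairs, for example to feed a reachability check run from each server; the `parse_endpoints` helper is hypothetical, and filling in port 5000 for an entry without a port is an assumption made here, not documented Neo4j behaviour.

```python
def parse_endpoints(value, default_port=5000):
    """Split a comma-separated host[:port] list into (host, port) tuples."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if sep:
            endpoints.append((host, int(port)))
        else:
            # No explicit port: fall back to the assumed default.
            endpoints.append((item, default_port))
    return endpoints
```

With three primaries, the list is expected to hold three entries, and each server's own address should be one of them.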

john.stegeman · November 28, 2022, 1:25pm

They all look OK to me. Do you have all of those ports open in your security group?

Siddarth · November 28, 2022, 1:42pm

@steggy Yes, I have. The standalone instance works as expected and I can telnet to the host as well. But when I make the changes in the configuration and try to start the service, it fails with the error mentioned above.

Lanja_Ibonia · January 18, 2023, 10:47am

I have the same problem. Did you find a solution?

Related topics (category: replies, views, last activity)

Failed to config cluster neo4j v5 (General, migrated): 0 replies, 188 views, January 25, 2023
Causal Cluster not forming (Neo4j Graph Platform): 5 replies, 5686 views, October 18, 2018
Causal cluster - Neo4j not running but it is? (Neo4j Graph Platform): 7 replies, 2556 views, December 4, 2019
Cannot form the causal cluster on remote servers (Cluster): 1 reply, 1369 views, February 7, 2019
Set up cluster neo4J 5 (Drivers & Stacks, migrated): 0 replies, 332 views, January 26, 2023
