Error while configuring cluster in neo4j

Siddarth
Node

I am creating a cluster with 3 primaries on AWS EC2 instances. I have opened all the cluster-related ports in the security group and made all the relevant changes in the configuration file. I get the error below when starting the cluster. Please assist.

Caused by: com.neo4j.causalclustering.seeding.FailedValidationException: The seed validation failed with response [RemoteSeedValidationResponse{status=FAILURE, remote=10.185.168.40:6000}, RemoteSeedValidationResponse{status=FAILURE, remote=10.185.168.45:6000}]
at com.neo4j.causalclustering.seeding.MinimumNumberOfValidRemotesRule.check(MinimumNumberOfValidRemotesRule.java:29) ~[neo4j-causal-clustering-5.2.0.jar:5.2.0]
at com.neo4j.causalclustering.seeding.Validation.validate(Validation.java:89) ~[neo4j-causal-clustering-5.2.0.jar:5.2.0]
at com.neo4j.causalclustering.seeding.SeedValidationProcess.validateSeed(SeedValidationProcess.java:62) ~[neo4j-causal-clustering-5.2.0.jar:5.2.0]
at com.neo4j.causalclustering.seeding.SeedValidationLifecycle.start(SeedValidationLifecycle.java:35) ~[neo4j-causal-clustering-5.2.0.jar:5.2.0]
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:353) ~[neo4j-common-5.2.0.jar:5.2.0]
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:92) ~[neo4j-common-5.2.0.jar:5.2.0]
at com.neo4j.dbms.database.EnterpriseDatabase.start(EnterpriseDatabase.java:47) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.database.DatabaseManager.startDatabase(DatabaseManager.java:179) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.database.DatabaseManager.forSingleDatabase(DatabaseManager.java:242) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.database.DatabaseManager.startDatabase(DatabaseManager.java:140) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.Transition$Prepared.doTransitionAction(Transition.java:94) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.Transition$Prepared.doTransition(Transition.java:83) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:257) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:258) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:258) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.doTransitionSteps(DbmsReconciler.java:245) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.executeJob(DbmsReconciler.java:203) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.DbmsReconciler.lambda$scheduleReconciliationJob$2(DbmsReconciler.java:187) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.ReconcilerJob.executeJob(ReconcilerJob.java:49) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at com.neo4j.dbms.ReconcilerJobManager$ReconciliationWorker.run(ReconcilerJobManager.java:192) ~[neo4j-dbms-enterprise-5.2.0.jar:5.2.0]
at org.neo4j.kernel.impl.scheduler.ThreadPool.lambda$asCallable$1(ThreadPool.java:136) ~[neo4j-kernel-5.2.0.jar:5.2.0]
at org.neo4j.kernel.impl.scheduler.ThreadPool.lambda$submit$0(ThreadPool.java:108) ~[neo4j-kernel-5.2.0.jar:5.2.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
2022-11-25 18:32:06.728+0000 INFO Neo4j Server shutdown initiated by request
2022-11-25 18:32:06.729+0000 INFO Stopped.

11 REPLIES

steggy
Neo4j

@Siddarth are you starting all 3 servers at once? You can't just start one in this instance, as it will fail to reach a quorum.

Yes, I am starting all 3 instances at once.

Port 6000 has also been whitelisted at the firewall layer. I am not sure what I am missing here. I have followed the documentation.

steggy
Neo4j

@Siddarth, clustering uses more than just port 6000. You'll want to check these in your neo4j.conf:

server.discovery.listen_address
server.cluster.listen_address
server.cluster.raft.listen_address
server.routing.listen_address

The defaults would be 5000, 6000, 7000, and 7688, respectively.

@steggy, below is the configuration in my neo4j.conf file:

server.default_listen_address=0.0.0.0
server.discovery.listen_address=0.0.0.0:5000
dbms.cluster.discovery.endpoints=10.185.168.49:5000,10.185.168.40:5000,10.185.168.45:5000
server.cluster.listen_address=0.0.0.0:6000
server.cluster.raft.listen_address=0.0.0.0:7000
server.routing.listen_address=0.0.0.0:7688

Please correct me if I need to make any changes here. The Neo4j version is 5.2.0.
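For anyone reproducing this setup, it can help to list exactly which host:port pairs the configuration above expects to reach. The following is only a minimal sketch: `parse_conf` and `split_endpoints` are hypothetical helper names, not part of any Neo4j tooling, and the config text is copied from the post above.

```python
def parse_conf(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def split_endpoints(value):
    """Turn 'host:port,host:port' into a list of (host, port) tuples."""
    endpoints = []
    for item in value.split(","):
        host, _, port = item.strip().rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


# Settings as posted in this thread.
CONF = """
server.default_listen_address=0.0.0.0
server.discovery.listen_address=0.0.0.0:5000
dbms.cluster.discovery.endpoints=10.185.168.49:5000,10.185.168.40:5000,10.185.168.45:5000
server.cluster.listen_address=0.0.0.0:6000
server.cluster.raft.listen_address=0.0.0.0:7000
server.routing.listen_address=0.0.0.0:7688
"""

settings = parse_conf(CONF)
endpoints = split_endpoints(settings["dbms.cluster.discovery.endpoints"])
for host, port in endpoints:
    print(f"discovery endpoint: {host}:{port}")
```

Comparing this list with the addresses that appear in the logs is a quick way to spot a member that is being contacted under a different address than the one configured.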

They all look OK to me. Do you have all of those ports open in your security group?

@steggy Yes, I have. A standalone instance works as expected, and I am able to telnet to the host as well. But when I make the changes in the configuration and try to start the service, it fails with the error mentioned above.

Siddarth
Node

@steggy Analyzing the debug logs further, I see the errors below:

2022-11-28 12:15:40.601+0000 INFO [c.n.d.DbmsReconciler] Database 'system' is requested to transition from INITIAL{db=system/00000000} to STARTED{db=system/00000000} by Startup:0
2022-11-28 12:15:40.607+0000 INFO [c.n.d.d.DatabaseManager] Creating db='DatabaseId{00000000[system]}'. options='DatabaseOptions[settings={}, mode=RAFT]'
2022-11-28 12:15:40.689+0000 WARN [a.s.Materializer] [outbound connection to [akka://cc-discovery-actor-system@ip-10-185-168-56.ec2.internal:5000], control stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(ip-10-185-168-56.ec2.internal/<unresolved>:5000,None,List(),Some(5000 milliseconds),true)] failed because of java.net.ConnectException: Connection refused
2022-11-28 12:15:40.691+0000 WARN [a.s.Materializer] [outbound connection to [akka://cc-discovery-actor-system@ip-10-185-168-48.ec2.internal:5000], message stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(ip-10-185-168-48.ec2.internal/<unresolved>:5000,None,List(),Some(5000 milliseconds),true)] failed because of java.net.ConnectException: Connection refused
2022-11-28 12:15:40.695+0000 WARN [a.s.Materializer] [outbound connection to [akka://cc-discovery-actor-system@ip-10-185-168-48.ec2.internal:5000], control stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(ip-10-185-168-48.ec2.internal/<unresolved>:5000,None,List(),Some(5000 milliseconds),true)] failed because of java.net.ConnectException: Connection refused
2022-11-28 12:15:40.695+0000 WARN [a.s.Materializer] [outbound connection to [akka://cc-discovery-actor-system@ip-10-185-168-56.ec2.internal:5000], message stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(ip-10-185-168-56.ec2.internal/<unresolved>:5000,None,List(),Some(5000 milliseconds),true)] failed because of java.net.ConnectException: Connection refused
2022-11-28 12:15:40.792+0000 INFO [c.n.c.u.s.TypicallyConnectToRandomSecondaryStrategy] [system/00000000] Using upstream selection strategy typically-connect-to-random-secondary

Any idea on this error?

steggy
Neo4j

Connection refused... seems like a networking issue if those other servers are up.
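One way to narrow this down is to probe each cluster port from every member and note how the connection fails. As a general TCP rule of thumb (not something specific to Neo4j), "connection refused" usually means the host was reached but nothing was listening on that port at that address, while traffic dropped by a firewall or security group typically shows up as a timeout. The sketch below assumes plain IPv4 TCP; `probe` and `probe_cluster` are hypothetical helper names.

```python
import socket


def probe(host, port, timeout=3.0):
    """Attempt a TCP connection and classify the outcome."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The host answered, but no process accepted on this port.
        return "refused"
    except socket.timeout:
        # No answer at all; packets were probably dropped on the way.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route.
        return f"error: {exc}"
    finally:
        sock.close()


def probe_cluster(hosts, ports, timeout=3.0):
    """Probe every host/port pair and return {(host, port): status}."""
    return {(h, p): probe(h, p, timeout) for h in hosts for p in ports}
```

Running `probe_cluster(["10.185.168.49", "10.185.168.40", "10.185.168.45"], [5000, 6000, 7000, 7688])` from each server while the cluster is starting would show, per port, whether the failure is a refusal or a timeout.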

@steggy Yes, all 3 servers are up, and I have whitelisted the IPs in the security group. When running as a standalone instance, I am able to telnet to port 5000 and all the other ports. But when I make the changes in the configuration and start the cluster, I get this error. I am not sure what is happening behind the scenes. I have also whitelisted the dynamic ports. There is no firewall running inside the servers either; the rules are handled in the security group alone.

I have the same problem. Did you find a solution?