Can't form cluster using DNS as discovery type


(Henry) #1

If I use the default LIST discovery type and list the individual IPs of the 3 servers, the cluster forms successfully. I need to use DNS, but with it startup gets stuck at "Attempting to connect to the other cluster members before continuing...". Any ideas?

Nov 08 19:33:10 ip-172-31-13-183 neo4j[2011]: 2018-11-08 19:33:10.932+0000 INFO  Initiating metrics...
Nov 08 19:33:11 ip-172-31-13-183 neo4j[2011]: 2018-11-08 19:33:11.220+0000 INFO  Resolved initial host 'mycluster.mydomain.com:5000' to [172.31.13.183:5000, 172.31.3.18
Nov 08 19:33:11 ip-172-31-13-183 neo4j[2011]: 2018-11-08 19:33:11.282+0000 INFO  My connection info: [
Nov 08 19:33:11 ip-172-31-13-183 neo4j[2011]:         Discovery:   listen=0.0.0.0:5000, advertised=172.31.13.183:5000,
Nov 08 19:33:11 ip-172-31-13-183 neo4j[2011]:         Transaction: listen=0.0.0.0:6000, advertised=172.31.13.183:6000,
Nov 08 19:33:11 ip-172-31-13-183 neo4j[2011]:         Raft:        listen=0.0.0.0:7000, advertised=172.31.13.183:7000,
Nov 08 19:33:11 ip-172-31-13-183 neo4j[2011]:         Client Connector Addresses: bolt://172.31.13.183:7687,http://172.31.13.183:7474,https://172.31.13.183:7473
Nov 08 19:33:11 ip-172-31-13-183 neo4j[2011]: ]
Nov 08 19:33:11 ip-172-31-13-183 neo4j[2011]: 2018-11-08 19:33:11.290+0000 INFO  Discovering cluster with initial members: [mycluster.mydomain.com:5000]
Nov 08 19:33:11 ip-172-31-13-183 neo4j[2011]: 2018-11-08 19:33:11.290+0000 INFO  Attempting to connect to the other cluster members before continuing...

(Henry) #2

This is what shows up in syslog:

Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]: 2018-11-08 22:37:43.139+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@17ae7628' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@17ae7628' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]: org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@17ae7628' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:220)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:111)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:79)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at com.neo4j.server.enterprise.CommercialEntryPoint.main(CommercialEntryPoint.java:22)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]: Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@17ae7628' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:212)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     ... 3 more
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]: Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, /var/lib/neo4j/data/databases/graph.db
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:212)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.<init>(CommercialCoreGraphDatabase.java:35)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.<init>(CommercialCoreGraphDatabase.java:26)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at com.neo4j.server.enterprise.CommercialNeoServer.lambda$static$0(CommercialNeoServer.java:29)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:88)
Nov  8 22:37:43 ip-172-31-2-91 neo4j[4004]:     at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)

(M. David Allen) #3

When you use the DNS discovery type, Neo4j expects a single DNS name with multiple A records, one pointing to each of your individual cluster nodes.
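Conceptually, DNS discovery turns the one configured `host:port` entry into one member address per A record, which is what the "Resolved initial host ... to [...]" line in the first log shows. The Python below is a rough sketch of that expansion, not Neo4j's actual code: it assumes the operating system resolver (via `socket.getaddrinfo`) returns the same answers the JVM would see, and the function names are hypothetical.

```python
import socket
from typing import Callable, List


def resolve_ipv4(host: str) -> List[str]:
    """Return the distinct IPv4 addresses (A records) the OS resolver gives for host."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def expand_initial_member(member: str,
                          resolver: Callable[[str], List[str]] = resolve_ipv4) -> List[str]:
    """Hypothetical helper: turn one 'host:port' entry into one 'ip:port' per A record."""
    host, _, port = member.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {member!r}")
    # Every resolved address gets the same discovery port as the configured entry.
    return [f"{ip}:{port}" for ip in resolver(host)]
```

Run on one of the servers, `expand_initial_member("mycluster.mydomain.com:5000")` should return three `ip:port` entries if the record is set up as described; a single entry would mean the name only has one A record.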

Part of your log message is cut off. Can you paste the output of a DNS lookup for mycluster.mydomain.com, to verify that there are 3 A records behind that name?
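A quick way to do that check is to compare the DNS answer with the three IPs that already work with the LIST discovery type. `dig +short mycluster.mydomain.com A` prints the A records directly; the sketch below does the same comparison in Python, with hypothetical helper names and the same OS-resolver assumption as a plain lookup.

```python
import socket
from typing import Iterable, Set, Tuple


def a_records(host: str) -> Set[str]:
    """Distinct IPv4 addresses the OS resolver returns for host."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def diff_members(resolved: Iterable[str], expected: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected).

    missing:    IPs from the working LIST config that DNS does not return.
    unexpected: IPs DNS returns that are not in the LIST config.
    """
    got, want = set(resolved), set(expected)
    return want - got, got - want
```

For example, `diff_members(a_records("mycluster.mydomain.com"), ["<ip1>", "<ip2>", "<ip3>"])`, with the placeholders replaced by the IPs from your LIST config. Two empty sets mean DNS and LIST agree; anything else points at the record rather than at Neo4j.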

Also, please paste the part of your neo4j.conf that contains your cluster settings. Finally, you're showing logs from only one machine; what are the other two saying?
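If the conf file is long, a small filter can pull out just the active cluster lines to paste. This is a sketch: the prefixes `causal_clustering.` and `dbms.mode` are assumed from Neo4j 3.x setting names, so adjust them for your version.

```python
from typing import Iterable, List

# Assumed prefixes: Neo4j 3.x groups clustering settings under causal_clustering.*
# and sets the server role with dbms.mode.
CLUSTER_PREFIXES = ("causal_clustering.", "dbms.mode")


def cluster_settings(conf_lines: Iterable[str]) -> List[str]:
    """Return the active (uncommented) cluster-related lines from neo4j.conf."""
    picked = []
    for raw in conf_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out defaults
        if line.startswith(CLUSTER_PREFIXES):
            picked.append(line)
    return picked
```

Usage: `with open("/etc/neo4j/neo4j.conf") as f: print("\n".join(cluster_settings(f)))`. That path is the usual location for package installs, which the `/var/lib/neo4j` data directory in the log suggests, but check your own setup.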

Your debug.log file (generally in /var/log/neo4j) also contains cluster lifecycle messages. I'd check it as well, because it will tell you which of the 3 members are discovering which others.
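To compare the three machines, one option is to pull the member list out of the "Resolved initial host" startup line on each server and see whether they all agree. This Python sketch assumes the message format visible in the first log excerpt (that copy is cut off, so it would not match as pasted) and that the line is present in whichever log you feed it; whether debug.log carries it is an assumption. Function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Pattern based on the "Resolved initial host ... to [...]" line in the first log;
# assumes the bracketed list is on one line and not truncated.
RESOLVED = re.compile(r"Resolved initial host '([^']+)' to \[([^\]]*)\]")


def resolved_members(log_text: str) -> Optional[List[str]]:
    """Return the sorted member list from the last matching line, or None if absent."""
    matches = RESOLVED.findall(log_text)
    if not matches:
        return None
    _, members = matches[-1]
    return sorted(m.strip() for m in members.split(",") if m.strip())


def disagreements(per_machine: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return machines whose sorted list differs from the union seen across all machines."""
    union = sorted({m for members in per_machine.values() for m in members})
    return {name: members for name, members in per_machine.items() if members != union}
```

An empty result from `disagreements` means every server resolved the same set of members, which shifts suspicion from DNS to reachability between the nodes on the discovery port.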