Causal Cluster Deployment using Terraform

TheSch
Node

Hi there,

I followed the instructions from here: https://neo4j.com/docs/operations-manual/4.4/kubernetes/quickstart-cluster/server-setup/ and deployed a cluster of three core members using Terraform. 

Helm charts used: https://github.com/neo4j/helm-charts/releases/tag/4.4.10

Neo4j version used: 4.4.11

The code structure is as follows:

module/neo4j: 
-main.tf
-variables.tf
--core-1/main.tf
--core-1/variables.tf
--core-1/core-1.values.yaml
--core-2/main.tf
--core-2/variables.tf
--core-2/core-2.values.yaml
--core-3/main.tf
--core-3/variables.tf
--core-3/core-3.values.yaml

So the root main.tf instantiates one module per core member. Nothing special, nothing fancy.
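
For illustration, a root main.tf matching the layout above might look roughly like the following. This is only a sketch, not the actual file from this deployment: the module labels (core_1, core_2, core_3) and the namespace variable are assumptions made up for the example.

```hcl
# Hypothetical root main.tf in module/neo4j (names are illustrative).

# Assumed input; in the layout above it would be declared in variables.tf.
variable "namespace" {
  type    = string
  default = "neo4j"
}

# One module instance per core member, each pointing at its own subdirectory.
module "core_1" {
  source    = "./core-1"
  namespace = var.namespace
}

module "core_2" {
  source    = "./core-2"
  namespace = var.namespace
}

module "core_3" {
  source    = "./core-3"
  namespace = var.namespace
}
```

With no depends_on between the modules, Terraform is free to apply all three in parallel, so the three Helm releases may be created at nearly the same time.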

The problem I am facing: the deployment succeeds only about 1 time in 10. Whenever it fails, the cause is a time-out of the Terraform helm_release of one or two core members, which reports: Secret "neo4j-cluster-auth" exists.
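
To make clear where that time-out comes from, here is a rough sketch of what a core member's main.tf could contain. It is not the actual code from this deployment: the repository URL, the chart name, and the namespace variable are assumptions, while name, version, values, wait, and timeout are standard helm_release arguments.

```hcl
# Hypothetical core-1/main.tf (illustrative, not the real file).

# Assumed input; in the layout above it would be declared in core-1/variables.tf.
variable "namespace" {
  type = string
}

resource "helm_release" "core_1" {
  name       = "core-1"
  repository = "https://helm.neo4j.com/neo4j" # assumed chart repository
  chart      = "neo4j-cluster-core"           # assumed chart name
  version    = "4.4.10"
  namespace  = var.namespace

  # Per-member settings from the values file next to this module.
  values = [file("${path.module}/core-1.values.yaml")]

  # Provider defaults, written out explicitly: Terraform waits until the
  # release's resources are ready and gives up after `timeout` seconds.
  wait    = true
  timeout = 300
}
```

With wait = true, the release only counts as successful once its pod is ready, so a core member that cannot finish startup on its own will eventually hit this time-out.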

Looking at the logs of the one (or two) members that did get deployed, startup failed because the cluster was missing members. (initialDelaySeconds is configured for each core member, and I also tried increasing it as a test.)

[Attached screenshot: TheSch_0-1668675634470.png]

2022-11-17 08:59:22.738+0000 ERROR Failed to start Neo4j on 0.0.0.0:7474.
java.lang.RuntimeException: Error starting Neo4j database server at /var/lib/neo4j/data/databases
	at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:227) ~[neo4j-4.4.11.jar:4.4.11]
	at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.build(DatabaseManagementServiceFactory.java:180) ~[neo4j-4.4.11.jar:4.4.11]
	at com.neo4j.causalclustering.core.CoreGraphDatabase.createManagementService(CoreGraphDatabase.java:38) ~[neo4j-causal-clustering-4.4.11.jar:4.4.11]
	at com.neo4j.causalclustering.core.CoreGraphDatabase.<init>(CoreGraphDatabase.java:30) ~[neo4j-causal-clustering-4.4.11.jar:4.4.11]
	at com.neo4j.server.enterprise.EnterpriseManagementServiceFactory.createManagementService(EnterpriseManagementServiceFactory.java:34) ~[neo4j-enterprise-4.4.11.jar:4.4.11]
	at com.neo4j.server.enterprise.EnterpriseBootstrapper.createNeo(EnterpriseBootstrapper.java:20) ~[neo4j-enterprise-4.4.11.jar:4.4.11]
	at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:142) [neo4j-4.4.11.jar:4.4.11]
	at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:95) [neo4j-4.4.11.jar:4.4.11]
	at com.neo4j.server.enterprise.EnterpriseEntryPoint.main(EnterpriseEntryPoint.java:24) [neo4j-enterprise-4.4.11.jar:4.4.11]
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'com.neo4j.dbms.ClusteredDbmsReconcilerModule@5c2ae7d7' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join or bootstrap a raft group with id RaftGroupId{00000000} and members RaftMembersSnapshot{raftGroupId=Not yet published, raftMembersSnapshot={ServerId{c72f54d8}=Published as : RaftMemberId{c72f54d8}}} in time. Please restart the cluster. Clue: not enough cores found".
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:463) ~[neo4j-common-4.4.11.jar:4.4.11]
	at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:110) ~[neo4j-common-4.4.11.jar:4.4.11]
	at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:218) ~[neo4j-4.4.11.jar:4.4.11]
	... 8 more

 

Hope someone can help! Thanks a lot, 
Cheers Theresa 
