Neo4j cluster discovery failed using NodePort service in k8s

I have a 3-node Kubernetes cluster where I deploy a 3-core Neo4j cluster using the public Neo4j chart: charts/stable/neo4j at master · helm/charts · GitHub

The 3-core Neo4j cluster comes up correctly when using the DNS discovery type through the default headless service.

What I am trying to achieve, however, is to have the cluster members discover each other through NodePort services, as a proof of concept that a Neo4j cluster can be formed across two Kubernetes clusters. With this setup, each cluster member gets stuck at the stage "Waiting for 3 members. Currently discovered 0 members: {}" and keeps restarting after the timeout. Does anyone see a problem with this approach? Any suggestion to resolve it would be greatly appreciated. Thanks.

Below is what has been created/configured (by the way, I tried using a different node IP for each initial_discovery_member and it did not help):

3 NodePort services for discovery:

  • core-0:5000->node-ip:31010
  • core-1:5000->node-ip:31011
  • core-2:5000->node-ip:31012

3 NodePort services for transaction:

  • core-0:6000->node-ip:31020
  • core-1:6000->node-ip:31021
  • core-2:6000->node-ip:31022

3 NodePort services for raft:

  • core-0:7000->node-ip:31030
  • core-1:7000->node-ip:31031
  • core-2:7000->node-ip:31032
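
For reference, each of these services was created from a manifest along the following lines (a sketch reconstructed from the `kubectl describe` output further below; the transaction and raft services follow the same pattern with their respective ports):

```yaml
# Sketch of the discovery NodePort service for core-0.
# The selector targets the pod directly via its StatefulSet pod name.
apiVersion: v1
kind: Service
metadata:
  name: neo4j-neo4j-discovery-0
  namespace: default
spec:
  type: NodePort
  selector:
    statefulset.kubernetes.io/pod-name: neo4j-neo4j-core-0
  ports:
    - name: discovery-0
      port: 5000
      targetPort: 5000
      nodePort: 31010
```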

And in core-statefulset.yaml:

  • NEO4J_causal__clustering_discovery__type: LIST
  • NEO4J_causal__clustering_initial__discovery__members: node-ip:31010,node-ip:31011,node-ip:31012
  • container.command:
    export NEO4J_causal__clustering_discovery__advertised__address=node-ip:$((31010 + ordinal))
    export NEO4J_causal__clustering_transaction__advertised__address=node-ip:$((31020 + ordinal))
    export NEO4J_causal__clustering_raft__advertised__address=node-ip:$((31030 + ordinal))
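
Spelled out as a runnable fragment, the ordinal-based exports look roughly like this (a sketch: the hostname and node IP are hard-coded here for illustration; in the pod the hostname comes from the StatefulSet, and note that shell arithmetic needs `$(( ... ))`, not `$( ... )`):

```shell
#!/bin/sh
# Derive the StatefulSet ordinal from the pod hostname,
# e.g. neo4j-neo4j-core-2 -> 2 (hard-coded here for illustration).
HOSTNAME=neo4j-neo4j-core-2
ordinal="${HOSTNAME##*-}"

# Assumption: a single node IP is used for all advertised addresses.
NODE_IP=192.168.96.9

export NEO4J_causal__clustering_discovery__advertised__address="${NODE_IP}:$((31010 + ordinal))"
export NEO4J_causal__clustering_transaction__advertised__address="${NODE_IP}:$((31020 + ordinal))"
export NEO4J_causal__clustering_raft__advertised__address="${NODE_IP}:$((31030 + ordinal))"
```

This yields, for core-2, a discovery advertised address of 192.168.96.9:31012, matching the port scheme in the service list above.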

Below is the core-0 container log:

2020-03-10 18:33:20.501+0000 INFO  ======== Neo4j 3.4.5 ========
2020-03-10 18:33:20.551+0000 INFO  Starting...
2020-03-10 18:33:22.478+0000 INFO  Initiating metrics...
2020-03-10 18:33:22.622+0000 INFO  My connection info: [
	Discovery:   listen=0.0.0.0:5000, advertised=192.168.96.9:31010,
	Transaction: listen=0.0.0.0:6000, advertised=192.168.96.9:31020,
	Raft:        listen=0.0.0.0:7000, advertised=192.168.96.9:31030,
	Client Connector Addresses: bolt://neo4j-neo4j-core-0.neo4j-neo4j.default.svc.cluster.local:7687,http://neo4j-neo4j-core-0.neo4j-neo4j.default.svc.cluster.local:7474,https://neo4j-neo4j-core-0.neo4j-neo4j.default.svc.cluster.local:7473
]
2020-03-10 18:33:22.623+0000 INFO  Discovering cluster with initial members: [192.168.96.9:31010, 192.168.96.9:31011, 192.168.96.9:31012]
2020-03-10 18:33:22.623+0000 INFO  Attempting to connect to the other cluster members before continuing...
2020-03-10 18:38:24.996+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@50825a02' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@50825a02' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".

And the description of one of the discovery services:

Name:                     neo4j-neo4j-discovery-0
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 statefulset.kubernetes.io/pod-name=neo4j-neo4j-core-0
Type:                     NodePort
IP:                       10.233.36.132
Port:                     discovery-0  5000/TCP
TargetPort:               5000/TCP
NodePort:                 discovery-0  31010/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>