"Failed to join a cluster with members"

james · August 26, 2019, 12:53pm

I'm trying to build a cluster of 3 core neo4j servers on AWS.

But I'm getting:

"Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster."

I've got 3 EC2 instances in a private subnet, all in the same security group. The SG has egress to 0.0.0.0/0 and ingress from the private subnet on all ports (for now, while I work through this setup).

Here's the command I'm using to run and configure my nodes (with some terraform templating as this is passed as user data to the EC2 instances):

docker run --name=neo4j-core --detach \
    --volume=/mnt/neo4j/data:/data \
    --volume=/mnt/neo4j/logs:/logs \
    --publish=7474:7474 --publish=7687:7687 \
    --publish=5000:5000 --publish=6000:6000 --publish=7000:7000 \
    --publish=2004:2004 \
    --env=NEO4J_dbms_memory_heap_initial__size=8G \
    --env=NEO4J_dbms_memory_heap_max__size=8G \
    --env=NEO4J_dbms_memory_pagecache_size=7G \
    --env=NEO4J_dbms_mode=CORE \
    --env=NEO4J_causal__clustering_minimum__core__cluster__size__at__formation=3 \
    --env=NEO4J_causal__clustering_minimum__core__cluster__size__at__runtime=3 \
    --env=NEO4J_causal__clustering_initial__discovery__members=neo4j-seed-1.pasabi:5000,neo4j-seed-2.pasabi:5000,neo4j-seed-3.pasabi:5000 \
    --env=NEO4J_causal__clustering_discovery__advertised__address=neo4j-seed-${node_name}.pasabi:5000 \
    --env=NEO4J_causal__clustering_transaction__advertised__address=neo4j-seed-${node_name}.pasabi:6000 \
    --env=NEO4J_causal__clustering_raft__advertised__address=neo4j-seed-${node_name}.pasabi:7000 \
    --env=NEO4J_dbms_connectors_default__advertised__address=neo4j-seed-${node_name}.pasabi \
    --env=NEO4J_dbms_connectors_default__listen__address=0.0.0.0 \
    --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
    --env=NEO4J_metrics_enabled=true \
    --env=NEO4J_metrics_prometheus_enabled=true \
    --env=NEO4J_metrics_prometheus_endpoint=0.0.0.0:2004 \
    --env=NEO4J_ULIMIT_NOFILE=40000 \
    neo4j:3.5-enterprise

david_allen · August 27, 2019, 9:23pm

In general, cluster formation issues of this type require pulling the debug.log file off all three instances to debug them effectively. The error message you're reporting:

"Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster."

The question is really which node reports it -- one, or all 3? The debug.log file has more information on the handshake between cluster nodes, which would let you pinpoint what's going on.

As for the clusterId specifically, you'll want to do this against a fresh data mount, to avoid the possibility that one or more of the nodes has a clusterId from a previous cluster. Normally in the non-Docker world we'd say to use neo4j-admin unbind to remove cluster state (docs: "Unbind a Neo4j cluster server" in the Operations Manual), but that isn't really an option in a Docker distribution, so the better option is to make sure you're starting cold.
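
As a rough sketch of what starting cold could look like with the Docker setup in the question: this assumes the host data path (/mnt/neo4j/data) and container name (neo4j-core) from the docker run command above. The helper name reset_data_mount is made up for illustration. It moves the old data directory aside rather than deleting it, so nothing is lost if it turns out you need it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Neo4j or Docker): move an existing data
# directory aside and recreate it empty, so the node starts without any
# cluster state left over from an earlier attempt.
reset_data_mount() {
    data_dir="$1"
    backup_dir="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
    if [ -d "$data_dir" ]; then
        mv "$data_dir" "$backup_dir" || return 1
        echo "moved old data to $backup_dir"
    fi
    mkdir -p "$data_dir"
}

# Usage on each of the three hosts (path and container name are taken from
# the docker run command earlier in the thread):
#   docker rm -f neo4j-core
#   reset_data_mount /mnt/neo4j/data
#   ...then re-run the original docker run command
```

Doing this on all three hosts before restarting matters, since a single node holding old state is enough to reintroduce the problem.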

I'd recommend diving into the debug.log files, looking above the error message you're reporting, and seeing what's happening with service discovery between the nodes.
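
One way to pull out the lines leading up to the error, as a minimal sketch: it assumes the /mnt/neo4j/logs host mount from the docker run command in the question, and a grep that supports -B (GNU and BSD grep both do). The function name lines_before_error is just illustrative.

```shell
#!/bin/sh
# Print the lines leading up to each occurrence of the error in a debug.log.
# $1: path to debug.log
# $2: number of preceding lines to show (defaults to 50)
lines_before_error() {
    log_file="$1"
    context="${2:-50}"
    # -n prefixes line numbers; -B prints that many lines before each match
    grep -n -B "$context" 'Failed to join a cluster' "$log_file"
}

# Usage, assuming the log volume from the docker run command in the question:
#   lines_before_error /mnt/neo4j/logs/debug.log 100
```

Running it on each of the three hosts and comparing the output side by side should show whether every node hits the error or only some of them.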
