"Failed to join a cluster with members"

I'm trying to build a cluster of 3 core neo4j servers on AWS.

But I'm getting this error:

"Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster."

I've got 3 EC2 instances in a private subnet, all in the same security group. The SG allows egress to 0.0.0.0/0 and ingress from the private subnet on all ports (for now, while I work through this setup).

Here's the command I'm using to run and configure my nodes (with some terraform templating as this is passed as user data to the EC2 instances):

docker run --name=neo4j-core --detach \
    --volume=/mnt/neo4j/data:/data \
    --volume=/mnt/neo4j/logs:/logs \
    --publish=7474:7474 --publish=7687:7687 \
    --publish=5000:5000 --publish=6000:6000 --publish=7000:7000 \
    --publish=2004:2004 \
    --env=NEO4J_dbms_memory_heap_initial__size=8G \
    --env=NEO4J_dbms_memory_heap_max__size=8G \
    --env=NEO4J_dbms_memory_pagecache_size=7G \
    --env=NEO4J_dbms_mode=CORE \
    --env=NEO4J_causal__clustering_minimum__core__cluster__size__at__formation=3 \
    --env=NEO4J_causal__clustering_minimum__core__cluster__size__at__runtime=3 \
    --env=NEO4J_causal__clustering_initial__discovery__members=neo4j-seed-1.pasabi:5000,neo4j-seed-2.pasabi:5000,neo4j-seed-3.pasabi:5000 \
    --env=NEO4J_causal__clustering_discovery__advertised__address=neo4j-seed-${node_name}.pasabi:5000 \
    --env=NEO4J_causal__clustering_transaction__advertised__address=neo4j-seed-${node_name}.pasabi:6000 \
    --env=NEO4J_causal__clustering_raft__advertised__address=neo4j-seed-${node_name}.pasabi:7000 \
    --env=NEO4J_dbms_connectors_default__advertised__address=neo4j-seed-${node_name}.pasabi \
    --env=NEO4J_dbms_connectors_default__listen__address=0.0.0.0 \
    --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
    --env=NEO4J_metrics_enabled=true \
    --env=NEO4J_metrics_prometheus_enabled=true \
    --env=NEO4J_metrics_prometheus_endpoint=0.0.0.0:2004 \
    --env=NEO4J_ULIMIT_NOFILE=40000 \
    neo4j:3.5-enterprise

In general, debugging cluster formation issues of this type requires pulling the debug.log file off all three instances. Take the error message you're reporting:

"Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster."

The question is really which node is reporting it: just one, or all 3? The debug.log file has more information on the handshake between cluster nodes, which should let you pinpoint what's going on.
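As a rough sketch of how to answer the "which node" question: with the --volume=/mnt/neo4j/logs:/logs mapping from your command, each host should have its log at /mnt/neo4j/logs/debug.log. Assuming you've copied the three files to one machine (the node1/, node2/, node3/ paths below are hypothetical), something like this counts how often each node logged the failure:

```shell
# Print "<file>: <count>" for each debug.log passed in, where <count> is the
# number of lines containing the join failure message.
count_join_failures() {
    for log in "$@"; do
        # grep -c prints 0 and exits non-zero when nothing matches, so the
        # exit status is deliberately ignored here.
        n=$(grep -c 'Failed to join a cluster' "$log" || true)
        printf '%s: %s\n' "$log" "$n"
    done
}

# Example (hypothetical paths for the copied logs):
#   count_join_failures node1/debug.log node2/debug.log node3/debug.log
```

A node with a count of 0 either joined fine or never got as far as attempting discovery, so it's worth reading that node's log too.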

As for the clusterId specifically, you'll want to do this against a fresh data mount, to rule out the possibility that one or more of the nodes still has a clusterId from a previous cluster. Normally, in the non-Docker world, we'd say to use neo4j-admin unbind to remove cluster state (docs here: https://neo4j.com/docs/operations-manual/current/tools/unbind/), but that isn't really an option in a Docker distribution, so the better option is to make sure you're starting cold.
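One way to "start cold" without deleting anything is to move the old data directory aside and give the container an empty one. This is only a sketch: it assumes /mnt/neo4j/data (the host path from your --volume flag) is an ordinary directory rather than a mount point itself, and the function name is made up for illustration.

```shell
# Move the given data directory aside under a timestamped name, recreate it
# empty, and print the path of the backup.
reset_data_mount() {
    data_dir=$1
    backup="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
    mv "$data_dir" "$backup" && mkdir -p "$data_dir" && printf '%s\n' "$backup"
}

# Example, run on each of the three hosts with the container stopped first:
#   docker stop neo4j-core && docker rm neo4j-core
#   reset_data_mount /mnt/neo4j/data
#   ...then re-run your docker run command.
```

Do this on all three nodes before restarting any of them, otherwise one node may still come up with state from the old cluster. Once the new cluster is healthy you can remove the backup directories.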

I'd recommend starting by diving into the debug.log files, looking above the error message you're reporting, and seeing what's happening with service discovery between the nodes.
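To pull out just the part of the log leading up to the error, a small helper along these lines may save some scrolling. It's a sketch, assuming the log sits at /mnt/neo4j/logs/debug.log on the host as per your --volume flag; the function name and the default of 50 lines are arbitrary.

```shell
# Print the first occurrence of the join failure in a debug.log, preceded by
# up to N lines of context (default 50).
context_before_failure() {
    log=$1
    lines=${2:-50}
    # -m 1 stops at the first match; -B prints the lines that came before it.
    grep -m 1 -B "$lines" 'Failed to join a cluster' "$log"
}

# Example:
#   context_before_failure /mnt/neo4j/logs/debug.log 100
```

If the interesting lines are further back, increase the second argument or just open the file at the first match.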