Unable to run Neo4j v4 causal cluster with docker-compose or docker swarm

I've been trying to deploy a v4 causal cluster on my dev machine (Ubuntu), but it never makes it past the join phase. I have also tested this on an Arch Linux machine and another Ubuntu machine, so I know it's not just my machine. I've been getting the following errors:

  • Database 'system' is waiting for a total of 3 core members...
  • Clustering components for database 'system' have encountered a critical error Encountered error when attempting to reconcile database system from state 'EnterpriseDatabaseState{databaseId=DatabaseId{name='system', databaseId=DatabaseIdWithoutName{uuid=00000000-0000-0000-0000-000000000001}}, operatorState=STOPPED, failed=false}' to state 'online'

When checking debug.log I see this error repeated hundreds of times:
WARN [a.r.a.InboundHandshake$$anon$2] Dropping Handshake Request from [akka://cc-discovery-actor-system@65436ef85782:5000#-5597672656914033240] addressed to unknown local address [akka://cc-discovery-actor-system@core1:5000]. Local address is [akka://cc-discovery-actor-system@28db646210ca:5000]. Check that the sending system uses the same address to contact recipient system as defined in the 'akka.remote.artery.canonical.hostname' of the recipient system. The name of the ActorSystem must also match.

I have used the example from the documentation to deploy a v4 causal cluster.

I also converted this into a docker-compose file which can be found here.

I'm not sure where to go from here, so any help is much appreciated!

Hi,

you might have a look at my example compose files:

as a starting point.

Bert

Greetings Eric,

There's an error (ticketed but not resolved yet) in the documentation. We're using another (better) cluster coordinator in 4.0, and that requires a slightly different setup. For Docker you can resolve it by adding a --hostname option to every docker run statement. So if you have --name core1, you add --hostname core1 as well (keep the --name option too).
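
For docker-compose, the same fix would presumably be a hostname entry on each service. The fragment below is only a minimal sketch: the image tag, the service names core2 and core3, and the choice of environment variables are assumptions for illustration, not taken from the documentation example or from a tested setup.

```yaml
# Sketch only: image tag and service layout are assumptions.
version: "3.7"
services:
  core1:
    image: neo4j:4.0-enterprise   # assumed tag
    container_name: core1         # counterpart of --name core1
    hostname: core1               # counterpart of --hostname core1 (the fix)
    environment: &core_env
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_dbms_mode=CORE
      # Members are addressed by the same names the containers use as hostnames.
      - NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000
  core2:
    image: neo4j:4.0-enterprise
    container_name: core2
    hostname: core2
    environment: *core_env
  core3:
    image: neo4j:4.0-enterprise
    container_name: core3
    hostname: core3
    environment: *core_env
```

The point is that the name other members use to reach a container (core1:5000) should match the hostname the container sees for itself, which is what the "addressed to unknown local address" warning in debug.log complains about.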

Regards,
Tom


Hello Tom,

I have the same problem with Kubernetes. What is the solution?

best regards
Eric


Hi Guys,

I'm facing the same issue.

I'm using k8s, specifically this chart: https://github.com/helm/charts/pull/20942

When I change the following env variable from

export NEO4J_dbms_default__advertised__address=$(hostname -f)

to

export NEO4J_dbms_default__advertised__address=$(hostname -I | awk '{print $1}')

it works. The cluster members discover each other.

Hello,

For me it does not change anything.
What a pity.

Hello,

To make it work I had to change the discovery mode from DNS to LIST:

      - name: NEO4J_causal__clustering_discovery__type
        value: LIST
      - name: NEO4J_causal__clustering_initial__discovery__members
        value: "db-neo4j-0.lb-neo4j-cores.testneo4j.svc.cluster.local:5000,db-neo4j-1.lb-neo4j-cores.testneo4j.svc.cluster.local:5000,db-neo4j-2.lb-neo4j-cores.testneo4j.svc.cluster.local:5000"

With DNS, each node is contacted by its IP address, so you see the error.
With LIST, each node is contacted by the DNS name you give it.