Startup probe failed: dial tcp <pod_ip_address>:7687: connect: connection refused

Hi Folks,

I hope someone can point me in the right direction to find the root cause of this error. I have Neo4j running on an Azure Kubernetes (AKS) cluster, installed via Helm with minimal custom configuration, using an Azure disk like this:

```yaml
volumes:
  data:
    mode: "volume"
    volume:
      azureDisk:
        diskName: "neo4j-disk-stn"
        diskURI: "/subscriptions/696xxx-xxx-xxx6-xx-xxxx/resourceGroups/MC_rg-kube-xxx-westeu-02_kube-xxx-02_westeu/providers/Microsoft.Compute/disks/neo4j-disk-stn"
        kind: Managed
```
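For reference, this is roughly how I compare what Azure reports for the managed disk with what Kubernetes thinks the bound PV/PVC capacity is after a resize (resource-group and disk names here are from my config above; yours will differ):

```shell
# Check the size and state Azure reports for the managed disk.
az disk show \
  --resource-group MC_rg-kube-xxx-westeu-02_kube-xxx-02_westeu \
  --name neo4j-disk-stn \
  --query "{name:name, sizeGb:diskSizeGb, state:diskState}" -o table

# Compare with the capacity Kubernetes has recorded on the PV/PVC.
kubectl get pv,pvc -n neo4j-ee-stn -o wide
```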

Devs are using it for evaluation, and today I spent hours troubleshooting, trying to get it back up and running after the devs resized the azureDisk to 256 GB.

The pod is in a crash loop and doesn't start at all; it fails with the following events:

```
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  24m                    default-scheduler  Successfully assigned neo4j-ee-stn/neo4j-ee-stn-release-0 to aks-neo4j-33193140-vmss00002g
  Warning  Unhealthy  23m (x6 over 24m)      kubelet            Startup probe failed: dial tcp 10.244.0.15:7687: connect: connection refused
  Normal   Pulled     23m (x4 over 24m)      kubelet            Container image "neo4j:4.4.5-enterprise" already present on machine
  Normal   Created    23m (x4 over 24m)      kubelet            Created container neo4j
  Normal   Started    23m (x4 over 24m)      kubelet            Started container neo4j
  Warning  BackOff    4m16s (x100 over 24m)  kubelet            Back-off restarting failed container
```
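For anyone hitting the same events, these are the commands I used to dig further (pod, namespace, and container names are from my setup; adjust as needed):

```shell
# Full pod status, including volume attach/mount events at the bottom.
kubectl describe pod neo4j-ee-stn-release-0 -n neo4j-ee-stn

# Logs from the current container, and from the previous crashed one.
kubectl logs neo4j-ee-stn-release-0 -n neo4j-ee-stn -c neo4j
kubectl logs neo4j-ee-stn-release-0 -n neo4j-ee-stn -c neo4j --previous
```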

SSL is disabled, and I have also explicitly increased the startupProbe failureThreshold.
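For completeness, the probe override can be applied like this; I believe the Neo4j chart exposes `startupProbe.failureThreshold` and `startupProbe.periodSeconds`, but verify the exact value paths against your chart version's values.yaml (release and chart names here are inferred from my pod name):

```shell
# Give Neo4j more time to start before the startup probe gives up.
helm upgrade neo4j-ee-stn-release neo4j/neo4j-standalone \
  -n neo4j-ee-stn --reuse-values \
  --set startupProbe.failureThreshold=1000 \
  --set startupProbe.periodSeconds=5
```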

Any ideas how to tackle this one please?

Thanks!

Emil

Hello @emil

I did some searching and found a Stack Overflow question about a similar Kubernetes error.
Here is what I was able to find!
https://stackoverflow.com/questions/61303668/kubernetes-readiness-probe-failed-dial-tcp-10-244-0-105000-connect-connectio
I hope this helps!

Hi @TrevorS,

I appreciate your response.

It turned out the Neo4j pod could not mount the newly resized disk; that's why it was failing.

The startup probe was failing correctly, but I didn't suspect the disk could be the issue (despite the pod's volume-mount errors) until I tried to mount it on a VM and got more specific mount errors. I thought it was more likely something in the Neo4j config, but my assumption was wrong.

neo4j pod volume mount error:

```
Warning  FailedMount  81s  kubelet
Unable to attach or mount volumes:
unmounted volumes=[data], unattached volumes=[neo4j-conf data kube-api-access-fmwws]: timed out waiting for the condition
```

Trying to mount the azure disk under ubuntu:

```
mount: /neo4j: wrong fs type, bad option, bad superblock on /dev/sdd, missing codepage or helper program, or other error.
```

Trying to fix the disk under ubuntu shows:

```
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
/dev/sdd: recovering journal
fsck.ext2: unable to set superblock flags on /dev/sdd

/dev/sdd: ***** FILE SYSTEM WAS MODIFIED *****

/dev/sdd: ********** WARNING: Filesystem still has errors **********
```

I hope this helps someone else with a similar issue. I would, however, be interested to hear if someone has experience with disk tools that could help recover from this kind of Azure disk failure.
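In case someone wants to attempt recovery before replacing the disk, this is the sort of sequence I would try next. It is only a sketch: snapshot first, `/dev/sdd` is the device from my fsck run above, and the backup-superblock offset depends on how the filesystem was created.

```shell
# Always snapshot the managed disk before any repair attempt.
az snapshot create \
  --resource-group MC_rg-kube-xxx-westeu-02_kube-xxx-02_westeu \
  --name neo4j-disk-stn-backup \
  --source neo4j-disk-stn

# List backup superblock locations without writing anything (-n = dry run);
# this only reports the right offsets if run with the original mkfs parameters.
mke2fs -n /dev/sdd

# Retry the check using one of the reported backup superblocks, e.g. 32768.
e2fsck -b 32768 /dev/sdd
```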

Obviously the solution was to switch to a new Azure disk and to run `helm upgrade ...`
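Concretely, the fix looked something like this (release and chart names are inferred from my pod name; substitute your own):

```shell
# values.yaml now points at the replacement disk's diskName/diskURI;
# rolling the release picks up the new volume.
helm upgrade neo4j-ee-stn-release neo4j/neo4j-standalone \
  -n neo4j-ee-stn -f values.yaml
```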

Thanks again!

Emil