Neo4j Browser can't connect after a random amount of time

Hi everyone, we're hosting a Neo4j instance on our VPS. We're using Docker Compose, the official Neo4j image, Traefik, and a custom TLS dumper that reads the TLS certificates from Traefik and writes them to the Neo4j certificate files.

Every service in the compose file (an LLM service and a GraphQL API service) can access the Bolt port. The problem begins after a random number of days (1 day in one instance, 4 days in another): the services can still query over the Bolt port, but Neo4j Browser can no longer connect to it. Mind you, it works perfectly in the beginning. Another problem is that nothing about this shows up in the logs; only Neo4j Browser stops working.

Here is the compose.yaml file:

services:

  neo4j:
    image: neo4j:5.26.0-community-bullseye
    hostname: neo4j
    expose:
      - "7474"
      - "7687"
        #- "443"
    environment:
      - NEO4J_server_default__listen__address=0.0.0.0
      - NEO4J_server_default__advertised__address=${MY_DOMAIN}
      
      - NEO4J_server_databases_default__to__read__only=true
      - NEO4J_PLUGINS=["apoc"]
      - NEO4J_dbms_usage__report_enabled=false
        
      - NEO4J_server_bolt_tls__level=OPTIONAL
      - NEO4J_server_bolt_enabled=true

      - NEO4J_dbms_ssl_policy_bolt_enabled=true
      - NEO4J_dbms_ssl_policy_bolt_base__directory=/var/lib/neo4j/certificates/bolt
      - NEO4J_dbms_ssl_policy_bolt_private__key=private.key
      - NEO4J_dbms_ssl_policy_bolt_public__certificate=public.crt

      - NEO4J_dbms_ssl_policy_https_base__directory=/var/lib/neo4j/certificates/https
      - NEO4J_dbms_ssl_policy_https_private__key=private.key
      - NEO4J_dbms_ssl_policy_https_public__certificate=public.crt

        # Memory config
      - NEO4J_server_memory_heap_max__size=6g
      - NEO4J_db_memory_transaction_max=2g


    env_file:
      - path: ./neo4j-password.env
        required: true
    volumes:
      - /var/lib/neo4j/data:/var/lib/neo4j/data
      - neo4j-certificates:/var/lib/neo4j/certificates
    depends_on:
      - tls-dumper
    restart: always
    labels:
      - "traefik.enable=true"

      # Listen to
      - "traefik.http.routers.neo4j-router.rule=Host(`${MY_DOMAIN}`) && PathPrefix(`/db`)"

      # Proxy pass to
      - "traefik.http.routers.neo4j-router.service=neo4j-service"
      - "traefik.http.services.neo4j-service.loadbalancer.server.port=7474"

      # TLS config
      - "traefik.http.routers.neo4j-router.tls=true"
      - "traefik.http.routers.neo4j-router.tls.certresolver=letsencrypt"

      # Define a new middleware to strip the URL prefix before sending it to static-files
      - "traefik.http.middlewares.neo4j-proxypass.replacepathregex.regex=^/db/(.*)"
      - "traefik.http.middlewares.neo4j-proxypass.replacepathregex.replacement=/browser/$$1"
      - "traefik.http.middlewares.neo4j-add-slash.redirectregex.regex=(^.*/db$$)"
      - "traefik.http.middlewares.neo4j-add-slash.redirectregex.replacement=$$1/"

      # tell Traefik which middlewares we want to use on this container
      - "traefik.http.routers.neo4j-router.middlewares=neo4j-add-slash,neo4j-proxypass"

      # Neo4j browser websocket
      # Listen to
      - "traefik.http.routers.neo4j-bolt-router.rule=Host(`${MY_DOMAIN}`)"
        #&& PathPrefix(`/db`)"
      - "traefik.http.routers.neo4j-bolt-router.entrypoints=bolt"

      # Proxy pass to
      - "traefik.http.routers.neo4j-bolt-router.service=neo4j-bolt-service"
      - "traefik.http.services.neo4j-bolt-service.loadbalancer.server.port=7687"

      # TLS config
      - "traefik.http.routers.neo4j-bolt-router.tls=true"
      # - "traefik.http.routers.neo4j-bolt-router.tls.passthrough=true"
      - "traefik.http.routers.neo4j-bolt-router.tls.certresolver=letsencrypt"

      - "traefik.http.middlewares.neo4j-sslheader.headers.customrequestheaders.X-Forwarded-Proto=https,wss"
      - "traefik.http.routers.neo4j-bolt-router.middlewares=neo4j-sslheader"

      - "com.centurylinklabs.watchtower.enable=true"


  reverse-proxy:
    image: traefik:3
    command:
      - "--providers.docker"
      - "--providers.docker.exposedbydefault=false"

      # TLS Certification
      - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.email=hubiodatascilab@gmail.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=${ACME_JSON_PATH}"
      - "--certificatesresolvers.letsencrypt.acme.caServer=https://acme-v02.api.letsencrypt.org/directory"
        #- "--certificatesresolvers.letsencrypt.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory"

      # HTTP -> HTTPS
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"

      - "--entryPoints.websecure.address=:443"
      - "--entryPoints.bolt.address=:7687"

      - "com.centurylinklabs.watchtower.enable=true"

    ports:
      - "80:80"
      - "443:443"
      - "7687:7687"
    volumes:
      # So that Traefik can listen to Docker events
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt

    restart: always


  tls-dumper:
    
    build: ./tls-dumper
    image: tls-dumper
    
    labels:
      traefik.enable: false
    
    environment:
      - ACME_JSON_PATH=${ACME_JSON_PATH}
      - MY_DOMAIN=${MY_DOMAIN}
    
    volumes:
      - letsencrypt:/letsencrypt
      - neo4j-certificates:/letsencrypt/neo4j-certificates
    
    depends_on:
      - reverse-proxy
    restart: always


  watchtower:
    image: containrrr/watchtower:1.7.1
    command:
      - "--label-enable"
      - "--rolling-restart"
      - "--interval"
      - "300"
      - "--include-restarting"
      - "--include-stopped"
      - "--revive-stopped"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

volumes:
  letsencrypt:
  neo4j-certificates:

The TLS dumper script is this:

echo "A change has been detected in TLS certificates."
echo "Generating Neo4j certificates."

cd /letsencrypt/neo4j-certificates

# Copy the certificate and key into the bolt/ and https/ policy directories
for certsource in bolt https ; do
   mkdir -p "$certsource/trusted"
   cp "/letsencrypt/certs/$MY_DOMAIN/certificate.crt" "$certsource/public.crt"
   cp "/letsencrypt/certs/$MY_DOMAIN/privatekey.key" "$certsource/private.key"
   cp "/letsencrypt/certs/$MY_DOMAIN/certificate.crt" "$certsource/trusted/public.crt"
done

chmod -R g+rx *
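
To rule out the dumper leaving stale or partially copied files in the shared volume, a check along these lines could be run inside the tls-dumper container after the script above finishes. This is only a sketch: the function name certs_in_sync is made up for illustration, and it assumes the same file names and directory layout that the script uses.

```shell
#!/bin/sh
# Sketch only: certs_in_sync is a hypothetical helper, not part of the real dumper.
# Usage: certs_in_sync SRC_DIR DEST_DIR
#   SRC_DIR  holds certificate.crt and privatekey.key
#   DEST_DIR holds the bolt/ and https/ directories that Neo4j reads
# Returns 0 if every copied file is byte-identical to its source, 1 otherwise
# (a missing file also counts as "not in sync").
certs_in_sync() {
    src=$1
    dest=$2
    status=0
    for certsource in bolt https; do
        cmp -s "$src/certificate.crt" "$dest/$certsource/public.crt" || status=1
        cmp -s "$src/privatekey.key" "$dest/$certsource/private.key" || status=1
        cmp -s "$src/certificate.crt" "$dest/$certsource/trusted/public.crt" || status=1
    done
    return "$status"
}
```

With the paths from the script, the call would be certs_in_sync "/letsencrypt/certs/$MY_DOMAIN" /letsencrypt/neo4j-certificates. A non-zero exit status would mean the files Neo4j reads differ from the ones currently dumped from Traefik. It would not tell us whether the running Neo4j process has actually loaded them.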

Several services are omitted to keep the compose file short.

One perhaps interesting detail: stopping the containers, running the host server's own Neo4j for several minutes, and then starting the containers again sometimes fixes the problem. There were cases where simply restarting the container didn't help, but launching and then killing the host Neo4j did. In other cases even that didn't work.

You can check the live website https://crossbarv2.hubiodatalab.com

Enter https://crossbarv2.hubiodatalab.com as the Connect URL.