Neo4j Community in Docker on ECS: how to stop Neo4j without stopping container?

chen.charles.c · September 4, 2024, 6:26pm

I've got Neo4j Community successfully deployed into ECS (via AWS Copilot, no less).

To take a backup, I need to be able to stop the Neo4j service. But it seems that neo4j stop effectively terminates the container as well (I assume because neo4j is the CMD executed by the Dockerfile; see docker-neo4j-publish/5.17.0/bullseye/community/Dockerfile at 7422ac53238f689a26144d3c1c5aee434a07a325 in neo4j/docker-neo4j-publish on GitHub).

Calling neo4j-admin database dump will also fail because of the lock held by the Neo4j service.

So I'm wondering if there's some mechanism to stop the service and take a backup without killing the container. Because the container is running in ECS, the documented Docker operations for offline backups do not apply. Everything else works and we have no issues as we test Neo4j for graph RAG, but we'd like to be able to dump the database for local testing and development as well.

And for anyone curious, here is the AWS Copilot manifest for this (note the sidecar used for the healthcheck):

# Your service name will be used in naming your resources like log groups, ECS services, etc.
name: neo4j-db
type: Load Balanced Web Service

# We use a sidecar to respond to the healthcheck so we can stop the neo4j instance
sidecars:
  health:
    port: 7470
    image: ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/our-apps/health-sidecar

# Configuration for your containers and service.
image:
  location: docker.io/neo4j:5-community
  # Port exposed through your container to route traffic to it.
  port: 7474
  depends_on:
    health: start

cpu: 1024       # Number of CPU units for the task.
memory: 2048    # Amount of memory in MiB used by the task.
count: 1       # Number of tasks that should be running in your service.
exec: true     # Enable running commands in your container.
network:
  connect: true # Enable Service Connect for intra-environment traffic between services.

# See EFS: https://aws.github.io/copilot-cli/docs/developing/storage/#managed-efs
# This is the path inside the container
storage:
  volumes:
    neo4j_data_volume:
      efs:
        uid: 7474 # The UID of the neo4j user via id -u neo4j
        gid: 7474 # The GID of the neo4j user via id -g neo4j
      path: /data
      read_only: false

# This is a workaround; see:
# - https://github.com/aws/copilot-cli/issues/5907
# - https://github.com/aws/copilot-cli/issues/1292
secrets:
  NEO4J_PLUGINS: /copilot/${COPILOT_APPLICATION_NAME}/${COPILOT_ENVIRONMENT_NAME}/secrets/NEO4J_PLUGINS

variables:
  NEO4J_apoc_export_file_enabled: true
  NEO4J_apoc_import_file_enabled: true
  NEO4J_apoc_import_file_use__neo4j__config: true
  #NEO4J_PLUGINS: "['apoc', 'apoc-extended', 'graph-data-science']"
  NEO4J_dbms_security_procedures_unrestricted: apoc.*,gds.*,algo.*,spatial.*

# Cannot add a certificate to the NLB; must manually do it or use CF
nlb:
  port: 7687/tcp
  target_port: 7687
  stickiness: true

# Force recreate since Neo4j is holding a lock on the file system.
deployment:
  rolling: recreate

# Distribute traffic to your service.
http:
  # Import the existing ownit-shared-lb
  alb: arn:aws:elasticloadbalancing:us-east-1:ACCOUNT:loadbalancer/app/shared-beta-lb/RESOURCE_ID
  path: "/"
  deregistration_delay: 5s # Speeds up deploys
  redirect_to_https: true
  alias: "domain.example.com"
  hosted_zone: "ZONE_ID"
  healthcheck:
    path: "/health"
    port: 7470
    healthy_threshold: 2
    unhealthy_threshold: 3
    grace_period: 240s

hakan.lofqvist1 · September 5, 2024, 7:07am

Does this help? See "Dump and load databases (offline)" in the Operations Manual.

chen.charles.c · September 5, 2024, 11:32am

Hakan,

That works locally, but in ECS, the container is being managed by the orchestrator.

hakan.lofqvist1 · September 5, 2024, 12:03pm

Are you using the helm charts?

chen.charles.c · September 5, 2024, 12:19pm

Hakan,

Not the case; I've deployed the container to AWS ECS Fargate.

hakan.lofqvist1 · September 5, 2024, 12:42pm

I see. I think you'll have to resort to a hack then, like an additional task definition with a custom entrypoint.
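
A rough sketch of how that could look on Fargate, not a tested recipe: scale the service to zero so the store lock on /data is released, run a one-off task from the same task definition with the container command overridden to neo4j-admin database dump, wait for it to stop, then scale the service back up. The sketch assumes boto3 and an ECS client passed in as ecs. The cluster, service, task definition, container name, subnet, and security group values are hypothetical placeholders. It also assumes that the image's entrypoint passes a non-neo4j command straight through and that the dump directory already exists on the volume.

```python
"""Sketch: offline dump of a Neo4j Community database via a one-off ECS task.

Every identifier passed in (cluster, service, task definition, container
name, subnets, security groups) is a hypothetical placeholder.
"""


def build_dump_command(database="neo4j", dump_dir="/data"):
    # Neo4j 5 syntax. The database must be offline, and dump_dir is
    # assumed to exist already (here: the EFS volume mounted at /data).
    return ["neo4j-admin", "database", "dump", database, f"--to-path={dump_dir}"]


def build_run_task_request(cluster, task_definition, container_name,
                           subnets, security_groups, command):
    # Keyword arguments for the ECS RunTask call. Only the container's
    # command is overridden; the image's ENTRYPOINT is left untouched.
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {"name": container_name, "command": list(command)}
            ]
        },
    }


def dump_offline(ecs, cluster, service, request, restore_count=1):
    # 1. Stop the running Neo4j task so it releases the lock on the store.
    ecs.update_service(cluster=cluster, service=service, desiredCount=0)
    ecs.get_waiter("services_stable").wait(cluster=cluster, services=[service])
    try:
        # 2. Run the dump as a one-off task and wait until it has stopped.
        task_arn = ecs.run_task(**request)["tasks"][0]["taskArn"]
        ecs.get_waiter("tasks_stopped").wait(cluster=cluster, tasks=[task_arn])
    finally:
        # 3. Bring the service back even if the dump task failed.
        ecs.update_service(
            cluster=cluster, service=service, desiredCount=restore_count
        )
    return task_arn
```

With a real client this would be called as dump_offline(boto3.client("ecs"), cluster, service, request), where request comes from build_run_task_request. The function does not check the exit code of the dump container, so the stopped task should still be inspected before the dump file is trusted.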

chen.charles.c · September 5, 2024, 2:22pm

That is what I think is probably necessary, but wanted to see if there was some creative way around it!

hakan.lofqvist1 · September 5, 2024, 6:45pm

The least creative way: get Neo4j Enterprise.

chen.charles.c · September 5, 2024, 7:02pm

Thanks for the tip, but we're not quite at that threshold yet.

We are still in the experimental phase with Neo4j as a graph-RAG platform so it remains to be seen if the gains justify the budget. Our current focus is to understand the benefit that can be realized by using a graph-based approach versus using a standard vector RAG approach.
