Neo4j Docker Container - Load Seed JSON Data at Startup

Hi, I am spinning up a Docker container to run Neo4j. Once the container has started, I would like to run some Cypher to load a JSON file, plus a couple of other queries, to initialise the database so that the initial graph is already there when the browser is opened. Is this possible?

I have read somewhere that there is an environment variable called EXTENSION_SCRIPT that can be used to execute scripts, but I don't know if this is the answer. In a little test with a simple 'echo' script, the script appeared to run before the database had started: my message was echoed to the console window before the statements about starting and then started.

This is still all very new to me, both neo4j and docker, so any pointers would really help.

Hi @andy_mcshane, please check my blog and let me know if you need more assistance. You can modify it to suit your needs.

Thank you for your assistance, this is very helpful, but I don't think it quite addresses my requirement, as I won't have the option of building the image.

A bit more detail may help. I have an application that will generate a json file as the output of a task. This generated output will change every time that it is generated. Once this json has been generated I then need to spin up an instance of neo4j using docker and once the container is running then load this json into the default neo4j database. I have also just had the requirement added to be able to add any number of specified query files to the 'Favourites' list available to neo4j browser at the same time (that seems like a long shot to me but that remains to be seen).

Thank you for your help and patience.

Hi @andy_mcshane, mine is an example of how to take a CSV file on disk and load it into a Docker container. You can swap the CSV for a JSON file of your choice.
I would recommend using Apache Airflow or other orchestration software, together with DevOps tooling, to build the workflow.
Apache Airflow can trigger jobs whenever Jenkins, or something like Travis CI/CD, outputs a file.
For favorites you have 2 options: 1. Sync Neo4j with GitHub. 2. Create a custom autostart play that runs when the browser opens.
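
For the case where the image cannot be rebuilt each time, one possible approach is to run the stock image with the generated JSON bind-mounted at /import and then pipe the Cypher script into cypher-shell with docker exec. This is only a sketch and was not tested in this thread. It assumes a POSIX shell on the host (for example Git Bash or WSL on Windows). The function names, the container name, and the DOCKER override variable are made up for illustration; the environment variables are the same ones used in the docker-compose file further down the thread.

```shell
#!/bin/sh
# Allow the docker binary to be overridden (e.g. DOCKER=echo for a dry run).
# The DOCKER variable is an assumption of this sketch, not a Docker feature.
DOCKER="${DOCKER:-docker}"

# start_neo4j NAME PASSWORD IMPORT_DIR
# Starts the stock neo4j image with IMPORT_DIR mounted at /import and APOC enabled.
start_neo4j() {
    "$DOCKER" run -d --name "$1" \
        -p 7474:7474 -p 7687:7687 \
        -v "$3:/import" \
        -e "NEO4J_AUTH=neo4j/$2" \
        -e 'NEO4JLABS_PLUGINS=["apoc"]' \
        -e NEO4J_apoc_import_file_enabled=true \
        neo4j:4.2.2
}

# load_cypher NAME PASSWORD SCRIPT
# Pipes a Cypher script from the host into cypher-shell inside the running container.
load_cypher() {
    "$DOCKER" exec -i "$1" cypher-shell -u neo4j -p "$2" --fail-fast < "$3"
}
```

The idea would be to call start_neo4j once the JSON has been generated, give Neo4j time to finish starting, and then call load_cypher with the path to data_loader.cypher. Every statement in that script needs to end with a semicolon.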

Thank you for replying so quickly. I have followed your blog and it now makes more sense to me. I cannot really see much of a downside to building the image each time, as it doesn't take long, so I will have to argue that one out with the powers that be. I do still have a problem though: even after following your guide I cannot get my container to run (I suspect it fails at the point of running the cypher script?), and I get this error:

Unable to connect to localhost:7687, ensure the database is running and that there is a working network connection to it.

Maybe a permissions error, or is it because I am trying to run cypher-shell against localhost? I have no idea!

Thank you very much for your assistance.

Are you connecting from outside or inside the container? If outside, I haven't exposed the ports.

You can make the ports reachable in any of 3 ways: (1) use EXPOSE in the Dockerfile (see the Dockerfile reference in the Docker documentation; note that EXPOSE only documents the port, it still has to be published with -p or -P), (2) start the container with -p on the command line, as in the command below (neo4j:seed is my built image name), or (3) create a docker-compose file and add ports.

docker run -it -d -p7474:7474 -p7687:7687 -p7473:7473 neo4j:seed

you can use -d to run in the background

PS C:\git> cd .\neo4j_forum\neo4j_seed_docker\
PS C:\git\neo4j_forum\neo4j_seed_docker> docker build -t neo4j:seed .
[+] Building 11.5s (9/9) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                 0.1s 
 => => transferring dockerfile: 32B                                                                                                                                  0.0s 
 => => transferring context: 2B                                                                                                                                      0.0s 
 => [internal] load metadata for docker.io/library/neo4j:latest                                                                                                     11.2s 
 => [auth] library/neo4j:pull token for registry-1.docker.io                                                                                                         0.0s 
 => [1/3] FROM docker.io/library/neo4j@sha256:c7f24de1dc1d2020ab24a884b8a39538937c1b14bc0ca1da3ddb2573b6fc412f                                                       0.0s 
 => [internal] load build context                                                                                                                                    0.0s 
 => => transferring context: 73B                                                                                                                                     0.0s 
 => CACHED [2/3] COPY countries.csv /var/lib/neo4j/import/                                                                                                           0.0s 
 => CACHED [3/3] COPY data_loader.cypher /var/lib/neo4j/import/                                                                                                      0.0s 
 => exporting to image                                                                                                                                               0.1s 
 => => exporting layers                                                                                                                                              0.0s 
 => => writing image sha256:ac7113b7e0ae6abe7145f2d112dfbbe9b45aa6c6eb4e4147cfffbff691185cde                                                                         0.0s 
 => => naming to docker.io/library/neo4js                                                                                                                            0.0s 
PS C:\git\neo4j_forum\neo4j_seed_docker> docker run -it -p7474:7474 -p7687:7687 -p7473:7473  neo4j:seed
Changed password for user 'neo4j'.
Directories in use:
  home:         /var/lib/neo4j
  config:       /var/lib/neo4j/conf
  logs:         /logs
  plugins:      /var/lib/neo4j/plugins
  import:       /var/lib/neo4j/import
  data:         /var/lib/neo4j/data
  certificates: /var/lib/neo4j/certificates
  run:          /var/lib/neo4j/run
Starting Neo4j.
Started neo4j (pid 246). It is available at http://localhost:7474/
There may be a short delay until the server is ready.
See /logs/neo4j.log for current status.
neo4j@6b9f8e01107b:~$

PS C:\git> docker exec -it blissful_ride bash
root@6b9f8e01107b:/var/lib/neo4j# cypher-shell 
username: neo4j
password: **********
Connected to Neo4j 4.2.0 at neo4j://localhost:7687 as user neo4j.      
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> match (n) return (n);
+-----------------------------------------------------------+
| n                                                         |
+-----------------------------------------------------------+
| (:Country {id: "AF", countryName: "Afghanistan"})         |
| (:Country {id: "AL", countryName: "Albania"})             |
| (:Country {id: "DZ", countryName: "Algeria"})             |
| (:Country {id: "AS", countryName: "American Samoa"})      |
| (:Country {id: "AD", countryName: "Andorra"})             |
| (:Country {id: "AO", countryName: "Angola"})              |
| (:Country {id: "AI", countryName: "Anguilla"})            |
| (:Country {id: "AQ", countryName: "Antarctica"})          |
| (:Country {id: "AG", countryName: "Antigua And Barbuda"}) |
+-----------------------------------------------------------+

9 rows available after 38 ms, consumed after another 2 ms
neo4j@neo4j>

Hi, I am wondering if it is because I am trying to use a docker-compose file and dockerfile together?
I have copied your dockerfile and only changed the import file name so currently have this

FROM neo4j

ENV NEO4J_HOME="/var/lib/neo4j" \
    NEO4J_PASSWD=neo4j_seed

COPY Dependencies_Tree_Full_UnSorted.json ${NEO4J_HOME}/import/
COPY data_loader.cypher ${NEO4J_HOME}/import/

# set initial-password to start loading the data
# sleep for 30 secs for neo4j to start without any overlapping
CMD bin/neo4j-admin set-initial-password ${NEO4J_PASSWD} && \
    bin/neo4j start && sleep 30 && \
    if [ -f "${NEO4J_HOME}/import/data_loader.cypher" ]; then  \
        cat ${NEO4J_HOME}/import/data_loader.cypher | NEO4J_USERNAME=neo4j NEO4J_PASSWORD=${NEO4J_PASSWD} bin/cypher-shell --fail-fast && rm ${NEO4J_HOME}/import/*; \
    fi && /bin/bash

I am then trying to execute this via a docker-compose file that I have been previously using but added the 'build: .' command. This is my docker-compose file that was working:

version: '3.4'

services:
  neo4j:
    container_name: neo4j_custom
    image: neo4j:4.2.2
    restart: unless-stopped
    build: .
    ports:
      - 7474:7474
      - 7687:7687
    volumes:
      # Set volumes and paths as required
      #- ./conf:/conf
      #- ./data:/data
      #- ./import:/import
      #- ./logs:/logs
      #- ./plugins:/plugins
      - D:\Neo4j\conf:/conf
      - D:\Neo4j\data:/data
      - D:\Neo4j\import:/import
      - D:\Neo4j\logs:/logs
      - D:\Neo4j\plugins:/plugins
    environment:
      # Authentication - temporary default used, will need changing for deployment
      - NEO4J_AUTH=neo4j/neo4j_seed
      # Raise memory limits
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=1G
      - NEO4J_dbms_memory_heap_max__size=1G
      # Install plugins as an array of items
      - NEO4JLABS_PLUGINS=["apoc"]
      # Enable APOC to import files
      - NEO4J_apoc_import_file_enabled=true

I am guessing that I have conflicts between the docker-compose file and dockerfile?

Again, thanks for your patience, as frustrating as this learning curve is I am quite enjoying it, I am getting there slowly!

Since you are already pulling the neo4j image in the Dockerfile, you don't need this line in the docker-compose.yml file -> image: neo4j:4.2.2

first time -> docker-compose up --build
next time, just docker-compose up

I did previously try what you suggest and commented out that line in my docker-compose file but that gives me another different error:

ERROR: The Compose file is invalid because:
Service neo4j has neither an image nor a build context specified. At least one must be provided.

Are both files in the same directory / folder?
The Dockerfile should be named "Dockerfile", with no extension.

Yes they are. I have tried running the dockerfile directly on its own using

docker run -it -d -p7474:7474 -p7687:7687 neo4j:seed

but then at startup I am back to the error:

`Changed password for user 'neo4j'.`

`Unable to connect to localhost:7687, ensure the database is running and that there is a working network connection to it.`

Yes the file is just called 'dockerfile', no extension.

Doh! My bad, I commented out the image line BUT also left the 'build' context commented out. Silly me.

Now I am just back to the original error

Unable to connect to localhost:7687, ensure the database is running and that there is a working network connection to it.

Can you share the output of docker-compose?

Sure, this is the output

Building neo4j
Step 1/5 : FROM neo4j:4.2.2
 ---> 9edee9e153ab
Step 2/5 : ENV NEO4J_HOME="/var/lib/neo4j"     NEO4J_PASSWD=neo4j_dependency
 ---> Using cache
 ---> 4a726ca36728
Step 3/5 : COPY Dependencies_Tree_Full_UnSorted.json ${NEO4J_HOME}/import/
 ---> Using cache
 ---> c4bf9966c2e5
Step 4/5 : COPY data_loader.cypher ${NEO4J_HOME}/import/
 ---> Using cache
 ---> 8e4b5613b8f9
Step 5/5 : CMD bin/neo4j start && sleep 10 &&     if [ -f "${NEO4J_HOME}/import/data_loader.cypher" ]; then          cat ${NEO4J_HOME}/import/data_loader.cypher | NEO4J_USERNAME=neo4j NEO4J_PASSWORD=${NEO4J_PASSWD} bin/cypher-shell --fail-fast && rm ${NEO4J_HOME}/import/*;     fi && /bin/bash
 ---> Running in b3985159a9d3
Removing intermediate container b3985159a9d3
 ---> 26e860ffa4cf

Successfully built 26e860ffa4cf
Successfully tagged deltafsdatamigrationdependency_neo4j:latest
Recreating neo4j_custom ... done
Attaching to neo4j_custom
neo4j_custom | grep: /var/lib/neo4j/conf/neo4j.conf: No such file or directory
neo4j_custom | Fetching versions.json for Plugin 'apoc' from https://neo4j-contrib.github.io/neo4j-apoc-procedures/versions.json
neo4j_custom | Installing Plugin 'apoc' from https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.2.0.1/apoc-4.2.0.1-all.jar to /plugins/apoc.jar 
neo4j_custom | Applying default values for plugin apoc to neo4j.conf
neo4j_custom | Directories in use:
neo4j_custom |   home:         /var/lib/neo4j     
neo4j_custom |   config:       /var/lib/neo4j/conf
neo4j_custom |   logs:         /logs
neo4j_custom |   plugins:      /plugins
neo4j_custom |   import:       /import
neo4j_custom |   data:         /var/lib/neo4j/data
neo4j_custom |   certificates: /var/lib/neo4j/certificates
neo4j_custom |   run:          /var/lib/neo4j/run
neo4j_custom | Starting Neo4j.
neo4j_custom | Started neo4j (pid 329). It is available at http://localhost:7474/
neo4j_custom | There may be a short delay until the server is ready.
neo4j_custom | See /logs/neo4j.log for current status.
neo4j_custom | Unable to connect to localhost:7687, ensure the database is running and that there is a working network connection to it.

It seems I am so close but for the 'unable to connect' issue!

Hi @andy_mcshane, gotcha... I have modified the Dockerfile:

FROM neo4j
ENV NEO4J_HOME="/var/lib/neo4j" \
    NEO4J_PASSWD=neo4j_seed

COPY Dependencies_Tree_Full_UnSorted.json ${NEO4J_HOME}/import/
COPY data_loader.cypher ${NEO4J_HOME}/import/
# set initial-password to start loading the data
# sleep for 30 secs for neo4j to start without any overlapping
CMD bin/neo4j-admin set-initial-password ${NEO4J_PASSWD} && \
    bin/neo4j start && sleep 30 && \
    if [ -f "${NEO4J_HOME}/import/data_loader.cypher" ]; then  \
        cat ${NEO4J_HOME}/import/data_loader.cypher | NEO4J_USERNAME=neo4j NEO4J_PASSWORD=${NEO4J_PASSWD} bin/cypher-shell --fail-fast && rm ${NEO4J_HOME}/import/*; \
    fi && tail -f /logs/neo4j.log

I added tail -f /logs/neo4j.log at the end to keep the container running.
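
The fixed sleep 30 in the CMD above is a guess at how long Neo4j takes to start, so on a slower machine the "Unable to connect to localhost:7687" error could come back. A minimal sketch of an alternative, assuming a POSIX shell inside the container, is to retry a cheap query until it succeeds. The helper name retry_until_ok and the attempt and delay values are my own choices, not part of the Neo4j image, and the function would have to be copied into the image as a script to be usable from the CMD.

```shell
#!/bin/sh
# retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt failed.
# (Hypothetical helper name; not shipped with Neo4j.)
retry_until_ok() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Discard the probe's output; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Possible use in place of `sleep 30` (assumes the password set earlier in the CMD):
#   retry_until_ok 30 2 bin/cypher-shell -u neo4j -p "${NEO4J_PASSWD}" "RETURN 1;"
```

With 30 attempts and a 2 second delay this waits up to roughly a minute, but carries on as soon as Bolt accepts a connection.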

Log

Creating network "everyday_default" with the default driver
Building neo4j
Step 1/5 : FROM neo4j
latest: Pulling from library/neo4j
a076a628af6f: Pull complete
943d8acaac04: Pull complete
b9998d19c116: Pull complete
eba5b958e041: Pull complete
b8d0884b547f: Pull complete
4b3572cb5079: Pull complete
e743ea4f2800: Pull complete
020ba241c011: Pull complete
Digest: sha256:c7f24de1dc1d2020ab24a884b8a39538937c1b14bc0ca1da3ddb2573b6fc412f
Status: Downloaded newer image for neo4j:latest
 ---> 9edee9e153ab
Step 2/5 : ENV NEO4J_HOME="/var/lib/neo4j"     NEO4J_PASSWD=neo4j_seed
 ---> Running in 0ac728b0e379
Removing intermediate container 0ac728b0e379
 ---> 8403c4c5b83a
Step 3/5 : COPY Dependencies_Tree_Full_UnSorted.json ${NEO4J_HOME}/import/
 ---> 173896eaffd9
Step 4/5 : COPY data_loader.cypher ${NEO4J_HOME}/import/
 ---> 254b3704b87f
Step 5/5 : CMD bin/neo4j-admin set-initial-password ${NEO4J_PASSWD} &&     bin/neo4j start && sleep 30 &&     if [ -f "${NEO4J_HOME}/import/data_loader.cypher" ]; then
 cat ${NEO4J_HOME}/import/data_loader.cypher | NEO4J_USERNAME=neo4j NEO4J_PASSWORD=${NEO4J_PASSWD} bin/cypher-shell --fail-fast && rm ${NEO4J_HOME}/import/*;     fi && tail -f 
/logs/neo4j.log
 ---> Running in f2a1ecb14a24
Removing intermediate container f2a1ecb14a24
 ---> 6ec65332f690

Successfully built 6ec65332f690
Successfully tagged everyday_neo4j:latest
Creating neo4j_custom ... done
Attaching to neo4j_custom
neo4j_custom | grep: /var/lib/neo4j/conf/neo4j.conf: No such file or directory
neo4j_custom | Changed password for user 'neo4j'.
neo4j_custom | Directories in use:
neo4j_custom |   home:         /var/lib/neo4j
neo4j_custom |   config:       /var/lib/neo4j/conf
neo4j_custom |   logs:         /logs
neo4j_custom |   plugins:      /plugins
neo4j_custom |   import:       /import
neo4j_custom |   data:         /var/lib/neo4j/data
neo4j_custom |   certificates: /var/lib/neo4j/certificates
neo4j_custom |   run:          /var/lib/neo4j/run
neo4j_custom | Starting Neo4j.
neo4j_custom | Started neo4j (pid 292). It is available at http://localhost:7474/
neo4j_custom | There may be a short delay until the server is ready.
neo4j_custom | See /logs/neo4j.log for current status.
neo4j_custom | n
neo4j_custom | (:person {name: "test"})
neo4j_custom | 2021-01-25 19:20:09.489+0000 INFO  Started.
neo4j_custom | 2021-01-25 19:20:49.736+0000 WARN  The client is unauthorized due to authentication failure.
neo4j_custom | 2021-01-25 19:26:43.119+0000 WARN  Unrecognized setting. No declared setting with name: PASSWD
neo4j_custom | 2021-01-25 19:26:43.136+0000 INFO  Starting...
neo4j_custom | 2021-01-25 19:26:46.126+0000 INFO  ======== Neo4j 4.2.2 ========
neo4j_custom | 2021-01-25 19:26:48.614+0000 INFO  Performing postInitialization step for component 'security-users' with version 2 and status CURRENT
neo4j_custom | 2021-01-25 19:26:48.615+0000 INFO  Updating the initial password in component 'security-users'
neo4j_custom | 2021-01-25 19:26:53.358+0000 INFO  Bolt enabled on 0.0.0.0:7687.
neo4j_custom | 2021-01-25 19:26:55.173+0000 INFO  Remote interface available at http://localhost:7474/
neo4j_custom | 2021-01-25 19:26:55.175+0000 INFO  Started.

my cypher query

merge (a:person{name:"test"});

verification

PS C:\scratch> docker exec -it neo4j_custom bash
root@cb3cf7d20ca6:/var/lib/neo4j# cypher-shell 
username: neo4j
password: **********
Connected to Neo4j 4.2.0 at neo4j://localhost:7687 as user neo4j.      
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> match (n) return (n);
+--------------------------+
| n                        |
+--------------------------+
| (:person {name: "test"}) |
+--------------------------+

1 row available after 5 ms, consumed after another 1 ms
neo4j@neo4j>

Thank you so much for your help. What you suggest does indeed get past the connection issue if I run the dockerfile in isolation. If I run the docker-compose file using build to run the dockerfile, the connection issue still exists. I seem to be opening a can of worms here. I am going to try to continue with just the dockerfile and see how much further I can get. I now need to figure out how to add the apoc plugin and then verify the COPY commands. :grinning:

  1. If you are adding apoc and gds in docker-compose, you can use
  - NEO4JLABS_PLUGINS=["apoc","graph-data-science"]

If you are adding them in the Dockerfile: gds is distributed as a zip file and the neo4j image doesn't have unzip, so you need a trick like this multi-stage build:

# stage 0: download and unzip the gds plugin (the neo4j image has no unzip)
FROM alpine:latest
WORKDIR /tmp
ADD https://s3-eu-west-1.amazonaws.com/com.neo4j.graphalgorithms.dist/graph-data-science/neo4j-graph-data-science-1.4.1-standalone.zip /tmp/
# the file name must match the version downloaded above
RUN unzip neo4j-graph-data-science-1.4.1-standalone.zip

FROM neo4j
ENV NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
    NEO4J_HOME="/var/lib/neo4j"

# trailing slash so that the destination is treated as a directory
ADD https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.2.0.1/apoc-4.2.0.1-all.jar "${NEO4J_HOME}"/plugins/
COPY --from=0 /tmp/*.jar "${NEO4J_HOME}"/plugins/

Or, if you are using docker-compose volume mounts, you can just copy the plugins into the host directory that is mounted as the plugins volume.

  2. Network - even if you run one container from docker-compose and another separately with docker run, you can attach both to the same named network, and they will be able to connect to each other.

You are being extremely helpful and I very much appreciate it and I feel that I must be missing the vital lightbulb moment for it all to click together. As it stands I have 1 issue outstanding.

It does not look like my cypher query is actually being executed? I cannot see the expected data in my default neo4j database and I do not see any errors on screen? I can confirm that the apoc library has been copied to the 'plugins' directory and also the json & cypher file have been copied to the 'import' directory.

Is there anything wrong with my cypher? This is what my 'data_loader.cypher' file contains.

// load data from json file (without full path default location is /import)
call apoc.load.json("file:///Dependencies_Tree_Full_UnSorted.json") yield value
// merge loaded data into DB as nodes
merge (n:Node {nodeId: value.id}) on create set n.name = value.name
// using current data
with n, value
// unwind all of the node depends on nodes
unwind value.dependsOn as depOn
// merge loaded dependencies into DB as nodes
merge (e:Node {nodeId: depOn.id}) on create set e.name = depOn.name
// specify depends on relationship
merge (n)-[:Depends_On]-(e)

It is definitely my cypher query, as I have got things to work using my current code and your example csv, so things are looking up!

Oh boy, do I feel stupid. All that was missing was the semicolon at the end of my cypher query, and everything works as expected now, doh! Thank you very much for all your help and patience, it is very much appreciated. :grinning:
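
Since the missing semicolon produced no error on screen, a small pre-flight check could catch it before the script is piped into cypher-shell. The following is a rough sketch, assuming a POSIX shell with grep, tail and sed available. The function name ends_with_semicolon is hypothetical, and it is a plain text check that only understands // comment lines, not a Cypher parser.

```shell
#!/bin/sh
# ends_with_semicolon FILE
# Succeeds if the last line of FILE that is neither blank nor a // comment
# ends with ';' (trailing whitespace is ignored).
# (Hypothetical helper; a simple text check, not a Cypher parser.)
ends_with_semicolon() {
    # Drop blank and comment-only lines, keep the last remaining line,
    # and strip any trailing whitespace from it.
    last_line=$(grep -v -E '^[[:space:]]*(//.*)?$' "$1" \
        | tail -n 1 \
        | sed -e 's/[[:space:]]*$//') || true
    case "$last_line" in
        *";") return 0 ;;
        *)    return 1 ;;
    esac
}

# Possible use before loading the script:
#   ends_with_semicolon "${NEO4J_HOME}/import/data_loader.cypher" \
#       || echo "data_loader.cypher: last statement has no semicolon" >&2
```

Run against the data_loader.cypher posted above, this check would fail, because the final merge line has no terminating semicolon.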