Is Neo4j Docker the recommended approach for production deployment?

I am reading the documentation:

I assume the Neo4j Docker container is the recommended way of deploying Neo4j across multiple containers. For performance, if my graph is not that big, is it OK to deploy the graph on a single very powerful machine rather than on multiple, much less powerful containers? For example, suppose my performance requirement is QPS <= 500. Is that achievable on a single big machine with many clients accessing it concurrently? Read transactions only, no writes.

Also, Neo4j Sandbox is a packaged example of how to use Neo4j Docker, not a product to use or install. It's for demonstration and testing only. Right?

Sandbox is for testing and learning, nothing to do with production deploys.

You can choose to use docker for production, or something else. It's not that Neo4j runs better in one vs. the other, it's really ultimately up to what's easiest for you.

You can run Neo4j either with multiple containers as a causal cluster, or on a single, larger machine. These are not equivalent ways to run it, though. The difference has less to do with how much hardware you have and more to do with high availability: you simply can't make a single machine highly available. You need a cluster for that.
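Whether a single machine can sustain the read-only load mentioned in the question (up to 500 queries per second) depends on the queries and the data, so it is worth measuring rather than guessing. Below is a minimal sketch of a concurrent read benchmark. The harness `measure_qps` uses only the standard library. `make_neo4j_read_fn` is a hypothetical helper that assumes the official `neo4j` Python driver (version 5.x, where `session.execute_read` is available); the URI, credentials and Cypher query in the usage comment are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(query_fn, total_requests=1000, workers=16):
    """Call query_fn total_requests times from a thread pool; return queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every call to finish and re-raises any exception.
        list(pool.map(lambda _: query_fn(), range(total_requests)))
    elapsed = time.perf_counter() - start
    return total_requests / elapsed


def make_neo4j_read_fn(uri, user, password, cypher):
    """Build a read-only query function (assumption: neo4j Python driver 5.x)."""
    from neo4j import GraphDatabase  # imported lazily; the harness itself needs no driver

    driver = GraphDatabase.driver(uri, auth=(user, password))

    def run_once():
        # Sessions are not thread-safe, so each call opens its own session.
        with driver.session() as session:
            # Run the query inside a managed read transaction and drain the result.
            session.execute_read(lambda tx: tx.run(cypher).consume())

    return run_once


# Example usage against a running database (placeholder values):
# read_fn = make_neo4j_read_fn("bolt://localhost:7687", "neo4j", "mySecretPassword",
#                              "MATCH (n) RETURN count(n)")
# print(measure_qps(read_fn, total_requests=5000, workers=32))
```

Running the same benchmark against the single-machine setup and against a cluster gives a like-for-like throughput comparison. It says nothing about availability, which is the separate concern described above.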

Have a look into what clusters do:

@david.allen, so you are suggesting cluster + docker as the preferred method for deployment.

To test this architecture, can I use one machine with multiple cores, or do I need multiple containers or physical machines? I want to try out docker + cluster on one machine first. Is that not possible?

Besides, when you say 'you simply can't make a single machine highly available', do you mean that a single machine's resources are insufficient to serve a potentially large number of client requests, or that a single machine may crash and then be unable to provide any service at all?

You can test a causal cluster by running several containers on a single machine. It's a good test of the architecture overall, but of course it isn't highly available, since if the single machine crashes, you lose the entire database.
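As a rough illustration of running several core containers on one machine, the sketch below builds one `docker run` command per core server. It assumes the Neo4j 4.x Docker convention of passing settings as `NEO4J_...` environment variables (dots become underscores, underscores are doubled). The image tag `neo4j:4.4-enterprise`, the container names `core1`..`core3` and the network name `neo4j-cluster` are assumptions for this example, and the Docker network has to exist already (for instance via `docker network create neo4j-cluster`). Check the Operations Manual for the full set of settings your version needs.

```python
def core_docker_commands(n_cores=3, image="neo4j:4.4-enterprise", network="neo4j-cluster"):
    """Return one `docker run` argument list per core server (hypothetical names)."""
    names = [f"core{i}" for i in range(1, n_cores + 1)]
    # Every core gets the same list of discovery addresses (container name + port 5000).
    members = ",".join(f"{name}:5000" for name in names)
    commands = []
    for i, name in enumerate(names):
        commands.append([
            "docker", "run", "--detach",
            f"--name={name}", f"--hostname={name}", f"--network={network}",
            # Offset the host ports so the containers do not collide on one machine.
            f"--publish={7474 + i}:7474", f"--publish={7687 + i}:7687",
            "--env", "NEO4J_ACCEPT_LICENSE_AGREEMENT=yes",
            "--env", "NEO4J_dbms_mode=CORE",
            "--env", f"NEO4J_causal__clustering_initial__discovery__members={members}",
            image,
        ])
    return commands


# Print the commands so they can be reviewed before running them by hand.
for command in core_docker_commands():
    print(" ".join(command))
```

Because all containers share one host, this only exercises the cluster mechanics. It does not protect against the host itself failing.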

I mean the second one - "high availability" means that the database stays available even if a machine fails.

@david.allen I also saw a neo4j spark connector, https://neo4j.com/developer/spark/

How is that project related to the Neo4j Docker architecture? Is it a requirement for using Neo4j Docker?

The two are not connected - you can use the Neo4j Connector for Apache Spark with Neo4j deployed in docker, and you can also use it with VMs, or wherever you install Neo4j.

The connector doesn't place any restrictions on how you deploy Neo4j, so whatever you pick, they should work together just fine.

@david.allen

I see the (Set up a local Causal Cluster - Operations Manual) tutorial on a single machine.

Is there a tutorial for setting up a causal cluster on multiple machines? For example, causal cluster using 4 machines to host 3 core servers and 1 read-replica server?

You can replace localhost with the addresses of your respective servers in this setting:

causal_clustering.initial_discovery_members=localhost:5000,localhost:5001,localhost:5002
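For example, a small helper can generate that line from a list of machine addresses. This is only a sketch: the hostnames below are made up, and it assumes every core server listens for discovery on port 5000, as in the localhost example above.

```python
def discovery_members_line(hosts, port=5000):
    """Build the initial_discovery_members setting from a list of core server hosts."""
    members = ",".join(f"{host}:{port}" for host in hosts)
    return f"causal_clustering.initial_discovery_members={members}"


# Hypothetical hostnames for three core servers on separate machines.
core_hosts = ["core1.example.com", "core2.example.com", "core3.example.com"]
print(discovery_members_line(core_hosts))
# prints: causal_clustering.initial_discovery_members=core1.example.com:5000,core2.example.com:5000,core3.example.com:5000
```

The resulting line goes into the neo4j.conf of each server, and the listed addresses have to be reachable from the other machines.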

@dominicvivek06

I see that there is (GitHub - neo4j-contrib/neo4j-helm: Helm Charts for running Neo4j on Kubernetes). What I am not sure about is how you assign these servers to machines.

From my limited understanding of Helm (still learning), it's very easy to set up a causal cluster on a single machine. But how do you do that across multiple machines?

Based on (Installation - Neo4j-Helm User Guide), you can create a causal cluster by running:

#  creates a cluster containing 3 core servers and 3 read replicas
helm install my-neo4j \
    --set core.numberOfServers=3,readReplica.numberOfServers=3,acceptLicenseAgreement=yes,neo4jPassword=mySecretPassword .

But this command doesn't specify the individual servers. Should I do that in values.yaml? And will it then automatically set up all the Neo4j core / read-replica servers at the specified machine locations?

Can you provide a simple operational step-by-step tutorial on this? Running Neo4j helm across multiple machines. Thanks!

cc @david.allen

Installation information for neo4j-helm can be found here: Installation - Neo4j-Helm User Guide

Basically, you probably want to specify a core.numberOfServers setting, and (optionally, possibly) a standalone setting to control the number of servers. Check the documentation.
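Note that the chart is given server counts rather than machine addresses: Helm installs into a Kubernetes cluster, and Kubernetes decides which node each pod runs on. As a sketch, the helper below assembles the `helm install` command from the earlier post for a chosen number of core servers and read replicas. It uses only the `--set` keys that appear in that command. The function name and its defaults are hypothetical, and the password is the placeholder from the example.

```python
def helm_install_args(release, cores=3, read_replicas=3,
                      password="mySecretPassword", chart="."):
    """Build the argument list for `helm install` with the neo4j-helm chart settings."""
    settings = {
        "core.numberOfServers": cores,
        "readReplica.numberOfServers": read_replicas,
        "acceptLicenseAgreement": "yes",
        "neo4jPassword": password,
    }
    # Helm accepts several key=value pairs in one comma-separated --set argument.
    set_arg = ",".join(f"{key}={value}" for key, value in settings.items())
    return ["helm", "install", release, "--set", set_arg, chart]


# 3 core servers and 1 read replica, as asked about earlier in the thread.
print(" ".join(helm_install_args("my-neo4j", cores=3, read_replicas=1)))
```

The same values could instead be kept in values.yaml. Either way, spreading the pods over several machines is a matter of having several nodes in the Kubernetes cluster, not of listing hosts in the chart.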