I assume the Neo4j Docker container is the recommended way of deploying it across multiple containers. For performance: if my graph is not that big, is it OK to deploy it onto a single very powerful machine instead of several much less powerful containers? For example, if my performance requirement is QPS <= 500, is that achievable with a single big machine accessed concurrently by many clients? Read-only transactions, no writes.
Also, Neo4j Sandbox is a packaged example of how to use Neo4j with Docker, not a product to use or install. It's for demonstration and testing only. Right?
Sandbox is for testing and learning; it has nothing to do with production deployments.
You can choose to use Docker for production, or something else. It's not that Neo4j runs better in one vs. the other; ultimately it's up to what's easiest for you.
You can run Neo4j either with multiple containers as a causal cluster, or on a single larger machine. These are not equivalent ways to run it, though: it's less to do with how much hardware you have, and more to do with high availability. You simply can't make a single machine highly available; you need a cluster for that.
@david_allen, so you are suggesting cluster + Docker as the preferred deployment method.
For testing this architecture, can I use one machine with multiple cores, or do I need multiple containers or physical machines? I want to try out Docker + cluster on one machine first. Or is that not possible?
Also, when you say "you simply can't make a single machine highly available", do you mean a single machine's resources are insufficient to serve a potentially large volume of client requests, or that a single machine may crash and therefore be unable to provide service at all?
You can test a causal cluster by running many containers on a single machine. It's a good test of the architecture overall, but of course it isn't highly available, since if the single machine crashes, you lose the entire database.
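As a concrete illustration, something like the following brings up a three-core cluster on one host. This is a rough sketch, assuming the Neo4j 4.x enterprise image (clustering requires enterprise): the container names, password, ports, and exact setting names are illustrative and may differ for your version, so check the operations manual for yours.

```shell
# One Docker network so the containers can discover each other by name.
docker network create neo4j-cluster

# Three core members on one host. Setting names follow the Neo4j Docker
# convention of dots -> _ and underscores -> __ in config keys.
for i in 1 2 3; do
  docker run -d --name core$i --network neo4j-cluster \
    -e NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
    -e NEO4J_AUTH=neo4j/mySecretPassword \
    -e NEO4J_dbms_mode=CORE \
    -e NEO4J_causal__clustering_minimum__core__cluster__size__at__formation=3 \
    -e NEO4J_causal__clustering_initial__discovery__members=core1:5000,core2:5000,core3:5000 \
    -p $((7473+i)):7474 -p $((7686+i)):7687 \
    neo4j:4.4-enterprise
done
```

Each container then exposes HTTP on 7474-7476 and Bolt on 7687-7689 of the host, so you can exercise the cluster from one box before spreading it across machines.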
I mean the second one - "high availability" means that the database stays available even if a machine fails.
The two are not connected - you can use the Neo4j Connector for Apache Spark with Neo4j deployed in docker, and you can also use it with VMs, or wherever you install Neo4j.
The connector doesn't have any restrictions on how you deploy Neo4j, so whichever you pick, they should work together just fine.
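To make that concrete: from Spark's point of view the connector only needs a Bolt URL, so the same job runs against Neo4j in Docker, in a VM, or anywhere else. A sketch (the connector's Maven coordinates and the job file name here are illustrative, not exact; check the connector docs for the version matching your Spark):

```shell
# Submit a Spark job with the Neo4j connector on the classpath.
# The Bolt URL the job uses can point at a Docker container, a VM,
# or a cluster routing address -- the connector doesn't care.
spark-submit \
  --packages org.neo4j:neo4j-connector-apache-spark_2.12:4.1.5_for_spark_3 \
  my_graph_job.py
```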
Is there a tutorial for setting up a causal cluster on multiple machines? For example, a causal cluster using 4 machines to host 3 core servers and 1 read-replica server?
From my limited understanding of Helm (still learning), it's very easy to set up a causal cluster on a single machine. But how do you do that across multiple machines?
# creates a cluster containing 3 core servers and 3 read replicas
helm install my-neo4j \
--set core.numberOfServers=3,readReplica.numberOfServers=3,acceptLicenseAgreement=yes,neo4jPassword=mySecretPassword .
But this command doesn't specify all the required servers. Should I do that in values.yaml? And will it automatically set up all the Neo4j core / read-replica servers in the specified machine locations?
Can you provide a simple step-by-step operational tutorial for this, i.e. running the Neo4j Helm chart across multiple machines? Thanks!
Basically, you probably want to set core.numberOfServers (and optionally readReplica.numberOfServers), plus possibly the standalone setting if you want a single instance rather than a cluster. Also note that Helm deploys into a Kubernetes cluster, and Kubernetes schedules the pods across whatever nodes that cluster has; you don't specify machine locations in the chart itself. Check the chart's documentation for the full list of settings.
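If you'd rather keep those values in a file than on the command line, the same flags can go into values.yaml and be passed with -f. A minimal sketch, assuming the key names from the helm command shown earlier in the thread (key names can differ between chart versions):

```yaml
# values.yaml -- same settings as the --set flags, used as:
#   helm install my-neo4j -f values.yaml .
acceptLicenseAgreement: "yes"
neo4jPassword: mySecretPassword
core:
  numberOfServers: 3
readReplica:
  numberOfServers: 1
```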