Hey everyone, I've been running Neo4j for a couple of months now on an EC2 instance, and as my data has grown I've kept changing the instance type to get more memory. Is this the right way to accommodate growing data, or should I be looking into something else, such as clustering?
It depends on what you need. What you're doing is perfectly fine; clustering is something a bit different. What it gives you is (a) a "high-availability" guarantee: if your database is spread across 3 instances, then it's still up and running even if any one instance goes down, and (b) the ability to spread out your read workload.
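To make the "any one instance goes down" point concrete: assuming the cluster agrees on writes by majority vote (Raft), as the core servers of a Neo4j causal cluster do, the cluster stays available as long as a majority of its servers is up. This small Python sketch (the function name fault_tolerance is just for illustration) computes how many failures a given cluster size can absorb, which is why 3 is the usual minimum.

```python
def fault_tolerance(core_servers: int) -> int:
    """Return how many servers can fail while a majority (quorum) remains.

    Assumes Raft-style majority consensus, as used by the core servers
    of a Neo4j causal cluster.
    """
    if core_servers < 1:
        raise ValueError("need at least one core server")
    quorum = core_servers // 2 + 1   # smallest strict majority
    return core_servers - quorum     # failures the cluster can absorb


for n in (1, 2, 3, 4, 5):
    print(n, "servers -> tolerates", fault_tolerance(n), "failure(s)")
```

Running it shows that 1 or 2 servers tolerate no failures, 3 or 4 tolerate one, and 5 tolerate two, so a fourth server adds no extra fault tolerance over three.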
So for example, if you need very high uptime, or you have way too many read clients for one machine to do the job, then you need a cluster. If you're happy with the machine that you've got, and you just need more RAM because of the queries you're doing, then the way you're approaching this is fine, as long as you're taking regular backups.
We've been doing a lot of work in the last few months to improve our AWS support. Have you seen the documentation here: Hosting Neo4j in the Cloud - Developer Guides? And are you using this AMI, or installing Neo4j yourself on the VM?
Ah okay thanks for clearing that up! I mistook clustering as a way to shard data across multiple servers.
That being said, are there any other options for accommodating growing data? It isn't much of an issue at the moment, and I do take regular backups, but I'm worried that the costs will eventually get out of hand as I keep upgrading memory to keep up with the data.
I was using the AMI a few months ago, but at the time it hadn't been updated to the latest version of Neo4j, so I set it up myself on an EC2 instance. Are there any benefits to using the AMI instead of setting it up manually? I noticed that the AMI creates 3 different storage volumes, though I couldn't find anything on what they do.
If you've been using the community AMI, it recently got an update to the 3.4 series.
The AMI is fairly similar to installing Neo4j yourself on Ubuntu, but it comes with a couple of other nice things, such as automatic configuration of your external IP address and the ability to configure the instance through tags on the instance rather than by manually editing the configuration file. Documentation on how the AMIs work can be found here: Neo4j Cloud Virtual Machines - Developer Guides
But basically, if you're OK with your config, then it isn't broken, so there's no need to fix it. ;)
Growing data alone doesn't mean you need more RAM in the machine; you might just need a bigger EBS volume. If you want to share something about your use case, maybe I can say more. Generally, the more RAM you can give Neo4j, the better it's going to perform, but machine sizing has a lot of factors that depend on your query workload and dataset.
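To illustrate why RAM sizing isn't simply "data size", here is a rough sketch. It assumes the common rule of thumb that the page cache (dbms.memory.pagecache.size in a 3.x neo4j.conf) should hold the on-disk store plus some headroom for growth, while the JVM heap (dbms.memory.heap.max_size) is sized separately for your query workload. The 1.2 growth factor, the 1 GB reserved for the OS, and the function name suggest_memory are assumptions for illustration, not official figures.

```python
def suggest_memory(store_size_gb: float, heap_gb: float,
                   growth_factor: float = 1.2,
                   os_reserve_gb: float = 1.0) -> dict:
    """Rough, hypothetical sizing heuristic for a single Neo4j 3.x instance.

    growth_factor and os_reserve_gb are illustrative assumptions.
    """
    # Page cache: enough to hold the on-disk store plus growth headroom.
    pagecache_mb = round(store_size_gb * growth_factor * 1024)
    # Heap: chosen by the caller based on query workload, not data size.
    heap_mb = round(heap_gb * 1024)
    # Total instance RAM = page cache + heap + memory left for the OS.
    total_gb = round((pagecache_mb + heap_mb) / 1024 + os_reserve_gb, 1)
    return {
        "dbms.memory.pagecache.size": f"{pagecache_mb}m",
        "dbms.memory.heap.max_size": f"{heap_mb}m",
        "instance_ram_gb": total_gb,
    }


print(suggest_memory(store_size_gb=10, heap_gb=4))
```

With a 10 GB store and a 4 GB heap, this suggests a 12288m page cache and roughly 17 GB of instance RAM. If the store outgrows the page cache, Neo4j still works by reading uncached pages from the EBS volume, just more slowly, which is why disk and RAM can be scaled separately.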