I would like to create a large scale "social like" graph database with 400 million to 1 trillion nodes.
What is the best AMI or image source for AWS, and what size instance should I use? This is for analytics, not production, so there will be very few users and mostly report outputs.
Here are instructions on how to launch Neo4j from an AMI directly:
As of this writing, these are the latest AMIs, based on Neo4j 3.4.9. If you are a time traveler and coming to this post many months later, make sure to search for "Neo4j" in public AMIs -- there may be a more up to date answer.
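If you would rather script that search than click through the console, here is a rough sketch using boto3's `describe_images`. The function name `latest_neo4j_amis` and the `*neo4j*` name pattern are my own assumptions; the published AMIs may be named differently, so adjust the pattern to whatever the console shows you.

```python
def latest_neo4j_amis(ec2, pattern="*neo4j*", limit=5):
    """Return (creation_date, image_id, name) for the newest matching public AMIs.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2", region_name="us-east-1").
    The name pattern is a guess, not an official naming scheme.
    """
    response = ec2.describe_images(
        Filters=[
            {"Name": "name", "Values": [pattern]},
            {"Name": "is-public", "Values": ["true"]},
        ]
    )
    images = [
        (img.get("CreationDate", ""), img["ImageId"], img.get("Name", ""))
        for img in response["Images"]
    ]
    # CreationDate is an ISO 8601 string, so string order is chronological order.
    images.sort(reverse=True)
    return images[:limit]
```

Searching all public images without an owner filter can be slow, so narrow it down once you know which account publishes the AMIs.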
As far as the size of the instance goes, it is a trade-off between how much performance you want and how much money you are willing to pay. For optimal performance, you would want enough memory to hold the whole graph store in memory: Neo4j Docs: Memory Config
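To get a feel for what "enough memory" means, here is a back-of-the-envelope sketch. It assumes the record sizes I recall for the Neo4j 3.x standard store format (15 bytes per node, 34 per relationship, 41 per property record); the function name and parameters are made up for illustration. It ignores indexes, long strings, and arrays, and the larger store format needed for very big graphs uses bigger records as far as I know, so treat the result as a lower bound.

```python
# Approximate on-disk record sizes in bytes (assumed, Neo4j 3.x standard format).
NODE_BYTES = 15
REL_BYTES = 34
PROP_BYTES = 41


def estimate_store_gib(nodes, rels_per_node, props_per_node=0, props_per_rel=0):
    """Rough store size in GiB, which is about the page cache needed to hold it all."""
    rels = nodes * rels_per_node
    props = nodes * props_per_node + rels * props_per_rel
    total_bytes = nodes * NODE_BYTES + rels * REL_BYTES + props * PROP_BYTES
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical workload: 400 million nodes, 10 relationships and 2 properties each.
    print(round(estimate_store_gib(400_000_000, 10, props_per_node=2)), "GiB")
```

With those made-up numbers the store lands somewhere above 160 GiB, which already points at the large memory-optimized instance types, and 1 trillion nodes would be far beyond what fits in RAM on a single machine.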
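AWS will allow you to change your instance size, but you will want to make sure you use EBS (Elastic Block Store, AWS's persistent block storage) for storing your database, since the instance has to be stopped to resize it and anything on local instance storage is lost when that happens. Then you can always start out with a large size, build up your graph, and then shrink it back down.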
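For what it's worth, the resize itself can be scripted. Below is a minimal sketch with boto3, assuming an EBS-backed instance; the helper name `resize_instance` and the instance ID and type in the usage comment are placeholders, not recommendations.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an EBS-backed instance, change its type, and start it again.

    `ec2` is a boto3 EC2 client. Data on EBS volumes survives the stop;
    data on instance store volumes does not.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # The instance type can only be changed while the instance is stopped.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Usage (placeholder values):
#   import boto3
#   resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "r4.2xlarge")
```

Remember to revisit the Neo4j memory settings after resizing, since heap and page cache sizes that suited the big instance will not fit the smaller one.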
As for actual size recommendations, I don't have any. But luckily AWS is quite flexible! Hope it helps.