Large-scale data: what is the best cloud image?


(Dwcar49us) #1

Hi,

I am new to Neo4j.

I would like to create a large-scale "social-like" graph database with 400 million to 1 trillion nodes.

What is the best AMI or image source for AWS, and what size instance should I use? This is for analytics, not production, so there will be very few users; the output is mostly reports.


(M. David Allen) #2

Here are instructions on how to launch Neo4j from an AMI directly:

As of this writing, these are the latest AMIs, based on Neo4j 3.4.9. If you are a time traveler coming to this post many months later, make sure to search for "Neo4j" in public AMIs; there may be a more up-to-date answer.

ap-northeast-1: ami-0dce788bd9a6cf840
ap-southeast-1: ami-0b2180823afdee1aa
eu-central-1: ami-049c9c0916b6c7284
eu-west-1: ami-077be3f74a612ead1
sa-east-1: ami-008ff2bf75937b085
us-east-1: ami-0237ff44323b51813
us-east-2: ami-02141f58a2cfd3b56
us-west-1: ami-0729e3867664b31b1
us-west-2: ami-036057864c223d1dc
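If you would rather search for the newest image programmatically than rely on the list above, a minimal Python sketch could look like this. It assumes the boto3 EC2 client (`describe_images` with `ExecutableUsers=["all"]` and a `name` filter) and assumes the relevant AMI names contain "neo4j" or "Neo4j"; the helpers `newest_image` and `find_latest_neo4j_ami` are hypothetical names, not part of any Neo4j tooling.

```python
from typing import Optional


def newest_image(images: list, name_substring: str = "neo4j") -> Optional[dict]:
    """Return the most recently created image whose name contains name_substring."""
    needle = name_substring.lower()
    matches = [img for img in images if needle in img.get("Name", "").lower()]
    if not matches:
        return None
    # CreationDate is an ISO 8601 timestamp string, so string order is time order.
    return max(matches, key=lambda img: img["CreationDate"])


def find_latest_neo4j_ami(ec2_client) -> Optional[str]:
    """Search public AMIs in the client's region and return the newest Neo4j image ID."""
    resp = ec2_client.describe_images(
        ExecutableUsers=["all"],  # "all" scopes the search to public AMIs
        # Multiple values in one filter are OR'ed; filter matching is case-sensitive.
        Filters=[{"Name": "name", "Values": ["*neo4j*", "*Neo4j*"]}],
    )
    best = newest_image(resp.get("Images", []))
    return best["ImageId"] if best else None
```

Usage would be along the lines of `find_latest_neo4j_ami(boto3.client("ec2", region_name="us-east-1"))`, with AWS credentials configured. Since anyone can publish a public AMI, it is worth checking the owner of whatever image this returns before launching it.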

(Jacob McCrumb) #3

Welcome!

As for the size of the instance, it depends on how much performance you want versus how much you are willing to pay. For optimal performance, you would want enough memory to hold the entire graph (nodes, relationships and properties) in memory: Neo4j Docs: Memory Config
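To get a first feel for how much memory that means, here is a rough back-of-the-envelope sketch in Python. It assumes the fixed record sizes of the Neo4j 3.x standard store format (15 bytes per node, 34 bytes per relationship, 41 bytes per property record) and ignores indexes, label/string/array stores and transaction logs, so treat the result as a lower bound. The setting names are Neo4j 3.x `neo4j.conf` memory settings; the function names, the 10% headroom and the heap value are placeholders of my own, not official guidance.

```python
import math

# Assumed record sizes for the Neo4j 3.x standard store format, in bytes.
NODE_RECORD_BYTES = 15
RELATIONSHIP_RECORD_BYTES = 34
PROPERTY_RECORD_BYTES = 41  # one property record can hold several small properties

GIB = 1024 ** 3


def estimate_store_bytes(nodes: int, relationships: int, property_records: int) -> int:
    """Rough size of the node, relationship and property stores in bytes."""
    return (
        nodes * NODE_RECORD_BYTES
        + relationships * RELATIONSHIP_RECORD_BYTES
        + property_records * PROPERTY_RECORD_BYTES
    )


def suggest_memory_settings(
    nodes: int,
    relationships: int,
    property_records: int,
    headroom: float = 1.1,  # placeholder: 10% extra for growth
    heap_gib: int = 16,  # placeholder heap size, tune for your queries
) -> dict:
    """Return neo4j.conf-style memory settings sized so the page cache fits the stores."""
    store = estimate_store_bytes(nodes, relationships, property_records)
    pagecache_gib = math.ceil(store * headroom / GIB)
    return {
        "dbms.memory.pagecache.size": f"{pagecache_gib}g",
        "dbms.memory.heap.initial_size": f"{heap_gib}g",
        "dbms.memory.heap.max_size": f"{heap_gib}g",
    }
```

For example, `suggest_memory_settings(400_000_000, 4_000_000_000, 400_000_000)` (a hypothetical 10 relationships and one property record per node) estimates about 158 GB (roughly 148 GiB) of store files and suggests `dbms.memory.pagecache.size` of `163g`. At the 1 trillion node end, the node records alone would come to about 15 TB under the same assumptions.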

AWS will allow you to change your instance size later, but you will want to make sure you store the database on EBS (Elastic Block Store, AWS's persistent block storage). Then you can always start out with one size while you build up your graph, and then shrink the instance back down afterwards.
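The resize step itself can be scripted. This is a minimal sketch assuming the boto3 EC2 client and an EBS-backed instance; `resize_instance` is a hypothetical helper name. The instance type can only be changed while the instance is stopped, which is why the EBS advice matters: data on instance-store volumes does not survive a stop.

```python
def resize_instance(ec2_client, instance_id: str, new_type: str) -> None:
    """Stop an EBS-backed instance, change its instance type, and start it again."""
    ids = [instance_id]
    # Stop first and wait until the instance is fully stopped.
    ec2_client.stop_instances(InstanceIds=ids)
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=ids)
    # Change the instance type while stopped; EBS volumes stay attached.
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": new_type}
    )
    # Start again on the new instance type and wait until it is running.
    ec2_client.start_instances(InstanceIds=ids)
    ec2_client.get_waiter("instance_running").wait(InstanceIds=ids)
```

You would call it as `resize_instance(boto3.client("ec2"), "<your-instance-id>", "<target-type>")`, for example moving to a large memory-optimized type for the bulk load and back to a smaller one for reporting. Note that the instance's public IPv4 address changes after a stop/start unless it has an Elastic IP.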

I can't give you actual size recommendations, but luckily AWS is quite flexible! Hope it helps.