
How to estimate Neo4j and system configuration for huge graphs

Hello all, I'm trying to work out what system configuration my project would need.
I have to implement Neo4j for entity relationships, with an estimated graph size of more than 2B nodes and 4B relationships. I also want to run Cypher pattern-based reasoning over the initial graph to enrich the knowledge. I understand that we can use an estimation query to estimate the memory required for a GDS algorithm, but is there anything similar for Cypher pattern queries, and what performance should I expect from them in terms of time?

How can I set up a Neo4j instance for this volume of data, and what system and Neo4j configuration would it need?
How can I speed up Cypher pattern-based reasoning?

How can I achieve good performance for real-time analysis?

I would appreciate it if anyone could provide information on the above questions. Thanks.



Regarding sizing, it's not such a simple task. Even if we knew there were 2 billion nodes, do those nodes average 3 properties each or 40 properties each? And what are the data types of these properties? Are they all integers, for example, or are they all strings, where each string property could be anywhere from 1 to 5000 characters? The same can be said for relationships.
You might want to first import a subset of the data, e.g. 100 million nodes and 200 million relationships, get a rough estimate of the graph size, and then extrapolate up from there.
And in a perfect world it would be great to have as much RAM as the size of the database itself.
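
To make the extrapolation step concrete, here is a minimal sketch in Python. It assumes the store size grows roughly linearly with the number of nodes and relationships, which only holds if the subset has the same property mix as the full data set. The function name and the 60 GiB sample size are hypothetical, made up for illustration; replace them with whatever you actually measure after importing the subset.

```python
GIB = 1024 ** 3


def extrapolate_store_size(sample_bytes, sample_nodes, sample_rels,
                           target_nodes, target_rels):
    """Linearly scale a measured sample store size up to the target graph.

    Assumption: bytes per element (node or relationship) stay roughly
    constant between the sample and the full graph.
    """
    sample_elements = sample_nodes + sample_rels
    target_elements = target_nodes + target_rels
    return sample_bytes * target_elements / sample_elements


# Hypothetical measurement: the imported subset occupies 60 GiB on disk.
estimated_bytes = extrapolate_store_size(
    sample_bytes=60 * GIB,
    sample_nodes=100_000_000,
    sample_rels=200_000_000,
    target_nodes=2_000_000_000,
    target_rels=4_000_000_000,
)

# In the "perfect world" case, this is also the amount of RAM to aim for.
print(f"Estimated store size: {estimated_bytes / GIB:.0f} GiB")
```

Treat the result as a rough lower bound for planning: indexes, transaction logs, and uneven property sizes in the remaining data can all push the real figure higher.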

Also, regarding the recommended system, again that would depend on the expected response time, the complexity of the queries, and the number of concurrent queries. For example, 100 near-concurrent runs of match (n) return n limit 1; are not going to be as performance intensive as match (n:Person {id:1})-[r*1..9]->(n2:Person) where n2.status='active' with n,n2, collect(r) as rels .... .... .... ....