Cluster, core server, read replica, data storage

How is data stored on core servers and read replicas?

Please follow this thread if you have a similar question.


Hi, you could read this part of the operations manual: https://neo4j.com/docs/operations-manual/current/clustering/introduction/
Part of the magic of cores and read replicas is that a core supports both read and write operations, while a read replica supports reads only.

Part of the use case for this is when you need to run graph algorithms while also doing intensive data loading: you can run the graph-algorithm workload on a read replica, and it will not have any impact on your loading operations.
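As a rough sketch of how that core/replica split is configured (the settings below are for Neo4j causal clustering in the 3.x/4.x line; the host names `core1`..`core3` are placeholders, not from this thread):

```
# neo4j.conf on a core member (accepts reads and writes)
dbms.mode=CORE
causal_clustering.initial_discovery_members=core1:5000,core2:5000,core3:5000

# neo4j.conf on a read replica (read-only copy, catches up asynchronously)
dbms.mode=READ_REPLICA
causal_clustering.initial_discovery_members=core1:5000,core2:5000,core3:5000
```

With a `neo4j://` connection URI, the official drivers then route writes to the leader core and can send read sessions to the replicas, which is how the analytics workload stays off the loading path.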

Hope this helps.

Kind regards,
Roberto S.


Hi @roberto1 @ravi.anthapu, I am concerned about data storage. Let's say we have 10 TB of data and we want to shard it into 1 TB pieces across (5 cores, 1000 replicas). Do we need to install a hard disk in each server? What is the optimal way to do sharding?

Hi Deepak,
Can you please elaborate on the statement "1000 replicas"? Do you mean to say you want 1000 copies of the content? What's the use case for having 1000 replicas? I am asking out of curiosity; I have never come across such a need and want to understand where this kind of setup might be required.

Please remember Neo4j is a monolithic database. This means all the data for a DB lives on a single server. If you have a 3-core cluster, all 3 servers in the cluster hold the same information and are the same size. The same goes for RRs: each RR is a full replica of the data held by the cores in the cluster. Cluster members and RRs in Neo4j are used for high availability and horizontal scalability, not vertical.
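To make the "full copy on every server" point concrete, here is a back-of-envelope sketch. The 10 TB figure and the server counts come from the question above; the arithmetic is only illustrative, not a Neo4j sizing tool:

```python
# Back-of-envelope: in a Neo4j causal cluster every core AND every
# read replica holds a FULL copy of the database -- there is no sharding.
def total_cluster_storage_tb(db_size_tb, cores, read_replicas):
    # Each server (core or RR) needs disk for the whole database.
    return db_size_tb * (cores + read_replicas)

# The setup from the question: 10 TB database, 5 cores, 1000 read replicas.
print(total_cluster_storage_tb(10, 5, 1000))   # -> 10050 TB, i.e. ~10 PB of disk
# A more typical cluster: 3 cores, 2 read replicas.
print(total_cluster_storage_tb(10, 3, 2))      # -> 50 TB
```

In other words, with 1000 replicas of a 10 TB database you would be provisioning roughly 10 PB of disk in total, which is usually the first sign that the design needs rethinking.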

In reality Neo4j uses a lot less storage than, say, Elasticsearch, because the data is stored in a normalized fashion.
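One way to see why the normalized storage stays small: Neo4j keeps its graph primitives in fixed-size store records. The per-record byte sizes below are the commonly documented ones for the standard store format, and the node/relationship/property counts are invented purely for illustration:

```python
# Approximate fixed record sizes in Neo4j's standard store format.
# (Indexes, transaction logs, and dynamic string/array stores add more on top.)
NODE_RECORD_BYTES = 15   # nodestore record
REL_RECORD_BYTES = 34    # relationship store record
PROP_RECORD_BYTES = 41   # property store record (holds up to 4 small properties)

def estimate_store_gb(nodes, relationships, property_records):
    """Rough lower bound on store size in GB (decimal, 1 GB = 1e9 bytes)."""
    total_bytes = (nodes * NODE_RECORD_BYTES
                   + relationships * REL_RECORD_BYTES
                   + property_records * PROP_RECORD_BYTES)
    return total_bytes / 1e9

# Hypothetical graph: 100M nodes, 1B relationships, 2B property records.
print(estimate_store_gb(100_000_000, 1_000_000_000, 2_000_000_000))  # -> 117.5 GB
```

Because each fact is stored once as a small fixed-size record rather than denormalized into many documents or rows, even a billion-relationship graph can land in the low hundreds of gigabytes.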

Here's an example of how the data size can be smaller, not larger, with Neo4j.

http://www.odbms.org/blog/2018/07/on-using-graph-database-technology-at-behance-interview-with-david-fox/

A quote from that interview:

"Our Neo4j activity implementation has led to a great decrease in complexity, storage, and infrastructure costs. Our full dataset size is now around 40 GB, down from 50 TB of data that we had stored in Cassandra. We’re able to power our entire activity feed infrastructure using a cluster of 3 Neo4j instances, down from 48 Cassandra instances of pretty much equal specs. That has also led to reduced infrastructure costs. Most importantly, it’s been a breeze for our operations staff to manage since the architecture is simple and lean."

So, before worrying about data size, it would be prudent to do a POC and see whether Neo4j is really the solution for you.
