Distributed data storage in Neo4j

manjunath.gopi87 · June 11, 2019, 4:09am

Given that sharding is not straightforward for graph-structured data, how does Neo4j distribute data across its servers to scale horizontally?

stefan.armbruster · June 12, 2019, 2:27pm

It doesn't. Neo4j Causal Cluster is a full graph replication approach.

Note that upcoming releases might add support for partitioning provided at the application level.

jgaskins · June 12, 2019, 8:56pm

One concept that I believe I heard mentioned at GraphConnect was the idea of reading from different replicas that will have different subgraphs in their page cache. The idea was to make it a sort of "soft sharding", where you could reach beyond your own "shard" of the data when needed but your main working set could remain in memory.

I don't have enough data to warrant this sort of optimization yet, but I could imagine it working pretty well if node clusters don't have much overlap. For example, when virtually all of a customer's data is scoped to their own account, you could route all queries for a given customer id to the same replica every time, and you'd be more likely to get a cache hit.

And changing how you route those queries could be done with significantly less effort than moving real shards around. Since you're not changing where the data physically lives, there's no real shard migration you need to do; you're just tweaking where you read it from. It'd run slower for a little while as the caches adjust, but I'd bet it would catch up reasonably quickly.
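
To make the routing idea concrete, here's a minimal sketch in Python of picking a replica from a customer id. It isn't tied to any Neo4j driver API: the replica addresses and the `replica_for` helper are made up for illustration, and the only assumption is that your application can open a read session against a replica it chooses.

```python
import hashlib

# Hypothetical read replica addresses; substitute your own endpoints.
REPLICAS = [
    "bolt://replica-1:7687",
    "bolt://replica-2:7687",
    "bolt://replica-3:7687",
]


def replica_for(customer_id: str, replicas=REPLICAS) -> str:
    """Return the replica that should serve reads for this customer.

    The same customer id always maps to the same replica, so that
    customer's subgraph tends to stay warm in one page cache.
    """
    # Use a stable digest: Python's built-in hash() is randomized per
    # process for strings, so it would not route consistently across restarts.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


# Every request for the same customer lands on the same replica.
target = replica_for("customer-42")
```

Every replica still holds the full graph, so a query that reaches outside the customer's data works fine; it just misses the cache. Note that with plain modulo hashing, adding or removing a replica remaps most customers, which here only costs some cache warm-up time rather than a data migration.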
