How does Neo4j circumvent the slowdown cascade problem with causal consistency?

"slowdown cascade phenomenon, where a single slow or failed shard negatively impacts the entire system, delaying the visibility of updates across many shards and leading to growing queues of delayed updates [2, 25]. Slowdown cascades thus violate a basic commandment for scalability: do not let your performance be determined by the slowest component in your system. ... if any shard cannot keep up with the arrival rate of replicated writes, then the average queue length across all shards grows indefinitely."

[causal consistency], [distributed database], [multi-data center], [causal clustering], [sharding]

Perhaps using this approach?

This paper identifies slowdown cascades as a fundamental limitation of enforcing causal consistency as a global property of the datastore. Occult instead moves this responsibility to the client: the data store makes its updates available as soon as it receives them. Clients then enforce causal consistency on reads only for updates that they are actually interested in observing, using compressed timestamps to track causality.

  • S.A. Mehdi, C. Littley, N. Crooks, L. Alvisi, N. Bronson, and W. Lloyd. 'I Can't Believe It's Not Causal! Scalable Causal Consistency with No Slowdown Cascades'. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17), Boston, MA, April 2017.

It seems like it, from the docs:

"On executing a transaction, the client can ask for a bookmark which it then presents as a parameter to subsequent transactions. Using that bookmark the cluster can ensure that only servers which have processed the client’s bookmarked transaction will run its next transaction. This provides a causal chain which ensures correct read-after-write semantics from the client’s point of view."

Would be happy if this could be confirmed. Thanks in advance.

Causal clustering leverages the Raft protocol, which requires only a majority quorum for commits, so not every cluster member has to take part in a commit operation. Members that are slowed for some reason and cannot keep pace therefore do not drag down commit speed. In a 3-node cluster, only 2 members must be involved for a commit to happen. In a 5-node cluster, only 3 must be involved (so up to 2 members can be slow or not caught up).

So we aren't directly impacted by slowdown cascades: a commit does not require every member, and some members can lag behind. How many can lag depends on the cluster size, since that determines the majority.
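To make the quorum arithmetic concrete, here is a minimal sketch in Python. The function names `majority` and `tolerated_laggards` are made up for illustration and are not part of any Neo4j API; the code only reproduces the strict-majority rule described above.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be positive")
    return cluster_size // 2 + 1


def tolerated_laggards(cluster_size: int) -> int:
    """Members that can be slow or behind without blocking a commit."""
    return cluster_size - majority(cluster_size)


# Print the quorum size and the slack for a few cluster sizes.
for n in (3, 5, 7):
    print(f"{n} members: quorum {majority(n)}, may lag {tolerated_laggards(n)}")
```

For 3 members this gives a quorum of 2, and for 5 members a quorum of 3 with 2 allowed to lag, matching the numbers above.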

In the case of failed nodes, provided quorum is maintained and you're still above the minimum cluster size at runtime, the failed node will be voted out and the cluster will scale down in size, usually resulting in a new number for majority quorum, with no impact on service.

And as arnleiftordal mentioned, we do use bookmarks for implementing causal chaining, when you absolutely know a read transaction must depend on being caught up with a previous write transaction.

If you're reading your own writes in a write tx, then you're executing on the leader and guaranteed to see the latest data anyway.

If you're executing a separate read transaction (which will route to a non-leader node) after a write transaction, obtain a bookmark from the write tx and pass it when executing the read query (this is done for you automatically if the read and write tx are executed within the same session). Whichever node receives the read transaction (follower or read replica) must then wait until it has caught up to the bookmark before executing. So causal chaining is available, but it is client-driven, not automatic for all transactions.
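The bookmark check can be pictured with a toy model in Python. This is only an illustrative sketch, not Neo4j's implementation or driver API: the `Member` class, the integer transaction ids, and the `can_serve` method are all assumptions, and a real member would wait for replication to catch up rather than return a boolean.

```python
from typing import Optional


class Member:
    """Hypothetical cluster member that tracks the last transaction it applied."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.last_applied = 0  # id of the latest transaction applied locally

    def apply_up_to(self, tx_id: int) -> None:
        """Simulate replication catching this member up to tx_id."""
        self.last_applied = max(self.last_applied, tx_id)

    def can_serve(self, bookmark: Optional[int]) -> bool:
        """A read with a bookmark may run only once the member has applied it."""
        return bookmark is None or self.last_applied >= bookmark


leader = Member("leader")
follower = Member("follower")

leader.apply_up_to(42)           # the client's write commits on the leader
bookmark = leader.last_applied   # the client keeps this as its bookmark

follower.apply_up_to(41)         # the follower is one transaction behind
stale_allowed = follower.can_serve(None)       # no bookmark: the read may run on stale data
must_wait = not follower.can_serve(bookmark)   # with the bookmark: the read has to wait

follower.apply_up_to(42)         # replication catches up
ready = follower.can_serve(bookmark)
print(stale_allowed, must_wait, ready)
```

Only the client holding the bookmark waits on the lagging follower; reads without a bookmark, and all commits, proceed regardless of how far behind that follower is.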

Awesome and candid answer! Thank you so much :slight_smile: