Causal cluster follower falling behind


(Tim Hanssen) #1

We're running a locally hosted 3-server causal cluster. A few times a day we see the message "follower has fallen behind" in debug.log. Sometimes it's just one entry; sometimes a server keeps falling behind again seconds after it has moved back to PIPELINE mode.

2018-12-12 10:05:59.430+0000 INFO [o.n.c.c.c.s.RaftLogShipper] MemberId{f3b56e53}[matchIndex: 50170773, lastSentIndex: 50171030, localAppendIndex: 50171031, mode: PIPELINE]: follower has fallen behind (target prevLogIndex was 50171030, maxAllowedShippingLag is 256), moving to CATCHUP mode

2018-12-12 10:06:01.372+0000 INFO [o.n.c.c.c.s.RaftLogShipper] MemberId{f3b56e53}[matchIndex: 50171094, lastSentIndex: 50171141, localAppendIndex: 50171141, mode: CATCHUP]: caught up, moving to PIPELINE mode
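
Reading the first log line as arithmetic may help. Assuming the shipper compares the target prevLogIndex minus matchIndex against maxAllowedShippingLag (that is how the message reads; it is not confirmed here against the Neo4j source), the follower was one entry over the limit. A minimal Python sketch; the `shipping_lag` helper and the regex are hypothetical and only fit the line format shown above:

```python
import re

# Sample line copied from the debug.log excerpt above.
LINE = (
    "2018-12-12 10:05:59.430+0000 INFO [o.n.c.c.c.s.RaftLogShipper] "
    "MemberId{f3b56e53}[matchIndex: 50170773, lastSentIndex: 50171030, "
    "localAppendIndex: 50171031, mode: PIPELINE]: follower has fallen behind "
    "(target prevLogIndex was 50171030, maxAllowedShippingLag is 256), "
    "moving to CATCHUP mode"
)

# Hypothetical pattern: pulls out the three numbers needed for the comparison.
PATTERN = re.compile(
    r"matchIndex: (?P<match>\d+).*?"
    r"target prevLogIndex was (?P<prev>\d+), "
    r"maxAllowedShippingLag is (?P<max>\d+)"
)


def shipping_lag(line):
    """Return (lag, max_lag) for a 'fallen behind' line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Assumption: lag = target prevLogIndex - matchIndex.
    lag = int(m.group("prev")) - int(m.group("match"))
    return lag, int(m.group("max"))


lag, max_lag = shipping_lag(LINE)
print(lag, max_lag, lag > max_lag)  # 257 256 True
print(lag > 512)                    # False
```

Under that assumption, a lag of 257 entries trips a limit of 256 but would not trip a limit of 512.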

  • Ubuntu 18 LTS, 62GB
  • Neo4j 3.4.10
  • BOLT (without routing)
  • Causal cluster (3 nodes)

Any suggestions on where we should start looking? We're thinking about raising causal_clustering.log_shipping_max_lag, but the docs are not really clear about the implications.
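
One possible starting point is to measure how often each follower drops to CATCHUP and how long it stays there before returning to PIPELINE. The sketch below is only an illustration: `catchup_durations` and the regex are hypothetical names, and they assume every RaftLogShipper line has exactly the format of the two entries quoted above.

```python
import re
from datetime import datetime

# The two entries from the excerpt above, used as sample input.
SAMPLE = [
    "2018-12-12 10:05:59.430+0000 INFO [o.n.c.c.c.s.RaftLogShipper] "
    "MemberId{f3b56e53}[matchIndex: 50170773, lastSentIndex: 50171030, "
    "localAppendIndex: 50171031, mode: PIPELINE]: follower has fallen behind "
    "(target prevLogIndex was 50171030, maxAllowedShippingLag is 256), "
    "moving to CATCHUP mode",
    "2018-12-12 10:06:01.372+0000 INFO [o.n.c.c.c.s.RaftLogShipper] "
    "MemberId{f3b56e53}[matchIndex: 50171094, lastSentIndex: 50171141, "
    "localAppendIndex: 50171141, mode: CATCHUP]: caught up, "
    "moving to PIPELINE mode",
]

# Hypothetical pattern: timestamp, member id, and the mode being entered.
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) INFO \[o\.n\.c\.c\.c\.s\.RaftLogShipper\] "
    r"MemberId\{(?P<member>\w+)\}.*moving to (?P<mode>CATCHUP|PIPELINE) mode"
)


def catchup_durations(lines):
    """Pair each move to CATCHUP with the member's next move to PIPELINE.

    Returns a list of (member_id, seconds_spent_in_catchup) tuples.
    """
    started = {}    # member id -> time it entered CATCHUP
    durations = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a mode-change line
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f%z")
        member = m.group("member")
        if m.group("mode") == "CATCHUP":
            # Keep the earliest start if CATCHUP is logged more than once.
            started.setdefault(member, ts)
        elif member in started:
            durations.append((member, (ts - started.pop(member)).total_seconds()))
    return durations


for member, seconds in catchup_durations(SAMPLE):
    print(member, seconds)  # f3b56e53 1.942
```

To run it against a real log, pass an open file instead of `SAMPLE` (for example `open("debug.log")`; the path depends on the installation). Many short catch-ups for a single member would point at that server or its network link rather than at the cluster as a whole.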

(Tim Hanssen) #2

We changed causal_clustering.log_shipping_max_lag from 256 to 512, and that seems to have fixed the issue.
