Implementing Raft in RabbitMQ

https://content.pivotal.io/rabbitmq/implementing-raft-in-rabbitmq

RAFT is a distributed consensus protocol.

RabbitMQ High Availability

  • Replication of data and operations
  • Message replication is done at queue level
  • Called "Queue mirroring"
    • Internally uses a component called "guaranteed multicast"
    • Provides replication and total ordering of operations
    • Chain replication ensures strong consistency and good availability guarantees in fail-stop scenarios. http://www.cs.cornell.edu/home/rvr/papers/OSDI04.pdf
  • In a cluster of RabbitMQ nodes a queue can have a mirror on one or more nodes
  • Provides fail-over and redundancy

Ring

Works well most of the time. Requires good failure detection. Membership changes are expensive (they require a queue sync). The master election algorithm is only informally specified.

We can do better with RAFT.

RAFT

Requirements:

  • Strong consistency guarantees.
    • Total order of operations.
  • Predictable behavior in response to failure events (well-defined recovery procedure)
    • Safe queue master fail-over
  • Parallel replication
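As a rough illustration of the "parallel replication" and "total order" requirements, the sketch below has a leader send one entry to all followers at once and treat it as committed only when a majority of the cluster (leader included) has stored it. The majority rule is how Raft decides commitment; the names Follower and replicate are hypothetical and are not taken from RabbitMQ or any library, and real implementations also deal with terms, retries and log repair.

```python
from concurrent.futures import ThreadPoolExecutor


class Follower:
    """Hypothetical in-memory follower; `up=False` simulates a failed node."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.log = []

    def append(self, entry):
        # A down node never acknowledges the entry.
        if not self.up:
            return False
        self.log.append(entry)
        return True


def replicate(leader_log, followers, entry):
    """Append on the leader, fan out in parallel, report majority commit."""
    leader_log.append(entry)
    with ThreadPoolExecutor(max_workers=max(1, len(followers))) as pool:
        acks = list(pool.map(lambda f: f.append(entry), followers))
    votes = 1 + sum(acks)              # the leader counts as one copy
    cluster_size = 1 + len(followers)
    return votes > cluster_size // 2   # strict majority of the cluster


# Three-node cluster with one follower down: 2 of 3 copies, still committed.
one_down = replicate([], [Follower("b"), Follower("c", up=False)], "m1")

# Both followers down: only the leader has the entry, so it is not committed.
two_down = replicate([], [Follower("b", up=False), Follower("c", up=False)], "m1")
```

Unlike a ring, where an operation travels node by node, every follower here is contacted at the same time and a slow or failed minority does not block progress.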

Options:

  • Paxos
  • Viewstamped Replication
  • Raft

What is RAFT:

  • A group of algorithms for reaching consensus in a distributed system
  • Similar problem space to RabbitMQ queue mirroring
  • Oriented towards implementers
  • Requires no external dependencies

RAFT provides:

  • A state machine log abstraction
    • Fits many domains
  • Leader-follower model
  • State machine log replication
    • Consistency-oriented, with well-defined availability characteristics
    • Total order of operations
  • Well-defined algorithms important for implementers
    • Leader election
    • Safe cluster membership changes
    • Durable storage expectations
  • Recovery
    • Replay log to restore state
    • Snapshotting
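The recovery items above can be sketched with a toy state machine: state is rebuilt either by replaying the whole log, or by loading a snapshot and replaying only the entries after it. This is a minimal sketch under assumptions; QueueState, take_snapshot and recover are hypothetical names, and the FIFO queue is only a stand-in for whatever state machine the log drives.

```python
from dataclasses import dataclass, field


@dataclass
class QueueState:
    """Toy state machine: a FIFO queue of message bodies."""
    messages: list = field(default_factory=list)

    def apply(self, command):
        op, arg = command
        if op == "enqueue":
            self.messages.append(arg)
        elif op == "dequeue" and self.messages:
            self.messages.pop(0)


def take_snapshot(state, last_index):
    """Capture the state together with the last log index it includes."""
    return {"last_index": last_index, "messages": list(state.messages)}


def recover(snapshot, log):
    """Rebuild state from an optional snapshot plus (index, command) entries."""
    if snapshot:
        state = QueueState(list(snapshot["messages"]))
        start = snapshot["last_index"]
    else:
        state = QueueState()
        start = 0
    for index, command in log:
        if index > start:          # skip entries already covered by the snapshot
            state.apply(command)
    return state


log = [
    (1, ("enqueue", "a")),
    (2, ("enqueue", "b")),
    (3, ("dequeue", None)),
    (4, ("enqueue", "c")),
]

# Recovery by replaying the full log.
full = recover(None, log)

# Recovery from a snapshot taken after index 2, plus the remaining entries.
early = recover(None, log[:2])
snap = take_snapshot(early, last_index=2)
from_snapshot = recover(snap, log[2:])
```

Both paths end in the same state because every node applies the same commands in the same order; the snapshot only shortens how much of the log has to be kept and replayed.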

RAFT Protocol

RAFT vs RabbitMQ

What to do when detecting a potential failure?

  • Nothing
    • most reliable / least useful
  • Try to fix stuff
    • evict down nodes, reform topology
    • communicate changes to other nodes
  • The minimum required
    • regain / retain availability and consistency

RAFT In Action

References

http://raft.github.io

https://github.com/rabbitmq/ra
