Quorum Queues
Overview
The RabbitMQ quorum queue is a modern queue type which implements a durable, replicated queue based on the Raft consensus algorithm and should be considered the default choice when needing a replicated, highly available queue.
Quorum queues are designed for excellent data safety as well as reliable and fast leader election properties to ensure high availability even during upgrades or other turbulence.
Quorum queues are optimized for certain use cases where data safety is the top priority. This is covered in the Motivation section.
Quorum queues have differences in behaviour compared to classic queues as well as some limitations that it is important to be aware of when converting an application from using classic to quorum queues.
Some features are specific to quorum queues such as poison message handling
, at least once dead-lettering and
the modified
outcome when using AMQP.
For cases that would benefit from replication and repeatable reads, streams may be a better option than quorum queues.
Quorum queues and streams are the two replicated data structures available. Classic queue mirroring was removed starting with RabbitMQ 4.0.
Topics Covered
Topics covered in this document include:
- What are quorum queues and why they were introduced
- How are they different from classic queues
- Primary use cases of quorum queues and when not to use them
- How to declare a quorum queue
- Replication-related topics: replica management, replica leader rebalancing, optimal number of replicas, etc
- What guarantees quorum queues offer in terms of leader failure handling, data safety and availability
- Continuous Membership Reconciliation
- Memory and disk footprint of quorum queues
- Performance characteristics of quorum queues and performance tuning relevant to them
- Poison message handling (failure-redelivery loop protection)
- Options to Relax Property Equivalence
- Configurable settings of quorum queues
and more.
General familiarity with RabbitMQ clustering would be helpful here when learning more about quorum queues.
Motivation
Quorum queues adopt a different replication and consensus protocol and give up support for certain "transient" in nature features, which results in some limitations. These limitations are covered later in this information.
Quorum queues pass a refactored and more demanding version of the original Jepsen test. This ensures they behave as expected under network partitions and failure scenarios. The new test runs continuously to spot possible regressions and is enhanced regularly to test new features (e.g. dead lettering).
What is a Quorum?
If intentionally simplified, quorum in a distributed system can
be defined as an agreement between the majority of nodes ((N/2)+1
where N
is the total number of
system participants).
When applied to queue mirroring in RabbitMQ clusters this means that the majority of replicas (including the currently elected queue leader) agree on the state of the queue and its contents.
Use Cases
Quorum queues intended use is in topologies where queues exist for a long time and are critical to certain aspects of an application's architecture.
Examples of good use cases would be incoming orders in a sales system or votes cast in an electoral system where potentially losing messages would have a significant impact on overall system correctness and function.
Quorum queues are not designed to be used for every problem. Stock tickers, instant messaging systems and RPC reply queues benefit less or not at all from use of quorum queues.
Publishers should use publisher confirms as this is how clients can interact with the quorum queue consensus system. Publisher confirms will only be issued once a published message has been successfully replicated to a quorum of replicas and is considered "safe" within the context of the queue.
Consumers should use manual acknowledgements to ensure messages that aren't successfully processed are returned to the queue so that another consumer can re-attempt processing.
When Not to Use Quorum Queues
In some cases quorum queues should not be used. They typically involve:
- Temporary queues: transient or exclusive queues, high queue churn (declaration and deletion rates)
- Lowest possible latency: the underlying consensus algorithm has an inherently higher latency due to its data safety features
- When data safety is not a priority (e.g. applications do not use manual acknowledgements and publisher confirms are not used)
- Very long queue backlogs (5M+ messages) (streams are likely to be a better fit)
- Large fanouts: (streams are likely to be a better fit)
Features
Quorum queues share most of the fundamentals with other queue types.
Comparison with Classic Queues
The following operations work the same way for quorum queues as they do for classic queues:
- Consumption, consumer registration
- Consumer acknowledgements (except for global QoS and prefetch)
- Consumer cancellation
- Purging
- Deletion
With some queue operations there are minor differences:
- Declaration
- Setting prefetch for consumers
Feature Matrix
Feature | Classic queues | Quorum queues |
---|---|---|
Non-durable queues | yes | no |
Message replication | no | yes |
Exclusivity | yes | no |
Per message persistence | per message | always |
Membership changes | no | semi-automatic |
Message TTL (Time-To-Live) | yes | yes |
Queue TTL | yes | partially (lease is not renewed on queue re-declaration) |
Queue length limits | yes | yes (except x-overflow : reject-publish-dlx ) |
Keeps messages in memory | see Classic Queues | never (see Resource Use) |
Message priority | yes | yes |
Single Active Consumer | yes | yes |
Consumer exclusivity | yes | no (use Single Active Consumer) |
Consumer priority | yes | yes |
Dead letter exchanges | yes | yes |
Adheres to policies | yes | yes (see Policy support) |
Poison message handling | no | yes |
Global QoS Prefetch | yes | no |
Server-named queues | yes | no |
Modern quorum queues also offer higher throughput and less latency variability for many workloads.
Queue and Per-Message TTL
Quorum queues support both Queue TTL and message TTL (since RabbitMQ 3.10) (including Per-Queue Message TTL in Queues and Per-Message TTL in Publishers). When using any form of message TTL, the memory overhead increases by 2 bytes per message.
Length Limit
Quorum queues has support for queue length limits.
The drop-head
and reject-publish
overflow behaviours are supported but they
do not support reject-publish-dlx
configurations as Quorum queues take a different
implementation approach than classic queues.
The current implementation of reject-publish
overflow behaviour does not strictly
enforce the limit and allows a quorum queue to overshoot its limit by at least
one message, therefore it should be taken with care in scenarios where a precise
limit is required.
When a quorum queue reaches the max-length limit and reject-publish
is configured
it notifies each publishing channel who from thereon will reject all messages back to
the client. This means that quorum queues may overshoot their limit by some small number
of messages as there may be messages in flight whilst the channels are notified.
The number of additional messages that are accepted by the queue will vary depending
on how many messages are in flight at the time.
Dead Lettering
Quorum queues support dead letter exchanges (DLXs).
Traditionally, using DLXs in a clustered environment has not been safe.
Since RabbitMQ 3.10 quorum queues support a safer form of dead-lettering that uses
at-least-once
guarantees for the message transfer between queues
(with the limitations and caveats outlined below).
This is done by implementing a special, internal dead-letter consumer process that works similarly to a normal queue consumer with manual acknowledgements apart from it only consumes messages that have been dead-lettered.
This means that the source quorum queue will retain the
dead-lettered messages until they have been acknowledged. The internal consumer
will consume dead-lettered messages and publish them to the target queue(s) using
publisher confirms. It will only acknowledge once publisher confirms have been
received, hence providing at-least-once
guarantees.
at-most-once
remains the default dead-letter-strategy for quorum queues and is useful for scenarios
where the dead lettered messages are more of an informational nature and where it does not matter so much
if they are lost in transit between queues or when the overflow
configuration restriction outlined below is not suitable.
Activating at-least-once dead-lettering
To activate or turn on at-least-once
dead-lettering for a source quorum queue, apply all of the following policies
(or the equivalent queue arguments starting with x-
):
- Set
dead-letter-strategy
toat-least-once
(default isat-most-once
). - Set
overflow
toreject-publish
(default isdrop-head
). - Configure a
dead-letter-exchange
. - Turn on feature flag
stream_queue
(turned on by default for RabbitMQ clusters created in 3.9 or later).
It is recommended to additionally configure max-length
or max-length-bytes
to prevent excessive message buildup in the source quorum queue (see caveats below).
Optionally, configure a dead-letter-routing-key
.
Limitations
at-least-once
dead lettering does not work with the default drop-head
overflow
strategy even if a queue length limit is not set.
Hence if drop-head
is configured the dead-lettering will fall back
to at-most-once
. Use the overflow strategy reject-publish
instead.
Caveats
at-least-once
dead-lettering will require more system resources such as memory and CPU.
Therefore, turn on at-least-once
only if dead lettered messages should not be lost.
at-least-once
guarantees opens up some specific failure cases that needs handling.
As dead-lettered messages are now retained by the source quorum queue until they have been
safely accepted by the dead-letter target queue(s) this means they have to contribute to the
queue resource limits, such as max length limits so that the queue can refuse to accept
more messages until some have been removed. Theoretically it is then possible for a queue
to only contain dead-lettered messages, in the case where, say a target dead-letter
queue isn't available to accept messages for a long time and normal queue consumers
consume most of the messages.
Dead-lettered messages are considered "live" until they have been confirmed by the dead-letter target queue(s).
There are few cases for which dead lettered messages will not be removed from the source queue in a timely manner:
- The configured dead-letter exchange does not exist.
- The messages cannot be routed to any queue (equivalent to the
mandatory
message property). - One (of possibly many) routed target queues does not confirm receipt of the message. This can happen when a target queue is not available or when a target queue rejects a message (e.g. due to exceeded queue length limit).
The dead-letter consumer process will retry periodically if either of the scenarios above occur which means there is a possibility of duplicates appearing at the DLX target queue(s).
For each quorum queue with at-least-once
dead-lettering turned on, there will be one internal dead-letter
consumer process. The internal dead-letter consumer process is co-located on the quorum queue leader node.
It keeps all dead-lettered message bodies in memory.
It uses a prefetch size of 32 messages to limit the amount of message bodies kept in memory if no confirms
are received from the target queues.
That prefetch size can be increased by the dead_letter_worker_consumer_prefetch
setting in the rabbit
app section of the
advanced config file if high dead-lettering throughput
(thousands of messages per second) is required.
For a source quorum queue, it is possible to switch dead-letter strategy dynamically from at-most-once
to at-least-once
and vice versa. If the dead-letter strategy is changed either directly
from at-least-once
to at-most-once
or indirectly, for example by changing overflow from reject-publish
to drop-head
, any dead-lettered messages that have not yet been confirmed by all target queues will be deleted.
Messages published to the source quorum queue are persisted on disk regardless of the message delivery mode (transient or persistent). However, messages that are dead lettered by the source quorum queue will keep the original message delivery mode. This means if dead lettered messages in the target queue should survive a broker restart, the target queue must be durable and the message delivery mode must be set to persistent when publishing messages to the source quorum queue.
Priorities
Quorum queue priority support is available as of RabbitMQ 4.0. However, there are differences in how quorum queues and classic queues implement priorities.
Quorum queues support consumer priorities and starting with 4.0, they also support a type of message prioritisation that is quite different from classic queue message priorities.
Quorum queue message priorities are always active and do not require a policy to work. As soon as a quorum queue receives a message with a priority set it will enable prioritization.
Quorum queues internally only support two priorities: high and normal. Messages without a priority set will be mapped to normal as will priorities 0 - 4. Messages with a priority higher than 4 will be mapped to high.
High priority messages will be favoured over normal priority messages at a ratio of 2:1, i.e. for every 2 high priority message the queue will deliver 1 normal priority message (if available). Hence, quorum queues implement a kind of non-strict, "fair share" priority processing. This ensures progress is always made on normal priority messages but high priorities are favoured at a ratio of 2:1.
If a high priority message was published before a normal priority one, the high priority message will always be delivered first even if it is the normal priority's turn.
More Advanced Scenarios
For more advanced message priority scenarios, separate queues should be used for different message types, one for each type (priority). Queues used for more important message types should generally have more (overprovisioned) consumers.
Poison Message Handling
Quorum queue support handling of poison messages, that is, messages that cause a consumer to repeatedly requeue a delivery (possibly due to a consumer failure) such that the message is never consumed completely and positively acknowledged so that it can be marked for deletion by RabbitMQ.
Quorum queues keep track of the number of unsuccessful (re)delivery attempts and expose it in the "x-delivery-count" header that is included with any redelivered message.
When a message has been redelivered more times than the limit the message will be dropped (removed) or dead-lettered (if a DLX is configured).
It is recommended that all quorum queues have a dead letter configuration of some sort to ensure messages aren't dropped and lost unintentionally. Using a single stream for a low priority dead letter policy is a good, low resource way to ensure dropped messages are retained for some time after.
Starting with RabbitMQ 4.0, the delivery limit for quorum queues defaults to 20.
The 3.13.x era behavior where there was no limit can be restored by setting the limit to -1
using an optional queue argument at declaration time or using a policy as demonstrated below.
See repeated requeues for more details.
Configuring the Limit
It is possible to set a delivery limit for a queue using a policy argument, delivery-limit
.
Overriding the Limit
The following example sets the limit to 50 for queues whose names begin with
qq
.
- bash
- PowerShell
- HTTP API
- Management UI
rabbitmqctl set_policy qq-overrides \
"^qq\." '{"delivery-limit": 50}' \
--priority 123 \
--apply-to "quorum_queues"
rabbitmqctl.bat set_policy qq-overrides ^
"^qq\." "{""delivery-limit"": 50}" ^
--priority 123 ^
--apply-to "quorum_queues"
PUT /api/policies/%2f/qq-overrides
{"pattern": "^qq\.",
"definition": {"delivery-limit": 50},
"priority": 1,
"apply-to": "quorum_queues"}
Navigate to
Admin
>Policies
>Add / update a policy
.Enter a policy name (such as "qq-overrides") next to Name, a pattern (such as "^qq.") next to Pattern, and select what kind of entities (quorum queues in this example) the policy should apply to using the
Apply to
drop down.Enter "delivery-limit" for policy argument and 50 for its value in the first line next to
Policy
.Click
Add policy
.
Disabling the Limit
The following example disables the limit for queues whose names begin with
qq.unlimited
.
- bash
- PowerShell
- HTTP API
- Management UI
rabbitmqctl set_policy qq-overrides \
"^qq\.unlimited" '{"delivery-limit": -1}' \
--priority 123 \
--apply-to "quorum_queues"
rabbitmqctl.bat set_policy qq-overrides ^
"^qq\.unlimited" "{""delivery-limit"": -1}" ^
--priority 123 ^
--apply-to "quorum_queues"
PUT /api/policies/%2f/qq-overrides
{"pattern": "^qq\.unlimited",
"definition": {"delivery-limit": -1},
"priority": 1,
"apply-to": "quorum_queues"}
Navigate to
Admin
>Policies
>Add / update a policy
.Enter a policy name (such as "qq-overrides") next to Name, a pattern (such as "^qq.unlimited") next to Pattern, and select what kind of entities (quorum queues in this example) the policy should apply to using the
Apply to
drop down.Enter "delivery-limit" for policy argument and -1 for its value in the first line next to
Policy
.Click
Add policy
.
Configuring the Limit and Setting Up Dead-Lettering
Messages that are redelivered more times than the limit allows for will be either dropped (removed) or dead-lettered.
The following example configures both the limit and an exchange to dead-letter (republish) such messages. The target exchange in this example is called "redeliveries.limit.dlx". Declaring it and setting up its topology (binding queues and/or streams to it) is not covered in this example.
- bash
- PowerShell
- HTTP API
- Management UI
rabbitmqctl set_policy qq-overrides \
"^qq\." '{"delivery-limit": 50, "dead-letter-exchange": "redeliveries.limit.dlx"}' \
--priority 123 \
--apply-to "quorum_queues"
rabbitmqctl.bat set_policy qq-overrides ^
"^qq\." "{""delivery-limit"": 50, ""dead-letter-exchange"": ""redeliveries.limit.dlx""}" ^
--priority 123 ^
--apply-to "quorum_queues"
PUT /api/policies/%2f/qq-overrides
{"pattern": "^qq\.",
"definition": {"delivery-limit": 50, "dead-letter-exchange": "redeliveries.limit.dlx"},
"priority": 1,
"apply-to": "quorum_queues"}
Navigate to
Admin
>Policies
>Add / update a policy
.Enter a policy name (such as "qq-overrides") next to Name, a pattern (such as "^qq.") next to Pattern, and select what kind of entities (quorum queues in this example) the policy should apply to using the
Apply to
drop down.Enter "delivery-limit" for policy argument and 50 for its value in the first line next to
Policy
, then "dead-letter-exchange" for the second key and "redeliveries.limit.dlx" for its value.Click
Add policy
.
To learn more about dead-lettering, please consult its dedicated guide.
Policy Support
Quorum queues can be configured via RabbitMQ policies. The below table summarises the policy keys they adhere to.
Definition Key | Type |
---|---|
max-length | Number |
max-length-bytes | Number |
overflow | "drop-head" or "reject-publish" |
expires | Number (milliseconds) |
dead-letter-exchange | String |
dead-letter-routing-key | String |
delivery-limit | Number |
Features that are not Supported
Transient (non-Durable) Queues
Classic queues can be non-durable. Quorum queues are always durable per their assumed use cases.
Exclusivity
Exclusive queues are tied to the lifecycle of their declaring connection. Quorum queues by design are replicated and durable, therefore the exclusive property makes no sense in their context. Therefore quorum queues cannot be exclusive.
Quorum queues are not meant to be used as temporary queues.