Grow-then-Shrink Upgrade
This strategy involves node identity changes and replica transfers to the newly added nodes.
With quorum queues and streams that have large data sets, this means that the cluster will experience substantial network traffic volume and disk I/O spikes that a rolling in-place upgrade would not.
Consider using in-place upgrades or Blue/Green deployment upgrades instead.
To perform a grow-then-shrink upgrade safely, several precautions must be taken.
A Grow-then-Shrink upgrade usually involves the following steps. Consider a three node cluster with nodes A, B, and C:
- Add a new node, node D, to the cluster
- Place a new replica of every quorum queue and every stream on the new node using commands such as rabbitmq-queues grow
- Check that the cluster is in a good state: no alarms are in effect, no queue or stream replica sync operations are in progress, and the system is otherwise under a reasonable load
- Remove node A from the cluster using rabbitmqctl forget_cluster_node
- Add a new node, node E, to the cluster
- Place a new replica of every quorum queue and every stream on the new node using commands such as rabbitmq-queues grow
- Check that the cluster is in a good state
- Remove node B from the cluster using rabbitmqctl forget_cluster_node
- Repeat the same steps for each remaining original node (node C in this example)
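As a rough illustration, one iteration of the steps above (replace one old node with one new node) can be sketched as a dry-run helper that only prints the commands in order. The function name plan_replace_node and the node names are hypothetical. The join sequence (stop_app, join_cluster, start_app), the "all" argument to rabbitmq-queues grow, and the stop before forget_cluster_node are assumptions about a typical setup rather than part of the procedure described here; stream replicas may additionally need rabbitmq-streams add_replica, depending on the RabbitMQ version.

```shell
#!/bin/sh
# Dry-run sketch: prints (does not execute) the commands for one
# grow-then-shrink iteration. All node names are hypothetical.

plan_replace_node() {
  old_node="$1"   # node to retire, e.g. rabbit@node-a
  new_node="$2"   # freshly provisioned node, e.g. rabbit@node-d
  seed_node="$3"  # any remaining cluster member the new node joins through

  # 1. Add the new node to the cluster (assumed join sequence).
  echo "rabbitmqctl --node $new_node stop_app"
  echo "rabbitmqctl --node $new_node join_cluster $seed_node"
  echo "rabbitmqctl --node $new_node start_app"

  # 2. Place a replica of every quorum queue on the new node.
  echo "rabbitmq-queues grow $new_node all"

  # 3. Check cluster state before shrinking.
  echo "rabbitmq-diagnostics check_alarms"
  echo "rabbitmq-diagnostics --node $old_node check_if_node_is_quorum_critical"

  # 4. Retire the old node (assumption: it is stopped before being forgotten).
  echo "rabbitmqctl --node $old_node stop"
  echo "rabbitmqctl forget_cluster_node $old_node"
}
```

For example, `plan_replace_node rabbit@node-a rabbit@node-d rabbit@node-b` prints the plan for replacing node A with node D. Printing instead of executing keeps an operator in the loop between steps, which matters because replica transfers need time to finish before the old node is removed.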
This approach may seem to strike a good balance between the relative simplicity of in-place upgrades and the safety of Blue/Green deployment upgrades. In practice, however, it shares key characteristics with the in-place upgrade option:
- Newly added nodes may affect the existing cluster state
- Replicas will migrate between nodes during the upgrade process
In addition, this approach has its own unique potential risks:
- Node identities change during the upgrade process, which can affect historical monitoring data
- Nodes must transfer their data sets to the newly added members, which can result in a very substantial increase in network traffic and disk I/O
- Premature removal of nodes (see below) can lead to a quorum loss for a subset of quorum queues and streams
In order to safely perform a grow-then-shrink upgrade, several precautions must be taken:
- After a new node is added and a replica extension process is initiated, the process must be given enough time to complete
- Before a node is removed, a health check must be run to ensure that it is not quorum critical for any queues (or streams): that is, that the removal of the node will not leave any quorum queues or streams without an online majority
- Nodes must be removed from the cluster explicitly using rabbitmqctl forget_cluster_node
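The precautions above can be combined into a small gate that refuses to remove a node while it is still quorum critical. This is a minimal sketch, not an official tool: the function name safe_forget_node, the retry count, and the delay are hypothetical, and stopping the node before forgetting it is an assumption. The DIAG and CTL variables exist only so the sketch can be exercised with stand-in commands when no broker is available.

```shell
#!/bin/sh
# Sketch: wait until a node is no longer quorum critical, then remove it.
# Command names are overridable for dry runs without a broker.
DIAG="${DIAG:-rabbitmq-diagnostics}"
CTL="${CTL:-rabbitmqctl}"

safe_forget_node() {
  old_node="$1"        # node to remove, e.g. rabbit@node-a (hypothetical)
  attempts="${2:-30}"  # how many times to run the health check
  delay="${3:-10}"     # seconds to wait between checks

  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The health check exits non-zero while removing the node would leave
    # some quorum queue or stream without an online majority.
    if "$DIAG" --node "$old_node" check_if_node_is_quorum_critical; then
      # Assumption: the node is stopped first, then forgotten from a
      # remaining cluster member.
      "$CTL" --node "$old_node" stop && "$CTL" forget_cluster_node "$old_node"
      return $?
    fi
    i=$((i + 1))
    sleep "$delay"
  done

  echo "node $old_node is still quorum critical; not removing it" >&2
  return 1
}
```

Called as `safe_forget_node rabbit@node-a`, the function checks up to 30 times at 10-second intervals. Giving up with a non-zero status, rather than removing the node anyway, is the point of the design: a premature removal is the failure mode this strategy is most exposed to.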
Streams in particular were not designed for environments where replica (node) identity changes frequently and all replicas can be transferred away and replaced over the duration of a single cluster upgrade.