Highly Available Shoot Control Plane

The Shoot resource offers a way to request a highly available control plane.

Failure Tolerance Types

A highly available shoot control plane can be set up with a failure tolerance of either zone or node.

Node Failure Tolerance

A failure tolerance of node has the following characteristics:

  • Control plane components will be spread across different nodes within a single availability zone. There will not be more than one replica per node for each control plane component which has more than one replica.
  • The worker pool should have a minimum of 3 nodes.
  • A multi-node etcd cluster (3 members) will be provisioned, offering zero-downtime capabilities, with each member on a different node within a single availability zone.

Zone Failure Tolerance

A failure tolerance of zone has the following characteristics:

  • Control plane components will be spread across different availability zones. There will be at least one replica per zone for each control plane component that has more than one replica.
  • The Gardener scheduler will automatically select a seed that has a minimum of 3 zones to host the shoot control plane.
  • A multi-node etcd cluster (3 members) will be provisioned, offering zero-downtime capabilities, with each member in a different zone.

Shoot Spec

To request a highly available shoot control plane, Gardener provides the following configuration in the shoot spec:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
spec:
  controlPlane:
    highAvailability:
      failureTolerance:
        type: <node | zone>
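
As a concrete illustration of the template above, a shoot requesting zone failure tolerance might look like the following sketch. The name and namespace are hypothetical placeholders, and all other fields a real Shoot needs are omitted.

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: my-ha-shoot          # hypothetical shoot name
  namespace: garden-my-project   # hypothetical project namespace
spec:
  # Other shoot settings are omitted from this sketch.
  controlPlane:
    highAvailability:
      failureTolerance:
        type: zone           # use "node" for node failure tolerance
```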

Allowed Transitions

If you already have a shoot cluster with a non-HA control plane, the following upgrades are possible:

  • Upgrade of non-HA shoot control plane to HA shoot control plane with node failure tolerance.
  • Upgrade of non-HA shoot control plane to HA shoot control plane with zone failure tolerance. However, the seed that currently hosts the shoot control plane must be multi-zonal. If it is not, the upgrade request will be rejected.

NOTE: There will be a short downtime during the upgrade, especially for etcd, which will transition from a single-node etcd cluster to a multi-node etcd cluster.
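
The first transition can be sketched as an edit to an existing shoot: a non-HA shoot has no highAvailability section, and adding one requests the upgrade. This is a minimal sketch with a hypothetical shoot name, not a complete manifest.

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: my-existing-shoot    # hypothetical name of an existing non-HA shoot
spec:
  # Existing shoot settings stay unchanged and are omitted here.
  # The section below is newly added to request the upgrade.
  controlPlane:
    highAvailability:
      failureTolerance:
        type: node
```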

Disallowed Transitions

If you have already set up an HA shoot control plane with node failure tolerance, an upgrade to zone failure tolerance is currently not supported, mainly because existing volumes are bound to the zone they were created in.