8 minute read  

Shoot Control Plane Migration

Motivation

Currently moving the control plane of a shoot cluster can only be done manually and requires deep knowledge of how exactly to transfer the resources and state from one seed to another. This can make it slow and prone to errors.

Automatic migration can be very useful in a couple of scenarios:

  • Seed goes down and can’t be repaired (fast enough or at all) and it’s control planes need to be brought to another seed
  • Seed needs to be changed, but this operation requires the recreation of the seed (e.g. turn a single-AZ seed into a multi-AZ seed)
  • Seeds need to be rebalanced
  • New seeds become available in a region closer to/in the region of the workers and the control plane should be moved there to improve latency
  • Gardener ring, which is a self-supporting setup/underlay for a highly available (usually cross-region) Gardener deployment

Goals

  • Provide a mechanism to migrate the control plane of a shoot cluster from one seed to another
  • The mechanism should support migration from a seed which is no longer reachable (Disaster Recovery)
  • The shoot cluster nodes are preserved and continue to run the workload, but will talk to the new control plane after the migration completes
  • Extension controllers implement a mechanism which allows them to store their state or to be restored from an already existing state on a different seed cluster.
  • The already existing shoot reconciliation flow is reused for migration with minimum changes

Terminology

Source Seed is the seed which currently hosts the control plane of a Shoot Cluster

Destination Seed is the seed to which the control plane is being migrated

Resources and controller state which have to be migrated between two seeds:

Note: The following lists are just FYI and are meant to show the current resources which need to be moved to the Destination Seed

Secrets

Gardener has preconfigured lists of needed secrets which are generated when a shoot is created and deployed in the seed. Following is a minimum set of secrets which must be migrated to the Destination Seed. Other secrets can be regenerated from them.

  • ca
  • ca-front-proxy
  • static-token
  • ca-kubelet
  • ca-metrics-server
  • etcd-encryption-secret
  • kube-aggregator
  • kube-apiserver-basic-auth
  • kube-apiserver
  • service-account-key
  • ssh-keypair

Custom Resources and state of extension controllers

Gardenlet deploys custom resources in the Source Seed cluster during shoot reconciliation which are reconciled by extension controllers. The state of these controllers and any additional resources they create is independent of the gardenlet and must also be migrated to the Destination Seed. Following is a list of custom resources, and the state which is generated by them that has to be migrated.

  • BackupBucket: nothing relevant for migration
  • BackupEntry: nothing relevant for migration
  • ControlPlane: nothing relevant for migration
  • DNSProvider/DNSEntry: nothing relevant for migration
  • Extensions: migration of state needs to be handled individually
  • Infrastructure: terraform state
  • Network: nothing relevant for migration
  • OperatingSystemConfig: nothing relevant for migration
  • Worker: Machine-Controller-Manager related objects: machineclasses, machinedeployments, machinesets, machines

This list depends on the currently installed extensions and can change in the future

Proposal

Custom Resource on the garden cluster

The Garden cluster has a new Custom Resource which is stored in the project namespace of the Shoot called ShootState. It contains all the required data described above so that the control plane can be recreated on the Destination Seed.

This data is separated into two sections. The first is generated by the gardenlet and then either used to generate new resources (e.g secrets) or is directly deployed to the Shoot’s control plane on the Destination Seed.

The second is generated by the extension controllers in the seed.

apiVersion: core.gardener.cloud/v1alpha1
kind: ShootState
metadata:
  name: my-shoot
  namespace: garden-core
  ownerReference:
    apiVersion: core.gardener.cloud/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Shoot
    name: my-shoot
    uid: ...
  finalizers:
  - gardener
gardenlet:
  secrets:
  - name: ca
    data:
      ca.crt: ...
      ca.key: ...
  - name: ssh-keypair
    data:
      id_rsa: ...
  - name:
...
extensions:
- kind: Infrastructure
  state: ... (Terraform state)
- kind: ControlPlane
  purpose: normal
  state: ... (Certificates generated by the extension)
- kind: Worker
  state: ... (Machine objects)

The state data is saved as a runtime.RawExtension type, which can be encoded/decoded by the corresponding extension controller.

There can be sensitive data in the ShootState which has to be hidden from the end-users. Hence, it will be recommended to provide an etcd encryption configuration to the Gardener API server in order to encrypt the ShootState resource.

Size limitations

There are limits on the size of the request bodies sent to the kubernetes API server when creating or updating resources: by default ETCD can only accept request bodies which do not exceed 1.5 MiB (this can be configured with the --max-request-bytes flag); the kubernetes API Server has a request body limit of 3 MiB which cannot be set from the outside (with a command line flag); the gRPC configuration used by the API server to talk to ETCD has a limit of 2 MiB per request body which cannot be configured from the outside; and watch requests have a 16 MiB limit on the buffer used to stream resources.

This means that if ShootState is bigger than 1.5 MiB, the ETCD max request bytes will have to be increased. However, there is still an upper limit of 2 MiB imposed by the gRPC configuration.

If ShootState exceeds this size limitation it must make use of configmap/secret references to store the state of extension controllers. This is an implementation detail of Gardener and can be done at a later time if necessary as extensions will not be affected.

Splitting the ShootState into multiple resources could have a positive benefit on performance as the Gardener API Server and Gardener Controller Manager would handle multiple small resources instead of one big resource.

Gardener extensions changes

All extension controllers which require state migration must save their state in a new status.state field and act on an annotation gardener.cloud/operation=restore in the respective Custom Resources which should trigger a restoration operation instead of reconciliation. A restoration operation means that the extension has to restore its state in the Shoot’s namespace on the Destination Seed from the status.state field.

As an example: the Infrastructure resource must save the terraform state.

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
spec:
  type: azure
  region: eu-west-1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  providerConfig:
    apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureConfig
    resourceGroup:
      name: mygroup
    networks:
      vnet: # specify either 'name' or 'cidr'
      # name: my-vnet
        cidr: 10.250.0.0/16
      workers: 10.250.0.0/19
status:
  state: |
      {
          "version": 3,
          "terraform_version": "0.11.14",
          "serial": 2,
          "lineage": "3a1e2faa-e7b6-f5f0-5043-368dd8ea6c10",
          "modules": [
              {
              }
          ]
          ...
      }

Extensions which do not require state migration should set status.state=nil in their Custom Resources and trigger a normal reconciliation operation if the CR contains the core.gardener.cloud/operation=restore annotation.

Similar to the contract for the reconcile operation, the extension controller has to remove the restore annotation after the restoration operation has finished.

An additional annotation gardener.cloud/operation=migrate is added to the Custom Resources. It is used to tell the extension controllers in the Source Seed that they must stop reconciling resources (in case they are requeued due to errors) and should perform cleanup activities in the Shoot’s control plane. These cleanup activities involve removing the finalizers on Custom Resources and deleting them without actually deleting any infrastructure resources.

Note: The same size limitations from the previous section are relevant here as well.

Shoot reconciliation flow changes

The only data which must be stored in the ShootState by the gardenlet is secrets (e.g ca for the API server). Therefore the botanist.DeploySecrets step is changed. It is split into two functions which take a list of secrets that have to be generated.

  • botanist.GenerateSecretState Generates certificate authorities and other secrets which have to be persisted in the ShootState and must not be regenerated on the Destination Seed.
  • botanist.DeploySecrets Takes secret data from the ShootState, generates new ones (e.g. client tls certificates from the saved certificate authorities) and deploys everything in the Shoot’s control plane on the Destination Seed

ShootState synchronization controller

The ShootState synchronization controller will become part of the gardenlet. It syncs the state of extension custom resources from the shoot namespace to the garden cluster and updates the corresponding spec.extension.state field in the ShootState resource. The controller can watch Custom Resources used by the extensions and update the ShootState only when changes occur.

Migration workflow

  1. Starting migration
    • Migration can only be started after a Shoot cluster has been successfully created so that the status.seed field in the Shoot resource has been set
    • The Shoot resource’s field spec.seedName="new-seed" is edited to hold the name of the Destination Seed and reconciliation is automatically triggered
    • The Garden Controller Manager checks if the equality between spec.seedName and status.seed, detects that they are different and triggers migration.
  2. The Garden Controller Manager waits for the Destination Seed to be ready
  3. Shoot’s API server is stopped
  4. Backup the Shoot’s ETCD.
  5. Extension resources in the Source Seed are annotated with gardener.cloud/operation=migrate
  6. Scale Down the Shoot’s control plane in the Source Seed.
  7. The gardenlet in the Destination Seed fetches the state of extension resources from the ShootState resource in the garden cluster.
  8. Normal reconciliation flow is resumed in the Destination Seed. Extension resources are annotated with gardener.cloud/operation=restore to instruct the extension controllers to reconstruct their state.
  9. The Shoot’s namespace in Source Seed is deleted.