
Shoot Operations

1 - Controlling the Kubernetes Versions for Specific Worker Pools

Since Gardener v1.36, worker pools can specify a Kubernetes version that differs from the version of the control plane.

In earlier Gardener versions, all worker pools inherited the Kubernetes version of the control plane. Whenever the Kubernetes version of the control plane was modified, all worker pools were updated as well (either by rolling the nodes for a minor version change, or in place for a patch version change).

To gracefully perform Kubernetes upgrades (which trigger a rolling update of the nodes) with workloads that are sensitive to restarts (e.g., those dealing with lots of data), it might be necessary to perform the upgrade gradually. In such cases, the Kubernetes version of the worker pools can be pinned (.spec.provider.workers[].kubernetes.version) while the control plane Kubernetes version (.spec.kubernetes.version) is updated. This leaves the existing nodes untouched while the control plane is upgraded. Afterwards, a new worker pool (with a version equal to the control plane version) can be added. Administrators can then reschedule their workloads to the new worker pool according to their upgrade requirements and processes.

Example Usage in a Shoot

spec:
  kubernetes:
    version: 1.27.4
  provider:
    workers:
    - name: data1
      kubernetes:
        version: 1.26.8
    - name: data2

  • If .kubernetes.version is not specified in a worker pool, the Kubernetes version of the kubelet is inherited from the control plane (.spec.kubernetes.version), i.e., in the above example, the data2 pool will use 1.27.4.
  • If .kubernetes.version is specified in a worker pool, it must meet the following constraints:
    • It must be at most two minor versions lower than the control plane version.
    • If it was not specified before, no downgrade is possible (you cannot set it to 1.26.8 while .spec.kubernetes.version is already 1.27.4). The “two minor version skew” can only be reached by pinning the worker pool to the control plane version and then updating the control plane gradually by two minor versions.
    • If the version is removed from the worker pool, only a difference of one minor version to the control plane is allowed (you cannot upgrade a pool from version 1.25.0 to 1.27.0 in one go).
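The skew constraint above can be pre-checked locally before editing a Shoot. The following Bash sketch is illustrative only: the helper names (`minor_of`, `major_of`, `check_worker_version`) are hypothetical, only major and minor versions are compared, and the downgrade and version-removal rules are not modeled. The Gardener API server remains the authoritative validator.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: pre-check the worker pool version skew rule.

# minor_of "1.27.4" -> 27
minor_of() {
  local v="${1#v}"       # tolerate a leading "v"
  local rest="${v#*.}"   # "27.4"
  echo "${rest%%.*}"     # "27"
}

# major_of "1.27.4" -> 1
major_of() {
  local v="${1#v}"
  echo "${v%%.*}"
}

# check_worker_version <control-plane-version> <worker-pool-version>
# Prints "ok" if the worker pool is on the same minor version as the control plane
# or at most two minor versions below it; otherwise prints an error and returns 1.
check_worker_version() {
  local cp="$1" worker="$2"
  if [ "$(major_of "$cp")" != "$(major_of "$worker")" ]; then
    echo "error: major versions differ"; return 1
  fi
  local skew=$(( $(minor_of "$cp") - $(minor_of "$worker") ))
  if [ "$skew" -lt 0 ]; then
    echo "error: worker pool is newer than the control plane"; return 1
  elif [ "$skew" -gt 2 ]; then
    echo "error: worker pool is $skew minor versions behind (max 2)"; return 1
  fi
  echo "ok"
}
```

For example, `check_worker_version 1.27.4 1.26.8` prints `ok`, while `check_worker_version 1.27.4 1.24.9` reports a skew of three minor versions.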

Automatic updates of Kubernetes versions (see Shoot Maintenance) also apply to worker pool Kubernetes versions.

2 - Shoot Credentials Rotation

Credentials Rotation for Shoot Clusters

Shoots use many different credentials to make sure that the various components can communicate with each other and that the cluster is usable and operable.

This page explains how the various credentials can be rotated so that the cluster can be considered secure.

User-Provided Credentials

Cloud Provider Keys

End-users must provide credentials such that Gardener and Kubernetes controllers can communicate with the respective cloud provider APIs in order to perform infrastructure operations. For example, Gardener uses them to set up and maintain the networks, security groups, subnets, etc., while the cloud-controller-manager uses them to reconcile load balancers and routes, and the CSI controller uses them to reconcile volumes and disks.

Depending on the cloud provider, the required data keys of the Secret differ. Please consult the documentation of the respective provider extension to learn the concrete data keys (e.g., this document for AWS).

It is the responsibility of the end-user to regularly rotate those credentials. The following steps are required to perform the rotation:

  1. Update the data in the Secret with new credentials.
  2. ⚠️ Wait until all Shoots using the Secret are reconciled before you disable the old credentials in your cloud provider account! Otherwise, the Shoots will no longer work as expected. Check out this document to learn how to trigger a reconciliation of your Shoots.
  3. After all Shoots using the Secret have been reconciled, you can go ahead and deactivate the old credentials in your provider account.
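Step 2 can be scripted. The Bash sketch below assumes that your Shoots reference their credentials through `.spec.secretBindingName` (the field may differ depending on your Gardener version) and that you know the name of the SecretBinding pointing to the updated Secret; the function name is hypothetical. It annotates each matching Shoot with `gardener.cloud/operation=reconcile`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: trigger a reconciliation of every Shoot in a project
# namespace that references the given SecretBinding.
# Usage: reconcile_shoots_using_binding <project-namespace> <secretbinding-name>
reconcile_shoots_using_binding() {
  local ns="$1" binding="$2" name ref
  # One line per Shoot: "<shoot-name> <secretBindingName>"
  kubectl -n "$ns" get shoots \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.secretBindingName}{"\n"}{end}' |
  while read -r name ref; do
    if [ "$ref" = "$binding" ]; then
      kubectl -n "$ns" annotate shoot "$name" gardener.cloud/operation=reconcile --overwrite
    fi
  done
}
```

Only after all of these reconciliations have succeeded should the old credentials be deactivated.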

Gardener-Provided Credentials

Gardener generates the following credentials when shoot clusters are created:

  • kubeconfig (if enabled)
  • certificate authorities (and related server and client certificates)
  • observability passwords for Plutono
  • SSH key pair for worker nodes
  • ETCD encryption key
  • ServiceAccount token signing key

🚨 There is no auto-rotation of those credentials, and it is the responsibility of the end-user to regularly rotate them.

While it is possible to rotate them one by one, there is also a convenient method to combine the rotation of all of those credentials. The rotation happens in two phases since some API clients might need to be updated in between (e.g., when CAs are rotated).

Prepare Rotation of All Credentials

In order to start the rotation (first phase), you have to annotate the shoot with the rotate-credentials-start operation:

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-credentials-start

Note: You can check the .status.credentials.rotation field in the Shoot to see when the rotation was last initiated and last completed.

Refer to the detailed descriptions below to learn how each rotation is performed and what your responsibilities are. Note that all actions of the individual rotations apply to this combined rotation as well (e.g., worker nodes are rolled out in the first phase).

Complete Rotation of All Credentials

You can complete the rotation (second phase) by annotating the shoot with the rotate-credentials-complete operation:

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-credentials-complete

Kubeconfig

If the .spec.kubernetes.enableStaticTokenKubeconfig field is set to true (default), Gardener generates a kubeconfig with cluster-admin privileges for the Shoot, containing credentials for communication with the kube-apiserver (see this document for more information).

This Secret is stored with the name <shoot-name>.kubeconfig in the project namespace in the garden cluster and has multiple data keys:

  • kubeconfig: the completed kubeconfig
  • ca.crt: the CA bundle for establishing trust to the API server (same as in the Cluster CA bundle secret)
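A minimal Bash sketch for saving this kubeconfig locally, assuming your current kubectl context points to the garden cluster and you may read Secrets in the project namespace. The function name is hypothetical; the Secret name and the `kubeconfig` data key are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the static-token kubeconfig of a Shoot to a file.
# Usage: fetch_shoot_kubeconfig <project-namespace> <shoot-name> <output-file>
fetch_shoot_kubeconfig() {
  local ns="$1" shoot="$2" out="$3"
  (
    umask 077  # the kubeconfig grants cluster-admin, keep it owner-readable only
    kubectl -n "$ns" get secret "${shoot}.kubeconfig" \
      -o jsonpath='{.data.kubeconfig}' | base64 -d > "$out"
  )
}
```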

Shoots created with Gardener <= 0.28 used to have a kubeconfig based on a client certificate instead of a static token. With the first kubeconfig rotation, such clusters will get a static token as well.

⚠️ This does not invalidate the old client certificate. In order to do this, you should perform a rotation of the CAs (see section below).

It is the responsibility of the end-user to regularly rotate those credentials (or disable this kubeconfig entirely). In order to rotate the token in this kubeconfig, annotate the Shoot with gardener.cloud/operation=rotate-kubeconfig-credentials. This operation is not allowed for Shoots that are already marked for deletion. Please note that only the token (and basic auth password, if enabled) are exchanged. The CA certificate remains the same (see section below for information about the rotation).

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-kubeconfig-credentials

You can check the .status.credentials.rotation.kubeconfig field in the Shoot to see when the rotation was last initiated and last completed.

Certificate Authorities

Gardener generates several certificate authorities (CAs) to ensure secured communication between the various components and actors. Most of those CAs are used for internal communication (e.g., kube-apiserver talks to etcd, vpn-shoot talks to the vpn-seed-server, kubelet talks to kube-apiserver). However, there is also the “cluster CA” which is part of all kubeconfigs and used to sign the server certificate exposed by the kube-apiserver.

Gardener populates a ConfigMap with the name <shoot-name>.ca-cluster in the project namespace in the garden cluster which contains the following data keys:

  • ca.crt: the CA bundle of the cluster

This bundle contains one or multiple CAs which are used for signing serving certificates of the Shoot’s API server. Hence, the certificates contained in this ConfigMap can be used to verify the API server’s identity when communicating with its public endpoint (e.g., as certificate-authority-data in a kubeconfig). This is the same certificate that is also contained in the kubeconfig’s certificate-authority-data field.

Shoots created with Gardener >= v1.45 have a dedicated client CA which verifies the legitimacy of client certificates. For older Shoots, the client CA is equal to the cluster CA. With the first CA rotation, such clusters will get a dedicated client CA as well.

All the certificates are valid for 10 years. Since rotating them requires adaptations by the consumers of the Shoot, there is no automatic rotation, and it is the responsibility of the end-user to regularly rotate the CA certificates.

The rotation happens in three stages (see also GEP-18 for the full details):

  • In stage one, new CAs are created and added to the bundle (together with the old CAs). Client certificates are re-issued immediately.
  • In stage two, end-users update all cluster API clients that communicate with the control plane.
  • In stage three, the old CAs are dropped from the bundle and server certificates are re-issued.

Technically, the Preparing phase indicates stage one. Once it is completed, the Prepared phase indicates readiness for stage two. The Completing phase indicates stage three, and the Completed phase states that the rotation process has finished.

You can check the .status.credentials.rotation.certificateAuthorities field in the Shoot to see when the rotation was last initiated, last completed, and in which phase it currently is.

In order to start the rotation (stage one), you have to annotate the shoot with the rotate-ca-start operation:

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-ca-start

This triggers a Shoot reconciliation that performs stage one. After it is completed, the .status.credentials.rotation.certificateAuthorities.phase is set to Prepared.

Now you must update all API clients outside the cluster (such as the kubeconfigs on developer machines) to use the newly issued CA bundle in the <shoot-name>.ca-cluster ConfigMap. Please also note that client certificates must be re-issued now.
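A minimal Bash sketch for updating one out-of-cluster client during the Prepared phase. It assumes that your current kubectl context points to the garden cluster and that the target kubeconfig file and its cluster entry name (both hypothetical parameters) already exist. It embeds the full bundle so that the client trusts both the old and the new CA until the rotation is completed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: embed the current cluster CA bundle of a Shoot into an
# existing cluster entry of a local kubeconfig file.
# Usage: update_kubeconfig_ca <project-namespace> <shoot-name> <cluster-entry> <kubeconfig-file>
update_kubeconfig_ca() {
  local ns="$1" shoot="$2" cluster_entry="$3" target="$4"
  local bundle
  bundle="$(mktemp)"
  # Read the bundle from the <shoot-name>.ca-cluster ConfigMap in the garden cluster.
  kubectl -n "$ns" get configmap "${shoot}.ca-cluster" \
    -o jsonpath='{.data.ca\.crt}' > "$bundle" || { rm -f "$bundle"; return 1; }
  # During the Prepared phase the bundle should contain both the old and the new CA.
  echo "CAs in bundle: $(grep -c 'BEGIN CERTIFICATE' "$bundle")"
  # Embed the whole bundle into the target kubeconfig (not the garden kubeconfig).
  kubectl config set-cluster "$cluster_entry" \
    --certificate-authority="$bundle" --embed-certs=true --kubeconfig="$target"
  local rc=$?
  rm -f "$bundle"
  return "$rc"
}
```

Clients that authenticate with client certificates additionally need re-issued certificates, which this sketch does not cover.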

After updating all API clients, you can complete the rotation by annotating the shoot with the rotate-ca-complete operation:

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-ca-complete

This triggers another Shoot reconciliation that performs stage three. After it is completed, the .status.credentials.rotation.certificateAuthorities.phase is set to Completed. You can then update your API clients again and drop the old CA from their bundles.

Note that the CA rotation also rotates all internal CAs and signed certificates. Hence, most of the components need to be restarted (including etcd and kube-apiserver).

⚠️ In stage one, all worker nodes of the Shoot will be rolled out to ensure that both the Pods and the kubelets get the updated credentials.

Observability Password(s) For Plutono and Prometheus

For Shoots with .spec.purpose!=testing, Gardener deploys an observability stack with Prometheus for monitoring, Alertmanager for alerting (optional), Vali for logging, and Plutono for visualization. The Plutono instance is exposed via Ingress and accessible for end-users via basic authentication credentials generated and managed by Gardener.

Those credentials are stored in a Secret with the name <shoot-name>.monitoring in the project namespace in the garden cluster, which has multiple data keys:

  • username: the username
  • password: the password
  • auth: the username together with the SHA-1 representation of the password

It is the responsibility of the end-user to regularly rotate those credentials. In order to rotate the password, annotate the Shoot with gardener.cloud/operation=rotate-observability-credentials. This operation is not allowed for Shoots that are already marked for deletion.

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-observability-credentials

You can check the .status.credentials.rotation.observability field in the Shoot to see when the rotation was last initiated and last completed.

SSH Key Pair for Worker Nodes

Gardener generates an SSH key pair whose public key is propagated to all worker nodes of the Shoot. The private key can be used to establish an SSH connection to the workers for troubleshooting purposes. It is recommended to use gardenctl-v2 and its gardenctl ssh command since it is required to first open up the security groups and create a bastion VM (no direct SSH access to the worker nodes is possible).

The private key is stored in a Secret with the name <shoot-name>.ssh-keypair in the project namespace in the garden cluster, which has multiple data keys:

  • id_rsa: the private key
  • id_rsa.pub: the public key for SSH

In order to rotate the keys, annotate the Shoot with gardener.cloud/operation=rotate-ssh-keypair. This will propagate a new key to all worker nodes while keeping the old key active and valid as well (it will only be invalidated/removed with the next rotation).

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-ssh-keypair

You can check the .status.credentials.rotation.sshKeypair field in the Shoot to see when the rotation was last initiated or last completed.

The old key is stored in a Secret with the name <shoot-name>.ssh-keypair.old in the project namespace in the garden cluster, which has the same data keys as the regular Secret.
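gardenctl ssh remains the recommended way to connect. If you nevertheless need the raw private key (for example, to verify which key is active), the following Bash sketch reads it from either Secret. The function name and the `current|old` switch are hypothetical; the Secret names and the `id_rsa` data key are the ones described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: store the current (or the previous) SSH private key of a Shoot.
# Usage: fetch_ssh_private_key <project-namespace> <shoot-name> <output-file> [current|old]
fetch_ssh_private_key() {
  local ns="$1" shoot="$2" out="$3" which="${4:-current}"
  local secret="${shoot}.ssh-keypair"
  if [ "$which" = "old" ]; then
    secret="${secret}.old"   # the previous key pair, kept until the next rotation
  fi
  (
    umask 077  # private keys must not be readable by others
    kubectl -n "$ns" get secret "$secret" \
      -o jsonpath='{.data.id_rsa}' | base64 -d > "$out"
  )
}
```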

ETCD Encryption Key

This key is used to encrypt the data of Secret resources inside etcd (see upstream Kubernetes documentation).

The encryption key has no expiration date. There is no automatic rotation, and it is the responsibility of the end-user to regularly rotate the encryption key.

The rotation happens in three stages:

  • In stage one, a new encryption key is created and added to the bundle (together with the old encryption key).
  • In stage two, all Secrets in the cluster and resources configured in the spec.kubernetes.kubeAPIServer.encryptionConfig of the Shoot (see ETCD Encryption Config) are rewritten by the kube-apiserver so that they become encrypted with the new encryption key.
  • In stage three, the old encryption key is dropped from the bundle.

Technically, the Preparing phase indicates stages one and two. Once it is completed, the Prepared phase indicates readiness for stage three. The Completing phase indicates stage three, and the Completed phase states that the rotation process has finished.

You can check the .status.credentials.rotation.etcdEncryptionKey field in the Shoot to see when the rotation was last initiated, last completed, and in which phase it currently is.

In order to start the rotation (stage one), you have to annotate the shoot with the rotate-etcd-encryption-key-start operation:

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-etcd-encryption-key-start

This triggers a Shoot reconciliation that performs stages one and two. After it is completed, the .status.credentials.rotation.etcdEncryptionKey.phase is set to Prepared. Now you can complete the rotation by annotating the shoot with the rotate-etcd-encryption-key-complete operation:

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-etcd-encryption-key-complete

This triggers another Shoot reconciliation that performs stage three. After it is completed, the .status.credentials.rotation.etcdEncryptionKey.phase is set to Completed.
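Because the etcd encryption key rotation needs no client-side action between the two phases, both phases can be chained. The following Bash sketch polls .status.credentials.rotation.etcdEncryptionKey.phase between the two annotations. The helper names, the polling interval (30s), and the number of attempts (60) are arbitrary choices, not Gardener defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until .status.credentials.rotation.<field>.phase equals <phase>.
# Usage: wait_for_rotation_phase <ns> <shoot> <field> <phase> [tries] [interval-seconds]
wait_for_rotation_phase() {
  local ns="$1" shoot="$2" field="$3" want="$4" tries="${5:-60}" interval="${6:-30}"
  local phase="" i
  for (( i = 0; i < tries; i++ )); do
    phase="$(kubectl -n "$ns" get shoot "$shoot" \
      -o jsonpath="{.status.credentials.rotation.${field}.phase}")"
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out waiting for ${field} phase ${want} (last: ${phase:-unknown})" >&2
  return 1
}

# Hypothetical helper: run both phases of the etcd encryption key rotation.
rotate_etcd_encryption_key() {
  local ns="$1" shoot="$2"
  kubectl -n "$ns" annotate shoot "$shoot" \
    gardener.cloud/operation=rotate-etcd-encryption-key-start
  wait_for_rotation_phase "$ns" "$shoot" etcdEncryptionKey Prepared || return 1
  kubectl -n "$ns" annotate shoot "$shoot" \
    gardener.cloud/operation=rotate-etcd-encryption-key-complete
  wait_for_rotation_phase "$ns" "$shoot" etcdEncryptionKey Completed
}
```

The same waiting pattern can be used for the CA and ServiceAccount key rotations, but there the client updates must happen between the two annotations.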

ServiceAccount Token Signing Key

Gardener generates a key which is used to sign the tokens for ServiceAccounts. Those tokens are typically used by workload Pods running inside the cluster in order to authenticate themselves with the kube-apiserver. This also includes system components running in the kube-system namespace.

The token signing key has no expiration date. Since it might require adaptation for the consumers of the Shoot, there is no automatic rotation, and it is the responsibility of the end-user to regularly rotate the signing key.

The rotation happens in three stages, similar to how the CA certificates are rotated:

  • In stage one, a new signing key is created and added to the bundle (together with the old signing key).
  • In stage two, end-users update all out-of-cluster API clients that communicate with the control plane via ServiceAccount tokens.
  • In stage three, the old signing key is dropped from the bundle.

Technically, the Preparing phase indicates stage one. Once it is completed, the Prepared phase indicates readiness for stage two. The Completing phase indicates stage three, and the Completed phase states that the rotation process has finished.

You can check the .status.credentials.rotation.serviceAccountKey field in the Shoot to see when the rotation was last initiated, last completed, and in which phase it currently is.

In order to start the rotation (stage one), you have to annotate the shoot with the rotate-serviceaccount-key-start operation:

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-serviceaccount-key-start

This triggers a Shoot reconciliation that performs stage one. After it is completed, the .status.credentials.rotation.serviceAccountKey.phase is set to Prepared.

Now you must update all API clients outside the cluster that use a ServiceAccount token (such as the kubeconfigs on developer machines) to use a token issued by the new signing key. Gardener already generates new Secrets for those ServiceAccounts in the cluster whose static token was automatically created by Kubernetes (typically before v1.22). However, if you need to create such a token manually, you can check out this document for instructions.
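For out-of-cluster clients that use a kubeconfig with a ServiceAccount token, one option is to request a fresh, time-bound token with `kubectl create token` (kubectl v1.24 or later) once the phase is Prepared and write it into the client's kubeconfig. This Bash sketch assumes your current context points to the Shoot itself (not the garden cluster) and that newly issued tokens are signed with the new key at that point, as described above. The function name, the user entry, the target kubeconfig path, and the 24h duration are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: issue a new token for a ServiceAccount in the Shoot and
# store it in a user entry of an out-of-cluster client's kubeconfig.
# Usage: refresh_sa_token <sa-namespace> <sa-name> <user-entry> <kubeconfig-file>
refresh_sa_token() {
  local sa_ns="$1" sa="$2" user_entry="$3" target="$4"
  local token
  # TokenRequest via kubectl; the API server may cap the requested duration.
  token="$(kubectl -n "$sa_ns" create token "$sa" --duration=24h)" || return 1
  kubectl config set-credentials "$user_entry" --token="$token" --kubeconfig="$target"
}
```

Since such tokens expire, clients relying on them need to refresh them regularly, independently of the rotation.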

After updating all API clients, you can complete the rotation by annotating the shoot with the rotate-serviceaccount-key-complete operation:

kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-serviceaccount-key-complete

This triggers another Shoot reconciliation that performs stage three. After it is completed, the .status.credentials.rotation.serviceAccountKey.phase is set to Completed.

⚠️ In stage one, all worker nodes of the Shoot will be rolled out to ensure that the Pods use a new token.

OpenVPN TLS Auth Keys

This key is used to ensure encrypted communication for the VPN connection between the control plane in the seed cluster and the shoot cluster. It is currently not rotated automatically and there is no way to trigger it manually.

3 - Shoot Kubernetes and Operating System Versioning in Gardener

Motivation

On the one hand, Gardener is responsible for managing the Kubernetes and the Operating System (OS) versions of its Shoot clusters. On the other hand, Gardener needs to be configured and updated based on the availability and support of the Kubernetes and Operating System versions it provides. For instance, the Kubernetes community releases minor versions roughly every three months and usually maintains three minor versions (the current and the last two) with bug fixes and security updates. Patch releases are done more frequently.

When using the term Machine image in the following, we refer to the OS version that comes with the machine image of the node/worker pool of a Gardener Shoot cluster. As such, we are not referring to the CloudProvider-specific machine image, like the AMI for AWS. For more information on how Gardener maps machine image versions to CloudProvider-specific machine images, take a look at the individual Gardener provider extensions, such as the provider for AWS.

Gardener should be configured accordingly to reflect the “logical state” of a version. It should be possible to define the Kubernetes or Machine image versions that still receive bug fixes and security patches and, vice versa, to define the versions that are out of maintenance and potentially vulnerable. Moreover, this allows Gardener to “understand” the current state of a version and act upon it (more information in the following sections).

Overview

As a Gardener operator:

  • I can classify a version based on its logical state (preview, supported, deprecated, and expired; see Version Classification).
  • I can define which Machine image and Kubernetes versions are eligible for the auto update of clusters during the maintenance time.
  • I can define a moment in time when Shoot clusters are forcefully migrated off a certain version (through an expirationDate).
  • I can define an update path for machine images for auto and force updates (see Update path for machine image versions).
  • I can disallow the creation of clusters having a certain version (think of severe security issues).

As an end-user/Shoot owner of Gardener:

  • I can get information about which Kubernetes and Machine image versions exist and their classification.
  • I can determine the time when my Shoot cluster's Machine image and Kubernetes versions will be forcefully updated to the next patch or minor version (in case the cluster is running a deprecated version with an expiration date).
  • I can get this information via API from the CloudProfile.

Version Classifications

Administrators can classify versions into four distinct “logical states”: preview, supported, deprecated, and expired. The version classification serves as a “point-of-reference” for end-users and also has implications during shoot creation and the maintenance time.

If a version is unclassified, Gardener cannot make those decisions based on the “logical state”. Nevertheless, Gardener can operate without version classifications, and classifications can be added to the Kubernetes and machine image versions in the CloudProfile at any time.

As a best practice, versions usually start with the classification preview, then are promoted to supported, eventually deprecated and finally expired. This information is programmatically available in the CloudProfiles of the Garden cluster.

  • preview: A preview version is a new version that has not yet undergone thorough testing, possibly a new release, and needs time to be validated. Due to its young age, there is a higher probability of undiscovered issues, so it is not yet recommended for production usage. A Shoot does not update (neither by auto-update nor by force-update) to a preview version during the maintenance time. Also, preview versions are not considered when defaulting to the highest available version if the patch version is deliberately omitted during Shoot creation. Typically, after a fresh release of a new Kubernetes (e.g., v1.25.0) or Machine image version (e.g., suse-chost 15.4.20220818), the operator tags it as preview until they have gained sufficient experience and regard this version as reliable. After the operator has gained sufficient trust, the version can be manually promoted to supported.

  • supported: A supported version is the recommended version for new and existing Shoot clusters. This is the version that new Shoot clusters should use and existing clusters should update to. Typically for Kubernetes versions, the community maintains the latest patch versions of the current minor version (if it is no longer in preview) and of the previous minor versions. An operator could define these versions as being supported (e.g., v1.27.6, v1.26.10, and v1.25.12).

  • deprecated: A deprecated version is a version that approaches the end of its lifecycle and can contain issues which are probably resolved in a supported version. New Shoots should not use this version anymore. Existing Shoots will be updated to a newer version if auto-update is enabled (.spec.maintenance.autoUpdate.kubernetesVersion for Kubernetes version auto-update, or .spec.maintenance.autoUpdate.machineImageVersion for machine image version auto-update). Using automatic upgrades, however, does not guarantee that a Shoot runs a non-deprecated version, as the latest version (overall or of the minor version) can be deprecated as well. Deprecated versions should have an expiration date set for eventual expiration.

  • expired: An expired version has an expiration date (based on the Golang time package) in the past. New clusters with that version cannot be created, and existing clusters are forcefully migrated to a higher version during the maintenance time.

Below is an example of how the relevant section of the CloudProfile might look:

apiVersion: core.gardener.cloud/v1beta1
kind: CloudProfile
metadata:
  name: alicloud
spec:
  kubernetes:
    versions:
      - classification: preview
        version: 1.27.0
      - classification: preview
        version: 1.26.3
      - classification: supported
        version: 1.26.2
      - classification: preview
        version: 1.25.5
      - classification: supported
        version: 1.25.4
      - classification: supported
        version: 1.24.6
      - classification: deprecated
        expirationDate: "2022-11-30T23:59:59Z"
        version: 1.24.5
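End-users can read this information from the CloudProfile via the API. A minimal Bash sketch using kubectl's JSONPath output, assuming your current context points to the garden cluster; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read Kubernetes version information from a CloudProfile.
# Prints one tab-separated line per version: <version> <classification> <expirationDate>
list_kubernetes_versions() {
  kubectl get cloudprofile "$1" \
    -o jsonpath='{range .spec.kubernetes.versions[*]}{.version}{"\t"}{.classification}{"\t"}{.expirationDate}{"\n"}{end}'
}

# Prints only the versions classified as "supported".
supported_kubernetes_versions() {
  list_kubernetes_versions "$1" | awk -F '\t' '$2 == "supported" { print $1 }'
}
```

For the example above, `supported_kubernetes_versions alicloud` would list 1.26.2, 1.25.4, and 1.24.6.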

Automatic Version Upgrades

There are two ways in which the Kubernetes version of the control plane, as well as the Kubernetes and machine image versions of a worker pool, can be upgraded: auto update and forceful update. See Automatic Version Updates for how to enable auto updates for Kubernetes or machine image versions on the Shoot cluster.

If a Shoot is running a version after its expiration date has passed, it will be forcefully updated during its maintenance time. This happens even if the owner has opted out of automatic cluster updates!

An auto update is triggered when:

  • The Shoot has auto-update enabled and the version is not the latest eligible version for the auto-update. Please note that this latest version that qualifies for an auto-update is not necessarily the overall latest version in the CloudProfile:
    • For Kubernetes version, the latest eligible version for auto-updates is the latest patch version of the current minor.
    • For machine image version, the latest eligible version for auto-updates is controlled by the updateStrategy field of the machine image in the CloudProfile.
  • The Shoot has auto-update disabled and the version is either expired or does not exist.

An auto update can fail if the Shoot is already on the latest eligible version for the auto-update. A failed auto update triggers a force update. The force and auto update paths for Kubernetes and machine image versions differ slightly and are described in more detail below.

Update rules for both Kubernetes and machine image versions

  • Both auto and force update first try to update to the latest patch version of the same minor.
  • An auto update prefers supported versions over deprecated versions. If there is a lower supported version and a higher deprecated version, auto update will pick the supported version. If all qualifying versions are deprecated, update to the latest deprecated version.
  • An auto update never updates to an expired version.
  • A force update prefers to update to non-expired versions. If all qualifying versions are expired, it updates to the latest expired version. Note that this can lead to multiple consecutive version upgrades: in this case, the version is upgraded again in the next maintenance time.
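The auto update rules for a patch update within the same minor version can be illustrated with a small Bash/awk sketch. It is not Gardener's implementation: it reads hypothetical `version classification` lines on stdin, treats the `expired` label as meaning that the expiration date has passed, ignores preview, expired, and unclassified versions, and prints nothing if no eligible newer patch exists.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the Kubernetes auto update target selection.
# stdin: lines "<version> <classification>", e.g. "1.26.2 supported"
# Usage: auto_update_target <current-version>
auto_update_target() {
  local current="$1"
  local prefix="${current%.*}."     # "1.26." for "1.26.1"
  local cur_patch="${current##*.}"  # "1" for "1.26.1"
  awk -v prefix="$prefix" -v cur="$cur_patch" '
    # Only consider newer patch versions of the same minor version.
    index($1, prefix) == 1 {
      patch = substr($1, length(prefix) + 1) + 0
      if (patch <= cur + 0) next
      # Preview and expired versions are never auto update targets.
      if ($2 == "supported" && patch > best_s + 0) best_s = patch
      if ($2 == "deprecated" && patch > best_d + 0) best_d = patch
    }
    END {
      # Prefer the highest supported version, fall back to the highest deprecated one.
      if (best_s != "") print prefix best_s
      else if (best_d != "") print prefix best_d
    }'
}
```

With `1.24.9 deprecated`, `1.24.7 supported`, and `1.24.6 deprecated`, a Shoot on 1.24.6 would be updated to 1.24.7: the lower supported version wins over the higher deprecated one.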

Update path for machine image versions

Administrators can define three different update strategies (field updateStrategy) for machine images in the CloudProfile: patch, minor, major (default). This is to accommodate the different version schemes of Operating Systems (e.g. Gardenlinux only updates major and minor versions with occasional patches).

  • patch: update to the latest patch version of the current minor version. When using an expired version: force update to the latest patch of the current minor. If already on the latest patch version, then force update to the next higher (not necessarily +1) minor version.
  • minor: update to the latest minor and patch version. When using an expired version: force update to the latest minor and patch of the current major. If already on the latest minor and patch of the current major, then update to the next higher (not necessarily +1) major version.
  • major: always update to the overall latest version. This is the legacy behavior for automatic machine image version upgrades. Force updates are not possible and will fail if the latest version in the CloudProfile for that image is expired (EOL scenario).

Example configuration in the CloudProfile:

machineImages:
  - name: gardenlinux
    updateStrategy: minor
    versions:
      - version: 1096.1.0
      - version: 934.8.0
      - version: 934.7.0
  - name: suse-chost
    updateStrategy: patch
    versions:
      - version: 15.3.20220818
      - version: 15.3.20221118

Please note that force updates for machine images can skip minor versions (strategy: patch) or major versions (strategy: minor) if the next minor/major version has no qualifying versions (only preview versions).
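The three update strategies can be sketched as a filter over the available versions. The following Bash/awk sketch covers only the auto update target; force updates of expired versions are not modeled. It assumes numeric `major.minor.patch` versions, interprets `minor` as staying within the current major version, skips preview and expired entries, and treats unclassified entries as eligible; the function name and input format are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the machine image auto update target per updateStrategy.
# stdin: lines "<version> [classification]", e.g. "934.8.0 supported"
# Usage: machine_image_auto_update_target <patch|minor|major> <current-version>
machine_image_auto_update_target() {
  local strategy="$1" current="$2"
  awk -v strategy="$strategy" -v current="$current" '
    # Returns 1 if version a is greater than version b (numeric, component-wise).
    function greater(a, b,    x, y, i) {
      split(a, x, "."); split(b, y, ".")
      for (i = 1; i <= 3; i++) {
        if (x[i] + 0 > y[i] + 0) return 1
        if (x[i] + 0 < y[i] + 0) return 0
      }
      return 0
    }
    BEGIN { split(current, c, "."); best = current }
    # Preview and expired versions are not auto update targets.
    $2 == "preview" || $2 == "expired" { next }
    {
      split($1, v, ".")
      # patch: stay within the current minor; minor: stay within the current major.
      if (strategy == "patch" && (v[1] + 0 != c[1] + 0 || v[2] + 0 != c[2] + 0)) next
      if (strategy == "minor" && v[1] + 0 != c[1] + 0) next
      if (greater($1, best)) best = $1
    }
    END { if (best != current) print best }'
}
```

With the gardenlinux versions from the example, a worker on 934.7.0 would move to 934.8.0 under `minor`, but to 1096.1.0 under `major`.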

Update path for Kubernetes versions

For Kubernetes versions, the auto update picks the latest non-preview patch version of the current minor version.

If the cluster is already on the latest patch version and the latest patch version is also expired, it will continue with the latest patch version of the next consecutive minor (minor +1) Kubernetes version, so it will result in an update of a minor Kubernetes version!

Kubernetes “minor version jumps”, i.e., skipping the consecutive minor version and directly updating to any later version, are not allowed. For instance, the version 1.24.x can only update to a version 1.25.x, not to 1.26.x or any other version. This is because Kubernetes does not guarantee upgradability in this case, leading to possibly broken Shoot clusters. The administrator has to set up the CloudProfile in such a way that consecutive Kubernetes minor versions are available. Otherwise, Shoot clusters will fail to upgrade during the maintenance time.

Consider the CloudProfile below with a Shoot using the Kubernetes version 1.24.12. Even though the version is expired, due to missing 1.25.x versions, the Gardener Controller Manager cannot upgrade the Shoot’s Kubernetes version.

spec:
  kubernetes:
    versions:
    - version: 1.26.10
    - version: 1.26.9
    - version: 1.24.12
      expirationDate: "<expiration date in the past>"

The CloudProfile must specify 1.25.x versions of the consecutive minor version. With the CloudProfile configured as follows, the Shoot’s Kubernetes version will be upgraded to version 1.25.10 in the next maintenance time.

spec:
  kubernetes:
    versions:
    - version: 1.26.9
    - version: 1.25.10
    - version: 1.25.9
    - version: 1.24.12
      expirationDate: "<expiration date in the past>"
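The force update path for an expired Kubernetes version can be sketched as follows. This Bash/awk sketch is an illustration, not Gardener's implementation: it assumes the current version is already expired, reads hypothetical `version classification` lines where `expired` means the expiration date has passed, and fails when neither a higher patch nor a version of the consecutive minor exists.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the Kubernetes force update target for an expired version.
# stdin: lines "<version> <classification>"
# Usage: k8s_force_update_target <current-version>
k8s_force_update_target() {
  awk -v current="$1" '
    BEGIN { split(current, c, "."); next_minor = c[2] + 1 }
    $2 == "preview" { next }  # never a target, not even for force updates
    {
      split($1, v, ".")
      if (v[1] + 0 != c[1] + 0) next
      p = v[3] + 0
      expired = ($2 == "expired")
      if (v[2] + 0 == c[2] + 0 && p > c[3] + 0) {
        # Higher patch of the current minor version.
        if (!expired && p > same_ok + 0) same_ok = p
        if (expired && p > same_exp + 0) same_exp = p
      } else if (v[2] + 0 == next_minor) {
        # Consecutive minor version only (minor + 1), no minor version jumps.
        if (!expired && (next_ok == "" || p > next_ok + 0)) next_ok = p
        if (expired && (next_exp == "" || p > next_exp + 0)) next_exp = p
      }
    }
    END {
      # Prefer the current minor, then minor + 1; prefer non-expired versions.
      if (same_ok != "")       print c[1] "." c[2] "." same_ok
      else if (same_exp != "") print c[1] "." c[2] "." same_exp
      else if (next_ok != "")  print c[1] "." next_minor "." next_ok
      else if (next_exp != "") print c[1] "." next_minor "." next_exp
      else { print "no consecutive minor version available" > "/dev/stderr"; exit 1 }
    }'
}
```

Fed with the first CloudProfile above, the function fails for 1.24.12 because no 1.25.x version exists; with the second one, it returns 1.25.10.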

Version Requirements (Kubernetes and Machine Image)

The Gardener API server enforces the following requirements for versions:

  • A version that is in use by a Shoot cannot be deleted from the CloudProfile.
  • Creating a new version with expiration date in the past is not allowed.
  • There can be only one supported version per minor version.
  • The latest Kubernetes version cannot have an expiration date.
    • NOTE: The latest version for a machine image can have an expiration date. [*]

[*] Useful for cases in which support for a given machine image needs to be deprecated and removed (for example, the machine image reaches end of life).
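Two of these requirements can be pre-checked before submitting a CloudProfile change. The Bash/awk sketch below is illustrative only and does not replace the Gardener API server's validation: it reads hypothetical tab-separated `version`, `classification`, `expirationDate` lines, checks "only one supported version per minor version" and "the latest Kubernetes version has no expiration date", and does not model the in-use or past-expiration rules.

```shell
#!/usr/bin/env bash
# Illustrative pre-check for two of the CloudProfile version requirements.
# stdin: tab-separated lines "<version>\t<classification>\t<expirationDate>"
check_kubernetes_versions() {
  awk -F '\t' '
    # Returns 1 if version a is greater than version b (numeric, component-wise).
    function greater(a, b,    x, y, i) {
      split(a, x, "."); split(b, y, ".")
      for (i = 1; i <= 3; i++)
        if (x[i] + 0 != y[i] + 0) return (x[i] + 0 > y[i] + 0)
      return 0
    }
    BEGIN { bad = 0 }
    {
      split($1, v, ".")
      minor = v[1] "." v[2]
      # Requirement: only one supported version per minor version.
      if ($2 == "supported" && ++supported[minor] > 1) {
        print "more than one supported version for " minor; bad = 1
      }
      # Track the latest version and its expiration date.
      if (latest == "" || greater($1, latest)) { latest = $1; latest_exp = $3 }
    }
    END {
      # Requirement: the latest Kubernetes version must not have an expiration date.
      if (latest_exp != "") { print "latest version " latest " must not have an expiration date"; bad = 1 }
      exit bad
    }'
}
```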

You might want to read about the Shoot Updates and Upgrades procedures to learn about the effects of such operations.

4 - Shoot Updates and Upgrades

Shoot Updates and Upgrades

This document describes what happens during shoot updates (changes incorporated in a newly deployed Gardener version) and during shoot upgrades (changes to versions controllable by end-users).

Updates

Updates to all aspects of the shoot cluster happen when the gardenlet reconciles the Shoot resource.

When are Reconciliations Triggered

Generally, when you change the specification of your Shoot, the reconciliation starts immediately, potentially updating your cluster. Note that you can also confine the reconciliation triggered by your specification updates to the cluster’s maintenance time window. For more information, see Confine Specification Changes/Updates Roll Out.

You can also annotate your shoot with special operation annotations (for more information, see Trigger Shoot Operations), which will cause a reconciliation to start as a result of your action.

There is also an automatic reconciliation by Gardener. The period, i.e., how often it is performed, depends on the configuration of the Gardener administrators/operators. In some Gardener installations the operators might enable “reconciliation in maintenance time window only” (for more information, see Cluster Reconciliation), which will result in at least one reconciliation during the time configured in the Shoot’s .spec.maintenance.timeWindow field.

Which Updates are Applied

As end-users can only control the Shoot resource’s specification, but not the Gardener version in use, they have no influence on which updates are rolled out (other than through the settings configurable in the Shoot). A Gardener operator can deploy a new Gardener version at any point in time. Any subsequent reconciliation of Shoots will update them by rolling out the changes incorporated in this new Gardener version.

Some examples for such shoot updates are:

  • Add a new/remove an old component to/from the shoot’s control plane running in the seed, or to/from the shoot’s system components running on the worker nodes.
  • Change the configuration of an existing control plane/system component.
  • Restart existing control plane/system components (this might result in a short unavailability of the Kubernetes API server, e.g., when etcd or a kube-apiserver itself is being restarted).

Behavioural Changes

Some of these updates (e.g., configuration changes) could, in theory, change the behaviour of controllers. If such a change would be backwards-incompatible, we usually follow one of these approaches (depending on the concrete change):

  • Only apply the change for new clusters.
  • Expose a new field in the Shoot resource that lets users control this changed behaviour to enable it at a convenient point in time.
  • Put the change behind an alpha feature gate (disabled by default) in the gardenlet (only controllable by Gardener operators), which will be promoted to beta (enabled by default) in subsequent releases. In this case, end-users have no influence on when the behaviour changes; Gardener operators should inform their end-users and provide clear timelines for when they will enable the feature gate.

Upgrades

We consider a shoot upgrade to be a change of any of the following:

  • Kubernetes version (.spec.kubernetes.version)
  • Kubernetes version of the worker pool if specified (.spec.provider.workers[].kubernetes.version)
  • Machine image version of at least one worker pool (.spec.provider.workers[].machine.image.version)

Generally, an upgrade is also performed through a reconciliation of the Shoot resource, i.e., the same concepts as for shoot updates apply. If an end-user triggers an upgrade (e.g., by changing the Kubernetes version) after a new Gardener version was deployed but before the shoot was reconciled again, then this upgrade might incorporate the changes delivered with this new Gardener version.

In-Place vs. Rolling Updates

If the Kubernetes patch version is changed, then the upgrade happens in-place. This means that the shoot worker nodes remain untouched and only the kubelet process restarts with the new Kubernetes version binary. The same applies to configuration changes of the kubelet.

If the Kubernetes minor version is changed, then the upgrade is done in a “rolling update” fashion, similar to how pods in Kubernetes are updated (when backed by a Deployment). The worker nodes will be terminated one after another and replaced by new machines. The existing workload is gracefully drained and evicted from the old worker nodes to new worker nodes, respecting the configured PodDisruptionBudgets (see Specifying a Disruption Budget for your Application).
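The distinction between in-place and rolling updates boils down to comparing the minor versions. The following hypothetical helper (not Gardener code) classifies a kubelet version change according to the two paragraphs above.

```python
def update_strategy(old: str, new: str) -> str:
    """Classify a kubelet version change: 'rolling', 'in-place', or 'none'."""
    old_major, old_minor, _ = (int(p) for p in old.split("."))
    new_major, new_minor, _ = (int(p) for p in new.split("."))
    if (old_major, old_minor) != (new_major, new_minor):
        return "rolling"  # minor change: nodes are drained and replaced one after another
    if old != new:
        return "in-place"  # patch change: only the kubelet restarts with the new binary
    return "none"


print(update_strategy("1.27.4", "1.27.5"))  # in-place
print(update_strategy("1.26.8", "1.27.4"))  # rolling
```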

Customize Rolling Update Behaviour of Shoot Worker Nodes

The .spec.provider.workers[] list exposes two fields that you might configure based on your workload’s needs: maxSurge and maxUnavailable. The same concepts as in Kubernetes apply. Additionally, you might customize how the machine-controller-manager (abbrev.: MCM; the component orchestrating this rolling update) behaves. You can configure the following fields in .spec.provider.workers[].machineControllerManager:

  • machineDrainTimeout: Timeout (as a duration) for draining a machine before its deletion; once it is exceeded, MCM forcefully deletes the machine (default: 2h).
  • machineHealthTimeout: Timeout (as a duration) for a machine to re-join (in case of temporary health issues) before it is declared as failed (default: 10m).
  • machineCreationTimeout: Timeout (as a duration) for a machine to join (during creation) before it is declared as failed (default: 10m).
  • maxEvictRetries: Maximum number of eviction attempts for a pod before it is forcibly deleted while draining a machine (default: 10).
  • nodeConditions: List of case-sensitive node conditions that change a machine to the Failed state after the machineHealthTimeout duration. The machine may then be replaced with a new one if it is backed by a machine-set object (defaults: KernelDeadlock, ReadonlyFilesystem, DiskPressure).
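To see which MCM settings a worker pool ends up with, the following sketch merges a worker's machineControllerManager block over the defaults listed above. It is a hypothetical helper that models the worker spec as a plain dict; it is not Gardener's or MCM's defaulting code, and the function name `effective_mcm_settings` is illustrative.

```python
# Defaults as documented for .spec.provider.workers[].machineControllerManager.
MCM_DEFAULTS = {
    "machineDrainTimeout": "2h",
    "machineHealthTimeout": "10m",
    "machineCreationTimeout": "10m",
    "maxEvictRetries": 10,
    "nodeConditions": ["KernelDeadlock", "ReadonlyFilesystem", "DiskPressure"],
}


def effective_mcm_settings(worker: dict) -> dict:
    """Merge a worker pool's machineControllerManager block over the defaults."""
    overrides = worker.get("machineControllerManager") or {}
    unknown = set(overrides) - set(MCM_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    settings = {**MCM_DEFAULTS, **overrides}
    # Copy the list so that callers cannot mutate the shared default.
    settings["nodeConditions"] = list(settings["nodeConditions"])
    return settings


worker = {
    "name": "data1",
    "maxSurge": 1,        # at most one extra machine during the roll
    "maxUnavailable": 0,  # never go below the desired number of machines
    "machineControllerManager": {"machineDrainTimeout": "30m"},
}
print(effective_mcm_settings(worker))
```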

Rolling Update Triggers

Apart from the above-mentioned triggers, a rolling update of the shoot worker nodes is also triggered by some changes to your worker pool specification (.spec.provider.workers[]), even if you don’t change the Kubernetes or machine image version. The complete list of fields that trigger a rolling update:

  • .spec.kubernetes.version (except for patch version changes)
  • .spec.provider.workers[].machine.image.name
  • .spec.provider.workers[].machine.image.version
  • .spec.provider.workers[].machine.type
  • .spec.provider.workers[].volume.type
  • .spec.provider.workers[].volume.size
  • .spec.provider.workers[].providerConfig (unless the feature gate NewWorkerPoolHash is enabled)
  • .spec.provider.workers[].cri.name
  • .spec.provider.workers[].kubernetes.version (except for patch version changes)
  • .spec.systemComponents.nodeLocalDNS.enabled
  • .status.credentials.rotation.certificateAuthorities.lastInitiationTime (changed by Gardener when a shoot CA rotation is initiated)
  • .status.credentials.rotation.serviceAccountKey.lastInitiationTime (changed by Gardener when a shoot service account signing key rotation is initiated)

If the feature gate NewWorkerPoolHash is enabled, the following fields also trigger a rolling update:

  • .spec.kubernetes.kubelet.kubeReserved (unless a worker pool-specific value is set)
  • .spec.kubernetes.kubelet.systemReserved (unless a worker pool-specific value is set)
  • .spec.kubernetes.kubelet.evictionHard (unless a worker pool-specific value is set)
  • .spec.kubernetes.kubelet.cpuManagerPolicy (unless a worker pool-specific value is set)
  • .spec.provider.workers[].kubernetes.kubelet.kubeReserved
  • .spec.provider.workers[].kubernetes.kubelet.systemReserved
  • .spec.provider.workers[].kubernetes.kubelet.evictionHard
  • .spec.provider.workers[].kubernetes.kubelet.cpuManagerPolicy

Changes to kubeReserved or systemReserved do not trigger a node roll if their sum does not change.

Generally, the provider extension controllers might have additional constraints for changes leading to rolling updates, so please consult their respective documentation as well. In particular, if the feature gate NewWorkerPoolHash is enabled and a worker pool uses the new hash, the providerConfig is not considered as a whole; instead, only the fields selected by the provider extension are considered.
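The trigger rules above can be approximated with two small checks. This is a hypothetical sketch, not Gardener's worker pool hash computation: a worker pool is modelled as a flat dict keyed by the field paths listed above (relative to .spec.provider.workers[]), `kubernetes.version` stands for the pool's effective kubelet version (its own or the inherited control plane version), and kubeReserved/systemReserved amounts are assumed to be pre-parsed numbers per resource (e.g. millicores). Shoot-level triggers such as nodeLocalDNS or credential rotation are not modelled.

```python
# Worker pool fields whose change rolls the nodes (NewWorkerPoolHash disabled).
ROLL_FIELDS = (
    "machine.image.name",
    "machine.image.version",
    "machine.type",
    "volume.type",
    "volume.size",
    "providerConfig",
    "cri.name",
)


def _minor(version):
    """'1.27.4' -> (1, 27); patch version changes do not roll the nodes."""
    return tuple(int(p) for p in version.split(".")[:2])


def roll_triggers(old, new):
    """Return the worker pool fields whose change would trigger a rolling update."""
    changed = [f for f in ROLL_FIELDS if old.get(f) != new.get(f)]
    if _minor(old["kubernetes.version"]) != _minor(new["kubernetes.version"]):
        changed.append("kubernetes.version")
    return changed


def reserved_sum_changed(old, new):
    """With NewWorkerPoolHash, kubeReserved/systemReserved only matter via their sum."""
    def total(kubelet):
        sums = {}
        for block in ("kubeReserved", "systemReserved"):
            for resource, amount in (kubelet.get(block) or {}).items():
                sums[resource] = sums.get(resource, 0) + amount
        return sums

    return total(old) != total(new)


pool = {"machine.type": "m5.large", "kubernetes.version": "1.27.4"}
print(roll_triggers(pool, {**pool, "kubernetes.version": "1.28.0"}))  # ['kubernetes.version']
```

Shifting 100m of CPU from systemReserved to kubeReserved leaves the sum unchanged, so `reserved_sum_changed` reports no roll in that case.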

5 - Supported Kubernetes Versions

Supported Kubernetes Versions

Currently, Gardener supports the following Kubernetes versions:

Garden Clusters

The minimum version of a garden cluster that can be used to run Gardener is 1.25.x.

Seed Clusters

The minimum version of a seed cluster that can be connected to Gardener is 1.25.x.

Shoot Clusters

Gardener itself is capable of spinning up clusters with Kubernetes versions 1.25 up to 1.31. However, the concrete versions that can be used for shoot clusters depend on the installed provider extension. Consequently, please consult the documentation of your provider extension to see which Kubernetes versions are supported for shoot clusters.
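The version ranges stated above can be encoded as simple checks, for example in tooling that validates cluster inventories. This is a hypothetical sketch that hard-codes the ranges from this page; your provider extension may support a narrower set of shoot versions.

```python
MIN_GARDEN_SEED = (1, 25)           # minimum minor version for garden and seed clusters
SHOOT_RANGE = ((1, 25), (1, 31))    # shoot versions Gardener itself can spin up


def minor_of(version):
    """'1.29.3' -> (1, 29)."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return major, minor


def garden_or_seed_version_ok(version):
    return minor_of(version) >= MIN_GARDEN_SEED


def shoot_version_within_gardener_range(version):
    low, high = SHOOT_RANGE
    return low <= minor_of(version) <= high


print(garden_or_seed_version_ok("1.24.9"), shoot_version_within_gardener_range("1.31.0"))  # False True
```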

👨🏼‍💻 Developers note: The Adding Support For a New Kubernetes Version topic explains what needs to be done in order to add support for a new Kubernetes version.

6 - Trigger Shoot Operations Through Annotations

Trigger Shoot Operations Through Annotations

You can trigger a few explicit operations by annotating the Shoot with an operation annotation. This allows you to induce certain behavior without changing the Shoot specification. Some operations cannot be triggered by changing the shoot specification at all, because they can’t be properly reflected there. Note that once the controllers have picked up the triggered operation, the annotation is removed automatically, so you have to add it again each time you want to trigger the operation.

Please note: If .spec.maintenance.confineSpecUpdateRollout=true, then the only way to trigger a shoot reconciliation is by setting the reconcile operation (see below).

Immediate Reconciliation

Annotate the shoot with gardener.cloud/operation=reconcile to make the gardenlet start a reconciliation operation without changing the shoot spec and possibly without being in its maintenance time window:

kubectl -n garden-<project-name> annotate shoot <shoot-name> gardener.cloud/operation=reconcile

Immediate Maintenance

Annotate the shoot with gardener.cloud/operation=maintain to make the gardener-controller-manager start maintaining your shoot immediately (possibly without being in its maintenance time window). If no reconciliation starts, then nothing needs to be maintained:

kubectl -n garden-<project-name> annotate shoot <shoot-name> gardener.cloud/operation=maintain

Retry Failed Reconciliation

Annotate the shoot with gardener.cloud/operation=retry to make the gardenlet start a new reconciliation loop on a failed shoot. Failed shoots are only reconciled again if a new Gardener version is deployed, the shoot specification is changed or this annotation is set:

kubectl -n garden-<project-name> annotate shoot <shoot-name> gardener.cloud/operation=retry
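When these operations are triggered from tooling rather than by hand, the annotation can be expressed as a JSON merge patch body. The following is a hypothetical sketch: the helper names are illustrative, and the allowed operations are restricted to the three described above for the sake of the example.

```python
import json

OPERATION_ANNOTATION = "gardener.cloud/operation"
OPERATIONS = {"reconcile", "maintain", "retry"}  # only the operations shown above


def operation_patch(operation: str) -> dict:
    """Build a JSON merge patch that sets the operation annotation on a Shoot."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return {"metadata": {"annotations": {OPERATION_ANNOTATION: operation}}}


def project_namespace(project: str) -> str:
    """Shoots of a project live in the namespace garden-<project-name>."""
    return f"garden-{project}"


print(project_namespace("dev"), json.dumps(operation_patch("retry")))
```

The printed JSON could, for example, be passed to `kubectl -n garden-<project-name> patch shoot <shoot-name> --type=merge -p '<json>'`, which has the same effect as the `kubectl annotate` commands above.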

Credentials Rotation Operations

Please consult Credentials Rotation for Shoot Clusters for more information.

Restart systemd Services on Particular Worker Nodes

It is possible to make Gardener restart particular systemd services on your shoot worker nodes if needed. The annotation is not set on the Shoot resource but directly on the Node object you want to target. For example, the following will restart both the kubelet and the containerd services:

kubectl annotate node <node-name> worker.gardener.cloud/restart-systemd-services=kubelet,containerd

It may take up to a minute until the service is restarted. The annotation will be removed from the Node object after all specified systemd services have been restarted. It will also be removed even if the restart of one or more services failed.
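The annotation value is a comma-separated list of systemd unit names. The following hypothetical helper builds a merge patch body for the Node object from a list of service names, rejecting empty lists and names that would break the comma-separated format.

```python
RESTART_ANNOTATION = "worker.gardener.cloud/restart-systemd-services"


def restart_services_patch(services: list) -> dict:
    """Build a merge patch for a Node that asks Gardener to restart the given services."""
    cleaned = [s.strip() for s in services if s.strip()]
    if not cleaned:
        raise ValueError("at least one service name is required")
    if any("," in s for s in cleaned):
        raise ValueError("service names must not contain commas")
    return {"metadata": {"annotations": {RESTART_ANNOTATION: ",".join(cleaned)}}}


print(restart_services_patch(["kubelet", "containerd"]))
```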

ℹ️ In the example mentioned above, you could additionally verify when/whether the kubelet restarted by using kubectl describe node <node-name> and looking for a Starting kubelet event.

Force Deletion

When the ShootForceDeletion feature gate in the gardener-apiserver is enabled, users will be able to force-delete the Shoot. This is only possible if the Shoot fails to be deleted normally. For forceful deletion, the following conditions must be met:

  • Shoot has a deletion timestamp.
  • Shoot status contains at least one of the following ErrorCodes:
    • ERR_CLEANUP_CLUSTER_RESOURCES
    • ERR_CONFIGURATION_PROBLEM
    • ERR_INFRA_DEPENDENCIES
    • ERR_INFRA_UNAUTHENTICATED
    • ERR_INFRA_UNAUTHORIZED
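The two conditions can be checked on a Shoot object before setting the annotation. This is a hypothetical pre-check, not Gardener's admission logic: it assumes that the error codes are read from `.status.lastErrors[].codes`, and it cannot verify whether the ShootForceDeletion feature gate is enabled in the gardener-apiserver.

```python
FORCE_DELETION_CODES = {
    "ERR_CLEANUP_CLUSTER_RESOURCES",
    "ERR_CONFIGURATION_PROBLEM",
    "ERR_INFRA_DEPENDENCIES",
    "ERR_INFRA_UNAUTHENTICATED",
    "ERR_INFRA_UNAUTHORIZED",
}


def can_force_delete(shoot: dict) -> bool:
    """Check the documented preconditions for force deletion on a Shoot dict."""
    if not shoot.get("metadata", {}).get("deletionTimestamp"):
        return False  # the Shoot must already be marked for deletion
    # Assumption: error codes live in .status.lastErrors[].codes.
    codes = {
        code
        for err in shoot.get("status", {}).get("lastErrors", []) or []
        for code in err.get("codes", []) or []
    }
    return bool(codes & FORCE_DELETION_CODES)


shoot = {
    "metadata": {"deletionTimestamp": "2024-01-01T00:00:00Z"},
    "status": {"lastErrors": [{"codes": ["ERR_INFRA_UNAUTHORIZED"]}]},
}
print(can_force_delete(shoot))  # True
```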

If the above conditions are satisfied, you can annotate the Shoot with confirmation.gardener.cloud/force-deletion=true, and Gardener will clean up the Shoot control plane and the Shoot metadata.

⚠️ You MUST ensure that all the resources created in the IaaS account are cleaned up to prevent orphaned resources. Gardener will NOT delete any resources in the underlying infrastructure account. Hence, use this annotation at your own risk and only if you are fully aware of these consequences.