Shoot Operations
1 - Controlling the Kubernetes Versions for Specific Worker Pools
Controlling the Kubernetes Versions for Specific Worker Pools
Since Gardener v1.36, worker pools can specify a Kubernetes version that differs from the control plane's.
In earlier Gardener versions, all worker pools inherited the Kubernetes version of the control plane. Whenever the Kubernetes version of the control plane was modified, all worker pools were updated as well (either by rolling the nodes in case of a minor version change, or in-place for patch version changes).
To perform Kubernetes upgrades gracefully (each one triggers a rolling update of the nodes) for workloads that are sensitive to restarts (e.g., those dealing with lots of data), it can be necessary to carry out the upgrade gradually.
In such cases, the Kubernetes version for the worker pools can be pinned (.spec.provider.workers[].kubernetes.version) while the control plane Kubernetes version (.spec.kubernetes.version) is updated.
This results in the nodes being untouched while the control plane is upgraded.
Now a new worker pool (with the version equal to the control plane version) can be added.
Administrators can then reschedule their workloads to the new worker pool according to their upgrade requirements and processes.
Example Usage in a Shoot
spec:
  kubernetes:
    version: 1.27.4
  provider:
    workers:
    - name: data1
      kubernetes:
        version: 1.26.8
    - name: data2
- If .kubernetes.version is not specified in a worker pool, then the Kubernetes version of the kubelet is inherited from the control plane (.spec.kubernetes.version), i.e., in the above example, the data2 pool will use 1.27.4.
- If .kubernetes.version is specified in a worker pool, then it must meet the following constraints:
  - It must be at most two minor versions lower than the control plane version.
  - If it was not specified before, then no downgrade is possible (you cannot set it to 1.26.8 while .spec.kubernetes.version is already 1.27.4). The “two minor version skew” is only possible if the worker pool version is set to the control plane version and then the control plane is updated gradually by two minor versions.
  - If the version is removed from the worker pool, only one minor version difference is allowed to the control plane (you cannot upgrade a pool from version 1.25.0 to 1.27.0 in one go).
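The inheritance and skew rules above can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of Gardener. The sketch only compares major and minor versions, and it assumes that a worker pool may never be newer than the control plane. It does not model the "no downgrade" and "version removed" rules, which depend on the previous state of the Shoot.

```python
from typing import Optional, Tuple

MAX_MINOR_SKEW = 2  # "at most two minor versions lower", per the text above


def minor_of(version: str) -> Tuple[int, int]:
    """Return (major, minor) of a version string such as '1.27.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def effective_pool_version(control_plane: str, pool: Optional[str]) -> str:
    """A pool without its own version inherits the control plane version."""
    return control_plane if pool is None else pool


def skew_is_allowed(control_plane: str, pool: Optional[str]) -> bool:
    """Check only the minor-version skew rule (simplified sketch)."""
    cp_major, cp_minor = minor_of(control_plane)
    major, minor = minor_of(effective_pool_version(control_plane, pool))
    # Assumption: a pool may never be newer than the control plane.
    return major == cp_major and 0 <= cp_minor - minor <= MAX_MINOR_SKEW
```

For the example Shoot above, `effective_pool_version("1.27.4", None)` yields the control plane version for data2, and `skew_is_allowed("1.27.4", "1.26.8")` accepts the pinned data1 pool.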
Automatic updates of Kubernetes versions (see Shoot Maintenance) also apply to worker pool Kubernetes versions.
2 - Shoot Credentials Rotation
Credentials Rotation for Shoot Clusters
Shoots use many different credentials to ensure that the various components can communicate with each other and that the cluster is usable and operable.
This page explains how the various credentials can be rotated so that the cluster can be considered secure.
User-Provided Credentials
Cloud Provider Keys
End-users must provide credentials such that Gardener and Kubernetes controllers can communicate with the respective cloud provider APIs in order to perform infrastructure operations. For example, Gardener uses them to set up and maintain the networks, security groups, subnets, etc., while the cloud-controller-manager uses them to reconcile load balancers and routes, and the CSI controller uses them to reconcile volumes and disks.
Depending on the cloud provider, the required data keys of the Secret differ.
Please consult the documentation of the respective provider extension to get to know the concrete data keys (e.g., this document for AWS).
It is the responsibility of the end-user to regularly rotate those credentials. The following steps are required to perform the rotation:
- Update the data in the Secret with new credentials.
- ⚠️ Wait until all Shoots using the Secret are reconciled before you disable the old credentials in your cloud provider account! Otherwise, the Shoots will no longer work as expected. Check out this document to learn how to trigger a reconciliation of your Shoots.
- After all Shoots using the Secret were reconciled, you can go ahead and deactivate the old credentials in your provider account.
Gardener-Provided Credentials
The below credentials are generated by Gardener when shoot clusters are being created. Those include:
- kubeconfig (if enabled)
- certificate authorities (and related server and client certificates)
- observability passwords for Plutono
- SSH key pair for worker nodes
- ETCD encryption key
- ServiceAccount token signing key
- …
🚨 There is no auto-rotation of those credentials, and it is the responsibility of the end-user to regularly rotate them.
While it is possible to rotate them one by one, there is also a convenient method to combine the rotation of all of those credentials. The rotation happens in two phases since it might be required to update some API clients (e.g., when CAs are rotated).
Prepare Rotation of All Credentials
In order to start the rotation (first phase), you have to annotate the shoot with the rotate-credentials-start operation:
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-credentials-start
Note: You can check the .status.credentials.rotation field in the Shoot to see when the rotation was last initiated and last completed.
Kindly consider the detailed descriptions below to learn how the rotation is performed and what your responsibilities are. Please note that all respective individual actions apply for this combined rotation as well (e.g., worker nodes are rolled out in the first phase).
Complete Rotation of All Credentials
You can complete the rotation (second phase) by annotating the shoot with the rotate-credentials-complete operation:
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-credentials-complete
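For scripting, the two annotate commands above can be assembled programmatically. The following Python sketch only builds the argument lists and does not execute anything. The helper names and the placeholder values `garden-dev` and `my-shoot` are assumptions made for illustration.

```python
import shlex
from typing import List

OPERATION_ANNOTATION = "gardener.cloud/operation"


def annotate_command(namespace: str, shoot: str, operation: str) -> List[str]:
    """Build the kubectl argv that sets the operation annotation on a Shoot."""
    return [
        "kubectl", "-n", namespace, "annotate", "shoot", shoot,
        f"{OPERATION_ANNOTATION}={operation}",
    ]


def rotation_commands(namespace: str, shoot: str) -> List[List[str]]:
    """Commands for the first (start) and second (complete) phase, in order."""
    return [
        annotate_command(namespace, shoot, f"rotate-credentials-{phase}")
        for phase in ("start", "complete")
    ]


if __name__ == "__main__":
    # Hypothetical namespace and shoot name, for illustration only.
    for argv in rotation_commands("garden-dev", "my-shoot"):
        print(shlex.join(argv))
```

The second command should only be issued once the first phase has finished and all affected API clients have been updated.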
Kubeconfig
If the .spec.kubernetes.enableStaticTokenKubeconfig field is set to true (default), then Gardener generates a kubeconfig with cluster-admin privileges for the Shoots containing credentials for communication with the kube-apiserver (see this document for more information).
This Secret is stored with the name <shoot-name>.kubeconfig in the project namespace in the garden cluster and has multiple data keys:
- kubeconfig: the completed kubeconfig
- ca.crt: the CA bundle for establishing trust to the API server (same as in the Cluster CA bundle secret)
Shoots created with Gardener <= 0.28 used to have a kubeconfig based on a client certificate instead of a static token. With the first kubeconfig rotation, such clusters will get a static token as well.
⚠️ This does not invalidate the old client certificate. In order to do this, you should perform a rotation of the CAs (see section below).
It is the responsibility of the end-user to regularly rotate those credentials (or disable this kubeconfig entirely).
In order to rotate the token in this kubeconfig, annotate the Shoot with gardener.cloud/operation=rotate-kubeconfig-credentials.
This operation is not allowed for Shoots that are already marked for deletion.
Please note that only the token (and basic auth password, if enabled) are exchanged.
The CA certificate remains the same (see section below for information about the rotation).
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-kubeconfig-credentials
You can check the .status.credentials.rotation.kubeconfig field in the Shoot to see when the rotation was last initiated and last completed.
Certificate Authorities
Gardener generates several certificate authorities (CAs) to ensure secured communication between the various components and actors.
Most of those CAs are used for internal communication (e.g., kube-apiserver talks to etcd, vpn-shoot talks to the vpn-seed-server, kubelet talks to kube-apiserver).
However, there is also the “cluster CA” which is part of all kubeconfigs and used to sign the server certificate exposed by the kube-apiserver.
Gardener populates a ConfigMap with the name <shoot-name>.ca-cluster in the project namespace in the garden cluster which contains the following data keys:
- ca.crt: the CA bundle of the cluster
This bundle contains one or multiple CAs which are used for signing serving certificates of the Shoot’s API server.
Hence, the certificates contained in this ConfigMap can be used to verify the API server’s identity when communicating with its public endpoint (e.g., as certificate-authority-data in a kubeconfig).
This is the same certificate that is also contained in the kubeconfig’s certificate-authority-data field.
Shoots created with Gardener >= v1.45 have a dedicated client CA which verifies the legitimacy of client certificates. For older Shoots, the client CA is equal to the cluster CA. With the first CA rotation, such clusters will get a dedicated client CA as well.
All the certificates are valid for 10 years.
Since it requires adaptation for the consumers of the Shoot, there is no automatic rotation, and it is the responsibility of the end-user to regularly rotate the CA certificates.
The rotation happens in three stages (see also GEP-18 for the full details):
- In stage one, new CAs are created and added to the bundle (together with the old CAs). Client certificates are re-issued immediately.
- In stage two, end-users update all cluster API clients that communicate with the control plane.
- In stage three, the old CAs are dropped from the bundle and server certificates are re-issued.
Technically, the Preparing phase indicates stage one.
Once it is completed, the Prepared phase indicates readiness for stage two.
The Completing phase indicates stage three, and the Completed phase states that the rotation process has finished.
You can check the .status.credentials.rotation.certificateAuthorities field in the Shoot to see when the rotation was last initiated, last completed, and in which phase it currently is.
In order to start the rotation (stage one), you have to annotate the shoot with the rotate-ca-start operation:
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-ca-start
This triggers a Shoot reconciliation that performs stage one.
After it is completed, the .status.credentials.rotation.certificateAuthorities.phase is set to Prepared.
Now you must update all API clients outside the cluster (such as the kubeconfigs on developer machines) to use the newly issued CA bundle in the <shoot-name>.ca-cluster ConfigMap.
Please also note that client certificates must be re-issued now.
After updating all API clients, you can complete the rotation by annotating the shoot with the rotate-ca-complete operation:
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-ca-complete
This triggers another Shoot reconciliation that performs stage three.
After it is completed, the .status.credentials.rotation.certificateAuthorities.phase is set to Completed.
You can now update your API clients again and drop the old CA from their bundle.
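The phase handling described above can be summarized as a small decision function. This is a hedged Python sketch, not Gardener code. The function name is hypothetical, and it assumes that a missing phase means no rotation has run yet and that a new rotation may be started once the previous one is Completed.

```python
from typing import Optional


def next_ca_operation(phase: Optional[str]) -> Optional[str]:
    """Return the operation annotation value to apply next, or None to wait."""
    if phase in (None, "", "Completed"):
        # Assumption: no rotation in progress, so stage one may be started.
        return "rotate-ca-start"
    if phase == "Prepared":
        # Stage two (updating API clients) is the end-user's job; only
        # afterwards should the rotation be completed.
        return "rotate-ca-complete"
    if phase in ("Preparing", "Completing"):
        # A reconciliation is still running; nothing to do but wait.
        return None
    raise ValueError(f"unknown rotation phase: {phase!r}")
```

A script could read .status.credentials.rotation.certificateAuthorities.phase from the Shoot and call this function to decide whether to annotate or to keep waiting.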
Note that the CA rotation also rotates all internal CAs and signed certificates. Hence, most of the components need to be restarted (including etcd and kube-apiserver).
⚠️ In stage one, all worker nodes of the Shoot will be rolled out to ensure that the Pods as well as the kubelets get the updated credentials as well.
Observability Password(s) For Plutono and Prometheus
For Shoots with .spec.purpose!=testing, Gardener deploys an observability stack with Prometheus for monitoring, Alertmanager for alerting (optional), Vali for logging, and Plutono for visualization.
The Plutono instance is exposed via Ingress and accessible for end-users via basic authentication credentials generated and managed by Gardener.
Those credentials are stored in a Secret with the name <shoot-name>.monitoring in the project namespace in the garden cluster, which has multiple data keys:
- username: the username
- password: the password
- auth: the username with SHA-1 representation of the password
It is the responsibility of the end-user to regularly rotate those credentials.
In order to rotate the password, annotate the Shoot with gardener.cloud/operation=rotate-observability-credentials.
This operation is not allowed for Shoots that are already marked for deletion.
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-observability-credentials
You can check the .status.credentials.rotation.observability field in the Shoot to see when the rotation was last initiated and last completed.
SSH Key Pair for Worker Nodes
Gardener generates an SSH key pair whose public key is propagated to all worker nodes of the Shoot.
The private key can be used to establish an SSH connection to the workers for troubleshooting purposes.
It is recommended to use gardenctl-v2 and its gardenctl ssh command since it is required to first open up the security groups and create a bastion VM (no direct SSH access to the worker nodes is possible).
The private key is stored in a Secret with the name <shoot-name>.ssh-keypair in the project namespace in the garden cluster and has multiple data keys:
- id_rsa: the private key
- id_rsa.pub: the public key for SSH
In order to rotate the keys, annotate the Shoot with gardener.cloud/operation=rotate-ssh-keypair.
This will propagate a new key to all worker nodes while keeping the old key active and valid as well (it will only be invalidated/removed with the next rotation).
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-ssh-keypair
You can check the .status.credentials.rotation.sshKeypair field in the Shoot to see when the rotation was last initiated and last completed.
The old key is stored in a Secret with the name <shoot-name>.ssh-keypair.old in the project namespace in the garden cluster and has the same data keys as the regular Secret.
ETCD Encryption Key
This key is used to encrypt the data of Secret resources inside etcd (see upstream Kubernetes documentation).
The encryption key has no expiration date. There is no automatic rotation, and it is the responsibility of the end-user to regularly rotate the encryption key.
The rotation happens in three stages:
- In stage one, a new encryption key is created and added to the bundle (together with the old encryption key).
- In stage two, all Secrets in the cluster and resources configured in the spec.kubernetes.kubeAPIServer.encryptionConfig of the Shoot (see ETCD Encryption Config) are rewritten by the kube-apiserver so that they become encrypted with the new encryption key.
- In stage three, the old encryption key is dropped from the bundle.
Technically, the Preparing phase indicates the stages one and two.
Once it is completed, the Prepared phase indicates readiness for stage three.
The Completing phase indicates stage three, and the Completed phase states that the rotation process has finished.
You can check the .status.credentials.rotation.etcdEncryptionKey field in the Shoot to see when the rotation was last initiated, last completed, and in which phase it currently is.
In order to start the rotation (stage one), you have to annotate the shoot with the rotate-etcd-encryption-key-start operation:
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-etcd-encryption-key-start
This triggers a Shoot reconciliation that performs the stages one and two.
After it is completed, the .status.credentials.rotation.etcdEncryptionKey.phase is set to Prepared.
Now you can complete the rotation by annotating the shoot with the rotate-etcd-encryption-key-complete operation:
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-etcd-encryption-key-complete
This triggers another Shoot reconciliation that performs stage three.
After it is completed, the .status.credentials.rotation.etcdEncryptionKey.phase is set to Completed.
ServiceAccount Token Signing Key
Gardener generates a key which is used to sign the tokens for ServiceAccounts.
Those tokens are typically used by workload Pods running inside the cluster in order to authenticate themselves with the kube-apiserver.
This also includes system components running in the kube-system namespace.
The token signing key has no expiration date.
Since it might require adaptation for the consumers of the Shoot, there is no automatic rotation, and it is the responsibility of the end-user to regularly rotate the signing key.
The rotation happens in three stages, similar to how the CA certificates are rotated:
- In stage one, a new signing key is created and added to the bundle (together with the old signing key).
- In stage two, end-users update all out-of-cluster API clients that communicate with the control plane via ServiceAccount tokens.
- In stage three, the old signing key is dropped from the bundle.
Technically, the Preparing phase indicates stage one.
Once it is completed, the Prepared phase indicates readiness for stage two.
The Completing phase indicates stage three, and the Completed phase states that the rotation process has finished.
You can check the .status.credentials.rotation.serviceAccountKey field in the Shoot to see when the rotation was last initiated, last completed, and in which phase it currently is.
In order to start the rotation (stage one), you have to annotate the shoot with the rotate-serviceaccount-key-start operation:
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-serviceaccount-key-start
This triggers a Shoot reconciliation that performs stage one.
After it is completed, the .status.credentials.rotation.serviceAccountKey.phase is set to Prepared.
Now you must update all API clients outside the cluster using a ServiceAccount token (such as the kubeconfigs on developer machines) to use a token issued by the new signing key.
Gardener already generates new secrets for those ServiceAccounts in the cluster whose static token was automatically created by Kubernetes (typically before v1.22, ref).
However, if you need to create it manually, you can check out this document for instructions.
After updating all API clients, you can complete the rotation by annotating the shoot with the rotate-serviceaccount-key-complete operation:
kubectl -n <shoot-namespace> annotate shoot <shoot-name> gardener.cloud/operation=rotate-serviceaccount-key-complete
This triggers another Shoot reconciliation that performs stage three.
After it is completed, the .status.credentials.rotation.serviceAccountKey.phase is set to Completed.
⚠️ In stage one, all worker nodes of the Shoot will be rolled out to ensure that the Pods use a new token.
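The three staged rotations described above (certificate authorities, ETCD encryption key, and ServiceAccount token signing key) each report their phase under .status.credentials.rotation. The following Python sketch collects those phases from a Shoot object that has already been parsed into a dictionary. The function and constant names are hypothetical.

```python
from typing import Dict, Optional

# Status keys of the staged rotations, as named in the sections above.
STAGED_ROTATIONS = ("certificateAuthorities", "etcdEncryptionKey", "serviceAccountKey")


def rotation_phases(shoot: dict) -> Dict[str, Optional[str]]:
    """Map each staged rotation to its current phase (None if never reported)."""
    rotation = shoot.get("status", {}).get("credentials", {}).get("rotation", {})
    return {name: rotation.get(name, {}).get("phase") for name in STAGED_ROTATIONS}
```

The input dictionary could, for example, be obtained by passing the output of `kubectl -n <shoot-namespace> get shoot <shoot-name> -o json` to `json.loads`.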
OpenVPN TLS Auth Keys
This key is used to ensure encrypted communication for the VPN connection between the control plane in the seed cluster and the shoot cluster. It is currently not rotated automatically and there is no way to trigger it manually.
3 - Shoot Kubernetes and Operating System Versioning in Gardener
Shoot Kubernetes and Operating System Versioning in Gardener
Motivation
On the one hand, Gardener is responsible for managing the Kubernetes and the Operating System (OS) versions of its Shoot clusters. On the other hand, Gardener needs to be configured and updated based on the availability and support of the Kubernetes and Operating System versions it provides. For instance, the Kubernetes community releases minor versions roughly every three months and usually maintains three minor versions (the current and the last two) with bug fixes and security updates. Patch releases are done more frequently.
When using the term Machine image in the following, we refer to the OS version that comes with the machine image of the node/worker pool of a Gardener Shoot cluster.
As such, we are not referring to the CloudProvider specific machine image like the AMI for AWS.
For more information on how Gardener maps machine image versions to CloudProvider specific machine images, take a look at the individual gardener extension providers, such as the provider for AWS.
Gardener should be configured accordingly to reflect the “logical state” of a version. It should be possible to define the Kubernetes or Machine image versions that still receive bug fixes and security patches, and also vice-versa to define the versions that are out-of-maintenance and are potentially vulnerable. Moreover, this allows Gardener to “understand” the current state of a version and act upon it (more information in the following sections).
Overview
As a Gardener operator:
- I can classify a version based on its logical state (preview, supported, deprecated, and expired; see Version Classification).
- I can define which Machine image and Kubernetes versions are eligible for the auto update of clusters during the maintenance time.
- I can define a moment in time when Shoot clusters are forcefully migrated off a certain version (through an expirationDate).
- I can define an update path for machine images for auto and force updates (see Update path for machine image versions).
- I can disallow the creation of clusters having a certain version (think of severe security issues).
As an end-user/Shoot owner of Gardener:
- I can get information about which Kubernetes and Machine image versions exist and their classification.
- I can determine the time when my Shoot cluster's Machine image and Kubernetes version will be forcefully updated to the next patch or minor version (in case the cluster is running a deprecated version with an expiration date).
- I can get this information via API from the CloudProfile.
Version Classifications
Administrators can classify versions into four distinct “logical states”: preview, supported, deprecated, and expired.
The version classification serves as a “point-of-reference” for end-users and also has implications during shoot creation and the maintenance time.
If a version is unclassified, Gardener cannot make those decisions based on the “logical state”.
Nevertheless, Gardener can operate without version classifications, and they can be added at any time to the Kubernetes and machine image versions in the CloudProfile.
As a best practice, versions usually start with the classification preview, then are promoted to supported, eventually deprecated and finally expired.
This information is programmatically available in the CloudProfiles of the Garden cluster.
- preview: A preview version is a new version that has not yet undergone thorough testing, possibly a new release, and needs time to be validated. Due to its early age, there is a higher probability of undiscovered issues, and it is therefore not yet recommended for production usage. A Shoot does not update (neither auto-update nor force-update) to a preview version during the maintenance time. Also, preview versions are not considered for the defaulting to the highest available version when deliberately omitting the patch version during Shoot creation. Typically, after a fresh release of a new Kubernetes (e.g., v1.25.0) or Machine image version (e.g., suse-chost 15.4.20220818), the operator tags it as preview until they have gained sufficient experience and regard this version as reliable. After the operator has gained sufficient trust, the version can be manually promoted to supported.
- supported: A supported version is the recommended version for new and existing Shoot clusters. This is the version that new Shoot clusters should use and existing clusters should update to. Typically for Kubernetes versions, the community maintains the latest patch versions of the last three minor Kubernetes versions (including the current one, if it is not still in preview). An operator could define these versions as being supported (e.g., v1.27.6, v1.26.10, and v1.25.12).
- deprecated: A deprecated version is a version that approaches the end of its lifecycle and can contain issues which are probably resolved in a supported version. New Shoots should not use this version anymore. Existing Shoots will be updated to a newer version if auto-update is enabled (.spec.maintenance.autoUpdate.kubernetesVersion for Kubernetes version auto-update, or .spec.maintenance.autoUpdate.machineImageVersion for machine image version auto-update). Using automatic upgrades, however, does not guarantee that a Shoot runs a non-deprecated version, as the latest version (overall or of the minor version) can be deprecated as well. Deprecated versions should have an expiration date set for eventual expiration.
- expired: An expired version has an expiration date (based on the Golang time package) in the past. New clusters with that version cannot be created and existing clusters are forcefully migrated to a higher version during the maintenance time.
Below is an example of how the relevant section of the CloudProfile might look:
apiVersion: core.gardener.cloud/v1beta1
kind: CloudProfile
metadata:
  name: alicloud
spec:
  kubernetes:
    versions:
      - classification: preview
        version: 1.27.0
      - classification: preview
        version: 1.26.3
      - classification: supported
        version: 1.26.2
      - classification: preview
        version: 1.25.5
      - classification: supported
        version: 1.25.4
      - classification: supported
        version: 1.24.6
      - classification: deprecated
        expirationDate: "2022-11-30T23:59:59Z"
        version: 1.24.5
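How an expiration date turns a classified version into an expired one can be sketched as follows. This is a minimal Python illustration, not Gardener's implementation. It assumes that expiration dates are UTC timestamps in the format shown in the example above, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_expiration(value: str) -> datetime:
    """Parse a timestamp such as '2022-11-30T23:59:59Z' (UTC assumed)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def effective_classification(entry: dict, now: datetime) -> Optional[str]:
    """Return 'expired' once the expiration date has passed, else the
    configured classification (None for an unclassified version)."""
    expiration = entry.get("expirationDate")
    if expiration is not None and parse_expiration(expiration) < now:
        return "expired"
    return entry.get("classification")
```

Applied to the 1.24.5 entry above, the function returns deprecated for any point in time up to 2022-11-30T23:59:59Z and expired afterwards.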
Automatic Version Upgrades
There are two ways in which the Kubernetes version of the control plane as well as the Kubernetes and machine image version of a worker pool can be upgraded: auto update and forceful update.
See Automatic Version Updates for how to enable auto updates for Kubernetes or machine image versions on the Shoot cluster.
If a Shoot is running a version after its expiration date has passed, it will be forcefully updated during its maintenance time. This happens even if the owner has opted out of automatic cluster updates!
When is an auto update triggered?
- The Shoot has auto-update enabled and the version is not the latest eligible version for the auto-update. Please note that this latest version that qualifies for an auto-update is not necessarily the overall latest version in the CloudProfile:
  - For Kubernetes version, the latest eligible version for auto-updates is the latest patch version of the current minor.
  - For machine image version, the latest eligible version for auto-updates is controlled by the updateStrategy field of the machine image in the CloudProfile.
- The Shoot has auto-update disabled and the version is either expired or does not exist.
The auto update can fail if the version is already on the latest eligible version for the auto-update. A failed auto update triggers a force update. The force and auto update path for Kubernetes and machine image versions differ slightly and are described in more detail below.
Update rules for both Kubernetes and machine image versions
- Both auto and force update first try to update to the latest patch version of the same minor.
- An auto update prefers supported versions over deprecated versions. If there is a lower supported version and a higher deprecated version, auto update will pick the supported version. If all qualifying versions are deprecated, update to the latest deprecated version.
- An auto update never updates to an expired version.
- A force update prefers to update to not-expired versions. If all qualifying versions are expired, update to the latest expired version. Please note that therefore multiple consecutive version upgrades are possible. In this case, the version is again upgraded in the next maintenance time.
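The auto update rules above can be illustrated with a short Python sketch for versions of the form major.minor.patch. It is a simplification under stated assumptions: candidates are restricted to the current minor version, only versions explicitly classified as supported or deprecated are considered (preview, expired, and unclassified versions are ignored), and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Turn '1.24.6' into (1, 24, 6) so that versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def auto_update_target(current: str, candidates: List[Tuple[str, str]]) -> Optional[str]:
    """Pick the auto update target among (version, classification) pairs."""
    cur = parse(current)
    # Only newer patch versions of the same minor qualify in this sketch.
    newer = [(parse(v), c) for v, c in candidates
             if parse(v)[:2] == cur[:2] and parse(v) > cur]
    # Supported versions win over deprecated ones, even if they are lower.
    for wanted in ("supported", "deprecated"):
        matching = [v for v, c in newer if c == wanted]
        if matching:
            return ".".join(str(part) for part in max(matching))
    return None  # nothing eligible: expired and preview versions never qualify
```

For instance, with a supported 1.24.6 and a deprecated 1.24.7 available, a cluster on 1.24.5 would be moved to 1.24.6 in this sketch.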
Update path for machine image versions
Administrators can define three different update strategies (field updateStrategy) for machine images in the CloudProfile: patch, minor, major (default). This is to accommodate the different version schemes of Operating Systems (e.g. Gardenlinux only updates major and minor versions with occasional patches).
- patch: update to the latest patch version of the current minor version. When using an expired version: force update to the latest patch of the current minor. If already on the latest patch version, then force update to the next higher (not necessarily +1) minor version.
- minor: update to the latest minor and patch version. When using an expired version: force update to the latest minor and patch of the current major. If already on the latest minor and patch of the current major, then update to the next higher (not necessarily +1) major version.
- major: always update to the overall latest version. This is the legacy behavior for automatic machine image version upgrades. Force updates are not possible and will fail if the latest version in the CloudProfile for that image is expired (EOL scenario).
Example configuration in the CloudProfile:
machineImages:
  - name: gardenlinux
    updateStrategy: minor
    versions:
      - version: 1096.1.0
      - version: 934.8.0
      - version: 934.7.0
  - name: suse-chost
    updateStrategy: patch
    versions:
      - version: 15.3.20220818
      - version: 15.3.20221118
Please note that force updates for machine images can skip minor versions (strategy: patch) or major versions (strategy: minor) if the next minor/major version has no qualifying versions (only preview versions).
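The auto update part of the three strategies can be sketched in Python as follows. The sketch assumes that every listed version qualifies (no preview or expired versions) and covers only auto updates, not the force update fallbacks. The function names are hypothetical.

```python
from typing import List, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '934.8.0' into (934, 8, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def image_auto_update_target(current: str, versions: List[str], strategy: str) -> str:
    """Latest version reachable by an auto update under the given strategy."""
    cur = parse(current)
    # Number of leading version components that must stay unchanged.
    fixed = {"patch": 2, "minor": 1, "major": 0}[strategy]
    reachable = [parse(v) for v in versions if parse(v)[:fixed] == cur[:fixed]]
    best = max(reachable + [cur])  # never move to an older version
    return ".".join(str(part) for part in best)
```

With the example configuration above, a gardenlinux pool on 934.7.0 (strategy minor) would move to 934.8.0 rather than 1096.1.0, and a suse-chost pool on 15.3.20220818 (strategy patch) would move to 15.3.20221118.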
Update path for Kubernetes versions
For Kubernetes versions, the auto update picks the latest non-preview patch version of the current minor version.
If the cluster is already on the latest patch version and the latest patch version is also expired, it will continue with the latest patch version of the next consecutive minor (minor +1) Kubernetes version, so it will result in an update of a minor Kubernetes version!
Kubernetes “minor version jumps” are not allowed, meaning that it is not possible to skip the consecutive minor version and directly update to any version after that.
For instance, the version 1.24.x can only update to a version 1.25.x, not to 1.26.x or any other version.
This is because Kubernetes does not guarantee upgradability in this case, leading to possibly broken Shoot clusters.
The administrator has to set up the CloudProfile in such a way that consecutive Kubernetes minor versions are available.
Otherwise, Shoot clusters will fail to upgrade during the maintenance time.
Consider the CloudProfile below with a Shoot using the Kubernetes version 1.24.12.
Even though the version is expired, due to missing 1.25.x versions, the Gardener Controller Manager cannot upgrade the Shoot’s Kubernetes version.
spec:
  kubernetes:
    versions:
      - version: 1.26.10
      - version: 1.26.9
      - version: 1.24.12
        expirationDate: "<expiration date in the past>"
The CloudProfile must specify versions 1.25.x of the consecutive minor version.
With the CloudProfile configured in such a way, the Shoot’s Kubernetes version will be upgraded to version 1.25.10 in the next maintenance time.
spec:
  kubernetes:
    versions:
      - version: 1.26.9
      - version: 1.25.10
      - version: 1.25.9
      - version: 1.24.12
        expirationDate: "<expiration date in the past>"
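The two CloudProfile examples above can be replayed with a small Python sketch of the force update path for an expired Kubernetes version. It is a simplification: all listed target versions are treated as qualifying (classifications and expiration dates of the targets are ignored), and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Turn '1.25.10' into (1, 25, 10) so that versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def fmt(version: Version) -> str:
    return ".".join(str(part) for part in version)


def force_update_target(current: str, available: List[str]) -> Optional[str]:
    """Target of a force update, or None if no valid upgrade path exists."""
    cur = parse(current)
    parsed = [parse(v) for v in available]
    # First choice: a newer patch version of the same minor.
    same_minor = [v for v in parsed if v[:2] == cur[:2] and v > cur]
    if same_minor:
        return fmt(max(same_minor))
    # Otherwise: latest patch of the next consecutive minor (minor +1) only.
    next_minor = [v for v in parsed if v[:2] == (cur[0], cur[1] + 1)]
    if next_minor:
        return fmt(max(next_minor))
    return None  # minor version jumps are not allowed
```

For the first CloudProfile the function returns `None` for a Shoot on 1.24.12, because no 1.25.x version exists. For the second one it returns 1.25.10.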
Version Requirements (Kubernetes and Machine Image)
The Gardener API server enforces the following requirements for versions:
- A version that is in use by a Shoot cannot be deleted from the CloudProfile.
- Creating a new version with expiration date in the past is not allowed.
- There can be only one supported version per minor version.
- The latest Kubernetes version cannot have an expiration date.
  - NOTE: The latest version for a machine image can have an expiration date. [*]
[*] Useful for cases in which support for a given machine image needs to be deprecated and removed (for example, the machine image reaches end of life).
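Two of the requirements above lend themselves to a quick offline check of a CloudProfile's Kubernetes versions before it is applied. The following Python sketch is a hypothetical helper, not the validation performed by the Gardener API server. It checks only the "one supported version per minor" rule and the "latest Kubernetes version has no expiration date" rule.

```python
from collections import Counter
from typing import List


def validate_kubernetes_versions(entries: List[dict]) -> List[str]:
    """Return human-readable problems found in the given version entries."""
    problems = []
    parsed = [(tuple(int(p) for p in e["version"].split(".")), e) for e in entries]

    # Rule: there can be only one supported version per minor version.
    supported = Counter(v[:2] for v, e in parsed
                        if e.get("classification") == "supported")
    for (major, minor), count in sorted(supported.items()):
        if count > 1:
            problems.append(f"{major}.{minor}: {count} supported versions")

    # Rule: the latest Kubernetes version cannot have an expiration date.
    if parsed:
        _, latest = max(parsed, key=lambda item: item[0])
        if "expirationDate" in latest:
            problems.append(f"{latest['version']}: latest version has an expiration date")
    return problems
```

An empty result means that neither of the two checked rules is violated. The remaining requirements need information about existing Shoots or the current time and are left out of this sketch.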
Related Documentation
You might want to read about the Shoot Updates and Upgrades procedures to get to know the effects of such operations.
4 - Shoot Updates and Upgrades
Shoot Updates and Upgrades
This document describes what happens during shoot updates (changes incorporated in a newly deployed Gardener version) and during shoot upgrades (changes for version controllable by end-users).
Updates
Updates to all aspects of the shoot cluster happen when the gardenlet reconciles the Shoot resource.
When are Reconciliations Triggered
Generally, when you change the specification of your Shoot, the reconciliation will start immediately, potentially updating your cluster.
Please note that you can also confine the reconciliation triggered due to your specification updates to the cluster’s maintenance time window. Please find more information in Confine Specification Changes/Updates Roll Out.
You can also annotate your shoot with special operation annotations (for more information, see Trigger Shoot Operations), which will cause the reconciliation to start due to your actions.
There is also an automatic reconciliation by Gardener.
The period, i.e., how often it is performed, depends on the configuration of the Gardener administrators/operators.
In some Gardener installations the operators might enable “reconciliation in maintenance time window only” (for more information, see Cluster Reconciliation), which will result in at least one reconciliation during the time configured in the Shoot’s .spec.maintenance.timeWindow field.
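A minimal sketch of the relevant maintenance settings in a Shoot is shown below. The begin and end field names and their time format are assumptions not stated in this document, and the values are placeholders.

```yaml
# Sketch of a Shoot maintenance section; time values are placeholders.
spec:
  maintenance:
    timeWindow:
      begin: 220000+0100   # assumed format: HHMMSS followed by the UTC offset
      end: 230000+0100
    confineSpecUpdateRollout: true   # roll out specification changes only within the time window
```

With confineSpecUpdateRollout left unset, specification changes are reconciled immediately as described above.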
Which Updates are Applied
As end-users can only control the Shoot resource’s specification but not the used Gardener version, they don’t have any influence on which of the updates are rolled out (other than those settings configurable in the Shoot).
A Gardener operator can deploy a new Gardener version at any point in time.
Any subsequent reconciliation of Shoots will update them by rolling out the changes incorporated in this new Gardener version.
Some examples of such shoot updates are:
- Add a new/remove an old component to/from the shoot’s control plane running in the seed, or to/from the shoot’s system components running on the worker nodes.
- Change the configuration of an existing control plane/system component.
- Restart existing control plane/system components (this might result in a short unavailability of the Kubernetes API server, e.g., when etcd or a kube-apiserver itself is being restarted).
Behavioural Changes
Generally, some of these updates (e.g., configuration changes) could theoretically result in different behaviour of controllers. If such changes are backwards-incompatible, we usually follow one of these approaches (depending on the concrete change):
- Only apply the change for new clusters.
- Expose a new field in the Shoot resource that lets users control this changed behaviour to enable it at a convenient point in time.
- Put the change behind an alpha feature gate (disabled by default) in the gardenlet (only controllable by Gardener operators), which will be promoted to beta (enabled by default) in subsequent releases (in this case, end-users have no influence on when the behaviour changes - Gardener operators should inform their end-users and provide clear timelines when they will enable the feature gate).
Upgrades
We consider shoot upgrades to change either the:
- Kubernetes version (.spec.kubernetes.version)
- Kubernetes version of the worker pool if specified (.spec.provider.workers[].kubernetes.version)
- Machine image version of at least one worker pool (.spec.provider.workers[].machine.image.version)
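The following sketch shows where these three fields live in a Shoot. The pool name and Kubernetes versions are illustrative, and the image name and version are placeholders.

```yaml
# Sketch only; names and versions are illustrative.
spec:
  kubernetes:
    version: 1.27.4              # Kubernetes version of the control plane
  provider:
    workers:
    - name: data1                # hypothetical worker pool
      kubernetes:
        version: 1.26.8          # optional Kubernetes version of this worker pool
      machine:
        image:
          name: <image-name>
          version: <image-version>   # machine image version of this worker pool
```

Changing any of the three version values counts as a shoot upgrade in the sense described above.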
Generally, an upgrade is also performed through a reconciliation of the Shoot resource, i.e., the same concepts as for shoot updates apply.
If an end-user triggers an upgrade (e.g., by changing the Kubernetes version) after a new Gardener version was deployed but before the shoot was reconciled again, then this upgrade might incorporate the changes delivered with this new Gardener version.
In-Place vs. Rolling Updates
If the Kubernetes patch version is changed, then the upgrade happens in-place.
This means that the shoot worker nodes remain untouched and only the kubelet process restarts with the new Kubernetes version binary.
The same applies for configuration changes of the kubelet.
If the Kubernetes minor version is changed, then the upgrade is done in a “rolling update” fashion, similar to how pods in Kubernetes are updated (when backed by a Deployment).
The worker nodes will be terminated one after another and replaced by new machines.
The existing workload is gracefully drained and evicted from the old worker nodes to new worker nodes, respecting the configured PodDisruptionBudgets (see Specifying a Disruption Budget for your Application).
Customize Rolling Update Behaviour of Shoot Worker Nodes
The .spec.provider.workers[] list exposes two fields that you might configure based on your workload’s needs: maxSurge and maxUnavailable.
The same concepts as in Kubernetes apply.
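As a sketch, a worker pool that should never drop below its desired machine count during a rolling update could be configured as follows. The pool name and the chosen values are illustrative assumptions, not recommendations.

```yaml
# Sketch only; pool name and values are illustrative.
spec:
  provider:
    workers:
    - name: worker-pool-1   # hypothetical worker pool
      maxSurge: 1           # at most one additional machine is created during the roll
      maxUnavailable: 0     # no machine is removed before its replacement is available
```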
Additionally, you might customize how the machine-controller-manager (abbrev.: MCM; the component instrumenting this rolling update) is behaving. You can configure the following fields in .spec.provider.workers[].machineControllerManager:
- machineDrainTimeout: Timeout (in duration) used while draining of machine before deletion, beyond which MCM forcefully deletes the machine (default: 2h).
- machineHealthTimeout: Timeout (in duration) used while re-joining (in case of temporary health issues) of a machine before it is declared as failed (default: 10m).
- machineCreationTimeout: Timeout (in duration) used while joining (during creation) of a machine before it is declared as failed (default: 10m).
- maxEvictRetries: Maximum number of times evicts would be attempted on a pod before it is forcibly deleted during the draining of a machine (default: 10).
- nodeConditions: List of case-sensitive node-conditions which will change a machine to a Failed state after the machineHealthTimeout duration. It may further be replaced with a new machine if the machine is backed by a machine-set object (defaults: KernelDeadlock, ReadonlyFilesystem, DiskPressure).
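The fields above could be set on a worker pool as in the following sketch. The pool name is hypothetical, and the values simply repeat the defaults listed above, so this fragment should not change the behaviour by itself.

```yaml
# Sketch only; values mirror the documented defaults.
spec:
  provider:
    workers:
    - name: worker-pool-1   # hypothetical worker pool
      machineControllerManager:
        machineDrainTimeout: 2h       # forcefully delete the machine after this drain time
        machineHealthTimeout: 10m     # declare an unhealthy machine failed after this time
        machineCreationTimeout: 10m   # declare a machine that has not joined failed after this time
        maxEvictRetries: 10           # eviction attempts per pod before forced deletion
        nodeConditions:               # conditions that mark a machine as Failed
        - KernelDeadlock
        - ReadonlyFilesystem
        - DiskPressure
```

Workloads that need a long time to shut down gracefully are the typical reason to raise machineDrainTimeout.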
Rolling Update Triggers
Apart from the above mentioned triggers, a rolling update of the shoot worker nodes is also triggered for some changes to your worker pool specification (.spec.provider.workers[], even if you don’t change the Kubernetes or machine image version).
The complete list of fields that trigger a rolling update:
- .spec.kubernetes.version (except for patch version changes)
- .spec.provider.workers[].machine.image.name
- .spec.provider.workers[].machine.image.version
- .spec.provider.workers[].machine.type
- .spec.provider.workers[].volume.type
- .spec.provider.workers[].volume.size
- .spec.provider.workers[].providerConfig (except if feature gate NewWorkerPoolHash is enabled)
- .spec.provider.workers[].cri.name
- .spec.provider.workers[].kubernetes.version (except for patch version changes)
- .spec.systemComponents.nodeLocalDNS.enabled
- .status.credentials.rotation.certificateAuthorities.lastInitiationTime (changed by Gardener when a shoot CA rotation is initiated)
- .status.credentials.rotation.serviceAccountKey.lastInitiationTime (changed by Gardener when a shoot service account signing key rotation is initiated)
If feature gate NewWorkerPoolHash is enabled:
- .spec.kubernetes.kubelet.kubeReserved (unless a worker pool-specific value is set)
- .spec.kubernetes.kubelet.systemReserved (unless a worker pool-specific value is set)
- .spec.kubernetes.kubelet.evictionHard (unless a worker pool-specific value is set)
- .spec.kubernetes.kubelet.cpuManagerPolicy (unless a worker pool-specific value is set)
- .spec.provider.workers[].kubernetes.kubelet.kubeReserved
- .spec.provider.workers[].kubernetes.kubelet.systemReserved
- .spec.provider.workers[].kubernetes.kubelet.evictionHard
- .spec.provider.workers[].kubernetes.kubelet.cpuManagerPolicy
Changes to kubeReserved or systemReserved do not trigger a node roll if their sum does not change.
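The following sketch illustrates a change that keeps the sum constant. It assumes that the feature gate NewWorkerPoolHash is enabled and that the sum is compared per resource (here: memory). Both the pool name and the values are hypothetical.

```yaml
# Before the change (hypothetical values):
#   kubeReserved.memory: 1Gi, systemReserved.memory: 1Gi  -> sum 2Gi
# After the change: 1536Mi + 512Mi = 2048Mi = 2Gi, so the sum is unchanged
# and, under the assumptions above, no node roll is expected.
spec:
  provider:
    workers:
    - name: worker-pool-1   # hypothetical worker pool
      kubernetes:
        kubelet:
          kubeReserved:
            memory: 1536Mi
          systemReserved:
            memory: 512Mi
```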
Generally, the provider extension controllers might have additional constraints for changes leading to rolling updates, so please consult the respective documentation as well.
In particular, if the feature gate NewWorkerPoolHash is enabled and a worker pool uses the new hash, then the providerConfig as a whole is not included. Instead, only fields selected by the provider extension are considered.
Related Documentation
5 - Supported Kubernetes Versions
Supported Kubernetes Versions
Currently, Gardener supports the following Kubernetes versions:
Garden Clusters
The minimum version of a garden cluster that can be used to run Gardener is 1.25.x.
Seed Clusters
The minimum version of a seed cluster that can be connected to Gardener is 1.25.x.
Shoot Clusters
Gardener itself is capable of spinning up clusters with Kubernetes versions 1.25 up to 1.31.
However, the concrete versions that can be used for shoot clusters depend on the installed provider extension.
Consequently, please consult the documentation of your provider extension to see which Kubernetes versions are supported for shoot clusters.
👨🏼💻 Developers note: The Adding Support For a New Kubernetes Version topic explains what needs to be done in order to add support for a new Kubernetes version.
6 - Trigger Shoot Operations Through Annotations
Trigger Shoot Operations Through Annotations
You can trigger a few explicit operations by annotating the Shoot with an operation annotation.
This might allow you to induce certain behavior without the need to change the Shoot specification.
Some of the operations also cannot be caused by changing something in the shoot specification because they can’t properly be reflected there.
Note that once the triggered operation is considered by the controllers, the annotation will be automatically removed and you have to add it each time you want to trigger the operation.
Please note: If .spec.maintenance.confineSpecUpdateRollout=true, then the only way to trigger a shoot reconciliation is by setting the reconcile operation, see below.
Immediate Reconciliation
Annotate the shoot with gardener.cloud/operation=reconcile to make the gardenlet start a reconciliation operation without changing the shoot spec and possibly without being in its maintenance time window:
kubectl -n garden-<project-name> annotate shoot <shoot-name> gardener.cloud/operation=reconcile
Immediate Maintenance
Annotate the shoot with gardener.cloud/operation=maintain to make the gardener-controller-manager start maintaining your shoot immediately (possibly without being in its maintenance time window).
If no reconciliation starts, then nothing needs to be maintained:
kubectl -n garden-<project-name> annotate shoot <shoot-name> gardener.cloud/operation=maintain
Retry Failed Reconciliation
Annotate the shoot with gardener.cloud/operation=retry to make the gardenlet start a new reconciliation loop on a failed shoot.
Failed shoots are only reconciled again if a new Gardener version is deployed, the shoot specification is changed, or this annotation is set:
kubectl -n garden-<project-name> annotate shoot <shoot-name> gardener.cloud/operation=retry
Credentials Rotation Operations
Please consult Credentials Rotation for Shoot Clusters for more information.
Restart systemd Services on Particular Worker Nodes
It is possible to make Gardener restart particular systemd services on your shoot worker nodes if needed.
The annotation is not set on the Shoot resource but directly on the Node object you want to target.
For example, the following will restart both the kubelet and the containerd services:
kubectl annotate node <node-name> worker.gardener.cloud/restart-systemd-services=kubelet,containerd
It may take up to a minute until the service is restarted.
The annotation will be removed from the Node object after all specified systemd services have been restarted.
It will also be removed even if the restart of one or more services failed.
ℹ️ In the example mentioned above, you could additionally verify when/whether the kubelet restarted by using kubectl describe node <node-name> and looking for such a Starting kubelet event.
Force Deletion
When the ShootForceDeletion feature gate in the gardener-apiserver is enabled, users will be able to force-delete the Shoot. This is only possible if the Shoot fails to be deleted normally. For forceful deletion, the following conditions must be met:
- Shoot has a deletion timestamp.
- Shoot status contains at least one of the following ErrorCodes:
  - ERR_CLEANUP_CLUSTER_RESOURCES
  - ERR_CONFIGURATION_PROBLEM
  - ERR_INFRA_DEPENDENCIES
  - ERR_INFRA_UNAUTHENTICATED
  - ERR_INFRA_UNAUTHORIZED
If the above conditions are satisfied, you can annotate the Shoot with confirmation.gardener.cloud/force-deletion=true, and Gardener will clean up the Shoot control plane and the Shoot metadata.
⚠️ You MUST ensure that all the resources created in the IaaS account are cleaned up to prevent orphaned resources. Gardener will NOT delete any resources in the underlying infrastructure account. Hence, use this annotation at your own risk and only if you are fully aware of these consequences.
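For reference, the force-deletion annotation sits in the Shoot metadata as sketched below. This is a fragment that only shows the placement of the annotation, not a complete manifest. The apiVersion is an assumption, and the name and namespace are placeholders.

```yaml
# Fragment only; shows where the annotation is placed on the Shoot.
apiVersion: core.gardener.cloud/v1beta1   # assumed API version
kind: Shoot
metadata:
  name: <shoot-name>
  namespace: garden-<project-name>
  annotations:
    confirmation.gardener.cloud/force-deletion: "true"   # annotation values are strings
```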