Shoot Updates and Upgrades
This document describes what happens during shoot updates (changes incorporated in a newly deployed Gardener version) and during shoot upgrades (changes to versions that end-users control).
Updates to all aspects of the shoot cluster happen when the gardenlet reconciles the Shoot resource.
When are Reconciliations Triggered
Generally, when you change the specification of your Shoot, the reconciliation will start immediately, potentially updating your cluster.
Please note that you can also confine the reconciliation triggered due to your specification updates to the cluster’s maintenance time window. Please find more information here.
You can also annotate your shoot with special operation annotations (see this document) which will cause the reconciliation to start due to your actions.
There is also an automatic reconciliation by Gardener.
The period, i.e., how often it is performed, depends on the configuration of the Gardener administrators/operators.
In some Gardener installations the operators might enable “reconciliation in maintenance time window only” (more information), which will result in at least one reconciliation during the shoot’s configured maintenance time window.
Which Updates are Applied
As end-users can only control the Shoot resource’s specification but not the Gardener version in use, they don’t have any influence on which of the updates are rolled out (other than those settings configurable in the Shoot).
A Gardener operator can deploy a new Gardener version at any point in time.
Any subsequent reconciliation of Shoots will update them by rolling out the changes incorporated in this new Gardener version.
Some examples for such shoot updates are:
- Add a new/remove an old component to/from the shoot’s control plane running in the seed, or to/from the shoot’s system components running on the worker nodes.
- Change the configuration of an existing control plane/system component.
- Restart of existing control plane/system components (this might result in a short unavailability of the Kubernetes API server, e.g., when etcd or a kube-apiserver itself is being restarted)
Generally, some of these updates (e.g., configuration changes) could theoretically result in different behaviour of controllers. If such a change is backwards-incompatible, we usually follow one of these approaches (depending on the concrete change):
- Only apply the change for new clusters.
- Expose a new field in the Shoot resource that lets users control this changed behaviour to enable it at a convenient point in time.
- Put the change behind an alpha feature gate (disabled by default) in the gardenlet (only controllable by Gardener operators), which will be promoted to beta (enabled by default) in subsequent releases. In this case, end-users have no influence on when the behaviour changes; Gardener operators should inform their end-users and provide clear timelines for when they will enable the feature gate.
We consider a shoot upgrade to be a change to any of the following:
- Kubernetes version (.spec.kubernetes.version)
- Kubernetes version of the worker pool if specified (.spec.provider.workers.kubernetes.version)
- Machine image version of at least one worker pool
Generally, an upgrade is also performed through a reconciliation of the Shoot resource, i.e., the same concepts as for shoot updates apply.
If an end-user triggers an upgrade (e.g., by changing the Kubernetes version) after a new Gardener version was deployed but before the shoot was reconciled again, then this upgrade might incorporate the changes delivered with this new Gardener version.
In-Place vs. Rolling Updates
If the Kubernetes patch version is changed then the upgrade happens in-place.
This means that the shoot worker nodes remain untouched and only the kubelet process restarts with the new Kubernetes version binary.
The same applies for configuration changes of the kubelet.
If the Kubernetes minor version is changed then the upgrade is done in a “rolling update” fashion, similar to how pods in Kubernetes are updated (when backed by a
The worker nodes will be terminated one after another and replaced by new machines.
The existing workload is gracefully drained and evicted from the old worker nodes to new worker nodes, respecting the configured PodDisruptionBudgets (see Kubernetes documentation).
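The distinction between in-place and rolling upgrades can be sketched as a comparison of version strings. This is a minimal illustration, not Gardener's implementation: the function names `parse_version` and `upgrade_mode` are hypothetical, versions are assumed to have the form `major.minor.patch`, and the sketch does not check that the target is newer than the current version.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string (optional leading 'v') into integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def upgrade_mode(current: str, target: str) -> str:
    """Classify a Kubernetes version change as described in the text above.

    Returns 'none' if nothing changes, 'in-place' if only the patch version
    differs, and 'rolling' if the minor (or major) version differs.
    """
    old, new = parse_version(current), parse_version(target)
    if old == new:
        return "none"
    if old[:2] == new[:2]:
        return "in-place"  # only the patch version differs: kubelet restarts
    return "rolling"  # worker nodes are replaced one after another
```

For example, `upgrade_mode("1.27.3", "1.27.5")` yields `"in-place"`, while `upgrade_mode("1.27.5", "1.28.0")` yields `"rolling"`.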
Customize Rolling Update Behaviour of Shoot Worker Nodes
The .spec.provider.workers list exposes two fields that you might configure based on your workload’s needs:
The same concepts as in Kubernetes apply.
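To make the Kubernetes analogy concrete, the sketch below computes how many worker nodes may exist, and how many must stay available, during a rolling update. It rests on assumptions: the two field names are not visible in the text above, so the sketch uses `maxSurge` and `maxUnavailable` as in the Kubernetes Deployment rolling update strategy, and it applies the Deployment rounding rules (percentages round up for surge, down for unavailability). Whether the worker pool rollout rounds identically is not confirmed here. The helper names `resolve` and `rolling_bounds` are hypothetical.

```python
from typing import Tuple, Union

IntOrPercent = Union[int, str]  # e.g. 1 or "25%"


def resolve(value: IntOrPercent, desired: int, round_up: bool) -> int:
    """Turn an absolute number or a percentage string into a node count."""
    if isinstance(value, int):
        return value
    scaled = int(value.rstrip("%")) * desired  # percent * desired, still x100
    if round_up:
        return -(-scaled // 100)  # integer ceiling division
    return scaled // 100  # integer floor division


def rolling_bounds(
    desired: int, max_surge: IntOrPercent, max_unavailable: IntOrPercent
) -> Tuple[int, int]:
    """Return (maximum total nodes, minimum available nodes) during a rollout.

    Assumes Deployment-style semantics: surge rounds up, unavailable rounds down.
    """
    surge = resolve(max_surge, desired, round_up=True)
    unavailable = resolve(max_unavailable, desired, round_up=False)
    return desired + surge, max(desired - unavailable, 0)
```

With five desired nodes, a surge of 1 and an unavailability of 0, the pool may grow to six nodes and never drops below five available ones.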
Additionally, you might customize how the machine-controller-manager (abbrev.: MCM; the component instrumenting this rolling update) behaves. You can configure the following fields:
machineDrainTimeout: Timeout (in duration) used while draining of machine before deletion, beyond which MCM forcefully deletes machine (default:
machineHealthTimeout: Timeout (in duration) used while re-joining (in case of temporary health issues) of machine before it is declared as failed (default:
machineCreationTimeout: Timeout (in duration) used while joining (during creation) of machine before it is declared as failed (default:
maxEvictRetries: Maximum number of times evicts would be attempted on a pod before it is forcibly deleted during draining of a machine (default:
nodeConditions: List of case-sensitive node-conditions which will change a machine to a Failed state after the machineHealthTimeout duration. It may further be replaced with a new machine if the machine is backed by a machine-set object (defaults:
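The meaning of maxEvictRetries can be illustrated with a toy model of draining a single pod. This is only a sketch of the semantics stated above, not MCM's code: `drain_pod` and its callback parameters are hypothetical, and the sketch does not model waiting between attempts or machineDrainTimeout.

```python
from typing import Callable


def drain_pod(
    evict: Callable[[], bool],
    force_delete: Callable[[], None],
    max_evict_retries: int,
) -> str:
    """Try to evict a pod up to max_evict_retries times, then delete it forcibly.

    `evict` returns True when the eviction succeeded (for example, when no
    PodDisruptionBudget blocked it). `force_delete` removes the pod regardless.
    """
    for _ in range(max_evict_retries):
        if evict():
            return "evicted"
    # Every eviction attempt failed: fall back to forced deletion.
    force_delete()
    return "force-deleted"
```

A pod whose eviction keeps being rejected is therefore deleted forcibly once the configured number of attempts is exhausted, while a pod that can be evicted never reaches the forced path.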
Rolling Update Triggers
Apart from the above-mentioned triggers, a rolling update of the shoot worker nodes is also triggered for some changes to your worker pool specification (.spec.provider.workers, even if you don’t change the Kubernetes or machine image version).
The complete list of fields that trigger a rolling update:
- .spec.kubernetes.version (except for patch version changes)
- .spec.provider.workers.kubernetes.version (except for patch version changes)
- .status.credentials.rotation.certificateAuthorities.lastInitiationTime (changed by Gardener when a shoot CA rotation is initiated)
- .status.credentials.rotation.serviceAccountKey.lastInitiationTime (changed by Gardener when a shoot service account signing key rotation is initiated)
Generally, the provider extension controllers might have additional constraints for changes leading to rolling updates, so please consult the respective documentation as well.
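The trigger list above can be expressed as a small check over two snapshots of a shoot. The sketch makes several assumptions: snapshots are represented as flat dictionaries mapping a field path to its value (a hypothetical representation, not the Shoot API), only the four fields listed above are considered, and provider-specific constraints are ignored. The function names are illustrative.

```python
from typing import Dict, List, Tuple

# Fields for which a patch-only version change does not trigger a rolling update.
VERSION_FIELDS = {
    ".spec.kubernetes.version",
    ".spec.provider.workers.kubernetes.version",
}
# Fields for which any change triggers a rolling update.
ROTATION_FIELDS = {
    ".status.credentials.rotation.certificateAuthorities.lastInitiationTime",
    ".status.credentials.rotation.serviceAccountKey.lastInitiationTime",
}


def major_minor(version: str) -> Tuple[str, str]:
    """Return the (major, minor) part of a 'major.minor.patch' version string."""
    major, minor = version.split(".")[:2]
    return major, minor


def rolling_update_triggers(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List the field paths whose change between old and new triggers a rollout."""
    triggers = []
    for path in sorted(VERSION_FIELDS | ROTATION_FIELDS):
        before, after = old.get(path), new.get(path)
        if before == after:
            continue
        if path in VERSION_FIELDS and before and after:
            if major_minor(before) == major_minor(after):
                continue  # patch version change only: handled in-place
        triggers.append(path)
    return triggers
```

Changing `.spec.kubernetes.version` from `1.27.3` to `1.27.5` yields an empty list, whereas changing it to `1.28.0` yields a list containing that path.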