Checklist For Adding New Components
Adding new components that run in the garden, seed, or shoot cluster is theoretically quite simple - we just need a
Deployment (or other similar workload resource), the respective container image, and maybe a bit of configuration.
In practice, however, there are a couple of things to keep in mind in order to make the deployment production-ready.
This document provides a checklist for them that you can walk through.
Avoid usage of Helm charts (example)
Nowadays, we use Golang components instead of Helm charts for deploying components to a cluster. Please find a typical structure of such components in the provided metrics_server.go file (configuration values are typically managed in a Values structure). There are a few exceptions (e.g., Istio) still using charts; however, the default should be a Golang-based implementation. For the exceptional cases, use Golang’s embed package to embed the Helm chart directory (example 1, example 2).
For historic reasons, resources related to shoot control plane components are applied directly with the client. All other resources (seed or shoot system components) are deployed via gardener-resource-manager’s Resource controller (ManagedResources) since it performs health checks out-of-the-box and has a lot of other features (see its documentation for more information). Components that can run as either a seed system component or a shoot control plane component (e.g., VPA or kube-state-metrics) can make use of these utility functions.
Secrets are immutable for modification and have a unique name. This has a couple of benefits, e.g. the kubelet doesn’t watch these resources, and it is always clear which resource contains which data since it cannot be changed. As a consequence, unique/immutable Secrets are superior to checksum annotations on the pod templates. Stale/unused Secrets are garbage-collected by gardener-resource-manager’s GarbageCollector. There are utility functions (see examples above) for using unique Secrets in Golang components. It is essential to inject the annotations into the workload resource to make the garbage collection work.
Note that some Secrets should not be unique (e.g., those containing monitoring or logging configuration). The reason is that the old revision stays in the cluster even if unused until the garbage-collector acts. During this time, they would be wrongly aggregated to the full configuration.
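The idea behind unique names can be illustrated with a minimal sketch. It assumes that the name is derived from the content by appending a short checksum; uniqueSecretName is a hypothetical helper written for this illustration and is not one of Gardener's utility functions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// uniqueSecretName is a hypothetical helper (not Gardener's actual utility):
// it appends a short content checksum to the base name, so any change to the
// data yields a new name instead of a modification of the existing object.
func uniqueSecretName(base string, data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for a stable hash

	h := sha256.New()
	for _, k := range keys {
		// Length-prefix key and value so that different splits of the same
		// bytes cannot produce the same hash input.
		fmt.Fprintf(h, "%d:%s%d:", len(k), k, len(data[k]))
		h.Write(data[k])
	}
	return base + "-" + hex.EncodeToString(h.Sum(nil))[:8]
}

func main() {
	v1 := uniqueSecretName("my-component-config", map[string][]byte{"config.yaml": []byte("logLevel: info")})
	v2 := uniqueSecretName("my-component-config", map[string][]byte{"config.yaml": []byte("logLevel: debug")})
	// The two names share the base but carry different checksum suffixes.
	fmt.Println(v1)
	fmt.Println(v2)
}
```

Because the workload references the new name, a content change triggers a rolling update on its own, which is why no checksum annotation on the pod template is needed in this model.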
You should use the secrets manager for the management of any kind of credentials. This makes sure that credentials rotation works out-of-the-box without you having to think about it. Generally, do not use client certificates (see the Security section).
Consider hibernation when calculating replica count (example)
Shoot clusters can be hibernated, meaning that all control plane components in the shoot namespace in the seed cluster are scaled down to zero and all worker nodes are terminated. If your component runs in the seed cluster, then you have to consider this case and provide the proper replica count. There is a utility function available (see example).
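As a rough sketch of what such a replica calculation amounts to: the function below is a hypothetical stand-in, not the Gardener utility function, and its name and signature are assumptions made for illustration.

```go
package main

import "fmt"

// replicasFor is a hypothetical helper (not the Gardener utility function):
// a hibernated shoot gets zero replicas, otherwise the desired count is used.
func replicasFor(hibernated bool, desired int32) int32 {
	if hibernated {
		return 0
	}
	return desired
}

func main() {
	fmt.Println(replicasFor(false, 2)) // awake shoot: desired replica count
	fmt.Println(replicasFor(true, 2))  // hibernated shoot: scaled to zero
}
```

The result would typically be written to the replica field of the workload resource each time the component is deployed, so that waking up a cluster restores the regular count.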
Only define the minimum of needed dependency tasks in the shoot reconciliation/deletion flows.
Handle shoot system components
Shoot system components deployed by gardener-resource-manager are labelled with resource.gardener.cloud/managed-by: gardener. This makes Gardener add the required label selectors and tolerations so that non-
Pods will exclusively run on selected nodes (for more information, see System Components Webhook).
DaemonSets, on the other hand, should generally tolerate any NoExecute taints so that they can run on any Node, regardless of user-added taints.
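To make "tolerate any NoExecute taints" concrete, here is a simplified model of how a toleration is matched against a taint. The Taint and Toleration types are local stand-ins for the Kubernetes API types, and tolerates is an illustrative reimplementation of the matching rules, not the upstream function.

```go
package main

import "fmt"

// Taint and Toleration are minimal local stand-ins for the Kubernetes types.
type Taint struct{ Key, Value, Effect string }
type Toleration struct{ Key, Operator, Value, Effect string }

// tolerates is a simplified sketch of the Kubernetes matching rules:
// an empty effect matches every effect, an empty key matches every key,
// "Exists" ignores the value, and "Equal" (the default) compares values.
func tolerates(tol Toleration, t Taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == t.Value
	}
	return false
}

func main() {
	// No key plus operator "Exists" tolerates every taint with this effect.
	anyNoExecute := Toleration{Operator: "Exists", Effect: "NoExecute"}

	userTaint := Taint{Key: "dedicated", Value: "gpu", Effect: "NoExecute"}
	fmt.Println(tolerates(anyNoExecute, userTaint))
}
```

Leaving the key empty is what makes the toleration independent of whatever taint keys users add to their nodes.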
Components that need to talk to the API server of their runtime cluster must always use a dedicated ServiceAccount (do not use
false. This makes gardener-resource-manager’s TokenInvalidator invalidate the static token secret and its ProjectedTokenMount webhook inject a projected token automatically.
Use shoot access tokens instead of client certificates (example)
For components that need to talk to a target cluster different from their runtime cluster (e.g., running in the seed cluster but talking to the shoot), gardener-resource-manager’s TokenRequestor should be used to manage a so-called “shoot access token”.
Define RBAC roles with minimal privileges (example)
The ServiceAccount (if it exists) should have as few privileges as possible. Consequently, please define proper RBAC roles for it. This might include a combination of
Roles. Please do not provide elevated privileges due to laziness (e.g., because there is already a ClusterRole that can be extended vs. creating a Role only when access to a single namespace is needed).
You should restrict both ingress and egress traffic to/from your component as much as possible to ensure that it only gets access to/from other components if really needed. Gardener provides a few default policies for typical usage scenarios. For more information, see Seed Network Policies and Shoot Network Policies.
Avoid running components with privileged=true. Instead, define the needed Linux capabilities.
The Seccomp profile will be defaulted by gardener-resource-manager’s SeccompProfile webhook, which works well for the majority of components. However, in some special cases you might need to overwrite it.
PodSecurityPolicys are deprecated; however, Gardener still supports shoot clusters with older Kubernetes versions (ref). To make sure that such clusters can run with .spec.kubernetes.allowPrivilegedContainers=false, you have to define proper PodSecurityPolicys. For more information, see Pod Security.
High Availability / Stability
Specify the component type label for high availability (example)
To support high-availability deployments, gardener-resource-manager’s HighAvailabilityConfig webhook injects the proper specification like replica or topology spread constraints. You only need to specify the type label. For more information, see High Availability Of Deployed Components.
Closely related to high availability but also to stability in general: The definition of a
maxUnavailable=1 should be provided by default.
Choose the right
Consider defining liveness and readiness probes (example)
To ensure smooth rolling update behaviour, consider the definition of liveness and/or readiness probes.
Provide resource requirements (example)
All components should have resource requirements. Generally, they should always request CPU and memory, while only memory shall be limited (no CPU limits!).
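The guideline can be expressed as a small check. This is only a sketch: Resources is a minimal local stand-in for a container's resource requirements, and checkResources is a hypothetical lint helper that does not exist in Gardener.

```go
package main

import "fmt"

// Resources is a minimal stand-in for a container's resource requirements;
// keys are resource names such as "cpu" and "memory".
type Resources struct {
	Requests map[string]string
	Limits   map[string]string
}

// checkResources is a hypothetical lint helper: it reports deviations from
// the guideline (request CPU and memory, never set a CPU limit).
func checkResources(r Resources) []string {
	var problems []string
	for _, name := range []string{"cpu", "memory"} {
		if _, ok := r.Requests[name]; !ok {
			problems = append(problems, "missing "+name+" request")
		}
	}
	if _, ok := r.Limits["cpu"]; ok {
		problems = append(problems, "cpu limit must not be set")
	}
	return problems
}

func main() {
	good := Resources{
		Requests: map[string]string{"cpu": "50m", "memory": "128Mi"},
		Limits:   map[string]string{"memory": "256Mi"},
	}
	bad := Resources{
		Requests: map[string]string{"cpu": "50m"},
		Limits:   map[string]string{"cpu": "100m"},
	}
	fmt.Println(checkResources(good))
	fmt.Println(checkResources(bad))
}
```

The quantities shown ("50m", "128Mi", "256Mi") are placeholder values, not recommendations.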
We typically perform vertical auto-scaling via the VPA managed by the Kubernetes community. Each component should have a respective VerticalPodAutoscaler with “min allowed” resources, “auto update mode”, and “requests only”-mode. VPA is always enabled in garden or seed clusters, while it is optional for shoot clusters.
HorizontalPodAutoscaler if needed (example)
If your component is capable of scaling horizontally, you should consider defining a HorizontalPodAutoscaler.
Observability / Operations Productivity
Components should provide scrape configuration and alerting rules for Prometheus/Alertmanager if appropriate. This should be done inside a dedicated monitoring.go file. Extensions should follow the guidelines described in Extensions Monitoring Integration.
Components should provide parsers and filters for fluent-bit, if appropriate. This should be done inside a dedicated logging.go file. Extensions should follow the guidelines described in Fluent-bit log parsers and filters.
In order to allow easy inspection of two
ReplicaSets to quickly find the changes that lead to a rolling update, the revision history limit should be set to
gardenlet’s care controllers regularly check the health status of system or control plane components. You need to enhance the lists of components to check if your component is related to the seed system or shoot control plane (shoot system components are automatically checked via their respective ManagedResource conditions), see examples above.
Gardener offers to restart components during the maintenance time window. For more information, see Restart Control Plane Controllers and Restart Some Core Addons. You can consider adding the needed label to your control plane component to get this automatic restart (probably not needed for most components).