Development
1 - Changing the API
Changing the API
This document describes the steps that need to be performed when changing the API. It provides guidance for API changes to both the Gardener system in general and the component configurations.
Generally, as Gardener is a Kubernetes-native extension, it follows the same API conventions and guidelines as Kubernetes itself. The Kubernetes API Conventions as well as the Changing the API topics already provide a good overview and general explanation of the basic concepts behind them. We are following the same approaches.
Gardener API
The Gardener API is defined in the pkg/apis/{core,extensions,settings}
directories and is the main point of interaction with the system.
It must be ensured that the API is always backwards-compatible.
Changing the API
Checklist when changing the API:
- Modify the field(s) in the respective Golang files of all external versions and the internal version.
  - Make sure new fields are being added as "optional" fields, i.e., they are of pointer types, they have the // +optional comment, and they have the omitempty JSON tag.
  - Make sure that the existing field numbers in the protobuf tags are not changed.
- If necessary, implement/adapt the conversion logic defined in the versioned APIs (e.g., pkg/apis/core/v1beta1/conversions*.go).
- If necessary, implement/adapt defaulting logic defined in the versioned APIs (e.g., pkg/apis/core/v1beta1/defaults*.go).
- Run the code generation: make generate
- If necessary, implement/adapt validation logic defined in the internal API (e.g., pkg/apis/core/validation/validation*.go).
- If necessary, adapt the exemplary YAML manifests of the Gardener resources defined in example/*.yaml.
- In most cases, it makes sense to add/adapt the documentation for administrators/operators and/or end-users in the docs folder to provide information on purpose and usage of the added/changed fields.
- When opening the pull request, always add a release note so that end-users become aware of the changes.
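To illustrate the points about optional fields and protobuf tags, a new field in an external API version could look roughly like the following sketch (HypotheticalSpec and NewField are made-up names for illustration, not actual Gardener types):

// HypotheticalSpec is a made-up example type showing how a new field is added to an external API version.
type HypotheticalSpec struct {
	// ExistingField keeps its already assigned protobuf tag number.
	ExistingField string `json:"existingField" protobuf:"bytes,1,opt,name=existingField"`

	// NewField is optional: it is of a pointer type, carries the +optional marker,
	// uses the omitempty JSON tag, and gets a protobuf tag number that has never been used before.
	// +optional
	NewField *string `json:"newField,omitempty" protobuf:"bytes,2,opt,name=newField"`
}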
Removing a Field
If fields shall be removed permanently from the API, then a proper deprecation period must be adhered to so that end-users have enough time to adapt their clients.
Once the deprecation period is over, the field should be dropped from the API in a two-step process, i.e., in two release cycles. In the first step, all the usages in the code base should be dropped. In the second step, the field should be dropped from the API. We need to follow this two-step process because there can be the case where gardener-apiserver is upgraded to a new version in which the field has been removed but other controllers are still on the old version of Gardener. This can lead to nil pointer exceptions or other unexpected behaviour.
The steps for removing a field from the code base are:
The field in the external version(s) has to be commented out with an appropriate doc string stating that the protobuf number of the corresponding field is reserved. Example:
- SeedTemplate *gardencorev1beta1.SeedTemplate `json:"seedTemplate,omitempty" protobuf:"bytes,2,opt,name=seedTemplate"`
+ // SeedTemplate is tombstoned to show why 2 is reserved protobuf tag.
+ // SeedTemplate *gardencorev1beta1.SeedTemplate `json:"seedTemplate,omitempty" protobuf:"bytes,2,opt,name=seedTemplate"`
The reasoning behind this is to prevent the same protobuf number being used by a new field. Introducing a new field with the same protobuf number would be a breaking change for clients still using the old protobuf definitions that have the old field for the given protobuf number. The field in the internal version can be removed.
A unit test has to be added to make sure that a new field does not reuse the already reserved protobuf tag.
An example of a field removal can be found in the Remove seedTemplate field from ManagedSeed API PR.
Component Configuration APIs
Most Gardener components have a component configuration that follows similar principles to the Gardener API.
Those component configurations are defined in pkg/{controllermanager,gardenlet,scheduler}, pkg/apis/config.
Hence, the above checklist also applies for changes to those APIs.
However, since these APIs are only used internally and only during the deployment of Gardener, the guidelines with respect to changes and backwards-compatibility are slightly relaxed.
If necessary, it is allowed to remove fields without a proper deprecation period if the release note uses the breaking operator keywords.
In addition to the above checklist:
- If necessary, adapt the Helm chart of Gardener defined in charts/gardener. Adapt the values.yaml file as well as the manifest templates.
2 - Component Checklist
Checklist For Adding New Components
Adding new components that run in the garden, seed, or shoot cluster is theoretically quite simple - we just need a Deployment
(or other similar workload resource), the respective container image, and maybe a bit of configuration.
In practice, however, there are a couple of things to keep in mind in order to make the deployment production-ready.
This document provides a checklist for them that you can walk through.
General
Avoid usage of Helm charts (example)
Nowadays, we use Golang components instead of Helm charts for deploying components to a cluster. Please find a typical structure of such components in the provided metrics_server.go file (configuration values are typically managed in a Values structure). There are a few exceptions (e.g., Istio) still using charts, however the default should be a Golang-based implementation. For the exceptional cases, use Golang's embed package to embed the Helm chart directory (example 1, example 2).
Choose the proper deployment way (example 1 (direct application w/ client), example 2 (using ManagedResource), example 3 (mixed scenario))
For historic reasons, resources related to shoot control plane components are applied directly with the client. All other resources (seed or shoot system components) are deployed via gardener-resource-manager's Resource controller (ManagedResources) since it performs health checks out-of-the-box and has a lot of other features (see its documentation for more information). Components that can run as both a seed system component and a shoot control plane component (e.g., VPA or kube-state-metrics) can make use of these utility functions.
Do not hard-code container image references (example 1, example 2, example 3)
We define all image references centrally in the charts/images.yaml file. Hence, the image references must not be hard-coded in the pod template spec but read from this so-called image vector instead.
Use unique ConfigMaps/Secrets (example 1, example 2)
Unique ConfigMaps/Secrets are immutable for modification and have a unique name. This has a couple of benefits, e.g. the kubelet doesn't watch these resources, and it is always clear which resource contains which data since it cannot be changed. As a consequence, unique/immutable ConfigMaps/Secrets are superior to checksum annotations on the pod templates. Stale/unused ConfigMaps/Secrets are garbage-collected by gardener-resource-manager's GarbageCollector. There are utility functions (see examples above) for using unique ConfigMaps/Secrets in Golang components. It is essential to inject the annotations into the workload resource to make the garbage-collection work.
Note that some ConfigMaps/Secrets should not be unique (e.g., those containing monitoring or logging configuration). The reason is that the old revision stays in the cluster even if unused until the garbage-collector acts. During this time, they would be wrongly aggregated to the full configuration.
Manage certificates/secrets via secrets manager (example)
You should use the secrets manager for the management of any kind of credentials. This makes sure that credentials rotation works out-of-the-box without you requiring to think about it. Generally, do not use client certificates (see the Security section).
Consider hibernation when calculating replica count (example)
Shoot clusters can be hibernated meaning that all control plane components in the shoot namespace in the seed cluster are scaled down to zero and all worker nodes are terminated. If your component runs in the seed cluster then you have to consider this case and provide the proper replica count. There is a utility function available (see example).
Ensure task dependencies are as precise as possible in shoot flows (example 1, example 2)
Only define the minimum of needed dependency tasks in the shoot reconciliation/deletion flows.
Handle shoot system components
Shoot system components deployed by gardener-resource-manager are labelled with resource.gardener.cloud/managed-by: gardener. This makes Gardener add the required label selectors and tolerations so that non-DaemonSet-managed Pods will exclusively run on selected nodes (for more information, see System Components Webhook). DaemonSets, on the other hand, should generally tolerate any NoSchedule or NoExecute taints so that they can run on any Node, regardless of user-added taints.
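For the DaemonSet guidance just above, a minimal hand-written sketch of such tolerations (a generic illustration, not taken from the Gardener code base) could look like this:

daemonSetTolerations := []corev1.Toleration{ // "k8s.io/api/core/v1"
	// Tolerate any NoSchedule taint, regardless of its key and value.
	{Operator: corev1.TolerationOpExists, Effect: corev1.TaintEffectNoSchedule},
	// Tolerate any NoExecute taint so the pods are not evicted from tainted nodes.
	{Operator: corev1.TolerationOpExists, Effect: corev1.TaintEffectNoExecute},
}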
Security
Use a dedicated ServiceAccount and disable auto-mount (example)
Components that need to talk to the API server of their runtime cluster must always use a dedicated ServiceAccount (do not use default), with automountServiceAccountToken set to false. This makes gardener-resource-manager's TokenInvalidator invalidate the static token secret and its ProjectedTokenMount webhook inject a projected token automatically.
Use shoot access tokens instead of client certificates (example)
For components that need to talk to a target cluster different from their runtime cluster (e.g., running in the seed cluster but talking to the shoot), the gardener-resource-manager's TokenRequestor should be used to manage a so-called "shoot access token".
Define RBAC roles with minimal privileges (example)
The component's ServiceAccount (if it exists) should have as few privileges as possible. Consequently, please define proper RBAC roles for it. This might include a combination of ClusterRoles and Roles. Please do not provide elevated privileges due to laziness (e.g., because there is already a ClusterRole that can be extended vs. creating a Role only when access to a single namespace is needed).
Use NetworkPolicies to restrict network traffic (example)
You should restrict both ingress and egress traffic to/from your component as much as possible to ensure that it only gets access to/from other components if really needed. Gardener provides a few default policies for typical usage scenarios. For more information, see Seed Network Policies and Shoot Network Policies.
Do not run components in privileged mode (example 1, example 2)
Avoid running components with privileged=true. Instead, define the needed Linux capabilities (see the sketch at the end of this section).
Choose the proper Seccomp profile (example 1, example 2)
The Seccomp profile will be defaulted by gardener-resource-manager's SeccompProfile webhook, which works well for the majority of components. However, in some special cases you might need to overwrite it.
Define PodSecurityPolicies (example)
PodSecurityPolicies are deprecated, however Gardener still supports shoot clusters with older Kubernetes versions (ref). To make sure that such clusters can run with .spec.kubernetes.allowPrivilegedContainers=false, you have to define proper PodSecurityPolicies. For more information, see Pod Security.
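To make the dedicated-ServiceAccount and privileged-mode points above a bit more concrete, here is a minimal hand-written sketch (the name my-component and the chosen settings are placeholders, not Gardener code):

var namespace string // namespace the component is deployed to

serviceAccount := &corev1.ServiceAccount{ // "k8s.io/api/core/v1"
	ObjectMeta: metav1.ObjectMeta{Name: "my-component", Namespace: namespace}, // "k8s.io/apimachinery/pkg/apis/meta/v1"
	// Disable auto-mount; a projected token is injected by gardener-resource-manager instead.
	AutomountServiceAccountToken: pointer.Bool(false), // "k8s.io/utils/pointer"
}

containerSecurityContext := &corev1.SecurityContext{
	// Never run privileged; only add the Linux capabilities that are really needed.
	Privileged:               pointer.Bool(false),
	AllowPrivilegeEscalation: pointer.Bool(false),
	Capabilities:             &corev1.Capabilities{Drop: []corev1.Capability{"ALL"}},
}

The security context is then referenced from the container in the workload's pod template.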
High Availability / Stability
Specify the component type label for high availability (example)
To support high-availability deployments, gardener-resource-manager's HighAvailabilityConfig webhook injects the proper specification like replica or topology spread constraints. You only need to specify the type label. For more information, see High Availability Of Deployed Components.
Define a PodDisruptionBudget (example)
Closely related to high availability but also to stability in general: the definition of a PodDisruptionBudget with maxUnavailable=1 should be provided by default (see the sketch at the end of this section).
Choose the right PriorityClass (example)
Each cluster runs many components with different priorities. Gardener provides a set of default PriorityClasses. For more information, see Priority Classes.
Consider defining liveness and readiness probes (example)
To ensure smooth rolling update behaviour, consider the definition of liveness and/or readiness probes.
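For the PodDisruptionBudget point above, a minimal sketch (names and labels are placeholders) could look like this:

var (
	namespace      string            // namespace the component is deployed to
	selectorLabels map[string]string // labels matching the component's pods
)

maxUnavailable := intstr.FromInt(1) // "k8s.io/apimachinery/pkg/util/intstr"
pdb := &policyv1.PodDisruptionBudget{ // "k8s.io/api/policy/v1"
	ObjectMeta: metav1.ObjectMeta{Name: "my-component", Namespace: namespace}, // "k8s.io/apimachinery/pkg/apis/meta/v1"
	Spec: policyv1.PodDisruptionBudgetSpec{
		MaxUnavailable: &maxUnavailable,
		Selector:       &metav1.LabelSelector{MatchLabels: selectorLabels},
	},
}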
Scalability
Provide resource requirements (example)
All components should have resource requirements. Generally, they should always request CPU and memory, while only memory shall be limited (no CPU limits!).
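A rough sketch of what this can look like in a Golang component; the concrete values are placeholders and always component-specific:

resources := corev1.ResourceRequirements{ // "k8s.io/api/core/v1"
	Requests: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("20m"), // "k8s.io/apimachinery/pkg/api/resource"
		corev1.ResourceMemory: resource.MustParse("64Mi"),
	},
	Limits: corev1.ResourceList{
		// Only memory is limited; CPU limits are intentionally omitted.
		corev1.ResourceMemory: resource.MustParse("256Mi"),
	},
}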
Define a VerticalPodAutoscaler (example)
We typically perform vertical auto-scaling via the VPA managed by the Kubernetes community. Each component should have a respective VerticalPodAutoscaler with "min allowed" resources, "auto update mode", and "requests only"-mode. VPA is always enabled in garden or seed clusters, while it is optional for shoot clusters.
Define a HorizontalPodAutoscaler if needed (example)
If your component is capable of scaling horizontally, you should consider defining a HorizontalPodAutoscaler.
Observability / Operations Productivity
Provide monitoring scrape config and alerting rules (example 1, example 2)
Components should provide scrape configuration and alerting rules for Prometheus/Alertmanager if appropriate. This should be done inside a dedicated monitoring.go file. Extensions should follow the guidelines described in Extensions Monitoring Integration.
Provide logging parsers and filters (example 1, example 2)
Components should provide parsers and filters for fluent-bit, if appropriate. This should be done inside a dedicated logging.go file. Extensions should follow the guidelines described in Fluent-bit log parsers and filters.
Set the revisionHistoryLimit to 2 for Deployments (example)
In order to allow easy inspection of two ReplicaSets to quickly find the changes that lead to a rolling update, the revision history limit should be set to 2
.Define health checks (example 1, example 2)
gardenlet
’s care controllers regularly check the health status of system or control plane components. You need to enhance the lists of components to check if your component related to the seed system or shoot control plane (shoot system components are automatically checked via their respectiveManagedResource
conditions), see examples above.Configure automatic restarts in shoot maintenance time window (example 1, example 2)
Gardener offers to restart components during the maintenance time window. For more information, see Restart Control Plane Controllers and Restart Some Core Addons. You can consider adding the needed label to your control plane component to get this automatic restart (probably not needed for most components).
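For the revisionHistoryLimit rule above, only a single field needs to be set on the Deployment; a minimal sketch (everything else omitted):

deployment := &appsv1.Deployment{ // "k8s.io/api/apps/v1"
	Spec: appsv1.DeploymentSpec{
		// Keep two ReplicaSets so the previous and the current revision can be compared.
		RevisionHistoryLimit: pointer.Int32(2), // "k8s.io/utils/pointer"
		// ... remaining deployment spec
	},
}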
3 - Dependencies
Dependency Management
We are using go modules for dependency management.
In order to add a new package dependency to the project, you can perform go get <PACKAGE>@<VERSION>
or edit the go.mod
file and append the package along with the version you want to use.
Updating Dependencies
The Makefile
contains a rule called revendor
which performs go mod tidy
and go mod vendor
:
- go mod tidy makes sure go.mod matches the source code in the module. It adds any missing modules necessary to build the current module's packages and dependencies, and it removes unused modules that don't provide any relevant packages.
- go mod vendor resets the main module's vendor directory to include all packages needed to build and test all the main module's packages. It does not include test code for vendored packages.
make revendor
The dependencies are installed into the vendor
folder, which should be added to the VCS.
⚠️ Make sure that you test the code after you have updated the dependencies!
Exported Packages
This repository contains several packages that could be considered “exported packages”, in a sense that they are supposed to be reused in other Go projects. For example:
- Gardener's API packages: pkg/apis
- Library for building Gardener extensions: extensions
- Gardener's Test Framework: test/framework
There are a few more folders in this repository (non-Go sources) that are reused across projects in the Gardener organization:
- GitHub templates: .github
- Concourse / cc-utils related helpers: hack/.ci
- Development, build and testing helpers: hack
These packages feature a dummy doc.go
file to allow other Go projects to pull them in as go mod dependencies.
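Such a dummy doc.go can be as small as the following sketch (the package name depends on the respective folder; this is an illustration, not the exact file content):

// Package hack contains development, build, and testing helpers.
// This file only exists so that other Go projects can pull the directory in as a go mod dependency.
package hack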
These packages are explicitly not supposed to be used in other projects (consider them as “non-exported”):
- API validation packages: pkg/apis/*/*/validation
- Operation package (main Gardener business logic regarding Seed and Shoot clusters): pkg/operation
- Third party code: third_party
Currently, we don’t have a mechanism yet for selectively syncing out these exported packages into dedicated repositories like kube’s staging mechanism (publishing-bot).
Import Restrictions
We want to make sure that other projects can depend on this repository’s “exported” packages without pulling in the entire repository (including “non-exported” packages) or a high number of other unwanted dependencies. Hence, we have to be careful when adding new imports or references between our packages.
ℹ️ General rule of thumb: the mentioned “exported” packages should be as self-contained as possible and depend on as few other packages in the repository and other projects as possible.
In order to support that rule and automatically check compliance with that goal, we leverage import-boss.
The tool checks all imports of the given packages (including transitive imports) against rules defined in .import-restrictions
files in each directory.
An import is allowed if it matches at least one allowed prefix and does not match any forbidden prefixes.
Note: '' (the empty string) is a prefix of everything. For more details, see the import-boss topic.
import-boss
is executed on every pull request and blocks the PR if it doesn’t comply with the defined import restrictions.
You can also run it locally using make check.
Import restrictions should be changed in the following situations:
- We spot a new pattern of imports across our packages that was not restricted before but makes it more difficult for other projects to depend on our “exported” packages. In that case, the imports should be further restricted to disallow such problematic imports, and the code/package structure should be reworked to comply with the newly given restrictions.
- We want to share code between packages, but existing import restrictions prevent us from doing so. In that case, please consider what additional dependencies it will pull in, when loosening existing restrictions. Also consider possible alternatives, like code restructurings or extracting shared code into dedicated packages for minimal impact on dependent projects.
4 - Getting Started Locally
Developing Gardener Locally
This document will walk you through running Gardener on your local machine for development purposes. If you encounter difficulties, please open an issue so that we can make this process easier.
Gardener runs in any Kubernetes cluster. In this guide, we will start a KinD cluster which is used as both garden and seed cluster (please refer to the architecture overview) for simplicity.
The Gardener components, however, will be run as regular processes on your machine (hence, no container images are being built).
Alternatives
When developing Gardener on your local machine you might face several limitations:
- Your machine doesn’t have enough compute resources (see prerequisites) for hosting a second seed cluster or multiple shoot clusters.
- Developing Gardener’s IPv6 features requires a Linux machine and native IPv6 connectivity to the internet, but you’re on macOS or don’t have IPv6 connectivity in your office environment or via your home ISP.
In these cases, you might want to check out one of the following options that run the setup described in this guide elsewhere for circumventing these limitations:
- remote local setup: develop on a remote pod for more compute resources
- dev box on Google Cloud: develop on a Google Cloud machine for more compute resource and/or simple IPv4/IPv6 dual-stack networking
Prerequisites
Make sure that you have followed the Local Setup guide up until the Get the sources step.
Make sure your Docker daemon is up-to-date, up and running and has enough resources (at least 4 CPUs and 4Gi memory; see here how to configure the resources for Docker for Mac).
Please note that 4 CPU / 4Gi memory might not be enough for more than one Shoot cluster, i.e., you might need to increase these values if you want to run additional Shoots. If you plan on following the optional steps to create a second seed cluster, the required resources will be more - at least 10 CPUs and 16Gi memory.
Additionally, please configure at least 120Gi of disk size for the Docker daemon.
Tip: With docker system df and docker system prune -a you can cleanup unused data.
Make sure the kind docker network is using the CIDR 172.18.0.0/16.
- If the network does not exist, it can be created with docker network create kind --subnet 172.18.0.0/16.
- If the network already exists, the CIDR can be checked with docker network inspect kind | jq '.[].IPAM.Config[].Subnet'. If it is not 172.18.0.0/16, delete the network with docker network rm kind and create it with the command above.
Make sure that you increase the maximum number of open files on your host:
- On Mac, run sudo launchctl limit maxfiles 65536 200000
- On Linux, extend the /etc/security/limits.conf file with
  * hard nofile 97816
  * soft nofile 97816
  and reload the terminal.
Setting Up the KinD Cluster (Garden and Seed)
make kind-up KIND_ENV=local
If you want to set up an IPv6 KinD cluster, use make kind-up IPFAMILY=ipv6 instead.
This command sets up a new KinD cluster named gardener-local
and stores the kubeconfig in the ./example/gardener-local/kind/local/kubeconfig
file.
It might be helpful to copy this file to $HOME/.kube/config, since you will need to target this KinD cluster multiple times. Alternatively, make sure to set your KUBECONFIG environment variable to ./example/gardener-local/kind/local/kubeconfig for all future steps via export KUBECONFIG=example/gardener-local/kind/local/kubeconfig.
All following steps assume that you are using this kubeconfig.
Additionally, this command also deploys a local container registry to the cluster, as well as a few registry mirrors that are set up as a pull-through cache for all upstream registries Gardener uses by default.
This is done to speed up image pulls across local clusters.
The local registry can be accessed as localhost:5001
for pushing and pulling.
The storage directories of the registries are mounted to the host machine under dev/local-registry
.
With this, mirrored images don’t have to be pulled again after recreating the cluster.
The command also deploys a default calico installation as the cluster’s CNI implementation with NetworkPolicy
support (the default kindnet
CNI doesn’t provide NetworkPolicy
support).
Furthermore, it deploys the metrics-server in order to support HPA and VPA on the seed cluster.
Outgoing IPv6 Single-Stack Networking (optional)
If you want to test IPv6-related features, you need to configure NAT for outgoing traffic from the kind network to the internet.
After make kind-up IPFAMILY=ipv6
, check the network created by kind:
$ docker network inspect kind | jq '.[].IPAM.Config[].Subnet'
"172.18.0.0/16"
"fc00:f853:ccd:e793::/64"
Determine which device is used for outgoing internet traffic by looking at the default route:
$ ip route show default
default via 192.168.195.1 dev enp3s0 proto dhcp src 192.168.195.34 metric 100
Configure NAT for traffic from the kind cluster to the internet using the IPv6 range and the network device from the previous two steps:
ip6tables -t nat -A POSTROUTING -o enp3s0 -s fc00:f853:ccd:e793::/64 -j MASQUERADE
Setting Up Gardener
In a terminal pane, run:
make dev-setup # preparing the environment (without webhooks for now)
kubectl wait --for=condition=ready pod -l run=etcd -n garden --timeout 2m # wait for etcd to be ready
make start-apiserver # starting gardener-apiserver
In a new terminal pane, run:
kubectl wait --for=condition=available apiservice v1beta1.core.gardener.cloud # wait for gardener-apiserver to be ready
make start-admission-controller # starting gardener-admission-controller
In a new terminal pane, run:
make dev-setup DEV_SETUP_WITH_WEBHOOKS=true # preparing the environment with webhooks
make start-controller-manager # starting gardener-controller-manager
(Optional): In a new terminal pane, run:
make start-scheduler # starting gardener-scheduler
In a new terminal pane, run:
make register-local-env # registering the local environment (CloudProfile, Seed, etc.)
make start-gardenlet SEED_NAME=local # starting gardenlet
In a new terminal pane, run:
make start-extension-provider-local # starting gardener-extension-provider-local
ℹ️ The provider-local is started with elevated privileges since it needs to manipulate your /etc/hosts file to enable you to access the created shoot clusters from your local machine, see this for more details.
Creating a Shoot Cluster
You can wait for the Seed
to become ready by running:
kubectl wait --for=condition=gardenletready --for=condition=extensionsready --for=condition=bootstrapped seed local --timeout=5m
Alternatively, you can run kubectl get seed local
and wait for the STATUS
to indicate readiness:
NAME STATUS PROVIDER REGION AGE VERSION K8S VERSION
local Ready local local 4m42s vX.Y.Z-dev v1.21.1
In order to create a first shoot cluster, just run:
kubectl apply -f example/provider-local/shoot.yaml
You can wait for the Shoot
to be ready by running:
kubectl wait --for=condition=apiserveravailable --for=condition=controlplanehealthy --for=condition=observabilitycomponentshealthy --for=condition=everynodeready --for=condition=systemcomponentshealthy shoot local -n garden-local --timeout=10m
Alternatively, you can run kubectl -n garden-local get shoot local
and wait for the LAST OPERATION
to reach 100%
:
NAME CLOUDPROFILE PROVIDER REGION K8S VERSION HIBERNATION LAST OPERATION STATUS AGE
local local local local 1.21.0 Awake Create Processing (43%) healthy 94s
(Optional): You could also execute a simple e2e test (creating and deleting a shoot) by running:
make test-e2e-local-simple KUBECONFIG="$PWD/example/gardener-local/kind/local/kubeconfig"
Once the shoot has been successfully created, you can access it as follows:
kubectl -n garden-local get secret local.kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d > /tmp/kubeconfig-shoot-local.yaml
kubectl --kubeconfig=/tmp/kubeconfig-shoot-local.yaml get nodes
(Optional): Setting Up a Second Seed Cluster
There are cases where you would want to create a second seed cluster in your local setup. For example, if you want to test the control plane migration feature. The following steps describe how to do that.
Add a new IP address on your loopback device which will be necessary for the new KinD cluster that you will create. On Mac, the default loopback device is lo0
.
sudo ip addr add 127.0.0.2 dev lo0 # adding 127.0.0.2 ip to the loopback interface
Next, setup the second KinD cluster:
make kind2-up KIND_ENV=local
This command sets up a new KinD cluster named gardener-local2
and stores its kubeconfig in the ./example/gardener-local/kind/local2/kubeconfig
file. You will need this file when starting the provider-local
extension controller for the second seed cluster.
make register-kind2-env # registering the local2 seed
make start-gardenlet SEED_NAME=local2 # starting gardenlet for the local2 seed
In a new terminal pane, run:
export KUBECONFIG=./example/gardener-local/kind/local2/kubeconfig # setting KUBECONFIG to point to second kind cluster
make start-extension-provider-local \
WEBHOOK_SERVER_PORT=9444 \
WEBHOOK_CERT_DIR=/tmp/gardener-extension-provider-local2 \
SERVICE_HOST_IP=127.0.0.2 \
METRICS_BIND_ADDRESS=:8082 \
HEALTH_BIND_ADDRESS=:8083 # starting gardener-extension-provider-local
If you want to perform a control plane migration you can follow the steps outlined in the Control Plane Migration topic to migrate the shoot cluster to the second seed you just created.
Deleting the Shoot Cluster
./hack/usage/delete shoot local garden-local
(Optional): Tear Down the Second Seed Cluster
make tear-down-kind2-env
make kind2-down
Tear Down the Gardener Environment
make tear-down-local-env
make kind-down
Remote Local Setup
Just like Prow is executing the KinD based integration tests in a K8s pod, it is possible to interactively run this KinD based Gardener development environment aka “local setup” in a “remote” K8s pod.
k apply -f docs/development/content/remote-local-setup.yaml
k exec -it deployment/remote-local-setup -- sh
tmux -u a
Caveats
Please refer to the TMUX documentation for working effectively inside the remote-local-setup pod.
To access Grafana, Prometheus, or other components in a browser, two port forwards are needed:
The port forward from the laptop to the pod:
k port-forward deployment/remote-local-setup 3000
The port forward in the remote-local-setup pod to the respective component:
k port-forward -n shoot--local--local deployment/grafana-operators 3000
5 - High Availability
High Availability of Deployed Components
gardenlet
s and extension controllers are deploying components via Deployment
s, StatefulSet
s, etc., as part of the shoot control plane, or the seed or shoot system components.
Some of the above component deployments must be further tuned to improve fault tolerance / resilience of the service. This document outlines what needs to be done to achieve this goal.
Please refer to the Convenient Application Of These Rules section if you want to take a shortcut to the list of actions that require developers' attention.
Seed Clusters
The worker nodes of seed clusters can be deployed to one or multiple availability zones.
The Seed
specification allows you to provide the information which zones are available:
spec:
provider:
region: europe-1
zones:
- europe-1a
- europe-1b
- europe-1c
Independent of the number of zones, seed system components like the gardenlet
or the extension controllers themselves, or others like etcd-druid
, dependency-watchdog
, etc., should always be running with multiple replicas.
Concretely, all seed system components should respect the following conventions:
Replica Counts
| Component Type | < 3 Zones | >= 3 Zones | Comment |
| --- | --- | --- | --- |
| Observability (Monitoring, Logging) | 1 | 1 | Downtimes accepted due to cost reasons |
| Controllers | 2 | 2 | / |
| (Webhook) Servers | 2 | 2 | / |

Apart from the above, there might be special cases where these rules do not apply, for example:
- istio-ingressgateway is scaled horizontally, hence the above numbers are the minimum values.
- nginx-ingress-controller in the seed cluster is used to advertise all shoot observability endpoints, so due to performance reasons it runs with 2 replicas at all times. In the future, this component might disappear in favor of the istio-ingressgateway anyways.
Topology Spread Constraints
When the component has >= 2 replicas …

… then it should also have a topologySpreadConstraint, ensuring the replicas are spread over the nodes:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    matchLabels: ...

Hence, the node spread is done on best-effort basis only.

… and the seed cluster has >= 2 zones, then the component should also have a second topologySpreadConstraint, ensuring the replicas are spread over the zones:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    matchLabels: ...
According to these conventions, even seed clusters with only one availability zone try to be highly available “as good as possible” by spreading the replicas across multiple nodes. Hence, while such seed clusters obviously cannot handle zone outages, they can at least handle node failures.
Shoot Clusters
The Shoot
specification allows configuring “high availability” as well as the failure tolerance type for the control plane components, see Highly Available Shoot Control Plane for details.
Regarding the seed cluster selection, the only constraint is that shoot clusters with failure tolerance type zone
are only allowed to run on seed clusters with at least three zones.
All other shoot clusters (non-HA or those with failure tolerance type node
) can run on seed clusters with any number of zones.
Control Plane Components
All control plane components should respect the following conventions:
Replica Counts
| Component Type | w/o HA | w/ HA (node) | w/ HA (zone) | Comment |
| --- | --- | --- | --- | --- |
| Observability (Monitoring, Logging) | 1 | 1 | 1 | Downtimes accepted due to cost reasons |
| Controllers | 1 | 2 | 2 | / |
| (Webhook) Servers | 2 | 2 | 2 | / |

Apart from the above, there might be special cases where these rules do not apply, for example:
- etcd is a server, though the most critical component of a cluster requiring a quorum to survive failures. Hence, it should have 3 replicas even when the failure tolerance is node only.
- kube-apiserver is scaled horizontally, hence the above numbers are the minimum values (even when the shoot cluster is not HA, there might be multiple replicas).
Topology Spread Constraints
When the component has >= 2 replicas …

… then it should also have a topologySpreadConstraint ensuring the replicas are spread over the nodes:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    matchLabels: ...

Hence, the node spread is done on best-effort basis only.
However, if the shoot cluster has defined a failure tolerance type, the whenUnsatisfiable field should be set to DoNotSchedule.

… and the failure tolerance type of the shoot cluster is zone, then the component should also have a second topologySpreadConstraint ensuring the replicas are spread over the zones:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    matchLabels: ...
Node Affinity
The gardenlet annotates the shoot namespace in the seed cluster with the high-availability-config.resources.gardener.cloud/zones annotation.

- If the shoot cluster is non-HA or has failure tolerance type node, then the value will be always exactly one zone (e.g., high-availability-config.resources.gardener.cloud/zones=europe-1b).
- If the shoot cluster has failure tolerance type zone, then the value will always contain exactly three zones (e.g., high-availability-config.resources.gardener.cloud/zones=europe-1a,europe-1b,europe-1c).

For backwards-compatibility, this annotation might contain multiple zones for shoot clusters created before gardener/gardener@v1.60 and not having failure tolerance type zone. This is because their volumes might already exist in multiple zones, hence pinning them to only one zone would not work.

Hence, in case this annotation is present, the components should have the following node affinity:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - europe-1a
            # - ...

This is to ensure all pods are running in the same (set of) availability zone(s) such that cross-zone network traffic is avoided as much as possible (such traffic is typically charged by the underlying infrastructure provider).
System Components
The availability of system components is independent of the control plane since they run on the shoot worker nodes while the control plane components run on the seed worker nodes (for more information, see the Kubernetes architecture overview).
Hence, it only depends on the number of availability zones configured in the shoot worker pools via .spec.provider.workers[].zones
.
Concretely, the highest number of zones of a worker pool with systemComponents.allow=true
is considered.
All system components should respect the following conventions:
Replica Counts
| Component Type | 1 or 2 Zones | >= 3 Zones |
| --- | --- | --- |
| Controllers | 2 | 2 |
| (Webhook) Servers | 2 | 2 |

Apart from the above, there might be special cases where these rules do not apply, for example:
- coredns is scaled horizontally (today), hence the above numbers are the minimum values (possibly, scaling these components vertically may be more appropriate, but that's unrelated to the HA subject matter).
- Optional addons like nginx-ingress or kubernetes-dashboard are only provided on best-effort basis for evaluation purposes, hence they run with 1 replica at all times.
Topology Spread Constraints
When the component has >= 2 replicas …

… then it should also have a topologySpreadConstraint ensuring the replicas are spread over the nodes:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    matchLabels: ...

Hence, the node spread is done on best-effort basis only.

… and the cluster has >= 2 zones, then the component should also have a second topologySpreadConstraint ensuring the replicas are spread over the zones:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    matchLabels: ...
Convenient Application of These Rules
According to above scenarios and conventions, the replicas
, topologySpreadConstraints
or affinity
settings of the deployed components might need to be adapted.
In order to apply those conveniently and easily for developers, Gardener installs a mutating webhook into both seed and shoot clusters which reacts on Deployment
s and StatefulSet
s deployed to namespaces with the high-availability-config.resources.gardener.cloud/consider=true
label set.
The following actions have to be taken by developers:
Check if
components
are prepared to run concurrently with multiple replicas, e.g. controllers usually use leader election to achieve this.All components should be generally equipped with
PodDisruptionBudget
s with.spec.maxUnavailable=1
:
spec:
maxUnavailable: 1
selector:
matchLabels: ...
- Add the label high-availability-config.resources.gardener.cloud/type to deployments or statefulsets, as well as optionally involved horizontalpodautoscalers or HVPAs, where the following two values are possible:
  - controller
  - server

Type server is also preferred if a component is a controller and (webhook) server at the same time.
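In a Golang component, setting this label could look roughly like the following sketch (the component name is a placeholder; the label key is the one documented above):

var namespace string // namespace the component is deployed to

deployment := &appsv1.Deployment{ // "k8s.io/api/apps/v1"
	ObjectMeta: metav1.ObjectMeta{ // "k8s.io/apimachinery/pkg/apis/meta/v1"
		Name:      "my-controller",
		Namespace: namespace,
		Labels: map[string]string{
			// Lets the HighAvailabilityConfig webhook inject replicas and topology spread constraints.
			"high-availability-config.resources.gardener.cloud/type": "controller",
		},
	},
	// ... remaining deployment spec
}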
You can read more about the webhook’s internals in High Availability Config.
gardenlet Internals
Make sure you have read the above document about the webhook internals before continuing reading this section.
Seed Controller
The gardenlet
performs the following changes on all namespaces running seed system components:
- adds the label high-availability-config.resources.gardener.cloud/consider=true.
- adds the annotation high-availability-config.resources.gardener.cloud/zones=<zones>, where <zones> is the list provided in .spec.provider.zones[] in the Seed specification.
Note that neither the high-availability-config.resources.gardener.cloud/failure-tolerance-type
, nor the high-availability-config.resources.gardener.cloud/zone-pinning
annotations are set, hence the node affinity would never be touched by the webhook.
The only exception to this rule are the istio ingress gateway namespaces. This includes the default istio ingress gateway when SNI is enabled, as well as analogous namespaces for exposure classes and zone-specific istio ingress gateways. Those namespaces
will additionally be annotated with high-availability-config.resources.gardener.cloud/zone-pinning
set to true
, resulting in the node affinities and the topology spread constraints being set. The replicas are not touched, as the istio ingress gateways
are scaled by a horizontal autoscaler instance.
Shoot Controller
Control Plane
The gardenlet
performs the following changes on the namespace running the shoot control plane components:
- adds the label high-availability-config.resources.gardener.cloud/consider=true. This makes the webhook mutate the replica count and the topology spread constraints.
- adds the annotation high-availability-config.resources.gardener.cloud/failure-tolerance-type with value equal to .spec.controlPlane.highAvailability.failureTolerance.type (or "", if .spec.controlPlane.highAvailability=nil). This makes the webhook mutate the node affinity according to the specified zone(s).
- adds the annotation high-availability-config.resources.gardener.cloud/zones=<zones>, where <zones> is a …
  - … random zone chosen from the .spec.provider.zones[] list in the Seed specification (always only one zone (even if there are multiple available in the seed cluster)) in case the Shoot has no HA setting (i.e., spec.controlPlane.highAvailability=nil) or when the Shoot has HA setting with failure tolerance type node.
  - … list of three randomly chosen zones from the .spec.provider.zones[] list in the Seed specification in case the Shoot has HA setting with failure tolerance type zone.
System Components
The gardenlet
performs the following changes on all namespaces running shoot system components:
- adds the label high-availability-config.resources.gardener.cloud/consider=true. This makes the webhook mutate the replica count and the topology spread constraints.
- adds the annotation high-availability-config.resources.gardener.cloud/zones=<zones>, where <zones> is the merged list of zones provided in .zones[] with systemComponents.allow=true for all worker pools in .spec.provider.workers[] in the Shoot specification.
Note that neither the high-availability-config.resources.gardener.cloud/failure-tolerance-type
, nor the high-availability-config.resources.gardener.cloud/zone-pinning
annotations are set, hence the node affinity would never be touched by the webhook.
6 - Kubernetes Clients
Kubernetes Clients in Gardener
This document aims at providing a general developer guideline on different aspects of using Kubernetes clients in a large-scale distributed system and project like Gardener. The points included here are not meant to be consulted as absolute rules, but rather as general rules of thumb that allow developers to get a better feeling about certain gotchas and caveats. It should be updated with lessons learned from maintaining the project and running Gardener in production.
Prerequisites:
Please familiarize yourself with the following basic Kubernetes API concepts first, if you’re new to Kubernetes. A good understanding of these basics will help you better comprehend the following document.
- Kubernetes API Concepts (including terminology, watch basics, etc.)
- Extending the Kubernetes API (including Custom Resources and aggregation layer / extension API servers)
- Extend the Kubernetes API with CustomResourceDefinitions
- Working with Kubernetes Objects
- Sample Controller (the diagram helps to build an understanding of a controller's basic structure)
Client Types: Client-Go, Generated, Controller-Runtime
For historical reasons, you will find different kinds of Kubernetes clients in Gardener:
Client-Go Clients
client-go is the default/official client for talking to the Kubernetes API in Golang.
It features the so called “client sets” for all built-in Kubernetes API groups and versions (e.g. v1
(aka core/v1
), apps/v1
, etc.).
client-go clients are generated from the built-in API types using client-gen and are composed of interfaces for every known API GroupVersionKind.
A typical client-go usage looks like this:
var (
ctx context.Context
c kubernetes.Interface // "k8s.io/client-go/kubernetes"
deployment *appsv1.Deployment // "k8s.io/api/apps/v1"
)
updatedDeployment, err := c.AppsV1().Deployments("default").Update(ctx, deployment, metav1.UpdateOptions{})
Important characteristics of client-go clients:
- clients are specific to a given API GroupVersionKind, i.e., clients are hard-coded to corresponding API-paths (don’t need to use the discovery API to map GVK to a REST endpoint path).
- clients don't modify the passed in-memory object (e.g. deployment in the above example). Instead, they return a new in-memory object.
This means that controllers have to continue working with the new in-memory object or overwrite the shared object to not lose any state updates.
Generated Client Sets for Gardener APIs
Gardener’s APIs extend the Kubernetes API by registering an extension API server (in the garden cluster) and CustomResourceDefinition
s (on Seed clusters), meaning that the Kubernetes API will expose additional REST endpoints to manage Gardener resources in addition to the built-in API resources.
In order to talk to these extended APIs in our controllers and components, client-gen is used to generate client-go-style clients to pkg/client/{core,extensions,seedmanagement,...}
.
Usage of these clients is equivalent to client-go
clients, and the same characteristics apply. For example:
var (
ctx context.Context
c gardencoreclientset.Interface // "github.com/gardener/gardener/pkg/client/core/clientset/versioned"
shoot *gardencorev1beta1.Shoot // "github.com/gardener/gardener/pkg/apis/core/v1beta1"
)
updatedShoot, err := c.CoreV1beta1().Shoots("garden-my-project").Update(ctx, shoot, metav1.UpdateOptions{})
Controller-Runtime Clients
controller-runtime is a Kubernetes community project (kubebuilder subproject) for building controllers and operators for custom resources. Therefore, it features a generic client that follows a different approach and does not rely on generated client sets. Instead, the client can be used for managing any Kubernetes resources (built-in or custom) homogeneously. For example:
var (
ctx context.Context
c client.Client // "sigs.k8s.io/controller-runtime/pkg/client"
deployment *appsv1.Deployment // "k8s.io/api/apps/v1"
shoot *gardencorev1beta1.Shoot // "github.com/gardener/gardener/pkg/apis/core/v1beta1"
)
err := c.Update(ctx, deployment)
// or
err = c.Update(ctx, shoot)
A brief introduction to controller-runtime and its basic constructs can be found at the official Go documentation.
Important characteristics of controller-runtime clients:
- The client functions take a generic client.Object or client.ObjectList value. These interfaces are implemented by all Golang types that represent Kubernetes API objects or lists respectively, which can be interacted with via usual API requests. [1]
- The client first consults a runtime.Scheme (configured during client creation) for recognizing the object's GroupVersionKind (this happens on the client-side only).
  A runtime.Scheme is basically a registry for Golang API types, defaulting and conversion functions. Schemes are usually provided per GroupVersion (see this example for apps/v1) and can be combined to one single scheme for further usage (example). In controller-runtime clients, schemes are used only for mapping a typed API object to its GroupVersionKind.
- It then consults a meta.RESTMapper (also configured during client creation) for mapping the GroupVersionKind to a RESTMapping, which contains the GroupVersionResource and Scope (namespaced or cluster-scoped). From these values, the client can unambiguously determine the REST endpoint path of the corresponding API resource. For instance: appsv1.DeploymentList is available at /apis/apps/v1/deployments or /apis/apps/v1/namespaces/<namespace>/deployments respectively.
  - There are different RESTMapper implementations, but generally they are talking to the API server's discovery API for retrieving RESTMappings for all API resources known to the API server (either built-in, registered via API extension or CustomResourceDefinitions).
  - The default implementation of controller-runtime (which Gardener uses as well) is the dynamic RESTMapper. It caches discovery results (i.e. RESTMappings) in-memory and only re-discovers resources from the API server when a client tries to use an unknown GroupVersionKind, i.e., when it encounters a No{Kind,Resource}MatchError.
- The client writes back results from the API server into the passed in-memory object.
- This means that controllers don’t have to worry about copying back the results and should just continue to work on the given in-memory object.
- This is a nice and flexible pattern, and helper functions should try to follow it wherever applicable. Meaning, if possible accept an object param, pass it down to clients and keep working on the same in-memory object instead of creating a new one in your helper function.
- The benefit is that you don’t lose updates to the API object and always have the last-known state in memory. Therefore, you don’t have to read it again, e.g., for getting the current
resourceVersion
when working with optimistic locking, and thus minimize the chances for running into conflicts. - However, controllers must not use the same in-memory object concurrently in multiple goroutines. For example, decoding results from the API server in multiple goroutines into the same maps (e.g., labels, annotations) will cause panics because of “concurrent map writes”. Also, reading from an in-memory API object in one goroutine while decoding into it in another goroutine will yield non-atomic reads, meaning data might be corrupt and represent a non-valid/non-existing API object.
- Therefore, if you need to use the same in-memory object in multiple goroutines concurrently (e.g., shared state), remember to leverage proper synchronization techniques like channels, mutexes,
atomic.Value
and/or copy the object prior to use. The average controller however, will not need to share in-memory API objects between goroutines, and it’s typically an indicator that the controller’s design should be improved.
- The client decoder erases the object's TypeMeta (apiVersion and kind fields) after retrieval from the API server, see kubernetes/kubernetes#80609, kubernetes-sigs/controller-runtime#1517. Unstructured and metadata-only request objects are an exception to this because the contained TypeMeta is the only way to identify the object's type. Because of this behavior, obj.GetObjectKind().GroupVersionKind() is likely to return an empty GroupVersionKind. I.e., you must not rely on TypeMeta being set or GetObjectKind() to return something usable.
  If you need to identify an object's GroupVersionKind, use a scheme and its ObjectKinds function instead (or the helper function apiutil.GVKForObject). This is not specific to controller-runtime clients and applies to client-go clients as well.
[1] Other lower-level, config, or internal API types (e.g., AdmissionReview) don't implement client.Object. However, you also can't interact with such objects via the Kubernetes API and thus also not via a client, so this can be disregarded at this point.
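For example, instead of relying on TypeMeta, the GroupVersionKind can be derived via the scheme; a sketch in the style of the snippets above:

var (
	scheme *runtime.Scheme // "k8s.io/apimachinery/pkg/runtime"
	obj    client.Object   // "sigs.k8s.io/controller-runtime/pkg/client"
)

// Derive the GroupVersionKind from the scheme instead of trusting obj.GetObjectKind().
gvk, err := apiutil.GVKForObject(obj, scheme) // "sigs.k8s.io/controller-runtime/pkg/client/apiutil"
if err != nil {
	return err
}
_ = gvk // use the GroupVersionKind, e.g., for logging or constructing metadata-only objects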
Metadata-Only Clients
Additionally, controller-runtime clients can be used to easily retrieve metadata-only objects or lists.
This is useful for efficiently checking if at least one object of a given kind exists, or retrieving metadata of an object, if one is not interested in the rest (e.g., spec/status).
The Accept
header sent to the API server then contains application/json;as=PartialObjectMetadataList;g=meta.k8s.io;v=v1
, which makes the API server only return metadata of the retrieved object(s).
This saves network traffic and CPU/memory load on the API server and client side.
If the client fully lists all objects of a given kind including their spec/status, the resulting list can be quite large and easily exceed the controller's available memory.
That's why it's important to carefully check if a full list is actually needed, or if a metadata-only list can be used instead.
For example:
var (
ctx context.Context
c client.Client // "sigs.k8s.io/controller-runtime/pkg/client"
shootList = &metav1.PartialObjectMetadataList{} // "k8s.io/apimachinery/pkg/apis/meta/v1"
)
shootList.SetGroupVersionKind(gardencorev1beta1.SchemeGroupVersion.WithKind("ShootList"))
if err := c.List(ctx, shootList, client.InNamespace("garden-my-project"), client.Limit(1)); err != nil {
return err
}
if len(shootList.Items) > 0 {
// project has at least one shoot
} else {
// project doesn't have any shoots
}
Gardener’s Client Collection, ClientMaps
The Gardener codebase has a collection of clients (kubernetes.Interface
), which can return all the above mentioned client types.
Additionally, it contains helpers for rendering and applying Helm charts (ChartRenderer, ChartApplier) and retrieving the API server's version (Version).
Client sets are managed by so-called ClientMaps, which are a form of registry for all client sets for a given type of cluster, i.e., Garden, Seed and Shoot.
ClientMaps manage the whole lifecycle of clients: they take care of creating them if they don’t exist already, running their caches, refreshing their cached server version and invalidating them when they are no longer needed.
var (
ctx context.Context
cm clientmap.ClientMap // "github.com/gardener/gardener/pkg/client/kubernetes/clientmap"
shoot *gardencorev1beta1.Shoot
)
cs, err := cm.GetClient(ctx, keys.ForShoot(shoot)) // kubernetes.Interface
if err != nil {
return err
}
c := cs.Client() // client.Client
The client collection mainly exists for historical reasons (there used to be a lot of code using the client-go style clients). However, Gardener is in the process of moving more towards controller-runtime and only using their clients, as they provide many benefits and are much easier to use. Also, gardener/gardener#4251 aims at refactoring our controller and admission components to native controller-runtime components.
⚠️ Please always prefer controller-runtime clients over other clients when writing new code or refactoring existing code.
Cache Types: Informers, Listers, Controller-Runtime Caches
Similar to the different types of client(set)s, there are also different kinds of Kubernetes client caches.
However, all of them are based on the same concept: Informer
s.
An Informer
is a watch-based cache implementation, meaning it opens watch connections to the API server and continuously updates cached objects based on the received watch events (ADDED
, MODIFIED
, DELETED
).
Informer
s offer to add indices to the cache for efficient object lookup (e.g., by name or labels) and to add EventHandler
s for the watch events.
The latter is used by controllers to fill queues with objects that should be reconciled on watch events.
Informers are used in and created via several higher-level constructs:
SharedInformerFactories, Listers
The generated clients (built-in as well as extended) feature a SharedInformerFactory
for every API group, which can be used to create and retrieve Informers
for all GroupVersionKinds.
Similarly, it can be used to retrieve Listers
that allow getting and listing objects from the Informer
’s cache.
However, both of these constructs are only used for historical reasons, and we are in the process of migrating away from them in favor of cached controller-runtime clients (see gardener/gardener#2414, gardener/gardener#2822). Thus, they are described only briefly here.
Important characteristics of Listers:
- Objects read from Informers and Listers can always be slightly out-of-date (i.e., stale) because the client has to first observe changes to API objects via watch events (which can intermittently lag behind by a second or even more).
- Thus, don’t make any decisions based on data read from Listers if the consequences of deciding wrongfully based on stale state might be catastrophic (e.g. leaking infrastructure resources). In such cases, read directly from the API server via a client instead.
- Objects retrieved from Informers or Listers are pointers to the cached objects, so they must not be modified without copying them first, otherwise the objects in the cache are also modified.
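A small sketch of the last point, following the style of the snippets above (the deployment name is a placeholder):

var deploymentLister appsv1listers.DeploymentLister // "k8s.io/client-go/listers/apps/v1"

cached, err := deploymentLister.Deployments("default").Get("my-deployment")
if err != nil {
	return err
}

// The returned object points into the informer's cache, so mutate a copy only.
deployment := cached.DeepCopy()
deployment.Spec.Replicas = pointer.Int32(2) // "k8s.io/utils/pointer"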
Controller-Runtime Caches
controller-runtime features a cache implementation that can be used equivalently as their clients. In fact, it implements a subset of the client.Client
interface containing the Get
and List
functions.
Under the hood, a cache.Cache
dynamically creates Informers
(i.e., opens watches) for every object GroupVersionKind that is being retrieved from it.
Note that the underlying Informers of a controller-runtime cache (cache.Cache
) and the ones of a SharedInformerFactory
(client-go) are not related in any way.
Both create Informers
and watch objects on the API server individually.
This means that if you read the same object from different cache implementations, you may receive different versions of the object because the watch connections of the individual Informers are not synced.
⚠️ Because of this, controllers/reconcilers should get the object from the same cache in the reconcile loop where the EventHandler was also added to set up the controller. For example, if a SharedInformerFactory is used for setting up the controller, then read the object in the reconciler from the Lister instead of from a cached controller-runtime client.
By default, the client.Client created by a controller-runtime Manager is a DelegatingClient. It delegates Get and List calls to a Cache, and all other calls to a client that talks directly to the API server. Exceptions are requests with *unstructured.Unstructured objects and object kinds that were configured to be excluded from the cache in the DelegatingClient.
ℹ️ kubernetes.Interface.Client() returns a DelegatingClient that uses the cache returned from kubernetes.Interface.Cache() under the hood. This means that all Client() usages need to be ready for cached clients and should be able to cope with stale cache reads.
Important characteristics of cached controller-runtime clients:
- Like for Listers, objects read from a controller-runtime cache can always be slightly out of date. Hence, don’t base any important decisions on data read from the cache (see above).
- In contrast to Listers, controller-runtime caches fill the passed in-memory object with the state of the object in the cache (i.e., they perform something like a “deep copy into”). This means that objects read from a controller-runtime cache can safely be modified without unintended side effects.
- Reading from a controller-runtime cache or a cached controller-runtime client implicitly starts a watch for the given object kind under the hood. This has important consequences:
- Reading a given object kind from the cache for the first time can take up to a few seconds depending on size and amount of objects as well as API server latency. This is because the cache has to do a full list operation and wait for an initial watch sync before returning results.
- ⚠️ Controllers need appropriate RBAC permissions for the object kinds they retrieve via cached clients (i.e., list and watch).
- ⚠️ By default, watches started by a controller-runtime cache are cluster-scoped, meaning it watches and caches objects across all namespaces. Thus, be careful which objects to read from the cache as it might significantly increase the controller's memory footprint.
- There is no interaction with the cache on writing calls (Create, Update, Patch and Delete), see below.
Uncached objects, filtered caches, APIReaders:
In order to allow more granular control over which object kinds should be cached and which calls should bypass the cache, controller-runtime offers a few mechanisms to further tweak the client/cache behavior:
- When creating a DelegatingClient, certain object kinds can be configured to always be read directly from the API instead of from the cache. Note that this does not prevent starting a new Informer when retrieving them directly from the cache.
- Watches can be restricted to a given (set of) namespace(s) by using cache.MultiNamespacedCacheBuilder or setting cache.Options.Namespace.
- Watches can be filtered (e.g., by label) per object kind by configuring cache.Options.SelectorsByObject on creation of the cache.
- Retrieving metadata-only objects or lists from a cache results in a metadata-only watch/cache for that object kind.
- The APIReader can be used to always talk directly to the API server for a given Get or List call (use with care and only as a last resort!).
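The following sketch shows how such options can be wired up when constructing a controller-runtime Manager. It is written against the controller-runtime version referenced in this document (the option names Namespace, SelectorsByObject, and ClientDisableCacheFor have been reworked in newer releases), and the concrete namespace, label, and object names are illustrative assumptions:

var (
	ctx context.Context
	cfg *rest.Config // assumed: a kubeconfig-based rest.Config
)

mgr, err := manager.New(cfg, manager.Options{
	// Restrict all watches/caches to a single namespace.
	Namespace: "garden",
	// Filter the cache per object kind, e.g. only watch Secrets with a given label.
	NewCache: cache.BuilderWithOptions(cache.Options{
		SelectorsByObject: cache.SelectorsByObject{
			&corev1.Secret{}: {Label: labels.SelectorFromSet(labels.Set{"gardener.cloud/role": "example"})},
		},
	}),
	// Always read Events directly from the API server instead of caching them.
	ClientDisableCacheFor: []client.Object{&corev1.Event{}},
})
if err != nil {
	// handle error
}

// The manager's APIReader always talks directly to the API server (no cache, no new watches).
secret := &corev1.Secret{}
err = mgr.GetAPIReader().Get(ctx, client.ObjectKey{Namespace: "garden", Name: "example-secret"}, secret)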
To Cache or Not to Cache
Although watch-based caches are an important factor for the immense scalability of Kubernetes, they definitely come at a price (mainly in terms of memory consumption). Thus, developers need to be careful when introducing new API calls and caching new object kinds. Here are some general guidelines on choosing whether to read from a cache or not:
- Always try to use the cache wherever possible and make your controller able to tolerate stale reads.
- Leverage optimistic locking: use deterministic naming for objects you create (this is what the Deployment controller does [2]).
- Leverage optimistic locking / concurrency control of the API server: send updates/patches with the last-known resourceVersion from the cache (see below). This will make the request fail if there were concurrent updates to the object (conflict error), which indicates that we have operated on stale data and might have made wrong decisions. In this case, let the controller handle the error with exponential backoff. This will make the controller eventually consistent.
- Track the actions you took, e.g., when creating objects with generateName (this is what the ReplicaSet controller does [3]). The actions can be tracked in memory and repeated if the expected watch events don't occur after a given amount of time.
- Always try to write controllers with the assumption that data will only be eventually correct and can be slightly out of date (even if read directly from the API server!).
- If there is already some other code that needs a cache (e.g., a controller watch), reuse it instead of doing extra direct reads.
- Don't read an object again if you just sent a write request. Write requests (Create, Update, Patch and Delete) don't interact with the cache. Hence, use the current state that the API server returned (filled into the passed in-memory object), which is basically a "free direct read", instead of reading the object again from a cache, because this will probably set back the object to an older resourceVersion.
- If you are concerned about the impact of the resulting cache, try to minimize that by using filtered or metadata-only watches.
- If watching and caching an object type is not feasible, for example because there will be a lot of updates and you are only interested in the object every ~5m, or because it will blow up the controller's memory footprint, fall back to a direct read. This can either be done by disabling caching for the object type generally or doing a single request via an APIReader. In any case, please bear in mind that every direct API call results in a quorum read from etcd, which can be costly in a heavily-utilized cluster and impose significant scalability limits. Thus, always try to minimize the impact of direct calls by filtering results by namespace or labels, limiting the number of results and/or using metadata-only calls.
[2] The Deployment controller uses the pattern <deployment-name>-<podtemplate-hash> for naming ReplicaSets. This means the name of a ReplicaSet it tries to create/update/delete at any given time is deterministically calculated based on the Deployment object. By this, it is insusceptible to stale reads from its ReplicaSets cache.
[3] In simple terms, the ReplicaSet controller tracks its CREATE pod actions as follows: when creating new Pods, it increases a counter of expected ADDED watch events for the corresponding ReplicaSet. As soon as such events arrive, it decreases the counter accordingly. It only creates new Pods for a given ReplicaSet once all expected events occurred (counter is back to zero) or a timeout has occurred. This way, it prevents creating more Pods than desired because of stale cache reads and makes the controller eventually consistent.
Conflicts, Concurrency Control, and Optimistic Locking
Every Kubernetes API object contains the metadata.resourceVersion field, which identifies an object's version in the backing data store, i.e., etcd. Every write to an object in etcd results in a newer resourceVersion.
This field is mainly used for concurrency control on the API server in an optimistic locking fashion, but also for efficient resumption of interrupted watch connections.
Optimistic locking in the Kubernetes API sense means that when a client wants to update an API object, it includes the object's resourceVersion in the request to indicate the object's version the modifications are based on.
If the resourceVersion in etcd has not changed in the meantime, the update request is accepted by the API server and the updated object is written to etcd.
If the resourceVersion sent by the client does not match the one of the object stored in etcd, there were concurrent modifications to the object. Consequently, the request is rejected with a conflict error (status code 409, API reason Conflict), for example:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "Operation cannot be fulfilled on configmaps \"foo\": the object has been modified; please apply your changes to the latest version and try again",
"reason": "Conflict",
"details": {
"name": "foo",
"kind": "configmaps"
},
"code": 409
}
This concurrency control is an important mechanism in Kubernetes as there are typically multiple clients acting on API objects at the same time (humans, different controllers, etc.). If a client receives a conflict error, it should read the object’s latest version from the API server, make the modifications based on the newest changes, and retry the update. The reasoning behind this is that a client might choose to make different decisions based on the concurrent changes made by other actors compared to the outdated version that it operated on.
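A minimal sketch of how such a conflict can be detected programmatically with a controller-runtime client (object and handling are illustrative):

var (
	ctx   context.Context
	c     client.Client
	shoot *gardencorev1beta1.Shoot
)

// Update with optimistic locking: the resourceVersion stored in shoot is sent along.
if err := c.Update(ctx, shoot); apierrors.IsConflict(err) {
	// HTTP 409: the object was modified concurrently since it was last read.
	// Read the latest version, re-apply the modification and try again - or,
	// in a controller, return the error and let the queue retry with backoff.
}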
Important points about concurrency control and conflicts:
- The resourceVersion field carries a string value and clients must not assume numeric values (the type and structure of versions depend on the backing data store). This means clients may compare resourceVersion values to detect whether objects were changed. But they must not compare resourceVersions to figure out which one is newer/older, i.e., no greater/less-than comparisons are allowed.
- By default, update calls (e.g. via client-go and controller-runtime clients) use optimistic locking, as the passed in-memory object usually contains the latest resourceVersion known to the controller, which is then also sent to the API server.
- API servers can also choose to accept update calls without optimistic locking (i.e., without a resourceVersion in the object's metadata) for any given resource. However, sending update requests without optimistic locking is strongly discouraged, as doing so overwrites the entire object, discarding any concurrent changes made to it.
- On the other hand, patch requests can always be executed either with or without optimistic locking, by (not) including the resourceVersion in the patched object's metadata. Sending patch requests without optimistic locking might be safe and even desirable, as a patch typically updates only a specific section of the object. However, there are also situations where patching without optimistic locking is not safe (see below).
Don’t Retry on Conflict
Similar to how a human would typically handle a conflict error, there are helper functions implementing RetryOnConflict-semantics, i.e., try an update call, then re-read the object if a conflict occurs, apply the modification again and retry the update.
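To make those semantics concrete, this is roughly what such a helper-based retry looks like when sketched with client-go's retry package (shown only to illustrate the pattern that the following paragraphs advise against; the object and modification are illustrative):

var (
	ctx   context.Context
	c     client.Client
	shoot *gardencorev1beta1.Shoot
)

err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
	// Re-read the latest version of the object...
	if err := c.Get(ctx, client.ObjectKeyFromObject(shoot), shoot); err != nil {
		return err
	}
	// ...blindly re-apply the modification and try the update again.
	shoot.Spec.Kubernetes.Version = "1.22"
	return c.Update(ctx, shoot)
})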
However, controllers should generally not use RetryOnConflict-semantics. Instead, controllers should abort their current reconciliation run and let the queue handle the conflict error with exponential backoff.
The reasoning behind this is that a conflict error indicates that the controller has operated on stale data and might have made wrong decisions earlier on in the reconciliation.
When using a helper function that implements RetryOnConflict-semantics, the controller doesn't check which fields were changed and doesn't revise its previous decisions accordingly.
Instead, retrying on conflict basically just ignores any conflict error and blindly applies the modification.
To properly solve the conflict situation, controllers should immediately return with the error from the update call. This will cause retries with exponential backoff so that the cache has a chance to observe the latest changes to the object. In a later run, the controller will then make correct decisions based on the newest version of the object, not run into conflict errors, and will then be able to successfully reconcile the object. This way, the controller becomes eventually consistent.
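In a reconciler, this simply means returning the error so that the request is requeued. A minimal sketch (the elided read/compute steps and the r.client field are assumptions about the reconciler's structure):

func (r *reconciler) Reconcile(ctx context.Context, request reconcile.Request) (reconcile.Result, error) {
	// ... read the object (here: shoot) from the cache and compute the desired changes ...

	if err := r.client.Update(ctx, shoot); err != nil {
		// Also on conflict errors: don't retry in place. Returning the error requeues
		// the request with exponential backoff, and the next reconciliation operates
		// on the cache's (by then updated) view of the object.
		return reconcile.Result{}, err
	}

	return reconcile.Result{}, nil
}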
The other way to solve the situation is to modify objects without optimistic locking in order to avoid running into a conflict in the first place (only if this is safe). This can be a preferable solution for controllers with long-running reconciliations (which is actually an anti-pattern but quite unavoidable in some of Gardener’s controllers). Aborting the entire reconciliation run is rather undesirable in such cases, as it will add a lot of unnecessary waiting time for end users and overhead in terms of compute and network usage.
However, in any case, retrying on conflict is probably not the right option to solve the situation (there are some correct use cases for it, though they are very rare). Hence, don't retry on conflict.
To Lock or Not to Lock
As explained before, conflicts are actually important and prevent clients from doing wrongful concurrent updates. This means that conflicts are not something we generally want to avoid or ignore. However, in many cases controllers are exclusive owners of the fields they want to update and thus it might be safe to run without optimistic locking.
For example, the gardenlet is the exclusive owner of the spec section of the Extension resources it creates on behalf of a Shoot (e.g., the Infrastructure resource for creating a VPC, etc.). Meaning, it knows the exact desired state and no other actor is supposed to update the Infrastructure's spec fields.
When the gardenlet now updates the Infrastructure's spec section as part of the Shoot reconciliation, it can simply issue a PATCH request that only updates the spec and runs without optimistic locking.
If another controller concurrently updated the object in the meantime (e.g., the status section), the resourceVersion got changed, which would cause a conflict error if running with optimistic locking.
However, concurrent status updates would not change the gardenlet's mind on the desired spec of the Infrastructure resource, as it is determined only by looking at the Shoot's specification.
If the spec section was changed concurrently, it's still fine to overwrite it because the gardenlet should reconcile the spec back to its desired state.
Generally speaking, if a controller is the exclusive owner of a given set of fields and they are independent of concurrent changes to other fields in that object, it can patch these fields without optimistic locking. This might ignore concurrent changes to other fields or blindly overwrite changes to the same fields, but this is fine if the mentioned conditions apply. Obviously, this applies only to patch requests that modify only a specific set of fields but not to update requests that replace the entire object.
In such cases, it’s even desirable to run without optimistic locking as it will be more performant and save retries. If certain requests are made with high frequency and have a good chance of causing conflicts, retries because of optimistic locking can cause a lot of additional network traffic in a large-scale Gardener installation.
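A sketch of such an exclusively-owned-fields patch without optimistic locking, here for an Infrastructure resource (the desired spec is assumed to be computed elsewhere):

var (
	ctx         context.Context
	c           client.Client
	infra       *extensionsv1alpha1.Infrastructure
	desiredSpec extensionsv1alpha1.InfrastructureSpec
)

// The controller exclusively owns .spec, so it patches it without optimistic
// locking: concurrent status updates by other actors won't cause conflicts, and
// concurrent spec changes are intentionally overwritten with the desired state.
patch := client.MergeFrom(infra.DeepCopy())
infra.Spec = desiredSpec
err := c.Patch(ctx, infra, patch)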
Updates, Patches, Server-Side Apply
There are different ways of modifying Kubernetes API objects. The following snippet demonstrates how to do a given modification with the most frequently used options using a controller-runtime client:
var (
ctx context.Context
c client.Client
shoot *gardencorev1beta1.Shoot
)
// update
shoot.Spec.Kubernetes.Version = "1.22"
err := c.Update(ctx, shoot)
// json merge patch
patch := client.MergeFrom(shoot.DeepCopy())
shoot.Spec.Kubernetes.Version = "1.22"
err = c.Patch(ctx, shoot, patch)
// strategic merge patch
patch = client.StrategicMergeFrom(shoot.DeepCopy())
shoot.Spec.Kubernetes.Version = "1.22"
err = c.Patch(ctx, shoot, patch)
Important characteristics of the shown request types:
- Update requests always send the entire object to the API server and update all fields accordingly. By default, optimistic locking is used (resourceVersion is included).
- Both patch types run without optimistic locking by default. However, it can be enabled explicitly if needed:

// json merge patch + optimistic locking
patch := client.MergeFromWithOptions(shoot.DeepCopy(), client.MergeFromWithOptimisticLock{})
// ...

// strategic merge patch + optimistic locking
patch = client.StrategicMergeFrom(shoot.DeepCopy(), client.MergeFromWithOptimisticLock{})
// ...

- Patch requests only contain the changes made to the in-memory object between the copy passed to client.*MergeFrom and the object passed to Client.Patch(). The diff is calculated on the client-side based on the in-memory objects only. This means that if in the meantime some fields were changed on the API server to a different value than the one on the client-side, the fields will not be changed back as long as they are not changed on the client-side as well (there will be no diff in memory).
- Thus, if you want to ensure a given state using patch requests, always read the object first before patching it, as there will be no diff otherwise, meaning the patch will be empty (see the sketch after this list). For more information, see gardener/gardener#4057 and the comments in gardener/gardener#4027.
- Also, always send updates and patch requests even if your controller hasn’t made any changes to the current state on the API server. I.e., don’t make any optimization for preventing empty patches or no-op updates. There might be mutating webhooks in the system that will modify the object and that rely on update/patch requests being sent (even if they are no-op). Gardener’s extension concept makes heavy use of mutating webhooks, so it’s important to keep this in mind.
- JSON merge patches always replace lists as a whole and don't merge them. Keep this in mind when operating on lists with merge patch requests. If the controller is the exclusive owner of the entire list, it's safe to run without optimistic locking. Though, if you want to prevent overwriting concurrent changes to the list or its items made by other actors (e.g., additions/removals to the metadata.finalizers list), enable optimistic locking.
- Strategic merge patches are able to make more granular modifications to lists and their elements without replacing the entire list. They use Golang struct tags of the API types to determine which and how lists should be merged. See Update API Objects in Place Using kubectl patch or the strategic merge patch documentation for more in-depth explanations and comparison with JSON merge patches.
With this, controllers might be able to issue patch requests for individual list items without optimistic locking, even if they are not exclusive owners of the entire list. Remember to check the patchStrategy and patchMergeKey struct tags of the fields you want to modify before blindly adding patch requests without optimistic locking.
- Strategic merge patches are only supported by built-in Kubernetes resources and custom resources served by Extension API servers. Strategic merge patches are not supported by custom resources defined by CustomResourceDefinitions (see this comparison). In that case, fall back to JSON merge patches.
- Server-side Apply is yet another mechanism to modify API objects, which is supported by all API resources (in newer Kubernetes versions). However, it has a few problems and more caveats preventing us from using it in Gardener at the time of writing. See gardener/gardener#4122 for more details.
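The following sketch illustrates the read-before-patch point from above: the object is read first so that the client-side diff is computed against the server's current state (the object key and version are illustrative):

var (
	ctx context.Context
	c   client.Client
)

// Read the current state first; otherwise a field that differs on the server
// but not in a stale in-memory copy produces no diff and is never changed back.
shoot := &gardencorev1beta1.Shoot{}
if err := c.Get(ctx, client.ObjectKey{Namespace: "garden-dev", Name: "my-shoot"}, shoot); err != nil {
	// handle error
}

patch := client.MergeFrom(shoot.DeepCopy())
shoot.Spec.Kubernetes.Version = "1.22"
err := c.Patch(ctx, shoot, patch)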
Generally speaking, patches are often the better option compared to update requests because they can save network traffic, encoding/decoding effort, and avoid conflicts under the presented conditions. If choosing a patch type, consider which type is supported by the resource you’re modifying and what will happen in case of a conflict. Consider whether your modification is safe to run without optimistic locking. However, there is no simple rule of thumb on which patch type to choose.
On Helper Functions
Here is a note on some helper functions that should be avoided and why:
controllerutil.CreateOrUpdate does a basic get, mutate and create or update call chain, which is often used in controllers. We should avoid using this helper function in Gardener, because it is likely to cause conflicts for cached clients and doesn't send no-op requests if nothing was changed, which can cause problems because of the heavy use of webhooks in Gardener extensions (see above).
That’s why usage of this function was completely replaced in gardener/gardener#4227 and similar PRs.
controllerutil.CreateOrPatch is similar to CreateOrUpdate but does a patch request instead of an update request. It has the same drawback as CreateOrUpdate regarding no-op updates.
Also, controllers can't use optimistic locking or strategic merge patches when using CreateOrPatch.
Another reason for avoiding use of this function is that it also implicitly patches the status section if it was changed, which is confusing for others reading the code. To accomplish this, the func does some back and forth conversion, comparison and checks, which are unnecessary in most of our cases and simply waste CPU cycles and add complexity we want to avoid.
There were some Try{Update,UpdateStatus,Patch,PatchStatus} helper functions in Gardener that were already removed by gardener/gardener#4378 but are still used in some extension code at the time of writing.
The reason for eliminating these functions is that they implement RetryOnConflict-semantics. Meaning, they first get the object, mutate it, then try to update and retry if a conflict error occurs.
As explained above, retrying on conflict is a controller anti-pattern and should be avoided in almost every situation.
The other problem with these functions is that they read the object first from the API server (always do a direct call), although in most cases we already have a recent version of the object at hand. So, using this function generally does unnecessary API calls and therefore causes unwanted compute and network load.
For the reasons explained above, there are similar helper functions that accomplish similar things but address the mentioned drawbacks: controllerutils.{GetAndCreateOrMergePatch,GetAndCreateOrStrategicMergePatch}.
These can be safely used as replacements for the aforementioned helper funcs.
If they are not fitting for your use case, for example because you need to use optimistic locking, just do the appropriate calls in the controller directly.
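Doing the calls directly is usually only a few lines, as in this sketch of a patch with optimistic locking based on an object already at hand (object and modification are illustrative):

var (
	ctx   context.Context
	c     client.Client
	shoot *gardencorev1beta1.Shoot
)

// Patch with optimistic locking, using the object (and resourceVersion) we
// already have - no extra GET from the API server required.
patch := client.MergeFromWithOptions(shoot.DeepCopy(), client.MergeFromWithOptimisticLock{})
shoot.Spec.Kubernetes.Version = "1.22"
err := c.Patch(ctx, shoot, patch)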
Related Links
- Kubernetes Client usage in Gardener (Community Meeting talk, 2020-06-26)
These resources are only partially related to the topics covered in this doc, but might still be interesting for developers seeking a deeper understanding of Kubernetes API machinery, architecture and foundational concepts.
7 - Local Setup
Overview
Conceptually, all Gardener components are designed to run as a Pod inside a Kubernetes cluster. The Gardener API server extends the Kubernetes API via the user-aggregated API server concepts. However, if you want to develop it, you may want to work locally with the Gardener without building a Docker image and deploying it to a cluster each and every time. That means that the Gardener runs outside a Kubernetes cluster, which requires providing a kubeconfig in your local filesystem and pointing the Gardener to it when starting it (see below).
Further details can be found in
This guide is split into three main parts:
- Preparing your setup by installing all dependencies and tools
- Building and starting Gardener components locally
- Using your local Gardener setup to create a Shoot
Preparing the Setup
[macOS only] Installing homebrew
The copy-paste instructions in this guide are designed for macOS and use the package manager Homebrew.
On macOS run
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Installing git
We use git
as VCS which you need to install. On macOS run
brew install git
For other OS, please check the Git installation documentation.
Installing Go
Install the latest version of Go. On macOS run
brew install go
For other OS, please check Go installation documentation.
Installing kubectl
Install kubectl. Please make sure that the version of kubectl is at least v1.20.x. On macOS run
brew install kubernetes-cli
For other OS, please check the kubectl installation documentation.
Installing Docker
You need to have docker installed and running. On macOS run
brew install --cask docker
For other OS please check the docker installation documentation.
Installing iproute2
iproute2 provides a collection of utilities for network administration and configuration. On macOS run
brew install iproute2mac
Installing jq
brew install jq
Installing GNU Parallel
GNU Parallel is a shell tool for executing jobs in parallel, used by the code generation scripts (make generate). On macOS run
brew install parallel
[macOS only] Install GNU Core Utilities
When running on macOS, install the GNU core utilities and friends:
brew install coreutils gnu-sed gnu-tar grep
This will create symbolic links for the GNU utilities with a g prefix in /usr/local/bin, e.g., gsed or gbase64. To allow using them without the g prefix, please put /usr/local/opt/coreutils/libexec/gnubin etc., at the beginning of your PATH environment variable, e.g., export PATH=/usr/local/opt/coreutils/libexec/gnubin:$PATH (brew will print out instructions for each installed formula).
export PATH=/usr/local/opt/coreutils/libexec/gnubin:$PATH
export PATH=/usr/local/opt/gnu-sed/libexec/gnubin:$PATH
export PATH=/usr/local/opt/gnu-tar/libexec/gnubin:$PATH
export PATH=/usr/local/opt/grep/libexec/gnubin:$PATH
[Windows Only] WSL2
Apart from Linux distributions and macOS, the local gardener setup can also run on the Windows Subsystem for Linux 2.
While WSL1, plain docker for Windows and various Linux distributions and local Kubernetes environments may be supported, this setup was verified with:
- WSL2
- Docker Desktop WSL2 Engine
- Ubuntu 18.04 LTS on WSL2
- Nodeless local garden (see below)
The Gardener repository and all the above-mentioned tools (git, golang, kubectl, …) should be installed in your WSL2 distro, according to the distribution-specific Linux installation instructions.
Start Gardener Locally
Get the Sources
Clone the repository from GitHub into your $GOPATH.
mkdir -p $(go env GOPATH)/src/github.com/gardener
cd $(go env GOPATH)/src/github.com/gardener
git clone git@github.com:gardener/gardener.git
cd gardener
Note: Gardener is using Go modules and cloning the repository into $GOPATH is not a hard requirement. However, it is still recommended to clone into $GOPATH because k8s.io/code-generator does not work yet outside of $GOPATH - kubernetes/kubernetes#86753.
Start the Gardener
ℹ️ In the following guide, you have to define the configuration (CloudProfiles, SecretBindings, Seeds, etc.) manually for the infrastructure environment you want to develop against.
Additionally, you have to register the respective Gardener extensions manually.
If you are rather looking for a quick start guide to develop entirely locally on your machine (no real cloud provider or infrastructure involved), then you should follow this guide.
Start a Local Kubernetes Cluster
For the development of Gardener you need a Kubernetes API server on which you can register Gardener's own Extension API Server as APIService. This cluster doesn't need any worker nodes to run pods, though; therefore, you can use the "nodeless Garden cluster setup" residing in hack/local-garden. This will start all minimally required components of a Kubernetes cluster (etcd, kube-apiserver, kube-controller-manager) and an etcd instance for the gardener-apiserver as Docker containers. This is the easiest way to get your Gardener development setup up and running.
Using the nodeless cluster setup
Use the provided Makefile rules to start your local Garden:
make local-garden-up
[...]
Starting gardener-dev kube-etcd cluster..!
Starting gardener-dev kube-apiserver..!
Starting gardener-dev kube-controller-manager..!
Starting gardener-dev gardener-etcd cluster..!
namespace/garden created
clusterrole.rbac.authorization.k8s.io/gardener.cloud:admin created
clusterrolebinding.rbac.authorization.k8s.io/front-proxy-client created
[...]
ℹ️ [Optional] If you want to develop the SeedAuthorization feature then you have to run make ACTIVATE_SEEDAUTHORIZER=true local-garden-up. However, please note that this forces you to start the gardener-admission-controller via make start-admission-controller.
To tear down the local Garden cluster and remove the Docker containers, simply run:
make local-garden-down
Alternative: Using a local Kubernetes cluster
Instead of starting a Kubernetes API server and etcd as docker containers, you can also opt for running a local Kubernetes cluster, provided by e.g. minikube, kind or docker desktop.
Note: Gardener requires self-contained kubeconfig files because of a security issue. You can configure your minikube to create self-contained kubeconfig files via:
minikube config set embed-certs true
or when starting the local cluster
minikube start --embed-certs
Alternative: Using a remote Kubernetes cluster
For some testing scenarios, you may want to use a remote cluster instead of a local one as your Garden cluster.
To do this, you can use the "remote Garden cluster setup" residing in hack/remote-garden. This will start an etcd instance for the gardener-apiserver as a Docker container, and open tunnels for accessing local gardener components from the remote cluster.
To avoid mistakes, the remote cluster must have a garden namespace labeled with gardener.cloud/purpose=remote-garden.
You must create the garden namespace and label it manually before running make remote-garden-up as described below.
Use the provided Makefile rules to bootstrap your remote Garden:
export KUBECONFIG=<path to kubeconfig>
make remote-garden-up
[...]
# Start gardener etcd used to store gardener resources (e.g., seeds, shoots)
Starting gardener-dev-remote gardener-etcd cluster!
[...]
# Open tunnels for accessing local gardener components from the remote cluster
[...]
To close the tunnels and remove the locally-running Docker containers, run:
make remote-garden-down
ℹ️ [Optional] If you want to use the remote Garden cluster setup with the SeedAuthorization feature, you have to adapt the kube-apiserver process of your remote Garden cluster. To do this, perform the following steps after running make remote-garden-up:
Create an authorization webhook configuration file using the IP of the garden/quic-server pod running in your remote Garden cluster and port 10444 that tunnels to your locally running gardener-admission-controller process.

apiVersion: v1
kind: Config
current-context: seedauthorizer
clusters:
- name: gardener-admission-controller
  cluster:
    insecure-skip-tls-verify: true
    server: https://<quic-server-pod-ip>:10444/webhooks/auth/seed
users:
- name: kube-apiserver
  user: {}
contexts:
- name: seedauthorizer
  context:
    cluster: gardener-admission-controller
    user: kube-apiserver
Change or add the following command line parameters to your kube-apiserver process:

--authorization-mode=<...>,Webhook
--authorization-webhook-config-file=<path to config file>
--authorization-webhook-cache-authorized-ttl=0
--authorization-webhook-cache-unauthorized-ttl=0
Delete the cluster role and rolebinding gardener.cloud:system:seeds from your remote Garden cluster.
If your remote Garden cluster is a Gardener shoot, and you can access the seed on which this shoot is scheduled, you can automate the above steps by running the enable-seed-authorizer script and passing the kubeconfig of the seed cluster and the shoot namespace as parameters:
hack/local-development/remote-garden/enable-seed-authorizer <seed kubeconfig> <namespace>
Note: This script is not working anymore, as the ReversedVPN feature can't be disabled. The annotation alpha.featuregates.shoot.gardener.cloud/reversed-vpn on Shoots is no longer respected.
To prevent Gardener from reconciling the shoot and overwriting your changes, add the annotation shoot.gardener.cloud/ignore: 'true' to the remote Garden shoot. Note that this annotation takes effect only if it is enabled via the controllers.shoot.respectSyncPeriodOverwrite: true option in the gardenlet configuration.
To disable the seed authorizer again, run the same script with -d as a third parameter:
hack/local-development/remote-garden/enable-seed-authorizer <seed kubeconfig> <namespace> -d
If the seed authorizer is enabled, you also have to start the gardener-admission-controller via make start-admission-controller.
⚠️ In the remote garden setup all Gardener components run with administrative permissions, i.e., there is no fine-grained access control via RBAC (as opposed to productive installations of Gardener).
Prepare the Gardener
Now that you have started your local cluster, we can go ahead and register the Gardener API Server.
Just point your KUBECONFIG environment variable to the cluster you created in the previous step and run:
make dev-setup
[...]
namespace/garden created
namespace/garden-dev created
deployment.apps/etcd created
service/etcd created
service/gardener-apiserver created
service/gardener-admission-controller created
endpoints/gardener-apiserver created
endpoints/gardener-admission-controller created
apiservice.apiregistration.k8s.io/v1alpha1.core.gardener.cloud created
apiservice.apiregistration.k8s.io/v1beta1.core.gardener.cloud created
apiservice.apiregistration.k8s.io/v1alpha1.seedmanagement.gardener.cloud created
apiservice.apiregistration.k8s.io/v1alpha1.settings.gardener.cloud created
ℹ️ [Optional] If you want to enable logging, in the gardenlet configuration add:
logging:
enabled: true
The Gardener exposes the API servers of Shoot clusters via Kubernetes services of type LoadBalancer.
In order to establish stable endpoints (robust against changes of the load balancer address), it creates DNS records pointing to these load balancer addresses. They are used internally and by all cluster components to communicate.
You need to have control over a domain (or subdomain) for which these records will be created.
Please provide an internal domain secret (see this for an example) which contains credentials with the proper privileges. Further information can be found in Gardener Configuration and Usage.
kubectl apply -f example/10-secret-internal-domain-unmanaged.yaml
secret/internal-domain-unmanaged created
Run the Gardener
Next, run the Gardener API Server, the Gardener Controller Manager (optionally), the Gardener Scheduler (optionally), and the gardenlet in different terminal windows/panes using rules in the Makefile.
make start-apiserver
[...]
I0306 15:23:51.044421 74536 plugins.go:84] Registered admission plugin "ResourceReferenceManager"
I0306 15:23:51.044523 74536 plugins.go:84] Registered admission plugin "DeletionConfirmation"
[...]
I0306 15:23:51.626836 74536 secure_serving.go:116] Serving securely on [::]:8443
[...]
(Optional) Now you are ready to launch the Gardener Controller Manager.
make start-controller-manager
time="2019-03-06T15:24:17+02:00" level=info msg="Starting Gardener controller manager..."
time="2019-03-06T15:24:17+02:00" level=info msg="Feature Gates: "
time="2019-03-06T15:24:17+02:00" level=info msg="Starting HTTP server on 0.0.0.0:2718"
time="2019-03-06T15:24:17+02:00" level=info msg="Acquired leadership, starting controllers."
time="2019-03-06T15:24:18+02:00" level=info msg="Starting HTTPS server on 0.0.0.0:2719"
time="2019-03-06T15:24:18+02:00" level=info msg="Found internal domain secret internal-domain-unmanaged for domain nip.io."
time="2019-03-06T15:24:18+02:00" level=info msg="Successfully bootstrapped the Garden cluster."
time="2019-03-06T15:24:18+02:00" level=info msg="Gardener controller manager (version 1.0.0-dev) initialized."
time="2019-03-06T15:24:18+02:00" level=info msg="ControllerRegistration controller initialized."
time="2019-03-06T15:24:18+02:00" level=info msg="SecretBinding controller initialized."
time="2019-03-06T15:24:18+02:00" level=info msg="Project controller initialized."
time="2019-03-06T15:24:18+02:00" level=info msg="Quota controller initialized."
time="2019-03-06T15:24:18+02:00" level=info msg="CloudProfile controller initialized."
[...]
(Optional) Now you are ready to launch the Gardener Scheduler.
make start-scheduler
time="2019-05-02T16:31:50+02:00" level=info msg="Starting Gardener scheduler ..."
time="2019-05-02T16:31:50+02:00" level=info msg="Starting HTTP server on 0.0.0.0:10251"
time="2019-05-02T16:31:50+02:00" level=info msg="Acquired leadership, starting scheduler."
time="2019-05-02T16:31:50+02:00" level=info msg="Gardener scheduler initialized (with Strategy: SameRegion)"
time="2019-05-02T16:31:50+02:00" level=info msg="Scheduler controller initialized."
[...]
The Gardener should now be ready to operate on Shoot resources. You can use
kubectl get shoots
No resources found.
to operate against your local running Gardener API Server.
Note: It may take several seconds until the Gardener API server has been started and is available. No resources found is the expected result of our initial development setup.
Create a Shoot
The steps below describe the general process of creating a Shoot. Keep in mind that the steps do not provide full example manifests. The reader needs to check the provider documentation and adapt the manifests accordingly.
1. Copy the Example Manifests
The next steps require modifications of the example manifests. These modifications are part of the local setup and should not be git push-ed. To avoid interfering with git, copy the example manifests to dev/, which is ignored by git.
cp example/*.yaml dev/
2. Create a Project
Every Shoot is associated with a Project. Check the corresponding example manifests dev/00-namespace-garden-dev.yaml and dev/05-project-dev.yaml. Adapt them and create them.
kubectl apply -f dev/00-namespace-garden-dev.yaml
kubectl apply -f dev/05-project-dev.yaml
Make sure that the Project is successfully reconciled:
$ kubectl get project dev
NAME NAMESPACE STATUS OWNER CREATOR AGE
dev garden-dev Ready john.doe@example.com kubernetes-admin 6s
3. Create a CloudProfile
The CloudProfile resource is provider specific and describes the underlying cloud provider (available machine types, regions, machine images, etc.). Check the corresponding example manifest dev/30-cloudprofile.yaml. Check also the documentation and example manifests of the provider extension. Adapt dev/30-cloudprofile.yaml and apply it.
kubectl apply -f dev/30-cloudprofile.yaml
4. Install Necessary Gardener Extensions
The Known Extension Implementations section contains a list of available extension implementations. You need to create a ControllerRegistration and ControllerDeployment for:
- at least one infrastructure provider
- a DNS provider (if the DNS for the Seed is not disabled)
- at least one operating system extension
- at least one network plugin extension
As a convention, the example ControllerRegistration manifest (containing also the necessary ControllerDeployment) for an extension is located under example/controller-registration.yaml in the corresponding repository (for example for AWS the ControllerRegistration can be found here). An example creation for provider-aws (make sure to replace <version> with the newest released version tag):
kubectl apply -f https://raw.githubusercontent.com/gardener/gardener-extension-provider-aws/<version>/example/controller-registration.yaml
Instead of updating extensions manually you can use Gardener Extensions Manager to install and update extension controllers. This is especially useful if you want to keep and maintain your development setup for a longer time. Also, please refer to Registering Extension Controllers for further information about how extensions are registered in case you want to use other versions than the latest releases.
5. Register a Seed
Shoot control planes run in seed clusters, so we need to create our first Seed now.
Check the corresponding example manifests dev/40-secret-seed.yaml and dev/50-seed.yaml. Update dev/40-secret-seed.yaml with the base64-encoded kubeconfig of the cluster that will be used as Seed (the scope of the permissions should be identical to the kubeconfig that the gardenlet creates during bootstrapping - for now, cluster-admin privileges are recommended).
kubectl apply -f dev/40-secret-seed.yaml
Adapt dev/50-seed.yaml - adjust .spec.secretRef to refer to the newly created Secret, adjust .spec.provider with the Seed cluster provider and revise the other fields.
kubectl apply -f dev/50-seed.yaml
6. Start the gardenlet
Once the Seed is created, start the gardenlet to reconcile it. The make start-gardenlet command will automatically configure the local gardenlet process to use the Seed and its kubeconfig. If you have multiple Seeds, you have to specify which to use by setting the SEED_NAME environment variable like in make start-gardenlet SEED_NAME=my-first-seed.
make start-gardenlet
time="2019-11-06T15:24:17+02:00" level=info msg="Starting Gardenlet..."
time="2019-11-06T15:24:17+02:00" level=info msg="Feature Gates: HVPA=true, Logging=true"
time="2019-11-06T15:24:17+02:00" level=info msg="Acquired leadership, starting controllers."
time="2019-11-06T15:24:18+02:00" level=info msg="Found internal domain secret internal-domain-unmanaged for domain nip.io."
time="2019-11-06T15:24:18+02:00" level=info msg="Gardenlet (version 1.0.0-dev) initialized."
time="2019-11-06T15:24:18+02:00" level=info msg="ControllerInstallation controller initialized."
time="2019-11-06T15:24:18+02:00" level=info msg="Shoot controller initialized."
time="2019-11-06T15:24:18+02:00" level=info msg="Seed controller initialized."
[...]
The gardenlet will now reconcile the Seed. Check the progress from time to time until it's Ready:
kubectl get seed
NAME STATUS PROVIDER REGION AGE VERSION K8S VERSION
seed-aws Ready aws eu-west-1 4m v1.61.0-dev v1.24.8
7. Create a Shoot
A Shoot requires a SecretBinding. The SecretBinding refers to a Secret that contains the cloud provider credentials. The Secret data keys are provider specific and you need to check the documentation of the provider to find out which data keys are expected (for example for AWS the related documentation can be found at Provider Secret Data). Adapt dev/70-secret-provider.yaml and dev/80-secretbinding.yaml and apply them.
kubectl apply -f dev/70-secret-provider.yaml
kubectl apply -f dev/80-secretbinding.yaml
After the SecretBinding creation, you are ready to proceed with the Shoot creation. You need to check the documentation of the provider to find out the expected configuration (for example for AWS the related documentation and example Shoot manifest can be found at Using the AWS provider extension with Gardener as end-user). Adapt dev/90-shoot.yaml and apply it.
To make sure that a specific Seed cluster will be chosen or to skip the scheduling (the scheduling requires the Gardener Scheduler to be running), specify the .spec.seedName field (see here).
kubectl apply -f dev/90-shoot.yaml
Watch the progress of the operation and make sure that the Shoot will be successfully created.
watch kubectl get shoot --all-namespaces
8 - Log Parsers
How to Create Log Parser for Container into fluent-bit
If our log message is parsed correctly, it is shown in Grafana like this:
{"log":"OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io","pid":"1","severity":"INFO","source":"controller.go:107"}
Otherwise it will look like this:
{
  "log":"{
    \"level\":\"info\",\"ts\":\"2020-06-01T11:23:26.679Z\",\"logger\":\"gardener-resource-manager.health-reconciler\",\"msg\":\"Finished ManagedResource health checks\",\"object\":\"garden/provider-aws-dsm9r\"
  }\n"
}
Create a Custom Parser
First of all, we need to know what the log for the specific container looks like (for example, let's take a log from the alertmanager: level=info ts=2019-01-28T12:33:49.362015626Z caller=main.go:175 build_context="(go=go1.11.2, user=root@4ecc17c53d26, date=20181109-15:40:48)").
We can see that this log contains 4 subfields (severity=info, timestamp=2019-01-28T12:33:49.362015626Z, source=main.go:175 and the actual message). So we have to write a regex which matches this log in 4 groups (we can use https://regex101.com/ as a helping tool). For this purpose, our regex looks like this:
^level=(?<severity>\w+)\s+ts=(?<time>\d{4}-\d{2}-\d{2}[Tt].*[zZ])\s+caller=(?<source>[^\s]*+)\s+(?<log>.*)
- Now we have to create the correct time format for the timestamp (we can use this site for this purpose: http://ruby-doc.org/stdlib-2.4.1/libdoc/time/rdoc/Time.html#method-c-strptime). So our timestamp matches the following format:
%Y-%m-%dT%H:%M:%S.%L
- It's time to apply our new regex to the fluent-bit configuration. Go to fluent-bit-configmap.yaml and create a new filter using the following template:
[FILTER]
Name parser
Match kubernetes.<< pod-name >>*<< container-name >>*
Key_Name log
Parser << parser-name >>
Reserve_Data True
EXAMPLE
[FILTER]
Name parser
Match kubernetes.alertmanager*alertmanager*
Key_Name log
Parser alermanagerParser
Reserve_Data True
- Now let's check whether a parser with the regex and time format we need already exists. If it doesn't, create one:
[PARSER]
Name << parser-name >>
Format regex
Regex << regex >>
Time_Key time
Time_Format << time-format >>
EXAMPLE
[PARSER]
Name alermanagerParser
Format regex
Regex ^level=(?<severity>\w+)\s+ts=(?<time>\d{4}-\d{2}-\d{2}[Tt].*[zZ])\s+caller=(?<source>[^\s]*+)\s+(?<log>.*)
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Follow your development setup to validate that the parsers are working correctly.
9 - Logging
Logging in Gardener Components
This document aims at providing a general developer guideline on different aspects of logging practices and conventions used in the Gardener codebase. It contains mostly Gardener-specific points, and references other existing and commonly accepted logging guidelines for general advice. Developers and reviewers should consult this guide when writing, refactoring, and reviewing Gardener code. If parts are unclear or new learnings arise, this guide should be adapted accordingly.
Logging Libraries / Implementations
Historically, Gardener components have been using logrus.
There is a global logrus logger (logger.Logger) that is initialized by components on startup and used across the codebase.
In most places, it is used as a printf-style logger and only in some instances do we make use of logrus' structured logging functionality.
In the process of migrating our components to native controller-runtime components (see gardener/gardener#4251), we also want to make use of controller-runtime’s built-in mechanisms for streamlined logging. controller-runtime uses logr, a simple structured logging interface, for library-internal logging and logging in controllers.
logr itself is only an interface and doesn’t provide an implementation out of the box. Instead, it needs to be backed by a logging implementation like zapr. Code that uses the logr interface is thereby not tied to a specific logging implementation and makes the implementation easily exchangeable. controller-runtime already provides a set of helpers for constructing zapr loggers, i.e., logr loggers backed by zap, which is a popular logging library in the go community. Hence, we are migrating our component logging from logrus to logr (backed by zap) as part of gardener/gardener#4251.
⚠️ logger.Logger (logrus logger) is deprecated in Gardener and shall not be used in new code – use logr loggers when writing new code! (also see Migration from logrus to logr)
ℹ️ Don't use zap loggers directly, always use the logr interface in order to avoid tight coupling to a specific logging implementation.
gardener-apiserver differs from the other components as it is based on the apiserver library and therefore uses klog – just like kube-apiserver. As gardener-apiserver writes (almost) no logs in our coding (outside the apiserver library), there is currently no plan for switching the logging implementation. Hence, the following sections focus on logging in the controller and admission components only.
logcheck Tool
To ensure a smooth migration to logr and make logging in Gardener components more consistent, the logcheck tool was added.
It enforces (parts of) this guideline and detects programmer-level errors early on in order to prevent bugs.
Please check out the tool’s documentation for a detailed description.
Structured Logging
Similar to efforts in the Kubernetes project, we want to migrate our component logs to structured logging. As motivated above, we will use the logr interface instead of klog though.
You can read more about the motivation behind structured logging in logr’s background and FAQ (also see this blog post by Dave Cheney). Also, make sure to check out controller-runtime’s logging guideline with specifics for projects using the library. The following sections will focus on the most important takeaways from those guidelines and give general instructions on how to apply them to Gardener and its controller-runtime components.
Note: Some parts in this guideline differ slightly from controller-runtime’s document.
TL;DR of Structured Logging
❌ Stop using printf-style logging:
var logger *logrus.Logger
logger.Infof("Scaling deployment %s/%s to %d replicas", deployment.Namespace, deployment.Name, replicaCount)
✅ Instead, write static log messages and enrich them with additional structured information in form of key-value pairs:
var logger logr.Logger
logger.Info("Scaling deployment", "deployment", client.ObjectKeyFromObject(deployment), "replicas", replicaCount)
Log Configuration
Gardener components can be configured to either log in json (default) or text format:
json format is supposed to be used in production, while text format might be nicer for development.
# json
{"level":"info","ts":"2021-12-16T08:32:21.059+0100","msg":"Hello botanist","garden":"eden"}
# text
2021-12-16T08:32:21.059+0100 INFO Hello botanist {"garden": "eden"}
Components can be set to one of the following log levels (with increasing verbosity): error, info (default), debug.
Log Levels
logr uses V-levels (numbered log levels), higher V-level means higher verbosity.
V-levels are relative (in contrast to klog's absolute V-levels), i.e., V(1) creates a logger that is one level more verbose than its parent logger.
In Gardener components, the mentioned log levels in the component config (error, info, debug) map to the zap levels with the same names (see here).
Hence, our loggers follow the same mapping from numerical logr levels to named zap levels as described in zapr, i.e.:
- component config specifies debug ➡️ both V(0) and V(1) are enabled
- component config specifies info ➡️ V(0) is enabled, V(1) will not be shown
- component config specifies error ➡️ neither V(0) nor V(1) will be shown
- Error() logs will always be shown
This mapping applies to the components’ root loggers (the ones that are not “derived” from any other logger; constructed on component startup).
If you derive a new logger with e.g. V(1), the mapping will shift by one. For example, V(0) will then log at zap's debug level.
There is no warning level (see Dave Cheney's post).
If there is an error condition (e.g., an unexpected error received from a called function), the error should either be handled or logged at error level if it is neither handled nor returned.
If you have an error value at hand that doesn't represent an actual error condition, but you still want to log it as an informational message, log it at info level with key err.
We might consider making use of a broader range of log levels in the future when introducing more logs and common command line flags for our components (comparable to --v of Kubernetes components).
For now, we stick to the mentioned two log levels like controller-runtime: info (V(0)) and debug (V(1)).
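In code, this simply looks as follows (a minimal sketch; the messages and key-value pairs are illustrative):

var log logr.Logger

// V(0), i.e. "info" in the mapping described above.
log.Info("Reconciling Shoot")

// V(1), i.e. "debug": only emitted if the component is configured with log level debug.
log.V(1).Info("Computed desired state", "replicas", 3)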
Logging in Controllers
Named Loggers
Controllers should use named loggers that include their name, e.g.:
controllerLogger := rootLogger.WithName("controller").WithName("shoot")
controllerLogger.Info("Deploying kube-apiserver")
results in
2021-12-16T09:27:56.550+0100 INFO controller.shoot Deploying kube-apiserver
Logger names are hierarchical. You can make use of this where controllers are composed of multiple "subcontrollers", e.g., controller.shoot.hibernation or controller.shoot.maintenance.
Using the global logger logf.Log directly is discouraged and should be rather exceptional because it makes correlating logs with code harder.
Preferably, all parts of the code should use some named logger.
Reconciler Loggers
In your Reconcile function, retrieve a logger from the given context.Context.
It inherits from the controller's logger (i.e., is already named) and is preconfigured with name and namespace values for the reconciliation request:
func (r *reconciler) Reconcile(ctx context.Context, request reconcile.Request) (reconcile.Result, error) {
log := logf.FromContext(ctx)
log.Info("Reconciling Shoot")
// ...
return reconcile.Result{}, nil
}
results in
2021-12-16T09:35:59.099+0100 INFO controller.shoot Reconciling Shoot {"name": "sunflower", "namespace": "garden-greenhouse"}
The logger is injected by controller-runtime's Controller implementation. The logger returned by logf.FromContext is never nil. If the context doesn't carry a logger, it falls back to the global logger (logf.Log), which might discard logs if not configured, but is also never nil.
⚠️ Make sure that you don't overwrite the name or namespace value keys for such loggers, otherwise you will lose information about the reconciled object.
The controller implementation (controller-runtime) itself takes care of logging the error returned by reconcilers. Hence, don’t log an error that you are returning. Generally, functions should not return an error, if they already logged it, because that means the error is already handled and not an error anymore. See Dave Cheney’s post for more on this.
Messages
- Log messages should be static. Don't put variable content in there, i.e., no fmt.Sprintf or string concatenation (+). Use key-value pairs instead.
- Log messages should be capitalized. Note: This contrasts with error messages, which should not be capitalized. However, both should not end with a punctuation mark.
Keys and Values
Use
WithValues
instead of repeatedly adding key-value pairs for multiple log statements.WithValues
creates a new logger from the parent, that carries the given key-value pairs. E.g., use it when acting on one object in multiple steps and logging something for each step:log := parentLog.WithValues("infrastructure", client.ObjectKeyFromObject(infrastrucutre)) // ... log.Info("Creating Infrastructure") // ... log.Info("Waiting for Infrastructure to be reconciled") // ...
Note:
WithValues
bypasses controller-runtime’s special zap encoder that nicely encodesObjectKey
/NamespacedName
andruntime.Object
values, see kubernetes-sigs/controller-runtime#1290. Thus, the end result might look different depending on the value and itsStringer
implementation.
Use lowerCamelCase for keys. Don’t put spaces in keys, as it will make log processing with simple tools like
jq
harder.Keys should be constant, human-readable, consistent across the codebase and naturally match parts of the log message, see logr guideline.
When logging object keys (name and namespace), use the object’s type as the log key and a
client.ObjectKey
/types.NamespacedName
value as value, e.g.:var deployment *appsv1.Deployment log.Info("Creating Deployment", "deployment", client.ObjectKeyFromObject(deployment))
which results in
{"level":"info","ts":"2021-12-16T08:32:21.059+0100","msg":"Creating Deployment","deployment":{"name": "bar", "namespace": "foo"}}
Earlier, we often used
kutil.ObjectName()
for logging object keys, which encodes them into a flat string likefoo/bar
. However, this flat string cannot be processed so easily by logging stacks (orjq
) like a structured log. Hence, the use ofkutil.ObjectName()
for logging object keys is discouraged. Existing usages should be refactored to useclient.ObjectKeyFromObject()
instead.There are cases where you don’t have the full object key or the object itself at hand, e.g., if an object references another object (in the same namespace) by name (think
secretRef
or similar). In such a cases, either construct the full object key including the implied namespace or log the object name under a key ending inName
, e.g.:var ( // object to reconcile shoot *gardencorev1beta1.Shoot // retrieved via logf.FromContext, preconfigured by controller with namespace and name of reconciliation request log logr.Logger ) // option a: full object key, manually constructed log.Info("Shoot uses SecretBinding", "secretBinding", client.ObjectKey{Namespace: shoot.Namespace, Name: shoot.Spec.SecretBindingName}) // option b: only name under respective *Name log key log.Info("Shoot uses SecretBinding", "secretBindingName", shoot.Spec.SecretBindingName)
Both options result in well-structured logs that are easy to interpret and process:
{"level":"info","ts":"2022-01-18T18:00:56.672+0100","msg":"Shoot uses SecretBinding","name":"my-shoot","namespace":"garden-project","secretBinding":{"namespace":"garden-project","name":"aws"}} {"level":"info","ts":"2022-01-18T18:00:56.673+0100","msg":"Shoot uses SecretBinding","name":"my-shoot","namespace":"garden-project","secretBindingName":"aws"}
When handling generic
client.Object
values (e.g. in helper funcs), use object as key.

When adding timestamps to key-value pairs, use time.Time
values. By this, they will be encoded in the same format as the log entry’s timestamp. Don’t use metav1.Time values, as they will be encoded in a different format by their Stringer implementation. Pass <someTimestamp>.Time to loggers in case you have a metav1.Time value at hand.

The same applies to durations. Use time.Duration values instead of *metav1.Duration. Durations can be handled specially by zap just like timestamps.

Event recorders not only create Event objects but also log them. However, both Gardener’s manually instantiated event recorders and the ones that controller-runtime provides log to debug level and use generic formats that are not very easy to interpret or process (no structured logs). Hence, don’t use event recorders as replacements for well-structured logs. If a controller records an event for a completed action or important information, it should probably log it as well, e.g.:

log.Info("Creating ManagedSeed", "replica", r.GetObjectKey())
a.recorder.Eventf(managedSeedSet, corev1.EventTypeNormal, EventCreatingManagedSeed, "Creating ManagedSeed %s", r.GetFullName())
Logging in Test Code
If the tested production code requires a logger, you can pass
logr.Discard()
or logf.NullLogger{} in your test, which simply discards all logs.

logf.Log is safe to use in tests and will not cause a nil pointer deref, even if it’s not initialized via logf.SetLogger. It is initially set to a NullLogger by default, which means all logs are discarded, unless logf.SetLogger is called in the first 30 seconds of execution.

Pass zap.WriteTo(GinkgoWriter) in tests where you want to see the logs on test failure but not on success, for example:

logf.SetLogger(logger.MustNewZapLogger(logger.DebugLevel, logger.FormatJSON, zap.WriteTo(GinkgoWriter)))
log := logf.Log.WithName("test")
10 - Monitoring Stack
Extending the Monitoring Stack
This document provides instructions to extend the Shoot cluster monitoring stack by integrating new scrape targets, alerts and dashboards.
Please ensure that you have understood the basic principles of Prometheus and its ecosystem before you continue.
‼️ The purpose of the monitoring stack is to observe the behaviour of the control plane and the system components deployed by Gardener onto the worker nodes. Monitoring of custom workloads running in the cluster is out of scope.
Overview
Each Shoot cluster comes with its own monitoring stack. The following components are deployed into the seed and shoot:
- Seed
- Prometheus
- Grafana
- blackbox-exporter
- kube-state-metrics (Seed metrics)
- kube-state-metrics (Shoot metrics)
- Alertmanager (Optional)
- Shoot
In each Seed cluster there is a Prometheus in the garden
namespace responsible for collecting metrics from the Seed kubelets and cAdvisors. These metrics are provided to each Shoot Prometheus via federation.
The alerts for all Shoot clusters hosted on a Seed are routed to a central Alertmanager running in the garden
namespace of the Seed. The purpose of this central alertmanager is to forward all important alerts to the operators of the Gardener setup.
The Alertmanager in the Shoot namespace on the Seed is only responsible for forwarding alerts from its Shoot cluster to a cluster owner/cluster alert receiver via email. The Alertmanager is optional and the conditions for a deployment are already described in Alerting.
Adding New Monitoring Targets
After exploring the metrics which your component provides or adding new metrics, you should know which metrics are required to write the needed alerts and dashboards.
Prometheus prefers a pull-based metrics collection approach and therefore the targets to observe need to be defined upfront. The targets are defined in charts/seed-monitoring/charts/prometheus/templates/config.yaml
.
New scrape jobs can be added in the section scrape_configs
. Detailed information on how to configure scrape jobs and how to use the Kubernetes service discovery is available in the Prometheus documentation.
The job_name
of a scrape job should be the name of the component, e.g. kube-apiserver or vpn. The collection interval should be the default of 30s
. You do not need to specify this in the configuration.
Please do not ingest all metrics which are provided by a component. Rather, collect only those metrics which are needed to define the alerts and dashboards (i.e. whitelist). This can be achieved by adding the following metric_relabel_configs
statement to your scrape jobs (replace exampleComponent
with component name).
- job_name: example-component
...
metric_relabel_configs:
{{ include "prometheus.keep-metrics.metric-relabel-config" .Values.allowedMetrics.exampleComponent | indent 6 }}
The whitelist for the metrics of your job can be maintained in charts/seed-monitoring/charts/prometheus/values.yaml
in section allowedMetrics.exampleComponent
(replace exampleComponent
with component name). Check the following example:
allowedMetrics:
  ...
  exampleComponent:
  - metrics_name_1
  - metrics_name_2
  ...
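For orientation, a complete scrape job could look roughly like the following sketch. The kubernetes_sd_configs role, the relabeling that selects the component’s service, and the port name metrics are assumptions for illustration and need to be adapted to how the component is actually exposed; the indentation of the include has to match the surrounding configuration.

- job_name: example-component
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # keep only the endpoints of the (hypothetical) example-component service's metrics port
  - source_labels: [ __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name ]
    action: keep
    regex: example-component;metrics
  metric_relabel_configs:
{{ include "prometheus.keep-metrics.metric-relabel-config" .Values.allowedMetrics.exampleComponent | indent 6 }}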
Adding Alerts
The alert definitions are located in charts/seed-monitoring/charts/prometheus/rules
. There are two approaches for adding new alerts.
- Adding additional alerts for a component which already has a set of alerts. In this case you have to extend the existing rule file for the component.
- Adding alerts for a new component. In this case, a new rule file with the name scheme example-component.rules.yaml needs to be added.
- Add the new alert to
alertInhibitionGraph.dot
, add any required inhibition flows and render the new graph. To render the graph, run:
dot -Tpng ./content/alertInhibitionGraph.dot -o ./content/alertInhibitionGraph.png
- Create a test for the new alert. See
Alert Tests
.
Example alert:
groups:
- name: example.rules
  rules:
  - alert: ExampleAlert
    expr: absent(up{job="exampleJob"} == 1)
    for: 20m
    labels:
      service: example
      severity: critical # How severe is the alert? (blocker|critical|info|warning)
      type: shoot # For which topology is the alert relevant? (seed|shoot)
      visibility: all # Who should receive the alerts? (all|operator|owner)
    annotations:
      description: A longer description of the example alert that should also explain the impact of the alert.
      summary: Short summary of an example alert.
If the deployment of a component is optional, then the alert definitions need to be added to charts/seed-monitoring/charts/prometheus/optional-rules
instead. Furthermore, the alerts for the component need to be activatable in charts/seed-monitoring/charts/prometheus/values.yaml
via rules.optional.example-component.enabled
. The default should be true
.
Basic instructions on how to define alert rules can be found in the Prometheus documentation.
Routing Tree
The Alertmanager is grouping incoming alerts based on labels into buckets. Each bucket has its own configuration like alert receivers, initial delaying duration or resending frequency, etc. You can find more information about Alertmanager routing in the Prometheus/Alertmanager documentation. The routing trees for the Alertmanagers deployed by Gardener are depicted below.
Central Seed Alertmanager
∟ main route (all alerts for all shoots on the seed will enter)
∟ group by project and shoot name
∟ group by visibility "all" and "operator"
∟ group by severity "blocker", "critical", and "info" → route to Garden operators
∟ group by severity "warning" (dropped)
∟ group by visibility "owner" (dropped)
Shoot Alertmanager
∟ main route (only alerts for one Shoot will enter)
∟ group by visibility "all" and "owner"
∟ group by severity "blocker", "critical", and "info" → route to cluster alert receiver
∟ group by severity "warning" (dropped, will change soon → route to cluster alert receiver)
∟ group by visibility "operator" (dropped)
Alert Inhibition
All alerts related to components running on the Shoot workers are inhibited in case of an issue with the VPN connection, because those components can’t be scraped anymore and Prometheus will consequently fire alerts. The components running on the workers are probably healthy and the alerts are presumably false positives. The inhibition flow is shown in the figure below. If you add a new alert, make sure to add it to the diagram.
Alert Attributes
Each alert rule definition has to contain the following annotations:
- summary: A short description of the issue.
- description: A detailed explanation of the issue with hints to the possible root causes and the impact assessment of the issue.
In addition, each alert must contain the following labels:
- type
shoot
: Components running on the Shoot worker nodes in the kube-system namespace.
seed
: Components running on the Seed in the Shoot namespace as part of/next to the control plane.
- service
- Name of the component (in lowercase), e.g. kube-apiserver, alertmanager or vpn.
- severity
blocker
: All issues which make the cluster entirely unusable, e.g. KubeAPIServerDown or KubeSchedulerDown.
critical
: All issues which affect single functionalities/components but do not affect the cluster in its core functionality, e.g. VPNDown or KubeletDown.
info
: All issues that do not affect the cluster or its core functionality, but if this component is down we cannot determine if a blocker alert is firing (i.e., a component with an info level severity is a dependency for a component with a blocker severity).
warning
: No currently existing issue, rather a hint for situations which could lead to real issues in the near future, e.g. HighLatencyApiServerToWorkers or ApiServerResponseSlow
.
Alert Tests
To test the Prometheus alerts:
make test-prometheus
If you want to add alert tests:
Create a new file in
rules-tests
in the form<alert-group-name>.rules.test.yaml
or if the alerts are for an existing component with existing tests, simply add the tests to the appropriate files. A minimal sketch of such a test file is shown after this list.

Make sure that newly added tests succeed. See above.
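A sketch of such a test, assuming the Prometheus rule unit-test format understood by promtool and the ExampleAlert rule from above (series values and timings are made up):

rule_files:
- example-component.rules.yaml

evaluation_interval: 30s

tests:
- interval: 30s
  # the exampleJob target is down for the whole test window
  input_series:
  - series: 'up{job="exampleJob"}'
    values: '0+0x60'
  alert_rule_test:
  - eval_time: 25m
    alertname: ExampleAlert
    exp_alerts:
    - exp_labels:
        job: exampleJob
        service: example
        severity: critical
        type: shoot
        visibility: all
      exp_annotations:
        description: A longer description of the example alert that should also explain the impact of the alert.
        summary: Short summary of an example alert.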
Adding Grafana Dashboards
The dashboard definition files are located in charts/seed-monitoring/charts/grafana/dashboards
. Every dashboard needs its own file.
If you are adding a new component dashboard, please also update the overview dashboard by adding a chart for its current up/down status and a drill-down option to the component dashboard.
Dashboard Structure
The dashboards should be structured in the following way. The assignment of the component dashboards to the categories should be handled via dashboard tags.
- Kubernetes control plane components (Tag:
control-plane
)- All components which are part of the Kubernetes control plane, e.g. Kube API Server, Kube Controller Manager, Kube Scheduler and Cloud Controller Manager
- ETCD + Backup/Restore
- Kubernetes Addon Manager
- Node/Machine components (Tag:
node/machine
)- All metrics which are related to the behaviour/control of the Kubernetes nodes and kubelets
- Machine-Controller-Manager + Cluster Autoscaler
- Networking components (Tag:
network
)- CoreDNS, KubeProxy, Calico, VPN, Nginx Ingress
- Addon components (Tag:
addon
)- Cert Broker
- Monitoring components (Tag:
monitoring
) - Logging components (Tag:
logging
)
Mandatory Charts for Component Dashboards
For each new component, its corresponding dashboard should contain the following charts in the first row, before adding custom charts for the component in the subsequent rows.
- Pod up/down status
up{job="example-component"}
- Pod/containers CPU utilization
- Pod/containers memory consumption
- Pod/containers network I/O
That information is provided by the cAdvisor metrics. These metrics are already integrated. Please check the other dashboards for detailed information on how to query.
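For orientation, typical queries for these charts based on the standard cAdvisor metrics could look roughly as follows (the container, pod, and job label values are made up; check the existing dashboards for the exact labels used in the Gardener charts):

# Pod up/down status
up{job="example-component"}

# CPU usage per container (in cores)
rate(container_cpu_usage_seconds_total{container="example-component"}[5m])

# Memory working set per container (in bytes)
container_memory_working_set_bytes{container="example-component"}

# Network I/O per pod (in bytes/s)
rate(container_network_receive_bytes_total{pod=~"example-component-.+"}[5m])
rate(container_network_transmit_bytes_total{pod=~"example-component-.+"}[5m])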
Chart Requirements
Each chart needs to contain:
- a meaningful name
- a detailed description (for non trivial charts)
- appropriate x/y axis descriptions
- appropriate scaling levels for the x/y axis
- proper units for the x/y axis
Dashboard Parameters
The following parameters should be added to all dashboards to ensure a homogeneous experience across all dashboards.
Dashboards have to:
- contain a title which refers to the component name(s)
- contain a timezone statement which should be the browser time
- contain tags which express where the component is running (
seed
or shoot
) and to which category the component belongs (see dashboard structure)
- contain a version statement with a value of 1
- be immutable
Example dashboard configuration:
{
"title": "example-component",
"timezone": "utc",
"tags": [
"seed",
"control-plane"
],
"version": 1,
"editable": "false"
}
Furthermore, all dashboards should contain the following time options:
{
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"30s",
"1m",
"5m"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"10d"
]
}
}
11 - New Cloud Provider
Adding Cloud Providers
This document provides an overview of how to integrate a new cloud provider into Gardener. Each component that requires integration has a detailed description of how to integrate it and the steps required.
Cloud Components
Gardener is composed of 2 or more Kubernetes clusters:
- Shoot: These are the end-user clusters, the regular Kubernetes clusters you have seen. They provide places for your workloads to run.
- Seed: This is the “management” cluster. It manages the control planes of shoots by running them as native Kubernetes workloads.
These two clusters can run in the same cloud provider, but they do not need to. For example, you could run your Seed in AWS, while having one shoot in Azure, two in Google, two in Alicloud, and three in Equinix Metal.
The Seed cluster deploys and manages the Shoot clusters. Importantly, for this discussion, the etcd
data store backing each Shoot runs as workloads inside the Seed. Thus, to use the above example, the clusters in Azure, Google, Alicloud and Equinix Metal will have their worker nodes and master nodes running in those clouds, but the etcd
clusters backing them will run as separate deployments in the Seed Kubernetes cluster on AWS.
This distinction becomes important when preparing the integration to a new cloud provider.
Gardener Cloud Integration
Gardener and its related components integrate with cloud providers at the following key lifecycle elements:
- Create/destroy/get/list machines for the Shoot.
- Create/destroy/get/list infrastructure components for the Shoot, e.g. VPCs, subnets, routes, etc.
- Backup/restore etcd for the Seed via writing files to and reading them from object storage.
Thus, the integrations you need for your cloud provider depend on whether you want to deploy Shoot clusters to the provider, Seed or both.
- Shoot Only: machine lifecycle management, infrastructure
- Seed: etcd backup/restore
Gardener API
In addition to the requirements to integrate with the cloud provider, you also need to enable the core Gardener app to receive, validate, and process requests to use that cloud provider.
- Expose the cloud provider to the consumers of the Gardener API, so it can be told to use that cloud provider as an option.
- Validate that API as requests come in.
- Write cloud provider specific implementation (called “provider extension”).
Cloud Provider API Requirements
In order for a cloud provider to integrate with Gardener, the provider must have an API to perform machine lifecycle events, specifically:
- Create a machine
- Destroy a machine
- Get information about a machine and its state
- List machines
In addition, if the Seed is to run on the given provider, it also must have an API to save files to object storage and retrieve them, for etcd backup/restore.
The current integration with cloud providers is to add their API calls to Gardener and the Machine Controller Manager. As both Gardener and the Machine Controller Manager are written in Go, the cloud provider should have a Go SDK. However, if it has an API that is wrappable in Go, e.g. a REST API, then you can use that to integrate.
The Gardener team is working on bringing cloud provider integrations out-of-tree, making them pluggable, which should simplify the process and make it possible to use other SDKs.
Summary
To add a new cloud provider, you need some or all of the following. Each repository contains instructions on how to extend it to a new cloud provider.
Type | Purpose | Location | Documentation |
---|---|---|---|
Seed or Shoot | Machine Lifecycle | machine-controller-manager | MCM new cloud provider |
Seed only | etcd backup/restore | etcd-backup-restore | In process |
All | Extension implementation | gardener | Extension controller |
12 - New Kubernetes Version
Adding Support For a New Kubernetes Version
This document describes the steps that need to be performed in order to confidently add support for a new Kubernetes minor version.
⚠️ Typically, once a minor Kubernetes version
vX.Y
is supported by Gardener, then all patch versionsvX.Y.Z
are also automatically supported without any required action. This is because patch versions do not introduce any new feature or API changes, so there is nothing that needs to be adapted ingardener/gardener
code.
The Kubernetes community releases a new minor version roughly every 4 months. Please refer to the official documentation about their release cycles for any additional information.
Shortly before a new release, an “umbrella” issue should be opened which is used to collect the required adaptations and to track the work items.
For example, #5102 can be used as a template for the issue description.
As you can see, the task of supporting a new Kubernetes version also includes the provider extensions maintained in the gardener
GitHub organization and is not restricted to gardener/gardener
only.
Generally, the work items can be split into two groups: The first group contains Kubernetes release-independent tasks, the second group contains tasks specific to the changes in the given Kubernetes release.
ℹ️ Upgrading the
k8s.io/*
andsigs.k8s.io/controller-runtime
Golang dependencies is typically tracked and worked on separately (see e.g. #4772 or #5282).
Deriving Release-Specific Tasks
Most new minor Kubernetes releases incorporate API changes, deprecations, or new features.
The community announces them via their change logs.
In order to derive the release-specific tasks, the respective change log for the new version vX.Y
has to be read and understood (for example, the changelog for v1.24
).
As already mentioned, typical changes to watch out for are:
- API version promotions or deprecations
- Feature gate promotions or deprecations
- CLI flag changes for Kubernetes components
- New default values in resources
- New available fields in resources
- New features potentially relevant for the Gardener system
- Changes of labels or annotations Gardener relies on
- …
Obviously, this requires a certain experience and understanding of the Gardener project so that all “relevant changes” can be identified.
While reading the change log, add the tasks (along with the respective PR in kubernetes/kubernetes) to the umbrella issue.
ℹ️ Some of the changes might be specific to certain cloud providers. Pay attention to those as well and add related tasks to the issue.
List Of Release-Independent Tasks
The following paragraphs describe recurring tasks that need to be performed for each new release.
Make Sure a New hyperkube
Image Is Released
The gardener/hyperkube
repository is used to release container images consisting of the kubectl
and kubelet
binaries.
There is a CI/CD job that runs periodically and releases a new hyperkube
image when there is a new Kubernetes release. Before proceeding with the next steps, make sure that a new hyperkube
image is released for the corresponding new Kubernetes minor version. Make sure that the container image is present in GCR.
Adapting Gardener
- Allow instantiation of a Kubernetes client for the new minor version and update the
README.md
:- See this example commit.
- Maintain the Kubernetes feature gates used for validation of
Shoot
resources:- The feature gates are maintained in this file.
- To maintain this list for new Kubernetes versions, run
hack/compare-k8s-feature-gates.sh <old-version> <new-version>
(e.g.hack/compare-k8s-feature-gates.sh v1.22 v1.23
). - It will present 3 lists of feature gates: those added and those removed in
<new-version>
compared to<old-version>
and feature gates that got locked to default in<new-version>
. - Add all added feature gates to the map with
<new-version>
asAddedInVersion
and noRemovedInVersion
. - For any removed feature gates, add
<new-version>
asRemovedInVersion
to the already existing feature gate in the map. - For feature gates locked to default, add
<new-version>
asLockedToDefaultInVersion
to the already existing feature gate in the map. - See this example commit.
- Maintain the Kubernetes
kube-apiserver
admission plugins used for validation ofShoot
resources:- The admission plugins are maintained in this file.
- To maintain this list for new Kubernetes versions, run
hack/compare-k8s-admission-plugins.sh <old-version> <new-version>
(e.g.hack/compare-k8s-admission-plugins.sh 1.24 1.25
). - It will present 2 lists of admission plugins: those added and those removed in
<new-version>
compared to<old-version>
. - Add all added admission plugins to the
admissionPluginsVersionRanges
map with<new-version>
asAddedInVersion
and noRemovedInVersion
. - For any removed admission plugins, add
<new-version>
asRemovedInVersion
to the already existing admission plugin in the map. - Flag any admission plugins that are required (plugins that must not be disabled in the
Shoot
spec) by setting theRequired
boolean variable to true for the admission plugin in the map. - Flag any admission plugins that are forbidden by setting the
Forbidden
boolean variable to true for the admission plugin in the map.
- Maintain the
ServiceAccount
names for the controllers part ofkube-controller-manager
:- The names are maintained in this file.
- To maintain this list for new Kubernetes versions, run
hack/compare-k8s-controllers.sh <old-version> <new-version>
(e.g.hack/compare-k8s-controllers.sh 1.22 1.23
). - It will present 2 lists of controllers: those added and those removed in
<new-version>
compared to<old-version>
. - Double check whether such
ServiceAccount
indeed appears in thekube-system
namespace when creating a cluster with<new-version>
. Note that it sometimes might be hidden behind a default-off feature gate. You can create a local cluster with the new version using the local provider. - If it appears, add all added controllers to the list based on the Kubernetes version (example).
- For any removed controllers, add them only to the Kubernetes version if it is low enough.
- Bump the used Kubernetes version for local
Shoot
and local e2e test.- See this example commit.
Filing the Pull Request
Work on all the tasks you have collected and validate them using the local provider. Execute the e2e tests and if everything looks good, then go ahead and file the PR (example PR). Generally, it is great if you add the PRs also to the umbrella issue so that they can be tracked more easily.
Adapting Provider Extensions
After the PR in gardener/gardener
for the support of the new version has been merged, you can go ahead and work on the provider extensions.
Actually, you can already start even if the PR is not yet merged and use the branch of your fork.
- Revendor the
github.com/gardener/gardener
dependency in the extension and update theREADME.md
. - Work on release-specific tasks related to this provider.
Maintaining the cloud-controller-manager
Images
Some of the cloud providers are not yet using upstream cloud-controller-manager
images.
Instead, we build and maintain them ourselves:
- https://github.com/gardener/cloud-provider-aws
- https://github.com/gardener/cloud-provider-azure (since
v1.23
, we use the upstream image) - https://github.com/gardener/cloud-provider-gcp
Until we switch to upstream images, you need to revendor the Kubernetes dependencies and release a new image. The required steps are as follows:
- Checkout the
legacy-cloud-provider
branch of the respective repository - Bump the versions in the
Dockerfile
(example commit). - Update the
VERSION
tovX.Y.Z-dev
whereZ
is the latest available Kubernetes patch version for thevX.Y
minor version. - Update the
k8s.io/*
dependencies in thego.mod
file tovX.Y.Z
and rungo mod vendor
andgo mod tidy
(example commit). - Checkout a new
release-vX.Y
branch and release it (example)
As you are already on it, it is great if you also bump the
k8s.io/*
dependencies for the last three minor releases as well. In this case, you need to check out the release-vX.{Y-{1,2,3}}
branches and only perform the last three steps (example branch, example commit).
Now you need to update the new releases in the charts/images.yaml
of the respective provider extension so that they are used (see this example commit for reference).
Filing the Pull Request
Again, work on all the tasks you have collected. This time, you cannot use the local provider for validation but should create real clusters on the various infrastructures. Typically, the following validations should be performed:
- Create new clusters with versions <
vX.Y
- Create new clusters with version =
vX.Y
- Upgrade old clusters from version
vX.{Y-1}
to versionvX.Y
- Delete clusters with versions <
vX.Y
- Delete clusters with version =
vX.Y
If everything looks good, then go ahead and file the PR (example PR). Generally, it is again great if you add the PRs also to the umbrella issue so that they can be tracked more easily.
13 - Priority Classes
PriorityClass
es in Gardener Clusters
Gardener makes use of PriorityClass
es to improve the overall robustness of the system.
In order to benefit from the full potential of PriorityClass
es, the gardenlet manages a set of well-known PriorityClass
es with fine-granular priority values.
All components of the system should use these well-known PriorityClass
es instead of creating and using separate ones with arbitrary values, which would compromise the overall goal of using PriorityClass
es in the first place.
The gardenlet manages the well-known PriorityClass
es listed in this document, so that third parties (e.g., Gardener extensions) can rely on them to be present when deploying components to Seed and Shoot clusters.
The listed well-known PriorityClass
es follow this rough concept:
- Values are close to the maximum that can be declared by the user. This is important to ensure that Shoot system components have higher priority than the workload deployed by end-users.
- Values have a bit of headroom in between to ensure flexibility when the need for intermediate priority values arises.
- Values of
PriorityClass
es created on Seed clusters are lower than the ones on Shoots to ensure that Shoot system components have higher priority than Seed components, if the Seed is backed by a Shoot (ManagedSeed
), e.g.coredns
should have higher priority thangardenlet
. - Names simply include the last digits of the value to minimize confusion caused by many (similar) names like
critical
, importance-high
, etc.
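To make use of them, a component simply references the respective class by name in its pod spec. A minimal sketch (the Deployment name and image are made up; the class name is one of the well-known classes listed below):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gardener-extension-provider-foo # hypothetical extension component running in the Seed
spec:
  selector:
    matchLabels:
      app: gardener-extension-provider-foo
  template:
    metadata:
      labels:
        app: gardener-extension-provider-foo
    spec:
      priorityClassName: gardener-system-900 # well-known PriorityClass managed by the gardenlet
      containers:
      - name: provider-foo
        image: example.registry/provider-foo:latest # hypothetical image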
Garden Clusters
When using the gardener-operator
for managing the garden runtime and virtual cluster, the following PriorityClass
es are available:
PriorityClass
es for Garden Control Plane Components
Name | Priority | Associated Components (Examples) |
---|---|---|
gardener-garden-system-critical | 999999550 | gardener-operator , gardener-resource-manager |
gardener-garden-system-500 | 999999500 | virtual-garden-etcd-events , virtual-garden-etcd-main |
gardener-garden-system-400 | 999999400 | |
gardener-garden-system-300 | 999999300 | vpa-admission-controller , etcd-druid |
gardener-garden-system-200 | 999999200 | vpa-recommender , vpa-updater , hvpa-controller |
gardener-garden-system-100 | 999999100 | |
Seed Clusters
PriorityClass
es for Seed System Components
Name | Priority | Associated Components (Examples) |
---|---|---|
gardener-system-critical | 999998950 | gardenlet , gardener-resource-manager , istio-ingressgateway , istiod |
gardener-system-900 | 999998900 | Extensions, reversed-vpn-auth-server |
gardener-system-800 | 999998800 | dependency-watchdog-endpoint , dependency-watchdog-probe , etcd-druid , (auditlog-)mutator , vpa-admission-controller |
gardener-system-700 | 999998700 | auditlog-seed-controller , hvpa-controller , vpa-recommender , vpa-updater |
gardener-system-600 | 999998600 | aggregate-alertmanager , alertmanager , fluent-bit , grafana , kube-state-metrics , nginx-ingress-controller , nginx-k8s-backend , prometheus , loki , seed-prometheus |
gardener-reserve-excess-capacity | -5 | reserve-excess-capacity (ref) |
PriorityClass
es for Shoot Control Plane Components
Name | Priority | Associated Components (Examples) |
---|---|---|
gardener-system-500 | 999998500 | etcd-events , etcd-main , kube-apiserver |
gardener-system-400 | 999998400 | gardener-resource-manager |
gardener-system-300 | 999998300 | cloud-controller-manager , cluster-autoscaler , csi-driver-controller , kube-controller-manager , kube-scheduler , machine-controller-manager , terraformer , vpn-seed-server |
gardener-system-200 | 999998200 | csi-snapshot-controller , csi-snapshot-validation , cert-controller-manager , shoot-dns-service , vpa-admission-controller , vpa-recommender , vpa-updater |
gardener-system-100 | 999998100 | alertmanager , grafana-operators , grafana-users , kube-state-metrics , prometheus , loki , event-logger |
Shoot Clusters
PriorityClass
es for Shoot System Components
Name | Priority | Associated Components (Examples) |
---|---|---|
system-node-critical (created by Kubernetes) | 2000001000 | calico-node , kube-proxy , apiserver-proxy , csi-driver , egress-filter-applier |
system-cluster-critical (created by Kubernetes) | 2000000000 | calico-typha , calico-kube-controllers , coredns , vpn-shoot |
gardener-shoot-system-900 | 999999900 | node-problem-detector |
gardener-shoot-system-800 | 999999800 | calico-typha-horizontal-autoscaler , calico-typha-vertical-autoscaler |
gardener-shoot-system-700 | 999999700 | blackbox-exporter , node-exporter |
gardener-shoot-system-600 | 999999600 | addons-nginx-ingress-controller , addons-nginx-ingress-k8s-backend , kubernetes-dashboard , kubernetes-metrics-scraper |
14 - Process
Releases, Features, Hotfixes
This document describes how to contribute features or hotfixes, and how new Gardener releases are usually scheduled, validated, etc.
Releases
The @gardener-maintainers are trying to provide a new release roughly every other week (depending on their capacity and the stability/robustness of the master
branch).
Hotfixes are usually maintained for the latest three minor releases; there are, however, no fixed release dates.
Release Responsible Plan
Version | Week No | Begin Validation Phase | Due Date | Release Responsible |
---|---|---|---|---|
v1.63 | Week 01-04 | January 2, 2023 | January 29, 2023 | @shafeeqes |
v1.64 | Week 05-06 | January 30, 2023 | February 12, 2023 | @ary1992 |
v1.65 | Week 07-08 | February 13, 2023 | February 26, 2023 | @timuthy |
v1.66 | Week 09-10 | February 27, 2023 | March 12, 2023 | @plkokanov |
v1.67 | Week 11-12 | March 13, 2023 | March 26, 2023 | @rfranzke |
v1.68 | Week 13-14 | March 27, 2023 | April 9, 2023 | @acumino |
v1.69 | Week 15-16 | April 10, 2023 | April 23, 2023 | @oliver-goetz |
v1.70 | Week 17-18 | April 24, 2023 | May 7, 2023 | @ialidzhikov |
v1.71 | Week 19-20 | May 8, 2023 | May 21, 2023 | @shafeeqes |
v1.72 | Week 21-22 | May 22, 2023 | June 4, 2023 | @ary1992 |
v1.73 | Week 23-24 | June 5, 2023 | June 18, 2023 | @timuthy |
v1.74 | Week 25-26 | June 19, 2023 | July 2, 2023 | @oliver-goetz |
v1.75 | Week 27-28 | July 3, 2023 | July 16, 2023 | @rfranzke |
v1.76 | Week 29-30 | July 17, 2023 | July 30, 2023 | @plkokanov |
v1.77 | Week 31-32 | July 31, 2023 | August 13, 2023 | @ialidzhikov |
v1.78 | Week 33-34 | August 14, 2023 | August 27, 2023 | @acumino |
Apart from the release of the next version, the release responsible is also taking care of potential hotfix releases of the last three minor versions. The release responsible is the main contact person for coordinating new feature PRs for the next minor versions or cherry-pick PRs for the last three minor versions.
Click to expand the archived release responsible associations!
Version | Week No | Begin Validation Phase | Due Date | Release Responsible |
---|---|---|---|---|
v1.17 | Week 07-08 | February 15, 2021 | February 28, 2021 | @rfranzke |
v1.18 | Week 09-10 | March 1, 2021 | March 14, 2021 | @danielfoehrKn |
v1.19 | Week 11-12 | March 15, 2021 | March 28, 2021 | @timebertt |
v1.20 | Week 13-14 | March 29, 2021 | April 11, 2021 | @vpnachev |
v1.21 | Week 15-16 | April 12, 2021 | April 25, 2021 | @timuthy |
v1.22 | Week 17-18 | April 26, 2021 | May 9, 2021 | @BeckerMax |
v1.23 | Week 19-20 | May 10, 2021 | May 23, 2021 | @ialidzhikov |
v1.24 | Week 21-22 | May 24, 2021 | June 5, 2021 | @stoyanr |
v1.25 | Week 23-24 | June 7, 2021 | June 20, 2021 | @rfranzke |
v1.26 | Week 25-26 | June 21, 2021 | July 4, 2021 | @danielfoehrKn |
v1.27 | Week 27-28 | July 5, 2021 | July 18, 2021 | @timebertt |
v1.28 | Week 29-30 | July 19, 2021 | August 1, 2021 | @ialidzhikov |
v1.29 | Week 31-32 | August 2, 2021 | August 15, 2021 | @timuthy |
v1.30 | Week 33-34 | August 16, 2021 | August 29, 2021 | @BeckerMax |
v1.31 | Week 35-36 | August 30, 2021 | September 12, 2021 | @stoyanr |
v1.32 | Week 37-38 | September 13, 2021 | September 26, 2021 | @vpnachev |
v1.33 | Week 39-40 | September 27, 2021 | October 10, 2021 | @voelzmo |
v1.34 | Week 41-42 | October 11, 2021 | October 24, 2021 | @plkokanov |
v1.35 | Week 43-44 | October 25, 2021 | November 7, 2021 | @kris94 |
v1.36 | Week 45-46 | November 8, 2021 | November 21, 2021 | @timebertt |
v1.37 | Week 47-48 | November 22, 2021 | December 5, 2021 | @danielfoehrKn |
v1.38 | Week 49-50 | December 6, 2021 | December 19, 2021 | @rfranzke |
v1.39 | Week 01-04 | January 3, 2022 | January 30, 2022 | @ialidzhikov, @timuthy |
v1.40 | Week 05-06 | January 31, 2022 | February 13, 2022 | @BeckerMax |
v1.41 | Week 07-08 | February 14, 2022 | February 27, 2022 | @plkokanov |
v1.42 | Week 09-10 | February 28, 2022 | March 13, 2022 | @kris94 |
v1.43 | Week 11-12 | March 14, 2022 | March 27, 2022 | @rfranzke |
v1.44 | Week 13-14 | March 28, 2022 | April 10, 2022 | @timebertt |
v1.45 | Week 15-16 | April 11, 2022 | April 24, 2022 | @acumino |
v1.46 | Week 17-18 | April 25, 2022 | May 8, 2022 | @ialidzhikov |
v1.47 | Week 19-20 | May 9, 2022 | May 22, 2022 | @shafeeqes |
v1.48 | Week 21-22 | May 23, 2022 | June 5, 2022 | @ary1992 |
v1.49 | Week 23-24 | June 6, 2022 | June 19, 2022 | @plkokanov |
v1.50 | Week 25-26 | June 20, 2022 | July 3, 2022 | @rfranzke |
v1.51 | Week 27-28 | July 4, 2022 | July 17, 2022 | @timebertt |
v1.52 | Week 29-30 | July 18, 2022 | July 31, 2022 | @acumino |
v1.53 | Week 31-32 | August 1, 2022 | August 14, 2022 | @kris94 |
v1.54 | Week 33-34 | August 15, 2022 | August 28, 2022 | @ialidzhikov |
v1.55 | Week 35-36 | August 29, 2022 | September 11, 2022 | @oliver-goetz |
v1.56 | Week 37-38 | September 12, 2022 | September 25, 2022 | @shafeeqes |
v1.57 | Week 39-40 | September 26, 2022 | October 9, 2022 | @ary1992 |
v1.58 | Week 41-42 | October 10, 2022 | October 23, 2022 | @plkokanov |
v1.59 | Week 43-44 | October 24, 2022 | November 6, 2022 | @rfranzke |
v1.60 | Week 45-46 | November 7, 2022 | November 20, 2022 | @acumino |
v1.61 | Week 47-48 | November 21, 2022 | December 4, 2022 | @ialidzhikov |
v1.62 | Week 49-50 | December 5, 2022 | December 18, 2022 | @oliver-goetz |
Release Validation
The release phase for a new minor version lasts two weeks. Typically, the first week is used for the validation of the release. This phase includes the following steps:
master
(or latest release-*
branch) is deployed to a development landscape that already hosts some existing seed and shoot clusters.- An extended test suite is triggered by the “release responsible” which:
- executes the Gardener integration tests for different Kubernetes versions, infrastructures, and
Shoot
settings. - executes the Kubernetes conformance tests.
- executes further tests like Kubernetes/OS patch/minor version upgrades.
- executes the Gardener integration tests for different Kubernetes versions, infrastructures, and
- Additionally, every four hours (or on demand) more tests (e.g., including the Kubernetes e2e test suite) are executed for different infrastructures.
- The “release responsible” verifies new features or other notable changes (derived from the draft release notes) in this development system.
Usually, the new release is triggered at the beginning of the second week if all tests are green, all checks were successful, and all of the planned verifications were performed by the release responsible.
Contributing New Features or Fixes
Please refer to the Gardener contributor guide.
Besides a lot of general information, it also provides a checklist for newly created pull requests that may help you to prepare your changes for an efficient review process.
If you are contributing a fix or major improvement, please take care to open cherry-pick PRs to all affected and still supported versions once the change is approved and merged in the master
branch.
⚠️ Please ensure that your modifications pass the verification checks (linting, formatting, static code checks, tests, etc.) by executing
make verify
before filing your pull request.
The guide applies for both changes to the master
and to any release-*
branch.
All changes must be submitted via a pull request and be reviewed and approved by at least one code owner.
Cherry Picks
This section explains how to initiate cherry picks on release branches within the gardener/gardener
repository.
Prerequisites
Before you initiate a cherry pick, make sure that the following prerequisites are fulfilled.
- A pull request merged against the
master
branch. - The release branch exists (check in the branches section).
- Have the
gardener/gardener
repository cloned as follows:
- the origin remote should point to your fork (alternatively this can be overwritten by passing FORK_REMOTE=<fork-remote>).
- the upstream remote should point to the Gardener GitHub org (alternatively this can be overwritten by passing UPSTREAM_REMOTE=<upstream-remote>
).
- Have
hub
installed, which is most easily installed via go get github.com/github/hub assuming you have a standard golang development environment.
- A GitHub token which has permissions to create a PR in an upstream branch.
Initiate a Cherry Pick
Run the cherry pick script (hack/cherry-pick-pull.sh).
This example applies a master branch PR #3632 to the remote branch
upstream/release-v3.14
:

GITHUB_USER=<your-user> hack/cherry-pick-pull.sh upstream/release-v3.14 3632
Be aware the cherry pick script assumes you have a git remote called
upstream
that points at the Gardener GitHub org.

You will need to run the cherry pick script separately for each patch release you want to cherry pick to. Cherry picks should be applied to all active release branches where the fix is applicable.
When asked for your GitHub password, provide the created GitHub token rather than your actual GitHub password. Refer to https://github.com/github/hub/issues/2655#issuecomment-735836048.
15 - Secrets Management
Secrets Management for Seed and Shoot Cluster
The gardenlet needs to create a considerable number of credentials (certificates, private keys, passwords, etc.) for seed and shoot clusters in order to ensure secure deployments. Such credentials typically should be renewed automatically when their validity expires, rotated regularly, and they potentially need to be persisted such that they don’t get lost in case of a control plane migration or a lost seed cluster.
SecretsManager Introduction
These requirements can be covered by using the SecretsManager
package maintained in pkg/utils/secrets/manager
.
It is built on top of the ConfigInterface
and DataInterface
interfaces, which are part of pkg/utils/secrets,
and provides the following functions:
Generate(context.Context, secrets.ConfigInterface, ...GenerateOption) (*corev1.Secret, error)
This method either retrieves the current secret for the given configuration or it (re)generates it in case the configuration changed, the signing CA changed (for certificate secrets), or when proactive rotation was triggered. If the configuration describes a certificate authority secret then this method automatically generates a bundle secret containing the current and potentially the old certificate. Available
GenerateOption
s:SignedByCA(string, ...SignedByCAOption)
: This is only valid for certificate secrets and automatically retrieves the correct certificate authority in order to sign the provided server or client certificate.- There are two
SignedByCAOption
s:UseCurrentCA
. This option will sign server certificates with the new/current CA in case of a CA rotation. For more information, please refer to the “Certificate Signing” section below.UseOldCA
. This option will sign client certificates with the old CA in case of a CA rotation. For more information, please refer to the “Certificate Signing” section below.
- There are two
Persist()
: This marks the secret such that it gets persisted in theShootState
resource in the garden cluster. Consequently, it should only be used for secrets related to a shoot cluster.Rotate(rotationStrategy)
: This specifies the strategy in case this secret is to be rotated or regenerated (eitherInPlace
which immediately forgets about the old secret, orKeepOld
which keeps the old secret in the system).IgnoreOldSecrets()
: This specifies that old secrets should not be considered and loaded (contrary to the default behavior). It should be used when old secrets are no longer important and can be “forgotten” (e.g. in “phase 2” (t2
) of the CA certificate rotation). Such old secrets will be deleted onCleanup()
.IgnoreOldSecretsAfter(time.Duration)
: This specifies that old secrets should not be considered and loaded once a given duration after rotation has passed. It can be used to clean up old secrets after automatic rotation (e.g. the Seed cluster CA is automatically rotated when its validity will soon end and the old CA will be cleaned up 24 hours after triggering the rotation).Validity(time.Duration)
: This specifies how long the secret should be valid. For certificate secret configurations, the manager will automatically deduce this information from the generated certificate.
Get(string, ...GetOption) (*corev1.Secret, bool)
This method retrieves the current secret for the given name. In case the secret in question is a certificate authority secret then it retrieves the bundle secret by default. It is important that this method only knows about secrets for which there were prior
Generate
calls. AvailableGetOption
s:Bundle
(default): This retrieves the bundle secret.Current
: This retrieves the current secret.Old
: This retrieves the old secret.
Cleanup(context.Context) error
This method deletes secrets which are no longer required. No longer required secrets are those still existing in the system which weren’t detected by prior
Generate
calls. Consequently, only callCleanup
after you have executedGenerate
calls for all desired secrets.
Some exemplary usages would look as follows:
secret, err := k.secretsManager.Generate(
ctx,
&secrets.CertificateSecretConfig{
Name: "my-server-secret",
CommonName: "server-abc",
DNSNames: []string{"first-name", "second-name"},
CertType: secrets.ServerCert,
SkipPublishingCACertificate: true,
},
secretsmanager.SignedByCA("my-ca"),
secretsmanager.Persist(),
secretsmanager.Rotate(secretsmanager.InPlace),
)
if err != nil {
return err
}
As explained above, the caller does not need to care about the renewal, rotation or the persistence of this secret - all of these concerns are handled by the secrets manager.
Automatic renewal of secrets happens when 80% of their validity has elapsed or when less than 10d
are left until expiration.
In case a CA certificate is needed by some component, then it can be retrieved as follows:
caSecret, found := k.secretsManager.Get("my-ca")
if !found {
return fmt.Errorf("secret my-ca not found")
}
As explained above, this returns the bundle secret for the CA my-ca
which might potentially contain both the current and the old CA (in case of rotation/regeneration).
Certificate Signing
By default, client certificates are always signed by the current CA while server certificates are signed by the old CA (if it exists). This is to ensure a smooth exchange of certificates during a CA rotation (which typically has two phases, ref GEP-18):
- Client certificates:
- In phase 1, clients get new certificates as soon as possible to ensure that all clients have been adapted before phase 2.
- In phase 2, the respective server drops accepting certificates signed by the old CA.
- Server certificates:
- In phase 1, servers still use their old/existing certificates to allow clients to update their CA bundle used for verification of the servers’ certificates.
- In phase 2, the old CA is dropped, hence servers need to get a certificate signed by the new/current CA. At this point in time, clients have already adapted their CA bundles.
Always Sign Server Certificates with Current CA
In case you control all clients and update them at the same time as the server, it is possible to make the secrets manager generate even server certificates with the new/current CA. This can help to prevent certificate mismatches when the CA bundle is already exchanged while the server still serves with a certificate signed by a CA no longer part of the bundle.
Let’s consider the two following examples:
gardenlet
deploys a webhook server (gardener-resource-manager
) and a correspondingMutatingWebhookConfiguration
at the same time. In this case, the server certificate should be generated with the new/current CA to avoid above mentioned certificate mismatches during a CA rotation.
gardenlet
deploys a server (etcd
) in one step, and a client (kube-apiserver
) in a subsequent step. In this case, the default behaviour should apply (server certificate should be signed by old/existing CA).
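Based on the options described above, the first example (a webhook server whose clients are updated at the same time) could generate its certificate roughly like this sketch (the secret name, CA name, and DNS names are made up; the exact spelling of the UseCurrentCA option follows the SignedByCAOption listed earlier):

secret, err := k.secretsManager.Generate(
    ctx,
    &secrets.CertificateSecretConfig{
        Name:       "my-webhook-server",
        CommonName: "my-webhook-server",
        DNSNames:   []string{"my-webhook-service", "my-webhook-service.my-namespace.svc"},
        CertType:   secrets.ServerCert,
    },
    secretsmanager.SignedByCA("my-ca", secretsmanager.UseCurrentCA),
    secretsmanager.Rotate(secretsmanager.InPlace),
)
if err != nil {
    return err
}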
Always Sign Client Certificate with Old CA
In the unusual case where the client is deployed before the server, it might be useful to always use the old CA for signing the client’s certificate. This can help to prevent certificate mismatches when the client already gets a new certificate while the server still only accepts certificates signed by the old CA.
Let’s consider the following example:
gardenlet
deploys thekube-apiserver
before thekubelet
. However, thekube-apiserver
has a client certificate signed by theca-kubelet
in order to communicate with it (e.g., when retrieving logs or forwarding ports). In this case, the client certificate should be generated with the old CA to avoid above mentioned certificate mismatches during a CA rotation.
Reusing the SecretsManager in Other Components
While the SecretsManager
is primarily used by gardenlet, it can be reused by other components (e.g. extensions) as well for managing secrets that are specific to the component or extension. For example, provider extensions might use their own SecretsManager
instance for managing the serving certificate of cloud-controller-manager
.
External components that want to reuse the SecretsManager
should consider the following aspects:
- On initialization of a
SecretsManager
, pass anidentity
specific to the component, controller and purpose. For example, gardenlet’s shoot controller usesgardenlet
as theSecretsManager
’s identity, theWorker
controller inprovider-foo
should useprovider-foo-worker
, and theControlPlane
controller should useprovider-foo-controlplane-exposure
forControlPlane
objects of purposeexposure
. The given identity is added as a value for themanager-identity
label on managedSecret
s. This label is used by theCleanup
function to select only thoseSecret
s that are actually managed by the particularSecretManager
instance. This is done to prevent removing still neededSecret
s that are managed by other instances. - Generate dedicated CAs for signing certificates instead of depending on CAs managed by gardenlet.
- Names of
Secret
s managed by externalSecretsManager
instances must not conflict withSecret
names from other instances (e.g. gardenlet). - For CAs that should be rotated in lock-step with the Shoot CAs managed by gardenlet, components need to pass information about the last rotation initiation time and the current rotation phase to the
SecretsManager
upon initialization. The relevant information can be retrieved from theCluster
resource under.spec.shoot.status.credentials.rotation.certificateAuthorities
. - Independent of the specific identity, secrets marked with the
Persist
option are automatically saved in theShootState
resource by the gardenlet and are also restored by the gardenlet on Control Plane Migration to the new Seed.
Migrating Existing Secrets To SecretsManager
If you already have existing secrets which were not created with SecretsManager
, then you can (optionally) migrate them by labeling them with secrets-manager-use-data-for=<config-name>
.
For example, if your SecretsManager
generates a CertificateConfigSecret
with name foo
like this
secret, err := k.secretsManager.Generate(
ctx,
&secrets.CertificateSecretConfig{
Name: "foo",
// ...
},
)
and you already have an existing secret in your system whose data should be kept instead of regenerated, then labeling it with secrets-manager-use-data-for=foo
will instruct SecretsManager
accordingly.
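For example, assuming a hypothetical secret name and shoot namespace, the label could be added with:

kubectl -n shoot--my-project--my-shoot label secret my-existing-foo-secret secrets-manager-use-data-for=foo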
⚠️ Caveat: You have to make sure that the existing data
keys match with what SecretsManager
uses:
Secret Type | Data Keys |
---|---|
Basic Auth | username , password , auth |
CA Certificate | ca.crt , ca.key |
Non-CA Certificate | tls.crt , tls.key |
Control Plane Secret | ca.crt , username , password , token , kubeconfig |
ETCD Encryption Key | key , secret |
Kubeconfig | kubeconfig |
RSA Private Key | id_rsa , id_rsa.pub |
Static Token | static_tokens.csv |
VPN TLS Auth | vpn.tlsauth |
Implementation Details
The source of truth for the secrets manager is the list of Secret
s in the Kubernetes cluster it acts upon (typically, the seed cluster).
The persisted secrets in the ShootState
are only used if the shoot is in the
phase - in this case all secrets are just synced to the seed cluster so that they can be picked up by the secrets manager.
In order to prevent kubelets from unneeded watches (thus, causing some significant traffic against the kube-apiserver
), the Secret
s are marked as immutable.
Consequently, they have a unique, deterministic name which is computed as follows:
- For CA secrets, the name is just exactly the name specified in the configuration (e.g.,
ca
). This is for backwards-compatibility and will be dropped in a future release once all components depending on the static name have been adapted. - For all other secrets, the name specified in the configuration is used as prefix followed by an 8-digit hash. This hash is computed out of the checksum of the secret configuration and the checksum of the certificate of the signing CA (only for certificate configurations).
In all cases, the name of the secrets is suffixed with a 5-digit hash computed out of the time when the rotation for this secret was last started.
16 - Seed Network Policies
Network Policies in the Seed Cluster
This document describes the Kubernetes network policies deployed by Gardener into the Seed cluster.
For network policies deployed into the Shoot kube-system
namespace, please see the usage section.
Network policies deployed by Gardener have names and annotations describing their purpose, so this document only highlights a subset of the policies in detail.
Network Policies in the Shoot Namespace in the Seed
The network policies in the Shoot namespace in the Seed can roughly be grouped into policies required for the control plane components and policies required for logging & monitoring.
The network policy deny-all
plays a special role. This policy denies all ingress and egress traffic from each pod in the Shoot namespace.
So per default, a pod running in the control plane cannot talk to any other pod in the whole Seed cluster.
This means the pod needs to have labels matching to appropriate network policies allowing it to talk to exactly the components required to execute its desired functionality.
This also has implications for Gardener extensions that need to deploy additional components into the Shoot's
control plane.
Network Policies for Control Plane Components
This section highlights a selection of network policies that exist in the Shoot namespace in the Seed cluster. In general, the control plane components serve different purposes and thus need access to different pods and network ranges.
In contrast to other network policies, the policy allow-to-shoot-networks
is tailored to the individual Shoot cluster,
because it is based on the network configuration in the Shoot manifest.
It allows pods with the label networking.gardener.cloud/to-shoot-networks=allowed
to access pods in the Shoot pod,
service and node CIDR range. This is used by the Shoot API Server and the Prometheus pods to communicate over VPN/proxy with pods in the Shoot cluster.
This network policy is only useful if reversed vpn is disabled, as otherwise the vpn-seed-server pod in the control plane is the only pod with layer 3 routing to the shoot network.
The policy allow-to-blocked-cidrs
allows pods with the label networking.gardener.cloud/to-blocked-cidrs=allowed
to access IPs that are explicitly blocked for all control planes in a Seed cluster (configurable via spec.networks.blockCIDRS
).
This is used for instance to block the cloud provider’s metadata service.
Another network policy to be highlighted is allow-to-runtime-apiserver
.
Some components need access to the Seed API Server. This can be allowed by labeling the pod with networking.gardener.cloud/to-runtime-apiserver=allowed
.
This policy allows exactly the IPs of the kube-apiserver
of the Seed.
While all other policies have a static set of permissions (do not change during the lifecycle of the Shoot), the policy allow-to-runtime-apiserver
is reconciled to reflect the endpoints in the default
namespace.
This is required because endpoint IPs are not necessarily stable (think of scaling the Seed API Server pods or hibernating the Seed cluster (acting as a managed seed) in a local development environment).
Furthermore, the following network policies exist in the Shoot namespace. These policies are the same for every Shoot control plane.
NAME POD-SELECTOR
# Pods that need to access the Shoot API server. Used by all Kubernetes control plane components.
allow-to-shoot-apiserver networking.gardener.cloud/to-shoot-apiserver=allowed
# allows access to kube-dns/core-dns pods for DNS queries
allow-to-dns networking.gardener.cloud/to-dns=allowed
# allows access to private IP address ranges
allow-to-private-networks networking.gardener.cloud/to-private-networks=allowed
# allows access to all but private IP address ranges
allow-to-public-networks networking.gardener.cloud/to-public-networks=allowed
# allows Ingress to etcd pods from the Shoot's Kubernetes API Server
allow-etcd app=etcd-statefulset,gardener.cloud/role=controlplane
# used by the Shoot API server to allow ingress from pods labeled
# with 'networking.gardener.cloud/to-shoot-apiserver=allowed', from Prometheus, and to allow egress to etcd pods
allow-kube-apiserver app=kubernetes,gardener.cloud/role=controlplane,role=apiserver
Network Policies for Logging & Monitoring
Gardener currently introduces a logging stack based on Loki, so this section is subject to change. For more information, see the Loki Gardener Community Meeting.
These are the logging and monitoring related network policies:
NAME POD-SELECTOR
allow-from-prometheus networking.gardener.cloud/from-prometheus=allowed
allow-grafana component=grafana,gardener.cloud/role=monitoring
allow-prometheus app=prometheus,gardener.cloud/role=monitoring,role=monitoring
allow-to-aggregate-prometheus networking.gardener.cloud/to-aggregate-prometheus=allowed
allow-to-loki networking.gardener.cloud/to-loki=allowed
For instance, let’s take a look at the network policy allow-from-prometheus
.
As part of the shoot reconciliation flow, Gardener deploys a shoot-specific Prometheus into the shoot namespace.
Each pod that should be scraped for metrics must be labeled with networking.gardener.cloud/from-prometheus=allowed
to allow incoming network requests by the Prometheus pod.
Most components of the Shoot cluster’s control plane expose metrics and are therefore labeled appropriately.
Implications for Gardener Extensions
Gardener extensions sometimes need to deploy additional components into the Shoot namespace in the Seed hosting the control plane.
For example, the Gardener extension provider-aws deploys the MachineControllerManager into the Shoot namespace, which is ultimately responsible for creating the VMs with the cloud provider AWS.
Every Shoot namespace in the Seed contains the network policy deny-all.
This requires a pod deployed by a Gardener extension to carry the labels of those network policies in the Shoot namespace that allow the required network ranges.
Additionally, extensions could also deploy their own network policies. This is used, e.g., by the Gardener extension provider-aws to serve Admission Webhooks for the Shoot API server that need to be reachable from within the Shoot namespace.
The pod can use an arbitrary combination of network policies, as sketched below.
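For illustration, the following minimal Go sketch (not taken from any particular extension; the package, deployment name, and image are placeholders) shows how such a component's pod template could combine several of these labels:

package deployer // hypothetical package

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newExampleDeployment sketches a Deployment an extension might create in the
// Shoot namespace. The networking.gardener.cloud/... pod labels opt the pod
// into the corresponding allow-* network policies described above.
func newExampleDeployment(namespace string) *appsv1.Deployment {
	podLabels := map[string]string{
		"app": "example-extension-controller",
		"networking.gardener.cloud/to-dns":               "allowed",
		"networking.gardener.cloud/to-runtime-apiserver": "allowed",
		"networking.gardener.cloud/to-shoot-apiserver":   "allowed",
		"networking.gardener.cloud/from-prometheus":      "allowed",
	}

	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "example-extension-controller", Namespace: namespace},
		Spec: appsv1.DeploymentSpec{
			Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "example-extension-controller"}},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: podLabels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "controller",
						Image: "registry.example/extension-controller:latest", // placeholder image
					}},
				},
			},
		},
	}
}

Which of these labels are actually required depends on the network ranges the component needs to reach; any combination of the allow-* policies can be selected this way.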
Network Policies in the garden Namespace
The network policies in the garden namespace are, with a few exceptions (e.g., Kubernetes control plane specific policies), the same as in the Shoot namespaces.
For your reference, these are all the deployed network policies:
NAME POD-SELECTOR
allow-fluentbit app=fluent-bit,gardener.cloud/role=logging,role=logging
allow-from-aggregate-prometheus networking.gardener.cloud/from-aggregate-prometheus=allowed
allow-to-aggregate-prometheus networking.gardener.cloud/to-aggregate-prometheus=allowed
allow-to-all-shoot-apiservers networking.gardener.cloud/to-all-shoot-apiservers=allowed
allow-to-blocked-cidrs networking.gardener.cloud/to-blocked-cidrs=allowed
allow-to-dns networking.gardener.cloud/to-dns=allowed
allow-to-loki networking.gardener.cloud/to-loki=allowed
allow-to-private-networks networking.gardener.cloud/to-private-networks=allowed
allow-to-public-networks networking.gardener.cloud/to-public-networks=allowed
allow-to-runtime-apiserver networking.gardener.cloud/to-runtime-apiserver=allowed
This section describes the network policies that are unique to the garden namespace.
The network policy allow-to-all-shoot-apiservers allows pods to access every Shoot API server in the Seed.
This is, for instance, used by the dependency watchdog to regularly check the health of all the Shoot API servers.
Gardener deploys a central Prometheus instance in the garden namespace that fetches metrics and data from all seed cluster nodes and all seed cluster pods.
The network policies allow-to-aggregate-prometheus and allow-from-aggregate-prometheus allow traffic to and from this Prometheus instance.
Worth mentioning is that the network policy allow-to-shoot-networks does not exist in the garden namespace. This is to forbid Gardener system components from talking to workload deployed in the Shoot VPC.
17 - Testing
Testing Strategy and Developer Guideline
This document walks you through:
- What kind of tests we have in Gardener
- How to run each of them
- What purpose each kind of test serves
- How to best write tests that are correct, stable, fast and maintainable
- How to debug tests that are not working as expected
The document is aimed towards developers that want to contribute code and need to write tests, as well as maintainers and reviewers that review test code. It serves as a common guide that we commit to follow in our project to ensure consistency in our tests, good coverage for high confidence, and good maintainability.
The guidelines are not meant to be absolute rules. Always apply common sense and adapt the guideline if it doesn’t make much sense for some cases. If in doubt, don’t hesitate to ask questions during a PR review (as an author, but also as a reviewer). Add new learnings as soon as we make them!
Generally speaking, tests are a strict requirement for contributing new code. If you touch code that is currently untested, you need to add tests for the new cases that you introduce as a minimum. Ideally though, you would add the missing test cases for the current code as well (boy scout rule – “always leave the campground cleaner than you found it”).
Writing Tests (Relevant for All Kinds)
- We follow BDD (behavior-driven development) testing principles and use Ginkgo, along with Gomega.
- Make sure to check out their extensive guides for more information and how to best leverage all of their features
- Use By to structure test cases with multiple steps, so that steps are easy to follow in the logs: example test
- Call defer GinkgoRecover() if making assertions in goroutines: doc, example test
- Use DeferCleanup instead of cleaning up manually (or use custom coding from the test framework): example test, example test
  - DeferCleanup makes sure to run the cleanup code at the right point in time, e.g., a DeferCleanup added in BeforeEach is executed with AfterEach.
- Test failures should point to an exact location, so that failures in CI aren’t too difficult to debug/fix.
  - Use ExpectWithOffset for making assertions in helper funcs like expectSomethingWasCreated: example test
  - Make sure to add additional descriptions to Gomega matchers if necessary (e.g., in a loop): example test
- Introduce helper functions for assertions to make tests more readable where applicable: example test
- Introduce custom matchers to make tests more readable where applicable: example matcher
- Don’t rely on accurate timing of time.Sleep and friends.
  - If doing so, CPU throttling in CI will make tests flaky, example flake
  - Use fake clocks instead, example PR
- Use the same client schemes that are also used by production code to avoid subtle bugs/regressions: example PR, production schemes, usage in test
- Make sure that your test is actually asserting the right thing and that it doesn’t pass if the exact bug is introduced that you want to prevent.
  - Use specific error matchers instead of asserting that any error has happened; make sure that the corresponding branch in the code is tested, e.g., prefer Expect(err).To(MatchError("foo")) over Expect(err).To(HaveOccurred()).
  - If you’re unsure about your test’s behavior, attaching the debugger can sometimes be helpful to make sure your test is correct.
- About overwriting global variables:
  - This is a common pattern (or hack?) in go for faking calls to external functions.
  - However, this can lead to races, when the global variable is used from a goroutine (e.g., the function is called).
  - Alternatively, set fields on structs (passed via parameter or set directly): this is not racy, as struct values are typically (and should be) only used for a single test case.
  - An alternative to dealing with function variables and fields (see the sketch after this list):
    - Add an interface which your code depends on.
    - Write a fake and a real implementation (similar to clock.Clock.Sleep).
    - The real implementation calls the actual function (clock.RealClock.Sleep calls time.Sleep).
    - The fake implementation does whatever you want it to do for your test (clock.FakeClock.Sleep waits until the test code advanced the time).
- Use constants in test code with care.
- Typically, you should not use constants from the same package as the tested code, instead use literals.
- If the constant value is changed, tests using the constant will still pass, although the “specification” is not fulfilled anymore.
- There are cases where it’s fine to use constants, but keep this caveat in mind when doing so.
- Creating sample data for tests can be a high effort.
  - If valuable, add a package for generating common sample data, e.g. Shoot/Cluster objects.
  - Make use of the testdata directory for storing arbitrary sample data needed by tests (helm charts, YAML manifests, etc.), example PR
    - From https://pkg.go.dev/cmd/go/internal/test: "The go tool will ignore a directory named 'testdata', making it available to hold ancillary data needed by the tests."
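To illustrate the interface-based alternative mentioned in the list above (the same idea behind clock.Clock and its real/fake implementations), here is a minimal, self-contained sketch; the names are made up for this example and not taken from the Gardener code base:

package nanny // hypothetical package

import "time"

// Sleeper abstracts the sleep call so that tests can substitute a fake.
type Sleeper interface {
	Sleep(d time.Duration)
}

// RealSleeper calls the actual time.Sleep and is used by production code.
type RealSleeper struct{}

func (RealSleeper) Sleep(d time.Duration) { time.Sleep(d) }

// FakeSleeper only records the requested durations, so tests stay fast and
// can assert how long the code wanted to wait.
type FakeSleeper struct {
	Slept []time.Duration
}

func (f *FakeSleeper) Sleep(d time.Duration) { f.Slept = append(f.Slept, d) }

// RetryThrice is an example unit that depends on the interface instead of a
// package-level function variable, which avoids data races in tests.
func RetryThrice(s Sleeper, fn func() error) error {
	var err error
	for i := 0; i < 3; i++ {
		if err = fn(); err == nil {
			return nil
		}
		s.Sleep(time.Second)
	}
	return err
}

Production code passes RealSleeper{}, while a test passes a *FakeSleeper and asserts on the recorded durations without ever sleeping, which keeps the test fast and free of races.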
Unit Tests
Running Unit Tests
Run all unit tests:
make test
Run all unit tests with test coverage:
make test-cov
open test.coverage.html
make test-cov-clean
Run unit tests of specific packages:
# run with the same settings as in CI (race detector, timeout, ...)
./hack/test.sh ./pkg/resourcemanager/controller/... ./pkg/utils/secrets/...
# freestyle
go test ./pkg/resourcemanager/controller/... ./pkg/utils/secrets/...
ginkgo run ./pkg/resourcemanager/controller/... ./pkg/utils/secrets/...
Debugging Unit Tests
Use ginkgo to focus on (a set of) test specs via code or via CLI flags. Remember to unfocus specs before contributing code, otherwise your PR tests will fail.
$ ginkgo run --focus "should delete the unused resources" ./pkg/resourcemanager/controller/garbagecollector
...
Will run 1 of 3 specs
SS•
Ran 1 of 3 Specs in 0.003 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 2 Skipped
PASS
Use ginkgo to run tests until they fail:
$ ginkgo run --until-it-fails ./pkg/resourcemanager/controller/garbagecollector
...
Ran 3 of 3 Specs in 0.004 seconds
SUCCESS! -- 3 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS
All tests passed...
Will keep running them until they fail.
This was attempt #58
No, seriously... you can probably stop now.
Use the stress tool for deflaking tests that fail sporadically in CI, e.g., due to resource contention (CPU throttling):
# get the stress tool
go install golang.org/x/tools/cmd/stress@latest
# build a test binary
ginkgo build ./pkg/resourcemanager/controller/garbagecollector
# alternatively
go test -c ./pkg/resourcemanager/controller/garbagecollector
# run the test in parallel and report any failures
stress -p 16 ./pkg/resourcemanager/controller/garbagecollector/garbagecollector.test -ginkgo.focus "should delete the unused resources"
5s: 1077 runs so far, 0 failures
10s: 2160 runs so far, 0 failures
stress will output a path to a file containing the full failure message when a test run fails.
Purpose of Unit Tests
- Unit tests prove the correctness of a single unit according to the specification of its interface.
- Think: Is the unit that I introduced doing what it is supposed to do for all cases?
- Unit tests protect against regressions caused by adding new functionality to or refactoring of a single unit.
- Think: Is the unit that was introduced earlier (by someone else) and that I changed still doing what it was supposed to do for all cases?
- Example units: functions (conversion, defaulting, validation, helpers), structs (helpers, basic building blocks like the Secrets Manager), predicates, event handlers.
- For these purposes, unit tests need to cover all important cases of input for a single unit and cover edge cases / negative paths as well (e.g., errors).
- Because of the possible high dimensionality of test input, unit tests need to be fast to execute: individual test cases should not take more than a few seconds, test suites not more than 2 minutes.
- Fuzzing can be used as a technique in addition to usual test cases for covering edge cases.
- Test coverage can be used as a tool during test development for covering all cases of a unit.
- However, test coverage data can be a false safety net.
- Full line coverage doesn’t mean you have covered all cases of valid input.
- We don’t have strict requirements for test coverage, as it doesn’t necessarily yield the desired outcome.
- Unit tests should not test too large components, e.g. entire controller Reconcile functions.
  - If a function/component does many steps, it’s probably better to split it up into multiple functions/components that can be unit tested individually.
  - There might be special cases for very small Reconcile functions.
  - If there are a lot of edge cases, extract dedicated functions that cover them and use unit tests to test them.
  - Usual-sized controllers should rather be tested in integration tests.
    - Individual parts (e.g. helper functions) should still be tested in unit tests for covering all cases, though.
- Unit tests are especially easy to run with a debugger and can help in understanding concrete behavior of components.
Writing Unit Tests
- For the sake of execution speed, fake expensive calls/operations, e.g. secret generation: example test
- Generally, prefer fakes over mocks, e.g., use the controller-runtime fake client over mock clients (see the sketch after this list).
  - Mocks decrease maintainability because they expect the tested component to follow a certain way to reach the desired goal (e.g., call specific functions with particular arguments), example consequence
  - Generally, fakes should be used in “result-oriented” test code (e.g., that a certain object was labelled, but the test doesn’t care if it was via patch or update, as both are valid ways to reach the desired goal).
- Although rare, there are valid use cases for mocks, e.g. if the following aspects are important for correctness:
- Asserting that an exact function is called
- Asserting that functions are called in a specific order
- Asserting that exact parameters/values/… are passed
- Asserting that a certain function was not called
- Many of these can also be verified with fakes, although mocks might be simpler
- Only use mocks if the tested code directly calls the mock; never if the tested code only calls the mock indirectly (e.g., through a helper package/function).
- Keep in mind the maintenance implications of using mocks:
- Can you make a valid non-behavioral change in the code without breaking the test or dependent tests?
- It’s valid to mix fakes and mocks in the same test or between test cases.
- Generally, use the go test package, i.e., declare package <production_package>_test:
  - Helps in avoiding cyclic dependencies between production, test and helper packages
  - Also forces you to distinguish between the public (exported) API surface of your code and internal state that might not be of interest to tests
- It might be valid to use the same package as the tested code if you want to test unexported functions.
  - Alternatively, an internal package can be used to host “internal” helpers: example package
- Helpers can also be exported if no one is supposed to import the containing package (e.g. controller package).
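As a minimal, self-contained illustration of the “prefer fakes over mocks” and test-package guidance above (the package, function, and label used here are made up for this sketch and not taken from the Gardener code base):

package labeler_test // hypothetical package, following the <production_package>_test convention

import (
	"context"
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

func TestLabeler(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Labeler Suite")
}

// addRoleLabel stands in for the production code under test; in a real test it
// would be imported from the production package instead of being defined here.
func addRoleLabel(ctx context.Context, c client.Client, cm *corev1.ConfigMap) error {
	patch := client.MergeFrom(cm.DeepCopy())
	if cm.Labels == nil {
		cm.Labels = map[string]string{}
	}
	cm.Labels["gardener.cloud/role"] = "example"
	return c.Patch(ctx, cm, patch)
}

var _ = Describe("Labeler", func() {
	It("should label the ConfigMap", func() {
		cm := &corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{Name: "foo", Namespace: "default"}}
		// Fake client instead of a mock: the assertion below only checks the result.
		fakeClient := fake.NewClientBuilder().WithObjects(cm).Build()

		Expect(addRoleLabel(context.Background(), fakeClient, cm)).To(Succeed())

		result := &corev1.ConfigMap{}
		Expect(fakeClient.Get(context.Background(), client.ObjectKeyFromObject(cm), result)).To(Succeed())
		Expect(result.Labels).To(HaveKeyWithValue("gardener.cloud/role", "example"))
	})
})

The assertion is result-oriented: it only checks the resulting object; whether addRoleLabel used Update or Patch to get there is irrelevant to the test.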
Integration Tests (envtests)
Integration tests in Gardener use the sigs.k8s.io/controller-runtime/pkg/envtest package.
It sets up a temporary control plane (etcd + kube-apiserver) and runs the test against it.
The test suites start their individual envtest environment before running the tested controller/webhook and executing test cases.
Before exiting, the test suites tear down the temporary test environment.
Package github.com/gardener/gardener/pkg/envtest augments the controller-runtime’s envtest package by starting and registering gardener-apiserver.
This is used to test controllers that act on resources in the Gardener APIs (aggregated APIs).
Historically, test machinery tests have also been called “integration tests”. However, test machinery does not perform integration testing but rather executes a form of end-to-end tests against a real landscape. Hence, we tried to sharpen the terminology that we use to distinguish between “real” integration tests and test machinery tests but you might still find “integration tests” referring to test machinery tests in old issues or outdated documents.
Running Integration Tests
The test-integration make rule prepares the environment automatically by downloading the respective binaries (if not yet present) and setting the necessary environment variables.
make test-integration
If you want to run a specific set of integration tests, you can also execute them using ./hack/test-integration.sh directly instead of using the test-integration rule. Prior to execution, the PATH environment variable needs to be set to also include the tools binary directory. For example:
export PATH="$PATH:$PWD/hack/tools/bin"
source ./hack/test-integration.env
./hack/test-integration.sh ./test/integration/resourcemanager/tokenrequestor
The script takes care of preparing the environment for you.
If you want to execute the test suites directly via go test or ginkgo, you have to point the KUBEBUILDER_ASSETS environment variable to the path that contains the etcd and kube-apiserver binaries. Alternatively, you can install the binaries to /usr/local/kubebuilder/bin. Additionally, the environment variables from hack/test-integration.env should be sourced.
Debugging Integration Tests
You can configure envtest to use an existing cluster or control plane instead of starting a temporary control plane that is torn down immediately after executing the test.
This can be helpful for debugging integration tests because you can easily inspect what is going on in your test environment with kubectl.
While you can use an existing cluster (e.g., kind), some test suites expect that no controllers and no nodes are running in the test environment (as it is the case in envtest test environments).
Hence, using a full-blown cluster with controllers and nodes might sometimes be impractical, as you would need to stop cluster components for the tests to work.
You can use make start-envtest to start an envtest test environment that is managed separately from individual test suites.
This allows you to keep the test environment running for as long as you want, and to debug integration tests by executing multiple test runs in parallel or inspecting test runs using kubectl.
When you are finished, just hit CTRL-C for tearing down the test environment.
The kubeconfig for the test environment is placed in dev/envtest-kubeconfig.yaml.
make start-envtest brings up an envtest environment using the default configuration.
If your test suite requires a different control plane configuration (e.g., disabled admission plugins or enabled feature gates), feel free to locally modify the configuration in test/start-envtest while debugging.
Run an envtest suite (not using gardener-apiserver) against an existing test environment:
make start-envtest
# in another terminal session:
export KUBECONFIG=$PWD/dev/envtest-kubeconfig.yaml
export USE_EXISTING_CLUSTER=true
# run test with verbose output
./hack/test-integration.sh -v ./test/integration/resourcemanager/health -ginkgo.v
# in another terminal session:
export KUBECONFIG=$PWD/dev/envtest-kubeconfig.yaml
# watch test objects
k get managedresource -A -w
Run a gardener envtest suite (using gardener-apiserver) against an existing test environment:
# modify test/start-envtest to disable admission plugins and enable feature gates like in test suite...
make start-envtest ENVTEST_TYPE=gardener
# in another terminal session:
export KUBECONFIG=$PWD/dev/envtest-kubeconfig.yaml
export USE_EXISTING_GARDENER=true
# run test with verbose output
./hack/test-integration.sh -v ./test/integration/controllermanager/bastion -ginkgo.v
# in another terminal session:
export KUBECONFIG=$PWD/dev/envtest-kubeconfig.yaml
# watch test objects
k get bastion -A -w
Similar to debugging unit tests, the stress tool can help with hunting flakes in integration tests.
However, you might need to run fewer tests in parallel (specified via -p) and have a bit more patience.
Generally, reproducing flakes in integration tests is easier when stress-testing against an existing test environment instead of starting temporary individual control planes per test run.
Stress-test an envtest suite (not using gardener-apiserver):
# build a test binary
ginkgo build ./test/integration/resourcemanager/health
# prepare a test environment to run the test against
make start-envtest
# in another terminal session:
export KUBECONFIG=$PWD/dev/envtest-kubeconfig.yaml
export USE_EXISTING_CLUSTER=true
# use same timeout settings like in CI
source ./hack/test-integration.env
# switch to test package directory like `go test`
cd ./test/integration/resourcemanager/health
# run the test in parallel and report any failures
stress -ignore "unable to grab random port" -p 16 ./health.test
...
Stress-test a gardener envtest suite (using gardener-apiserver):
# modify test/start-envtest to disable admission plugins and enable feature gates like in test suite...
# build a test binary
ginkgo build ./test/integration/controllermanager/bastion
# prepare a test environment including gardener-apiserver to run the test against
make start-envtest ENVTEST_TYPE=gardener
# in another terminal session:
export KUBECONFIG=$PWD/dev/envtest-kubeconfig.yaml
export USE_EXISTING_GARDENER=true
# use same timeout settings like in CI
source ./hack/test-integration.env
# switch to test package directory like `go test`
cd ./test/integration/controllermanager/bastion
# run the test in parallel and report any failures
stress -ignore "unable to grab random port" -p 16 ./bastion.test
...
Purpose of Integration Tests
- Integration tests prove that multiple units are correctly integrated into a fully-functional component of the system.
- Example components with multiple units:
- A controller with its reconciler, watches, predicates, event handlers, queues, etc.
- A webhook with its server, handler, decoder, and webhook configuration.
- Integration tests set up a full component (including used libraries) and run it against a test environment close to the actual setup.
- e.g., start controllers against a real Kubernetes control plane to catch bugs that can only happen when talking to a real API server.
- Integration tests are generally more expensive to run (e.g., in terms of execution time).
- Integration tests should not cover each and every detailed case.
- Rather than that, cover a good portion of the “usual” cases that components will face during normal operation (positive and negative test cases).
- Also, there is no need to cover all failure cases or all cases of predicates -> they should be covered in unit tests already.
- Generally, not supposed to “generate test coverage” but to provide confidence that components work well.
- As integration tests typically test only one component (or a cohesive set of components) isolated from others, they cannot catch bugs that occur when multiple controllers interact (could be discovered by e2e tests, though).
- Rule of thumb: a new integration test should be added for each new controller (an integration test doesn’t replace unit tests, though).
Writing Integration Tests
- Make sure to have a clean test environment on both test suite and test case level:
  - Set up dedicated test environments (envtest instances) per test suite.
  - Use dedicated namespaces per test suite (see the sketch after this list):
    - Use GenerateName with a test-specific prefix: example test
    - Restrict the controller-runtime manager to the test namespace by setting manager.Options.Namespace: example test
    - Alternatively, use a test-specific prefix with a random suffix determined upfront: example test
      - This can be used to restrict webhooks to a dedicated test namespace: example test
      - This allows running a test in parallel against the same existing cluster for deflaking and stress testing: example PR
  - If the controller works on cluster-scoped resources:
    - Label the resources with a label specific to the test run, e.g. the test namespace’s name: example test
    - Restrict the manager’s cache for these objects with a corresponding label selector: example test
    - Alternatively, use a checksum of a random UUID using the uuid.NewUUID() function: example test
    - This allows running a test in parallel against the same existing cluster for deflaking and stress testing, even if it works with cluster-scoped resources that are visible to all parallel test runs: example PR
  - Use dedicated test resources for each test case:
    - Use GenerateName: example test
    - Alternatively, use a checksum of a random UUID using the uuid.NewUUID() function: example test
    - Logging the created object names is generally a good idea to support debugging failing or flaky tests: example test
    - Always delete all resources after the test case (e.g., via DeferCleanup) that were created for the test case.
    - This avoids conflicts between test cases and cascading failures which distract from the actual root failures.
  - Don’t tolerate already existing resources (~dirty test environment), code smell: ignoring already exist errors
  - Don’t use a cached client in test code (e.g., the one from a controller-runtime manager), always construct a dedicated test client (uncached): example test
- Use asynchronous assertions: Eventually and Consistently.
  - Never Expect anything to happen synchronously (immediately).
  - Don’t use retry or wait until functions -> use Eventually, Consistently instead: example test
  - This allows to override the interval/timeout values from outside instead of hard-coding this in the test (see hack/test-integration.sh): example PR
  - Beware of the default Eventually / Consistently timeouts / poll intervals: docs
  - Don’t set custom (high) timeouts and intervals in test code: example PR
    - Instead, shorten the sync period of controllers, overwrite intervals of the tested code, or use fake clocks: example test
  - Pass g Gomega to Eventually/Consistently and use g.Expect in it: docs, example test, example PR
  - Don’t forget to call {Eventually,Consistently}.Should(), otherwise the assertions always silently succeed without errors: onsi/gomega#561
- When using Gardener’s envtest (envtest.GardenerTestEnvironment):
  - Disable gardener-apiserver’s admission plugins that are not relevant to the integration test itself by passing --disable-admission-plugins: example test
  - This makes setup / teardown code simpler and ensures to only test code relevant to the tested component itself (but not the entire set of admission plugins).
    - e.g., you can disable the ShootValidator plugin to create Shoots that reference non-existing SecretBindings, or disable the DeletionConfirmation plugin to delete Gardener resources without adding a deletion confirmation first.
- Use a custom rate limiter for controllers in integration tests: example test
  - This can be used for limiting exponential backoff to shorten wait times.
  - Otherwise, if using the default rate limiter, exponential backoff might exceed the timeout of Eventually calls and cause flakes.
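Putting several of the points above together, the following is a minimal sketch of a suite setup against envtest; the package, suite name, and namespace prefix are illustrative and not taken from the Gardener code base:

package example_test // hypothetical package

import (
	"context"
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

func TestExample(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Example Integration Test Suite")
}

var (
	ctx           = context.Background()
	testEnv       *envtest.Environment
	testClient    client.Client
	testNamespace *corev1.Namespace
)

var _ = BeforeSuite(func() {
	testEnv = &envtest.Environment{}
	restConfig, err := testEnv.Start()
	Expect(err).NotTo(HaveOccurred())
	DeferCleanup(func() {
		Expect(testEnv.Stop()).To(Succeed())
	})

	// Dedicated, uncached test client (do not reuse a manager's cached client).
	testClient, err = client.New(restConfig, client.Options{})
	Expect(err).NotTo(HaveOccurred())

	// Dedicated namespace per test suite, created with GenerateName.
	testNamespace = &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{GenerateName: "example-test-"}}
	Expect(testClient.Create(ctx, testNamespace)).To(Succeed())
	DeferCleanup(func() {
		Expect(testClient.Delete(ctx, testNamespace)).To(Succeed())
	})
})

var _ = Describe("Example", func() {
	It("should eventually observe the created ConfigMap", func() {
		// Dedicated test resource per test case, again via GenerateName, cleaned up via DeferCleanup.
		configMap := &corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{GenerateName: "test-", Namespace: testNamespace.Name}}
		Expect(testClient.Create(ctx, configMap)).To(Succeed())
		DeferCleanup(func() {
			Expect(testClient.Delete(ctx, configMap)).To(Succeed())
		})

		// Asynchronous assertion: pass g Gomega, use g.Expect inside, and call Should.
		Eventually(func(g Gomega) {
			g.Expect(testClient.Get(ctx, client.ObjectKeyFromObject(configMap), &corev1.ConfigMap{})).To(Succeed())
		}).Should(Succeed())
	})
})

Note that a real suite would additionally start the tested controller with a manager restricted to the test namespace; this sketch only shows the environment scaffolding.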
End-to-End (e2e) Tests (Using provider-local)
We run a suite of e2e tests on every pull request and periodically on the master branch.
It uses a KinD cluster and skaffold to bootstrap a full installation of Gardener based on the current revision, including provider-local.
This allows us to run e2e tests in an isolated test environment and fully locally without any infrastructure interaction.
The tests perform a set of operations on Shoot clusters, e.g. creating, deleting, hibernating and waking up.
These tests are executed in our prow instance at prow.gardener.cloud, see job definition and job history.
Running e2e Tests
You can also run these tests on your development machine, using the following commands:
make kind-up
export KUBECONFIG=$PWD/example/gardener-local/kind/local/kubeconfig
make gardener-up
make test-e2e-local # alternatively: make test-e2e-local-simple
If you want to run a specific set of e2e test cases, you can also execute them using ./hack/test-e2e-local.sh
directly in combination with ginkgo label filters. For example:
./hack/test-e2e-local.sh --label-filter "Shoot && credentials-rotation"
If you want to use an existing shoot instead of creating a new one for the test case and deleting it afterwards, you can specify the existing shoot via the following flags. This can be useful to speed up the development of e2e tests.
./hack/test-e2e-local.sh --label-filter "Shoot && credentials-rotation" -- --project-namespace=garden-local --existing-shoot-name=local
For more information, see Developing Gardener Locally and Deploying Gardener Locally.
Debugging e2e Tests
When debugging e2e test failures in CI, logs of the cluster components can be very helpful.
Our e2e test jobs export logs of all containers running in the kind cluster to prow’s artifacts storage.
You can find them by clicking the Artifacts
link in the top bar in prow’s job view and navigating to artifacts
.
This directory will contain all cluster component logs grouped by node.
Pull all artifacts using gsutil
for searching and filtering the logs locally (use the path displayed in the artifacts view):
gsutil cp -r gs://gardener-prow/pr-logs/pull/gardener_gardener/6136/pull-gardener-e2e-kind/1542030416616099840/artifacts/gardener-local-control-plane /tmp
Purpose of e2e Tests
- e2e tests provide a high level of confidence that our code runs as expected by users when deployed to production.
- They are supposed to catch bugs resulting from interaction between multiple components.
- Test cases should be as close as possible to real usage by end users:
- You should test “from the perspective of the user” (or operator).
- Example: I create a Shoot and expect to be able to connect to it via the provided kubeconfig.
- Accordingly, don’t assert details of the system.
- e.g., the user also wouldn’t expect that there is a kube-apiserver deployment in the seed, they rather expect that they can talk to it no matter how it is deployed
- Only assert details of the system if the tested feature is not fully visible to the end-user and there is no other way of ensuring that the feature works reliably
- e.g., the Shoot CA rotation is not fully visible to the user but is assertable by looking at the secrets in the Seed.
- Pro: can be executed by developers and users without any real infrastructure (provider-local).
- Con: they currently cannot be executed with real infrastructure (e.g., provider-aws), we will work on this as part of #6016.
- Keep in mind that the tested scenario is still artificial in a sense of using default configuration, only a few objects, only a few config/settings combinations are covered.
- We will never be able to cover the full “test matrix” and this should not be our goal.
- Bugs will still be released and will still happen in production; we can’t avoid it.
- Instead, we should add test cases for preventing bugs in features or settings that were frequently regressed: example PR
- Usually e2e tests cover the “straight-forward cases”.
- However, negative test cases can also be included, especially if they are important from the user’s perspective.
Writing e2e Tests
- Always wrap API calls and similar things in Eventually blocks: example test
  - At this point, we are pretty much working with a distributed system and failures can happen anytime.
  - Wrapping calls in Eventually makes tests more stable and more realistic (usually, you wouldn’t call the system broken if a single API call fails because of a short connectivity issue).
- Most of the points from writing integration tests are relevant for e2e tests as well (especially the points about asynchronous assertions).
- In contrast to integration tests, in e2e tests it might make sense to specify higher timeouts for Eventually calls, e.g., when waiting for a Shoot to be reconciled.
  - Generally, try to use the default settings for Eventually specified via the environment variables.
  - Only set higher timeouts if waiting for long-running reconciliations to be finished.
Gardener Upgrade Tests (Using provider-local)
Gardener upgrade tests set up a kind cluster and deploy Gardener version vX.X.X before upgrading it to a given version vY.Y.Y.
This allows verifying whether the current (unreleased) revision/branch (or a specific release) is compatible with the latest (or a specific other) release. The GARDENER_PREVIOUS_RELEASE and GARDENER_NEXT_RELEASE environment variables are used to specify the respective versions.
This helps to understand what happens or how the system reacts when Gardener upgrades from version vX.X.X to vY.Y.Y for existing shoots in different states (creation/hibernation/wakeup/deletion). Gardener upgrade tests also help qualifying releases for all flavors (non-HA or HA with failure tolerance node/zone).
Just like E2E tests, upgrade tests also use a KinD cluster and skaffold for bootstrapping a full Gardener installation based on the current revision/branch, including provider-local. This allows running e2e tests in an isolated test environment, fully locally without any infrastructure interaction. The tests perform a set of operations on Shoot clusters, e.g. create, delete, hibernate and wake up.
Below is a sequence describing how the tests are performed.
- Create a kind cluster.
- Install Gardener version vX.X.X.
- Run gardener pre-upgrade tests which are labeled with pre-upgrade.
- Upgrade Gardener version from vX.X.X to vY.Y.Y.
- Run gardener post-upgrade tests which are labeled with post-upgrade.
- Tear down the seed and kind cluster.
How to Run Upgrade Tests Between Two Gardener Releases
Sometimes, we need to verify/qualify two Gardener releases when we upgrade from one version to another.
This can be performed by fetching the two Gardener versions from the GitHub Gardener release page and setting the appropriate env variables GARDENER_PREVIOUS_RELEASE and GARDENER_NEXT_RELEASE.
- GARDENER_PREVIOUS_RELEASE – This env variable refers to a source revision/branch (or a specific release) which has to be installed first and then upgraded to version GARDENER_NEXT_RELEASE. By default, it fetches the latest release version from the GitHub Gardener release page.
- GARDENER_NEXT_RELEASE – This env variable refers to the target revision/branch (or a specific release) to be upgraded to after successful installation of GARDENER_PREVIOUS_RELEASE. By default, it considers the local HEAD revision, builds the code, and installs Gardener from the current revision where the Gardener upgrade tests are triggered.
make ci-e2e-kind-upgrade GARDENER_PREVIOUS_RELEASE=v1.60.0 GARDENER_NEXT_RELEASE=v1.61.0
make ci-e2e-kind-ha-single-zone-upgrade GARDENER_PREVIOUS_RELEASE=v1.60.0 GARDENER_NEXT_RELEASE=v1.61.0
make ci-e2e-kind-ha-multi-zone-upgrade GARDENER_PREVIOUS_RELEASE=v1.60.0 GARDENER_NEXT_RELEASE=v1.61.0
Purpose of Upgrade Tests
- Tests will ensure that shoot clusters reconciled with the previous version of Gardener work as expected even with the next Gardener version.
- This will reproduce or catch actual issues faced by end users.
- One of the test cases ensures no downtime is faced by the end-users for shoots while upgrading Gardener if the shoot’s control-plane is configured as HA.
Writing Upgrade Tests
- Tests are divided into two parts and labeled with pre-upgrade and post-upgrade labels.
- An example test case ensures that a shoot which was hibernated in a previous Gardener release wakes up as expected in the next release:
  - Creating and hibernating the shoot is the pre-upgrade test case, which should be labeled with the pre-upgrade label.
  - Waking up and deleting the shoot is the post-upgrade test case, which should be labeled with the post-upgrade label.
Test Machinery Tests
Please see Test Machinery Tests.
Purpose of Test Machinery Tests
- Test machinery tests have to be executed against full-blown Gardener installations.
- They can provide a very high level of confidence that an installation is functional in its current state, this includes: all Gardener components, Extensions, the used Cloud Infrastructure, all relevant settings/configuration.
- This brings the following benefits:
- They test more realistic scenarios than e2e tests (real configuration, real infrastructure, etc.).
- Tests run “where the users are”.
- However, this also brings significant drawbacks:
- Tests are difficult to develop and maintain.
- Tests require a full Gardener installation and cannot be executed in CI (on PR-level or against master).
- Tests require real infrastructure (think cloud provider credentials, cost).
- Using
TestDefinitions
under.test-defs
requires a full test machinery installation. - Accordingly, tests are heavyweight and expensive to run.
- Testing against real infrastructure can cause flakes sometimes (e.g., in outage situations).
- Failures are hard to debug, because clusters are deleted after the test (for obvious cost reasons).
- Bugs can only be caught, once it’s “too late”, i.e., when code is merged and deployed.
- Today, test machinery tests cover a bigger “test matrix” (e.g., Shoot creation across infrastructures, kubernetes versions, machine image versions, etc.).
- Test machinery also runs Kubernetes conformance tests.
- However, because of the listed drawbacks, we should rather focus on augmenting our e2e tests, as we can run them locally and in CI in order to catch bugs before they get merged.
- It’s still a good idea to add test machinery tests if a feature that is depending on some installation-specific configuration needs to be tested.
Writing Test Machinery Tests
- Generally speaking, most points from writing integration tests and writing e2e tests apply here as well.
- However, test machinery tests contain a lot of technical debt and existing code doesn’t follow these best practices.
- As test machinery tests are out of our general focus, we don’t intend on reworking the tests soon or providing more guidance on how to write new ones.
Manual Tests
- Manual tests can be useful when the cost of trying to automatically test certain functionality are too high.
- Useful for PR verification, if a reviewer wants to verify that all cases are properly tested by automated tests.
- Currently, it’s the simplest option for testing upgrade scenarios.
- e.g. migration coding is probably best tested manually, as it’s a high effort to write an automated test for little benefit
- Obviously, the need for manual tests should be kept at a bare minimum.
- Instead, we should add e2e tests wherever sensible/valuable.
- We want to implement some form of general upgrade tests as part of #6016.
18 - Testmachinery Tests
Test Machinery Tests
In order to automatically qualify Gardener releases, we execute a set of end-to-end tests using Test Machinery. This requires a full Gardener installation including infrastructure extensions, as well as a setup of Test Machinery itself. These tests operate on Shoot clusters across different Cloud Providers, using different supported Kubernetes versions and various configuration options (huge test matrix).
This manual gives an overview about test machinery tests in Gardener.
Structure
Gardener test machinery tests are split into two test suites that can be found under test/testmachinery/suites
:
- The Gardener Test Suite contains all tests that only require a running gardener instance.
- The Shoot Test Suite contains all tests that require a predefined running shoot cluster.
The corresponding tests of a test suite are defined in the import statement of the suite definition (see shoot/run_suite_test.go
)
and their source code can be found under test/testmachinery
.
The test
directory is structured as follows:
test
├── e2e # end-to-end tests (using provider-local)
│ └── shoot
├── framework # helper code shared across integration, e2e and testmachinery tests
├── integration # integration tests (envtests)
│ ├── controllermanager
│ ├── envtest
│ ├── resourcemanager
│ ├── scheduler
│ ├── shootmaintenance
│ └── ...
└── testmachinery # test machinery tests
├── gardener # actual test cases imported by suites/gardener
│ └── security
├── shoots # actual test cases imported by suites/shoot
│ ├── applications
│ ├── care
│ ├── logging
│ ├── operatingsystem
│ ├── operations
│ └── vpntunnel
├── suites # suites that run against a running garden or shoot cluster
│ ├── gardener
│ └── shoot
└── system # suites that are used for building a full test flow
├── complete_reconcile
├── managed_seed_creation
├── managed_seed_deletion
├── shoot_cp_migration
├── shoot_creation
├── shoot_deletion
├── shoot_hibernation
├── shoot_hibernation_wakeup
└── shoot_update
A suite can be executed by running the suite definition with ginkgo’s focus and skip flags to control the execution of specific labeled tests. See the example below:
go test -timeout=0 -mod=vendor ./test/testmachinery/suites/shoot \
--v -ginkgo.v -ginkgo.progress -ginkgo.no-color \
--report-file=/tmp/report.json \ # write elasticsearch formatted output to a file
--disable-dump=false \ # disables dumping of the current state if a test fails
-kubecfg=/path/to/gardener/kubeconfig \
-shoot-name=<shoot-name> \ # Name of the shoot to test
-project-namespace=<gardener project namespace> \ # Name of the gardener project the test shoot resides in
-ginkgo.focus="\[RELEASE\]" \ # Run all tests that are tagged as release
-ginkgo.skip="\[SERIAL\]|\[DISRUPTIVE\]" # Exclude all tests that are tagged SERIAL or DISRUPTIVE
Add a New Test
To add a new test, the framework requires the following steps (steps 1 and 2 can be skipped if the test is added to an existing package):
- Create a new test file e.g.
test/testmachinery/shoot/security/my-sec-test.go
- Import the test into the appropriate test suite (gardener or shoot):
import _ "github.com/gardener/gardener/test/testmachinery/shoot/security"
- Define your test with the testframework. The framework will automatically add its initialization, cleanup and dump functions.
var _ = ginkgo.Describe("my suite", func(){
f := framework.NewShootFramework(nil)
f.Beta().CIt("my first test", func(ctx context.Context) {
f.ShootClient.Get(xx)
// testing ...
})
})
The newly created test can be tested by focusing the test with the default ginkgo focus f.Beta().FCIt("my first test", func(ctx context.Context)
and running the shoot test suite with:
go test -timeout=0 -mod=vendor ./test/testmachinery/suites/shoot \
--v -ginkgo.v -ginkgo.progress -ginkgo.no-color \
--report-file=/tmp/report.json \ # write elasticsearch formatted output to a file
--disable-dump=false \ # disables dumping of the current state if a test fails
-kubecfg=/path/to/gardener/kubeconfig \
-shoot-name=<shoot-name> \ # Name of the shoot to test
-project-namespace=<gardener project namespace> \
-fenced=<true|false> # Tested shoot is running in a fenced environment and cannot be reached by gardener
or for the gardener suite with:
go test -timeout=0 -mod=vendor ./test/testmachinery/suites/gardener \
--v -ginkgo.v -ginkgo.progress -ginkgo.no-color \
--report-file=/tmp/report.json \ # write elasticsearch formatted output to a file
--disable-dump=false \ # disables dumping of the current state if a test fails
-kubecfg=/path/to/gardener/kubeconfig \
-project-namespace=<gardener project namespace>
⚠️ Make sure that you do not commit any focused specs as this feature is only intended for local development! Ginkgo will fail the test suite if there are any focused specs.
Alternatively, a test can be triggered by specifying a ginkgo focus regex with the name of the test e.g.
go test -timeout=0 -mod=vendor ./test/testmachinery/suites/gardener \
--v -ginkgo.v -ginkgo.progress -ginkgo.no-color \
--report-file=/tmp/report.json \ # write elasticsearch formatted output to a file
-kubecfg=/path/to/gardener/kubeconfig \
-project-namespace=<gardener project namespace> \
-ginkgo.focus="my first test" # regex to match test cases
Test Labels
Every test should be labeled by using the predefined labels available with every framework to have consistent labeling across all test machinery tests.
The labels are applied to every new It()/CIt()
definition by:
f := framework.NewCommonFramework()
f.Default().Serial().It("my test") => "[DEFAULT] [SERIAL] my test"
f := framework.NewShootFramework()
f.Default().Serial().It("my test") => "[DEFAULT] [SERIAL] [SHOOT] my test"
f := framework.NewGardenerFramework()
f.Default().Serial().It("my test") => "[DEFAULT] [GARDENER] [SERIAL] my test"
Labels:
- Beta: Newly created tests with no experience on stableness should be first labeled as beta tests. They should be watched (and probably improved) until stable enough to be promoted to Default.
- Default: Tests that were Beta before and proved to be stable are promoted to Default eventually. Default tests run more often, produce alerts and are considered during the release decision although they don’t necessarily block a release.
- Release: Tests are release relevant. A failing Release test blocks the release pipeline. Therefore, these tests need to be stable. Only tests proven to be stable will eventually be promoted to Release.
Behavior Labels:
- Serial: The test should always be executed in serial with no other tests running, as it may impact other tests.
- Destructive: The test is destructive, which means that it runs with no other tests and may break Gardener or the shoot. Only create such tests if really necessary, as the execution will be expensive (neither Gardener nor the shoot can be reused in this case for other tests).
Framework
The framework directory contains all the necessary functions / utilities for running test machinery tests. For example, there are methods for creation/deletion of shoots, waiting for shoot deletion/creation, downloading/installing/deploying helm charts, logging, etc.
The framework itself consists of 3 different frameworks that expect different prerequisites and offer context specific functionality.
- CommonFramework: The common framework is the base framework that handles logging and setup of commonly needed resources like helm. It also contains common functions for interacting with Kubernetes clusters like "Waiting for resources to be ready" or "Exec into a running pod".
- GardenerFramework: contains all functions of the common framework and expects a running Gardener instance with the provided Gardener kubeconfig and a project namespace. It also contains functions to interact with gardener like "Waiting for a shoot to be reconciled", "Patch a shoot", or "Get a seed".
- ShootFramework: contains all functions of the common and the gardener framework. It expects a running shoot cluster defined by the shoot’s name and namespace (project namespace). This framework contains functions to directly interact with the specific shoot.
The whole framework also includes commonly used checks, a ginkgo wrapper, etc., as well as commonly used tests. These common application tests (like the guestbook test) can be used within multiple tests to have a default application (with ingress, deployment, and stateful backend) to test external factors.
Config
Every framework commandline flag can also be defined by a configuration file (the value of the configuration file is only used if a flag is not specified by commandline).
The test suite searches for a configuration file (yaml is preferred) if the command line flag --config=/path/to/config/file
is provided.
A framework can be defined in the configuration file by just using the flag name as root key e.g.
verbose: debug
kubecfg: /kubeconfig/path
project-namespace: garden-it
Report
The framework automatically writes the ginkgo default report to stdout and a specifically structured elasticsearch bulk report file to a specified location. The elasticsearch bulk report will write one JSON document per test case and inject the metadata of the whole test suite. An example document for one test case would look like the following:
{
"suite": {
"name": "Shoot Test Suite",
"phase": "Succeeded",
"tests": 3,
"failures": 1,
"errors": 0,
"time": 87.427
},
"name": "Shoot application testing [DEFAULT] [RELEASE] [SHOOT] should download shoot kubeconfig successfully",
"shortName": "should download shoot kubeconfig successfully",
"labels": [
"DEFAULT",
"RELEASE",
"SHOOT"
],
"phase": "Succeeded",
"time": 0.724512057
}
Resources
The resources directory contains all the templates, helm config files (e.g., repositories.yaml, charts, and cache index which are downloaded upon the start of the test), shoot configs, etc.
resources
├── charts
├── repository
│ └── repositories.yaml
└── templates
├── guestbook-app.yaml.tpl
└── logger-app.yaml.tpl
There are two special directories that are dynamically filled with the correct test files:
- charts - the charts will be downloaded and saved in this directory
- repository - contains the repository.yaml file that the target helm repos will be read from and the cache where the stable-index.yaml file will be created
System Tests
This directory contains the system tests that have a special meaning for the testmachinery with their own Test Definition. Currently, these system tests consist of:
- Shoot creation
- Shoot deletion
- Shoot Kubernetes update
- Gardener Full reconcile check
Shoot Creation Test
Create Shoot test is meant to test shoot creation.
Example Run
go test -mod=vendor -timeout=0 ./test/testmachinery/system/shoot_creation \
--v -ginkgo.v -ginkgo.progress \
-kubecfg=$HOME/.kube/config \
-shoot-name=$SHOOT_NAME \
-cloud-profile=$CLOUDPROFILE \
-seed=$SEED \
-secret-binding=$SECRET_BINDING \
-provider-type=$PROVIDER_TYPE \
-region=$REGION \
-k8s-version=$K8S_VERSION \
-project-namespace=$PROJECT_NAMESPACE \
-annotations=$SHOOT_ANNOTATIONS \
-infrastructure-provider-config-filepath=$INFRASTRUCTURE_PROVIDER_CONFIG_FILEPATH \
-controlplane-provider-config-filepath=$CONTROLPLANE_PROVIDER_CONFIG_FILEPATH \
-workers-config-filepath=$WORKERS_CONFIG_FILEPATH \
-worker-zone=$ZONE \
-networking-pods=$NETWORKING_PODS \
-networking-services=$NETWORKING_SERVICES \
-networking-nodes=$NETWORKING_NODES \
-start-hibernated=$START_HIBERNATED
Shoot Deletion Test
Delete Shoot test is meant to test the deletion of a shoot.
Example Run
go test -mod=vendor -timeout=0 -ginkgo.v -ginkgo.progress \
./test/testmachinery/system/shoot_deletion \
-kubecfg=$HOME/.kube/config \
-shoot-name=$SHOOT_NAME \
-project-namespace=$PROJECT_NAMESPACE
Shoot Update Test
The Update Shoot test is meant to test the Kubernetes version update of an existing shoot. If no specific version is provided, the next patch version is automatically selected. If there is no available newer version, this test is a no-op.
Example Run
go test -mod=vendor -timeout=0 ./test/testmachinery/system/shoot_update \
--v -ginkgo.v -ginkgo.progress \
-kubecfg=$HOME/.kube/config \
-shoot-name=$SHOOT_NAME \
-project-namespace=$PROJECT_NAMESPACE \
-version=$K8S_VERSION
Gardener Full Reconcile Test
The Gardener Full Reconcile test is meant to test if all shoots of a Gardener instance are successfully reconciled.
Example Run
go test -mod=vendor -timeout=0 ./test/testmachinery/system/complete_reconcile \
--v -ginkgo.v -ginkgo.progress \
-kubecfg=$HOME/.kube/config \
-project-namespace=$PROJECT_NAMESPACE \
-gardenerVersion=$GARDENER_VERSION # needed to validate the last acted gardener version of a shoot