Extensions
Extensibility Overview
Initially, everything was developed in-tree in the Gardener project. All cloud providers and the configuration for all the supported operating systems were released together with the Gardener core itself.
But as the project grew, it got more and more difficult to add new providers and maintain the existing code base.
As a consequence and in order to become agile and flexible again, we proposed GEP-1 (Gardener Enhancement Proposal).
The document describes an out-of-tree extension architecture that keeps the Gardener core logic independent of provider-specific knowledge (similar to what Kubernetes has achieved with out-of-tree cloud providers or with CSI volume plugins).
Basic Concepts
Gardener keeps running in the “garden cluster” and implements the core logic of shoot cluster reconciliation / deletion.
Extensions are Kubernetes controllers themselves (like Gardener) and run in the seed clusters.
As usual, we try to use Kubernetes wherever applicable.
We rely on Kubernetes extension concepts in order to enable extensibility for Gardener.
The main ideas of GEP-1 are the following:
- During the shoot reconciliation process, Gardener will write CRDs into the seed cluster that are watched and managed by the extension controllers. They will reconcile (based on the .spec) and report whether everything went well or errors occurred in the CRD’s .status field.
- Gardener keeps deploying the provider-independent control plane components (etcd, kube-apiserver, etc.). However, some of these components might still need little customization by providers, e.g., additional configuration, flags, etc. In this case, the extension controllers register webhooks in order to manipulate the manifests.
Example 1:
Gardener creates a new AWS shoot cluster and requires the preparation of infrastructure in order to proceed (networks, security groups, etc.).
It writes the following CRD into the seed cluster:
```yaml
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--core--aws-01
spec:
  type: aws
  providerConfig:
    apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureConfig
    networks:
      vpc:
        cidr: 10.250.0.0/16
      internal:
      - 10.250.112.0/22
      public:
      - 10.250.96.0/22
      workers:
      - 10.250.0.0/19
    zones:
    - eu-west-1a
  dns:
    apiserver: api.aws-01.core.example.com
  region: eu-west-1
  secretRef:
    name: my-aws-credentials
  sshPublicKey: |
    base64(key)
```
Please note that the .spec.providerConfig
is a raw blob and not evaluated or known in any way by Gardener.
Instead, it was specified by the user (in the Shoot
resource) and just “forwarded” to the extension controller.
Only the AWS controller understands this configuration and will now start provisioning/reconciling the infrastructure.
It reports in the .status
field the result:
```yaml
status:
  observedGeneration: ...
  state: ...
  lastError: ...
  lastOperation: ...
  providerStatus:
    apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureStatus
    vpc:
      id: vpc-1234
      subnets:
      - id: subnet-acbd1234
        name: workers
        zone: eu-west-1
      securityGroups:
      - id: sg-xyz12345
        name: workers
    iam:
      nodesRoleARN: <some-arn>
      instanceProfileName: foo
    ec2:
      keyName: bar
```
Gardener waits until the .status.lastOperation
/ .status.lastError
indicates that the operation reached a final state and either continues with the next step or stops and reports the potential error.
The extension-specific output in .status.providerStatus
is - similar to .spec.providerConfig
- not evaluated, and simply forwarded to CRDs in subsequent steps.
Example 2:
Gardener deploys the control plane components into the seed cluster, e.g. the kube-controller-manager
deployment with the following flags:
```yaml
apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    spec:
      containers:
      - command:
        - /usr/local/bin/kube-controller-manager
        - --allocate-node-cidrs=true
        - --attach-detach-reconcile-sync-period=1m0s
        - --controllers=*,bootstrapsigner,tokencleaner
        - --cluster-cidr=100.96.0.0/11
        - --cluster-name=shoot--core--aws-01
        - --cluster-signing-cert-file=/srv/kubernetes/ca/ca.crt
        - --cluster-signing-key-file=/srv/kubernetes/ca/ca.key
        - --concurrent-deployment-syncs=10
        - --concurrent-replicaset-syncs=10
        ...
```
The AWS controller requires some additional flags in order to make the cluster functional.
It needs to provide a Kubernetes cloud-config and also some cloud-specific flags.
Consequently, it registers a MutatingWebhookConfiguration
on Deployment
s and adds these flags to the container:
- --cloud-provider=external
- --external-cloud-volume-plugin=aws
- --cloud-config=/etc/kubernetes/cloudprovider/cloudprovider.conf
Of course, it would have needed to create a ConfigMap
containing the cloud config and to add the proper volume
and volumeMounts
to the manifest as well.
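For illustration, such a “controlplane” webhook registration could look roughly like the following sketch; the webhook name, service reference, and path are assumptions and not part of any Gardener contract:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: gardener-extension-provider-aws-controlplane  # hypothetical name
webhooks:
- name: controlplane.aws.provider.extensions.gardener.cloud  # illustrative
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
  namespaceSelector:
    matchLabels:
      shoot.gardener.cloud/provider: aws  # only act in AWS shoot namespaces
  clientConfig:
    service:
      name: gardener-extension-provider-aws  # assumed webhook service
      namespace: extension-provider-aws      # assumed namespace
      path: /webhooks/controlplane           # assumed path
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
```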
(Please note for this special example: The Kubernetes community is also working on making the kube-controller-manager
provider-independent.
However, there will most probably be still components other than the kube-controller-manager
which need to be adapted by extensions.)
If you are interested in writing an extension, or generally in digging deeper to find out the nitty-gritty details of the extension concepts, please read GEP-1.
We are truly looking forward to your feedback!
Current Status
Meanwhile, the out-of-tree extension architecture of Gardener is in place and has been productively validated. We are tracking all internal and external extensions of Gardener in the Gardener Extensions Library repo.
1 - Access to the Garden Cluster for Extensions
Gardener offers different means to provide or equip registered extensions with a kubeconfig which may be used to connect to the garden cluster.
Admission Controllers
For extensions with an admission controller deployment, gardener-operator
injects a token-based kubeconfig as a volume and volume mount.
The token is valid for 12h
, automatically renewed, and associated with a dedicated ServiceAccount
in the garden cluster.
The path to this kubeconfig is exposed via the GARDEN_KUBECONFIG
environment variable, which is also added to the pod spec(s).
Extensions on Seed Clusters
Extensions that are installed on seed clusters via a ControllerInstallation
can simply read the kubeconfig file specified by the GARDEN_KUBECONFIG
environment variable to create a garden cluster client.
With this, they use a short-lived token (valid for 12h
) associated with a dedicated ServiceAccount
in the seed-<seed-name>
namespace to securely access the garden cluster.
The used ServiceAccounts
are granted permissions in the garden cluster similar to gardenlet clients.
Background
Historically, gardenlet
has been the only component running in the seed cluster that has access to both the seed cluster and the garden cluster.
Accordingly, extensions running on the seed cluster didn’t have access to the garden cluster.
Starting from Gardener v1.74.0
, there is a new mechanism for components running on seed clusters to get access to the garden cluster.
For this, gardenlet
runs an instance of the TokenRequestor
for requesting tokens that can be used to communicate with the garden cluster.
Using Gardenlet-Managed Garden Access
By default, extensions are equipped with secure access to the garden cluster using a dedicated ServiceAccount
without requiring any additional action.
They can simply read the file specified by the GARDEN_KUBECONFIG
and construct a garden client with it.
When installing a ControllerInstallation
, gardenlet creates two secrets in the installation’s namespace: a generic garden kubeconfig (generic-garden-kubeconfig-<hash>
) and a garden access secret (garden-access-extension
).
Note that the ServiceAccount
created based on this access secret will be created in the respective seed-*
namespace in the garden cluster and labelled with controllerregistration.core.gardener.cloud/name=<name>
.
Additionally, gardenlet injects volume
, volumeMounts
, and two environment variables into all (init) containers in all objects in the apps
and batch
API groups:
- GARDEN_KUBECONFIG: points to the path where the generic garden kubeconfig is mounted.
- SEED_NAME: set to the name of the Seed where the extension is installed. This is useful for restricting watches in the garden cluster to relevant objects.
If an object already contains the GARDEN_KUBECONFIG
environment variable, it is not overwritten and injection of volume
and volumeMounts
is skipped.
For example, a Deployment
deployed via a ControllerInstallation
will be mutated as follows:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gardener-extension-provider-local
  annotations:
    reference.resources.gardener.cloud/secret-795f7ca6: garden-access-extension
    reference.resources.gardener.cloud/secret-d5f5a834: generic-garden-kubeconfig-81fb3a88
spec:
  template:
    metadata:
      annotations:
        reference.resources.gardener.cloud/secret-795f7ca6: garden-access-extension
        reference.resources.gardener.cloud/secret-d5f5a834: generic-garden-kubeconfig-81fb3a88
    spec:
      containers:
      - name: gardener-extension-provider-local
        env:
        - name: GARDEN_KUBECONFIG
          value: /var/run/secrets/gardener.cloud/garden/generic-kubeconfig/kubeconfig
        - name: SEED_NAME
          value: local
        volumeMounts:
        - mountPath: /var/run/secrets/gardener.cloud/garden/generic-kubeconfig
          name: garden-kubeconfig
          readOnly: true
      volumes:
      - name: garden-kubeconfig
        projected:
          defaultMode: 420
          sources:
          - secret:
              items:
              - key: kubeconfig
                path: kubeconfig
              name: generic-garden-kubeconfig-81fb3a88
              optional: false
          - secret:
              items:
              - key: token
                path: token
              name: garden-access-extension
              optional: false
```
The generic garden kubeconfig will look like this:
```yaml
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: LS0t...
    server: https://garden.local.gardener.cloud:6443
  name: garden
contexts:
- context:
    cluster: garden
    user: extension
  name: garden
current-context: garden
users:
- name: extension
  user:
    tokenFile: /var/run/secrets/gardener.cloud/garden/generic-kubeconfig/token
```
Manually Requesting a Token for the Garden Cluster
Seed components that need to communicate with the garden cluster can request a token in the garden cluster by creating a garden access secret.
This secret has to be labelled with resources.gardener.cloud/purpose=token-requestor
and resources.gardener.cloud/class=garden
, e.g.:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: garden-access-example
  namespace: example
  labels:
    resources.gardener.cloud/purpose: token-requestor
    resources.gardener.cloud/class: garden
  annotations:
    serviceaccount.resources.gardener.cloud/name: example
type: Opaque
```
This will instruct gardenlet to create a new ServiceAccount
named example
in its own seed-<seed-name>
namespace in the garden cluster, request a token for it, and populate the token in the secret’s data under the token
key.
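Once gardenlet has reconciled such a secret, it could look roughly like this (token value illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: garden-access-example
  namespace: example
  labels:
    resources.gardener.cloud/purpose: token-requestor
    resources.gardener.cloud/class: garden
  annotations:
    serviceaccount.resources.gardener.cloud/name: example
type: Opaque
data:
  token: <base64-encoded-token>  # populated and renewed by gardenlet
```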
Permissions in the Garden Cluster
Both the SeedAuthorizer
and the SeedRestriction
plugin handle extensions clients and generally grant the same permissions in the garden cluster to them as to gardenlet clients.
With this, extensions are restricted to working with objects in the garden cluster that are related to the seed they are running on, just like gardenlet.
Note that if the plugins are not enabled, extension clients are only granted read access to global resources like CloudProfiles
(this is granted to all authenticated users).
There are a few exceptions to the granted permissions as documented here.
Additional Permissions
If an extension needs access to additional resources in the garden cluster (e.g., extension-specific custom resources), permissions need to be granted via the usual RBAC means.
Let’s consider the following example: An extension requires the privileges to create authorization.k8s.io/v1.SubjectAccessReview
s (which is not covered by the “default” permissions mentioned above).
This requires a human Gardener operator to create a ClusterRole
in the garden cluster with the needed rules:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: extension-create-subjectaccessreviews
  annotations:
    authorization.gardener.cloud/extensions-serviceaccount-selector: '{"matchLabels":{"controllerregistration.core.gardener.cloud/name":"<extension-name>"}}'
  labels:
    authorization.gardener.cloud/custom-extensions-permissions: "true"
rules:
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
```
Note the label authorization.gardener.cloud/extensions-serviceaccount-selector
which contains a label selector for ServiceAccount
s.
There is a controller, part of gardener-controller-manager, which takes care of maintaining the respective ClusterRoleBinding
resources.
It binds all ServiceAccount
s in the seed namespaces in the garden cluster (i.e., all extension clients) whose labels match.
You can read more about this controller here.
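For illustration, a ClusterRoleBinding maintained by this controller might look roughly like the following; the binding name and the matched ServiceAccount are assumptions:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: extension-create-subjectaccessreviews  # illustrative; the actual name is managed by Gardener
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: extension-create-subjectaccessreviews
subjects:
- kind: ServiceAccount
  name: extension-provider-foo  # a matching extension ServiceAccount (illustrative)
  namespace: seed-my-seed       # seed namespace in the garden cluster (illustrative)
```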
Custom Permissions
If an extension wants to create a dedicated ServiceAccount
for accessing the garden cluster without automatically inheriting all permissions of the gardenlet, it first needs to create a garden access secret in its extension namespace in the seed cluster:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-custom-component
  namespace: <extension-namespace>
  labels:
    resources.gardener.cloud/purpose: token-requestor
    resources.gardener.cloud/class: garden
  annotations:
    serviceaccount.resources.gardener.cloud/name: my-custom-component-extension-foo
    serviceaccount.resources.gardener.cloud/labels: '{"foo":"bar"}'
type: Opaque
```
❗️️Do not prefix the service account name with extension-
to prevent inheriting the gardenlet permissions! It is still recommended to add the extension name (e.g., as a suffix) for easier identification where this ServiceAccount
comes from.
Next, you can follow the same approach described above.
However, the authorization.gardener.cloud/extensions-serviceaccount-selector
annotation should not contain controllerregistration.core.gardener.cloud/name=<extension-name>
but rather custom labels, e.g. foo=bar
.
This way, the created ServiceAccount
will only get the permissions of above ClusterRole
and nothing else.
Renewing All Garden Access Secrets
Operators can trigger an automatic renewal of all garden access secrets in a given Seed
and their requested ServiceAccount
tokens, e.g., when rotating the garden cluster’s ServiceAccount
signing key.
For this, the Seed
has to be annotated with gardener.cloud/operation=renew-garden-access-secrets
.
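For example, annotating a Seed for renewal could look like this (seed name illustrative):

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Seed
metadata:
  name: my-seed
  annotations:
    gardener.cloud/operation: renew-garden-access-secrets
```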
2 - Admission
Extension Admission
The extensions are expected to validate their respective resources for their extension-specific configurations when the resources are newly created or updated. For example, provider extensions would validate spec.provider.infrastructureConfig
and spec.provider.controlPlaneConfig
in the Shoot
resource and spec.providerConfig
in the CloudProfile
resource, networking extensions would validate spec.networking.providerConfig
in the Shoot
resource. As best practice, the validation should be performed only if there is a change in the spec
of the resource. Please find an exemplary implementation in the gardener/gardener-extension-provider-aws repository.
When a resource is newly created or updated, Gardener adds an extension label for all the extension types referenced in the spec
of the resource. This label is of the form <extension-type>.extensions.gardener.cloud/<extension-name> : "true"
. For example, an extension label for a provider extension type aws
looks like provider.extensions.gardener.cloud/aws : "true"
. The extensions should add object selectors in their admission webhooks for these labels, to filter out the objects they are responsible for. At present, these labels are added to BackupEntry
s, BackupBucket
s, CloudProfile
s, Seed
s, SecretBinding
s and Shoot
s. Please see the types_constants.go file for the full list of extension labels.
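For illustration, an admission webhook of an aws provider extension could use such an object selector roughly as follows; the webhook name, service reference, and path are assumptions:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gardener-extension-admission-aws  # hypothetical name
webhooks:
- name: validation.aws.provider.admission.gardener.cloud  # illustrative
  objectSelector:
    matchLabels:
      provider.extensions.gardener.cloud/aws: "true"  # extension label added by Gardener
  rules:
  - apiGroups: ["core.gardener.cloud"]
    apiVersions: ["v1beta1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["shoots"]
  clientConfig:
    service:
      name: gardener-extension-admission-aws  # assumed service
      namespace: garden                       # assumed namespace
      path: /webhooks/validate-shoot          # assumed path
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
```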
3 - BackupBucket
Contract: BackupBucket
Resource
The Gardener project features a sub-project called etcd-backup-restore to take periodic backups of etcd backing Shoot clusters. It demands the bucket (or its equivalent in different object store providers) to be created and configured externally with appropriate credentials. The BackupBucket
resource takes this responsibility in Gardener.
Before introducing the BackupBucket
extension resource, Gardener was using Terraform in order to create and manage these provider-specific resources (e.g., see AWS Backup).
Now, Gardener commissions an external, provider-specific controller to take over this task. You can also refer to backupInfra proposal documentation to get an idea about how the transition was done and understand the resource in a broader scope.
What Is the Scope of a Bucket?
A bucket will be provisioned per Seed. So, the backup of every Shoot created on that Seed will be stored under a different, shoot-specific prefix under the bucket.
If a Shoot is rescheduled to a different Seed, its backup will continue to use the same bucket.
What Is the Lifespan of a BackupBucket
?
The bucket associated with BackupBucket
will be created at the creation of the Seed
. And as per current implementation, it will also be deleted on deletion of the Seed
, if there isn’t any BackupEntry
resource associated with it.
In the future, we plan to introduce a schedule for BackupBucket
- the deletion logic for the BackupBucket
resource, which will reschedule it on different available Seed
s on deletion or failure of a health check for the currently associated seed
. In that case, the BackupBucket
will be deleted only if there isn’t any schedulable Seed
available and there isn’t any associated BackupEntry
resource.
What Needs to Be Implemented to Support a New Infrastructure Provider?
As part of the seed flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
```yaml
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: BackupBucket
metadata:
  name: foo
spec:
  type: azure
  providerConfig:
    <some-optional-provider-specific-backupbucket-configuration>
  region: eu-west-1
  secretRef:
    name: backupprovider
    namespace: shoot--foo--bar
```
The .spec.secretRef
contains a reference to the provider secret pointing to the account that shall be used to create the needed resources. This provider secret will be configured by the Gardener operator in the Seed
resource and propagated over there by the seed controller.
After your controller has created the required bucket, it generates, if required, the secret needed to access the objects in the bucket and puts a reference to it in the status. This secret is supposed to be used by Gardener, or eventually by a BackupEntry
resource and the etcd-backup-restore component, to back up the etcd.
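A sketch of how such a status could look, assuming the generatedSecretRef field of the extensions API and illustrative values:

```yaml
status:
  lastOperation:
    type: Reconcile
    state: Succeeded
  generatedSecretRef:
    name: generated-bucket-foo  # illustrative name of the generated access secret
    namespace: garden           # namespace in the seed cluster (illustrative)
```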
In order to support a new infrastructure provider, you need to write a controller that watches all BackupBucket
s with .spec.type=<my-provider-name>
. You can take a look at the below referenced example implementation for the Azure provider.
References and Additional Resources
4 - BackupEntry
Contract: BackupEntry
Resource
The Gardener project features a sub-project called etcd-backup-restore to take periodic backups of etcd backing Shoot clusters. It requires the access credentials for the bucket (or its equivalent in different object store providers) to be created and configured externally. The BackupEntry
resource takes this responsibility in Gardener and provides this information by creating a secret specific to the component.
That being said, the core motivation for introducing this resource was to support retention of backups past the deletion of a Shoot. The etcd-backup-restore component takes responsibility for garbage collecting old backups that fall outside the defined retention period. Once a shoot is deleted, we need to persist the backups for a few days. Hence, Gardener uses the BackupEntry
resource for this housekeeping work after the deletion of a Shoot. The BackupEntry
resource is responsible for the shoot-specific prefix under the referred bucket.
Before introducing the BackupEntry
extension resource, Gardener was using Terraform in order to create and manage these provider-specific resources (e.g., see AWS Backup).
Now, Gardener commissions an external, provider-specific controller to take over this task. You can also refer to the backupInfra proposal documentation to get an idea about how the transition was done and understand the resource in a broader scope.
What Is the Lifespan of a BackupEntry
?
The bucket associated with BackupEntry
will be created by using a BackupBucket
resource. The BackupEntry
resource will be created as a part of the Shoot
creation. But resources might continue to exist post deletion of a Shoot
(see gardenlet for more details).
What Needs to be Implemented to Support a New Infrastructure Provider?
As part of the shoot flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
```yaml
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: BackupEntry
metadata:
  name: shoot--foo--bar
spec:
  type: azure
  providerConfig:
    <some-optional-provider-specific-backup-bucket-configuration>
  backupBucketProviderStatus:
    <some-optional-provider-specific-backup-bucket-status>
  region: eu-west-1
  bucketName: foo
  secretRef:
    name: backupprovider
    namespace: shoot--foo--bar
```
The .spec.secretRef
contains a reference to the provider secret pointing to the account that shall be used to create the needed resources. This provider secret will be propagated from the BackupBucket
resource by the shoot controller.
Your controller is supposed to create the etcd-backup
secret in the control plane namespace of a shoot. This secret is supposed to be used by Gardener or eventually by the etcd-backup-restore component to backup the etcd. The controller implementation should clean up the objects created under the shoot specific prefix in the bucket equivalent to the name of the BackupEntry
resource.
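A rough sketch of such a secret; the actual data keys are provider-specific, so the ones shown here are only placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: etcd-backup
  namespace: shoot--foo--bar  # control plane namespace of the shoot
type: Opaque
data:
  bucketName: <base64-encoded-bucket-name>            # placeholder
  <provider-specific-credential-key>: <base64-value>  # placeholder
```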
In order to support a new infrastructure provider, you need to write a controller that watches all the BackupBucket
s with .spec.type=<my-provider-name>
. You can take a look at the below referenced example implementation for the Azure provider.
References and Additional Resources
5 - Bastion
Contract: Bastion
Resource
The Gardener project allows users to connect to Shoot worker nodes via SSH. As nodes are usually firewalled and not directly accessible from the public internet, GEP-15 introduced the concept of “Bastions”. A bastion is a dedicated server that only serves to allow SSH ingress to the worker nodes.
Bastion
resources contain the user’s public SSH key and IP address, in order to provision the server accordingly: The public key is put onto the Bastion and SSH ingress is only authorized for the given IP address (in fact, it’s not a single IP address, but a set of IP ranges; however, for most purposes a single IP is used).
What Is the Lifespan of a Bastion
?
Once a Bastion
has been created in the garden, it will be replicated to the appropriate seed cluster, where a controller then reconciles a server and firewall rules etc., on the cloud provider used by the target Shoot. When the Bastion is ready (i.e. has a public IP), that IP is stored in the Bastion
’s status and from there it is picked up by the garden cluster and gardenctl
eventually.
To make multiple SSH sessions possible, the existence of the Bastion
is not directly tied to the execution of gardenctl
: users can exit out of gardenctl
and use ssh
manually to connect to the bastion and worker nodes.
However, Bastion
s have an expiry date, after which they will be garbage collected.
When SSH access is set to false
for the Shoot
in the workers settings (see Shoot Worker Nodes Settings), Bastion
resources are deleted during Shoot
reconciliation and new Bastion
s are prevented from being created.
What Needs to Be Implemented to Support a New Infrastructure Provider?
As part of the shoot flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
```yaml
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Bastion
metadata:
  name: mybastion
  namespace: shoot--foo--bar
spec:
  type: aws
  # userData is base64-encoded cloud provider user data; this contains the
  # user's SSH key
  userData: IyEvYmluL2Jhc2ggL....Nlcgo=
  ingress:
  - ipBlock:
      cidr: 192.88.99.0/32 # this is most likely the user's IP address
```
Your controller is supposed to create a new instance at the given cloud provider, firewall it to only allow SSH (TCP port 22) from the given IP blocks, and then configure the firewall for the worker nodes to allow SSH from the bastion instance. When a Bastion
is deleted, all these changes need to be reverted.
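For illustration, a reconciled Bastion could report its endpoint in the status roughly like this (values are illustrative):

```yaml
status:
  ingress:
    ip: 203.0.113.10  # public IP of the bastion instance (or a hostname instead)
  lastOperation:
    type: Reconcile
    state: Succeeded
```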
Implementation Details
ConfigValidator
Interface
For bastion controllers, the generic Reconciler
also delegates to a ConfigValidator
interface that contains a single Validate
method. This method is called by the generic Reconciler
at the beginning of every reconciliation, and can be implemented by the extension to validate the .spec.providerConfig
part of the Bastion
resource with the respective cloud provider, typically the existence and validity of cloud provider resources such as VPCs, images, etc.
The Validate
method returns a list of errors. If this list is non-empty, the generic Reconciler
will fail with an error. This error will have the error code ERR_CONFIGURATION_PROBLEM
, unless there is at least one error in the list that has its ErrorType
field set to field.ErrorTypeInternal
.
References and Additional Resources
6 - CA Rotation
CA Rotation in Extensions
GEP-18 proposes adding support for automated rotation of Shoot cluster certificate authorities (CAs).
This document outlines all the requirements that Gardener extensions need to fulfill in order to support the CA rotation feature.
Requirements for Shoot Cluster CA Rotation
- Extensions must not rely on static CA Secret names managed by the gardenlet, because their names are changing during CA rotation.
- Extensions cannot issue or use client certificates for authenticating against shoot API servers. Instead, they should use short-lived auto-rotated ServiceAccount tokens via gardener-resource-manager’s TokenRequestor. Also see Conventions and TokenRequestor documents.
- Extensions need to generate dedicated CAs for signing server certificates (e.g. cloud-controller-manager). There should be one CA per controller and purpose in order to bind the lifecycle to the reconciliation cycle of the respective object for which it is created.
- CAs managed by extensions should be rotated in lock-step with the shoot cluster CA. When the user triggers a rotation, the gardenlet writes phase and initiation time to Shoot.status.credentials.rotation.certificateAuthorities.{phase,lastInitiationTime}. See GEP-18 for a detailed description on what needs to happen in each phase. Extensions can retrieve this information from Cluster.shoot.status.
Utilities for Secrets Management
In order to fulfill the requirements listed above, extension controllers can reuse the SecretsManager
that the gardenlet uses to manage all shoot cluster CAs, certificates, and other secrets as well.
It implements the core logic for managing secrets that need to be rotated, auto-renewed, etc.
Additionally, there are utilities for reusing SecretsManager
in extension controllers.
They already implement the above requirements based on the Cluster
resource and allow focusing on the extension controllers’ business logic.
For example, a simple SecretsManager
usage in an extension controller could look like this:
```go
const (
	// identity for SecretsManager instance in ControlPlane controller
	identity = "provider-foo-controlplane"
	// secret config name of the dedicated CA
	caControlPlaneName = "ca-provider-foo-controlplane"
)

func Reconcile() {
	var (
		cluster *extensionscontroller.Cluster
		client  client.Client

		// define wanted secrets with options
		secretConfigs = []extensionssecretsmanager.SecretConfigWithOptions{
			{
				// dedicated CA for ControlPlane controller
				Config: &secretutils.CertificateSecretConfig{
					Name:       caControlPlaneName,
					CommonName: "ca-provider-foo-controlplane",
					CertType:   secretutils.CACert,
				},
				// persist CA so that it gets restored on control plane migration
				Options: []secretsmanager.GenerateOption{secretsmanager.Persist()},
			},
			{
				// server cert for control plane component
				Config: &secretutils.CertificateSecretConfig{
					Name:       "cloud-controller-manager",
					CommonName: "cloud-controller-manager",
					DNSNames:   kutil.DNSNamesForService("cloud-controller-manager", namespace),
					CertType:   secretutils.ServerCert,
				},
				// sign with our dedicated CA
				Options: []secretsmanager.GenerateOption{secretsmanager.SignedByCA(caControlPlaneName)},
			},
		}
	)

	// initialize SecretsManager based on Cluster object
	sm, err := extensionssecretsmanager.SecretsManagerForCluster(ctx, logger.WithName("secretsmanager"), clock.RealClock{}, client, cluster, identity, secretConfigs)

	// generate all wanted secrets (first CAs, then the rest)
	secrets, err := extensionssecretsmanager.GenerateAllSecrets(ctx, sm, secretConfigs)

	// cleanup any secrets that are not needed any more (e.g. after rotation)
	err = sm.Cleanup(ctx)
}
```
Please pay attention to the following points:
- There should be one SecretsManager identity per controller (and purpose if applicable) in order to prevent conflicts between different instances. E.g., there should be different identities for the Infrastructure, Worker controller, etc., and the ControlPlane controller should use dedicated SecretsManager identities per purpose (e.g. provider-foo-controlplane and provider-foo-controlplane-exposure).
- All other points in Reusing the SecretsManager in Other Components.
7 - Cluster
Cluster
Resource
As part of the extensibility epic, a lot of responsibility that was previously taken over by Gardener directly has now been shifted to extension controllers running in the seed clusters.
These extensions often serve a well-defined purpose, e.g. the management of DNS records, infrastructure, etc.
We have introduced a couple of extension CRDs in the seeds whose specification is written by Gardener, and which are acted upon by the extensions.
However, the extensions sometimes require more information that is not directly part of the specification.
One example of that is the GCP infrastructure controller which needs to know the shoot’s pod and service network.
Another example is the Azure infrastructure controller which requires some information out of the CloudProfile
resource.
The problem is that Gardener does not know which extension requires which information so that it can write it into their specific CRDs.
In order to deal with this problem we have introduced the Cluster
extension resource.
This CRD is written into the seeds, however, it does not contain a status
, so it is not expected that something acts upon it.
Instead, you can treat it like a ConfigMap
which contains data that might be interesting for you.
In the context of Gardener, seeds and shoots, and extensibility the Cluster
resource contains the CloudProfile
, Seed
, and Shoot
manifest.
Extension controllers can take whatever information they want out of it that might help completing their individual tasks.
```yaml
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Cluster
metadata:
  name: shoot--foo--bar
spec:
  cloudProfile:
    apiVersion: core.gardener.cloud/v1beta1
    kind: CloudProfile
    ...
  seed:
    apiVersion: core.gardener.cloud/v1beta1
    kind: Seed
    ...
  shoot:
    apiVersion: core.gardener.cloud/v1beta1
    kind: Shoot
    ...
```
The resource is written by Gardener before it starts the reconciliation flow of the shoot.
⚠️ All Gardener components use the core.gardener.cloud/v1beta1
version, i.e., the Cluster
resource will contain the objects in this version.
There are some fields in the Shoot
specification that might be interesting to take into account.
- .spec.hibernation.enabled={true,false}: Extension controllers might want to behave differently if the shoot is hibernated or not (probably they might want to scale down their control plane components, for example).
- .status.lastOperation.state=Failed: If Gardener sets the shoot’s last operation state to Failed, it means that Gardener won’t automatically retry to finish the reconciliation/deletion flow because an error occurred that could not be resolved within the last 24h (default). In this case, end-users are expected to manually re-trigger the reconciliation flow in case they want Gardener to try again. Extension controllers are expected to follow the same principle. This means they have to read the shoot state out of the Cluster resource.
Extension Resources Not Associated with a Shoot
In some cases, Gardener may create extension resources that are not associated with a shoot, but are needed to support some functionality internal to Gardener. Such resources will be created in the garden
namespace of a seed cluster.
For example, if the managed ingress controller is active on the seed, Gardener will create a DNSRecord resource(s) in the garden
namespace of the seed cluster for the ingress DNS record.
Extension controllers that may be expected to reconcile extension resources in the garden
namespace should make sure that they can tolerate the absence of a cluster resource. This means that they should not attempt to read the cluster resource in such cases, or if they do they should ignore the “not found” error.
References and Additional Resources
8 - ContainerRuntime
Gardener Container Runtime Extension
At the lowest layers of a Kubernetes node is the software that, among other things, starts and stops containers. It is called “Container Runtime”.
The most widely known container runtime is Docker, but it is not alone in this space. In fact, the container runtime space has been rapidly evolving.
Kubernetes supports different container runtimes using Container Runtime Interface (CRI) – a plugin interface which enables kubelet to use a wide variety of container runtimes.
Gardener supports creation of Worker machines using CRI. For more information, see CRI Support.
Motivation
Prior to the Container Runtime Extensibility concept, Gardener used Docker as the only container runtime for shoot worker machines. Because of the wide variety of different container runtimes offering multiple important features (for example, enhanced security concepts), it is important to enable end users to use other container runtimes as well.
The ContainerRuntime
Extension Resource
Here is what a typical ContainerRuntime
resource would look like:
```yaml
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: ContainerRuntime
metadata:
  name: my-container-runtime
spec:
  binaryPath: /var/bin/containerruntimes
  type: gvisor
  workerPool:
    name: worker-ubuntu
    selector:
      matchLabels:
        worker.gardener.cloud/pool: worker-ubuntu
```
Gardener deploys one ContainerRuntime
resource per worker pool per CRI.
To exemplify this, consider a Shoot having two worker pools (worker-one
, worker-two
) using containerd
as the CRI as well as gvisor
and kata
as enabled container runtimes.
Gardener would deploy four ContainerRuntime resources. For worker-one: one ContainerRuntime for type gvisor and one for type kata. The same resources would be deployed for worker-two.
Supporting a New Container Runtime Provider
To add support for another container runtime (e.g., gvisor, kata-containers), a container runtime extension controller needs to be implemented. It should support Gardener’s supported CRI plugins.
The container runtime extension should install the necessary resources into the shoot cluster (e.g., RuntimeClass
es), and it should copy the runtime binaries to the relevant worker machines in path: spec.binaryPath
.
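For example, for gVisor such a RuntimeClass could look roughly like the following sketch; the handler name and the scheduling constraint are assumptions based on how the runtime is typically registered:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc  # assumed containerd handler name for gVisor
scheduling:
  nodeSelector:
    containerruntime.worker.gardener.cloud/gvisor: "true"  # only schedule to nodes where the runtime binaries were installed
```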
Gardener labels the shoot nodes according to the CRI configured: worker.gardener.cloud/cri-name=<value>
(e.g worker.gardener.cloud/cri-name=containerd
) and multiple labels for each of the container runtimes configured for the shoot Worker machine:
containerruntime.worker.gardener.cloud/<container-runtime-type-value>=true
(e.g containerruntime.worker.gardener.cloud/gvisor=true
).
The way to install the binaries is by creating a daemon set which copies the binaries from an image in a docker registry to the relevant labeled worker nodes (avoid downloading binaries from the internet to also cater for isolated environments).
For additional reference, please have a look at the runtime-gvisor provider extension, which provides more information on how to configure the necessary charts, as well as the actuators required to reconcile the container runtime inside the Shoot cluster to the desired state.
9 - ControllerRegistration
Registering Extension Controllers
Extensions are registered in the garden cluster via ControllerRegistration
resources.
Deployment for respective extensions are specified via ControllerDeployment
resources.
Gardener evaluates the registrations and deployments and creates ControllerInstallation
resources which describe the request “please install this controller X
to this seed Y
”.
Similar to how CloudProfile
or Seed
resources get into the system, the Gardener administrator must deploy the ControllerRegistration
and ControllerDeployment
resources (this does not happen automatically in any way - the administrator decides which extensions shall be enabled).
The specification mainly describes which of Gardener’s extension CRDs are managed, for example:
```yaml
apiVersion: core.gardener.cloud/v1
kind: ControllerDeployment
metadata:
  name: os-gardenlinux
helm:
  ociRepository:
    ref: registry.example.com/os-gardenlinux/charts/os-gardenlinux:1.0.0
  # or a base64-encoded, gzip'ed, tar'ed extension controller chart
  # rawChart: H4sIFAAAAAAA/yk...
  values:
    foo: bar
---
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
metadata:
  name: os-gardenlinux
spec:
  deployment:
    deploymentRefs:
    - name: os-gardenlinux
  resources:
  - kind: OperatingSystemConfig
    type: gardenlinux
    primary: true
```
This information tells Gardener that there is an extension controller that can handle OperatingSystemConfig
resources of type gardenlinux
.
A reference to the shown ControllerDeployment
specifies how the deployment of the extension controller is accomplished.
Also, it specifies that this controller is the primary one responsible for the lifecycle of the OperatingSystemConfig
resource.
Setting primary
to false
would allow to register additional, secondary controllers that may also watch/react on the OperatingSystemConfig/coreos
resources, however, only the primary controller may change/update the main status
of the extension object (that are used to “communicate” with the gardenlet).
Particularly, only the primary controller may set .status.lastOperation
, .status.lastError
, .status.observedGeneration
, and .status.state
.
Secondary controllers may contribute to the .status.conditions[]
if they like, of course.
Secondary controllers might be helpful in scenarios where additional tasks need to be completed which are not part of the reconciliation logic of the primary controller but separated out into a dedicated extension.
⚠️ There must be exactly one primary controller for every registered kind/type combination.
Also, please note that the primary
field cannot be changed after creation of the ControllerRegistration
.
Deploying Extension Controllers
Submitting the above ControllerDeployment
and ControllerRegistration
will create a ControllerInstallation
resource:
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerInstallation
metadata:
  name: os-gardenlinux
spec:
  deploymentRef:
    name: os-gardenlinux
  registrationRef:
    name: os-gardenlinux
  seedRef:
    name: aws-eu1
```
This resource expresses that Gardener requires the os-gardenlinux
extension controller to run on the aws-eu1
seed cluster.
gardener-controller-manager automatically determines which extension is required on which seed cluster and will only create ControllerInstallation
objects for those.
Also, it will automatically delete ControllerInstallation
s referencing extension controllers that are no longer required on a seed (e.g., because all shoots on it have been deleted).
There are additional configuration options, please see the Deployment Configuration Options section.
After gardener-controller-manager has written the ControllerInstallation
resource, gardenlet picks it up and installs the controller on the respective Seed
using the referenced ControllerDeployment
.
It is sufficient to create a Helm chart and deploy it together with some static configuration values.
For this, operators have to provide the deployment information in the ControllerDeployment.helm
section:
```yaml
...
helm:
  rawChart: H4sIFAAAAAAA/yk...
  values:
    foo: bar
```
You can check out hack/generate-controller-registration.yaml
for generating a ControllerDeployment
including a controller helm chart.
If ControllerDeployment.helm
is specified, gardenlet either decodes the provided Helm chart (.helm.rawChart
) or pulls the chart from the referenced OCI Repository (.helm.ociRepository
).
When referencing an OCI Repository, you have several options in how to specify where to pull the chart:
```yaml
helm:
  ociRepository:
    # full ref with either tag or digest, or both
    ref: registry.example.com/foo:1.0.0@sha256:abc
---
helm:
  ociRepository:
    # repository and tag
    repository: registry.example.com
    tag: 1.0.0
---
helm:
  ociRepository:
    # repository and digest
    repository: registry.example.com
    digest: sha256:abc
---
helm:
  ociRepository:
    # when specifying both tag and digest, the tag is ignored.
    repository: registry.example.com
    tag: 1.0.0
    digest: sha256:abc
```
Gardenlet caches the downloaded chart in memory. It is recommended to always specify a digest, because if it is not specified, gardenlet needs to fetch the manifest in every reconciliation to compare the digest with the local cache.
No matter where the chart originates from, gardenlet deploys it with the provided static configuration (.helm.values
).
The chart and the values can be updated at any time - Gardener will recognize it and re-trigger the deployment process.
In order to allow extensions to get information about the garden and the seed cluster, gardenlet mixes in certain properties into the values (root level) of every deployed Helm chart:
```yaml
gardener:
  version: <gardener-version>
  garden:
    clusterIdentity: <uuid-of-gardener-installation>
    genericKubeconfigSecretName: <generic-garden-kubeconfig-secret-name>
  seed:
    name: <seed-name>
    clusterIdentity: <seed-cluster-identity>
    annotations: <seed-annotations>
    labels: <seed-labels>
    provider: <seed-provider-type>
    region: <seed-region>
    volumeProvider: <seed-first-volume-provider>
    volumeProviders: <seed-volume-providers>
    ingressDomain: <seed-ingress-domain>
    protected: <seed-protected-taint>
    visible: <seed-visible-setting>
    taints: <seed-taints>
    networks: <seed-networks>
    blockCIDRs: <seed-networks-blockCIDRs>
    spec: <seed-spec>
  gardenlet:
    featureGates: <gardenlet-feature-gates>
```
Extensions can use this information in their Helm chart in case they require knowledge about the garden and the seed environment.
The list might be extended in the future.
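For example, a chart template could consume these values like any other Helm value (illustrative excerpt):

```yaml
# templates/deployment.yaml (excerpt, illustrative)
env:
- name: SEED_NAME
  value: {{ .Values.gardener.seed.name }}
- name: GARDEN_CLUSTER_IDENTITY
  value: {{ .Values.gardener.garden.clusterIdentity }}
```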
gardenlet reports whether the extension controller has been installed successfully and running in the ControllerInstallation
status:
```yaml
status:
  conditions:
  - lastTransitionTime: "2024-05-16T13:04:16Z"
    lastUpdateTime: "2024-05-16T13:04:16Z"
    message: The controller running in the seed cluster is healthy.
    reason: ControllerHealthy
    status: "True"
    type: Healthy
  - lastTransitionTime: "2024-05-16T13:04:06Z"
    lastUpdateTime: "2024-05-16T13:04:06Z"
    message: The controller was successfully installed in the seed cluster.
    reason: InstallationSuccessful
    status: "True"
    type: Installed
  - lastTransitionTime: "2024-05-16T13:04:16Z"
    lastUpdateTime: "2024-05-16T13:04:16Z"
    message: The controller has been rolled out successfully.
    reason: ControllerRolledOut
    status: "False"
    type: Progressing
  - lastTransitionTime: "2024-05-16T13:03:39Z"
    lastUpdateTime: "2024-05-16T13:03:39Z"
    message: chart could be rendered successfully.
    reason: RegistrationValid
    status: "True"
    type: Valid
```
Deployment Configuration Options
The .spec.deployment resource allows configuring a deployment policy.
There are the following policies:
- OnDemand (default): Gardener will demand the deployment and deletion of the extension controller to/from seed clusters dynamically. It will automatically determine (based on other resources like Shoots) whether it is required and decide accordingly.
- Always: Gardener will demand the deployment of the extension controller to seed clusters independent of whether it is actually required or not. This might be helpful if you want to add a new component/controller to all seed clusters by default. Another use-case is to minimize the durations until extension controllers get deployed and ready in case you have highly fluctuating seed clusters.
- AlwaysExceptNoShoots: Similar to Always, but if the seed does not have any shoots, then the extension is not being deployed. It will be deleted from a seed after the last shoot has been removed from it.
Also, the .spec.deployment.seedSelector allows specifying a label selector for seed clusters.
The extension will only be deployed to a seed if the selector matches its labels.
Please note that a seed selector can only be specified for secondary controllers (primary=false
for all .spec.resources[]
).
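Putting these options together, a registration could look roughly like this (labels and names are illustrative):

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
metadata:
  name: extension-foo
spec:
  deployment:
    policy: AlwaysExceptNoShoots
    seedSelector:
      matchLabels:
        environment: production  # illustrative label; only matching seeds get the extension
  resources:
  - kind: Extension
    type: foo
    primary: false  # a seed selector may only be used for secondary controllers
```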
Extensions in the Garden Cluster Itself
The Shoot
resource itself will contain some provider-specific data blobs.
As a result, some extensions might also want to run in the garden cluster, e.g., to provide ValidatingWebhookConfiguration
s for validating the correctness of their provider-specific blobs:
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-aws
  namespace: garden-dev
spec:
  ...
  cloud:
    type: aws
    region: eu-west-1
    providerConfig:
      apiVersion: aws.cloud.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vpc: # specify either 'id' or 'cidr'
          # id: vpc-123456
          cidr: 10.250.0.0/16
        internal:
        - 10.250.112.0/22
        public:
        - 10.250.96.0/22
        workers:
        - 10.250.0.0/19
      zones:
      - eu-west-1a
  ...
```
In the above example, Gardener itself does not understand the AWS-specific provider configuration for the infrastructure.
However, if this part of the Shoot
resource should be validated, then you should run an AWS-specific component in the garden cluster that registers a webhook. You can do it similarly if you want to default some fields of a resource (by using a MutatingWebhookConfiguration
).
Again, similar to how Gardener is deployed to the garden cluster, these components must be deployed and managed by the Gardener administrator.
Extension
Resource Configurations
The Extension
resource allows injecting arbitrary steps into the shoot reconciliation flow that are unknown to Gardener.
Hence, it is slightly special and allows further configuration when registering it:
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
metadata:
  name: extension-foo
spec:
  resources:
  - kind: Extension
    type: foo
    primary: true
    globallyEnabled: true
    reconcileTimeout: 30s
    lifecycle:
      reconcile: AfterKubeAPIServer
      delete: BeforeKubeAPIServer
      migrate: BeforeKubeAPIServer
```
The globallyEnabled=true
option specifies that the Extension/foo
object shall be created by default for all shoots (unless they opted out by setting .spec.extensions[].enabled=false
in the Shoot
spec).
The reconcileTimeout
tells Gardener how long it should wait during its shoot reconciliation flow for the Extension/foo
’s reconciliation to finish.
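For example, a Shoot could opt out of the globally enabled Extension/foo like this:

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-aws
  namespace: garden-dev
spec:
  extensions:
  - type: foo
    enabled: false  # opt out of the globally enabled extension
```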
Extension
Lifecycle
The lifecycle
field tells Gardener when to perform a certain action on the Extension
resource during the reconciliation flows. If omitted, then the default behaviour will be applied. Please find more information on the defaults in the explanation below. Possible values for each control flow are AfterKubeAPIServer
, BeforeKubeAPIServer
, and AfterWorker
. Let’s take the following configuration and explain it.
```yaml
...
lifecycle:
  reconcile: AfterKubeAPIServer
  delete: BeforeKubeAPIServer
  migrate: BeforeKubeAPIServer
```
- reconcile: AfterKubeAPIServer means that the extension resource will be reconciled after the successful reconciliation of the kube-apiserver during shoot reconciliation. This is also the default behaviour if this value is not specified. During shoot hibernation, the opposite rule is applied, meaning that in this case the reconciliation of the extension will happen before the kube-apiserver is scaled to 0 replicas. On the other hand, if the extension needs to be reconciled before the kube-apiserver and scaled down after it, then the value BeforeKubeAPIServer should be used.
- delete: BeforeKubeAPIServer means that the extension resource will be deleted before the kube-apiserver is destroyed during shoot deletion. This is the default behaviour if this value is not specified.
- migrate: BeforeKubeAPIServer means that the extension resource will be migrated before the kube-apiserver is destroyed in the source cluster during control plane migration. This is the default behaviour if this value is not specified. The restoration of the control plane follows the reconciliation control flow.
The lifecycle value AfterWorker
is only available during reconcile
. When specified, the extension resource will be reconciled after the workers are deployed. This is useful for extensions that want to deploy a workload in the shoot control plane and want to wait for the workload to run and get ready on a node. During shoot creation, the extension will start its reconciliation before the first workers have joined the cluster; they will only become available at some later point.
10 - ControlPlane
Contract: ControlPlane
Resource
Most Kubernetes clusters require a cloud-controller-manager
or CSI drivers in order to work properly.
Before introducing the ControlPlane extension resource, Gardener maintained several different Helm charts for the cloud-controller-manager deployments for the various providers.
Now, Gardener commissions an external, provider-specific controller to take over this task.
Which control plane resources are required?
As mentioned in the controlplane customization webhooks document, Gardener shall not deploy any cloud-controller-manager
or any other provider-specific component.
Instead, it creates a ControlPlane
CRD that should be picked up by provider extensions.
Its purpose is to trigger the deployment of such provider-specific components in the shoot namespace in the seed cluster.
What needs to be implemented to support a new infrastructure provider?
As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
```yaml
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: ControlPlane
metadata:
  name: control-plane
  namespace: shoot--foo--bar
spec:
  type: openstack
  region: europe-west1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  providerConfig:
    apiVersion: openstack.provider.extensions.gardener.cloud/v1alpha1
    kind: ControlPlaneConfig
    loadBalancerProvider: provider
    zone: eu-1a
    cloudControllerManager:
      featureGates:
        CustomResourceValidation: true
  infrastructureProviderStatus:
    apiVersion: openstack.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureStatus
    networks:
      floatingPool:
        id: vpc-1234
      subnets:
      - purpose: nodes
        id: subnetid
```
The .spec.secretRef
contains a reference to the provider secret pointing to the account that shall be used for the shoot cluster.
However, the most important section is the .spec.providerConfig
and the .spec.infrastructureProviderStatus
.
The first one contains an embedded declaration of the provider specific configuration for the control plane (that cannot be known by Gardener itself).
You are responsible for designing how this configuration looks like.
Gardener does not evaluate it but just copies this part from what has been provided by the end-user in the Shoot
resource.
The second one contains the output of the Infrastructure
resource (that might be relevant for the CCM config).
In order to support a new control plane provider, you need to write a controller that watches all ControlPlane
s with .spec.type=<my-provider-name>
.
You can take a look at the below referenced example implementation for the Alicloud provider.
The control plane controller as part of the ControlPlane
reconciliation often deploys resources (e.g. pods/deployments) into the Shoot namespace in the Seed
as part of its ControlPlane
reconciliation loop.
Because the namespace contains network policies that per default deny all ingress and egress traffic,
the pods may need to have proper labels matching to the selectors of the network policies in order to allow the required network traffic.
Otherwise, they won’t be allowed to talk to certain other components (e.g., the kube-apiserver of the shoot).
For more information, see NetworkPolicy
s In Garden, Seed, Shoot Clusters.
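For illustration, the pod template of such a deployment might carry labels along the following lines; the exact label set depends on the Gardener version and on the traffic the component needs, so treat this only as a sketch:

```yaml
spec:
  template:
    metadata:
      labels:
        networking.gardener.cloud/to-dns: allowed                               # allow DNS egress
        networking.gardener.cloud/to-public-networks: allowed                   # allow egress to the cloud provider API
        networking.resources.gardener.cloud/to-kube-apiserver-tcp-443: allowed  # allow talking to the shoot's kube-apiserver
```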
Most providers might require further information that is not provider specific but already part of the shoot resource.
One example for this is the GCP control plane controller, which needs the Kubernetes version of the shoot cluster (because it already uses the in-tree Kubernetes cloud-controller-manager).
As Gardener cannot know which information is required by providers, it simply mirrors the Shoot
, Seed
, and CloudProfile
resources into the seed.
They are part of the Cluster
extension resource and can be used to extract information that is not part of the Infrastructure
resource itself.
References and Additional Resources
11 - ControlPlane Exposure
Contract: ControlPlane
Resource with Purpose exposure
Some Kubernetes clusters require additional deployments needed by the seed cloud provider in order to work properly, e.g. the AWS Load Balancer Readvertiser.
Before using ControlPlane resources with purpose exposure, Gardener maintained different Helm charts for these deployments for the various providers.
Now, Gardener commissions an external, provider-specific controller to take over this task.
Which control plane resources are required?
As mentioned in the controlplane document, Gardener shall not deploy any other provider-specific component.
Instead, it creates a ControlPlane
CRD with purpose exposure
that should be picked up by provider extensions.
Its purpose is to trigger the deployment of such provider-specific components in the shoot namespace in the seed cluster that are needed to expose the kube-apiserver.
The shoot cluster’s kube-apiserver is exposed via a Service
of type LoadBalancer
from the shoot provider (you may run the control plane of an Azure shoot in a GCP seed). It’s the seed provider extension controller that should act on the ControlPlane
resources with purpose exposure
.
If SNI is enabled, then the Service
from above is of type ClusterIP
and Gardener will not create ControlPlane
resources with purpose exposure
.
What needs to be implemented to support a new infrastructure provider?
As part of the shoot flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
```yaml
apiVersion: extensions.gardener.cloud/v1alpha1
kind: ControlPlane
metadata:
  name: control-plane-exposure
  namespace: shoot--foo--bar
spec:
  type: aws
  purpose: exposure
  region: europe-west1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
```
The .spec.secretRef
contains a reference to the provider secret pointing to the account that shall be used for the shoot cluster.
It is most likely not needed, however, still added for some potential corner cases.
If you don’t need it, then just ignore it.
The .spec.region
contains the region of the seed cluster.
In order to support a control plane provider with purpose exposure
, you need to write a controller or expand the existing controlplane controller that watches all ControlPlane
s with .spec.type=<my-provider-name>
and purpose exposure
.
You can take a look at the below referenced example implementation for the AWS provider.
Most providers might require further information that is not provider specific but already part of the shoot resource.
As Gardener cannot know which information is required by providers, it simply mirrors the Shoot
, Seed
, and CloudProfile
resources into the seed.
They are part of the Cluster
extension resource and can be used to extract information.
References and Additional Resources
12 - ControlPlane Webhooks
ControlPlane Customization Webhooks
Gardener creates the Shoot controlplane in several steps of the Shoot flow. At different points of this flow, it:
- Deploys standard controlplane components such as kube-apiserver, kube-controller-manager, and kube-scheduler by creating the corresponding deployments, services, and other resources in the Shoot namespace.
- Initiates the deployment of custom controlplane components by ControlPlane controllers by creating a
ControlPlane
resource in the Shoot namespace.
In order to apply any provider-specific changes to the configuration provided by Gardener for the standard controlplane components, cloud extension providers can install mutating admission webhooks for the resources created by Gardener in the Shoot namespace.
What needs to be implemented to support a new cloud provider?
In order to support a new cloud provider, you should install “controlplane” mutating webhooks for any of the following resources:
- Deployment with name
kube-apiserver
, kube-controller-manager
, or kube-scheduler
- Service with name
kube-apiserver
- OperatingSystemConfig
with any name, and purpose reconcile
See Contract Specification for more details on the contract that Gardener and webhooks should adhere to regarding the content of the above resources.
You can install the following kinds of controlplane webhooks:
- Shoot, or controlplane webhooks apply changes needed by the Shoot cloud provider, for example the --cloud-provider command line flag of kube-apiserver and kube-controller-manager. Such webhooks should only operate on Shoot namespaces labeled with shoot.gardener.cloud/provider=<provider>.
- Seed, or controlplaneexposure webhooks apply changes needed by the Seed cloud provider, for example annotations on the kube-apiserver service to ensure cloud-specific load balancers are correctly provisioned for a service of type LoadBalancer. Such webhooks should only operate on Shoot namespaces labeled with seed.gardener.cloud/provider=<provider>.
The labels shoot.gardener.cloud/provider
and seed.gardener.cloud/provider
are added by Gardener when it creates the Shoot namespace.
The resources mutated by the “controlplane” mutating webhooks are labeled with provider.extensions.gardener.cloud/mutated-by-controlplane-webhook: true
by gardenlet. The provider extensions can add an object selector to their “controlplane” mutating webhooks to not intercept requests for unrelated objects.
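For illustration, the following sketch shows how these two labels could translate into webhook selectors. In practice, provider extensions usually rely on the gardener extensions library to generate the MutatingWebhookConfiguration, so everything except the selectors is omitted here and the webhook name is hypothetical.
package webhooks

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// controlplaneWebhook sketches the selectors a "controlplane" webhook registration could use.
func controlplaneWebhook(provider string) admissionregistrationv1.MutatingWebhook {
	return admissionregistrationv1.MutatingWebhook{
		Name: "controlplane.my-provider.extensions.gardener.cloud", // hypothetical name
		// Only act on shoot namespaces created for this (shoot) provider.
		NamespaceSelector: &metav1.LabelSelector{
			MatchLabels: map[string]string{"shoot.gardener.cloud/provider": provider},
		},
		// Only intercept the objects gardenlet marked for controlplane webhooks.
		ObjectSelector: &metav1.LabelSelector{
			MatchLabels: map[string]string{"provider.extensions.gardener.cloud/mutated-by-controlplane-webhook": "true"},
		},
		// ClientConfig, Rules, FailurePolicy, SideEffects, AdmissionReviewVersions, etc.
		// are omitted; the gardener extensions library normally generates these.
	}
}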
Contract Specification
This section specifies the contract that Gardener and webhooks should adhere to in order to ensure smooth interoperability. Note that this contract can’t be specified formally and is therefore easy to violate, especially by Gardener. The Gardener team will nevertheless do its best to adhere to this contract in the future and to ensure via additional measures (tests, validations) that it’s not unintentionally broken. If it needs to be changed intentionally, this can only happen after proper communication has taken place to ensure that the affected provider webhooks could be adapted to work with the new version of the contract.
Note: The contract described below may not necessarily be what Gardener does currently (as of May 2019). Rather, it reflects the target state after changes for Gardener extensibility have been introduced.
kube-apiserver
To deploy kube-apiserver, Gardener shall create a deployment and a service both named kube-apiserver
in the Shoot namespace. They can be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener.
The pod template of the kube-apiserver
deployment shall contain a container named kube-apiserver
.
The command
field of the kube-apiserver
container shall contain the kube-apiserver command line. It shall contain a number of provider-independent flags that should be ignored by webhooks, such as:
- admission plugins (--enable-admission-plugins, --disable-admission-plugins)
- secure communications (--etcd-cafile, --etcd-certfile, --etcd-keyfile, …)
- audit log (--audit-log-*)
- ports (--secure-port)
The kube-apiserver command line shall not contain any provider-specific flags, such as:
--cloud-provider
--cloud-config
These flags can be added by webhooks if needed.
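A minimal sketch of such a mutation, assuming the webhook has already decoded the kube-apiserver Deployment from the admission request; the flag values are illustrative examples only.
package webhooks

import (
	"strings"

	appsv1 "k8s.io/api/apps/v1"
)

// ensureKubeAPIServerFlags appends provider-specific flags to the kube-apiserver
// container command and leaves all provider-independent flags untouched.
func ensureKubeAPIServerFlags(dep *appsv1.Deployment) {
	for i := range dep.Spec.Template.Spec.Containers {
		c := &dep.Spec.Template.Spec.Containers[i]
		if c.Name != "kube-apiserver" {
			continue
		}
		c.Command = ensureFlag(c.Command, "--cloud-provider", "external")
		c.Command = ensureFlag(c.Command, "--cloud-config", "/etc/kubernetes/cloudprovider/cloudprovider.conf")
	}
}

// ensureFlag adds "--flag=value" unless the flag is already present.
func ensureFlag(command []string, flag, value string) []string {
	for _, arg := range command {
		if arg == flag || strings.HasPrefix(arg, flag+"=") {
			return command
		}
	}
	return append(command, flag+"="+value)
}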
The kube-apiserver
command line may contain a number of additional provider-independent flags. In general, webhooks should ignore these unless they are known to interfere with the desired kube-apiserver behavior for the specific provider. Among the flags to be considered are:
--endpoint-reconciler-type
--advertise-address
--feature-gates
Gardener uses SNI to expose the apiserver. In this case, Gardener will label the kube-apiserver
’s Deployment
with core.gardener.cloud/apiserver-exposure: gardener-managed
label (deprecated, the label will no longer be added as of v1.80
) and expects that the --endpoint-reconciler-type
and --advertise-address
flags are not modified.
The --enable-admission-plugins
flag may contain admission plugins that are not compatible with CSI plugins such as PersistentVolumeLabel
. Webhooks should therefore ensure that such admission plugins are either explicitly enabled (if CSI plugins are not used) or disabled (otherwise).
The env
field of the kube-apiserver
container shall not contain any provider-specific environment variables (so it will be empty). If any provider-specific environment variables are needed, they should be added by webhooks.
The volumes
field of the pod template of the kube-apiserver
deployment, and respectively the volumeMounts
field of the kube-apiserver
container shall not contain any provider-specific Secret
or ConfigMap
resources. If such resources should be mounted as volumes, this should be done by webhooks.
The kube-apiserver
Service
may be of type LoadBalancer
, but shall not contain any provider-specific annotations that may be needed to actually provision a load balancer resource in the Seed provider’s cloud. If any such annotations are needed, they should be added by webhooks (typically controlplaneexposure
webhooks).
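A minimal sketch of such a controlplaneexposure mutation; the annotation shown is an AWS-style example and purely illustrative.
package webhooks

import corev1 "k8s.io/api/core/v1"

// ensureLoadBalancerAnnotations adds seed-provider-specific load balancer annotations
// to the kube-apiserver Service of type LoadBalancer.
func ensureLoadBalancerAnnotations(svc *corev1.Service) {
	if svc.Name != "kube-apiserver" || svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
		return
	}
	if svc.Annotations == nil {
		svc.Annotations = map[string]string{}
	}
	svc.Annotations["service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout"] = "3600"
}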
The kube-apiserver
Service
will be of type ClusterIP
. In this case, Gardener will label this Service
with core.gardener.cloud/apiserver-exposure: gardener-managed
label (deprecated, the label will no longer be added as of v1.80
) and expects that no mutations happen.
kube-controller-manager
To deploy kube-controller-manager, Gardener shall create a deployment named kube-controller-manager
in the Shoot namespace. It can be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener.
The pod template of the kube-controller-manager
deployment shall contain a container named kube-controller-manager
.
The command
field of the kube-controller-manager
container shall contain the kube-controller-manager command line. It shall contain a number of provider-independent flags that should be ignored by webhooks, such as:
- --kubeconfig, --authentication-kubeconfig, --authorization-kubeconfig
- --leader-elect
- secure communications (--tls-cert-file, --tls-private-key-file, …)
- cluster CIDR and identity (--cluster-cidr, --cluster-name)
- sync settings (--concurrent-deployment-syncs, --concurrent-replicaset-syncs)
- horizontal pod autoscaler (--horizontal-pod-autoscaler-*)
- ports (--port, --secure-port)
The kube-controller-manager command line shall not contain any provider-specific flags, such as:
--cloud-provider
--cloud-config
--configure-cloud-routes
--external-cloud-volume-plugin
These flags can be added by webhooks if needed.
The kube-controller-manager command line may contain a number of additional provider-independent flags. In general, webhooks should ignore these unless they are known to interfere with the desired kube-controller-manager behavior for the specific provider. Among the flags to be considered are:
The env
field of the kube-controller-manager
container shall not contain any provider-specific environment variables (so it will be empty). If any provider-specific environment variables are needed, they should be added by webhooks.
The volumes
field of the pod template of the kube-controller-manager
deployment, and respectively the volumeMounts
field of the kube-controller-manager
container shall not contain any provider-specific Secret
or ConfigMap
resources. If such resources should be mounted as volumes, this should be done by webhooks.
kube-scheduler
To deploy kube-scheduler, Gardener shall create a deployment named kube-scheduler
in the Shoot namespace. It can be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener.
The pod template of the kube-scheduler
deployment shall contain a container named kube-scheduler
.
The command
field of the kube-scheduler
container shall contain the kube-scheduler command line. It shall contain a number of provider-independent flags that should be ignored by webhooks, such as:
- --config
- --authentication-kubeconfig, --authorization-kubeconfig
- secure communications (--tls-cert-file, --tls-private-key-file, …)
- ports (--port, --secure-port)
The kube-scheduler command line may contain additional provider-independent flags. In general, webhooks should ignore these unless they are known to interfere with the desired kube-scheduler behavior for the specific provider. Among the flags to be considered are:
The kube-scheduler command line can’t contain provider-specific flags, and it makes no sense to specify provider-specific environment variables or mount provider-specific Secret
or ConfigMap
resources as volumes.
etcd-main and etcd-events
To deploy etcd, Gardener shall create two Etcd resources named etcd-main
and etcd-events
in the Shoot namespace. They can be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener.
Gardener shall configure the Etcd
resource completely to set up an etcd cluster which uses the default storage class of the seed cluster.
cloud-controller-manager
Gardener shall not deploy a cloud-controller-manager. If it is needed, it should be added by a ControlPlane
controller.
CSI Controllers
Gardener shall not deploy a CSI controller. If it is needed, it should be added by a ControlPlane
controller.
kubelet
To specify the kubelet configuration, Gardener shall create an OperatingSystemConfig
resource with any name and purpose reconcile
in the Shoot namespace. It can therefore also be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener. Gardener may write multiple such resources with different type
to the same Shoot namespace if multiple OSs are used.
The OSC resource shall contain a unit named kubelet.service
, containing the corresponding systemd unit configuration file. The [Service]
section of this file shall contain a single ExecStart
option having the kubelet command line as its value.
The OSC resource shall contain a file with path /var/lib/kubelet/config/kubelet
, which contains a KubeletConfiguration
resource in YAML format. Most of the flags that can be specified in the kubelet command line can alternatively be specified as options in this configuration as well.
The kubelet command line shall contain a number of provider-independent flags that should be ignored by webhooks, such as:
- --config
- --bootstrap-kubeconfig, --kubeconfig
- --network-plugin (and, if it equals cni, also --cni-bin-dir and --cni-conf-dir)
- --node-labels
The kubelet command line shall not contain any provider-specific flags, such as:
--cloud-provider
--cloud-config
--provider-id
These flags can be added by webhooks if needed.
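A minimal sketch of such a kubelet mutation, treating the kubelet command line as a plain []string for illustration; real extensions typically use helpers from the gardener extensions library to modify the kubelet.service unit of the OperatingSystemConfig, and the flag value is an example only.
package oscwebhook

import "strings"

// ensureKubeletFlags appends provider-specific kubelet flags if they are not set yet.
func ensureKubeletFlags(kubeletCommand []string) []string {
	// Many provider extensions run the kubelet with an external cloud provider;
	// --cloud-config and --provider-id could be ensured the same way if required.
	return ensureFlag(kubeletCommand, "--cloud-provider", "external")
}

// ensureFlag adds "--flag=value" unless the flag is already present.
func ensureFlag(command []string, flag, value string) []string {
	for _, arg := range command {
		if arg == flag || strings.HasPrefix(arg, flag+"=") {
			return command
		}
	}
	return append(command, flag+"="+value)
}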
The kubelet command line / configuration may contain a number of additional provider-independent flags / options. In general, webhooks should ignore these unless they are known to interfere with the desired kubelet behavior for the specific provider. Among the flags / options to be considered are:
--enable-controller-attach-detach
(enableControllerAttachDetach
) - should be set to true
if CSI plugins are used, but in general can also be ignored since its default value is also true
, and this should work both with and without CSI plugins.--feature-gates
(featureGates
) - should contain a list of specific feature gates if CSI plugins are used. If CSI plugins are not used, the corresponding feature gates can be ignored since enabling them should not harm in any way.
13 - Conventions
General Conventions
All the extensions that are registered to Gardener are deployed to the seed clusters on which they are required (also see ControllerRegistration).
Some of these extensions might need to create global resources in the seed (e.g., ClusterRole
s), so it’s important to have a naming scheme to avoid conflicts, as it cannot be checked or validated upfront that two extensions don’t use the same names.
Consequently, this page should help answer some general questions that might come up when developing an extension.
PriorityClasses
Extensions are not supposed to create and use self-defined PriorityClasses
.
Instead, they can and should rely on well-known PriorityClasses
managed by gardenlet.
High Availability of Deployed Components
Extensions might deploy components via Deployment
s, StatefulSet
s, etc., as part of the shoot control plane, or the seed or shoot system components.
In case a seed or shoot cluster is highly available, there are various failure tolerance types. For more information, see Highly Available Shoot Control Plane.
Accordingly, the replicas
, topologySpreadConstraints
or affinity
settings of the deployed components might need to be adapted.
Instead of doing this one-by-one for each and every component, extensions can rely on a mutating webhook provided by Gardener.
Please refer to High Availability of Deployed Components for details.
To reduce costs and to improve the network traffic latency in multi-zone clusters, extensions can make a Service topology-aware.
Please refer to this document for details.
Is there a naming scheme for (global) resources?
As there is no formal process to validate non-existence of conflicts between two extensions, please follow these naming schemes when creating resources (especially, when creating global resources, but it’s in general a good idea for most created resources):
The resource name should be prefixed with extensions.gardener.cloud:<extension-type>-<extension-name>:<resource-name>
, for example:
extensions.gardener.cloud:provider-aws:some-controller-manager
extensions.gardener.cloud:extension-certificate-service:cert-broker
How to create resources in the shoot cluster?
Some extensions might not only create resources in the seed cluster itself but also in the shoot cluster. Usually, every extension comes with a ServiceAccount
and the required RBAC permissions when it gets installed to the seed.
However, there are no credentials for the shoot for every extension.
Extensions are supposed to use ManagedResources
to manage resources in shoot clusters.
gardenlet deploys gardener-resource-manager instances into all shoot control planes, which reconcile ManagedResources
without a specified class (spec.class=null
) in shoot clusters. Mind that Gardener acts on ManagedResources
with the origin=gardener
label. In order to prevent unwanted behavior, extensions should omit the origin
label or provide their own unique value for it when creating such resources.
If you need to deploy a non-DaemonSet resource, Gardener automatically ensures that it only runs on nodes that are allowed to host system components and extensions. For more information, see System Components Webhook.
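A minimal sketch of this pattern, assuming the resources API types from github.com/gardener/gardener/pkg/apis/resources/v1alpha1; the object names and the plain Create calls are illustrative only (the gardener extensions library provides convenience helpers for this, e.g. in pkg/utils/managedresources).
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	resourcesv1alpha1 "github.com/gardener/gardener/pkg/apis/resources/v1alpha1"
)

// deployShootResources stores the rendered shoot manifests in a Secret and creates a
// ManagedResource without a class so that it is reconciled into the shoot cluster.
func deployShootResources(ctx context.Context, c client.Client, namespace string, manifests map[string][]byte) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: "extension-my-extension", Namespace: namespace},
		Data:       manifests, // rendered YAML manifests for the shoot cluster
	}
	if err := c.Create(ctx, secret); err != nil {
		return err
	}

	mr := &resourcesv1alpha1.ManagedResource{
		ObjectMeta: metav1.ObjectMeta{Name: "extension-my-extension", Namespace: namespace},
		Spec: resourcesv1alpha1.ManagedResourceSpec{
			// No class => handled by the gardener-resource-manager responsible for the shoot.
			SecretRefs: []corev1.LocalObjectReference{{Name: secret.Name}},
			// Do not set the origin=gardener label on this ManagedResource (see above).
		},
	}
	return c.Create(ctx, mr)
}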
How to create kubeconfigs for the shoot cluster?
Historically, Gardener extensions used to generate kubeconfigs with client certificates for components they deploy into the shoot control plane.
For this, they reused the shoot cluster CA secret (ca
) to issue new client certificates.
With gardener/gardener#4661 we moved away from using client certificates in favor of short-lived, auto-rotated ServiceAccount
tokens. These tokens are managed by gardener-resource-manager’s TokenRequestor
.
Extensions are supposed to reuse this mechanism for requesting tokens and a generic-token-kubeconfig
for authenticating against shoot clusters.
With GEP-18 (Shoot cluster CA rotation), a dedicated CA will be used for signing client certificates (gardener/gardener#5779) which will be rotated when triggered by the shoot owner.
With this, extensions cannot reuse the ca
secret anymore to issue client certificates.
Hence, extensions must switch to short-lived ServiceAccount
tokens in order to support the CA rotation feature.
The generic-token-kubeconfig
secret contains the CA bundle for establishing trust to shoot API servers. However, as the secret is immutable, its name changes with the rotation of the cluster CA.
Extensions need to look up the generic-token-kubeconfig.secret.gardener.cloud/name
annotation on the respective Cluster
object in order to determine which secret contains the current CA bundle.
The helper function extensionscontroller.GenericTokenKubeconfigSecretNameFromCluster
can be used for this task.
You can take a look at CA Rotation in Extensions for more details on the CA rotation feature in regard to extensions.
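A minimal sketch of this lookup (essentially what the helper mentioned above does), assuming the extensions v1alpha1 Cluster type:
package main

import extensionsv1alpha1 "github.com/gardener/gardener/pkg/apis/extensions/v1alpha1"

// genericTokenKubeconfigSecretName returns the name of the secret currently containing
// the generic token kubeconfig, as announced by gardenlet on the Cluster object.
func genericTokenKubeconfigSecretName(cluster *extensionsv1alpha1.Cluster) (string, bool) {
	name, ok := cluster.Annotations["generic-token-kubeconfig.secret.gardener.cloud/name"]
	return name, ok
}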
How to create certificates for the shoot cluster?
Gardener creates several certificate authorities (CA) that are used to create server certificates for various components.
For example, the shoot’s etcd has its own CA, the kube-aggregator has its own CA as well, and both are different from the actual cluster’s CA.
With GEP-18 (Shoot cluster CA rotation), extensions are required to do the same and generate dedicated CAs for their components (e.g. for signing a server certificate for cloud-controller-manager). They must not depend on the CA secrets managed by gardenlet.
Please see CA Rotation in Extensions for the exact requirements that extensions need to fulfill in order to support the CA rotation feature.
How to enforce a Pod Security Standard for extension namespaces?
The pod-security.kubernetes.io/enforce
namespace label enforces the Pod Security Standards.
You can set the pod-security.kubernetes.io/enforce
label for extension namespace by adding the security.gardener.cloud/pod-security-enforce
annotation to your ControllerRegistration
. The value of the annotation would be the value set for the pod-security.kubernetes.io/enforce
label. It is advised to set the annotation with the most restrictive pod security standard that your extension pods comply with.
If you are using the ./hack/generate-controller-registration.sh
script to generate your ControllerRegistration
you can use the -e, --pod-security-enforce option to set the security.gardener.cloud/pod-security-enforce
annotation. If the option is not set, it defaults to baseline
.
14 - DNS Record
Contract: DNSRecord
Resources
Every shoot cluster requires external DNS records that are publicly resolvable.
The management of these DNS records requires provider-specific knowledge which is to be developed outside of Gardener’s core repository.
Currently, Gardener uses DNSProvider
and DNSEntry
resources. However, this introduces undesired coupling of Gardener to a controller that does not adhere to the Gardener extension contracts. Because of this, we plan to stop using DNSProvider
and DNSEntry
resources for Gardener DNS records in the future and use the DNSRecord
resources described here instead.
What does Gardener create DNS records for?
Internal Domain Name
Every shoot cluster’s kube-apiserver running in the seed is exposed via a load balancer that has a public endpoint (IP or hostname).
This endpoint is used by end-users and also by system components (that are running in another network, e.g., the kubelet or kube-proxy) to talk to the cluster.
In order to be robust against changes of this endpoint (e.g., caused due to re-creation of the load balancer or move of the DNS record to another seed cluster), Gardener creates a so-called internal domain name for every shoot cluster.
The internal domain name is a publicly resolvable DNS record that points to the load balancer of the kube-apiserver.
Gardener uses this domain name in the kubeconfigs of all system components, instead of using the load balancer endpoint directly.
This way Gardener does not need to recreate all kubeconfigs if the endpoint changes - it just needs to update the DNS record.
External Domain Name
The internal domain name is not configurable by end-users directly but configured by the Gardener administrator.
However, end-users usually prefer to have another DNS name, maybe even using their own domain sometimes, to access their Kubernetes clusters.
Gardener supports that by creating another DNS record, named external domain name, that actually points to the internal domain name.
The kubeconfig handed out to end-users contains this external domain name, i.e., users can access their clusters with the DNS name they prefer.
As not every end-user has their own domain, it is possible for Gardener administrators to configure so-called default domains.
If configured, shoots that do not specify a domain explicitly get an external domain name based on a default domain (unless it is explicitly stated that this shoot should not get an external domain name via .spec.dns.provider=unmanaged).
Ingress Domain Name (Deprecated)
Gardener allows deploying an nginx-ingress-controller into a shoot cluster (deprecated).
This controller is exposed via a public load balancer (again, either IP or hostname).
Gardener creates a wildcard DNS record pointing to this load balancer.
Ingress
resources can later use this wildcard DNS record to expose underlying applications.
Seed Ingress
If .spec.ingress
is configured in the Seed, Gardener deploys the ingress controller mentioned in .spec.ingress.controller.kind
to the seed cluster. Currently, the only supported kind is “nginx”. If the ingress field is set, then .spec.dns.provider
must also be set. Gardener creates a wildcard DNS record pointing to the load balancer of the ingress controller. The Ingress
resources of components like Plutono and Prometheus in the garden
namespace and the shoot namespaces use this wildcard DNS record to expose their underlying applications.
What needs to be implemented to support a new DNS provider?
As part of the shoot flow, Gardener will create a number of DNSRecord
resources in the seed cluster (one for each of the DNS records mentioned above) that need to be reconciled by an extension controller.
These resources contain the following information:
- The DNS provider type (e.g., aws-route53, google-clouddns, …).
- A reference to a Secret object that contains the provider-specific credentials used to communicate with the provider’s API.
- The fully qualified domain name (FQDN) of the DNS record, e.g. “api.<shoot domain>”.
- The DNS record type, one of A, AAAA, CNAME, or TXT.
- The DNS record values, that is a list of IP addresses for A records, a single hostname for CNAME records, or a list of texts for TXT records.
Optionally, the DNSRecord
resource may also contain the following information:
- The region of the DNS record. If not specified, the region specified in the referenced Secret shall be used. If that is also not specified, the extension controller shall use a certain default region.
- The DNS hosted zone of the DNS record. If not specified, it shall be determined automatically by the extension controller by getting all hosted zones of the account and searching for the longest zone name that is a suffix of the fully qualified domain name (FQDN) mentioned above.
- The TTL of the DNS record in seconds. If not specified, it shall be set by the extension controller to 120.
Example DNSRecord
:
---
apiVersion: v1
kind: Secret
metadata:
name: dnsrecord-bar-external
namespace: shoot--foo--bar
type: Opaque
data:
# aws-route53 specific credentials here
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: DNSRecord
metadata:
name: dnsrecord-external
namespace: default
spec:
type: aws-route53
secretRef:
name: dnsrecord-bar-external
namespace: shoot--foo--bar
# region: eu-west-1
# zone: ZFOO
name: api.bar.foo.my-fancy-domain.com
recordType: A
values:
- 1.2.3.4
# ttl: 600
In order to support a new DNS record provider, you need to write a controller that watches all DNSRecord
s with .spec.type=<my-provider-name>
.
You can take a look at the below referenced example implementation for the AWS route53 provider.
Key Names in Secrets Containing Provider-Specific Credentials
For compatibility with existing setups, extension controllers shall support two different namings of keys in secrets containing provider-specific credentials:
- The naming used by the external-dns-management DNS controller. For example, on AWS the key names are AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION.
- The naming used by other provider-specific extension controllers, e.g. for infrastructure. For example, on AWS the key names are accessKeyId, secretAccessKey, and region.
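A minimal sketch of reading credentials with such a key-name fallback (AWS keys shown as the example):
package main

import corev1 "k8s.io/api/core/v1"

// valueFromSecret returns the first non-empty value among the given keys.
func valueFromSecret(secret *corev1.Secret, keys ...string) string {
	for _, key := range keys {
		if v, ok := secret.Data[key]; ok && len(v) > 0 {
			return string(v)
		}
	}
	return ""
}

func awsCredentials(secret *corev1.Secret) (accessKeyID, secretAccessKey, region string) {
	accessKeyID = valueFromSecret(secret, "AWS_ACCESS_KEY_ID", "accessKeyId")
	secretAccessKey = valueFromSecret(secret, "AWS_SECRET_ACCESS_KEY", "secretAccessKey")
	region = valueFromSecret(secret, "AWS_REGION", "region")
	return
}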
Avoiding Reading the DNS Hosted Zones
If the DNS hosted zone is not specified in the DNSRecord
resource, during the first reconciliation the extension controller shall determine the correct DNS hosted zone for the specified FQDN and write it to the status of the resource:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: DNSRecord
metadata:
name: dnsrecord-external
namespace: shoot--foo--bar
spec:
...
status:
lastOperation: ...
zone: ZFOO
On subsequent reconciliations, the extension controller shall use the zone from the status and avoid reading the DNS hosted zones from the provider.
If the DNSRecord
resource specifies a zone in .spec.zone
and the extension controller has written a value to .status.zone
, the first one shall be considered with higher priority by the extension controller.
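A minimal sketch of this priority rule, assuming the DNSRecord Go types from the extensions v1alpha1 API:
package main

import extensionsv1alpha1 "github.com/gardener/gardener/pkg/apis/extensions/v1alpha1"

// hostedZone returns the zone to use: an explicitly specified .spec.zone wins over a
// previously determined .status.zone; otherwise the zone must be looked up at the provider.
func hostedZone(dns *extensionsv1alpha1.DNSRecord) (string, bool) {
	if dns.Spec.Zone != nil && *dns.Spec.Zone != "" {
		return *dns.Spec.Zone, true
	}
	if dns.Status.Zone != nil && *dns.Status.Zone != "" {
		return *dns.Status.Zone, true
	}
	return "", false // look up at the provider and write the result to .status.zone
}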
Some providers might require further information that is not provider specific but already part of the shoot resource.
As Gardener cannot know which information is required by providers, it simply mirrors the Shoot
, Seed
, and CloudProfile
resources into the seed.
They are part of the Cluster
extension resource and can be used to extract information that is not part of the DNSRecord
resource itself.
Using DNSRecord
Resources
gardenlet manages DNSRecord
resources for all three DNS records mentioned above (internal, external, and ingress).
In order to successfully reconcile a shoot with the feature gate enabled, extension controllers for DNSRecord
resources for types used in the default, internal, and custom domain secrets should be registered via ControllerRegistration
resources.
Note: For compatibility reasons, the spec.dns.providers
section is still used to specify additional providers. Only the one marked as primary: true
will be used for DNSRecord
. All others are considered by the shoot-dns-service
extension only (if deployed).
Support for DNSRecord
Resources in the Provider Extensions
The following table contains information about the provider extension version that adds support for DNSRecord
resources:
Extension | Version |
---|---
provider-alicloud | v1.26.0 |
provider-aws | v1.27.0 |
provider-azure | v1.21.0 |
provider-gcp | v1.18.0 |
provider-openstack | v1.21.0 |
provider-vsphere | N/A |
provider-equinix-metal | N/A |
provider-kubevirt | N/A |
provider-openshift | N/A |
Support for DNSRecord
IPv6 recordType: AAAA
in the Provider Extensions
The following table contains information about the provider extension version that adds support for DNSRecord
IPv6 recordType: AAAA
:
Extension | Version |
---|---
provider-alicloud | N/A |
provider-aws | N/A |
provider-azure | N/A |
provider-gcp | N/A |
provider-openstack | N/A |
provider-vsphere | N/A |
provider-equinix-metal | N/A |
provider-kubevirt | N/A |
provider-openshift | N/A |
provider-local | v1.63.0 |
References and Additional Resources
15 - Extension
Contract: Extension
Resource
Gardener defines common procedures which must be passed to create a functioning shoot cluster. Well known steps are represented by special resources like Infrastructure
, OperatingSystemConfig
or DNS
. These resources are typically reconciled by dedicated controllers setting up the infrastructure on the hyperscaler or managing DNS entries, etc.
However, some requirements don’t match those special resources or don’t need to be processed at a specific step in the creation / deletion flow of the shoot. They require a more generic hook. Therefore, Gardener offers the Extension
resource.
What is required to register and support an Extension type?
Gardener creates one Extension
resource per shoot for every extension type registered via a ControllerRegistration.
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
metadata:
name: extension-example
spec:
resources:
- kind: Extension
type: example
globallyEnabled: true
workerlessSupported: true
If spec.resources[].globallyEnabled
is true
, then the Extension
resource of the given type
is created for every shoot cluster. If set to false
, the Extension
resource is only created if configured in the Shoot
manifest. In case of workerless Shoot
, a globally enabled Extension
resource is created only if spec.resources[].workerlessSupported
is also set to true
. If an extension configured in the spec of a workerless Shoot
is not supported yet, the admission request will be rejected.
The Extension
resources are created in the shoot namespace of the seed cluster.
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
name: example
namespace: shoot--foo--bar
spec:
type: example
providerConfig: {}
Your controller needs to reconcile extensions.extensions.gardener.cloud
. Since there can exist multiple Extension
resources per shoot, each one holds a spec.type
field to let controllers check their responsibility (similar to all other extension resources of Gardener).
ProviderConfig
It is possible to provide data in the Shoot
resource which is copied to spec.providerConfig
of the Extension
resource.
---
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
name: bar
namespace: garden-foo
spec:
extensions:
- type: example
providerConfig:
foo: bar
...
results in
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
name: example
namespace: shoot--foo--bar
spec:
type: example
providerConfig:
foo: bar
Shoot Reconciliation Flow and Extension Status
Gardener creates Extension resources as part of the Shoot reconciliation. Moreover, it is guaranteed that the Cluster resource exists before the Extension
resource is created. Extension
s can be reconciled at different stages during Shoot reconciliation depending on the defined extension lifecycle strategy in the respective ControllerRegistration resource. Please consult the Extension Lifecycle section for more information.
For an Extension
controller it is crucial to maintain the Extension
’s status correctly. At the end Gardener checks the status of each Extension
and only reports a successful shoot reconciliation if the state of the last operation is Succeeded
.
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
generation: 1
name: example
namespace: shoot--foo--bar
spec:
type: example
status:
lastOperation:
state: Succeeded
observedGeneration: 1
16 - Force Deletion
Force Deletion
From v1.81
, Gardener supports Shoot Force Deletion. All extension controllers should also properly support it. This document outlines some important points that extension maintainers should keep in mind to support force deletion in their extensions.
Overall Principles
The following principles should always be upheld:
- All resources pertaining to the extension and managed by it should be appropriately handled and cleaned up by the extension when force deletion is initiated.
Implementation Details
ForceDelete Actuator Methods
Most extension controller implementations follow a common pattern where a generic Reconciler
implementation delegates to an Actuator
interface that contains the methods Reconcile
, Delete
, Migrate
and Restore
provided by the extension. A new method, ForceDelete
has been added to all such Actuator
interfaces; see the infrastructure Actuator
interface as an example. The generic reconcilers call this method if the Shoot has annotation confirmation.gardener.cloud/force-deletion=true
. Thus, it should be implemented by the extension controller to forcefully delete resources if it is not possible to delete them gracefully. If graceful deletion is possible, then in the ForceDelete
, they can simply call the Delete
method.
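A minimal sketch of this pattern with a simplified actuator (the real Actuator interfaces take additional arguments, such as a logger and the Cluster object):
package main

import (
	"context"

	extensionsv1alpha1 "github.com/gardener/gardener/pkg/apis/extensions/v1alpha1"
)

type actuator struct{ /* clients, decoders, ... */ }

// Delete performs the regular, graceful deletion of provider resources.
func (a *actuator) Delete(ctx context.Context, infra *extensionsv1alpha1.Infrastructure) error {
	return nil
}

// ForceDelete is called when the Shoot is annotated with
// confirmation.gardener.cloud/force-deletion=true. Graceful deletion is possible
// for this (hypothetical) extension, so it simply reuses Delete.
func (a *actuator) ForceDelete(ctx context.Context, infra *extensionsv1alpha1.Infrastructure) error {
	return a.Delete(ctx, infra)
}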
Extension Controllers Based on Generic Actuators
In practice, the implementation of many extension controllers (for example, the controlplane and worker controllers in most provider extensions) are based on a generic Actuator
implementation that only delegates to extension methods for behavior that is truly provider-specific. In all such cases, the ForceDelete
method has already been implemented with a method that should suit most of the extensions. If it doesn’t suit your extension, then the ForceDelete
method needs to be overridden; see the Azure controlplane controller as an example.
Extension Controllers Not Based on Generic Actuators
The implementation of some extension controllers (for example, the infrastructure controllers in all provider extensions) are not based on a generic Actuator
implementation. Such extension controllers must always provide a proper implementation of the ForceDelete
method according to the above guidelines; see the AWS infrastructure controller as an example. In practice, this might result in code duplication between the different extensions, since the ForceDelete
code is usually not provider-specific.
Some General Implementation Examples
- If the extension deploys only resources in the shoot cluster not backed by infrastructure in third-party systems, then performing the regular deletion code (actuator.Delete) will suffice in the majority of cases (e.g. https://github.com/gardener/gardener-extension-shoot-networking-filter/blob/1d95a483d803874e8aa3b1de89431e221a7d574e/pkg/controller/lifecycle/actuator.go#L175-L178).
- If the extension deploys resources which are backed by infrastructure in third-party systems:
  - If the resource is in the Seed cluster, the extension should remove the finalizers and delete the resource. This is needed especially if the resource is a custom resource, since gardenlet will not be aware of this resource and cannot take action.
  - If the resource is in the Shoot and if it’s deployed by a ManagedResource, then gardenlet will take care to forcefully delete it in a later step of force-deletion. If the resource is not deployed via a ManagedResource, then it wouldn’t block the deletion flow anyway since it is in the Shoot cluster. In both cases, the extension controller can ignore the resource and return nil.
17 - Healthcheck Library
Health Check Library
Goal
Typically, an extension reconciles a specific resource (Custom Resource Definitions (CRDs)) and creates / modifies resources in the cluster (via helm, managed resources, kubectl, …).
We call these API Objects ‘dependent objects’ - as they are bound to the lifecycle of the extension.
The goal of this library is to enable extensions to setup health checks for their ‘dependent objects’ with minimal effort.
Usage
The library provides a generic controller with the ability to register any resource that satisfies the extension object interface.
An example is the Worker
CRD.
Health check functions for commonly used dependent objects can be reused and registered with the controller, such as:
- Deployment
- DaemonSet
- StatefulSet
- ManagedResource (Gardener specific)
See the below example taken from the provider-aws.
health.DefaultRegisterExtensionForHealthCheck(
aws.Type,
extensionsv1alpha1.SchemeGroupVersion.WithKind(extensionsv1alpha1.WorkerResource),
func() runtime.Object { return &extensionsv1alpha1.Worker{} },
mgr, // controller runtime manager
opts, // options for the health check controller
nil, // custom predicates
map[extensionshealthcheckcontroller.HealthCheck]string{
general.CheckManagedResource(genericactuator.McmShootResourceName): string(gardencorev1beta1.ShootSystemComponentsHealthy),
general.CheckSeedDeployment(aws.MachineControllerManagerName): string(gardencorev1beta1.ShootEveryNodeReady),
worker.SufficientNodesAvailable(): string(gardencorev1beta1.ShootEveryNodeReady),
})
This creates a health check controller that reconciles the extensions.gardener.cloud/v1alpha1.Worker
resource with the spec.type ‘aws’.
Three health check functions are registered that are executed during reconciliation.
Each health check is mapped to a single HealthConditionType
that results in conditions with the same condition.type
(see below).
To contribute to the Shoot’s health, the following conditions can be used: SystemComponentsHealthy
, EveryNodeReady
, ControlPlaneHealthy
, ObservabilityComponentsHealthy
. In case of workerless Shoot
the EveryNodeReady
condition is not present, so it can’t be used.
The Gardener/Gardenlet checks each extension for conditions matching these types.
However, extensions are free to choose any HealthConditionType
.
For more information, see Contributing to Shoot Health Status Conditions.
A health check has to satisfy the below interface.
You can find implementation examples in the healthcheck folder.
type HealthCheck interface {
// Check is the function that executes the actual health check
Check(context.Context, types.NamespacedName) (*SingleCheckResult, error)
// InjectSeedClient injects the seed client
InjectSeedClient(client.Client)
// InjectShootClient injects the shoot client
InjectShootClient(client.Client)
// SetLoggerSuffix injects the logger
SetLoggerSuffix(string, string)
// DeepCopy clones the healthCheck
DeepCopy() HealthCheck
}
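A minimal sketch of a custom health check satisfying this interface, assuming the HealthCheck and SingleCheckResult types shown above plus context, types (k8s.io/apimachinery/pkg/types), appsv1 (k8s.io/api/apps/v1), and client (sigs.k8s.io/controller-runtime/pkg/client) are in scope; the checked deployment name is hypothetical.
// myComponentCheck verifies that a deployment named "my-component" exists in the
// shoot namespace in the seed (hypothetical example).
type myComponentCheck struct {
	seedClient  client.Client
	shootClient client.Client
}

func (c *myComponentCheck) InjectSeedClient(seedClient client.Client)   { c.seedClient = seedClient }
func (c *myComponentCheck) InjectShootClient(shootClient client.Client) { c.shootClient = shootClient }
func (c *myComponentCheck) SetLoggerSuffix(provider, extension string)  { /* keep a logger if needed */ }

func (c *myComponentCheck) DeepCopy() HealthCheck {
	cp := *c
	return &cp
}

func (c *myComponentCheck) Check(ctx context.Context, request types.NamespacedName) (*SingleCheckResult, error) {
	deployment := &appsv1.Deployment{}
	if err := c.seedClient.Get(ctx, client.ObjectKey{Namespace: request.Namespace, Name: "my-component"}, deployment); err != nil {
		return nil, err // the health status could not be determined
	}
	// Inspect deployment.Status here and populate the SingleCheckResult accordingly.
	return &SingleCheckResult{}, nil
}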
The health check controller regularly (default: 30s
) reconciles the extension resource and executes the registered health checks for the dependent objects.
As a result, the controller writes condition(s) to the status of the extension containing the health check result.
In our example, two checks are mapped to ShootEveryNodeReady
and one to ShootSystemComponentsHealthy
, leading to conditions with two distinct HealthConditionTypes
(condition.type):
status:
conditions:
- lastTransitionTime: "20XX-10-28T08:17:21Z"
lastUpdateTime: "20XX-11-28T08:17:21Z"
message: (1/1) Health checks successful
reason: HealthCheckSuccessful
status: "True"
type: SystemComponentsHealthy
- lastTransitionTime: "20XX-10-28T08:17:21Z"
lastUpdateTime: "20XX-11-28T08:17:21Z"
message: (2/2) Health checks successful
reason: HealthCheckSuccessful
status: "True"
type: EveryNodeReady
Please note that there are four statuses: True
, False
, Unknown
, and Progressing
.
- True should be used for successful health checks.
- False should be used for unsuccessful/failing health checks.
- Unknown should be used when there was an error trying to determine the health status.
- Progressing should be used to indicate that the health status did not succeed but for expected reasons (e.g., a cluster scale up/down could make the standard health check fail because something is wrong with the Machines, however, it’s actually an expected situation and known to be completed within a few minutes.)
Health checks that report Progressing
should also provide a timeout, after which this “progressing situation” is expected to be completed.
The health check library will automatically transition the status to False
if the timeout was exceeded.
Additional Considerations
It is up to the extension to decide how to conduct health checks, though it is recommended to make use of the built-in health check functionality of managed-resources
for trivial checks.
By deploying the dependent resources via managed resources, the gardener resource manager conducts basic checks for different API objects out-of-the-box (e.g., Deployments
, DaemonSets
, …) - and writes health conditions.
By default, Gardener performs health checks for all the ManagedResource
s created in the shoot namespaces.
Their status will be aggregated to the Shoot
conditions according to the following rules:
- Health checks of
ManagedResource
with .spec.class=nil
are aggregated to the SystemComponentsHealthy
condition - Health checks of
ManagedResource
with .spec.class!=nil
are aggregated to the ControlPlaneHealthy
condition unless the ManagedResource
is labeled with care.gardener.cloud/condition-type=<other-condition-type>
. In such case, it is aggregated to the <other-condition-type>
.
More sophisticated health checks should be implemented by the extension controller itself (implementing the HealthCheck
interface).
18 - Heartbeat
Heartbeat Controller
The heartbeat controller renews a dedicated Lease
object named gardener-extension-heartbeat
at regular 30 second intervals by default. This Lease
is used for heartbeats similar to how gardenlet
uses Lease
objects for seed heartbeats (see gardenlet heartbeats).
The gardener-extension-heartbeat
Lease
can be checked by other controllers to verify that the corresponding extension controller is still running. Currently, gardenlet
checks this Lease
when performing shoot health checks and expects to find the Lease
inside the namespace where the extension controller is deployed by the corresponding ControllerInstallation
. For each extension resource deployed in the Shoot control plane, gardenlet
finds the corresponding gardener-extension-heartbeat
Lease
resource and checks whether the Lease
’s .spec.renewTime
is older than the allowed threshold for stale extension health checks - in this case, gardenlet
considers the health check report for an extension resource as “outdated” and reflects this in the Shoot
status.
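A rough sketch of such a staleness check against the heartbeat Lease (the threshold value and function name are illustrative):
package main

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// heartbeatIsStale reports whether the extension's heartbeat Lease has not been
// renewed within the given threshold.
func heartbeatIsStale(ctx context.Context, c client.Client, extensionNamespace string, threshold time.Duration) (bool, error) {
	lease := &coordinationv1.Lease{}
	if err := c.Get(ctx, client.ObjectKey{Namespace: extensionNamespace, Name: "gardener-extension-heartbeat"}, lease); err != nil {
		return true, err
	}
	if lease.Spec.RenewTime == nil {
		return true, nil
	}
	return time.Since(lease.Spec.RenewTime.Time) > threshold, nil
}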
19 - Infrastructure
Contract: Infrastructure
Resource
Every Kubernetes cluster requires some low-level infrastructure to be set up in order to work properly.
Examples for that are networks, routing entries, security groups, IAM roles, etc.
Before introducing the Infrastructure
extension resource, Gardener used Terraform in order to create and manage these provider-specific resources (e.g., see here).
Now, Gardener commissions an external, provider-specific controller to take over this task.
Which infrastructure resources are required?
Unfortunately, there is no general answer to this question as it is highly provider specific.
Consider the above mentioned resources, i.e. VPC, subnets, route tables, security groups, IAM roles, SSH key pairs.
Most of the resources are required in order to create VMs (the shoot cluster worker nodes), load balancers, and volumes.
What needs to be implemented to support a new infrastructure provider?
As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
name: infrastructure
namespace: shoot--foo--bar
spec:
type: azure
region: eu-west-1
secretRef:
name: cloudprovider
namespace: shoot--foo--bar
providerConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
resourceGroup:
name: mygroup
networks:
vnet: # specify either 'name' or 'cidr'
# name: my-vnet
cidr: 10.250.0.0/16
workers: 10.250.0.0/19
The .spec.secretRef
contains a reference to the provider secret pointing to the account that shall be used to create the needed resources.
However, the most important section is the .spec.providerConfig
.
It contains an embedded declaration of the provider specific configuration for the infrastructure (that cannot be known by Gardener itself).
You are responsible for designing what this configuration looks like.
Gardener does not evaluate it but just copies this part from what has been provided by the end-user in the Shoot
resource.
After your controller has created the required resources in your provider’s infrastructure it needs to generate an output that can be used by other controllers in subsequent steps.
An example for that is the Worker
extension resource controller.
It is responsible for creating virtual machines (shoot worker nodes) in this prepared infrastructure.
Everything that it needs to know in order to do that (e.g. the network IDs, security group names, etc. (again: provider-specific)) needs to be provided as output in the Infrastructure
resource:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
name: infrastructure
namespace: shoot--foo--bar
spec:
...
status:
lastOperation: ...
providerStatus:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureStatus
resourceGroup:
name: mygroup
networks:
vnet:
name: my-vnet
subnets:
- purpose: nodes
name: my-subnet
availabilitySets:
- purpose: nodes
id: av-set-id
name: av-set-name
routeTables:
- purpose: nodes
name: route-table-name
securityGroups:
- purpose: nodes
name: sec-group-name
In order to support a new infrastructure provider you need to write a controller that watches all Infrastructure
s with .spec.type=<my-provider-name>
.
You can take a look at the below referenced example implementation for the Azure provider.
Dynamic nodes network for shoot clusters
Some environments do not allow end-users to statically define a CIDR for the network that shall be used for the shoot worker nodes.
In these cases it is possible for the extension controllers to dynamically provision a network for the nodes (as part of their reconciliation loops), and to provide the CIDR in the status
of the Infrastructure
resource:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
name: infrastructure
namespace: shoot--foo--bar
spec:
...
status:
lastOperation: ...
providerStatus: ...
nodesCIDR: 10.250.0.0/16
Gardener will pick this nodesCIDR
and use it to configure the VPN components to establish network connectivity between the control plane and the worker nodes.
If the Shoot
resource already specifies a nodes CIDR in .spec.networking.nodes
and the extension controller provides also a value in .status.nodesCIDR
in the Infrastructure
resource then the latter one will always be considered with higher priority by Gardener.
Some providers might require further information that is not provider specific but already part of the shoot resource.
One example for this is the GCP infrastructure controller which needs the pod and the service network of the cluster in order to prepare and configure the infrastructure correctly.
As Gardener cannot know which information is required by providers it simply mirrors the Shoot
, Seed
, and CloudProfile
resources into the seed.
They are part of the Cluster
extension resource and can be used to extract information that is not part of the Infrastructure
resource itself.
Implementation details
Actuator
interface
Most existing infrastructure controller implementations follow a common pattern where a generic Reconciler
delegates to an Actuator
interface that contains the methods Reconcile
, Delete
, Migrate
, and Restore
. These methods are called by the generic Reconciler
for the respective operations, and should be implemented by the extension according to the contract described here and the migration guidelines.
ConfigValidator
interface
For infrastructure controllers, the generic Reconciler
also delegates to a ConfigValidator
interface that contains a single Validate
method. This method is called by the generic Reconciler
at the beginning of every reconciliation, and can be implemented by the extension to validate the .spec.providerConfig
part of the Infrastructure
resource with the respective cloud provider, typically the existence and validity of cloud provider resources such as AWS VPCs or GCP Cloud NAT IPs.
The Validate
method returns a list of errors. If this list is non-empty, the generic Reconciler
will fail with an error. This error will have the error code ERR_CONFIGURATION_PROBLEM
, unless there is at least one error in the list that has its ErrorType
field set to field.ErrorTypeInternal
.
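A simplified sketch of such a validation (the real ConfigValidator interface also receives a logger and decodes the provider config first); the VPC lookup is a hypothetical placeholder:
package main

import (
	"context"

	extensionsv1alpha1 "github.com/gardener/gardener/pkg/apis/extensions/v1alpha1"
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateInfrastructureConfig checks the provider config against the cloud provider.
func validateInfrastructureConfig(ctx context.Context, infra *extensionsv1alpha1.Infrastructure) field.ErrorList {
	allErrs := field.ErrorList{}
	fldPath := field.NewPath("spec", "providerConfig")

	vpcID, err := lookupVPC(ctx, infra) // hypothetical provider API call
	if err != nil {
		// An internal error prevents ERR_CONFIGURATION_PROBLEM from being reported.
		return append(allErrs, field.InternalError(fldPath, err))
	}
	if vpcID == "" {
		allErrs = append(allErrs, field.Invalid(fldPath.Child("networks", "vpc"), vpcID, "VPC does not exist"))
	}
	return allErrs
}

func lookupVPC(ctx context.Context, infra *extensionsv1alpha1.Infrastructure) (string, error) {
	// Call the cloud provider API here.
	return "", nil
}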
References and additional resources
20 - Logging And Monitoring
Logging and Monitoring for Extensions
Gardener provides an integrated logging and monitoring stack for alerting, monitoring, and troubleshooting of its managed components by operators or end users. For further information on how to make use of it in these roles, refer to the corresponding guides for exploring logs and for monitoring with Plutono.
The components that constitute the logging and monitoring stack are managed by Gardener. By default, it deploys Prometheus and Alertmanager (managed via prometheus-operator
), and Plutono into the garden
namespace of all seed clusters. If the logging is enabled in the gardenlet
configuration (logging.enabled
), it will deploy fluent-operator and Vali in the garden
namespace too.
Each shoot namespace hosts managed logging and monitoring components. As part of the shoot reconciliation flow, Gardener deploys a shoot-specific Prometheus, blackbox-exporter, Plutono, and, if configured, an Alertmanager into the shoot namespace, next to the other control plane components. If the logging is enabled in the gardenlet
configuration (logging.enabled
) and the shoot purpose is not testing
, it deploys a shoot-specific Vali in the shoot namespace too.
The logging and monitoring stack is extensible by configuration. Gardener extensions can take advantage of that and contribute monitoring configurations encoded in ConfigMap
s for their own, specific dashboards, alerts and other supported assets and integrate with it. As with other Gardener resources, they will be continuously reconciled. The extensions can also directly deploy fluent-operator custom resources, which will be created in the seed cluster and plugged into the fluent-bit instance.
This guide is about the roles and extensibility options of the logging and monitoring stack components, and how to integrate extensions with:
Monitoring
Seed Cluster
Cache Prometheus
The central Prometheus instance in the garden
namespace (called “cache Prometheus”) fetches metrics and data from all seed cluster nodes and all seed cluster pods.
It uses the federation concept to allow the shoot-specific instances to scrape only the metrics for the pods of the control plane they are responsible for.
This mechanism allows scraping the metrics for the nodes/pods once for the whole cluster, and having them distributed afterwards.
For more details, continue reading here.
Typically, this is not necessary, but in case an extension wants to extend the configuration for this cache Prometheus, they can create the prometheus-operator
’s custom resources and label them with prometheus=cache
, for example:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
prometheus: cache
name: cache-my-component
namespace: garden
spec:
selector:
matchLabels:
app: my-component
endpoints:
- metricRelabelings:
- action: keep
regex: ^(metric1|metric2|...)$
sourceLabels:
- __name__
port: metrics
Seed Prometheus
Another Prometheus instance in the garden
namespace (called “seed Prometheus”) fetches metrics and data from seed system components, kubelets, cAdvisors, and extensions.
If you want your extension pods to be scraped then they must be annotated with prometheus.io/scrape=true
and prometheus.io/port=<metrics-port>
.
For more details, continue reading here.
Typically, this is not necessary, but in case an extension wants to extend the configuration for this seed Prometheus, they can create the prometheus-operator
’s custom resources and label them with prometheus=seed
, for example:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
prometheus: seed
name: seed-my-component
namespace: garden
spec:
selector:
matchLabels:
app: my-component
endpoints:
- metricRelabelings:
- action: keep
regex: ^(metric1|metric2|...)$
sourceLabels:
- __name__
port: metrics
Aggregate Prometheus
Another Prometheus instance in the garden
namespace (called “aggregate Prometheus”) stores pre-aggregated data from the cache Prometheus and shoot Prometheus.
An ingress exposes this Prometheus instance allowing it to be scraped from another cluster.
For more details, continue reading here.
Typically, this is not necessary, but in case an extension wants to extend the configuration for this aggregate Prometheus, they can create the prometheus-operator
’s custom resources and label them with prometheus=aggregate
, for example:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
prometheus: aggregate
name: aggregate-my-component
namespace: garden
spec:
selector:
matchLabels:
app: my-component
endpoints:
- metricRelabelings:
- action: keep
regex: ^(metric1|metric2|...)$
sourceLabels:
- __name__
port: metrics
Plutono
A Plutono instance is deployed by gardenlet
into the seed cluster’s garden
namespace for visualizing monitoring metrics and logs via dashboards.
In order to provide custom dashboards, create a ConfigMap
in the garden
namespace labelled with dashboard.monitoring.gardener.cloud/seed=true
that contains the respective JSON documents, for example:
apiVersion: v1
kind: ConfigMap
metadata:
labels:
dashboard.monitoring.gardener.cloud/seed: "true"
name: extension-foo-my-custom-dashboard
namespace: garden
data:
my-custom-dashboard.json: <dashboard-JSON-document>
Shoot Cluster
Shoot Prometheus
The shoot-specific metrics are then made available to operators and users in the shoot Plutono, using the shoot Prometheus as data source.
Extension controllers might deploy components as part of their reconciliation next to the shoot’s control plane.
Examples for this would be a cloud-controller-manager or CSI controller deployments. Extensions that want to have their managed control plane components integrated with monitoring can contribute their per-shoot configuration for scraping Prometheus metrics, Alertmanager alerts or Plutono dashboards.
Extensions Monitoring Integration
In case an extension wants to extend the configuration for the shoot Prometheus, they can create the prometheus-operator
’s custom resources and label them with prometheus=shoot
.
ServiceMonitor
When the component runs in the seed cluster (e.g., as part of the shoot control plane), ServiceMonitor
resources should be used:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
prometheus: shoot
name: shoot-my-controlplane-component
namespace: shoot--foo--bar
spec:
selector:
matchLabels:
app: my-component
endpoints:
- metricRelabelings:
- action: keep
regex: ^(metric1|metric2|...)$
sourceLabels:
- __name__
port: metrics
In case HTTPS
scheme is used, the CA certificate should be provided like this:
spec:
scheme: HTTPS
tlsConfig:
ca:
secret:
name: <name-of-ca-bundle-secret>
key: bundle.crt
In case the component requires credentials when contacting its metrics endpoint, provide them like this:
spec:
authorization:
credentials:
name: <name-of-secret-containing-credentials>
key: <data-key-in-secret>
If the component delegates authorization to the kube-apiserver
of the shoot cluster, you can use the shoot-access-prometheus-shoot
secret:
spec:
authorization:
credentials:
name: shoot-access-prometheus-shoot
key: token
# in case the component's server certificate is signed by the cluster CA:
scheme: HTTPS
tlsConfig:
ca:
secret:
name: <name-of-ca-bundle-secret>
key: bundle.crt
ScrapeConfigs
If the component runs in the shoot cluster itself, metrics are scraped via the kube-apiserver
proxy.
In this case, Prometheus needs to authenticate itself with the API server.
This can be done like this:
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
labels:
prometheus: shoot
name: shoot-my-cluster-component
namespace: shoot--foo--bar
spec:
authorization:
credentials:
name: shoot-access-prometheus-shoot
key: token
scheme: HTTPS
tlsConfig:
ca:
secret:
name: <name-of-ca-bundle-secret>
key: bundle.crt
kubernetesSDConfigs:
- apiServer: https://kube-apiserver
authorization:
credentials:
name: shoot-access-prometheus-shoot
key: token
followRedirects: true
namespaces:
names:
- kube-system
role: endpoints
tlsConfig:
ca:
secret:
name: <name-of-ca-bundle-secret>
key: bundle.crt
cert: {}
metricRelabelings:
- sourceLabels:
- __name__
action: keep
regex: ^(metric1|metric2)$
- sourceLabels:
- namespace
action: keep
regex: kube-system
relabelings:
- action: replace
replacement: my-cluster-component
targetLabel: job
- sourceLabels: [__meta_kubernetes_service_name, __meta_kubernetes_pod_container_port_name]
separator: ;
regex: my-component-service;metrics
replacement: $1
action: keep
- sourceLabels: [__meta_kubernetes_endpoint_node_name]
separator: ;
regex: (.*)
targetLabel: node
replacement: $1
action: replace
- sourceLabels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
targetLabel: pod
replacement: $1
action: replace
- targetLabel: __address__
replacement: kube-apiserver:443
- sourceLabels: [__meta_kubernetes_pod_name, __meta_kubernetes_pod_container_port_number]
separator: ;
regex: (.+);(.+)
targetLabel: __metrics_path__
replacement: /api/v1/namespaces/kube-system/pods/${1}:${2}/proxy/metrics
action: replace
[!TIP]
Developers can make use of the pkg/component/observability/monitoring/prometheus/shoot.ClusterComponentScrapeConfigSpec
function in order to generate a ScrapeConfig
like above.
PrometheusRule
Similar to ServiceMonitor
s, PrometheusRule
s can be created with the prometheus=shoot
label:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
prometheus: shoot
name: shoot-my-component
namespace: shoot--foo--bar
spec:
groups:
- name: my.rules
rules:
# ...
Plutono Dashboards
A Plutono instance is deployed by gardenlet
into the shoot cluster’s namespace for visualizing monitoring metrics and logs via dashboards.
In order to provide custom dashboards, create a ConfigMap
in the shoot cluster’s namespace labelled with dashboard.monitoring.gardener.cloud/shoot=true
that contains the respective JSON documents, for example:
apiVersion: v1
kind: ConfigMap
metadata:
labels:
dashboard.monitoring.gardener.cloud/shoot: "true"
name: extension-foo-my-custom-dashboard
namespace: shoot--project--name
data:
my-custom-dashboard.json: <dashboard-JSON-document>
Logging
In Kubernetes clusters, container logs are non-persistent and do not survive stopped and destroyed containers. Gardener addresses this problem for the components hosted in a seed cluster by introducing its own managed logging solution. It is integrated with the Gardener monitoring stack to have all troubleshooting context in one place.
Gardener logging consists of components in three roles: log collectors and forwarders, log persistence, and exploration/consumption interfaces. All of them live in the seed clusters in multiple instances:
- Logs are persisted by Vali instances deployed as StatefulSets - one per shoot namespace, if the logging is enabled in the
gardenlet
configuration (logging.enabled
) and the shoot purpose is not testing
, and one in the garden
namespace. The shoot instances store logs from the control plane components hosted there. The garden
Vali instance is responsible for logs from the rest of the seed namespaces - kube-system
, garden
, extension-*
, and others. - Fluent-bit DaemonSets deployed by the fluent-operator on each seed node collect logs from that node. A custom plugin takes care of distributing the collected log messages to the Vali instances that they are intended for. This allows fetching the logs once for the whole cluster and distributing them afterwards.
- Plutono is the UI component used to explore monitoring and log data together for easier troubleshooting and in context. Plutono instances are configured to use the corresponding Vali instances, sharing the same namespace as data providers. There is one Plutono Deployment in the
garden
namespace and one Deployment per shoot namespace (exposed to the end users and to the operators).
Logs can be produced from various sources, such as containers or systemd, and in different formats. The fluent-bit design supports a configurable data pipeline to address that problem. Gardener provides such configuration for logs produced by all of its core managed components as ClusterFilters
and ClusterParsers
. Extensions can contribute their own, specific configurations as fluent-operator custom resources too. See for example the logging configuration for the Gardener AWS provider extension.
Fluent-bit Log Parsers and Filters
To integrate with Gardener logging, extensions can and should specify how fluent-bit will handle the logs produced by the managed components that they contribute to Gardener. Normally, that requires configuring a parser for the specific logging format, if none of the available ones is applicable, and a filter defining how to apply it. For a complete reference of the configuration options, refer to fluent-bit’s documentation.
To contribute its own configuration to the fluent-bit agents’ data pipelines, an extension must deploy a fluent-operator
custom resource labeled with fluentbit.gardener/type: seed
in the seed cluster.
Note: Take care to provide the correct data pipeline elements in the corresponding fields and not to mix them.
Example: Logging configuration for provider-specific cloud-controller-manager
deployed into shoot namespaces that reuses the kube-apiserver-parser
defined in logging.go to parse the component logs:
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFilter
metadata:
labels:
fluentbit.gardener/type: "seed"
name: cloud-controller-manager-aws-cloud-controller-manager
spec:
filters:
- parser:
keyName: log
parser: kube-apiserver-parser
reserveData: true
match: kubernetes.*cloud-controller-manager*aws-cloud-controller-manager*
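If no existing parser fits the component’s log format, a ClusterParser can be contributed alongside the filter. The following is only a sketch; the parser name, regular expression, and time format are illustrative and must be adapted to the actual log format:
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterParser
metadata:
  labels:
    fluentbit.gardener/type: "seed"
  name: my-extension-parser
spec:
  regex:
    timeKey: time
    timeFormat: "%Y-%m-%dT%H:%M:%S.%L"
    regex: '^(?<time>\S+)\s+(?<severity>\w+)\s+(?<log>.*)$'
A ClusterFilter for the component would then reference parser: my-extension-parser instead of kube-apiserver-parser.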
Further details on how to define parsers and use them, along with examples, can be found in the following guide.
Plutono
The two types of Plutono instances found in a seed cluster are configured to expose logs of different origin in their dashboards:
- Garden Plutono dashboards expose logs from non-shoot namespaces of the seed clusters
- Shoot Plutono dashboards expose logs from the shoot cluster namespace where they belong
- Kube Apiserver
- Kube Controller Manager
- Kube Scheduler
- Cluster Autoscaler
- VPA components
- Kubernetes Pods
If the type of logs exposed in the Plutono instances needs to be changed, it is necessary to update the corresponding instance dashboard configurations.
Tips
- Be careful to create
ClusterFilters
and ClusterParsers
with unique names because they are not namespaced. We use pod_name
for filters with one container and pod_name--container_name
for pods with multiple containers. - Be careful to match exactly the log names that you need for a particular parser in your filters configuration. The regular expression you will supply will match names in the form
kubernetes.pod_name.<metadata>.container_name
. If there are extensions with the same container and pod names, they will all match the same parser in a filter. That may be a desired effect, if they all share the same log format. But it will be a problem if they don’t. To solve it, either the pod or container names must be unique, and the regular expression in the filter has to match that unique pattern. A recommended approach is to prefix containers with the extension name and tune the regular expression to match it. For example, using myextension-container
as container name and a regular expression kubernetes.mypod.*myextension-container
will guarantee a match of the right log name. Make sure that the regular expression does not match more than you expect. For example, kubernetes.systemd.*systemd.*
will match both systemd-service
and systemd-monitor-service
. You will want to be as specific as possible. - It’s a good idea to put the logging configuration into the Helm chart that also deploys the extension controller, while the monitoring configuration can be part of the Helm chart/deployment routine that deploys the component managed by the controller.
References and Additional Resources
21 - Machine Controller Provider Local
machine-controller-manager-provider-local
Out of tree (controller-based) implementation for local
as a new provider.
The local out-of-tree provider implements the interface defined at MCM OOT driver.
Fundamental Design Principles
Following are the basic principles kept in mind while developing the external plugin.
- Communication between this Machine Controller (MC) and Machine Controller Manager (MCM) is achieved using the Kubernetes native declarative approach.
- Machine Controller (MC) behaves as the controller used to interact with the
local
provider and manage the VMs corresponding to the machine objects. - Machine Controller Manager (MCM) deals with higher level objects such as machine-set and machine-deployment objects.
22 - Managedresources
Deploy Resources to the Shoot Cluster
We have introduced a component called gardener-resource-manager
that is deployed as part of every shoot control plane in the seed.
One of its tasks is to manage CRDs, so-called ManagedResource
s.
Managed resources contain Kubernetes resources that shall be created, reconciled, updated, and deleted by the gardener-resource-manager.
Extension controllers may create these ManagedResource
s in the shoot namespace if they need to create any resource in the shoot cluster itself, for example RBAC roles (or anything else).
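A minimal sketch of such a ManagedResource (the names are illustrative; the referenced Secret is expected to contain the serialized manifests that shall be applied in the shoot cluster):
apiVersion: resources.gardener.cloud/v1alpha1
kind: ManagedResource
metadata:
  name: extension-foo-shoot
  namespace: shoot--project--name
spec:
  secretRefs:
  # Secret in the same namespace holding the manifests to be applied in the shoot
  - name: managedresource-extension-foo-shoot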
Please take a look at the respective documentation.
23 - Migration
Control Plane Migration
Control Plane Migration is a new Gardener feature that has been recently implemented as proposed in GEP-7 Shoot Control Plane Migration. It should be properly supported by all extensions controllers. This document outlines some important points that extension maintainers should keep in mind to properly support migration in their extensions.
Overall Principles
The following principles should always be upheld:
- All state maintained by the extension that is external to the seed cluster, for example infrastructure resources in a cloud provider, DNS entries, etc., should be kept during the migration. No such state should be deleted and then recreated, as this might cause disruption in the availability of the shoot cluster.
- All Kubernetes resources maintained by the extension in the shoot cluster itself should also be kept during the migration. No such resources should be deleted and then recreated.
Migrate and Restore Operations
Two new operations have been introduced in Gardener. They can be specified as values of the gardener.cloud/operation
annotation on an extension resource to indicate that an operation different from a normal reconcile
should be performed by the corresponding extension controller:
- The
migrate
operation is used to ask the extension controller in the source seed to stop reconciling extension resources (in case they are requeued due to errors) and perform cleanup activities, if such are required. These cleanup activities might involve removing finalizers on resources in the shoot namespace that have been previously created by the extension controller and deleting them without actually deleting any resources external to the seed cluster. This is also the last opportunity for extensions to persist their state into the .status.state
field of the reconciled extension resource before it is restored in the new destination seed cluster. - The
restore
operation is used to ask the extension controller in the destination seed to restore any state saved in the extension resource status
, before performing the actual reconciliation.
Unlike the reconcile operation, extension controllers must remove the gardener.cloud/operation
annotation at the end of a successful reconciliation when the current operation is migrate
or restore
, not at the beginning of a reconciliation.
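For illustration, requesting a migrate operation on an Infrastructure resource would look like this (sketch):
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
  annotations:
    gardener.cloud/operation: migrate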
Cleaning-Up Source Seed Resources
All resources in the source seed that have been created by an extension controller, for example secrets, config maps, managed resources, etc., should be properly cleaned up by the extension controller when the current operation is migrate
. As mentioned above, such resources should be deleted without actually deleting any resources external to the seed cluster.
There is one exception to this: Secret
s labeled with persist=true
created via the secrets manager. They should be kept (i.e., the Cleanup
function of secrets manager should not be called) and will be garbage collected automatically at the end of the migrate
operation. This ensures that they can be properly persisted in the ShootState
resource and get restored on the new destination seed cluster.
For many custom resources, for example MCM resources, the above requirement means in practice that any finalizers should be removed before deleting the resource, in addition to ensuring that the resource deletion is not reconciled by its respective controller if there is no finalizer. For managed resources, the above requirement means in practice that the spec.keepObjects
field should be set to true
before deleting the extension resource.
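As a sketch, the relevant excerpt of the ManagedResource after such a patch would be:
spec:
  keepObjects: true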
Here it is assumed that any resources that contain state needed by the extension controller can be safely deleted, since any such state has been saved as described in Saving and Restoring Extension States at the end of the last successful reconciliation.
Saving and Restoring Extension States
Some extension controllers create and maintain their own state when reconciling extension resources. For example, most infrastructure controllers use Terraform and maintain the terraform state in a special config map in the shoot namespace. This state must be properly migrated to the new seed cluster during control plane migration, so that subsequent reconciliations in the new seed could find and use it appropriately.
All extension controllers that require such state migration must save their state in the status.state
field of their extension resource at the end of a successful reconciliation. They must also restore their state from that same field upon reconciling an extension resource when the current operation is restore
, as specified by the gardener.cloud/operation
annotation, before performing the actual reconciliation.
As an example, an infrastructure controller that uses Terraform must save the terraform state in the status.state
field of the Infrastructure
resource. An Infrastructure
resource with a properly saved state might look as follows:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
name: infrastructure
namespace: shoot--foo--bar
spec:
type: azure
region: eu-west-1
secretRef:
name: cloudprovider
namespace: shoot--foo--bar
providerConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
resourceGroup:
name: mygroup
...
status:
state: |
{
"version": 3,
"terraform_version": "0.11.14",
"serial": 2,
"lineage": "3a1e2faa-e7b6-f5f0-5043-368dd8ea6c10",
...
}
Extension controllers that do not use a saved state and therefore do not require state migration could leave the status.state
field as nil
at the end of a successful reconciliation, and just perform a normal reconciliation when the current operation is restore
.
In addition, extension controllers that use referenced resources (usually secrets) must also make sure that these resources are added to the status.resources
field of their extension resource at the end of a successful reconciliation, so they could be properly migrated by Gardener to the destination seed.
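A sketch of how such references might look in the extension resource status (the names are illustrative):
status:
  resources:
  - name: my-referenced-secret   # logical name under which the resource is referenced
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: my-secret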
Implementation Details
Migrate and Restore Actuator Methods
Most extension controller implementations follow a common pattern where a generic Reconciler
implementation delegates to an Actuator
interface that contains the methods Reconcile
and Delete
, provided by the extension.
Two methods Migrate
and Restore
are available in all such Actuator
interfaces, see the infrastructure Actuator
interface as an example.
These methods are called by the generic reconcilers for the migrate and restore operations respectively, and should be implemented by the extension according to the above guidelines.
Extension Controllers Based on Generic Actuators
In practice, the implementation of many extension controllers (for example, the ControlPlane
and Worker
controllers in most provider extensions) are based on a generic Actuator
implementation that only delegates to extension methods for behavior that is truly provider specific.
In all such cases, the Migrate
and Restore
methods have already been implemented properly in the generic actuators and there is nothing more to do in the extension itself.
In some rare cases, extension controllers based on a generic actuator might still introduce a custom Actuator
implementation to override some of the generic actuator methods in order to enhance or change their behavior in a certain way.
In such cases, the Migrate
and Restore
methods might need to be overridden as well, see the Azure controlplane controller as an example.
Worker
State
Note that the machine state is handled specially by gardenlet
(i.e., all relevant objects in the machine.sapcloud.io/v1alpha1
API are directly persisted by gardenlet
and NOT by the generic actuators).
In the past, they were persisted to the Worker
’s .status.state
field by the so-called “worker state reconciler”, however, this reconciler was dropped and changed as part of GEP-22.
Nowadays, gardenlet
directly writes the state to the ShootState
resource during the Migrate
phase of a Shoot
(without the detour of the Worker
’s .status.state
field).
On restoration, unlike for other extension kinds, gardenlet
no longer populates the machine state into the Worker
’s .status.state
field.
Instead, the extension controller should read the machine state directly from the ShootState
in the garden cluster (see this document for information how to access the garden cluster) and use it to subsequently restore the relevant machine.sapcloud.io/v1alpha1
resources.
This flow is implemented in the generic Worker
actuator.
As a result, extension controllers using this generic actuator do not need to implement any custom logic.
Extension Controllers Not Based on Generic Actuators
The implementation of some extension controllers (for example, the infrastructure controllers in all provider extensions) are not based on a generic Actuator
implementation.
Such extension controllers must always provide a proper implementation of the Migrate
and Restore
methods according to the above guidelines, see the AWS infrastructure controller as an example.
In practice, this might result in code duplication between the different extensions, since the Migrate
and Restore
code is usually not provider or OS-specific.
If you do not use the generic Worker
actuator, see this section for information how to handle the machine state related to the Worker
resource.
24 - Network
Gardener Network Extension
Gardener is an open-source project that provides a nested user model. Basically, there are two types of services provided by Gardener to its users:
- Managed: end-users only request a Kubernetes cluster (Clusters-as-a-Service)
- Hosted: operators utilize Gardener to provide their own managed version of Kubernetes (Cluster-Provisioner-as-a-service)
Whether a user is an operator or an end-user, it makes sense to provide choice. For example, for an end-user it might make sense to
choose a network plugin that supports enforcing network policies (some plugins do not come with network-policy support by default).
For operators, however, choice only matters for delegation purposes, i.e., when providing their own managed service, it becomes important to also provide choice over which network plugins to use.
Furthermore, Gardener provisions clusters on different cloud providers with different networking requirements. For example, Azure does not support Calico overlay networking with IP in IP [1]; this leads to the introduction of manual exceptions in static add-on charts, which is error-prone and can lead to failures during upgrades.
Finally, every provider is different, and thus the network always needs to adapt to the infrastructure's needs to provide better performance. Consistency does not necessarily lie in the implementation but in the interface.
Motivation
Prior to the Network Extensibility
concept, Gardener followed a mono network-plugin support model (i.e., Calico). Although this seemed to be the easier approach, it did not completely reflect the real use-case.
The goal of the Gardener Network Extensions is to support different network plugins, therefore, the specification for the network resource won’t be fixed and will be customized based on the underlying network plugin.
To do so, a ProviderConfig
field will be provided in the spec, where each plugin can define its own configuration. Below is an example of how to deploy Calico as the cluster network plugin.
The Network Extensions Resource
Here is what a typical Network
resource would look like:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Network
metadata:
name: my-network
spec:
ipFamilies:
- IPv4
podCIDR: 100.244.0.0/16
serviceCIDR: 100.32.0.0/13
type: calico
providerConfig:
apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
backend: bird
ipam:
cidr: usePodCIDR
type: host-local
The above resource is divided into two parts (more information can be found at Using the Networking Calico Extension):
- global configuration (e.g., podCIDR, serviceCIDR, and type)
- provider specific config (e.g., for calico we can choose to configure a
bird
backend)
Note: Certain cloud-provider extensions might have webhooks that would modify the network resource to fit their network-specific context. As previously mentioned, Azure does not support IPIP; as a result, the Azure provider extension implements a webhook to mutate the backend and set it to None
instead of bird
.
Supporting a New Network Extension Provider
To add support for another networking provider (e.g., weave, Cilium, Flannel) a network extension controller needs to be implemented which would optionally have its own custom configuration specified in the spec.providerConfig
in the Network
resource. For example, if support for a network plugin named gardenet
is required, the following Network
resource would be created:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Network
metadata:
name: my-network
spec:
ipFamilies:
- IPv4
podCIDR: 100.244.0.0/16
serviceCIDR: 100.32.0.0/13
type: gardenet
providerConfig:
apiVersion: gardenet.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
gardenetCustomConfigField: <value>
ipam:
cidr: usePodCIDR
type: host-local
Once applied, the presumably implemented Gardenet
extension controller would pick the configuration up, parse the providerConfig
, and create the necessary resources in the shoot.
For additional reference, please have a look at the networking-calico provider extension, which provides more information on how to configure the necessary charts, as well as the actuators required to reconcile networking inside the Shoot
cluster to the desired state.
Supporting kube-proxy
-less Service Routing
Some networking extensions support service routing without the kube-proxy
component. This is why Gardener supports disabling kube-proxy
for service routing by setting .spec.kubernetes.kubeproxy.enabled
to false
in the Shoot
specification. The implicit contract of the flag is:
If kube-proxy
is disabled, then the networking extension is responsible for the service routing.
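For illustration, the relevant part of such a Shoot specification might look like this (a sketch; the field is assumed to be spelled kubeProxy in the Shoot API):
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
spec:
  kubernetes:
    kubeProxy:
      enabled: false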
The networking extensions need to handle this twofold:
- During the reconciliation of the networking resources, the extension needs to check whether
kube-proxy
takes care of the service routing or the networking extension itself should handle it. In case the networking extension should be responsible according to .spec.kubernetes.kubeproxy.enabled
(but is unable to perform the service routing), it should raise an error during the reconciliation. If the networking extension should handle the service routing, it may reconfigure itself accordingly. - (Optional) In case the networking extension does not support taking over the service routing (in some scenarios), it is recommended to also provide a validating admission webhook to reject corresponding changes early on. The validation may take the current operating mode of the networking extension into consideration.
25 - Operatingsystemconfig
Contract: OperatingSystemConfig
Resource
Gardener uses the machine API and leverages the functionalities of the machine-controller-manager (MCM) in order to manage the worker nodes of a shoot cluster.
The machine-controller-manager itself simply takes a reference to an OS-image and (optionally) some user-data (a script or configuration that is executed when a VM is bootstrapped), and forwards both to the provider’s API when creating VMs.
MCM does not have any restrictions regarding supported operating systems as it does not modify or influence the machine’s configuration in any way - it just creates/deletes machines with the provided metadata.
Consequently, Gardener needs to provide this information when interacting with the machine-controller-manager.
This means that basically every operating system can be used, as long as there is some implementation that generates the OS-specific configuration in order to provision/bootstrap the machines.
⚠️ Currently, there are a few requirements of pre-installed components that must be present in all OS images:
- containerd
- ctr (client CLI)
containerd
must listen on its default socket path: unix:///run/containerd/containerd.sock
containerd
must be configured to work with the default configuration file in: /etc/containerd/config.toml
(eventually created by Gardener).
- systemd
The reasons for that will become evident later.
What does the user-data bootstrapping the machines contain?
Gardener installs a few components onto every worker machine in order to allow it to join the shoot cluster.
There is the kubelet
process, some scripts for continuously checking the health of kubelet
and containerd
, but also configuration for log rotation, CA certificates, etc.
You can find the complete configuration at the components folder. We are calling this the “original” user-data.
How does Gardener bootstrap the machines?
gardenlet
makes use of gardener-node-agent
to perform the bootstrapping and reconciliation of systemd units and files on the machine.
Please refer to this document for a first overview.
Usually, you would submit all the components you want to install onto the machine as part of the user-data during creation time.
However, some providers do have a size limitation (around ~16KB) for that user-data.
That’s why we do not send the “original” user-data to the machine-controller-manager (which then forwards it to the provider’s API).
Instead, we only send a small “init” script that bootstraps the gardener-node-agent
.
It fetches the “original” content from a Secret
and applies it on the machine directly.
This way we can extend the “original” user-data without any size restrictions (except for the 1 MB
limit for Secret
s).
The high-level flow is as follows:
- For every worker pool
X
in the Shoot
specification, Gardener creates a Secret
named cloud-config-<X>
in the kube-system
namespace of the shoot cluster. The secret contains the “original” OperatingSystemConfig
(i.e., systemd units and files for kubelet
, etc.). - Gardener generates a kubeconfig with minimal permissions just allowing reading these secrets. It is used by the
gardener-node-agent
later. - Gardener provides the
gardener-node-init.sh
bash script and the machine image stated in the Shoot
specification to the machine-controller-manager. - Based on this information, the machine-controller-manager creates the VM.
- After the VM has been provisioned, the
gardener-node-init.sh
script starts, fetches the gardener-node-agent
binary, and starts it. - The
gardener-node-agent
will read the gardener-node-agent-<X>
Secret
for its worker pool (containing the “original” OperatingSystemConfig
), and reconciles it.
The gardener-node-agent
can update itself in case of newer Gardener versions, and it performs a continuous reconciliation of the systemd units and files in the provided OperatingSystemConfig
(just like any other Kubernetes controller).
What needs to be implemented to support a new operating system?
As part of the Shoot
reconciliation flow, gardenlet
will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: OperatingSystemConfig
metadata:
name: pool-01-original
namespace: default
spec:
type: <my-operating-system>
purpose: reconcile
units:
- name: containerd.service
dropIns:
- name: 10-containerd-opts.conf
content: |
[Service]
Environment="SOME_OPTS=--foo=bar"
- name: containerd-monitor.service
command: start
enable: true
content: |
[Unit]
Description=Containerd-monitor daemon
After=kubelet.service
[Install]
WantedBy=multi-user.target
[Service]
Restart=always
EnvironmentFile=/etc/environment
ExecStart=/opt/bin/health-monitor containerd
files:
- path: /var/lib/kubelet/ca.crt
permissions: 0644
encoding: b64
content:
secretRef:
name: default-token-5dtjz
dataKey: token
- path: /etc/sysctl.d/99-k8s-general.conf
permissions: 0644
content:
inline:
data: |
# A higher vm.max_map_count is great for elasticsearch, mongo, or other mmap users
# See https://github.com/kubernetes/kops/issues/1340
vm.max_map_count = 135217728
In order to support a new operating system, you need to write a controller that watches all OperatingSystemConfig
s with .spec.type=<my-operating-system>
.
For those, it shall generate a configuration blob that fits your operating system.
OperatingSystemConfig
s can have two purposes: either provision
or reconcile
.
provision
Purpose
The provision
purpose is used by gardenlet
for the user-data that it later passes to the machine-controller-manager (and then to the provider’s API) when creating new VMs.
It contains the gardener-node-init.sh
script and systemd unit.
The OS controller has to translate the .spec.units
and .spec.files
into configuration that fits to the operating system.
For example, a Flatcar controller might generate a CoreOS cloud-config or Ignition, SLES might generate cloud-init, and others might simply generate a bash script translating the .spec.units
into systemd
units, and .spec.files
into real files on the disk.
⚠️ Please avoid mixing in additional systemd units or files - this step should just translate what gardenlet
put into .spec.units
and .spec.files
.
After generation, extension controllers are asked to store their OS config inside a Secret
(as it might contain confidential data) in the same namespace.
The secret’s .data
could look like this:
apiVersion: v1
kind: Secret
metadata:
name: osc-result-pool-01-original
namespace: default
ownerReferences:
- apiVersion: extensions.gardener.cloud/v1alpha1
blockOwnerDeletion: true
controller: true
kind: OperatingSystemConfig
name: pool-01-original
uid: 99c0c5ca-19b9-11e9-9ebd-d67077b40f82
data:
cloud_config: base64(generated-user-data)
Finally, the secret’s metadata must be provided in the OperatingSystemConfig
’s .status
field:
...
status:
cloudConfig:
secretRef:
name: osc-result-pool-01-original
namespace: default
lastOperation:
description: Successfully generated cloud config
lastUpdateTime: "2019-01-23T07:45:23Z"
progress: 100
state: Succeeded
type: Reconcile
observedGeneration: 5
reconcile
Purpose
The reconcile
purpose contains the “original” OperatingSystemConfig
(which is later stored in Secret
s in the shoot’s kube-system
namespace (see step 1)).
The OS controller does not need to translate anything here, but it has the option to provide additional systemd units or files via the .status
field:
status:
extensionUnits:
- name: my-custom-service.service
command: start
enable: true
content: |
[Unit]
// some systemd unit content
extensionFiles:
- path: /etc/some/file
permissions: 0644
content:
inline:
data: some-file-content
lastOperation:
description: Successfully generated cloud config
lastUpdateTime: "2019-01-23T07:45:23Z"
progress: 100
state: Succeeded
type: Reconcile
observedGeneration: 5
The gardener-node-agent
will merge .spec.units
and .status.extensionUnits
as well as .spec.files
and .status.extensionFiles
when applying.
You can find an example implementation here.
Bootstrap Tokens
gardenlet
adds a file with the content <<BOOTSTRAP_TOKEN>>
to the OperatingSystemConfig
with purpose provision
and sets transmitUnencoded=true
.
This instructs the responsible OS extension to pass this file (with its content in clear-text) to the corresponding Worker
resource.
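A sketch of how such a file entry might appear in the OperatingSystemConfig spec (the path and permissions are illustrative; the placement of transmitUnencoded under content is an assumption):
files:
- path: /var/lib/gardener-node-agent/credentials/bootstrap-token   # illustrative path
  permissions: 0640
  content:
    transmitUnencoded: true
    inline:
      data: <<BOOTSTRAP_TOKEN>>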
machine-controller-manager
makes sure that
- a bootstrap token gets created per machine
- the
<<BOOTSTRAP_TOKEN>>
string in the user data of the machine gets replaced by the generated token.
After the machine has been bootstrapped, the token secret in the shoot cluster gets deleted again.
The token is used to bootstrap Gardener Node Agent and kubelet
.
What needs to be implemented to support a new operating system?
As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: OperatingSystemConfig
metadata:
name: pool-01-original
namespace: default
spec:
type: <my-operating-system>
purpose: reconcile
units:
- name: docker.service
dropIns:
- name: 10-docker-opts.conf
content: |
[Service]
Environment="DOCKER_OPTS=--log-opt max-size=60m --log-opt max-file=3"
- name: docker-monitor.service
command: start
enable: true
content: |
[Unit]
Description=Containerd-monitor daemon
After=kubelet.service
[Install]
WantedBy=multi-user.target
[Service]
Restart=always
EnvironmentFile=/etc/environment
ExecStart=/opt/bin/health-monitor docker
files:
- path: /var/lib/kubelet/ca.crt
permissions: 0644
encoding: b64
content:
secretRef:
name: default-token-5dtjz
dataKey: token
- path: /etc/sysctl.d/99-k8s-general.conf
permissions: 0644
content:
inline:
data: |
# A higher vm.max_map_count is great for elasticsearch, mongo, or other mmap users
# See https://github.com/kubernetes/kops/issues/1340
vm.max_map_count = 135217728
In order to support a new operating system, you need to write a controller that watches all OperatingSystemConfig
s with .spec.type=<my-operating-system>
.
For those, it shall generate a configuration blob that fits your operating system.
For example, a CoreOS controller might generate a CoreOS cloud-config or Ignition, SLES might generate cloud-init, and others might simply generate a bash script translating the .spec.units
into systemd
units, and .spec.files
into real files on the disk.
OperatingSystemConfig
s can have two purposes which can be used (or ignored) by the extension controllers: either provision
or reconcile
.
- The
provision
purpose is used by Gardener for the user-data that it later passes to the machine-controller-manager (and then to the provider’s API) when creating new VMs. It contains the gardener-node-init
unit. - The
reconcile
purpose contains the “original” user-data (that is then stored in Secret
s in the shoot’s kube-system
namespace (see step 1)). This is downloaded and applied later (see step 5).
As described above, the “original” user-data must be re-applicable to allow in-place updates.
The way how this is done is specific to the generated operating system config (e.g., for CoreOS cloud-init the command is /usr/bin/coreos-cloudinit --from-file=<path>
, whereas SLES would run cloud-init --file <path> single -n write_files --frequency=once
).
Consequently, besides the generated OS config, the extension controller must also provide a command for re-applying an updated version of the user-data.
As visible in the mentioned examples, the command requires a path to the user-data file.
As soon as Gardener detects that the user data has changed, it will reload the systemd daemon and restart all the units provided in the .status.units[]
list (see the below example). The same logic applies during the very first application of the whole configuration.
After generation, extension controllers are asked to store their OS config inside a Secret
(as it might contain confidential data) in the same namespace.
The secret’s .data
could look like this:
apiVersion: v1
kind: Secret
metadata:
name: osc-result-pool-01-original
namespace: default
ownerReferences:
- apiVersion: extensions.gardener.cloud/v1alpha1
blockOwnerDeletion: true
controller: true
kind: OperatingSystemConfig
name: pool-01-original
uid: 99c0c5ca-19b9-11e9-9ebd-d67077b40f82
data:
cloud_config: base64(generated-user-data)
Finally, the secret’s metadata, the OS-specific command to re-apply the configuration, and the list of systemd
units that shall be considered to be restarted if an updated version of the user-data is re-applied must be provided in the OperatingSystemConfig
’s .status
field:
...
status:
cloudConfig:
secretRef:
name: osc-result-pool-01-original
namespace: default
lastOperation:
description: Successfully generated cloud config
lastUpdateTime: "2019-01-23T07:45:23Z"
progress: 100
state: Succeeded
type: Reconcile
observedGeneration: 5
units:
- docker-monitor.service
Once the .status
indicates that the extension controller finished reconciling, Gardener will continue with the next step of the shoot reconciliation flow.
CRI Support
Gardener supports specifying a Container Runtime Interface (CRI) configuration in the OperatingSystemConfig
resource. If the .spec.cri
section exists, then the name
property is mandatory. The only supported value for cri.name
at the moment is: containerd
.
For example:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: OperatingSystemConfig
metadata:
name: pool-01-original
namespace: default
spec:
type: <my-operating-system>
purpose: reconcile
cri:
name: containerd
# cgroupDriver: cgroupfs # or systemd
containerd:
sandboxImage: registry.k8s.io/pause
# registries:
# - upstream: docker.io
# server: https://registry-1.docker.io
# hosts:
#       - url: http://<service-ip>:<port>
# plugins:
# - op: add # add (default) or remove
# path: [io.containerd.grpc.v1.cri, containerd]
# values: '{"default_runtime_name": "runc"}'
...
To support containerd
, an OS extension must satisfy the following criteria:
- The operating system must have built-in containerd and ctr (client CLI).
containerd
must listen on its default socket path: unix:///run/containerd/containerd.sock
containerd
must be configured to work with the default configuration file in: /etc/containerd/config.toml
(Created by Gardener).
For convenient handling, gardener-node-agent can manage various aspects of containerd’s config, e.g., the registry configuration, if given in the OperatingSystemConfig
.
Any Gardener extension which needs to modify the config should check the functionality exposed through this API first.
If applicable, adjustments can be implemented through mutating webhooks, acting on the created or updated OperatingSystemConfig
resource.
If CRI configurations are not supported, it is recommended to create a validating webhook running in the garden cluster that prevents specifying the .spec.provider.workers[].cri
section in the Shoot
objects.
cgroup driver
For Shoot clusters using Kubernetes < 1.31, Gardener is setting the kubelet’s cgroup driver to cgroupfs
and containerd’s cgroup driver is unmanaged. For Shoot clusters using Kubernetes 1.31+, Gardener is setting both kubelet’s and containerd’s cgroup driver to systemd
.
The systemd
cgroup driver is a requirement for operating systems using cgroup v2. It’s important to ensure that both kubelet and the container runtime (containerd) are using the same cgroup driver to avoid potential issues.
OS extensions might also overwrite the cgroup driver for containerd and kubelet.
References and Additional Resources
26 - Overview
Extensibility Overview
Initially, everything was developed in-tree in the Gardener project. All cloud providers and the configuration for all the supported operating systems were released together with the Gardener core itself.
But as the project grew, it got more and more difficult to add new providers and maintain the existing code base.
As a consequence and in order to become agile and flexible again, we proposed GEP-1 (Gardener Enhancement Proposal).
The document describes an out-of-tree extension architecture that keeps the Gardener core logic independent of provider-specific knowledge (similar to what Kubernetes has achieved with out-of-tree cloud providers or with CSI volume plugins).
Basic Concepts
Gardener keeps running in the “garden cluster” and implements the core logic of shoot cluster reconciliation / deletion.
Extensions are Kubernetes controllers themselves (like Gardener) and run in the seed clusters.
As usual, we try to use Kubernetes wherever applicable.
We rely on Kubernetes extension concepts in order to enable extensibility for Gardener.
The main ideas of GEP-1 are the following:
During the shoot reconciliation process, Gardener will write CRDs into the seed cluster that are watched and managed by the extension controllers. They will reconcile (based on the .spec
) and report whether everything went well or errors occurred in the CRD’s .status
field.
Gardener keeps deploying the provider-independent control plane components (etcd, kube-apiserver, etc.). However, some of these components might still need little customization by providers, e.g., additional configuration, flags, etc. In this case, the extension controllers register webhooks in order to manipulate the manifests.
Example 1:
Gardener creates a new AWS shoot cluster and requires the preparation of infrastructure in order to proceed (networks, security groups, etc.).
It writes the following CRD into the seed cluster:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
name: infrastructure
namespace: shoot--core--aws-01
spec:
type: aws
providerConfig:
apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
vpc:
cidr: 10.250.0.0/16
internal:
- 10.250.112.0/22
public:
- 10.250.96.0/22
workers:
- 10.250.0.0/19
zones:
- eu-west-1a
dns:
apiserver: api.aws-01.core.example.com
region: eu-west-1
secretRef:
name: my-aws-credentials
sshPublicKey: |
base64(key)
Please note that the .spec.providerConfig
is a raw blob and not evaluated or known in any way by Gardener.
Instead, it was specified by the user (in the Shoot
resource) and just “forwarded” to the extension controller.
Only the AWS controller understands this configuration and will now start provisioning/reconciling the infrastructure.
It reports in the .status
field the result:
status:
observedGeneration: ...
state: ...
lastError: ..
lastOperation: ...
providerStatus:
apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureStatus
vpc:
id: vpc-1234
subnets:
- id: subnet-acbd1234
name: workers
zone: eu-west-1
securityGroups:
- id: sg-xyz12345
name: workers
iam:
nodesRoleARN: <some-arn>
instanceProfileName: foo
ec2:
keyName: bar
Gardener waits until the .status.lastOperation
/ .status.lastError
indicates that the operation reached a final state and either continues with the next step, or stops and reports the potential error.
The extension-specific output in .status.providerStatus
is - similar to .spec.providerConfig
- not evaluated, and simply forwarded to CRDs in subsequent steps.
Example 2:
Gardener deploys the control plane components into the seed cluster, e.g. the kube-controller-manager
deployment with the following flags:
apiVersion: apps/v1
kind: Deployment
...
spec:
template:
spec:
containers:
- command:
- /usr/local/bin/kube-controller-manager
- --allocate-node-cidrs=true
- --attach-detach-reconcile-sync-period=1m0s
- --controllers=*,bootstrapsigner,tokencleaner
- --cluster-cidr=100.96.0.0/11
- --cluster-name=shoot--core--aws-01
- --cluster-signing-cert-file=/srv/kubernetes/ca/ca.crt
- --cluster-signing-key-file=/srv/kubernetes/ca/ca.key
- --concurrent-deployment-syncs=10
- --concurrent-replicaset-syncs=10
...
The AWS controller requires some additional flags in order to make the cluster functional.
It needs to provide a Kubernetes cloud-config and also some cloud-specific flags.
Consequently, it registers a MutatingWebhookConfiguration
on Deployment
s and adds these flags to the container:
- --cloud-provider=external
- --external-cloud-volume-plugin=aws
- --cloud-config=/etc/kubernetes/cloudprovider/cloudprovider.conf
Of course, it would have needed to create a ConfigMap
containing the cloud config and to add the proper volume
and volumeMounts
to the manifest as well.
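For illustration only, such a webhook registration might look roughly like the following sketch (the webhook name, service name, namespace, path, and selector are assumptions):
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: gardener-extension-provider-aws
webhooks:
- name: controlplane.aws.extensions.gardener.cloud
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  clientConfig:
    service:
      name: gardener-extension-provider-aws
      namespace: extension-provider-aws
      path: /webhooks/controlplane
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
  # only act on shoot namespaces in the seed
  namespaceSelector:
    matchLabels:
      gardener.cloud/role: shoot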
(Please note for this special example: The Kubernetes community is also working on making the kube-controller-manager
provider-independent.
However, there will most probably be still components other than the kube-controller-manager
which need to be adapted by extensions.)
If you are interested in writing an extension, or generally in digging deeper to find out the nitty-gritty details of the extension concepts, please read GEP-1.
We are truly looking forward to your feedback!
Current Status
Meanwhile, the out-of-tree extension architecture of Gardener is in place and has been productively validated. We are tracking all internal and external extensions of Gardener in the Gardener Extensions Library repo.
27 - Project Roles
Extending Project Roles
The Project
resource allows to specify a list of roles for every member (.spec.members[*].roles
).
There are a few standard roles defined by Gardener itself.
Please consult Projects for further information.
However, extension controllers running in the garden cluster may also create CustomResourceDefinition
s that project members might be able to CRUD.
For this purpose, Gardener also allows to specify extension roles.
An extension role is prefixed with extension:
, e.g.
apiVersion: core.gardener.cloud/v1beta1
kind: Project
metadata:
name: dev
spec:
members:
- apiGroup: rbac.authorization.k8s.io
kind: User
name: alice.doe@example.com
role: admin
roles:
- owner
- extension:foo
The project controller will, for every extension role, create a ClusterRole
with name gardener.cloud:extension:project:<projectName>:<roleName>
, i.e., for the above example: gardener.cloud:extension:project:dev:foo
.
This ClusterRole
aggregates other ClusterRole
s that are labeled with rbac.gardener.cloud/aggregate-to-extension-role=foo
which might be created by extension controllers.
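A sketch of such an aggregated ClusterRole that an extension controller might create (the API group and resource names are hypothetical):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: extension-foo-project-member
  labels:
    rbac.gardener.cloud/aggregate-to-extension-role: foo
rules:
- apiGroups:
  - foo.extensions.gardener.cloud
  resources:
  - foothings
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - delete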
An extension that might want to contribute to the core admin
or viewer
roles can use the labels rbac.gardener.cloud/aggregate-to-project-member=true
or rbac.gardener.cloud/aggregate-to-project-viewer=true
, respectively.
Please note that the names of the extension roles are restricted to 20 characters!
Moreover, the project controller will also create a corresponding RoleBinding
with the same name in the project namespace.
It will automatically assign all members that are assigned to this extension role.
28 - Provider Local
Local Provider Extension
The “local provider” extension is used to allow the usage of seed and shoot clusters which run entirely locally without any real infrastructure or cloud provider involved.
It implements Gardener’s extension contract (GEP-1) and thus comprises several controllers and webhooks acting on resources in seed and shoot clusters.
The code is maintained in pkg/provider-local
.
Motivation
The motivation for maintaining such extension is the following:
- 🛡 Output Qualification: Run fast and cost-efficient end-to-end tests, locally and in CI systems (increased confidence ⛑ before merging pull requests)
- ⚙️ Development Experience: Develop Gardener entirely on a local machine without any external resources involved (improved costs 💰 and productivity 🚀)
- 🤝 Open Source: Quick and easy setup for a first evaluation of Gardener and a good basis for first contributions
Current Limitations
The following enlists the current limitations of the implementation.
Please note that none of them are technical limitations/blockers, but simply advanced scenarios that we haven’t invested in yet.
No load balancers for Shoot clusters.
We have not yet developed a cloud-controller-manager
which could reconcile load balancer Service
s in the shoot cluster.
In case a seed cluster with multiple availability zones, i.e. multiple entries in .spec.provider.zones
, is used in conjunction with a single-zone shoot control plane, i.e. a shoot cluster without .spec.controlPlane.highAvailability
or with .spec.controlPlane.highAvailability.failureTolerance.type
set to node
, the local address of the API server endpoint needs to be determined manually or via the in-cluster coredns
.
As the different istio ingress gateway loadbalancers have individual external IP addresses, single-zone shoot control planes can end up in a random availability zone. Having the local host use the coredns
in the cluster as name resolver would form a name resolution cycle. The tests mitigate the issue by adapting the DNS configuration inside the affected test.
ManagedSeed
s
It is possible to deploy ManagedSeed
s with provider-local
by first creating a Shoot
in the garden
namespace and then creating a referencing ManagedSeed
object.
Please note that this is only supported by the Skaffold
-based setup.
The corresponding e2e test can be run via:
./hack/test-e2e-local.sh --label-filter "ManagedSeed"
Implementation Details
The images locally built by Skaffold
for the Gardener components which are deployed to this shoot cluster are managed by a container registry in the registry
namespace in the kind cluster.
provider-local
configures this registry as mirror for the shoot by mutating the OperatingSystemConfig
and using the default contract for extending the containerd
configuration.
In order to bootstrap a seed cluster, the gardenlet
deploys PersistentVolumeClaim
s and Service
s of type LoadBalancer
.
While storage is supported in shoot clusters by using the local-path-provisioner
, load balancers are not supported yet.
However, provider-local
runs a Service
controller which specifically reconciles the seed-related Service
s of type LoadBalancer
.
This way, they get an IP and gardenlet
can finish its bootstrapping process.
Note that these IPs are not reachable, however for the sake of developing ManagedSeed
s this is sufficient for now.
Also, please note that the provider-local
extension only gets deployed because of the Always
deployment policy in its corresponding ControllerRegistration
and because the DNS provider type of the seed is set to local
.
Implementation Details
This section contains information about how the respective controllers and webhooks in provider-local
are implemented and what their purpose is.
Bootstrapping
The Helm chart of the provider-local
extension defined in its ControllerDeployment
contains a special deployment for a CoreDNS instance in a gardener-extension-provider-local-coredns
namespace in the seed cluster.
This CoreDNS instance is responsible for enabling the components running in the shoot clusters to be able to resolve the DNS names when they communicate with their kube-apiserver
s.
It contains a static configuration to resolve the DNS names based on local.gardener.cloud
to istio-ingressgateway.istio-ingress.svc
.
Controllers
There are controllers for all resources in the extensions.gardener.cloud/v1alpha1
API group except for BackupBucket
and BackupEntry
s.
ControlPlane
This controller is deploying the local-path-provisioner as well as a related StorageClass
in order to support PersistentVolumeClaim
s in the local shoot cluster.
Additionally, it creates a few (currently unused) dummy secrets (CA, server and client certificate, basic auth credentials) for the sake of testing the secrets manager integration in the extensions library.
DNSRecord
The controller adapts the cluster internal DNS configuration by extending the coredns
configuration for every observed DNSRecord
. It will add two corresponding entries in the custom DNS configuration per shoot cluster:
data:
api.local.local.external.local.gardener.cloud.override: |
rewrite stop name regex api.local.local.external.local.gardener.cloud istio-ingressgateway.istio-ingress.svc.cluster.local answer auto
api.local.local.internal.local.gardener.cloud.override: |
rewrite stop name regex api.local.local.internal.local.gardener.cloud istio-ingressgateway.istio-ingress.svc.cluster.local answer auto
Infrastructure
This controller generates a NetworkPolicy
which allows the control plane pods (like kube-apiserver
) to communicate with the worker machine pods (see Worker
section).
Network
This controller is not implemented anymore. In the initial version of provider-local
, there was a Network
controller deploying kindnetd (see release v1.44.1).
However, we decided to drop it because this setup prevented us from using NetworkPolicy
s (kindnetd does not ship a NetworkPolicy
controller).
In addition, we had issues with shoot clusters having more than one node (hence, we couldn’t support rolling updates, see PR #5666).
OperatingSystemConfig
This controller renders a simple cloud-init template which can later be executed by the shoot worker nodes.
The shoot worker nodes are Pod
s with a container based on the kindest/node
image. This is maintained in the gardener/machine-controller-manager-provider-local repository and has a special run-userdata
systemd service which executes the cloud-init generated earlier by the OperatingSystemConfig
controller.
Worker
This controller leverages the standard generic Worker
actuator in order to deploy the machine-controller-manager
as well as the machine-controller-manager-provider-local
.
Additionally, it generates the MachineClass
es and the MachineDeployment
s based on the specification of the Worker
resources.
Ingress
The gardenlet creates a wildcard DNS record for the Seed’s ingress domain pointing to the nginx-ingress-controller
’s LoadBalancer.
This domain is commonly used by all Ingress
objects created in the Seed for Seed and Shoot components.
As provider-local implements the DNSRecord
extension API (see the DNSRecord
section), this controller reconciles all Ingress
s and creates DNSRecord
s of type local
for each host included in spec.rules
.
This only happens for shoot namespaces (gardener.cloud/role=shoot
label) to make Ingress
domains resolvable on the machine pods.
Service
This controller reconciles Services
of type LoadBalancer
in the local Seed
cluster.
Since the local Kubernetes clusters used as Seed clusters typically don’t support such services, this controller sets the .status.loadBalancer.ingress[0].ip
to the IP of the host.
It makes important LoadBalancer Services (e.g. istio-ingress/istio-ingressgateway
and garden/nginx-ingress-controller
) available to the host by setting spec.ports[].nodePort
to well-known ports that are mapped to hostPorts
in the kind cluster configuration.
istio-ingress/istio-ingressgateway
is set to be exposed on nodePort
30433
by this controller.
In case the seed has multiple availability zones (.spec.provider.zones
) and it uses SNI, the different zone-specific istio-ingressgateway
loadbalancers are exposed via different IP addresses. Per default, IP addresses 172.18.255.10
, 172.18.255.11
, and 172.18.255.12
are used for the zones 0
, 1
, and 2
respectively.
ETCD Backups
This controller reconciles the BackupBucket
and BackupEntry
of the shoot allowing the etcd-backup-restore
to create and copy backups using the local
provider functionality. The backups are stored on the host file system. This is achieved by mounting that directory to the etcd-backup-restore
container.
Extension Seed
This controller reconciles Extensions
of type local-ext-seed
. It creates a single serviceaccount
named local-ext-seed
in the shoot’s namespace in the seed. The extension is reconciled before the kube-apiserver
. More on extension lifecycle strategies can be read in Registering Extension Controllers.
Extension Shoot
This controller reconciles Extensions
of type local-ext-shoot
. It creates a single serviceaccount
named local-ext-shoot
in the kube-system
namespace of the shoot. The extension is reconciled after the kube-apiserver
. More on extension lifecycle strategies can be read Registering Extension Controllers.
Extension Shoot After Worker
This controller reconciles Extensions
of type local-ext-shoot-after-worker
. It creates a deployment
named local-ext-shoot-after-worker
in the kube-system
namespace of the shoot. The extension is reconciled after the workers and waits until the deployment is ready. More on extension lifecycle strategies can be read Registering Extension Controllers.
Health Checks
The health check controller leverages the health check library in order to:
- check the health of the
ManagedResource/extension-controlplane-shoot-webhooks
and populate the SystemComponentsHealthy
condition in the ControlPlane
resource. - check the health of the
ManagedResource/extension-networking-local
and populate the SystemComponentsHealthy
condition in the Network
resource. - check the health of the
ManagedResource/extension-worker-mcm-shoot
and populate the SystemComponentsHealthy
condition in the Worker
resource. - check the health of the
Deployment/machine-controller-manager
and populate the ControlPlaneHealthy
condition in the Worker
resource. - check the health of the
Node
s and populate the EveryNodeReady
condition in the Worker
resource.
Webhooks
Control Plane
This webhook reacts on the OperatingSystemConfig
containing the configuration of the kubelet and sets the failSwapOn
to false
(independent of what is configured in the Shoot
spec) (ref).
DNS Config
This webhook reacts on events for the dependency-watchdog-probe
Deployment
, the blackbox-exporter
Deployment
, as well as on events for Pod
s created when the machine-controller-manager
reconciles Machine
s.
All these pods need to be able to resolve the DNS names for shoot clusters.
It sets the .spec.dnsPolicy=None
and .spec.dnsConfig.nameServers
to the cluster IP of the coredns
Service
created in the gardener-extension-provider-local-coredns
namespaces so that these pods can resolve the DNS records for shoot clusters (see the Bootstrapping section for more details).
Machine Controller Manager
This webhook mutates the global ClusterRole
related to machine-controller-manager
and injects permissions for Service
resources.
The machine-controller-manager-provider-local
deploys Pod
s for each Machine
(while real infrastructure providers obviously deploy VMs, so no Kubernetes resources directly).
It also deploys a Service
for these machine pods, and in order to do so, the ClusterRole
must allow the needed permissions for Service
resources.
Node
This webhook reacts on updates to nodes/status
in both seed and shoot clusters and sets the .status.{allocatable,capacity}.cpu="100"
and .status.{allocatable,capacity}.memory="100Gi"
fields.
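A sketch of the resulting status patch on the Node objects:
status:
  capacity:
    cpu: "100"
    memory: 100Gi
  allocatable:
    cpu: "100"
    memory: 100Gi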
Background: Typically, the .status.{capacity,allocatable}
values are determined by the resources configured for the Docker daemon (see, for example, the Docker Quick Start Guide for Mac).
Since many of the Pod
s deployed by Gardener have quite high .spec.resources.requests
, the Node
s easily get filled up and only a few Pod
s can be scheduled (even if they barely consume any of their reserved resources).
In order to improve the user experience, the provider-local extension submits an empty patch on startup/leader election, which triggers this node webhook for the seed cluster.
The webhook will increase the capacity of the Node
s to allow all Pod
s to be scheduled.
For the shoot clusters, this empty patch trigger is not needed since the MutatingWebhookConfiguration
is reconciled by the ControlPlane
controller and exists before the Node
object gets registered.
Shoot
This webhook reacts on the ConfigMap
used by the kube-proxy
and sets the maxPerCore
field to 0
since other values don’t work well in conjunction with the kindest/node
image which is used as base for the shoot worker machine pods (ref).
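As a sketch, the relevant fragment of the kube-proxy component configuration after mutation:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
conntrack:
  maxPerCore: 0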
DNS Configuration for Multi-Zonal Seeds
In case a seed cluster has multiple availability zones as specified in .spec.provider.zones
, multiple istio ingress gateways are deployed, one per availability zone in addition to the default deployment. The result is that single-zone shoot control planes, i.e. shoot clusters without .spec.controlPlane.highAvailability set or with .spec.controlPlane.highAvailability.failureTolerance.type set to node, may be exposed via any of the zone-specific istio ingress gateways. Previously, the endpoints were statically mapped via /etc/hosts. Unfortunately, this is no longer possible due to the aforementioned dynamic nature of the endpoint selection.
For multi-zonal seed clusters, there is an additional configuration based on coredns's view plugin, mapping the external IP addresses of the zone-specific load balancers to the corresponding internal istio ingress gateway domain names. This configuration is only in place for requests from outside of the seed cluster. Those requests are currently identified by the protocol: UDP requests are interpreted as originating from within the seed cluster, while TCP requests are assumed to come from outside the cluster via the Docker hostport mapping.
The corresponding test sets the DNS configuration accordingly so that name resolution during the test uses coredns
in the cluster.
Future Work
Future work could mostly focus on resolving the above listed limitations, i.e.:
- Implement a cloud-controller-manager and deploy it via the ControlPlane controller.
- Properly implement .spec.machineTypes in the CloudProfiles (i.e., configure .spec.resources properly for the created shoot worker machine pods).
29 - Reconcile Trigger
Reconcile Trigger
Gardener dictates the time of reconciliation for resources of the API group extensions.gardener.cloud
.
It does that by annotating the respective resource with gardener.cloud/operation=reconcile
.
Extension controllers shall react to this annotation and start reconciling the resource.
They have to remove this annotation as soon as they begin with their reconcile operation and maintain the status
of the extension resource accordingly.
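For example, an extension resource that Gardener has just marked for reconciliation could look like this (a sketch):
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
  annotations:
    gardener.cloud/operation: reconcile
spec:
  type: aws
  # ...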
The reason for this behaviour is that it is possible to configure Gardener to reconcile only in the shoots’ maintenance time windows.
In order to avoid that extension controllers reconcile outside of the shoot's maintenance time window, we have introduced this contract.
This way extension controllers don’t need to care about when the shoot maintenance time window happens.
Gardener keeps control and decides when the shoot shall be reconciled/updated.
Our extension controller library provides all the required utilities to conveniently implement this behaviour.
30 - Referenced Resources
Referenced Resources
The Shoot resource can include a list of resources (usually secrets) that can be referenced by name in the extension providerConfig
and other Shoot sections, for example:
kind: Shoot
apiVersion: core.gardener.cloud/v1beta1
metadata:
name: crazy-botany
namespace: garden-dev
...
spec:
...
extensions:
- type: foobar
providerConfig:
apiVersion: foobar.extensions.gardener.cloud/v1alpha1
kind: FooBarConfig
foo: bar
secretRef: foobar-secret
resources:
- name: foobar-secret
resourceRef:
apiVersion: v1
kind: Secret
name: my-foobar-secret
Gardener expects to find these referenced resources in the project namespace (e.g. garden-dev
) and will copy them to the Shoot namespace in the Seed cluster when reconciling a Shoot, adding a prefix to their names to avoid naming collisions with Gardener’s own resources.
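As a sketch, the copy of the referenced secret from the example above could then look like this in the seed (the exact prefix is added by Gardener; the prefix and namespace shown here are assumptions):
apiVersion: v1
kind: Secret
metadata:
  name: ref-my-foobar-secret            # assumed prefix
  namespace: shoot--dev--crazy-botany   # shoot namespace in the seed
type: Opaque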
Extension controllers can resolve the references to these resources by accessing the Shoot via the Cluster
resource. To properly read a referenced resource, extension controllers should use the utility function GetObjectByReference
from the extensions/pkg/controller
package, for example:
// Assumed imports:
//   autoscalingv1 "k8s.io/api/autoscaling/v1"
//   corev1 "k8s.io/api/core/v1"
//   "github.com/gardener/gardener/extensions/pkg/controller"
...
ref := &autoscalingv1.CrossVersionObjectReference{
	APIVersion: "v1",
	Kind:       "Secret",
	Name:       "foo",
}
secret := &corev1.Secret{}
// Resolve the reference and read the copied secret from the shoot namespace in the seed.
if err := controller.GetObjectByReference(ctx, client, ref, "shoot--test--foo", secret); err != nil {
	return err
}
// Use secret
...
31 - Shoot Health Status Conditions
Contributing to Shoot Health Status Conditions
Gardener regularly checks the health status of all shoot clusters (every minute by default).
It categorizes its checks into five different types:
- APIServerAvailable: This type indicates whether the shoot's kube-apiserver is available or not.
- ControlPlaneHealthy: This type indicates whether the core components of the Shoot control plane (etcd, kube-apiserver, kube-controller-manager, ...) are healthy.
- EveryNodeReady: This type indicates whether all Nodes and all Machine objects report healthiness.
- ObservabilityComponentsHealthy: This type indicates whether the observability components of the Shoot control plane (Prometheus, Vali, Plutono, ...) are healthy.
- SystemComponentsHealthy: This type indicates whether all system components deployed to the kube-system namespace in the shoot do exist and are running fine.
In case of a workerless Shoot, the EveryNodeReady condition is not present in the Shoot's conditions since there are no nodes in the cluster.
Every Shoot
resource has a status.conditions[]
list that contains the mentioned types, together with a status
(True
/False
) and a descriptive message/explanation of the status
.
Most extension controllers deploy components and resources into the seed or shoot cluster as part of their reconciliation flows.
A prominent example for this is the ControlPlane
controller that usually deploys a cloud-controller-manager or CSI controllers as part of the shoot control plane.
Now that the extensions deploy resources into the cluster, especially resources that are essential for the functionality of the cluster, they might want to contribute to Gardener’s checks mentioned above.
What can extensions do to contribute to Gardener’s health checks?
Every extension resource in Gardener’s extensions.gardener.cloud/v1alpha1
API group also has a status.conditions[]
list (like the Shoot
).
Extension controllers can write conditions to the resource they are acting on and use a type that also exists in the shoot’s conditions.
One exception is that APIServerAvailable
can’t be used, as Gardener clearly can identify the status of this condition and it doesn’t make sense for extensions to try to contribute/modify it.
As an example for the ControlPlane
controller, let’s take a look at the following resource:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: ControlPlane
metadata:
name: control-plane
namespace: shoot--foo--bar
spec:
...
status:
conditions:
- type: ControlPlaneHealthy
status: "False"
reason: DeploymentUnhealthy
message: 'Deployment cloud-controller-manager is unhealthy: condition "Available" has
invalid status False (expected True) due to MinimumReplicasUnavailable: Deployment
does not have minimum availability.'
lastUpdateTime: "2014-05-25T12:44:27Z"
- type: ConfigComputedSuccessfully
status: "True"
reason: ConfigCreated
message: The cloud-provider-config has been successfully computed.
lastUpdateTime: "2014-05-25T12:43:27Z"
The extension controller has declared in its extension resource that one of the deployments it is responsible for is unhealthy.
Also, it has written a second condition using a type that is unknown to Gardener.
Gardener will pick the list of conditions and recognize that there is one with a type ControlPlaneHealthy
.
It will merge it with its own ControlPlaneHealthy
condition and report it back to the Shoot
’s status:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
labels:
shoot.gardener.cloud/status: unhealthy
name: some-shoot
namespace: garden-core
spec:
status:
conditions:
- type: APIServerAvailable
status: "True"
reason: HealthzRequestSucceeded
message: API server /healthz endpoint responded with success status code. [response_time:31ms]
lastUpdateTime: "2014-05-23T08:26:52Z"
lastTransitionTime: "2014-05-25T12:45:13Z"
- type: ControlPlaneHealthy
status: "False"
reason: ControlPlaneUnhealthyReport
message: 'Deployment cloud-controller-manager is unhealthy: condition "Available" has
invalid status False (expected True) due to MinimumReplicasUnavailable: Deployment
does not have minimum availability.'
lastUpdateTime: "2014-05-25T12:45:13Z"
lastTransitionTime: "2014-05-25T12:45:13Z"
...
Hence, the only duty extensions have is to maintain the health status of their components in the extension resource they are managing.
This can be accomplished using the health check library for extensions.
Error Codes
The Gardener API includes some well-defined error codes, e.g., ERR_INFRA_UNAUTHORIZED
, ERR_INFRA_DEPENDENCIES
, etc.
Extensions may set these error codes in the .status.conditions[].codes[]
list in case it makes sense.
Gardener will pick them up and will similarly merge them into the .status.conditions[].codes[]
list in the Shoot
:
status:
conditions:
- type: ControlPlaneHealthy
status: "False"
reason: DeploymentUnhealthy
message: 'Deployment cloud-controller-manager is unhealthy: condition "Available" has
invalid status False (expected True) due to MinimumReplicasUnavailable: Deployment
does not have minimum availability.'
lastUpdateTime: "2014-05-25T12:44:27Z"
codes:
- ERR_INFRA_UNAUTHORIZED
32 - Shoot Maintenance
Shoot Maintenance
There is a general document about shoot maintenance that you might want to read.
Here, we describe how you can influence certain operations that happen during a shoot maintenance.
Restart Control Plane Controllers
As outlined in the above linked document, Gardener offers to restart certain control plane controllers running in the seed during a shoot maintenance.
Extension controllers can extend the set of pods affected by these restarts.
If your Gardener extension manages pods of a shoot's control plane (shoot namespace in the seed) and those pods could potentially profit from a regular restart, please consider labeling them with maintenance.gardener.cloud/restart=true
.
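As an example, an extension-managed control plane Deployment could label its pods like this (a sketch; the component name is hypothetical):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-provider-controller   # hypothetical extension-managed component
  namespace: shoot--foo--bar
spec:
  # ...
  template:
    metadata:
      labels:
        maintenance.gardener.cloud/restart: "true"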
33 - Shoot Webhooks
Shoot Resource Customization Webhooks
Gardener deploys several components/resources into the shoot cluster.
Some of these resources are essential (like the kube-proxy
), others are optional addons (like the kubernetes-dashboard
or the nginx-ingress-controller
).
In either case, some provider extensions might need to mutate these resources and inject provider-specific bits into them.
What’s the approach to implement such mutations?
Similar to how control plane components in the seed are modified, we are using MutatingWebhookConfiguration
s to achieve the same for resources in the shoot.
Both the provider extension and the kube-apiserver of the shoot cluster are running in the same seed.
Consequently, the kube-apiserver can talk cluster-internally to the provider extension webhook, which makes such operations even faster.
How is the MutatingWebhookConfiguration
object created in the shoot?
The preferred approach is to use a ManagedResource
(see also Deploy Resources to the Shoot Cluster) in the seed cluster.
This way the gardener-resource-manager
ensures that end-users cannot delete/modify the webhook configuration.
The provider extension doesn’t need to care about the same.
What else is needed?
The shoot’s kube-apiserver must be allowed to talk to the provider extension.
To achieve this, you need to make sure that the relevant NetworkPolicy
objects get created to allow the network traffic.
Please refer to this guide for more information.
34 - Worker
Contract: Worker
Resource
While the control plane of a shoot cluster lives in the seed and is deployed as native Kubernetes workload, the worker nodes of the shoot clusters are normal virtual machines (VMs) in the end-user's infrastructure account.
The Gardener project features a sub-project called machine-controller-manager.
This controller is extending the Kubernetes API using custom resource definitions to represent actual VMs as Machine
objects inside a Kubernetes system.
This approach unlocks the possibility to manage virtual machines in the Kubernetes style and benefit from all its design principles.
What is the machine-controller-manager doing exactly?
Generally, there are provider-specific MachineClass
objects (AWSMachineClass
, AzureMachineClass
, etc.; similar to StorageClass
), and MachineDeployment
, MachineSet
, and Machine
objects (similar to Deployment
, ReplicaSet
, and Pod
).
A machine class describes where and how to create virtual machines (in which networks, region, availability zone, SSH key, user-data for bootstrapping, etc.), while a Machine
results in an actual virtual machine.
You can read up more information in the machine-controller-manager’s repository.
The gardenlet
deploys the machine-controller-manager
, hence, provider extensions only have to inject their specific out-of-tree machine-controller-manager
sidecar container into the Deployment
.
What needs to be implemented to support a new worker provider?
As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Worker
metadata:
name: bar
namespace: shoot--foo--bar
spec:
type: aws
region: eu-west-1
secretRef:
name: cloudprovider
namespace: shoot--foo--bar
infrastructureProviderStatus:
apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureStatus
ec2:
keyName: shoot--foo--bar-ssh-publickey
iam:
instanceProfiles:
- name: shoot--foo--bar-nodes
purpose: nodes
roles:
- arn: arn:aws:iam::0123456789:role/shoot--foo--bar-nodes
purpose: nodes
vpc:
id: vpc-0123456789
securityGroups:
- id: sg-1234567890
purpose: nodes
subnets:
- id: subnet-01234
purpose: nodes
zone: eu-west-1b
- id: subnet-56789
purpose: public
zone: eu-west-1b
- id: subnet-0123a
purpose: nodes
zone: eu-west-1c
- id: subnet-5678a
purpose: public
zone: eu-west-1c
pools:
- name: cpu-worker
minimum: 3
maximum: 5
maxSurge: 1
maxUnavailable: 0
machineType: m4.large
machineImage:
name: coreos
version: 1967.5.0
nodeAgentSecretName: gardener-node-agent-local-ee46034b8269353b
nodeTemplate:
capacity:
cpu: 2
gpu: 0
memory: 8Gi
labels:
node.kubernetes.io/role: node
worker.gardener.cloud/cri-name: containerd
worker.gardener.cloud/pool: cpu-worker
worker.gardener.cloud/system-components: "true"
userDataSecretRef:
name: user-data-secret
key: cloud_config
volume:
size: 20Gi
type: gp2
zones:
- eu-west-1b
- eu-west-1c
machineControllerManager:
drainTimeout: 10m
healthTimeout: 10m
creationTimeout: 10m
maxEvictRetries: 30
nodeConditions:
- ReadonlyFilesystem
- DiskPressure
- KernelDeadlock
clusterAutoscaler:
scaleDownUtilizationThreshold: 0.5
scaleDownGpuUtilizationThreshold: 0.5
scaleDownUnneededTime: 30m
scaleDownUnreadyTime: 1h
maxNodeProvisionTime: 15m
The .spec.secretRef
contains a reference to the provider secret pointing to the account that shall be used to create the needed virtual machines.
Also, as you can see, Gardener copies the output of the infrastructure creation (.spec.infrastructureProviderStatus
, see Infrastructure
resource), into the .spec
.
In the .spec.pools[]
field, the desired worker pools are listed.
In the above example, one pool with machine type m4.large
and min=3
, max=5
machines shall be spread over two availability zones (eu-west-1b
, eu-west-1c
).
This information together with the infrastructure status must be used to determine the proper configuration for the machine classes.
The spec.pools[].labels
map contains all labels that should be added to all nodes of the corresponding worker pool.
Gardener configures kubelet’s --node-labels
flag to contain all labels that are mentioned here and allowed by the NodeRestriction
admission plugin.
This makes sure that kubelet adds all user-specified and gardener-managed labels to the new Node
object when registering a new machine with the API server.
Nevertheless, this is only effective when bootstrapping new nodes.
The provider extension (respectively, machine-controller-manager) is still responsible for updating the labels of existing Nodes
when the worker specification changes.
The spec.pools[].nodeTemplate.capacity
field contains the resource information of the machine like cpu
, gpu
, and memory
. This info is used by the cluster-autoscaler to generate the nodeTemplate when scaling the nodeGroup from zero.
The spec.pools[].machineControllerManager
field allows configuring the settings for the machine-controller-manager component. Providers must propagate these per-worker-pool settings to the related fields in the MachineDeployment.
The spec.pools[].clusterAutoscaler
field contains cluster-autoscaler
settings that are to be applied only to a specific worker group. cluster-autoscaler
expects to find these settings as annotations on the MachineDeployment
, and so providers must pass these values to the corresponding MachineDeployment
via annotations. The keys for these annotations can be found here and the values for the corresponding annotations should be the same as what is passed into the field. Providers can use the helper function extensionsv1alpha1helper.GetMachineDeploymentClusterAutoscalerAnnotations
that returns the annotation map to be used.
The controller must only inject its provider-specific sidecar container into the machine-controller-manager
Deployment
managed by gardenlet
.
After that, it must compute the desired machine classes and the desired machine deployments.
Typically, one class maps to one deployment, and one class/deployment is created per availability zone.
Following this convention, the created resource would look like this:
apiVersion: v1
kind: Secret
metadata:
name: shoot--foo--bar-cpu-worker-z1-3db65
namespace: shoot--foo--bar
labels:
gardener.cloud/purpose: machineclass
type: Opaque
data:
providerAccessKeyId: eW91ci1hd3MtYWNjZXNzLWtleS1pZAo=
providerSecretAccessKey: eW91ci1hd3Mtc2VjcmV0LWFjY2Vzcy1rZXkK
userData: c29tZSBkYXRhIHRvIGJvb3RzdHJhcCB0aGUgVk0K
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: AWSMachineClass
metadata:
name: shoot--foo--bar-cpu-worker-z1-3db65
namespace: shoot--foo--bar
spec:
ami: ami-0123456789 # Your controller must map the stated version to the provider specific machine image information, in the AWS case the AMI.
blockDevices:
- ebs:
volumeSize: 20
volumeType: gp2
iam:
name: shoot--foo--bar-nodes
keyName: shoot--foo--bar-ssh-publickey
machineType: m4.large
networkInterfaces:
- securityGroupIDs:
- sg-1234567890
subnetID: subnet-01234
region: eu-west-1
secretRef:
name: shoot--foo--bar-cpu-worker-z1-3db65
namespace: shoot--foo--bar
tags:
kubernetes.io/cluster/shoot--foo--bar: "1"
kubernetes.io/role/node: "1"
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: MachineDeployment
metadata:
name: shoot--foo--bar-cpu-worker-z1
namespace: shoot--foo--bar
spec:
replicas: 2
selector:
matchLabels:
name: shoot--foo--bar-cpu-worker-z1
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
template:
metadata:
labels:
name: shoot--foo--bar-cpu-worker-z1
spec:
class:
kind: AWSMachineClass
name: shoot--foo--bar-cpu-worker-z1-3db65
for the first availability zone eu-west-1b
, and
apiVersion: v1
kind: Secret
metadata:
name: shoot--foo--bar-cpu-worker-z2-5z6as
namespace: shoot--foo--bar
labels:
gardener.cloud/purpose: machineclass
type: Opaque
data:
providerAccessKeyId: eW91ci1hd3MtYWNjZXNzLWtleS1pZAo=
providerSecretAccessKey: eW91ci1hd3Mtc2VjcmV0LWFjY2Vzcy1rZXkK
userData: c29tZSBkYXRhIHRvIGJvb3RzdHJhcCB0aGUgVk0K
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: AWSMachineClass
metadata:
name: shoot--foo--bar-cpu-worker-z2-5z6as
namespace: shoot--foo--bar
spec:
ami: ami-0123456789 # Your controller must map the stated version to the provider specific machine image information, in the AWS case the AMI.
blockDevices:
- ebs:
volumeSize: 20
volumeType: gp2
iam:
name: shoot--foo--bar-nodes
keyName: shoot--foo--bar-ssh-publickey
machineType: m4.large
networkInterfaces:
- securityGroupIDs:
- sg-1234567890
subnetID: subnet-0123a
region: eu-west-1
secretRef:
name: shoot--foo--bar-cpu-worker-z2-5z6as
namespace: shoot--foo--bar
tags:
kubernetes.io/cluster/shoot--foo--bar: "1"
kubernetes.io/role/node: "1"
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: MachineDeployment
metadata:
name: shoot--foo--bar-cpu-worker-z2
namespace: shoot--foo--bar
spec:
replicas: 1
selector:
matchLabels:
name: shoot--foo--bar-cpu-worker-z2
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
template:
metadata:
labels:
name: shoot--foo--bar-cpu-worker-z2
spec:
class:
kind: AWSMachineClass
name: shoot--foo--bar-cpu-worker-z2-5z6as
for the second availability zone eu-west-1c
.
Another convention is the 5-letter hash at the end of the machine class names.
Most controllers compute a checksum out of the specification of the machine class.
Any change to the value of the nodeAgentSecretName
field must result in a change of the machine class name.
The checksum in the machine class name helps to trigger a rolling update of the worker nodes if, for example, the machine image version changes.
In this case, a new checksum will be generated which results in the creation of a new machine class.
The MachineDeployment
’s machine class reference (.spec.template.spec.class.name
) is updated, which triggers the rolling update process in the machine-controller-manager.
However, all of this is only a convention that eases writing the controller, but you can do it completely differently if you desire - as long as you make sure that the described behaviours are implemented correctly.
After the machine classes and machine deployments have been created, the machine-controller-manager will start talking to the provider’s IaaS API and create the virtual machines.
Gardener makes sure that the content of the Secret
referenced in the userDataSecretRef
field that is used to bootstrap the machines contains the required configuration for installation of the kubelet and registering the VM as worker node in the shoot cluster.
The Worker
extension controller shall wait until all the created MachineDeployment
s indicate healthiness/readiness before it ends the control loop.
Another important benefit of the machine-controller-manager’s design principles (extending the Kubernetes API using CRDs) is that the cluster-autoscaler can be used without any provider-specific implementation.
We have forked the upstream Kubernetes community’s cluster-autoscaler and extended it so that it understands the machine API.
We will merge it back into the community's version once it has been adapted properly.
Our cluster-autoscaler only needs to know the minimum and maximum number of replicas per MachineDeployment
and is ready to act. It does not need to talk to the provider APIs (it just modifies the .spec.replicas
field in the MachineDeployment
object).
Gardener deploys this autoscaler if there is at least one worker pool that specifies max>min
.
In order to know how it needs to configure it, the provider-specific Worker
extension controller must expose which MachineDeployment
s it has created and how the min
/max
numbers should look.
Consequently, your controller should write this information into the Worker
resource’s .status.machineDeployments
field. It should also update the .status.machineDeploymentsLastUpdateTime
field along with .status.machineDeployments
, so that Gardener is able to deploy the cluster-autoscaler right after the status is updated with the latest MachineDeployment
s and does not wait for the reconciliation to be completed:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Worker
metadata:
name: worker
namespace: shoot--foo--bar
spec:
...
status:
lastOperation: ...
machineDeployments:
- name: shoot--foo--bar-cpu-worker-z1
minimum: 2
maximum: 3
- name: shoot--foo--bar-cpu-worker-z2
minimum: 1
maximum: 2
machineDeploymentsLastUpdateTime: "2023-05-01T12:44:27Z"
In order to support a new worker provider, you need to write a controller that watches all Worker
s with .spec.type=<my-provider-name>
.
You can take a look at the below referenced example implementation for the AWS provider.
That sounds like a lot that needs to be done, can you help me?
All of the described behaviour is mostly the same for every provider.
The only difference is maybe the version/configuration of the provider-specific machine-controller-manager
sidecar container, and the machine class specification itself.
You can take a look at our extension library, especially the worker controller part where you will find a lot of utilities that you can use.
Note that there are also utility functions for getting the default sidecar container specification or corresponding VPA container policy in the machinecontrollermanager
package called ProviderSidecarContainer
and ProviderSidecarVPAContainerPolicy
.
Also, using the library you only need to implement your provider specifics - all the things that can be handled generically can be taken for free and do not need to be re-implemented.
Take a look at the AWS worker controller for finding an example.
All the providers require further information that is not provider specific but already part of the shoot resource.
One example for such information is whether the shoot is hibernated or not.
In this case, all the virtual machines should be deleted/terminated, and after that the machine-controller-manager should be scaled down.
You can take a look at the AWS worker controller to see how it reads this information and how it is used.
As Gardener cannot know which information is required by providers, it simply mirrors the Shoot
, Seed
, and CloudProfile
resources into the seed.
They are part of the Cluster
extension resource and can be used to extract information that is not part of the Worker
resource itself.
References and Additional Resources