This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Extensions

1 - Admission

Extension Admission

The extensions are expected to validate their respective resources for their extension specific configurations, when the resources are newly created or updated. For example, provider extensions would validate spec.provider.infrastructureConfig and spec.provider.controlPlaneConfig in the Shoot resource and spec.providerConfig in the CloudProfile resource, networking extensions would validate spec.networking.providerConfig in the Shoot resource. As best practice, the validation should be performed only if there is a change in the spec of the resource. Please find an exemplary implementation here.

When a resource is newly created or updated, Gardener adds an extension label for all the extension types referenced in the spec of the resource. This label is of the form <extension-type>.extensions.gardener.cloud/<extension-name> : "true". For example, an extension label for provider extension type aws, looks like provider.extensions.gardener.cloud/aws : "true". The extensions should add object selectors in their admission webhooks for these labels, to filter out the objects they are responsible for. At present, these labels are added to BackupEntrys, BackupBuckets, CloudProfiles, Seeds, SecretBindings and Shoots. Please see this for the full list of extension labels.

2 - BackupBucket

Contract: BackupBucket resource

The Gardener project features a sub-project called etcd-backup-restore to take periodic backups of etcd backing Shoot clusters. It demands the bucket (or its equivalent in different object store providers) to be created and configured externally with appropriate credentials. The BackupBucket resource takes this responsibility in Gardener.

Before introducing the BackupBucket extension resource Gardener was using Terraform in order to create and manage these provider-specific resources (e.g., see here). Now, Gardener commissions an external, provider-specific controller to take over this task. You can also refer to backupInfra proposal documentation to get idea about how the transition was done and understand the resource in broader scope.

What is the scope of bucket?

A bucket will be provisioned per Seed. So, backup of every Shoot created on that Seed will be stored under different shoot specific prefix under the bucket. For the backup of the Shoot rescheduled on different Seed it will continue to use the same bucket.

What is the lifespan of BackupBucket?

The bucket associated with BackupBucket will be created at creation of Seed. And as per current implementation, it will be deleted on deletion of Seed and there isn’t any BackupEntry resource associated with it.

In the future, we plan to introduce schedule for BackupBucket the deletion logic for BackupBucket resource, which will reschedule the it on different available Seed, on deletion or failure of health check for current associated seed. In that case, BackupBucket will be deleted only if there isn’t any schedulable Seed available and there isn’t any associated BackupEntry resource.

What needs to be implemented to support a new infrastructure provider?

As part of the seed flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: BackupBucket
metadata:
  name: foo
spec:
  type: azure
  providerConfig:
    <some-optional-provider-specific-backupbucket-configuration>
  region: eu-west-1
  secretRef:
    name: backupprovider
    namespace: shoot--foo--bar

The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used to create the needed resources. This provider secret will be configured by Gardener operator in the Seed resource and propagated over there by seed controller.

After your controller has created the required bucket, if required it generates the secret to access the objects in buckets and put reference to it in status. This secret is supposed to be used by Gardener or eventually BackupEntry resource and etcd-backup-restore component to backup the etcd.

In order to support a new infrastructure provider you need to write a controller that watches all BackupBuckets with .spec.type=<my-provider-name>. You can take a look at the below referenced example implementation for the Azure provider.

References and additional resources

3 - BackupEntry

Contract: BackupEntry resource

The Gardener project features a sub-project called etcd-backup-restore to take periodic backups of etcd backing Shoot clusters. It demands the bucket (or its equivalent in different object store providers) access credentials to be created and configured externally with appropriate credentials. The BackupEntry resource takes this responsibility in Gardener to provide this information by creating a secret specific to the component. Said that, the core motivation for introducing this resource was to support retention of backups post deletion of Shoot. The etcd-backup-restore components takes responsibility of garbage collecting old backups out of the defined period. Once a shoot is deleted, we need to persist the backups for few days. Hence, Gardener uses the BackupEntry resource for this housekeeping work post deletion of a Shoot. The BackupEntry resource is responsible for shoot specific prefix under referred bucket.

Before introducing the BackupEntry extension resource Gardener was using Terraform in order to create and manage these provider-specific resources (e.g., see here). Now, Gardener commissions an external, provider-specific controller to take over this task. You can also refer to backupInfra proposal documentation to get idea about how the transition was done and understand the resource in broader scope.

What is the lifespan of BackupEntry?

The bucket associated with BackupEntry will be created at using BackupBucket resource. The BackupEntry resource will be created as a part of a Shoot creation. But resource might continue to exist post deletion of a Shoot (see this for more details).

What needs to be implemented to support a new infrastructure provider?

As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: BackupEntry
metadata:
  name: shoot--foo--bar
spec:
  type: azure
  providerConfig:
    <some-optional-provider-specific-backup-bucket-configuration>
  backupBucketProviderStatus:
    <some-optional-provider-specific-backup-bucket-status>
  region: eu-west-1
  bucketName: foo
  secretRef:
    name: backupprovider
    namespace: shoot--foo--bar

The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used to create the needed resources. This provider secret will be propagated from BackupBucket resource by Shoot controller.

Your controller is supposed to create the etcd-backup secret in control-plane namespace of a shoot. This secret is supposed to be used by Gardener or eventually the etcd-backup-restore component to backup the etcd. The controller implementation should cleanup the objects created under shoot specific prefix in bucket equivalent to name of BackupEntry resource.

In order to support a new infrastructure provider you need to write a controller that watches all BackupBuckets with .spec.type=<my-provider-name>. You can take a look at the below referenced example implementation for the Azure provider.

References and additional resources

4 - Bastion

Contract: Bastion resource

The Gardener project allows users to connect to Shoot worker nodes via SSH. As nodes are usually firewalled and not directly accessible from the public internet, GEP-15 introduced the concept of “Bastions”. A bastion is a dedicated server that only serves to allow SSH ingress to the worker nodes.

Bastion resources contain the user’s public SSH key and IP address, in order to provision the server accordingly: The public key is put onto the Bastion and SSH ingress is only authorized for the given IP address (in fact, it’s not a single IP address, but a set of IP ranges, however for most purposes a single IP is be used).

What is the lifespan of Bastion?

Once a Bastion has been created in the garden, it will be replicated to the appropriate seed cluster, where a controller then reconciles a server and firewall rules etc. on the cloud provider used by the target Shoot. When the Bastion is ready (i.e. has a public IP), that IP is stored in the Bastion’s status and from there is picked up by the garden cluster and gardenctl eventually.

To make multiple SSH sessions possible, the existence of the Bastion is not directly tied to the execution of gardenctl: users can exit out of gardenctl and use ssh manually to connect to the bastion and worker nodes.

However, Bastions have an expiry date, after which they will be garbage collected.

What needs to be implemented to support a new infrastructure provider?

As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Bastion
metadata:
  name: mybastion
  namespace: shoot--foo--bar
spec:
  type: aws
  # userData is base64-encoded cloud provider user data; this contains the
  # user's SSH key
  userData: IyEvYmluL2Jhc2ggL....Nlcgo=
  ingress:
    - ipBlock:
        cidr: 192.88.99.0/32 # this is most likely the user's IP address

Your controller is supposed to create a new instance at the given cloud provider, firewall it to only allow SSH (TCP port 22) from the given IP blocks, and then to configure the firewall for the worker nodes to allow SSH from the bastion instance. When a Bastion is deleted, all these changes need to be reverted.

References and additional resources

5 - CA Rotation

CA Rotation in Extensions

GEP-18 proposes adding support for automated rotation of Shoot cluster certificate authorities (CAs). This document outlines all requirements that Gardener extensions need to fulfill in order to support the CA rotation feature.

Requirements for Shoot Cluster CA Rotation

  • Extensions must not rely on static CA Secret names managed by gardenlet, because their names are changing during CA rotation.
  • Extensions cannot issue or use client certificates for authenticating against shoot API servers. Instead, they should use short-lived auto-rotated ServiceAccount tokens via gardener-resource-manager’s TokenRequestor. Also see Conventions and TokenRequestor documents.
  • Extensions need to generate dedicated CAs for signing server certificates (e.g. cloud-controller-manager). There should be one CA per controller and purpose in order to bind the lifecycle to the reconciliation cycle of the respective object for which it is created.
  • CAs managed by extensions should be rotated in lock-step with the shoot cluster CA. When the user triggers a rotation, gardenlet writes phase and initiation time to Shoot.status.credentials.rotation.certificateAuthorities.{phase,lastInitiationTime}. See GEP-18 for a detailed description on what needs to happen in each phase. Extensions can retrieve this information from Cluster.shoot.status.

Utilities for Secrets Management

In order to fulfill the requirements listed above, extension controllers can reuse the SecretsManager that gardenlet uses to manage all shoot cluster CAs, certificates, and other secrets as well. It implements the core logic for managing secrets that need to be rotated, auto-renewed etc.

Additionally, there are utilities for reusing SecretsManager in extension controllers. They already implement above requirements based on the Cluster resource and allow focusing on the extension controllers’ business logic.

For example, a simple SecretsManager usage in an extension controller could look like this:

const (
  // identity for SecretsManager instance in ControlPlane controller
  identity = "provider-foo-controlplane"
  // secret config name of the dedicated CA
  caControlPlaneName = "ca-provider-foo-controlplane"
)

func Reconcile() {
  var (
    cluster *extensionscontroller.Cluster
    client  client.Client

    // define wanted secrets with options
    secretConfigs = []extensionssecretsmanager.SecretConfigWithOptions{
      {
        // dedicated CA for ControlPlane controller
        Config: &secretutils.CertificateSecretConfig{
          Name:       caControlPlaneName,
          CommonName: "ca-provider-foo-controlplane",
          CertType:   secretutils.CACert,
        },
        // persist CA so that it gets restored on control plane migration
        Options: []secretsmanager.GenerateOption{secretsmanager.Persist()},
      },
      {
        // server cert for control plane component
        Config: &secretutils.CertificateSecretConfig{
          Name:       "cloud-controller-manager",
          CommonName: "cloud-controller-manager",
          DNSNames:   kutil.DNSNamesForService("cloud-controller-manager", namespace),
          CertType:   secretutils.ServerCert,
        },
        // sign with our dedicated CA
        Options: []secretsmanager.GenerateOption{secretsmanager.SignedByCA(caControlPlaneName)},
      },
    }
  )

  // initialize SecretsManager based on Cluster object
  sm, err := extensionssecretsmanager.SecretsManagerForCluster(ctx, logger.WithName("secretsmanager"), clock.RealClock{}, client, cluster, identity, secretConfigs)
  
  // generate all wanted secrets (first CAs, then the rest)
  secrets, err := extensionssecretsmanager.GenerateAllSecrets(ctx, sm, secretConfigs)

  // cleanup any secrets that are not needed any more (e.g. after rotation)
  err = sm.Cleanup(ctx)
}

Please pay attention to the following points:

  • There should be one SecretsManager identity per controller (and purpose if applicable) in order to prevent conflicts between different instances. E.g., there should be different identities for Infrastructrue, Worker controller etc. and the ControlPlane controller should use dedicated SecretsManager identities per purpose (e.g. provider-foo-controlplane and provider-foo-controlplane-exposure).
  • All other points in Reusing the SecretsManager in Other Components

6 - Cluster

Cluster resource

As part of the extensibility epic a lot of responsibility that was previously taken over by Gardener directly has now been shifted to extension controllers running in the seed clusters. These extensions often serve a well-defined purpose, e.g. the management of DNS records, infrastructure, etc. We have introduced a couple of extension CRDs in the seeds whose specification is written by Gardener, and which are acted up by the extensions.

However, the extensions sometimes require more information that is not directly part of the specification. One example of that is the GCP infrastructure controller which needs to know the shoot’s pod and service network. Another example is the Azure infrastructure controller which requires some information out of the CloudProfile resource. The problem is that Gardener does not know which extension requires which information so that it can write it into their specific CRDs.

In order to deal with this problem we have introduced the Cluster extension resource. This CRD is written into the seeds, however, it does not contain a status, so it is not expected that something acts upon it. Instead, you can treat it like a ConfigMap which contains data that might be interesting for you. In the context of Gardener, seeds and shoots, and extensibility the Cluster resource contains the CloudProfile, Seed, and Shoot manifest. Extension controllers can take whatever information they want out of it that might help completing their individual tasks.

---

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Cluster
metadata:
  name: shoot--foo--bar
spec:
  cloudProfile:
    apiVersion: core.gardener.cloud/v1beta1
    kind: CloudProfile
    ...
  seed:
    apiVersion: core.gardener.cloud/v1beta1
    kind: Seed
    ...
  shoot:
    apiVersion: core.gardener.cloud/v1beta1
    kind: Shoot
    ...

The resource is written by Gardener before it starts the reconciliation flow of the shoot.

⚠️ All Gardener components use the core.gardener.cloud/v1beta1 version, i.e., the Cluster resource will contain the objects in this version.

Important information that should be taken into account

There are some fields in the Shoot specification that might be interesting to take into account.

  • .spec.hibernation.enabled={true,false}: Extension controllers might want to behave differently if the shoot is hibernated or not (probably they might want to scale down their control plane components, for example).
  • .status.lastOperation.state=Failed: If Gardener sets the shoot’s last operation state to Failed it means that Gardener won’t automatically retry to finish the reconciliation/deletion flow because an error occurred that could not be resolved within the last 24h (default). In this case end-users are expected to manually re-trigger the reconciliation flow in case they want Gardener to try again. Extension controllers are expected to follow the same principle. This means they have to read the shoot state out of the Cluster resource.

Extension resources not associated with a shoot

In some cases, Gardener may create extension resources that are not associated with a shoot, but are needed to support some functionality internal to Gardener. Such resources will be created in the garden namespace of a seed cluster.

For example, if the managed ingress controller is active on the seed, Gardener will create a DNSProvider / DNSEntry or a DNSRecord resource(s) in the garden namespace of the seed cluster for the ingress DNS record.

Extension controllers that may be expected to reconcile extension resources in the garden namespace should make sure that they can tolerate the absence of a cluster resource. This means that they should not attempt to read the cluster resource in such cases, or if they do they should ignore the “not found” error.

References and additional resources

7 - ContainerRuntime

Gardener Container Runtime Extension

At the lowest layers of a Kubernetes node is the software that, among other things, starts and stops containers. It is called “Container Runtime”. The most widely known container runtime is Docker, but it is not alone in this space. In fact, the container runtime space has been rapidly evolving.

Kubernetes supports different container runtimes using Container Runtime Interface (CRI) – a plugin interface which enables kubelet to use a wide variety of container runtimes.

Gardener supports creation of Worker machines using CRI, more information can be found here: CRI Support.

Motivation

Prior to the Container Runtime Extensibility concept, Gardener used Docker as the only container runtime to use in shoot worker machines. Because of the wide variety of different container runtimes offers multiple important features (for example enhanced security concepts) it is important to enable end users to use other container runtimes as well.

The ContainerRuntime Extension Resource

Here is what a typical ContainerRuntime resource would look-like:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: ContainerRuntime
metadata:
  name: my-container-runtime
spec:
  binaryPath: /var/bin/containerruntimes
  type: gvisor
  workerPool:
    name: worker-ubuntu
    selector:
      matchLabels:
        worker.gardener.cloud/pool: worker-ubuntu

Gardener deploys one ContainerRuntime resource per worker pool per CRI. To exemplify this, consider a Shoot having two worker pools (worker-one, worker-two) using containerd as the CRI as well as gvisor and kata as enabled container runtimes. Gardener would deploy four ContainerRuntime resources. For worker-one: one ContainerRuntime for type gvisor and one for type kata. The same resource are being deployed for worker-two.

Supporting a new Container Runtime Provider

To add support for another container runtime (e.g., gvisor, kata-containers, etc.) a container runtime extension controller needs to be implemented. It should support Gardener’s supported CRI plugins.

The container runtime extension should install the necessary resources into the shoot cluster (e.g., RuntimeClasses), and it should copy the runtime binaries to the relevant worker machines in path: spec.binaryPath. Gardener labels the shoot nodes according to the CRI configured: worker.gardener.cloud/cri-name=<value> (e.g worker.gardener.cloud/cri-name=containerd) and multiple labels for each of the container runtimes configured for the shoot Worker machine: containerruntime.worker.gardener.cloud/<container-runtime-type-value>=true (e.g containerruntime.worker.gardener.cloud/gvisor=true). The way to install the binaries is by creating a daemon set which copies the binaries from an image in a docker registry to the relevant labeled Worker’s nodes (avoid downloading binaries from internet to also cater with isolated environments).

For additional reference, please have a look at the runtime-gvsior provider extension, which provides more information on how to configure the necessary charts as well as the actuators required to reconcile container runtime inside the Shoot cluster to the desired state.

8 - ControllerRegistration

Registering Extension Controllers

Extensions are registered in the garden cluster via ControllerRegistration resources. Deployment for respective extensions are specified via ControllerDeployment resources. Gardener evaluates the registrations and deployments and creates ControllerInstallation resources which describe the request “please install this controller X to this seed Y”.

Similar to how CloudProfile or Seed resources get into the system, the Gardener administrator must deploy the ControllerRegistration and ControllerDeployment resources (this does not happen automatically in any way - the administrator decides which extensions shall be enabled).

The specification mainly describes which of Gardener’s extension CRDs are managed, for example:

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerDeployment
metadata:
  name: os-gardenlinux
type: helm
providerConfig:
  chart: H4sIFAAAAAAA/yk... # <base64-gzip-chart>
  values:
    foo: bar
---
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
metadata:
  name: os-gardenlinux
spec:
  deployment:
    deploymentRefs:
    - name: os-gardenlinux
  resources:
  - kind: OperatingSystemConfig
    type: gardenlinux
    primary: true

This information tells Gardener that there is an extension controller that can handle OperatingSystemConfig resources of type gardenlinux. A reference to the shown ControllerDeployment specifies how the deployment of the extension controller is accomplished.

Also, it specifies that this controller is the primary one responsible for the lifecycle of the OperatingSystemConfig resource. Setting primary to false would allow to register additional, secondary controllers that may also watch/react on the OperatingSystemConfig/coreos resources, however, only the primary controller may change/update the main status of the extension object (that are used to “communicate” with the Gardenlet). Particularly, only the primary controller may set .status.lastOperation, .status.lastError, .status.observedGeneration, and .status.state. Secondary controllers may contribute to the .status.conditions[] if they like, of course.

Secondary controllers might be helpful in scenarios where additional tasks need to be completed which are not part of the reconciliation logic of the primary controller but separated out into a dedicated extension.

⚠️ There must be exactly one primary controller for every registered kind/type combination. Also, please note that the primary field cannot be changed after creation of the ControllerRegistration.

Deploying Extension Controllers

Submitting above ControllerDeployment and ControllerRegistration will create a ControllerInstallation resource:

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerInstallation
metadata:
  name: os-gardenlinux
spec:
  deploymentRef:
    name: os-gardenlinux
  registrationRef:
    name: os-gardenlinux
  seedRef:
    name: aws-eu1

This resource expresses that Gardener requires the os-gardenlinux extension controller to run on the aws-eu1 seed cluster.

The Gardener Controller Manager does automatically determine which extension is required on which seed cluster and will only create ControllerInstallation objects for those. Also, it will automatically delete ControllerInstallations referencing extension controllers that are no longer required on a seed (e.g., because all shoots on it have been deleted). There are additional configuration options, please see this section.

How do extension controllers get deployed to seeds?

After Gardener has written the ControllerInstallation resource some component must satisfy this request and start deploying the extension controller to the seed. Depending on the complexity of the controller’s lifecycle management, configuration, etc. there are two possible scenarios:

Scenario 1: Deployed by Gardener

In many cases the extension controllers are easy to deploy and configure. It is sufficient to simply create a Helm chart (standardized way of packaging software in the Kubernetes context) and deploy it together with some static configuration values. Gardener supports this scenario and allows to provide arbitrary deployment information in the ControllerDeployment resource’s .providerConfig section:

...
type: helm
providerConfig:
  chart: H4sIFAAAAAAA/yk...
  values:
    foo: bar

If .type=helm then Gardener itself will take over the responsibility the deployment. It base64-decodes the provided Helm chart (.providerConfig.chart) and deploys it with the provided static configuration (.providerConfig.values). The chart and the values can be updated at any time - Gardener will recognize and re-trigger the deployment process.

In order to allow extensions to get information about the garden and the seed cluster Gardener does mix-in certain properties into the values (root level) of every deployed Helm chart:

gardener:
  garden:
    identifier: <uuid-of-gardener-installation>
  seed:
    identifier: <seed-name>
    region: europe
    spec: <complete-seed-spec>

Extensions can use this information in their Helm chart in case they require knowledge about the garden and the seed environment. The list might be extended in the future.

ℹ️ Gardener uses the UUID of the garden Namespace object in the .gardener.garden.identifier property.

Scenario 2: Deployed by a (non-human) Kubernetes operator

Some extension controllers might be more complex and require additional domain-specific knowledge wrt. lifecycle or configuration. In this case, we encourage to follow the Kubernetes operator pattern and deploy a dedicated operator for this extension into the garden cluster. The ControllerDeployments’s .type field would then not be helm, and no Helm chart or values need to be provided there. Instead, the operator itself knows how to deploy the extension into the seed. It must watch ControllerInstallation resources and act one those referencing a ControllerRegistration the operator is responsible for.

In order to let Gardener know that the extension controller is ready and running in the seed the ControllerInstallation’s .status field supports two conditions: RegistrationValid and InstallationSuccessful - both must be provided by the responsible operator:

...
status:
  conditions:
  - lastTransitionTime: "2019-01-22T11:51:11Z"
    lastUpdateTime: "2019-01-22T11:51:11Z"
    message: Chart could be rendered successfully.
    reason: RegistrationValid
    status: "True"
    type: Valid
  - lastTransitionTime: "2019-01-22T11:51:12Z"
    lastUpdateTime: "2019-01-22T11:51:12Z"
    message: Installation of new resources succeeded.
    reason: InstallationSuccessful
    status: "True"
    type: Installed

Additionally, the .status field has a providerStatus section into which the operator can (optionally) put any arbitrary data associated with this installation.

Extensions in the garden cluster itself

The Shoot resource itself will contain some provider-specific data blobs. As a result, some extensions might also want to run in the garden cluster, e.g., to provide ValidatingWebhookConfigurations for validating the correctness of their provider-specific blobs:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-aws
  namespace: garden-dev
spec:
  ...
  cloud:
    type: aws
    region: eu-west-1
    providerConfig:
      apiVersion: aws.cloud.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vpc: # specify either 'id' or 'cidr'
        # id: vpc-123456
          cidr: 10.250.0.0/16
        internal:
        - 10.250.112.0/22
        public:
        - 10.250.96.0/22
        workers:
        - 10.250.0.0/19
      zones:
      - eu-west-1a
...

In the above example, Gardener itself does not understand the AWS-specific provider configuration for the infrastructure. However, if this part of the Shoot resource should be validated then you should run an AWS-specific component in the garden cluster that registers a webhook. You can do it similarly if you want to default some fields of a resource (by using a MutatingWebhookConfiguration).

Again, similar to how Gardener is deployed to the garden cluster, these components must be deployed and managed by the Gardener administrator.

Extension resource configurations

The Extension resource allows injecting arbitrary steps into the shoot reconciliation flow that are unknown to Gardener. Hence, it is slightly special and allows further configuration when registering it:

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
metadata:
  name: extension-foo
spec:
  resources:
  - kind: Extension
    type: foo
    primary: true
    globallyEnabled: true
    reconcileTimeout: 30s

The globallyEnabled=true option specifies that the Extension/foo object shall be created by default for all shoots (unless they opted out by setting .spec.extensions[].enabled=false in the Shoot spec).

The reconcileTimeout tells Gardener how long it should wait during its shoot reconciliation flow for the Extension/foo’s reconciliation to finish.

Deployment configuration options

The .spec.deployment resource allows to configure a deployment policy. There are the following policies:

  • OnDemand (default): Gardener will demand the deployment and deletion of the extension controller to/from seed clusters dynamically. It will automatically determine (based on other resources like Shoots) whether it is required and decide accordingly.
  • Always: Gardener will demand the deployment of the extension controller to seed clusters independent of whether it is actually required or not. This might be helpful if you want to add a new component/controller to all seed clusters by default. Another use-case is to minimize the durations until extension controllers get deployed and ready in case you have highly fluctuating seed clusters.
  • AlwaysExceptNoShoots: Similar to Always, but if the seed does not have any shoots then the extension is not being deployed. It will be deleted from a seed after the last shoot has been removed from it.

Also, the .spec.deployment.seedSelector allows to specify a label selector for seed clusters. Only if it matches the labels of a seed then it will be deployed to it. Please note that a seed selector can only be specified for secondary controllers (primary=false for all .spec.resources[]).

9 - ControlPlane

Contract: ControlPlane resource

Most Kubernetes clusters require a cloud-controller-manager or CSI drivers in order to work properly. Before introducing the ControlPlane extension resource Gardener was having several different Helm charts for the cloud-controller-manager deployments for the various providers. Now, Gardener commissions an external, provider-specific controller to take over this task.

Which control plane resources are required?

As mentioned in the controlplane customization webhooks document Gardener shall not deploy any cloud-controller-manager or any other provider-specific component. Instead, it creates a ControlPlane CRD that should be picked up by provider extensions. Its purpose is to trigger the deployment of such provider-specific components in the shoot namespace in the seed cluster.

What needs to be implemented to support a new infrastructure provider?

As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: ControlPlane
metadata:
  name: control-plane
  namespace: shoot--foo--bar
spec:
  type: openstack
  region: europe-west1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  providerConfig:
    apiVersion: openstack.provider.extensions.gardener.cloud/v1alpha1
    kind: ControlPlaneConfig
    loadBalancerProvider: provider
    zone: eu-1a
    cloudControllerManager:
      featureGates:
        CustomResourceValidation: true
  infrastructureProviderStatus:
    apiVersion: openstack.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureStatus
    networks:
      floatingPool:
        id: vpc-1234
      subnets:
      - purpose: nodes
        id: subnetid

The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used for the shoot cluster. However, the most important section is the .spec.providerConfig and the .spec.infrastructureProviderStatus. The first one contains an embedded declaration of the provider specific configuration for the control plane (that cannot be known by Gardener itself). You are responsible for designing how this configuration looks like. Gardener does not evaluate it but just copies this part from what has been provided by the end-user in the Shoot resource. The second one contains the output of the Infrastructure resource (that might be relevant for the CCM config).

In order to support a new control plane provider you need to write a controller that watches all ControlPlanes with .spec.type=<my-provider-name>. You can take a look at the below referenced example implementation for the Alicloud provider.

The control plane controller as part of the ControlPlane reconciliation, often deploys resources (e.g. pods/deployments) into the Shoot namespace in the Seed as part of its ControlPlane reconciliation loop. Because the namespace contains network policies that per default deny all ingress and egress traffic, the pods may need to have proper labels matching to the selectors of the network policies in order to allow the required network traffic. Otherwise, they won’t be allowed to talk to certain other components (e.g., the kube-apiserver of the shoot). Please see this document for more information.

Non-provider specific information required for infrastructure creation

Most providers might require further information that is not provider specific but already part of the shoot resource. One example for this is the GCP control plane controller which needs the Kubernetes version of the shoot cluster (because it already uses the in-tree Kubernetes cloud-controller-manager). As Gardener cannot know which information is required by providers it simply mirrors the Shoot, Seed, and CloudProfile resources into the seed. They are part of the Cluster extension resource and can be used to extract information that is not part of the Infrastructure resource itself.

References and additional resources

10 - ControlPlane Exposure

Contract: ControlPlane resource with purpose exposure

Some Kubernetes clusters require an additional deployments required by the seed cloud provider in order to work properly, e.g. AWS Load Balancer Readvertiser. Before using ControlPlane resources with purpose exposure Gardener was having different Helm charts for the deployments for the various providers. Now, Gardener commissions an external, provider-specific controller to take over this task.

Which control plane resources are required?

As mentioned in the controlplane document Gardener shall not deploy any other provider-specific component. Instead, it creates a ControlPlane CRD with purpose exposure that should be picked up by provider extensions. Its purpose is to trigger the deployment of such provider-specific components in the shoot namespace in the seed cluster that are needed to expose the kube-apiserver.

The shoot cluster’s kube-apiserver are exposed via a Service of type LoadBalancer from the shoot provider (you may run the control plane of an Azure shoot in a GCP seed) it’s the seed provider extension controller that should act on the ControlPlane resources with purpose exposure.

If SNI is enabled, then the Service from above is of type ClusterIP and Gardner will not create ControlPlane resources with purpose exposure.

What needs to be implemented to support a new infrastructure provider?

As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: ControlPlane
metadata:
  name: control-plane-exposure
  namespace: shoot--foo--bar
spec:
  type: aws
  purpose: exposure
  region: europe-west1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar

The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used for the shoot cluster. It is most likely not needed, however, still added for some potential corner cases. If you don’t need it then just ignore it. The .spec.region contains the region of the seed cluster.

In order to support a control plane provider with purpose exposure you need to write a controller or expand the existing controlplane controller that watches all ControlPlanes with .spec.type=<my-provider-name> and purpose exposure. You can take a look at the below referenced example implementation for the AWS provider.

Non-provider specific information required for infrastructure creation

Most providers might require further information that is not provider specific but already part of the shoot resource. As Gardener cannot know which information is required by providers it simply mirrors the Shoot, Seed, and CloudProfile resources into the seed. They are part of the Cluster extension resource and can be used to extract information.

References and additional resources

11 - ControlPlane Webhooks

Controlplane customization webhooks

Gardener creates the Shoot controlplane in several steps of the Shoot flow. At different point of this flow, it:

  • deploys standard controlplane components such as kube-apiserver, kube-controller-manager, and kube-scheduler by creating the corresponding deployments, services, and other resources in the Shoot namespace.
  • initiates the deployment of custom controlplane components by ControlPlane controllers by creating a ControlPlane resource in the Shoot namespace.

In order to apply any provider-specific changes to the configuration provided by Gardener for the standard controlplane components, cloud extension providers can install mutating admission webhooks for the resources created by Gardener in the Shoot namespace.

What needs to be implemented to support a new cloud provider?

In order to support a new cloud provider you should install “controlplane” mutating webhooks for any of the following resources:

  • Deployment with name kube-apiserver, kube-controller-manager, or kube-scheduler
  • Service with name kube-apiserver
  • OperatingSystemConfig with any name and purpose reconcile

See Contract Specification for more details on the contract that Gardener and webhooks should adhere to regarding the content of the above resources.

You can install 3 different kinds of controlplane webhooks:

  • Shoot, or controlplane webhooks apply changes needed by the Shoot cloud provider, for example the --cloud-provider command line flag of kube-apiserver and kube-controller-manager. Such webhooks should only operate on Shoot namespaces labeled with shoot.gardener.cloud/provider=<provider>.
  • Seed, or controlplaneexposure webhooks apply changes needed by the Seed cloud provider, for example annotations on the kube-apiserver service to ensure cloud-specific load balancers are correctly provisioned for a service of type LoadBalancer. Such webhooks should only operate on Shoot namespaces labeled with seed.gardener.cloud/provider=<provider>.

The labels shoot.gardener.cloud/provider and shoot.gardener.cloud/provider are added by Gardener when it creates the Shoot namespace.

Contract Specification

This section specifies the contract that Gardener and webhooks should adhere to in order to ensure smooth interoperability. Note that this contract can’t be specified formally and is therefore easy to violate, especially by Gardener. The Gardener team will nevertheless do its best to adhere to this contract in the future and to ensure via additional measures (tests, validations) that it’s not unintentionally broken. If it needs to be changed intentionally, this can only happen after proper communication has taken place to ensure that the affected provider webhooks could be adapted to work with the new version of the contract.

Note: The contract described below may not necessarily be what Gardener does currently (as of May 2019). Rather, it reflects the target state after changes for Gardener extensibility have been introduced.

kube-apiserver

To deploy kube-apiserver, Gardener shall create a deployment and a service both named kube-apiserver in the Shoot namespace. They can be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener.

The pod template of the kube-apiserver deployment shall contain a container named kube-apiserver.

The command field of the kube-apiserver container shall contain the kube-apiserver command line. It shall contain a number of provider-independent flags that should be ignored by webhooks, such as:

  • admission plugins (--enable-admission-plugins, --disable-admission-plugins)
  • secure communications (--etcd-cafile, --etcd-certfile, --etcd-keyfile, …)
  • audit log (--audit-log-*)
  • ports (--insecure-port, --secure-port)

The kube-apiserver command line shall not contain any provider-specific flags, such as:

  • --cloud-provider
  • --cloud-config

These flags can be added by webhooks if needed.

The kube-apiserver command line may contain a number of additional provider-independent flags. In general, webhooks should ignore these unless they are known to interfere with the desired kube-apiserver behavior for the specific provider. Among the flags to be considered are:

  • --endpoint-reconciler-type
  • --advertise-address
  • --feature-gates

Gardener may use SNI to expose the apiserver (APIServerSNI feature gate). In this case, Gardener shall label the kube-apiserver’s Deployment with core.gardener.cloud/apiserver-exposure: gardener-managed label and expects that the --endpoint-reconciler-type and --advertise-address flags are not modified.

The --enable-admission-plugins flag may contain admission plugins that are not compatible with CSI plugins such as PersistentVolumeLabel. Webhooks should therefore ensure that such admission plugins are either explicitly enabled (if CSI plugins are not used) or disabled (otherwise).

The env field of the kube-apiserver container shall not contain any provider-specific environment variables (so it will be empty). If any provider-specific environment variables are needed, they should be added by webhooks.

The volumes field of the pod template of the kube-apiserver deployment, and respectively the volumeMounts field of the kube-apiserver container shall not contain any provider-specific Secret or ConfigMap resources. If such resources should be mounted as volumes, this should be done by webhooks.

The kube-apiserver Service may be of type LoadBalancer, but shall not contain any provider-specific annotations that may be needed to actually provision a load balancer resource in the Seed provider’s cloud. If any such annotations are needed, they should be added by webhooks (typically controlplaneexposure webhooks).

The kube-apiserver Service shall be of type ClusterIP, if Gardener is using SNI to expose the apiserver (APIServerSNI feature gate). In this case, Gardener shall label this Service with core.gardener.cloud/apiserver-exposure: gardener-managed label and expects that no mutations happen.

kube-controller-manager

To deploy kube-controller-manager, Gardener shall create a deployment named kube-controller-manager in the Shoot namespace. It can be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener.

The pod template of the kube-controller-manager deployment shall contain a container named kube-controller-manager.

The command field of the kube-controller-manager container shall contain the kube-controller-manager command line. It shall contain a number of provider-independent flags that should be ignored by webhooks, such as:

  • --kubeconfig, --authentication-kubeconfig, --authorization-kubeconfig
  • --leader-elect
  • secure communications (--tls-cert-file, --tls-private-key-file, …)
  • cluster CIDR and identity (--cluster-cidr, --cluster-name)
  • sync settings (--concurrent-deployment-syncs, --concurrent-replicaset-syncs)
  • horizontal pod autoscaler (--horizontal-pod-autoscaler-*)
  • ports (--port, --secure-port)

The kube-controller-manager command line shall not contain any provider-specific flags, such as:

  • --cloud-provider
  • --cloud-config
  • --configure-cloud-routes
  • --external-cloud-volume-plugin

These flags can be added by webhooks if needed.

The kube-controller-manager command line may contain a number of additional provider-independent flags. In general, webhooks should ignore these unless they are known to interfere with the desired kube-controller-manager behavior for the specific provider. Among the flags to be considered are:

  • --feature-gates

The env field of the kube-controller-manager container shall not contain any provider-specific environment variables (so it will be empty). If any provider-specific environment variables are needed, they should be added by webhooks.

The volumes field of the pod template of the kube-controller-manager deployment, and respectively the volumeMounts field of the kube-controller-manager container shall not contain any provider-specific Secret or ConfigMap resources. If such resources should be mounted as volumes, this should be done by webhooks.

kube-scheduler

To deploy kube-scheduler, Gardener shall create a deployment named kube-scheduler in the Shoot namespace. It can be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener.

The pod template of the kube-scheduler deployment shall contain a container named kube-scheduler.

The command field of the kube-scheduler container shall contain the kube-scheduler command line. It shall contain a number of provider-independent flags that should be ignored by webhooks, such as:

  • --config
  • --authentication-kubeconfig, --authorization-kubeconfig
  • secure communications (--tls-cert-file, --tls-private-key-file, …)
  • ports (--port, --secure-port)

The kube-scheduler command line may contain additional provider-independent flags. In general, webhooks should ignore these unless they are known to interfere with the desired kube-controller-manager behavior for the specific provider. Among the flags to be considered are:

  • --feature-gates

The kube-scheduler command line can’t contain provider-specific flags, and it makes no sense to specify provider-specific environment variables or mount provider-specific Secret or ConfigMap resources as volumes.

etcd-main and etcd-events

To deploy etcd, Gardener shall create 2 Etcd named etcd-main and etcd-events in the Shoot namespace. They can be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener.

Gardener shall configure the Etcd resource completely to set up an etcd cluster which uses the default storage class of the seed cluster.

cloud-controller-manager

Gardener shall not deploy a cloud-controller-manager. If it is needed, it should be added by a ControlPlane controller

CSI controllers

Gardener shall not deploy a CSI controller. If it is needed, it should be added by a ControlPlane controller

kubelet

To specify the kubelet configuration, Gardener shall create a OperatingSystemConfig resource with any name and purpose reconcile in the Shoot namespace. It can therefore also be mutated by webhooks to apply any provider-specific changes to the standard configuration provided by Gardener. Gardener may write multiple such resources with different type to the same Shoot namespaces if multiple OSs are used.

The OSC resource shall contain a unit named kubelet.service, containing the corresponding systemd unit configuration file. The [Service] section of this file shall contain a single ExecStart option having the kubelet command line as its value.

The OSC resource shall contain a file with path /var/lib/kubelet/config/kubelet, which contains a KubeletConfiguration resource in YAML format. Most of the flags that can be specified in the kubelet command line can alternatively be specified as options in this configuration as well.

The kubelet command line shall contain a number of provider-independent flags that should be ignored by webhooks, such as:

  • --config
  • --bootstrap-kubeconfig, --kubeconfig
  • --network-plugin (and, if it equals cni, also --cni-bin-dir and --cni-conf-dir)
  • --node-labels

The kubelet command line shall not contain any provider-specific flags, such as:

  • --cloud-provider
  • --cloud-config
  • --provider-id

These flags can be added by webhooks if needed.

The kubelet command line / configuration may contain a number of additional provider-independent flags / options. In general, webhooks should ignore these unless they are known to interfere with the desired kubelet behavior for the specific provider. Among the flags / options to be considered are:

  • --enable-controller-attach-detach (enableControllerAttachDetach) - should be set to true if CSI plugins are used, but in general can also be ignored since its default value is also true, and this should work both with and without CSI plugins.
  • --feature-gates (featureGates) - should contain a list of specific feature gates if CSI plugins are used. If CSI plugins are not used, the corresponding feature gates can be ignored since enabling them should not harm in any way.

12 - Conventions

General conventions

All the extensions that are registered to Gardener are deployed to the seed clusters, on which they are required (also see ControllerRegistration).

Some of these extensions might need to create global resources in the seed (e.g., ClusterRoles), i.e., it’s important to have a naming scheme to avoid conflicts as it cannot be checked or validated upfront that two extensions don’t use the same names.

Consequently, this page should help answering some general questions that might come up when it comes to developing an extension.

Is there a naming scheme for (global) resources?

As there is no formal process to validate non-existence of conflicts between two extensions please follow these naming schemes when creating resources (especially, when creating global resources, but it’s in general a good idea for most created resources):

The resource name should be prefixed with extensions.gardener.cloud:<extension-type>-<extension-name>:<resource-name>, for example:

  • extensions.gardener.cloud:provider-aws:machine-controller-manager
  • extensions.gardener.cloud:extension-certificate-service:cert-broker

How to create resources in the shoot cluster?

Some extensions might not only create resources in the seed cluster itself but also in the shoot cluster. Usually, every extension comes with a ServiceAccount and the required RBAC permissions when it gets installed to the seed. However, there are no credentials for the shoot for every extension.

Extensions are supposed to use ManagedResources to manage resources in shoot clusters. gardenlet deploys gardener-resource-manager instances into all shoot control planes, that will reconcile ManagedResources without a specified class (spec.class=null) in shoot clusters.

If you need to deploy a non-DaemonSet resource you need to ensure that it only runs on nodes that are allowed to host system components and extensions. To do that you need to configure a nodeSelector as following:

nodeSelector:
 worker.gardener.cloud/system-components: "true"

How to create kubeconfigs for the shoot cluster?

Historically, Gardener extensions used to generate kubeconfigs with client certificates for components they deploy into the shoot control plane. For this, they reused the shoot cluster CA secret (ca) to issue new client certificates. With gardener/gardener#4661 we moved away from using client certificates in favor of short-lived, auto-rotated ServiceAccount tokens. These tokens are managed by gardener-resource-manager’s TokenRequestor. Extensions are supposed to reuse this mechanism for requesting tokens and a generic-token-kubeconfig for authenticating against shoot clusters.

With GEP-18 (Shoot cluster CA rotation), a dedicated CA will be used for signing client certificates (gardener/gardener#5779) which will be rotated when triggered by the shoot owner. With this, extensions cannot reuse the ca secret anymore to issue client certificates. Hence, extensions must switch to short-lived ServiceAccount tokens in order to support the CA rotation feature.

The generic-token-kubeconfig secret contains the CA bundle for establishing trust to shoot API servers. However, as the secret is immutable its name changes with the rotation of the cluster CA. Extensions need to look up the generic-token-kubeconfig.secret.gardener.cloud/name annotation on the respective Cluster object in order to determine which secret contains the current CA bundle. The helper function extensionscontroller.GenericTokenKubeconfigSecretNameFromCluster can be used for this task.

You can take a look at CA Rotation in Extensions for more details on the CA rotation feature in regard to extensions.

How to create certificates for the shoot cluster?

Gardener creates several certificate authorities (CA) that are used to create server certificates for various components. For example, the shoot’s etcd has its own CA, the kube-aggregator has its own CA as well, and both are different to the actual cluster’s CA.

With GEP-18 (Shoot cluster CA rotation), extensions are required to do the same and generate dedicated CAs for their components (e.g. for signing a server certificate for cloud-controller-manager). They must not depend on the CA secrets managed by gardenlet.

Please see CA Rotation in Extensions for the exact requirements, that extensions need to fulfill in order to support the CA rotation feature.

13 - DNS

Contract: DNSProvider and DNSEntry resources

Every shoot cluster requires external DNS records that are publicly resolvable. The management of these DNS records requires provider-specific knowledge which is to be developed outside of the Gardener’s core repository.

What does Gardener create DNS records for?

Internal domain name

Every shoot cluster’s kube-apiserver running in the seed is exposed via a load balancer that has a public endpoint (IP or hostname). This endpoint is used by end-users and also by system components (that are running in another network, e.g., the kubelet or kube-proxy) to talk to the cluster. In order to be robust against changes of this endpoint (e.g., caused due to re-creation of the load balancer or move of the control plane to another seed cluster) Gardener creates a so-called internal domain name for every shoot cluster. The internal domain name is a publicly resolvable DNS record that points to the load balancer of the kube-apiserver. Gardener uses this domain name in the kubeconfigs of all system components (instead of writing the load balancer endpoint directly into it. This way Gardener does not need to recreate all the kubeconfigs if the endpoint changes - it just needs to update the DNS record.

External domain name

The internal domain name is not configurable by end-users directly but dictated by the Gardener administrator. However, end-users usually prefer to have another DNS name, maybe even using their own domain sometimes to access their Kubernetes clusters. Gardener supports that by creating another DNS record, named external domain name, that actually points to the internal domain name. The kubeconfig handed out to end-users does contain this external domain name, i.e., users can access their clusters with the DNS name they like to.

As not every end-user has an own domain it is possible for Gardener administrators to configure so-called default domains. If configured, shoots that do not specify a domain explicitly get an external domain name based on a default domain (unless explicitly stated that this shoot should not get an external domain name (.spec.dns.provider=unmanaged).

Domain name for ingress (deprecated)

Gardener allows to deploy a nginx-ingress-controller into a shoot cluster (deprecated). This controller is exposed via a public load balancer (again, either IP or hostname). Gardener creates a wildcard DNS record pointing to this load balancer. Ingress resources can later use this wildcard DNS record to expose underlying applications.

What needs to be implemented to support a new DNS provider?

As part of the shoot flow Gardener will create two special resources in the seed cluster that need to be reconciled by an extension controller. The first resource (DNSProvider) is a declaration of a DNS provider (e.g., aws-route53, google-clouddns, …) with a reference to a Secret object that contains the provider-specific credentials in order to talk to the provider’s API. It also allows to specify two lists of domains that shall be allowed or disallowed to be used for DNS entries:

---
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: default
type: Opaque
data:
  # aws-route53 specific credentials here
---
apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSProvider
metadata:
  name: my-aws-account
  namespace: default
spec:
  type: aws-route53
  secretRef:
    name: aws-credentials
  domains:
    include:
    - dev.my-fancy-domain.com
    exclude:
    - staging.my-fancy-domain.com
    - prod.my-fancy-domain.com

When reconciling this resource the DNS controller has to read information about available DNS zones to figure out which domains can actually be supported by the provided credentials. Based on the constraints given in the DNSProvider resources .spec.domains.{include|exclude} fields it shall later only allow certain DNS entries. Gardener waits until the status indicates that the registration went well:

apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSProvider
...
status:
  state: Ready
  message: everything ok

Other possible states are Pending, Error, and Invalid. The DNS controller may provide an explanation of the .status.state in the .status.message field.

Now Gardener may create DNSEntry objects that represent the ask to create an actual external DNS record:

---
apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSEntry
metadata:
  name: dns
  namespace: default
spec:
  dnsName: apiserver.cluster1.dev.my-fancy-domain.com
  ttl: 600
  targets:
  - 8.8.8.8

It has to be automatically determined whether the to-be-created DNS record is of type A or CNAME. The spec shall also allow the creation of TXT records, e.g.:

---
apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSEntry
metadata:
  name: dns
  namespace: default
spec:
  dnsName: data.apiserver.cluster1.dev.my-fancy-domain.com
  ttl: 120
  text: |
        content for the DNS TXT record

The status section of this resource looks similar like the DNSProvider’s. Gardener is (as of today) only evaluating the .status.state and .status.message fields.

References and additional resources

14 - DNS Record

Contract: DNSRecord resources

Every shoot cluster requires external DNS records that are publicly resolvable. The management of these DNS records requires provider-specific knowledge which is to be developed outside the Gardener’s core repository.

Currently, Gardener uses DNSProvider and DNSEntry resources. However, this introduces undesired coupling of Gardener to a controller that does not adhere to the Gardener extension contracts. Because of this, we plan to stop using DNSProvider and DNSEntry resources for Gardener DNS records in the future and use the DNSRecord resources described here instead.

What does Gardener create DNS records for?

Internal domain name

Every shoot cluster’s kube-apiserver running in the seed is exposed via a load balancer that has a public endpoint (IP or hostname). This endpoint is used by end-users and also by system components (that are running in another network, e.g., the kubelet or kube-proxy) to talk to the cluster. In order to be robust against changes of this endpoint (e.g., caused due to re-creation of the load balancer or move of the DNS record to another seed cluster) Gardener creates a so-called internal domain name for every shoot cluster. The internal domain name is a publicly resolvable DNS record that points to the load balancer of the kube-apiserver. Gardener uses this domain name in the kubeconfigs of all system components, instead of using directly the load balancer endpoint. This way Gardener does not need to recreate all kubeconfigs if the endpoint changes - it just needs to update the DNS record.

External domain name

The internal domain name is not configurable by end-users directly but configured by the Gardener administrator. However, end-users usually prefer to have another DNS name, maybe even using their own domain sometimes to access their Kubernetes clusters. Gardener supports that by creating another DNS record, named external domain name, that actually points to the internal domain name. The kubeconfig handed out to end-users does contain this external domain name, i.e., users can access their clusters with the DNS name they like to.

As not every end-user has an own domain it is possible for Gardener administrators to configure so-called default domains. If configured, shoots that do not specify a domain explicitly get an external domain name based on a default domain (unless explicitly stated that this shoot should not get an external domain name (.spec.dns.provider=unmanaged).

Ingress domain name (deprecated)

Gardener allows to deploy a nginx-ingress-controller into a shoot cluster (deprecated). This controller is exposed via a public load balancer (again, either IP or hostname). Gardener creates a wildcard DNS record pointing to this load balancer. Ingress resources can later use this wildcard DNS record to expose underlying applications.

What needs to be implemented to support a new DNS provider?

As part of the shoot flow Gardener will create a number of DNSRecord resources in the seed cluster (one for each of the DNS records mentioned above) that need to be reconciled by an extension controller. This resource contains the following information:

  • The DNS provider type (e.g., aws-route53, google-clouddns, …)
  • A reference to a Secret object that contains the provider-specific credentials used to communicate with the provider’s API.
  • The fully qualified domain name (FQDN) of the DNS record, e.g. “api.<shoot domain>”.
  • The DNS record type, one of A, CNAME, or TXT.
  • The DNS record values, that is a list of IP addresses for A records, a single hostname for CNAME records, or a list of texts for TXT records.

Optionally, the DNSRecord resource may contain also the following information:

  • The region of the DNS record. If not specified, the region specified in the referenced Secret shall be used. If that is also not specified, the extension controller shall use a certain default region.
  • The DNS hosted zone of the DNS record. If not specified, it shall be determined automatically by the extension controller by getting all hosted zones of the account and searching for the longest zone name that is a suffix of the fully qualified domain name (FQDN) mentioned above.
  • The TTL of the DNS record in seconds. If not specified, it shall be set by the extension controller to 120.

Example DNSRecord:

---
apiVersion: v1
kind: Secret
metadata:
  name: dnsrecord-bar-external
  namespace: shoot--foo--bar
type: Opaque
data:
  # aws-route53 specific credentials here
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: DNSRecord
metadata:
  name: dnsrecord-external
  namespace: default
spec:
  type: aws-route53
  secretRef:
    name: dnsrecord-bar-external
    namespace: shoot--foo--bar
# region: eu-west-1
# zone: ZFOO
  name: api.bar.foo.my-fancy-domain.com
  recordType: A
  values:
  - 1.2.3.4
# ttl: 600

In order to support a new DNS record provider you need to write a controller that watches all DNSRecords with .spec.type=<my-provider-name>. You can take a look at the below referenced example implementation for the AWS route53 provider.

Key names in secrets containing provider-specific credentials

For compatibility with existing setups, extension controllers shall support two different namings of keys in secrets containing provider-specific credentials:

  • The naming used by the external-dns-management DNS controller. For example on AWS, the key names are AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION.
  • The naming used by other provider-specific extension controllers, e.g. for infrastructure. For example on AWS, the key names are accessKeyId, secretAccessKey, and region.

Avoiding reading the DNS hosted zones

If the DNS hosted zone is not specified in the DNSRecord resource, during the first reconciliation the extension controller shall determine the correct DNS hosted zone for the specified FQDN and write it to the status of the resource:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: DNSRecord
metadata:
  name: dnsrecord-external
  namespace: shoot--foo--bar
spec:
  ...
status:
  lastOperation: ...
  zone: ZFOO

On subsequent reconciliations, the extension controller shall use the zone from the status and avoid reading the DNS hosted zones from the provider. If the DNSRecord resource specifies a zone in .spec.zone and the extension controller has written a value to .status.zone, the first one shall be considered with higher priority by the extension controller.

Non-provider specific information required for DNS record creation

Some providers might require further information that is not provider specific but already part of the shoot resource. As Gardener cannot know which information is required by providers it simply mirrors the Shoot, Seed, and CloudProfile resources into the seed. They are part of the Cluster extension resource and can be used to extract information that is not part of the DNSRecord resource itself.

Using DNSRecord instead of DNSProvider and DNSEntry resources

Currently, Gardener will create DNSRecord resources only if the feature gate UseDNSRecords is enabled on gardener-apiserver, gardener-controller-manager, and gardenlet (it should be enabled on all three of them for the feature to work properly). If this feature gate is enabled, all three DNS records mentioned above (internal, external, and ingress) will be managed via DNSRecords and not DNSProvider / DNSEntry. DNSProvider resources will still be created for all providers listed in spec.dns.providers, including the one marked as primary: true. These providers can be used for DNSEntry resources needed by workloads deployed on the shoot cluster.

If the feature gate is disabled, Gardener will not create any DNSRecord resources and use DNSProvider / DNSEntry resources for its DNS records. The feature gate was introduced in v1.27 and was in Alpha stage (disabled by default) until v1.38 (including). With v1.44 the feature gate is graduated to GA and can’t be disabled.

In order to successfully reconcile a shoot with the feature gate enabled, extension controllers for DNSRecord resources for types used in the default, internal and custom domain secrets should be registered via ControllerRegistration resources.

Support for DNSRecord resources in the provider extensions

The following table contains information about the provider extension version that adds support for DNSRecord resources:

ExtensionVersion
provider-alicloudv1.26.0
provider-awsv1.27.0
provider-azurev1.21.0
provider-gcpv1.18.0
provider-openstackv1.21.0
provider-vsphereN/A
provider-equinix-metalN/A
provider-kubevirtN/A
provider-openshiftN/A

References and additional resources

15 - Extension

Contract: Extension resource

Gardener defines common procedures which must be passed to create a functioning shoot cluster. Well known steps are represented by special resources like Infrastructure, OperatingSystemConfig or DNS. These resources are typically reconciled by dedicated controllers setting up the infrastructure on the hyperscaler or managing DNS entries, etc..

But, some requirements don’t match with those special resources or don’t depend on being proceeded at a specific step in the creation / deletion flow of the shoot. They require a more generic hook. Therefore, Gardener offers the Extension resource.

What is required to register and support an Extension type?

Gardener creates one Extension resource per registered extension type in ControllerRegistration per shoot.

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
metadata:
  name: extension-example
spec:
  resources:
  - kind: Extension
    type: example
    globallyEnabled: true

If spec.resources[].globallyEnabled is true then the Extension resources of the given type is created for every shoot cluster. Set to false, the Extension resource is only created if configured in the Shoot manifest.

The Extension resources are created in the shoot namespace of the seed cluster.

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: example
  namespace: shoot--foo--bar
spec:
  type: example
  providerConfig: {}

Your controller needs to reconcile extensions.extensions.gardener.cloud. Since there can exist multiple Extension resources per shoot, each one holds a spec.type field to let controllers check their responsibility (similar to all other extension resources of Gardener).

ProviderConfig

It is possible to provide data in the Shoot resource which is copied to spec.providerConfig of the Extension resource.

---
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
spec:
  extensions:
  - type: example
    providerConfig:
      foo: bar
...

results in

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: example
  namespace: shoot--foo--bar
spec:
  type: example
  providerConfig:
    foo: bar

Shoot reconciliation flow and Extension status

Gardener creates Extension resources as part of the Shoot reconciliation. Moreover, it is guaranteed that the Cluster resource exists before the Extension resource is created.

For an Extension controller it is crucial to maintain the Extension’s status correctly. At the end Gardener checks the status of each Extension and only reports a successful shoot reconciliation if the state of the last operation is Succeeded.

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  generation: 1
  name: example
  namespace: shoot--foo--bar
spec:
  type: example
status:
  lastOperation:
    state: Succeeded
  observedGeneration: 1

16 - Healthcheck Library

Health Check Library

Goal

Typically an extension reconciles a specific resource (Custom Resource Definitions (CRDs)) and creates/modifies resources in the cluster (via helm, managed resources, kubectl, …). We call these API Objects ‘dependent objects’ - as they are bound to the lifecycle of the extension.

The goal of this library is to enable extensions to setup health checks for their ‘dependent objects’ with minimal effort.

Usage

The library provides a generic controller with the ability to register any resource that satisfies the extension object interface. An example is the Worker CRD.

Health check functions for commonly used dependent objects can be reused and registered with the controller, such as:

  • Deployment
  • DaemonSet
  • StatefulSet
  • ManagedResource (Gardener specific)

See below example taken from the provider-aws.

health.DefaultRegisterExtensionForHealthCheck(
               aws.Type,
               extensionsv1alpha1.SchemeGroupVersion.WithKind(extensionsv1alpha1.WorkerResource),
               func() runtime.Object { return &extensionsv1alpha1.Worker{} },
               mgr, // controller runtime manager
               opts, // options for the health check controller
               nil, // custom predicates
               map[extensionshealthcheckcontroller.HealthCheck]string{
                       general.CheckManagedResource(genericactuator.McmShootResourceName): string(gardencorev1beta1.ShootSystemComponentsHealthy),
                       general.CheckSeedDeployment(aws.MachineControllerManagerName):      string(gardencorev1beta1.ShootEveryNodeReady),
                       worker.SufficientNodesAvailable():                                  string(gardencorev1beta1.ShootEveryNodeReady),
               })

This creates a health check controller that reconciles the extensions.gardener.cloud/v1alpha1.Worker resource with the spec.type ‘aws’. Three health check functions are registered that are executed during reconciliation. Each health check is mapped to a single HealthConditionType that results in conditions with the same condition.type (see below). To contribute to the Shoot’s health, the following can be used: SystemComponentsHealthy, EveryNodeReady, ControlPlaneHealthy. The Gardener/Gardenlet checks each extension for conditions matching these types. However extensions are free to choose any HealthConditionType. More information can be found here.

A health check has to satisfy below interface. You can find implementation examples here.

type HealthCheck interface {
    // Check is the function that executes the actual health check
    Check(context.Context, types.NamespacedName) (*SingleCheckResult, error)
    // InjectSeedClient injects the seed client
    InjectSeedClient(client.Client)
    // InjectShootClient injects the shoot client
    InjectShootClient(client.Client)
    // SetLoggerSuffix injects the logger
    SetLoggerSuffix(string, string)
    // DeepCopy clones the healthCheck
    DeepCopy() HealthCheck
}

The health check controller regularly (default: 30s) reconciles the extension resource and executes the registered health checks for the dependent objects. As a result, the controller writes condition(s) to the status of the extension containing the health check result. In our example, two checks are mapped to ShootEveryNodeReady and one to ShootSystemComponentsHealthy, leading to conditions with two distinct HealthConditionTypes (condition.type)

status:
  conditions:
    - lastTransitionTime: "20XX-10-28T08:17:21Z"
      lastUpdateTime: "20XX-11-28T08:17:21Z"
      message: (1/1) Health checks successful
      reason: HealthCheckSuccessful
      status: "True"
      type: SystemComponentsHealthy
    - lastTransitionTime: "20XX-10-28T08:17:21Z"
      lastUpdateTime: "20XX-11-28T08:17:21Z"
      message: (2/2) Health checks successful
      reason: HealthCheckSuccessful
      status: "True"
      type: EveryNodeReady

Please note that there are four statuses: True, False, Unknown, and Progressing.

  • True should be used for successful health checks.
  • False should be used for unsuccessful/failing health checks.
  • Unknown should be used when there was an error trying to determine the health status.
  • Progressing should be used to indicate that the health status did not succeed but for expected reasons (e.g., a cluster scale up/down could make the standard health check fail because something is wrong with the Machines, however, it’s actually an expected situation and known to be completed within a few minutes.)

Health checks that report Progressing should also provide a timeout after which this “progressing situation” is expected to be completed. The health check library will automatically transition the status to False if the timeout was exceeded.

Additional Considerations

It is up to the extension to decide how to conduct health checks, though it is recommended to make use of the build-in health check functionality of managed-resources for trivial checks. By deploying the depending resources via managed resources, the gardener resource manager conducts basic checks for different API objects out-of-the-box (e.g Deployments, DaemonSets, …) - and writes health conditions. In turn, the library contains a health check function to gather the health information from managed resources.

More sophisticated health checks should be implemented by the extension controller itself (implementing the HealthCheck interface).

17 - Infrastructure

Contract: Infrastructure resource

Every Kubernetes cluster requires some low-level infrastructure to be setup in order to work properly. Examples for that are networks, routing entries, security groups, IAM roles, etc. Before introducing the Infrastructure extension resource Gardener was using Terraform in order to create and manage these provider-specific resources (e.g., see here). Now, Gardener commissions an external, provider-specific controller to take over this task.

Which infrastructure resources are required?

Unfortunately, there is no general answer to this question as it is highly provider specific. Consider the above mentioned resources, i.e. VPC, subnets, route tables, security groups, IAM roles, SSH key pairs. Most of the resources are required in order to create VMs (the shoot cluster worker nodes), load balancers, and volumes.

What needs to be implemented to support a new infrastructure provider?

As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
spec:
  type: azure
  region: eu-west-1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  providerConfig:
    apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureConfig
    resourceGroup:
      name: mygroup
    networks:
      vnet: # specify either 'name' or 'cidr'
      # name: my-vnet
        cidr: 10.250.0.0/16
      workers: 10.250.0.0/19

The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used to create the needed resources. However, the most important section is the .spec.providerConfig. It contains an embedded declaration of the provider specific configuration for the infrastructure (that cannot be known by Gardener itself). You are responsible for designing how this configuration looks like. Gardener does not evaluate it but just copies this part from what has been provided by the end-user in the Shoot resource.

After your controller has created the required resources in your provider’s infrastructure it needs to generate an output that can be used by other controllers in subsequent steps. An example for that is the Worker extension resource controller. It is responsible for creating virtual machines (shoot worker nodes) in this prepared infrastructure. Everything that it needs to know in order to do that (e.g., the network IDs, security group names, etc. (again: provider-specific)) needs to be provided as output in the Infrastructure resource:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
spec:
  ...
status:
  lastOperation: ...
  providerStatus:
    apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureStatus
    resourceGroup:
      name: mygroup
    networks:
      vnet:
        name: my-vnet
      subnets:
      - purpose: nodes
        name: my-subnet
    availabilitySets:
    - purpose: nodes
      id: av-set-id
      name: av-set-name
    routeTables:
    - purpose: nodes
      name: route-table-name
    securityGroups:
    - purpose: nodes
      name: sec-group-name

In order to support a new infrastructure provider you need to write a controller that watches all Infrastructures with .spec.type=<my-provider-name>. You can take a look at the below referenced example implementation for the Azure provider.

Dynamic nodes network for shoot clusters

Some environments do not allow end-users to statically define a CIDR for the network that shall be used for the shoot worker nodes. In these cases it is possible for the extension controllers to dynamically provision a network for the nodes (as part of their reconciliation loops), and to provide the CIDR in the status of the Infrastructure resource:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
spec:
  ...
status:
  lastOperation: ...
  providerStatus: ...
  nodesCIDR: 10.250.0.0/16

Gardener will pick this nodesCIDR and use it to configure the VPN components to establish network connectivity between the control plane and the worker nodes. If the Shoot resource already specifies a nodes CIDR in .spec.networking.nodes and the extension controller provides also a value in .status.nodesCIDR in the Infrastructure resource then the latter one will always be considered with higher priority by Gardener.

Non-provider specific information required for infrastructure creation

Some providers might require further information that is not provider specific but already part of the shoot resource. One example for this is the GCP infrastructure controller which needs the pod and the service network of the cluster in order to prepare and configure the infrastructure correctly. As Gardener cannot know which information is required by providers it simply mirrors the Shoot, Seed, and CloudProfile resources into the seed. They are part of the Cluster extension resource and can be used to extract information that is not part of the Infrastructure resource itself.

Implementation details

Actuator interface

Most existing infrastructure controller implementations follow a common pattern where a generic Reconciler delegates to an Actuator interface that contains the methods Reconcile, Delete, Migrate, and Restore. These methods are called by the generic Reconciler for the respective operations, and should be implemented by the extension according to the contract described here and the migration guidelines.

ConfigValidator interface

For infrastructure controllers, the generic Reconciler also delegates to a ConfigValidator interface that contains a single Validate method. This method is called by the generic Reconciler at the beginning of every reconciliation, and can be implemented by the extension to validate the .spec.providerConfig part of the Infrastructure resource with the respective cloud provider, typically the existence and validity of cloud provider resources such as AWS VPCs or GCP Cloud NAT IPs.

The Validate method returns a list of errors. If this list is non-empty, the generic Reconciler will fail with an error. This error will have the error code ERR_CONFIGURATION_PROBLEM, unless there is at least one error in the list that has its ErrorType field set to field.ErrorTypeInternal.

References and additional resources

18 - Logging And Monitoring

Logging and Monitoring for Extensions

Gardener provides an integrated logging and monitoring stack for alerting, monitoring and troubleshooting of its managed components by operators or end users. For further information how to make use of it in these roles, refer to the corresponding guides for exploring logs and for monitoring with Grafana.

The components that constitute the logging and monitoring stack are managed by Gardener. By default, it deploys Prometheus, Alertmanager and Grafana into the garden namespace of all seed clusters. If the Logging feature gate in the gardenlet configuration is enabled, it will deploy fluent-bit and Loki in the garden namespace too.

Each shoot namespace hosts managed logging and monitoring components. As part of the shoot reconciliation flow, Gardener deploys a shoot-specific Prometheus, Grafana and, if configured, an Alertmanager into the shoot namespace, next to the other control plane components. If the Logging feature gate is enabled and the shoot purpose is not testing, it deploys a shoot-specific Loki in the shoot namespace too.

The logging and monitoring stack is extensible by configuration. Gardener extensions can take advantage of that and contribute configurations encoded in ConfigMaps for their own, specific dashboards, alerts, log parsers and other supported assets and integrate with it. As with other Gardener resources, they will be continuously reconciled.

This guide is about the roles and extensibility options of the logging and monitoring stack components, and how to integrate extensions with:

Monitoring

The central Prometheus instance in the garden namespace fetches metrics and data from all seed cluster nodes and all seed cluster pods. It uses the federation concept to allow the shoot-specific instances to scrape only the metrics for the pods of the control plane they are responsible for. This mechanism allows to scrape the metrics for the nodes/pods once for the whole cluster, and to have them distributed afterwards.

The shoot-specific metrics are then made available to operators and users in the shoot Grafana, using the shoot Prometheus as data source.

Extension controllers might deploy components as part of their reconciliation next to the shoot’s control plane. Examples for this would be a cloud-controller-manager or CSI controller deployments. Extensions that want to have their managed control plane components integrated with monitoring can contribute their per-shoot configuration for scraping Prometheus metrics, Alertmanager alerts or Grafana dashboards.

Extensions monitoring integration

Before deploying the shoot-specific Prometheus instance, Gardener will read all ConfigMaps in the shoot namespace, which are labeled with extensions.gardener.cloud/configuration=monitoring. Such ConfigMaps may contain four fields in their data:

  • scrape_config: This field contains Prometheus scrape configuration for the component(s) and metrics that shall be scraped.
  • alerting_rules: This field contains Alertmanager rules for alerts that shall be raised.
  • (deprecated)dashboard_operators: This field contains a Grafana dashboard in JSON that is only relevant for Gardener operators.
  • (deprecated)dashboard_users: This field contains a Grafana dashboard in JSON that is only relevant for Gardener users (shoot owners).

Example: A ControlPlane controller deploying a cloud-controller-manager into the shoot namespace wants to integrate monitoring configuration for scraping metrics, alerting rules, dashboards and logging configuration for exposing logs to the end users.

apiVersion: v1
kind: ConfigMap
metadata:
  name: extension-controlplane-monitoring-ccm
  namespace: shoot--project--name
  labels:
    extensions.gardener.cloud/configuration: monitoring
data:
  scrape_config: |
    - job_name: cloud-controller-manager
      scheme: https
      tls_config:
        insecure_skip_verify: true
      authorization:
        type: Bearer
        credentials_file: /var/run/secrets/gardener.cloud/shoot/token/token
      honor_labels: false
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [shoot--project--name]
      relabel_configs:
      - source_labels:
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
        action: keep
        regex: cloud-controller-manager;metrics
      # common metrics
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [ __meta_kubernetes_pod_name ]
        target_label: pod
      metric_relabel_configs:
      - process_max_fds
      - process_open_fds    

  alerting_rules: |
    cloud-controller-manager.rules.yaml: |
      groups:
      - name: cloud-controller-manager.rules
        rules:
        - alert: CloudControllerManagerDown
          expr: absent(up{job="cloud-controller-manager"} == 1)
          for: 15m
          labels:
            service: cloud-controller-manager
            severity: critical
            type: seed
            visibility: all
          annotations:
            description: All infrastructure specific operations cannot be completed (e.g. creating load balancers or persistent volumes).
            summary: Cloud controller manager is down.    

Logging

In Kubernetes clusters, container logs are non-persistent and do not survive stopped and destroyed containers. Gardener addresses this problem for the components hosted in a seed cluster, by introducing its own managed logging solution. It is integrated with the Gardener monitoring stack to have all troubleshooting context in one place.

&ldquo;Cluster Logging Topology&rdquo;

Gardener logging consists of components in three roles - log collectors and forwarders, log persistency and exploration/consumption interfaces. All of them live in the seed clusters in multiple instances:

  • Logs are persisted by Loki instances deployed as StatefulSets - one per shoot namespace, if the Logging feature gate is enabled and the shoot purpose is not testing, and one in the garden namespace. The shoot instances store logs from the control plane components hosted there. The garden Loki instance is responsible for logs from the rest of the seed namespaces - kube-system, garden extension-* and others.
  • Fluent-bit DaemonSets deployed on each seed node collect logs from it. A custom plugin takes care to distribute the collected log messages to the Loki instances that they are intended for. This allows to fetch the logs once for the whole cluster, and to distribute them afterwards.
  • Grafana is the UI component used to explore monitoring and log data together for easier troubleshooting and in context. Grafana instances are configured to use the coresponding Loki instances, sharing the same namespace, as data providers. There is one Grafana Deployment in the garden namespace and two Deployments per shoot namespace (one exposed to the end users and one for the operators).

Logs can be produced from various sources, such as containers or systemd, and in different formats. The fluent-bit design supports configurable data pipeline to address that problem. Gardener provides such configuration for logs produced by all its core managed components as a ConfigMap. Extensions can contribute their own, specific configurations as ConfigMaps too. See for example the logging configuration for the Gardener AWS provider extension. The Gardener reconciliation loop watches such resources and updates the fluent-bit agents dynamically.

Fluent-bit log parsers and filters

To integrate with Gardener logging, extensions can and should specify how fluent-bit will handle the logs produced by the managed components that they contribute to Gardener. Normally, that would require to configure a parser for the specific logging format, if none of the available is applicable, and a filter defining how to apply it. For a complete reference for the configuration options, refer to fluent-bit’s documentation.

Note: At the moment only parser and filter configurations are supported.

To contribute its own configuration to the fluent-bit agents data pipelines, an extension must provide it as a ConfigMap labeled extensions.gardener.cloud/configuration=logging and deployed in the seed’s garden namespace. Unlike the monitoring stack, where configurations are deployed per shoot, here a single configuration ConfigMap is sufficient and it applies to all fluent-bit agents in the seed. Its data field can have the following properties:

  • filter-kubernetes.conf - configuration for data pipeline filters
  • parser.conf - configuration for data pipeline parsers

Note: Take care to provide the correct data pipeline elements in the coresponding data field and not to mix them.

Example: Logging configuration for provider-specific (OpenStack) worker controller deploying a machine-controller-manager component into a shoot namespace that reuses the kubeapiserverParser defined in fluent-bit-configmap.yaml to parse the component logs

apiVersion: v1
kind: ConfigMap
metadata:
  name: gardener-extension-provider-openstack-logging-config
  namespace: garden
  labels:
    extensions.gardener.cloud/configuration: logging
data:
  filter-kubernetes.conf: |
    [FILTER]
        Name                parser
        Match               kubernetes.machine-controller-manager*openstack-machine-controller-manager*
        Key_Name            log
        Parser              kubeapiserverParser
        Reserve_Data        True    
How to expose logs to the users

To expose logs from extension components to the users, the extension owners have to specify a modify filter which will add __gardener_multitenant_id__=operator;user entry to the log record. This entry contains all of the tenants, which have to receive this log. The tenants are semicolon separated. This specific dedicated entry will be extracted and removed from the log in the gardener fluent-bit-to-loki output plugin and added to the label set of that log. Then it will be parsed and removed from the label set. Any whitespace will be truncated during the parsing. The extension components logs can be found in Controlplane Logs Dashboard Grafana dashboard.

Example: In this example we configure fluent-bit when it finds a log with field tag, which match the Condition, to add __gardener_multitenant_id__=operator;user into the log record.

apiVersion: v1
kind: ConfigMap
metadata:
  name: gardener-extension-provider-aws-logging-config
  namespace: garden
  labels:
    extensions.gardener.cloud/configuration: logging
data:
  filter-kubernetes.conf: |
    [FILTER]
        Name          modify
        Match         kubernetes.*
        Condition     Key_value_matches tag ^kubernetes\.var\.log\.containers\.(cloud-controller-manager-.+?_.+?_aws-cloud-controller-manager|csi-driver-controller-.+?_.+?_aws-csi)_.+?
        Add           __gardener_multitenant_id__ operator;user    

In this case we have predefined filter which copies the log’s tag into the log record under the tag field. The tag consists of the container logs directories path, plus <pod_name>_<shoot_controlplane_namespace>_<container_name>_<container_id>, so here we say:

When you see a record from pod cloud-controller-manager and some of the aws-cloud-controller-manager, csi-driver-controller or aws-csi containers add __gardener_multitenant_id__ key with operator;user value into the log record.

Further details how to define parsers and use them with examples can be found in the following guide.

Grafana

The three types of Grafana instances found in a seed cluster are configured to expose logs of different origin in their dashboards:

  • Garden Grafana dashboards expose logs from non-shoot namespaces of the seed clusters
  • Shoot User Grafana dashboards expose a subset of the logs shown to operators
    • Kube Apiserver
    • Kube Controller Manager
    • Kube Scheduler
    • Cluster Autoscaler
    • VPA components
  • Shoot Operator Grafana dashboards expose logs from the shoot cluster namespace where they belong

If the type of logs exposed in the Grafana instances needs to be changed, it is necessary to update the coresponding instance dashboard configurations.

Tips

  • Be careful to match exactly the log names that you need for a particular parser in your filters configuration. The regular expression you will supply will match names in the form kubernetes.pod_name.<metadata>.container_name. If there are extensions with the same container and pod names, they will all match the same parser in a filter. That may be a desired effect, if they all share the same log format. But it will be a problem if they don’t. To solve it, either the pod or container names must be unique, and the regular expression in the filter has to match that unique pattern. A recommended approach is to prefix containers with the extension name and tune the regular expression to match it. For example, using myextension-container as container name, and a regular expression kubernetes.mypod.*myextension-container will guarantee match of the right log name. Make sure that the regular expression does not match more than you expect. For example, kubernetes.systemd.*systemd.* will match both systemd-service and systemd-monitor-service. You will want to be as specific as possible.
  • It’s a good idea to put the logging configuration into the Helm chart that also deploys the extension controller, while the monitoring configuration can be part of the Helm chart/deployment routine that deploys the component managed by the controller.

References and additional resources

19 - Managedresources

Deploy resources to the Shoot cluster

We have introduced a component called gardener-resource-manager that is deployed as part of every shoot control plane in the seed. One of its tasks is to manage CRDs, so called ManagedResources. Managed resources contain Kubernetes resources that shall be created, reconciled, updated, and deleted by the gardener-resource-manager.

Extension controllers may create these ManagedResources in the shoot namespace if they need to create any resource in the shoot cluster itself, for example RBAC roles (or anything else).

Where can I find more examples and more information how to use ManagedResources?

Please take a look at the respective documentation.

20 - Migration

Control Plane Migration

Control Plane Migration is a new Gardener feature that has been recently implemented as proposed in GEP-7 Shoot Control Plane Migration. It should be properly supported by all extensions controllers. This document outlines some important points that extension maintainers should keep in mind to properly support migration in their extensions.

Overall Principles

The following principles should always be upheld:

  • All state maintained by the extension that is external from the seed cluster, for example infrastructure resources in a cloud provider, DNS entries, etc., should be kept during the migration. No such state should be deleted and then recreated, as this might cause disruption in the availability of the shoot cluster.
  • All Kubernetes resources maintained by the extension in the shoot cluster itself should also be kept during the migration. No such resources should be deleted and then recreated.

Migrate and Restore Operations

Two new operations have been introduced in Gardener. They can be specified as values of the gardener.cloud/operation annotation on an extension resource to indicate that an operation different from a normal reconcile should be performed by the corresponding extension controller:

  • The migrate operation is used to ask the extension controller in the source seed to stop reconciling extension resources (in case they are requeued due to errors) and perform cleanup activities, if such are required. These cleanup activities might involve removing finalizers on resources in the shoot namespace that have been previously created by the extension controller and deleting them without actually deleting any resources external to the seed cluster.
  • The restore operation is used to ask extension controller in the destination seed to restore any state saved in the extension resource status, before performing the actual reconciliation.

Unlike the reconcile operation, extension controllers must remove the gardener.cloud/operation annotation at the end of a successful reconciliation when the current operation is migrate or restore, not at the beginning of a reconciliation.

Cleaning-up Source Seed Resources

All resources in the source seed that have been created by an extension controller, for example secrets, config maps, managed resources, etc. should be properly cleaned up by the extension controller when the current operation is migrate. As mentioned above, such resources should be deleted without actually deleting any resources external to the seed cluster.

For many custom resources, for example MCM resources, the above requirement means in practice that any finalizers should be removed before deleting the resource, in addition to ensuring that the resource deletion is not reconciled by its respective controller if there is no finalizer. For managed resources, the above requirement means in practice that the spec.keepObjects field should be set to true before deleting the extension resource.

Here it is assumed that any resources that contain state needed by the extension controller can be safely deleted, since any such state has been saved as described in Saving and Restoring Extension States at the end of the last successful reconciliation.

Saving and Restoring Extension States

Some extension controllers create and maintain their own state when reconciling extension resources. For example, most infrastructure controllers use Terraform and maintain the terraform state in a special config map in the shoot namespace. This state must be properly migrated to the new seed cluster during control plane migration, so that subsequent reconciliations in the new seed could find and use it appropriately.

All extension controllers that require such state migration must save their state in the status.state field of their extension resource at the end of a successful reconciliation. They must also restore their state from that same field upon reconciling an extension resource when the current operation is restore, as specified by the gardener.cloud/operation annotation, before performing the actual reconciliation.

As an example, an infrastructure controller that uses Terraform must save the terraform state in the status.state field of the Infrastructure resource. An Infrastructure resource with a properly saved state might look as follows:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
spec:
  type: azure
  region: eu-west-1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  providerConfig:
    apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureConfig
    resourceGroup:
      name: mygroup
    ...
status:
  state: |
    {
      "version": 3,
      "terraform_version": "0.11.14",
      "serial": 2,
      "lineage": "3a1e2faa-e7b6-f5f0-5043-368dd8ea6c10",
      ...
    }    

Extension controllers that do not use a saved state and therefore do not require state migration could leave the status.state field as nil at the end of a successful reconciliation, and just perform a normal reconciliation when the current operation is restore.

In addition, extension controllers that use referenced resources (usually secrets) must also make sure that these resources are added to the status.resources field of their extension resource at the end of a successful reconciliation, so they could be properly migrated by Gardener to the destination seed.

Implementation Details

Migrate and Restore Actuator Methods

Most extension controller implementations follow a common pattern where a generic Reconciler implementation delegates to an Actuator interface that contains the methods Reconcile and Delete, provided by the extension. The two new methods Migrate and Restore have been added to all such Actuator interfaces, see the infrastructure Actuator interface as an example. These methods are called by the generic reconcilers for the migrate and restore operations respectively, and should be implemented by the extension according to the above guidelines.

Owner Checks

The so called “bad case” scenario for control plane migration proposed in GEP-17 introduced the requirement for extension controllers to check whether they are currently operating in the source or destination seed during reconciliations to avoid the case in which controllers from different seeds can operate on the same IaaS resources (split brain scenario). To that end a special “owner checking” mechanism has been added to the Reconciler implementations of all extension controllers. For an example usage of this mechanism see the infrastructure Reconciler implementation. The purpose of the owner check is to interrupt reconciliations of extension controllers that do not operate in the seed that is currently configured to host the shoot’s control plane. Note that Migrate operations must not be interrupted, as they are required to clean up kubernetes resources left in the shoot’s control plane namespace and do not act on IaaS resources.

Extension Controllers Based on Generic Actuators

In practice, the implementation of many extension controllers (for example, the controlplane and worker controllers in most provider extensions) are based on a generic Actuator implementation that only delegates to extension methods for behavior that is truly provider specific. In all such cases, the Migrate and Restore methods have already been implemented properly in the generic actuators and there is nothing more to do in the extension itself.

In some rare cases, extension controllers based on a generic actuator might still introduce a custom Actuator implementation to override some of the generic actuator methods in order to enhance or change their behavior in a certain way. In such cases, the Migrate and Restore methods might need to be overridden as well, see the Azure controlplane controller as an example.

Extension Controllers Not Based on Generic Actuators

The implementation of some extension controllers (for example, the infrastructure controllers in all provider extensions) are not based on a generic Actuator implementation. Such extension controllers must always provide a proper implementation of the Migrate and Restore methods according to the above guidelines, see the AWS infrastructure controller as an example. In practice this might result in code duplication between the different extensions, since the Migrate and Restore code is usually not provider or OS-specific.

21 - Network

Gardener Network Extension

Gardener is an open-source project that provides a nested user model. Basically, there are two types of services provided by Gardener to its users:

  • Managed: end-users only request a Kubernetes cluster (Clusters-as-a-Service)
  • Hosted: operators utilize Gardener to provide their own managed version of Kubernetes (Cluster-Provisioner-as-a-service)

Whether an operator or an end-user, it makes sense to provide choice. For example, for an end-user it might make sense to choose a network-plugin that would support enforcing network policies (some plugins does not come with network-policy support by default). For operators however, choice only matters for delegation purposes i.e., when providing an own managed-service, it becomes important to also provide choice over which network-plugins to use.

Furthermore, Gardener provisions clusters on different cloud-providers with different networking requirements. For example, Azure does not support Calico Networking [1], this leads to the introduction of manual exceptions in static add-on charts which is error prone and can lead to failures during upgrades.

Finally, every provider is different, and thus the network always needs to adapt to the infrastructure needs to provide better performance. Consistency does not necessarily lie in the implementation but in the interface.

Motivation

Prior to the Network Extensibility concept, Gardener followed a mono network-plugin support model (i.e., Calico). Although this seemed to be the easier approach, it did not completely reflect the real use-case. The goal of the Gardener Network Extensions is to support different network plugins, therefore, the specification for the network resource won’t be fixed and will be customized based on the underlying network plugin.

To do so, a ProviderConfig field in the spec will be provided where each plugin will define. Below is an example for how to deploy Calico as the cluster network plugin.

The Network Extensions Resource

Here is what a typical Network resource would look-like:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Network
metadata:
  name: my-network
spec:
  podCIDR: 100.244.0.0/16
  serviceCIDR: 100.32.0.0/13
  type: calico
  providerConfig:
    apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
    kind: NetworkConfig
    backend: bird
    ipam:
      cidr: usePodCIDR
      type: host-local

The above resources is divided into two parts (more information can be found here):

  • global configuration (e.g., podCIDR, serviceCIDR, and type)
  • provider specific config (e.g., for calico we can choose to configure a bird backend)

Note: certain cloud-provider extensions might have webhooks that would modify the network-resource to fit into their network specific context. As previously mentioned, Azure does not support IPIP, as a result, the Azure provider extension implements a webhook to mutate the backend and set it to None instead of bird.

Supporting a new Network Extension Provider

To add support for another networking provider (e.g., weave, Cilium, Flannel, etc.) a network extension controller needs to be implemented which would optionally have its own custom configuration specified in the spec.providerConfig in the Network resource. For example, if support for a network plugin named gardenet is required, the following Network resource would be created:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Network
metadata:
  name: my-network
spec:
  podCIDR: 100.244.0.0/16
  serviceCIDR: 100.32.0.0/13
  type: gardenet
  providerConfig:
    apiVersion: gardenet.networking.extensions.gardener.cloud/v1alpha1
    kind: NetworkConfig
    gardenetCustomConfigField: <value>
    ipam:
      cidr: usePodCIDR
      type: host-local

Once applied, the presumably implemented Gardenet extension controller, would pick the configuration up, parse the providerConfig and create the necessary resources in the shoot.

For additional reference, please have a look at the networking-calico provider extension, which provides more information on how to configure the necessary charts as well as the actuators required to reconcile networking inside the Shoot cluster to the desired state.

Supporting kube-proxy less Service Routing

Some networking extensions support service routing without the kube-proxy component. This is why Gardener supports disabling of kube-proxy for service routing by setting .spec.kubernetes.kubeproxy.enabled to false in the Shoot specification. The implicit contract of the flag is: If kube-proxy is disabled then the networking extension is responsible for the service routing. The networking extensions need to handle this twofold:

  1. During the reconciliation of the networking resources, the extension needs to check whether kube-proxy takes care of the service routing or the networking extension itself should handle it. In case the networking extension should be responsible according to .spec.kubernetes.kubeproxy.enabled (but is unable to perform the service routing) it should raise an error during the reconciliation. If the networking extension should handle the service routing it may reconfigure itself accordingly.
  2. (optional) In case the networking extension does not support taking over the service routing (in some scenarios), it is recommended to also provide a validating admission webhook to reject corresponding changes early on. The validation may take the current operating mode of the networking extension into consideration.

References

[1] Azure support for Calico Networking

22 - Operatingsystemconfig

Contract: OperatingSystemConfig resource

Gardener uses the machine API and leverages the functionalities of the machine-controller-manager (MCM) in order to manage the worker nodes of a shoot cluster. The machine-controller-manager itself simply takes a reference to an OS-image and (optionally) some user-data (a script or configuration that is executed when a VM is bootstrapped), and forwards both to the provider’s API when creating VMs. MCM does not have any restrictions regarding supported operating systems as it does not modify or influence the machine’s configuration in any way - it just creates/deletes machines with the provided metadata.

Consequently, Gardener needs to provide this information when interacting with the machine-controller-manager. This means that basically every operating system is possible to be used as long as there is some implementation that generates the OS-specific configuration in order to provision/bootstrap the machines.

⚠️ Currently, there are a few requirements:

  1. The operating system must have built-in Docker support.
  2. The operating system must have systemd support.
  3. The operating system must have wget pre-installed.
  4. The operating system must have jq pre-installed.

The reasons for that will become evident later.

What does the user-data bootstrapping the machines contain?

Gardener installs a few components onto every worker machine in order to allow it to join the shoot cluster. There is the kubelet process, some scripts for continuously checking the health of kubelet and docker, but also configuration for log rotation, CA certificates, etc. The complete configuration you can find here. We are calling this the “original” user-data.

How does Gardener bootstrap the machines?

Usually, you would submit all the components you want to install onto the machine as part of the user-data during creation time. However, some providers do have a size limitation (like ~16KB) for that user-data. That’s why we do not send the “original” user-data to the machine-controller-manager (who forwards it then to the provider’s API). Instead, we only send a small script that downloads the “original” data and applies it on the machine directly. This way we can extend the “original” user-data without any size restrictions - plus we can modify it without the necessity of re-creating the machine (because we run a script that downloads and updates it continuously).

The high-level flow is as follows:

  1. For every worker pool X in the Shoot specification, Gardener creates a Secret named cloud-config-<X> in the kube-system namespace of the shoot cluster. The secret contains the “original” user-data.

  2. Gardener generates a kubeconfig with minimal permissions just allowing reading these secrets. It is used by the downloader script later.

  3. Gardener provides the downloader script, the kubeconfig, and the machine image stated in the Shoot specification to the machine-controller-manager.

  4. Based on this information the machine-controller-manager creates the VM.

  5. After the VM has been provisioned the downloader script starts and fetches the appropriate Secret for its worker pool (containing the “original” user-data) and applies it.

Detailed bootstrap flow with a worker generated bootstrap-token

With gardener v1.23 a file with the content <<BOOTSTRAP_TOKEN>> is added to the cloud-config-<worker-group>-downloader OperatingSystemConfig (part of step 2 in the graphic below). Via the OS extension the new file (with its content in clear-text) gets passed to the corresponding Worker resource.

The Worker controller has to guarantee that:

  • a bootstrap token is created.
  • the <<BOOTSTRAP_TOKEN>> in the user data is replaced by the generated token. One implementation of that is depicted in the picture where the machine-controller-manager creates a temporary token and replaces the placeholder.

As part of the user-data the bootstrap-token is placed on the newly created VM under a defined path. The cloud-config-script will then refer to the file path of the added bootstrap token in the kubelet-bootstrap script.

Bootstrap flow with shortlived bootstrapTokens

Compatibility matrix for node bootstrap-token

With Gardener v1.23, we replaced the long-valid bootstrap-token shared between nodes with a short-lived token unique for each node, ref: #3898.

❗ When updating to Gardener version >=1.35 the old bootstrap-token will be removed. You are required to update your extensions to the following versions when updating Gardener:

ExtensionVersionRelease DatePull Request
os-gardenlinuxv0.9.02 Julhttps://github.com/gardener/gardener-extension-os-gardenlinux/pull/29
os-suse-chostv1.11.02 Julhttps://github.com/gardener/gardener-extension-os-suse-chost/pull/41
os-ubuntuv1.11.02 Julhttps://github.com/gardener/gardener-extension-os-ubuntu/pull/42
os-flatcarv1.7.02 Julhttps://github.com/gardener/gardener-extension-os-coreos/pull/24
infrastructure-provider using Machine Controller Managervaries~ end of 2019https://github.com/gardener/machine-controller-manager/pull/351

⚠️ If you run a provider extension that does not use Machine Controller Manager (MCM) you need to implement the functionality of creating a temporary bootstrap-token before updating your Gardener version to v1.35 or higher. All provider extensions maintained in https://github.com/gardener/ use MCM.

How does Gardener update the user-data on already existing machines?

With ongoing development and new releases of Gardener some new components could be required to get installed onto every shoot worker VM, or existing components need to be changed. Gardener achieves that by simply updating the user-data inside the Secrets mentioned above (step 1). The downloader script is continuously (every 30s) reading the secret’s content (which might include an updated user-data) and storing it onto the disk. In order to re-apply the (new) downloaded data the secrets do not only contain the “original” user-data but also another short script (called “execution” script). This script checks whether the downloaded user-data differs from the one previously applied - and if required - re-applies it. After that it uses systemctl to restart the installed systemd units.

With the help of the execution script Gardener can centrally control how machines are updated without the need of OS providers to (re-)implement that logic. However, as stated in the mentioned requirements above, the execution script assumes existence of Docker and systemd.

What needs to be implemented to support a new operating system?

As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: OperatingSystemConfig
metadata:
  name: pool-01-original
  namespace: default
spec:
  type: <my-operating-system>
  purpose: reconcile
  reloadConfigFilePath: /var/lib/cloud-config-downloader/cloud-config
  units:
  - name: docker.service
    dropIns:
    - name: 10-docker-opts.conf
      content: |
        [Service]
        Environment="DOCKER_OPTS=--log-opt max-size=60m --log-opt max-file=3"        
  - name: docker-monitor.service
    command: start
    enable: true
    content: |
      [Unit]
      Description=Docker-monitor daemon
      After=kubelet.service
      [Install]
      WantedBy=multi-user.target
      [Service]
      Restart=always
      EnvironmentFile=/etc/environment
      ExecStart=/opt/bin/health-monitor docker      
  files:
  - path: /var/lib/kubelet/ca.crt
    permissions: 0644
    encoding: b64
    content:
      secretRef:
        name: default-token-5dtjz
        dataKey: token
  - path: /etc/sysctl.d/99-k8s-general.conf
    permissions: 0644
    content:
      inline:
        data: |
          # A higher vm.max_map_count is great for elasticsearch, mongo, or other mmap users
          # See https://github.com/kubernetes/kops/issues/1340
          vm.max_map_count = 135217728          

In order to support a new operating system you need to write a controller that watches all OperatingSystemConfigs with .spec.type=<my-operating-system>. For those it shall generate a configuration blob that fits to your operating system. For example, a CoreOS controller might generate a CoreOS cloud-config or Ignition, SLES might generate cloud-init, and others might simply generate a bash script translating the .spec.units into systemd units, and .spec.files into real files on the disk.

OperatingSystemConfigs can have two purposes which can be used (or ignored) by the extension controllers: either provision or reconcile.

  • The provision purpose is used by Gardener for the user-data that it later passes to the machine-controller-manager (and then to the provider’s API) when creating new VMs. It contains the downloader unit.
  • The reconcile purpose contains the “original” user-data (that is then stored in Secrets in the shoot’s kube-system namespace (see step 1). This is downloaded and applies late (see step 5).

As described above, the “original” user-data must be re-applicable to allow in-place updates. The way how this is done is specific to the generated operating system config (e.g., for CoreOS cloud-init the command is /usr/bin/coreos-cloudinit --from-file=<path>, whereas SLES would run cloud-init --file <path> single -n write_files --frequency=once). Consequently, besides the generated OS config, the extension controller must also provide a command for re-application an updated version of the user-data. As visible in the mentioned examples the command requires a path to the user-data file. Gardener will provide the path to the file in the OperatingSystemConfigs .spec.reloadConfigFilePath field (only if .spec.purpose=reconcile). As soon as Gardener detects that the user data has changed it will reload the systemd daemon and restart all the units provided in the .status.units[] list (see below example). The same logic applies during the very first application of the whole configuration.

After generation extension controllers are asked to store their OS config inside a Secret (as it might contain confidential data) in the same namespace. The secret’s .data could look like this:

apiVersion: v1
kind: Secret
metadata:
  name: osc-result-pool-01-original
  namespace: default
  ownerReferences:
  - apiVersion: extensions.gardener.cloud/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: OperatingSystemConfig
    name: pool-01-original
    uid: 99c0c5ca-19b9-11e9-9ebd-d67077b40f82
data:
  cloud_config: base64(generated-user-data)

Finally, the secret’s metadata, the OS-specific command to re-apply the configuration, and the list of systemd units that shall be considered to be restarted if an updated version of the user-data is re-applied must be provided in the OperatingSystemConfig’s .status field:

...
status:
  cloudConfig:
    secretRef:
      name: osc-result-pool-01-original
      namespace: default
  command: /usr/bin/coreos-cloudinit --from-file=/var/lib/cloud-config-downloader/cloud-config
  lastOperation:
    description: Successfully generated cloud config
    lastUpdateTime: "2019-01-23T07:45:23Z"
    progress: 100
    state: Succeeded
    type: Reconcile
  observedGeneration: 5
  units:
  - docker-monitor.service

(The .status.command field is optional and must only be provided if .spec.reloadConfigFilePath exists).

Once the .status indicates that the extension controller finished reconciling Gardener will continue with the next step of the shoot reconciliation flow.

CRI Support

Gardener supports specifying Container Runtime Interface (CRI) configuration in the OperatingSystemConfig resource. If the .spec.cri section exists then the name property is mandatory. The only supported values for cri.name at the moment are: containerd and docker, which uses the in-tree dockershim. For example:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: OperatingSystemConfig
metadata:
  name: pool-01-original
  namespace: default
spec:
  type: <my-operating-system>
  purpose: reconcile
  reloadConfigFilePath: /var/lib/cloud-config-downloader/cloud-config
  cri:
    name: containerd
...

To support ContainerD, an OS extension must :

  1. The operating system must have built-in ContainerD and the Client CLI
  2. ContainerD must listen on its default socket path: unix:///run/containerd/containerd.sock
  3. ContainerD must be configured to work with the default configuration file in: /etc/containerd/config.toml (Created by Gardener).

If CRI configurations are not supported it is recommended create a validating webhook running in the garden cluster that prevents specifying the .spec.providers.workers[].cri section in the Shoot objects.

References and additional resources

23 - Overview

Extensibility overview

Initially, everything was developed in-tree in the Gardener project. All cloud providers and the configuration for all the supported operating systems were released together with the Gardener core itself. But as the project grew, it got more and more difficult to add new providers and maintain the existing code base. As a consequence and in order to become agile and flexible again, we proposed GEP-1 (Gardener Enhancement Proposal). The document describes an out-of-tree extension architecture that keeps the Gardener core logic independent of provider-specific knowledge (similar to what Kubernetes has achieved with out-of-tree cloud providers or with CSI volume plugins).

Basic concepts

Gardener keeps running in the “garden cluster” and implements the core logic of shoot cluster reconciliation/deletion. Extensions are Kubernetes controllers themselves (like Gardener) and run in the seed clusters. As usual, we try to use Kubernetes wherever applicable. We rely on Kubernetes extension concepts in order to enable extensibility for Gardener. The main ideas of GEP-1 are the following:

  1. During the shoot reconciliation process Gardener will write CRDs into the seed cluster that are watched and managed by the extension controllers. They will reconcile (based on the .spec) and report whether everything went well or errors occurred in the CRD’s .status field.

  2. Gardener keeps deploying the provider-independent control plane components (etcd, kube-apiserver, etc.). However, some of these components might still need little customization by providers, e.g., additional configuration, flags, etc. In this case, the extension controllers register webhooks in order to manipulate the manifests.

Example 1:

Gardener creates a new AWS shoot cluster and requires the preparation of infrastructure in order to proceed (networks, security groups, etc.). It writes the following CRD into the seed cluster:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--core--aws-01
spec:
  type: aws
  providerConfig:
    apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureConfig
    networks:
      vpc:
        cidr: 10.250.0.0/16
      internal:
      - 10.250.112.0/22
      public:
      - 10.250.96.0/22
      workers:
      - 10.250.0.0/19
    zones:
    - eu-west-1a
  dns:
    apiserver: api.aws-01.core.example.com
  region: eu-west-1
  secretRef:
    name: my-aws-credentials
  sshPublicKey: |
        base64(key)

Please note that the .spec.providerConfig is a raw blob and not evaluated or known in any way by Gardener. Instead, it was specified by the user (in the Shoot resource) and just “forwarded” to the extension controller. Only the AWS controller understands this configuration and will now start provisioning/reconciling the infrastructure. It reports in the .status field the result:

status:
  observedGeneration: ...
  state: ...
  lastError: ..
  lastOperation: ...
  providerStatus:
    apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureStatus
    vpc:
      id: vpc-1234
      subnets:
      - id: subnet-acbd1234
        name: workers
        zone: eu-west-1
      securityGroups:
      - id: sg-xyz12345
        name: workers
    iam:
      nodesRoleARN: <some-arn>
      instanceProfileName: foo
    ec2:
      keyName: bar

Gardener waits until the .status.lastOperation/.status.lastError indicates that the operation reached a final state and either continuous with the next step or stops and reports the potential error. The extension-specific output in .status.providerStatus is - similar to .spec.providerConfig - not evaluated and simply forwarded to CRDs in subsequent steps.

Example 2:

Gardener deploys the control plane components into the seed cluster, e.g. the kube-controller-manager deployment with the following flags:

apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    spec:
      containers:
      - command:
        - /usr/local/bin/kube-controller-manager
        - --allocate-node-cidrs=true
        - --attach-detach-reconcile-sync-period=1m0s
        - --controllers=*,bootstrapsigner,tokencleaner
        - --cluster-cidr=100.96.0.0/11
        - --cluster-name=shoot--core--aws-01
        - --cluster-signing-cert-file=/srv/kubernetes/ca/ca.crt
        - --cluster-signing-key-file=/srv/kubernetes/ca/ca.key
        - --concurrent-deployment-syncs=10
        - --concurrent-replicaset-syncs=10
...

The AWS controller requires some additional flags in order to make the cluster functional. It needs to provide a Kubernetes cloud-config and also some cloud-specific flags. Consequently, it registers a MutatingWebhookConfiguration on Deployments and adds these flags to the container:

        - --cloud-provider=external
        - --external-cloud-volume-plugin=aws
        - --cloud-config=/etc/kubernetes/cloudprovider/cloudprovider.conf

Of course, it would have needed to create a ConfigMap containing the cloud config and to add the proper volume and volumeMounts to the manifest as well.

(Please note for this special example: The Kubernetes community is also working on making the kube-controller-manager provider-independent. However, there will most probably be still components other than the kube-controller-manager which need to be adapted by extensions.)

If you are interested in writing an extension, or generally in digging deeper to find out the nitty-gritty details of the extension concepts please read GEP-1. We are truly looking forward to your feedback!

Current status

Meanwhile, the out-of-tree extension architecture of Gardener is in place and has been productively validated. We are tracking all internal and external extensions of Gardener in the repo: Gardener Extensions Library.

24 - Project Roles

Extending project roles

The Project resource allows to specify a list of roles for every member (.spec.members[*].roles). There are a few standard roles defined by Gardener itself. Please consult this document for further information.

However, extension controllers running in the garden cluster may also create CustomResourceDefinitions that project members might be able to CRUD. For this purpose Gardener also allows to specify extension roles.

An extension role is prefixed with extension:, e.g.

apiVersion: core.gardener.cloud/v1beta1
kind: Project
metadata:
  name: dev
spec:
  members:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: alice.doe@example.com
    role: admin
    roles:
    - owner
    - extension:foo

The project controller will, for every extension role, create a ClusterRole with name name: gardener.cloud:extension:project:<projectName>:<roleName>, i.e., for above example: name: gardener.cloud:extension:project:dev:foo. This ClusterRole aggregates other ClusterRoles that are labeled with rbac.gardener.cloud/aggregate-to-extension-role=foo which might be created by extension controllers.

Extension that might want to contribute to the core admin or viewer roles can use the labels rbac.gardener.cloud/aggregate-to-project-member=true or rbac.gardener.cloud/aggregate-to-project-viewer=true, respectively.

Please note that the names of the extension roles are restricted to 20 characters!

Moreover, the project controller will also create a corresponding RoleBinding with the same name in the project namespace. It will automatically assign all members that are assigned to this extension role.

25 - Provider Local

Local Provider Extension

The “local provider” extension is used to allow the usage of seed and shoot clusters which run entirely locally without any real infrastructure or cloud provider involved. It implements Gardener’s extension contract (GEP-1) and thus comprises several controllers and webhooks acting on resources in seed and shoot clusters.

The code is maintained in pkg/provider-local.

Motivation

The motivation for maintaining such extension is the following:

  • 🛡 Output Qualification: Run fast and cost-efficient end-to-end tests, locally and in CI systems (increased confidence ⛑ before merging pull requests)
  • ⚙️ Development Experience: Develop Gardener entirely on a local machine without any external resources involved (improved costs 💰 and productivity 🚀)
  • 🤝 Open Source: Quick and easy setup for a first evaluation of Gardener and a good basis for first contributions

Current Limitations

The following enlists the current limitations of the implementation. Please note that all of them are no technical limitations/blockers but simply advanced scenarios that we haven’t had invested yet into.

  1. Shoot clusters can only have multiple nodes, but inter-pod communication for pods on different nodes does not work.

    We are using the networking-calico extension for the CNI plugin in shoot clusters, however, it doesn’t seem to be configured correctly yet to support this scenario.

  2. Shoot clusters don’t support persistent storage.

    We don’t install any CSI plugin into the shoot cluster yet, hence, there is no persistent storage for shoot clusters.

  3. No owner TXT DNSRecords (hence, no “bad-case” control plane migration).

    In order to realize DNS (see Implementation Details section below), the /etc/hosts file is manipulated. This does not work for TXT records. In the future, we could look into using CoreDNS instead.

  4. No load balancers for Shoot clusters.

    We have not yet developed a cloud-controller-manager which could reconcile load balancer Services in the shoot cluster. Hence, when the gardenlet’s ReversedVPN feature gate is disabled then the kube-system/vpn-shoot Service must be manually patched (with {"status": {"loadBalancer": {"ingress": [{"hostname": "vpn-shoot"}]}}}) to make the reconciliation work.

  5. Only one shoot cluster possible when gardenlet’s APIServerSNI feature gate is disabled.

    When APIServerSNI is disabled then gardenlet uses load balancer Services in order to expose the shoot clusters’ kube-apiservers. Typically, local Kubernetes clusters don’t support this. In this case, the local extension uses the host IP to expose the kube-apiserver, however, this can only be done once.

Implementation Details

This section contains information about how the respective controllers and webhooks are implemented and what their purpose is.

Bootstrapping

The Helm chart of the provider-local extension defined in its ControllerDeployment contains a special deployment for a CoreDNS instance in a gardener-extension-provider-local-coredns namespace in the seed cluster.

This CoreDNS instance is responsible for enabling the components running in the shoot clusters to be able to resolve the DNS names when they communicate with their kube-apiservers.

It contains static configuration to resolve the DNS names based on local.gardener.cloud to either the istio-ingressgateway.istio-ingress.svc or the kube-apiserver.<shoot-namespace>.svc addresses (depending on whether the --apiserver-sni-enabled flag is set to true or false).

Controllers

There are controllers for all resources in the extensions.gardener.cloud/v1alpha1 API group except for BackupBucket and BackupEntrys.

ControlPlane

This controller is not deploying anything since we haven’t invested yet into a cloud-controller-manager or CSI solution. For the latter, we could probably use the local-path-provisioner.

DNSRecord

This controller manipulates the /etc/hosts file and adds a new line for each DNSRecord it observes. This enables accessing the shoot clusters from the respective machine, however, it also requires to run the extension with elevated privileges (sudo).

The /etc/hosts would be extended as follows:

# Begin of gardener-extension-provider-local section
10.84.23.24 api.local.local.external.local.gardener.cloud
10.84.23.24 api.local.local.internal.local.gardener.cloud
...
# End of gardener-extension-provider-local section

Infrastructure

This controller generates a NetworkPolicy which allows the control plane pods (like kube-apiserver) to communicate with the worker machine pods (see Worker section)).

In addition, it creates a Service named vpn-shoot which is only used in case the gardenlet’s ReversedVPN feature gate is disabled. This Service enables the vpn-seed containers in the kube-apiserver pods in the seed cluster to communicate with the vpn-shoot pod running in the shoot cluster.

Network

This controller is not implemented anymore. In the initial version of provider-local, there was a Network controller deploying kindnetd (see https://github.com/gardener/gardener/tree/v1.44.1/pkg/provider-local/controller/network). However, we decided to drop it because this setup prevented us from using NetworkPolicys (kindnetd does not ship a NetworkPolicy controller). In addition, we had issues with shoot clusters having more than one node (hence, we couldn’t support rolling updates, see https://github.com/gardener/gardener/pull/5666/commits/491b3cd16e40e5c20ef02367fda93a34ff9465eb).

OperatingSystemConfig

This controller leverages the standard oscommon library in order to render a simple cloud-init template which can later be executed by the shoot worker nodes.

The shoot worker nodes are Pods with a container based on the kindest/node image. This is maintained in https://github.com/gardener/machine-controller-manager-provider-local/tree/master/node and has a special run-userdata systemd service which executes the cloud-init generated earlier by the OperatingSystemConfig controller.

Worker

This controller leverages the standard generic Worker actuator in order to deploy the machine-controller-manager as well as the machine-controller-manager-provider-local.

Additionally, it generates the MachineClasses and the MachineDeployments based on the specification of the Worker resources.

DNSProvider

Due to legacy reasons, the gardenlet still creates DNSProvider resources part of the dns.gardener.cloud/v1alpha1 API group. Since those are only needed in conjunction with the shoot-dns-service extension and have no relevance for the local provider, it just sets their status.state=Ready to please the expectations. In the future, this controller can be dropped when the gardenlet no longer creates such DNSProviders.

Ingress

Gardenlet creates a wildcard DNS record for the Seed’s ingress domain pointing to the nginx-ingress-controller’s LoadBalancer. This domain is commonly used by all Ingress objects created in the Seed for Seed and Shoot components. However, provider-local implements the DNSRecord extension API by writing the DNS record to /etc/hosts, which doesn’t support wildcard entries. To make Ingress domains resolvable on the host, this controller reconciles all Ingresses and creates DNSRecords of type local for each host included in spec.rules.

Service

This controller reconciles Services of type LoadBalancer in the local Seed cluster. Since the local Kubernetes clusters used as Seed clusters typically don’t support such services, this controller sets the .status.ingress.loadBalancer.ip[0] to the IP of the host. It makes important LoadBalancer Services (e.g. istio-ingress/istio-ingressgateway and garden/nginx-ingress-controller) available to the host by setting spec.ports[].nodePort to well-known ports that are mapped to hostPorts in the kind cluster configuration.

If the --apiserver-sni-enabled flag is set to true (default), istio-ingress/istio-ingressgateway is set to be exposed on nodePort 30433 by this controller. Otherwise, the kube-apiserver Service in the shoot namespaces in the seed cluster needs to be patched to be exposed on 30443 by the Control Plane Exposure Webhook.

ETCD Backups

This controller reconciles the BackuBucket and BackupEntry of the shoot allowing the etcd-backup-restore to create and copy backups using the local provider functionality. The backups are stored on the host file system. This is achieved by mounting that directory to the etcd-backup-restore container.

Health Checks

The health check controller leverages the health check library in order to

  • check the health of the ManagedResource/extension-controlplane-shoot-webhooks and populate the SystemComponentsHealthy condition in the ControlPlane resource.
  • check the health of the ManagedResource/extension-networking-local and populate the SystemComponentsHealthy condition in the Network resource.
  • check the health of the ManagedResource/extension-worker-mcm-shoot and populate the SystemComponentsHealthy condition in the Worker resource.
  • check the health of the Deployment/machine-controller-manager and populate the ControlPlaneHealthy condition in the Worker resource.
  • check the health of the Nodes and populate the EveryNodeReady condition in the Worker resource.

Webhooks

Control Plane

This webhook reacts on the OperatingSystemConfig containing the configuration of the kubelet and sets the failSwapOn to false (independent of what is configured in the Shoot spec) (ref).

Control Plane Exposure

This webhook reacts on the kube-apiserver Service in shoot namespaces in the seed in case the gardenlet’s APIServerSNI feature gate is disabled. It sets the nodePort to 30443 to enable communication from the host (this requires a port mapping to work when creating the local cluster).

DNS Config

This webhook reacts on events for the dependency-watchdog-probe Deployment as well as on events for Pods created when the machine-controller-manager reconciles Machines. It sets the .spec.dnsPolicy=None and .spec.dnsConfig.nameServers to the cluster IP of the coredns Service created in the gardener-extension-provider-local-coredns namespaces so that these pods can resolve the DNS records for shoot clusters (see the Bootstrapping section for more details).

Node

This webhook reacts on kind Nodes and sets the .status.{allocatable,capacity}.cpu="100" and .status.{allocatable,capacity}.memory="100Gi" fields.

Background: Typically, the .status.{capacity,allocatable} values are determined by the resources configured for the Docker daemon (see for example this for Mac). Since many of the Pods deployed by Gardener have quite high .spec.resources.{requests,limits}, the kind Nodes easily get filled up and only a few Pods can be scheduled (even if they barely consume any of their reserved resources). In order to improve the user experience, on startup/leader election the provider-local extension submits an empty patch which triggers the “Node webhook” (see below section). The webhook will increase the capacity of the Nodes to allow all Pods to be scheduled.

Shoot

This webhook reacts on the ConfigMap used by the kube-proxy and sets the maxPerCore field to 0 since other values don’t work well in conjunction with the kindest/node image which is used as base for the shoot worker machine pods (ref).

Future Work

Future work could mostly focus on resolving above listed limitations, i.e.,

  • Add storage support for shoot clusters.
  • Implement a cloud-controller-manager and deploy it via the ControlPlane controller.
  • Implement support for BackupBucket and BackupEntrys to enable ETCD backups for shoot clusters (based on the support for local disks in etcd-backup-restore).
  • Properly implement .spec.machineTypes in the CloudProfiles (i.e., configure .spec.resources properly for the created shoot worker machine pods).

26 - Reconcile Trigger

Reconcile trigger

Gardener dictates the time of reconciliation for resources of the API group extensions.gardener.cloud. It does that by annotating the respected resource with gardener.cloud/operation=reconcile. Extension controllers shall react to this annotation and start reconciling the resource. They have to remove this annotation as soon as they begin with their reconcile operation and maintain the status of the extension resource accordingly.

The reason for this behaviour is that it is possible to configure Gardener to reconcile only in the shoots’ maintenance time windows. In order to avoid that extension controllers reconcile outside of the shoot’s maintenance time window we have introduced this contract. This way extension controllers don’t need to care about when the shoot maintenance time window happens. Gardener keeps control and decides when the shoot shall be reconciled/updated.

Our extension controller library provides all the required utilities to conveniently implement this behaviour.

27 - Referenced Resources

Referenced Resources

The Shoot resource can include a list of resources (usually secrets) that can be referenced by name in extension providerConfig and other Shoot sections, for example:

kind: Shoot
apiVersion: core.gardener.cloud/v1beta1
metadata:
  name: crazy-botany
  namespace: garden-dev
  ...
spec:
  ...
  extensions:
  - type: foobar
    providerConfig:
      apiVersion: foobar.extensions.gardener.cloud/v1alpha1
      kind: FooBarConfig
      foo: bar
      secretRef: foobar-secret
  resources:
  - name: foobar-secret
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: my-foobar-secret

Gardener expects to find these referenced resources in the project namespace (e.g. garden-dev) and will copy them to the Shoot namespace in the Seed cluster when reconciling a Shoot, adding a prefix to their names to avoid naming collisions with Gardener’s own resources.

Extension controllers can resolve the references to these resources by accessing the Shoot via the Cluster resource. To properly read a referenced resources, extension controllers should use the utility function GetObjectByReference from the extensions/pkg/controller package, for example:

    ...
    ref = &autoscalingv1.CrossVersionObjectReference{
        APIVersion: "v1",
        Kind:       "Secret",
        Name:       "foo",
    }
    secret := &corev1.Secret{}
    if err := controller.GetObjectByReference(ctx, client, ref, "shoot--test--foo", secret); err != nil {
        return err
    }
    // Use secret
    ...

28 - Shoot Health Status Conditions

Contributing to shoot health status conditions

Gardener checks regularly (every minute by default) the health status of all shoot clusters. It categorizes its checks into four different types:

  • APIServerAvailable: This type indicates whether the shoot’s kube-apiserver is available or not.
  • ControlPlaneHealthy: This type indicates whether all the control plane components deployed to the shoot’s namespace in the seed do exist and are running fine.
  • EveryNodeReady: This type indicates whether all Nodes and all Machine objects report healthiness.
  • SystemComponentsHealthy: This type indicates whether all system components deployed to the kube-system namespace in the shoot do exist and are running fine.

Every Shoot resource has a status.conditions[] list that contains the mentioned types, together with a status (True/False) and a descriptive message/explanation of the status.

Most extension controllers are deploying components and resources as part of their reconciliation flows into the seed or shoot cluster. A prominent example for this is the ControlPlane controller that usually deploys a cloud-controller-manager or CSI controllers as part of the shoot control plane. Now that the extensions deploy resources into the cluster, especially resources that are essential for the functionality of the cluster, they might want to contribute to Gardener’s checks mentioned above.

What can extensions do to contribute to Gardener’s health checks?

Every extension resource in Gardener’s extensions.gardener.cloud/v1alpha1 API group also has a status.conditions[] list (like the Shoot). Extension controllers can write conditions to the resource they are acting on and use a type that also exist in the shoot’s conditions. One exception is that APIServerAvailable can’t be used as the Gardener clearly can identify the status of this condition and it doesn’t make sense for extensions to try to contribute/modify it.

As an example for the ControlPlane controller let’s take a look at the following resource:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: ControlPlane
metadata:
  name: control-plane
  namespace: shoot--foo--bar
spec:
  ...
status:
  conditions:
  - type: ControlPlaneHealthy
    status: "False"
    reason: DeploymentUnhealthy
    message: 'Deployment cloud-controller-manager is unhealthy: condition "Available" has
      invalid status False (expected True) due to MinimumReplicasUnavailable: Deployment
      does not have minimum availability.'
    lastUpdateTime: "2014-05-25T12:44:27Z"
  - type: ConfigComputedSuccessfully
    status: "True"
    reason: ConfigCreated
    message: The cloud-provider-config has been successfully computed.
    lastUpdateTime: "2014-05-25T12:43:27Z"

The extension controller has declared in its extension resource that one of the deployments it is responsible for is unhealthy. Also, it has written a second condition using a type that is unknown by Gardener.

Gardener will pick the list of conditions and recognize that the there is one with a type ControlPlaneHealthy. It will merge it with its own ControlPlaneHealthy condition and report it back to the Shoot’s status:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  labels:
    shoot.gardener.cloud/status: unhealthy
  name: some-shoot
  namespace: garden-core
spec:
status:
  conditions:
  - type: APIServerAvailable
    status: "True"
    reason: HealthzRequestSucceeded
    message: API server /healthz endpoint responded with success status code. [response_time:31ms]
    lastUpdateTime: "2014-05-23T08:26:52Z"
    lastTransitionTime: "2014-05-25T12:45:13Z"
  - type: ControlPlaneHealthy
    status: "False"
    reason: ControlPlaneUnhealthyReport
    message: 'Deployment cloud-controller-manager is unhealthy: condition "Available" has
      invalid status False (expected True) due to MinimumReplicasUnavailable: Deployment
      does not have minimum availability.'
    lastUpdateTime: "2014-05-25T12:45:13Z"
    lastTransitionTime: "2014-05-25T12:45:13Z"
  ...

Hence, the only duty extensions have is to maintain the health status of their components in the extension resource they are managing. This can be accomplished using the health check library for extensions.

Error Codes

The Gardener API includes some well-defined error codes, e.g., ERR_INFRA_UNAUTHORIZED, ERR_INFRA_DEPENDENCIES, etc. Extension may set these error codes in the .status.conditions[].codes[] list in case it makes sense. Gardener will pick them up and will similarly merge them into the .status.conditions[].codes[] list in the Shoot:

status:
  conditions:
  - type: ControlPlaneHealthy
    status: "False"
    reason: DeploymentUnhealthy
    message: 'Deployment cloud-controller-manager is unhealthy: condition "Available" has
      invalid status False (expected True) due to MinimumReplicasUnavailable: Deployment
      does not have minimum availability.'
    lastUpdateTime: "2014-05-25T12:44:27Z"
    codes:
    - ERR_INFRA_UNAUTHORIZED 

29 - Shoot Maintenance

Shoot maintenance

There is a general document about shoot maintenance that you might want to read. Here, we describe how you can influence certain operations that happen during a shoot maintenance.

Restart Control Plane Controllers

As outlined in above linked document, Gardener offers to restart certain control plane controllers running in the seed during a shoot maintenance.

Extension controllers can extend the amount of pods being affected by these restarts. If your Gardener extension manages pods of a shoot’s control plane (shoot namespace in seed) and it could potentially profit from a regular restart please consider labeling it with maintenance.gardener.cloud/restart=true.

30 - Shoot Webhooks

Shoot resource customization webhooks

Gardener deploys several components/resources into the shoot cluster. Some of these resources are essential (like the kube-proxy), others are optional addons (like the kubernetes-dashboard or the nginx-ingress-controller). In either case, some provider extensions might need to mutate these resources and inject provider-specific bits into it.

What’s the approach to implement such mutations?

Similar to how control plane components in the seed are modified we are using MutatingWebhookConfigurations to achieve the same for resources in the shoot. Both, the provider extension and the kube-apiserver of the shoot cluster are running in the same seed. Consequently, the kube-apiserver can talk cluster-internally to the provider extension webhook which makes such operations even faster.

How is the MutatingWebhookConfiguration object created in the shoot?

The preferred approach is to use a ManagedResource (see also this document) in the seed cluster. This way the gardener-resource-manager ensures that end-users cannot delete/modify the webhook configuration. The provider extension doesn’t need to care about the same.

What else is needed?

The shoot’s kube-apiserver must be allowed to talk to the provider extension. To achieve this you need to create a NetworkPolicy in the shoot namespace. Our extension controller library provides easy-to-use utilities and hooks to implement such a webhook. Please find an exemplary implementation here and here.

31 - Worker

Contract: Worker resource

While the control plane of a shoot cluster is living in the seed and deployed as native Kubernetes workload, the worker nodes of the shoot clusters are normal virtual machines (VMs) in the end-users infrastructure account. The Gardener project features a sub-project called machine-controller-manager. This controller is extending the Kubernetes API using custom resource definitions to represent actual VMs as Machine objects inside a Kubernetes system. This approach unlocks the possibility to manage virtual machines in the Kubernetes style and benefit from all its design principles.

What is the machine-controller-manager exactly doing?

Generally, there are provider-specific MachineClass objects (AWSMachineClass, AzureMachineClass, etc.; similar to StorageClass), and MachineDeployment, MachineSet, and Machine objects (similar to Deployment, ReplicaSet, and Pod). A machine class describes where and how to create virtual machines (in which networks, region, availability zone, SSH key, user-data for bootstrapping, etc.) while a Machine results in an actual virtual machine. You can read up more information in the machine-controller-manager’s repository.

Before the introduction of the Worker extension resource Gardener was deploying the machine-controller-manager, the machine classes, and the machine deployments itself. Now, Gardener commissions an external, provider-specific controller to take over these tasks.

What needs to be implemented to support a new worker provider?

As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Worker
metadata:
  name: bar
  namespace: shoot--foo--bar
spec:
  type: azure
  region: eu-west-1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  infrastructureProviderStatus:
    apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureStatus
    ec2:
      keyName: shoot--foo--bar-ssh-publickey
    iam:
      instanceProfiles:
      - name: shoot--foo--bar-nodes
        purpose: nodes
      roles:
      - arn: arn:aws:iam::0123456789:role/shoot--foo--bar-nodes
        purpose: nodes
    vpc:
      id: vpc-0123456789
      securityGroups:
      - id: sg-1234567890
        purpose: nodes
      subnets:
      - id: subnet-01234
        purpose: nodes
        zone: eu-west-1b
      - id: subnet-56789
        purpose: public
        zone: eu-west-1b
      - id: subnet-0123a
        purpose: nodes
        zone: eu-west-1c
      - id: subnet-5678a
        purpose: public
        zone: eu-west-1c
  pools:
  - name: cpu-worker
    minimum: 3
    maximum: 5
    maxSurge: 1
    maxUnavailable: 0
    machineType: m4.large
    machineImage:
      name: coreos
      version: 1967.5.0
    nodeTemplate:
      capacity:
        cpu: 2
        gpu: 0
        memory: 8Gi
    userData: c29tZSBkYXRhIHRvIGJvb3RzdHJhcCB0aGUgVk0K
    volume:
      size: 20Gi
      type: gp2
    zones:
    - eu-west-1b
    - eu-west-1c
    machineControllerManager:
      drainTimeout: 10m
      healthTimeout: 10m
      creationTimeout: 10m
      maxEvictRetries: 30
      nodeConditions:
      - ReadonlyFilesystem
      - DiskPressure
      - KernelDeadlock

The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used to create the needed virtual machines. Also, as you can see, Gardener copies the output of the infrastructure creation (.spec.infrastructureProviderStatus), see Infrastructure resource, into the .spec.

In the .spec.pools[] field the desired worker pools are listed. In the above example, one pool with machine type m4.large and min=3, max=5 machines shall be spread over two availability zones (eu-west-1b, eu-west-1c). This information together with the infrastructure status must be used to determine the proper configuration for the machine classes.

The spec.pools[].nodeTemplate.capacity field contains the resource information of the machine like cpu, gpu and memory. This info is used by Cluster Autoscaler to generate nodeTemplate during scaling the nodeGroup from zero.

The spec.pools[].machineControllerManager field allows to configure the settings for machine-controller-manager component. Providers must populate these settings on worker-pool to the related fields in MachineDeployment.

When seeing such a resource your controller must make sure that it deploys the machine-controller-manager next to the control plane in the seed cluster. After that, it must compute the desired machine classes and the desired machine deployments. Typically, one class maps to one deployment, and one class/deployment is created per availability zone. Following this convention, the created resource would look like this:

apiVersion: v1
kind: Secret
metadata:
  name: shoot--foo--bar-cpu-worker-z1-3db65
  namespace: shoot--foo--bar
  labels:
    gardener.cloud/purpose: machineclass
type: Opaque
data:
  providerAccessKeyId: eW91ci1hd3MtYWNjZXNzLWtleS1pZAo=
  providerSecretAccessKey: eW91ci1hd3Mtc2VjcmV0LWFjY2Vzcy1rZXkK
  userData: c29tZSBkYXRhIHRvIGJvb3RzdHJhcCB0aGUgVk0K
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: AWSMachineClass
metadata:
  name: shoot--foo--bar-cpu-worker-z1-3db65
  namespace: shoot--foo--bar
spec:
  ami: ami-0123456789 # Your controller must map the stated version to the provider specific machine image information, in the AWS case the AMI.
  blockDevices:
  - ebs:
      volumeSize: 20
      volumeType: gp2
  iam:
    name: shoot--foo--bar-nodes
  keyName: shoot--foo--bar-ssh-publickey
  machineType: m4.large
  networkInterfaces:
  - securityGroupIDs:
    - sg-1234567890
    subnetID: subnet-01234
  region: eu-west-1
  secretRef:
    name: shoot--foo--bar-cpu-worker-z1-3db65
    namespace: shoot--foo--bar
  tags:
    kubernetes.io/cluster/shoot--foo--bar: "1"
    kubernetes.io/role/node: "1"
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: MachineDeployment
metadata:
  name: shoot--foo--bar-cpu-worker-z1
  namespace: shoot--foo--bar
spec:
  replicas: 2
  selector:
    matchLabels:
      name: shoot--foo--bar-cpu-worker-z1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        name: shoot--foo--bar-cpu-worker-z1
    spec:
      class:
        kind: AWSMachineClass
        name: shoot--foo--bar-cpu-worker-z1-3db65

for the first availability zone eu-west-1b, and

apiVersion: v1
kind: Secret
metadata:
  name: shoot--foo--bar-cpu-worker-z2-5z6as
  namespace: shoot--foo--bar
  labels:
    gardener.cloud/purpose: machineclass
type: Opaque
data:
  providerAccessKeyId: eW91ci1hd3MtYWNjZXNzLWtleS1pZAo=
  providerSecretAccessKey: eW91ci1hd3Mtc2VjcmV0LWFjY2Vzcy1rZXkK
  userData: c29tZSBkYXRhIHRvIGJvb3RzdHJhcCB0aGUgVk0K
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: AWSMachineClass
metadata:
  name: shoot--foo--bar-cpu-worker-z2-5z6as
  namespace: shoot--foo--bar
spec:
  ami: ami-0123456789 # Your controller must map the stated version to the provider specific machine image information, in the AWS case the AMI.
  blockDevices:
  - ebs:
      volumeSize: 20
      volumeType: gp2
  iam:
    name: shoot--foo--bar-nodes
  keyName: shoot--foo--bar-ssh-publickey
  machineType: m4.large
  networkInterfaces:
  - securityGroupIDs:
    - sg-1234567890
    subnetID: subnet-0123a
  region: eu-west-1
  secretRef:
    name: shoot--foo--bar-cpu-worker-z2-5z6as
    namespace: shoot--foo--bar
  tags:
    kubernetes.io/cluster/shoot--foo--bar: "1"
    kubernetes.io/role/node: "1"
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: MachineDeployment
metadata:
  name: shoot--foo--bar-cpu-worker-z1
  namespace: shoot--foo--bar
spec:
  replicas: 1
  selector:
    matchLabels:
      name: shoot--foo--bar-cpu-worker-z1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        name: shoot--foo--bar-cpu-worker-z1
    spec:
      class:
        kind: AWSMachineClass
        name: shoot--foo--bar-cpu-worker-z2-5z6as

for the second availability zone eu-west-1c.

Another convention is the 5-letter hash at the end of the machine class names. Most controllers compute a checksum out of the specification of the machine class. This helps to trigger a rolling update of the worker nodes if, for example, the machine image version changes. In this case, a new checksum will be generated which results in the creation of a new machine class. The MachineDeployment’s machine class reference (.spec.template.spec.class.name) is updated which triggers the rolling update process in the machine-controller-manager. However, all of this is only a convention that eases writing the controller, but you can do it completely differently if you desire - as long as you make sure that the described behaviours are implemented correctly.

After the machine classes and machine deployments have been created the machine-controller-manager will start talking to the provider’s IaaS API and create the virtual machines. Gardener makes sure that the content of the userData field that is used to bootstrap the machines contain the required configuration for installation of the kubelet and registering the VM as worker node in the shoot cluster. The Worker extension controller shall wait until all the created MachineDeployments indicate healthiness/readiness before it ends the control loop.

Does Gardener need some information that must be returned back?

Another important benefit of the machine-controller-manager’s design principles (extending the Kubernetes API using CRDs) is that the cluster-autoscaler can be used without any provider-specific implementation. We have forked the upstream Kubernetes community’s cluster-autoscaler and extended it so that it understands the machine API. Definitely, we will merge it back into the community’s versions once it has been adapted properly.

Our cluster-autoscaler only needs to know the minimum and maximum number of replicas per MachineDeployment and is ready to act without that it needs to talk to the provider APIs (it just modifies the .spec.replicas field in the MachineDeployment object). Gardener deploys this autoscaler if there is at least one worker pool that specifies max>min. In order to know how it needs to configure it, the provider-specific Worker extension controller must expose which MachineDeployments it had created and how the min/max numbers should look like.

Consequently, your controller should write this information into the Worker resource’s .status.machineDeployments field:

---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Worker
metadata:
  name: worker
  namespace: shoot--foo--bar
spec:
  ...
status:
  lastOperation: ...
  machineDeployments:
  - name: shoot--foo--bar-cpu-worker-z1
    minimum: 2
    maximum: 3
  - name: shoot--foo--bar-cpu-worker-z2
    minimum: 1
    maximum: 2

In order to support a new worker provider you need to write a controller that watches all Workers with .spec.type=<my-provider-name>. You can take a look at the below referenced example implementation for the AWS provider.

That sounds like a lot that needs to be done, can you help me?

All of the described behaviour is mostly the same for every provider. The only difference is maybe the version/configuration of the machine-controller-manager, and the machine class specification itself. You can take a look at our extension library, especially the worker controller part where you will find a lot of utilities that you can use. Also, using the library you only need to implement your provider specifics - all the things that can be handled generically can be taken for free and do not need to be re-implemented. Take a look at the AWS worker controller for finding an example.

Non-provider specific information required for worker creation

All the providers require further information that is not provider specific but already part of the shoot resource. One example for such information is whether the shoot is hibernated or not. In this case all the virtual machines should be deleted/terminated, and after that the machine controller-manager should be scaled down. You can take a look at the AWS worker controller to see how it reads this information and how it is used. As Gardener cannot know which information is required by providers it simply mirrors the Shoot, Seed, and CloudProfile resources into the seed. They are part of the Cluster extension resource and can be used to extract information that is not part of the Worker resource itself.

References and additional resources