Resources
1 - BackupBucket
Contract: BackupBucket Resource
The Gardener project features a sub-project called etcd-backup-restore that takes periodic backups of the etcd instances backing Shoot clusters. It requires the bucket (or its equivalent in other object store providers) to be created and configured externally with appropriate credentials. The BackupBucket resource takes this responsibility in Gardener.
Before introducing the BackupBucket extension resource, Gardener was using Terraform in order to create and manage these provider-specific resources (e.g., see AWS Backup).
Now, Gardener commissions an external, provider-specific controller to take over this task. You can also refer to the backupInfra proposal documentation to get an idea of how the transition was done and to understand the resource in a broader scope.
What Is the Scope of a Bucket?
A bucket will be provisioned per Seed. The backup of every Shoot created on that Seed is stored under a separate, shoot-specific prefix within the bucket.
If a Shoot is rescheduled to a different Seed, its backup continues to use the same bucket.
What Is the Lifespan of a BackupBucket?
The bucket associated with a BackupBucket is created when the Seed is created. In the current implementation, it is also deleted when the Seed is deleted, provided that no BackupEntry resource is associated with it.
In the future, we plan to introduce scheduling for BackupBucket resources together with matching deletion logic: on deletion of the currently associated Seed, or when its health check fails, the BackupBucket will be rescheduled to a different available Seed. In that case, the BackupBucket will be deleted only if there isn’t any schedulable Seed available and there isn’t any associated BackupEntry resource.
What Needs to Be Implemented to Support a New Infrastructure Provider?
As part of the seed flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: BackupBucket
metadata:
  name: foo
spec:
  type: azure
  providerConfig:
    <some-optional-provider-specific-backupbucket-configuration>
  region: eu-west-1
  secretRef:
    name: backupprovider
    namespace: shoot--foo--bar
The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used to create the needed resources. This provider secret will be configured by the Gardener operator in the Seed resource and propagated over there by the seed controller.
After your controller has created the required bucket, it generates, if required, the secret to access the objects in the bucket and puts a reference to it in the status. This secret is supposed to be used by Gardener, or eventually by a BackupEntry resource and the etcd-backup-restore component, to back up the etcd.
In order to support a new infrastructure provider, you need to write a controller that watches all BackupBuckets with .spec.type=<my-provider-name>. You can take a look at the below referenced example implementation for the Azure provider.
References and Additional Resources
2 - BackupEntry
Contract: BackupEntry Resource
The Gardener project features a sub-project called etcd-backup-restore that takes periodic backups of the etcd instances backing Shoot clusters. It requires the access credentials for the bucket (or its equivalent in other object store providers) to be created and configured externally. The BackupEntry resource takes this responsibility in Gardener and provides this information by creating a secret specific to the component.
That being said, the core motivation for introducing this resource was to support the retention of backups after the deletion of a Shoot. The etcd-backup-restore component takes responsibility for garbage collecting old backups that fall outside the defined retention period. Once a shoot is deleted, its backups need to be persisted for a few days. Hence, Gardener uses the BackupEntry resource for this housekeeping work after the deletion of a Shoot. The BackupEntry resource is responsible for the shoot-specific prefix under the referenced bucket.
Before introducing the BackupEntry extension resource, Gardener was using Terraform in order to create and manage these provider-specific resources (e.g., see AWS Backup).
Now, Gardener commissions an external, provider-specific controller to take over this task. You can also refer to the backupInfra proposal documentation to get an idea of how the transition was done and to understand the resource in a broader scope.
What Is the Lifespan of a BackupEntry?
The bucket associated with a BackupEntry is created by using a BackupBucket resource. The BackupEntry resource is created as part of the Shoot creation, but its resources might continue to exist after the deletion of the Shoot (see gardenlet for more details).
What Needs to Be Implemented to Support a New Infrastructure Provider?
As part of the shoot flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: BackupEntry
metadata:
  name: shoot--foo--bar
spec:
  type: azure
  providerConfig:
    <some-optional-provider-specific-backup-bucket-configuration>
  backupBucketProviderStatus:
    <some-optional-provider-specific-backup-bucket-status>
  region: eu-west-1
  bucketName: foo
  secretRef:
    name: backupprovider
    namespace: shoot--foo--bar
The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used to create the needed resources. This provider secret will be propagated from the BackupBucket resource by the shoot controller.
Your controller is supposed to create the etcd-backup secret in the control plane namespace of a shoot. This secret is supposed to be used by Gardener or eventually by the etcd-backup-restore component to back up the etcd. The controller implementation should clean up the objects created under the shoot-specific prefix in the bucket, which is equivalent to the name of the BackupEntry resource.
In order to support a new infrastructure provider, you need to write a controller that watches all BackupEntry resources with .spec.type=<my-provider-name>. You can take a look at the below referenced example implementation for the Azure provider.
References and Additional Resources
3 - Bastion
Contract: Bastion Resource
The Gardener project allows users to connect to Shoot worker nodes via SSH. As nodes are usually firewalled and not directly accessible from the public internet, GEP-15 introduced the concept of “Bastions”. A bastion is a dedicated server that only serves to allow SSH ingress to the worker nodes.
Bastion resources contain the user’s public SSH key and IP address, in order to provision the server accordingly: the public key is put onto the Bastion and SSH ingress is only authorized for the given IP address (in fact, it’s not a single IP address but a set of IP ranges; for most purposes, however, a single IP is used).
What Is the Lifespan of a Bastion?
Once a Bastion has been created in the garden, it will be replicated to the appropriate seed cluster, where a controller then reconciles a server, firewall rules, etc., on the cloud provider used by the target Shoot. When the Bastion is ready (i.e., has a public IP), that IP is stored in the Bastion’s status, and from there it is picked up by the garden cluster and eventually by gardenctl.
To make multiple SSH sessions possible, the existence of the Bastion is not directly tied to the execution of gardenctl: users can exit out of gardenctl and use ssh manually to connect to the bastion and worker nodes.
However, Bastions have an expiry date, after which they will be garbage collected.
When SSH access is set to false for the Shoot in the workers settings (see Shoot Worker Nodes Settings), Bastion resources are deleted during Shoot reconciliation and new Bastions are prevented from being created.
What Needs to Be Implemented to Support a New Infrastructure Provider?
As part of the shoot flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Bastion
metadata:
  name: mybastion
  namespace: shoot--foo--bar
spec:
  type: aws
  # userData is base64-encoded cloud provider user data; this contains the
  # user's SSH key
  userData: IyEvYmluL2Jhc2ggL....Nlcgo=
  ingress:
  - ipBlock:
      cidr: 192.88.99.0/32 # this is most likely the user's IP address
Your controller is supposed to create a new instance at the given cloud provider, firewall it to only allow SSH (TCP port 22) from the given IP blocks, and then configure the firewall for the worker nodes to allow SSH from the bastion instance. When a Bastion is deleted, all these changes need to be reverted.
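As a rough illustration of the first firewall step, the following Python sketch turns the ingress entries of a Bastion spec into "allow TCP 22" rules. The rule dictionary shape and the function name ssh_ingress_rules are hypothetical and not part of any Gardener or cloud provider API; a real controller would translate them into provider-specific security group or firewall calls.

```python
import ipaddress

SSH_PORT = 22


def ssh_ingress_rules(ingress):
    """Build one allow-rule per CIDR listed under .spec.ingress.

    The returned rule format is made up for this sketch.
    """
    rules = []
    for entry in ingress:
        # ip_network raises ValueError for malformed CIDRs or CIDRs with host bits set.
        network = ipaddress.ip_network(entry["ipBlock"]["cidr"])
        rules.append({"protocol": "tcp", "port": SSH_PORT, "source": str(network)})
    return rules


# Ingress section taken from the Bastion example above.
bastion_ingress = [{"ipBlock": {"cidr": "192.88.99.0/32"}}]
rules = ssh_ingress_rules(bastion_ingress)
```

Validating the CIDR before creating any cloud resources keeps a malformed entry from leaving a half-configured bastion behind.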
Implementation Details
ConfigValidator Interface
For bastion controllers, the generic Reconciler also delegates to a ConfigValidator interface that contains a single Validate method. This method is called by the generic Reconciler at the beginning of every reconciliation, and can be implemented by the extension to validate the .spec.providerConfig part of the Bastion resource with the respective cloud provider, typically the existence and validity of cloud provider resources such as VPCs, images, etc.
The Validate method returns a list of errors. If this list is non-empty, the generic Reconciler will fail with an error. This error will have the error code ERR_CONFIGURATION_PROBLEM, unless there is at least one error in the list that has its ErrorType field set to field.ErrorTypeInternal.
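The error-code rule described above can be sketched in Python as follows. This only models the decision logic; the actual reconciler is written in Go, and the ValidationError class, the string constants, and the function error_codes_for are stand-ins invented for this example.

```python
from dataclasses import dataclass

ERR_CONFIGURATION_PROBLEM = "ERR_CONFIGURATION_PROBLEM"
# Stand-ins for Go's field.ErrorType values; only the "internal" one matters here.
ERROR_TYPE_INTERNAL = "InternalError"
ERROR_TYPE_INVALID = "FieldValueInvalid"


@dataclass
class ValidationError:
    error_type: str
    detail: str


def error_codes_for(errors):
    """Return None if validation passed, otherwise the error codes to attach."""
    if not errors:
        return None  # nothing to report, reconciliation continues
    if any(e.error_type == ERROR_TYPE_INTERNAL for e in errors):
        # At least one internal error: do not blame the user's configuration.
        return []
    return [ERR_CONFIGURATION_PROBLEM]


config_only = [ValidationError(ERROR_TYPE_INVALID, "VPC not found")]
mixed = config_only + [ValidationError(ERROR_TYPE_INTERNAL, "provider API unreachable")]
```

Here error_codes_for(config_only) yields the configuration-problem code, while error_codes_for(mixed) yields no code because an internal error is present.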
References and Additional Resources
4 - ContainerRuntime
Contract: ContainerRuntime Resource
At the lowest layers of a Kubernetes node is the software that, among other things, starts and stops containers. It is called “Container Runtime”. The most widely known container runtime is Docker, but it is not alone in this space. In fact, the container runtime space has been rapidly evolving.
Kubernetes supports different container runtimes using Container Runtime Interface (CRI) – a plugin interface which enables kubelet to use a wide variety of container runtimes.
Gardener supports creation of Worker machines using CRI. For more information, see CRI Support.
Motivation
Prior to the Container Runtime Extensibility concept, Gardener used Docker as the only container runtime in shoot worker machines. Because the wide variety of container runtimes offers multiple important features (for example, enhanced security concepts), it is important to enable end users to use other container runtimes as well.
The ContainerRuntime Extension Resource
Here is what a typical ContainerRuntime resource would look like:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: ContainerRuntime
metadata:
  name: my-container-runtime
spec:
  binaryPath: /var/bin/containerruntimes
  type: gvisor
  workerPool:
    name: worker-ubuntu
    selector:
      matchLabels:
        worker.gardener.cloud/pool: worker-ubuntu
Gardener deploys one ContainerRuntime resource per worker pool and per container runtime enabled for that pool’s CRI.
To exemplify this, consider a Shoot having two worker pools (worker-one, worker-two) using containerd as the CRI as well as gvisor and kata as enabled container runtimes.
Gardener would deploy four ContainerRuntime resources. For worker-one: one ContainerRuntime for type gvisor and one for type kata. The same resources are deployed for worker-two.
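The counting in this example can be reproduced with a short Python sketch. The input mapping and the function name container_runtime_resources are hypothetical; the sketch only enumerates (worker pool, runtime type) pairs and does not create real resources.

```python
def container_runtime_resources(pools):
    """Enumerate one (pool, runtime type) entry per enabled container runtime.

    `pools` maps a worker pool name to the container runtimes enabled for it.
    """
    return [
        {"workerPool": pool, "type": runtime}
        for pool, runtimes in pools.items()
        for runtime in runtimes
    ]


# The Shoot from the example above: two pools, each with gvisor and kata enabled.
pools = {
    "worker-one": ["gvisor", "kata"],
    "worker-two": ["gvisor", "kata"],
}
resources = container_runtime_resources(pools)
```

For the two pools above, the list contains four entries, matching the four ContainerRuntime resources described in the text.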
Supporting a New Container Runtime Provider
To add support for another container runtime (e.g., gvisor, kata-containers), a container runtime extension controller needs to be implemented. It should support Gardener’s supported CRI plugins.
The container runtime extension should install the necessary resources into the shoot cluster (e.g., RuntimeClasses), and it should copy the runtime binaries to the relevant worker machines into the path given in spec.binaryPath.
Gardener labels the shoot nodes according to the CRI configured: worker.gardener.cloud/cri-name=<value> (e.g., worker.gardener.cloud/cri-name=containerd), and adds one label for each of the container runtimes configured for the shoot Worker machine: containerruntime.worker.gardener.cloud/<container-runtime-type-value>=true (e.g., containerruntime.worker.gardener.cloud/gvisor=true).
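A minimal Python sketch of the labeling scheme described above follows. The label keys are taken from the text; the function name node_labels and its input shape are assumptions made for illustration.

```python
def node_labels(cri_name, container_runtimes):
    """Compute the node labels for one worker machine."""
    labels = {"worker.gardener.cloud/cri-name": cri_name}
    for runtime in container_runtimes:
        # One label per configured container runtime, always with the value "true".
        labels[f"containerruntime.worker.gardener.cloud/{runtime}"] = "true"
    return labels


labels = node_labels("containerd", ["gvisor", "kata"])
```

A daemon set that installs the gvisor binaries could then select nodes carrying containerruntime.worker.gardener.cloud/gvisor=true.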
The binaries are installed by creating a daemon set which copies them from an image in a Docker registry to the relevant labeled Worker nodes (avoid downloading binaries from the internet, so that isolated environments are supported as well).
For additional reference, please have a look at the runtime-gvisor provider extension, which provides more information on how to configure the necessary charts, as well as the actuators required to reconcile the container runtime inside the Shoot cluster to the desired state.
5 - ControlPlane
Contract: ControlPlane Resource
Most Kubernetes clusters require a cloud-controller-manager or CSI drivers in order to work properly.
Before introducing the ControlPlane extension resource, Gardener had several different Helm charts for the cloud-controller-manager deployments of the various providers.
Now, Gardener commissions an external, provider-specific controller to take over this task.
Which control plane resources are required?
As mentioned in the controlplane customization webhooks document, Gardener shall not deploy any cloud-controller-manager or any other provider-specific component.
Instead, it creates a ControlPlane CRD that should be picked up by provider extensions.
Its purpose is to trigger the deployment of such provider-specific components in the shoot namespace in the seed cluster.
What needs to be implemented to support a new infrastructure provider?
As part of the shoot flow Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: ControlPlane
metadata:
  name: control-plane
  namespace: shoot--foo--bar
spec:
  type: openstack
  region: europe-west1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  providerConfig:
    apiVersion: openstack.provider.extensions.gardener.cloud/v1alpha1
    kind: ControlPlaneConfig
    loadBalancerProvider: provider
    zone: eu-1a
    cloudControllerManager:
      featureGates:
        CustomResourceValidation: true
  infrastructureProviderStatus:
    apiVersion: openstack.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureStatus
    networks:
      floatingPool:
        id: vpc-1234
      subnets:
      - purpose: nodes
        id: subnetid
The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used for the shoot cluster.
However, the most important sections are .spec.providerConfig and .spec.infrastructureProviderStatus.
The first one contains an embedded declaration of the provider-specific configuration for the control plane (that cannot be known by Gardener itself).
You are responsible for designing what this configuration looks like.
Gardener does not evaluate it but just copies this part from what has been provided by the end-user in the Shoot resource.
The second one contains the output of the Infrastructure resource (that might be relevant for the CCM config).
In order to support a new control plane provider, you need to write a controller that watches all ControlPlanes with .spec.type=<my-provider-name>.
You can take a look at the below referenced example implementation for the Alicloud provider.
The control plane controller often deploys resources (e.g., pods/deployments) into the Shoot namespace in the Seed as part of its ControlPlane reconciliation loop.
Because the namespace contains network policies that by default deny all ingress and egress traffic, the pods may need to have proper labels matching the selectors of the network policies in order to allow the required network traffic.
Otherwise, they won’t be allowed to talk to certain other components (e.g., the kube-apiserver of the shoot).
For more information, see NetworkPolicys In Garden, Seed, Shoot Clusters.
Non-Provider Specific Information Required for Infrastructure Creation
Most providers might require further information that is not provider specific but already part of the shoot resource.
One example for this is the GCP control plane controller, which needs the Kubernetes version of the shoot cluster (because it already uses the in-tree Kubernetes cloud-controller-manager).
As Gardener cannot know which information is required by providers, it simply mirrors the Shoot, Seed, and CloudProfile resources into the seed.
They are part of the Cluster extension resource and can be used to extract information that is not part of the Infrastructure resource itself.
References and Additional Resources
6 - ControlPlane Exposure
Contract: ControlPlane Resource with Purpose exposure
Some Kubernetes clusters require additional deployments from the seed cloud provider in order to work properly, e.g., AWS Load Balancer Readvertiser.
Before using ControlPlane resources with purpose exposure, Gardener had different Helm charts for these deployments for the various providers.
Now, Gardener commissions an external, provider-specific controller to take over this task.
Which control plane resources are required?
As mentioned in the controlplane document, Gardener shall not deploy any other provider-specific component.
Instead, it creates a ControlPlane CRD with purpose exposure that should be picked up by provider extensions.
Its purpose is to trigger the deployment of such provider-specific components in the shoot namespace in the seed cluster that are needed to expose the kube-apiserver.
The shoot cluster’s kube-apiserver is exposed via a Service of type LoadBalancer from the shoot provider (you may run the control plane of an Azure shoot in a GCP seed). It’s the seed provider extension controller that should act on the ControlPlane resources with purpose exposure.
If SNI is enabled, then the Service from above is of type ClusterIP and Gardener will not create ControlPlane resources with purpose exposure.
What needs to be implemented to support a new infrastructure provider?
As part of the shoot flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: ControlPlane
metadata:
  name: control-plane-exposure
  namespace: shoot--foo--bar
spec:
  type: aws
  purpose: exposure
  region: europe-west1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
The .spec.secretRef contains a reference to the provider secret pointing to the account that shall be used for the shoot cluster.
It is most likely not needed; however, it is still added for some potential corner cases.
If you don’t need it, then just ignore it.
The .spec.region contains the region of the seed cluster.
In order to support a control plane provider with purpose exposure, you need to write a controller, or expand the existing controlplane controller, that watches all ControlPlanes with .spec.type=<my-provider-name> and purpose exposure.
You can take a look at the below referenced example implementation for the AWS provider.
Non-Provider Specific Information Required for Infrastructure Creation
Most providers might require further information that is not provider specific but already part of the shoot resource.
As Gardener cannot know which information is required by providers, it simply mirrors the Shoot, Seed, and CloudProfile resources into the seed.
They are part of the Cluster extension resource and can be used to extract information.
References and Additional Resources
7 - DNS Record
Contract: DNSRecord Resources
Every shoot cluster requires external DNS records that are publicly resolvable. The management of these DNS records requires provider-specific knowledge which is to be developed outside Gardener’s core repository.
Currently, Gardener uses DNSProvider and DNSEntry resources. However, this introduces undesired coupling of Gardener to a controller that does not adhere to the Gardener extension contracts. Because of this, we plan to stop using DNSProvider and DNSEntry resources for Gardener DNS records in the future and use the DNSRecord resources described here instead.
What does Gardener create DNS records for?
Internal Domain Name
Every shoot cluster’s kube-apiserver running in the seed is exposed via a load balancer that has a public endpoint (IP or hostname). This endpoint is used by end-users and also by system components (that are running in another network, e.g., the kubelet or kube-proxy) to talk to the cluster. In order to be robust against changes of this endpoint (e.g., caused by re-creation of the load balancer or a move of the DNS record to another seed cluster), Gardener creates a so-called internal domain name for every shoot cluster. The internal domain name is a publicly resolvable DNS record that points to the load balancer of the kube-apiserver. Gardener uses this domain name in the kubeconfigs of all system components, instead of using the load balancer endpoint directly. This way, Gardener does not need to recreate all kubeconfigs if the endpoint changes; it just needs to update the DNS record.
External Domain Name
The internal domain name is not configurable by end-users directly but configured by the Gardener administrator. However, end-users usually prefer to have another DNS name, maybe even using their own domain sometimes, to access their Kubernetes clusters. Gardener supports that by creating another DNS record, named external domain name, that actually points to the internal domain name. The kubeconfig handed out to end-users contains this external domain name, i.e., users can access their clusters with the DNS name they prefer.
As not every end-user has their own domain, it is possible for Gardener administrators to configure so-called default domains.
If configured, shoots that do not specify a domain explicitly get an external domain name based on a default domain (unless it is explicitly stated that the shoot should not get an external domain name, i.e., .spec.dns.provider=unmanaged).
Ingress Domain Name (Deprecated)
Gardener allows deploying an nginx-ingress-controller into a shoot cluster (deprecated).
This controller is exposed via a public load balancer (again, either IP or hostname).
Gardener creates a wildcard DNS record pointing to this load balancer.
Ingress resources can later use this wildcard DNS record to expose underlying applications.
Seed Ingress
If .spec.ingress is configured in the Seed, Gardener deploys the ingress controller mentioned in .spec.ingress.controller.kind to the seed cluster. Currently, the only supported kind is “nginx”. If the ingress field is set, then .spec.dns.provider must also be set. Gardener creates a wildcard DNS record pointing to the load balancer of the ingress controller. The Ingress resources of components like Plutono and Prometheus in the garden namespace and the shoot namespaces use this wildcard DNS record to expose their underlying applications.
What needs to be implemented to support a new DNS provider?
As part of the shoot flow, Gardener will create a number of DNSRecord resources in the seed cluster (one for each of the DNS records mentioned above) that need to be reconciled by an extension controller.
These resources contain the following information:
- The DNS provider type (e.g., aws-route53, google-clouddns, …)
- A reference to a Secret object that contains the provider-specific credentials used to communicate with the provider’s API.
- The fully qualified domain name (FQDN) of the DNS record, e.g. “api.<shoot domain>”.
- The DNS record type, one of A, AAAA, CNAME, or TXT.
- The DNS record values, that is a list of IP addresses for A records, a single hostname for CNAME records, or a list of texts for TXT records.
Optionally, the DNSRecord resource may also contain the following information:
- The region of the DNS record. If not specified, the region specified in the referenced Secret shall be used. If that is also not specified, the extension controller shall use a certain default region.
- The DNS hosted zone of the DNS record. If not specified, it shall be determined automatically by the extension controller by getting all hosted zones of the account and searching for the longest zone name that is a suffix of the fully qualified domain name (FQDN) mentioned above.
- The TTL of the DNS record in seconds. If not specified, it shall be set by the extension controller to 120.
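The hosted zone lookup and the TTL default described in this list can be sketched in Python as follows. The function names and the zones mapping are hypothetical, and matching on DNS label boundaries (rather than on a raw string suffix) is an assumption of this sketch, not something the contract spells out.

```python
DEFAULT_TTL_SECONDS = 120


def find_hosted_zone(fqdn, zones):
    """Return the ID of the zone with the longest name that is a suffix of fqdn.

    `zones` maps a zone ID to its domain name; returns None if nothing matches.
    """
    name = fqdn.rstrip(".").lower()
    best_id, best_len = None, -1
    for zone_id, zone_name in zones.items():
        zone = zone_name.rstrip(".").lower()
        # Assumption: only match whole labels, so "domain.com" does not match "mydomain.com".
        if (name == zone or name.endswith("." + zone)) and len(zone) > best_len:
            best_id, best_len = zone_id, len(zone)
    return best_id


def effective_ttl(spec):
    """Use .spec.ttl if present, otherwise fall back to the default of 120 seconds."""
    ttl = spec.get("ttl")
    return DEFAULT_TTL_SECONDS if ttl is None else ttl


zones = {"ZROOT": "my-fancy-domain.com", "ZFOO": "foo.my-fancy-domain.com"}
zone_id = find_hosted_zone("api.bar.foo.my-fancy-domain.com", zones)
```

With the two zones above, the more specific zone ZFOO wins for api.bar.foo.my-fancy-domain.com because its name is the longer matching suffix.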
Example DNSRecord:
---
apiVersion: v1
kind: Secret
metadata:
  name: dnsrecord-bar-external
  namespace: shoot--foo--bar
type: Opaque
data:
  # aws-route53 specific credentials here
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: DNSRecord
metadata:
  name: dnsrecord-external
  namespace: default
spec:
  type: aws-route53
  secretRef:
    name: dnsrecord-bar-external
    namespace: shoot--foo--bar
  # region: eu-west-1
  # zone: ZFOO
  name: api.bar.foo.my-fancy-domain.com
  recordType: A
  values:
  - 1.2.3.4
  # ttl: 600
In order to support a new DNS record provider, you need to write a controller that watches all DNSRecords with .spec.type=<my-provider-name>.
You can take a look at the below referenced example implementation for the AWS route53 provider.
Key Names in Secrets Containing Provider-Specific Credentials
For compatibility with existing setups, extension controllers shall support two different namings of keys in secrets containing provider-specific credentials:
- The naming used by the external-dns-management DNS controller. For example, on AWS the key names are AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION.
- The naming used by other provider-specific extension controllers, e.g., for infrastructure. For example, on AWS the key names are accessKeyId, secretAccessKey, and region.
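One way to accept both namings is to look up each credential under a list of aliases, as in the Python sketch below. The key names come from the AWS example above; the function name, the normalized field names, and the choice to prefer the extension-style key when both are present are assumptions of this sketch. It also assumes the input is the .data map of a Kubernetes Secret, whose values are base64-encoded.

```python
import base64

# Normalized field -> accepted Secret keys, in (assumed) order of preference.
AWS_KEY_ALIASES = {
    "access_key_id": ("accessKeyId", "AWS_ACCESS_KEY_ID"),
    "secret_access_key": ("secretAccessKey", "AWS_SECRET_ACCESS_KEY"),
    "region": ("region", "AWS_REGION"),
}


def read_aws_credentials(secret_data):
    """Extract credentials from a Secret's .data map, whichever naming it uses."""
    credentials = {}
    for field, aliases in AWS_KEY_ALIASES.items():
        for key in aliases:
            if key in secret_data:
                credentials[field] = base64.b64decode(secret_data[key]).decode("utf-8")
                break  # first matching alias wins
    return credentials


def encode(value):
    """Helper for the example: base64-encode a string like a Secret would store it."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


dns_style = {"AWS_ACCESS_KEY_ID": encode("id-1"), "AWS_SECRET_ACCESS_KEY": encode("key-1")}
infra_style = {"accessKeyId": encode("id-2"), "secretAccessKey": encode("key-2"), "region": encode("eu-west-1")}
```

Both read_aws_credentials(dns_style) and read_aws_credentials(infra_style) return a dictionary with the same normalized field names, so the rest of the controller does not need to care which naming the Secret used.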
Avoiding Reading the DNS Hosted Zones
If the DNS hosted zone is not specified in the DNSRecord resource, during the first reconciliation the extension controller shall determine the correct DNS hosted zone for the specified FQDN and write it to the status of the resource:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: DNSRecord
metadata:
  name: dnsrecord-external
  namespace: shoot--foo--bar
spec:
  ...
status:
  lastOperation: ...
  zone: ZFOO
On subsequent reconciliations, the extension controller shall use the zone from the status and avoid reading the DNS hosted zones from the provider.
If the DNSRecord resource specifies a zone in .spec.zone and the extension controller has also written a value to .status.zone, the extension controller shall give priority to .spec.zone.
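The resulting precedence (.spec.zone, then .status.zone, then a provider lookup whose result is cached in the status) can be sketched in Python as follows. The function resolve_zone and the lookup_zone callback are hypothetical; the record is modeled as a plain dictionary rather than a real Kubernetes object.

```python
def resolve_zone(record, lookup_zone):
    """Pick the hosted zone for a DNSRecord, reading provider zones only if needed.

    `lookup_zone` is a callable taking the FQDN and returning a zone ID; it stands
    in for the (expensive) call that lists hosted zones at the provider.
    """
    spec_zone = record["spec"].get("zone")
    if spec_zone:
        return spec_zone  # an explicit zone in the spec always wins

    status = record.setdefault("status", {})
    if status.get("zone"):
        return status["zone"]  # reuse the zone determined during an earlier reconciliation

    zone = lookup_zone(record["spec"]["name"])
    status["zone"] = zone  # remember it so later reconciliations skip the lookup
    return zone


lookups = []


def fake_lookup(fqdn):
    """Example lookup that records how often it was called."""
    lookups.append(fqdn)
    return "ZFOO"


record = {"spec": {"name": "api.bar.foo.my-fancy-domain.com"}}
first = resolve_zone(record, fake_lookup)
second = resolve_zone(record, fake_lookup)
```

After the two calls, the provider lookup has run only once, because the second reconciliation reads the zone from the status.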
Non-Provider Specific Information Required for DNS Record Creation
Some providers might require further information that is not provider specific but already part of the shoot resource.
As Gardener cannot know which information is required by providers, it simply mirrors the Shoot, Seed, and CloudProfile resources into the seed.
They are part of the Cluster extension resource and can be used to extract information that is not part of the DNSRecord resource itself.
Using DNSRecord Resources
gardenlet manages DNSRecord resources for all three DNS records mentioned above (internal, external, and ingress).
In order to successfully reconcile a shoot with the feature gate enabled, extension controllers for DNSRecord resources for types used in the default, internal, and custom domain secrets should be registered via ControllerRegistration resources.
Note: For compatibility reasons, the spec.dns.providers section is still used to specify additional providers. Only the one marked as primary: true will be used for DNSRecord. All others are considered by the shoot-dns-service extension only (if deployed).
Support for DNSRecord Resources in the Provider Extensions
The following table contains information about the provider extension version that adds support for DNSRecord resources:
| Extension | Version |
| --- | --- |
| provider-alicloud | v1.26.0 |
| provider-aws | v1.27.0 |
| provider-azure | v1.21.0 |
| provider-gcp | v1.18.0 |
| provider-openstack | v1.21.0 |
| provider-vsphere | N/A |
| provider-equinix-metal | N/A |
| provider-kubevirt | N/A |
| provider-openshift | N/A |
Support for DNSRecord IPv6 recordType: AAAA in the Provider Extensions
The following table contains information about the provider extension version that adds support for DNSRecord IPv6 recordType: AAAA:
| Extension | Version |
| --- | --- |
| provider-alicloud | N/A |
| provider-aws | N/A |
| provider-azure | N/A |
| provider-gcp | N/A |
| provider-openstack | N/A |
| provider-vsphere | N/A |
| provider-equinix-metal | N/A |
| provider-kubevirt | N/A |
| provider-openshift | N/A |
| provider-local | v1.63.0 |
References and Additional Resources
8 - Extension
Contract: Extension Resource
Gardener defines common procedures which must be completed to create a functioning shoot cluster. Well-known steps are represented by special resources like Infrastructure, OperatingSystemConfig, or DNS. These resources are typically reconciled by dedicated controllers setting up the infrastructure on the hyperscaler or managing DNS entries, etc.
However, some requirements don’t match those special resources or don’t depend on being processed at a specific step in the creation/deletion flow of the shoot. They require a more generic hook. Therefore, Gardener offers the Extension resource.
What is required to register and support an Extension type?
Gardener creates one Extension resource per registered extension type in ControllerRegistration per shoot.
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
metadata:
  name: extension-example
spec:
  resources:
  - kind: Extension
    type: example
    globallyEnabled: true
    workerlessSupported: true
If spec.resources[].globallyEnabled is true, then the Extension resource of the given type is created for every shoot cluster. If it is set to false, the Extension resource is only created if configured in the Shoot manifest. In the case of a workerless Shoot, a globally enabled Extension resource is created only if spec.resources[].workerlessSupported is also set to true. If an extension configured in the spec of a workerless Shoot is not supported yet, the admission request will be rejected.
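These rules can be condensed into a small decision function, sketched here in Python. The function name and input shapes are hypothetical, and modeling the admission rejection as a raised ValueError is a simplification: in Gardener the rejection happens when the Shoot is admitted, not when the Extension resource is created.

```python
def should_create_extension(resource, shoot_extension_types, workerless):
    """Decide whether an Extension resource is created for a shoot.

    `resource` is one entry of spec.resources in a ControllerRegistration;
    `shoot_extension_types` is the set of types listed in the Shoot's spec.extensions.
    """
    configured = resource["type"] in shoot_extension_types
    if workerless and not resource.get("workerlessSupported", False):
        if configured:
            # Stands in for the rejected admission request.
            raise ValueError(f"extension {resource['type']!r} does not support workerless shoots")
        return False  # globally enabled, but not for workerless shoots
    return configured or resource.get("globallyEnabled", False)


global_ext = {"kind": "Extension", "type": "example", "globallyEnabled": True, "workerlessSupported": True}
optional_ext = {"kind": "Extension", "type": "opt", "globallyEnabled": False}
```

For instance, should_create_extension(global_ext, set(), workerless=True) is true, while optional_ext is only created for shoots that list the type opt in their spec.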
The Extension resources are created in the shoot namespace of the seed cluster.
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: example
  namespace: shoot--foo--bar
spec:
  type: example
  providerConfig: {}
Your controller needs to reconcile extensions.extensions.gardener.cloud. Since there can exist multiple Extension resources per shoot, each one holds a spec.type field to let controllers check their responsibility (similar to all other extension resources of Gardener).
ProviderConfig
It is possible to provide data in the Shoot resource which is copied to spec.providerConfig of the Extension resource.
---
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
spec:
  extensions:
  - type: example
    providerConfig:
      foo: bar
...
results in
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: example
  namespace: shoot--foo--bar
spec:
  type: example
  providerConfig:
    foo: bar
Shoot Reconciliation Flow and Extension Status
Gardener creates Extension resources as part of the Shoot reconciliation. Moreover, it is guaranteed that the Cluster resource exists before the Extension
resource is created. Extension
s can be reconciled at different stages during Shoot reconciliation depending on the defined extension lifecycle strategy in the respective ControllerRegistration resource. Please consult the Extension Lifecycle section for more information.
For an Extension
controller it is crucial to maintain the Extension
’s status correctly. At the end Gardener checks the status of each Extension
and only reports a successful shoot reconciliation if the state of the last operation is Succeeded
.
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  generation: 1
  name: example
  namespace: shoot--foo--bar
spec:
  type: example
status:
  lastOperation:
    state: Succeeded
  observedGeneration: 1
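The status check described above could look roughly like the following Python sketch. The function name `extension_succeeded` is hypothetical, and the additional comparison of `observedGeneration` with `metadata.generation` is an assumption based on the common Kubernetes convention (the example above shows both set to 1); the text itself only requires the last operation state to be `Succeeded`.

```python
def extension_succeeded(extension: dict) -> bool:
    """Return True if the Extension reports a successful, up-to-date reconciliation."""
    status = extension.get("status", {})
    last_operation = status.get("lastOperation", {})
    # Assumption: the status only counts if it refers to the current generation.
    up_to_date = status.get("observedGeneration") == extension["metadata"].get("generation")
    return up_to_date and last_operation.get("state") == "Succeeded"


reconciled = {
    "metadata": {"generation": 1, "name": "example", "namespace": "shoot--foo--bar"},
    "spec": {"type": "example"},
    "status": {"lastOperation": {"state": "Succeeded"}, "observedGeneration": 1},
}
stale = {
    "metadata": {"generation": 2, "name": "example", "namespace": "shoot--foo--bar"},
    "spec": {"type": "example"},
    "status": {"lastOperation": {"state": "Succeeded"}, "observedGeneration": 1},
}
```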
9 - Infrastructure
Contract: Infrastructure
Resource
Every Kubernetes cluster requires some low-level infrastructure to be set up in order to work properly.
Examples for that are networks, routing entries, security groups, IAM roles, etc.
Before introducing the Infrastructure
extension resource, Gardener was using Terraform in order to create and manage these provider-specific resources (e.g., see here).
Now, Gardener commissions an external, provider-specific controller to take over this task.
Which infrastructure resources are required?
Unfortunately, there is no general answer to this question as it is highly provider-specific. Consider resources such as the ones mentioned above: VPC, subnets, route tables, security groups, IAM roles, SSH key pairs. Most of these resources are required in order to create VMs (the shoot cluster worker nodes), load balancers, and volumes.
What needs to be implemented to support a new infrastructure provider?
As part of the shoot flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
spec:
  type: azure
  region: eu-west-1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  providerConfig:
    apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureConfig
    resourceGroup:
      name: mygroup
    networks:
      vnet: # specify either 'name' or 'cidr'
      # name: my-vnet
        cidr: 10.250.0.0/16
      workers: 10.250.0.0/19
The .spec.secretRef
contains a reference to the provider secret pointing to the account that shall be used to create the needed resources.
However, the most important section is the .spec.providerConfig
.
It contains an embedded declaration of the provider specific configuration for the infrastructure (that cannot be known by Gardener itself).
You are responsible for designing what this configuration looks like.
Gardener does not evaluate it but just copies this part from what has been provided by the end-user in the Shoot
resource.
After your controller has created the required resources in your provider’s infrastructure, it needs to generate an output that can be used by other controllers in subsequent steps.
An example for that is the Worker
extension resource controller.
It is responsible for creating virtual machines (shoot worker nodes) in this prepared infrastructure.
Everything that it needs to know in order to do that (e.g., the network IDs and security group names, which are again provider-specific) needs to be provided as output in the Infrastructure
resource:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
spec:
  ...
status:
  lastOperation: ...
  providerStatus:
    apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureStatus
    resourceGroup:
      name: mygroup
    networks:
      vnet:
        name: my-vnet
      subnets:
      - purpose: nodes
        name: my-subnet
    availabilitySets:
    - purpose: nodes
      id: av-set-id
      name: av-set-name
    routeTables:
    - purpose: nodes
      name: route-table-name
    securityGroups:
    - purpose: nodes
      name: sec-group-name
In order to support a new infrastructure provider you need to write a controller that watches all Infrastructure
s with .spec.type=<my-provider-name>
.
You can take a look at the below referenced example implementation for the Azure provider.
Dynamic nodes network for shoot clusters
Some environments do not allow end-users to statically define a CIDR for the network that shall be used for the shoot worker nodes.
In these cases it is possible for the extension controllers to dynamically provision a network for the nodes (as part of their reconciliation loops), and to provide the CIDR in the status
of the Infrastructure
resource:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Infrastructure
metadata:
  name: infrastructure
  namespace: shoot--foo--bar
spec:
  ...
status:
  lastOperation: ...
  providerStatus: ...
  nodesCIDR: 10.250.0.0/16
Gardener will pick this nodesCIDR
and use it to configure the VPN components to establish network connectivity between the control plane and the worker nodes.
If the Shoot
resource already specifies a nodes CIDR in .spec.networking.nodes
and the extension controller also provides a value in .status.nodesCIDR
in the Infrastructure
resource, then Gardener will always give the latter higher priority.
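The precedence rule can be summarized with a small Python sketch. The function name `effective_nodes_cidr` is hypothetical and the resources are modeled as plain dictionaries; it is an illustration of the rule stated above, not Gardener's code.

```python
from typing import Optional


def effective_nodes_cidr(shoot: dict, infrastructure: dict) -> Optional[str]:
    """Pick the nodes CIDR, preferring the value reported by the extension."""
    from_status = infrastructure.get("status", {}).get("nodesCIDR")
    if from_status:
        # A dynamically provisioned network always wins over the Shoot spec.
        return from_status
    return shoot.get("spec", {}).get("networking", {}).get("nodes")


shoot = {"spec": {"networking": {"nodes": "10.180.0.0/16"}}}
dynamic_infra = {"status": {"nodesCIDR": "10.250.0.0/16"}}
static_infra = {"status": {}}
```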
Non-provider specific information required for infrastructure creation
Some providers might require further information that is not provider specific but already part of the shoot resource.
One example for this is the GCP infrastructure controller which needs the pod and the service network of the cluster in order to prepare and configure the infrastructure correctly.
As Gardener cannot know which information is required by providers, it simply mirrors the Shoot
, Seed
, and CloudProfile
resources into the seed.
They are part of the Cluster
extension resource and can be used to extract information that is not part of the Infrastructure
resource itself.
Implementation details
Actuator
interface
Most existing infrastructure controller implementations follow a common pattern where a generic Reconciler
delegates to an Actuator
interface that contains the methods Reconcile
, Delete
, Migrate
, and Restore
. These methods are called by the generic Reconciler
for the respective operations, and should be implemented by the extension according to the contract described here and the migration guidelines.
ConfigValidator
interface
For infrastructure controllers, the generic Reconciler
also delegates to a ConfigValidator
interface that contains a single Validate
method. This method is called by the generic Reconciler
at the beginning of every reconciliation, and can be implemented by the extension to validate the .spec.providerConfig
part of the Infrastructure
resource with the respective cloud provider, typically the existence and validity of cloud provider resources such as AWS VPCs or GCP Cloud NAT IPs.
The Validate
method returns a list of errors. If this list is non-empty, the generic Reconciler
will fail with an error. This error will have the error code ERR_CONFIGURATION_PROBLEM
, unless there is at least one error in the list that has its ErrorType
field set to field.ErrorTypeInternal
.
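The error code rule for the Validate result can be sketched as follows. This is a Python illustration of the decision only; the actual interfaces are Go types, the names `FieldError` and `classify` are hypothetical, and the string constants are stand-ins for `field.ErrorTypeInternal` and a non-internal error type.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Stand-ins for the Go identifiers mentioned above (values are illustrative).
ERROR_TYPE_INTERNAL = "InternalError"
ERROR_TYPE_INVALID = "FieldValueInvalid"
ERR_CONFIGURATION_PROBLEM = "ERR_CONFIGURATION_PROBLEM"


@dataclass
class FieldError:
    error_type: str
    detail: str


def classify(errors: List[FieldError]) -> Tuple[bool, Optional[str]]:
    """Return (reconciliation_fails, error_code) for a Validate result."""
    if not errors:
        return False, None
    if any(e.error_type == ERROR_TYPE_INTERNAL for e in errors):
        # At least one internal error: fail, but without the configuration code.
        return True, None
    return True, ERR_CONFIGURATION_PROBLEM


config_errors = [FieldError(ERROR_TYPE_INVALID, "VPC vpc-123 does not exist")]
mixed_errors = config_errors + [FieldError(ERROR_TYPE_INTERNAL, "cloud API unreachable")]
```

The point of the distinction is that a failure caused by the provider API being unavailable should not be reported to the end-user as a mistake in their configuration.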
References and additional resources
10 - Network
Contract: Network
Resource
Gardener is an open-source project that provides a nested user model. Basically, there are two types of services provided by Gardener to its users:
- Managed: end-users only request a Kubernetes cluster (Clusters-as-a-Service)
- Hosted: operators utilize Gardener to provide their own managed version of Kubernetes (Cluster-Provisioner-as-a-service)
Whether a user is an operator or an end-user, it makes sense to provide choice. For example, for an end-user it might make sense to choose a network-plugin that would support enforcing network policies (some plugins do not come with network-policy support by default). For operators however, choice only matters for delegation purposes, i.e., when providing an own managed-service, it becomes important to also provide choice over which network-plugins to use.
Furthermore, Gardener provisions clusters on different cloud-providers with different networking requirements. For example, Azure does not support Calico overlay networking with IP in IP [1]. This leads to the introduction of manual exceptions in static add-on charts, which is error-prone and can lead to failures during upgrades.
Finally, every provider is different, and thus the network always needs to adapt to the infrastructure needs to provide better performance. Consistency does not necessarily lie in the implementation but in the interface.
Motivation
Prior to the Network Extensibility
concept, Gardener followed a mono network-plugin support model (i.e., Calico). Although this seemed to be the easier approach, it did not completely reflect the real use-case.
The goal of the Gardener Network Extensions is to support different network plugins, therefore, the specification for the network resource won’t be fixed and will be customized based on the underlying network plugin.
To do so, a ProviderConfig
field in the spec will be provided where each plugin will define its own configuration. Below is an example for how to deploy Calico as the cluster network plugin.
The Network Extensions Resource
Here is what a typical Network
resource would look like:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Network
metadata:
  name: my-network
spec:
  ipFamilies:
  - IPv4
  podCIDR: 100.244.0.0/16
  serviceCIDR: 100.32.0.0/13
  type: calico
  providerConfig:
    apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
    kind: NetworkConfig
    backend: bird
    ipam:
      cidr: usePodCIDR
      type: host-local
The above resource is divided into two parts (more information can be found at Using the Networking Calico Extension):
- global configuration (e.g., podCIDR, serviceCIDR, and type)
- provider specific config (e.g., for calico we can choose to configure a
bird
backend)
Note: Certain cloud-provider extensions might have webhooks that modify the Network resource to fit into their network-specific context. As previously mentioned, Azure does not support IPIP. As a result, the Azure provider extension implements a webhook to mutate the backend and set it to None instead of bird.
Supporting a New Network Extension Provider
To add support for another networking provider (e.g., weave, Cilium, Flannel), a network extension controller needs to be implemented, which would optionally have its own custom configuration specified in the spec.providerConfig
in the Network
resource. For example, if support for a network plugin named gardenet
is required, the following Network
resource would be created:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Network
metadata:
  name: my-network
spec:
  ipFamilies:
  - IPv4
  podCIDR: 100.244.0.0/16
  serviceCIDR: 100.32.0.0/13
  type: gardenet
  providerConfig:
    apiVersion: gardenet.networking.extensions.gardener.cloud/v1alpha1
    kind: NetworkConfig
    gardenetCustomConfigField: <value>
    ipam:
      cidr: usePodCIDR
      type: host-local
Once applied, the presumably implemented Gardenet
extension controller would pick the configuration up, parse the providerConfig
, and create the necessary resources in the shoot.
For additional reference, please have a look at the networking-calico provider extension, which provides more information on how to configure the necessary charts, as well as the actuators required to reconcile networking inside the Shoot
cluster to the desired state.
Supporting kube-proxy
-less Service Routing
Some networking extensions support service routing without the kube-proxy
component. This is why Gardener supports disabling of kube-proxy
for service routing by setting .spec.kubernetes.kubeProxy.enabled
to false
in the Shoot
specification. The implicit contract of the flag is:
If kube-proxy
is disabled, then the networking extension is responsible for the service routing.
The networking extensions need to handle this twofold:
- During the reconciliation of the networking resources, the extension needs to check whether kube-proxy takes care of the service routing or the networking extension itself should handle it. In case the networking extension should be responsible according to .spec.kubernetes.kubeProxy.enabled (but is unable to perform the service routing), it should raise an error during the reconciliation. If the networking extension should handle the service routing, it may reconfigure itself accordingly.
- (Optional) In case the networking extension does not support taking over the service routing (in some scenarios), it is recommended to also provide a validating admission webhook to reject corresponding changes early on. The validation may take the current operating mode of the networking extension into consideration.
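The reconciliation-time check from the first item can be sketched like this in Python. The names `decide_service_routing` and `ReconcileError` are hypothetical, and the flag is passed in as a plain boolean instead of being read from the Shoot; a real extension would implement the equivalent logic in its actuator.

```python
class ReconcileError(Exception):
    """Raised when the networking extension cannot fulfil the contract."""


def decide_service_routing(kube_proxy_enabled: bool, can_route_services: bool) -> str:
    """Return which component is responsible for service routing."""
    if kube_proxy_enabled:
        return "kube-proxy"
    if not can_route_services:
        # Contract violation: nobody would handle service routing.
        raise ReconcileError(
            "kube-proxy is disabled, but this networking extension "
            "cannot perform service routing"
        )
    return "extension"
```

Failing the reconciliation loudly, instead of silently leaving the cluster without service routing, is what makes the problem visible in the Shoot status.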
Related Links
11 - OperatingSystemConfig
Contract: OperatingSystemConfig
Resource
Gardener uses the machine API and leverages the functionalities of the machine-controller-manager (MCM) in order to manage the worker nodes of a shoot cluster. The machine-controller-manager itself simply takes a reference to an OS-image and (optionally) some user-data (a script or configuration that is executed when a VM is bootstrapped), and forwards both to the provider’s API when creating VMs. MCM does not have any restrictions regarding supported operating systems as it does not modify or influence the machine’s configuration in any way - it just creates/deletes machines with the provided metadata.
Consequently, Gardener needs to provide this information when interacting with the machine-controller-manager. This means that basically any operating system can be used, as long as there is some implementation that generates the OS-specific configuration in order to provision/bootstrap the machines.
⚠️ Currently, there are a few requirements of pre-installed components that must be present in all OS images:
- containerd
  - ctr (client CLI)
  - containerd must listen on its default socket path: unix:///run/containerd/containerd.sock
  - containerd must be configured to work with the default configuration file in: /etc/containerd/config.toml (eventually created by Gardener).
- systemd
The reasons for that will become evident later.
What does the user-data bootstrapping the machines contain?
Gardener installs a few components onto every worker machine in order to allow it to join the shoot cluster.
There is the kubelet
process, some scripts for continuously checking the health of kubelet
and containerd
, but also configuration for log rotation, CA certificates, etc.
You can find the complete configuration at the components folder. We are calling this the “original” user-data.
How does Gardener bootstrap the machines?
gardenlet
makes use of gardener-node-agent
to perform the bootstrapping and reconciliation of systemd units and files on the machine.
Please refer to this document for a first overview.
Usually, you would submit all the components you want to install onto the machine as part of the user-data during creation time.
However, some providers do have a size limitation (around 16KB) for that user-data.
That’s why we do not send the “original” user-data to the machine-controller-manager (which then forwards it to the provider’s API).
Instead, we only send a small “init” script that bootstraps the gardener-node-agent
.
It fetches the “original” content from a Secret
and applies it on the machine directly.
This way we can extend the “original” user-data without any size restrictions (except for the 1 MB
limit for Secret
s).
The high-level flow is as follows:
1. For every worker pool X in the Shoot specification, Gardener creates a Secret named cloud-config-<X> in the kube-system namespace of the shoot cluster. The secret contains the “original” OperatingSystemConfig (i.e., systemd units and files for kubelet).
2. Gardener generates a kubeconfig with minimal permissions just allowing reading these secrets. It is used by the gardener-node-agent later.
3. Gardener provides the gardener-node-init.sh bash script and the machine image stated in the Shoot specification to the machine-controller-manager.
4. Based on this information, the machine-controller-manager creates the VM.
5. After the VM has been provisioned, the gardener-node-init.sh script starts, fetches the gardener-node-agent binary, and starts it.
6. The gardener-node-agent will read the gardener-node-agent-<X> Secret for its worker pool (containing the “original” OperatingSystemConfig), and reconciles it.
The gardener-node-agent
can update itself in case of newer Gardener versions, and it performs a continuous reconciliation of the systemd units and files in the provided OperatingSystemConfig
(just like any other Kubernetes controller).
What needs to be implemented to support a new operating system?
As part of the Shoot
reconciliation flow, gardenlet
will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: OperatingSystemConfig
metadata:
  name: pool-01-original
  namespace: default
spec:
  type: <my-operating-system>
  purpose: reconcile
  units:
  - name: containerd.service
    dropIns:
    - name: 10-containerd-opts.conf
      content: |
        [Service]
        Environment="SOME_OPTS=--foo=bar"
  - name: containerd-monitor.service
    command: start
    enable: true
    content: |
      [Unit]
      Description=Containerd-monitor daemon
      After=kubelet.service
      [Install]
      WantedBy=multi-user.target
      [Service]
      Restart=always
      EnvironmentFile=/etc/environment
      ExecStart=/opt/bin/health-monitor containerd
  files:
  - path: /var/lib/kubelet/ca.crt
    permissions: 0644
    encoding: b64
    content:
      secretRef:
        name: default-token-5dtjz
        dataKey: token
  - path: /etc/sysctl.d/99-k8s-general.conf
    permissions: 0644
    content:
      inline:
        data: |
          # A higher vm.max_map_count is great for elasticsearch, mongo, or other mmap users
          # See https://github.com/kubernetes/kops/issues/1340
          vm.max_map_count = 135217728
In order to support a new operating system, you need to write a controller that watches all OperatingSystemConfig
s with .spec.type=<my-operating-system>
.
For those it shall generate a configuration blob that fits to your operating system.
OperatingSystemConfig
s can have two purposes: either provision
or reconcile
.
provision
Purpose
The provision
purpose is used by gardenlet
for the user-data that it later passes to the machine-controller-manager (and then to the provider’s API) when creating new VMs.
It contains the gardener-node-init.sh
script and systemd unit.
The OS controller has to translate the .spec.units
and .spec.files
into configuration that fits to the operating system.
For example, a Flatcar controller might generate a CoreOS cloud-config or Ignition, SLES might generate cloud-init, and others might simply generate a bash script translating the .spec.units
into systemd
units, and .spec.files
into real files on the disk.
⚠️ Please avoid mixing in additional systemd units or files - this step should just translate what gardenlet put into .spec.units and .spec.files.
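As a rough illustration of the "simply generate a bash script" option mentioned above, the following Python sketch renders .spec.units and .spec.files into a script. Everything here is an assumption made for the example: the function names, the unit directory /etc/systemd/system, the handling of inline file content only, and permissions given as integers. It is not how any particular OS extension is implemented.

```python
import base64
import posixpath
import shlex

SYSTEMD_DIR = "/etc/systemd/system"  # assumed location for unit files


def _write_file(path: str, data: str, mode: int) -> list:
    """Emit shell commands that create `path` with `data` and the given mode."""
    encoded = base64.b64encode(data.encode()).decode()
    quoted = shlex.quote(path)
    return [
        f"mkdir -p {shlex.quote(posixpath.dirname(path))}",
        f"echo {encoded} | base64 -d > {quoted}",
        f"chmod {mode:04o} {quoted}",
    ]


def render_bash(units: list, files: list) -> str:
    lines = ["#!/bin/bash", "set -e"]
    for f in files:
        inline = f["content"].get("inline")
        if inline is None:
            raise ValueError(f"{f['path']}: only inline content is handled in this sketch")
        lines += _write_file(f["path"], inline["data"], f.get("permissions", 0o644))
    for unit in units:
        unit_path = posixpath.join(SYSTEMD_DIR, unit["name"])
        if "content" in unit:
            lines += _write_file(unit_path, unit["content"], 0o644)
        for drop_in in unit.get("dropIns", []):
            drop_in_path = posixpath.join(unit_path + ".d", drop_in["name"])
            lines += _write_file(drop_in_path, drop_in["content"], 0o644)
    lines.append("systemctl daemon-reload")
    for unit in units:
        name = shlex.quote(unit["name"])
        if unit.get("enable"):
            lines.append(f"systemctl enable {name}")
        if unit.get("command"):
            lines.append(f"systemctl {unit['command']} {name}")
    return "\n".join(lines) + "\n"


units = [{
    "name": "containerd-monitor.service",
    "command": "start",
    "enable": True,
    "content": "[Unit]\nDescription=Containerd-monitor daemon\n",
}]
files = [{
    "path": "/etc/sysctl.d/99-k8s-general.conf",
    "permissions": 0o644,
    "content": {"inline": {"data": "vm.max_map_count = 135217728\n"}},
}]
script = render_bash(units, files)
```

Passing the content through base64 avoids quoting problems with arbitrary file content, and the sketch deliberately adds nothing beyond what is in the given units and files.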
After generation, extension controllers are asked to store their OS config inside a Secret
(as it might contain confidential data) in the same namespace.
The secret’s .data
could look like this:
apiVersion: v1
kind: Secret
metadata:
  name: osc-result-pool-01-original
  namespace: default
  ownerReferences:
  - apiVersion: extensions.gardener.cloud/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: OperatingSystemConfig
    name: pool-01-original
    uid: 99c0c5ca-19b9-11e9-9ebd-d67077b40f82
data:
  cloud_config: base64(generated-user-data)
Finally, the secret’s metadata must be provided in the OperatingSystemConfig
’s .status
field:
...
status:
  cloudConfig:
    secretRef:
      name: osc-result-pool-01-original
      namespace: default
  lastOperation:
    description: Successfully generated cloud config
    lastUpdateTime: "2019-01-23T07:45:23Z"
    progress: 100
    state: Succeeded
    type: Reconcile
  observedGeneration: 5
reconcile
Purpose
The reconcile
purpose contains the “original” OperatingSystemConfig
(which is later stored in Secret
s in the shoot’s kube-system
namespace (see step 1)).
The OS controller does not need to translate anything here, but it has the option to provide additional systemd units or files via the .status
field:
status:
  extensionUnits:
  - name: my-custom-service.service
    command: start
    enable: true
    content: |
      [Unit]
      # some systemd unit content
  extensionFiles:
  - path: /etc/some/file
    permissions: 0644
    content:
      inline:
        data: some-file-content
  lastOperation:
    description: Successfully generated cloud config
    lastUpdateTime: "2019-01-23T07:45:23Z"
    progress: 100
    state: Succeeded
    type: Reconcile
  observedGeneration: 5
The gardener-node-agent
will merge .spec.units
and .status.extensionUnits
as well as .spec.files
and .status.extensionFiles
when applying.
You can find an example implementation here.
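The merge of spec and status entries can be pictured with the following Python sketch. It assumes the simplest possible semantics, a plain concatenation in which the extension-provided entries follow the ones from the spec; the function name `merged_config` is hypothetical, and the real gardener-node-agent may apply additional rules that are not described here.

```python
from typing import List, Tuple


def merged_config(osc: dict) -> Tuple[List[dict], List[dict]]:
    """Combine spec and status entries of an OperatingSystemConfig (assumed: concatenation)."""
    spec, status = osc.get("spec", {}), osc.get("status", {})
    units = list(spec.get("units", [])) + list(status.get("extensionUnits", []))
    files = list(spec.get("files", [])) + list(status.get("extensionFiles", []))
    return units, files


osc = {
    "spec": {
        "units": [{"name": "containerd-monitor.service"}],
        "files": [{"path": "/var/lib/kubelet/ca.crt"}],
    },
    "status": {
        "extensionUnits": [{"name": "my-custom-service.service"}],
        "extensionFiles": [{"path": "/etc/some/file"}],
    },
}
units, files = merged_config(osc)
```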
Bootstrap Tokens
gardenlet
adds a file with the content <<BOOTSTRAP_TOKEN>>
to the OperatingSystemConfig
with purpose provision
and sets transmitUnencoded=true
.
This instructs the responsible OS extension to pass this file (with its content in clear-text) to the corresponding Worker
resource.
machine-controller-manager
makes sure that:
- a bootstrap token gets created per machine
- the
<<BOOTSTRAP_TOKEN>>
string in the user data of the machine gets replaced by the generated token
After the machine has been bootstrapped, the token secret in the shoot cluster gets deleted again.
The token is used to bootstrap Gardener Node Agent and kubelet
.
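The placeholder replacement can be illustrated with a short Python sketch. The function name `inject_bootstrap_token`, the file path in the sample user data, and the decision to fail when the placeholder is missing are assumptions for this example; the token value is only a sample in the usual Kubernetes bootstrap token format.

```python
BOOTSTRAP_TOKEN_PLACEHOLDER = "<<BOOTSTRAP_TOKEN>>"


def inject_bootstrap_token(user_data: str, token: str) -> str:
    """Replace the placeholder in clear-text user data with a per-machine token."""
    if BOOTSTRAP_TOKEN_PLACEHOLDER not in user_data:
        # Assumption: a missing placeholder is treated as an error in this sketch.
        raise ValueError("user data does not contain the bootstrap token placeholder")
    return user_data.replace(BOOTSTRAP_TOKEN_PLACEHOLDER, token)


# Hypothetical user data fragment; the target path is made up for illustration.
user_data = "echo '<<BOOTSTRAP_TOKEN>>' > /path/to/bootstrap-token\n"
rendered = inject_bootstrap_token(user_data, "07401b.f395accd246ae52d")
```

This also shows why the file must be transmitted unencoded: a plain string replacement cannot find the placeholder inside base64-encoded content.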
What needs to be implemented to support a new operating system?
As part of the shoot flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: OperatingSystemConfig
metadata:
  name: pool-01-original
  namespace: default
spec:
  type: <my-operating-system>
  purpose: reconcile
  units:
  - name: docker.service
    dropIns:
    - name: 10-docker-opts.conf
      content: |
        [Service]
        Environment="DOCKER_OPTS=--log-opt max-size=60m --log-opt max-file=3"
  - name: docker-monitor.service
    command: start
    enable: true
    content: |
      [Unit]
      Description=Docker-monitor daemon
      After=kubelet.service
      [Install]
      WantedBy=multi-user.target
      [Service]
      Restart=always
      EnvironmentFile=/etc/environment
      ExecStart=/opt/bin/health-monitor docker
  files:
  - path: /var/lib/kubelet/ca.crt
    permissions: 0644
    encoding: b64
    content:
      secretRef:
        name: default-token-5dtjz
        dataKey: token
  - path: /etc/sysctl.d/99-k8s-general.conf
    permissions: 0644
    content:
      inline:
        data: |
          # A higher vm.max_map_count is great for elasticsearch, mongo, or other mmap users
          # See https://github.com/kubernetes/kops/issues/1340
          vm.max_map_count = 135217728
In order to support a new operating system, you need to write a controller that watches all OperatingSystemConfig
s with .spec.type=<my-operating-system>
.
For those it shall generate a configuration blob that fits to your operating system.
For example, a CoreOS controller might generate a CoreOS cloud-config or Ignition, SLES might generate cloud-init, and others might simply generate a bash script translating the .spec.units
into systemd
units, and .spec.files
into real files on the disk.
OperatingSystemConfig
s can have two purposes which can be used (or ignored) by the extension controllers: either provision
or reconcile
.
- The provision purpose is used by Gardener for the user-data that it later passes to the machine-controller-manager (and then to the provider’s API) when creating new VMs. It contains the gardener-node-init unit.
- The reconcile purpose contains the “original” user-data (that is then stored in Secrets in the shoot’s kube-system namespace (see step 1)). This is downloaded and applied later (see step 5).
As described above, the “original” user-data must be re-applicable to allow in-place updates.
How this is done is specific to the generated operating system config (e.g., for CoreOS cloud-init the command is /usr/bin/coreos-cloudinit --from-file=<path>
, whereas SLES would run cloud-init --file <path> single -n write_files --frequency=once
).
Consequently, besides the generated OS config, the extension controller must also provide a command for re-applying an updated version of the user-data.
As visible in the mentioned examples, the command requires a path to the user-data file.
As soon as Gardener detects that the user data has changed, it will reload the systemd daemon and restart all the units provided in the .status.units[]
list (see the below example). The same logic applies during the very first application of the whole configuration.
After generation, extension controllers are asked to store their OS config inside a Secret
(as it might contain confidential data) in the same namespace.
The secret’s .data
could look like this:
apiVersion: v1
kind: Secret
metadata:
  name: osc-result-pool-01-original
  namespace: default
  ownerReferences:
  - apiVersion: extensions.gardener.cloud/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: OperatingSystemConfig
    name: pool-01-original
    uid: 99c0c5ca-19b9-11e9-9ebd-d67077b40f82
data:
  cloud_config: base64(generated-user-data)
Finally, the secret’s metadata, the OS-specific command to re-apply the configuration, and the list of systemd
units that shall be considered to be restarted if an updated version of the user-data is re-applied must be provided in the OperatingSystemConfig
’s .status
field:
...
status:
  cloudConfig:
    secretRef:
      name: osc-result-pool-01-original
      namespace: default
  lastOperation:
    description: Successfully generated cloud config
    lastUpdateTime: "2019-01-23T07:45:23Z"
    progress: 100
    state: Succeeded
    type: Reconcile
  observedGeneration: 5
  units:
  - docker-monitor.service
Once the .status
indicates that the extension controller finished reconciling, Gardener will continue with the next step of the shoot reconciliation flow.
CRI Support
Gardener supports specifying a Container Runtime Interface (CRI) configuration in the OperatingSystemConfig
resource. If the .spec.cri
section exists, then the name
property is mandatory. The only supported value for cri.name
at the moment is: containerd
.
For example:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: OperatingSystemConfig
metadata:
  name: pool-01-original
  namespace: default
spec:
  type: <my-operating-system>
  purpose: reconcile
  cri:
    name: containerd
#   cgroupDriver: cgroupfs # or systemd
    containerd:
      sandboxImage: registry.k8s.io/pause
#     registries:
#     - upstream: docker.io
#       server: https://registry-1.docker.io
#       hosts:
#       - url: http://<service-ip>:<port>
#     plugins:
#     - op: add # add (default) or remove
#       path: [io.containerd.grpc.v1.cri, containerd]
#       values: '{"default_runtime_name": "runc"}'
...
To support containerd
, an OS extension must satisfy the following criteria:
- The operating system must have built-in containerd and ctr (client CLI).
- containerd must listen on its default socket path: unix:///run/containerd/containerd.sock
- containerd must be configured to work with the default configuration file in: /etc/containerd/config.toml (created by Gardener).
For convenient handling, gardener-node-agent can manage various aspects of containerd’s config, e.g., the registry configuration, if given in the OperatingSystemConfig
.
Any Gardener extension that needs to modify the config should check the functionality exposed through this API first.
If applicable, adjustments can be implemented through mutating webhooks, acting on the created or updated OperatingSystemConfig
resource.
If CRI configurations are not supported, it is recommended to create a validating webhook running in the garden cluster that prevents specifying the .spec.providers.workers[].cri
section in the Shoot
objects.
cgroup driver
For Shoot clusters using Kubernetes < 1.31, Gardener sets the kubelet’s cgroup driver to cgroupfs
and leaves containerd’s cgroup driver unmanaged. For Shoot clusters using Kubernetes 1.31+, Gardener sets both kubelet’s and containerd’s cgroup driver to systemd
.
The systemd
cgroup driver is a requirement for operating systems using cgroup v2. It’s important to ensure that both kubelet and the container runtime (containerd) are using the same cgroup driver to avoid potential issues.
OS extensions might also overwrite the cgroup driver for containerd and kubelet.
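The version-dependent defaults described above can be expressed as a small Python sketch. The function name `default_cgroup_drivers` and the use of `None` to represent "unmanaged" are choices made for this illustration; overrides by OS extensions are not modeled.

```python
def default_cgroup_drivers(kubernetes_version: str) -> dict:
    """Return the default cgroup drivers for a Shoot's Kubernetes version."""
    major, minor = (int(part) for part in kubernetes_version.lstrip("v").split(".")[:2])
    if (major, minor) >= (1, 31):
        return {"kubelet": "systemd", "containerd": "systemd"}
    # None stands for "unmanaged": containerd keeps whatever the OS image configures.
    return {"kubelet": "cgroupfs", "containerd": None}
```

For example, `default_cgroup_drivers("1.30.4")` yields `cgroupfs` for the kubelet, while `default_cgroup_drivers("v1.31.0")` yields `systemd` for both components.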
References and Additional Resources
12 - Worker
Contract: Worker
Resource
While the control plane of a shoot cluster is living in the seed and deployed as native Kubernetes workload, the worker nodes of the shoot clusters are normal virtual machines (VMs) in the end-user’s infrastructure account.
The Gardener project features a sub-project called machine-controller-manager.
This controller is extending the Kubernetes API using custom resource definitions to represent actual VMs as Machine
objects inside a Kubernetes system.
This approach unlocks the possibility to manage virtual machines in the Kubernetes style and benefit from all its design principles.
What is the machine-controller-manager doing exactly?
Generally, there are provider-specific MachineClass
objects (AWSMachineClass
, AzureMachineClass
, etc.; similar to StorageClass
), and MachineDeployment
, MachineSet
, and Machine
objects (similar to Deployment
, ReplicaSet
, and Pod
).
A machine class describes where and how to create virtual machines (in which networks, region, availability zone, SSH key, user-data for bootstrapping, etc.), while a Machine
results in an actual virtual machine.
You can find more information in the machine-controller-manager’s repository.
The gardenlet
deploys the machine-controller-manager
, hence, provider extensions only have to inject their specific out-of-tree machine-controller-manager
sidecar container into the Deployment
.
What needs to be implemented to support a new worker provider?
As part of the shoot flow, Gardener will create a special CRD in the seed cluster that needs to be reconciled by an extension controller, for example:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Worker
metadata:
  name: bar
  namespace: shoot--foo--bar
spec:
  type: aws
  region: eu-west-1
  secretRef:
    name: cloudprovider
    namespace: shoot--foo--bar
  infrastructureProviderStatus:
    apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
    kind: InfrastructureStatus
    ec2:
      keyName: shoot--foo--bar-ssh-publickey
    iam:
      instanceProfiles:
      - name: shoot--foo--bar-nodes
        purpose: nodes
      roles:
      - arn: arn:aws:iam::0123456789:role/shoot--foo--bar-nodes
        purpose: nodes
    vpc:
      id: vpc-0123456789
      securityGroups:
      - id: sg-1234567890
        purpose: nodes
      subnets:
      - id: subnet-01234
        purpose: nodes
        zone: eu-west-1b
      - id: subnet-56789
        purpose: public
        zone: eu-west-1b
      - id: subnet-0123a
        purpose: nodes
        zone: eu-west-1c
      - id: subnet-5678a
        purpose: public
        zone: eu-west-1c
  pools:
  - name: cpu-worker
    minimum: 3
    maximum: 5
    maxSurge: 1
    maxUnavailable: 0
    machineType: m4.large
    machineImage:
      name: coreos
      version: 1967.5.0
    nodeAgentSecretName: gardener-node-agent-local-ee46034b8269353b
    nodeTemplate:
      capacity:
        cpu: 2
        gpu: 0
        memory: 8Gi
    labels:
      node.kubernetes.io/role: node
      worker.gardener.cloud/cri-name: containerd
      worker.gardener.cloud/pool: cpu-worker
      worker.gardener.cloud/system-components: "true"
    userDataSecretRef:
      name: user-data-secret
      key: cloud_config
    volume:
      size: 20Gi
      type: gp2
    zones:
    - eu-west-1b
    - eu-west-1c
    machineControllerManager:
      drainTimeout: 10m
      healthTimeout: 10m
      creationTimeout: 10m
      maxEvictRetries: 30
      nodeConditions:
      - ReadonlyFilesystem
      - DiskPressure
      - KernelDeadlock
    clusterAutoscaler:
      scaleDownUtilizationThreshold: 0.5
      scaleDownGpuUtilizationThreshold: 0.5
      scaleDownUnneededTime: 30m
      scaleDownUnreadyTime: 1h
      maxNodeProvisionTime: 15m
The .spec.secretRef
contains a reference to the provider secret pointing to the account that shall be used to create the needed virtual machines.
Also, as you can see, Gardener copies the output of the infrastructure creation (.spec.infrastructureProviderStatus
, see Infrastructure
resource), into the .spec
.
In the .spec.pools[]
field, the desired worker pools are listed.
In the above example, one pool with machine type m4.large
and min=3
, max=5
machines shall be spread over two availability zones (eu-west-1b
, eu-west-1c
).
This information together with the infrastructure status must be used to determine the proper configuration for the machine classes.
The spec.pools[].labels
map contains all labels that should be added to all nodes of the corresponding worker pool.
Gardener configures kubelet’s --node-labels
flag to contain all labels that are mentioned here and allowed by the NodeRestriction
admission plugin.
This makes sure that kubelet adds all user-specified and gardener-managed labels to the new Node
object when registering a new machine with the API server.
Nevertheless, this is only effective when bootstrapping new nodes.
The provider extension (respectively, machine-controller-manager) is still responsible for updating the labels of existing Nodes
when the worker specification changes.
The spec.pools[].nodeTemplate.capacity
field contains the resource information of the machine like cpu
, gpu
, and memory
. Cluster Autoscaler uses this information to generate a nodeTemplate
when scaling the nodeGroup
from zero.
The spec.pools[].machineControllerManager
field configures the settings of the machine-controller-manager component for this worker pool. Providers must copy these settings into the related fields of the corresponding MachineDeployment.
The spec.pools[].clusterAutoscaler
field contains cluster-autoscaler
settings that apply only to this specific worker group. cluster-autoscaler
expects to find these settings as annotations on the MachineDeployment
, and so providers must pass these values to the corresponding MachineDeployment
via annotations. The keys for these annotations can be found here and the values for the corresponding annotations should be the same as what is passed into the field. Providers can use the helper function extensionsv1alpha1helper.GetMachineDeploymentClusterAutoscalerAnnotations
that returns the annotation map to be used.
The controller must only inject its provider-specific sidecar container into the machine-controller-manager
Deployment
managed by gardenlet
.
After that, it must compute the desired machine classes and the desired machine deployments. Typically, one class maps to one deployment, and one class/deployment is created per availability zone. Following this convention, the created resources would look like this:
apiVersion: v1
kind: Secret
metadata:
  name: shoot--foo--bar-cpu-worker-z1-3db65
  namespace: shoot--foo--bar
  labels:
    gardener.cloud/purpose: machineclass
type: Opaque
data:
  providerAccessKeyId: eW91ci1hd3MtYWNjZXNzLWtleS1pZAo=
  providerSecretAccessKey: eW91ci1hd3Mtc2VjcmV0LWFjY2Vzcy1rZXkK
  userData: c29tZSBkYXRhIHRvIGJvb3RzdHJhcCB0aGUgVk0K
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: AWSMachineClass
metadata:
  name: shoot--foo--bar-cpu-worker-z1-3db65
  namespace: shoot--foo--bar
spec:
  ami: ami-0123456789 # Your controller must map the stated version to the provider specific machine image information, in the AWS case the AMI.
  blockDevices:
  - ebs:
      volumeSize: 20
      volumeType: gp2
  iam:
    name: shoot--foo--bar-nodes
  keyName: shoot--foo--bar-ssh-publickey
  machineType: m4.large
  networkInterfaces:
  - securityGroupIDs:
    - sg-1234567890
    subnetID: subnet-01234
  region: eu-west-1
  secretRef:
    name: shoot--foo--bar-cpu-worker-z1-3db65
    namespace: shoot--foo--bar
  tags:
    kubernetes.io/cluster/shoot--foo--bar: "1"
    kubernetes.io/role/node: "1"
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: MachineDeployment
metadata:
  name: shoot--foo--bar-cpu-worker-z1
  namespace: shoot--foo--bar
spec:
  replicas: 2
  selector:
    matchLabels:
      name: shoot--foo--bar-cpu-worker-z1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        name: shoot--foo--bar-cpu-worker-z1
    spec:
      class:
        kind: AWSMachineClass
        name: shoot--foo--bar-cpu-worker-z1-3db65
for the first availability zone eu-west-1b
, and
apiVersion: v1
kind: Secret
metadata:
  name: shoot--foo--bar-cpu-worker-z2-5z6as
  namespace: shoot--foo--bar
  labels:
    gardener.cloud/purpose: machineclass
type: Opaque
data:
  providerAccessKeyId: eW91ci1hd3MtYWNjZXNzLWtleS1pZAo=
  providerSecretAccessKey: eW91ci1hd3Mtc2VjcmV0LWFjY2Vzcy1rZXkK
  userData: c29tZSBkYXRhIHRvIGJvb3RzdHJhcCB0aGUgVk0K
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: AWSMachineClass
metadata:
  name: shoot--foo--bar-cpu-worker-z2-5z6as
  namespace: shoot--foo--bar
spec:
  ami: ami-0123456789 # Your controller must map the stated version to the provider specific machine image information, in the AWS case the AMI.
  blockDevices:
  - ebs:
      volumeSize: 20
      volumeType: gp2
  iam:
    name: shoot--foo--bar-nodes
  keyName: shoot--foo--bar-ssh-publickey
  machineType: m4.large
  networkInterfaces:
  - securityGroupIDs:
    - sg-1234567890
    subnetID: subnet-0123a
  region: eu-west-1
  secretRef:
    name: shoot--foo--bar-cpu-worker-z2-5z6as
    namespace: shoot--foo--bar
  tags:
    kubernetes.io/cluster/shoot--foo--bar: "1"
    kubernetes.io/role/node: "1"
---
apiVersion: machine.sapcloud.io/v1alpha1
kind: MachineDeployment
metadata:
  name: shoot--foo--bar-cpu-worker-z2
  namespace: shoot--foo--bar
spec:
  replicas: 1
  selector:
    matchLabels:
      name: shoot--foo--bar-cpu-worker-z2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        name: shoot--foo--bar-cpu-worker-z2
    spec:
      class:
        kind: AWSMachineClass
        name: shoot--foo--bar-cpu-worker-z2-5z6as
for the second availability zone eu-west-1c
.
Another convention is the 5-letter hash at the end of the machine class names.
Most controllers compute a checksum out of the specification of the machine class.
Any change to the value of the nodeAgentSecretName
field must result in a change of the machine class name.
The checksum in the machine class name helps to trigger a rolling update of the worker nodes if, for example, the machine image version changes.
In this case, a new checksum will be generated which results in the creation of a new machine class.
The MachineDeployment
’s machine class reference (.spec.template.spec.class.name
) is updated, which triggers the rolling update process in the machine-controller-manager.
However, all of this is only a convention that eases writing the controller, but you can do it completely differently if you desire - as long as you make sure that the described behaviours are implemented correctly.
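The hash convention described above can be illustrated with a minimal Go sketch. It is not the implementation of any particular provider extension: the machineClassInput type, the choice of fields, and the use of a truncated SHA-256 digest are assumptions made for illustration only, and the suffix format of real controllers may differ (the sample names above are not plain hex).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// machineClassInput is a hypothetical, reduced view of the values that
// influence a machine class. A real controller would include every field
// whose change must trigger a rolling update of the worker nodes.
type machineClassInput struct {
	MachineType         string
	ImageName           string
	ImageVersion        string
	VolumeType          string
	VolumeSize          string
	NodeAgentSecretName string
}

// machineClassHash returns the first five hex characters of a SHA-256
// digest computed over all input fields.
func machineClassHash(in machineClassInput) string {
	h := sha256.New()
	for _, field := range []string{
		in.MachineType, in.ImageName, in.ImageVersion,
		in.VolumeType, in.VolumeSize, in.NodeAgentSecretName,
	} {
		// Length-prefix each field so that ("ab", "c") and ("a", "bc") differ.
		fmt.Fprintf(h, "%d:%s;", len(field), field)
	}
	return hex.EncodeToString(h.Sum(nil))[:5]
}

// machineClassName builds "<namespace>-<pool>-z<zone number>-<hash>".
func machineClassName(namespace, pool string, zoneIndex int, in machineClassInput) string {
	return fmt.Sprintf("%s-%s-z%d-%s", namespace, pool, zoneIndex+1, machineClassHash(in))
}

func main() {
	in := machineClassInput{
		MachineType:         "m4.large",
		ImageName:           "coreos",
		ImageVersion:        "1967.5.0",
		VolumeType:          "gp2",
		VolumeSize:          "20Gi",
		NodeAgentSecretName: "gardener-node-agent-local-ee46034b8269353b",
	}
	fmt.Println(machineClassName("shoot--foo--bar", "cpu-worker", 0, in))

	// A new image version yields a different hash, hence a new class name.
	in.ImageVersion = "1967.5.1"
	fmt.Println(machineClassName("shoot--foo--bar", "cpu-worker", 0, in))
}
```

Because the class name changes whenever one of the hashed fields changes, updating the class reference in the MachineDeployment is enough to start the rolling update.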
After the machine classes and machine deployments have been created, the machine-controller-manager will start talking to the provider’s IaaS API and create the virtual machines.
Gardener makes sure that the content of the Secret
referenced in the userDataSecretRef
field that is used to bootstrap the machines contains the required configuration for installing the kubelet and registering the VM as a worker node in the shoot cluster.
The Worker
extension controller shall wait until all the created MachineDeployment
s indicate healthiness/readiness before it ends the control loop.
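The waiting step can be sketched as a simple polling loop. This is only an illustration under stated assumptions: deploymentState, allReady, and waitUntilReady are hypothetical names, readiness is reduced to comparing available against desired replicas, and the list callback stands in for whatever client a real controller uses to read MachineDeployments.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// deploymentState is a hypothetical summary of one MachineDeployment.
type deploymentState struct {
	Name      string
	Desired   int32
	Available int32
}

// allReady reports whether every deployment has at least as many
// available machines as desired.
func allReady(states []deploymentState) bool {
	for _, s := range states {
		if s.Available < s.Desired {
			return false
		}
	}
	return true
}

// waitUntilReady polls list until all deployments are ready, list returns
// an error, or ctx is cancelled.
func waitUntilReady(ctx context.Context, interval time.Duration,
	list func(context.Context) ([]deploymentState, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		states, err := list(ctx)
		if err != nil {
			return err
		}
		if allReady(states) {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("machine deployments not ready: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

func main() {
	calls := 0
	// Fake lister: the second zone becomes ready on the third poll.
	list := func(context.Context) ([]deploymentState, error) {
		calls++
		available := int32(0)
		if calls >= 3 {
			available = 1
		}
		return []deploymentState{
			{Name: "shoot--foo--bar-cpu-worker-z1", Desired: 2, Available: 2},
			{Name: "shoot--foo--bar-cpu-worker-z2", Desired: 1, Available: available},
		}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err := waitUntilReady(ctx, 10*time.Millisecond, list)
	fmt.Println("polls:", calls, "error:", err)
}
```

Bounding the wait with a context deadline keeps a stuck rollout from blocking the control loop indefinitely; the error can then be reported and the reconciliation retried.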
Does Gardener need some information that must be returned back?
Another important benefit of the machine-controller-manager’s design principles (extending the Kubernetes API using CRDs) is that the cluster-autoscaler can be used without any provider-specific implementation. We have forked the upstream Kubernetes community’s cluster-autoscaler and extended it so that it understands the machine API. We intend to merge it back into the community’s version once it has been adapted properly.
Our cluster-autoscaler only needs to know the minimum and maximum number of replicas per MachineDeployment
and is ready to act. It does not need to talk to the provider APIs because it just modifies the .spec.replicas
field in the MachineDeployment
object.
Gardener deploys this autoscaler if there is at least one worker pool that specifies max>min
.
So that Gardener knows how to configure the autoscaler, the provider-specific Worker
extension controller must expose which MachineDeployment
s it has created and what the min
/max
numbers should look like.
Consequently, your controller should write this information into the Worker
resource’s .status.machineDeployments
field. It should also update the .status.machineDeploymentsLastUpdateTime
field along with .status.machineDeployments
, so that Gardener is able to deploy the cluster-autoscaler right after the status is updated with the latest MachineDeployment
s and does not wait for the reconciliation to be completed:
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Worker
metadata:
  name: worker
  namespace: shoot--foo--bar
spec:
  ...
status:
  lastOperation: ...
  machineDeployments:
  - name: shoot--foo--bar-cpu-worker-z1
    minimum: 2
    maximum: 3
  - name: shoot--foo--bar-cpu-worker-z2
    minimum: 1
    maximum: 2
  machineDeploymentsLastUpdateTime: "2023-05-01T12:44:27Z"
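The per-zone numbers in this status are consistent with spreading the pool's min=3 and max=5 evenly over its two zones, with earlier zones receiving the remainder. The following Go sketch shows one way to compute such entries. The pool and machineDeployment types and the function names are hypothetical and only illustrate the arithmetic; a provider is free to distribute the machines differently.

```go
package main

import "fmt"

// pool is a hypothetical, reduced view of one entry in .spec.pools[].
type pool struct {
	Name    string
	Minimum int32
	Maximum int32
	Zones   []string
}

// machineDeployment mirrors the name/minimum/maximum triple that is
// written to .status.machineDeployments.
type machineDeployment struct {
	Name    string
	Minimum int32
	Maximum int32
}

// distribute returns the share of total that zone zoneIndex receives when
// total is spread over zoneCount zones. Earlier zones get the remainder.
func distribute(zoneIndex, total, zoneCount int32) int32 {
	share := total / zoneCount
	if zoneIndex < total%zoneCount {
		share++
	}
	return share
}

// machineDeploymentsFor creates one entry per availability zone of the pool.
func machineDeploymentsFor(namespace string, p pool) []machineDeployment {
	zoneCount := int32(len(p.Zones))
	out := make([]machineDeployment, 0, zoneCount)
	for i := int32(0); i < zoneCount; i++ {
		out = append(out, machineDeployment{
			Name:    fmt.Sprintf("%s-%s-z%d", namespace, p.Name, i+1),
			Minimum: distribute(i, p.Minimum, zoneCount),
			Maximum: distribute(i, p.Maximum, zoneCount),
		})
	}
	return out
}

// needsAutoscaler reports whether at least one pool specifies max > min.
func needsAutoscaler(pools []pool) bool {
	for _, p := range pools {
		if p.Maximum > p.Minimum {
			return true
		}
	}
	return false
}

func main() {
	p := pool{
		Name:    "cpu-worker",
		Minimum: 3,
		Maximum: 5,
		Zones:   []string{"eu-west-1b", "eu-west-1c"},
	}
	for _, md := range machineDeploymentsFor("shoot--foo--bar", p) {
		fmt.Printf("%s minimum=%d maximum=%d\n", md.Name, md.Minimum, md.Maximum)
	}
	fmt.Println("autoscaler needed:", needsAutoscaler([]pool{p}))
}
```

For the example pool, this yields minimum 2 and maximum 3 for the first zone and minimum 1 and maximum 2 for the second, matching the status shown above.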
In order to support a new worker provider, you need to write a controller that watches all Worker
s with .spec.type=<my-provider-name>
.
You can take a look at the example implementation for the AWS provider referenced below.
That sounds like a lot that needs to be done, can you help me?
Most of the described behaviour is the same for every provider.
The main differences are the version/configuration of the provider-specific machine-controller-manager
sidecar container, and the machine class specification itself.
You can take a look at our extension library, especially the worker controller part where you will find a lot of utilities that you can use.
Note that there are also utility functions for getting the default sidecar container specification or corresponding VPA container policy in the machinecontrollermanager
package called ProviderSidecarContainer
and ProviderSidecarVPAContainerPolicy
.
Also, when using the library, you only need to implement your provider specifics. Everything that can be handled generically comes for free and does not need to be re-implemented.
Take a look at the AWS worker controller for an example.
Non-provider specific information required for worker creation
All the providers require further information that is not provider specific but already part of the shoot resource.
One example of such information is whether the shoot is hibernated or not.
If it is hibernated, all the virtual machines should be deleted/terminated, and after that the machine-controller-manager should be scaled down.
You can take a look at the AWS worker controller to see how it reads this information and how it is used.
As Gardener cannot know which information is required by providers, it simply mirrors the Shoot
, Seed
, and CloudProfile
resources into the seed.
They are part of the Cluster
extension resource and can be used to extract information that is not part of the Worker
resource itself.