This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Others

Other Gardener extensions

1 - Certificate Services

Gardener extension controller for certificate services for shoot clusters

Gardener Extension for certificate services

Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.

Recently, most of the vendor specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.

Configuration

Example configuration for this extension controller:

apiVersion: shoot-cert-service.extensions.config.gardener.cloud/v1alpha1
kind: Configuration
issuerName: gardener
restrictIssuer: true # restrict issuer to any sub-domain of shoot.spec.dns.domain (default)
acme:
  email: john.doe@example.com
  server: https://acme-v02.api.letsencrypt.org/directory
# privateKey: | # Optional key for Let's Encrypt account.
#   -----BEGIN BEGIN RSA PRIVATE KEY-----
#   ...
#   -----END RSA PRIVATE KEY-----

Extension-Resources

Example extension resource:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: "extension-certificate-service"
  namespace: shoot--project--abc
spec:
  type: shoot-cert-service

When an extension resource is reconciled, the extension controller will create an instance of Cert-Management as well as an Issuer with the ACME information provided in the configuration above. These resources are placed inside the shoot namespace on the seed. Also, the controller takes care about generating necessary RBAC resources for the seed as well as for the shoot.

Please note, this extension controller relies on the Gardener-Resource-Manager to deploy k8s resources to seed and shoot clusters, i.e. it never deploys them directly.

How to start using or developing this extension controller locally

You can run the controller locally on your machine by executing make start. Please make sure to have the kubeconfig to the cluster you want to connect to ready in the ./dev/kubeconfig file. Static code checks and tests can be executed by running make verify. We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.

Feedback and Support

Feedback and contributions are always welcome!

Please report bugs or suggestions as GitHub issues or reach out on Slack (join the workspace here).

Learn more!

Please find further resources about out project here:

1.1 - Changing alerting settings

How to change the alerting on expiring certificates

Changing alerting settings

Certificates are normally renewed automatically 30 days before they expire. As a second line of defense, there is an alerting in Prometheus activated if the certificate is a few days before expiration. By default, the alert is triggered 15 days before expiration.

You can configure the days in the providerConfig of the extension. Setting it to 0 disables the alerting.

In this example, the days are changed to 3 days before expiration.

kind: Shoot
...
spec:
  extensions:
  - type: shoot-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      alerting:
        certExpirationAlertDays: 3

1.2 - Manage certificates with Gardener for default domain

Use the Gardener cert-management to get fully managed, publicly trusted TLS certificates

Manage certificates with Gardener for default domain

Introduction

Dealing with applications on Kubernetes which offer a secure service endpoints (e.g. HTTPS) also require you to enable a secured communication via SSL/TLS. With the certificate extension enabled, Gardener can manage commonly trusted X.509 certificate for your application endpoint. From initially requesting certificate, it also handeles their renewal in time using the free Let’s Encrypt API.

There are two senarios with which you can use the certificate extension

You want to use a certificate for a subdomain the shoot’s default DNS (see .spec.dns.domain of your shoot resource, e.g. short.ingress.shoot.project.default-domain.gardener.cloud). If this is your case, please keep reading this article.
You want to use a certificate for a custom domain. If this is your case, please see Manage certificates with Gardener for public domain

Prerequisites

Before you start this guide there are a few requirements you need to fulfill:

You have an existing shoot cluster

Since you are using the default DNS name, all DNS configuration should already be done and ready.

Issue a certificate

Every X.509 certificate is represented by a Kubernetes custom resource certificate.cert.gardener.cloud in your cluster. A Certificate resource may be used to initiate a new certificate request as well as to manage its lifecycle. Gardener’s certificate service regularly checks the expiration timestamp of Certificates, triggers a renewal process if necessary and replaces the existing X.509 certificate with a new one.

Your application should be able to reload replaced certificates in a timely manner to avoid service disruptions.

Certificates can be requested via 3 resources type

Ingress
Service (type LoadBalancer)
certificate (Gardener CRD)

If either of the first 2 are used, a corresponding Certificate resource will automatically be created.

Using an ingress Resource

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    #cert.gardener.cloud/issuer: custom-issuer                    # optional to specify custom issuer (use namespace/name for shoot issuers)
    #cert.gardener.cloud/follow-cname: "true"                     # optional, same as spec.followCNAME in certificates
    #cert.gardener.cloud/secret-labels: "key1=value1,key2=value2" # optional labels for the certificate secret
    #cert.gardener.cloud/preferred-chain: "chain name"            # optional to specify preferred-chain (value is the Subject Common Name of the root issuer)
    #cert.gardener.cloud/private-key-algorithm: ECDSA             # optional to specify algorithm for private key, allowed values are 'RSA' or 'ECDSA'
    #cert.gardener.cloud/private-key-size: "384"                  # optional to specify size of private key, allowed values for RSA are "2048", "3072", "4096" and for ECDSA "256" and "384"spec:
  tls:
  - hosts:
    # Must not exceed 64 characters.
    - short.ingress.shoot.project.default-domain.gardener.cloud
    # Certificate and private key reside in this secret.
    secretName: tls-secret
  rules:
  - host: short.ingress.shoot.project.default-domain.gardener.cloud
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: amazing-svc
            port:
              number: 8080

Using a service type LoadBalancer

apiVersion: v1
kind: Service
metadata:
  annotations:
    cert.gardener.cloud/purpose: managed
    # Certificate and private key reside in this secret.
    cert.gardener.cloud/secretname: tls-secret
    # You may add more domains separated by commas (e.g. "service.shoot.project.default-domain.gardener.cloud, amazing.shoot.project.default-domain.gardener.cloud")
    dns.gardener.cloud/dnsnames: "service.shoot.project.default-domain.gardener.cloud" 
    dns.gardener.cloud/ttl: "600"
    #cert.gardener.cloud/issuer: custom-issuer                    # optional to specify custom issuer (use namespace/name for shoot issuers)
    #cert.gardener.cloud/follow-cname: "true"                     # optional, same as spec.followCNAME in certificates
    #cert.gardener.cloud/secret-labels: "key1=value1,key2=value2" # optional labels for the certificate secret
    #cert.gardener.cloud/preferred-chain: "chain name"            # optional to specify preferred-chain (value is the Subject Common Name of the root issuer)
    #cert.gardener.cloud/private-key-algorithm: ECDSA             # optional to specify algorithm for private key, allowed values are 'RSA' or 'ECDSA'
    #cert.gardener.cloud/private-key-size: "384"                  # optional to specify size of private key, allowed values for RSA are "2048", "3072", "4096" and for ECDSA "256" and "384"  name: test-service
  namespace: default
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 8080
  type: LoadBalancer

Using the custom Certificate resource

apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
  name: cert-example
  namespace: default
spec:
  commonName: short.ingress.shoot.project.default-domain.gardener.cloud
  secretRef:
    name: tls-secret
    namespace: default
  # Optionnal if using the default issuer
  issuerRef:
    name: garden

If you’re interested in the current progress of your request, you’re advised to consult the description, more specifically the status attribute in case the issuance failed.

Request a wildcard certificate

In order to avoid the creation of multiples certificates for every single endpoints, you may want to create a wildcard certificate for your shoot’s default cluster.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    cert.gardener.cloud/commonName: "*.ingress.shoot.project.default-domain.gardener.cloud"
spec:
  tls:
  - hosts:
    - amazing.ingress.shoot.project.default-domain.gardener.cloud
    secretName: tls-secret
  rules:
  - host: amazing.ingress.shoot.project.default-domain.gardener.cloud
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: amazing-svc
            port:
              number: 8080

Please note that this can also be achived by directly adding an annotation to a Service type LoadBalancer. You could also create a Certificate object with a wildcard domain.

More information

For more information and more examples about using the certificate extension, please see Manage certificates with Gardener for public domain

1.3 - Manage certificates with Gardener for public domain

Use the Gardener cert-management to get fully managed, publicly trusted TLS certificates

Manage certificates with Gardener for public domain

Introduction

There are two senarios with which you can use the certificate extension

You want to use a certificate for a subdomain the shoot’s default DNS (see .spec.dns.domain of your shoot resource, e.g. short.ingress.shoot.project.default-domain.gardener.cloud). If this is your case, please see Manage certificates with Gardener for default domain
You want to use a certificate for a custom domain. If this is your case, please keep reading this article.

Prerequisites

Before you start this guide there are a few requirements you need to fulfill:

You have an existing shoot cluster
Your custom domain is under a public top level domain (e.g. .com)
Your custom zone is resolvable with a public resolver via the internet (e.g. 8.8.8.8)
You have a custom DNS provider configured and working (see “DNS Providers”)

As part of the Let’s Encrypt ACME challenge validation process, Gardener sets a DNS TXT entry and Let’s Encrypt checks if it can both resolve and authenticate it. Therefore, it’s important that your DNS-entries are publicly resolvable. You can check this by querying e.g. Googles public DNS server and if it returns an entry your DNS is publicly visible:

# returns the A record for cert-example.example.com using Googles DNS server (8.8.8.8)
dig cert-example.example.com @8.8.8.8 A

DNS provider

In order to issue certificates for a custom domain you need to specify a DNS provider which is permitted to create DNS records for subdomains of your requested domain in the certificate. For example, if you request a certificate for host.example.com your DNS provider must be capable of managing subdomains of host.example.com.

DNS providers are normally specified in the shoot manifest. To learn more on how to configure one, please see the DNS provider documentation.

Issue a certificate

Your application should be able to reload replaced certificates in a timely manner to avoid service disruptions.

Certificates can be requested via 3 resources type

Ingress
Service (type LoadBalancer)
Gateways (both Istio gateways and from the Gateway API)
Certificate (Gardener CRD)

If either of the first 2 are used, a corresponding Certificate resource will be created automatically.

Using an Ingress Resource

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    # Optional but recommended, this is going to create the DNS entry at the same time
    dns.gardener.cloud/class: garden
    dns.gardener.cloud/ttl: "600"
    #cert.gardener.cloud/commonname: "*.example.com"              # optional, if not specified the first name from spec.tls[].hosts is used as common name
    #cert.gardener.cloud/dnsnames: ""                             # optional, if not specified the names from spec.tls[].hosts are used
    #cert.gardener.cloud/follow-cname: "true"                     # optional, same as spec.followCNAME in certificates
    #cert.gardener.cloud/secret-labels: "key1=value1,key2=value2" # optional labels for the certificate secret
    #cert.gardener.cloud/issuer: custom-issuer                    # optional to specify custom issuer (use namespace/name for shoot issuers)
    #cert.gardener.cloud/preferred-chain: "chain name"            # optional to specify preferred-chain (value is the Subject Common Name of the root issuer)
    #cert.gardener.cloud/private-key-algorithm: ECDSA             # optional to specify algorithm for private key, allowed values are 'RSA' or 'ECDSA'
    #cert.gardener.cloud/private-key-size: "384"                  # optional to specify size of private key, allowed values for RSA are "2048", "3072", "4096" and for ECDSA "256" and "384"

spec:
  tls:
  - hosts:
    # Must not exceed 64 characters.
    - amazing.example.com
    # Certificate and private key reside in this secret.
    secretName: tls-secret
  rules:
  - host: amazing.example.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: amazing-svc
            port:
              number: 8080

Replace the hosts and rules[].host value again with your own domain and adjust the remaining Ingress attributes in accordance with your deployment (e.g. the above is for an istio Ingress controller and forwards traffic to a service1 on port 80).

Using a Service of type LoadBalancer

apiVersion: v1
kind: Service
metadata:
  annotations:
    cert.gardener.cloud/secretname: tls-secret
    dns.gardener.cloud/dnsnames: example.example.com
    dns.gardener.cloud/class: garden
    # Optional
    dns.gardener.cloud/ttl: "600"
    cert.gardener.cloud/commonname: "*.example.example.com"
    cert.gardener.cloud/dnsnames: ""
    #cert.gardener.cloud/follow-cname: "true"                     # optional, same as spec.followCNAME in certificates
    #cert.gardener.cloud/secret-labels: "key1=value1,key2=value2" # optional labels for the certificate secret
    #cert.gardener.cloud/issuer: custom-issuer                    # optional to specify custom issuer (use namespace/name for shoot issuers)
    #cert.gardener.cloud/preferred-chain: "chain name"            # optional to specify preferred-chain (value is the Subject Common Name of the root issuer)
    #cert.gardener.cloud/private-key-algorithm: ECDSA             # optional to specify algorithm for private key, allowed values are 'RSA' or 'ECDSA'
    #cert.gardener.cloud/private-key-size: "384"                  # optional to specify size of private key, allowed values for RSA are "2048", "3072", "4096" and for ECDSA "256" and "384"
    
  name: test-service
  namespace: default
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 8080
  type: LoadBalancer

Using a Gateway resource

Please see Istio Gateways or Gateway API for details.

Using the custom Certificate resource

apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
  name: cert-example
  namespace: default
spec:
  commonName: amazing.example.com
  secretRef:
    name: tls-secret
    namespace: default
  # Optionnal if using the default issuer
  issuerRef:
    name: garden

  # If delegated domain for DNS01 challenge should be used. This has only an effect if a CNAME record is set for
  # '_acme-challenge.amazing.example.com'.
  # For example: If a CNAME record exists '_acme-challenge.amazing.example.com' => '_acme-challenge.writable.domain.com',
  # the DNS challenge will be written to '_acme-challenge.writable.domain.com'.
  #followCNAME: true

  # optionally set labels for the secret
  #secretLabels:
  #  key1: value1
  #  key2: value2

  # Optionally specify the preferred certificate chain: if the CA offers multiple certificate chains, prefer the chain with an issuer matching this Subject Common Name. If no match, the default offered chain will be used.
  #preferredChain: "ISRG Root X1"

  # Optionally specify algorithm and key size for private key. Allowed algorithms: "RSA" (allowed sizes: 2048, 3072, 4096) and "ECDSA" (allowed sizes: 256, 384)
  # If not specified, RSA with 2048 is used.
  #privateKey:
  #  algorithm: ECDSA
  #  size: 384

Supported attributes

Here is a list of all supported annotations regarding the certificate extension:

Path	Annotation	Value	Required	Description
N/A	`cert.gardener.cloud/purpose:`	`managed`	Yes when using annotations	Flag for Gardener that this specific Ingress or Service requires a certificate
`spec.commonName`	`cert.gardener.cloud/commonname:`	E.g. “*.demo.example.com” or “special.example.com”	Certificate and Ingress : No Service: Yes, if DNS names unset	Specifies for which domain the certificate request will be created. If not specified, the names from spec.tls[].hosts are used. This entry must comply with the 64 character limit.
`spec.dnsNames`	`cert.gardener.cloud/dnsnames:`	E.g. “special.example.com”	Certificate and Ingress : No Service: Yes, if common name unset	Additional domains the certificate should be valid for (Subject Alternative Name). If not specified, the names from spec.tls[].hosts are used. Entries in this list can be longer than 64 characters.
`spec.secretRef.name`	`cert.gardener.cloud/secretname:`	`any-name`	Yes for certificate and Service	Specifies the secret which contains the certificate/key pair. If the secret is not available yet, it’ll be created automatically as soon as the certificate has been issued.
`spec.issuerRef.name`	`cert.gardener.cloud/issuer:`	E.g. `gardener`	No	Specifies the issuer you want to use. Only necessary if you request certificates for custom domains.
N/A	`cert.gardener.cloud/revoked:`	`true` otherwise always false	No	Use only to revoke a certificate, see reference for more details
`spec.followCNAME`	`cert.gardener.cloud/follow-cname`	E.g. `true`	No	Specifies that the usage of a delegated domain for DNS challenges is allowed. Details see Follow CNAME.
`spec.preferredChain`	`cert.gardener.cloud/preferred-chain`	E.g. `ISRG Root X1`	No	Specifies the Common Name of the issuer for selecting the certificate chain. Details see Preferred Chain.
`spec.secretLabels`	`cert.gardener.cloud/secret-labels`	for annotation use e.g. `key1=value1,key2=value2`	No	Specifies labels for the certificate secret.
`spec.privateKey.algorithm`	`cert.gardener.cloud/private-key-algorithm`	`RSA`, `ECDSA`	No	Specifies algorithm for private key generation. The default value is depending on configuration of the extension (default of the default is `RSA`). You may request a new certificate without privateKey settings to find out the concrete defaults in your Gardener.
`spec.privateKey.size`	`cert.gardener.cloud/private-key-size`	`"256"`, `"384"`, `"2048"`, `"3072"`, `"4096"`	No	Specifies size for private key generation. Allowed values for `RSA` are `2048`, `3072`, and `4096`. For `ECDSA` allowed values are `256` and `384`. The default values are depending on the configuration of the extension (defaults of the default values are `3072` for `RSA` and `384` for `ECDSA` respectively).

Request a wildcard certificate

In order to avoid the creation of multiples certificates for every single endpoints, you may want to create a wildcard certificate for your shoot’s default cluster.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    cert.gardener.cloud/commonName: "*.example.com"
spec:
  tls:
  - hosts:
    - amazing.example.com
    secretName: tls-secret
  rules:
  - host: amazing.example.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: amazing-svc
            port:
              number: 8080

Please note that this can also be achived by directly adding an annotation to a Service type LoadBalancer. You could also create a Certificate object with a wildcard domain.

Using a custom Issuer

Most Gardener deployment with the certification extension enabled have a preconfigured garden issuer. It is also usually configured to use Let’s Encrypt as the certificate provider.

If you need a custom issuer for a specific cluster, please see Using a custom Issuer

Quotas

For security reasons there may be a default quota on the certificate requests per day set globally in the controller registration of the shoot-cert-service.

The default quota only applies if there is no explicit quota defined for the issuer itself with the field requestsPerDayQuota, e.g.:

kind: Shoot
...
spec:
  extensions:
  - type: shoot-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      issuers:
        - email: your-email@example.com
          name: custom-issuer # issuer name must be specified in every custom issuer request, must not be "garden"
          server: 'https://acme-v02.api.letsencrypt.org/directory'
          requestsPerDayQuota: 10

DNS Propagation

As stated before, cert-manager uses the ACME challenge protocol to authenticate that you are the DNS owner for the domain’s certificate you are requesting. This works by creating a DNS TXT record in your DNS provider under _acme-challenge.example.example.com containing a token to compare with. The TXT record is only applied during the domain validation. Typically, the record is propagated within a few minutes. But if the record is not visible to the ACME server for any reasons, the certificate request is retried again after several minutes. This means you may have to wait up to one hour after the propagation problem has been resolved before the certificate request is retried. Take a look in the events with kubectl describe ingress example for troubleshooting.

Character Restrictions

Due to restriction of the common name to 64 characters, you may to leave the common name unset in such cases.

For example, the following request is invalid:

apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
  name: cert-invalid
  namespace: default
spec:
  commonName: morethan64characters.ingress.shoot.project.default-domain.gardener.cloud

But it is valid to request a certificate for this domain if you have left the common name unset:

apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
  name: cert-example
  namespace: default
spec:
  dnsNames:
  - morethan64characters.ingress.shoot.project.default-domain.gardener.cloud

References

1.4 - Using a custom Issuer

How to define a custom issuer forma shoot cluster

Using a custom Issuer

Another possibility to request certificates for custom domains is a dedicated issuer.

Note: This is only needed if the default issuer provided by Gardener is restricted to shoot related domains or you are using domain names not visible to public DNS servers. Which means that your senario most likely doesn’t require your to add an issuer.

The custom issuers are specified normally in the shoot manifest. If the shootIssuers feature is enabled, it can alternatively be defined in the shoot cluster.

Custom issuer in the shoot manifest

kind: Shoot
...
spec:
  extensions:
  - type: shoot-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      issuers:
        - email: your-email@example.com
          name: custom-issuer # issuer name must be specified in every custom issuer request, must not be "garden"
          server: 'https://acme-v02.api.letsencrypt.org/directory'
          privateKeySecretName: my-privatekey # referenced resource, the private key must be stored in the secret at `data.privateKey` (optionally, only needed as alternative to auto registration) 
          #precheckNameservers: # to provide special set of nameservers to be used for prechecking DNSChallenges for an issuer
          #- dns1.private.company-net:53
          #- dns2.private.company-net:53" 
      #shootIssuers:
        # if true, allows to specify issuers in the shoot cluster
        #enabled: true 
  resources:
  - name: my-privatekey
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: custom-issuer-privatekey # name of secret in Gardener project

If you are using an ACME provider for private domains, you may need to change the nameservers used for checking the availability of the DNS challenge’s TXT record before the certificate is requested from the ACME provider. By default, only public DNS servers may be used for this purpose. At least one of the precheckNameservers must be able to resolve the private domain names.

Using the custom issuer

To use the custom issuer in a certificate, just specify its name in the spec.

apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
spec:
  ...
  issuerRef:
    name: custom-issuer
  ...

For source resources like Ingress or Service use the cert.gardener.cloud/issuer annotation.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    cert.gardener.cloud/issuer: custom-issuer
...

Custom issuer in the shoot cluster

Prerequiste: The shootIssuers feature has to be enabled. It is either enabled globally in the ControllerDeployment or in the shoot manifest with:

kind: Shoot
...
spec:
  extensions:
  - type: shoot-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      shootIssuers:
        enabled: true # if true, allows to specify issuers in the shoot cluster
...

Example for specifying an Issuer resource and its Secret directly in any namespace of the shoot cluster:

apiVersion: cert.gardener.cloud/v1alpha1
kind: Issuer
metadata:
  name: my-own-issuer
  namespace: my-namespace
spec:
  acme:
    domains:
      include:
      - my.own.domain.com
    email: some.user@my.own.domain.com
    privateKeySecretRef:
      name: my-own-issuer-secret
      namespace: my-namespace
    server: https://acme-v02.api.letsencrypt.org/directory
---
apiVersion: v1
kind: Secret
metadata:
  name: my-own-issuer-secret
  namespace: my-namespace
type: Opaque
data:
  privateKey: ... # replace '...' with valus encoded as base64

Using the custom shoot issuer

To use the custom issuer in a certificate, just specify its name and namespace in the spec.

apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
spec:
  ...
  issuerRef:
    name: my-own-issuer
    namespace: my-namespace
  ...

For source resources like Ingress or Service use the cert.gardener.cloud/issuer annotation.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    cert.gardener.cloud/issuer: my-namespace/my-own-issuer
...

1.5 - Deployment

Gardener Certificate Management

Introduction

Gardener comes with an extension that enables shoot owners to request X.509 compliant certificates for shoot domains.

Extension Installation

The shoot-cert-service extension can be deployed and configured via Gardener’s native resource ControllerRegistration.

Prerequisites

To let the shoot-cert-service operate properly, you need to have:

a DNS service in your seed
contact details and optionally a private key for a pre-existing Let’s Encrypt account

Operator Extension

Using an operator extension resource (extension.operator.gardener.cloud) is the recommended way to deploy the shoot-cert-service extension.

An example of an operator extension resource can be found at extension-shoot-cert-service.yaml.

The ControllerRegistration contains a reference to the Helm chart which eventually deploy the shoot-cert-service to seed clusters. It offers some configuration options, mainly to set up a default issuer for shoot clusters. With a default issuer, pre-existing Let’s Encrypt accounts can be used and shared with shoot clusters (See Integration Guide - One Account or Many?).

Tip
Please keep the Let’s Encrypt Rate Limits in mind when using this shared account model. Depending on the amount of shoots and domains it is recommended to use an account with increased rate limits.

apiVersion: operator.gardener.cloud/v1alpha1
kind: Extension
metadata:
  annotations:
    security.gardener.cloud/pod-security-enforce: baseline
  name: extension-shoot-cert-service
spec:
  deployment:
    extension:
      helm:
        ociRepository:
          ref: ... # OCI reference to the Helm chart
      injectGardenKubeconfig: true
      policy: Always

      values:
        certificateConfig:
          defaultIssuer:
            name: default-issuer
            acme:
              email: some.user@example.com
              # optional private key for the Let's Encrypt account
              #privateKey: |-
              #-----BEGIN RSA PRIVATE KEY-----
              #...
              #-----END RSA PRIVATE KEY-----
              server: https://acme-v02.api.letsencrypt.org/directory
            # altenatively to `acme`, you can use a self-signed root or intermediate certificate for providing self-signed certificates
#           ca:
#             certificate: |
#               -----BEGIN CERTIFICATE-----
#               ...
#               -----END CERTIFICATE-----
#             certificateKey: |
#               -----BEGIN PRIVATE KEY-----
#               ...
#               -----END PRIVATE KEY-----

#          defaultRequestsPerDayQuota: 50

#          precheckNameservers: 8.8.8.8,8.8.4.4

#          caCertificates: | # optional custom CA certificates when using private ACME provider
#            -----BEGIN CERTIFICATE-----
#            ...
#            -----END CERTIFICATE-----
#            -----BEGIN CERTIFICATE-----
#            ...
#            -----END CERTIFICATE-----

        shootIssuers:
          enabled: false # if true, allows specifying issuers in the shoot clusters

        deactivateAuthorizations: true # if true, enables flag --acme-deactivate-authorizations in cert-controller-manager
        skipDNSChallengeValidation: false # if true, skips dns-challenges in cert-controller-manager

      # The following values are only needed if the extension should be deployed on the runtime cluster. 
      runtimeClusterValues:
        certificateConfig:
          defaultIssuer:
            # typically the same values as at .spec.deployment.values.certificateConfig.defaultIssuer
            ...
  resources:
  - autoEnable:
    - shoot # if set, the extension is enabled for all shoots by default
    clusterCompatibility:
    - shoot
    kind: Extension
    type: shoot-cert-service
    workerlessSupported: true
  - clusterCompatibility:
    - garden
    - seed
    kind: Extension
    lifecycle:
      delete: AfterKubeAPIServer
      reconcile: BeforeKubeAPIServer
    type: controlplane-cert-service

Providing Trusted TLS Certificate for Garden Runtime Cluster

The shoot-cert-service can provide the TLS secret labeled with gardener.cloud/role: garden-cert in the garden namespace to be used by the Gardener API server. See Trusted TLS Certificate for Garden Runtime Cluster for more information.

For this purpose, the extension must be deployed on the runtime cluster. Several configuration steps are needed:

Provide the spec.runtimeClusterValues values in the extension.operator.gardener.cloud resource in the operator extension.

Add the extension with type controlplane-cert-service to the Garden resource on the Garden runtime cluster.

apiVersion: operator.gardener.cloud/v1alpha1
kind: Garden
metadata:
  name: ...
spec:
  extensions:
  - type: controlplane-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      generateControlPlaneCertificate: true

If you only want to deploy the cert-controller-manager in the garden namespace, you can set generateControlPlaneCertificate to false.

If generateControlPlaneCertificate is set to true, the current values spec.virtualCluster.dns.domains and spec.runtimeCluster.ingress.domains from the Garden resource are read, and the certificate is created/updated in the garden namespace. The certificate will be reconciled by the cert-controller-manager to provide the TLS secret with the label gardener.cloud/role: garden-cert in the garden namespace. Additionally, the extension provides a webhook to mutate the virtual-garden-kube-apiserver deployment in the garden namespace. It will manage the --tls-sni-cert-key command line arguments and patches the volume/volume mount with the TLS secret. Only the secondary domain names will be used for the SNI certificate, as the self-signed certificate of the first domain name is provided by the Gardener operator.

Example for resulting domains of the certificate

If the Garden resource contains the following values:

apiVersion: operator.gardener.cloud/v1alpha1
kind: Garden
spec:
  runtimeCluster:
    ingress:
      domains:
      - name: "ingress.garden.example.com"
        provider: gardener-dns
  virtualCluster:
    dns:
      domains:
      - name: "primary.garden.example.com"
        provider: gardener-dns
      - name: "secondary.foo.example.com"
        provider: gardener-dns

then the Certificate will be created with these wildcard domain names:

*.primary.garden.example.com
*.secondary.foo.example.com
*.ingress.garden.example.com

Providing Trusted TLS Certificate for Shoot Control Planes

The shoot-cert-service can provide the TLS secret labeled with gardener.cloud/role: controlplane-cert in the garden namespace on the seeds. See Trusted TLS Certificate for Shoot Control Planes for more information.

For this purpose, the extension must be enabled for the seed(s) by adding the controlplane-cert-service type to the Seed manifest:

apiVersion: core.gardener.cloud/v1beta1
kind: Seed
metadata:
  name: ...
spec:
  extensions:
  - type: controlplane-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      generateControlPlaneCertificate: true

The certificate will contain the wildcard domain name using the base domain name from the .spec.ingress.domain value of the Seed resource.

ControllerRegistration

Note
Using a ControllerRegistration / ControllerDeployment directly is not recommended if your Gardener landscape has a virtual Garden cluster. In this case, please use an extension.operator.gardener.cloud resource as described above. The Gardener operator will then take care of the ControllerRegistration and ControllerDeployment for you.

An example of a ControllerRegistration for the shoot-cert-service can be found at controller-registration.yaml.

The ControllerRegistration contains a Helm chart which eventually deploy the shoot-cert-service to seed clusters. It offers some configuration options, mainly to set up a default issuer for shoot clusters. With a default issuer, pre-existing Let’s Encrypt accounts can be used and shared with shoot clusters (See “One Account or Many?” of the Integration Guide).

Please keep the Let’s Encrypt Rate Limits in mind when using this shared account model. Depending on the amount of shoots and domains it is recommended to use an account with increased rate limits.

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
...
  values:
    certificateConfig:
      defaultIssuer:
        acme:
            email: foo@example.com
            privateKey: |-
              -----BEGIN RSA PRIVATE KEY-----
              ...
              -----END RSA PRIVATE KEY-----
            server: https://acme-v02.api.letsencrypt.org/directory
        name: default-issuer
#       restricted: true # restrict default issuer to any sub-domain of shoot.spec.dns.domain

#     defaultRequestsPerDayQuota: 50

#     precheckNameservers: 8.8.8.8,8.8.4.4

#     caCertificates: | # optional custom CA certificates when using private ACME provider
#     -----BEGIN CERTIFICATE-----
#     ...
#     -----END CERTIFICATE-----
#
#     -----BEGIN CERTIFICATE-----
#     ...
#     -----END CERTIFICATE-----

      shootIssuers:
        enabled: false # if true, allows to specify issuers in the shoot clusters

Enablement

If the shoot-cert-service should be enabled for every shoot cluster in your Gardener managed environment, you need to auto enable it in the ControllerRegistration:

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
...
  resources:
  - kind: Extension
    type: shoot-cert-service
    autoEnable:
    - shoot # if set, the extension is enabled for all shoots by default
    clusterCompatibility:
    - shoot
    workerlessSupported: true

Alternatively, you’re given the option to only enable the service for certain shoots:

kind: Shoot
apiVersion: core.gardener.cloud/v1beta1
...
spec:
  extensions:
  - type: shoot-cert-service
...

1.6 - Gardener yourself a Shoot with Istio, custom Domains, and Certificates

As we ramp up more and more friends of Gardener, I thought it worthwhile to explore and write a tutorial about how to simply:

create a Gardener managed Kubernetes Cluster (Shoot) via kubectl
install Istio as a preferred, production ready Ingress/Service Mesh (instead of the Nginx Ingress addon)
attach your own custom domain to be managed by Gardener
combine everything with certificates from Let’s Encrypt

Here are some pre-pointers that you will need to go deeper:

Tip

If you try my instructions and fail, then read the alternative title of this tutorial as “Shoot yourself in the foot with Gardener, custom Domains, Istio and Certificates”.

First Things First

Login to your Gardener landscape, setup a project with adequate infrastructure credentials and then navigate to your account. Note down the name of your secret. I chose the GCP infrastructure from the vast possible options that my Gardener provides me with, so i had named the secret as shoot-operator-gcp.

From the Access widget (leave the default settings) download your personalized kubeconfig into ~/.kube/kubeconfig-garden-myproject. Follow the instructions to setup kubelogin:

access

For convinience, let us set an alias command with

alias kgarden="kubectl --kubeconfig ~/.kube/kubeconfig-garden-myproject.yaml"

kgarden now gives you all botanical powers and connects you directly with your Gardener.

You should now be able to run kgarden get shoots, automatically get an oidc token, and list already running clusters/shoots.

Prepare your Custom Domain

I am going to use Cloud Flare as programmatic DNS of my custom domain mydomain.io. Please follow detailed instructions from Cloud Flare on how to delegate your domain (the free account does not support delegating subdomains). Alternatively, AWS Route53 (and most others) support delegating subdomains.

I needed to follow these instructions and created the following secret:

apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-mydomain-io
type: Opaque
data:
  CLOUDFLARE_API_TOKEN: useYOURownDAMITzNDU2Nzg5MDEyMzQ1Njc4OQ==

Apply this secret into your project with kgarden create -f cloudflare-mydomain-io.yaml.

Our External DNS Manager also supports Amazon Route53, Google CloudDNS, AliCloud DNS, Azure DNS, or OpenStack Designate. Check it out.

Prepare Gardener Extensions

I now need to prepare the Gardener extensions shoot-dns-service and shoot-cert-service and set the parameters accordingly.

Note
Please note, that the availability of Gardener Extensions depends on how your administrator has configured the Gardener landscape. Please contact your Gardener administrator in case you experience any issues during activation.

The following snippet allows Gardener to manage my entire custom domain, whereas with the include: attribute I restrict all dynamic entries under the subdomain gsicdc.mydomain.io:

  dns:
    providers:
      - domains:
          include:
            - gsicdc.mydomain.io
        primary: false
        secretName: cloudflare-mydomain-io
        type: cloudflare-dns
  extensions:
    - type: shoot-dns-service

The next snipplet allows Gardener to manage certificates automatically from Let’s Encrypt on mydomain.io for me:

  extensions:
    - type: shoot-cert-service
      providerConfig:
        apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
        issuers:
          - email: me@mail.com
            name: mydomain
            server: 'https://acme-v02.api.letsencrypt.org/directory'
          - email: me@mail.com
            name: mydomain-staging
            server: 'https://acme-staging-v02.api.letsencrypt.org/directory'

Note
Adjust the snipplets with your parameters (don’t forget your email). And please use the mydomain-staging issuer while you are testing and learning. Otherwise, Let’s Encrypt will rate limit your frequent requests and you can wait a week until you can continue.

References for Let’s Encrypt:

Create the Gardener Shoot Cluster

Remember I chose to create the Shoot on GCP, so below is the simplest declarative shoot or cluster order document. Notice that I am referring to the infrastructure credentials with shoot-operator-gcp and I combined the above snippets into the yaml file:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: gsicdc
spec:
  dns:
    providers:
    - domains:
        include:
          - gsicdc.mydomain.io
      primary: false
      secretName: cloudflare-mydomain-io
      type: cloudflare-dns
  extensions:
  - type: shoot-dns-service
  - type: shoot-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      issuers:
        - email: me@mail.com
          name: mydomain
          server: 'https://acme-v02.api.letsencrypt.org/directory'
        - email: me@mail.com
          name: mydomain-staging
          server: 'https://acme-staging-v02.api.letsencrypt.org/directory'
  cloudProfileName: gcp
  kubernetes:
    version: 1.28.2
  maintenance:
    autoUpdate:
      kubernetesVersion: true
      machineImageVersion: true
  networking:
    nodes: 10.250.0.0/16
    pods: 100.96.0.0/11
    services: 100.64.0.0/13
    type: calico
  provider:
    controlPlaneConfig:
      apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
      kind: ControlPlaneConfig
      zone: europe-west1-d
    infrastructureConfig:
      apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        workers: 10.250.0.0/16
    type: gcp
    workers:
    - machine:
        image:
          name: gardenlinux
          version: 576.9.0
        type: n1-standard-2
      maxSurge: 1
      maxUnavailable: 0
      maximum: 2
      minimum: 1
      name: my-workerpool
      volume:
        size: 50Gi
        type: pd-standard
      zones:
      - europe-west1-d
  purpose: testing
  region: europe-west1
  secretBindingName: shoot-operator-gcp

Create your cluster and wait for it to be ready (about 5 to 7min).

$ kgarden create -f gsicdc.yaml
shoot.core.gardener.cloud/gsicdc created

$ kgarden get shoot gsicdc --watch
NAME     CLOUDPROFILE   VERSION   SEED   DOMAIN                                        HIBERNATION   OPERATION    PROGRESS   APISERVER     CONTROL       NODES     SYSTEM    AGE
gsicdc   gcp            1.28.2    gcp    gsicdc.myproject.shoot.devgarden.cloud   Awake         Processing   38         Progressing   Progressing   Unknown   Unknown   83s
...
gsicdc   gcp            1.28.2    gcp    gsicdc.myproject.shoot.devgarden.cloud   Awake         Succeeded    100        True          True          True          False         6m7s

Get access to your freshly baked cluster and set your KUBECONFIG:

$ kgarden get secrets gsicdc.kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d >kubeconfig-gsicdc.yaml

$ export KUBECONFIG=$(pwd)/kubeconfig-gsicdc.yaml
$ kubectl get all
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   100.64.0.1   <none>        443/TCP   89m

Install Istio

Please follow the Istio installation instructions and download istioctl. If you are on a Mac, I recommend:

brew install istioctl

I want to install Istio with a default profile and SDS enabled. Furthermore I pass the following annotations to the service object istio-ingressgateway in the istio-system namespace.

  annotations:
    cert.gardener.cloud/issuer: mydomain-staging
    cert.gardener.cloud/secretname: wildcard-tls
    dns.gardener.cloud/class: garden
    dns.gardener.cloud/dnsnames: "*.gsicdc.mydomain.io"
    dns.gardener.cloud/ttl: "120"

With these annotations three things now happen automatically:

The External DNS Manager, provided to you as a service (dns.gardener.cloud/class: garden), picks up the request and creates the wildcard DNS entry *.gsicdc.mydomain.io with a time to live of 120sec at your DNS provider. My provider Cloud Flare is very very quick (as opposed to some other services). You should be able to verify the entry with dig lovemygardener.gsicdc.mydomain.io within seconds.
The Certificate Management picks up the request as well and initiates a DNS01 protocol exchange with Let’s Encrypt; using the staging environment referred to with the issuer behind mydomain-staging.
After aproximately 70sec (give and take) you will receive the wildcard certificate in the wildcard-tls secret in the namespace istio-system.

Note
Notice, that the namespace for the certificate secret is often the cause of many troubleshooting sessions: the secret must reside in the same namespace of the gateway.

Here is the istio-install script:

$ export domainname="*.gsicdc.mydomain.io"
$ export issuer="mydomain-staging"

$ cat <<EOF | istioctl install -y -f -
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        serviceAnnotations:
          cert.gardener.cloud/issuer: "${issuer}"
          cert.gardener.cloud/secretname: wildcard-tls
          dns.gardener.cloud/class: garden
          dns.gardener.cloud/dnsnames: "${domainname}"
          dns.gardener.cloud/ttl: "120"
EOF

Verify that setup is working and that DNS and certificates have been created/delivered:

$ kubectl -n istio-system describe service istio-ingressgateway
<snip>
Events:
  Type    Reason                Age                From                     Message
  ----    ------                ----               ----                     -------
  Normal  EnsuringLoadBalancer  58s                service-controller       Ensuring load balancer
  Normal  reconcile             58s                cert-controller-manager  created certificate object istio-system/istio-ingressgateway-service-pwqdm
  Normal  cert-annotation       58s                cert-controller-manager  wildcard-tls: cert request is pending
  Normal  cert-annotation       54s                cert-controller-manager  wildcard-tls: certificate pending: certificate requested, preparing/waiting for successful DNS01 challenge
  Normal  cert-annotation       28s                cert-controller-manager  wildcard-tls: certificate ready
  Normal  EnsuredLoadBalancer   26s                service-controller       Ensured load balancer
  Normal  reconcile             26s                dns-controller-manager   created dns entry object shoot--core--gsicdc/istio-ingressgateway-service-p9qqb
  Normal  dns-annotation        26s                dns-controller-manager   *.gsicdc.mydomain.io: dns entry is pending
  Normal  dns-annotation        21s (x3 over 21s)  dns-controller-manager   *.gsicdc.mydomain.io: dns entry active

$ dig lovemygardener.gsicdc.mydomain.io

; <<>> DiG 9.10.6 <<>> lovemygardener.gsicdc.mydomain.io
<snip>
;; ANSWER SECTION:
lovemygardener.gsicdc.mydomain.io. 120 IN A	35.195.120.62
<snip>

There you have it, the wildcard-tls certificate is ready and the *.gsicdc.mydomain.io dns entry is active. Traffic will be going your way.

Handy Tools to Install

Another set of fine tools to use are kapp (formerly known as k14s), k9s and HTTPie. While we are at it, let’s install them all. If you are on a Mac, I recommend:

brew tap vmware-tanzu/carvel
brew install ytt kbld kapp kwt imgpkg vendir
brew install derailed/k9s/k9s
brew install httpie

Ingress at Your Service

Note
Networking is a central part of Kubernetes, but it can be challenging to understand exactly how it is expected to work. You should learn about Kubernetes networking, and first try to debug problems yourself. With a solid managed cluster from Gardener, it is always PEBCAK!

Kubernetes Ingress is a subject that is evolving to much broader standard. Please watch Evolving the Kubernetes Ingress APIs to GA and Beyond for a good introduction. In this example, I did not want to use the Kubernetes Ingress compatibility option of Istio. Instead, I used VirtualService and Gateway from the Istio’s API group networking.istio.io/v1 directly, and enabled istio-injection generically for the namespace.

I use httpbin as service that I want to expose to the internet, or where my ingress should be routed to (depends on your point of view, I guess).

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  namespace: production
  labels:
    app: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        ports:
        - containerPort: 80
---
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: httpbin-gw
  namespace: production
spec:
  selector:
    istio: ingressgateway #! use istio default ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    tls:
      httpsRedirect: true
    hosts:
    - "httpbin.gsicdc.mydomain.io"
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: wildcard-tls
    hosts:
    - "httpbin.gsicdc.mydomain.io"
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: httpbin-vs
  namespace: production
spec:
  hosts:
  - "httpbin.gsicdc.mydomain.io"
  gateways:
  - httpbin-gw
  http:
  - match:
    - uri:
        regex: /.*
    route:
    - destination:
        port:
          number: 8000
        host: httpbin
---

Let us now deploy the whole package of Kubernetes primitives using kapp:

$ kapp deploy -a httpbin -f httpbin-kapp.yaml
Target cluster 'https://api.gsicdc.myproject.shoot.devgarden.cloud' (nodes: shoot--myproject--gsicdc-my-workerpool-z1-6586c8f6cb-x24kh)

Changes

Namespace   Name        Kind            Conds.  Age  Op      Wait to    Rs  Ri
(cluster)   production  Namespace       -       -    create  reconcile  -   -
production  httpbin     Deployment      -       -    create  reconcile  -   -
^           httpbin     Service         -       -    create  reconcile  -   -
^           httpbin-gw  Gateway         -       -    create  reconcile  -   -
^           httpbin-vs  VirtualService  -       -    create  reconcile  -   -

Op:      5 create, 0 delete, 0 update, 0 noop
Wait to: 5 reconcile, 0 delete, 0 noop

Continue? [yN]: y

5:36:31PM: ---- applying 1 changes [0/5 done] ----
<snip>
5:37:00PM: ok: reconcile deployment/httpbin (apps/v1) namespace: production
5:37:00PM: ---- applying complete [5/5 done] ----
5:37:00PM: ---- waiting complete [5/5 done] ----

Succeeded

Let’s finally test the service (Of course you can use the browser as well):

$ http httpbin.gsicdc.mydomain.io
HTTP/1.1 301 Moved Permanently
content-length: 0
date: Wed, 13 May 2020 21:29:13 GMT
location: https://httpbin.gsicdc.mydomain.io/
server: istio-envoy

$ curl -k https://httpbin.gsicdc.mydomain.io/ip
{
    "origin": "10.250.0.2"
}

Quod erat demonstrandum. The proof of exchanging the issuer is now left to the reader.

Tip

Remember that the certificate is actually not valid because it is issued from the Let’s encrypt staging environment. Thus, we needed “curl -k” or “http –verify no”.

Hint: use the interactive k9s tool. k9s

Cleanup

Remove the cloud native application:

$ kapp ls
Apps in namespace 'default'

Name     Namespaces            Lcs   Lca
httpbin  (cluster),production  true  17m

$ kapp delete -a httpbin
...
Continue? [yN]: y
...
11:47:47PM: ---- waiting complete [8/8 done] ----

Succeeded

Remove Istio:

$ istioctl x uninstall --purge
clusterrole.rbac.authorization.k8s.io "prometheus-istio-system" deleted
clusterrolebinding.rbac.authorization.k8s.io "prometheus-istio-system" deleted
...

Delete your Shoot:

kgarden annotate shoot gsicdc confirmation.gardener.cloud/deletion=true --overwrite
kgarden delete shoot gsicdc --wait=false

1.7 - Gateway Api Gateways

Using annotated Gateway API Gateway and/or HTTPRoutes as Source

This tutorial describes how to use annotated Gateway API resources as source for Certificate.

Install Istio on your cluster

Follow the Istio Kubernetes Gateway API to install the Gateway API and to install Istio.

These are the typical commands for the Istio installation with the Kubernetes Gateway API:

export KUEBCONFIG=...
curl -L https://istio.io/downloadIstio | sh -
kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null || \
  { kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd?ref=v1.0.0" | kubectl apply -f -; }
istioctl install --set profile=minimal -y
kubectl label namespace default istio-injection=enabled

Verify that Gateway Source works

Install a sample service

With automatic sidecar injection:

$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin.yaml

Note: The sample service is not used in the following steps. It is deployed for illustration purposes only. To use it with certificates, you have to add an HTTPS port for it.

Using a Gateway as a source

Deploy the Gateway API configuration including a single exposed route (i.e., /get):

kubectl create namespace istio-ingress
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
  annotations:
    #cert.gardener.cloud/dnsnames: "*.example.com" # alternative if you want to control the dns names explicitly.
    cert.gardener.cloud/purpose: managed
spec:
  gatewayClassName: istio
  listeners:
  - name: default
    hostname: "*.example.com"  # this is used by cert-controller-manager to extract DNS names
    port: 443
    protocol: HTTPS
    allowedRoutes:
      namespaces:
        from: All
    tls:                       # important: tls section must be defined with exactly one certificateRefs item
      certificateRefs:
        - name: foo-example-com
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: http
  namespace: default
spec:
  parentRefs:
  - name: gateway
    namespace: istio-ingress
  hostnames: ["httpbin.example.com"]  # this is used by cert-controller-manager to extract DNS names too
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /get
    backendRefs:
    - name: httpbin
      port: 8000
EOF

You should now see a created Certificate resource similar to:

$ kubectl -n istio-ingress get cert -oyaml
apiVersion: v1
items:
- apiVersion: cert.gardener.cloud/v1alpha1
  kind: Certificate
  metadata:
    generateName: gateway-gateway-
    name: gateway-gateway-kdw6h
    namespace: istio-ingress
    ownerReferences:
    - apiVersion: gateway.networking.k8s.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Gateway
      name: gateway
  spec:
    commonName: '*.example.com'
    secretName: foo-example-com
  status:
    ...
kind: List
metadata:
  resourceVersion: ""

Using a HTTPRoute as a source

If the Gateway resource is annotated with cert.gardener.cloud/purpose: managed, hostnames from all referencing HTTPRoute resources are automatically extracted. These resources don’t need an additional annotation.

Deploy the Gateway API configuration including a single exposed route (i.e., /get):

kubectl create namespace istio-ingress
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
spec:
  gatewayClassName: istio
  listeners:
  - name: default
    hostname: null  # not set 
    port: 443
    protocol: HTTPS
    allowedRoutes:
      namespaces:
        from: All
    tls:                            # important: tls section must be defined with exactly one certificateRefs item
      certificateRefs:
        - name: foo-example-com
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: http
  namespace: default
spec:
  parentRefs:
  - name: gateway
    namespace: istio-ingress
  hostnames: ["httpbin.example.com"]  # this is used by dns-controller-manager to extract DNS names too
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /get
    backendRefs:
    - name: httpbin
      port: 8000
EOF

This should show a similar Certificate resource as above.

1.8 - Istio Gateways

Using annotated Istio Gateway and/or Istio Virtual Service as Source

This tutorial describes how to use annotated Istio Gateway resources as source for Certificate resources.

Install Istio on your cluster

Follow the Istio Getting Started to download and install Istio.

These are the typical commands for the istio demo installation

export KUEBCONFIG=...
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=demo -y
kubectl label namespace default istio-injection=enabled

Note: If you are using a KinD cluster, the istio-ingressgateway service may be pending forever.

$ kubectl -n istio-system get svc istio-ingressgateway
NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                      AGE
istio-ingressgateway   LoadBalancer   10.96.88.189   <pending>     15021:30590/TCP,80:30185/TCP,443:30075/TCP,31400:30129/TCP,15443:30956/TCP   13m

In this case, you may patch the status for demo purposes (of course it still would not accept connections)

kubectl -n istio-system patch svc istio-ingressgateway --type=merge --subresource status --patch '{"status":{"loadBalancer":{"ingress":[{"ip":"1.2.3.4"}]}}}'

Verify that Istio Gateway/VirtualService Source works

Install a sample service

With automatic sidecar injection:

$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin.yaml

Using a Gateway as a source

Create an Istio Gateway:

$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: httpbin-gateway
  namespace: istio-system
  annotations:
    #cert.gardener.cloud/dnsnames: "*.example.com" # alternative if you want to control the dns names explicitly.
    cert.gardener.cloud/purpose: managed
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 443
      name: http
      protocol: HTTPS
    hosts:
    - "httpbin.example.com" # this is used by the dns-controller-manager to extract DNS names
    tls:
      credentialName: my-tls-secret
EOF

You should now see a created Certificate resource similar to:

$ kubectl -n istio-system get cert -oyaml
apiVersion: v1
items:
- apiVersion: cert.gardener.cloud/v1alpha1
  kind: Certificate
  metadata:
    generateName: httpbin-gateway-gateway-
    name: httpbin-gateway-gateway-hdbjb
    namespace: istio-system
    ownerReferences:
    - apiVersion: networking.istio.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Gateway
      name: httpbin-gateway
  spec:
    commonName: httpbin.example.com
    secretName: my-tls-secret
  status:
    ...
kind: List
metadata:
  resourceVersion: ""

Using a VirtualService as a source

If the Gateway resource is annotated with cert.gardener.cloud/purpose: managed, hosts from all referencing VirtualServices resources are automatically extracted. These resources don’t need an additional annotation.

Create an Istio Gateway:

$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: httpbin-gateway
  namespace: istio-system
  annotations:
    cert.gardener.cloud/purpose: managed
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - "*"
    tls:
      credentialName: my-tls-secret    
EOF

Configure routes for traffic entering via the Gateway:

$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: httpbin
  namespace: default  
spec:
  hosts:
  - "httpbin.example.com" # this is used by dns-controller-manager to extract DNS names
  gateways:
  - istio-system/httpbin-gateway
  http:
  - match:
    - uri:
        prefix: /status
    - uri:
        prefix: /delay
    route:
    - destination:
        port:
          number: 8000
        host: httpbin
EOF

This should show a similar Certificate resource as above.

2 - Access Control List Extension

Gardener Access Control List Extension

An Access Control List (ACL) is a list of permissions attached to an object, specifying which users or system processes are granted or denied access to that object.

Gardener utilizes the ACL extension to control access to shoot clusters using an allow-list mechanism, allowing you to specify which IP addresses or CIDR blocks are allowed to access the Kubernetes API and other services in the shoot cluster.

This extension supports multiple ingress namespaces, enhancing its flexibility for various deployment scenarios. The extension leverages Istio’s envoy proxy to insert additional configuration that limits access to specific clusters, impacting different external traffic flows such as Kubernetes API listeners, service listeners, and VPN listeners.

The extension’s functionality and limitations are intricately managed to handle different access types effectively, ensuring robust security for Kubernetes API servers.

Note
The ACL extension is not part of Gardener and is maintained by STACKIT in their own GitHub organization.

You can find more information in the Gardener ACL Extension repository.

3 - DNS Services

Gardener extension controller for DNS services for shoot clusters

Gardener Extension for DNS services

Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.

Extension-Resources

Example extension resource:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: "extension-dns-service"
  namespace: shoot--project--abc
spec:
  type: shoot-dns-service

How to start using or developing this extension controller locally

Feedback and Support

Feedback and contributions are always welcome!

Please report bugs or suggestions as GitHub issues or reach out on Slack (join the workspace here).

Learn more!

Please find further resources about out project here:

3.1 - Deployment

Gardener DNS Management for Shoots

Introduction

Gardener allows Shoot clusters to request DNS names for Ingresses and Services out of the box. To support this the gardener must be installed with the shoot-dns-service extension. This extension uses the seed’s dns management infrastructure to maintain DNS names for shoot clusters. So, far only the external DNS domain of a shoot (already used for the kubernetes api server and ingress DNS names) can be used for managed DNS names.

Operator Extension

Using an operator extension resource (extension.operator.gardener.cloud) is the recommended way to deploy the shoot-dns-service extension.

An example of an operator extension resource can be found at extension-shoot-dns-service.yaml.

It is possible to decide whether the extension should be always available for all shoots or whether the extension must be separately enabled per shoot. To enable the extension for all shoots, the autoEnable field must be set to [shoot] in the Extension resource.

apiVersion: operator.gardener.cloud/v1alpha1
kind: Extension
metadata:
  annotations:
    security.gardener.cloud/pod-security-enforce: baseline
  name: extension-shoot-dns-service
spec:
  deployment:
    admission:
      runtimeCluster:
        helm:
          ociRepository:
            ref: ... # OCI reference to the Helm chart
      virtualCluster:
        helm:
          ociRepository:
            ref: ... # OCI reference to the Helm chart
    extension:
      helm:
        ociRepository:
          ref: ... # OCI reference to the Helm chart

  resources:
  - autoEnable:
    - shoot # if set, the extension is enabled for all shoots by default
    clusterCompatibility:
    - shoot
    kind: Extension
    type: shoot-dns-service
    workerlessSupported: true

Providing Base Domains usable for a Shoot

So, far only the external DNS domain of a shoot already used for the kubernetes api server and ingress DNS names can be used for managed DNS names. This is either the shoot domain as subdomain of the default domain configured for the gardener installation, or a dedicated domain with dedicated access credentials configured for a dedicated shoot via the shoot manifest.

Alternatively, you can specify DNSProviders and its credentials Secret directly in the shoot, if this feature is enabled. By default, DNSProvider replication is disabled, but it can be enabled globally in the ControllerDeployment or for a shoot cluster in the shoot manifest (details see further below).

apiVersion: operator.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: extension-shoot-dns-service
spec:
  extension:
    values:
      dnsProviderReplication:
        enabled: true

Shoot Feature Gate

If the shoot DNS feature is not globally enabled by default (depends on the extension registration on the garden cluster), it must be enabled per shoot.

To enable the feature for a shoot, the shoot manifest must explicitly add the shoot-dns-service extension.

...
spec:
  extensions:
    - type: shoot-dns-service
...

Enable/disable DNS provider replication for a shoot

The DNSProvider` replication feature enablement can be overwritten in the shoot manifest, e.g.

Kind: Shoot
...
spec:
  extensions:
    - type: shoot-dns-service
      providerConfig:
        apiVersion: service.dns.extensions.gardener.cloud/v1alpha1
        kind: DNSConfig
        dnsProviderReplication:
          enabled: true
...

3.2 - DNS Names

Request DNS Names in Shoot Clusters

Introduction

Within a shoot cluster, it is possible to request DNS records via the following resource types:

It is necessary that the Gardener installation your shoot cluster runs in is equipped with a shoot-dns-service extension. This extension uses the seed’s dns management infrastructure to maintain DNS names for shoot clusters. Please ask your Gardener operator if the extension is available in your environment.

Shoot Feature Gate

In some Gardener setups the shoot-dns-service extension is not enabled globally and thus must be configured per shoot cluster. Please adapt the shoot specification by the configuration shown below to activate the extension individually.

kind: Shoot
...
spec:
  extensions:
    - type: shoot-dns-service
...

Before you start

You should :

Have created a shoot cluster
Have created and correctly configured a DNS Provider (Please consult this page for more information)
Have a basic understanding of DNS (see link under References)

There are 2 types of DNS that you can use within Kubernetes :

internal (usually managed by coreDNS)
external (managed by a public DNS provider).

This page, and the extension, exclusively works for external DNS handling.

Gardener allows 2 way of managing your external DNS:

Manually, which means you are in charge of creating / maintaining your Kubernetes related DNS entries
Via the Gardener DNS extension

Gardener DNS extension

The managed external DNS records feature of the Gardener clusters makes all this easier. You do not need DNS service provider specific knowledge, and in fact you do not need to leave your cluster at all to achieve that. You simply annotate the Ingress / Service that needs its DNS records managed and it will be automatically created / managed by Gardener.

Managed external DNS records are supported with the following DNS provider types:

aws-route53
azure-dns
azure-private-dns
google-clouddns
openstack-designate
alicloud-dns
cloudflare-dns

Request DNS records for Ingress resources

To request a DNS name for Ingress, Service or Gateway (Istio or Gateway API) objects in the shoot cluster it must be annotated with the DNS class garden and an annotation denoting the desired DNS names.

Example for an annotated Ingress resource:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    # Let Gardener manage external DNS records for this Ingress.
    dns.gardener.cloud/dnsnames: special.example.com # Use "*" to collects domains names from .spec.rules[].host
    dns.gardener.cloud/ttl: "600"
    dns.gardener.cloud/class: garden
    # If you are delegating the certificate management to Gardener, uncomment the following line
    #cert.gardener.cloud/purpose: managed
spec:
  rules:
  - host: special.example.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: amazing-svc
            port:
              number: 8080
  # Uncomment the following part if you are delegating the certificate management to Gardener
  #tls:
  #  - hosts:
  #      - special.example.com
  #    secretName: my-cert-secret-name

For an Ingress, the DNS names are already declared in the specification. Nevertheless the dnsnames annotation must be present. Here a subset of the DNS names of the ingress can be specified. If DNS names for all names are desired, the value all can be used.

Keep in mind that ingress resources are ignored unless an ingress controller is set up. Gardener does not provide an ingress controller by default. For more details, see Ingress Controllers and Service in the Kubernetes documentation.

Request DNS records for service type LoadBalancer

Example for an annotated Service (it must have the type LoadBalancer) resource:

apiVersion: v1
kind: Service
metadata:
  name: amazing-svc
  annotations:
    # Let Gardener manage external DNS records for this Service.
    dns.gardener.cloud/dnsnames: special.example.com
    dns.gardener.cloud/ttl: "600"
    dns.gardener.cloud/class: garden
spec:
  selector:
    app: amazing-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer

Request DNS records for Gateway resources

Please see Istio Gateways or Gateway API for details.

Creating a DNSEntry resource explicitly

It is also possible to create a DNS entry via the Kubernetes resource called DNSEntry:

apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSEntry
metadata:
  annotations:
    # Let Gardener manage this DNS entry.
    dns.gardener.cloud/class: garden
  name: special-dnsentry
  namespace: default
spec:
  dnsName: special.example.com
  ttl: 600
  targets:
  - 1.2.3.4

If one of the accepted DNS names is a direct subname of the shoot’s ingress domain, this is already handled by the standard wildcard entry for the ingress domain. Therefore this name should be excluded from the dnsnames list in the annotation. If only this DNS name is configured in the ingress, no explicit DNS entry is required, and the DNS annotations should be omitted at all.

You can check the status of the DNSEntry with

$ kubectl get dnsentry
NAME          DNS                                                            TYPE          PROVIDER      STATUS    AGE
mydnsentry    special.example.com     aws-route53   default/aws   Ready     24s

As soon as the status of the entry is Ready, the provider has accepted the new DNS record. Depending on the provider and your DNS settings and cache, it may take up to 24 hours for the new entry to be propagated over all internet.

More examples can be found here

Request DNS records for Service/Ingress resources using a DNSAnnotation resource

In rare cases it may not be possible to add annotations to a Service or Ingress resource object.

E.g.: the helm chart used to deploy the resource may not be adaptable for some reasons or some automation is used, which always restores the original content of the resource object by dropping any additional annotations.

In these cases, it is recommended to use an additional DNSAnnotation resource in order to have more flexibility that DNSentry resources. The DNSAnnotation resource makes the DNS shoot service behave as if annotations have been added to the referenced resource.

For the Ingress example shown above, you can create a DNSAnnotation resource alternatively to provide the annotations.

apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSAnnotation
metadata:
  annotations:
    dns.gardener.cloud/class: garden
  name: test-ingress-annotation
  namespace: default
spec:
  resourceRef:
    kind: Ingress
    apiVersion: networking.k8s.io/v1
    name: test-ingress
    namespace: default
  annotations:
    dns.gardener.cloud/dnsnames: '*'
    dns.gardener.cloud/class: garden

Note that the DNSAnnotation resource itself needs the dns.gardener.cloud/class=garden annotation. This also only works for annotations known to the DNS shoot service (see Accepted External DNS Records Annotations).

For more details, see also DNSAnnotation objects

Accepted External DNS Records Annotations

Here are all the accepted annotations related to the DNS extension:

Annotation	Description
dns.gardener.cloud/dnsnames	Mandatory for service and ingress resources, accepts a comma-separated list of DNS names if multiple names are required. For ingress you can use the special value `'*'`. In this case, the DNS names are collected from `.spec.rules[].host`.
dns.gardener.cloud/class	Mandatory, in the context of the shoot-dns-service it must always be set to `garden`.
dns.gardener.cloud/ttl	Recommended, overrides the default Time-To-Live of the DNS record.
dns.gardener.cloud/cname-lookup-interval	Only relevant if multiple domain name targets are specified. It specifies the lookup interval for CNAMEs to map them to IP addresses (in seconds)
dns.gardener.cloud/realms	Internal, for restricting provider access for shoot DNS entries. Typcially not set by users of the shoot-dns-service.
dns.gardener.cloud/ip-stack	Only relevant for provider type `aws-route53` if target is an AWS load balancer domain name. Can be set for service, ingress and DNSEntry resources. It specify which DNS records with alias targets are created instead of the usual `CNAME` records. If the annotation is not set (or has the value `ipv4`), only an `A` record is created. With value `dual-stack`, both `A` and `AAAA` records are created. With value `ipv6` only an `AAAA` record is created.
dns.gardener.cloud/ignore	Optional, with the possible values: `true`, `reconcile`, or `full`. For values `true` and `reconcile`, the reconciliation is skipped. `true` is an alias for `reconcile`. For value `full` both reconciliation and deletion operations are skipped.
dns.gardener.cloud/target-hard-ignore	Optional, for a generated target `DNSEntry` to ignore it on reconciliation. It is not propagated from source objects to the target `DNSEntry`. Important: The entry is even ignored on deletion, so use with caution to avoid orphaned entries.
service.beta.kubernetes.io/aws-load-balancer-ip-address-type=dualstack	For services, behaves similar to `dns.gardener.cloud/ip-stack=dual-stack`.
loadbalancer.openstack.org/load-balancer-address	Internal, for services only: support for PROXY protocol on Openstack (which needs a hostname as ingress). Typcially not set by users of the shoot-dns-service.

If one of the accepted DNS names is a direct subdomain of the shoot’s ingress domain, this is already handled by the standard wildcard entry for the ingress domain. Therefore, this name should be excluded from the dnsnames list in the annotation. If only this DNS name is configured in the ingress, no explicit DNS entry is required, and the DNS annotations should be omitted at all.

Troubleshooting

General DNS tools

To check the DNS resolution, use the nslookup or dig command.

$ nslookup special.your-domain.com

or with dig

$ dig +short special.example.com
Depending on your network settings, you may get a successful response faster using a public DNS server (e.g. 8.8.8.8, 8.8.4.4, or 1.1.1.1)

dig @8.8.8.8 +short special.example.com

DNS record events

The DNS controller publishes Kubernetes events for the resource which requested the DNS record (Ingress, Service, DNSEntry). These events reveal more information about the DNS requests being processed and are especially useful to check any kind of misconfiguration, e.g. requests for a domain you don’t own.

Events for a successfully created DNS record:

$ kubectl describe service my-service

Events:
  Type    Reason          Age                From                    Message
  ----    ------          ----               ----                    -------
  Normal  dns-annotation  19s                dns-controller-manager  special.example.com: dns entry is pending
  Normal  dns-annotation  19s (x3 over 19s)  dns-controller-manager  special.example.com: dns entry pending: waiting for dns reconciliation
  Normal  dns-annotation  9s (x3 over 10s)   dns-controller-manager  special.example.com: dns entry active

Please note, events vanish after their retention period (usually 1h).

DNSEntry status

DNSEntry resources offer a .status sub-resource which can be used to check the current state of the object.

Status of a erroneous DNSEntry.

  status:
    message: No responsible provider found
    observedGeneration: 3
    provider: remote
    state: Error

References

3.3 - DNS Providers

DNS Providers

Introduction

Gardener can manage DNS records on your behalf, so that you can request them via different resource types (see here) within the shoot cluster. The domains for which you are permitted to request records, are however restricted and depend on the DNS provider configuration.

Shoot provider

By default, every shoot cluster is equipped with a default provider. It is the very same provider that manages the shoot cluster’s kube-apiserver public DNS record (DNS address in your Kubeconfig).

kind: Shoot
...
dns:
  domain: shoot.project.default-domain.gardener.cloud

You are permitted to request any sub-domain of .dns.domain that is not already taken (e.g. api.shoot.project.default-domain.gardener.cloud, *.ingress.shoot.project.default-domain.gardener.cloud) with this provider.

Additional providers

If you need to request DNS records for domains not managed by the default provider, additional providers can be configured in the shoot specification. Alternatively, if it is enabled, it can be added as DNSProvider resources to the shoot cluster.

Additional providers in the shoot specification

To add a providers in the shoot spec, you need set them in the spec.dns.providers list.

For example:

kind: Shoot
...
spec:
  dns:
    domain: shoot.project.default-domain.gardener.cloud
    providers:
    - secretName: my-aws-account
      type: aws-route53
    - secretName: my-gcp-account
      type: google-clouddns

Please consult the API-Reference to get a complete list of supported fields and configuration options.

Referenced secrets should exist in the project namespace in the Garden cluster and must comply with the provider specific credentials format. The External-DNS-Management project provides corresponding examples (20-secret-<provider-name>-credentials.yaml) for known providers.

Additional providers as resources in the shoot cluster

If it is not enabled globally, you have to enable the feature in the shoot manifest:

Kind: Shoot
...
spec:
  extensions:
    - type: shoot-dns-service
      providerConfig:
        apiVersion: service.dns.extensions.gardener.cloud/v1alpha1
        kind: DNSConfig
        dnsProviderReplication:
          enabled: true
...

To add a provider directly in the shoot cluster, provide a DNSProvider in any namespace together with Secret containing the credentials.

For example if the domain is hosted with AWS Route 53 (provider type aws-route53):

apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSProvider
metadata:
  annotations:
    dns.gardener.cloud/class: garden
  name: my-own-domain
  namespace: my-namespace
spec:
  type: aws-route53
  secretRef:
    name: my-own-domain-credentials
  domains:
    include:
    - my.own.domain.com
---
apiVersion: v1
kind: Secret
metadata:
  name: my-own-domain-credentials
  namespace: my-namespace
type: Opaque
data:
  # replace '...' with values encoded as base64
  AWS_ACCESS_KEY_ID: ...
  AWS_SECRET_ACCESS_KEY: ...

The External-DNS-Management project provides examples with more details for DNSProviders (30-provider-<provider-name>.yaml) and credential Secrets (20-secret-<provider-name>.yaml) at https://github.com/gardener/external-dns-management//examples for all supported provider types.

3.4 - Gateway Api Gateways

Using annotated Gateway API Gateway and/or HTTPRoutes as Source

This tutorial describes how to use annotated Gateway API resources as source for DNSEntries with the Gardener shoot-dns-service extension.

The dns-controller-manager supports the resources Gateway and HTTPRoute.

Install Istio on your cluster

Using a new or existing shoot cluster, follow the Istio Kubernetes Gateway API to install the Gateway API and to install Istio.

These are the typical commands for the Istio installation with the Kubernetes Gateway API:

export KUEBCONFIG=...
curl -L https://istio.io/downloadIstio | sh -
kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null || \
  { kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd?ref=v1.0.0" | kubectl apply -f -; }
istioctl install --set profile=minimal -y
kubectl label namespace default istio-injection=enabled

Verify that Gateway Source works

Install a sample service

With automatic sidecar injection:

$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin.yaml

Using a Gateway as a source

Deploy the Gateway API configuration including a single exposed route (i.e., /get):

kubectl create namespace istio-ingress
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
  annotations:
    dns.gardener.cloud/dnsnames: "*.example.com"
    dns.gardener.cloud/class: garden
spec:
  gatewayClassName: istio
  listeners:
  - name: default
    hostname: "*.example.com"  # this is used by dns-controller-manager to extract DNS names
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http
  namespace: default
spec:
  parentRefs:
  - name: gateway
    namespace: istio-ingress
  hostnames: ["httpbin.example.com"]  # this is used by dns-controller-manager to extract DNS names too
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /get
    backendRefs:
    - name: httpbin
      port: 8000
EOF

You should now see events in the namespace of the gateway:

$ kubectl -n istio-system get events --sort-by={.metadata.creationTimestamp}
LAST SEEN   TYPE      REASON                 OBJECT                                       MESSAGE
...
38s         Normal    dns-annotation         service/gateway-istio                      httpbin.example.com: created dns entry object shoot--foo--bar/gateway-istio-service-zpf8n
38s         Normal    dns-annotation         service/gateway-istio                      httpbin.example.com: dns entry pending: waiting for dns reconciliation
38s         Normal    dns-annotation         service/gateway-istio                      httpbin.example.com: dns entry is pending
36s         Normal    dns-annotation         service/gateway-istio                      httpbin.example.com: dns entry active

Using a HTTPRoute as a source

If the Gateway resource is annotated with dns.gardener.cloud/dnsnames: "*", hostnames from all referencing HTTPRoute resources are automatically extracted. These resources don’t need an additional annotation.

Deploy the Gateway API configuration including a single exposed route (i.e., /get):

kubectl create namespace istio-ingress
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
  annotations:
    dns.gardener.cloud/dnsnames: "*"
    dns.gardener.cloud/class: garden
spec:
  gatewayClassName: istio
  listeners:
  - name: default
    hostname: null  # not set 
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http
  namespace: default
spec:
  parentRefs:
  - name: gateway
    namespace: istio-ingress
  hostnames: ["httpbin.example.com"]  # this is used by dns-controller-manager to extract DNS names too
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /get
    backendRefs:
    - name: httpbin
      port: 8000
EOF

This should show a similar events as above.

Access the sample service using `curl`

$ curl -I http://httpbin.example.com/get
HTTP/1.1 200 OK
server: istio-envoy
date: Tue, 13 Feb 2024 08:09:41 GMT
content-type: application/json
content-length: 701
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 19

Accessing any other URL that has not been explicitly exposed should return an HTTP 404 error:

$ curl -I http://httpbin.example.com/headers
HTTP/1.1 404 Not Found
date: Tue, 13 Feb 2024 08:09:41 GMT
server: istio-envoy
transfer-encoding: chunked

3.5 - Istio Gateways

Using annotated Istio Gateway and/or Istio Virtual Service as Source

This tutorial describes how to use annotated Istio Gateway resources as source for DNSEntries with the Gardener shoot-dns-service extension.

Install Istio on your cluster

Using a new or existing shoot cluster, follow the Istio Getting Started to download and install Istio.

These are the typical commands for the istio demo installation

export KUEBCONFIG=...
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=demo -y
kubectl label namespace default istio-injection=enabled

Verify that Istio Gateway/VirtualService Source works

Install a sample service

With automatic sidecar injection:

$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin.yaml

Using a Gateway as a source

Create an Istio Gateway:

$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
  namespace: istio-system
  annotations:
    dns.gardener.cloud/dnsnames: "*"
    dns.gardener.cloud/class: garden
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "httpbin.example.com" # this is used by the dns-controller-manager to extract DNS names
EOF

Configure routes for traffic entering via the Gateway:

$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: default
spec:
  hosts:
  - "httpbin.example.com" # this is also used by the dns-controller-manager to extract DNS names
  gateways:
  - istio-system/httpbin-gateway
  http:
  - match:
    - uri:
        prefix: /status
    - uri:
        prefix: /delay
    route:
    - destination:
        port:
          number: 8000
        host: httpbin
EOF

You should now see events in the namespace of the gateway:

$ kubectl -n istio-system get events --sort-by={.metadata.creationTimestamp}
LAST SEEN   TYPE      REASON                 OBJECT                                       MESSAGE
...
38s         Normal    dns-annotation         gateway/httpbin-gateway                      httpbin.example.com: created dns entry object shoot--foo--bar/httpbin-gateway-gateway-zpf8n
38s         Normal    dns-annotation         gateway/httpbin-gateway                      httpbin.example.com: dns entry pending: waiting for dns reconciliation
38s         Normal    dns-annotation         gateway/httpbin-gateway                      httpbin.example.com: dns entry is pending
36s         Normal    dns-annotation         gateway/httpbin-gateway                      httpbin.example.com: dns entry active

Using a VirtualService as a source

If the Gateway resource is annotated with dns.gardener.cloud/dnsnames: "*", hosts from all referencing VirtualServices resources are automatically extracted. These resources don’t need an additional annotation.

Create an Istio Gateway:

$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
  namespace: istio-system
  annotations:
    dns.gardener.cloud/dnsnames: "*"
    dns.gardener.cloud/class: garden
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
EOF

Configure routes for traffic entering via the Gateway:

$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: default  
spec:
  hosts:
  - "httpbin.example.com" # this is used by dns-controller-manager to extract DNS names
  gateways:
  - istio-system/httpbin-gateway
  http:
  - match:
    - uri:
        prefix: /status
    - uri:
        prefix: /delay
    route:
    - destination:
        port:
          number: 8000
        host: httpbin
EOF

This should show a similar events as above.

To get the targets to the extracted DNS names, the shoot-dns-service controller is able to gather information from the kubernetes service of the Istio Ingress Gateway.

Note: It is also possible to set the targets my specifying an Ingress resource using the dns.gardener.cloud/ingress annotation on the Istio Ingress Gateway resource.

Note: It is also possible to set the targets manually by using the dns.gardener.cloud/targets annotation on the Istio Ingress Gateway resource.

Access the sample service using `curl`

$ curl -I http://httpbin.example.com/status/200
HTTP/1.1 200 OK
server: istio-envoy
date: Tue, 13 Feb 2024 07:49:37 GMT
content-type: text/html; charset=utf-8
access-control-allow-origin: *
access-control-allow-credentials: true
content-length: 0
x-envoy-upstream-service-time: 15

Accessing any other URL that has not been explicitly exposed should return an HTTP 404 error:

$ curl -I http://httpbin.example.com/headers
HTTP/1.1 404 Not Found
date: Tue, 13 Feb 2024 08:09:41 GMT
server: istio-envoy
transfer-encoding: chunked

4 - Egress Filtering

Gardener extension controller for egress filtering for shoot clusters

Gardener Extension for Networking Filter

Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.

This controller implements Gardener’s extension contract for the shoot-networking-filter extension.

An example for a ControllerRegistration resource that can be used to register this controller to Gardener can be found here.

Please find more information regarding the extensibility concepts and a detailed proposal here.

Extension Resources

Example extension resource:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: extension-shoot-networking-filter
  namespace: shoot--project--abc
spec:
  providerConfig:
    egressFilter:
      blackholingEnabled: false
      staticFilterList:
      - network: 1.2.3.4/31
        policy: BLOCK_ACCESS
      workers:
        blackholingEnabled: true
        names:
        - external-api

When an extension resource is reconciled, if the optional workers field is not used, the extension controller will create a daemonset egress-filter-applier on the shoot containing a Dockerfile container.

If the optional workers field is used, the extension controller will create one daemonset egress-filter-applier-<worker name> per each worker group on the shoot.

See the usage documentation for more details on how to configure the extension on a shoot cluster.

Please note, this extension controller relies on the Gardener-Resource-Manager to deploy k8s resources to seed and shoot clusters.

How to start using or developing this extension controller locally

You can run the controller locally on your machine by executing make start.

We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.

Feedback and Support

Feedback and contributions are always welcome!

Please report bugs or suggestions as GitHub issues or reach out on Slack (join the workspace here).

Learn more!

Please find further resources about out project here:

4.1 - Deployment

Gardener Networking Policy Filter for Shoots

Introduction

Gardener allows shoot clusters to filter egress traffic on node level. To support this the Gardener must be installed with the shoot-networking-filter extension.

Configuration

To generally enable the networking filter for shoot objects the shoot-networking-filter extension must be registered by providing an appropriate extension registration in the garden cluster.

Here it is possible to decide whether the extension should be always available for all shoots or whether the extension must be separately enabled per shoot.

If the extension should be used for all shoots the globallyEnabled flag should be set to true.

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
...
spec:
  resources:
    - kind: Extension
      type: shoot-networking-filter
      globallyEnabled: true

ControllerRegistration

An example of a ControllerRegistration for the shoot-networking-filter can be found at controller-registration.yaml.

The ControllerRegistration contains a Helm chart which eventually deploys the shoot-networking-filter to seed clusters. It offers some configuration options, mainly to set up a static filter list or provide the configuration for downloading the filter list from a service endpoint.

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerDeployment
...
  values:
    egressFilter:
      blackholingEnabled: true

      filterListProviderType: static
      staticFilterList:
        - network: 1.2.3.4/31
          policy: BLOCK_ACCESS
        - network: 5.6.7.8/32
          policy: BLOCK_ACCESS
        - network: ::2/128
          policy: BLOCK_ACCESS

      #filterListProviderType: download
      #downloaderConfig:
      #  endpoint: https://my.filter.list.server/lists/policy
      #  oauth2Endpoint: https://my.auth.server/oauth2/token
      #  refreshPeriod: 1h

      ## if the downloader needs an OAuth2 access token, client credentials can be provided with oauth2Secret
      #oauth2Secret:
      # clientID: 1-2-3-4
      # clientSecret: secret!!
      ## either clientSecret of client certificate is required
      # client.crt.pem: |
      #   -----BEGIN CERTIFICATE-----
      #   ...
      #   -----END CERTIFICATE-----
      # client.key.pem: |
      #   -----BEGIN PRIVATE KEY-----
      #   ...
      #   -----END PRIVATE KEY-----

Enablement for a Shoot

If the shoot networking filter is not globally enabled by default (depends on the extension registration on the garden cluster), it can be enabled per shoot. To enable the service for a shoot, the shoot manifest must explicitly add the shoot-networking-filter extension.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
...

If the shoot networking filter is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      disabled: true
...

4.2 - Shoot Networking Filter

Register Shoot Networking Filter Extension in Shoot Clusters

Introduction

Within a shoot cluster, it is possible to enable the networking filter. It is necessary that the Gardener installation your shoot cluster runs in is equipped with a shoot-networking-filter extension. Please ask your Gardener operator if the extension is available in your environment.

Shoot Feature Gate

In most of the Gardener setups the shoot-networking-filter extension is not enabled globally and thus must be configured per shoot cluster. Please adapt the shoot specification by the configuration shown below to activate the extension individually.

kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
...

Opt-out

If the shoot networking filter is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      disabled: true
...

Ingress Filtering

By default, the networking filter only filters egress traffic. However, if you enable blackholing, incoming traffic will also be blocked. You can enable blackholing on a per-shoot basis.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      providerConfig:
        egressFilter:
          blackholingEnabled: true
...

Ingress traffic can only be blocked by blackhole routing, if the source IP address is preserved. On Azure, GCP and AliCloud this works by default. The default on AWS is a classic load balancer that replaces the source IP by it’s own IP address. Here, a network load balancer has to be configured adding the annotation service.beta.kubernetes.io/aws-load-balancer-type: "nlb" to the service. On OpenStack, load balancers don’t preserve the source address.

When you disable blackholing in an existing shoot, the associated blackhole routes will be removed automatically. Conversely, when you re-enable blackholing again, the iptables-based filter rules will be removed and replaced by blackhole routes.

Ingress Filtering per Worker Group

You can optionally enable or disable ingress filtering for specified worker groups. For example, you may want to disable blackholing in general but enable it for a worker group hosting an external API. You can do so by using an optional workers field:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      providerConfig:
        egressFilter:
          blackholingEnabled: false
          workers:
            blackholingEnabled: true
            names:
              - external-api
...

Please note that only blackholing can be changed per worker group. You may not define different IPs to block or disable blocking altogether.

Custom IP

It is possible to add custom IP addresses to the network filter. This can be useful for testing purposes.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      providerConfig:
        egressFilter:
          staticFilterList:
          - network: 1.2.3.4/31
            policy: BLOCK_ACCESS
          - network: 5.6.7.8/32
            policy: BLOCK_ACCESS
          - network: ::2/128
            policy: BLOCK_ACCESS
...

5 - Lakom Service

A k8s admission controller verifying pods are using signed images (cosign signatures) and a gardener extension to install it for shoots and seeds.

Gardener Extension for lakom services

Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.

This controller implements Gardener’s extension contract for the shoot-lakom-service extension.

An example for a ControllerRegistration resource that can be used to register this controller to Gardener can be found here.

Please find more information regarding the extensibility concepts and a detailed proposal here.

Lakom Admission Controller

Lakom is kubernetes admission controller which purpose is to implement cosign image signature verification against public cosign key. It also takes care to resolve image tags to sha256 digests. It also caches all OCI artifacts to reduce the load toward the OCI registry.

Extension Resources

Example extension resource:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: extension-shoot-lakom-service
  namespace: shoot--project--abc
spec:
  type: shoot-lakom-service

When an extension resource is reconciled, the extension controller will create an instance of lakom admission controller. These resources are placed inside the shoot namespace on the seed. Also, the controller takes care about generating necessary RBAC resources for the seed as well as for the shoot.

Please note, this extension controller relies on the Gardener-Resource-Manager to deploy k8s resources to seed and shoot clusters.

How to start using or developing this extension controller locally

The Lakom admission controller can be configured with make dev-setup and started with make start-lakom. You can run the lakom extension controller locally on your machine by executing make start.

If you’d like to develop Lakom using a local cluster such as KinD, make sure your KUBECONFIG environment variable is targeting the local Garden cluster. Add 127.0.0.1 garden.local.gardener.cloud to your /etc/hosts. You can then run:

make extension-up

This will trigger a skaffold deployment that builds the images, pushes them to the registry and installs the helm charts from /charts.

We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.

Feedback and Support

Feedback and contributions are always welcome!

Please report bugs or suggestions as GitHub issues or reach out on Slack (join the workspace here).

Learn more

Please find further resources about out project here:

5.1 - Deployment

Gardener Lakom Service for Shoots

Introduction

Gardener allows Shoot clusters to use Lakom admission controller for cosign image signing verification. To support this the Gardener must be installed with the shoot-lakom-service extension.

Configuration

To generally enable the Lakom service for shoot objects the shoot-lakom-service extension must be registered by providing an appropriate extension registration in the garden cluster.

Here it is possible to decide whether the extension should be always available for all shoots or whether the extension must be separately enabled per shoot.

If the extension should be used for all shoots the globallyEnabled flag should be set to true.

spec:
  resources:
    - kind: Extension
      type: shoot-lakom-service
      globallyEnabled: true

Shoot Feature Gate

If the shoot Lakom service is not globally enabled by default (depends on the extension registration on the garden cluster), it can be enabled per shoot. To enable the service for a shoot, the shoot manifest must explicitly add the shoot-lakom-service extension.

...
spec:
  extensions:
    - type: shoot-lakom-service
...

If the shoot Lakom service is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.

...
spec:
  extensions:
    - type: shoot-lakom-service
      disabled: true
...

5.2 - Lakom

Introduction

Lakom is kubernetes admission controller which purpose is to implement cosign image signature verification with public cosign key. It also takes care to resolve image tags to sha256 digests. A built-in cache mechanism can be enabled to reduce the load toward the OCI registry.

Flags

Lakom admission controller is configurable via command line flags. The trusted cosign public keys and the associated algorithms associated with them are set viq configuration file provided with the flag --lakom-config-path.

Flag Name	Description	Default Value
`--bind-address`	Address to bind to	“0.0.0.0”
`--cache-refresh-interval`	Refresh interval for the cached objects	30s
`--cache-ttl`	TTL for the cached objects. Set to 0, if cache has to be disabled	10m0s
`--contention-profiling`	Enable lock contention profiling, if profiling is enabled	false
`--health-bind-address`	Bind address for the health server	“:8081”
`-h`, `--help`	help for lakom
`--insecure-allow-insecure-registries`	If set, communication via HTTP with registries will be allowed.	false
`--insecure-allow-untrusted-images`	If set, the webhook will just return warning for the images without trusted signatures.	false
`--kubeconfig`	Paths to a kubeconfig. Only required if out-of-cluster.
`--lakom-config-path`	Path to file with lakom configuration containing cosign public keys used to verify the image signatures
`--metrics-bind-address`	Bind address for the metrics server	“:8080”
`--port`	Webhook server port	9443
`--profiling`	Enable profiling via web interface host:port/debug/pprof/	false
`--tls-cert-dir`	Directory with server TLS certificate and key (must contain a tls.crt and tls.key file
`--use-only-image-pull-secrets`	If set, only the credentials from the image pull secrets of the pod are used to access the OCI registry. Otherwise, the node identity and docker config are also used.	false
`--version`	prints version information and quits; –version=vX.Y.Z… sets the reported version

Lakom Cosign Public Keys Configuration File

Lakom cosign public keys configuration file should be YAML or JSON formatted. It can set multiple trusted keys, as each key must be given a name. The supported types of public keys are RSA, ECDSA and Ed25519. The RSA keys can be additionally configured with a signature verification algorithm specifying the scheme and hash function used during signature verification. As of now ECDSA and Ed25519 keys cannot be configured with specific algorithm.

publicKeys:
- name: example-public-key
  algorithm: RSASSA-PSS-SHA256
  key: |-
    -----BEGIN PUBLIC KEY-----
    MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBAPeQXbIWMMXYV+9+j9b4jXTflnpfwn4E
    GMrmqYVhm0sclXb2FPP5aV/NFH6SZdHDZcT8LCNsNgxzxV4N+UE/JIsCAwEAAQ==
    -----END PUBLIC KEY-----

Here:

name is logical human friendly name of the key.
algorithm is the algorithm that has to be used to verify the signature, see Supported RSA Signature Verification Algorithms for the list of supported algorithms.
key is the cryptographic public key that will be used for image signature validation.

Supported RSA Signature Verification Algorithms

RSASSA-PKCS1-v1_5-SHA256: uses RSASSA-PKCS1-v1_5 scheme with SHA256 hash func
RSASSA-PKCS1-v1_5-SHA384: uses RSASSA-PKCS1-v1_5 scheme with SHA384 hash func
RSASSA-PKCS1-v1_5-SHA512: uses RSASSA-PKCS1-v1_5 scheme with SHA512 hash func
RSASSA-PSS-SHA256: uses RSASSA-PSS scheme with SHA256 hash func
RSASSA-PSS-SHA384: uses RSASSA-PSS scheme with SHA384 hash func
RSASSA-PSS-SHA512: uses RSASSA-PSS scheme with SHA512 hash func

Supported Resources for Verification

By default, Lakom validates only Pod resources in the clusters that it covers. However, it also has the capabilities to validate the following Gardener specific resources:

controllerdeployments.core.gardener.cloud/v1
gardenlets.seedmanagement.gardener.cloud/v1alpha1
extensions.operator.gardener.cloud/v1alpha1

Important
When deploying Lakom via the helm chart in /charts/lakom, the admissionConfig.rules key can be fully customized to include any of the listed resources above. Make sure that they are registered with the same group & versions as the ones listed above. Any difference will cause Lakom to skip validation and approve the request, making it a security risk.

5.3 - Oci Registries

OCI Registries

This document aims to introduce the reader to the general architecture of OCI Registries, their history and jargon that might be encountered.

History of Container Registries

Most readers might be familiar with the container registry that was introduced by Docker. It aims to provide a way for developers to store and distribute container images that can be used for running applications in containers.

In the beginning, the registry was meant to only host images in a way that the different layers of the image could be downloaded in parallel, while being assembled on the client side.

In June 2015, multiple companies agreed to standardize the container image format, including its distribution and runtime. This was the birth of the Open Container Initiative (OCI).

Understanding OCI Registries

The most important document that pertains to OCI Registries in the case of Lakom is the OCI distribution spec. It outlines the way that artifacts shall be stored and fetched from a registry that adheres to the OCI standard.

OCI Registries are meant to store content. While in the beginning the main problem that they were trying to solve was image storage, during the evolution of the image registry concept, a decision was made to make it more general for storage of any form of objects, sometimes called artifacts.

An artifact is an abstract idea of multiple components that together describe how an object is built along with storing metadata about the object.

To implement this, every OCI registry has to expose an API for creating 3 types of objects:

Blobs
Manifests
Tags

Blobs

A blob is just a set of bytes. Nothing more. The important part about blobs is that they are content addressable. Content Addressable Storage (CAS) is a method of storing data in such a way that the address of the data is derived from the data itself. This means that if you store the same blob twice, it will be stored only once, and the address of the blob will be the same. This is achieved by using the digest of the blob as the address.

This is the actual mechanism for allowing images to be optimal in terms of storage. Since most images begin by depending on the same base image, this means that it can be stored only once and referenced by multiple images.

Manifests

A manifest is just a JSON document. It is stored separately from the blobs. But similarly to the blobs, it is also content addressable. The manifest gets formatted to a canonical form and then hashed. The hash becomes the address of the manifest.

Where manifests become powerful is when they are used to reference to other manifests or blobs. These references are called OCI Content Descriptors. For example, the Image Manifest that most readers would be familiar with contains a list of layers that are used to build the image. Each layer is a reference to a blob.

An additional reference outside the layer references is used for storing additional configuration. More info can be found in the official OCI Image Manifest spec.

The term artifact is just a generalization over the Image Manifest idea. The Image Manifest has a specific format that can be found in the aforementioned OCI Image Manifest spec. The important field is the .config.mediaType field. Based on the official guidelines, it can be used for defining types other than an image. More info can be found here.

Combining all of these ideas, we can see that artifacts can be represented by 1 specific object - a manifest. That manifest, based on the media type in the config, can be interpreted by different programs for their specific use case. Eg. a Helm chart. Helm charts specifically have a mediaType of application/vnd.cncf.helm.config.v1+json. Layers, too, have a media type. The media type of a layer when the artifact is a helm chart is one of:

application/vnd.cncf.helm.config.v1+json
application/vnd.cncf.helm.chart.content.v1.tar+gzip
application/vnd.cncf.helm.chart.provenance.v1.prov

More info specifically on Helm charts as OCI Artifacts can be found here: Helm OCI MediaTypes.

Cosign

Cosign is generally a part of the bigger project called sigstore. Sigstore is an “open-source project for improving software supply chain security”. It aims for developers to have a hassle-free way to sign their artifacts while leveraging other components that add additional layers of security to the signature process. Eg. Rekor and Fulcio.

Cosign is a CLI tool aiming to tie all the components of sigstore together. In our specific case, we use it only as a format for creating signatures and then verifying them. In theory, we wouldn’t need to use Cosign for this, but it uses a popular format for the signatures and the libraries help us save work.

In the context of Lakom, what interests us is how Cosign signs OCI artifacts and how it stores them. The signing process in a nutshell is the following – Sign(sha256(SimpleSigningPayload(sha256(Image Manifest)))), where:

Image Manifest is what it says. Just the manifest of the image (artifact).
sha256 is the hashing algorithm used.
SimpleSigningPayload is the interesting part here. Instead of just signing the hash of the manifest itself, the hash itself is packed into a JSON doc that is called the SimpleSigningPayload. Example payload:

{
    "critical": {
           "identity": {
               "docker-reference": "testing/manifest"
           },
           "image": {
               "Docker-manifest-digest": "sha256:20be...fe55"
           },
           "type": "cosign container image signature"
    },
    "optional": {
           "creator": "atomic",
           "timestamp": 1458239713
    }
}

We can see that the hash of the manifest is stored in the Docker-manifest-digest. Now this object is the one that gets signed using our private key. The signed version of this objest is the one that gets stored and gets used for verification.

Storage of the signature is done via an image manifest that gets uploaded in the same repo as the image, but tagged with sha256-<container-image-digest>.sig. Thus, the signature discovery mechanism is to just fetch the image digest that is tagged with the aforementioned tag.

More info on this whole process can be found here.

Lakom relies on the cosign libraries to automate this whole process. When validating an artifact, if we know its digest, we can fetch the signature and verify it. If the signature is valid, we can be sure that the artifact has not been tampered with.

5.4 - Shoot Extension

Introduction

This extension implements cosign image verification. It is strictly limited only to the kubernetes system components deployed by Gardener and other Gardener Extensions in the kube-system namespace of a shoot cluster.

Shoot Feature Gate

In most of the Gardener setups the shoot-lakom-service extension is enabled globally and thus can be configured per shoot cluster. Please adapt the shoot specification by the configuration shown below to disable the extension individually.

kind: Shoot
...
spec:
  resources:
  - name: lakom-ref
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: lakom-secret
  extensions:
  - type: shoot-lakom-service
    disabled: true
    providerConfig:
      apiVersion: lakom.extensions.gardener.cloud/v1alpha1
      kind: LakomConfig
      scope: KubeSystem
      trustedKeysResourceName: lakom-ref
...

Scope

The scope field instruct lakom which pods to validate. The possible values are:

KubeSystem Lakom will validate all pods in the kube-system namespace.
KubeSystemManagedByGardener Lakom will validate all pods in the kube-system namespace that are annotated with “managed-by/gardener”. This is the default value.
Cluster Lakom will validate all pods in all namespaces.

TrustedKeysResourceName

Lakom, by default, tries to verify only workloads that belong to Gardener. Because of this, the only public keys that it uses to do its job are the ones for the Gardener workload.

If you’d like to use Lakom as a tool for verifying your own workload, you’ll need to add your own public keys to the ones that Lakom is already using. This can be achieved using Gardener referenced resources. More information about the keys and their format can be found here.

Simply:

Create a secret in your project namespace that contains a field keys with your keys as a value. Example keys:

- name: example-client-key1
  algorithm: RSASSA-PSS-SHA256
  key: |-
    -----BEGIN PUBLIC KEY-----
    MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBAPeQXbIWMMXYV+9+j9b4jXTflnpfwn4E
    GMrmqYVhm0sclXb2FPP5aV/NFH6SZdHDZcT8LCNsNgxzxV4N+UE/JIsCAwEAAQ==
    -----END PUBLIC KEY-----
- name: example-client-key2
  algorithm: RSASSA-PSS-SHA256
  key: |-
    -----BEGIN PUBLIC KEY-----
    MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBAPeQXbIWMMXYV+9+j9b4jXTflnpfwn4E
    GMrmqYVhm0sclXb2FPP5aV/NFH6SZdHDZcT8LCNsNgxzxV4N+UE/JIsCAwEAAQ==
    -----END PUBLIC KEY-----

Add a reference to your secret via the resources field in the shoot spec as shown above.
Add the name of your referenece in trustedKeysResourceName in the provider config as shown above.

Now, whenever Lakom tries to verify a Pod, it will make sure to use your public keys as well.

6 - Networking Problem Detector

Gardener extension for deploying network problem detector

Gardener Extension for Network Problem Detector

Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.

This controller implements Gardener’s extension contract for the shoot-networking-problemdetector extension.

An example for a ControllerRegistration resource that can be used to register this controller to Gardener can be found here.

Please find more information regarding the extensibility concepts and a detailed proposal here.

Extension Resources

Currently there is nothing to specify in the extension spec.

Example extension resource:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: extension-shoot-networking-problemdetector
  namespace: shoot--project--abc
spec:

When an extension resource is reconciled, the extension controller will create two daemonsets nwpd-agent-pod-net and nwpd-agent-node-net deploying the “network problem detector agent”. These daemon sets perform and collect various checks between all nodes of the Kubernetes cluster, to its Kube API server and/or external endpoints. Checks are performed using TCP connections, PING (ICMP) or mDNS (UDP). More details about the network problem detector agent can be found in its repository gardener/network-problem-detector.

Please note, this extension controller relies on the Gardener-Resource-Manager to deploy k8s resources to seed and shoot clusters.

How to start using or developing this extension controller locally

You can run the controller locally on your machine by executing make start.

We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.

Feedback and Support

Feedback and contributions are always welcome!

Please report bugs or suggestions as GitHub issues or reach out on Slack (join the workspace here).

Learn more!

Please find further resources about out project here:

6.1 - Deployment

Gardener Networking Policy Filter for Shoots

Introduction

Gardener allows shoot clusters to add network problem observability using the network problem detector. To support this the Gardener must be installed with the shoot-networking-problemdetector extension.

Configuration

To generally enable the networking problem detector for shoot objects the shoot-networking-problemdetector extension must be registered by providing an appropriate extension registration in the garden cluster.

Here it is possible to decide whether the extension should be always available for all shoots or whether the extension must be separately enabled per shoot.

If the extension should automatically be used for all shoots the autoEnable field should be set to [shoot].

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
...
spec:
  resources:
    - kind: Extension
      type: shoot-networking-problemdetector
      autoEnable: [shoot]

ControllerRegistration

An example of a ControllerRegistration for the shoot-networking-problemdetector can be found at controller-registration.yaml.

The ControllerRegistration contains a Helm chart which eventually deploys the shoot-networking-problemdetector to seed clusters. It offers some configuration options, mainly to set up a static filter list or provide the configuration for downloading the filter list from a service endpoint.

apiVersion: core.gardener.cloud/v1beta1
kind: ControllerDeployment
...
  values:
    #networkProblemDetector:
    #  defaultPeriod: 30s

Enablement for a Shoot

If the shoot network problem detector is not globally enabled by default (depends on the extension registration on the garden cluster), it can be enabled per shoot. To enable the service for a shoot, the shoot manifest must explicitly add the shoot-networking-problemdetector extension.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-problemdetector
...

If the shoot network problem detector is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-problemdetector
      disabled: true
...

6.2 - Shoot Networking Problemdetector

Register Shoot Networking Filter Extension in Shoot Clusters

Introduction

Within a shoot cluster, it is possible to enable the network problem detector. It is necessary that the Gardener installation your shoot cluster runs in is equipped with a shoot-networking-problemdetector extension. Please ask your Gardener operator if the extension is available in your environment.

Shoot Feature Gate

In most of the Gardener setups the shoot-networking-problemdetector extension is not enabled globally and thus must be configured per shoot cluster. Please adapt the shoot specification by the configuration shown below to activate the extension individually.

kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-problemdetector
...

Opt-out

If the shoot network problem detector is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-problemdetector
      disabled: true
...

7 - Node Audit Logging

Gardener extension controller which configures the rsyslog and auditd services installed on shoot nodes.

Gardener Extension to configure rsyslog with relp module

Gardener extension controller which configures the rsyslog and auditd services installed on shoot nodes.

Usage

Configuring the Rsyslog Relp Extension - learn what is the use-case for rsyslog-relp, how to enable it and configure it

Local Setup and Development

Deploying the Rsyslog Relp Extension Locally - learn how to set up a local development environment
Developer Docs for Gardener Shoot Rsyslog Relp Extension - learn about the inner workings

7.1 - Configuration

Configuring the Rsyslog Relp Extension

Introduction

As a cluster owner, you might need audit logs on a Shoot node level. With these audit logs you can track actions on your nodes like privilege escalation, file integrity, process executions, and who is the user that performed these actions. Such information is essential for the security of your Shoot cluster. Linux operating systems collect such logs via the auditd and journald daemons. However, these logs can be lost if they are only kept locally on the operating system. You need a reliable way to send them to a remote server where they can be stored for longer time periods and retrieved when necessary.

Rsyslog offers a solution for that. It gathers and processes logs from auditd and journald and then forwards them to a remote server. Moreover, rsyslog can make use of the RELP protocol so that logs are sent reliably and no messages are lost.

The shoot-rsyslog-relp extension is used to configure rsyslog on each Shoot node so that the following can take place:

Rsyslog reads logs from the auditd and journald sockets.
The logs are filtered based on the program name, syslog severity and content of the message.
The logs are enriched with metadata containing the name of the Project in which the Shoot is created, the name of the Shoot, the UID of the Shoot, and the hostname of the node on which the log event occurred.
The enriched logs are sent to the target remote server via the RELP protocol.

The following graph shows a rough outline of how that looks in a Shoot cluster: rsyslog-logging-architecture

Shoot Configuration

The extension is not globally enabled and must be configured per Shoot cluster. The Shoot specification has to be adapted to include the shoot-rsyslog-relp extension configuration, which specifies the target server to which logs are forwarded, its port, and some optional rsyslog settings described in the examples below.

Below is an example shoot-rsyslog-relp extension configuration as part of the Shoot spec:

kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
...
spec:
  extensions:
  - type: shoot-rsyslog-relp
    providerConfig:
      apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
      kind: RsyslogRelpConfig
      # Set the target server to which logs are sent. The server must support the RELP protocol.
      target: some.rsyslog-relp.server
      # Set the port of the target server.
      port: 10250
      # Define rules to select logs from which programs and with what syslog severity
      # are forwarded to the target server.
      loggingRules:
      - severity: 4
        programNames: ["kubelet", "audisp-syslog"]
        # Set regular expression to match or exclude messages based on their content.
        # messageContent:
        #   regex: "foo"
        #   exclude: "bar"
      - severity: 1
        programNames: ["audisp-syslog"]
      # Define an interval of 90 seconds at which the current connection is broken and re-established.
      # By default this value is 0 which means that the connection is never broken and re-established.
      rebindInterval: 90
      # Set the timeout for relp sessions to 90 seconds. If set too low, valid sessions may be considered
      # dead and tried to recover.
      timeout: 90
      # Set how often an action is retried before it is considered to have failed.
      # Failed actions discard log messages. Setting `-1` here means that messages are never discarded.
      resumeRetryCount: -1
      # Configures rsyslog to report continuation of action suspension, e.g. when the connection to the target
      # server is broken.
      reportSuspensionContinuation: true
      # Add tls settings if tls should be used to encrypt the connection to the target server.
      tls:
        enabled: true
        # Use `name` authentication mode for the tls connection.
        authMode: name
        # Only allow connections if the server's name is `some.rsyslog-relp.server`
        permittedPeer:
        - "some.rsyslog-relp.server"
        # Reference to the resource which contains certificates used for the tls connection.
        # It must be added to the `.spec.resources` field of the Shoot.
        secretReferenceName: rsyslog-relp-tls
        # Instruct librelp on the Shoot nodes to use the gnutls tls library.
        tlsLib: gnutls
      # Add auditConfig settings if you want to customize node level auditing.
      auditConfig:
        enabled: true
        # Reference to the resource which contains the audit configuration.
        # It must be added to the `.spec.resources` field of the Shoot.
        configMapReferenceName: audit-config
  resources:
    # Add the rsyslog-relp-tls secret in the resources field of the Shoot spec.
    - name: rsyslog-relp-tls
      resourceRef:
        apiVersion: v1
        kind: Secret
        name: rsyslog-relp-tls-v1
    - name: audit-config
      resourceRef:
        apiVersion: v1
        kind: ConfigMap
        name: audit-config-v1
...

Choosing Which Log Messages to Send to the Target Server

The .loggingRules field defines rules about which logs should be sent to the target server. When a log is processed by rsyslog, it is compared against the list of rules in order. If the program name, the syslog severity of the log messages and the message content matches the rule, the message is forwarded to the target server. The following table describes the syslog severity and their corresponding codes:

Numerical         Severity
  Code

  0               Emergency: system is unusable
  1               Alert: action must be taken immediately
  2               Critical: critical conditions
  3               Error: error conditions
  4               Warning: warning conditions
  5               Notice: normal but significant condition
  6               Informational: informational messages
  7               Debug: debug-level messages

Below is an example with a .loggingRules section that will only forward logs from the kubelet program with syslog severity of 6 or lower that don’t contain “bar” and any other program with syslog severity of 2 or lower:

apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
kind: RsyslogRelpConfig
target: localhost
port: 1520
loggingRules:
- severity: 6
  programNames: ["kubelet"]
  messageContent:
    exclude: "bar"
- severity: 2

You can use a minimal shoot-rsyslog-relp extension configuration to forward all logs to the target server:

apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
kind: RsyslogRelpConfig
target: some.rsyslog-relp.server
port: 10250
loggingRules:
- severity: 7

Securing the Communication to the Target Server with TLS

The communication to the target server is not encrypted by default. To enable encryption, set the .tls.enabled field in the shoot-rsyslog-relp extension configuration to true. In this case, an immutable secret which contains the TLS certificates used to establish the TLS connection to the server must be created in the same project namespace as your Shoot.

An example Secret is given below:

Note: The secret must be immutable

kind: Secret
apiVersion: v1
metadata:
  name: rsyslog-relp-tls-v1
  namespace: garden-foo
immutable: true
data:
  ca: |
    -----BEGIN BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----
  crt: |
    -----BEGIN BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----
  key: |
    -----BEGIN BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----

The Secret must be referenced in the Shoot’s .spec.resources field and the corresponding resource entry must be referenced in the .tls.secretReferenceName of the shoot-rsyslog-relp extension configuration:

kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
...
spec:
  extensions:
  - type: shoot-rsyslog-relp
    providerConfig:
      apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
      kind: RsyslogRelpConfig
      target: some.rsyslog-relp.server
      port: 10250
      loggingRules:
      - severity: 7
      tls:
        enabled: true
        secretReferenceName: rsyslog-relp-tls
  resources:
    - name: rsyslog-relp-tls
      resourceRef:
        apiVersion: v1
        kind: Secret
        name: rsyslog-relp-tls-v1
...

You can set a few additional parameters for the TLS connection: .tls.authMode, tls.permittedPeer, and tls.tlsLib. Refer to the rsyslog documentation for more information on these parameters:

Configuring the Audit Daemon on the Shoot Nodes

The shoot-rsyslog-relp extension also allows you to configure the Audit Daemon (auditd) on the Shoot nodes.

By default, the audit rules located under the /etc/audit/rules.d directory on your Shoot’s nodes will be moved to /etc/audit/rules.d.original and the following rules will be placed under the /etc/audit/rules.d directory: 00-base-config.rules, 10-privilege-escalation.rules, 11-privilege-special.rules, 12-system-integrity.rules. Next, augerules --load will be called and the audit daemon (auditd) restarted so that the new rules can take effect.

Alternatively, you can define your own auditd rules to be placed on your Shoot’s nodes by using the following configuration:

apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
kind: Auditd
auditRules: |
  ## First rule - delete all existing rules
  -D
  ## Now define some custom rules
  -a exit,always -F arch=b64 -S setuid -S setreuid -S setgid -S setregid -F auid>0 -F auid!=-1 -F key=privilege_escalation
  -a exit,always -F arch=b64 -S execve -S execveat -F euid=0 -F auid>0 -F auid!=-1 -F key=privilege_escalation

In this case the original rules are also backed up in the /etc/audit/rules.d.original directory.

To deploy this configuration, it must be embedded in an immutable ConfigMap.

Note
The data key storing this configuration must be named auditd.

An example ConfigMap is given below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: audit-config-v1
  namespace: garden-foo
immutable: true
data:
  auditd: |
    apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
    kind: Auditd
    auditRules: |
      ## First rule - delete all existing rules
      -D
      ## Now define some custom rules
      -a exit,always -F arch=b64 -S setuid -S setreuid -S setgid -S setregid -F auid>0 -F auid!=-1 -F key=privilege_escalation
      -a exit,always -F arch=b64 -S execve -S execveat -F euid=0 -F auid>0 -F auid!=-1 -F key=privilege_escalation

After creating such a ConfigMap, it must be included in the Shoot’s spec.resources array and then referenced from the providerConfig.auditConfig.configMapReferenceName field in the shoot-rsyslog-relp extension configuration.

An example configuration is given below:

kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
...
spec:
  extensions:
  - type: shoot-rsyslog-relp
    providerConfig:
      apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
      kind: RsyslogRelpConfig
      target: some.rsyslog-relp.server
      port: 10250
      loggingRules:
      - severity: 7
      auditConfig:
        enabled: true
        configMapReferenceName: audit-config
  resources:
    - name: audit-config
      resourceRef:
        apiVersion: v1
        kind: ConfigMap
        name: audit-config-v1

Finally, by setting providerConfig.auditConfig.enabled to false in the shoot-rsyslog-relp extension configuration, the original audit rules on your Shoot’s nodes will not be modified and auditd will not be restarted.

Examples on how the providerConfig.auditConfig.enabled field functions are given below:

The following deploys the extension default audit rules as of today:
```
providerConfig:
  auditConfig:
    enabled: true
```

The following deploys only the rules specified in the referenced ConfigMap:

providerConfig:
  auditConfig:
    enabled: true
    configMapReferenceName: audit-config

Both of the following do not deploy any audit rules:

providerConfig:
  auditConfig:
    enabled: false
    configMapReferenceName: audit-config

providerConfig:
  auditConfig:
    enabled: false

7.2 - Deploying Rsyslog Relp Extension Remotely

Learn how to set up a development environment using own Seed clusters on an existing Kubernetes cluster

Deploying Rsyslog Relp Extension Remotely

This document will walk you through running the Rsyslog Relp extension controller on a remote seed cluster and the rsyslog relp admission component in your local garden cluster for development purposes. This guide uses Gardener’s setup with provider extensions and builds on top of it.

If you encounter difficulties, please open an issue so that we can make this process easier.

Prerequisites

Make sure that you have a running Gardener setup with provider extensions. The steps to complete this can be found in the Deploying Gardener Locally and Enabling Provider-Extensions guide.
Make sure you are running Gardener version >= 1.95.0 or the latest version of the master branch.

Setting up the Rsyslog Relp Extension

Important: Make sure that your KUBECONFIG env variable is targeting the local Gardener cluster!

The location of the Gardener project from the Gardener setup is expected to be under the same root as this repository (e.g. ~/go/src/github.com/gardener/). If this is not the case, the location of Gardener project should be specified in GARDENER_REPO_ROOT environment variable:

export GARDENER_REPO_ROOT="<path_to_gardener_project>"

Then you can run:

make remote-extension-up

In case you have added additional Seeds you can specify the seed name:

make remote-extension-up SEED_NAME=<seed-name>

Creating a Shoot Cluster

Once the above step is completed, you can create a Shoot cluster. In order to create a Shoot cluster, please create your own Shoot definition depending on providers on your Seed cluster.

Configuring the Shoot Cluster and deploying the Rsyslog Relp Echo Server

To be able to properly test the rsyslog relp extension you need a running rsyslog relp echo server to which logs from the Shoot nodes can be sent. To deploy the server and configure the rsyslog relp extension on your Shoot cluster you can run:

make configure-shoot SHOOT_NAME=<shoot-name> SHOOT_NAMESPACE=<shoot-namespace>

This command will deploy an rsyslog relp echo server in your Shoot cluster in the rsyslog-relp-echo-server namespace. It will also add configuration for the shoot-rsyslog-relp extension to your Shoot spec by patching it with ./example/extension/<shoot-name>--<shoot-namespace>--extension-config-patch.yaml. This file is automatically copied from extension-config-patch.yaml.tmpl in the same directory when you run make configure-shoot for the first time. The file also includes explanations of the properties you should set or change. The command will also deploy the rsyslog-relp-tls secret in case you wish to enable tls.

Tearing Down the Development Environment

To tear down the development environment, delete the Shoot cluster or disable the shoot-rsyslog-relp extension in the Shoot’s specification. When the extension is not used by the Shoot anymore, you can run:

make remote-extension-down

The make target will delete the ControllerDeployment and ControllerRegistration of the extension, and the shoot-rsyslog-relp admission helm deployment.

7.3 - Getting Started

Deploying Rsyslog Relp Extension Locally

This document will walk you through running the Rsyslog Relp extension and a fake rsyslog relp service on your local machine for development purposes. This guide uses Gardener’s local development setup and builds on top of it.

If you encounter difficulties, please open an issue so that we can make this process easier.

Prerequisites

Make sure that you have a running local Gardener setup. The steps to complete this can be found here.
Make sure you are running Gardener version >= 1.74.0 or the latest version of the master branch.

Setting up the Rsyslog Relp Extension

Important: Make sure that your KUBECONFIG env variable is targeting the local Gardener cluster!

make extension-up

This will build the shoot-rsyslog-relp, shoot-rsyslog-relp-admission, and shoot-rsyslog-relp-echo-server images and deploy the needed resources and configurations in the garden cluster. The shoot-rsyslog-relp-echo-server will act as development replacement of a real rsyslog relp server.

Creating a Shoot Cluster

Once the above step is completed, we can deploy and configure a Shoot cluster with default rsyslog relp settings.

kubectl apply -f ./example/shoot.yaml

Once the Shoot’s namespace is created, we can create a networkpolicy that will allow egress traffic from the rsyslog on the Shoot’s nodes to the rsyslog-relp-echo-server that serves as a fake rsyslog target server.

kubectl apply -f ./example/local/allow-machine-to-rsyslog-relp-echo-server-netpol.yaml

Currently, the Shoot’s nodes run Ubuntu, which does not have the rsyslog-relp and auditd packages installed, so the configuration done by the extension has no effect. Once the Shoot is created, we have to manually install the rsyslog-relp and auditd packages:

kubectl -n shoot--local--local exec -it $(kubectl -n shoot--local--local get po -l app=machine,machine-provider=local -o name) -- bash -c "
   apt-get update && \
   apt-get install -y rsyslog-relp auditd && \
   systemctl enable rsyslog.service && \
   systemctl start rsyslog.service"

Once that is done we can verify that log messages are forwarded to the rsyslog-relp-echo-server by checking its logs.

kubectl -n rsyslog-relp-echo-server logs deployment/rsyslog-relp-echo-server

Making Changes to the Rsyslog Relp Extension

Changes to the rsyslog relp extension can be applied to the local environment by repeatedly running the make recipe.

make extension-up

Tearing Down the Development Environment

To tear down the development environment, delete the Shoot cluster or disable the shoot-rsyslog-relp extension in the Shoot’s spec. When the extension is not used by the Shoot anymore, you can run:

make extension-down

This will delete the ControllerRegistration and ControllerDeployment of the extension, the shoot-rsyslog-relp-admission deployment, and the rsyslog-relp-echo-server deployment.

Maintaining the Publicly Available Image for the rsyslog-relp Echo Server

The testmachinery tests use an rsyslog-relp-echo-server image from a publicly available repository. The one which is currently used is eu.gcr.io/gardener-project/gardener/extensions/shoot-rsyslog-relp-echo-server:v0.1.0.

Sometimes it might be necessary to update the image and publish it, e.g. when updating the alpine base image version specified in the repository’s Dokerfile.

To do that:

Bump the version with which the image is built in the Makefile.
Build the shoot-rsyslog-relp-echo-server image:
```
make echo-server-docker-image
```
Once the image is built, push it to gcr with:
```
make push-echo-server-image
```
Finally, bump the version of the image used by the testmachinery tests here.
Create a PR with the changes.

7.4 - Monitoring

Monitoring

The shoot-rsyslog-relp extension exposes metrics for the rsyslog service running on a Shoot’s nodes so that they can be easily viewed by cluster owners and operators in the Shoot’s Prometheus and Plutono instances. The exposed monitoring data offers valuable insights into the operation of the rsyslog service and can be used to detect and debug ongoing issues. This guide describes the various metrics, alerts and logs available to cluster owners and operators.

Metrics

Metrics for the rsyslog service originate from its impstats module. These include the number of messages in the various queues, the number of ingested messages, the number of processed messages by configured actions, system resources used by the rsyslog service, and others. More information about them can be found in the impstats documentation and the statistics counter documentation. They are exposed via the node-exporter running on each Shoot node and are scraped by the Shoot’s Prometheus instance.

These metrics can also be viewed in a dedicated dashboard named Rsyslog Stats in the Shoot’s Plutono instance. You can select the node for which you wish the metrics to be displayed from the Node dropdown menu (by default metrics are summed over all nodes).

Following is a list of all exposed rsyslog metrics. The name and origin labels can be used to determine wether the metric is for: a queue, an action, plugins or system stats; the node label can be used to determine the node the metric originates from:

rsyslog_pstat_submitted

Number of messages that were submitted to the rsyslog service from its input. Currently rsyslog uses the /run/systemd/journal/syslog socket as input.

Type: Counter
Labels: name node origin

rsyslog_pstat_processed

Number of messages that are successfully processed by an action and sent to the target server.

Type: Counter
Labels: name node origin

rsyslog_pstat_failed

Number of messages that could not be processed by an action nor sent to the target server.

Type: Counter
Labels: name node origin

rsyslog_pstat_suspended

Total number of times an action suspended itself. Note that this counts the number of times the action transitioned from active to suspended state. The counter is no indication of how long the action was suspended or how often it was retried.

Type: Counter
Labels: name node origin

rsyslog_pstat_suspended_duration

The total number of seconds this action was disabled.

Type: Counter
Labels: name node origin

rsyslog_pstat_resumed

The total number of times this action resumed itself. A resumption occurs after the action has detected that a failure condition does no longer exist.

Type: Counter
Labels: name node origin

rsyslog_pstat_utime

User time used in microseconds.

Type: Counter
Labels: name node origin

rsyslog_pstat_stime

System time used in microsends.

Type: Counter
Labels: name node origin

rsyslog_pstat_maxrss

Maximum resident set size

Type: Gauge
Labels: name node origin

rsyslog_pstat_minflt

Total number of minor faults the task has made per second, those which have not required loading a memory page from disk.

Type: Counter
Labels: name node origin

rsyslog_pstat_majflt

Total number of major faults the task has made per second, those which have required loading a memory page from disk.

Type: Counter
Labels: name node origin

rsyslog_pstat_inblock

Filesystem input operations.

Type: Counter
Labels: name node origin

rsyslog_pstat_oublock

Filesystem output operations.

Type: Counter
Labels: name node origin

rsyslog_pstat_nvcsw

Voluntary context switches.

Type: Counter
Labels: name node origin

rsyslog_pstat_nivcsw

Involuntary context switches.

Type: Counter
Labels: name node origin

rsyslog_pstat_openfiles

Number of open files.

Type: Counter
Labels: name node origin

rsyslog_pstat_size

Messages currently in queue.

Type: Gauge
Labels: name node origin

rsyslog_pstat_enqueued

Total messages enqueued.

Type: Counter
Labels: name node origin

rsyslog_pstat_full

Times queue was full.

Type: Counter
Labels: name node origin

rsyslog_pstat_discarded_full

Messages discarded due to queue being full.

Type: Counter
Labels: name node origin

rsyslog_pstat_discarded_nf

Messages discarded when queue not full.

Type: Counter
Labels: name node origin

rsyslog_pstat_maxqsize

Maximum size queue has reached.

Type: Gauge
Labels: name node origin

rsyslog_augenrules_load_success

Shows whether the augenrules --load command was executed successfully or not on the node.

Type: Gauge
Labels: node

Alerts

There are three alerts defined for the rsyslog service in the Shoot’s Prometheus instance:

RsyslogTooManyRelpActionFailures

This indicates that the cumulative failure rate in processing relp action messages is greater than 2%. In other words, it compares the rate of processed relp action messages to the rate of failed relp action messages and fires an alert when the following expression evaluates to true:

sum(rate(rsyslog_pstat_failed{origin="core.action",name="rsyslg-relp"}[5m])) / sum(rate(rsyslog_pstat_processed{origin="core.action",name="rsyslog-relp"}[5m])) > bool 0.02`

RsyslogRelpActionProcessingRateIsZero

This indicates that no messages are being sent to the upstream rsyslog target by the relp action. An alert is fired when the following expression evaluates to true:

rate(rsyslog_pstat_processed{origin="core.action",name="rsyslog-relp"}[5m]) == 0

RsyslogRelpAuditRulesNotLoadedSuccessfully

This indicates that augenrules --load was not executed successfully when called to load the configured audit rules. You should check if the auditd configuration you provided is valid. An alert is fired when the following expression evaluates to true:

absent(rsyslog_augenrules_load_success == 1)

Users can subscribe to these alerts by following the Gardener alerting guide.

Logging

There are two ways to view the logs of the rsyslog service running on the Shoot’s nodes - either using the Explore tab of the Shoot’s Plutono instance, or ssh-ing directly to a node.

To view logs in Plutono, navigate to the Explore tab and select vali from the Explore dropdown menu. Afterwards enter the following vali query:

{nodename="<name-of-node>"} |~ "\"unit\":\"rsyslog.service\""

Notice that you cannot use the unit label to filter for the rsyslog.service unit logs. Instead, you have to grep for the service as displayed in the example above.

To view logs when directly ssh-ing to a node in the Shoot cluster, use either of the following commands on the node:

systemctl status rsyslog

journalctl -u rsyslog

7.5 - Shoot Rsyslog Relp

Developer Docs for Gardener Shoot Rsyslog Relp Extension

This document outlines how Shoot reconciliation and deletion works for a Shoot with the shoot-rsyslog-relp extension enabled.

Shoot Reconciliation

This section outlines how the reconciliation works for a Shoot with the shoot-rsyslog-relp extension enabled.

Extension Enablement / Reconciliation

This section outlines how the extension enablement/reconciliation works, e.g., the extension has been added to the Shoot spec.

As part of the Shoot reconciliation flow, the gardenlet deploys the Extension resource.
The shoot-rsyslog-relp extension reconciles the Extension resource. pkg/controller/lifecycle/actuator.go contains the implementation of the extension.Actuator interface. The reconciliation of an Extension of type shoot-rsyslog-relp only deploys the necessary monitoring configuration - the shoot-rsyslog-relp-dashboards ConfigMap which contains the definitions for: Plutono dashboard for the Rsyslog component, and the shoot-shoot-rsyslog-relp ServiceMonitor and PrometheusRule resources which contains the definitions for: scraping metrics by prometheus, alerting rules.
As part of the Shoot reconciliation flow, the gardenlet deploys the OperatingSystemConfig resource.
The shoot-rsyslog-relp extension serves a webhook that mutates the OperatingSystemConfig resource for Shoots having the shoot-rsyslog-relp extension enabled (the corresponding namespace gets labeled by the gardenlet with extensions.gardener.cloud/shoot-rsyslog-relp=true). pkg/webhook/operatingsystemconfig/ensurer.go contains implementation of the genericmutator.Ensurer interface.
1. The webhook renders the 60-audit.conf.tpl template script and appends it to the OperatingSystemConfig files. When rendering the template, the configuration of the shoot-rsyslog-relp extension is used to fill in the required template values. The file is installed as /var/lib/rsyslog-relp-configurator/rsyslog.d/60-audit.conf on the host OS.
2. The webhook appends the audit rules to the OperatingSystemConfig. The files are installed under /var/lib/rsyslog-relp-configurator/rules.d on the host OS.
3. If the user has specified alternative audit rules in a config map reference, the webhook fetches the referenced ConfigMap from the Shoot’s control plane namespace and decodes the value of its auditd data key into an object of type Auditd. It then takes the auditRules defined in the object and places those under the /var/lib/rsyslog-relp-configurator/rules.d directory in a single file.
4. The webhook renders the configure-rsyslog.tpl.sh script and appends it to the OperatingSystemConfig files. This script is installed as /var/lib/rsyslog-relp-configurator/configure-rsyslog.sh on the host OS. It keeps the configuration of the rsyslog systemd service up-to-date by copying /var/lib/rsyslog-relp-configurator/rsyslog.d/60-audit.conf to /etc/rsyslog.d/60-audit.conf, if /etc/rsyslog.d/60-audit.conf does not exist or the files differ. The script also takes care of syncing the audit rules in /etc/audit/rules.d with the ones installed in /var/lib/rsyslog-relp-configurator/rules.d and restarts the auditd systemd service if necessary.
5. The webhook renders the process-rsyslog-pstats.tpl.sh and appends it to the OperatingSystemConfig files. This script receives metrics from the rsyslog process, transforms them, and writes them to /var/lib/node-exporter/textfile-collector/rsyslog_pstats.prom so that they can be collected by the node-exporter.
6. As part of the Shoot reconciliation, before the shoot-rsyslog-relp extension is deployed, the gardenlet copies all Secret and ConfigMap resources referenced in .spec.resources[] to the Shoot’s control plane namespace on the Seed. When the .tls.enabled field is true in the shoot-rsyslog-relp extension configuration, a value for .tls.secretReferenceName must also be specified so that it references a named resource reference in the Shoot’s .spec.resources[] array. The webhook appends the data of the referenced Secret in the Shoot’s control plane namespace to the OperatingSystemConfig files.
7. The webhook appends the rsyslog-configurator.service unit to the OperatingSystemConfig units. The unit invokes the configure-rsyslog.sh script every 15 seconds.

Extension Disablement

This section outlines how the extension disablement works, i.e., the extension has to be removed from the Shoot spec.

As part of the Shoot reconciliation flow, the gardenlet destroys the Extension resource because it is no longer needed.
1. As part of the deletion flow, the shoot-rsyslog-relp extension deploys the rsyslog-relp-configuration-cleaner DaemonSet to the Shoot cluster to clean up the existing rsyslog configuration and revert the audit rules.

Shoot Deletion

This section outlines how the deletion works for a Shoot with the shoot-rsyslog-relp extension enabled.

As part of the Shoot deletion flow, the gardenlet destroys the Extension resource.
1. In the Shoot deletion flow, the Extension resource is deleted after the Worker resource. Hence, there is no need to deploy the rsyslog-relp-configuration-cleaner DaemonSet to the Shoot cluster to clean up the existing rsyslog configuration and revert the audit rules.

8 - OpenID Connect Services

Gardener extension controller for OpenID Connect services for shoot clusters

Gardener Extension for openid connect services

Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.

This controller implements Gardener’s extension contract for the shoot-oidc-service extension.

An example for a ControllerRegistration resource that can be used to register this controller to Gardener can be found here.

Please find more information regarding the extensibility concepts and a detailed proposal here.

Compatibility

The following lists compatibility requirements of this extension controller with regards to other Gardener components.

OIDC Extension	Gardener	Notes
`== v0.15.0`	`>= 1.60.0 <= v1.64.0`	A typical side-effect when running Gardener < v1.63.0 is an unexpected scale-down of the OIDC webhook from `2 -> 1`.
`== v0.16.0`	`>= 1.65.0`

Extension Resources

Example extension resource:

apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: extension-shoot-oidc-service
  namespace: shoot--project--abc
spec:
  type: shoot-oidc-service

When an extension resource is reconciled, the extension controller will create an instance of OIDC Webhook Authenticator. These resources are placed inside the shoot namespace on the seed. Also, the controller takes care about generating necessary RBAC resources for the seed as well as for the shoot.

Please note, this extension controller relies on the Gardener-Resource-Manager to deploy k8s resources to seed and shoot clusters.

How to start using or developing this extension controller locally

You can run the controller locally on your machine by executing make start.

We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.

Feedback and Support

Feedback and contributions are always welcome!

Please report bugs or suggestions as GitHub issues or reach out on Slack (join the workspace here).

Learn more!

Please find further resources about out project here:

8.1 - Deployment

Gardener OIDC Service for Shoots

Introduction

Gardener allows Shoot clusters to dynamically register OpenID Connect providers. To support this the Gardener must be installed with the shoot-oidc-service extension.

Configuration

To generally enable the OIDC service for shoot objects the shoot-oidc-service extension must be registered by providing an appropriate extension registration in the garden cluster.

Here it is possible to decide whether the extension should be always available for all shoots or whether the extension must be separately enabled per shoot.

If the extension should be used for all shoots the globallyEnabled flag should be set to true.

spec:
  resources:
    - kind: Extension
      type: shoot-oidc-service
      globallyEnabled: true

Shoot Feature Gate

If the shoot OIDC service is not globally enabled by default (depends on the extension registration on the garden cluster), it can be enabled per shoot. To enable the service for a shoot, the shoot manifest must explicitly add the shoot-oidc-service extension.

...
spec:
  extensions:
    - type: shoot-oidc-service
...

If the shoot OIDC service is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.

...
spec:
  extensions:
    - type: shoot-oidc-service
      disabled: true
...

8.2 - Openidconnects

Register OpenID Connect provider in Shoot Clusters

Introduction

Within a shoot cluster, it is possible to dynamically register OpenID Connect providers. It is necessary that the Gardener installation your shoot cluster runs in is equipped with a shoot-oidc-service extension. Please ask your Gardener operator if the extension is available in your environment.

Important
Kubernetes v1.29 introduced support for Structured Authentication. Gardener allows the use of this feature for shoot clusters with Kubernetes version >= 1.30.
Structured Authentication should be preferred over the Gardener OIDC Extension in case:
you do not need more than 64 authenticators (a limitation that is tracked in https://github.com/kubernetes/kubernetes/issues/122809)
you do not need to register an issuer twice (anyways not recommended since it can lead to misconfiguration and user impersonation if done poorly)
you need the ability to write custom expressions to map and validate claims
you need support for multiple audiences per authenticator
you need support for providers that don’t support OpenID connect discovery
See how to make use of Structured Authentication in Gardener.

Shoot Feature Gate

In most of the Gardener setups the shoot-oidc-service extension is not enabled globally and thus must be configured per shoot cluster. Please adapt the shoot specification by the configuration shown below to activate the extension individually.

kind: Shoot
...
spec:
  extensions:
    - type: shoot-oidc-service
...

OpenID Connect provider

In order to register an OpenID Connect provider an openidconnect resource should be deployed in the shoot cluster.

Caution
It is strongly recommended to NOT disable prefixing since it may result in unwanted impersonations. The rule of thumb is to always use meaningful and unique prefixes for both username and groups. A good way to ensure this is to use the name of the openidconnect resource as shown in the example below.

Caution
It is strongly recommended to have unique issuer URLs across all openidconnects and other places that might contain issuer URL configurations related to authentication (e.g. Structured Authentication even forbids that). Users handled by an issuer must be identified in a consistent, predictable and unique way!

apiVersion: authentication.gardener.cloud/v1alpha1
kind: OpenIDConnect
metadata:
  name: abc
spec:
  # issuerURL is the URL the provider signs ID Tokens as.
  # This will be the "iss" field of all tokens produced by the provider and is used for configuration discovery.
  issuerURL: https://abc-oidc-provider.example

  # clientID is the audience for which the JWT must be issued for, the "aud" field.
  clientID: my-shoot-cluster

  # usernameClaim is the JWT field to use as the user's username.
  usernameClaim: sub

  # usernamePrefix, if specified, causes claims mapping to username to be prefix with the provided value.
  # A value "oidc:" would result in usernames like "oidc:john".
  # If not provided, the prefix defaults to "( .metadata.name )/". The value "-" can be used to disable all prefixing.
  usernamePrefix: "abc:"

  # groupsClaim, if specified, causes the OIDCAuthenticator to try to populate the user's groups with an ID Token field.
  # If the groupsClaim field is present in an ID Token the value must be a string or list of strings.
  # groupsClaim: groups

  # groupsPrefix, if specified, causes claims mapping to group names to be prefixed with the value.
  # A value "oidc:" would result in groups like "oidc:engineering" and "oidc:marketing".
  # If not provided, the prefix defaults to "( .metadata.name )/".
  # The value "-" can be used to disable all prefixing.
  # groupsPrefix: "abc:"

  # caBundle is a PEM encoded CA bundle which will be used to validate the OpenID server's certificate. If unspecified, system's trusted certificates are used.
  # caBundle: <base64 encoded bundle>

  # supportedSigningAlgs sets the accepted set of JOSE signing algorithms that can be used by the provider to sign tokens.
  # The default value is RS256.
  # supportedSigningAlgs:
  # - RS256

  # requiredClaims, if specified, causes the OIDCAuthenticator to verify that all the
  # required claims key value pairs are present in the ID Token.
  # requiredClaims:
  #   customclaim: requiredvalue

  # maxTokenExpirationSeconds if specified, sets a limit in seconds to the maximum validity duration of a token.
  # Tokens issued with validity greater that this value will not be verified.
  # Setting this will require that the tokens have the "iat" and "exp" claims.
  # maxTokenExpirationSeconds: 3600

  # jwks if specified, provides an option to specify JWKS keys offline.
  # jwks:
  #   keys is a base64 encoded JSON webkey Set. If specified, the OIDCAuthenticator skips the request to the issuer's jwks_uri endpoint to retrieve the keys.
  #   keys: <base64 encoded jwks>

9 - Registry Cache

Gardener extension controller which deploys pull-through caches for container registries.

Gardener Extension for Registry Cache

Gardener extension controller which deploys pull-through caches for container registries.

Usage

Configuring the Registry Cache Extension - learn what is the use-case for a pull-through cache, how to enable it and configure it
How to provide credentials for upstream repository?
Registry Cache Observability - learn what metrics and alerts are exposed and how to view the registry cache logs
Configuring the Registry Mirror Extension - learn what is the use-case for a registry mirror, how to enable and configure it

Local Setup and Development

Deploying Registry Cache Extension Locally - learn how to set up a local development environment
Deploying Registry Cache Extension in Gardener’s Local Setup with Provider Extensions - learn how to set up a development environment using own Seed clusters on an existing Kubernetes cluster
Developer Docs for Gardener Extension Registry Cache - learn about the inner workings

9.1 - Configuring the Registry Cache Extension

Learn what is the use-case for a pull-through cache, how to enable it and configure it

Configuring the Registry Cache Extension

Introduction

Use Case

For a Shoot cluster, the containerd daemon of every Node goes to the internet and fetches an image that it doesn’t have locally in the Node’s image cache. New Nodes are often created due to events such as auto-scaling (scale up), rolling update, or replacement of unhealthy Node. Such a new Node would need to pull all of the images of the Pods running on it from the internet because the Node’s cache is initially empty. Pulling an image from a registry produces network traffic and registry costs. To avoid these network traffic and registry costs, you can use the registry-cache extension to run a registry as pull-through cache.

The following diagram shows a rough outline of how an image pull looks like for a Shoot cluster without registry cache: shoot-cluster-without-registry-cache

Solution

The registry-cache extension deploys and manages a registry in the Shoot cluster that runs as pull-through cache. The used registry implementation is distribution/distribution.

How does it work?

When the extension is enabled, a registry cache for each configured upstream is deployed to the Shoot cluster. Along with this, the containerd daemon on the Shoot cluster Nodes gets configured to use as a mirror the Service IP address of the deployed registry cache. For example, if a registry cache for upstream docker.io is requested via the Shoot spec, then containerd gets configured to first pull the image from the deployed cache in the Shoot cluster. If this image pull operation fails, containerd falls back to the upstream itself (docker.io in that case).

The first time an image is requested from the pull-through cache, it pulls the image from the configured upstream registry and stores it locally, before handing it back to the client. On subsequent requests, the pull-through cache is able to serve the image from its own storage.

Note
The used registry implementation (distribution/distribution) supports mirroring of only one upstream registry.

The following diagram shows a rough outline of how an image pull looks like for a Shoot cluster with registry cache: shoot-cluster-with-registry-cache

Shoot Configuration

The extension is not globally enabled and must be configured per Shoot cluster. The Shoot specification has to be adapted to include the registry-cache extension configuration.

Below is an example of registry-cache extension configuration as part of the Shoot spec:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: crazy-botany
  namespace: garden-dev
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: registry.extensions.gardener.cloud/v1alpha3
      kind: RegistryConfig
      caches:
      - upstream: docker.io
        volume:
          size: 100Gi
          # storageClassName: premium
      - upstream: ghcr.io
      - upstream: quay.io
        garbageCollection:
          ttl: 0s
        secretReferenceName: quay-credentials
      - upstream: my-registry.io:5000
        remoteURL: http://my-registry.io:5000
  # ...
  resources:
  - name: quay-credentials
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: quay-credentials-v1

The providerConfig field is required.

The providerConfig.caches field contains information about the registry caches to deploy. It is a required field. At least one cache has to be specified.

The providerConfig.caches[].upstream field is the remote registry host to cache. It is a required field. The value must be a valid DNS subdomain (RFC 1123) and optionally a port (i.e. <host>[:<port>]). It must not include a scheme.

The providerConfig.caches[].remoteURL optional field is the remote registry URL. If configured, it must include an https:// or http:// scheme. If the field is not configured, the remote registry URL defaults to https://<upstream>. In case the upstream is docker.io, it defaults to https://registry-1.docker.io.

The providerConfig.caches[].volume field contains settings for the registry cache volume. The registry-cache extension deploys a StatefulSet with a volume claim template. A PersistentVolumeClaim is created with the configured size and StorageClass name.

The providerConfig.caches[].volume.size field is the size of the registry cache volume. Defaults to 10Gi. The size must be a positive quantity (greater than 0). This field is immutable. See Increase the cache disk size on how to resize the disk. The extension defines alerts for the volume. More information about the registry cache alerts and how to enable notifications for them can be found in the alerts documentation.

The providerConfig.caches[].volume.storageClassName field is the name of the StorageClass used by the registry cache volume. This field is immutable. If the field is not specified, then the default StorageClass will be used.

The providerConfig.caches[].garbageCollection.ttl field is the time to live of a blob in the cache. If the field is set to 0s, the garbage collection is disabled. Defaults to 168h (7 days). See the Garbage Collection section for more details.

The providerConfig.caches[].secretReferenceName is the name of the reference for the Secret containing the upstream registry credentials. To cache images from a private registry, credentials to the upstream registry should be supplied. For more details, see How to provide credentials for upstream registry.

Note
It is only possible to provide one set of credentials for one private upstream registry.

The providerConfig.caches[].proxy.httpProxy field represents the proxy server for HTTP connections which is used by the registry cache. It must include an https:// or http:// scheme.

The providerConfig.caches[].proxy.httpsProxy field represents the proxy server for HTTPS connections which is used by the registry cache. It must include an https:// or http:// scheme.

The providerConfig.caches[].http.tls field indicates whether TLS is enabled for the HTTP server of the registry cache. Defaults to true.

The providerConfig.caches[].highAvailability.enabled defines if the registry cache is scaled with the high availability feature. See the High Availability section for more details.

Garbage Collection

When the registry cache receives a request for an image that is not present in its local store, it fetches the image from the upstream, returns it to the client and stores the image in the local store. The registry cache runs a scheduler that deletes images when their time to live (ttl) expires. When adding an image to the local store, the registry cache also adds a time to live for the image. The ttl defaults to 168h (7 days) and is configurable. The garbage collection can be disabled by setting the ttl to 0s. Requesting an image from the registry cache does not extend the time to live of the image. Hence, an image is always garbage collected from the registry cache store when its ttl expires. At the time of writing this document, there is no functionality for garbage collection based on disk size - e.g., garbage collecting images when a certain disk usage threshold is passed. The garbage collection cannot be enabled once it is disabled. This constraint is added to mitigate distribution/distribution#4249.

Increase the Cache Disk Size

When there is no available disk space, the registry cache continues to respond to requests. However, it cannot store the remotely fetched images locally because it has no free disk space. In such case, it is simply acting as a proxy without being able to cache the images in its local store. The disk has to be resized to ensure that the registry cache continues to cache images.

There are two alternatives to enlarge the cache’s disk size:

[Alternative 1] Resize the PVC

To enlarge the PVC’s size, perform the following steps:

Make sure that the KUBECONFIG environment variable is targeting the correct Shoot cluster.
Find the PVC name to resize for the desired upstream. The below example fetches the PVC for the docker.io upstream:
```
kubectl -n kube-system get pvc -l upstream-host=docker.io
```

Patch the PVC’s size to the desired size. The below example patches the size of a PVC to 10Gi:

kubectl -n kube-system patch pvc $PVC_NAME --type merge -p '{"spec":{"resources":{"requests": {"storage": "10Gi"}}}}'

Make sure that the PVC gets resized. Describe the PVC to check the resize operation result:
```
kubectl -n kube-system describe pvc -l upstream-host=docker.io
```

Drawback of this approach: The cache’s size in the Shoot spec (providerConfig.caches[].size) diverges from the PVC’s size.

[Alternative 2] Remove and Readd the Cache

There is always the option to remove the cache from the Shoot spec and to readd it again with the updated size.

Drawback of this approach: The already cached images get lost and the cache starts with an empty disk.

High Availability

By default the registry cache runs with a single replica. This fact may lead to concerns for the high availability such as “What happens when the registry cache is down? Does containerd fail to pull the image?”. As outlined in the How does it work? section, containerd is configured to fall back to the upstream registry if it fails to pull the image from the registry cache. Hence, when the registry cache is unavailable, the containerd’s image pull operations are not affected because containerd falls back to image pull from the upstream registry.

In special cases where this is not enough it is possible to set providerConfig.caches[].highAvailability.enabled to true. This will add the label high-availability-config.resources.gardener.cloud/type=server to the StatefulSet and it will be scaled to 2 replicas. Appropriate Pod Topology Spread Constraints will be added to the registry cache Pods according to the Shoot cluster configuration. See also High Availability of Deployed Components. Pay attention that each registry cache replica uses its own volume, so each registry cache pulls the image from the upstream and stores it in its volume.

Possible Pitfalls

The used registry implementation (the Distribution project) supports mirroring of only one upstream registry. The extension deploys a pull-through cache for each configured upstream.
us-docker.pkg.dev, europe-docker.pkg.dev, and asia-docker.pkg.dev are different upstreams. Hence, configuring pkg.dev as upstream won’t cache images from us-docker.pkg.dev, europe-docker.pkg.dev, or asia-docker.pkg.dev.

Limitations

Images that are pulled before a registry cache Pod is running or before a registry cache Service is reachable from the corresponding Node won’t be cached - containerd will pull these images directly from the upstream.
The reasoning behind this limitation is that a registry cache Pod is running in the Shoot cluster. To have a registry cache’s Service cluster IP reachable from containerd running on the Node, the registry cache Pod has to be running and kube-proxy has to configure iptables/IPVS rules for the registry cache Service. If kube-proxy hasn’t configured iptables/IPVS rules for the registry cache Service, then the image pull times (and new Node bootstrap times) will be increased significantly. For more detailed explanations, see point 2. and gardener/gardener-extension-registry-cache#68.
That’s why the registry configuration on a Node is applied only after the registry cache Service is reachable from the Node. The gardener-node-agent.service systemd unit sends requests to the registry cache’s Service. Once the registry cache responds with HTTP 200, the unit creates the needed registry configuration file (hosts.toml).
As a result, for images from Shoot system components:
- On Shoot creation with the registry cache extension enabled, a registry cache is unable to cache all of the images from the Shoot system components. Usually, until the registry cache Pod is running, containerd pulls from upstream the images from Shoot system components (before the registry configuration gets applied).
- On new Node creation for existing Shoot with the registry cache extension enabled, a registry cache is unable to cache most of the images from Shoot system components. The reachability of the registry cache Service requires the Service network to be set up, i.e., the kube-proxy for that new Node to be running and to have set up iptables/IPVS configuration for the registry cache Service.
containerd requests will time out in 30s in case kube-proxy hasn’t configured iptables/IPVS rules for the registry cache Service - the image pull times will increase significantly.
containerd is configured to fall back to the upstream itself if a request against the cache fails. However, if the cluster IP of the registry cache Service does not exist or if kube-proxy hasn’t configured iptables/IPVS rules for the registry cache Service, then containerd requests against the registry cache time out in 30 seconds. This significantly increases the image pull times because containerd does multiple requests as part of the image pull (HEAD request to resolve the manifest by tag, GET request for the manifest by SHA, GET requests for blobs)
Example: If the Service of a registry cache is deleted, then a new Service will be created. containerd’s registry config will still contain the old Service’s cluster IP. containerd requests against the old Service’s cluster IP will time out and containerd will fall back to upstream.
- Image pull of docker.io/library/alpine:3.13.2 from the upstream takes ~2s while image pull of the same image with invalid registry cache cluster IP takes ~2m.2s.
- Image pull of eu.gcr.io/gardener-project/gardener/ops-toolbelt:0.18.0 from the upstream takes ~10s while image pull of the same image with invalid registry cache cluster IP takes ~3m.10s.
Amazon Elastic Container Registry is currently not supported. For details see distribution/distribution#4383.

9.2 - Configuring the Registry Mirror Extension

Learn what is the use-case for a registry mirror, how to enable and configure it

Configuring the Registry Mirror Extension

Introduction

Use Case

containerd allows registry mirrors to be configured. Use cases are:

Usage of public mirror(s) - for example, circumvent issues with the upstream registry such as rate limiting, outages, and others.
Usage of private mirror(s) - for example, reduce network costs by using a private mirror running in the same network.

Solution

The registry-mirror extension allows the registry mirror configuration to be configured via the Shoot spec directly.

How does it work?

When the extension is enabled, the containerd daemon on the Shoot cluster Nodes gets configured to use the requested mirrors as a mirror. For example, if for the upstream docker.io the mirror https://mirror.gcr.io is configured in the Shoot spec, then containerd gets configured to first pull the image from the mirror (https://mirror.gcr.io in that case). If this image pull operation fails, containerd falls back to the upstream itself (docker.io in that case).

The extension is based on the contract described in containerd Registry Configuration. The corresponding upstream documentation in containerd is Registry Configuration - Introduction.

Shoot Configuration

The Shoot specification has to be adapted to include the registry-mirror extension configuration.

Below is an example of registry-mirror extension configuration as part of the Shoot spec:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: crazy-botany
  namespace: garden-dev
spec:
  extensions:
  - type: registry-mirror
    providerConfig:
      apiVersion: mirror.extensions.gardener.cloud/v1alpha1
      kind: MirrorConfig
      mirrors:
      - upstream: docker.io
        hosts:
        - host: "https://mirror.gcr.io"
          capabilities: ["pull"]

The providerConfig field is required.

The providerConfig.mirrors field contains information about the registry mirrors to configure. It is a required field. At least one mirror has to be specified.

The providerConfig.mirror[].upstream field is the remote registry host to mirror. It is a required field. The value must be a valid DNS subdomain (RFC 1123) and optionally a port (i.e. <host>[:<port>]). It must not include a scheme.

The providerConfig.mirror[].hosts field represents the mirror hosts to be used for the upstream. At least one mirror host has to be specified.

The providerConfig.mirror[].hosts[].host field is the mirror host. It is a required field. The value must include a scheme - http:// or https://.

The providerConfig.mirror[].hosts[].capabilities field represents the operations a host is capable of performing. This also represents the set of operations for which the mirror host may be trusted to perform. Defaults to ["pull"]. The supported values are pull and resolve. See the capabilities field documentation for more information on which operations are considered trusted ones against public/private mirrors.

9.3 - Deploying Registry Cache Extension in Gardener's Local Setup with Provider Extensions

Learn how to set up a development environment using own Seed clusters on an existing Kubernetes cluster

Deploying Registry Cache Extension in Gardener’s Local Setup with Provider Extensions

Prerequisites

Make sure that you have a running local Gardener setup with enabled provider extensions. The steps to complete this can be found in the Deploying Gardener Locally and Enabling Provider-Extensions guide.

Setting up the Registry Cache Extension

Make sure that your KUBECONFIG environment variable is targeting the local Gardener cluster.

The location of the Gardener project from the Gardener setup step is expected to be under the same root (e.g. ~/go/src/github.com/gardener/). If this is not the case, the location of Gardener project should be specified in GARDENER_REPO_ROOT environment variable:

export GARDENER_REPO_ROOT="<path_to_gardener_project>"

Then you can run:

make remote-extension-up

In case you have added additional Seeds you can specify the seed name:

make remote-extension-up SEED_NAME=<seed-name>

The corresponding make target will build the extension image, push it into the Seed cluster image registry, and deploy the registry-cache ControllerDeployment and ControllerRegistration resources into the kind cluster. The container image in the ControllerDeployment will be the image that was build and pushed into the Seed cluster image registry.

The make target will then deploy the registry-cache admission component. It will build the admission image, push it into the kind cluster image registry, and finally install the admission component charts to the kind cluster.

Creating a Shoot Cluster

Once the above step is completed, you can create a Shoot cluster. In order to create a Shoot cluster, please create your own Shoot definition depending on providers on your Seed cluster.

Tearing Down the Development Environment

To tear down the development environment, delete the Shoot cluster or disable the registry-cache extension in the Shoot’s specification. When the extension is not used by the Shoot anymore, you can run:

make remote-extension-down

The make target will delete the ControllerDeployment and ControllerRegistration of the extension, and the registry-cache admission helm deployment.

9.4 - Deploying Registry Cache Extension Locally

Learn how to set up a local development environment

Deploying Registry Cache Extension Locally

Prerequisites

Make sure that you have a running local Gardener setup. The steps to complete this can be found in the Deploying Gardener Locally guide.

Setting up the Registry Cache Extension

Make sure that your KUBECONFIG environment variable is targeting the local Gardener cluster. When this is ensured, run:

make extension-up

The corresponding make target will build the extension image, load it into the kind cluster Nodes, and deploy the registry-cache ControllerDeployment and ControllerRegistration resources. The container image in the ControllerDeployment will be the image that was build and loaded into the kind cluster Nodes.

The make target will then deploy the registry-cache admission component. It will build the admission image, load it into the kind cluster Nodes, and finally install the admission component charts to the kind cluster.

Creating a Shoot Cluster

Once the above step is completed, you can create a Shoot cluster.

example/shoot-registry-cache.yaml contains a Shoot specification with the registry-cache extension:

kubectl create -f example/shoot-registry-cache.yaml

example/shoot-registry-mirror.yaml contains a Shoot specification with the registry-mirror extension:

kubectl create -f example/shoot-registry-mirror.yaml

Tearing Down the Development Environment

make extension-down

The make target will delete the ControllerDeployment and ControllerRegistration of the extension, and the registry-cache admission helm deployment.

Alternative Setup Using the `gardener-operator` Local Setup

Alternatively, you can deploy the registry-cache extension in the gardener-operator local setup. To do this, make sure you are have a running local setup based on Alternative Way to Set Up Garden and Seed Leveraging gardener-operator. The KUBECONFIG environment variable should target the operator local KinD cluster (i.e. <path_to_gardener_project>/example/gardener-local/kind/operator/kubeconfig).

Creating the registry-cache `Extension.operator.gardener.cloud` resource:

make extension-operator-up

The corresponding make target will build the registry-cache admission and extension container images, OCI artifacts for the admission runtime and application charts, and the extension chart. Then, the container images and the OCI artifacts are pushed into the default skaffold registry (i.e. garden.local.gardener.cloud:5001). Next, the registry-cache Extension.operator.gardener.cloud resource is deployed into the KinD cluster. Based on this resource the gardener-operator will deploy the registry-cache admission component, as well as the registry-cache ControllerDeployment and ControllerRegistration resources.

Creating a Shoot Cluster

To create a Shoot cluster the KUBECONFIG environment variable should target virtual garden cluster (i.e. <path_to_gardener_project>/example/operator/virtual-garden/kubeconfig) and then execute:

kubectl create -f example/shoot-registry-cache.yaml

Delete the registry-cache `Extension.operator.gardener.cloud` resource

Make sure the environment variable KUBECONFIG points to the operator’s local KinD cluster and then run:

make extension-operator-down

The corresponding make target will delete the Extension.operator.gardener.cloud resource. Consequently, the gardener-operator will delete the registry-cache admission component and registry-cache ControllerDeployment and ControllerRegistration resources.

9.5 - Developer Docs for Gardener Extension Registry Cache

Learn about the inner workings

Developer Docs for Gardener Extension Registry Cache

This document outlines how Shoot reconciliation and deletion works for a Shoot with the registry-cache extension enabled.

Shoot Reconciliation

This section outlines how the reconciliation works for a Shoot with the registry-cache extension enabled.

Extension Enablement / Reconciliation

This section outlines how the extension enablement/reconciliation works, e.g., the extension has been added to the Shoot spec.

As part of the Shoot reconciliation flow, the gardenlet deploys the Extension resource.
The registry-cache extension reconciles the Extension resource. pkg/controller/cache/actuator.go contains the implementation of the extension.Actuator interface. The reconciliation of an Extension of type registry-cache consists of the following steps:
1. The registry-cache extension deploys resources to the Shoot cluster via ManagedResource. For every configured upstream, it creates a StatefulSet (with PVC), Service, and other resources.
2. It lists all Services from the kube-system namespace that have the upstream-host label. It will return an error (and retry in exponential backoff) until the Services count matches the configured registries count.
3. When there is a Service created for each configured upstream registry, the registry-cache extension populates the Extension resource status. In the Extension status, for each upstream, it maintains an endpoint (in the format http://<cluster-ip>:5000) which can be used to access the registry cache from within the Shoot cluster. <cluster-ip> is the cluster IP of the registry cache Service. The cluster IP of a Service is assigned by the Kubernetes API server on Service creation.
As part of the Shoot reconciliation flow, the gardenlet deploys the OperatingSystemConfig resource.
The registry-cache extension serves a webhook that mutates the OperatingSystemConfig resource for Shoots having the registry-cache extension enabled (the corresponding namespace gets labeled by the gardenlet with extensions.gardener.cloud/registry-cache=true). pkg/webhook/cache/ensurer.go contains an implementation of the genericmutator.Ensurer interface.
1. The webhook appends or updates RegistryConfig entries in the OperatingSystemConfig CRI configuration that corresponds to configured registry caches in the Shoot. The RegistryConfig readiness probe is enabled so that gardener-node-agent creates a hosts.toml containerd registry configuration file when all RegistryConfig hosts are reachable.

Extension Disablement

This section outlines how the extension disablement works, i.e., the extension has to be removed from the Shoot spec.

As part of the Shoot reconciliation flow, the gardenlet destroys the Extension resource because it is no longer needed.
1. The extension deletes the ManagedResource containing the registry cache resources.
2. The OperatingSystemConfig resource will not be mutated and no RegistryConfig entries will be added or updated. The gardener-node-agent detects that RegistryConfig entries have been removed or changed and deletes or updates corresponding hosts.toml configuration files under /etc/containerd/certs.d folder.

Shoot Deletion

This section outlines how the deletion works for a Shoot with the registry-cache extension enabled.

As part of the Shoot deletion flow, the gardenlet destroys the Extension resource.
1. The extension deletes the ManagedResource containing the registry cache resources.

9.6 - How to provide credentials for upstream registry?

How to provide credentials for upstream registry?

In Kubernetes, to pull images from private container image registries you either have to specify an image pull Secret (see Pull an Image from a Private Registry) or you have to configure the kubelet to dynamically retrieve credentials using a credential provider plugin (see Configure a kubelet image credential provider). When pulling an image, the kubelet is providing the credentials to the CRI implementation. The CRI implementation uses the provided credentials against the upstream registry to pull the image.

The registry-cache extension is using the Distribution project as pull through cache implementation. The Distribution project does not use the provided credentials from the CRI implementation while fetching an image from the upstream. Hence, the above-described scenarios such as configuring image pull Secret for a Pod or configuring kubelet credential provider plugins don’t work out of the box with the pull through cache provided by the registry-cache extension. Instead, the Distribution project supports configuring only one set of credentials for a given pull through cache instance (for a given upstream).

This document describe how to supply credentials for the private upstream registry in order to pull private image with the registry cache.

How to configure the registry cache to use upstream registry credentials?

Create an immutable Secret with the upstream registry credentials in the Garden cluster:

kubectl create -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ro-docker-secret-v1
  namespace: garden-dev
type: Opaque
immutable: true
data:
  username: $(echo -n $USERNAME | base64 -w0)
  password: $(echo -n $PASSWORD | base64 -w0)
EOF

For Artifact Registry, the username is _json_key and the password is the service account key in JSON format. To base64 encode the service account key, copy it and run:

echo -n $SERVICE_ACCOUNT_KEY_JSON | base64 -w0

Add the newly created Secret as a reference to the Shoot spec, and then to the registry-cache extension configuration.

In the registry-cache configuration, set the secretReferenceName field. It should point to a resource reference under spec.resources. The resource reference itself points to the Secret in project namespace.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
# ...
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: registry.extensions.gardener.cloud/v1alpha3
      kind: RegistryConfig
      caches:
      - upstream: docker.io
        secretReferenceName: docker-secret
  # ...
  resources:
  - name: docker-secret
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: ro-docker-secret-v1
# ...

Warning
Do not delete the referenced Secret when there is a Shoot still using it.

How to rotate the registry credentials?

To rotate registry credentials perform the following steps:

Generate a new pair of credentials in the cloud provider account. Do not invalidate the old ones.
Create a new Secret (e.g., ro-docker-secret-v2) with the newly generated credentials as described in step 1. in How to configure the registry cache to use upstream registry credentials?.
Update the Shoot spec with newly created Secret as described in step 2. in How to configure the registry cache to use upstream registry credentials?.
The above step will trigger a Shoot reconciliation. Wait for it to complete.
Make sure that the old Secret is no longer referenced by any Shoot cluster. Finally, delete the Secret containing the old credentials (e.g., ro-docker-secret-v1).
Delete the corresponding old credentials from the cloud provider account.

Possible Pitfalls

The registry cache is not protected by any authentication/authorization mechanism. The cached images (incl. private images) can be fetched from the registry cache without authentication/authorization. Note that the registry cache itself is not exposed publicly.
The registry cache provides the credentials for every request against the corresponding upstream. In some cases, misconfigured credentials can prevent the registry cache to pull even public images from the upstream (for example: invalid service account key for Artifact Registry). However, this behaviour is controlled by the server-side logic of the upstream registry.
Do not remove the image pull Secrets when configuring credentials for the registry cache. When the registry-cache is not available, containerd falls back to the upstream registry. containerd still needs the image pull Secret to pull the image and in this way to have the fallback mechanism working.

9.7 - Observability

Registry Cache Observability

The registry-cache extension exposes metrics for the registry caches running in the Shoot cluster so that they can be easily viewed by cluster owners and operators in the Shoot’s Prometheus and Plutono instances. The exposed monitoring data provides an overview of the performance of the pull-through caches, including hit rate and network traffic data.

Metrics

A registry cache serves several metrics. The metrics are scraped by the Shoot’s Prometheus instance.

The Registry Caches dashboard in the Shoot’s Plutono instance contains several panels which are built using the registry cache metrics. From the Registry dropdown menu you can select the upstream for which you wish the metrics to be displayed (by default, metrics are summed for all upstream registries).

Following is a list of all exposed registry cache metrics. The upstream_host label can be used to determine the upstream host to which the metrics are related, while the type label can be used to determine weather the metric is for an image blob or an image manifest:

registry_proxy_requests_total

The number of total incoming request received.

Type: Counter
Labels: upstream_host type

registry_proxy_hits_total

The number of total cache hits; i.e. the requested content exists in the registry cache’s image store and it is served from there (upstream is not contacted at all for serving the requested content).

Type: Counter
Labels: upstream_host type

registry_proxy_misses_total

The number of total cache misses; i.e. the requested content does not exist in the registry cache’s image store and it is fetched from the upstream.

Type: Counter
Labels: upstream_host type

registry_proxy_pulled_bytes_total

The size of total bytes that the registry cache has pulled from the upstream.

Type: Counter
Labels: upstream_host type

registry_proxy_pushed_bytes_total

The size of total bytes pushed to the registry cache’s clients.

Type: Counter
Labels: upstream_host type

Alerts

There are two alerts defined for the registry cache PersistentVolume in the Shoot’s Prometheus instance:

RegistryCachePersistentVolumeUsageCritical

This indicates that the registry cache PersistentVolume is almost full and less than 5% is free. When there is no available disk space, no new images will be cached. However, image pull operations are not affected. An alert is fired when the following expression evaluates to true:

100 * (
 kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"^cache-volume-registry-.+$"}
   /
 kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"^cache-volume-registry-.+$"}
) < 5

RegistryCachePersistentVolumeFullInFourDays

This indicates that the registry cache PersistentVolume is expected to fill up within four days based on recent sampling. An alert is fired when the following expression evaluates to true:

100 * (
 kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"^cache-volume-registry-.+$"}
   /
 kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"^cache-volume-registry-.+$"}
) < 15
and
predict_linear(kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"^cache-volume-registry-.+$"}[30m], 4 * 24 * 3600) <= 0

Users can subscribe to these alerts by following the Gardener alerting guide.

Logging

To view the registry cache logs in Plutono, navigate to the Explore tab and select vali from the Explore dropdown menu. Afterwards enter the following vali query:

{container_name="registry-cache"} to view the logs for all registries.
{pod_name=~"registry-<upstream_host>.+"} to view the logs for specific upstream registry.