Others
1 - Certificate services
Gardener Extension for certificate services
Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.
Recently, most of the vendor specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.
Configuration
Example configuration for this extension controller:
apiVersion: shoot-cert-service.extensions.config.gardener.cloud/v1alpha1
kind: Configuration
issuerName: gardener
restrictIssuer: true # restrict issuer to any sub-domain of shoot.spec.dns.domain (default)
acme:
  email: john.doe@example.com
  server: https://acme-v02.api.letsencrypt.org/directory
# privateKey: | # Optional key for Let's Encrypt account.
#   -----BEGIN RSA PRIVATE KEY-----
#   ...
#   -----END RSA PRIVATE KEY-----
Extension-Resources
Example extension resource:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: "extension-certificate-service"
  namespace: shoot--project--abc
spec:
  type: shoot-cert-service
When an extension resource is reconciled, the extension controller creates an instance of Cert-Management as well as an Issuer with the ACME information provided in the configuration above. These resources are placed inside the shoot namespace on the seed. The controller also takes care of generating the necessary RBAC resources for the seed as well as for the shoot.
Please note that this extension controller relies on the Gardener-Resource-Manager to deploy Kubernetes resources to seed and shoot clusters, i.e. it never deploys them directly.
How to start using or developing this extension controller locally
You can run the controller locally on your machine by executing make start. Please make sure to have the kubeconfig to the cluster you want to connect to ready in the ./dev/kubeconfig file.
Static code checks and tests can be executed by running make verify. We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.
Feedback and Support
Feedback and contributions are always welcome. Please report bugs or suggestions as GitHub issues or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).
Learn more!
Please find further resources about our project here:
1.1 - Changing alerting settings
Changing alerting settings
Certificates are normally renewed automatically 30 days before they expire. As a second line of defense, a Prometheus alert fires if a certificate is only a few days away from expiration. By default, the alert is triggered 15 days before expiration.
You can configure the number of days in the providerConfig of the extension.
Setting it to 0 disables the alerting.
In this example, the alert is triggered 3 days before expiration.
kind: Shoot
...
spec:
  extensions:
  - type: shoot-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      alerting:
        certExpirationAlertDays: 3
1.2 - Manage certificates with Gardener for default domain
Manage certificates with Gardener for default domain
Introduction
Applications on Kubernetes that offer secure service endpoints (e.g. HTTPS) require you to enable secured communication via SSL/TLS. With the certificate extension enabled, Gardener can manage commonly trusted X.509 certificates for your application endpoints. Besides initially requesting certificates, it also handles their timely renewal using the free Let’s Encrypt API.
There are two scenarios in which you can use the certificate extension:
- You want to use a certificate for a subdomain of the shoot’s default DNS domain (see .spec.dns.domain of your shoot resource, e.g. short.ingress.shoot.project.default-domain.gardener.cloud). If this is your case, please keep reading this article.
- You want to use a certificate for a custom domain. If this is your case, please see Manage certificates with Gardener for public domain.
Prerequisites
Before you start this guide there are a few requirements you need to fulfill:
- You have an existing shoot cluster
Since you are using the default DNS name, all DNS configuration should already be done and ready.
Issue a certificate
Every X.509 certificate is represented by a Kubernetes custom resource certificate.cert.gardener.cloud in your cluster. A Certificate resource may be used to initiate a new certificate request as well as to manage its lifecycle. Gardener’s certificate service regularly checks the expiration timestamp of Certificates, triggers a renewal process if necessary and replaces the existing X.509 certificate with a new one.
Your application should be able to reload replaced certificates in a timely manner to avoid service disruptions.
Certificates can be requested via three resource types:
- Ingress
- Service (type LoadBalancer)
- Certificate (Gardener CRD)
If either of the first two is used, a corresponding Certificate resource will be created automatically.
Using an Ingress resource
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    #cert.gardener.cloud/issuer: custom-issuer # optional to specify custom issuer (use namespace/name for shoot issuers)
    #cert.gardener.cloud/follow-cname: "true" # optional, same as spec.followCNAME in certificates
    #cert.gardener.cloud/secret-labels: "key1=value1,key2=value2" # optional labels for the certificate secret
    #cert.gardener.cloud/preferred-chain: "chain name" # optional to specify preferred-chain (value is the Subject Common Name of the root issuer)
    #cert.gardener.cloud/private-key-algorithm: ECDSA # optional to specify algorithm for private key, allowed values are 'RSA' or 'ECDSA'
    #cert.gardener.cloud/private-key-size: "384" # optional to specify size of private key, allowed values for RSA are "2048", "3072", "4096" and for ECDSA "256" and "384"
spec:
  tls:
  - hosts:
    # Must not exceed 64 characters.
    - short.ingress.shoot.project.default-domain.gardener.cloud
    # Certificate and private key reside in this secret.
    secretName: tls-secret
  rules:
  - host: short.ingress.shoot.project.default-domain.gardener.cloud
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: amazing-svc
            port:
              number: 8080
Using a Service of type LoadBalancer
apiVersion: v1
kind: Service
metadata:
  annotations:
    cert.gardener.cloud/purpose: managed
    # Certificate and private key reside in this secret.
    cert.gardener.cloud/secretname: tls-secret
    # You may add more domains separated by commas (e.g. "service.shoot.project.default-domain.gardener.cloud, amazing.shoot.project.default-domain.gardener.cloud")
    dns.gardener.cloud/dnsnames: "service.shoot.project.default-domain.gardener.cloud"
    dns.gardener.cloud/ttl: "600"
    #cert.gardener.cloud/issuer: custom-issuer # optional to specify custom issuer (use namespace/name for shoot issuers)
    #cert.gardener.cloud/follow-cname: "true" # optional, same as spec.followCNAME in certificates
    #cert.gardener.cloud/secret-labels: "key1=value1,key2=value2" # optional labels for the certificate secret
    #cert.gardener.cloud/preferred-chain: "chain name" # optional to specify preferred-chain (value is the Subject Common Name of the root issuer)
    #cert.gardener.cloud/private-key-algorithm: ECDSA # optional to specify algorithm for private key, allowed values are 'RSA' or 'ECDSA'
    #cert.gardener.cloud/private-key-size: "384" # optional to specify size of private key, allowed values for RSA are "2048", "3072", "4096" and for ECDSA "256" and "384"
  name: test-service
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  type: LoadBalancer
Using the custom Certificate resource
apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
  name: cert-example
  namespace: default
spec:
  commonName: short.ingress.shoot.project.default-domain.gardener.cloud
  secretRef:
    name: tls-secret
    namespace: default
  # Optional if using the default issuer
  issuerRef:
    name: garden
If you’re interested in the current progress of your request, consult the description of the Certificate resource, more specifically its status attribute in case the issuance failed.
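As a rough sketch of how the status could be checked from the command line, the helper below wraps kubectl and prints the state of a Certificate. The function name cert_status is made up for illustration, and the status.state field is an assumption about the layout of the Certificate status; kubectl describe shows the full status and events if the field differs in your installation.

```shell
# Print the state of a Certificate resource (e.g. while a request is in progress).
# Usage: cert_status <name> [namespace]   (namespace defaults to "default")
# Assumption: the Certificate status exposes a 'state' field.
cert_status() {
  kubectl get certificate.cert.gardener.cloud "$1" \
    --namespace "${2:-default}" \
    --output 'jsonpath={.status.state}'
}
```

For example, cert_status cert-example default would query the Certificate from the example above.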
Request a wildcard certificate
To avoid creating a separate certificate for every single endpoint, you may want to create a wildcard certificate for your shoot’s default cluster.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    cert.gardener.cloud/commonname: "*.ingress.shoot.project.default-domain.gardener.cloud"
spec:
  tls:
  - hosts:
    - amazing.ingress.shoot.project.default-domain.gardener.cloud
    secretName: tls-secret
  rules:
  - host: amazing.ingress.shoot.project.default-domain.gardener.cloud
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: amazing-svc
            port:
              number: 8080
Please note that this can also be achieved by directly adding an annotation to a Service of type LoadBalancer. You could also create a Certificate object with a wildcard domain.
More information
For more information and more examples about using the certificate extension, please see Manage certificates with Gardener for public domain
1.3 - Manage certificates with Gardener for public domain
Manage certificates with Gardener for public domain
Introduction
Applications on Kubernetes that offer secure service endpoints (e.g. HTTPS) require you to enable secured communication via SSL/TLS. With the certificate extension enabled, Gardener can manage commonly trusted X.509 certificates for your application endpoints. Besides initially requesting certificates, it also handles their timely renewal using the free Let’s Encrypt API.
There are two scenarios in which you can use the certificate extension:
- You want to use a certificate for a subdomain of the shoot’s default DNS domain (see .spec.dns.domain of your shoot resource, e.g. short.ingress.shoot.project.default-domain.gardener.cloud). If this is your case, please see Manage certificates with Gardener for default domain.
- You want to use a certificate for a custom domain. If this is your case, please keep reading this article.
Prerequisites
Before you start this guide there are a few requirements you need to fulfill:
- You have an existing shoot cluster
- Your custom domain is under a public top level domain (e.g. .com)
- Your custom zone is resolvable with a public resolver via the internet (e.g. 8.8.8.8)
- You have a custom DNS provider configured and working (see “DNS Providers”)
As part of the Let’s Encrypt ACME challenge validation process, Gardener sets a DNS TXT entry and Let’s Encrypt checks if it can both resolve and authenticate it. Therefore, it’s important that your DNS entries are publicly resolvable. You can check this by querying e.g. Google’s public DNS server; if it returns an entry, your DNS is publicly visible:
# returns the A record for cert-example.example.com using Google's DNS server (8.8.8.8)
dig cert-example.example.com @8.8.8.8 A
DNS provider
In order to issue certificates for a custom domain you need to specify a DNS provider which is permitted to create DNS records for subdomains of the domain requested in the certificate. For example, if you request a certificate for host.example.com, your DNS provider must be capable of managing subdomains of host.example.com.
DNS providers are normally specified in the shoot manifest. To learn more on how to configure one, please see the DNS provider documentation.
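To make the subdomain requirement concrete, here is a small sketch that checks whether a name falls inside a zone a DNS provider manages. It is a plain suffix comparison for illustration only, not how Gardener evaluates provider domains; the function name is_in_zone and the zone example.com are hypothetical.

```shell
# Return success (0) if the host equals the zone or is a subdomain of it.
# Usage: is_in_zone <host> <zone>
is_in_zone() {
  case "$1" in
    "$2"|*."$2") return 0 ;;  # exact match, or ends with ".<zone>"
    *)           return 1 ;;
  esac
}

# Example: the challenge record for host.example.com must be manageable.
if is_in_zone "_acme-challenge.host.example.com" "example.com"; then
  echo "record lies within the provider's zone"
fi
```

Requiring the dot before the zone keeps a name such as host.badexample.com from being treated as part of example.com.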
Issue a certificate
Every X.509 certificate is represented by a Kubernetes custom resource certificate.cert.gardener.cloud in your cluster. A Certificate resource may be used to initiate a new certificate request as well as to manage its lifecycle. Gardener’s certificate service regularly checks the expiration timestamp of Certificates, triggers a renewal process if necessary and replaces the existing X.509 certificate with a new one.
Your application should be able to reload replaced certificates in a timely manner to avoid service disruptions.
Certificates can be requested via four resource types:
- Ingress
- Service (type LoadBalancer)
- Gateways (both Istio gateways and from the Gateway API)
- Certificate (Gardener CRD)
If either of the first two is used, a corresponding Certificate resource will be created automatically.
Using an Ingress Resource
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    # Optional but recommended, this is going to create the DNS entry at the same time
    dns.gardener.cloud/class: garden
    dns.gardener.cloud/ttl: "600"
    #cert.gardener.cloud/commonname: "*.example.com" # optional, if not specified the first name from spec.tls[].hosts is used as common name
    #cert.gardener.cloud/dnsnames: "" # optional, if not specified the names from spec.tls[].hosts are used
    #cert.gardener.cloud/follow-cname: "true" # optional, same as spec.followCNAME in certificates
    #cert.gardener.cloud/secret-labels: "key1=value1,key2=value2" # optional labels for the certificate secret
    #cert.gardener.cloud/issuer: custom-issuer # optional to specify custom issuer (use namespace/name for shoot issuers)
    #cert.gardener.cloud/preferred-chain: "chain name" # optional to specify preferred-chain (value is the Subject Common Name of the root issuer)
    #cert.gardener.cloud/private-key-algorithm: ECDSA # optional to specify algorithm for private key, allowed values are 'RSA' or 'ECDSA'
    #cert.gardener.cloud/private-key-size: "384" # optional to specify size of private key, allowed values for RSA are "2048", "3072", "4096" and for ECDSA "256" and "384"
spec:
  tls:
  - hosts:
    # Must not exceed 64 characters.
    - amazing.example.com
    # Certificate and private key reside in this secret.
    secretName: tls-secret
  rules:
  - host: amazing.example.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: amazing-svc
            port:
              number: 8080
Replace the hosts and rules[].host values with your own domain and adjust the remaining Ingress attributes in accordance with your deployment (e.g. the above is for an istio Ingress controller and forwards traffic to the service amazing-svc on port 8080).
Using a Service of type LoadBalancer
apiVersion: v1
kind: Service
metadata:
  annotations:
    cert.gardener.cloud/secretname: tls-secret
    dns.gardener.cloud/dnsnames: example.example.com
    dns.gardener.cloud/class: garden
    # Optional
    dns.gardener.cloud/ttl: "600"
    cert.gardener.cloud/commonname: "*.example.example.com"
    cert.gardener.cloud/dnsnames: ""
    #cert.gardener.cloud/follow-cname: "true" # optional, same as spec.followCNAME in certificates
    #cert.gardener.cloud/secret-labels: "key1=value1,key2=value2" # optional labels for the certificate secret
    #cert.gardener.cloud/issuer: custom-issuer # optional to specify custom issuer (use namespace/name for shoot issuers)
    #cert.gardener.cloud/preferred-chain: "chain name" # optional to specify preferred-chain (value is the Subject Common Name of the root issuer)
    #cert.gardener.cloud/private-key-algorithm: ECDSA # optional to specify algorithm for private key, allowed values are 'RSA' or 'ECDSA'
    #cert.gardener.cloud/private-key-size: "384" # optional to specify size of private key, allowed values for RSA are "2048", "3072", "4096" and for ECDSA "256" and "384"
  name: test-service
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  type: LoadBalancer
Using a Gateway resource
Please see Istio Gateways or Gateway API for details.
Using the custom Certificate resource
apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
  name: cert-example
  namespace: default
spec:
  commonName: amazing.example.com
  secretRef:
    name: tls-secret
    namespace: default
  # Optional if using the default issuer
  issuerRef:
    name: garden
  # If delegated domain for DNS01 challenge should be used. This has only an effect if a CNAME record is set for
  # '_acme-challenge.amazing.example.com'.
  # For example: If a CNAME record exists '_acme-challenge.amazing.example.com' => '_acme-challenge.writable.domain.com',
  # the DNS challenge will be written to '_acme-challenge.writable.domain.com'.
  #followCNAME: true
  # optionally set labels for the secret
  #secretLabels:
  #  key1: value1
  #  key2: value2
  # Optionally specify the preferred certificate chain: if the CA offers multiple certificate chains, prefer the chain with an issuer matching this Subject Common Name. If no match, the default offered chain will be used.
  #preferredChain: "ISRG Root X1"
  # Optionally specify algorithm and key size for private key. Allowed algorithms: "RSA" (allowed sizes: 2048, 3072, 4096) and "ECDSA" (allowed sizes: 256, 384)
  # If not specified, RSA with 2048 is used.
  #privateKey:
  #  algorithm: ECDSA
  #  size: 384
Supported attributes
Here is a list of all supported annotations regarding the certificate extension:
Path | Annotation | Value | Required | Description |
---|---|---|---|---|
N/A | cert.gardener.cloud/purpose: | managed | Yes when using annotations | Flag for Gardener that this specific Ingress or Service requires a certificate |
spec.commonName | cert.gardener.cloud/commonname: | E.g. “*.demo.example.com” or “special.example.com” | Certificate and Ingress : No Service: Yes, if DNS names unset | Specifies for which domain the certificate request will be created. If not specified, the names from spec.tls[].hosts are used. This entry must comply with the 64 character limit. |
spec.dnsNames | cert.gardener.cloud/dnsnames: | E.g. “special.example.com” | Certificate and Ingress : No Service: Yes, if common name unset | Additional domains the certificate should be valid for (Subject Alternative Name). If not specified, the names from spec.tls[].hosts are used. Entries in this list can be longer than 64 characters. |
spec.secretRef.name | cert.gardener.cloud/secretname: | any-name | Yes for certificate and Service | Specifies the secret which contains the certificate/key pair. If the secret is not available yet, it’ll be created automatically as soon as the certificate has been issued. |
spec.issuerRef.name | cert.gardener.cloud/issuer: | E.g. gardener | No | Specifies the issuer you want to use. Only necessary if you request certificates for custom domains. |
N/A | cert.gardener.cloud/revoked: | true otherwise always false | No | Use only to revoke a certificate, see reference for more details |
spec.followCNAME | cert.gardener.cloud/follow-cname | E.g. true | No | Specifies that the usage of a delegated domain for DNS challenges is allowed. Details see Follow CNAME. |
spec.preferredChain | cert.gardener.cloud/preferred-chain | E.g. ISRG Root X1 | No | Specifies the Common Name of the issuer for selecting the certificate chain. Details see Preferred Chain. |
spec.secretLabels | cert.gardener.cloud/secret-labels | for annotation use e.g. key1=value1,key2=value2 | No | Specifies labels for the certificate secret. |
spec.privateKey.algorithm | cert.gardener.cloud/private-key-algorithm | RSA , ECDSA | No | Specifies algorithm for private key generation. The default value is depending on configuration of the extension (default of the default is RSA ). You may request a new certificate without privateKey settings to find out the concrete defaults in your Gardener. |
spec.privateKey.size | cert.gardener.cloud/private-key-size | "256" , "384" , "2048" , "3072" , "4096" | No | Specifies size for private key generation. Allowed values for RSA are 2048 , 3072 , and 4096 . For ECDSA allowed values are 256 and 384 . The default values are depending on the configuration of the extension (defaults of the default values are 3072 for RSA and 384 for ECDSA respectively). |
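The allowed algorithm and size combinations from the table can be checked before a request is submitted. The following is a minimal sketch of such a check; the function name valid_private_key is hypothetical and the check only mirrors the values listed above, it is not part of the extension.

```shell
# Return success (0) if the key size is allowed for the given algorithm.
# Usage: valid_private_key <algorithm> <size>
valid_private_key() {
  case "$1:$2" in
    RSA:2048|RSA:3072|RSA:4096) return 0 ;;  # allowed RSA sizes
    ECDSA:256|ECDSA:384)        return 0 ;;  # allowed ECDSA sizes
    *)                          return 1 ;;  # anything else is rejected
  esac
}

# Example: validate the values intended for the private-key annotations.
if valid_private_key ECDSA 384; then
  echo "ECDSA with size 384 is an allowed combination"
fi
```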
Request a wildcard certificate
To avoid creating a separate certificate for every single endpoint, you may want to create a wildcard certificate for your shoot’s default cluster.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    cert.gardener.cloud/commonname: "*.example.com"
spec:
  tls:
  - hosts:
    - amazing.example.com
    secretName: tls-secret
  rules:
  - host: amazing.example.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: amazing-svc
            port:
              number: 8080
Please note that this can also be achieved by directly adding an annotation to a Service of type LoadBalancer. You could also create a Certificate object with a wildcard domain.
Using a custom Issuer
Most Gardener deployments with the certificate extension enabled have a preconfigured garden issuer. It is usually configured to use Let’s Encrypt as the certificate provider.
If you need a custom issuer for a specific cluster, please see Using a custom Issuer.
Quotas
For security reasons there may be a default quota on the certificate requests per day set globally in the controller registration of the shoot-cert-service.
The default quota only applies if there is no explicit quota defined for the issuer itself with the field requestsPerDayQuota, e.g.:
kind: Shoot
...
spec:
  extensions:
  - type: shoot-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      issuers:
        - email: your-email@example.com
          name: custom-issuer # issuer name must be specified in every custom issuer request, must not be "garden"
          server: 'https://acme-v02.api.letsencrypt.org/directory'
          requestsPerDayQuota: 10
DNS Propagation
As stated before, the certificate service uses the ACME challenge protocol to verify that you are the DNS owner of the domain for which you are requesting a certificate.
This works by creating a DNS TXT record in your DNS provider under _acme-challenge.example.example.com containing a token to compare with. The TXT record is only applied during the domain validation.
Typically, the record is propagated within a few minutes. But if the record is not visible to the ACME server for any reason, the certificate request is retried after several minutes.
This means you may have to wait up to one hour after the propagation problem has been resolved before the certificate request is retried. Take a look at the events with kubectl describe ingress example for troubleshooting.
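When troubleshooting propagation, it can help to know exactly which record name to query. The sketch below derives the challenge record name for a certificate domain; the function name acme_challenge_name is hypothetical. It relies on the ACME DNS-01 convention that the challenge for a wildcard name is placed at the base domain.

```shell
# Derive the DNS-01 challenge record name for a certificate domain.
# A leading "*." is dropped, because the challenge for "*.example.com"
# is validated at "_acme-challenge.example.com".
acme_challenge_name() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}

# Example (not executed here): query the TXT record via a public resolver.
# dig "$(acme_challenge_name example.example.com)" @8.8.8.8 TXT
```

Because the TXT record is only present during domain validation, an empty answer outside of a running certificate request is expected.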
Character Restrictions
Because the common name is restricted to 64 characters, you may have to leave the common name unset for longer domain names.
For example, the following request is invalid:
apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
  name: cert-invalid
  namespace: default
spec:
  commonName: morethan64characters.ingress.shoot.project.default-domain.gardener.cloud
But it is valid to request a certificate for this domain if you have left the common name unset:
apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
  name: cert-example
  namespace: default
spec:
  dnsNames:
  - morethan64characters.ingress.shoot.project.default-domain.gardener.cloud
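The length rule can be checked up front. The following sketch decides whether a domain name still fits into the common name or has to be listed under dnsNames instead; the function names are hypothetical and the limit of 64 characters is taken from the text above.

```shell
# Return success (0) if the name fits the 64-character common name limit.
fits_common_name() {
  [ "${#1}" -le 64 ]
}

# Print the Certificate field a domain name should be placed in.
certificate_field_for() {
  if fits_common_name "$1"; then
    echo "commonName"
  else
    echo "dnsNames"
  fi
}

certificate_field_for "morethan64characters.ingress.shoot.project.default-domain.gardener.cloud"
```

The name used in the two examples above is 72 characters long, which is why it is only accepted as a dnsNames entry.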
1.4 - Using a custom Issuer
Using a custom Issuer
Another possibility to request certificates for custom domains is a dedicated issuer.
Note: This is only needed if the default issuer provided by Gardener is restricted to shoot related domains or you are using domain names not visible to public DNS servers. This means that your scenario most likely doesn’t require you to add an issuer.
Custom issuers are normally specified in the shoot manifest. If the shootIssuers feature is enabled, they can alternatively be defined in the shoot cluster.
Custom issuer in the shoot manifest
kind: Shoot
...
spec:
  extensions:
  - type: shoot-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      issuers:
        - email: your-email@example.com
          name: custom-issuer # issuer name must be specified in every custom issuer request, must not be "garden"
          server: 'https://acme-v02.api.letsencrypt.org/directory'
          privateKeySecretName: my-privatekey # referenced resource, the private key must be stored in the secret at `data.privateKey` (optionally, only needed as alternative to auto registration)
          #precheckNameservers: # to provide special set of nameservers to be used for prechecking DNSChallenges for an issuer
          #- dns1.private.company-net:53
          #- dns2.private.company-net:53
      #shootIssuers:
        # if true, allows to specify issuers in the shoot cluster
        #enabled: true
  resources:
  - name: my-privatekey
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: custom-issuer-privatekey # name of secret in Gardener project
If you are using an ACME provider for private domains, you may need to change the nameservers used for checking the availability of the DNS challenge’s TXT record before the certificate is requested from the ACME provider. By default, only public DNS servers may be used for this purpose. At least one of the precheckNameservers must be able to resolve the private domain names.
Using the custom issuer
To use the custom issuer in a certificate, just specify its name in the spec.
apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
spec:
  ...
  issuerRef:
    name: custom-issuer
  ...
For source resources like Ingress or Service use the cert.gardener.cloud/issuer annotation.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    cert.gardener.cloud/issuer: custom-issuer
    ...
Custom issuer in the shoot cluster
Prerequisite: The shootIssuers feature has to be enabled. It is either enabled globally in the ControllerDeployment or in the shoot manifest with:
kind: Shoot
...
spec:
  extensions:
  - type: shoot-cert-service
    providerConfig:
      apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
      kind: CertConfig
      shootIssuers:
        enabled: true # if true, allows to specify issuers in the shoot cluster
...
Example for specifying an Issuer resource and its Secret directly in any namespace of the shoot cluster:
apiVersion: cert.gardener.cloud/v1alpha1
kind: Issuer
metadata:
  name: my-own-issuer
  namespace: my-namespace
spec:
  acme:
    domains:
      include:
      - my.own.domain.com
    email: some.user@my.own.domain.com
    privateKeySecretRef:
      name: my-own-issuer-secret
      namespace: my-namespace
    server: https://acme-v02.api.letsencrypt.org/directory
---
apiVersion: v1
kind: Secret
metadata:
  name: my-own-issuer-secret
  namespace: my-namespace
type: Opaque
data:
  privateKey: ... # replace '...' with the value encoded as base64
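Values in the data section of a Secret have to be base64 encoded. A minimal sketch of producing such a value on the command line follows; the function names and the idea of reading the key from a local file are assumptions for illustration.

```shell
# Encode a literal value for the 'data' section of a Secret.
# 'tr -d' strips the line wrapping that some base64 implementations add.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Encode the contents of a file, e.g. a PEM encoded private key.
# Usage: encode_secret_file <path>
encode_secret_file() {
  base64 < "$1" | tr -d '\n'
}
```

The single-line output of encode_secret_file can then be used as the value of privateKey in the Secret above.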
Using the custom shoot issuer
To use the custom issuer in a certificate, just specify its name and namespace in the spec.
apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
spec:
  ...
  issuerRef:
    name: my-own-issuer
    namespace: my-namespace
  ...
For source resources like Ingress or Service use the cert.gardener.cloud/issuer annotation.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: amazing-ingress
  annotations:
    cert.gardener.cloud/purpose: managed
    cert.gardener.cloud/issuer: my-namespace/my-own-issuer
    ...
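The value of the cert.gardener.cloud/issuer annotation takes two shapes: a plain name, or namespace/name for an issuer in the shoot cluster. As an illustration of that format only, the following sketch splits such a value into its parts; the function names are hypothetical and this is not code from the extension.

```shell
# Print the namespace part of an issuer annotation value.
# Prints an empty line if the value has no namespace part.
issuer_namespace() {
  case "$1" in
    */*) printf '%s\n' "${1%%/*}" ;;
    *)   printf '\n' ;;
  esac
}

# Print the name part of an issuer annotation value.
issuer_name() {
  printf '%s\n' "${1##*/}"
}

issuer_namespace "my-namespace/my-own-issuer"
issuer_name "my-namespace/my-own-issuer"
```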
1.5 - Deployment
Gardener Certificate Management
Introduction
Gardener comes with an extension that enables shoot owners to request X.509 compliant certificates for shoot domains.
Extension Installation
The Shoot-Cert-Service extension can be deployed and configured via Gardener’s native resource ControllerRegistration.
Prerequisites
To let the Shoot-Cert-Service operate properly, you need to have:
- a DNS service in your seed
- contact details and optionally a private key for a pre-existing Let’s Encrypt account
ControllerRegistration
An example of a ControllerRegistration for the Shoot-Cert-Service can be found at controller-registration.yaml.
The ControllerRegistration contains a Helm chart which eventually deploys the Shoot-Cert-Service to seed clusters. It offers some configuration options, mainly to set up a default issuer for shoot clusters. With a default issuer, pre-existing Let’s Encrypt accounts can be used and shared with shoot clusters (see “One Account or Many?” of the Integration Guide).
Please keep the Let’s Encrypt Rate Limits in mind when using this shared account model. Depending on the number of shoots and domains it is recommended to use an account with increased rate limits.
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
...
  values:
    certificateConfig:
      defaultIssuer:
        acme:
          email: foo@example.com
          privateKey: |-
            -----BEGIN RSA PRIVATE KEY-----
            ...
            -----END RSA PRIVATE KEY-----
          server: https://acme-v02.api.letsencrypt.org/directory
        name: default-issuer
#       restricted: true # restrict default issuer to any sub-domain of shoot.spec.dns.domain
#     defaultRequestsPerDayQuota: 50
#     precheckNameservers: 8.8.8.8,8.8.4.4
#     caCertificates: | # optional custom CA certificates when using private ACME provider
#       -----BEGIN CERTIFICATE-----
#       ...
#       -----END CERTIFICATE-----
#
#       -----BEGIN CERTIFICATE-----
#       ...
#       -----END CERTIFICATE-----
      shootIssuers:
        enabled: false # if true, allows to specify issuers in the shoot clusters
Enablement
If the Shoot-Cert-Service should be enabled for every shoot cluster in your Gardener managed environment, you need to globally enable it in the ControllerRegistration:
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
...
  resources:
  - globallyEnabled: true
    kind: Extension
    type: shoot-cert-service
Alternatively, you’re given the option to only enable the service for certain shoots:
kind: Shoot
apiVersion: core.gardener.cloud/v1beta1
...
spec:
  extensions:
  - type: shoot-cert-service
...
1.6 - Gardener yourself a Shoot with Istio, custom Domains, and Certificates
As we ramp up more and more friends of Gardener, I thought it worthwhile to explore and write a tutorial about how to simply:
- create a Gardener managed Kubernetes Cluster (Shoot) via kubectl
- install Istio as a preferred, production ready Ingress/Service Mesh (instead of the Nginx Ingress addon)
- attach your own custom domain to be managed by Gardener
- combine everything with certificates from Let’s Encrypt
Here are some pre-pointers that you will need to go deeper:
Tip
If you try my instructions and fail, then read the alternative title of this tutorial as “Shoot yourself in the foot with Gardener, custom Domains, Istio and Certificates”.
First Things First
Login to your Gardener landscape, set up a project with adequate infrastructure credentials and then navigate to your account. Note down the name of your secret. I chose the GCP infrastructure from the vast possible options that my Gardener provides me with, so I had named the secret shoot-operator-gcp.
From the Access widget (leave the default settings) download your personalized kubeconfig into ~/.kube/kubeconfig-garden-myproject.yaml. Follow the instructions to set up kubelogin:
For convenience, let us set an alias command with
alias kgarden="kubectl --kubeconfig ~/.kube/kubeconfig-garden-myproject.yaml"
kgarden now gives you all botanical powers and connects you directly with your Gardener.
You should now be able to run kgarden get shoots, automatically get an oidc token, and list already running clusters/shoots.
Prepare your Custom Domain
I am going to use Cloudflare as programmatic DNS of my custom domain mydomain.io. Please follow detailed instructions from Cloudflare on how to delegate your domain (the free account does not support delegating subdomains). Alternatively, AWS Route53 (and most others) support delegating subdomains.
I needed to follow these instructions and created the following secret:
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-mydomain-io
type: Opaque
data:
  CLOUDFLARE_API_TOKEN: useYOURownDAMITzNDU2Nzg5MDEyMzQ1Njc4OQ==
Apply this secret into your project with kgarden create -f cloudflare-mydomain-io.yaml.
Our External DNS Manager also supports Amazon Route53, Google CloudDNS, AliCloud DNS, Azure DNS, or OpenStack Designate. Check it out.
Prepare Gardener Extensions
I now need to prepare the Gardener extensions shoot-dns-service and shoot-cert-service and set the parameters accordingly.
The following snippet allows Gardener to manage my entire custom domain, whereas with the include: attribute I restrict all dynamic entries under the subdomain gsicdc.mydomain.io:
dns:
providers:
- domains:
include:
- gsicdc.mydomain.io
primary: false
secretName: cloudflare-mydomain-io
type: cloudflare-dns
extensions:
- type: shoot-dns-service
The next snippet allows Gardener to manage certificates automatically from Let’s Encrypt on mydomain.io
for me:
extensions:
- type: shoot-cert-service
providerConfig:
apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
issuers:
- email: me@mail.com
name: mydomain
server: 'https://acme-v02.api.letsencrypt.org/directory'
- email: me@mail.com
name: mydomain-staging
server: 'https://acme-staging-v02.api.letsencrypt.org/directory'
References for Let’s Encrypt:
Create the Gardener Shoot Cluster
Remember I chose to create the Shoot on GCP, so below is the simplest declarative shoot or cluster order document. Notice that I am referring to the infrastructure credentials with shoot-operator-gcp
and I combined the above snippets into the yaml file:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
name: gsicdc
spec:
dns:
providers:
- domains:
include:
- gsicdc.mydomain.io
primary: false
secretName: cloudflare-mydomain-io
type: cloudflare-dns
extensions:
- type: shoot-dns-service
- type: shoot-cert-service
providerConfig:
apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
issuers:
- email: me@mail.com
name: mydomain
server: 'https://acme-v02.api.letsencrypt.org/directory'
- email: me@mail.com
name: mydomain-staging
server: 'https://acme-staging-v02.api.letsencrypt.org/directory'
cloudProfileName: gcp
kubernetes:
allowPrivilegedContainers: true
version: 1.24.8
maintenance:
autoUpdate:
kubernetesVersion: true
machineImageVersion: true
networking:
nodes: 10.250.0.0/16
pods: 100.96.0.0/11
services: 100.64.0.0/13
type: calico
provider:
controlPlaneConfig:
apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
zone: europe-west1-d
infrastructureConfig:
apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
workers: 10.250.0.0/16
type: gcp
workers:
- machine:
image:
name: gardenlinux
version: 576.9.0
type: n1-standard-2
maxSurge: 1
maxUnavailable: 0
maximum: 2
minimum: 1
name: my-workerpool
volume:
size: 50Gi
type: pd-standard
zones:
- europe-west1-d
purpose: testing
region: europe-west1
secretBindingName: shoot-operator-gcp
Create your cluster and wait for it to be ready (about 5 to 7min).
$ kgarden create -f gsicdc.yaml
shoot.core.gardener.cloud/gsicdc created
$ kgarden get shoot gsicdc --watch
NAME CLOUDPROFILE VERSION SEED DOMAIN HIBERNATION OPERATION PROGRESS APISERVER CONTROL NODES SYSTEM AGE
gsicdc gcp 1.24.8 gcp gsicdc.myproject.shoot.devgarden.cloud Awake Processing 38 Progressing Progressing Unknown Unknown 83s
...
gsicdc gcp 1.24.8 gcp gsicdc.myproject.shoot.devgarden.cloud Awake Succeeded 100 True True True False 6m7s
Get access to your freshly baked cluster and set your KUBECONFIG
:
$ kgarden get secrets gsicdc.kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d >kubeconfig-gsicdc.yaml
$ export KUBECONFIG=$(pwd)/kubeconfig-gsicdc.yaml
$ kubectl get all
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 100.64.0.1 <none> 443/TCP 89m
Install Istio
Please follow the Istio installation instructions and download istioctl
. If you are on a Mac, I recommend:
brew install istioctl
I want to install Istio with a default profile and SDS enabled. Furthermore I pass the following annotations to the service object istio-ingressgateway
in the istio-system
namespace.
annotations:
cert.gardener.cloud/issuer: mydomain-staging
cert.gardener.cloud/secretname: wildcard-tls
dns.gardener.cloud/class: garden
dns.gardener.cloud/dnsnames: "*.gsicdc.mydomain.io"
dns.gardener.cloud/ttl: "120"
With these annotations three things now happen automatically:
- The External DNS Manager, provided to you as a service (dns.gardener.cloud/class: garden), picks up the request and creates the wildcard DNS entry *.gsicdc.mydomain.io with a time to live of 120 seconds at your DNS provider. My provider, Cloudflare, is very quick (as opposed to some other services). You should be able to verify the entry with dig lovemygardener.gsicdc.mydomain.io within seconds.
- The Certificate Management picks up the request as well and initiates a DNS01 protocol exchange with Let’s Encrypt, using the staging environment referred to by the issuer mydomain-staging.
- After approximately 70 seconds (give or take), you will receive the wildcard certificate in the wildcard-tls secret in the namespace istio-system.
Here is the istio-install script:
$ export domainname="*.gsicdc.mydomain.io"
$ export issuer="mydomain-staging"
$ cat <<EOF | istioctl install -y -f -
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
profile: default
components:
ingressGateways:
- name: istio-ingressgateway
enabled: true
k8s:
serviceAnnotations:
cert.gardener.cloud/issuer: "${issuer}"
cert.gardener.cloud/secretname: wildcard-tls
dns.gardener.cloud/class: garden
dns.gardener.cloud/dnsnames: "${domainname}"
dns.gardener.cloud/ttl: "120"
EOF
Verify that setup is working and that DNS and certificates have been created/delivered:
$ kubectl -n istio-system describe service istio-ingressgateway
<snip>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 58s service-controller Ensuring load balancer
Normal reconcile 58s cert-controller-manager created certificate object istio-system/istio-ingressgateway-service-pwqdm
Normal cert-annotation 58s cert-controller-manager wildcard-tls: cert request is pending
Normal cert-annotation 54s cert-controller-manager wildcard-tls: certificate pending: certificate requested, preparing/waiting for successful DNS01 challenge
Normal cert-annotation 28s cert-controller-manager wildcard-tls: certificate ready
Normal EnsuredLoadBalancer 26s service-controller Ensured load balancer
Normal reconcile 26s dns-controller-manager created dns entry object shoot--core--gsicdc/istio-ingressgateway-service-p9qqb
Normal dns-annotation 26s dns-controller-manager *.gsicdc.mydomain.io: dns entry is pending
Normal dns-annotation 21s (x3 over 21s) dns-controller-manager *.gsicdc.mydomain.io: dns entry active
$ dig lovemygardener.gsicdc.mydomain.io
; <<>> DiG 9.10.6 <<>> lovemygardener.gsicdc.mydomain.io
<snip>
;; ANSWER SECTION:
lovemygardener.gsicdc.mydomain.io. 120 IN A 35.195.120.62
<snip>
There you have it, the wildcard-tls certificate is ready and the *.gsicdc.mydomain.io dns entry is active. Traffic will be going your way.
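If you would rather script the DNS check than run dig by hand, a small polling loop can help. This is only a sketch: the helper name wait_for_dns and its default attempt count and delay are assumptions, not part of Gardener.

```shell
# Hypothetical helper: polls until a hostname resolves or attempts run out.
wait_for_dns() {
  local host="$1" attempts="${2:-30}" delay="${3:-5}" answer i
  for i in $(seq 1 "$attempts"); do
    # `dig +short` prints only the answer records; empty output means
    # that nothing resolved yet.
    answer="$(dig +short "$host" | head -n 1)"
    if [ -n "$answer" ]; then
      echo "$answer"
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for $host" >&2
  return 1
}
```

For example, wait_for_dns lovemygardener.gsicdc.mydomain.io prints the first answer record as soon as the wildcard entry is resolvable.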
Handy Tools to Install
Another set of fine tools to use are kapp (formerly known as k14s), k9s and HTTPie. While we are at it, let’s install them all. If you are on a Mac, I recommend:
brew tap vmware-tanzu/carvel
brew install ytt kbld kapp kwt imgpkg vendir
brew install derailed/k9s/k9s
brew install httpie
Ingress at Your Service
Kubernetes Ingress is a subject that is evolving into a much broader standard. Please watch Evolving the Kubernetes Ingress APIs to GA and Beyond for a good introduction. In this example, I did not want to use the Kubernetes Ingress
compatibility option of Istio. Instead, I used VirtualService
and Gateway
from Istio’s API group networking.istio.io/v1
directly, and enabled istio-injection generically for the namespace.
I use httpbin as the service that I want to expose to the internet, or where my ingress should be routed to (depending on your point of view, I guess).
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
istio-injection: enabled
---
apiVersion: v1
kind: Service
metadata:
name: httpbin
namespace: production
labels:
app: httpbin
spec:
ports:
- name: http
port: 8000
targetPort: 80
selector:
app: httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: httpbin
namespace: production
spec:
replicas: 1
selector:
matchLabels:
app: httpbin
template:
metadata:
labels:
app: httpbin
spec:
containers:
- image: docker.io/kennethreitz/httpbin
imagePullPolicy: IfNotPresent
name: httpbin
ports:
- containerPort: 80
---
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
name: httpbin-gw
namespace: production
spec:
selector:
istio: ingressgateway #! use istio default ingress gateway
servers:
- port:
number: 80
name: http
protocol: HTTP
tls:
httpsRedirect: true
hosts:
- "httpbin.gsicdc.mydomain.io"
- port:
number: 443
name: https
protocol: HTTPS
tls:
mode: SIMPLE
credentialName: wildcard-tls
hosts:
- "httpbin.gsicdc.mydomain.io"
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: httpbin-vs
namespace: production
spec:
hosts:
- "httpbin.gsicdc.mydomain.io"
gateways:
- httpbin-gw
http:
- match:
- uri:
regex: /.*
route:
- destination:
port:
number: 8000
host: httpbin
---
Let us now deploy the whole package of Kubernetes primitives using kapp
:
$ kapp deploy -a httpbin -f httpbin-kapp.yaml
Target cluster 'https://api.gsicdc.myproject.shoot.devgarden.cloud' (nodes: shoot--myproject--gsicdc-my-workerpool-z1-6586c8f6cb-x24kh)
Changes
Namespace Name Kind Conds. Age Op Wait to Rs Ri
(cluster) production Namespace - - create reconcile - -
production httpbin Deployment - - create reconcile - -
^ httpbin Service - - create reconcile - -
^ httpbin-gw Gateway - - create reconcile - -
^ httpbin-vs VirtualService - - create reconcile - -
Op: 5 create, 0 delete, 0 update, 0 noop
Wait to: 5 reconcile, 0 delete, 0 noop
Continue? [yN]: y
5:36:31PM: ---- applying 1 changes [0/5 done] ----
<snip>
5:37:00PM: ok: reconcile deployment/httpbin (apps/v1) namespace: production
5:37:00PM: ---- applying complete [5/5 done] ----
5:37:00PM: ---- waiting complete [5/5 done] ----
Succeeded
Let’s finally test the service (of course, you can use the browser as well):
$ http httpbin.gsicdc.mydomain.io
HTTP/1.1 301 Moved Permanently
content-length: 0
date: Wed, 13 May 2020 21:29:13 GMT
location: https://httpbin.gsicdc.mydomain.io/
server: istio-envoy
$ curl -k https://httpbin.gsicdc.mydomain.io/ip
{
"origin": "10.250.0.2"
}
Quod erat demonstrandum. The proof of exchanging the issuer is now left to the reader.
Tip
Remember that the certificate is actually not valid because it is issued from the Let’s Encrypt staging environment. Thus, we needed “curl -k” or “http --verify no”.
Hint: use the interactive k9s tool.
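To exchange the issuer, one option is to re-run the Istio installation with the production issuer name. The sketch below wraps the install manifest from above in a hypothetical helper named render_istio_operator, with the indentation reconstructed from the IstioOperator schema. Whether the already issued certificate is replaced right away is not verified here, so treat that as an open assumption.

```shell
# Hypothetical helper: prints the IstioOperator manifest from the install
# script above for a given issuer and wildcard domain.
render_istio_operator() {
  local issuer="$1" domainname="$2"
  cat <<EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        serviceAnnotations:
          cert.gardener.cloud/issuer: "${issuer}"
          cert.gardener.cloud/secretname: wildcard-tls
          dns.gardener.cloud/class: garden
          dns.gardener.cloud/dnsnames: "${domainname}"
          dns.gardener.cloud/ttl: "120"
EOF
}
```

For example, render_istio_operator mydomain "*.gsicdc.mydomain.io" | istioctl install -y -f - would apply the same configuration with the mydomain issuer defined in the shoot manifest.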
Cleanup
Remove the cloud native application:
$ kapp ls
Apps in namespace 'default'
Name Namespaces Lcs Lca
httpbin (cluster),production true 17m
$ kapp delete -a httpbin
...
Continue? [yN]: y
...
11:47:47PM: ---- waiting complete [8/8 done] ----
Succeeded
Remove Istio:
$ istioctl x uninstall --purge
clusterrole.rbac.authorization.k8s.io "prometheus-istio-system" deleted
clusterrolebinding.rbac.authorization.k8s.io "prometheus-istio-system" deleted
...
Delete your Shoot:
kgarden annotate shoot gsicdc confirmation.gardener.cloud/deletion=true --overwrite
kgarden delete shoot gsicdc --wait=false
1.7 - Gateway Api Gateways
Using annotated Gateway API Gateway and/or HTTPRoutes as Source
This tutorial describes how to use annotated Gateway API resources as source for Certificate
.
Install Istio on your cluster
Follow the Istio Kubernetes Gateway API to install the Gateway API and to install Istio.
These are the typical commands for the Istio installation with the Kubernetes Gateway API:
export KUBECONFIG=...
curl -L https://istio.io/downloadIstio | sh -
kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null || \
{ kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd?ref=v1.0.0" | kubectl apply -f -; }
istioctl install --set profile=minimal -y
kubectl label namespace default istio-injection=enabled
Verify that Gateway Source works
Install a sample service
With automatic sidecar injection:
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin.yaml
Note: The sample service is not used in the following steps. It is deployed for illustration purposes only. To use it with certificates, you have to add an HTTPS port for it.
Using a Gateway as a source
Deploy the Gateway API configuration including a single exposed route (i.e., /get):
kubectl create namespace istio-ingress
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
name: gateway
namespace: istio-ingress
annotations:
#cert.gardener.cloud/dnsnames: "*.example.com" # alternative if you want to control the dns names explicitly.
cert.gardener.cloud/purpose: managed
spec:
gatewayClassName: istio
listeners:
- name: default
hostname: "*.example.com" # this is used by cert-controller-manager to extract DNS names
port: 443
protocol: HTTPS
allowedRoutes:
namespaces:
from: All
tls: # important: tls section must be defined with exactly one certificateRefs item
certificateRefs:
- name: foo-example-com
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
name: http
namespace: default
spec:
parentRefs:
- name: gateway
namespace: istio-ingress
hostnames: ["httpbin.example.com"] # this is used by cert-controller-manager to extract DNS names too
rules:
- matches:
- path:
type: PathPrefix
value: /get
backendRefs:
- name: httpbin
port: 8000
EOF
You should now see a created Certificate
resource similar to:
$ kubectl -n istio-ingress get cert -oyaml
apiVersion: v1
items:
- apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
generateName: gateway-gateway-
name: gateway-gateway-kdw6h
namespace: istio-ingress
ownerReferences:
- apiVersion: gateway.networking.k8s.io/v1
blockOwnerDeletion: true
controller: true
kind: Gateway
name: gateway
spec:
commonName: '*.example.com'
secretName: foo-example-com
status:
...
kind: List
metadata:
resourceVersion: ""
Using a HTTPRoute as a source
If the Gateway
resource is annotated with cert.gardener.cloud/purpose: managed
,
hostnames from all referencing HTTPRoute
resources are automatically extracted.
These resources don’t need an additional annotation.
Deploy the Gateway API configuration including a single exposed route (i.e., /get):
kubectl create namespace istio-ingress
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
name: gateway
namespace: istio-ingress
annotations:
cert.gardener.cloud/purpose: managed
spec:
gatewayClassName: istio
listeners:
- name: default
hostname: null # not set
port: 443
protocol: HTTPS
allowedRoutes:
namespaces:
from: All
tls: # important: tls section must be defined with exactly one certificateRefs item
certificateRefs:
- name: foo-example-com
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
name: http
namespace: default
spec:
parentRefs:
- name: gateway
namespace: istio-ingress
hostnames: ["httpbin.example.com"] # this is used by cert-controller-manager to extract DNS names too
rules:
- matches:
- path:
type: PathPrefix
value: /get
backendRefs:
- name: httpbin
port: 8000
EOF
This should show a similar Certificate
resource as above.
1.8 - Istio Gateways
Using annotated Istio Gateway and/or Istio Virtual Service as Source
This tutorial describes how to use annotated Istio Gateway resources as source for Certificate
resources.
Install Istio on your cluster
Follow the Istio Getting Started to download and install Istio.
These are the typical commands for the Istio demo installation:
export KUBECONFIG=...
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=demo -y
kubectl label namespace default istio-injection=enabled
Note: If you are using a KinD cluster, the istio-ingressgateway service may be pending forever.
$ kubectl -n istio-system get svc istio-ingressgateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 10.96.88.189 <pending> 15021:30590/TCP,80:30185/TCP,443:30075/TCP,31400:30129/TCP,15443:30956/TCP 13m
In this case, you may patch the status for demo purposes (of course it still would not accept connections)
kubectl -n istio-system patch svc istio-ingressgateway --type=merge --subresource status --patch '{"status":{"loadBalancer":{"ingress":[{"ip":"1.2.3.4"}]}}}'
Verify that Istio Gateway/VirtualService Source works
Install a sample service
With automatic sidecar injection:
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin.yaml
Using a Gateway as a source
Create an Istio Gateway:
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
name: httpbin-gateway
namespace: istio-system
annotations:
#cert.gardener.cloud/dnsnames: "*.example.com" # alternative if you want to control the dns names explicitly.
cert.gardener.cloud/purpose: managed
spec:
selector:
istio: ingressgateway # use Istio default gateway implementation
servers:
- port:
number: 443
name: http
protocol: HTTPS
hosts:
- "httpbin.example.com" # this is used by the cert-controller-manager to extract DNS names
tls:
credentialName: my-tls-secret
EOF
You should now see a created Certificate
resource similar to:
$ kubectl -n istio-system get cert -oyaml
apiVersion: v1
items:
- apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
generateName: httpbin-gateway-gateway-
name: httpbin-gateway-gateway-hdbjb
namespace: istio-system
ownerReferences:
- apiVersion: networking.istio.io/v1
blockOwnerDeletion: true
controller: true
kind: Gateway
name: httpbin-gateway
spec:
commonName: httpbin.example.com
secretName: my-tls-secret
status:
...
kind: List
metadata:
resourceVersion: ""
Using a VirtualService as a source
If the Gateway
resource is annotated with cert.gardener.cloud/purpose: managed
,
hosts from all referencing VirtualService
resources are automatically extracted.
These resources don’t need an additional annotation.
Create an Istio Gateway:
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
name: httpbin-gateway
namespace: istio-system
annotations:
cert.gardener.cloud/purpose: managed
spec:
selector:
istio: ingressgateway # use Istio default gateway implementation
servers:
- port:
number: 443
name: https
protocol: HTTPS
hosts:
- "*"
tls:
credentialName: my-tls-secret
EOF
Configure routes for traffic entering via the Gateway:
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
name: httpbin
namespace: default
spec:
hosts:
- "httpbin.example.com" # this is used by cert-controller-manager to extract DNS names
gateways:
- istio-system/httpbin-gateway
http:
- match:
- uri:
prefix: /status
- uri:
prefix: /delay
route:
- destination:
port:
number: 8000
host: httpbin
EOF
This should show a similar Certificate
resource as above.
2 - DNS services
Gardener Extension for DNS services
Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.
Recently, most of the vendor specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.
Extension-Resources
Example extension resource:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
name: "extension-dns-service"
namespace: shoot--project--abc
spec:
type: shoot-dns-service
How to start using or developing this extension controller locally
You can run the controller locally on your machine by executing make start
. Please make sure to have the kubeconfig to the cluster you want to connect to ready in the ./dev/kubeconfig
file.
Static code checks and tests can be executed by running make verify
. We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.
Feedback and Support
Feedback and contributions are always welcome. Please report bugs or suggestions as GitHub issues or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).
Learn more!
Please find further resources about our project here:
2.1 - Configuration
Deployment of the shoot DNS service extension
Disclaimer: This document is NOT a step by step deployment guide for the shoot DNS service extension and only contains some configuration specifics regarding the deployment of different components via the helm charts residing in the shoot DNS service extension repository.
gardener-extension-admission-shoot-dns-service
Authentication against the Garden cluster
There are several authentication possibilities depending on whether or not the concept of Virtual Garden is used.
Virtual Garden is not used, i.e., the runtime
Garden cluster is also the target
Garden cluster.
Automounted Service Account Token
The easiest way to deploy the gardener-extension-admission-shoot-dns-service
component will be to not provide kubeconfig
at all. This way in-cluster configuration and an automounted service account token will be used. The drawback of this approach is that the automounted token will not be automatically rotated.
Service Account Token Volume Projection
Another solution will be to use Service Account Token Volume Projection combined with a kubeconfig
referencing a token file (see example below).
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority-data: <CA-DATA>
server: https://kubernetes.default.svc.cluster.local
name: garden
contexts:
- context:
cluster: garden
user: garden
name: garden
current-context: garden
users:
- name: garden
user:
tokenFile: /var/run/secrets/projected/serviceaccount/token
This will allow for automatic rotation of the service account token by the kubelet
. The configuration can be achieved by setting both .Values.global.serviceAccountTokenVolumeProjection.enabled: true
and .Values.global.kubeconfig
in the respective chart’s values.yaml
file.
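As a rough sketch of the two settings named above, the following hypothetical helper (render_projection_values) prints a values fragment. It assumes that .Values.global.kubeconfig takes the kubeconfig content as a string. Check the chart’s own values.yaml for the exact shape before using it.

```shell
# Hypothetical helper: prints a values fragment that enables token volume
# projection and embeds the kubeconfig read from the given file.
# Assumption: global.kubeconfig accepts the kubeconfig as a block scalar.
render_projection_values() {
  local kubeconfig_file="$1"
  echo "global:"
  echo "  serviceAccountTokenVolumeProjection:"
  echo "    enabled: true"
  echo "  kubeconfig: |"
  # Indent every kubeconfig line so that it nests under the block scalar.
  sed 's/^/    /' "$kubeconfig_file"
}
```

The output could be saved to a file and passed to the chart as an additional values file.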
Virtual Garden is used, i.e., the runtime
Garden cluster is different from the target
Garden cluster.
Service Account
The easiest way to set up the authentication is to create a service account and bind the respective roles to it in the target
cluster. Then use the generated service account token to craft a kubeconfig
which will be used by the workload in the runtime
cluster. This approach does not provide a solution for the rotation of the service account token. However, this setup can be achieved by setting .Values.global.virtualGarden.enabled: true
and following these steps:
- Deploy the application part of the charts in the target cluster.
- Get the service account token and craft the kubeconfig.
- Set the crafted kubeconfig and deploy the runtime part of the charts in the runtime cluster.
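The kubeconfig crafting step could look like the following sketch. The helper name craft_kubeconfig is hypothetical, and the server URL, CA data, and token are inputs you have to obtain from your own target cluster; the cluster and user names merely mirror the virtual-garden example used elsewhere on this page.

```shell
# Hypothetical helper: prints a kubeconfig that authenticates with a static
# service account token against the given API server.
craft_kubeconfig() {
  local server="$1" ca_data="$2" token="$3"
  cat <<EOF
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: ${ca_data}
    server: ${server}
  name: virtual-garden
contexts:
- context:
    cluster: virtual-garden
    user: virtual-garden
  name: virtual-garden
current-context: virtual-garden
users:
- name: virtual-garden
  user:
    token: ${token}
EOF
}
```

Because the token is embedded statically, this sketch shares the drawback described above: it does not rotate the token.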
Client Certificate
Another solution will be to bind the roles in the target
cluster to a User
subject instead of a service account and use a client certificate for authentication. This approach does not provide a solution for the client certificate rotation. However, this setup can be achieved by setting both .Values.global.virtualGarden.enabled: true
and .Values.global.virtualGarden.user.name
, then following these steps:
- Generate a client certificate for the target cluster for the respective user.
- Deploy the application part of the charts in the target cluster.
- Craft a kubeconfig using the already generated client certificate.
- Set the crafted kubeconfig and deploy the runtime part of the charts in the runtime cluster.
Projected Service Account Token
This approach requires an already deployed and configured oidc-webhook-authenticator for the target
cluster. Also, the runtime
cluster should be registered as a trusted identity provider in the target
cluster. Then projected service account tokens from the runtime
cluster can be used to authenticate against the target
cluster. The needed steps are as follows:
- Deploy OWA and establish the needed trust.
- Set .Values.global.virtualGarden.enabled: true and .Values.global.virtualGarden.user.name. Note: username value will depend on the trust configuration, e.g., <prefix>:system:serviceaccount:<namespace>:<serviceaccount>
- Set .Values.global.serviceAccountTokenVolumeProjection.enabled: true and .Values.global.serviceAccountTokenVolumeProjection.audience. Note: audience value will depend on the trust configuration, e.g., <client-id-from-trust-config>.
- Craft a kubeconfig (see example below).
- Deploy the application part of the charts in the target cluster.
- Deploy the runtime part of the charts in the runtime cluster.
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority-data: <CA-DATA>
server: https://virtual-garden.api
name: virtual-garden
contexts:
- context:
cluster: virtual-garden
user: virtual-garden
name: virtual-garden
current-context: virtual-garden
users:
- name: virtual-garden
user:
tokenFile: /var/run/secrets/projected/serviceaccount/token
2.2 - Deployment
Gardener DNS Management for Shoots
Introduction
Gardener allows Shoot clusters to request DNS names for Ingresses and Services out of the box.
To support this, Gardener must be installed with the shoot-dns-service
extension.
This extension uses the seed’s DNS management infrastructure to maintain DNS
names for shoot clusters. So far, only the external DNS domain of a shoot
(already used for the Kubernetes API server and ingress DNS names) can be used
for managed DNS names.
Configuration
To generally enable the DNS management for shoot objects the
shoot-dns-service
extension must be registered by providing an
appropriate extension registration in the garden cluster.
Here it is possible to decide whether the extension should be always available for all shoots or whether the extension must be separately enabled per shoot.
If the extension should be used for all shoots, the registration must set the globallyEnabled flag to true
.
spec:
resources:
- kind: Extension
type: shoot-dns-service
globallyEnabled: true
Deployment of DNS controller manager
If you are using Gardener version >= 1.54
, please make sure to deploy the DNS controller manager by
adding the dnsControllerManager
section to the providerConfig.values
section.
For example:
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerDeployment
metadata:
name: extension-shoot-dns-service
type: helm
providerConfig:
chart: ...
values:
image:
...
dnsControllerManager:
image:
repository: europe-docker.pkg.dev/gardener-project/releases/dns-controller-manager
tag: v0.16.0
configuration:
cacheTtl: 300
controllers: dnscontrollers,dnssources
dnsPoolResyncPeriod: 30m
#poolSize: 20
#providersPoolResyncPeriod: 24h
serverPortHttp: 8080
createCRDs: false
deploy: true
replicaCount: 1
#resources:
# requests:
# cpu: 50m
# memory: 500Mi
dnsProviderManagement:
enabled: true
Providing Base Domains usable for a Shoot
So far, only the external DNS domain of a shoot already used for the Kubernetes API server and ingress DNS names can be used for managed DNS names. This is either the shoot domain as a subdomain of the default domain configured for the Gardener installation, or a dedicated domain with dedicated access credentials configured for a dedicated shoot via the shoot manifest.
Alternatively, you can specify DNSProviders
and their credentials
Secret
directly in the shoot, if this feature is enabled.
By default, DNSProvider
replication is disabled, but it can be enabled globally in the ControllerDeployment
or for a shoot cluster in the shoot manifest (for details, see further below).
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerDeployment
metadata:
name: extension-shoot-dns-service
type: helm
providerConfig:
chart: ...
values:
image:
...
dnsProviderReplication:
enabled: true
See example files (20-* and 30-*) for details for the various provider types.
Shoot Feature Gate
If the shoot DNS feature is not globally enabled by default (depends on the extension registration on the garden cluster), it must be enabled per shoot.
To enable the feature for a shoot, the shoot manifest must explicitly add the
shoot-dns-service
extension.
...
spec:
extensions:
- type: shoot-dns-service
...
Enable/disable DNS provider replication for a shoot
The DNSProvider replication setting can be overridden in the shoot manifest, e.g.:
kind: Shoot
...
spec:
extensions:
- type: shoot-dns-service
providerConfig:
apiVersion: service.dns.extensions.gardener.cloud/v1alpha1
kind: DNSConfig
dnsProviderReplication:
enabled: true
...
2.3 - DNS Names
Request DNS Names in Shoot Clusters
Introduction
Within a shoot cluster, it is possible to request DNS records via the following resource types:
It is necessary that the Gardener installation your shoot cluster runs in is equipped with a shoot-dns-service
extension. This extension uses the seed’s DNS management infrastructure to maintain DNS names for shoot clusters. Please ask your Gardener operator if the extension is available in your environment.
Shoot Feature Gate
In some Gardener setups the shoot-dns-service
extension is not enabled globally and thus must be configured per shoot cluster. Please adapt the shoot specification by the configuration shown below to activate the extension individually.
kind: Shoot
...
spec:
extensions:
- type: shoot-dns-service
...
Before you start
You should:
- Have created a shoot cluster
- Have created and correctly configured a DNS Provider (Please consult this page for more information)
- Have a basic understanding of DNS (see link under References)
There are two types of DNS that you can use within Kubernetes:
- internal (usually managed by CoreDNS)
- external (managed by a public DNS provider).
This page and the extension deal exclusively with external DNS handling.
Gardener allows two ways of managing your external DNS:
- Manually, which means you are in charge of creating / maintaining your Kubernetes related DNS entries
- Via the Gardener DNS extension
Gardener DNS extension
The managed external DNS records feature of the Gardener clusters makes all this easier. You do not need DNS service provider specific knowledge, and in fact you do not need to leave your cluster at all to achieve that. You simply annotate the Ingress / Service that needs its DNS records managed, and they will be automatically created / managed by Gardener.
Managed external DNS records are supported with the following DNS provider types:
- aws-route53
- azure-dns
- azure-private-dns
- google-clouddns
- openstack-designate
- alicloud-dns
- cloudflare-dns
Request DNS records for Ingress resources
To request a DNS name for Ingress
, Service
or Gateway
(Istio or Gateway API) objects in the shoot cluster, they must be annotated with the DNS class garden
and an annotation denoting the desired DNS names.
Example for an annotated Ingress resource:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: amazing-ingress
annotations:
# Let Gardener manage external DNS records for this Ingress.
dns.gardener.cloud/dnsnames: special.example.com # Use "*" to collect domain names from .spec.rules[].host
dns.gardener.cloud/ttl: "600"
dns.gardener.cloud/class: garden
# If you are delegating the certificate management to Gardener, uncomment the following line
#cert.gardener.cloud/purpose: managed
spec:
rules:
- host: special.example.com
http:
paths:
- pathType: Prefix
path: "/"
backend:
service:
name: amazing-svc
port:
number: 8080
# Uncomment the following part if you are delegating the certificate management to Gardener
#tls:
# - hosts:
# - special.example.com
# secretName: my-cert-secret-name
For an Ingress, the DNS names are already declared in the specification. Nevertheless, the dnsnames annotation must be present. Here, a subset of the DNS names of the ingress can be specified. If DNS names for all names are desired, the value all
can be used.
Keep in mind that ingress resources are ignored unless an ingress controller is set up. Gardener does not provide an ingress controller by default. For more details, see Ingress Controllers and Service in the Kubernetes documentation.
Request DNS records for service type LoadBalancer
Example for an annotated Service resource (it must have the type LoadBalancer):
apiVersion: v1
kind: Service
metadata:
  name: amazing-svc
  annotations:
    # Let Gardener manage external DNS records for this Service.
    dns.gardener.cloud/dnsnames: special.example.com
    dns.gardener.cloud/ttl: "600"
    dns.gardener.cloud/class: garden
spec:
  selector:
    app: amazing-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer
Request DNS records for Gateway resources
Please see Istio Gateways or Gateway API for details.
Creating a DNSEntry resource explicitly
It is also possible to create a DNS entry via the Kubernetes resource called DNSEntry:
apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSEntry
metadata:
  annotations:
    # Let Gardener manage this DNS entry.
    dns.gardener.cloud/class: garden
  name: special-dnsentry
  namespace: default
spec:
  dnsName: special.example.com
  ttl: 600
  targets:
    - 1.2.3.4
If one of the accepted DNS names is a direct subname of the shoot’s ingress domain, this is already handled by the standard wildcard entry for the ingress domain. Therefore, this name should be excluded from the dnsnames list in the annotation. If only this DNS name is configured in the ingress, no explicit DNS entry is required, and the DNS annotations should be omitted altogether.
You can check the status of the DNSEntry with:
$ kubectl get dnsentry
NAME         DNS                   TYPE          PROVIDER      STATUS   AGE
mydnsentry   special.example.com   aws-route53   default/aws   Ready    24s
As soon as the status of the entry is Ready, the provider has accepted the new DNS record. Depending on the provider and your DNS settings and cache, it may take up to 24 hours for the new entry to be propagated across the internet.
More examples can be found here
Request DNS records for Service/Ingress resources using a DNSAnnotation resource
In rare cases it may not be possible to add annotations to a Service or Ingress resource object.
For example, the Helm chart used to deploy the resource may not be adaptable for some reason, or some automation is used which always restores the original content of the resource object by dropping any additional annotations.
In these cases, it is recommended to use an additional DNSAnnotation resource in order to have more flexibility than DNSEntry resources. The DNSAnnotation resource makes the DNS shoot service behave as if annotations have been added to the referenced resource.
For the Ingress example shown above, you can alternatively create a DNSAnnotation resource to provide the annotations.
apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSAnnotation
metadata:
  annotations:
    dns.gardener.cloud/class: garden
  name: test-ingress-annotation
  namespace: default
spec:
  resourceRef:
    kind: Ingress
    apiVersion: networking.k8s.io/v1
    name: test-ingress
    namespace: default
  annotations:
    dns.gardener.cloud/dnsnames: '*'
    dns.gardener.cloud/class: garden
Note that the DNSAnnotation resource itself needs the dns.gardener.cloud/class=garden annotation. This also only works for annotations known to the DNS shoot service (see Accepted External DNS Records Annotations).
For more details, see also DNSAnnotation objects.
Accepted External DNS Records Annotations
Here are all of the accepted annotations related to the DNS extension:
Annotation | Description |
---|---|
dns.gardener.cloud/dnsnames | Mandatory for service and ingress resources, accepts a comma-separated list of DNS names if multiple names are required. For ingress you can use the special value '*'. In this case, the DNS names are collected from .spec.rules[].host. |
dns.gardener.cloud/class | Mandatory, in the context of the shoot-dns-service it must always be set to garden. |
dns.gardener.cloud/ttl | Recommended, overrides the default Time-To-Live of the DNS record. |
dns.gardener.cloud/cname-lookup-interval | Only relevant if multiple domain name targets are specified. It specifies the lookup interval for CNAMEs to map them to IP addresses (in seconds). |
dns.gardener.cloud/realms | Internal, for restricting provider access for shoot DNS entries. Typically not set by users of the shoot-dns-service. |
dns.gardener.cloud/ip-stack | Only relevant for provider type aws-route53 if the target is an AWS load balancer domain name. Can be set for service, ingress and DNSEntry resources. It specifies which DNS records with alias targets are created instead of the usual CNAME records. If the annotation is not set (or has the value ipv4), only an A record is created. With value dual-stack, both A and AAAA records are created. With value ipv6, only an AAAA record is created. |
service.beta.kubernetes.io/aws-load-balancer-ip-address-type=dualstack | For services, behaves similarly to dns.gardener.cloud/ip-stack=dual-stack. |
loadbalancer.openstack.org/load-balancer-address | Internal, for services only: support for PROXY protocol on OpenStack (which needs a hostname as ingress). Typically not set by users of the shoot-dns-service. |
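The effect of the dns.gardener.cloud/ip-stack annotation described in the table can be summarised as a small lookup. The following Python sketch only restates the table row; the function name alias_record_types is hypothetical, and rejecting unknown values with an error is an assumption made for illustration, not documented behaviour of the DNS controller.

```python
from typing import List, Optional

# Mapping taken from the table row for dns.gardener.cloud/ip-stack.
_RECORD_TYPES = {
    "ipv4": ["A"],
    "dual-stack": ["A", "AAAA"],
    "ipv6": ["AAAA"],
}


def alias_record_types(ip_stack: Optional[str]) -> List[str]:
    """Return the alias record types for an AWS load balancer target.

    `ip_stack` is the annotation value, or None if the annotation is not set.
    """
    if ip_stack is None:
        # An unset annotation behaves like the value "ipv4".
        return ["A"]
    if ip_stack not in _RECORD_TYPES:
        # Assumption for this sketch: unknown values are treated as an error.
        raise ValueError(f"unsupported ip-stack value: {ip_stack!r}")
    return list(_RECORD_TYPES[ip_stack])
```

For example, alias_record_types("dual-stack") yields both record types, while calling it with None yields only the A record.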
If one of the accepted DNS names is a direct subdomain of the shoot’s ingress domain, this is already handled by the standard wildcard entry for the ingress domain. Therefore, this name should be excluded from the dnsnames list in the annotation. If only this DNS name is configured in the ingress, no explicit DNS entry is required, and the DNS annotations should be omitted altogether.
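To make the two rules above concrete (how the dnsnames value is expanded, and which names are already covered by the wildcard entry of the ingress domain), here is a minimal Python sketch. It is not the controller's implementation: the function names are hypothetical, and the ingress domain used in the usage note is only an illustrative value.

```python
from typing import Iterable, List


def parse_dnsnames(annotation: str, rule_hosts: Iterable[str]) -> List[str]:
    """Expand the dnsnames annotation value of an Ingress into DNS names."""
    value = annotation.strip()
    if value == "*":
        # Special value: collect the names from .spec.rules[].host.
        return [host for host in rule_hosts if host]
    # Otherwise the value is a comma-separated list of DNS names.
    return [name.strip() for name in value.split(",") if name.strip()]


def is_direct_subdomain(name: str, ingress_domain: str) -> bool:
    """Return True if `name` is exactly one label below `ingress_domain`."""
    suffix = "." + ingress_domain.strip(".").lower()
    name = name.strip(".").lower()
    if not name.endswith(suffix):
        return False
    label = name[: -len(suffix)]
    return bool(label) and "." not in label


def names_needing_records(names: Iterable[str], ingress_domain: str) -> List[str]:
    """Drop names that the wildcard entry of the ingress domain already serves."""
    return [n for n in names if not is_direct_subdomain(n, ingress_domain)]
```

With a hypothetical ingress domain ingress.shoot.example.com, the name app.ingress.shoot.example.com would be filtered out, whereas special.example.com would remain and should be listed in the annotation.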
Troubleshooting
General DNS tools
To check the DNS resolution, use the nslookup or dig command.
$ nslookup special.example.com
or with dig
$ dig +short special.example.com
Depending on your network settings, you may get a successful response faster using a public DNS server (e.g. 8.8.8.8, 8.8.4.4, or 1.1.1.1):
$ dig @8.8.8.8 +short special.example.com
DNS record events
The DNS controller publishes Kubernetes events for the resource which requested the DNS record (Ingress, Service, DNSEntry). These events reveal more information about the DNS requests being processed and are especially useful to check any kind of misconfiguration, e.g. requests for a domain you don’t own.
Events for a successfully created DNS record:
$ kubectl describe service my-service
Events:
  Type    Reason          Age                From                    Message
  ----    ------          ----               ----                    -------
  Normal  dns-annotation  19s                dns-controller-manager  special.example.com: dns entry is pending
  Normal  dns-annotation  19s (x3 over 19s)  dns-controller-manager  special.example.com: dns entry pending: waiting for dns reconciliation
  Normal  dns-annotation  9s (x3 over 10s)   dns-controller-manager  special.example.com: dns entry active
Please note, events vanish after their retention period (usually 1h).
DNSEntry status
DNSEntry resources offer a .status sub-resource which can be used to check the current state of the object.
Status of an erroneous DNSEntry:
status:
  message: No responsible provider found
  observedGeneration: 3
  provider: remote
  state: Error
2.4 - DNS Providers
DNS Providers
Introduction
Gardener can manage DNS records on your behalf, so that you can request them via different resource types (see here) within the shoot cluster. The domains for which you are permitted to request records are, however, restricted and depend on the DNS provider configuration.
Shoot provider
By default, every shoot cluster is equipped with a default provider. It is the very same provider that manages the shoot cluster’s kube-apiserver public DNS record (DNS address in your Kubeconfig).
kind: Shoot
...
dns:
  domain: shoot.project.default-domain.gardener.cloud
You are permitted to request any sub-domain of .dns.domain that is not already taken (e.g. api.shoot.project.default-domain.gardener.cloud, *.ingress.shoot.project.default-domain.gardener.cloud) with this provider.
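The rule for the default provider can be read as a simple check: the requested name must lie below the shoot domain and must not collide with a name that is already taken. The Python sketch below illustrates that reading only. The helper names are hypothetical, and treating a wildcard entry as covering exactly one additional label is an assumption of this sketch, not a statement about how the provider validates requests.

```python
from typing import Iterable


def is_subdomain(name: str, domain: str) -> bool:
    """Return True if `name` lies strictly below `domain`."""
    name = name.strip(".").lower()
    domain = domain.strip(".").lower()
    return name.endswith("." + domain)


def is_taken(name: str, taken: Iterable[str]) -> bool:
    """Return True if `name` equals a taken name or matches a wildcard entry."""
    name = name.strip(".").lower()
    for entry in taken:
        entry = entry.strip(".").lower()
        if entry.startswith("*."):
            base = entry[2:]
            # Assumption: the wildcard covers names one label below its base.
            if name.endswith("." + base) and "." not in name[: -len(base) - 1]:
                return True
        elif name == entry:
            return True
    return False


def may_request(name: str, shoot_domain: str, taken: Iterable[str]) -> bool:
    """Combine both conditions for a name requested from the default provider."""
    return is_subdomain(name, shoot_domain) and not is_taken(name, taken)
```

Using the shoot domain from the example above, a name such as app.shoot.project.default-domain.gardener.cloud would pass this check, while the api name and names directly below the ingress wildcard would not.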
Additional providers
If you need to request DNS records for domains not managed by the default provider, additional providers can be configured in the shoot specification.
Alternatively, if the feature is enabled, they can be added as DNSProvider resources to the shoot cluster.
Additional providers in the shoot specification
To add providers in the shoot spec, set them in the spec.dns.providers list.
For example:
kind: Shoot
...
spec:
  dns:
    domain: shoot.project.default-domain.gardener.cloud
    providers:
      - secretName: my-aws-account
        type: aws-route53
      - secretName: my-gcp-account
        type: google-clouddns
Please consult the API-Reference to get a complete list of supported fields and configuration options.
Referenced secrets should exist in the project namespace in the Garden cluster and must comply with the provider specific credentials format. The External-DNS-Management project provides corresponding examples (20-secret-<provider-name>-credentials.yaml) for known providers.
Additional providers as resources in the shoot cluster
If it is not enabled globally, you have to enable the feature in the shoot manifest:
kind: Shoot
...
spec:
  extensions:
    - type: shoot-dns-service
      providerConfig:
        apiVersion: service.dns.extensions.gardener.cloud/v1alpha1
        kind: DNSConfig
        dnsProviderReplication:
          enabled: true
...
To add a provider directly in the shoot cluster, provide a DNSProvider in any namespace together with a Secret containing the credentials.
For example, if the domain is hosted with AWS Route 53 (provider type aws-route53):
apiVersion: dns.gardener.cloud/v1alpha1
kind: DNSProvider
metadata:
  annotations:
    dns.gardener.cloud/class: garden
  name: my-own-domain
  namespace: my-namespace
spec:
  type: aws-route53
  secretRef:
    name: my-own-domain-credentials
  domains:
    include:
      - my.own.domain.com
---
apiVersion: v1
kind: Secret
metadata:
  name: my-own-domain-credentials
  namespace: my-namespace
type: Opaque
data:
  # replace '...' with values encoded as base64
  AWS_ACCESS_KEY_ID: ...
  AWS_SECRET_ACCESS_KEY: ...
The External-DNS-Management project provides examples with more details for DNSProviders (30-provider-<provider-name>.yaml) and credential Secrets (20-secret-<provider-name>.yaml) at https://github.com/gardener/external-dns-management//examples for all supported provider types.
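The values in the data section of the Secret must be base64-encoded. As a small illustration, the following Python sketch produces such values. The function name encode_secret_value and the credential strings are hypothetical placeholders, not real keys.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a credential for the data section of a Secret."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Hypothetical placeholder credentials; substitute your own values.
print("AWS_ACCESS_KEY_ID:", encode_secret_value("my-access-key-id"))
print("AWS_SECRET_ACCESS_KEY:", encode_secret_value("my-secret-access-key"))
```

Encoding the exact string avoids a common pitfall of shell pipelines, where a trailing newline can end up inside the encoded value.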
2.5 - Gateway Api Gateways
Using annotated Gateway API Gateway and/or HTTPRoutes as Source
This tutorial describes how to use annotated Gateway API resources as source for DNSEntries with the Gardener shoot-dns-service extension.
The dns-controller-manager supports the resources Gateway and HTTPRoute.
Install Istio on your cluster
Using a new or existing shoot cluster, follow the Istio Kubernetes Gateway API guide to install the Gateway API and Istio.
These are the typical commands for the Istio installation with the Kubernetes Gateway API:
export KUBECONFIG=...
curl -L https://istio.io/downloadIstio | sh -
kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null || \
{ kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd?ref=v1.0.0" | kubectl apply -f -; }
istioctl install --set profile=minimal -y
kubectl label namespace default istio-injection=enabled
Verify that Gateway Source works
Install a sample service
With automatic sidecar injection:
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin.yaml
Using a Gateway as a source
Deploy the Gateway API configuration including a single exposed route (i.e., /get):
kubectl create namespace istio-ingress
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
  annotations:
    dns.gardener.cloud/dnsnames: "*.example.com"
    dns.gardener.cloud/class: garden
spec:
  gatewayClassName: istio
  listeners:
    - name: default
      hostname: "*.example.com" # this is used by dns-controller-manager to extract DNS names
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http
  namespace: default
spec:
  parentRefs:
    - name: gateway
      namespace: istio-ingress
  hostnames: ["httpbin.example.com"] # this is used by dns-controller-manager to extract DNS names too
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /get
      backendRefs:
        - name: httpbin
          port: 8000
EOF
You should now see events in the namespace of the gateway:
$ kubectl -n istio-ingress get events --sort-by={.metadata.creationTimestamp}
LAST SEEN   TYPE     REASON           OBJECT                  MESSAGE
...
38s         Normal   dns-annotation   service/gateway-istio   httpbin.example.com: created dns entry object shoot--foo--bar/gateway-istio-service-zpf8n
38s         Normal   dns-annotation   service/gateway-istio   httpbin.example.com: dns entry pending: waiting for dns reconciliation
38s         Normal   dns-annotation   service/gateway-istio   httpbin.example.com: dns entry is pending
36s         Normal   dns-annotation   service/gateway-istio   httpbin.example.com: dns entry active
Using an HTTPRoute as a source
If the Gateway resource is annotated with dns.gardener.cloud/dnsnames: "*", hostnames from all referencing HTTPRoute resources are automatically extracted. These resources don’t need an additional annotation.
Deploy the Gateway API configuration including a single exposed route (i.e., /get):
kubectl create namespace istio-ingress
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
  annotations:
    dns.gardener.cloud/dnsnames: "*"
    dns.gardener.cloud/class: garden
spec:
  gatewayClassName: istio
  listeners:
    - name: default
      hostname: null # not set
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http
  namespace: default
spec:
  parentRefs:
    - name: gateway
      namespace: istio-ingress
  hostnames: ["httpbin.example.com"] # this is used by dns-controller-manager to extract DNS names too
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /get
      backendRefs:
        - name: httpbin
          port: 8000
EOF
This should show similar events as above.
Access the sample service using curl
$ curl -I http://httpbin.example.com/get
HTTP/1.1 200 OK
server: istio-envoy
date: Tue, 13 Feb 2024 08:09:41 GMT
content-type: application/json
content-length: 701
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 19
Accessing any other URL that has not been explicitly exposed should return an HTTP 404 error:
$ curl -I http://httpbin.example.com/headers
HTTP/1.1 404 Not Found
date: Tue, 13 Feb 2024 08:09:41 GMT
server: istio-envoy
transfer-encoding: chunked
2.6 - Istio Gateways
Using annotated Istio Gateway and/or Istio Virtual Service as Source
This tutorial describes how to use annotated Istio Gateway resources as source for DNSEntries with the Gardener shoot-dns-service extension.
Install Istio on your cluster
Using a new or existing shoot cluster, follow the Istio Getting Started guide to download and install Istio.
These are the typical commands for the Istio demo installation:
export KUBECONFIG=...
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=demo -y
kubectl label namespace default istio-injection=enabled
Verify that Istio Gateway/VirtualService Source works
Install a sample service
With automatic sidecar injection:
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/httpbin/httpbin.yaml
Using a Gateway as a source
Create an Istio Gateway:
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
  namespace: istio-system
  annotations:
    dns.gardener.cloud/dnsnames: "*"
    dns.gardener.cloud/class: garden
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "httpbin.example.com" # this is used by the dns-controller-manager to extract DNS names
EOF
Configure routes for traffic entering via the Gateway:
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: default
spec:
  hosts:
    - "httpbin.example.com" # this is also used by the dns-controller-manager to extract DNS names
  gateways:
    - istio-system/httpbin-gateway
  http:
    - match:
        - uri:
            prefix: /status
        - uri:
            prefix: /delay
      route:
        - destination:
            port:
              number: 8000
            host: httpbin
EOF
You should now see events in the namespace of the gateway:
$ kubectl -n istio-system get events --sort-by={.metadata.creationTimestamp}
LAST SEEN   TYPE     REASON           OBJECT                    MESSAGE
...
38s         Normal   dns-annotation   gateway/httpbin-gateway   httpbin.example.com: created dns entry object shoot--foo--bar/httpbin-gateway-gateway-zpf8n
38s         Normal   dns-annotation   gateway/httpbin-gateway   httpbin.example.com: dns entry pending: waiting for dns reconciliation
38s         Normal   dns-annotation   gateway/httpbin-gateway   httpbin.example.com: dns entry is pending
36s         Normal   dns-annotation   gateway/httpbin-gateway   httpbin.example.com: dns entry active
Using a VirtualService as a source
If the Gateway resource is annotated with dns.gardener.cloud/dnsnames: "*", hosts from all referencing VirtualService resources are automatically extracted. These resources don’t need an additional annotation.
Create an Istio Gateway:
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
  namespace: istio-system
  annotations:
    dns.gardener.cloud/dnsnames: "*"
    dns.gardener.cloud/class: garden
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
EOF
Configure routes for traffic entering via the Gateway:
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: default
spec:
  hosts:
    - "httpbin.example.com" # this is used by dns-controller-manager to extract DNS names
  gateways:
    - istio-system/httpbin-gateway
  http:
    - match:
        - uri:
            prefix: /status
        - uri:
            prefix: /delay
      route:
        - destination:
            port:
              number: 8000
            host: httpbin
EOF
This should show similar events as above.
To get the targets for the extracted DNS names, the shoot-dns-service controller is able to gather information from the Kubernetes service of the Istio Ingress Gateway.
Note: It is also possible to set the targets by specifying an Ingress resource using the dns.gardener.cloud/ingress annotation on the Istio Ingress Gateway resource.
Note: It is also possible to set the targets manually by using the dns.gardener.cloud/targets annotation on the Istio Ingress Gateway resource.
Access the sample service using curl
$ curl -I http://httpbin.example.com/status/200
HTTP/1.1 200 OK
server: istio-envoy
date: Tue, 13 Feb 2024 07:49:37 GMT
content-type: text/html; charset=utf-8
access-control-allow-origin: *
access-control-allow-credentials: true
content-length: 0
x-envoy-upstream-service-time: 15
Accessing any other URL that has not been explicitly exposed should return an HTTP 404 error:
$ curl -I http://httpbin.example.com/headers
HTTP/1.1 404 Not Found
date: Tue, 13 Feb 2024 08:09:41 GMT
server: istio-envoy
transfer-encoding: chunked
3 - Egress filtering
Gardener Extension for Networking Filter
Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.
Recently, most of the vendor specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.
This controller implements Gardener’s extension contract for the shoot-networking-filter extension.
An example for a ControllerRegistration resource that can be used to register this controller to Gardener can be found here.
Please find more information regarding the extensibility concepts and a detailed proposal here.
Extension Resources
Example extension resource:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: extension-shoot-networking-filter
  namespace: shoot--project--abc
spec:
  providerConfig:
    egressFilter:
      blackholingEnabled: false
      staticFilterList:
        - network: 1.2.3.4/31
          policy: BLOCK_ACCESS
      workers:
        blackholingEnabled: true
        names:
          - external-api
When an extension resource is reconciled and the optional workers field is not used, the extension controller will create a daemonset egress-filter-applier on the shoot containing a Dockerfile container.
If the optional workers field is used, the extension controller will create one daemonset egress-filter-applier-<worker name> per worker group on the shoot.
See the usage documentation for more details on how to configure the extension on a shoot cluster.
Please note, this extension controller relies on the Gardener-Resource-Manager to deploy k8s resources to seed and shoot clusters.
How to start using or developing this extension controller locally
You can run the controller locally on your machine by executing make start.
We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.
Feedback and Support
Feedback and contributions are always welcome. Please report bugs or suggestions as GitHub issues or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).
Learn more!
Please find further resources about our project here:
3.1 - Deployment
Gardener Networking Policy Filter for Shoots
Introduction
Gardener allows shoot clusters to filter egress traffic on node level. To support this, Gardener must be installed with the shoot-networking-filter extension.
Configuration
To generally enable the networking filter for shoot objects, the shoot-networking-filter extension must be registered by providing an appropriate extension registration in the garden cluster.
Here it is possible to decide whether the extension should be always available for all shoots or whether the extension must be separately enabled per shoot.
If the extension should be used for all shoots, the globallyEnabled flag should be set to true.
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
...
spec:
  resources:
    - kind: Extension
      type: shoot-networking-filter
      globallyEnabled: true
ControllerRegistration
An example of a ControllerRegistration for the shoot-networking-filter can be found at controller-registration.yaml.
The ControllerRegistration contains a Helm chart which eventually deploys the shoot-networking-filter to seed clusters. It offers some configuration options, mainly to set up a static filter list or provide the configuration for downloading the filter list from a service endpoint.
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerDeployment
...
  values:
    egressFilter:
      blackholingEnabled: true
      filterListProviderType: static
      staticFilterList:
        - network: 1.2.3.4/31
          policy: BLOCK_ACCESS
        - network: 5.6.7.8/32
          policy: BLOCK_ACCESS
        - network: ::2/128
          policy: BLOCK_ACCESS
      #filterListProviderType: download
      #downloaderConfig:
      #  endpoint: https://my.filter.list.server/lists/policy
      #  oauth2Endpoint: https://my.auth.server/oauth2/token
      #  refreshPeriod: 1h
      ## if the downloader needs an OAuth2 access token, client credentials can be provided with oauth2Secret
      #oauth2Secret:
      #  clientID: 1-2-3-4
      #  clientSecret: secret!!
      ## either clientSecret or client certificate is required
      #  client.crt.pem: |
      #    -----BEGIN CERTIFICATE-----
      #    ...
      #    -----END CERTIFICATE-----
      #  client.key.pem: |
      #    -----BEGIN PRIVATE KEY-----
      #    ...
      #    -----END PRIVATE KEY-----
Enablement for a Shoot
If the shoot networking filter is not globally enabled by default (depends on the extension registration on the garden cluster), it can be enabled per shoot. To enable the service for a shoot, the shoot manifest must explicitly add the shoot-networking-filter extension.
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
...
If the shoot networking filter is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      disabled: true
...
3.2 - Shoot Networking Filter
Register Shoot Networking Filter Extension in Shoot Clusters
Introduction
Within a shoot cluster, it is possible to enable the networking filter. It is necessary that the Gardener installation your shoot cluster runs in is equipped with a shoot-networking-filter extension. Please ask your Gardener operator if the extension is available in your environment.
Shoot Feature Gate
In most Gardener setups, the shoot-networking-filter extension is not enabled globally and thus must be configured per shoot cluster. To activate the extension for an individual shoot, add the configuration shown below to the shoot specification.
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
...
Opt-out
If the shoot networking filter is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      disabled: true
...
Ingress Filtering
By default, the networking filter only filters egress traffic. However, if you enable blackholing, incoming traffic will also be blocked. You can enable blackholing on a per-shoot basis.
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      providerConfig:
        egressFilter:
          blackholingEnabled: true
...
Ingress traffic can only be blocked by blackhole routing if the source IP address is preserved. On Azure, GCP and AliCloud this works by default.
The default on AWS is a classic load balancer that replaces the source IP with its own IP address. Here, a network load balancer has to be configured by adding the annotation service.beta.kubernetes.io/aws-load-balancer-type: "nlb" to the service.
On OpenStack, load balancers don’t preserve the source address.
When you disable blackholing in an existing shoot, the associated blackhole routes will be removed automatically.
Conversely, when you re-enable blackholing, the iptables-based filter rules will be removed and replaced by blackhole routes.
Ingress Filtering per Worker Group
You can optionally enable or disable ingress filtering for specified worker groups.
For example, you may want to disable blackholing in general but enable it for a worker group hosting an external API.
You can do so by using an optional workers field:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      providerConfig:
        egressFilter:
          blackholingEnabled: false
          workers:
            blackholingEnabled: true
            names:
              - external-api
...
Please note that only blackholing can be changed per worker group. You may not define different IPs to block or disable blocking altogether.
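The per-worker-group override can be pictured as a lookup: worker groups listed under workers.names take the value of workers.blackholingEnabled, all others take the general blackholingEnabled value. The Python sketch below illustrates this reading of the configuration. The function name is hypothetical, and treating a missing general flag as false is an assumption of this sketch rather than documented behaviour.

```python
from typing import Any, Dict, Iterable


def effective_blackholing(
    egress_filter: Dict[str, Any], worker_groups: Iterable[str]
) -> Dict[str, bool]:
    """Return, per worker group, whether blackholing would be enabled."""
    # Assumption: a missing general flag is treated as False.
    default = bool(egress_filter.get("blackholingEnabled", False))
    workers = egress_filter.get("workers") or {}
    override = bool(workers.get("blackholingEnabled", default))
    listed = set(workers.get("names") or [])
    return {
        group: (override if group in listed else default)
        for group in worker_groups
    }


# Mirrors the egressFilter section of the example above.
config = {
    "blackholingEnabled": False,
    "workers": {"blackholingEnabled": True, "names": ["external-api"]},
}
print(effective_blackholing(config, ["default", "external-api"]))
```

Here the worker group names default and external-api are illustrative; only external-api is listed under workers.names, so only that group ends up with blackholing enabled.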
Custom IP
It is possible to add custom IP addresses to the network filter. This can be useful for testing purposes.
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
    - type: shoot-networking-filter
      providerConfig:
        egressFilter:
          staticFilterList:
            - network: 1.2.3.4/31
              policy: BLOCK_ACCESS
            - network: 5.6.7.8/32
              policy: BLOCK_ACCESS
            - network: ::2/128
              policy: BLOCK_ACCESS
...
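Each staticFilterList entry is a CIDR range, so one entry can cover several addresses. When testing a custom list, it can help to check beforehand which addresses the entries would match. The following Python sketch does this with the standard ipaddress module. It is only a local helper for reasoning about the list (the name is_blocked is hypothetical) and says nothing about how the filter is enforced on the nodes.

```python
import ipaddress
from typing import Dict, Iterable


def is_blocked(ip: str, static_filter_list: Iterable[Dict[str, str]]) -> bool:
    """Return True if `ip` falls into a BLOCK_ACCESS network of the list."""
    address = ipaddress.ip_address(ip)
    for entry in static_filter_list:
        if entry.get("policy") != "BLOCK_ACCESS":
            continue
        network = ipaddress.ip_network(entry["network"])
        # IPv4 addresses never match IPv6 networks and vice versa.
        if address.version == network.version and address in network:
            return True
    return False


# The entries from the example above.
filter_list = [
    {"network": "1.2.3.4/31", "policy": "BLOCK_ACCESS"},
    {"network": "5.6.7.8/32", "policy": "BLOCK_ACCESS"},
    {"network": "::2/128", "policy": "BLOCK_ACCESS"},
]
print(is_blocked("1.2.3.5", filter_list))
```

Note that 1.2.3.4/31 covers both 1.2.3.4 and 1.2.3.5, whereas the /32 and /128 entries each cover a single address.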
4 - Lakom service
Gardener Extension for lakom services
Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.
Recently, most of the vendor specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.
This controller implements Gardener’s extension contract for the shoot-lakom-service extension.
An example for a ControllerRegistration resource that can be used to register this controller to Gardener can be found here.
Please find more information regarding the extensibility concepts and a detailed proposal here.
Lakom Admission Controller
Lakom is a Kubernetes admission controller whose purpose is to implement cosign image signature verification against a public cosign key. It also resolves image tags to sha256 digests and caches all OCI artifacts to reduce the load on the OCI registry.
Extension Resources
Example extension resource:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: extension-shoot-lakom-service
  namespace: shoot--project--abc
spec:
  type: shoot-lakom-service
When an extension resource is reconciled, the extension controller will create an instance of the lakom admission controller. These resources are placed inside the shoot namespace on the seed. The controller also takes care of generating the necessary RBAC resources for the seed as well as for the shoot.
Please note, this extension controller relies on the Gardener-Resource-Manager to deploy k8s resources to seed and shoot clusters.
How to start using or developing this extension controller locally
The Lakom admission controller can be configured with make dev-setup and started with make start-lakom.
You can run the lakom extension controller locally on your machine by executing make start.
If you’d like to develop Lakom using a local cluster such as KinD, make sure your KUBECONFIG environment variable is targeting the local Garden cluster.
Add 127.0.0.1 garden.local.gardener.cloud to your /etc/hosts. You can then run:
make extension-up
This will trigger a skaffold deployment that builds the images, pushes them to the registry and installs the helm charts from /charts.
We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.
Feedback and Support
Feedback and contributions are always welcome. Please report bugs or suggestions as GitHub issues or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).
Learn more
Please find further resources about our project here:
4.1 - Deployment
Gardener Lakom Service for Shoots
Introduction
Gardener allows Shoot clusters to use the Lakom admission controller for cosign image signing verification. To support this, Gardener must be installed with the shoot-lakom-service extension.
Configuration
To generally enable the Lakom service for shoot objects, the shoot-lakom-service extension must be registered by providing an appropriate extension registration in the garden cluster.
Here it is possible to decide whether the extension should be always available for all shoots or whether the extension must be separately enabled per shoot.
If the extension should be used for all shoots, the globallyEnabled flag should be set to true.
spec:
  resources:
    - kind: Extension
      type: shoot-lakom-service
      globallyEnabled: true
Shoot Feature Gate
If the shoot Lakom service is not globally enabled by default (depends on the extension registration on the garden cluster), it can be enabled per shoot. To enable the service for a shoot, the shoot manifest must explicitly add the shoot-lakom-service extension.
...
spec:
  extensions:
    - type: shoot-lakom-service
...
If the shoot Lakom service is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.
...
spec:
  extensions:
    - type: shoot-lakom-service
      disabled: true
...
4.2 - Lakom
Introduction
Lakom is a Kubernetes admission controller whose purpose is to implement cosign image signature verification with a public cosign key. It also resolves image tags to sha256 digests. A built-in cache mechanism can be enabled to reduce the load on the OCI registry.
Flags
The Lakom admission controller is configurable via command line flags. The trusted cosign public keys and the algorithms associated with them are set via a configuration file provided with the flag --lakom-config-path.
Flag Name | Description | Default Value |
---|---|---|
--bind-address | Address to bind to | “0.0.0.0” |
--cache-refresh-interval | Refresh interval for the cached objects | 30s |
--cache-ttl | TTL for the cached objects. Set to 0, if cache has to be disabled | 10m0s |
--contention-profiling | Enable lock contention profiling, if profiling is enabled | false |
--health-bind-address | Bind address for the health server | “:8081” |
-h, --help | help for lakom | |
--insecure-allow-insecure-registries | If set, communication via HTTP with registries will be allowed. | false |
--insecure-allow-untrusted-images | If set, the webhook will just return warning for the images without trusted signatures. | false |
--kubeconfig | Paths to a kubeconfig. Only required if out-of-cluster. | |
--lakom-config-path | Path to file with lakom configuration containing cosign public keys used to verify the image signatures | |
--metrics-bind-address | Bind address for the metrics server | “:8080” |
--port | Webhook server port | 9443 |
--profiling | Enable profiling via web interface host:port/debug/pprof/ | false |
--tls-cert-dir | Directory with server TLS certificate and key (must contain a tls.crt and tls.key file) | |
--use-only-image-pull-secrets | If set, only the credentials from the image pull secrets of the pod are used to access the OCI registry. Otherwise, the node identity and docker config are also used. | false |
--version | prints version information and quits; --version=vX.Y.Z… sets the reported version | |
Lakom Cosign Public Keys Configuration File
The Lakom cosign public keys configuration file should be YAML or JSON formatted. It can set multiple trusted keys, and each key must be given a name. The supported types of public keys are RSA, ECDSA and Ed25519. RSA keys can additionally be configured with a signature verification algorithm specifying the scheme and hash function used during signature verification. As of now, ECDSA and Ed25519 keys cannot be configured with a specific algorithm.
publicKeys:
- name: example-public-key
  algorithm: RSASSA-PSS-SHA256
  key: |-
    -----BEGIN PUBLIC KEY-----
    MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBAPeQXbIWMMXYV+9+j9b4jXTflnpfwn4E
    GMrmqYVhm0sclXb2FPP5aV/NFH6SZdHDZcT8LCNsNgxzxV4N+UE/JIsCAwEAAQ==
    -----END PUBLIC KEY-----
Supported RSA Signature Verification Algorithms
- RSASSA-PKCS1-v1_5-SHA256: uses the RSASSA-PKCS1-v1_5 scheme with the SHA256 hash function
- RSASSA-PKCS1-v1_5-SHA384: uses the RSASSA-PKCS1-v1_5 scheme with the SHA384 hash function
- RSASSA-PKCS1-v1_5-SHA512: uses the RSASSA-PKCS1-v1_5 scheme with the SHA512 hash function
- RSASSA-PSS-SHA256: uses the RSASSA-PSS scheme with the SHA256 hash function
- RSASSA-PSS-SHA384: uses the RSASSA-PSS scheme with the SHA384 hash function
- RSASSA-PSS-SHA512: uses the RSASSA-PSS scheme with the SHA512 hash function
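Each supported algorithm name is a signature scheme followed by a hash function, joined with a hyphen. The following sketch shows how such a name could be split and validated before it is placed in the configuration file. The helper split_algorithm is hypothetical and is not part of Lakom.

```python
# The six algorithm names listed above, built from their two components.
SUPPORTED_RSA_ALGORITHMS = {
    f"{scheme}-{hash_name}"
    for scheme in ("RSASSA-PKCS1-v1_5", "RSASSA-PSS")
    for hash_name in ("SHA256", "SHA384", "SHA512")
}


def split_algorithm(name: str) -> tuple:
    """Split a supported algorithm name into (scheme, hash function)."""
    if name not in SUPPORTED_RSA_ALGORITHMS:
        raise ValueError(f"unsupported RSA signature verification algorithm: {name}")
    # The hash function is always the part after the last hyphen.
    scheme, _, hash_name = name.rpartition("-")
    return scheme, hash_name


print(split_algorithm("RSASSA-PSS-SHA256"))
print(split_algorithm("RSASSA-PKCS1-v1_5-SHA512"))
```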
4.3 - Shoot Extension
Introduction
This extension implements cosign image verification. It is strictly limited to the Kubernetes system components deployed by Gardener and other Gardener extensions in the kube-system namespace of a shoot cluster.
Shoot Feature Gate
In most Gardener setups, the shoot-lakom-service extension is enabled globally and thus can be configured per shoot cluster. Please adapt the shoot specification as shown below to disable the extension individually.
kind: Shoot
...
spec:
  extensions:
  - type: shoot-lakom-service
    disabled: true
    providerConfig:
      apiVersion: lakom.extensions.gardener.cloud/v1alpha1
      kind: LakomConfig
      scope: KubeSystem
...
The scope field instructs Lakom which pods to validate. The possible values are:
- KubeSystem: Lakom will validate all pods in the kube-system namespace.
- KubeSystemManagedByGardener: Lakom will validate all pods in the kube-system namespace that are annotated with “managed-by/gardener”.
- Cluster: Lakom will validate all pods in all namespaces.
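The three scope values can be read as a simple predicate over a pod's namespace and its "managed by Gardener" marker. The sketch below is an illustration only: the function name lakom_validates is hypothetical, and the marker is passed in as a boolean because the exact annotation key and value are not spelled out here.

```python
def lakom_validates(scope: str, namespace: str, managed_by_gardener: bool) -> bool:
    """Decide whether a pod falls under the given scope (illustrative only)."""
    if scope == "Cluster":
        # All pods in all namespaces.
        return True
    if scope == "KubeSystem":
        # All pods in the kube-system namespace.
        return namespace == "kube-system"
    if scope == "KubeSystemManagedByGardener":
        # Only kube-system pods that carry the managed-by marker.
        return namespace == "kube-system" and managed_by_gardener
    raise ValueError(f"unknown scope: {scope}")


print(lakom_validates("KubeSystem", "default", False))
print(lakom_validates("KubeSystemManagedByGardener", "kube-system", True))
```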
5 - Networking problemdetector
Gardener Extension for Network Problem Detector
Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.
Recently, most of the vendor specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.
This controller implements Gardener’s extension contract for the shoot-networking-problemdetector extension.
An example for a ControllerRegistration resource that can be used to register this controller to Gardener can be found here.
Please find more information regarding the extensibility concepts and a detailed proposal here.
Extension Resources
Currently there is nothing to specify in the extension spec.
Example extension resource:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: extension-shoot-networking-problemdetector
  namespace: shoot--project--abc
spec:
When an extension resource is reconciled, the extension controller will create two daemonsets, nwpd-agent-pod-net and nwpd-agent-node-net, deploying the “network problem detector agent”.
These daemonsets perform and collect various checks between all nodes of the Kubernetes cluster, to its Kube API server, and/or to external endpoints.
Checks are performed using TCP connections, PING (ICMP) or mDNS (UDP).
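To give an idea of what a TCP-based check amounts to, here is a minimal sketch of a connectivity probe. It is not the agent's actual implementation; the function name tcp_check and the timeout value are assumptions chosen for this example.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the TCP handshake and raises OSError on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Probe a local listener that is opened only for this demonstration.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
print(tcp_check("127.0.0.1", listener.getsockname()[1]))
listener.close()
```

A real agent would run such probes periodically against many peers and record the results; the sketch only shows a single probe.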
More details about the network problem detector agent can be found in its repository gardener/network-problem-detector.
Please note, this extension controller relies on the Gardener-Resource-Manager to deploy k8s resources to seed and shoot clusters.
How to start using or developing this extension controller locally
You can run the controller locally on your machine by executing make start.
We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.
Feedback and Support
Feedback and contributions are always welcome. Please report bugs or suggestions as GitHub issues or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).
Learn more!
Please find further resources about our project here:
5.1 - Deployment
Gardener Network Problem Detector for Shoots
Introduction
Gardener allows shoot clusters to add network problem observability using the network problem detector.
To support this, Gardener must be installed with the shoot-networking-problemdetector extension.
Configuration
To enable the networking problem detector for shoot objects in general, the shoot-networking-problemdetector extension must be registered by providing an appropriate extension registration in the garden cluster.
Here it is possible to decide whether the extension should always be available for all shoots or whether it must be enabled separately per shoot.
If the extension should be used for all shoots, the globallyEnabled flag should be set to true.
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerRegistration
...
spec:
  resources:
  - kind: Extension
    type: shoot-networking-problemdetector
    globallyEnabled: true
ControllerRegistration
An example of a ControllerRegistration for the shoot-networking-problemdetector can be found at controller-registration.yaml.
The ControllerRegistration contains a Helm chart which eventually deploys the shoot-networking-problemdetector to seed clusters. It offers some configuration options, mainly to set up a static filter list or provide the configuration for downloading the filter list from a service endpoint.
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerDeployment
...
values:
  #networkProblemDetector:
  #  defaultPeriod: 30s
Enablement for a Shoot
If the shoot network problem detector is not globally enabled by default (depends on the extension registration on the garden cluster), it can be enabled per shoot. To enable the service for a shoot, the shoot manifest must explicitly add the shoot-networking-problemdetector extension.
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
  - type: shoot-networking-problemdetector
...
If the shoot network problem detector is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
  - type: shoot-networking-problemdetector
    disabled: true
...
5.2 - Shoot Networking Problemdetector
Register Shoot Networking Problem Detector Extension in Shoot Clusters
Introduction
Within a shoot cluster, it is possible to enable the network problem detector. It is necessary that the Gardener installation your shoot cluster runs in is equipped with a shoot-networking-problemdetector extension. Please ask your Gardener operator if the extension is available in your environment.
Shoot Feature Gate
In most Gardener setups, the shoot-networking-problemdetector extension is not enabled globally and thus must be configured per shoot cluster. Please adapt the shoot specification as shown below to activate the extension individually.
kind: Shoot
...
spec:
  extensions:
  - type: shoot-networking-problemdetector
...
Opt-out
If the shoot network problem detector is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  extensions:
  - type: shoot-networking-problemdetector
    disabled: true
...
6 - Node Audit Logging
Gardener Extension to configure rsyslog with relp module
Gardener extension controller which configures the rsyslog and auditd services installed on shoot nodes.
Usage
- Configuring the Rsyslog Relp Extension - learn what is the use-case for rsyslog-relp, how to enable it and configure it
Local Setup and Development
- Deploying the Rsyslog Relp Extension Locally - learn how to set up a local development environment
- Developer Docs for Gardener Shoot Rsyslog Relp Extension - learn about the inner workings
6.1 - Configuration
Configuring the Rsyslog Relp Extension
Introduction
As a cluster owner, you might need audit logs on a Shoot node level. With these audit logs you can track actions on your nodes like privilege escalation, file integrity, process executions, and which user performed these actions. Such information is essential for the security of your Shoot cluster. Linux operating systems collect such logs via the auditd and journald daemons. However, these logs can be lost if they are only kept locally on the operating system. You need a reliable way to send them to a remote server where they can be stored for longer time periods and retrieved when necessary.
Rsyslog offers a solution for that. It gathers and processes logs from auditd and journald and then forwards them to a remote server. Moreover, rsyslog can make use of the RELP protocol so that logs are sent reliably and no messages are lost.
The shoot-rsyslog-relp extension is used to configure rsyslog on each Shoot node so that the following can take place:
- Rsyslog reads logs from the auditd and journald sockets.
- The logs are filtered based on the program name and syslog severity of the message.
- The logs are enriched with metadata containing the name of the Project in which the Shoot is created, the name of the Shoot, the UID of the Shoot, and the hostname of the node on which the log event occurred.
- The enriched logs are sent to the target remote server via the RELP protocol.
The following graph shows a rough outline of how that looks in a Shoot cluster:
Shoot Configuration
The extension is not globally enabled and must be configured per Shoot cluster. The Shoot specification has to be adapted to include the shoot-rsyslog-relp extension configuration, which specifies the target server to which logs are forwarded, its port, and some optional rsyslog settings described in the examples below.
Below is an example shoot-rsyslog-relp extension configuration as part of the Shoot spec:
kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
...
spec:
  extensions:
  - type: shoot-rsyslog-relp
    providerConfig:
      apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
      kind: RsyslogRelpConfig
      # Set the target server to which logs are sent. The server must support the RELP protocol.
      target: some.rsyslog-relp.server
      # Set the port of the target server.
      port: 10250
      # Define rules to select logs from which programs and with what syslog severity
      # are forwarded to the target server.
      loggingRules:
      - severity: 4
        programNames: ["kubelet", "audisp-syslog"]
      - severity: 1
        programNames: ["audisp-syslog"]
      # Define an interval of 90 seconds at which the current connection is broken and re-established.
      # By default this value is 0 which means that the connection is never broken and re-established.
      rebindInterval: 90
      # Set the timeout for relp sessions to 90 seconds. If set too low, valid sessions may be considered
      # dead and tried to recover.
      timeout: 90
      # Set how often an action is retried before it is considered to have failed.
      # Failed actions discard log messages. Setting `-1` here means that messages are never discarded.
      resumeRetryCount: -1
      # Configures rsyslog to report continuation of action suspension, e.g. when the connection to the target
      # server is broken.
      reportSuspensionContinuation: true
      # Add tls settings if tls should be used to encrypt the connection to the target server.
      tls:
        enabled: true
        # Use `name` authentication mode for the tls connection.
        authMode: name
        # Only allow connections if the server's name is `some.rsyslog-relp.server`
        permittedPeer:
        - "some.rsyslog-relp.server"
        # Reference to the resource which contains certificates used for the tls connection.
        # It must be added to the `.spec.resources` field of the Shoot.
        secretReferenceName: rsyslog-relp-tls
        # Instruct librelp on the Shoot nodes to use the gnutls tls library.
        tlsLib: gnutls
      # Add auditConfig settings if you want to customize node level auditing.
      auditConfig:
        enabled: true
        # Reference to the resource which contains the audit configuration.
        # It must be added to the `.spec.resources` field of the Shoot.
        configMapReferenceName: audit-config
  resources:
  # Add the rsyslog-relp-tls secret in the resources field of the Shoot spec.
  - name: rsyslog-relp-tls
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: rsyslog-relp-tls-v1
  - name: audit-config
    resourceRef:
      apiVersion: v1
      kind: ConfigMap
      name: audit-config-v1
...
Choosing Which Log Messages to Send to the Target Server
The .loggingRules field defines rules about which logs should be sent to the target server. When a log is processed by rsyslog, it is compared against the list of rules in order. If the program name and the syslog severity of the log message match a rule, the message is forwarded to the target server. The following table describes the syslog severities and their corresponding codes:
Numerical Code | Severity |
---|---|
0 | Emergency: system is unusable |
1 | Alert: action must be taken immediately |
2 | Critical: critical conditions |
3 | Error: error conditions |
4 | Warning: warning conditions |
5 | Notice: normal but significant condition |
6 | Informational: informational messages |
7 | Debug: debug-level messages |
Below is an example with a .loggingRules section that will only forward logs from the kubelet program with syslog severity of 6 or lower and any other program with syslog severity of 2 or lower:
apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
kind: RsyslogRelpConfig
target: localhost
port: 1520
loggingRules:
- severity: 6
  programNames: ["kubelet"]
- severity: 2
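The matching behaviour described above can be sketched in a few lines. This is only an illustration of the documented semantics, not the rsyslog configuration the extension generates. The function name should_forward is hypothetical, and treating a rule without programNames as matching any program is an assumption derived from the example's wording.

```python
def should_forward(program: str, severity: int, rules: list) -> bool:
    """Return True if a message matches at least one logging rule."""
    for rule in rules:
        names = rule.get("programNames")
        # Assumption: a rule without programNames applies to every program.
        program_matches = names is None or program in names
        # Lower numbers are more severe, so "severity N or lower" means <= N.
        if program_matches and severity <= rule["severity"]:
            return True
    return False


# The rules from the example configuration above.
rules = [
    {"severity": 6, "programNames": ["kubelet"]},
    {"severity": 2},
]

print(should_forward("kubelet", 6, rules))  # informational kubelet message
print(should_forward("sshd", 3, rules))     # error message from another program
```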
You can use a minimal shoot-rsyslog-relp extension configuration to forward all logs to the target server:
apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
kind: RsyslogRelpConfig
target: some.rsyslog-relp.server
port: 10250
loggingRules:
- severity: 7
Securing the Communication to the Target Server with TLS
The communication to the target server is not encrypted by default. To enable encryption, set the .tls.enabled field in the shoot-rsyslog-relp extension configuration to true. In this case, an immutable secret which contains the TLS certificates used to establish the TLS connection to the server must be created in the same project namespace as your Shoot.
An example Secret is given below:
Note: The secret must be immutable
kind: Secret
apiVersion: v1
metadata:
  name: rsyslog-relp-tls-v1
  namespace: garden-foo
immutable: true
data:
  ca: |
    -----BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----
  crt: |
    -----BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----
  key: |
    -----BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----
The Secret must be referenced in the Shoot’s .spec.resources field and the corresponding resource entry must be referenced in the .tls.secretReferenceName of the shoot-rsyslog-relp extension configuration:
kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
...
spec:
  extensions:
  - type: shoot-rsyslog-relp
    providerConfig:
      apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
      kind: RsyslogRelpConfig
      target: some.rsyslog-relp.server
      port: 10250
      loggingRules:
      - severity: 7
      tls:
        enabled: true
        secretReferenceName: rsyslog-relp-tls
  resources:
  - name: rsyslog-relp-tls
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: rsyslog-relp-tls-v1
...
You can set a few additional parameters for the TLS connection: .tls.authMode, .tls.permittedPeer, and .tls.tlsLib. Refer to the rsyslog documentation for more information on these parameters.
Configuring the Audit Daemon on the Shoot Nodes
The shoot-rsyslog-relp extension also allows you to configure the Audit Daemon (auditd) on the Shoot nodes.
By default, the audit rules located under the /etc/audit/rules.d directory on your Shoot’s nodes will be moved to /etc/audit/rules.d.original and the following rules will be placed under the /etc/audit/rules.d directory: 00-base-config.rules, 10-privilege-escalation.rules, 11-privilege-special.rules, 12-system-integrity.rules. Next, augenrules --load will be called and the audit daemon (auditd) restarted so that the new rules can take effect.
Alternatively, you can define your own auditd rules to be placed on your Shoot’s nodes by using the following configuration:
apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
kind: Auditd
auditRules: |
  ## First rule - delete all existing rules
  -D
  ## Now define some custom rules
  -a exit,always -F arch=b64 -S setuid -S setreuid -S setgid -S setregid -F auid>0 -F auid!=-1 -F key=privilege_escalation
  -a exit,always -F arch=b64 -S execve -S execveat -F euid=0 -F auid>0 -F auid!=-1 -F key=privilege_escalation
In this case the original rules are also backed up in the /etc/audit/rules.d.original directory.
To deploy this configuration, it must be embedded in an immutable ConfigMap.
Note: The data key storing this configuration must be named auditd.
An example ConfigMap is given below:
apiVersion: v1
kind: ConfigMap
metadata:
  name: audit-config-v1
  namespace: garden-foo
immutable: true
data:
  auditd: |
    apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
    kind: Auditd
    auditRules: |
      ## First rule - delete all existing rules
      -D
      ## Now define some custom rules
      -a exit,always -F arch=b64 -S setuid -S setreuid -S setgid -S setregid -F auid>0 -F auid!=-1 -F key=privilege_escalation
      -a exit,always -F arch=b64 -S execve -S execveat -F euid=0 -F auid>0 -F auid!=-1 -F key=privilege_escalation
After creating such a ConfigMap, it must be included in the Shoot’s spec.resources array and then referenced from the providerConfig.auditConfig.configMapReferenceName field in the shoot-rsyslog-relp extension configuration.
An example configuration is given below:
kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
...
spec:
  extensions:
  - type: shoot-rsyslog-relp
    providerConfig:
      apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
      kind: RsyslogRelpConfig
      target: some.rsyslog-relp.server
      port: 10250
      loggingRules:
      - severity: 7
      auditConfig:
        enabled: true
        configMapReferenceName: audit-config
  resources:
  - name: audit-config
    resourceRef:
      apiVersion: v1
      kind: ConfigMap
      name: audit-config-v1
Finally, by setting providerConfig.auditConfig.enabled to false in the shoot-rsyslog-relp extension configuration, the original audit rules on your Shoot’s nodes will not be modified and auditd will not be restarted.
Examples of how the providerConfig.auditConfig.enabled field functions are given below:
- The following deploys the extension default audit rules as of today:
  providerConfig:
    auditConfig:
      enabled: true
- The following deploys only the rules specified in the referenced ConfigMap:
  providerConfig:
    auditConfig:
      enabled: true
      configMapReferenceName: audit-config
- Both of the following do not deploy any audit rules:
  providerConfig:
    auditConfig:
      enabled: false
      configMapReferenceName: audit-config
  providerConfig:
    auditConfig:
      enabled: false
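The four cases above reduce to a small decision: enabled takes precedence, and the ConfigMap reference only matters when auditing is enabled. The sketch below restates that logic. The function name audit_rules_mode and its return values are hypothetical labels chosen for this example, not values used by the extension.

```python
def audit_rules_mode(audit_config: dict) -> str:
    """Classify which audit rules end up on the nodes for a given auditConfig."""
    if not audit_config["enabled"]:
        # Disabled: original rules stay untouched, regardless of any ConfigMap reference.
        return "unchanged"
    if audit_config.get("configMapReferenceName"):
        # Enabled with a reference: only the rules from the referenced ConfigMap.
        return "configmap"
    # Enabled without a reference: the extension's default rules.
    return "default"


print(audit_rules_mode({"enabled": True}))
print(audit_rules_mode({"enabled": True, "configMapReferenceName": "audit-config"}))
print(audit_rules_mode({"enabled": False, "configMapReferenceName": "audit-config"}))
```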
6.2 - Deploying Rsyslog Relp Extension Remotely
Deploying Rsyslog Relp Extension Remotely
This document will walk you through running the Rsyslog Relp extension controller on a remote seed cluster and the rsyslog relp admission component in your local garden cluster for development purposes. This guide uses Gardener’s setup with provider extensions and builds on top of it.
If you encounter difficulties, please open an issue so that we can make this process easier.
Prerequisites
- Make sure that you have a running Gardener setup with provider extensions. The steps to complete this can be found in the Deploying Gardener Locally and Enabling Provider-Extensions guide.
- Make sure you are running Gardener version >= 1.95.0 or the latest version of the master branch.
Setting up the Rsyslog Relp Extension
Important: Make sure that your KUBECONFIG env variable is targeting the local Gardener cluster!
The location of the Gardener project from the Gardener setup is expected to be under the same root as this repository (e.g. ~/go/src/github.com/gardener/). If this is not the case, the location of the Gardener project should be specified in the GARDENER_REPO_ROOT environment variable:
export GARDENER_REPO_ROOT="<path_to_gardener_project>"
Then you can run:
make remote-extension-up
In case you have added additional Seeds you can specify the seed name:
make remote-extension-up SEED_NAME=<seed-name>
Creating a Shoot Cluster
Once the above step is completed, you can create a Shoot cluster. In order to create a Shoot cluster, please create your own Shoot definition depending on the providers on your Seed cluster.
Configuring the Shoot Cluster and deploying the Rsyslog Relp Echo Server
To be able to properly test the rsyslog relp extension you need a running rsyslog relp echo server to which logs from the Shoot nodes can be sent. To deploy the server and configure the rsyslog relp extension on your Shoot cluster you can run:
make configure-shoot SHOOT_NAME=<shoot-name> SHOOT_NAMESPACE=<shoot-namespace>
This command will deploy an rsyslog relp echo server in your Shoot cluster in the rsyslog-relp-echo-server namespace.
It will also add configuration for the shoot-rsyslog-relp extension to your Shoot spec by patching it with ./example/extension/<shoot-name>--<shoot-namespace>--extension-config-patch.yaml. This file is automatically copied from extension-config-patch.yaml.tmpl in the same directory when you run make configure-shoot for the first time. The file also includes explanations of the properties you should set or change.
The command will also deploy the rsyslog-relp-tls secret in case you wish to enable tls.
Tearing Down the Development Environment
To tear down the development environment, delete the Shoot cluster or disable the shoot-rsyslog-relp extension in the Shoot’s specification. When the extension is not used by the Shoot anymore, you can run:
make remote-extension-down
The make target will delete the ControllerDeployment and ControllerRegistration of the extension, and the shoot-rsyslog-relp admission helm deployment.
6.3 - Getting Started
Deploying Rsyslog Relp Extension Locally
This document will walk you through running the Rsyslog Relp extension and a fake rsyslog relp service on your local machine for development purposes. This guide uses Gardener’s local development setup and builds on top of it.
If you encounter difficulties, please open an issue so that we can make this process easier.
Prerequisites
- Make sure that you have a running local Gardener setup. The steps to complete this can be found here.
- Make sure you are running Gardener version >= 1.74.0 or the latest version of the master branch.
Setting up the Rsyslog Relp Extension
Important: Make sure that your KUBECONFIG env variable is targeting the local Gardener cluster!
make extension-up
This will build the shoot-rsyslog-relp, shoot-rsyslog-relp-admission, and shoot-rsyslog-relp-echo-server images and deploy the needed resources and configurations in the garden cluster. The shoot-rsyslog-relp-echo-server will act as a development replacement for a real rsyslog relp server.
Creating a Shoot Cluster
Once the above step is completed, we can deploy and configure a Shoot cluster with default rsyslog relp settings.
kubectl apply -f ./example/shoot.yaml
Once the Shoot’s namespace is created, we can create a networkpolicy that will allow egress traffic from the rsyslog on the Shoot’s nodes to the rsyslog-relp-echo-server that serves as a fake rsyslog target server.
kubectl apply -f ./example/local/allow-machine-to-rsyslog-relp-echo-server-netpol.yaml
Currently, the Shoot’s nodes run Ubuntu, which does not have the rsyslog-relp and auditd packages installed, so the configuration done by the extension has no effect.
Once the Shoot is created, we have to manually install the rsyslog-relp and auditd packages:
kubectl -n shoot--local--local exec -it $(kubectl -n shoot--local--local get po -l app=machine,machine-provider=local -o name) -- bash -c "
apt-get update && \
apt-get install -y rsyslog-relp auditd && \
systemctl enable rsyslog.service && \
systemctl start rsyslog.service"
Once that is done, we can verify that log messages are forwarded to the rsyslog-relp-echo-server by checking its logs.
kubectl -n rsyslog-relp-echo-server logs deployment/rsyslog-relp-echo-server
Making Changes to the Rsyslog Relp Extension
Changes to the rsyslog relp extension can be applied to the local environment by running the same make recipe again.
make extension-up
Tearing Down the Development Environment
To tear down the development environment, delete the Shoot cluster or disable the shoot-rsyslog-relp extension in the Shoot’s spec. When the extension is not used by the Shoot anymore, you can run:
make extension-down
This will delete the ControllerRegistration and ControllerDeployment of the extension, the shoot-rsyslog-relp-admission deployment, and the rsyslog-relp-echo-server deployment.
Maintaining the Publicly Available Image for the rsyslog-relp Echo Server
The testmachinery tests use an rsyslog-relp-echo-server image from a publicly available repository. The one which is currently used is eu.gcr.io/gardener-project/gardener/extensions/shoot-rsyslog-relp-echo-server:v0.1.0.
Sometimes it might be necessary to update the image and publish it, e.g. when updating the alpine base image version specified in the repository’s Dockerfile.
To do that:
- Bump the version with which the image is built in the Makefile.
- Build the shoot-rsyslog-relp-echo-server image: make echo-server-docker-image
- Once the image is built, push it to gcr with: make push-echo-server-image
- Finally, bump the version of the image used by the testmachinery tests here.
- Create a PR with the changes.
6.4 - Monitoring
Monitoring
The shoot-rsyslog-relp extension exposes metrics for the rsyslog service running on a Shoot’s nodes so that they can be easily viewed by cluster owners and operators in the Shoot’s Prometheus and Plutono instances. The exposed monitoring data offers valuable insights into the operation of the rsyslog service and can be used to detect and debug ongoing issues. This guide describes the various metrics, alerts and logs available to cluster owners and operators.
Metrics
Metrics for the rsyslog service originate from its impstats module. These include the number of messages in the various queues, the number of ingested messages, the number of processed messages by configured actions, system resources used by the rsyslog service, and others. More information about them can be found in the impstats documentation and the statistics counter documentation. They are exposed via the node-exporter running on each Shoot node and are scraped by the Shoot’s Prometheus instance.
These metrics can also be viewed in a dedicated dashboard named Rsyslog Stats in the Shoot’s Plutono instance. You can select the node for which you wish the metrics to be displayed from the Node dropdown menu (by default metrics are summed over all nodes).
Following is a list of all exposed rsyslog metrics. The name and origin labels can be used to determine whether the metric is for a queue, an action, plugins, or system stats; the node label can be used to determine the node the metric originates from:
rsyslog_pstat_submitted
Number of messages that were submitted to the rsyslog service from its input. Currently rsyslog uses the /run/systemd/journal/syslog socket as input.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_processed
Number of messages that are successfully processed by an action and sent to the target server.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_failed
Number of messages that could not be processed by an action nor sent to the target server.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_suspended
Total number of times an action suspended itself. Note that this counts the number of times the action transitioned from active to suspended state. The counter is no indication of how long the action was suspended or how often it was retried.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_suspended_duration
The total number of seconds this action was disabled.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_resumed
The total number of times this action resumed itself. A resumption occurs after the action has detected that a failure condition no longer exists.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_utime
User time used in microseconds.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_stime
System time used in microseconds.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_maxrss
Maximum resident set size
- Type: Gauge
- Labels:
name
node
origin
rsyslog_pstat_minflt
Total number of minor faults the task has made per second, those which have not required loading a memory page from disk.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_majflt
Total number of major faults the task has made per second, those which have required loading a memory page from disk.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_inblock
Filesystem input operations.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_oublock
Filesystem output operations.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_nvcsw
Voluntary context switches.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_nivcsw
Involuntary context switches.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_openfiles
Number of open files.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_size
Messages currently in queue.
- Type: Gauge
- Labels:
name
node
origin
rsyslog_pstat_enqueued
Total messages enqueued.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_full
Times queue was full.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_discarded_full
Messages discarded due to queue being full.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_discarded_nf
Messages discarded when queue not full.
- Type: Counter
- Labels:
name
node
origin
rsyslog_pstat_maxqsize
Maximum size queue has reached.
- Type: Gauge
- Labels:
name
node
origin
rsyslog_augenrules_load_success
Shows whether the augenrules --load
command was executed successfully or not on the node.
- Type: Gauge
- Labels:
node
Alerts
There are three alerts defined for the rsyslog
service in the Shoot’s Prometheus instance:
RsyslogTooManyRelpActionFailures
This indicates that the cumulative failure rate in processing relp
action messages is greater than 2%. In other words, it compares the rate of processed relp
action messages to the rate of failed relp
action messages and fires an alert when the following expression evaluates to true:
sum(rate(rsyslog_pstat_failed{origin="core.action",name="rsyslog-relp"}[5m])) / sum(rate(rsyslog_pstat_processed{origin="core.action",name="rsyslog-relp"}[5m])) > bool 0.02
RsyslogRelpActionProcessingRateIsZero
This indicates that no messages are being sent to the upstream rsyslog target by the relp
action. An alert is fired when the following expression evaluates to true:
rate(rsyslog_pstat_processed{origin="core.action",name="rsyslog-relp"}[5m]) == 0
RsyslogRelpAuditRulesNotLoadedSuccessfully
This indicates that augenrules --load
was not executed successfully when called to load the configured audit rules. You should check if the auditd
configuration you provided is valid. An alert is fired when the following expression evaluates to true:
absent(rsyslog_augenrules_load_success == 1)
Users can subscribe to these alerts by following the Gardener alerting guide.
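As a rough illustration of the threshold used by RsyslogTooManyRelpActionFailures, the following Python sketch compares a failure rate to a processing rate. The function names are hypothetical, and the inputs stand in for the already-summed `rate(...)` values; it does not query Prometheus.

```python
import math


def relp_failure_ratio(failed_rate: float, processed_rate: float) -> float:
    """Return failed/processed for two per-second rates.

    Division by zero is mapped the way floating-point division behaves in
    PromQL: x/0 is +Inf for x > 0, and 0/0 is NaN.
    """
    if processed_rate == 0:
        return math.inf if failed_rate > 0 else math.nan
    return failed_rate / processed_rate


def too_many_relp_failures(failed_rate: float, processed_rate: float,
                           threshold: float = 0.02) -> bool:
    """True when the failure ratio exceeds the 2% threshold from the alert."""
    # Comparisons against NaN are always False, so 0/0 does not trip the check.
    return relp_failure_ratio(failed_rate, processed_rate) > threshold
```

For example, 3 failed messages per second against 100 processed per second gives a ratio of 0.03, which is above the threshold.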
Logging
There are two ways to view the logs of the rsyslog
service running on the Shoot’s nodes - either using the Explore
tab of the Shoot’s Plutono instance, or ssh
-ing directly to a node.
To view logs in Plutono, navigate to the Explore
tab and select vali
from the Explore
dropdown menu. Afterwards enter the following vali
query:
{nodename="<name-of-node>"} |~ "\"unit\":\"rsyslog.service\""
Notice that you cannot use the unit
label to filter for the rsyslog.service
unit logs. Instead, you have to grep
for the service as displayed in the example above.
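The line filter in the query above is a regular-expression match against the raw log line. A minimal Python sketch of the same match is shown here; the sample log lines in the usage note are made up for illustration, and the function name is hypothetical.

```python
import re

# Same pattern as in the vali query, without the query-level escaping.
# As in the query, the unescaped "." matches any single character.
RSYSLOG_UNIT_PATTERN = re.compile(r'"unit":"rsyslog.service"')


def is_rsyslog_unit_line(line: str) -> bool:
    """Return True if the raw log line mentions the rsyslog.service unit."""
    return RSYSLOG_UNIT_PATTERN.search(line) is not None
```

A line such as `{"unit":"rsyslog.service","message":"started"}` matches, while a line for `kubelet.service` does not.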
To view logs when directly ssh
-ing to a node in the Shoot cluster, use either of the following commands on the node:
systemctl status rsyslog
journalctl -u rsyslog
6.5 - Shoot Rsyslog Relp
Developer Docs for Gardener Shoot Rsyslog Relp Extension
This document outlines how Shoot reconciliation and deletion works for a Shoot with the shoot-rsyslog-relp extension enabled.
Shoot Reconciliation
This section outlines how the reconciliation works for a Shoot with the shoot-rsyslog-relp extension enabled.
Extension Enablement / Reconciliation
This section outlines how the extension enablement/reconciliation works, e.g., the extension has been added to the Shoot spec.
- As part of the Shoot reconciliation flow, the gardenlet deploys the Extension resource.
- The shoot-rsyslog-relp extension reconciles the Extension resource. pkg/controller/lifecycle/actuator.go contains the implementation of the extension.Actuator interface. The reconciliation of an Extension of type shoot-rsyslog-relp only deploys the necessary monitoring configuration: the shoot-rsyslog-relp-dashboards ConfigMap, which contains the definition of the Plutono dashboard for the Rsyslog component, and the shoot-shoot-rsyslog-relp ServiceMonitor and PrometheusRule resources, which contain the definitions for scraping metrics by Prometheus and for the alerting rules.
- As part of the Shoot reconciliation flow, the gardenlet deploys the OperatingSystemConfig resource.
- The shoot-rsyslog-relp extension serves a webhook that mutates the OperatingSystemConfig resource for Shoots having the shoot-rsyslog-relp extension enabled (the corresponding namespace gets labeled by the gardenlet with extensions.gardener.cloud/shoot-rsyslog-relp=true). pkg/webhook/operatingsystemconfig/ensurer.go contains the implementation of the genericmutator.Ensurer interface.
  - The webhook renders the 60-audit.conf.tpl template script and appends it to the OperatingSystemConfig files. When rendering the template, the configuration of the shoot-rsyslog-relp extension is used to fill in the required template values. The file is installed as /var/lib/rsyslog-relp-configurator/rsyslog.d/60-audit.conf on the host OS.
  - The webhook appends the audit rules to the OperatingSystemConfig. The files are installed under /var/lib/rsyslog-relp-configurator/rules.d on the host OS.
  - If the user has specified alternative audit rules in a config map reference, the webhook fetches the referenced ConfigMap from the Shoot’s control plane namespace and decodes the value of its auditd data key into an object of type Auditd. It then takes the auditRules defined in the object and places those under the /var/lib/rsyslog-relp-configurator/rules.d directory in a single file.
  - The webhook renders the configure-rsyslog.tpl.sh script and appends it to the OperatingSystemConfig files. This script is installed as /var/lib/rsyslog-relp-configurator/configure-rsyslog.sh on the host OS. It keeps the configuration of the rsyslog systemd service up-to-date by copying /var/lib/rsyslog-relp-configurator/rsyslog.d/60-audit.conf to /etc/rsyslog.d/60-audit.conf if /etc/rsyslog.d/60-audit.conf does not exist or the files differ. The script also takes care of syncing the audit rules in /etc/audit/rules.d with the ones installed in /var/lib/rsyslog-relp-configurator/rules.d and restarts the auditd systemd service if necessary.
  - The webhook renders the process-rsyslog-pstats.tpl.sh script and appends it to the OperatingSystemConfig files. This script receives metrics from the rsyslog process, transforms them, and writes them to /var/lib/node-exporter/textfile-collector/rsyslog_pstats.prom so that they can be collected by the node-exporter.
  - As part of the Shoot reconciliation, before the shoot-rsyslog-relp extension is deployed, the gardenlet copies all Secret and ConfigMap resources referenced in .spec.resources[] to the Shoot’s control plane namespace on the Seed. When the .tls.enabled field is true in the shoot-rsyslog-relp extension configuration, a value for .tls.secretReferenceName must also be specified so that it references a named resource reference in the Shoot’s .spec.resources[] array. The webhook appends the data of the referenced Secret in the Shoot’s control plane namespace to the OperatingSystemConfig files.
  - The webhook appends the rsyslog-configurator.service unit to the OperatingSystemConfig units. The unit invokes the configure-rsyslog.sh script every 15 seconds.
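The copy-if-missing-or-different behavior described above for configure-rsyslog.sh can be sketched in Python as follows. This is an illustration of the idea only, not the actual script: the function name `sync_file` is hypothetical, and the real script additionally syncs audit rules and restarts services.

```python
from pathlib import Path
import shutil


def sync_file(src: Path, dst: Path) -> bool:
    """Copy src to dst when dst is missing or its content differs.

    Returns True if a copy happened, so a caller could decide whether
    a service restart is needed (an assumption for this sketch).
    """
    if dst.exists() and dst.read_bytes() == src.read_bytes():
        return False  # already up to date, nothing to do
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return True
```

Because the function is idempotent, invoking it periodically (the unit described above runs its script every 15 seconds) only rewrites the target when something actually changed.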
Extension Disablement
This section outlines how the extension disablement works, i.e., the extension has been removed from the Shoot spec.
- As part of the Shoot reconciliation flow, the gardenlet destroys the Extension resource because it is no longer needed.
- As part of the deletion flow, the shoot-rsyslog-relp extension deploys the rsyslog-relp-configuration-cleaner DaemonSet to the Shoot cluster to clean up the existing rsyslog configuration and revert the audit rules.
Shoot Deletion
This section outlines how the deletion works for a Shoot with the shoot-rsyslog-relp extension enabled.
- As part of the Shoot deletion flow, the gardenlet destroys the Extension resource.
- In the Shoot deletion flow, the Extension resource is deleted after the Worker resource. Hence, there is no need to deploy the rsyslog-relp-configuration-cleaner DaemonSet to the Shoot cluster to clean up the existing rsyslog configuration and revert the audit rules.
7 - OpenID Connect services
Gardener Extension for openid connect services
Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.
Recently, most of the vendor specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.
This controller implements Gardener’s extension contract for the shoot-oidc-service
extension.
An example for a ControllerRegistration
resource that can be used to register this controller to Gardener can be found here.
Please find more information regarding the extensibility concepts and a detailed proposal here.
Compatibility
The following lists compatibility requirements of this extension controller with regards to other Gardener components.
| OIDC Extension | Gardener | Notes |
|---|---|---|
| == v0.15.0 | >= 1.60.0 <= v1.64.0 | A typical side-effect when running Gardener < v1.63.0 is an unexpected scale-down of the OIDC webhook from 2 -> 1. |
| == v0.16.0 | >= 1.65.0 | |
Extension Resources
Example extension resource:
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Extension
metadata:
  name: extension-shoot-oidc-service
  namespace: shoot--project--abc
spec:
  type: shoot-oidc-service
When an extension resource is reconciled, the extension controller will create an instance of OIDC Webhook Authenticator. These resources are placed inside the shoot namespace on the seed. The controller also takes care of generating the necessary RBAC
resources for the seed as well as for the shoot.
Please note, this extension controller relies on the Gardener-Resource-Manager to deploy k8s resources to seed and shoot clusters.
How to start using or developing this extension controller locally
You can run the controller locally on your machine by executing make start
.
We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.
Feedback and Support
Feedback and contributions are always welcome. Please report bugs or suggestions as GitHub issues or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).
Learn more!
Please find further resources about our project here:
7.1 - Deployment
Gardener OIDC Service for Shoots
Introduction
Gardener allows Shoot clusters to dynamically register OpenID Connect providers. To support this, Gardener must be installed with the shoot-oidc-service
extension.
Configuration
To generally enable the OIDC service for shoot objects the shoot-oidc-service
extension must be registered by providing an appropriate extension registration in the garden cluster.
Here it is possible to decide whether the extension should be always available for all shoots or whether the extension must be separately enabled per shoot.
If the extension should be used for all shoots the globallyEnabled
flag should be set to true
.
spec:
  resources:
  - kind: Extension
    type: shoot-oidc-service
    globallyEnabled: true
Shoot Feature Gate
If the shoot OIDC service is not globally enabled by default (depends on the extension registration on the garden cluster), it can be enabled per shoot. To enable the service for a shoot, the shoot manifest must explicitly add the shoot-oidc-service
extension.
...
spec:
  extensions:
  - type: shoot-oidc-service
...
If the shoot OIDC service is globally enabled by default, it can be disabled per shoot. To disable the service for a shoot, the shoot manifest must explicitly state it.
...
spec:
  extensions:
  - type: shoot-oidc-service
    disabled: true
...
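The interplay between the global registration and the per-shoot setting described above can be summarized with a small Python sketch. The function name and the plain-dict representation of the Shoot's `spec.extensions` list are assumptions made for illustration; this is not Gardener's actual implementation.

```python
from typing import Dict, List


def oidc_service_enabled(globally_enabled: bool,
                         shoot_extensions: List[Dict]) -> bool:
    """Decide whether shoot-oidc-service is active for a shoot.

    An explicit entry in the shoot's extensions wins; otherwise the
    globallyEnabled flag of the extension registration applies.
    """
    for ext in shoot_extensions:
        if ext.get("type") == "shoot-oidc-service":
            # "disabled: true" switches the extension off for this shoot.
            return not ext.get("disabled", False)
    return globally_enabled
```

With `globally_enabled=True` and an entry carrying `disabled: true`, the function returns False, matching the second manifest above.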
7.2 - Openidconnects
Register OpenID Connect provider in Shoot Clusters
Introduction
Within a shoot cluster, it is possible to dynamically register OpenID Connect providers. It is necessary that the Gardener installation your shoot cluster runs in is equipped with a shoot-oidc-service
extension. Please ask your Gardener operator if the extension is available in your environment.
Important
Kubernetes v1.29 introduced support for Structured Authentication. Gardener allows the use of this feature for shoot clusters with Kubernetes version >= 1.30.
Structured Authentication should be preferred over the Gardener OIDC Extension in case:
- you do not need more than 64 authenticators (a limitation that is tracked in https://github.com/kubernetes/kubernetes/issues/122809)
- you do not need to register an issuer twice (which is not recommended anyway, since it can lead to misconfiguration and user impersonation if done poorly)
- you need the ability to write custom expressions to map and validate claims
- you need support for multiple audiences per authenticator
- you need support for providers that don’t support OpenID connect discovery
See how to make use of Structured Authentication in Gardener.
Shoot Feature Gate
In most of the Gardener setups the shoot-oidc-service
extension is not enabled globally and thus must be configured per shoot cluster. Please adapt the shoot specification by the configuration shown below to activate the extension individually.
kind: Shoot
...
spec:
  extensions:
  - type: shoot-oidc-service
...
OpenID Connect provider
In order to register an OpenID Connect provider an openidconnect
resource should be deployed in the shoot cluster.
Caution
It is strongly recommended to NOT disable prefixing since it may result in unwanted impersonations. The rule of thumb is to always use meaningful and unique prefixes for both username and groups. A good way to ensure this is to use the name of the openidconnect resource as shown in the example below.
apiVersion: authentication.gardener.cloud/v1alpha1
kind: OpenIDConnect
metadata:
  name: abc
spec:
  # issuerURL is the URL the provider signs ID Tokens as.
  # This will be the "iss" field of all tokens produced by the provider and is used for configuration discovery.
  issuerURL: https://abc-oidc-provider.example
  # clientID is the audience for which the JWT must be issued, the "aud" field.
  clientID: my-shoot-cluster
  # usernameClaim is the JWT field to use as the user's username.
  usernameClaim: sub
  # usernamePrefix, if specified, causes claims mapping to username to be prefixed with the provided value.
  # A value "oidc:" would result in usernames like "oidc:john".
  # If not provided, the prefix defaults to "( .metadata.name )/". The value "-" can be used to disable all prefixing.
  usernamePrefix: "abc:"
  # groupsClaim, if specified, causes the OIDCAuthenticator to try to populate the user's groups with an ID Token field.
  # If the groupsClaim field is present in an ID Token the value must be a string or list of strings.
  # groupsClaim: groups
  # groupsPrefix, if specified, causes claims mapping to group names to be prefixed with the value.
  # A value "oidc:" would result in groups like "oidc:engineering" and "oidc:marketing".
  # If not provided, the prefix defaults to "( .metadata.name )/".
  # The value "-" can be used to disable all prefixing.
  # groupsPrefix: "abc:"
  # caBundle is a PEM encoded CA bundle which will be used to validate the OpenID server's certificate. If unspecified, system's trusted certificates are used.
  # caBundle: <base64 encoded bundle>
  # supportedSigningAlgs sets the accepted set of JOSE signing algorithms that can be used by the provider to sign tokens.
  # The default value is RS256.
  # supportedSigningAlgs:
  # - RS256
  # requiredClaims, if specified, causes the OIDCAuthenticator to verify that all the
  # required claims key value pairs are present in the ID Token.
  # requiredClaims:
  #   customclaim: requiredvalue
  # maxTokenExpirationSeconds, if specified, sets a limit in seconds to the maximum validity duration of a token.
  # Tokens issued with validity greater than this value will not be verified.
  # Setting this will require that the tokens have the "iat" and "exp" claims.
  # maxTokenExpirationSeconds: 3600
  # jwks, if specified, provides an option to specify JWKS keys offline.
  # jwks:
  #   keys is a base64 encoded JSON webkey Set. If specified, the OIDCAuthenticator skips the request to the issuer's jwks_uri endpoint to retrieve the keys.
  #   keys: <base64 encoded jwks>
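The prefixing rules stated in the comments above (an explicit prefix, a default of the resource name followed by "/", and "-" to disable prefixing) can be illustrated with a short Python sketch. The function names are hypothetical and the code only mirrors the documented rules; it is not the authenticator's implementation.

```python
from typing import Optional


def effective_prefix(resource_name: str, configured: Optional[str]) -> str:
    """Resolve usernamePrefix/groupsPrefix as described in the comments."""
    if configured is None:
        return f"{resource_name}/"  # default: "<metadata.name>/"
    if configured == "-":
        return ""  # "-" disables prefixing (not recommended)
    return configured


def map_username(resource_name: str, claim_value: str,
                 username_prefix: Optional[str] = None) -> str:
    """Prepend the effective prefix to the value of the username claim."""
    return effective_prefix(resource_name, username_prefix) + claim_value
```

For the resource named `abc` above, a `sub` claim of `john` maps to `abc:john` with the configured prefix and to `abc/john` when no prefix is set.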
8 - Registry cache
Gardener Extension for Registry Cache
Gardener extension controller which deploys pull-through caches for container registries.
Usage
- Configuring the Registry Cache Extension - learn what is the use-case for a pull-through cache, how to enable it and configure it
- How to provide credentials for upstream repository?
- Configuring the Registry Mirror Extension - learn what is the use-case for a registry mirror, how to enable and configure it
Local Setup and Development
- Deploying Registry Cache Extension Locally - learn how to set up a local development environment
- Deploying Registry Cache Extension in Gardener’s Local Setup with Provider Extensions - learn how to set up a development environment using own Seed clusters on an existing Kubernetes cluster
- Developer Docs for Gardener Extension Registry Cache - learn about the inner workings
8.1 - Configuring the Registry Cache Extension
Configuring the Registry Cache Extension
Introduction
Use Case
For a Shoot cluster, the containerd daemon of every Node goes to the internet and fetches an image that it doesn’t have locally in the Node’s image cache. New Nodes are often created due to events such as auto-scaling (scale up), rolling update, or replacement of unhealthy Node. Such a new Node would need to pull all of the images of the Pods running on it from the internet because the Node’s cache is initially empty. Pulling an image from a registry produces network traffic and registry costs. To avoid these network traffic and registry costs, you can use the registry-cache extension to run a registry as pull-through cache.
The following diagram shows a rough outline of what an image pull looks like for a Shoot cluster without registry cache:
Solution
The registry-cache extension deploys and manages a registry in the Shoot cluster that runs as pull-through cache. The used registry implementation is distribution/distribution.
How does it work?
When the extension is enabled, a registry cache for each configured upstream is deployed to the Shoot cluster. Along with this, the containerd daemon on the Shoot cluster Nodes gets configured to use the Service IP address of the deployed registry cache as a mirror. For example, if a registry cache for upstream docker.io
is requested via the Shoot spec, then containerd gets configured to first pull the image from the deployed cache in the Shoot cluster. If this image pull operation fails, containerd falls back to the upstream itself (docker.io
in that case).
The first time an image is requested from the pull-through cache, it pulls the image from the configured upstream registry and stores it locally, before handing it back to the client. On subsequent requests, the pull-through cache is able to serve the image from its own storage.
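The two behaviors described above, the cache filling itself on first request and the client falling back to the upstream when the cache fails, can be modeled with a small Python sketch. All names here are hypothetical, images are represented as bytes, and errors are modeled as `OSError`; the real components are containerd and the distribution registry.

```python
from typing import Callable, Dict

Fetch = Callable[[str], bytes]


class PullThroughCache:
    """Serves images from local storage, fetching from upstream on a miss."""

    def __init__(self, fetch_upstream: Fetch) -> None:
        self._fetch_upstream = fetch_upstream
        self._store: Dict[str, bytes] = {}

    def get(self, ref: str) -> bytes:
        if ref not in self._store:
            # First request: pull from upstream and store before returning.
            self._store[ref] = self._fetch_upstream(ref)
        return self._store[ref]


def pull(ref: str, mirror: Fetch, upstream: Fetch) -> bytes:
    """Client-side view: try the mirror first, fall back to the upstream."""
    try:
        return mirror(ref)
    except OSError:
        return upstream(ref)
```

Calling `pull(ref, cache.get, upstream)` twice for the same reference contacts the upstream only once; if the mirror callable raises, the image is still returned from the upstream.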
Note
The used registry implementation (distribution/distribution) supports mirroring of only one upstream registry.
The following diagram shows a rough outline of what an image pull looks like for a Shoot cluster with registry cache:
Shoot Configuration
The extension is not globally enabled and must be configured per Shoot cluster. The Shoot specification has to be adapted to include the registry-cache
extension configuration.
Below is an example of registry-cache
extension configuration as part of the Shoot spec:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: crazy-botany
  namespace: garden-dev
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: registry.extensions.gardener.cloud/v1alpha3
      kind: RegistryConfig
      caches:
      - upstream: docker.io
        volume:
          size: 100Gi
          # storageClassName: premium
      - upstream: ghcr.io
      - upstream: quay.io
        garbageCollection:
          ttl: 0s
        secretReferenceName: quay-credentials
      - upstream: my-registry.io:5000
        remoteURL: http://my-registry.io:5000
  # ...
  resources:
  - name: quay-credentials
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: quay-credentials-v1
The providerConfig
field is required.
The providerConfig.caches
field contains information about the registry caches to deploy. It is a required field. At least one cache has to be specified.
The providerConfig.caches[].upstream
field is the remote registry host to cache. It is a required field.
The value must be a valid DNS subdomain (RFC 1123) and optionally a port (i.e. <host>[:<port>]
). It must not include a scheme.
The providerConfig.caches[].remoteURL
optional field is the remote registry URL. If configured, it must include an https://
or http://
scheme.
If the field is not configured, the remote registry URL defaults to https://<upstream>
. In case the upstream is docker.io
, it defaults to https://registry-1.docker.io
.
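The defaulting rule for the remote registry URL described above can be written down as a short Python sketch. The function name is hypothetical; the code mirrors only the documented rules (scheme required when configured, `https://<upstream>` by default, and the special case for docker.io).

```python
from typing import Optional


def resolve_remote_url(upstream: str, remote_url: Optional[str] = None) -> str:
    """Return the remote registry URL for a cache entry."""
    if remote_url is not None:
        # A configured remoteURL must carry an explicit scheme.
        if not remote_url.startswith(("https://", "http://")):
            raise ValueError("remoteURL must include an https:// or http:// scheme")
        return remote_url
    if upstream == "docker.io":
        return "https://registry-1.docker.io"
    return f"https://{upstream}"
```

For the example Shoot spec, `ghcr.io` resolves to `https://ghcr.io`, while `my-registry.io:5000` keeps its configured `http://my-registry.io:5000`.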
The providerConfig.caches[].volume
field contains settings for the registry cache volume.
The registry-cache extension deploys a StatefulSet with a volume claim template. A PersistentVolumeClaim is created with the configured size and StorageClass name.
The providerConfig.caches[].volume.size
field is the size of the registry cache volume. Defaults to 10Gi
. The size must be a positive quantity (greater than 0).
This field is immutable. See Increase the cache disk size on how to resize the disk.
The extension defines alerts for the volume. See Alerting for Users on how to enable notifications for Shoot cluster alerts.
The providerConfig.caches[].volume.storageClassName
field is the name of the StorageClass used by the registry cache volume.
This field is immutable. If the field is not specified, then the default StorageClass will be used.
The providerConfig.caches[].garbageCollection.ttl
field is the time to live of a blob in the cache. If the field is set to 0s
, the garbage collection is disabled. Defaults to 168h
(7 days). See the Garbage Collection section for more details.
The providerConfig.caches[].secretReferenceName
is the name of the reference for the Secret containing the upstream registry credentials. To cache images from a private registry, credentials to the upstream registry should be supplied. For more details, see How to provide credentials for upstream registry.
Note
It is only possible to provide one set of credentials for one private upstream registry.
The providerConfig.caches[].proxy.httpProxy
field represents the proxy server for HTTP connections which is used by the registry cache. It must include an https://
or http://
scheme.
The providerConfig.caches[].proxy.httpsProxy
field represents the proxy server for HTTPS connections which is used by the registry cache. It must include an https://
or http://
scheme.
Garbage Collection
When the registry cache receives a request for an image that is not present in its local store, it fetches the image from the upstream, returns it to the client and stores the image in the local store. The registry cache runs a scheduler that deletes images when their time to live (ttl) expires. When adding an image to the local store, the registry cache also adds a time to live for the image. The ttl defaults to 168h
(7 days) and is configurable. The garbage collection can be disabled by setting the ttl to 0s
. Requesting an image from the registry cache does not extend the time to live of the image. Hence, an image is always garbage collected from the registry cache store when its ttl expires.
At the time of writing this document, there is no functionality for garbage collection based on disk size - e.g., garbage collecting images when a certain disk usage threshold is passed.
The garbage collection cannot be enabled once it is disabled. This constraint is added to mitigate distribution/distribution#4249.
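The time-to-live rules described in this section can be illustrated with a small Python sketch. The function names are hypothetical and the code models only the documented behavior: expiry is fixed when the image is stored, later requests do not extend it, and a ttl of 0s disables garbage collection.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_TTL = timedelta(hours=168)  # 7 days, the documented default


def expires_at(stored_at: datetime,
               ttl: timedelta = DEFAULT_TTL) -> Optional[datetime]:
    """Expiry time of an image, or None when garbage collection is disabled."""
    if ttl == timedelta(0):
        return None
    return stored_at + ttl


def is_expired(stored_at: datetime, now: datetime,
               ttl: timedelta = DEFAULT_TTL) -> bool:
    """True once the ttl has elapsed; reads in between do not matter."""
    expiry = expires_at(stored_at, ttl)
    return expiry is not None and now >= expiry
```

Note that `is_expired` takes no "last accessed" argument: the expiry depends only on when the image was stored, which is why a frequently pulled image is still collected after its ttl.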
Increase the Cache Disk Size
When there is no available disk space, the registry cache continues to respond to requests. However, it cannot store the remotely fetched images locally because it has no free disk space. In such case, it is simply acting as a proxy without being able to cache the images in its local store. The disk has to be resized to ensure that the registry cache continues to cache images.
There are two alternatives to enlarge the cache’s disk size:
[Alternative 1] Resize the PVC
To enlarge the PVC’s size, perform the following steps:
1. Make sure that the KUBECONFIG environment variable is targeting the correct Shoot cluster.
2. Find the PVC name to resize for the desired upstream. The below example fetches the PVC for the docker.io upstream:
   kubectl -n kube-system get pvc -l upstream-host=docker.io
3. Patch the PVC’s size to the desired size. The below example patches the size of a PVC to 10Gi:
   kubectl -n kube-system patch pvc $PVC_NAME --type merge -p '{"spec":{"resources":{"requests": {"storage": "10Gi"}}}}'
4. Make sure that the PVC gets resized. Describe the PVC to check the resize operation result:
   kubectl -n kube-system describe pvc -l upstream-host=docker.io
Drawback of this approach: The cache’s size in the Shoot spec (providerConfig.caches[].volume.size) diverges from the PVC’s size.
[Alternative 2] Remove and Readd the Cache
There is always the option to remove the cache from the Shoot spec and to readd it again with the updated size.
Drawback of this approach: The already cached images get lost and the cache starts with an empty disk.
High Availability
The registry cache runs with a single replica. This fact may lead to concerns for the high availability such as “What happens when the registry cache is down? Does containerd fail to pull the image?”. As outlined in the How does it work? section, containerd is configured to fall back to the upstream registry if it fails to pull the image from the registry cache. Hence, when the registry cache is unavailable, the containerd’s image pull operations are not affected because containerd falls back to image pull from the upstream registry.
Possible Pitfalls
- The used registry implementation (the Distribution project) supports mirroring of only one upstream registry. The extension deploys a pull-through cache for each configured upstream.
- us-docker.pkg.dev, europe-docker.pkg.dev, and asia-docker.pkg.dev are different upstreams. Hence, configuring pkg.dev as upstream won’t cache images from us-docker.pkg.dev, europe-docker.pkg.dev, or asia-docker.pkg.dev.
Limitations
1. Images that are pulled before a registry cache Pod is running or before a registry cache Service is reachable from the corresponding Node won’t be cached - containerd will pull these images directly from the upstream.
   The reasoning behind this limitation is that a registry cache Pod is running in the Shoot cluster. To have a registry cache’s Service cluster IP reachable from containerd running on the Node, the registry cache Pod has to be running and kube-proxy has to configure iptables/IPVS rules for the registry cache Service. If kube-proxy hasn’t configured iptables/IPVS rules for the registry cache Service, then the image pull times (and new Node bootstrap times) will be increased significantly. For more detailed explanations, see point 2. and gardener/gardener-extension-registry-cache#68.
   That’s why the registry configuration on a Node is applied only after the registry cache Service is reachable from the Node. The gardener-node-agent.service systemd unit sends requests to the registry cache’s Service. Once the registry cache responds with HTTP 200, the unit creates the needed registry configuration file (hosts.toml).
   As a result, for images from Shoot system components:
   - On Shoot creation with the registry cache extension enabled, a registry cache is unable to cache all of the images from the Shoot system components. Usually, until the registry cache Pod is running, containerd pulls from upstream the images from Shoot system components (before the registry configuration gets applied).
   - On new Node creation for existing Shoot with the registry cache extension enabled, a registry cache is unable to cache most of the images from Shoot system components. The reachability of the registry cache Service requires the Service network to be set up, i.e., the kube-proxy for that new Node to be running and to have set up iptables/IPVS configuration for the registry cache Service.
2. containerd requests will time out in 30s in case kube-proxy hasn’t configured iptables/IPVS rules for the registry cache Service - the image pull times will increase significantly.
   containerd is configured to fall back to the upstream itself if a request against the cache fails. However, if the cluster IP of the registry cache Service does not exist or if kube-proxy hasn’t configured iptables/IPVS rules for the registry cache Service, then containerd requests against the registry cache time out in 30 seconds. This significantly increases the image pull times because containerd does multiple requests as part of the image pull (HEAD request to resolve the manifest by tag, GET request for the manifest by SHA, GET requests for blobs).
   Example: If the Service of a registry cache is deleted, then a new Service will be created. containerd’s registry config will still contain the old Service’s cluster IP. containerd requests against the old Service’s cluster IP will time out and containerd will fall back to upstream.
   - Image pull of docker.io/library/alpine:3.13.2 from the upstream takes ~2s while image pull of the same image with invalid registry cache cluster IP takes ~2m.2s.
   - Image pull of eu.gcr.io/gardener-project/gardener/ops-toolbelt:0.18.0 from the upstream takes ~10s while image pull of the same image with invalid registry cache cluster IP takes ~3m.10s.
3. Amazon Elastic Container Registry is currently not supported. For details see distribution/distribution#4383.
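To make the effect of the 30-second timeout concrete, the following Python sketch estimates the pull time when every request against an unreachable cache times out before containerd falls back to the upstream. The function name is hypothetical, and the request counts used in the usage note (4 and 6) are assumptions chosen for illustration; the real count depends on the image's manifest and number of blobs.

```python
TIMEOUT_SECONDS = 30  # per-request timeout against the unreachable cache


def estimated_pull_seconds(upstream_pull_seconds: int,
                           request_count: int) -> int:
    """Upstream pull time plus one timeout per request made during the pull."""
    return request_count * TIMEOUT_SECONDS + upstream_pull_seconds
```

Under these assumptions, a pull that takes 2 seconds from the upstream and issues 4 requests is estimated at 122 seconds, and a 10-second pull with 6 requests at 190 seconds, which is in line with the measurements quoted in the example above.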
8.2 - Configuring the Registry Mirror Extension
Configuring the Registry Mirror Extension
Introduction
Use Case
containerd allows registry mirrors to be configured. Use cases are:
- Usage of public mirror(s) - for example, circumvent issues with the upstream registry such as rate limiting, outages, and others.
- Usage of private mirror(s) - for example, reduce network costs by using a private mirror running in the same network.
Solution
The registry-mirror extension allows registry mirrors to be configured directly via the Shoot spec.
How does it work?
When the extension is enabled, the containerd daemon on the Shoot cluster Nodes gets configured to use the requested mirrors. For example, if for the upstream docker.io
the mirror https://mirror.gcr.io
is configured in the Shoot spec, then containerd gets configured to first pull the image from the mirror (https://mirror.gcr.io
in that case). If this image pull operation fails, containerd falls back to the upstream itself (docker.io
in that case).
The extension is based on the contract described in containerd
Registry Configuration. The corresponding upstream documentation in containerd is Registry Configuration - Introduction.
Shoot Configuration
The Shoot specification has to be adapted to include the registry-mirror
extension configuration.
Below is an example of registry-mirror
extension configuration as part of the Shoot spec:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: crazy-botany
  namespace: garden-dev
spec:
  extensions:
  - type: registry-mirror
    providerConfig:
      apiVersion: mirror.extensions.gardener.cloud/v1alpha1
      kind: MirrorConfig
      mirrors:
      - upstream: docker.io
        hosts:
        - host: "https://mirror.gcr.io"
          capabilities: ["pull"]
The providerConfig
field is required.
The providerConfig.mirrors
field contains information about the registry mirrors to configure. It is a required field. At least one mirror has to be specified.
The providerConfig.mirrors[].upstream
field is the remote registry host to mirror. It is a required field.
The value must be a valid DNS subdomain (RFC 1123) and optionally a port (i.e. <host>[:<port>]
). It must not include a scheme.
The providerConfig.mirrors[].hosts
field represents the mirror hosts to be used for the upstream. At least one mirror host has to be specified.
The providerConfig.mirror[].hosts[].host
field is the mirror host. It is a required field.
The value must include a scheme - http://
or https://
.
The providerConfig.mirror[].hosts[].capabilities
field represents the operations a host is capable of performing. This also represents the set of operations for which the mirror host may be trusted to perform. Defaults to ["pull"]
. The supported values are pull
and resolve
.
See the capabilities field documentation for more information on which operations are considered trusted ones against public/private mirrors.
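The field rules above can be checked mechanically. The following Python sketch validates a MirrorConfig given as a plain dict. It is an illustration under stated assumptions, not the extension's own validation code: the function names, error messages, and the port range check are hypothetical, and the DNS subdomain check follows the usual RFC 1123 pattern (lowercase labels, at most 253 characters).

```python
import re

# RFC 1123 DNS subdomain: lowercase alphanumeric labels (hyphens allowed inside),
# separated by dots, at most 253 characters in total.
_LABEL = r"[a-z0-9](?:[-a-z0-9]*[a-z0-9])?"
_DNS_SUBDOMAIN = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")
SUPPORTED_CAPABILITIES = {"pull", "resolve"}


def validate_upstream(upstream):
    """Return a list of problems with a mirrors[].upstream value."""
    if not upstream:
        return ["upstream is required"]
    if "://" in upstream:
        return ["upstream must not include a scheme"]
    host, sep, port = upstream.rpartition(":")
    if not sep:  # no port given
        host, port = upstream, None
    errors = []
    if len(host) > 253 or not _DNS_SUBDOMAIN.fullmatch(host):
        errors.append(f"upstream host {host!r} is not a valid DNS subdomain")
    # Assumption: a port, when present, must be a number in 1..65535.
    if port is not None and not (port.isdigit() and 1 <= int(port) <= 65535):
        errors.append(f"upstream port {port!r} is not a valid port")
    return errors


def validate_mirror_config(config):
    """Return a list of problems with a MirrorConfig-like dict (empty if valid)."""
    mirrors = config.get("mirrors") or []
    if not mirrors:
        return ["at least one mirror has to be specified"]
    errors = []
    for i, mirror in enumerate(mirrors):
        prefix = f"mirrors[{i}]"
        errors += [f"{prefix}: {e}" for e in validate_upstream(mirror.get("upstream"))]
        hosts = mirror.get("hosts") or []
        if not hosts:
            errors.append(f"{prefix}: at least one mirror host has to be specified")
        for j, h in enumerate(hosts):
            host = h.get("host") or ""
            if not host.startswith(("http://", "https://")):
                errors.append(f"{prefix}.hosts[{j}]: host must include a scheme")
            # capabilities defaults to ["pull"] when omitted
            unknown = set(h.get("capabilities", ["pull"])) - SUPPORTED_CAPABILITIES
            if unknown:
                errors.append(f"{prefix}.hosts[{j}]: unsupported capabilities {sorted(unknown)}")
    return errors
```

Calling `validate_mirror_config` with the `providerConfig` from the example Shoot above returns an empty list, while an upstream such as `https://docker.io` is rejected because it includes a scheme.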
8.3 - Deploying Registry Cache Extension in Gardener's Local Setup with Provider Extensions
Deploying Registry Cache Extension in Gardener’s Local Setup with Provider Extensions
Prerequisites
- Make sure that you have a running local Gardener setup with enabled provider extensions. The steps to complete this can be found in the Deploying Gardener Locally and Enabling Provider-Extensions guide.
Setting up the Registry Cache Extension
Make sure that your KUBECONFIG environment variable is targeting the local Gardener cluster.
The location of the Gardener project from the Gardener setup step is expected to be under the same root (e.g. ~/go/src/github.com/gardener/). If this is not the case, specify the location of the Gardener project in the GARDENER_REPO_ROOT environment variable:
export GARDENER_REPO_ROOT="<path_to_gardener_project>"
Then you can run:
make remote-extension-up
In case you have added additional Seeds you can specify the seed name:
make remote-extension-up SEED_NAME=<seed-name>
The corresponding make target will build the extension image, push it into the Seed cluster image registry, and deploy the registry-cache ControllerDeployment and ControllerRegistration resources into the kind cluster. The container image in the ControllerDeployment will be the image that was built and pushed into the Seed cluster image registry.
The make target will then deploy the registry-cache admission component. It will build the admission image, push it into the kind cluster image registry, and finally install the admission component charts to the kind cluster.
Creating a Shoot Cluster
Once the above step is completed, you can create a Shoot cluster. To do so, create your own Shoot definition according to the providers available on your Seed cluster.
Tearing Down the Development Environment
To tear down the development environment, delete the Shoot cluster or disable the registry-cache extension in the Shoot’s specification. When the extension is not used by the Shoot anymore, you can run:
make remote-extension-down
The make target will delete the ControllerDeployment and ControllerRegistration of the extension, and the registry-cache admission helm deployment.
8.4 - Deploying Registry Cache Extension Locally
Deploying Registry Cache Extension Locally
Prerequisites
- Make sure that you have a running local Gardener setup. The steps to complete this can be found in the Deploying Gardener Locally guide.
Setting up the Registry Cache Extension
Make sure that your KUBECONFIG environment variable is targeting the local Gardener cluster. When this is ensured, run:
make extension-up
The corresponding make target will build the extension image, load it into the kind cluster Nodes, and deploy the registry-cache ControllerDeployment and ControllerRegistration resources. The container image in the ControllerDeployment will be the image that was built and loaded into the kind cluster Nodes.
The make target will then deploy the registry-cache admission component. It will build the admission image, load it into the kind cluster Nodes, and finally install the admission component charts to the kind cluster.
Creating a Shoot Cluster
Once the above step is completed, you can create a Shoot cluster.
example/shoot-registry-cache.yaml contains a Shoot specification with the registry-cache extension:
kubectl create -f example/shoot-registry-cache.yaml
example/shoot-registry-mirror.yaml contains a Shoot specification with the registry-mirror extension:
kubectl create -f example/shoot-registry-mirror.yaml
Tearing Down the Development Environment
To tear down the development environment, delete the Shoot cluster or disable the registry-cache extension in the Shoot’s specification. When the extension is not used by the Shoot anymore, you can run:
make extension-down
The make target will delete the ControllerDeployment and ControllerRegistration of the extension, and the registry-cache admission helm deployment.
8.5 - Developer Docs for Gardener Extension Registry Cache
Developer Docs for Gardener Extension Registry Cache
This document outlines how Shoot reconciliation and deletion works for a Shoot with the registry-cache extension enabled.
Shoot Reconciliation
This section outlines how the reconciliation works for a Shoot with the registry-cache extension enabled.
Extension Enablement / Reconciliation
This section outlines how the extension enablement/reconciliation works, e.g., the extension has been added to the Shoot spec.
- As part of the Shoot reconciliation flow, the gardenlet deploys the Extension resource.
- The registry-cache extension reconciles the Extension resource. pkg/controller/cache/actuator.go contains the implementation of the extension.Actuator interface. The reconciliation of an Extension of type registry-cache consists of the following steps:
  - The registry-cache extension deploys resources to the Shoot cluster via ManagedResource. For every configured upstream, it creates a StatefulSet (with PVC), Service, and other resources.
  - It lists all Services from the kube-system namespace that have the upstream-host label. It will return an error (and retry in exponential backoff) until the Services count matches the configured registries count.
  - When there is a Service created for each configured upstream registry, the registry-cache extension populates the Extension resource status. In the Extension status, for each upstream, it maintains an endpoint (in the format http://<cluster-ip>:5000) which can be used to access the registry cache from within the Shoot cluster. <cluster-ip> is the cluster IP of the registry cache Service. The cluster IP of a Service is assigned by the Kubernetes API server on Service creation.
- As part of the Shoot reconciliation flow, the gardenlet deploys the OperatingSystemConfig resource.
- The registry-cache extension serves a webhook that mutates the OperatingSystemConfig resource for Shoots having the registry-cache extension enabled (the corresponding namespace gets labeled by the gardenlet with extensions.gardener.cloud/registry-cache=true). pkg/webhook/cache/ensurer.go contains an implementation of the genericmutator.Ensurer interface.
  - The webhook appends or updates RegistryConfig entries in the OperatingSystemConfig CRI configuration that correspond to the configured registry caches in the Shoot. The RegistryConfig readiness probe is enabled so that gardener-node-agent creates a hosts.toml containerd registry configuration file when all RegistryConfig hosts are reachable.
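The Service lookup and status population described in the reconciliation steps above can be sketched in Python as follows. This is a simplified model, not the actuator's Go code: Services are plain dicts, the `NotReadyError` type and `compute_cache_endpoints` function are hypothetical, and it is assumed that the value of the upstream-host label equals the configured upstream.

```python
class NotReadyError(Exception):
    """Signals that reconciliation should be retried later (e.g. with backoff)."""


def compute_cache_endpoints(services, upstreams, port=5000):
    """Map every configured upstream to its registry cache endpoint.

    services:  Service-like dicts with "labels" and "clusterIP" keys.
    upstreams: configured upstream hosts, e.g. ["docker.io"].
    """
    # Assumption: the upstream-host label value identifies the upstream.
    by_upstream = {
        s["labels"]["upstream-host"]: s["clusterIP"]
        for s in services
        if "upstream-host" in s.get("labels", {})
    }
    missing = [u for u in upstreams if u not in by_upstream]
    if missing:
        # Not every cache has a Service yet: report an error so the caller retries.
        raise NotReadyError(f"no registry cache Service yet for: {missing}")
    return {u: f"http://{by_upstream[u]}:{port}" for u in upstreams}


if __name__ == "__main__":
    services = [{"labels": {"upstream-host": "docker.io"}, "clusterIP": "10.4.0.10"}]
    print(compute_cache_endpoints(services, ["docker.io"]))
```

Raising an error until every upstream has a Service mirrors the retry behaviour in the steps above; the endpoint can only be computed once the API server has assigned the cluster IP.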
Extension Disablement
This section outlines how the extension disablement works, i.e., the extension has to be removed from the Shoot spec.
- As part of the Shoot reconciliation flow, the gardenlet destroys the Extension resource because it is no longer needed.
- The extension deletes the ManagedResource containing the registry cache resources.
- The OperatingSystemConfig resource will not be mutated and no RegistryConfig entries will be added or updated. The gardener-node-agent detects that RegistryConfig entries have been removed or changed and deletes or updates the corresponding hosts.toml configuration files under the /etc/containerd/certs.d folder.
Shoot Deletion
This section outlines how the deletion works for a Shoot with the registry-cache extension enabled.
- As part of the Shoot deletion flow, the gardenlet destroys the Extension resource.
- The extension deletes the ManagedResource containing the registry cache resources.
8.6 - How to provide credentials for upstream registry?
How to provide credentials for upstream registry?
In Kubernetes, to pull images from private container image registries you either have to specify an image pull Secret (see Pull an Image from a Private Registry) or you have to configure the kubelet to dynamically retrieve credentials using a credential provider plugin (see Configure a kubelet image credential provider). When pulling an image, the kubelet is providing the credentials to the CRI implementation. The CRI implementation uses the provided credentials against the upstream registry to pull the image.
The registry-cache extension is using the Distribution project as pull through cache implementation. The Distribution project does not use the provided credentials from the CRI implementation while fetching an image from the upstream. Hence, the above-described scenarios such as configuring image pull Secret for a Pod or configuring kubelet credential provider plugins don’t work out of the box with the pull through cache provided by the registry-cache extension. Instead, the Distribution project supports configuring only one set of credentials for a given pull through cache instance (for a given upstream).
This document describes how to supply credentials for the private upstream registry in order to pull private images with the registry cache.
How to configure the registry cache to use upstream registry credentials?
1. Create an immutable Secret with the upstream registry credentials in the Garden cluster:

kubectl create -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ro-docker-secret-v1
  namespace: garden-dev
type: Opaque
immutable: true
data:
  username: $(echo -n $USERNAME | base64 -w0)
  password: $(echo -n $PASSWORD | base64 -w0)
EOF

For Artifact Registry, the username is _json_key and the password is the service account key in JSON format. To base64 encode the service account key, copy it and run:

echo -n $SERVICE_ACCOUNT_KEY_JSON | base64 -w0

2. Add the newly created Secret as a reference to the Shoot spec, and then to the registry-cache extension configuration.

In the registry-cache configuration, set the secretReferenceName field. It should point to a resource reference under spec.resources. The resource reference itself points to the Secret in the project namespace.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
# ...
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: registry.extensions.gardener.cloud/v1alpha3
      kind: RegistryConfig
      caches:
      - upstream: docker.io
        secretReferenceName: docker-secret
  # ...
  resources:
  - name: docker-secret
    resourceRef:
      apiVersion: v1
      kind: Secret
      name: ro-docker-secret-v1
# ...
Warning
Do not delete the referenced Secret when there is a Shoot still using it.
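For readers who prefer to generate the Secret manifest programmatically rather than with a shell here-document, the following Python sketch builds the same structure as the kubectl example above. The function name `build_registry_secret` and the sample credentials are hypothetical. The output is JSON, which `kubectl create -f` accepts as well as YAML.

```python
import base64
import json


def build_registry_secret(name, namespace, username, password):
    """Build an immutable Opaque Secret manifest holding registry credentials."""

    def b64(value):
        # Secret "data" values must be base64-encoded strings.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "immutable": True,
        "data": {"username": b64(username), "password": b64(password)},
    }


if __name__ == "__main__":
    # Placeholder credentials; replace with the real upstream registry credentials.
    secret = build_registry_secret("ro-docker-secret-v1", "garden-dev", "my-user", "my-password")
    print(json.dumps(secret, indent=2))
```

Encoding in Python avoids the shell's word splitting of unquoted variables, which matters for multi-line values such as a service account key in JSON format.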
How to rotate the registry credentials?
To rotate registry credentials perform the following steps:
- Generate a new pair of credentials in the cloud provider account. Do not invalidate the old ones.
- Create a new Secret (e.g., ro-docker-secret-v2) with the newly generated credentials as described in step 1. in How to configure the registry cache to use upstream registry credentials?.
- Update the Shoot spec with newly created Secret as described in step 2. in How to configure the registry cache to use upstream registry credentials?.
- The above step will trigger a Shoot reconciliation. Wait for it to complete.
- Make sure that the old Secret is no longer referenced by any Shoot cluster. Finally, delete the Secret containing the old credentials (e.g., ro-docker-secret-v1).
- Delete the corresponding old credentials from the cloud provider account.
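Before deleting the old Secret, it helps to confirm that no Shoot still references it. The following Python sketch performs that check over Shoot manifests that have already been loaded as dicts (for example from the JSON output of kubectl). The function name `shoots_referencing_secret` is hypothetical, and only references under spec.resources, as used in the configuration steps above, are considered.

```python
def shoots_referencing_secret(shoots, secret_name):
    """Return the names of Shoots whose spec.resources reference the given Secret."""
    names = []
    for shoot in shoots:
        for resource in shoot.get("spec", {}).get("resources", []):
            ref = resource.get("resourceRef", {})
            if ref.get("kind") == "Secret" and ref.get("name") == secret_name:
                names.append(shoot["metadata"]["name"])
                break  # one match per Shoot is enough
    return names


if __name__ == "__main__":
    shoots = [
        {
            "metadata": {"name": "crazy-botany"},
            "spec": {"resources": [{
                "name": "docker-secret",
                "resourceRef": {"apiVersion": "v1", "kind": "Secret", "name": "ro-docker-secret-v2"},
            }]},
        },
    ]
    # An empty result means the old Secret is no longer referenced.
    print(shoots_referencing_secret(shoots, "ro-docker-secret-v1"))
```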
Possible Pitfalls
- The registry cache is not protected by any authentication/authorization mechanism. The cached images (incl. private images) can be fetched from the registry cache without authentication/authorization. Note that the registry cache itself is not exposed publicly.
- The registry cache provides the credentials for every request against the corresponding upstream. In some cases, misconfigured credentials can prevent the registry cache from pulling even public images from the upstream (for example: invalid service account key for Artifact Registry). However, this behaviour is controlled by the server-side logic of the upstream registry.
- Do not remove the image pull Secrets when configuring credentials for the registry cache. When the registry-cache is not available, containerd falls back to the upstream registry. containerd still needs the image pull Secret to pull the image and in this way to have the fallback mechanism working.