
Provider Azure

Gardener extension controller for the Azure cloud provider

Gardener Extension for Azure provider


Project Gardener implements the automated management and operation of Kubernetes clusters as a service. Its main principle is to leverage Kubernetes concepts for all of its tasks.

Recently, most of the vendor-specific logic has been developed in-tree. However, the project has grown to a size where it is very hard to extend, maintain, and test. With GEP-1 we have proposed how the architecture can be changed in a way to support external controllers that contain their very own vendor specifics. This way, we can keep Gardener core clean and independent.

This controller implements Gardener’s extension contract for the Azure provider.

An example for a ControllerRegistration resource that can be used to register this controller to Gardener can be found here.

Please find more information regarding the extensibility concepts and a detailed proposal here.

Supported Kubernetes versions

This extension controller supports the following Kubernetes versions:

Version            Support    Conformance test results
Kubernetes 1.29    1.29.0+    Gardener v1.29 Conformance Tests
Kubernetes 1.28    1.28.0+    Gardener v1.28 Conformance Tests
Kubernetes 1.27    1.27.0+    Gardener v1.27 Conformance Tests
Kubernetes 1.26    1.26.0+    Gardener v1.26 Conformance Tests
Kubernetes 1.25    1.25.0+    Gardener v1.25 Conformance Tests

Please take a look here to see which versions are supported by Gardener in general.


How to start using or developing this extension controller locally

You can run the controller locally on your machine by executing make start.

Static code checks and tests can be executed by running make verify. We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.
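
For example:

# Run the controller locally on your machine.
make start

# Execute static code checks and tests.
make verify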

Feedback and Support

Feedback and contributions are always welcome. Please report bugs or suggestions as GitHub issues or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).

Learn more!

Please find further resources about our project here:

1 - Tutorials

1.1 - Create a Kubernetes Cluster on Azure with Gardener

Overview

Gardener allows you to create a Kubernetes cluster on different infrastructure providers. This tutorial will guide you through the process of creating a cluster on Azure.

Prerequisites

  • You have created an Azure account.
  • You have access to the Gardener dashboard and have permissions to create projects.
  • You have an Azure Service Principal assigned to your subscription.

Steps

  1. Go to the Gardener dashboard and create a Project.

  2. Get the properties of your Azure AD tenant, Subscription and Service Principal.

    Before you can provision and access a Kubernetes cluster on Azure, you need to add the Azure service principal, AD tenant and subscription credentials in Gardener. Gardener needs the credentials to provision and operate the Azure infrastructure for your Kubernetes cluster.

    Ensure that the Azure service principal has the actions defined in Azure Permissions assigned within your subscription. If no fine-grained permissions/actions are required, you can simply assign the built-in Contributor role.

    • Tenant ID

      To find your TenantID, follow this guide.

    • SubscriptionID

      To find your SubscriptionID, search for and select Subscriptions.

      After that, copy the SubscriptionID from your subscription of choice.

    • Service Principal (SPN)

      A service principal consists of a ClientID (also called ApplicationID) and a Client Secret. For more information, see Application and service principal objects in Azure Active Directory. You need to obtain the:

      • Client ID

        Access the Azure Portal and navigate to the Active Directory service. Within the service navigate to App registrations and select your service principal. Copy the ClientID you see there.

      • Client Secret

        Secrets for the Azure account/service principal can be generated and rotated via the Azure Portal. After copying your ClientID, navigate to Certificates & secrets in the detail view of your service principal. In this section, you can generate a new secret. Alternatively, see the CLI sketch below.
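
    A minimal sketch using the Azure CLI to obtain the same properties; the Contributor role is an assumption (assign fine-grained actions instead if required):

    # Show the TenantID and SubscriptionID of the currently selected subscription.
    az account show --query "{tenantID: tenantId, subscriptionID: id}"

    # Create a service principal with the Contributor role on the subscription.
    # The output contains the ClientID (appId), ClientSecret (password) and TenantID (tenant).
    az ad sp create-for-rbac --role Contributor --scopes /subscriptions/<subscription-id>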

  3. Choose Secrets, then the plus icon and select Azure.

  4. Create your secret.

    1. Type the name of your secret.
    2. Copy and paste the TenantID, SubscriptionID and the Service Principal credentials (ClientID and ClientSecret).
    3. Choose Add secret.

    After completing these steps, you should see your newly created secret in the Infrastructure Secrets section.

  5. Register resource providers for your subscription (see the CLI sketch after this list).

    1. Go to your Azure dashboard.
    2. Navigate to Subscriptions -> <your_subscription>.
    3. Pick Resource providers from the sidebar.
    4. Register Microsoft.Network.
    5. Register Microsoft.Compute.
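
    A minimal sketch using the Azure CLI; registration is asynchronous, so check the state until it reports Registered:

    az provider register --namespace Microsoft.Network
    az provider register --namespace Microsoft.Compute

    # Check the registration state.
    az provider show --namespace Microsoft.Network --query registrationState
    az provider show --namespace Microsoft.Compute --query registrationState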
  6. To create a new cluster, choose Clusters and then the plus sign in the upper right corner.

  7. In the Create Cluster section:

    1. Select Azure in the Infrastructure tab.
    2. Type the name of your cluster in the Cluster Details tab.
    3. Choose the secret you created before in the Infrastructure Details tab.
    4. Choose Create.
  8. Wait for your cluster to get created.

Result

After completing the steps in this tutorial, you will be able to see and download the kubeconfig of your cluster.

2 - Azure Permissions

Azure Permissions

The following document describes the required Azure actions to manage a Shoot cluster on Azure, grouped by the different Azure providers/services.

Be aware that some actions are only required if particular deployment scenarios or features are used, e.g. bring your own vNet, use of Azure File, letting the Shoot act as a Seed, etc.

Microsoft.Compute

# Required if a non-zonal cluster based on an Availability Set should be used.
Microsoft.Compute/availabilitySets/delete
Microsoft.Compute/availabilitySets/read
Microsoft.Compute/availabilitySets/write

# Required to let Kubernetes manage Azure disks.
Microsoft.Compute/disks/delete
Microsoft.Compute/disks/read
Microsoft.Compute/disks/write

# Required to fetch meta information about disk and virtual machine sizes.
Microsoft.Compute/locations/diskOperations/read
Microsoft.Compute/locations/operations/read
Microsoft.Compute/locations/vmSizes/read

# Required if csi snapshot capabilities should be used and/or the Shoot should act as a Seed.
Microsoft.Compute/snapshots/delete
Microsoft.Compute/snapshots/read
Microsoft.Compute/snapshots/write

# Required to let Gardener/Machine-Controller-Manager manage the cluster nodes/machines.
Microsoft.Compute/virtualMachines/delete
Microsoft.Compute/virtualMachines/read
Microsoft.Compute/virtualMachines/start/action
Microsoft.Compute/virtualMachines/write

# Required if a non-zonal cluster based on VMSS Flex (VMO) should be used.
Microsoft.Compute/virtualMachineScaleSets/delete
Microsoft.Compute/virtualMachineScaleSets/read
Microsoft.Compute/virtualMachineScaleSets/write

Microsoft.ManagedIdentity

# Required if a user-provided Azure managed identity should be attached to the cluster nodes.
Microsoft.ManagedIdentity/userAssignedIdentities/assign/action
Microsoft.ManagedIdentity/userAssignedIdentities/read

Microsoft.MarketplaceOrdering

# Required if nodes/machines should be created with images hosted on the Azure Marketplace.
Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/read
Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/write

Microsoft.Network

# Required to let Kubernetes manage services of type 'LoadBalancer'.
Microsoft.Network/loadBalancers/backendAddressPools/join/action
Microsoft.Network/loadBalancers/delete
Microsoft.Network/loadBalancers/read
Microsoft.Network/loadBalancers/write

# Required in case the Shoot should use NatGateway(s).
Microsoft.Network/natGateways/delete
Microsoft.Network/natGateways/join/action
Microsoft.Network/natGateways/read
Microsoft.Network/natGateways/write

# Required to let Gardener/Machine-Controller-Manager manage the cluster nodes/machines.
Microsoft.Network/networkInterfaces/delete
Microsoft.Network/networkInterfaces/ipconfigurations/join/action
Microsoft.Network/networkInterfaces/ipconfigurations/read
Microsoft.Network/networkInterfaces/join/action
Microsoft.Network/networkInterfaces/read
Microsoft.Network/networkInterfaces/write

# Required to let Gardener maintain the basic infrastructure of the Shoot cluster and maintain LoadBalancer services.
Microsoft.Network/networkSecurityGroups/delete
Microsoft.Network/networkSecurityGroups/join/action
Microsoft.Network/networkSecurityGroups/read
Microsoft.Network/networkSecurityGroups/write

# Required for managing LoadBalancers and NatGateways.
Microsoft.Network/publicIPAddresses/delete
Microsoft.Network/publicIPAddresses/join/action
Microsoft.Network/publicIPAddresses/read
Microsoft.Network/publicIPAddresses/write

# Required for managing the basic infrastructure of a cluster and maintaining LoadBalancer services.
Microsoft.Network/routeTables/delete
Microsoft.Network/routeTables/join/action
Microsoft.Network/routeTables/read
Microsoft.Network/routeTables/routes/delete
Microsoft.Network/routeTables/routes/read
Microsoft.Network/routeTables/routes/write
Microsoft.Network/routeTables/write

# Required to let Gardener maintain the basic infrastructure of the Shoot cluster.
# Only a subset is required for the bring your own vNet scenario.
Microsoft.Network/virtualNetworks/delete # not required for bring your own vnet
Microsoft.Network/virtualNetworks/read
Microsoft.Network/virtualNetworks/subnets/delete
Microsoft.Network/virtualNetworks/subnets/join/action
Microsoft.Network/virtualNetworks/subnets/read
Microsoft.Network/virtualNetworks/subnets/write
Microsoft.Network/virtualNetworks/write # not required for bring your own vnet

Microsoft.Resources

# Required to let Gardener maintain the basic infrastructure of the Shoot cluster.
Microsoft.Resources/subscriptions/resourceGroups/delete
Microsoft.Resources/subscriptions/resourceGroups/read
Microsoft.Resources/subscriptions/resourceGroups/write

Microsoft.Storage

# Required if Azure File should be used and/or if the Shoot should act as Seed.
Microsoft.Storage/operations/read
Microsoft.Storage/storageAccounts/blobServices/containers/delete
Microsoft.Storage/storageAccounts/blobServices/containers/read
Microsoft.Storage/storageAccounts/blobServices/containers/write
Microsoft.Storage/storageAccounts/blobServices/read
Microsoft.Storage/storageAccounts/delete
Microsoft.Storage/storageAccounts/listkeys/action
Microsoft.Storage/storageAccounts/read
Microsoft.Storage/storageAccounts/write

3 - Deployment

Deployment of the Azure provider extension

Disclaimer: This document is NOT a step by step installation guide for the Azure provider extension and only contains some configuration specifics regarding the installation of different components via the helm charts residing in the Azure provider extension repository.

gardener-extension-admission-azure

Authentication against the Garden cluster

There are several authentication possibilities depending on whether or not the concept of Virtual Garden is used.

Virtual Garden is not used, i.e., the runtime Garden cluster is also the target Garden cluster.

Automounted Service Account Token The easiest way to deploy the gardener-extension-admission-azure component is to not provide a kubeconfig at all. This way, the in-cluster configuration and an automounted service account token will be used. The drawback of this approach is that the automounted token will not be automatically rotated.

Service Account Token Volume Projection Another solution is to use Service Account Token Volume Projection combined with a kubeconfig referencing a token file (see example below).

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <CA-DATA>
    server: https://default.kubernetes.svc.cluster.local
  name: garden
contexts:
- context:
    cluster: garden
    user: garden
  name: garden
current-context: garden
users:
- name: garden
  user:
    tokenFile: /var/run/secrets/projected/serviceaccount/token

This will allow for automatic rotation of the service account token by the kubelet. The configuration can be achieved by setting both .Values.global.serviceAccountTokenVolumeProjection.enabled: true and .Values.global.kubeconfig in the respective chart’s values.yaml file.
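
A hedged sketch of setting these values when installing the chart from a repository checkout (chart path, release name, and namespace are placeholders; it is assumed that global.kubeconfig expects the kubeconfig file content):

# Install the admission component with token volume projection enabled.
helm upgrade --install gardener-extension-admission-azure \
  charts/gardener-extension-admission-azure \
  --namespace garden \
  --set global.serviceAccountTokenVolumeProjection.enabled=true \
  --set-file global.kubeconfig=./kubeconfig.yaml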

Virtual Garden is used, i.e., the runtime Garden cluster is different from the target Garden cluster.

Service Account The easiest way to set up the authentication is to create a service account in the target cluster and bind the respective roles to it. Then use the generated service account token and craft a kubeconfig, which will be used by the workload in the runtime cluster. This approach does not provide a solution for the rotation of the service account token. However, this setup can be achieved by setting .Values.global.virtualGarden.enabled: true and following these steps:

  1. Deploy the application part of the charts in the target cluster.
  2. Get the service account token and craft the kubeconfig.
  3. Set the crafted kubeconfig and deploy the runtime part of the charts in the runtime cluster.

Client Certificate Another solution is to bind the roles in the target cluster to a User subject instead of a service account and use a client certificate for authentication. This approach does not provide a solution for the client certificate rotation. However, this setup can be achieved by setting both .Values.global.virtualGarden.enabled: true and .Values.global.virtualGarden.user.name, then following these steps:

  1. Generate a client certificate for the target cluster for the respective user.
  2. Deploy the application part of the charts in the target cluster.
  3. Craft a kubeconfig using the already generated client certificate.
  4. Set the crafted kubeconfig and deploy the runtime part of the charts in the runtime cluster.

Projected Service Account Token This approach requires an already deployed and configured oidc-webhook-authenticator for the target cluster. Additionally, the runtime cluster should be registered as a trusted identity provider in the target cluster. Then projected service account tokens from the runtime cluster can be used to authenticate against the target cluster. The needed steps are as follows:

  1. Deploy OWA and establish the needed trust.
  2. Set .Values.global.virtualGarden.enabled: true and .Values.global.virtualGarden.user.name. Note: the username value will depend on the trust configuration, e.g., <prefix>:system:serviceaccount:<namespace>:<serviceaccount>
  3. Set .Values.global.serviceAccountTokenVolumeProjection.enabled: true and .Values.global.serviceAccountTokenVolumeProjection.audience. Note: the audience value will depend on the trust configuration, e.g., <client-id-from-trust-config>.
  4. Craft a kubeconfig (see example below).
  5. Deploy the application part of the charts in the target cluster.
  6. Deploy the runtime part of the charts in the runtime cluster.

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <CA-DATA>
    server: https://virtual-garden.api
  name: virtual-garden
contexts:
- context:
    cluster: virtual-garden
    user: virtual-garden
  name: virtual-garden
current-context: virtual-garden
users:
- name: virtual-garden
  user:
    tokenFile: /var/run/secrets/projected/serviceaccount/token

4 - Local Setup

admission-azure

admission-azure is an admission webhook server which is responsible for the validation of the cloud provider (Azure in this case) specific fields and resources. The Gardener API server is cloud provider agnostic and it wouldn’t be able to perform similar validation.

Follow the steps below to run the admission webhook server locally.

  1. Start the Gardener API server.

    For details, check the Gardener local setup.

  2. Start the webhook server.

    Make sure that the KUBECONFIG environment variable is pointing to the local garden cluster.

    make start-admission
    
  3. Setup the ValidatingWebhookConfiguration.

    hack/dev-setup-admission-azure.sh will configure the webhook Service which will allow the kube-apiserver of your local cluster to reach the webhook server. It will also apply the ValidatingWebhookConfiguration manifest.

    ./hack/dev-setup-admission-azure.sh
    

You are now ready to experiment with the admission-azure webhook server locally.

5 - Migrate Loadbalancer

Migrate Azure Shoot Load Balancer from basic to standard SKU

This guide describes how to migrate the Load Balancer of an Azure Shoot cluster from the basic SKU to the standard SKU.
Be aware: You need to delete and recreate all services of type LoadBalancer, which means that the public IP addresses of your service endpoints will change.
Please do this only if the stakeholder really needs to migrate this Shoot to use standard Load Balancers. All new Shoot clusters will automatically use Azure Standard Load Balancers.

  1. Temporarily disable Gardener's reconciliation.
    The Gardener Controller Manager needs to be configured to allow ignoring Shoot clusters. This can be configured in its ControllerManagerConfiguration via the field .controllers.shoot.respectSyncPeriodOverwrite="true".
# In the Garden cluster.
kubectl annotate shoot <shoot-name> shoot.garden.sapcloud.io/ignore="true"

# In the Seed cluster.
kubectl -n <shoot-namespace> scale deployment gardener-resource-manager --replicas=0
  2. Back up all Kubernetes services of type LoadBalancer.
# In the Shoot cluster.
# Determine all Load Balancer services.
kubectl get service --all-namespaces | grep LoadBalancer

# Backup each Load Balancer service.
echo "---" >> service-backup.yaml && kubectl -n <namespace> get service <service-name> -o yaml >> service-backup.yaml
  3. Delete all LoadBalancer services.
# In the Shoot cluster.
kubectl -n <namespace> delete service <service-name>
  4. Wait until the Load Balancer is deleted. Wait until all services of type LoadBalancer are deleted and the Azure Load Balancer resource is also deleted. Check via the Azure Portal whether the Load Balancer within the Shoot Resource Group has been deleted. This should happen automatically within a few minutes after all Kubernetes LoadBalancer services are gone.

Alternatively, the Azure CLI can be used to check the Load Balancer in the Shoot Resource Group. The credentials to configure the CLI are available on the Seed cluster in the Shoot namespace.

# In the Seed cluster.
# Fetch the credentials from cloudprovider secret.
kubectl -n <shoot-namespace> get secret cloudprovider -o yaml

# Configure the Azure CLI with the base64-decoded values of the cloudprovider secret.
az login --service-principal --username <clientID> --password <clientSecret> --tenant <tenantID>
az account set -s <subscriptionID>

# Constantly fetch the Shoot Load Balancer in the Shoot Resource Group. Wait until the resource is gone.
watch 'az network lb show -g shoot--<project-name>--<shoot-name> -n shoot--<project-name>--<shoot-name>'

# Logout.
az logout
  5. Modify the cloud-provider-config configmap in the Seed namespace of the Shoot.
    The key cloudprovider.conf contains the Kubernetes cloud-provider configuration. The value is a multiline string. Please change the value of the field loadBalancerSku from basic to standard. If the field does not exist, append loadBalancerSku: \"standard\"\n to the value/string.
# In the Seed cluster.
kubectl -n <shoot-namespace> edit cm cloud-provider-config
  6. Re-enable Gardener's reconciliation and trigger a reconciliation.
# In the Garden cluster.
# Enable reconciliation.
kubectl annotate shoot <shoot-name> shoot.garden.sapcloud.io/ignore-

# Trigger reconciliation.
kubectl annotate shoot <shoot-name> shoot.garden.sapcloud.io/operation="reconcile"

Wait until the cluster has been reconciled.

  7. Recreate the services from the backup file.
    You probably need to remove some fields from the service definitions, e.g. .spec.clusterIP, .metadata.uid or .status.
kubectl apply -f service-backup.yaml
  8. If successful, remove the backup file.
# Delete the backup file.
rm -f service-backup.yaml

6 - Operations

Using the Azure provider extension with Gardener as an operator

The core.gardener.cloud/v1beta1.CloudProfile resource declares a providerConfig field that is meant to contain provider-specific configuration. The core.gardener.cloud/v1beta1.Seed resource is structured similarly. Additionally, it allows configuring settings for the backups of the main etcds' data of shoot cluster control planes running in this seed cluster.

This document explains the necessary configuration for the Azure provider extension.

CloudProfile resource

This section describes how the configuration for CloudProfiles looks for Azure, by providing an example CloudProfile manifest with minimal configuration that can be used to allow the creation of Azure shoot clusters.

CloudProfileConfig

The cloud profile configuration contains information about the real machine image IDs in the Azure environment (image urn, id, communityGalleryImageID or sharedGalleryImageID). You have to map every version that you specify in .spec.machineImages[].versions to an available VM image in your subscription. The VM image can be from the Azure Marketplace and is then identified via a urn, it can be a custom VM image from a Shared Image Gallery and is then identified via its sharedGalleryImageID, or it can be from a Community Image Gallery and is then identified via its communityGalleryImageID. You can also use the id field to specify the image location in the Azure Compute Gallery (in which case it would have a different kind of path), but this is not recommended as it sometimes faces problems in cross-subscription image sharing. For each machine image version, an architecture field can be specified which defines the CPU architecture of the machine on which the given machine image can be used.

An example CloudProfileConfig for the Azure extension looks as follows:

apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: CloudProfileConfig
countUpdateDomains:
- region: westeurope
  count: 5
countFaultDomains:
- region: westeurope
  count: 3
machineTypes:
- name: Standard_D3_v2
  acceleratedNetworking: true
- name: Standard_X
machineImages:
- name: coreos
  versions:
  - version: 2135.6.0
    urn: "CoreOS:CoreOS:Stable:2135.6.0"
    # architecture: amd64 # optional
    acceleratedNetworking: true
- name: myimage
  versions:
  - version: 1.0.0
    id: "/subscriptions/<subscription ID where the gallery is located>/resourceGroups/myGalleryRG/providers/Microsoft.Compute/galleries/myGallery/images/myImageDefinition/versions/1.0.0"
- name: GardenLinuxCommunityImage
  versions:
  - version: 1.0.0
    communityGalleryImageID: "/CommunityGalleries/gardenlinux-567905d8-921f-4a85-b423-1fbf4e249d90/Images/gardenlinux/Versions/576.1.1"
- name: SharedGalleryImageName
  versions:
    - version: 1.0.0
      sharedGalleryImageID: "/SharedGalleries/sharedGalleryName/Images/sharedGalleryImageName/Versions/sharedGalleryImageVersionName"

The cloud profile configuration contains the update domain counts via .countUpdateDomains[] and the fault domain counts via .countFaultDomains[] for the Azure regions you want to offer.

The .machineTypes[] list contains provider-specific information for the machine types, e.g. whether the machine type supports Azure Accelerated Networking, see .machineTypes[].acceleratedNetworking.

Additionally, it contains the real machine image identifiers in the Azure environment. You can provide either URN for Azure Market Place images or id of Shared Image Gallery images. When Shared Image Gallery is used, you have to ensure that the image is available in the desired regions and the end-user subscriptions have access to the image or to the whole gallery. You have to map every version that you specify in .spec.machineImages[].versions here such that the Azure extension knows the machine image identifiers for every version you want to offer. Furthermore, you can specify for each image version via .machineImages[].versions[].acceleratedNetworking if Azure Accelerated Networking is supported.

Example CloudProfile manifest

The possible values for .spec.volumeTypes[].name on Azure are Standard_LRS, StandardSSD_LRS and Premium_LRS. There is another volume type called UltraSSD_LRS, but this type cannot be used as an OS disk. If an end user selects a volume type whose name is not equal to one of the valid values, then the machine will be created with the default volume type that belongs to the selected machine type. Therefore, it is recommended to configure only the valid values for .spec.volumeTypes[].name in the CloudProfile.

Please find below an example CloudProfile manifest:

apiVersion: core.gardener.cloud/v1beta1
kind: CloudProfile
metadata:
  name: azure
spec:
  type: azure
  kubernetes:
    versions:
    - version: 1.28.2
    - version: 1.23.8
      expirationDate: "2022-10-31T23:59:59Z"
  machineImages:
  - name: coreos
    versions:
    - version: 2135.6.0
  machineTypes:
  - name: Standard_D3_v2
    cpu: "4"
    gpu: "0"
    memory: 14Gi
  - name: Standard_D4_v3
    cpu: "4"
    gpu: "0"
    memory: 16Gi
  volumeTypes:
  - name: Standard_LRS
    class: standard
    usable: true
  - name: StandardSSD_LRS
    class: premium
    usable: false
  - name: Premium_LRS
    class: premium
    usable: false
  regions:
  - name: westeurope
  providerConfig:
    apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
    kind: CloudProfileConfig
    machineTypes:
    - name: Standard_D3_v2
      acceleratedNetworking: true
    - name: Standard_D4_v3
    countUpdateDomains:
    - region: westeurope
      count: 5
    countFaultDomains:
    - region: westeurope
      count: 3
    machineImages:
    - name: coreos
      versions:
      - version: 2303.3.0
        urn: CoreOS:CoreOS:Stable:2303.3.0
        # architecture: amd64 # optional
        acceleratedNetworking: true
      - version: 2135.6.0
        urn: "CoreOS:CoreOS:Stable:2135.6.0"
        # architecture: amd64 # optional

Seed resource

This provider extension does not support any provider configuration for the Seed's .spec.provider.providerConfig field. However, it supports managing the backup infrastructure, i.e., you can specify a configuration for the .spec.backup field.

Backup configuration

A Seed of type azure can be configured to perform backups for the main etcds of the shoot cluster control planes using Azure Blob storage.

The location/region where the backups will be stored defaults to the region of the Seed (spec.provider.region), but can also be explicitly configured via the field spec.backup.region. The region of the backup can be different from where the Seed cluster is running. However, usually it makes sense to pick the same region for the backup bucket as used for the Seed cluster.

Please find below an example Seed manifest (partly) that configures backups using Azure Blob storage.

---
apiVersion: core.gardener.cloud/v1beta1
kind: Seed
metadata:
  name: my-seed
spec:
  provider:
    type: azure
    region: westeurope
  backup:
    provider: azure
    region: westeurope # default region
    secretRef:
      name: backup-credentials
      namespace: garden
  ...

The referenced secret has to contain the provider credentials of the Azure subscription. Please take a look here on how to create an Azure Application and Service Principal, and how to obtain credentials. The example below demonstrates what the secret has to look like.

apiVersion: v1
kind: Secret
metadata:
  name: core-azure
  namespace: garden-dev
type: Opaque
data:
  clientID: base64(client-id)
  clientSecret: base64(client-secret)
  subscriptionID: base64(subscription-id)
  tenantID: base64(tenant-id)
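
The secret can also be created with kubectl, which base64-encodes the literal values for you. A sketch, using the secret name and namespace referenced in the Seed manifest above (all values are placeholders):

kubectl -n garden create secret generic backup-credentials \
  --from-literal=clientID=<client-id> \
  --from-literal=clientSecret=<client-secret> \
  --from-literal=subscriptionID=<subscription-id> \
  --from-literal=tenantID=<tenant-id>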

Permissions for Azure Blob storage

Please make sure the Azure application has the following IAM roles.

Miscellaneous

Gardener managed Service Principals

The operators of the Gardener Azure extension can provide a list of managed service principals (technical users) that can be used for Azure Shoots. This eliminates the need for users to provide their own service principals for their clusters.

The user would need to grant the managed service principal access to their subscription with proper permissions.

As service principals are managed per Azure Active Directory tenant, a separate service principal needs to be provided for each supported tenant.

In case the user provides their own service principal in the Shoot secret, it will be used instead of the managed one provided by the operator.

Each managed service principal will be maintained in a Secret like the following:

apiVersion: v1
kind: Secret
metadata:
  name: service-principal-my-tenant
  namespace: extension-provider-azure
  labels:
    azure.provider.extensions.gardener.cloud/purpose: tenant-service-principal-secret
data:
  tenantID: base64(my-tenant)
  clientID: base64(my-service-principal-id)
  clientSecret: base64(my-service-principal-secret)
type: Opaque

The user needs to provide a tenantID and a subscriptionID in their Shoot secret.

The managed service principal will be assigned based on the tenantID. In case there is a managed service principal secret with a matching tenantID, this one will be used for the Shoot. If there is no matching managed service principal secret then the next Shoot operation will fail.

One of the benefits of having managed service principals is that the operator controls the lifecycle of the service principal and can rotate its secrets.

After the service principal secret has been rotated and the corresponding secret is updated, all Shoot clusters using it need to be reconciled or their last operation needs to be retried.

7 - Usage

Using the Azure provider extension with Gardener as end-user

The core.gardener.cloud/v1beta1.Shoot resource declares a few fields that are meant to contain provider-specific configuration.

This document describes the configurable options for Azure and provides an example Shoot manifest with minimal configuration that can be used to create an Azure cluster (modulo the landscape-specific information like cloud profile names, secret binding names, etc.).

Azure Provider Credentials

In order for Gardener to create a Kubernetes cluster using Azure infrastructure components, a Shoot has to provide credentials with sufficient permissions to the desired Azure subscription. Every shoot cluster references a SecretBinding which itself references a Secret, and this Secret contains the provider credentials of the Azure subscription. The SecretBinding is configurable in the Shoot cluster with the field secretBindingName.

Create an Azure Application and Service Principal and obtain its credentials.

Please ensure that the Azure application (spn) has the IAM actions defined here assigned. If no fine-grained permissions/actions are required, then simply assign the Contributor role.

The example below demonstrates what the secret containing the client credentials of the Azure Application has to look like:

apiVersion: v1
kind: Secret
metadata:
  name: core-azure
  namespace: garden-dev
type: Opaque
data:
  clientID: base64(client-id)
  clientSecret: base64(client-secret)
  subscriptionID: base64(subscription-id)
  tenantID: base64(tenant-id)

⚠️ Depending on your API usage it can be problematic to reuse the same Service Principal for different Shoot clusters due to rate limits. Please consider spreading your Shoots over Service Principals from different Azure subscriptions if you are hitting those limits.

Managed Service Principals

The operators of the Gardener Azure extension can provide managed service principals. This eliminates the need for users to provide an own service principal for a Shoot.

To make use of a managed service principal, the Azure secret of a Shoot cluster must contain only a subscriptionID and a tenantID field, but no clientID and clientSecret. Removing those fields from the secret of an existing Shoot will also let it adopt the managed service principal.
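
A sketch of creating such a secret with kubectl (secret name and namespace are placeholders; note the absence of clientID and clientSecret):

kubectl -n garden-dev create secret generic my-azure-secret \
  --from-literal=tenantID=<tenant-id> \
  --from-literal=subscriptionID=<subscription-id>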

Based on the tenantID field, the Gardener extension will try to assign the managed service principal to the Shoot. If no managed service principal can be assigned then the next operation on the Shoot will fail.

⚠️ The managed service principal needs to be assigned to the user's Azure subscription with proper permissions before it is used.

InfrastructureConfig

The infrastructure configuration mainly describes how the network layout looks in order to create the shoot worker nodes in a later step; thus, it prepares everything relevant to create VMs, load balancers, volumes, etc.

An example InfrastructureConfig for the Azure extension looks as follows:

apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
  vnet: # specify either 'name' and 'resourceGroup' or 'cidr'
    # name: my-vnet
    # resourceGroup: my-vnet-resource-group
    cidr: 10.250.0.0/16
    # ddosProtectionPlanID: /subscriptions/test/resourceGroups/test/providers/Microsoft.Network/ddosProtectionPlans/test-ddos-protection-plan
  workers: 10.250.0.0/19
  # natGateway:
  #   enabled: false
  #   idleConnectionTimeoutMinutes: 4
  #   zone: 1
  #   ipAddresses:
  #   - name: my-public-ip-name
  #     resourceGroup: my-public-ip-resource-group
  #     zone: 1
  # serviceEndpoints:
  # - Microsoft.Test
  # zones:
  # - name: 1
  #   cidr: "10.250.0.0/24
  # - name: 2
  #   cidr: "10.250.0.0/24"
  #   natGateway:
  #     enabled: false
zoned: false
# resourceGroup:
#   name: mygroup
#identity:
#  name: my-identity-name
#  resourceGroup: my-identity-resource-group
#  acrAccess: true

Currently, it's not yet possible to deploy into existing resource groups, but in the future it will be. The .resourceGroup.name field will allow specifying the name of an already existing resource group that the shoot cluster and all infrastructure resources will be deployed into.

Via the .zoned boolean you can tell whether you want to use Azure availability zones or not. If you don’t use zones then an availability set will be created and only basic load balancers will be used. Zoned clusters use standard load balancers.

The networks.vnet section describes whether you want to create the shoot cluster in an already existing VNet or whether to create a new one:

  • If networks.vnet.name and networks.vnet.resourceGroup are given then you have to specify the VNet name and VNet resource group name of the existing VNet that was created by other means (manually, other tooling, …).
  • If networks.vnet.cidr is given then you have to specify the VNet CIDR of a new VNet that will be created during shoot creation. You can freely choose a private CIDR range.
  • Either networks.vnet.name and networks.vnet.resourceGroup or networks.vnet.cidr must be present, but not both at the same time.
  • The networks.vnet.ddosProtectionPlanID field can be used to specify the ID of a DDoS protection plan which should be assigned to the VNet. This will only work for a VNet managed by Gardener. For externally managed VNets, the DDoS protection plan must be assigned by other means.
  • If a VNet name is given and cilium shoot clusters are created without a network overlay within one VNet, make sure that the pod CIDR specified in shoot.spec.networking.pods does not overlap with any other pod CIDR used in that VNet. Overlapping pod CIDRs will lead to dysfunctional shoot clusters.

The networks.workers section describes the CIDR for a subnet that is used for all shoot worker nodes, i.e., VMs which later run your applications. The specified CIDR range must be contained in the VNet CIDR specified above, or the VNet CIDR of your already existing VNet. You can freely choose this CIDR and it is your responsibility to properly design the network layout to suit your needs.

In the networks.serviceEndpoints[] list you can specify the list of Azure service endpoints which shall be associated with the worker subnet. All available service endpoints and their technical names can be found in the Azure Service Endpoint documentation (https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview).

The networks.natGateway section contains configuration for the Azure NatGateway which can be attached to the worker subnet of a Shoot cluster. Here is some key information about the usage of the NatGateway for a Shoot cluster:

  • NatGateway usage is optional and can be enabled or disabled via .networks.natGateway.enabled.
  • If the NatGateway is not used, then the egress connections initiated within the Shoot cluster will be NATed via the LoadBalancer of the cluster (default Azure behaviour, see here).
  • NatGateway is only available for zonal clusters .zoned=true.
  • The NatGateway is currently not deployed zone-redundantly. That means the NatGateway of a Shoot cluster will always be in just one zone. This zone can optionally be selected via .networks.natGateway.zone.
  • Caution: Modifying the .networks.natGateway.zone setting requires a recreation of the NatGateway and the managed public IP (automatically used if no own public IP is specified, see below). That means you will most likely get a different public IP for egress connections.
  • It is possible to bring your own zonal public IP(s) via networks.natGateway.ipAddresses. Those public IP(s) need to be in the same zone as the NatGateway (see networks.natGateway.zone) and be of SKU standard. For each public IP, the name, the resourceGroup and the zone need to be specified.
  • The field networks.natGateway.idleConnectionTimeoutMinutes allows configuring the NAT Gateway's idle connection timeout property. The idle timeout value can be adjusted from 4 minutes up to 120 minutes. Omitting this property will set the idle timeout to its default value according to NAT Gateway's documentation.

In the identity section you can specify an Azure user-assigned managed identity which should be attached to all cluster worker machines. With identity.name you can specify the name of the identity and with identity.resourceGroup you can specify the resource group which contains the identity resource on Azure. The identity needs to be created by the user upfront (manually, other tooling, …). Gardener/Azure Extension will only use the referenced identity and won't create one. Furthermore, the identity has to be in the same subscription as the Shoot cluster. Via identity.acrAccess you can configure the worker machines to use the passed identity for pulling from an Azure Container Registry (ACR). Caution: Adding, exchanging or removing the identity will require a rolling update of all worker machines in the Shoot cluster.

Apart from the VNet and the worker subnet the Azure extension will also create a dedicated resource group, route tables, security groups, and an availability set (if not using zoned clusters).

InfrastructureConfig with dedicated subnets per zone

Another deployment option, for zonal clusters only, is to create and configure a separate subnet per availability zone. This network layout is recommended for users that require fine-grained control over their network setup. One prevalent use case is to create a zone-redundant NAT Gateway deployment by taking advantage of the ability to deploy separate NAT Gateways for each subnet.

To use this configuration the following requirements must be met:

  • the zoned field must be set to true.
  • the networks.vnet section must not be empty and must contain a valid configuration. For existing clusters that were not using the networks.vnet section, it is enough if the networks.vnet.cidr field is set to the current networks.workers value.

For each of the target zones a subnet CIDR range must be specified. The specified CIDR range must be contained in the VNet CIDR specified above, or the VNet CIDR of your already existing VNet. In addition, the CIDR ranges must not overlap with the ranges of the other subnets.

ServiceEndpoints and NatGateways can be configured per subnet. Respectively, when networks.zones is specified, the fields networks.workers, networks.serviceEndpoints and networks.natGateway cannot be set. All the configuration for the subnets must be done inside the respective zone’s configuration.

Example:

apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
  vnet: # specify either 'name' and 'resourceGroup' or 'cidr'
    cidr: 10.250.0.0/16
  zones:
  - name: 1
    cidr: "10.250.0.0/24"
  - name: 2
    cidr: "10.250.1.0/24"
    natGateway:
      enabled: false
zoned: true

Migrating to zonal shoots with dedicated subnets per zone

For existing zonal clusters it is possible to migrate to a network layout with dedicated subnets per zone. The migration works by creating additional network resources as specified in the configuration and progressively rolling part of your existing nodes to use the new resources. To achieve a controlled rollout of your nodes, parts of the existing infrastructure must be preserved, which is why the following constraint is imposed:

One of your specified zones must have the exact same CIDR range as the current network.workers field. Here is an example of such migration:

infrastructureConfig:
  apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
  kind: InfrastructureConfig
  networks:
    vnet:
      cidr: 10.250.0.0/16
    workers: 10.250.0.0/19
  zoned: true

to

infrastructureConfig:
  apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
  kind: InfrastructureConfig
  networks:
    vnet:
      cidr: 10.250.0.0/16
    zones:
      - name: 3
        cidr: 10.250.0.0/19 # note the preservation of the 'workers' CIDR
# optionally add other zones 
    # - name: 2  
    #   cidr: 10.250.32.0/19
    #   natGateway:
    #     enabled: true
  zoned: true

Another more advanced example with user-provided public IP addresses for the NAT Gateway and how it can be migrated:

infrastructureConfig:
  apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
  kind: InfrastructureConfig
  networks:
    vnet:
      cidr: 10.250.0.0/16
    workers: 10.250.0.0/19
    natGateway:
      enabled: true
      zone: 1
      ipAddresses:
        - name: pip1
          resourceGroup: group
          zone: 1
        - name: pip2
          resourceGroup: group
          zone: 1
  zoned: true

to

infrastructureConfig:
  apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
  kind: InfrastructureConfig
  zoned: true
  networks:
    vnet:
      cidr: 10.250.0.0/16
    zones:
      - name: 1
        cidr: 10.250.0.0/19 # note the preservation of the 'workers' CIDR
        natGateway:
          enabled: true
          ipAddresses:
            - name: pip1
              resourceGroup: group
              zone: 1
            - name: pip2
              resourceGroup: group
              zone: 1
# optionally add other zones 
#     - name: 2  
#       cidr: 10.250.32.0/19
#       natGateway:
#         enabled: true
#         ipAddresses:
#           - name: pip3
#             resourceGroup: group

You can apply such a change to your shoot by issuing a kubectl patch command to replace your current .spec.provider.infrastructureConfig section:

$ cat new-infra.json

[
  {
    "op": "replace",
    "path": "/spec/provider/infrastructureConfig",
    "value": {
      "apiVersion": "azure.provider.extensions.gardener.cloud/v1alpha1",
      "kind": "InfrastructureConfig",
      "networks": {
        "vnet": {
          "cidr": "<your-vnet-cidr>"
        },
        "zones": [
          {
            "name": 1,
            "cidr": "10.250.0.0/24",
            "natGateway": {
              "enabled": true
            }
          },
          {
            "name": 1,
            "cidr": "10.250.1.0/24",
            "natGateway": {
              "enabled": true
            }
          },
        ]
      },
      "zoned": true
    }
  }
]

kubectl patch --type="json" --patch-file new-infra.json shoot <my-shoot>

⚠️ The migration to shoots with dedicated subnets per zone is a one-way process. Reverting the shoot to the previous configuration is not supported.

⚠️ During the migration a subset of the nodes will be rolled to the new subnets.

ControlPlaneConfig

The control plane configuration mainly contains values for the Azure-specific control plane components. Today, the only component deployed by the Azure extension is the cloud-controller-manager.

An example ControlPlaneConfig for the Azure extension looks as follows:

apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
cloudControllerManager:
  featureGates:
    RotateKubeletServerCertificate: true

The cloudControllerManager.featureGates contains a map of explicitly enabled or disabled feature gates. For production usage it's not recommended to use this field at all, as you can enable alpha features or disable beta/stable features, potentially impacting cluster stability. If you don't want to configure anything for the cloudControllerManager, simply omit the key in the YAML specification.

storage contains options for the storage-related control plane components. storage.managedDefaultStorageClass is enabled by default and will deploy a storageClass and mark it as default (via the storageclass.kubernetes.io/is-default-class annotation). storage.managedDefaultVolumeSnapshotClass is enabled by default and will deploy a volumeSnapshotClass and mark it as default (via the snapshot.storage.kubernetes.io/is-default-class annotation). In case you want to manage your own default storageClass or volumeSnapshotClass, you need to disable the respective options above, otherwise reconciliation of the control plane may fail.
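
A sketch of a ControlPlaneConfig that disables both managed default classes, e.g. when you want to manage your own defaults (field names as described above):

apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
storage:
  managedDefaultStorageClass: false
  managedDefaultVolumeSnapshotClass: false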

WorkerConfig

The Azure extension supports encryption for volumes as well as additional data volumes per machine. Please note that you cannot specify the encrypted flag for Azure disks, as they are encrypted by default/out-of-the-box. For each data volume, you have to specify a name. The following YAML is a snippet of a Shoot resource:

spec:
  provider:
    workers:
    - name: cpu-worker
      ...
      volume:
        type: Standard_LRS
        size: 20Gi
      dataVolumes:
      - name: kubelet-dir
        type: Standard_LRS
        size: 25Gi

Additionally, it supports other Azure-specific values that can be configured under .spec.provider.workers[].providerConfig.

An example WorkerConfig for the Azure extension looks like:

apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: WorkerConfig
nodeTemplate: # (to be specified only if the node capacity would be different from cloudprofile info during runtime)
  capacity:
    cpu: 2
    gpu: 1
    memory: 50Gi

The .nodeTemplate is used to specify resource information of the machine during runtime. This then helps in scale-from-zero. Some points to note for this field:

  • Currently, only cpu, gpu and memory are configurable.
  • A change in the value leads to a rolling update of the machines in the worker pool.
  • All the resources need to be specified.

Example Shoot manifest (non-zoned)

Please find below an example Shoot manifest for a non-zoned cluster:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-azure
  namespace: garden-dev
spec:
  cloudProfileName: azure
  region: westeurope
  secretBindingName: core-azure
  provider:
    type: azure
    infrastructureConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vnet:
          cidr: 10.250.0.0/16
        workers: 10.250.0.0/19
      zoned: false
    controlPlaneConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: ControlPlaneConfig
    workers:
    - name: worker-xoluy
      machine:
        type: Standard_D4_v3
      minimum: 2
      maximum: 2
      volume:
        size: 50Gi
        type: Standard_LRS
#      providerConfig:
#        apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
#        kind: WorkerConfig
#        nodeTemplate: # (to be specified only if the node capacity would be different from cloudprofile info during runtime)
#          capacity:
#            cpu: 2
#            gpu: 1
#            memory: 50Gi
  networking:
    type: calico
    pods: 100.96.0.0/11
    nodes: 10.250.0.0/16
    services: 100.64.0.0/13
  kubernetes:
    version: 1.28.2
  maintenance:
    autoUpdate:
      kubernetesVersion: true
      machineImageVersion: true
  addons:
    kubernetesDashboard:
      enabled: true
    nginxIngress:
      enabled: true

Example Shoot manifest (zoned)

Please find below an example Shoot manifest for a zoned cluster:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-azure
  namespace: garden-dev
spec:
  cloudProfileName: azure
  region: westeurope
  secretBindingName: core-azure
  provider:
    type: azure
    infrastructureConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vnet:
          cidr: 10.250.0.0/16
        workers: 10.250.0.0/19
      zoned: true
    controlPlaneConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: ControlPlaneConfig
    workers:
    - name: worker-xoluy
      machine:
        type: Standard_D4_v3
      minimum: 2
      maximum: 2
      volume:
        size: 50Gi
        type: Standard_LRS
      zones:
      - "1"
      - "2"
  networking:
    type: calico
    pods: 100.96.0.0/11
    nodes: 10.250.0.0/16
    services: 100.64.0.0/13
  kubernetes:
    version: 1.28.2
  maintenance:
    autoUpdate:
      kubernetesVersion: true
      machineImageVersion: true
  addons:
    kubernetesDashboard:
      enabled: true
    nginxIngress:
      enabled: true

Example Shoot manifest (zoned with NAT Gateways per zone)

Please find below an example Shoot manifest for a zoned cluster using NAT Gateways per zone:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-azure
  namespace: garden-dev
spec:
  cloudProfileName: azure
  region: westeurope
  secretBindingName: core-azure
  provider:
    type: azure
    infrastructureConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vnet:
          cidr: 10.250.0.0/16
        zones:
        - name: 1
          cidr: 10.250.0.0/24
          serviceEndpoints:
          - Microsoft.Storage
          - Microsoft.Sql
          natGateway:
            enabled: true
            idleConnectionTimeoutMinutes: 4
        - name: 2
          cidr: 10.250.1.0/24
          serviceEndpoints:
          - Microsoft.Storage
          - Microsoft.Sql
          natGateway:
            enabled: true
      zoned: true
    controlPlaneConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: ControlPlaneConfig
    workers:
    - name: worker-xoluy
      machine:
        type: Standard_D4_v3
      minimum: 2
      maximum: 2
      volume:
        size: 50Gi
        type: Standard_LRS
      zones:
      - "1"
      - "2"
  networking:
    type: calico
    pods: 100.96.0.0/11
    nodes: 10.250.0.0/16
    services: 100.64.0.0/13
  kubernetes:
    version: 1.28.2
  maintenance:
    autoUpdate:
      kubernetesVersion: true
      machineImageVersion: true
  addons:
    kubernetesDashboard:
      enabled: true
    nginxIngress:
      enabled: true

CSI volume provisioners

Every Azure shoot cluster will be deployed with the Azure Disk CSI driver and the Azure File CSI driver.
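
For example, the deployed drivers can be listed in the Shoot cluster; the names below are the upstream Azure CSI driver names and assume a default deployment:

# In the Shoot cluster.
kubectl get csidrivers
# Expected to include disk.csi.azure.com and file.csi.azure.com.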

Kubernetes Versions per Worker Pool

This extension supports gardener/gardener’s WorkerPoolKubernetesVersion feature gate, i.e., having worker pools with overridden Kubernetes versions since gardener-extension-provider-azure@v1.25.

Shoot CA Certificate and ServiceAccount Signing Key Rotation

This extension supports gardener/gardener’s ShootCARotation and ShootSARotation feature gates since gardener-extension-provider-azure@v1.28.

Miscellaneous

Azure Accelerated Networking

All worker machines of the cluster will be automatically configured to use Azure Accelerated Networking if the prerequisites are fulfilled. The prerequisites are that the cluster must be zoned and that the used machine type and operating system image version are compatible with Accelerated Networking. Availability Set based shoot clusters will not be enabled for Accelerated Networking, even if the machine type and operating system support it; this is necessary because all machines from the availability set would need to be scheduled on special hardware, more details can be found here. Supported machine types are listed in the CloudProfile in .spec.providerConfig.machineTypes[].acceleratedNetworking and the supported operating system image versions are defined in .spec.providerConfig.machineImages[].versions[].acceleratedNetworking.

Preview: Shoot clusters with VMSS Flexible Orchestration (VMSS Flex/VMO)

The machines of an Azure cluster can be created while being attached to an Azure Virtual Machine ScaleSet with flexible orchestration. The Virtual Machine ScaleSet with flexible orchestration feature is currently in preview and not yet generally available on Azure. Subscriptions need to join the preview to make use of the feature.

Azure VMSS Flex is intended to replace Azure AvailabilitySet for non-zoned Azure Shoot clusters in the mid-term (once the feature goes GA), as VMSS Flex comes with fewer disadvantages, e.g. no blocking machine operations and compatibility with Standard SKU load balancers.

To configure an Azure Shoot cluster which makes use of VMSS Flex, you need to do the following (see the example after this list):

  • The InfrastructureConfig of the Shoot configuration needs to contain .zoned=false
  • The Shoot resource needs to have the following annotation assigned: alpha.azure.provider.extensions.gardener.cloud/vmo=true
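
For example, the annotation can be assigned in the Garden cluster (the shoot name is a placeholder):

# In the Garden cluster.
kubectl annotate shoot <shoot-name> alpha.azure.provider.extensions.gardener.cloud/vmo=true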

Some key facts about VMSS Flex based clusters:

  • Unlike regular non-zonal Azure Shoot clusters, which have a primary AvailabilitySet that is shared between all machines in all worker pools of a Shoot cluster, a VMSS Flex based cluster has its own VMSS for each worker pool
  • In case the configuration of the VMSS changes (e.g. the number of fault domains in a region changes; configured in the CloudProfile), all machines of the worker pool need to be rolled
  • It is not possible to migrate an existing primary AvailabilitySet based Shoot cluster to a VMSS Flex based Shoot cluster and vice versa
  • VMSS Flex based clusters use Standard SKU LoadBalancers instead of the Basic SKU LoadBalancers used for AvailabilitySet based Shoot clusters