2 - Azure Permissions
Azure Permissions
The following document describes the required Azure actions manage a Shoot cluster on Azure split by the different Azure provider/services.
Be aware some actions are just required if particilar deployment sceanrios or features e.g. bring your own vNet, use Azure-file, let the Shoot act as Seed etc. should be used.
Microsoft.Compute
# Required if a non zonal cluster based on Availability Set should be used.
Microsoft.Compute/availabilitySets/delete
Microsoft.Compute/availabilitySets/read
Microsoft.Compute/availabilitySets/write
# Required to let Kubernetes manage Azure disks.
Microsoft.Compute/disks/delete
Microsoft.Compute/disks/read
Microsoft.Compute/disks/write
# Required for to fetch meta information about disk and virtual machines sizes.
Microsoft.Compute/locations/diskOperations/read
Microsoft.Compute/locations/operations/read
Microsoft.Compute/locations/vmSizes/read
# Required if csi snapshot capabilities should be used and/or the Shoot should act as a Seed.
Microsoft.Compute/snapshots/delete
Microsoft.Compute/snapshots/read
Microsoft.Compute/snapshots/write
# Required to let Gardener/Machine-Controller-Manager manage the cluster nodes/machines.
Microsoft.Compute/virtualMachines/delete
Microsoft.Compute/virtualMachines/read
Microsoft.Compute/virtualMachines/start/action
Microsoft.Compute/virtualMachines/write
# Required if a non zonal cluster based on VMSS Flex (VMO) should be used.
Microsoft.Compute/virtualMachineScaleSets/delete
Microsoft.Compute/virtualMachineScaleSets/read
Microsoft.Compute/virtualMachineScaleSets/write
Microsoft.ManagedIdentity
# Required if a user provided Azure managed identity should attached to the cluster nodes.
Microsoft.ManagedIdentity/userAssignedIdentities/assign/action
Microsoft.ManagedIdentity/userAssignedIdentities/read
Microsoft.MarketplaceOrdering
# Required if nodes/machines should be created with images hosted on the Azure Marketplace.
Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/read
Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/write
Microsoft.Network
# Required to let Kubernetes manage services of type 'LoadBalancer'.
Microsoft.Network/loadBalancers/backendAddressPools/join/action
Microsoft.Network/loadBalancers/delete
Microsoft.Network/loadBalancers/read
Microsoft.Network/loadBalancers/write
# Required in case the Shoot should use NatGateway(s).
Microsoft.Network/natGateways/delete
Microsoft.Network/natGateways/join/action
Microsoft.Network/natGateways/read
Microsoft.Network/natGateways/write
# Required to let Gardener/Machine-Controller-Manager manage the cluster nodes/machines.
Microsoft.Network/networkInterfaces/delete
Microsoft.Network/networkInterfaces/ipconfigurations/join/action
Microsoft.Network/networkInterfaces/ipconfigurations/read
Microsoft.Network/networkInterfaces/join/action
Microsoft.Network/networkInterfaces/read
Microsoft.Network/networkInterfaces/write
# Required to let Gardener maintain the basic infrastructure of the Shoot cluster and maintaing LoadBalancer services.
Microsoft.Network/networkSecurityGroups/delete
Microsoft.Network/networkSecurityGroups/join/action
Microsoft.Network/networkSecurityGroups/read
Microsoft.Network/networkSecurityGroups/write
# Required for managing LoadBalancers and NatGateways.
Microsoft.Network/publicIPAddresses/delete
Microsoft.Network/publicIPAddresses/join/action
Microsoft.Network/publicIPAddresses/read
Microsoft.Network/publicIPAddresses/write
# Required for managing the basic infrastructure of a cluster and maintaing LoadBalancer services.
Microsoft.Network/routeTables/delete
Microsoft.Network/routeTables/join/action
Microsoft.Network/routeTables/read
Microsoft.Network/routeTables/routes/delete
Microsoft.Network/routeTables/routes/read
Microsoft.Network/routeTables/routes/write
Microsoft.Network/routeTables/write
# Required to let Gardener maintain the basic infrastructure of the Shoot cluster.
# Only a subset is required for the bring your own vNet scenario.
Microsoft.Network/virtualNetworks/delete # not required for bring your own vnet
Microsoft.Network/virtualNetworks/read
Microsoft.Network/virtualNetworks/subnets/delete
Microsoft.Network/virtualNetworks/subnets/join/action
Microsoft.Network/virtualNetworks/subnets/read
Microsoft.Network/virtualNetworks/subnets/write
Microsoft.Network/virtualNetworks/write # not required for bring your own vnet
Microsoft.Resources
# Required to let Gardener maintain the basic infrastructure of the Shoot cluster.
Microsoft.Resources/subscriptions/resourceGroups/delete
Microsoft.Resources/subscriptions/resourceGroups/read
Microsoft.Resources/subscriptions/resourceGroups/write
Microsoft.Storage
# Required if Azure File should be used and/or if the Shoot should act as Seed.
Microsoft.Storage/operations/read
Microsoft.Storage/storageAccounts/blobServices/containers/delete
Microsoft.Storage/storageAccounts/blobServices/containers/read
Microsoft.Storage/storageAccounts/blobServices/containers/write
Microsoft.Storage/storageAccounts/blobServices/read
Microsoft.Storage/storageAccounts/delete
Microsoft.Storage/storageAccounts/listkeys/action
Microsoft.Storage/storageAccounts/read
Microsoft.Storage/storageAccounts/write
3 - Deployment
Deployment of the Azure provider extension
Disclaimer: This document is NOT a step by step installation guide for the Azure provider extension and only contains some configuration specifics regarding the installation of different components via the helm charts residing in the Azure provider extension repository.
gardener-extension-admission-azure
Authentication against the Garden cluster
There are several authentication possibilities depending on whether or not the concept of Virtual Garden is used.
Virtual Garden is not used, i.e., the runtime
Garden cluster is also the target
Garden cluster.
Automounted Service Account Token
The easiest way to deploy the gardener-extension-admission-azure
component will be to not provide kubeconfig
at all. This way in-cluster configuration and an automounted service account token will be used. The drawback of this approach is that the automounted token will not be automatically rotated.
Service Account Token Volume Projection
Another solution will be to use Service Account Token Volume Projection combined with a kubeconfig
referencing a token file (see example below).
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority-data: <CA-DATA>
server: https://default.kubernetes.svc.cluster.local
name: garden
contexts:
- context:
cluster: garden
user: garden
name: garden
current-context: garden
users:
- name: garden
user:
tokenFile: /var/run/secrets/projected/serviceaccount/token
This will allow for automatic rotation of the service account token by the kubelet
. The configuration can be achieved by setting both .Values.global.serviceAccountTokenVolumeProjection.enabled: true
and .Values.global.kubeconfig
in the respective chart’s values.yaml
file.
Virtual Garden is used, i.e., the runtime
Garden cluster is different from the target
Garden cluster.
Service Account
The easiest way to setup the authentication will be to create a service account and the respective roles will be bound to this service account in the target
cluster. Then use the generated service account token and craft a kubeconfig
which will be used by the workload in the runtime
cluster. This approach does not provide a solution for the rotation of the service account token. However, this setup can be achieved by setting .Values.global.virtualGarden.enabled: true
and following these steps:
- Deploy the
application
part of the charts in the target
cluster. - Get the service account token and craft the
kubeconfig
. - Set the crafted
kubeconfig
and deploy the runtime
part of the charts in the runtime
cluster.
Client Certificate
Another solution will be to bind the roles in the target
cluster to a User
subject instead of a service account and use a client certificate for authentication. This approach does not provide a solution for the client certificate rotation. However, this setup can be achieved by setting both .Values.global.virtualGarden.enabled: true
and .Values.global.virtualGarden.user.name
, then following these steps:
- Generate a client certificate for the
target
cluster for the respective user. - Deploy the
application
part of the charts in the target
cluster. - Craft a
kubeconfig
using the already generated client certificate. - Set the crafted
kubeconfig
and deploy the runtime
part of the charts in the runtime
cluster.
Projected Service Account Token
This approach requires an already deployed and configured oidc-webhook-authenticator for the target
cluster. Also the runtime
cluster should be registered as a trusted identity provider in the target
cluster. Then projected service accounts tokens from the runtime
cluster can be used to authenticate against the target
cluster. The needed steps are as follows:
- Deploy OWA and establish the needed trust.
- Set
.Values.global.virtualGarden.enabled: true
and .Values.global.virtualGarden.user.name
. Note: username value will depend on the trust configuration, e.g., <prefix>:system:serviceaccount:<namespace>:<serviceaccount>
- Set
.Values.global.serviceAccountTokenVolumeProjection.enabled: true
and .Values.global.serviceAccountTokenVolumeProjection.audience
. Note: audience value will depend on the trust configuration, e.g., <cliend-id-from-trust-config>
. - Craft a kubeconfig (see example below).
- Deploy the
application
part of the charts in the target
cluster. - Deploy the
runtime
part of the charts in the runtime
cluster.
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority-data: <CA-DATA>
server: https://virtual-garden.api
name: virtual-garden
contexts:
- context:
cluster: virtual-garden
user: virtual-garden
name: virtual-garden
current-context: virtual-garden
users:
- name: virtual-garden
user:
tokenFile: /var/run/secrets/projected/serviceaccount/token
6 - Operations
Using the Azure provider extension with Gardener as an operator
The core.gardener.cloud/v1beta1.CloudProfile
resource declares a providerConfig
field that is meant to contain provider-specific configuration.
The core.gardener.cloud/v1beta1.Seed
resource is structured similarly.
Additionally, it allows configuring settings for the backups of the main etcds’ data of shoot clusters control planes running in this seed cluster.
This document explains the necessary configuration for the Azure provider extension.
CloudProfile
resource
This section describes, how the configuration for CloudProfile
s looks like for Azure by providing an example CloudProfile
manifest with minimal configuration that can be used to allow the creation of Azure shoot clusters.
CloudProfileConfig
The cloud profile configuration contains information about the real machine image IDs in the Azure environment (image urn
, id
, communityGalleryImageID
or sharedGalleryImageID
).
You have to map every version that you specify in .spec.machineImages[].versions
to an available VM image in your subscription.
The VM image can be either from the Azure Marketplace and will then get identified via a urn
, it can be a custom VM image from a shared image gallery and is then identified sharedGalleryImageID
, or it can be from a community image gallery and is then identified by its communityGalleryImageID
. You can use id
field also to specifiy the image location in the azure compute gallery (in which case it would have a different kind of path) but it is not recommended as it sometimes faces problems in cross subscription image sharing.
For each machine image version an architecture
field can be specified which specifies the CPU architecture of the machine on which given machine image can be used.
An example CloudProfileConfig
for the Azure extension looks as follows:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: CloudProfileConfig
countUpdateDomains:
- region: westeurope
count: 5
countFaultDomains:
- region: westeurope
count: 3
machineTypes:
- name: Standard_D3_v2
acceleratedNetworking: true
- name: Standard_X
machineImages:
- name: coreos
versions:
- version: 2135.6.0
urn: "CoreOS:CoreOS:Stable:2135.6.0"
# architecture: amd64 # optional
acceleratedNetworking: true
- name: myimage
versions:
- version: 1.0.0
id: "/subscriptions/<subscription ID where the gallery is located>/resourceGroups/myGalleryRG/providers/Microsoft.Compute/galleries/myGallery/images/myImageDefinition/versions/1.0.0"
- name: GardenLinuxCommunityImage
versions:
- version: 1.0.0
communityGalleryImageID: "/CommunityGalleries/gardenlinux-567905d8-921f-4a85-b423-1fbf4e249d90/Images/gardenlinux/Versions/576.1.1"
- name: SharedGalleryImageName
versions:
- version: 1.0.0
sharedGalleryImageID: "/SharedGalleries/sharedGalleryName/Images/sharedGalleryImageName/Versions/sharedGalleryImageVersionName"
The cloud profile configuration contains information about the update via .countUpdateDomains[]
and failure domain via .countFaultDomains[]
counts in the Azure regions you want to offer.
The .machineTypes[]
list contain provider specific information to the machine types e.g. if the machine type support Azure Accelerated Networking, see .machineTypes[].acceleratedNetworking
.
Additionally, it contains the real machine image identifiers in the Azure environment. You can provide either URN for Azure Market Place images or id of Shared Image Gallery images.
When Shared Image Gallery is used, you have to ensure that the image is available in the desired regions and the end-user subscriptions have access to the image or to the whole gallery.
You have to map every version that you specify in .spec.machineImages[].versions
here such that the Azure extension knows the machine image identifiers for every version you want to offer.
Furthermore, you can specify for each image version via .machineImages[].versions[].acceleratedNetworking
if Azure Accelerated Networking is supported.
Example CloudProfile
manifest
The possible values for .spec.volumeTypes[].name
on Azure are Standard_LRS
, StandardSSD_LRS
and Premium_LRS
. There is another volume type called UltraSSD_LRS
but this type is not supported to use as os disk. If an end user select a volume type whose name is not equal to one of the valid values then the machine will be created with the default volume type which belong to the selected machine type. Therefore it is recommended to configure only the valid values for the .spec.volumeType[].name
in the CloudProfile
.
Please find below an example CloudProfile
manifest:
apiVersion: core.gardener.cloud/v1beta1
kind: CloudProfile
metadata:
name: azure
spec:
type: azure
kubernetes:
versions:
- version: 1.28.2
- version: 1.23.8
expirationDate: "2022-10-31T23:59:59Z"
machineImages:
- name: coreos
versions:
- version: 2135.6.0
machineTypes:
- name: Standard_D3_v2
cpu: "4"
gpu: "0"
memory: 14Gi
- name: Standard_D4_v3
cpu: "4"
gpu: "0"
memory: 16Gi
volumeTypes:
- name: Standard_LRS
class: standard
usable: true
- name: StandardSSD_LRS
class: premium
usable: false
- name: Premium_LRS
class: premium
usable: false
regions:
- name: westeurope
providerConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: CloudProfileConfig
machineTypes:
- name: Standard_D3_v2
acceleratedNetworking: true
- name: Standard_D4_v3
countUpdateDomains:
- region: westeurope
count: 5
countFaultDomains:
- region: westeurope
count: 3
machineImages:
- name: coreos
versions:
- version: 2303.3.0
urn: CoreOS:CoreOS:Stable:2303.3.0
# architecture: amd64 # optional
acceleratedNetworking: true
- version: 2135.6.0
urn: "CoreOS:CoreOS:Stable:2135.6.0"
# architecture: amd64 # optional
Seed
resource
This provider extension does not support any provider configuration for the Seed
’s .spec.provider.providerConfig
field.
However, it supports managing of backup infrastructure, i.e., you can specify a configuration for the .spec.backup
field.
Backup configuration
A Seed of type azure
can be configured to perform backups for the main etcds’ of the shoot clusters control planes using Azure Blob storage.
The location/region where the backups will be stored defaults to the region of the Seed (spec.provider.region
), but can also be explicitly configured via the field spec.backup.region
.
The region of the backup can be different from where the Seed cluster is running.
However, usually it makes sense to pick the same region for the backup bucket as used for the Seed cluster.
Please find below an example Seed
manifest (partly) that configures backups using Azure Blob storage.
---
apiVersion: core.gardener.cloud/v1beta1
kind: Seed
metadata:
name: my-seed
spec:
provider:
type: azure
region: westeurope
backup:
provider: azure
region: westeurope # default region
secretRef:
name: backup-credentials
namespace: garden
...
The referenced secret has to contain the provider credentials of the Azure subscription.
Please take a look here on how to create an Azure Application, Service Principle and how to obtain credentials.
The example below demonstrates how the secret has to look like.
apiVersion: v1
kind: Secret
metadata:
name: core-azure
namespace: garden-dev
type: Opaque
data:
clientID: base64(client-id)
clientSecret: base64(client-secret)
subscriptionID: base64(subscription-id)
tenantID: base64(tenant-id)
Permissions for Azure Blob storage
Please make sure the Azure application has the following IAM roles.
Miscellaneous
Gardener managed Service Principals
The operators of the Gardener Azure extension can provide a list of managed service principals (technical users) that can be used for Azure Shoots.
This eliminates the need for users to provide own service principals for their clusters.
The user would need to grant the managed service principal access to their subscription with proper permissions.
As service principals are managed in an Azure Active Directory for each supported Active Directory, an own service principal needs to be provided.
In case the user provides an own service principal in the Shoot secret, this one will be used instead of the managed one provided by the operator.
Each managed service principal will be maintained in a Secret
like that:
apiVersion: v1
kind: Secret
metadata:
name: service-principal-my-tenant
namespace: extension-provider-azure
labels:
azure.provider.extensions.gardener.cloud/purpose: tenant-service-principal-secret
data:
tenantID: base64(my-tenant)
clientID: base64(my-service-princiapl-id)
clientSecret: base64(my-service-princiapl-secret)
type: Opaque
The user needs to provide in its Shoot secret a tenantID
and subscriptionID
.
The managed service principal will be assigned based on the tenantID
.
In case there is a managed service principal secret with a matching tenantID
, this one will be used for the Shoot.
If there is no matching managed service principal secret then the next Shoot operation will fail.
One of the benefits of having managed service principals is that the operator controls the lifecycle of the service principal and can rotate its secrets.
After the service principal secret has been rotated and the corresponding secret is updated, all Shoot clusters using it need to be reconciled or the last operation to be retried.
7 - Usage
Using the Azure provider extension with Gardener as end-user
The core.gardener.cloud/v1beta1.Shoot
resource declares a few fields that are meant to contain provider-specific configuration.
This document describes the configurable options for Azure and provides an example Shoot
manifest with minimal configuration that can be used to create an Azure cluster (modulo the landscape-specific information like cloud profile names, secret binding names, etc.).
Azure Provider Credentials
In order for Gardener to create a Kubernetes cluster using Azure infrastructure components, a Shoot has to provide credentials with sufficient permissions to the desired Azure subscription.
Every shoot cluster references a SecretBinding
which itself references a Secret
, and this Secret
contains the provider credentials of the Azure subscription.
The SecretBinding
is configurable in the Shoot cluster with the field secretBindingName
.
Create an Azure Application and Service Principle and obtain its credentials.
Please ensure that the Azure application (spn) has the IAM actions defined here assigned.
If no fine-grained permissions/actions required then simply assign the Contributor role.
The example below demonstrates how the secret containing the client credentials of the Azure Application has to look like:
apiVersion: v1
kind: Secret
metadata:
name: core-azure
namespace: garden-dev
type: Opaque
data:
clientID: base64(client-id)
clientSecret: base64(client-secret)
subscriptionID: base64(subscription-id)
tenantID: base64(tenant-id)
⚠️ Depending on your API usage it can be problematic to reuse the same Service Principal for different Shoot clusters due to rate limits.
Please consider spreading your Shoots over Service Principals from different Azure subscriptions if you are hitting those limits.
Managed Service Principals
The operators of the Gardener Azure extension can provide managed service principals.
This eliminates the need for users to provide an own service principal for a Shoot.
To make use of a managed service principal, the Azure secret of a Shoot cluster must contain only a subscriptionID
and a tenantID
field, but no clientID
and clientSecret
.
Removing those fields from the secret of an existing Shoot will also let it adopt the managed service principal.
Based on the tenantID
field, the Gardener extension will try to assign the managed service principal to the Shoot.
If no managed service principal can be assigned then the next operation on the Shoot will fail.
⚠️ The managed service principal need to be assigned to the users Azure subscription with proper permissions before using it.
InfrastructureConfig
The infrastructure configuration mainly describes how the network layout looks like in order to create the shoot worker nodes in a later step, thus, prepares everything relevant to create VMs, load balancers, volumes, etc.
An example InfrastructureConfig
for the Azure extension looks as follows:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
vnet: # specify either 'name' and 'resourceGroup' or 'cidr'
# name: my-vnet
# resourceGroup: my-vnet-resource-group
cidr: 10.250.0.0/16
# ddosProtectionPlanID: /subscriptions/test/resourceGroups/test/providers/Microsoft.Network/ddosProtectionPlans/test-ddos-protection-plan
workers: 10.250.0.0/19
# natGateway:
# enabled: false
# idleConnectionTimeoutMinutes: 4
# zone: 1
# ipAddresses:
# - name: my-public-ip-name
# resourceGroup: my-public-ip-resource-group
# zone: 1
# serviceEndpoints:
# - Microsoft.Test
# zones:
# - name: 1
# cidr: "10.250.0.0/24
# - name: 2
# cidr: "10.250.0.0/24"
# natGateway:
# enabled: false
zoned: false
# resourceGroup:
# name: mygroup
#identity:
# name: my-identity-name
# resourceGroup: my-identity-resource-group
# acrAccess: true
Currently, it’s not yet possible to deploy into existing resource groups, but in the future it will.
The .resourceGroup.name
field will allow specifying the name of an already existing resource group that the shoot cluster and all infrastructure resources will be deployed to.
Via the .zoned
boolean you can tell whether you want to use Azure availability zones or not.
If you don’t use zones then an availability set will be created and only basic load balancers will be used.
Zoned clusters use standard load balancers.
The networks.vnet
section describes whether you want to create the shoot cluster in an already existing VNet or whether to create a new one:
- If
networks.vnet.name
and networks.vnet.resourceGroup
are given then you have to specify the VNet name and VNet resource group name of the existing VNet that was created by other means (manually, other tooling, …). - If
networks.vnet.cidr
is given then you have to specify the VNet CIDR of a new VNet that will be created during shoot creation.
You can freely choose a private CIDR range. - Either
networks.vnet.name
and neworks.vnet.resourceGroup
or networks.vnet.cidr
must be present, but not both at the same time. - The
networks.vnet.ddosProtectionPlanID
field can be used to specify the id of a ddos protection plan which should be assigned to the VNet. This will only work for a VNet managed by Gardener. For externally managed VNets the ddos protection plan must be assigned by other means. - If a vnet name is given and cilium shoot clusters are created without a network overlay within one vnet make sure that the pod CIDR specified in
shoot.spec.networking.pods
is not overlapping with any other pod CIDR used in that vnet.
Overlapping pod CIDRs will lead to disfunctional shoot clusters.
The networks.workers
section describes the CIDR for a subnet that is used for all shoot worker nodes, i.e., VMs which later run your applications.
The specified CIDR range must be contained in the VNet CIDR specified above, or the VNet CIDR of your already existing VNet.
You can freely choose this CIDR and it is your responsibility to properly design the network layout to suit your needs.
In the networks.serviceEndpoints[]
list you can specify the list of Azure service endpoints which shall be associated with the worker subnet. All available service endpoints and their technical names can be found in the (Azure Service Endpoint documentation](https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview).
The networks.natGateway
section contains configuration for the Azure NatGateway which can be attached to the worker subnet of a Shoot cluster. Here are some key information about the usage of the NatGateway for a Shoot cluster:
- NatGateway usage is optional and can be enabled or disabled via
.networks.natGateway.enabled
. - If the NatGateway is not used then the egress connections initiated within the Shoot cluster will be nated via the LoadBalancer of the clusters (default Azure behaviour, see here).
- NatGateway is only available for zonal clusters
.zoned=true
. - The NatGateway is currently not zone redundantly deployed. That mean the NatGateway of a Shoot cluster will always be in just one zone. This zone can be optionally selected via
.networks.natGateway.zone
. - Caution: Modifying the
.networks.natGateway.zone
setting requires a recreation of the NatGateway and the managed public ip (automatically used if no own public ip is specified, see below). That mean you will most likely get a different public ip for egress connections. - It is possible to bring own zonal public ip(s) via
networks.natGateway.ipAddresses
. Those public ip(s) need to be in the same zone as the NatGateway (see networks.natGateway.zone
) and be of SKU standard
. For each public ip the name
, the resourceGroup
and the zone
need to be specified. - The field
networks.natGateway.idleConnectionTimeoutMinutes
allows the configuration of NAT Gateway’s idle connection timeout property. The idle timeout value can be adjusted from 4 minutes, up to 120 minutes. Omitting this property will set the idle timeout to its default value according to NAT Gateway’s documentation.
In the identity
section you can specify an Azure user-assigned managed identity which should be attached to all cluster worker machines. With identity.name
you can specify the name of the identity and with identity.resourceGroup
you can specify the resource group which contains the identity resource on Azure. The identity need to be created by the user upfront (manually, other tooling, …). Gardener/Azure Extension will only use the referenced one and won’t create an identity. Furthermore the identity have to be in the same subscription as the Shoot cluster. Via the identity.acrAccess
you can configure the worker machines to use the passed identity for pulling from an Azure Container Registry (ACR).
Caution: Adding, exchanging or removing the identity will require a rolling update of all worker machines in the Shoot cluster.
Apart from the VNet and the worker subnet the Azure extension will also create a dedicated resource group, route tables, security groups, and an availability set (if not using zoned clusters).
InfrastructureConfig with dedicated subnets per zone
Another deployment option for zonal clusters only, is to create and configure a separate subnet per availability zone. This network layout is recommended to users that require fine-grained control over their network setup. One prevalent usecase is to create a zone-redundant NAT Gateway deployment by taking advantage of the ability to deploy separate NAT Gateways for each subnet.
To use this configuration the following requirements must be met:
- the
zoned
field must be set to true
. - the
networks.vnet
section must not be empty and must contain a valid configuration. For existing clusters that were not using the networks.vnet
section, it is enough if networks.vnet.cidr
field is set to the current networks.worker
value.
For each of the target zones a subnet CIDR range must be specified. The specified CIDR range must be contained in the VNet CIDR specified above, or the VNet CIDR of your already existing VNet. In addition, the CIDR ranges must not overlap with the ranges of the other subnets.
ServiceEndpoints and NatGateways can be configured per subnet. Respectively, when networks.zones
is specified, the fields networks.workers
, networks.serviceEndpoints
and networks.natGateway
cannot be set. All the configuration for the subnets must be done inside the respective zone’s configuration.
Example:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
zoned: true
vnet: # specify either 'name' and 'resourceGroup' or 'cidr'
cidr: 10.250.0.0/16
zones:
- name: 1
cidr: "10.250.0.0/24"
- name: 2
cidr: "10.250.0.0/24"
natGateway:
enabled: false
Migrating to zonal shoots with dedicated subnets per zone
For existing zonal clusters it is possible to migrate to a network layout with dedicated subnets per zone. The migration works by creating additional network resources as specified in the configuration and progressively roll part of your existing nodes to use the new resources. To achieve the controlled rollout of your nodes, parts of the existing infrastructure must be preserved which is why the following constraint is imposed:
One of your specified zones must have the exact same CIDR range as the current network.workers
field. Here is an example of such migration:
infrastructureConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
vnet:
cidr: 10.250.0.0/16
workers: 10.250.0.0/19
zoned: true
to
infrastructureConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
vnet:
cidr: 10.250.0.0/16
zones:
- name: 3
cidr: 10.250.0.0/19 # note the preservation of the 'workers' CIDR
# optionally add other zones
# - name: 2
# cidr: 10.250.32.0/19
# natGateway:
# enabled: true
zoned: true
Another more advanced example with user-provided public IP addresses for the NAT Gateway and how it can be migrated:
infrastructureConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
vnet:
cidr: 10.250.0.0/16
workers: 10.250.0.0/19
natGateway:
enabled: true
zone: 1
ipAddresses:
- name: pip1
resourceGroup: group
zone: 1
- name: pip2
resourceGroup: group
zone: 1
zoned: true
to
infrastructureConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
zoned: true
networks:
vnet:
cidr: 10.250.0.0/16
zones:
- name: 1
cidr: 10.250.0.0/19 # note the preservation of the 'workers' CIDR
natGateway:
enabled: true
ipAddresses:
- name: pip1
resourceGroup: group
zone: 1
- name: pip2
resourceGroup: group
zone: 1
# optionally add other zones
# - name: 2
# cidr: 10.250.32.0/19
# natGateway:
# enabled: true
# ipAddresses:
# - name: pip3
# resourceGroup: group
You can apply such change to your shoot by issuing a kubectl patch
command to replace your current .spec.provider.infrastructureConfig
section:
$ cat new-infra.json
[
{
"op": "replace",
"path": "/spec/provider/infrastructureConfig",
"value": {
"apiVersion": "azure.provider.extensions.gardener.cloud/v1alpha1",
"kind": "InfrastructureConfig",
"networks": {
"vnet": {
"cidr": "<your-vnet-cidr>"
},
"zones": [
{
"name": 1,
"cidr": "10.250.0.0/24",
"natGateway": {
"enabled": true
}
},
{
"name": 1,
"cidr": "10.250.1.0/24",
"natGateway": {
"enabled": true
}
},
]
},
"zoned": true
}
}
]
kubectl patch --type="json" --patch-file new-infra.json shoot <my-shoot>
⚠️ The migration to shoots with dedicated subnets per zone is a one-way process. Reverting the shoot to the previous configuration is not supported.
⚠️ During the migration a subset of the nodes will be rolled to the new subnets.
ControlPlaneConfig
The control plane configuration mainly contains values for the Azure-specific control plane components.
Today, the only component deployed by the Azure extension is the cloud-controller-manager
.
An example ControlPlaneConfig
for the Azure extension looks as follows:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
cloudControllerManager:
# featureGates:
# SomeKubernetesFeature: true
The cloudControllerManager.featureGates
contains a map of explicitly enabled or disabled feature gates.
For production usage it’s not recommend to use this field at all as you can enable alpha features or disable beta/stable features, potentially impacting the cluster stability.
If you don’t want to configure anything for the cloudControllerManager
simply omit the key in the YAML specification.
storage
contains options for storage-related control plane component.
storage.managedDefaultStorageClass
is enabled by default and will deploy a storageClass
and mark it as a default (via the storageclass.kubernetes.io/is-default-class
annotation)
storage.managedDefaultVolumeSnapshotClass
is enabled by default and will deploy a volumeSnapshotClass
and mark it as a default (via the snapshot.storage.kubernetes.io/is-default-classs
annotation)
In case you want to manage your own default storageClass
or volumeSnapshotClass
you need to disable the respective options above, otherwise reconciliation of the controlplane may fail.
WorkerConfig
The Azure extension supports encryption for volumes plus support for additional data volumes per machine.
Please note that you cannot specify the encrypted
flag for Azure disks as they are encrypted by default/out-of-the-box.
For each data volume, you have to specify a name.
The following YAML is a snippet of a Shoot
resource:
spec:
provider:
workers:
- name: cpu-worker
...
volume:
type: Standard_LRS
size: 20Gi
dataVolumes:
- name: kubelet-dir
type: Standard_LRS
size: 25Gi
Additionally, it supports for other Azure-specific values and could be configured under .spec.provider.workers[].providerConfig
An example WorkerConfig
for the Azure extension looks like:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: WorkerConfig
nodeTemplate: # (to be specified only if the node capacity would be different from cloudprofile info during runtime)
capacity:
cpu: 2
gpu: 1
memory: 50Gi
diagnosticsProfile:
enabled: true
# storageURI: https://<storage-account-name>.blob.core.windows.net/
The .nodeTemplate
is used to specify resource information of the machine during runtime. This then helps in Scale-from-Zero.
Some points to note for this field:
- Currently only cpu, gpu and memory are configurable.
- a change in the value lead to a rolling update of the machine in the worker pool
- all the resources needs to be specified
The .diagnosticsProfile
is used to enable machine boot diagnostics (disabled per default).
A storage account is used for storing vm’s boot console output and screenshots.
If .diagnosticsProfile.StorageURI
is not specified azure managed storage will be used (recommended way).
Example Shoot
manifest (non-zoned)
Please find below an example Shoot
manifest for a non-zoned cluster:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
name: johndoe-azure
namespace: garden-dev
spec:
cloudProfileName: azure
region: westeurope
secretBindingName: core-azure
provider:
type: azure
infrastructureConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
vnet:
cidr: 10.250.0.0/16
workers: 10.250.0.0/19
zoned: false
controlPlaneConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
workers:
- name: worker-xoluy
machine:
type: Standard_D4_v3
minimum: 2
maximum: 2
volume:
size: 50Gi
type: Standard_LRS
# providerConfig:
# apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
# kind: WorkerConfig
# nodeTemplate: # (to be specified only if the node capacity would be different from cloudprofile info during runtime)
# capacity:
# cpu: 2
# gpu: 1
# memory: 50Gi
networking:
type: calico
pods: 100.96.0.0/11
nodes: 10.250.0.0/16
services: 100.64.0.0/13
kubernetes:
version: 1.28.2
maintenance:
autoUpdate:
kubernetesVersion: true
machineImageVersion: true
addons:
kubernetesDashboard:
enabled: true
nginxIngress:
enabled: true
Example Shoot
manifest (zoned)
Please find below an example Shoot
manifest for a zoned cluster:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
name: johndoe-azure
namespace: garden-dev
spec:
cloudProfileName: azure
region: westeurope
secretBindingName: core-azure
provider:
type: azure
infrastructureConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
vnet:
cidr: 10.250.0.0/16
workers: 10.250.0.0/19
zoned: true
controlPlaneConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
workers:
- name: worker-xoluy
machine:
type: Standard_D4_v3
minimum: 2
maximum: 2
volume:
size: 50Gi
type: Standard_LRS
zones:
- "1"
- "2"
networking:
type: calico
pods: 100.96.0.0/11
nodes: 10.250.0.0/16
services: 100.64.0.0/13
kubernetes:
version: 1.28.2
maintenance:
autoUpdate:
kubernetesVersion: true
machineImageVersion: true
addons:
kubernetesDashboard:
enabled: true
nginxIngress:
enabled: true
Example Shoot
manifest (zoned with NAT Gateways per zone)
Please find below an example Shoot
manifest for a zoned cluster using NAT Gateways per zone:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
name: johndoe-azure
namespace: garden-dev
spec:
cloudProfileName: azure
region: westeurope
secretBindingName: core-azure
provider:
type: azure
infrastructureConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
vnet:
cidr: 10.250.0.0/16
zones:
- name: 1
cidr: 10.250.0.0/24
serviceEndpoints:
- Microsoft.Storage
- Microsoft.Sql
natGateway:
enabled: true
idleConnectionTimeoutMinutes: 4
- name: 2
cidr: 10.250.1.0/24
serviceEndpoints:
- Microsoft.Storage
- Microsoft.Sql
natGateway:
enabled: true
zoned: true
controlPlaneConfig:
apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
workers:
- name: worker-xoluy
machine:
type: Standard_D4_v3
minimum: 2
maximum: 2
volume:
size: 50Gi
type: Standard_LRS
zones:
- "1"
- "2"
networking:
type: calico
pods: 100.96.0.0/11
nodes: 10.250.0.0/16
services: 100.64.0.0/13
kubernetes:
version: 1.28.2
maintenance:
autoUpdate:
kubernetesVersion: true
machineImageVersion: true
addons:
kubernetesDashboard:
enabled: true
nginxIngress:
enabled: true
CSI volume provisioners
Every Azure shoot cluster will be deployed with the Azure Disk CSI driver and the Azure File CSI driver.
Kubernetes Versions per Worker Pool
This extension supports gardener/gardener
’s WorkerPoolKubernetesVersion
feature gate, i.e., having worker pools with overridden Kubernetes versions since gardener-extension-provider-azure@v1.25
.
Shoot CA Certificate and ServiceAccount
Signing Key Rotation
This extension supports gardener/gardener
’s ShootCARotation
and ShootSARotation
feature gates since gardener-extension-provider-azure@v1.28
.
Miscellaneous
Azure Accelerated Networking
All worker machines of the cluster will be automatically configured to use Azure Accelerated Networking if the prerequisites are fulfilled.
The prerequisites are that the cluster must be zoned, and the used machine type and operating system image version are compatible for Accelerated Networking.
Availability Set
based shoot clusters will not be enabled for accelerated networking even if the machine type and operating system support it, this is necessary because all machines from the availability set must be scheduled on special hardware, more daitls can be found here.
Supported machine types are listed in the CloudProfile in .spec.providerConfig.machineTypes[].acceleratedNetworking
and the supported operating system image versions are defined in .spec.providerConfig.machineImages[].versions[].acceleratedNetworking
.
Preview: Shoot clusters with VMSS Flexible Orchestration (VMSS Flex/VMO)
The machines of an Azure cluster can be created while being attached to an Azure Virtual Machine ScaleSet with flexible orchestraion.
The Virtual Machine ScaleSet with flexible orchestration feature is currently in preview and not yet general available on Azure.
Subscriptions need to join the preview to make use of the feature.
Azure VMSS Flex is intended to replace Azure AvailabilitySet for non-zoned Azure Shoot clusters in the mid-term (once the feature goes GA) as VMSS Flex come with less disadvantages like no blocking machine operations or compability with Standard
SKU loadbalancer etc.
To configure an Azure Shoot cluster which make use of VMSS Flex you need to do the following:
- The
InfrastructureConfig
of the Shoot configuration need to contain .zoned=false
- Shoot resource need to have the following annotation assigned:
alpha.azure.provider.extensions.gardener.cloud/vmo=true
Some key facts about VMSS Flex based clusters:
- Unlike regular non-zonal Azure Shoot clusters, which have a primary AvailabilitySet which is shared between all machines in all worker pools of a Shoot cluster, a VMSS Flex based cluster has an own VMSS for each workerpool
- In case the configuration of the VMSS will change (e.g. amount of fault domains in a region change; configured in the CloudProfile) all machines of the worker pool need to be rolled
- It is not possible to migrate an existing primary AvailabilitySet based Shoot cluster to VMSS Flex based Shoot cluster and vice versa
- VMSS Flex based clusters are using
Standard
SKU LoadBalancers instead of Basic
SKU LoadBalancers for AvailabilitySet based Shoot clusters