Using the GCP provider extension with Gardener as end-user
The core.gardener.cloud/v1beta1.Shoot resource declares a few fields that are meant to contain provider-specific configuration.
This document describes the configurable options for GCP and provides an example Shoot manifest with minimal configuration that can be used to create a GCP cluster (modulo the landscape-specific information like cloud profile names, secret binding names, etc.).
GCP Provider Credentials
In order for Gardener to create a Kubernetes cluster using GCP infrastructure components, a Shoot has to provide credentials with sufficient permissions to the desired GCP project.
Every shoot cluster references a SecretBinding, which itself references a Secret, and this Secret contains the provider credentials of the GCP project.
The SecretBinding is configurable in the Shoot cluster with the field secretBindingName.
The required credentials for the GCP project are a Service Account Key to authenticate as a GCP Service Account. A service account is a special account that can be used by services and applications to interact with Google Cloud Platform APIs. Applications can use service account credentials to authorize themselves to a set of APIs and perform actions within the permissions granted to the service account.
Make sure to enable the Google Identity and Access Management (IAM) API. Create a Service Account to be used for the Shoot cluster, and grant it at least the following IAM roles:
- Service Account Admin
- Service Account Token Creator
- Service Account User
- Compute Admin
Create a JSON Service Account key for the Service Account.
Provide it in the Secret (base64 encoded for field serviceaccount.json) that is referenced by the SecretBinding in the Shoot cluster configuration.
The Secret must look as follows:
apiVersion: v1
kind: Secret
metadata:
  name: core-gcp
  namespace: garden-dev
type: Opaque
data:
  serviceaccount.json: base64(serviceaccount-json)
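To make the reference chain concrete, a SecretBinding pointing at the core-gcp Secret could look like the sketch below. The binding name core-gcp matches the secretBindingName used in the example Shoot manifest in this document. The provider.type field is an assumption that applies to Gardener versions which require it on SecretBindings.

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: SecretBinding
metadata:
  name: core-gcp          # referenced by the Shoot via spec.secretBindingName
  namespace: garden-dev
provider:
  type: gcp               # assumption: needed on Gardener versions that require it
secretRef:
  name: core-gcp          # the Secret that holds serviceaccount.json
  namespace: garden-dev
```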
⚠️ Depending on your API usage it can be problematic to reuse the same Service Account Key for different Shoot clusters due to rate limits. Please consider spreading your Shoots over multiple Service Accounts on different GCP projects if you are hitting those limits, see https://cloud.google.com/compute/docs/api-rate-limits.
The infrastructure configuration mainly describes what the network layout looks like so that the shoot worker nodes can be created in a later step. It thus prepares everything relevant for creating VMs, load balancers, volumes, etc.
An InfrastructureConfig for the GCP extension looks as follows:
apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
# vpc:
#   name: my-vpc
#   cloudRouter:
#     name: my-cloudrouter
  workers: 10.250.0.0/16
# internal: 10.251.0.0/16
# cloudNAT:
#   minPortsPerVM: 2048
#   natIPNames:
#   - name: manualnat1
#   - name: manualnat2
# flowLogs:
#   aggregationInterval: INTERVAL_5_SEC
#   flowSampling: 0.2
#   metadata: INCLUDE_ALL_METADATA
The networks.vpc section describes whether you want to create the shoot cluster in an already existing VPC or whether to create a new one:
- If networks.vpc.name is given, then you have to specify the VPC name of the existing VPC that was created by other means (manually, other tooling, …). If you want to get a fresh VPC for the shoot, then just omit the networks.vpc field.
- If a VPC name is not given, then we will create the cloud router + NAT gateway to ensure that worker nodes don’t get external IPs.
- If a VPC name is given, then a cloud router name must also be given. Failure to do so results in validation errors and possibly clusters without egress connectivity.
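As an illustration of the existing-VPC case, the following sketch sets both the VPC name and the cloud router name. The names my-vpc and my-cloudrouter are the placeholders from the commented-out example and stand for resources that were created outside of Gardener.

```yaml
apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
  vpc:
    name: my-vpc              # existing VPC, created by other means
    cloudRouter:
      name: my-cloudrouter    # must be given whenever vpc.name is given
  workers: 10.250.0.0/16      # subnet CIDR for the shoot worker nodes
```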
The networks.workers section describes the CIDR for a subnet that is used for all shoot worker nodes, i.e., VMs which later run your applications.
The networks.internal section is optional and can describe a CIDR for a subnet that is used for internal load balancers.
networks.cloudNAT.minPortsPerVM is optional and defines the minimum number of ports allocated to a VM for the CloudNAT.
networks.cloudNAT.natIPNames is optional and specifies the names of the manual IP addresses which should be used by the NAT gateway.
The specified CIDR ranges must be contained in the VPC CIDR specified above, or the VPC CIDR of your already existing VPC. You can freely choose these CIDRs and it is your responsibility to properly design the network layout to suit your needs.
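A configuration that uses the optional internal subnet and the CloudNAT settings might look like the sketch below. The values are taken from the commented-out example. It is an assumption here that manualnat1 and manualnat2 are names of IP addresses that already exist in the GCP project.

```yaml
apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
  workers: 10.250.0.0/16
  internal: 10.251.0.0/16     # subnet for internal load balancers
  cloudNAT:
    minPortsPerVM: 2048       # minimum number of NAT ports per VM
    natIPNames:               # assumption: addresses already exist in the project
    - name: manualnat1
    - name: manualnat2
```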
The networks.flowLogs section describes the configuration for the VPC flow logs. In order to enable the VPC flow logs, at least one of the following parameters needs to be specified in the flow log section:
- networks.flowLogs.aggregationInterval: an optional parameter describing the aggregation interval for collecting flow logs. For more details, see aggregation_interval reference.
- networks.flowLogs.flowSampling: an optional parameter describing the sampling rate of VPC flow logs within the subnetwork, where 1.0 means all collected logs are reported and 0.0 means no logs are reported. For more details, see flow_sampling reference.
- networks.flowLogs.metadata: an optional parameter describing whether metadata fields should be added to the reported VPC flow logs. For more details, see metadata reference.
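Since a single parameter is enough to enable flow logs, a minimal sketch could set only the sampling rate. The value 0.2 is the one from the commented-out example and is illustrative, not a recommendation.

```yaml
apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
  workers: 10.250.0.0/16
  flowLogs:
    flowSampling: 0.2   # presence of this one parameter enables VPC flow logs
```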
Apart from the VPC and the subnets the GCP extension will also create a dedicated service account for this shoot, and firewall rules.
The control plane configuration mainly contains values for the GCP-specific control plane components.
Today, the only component deployed by the GCP extension is the cloud-controller-manager.
A ControlPlaneConfig for the GCP extension looks as follows:
apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
zone: europe-west1-b
cloudControllerManager:
  featureGates:
    CustomResourceValidation: true
The zone field tells the cloud-controller-manager in which zone it should mainly operate.
You can still create clusters in multiple availability zones; however, the cloud-controller-manager requires one “main” zone.
⚠️ You always have to specify this field!
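The relationship between the main zone and a multi-zone worker pool can be sketched with the following fragment of a Shoot's spec.provider section. Other required worker fields are omitted for brevity, and europe-west1-c is a hypothetical second zone added for illustration.

```yaml
provider:
  type: gcp
  controlPlaneConfig:
    apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
    kind: ControlPlaneConfig
    zone: europe-west1-b      # the "main" zone for the cloud-controller-manager
  workers:
  - name: worker-xoluy
    zones:                    # worker nodes may still span several zones
    - europe-west1-b
    - europe-west1-c          # hypothetical second zone
```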
The cloudControllerManager.featureGates field contains a map of explicitly enabled or disabled feature gates.
For production usage it’s not recommended to use this field at all, as you can enable alpha features or disable beta/stable features, potentially impacting the cluster stability.
If you don’t want to configure anything for the cloudControllerManager, simply omit the key in the YAML specification.
The worker configuration contains:
Local SSD interface for the additional volumes attached to GCP worker machines.
If you attach the disk with SCRATCH type, either an NVMe interface or a SCSI interface must be specified. It is only meaningful to provide this volume interface if only SCRATCH data volumes are used.
Service Account with its specified scopes, authorized for this worker.
These are service accounts created in advance that generate access tokens, which can be accessed through the metadata server and used to authenticate applications on the instance.
GPU with its type and count per node. This will attach that GPU to all the machines in the worker group. Note:
- A rolling upgrade of the worker group would be triggered in case the
- Some machineTypes, like the a2 family, come with an already attached GPU of a100 type and a pre-defined count. If your workerPool consists of those machineTypes, please do not specify any GPU configuration.
- Sufficient GPU quota is needed in the GCP project. This includes quota to support autoscaling if enabled.
- GPU-attached machines can’t be live migrated during host maintenance events. Find out how to handle that in your application here
- The GPU count specified here is considered when forming the node template during scale-from-zero in Cluster Autoscaler.
A WorkerConfig for GCP looks as follows:
apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
kind: WorkerConfig
volume:
  interface: NVME
serviceAccount:
  email: firstname.lastname@example.org
  scopes:
  - https://www.googleapis.com/auth/cloud-platform
gpu:
  acceleratorType: nvidia-tesla-t4
  count: 1
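To show where such a WorkerConfig might live, here is a sketch of a single worker pool entry under spec.provider.workers of a Shoot. It assumes that the WorkerConfig is embedded via the pool's providerConfig field and that local SSDs are requested as dataVolumes of type SCRATCH. The volume name scratch is hypothetical, and the 375Gi size reflects the fixed size of a GCP local SSD.

```yaml
workers:
- name: worker-xoluy
  machine:
    type: n1-standard-4
  minimum: 2
  maximum: 2
  volume:
    size: 50Gi
    type: pd-standard
  dataVolumes:
  - name: scratch             # hypothetical volume name
    type: SCRATCH             # local SSD, so an interface must be specified
    size: 375Gi               # assumption: one local SSD of the fixed GCP size
  zones:
  - europe-west1-b
  providerConfig:             # assumption: WorkerConfig is embedded here
    apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
    kind: WorkerConfig
    volume:
      interface: NVME
```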
Please find below an example Shoot manifest:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-gcp
  namespace: garden-dev
spec:
  cloudProfileName: gcp
  region: europe-west1
  secretBindingName: core-gcp
  provider:
    type: gcp
    infrastructureConfig:
      apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        workers: 10.250.0.0/16
    controlPlaneConfig:
      apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
      kind: ControlPlaneConfig
      zone: europe-west1-b
    workers:
    - name: worker-xoluy
      machine:
        type: n1-standard-4
      minimum: 2
      maximum: 2
      volume:
        size: 50Gi
        type: pd-standard
      zones:
      - europe-west1-b
  networking:
    nodes: 10.250.0.0/16
    type: calico
  kubernetes:
    version: 1.24.3
  maintenance:
    autoUpdate:
      kubernetesVersion: true
      machineImageVersion: true
  addons:
    kubernetes-dashboard:
      enabled: true
    nginx-ingress:
      enabled: true
CSI volume provisioners
Every GCP shoot cluster that has at least Kubernetes v1.18 will be deployed with the GCP PD CSI driver.
It is compatible with the legacy in-tree volume provisioner that was deprecated by the Kubernetes community and will be removed in future versions of Kubernetes.
End-users might want to update their custom StorageClasses to the new CSI provisioner.
Shoot clusters with Kubernetes v1.17 or less will use the in-tree kubernetes.io/gce-pd volume provisioner in the kube-controller-manager and the kubelet.
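As a sketch of what an updated custom StorageClass could look like, the example below uses pd.csi.storage.gke.io, which is the provisioner name registered by the upstream GCP PD CSI driver. The class name custom-pd-ssd and the chosen parameters are hypothetical and should be adapted to your needs.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: custom-pd-ssd                 # hypothetical name
provisioner: pd.csi.storage.gke.io    # CSI driver, instead of kubernetes.io/gce-pd
parameters:
  type: pd-ssd                        # disk type, e.g. pd-standard or pd-ssd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```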
Kubernetes Versions per Worker Pool
This extension supports
WorkerPoolKubernetesVersion feature gate, i.e., having worker pools with overridden Kubernetes versions since
Note that this feature is only usable for
.spec.kubernetes.version is greater or equal than the CSI migration version (
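A worker pool with an overridden Kubernetes version can be sketched as the following Shoot fragment. It assumes the override is set through the pool's kubernetes.version field. The version 1.23.9 is a hypothetical value, and the versions actually allowed depend on your cloud profile and on Gardener's version skew rules.

```yaml
spec:
  kubernetes:
    version: 1.24.3           # control plane version of the Shoot
  provider:
    workers:
    - name: worker-xoluy
      kubernetes:
        version: 1.23.9       # hypothetical override for this worker pool only
```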
Shoot CA Certificate and ServiceAccount Signing Key Rotation
This extension supports
ShootSARotation feature gates since