Using the KubeVirt provider extension with Gardener as end-user
The core.gardener.cloud/v1beta1.Shoot resource declares a few fields that are meant to contain provider-specific configuration.
This document describes what this configuration looks like for KubeVirt and provides an example Shoot manifest with minimal configuration that you can use to create a KubeVirt shoot cluster (without the landscape-specific information such as cloud profile names, secret binding names, etc.).
Provider Secret Data
Every shoot cluster references a SecretBinding, which itself references a Secret, and this Secret contains the kubeconfig of your KubeVirt provider cluster. This cluster is the cluster where KubeVirt itself is installed and that hosts the KubeVirt virtual machines used as shoot worker nodes. This Secret must look as follows:
apiVersion: v1
kind: Secret
metadata:
  name: provider-cluster-kubeconfig
  namespace: garden-dev
type: Opaque
data:
  kubeconfig: base64(kubeconfig)
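The Secret is then referenced by a SecretBinding in the same project namespace, which the Shoot points to via secretBindingName. A minimal sketch of such a SecretBinding, reusing the names from the example above, could look as follows:

apiVersion: core.gardener.cloud/v1beta1
kind: SecretBinding
metadata:
  name: provider-cluster-kubeconfig
  namespace: garden-dev
secretRef:
  name: provider-cluster-kubeconfig
  namespace: garden-dev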
Permissions
All KubeVirt resources (VirtualMachines, DataVolumes, etc.) are created in the namespace of the current context of the above kubeconfig, that is my-shoot in the example below:
...
current-context: provider-cluster
contexts:
- name: provider-cluster
  context:
    cluster: provider-cluster
    namespace: my-shoot
    user: provider-cluster-token
...
If no namespace is specified, the default namespace is assumed. You can use the same namespace for multiple shoots. The user specified in the kubeconfig must have permissions to read and write KubeVirt and Kubernetes core resources in this namespace.
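For example, if the kubeconfig user is backed by a service account in the provider cluster, you could bind it to a namespaced Role roughly along the following lines. This is only a sketch with an assumed name, and the exact resource list may need to be adjusted for your setup:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubevirt-shoot-access # assumed name, adjust as needed
  namespace: my-shoot
rules:
# KubeVirt virtual machines and their instances
- apiGroups: ["kubevirt.io"]
  resources: ["virtualmachines", "virtualmachineinstances"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# CDI data volumes used as VM disks
- apiGroups: ["cdi.kubevirt.io"]
  resources: ["datavolumes"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Kubernetes core resources in the same namespace
- apiGroups: [""]
  resources: ["secrets", "configmaps", "services", "pods"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]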
InfrastructureConfig
The infrastructure configuration can contain additional networks used by the shoot worker nodes. If this configuration is empty, all KubeVirt virtual machines used as shoot worker nodes use only the pod network of the provider cluster.
An example InfrastructureConfig for the KubeVirt extension looks as follows:
apiVersion: kubevirt.provider.extensions.gardener.cloud/v1alpha1
kind: InfrastructureConfig
networks:
  sharedNetworks:
  # Reference to the network defined by the NetworkAttachmentDefinition default/net-conf
  - name: net-conf
    namespace: default
  tenantNetworks:
  - name: network-1
    # Configuration for the CNI plugins bridge and firewall
    config: |
      {
        "cniVersion": "0.4.0",
        "name": "bridge-firewall",
        "plugins": [
          {
            "type": "bridge",
            "isGateway": true,
            "isDefaultGateway": true,
            "ipMasq": true,
            "ipam": {
              "type": "host-local",
              "subnet": "10.100.0.0/16"
            }
          },
          {
            "type": "firewall"
          }
        ]
      }
    # Don't attach the pod network at all, instead use this network as default
    default: true
A non-empty infrastructure configuration can contain:
- References to pre-existing, shared networks that can be shared between multiple shoots. These networks must exist in the provider cluster prior to shoot creation.
- CNI configurations for tenant networks that are created, updated, and deleted together with the shoot. If one of these networks is marked as default: true, it becomes the default network instead of the pod network of the provider cluster. This can be used to achieve a higher level of network isolation, since the networks of different shoots can be isolated from each other, and in some cases better performance.
Both shared and tenant networks are maintained in the provider cluster via Multus CNI NetworkAttachmentDefinition resources. For shared networks, these resources must be created in advance, while for tenant networks they are managed by the shoot reconciliation process.
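For illustration, the shared network default/net-conf referenced in the InfrastructureConfig example above would be backed by a NetworkAttachmentDefinition in the provider cluster roughly like the one below (the CNI configuration shown is only an assumed example):

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: net-conf
  namespace: default
spec:
  # Example CNI configuration, replace it with the actual configuration of your shared network
  config: |
    {
      "cniVersion": "0.4.0",
      "name": "net-conf",
      "type": "bridge",
      "bridge": "br0",
      "ipam": {
        "type": "host-local",
        "subnet": "10.200.0.0/16"
      }
    }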
In order to use any additional CNI plugins in a tenant network configuration, such as bridge or firewall in the above example, the plugin binaries must be present in the /opt/cni/bin directory of the provider cluster nodes. They can be installed manually by downloading a containernetworking/plugins release (not recommended except for testing a new configuration). Alternatively, they can be installed via a specially prepared daemon set that ensures the existence of the plugin binaries on each provider cluster node.
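A minimal sketch of such a daemon set is shown below. It assumes a container image that bundles the required CNI plugin binaries (the image names below are placeholders) and copies them to /opt/cni/bin on every node via a hostPath mount:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cni-plugins-installer # assumed name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cni-plugins-installer
  template:
    metadata:
      labels:
        app: cni-plugins-installer
    spec:
      initContainers:
      # Copies the bundled CNI plugin binaries (bridge, firewall, host-local, ...) onto the node
      - name: install-cni-plugins
        image: example.org/cni-plugins:latest # placeholder image bundling the plugin binaries
        command: ["sh", "-c", "cp -f /plugins/* /host/opt/cni/bin/"]
        volumeMounts:
        - name: cni-bin
          mountPath: /host/opt/cni/bin
      containers:
      # Keeps the pod running after the binaries have been installed
      - name: pause
        image: k8s.gcr.io/pause:3.2
      volumes:
      - name: cni-bin
        hostPath:
          path: /opt/cni/bin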
Note: Although it is possible to update the network configuration in InfrastructureConfig, any such changes will result in recreating all KubeVirt VMs so that the new network configuration is properly taken into account. This will be done automatically by the machine-controller-manager (MCM) using a rolling update.
ControlPlaneConfig
The control plane configuration contains options for the KubeVirt-specific control plane components. Currently, the only component deployed by the KubeVirt extension is the KubeVirt Cloud Controller Manager (CCM).
An example ControlPlaneConfig for the KubeVirt extension looks as follows:
apiVersion: kubevirt.provider.extensions.gardener.cloud/v1alpha1
kind: ControlPlaneConfig
cloudControllerManager:
  featureGates:
    CustomResourceValidation: true
The cloudControllerManager.featureGates field contains a map of explicitly enabled or disabled feature gates. For production usage it is not recommended to use this field at all, as you can enable alpha features or disable beta/stable features, potentially impacting cluster stability. If you don't want to configure anything for the CCM, simply omit the key in the YAML specification.
WorkerConfig
The KubeVirt extension supports specifying additional data volumes per machine in the worker pool. For each data volume, you must specify a name and a type.
Below is an example Shoot resource snippet with root volume and data volumes:
spec:
  provider:
    workers:
    - name: cpu-worker
      ...
      volume:
        type: default
        size: 20Gi
      dataVolumes:
      - name: volume-1
        type: default
        size: 10Gi
Note: The additional data volumes will be attached as blank disks to the KubeVirt VMs. These disks must be formatted and mounted manually in the VM before they can be used.
The KubeVirt extension does not currently support encryption for volumes.
It is also possible to specify additional KubeVirt-specific options for configuring the worker pools. They can be specified in .spec.provider.workers[].providerConfig and are evaluated by the KubeVirt worker controller when it reconciles the shoot machines.
An example WorkerConfig for the KubeVirt extension looks as follows:
apiVersion: kubevirt.provider.extensions.gardener.cloud/v1alpha1
kind: WorkerConfig
devices:
  # disks allows customizing the disks attached to the KubeVirt VM
  # see https://kubevirt.io/user-guide/#/creation/disks-and-volumes?id=disks-and-volumes for the full specification and options
  disks:
  # the name must match a defined dataVolume name
  # to modify the root volume, the name must be equal to 'root-disk'
  - name: root-disk # modify root-disk
    # disk type, see https://kubevirt.io/user-guide/#/creation/disks-and-volumes?id=disks for more types
    disk:
      # bus indicates the type of disk device to emulate
      bus: virtio
    # set the disk device cache
    cache: writethrough
    # dedicatedIOThread indicates this disk should have an exclusive IO thread
    dedicatedIOThread: true
  - name: volume-1 # modify the dataVolume named volume-1
    disk: {}
  # whether to have a random number generator from the host
  rng: {}
  # whether or not to enable virtio multi-queue for block devices
  blockMultiQueue: true
  # if specified, virtual network interfaces configured with a virtio bus will also enable the vhost multiqueue feature
  networkInterfaceMultiQueue: true
# cpu allows setting the CPU topology of the VMI
# See https://kubevirt.io/api-reference/master/definitions.html#_v1_cpu
cpu:
  # number of cores inside the VMI
  cores: 1
  # number of sockets inside the VMI
  sockets: 2
  # number of threads inside the VMI
  threads: 1
  # model specifies the CPU model of the VMI
  # list of available models: https://github.com/libvirt/libvirt/tree/master/src/cpu_map
  # and options: https://libvirt.org/formatdomain.html#cpu-model-and-topology
  model: "host-model"
  # features specifies the CPU features list inside the VMI
  features:
  - "pcid"
  # dedicatedCpuPlacement requests the scheduler to place the VirtualMachineInstance on a node
  # with dedicated pCPUs and pin the vCPUs to it
  dedicatedCpuPlacement: false
  # isolateEmulatorThread requests one more dedicated pCPU to be allocated for the VMI to place the emulator thread on it
  isolateEmulatorThread: false
# memory configuration for KubeVirt VMs, allows setting 'hugepages' and 'guest' settings
# See https://kubevirt.io/api-reference/master/definitions.html#_v1_memory
memory:
  # hugepages requires the appropriate feature gate to be enabled, take a look at the following links for more details:
  # * k8s - https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/
  # * okd - https://docs.okd.io/latest/scalability_and_performance/what-huge-pages-do-and-how-they-are-consumed-by-apps.html
  hugepages:
    pageSize: "2Mi"
  # guest allows specifying the amount of memory visible inside the guest OS. It must lie between requests and limits.
  # Defaults to the requested memory in the machineTypes.
  guest: "1Gi"
  # overcommitGuestOverhead informs the scheduler not to take the guest-management overhead into account and instead
  # put the overhead only into the container's memory limit. This can lead to crashes if
  # all memory is in use on a node. Defaults to false.
  # For more details take a look at https://kubevirt.io/user-guide/#/usage/overcommit?id=overcommit-the-guest-overhead
  overcommitGuestOverhead: true
# DNS policy for KubeVirt VMs. Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
# Defaults to 'ClusterFirst'.
# See https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
dnsPolicy: ClusterFirst
# DNS configuration for KubeVirt VMs, merged with the generated DNS configuration based on dnsPolicy.
# See https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
dnsConfig:
  nameservers:
  - 8.8.8.8
# Disable using pre-allocated data volumes. Defaults to 'false'.
disablePreAllocatedDataVolumes: true
Currently, these KubeVirt-specific options may include:
- The CPU topology and memory configuration of the KubeVirt VMs. For more information, see CPU.v1 and Memory.v1.
- The DNS policy and DNS configuration of the KubeVirt VMs. For more information, see DNS for Services and Pods.
- Whether to use pre-allocated data volumes with KubeVirt VMs. With pre-allocated data volumes (the default), a data volume is created in advance for each machine class, the OS image is imported into this volume only once, and the actual KubeVirt VM data volumes are cloned from this data volume. Typically, this significantly speeds up the data volume creation process. You can disable this feature by setting the disablePreAllocatedDataVolumes option to true.
Region and Zone Support
Nodes in the provider cluster may belong to provider-specific regions and zones, and Kubernetes would then use this information to spread pods across zones as described in Running in multiple zones. You may want to take advantage of these capabilities in the shoot cluster as well.
To achieve this, the KubeVirt provider extension ensures that the region and zones specified in the Shoot resource are taken into account when creating the KubeVirt VMs used as shoot cluster nodes.
Below is an example Shoot resource snippet with region and zones:
spec:
  region: europe-west1
  provider:
    ...
    workers:
    - name: cpu-worker
      ...
      zones:
      - europe-west1-c
      - europe-west1-d
The shoot region and zones must correspond to the region and zones of the provider cluster. A KubeVirt VM designated for a specific region and zone will only be scheduled on provider cluster nodes belonging to that region and zone. If there are no such nodes, or if they have insufficient resources, the KubeVirt VM may remain in the Pending state for a longer period and the shoot reconciliation may fail. Therefore, always make sure that the provider cluster contains nodes for all zones specified in the shoot.
If multiple zones are specified for a worker pool, the KubeVirt VMs will be equally distributed over these zones in the specified order.
If your provider cluster is not region and zone aware, or if it contains nodes that don't belong to any region or zone, you can use default as a region or zone name in the Shoot resource to target such nodes.
Note that region and zones are mandatory fields in the Shoot resource, so you must specify either a concrete region / zone or default.
Once the KubeVirt VMs are scheduled on the correct provider cluster nodes, the KubeVirt Cloud Controller Manager (CCM) mentioned above labels the shoot worker nodes themselves with the appropriate region and zone labels by propagating the region and zone from the provider cluster nodes, so that Kubernetes multi-zone capabilities are also available in the shoot cluster.
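For reference, region and zone membership of provider cluster nodes is typically expressed via the standard Kubernetes topology labels, for example (illustrative snippet only, the actual labels depend on how your provider cluster is set up):

apiVersion: v1
kind: Node
metadata:
  name: provider-node-1 # assumed node name
  labels:
    topology.kubernetes.io/region: europe-west1
    topology.kubernetes.io/zone: europe-west1-c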
Example Shoot Manifest
Please find below an example Shoot manifest for one availability zone:
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-kubevirt
  namespace: garden-dev
spec:
  cloudProfileName: kubevirt
  secretBindingName: provider-cluster-kubeconfig
  region: europe-west1
  provider:
    type: kubevirt
    # infrastructureConfig:
    #   apiVersion: kubevirt.provider.extensions.gardener.cloud/v1alpha1
    #   kind: InfrastructureConfig
    #   networks:
    #     tenantNetworks:
    #     - name: network-1
    #       config: "{...}"
    #       default: true
    # controlPlaneConfig:
    #   apiVersion: kubevirt.provider.extensions.gardener.cloud/v1alpha1
    #   kind: ControlPlaneConfig
    #   cloudControllerManager:
    #     featureGates:
    #       CustomResourceValidation: true
    workers:
    - name: cpu-worker
      machine:
        type: standard-1
        image:
          name: ubuntu
          version: "18.04"
      minimum: 1
      maximum: 2
      volume:
        type: default
        size: 20Gi
      # dataVolumes:
      # - name: volume-1
      #   type: default
      #   size: 10Gi
      # providerConfig:
      #   apiVersion: kubevirt.provider.extensions.gardener.cloud/v1alpha1
      #   kind: WorkerConfig
      #   disablePreAllocatedDataVolumes: true
      zones:
      - europe-west1-c
  networking:
    type: calico
    pods: 100.96.0.0/11
    # Must match the IPAM subnet of the default tenant network, if present.
    # Otherwise, must be the same as the provider cluster pod network range.
    nodes: 10.225.128.0/17 # 10.100.0.0/16
    services: 100.64.0.0/13
  kubernetes:
    version: 1.17.8
  maintenance:
    autoUpdate:
      kubernetesVersion: true
      machineImageVersion: true
  addons:
    kubernetesDashboard:
      enabled: true
    nginxIngress:
      enabled: true