그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그 그

  4 minute read  

Using the Networking Calico extension with Gardener as end-user

The core.gardener.cloud/v1beta1.Shoot resource declares a networking field that is meant to contain network-specific configuration.

In this document we are describing how this configuration looks like for Calico and provide an example Shoot manifest with minimal configuration that you can use to create a cluster.

Calico Typha

Calico Typha is an optional component of Project Calico designed to offload the Kubernetes API server. The Typha daemon sits between the datastore (such as the Kubernetes API server which is the one used by Gardener managed Kubernetes) and many instances of Felix. Typha’s main purpose is to increase scale by reducing each node’s impact on the datastore. You can opt-out Typha via .spec.networking.providerConfig.typha.enabled=false of your Shoot manifest. By default the Typha is enabled.

EBPF Dataplane

Calico can be run in ebpf dataplane mode. This has several benefits, calico scales to higher troughput, uses less cpu per GBit and has native support for kubernetes services (without needing kube-proxy). To switch to a pure ebpf dataplane it is recommended to run without an overlay network. The following configuration can be used to run without an overlay and without kube-proxy.

An example ebpf dataplane NetworkingConfig manifest:

apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
ebpfDataplane:
  enabled: true
overlay:
  enabled: false

To disable kube-proxy set the enabled field to false in the shoot manifest.

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: ebpf-shoot
  namespace: garden-dev
spec:
  kubernetes:
    kubeProxy:
      enabled: false

Know limitations of the EBPF Dataplane

Please note that the default settings for calico’s ebpf dataplane may interfere with accelerated networking in azure rendering nodes with accelerated networking unusable in the network. The reason for this is that calico does not ignore the accelerated networking interface enP... as it should, but applies its ebpf programs to it. A simple mitigation for this is to adapt the FelixConfiguration default and ensure that the bpfDataIfacePattern does not include enP.... Per default bpfDataIfacePattern is not set. The default value for this option can be found here. For example, you could apply the following change:

$ kubectl edit felixconfiguration default
...
apiVersion: crd.projectcalico.org/v1
kind: FelixConfiguration
metadata:
  ...
  name: default
  ...
spec:
  bpfDataIfacePattern: ^((en|wl|ww|sl|ib)[opsx].*|(eth|wlan|wwan).*|tunl0$|vxlan.calico$|wireguard.cali$|wg-v6.cali$)
  ...

AutoScaling

Autoscaling defines how the calico components are automatically scaled. It allows to use either static resource assignment, vertical pod or cluster-proportional autoscaler (default: cluster-proportional).

The cluster-proportional autoscaling mode is preferable when conditions require minimal disturbances and vpa mode for improved cluster resource utilization. Static resource assignments causes no disruptions due to autoscaling, but has no dynamics to handle changing demands.

Please note VPA must be enabled on the shoot as a pre-requisite to enabling vpa mode.

An example NetworkingConfig manifest for vertical pod autoscaling:

apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
autoScaling:
  mode: "vpa"

An example NetworkingConfig manifest for static resource assignment:

apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
autoScaling:
  mode: "static"
  resources:
    node:
      cpu: 100m
      memory: 100Mi
    typha:
      cpu: 100m
      memory: 100Mi

ℹ️ Please note that in static mode, you have the option to configure the resource requests for calico-node and calico-typha. If not specified, default settings will be used. If the resource requests are chosen too low, it might impact the stability/performance of the cluster. Specifying the resource requests for any other autoscaling mode has no effect.

Example NetworkingConfig manifest

An example NetworkingConfig for the Calico extension looks as follows:

apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
ipam:
  type: host-local
  cidr: usePodCIDR
vethMTU: 1440
typha:
  enabled: true
overlay:
  enabled: true
autoScaling:
  mode: "vpa"

Example Shoot manifest

Please find below an example Shoot manifest with calico networking configratations:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-azure
  namespace: garden-dev
spec:
  cloudProfileName: azure
  region: westeurope
  secretBindingName: core-azure
  provider:
    type: azure
    infrastructureConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vnet:
          cidr: 10.250.0.0/16
        workers: 10.250.0.0/19
      zoned: true
    controlPlaneConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: ControlPlaneConfig
    workers:
    - name: worker-xoluy
      machine:
        type: Standard_D4_v3
      minimum: 2
      maximum: 2
      volume:
        size: 50Gi
        type: Standard_LRS
      zones:
      - "1"
      - "2"
  networking:
    type: calico
    nodes: 10.250.0.0/16
    providerConfig:
      apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
      kind: NetworkConfig
      ipam:
        type: host-local
      vethMTU: 1440
      overlay:
        enabled: true
      typha:
        enabled: false
  kubernetes:
    version: 1.28.3
  maintenance:
    autoUpdate:
      kubernetesVersion: true
      machineImageVersion: true
  addons:
    kubernetesDashboard:
      enabled: true
    nginxIngress:
      enabled: true

Known Limitations in conjunction with NodeLocalDNS

If NodeLocalDNS is active in a shoot cluster, which uses calico as CNI without overlay network, it may be impossible to block DNS traffic to the cluster DNS server via network policy. This is due to FELIX_CHAININSERTMODE being set to APPEND instead of INSERT in case SNAT is being applied to requests to the infrastructure DNS server. In this scenario the iptables rules of NodeLocalDNS already accept the traffic before the network policies are checked.

This only applies to traffic directed to NodeLocalDNS. If blocking of all DNS traffic is desired via network policy the pod dnsPolicy should be changed to Default so that the cluster DNS is not used. Alternatives are usage of overlay network or disabling of NodeLocalDNS.