Calico CNI
Gardener extension controller for the Calico CNI network plugin
This controller operates on the `Network` resource in the `extensions.gardener.cloud/v1alpha1` API group. It manages objects that request Calico networking configuration (`.spec.type=calico`):
```yaml
---
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Network
metadata:
  name: calico-network
  namespace: shoot--core--test-01
spec:
  type: calico
  clusterCIDR: 192.168.0.0/24
  serviceCIDR: 10.96.0.0/24
  providerConfig:
    apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
    kind: NetworkConfig
    overlay:
      enabled: false
```
Please find a concrete example in the `example` folder. All Calico-specific configuration should be placed in the `providerConfig` section. If additional configuration is required, it should be added to the networking-calico chart in `controllers/networking-calico/charts/internal/calico/values.yaml`, and the corresponding code parts should be adapted (for example, in `controllers/networking-calico/pkg/charts/utils.go`).
Once the `Network` resource is applied, the networking-calico controller creates all the necessary managed resources, which are picked up by the gardener-resource-manager, which in turn applies the network extension resources to the shoot cluster.
Finally, after a successful reconciliation, a status similar to the one below is expected:
```yaml
status:
  lastOperation:
    description: Successfully reconciled network
    lastUpdateTime: "..."
    progress: 100
    state: Succeeded
    type: Reconcile
  observedGeneration: 1
  providerStatus:
    apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
    kind: NetworkStatus
```
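For reference, you can inspect this status in the shoot's control plane namespace on the seed, e.g. (namespace and resource name taken from the example above):

```bash
kubectl -n shoot--core--test-01 get network calico-network -o yaml
```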
Compatibility
The following table lists known compatibility issues of this extension controller with other Gardener components.
| Calico Extension | Gardener | Action | Notes |
|------------------|----------|--------|-------|
| `>= v1.30.0` | `< v1.63.0` | Please first update Gardener components to `>= v1.63.0`. | Without the mentioned minimum Gardener version, Calico `Pod`s are scheduled not only to dedicated system component nodes in the shoot cluster. |
How to start using or developing this extension controller locally
You can run the controller locally on your machine by executing `make start`. Please make sure to have the kubeconfig pointed to the cluster you want to connect to.
Static code checks and tests can be executed by running `make verify`. We are using Go modules for Golang package dependency management and Ginkgo/Gomega for testing.
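For example, a typical local workflow might look like this (the kubeconfig path is only illustrative):

```bash
# Point the controller at the cluster you want to connect to (path is an example).
export KUBECONFIG=$HOME/.kube/config

# Run the extension controller locally.
make start

# Run static code checks and tests.
make verify
```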
Feedback and Support
Feedback and contributions are always welcome. Please report bugs or suggestions as GitHub issues or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).
Learn more!
Please find further resources about our project here:
1 - Deployment
Deployment of the networking Calico extension
Disclaimer: This document is NOT a step-by-step deployment guide for the networking Calico extension; it only contains some configuration specifics regarding the deployment of different components via the Helm charts residing in the networking Calico extension repository.
gardener-extension-admission-calico
Authentication against the Garden cluster
There are several authentication possibilities depending on whether or not the concept of Virtual Garden is used.
Virtual Garden is not used, i.e., the runtime Garden cluster is also the target Garden cluster.
Automounted Service Account Token
The easiest way to deploy the gardener-extension-admission-calico component is to not provide a kubeconfig at all. In this case, the in-cluster configuration and an automounted service account token are used. The drawback of this approach is that the automounted token is not automatically rotated.
Service Account Token Volume Projection
Another solution is to use Service Account Token Volume Projection combined with a kubeconfig referencing a token file (see the example below).
```yaml
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <CA-DATA>
    server: https://default.kubernetes.svc.cluster.local
  name: garden
contexts:
- context:
    cluster: garden
    user: garden
  name: garden
current-context: garden
users:
- name: garden
  user:
    tokenFile: /var/run/secrets/projected/serviceaccount/token
```
This will allow for automatic rotation of the service account token by the kubelet. The configuration can be achieved by setting both `.Values.global.serviceAccountTokenVolumeProjection.enabled: true` and `.Values.global.kubeconfig` in the respective chart's `values.yaml` file.
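A minimal sketch of the corresponding Helm values, using the keys mentioned above (the surrounding structure is illustrative, and the kubeconfig content is the projected-token variant shown before):

```yaml
# Sketch only: key names are taken from the text above.
global:
  serviceAccountTokenVolumeProjection:
    enabled: true
  kubeconfig: |
    apiVersion: v1
    kind: Config
    # ... kubeconfig referencing /var/run/secrets/projected/serviceaccount/token,
    # as in the example above
```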
Virtual Garden is used, i.e., the runtime Garden cluster is different from the target Garden cluster.
Service Account
The easiest way to set up the authentication is to create a service account in the target cluster and bind the respective roles to it. Then use the generated service account token and craft a kubeconfig that will be used by the workload in the runtime cluster. This approach does not provide a solution for the rotation of the service account token. However, this setup can be achieved by setting `.Values.global.virtualGarden.enabled: true` and following these steps:
- Deploy the application part of the charts in the target cluster.
- Get the service account token and craft the kubeconfig (see the sketch below).
- Set the crafted kubeconfig and deploy the runtime part of the charts in the runtime cluster.
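As an illustration, one way to obtain a token for the service account deployed by the application part could look like this (a sketch only; the namespace, service account name, and duration are placeholders and depend on your setup):

```bash
# Request a token for the service account in the target cluster (placeholder names).
kubectl --kubeconfig <target-cluster-kubeconfig> -n <namespace> \
  create token <service-account-name> --duration=48h
# Put the returned token into the users[].user.token field of the crafted kubeconfig.
```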
Client Certificate
Another solution is to bind the roles in the target cluster to a User subject instead of a service account and use a client certificate for authentication. This approach does not provide a solution for client certificate rotation. However, this setup can be achieved by setting both `.Values.global.virtualGarden.enabled: true` and `.Values.global.virtualGarden.user.name`, then following these steps:
- Generate a client certificate for the target cluster for the respective user.
- Deploy the application part of the charts in the target cluster.
- Craft a kubeconfig using the already generated client certificate (see the sketch below).
- Set the crafted kubeconfig and deploy the runtime part of the charts in the runtime cluster.
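A kubeconfig using a client certificate could then look roughly like this (the server URL and user name are placeholders; the user name has to match the User subject bound in the target cluster, i.e. `.Values.global.virtualGarden.user.name`):

```yaml
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <CA-DATA>
    server: https://virtual-garden.api   # placeholder
  name: virtual-garden
contexts:
- context:
    cluster: virtual-garden
    user: admission-calico   # placeholder user name
  name: virtual-garden
current-context: virtual-garden
users:
- name: admission-calico
  user:
    client-certificate-data: <CLIENT-CERT-DATA>
    client-key-data: <CLIENT-KEY-DATA>
```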
Projected Service Account Token
This approach requires an already deployed and configured oidc-webhook-authenticator (OWA) for the target cluster. Also, the runtime cluster should be registered as a trusted identity provider in the target cluster. Then projected service account tokens from the runtime cluster can be used to authenticate against the target cluster. The needed steps are as follows:
- Deploy OWA and establish the needed trust.
- Set `.Values.global.virtualGarden.enabled: true` and `.Values.global.virtualGarden.user.name`. Note: The username value will depend on the trust configuration, e.g., `<prefix>:system:serviceaccount:<namespace>:<serviceaccount>`.
- Set `.Values.global.serviceAccountTokenVolumeProjection.enabled: true` and `.Values.global.serviceAccountTokenVolumeProjection.audience`. Note: The audience value will depend on the trust configuration, e.g., `<client-id-from-trust-config>`.
- Craft a kubeconfig (see the example below).
- Deploy the application part of the charts in the target cluster.
- Deploy the runtime part of the charts in the runtime cluster.
```yaml
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <CA-DATA>
    server: https://virtual-garden.api
  name: virtual-garden
contexts:
- context:
    cluster: virtual-garden
    user: virtual-garden
  name: virtual-garden
current-context: virtual-garden
users:
- name: virtual-garden
  user:
    tokenFile: /var/run/secrets/projected/serviceaccount/token
```
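Putting the values mentioned in the steps above together, a sketch of the corresponding Helm values for this setup could look like this (the user name and audience are placeholders that depend on your trust configuration):

```yaml
# Sketch only: values depend on the oidc-webhook-authenticator trust configuration.
global:
  virtualGarden:
    enabled: true
    user:
      name: <prefix>:system:serviceaccount:<namespace>:<serviceaccount>
  serviceAccountTokenVolumeProjection:
    enabled: true
    audience: <client-id-from-trust-config>
```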
2 - Operations
Using the Calico networking extension with Gardener as operator
This document explains configuration options supported by the networking-calico extension.
Run calico-node in non-privileged and non-root mode
Feature State: Alpha
Motivation
Running containers in privileged mode is not recommended, because privileged containers run with all Linux capabilities enabled and can access the host's resources. Running containers in privileged mode opens up a number of security threats, such as a breakout to the underlying host OS.
Support for non-privileged and non-root mode
The Calico project has preliminary support for running the calico-node component in non-privileged mode (see this guide). Similar to the Tigera Calico operator, the networking-calico extension can also run calico-node in non-privileged and non-root mode. This feature is controlled via a feature gate named `NonPrivilegedCalicoNode`. The feature gates are configured in the ControllerConfiguration of networking-calico. The corresponding ControllerDeployment configuration that enables `NonPrivilegedCalicoNode` would look like:
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: ControllerDeployment
metadata:
  name: networking-calico
type: helm
providerConfig:
  values:
    chart: <omitted>
    config:
      featureGates:
        NonPrivilegedCalicoNode: true
```
Limitations
- The support for the non-privileged mode in the Calico project is not ready for productive usage. The upstream documentation states that in non-privileged mode the support for features added after Calico v3.21 is not guaranteed.
- Calico in non-privileged mode does not support the eBPF dataplane. Hence, when the eBPF dataplane is enabled, calico-node has to run in privileged mode (even when the `NonPrivilegedCalicoNode` feature gate is enabled).
- (At the time of writing this guide) the issue projectcalico/calico#5348 is not yet addressed.
- (At the time of writing this guide) the upstream adoption seems to be low. The Calico charts and manifests in projectcalico/calico run calico-node in privileged mode.
3 - Shoot Overlay Network
Enable / disable overlay network for shoots with Calico
Gardener can be used with or without the overlay network.
Starting with certain extension versions, the default configuration of shoot clusters is without overlay network.
Understanding overlay network
Overlay networking permits the routing of packets between pods located on multiple nodes, even if the pod and node networks are not the same.
This is done by encapsulating pod packets in the node network so that routing can be done as usual. When the overlay network is enabled, Calico uses `ipip` encapsulation, which (simply put) sends an IP packet as the workload of another IP packet.
In order to simplify the troubleshooting of problems and reduce the latency of packets traveling between nodes, the overlay network is disabled by default as stated above for all new clusters.
This means that the routing is done directly through the VPC routing table. Basically, when a new node is created, it is assigned a slice (usually a /24) of the pod network, and all future pods on that node will get addresses from this slice. Then, the cloud-controller-manager updates the cloud provider router to add a new route (all packets destined for that slice should go to that node).
This has the following advantages:
- Less work for the node, as encapsulation takes some CPU cycles.
- A slightly bigger maximum transmission unit (MTU), resulting in slightly better performance, i.e. potentially more workload bytes per packet.
- A more direct and simpler setup, which makes problems much easier to troubleshoot.
In the case where multiple shoots are in the same VPC and the overlay network is disabled, if the pod networks are not configured properly, there is a very strong chance that some pod IP addresses will overlap, which causes all sorts of hard-to-diagnose problems. To avoid this, make sure that the pod CIDRs of the shoots in the same VPC do not overlap with each other.
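For example, you can list the pod networks of all shoots in a project to check for overlaps (a sketch; it assumes access to the garden cluster, and the project namespace is a placeholder):

```bash
kubectl get shoots -n garden-<project> \
  -o custom-columns=NAME:.metadata.name,PODS:.spec.networking.pods
```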
Enabling the overlay network
In certain cases, the overlay network might be preferable if, for example, the customer wants to create multiple clusters in the same VPC without ensuring there’s no overlap between the pod networks.
To enable the overlay network, add the following to the shoot’s YAML:
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  ...
spec:
  ...
  networking:
    type: calico
    providerConfig:
      apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
      kind: NetworkConfig
      overlay:
        enabled: true
  ...
```
Disabling the overlay network
Inversely, here is how to disable the overlay network:
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  ...
spec:
  ...
  networking:
    type: calico
    providerConfig:
      apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
      kind: NetworkConfig
      overlay:
        enabled: false
  ...
```
How to know if a cluster is using overlay or not?
You can look at any of the older nodes. If there are `tunl0` devices, the overlay network was used at least at some point in time.
Another way is to look at the Network object in the shoot's control plane namespace on the seed (see the example above).
Do we have some documentation somewhere on how to do the migration?
No, not yet. The migration from no overlay to overlay is fairly simple: just set the configuration as specified above. The other way around is more complicated, as the Network configuration needs to be changed AND the local routes need to be cleaned up.
Unfortunately, the change will be rolled out slowly (one calico-node at a time), so it implies some network outages during the migration.
AWS implementation
On AWS, it is not possible to use the cloud-controller-manager for managing the routes as it does not support multiple route tables, which Gardener creates. Therefore, a custom controller is created to manage the routes.
4 - Usage
Using the Networking Calico extension with Gardener as end-user
The `core.gardener.cloud/v1beta1.Shoot` resource declares a `networking` field that is meant to contain network-specific configuration.
This document describes what this configuration looks like for Calico and provides an example `Shoot` manifest with minimal configuration that you can use to create a cluster.
Calico Typha
Calico Typha is an optional component of Project Calico designed to offload the Kubernetes API server. The Typha daemon sits between the datastore (the Kubernetes API server, which is the datastore used by Gardener-managed Kubernetes) and the many instances of Felix. Typha's main purpose is to increase scale by reducing each node's impact on the datastore. You can opt out of Typha via `.spec.networking.providerConfig.typha.enabled=false` in your Shoot manifest. By default, Typha is enabled.
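For example, the corresponding providerConfig for opting out of Typha would look like this:

```yaml
apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
typha:
  enabled: false
```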
EBPF Dataplane
Calico can be run in eBPF dataplane mode. This has several benefits: Calico scales to higher throughput, uses less CPU per GBit, and has native support for Kubernetes services (without needing kube-proxy).
To switch to a pure eBPF dataplane, it is recommended to run without an overlay network. The following configuration can be used to run without an overlay and without kube-proxy.
An example eBPF dataplane `NetworkConfig` manifest:
```yaml
apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
ebpfDataplane:
  enabled: true
overlay:
  enabled: false
```
To disable kube-proxy, set the enabled field to false in the Shoot manifest:
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: ebpf-shoot
  namespace: garden-dev
spec:
  kubernetes:
    kubeProxy:
      enabled: false
```
Known limitations of the EBPF Dataplane
Please note that the default settings for Calico's eBPF dataplane may interfere with accelerated networking in Azure, rendering nodes with accelerated networking unusable in the network. The reason for this is that Calico does not ignore the accelerated networking interface `enP...` as it should, but applies its eBPF programs to it. A simple mitigation for this is to adapt the `FelixConfiguration` `default` and ensure that the `bpfDataIfacePattern` does not include `enP...`.
Per default, `bpfDataIfacePattern` is not set. The default value for this option can be found here.
For example, you could apply the following change:
```
$ kubectl edit felixconfiguration default
...
apiVersion: crd.projectcalico.org/v1
kind: FelixConfiguration
metadata:
  ...
  name: default
  ...
spec:
  bpfDataIfacePattern: ^((en|wl|ww|sl|ib)[opsx].*|(eth|wlan|wwan).*|tunl0$|vxlan.calico$|wireguard.cali$|wg-v6.cali$)
...
```
AutoScaling
Autoscaling defines how the Calico components are automatically scaled. It allows using either static resource assignment, the vertical pod autoscaler, or the cluster-proportional autoscaler (default: cluster-proportional).
The cluster-proportional autoscaling mode is preferable when conditions require minimal disturbances, while vpa mode is preferable for improved cluster resource utilization. Static resource assignment causes no disruptions due to autoscaling, but has no dynamics to handle changing demands.
Please note that VPA must be enabled on the shoot as a prerequisite to enabling vpa mode.
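For reference, VPA can be enabled in the Shoot specification, e.g.:

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
spec:
  kubernetes:
    verticalPodAutoscaler:
      enabled: true
```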
An example `NetworkConfig` manifest for vertical pod autoscaling:

```yaml
apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
autoScaling:
  mode: "vpa"
```
An example `NetworkConfig` manifest for static resource assignment:

```yaml
apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
autoScaling:
  mode: "static"
  resources:
    node:
      cpu: 100m
      memory: 100Mi
    typha:
      cpu: 100m
      memory: 100Mi
```
ℹ️ Please note that in static mode, you have the option to configure the resource requests for calico-node and calico-typha. If not specified, default settings will be used.
If the resource requests are chosen too low, it might impact the stability/performance of the cluster.
Specifying the resource requests for any other autoscaling mode has no effect.
Example NetworkConfig manifest
An example `NetworkConfig` for the Calico extension looks as follows:

```yaml
apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
kind: NetworkConfig
ipam:
  type: host-local
  cidr: usePodCIDR
vethMTU: 1440
typha:
  enabled: true
overlay:
  enabled: true
autoScaling:
  mode: "vpa"
```
Example Shoot manifest
Please find below an example `Shoot` manifest with Calico networking configuration:

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: johndoe-azure
  namespace: garden-dev
spec:
  cloudProfileName: azure
  region: westeurope
  secretBindingName: core-azure
  provider:
    type: azure
    infrastructureConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vnet:
          cidr: 10.250.0.0/16
        workers: 10.250.0.0/19
      zoned: true
    controlPlaneConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: ControlPlaneConfig
    workers:
    - name: worker-xoluy
      machine:
        type: Standard_D4_v3
      minimum: 2
      maximum: 2
      volume:
        size: 50Gi
        type: Standard_LRS
      zones:
      - "1"
      - "2"
  networking:
    type: calico
    nodes: 10.250.0.0/16
    providerConfig:
      apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
      kind: NetworkConfig
      ipam:
        type: host-local
      vethMTU: 1440
      overlay:
        enabled: true
      typha:
        enabled: false
  kubernetes:
    version: 1.28.3
  maintenance:
    autoUpdate:
      kubernetesVersion: true
      machineImageVersion: true
  addons:
    kubernetesDashboard:
      enabled: true
    nginxIngress:
      enabled: true
```
Known Limitations in conjunction with NodeLocalDNS
If NodeLocalDNS is active in a shoot cluster that uses Calico as CNI without an overlay network, it may be impossible to block DNS traffic to the cluster DNS server via network policy. This is because `FELIX_CHAININSERTMODE` is set to `APPEND` instead of `INSERT` in case SNAT is applied to requests to the infrastructure DNS server. In this scenario, the `iptables` rules of NodeLocalDNS already accept the traffic before the network policies are checked.
This only applies to traffic directed to NodeLocalDNS. If blocking of all DNS traffic via network policy is desired, the pod `dnsPolicy` should be changed to `Default` so that the cluster DNS is not used (see the example below). Alternatives are using the overlay network or disabling NodeLocalDNS.
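For example, a pod that should bypass the cluster DNS (and can therefore be restricted by network policy) could set the dnsPolicy as follows (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  dnsPolicy: Default   # use the node's DNS configuration instead of the cluster DNS
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```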