2024
Innovation Unleashed: A Deep Dive into the 5th Gardener Community Hackathon
The Gardener community recently concluded its 5th Hackathon, a week-long event that brought together multiple companies to collaborate on common topics of interest. The Hackathon, held at Schlosshof Freizeitheim in Schelklingen, Germany, was a testament to the power of collective effort and open-source, producing a tremendous number of results in a short time and moving the Gardener project forward with innovative solutions.
A Week of Collaboration and Innovation
The Hackathon addressed a wide range of topics, from improving the maturity of the Gardener API to harmonizing development setups and automating additional preparation tasks for Gardener installations. The event also saw the introduction of new resources and configurations, the rewriting of VPN components from Bash to Golang, and the exploration of a Tailscale-based VPN to secure shoot clusters.
Key Achievements
- OCI Helm Release Reference for `ControllerDeployment`: The Hackathon introduced the `core.gardener.cloud/v1` API, which supports OCI repository-based Helm chart references. This innovation reduces operational complexity and enables reusability for other scenarios (a minimal sketch of such a reference follows this list).
- Local `gardener-operator` Development Setup with gardenlet: A new Skaffold configuration was created to harmonize the development setups for Gardener. This configuration deploys `gardener-operator` and its `Garden` CRD together with a deployment of `gardenlet` to register a seed cluster, allowing for a full-fledged Gardener setup.
- Extensions for Garden Cluster via `gardener-operator`: The Hackathon focused on automating additional preparation tasks for Gardener installations. The `Garden` controller was augmented to deploy extensions as part of its reconciliation flow, reducing operational complexity.
- Gardenlet Self-Upgrades for Unmanaged `Seed`s: A new `Gardenlet` resource was introduced, allowing for the specification of deployment values and component configurations. A new controller within `gardenlet` watches these resources and updates the `gardenlet`'s Helm chart and configuration accordingly, effectively implementing self-upgrades.
- Type-Safe Configurability in `OperatingSystemConfig`: The Hackathon improved the configurability of the `OperatingSystemConfig` for `containerd`, DNS, NTP, etc. The `OperatingSystemConfig` API was augmented to support `containerd`-config related use cases.
- Expose Shoot API Server in Tailscale VPN: The Hackathon explored the use of a Tailscale-based VPN to secure shoot clusters. A document was compiled explaining how shoot owners can expose their API server within a Tailscale VPN.
- Rewrite `gardener/vpn2` from Bash to Golang: The Hackathon improved the VPN components by rewriting them in Golang. All functionality was successfully rewritten, and pull requests have been opened for `gardener/vpn2` and the integration into `gardener/gardener`.
- Pure IPv6-Based VPN Tunnel: The Hackathon addressed the restriction of the VPN network CIDR by switching the VPN tunnel to a pure IPv6-based network (follow-up of gardener/gardener#9597). This allows for more flexibility in network design.
- Harmonize Local VPN Setup with Real-World Scenario: The Hackathon aimed to align the local VPN setup with real-world scenarios regarding the VPN connection. `provider-local` was augmented to dynamically create Calico's `IPPool` resources to emulate the real-world networking situation.
- Support Cilium `v1.15+` for HA `Shoot`s: The Hackathon addressed the issue of Cilium `v1.15+` not considering `StatefulSet` labels in `NetworkPolicy`s. A prototype was developed to make the `Service` resources for `vpn-seed-server` headless.
- Compression for `ManagedResource` `Secret`s: The Hackathon focused on reducing the size of `Secret`s related to `ManagedResource`s by leveraging the Brotli compression algorithm. This reduces network I/O and related costs, improving scalability and reducing load on the ETCD cluster.
- Making Shoot Flux Extension Production-Ready: The Hackathon aimed to promote the Flux extension to "production-ready" status. Features such as a reconciliation sync mode and the option to provide additional `Secret` resources were added.
- Move `machine-controller-manager-provider-local` Repository into gardener/gardener: The Hackathon focused on moving the `machine-controller-manager-provider-local` repository content into the `gardener/gardener` repository. This simplifies maintenance and development tasks.
- Stop Vendoring Third-Party Code in OS Extensions: The Hackathon aimed to avoid vendoring third-party code in the OS extensions. Two out of the four OS extensions have been adapted.
- Consider Embedded Files for Local Image Builds: The Hackathon addressed the issue that changes to embedded files don't lead to automatic rebuilds of the Gardener images by Skaffold for local development. The related `hack` script was augmented to detect embedded files and make them part of the list of dependencies.
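To illustrate the first item, here is a minimal sketch of a `ControllerDeployment` that references a Helm chart stored in an OCI registry instead of embedding the chart archive. The extension name and chart URL are placeholders, and the exact field layout may differ from the API as finally merged:

apiVersion: core.gardener.cloud/v1
kind: ControllerDeployment
metadata:
  name: extension-provider-example        # placeholder extension name
helm:
  ociRepository:
    # pull the release chart directly from an OCI registry
    ref: registry.example.com/charts/gardener-extension-provider-example:v1.2.3

Referencing charts this way means the chart no longer has to be base64-embedded into the resource, which is what reduces operational complexity and enables reuse of the same chart reference in other scenarios.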
Note that a significant portion of the above topics has been built on top of achievements from previous Hackathons. This continuity, with each event building on the results of the last, is a testament to the power of sustained collaborative effort.
Looking Ahead
As we look towards the future, the Gardener community is already gearing up for the next Hackathon slated for the end of 2024. The anticipation is palpable, as these events have consistently proven to be a hotbed of creativity, innovation, and collaboration. The 5th Gardener Community Hackathon has once again demonstrated the remarkable outcomes that can be achieved when diverse minds unite to work on shared interests. The event has not only yielded an impressive array of results in a short span but has also sparked innovations that promise to propel the Gardener project to new heights. The community eagerly awaits the next Hackathon, ready to tackle new challenges and continue the journey of innovation and growth.
Gardener's Registry Cache Extension: Another Cost Saving Win and More
Use Cases
In Kubernetes, the container runtime on every Node pulls the container images configured in the specifications of the Pods running on that Node. Although these images are cached on the Node's file system after the initial pull, this setup has imperfections.
New Nodes are often created due to events such as auto-scaling (scale up), rolling updates, or replacements of unhealthy Nodes. A new Node needs to pull the images for the workloads scheduled onto it from the container registry because its cache is initially empty. Pulling an image from a registry incurs network traffic and registry costs.
To reduce network traffic and registry costs for your Shoot cluster, it is recommended to enable Gardener's Registry Cache extension, which runs a registry as a pull-through cache in the Shoot cluster.
The use cases of a pull-through cache are not limited to cost savings. A pull-through cache also makes the Kubernetes cluster resilient to failures of the upstream registry, such as outages or rate limiting.
Solution
Gardener’s Registry Cache extension deploys and manages a pull-through cache registry in the Shoot cluster.
A pull-through cache registry is a registry that caches container images in its own storage. The first time an image is requested from the pull-through cache, it pulls the image from the upstream registry, returns it to the client, and stores it in its local storage. On subsequent requests for the same image, the pull-through cache serves the image from its storage, avoiding network traffic to the upstream registry.
Imagine that you have a DaemonSet in your Kubernetes cluster. In a cluster without a pull-through cache, every Node must pull the same container image from the upstream registry. In a cluster with a pull-through cache, the image is pulled once from the upstream registry and then served from the cache to all Nodes.
A Shoot cluster setup with a registry cache for Docker Hub (docker.io).
Cost Considerations
An image pull represents ingress traffic for a virtual machine (data entering the system from outside) and egress traffic for the upstream registry (data leaving the system).
Ingress traffic from the internet to a virtual machine is free of charge on AWS, GCP, and Azure. However, the cloud providers charge for the data processed by a NAT gateway, both inbound and outbound, based on the processed volume (per GB). The container registry offerings on the cloud providers charge for egress traffic, again based on data volume (per GB).
With all of this in mind, the Registry Cache extension reduces both the NAT gateway costs for the Shoot cluster and the container registry costs.
Try It Out!
We would like to encourage you to try it! As a Gardener user, you can reduce your infrastructure costs and increase resilience by enabling the Registry Cache for your Shoot clusters. The Registry Cache extension is a great fit for long-running Shoot clusters with a high image pull rate.
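As a rough illustration, enabling the extension boils down to adding it to the Shoot specification. The snippet below is a minimal sketch only; the providerConfig API version and fields may differ between extension releases, so treat the linked documentation as the authoritative reference:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: my-shoot                # placeholder Shoot name
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: registry.extensions.gardener.cloud/v1alpha3   # version may differ per release
      kind: RegistryConfig
      caches:
      - upstream: docker.io     # run a pull-through cache for Docker Hub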
For more information, refer to the Registry Cache extension documentation!
SpinKube on Gardener - Serverless WASM on Kubernetes
With the rising popularity of WebAssembly (WASM) and the WebAssembly System Interface (WASI) comes a variety of integration possibilities. WASM is no longer suitable only for the browser; it can also be used for running workloads on the server. In this post we will explore how you can get started writing serverless applications powered by SpinKube on a Gardener Shoot cluster. This post is inspired by a similar tutorial that goes through the steps of Deploying the Spin Operator on Azure Kubernetes Service. Keep in mind that this post does not aim to define a production environment. It is meant to show that Gardener Shoot clusters are able to run WebAssembly workloads, giving users the chance to experiment and explore this cutting-edge technology.
Prerequisites
- kubectl - the Kubernetes command line tool
- helm - the package manager for Kubernetes
- A running Gardener Shoot cluster
Gardener Shoot Cluster
For this showcase I am using a Gardener Shoot cluster on AWS infrastructure with nodes powered by Garden Linux, although the steps should be applicable to other infrastructures as well, since Gardener aims to provide a homogeneous Kubernetes experience.
As a prerequisite for next steps, verify that you have access to your Gardener Shoot cluster.
# Verify the access to the Gardener Shoot cluster
kubectl get ns
NAME STATUS AGE
default Active 4m1s
kube-node-lease Active 4m1s
kube-public Active 4m1s
kube-system Active 4m1s
If you are having trouble accessing the Gardener Shoot cluster, please consult the Accessing Shoot Clusters documentation page.
Deploy the Spin Operator
As a first step, we will install the Spin Operator Custom Resource Definitions and the `RuntimeClass` needed by `wasmtime-spin-v2`.
# Install Spin Operator CRDs
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.1.0/spin-operator.crds.yaml
# Install the Runtime Class
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.1.0/spin-operator.runtime-class.yaml
Next, we will install cert-manager, which is required for provisioning the TLS certificates used by the admission webhook of the Spin Operator. If you face issues installing `cert-manager`, please consult the cert-manager installation documentation.
# Add and update the Jetstack repository
helm repo add jetstack https://charts.jetstack.io
helm repo update
# Install the cert-manager chart alongside with CRDs needed by cert-manager
helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.14.4 \
--set installCRDs=true
In order to install the `containerd-wasm-shim` on the Kubernetes nodes, we will use the kwasm-operator. There is also a successor of `kwasm-operator`, the runtime-class-manager, which aims to address some of the limitations of `kwasm-operator` and provide a production-grade implementation for deploying `containerd` shims on Kubernetes nodes. Since `kwasm-operator` is easier to install, we will use it instead of the `runtime-class-manager` for the purpose of this post.
# Add the kwasm helm repository
helm repo add kwasm http://kwasm.sh/kwasm-operator/
helm repo update
# Install KWasm operator
helm install \
kwasm-operator kwasm/kwasm-operator \
--namespace kwasm \
--create-namespace \
--set kwasmOperator.installerImage=ghcr.io/spinkube/containerd-shim-spin/node-installer:v0.13.1
# Annotate all nodes in the cluster so kwasm can select them and provision the required containerd shim
kubectl annotate node --all kwasm.sh/kwasm-node=true
We can see that a pod has started and completed in the `kwasm` namespace.
kubectl -n kwasm get pod
NAME READY STATUS RESTARTS AGE
ip-10-180-7-60.eu-west-1.compute.internal-provision-kwasm-qhr8r 0/1 Completed 0 8s
kwasm-operator-6c76c5f94b-8zt4s 1/1 Running 0 15s
The logs of the `kwasm-operator` also indicate that the node was provisioned with the required shim.
kubectl -n kwasm logs kwasm-operator-6c76c5f94b-8zt4s
{"level":"info","node":"ip-10-180-7-60.eu-west-1.compute.internal","time":"2024-04-18T05:44:25Z","message":"Trying to Deploy on ip-10-180-7-60.eu-west-1.compute.internal"}
{"level":"info","time":"2024-04-18T05:44:31Z","message":"Job ip-10-180-7-60.eu-west-1.compute.internal-provision-kwasm is still Ongoing"}
{"level":"info","time":"2024-04-18T05:44:31Z","message":"Job ip-10-180-7-60.eu-west-1.compute.internal-provision-kwasm is Completed. Happy WASMing"}
Finally, we can deploy the `spin-operator` along with a shim executor.
helm install spin-operator \
--namespace spin-operator \
--create-namespace \
--version 0.1.0 \
--wait \
oci://ghcr.io/spinkube/charts/spin-operator
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.1.0/spin-operator.shim-executor.yaml
Deploy a Spin App
Let’s deploy a sample Spin application using the following command:
kubectl apply -f https://raw.githubusercontent.com/spinkube/spin-operator/main/config/samples/simple.yaml
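For orientation, the applied manifest defines a `SpinApp` resource roughly like the following. This is a sketch based on the spin-operator v0.1.0 samples; the exact image reference and replica count in the upstream sample may differ:

apiVersion: core.spinoperator.dev/v1alpha1
kind: SpinApp
metadata:
  name: simple-spinapp
spec:
  image: "ghcr.io/spinkube/containerd-shim-spin/examples/spin-rust-hello:v0.13.0"  # sample app image (illustrative tag)
  replicas: 1                        # illustrative replica count
  executor: containerd-shim-spin     # matches the shim executor installed earlier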
After the `SpinApp` resource has been picked up by the `spin-operator`, a pod running the sample application will be created. Let's explore its logs.
kubectl logs simple-spinapp-56687588d9-nbrtq
Serving http://0.0.0.0:80
Available Routes:
hello: http://0.0.0.0:80/hello
go-hello: http://0.0.0.0:80/go-hello
We can see the available routes served by the application. Let's port-forward to the application's `Service` and test them out.
kubectl port-forward services/simple-spinapp 8000:80
Forwarding from 127.0.0.1:8000 -> 80
Forwarding from [::1]:8000 -> 80
In another terminal, we can verify that the application returns a response.
curl http://localhost:8000/hello
Hello world from Spin!%
This sets the ground for further experimentation and testing. The capabilities and API that the `SpinApp` CRD provides can be explored through the SpinApp CRD reference.
Cleanup
Let’s clean all deployed resources so far.
# Delete the spin app and its executor
kubectl delete spinapp simple-spinapp
kubectl delete spinappexecutors.core.spinoperator.dev containerd-shim-spin
# Uninstall the spin-operator chart
helm -n spin-operator uninstall spin-operator
# Remove the kwasm.sh/kwasm-node annotation from nodes
kubectl annotate node --all kwasm.sh/kwasm-node-
# Uninstall the kwasm-operator chart
helm -n kwasm uninstall kwasm-operator
# Uninstall the cert-manager chart
helm -n cert-manager uninstall cert-manager
# Delete the runtime class and SpinApp CRDs
kubectl delete runtimeclass wasmtime-spin-v2
kubectl delete crd spinappexecutors.core.spinoperator.dev
kubectl delete crd spinapps.core.spinoperator.dev
Conclusion
In my opinion, WASM on the server is here to stay. Communities are expressing more and more interest in integrating Kubernetes with WASM workloads. As shown, Gardener clusters are perfectly capable of supporting this use case. This setup is a great way to start exploring the capabilities that WASM can bring to the server. As stated in the introduction, bear in mind that this post does not define a production environment, but is rather meant to provide a playground suitable for exploring and trying out ideas.
KubeCon / CloudNativeCon Europe 2024 Highlights
KubeCon + CloudNativeCon Europe 2024, recently held in Paris, was a testament to the robustness of the open-source community and its pivotal role in driving advancements in AI and cloud-native technologies. With a record attendance of more than 12,000 participants, the conference underscored the ubiquity of cloud-native architectures and the business opportunities they provide.
AI Everywhere
LLMs and GenAI took center stage at the event, with discussions on challenges such as security, data management, and energy consumption. A popular quote stated, “If #inference is the new web application, #kubernetes is the new web server”. The conference emphasized the need for more open data models for AI to democratize the technology. Cloud-native platforms offer advantages for AI innovation, such as packaging models and their dependencies as container images and enhancing resource management for proper model execution. The community is exploring AI workload management, including using CPUs for inferencing and for preprocessing data before handing it over to GPUs. CNCF took the initiative and put together an AI whitepaper outlining the apparent synergy between cloud-native technologies and AI.
Cluster Autopilot
The conference showcased popular projects in the cloud-native ecosystem, including Kubernetes, Istio, and OpenTelemetry. Kubernetes was highlighted as a platform for running massive AI workloads. The UXL Foundation aims to enable multi-vendor AI workloads on Kubernetes, allowing developers to move AI workloads without being locked into a specific infrastructure. Every vendor we interacted with has assembled an AI-powered chatbot, which performs various functions: from assessing cluster health, through analyzing cost efficiency and proposing workload optimizations, to troubleshooting issues and alerting about potential challenges with upcoming Kubernetes version upgrades. Sysdig went even further with a chatbot that answers the popular question, “Do any of my products have critical CVEs in production?” and analyzes workloads’ structure and configuration. Some chatbots leveraged the k8sgpt project, which joined the CNCF sandbox earlier this year.
Sophisticated Fleet Management
The ecosystem showcased maturity in observability, platform engineering, security, and optimization, which will help operationalize AI workloads. Data demands and costs were also in focus, touching on data observability and cloud-cost management. Cloud-native technologies, also going beyond Kubernetes, are expected to play a crucial role in managing the increasing volume of data and scaling AI. Google showcased fleet management in their Google Hosted Cloud offering (ex-Anthos). It allows for defining teams and policies at the fleet level, later applied to all the Kubernetes clusters in the fleet, irrespective of the infrastructure they run on (GCP and beyond).
WASM Everywhere
The conference also highlighted the growing interest in WebAssembly (WASM) as a portable binary instruction format for executable programs and its integration with Kubernetes and other functions. The topic started with a dedicated WASM pre-conference day, the sessions of which are available in the following playlist. WASM is positioned as a smoother approach to software distribution and modularity, providing more lightweight runtime execution options and an easier on-ramp for app developers.
Rust on the Rise
Several talks were promoting Rust as an ideal programming language for cloud-native workloads. It was even promoted as suitable for writing Kubernetes controllers.
Internal Developer Platforms
The event showcased the importance of Internal Developer Platforms (IDPs), both commercial and open-source, in facilitating the development process across all types of organizations, from Allianz to Mercedes. Backstage leads the pack by a large margin, with all relevant sessions full to capacity. Much effort goes into the modularization of Backstage, which was also a notable highlight at the conference.
Sustainability
Sustainability was a key theme, with discussions on the role of cloud-native technologies in promoting green practices. The KubeCost folks put a lot of effort into emphasizing the large amount of wasted cloud spend, from which hyperscalers benefit. In parallel, the kube-green project emphasized optimizing your cluster footprint to minimize CO2 emissions. The conference also highlighted the importance of open source in creating a level playing field where multiple players can compete, fostering diverse participation and solving global challenges.
Customer Stories
In contrast to the Chicago KubeCon in 2023, the one in Paris outlined multiple case studies, best practices, and reference scenarios. Many enterprises and their IT teams were well represented at KubeCon in terms of sessions, sponsorships, and participation. These companies strive to push forward, reaping the efficiency and flexibility benefits that cloud-native architectures provide. We came across multiple companies using Gardener as their Kubernetes management underlay, including FUGA Cloud, STACKIT, and metal-stack Cloud. We eagerly anticipate more companies embracing Gardener at future events. The consistent feedback from these companies has been overwhelmingly positive: they absolutely love using Gardener, and our shared excitement grows as the community thrives!
Notable Talks
Notable talks from leaders in the cloud-native world, including Solomon Hykes, Bob Wise, and representatives from KCP for Platforms and the United Nations, provided valuable insights into the future of AI and cloud-native technologies. All the talks are now uploaded to YouTube in the following playlist. Those do not include the various pre-conference days, available as separate playlists by CNCF.
In Conclusion…
In conclusion, KubeCon 2024 showcased the intersection of AI and cloud-native technologies, the open-source community’s growth, and the cloud-native ecosystem’s maturity. Many enterprises are actively engaged there, innovating, trying, and growing their internal expertise. They’re using KubeCon as a recruiting event, expanding their internal talent pool and taking more of their internal operations and processes into their own hands. The event served as a platform for global collaboration, cross-company alignments, innovation, and the exchange of ideas, setting the stage for the future of cloud-native computing.