2024
Innovation Unleashed: A Deep Dive into the 5th Gardener Community Hackathon
The Gardener community recently concluded its 5th Hackathon, a week-long event that brought together multiple companies to collaborate on common topics of interest. The Hackathon, held at Schlosshof Freizeitheim in Schelklingen, Germany, was a testament to the power of collective effort and open-source, producing a tremendous number of results in a short time and moving the Gardener project forward with innovative solutions.
A Week of Collaboration and Innovation
The Hackathon addressed a wide range of topics, from improving the maturity of the Gardener API to harmonizing development setups and automating additional preparation tasks for Gardener installations. The event also saw the introduction of new resources and configurations, the rewriting of VPN components from Bash to Golang, and the exploration of a Tailscale-based VPN to secure shoot clusters.
Key Achievements
- OCI Helm Release Reference for `ControllerDeployment`: The Hackathon introduced the `core.gardener.cloud/v1` API, which supports OCI repository-based Helm chart references. This innovation reduces operational complexity and enables reusability for other scenarios (a minimal sketch of such a reference follows this list).
- Local `gardener-operator` Development Setup with gardenlet: A new Skaffold configuration was created to harmonize the development setups for Gardener. This configuration deploys `gardener-operator` and its `Garden` CRD together with a deployment of `gardenlet` to register a seed cluster, allowing for a full-fledged Gardener setup.
- Extensions for Garden Cluster via `gardener-operator`: The Hackathon focused on automating additional preparation tasks for Gardener installations. The `Garden` controller was augmented to deploy extensions as part of its reconciliation flow, reducing operational complexity.
- Gardenlet Self-Upgrades for Unmanaged `Seed`s: A new `Gardenlet` resource was introduced, allowing for the specification of deployment values and component configurations. A new controller within `gardenlet` watches these resources and updates the `gardenlet`'s Helm chart and configuration accordingly, effectively implementing self-upgrades.
- Type-Safe Configurability in `OperatingSystemConfig`: The Hackathon improved the configurability of the `OperatingSystemConfig` for `containerd`, DNS, NTP, etc. The `OperatingSystemConfig` API was augmented to support `containerd`-config related use cases.
- Expose Shoot API Server in Tailscale VPN: The Hackathon explored the use of a Tailscale-based VPN to secure shoot clusters. A document was compiled explaining how shoot owners can expose their API server within a Tailscale VPN.
- Rewrite `gardener/vpn2` from Bash to Golang: The Hackathon improved the VPN components by rewriting them in Golang. All functionality was successfully rewritten, and pull requests have been opened for `gardener/vpn2` and the integration into `gardener/gardener`.
- Pure IPv6-Based VPN Tunnel: The Hackathon addressed the restriction of the VPN network CIDR by switching the VPN tunnel to a pure IPv6-based network (follow-up of gardener/gardener#9597). This allows for more flexibility in network design.
- Harmonize Local VPN Setup with Real-World Scenario: The Hackathon aimed to align the local VPN setup with real-world scenarios regarding the VPN connection. `provider-local` was augmented to dynamically create Calico's `IPPool` resources to emulate the real-world networking situation.
- Support Cilium `v1.15+` for HA `Shoot`s: The Hackathon addressed the issue of Cilium `v1.15+` not considering `StatefulSet` labels in `NetworkPolicy`s. A prototype was developed to make the `Service` resources for `vpn-seed-server` headless.
- Compression for `ManagedResource` `Secret`s: The Hackathon focused on reducing the size of `Secret`s related to `ManagedResource`s by leveraging the Brotli compression algorithm. This reduces network I/O and related costs, improving scalability and reducing load on the ETCD cluster.
- Making Shoot Flux Extension Production-Ready: The Hackathon aimed to promote the Flux extension to "production-ready" status. Features such as a reconciliation sync mode and the option to provide additional `Secret` resources were added.
- Move `machine-controller-manager-provider-local` Repository into gardener/gardener: The Hackathon focused on moving the `machine-controller-manager-provider-local` repository content into the `gardener/gardener` repository. This simplifies maintenance and development tasks.
- Stop Vendoring Third-Party Code in OS Extensions: The Hackathon aimed to avoid vendoring third-party code in the OS extensions. Two out of the four OS extensions have been adapted.
- Consider Embedded Files for Local Image Builds: The Hackathon addressed the issue that changes to embedded files don't lead to automatic rebuilds of the Gardener images by Skaffold for local development. The related `hack` script was augmented to detect embedded files and make them part of the list of dependencies.
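To illustrate the first item, here is a minimal sketch of a `ControllerDeployment` that references a Helm chart stored in an OCI registry instead of embedding the chart archive. The extension name and chart URL are placeholders, and the exact field layout may differ from the API as finally merged:

apiVersion: core.gardener.cloud/v1
kind: ControllerDeployment
metadata:
  name: extension-provider-example        # placeholder extension name
helm:
  ociRepository:
    # pull the release chart directly from an OCI registry
    ref: registry.example.com/charts/gardener-extension-provider-example:v1.2.3

Referencing charts this way means the chart no longer has to be base64-embedded into the resource, which is what reduces operational complexity and enables reuse of the same chart reference in other scenarios.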
Note that a significant portion of the above topics has been built on top of achievements from previous Hackathons. This continuity, with each event building on the results of the last, is a testament to the power of sustained collaborative effort.
Looking Ahead
As we look towards the future, the Gardener community is already gearing up for the next Hackathon slated for the end of 2024. The anticipation is palpable, as these events have consistently proven to be a hotbed of creativity, innovation, and collaboration. The 5th Gardener Community Hackathon has once again demonstrated the remarkable outcomes that can be achieved when diverse minds unite to work on shared interests. The event has not only yielded an impressive array of results in a short span but has also sparked innovations that promise to propel the Gardener project to new heights. The community eagerly awaits the next Hackathon, ready to tackle new challenges and continue the journey of innovation and growth.
Gardener's Registry Cache Extension: Another Cost Saving Win and More
Use Cases
In Kubernetes, the container runtime on every Node pulls the container images configured in the specifications of the Pods running on that Node. Although these images are cached on the Node's file system after the initial pull, this setup has imperfections.
New Nodes are often created due to events such as auto-scaling (scale up), rolling updates, or replacements of unhealthy Nodes. A new Node needs to pull the images for the workloads scheduled onto it from the container registry because its cache is initially empty. Pulling an image from a registry incurs network traffic and registry costs.
To reduce network traffic and registry costs for your Shoot cluster, it is recommended to enable Gardener's Registry Cache extension, which runs a registry as a pull-through cache in the Shoot cluster.
The use cases of a pull-through cache are not limited to cost savings. A pull-through cache also makes the Kubernetes cluster resilient to failures of the upstream registry, such as outages or rate limiting.
Solution
Gardener’s Registry Cache extension deploys and manages a pull-through cache registry in the Shoot cluster.
A pull-through cache registry is a registry that caches container images in its own storage. The first time an image is requested from the pull-through cache, it pulls the image from the upstream registry, returns it to the client, and stores it in its local storage. On subsequent requests for the same image, the pull-through cache serves the image from its storage, avoiding network traffic to the upstream registry.
Imagine that you have a DaemonSet in your Kubernetes cluster. In a cluster without a pull-through cache, every Node must pull the same container image from the upstream registry. In a cluster with a pull-through cache, the image is pulled once from the upstream registry and then served from the cache to all Nodes.
A Shoot cluster setup with a registry cache for Docker Hub (docker.io).
Cost Considerations
An image pull represents ingress traffic for a virtual machine (data entering the system from outside) and egress traffic for the upstream registry (data leaving the system).
Ingress traffic from the internet to a virtual machine is free of charge on AWS, GCP, and Azure. However, the cloud providers charge for the data processed by a NAT gateway, both inbound and outbound, based on the processed volume (per GB). The container registry offerings on the cloud providers charge for egress traffic, again based on data volume (per GB).
With all of this in mind, the Registry Cache extension reduces both the NAT gateway costs for the Shoot cluster and the container registry costs.
Try It Out!
We would like to encourage you to try it! As a Gardener user, you can reduce your infrastructure costs and increase resilience by enabling the Registry Cache for your Shoot clusters. The Registry Cache extension is a great fit for long-running Shoot clusters with a high image pull rate.
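As a rough illustration, enabling the extension boils down to adding it to the Shoot specification. The snippet below is a minimal sketch only; the providerConfig API version and fields may differ between extension releases, so treat the linked documentation as the authoritative reference:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: my-shoot                # placeholder Shoot name
spec:
  extensions:
  - type: registry-cache
    providerConfig:
      apiVersion: registry.extensions.gardener.cloud/v1alpha3   # version may differ per release
      kind: RegistryConfig
      caches:
      - upstream: docker.io     # run a pull-through cache for Docker Hub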
For more information, refer to the Registry Cache extension documentation!
SpinKube on Gardener - Serverless WASM on Kubernetes
With the rising popularity of WebAssembly (WASM) and the WebAssembly System Interface (WASI) comes a variety of integration possibilities. WASM is no longer suitable only for the browser; it can also be used for running workloads on the server. In this post we will explore how you can get started writing serverless applications powered by SpinKube on a Gardener Shoot cluster. This post is inspired by a similar tutorial that goes through the steps of Deploying the Spin Operator on Azure Kubernetes Service. Keep in mind that this post does not aim to define a production environment. It is meant to show that Gardener Shoot clusters are able to run WebAssembly workloads, giving users the chance to experiment and explore this cutting-edge technology.
Prerequisites
- kubectl - the Kubernetes command line tool
- helm - the package manager for Kubernetes
- A running Gardener Shoot cluster
Gardener Shoot Cluster
For this showcase I am using a Gardener Shoot cluster on AWS infrastructure with nodes powered by Garden Linux, although the steps should be applicable to other infrastructures as well, since Gardener aims to provide a homogeneous Kubernetes experience.
As a prerequisite for next steps, verify that you have access to your Gardener Shoot cluster.
# Verify the access to the Gardener Shoot cluster
kubectl get ns
NAME STATUS AGE
default Active 4m1s
kube-node-lease Active 4m1s
kube-public Active 4m1s
kube-system Active 4m1s
If you are having trouble accessing the Gardener Shoot cluster, please consult the Accessing Shoot Clusters documentation page.
Deploy the Spin Operator
As a first step, we will install the Spin Operator Custom Resource Definitions and the `RuntimeClass` needed by `wasmtime-spin-v2`.
# Install Spin Operator CRDs
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.1.0/spin-operator.crds.yaml
# Install the Runtime Class
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.1.0/spin-operator.runtime-class.yaml
Next, we will install cert-manager, which is required for provisioning the TLS certificates used by the admission webhook of the Spin Operator. If you face issues installing `cert-manager`, please consult the cert-manager installation documentation.
# Add and update the Jetstack repository
helm repo add jetstack https://charts.jetstack.io
helm repo update
# Install the cert-manager chart alongside with CRDs needed by cert-manager
helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.14.4 \
--set installCRDs=true
In order to install the `containerd-wasm-shim` on the Kubernetes nodes, we will use the kwasm-operator. There is also a successor of `kwasm-operator`, the runtime-class-manager, which aims to address some of the limitations of `kwasm-operator` and provide a production-grade implementation for deploying `containerd` shims on Kubernetes nodes. Since `kwasm-operator` is easier to install, we will use it instead of the `runtime-class-manager` for the purpose of this post.
# Add the kwasm helm repository
helm repo add kwasm http://kwasm.sh/kwasm-operator/
helm repo update
# Install KWasm operator
helm install \
kwasm-operator kwasm/kwasm-operator \
--namespace kwasm \
--create-namespace \
--set kwasmOperator.installerImage=ghcr.io/spinkube/containerd-shim-spin/node-installer:v0.13.1
# Annotate all nodes in the cluster so kwasm can select them and provision the required containerd shim
kubectl annotate node --all kwasm.sh/kwasm-node=true
We can see that a pod has started and completed in the `kwasm` namespace.
kubectl -n kwasm get pod
NAME READY STATUS RESTARTS AGE
ip-10-180-7-60.eu-west-1.compute.internal-provision-kwasm-qhr8r 0/1 Completed 0 8s
kwasm-operator-6c76c5f94b-8zt4s 1/1 Running 0 15s
The logs of the `kwasm-operator` also indicate that the node was provisioned with the required shim.
kubectl -n kwasm logs kwasm-operator-6c76c5f94b-8zt4s
{"level":"info","node":"ip-10-180-7-60.eu-west-1.compute.internal","time":"2024-04-18T05:44:25Z","message":"Trying to Deploy on ip-10-180-7-60.eu-west-1.compute.internal"}
{"level":"info","time":"2024-04-18T05:44:31Z","message":"Job ip-10-180-7-60.eu-west-1.compute.internal-provision-kwasm is still Ongoing"}
{"level":"info","time":"2024-04-18T05:44:31Z","message":"Job ip-10-180-7-60.eu-west-1.compute.internal-provision-kwasm is Completed. Happy WASMing"}
Finally, we can deploy the `spin-operator` along with a shim executor.
helm install spin-operator \
--namespace spin-operator \
--create-namespace \
--version 0.1.0 \
--wait \
oci://ghcr.io/spinkube/charts/spin-operator
kubectl apply -f https://github.com/spinkube/spin-operator/releases/download/v0.1.0/spin-operator.shim-executor.yaml
Deploy a Spin App
Let’s deploy a sample Spin application using the following command:
kubectl apply -f https://raw.githubusercontent.com/spinkube/spin-operator/main/config/samples/simple.yaml
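For orientation, the applied manifest defines a `SpinApp` resource roughly like the following. This is a sketch based on the spin-operator v0.1.0 samples; the exact image reference and replica count in the upstream sample may differ:

apiVersion: core.spinoperator.dev/v1alpha1
kind: SpinApp
metadata:
  name: simple-spinapp
spec:
  image: "ghcr.io/spinkube/containerd-shim-spin/examples/spin-rust-hello:v0.13.0"  # sample app image (illustrative tag)
  replicas: 1                        # illustrative replica count
  executor: containerd-shim-spin     # matches the shim executor installed earlier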
After the `SpinApp` resource has been picked up by the `spin-operator`, a pod running the sample application will be created. Let's explore its logs.
kubectl logs simple-spinapp-56687588d9-nbrtq
Serving http://0.0.0.0:80
Available Routes:
hello: http://0.0.0.0:80/hello
go-hello: http://0.0.0.0:80/go-hello
We can see the available routes served by the application. Let's port-forward to the application's `Service` and test them out.
kubectl port-forward services/simple-spinapp 8000:80
Forwarding from 127.0.0.1:8000 -> 80
Forwarding from [::1]:8000 -> 80
In another terminal, we can verify that the application returns a response.
curl http://localhost:8000/hello
Hello world from Spin!%
This sets the ground for further experimentation and testing. The capabilities and API that the `SpinApp` CRD provides can be explored through the SpinApp CRD reference.
Cleanup
Let’s clean all deployed resources so far.
# Delete the spin app and its executor
kubectl delete spinapp simple-spinapp
kubectl delete spinappexecutors.core.spinoperator.dev containerd-shim-spin
# Uninstall the spin-operator chart
helm -n spin-operator uninstall spin-operator
# Remove the kwasm.sh/kwasm-node annotation from nodes
kubectl annotate node --all kwasm.sh/kwasm-node-
# Uninstall the kwasm-operator chart
helm -n kwasm uninstall kwasm-operator
# Uninstall the cert-manager chart
helm -n cert-manager uninstall cert-manager
# Delete the runtime class and SpinApp CRDs
kubectl delete runtimeclass wasmtime-spin-v2
kubectl delete crd spinappexecutors.core.spinoperator.dev
kubectl delete crd spinapps.core.spinoperator.dev
Conclusion
In my opinion, WASM on the server is here to stay. Communities are expressing more and more interest in integrating Kubernetes with WASM workloads. As shown, Gardener clusters are perfectly capable of supporting this use case. This setup is a great way to start exploring the capabilities that WASM can bring to the server. As stated in the introduction, bear in mind that this post does not define a production environment, but is rather meant to provide a playground suitable for exploring and trying out ideas.
KubeCon / CloudNativeCon Europe 2024 Highlights
KubeCon + CloudNativeCon Europe 2024, recently held in Paris, was a testament to the robustness of the open-source community and its pivotal role in driving advancements in AI and cloud-native technologies. With a record attendance of more than 12,000 participants, the conference underscored the ubiquity of cloud-native architectures and the business opportunities they provide.
AI Everywhere
LLMs and GenAI took center stage at the event, with discussions on challenges such as security, data management, and energy consumption. A popular quote stated, “If #inference is the new web application, #kubernetes is the new web server”. The conference emphasized the need for more open data models for AI to democratize the technology. Cloud-native platforms offer advantages for AI innovation, such as packaging models and their dependencies as container images and enhancing resource management for proper model execution. The community is exploring AI workload management, including using CPUs for inferencing and for preprocessing data before handing it over to GPUs. CNCF took the initiative and put together an AI whitepaper outlining the apparent synergy between cloud-native technologies and AI.
Cluster Autopilot
The conference showcased popular projects in the cloud-native ecosystem, including Kubernetes, Istio, and OpenTelemetry. Kubernetes was highlighted as a platform for running massive AI workloads. The UXL Foundation aims to enable multi-vendor AI workloads on Kubernetes, allowing developers to move AI workloads without being locked into a specific infrastructure. Every vendor we interacted with has assembled an AI-powered chatbot, which performs various functions: from assessing cluster health, through analyzing cost efficiency and proposing workload optimizations, to troubleshooting issues and alerting about potential challenges with upcoming Kubernetes version upgrades. Sysdig went even further with a chatbot that answers the popular question, “Do any of my products have critical CVEs in production?” and analyzes workloads’ structure and configuration. Some chatbots leveraged the k8sgpt project, which joined the CNCF sandbox earlier this year.
Sophisticated Fleet Management
The ecosystem showcased maturity in observability, platform engineering, security, and optimization, which will help operationalize AI workloads. Data demands and costs were also in focus, touching on data observability and cloud-cost management. Cloud-native technologies, also going beyond Kubernetes, are expected to play a crucial role in managing the increasing volume of data and scaling AI. Google showcased fleet management in their Google Hosted Cloud offering (ex-Anthos). It allows for defining teams and policies at the fleet level, later applied to all the Kubernetes clusters in the fleet, irrespective of the infrastructure they run on (GCP and beyond).
WASM Everywhere
The conference also highlighted the growing interest in WebAssembly (WASM) as a portable binary instruction format for executable programs and its integration with Kubernetes and other functions. The topic started with a dedicated WASM pre-conference day, the sessions of which are available in the following playlist. WASM is positioned as a smoother approach to software distribution and modularity, providing more lightweight runtime execution options and an easier on-ramp for app developers.
Rust on the Rise
Several talks were promoting Rust as an ideal programming language for cloud-native workloads. It was even promoted as suitable for writing Kubernetes controllers.
Internal Developer Platforms
The event showcased the importance of Internal Developer Platforms (IDPs), both commercial and open-source, in facilitating the development process across all types of organizations, from Allianz to Mercedes. Backstage leads the pack by a large margin, with all relevant sessions full to capacity. Much effort goes into the modularization of Backstage, which was also a notable highlight at the conference.
Sustainability
Sustainability was a key theme, with discussions on the role of cloud-native technologies in promoting green practices. The KubeCost folks put a lot of effort into emphasizing the large amount of wasted cloud spend, from which hyperscalers benefit. In parallel, the kube-green project emphasized optimizing your cluster footprint to minimize CO2 emissions. The conference also highlighted the importance of open source in creating a level playing field where multiple players can compete, fostering diverse participation and solving global challenges.
Customer Stories
In contrast to the Chicago KubeCon in 2023, the one in Paris outlined multiple case studies, best practices, and reference scenarios. Many enterprises and their IT teams were well represented at KubeCon in terms of sessions, sponsorships, and participation. These companies strive to push forward, reaping the efficiency and flexibility benefits that cloud-native architectures provide. We came across multiple companies using Gardener as their Kubernetes management underlay, including FUGA Cloud, STACKIT, and metal-stack Cloud. We eagerly anticipate more companies embracing Gardener at future events. The consistent feedback from these companies has been overwhelmingly positive: they absolutely love using Gardener, and our shared excitement grows as the community thrives!
Notable Talks
Notable talks from leaders in the cloud-native world, including Solomon Hykes, Bob Wise, and representatives from KCP for Platforms and the United Nations, provided valuable insights into the future of AI and cloud-native technologies. All the talks are now uploaded to YouTube in the following playlist. Those do not include the various pre-conference days, available as separate playlists by CNCF.
In Conclusion…
In conclusion, KubeCon 2024 showcased the intersection of AI and cloud-native technologies, the open-source community’s growth, and the cloud-native ecosystem’s maturity. Many enterprises are actively engaged there, innovating, trying, and growing their internal expertise. They’re using KubeCon as a recruiting event, expanding their internal talent pool and taking more of their internal operations and processes into their own hands. The event served as a platform for global collaboration, cross-company alignments, innovation, and the exchange of ideas, setting the stage for the future of cloud-native computing.