Fine-Tuning kube-proxy Readiness: Ensuring Accurate Health Checks During Node Scale-Down
Gardener has recently refined how it determines the readiness of `kube-proxy` components within managed Kubernetes clusters. This adjustment leads to more accurate system health reporting, especially during node scale-down operations orchestrated by `cluster-autoscaler`.
The Challenge: kube-proxy Readiness During Node Scale-Down
Previously, Gardener utilized `kube-proxy`’s `/healthz` endpoint for its readiness probe. While generally effective, this endpoint’s behavior changed in Kubernetes 1.28 (as part of KEP-3836 and implemented in kubernetes/kubernetes#116470). The `/healthz` endpoint now reports `kube-proxy` as unhealthy if its node is marked for deletion by `cluster-autoscaler` (e.g., via a specific taint) or has a deletion timestamp.

This behavior is intended to help external load balancers (particularly those using `externalTrafficPolicy: Cluster` on infrastructures like GCP) avoid sending new traffic to nodes that are about to be terminated. However, for Gardener’s internal system component health checks, this meant that `kube-proxy` could appear unready for extended periods if node deletion was delayed due to `PodDisruptionBudgets` or long `terminationGracePeriodSeconds`. This could lead to misleading “unhealthy” states for the cluster’s system components.
The Solution: Aligning with Upstream kube-proxy Enhancements
To address this, Gardener now leverages the `/livez` endpoint for `kube-proxy`’s readiness probe in clusters running Kubernetes version 1.28 and newer. The `/livez` endpoint, also introduced as part of the aforementioned `kube-proxy` improvements, checks the actual liveness of the `kube-proxy` process itself, without considering the node’s termination status.

For clusters running Kubernetes versions 1.27.x and older (where `/livez` is not available), Gardener will continue to use the `/healthz` endpoint for the readiness probe.
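For illustration, a readiness probe targeting the `/livez` endpoint might look like the following sketch (10256 is `kube-proxy`’s default health server port; the exact probe settings Gardener renders may differ):

```yaml
# Minimal sketch of a kube-proxy container readiness probe using /livez
# (Kubernetes 1.28+). The timings shown are illustrative, not Gardener's
# actual values.
readinessProbe:
  httpGet:
    path: /livez
    port: 10256
  periodSeconds: 10
  failureThreshold: 3
```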
This change, detailed in gardener/gardener#12015, ensures that Gardener’s readiness check for `kube-proxy` accurately reflects `kube-proxy`’s operational status rather than the node’s lifecycle state. It’s important to note that this adjustment does not interfere with the goals of KEP-3836; cloud controller managers can still utilize the `/healthz` endpoint for their load balancer health checks as intended.
Benefits for Gardener Operators
This enhancement brings key benefits to Gardener operators:

- More Accurate System Health: The system components health check will no longer report `kube-proxy` as unhealthy simply because its node is being gracefully terminated by `cluster-autoscaler`. This reduces false alarms and provides a clearer view of the cluster’s actual health.
- Smoother Operations: Operations teams will experience fewer unnecessary alerts related to `kube-proxy` during routine scale-down events, allowing them to focus on genuine issues.
By adapting its `kube-proxy` readiness checks, Gardener continues to refine its operational robustness, providing a more stable and predictable management experience.
Further Information
- GitHub Pull Request: gardener/gardener#12015
- Recording of the presentation segment: Watch on YouTube (starts at the relevant section)
- Upstream KEP: KEP-3836: Kube-proxy improved ingress connectivity reliability
- Upstream Kubernetes PR: kubernetes/kubernetes#116470
New in Gardener: Forceful Redeployment of gardenlets for Enhanced Operational Control
Gardener continues to enhance its operational capabilities, and a recent improvement introduces a much-requested feature for managing gardenlets: the ability to forcefully trigger their redeployment. This provides operators with greater control and a streamlined recovery path for specific scenarios.
The Standard gardenlet Lifecycle
gardenlets, crucial components in the Gardener architecture, are typically deployed into seed clusters. For setups utilizing the `seedmanagement.gardener.cloud/v1alpha1.Gardenlet` resource, particularly in unmanaged seeds (those not backed by a shoot cluster and `ManagedSeed` resource), the `gardener-operator` handles the initial deployment of the gardenlet.

Once this initial deployment is complete, the gardenlet takes over its own lifecycle, leveraging a self-upgrade strategy to keep itself up-to-date. Under normal circumstances, the `gardener-operator` does not intervene further after this initial phase.
When Things Go Awry: The Need for Intervention
While the self-upgrade mechanism is robust, certain situations can arise where a gardenlet might require a more direct intervention. For example:
- The gardenlet’s client certificate to the virtual garden cluster might have expired or become invalid.
- The gardenlet `Deployment` in the seed cluster might have been accidentally deleted or become corrupted.

In such cases, because the `gardener-operator`’s responsibility typically ends after the initial deployment, the gardenlet might not be able to recover on its own, potentially leading to operational issues.
Empowering Operators: The Force-Redeploy Annotation
To address these challenges, Gardener now allows operators to instruct the `gardener-operator` to forcefully redeploy a gardenlet. This is achieved by annotating the specific `Gardenlet` resource with:

`gardener.cloud/operation=force-redeploy`

When this annotation is applied, it signals the `gardener-operator` to re-initiate the deployment process for the targeted gardenlet, effectively overriding the usual hands-off approach after initial setup.
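For example, an operator could apply the annotation to a `Gardenlet` resource as sketched below (resource name and namespace are illustrative):

```yaml
apiVersion: seedmanagement.gardener.cloud/v1alpha1
kind: Gardenlet
metadata:
  name: gardenlet-unmanaged-seed   # illustrative name
  namespace: garden                # illustrative namespace
  annotations:
    gardener.cloud/operation: force-redeploy
```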
How It Works
The process for a forceful redeployment is straightforward:
- An operator identifies a gardenlet that requires redeployment due to issues like an expired certificate or a missing deployment.
- The operator applies the `gardener.cloud/operation=force-redeploy` annotation to the corresponding `seedmanagement.gardener.cloud/v1alpha1.Gardenlet` resource in the virtual garden cluster.
  - Important: If the gardenlet is for a remote cluster and its kubeconfig `Secret` was previously removed (a standard cleanup step after initial deployment), this `Secret` must be recreated, and its reference (`.spec.kubeconfigSecretRef`) must be re-added to the `Gardenlet` specification (see the sketch after this list).
- The `gardener-operator` detects the annotation and proceeds to redeploy the gardenlet, applying its configurations and charts anew.
- Once the redeployment is successfully completed, the `gardener-operator` automatically removes the `gardener.cloud/operation=force-redeploy` annotation from the `Gardenlet` resource. Similar to the initial deployment, it will also clean up the referenced kubeconfig `Secret` and set `.spec.kubeconfigSecretRef` to `nil` if it was provided.
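As a rough sketch of the kubeconfig handling mentioned in the second step, the recreated `Secret` and its reference in the `Gardenlet` specification could look like this (all names are illustrative and the kubeconfig content is a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: gardenlet-kubeconfig       # illustrative Secret name
  namespace: garden                # illustrative namespace
data:
  kubeconfig: <base64-encoded kubeconfig of the remote seed cluster>
---
apiVersion: seedmanagement.gardener.cloud/v1alpha1
kind: Gardenlet
metadata:
  name: gardenlet-unmanaged-seed   # illustrative name
  namespace: garden
spec:
  kubeconfigSecretRef:
    name: gardenlet-kubeconfig     # re-added reference to the Secret above
```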
Benefits
This new feature offers significant advantages for Gardener operators:
- Enhanced Recovery: Provides a clear and reliable mechanism to recover gardenlets from specific critical failure states.
- Improved Operational Flexibility: Offers more direct control over the gardenlet lifecycle when exceptional circumstances demand it.
- Reduced Manual Effort: Streamlines the process of restoring a misbehaving gardenlet, minimizing potential downtime or complex manual recovery procedures.
This enhancement underscores Gardener’s commitment to operational excellence and responsiveness to the needs of its user community.
Dive Deeper
To learn more about this feature, you can explore the following resources:
- GitHub Pull Request: gardener/gardener#11972
- Official Documentation: Forceful Re-Deployment of gardenlets
- Community Meeting Recording (starts at the relevant segment): Gardener Review Meeting on YouTube
Streamlined Node Onboarding: Introducing `gardenadm token` and `gardenadm join`
Gardener continues to enhance its `gardenadm` tool, simplifying the management of autonomous Shoot clusters. Recently, new functionalities have been introduced to streamline the process of adding worker nodes to these clusters: the `gardenadm token` command suite and the corresponding `gardenadm join` command. These additions offer a more convenient and Kubernetes-native experience for cluster expansion.
Managing Bootstrap Tokens with `gardenadm token`
A key aspect of securely joining nodes to a Kubernetes cluster is the use of bootstrap tokens. The new `gardenadm token` command provides a set of subcommands to manage these tokens effectively within your autonomous Shoot cluster’s control plane node. This functionality is analogous to the familiar `kubeadm token` commands.
The available subcommands include:
- `gardenadm token list`: Displays all current bootstrap tokens. You can also use the `--with-token-secrets` flag to include the token secrets in the output for easier inspection.
- `gardenadm token generate`: Generates a cryptographically random bootstrap token. This command only prints the token; it does not create it on the server.
- `gardenadm token create [token]`: Creates a new bootstrap token on the server. If you provide a token (in the format `[a-z0-9]{6}.[a-z0-9]{16}`), it will be used. If no token is supplied, `gardenadm` will automatically generate a random one and create it.
  - A particularly helpful option for this command is `--print-join-command`. When used, instead of just outputting the token, it prints the complete `gardenadm join` command, ready to be copied and executed on the worker node you intend to join. You can also specify flags like `--description`, `--validity`, and `--worker-pool-name` to customize the token and the generated join command.
- `gardenadm token delete <token-value...>`: Deletes one or more bootstrap tokens from the server. You can specify tokens by their ID, the full token string, or the name of the Kubernetes Secret storing the token (e.g., `bootstrap-token-<id>`).
These commands provide comprehensive control over the lifecycle of bootstrap tokens, enhancing security and operational ease.
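For reference, bootstrap tokens are stored as regular Kubernetes `Secret`s of type `bootstrap.kubernetes.io/token` in the `kube-system` namespace, following the upstream Kubernetes format sketched below (token value and expiration are placeholders; `gardenadm` may set additional fields such as a description):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: bootstrap-token-abcdef     # the suffix is the token ID
  namespace: kube-system
type: bootstrap.kubernetes.io/token
stringData:
  token-id: abcdef                 # placeholder ID ([a-z0-9]{6})
  token-secret: 0123456789abcdef   # placeholder secret ([a-z0-9]{16})
  expiration: "2025-06-30T00:00:00Z"
  usage-bootstrap-authentication: "true"
  usage-bootstrap-signing: "true"
```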
Joining Worker Nodes with `gardenadm join`
Once a bootstrap token is created (ideally using `gardenadm token create --print-join-command` on a control plane node), the new `gardenadm join` command facilitates the process of adding a new worker node to the autonomous Shoot cluster.
The command is executed on the prospective worker machine and typically looks like this:
gardenadm join --bootstrap-token <token_id.token_secret> --ca-certificate <base64_encoded_ca_bundle> --gardener-node-agent-secret-name <os_config_secret_name> <control_plane_api_server_address>
Key parameters include:
- `--bootstrap-token`: The token obtained from the `gardenadm token create` command.
- `--ca-certificate`: The base64-encoded CA certificate bundle of the cluster’s API server.
- `--gardener-node-agent-secret-name`: The name of the Secret in the `kube-system` namespace of the control plane that contains the OperatingSystemConfig (OSC) for the `gardener-node-agent`. This OSC dictates how the node should be configured.
- `<control_plane_api_server_address>`: The address of the Kubernetes API server of the autonomous cluster.
Upon execution, `gardenadm join` performs several actions:
- It discovers the Kubernetes version of the control plane using the provided bootstrap token and CA certificate.
- It checks if the `gardener-node-agent` has already been initialized on the machine.
- If not already joined, it prepares the `gardener-node-init` configuration. This involves setting up a systemd service (`gardener-node-init.service`) which, in turn, downloads and runs the `gardener-node-agent`.
- The `gardener-node-agent` then uses the bootstrap token to securely download its specific OperatingSystemConfig from the control plane.
- Finally, it applies this configuration, setting up the kubelet and other necessary components, thereby officially joining the node to the cluster.
After the node has successfully joined, the bootstrap token used for the process will be automatically deleted by the `kube-controller-manager` once it expires. However, it can also be manually deleted immediately using `gardenadm token delete` on the control plane node for enhanced security.
These new `gardenadm` commands significantly simplify the expansion of autonomous Shoot clusters, providing a robust and user-friendly mechanism for managing bootstrap tokens and joining worker nodes.
Further Information
Enhanced Network Flexibility: Gardener Now Supports CIDR Overlap for Non-HA Shoots
Gardener is continually evolving to offer greater flexibility and efficiency in managing Kubernetes clusters. A significant enhancement has been introduced that addresses a common networking challenge: the requirement for completely disjoint network CIDR blocks between a shoot cluster and its seed cluster. Now, Gardener allows for IPv4 network overlap in specific scenarios, providing users with more latitude in their network planning.
Addressing IP Address Constraints
Previously, all shoot cluster networks (pods, services, nodes) had to be distinct from the seed cluster’s networks. This could be challenging in environments with limited IP address space or complex network topologies. With this new feature, IPv4 or dual-stack shoot clusters can now define pod, service, and node networks that overlap with the IPv4 networks of their seed cluster.
How It Works: NAT for Seamless Connectivity
This capability is enabled through a double Network Address Translation (NAT) mechanism within the VPN connection established between the shoot and seed clusters. When IPv4 network overlap is configured, Gardener intelligently maps the overlapping shoot and seed networks to a dedicated set of newly reserved IPv4 ranges. These ranges are used exclusively within the VPN pods to ensure seamless communication, effectively resolving any conflicts that would arise from the overlapping IPs.
The reserved mapping ranges are:
- `241.0.0.0/8`: Seed Pod Mapping Range
- `242.0.0.0/8`: Shoot Node Mapping Range
- `243.0.0.0/8`: Shoot Service Mapping Range
- `244.0.0.0/8`: Shoot Pod Mapping Range
Conditions for Utilizing Overlapping Networks
To leverage this new network flexibility, the following conditions must be met:
- Non-Highly-Available VPN: The shoot cluster must utilize a non-highly-available (non-HA) VPN. This is typically the configuration for shoots with a non-HA control plane.
- IPv4 or Dual-Stack Shoots: The shoot cluster must be configured as either single-stack IPv4 or dual-stack (IPv4/IPv6). The overlap feature specifically pertains to IPv4 networks.
- Non-Use of Reserved Ranges: The shoot cluster’s own defined networks (for pods, services, and nodes) must not utilize any of the Gardener-reserved IP ranges, including the newly introduced mapping ranges listed above, or the existing `240.0.0.0/8` range (Kube-ApiServer Mapping Range).
It’s important to note that Gardener will prevent the migration of a non-HA shoot to an HA setup if its network ranges currently overlap with the seed, as this feature is presently limited to non-HA VPN configurations. For single-stack IPv6 shoots, Gardener continues to enforce non-overlapping IPv6 networks to avoid any potential issues, although IPv6 address space exhaustion is less common.
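As a rough illustration, a non-HA, IPv4 shoot can keep a typical networking configuration even if these ranges overlap with the seed’s networks, as long as none of the reserved `240.0.0.0/8`–`244.0.0.0/8` ranges are used (the CIDRs below are common defaults and purely illustrative):

```yaml
# Excerpt of a Shoot specification (non-HA control plane, single-stack IPv4).
spec:
  networking:
    type: calico
    ipFamilies:
      - IPv4
    pods: 100.96.0.0/11      # may now overlap with the seed's pod network
    services: 100.64.0.0/13  # may now overlap with the seed's service network
    nodes: 10.250.0.0/16     # may now overlap with the seed's node network
```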
Benefits for Gardener Users
This enhancement offers increased flexibility in IP address management, particularly beneficial for users operating numerous shoot clusters or those in environments with constrained IPv4 address availability. By relaxing the strict disjointedness requirement for non-HA shoots, Gardener simplifies network allocation and reduces the operational overhead associated with IP address planning.
Explore Further
To dive deeper into this feature, you can review the original pull request and the updated documentation:
- GitHub PR: feat: Allow CIDR overlap for non-HA VPN shoots (#11582)
- Gardener Documentation: Shoot Networking
- Developer Talk Recording: Gardener Development - Sprint Review #131
Enhanced Node Management: Introducing In-Place Updates in Gardener
Gardener is committed to providing efficient and flexible Kubernetes cluster management. Traditionally, updates to worker pool configurations, such as machine image or Kubernetes minor version changes, trigger a rolling update. This process involves replacing existing nodes with new ones, which is a robust approach for many scenarios. However, for environments with physical or bare-metal nodes, for stateful workloads sensitive to node replacement, or where the desired virtual machine type is scarce, this can introduce challenges like extended update times and potential disruptions.
To address these needs, Gardener now introduces In-Place Node Updates. This new capability allows certain updates to be applied directly to existing worker nodes without requiring their replacement, significantly reducing disruption and speeding up update processes for compatible changes.
New Update Strategies for Worker Pools
Gardener now supports three distinct update strategies for your worker pools, configurable via the `updateStrategy` field in the `Shoot` specification’s worker pool definition:
- `AutoRollingUpdate`: This is the classic and default strategy. When updates occur, nodes are cordoned, drained, terminated, and replaced with new nodes incorporating the changes.
- `AutoInPlaceUpdate`: With this strategy, compatible updates are applied directly to the existing nodes. The MachineControllerManager (MCM) automatically selects nodes, cordons and drains them, and then signals the Gardener Node Agent (GNA) to perform the update. Once GNA confirms success, MCM uncordons the node.
- `ManualInPlaceUpdate`: This strategy also applies updates directly to existing nodes but gives operators fine-grained control. After an update is specified, MCM marks all nodes in the pool as candidates. Operators must then manually label individual nodes to select them for the in-place update process, which then proceeds similarly to the `AutoInPlaceUpdate` strategy.
The `AutoInPlaceUpdate` and `ManualInPlaceUpdate` strategies are available when the `InPlaceNodeUpdates` feature gate is enabled in the `gardener-apiserver`.
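As a sketch, opting a worker pool into automatic in-place updates could look like this in the `Shoot` specification (machine type, image version, and pool sizing are illustrative):

```yaml
# Excerpt of a Shoot worker pool using the in-place update strategy.
spec:
  provider:
    workers:
      - name: worker-pool-1
        updateStrategy: AutoInPlaceUpdate   # or ManualInPlaceUpdate
        machine:
          type: m5.large                    # illustrative machine type
          image:
            name: gardenlinux
            version: 1592.1.0               # illustrative image version
        minimum: 3
        maximum: 5
```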
What Can Be Updated In-Place?
In-place updates are designed to handle a variety of common operational tasks more efficiently:
- Machine Image Updates: Newer versions of a machine image can be rolled out by executing an update command directly on the node, provided the image and cloud profile are configured to support this.
- Kubernetes Minor Version Updates: Updates to the Kubernetes minor version of worker nodes can be applied in-place.
- Kubelet Configuration Changes: Modifications to the Kubelet configuration can be applied directly.
- Credentials Rotation: Critical for security, rotation of Certificate Authorities (CAs) and ServiceAccount signing keys can now be performed on existing nodes without replacement.
However, some changes still necessitate a rolling update (node replacement):
- Changing the machine image name (e.g., switching from Ubuntu to Garden Linux).
- Modifying the machine type.
- Altering volume types or sizes.
- Changing the Container Runtime Interface (CRI) name (e.g., from Docker to containerd).
- Enabling or disabling node-local DNS.
Key API and Component Adaptations
Several Gardener components and APIs have been enhanced to support in-place updates:
- CloudProfile: The `CloudProfile` API now allows specifying `inPlaceUpdates` configuration within `machineImage.versions`. This includes a boolean `supported` field to indicate if a version supports in-place updates and an optional `minVersionForUpdate` string to define the minimum OS version from which an in-place update to the current version is permissible (see the sketch after this list).
- Shoot Specification: As mentioned, the `spec.provider.workers[].updateStrategy` field allows selection of the desired update strategy. Additionally, `spec.provider.workers[].machineControllerManagerSettings` now includes `machineInPlaceUpdateTimeout` and `disableHealthTimeout` (which defaults to `true` for in-place strategies to prevent premature machine deletion during lengthy updates). For `ManualInPlaceUpdate`, `maxSurge` defaults to `0` and `maxUnavailable` to `1`.
- OperatingSystemConfig (OSC): The OSC resource, managed by OS extensions, now includes `status.inPlaceUpdates.osUpdate`, where extensions can specify the `command` and `args` for the Gardener Node Agent to execute for machine image (Operating System) updates. The `spec.inPlaceUpdates` field in the OSC will carry information like the target Operating System version, Kubelet version, and credential rotation status to the node.
- Gardener Node Agent (GNA): GNA is responsible for executing the in-place updates on the node. It watches for a specific node condition (`InPlaceUpdate` with reason `ReadyForUpdate`) set by MCM, performs the OS update, Kubelet updates, or credentials rotation, restarts necessary pods (like DaemonSets), and then labels the node with the update outcome.
- MachineControllerManager (MCM): MCM orchestrates the in-place update process. For in-place strategies, while new machine classes and machine sets are created to reflect the desired state, the actual machine objects are not deleted and recreated. Instead, their ownership is transferred to the new machine set. MCM handles cordoning, draining, and setting node conditions to coordinate with GNA.
- Shoot Status & Constraints: To provide visibility, the `status.inPlaceUpdates.pendingWorkerUpdates` field in the `Shoot` now lists worker pools pending `autoInPlaceUpdate` or `manualInPlaceUpdate`. A new `ShootManualInPlaceWorkersUpdated` constraint is added if any manual in-place updates are pending, ensuring users are aware.
- Worker Status: The `Worker` extension resource now includes `status.inPlaceUpdates.workerPoolToHashMap` to track the configuration hash of worker pools that have undergone in-place updates. This helps Gardener determine if a pool is up-to-date.
- Forcing Updates: If an in-place update is stuck, the `gardener.cloud/operation=force-in-place-update` annotation can be added to the Shoot to allow subsequent changes or retries.
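For instance, a `CloudProfile` machine image version that permits in-place updates might be declared roughly as follows (image name and version numbers are illustrative):

```yaml
# Excerpt of a CloudProfile advertising in-place update support per version.
spec:
  machineImages:
    - name: gardenlinux
      versions:
        - version: 1592.2.0
          inPlaceUpdates:
            supported: true
            # Minimum OS version from which an in-place update to this
            # version is permitted.
            minVersionForUpdate: "1592.0.0"
        - version: 1443.10.0
          inPlaceUpdates:
            supported: false
```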
Benefits of In-Place Updates
- Reduced Disruption: Minimizes workload interruptions by avoiding full node replacements for compatible updates.
- Faster Updates: Applying changes directly can be quicker than provisioning new nodes, especially for OS patches or configuration changes.
- Bare-Metal Efficiency: Particularly beneficial for bare-metal environments where node provisioning is more time-consuming and complex.
- Stateful Workload Friendly: Lessens the impact on stateful applications that might be sensitive to node churn.
In-place node updates represent a significant step forward in Gardener’s operational flexibility, offering a more nuanced and efficient approach to managing node lifecycles, especially in demanding or specialized environments.
Dive Deeper
To explore the technical details and contributions that made this feature possible, refer to the following resources:
- Parent Issue for “[GEP-31] Support for In-Place Node Updates”: Issue #10219
- GEP-31: In-Place Node Updates of Shoot Clusters: GEP-31: In-Place Node Updates of Shoot Clusters
- Developer Talk Recording (starting at 39m37s): YouTube
Gardener Dashboard 1.80: Streamlined Credentials, Enhanced Cluster Views, and Real-Time Updates
Gardener Dashboard version 1.80 introduces several significant enhancements aimed at improving user experience, credentials management, and overall operational efficiency. These updates bring more clarity to credential handling, a smoother experience for managing large numbers of clusters, and a move towards a more reactive interface.
Unified and Enhanced Credentials Management
The management of secrets and credentials has been significantly revamped for better clarity and functionality:
- Introducing CredentialsBindings: The dashboard now fully supports `CredentialsBinding` resources alongside the existing `SecretBinding` resources. This allows for referencing both Secrets and, in the future, Workload Identities more explicitly. While `CredentialsBindings` referencing Workload Identity resources are visible for cluster creation, editing or deleting them via the dashboard is not yet supported.
- “Credentials” Page: The former “Secrets” page has been renamed to “Credentials.” It features a new “Kind” column and distinct icons to clearly differentiate between `SecretBinding` and `CredentialsBinding` types, especially useful when resources share names. The column showing the referenced credential resource name has been removed, as this information is part of the binding’s details.
- Contextual Information and Safeguards: When editing a secret, all its associated data is now displayed, providing better context. If an underlying secret is referenced by multiple bindings, a hint is shown to prevent unintended impacts. Deletion of a binding is prevented if the underlying secret is still in use by another binding.
- Simplified Creation and Editing: New secrets created via the dashboard will now automatically generate a `CredentialsBinding`. While existing `SecretBindings` remain updatable, the creation of new `SecretBindings` through the dashboard is no longer supported, encouraging the adoption of the more versatile `CredentialsBinding`. The edit dialog for secrets now pre-fills current data, allowing for easier modification of specific fields.
- Handling Missing Secrets: The UI now provides clear information and guidance if a `CredentialsBinding` or `SecretBinding` references a secret that no longer exists.
Revamped Cluster List for Improved Scalability
Navigating and managing a large number of clusters is now more efficient:
- Virtual Scrolling: The cluster list has adopted virtual scrolling. Rows are rendered dynamically as you scroll, replacing the previous pagination system. This significantly improves performance and provides a smoother browsing experience, especially for environments with hundreds or thousands of clusters.
- Optimized Row Display: The height of individual rows in the cluster list has been reduced, allowing more clusters to be visible on the screen at once. Additionally, expandable content within a row (like worker details or ticket labels) now has a maximum height with internal scrolling, ensuring consistent row sizes and smooth virtual scrolling performance.
Real-Time Updates for Projects
The dashboard is becoming more dynamic with the introduction of real-time updates:
- Instant Project Changes: Modifications to projects, such as creation or deletion, are now reflected instantly in the project list and interface without requiring a page reload. This is achieved through WebSocket communication.
- Foundation for Future Reactivity: This enhancement for projects lays the groundwork for bringing real-time updates to other resources within the dashboard, such as Seeds and the Garden resource, in future releases.
Other Notable Enhancements
- Kubeconfig Update: The kubeconfig generated for garden cluster access via the “Account” page now uses the `--oidc-pkce-method` flag, replacing the deprecated `--oidc-use-pkce` flag. Users encountering deprecation messages should redownload their kubeconfig.
- Notification Behavior: Kubernetes warning notifications are now automatically dismissed after 5 seconds. However, all notifications will remain visible as long as the mouse cursor is hovering over them, giving users more time to read important messages.
- API Server URL Path: Support has been added for kubeconfigs that include a path in the API server URL.
These updates in Gardener Dashboard 1.80 collectively enhance usability, provide better control over credentials, and improve performance for large-scale operations.
For a comprehensive list of all features, bug fixes, and contributor acknowledgments, please refer to the official release notes. You can also view the segment of the community call discussing these dashboard updates here.
Gardener: Powering Enterprise Kubernetes at Scale and Europe's Sovereign Cloud Future
The Kubernetes ecosystem is dynamic, offering a wealth of tools to manage the complexities of modern cloud-native applications. For enterprises seeking to provision and manage Kubernetes clusters efficiently, securely, and at scale, a robust and comprehensive solution is paramount. Gardener, born from years of managing tens of thousands of clusters efficiently across diverse platforms and in demanding environments, stands out as a fully open-source choice for delivering fully managed Kubernetes Clusters as a Service. It already empowers organizations like SAP, STACKIT, T-Systems, and others (see adopters) and has become a core technology for NeoNephos, a project aimed at advancing digital autonomy in Europe (see KubeCon London 2025 Keynote and press announcement).
The Gardener Approach: An Architecture Forged by Experience
At the heart of Gardener’s architecture is the concept of “Kubeception” (see readme and architecture). This approach involves using Kubernetes to manage Kubernetes. Gardener runs on a Kubernetes cluster (called a runtime cluster), facilitates access through a self-managed node-less Kubernetes cluster (the garden cluster), manages Kubernetes control planes as pods within other self-managed Kubernetes clusters that provide high scalability (called seed clusters), and ultimately provisions end-user Kubernetes clusters (called shoot clusters).
This multi-layered architecture isn’t complexity for its own sake. Gardener’s design and extensive feature set are the product of over eight years of continuous development and refinement, directly shaped by the high-scale, security-sensitive, and enterprise-grade requirements of its users. Experience has shown that such a sophisticated structure is key to addressing significant challenges in scalability, security, and operational manageability. For instance:
- Scalability: Gardener achieves considerable scalability through its use of seed clusters, which it also manages. This allows for the distribution of control planes, preventing bottlenecks. The design even envisions leveraging Gardener to host its own management components (as an autonomous cluster), showcasing its resilience without risking circular dependencies.
- Security: A fundamental principle in Gardener is the strict isolation of control planes from data planes. This extends to Gardener itself, which runs in a dedicated management cluster but exposes its API to end-users through a workerless virtual cluster. This workerless cluster acts as an isolated access point, presenting no compute surface for potentially malicious pods, thereby significantly enhancing security.
- API Power & User Experience: Gardener utilizes the full capabilities of the Kubernetes API server. This enables advanced functionalities and sophisticated API change management. Crucially, for the end-user, interaction remains 100% Kubernetes-native. Users employ standard custom resources to instruct Gardener, meaning any tool, library, or language binding that supports Kubernetes CRDs inherently supports Gardener.
Delivering Fully Managed Kubernetes Clusters as a Service
Gardener provides a comprehensive “fully managed Kubernetes Clusters as a Service” offering. This means it handles much more than just spinning up a cluster; it manages the entire lifecycle and operational aspects. Here’s a glimpse into its capabilities:
Full Cluster Lifecycle Management:
- Infrastructure Provisioning: Gardener takes on the provisioning and management of underlying cloud infrastructure, including VPCs, subnets, NAT gateways, security groups, IAM roles, and virtual machines across a wide range of providers like AWS, Azure, GCP, OpenStack, and more.
- Worker Node Management: It meticulously manages worker pools, covering OS images, machine types, autoscaling configurations (min/max/surge), update strategies, volume management, CRI configuration, and provider-specific settings.
Enterprise Platform Governance:
- Cloud Profiles: Gardener is designed with the comprehensive needs of enterprise platform operators in mind. Managing a fleet of clusters for an organization requires more than just provisioning; it demands clear governance over available resources, versions, and their lifecycle. Gardener addresses this through its declarative API, allowing platform administrators to define and enforce policies such as which Kubernetes versions are “supported,” “preview,” or “deprecated,” along with their expiration dates. Similarly, it allows control over available machine images, their versions, and lifecycle status. This level of granular control and lifecycle management for the underlying components of a Kubernetes service is crucial for enterprise adoption and stable operations. This is a key consideration often left as an additional implementation burden for platform teams using other cluster provisioning tools, where such governance features must be built on top. Gardener, by contrast, integrates these concerns directly into its API and operational model, simplifying the task for platform operators.
Advanced Networking:
- CNI Plugin Management: Gardener manages the deployment and configuration of CNI plugins such as Calico or Cilium.
- Dual-Stack Networking: It offers comprehensive support for IPv4, IPv6, and dual-stack configurations for pods, services, and nodes.
- NodeLocal DNS Cache: To enhance DNS performance and reliability, Gardener can deploy and manage NodeLocal DNS.
Comprehensive Autoscaling:
- Cluster Autoscaler: Gardener manages the Cluster Autoscaler for worker nodes, enabling dynamic scaling based on pod scheduling demands.
- Horizontal and Vertical Pod Autoscaler (VPA): It manages HPA/VPA for workloads and applies it to control plane components, optimizing resource utilization (see blog).
Operational Excellence & Maintenance:
- Automated Kubernetes Upgrades: Gardener handles automated Kubernetes version upgrades for both control plane and worker nodes, with configurable maintenance windows.
- Automated OS Image Updates: It manages automated machine image updates for worker nodes.
- Cluster Hibernation: To optimize costs, Gardener supports hibernating clusters, scaling down components during inactivity.
- Scheduled Maintenance: It allows defining specific maintenance windows for predictability.
- Robust Credentials Rotation: Gardener features automated mechanisms for rotating all credentials. It provisions fine-grained, dedicated, and individual CAs, certificates, credentials, and secrets for each component — whether Kubernetes-related (such as service account keys or etcd encryption keys) or Gardener-specific (such as opt-in SSH keys or observability credentials). The Gardener installation, the seeds, and all shoots have their own distinct sets of credentials — amounting to more than 150 per shoot cluster control plane and hundreds of thousands for larger Gardener installations overall. All these credentials are rotated automatically and without downtime — most continuously, while some (like the API server CA) require user initiation to ensure operational awareness. For a deeper dive into Gardener’s credential rotation, see our Cloud Native Rejekts talk. This granular approach effectively prevents lateral movement, significantly strengthening the security posture.
Enhanced Security & Access Control:
- OIDC Integration: Gardener supports OIDC configuration for the `kube-apiserver` for secure user authentication.
- Customizable Audit Policies: It allows specifying custom audit policies for detailed logging.
- Managed Service Account Issuers: Gardener can manage service account issuers, enhancing workload identity security.
- SSH Access Control: It provides mechanisms to manage SSH access to worker nodes securely if opted in (Gardener itself doesn’t require SSH access to worker nodes).
- Workload Identity: Gardener supports workload identity features, allowing pods to securely authenticate to cloud provider services.
Powerful Extensibility:
- Extension Framework and Ecosystem: Gardener features a robust extension mechanism for deep integration of cloud providers, operating systems, container runtimes, or services like DNS management, certificate management, registry caches, network filtering, image signature verification, and more.
- Catered to Platform Builders: This extensibility also allows platform builders to deploy custom extensions into the self-managed seed cluster infrastructure that hosts shoot cluster control planes. This offers robust isolation for these custom components from the user’s shoot cluster worker nodes, enhancing both security and operational stability.
Integrated DNS and Certificate Management:
- External DNS Management: Gardener can manage DNS records for the cluster’s API server and services via its `shoot-dns-service` extension.
- Automated Certificate Management: Through extensions like `shoot-cert-service`, it manages TLS certificates, including ACME integration. Gardener also provides its own robust DNS (`dns-management`) and certificate (`cert-management`) solutions designed for enterprise scale. These custom solutions were developed because, at the scale Gardener operates, many deep optimizations were necessary, e.g., to avoid being rate-limited by upstream providers.
A Kubernetes-Native Foundation for Sovereign Cloud
The modern IT landscape is rapidly evolving away from primitive virtual machines towards distributed systems. Kubernetes has emerged as the de facto standard for deploying and managing these modern, cloud-native applications and services at scale. Gardener is squarely positioned at the forefront of this shift, offering a Kubernetes-native approach to managing Kubernetes clusters themselves. It possesses a mature, declarative, Kubernetes-native API for full cluster lifecycle management. Unlike services that might expose proprietary APIs, Gardener’s approach is inherently Kubernetes-native and multi-cloud. This unified API is comprehensive, offering a consistent way to manage diverse cluster landscapes.
Its nature as a fully open-source project is particularly relevant for initiatives like NeoNephos, which aim to build sovereign cloud solutions. All core features, stable releases, and essential operational components are available to the community. This inherent cloud-native, Kubernetes-centric design, coupled with its open-source nature and ability to run on diverse infrastructures (including on-premise and local cloud providers), provides the transparency, control, and technological independence crucial for digital sovereignty. Gardener delivers full sovereign control today, enabling organizations to run all modern applications and services at scale with complete authority over their infrastructure and data. This is a significant reason why many cloud providers and enterprises that champion sovereignty are choosing Gardener as their foundation and actively contributing to its ecosystem.
Operational Depth Reflecting Real-World Scale
Gardener’s operational maturity is a direct reflection of its long evolution, shaped by the demands of enterprise users and real-world, large-scale deployments. This maturity translates into a demonstrable track record of uptime for end-users and their critical services. For instance, Gardener includes fully automated, incremental etcd backups with a recovery point objective (RPO) of five minutes and supports autonomous, hands-off restoration workflows via `etcd-druid`. Features like Vertical Pod Autoscalers (VPAs), PodDisruptionBudgets (PDBs), NetworkPolicies, PriorityClasses, and sophisticated pod placement strategies are integral to Gardener’s offering, ensuring high availability and fault tolerance. Gardener’s automation deals with many of the usual exceptions and does not require human DevOps intervention for most operational tasks. Gardener’s commitment to robust security is evident in its proactive security posture, which has proven effective in real-world scenarios. This depth of experience and automation ultimately translates into first-class Service Level Agreements (SLAs) that businesses can trust and rely on. As a testament to this, SAP entrusts Gardener with its Systems of Record. This level of operational excellence enables Gardener to meet the expectations of today’s most demanding Kubernetes use cases.
Conclusion: A Solid Foundation for Your Kubernetes Strategy
For enterprises and organizations seeking a comprehensive, truly open-source solution for managing the full lifecycle of Kubernetes clusters at scale, Gardener offers a compelling proposition. Its mature architecture, rich feature set, operational robustness, built-in enterprise governance capabilities, and commitment to the open-source community provide a solid foundation for running demanding Kubernetes workloads with confidence. This makes it a suitable technical underpinning for ambitious projects like NeoNephos, contributing to a future of greater digital autonomy.
We invite you to explore Gardener and discover how it can empower your enterprise-grade and -scale Kubernetes journey.