May

Fine-Tuning kube-proxy Readiness: Ensuring Accurate Health Checks During Node Scale-Down

Gardener has recently refined how it determines the readiness of kube-proxy components within managed Kubernetes clusters. This adjustment leads to more accurate system health reporting, especially during node scale-down operations orchestrated by cluster-autoscaler.

The Challenge: kube-proxy Readiness During Node Scale-Down

Previously, Gardener utilized kube-proxy’s /healthz endpoint for its readiness probe. While generally effective, this endpoint’s behavior changed in Kubernetes 1.28 (as part of KEP-3836 and implemented in kubernetes/kubernetes#116470). The /healthz endpoint now reports kube-proxy as unhealthy if its node is marked for deletion by cluster-autoscaler (e.g., via a specific taint) or has a deletion timestamp.

This behavior is intended to help external load balancers (particularly those using externalTrafficPolicy: Cluster on infrastructures like GCP) avoid sending new traffic to nodes that are about to be terminated. However, for Gardener’s internal system component health checks, this meant that kube-proxy could appear unready for extended periods if node deletion was delayed due to PodDisruptionBudgets or long terminationGracePeriodSeconds. This could lead to misleading “unhealthy” states for the cluster’s system components.

The Solution: Aligning with Upstream kube-proxy Enhancements

To address this, Gardener now leverages the /livez endpoint for kube-proxy’s readiness probe in clusters running Kubernetes version 1.28 and newer. The /livez endpoint, also introduced as part of the aforementioned kube-proxy improvements, checks the actual liveness of the kube-proxy process itself, without considering the node’s termination status.

For clusters running Kubernetes versions 1.27.x and older (where /livez is not available), Gardener will continue to use the /healthz endpoint for the readiness probe.
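
Conceptually, the readiness probe on the kube-proxy DaemonSet now looks roughly like the following sketch. The port and probe parameters shown here are assumptions for illustration, not Gardener's literal manifest:

readinessProbe:
  httpGet:
    path: /livez   # used on Kubernetes 1.28+; /healthz is retained for 1.27.x and older
    port: 10256    # kube-proxy's default health check port (assumed here)
  periodSeconds: 10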

This change, detailed in gardener/gardener#12015, ensures that Gardener’s readiness check for kube-proxy accurately reflects kube-proxy’s operational status rather than the node’s lifecycle state. It’s important to note that this adjustment does not interfere with the goals of KEP-3836; cloud controller managers can still utilize the /healthz endpoint for their load balancer health checks as intended.

Benefits for Gardener Operators

This enhancement brings key benefits to Gardener operators:

  • More Accurate System Health: The system components health check will no longer report kube-proxy as unhealthy simply because its node is being gracefully terminated by cluster-autoscaler. This reduces false alarms and provides a clearer view of the cluster’s actual health.
  • Smoother Operations: Operations teams will experience fewer unnecessary alerts related to kube-proxy during routine scale-down events, allowing them to focus on genuine issues.

By adapting its kube-proxy readiness checks, Gardener continues to refine its operational robustness, providing a more stable and predictable management experience.

Further Information

New in Gardener: Forceful Redeployment of gardenlets for Enhanced Operational Control

Gardener continues to enhance its operational capabilities, and a recent improvement introduces a much-requested feature for managing gardenlets: the ability to forcefully trigger their redeployment. This provides operators with greater control and a streamlined recovery path for specific scenarios.

The Standard gardenlet Lifecycle

gardenlets, crucial components in the Gardener architecture, are typically deployed into seed clusters. For setups utilizing the seedmanagement.gardener.cloud/v1alpha1.Gardenlet resource, particularly in unmanaged seeds (those not backed by a shoot cluster and ManagedSeed resource), the gardener-operator handles the initial deployment of the gardenlet.

Once this initial deployment is complete, the gardenlet takes over its own lifecycle, leveraging a self-upgrade strategy to keep itself up-to-date. Under normal circumstances, the gardener-operator does not intervene further after this initial phase.

When Things Go Awry: The Need for Intervention

While the self-upgrade mechanism is robust, certain situations can arise where a gardenlet might require a more direct intervention. For example:

  • The gardenlet’s client certificate to the virtual garden cluster might have expired or become invalid.
  • The gardenlet Deployment in the seed cluster might have been accidentally deleted or become corrupted.

In such cases, because the gardener-operator’s responsibility typically ends after the initial deployment, the gardenlet might not be able to recover on its own, potentially leading to operational issues.

Empowering Operators: The Force-Redeploy Annotation

To address these challenges, Gardener now allows operators to instruct the gardener-operator to forcefully redeploy a gardenlet. This is achieved by annotating the specific Gardenlet resource with:

gardener.cloud/operation=force-redeploy

When this annotation is applied, it signals the gardener-operator to re-initiate the deployment process for the targeted gardenlet, effectively overriding the usual hands-off approach after initial setup.
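
For example, an operator could apply the annotation with kubectl against the virtual garden cluster. The resource name and namespace below are illustrative; Gardenlet resources typically live in the garden namespace:

kubectl annotate gardenlets.seedmanagement.gardener.cloud my-gardenlet -n garden gardener.cloud/operation=force-redeploy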

How It Works

The process for a forceful redeployment is straightforward:

  1. An operator identifies a gardenlet that requires redeployment due to issues like an expired certificate or a missing deployment.
  2. The operator applies the gardener.cloud/operation=force-redeploy annotation to the corresponding seedmanagement.gardener.cloud/v1alpha1.Gardenlet resource in the virtual garden cluster.
  3. Important: If the gardenlet is for a remote cluster and its kubeconfig Secret was previously removed (a standard cleanup step after initial deployment), this Secret must be recreated, and its reference (.spec.kubeconfigSecretRef) must be re-added to the Gardenlet specification.
  4. The gardener-operator detects the annotation and proceeds to redeploy the gardenlet, applying its configurations and charts anew.
  5. Once the redeployment is successfully completed, the gardener-operator automatically removes the gardener.cloud/operation=force-redeploy annotation from the Gardenlet resource. Similar to the initial deployment, it will also clean up the referenced kubeconfig Secret and set .spec.kubeconfigSecretRef to nil if it was provided.
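
Putting steps 2 and 3 together, the relevant parts of the Gardenlet resource might look like this minimal sketch; the names are placeholders and only the fields discussed above are shown:

apiVersion: seedmanagement.gardener.cloud/v1alpha1
kind: Gardenlet
metadata:
  name: my-gardenlet
  namespace: garden                    # assumed namespace in the virtual garden cluster
  annotations:
    gardener.cloud/operation: force-redeploy
spec:
  kubeconfigSecretRef:
    name: my-gardenlet-kubeconfig      # recreated Secret holding the seed cluster kubeconfig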

Benefits

This new feature offers significant advantages for Gardener operators:

  • Enhanced Recovery: Provides a clear and reliable mechanism to recover gardenlets from specific critical failure states.
  • Improved Operational Flexibility: Offers more direct control over the gardenlet lifecycle when exceptional circumstances demand it.
  • Reduced Manual Effort: Streamlines the process of restoring a misbehaving gardenlet, minimizing potential downtime or complex manual recovery procedures.

This enhancement underscores Gardener’s commitment to operational excellence and responsiveness to the needs of its user community.

Dive Deeper

To learn more about this feature, you can explore the following resources:

Streamlined Node Onboarding: Introducing `gardenadm token` and `gardenadm join`

Gardener continues to enhance its gardenadm tool, simplifying the management of autonomous Shoot clusters. Recently, new functionalities have been introduced to streamline the process of adding worker nodes to these clusters: the gardenadm token command suite and the corresponding gardenadm join command. These additions offer a more convenient and Kubernetes-native experience for cluster expansion.

Managing Bootstrap Tokens with gardenadm token

A key aspect of securely joining nodes to a Kubernetes cluster is the use of bootstrap tokens. The new gardenadm token command provides a set of subcommands to manage these tokens effectively within your autonomous Shoot cluster’s control plane node. This functionality is analogous to the familiar kubeadm token commands.

The available subcommands include:

  • gardenadm token list: Displays all current bootstrap tokens. You can also use the --with-token-secrets flag to include the token secrets in the output for easier inspection.
  • gardenadm token generate: Generates a cryptographically random bootstrap token. This command only prints the token; it does not create it on the server.
  • gardenadm token create [token]: Creates a new bootstrap token on the server. If you provide a token (in the format [a-z0-9]{6}.[a-z0-9]{16}), it will be used. If no token is supplied, gardenadm will automatically generate a random one and create it.
    • A particularly helpful option for this command is --print-join-command. When used, instead of just outputting the token, it prints the complete gardenadm join command, ready to be copied and executed on the worker node you intend to join. You can also specify flags like --description, --validity, and --worker-pool-name to customize the token and the generated join command.
  • gardenadm token delete <token-value...>: Deletes one or more bootstrap tokens from the server. You can specify tokens by their ID, the full token string, or the name of the Kubernetes Secret storing the token (e.g., bootstrap-token-<id>).

These commands provide comprehensive control over the lifecycle of bootstrap tokens, enhancing security and operational ease.
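
For instance, running the following on a control plane node prints a ready-to-use join command; the flag values are illustrative:

gardenadm token create --print-join-command --description "token for worker-pool-1" --validity 24h --worker-pool-name worker-pool-1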

Joining Worker Nodes with gardenadm join

Once a bootstrap token is created (ideally using gardenadm token create --print-join-command on a control plane node), the new gardenadm join command facilitates the process of adding a new worker node to the autonomous Shoot cluster.

The command is executed on the prospective worker machine and typically looks like this:

gardenadm join --bootstrap-token <token_id.token_secret> --ca-certificate <base64_encoded_ca_bundle> --gardener-node-agent-secret-name <os_config_secret_name> <control_plane_api_server_address>

Key parameters include:

  • --bootstrap-token: The token obtained from the gardenadm token create command.
  • --ca-certificate: The base64-encoded CA certificate bundle of the cluster’s API server.
  • --gardener-node-agent-secret-name: The name of the Secret in the kube-system namespace of the control plane that contains the OperatingSystemConfig (OSC) for the gardener-node-agent. This OSC dictates how the node should be configured.
  • <control_plane_api_server_address>: The address of the Kubernetes API server of the autonomous cluster.

Upon execution, gardenadm join performs several actions:

  1. It discovers the Kubernetes version of the control plane using the provided bootstrap token and CA certificate.
  2. It checks if the gardener-node-agent has already been initialized on the machine.
  3. If not already joined, it prepares the gardener-node-init configuration. This involves setting up a systemd service (gardener-node-init.service) which, in turn, downloads and runs the gardener-node-agent.
  4. The gardener-node-agent then uses the bootstrap token to securely download its specific OperatingSystemConfig from the control plane.
  5. Finally, it applies this configuration, setting up the kubelet and other necessary components, thereby officially joining the node to the cluster.

After the node has successfully joined, the bootstrap token used for the process will be automatically deleted by the kube-controller-manager once it expires. However, it can also be manually deleted immediately using gardenadm token delete on the control plane node for enhanced security.

These new gardenadm commands significantly simplify the expansion of autonomous Shoot clusters, providing a robust and user-friendly mechanism for managing bootstrap tokens and joining worker nodes.

Further Information

  • gardenadm token Pull Request: GEP-28 gardenadm token (#11934)
  • gardenadm join Pull Request: GEP-28 gardenadm join (#11942)
  • Recording of the demo: Watch the demo starting at 12m48s

Enhanced Network Flexibility: Gardener Now Supports CIDR Overlap for Non-HA Shoots

Gardener is continually evolving to offer greater flexibility and efficiency in managing Kubernetes clusters. A significant enhancement has been introduced that addresses a common networking challenge: the requirement for completely disjoint network CIDR blocks between a shoot cluster and its seed cluster. Now, Gardener allows for IPv4 network overlap in specific scenarios, providing users with more latitude in their network planning.

Addressing IP Address Constraints

Previously, all shoot cluster networks (pods, services, nodes) had to be distinct from the seed cluster’s networks. This could be challenging in environments with limited IP address space or complex network topologies. With this new feature, IPv4 or dual-stack shoot clusters can now define pod, service, and node networks that overlap with the IPv4 networks of their seed cluster.

How It Works: NAT for Seamless Connectivity

This capability is enabled through a double Network Address Translation (NAT) mechanism within the VPN connection established between the shoot and seed clusters. When IPv4 network overlap is configured, Gardener intelligently maps the overlapping shoot and seed networks to a dedicated set of newly reserved IPv4 ranges. These ranges are used exclusively within the VPN pods to ensure seamless communication, effectively resolving any conflicts that would arise from the overlapping IPs.

The reserved mapping ranges are:

  • 241.0.0.0/8: Seed Pod Mapping Range
  • 242.0.0.0/8: Shoot Node Mapping Range
  • 243.0.0.0/8: Shoot Service Mapping Range
  • 244.0.0.0/8: Shoot Pod Mapping Range

Conditions for Utilizing Overlapping Networks

To leverage this new network flexibility, the following conditions must be met:

  1. Non-Highly-Available VPN: The shoot cluster must utilize a non-highly-available (non-HA) VPN. This is typically the configuration for shoots with a non-HA control plane.
  2. IPv4 or Dual-Stack Shoots: The shoot cluster must be configured as either single-stack IPv4 or dual-stack (IPv4/IPv6). The overlap feature specifically pertains to IPv4 networks.
  3. Non-Use of Reserved Ranges: The shoot cluster’s own defined networks (for pods, services, and nodes) must not utilize any of the Gardener-reserved IP ranges, including the newly introduced mapping ranges listed above, or the existing 240.0.0.0/8 range (Kube-ApiServer Mapping Range).

It’s important to note that Gardener will prevent the migration of a non-HA shoot to an HA setup if its network ranges currently overlap with the seed, as this feature is presently limited to non-HA VPN configurations. For single-stack IPv6 shoots, Gardener continues to enforce non-overlapping IPv6 networks to avoid any potential issues, although IPv6 address space exhaustion is less common.
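
As an illustration, a non-HA IPv4 shoot could now specify networks such as the following in the Shoot specification's networking section, even if the seed already uses the same ranges. The CIDRs are examples only and must still stay clear of the reserved 240.0.0.0/8 through 244.0.0.0/8 ranges:

networking:
  type: calico
  nodes: 10.250.0.0/16
  pods: 100.96.0.0/11
  services: 100.64.0.0/13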

Benefits for Gardener Users

This enhancement offers increased flexibility in IP address management, particularly beneficial for users operating numerous shoot clusters or those in environments with constrained IPv4 address availability. By relaxing the strict disjointedness requirement for non-HA shoots, Gardener simplifies network allocation and reduces the operational overhead associated with IP address planning.

Explore Further

To dive deeper into this feature, you can review the original pull request and the updated documentation:

Enhanced Node Management: Introducing In-Place Updates in Gardener

Gardener is committed to providing efficient and flexible Kubernetes cluster management. Traditionally, updates to worker pool configurations, such as machine image or Kubernetes minor version changes, trigger a rolling update. This process involves replacing existing nodes with new ones, which is a robust approach for many scenarios. However, for environments with physical or bare-metal nodes, for stateful workloads that are sensitive to node replacement, or when the required virtual machine type is scarce, node replacement can introduce challenges such as extended update times and potential disruptions.

To address these needs, Gardener now introduces In-Place Node Updates. This new capability allows certain updates to be applied directly to existing worker nodes without requiring their replacement, significantly reducing disruption and speeding up update processes for compatible changes.

New Update Strategies for Worker Pools

Gardener now supports three distinct update strategies for your worker pools, configurable via the updateStrategy field in the Shoot specification’s worker pool definition:

  • AutoRollingUpdate: This is the classic and default strategy. When updates occur, nodes are cordoned, drained, terminated, and replaced with new nodes incorporating the changes.
  • AutoInPlaceUpdate: With this strategy, compatible updates are applied directly to the existing nodes. The MachineControllerManager (MCM) automatically selects nodes, cordons and drains them, and then signals the Gardener Node Agent (GNA) to perform the update. Once GNA confirms success, MCM uncordons the node.
  • ManualInPlaceUpdate: This strategy also applies updates directly to existing nodes but gives operators fine-grained control. After an update is specified, MCM marks all nodes in the pool as candidates. Operators must then manually label individual nodes to select them for the in-place update process, which then proceeds similarly to the AutoInPlaceUpdate strategy.

The AutoInPlaceUpdate and ManualInPlaceUpdate strategies are available when the InPlaceNodeUpdates feature gate is enabled in the gardener-apiserver.
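
In the Shoot specification, selecting a strategy is a one-line change on the worker pool, as in this abbreviated sketch; the machine type and image values are placeholders:

spec:
  provider:
    workers:
    - name: worker-pool-1
      machine:
        type: m5.large
        image:
          name: gardenlinux
          version: "1592.1.0"
      minimum: 3
      maximum: 5
      updateStrategy: AutoInPlaceUpdate   # or ManualInPlaceUpdate / AutoRollingUpdate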

What Can Be Updated In-Place?

In-place updates are designed to handle a variety of common operational tasks more efficiently:

  • Machine Image Updates: Newer versions of a machine image can be rolled out by executing an update command directly on the node, provided the image and cloud profile are configured to support this.
  • Kubernetes Minor Version Updates: Updates to the Kubernetes minor version of worker nodes can be applied in-place.
  • Kubelet Configuration Changes: Modifications to the Kubelet configuration can be applied directly.
  • Credentials Rotation: Critical for security, rotation of Certificate Authorities (CAs) and ServiceAccount signing keys can now be performed on existing nodes without replacement.

However, some changes still necessitate a rolling update (node replacement):

  • Changing the machine image name (e.g., switching from Ubuntu to Garden Linux).
  • Modifying the machine type.
  • Altering volume types or sizes.
  • Changing the Container Runtime Interface (CRI) name (e.g., from Docker to containerd).
  • Enabling or disabling node-local DNS.

Key API and Component Adaptations

Several Gardener components and APIs have been enhanced to support in-place updates:

  • CloudProfile: The CloudProfile API now allows specifying inPlaceUpdates configuration within machineImage.versions. This includes a boolean supported field to indicate if a version supports in-place updates and an optional minVersionForUpdate string to define the minimum OS version from which an in-place update to the current version is permissible (see the sketch after this list).
  • Shoot Specification: As mentioned, the spec.provider.workers[].updateStrategy field allows selection of the desired update strategy. Additionally, spec.provider.workers[].machineControllerManagerSettings now includes machineInPlaceUpdateTimeout and disableHealthTimeout (which defaults to true for in-place strategies to prevent premature machine deletion during lengthy updates). For ManualInPlaceUpdate, maxSurge defaults to 0 and maxUnavailable to 1.
  • OperatingSystemConfig (OSC): The OSC resource, managed by OS extensions, now includes status.inPlaceUpdates.osUpdate where extensions can specify the command and args for the Gardener Node Agent to execute for machine image (Operating System) updates. The spec.inPlaceUpdates field in the OSC will carry information like the target Operating System version, Kubelet version, and credential rotation status to the node.
  • Gardener Node Agent (GNA): GNA is responsible for executing the in-place updates on the node. It watches for a specific node condition (InPlaceUpdate with reason ReadyForUpdate) set by MCM, performs the OS update, Kubelet updates, or credentials rotation, restarts necessary pods (like DaemonSets), and then labels the node with the update outcome.
  • MachineControllerManager (MCM): MCM orchestrates the in-place update process. For in-place strategies, while new machine classes and machine sets are created to reflect the desired state, the actual machine objects are not deleted and recreated. Instead, their ownership is transferred to the new machine set. MCM handles cordoning, draining, and setting node conditions to coordinate with GNA.
  • Shoot Status & Constraints: To provide visibility, the status.inPlaceUpdates.pendingWorkerUpdates field in the Shoot now lists worker pools pending autoInPlaceUpdate or manualInPlaceUpdate. A new ShootManualInPlaceWorkersUpdated constraint is added if any manual in-place updates are pending, ensuring users are aware.
  • Worker Status: The Worker extension resource now includes status.inPlaceUpdates.workerPoolToHashMap to track the configuration hash of worker pools that have undergone in-place updates. This helps Gardener determine if a pool is up-to-date.
  • Forcing Updates: If an in-place update is stuck, the gardener.cloud/operation=force-in-place-update annotation can be added to the Shoot to allow subsequent changes or retries.
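
The following abbreviated sketch illustrates the CloudProfile and worker pool fields described above; the version numbers and timeout are placeholders:

# CloudProfile excerpt
spec:
  machineImages:
  - name: gardenlinux
    versions:
    - version: "1593.0.0"
      inPlaceUpdates:
        supported: true
        minVersionForUpdate: "1443.0.0"

# Shoot worker pool excerpt
machineControllerManagerSettings:
  machineInPlaceUpdateTimeout: 40m
  disableHealthTimeout: true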

Benefits of In-Place Updates

  • Reduced Disruption: Minimizes workload interruptions by avoiding full node replacements for compatible updates.
  • Faster Updates: Applying changes directly can be quicker than provisioning new nodes, especially for OS patches or configuration changes.
  • Bare-Metal Efficiency: Particularly beneficial for bare-metal environments where node provisioning is more time-consuming and complex.
  • Stateful Workload Friendly: Lessens the impact on stateful applications that might be sensitive to node churn.

In-place node updates represent a significant step forward in Gardener’s operational flexibility, offering a more nuanced and efficient approach to managing node lifecycles, especially in demanding or specialized environments.

Dive Deeper

To explore the technical details and contributions that made this feature possible, refer to the following resources:

Gardener Dashboard 1.80: Streamlined Credentials, Enhanced Cluster Views, and Real-Time Updates

Gardener Dashboard version 1.80 introduces several significant enhancements aimed at improving user experience, credentials management, and overall operational efficiency. These updates bring more clarity to credential handling, a smoother experience for managing large numbers of clusters, and a move towards a more reactive interface.

Unified and Enhanced Credentials Management

The management of secrets and credentials has been significantly revamped for better clarity and functionality:

  • Introducing CredentialsBindings: The dashboard now fully supports CredentialsBinding resources alongside the existing SecretBinding resources. This allows for referencing both Secrets and, in the future, Workload Identities more explicitly. While CredentialsBindings referencing Workload Identity resources are visible for cluster creation, editing or deleting them via the dashboard is not yet supported.
  • “Credentials” Page: The former “Secrets” page has been renamed to “Credentials.” It features a new “Kind” column and distinct icons to clearly differentiate between SecretBinding and CredentialsBinding types, especially useful when resources share names. The column showing the referenced credential resource name has been removed as this information is part of the binding’s details.
  • Contextual Information and Safeguards: When editing a secret, all its associated data is now displayed, providing better context. If an underlying secret is referenced by multiple bindings, a hint is shown to prevent unintended impacts. Deletion of a binding is prevented if the underlying secret is still in use by another binding.
  • Simplified Creation and Editing: New secrets created via the dashboard will now automatically generate a CredentialsBinding. While existing SecretBindings remain updatable, the creation of new SecretBindings through the dashboard is no longer supported, encouraging the adoption of the more versatile CredentialsBinding. The edit dialog for secrets now pre-fills current data, allowing for easier modification of specific fields.
  • Handling Missing Secrets: The UI now provides clear information and guidance if a CredentialsBinding or SecretBinding references a secret that no longer exists.

Revamped Cluster List for Improved Scalability

Navigating and managing a large number of clusters is now more efficient:

  • Virtual Scrolling: The cluster list has adopted virtual scrolling. Rows are rendered dynamically as you scroll, replacing the previous pagination system. This significantly improves performance and provides a smoother browsing experience, especially for environments with hundreds or thousands of clusters.
  • Optimized Row Display: The height of individual rows in the cluster list has been reduced, allowing more clusters to be visible on the screen at once. Additionally, expandable content within a row (like worker details or ticket labels) now has a maximum height with internal scrolling, ensuring consistent row sizes and smooth virtual scrolling performance.

Real-Time Updates for Projects

The dashboard is becoming more dynamic with the introduction of real-time updates:

  • Instant Project Changes: Modifications to projects, such as creation or deletion, are now reflected instantly in the project list and interface without requiring a page reload. This is achieved through WebSocket communication.
  • Foundation for Future Reactivity: This enhancement for projects lays the groundwork for bringing real-time updates to other resources within the dashboard, such as Seeds and the Garden resource, in future releases.

Other Notable Enhancements

  • Kubeconfig Update: The kubeconfig generated for garden cluster access via the “Account” page now uses the --oidc-pkce-method flag, replacing the deprecated --oidc-use-pkce flag. Users encountering deprecation messages should redownload their kubeconfig (see the sketch after this list).
  • Notification Behavior: Kubernetes warning notifications are now automatically dismissed after 5 seconds. However, all notifications will remain visible as long as the mouse cursor is hovering over them, giving users more time to read important messages.
  • API Server URL Path: Support has been added for kubeconfigs that include a path in the API server URL.
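
For reference, the user section of such a kubeconfig looks roughly like this sketch based on the kubelogin plugin; the issuer URL and client ID are placeholders:

users:
- name: garden-my-project
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: kubectl
      args:
      - oidc-login
      - get-token
      - --oidc-issuer-url=https://issuer.example.com
      - --oidc-client-id=dashboard
      - --oidc-pkce-method=S256   # replaces the deprecated --oidc-use-pkce flag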

These updates in Gardener Dashboard 1.80 collectively enhance usability, provide better control over credentials, and improve performance for large-scale operations.

For a comprehensive list of all features, bug fixes, and contributor acknowledgments, please refer to the official release notes. You can also view the segment of the community call discussing these dashboard updates here.

Gardener: Powering Enterprise Kubernetes at Scale and Europe's Sovereign Cloud Future

The Kubernetes ecosystem is dynamic, offering a wealth of tools to manage the complexities of modern cloud-native applications. For enterprises seeking to provision and manage Kubernetes clusters efficiently, securely, and at scale, a robust and comprehensive solution is paramount. Gardener, born from years of managing tens of thousands of clusters efficiently across diverse platforms and in demanding environments, stands out as a fully open-source choice for delivering fully managed Kubernetes Clusters as a Service. It already empowers organizations like SAP, STACKIT, T-Systems, and others (see adopters) and has become a core technology for NeoNephos, a project aimed at advancing digital autonomy in Europe (see KubeCon London 2025 Keynote and press announcement).

The Gardener Approach: An Architecture Forged by Experience

At the heart of Gardener’s architecture is the concept of “Kubeception” (see readme and architecture). This approach involves using Kubernetes to manage Kubernetes. Gardener runs on a Kubernetes cluster (called a runtime cluster), facilitates access through a self-managed node-less Kubernetes cluster (the garden cluster), manages Kubernetes control planes as pods within other self-managed Kubernetes clusters that provide high scalability (called seed clusters), and ultimately provisions end-user Kubernetes clusters (called shoot clusters).

This multi-layered architecture isn’t complexity for its own sake. Gardener’s design and extensive feature set are the product of over eight years of continuous development and refinement, directly shaped by the high-scale, security-sensitive, and enterprise-grade requirements of its users. Experience has shown that such a sophisticated structure is key to addressing significant challenges in scalability, security, and operational manageability. For instance:

  • Scalability: Gardener achieves considerable scalability through its use of seed clusters, which it also manages. This allows for the distribution of control planes, preventing bottlenecks. The design even envisions leveraging Gardener to host its own management components (as an autonomous cluster), showcasing its resilience without risking circular dependencies.
  • Security: A fundamental principle in Gardener is the strict isolation of control planes from data planes. This extends to Gardener itself, which runs in a dedicated management cluster but exposes its API to end-users through a workerless virtual cluster. This workerless cluster acts as an isolated access point, presenting no compute surface for potentially malicious pods, thereby significantly enhancing security.
  • API Power & User Experience: Gardener utilizes the full capabilities of the Kubernetes API server. This enables advanced functionalities and sophisticated API change management. Crucially, for the end-user, interaction remains 100% Kubernetes-native. Users employ standard custom resources to instruct Gardener, meaning any tool, library, or language binding that supports Kubernetes CRDs inherently supports Gardener.

Delivering Fully Managed Kubernetes Clusters as a Service

Gardener provides a comprehensive “fully managed Kubernetes Clusters as a Service” offering. This means it handles much more than just spinning up a cluster; it manages the entire lifecycle and operational aspects. Here’s a glimpse into its capabilities:

  1. Full Cluster Lifecycle Management:

    • Infrastructure Provisioning: Gardener takes on the provisioning and management of underlying cloud infrastructure, including VPCs, subnets, NAT gateways, security groups, IAM roles, and virtual machines across a wide range of providers like AWS, Azure, GCP, OpenStack, and more.
    • Worker Node Management: It meticulously manages worker pools, covering OS images, machine types, autoscaling configurations (min/max/surge), update strategies, volume management, CRI configuration, and provider-specific settings.
  2. Enterprise Platform Governance:

    • Cloud Profiles: Gardener is designed with the comprehensive needs of enterprise platform operators in mind. Managing a fleet of clusters for an organization requires more than just provisioning; it demands clear governance over available resources, versions, and their lifecycle. Gardener addresses this through its declarative API, allowing platform administrators to define and enforce policies such as which Kubernetes versions are “supported,” “preview,” or “deprecated,” along with their expiration dates. Similarly, it allows control over available machine images, their versions, and lifecycle status. This level of granular control and lifecycle management for the underlying components of a Kubernetes service is crucial for enterprise adoption and stable operations. This is a key consideration often left as an additional implementation burden for platform teams using other cluster provisioning tools, where such governance features must be built on top. Gardener, by contrast, integrates these concerns directly into its API and operational model, simplifying the task for platform operators.
  3. Advanced Networking:

    • CNI Plugin Management: Gardener manages the deployment and configuration of CNI plugins such as Calico or Cilium.
    • Dual-Stack Networking: It offers comprehensive support for IPv4, IPv6, and dual-stack configurations for pods, services, and nodes.
    • NodeLocal DNS Cache: To enhance DNS performance and reliability, Gardener can deploy and manage NodeLocal DNS.
  4. Comprehensive Autoscaling:

    • Cluster Autoscaler: Gardener manages the Cluster Autoscaler for worker nodes, enabling dynamic scaling based on pod scheduling demands.
    • Horizontal and Vertical Pod Autoscaler (HPA/VPA): It manages HPA and VPA for workloads and applies them to control plane components, optimizing resource utilization (see blog).
  5. Operational Excellence & Maintenance:

    • Automated Kubernetes Upgrades: Gardener handles automated Kubernetes version upgrades for both control plane and worker nodes, with configurable maintenance windows.
    • Automated OS Image Updates: It manages automated machine image updates for worker nodes.
    • Cluster Hibernation: To optimize costs, Gardener supports hibernating clusters, scaling down components during inactivity.
    • Scheduled Maintenance: It allows defining specific maintenance windows for predictability.
    • Robust Credentials Rotation: Gardener features automated mechanisms for rotating all credentials. It provisions fine-grained, dedicated, and individual CAs, certificates, credentials, and secrets for each component — whether Kubernetes-related (such as service account keys or etcd encryption keys) or Gardener-specific (such as opt-in SSH keys or observability credentials). The Gardener installation, the seeds, and all shoots have their own distinct sets of credentials — amounting to more than 150 per shoot cluster control plane and hundreds of thousands for larger Gardener installations overall. All these credentials are rotated automatically and without downtime — most continuously, while some (like the API server CA) require user initiation to ensure operational awareness. For a deeper dive into Gardener’s credential rotation, see our Cloud Native Rejekts talk. This granular approach effectively prevents lateral movement, significantly strengthening the security posture.
  6. Enhanced Security & Access Control:

    • OIDC Integration: Gardener supports OIDC configuration for the kube-apiserver for secure user authentication.
    • Customizable Audit Policies: It allows specifying custom audit policies for detailed logging.
    • Managed Service Account Issuers: Gardener can manage service account issuers, enhancing workload identity security.
    • SSH Access Control: It provides mechanisms to manage SSH access to worker nodes securely if opted in (Gardener itself doesn’t require SSH access to worker nodes).
    • Workload Identity: Gardener supports workload identity features, allowing pods to securely authenticate to cloud provider services.
  7. Powerful Extensibility:

    • Extension Framework and Ecosystem: Gardener features a robust extension mechanism for deep integration of cloud providers, operating systems, container runtimes, or services like DNS management, certificate management, registry caches, network filtering, image signature verification, and more.
    • Catered to Platform Builders: This extensibility also allows platform builders to deploy custom extensions into the self-managed seed cluster infrastructure that hosts shoot cluster control planes. This offers robust isolation for these custom components from the user’s shoot cluster worker nodes, enhancing both security and operational stability.
  8. Integrated DNS and Certificate Management:

    • External DNS Management: Gardener can manage DNS records for the cluster’s API server and services via its shoot-dns-service extension.
    • Automated Certificate Management: Through extensions like shoot-cert-service, it manages TLS certificates, including ACME integration. Gardener also provides its own robust DNS (dns-management) and certificate (cert-management) solutions designed for enterprise scale. These custom solutions were developed because, at the scale Gardener operates, many deep optimizations were necessary, e.g., to avoid being rate-limited by upstream providers.

A Kubernetes-Native Foundation for Sovereign Cloud

The modern IT landscape is rapidly evolving away from primitive virtual machines towards distributed systems. Kubernetes has emerged as the de facto standard for deploying and managing these modern, cloud-native applications and services at scale. Gardener is squarely positioned at the forefront of this shift, offering a Kubernetes-native approach to managing Kubernetes clusters themselves. It possesses a mature, declarative, Kubernetes-native API for full cluster lifecycle management. Unlike services that might expose proprietary APIs, Gardener’s approach is inherently Kubernetes-native and multi-cloud. This unified API is comprehensive, offering a consistent way to manage diverse cluster landscapes.

Its nature as a fully open-source project is particularly relevant for initiatives like NeoNephos, which aim to build sovereign cloud solutions. All core features, stable releases, and essential operational components are available to the community. This inherent cloud-native, Kubernetes-centric design, coupled with its open-source nature and ability to run on diverse infrastructures (including on-premise and local cloud providers), provides the transparency, control, and technological independence crucial for digital sovereignty. Gardener delivers full sovereign control today, enabling organizations to run all modern applications and services at scale with complete authority over their infrastructure and data. This is a significant reason why many cloud providers and enterprises that champion sovereignty are choosing Gardener as their foundation and actively contributing to its ecosystem.

Operational Depth Reflecting Real-World Scale

Gardener’s operational maturity is a direct reflection of its long evolution, shaped by the demands of enterprise users and real-world, large-scale deployments. This maturity translates into a demonstrable track record of uptime for end users and their critical services. For instance, Gardener includes fully automated, incremental etcd backups with a recovery point objective (RPO) of five minutes and supports autonomous, hands-off restoration workflows via etcd-druid. Features like Vertical Pod Autoscalers (VPAs), PodDisruptionBudgets (PDBs), NetworkPolicies, PriorityClasses, and sophisticated pod placement strategies are integral to Gardener’s offering, ensuring high availability and fault tolerance. Gardener’s automation deals with many of the usual exceptions and does not require human DevOps intervention for most operational tasks. Gardener’s proactive security posture has proven effective in real-world scenarios. This depth of experience and automation ultimately translates into first-class Service Level Agreements (SLAs) that businesses can trust and rely on. As a testament to this, SAP entrusts Gardener with its Systems of Record. This level of operational excellence enables Gardener to meet the expectations of today’s most demanding Kubernetes use cases.

Conclusion: A Solid Foundation for Your Kubernetes Strategy

For enterprises and organizations seeking a comprehensive, truly open-source solution for managing the full lifecycle of Kubernetes clusters at scale, Gardener offers a compelling proposition. Its mature architecture, rich feature set, operational robustness, built-in enterprise governance capabilities, and commitment to the open-source community provide a solid foundation for running demanding Kubernetes workloads with confidence. This makes it a suitable technical underpinning for ambitious projects like NeoNephos, contributing to a future of greater digital autonomy.

We invite you to explore Gardener and discover how it can empower your enterprise-grade, large-scale Kubernetes journey.