Overview
This document describes the autoscaling mechanisms used for several components.
Garden or Shoot Cluster etcd
By default, if no autoscaling mode is requested, the etcd is deployed with static resources and without autoscaling.
However, two autoscaling modes are supported for the Garden or Shoot cluster etcd.
HVPA
In `HVPA` mode, the etcd is scaled by the hvpa-controller. The gardenlet/gardener-operator creates an `HVPA` resource for the etcd (`main` or `events`). The `HVPA` resource enables vertical scaling for the etcd.
The `HVPA` mode is used when the `HVPA` feature gate is enabled and the `VPAForETCD` feature gate is disabled.
VPA
In `VPA` mode, the etcd is scaled by a native `VPA` resource.
The `VPA` mode is used when the `VPAForETCD` feature gate is enabled (it takes precedence over the `HVPA` feature gate).
Note: Starting with release `v1.97`, the `VPAForETCD` feature gate is enabled by default.
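The feature-gate precedence described above can be summarized in a few lines. This is only an illustrative sketch: the function name and the returned mode labels are hypothetical and are not taken from the Gardener code base.

```python
def etcd_autoscaling_mode(hvpa_enabled: bool, vpa_for_etcd_enabled: bool) -> str:
    """Return the etcd autoscaling mode implied by the two feature gates.

    Hypothetical helper for illustration; not part of Gardener.
    """
    if vpa_for_etcd_enabled:
        # VPAForETCD takes precedence over HVPA.
        return "VPA"
    if hvpa_enabled:
        return "HVPA"
    # No autoscaling mode requested: static resources, no autoscaling.
    return "static"
```

For example, with both feature gates enabled the sketch returns `"VPA"`, which mirrors the precedence rule stated in the text.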
In both autoscaling modes, downscaling is handled more pessimistically to avoid many consecutive etcd restarts. For `production` and `infrastructure` Shoot clusters (and for all Garden clusters), downscaling is therefore deactivated for the main etcd. For all other Shoot clusters, lower advertised requests/limits are applied only during the Shoot’s maintenance time window.
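As a rough sketch of the downscaling rule for the main etcd, the decision could be expressed as follows. The function name, its parameters, and the returned labels are assumptions made for illustration; only the `production` and `infrastructure` purposes and the Garden/Shoot distinction come from the text.

```python
def main_etcd_downscaling(is_garden_cluster: bool, shoot_purpose: str = "") -> str:
    """Describe how downscaling of the main etcd is treated.

    Hypothetical helper; the return values are illustrative labels only.
    """
    if is_garden_cluster or shoot_purpose in ("production", "infrastructure"):
        # Downscaling is deactivated for the main etcd.
        return "disabled"
    # Lower requests/limits are applied only in the maintenance time window.
    return "maintenance-window-only"
```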
Shoot Kubernetes API Server
There are three supported autoscaling modes for the Shoot Kubernetes API server.
Baseline
In `Baseline` mode, the Shoot Kubernetes API server is scaled by an active HPA, with a VPA running in passive, recommend-only mode.
The API server resource requests are computed from the Shoot’s minimum Node count:
| Range | Resource Requests |
| --- | --- |
| [0, 2] | `800m`, `800Mi` |
| (2, 10] | `1000m`, `1100Mi` |
| (10, 50] | `1200m`, `1600Mi` |
| (50, 100] | `2500m`, `5200Mi` |
| (100, inf.) | `3000m`, `5200Mi` |
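The range lookup above can be sketched as a small table-driven function. The values are copied from the table; the names `BASELINE_REQUESTS` and `baseline_requests` are hypothetical and do not reflect how the gardenlet implements this.

```python
import math
from typing import Tuple

# (inclusive upper bound of the minimum-Node range, CPU request, memory request)
BASELINE_REQUESTS = [
    (2, "800m", "800Mi"),
    (10, "1000m", "1100Mi"),
    (50, "1200m", "1600Mi"),
    (100, "2500m", "5200Mi"),
    (math.inf, "3000m", "5200Mi"),
]


def baseline_requests(min_nodes: int) -> Tuple[str, str]:
    """Return (cpu, memory) requests for a Shoot's minimum Node count."""
    if min_nodes < 0:
        raise ValueError("min_nodes must be non-negative")
    # Bounds are sorted ascending, so the first match is the right range.
    for upper_bound, cpu, memory in BASELINE_REQUESTS:
        if min_nodes <= upper_bound:
            return cpu, memory
    raise AssertionError("unreachable: the last range is unbounded")
```

Because each range is closed on its upper end, a minimum Node count of exactly 10 still falls into the `(2, 10]` row.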
The API server’s minimum replica count is 2 and the maximum is 3.
The `Baseline` mode is used when neither the `HVPA` nor the `VPAAndHPAForAPIServer` feature gate is enabled.
HVPA
In `HVPA` mode, the Shoot Kubernetes API server is scaled by the hvpa-controller. The gardenlet creates an `HVPA` resource for the API server. The `HVPA` resource is backed by an HPA and a VPA, both in recommend-only mode. The hvpa-controller enables simultaneous horizontal and vertical scaling by combining the recommendations from the HPA and the VPA.
The initial API server resource requests are `500m` and `1Gi`. HVPA’s HPA scales only on CPU (average utilization 80%). HVPA’s VPA max allowed values are `8` CPU and `25G`.
The API server’s minimum replica count is 2 and the maximum is 3.
The `HVPA` mode is used when the `HVPA` feature gate is enabled (and the `VPAAndHPAForAPIServer` feature gate is disabled).
VPAAndHPA
In `VPAAndHPA` mode, the Shoot Kubernetes API server is scaled simultaneously by VPA and HPA on the same metrics (CPU and memory usage). The pod-thrashing cycle between VPA and HPA scaling on the same metric is avoided by configuring the HPA to scale on average usage (not on average utilization) and by picking the HPA’s target average usage values in sync with VPA’s allowed maximums. This lets VPA scale vertically on CPU/memory usage first. Once the Pods’ average CPU/memory usage approaches VPA’s allowed maximum CPU/memory (that is, reaches the HPA’s target average usage, which is 1/7 less than VPA’s allowed maximums), HPA scales horizontally by adding a new replica.
The `VPAAndHPA` mode was introduced to address disadvantages of HVPA: it is an additional component; it modifies the deployment, triggering unnecessary rollouts; it scales vertically only at max replicas; vertical resource requests get stuck when scaling in again; etc.
The initial API server resource requests are `250m` and `500Mi`. VPA’s max allowed values are `7` CPU and `28G`. HPA’s average target usage values are `6` CPU and `24G`.
The API server’s minimum replica count is 2 and the maximum is 6.
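The relationship between the VPA maximums and the HPA targets quoted above is plain arithmetic: a target that is 1/7 less than the maximum is 6/7 of it. The following sketch checks the quoted numbers; the helper name is hypothetical.

```python
from fractions import Fraction


def hpa_target_usage(vpa_max_allowed: int) -> Fraction:
    """Return the HPA target average usage, 1/7 below the VPA maximum."""
    return vpa_max_allowed * Fraction(6, 7)


cpu_target = hpa_target_usage(7)      # VPA max allowed CPU cores
memory_target = hpa_target_usage(28)  # VPA max allowed memory, in G
print(cpu_target, memory_target)
```

With the documented maximums of `7` CPU and `28G`, this yields the documented targets of `6` CPU and `24G`.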
The `VPAAndHPA` mode is used when the `VPAAndHPAForAPIServer` feature gate is enabled (it takes precedence over the `HVPA` feature gate).
Note: Starting with release `v1.101`, the `VPAAndHPAForAPIServer` feature gate is enabled by default.
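To make the mode selection for the Shoot Kubernetes API server concrete, here is a minimal sketch that maps the two feature gates to a mode and to the replica bounds listed for each mode. The function and dictionary names are hypothetical; only the gate names, mode names, and replica counts come from the text.

```python
from typing import Tuple

# (min replicas, max replicas) per autoscaling mode, as listed above.
REPLICA_BOUNDS = {
    "Baseline": (2, 3),
    "HVPA": (2, 3),
    "VPAAndHPA": (2, 6),
}


def api_server_autoscaling_mode(hvpa_enabled: bool, vpa_and_hpa_enabled: bool) -> str:
    """Return the API server autoscaling mode implied by the feature gates."""
    if vpa_and_hpa_enabled:
        # VPAAndHPAForAPIServer takes precedence over HVPA.
        return "VPAAndHPA"
    if hvpa_enabled:
        return "HVPA"
    return "Baseline"


def api_server_replica_bounds(hvpa_enabled: bool, vpa_and_hpa_enabled: bool) -> Tuple[int, int]:
    """Return (min, max) replicas for the selected mode."""
    return REPLICA_BOUNDS[api_server_autoscaling_mode(hvpa_enabled, vpa_and_hpa_enabled)]
```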
In all scaling modes the min replicas count of 2 is imposed by the High Availability of Shoot Control Plane Components.
The gardenlet sets the initial API server resource requests only when the Deployment is not found. When the Deployment exists, the gardenlet does not overwrite the kube-apiserver container resources.
Disabling Scale Down for Components in the Shoot Control Plane
Some Shoot clusters’ control plane components can be overloaded and have very high resource usage. The existing autoscaling solution may not handle these cases well, and scale-down actions for such overloaded components can be disruptive.
To prevent such disruptive scale-down actions, it is possible to disable scale-down of the etcd, Kubernetes API server, and Kubernetes controller manager in the Shoot control plane by annotating the Shoot with `alpha.control-plane.scaling.shoot.gardener.cloud/scale-down-disabled=true`.
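One way to set the annotation is to send a JSON merge patch for the Shoot’s metadata. The sketch below only builds the patch body; how it is submitted (for example with `kubectl patch --type=merge` or an API client) is left out, and the function name is hypothetical.

```python
import json

SCALE_DOWN_DISABLED = (
    "alpha.control-plane.scaling.shoot.gardener.cloud/scale-down-disabled"
)


def scale_down_disabled_patch() -> str:
    """Build a JSON merge patch body that sets the annotation to "true"."""
    patch = {"metadata": {"annotations": {SCALE_DOWN_DISABLED: "true"}}}
    return json.dumps(patch)


print(scale_down_disabled_patch())
```

Annotation values are strings in Kubernetes, which is why the patch carries `"true"` rather than a JSON boolean.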
Disabling scale-down has the following specifics for the Kubernetes API server component:
- In `Baseline` and `HVPA` modes, the HPA’s min and max replicas counts are both set to 4.
- In `VPAAndHPA` mode, if the HPA resource exists and the HPA’s `spec.minReplicas` is not nil, the min replicas count is `max(spec.minReplicas, status.desiredReplicas)`. When scale-down is disabled, this allows operators to set a custom value for the HPA’s `spec.minReplicas` without the gardenlet reverting it. That is, the HPA does scale down to min replicas, but not below. The HPA’s max replicas count is 6.
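The two rules above can be sketched as a single function that returns the HPA replica bounds while scale-down is disabled. The function name and parameters are hypothetical. The fallback to a minimum of 2 when the HPA does not exist or `spec.minReplicas` is nil is an assumption based on the regular `VPAAndHPA` minimum; the text does not state it.

```python
from typing import Optional, Tuple


def replicas_with_scale_down_disabled(
    mode: str,
    hpa_min_replicas: Optional[int] = None,
    hpa_desired_replicas: int = 0,
) -> Tuple[int, int]:
    """Return (min, max) HPA replicas when scale-down is disabled.

    Pass hpa_min_replicas=None if the HPA does not exist or its
    spec.minReplicas is nil.
    """
    if mode in ("Baseline", "HVPA"):
        # Min and max are pinned to the same value.
        return 4, 4
    if mode == "VPAAndHPA":
        if hpa_min_replicas is not None:
            # Never go below the current desired replica count.
            return max(hpa_min_replicas, hpa_desired_replicas), 6
        # Assumption: fall back to the regular minimum of 2.
        return 2, 6
    raise ValueError(f"unknown autoscaling mode: {mode}")
```

For instance, an HPA with `spec.minReplicas` of 3 and `status.desiredReplicas` of 5 would keep a minimum of 5 replicas under this sketch.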
Note: The `alpha.control-plane.scaling.shoot.gardener.cloud/scale-down-disabled` annotation is alpha and can be removed at any time without further notice. Only use it if you know what you are doing.
Virtual Kubernetes API Server and Gardener API Server
The virtual Kubernetes API server’s autoscaling is the same as the Shoot Kubernetes API server’s, with the following differences:
- The initial API server resource requests are `600m` and `512Mi` in all autoscaling modes.
- The min replicas count is 2 for a non-HA virtual cluster and 3 for an HA virtual cluster. The max replicas count is 6.
- In `HVPA` mode, HVPA’s HPA scales on both CPU and memory (average utilization 80% for both).
The Gardener API server’s autoscaling is the same as the Shoot Kubernetes API server’s, with the following differences:
- The initial API server resource requests are `600m` and `512Mi` in all autoscaling modes.
- The min replicas count is 2 for a non-HA virtual cluster and 3 for an HA virtual cluster. The max replicas count is 6.
- In `HVPA` mode, HVPA’s HPA scales on both CPU and memory (average utilization 80% for both).
- In `HVPA` mode, HVPA’s VPA max allowed values are `4` CPU and `25G`.