Configure Dependency Watchdog Components

Prober

The dependency watchdog prober command takes command line flags which are meant to fine-tune the prober. In addition, a ConfigMap is also mounted to the container which provides tuning knobs for all the probes that the prober starts.

Command line arguments

Prober can be configured via the following flags:

| Flag Name | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| kube-api-burst | int | No | 10 | Burst to use while talking with the kubernetes API server. The number must be >= 0. If it is 0, then a default value of 10 will be used. |
| kube-api-qps | float | No | 5.0 | Maximum QPS (queries per second) allowed when talking with the kubernetes API server. The number must be >= 0. If it is 0, then a default value of 5.0 will be used. |
| concurrent-reconciles | int | No | 1 | Maximum number of concurrent reconciles. |
| config-file | string | Yes | NA | Path of the config file containing the configuration to be used for all probes. |
| metrics-bind-addr | string | No | ":9643" | The TCP address that the controller should bind to for serving prometheus metrics. |
| health-bind-addr | string | No | ":9644" | The TCP address that the controller should bind to for serving health probes. |
| enable-leader-election | bool | No | false | In case the prober deployment has more than 1 replica for high availability, it will be set up in an active-passive mode: out of the many replicas, one becomes the leader and the rest are passive followers waiting to acquire leadership in case the leader dies. |
| leader-election-namespace | string | No | "garden" | Namespace in which the leader election resource will be created. It should be the same namespace where the DWD pods are deployed. |
| leader-elect-lease-duration | time.Duration | No | 15s | The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. Only applicable if leader election is enabled. |
| leader-elect-renew-deadline | time.Duration | No | 10s | The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. Only applicable if leader election is enabled. |
| leader-elect-retry-period | time.Duration | No | 2s | The duration the clients should wait between attempting acquisition and renewal of a leadership. Only applicable if leader election is enabled. |

You can view an example kubernetes prober deployment YAML to see how these command line args are configured.
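
A minimal sketch of how these flags could be wired into the prober container is shown below; the container name, image, binary path and mount path are assumptions for illustration, not values taken from the actual deployment:

containers:
  - name: dependency-watchdog-prober        # assumed container name
    image: <dependency-watchdog-image>      # placeholder image
    command:
      - /usr/local/bin/dependency-watchdog  # placeholder binary path
      - prober
      - --config-file=/etc/dependency-watchdog/config/dwd.yaml  # file from the mounted probe ConfigMap
      - --kube-api-qps=20.0
      - --kube-api-burst=100
      - --concurrent-reconciles=1
      - --enable-leader-election=true
      - --leader-election-namespace=garden
    volumeMounts:
      - name: config                        # volume backed by the probe ConfigMap
        mountPath: /etc/dependency-watchdog/config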

Prober Configuration

The probe configuration is mounted as a ConfigMap into the container. The path to the config file is set via the config-file command line argument mentioned above. The prober starts one probe per Shoot control plane hosted within the Seed cluster. Each such probe runs asynchronously and periodically connects to the Kube ApiServer of the Shoot. The configuration below influences each such probe.

You can view an example YAML configuration provided as data in a ConfigMap here.

| Name | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| kubeConfigSecretName | string | Yes | NA | Name of the kubernetes Secret which has the encoded KubeConfig required to connect to the Shoot control plane Kube ApiServer via an internal domain. This typically uses the local cluster DNS. |
| probeInterval | metav1.Duration | No | 10s | Interval with which each probe will run. |
| initialDelay | metav1.Duration | No | 30s | Initial delay for the probe to become active. Only applicable when the probe is created for the first time. |
| probeTimeout | metav1.Duration | No | 30s | In each run of the probe it will attempt to connect to the Shoot Kube ApiServer. probeTimeout defines the timeout after which a single run of the probe will fail. |
| backoffJitterFactor | float64 | No | 0.2 | Jitter with which a probe is run. |
| dependentResourceInfos | []prober.DependentResourceInfo | Yes | NA | Detailed below. |
| kcmNodeMonitorGraceDuration | metav1.Duration | Yes | NA | It is the node-monitor-grace-period set in the kcm flags. Used to determine whether a node lease can be considered expired. |
| nodeLeaseFailureFraction | float64 | No | 0.6 | Used to determine the maximum number of leases that can be expired for a lease probe to succeed. |
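
A short sketch of such a probe configuration is given below. The secret name and the duration values are illustrative assumptions; refer to the example ConfigMap linked above for an authoritative version:

kubeConfigSecretName: "shoot-access-dwd-probe"   # assumed secret name
probeInterval: 10s
initialDelay: 30s
probeTimeout: 30s
backoffJitterFactor: 0.2
kcmNodeMonitorGraceDuration: 40s                 # should mirror the kcm node-monitor-grace-period
nodeLeaseFailureFraction: 0.6
dependentResourceInfos: []                       # entries detailed in the sections below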

DependentResourceInfo

If a lease probe fails, then the prober scales down the dependent resources defined by this property. Similarly, once the lease probe succeeds again, it scales those dependent resources back up.

Each dependent resource info has the following properties:

| Name | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| ref | autoscalingv1.CrossVersionObjectReference | Yes | NA | It is a collection of ApiVersion, Kind and Name for a kubernetes resource, thus serving as an identifier. |
| optional | bool | Yes | NA | It is possible that a dependent resource is optional for a Shoot control plane. This property enables a probe to determine the correct behavior in case it is unable to find the resource identified via ref. |
| scaleUp | prober.ScaleInfo | No | | Captures the configuration to scale up this resource. Detailed below. |
| scaleDown | prober.ScaleInfo | No | | Captures the configuration to scale down this resource. Detailed below. |

NOTE: Since each dependent resource is a target for scale up/down, it is mandatory that the resource reference points to a kubernetes resource which has a scale subresource.

ScaleInfo

How to scale a DependentResourceInfo is captured in ScaleInfo. It has the following properties:

| Name | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| level | int | Yes | NA | Detailed below. |
| initialDelay | metav1.Duration | No | 0s (no initial delay) | Once a decision is taken to scale a resource, this property can induce a delay before the scale of the dependent resource is triggered. |
| timeout | metav1.Duration | No | 30s | Defines the timeout for the scale operation to finish for a dependent resource. |
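
Putting DependentResourceInfo and ScaleInfo together, a single entry could look like the sketch below; the delay and timeout values are illustrative assumptions, not recommended defaults:

dependentResourceInfos:
  - ref:
      kind: "Deployment"
      name: "machine-controller-manager"
      apiVersion: "apps/v1"
    optional: false
    scaleUp:
      level: 1
      initialDelay: 30s   # wait before triggering the scale-up
      timeout: 30s        # fail the scale operation if it does not finish in time
    scaleDown:
      level: 0
      timeout: 30s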

Determining target replicas

Prober cannot assume any target replicas during a scale-up operation for the following reasons:

  1. Kubernetes resources could be set up for high availability, and the number of replicas could vary from one shoot control plane to another. In gardener, the number of replicas of pods in a shoot namespace is controlled by the shoot control plane configuration.
  2. If a Horizontal Pod Autoscaler has been configured for a dependent kubernetes resource, then it could potentially change spec.replicas for a deployment/statefulset.

Given the above constraints, let's look at how the prober determines the target replicas during scale-down or scale-up operations.

  1. Scale-Up: The primary responsibility of a probe while performing a scale-up is to restore the replicas of a dependent kubernetes resource to the value it had prior to scale-down. In order to do that, it updates the following for each dependent resource that requires a scale-up:

    1. spec.replicas: Checks if the dependency-watchdog.gardener.cloud/replicas annotation is set. If it is, then it takes the value stored against this key as the target replicas. To be valid, the value should always be greater than 0.
    2. If the dependency-watchdog.gardener.cloud/replicas annotation is not present, then it falls back to the hard-coded default value for scale-up, which is 1.
    3. Removes the annotation dependency-watchdog.gardener.cloud/replicas if it exists.
  2. Scale-Down: To scale down a dependent kubernetes resource it does the following (see the sketch after this list):

    1. Adds the annotation dependency-watchdog.gardener.cloud/replicas and sets its value to the current value of spec.replicas.
    2. Updates spec.replicas to 0.
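
As an illustration of the mechanism above, a kube-controller-manager Deployment that previously ran with 1 replica would, after a scale-down by the probe, roughly look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-controller-manager
  annotations:
    dependency-watchdog.gardener.cloud/replicas: "1"   # replicas captured before scale-down, used to restore on scale-up
spec:
  replicas: 0                                          # set to 0 by the probe during scale-down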

Level

Each dependent resource that should be scaled up or down is associated with a level. Levels are ordered and processed in ascending order, with level 0 having the highest priority. Consider the following configuration:

dependentResourceInfos:
  - ref: 
      kind: "Deployment"
      name: "kube-controller-manager"
      apiVersion: "apps/v1"
    scaleUp: 
      level: 1 
    scaleDown: 
      level: 0 
  - ref:
      kind: "Deployment"
      name: "machine-controller-manager"
      apiVersion: "apps/v1"
    scaleUp:
      level: 1
    scaleDown:
      level: 1
  - ref:
      kind: "Deployment"
      name: "cluster-autoscaler"
      apiVersion: "apps/v1"
    scaleUp:
      level: 0
    scaleDown:
      level: 2

Let us order the dependent resources by their respective levels for both scale-up and scale-down. We get the following order:

Scale Up Operation

Order of scale up will be:

  1. cluster-autoscaler
  2. kube-controller-manager and machine-controller-manager will be scaled up concurrently after cluster-autoscaler has been scaled up.

Scale Down Operation

Order of scale down will be:

  1. kube-controller-manager
  2. machine-controller-manager after (1) has been scaled down.
  3. cluster-autoscaler after (2) has been scaled down.

Disable/Ignore Scaling

A probe can be configured to ignore scaling of configured dependent kubernetes resources. To do that, one must set the dependency-watchdog.gardener.cloud/ignore-scaling annotation to true on the scalable resource for which scaling should be ignored.
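
For example, to exclude a Deployment (kube-controller-manager is chosen arbitrarily here) from scaling, the annotation could be set as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-controller-manager
  annotations:
    dependency-watchdog.gardener.cloud/ignore-scaling: "true"   # the probe will neither scale this resource up nor down

The same effect can be achieved on a live resource with kubectl annotate deployment kube-controller-manager dependency-watchdog.gardener.cloud/ignore-scaling=true.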

Weeder

The dependency watchdog weeder command, just like the prober command, takes command line flags which are meant to fine-tune the weeder. In addition, a ConfigMap is also mounted to the container which defines the dependency of pods on endpoints.

Command Line Arguments

Weeder can be configured with the same flags as the prober, described under the command line arguments section above. You can find an example weeder deployment YAML to see how these command line args are configured.

Weeder Configuration

The weeder configuration is mounted as a ConfigMap into the container. The path to the config file is set via the config-file command line argument as mentioned above. Weeder will start one go routine per podSelector per endpoint on an endpoint event, as described in the weeder internal concepts.

You can view the example YAML configuration provided as data in a ConfigMap here.

| Name | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| watchDuration | *metav1.Duration | No | 5m0s | The time duration for which a watch is kept on dependent pods to see if any of them turn to CrashLoopBackOff. |
| servicesAndDependantSelectors | map[string]DependantSelectors | Yes | NA | Endpoint name and its corresponding dependent pods. More info below. |

DependantSelectors

If a service recovers from downtime, then the weeder starts to watch for pods turning to CrashLoopBackOff. These pods are identified by the information stored in this property.

| Name | Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| podSelectors | []*metav1.LabelSelector | Yes | NA | A list of label selectors used to identify the dependent pods to watch. |
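
A sketch of a weeder configuration is shown below; the endpoint name and the pod labels are illustrative assumptions, so refer to the example ConfigMap linked above for real values:

watchDuration: 5m0s
servicesAndDependantSelectors:
  kube-apiserver:                      # name of the Endpoint whose recovery triggers the weeder (assumed)
    podSelectors:
      - matchExpressions:
          - key: gardener.cloud/role   # assumed label on the dependent pods
            operator: In
            values:
              - controlplane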