Operator Alerts

| Alertname | Severity | Type | Description |
|---|---|---|---|
| ApiServerUnreachableViaKubernetesService | critical | shoot | The API server has been unreachable for 3 minutes via the kubernetes service in the shoot. |
| CoreDNSDown | critical | shoot | CoreDNS could not be found. Cluster DNS resolution will not work. |
| ApiServerNotReachable | blocker | seed | API server not reachable via external endpoint: {{ $labels.instance }}. |
| KubeApiserverDown | blocker | seed | All API server replicas are down/unreachable, or no API server could be found. |
| KubeApiServerTooManyAuditlogFailures | critical | seed | The API servers' cumulative failure rate in logging audit events is {{ printf "%0.2f" $value }}%. This may be caused by unavailable/unreachable AuditSink(s) and/or improper API server audit configuration. |
| KubeControllerManagerDown | critical | seed | Deployments and replication controllers are not making progress. |
| KubeEtcdMainDown | blocker | seed | Etcd3 cluster main is unavailable or cannot be scraped. As long as etcd3 main is down, the cluster is unreachable. |
| KubeEtcdEventsDown | critical | seed | Etcd3 cluster events is unavailable or cannot be scraped. Cluster events cannot be collected. |
| KubeEtcd3MainNoLeader | critical | seed | Etcd3 main has no leader. No communication with etcd main is possible. The API server is read-only. |
| KubeEtcd3EventsNoLeader | critical | seed | Etcd3 events has no leader. No communication with etcd events is possible. New cluster events cannot be collected. Events can only be read. |
| KubeEtcd3HighNumberOfFailedProposals | warning | seed | Etcd3 pod {{ $labels.pod }} has seen {{ $value }} proposal failures within the last hour. |
| KubeEtcd3DbSizeLimitApproaching | warning | seed | Etcd3 {{ $labels.role }} DB size is approaching its current practical limit of 2GB. |
| KubeEtcd3DbSizeLimitCrossed | critical | seed | Etcd3 {{ $labels.role }} DB size has crossed its current practical limit of 2GB. Etcd might now require more memory to continue serving traffic with low latency, and might face request throttling. |
| KubeEtcdDeltaBackupFailed | critical | seed | No delta snapshot has been taken for at least the past 30 minutes. |
| KubeEtcdFullBackupFailed | critical | seed | No full snapshot taken in the past day. |
| KubeEtcdRestorationFailed | critical | seed | Etcd data restoration was triggered, but has failed. |
| KubeletTooManyOpenFileDescriptorsSeed | critical | seed | Seed-kubelet ({{ $labels.kubernetes_io_hostname }}) is using {{ $value }}% of the available file/socket descriptors. Kubelet could be under heavy load. |
| KubePersistentVolumeUsageCritical | critical | seed | The PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} is only {{ printf "%0.2f" $value }}% free. |
| KubePersistentVolumeFullInFourDays | warning | seed | Based on recent sampling, the PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} is expected to fill up within four days. Currently {{ printf "%0.2f" $value }}% is available. |
| KubePodPendingControlPlane | warning | seed | Pod {{ $labels.pod }} has been stuck in the "Pending" state for more than 30 minutes. |
| KubePodNotReadyControlPlane | warning | | Pod {{ $labels.pod }} has not been ready for more than 30 minutes. |
| KubeSchedulerDown | critical | seed | New pods are not being assigned to nodes. |
| KubeStateMetricsShootDown | info | seed | There are no running kube-state-metrics pods for the shoot cluster. No Kubernetes resource metrics can be scraped. |
| KubeStateMetricsSeedDown | critical | seed | There are no running kube-state-metrics pods for the seed cluster. No Kubernetes resource metrics can be scraped. |
| NoWorkerNodes | blocker | | There are no worker nodes in the cluster or all of the worker nodes in the cluster are not schedulable. |
| PrometheusCantScrape | warning | seed | Prometheus failed to scrape metrics. Instance {{ $labels.instance }}, job {{ $labels.job }}. |
| PrometheusConfigurationFailure | warning | seed | Latest Prometheus configuration is broken and Prometheus is using the previous one. |
| VPNShootNoPods | critical | shoot | The vpn-shoot deployment in the Shoot cluster has 0 available pods. VPN won't work. |
| VPNProbeAPIServerProxyFailed | critical | shoot | The API Server proxy functionality is not working. The VPN connection from an API Server pod to the vpn-shoot endpoint on the Shoot workers probably does not work. |
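
When triaging, it can help to slice the alert list above by severity and type. The following is a minimal sketch of that idea, not part of any official tooling: the `Alert` record and the `select` and `names_by_severity` helpers are hypothetical names introduced for illustration, and only a handful of entries are copied from the list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    """Hypothetical record for one entry of the alert list."""
    name: str
    severity: str               # "blocker", "critical", "warning", or "info"
    alert_type: Optional[str]   # "seed", "shoot", or None where the list gives no type


# A few entries copied from the alert list; this is not the full list.
ALERTS = [
    Alert("ApiServerNotReachable", "blocker", "seed"),
    Alert("KubeApiserverDown", "blocker", "seed"),
    Alert("KubeEtcdMainDown", "blocker", "seed"),
    Alert("NoWorkerNodes", "blocker", None),
    Alert("CoreDNSDown", "critical", "shoot"),
    Alert("VPNShootNoPods", "critical", "shoot"),
    Alert("KubeEtcd3DbSizeLimitApproaching", "warning", "seed"),
    Alert("KubeStateMetricsShootDown", "info", "seed"),
]


def select(alerts, severity=None, alert_type=None):
    """Return alerts matching the given severity and/or type.

    A filter left as None is not applied.
    """
    return [
        a for a in alerts
        if (severity is None or a.severity == severity)
        and (alert_type is None or a.alert_type == alert_type)
    ]


def names_by_severity(alerts):
    """Group alert names by severity, preserving the input order."""
    grouped = defaultdict(list)
    for a in alerts:
        grouped[a.severity].append(a.name)
    return dict(grouped)


if __name__ == "__main__":
    for alert in select(ALERTS, severity="blocker", alert_type="seed"):
        print(alert.name)
```

For example, `select(ALERTS, severity="critical", alert_type="shoot")` returns the `CoreDNSDown` and `VPNShootNoPods` entries from this sample.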