User Alerts

| Alert name | Severity | Type | Description |
|---|---|---|---|
| ApiServerUnreachableViaKubernetesService | critical | shoot | The API server has been unreachable for 3 minutes via the kubernetes service in the shoot. |
| CoreDNSDown | critical | shoot | CoreDNS could not be found. Cluster DNS resolution will not work. |
| ApiServerNotReachable | blocker | seed | API server not reachable via external endpoint: {{ $labels.instance }}. |
| KubeApiServerLatency | warning | seed | Kube API server latency for verb {{ $labels.verb }} is high. This could be because the shoot workers and the control plane are in different regions. The 99th percentile of request latency is greater than 3 seconds. |
| KubeApiServerTooManyOpenFileDescriptors | warning | seed | The API server ({{ $labels.instance }}) is using {{ $value }}% of the available file/socket descriptors. |
| KubeApiServerTooManyOpenFileDescriptors | critical | seed | The API server ({{ $labels.instance }}) is using {{ $value }}% of the available file/socket descriptors. |
| KubeControllerManagerDown | critical | seed | Deployments and replication controllers are not making progress. |
| KubeEtcd3DbSizeLimitApproaching | warning | seed | Etcd3 {{ $labels.role }} DB size is approaching its current practical limit of 2GB. |
| KubeEtcd3DbSizeLimitCrossed | critical | seed | Etcd3 {{ $labels.role }} DB size has crossed its current practical limit of 2GB. Etcd might now require more memory to continue serving traffic with low latency, and might face request throttling. |
| KubeKubeletNodeDown | warning | shoot | The kubelet {{ $labels.instance }} has been unavailable/unreachable for more than 1 hour. Workloads on the affected node may not be schedulable. |
| KubeKubeletTooManyPods | warning | | Kubelet {{ $labels.instance }} is running {{ $value }} pods, close to the limit of 110. |
| KubeletTooManyOpenFileDescriptorsShoot | warning | shoot | Shoot-kubelet ({{ $labels.kubernetes_io_hostname }}) is using {{ $value }}% of the available file/socket descriptors. Kubelet could be under heavy load. |
| KubeletTooManyOpenFileDescriptorsShoot | critical | shoot | Shoot-kubelet ({{ $labels.kubernetes_io_hostname }}) is using {{ $value }}% of the available file/socket descriptors. Kubelet could be under heavy load. |
| KubePodPendingShoot | warning | shoot | Pod {{ $labels.pod }} has been stuck in the "Pending" state for more than 1 hour. |
| KubePodNotReadyShoot | warning | shoot | Pod {{ $labels.pod }} has not been ready for more than 1 hour. |
| KubeSchedulerDown | critical | seed | New pods are not being assigned to nodes. |
| NoWorkerNodes | blocker | | There are no worker nodes in the cluster, or none of the worker nodes in the cluster are schedulable. |
| NodeExporterDown | warning | shoot | The NodeExporter has been down or unreachable from Prometheus for more than 1 hour. |
| K8SNodeOutOfDisk | critical | shoot | Node {{ $labels.node }} has run out of disk space. |
| K8SNodeMemoryPressure | warning | shoot | Node {{ $labels.node }} is under memory pressure. |
| K8SNodeDiskPressure | warning | shoot | Node {{ $labels.node }} is under disk pressure. |
| VMRootfsFull | critical | shoot | Root filesystem device on instance {{ $labels.instance }} is almost full. |
| VMConntrackTableFull | critical | shoot | The nf_conntrack table is {{ $value }}% full. |
| VPNProbeAPIServerProxyFailed | critical | shoot | The API server proxy functionality is not working. The VPN connection from an API server pod to the vpn-shoot endpoint on the shoot workers is probably not working. |
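
Several descriptions above contain placeholders such as `{{ $labels.instance }}` and `{{ $value }}`, which are filled in with the labels and the measured value of the firing alert. The sketch below only imitates that substitution with a regular expression so the resulting message is easier to picture; it is not how the monitoring stack actually renders templates, and the function name `render_description`, the label set, and the sample value are all made up for illustration.

```python
import re

# Matches the two placeholder forms used in the table:
# {{ $labels.<name> }} and {{ $value }}.
PLACEHOLDER = re.compile(r"\{\{\s*\$(labels\.(\w+)|value)\s*\}\}")


def render_description(template, labels, value):
    """Fill in the placeholders of an alert description (illustrative only)."""

    def substitute(match):
        label_name = match.group(2)
        if label_name is None:
            # The placeholder was {{ $value }}.
            return f"{value:g}"
        # Assumption of this sketch: unknown labels become an empty string.
        return labels.get(label_name, "")

    return PLACEHOLDER.sub(substitute, template)


description = (
    "The API server ({{ $labels.instance }}) is using {{ $value }}% "
    "of the available file/socket descriptors."
)

# Hypothetical label set and sample value, invented for this example.
print(render_description(description, {"instance": "10.0.0.1:443"}, 85))
```

With the hypothetical inputs shown, the placeholders are replaced by `10.0.0.1:443` and `85`, giving a message of the kind a recipient of the alert would read.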