User Alerts

| Alertname | Severity | Type | Description |
|---|---|---|---|
| ApiServerUnreachableViaKubernetesService | critical | shoot | The API server has been unreachable for 3 minutes via the kubernetes service in the shoot. |
| CoreDNSDown | critical | shoot | CoreDNS could not be found. Cluster DNS resolution will not work. |
| ApiServerNotReachable | blocker | seed | API server not reachable via external endpoint: {{ $labels.instance }}. |
| KubeApiServerTooManyOpenFileDescriptors | warning | seed | The API server ({{ $labels.instance }}) is using {{ $value }}% of the available file/socket descriptors. |
| KubeApiServerTooManyOpenFileDescriptors | critical | seed | The API server ({{ $labels.instance }}) is using {{ $value }}% of the available file/socket descriptors. |
| KubeApiServerLatency | warning | seed | Kube API server latency for verb {{ $labels.verb }} is high. This could be because the shoot workers and the control plane are in different regions. 99th percentile of request latency is greater than 3 seconds. |
| KubeControllerManagerDown | critical | seed | Deployments and replication controllers are not making progress. |
| KubeEtcd3DbSizeLimitApproaching | warning | seed | Etcd3 {{ $labels.role }} DB size is approaching its current practical limit of 8GB. Etcd quota might need to be increased. |
| KubeEtcd3DbSizeLimitCrossed | critical | seed | Etcd3 {{ $labels.role }} DB size has crossed its current practical limit of 8GB. Etcd quota must be increased to allow updates. |
| KubeKubeletNodeDown | warning | shoot | The kubelet {{ $labels.instance }} has been unavailable/unreachable for more than 1 hour. Workloads on the affected node may not be schedulable. |
| KubeletTooManyOpenFileDescriptorsShoot | warning | shoot | Shoot-kubelet ({{ $labels.kubernetes_io_hostname }}) is using {{ $value }}% of the available file/socket descriptors. Kubelet could be under heavy load. |
| KubeletTooManyOpenFileDescriptorsShoot | critical | shoot | Shoot-kubelet ({{ $labels.kubernetes_io_hostname }}) is using {{ $value }}% of the available file/socket descriptors. Kubelet could be under heavy load. |
| KubePodPendingShoot | warning | shoot | Pod {{ $labels.pod }} is stuck in "Pending" state for more than 1 hour. |
| KubePodNotReadyShoot | warning | shoot | Pod {{ $labels.pod }} is not ready for more than 1 hour. |
| NoWorkerNodes | blocker | | There are no worker nodes in the cluster or all of the worker nodes in the cluster are not schedulable. |
| NodeExporterDown | warning | shoot | The NodeExporter has been down or unreachable from Prometheus for more than 1 hour. |
| K8SNodeOutOfDisk | critical | shoot | Node {{ $labels.node }} has run out of disk space. |
| K8SNodeMemoryPressure | warning | shoot | Node {{ $labels.node }} is under memory pressure. |
| K8SNodeDiskPressure | warning | shoot | Node {{ $labels.node }} is under disk pressure. |
| VMRootfsFull | critical | shoot | Root filesystem device on instance {{ $labels.instance }} is almost full. |
| VMConntrackTableFull | critical | shoot | The nf_conntrack table is {{ $value }}% full. |
| VPNProbeAPIServerProxyFailed | critical | shoot | The API Server proxy functionality is not working. Probably the vpn connection from an API Server pod to the vpn-shoot endpoint on the Shoot workers does not work. |
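
Several descriptions above contain placeholders such as `{{ $labels.instance }}` and `{{ $value }}`, which are filled in with the labels and value of the firing alert. The following Python sketch illustrates that substitution. It is a simplified stand-in, not the actual templating engine: the function name `render_description`, the sample label and value, and the choice to render unknown labels as empty strings are all assumptions made for illustration.

```python
import re

# Matches only the two placeholder forms used in the table:
# "{{ $value }}" and "{{ $labels.<name> }}".
PLACEHOLDER = re.compile(r"\{\{\s*\$(value|labels\.(\w+))\s*\}\}")


def render_description(template, labels, value=""):
    """Fill in $labels.<name> and $value (hypothetical helper).

    Unknown labels render as empty strings; that is a choice of this
    sketch, not a statement about the real templating engine.
    """
    def substitute(match):
        if match.group(1) == "value":
            return str(value)
        return labels.get(match.group(2), "")

    return PLACEHOLDER.sub(substitute, template)


example = render_description(
    "The API server ({{ $labels.instance }}) is using {{ $value }}% "
    "of the available file/socket descriptors.",
    labels={"instance": "10.0.0.1:443"},  # hypothetical label value
    value=85,  # hypothetical metric value
)
print(example)
```

With the sample inputs, the rendered text reads "The API server (10.0.0.1:443) is using 85% of the available file/socket descriptors."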