In this blog post, we will explore how to set up an OpenTelemetry-based observability stack on a Gardener shoot cluster. OpenTelemetry is an open-source observability framework that provides a set of APIs, SDKs, agents, and instrumentation to collect telemetry data from applications and systems. It provides a unified approach for collecting, processing, and exporting telemetry data such as traces, metrics, and logs. In addition, it gives flexibility in designing observability stacks, helping avoid vendor lock-in and allowing users to choose the most suitable tools for their use cases.
Here we will focus on setting up OpenTelemetry for a Gardener shoot cluster, collecting both logs and metrics and exporting them to various backends. We will use the OpenTelemetry Operator to simplify the deployment and management of OpenTelemetry collectors on Kubernetes and demonstrate some best practices for configuration including security and performance considerations.
To follow along with this guide, you will need:
- kubectl configured to access the cluster.
- shoot-cert-service enabled on the shoot cluster, to manage TLS certificates for the OpenTelemetry collectors and backends.
Here we use a self-managed mTLS architecture for illustration purposes. In a production environment, you would typically use a managed certificate authority (CA) or a service mesh to handle mTLS certificates and encryption. However, there might be cases where you want flexibility in the authentication and authorization mechanisms, for example by leveraging Kubernetes RBAC to determine whether a service is authorized to connect to a backend or not. In our illustration, we will use a kube-rbac-proxy as a sidecar to the backends to enforce mTLS authentication and authorization. The kube-rbac-proxy is a reverse proxy that uses Kubernetes RBAC to control access to services, allowing us to define fine-grained access control policies.
The kube-rbac-proxy extracts the identity of the client (the OpenTelemetry collector) from the CommonName (CN) field of the TLS certificate and uses it to perform authorization checks against the Kubernetes API server. This enables fine-grained access control policies based on client identity, ensuring that only authorized clients can connect to the backends.
First, set up an Issuer in the Gardener shoot cluster, allowing you to later issue and manage TLS certificates for the OpenTelemetry collectors and the backends. To allow a custom issuer, the shoot cluster must be configured with the shoot-cert-service extension.
kind: Shoot
apiVersion: core.gardener.cloud/v1beta1
metadata:
  name: my-shoot
  namespace: my-project
  ...
spec:
  extensions:
    - type: shoot-cert-service
      providerConfig:
        apiVersion: service.cert.extensions.gardener.cloud/v1alpha1
        kind: CertConfig
        shootIssuers:
          enabled: true
  ...
Once the shoot is reconciled, the Issuer.cert.gardener.cloud
resources will be available.
We can use openssl
to create a self-signed CA certificate that will be used to sign the TLS certificates for the OpenTelemetry Collector and backends.
openssl genrsa -out ./ca.key 4096
openssl req -x509 -new -nodes -key ./ca.key -sha256 -days 365 -out ./ca.crt -subj "/CN=ca"
# Create the certs namespace
kubectl create namespace certs \
--dry-run=client -o yaml | kubectl apply -f -
# Create the CA secret in the certs namespace
kubectl create secret tls ca --namespace certs \
--key=./ca.key --cert=./ca.crt \
--dry-run=client -o yaml | kubectl apply -f -
Next, we will create the Issuer resource in the certs namespace, referencing the CA secret we just created.
apiVersion: cert.gardener.cloud/v1alpha1
kind: Issuer
metadata:
  name: issuer-selfsigned
  namespace: certs
spec:
  ca:
    privateKeySecretRef:
      name: ca
      namespace: certs
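A quick way to verify the setup (assuming the Issuer manifest above is saved as issuer.yaml) is to apply it and inspect the status reported by the cert-management controller; the Issuer should eventually be marked as ready:
kubectl apply -f issuer.yaml
# Inspect the Issuer and its status
kubectl -n certs get issuers.cert.gardener.cloud issuer-selfsigned -o yaml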
Later, we can create Certificate
resources to securely connect the OpenTelemetry collectors to the backends.
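For example, a client certificate for one of the collectors could look like the following sketch (names and namespaces are illustrative; the commonName is the identity that kube-rbac-proxy will later check against RBAC):
apiVersion: cert.gardener.cloud/v1alpha1
kind: Certificate
metadata:
  name: otel-client              # illustrative name
  namespace: observability       # assumed namespace of the collector
spec:
  commonName: client             # identity presented to kube-rbac-proxy
  issuerRef:
    name: issuer-selfsigned
    namespace: certs
  secretRef:
    name: otel-client-tls        # secret mounted into the collector pod
    namespace: observability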
To deploy the OpenTelemetry Operator on your Gardener shoot cluster, we can use the project helm chart with a minimal configuration. The important part is to set the collector image to the latest contrib distribution image, which determines the set of receiver, processor, and exporter plugins that will be available in the OpenTelemetry collector instances. There are several pre-built distributions available, such as otelcol, otelcol-contrib, otelcol-k8s, otelcol-otlp, and otelcol-ebpf-profiler. For the purpose of this guide, we will use the otelcol-contrib distribution, which includes a wide range of plugins for various backends and data sources.
manager:
  collectorImage:
    repository: "otel/opentelemetry-collector-contrib"
Setting up the backends is a straightforward process. We will use plain resource manifests for illustration purposes, outlining the important parts that allow the OpenTelemetry collectors to connect securely to the backends using mTLS. A key step is enabling the respective OTLP ingestion endpoints on the backends, which will be used by the OpenTelemetry collectors to send telemetry data. In a production environment, the lifecycle of the backends will probably be managed by the respective components' operators.
Here is the complete list of manifests for deploying a single Prometheus instance with the OTLP ingestion endpoint and a kube-rbac-proxy sidecar for mTLS authentication and authorization:
Prometheus Certificate
That is the serving certificate of the kube-rbac-proxy
sidecar. The OpenTelemetry collector needs to trust the signing CA, hence we use the same Issuer
we created earlier.
Prometheus
The Prometheus instance needs to be configured to enable the OTLP ingestion endpoint with the --web.enable-otlp-receiver flag. That allows the OpenTelemetry collector to push metrics to the Prometheus instance (via the kube-rbac-proxy sidecar).
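A minimal sketch of the relevant part of the Prometheus container spec could look like this (image tag and listen address are illustrative; binding to localhost ensures that traffic only enters through the kube-rbac-proxy sidecar):
containers:
  - name: prometheus
    image: quay.io/prometheus/prometheus:v3.1.0   # illustrative version
    args:
      - --config.file=/etc/prometheus/prometheus.yaml
      - --web.enable-otlp-receiver              # enable the OTLP ingestion endpoint
      - --web.listen-address=127.0.0.1:9090     # reachable only through the sidecar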
Prometheus Configuration
In Prometheus’ case, the OpenTelemetry resource attributes usually set by the collectors can be used to determine labels for the metrics. This is illustrated in the collector’s prometheus receiver configuration. A common and unified set of labels across all metrics collected by the OpenTelemetry collector is a fundamental requirement for sharing and understanding the data across different teams and systems. This common set is defined by the OpenTelemetry Semantic Conventions specification. For example, k8s.pod.name, k8s.namespace.name, k8s.node.name, etc. are some of the common labels that can be used to identify the source of the observability signals. Those are also common across the different types of telemetry data (traces, metrics, logs), serving correlation and analysis use cases.
mTLS Proxy rbac
This example defines a Role allowing requests to the Prometheus backend to pass the kube-rbac-proxy.
rules:
  - apiGroups: ["authorization.kubernetes.io"]
    resources:
      - observabilityapps/prometheus
    verbs: ["get", "create"] # GET, POST
In this example, we allow GET and POST requests to reach the Prometheus upstream service if the request is authenticated with a valid mTLS certificate and the identified user is allowed to access the Prometheus service by the corresponding RoleBinding. PATCH and DELETE requests are not allowed. The mapping between the HTTP request methods and the Kubernetes RBAC verbs can be seen in kube-rbac-proxy/proxy.go.
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: client
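A complete RoleBinding tying this subject to the Role above could look like the following sketch (names and namespace are illustrative):
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-otlp-ingest      # illustrative name
  namespace: observability          # assumed namespace of the Prometheus deployment
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-otlp-ingest      # the Role carrying the rules shown above
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: client                    # must match the CN of the collector's client certificate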
mTLS Proxy resource-attributes
The kube-rbac-proxy creates a Kubernetes SubjectAccessReview to determine whether the request is allowed to pass. The SubjectAccessReview is created with the resourceAttributes set to the upstream service, in this case the Prometheus service.
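For reference, a kube-rbac-proxy configuration producing resourceAttributes that match the Role above could look roughly like this sketch (the namespace is an assumption; how the file is wired into the sidecar depends on your deployment):
authorization:
  resourceAttributes:
    apiGroup: authorization.kubernetes.io
    resource: observabilityapps
    subresource: prometheus
    namespace: observability   # assumed namespace of the Prometheus deployment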
In our example, we will use victoria-logs as the logs backend. victoria-logs is a high-performance, cost-effective, and scalable log management solution. It is designed to work seamlessly with Kubernetes and provides powerful querying capabilities. It is important to note that any OTLP-compatible backend can be used as a logs backend, allowing flexibility in choosing the best tool for your concrete needs.
Here are the complete manifests for deploying a single victoria-logs instance with the OTLP ingestion endpoint enabled and a kube-rbac-proxy sidecar for mTLS authentication and authorization, using the upstream helm chart:
VictoriaLogs Certificate
That is the serving certificate of the kube-rbac-proxy sidecar. The OpenTelemetry collector needs to trust the signing CA, hence we use the same Issuer we created earlier.
VictoriaLogs
The OTLP logs ingestion endpoint is exposed through the kube-rbac-proxy sidecar.
mTLS Proxy rbac
This Role allows requests to the VictoriaLogs backend to pass the kube-rbac-proxy.
By now we shall have working Prometheus and victoria-logs backends, both secured with mTLS and ready to accept telemetry data from the OpenTelemetry collectors.
We are going to deploy two OpenTelemetry collectors: k8s-events
and shoot-metrics
. Both collectors will emit their own telemetry data in addition to the data collected from the respective receivers.
In this example, we use 2 receivers:
Here is an example of Kubernetes events persisted in the victoria-logs backend. The query filters logs representing events from the kube-system namespace related to a rollout restart of the target StatefulSet, and then formats the result in the UI to show the event reason and object name.
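As an illustration, the logs pipeline of the k8s-events collector could be sketched as follows (the k8s_events receiver settings, the VICTORIA_LOGS_URL variable, and the ingestion path are assumptions for illustration, not taken from the actual manifests):
receivers:
  k8s_events:
    auth_type: serviceAccount   # authenticate with the collector's service account
exporters:
  otlphttp/victoria-logs:
    endpoint: "${env:VICTORIA_LOGS_URL}/insert/opentelemetry"   # assumed OTLP ingestion path
    tls:
      ca_file: /etc/cert/ca.crt
      cert_file: /etc/cert/tls.crt
      key_file: /etc/cert/tls.key
service:
  pipelines:
    logs:
      receivers: [k8s_events]
      processors: [memory_limiter, batch]   # defined in the processors section below
      exporters: [otlphttp/victoria-logs]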
The collector features a few important configurations related to reliability and performance. The collected metric points are sent in batches to the Prometheus backend using the corresponding OTLP exporter, and the memory consumption of the collector is also limited. In general, it is always a good practice to set a memory limiter and batch processing in the collector pipeline.
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 2
  batch:
    timeout: 5s
    send_batch_size: 1000
Emitting the collector’s own telemetry data is configured in the service section of the collector configuration.
service:
  # Configure the collector's own telemetry
  telemetry:
    # Emit collector logs to stdout; you can also push them to a backend.
    logs:
      level: info
      encoding: console
      output_paths: [stdout]
      error_output_paths: [stderr]
    # Push collector internal metrics to Prometheus
    metrics:
      level: detailed
      readers:
        # Push metrics to the Prometheus backend
        - periodic:
            interval: 30000
            timeout: 10000
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: "${env:PROMETHEUS_URL}/api/v1/otlp/v1/metrics"
                insecure: false # Ensure the server certificate is validated against the CA
                certificate: /etc/cert/ca.crt
                client_certificate: /etc/cert/tls.crt
                client_key: /etc/cert/tls.key
The majority of the samples use a prometheus receiver to scrape the collector metrics endpoint. However, that is not a clean solution because it pushes the metrics through the collector pipeline, consuming resources and potentially causing performance issues. Instead, we use the periodic reader to push the metrics directly to the Prometheus backend.
Since the k8s-events collector obtains telemetry data from the kube-apiserver, it requires a corresponding set of permissions, defined in the k8s-events rbac manifests.
In this example, we have a single receiver: the prometheus receiver, scraping the kubelet system service metrics. This receiver accepts standard Prometheus scrape configurations using kubernetes_sd_configs to discover the targets dynamically. The kubernetes_sd_configs allows the receiver to discover Kubernetes resources such as pods, nodes, and services, and scrape their metrics endpoints.
Here, the example illustrates the prometheus receiver scraping metrics from the kubelet service, adding the node’s Kubernetes labels as labels to the scraped metrics and filtering the metrics to keep only the relevant ones. Since the kubelet metrics endpoint is secured, the corresponding bearer token needs to be provided in the scrape configuration. The bearer token is automatically mounted in the pod by Kubernetes, allowing the OpenTelemetry collector to authenticate with the kubelet service.
- job_name: shoot-kube-kubelet
  honor_labels: false
  scheme: https
  tls_config:
    insecure_skip_verify: true
  metrics_path: /metrics
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - source_labels:
        - job
      target_label: __tmp_prometheus_job_name
    - target_label: job
      replacement: kube-kubelet
      action: replace
    - target_label: type
      replacement: shoot
      action: replace
    - source_labels:
        - __meta_kubernetes_node_address_InternalIP
      target_label: instance
      action: replace
    - regex: __meta_kubernetes_node_label_(.+)
      action: labelmap
      replacement: "k8s_node_label_$${1}"
  metric_relabel_configs:
    - source_labels:
        - __name__
      regex: ^(kubelet_running_pods|process_max_fds|process_open_fds|kubelet_volume_stats_available_bytes|kubelet_volume_stats_capacity_bytes|kubelet_volume_stats_used_bytes|kubelet_image_pull_duration_seconds_bucket|kubelet_image_pull_duration_seconds_sum|kubelet_image_pull_duration_seconds_count)$
      action: keep
    - source_labels:
        - namespace
      regex: (^$|^kube-system$)
      action: keep
The collector also illustrates collecting metrics from cadvisor
endpoints and Gardener specific exporters such as
shoot-apiserver-proxy
, shoot-coredns
, etc. The exporters usually reside in the kube-system
namespace and are configured to expose metrics on a specific port.
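For example, a scrape job for the cadvisor endpoint can reuse the kubelet job pattern shown above, only pointing at the cadvisor metrics path (the job name and metric filter are illustrative):
- job_name: shoot-kube-kubelet-cadvisor
  scheme: https
  tls_config:
    insecure_skip_verify: true
  metrics_path: /metrics/cadvisor   # cadvisor metrics are served by the kubelet
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  metric_relabel_configs:
    - source_labels: [__name__]
      regex: ^(container_network_transmit_bytes_total|container_network_receive_bytes_total)$
      action: keep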
Since we aim at a unified set of resource attributes across all telemetry data, we can translate exporter metrics which do not conform to the OpenTelemetry conventions.
Here is an example of translating the metrics produced by the kubelet to the OpenTelemetry conventions using the transform/metrics processor:
# Convert Prometheus label names to OpenTelemetry attribute names
transform/metrics:
  error_mode: ignore
  metric_statements:
    - context: datapoint
      statements:
        - set(attributes["k8s.container.name"], attributes["container"]) where attributes["container"] != nil
        - delete_key(attributes, "container") where attributes["container"] != nil
        - set(attributes["k8s.pod.name"], attributes["pod"]) where attributes["pod"] != nil
        - delete_key(attributes, "pod") where attributes["pod"] != nil
        - set(attributes["k8s.namespace.name"], attributes["namespace"]) where attributes["namespace"] != nil
        - delete_key(attributes, "namespace") where attributes["namespace"] != nil
Here is a visualization of the container_network_transmit_bytes_total metric collected from the cadvisor endpoint of the kubelet service, showing the network traffic in bytes transmitted by the vpn-shoot containers.
Similar to the k8s-events collector, the shoot-metrics collector also emits its own telemetry data, including metrics and logs. The collector is configured to push its own metrics to the Prometheus backend using the periodic reader, avoiding the need for a separate Prometheus scrape configuration. It requires a corresponding set of permissions, defined in the shoot-metrics rbac manifest.
In this blog post, we have explored how to set up an OpenTelemetry-based observability stack on a Gardener shoot cluster. We have demonstrated how to deploy the OpenTelemetry Operator, configure the backends (Prometheus and victoria-logs), and deploy OpenTelemetry collectors to obtain telemetry data from the cluster. We have also discussed best practices for configuration, including security and performance considerations. We have shown how a unified set of resource attributes can be used to identify the source of the telemetry data, enabling correlation and analysis across different teams and systems, and how to transform metric labels which do not conform to the OpenTelemetry conventions, achieving a unified set of labels across all telemetry data. Finally, we have illustrated how to securely connect the OpenTelemetry collectors to the backends using mTLS and kube-rbac-proxy for authentication and authorization.
We hope this guide will inspire you to get started with OpenTelemetry on a Gardener managed shoot cluster and equip you with ideas and best practices for building a powerful observability stack that meets your needs. For more information, please refer to the OpenTelemetry documentation and the Gardener documentation.