This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Node Audit Logging

Gardener extension controller which configures the rsyslog and auditd services installed on shoot nodes.

Gardener Extension to configure rsyslog with relp module

REUSE status CI Build status Go Report Card

Gardener extension controller which configures the rsyslog and auditd services installed on shoot nodes.

Usage

Local Setup and Development

1 - Configuration

Configuring the Rsyslog Relp Extension

Introduction

As a cluster owner, you might need audit logs on a Shoot node level. With these audit logs you can track actions on your nodes like privilege escalation, file integrity, process executions, and who is the user that performed these actions. Such information is essential for the security of your Shoot cluster. Linux operating systems collect such logs via the auditd and journald daemons. However, these logs can be lost if they are only kept locally on the operating system. You need a reliable way to send them to a remote server where they can be stored for longer time periods and retrieved when necessary.

Rsyslog offers a solution for that. It gathers and processes logs from auditd and journald and then forwards them to a remote server. Moreover, rsyslog can make use of the RELP protocol so that logs are sent reliably and no messages are lost.

The shoot-rsyslog-relp extension is used to configure rsyslog on each Shoot node so that the following can take place:

  1. Rsyslog reads logs from the auditd and journald sockets.
  2. The logs are filtered based on the program name and syslog severity of the message.
  3. The logs are enriched with metadata containing the name of the Project in which the Shoot is created, the name of the Shoot, the UID of the Shoot, and the hostname of the node on which the log event occurred.
  4. The enriched logs are sent to the target remote server via the RELP protocol.

The following graph shows a rough outline of how that looks in a Shoot cluster: rsyslog-logging-architecture

Shoot Configuration

The extension is not globally enabled and must be configured per Shoot cluster. The Shoot specification has to be adapted to include the shoot-rsyslog-relp extension configuration, which specifies the target server to which logs are forwarded, its port, and some optional rsyslog settings described in the examples below.

Below is an example shoot-rsyslog-relp extension configuration as part of the Shoot spec:

kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
...
spec:
  extensions:
  - type: shoot-rsyslog-relp
    providerConfig:
      apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
      kind: RsyslogRelpConfig
      # Set the target server to which logs are sent. The server must support the RELP protocol.
      target: some.rsyslog-rlep.server
      # Set the port of the target server.
      port: 10250
      # Define rules to select logs from which programs and with what syslog severity
      # are forwarded to the target server.
      loggingRules:
      - severity: 4
        programNames: ["kubelet", "audisp-syslog"]
      - severity: 1
        programNames: ["audisp-syslog"]
      # Define an interval of 90 seconds at which the current connection is broken and re-established.
      # By default this value is 0 which means that the connection is never broken and re-established.
      rebindInterval: 90
      # Set the timeout for relp sessions to 90 seconds. If set too low, valid sessions may be considered
      # dead and tried to recover.
      timeout: 90
      # Set how often an action is retried before it is considered to have failed.
      # Failed actions discard log messages. Setting `-1` here means that messages are never discarded.
      resumeRetryCount: -1
      # Configures rsyslog to report continuation of action suspension, e.g. when the connection to the target
      # server is broken.
      reportSuspensionContinuation: true
      # Add tls settings if tls should be used to encrypt the connection to the target server.
      tls:
        enabled: true
        # Use `name` authentication mode for the tls connection.
        authMode: name
        # Only allow connections if the server's name is `some.rsyslog-rlep.server`
        permittedPeer:
        - "some.rsyslog-rlep.server"
        # Reference to the resource which contains certificates used for the tls connection.
        # It must be added to the `.spec.resources` field of the Shoot.
        secretReferenceName: rsyslog-relp-tls
        # Instruct librelp on the Shoot nodes to use the gnutls tls library.
        tlsLib: gnutls
  resources:
    # Add the rsyslog-relp-tls secret in the resources field of the Shoot spec.
    - name: rsyslog-relp-tls
      resourceRef:
        apiVersion: v1
        kind: Secret
        name: rsyslog-relp-tls-v1
...

Choosing Which Log Messages to Send to the Target Server

The .loggingRules field defines rules about which logs should be sent to the target server. When a log is processed by rsyslog, it is compared against the list of rules in order. If the program name and the syslog severity of the log messages matches the rule, the message is forwarded to the target server. The following table describes the syslog severity and their corresponding codes:

Numerical         Severity
  Code

  0               Emergency: system is unusable
  1               Alert: action must be taken immediately
  2               Critical: critical conditions
  3               Error: error conditions
  4               Warning: warning conditions
  5               Notice: normal but significant condition
  6               Informational: informational messages
  7               Debug: debug-level messages

Below is an example with a .loggingRules section that will only forward logs from the kubelet program with syslog severity of 6 or lower and any other program with syslog severity of 2 or lower:

apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
kind: RsyslogRelpConfig
target: localhost
port: 1520
loggingRules:
- severity: 6
  programNames: ["kubelet"]
- severity: 2

You can use a minimal shoot-rsyslog-relp extension configuration to forward all logs to the target server:

apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
kind: RsyslogRelpConfig
target: some.rsyslog-rlep.server
port: 10250
loggingRules:
- severity: 7

Securing the Communication to the Target Server with TLS

The communication to the target server is not encrypted by default. To enable encryption, set the .tls.enabled field in the shoot-rsyslog-relp extension configuration to true. In this case, an immutable secret which contains the TLS certificates used to establish the TLS connection to the server must be created in the same project namespace as your Shoot.

An example Secret is given below:

Note: The secret must be immutable

kind: Secret
apiVersion: v1
metadata:
  name: rsyslog-relp-tls-v1
  namespace: garden-foo
immutable: true
data:
  ca: |
    -----BEGIN BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----    
  crt: |
    -----BEGIN BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----    
  key: |
    -----BEGIN BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----    

The Secret must be referenced in the Shoot’s .spec.resources field and the corresponding resource entry must be referenced in the .tls.secretReferenceName of the shoot-rsyslog-relp extension configuration:

kind: Shoot
metadata:
  name: bar
  namespace: garden-foo
...
spec:
  extensions:
  - type: shoot-rsyslog-relp
    providerConfig:
      apiVersion: rsyslog-relp.extensions.gardener.cloud/v1alpha1
      kind: RsyslogRelpConfig
      target: some.rsyslog-rlep.server
      port: 10250
      loggingRules:
      - severity: 7
      tls:
        enabled: true
        secretReferenceName: rsyslog-relp-tls
  resources:
    - name: rsyslog-relp-tls
      resourceRef:
        apiVersion: v1
        kind: Secret
        name: rsyslog-relp-tls-v1
...

You can set a few additional parameters for the TLS connection: .tls.authMode, tls.permittedPeer, and tls.tlsLib. Refer to the rsyslog documentation for more information on these parameters:

2 - Deploying Rsyslog Relp Extension Remotely

Learn how to set up a development environment using own Seed clusters on an existing Kubernetes cluster

Deploying Rsyslog Relp Extension Remotely

This document will walk you through running the Rsyslog Relp extension controller on a remote seed cluster and the rsyslog relp admission component in your local garden cluster for development purposes. This guide uses Gardener’s setup with provider extensions and builds on top of it.

If you encounter difficulties, please open an issue so that we can make this process easier.

Prerequisites

Setting up the Rsyslog Relp Extension

Important: Make sure that your KUBECONFIG env variable is targeting the local Gardener cluster!

The location of the Gardener project from the Gardener setup is expected to be under the same root as this repository (e.g. ~/go/src/github.com/gardener/). If this is not the case, the location of Gardener project should be specified in GARDENER_REPO_ROOT environment variable:

export GARDENER_REPO_ROOT="<path_to_gardener_project>"

Then you can run:

make remote-extension-up

In case you have added additional Seeds you can specify the seed name:

make remote-extension-up SEED_NAME=<seed-name>

Creating a Shoot Cluster

Once the above step is completed, you can create a Shoot cluster. In order to create a Shoot cluster, please create your own Shoot definition depending on providers on your Seed cluster.

Configuring the Shoot Cluster and deploying the Rsyslog Relp Echo Server

To be able to properly test the rsyslog relp extension you need a running rsyslog relp echo server to which logs from the Shoot nodes can be sent. To deploy the server and configure the rsyslog relp extension on your Shoot cluster you can run:

make configure-shoot SHOOT_NAME=<shoot-name> SHOOT_NAMESPACE=<shoot-namespace>

This command will deploy an rsyslog relp echo server in your Shoot cluster in the rsyslog-relp-echo-server namespace. It will also add configuration for the shoot-rsyslog-relp extension to your Shoot spec by patching it with ./example/extension/<shoot-name>--<shoot-namespace>--extension-config-patch.yaml. This file is automatically copied from extension-config-patch.yaml.tmpl in the same directory when you run make configure-shoot for the first time. The file also includes explanations of the properties you should set or change. The command will also deploy the rsyslog-relp-tls secret in case you wish to enable tls.

Tearing Down the Development Environment

To tear down the development environment, delete the Shoot cluster or disable the shoot-rsyslog-relp extension in the Shoot’s specification. When the extension is not used by the Shoot anymore, you can run:

make remote-extension-down

The make target will delete the ControllerDeployment and ControllerRegistration of the extension, and the shoot-rsyslog-relp admission helm deployment.

3 - Getting Started

Deploying Rsyslog Relp Extension Locally

This document will walk you through running the Rsyslog Relp extension and a fake rsyslog relp service on your local machine for development purposes. This guide uses Gardener’s local development setup and builds on top of it.

If you encounter difficulties, please open an issue so that we can make this process easier.

Prerequisites

  • Make sure that you have a running local Gardener setup. The steps to complete this can be found here.
  • Make sure you are running Gardener version >= 1.74.0 or the latest version of the master branch.

Setting up the Rsyslog Relp Extension

Important: Make sure that your KUBECONFIG env variable is targeting the local Gardener cluster!

make extension-up

This will build the shoot-rsyslog-relp, shoot-rsyslog-relp-admission, and shoot-rsyslog-relp-echo-server images and deploy the needed resources and configurations in the garden cluster. The shoot-rsyslog-relp-echo-server will act as development replacement of a real rsyslog relp server.

Creating a Shoot Cluster

Once the above step is completed, we can deploy and configure a Shoot cluster with default rsyslog relp settings.

kubectl apply -f ./example/shoot.yaml

Once the Shoot’s namespace is created, we can create a networkpolicy that will allow egress traffic from the rsyslog on the Shoot’s nodes to the rsyslog-relp-echo-server that serves as a fake rsyslog target server.

kubectl apply -f ./example/local/allow-machine-to-rsyslog-relp-echo-server-netpol.yaml

Currently, the Shoot’s nodes run Ubuntu, which does not have the rsyslog-relp and auditd packages installed, so the configuration done by the extension has no effect. Once the Shoot is created, we have to manually install the rsyslog-relp and auditd packages:

kubectl -n shoot--local--local exec -it $(kubectl -n shoot--local--local get po -l app=machine,machine-provider=local -o name) -- bash -c "
   apt-get update && \
   apt-get install -y rsyslog-relp auditd && \
   systemctl enable rsyslog.service && \
   systemctl start rsyslog.service"

Once that is done we can verify that log messages are forwarded to the rsyslog-relp-echo-server by checking its logs.

kubectl -n rsyslog-relp-echo-server logs deployment/rsyslog-relp-echo-server

Making Changes to the Rsyslog Relp Extension

Changes to the rsyslog relp extension can be applied to the local environment by repeatedly running the make recipe.

make extension-up

Tearing Down the Development Environment

To tear down the development environment, delete the Shoot cluster or disable the shoot-rsyslog-relp extension in the Shoot’s spec. When the extension is not used by the Shoot anymore, you can run:

make extension-down

This will delete the ControllerRegistration and ControllerDeployment of the extension, the shoot-rsyslog-relp-admission deployment, and the rsyslog-relp-echo-server deployment.

Maintaining the Publicly Available Image for the rsyslog-relp Echo Server

The testmachinery tests use an rsyslog-relp-echo-server image from a publicly available repository. The one which is currently used is eu.gcr.io/gardener-project/gardener/extensions/shoot-rsyslog-relp-echo-server:v0.1.0.

Sometimes it might be necessary to update the image and publish it, e.g. when updating the alpine base image version specified in the repository’s Dokerfile.

To do that:

  1. Bump the version with which the image is built in the Makefile.

  2. Build the shoot-rsyslog-relp-echo-server image:

    make echo-server-docker-image
    
  3. Once the image is built, push it to gcr with:

    make push-echo-server-image
    
  4. Finally, bump the version of the image used by the testmachinery tests here.

  5. Create a PR with the changes.

4 - Monitoring

Monitoring

The shoot-rsyslog-relp extension exposes metrics for the rsyslog service running on a Shoot’s nodes so that they can be easily viewed by cluster owners and operators in the Shoot’s Prometheus and Plutono instances. The exposed monitoring data offers valuable insights into the operation of the rsyslog service and can be used to detect and debug ongoing issues. This guide describes the various metrics, alerts and logs available to cluster owners and operators.

Metrics

Metrics for the rsyslog service originate from its impstats module. These include the number of messages in the various queues, the number of ingested messages, the number of processed messages by configured actions, system resources used by the rsyslog service, and others. More information about them can be found in the impstats documentation and the statistics counter documentation. They are exposed via the node-exporter running on each Shoot node and are scraped by the Shoot’s Prometheus instance.

These metrics can also be viewed in a dedicated dashboard named Rsyslog Stats in the Shoot’s Plutono instance. You can select the node for which you wish the metrics to be displayed from the Node dropdown menu (by default metrics are summed over all nodes).

Following is a list of all exposed rsyslog metrics. The name and origin labels can be used to determine wether the metric is for: a queue, an action, plugins or system stats; the node label can be used to determine the node the metric originates from:

rsyslog_pstat_submitted

Number of messages that were submitted to the rsyslog service from its input. Currently rsyslog uses the /run/systemd/journal/syslog socket as input.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_processed

Number of messages that are successfully processed by an action and sent to the target server.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_failed

Number of messages that could not be processed by an action nor sent to the target server.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_suspended

Total number of times an action suspended itself. Note that this counts the number of times the action transitioned from active to suspended state. The counter is no indication of how long the action was suspended or how often it was retried.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_suspended_duration

The total number of seconds this action was disabled.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_resumed

The total number of times this action resumed itself. A resumption occurs after the action has detected that a failure condition does no longer exist.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_utime

User time used in microseconds.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_stime

System time used in microsends.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_maxrss

Maximum resident set size

  • Type: Gauge
  • Labels: name node origin

rsyslog_pstat_minflt

Total number of minor faults the task has made per second, those which have not required loading a memory page from disk.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_majflt

Total number of major faults the task has made per second, those which have required loading a memory page from disk.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_inblock

Filesystem input operations.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_oublock

Filesystem output operations.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_nvcsw

Voluntary context switches.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_nivcsw

Involuntary context switches.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_openfiles

Number of open files.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_size

Messages currently in queue.

  • Type: Gauge
  • Labels: name node origin

rsyslog_pstat_enqueued

Total messages enqueued.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_full

Times queue was full.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_discarded_full

Messages discarded due to queue being full.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_discarded_nf

Messages discarded when queue not full.

  • Type: Counter
  • Labels: name node origin

rsyslog_pstat_maxqsize

Maximum size queue has reached.

  • Type: Gauge
  • Labels: name node origin

Alerts

There are two alerts defined for the rsyslog service in the Shoot’s Prometheus instance:

RsyslogTooManyRelpActionFailures

This indicates that the cumulative failure rate in processing relp action messages is greater than 2%. In other words, it compares the rate of processed relp action messages to the rate of failed relp action messages and fires an alert when the following expression evaluates to true:

sum(rate(rsyslog_pstat_failed{origin="core.action",name="rsyslg-relp"}[5m])) / sum(rate(rsyslog_pstat_processed{origin="core.action",name="rsyslog-relp"}[5m])) > bool 0.02`

RsyslogRelpActionProcessingRateIsZero

This indicates that no messages are being sent to the upstream rsyslog target by the relp action. An alert is fired when the following expression evaluates to true:

rate(rsyslog_pstat_processed{origin="core.action",name="rsyslog-relp"}[5m]) == 0

Users can subscribe to these alerts by following the Gardener alerting guide.

Logging

There are two ways to view the logs of the rsyslog service running on the Shoot’s nodes - either using the Explore tab of the Shoot’s Plutono instance, or ssh-ing directly to a node.

To view logs in Plutono, navigate to the Explore tab and select vali from the Explore dropdown menu. Afterwards enter the following vali query:

{nodename="<name-of-node>"} |~ "\"unit\":\"rsyslog.service\""

Notice that you cannot use the unit label to filter for the rsyslog.service unit logs. Instead, you have to grep for the service as displayed in the example above.

To view logs when directly ssh-ing to a node in the Shoot cluster, use either of the following commands on the node:

systemctl status rsyslog

journalctl -u rsyslog

5 - Shoot Rsyslog Relp

Developer Docs for Gardener Shoot Rsyslog Relp Extension

This document outlines how Shoot reconciliation and deletion works for a Shoot with the shoot-rsyslog-relp extension enabled.

Shoot Reconciliation

This section outlines how the reconciliation works for a Shoot with the shoot-rsyslog-relp extension enabled.

Extension Enablement / Reconciliation

This section outlines how the extension enablement/reconciliation works, e.g., the extension has been added to the Shoot spec.

  1. As part of the Shoot reconciliation flow, the gardenlet deploys the Extension resource.
  2. The shoot-rsyslog-relp extension reconciles the Extension resource. pkg/controller/lifecycle/actuator.go contains the implementation of the extension.Actuator interface. The reconciliation of an Extension of type shoot-rsyslog-relp only deploys the necessary monitoring configuration - the shoot-rsyslog-relp-dashboards ConfigMap which contains the definitions for: Plutono dashboard for the Rsyslog component, and the shoot-shoot-rsyslog-relp ServiceMonitor and PrometheusRule resources which contains the definitions for: scraping metrics by prometheus, alerting rules.
  3. As part of the Shoot reconciliation flow, the gardenlet deploys the OperatingSystemConfig resource.
  4. The shoot-rsyslog-relp extension serves a webhook that mutates the OperatingSystemConfig resource for Shoots having the shoot-rsyslog-relp extension enabled (the corresponding namespace gets labeled by the gardenlet with extensions.gardener.cloud/shoot-rsyslog-relp=true). pkg/webhook/operatingsystemconfig/ensurer.go contains implementation of the genericmutator.Ensurer interface.
    1. The webhook renders the 60-audit.conf.tpl template script and appends it to the OperatingSystemConfig files. When rendering the template, the configuration of the shoot-rsyslog-relp extension is used to fill in the required template values. The file is installed as /var/lib/rsyslog-relp-configurator/rsyslog.d/60-audit.conf on the host OS.
    2. The webhook appends the audit rules to the OperatingSystemConfig. The files are installed under /var/lib/rsyslog-relp-configurator/rules.d on the host OS.
    3. The webhook renders the configure-rsyslog.tpl.sh script and appends it to the OperatingSystemConfig files. This script is installed as /var/lib/rsyslog-relp-configurator/configure-rsyslog.sh on the host OS. It keeps the configuration of the rsyslog systemd service up-to-date by copying /var/lib/rsyslog-relp-configurator/rsyslog.d/60-audit.conf to /etc/rsyslog.d/60-audit.conf, if /etc/rsyslog.d/60-audit.conf does not exist or the files differ. The script also takes care of syncing the audit rules in /etc/audit/rules.d with the ones installed in /var/lib/rsyslog-relp-configurator/rules.d and restarts the auditd systemd service if necessary.
    4. The webhook renders the process-rsyslog-pstats.tpl.sh and appends it to the OperatingSystemConfig files. This script receives metrics from the rsyslog process, transforms them, and writes them to /var/lib/node-exporter/textfile-collector/rsyslog_pstats.prom so that they can be collected by the node-exporter.
    5. As part of the Shoot reconciliation, before the shoot-rsyslog-relp extension is deployed, the gardenlet copies all Secret and ConfigMap resources referenced in .spec.resources[] to the Shoot’s control plane namespace on the Seed. When the .tls.enabled field is true in the shoot-rsyslog-relp extension configuration, a value for .tls.secretReferenceName must also be specified so that it references a named resource reference in the Shoot’s .spec.resources[] array. The webhook appends the data of the referenced Secret in the Shoot’s control plane namespace to the OperatingSystemConfig files.
    6. The webhook appends the rsyslog-configurator.service unit to the OperatingSystemConfig units. The unit invokes the configure-rsyslog.sh script every 15 seconds.

Extension Disablement

This section outlines how the extension disablement works, i.e., the extension has to be removed from the Shoot spec.

  1. As part of the Shoot reconciliation flow, the gardenlet destroys the Extension resource because it is no longer needed.
    1. As part of the deletion flow, the shoot-rsyslog-relp extension deploys the rsyslog-relp-configuration-cleaner DaemonSet to the Shoot cluster to clean up the existing rsyslog configuration and revert the audit rules.

Shoot Deletion

This section outlines how the deletion works for a Shoot with the shoot-rsyslog-relp extension enabled.

  1. As part of the Shoot deletion flow, the gardenlet destroys the Extension resource.
    1. In the Shoot deletion flow, the Extension resource is deleted after the Worker resource. Hence, there is no need to deploy the rsyslog-relp-configuration-cleaner DaemonSet to the Shoot cluster to clean up the existing rsyslog configuration and revert the audit rules.