Guides

1 - Set Up Client Tools

1.1 - Automated Deployment

Automated deployment with kubectl

Introduction

With kubectl you can easily deploy an image from your local environment.

However, what if you want to use an automated deployment script on a CI server (e.g. Jenkins), but don’t want to store the KUBECONFIG on that server?

You can use kubectl and connect to the API-server of your cluster.

Prerequisites

  1. Create a service account user

    kubectl create serviceaccount deploy-user -n default
    
  2. Bind a role to the newly created service account

    !!! Warning !!! In this example the preconfigured role edit and the namespace default are used. Please adjust the role to a stricter scope; see https://kubernetes.io/docs/admin/authorization/rbac/

    kubectl create rolebinding deploy-default-role --clusterrole=edit --serviceaccount=default:deploy-user --namespace=default
    
  3. Get the URL of your API-server

    APISERVER=$(kubectl config view | grep server | cut -f 2- -d ":" | tr -d " ")
    
  4. Get the name of the service account secret

    SERVICEACCOUNT=$(kubectl get serviceaccount deploy-user -n default -o=jsonpath={.secrets[0].name})
    
  5. Generate a token for the serviceaccount

    TOKEN=$(kubectl get secret -n default $SERVICEACCOUNT -o=jsonpath={.data.token} | base64 -D)
    

Usage

You can deploy your app without setting up a kubeconfig locally; you just need to pass the environment variables (e.g. store them in the Jenkins credentials store):

kubectl --server=${APISERVER} --token=${TOKEN} --insecure-skip-tls-verify=true apply --filename myapp.yaml
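To check that the token actually grants the permissions you expect before wiring it into your CI job, you can, for example, query the authorization layer with kubectl auth can-i (the resource below is just an example):

kubectl --server=${APISERVER} --token=${TOKEN} --insecure-skip-tls-verify=true auth can-i create deployments --namespace default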

1.2 - Kubeconfig Context as bash Prompt

Expose the active kubeconfig into bash

Use the Kubernetes command-line tool, kubectl, to deploy and manage applications on Kubernetes. Using kubectl, you can inspect cluster resources and create, delete, and update components.


By default, the kubectl configuration is located at ~/.kube/config.

Suppose you have two clusters, one for development work and one for scratch work.

How can you handle this easily without always copying the configuration to the right place?

Export the KUBECONFIG environment variable

bash$ export KUBECONFIG=<PATH-TO-MY-CONFIG>/kubeconfig-dev.yaml

How to determine which cluster is used by the kubectl command?

Determine active cluster

bash$ kubectl cluster-info
Kubernetes master is running at https://api.dev.garden.shoot.canary.k8s-hana.ondemand.com
KubeDNS is running at https://api.dev.garden.shoot.canary.k8s-hana.ondemand.com/api/v1/proxy/namespaces/kube-system/services/kube-dns

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
bash$ 

Display cluster in the bash - Linux and alike

I found this tip on Stack Overflow and find it worth adding here. Edit your ~/.bash_profile and add the following code snippet to show the current k8s context in the shell’s prompt.

prompt_k8s(){
  k8s_current_context=$(kubectl config current-context 2> /dev/null)
  if [[ $? -eq 0 ]] ; then echo -e "(${k8s_current_context}) "; fi
}
 
 
PS1+='$(prompt_k8s)'

After this, your bash command prompt contains the active KUBECONFIG context, so you always know which cluster is active - development or production.

e.g.

bash$ export KUBECONFIG=/Users/d023280/Documents/workspace/gardener-ui/kubeconfig_gardendev.yaml 
bash (garden_dev)$ 

Note the (garden_dev) prefix in the bash command prompt.

This helps immensely to avoid thoughtless mistakes.

Display cluster in the PowerShell - Windows

Display current k8s cluster in the title of PowerShell window.

Create a profile file for your shell under %UserProfile%\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1

Copy following code to Microsoft.PowerShell_profile.ps1

 function prompt_k8s {
     $k8s_current_context = (kubectl config current-context) | Out-String
     if($?) {
         return $k8s_current_context
     }else {
         return "No K8S contenxt found"
     }
 }

 $host.ui.rawui.WindowTitle = prompt_k8s


If you want to switch to a different cluster, you can set KUBECONFIG to the new value and re-run Microsoft.PowerShell_profile.ps1.

1.3 - Organizing Access Using kubeconfig Files

Organizing Access Using kubeconfig Files

The kubectl command-line tool uses kubeconfig files to find the information it needs to choose a cluster and communicate with the API server of a cluster.

Problem

If you’ve become aware of a security breach that affects you, you may want to revoke or rotate credentials in case anything was leaked. However, this is not possible with the initial or master kubeconfig of your cluster.


Pitfall

Never distribute the kubeconfig of a productive cluster, which you can download directly within the Gardener dashboard.


Create custom kubeconfig file for each user

Create a separate kubeconfig for each user. One of the big advantages is that you can revoke them and control the permissions better. You can also limit access to single namespaces.

The script below creates a new ServiceAccount with read privileges for the whole cluster (Secrets are excluded). To run the script, jq, a lightweight and flexible command-line JSON processor, must be installed.

#!/bin/bash

if [[ -z "$1" ]] ;then
  echo "usage: $0 <username>"
  exit 1
fi

user=$1
kubectl create sa ${user}
secret=$(kubectl get sa ${user} -o json | jq -r .secrets[].name)
kubectl get secret ${secret} -o json | jq -r '.data["ca.crt"]' | base64 -D > ca.crt

user_token=$(kubectl get secret ${secret} -o json | jq -r '.data["token"]' | base64 -D)
c=`kubectl config current-context`
cluster_name=`kubectl config get-contexts $c | awk '{print $3}' | tail -n 1`
endpoint=`kubectl config view -o jsonpath="{.clusters[?(@.name == \"${cluster_name}\")].cluster.server}"`

# Set up the config
KUBECONFIG=k8s-${user}-conf kubectl config set-cluster ${cluster_name} \
    --embed-certs=true \
    --server=${endpoint} \
    --certificate-authority=./ca.crt

KUBECONFIG=k8s-${user}-conf kubectl config set-credentials ${user}-${cluster_name#cluster-} --token=${user_token}
KUBECONFIG=k8s-${user}-conf kubectl config set-context ${user}-${cluster_name#cluster-} \
    --cluster=${cluster_name} \
    --user=${user}-${cluster_name#cluster-}
KUBECONFIG=k8s-${user}-conf kubectl config use-context ${user}-${cluster_name#cluster-}

cat <<EOF | kubectl create -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: view-${user}-global
subjects:
- kind: ServiceAccount
  name: ${user}
  namespace: default
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

EOF


echo "done! Test with: "
echo "export KUBECONFIG=k8s-${user}-conf"
echo "kubectl get pods"

If edit or admin rights are to be assigned, the ClusterRoleBinding must be adapted in the roleRef section with the roles listed below.

Furthermore, you can restrict this to a single namespace by not creating a ClusterRoleBinding but only a RoleBinding within the desired namespace (see the sketch below).
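As a sketch, a namespace-scoped read-only binding for the same service account could look like this (the namespace my-namespace is only a placeholder, and <username> is the service account created by the script):

cat <<EOF | kubectl create -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: view-<username>
  namespace: my-namespace
subjects:
- kind: ServiceAccount
  name: <username>
  namespace: default
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io
EOF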

Default ClusterRole | Default ClusterRoleBinding | Description
cluster-admin | system:masters group | Allows super-user access to perform any action on any resource. When used in a ClusterRoleBinding, it gives full control over every resource in the cluster and in all namespaces. When used in a RoleBinding, it gives full control over every resource in the rolebinding’s namespace, including the namespace itself.
admin | None | Allows admin access, intended to be granted within a namespace using a RoleBinding. If used in a RoleBinding, allows read/write access to most resources in a namespace, including the ability to create roles and rolebindings within the namespace. It does not allow write access to resource quota or to the namespace itself.
edit | None | Allows read/write access to most objects in a namespace. It does not allow viewing or modifying roles or rolebindings.
view | None | Allows read-only access to see most objects in a namespace. It does not allow viewing roles or rolebindings. It does not allow viewing secrets, since those are escalating.

1.4 - Use a Helm Chart to Deploy an Application or Service

Basically, Helm Charts can be installed as described e.g. in the Helm QuickStart Guide. However, our clusters come with RBAC enabled by default, hence Helm must be installed as follows:

Create a Service Account

Create a service account via the following command:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: ServiceAccount
metadata:
 name: helm
 namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
 name: helm
roleRef:
 apiGroup: rbac.authorization.k8s.io
 kind: ClusterRole
 name: cluster-admin
subjects:
 - kind: ServiceAccount
   name: helm
   namespace: kube-system
EOF

Initialize Helm

Initialize Helm via helm init --service-account helm. You can now use helm.
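You can then install charts as usual. A minimal sketch, assuming the Helm 2 stable repository is configured and using placeholder names:

helm install --name my-release stable/mysql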

In case of failure

In case you have already executed helm init without the above service account, you will get the following error, e.g. when you run helm list: Error: User "system:serviceaccount:kube-system:default" cannot list configmaps in the namespace "kube-system". (get configmaps). You will then need to delete the Tiller deployment (the Helm backend implicitly deployed to the Kubernetes cluster when you call helm init) as well as the local Helm files (usually $HELM_HOME is set to ~/.helm):

kubectl delete deployment tiller-deploy --namespace=kube-system
kubectl delete service tiller-deploy --namespace=kube-system 
rm -rf ~/.helm/

Now follow the instructions above. For more details see this Kubernetes Helm issue #2687.

2 - Install Gardener

2.1 - Hardening the Gardener Community Setup

Hardening the Gardener Community Setup

Context

Gardener stakeholders in the Open Source community usually use the Gardener Setup Scripts to create a Garden cluster based on Kubernetes v1.9, which can then be used to create Shoot clusters based on Kubernetes v1.10, v1.11 and v1.12. Shoot clusters can play the following roles in a Gardener landscape:

  • Seed cluster
  • Shoot cluster

As Alban Crequy from Kinvolk has recommended in his recent Gardener blog post Auditing Kubernetes for Secure Setup, the Gardener team at SAP has applied several measures to harden the Gardener landscapes at SAP.

Recommendations

Mitigation for Gardener CVE-2018-2475

The following recommendations describe how you can harden your Gardener Community Setup by adding a Seed cluster hardened with network policies.

  • Use the Gardener Setup Scripts to create a Garden cluster in a dedicated IaaS account
  • Create a Shoot cluster in a different IaaS account
  • As a precaution you should not deploy the Kubernetes dashboard on this Shoot cluster
  • Register this newly created Shoot cluster as a Seed cluster in the Gardener
  • End user Shoot clusters can then be created using this newly created Seed cluster (which in turn is a Shoot cluster).

A tutorial on how to create a shooted seed cluster can be found here.

The rationale behind this activity is that Calico network policies harden this Seed cluster, whereas the community installer uses Flannel, which does not offer these features for the Garden cluster.

When you have added a hardened Seed cluster, you are not expected to be vulnerable to the Gardener CVE-2018-2475 anymore.

Mitigation for Kubernetes CVE-2018-1002105

In addition, when you follow the recommendations in the recent Gardener Security Announcement, you are not expected to be vulnerable to the Kubernetes CVE-2018-1002105 with your hardened Gardener Community Setup.

Alternative Approach

There is no Gardener blog available for this alternative approach and it is not part of the Gardener Setup Scripts, but it was tested by the Gardener team at SAP. Use GKE to host a Garden cluster based on Kubernetes v1.10, v1.11 or v1.12 (without the Kubernetes dashboard) in a dedicated GCP account. If you do this on your own, please ensure that network policies are turned on, which might not be the case by default. Then you can apply the security configuration which Alban Crequy from Kinvolk has recommended in his blog directly in the Garden cluster and create Shoot clusters from there in a different IaaS account.

2.2 - Manually Adding a Node to an Existing Cluster

How to add a node to an existing cluster without the support of Gardener

Manually adding a node to an existing cluster

Gardener has an excellent ability to automatically scale machines for the cluster. From the point of view of scalability, there is no need for manual intervention.

This tutorial is useful for end-users who need specifically configured nodes that are not yet supported by Gardener, for example, an end-user who wants to run a workload that requires runnc instead of runc as the container runtime.


Disclaimer

Here we will look at the steps on how to add a node to an existing cluster without the support of Gardener. Such a node will not be managed by Gardener, and if it goes down for any reason, Gardener will not be responsible for replacing it.

How

  1. Create a new instance in the same VPC/network as other machines in the cluster. You should be able to ssh into the machine. So save its private key, and assign a public IP to it. If adding a public IP is not preferred, then ssh into any other machine in the cluster, and then ssh from there into the new machine using its private key.

    To ssh into a machine which is already in the cluster, use the steps defined here.

    Attach the same IAM role to the new machine which is attached to the existing machines in the cluster. This is required by kubelet in the new machine so that it can contact the cloud provider to query the node’s name.

  2. On the new machine, create file /var/lib/kubelet/kubeconfig-bootstrap with the following content:

apiVersion: v1
kind: Config
current-context: kubelet-bootstrap@default
clusters:
- cluster:
    certificate-authority-data: <CA Certificate>
    server: <Server>
  name: default
contexts:
- context:
    cluster: default
    user: kubelet-bootstrap
  name: kubelet-bootstrap@default
users:
- name: kubelet-bootstrap
  user:
    as-user-extra: {}
    token: <Token>
  3. ssh into an existing node, and run these commands to get the values of <Server> and <CA Certificate> to be replaced in the above file:
  • <Server>
 /opt/bin/hyperkube kubectl \
   --kubeconfig /var/lib/kubelet/kubeconfig-real \
   config view \
   -o go-template='{{index .clusters 0 "cluster" "server"}}' \
   --raw
  • <CA Certificate>
/opt/bin/hyperkube kubectl \
   --kubeconfig /var/lib/kubelet/kubeconfig-real \
   config view \
   -o go-template='{{index .clusters 0 "cluster" "certificate-authority-data"}}' \
   --raw
  4. <Token>
    The kubelet on the new machine needs a bootstrap token to authenticate with the kube-apiserver when adding itself to the cluster. Kube-apiserver uses a secret in the kube-system namespace to authenticate this token. This token is valid for 90 minutes from the time of creation, and the corresponding secret captures this detail in its .data.expiration field. The name of this secret is of the format bootstrap-token-*. Gardener takes care of creating new bootstrap tokens, and the corresponding secrets. To get an unexpired token, find the secrets with the name format bootstrap-token-* in the kube-system namespace in the cluster, and pick the one with minimum age. Eg. bootstrap-token-abcdef.
    Run these commands to get the token:

     tokenid=$(kubectl get secret bootstrap-token-abcdef -n kube-system -o go-template='{{index .data "token-id"}}' | base64 --decode)
    
     tokensecret=$(kubectl get secret bootstrap-token-abcdef -n kube-system -o go-template='{{index .data "token-secret"}}' | base64 --decode)
    
     echo $tokenid.$tokensecret
    

    The value of <Token> will be tokenid.tokensecret. Replace <Token> in the above file with this value.

  5. Copy the contents of the files /var/lib/kubelet/config/kubelet, /var/lib/kubelet/ca.crt and /etc/systemd/system/kubelet.service from an existing node to the new node

  6. Run the following command on the new node to start the kubelet:

systemctl enable kubelet && systemctl start kubelet

The new node should be added to the existing cluster within a couple of minutes.
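To verify this, you can, for example, watch the node list of the cluster until the new node shows up and becomes Ready:

kubectl get nodes --watch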

2.3 - Setting Up a Seed Cluster

How to configure a Kubernetes cluster as a Gardener seed

The Seed Cluster

The landscape-setup-template is meant to provide an as-simple-as-possible Gardener installation. Therefore, it simply registers the cluster that the Gardener is deployed on as a seed cluster. While this is easy, it might be insecure. Clusters created with Kubify don’t have network policies, for example. See Hardening the Gardener Community Setup for more information.

To have network policies on the seed cluster and avoid having the seed on the same cluster as the Gardener, the easiest option is probably to simply create a shoot and then register that shoot as seed. This way you can also leverage other advantages of shooted clusters for your seed, e.g. autoscaling.

Setting up the Shoot

The first step is to create a shoot cluster. Unfortunately, the Gardener dashboard currently does not allow changing the CIDRs for the created shoot clusters, and your shoots won’t work if they have overlapping CIDR ranges with their corresponding seed cluster. So either your seed cluster is deployed with different CIDRs - not using the dashboard, but kubectl apply and a yaml file - or all of your shoots on that seed need to be created this way. In order to be able to use the dashboard for the shoots, it makes sense to create the seed with different CIDRs.

So, create yourself a shoot with modified CIDRs. You can find templates for the shoot manifest here. You could, for example, change the CIDRs to this:

      ...
      networks:
        internal:
        - 10.254.112.0/22
        nodes: 10.254.0.0/19
        pods: 10.255.0.0/17
        public:
        - 10.254.96.0/22
        services: 10.255.128.0/17
        vpc:
          cidr: 10.254.0.0/16
        workers:
        - 10.254.0.0/19
      ...

Also make sure that your new seed cluster has enough resources for the expected number of shoots.

Registering the Shoot as Seed

The seed itself is a Kubernetes resource that can be deployed via a yaml file, but it has some dependencies. You can find templated versions of these files in the seed-config component of the landscape-setup-template project. If you have set up your Gardener using this project, there should also be rendered versions of these files in the state/seed-config/ directory of your landscape folder (they are probably easier to work with). Examples for all these files can also be found in the aforementioned example folder in the Gardener repo.

1. Seed Namespace

First, you should create a namespace for your new seed and everything that belongs to it. This is not necessary, but it will keep your cluster organized. For this example, the namespace will be called seed-test.
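A minimal way to create it, using the example name seed-test:

kubectl create namespace seed-test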

2. Cloud Provider Secret

The Gardener needs to create resources on the seed and thus needs a kubeconfig for it. This kubeconfig is provided via the cloud provider secret (below is an example for AWS).

apiVersion: v1
kind: Secret
metadata:
  name: test-seed-secret
  namespace: seed-test
  labels:
    cloudprofile.garden.sapcloud.io/name: aws 
type: Opaque
data:
  accessKeyID: <base64-encoded AWS access key>
  secretAccessKey: <base64-encoded AWS secret key>
  kubeconfig: <base64-encoded kubeconfig>

Deploy the secret into your seed namespace. Apart from the kubeconfig, infrastructure credentials are also required. They will only be used for the etcd backup, so in the case of AWS, S3 privileges should be sufficient.

3. Secretbinding for Cloud Provider Secret

Create a secretbinding for your cloud provider secret:

apiVersion: core.gardener.cloud/v1beta1
kind: SecretBinding
metadata:
  name: test-seed-secret
  namespace: seed-test
  labels:
    cloudprofile.garden.sapcloud.io/name: aws
secretRef:
  name: test-seed-secret
# namespace: only required if in different namespace than referenced secret
quotas: []

You can give it the same name as the referenced secret.

4. Cloudprofile

The cloudprofile contains the information about which shoots can be created with this seed. You could create a new cloudprofile, but you can also just reference the existing cloudprofile if you don’t want to change anything.
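To see which cloudprofiles already exist in your Garden cluster and could be referenced, you can, for example, run:

kubectl get cloudprofiles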

5. Seed

Now the seed resource can be created. Choose a name, reference cloudprofile and secretbinding, fill in your ingress domain, and set the CIDRs to the same values as in the underlying shoot cluster.

apiVersion: core.gardener.cloud/v1beta1
kind: Seed
metadata:
  name: aws-secure
spec:
  provider:
    type: aws
    region: eu-west-1
  secretRef:
    name: test-seed-secret
    namespace: seed-test
  dns:
    ingressDomain: ingress.<your cluster domain>
  networks:
    nodes: 10.254.0.0/19
    pods: 10.255.0.0/17
    services: 10.255.128.0/17

6. Hide Original Seed

In the dashboard, it is not possible to select the seed for a shoot (it is possible when deploying the shoot using a yaml file, however). Since both seeds probably reference the same cloudprofile, the Gardener will try to distribute the shoots equally among both seeds.

To solve this problem, edit the original seed and set its spec.visible field to false. This will prevent the Gardener from choosing this seed, so now all shoots created via the dashboard should have their control plane on the new, more secure seed.
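A quick way to do this, assuming the original seed is called original-seed (you can of course also use kubectl edit seed original-seed instead):

kubectl patch seed original-seed --type merge -p '{"spec":{"visible":false}}'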

3 - Administer Client (Shoot) Clusters

3.1 - Create / Delete a Shoot cluster

Create a Shoot Cluster

As you have already prepared an example Shoot manifest in the steps described in the development documentation, please open another Terminal pane/window with the KUBECONFIG environment variable pointing to the Garden development cluster and send the manifest to the Kubernetes API server:

$ kubectl apply -f your-shoot-aws.yaml

You should see that the Gardener has immediately picked up your manifest and started to deploy the Shoot cluster.

In order to investigate what is happening in the Seed cluster, please download its proper Kubeconfig yourself (see next paragraph). The namespace of the Shoot cluster in the Seed cluster will look like this: shoot-johndoe-johndoe-1, where the first johndoe is your namespace in the Garden cluster (also called “project”) and the johndoe-1 suffix is the actual name of the Shoot cluster.

To connect to the newly created Shoot cluster, you must download its Kubeconfig as well. Please connect to the proper Seed cluster, navigate to the Shoot namespace, and download the Kubeconfig from the kubecfg secret in that namespace.
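A minimal sketch, assuming the example namespace from above and that the kubeconfig is stored under the kubeconfig key of the kubecfg secret (run against the Seed cluster):

kubectl get secret kubecfg --namespace shoot-johndoe-johndoe-1 -o jsonpath='{.data.kubeconfig}' | base64 -d > kubeconfig-johndoe-1.yaml
export KUBECONFIG=$(pwd)/kubeconfig-johndoe-1.yaml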

Delete a Shoot Cluster

In order to delete your cluster, you have to set an annotation confirming the deletion first, and trigger the deletion after that. You can use the prepared delete shoot script, which takes the Shoot name as its first parameter. The namespace can be specified as the second parameter, but it is optional. If you don’t specify it, it defaults to your namespace (the username you are logged in with on your machine).

$ ./hack/usage/delete shoot johndoe-1 johndoe

(The delete script can be found at https://github.com/gardener/gardener/blob/master/hack/usage/delete.)

Configure a Shoot cluster alert receiver

The receivers of the Shoot alerts can be configured in the .spec.monitoring.alerting.emailReceivers section of the Shoot specification. The value of the field has to be a list of valid email addresses.
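For example, the relevant fragment of the Shoot specification could look like this (the address is only a placeholder):

spec:
  monitoring:
    alerting:
      emailReceivers:
      - john.doe@example.com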

The alerting for the Shoot clusters is handled by the Prometheus Alertmanager. The Alertmanager will be deployed next to the control plane when the Shoot resource specifies .spec.monitoring.alerting.emailReceivers and an SMTP secret exists.

If the field gets removed, the Alertmanager will also be removed during the next reconciliation of the cluster. The opposite applies if the field is added to an existing cluster.

3.2 - Create a Shoot Cluster into Existing AWS VPC

Create a Shoot cluster into existing AWS VPC

Gardener can create a new VPC, or use an existing one for your Shoot cluster. Depending on your needs, you may want to create Shoots in an already existing VPC. This tutorial describes how to create a Shoot cluster in an existing AWS VPC. The steps are identical for Alicloud, Azure, and GCP. Please note that the existing VPC must be in the same region as the Shoot cluster that you want to deploy into the VPC.

TL;DR

If .spec.provider.infrastructureConfig.networks.vpc.cidr is specified, Gardener will create a new VPC with the given CIDR block and respectively will delete it on Shoot deletion.
If .spec.provider.infrastructureConfig.networks.vpc.id is specified, Gardener will use the existing VPC and respectively won’t delete it on Shoot deletion.

It’s not recommended to create a Shoot cluster in a VPC that is managed by Gardener (i.e., one that was created for another Shoot cluster). In this case, the deletion of the initial Shoot cluster will fail to delete the VPC because there will be resources attached to it.
Gardener won’t delete any manually created (unmanaged) resources in your cloud provider account.

1. Configure AWS CLI

The aws configure command is a convenient way to set up your AWS CLI. It will prompt you for your credentials and settings, which will be used in the following AWS CLI invocations.

$ aws configure
AWS Access Key ID [None]: <ACCESS_KEY_ID>
AWS Secret Access Key [None]: <SECRET_ACCESS_KEY>
Default region name [None]: <DEFAULT_REGION>
Default output format [None]: <DEFAULT_OUTPUT_FORMAT>

2. Create VPC

Create the VPC by running the following command:

$ aws ec2 create-vpc --cidr-block <cidr-block>
{
  "Vpc": {
      "VpcId": "vpc-ff7bbf86",
      "InstanceTenancy": "default",
      "Tags": [],
      "CidrBlockAssociations": [
          {
              "AssociationId": "vpc-cidr-assoc-6e42b505",
              "CidrBlock": "10.0.0.0/16",
              "CidrBlockState": {
                  "State": "associated"
              }
          }
      ],
      "Ipv6CidrBlockAssociationSet": [],
      "State": "pending",
      "DhcpOptionsId": "dopt-38f7a057",
      "CidrBlock": "10.0.0.0/16",
      "IsDefault": false
  }
}

Gardener requires the VPC to have DNS support enabled, i.e., the attributes enableDnsSupport and enableDnsHostnames must be set to true. The enableDnsSupport attribute is enabled by default, while enableDnsHostnames is not. Set the enableDnsHostnames attribute to true:

$ aws ec2 modify-vpc-attribute --vpc-id vpc-ff7bbf86 --enable-dns-hostnames

3. Create Internet Gateway

Gardener also requires that an internet gateway is attached to the VPC. You can create one using:

$ aws ec2 create-internet-gateway
{
    "InternetGateway": {
        "Tags": [],
        "InternetGatewayId": "igw-c0a643a9",
        "Attachments": []
    }
}

and attach it to the VPC using:

$ aws ec2 attach-internet-gateway --internet-gateway-id igw-c0a643a9 --vpc-id vpc-ff7bbf86

4. Create the Shoot

Prepare your Shoot manifest (you could check the example manifests). Please make sure that you choose the region in which you had created the VPC earlier (step 2). Also, put your VPC ID in the .spec.provider.infrastructureConfig.networks.vpc.id field:

spec:
  region: <aws-region-of-vpc>
  provider:
    type: aws
    infrastructureConfig:
      apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vpc:
          id: vpc-ff7bbf86
    # ...

Apply your Shoot manifest.

$ kubectl apply -f your-shoot-aws.yaml

Ensure that the Shoot cluster is properly created.

$ kubectl get shoot $SHOOT_NAME -n $SHOOT_NAMESPACE
NAME           CLOUDPROFILE   VERSION   SEED   DOMAIN           OPERATION   PROGRESS   APISERVER   CONTROL   NODES   SYSTEM   AGE
<SHOOT_NAME>   aws            1.15.0    aws    <SHOOT_DOMAIN>   Succeeded   100        True        True      True    True     20m

3.3 - Shoot Cluster Maintenance

Understanding and configuring Gardener’s Day-2 operations for Shoot clusters.

Shoot Cluster Maintenance

Day two operations for shoot clusters are related to:

  • The Kubernetes version of the control plane and the worker nodes
  • the operating system version of the worker nodes

The following table summarizes what options Gardener offers to maintain these versions:

Version | Auto-Update | Forceful Updates | Manual updates
Kubernetes version | Patches only | Patches and consecutive minor updates only | yes
Operating system version | yes | yes | yes

Allowed Target Versions in the CloudProfile

Administrators maintain the allowed target versions that you can update to in the CloudProfile for each IaaS-Provider. Users with access to a Gardener project can check supported target versions with:

kubectl get cloudprofile [IAAS-SPECIFIC-PROFILE] -o yaml

Path | Description | More information
spec.kubernetes.versions | The supported Kubernetes versions (major.minor.patch). | Patch releases
spec.machineImages | The supported operating system versions for worker nodes. |

Both the Kubernetes version, and the operating system version follow semantic versioning that allows Gardener to handle updates automatically.

More information: Semantic Versioning.

Impact of Version Classifications on Updates

Gardener allows versions in the CloudProfile to be classified as preview, supported, deprecated, or expired. During maintenance operations, preview versions are excluded from updates, because they’re often recently released versions that haven’t yet undergone thorough testing and may contain bugs or security issues.

More information: Version Classifications.

Let Gardener manage your updates

The Maintenance Window

Gardener can manage updates for you automatically. It lets users specify a maintenance window during which updates are scheduled:

  • The time interval of the maintenance window can’t be less than 30 minutes or more than 6 hours.
  • If there’s no maintenance window specified during the creation of a shoot cluster, Gardener chooses a maintenance window randomly to spread the load.

You can either specify the maintenance window in the shoot cluster specification (.spec.maintenance.timeWindow) or the start time of the maintenance window using the Gardener dashboard (CLUSTERS > [YOUR-CLUSTER] > OVERVIEW > Lifecycle > Maintenance).

Auto-Update and Forceful Updates

To trigger updates during the maintenance window automatically, Gardener offers the following methods:

  • Auto-update:
    Gardener starts an update during the next maintenance window whenever there’s a version available in the CloudProfile that is higher than the one of your shoot cluster specification, and that isn’t classified as preview version. For Kubernetes versions, auto-update only updates to higher patch levels.

    You can either activate auto-update on the Gardener dashboard (CLUSTERS > [YOUR-CLUSTER] > OVERVIEW > Lifecycle > Maintenance) or in the shoot cluster specification:

    • .spec.maintenance.autoUpdate.kubernetesVersion: true
    • .spec.maintenance.autoUpdate.machineImageVersion: true
  • Forceful updates:
    In the maintenance window, Gardener compares the current version given in the shoot cluster specification with the version list in the CloudProfile. If the version has an expiration date and if the date is before the start of the maintenance window, Gardener starts an update to the highest version available in the CloudProfile that isn’t classified as preview version. The highest version in CloudProfile can’t have an expiration date. For Kubernetes versions, Gardener only updates to higher patch levels or consecutive minor versions.

If you don’t want to wait for the next maintenance window, you can annotate the shoot cluster specification with shoot.gardener.cloud/operation: maintain. Gardener then checks immediately if there’s an auto-update or a forceful update needed.
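For example (shoot name and project namespace are placeholders):

kubectl annotate shoot my-shoot --namespace garden-my-project shoot.gardener.cloud/operation=maintain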

With expiration dates, administrators can give shoot cluster owners more time for testing before the actual version update happens, which allows smoother transitions to new versions.

Kubernetes Update Paths

The bigger the delta between the Kubernetes source version and the Kubernetes target version, the more carefully an update must be planned and executed by operators. Gardener only provides automatic support for updates that can be applied safely to the cluster workload:

Update Type | Example | Update method
Patches | 1.10.12 to 1.10.13 | auto-update or Forceful update
Update to consecutive minor version | 1.10.12 to 1.11.10 | Forceful update
Other | 1.10.12 to 1.12.0 | manual update

Gardener doesn’t support automatic updates of nonconsecutive minor versions, because Kubernetes doesn’t guarantee updateability in this case. However, multiple minor version updates are possible if not only the minor source version is expired, but also the minor target version is expired. Gardener then updates the Kubernetes version first to the expired target version, and waits for the next maintenance window to update this version to the next minor target version.

Manual Updates

To update the Kubernetes version or the node operating system manually, change the .spec.kubernetes.version field or the .spec.provider.workers.machine.image.version field correspondingly.

Manual updates are required if you would like to do a minor update of the Kubernetes version. Gardener doesn’t do such updates automatically as they can have breaking changes that could impact the cluster workload.

Manual updates are either executed immediately (default) or can be confined to the maintenance time window.
Choosing the latter option causes changes to the cluster (for example, node pool rolling updates) and the subsequent reconciliation to only predictably happen during a defined time window (available since Gardener version 1.4).

More information: Confine Specification Changes/Update Roll Out.

Examples

In the examples for the CloudProfile and the shoot cluster specification, only the fields relevant for the example are shown.

Auto-Update of Kubernetes Version

Let’s assume Kubernetes version 1.10.5 and 1.11.0 were added in the following CloudProfile:

spec:
  kubernetes:
    versions:
    - version: 1.11.0
    - version: 1.10.5
    - version: 1.10.0

Before this change, the shoot cluster specification looked like this:

spec:
  kubernetes:
    version: 1.10.0
  maintenance:
    timeWindow:
      begin: 220000+0000
      end: 230000+0000
    autoUpdate:
      kubernetesVersion: true

As a consequence, the shoot cluster is updated to Kubernetes version 1.10.5 between 22:00-23:00 UTC. Your shoot cluster isn’t updated automatically to 1.11.0, even though it’s the highest Kubernetes version in the CloudProfile, because Gardener only does automatic updates of the Kubernetes patch level.

Forceful Update Due to Expired Kubernetes Version

Let’s assume the following CloudProfile:

spec:
  kubernetes:
    versions:
    - version: 1.12.8
    - version: 1.11.10
    - version: 1.10.13
    - version: 1.10.12
      expirationDate: "2019-04-13T08:00:00Z"

Let’s assume the shoot cluster has the following specification:

spec:
  kubernetes:
    version: 1.10.12
  maintenance:
    timeWindow:
      begin: 220000+0100
      end: 230000+0100
    autoUpdate:
      kubernetesVersion: false

The shoot cluster specification refers to a Kubernetes version that has an expirationDate. In the maintenance window on 2019-04-12, the Kubernetes version stays the same, as it’s not yet expired. But in the maintenance window on 2019-04-14, the Kubernetes version of the shoot cluster is updated to 1.10.13 (independently of the value of .spec.maintenance.autoUpdate.kubernetesVersion).

Forceful Update to New Minor Kubernetes Version

Let’s assume the following CloudProfile:

spec:
  kubernetes:
    versions:
    - version: 1.12.8
    - version: 1.11.10
    - version: 1.11.09
    - version: 1.10.12
      expirationDate: "2019-04-13T08:00:00Z"

Let’s assume the shoot cluster has the following specification:

spec:
  kubernetes:
    version: 1.10.12
  maintenance:
    timeWindow:
      begin: 220000+0100
      end: 230000+0100
    autoUpdate:
      kubernetesVersion: false

The shoot cluster specification refers to a Kubernetes version that has an expirationDate. In the maintenance window on 2019-04-14, the Kubernetes version of the shoot cluster is updated to 1.11.10, which is the highest patch version of the minor target version 1.11 that follows the source version 1.10.

Automatic Update from Expired Machine Image Version

Let’s assume the following CloudProfile:

spec:
  machineImages:
  - name: coreos
    versions:
    - version: 2191.5.0
    - version: 2191.4.1
    - version: 2135.6.0
      expirationDate: "2019-04-13T08:00:00Z"

Let’s assume the shoot cluster has the following specification:

spec:
  provider:
    type: aws
    workers:
    - name: name
      maximum: 1
      minimum: 1
      maxSurge: 1
      maxUnavailable: 0
      machine:
        type: m5.large
        image:
          name: coreos
          version: 2135.6.0
      volume:
        type: gp2
        size: 20Gi
  maintenance:
    timeWindow:
      begin: 220000+0100
      end: 230000+0100
    autoUpdate:
      machineImageVersion: false

The shoot cluster specification refers to a machine image version that has an expirationDate. In the maintenance window on 2019-04-12, the machine image version stays the same, as it’s not yet expired. But in the maintenance window on 2019-04-14, the machine image version of the shoot cluster is updated to 2191.5.0 (independently of the value of .spec.maintenance.autoUpdate.machineImageVersion), as version 2135.6.0 is expired.

4 - Monitor and Troubleshoot

4.1 - Get a Shell to a Gardener Shoot Worker Node

Describes the methods for getting shell access to worker nodes.

Get a Shell to a Kubernetes Node

To troubleshoot certain problems in a Kubernetes cluster, operators need access to the host of the affected Kubernetes node. This can be required if a node misbehaves or fails to join the cluster in the first place.

With access to the host, it is for instance possible to check the kubelet logs and interact with common tools such as systemctl and journalctl.
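For example, once you have a shell on the node, the kubelet service and its logs can typically be inspected with:

systemctl status kubelet   # show the status of the kubelet service
journalctl -u kubelet -f   # follow the kubelet logs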

The first section of this guide explores options to get a shell to the node of a Gardener Kubernetes cluster. The options described in the second section do not rely on Kubernetes capabilities to get shell access to a node and thus can also be used if an instance failed to join the cluster.

This guide only covers how to get access to the host, but does not cover troubleshooting methods.

Get a Shell to an operational cluster node

The following describes four different approaches to get a shell to an operational Shoot worker node. As a prerequisite to troubleshooting a Kubernetes node, the node must have joined the cluster successfully and be able to run a pod. All of the described approaches involve scheduling a pod with root permissions and mounting the root filesystem.

Gardener Dashboard

Prerequisite: the terminal feature is configured for the Gardener dashboard.

Navigate to the cluster overview page and find the Terminal in the Access tile.

Access Tile

Select the target Cluster (Garden, Seed / Control Plane, Shoot cluster) depending on the requirements and access rights (only certain users have access to the Seed Control Plane).

To open the terminal configuration, click on the top right-hand corner of the screen.

Terminal configuration

Set the Terminal Runtime to “Privileged”. Also specify the target node from the drop-down menu.

Dashboard terminal pod configuration

The dashboard then schedules a pod and opens a shell session to the node.

To get access to common binaries installed on the host, prefix the command with chroot /hostroot. Note that the path depends on where the root path is mounted in the container. In the default image used by the Dashboard, it is under /hostroot.

Dashboard terminal pod configuration

gardenctl shell

Prerequisite: kubectl and gardenctl are available and configured.

First, target a Garden cluster containing all the Shoot definitions.

$ gardenctl target garden <target-garden>

Target an available Shoot by name. This sets up the context and configures the kubeconfig file of the Shoot cluster. Subsequent commands will execute in this context.

$ gardenctl target shoot <target-shoot>

Get the nodes of the Shoot cluster.

$ gardenctl kubectl get nodes 

Pick a node name from the list above and get a root shell access to it.

$ gardenctl shell <target-node>

Gardener Ops Toolbelt

Prerequisite: kubectl is available.

The Gardener ops-toolbelt can be used as a convenient way to deploy a root pod to a node. The pod uses an image that is bundled with a bunch of useful troubleshooting tools. This is also the same image that is used by default when using the Gardener Dashboard terminal feature as described in the previous section.

The easiest way to use the Gardener ops-toolbelt is to execute the ops-pod script in the hacks folder. To get root shell access to a node, execute the aforementioned script by supplying the target node name as an argument:

$ <path-to-ops-toolbelt-repo>/hacks/ops-pod <target-node>

Custom root pod

Alternatively, a pod can be assigned to a target node and a shell can be opened via standard Kubernetes means. To enable root access to the node, the pod specification requires proper securityContext and volume properties.

For instance, you can use the following pod manifest, after replacing <target-node-name> with the name of the node you want this pod attached to:

apiVersion: v1
kind: Pod
metadata:
  name: privileged-pod
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: <target-node-name>
  containers:
  - name: busybox
    image: busybox
    stdin: true
    securityContext:
      privileged: true
    volumeMounts:
    - name: host-root-volume
      mountPath: /host
      readOnly: true
  volumes:
  - name: host-root-volume
    hostPath:
      path: /
  hostNetwork: true
  hostPID: true
  restartPolicy: Never

SSH access to a node that failed to join the cluster

This section explores two options that can be used to get SSH access to a node that failed to join the cluster. As it is not possible to schedule a pod on the node, the Kubernetes-based methods explored so far cannot be used in this scenario.

Additionally, Gardener typically provisions worker instances in a private subnet of the VPC, hence there is no public IP address that could be used for direct SSH access.

For this scenario, cloud providers typically have extensive documentation (e.g. AWS & GCP) and in some cases tooling support. However, these approaches are mostly cloud provider specific, require interaction via their CLI and API, or sometimes require the installation of a cloud provider specific agent on the node.

Alternatively, gardenctl can be used, providing cloud-provider-agnostic, out-of-the-box support to get ssh access to an instance in a private subnet. Currently gardenctl supports AWS, GCP, Openstack, Azure and Alibaba Cloud.

Identifying the problematic instance

First, the problematic instance has to be identified. In Gardener, worker pools can be created in different cloud provider regions, zones and accounts.

The instance would typically show up as successfully started / running in the cloud provider dashboard or API and it is not immediately obvious which one has a problem. Instead, we can use the Gardener API / CRDs to obtain the faulty instance identifier in a cloud-agnostic way.

Gardener uses the Machine Controller Manager to create the Shoot worker nodes. For each worker node, the Machine Controller Manager creates a Machine CRD in the Shoot namespace in the respective Seed cluster. Usually, the problematic instance can be identified because its Machine CRD has the status pending.

The instance / node name can be obtained from the Machine .status field:

$ kubectl get machine <machine-name> -o json | jq -r .status.node

This is all the information needed to go ahead and use gardenctl ssh to get a shell to the node. In addition, the used cloud provider, the specific identifier of the instance and the instance region can be identified from the Machine CRD.

Get the identifier of the instance via:

$ kubectl get machine <machine-name> -o json | jq -r .spec.providerID   # e.g. aws:///eu-north-1/i-069733c435bdb4640

The identifier shows that the instance belongs to the cloud provider aws with the ec2 instance-id i-069733c435bdb4640 in region eu-north-1.

To get more information about the instance, check out the MachineClass (e.g AWSMachineClass) that is associated with each Machine CRD in the Shoot namespace of the Seed cluster. The AWSMachineClass contains the machine image (ami), machine-type, iam information, network-interfaces, subnets, security groups and attached volumes.
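As a sketch, for an AWS shoot the corresponding machine class could be inspected like this, assuming a placeholder namespace and that the command is run against the Seed cluster (the resource kind differs per cloud provider):

$ kubectl get awsmachineclass --namespace <shoot-namespace-in-seed> -o yaml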

Of course, the information can also be used to get the instance with the cloud provider CLI / API.

gardenctl ssh

Using the node name of the problematic instance, we can use the gardenctl ssh command to get SSH access to the cloud provider instance via an automatically set up bastion host. gardenctl takes care of spinning up the bastion instance, setting up the SSH keys, ports and security groups and opens a root shell on the target instance. After the SSH session has ended, gardenctl deletes the created cloud provider resources.

Use the following commands:

First, target a Garden cluster containing all the Shoot definitions.

$ gardenctl target garden <target-garden>

Target an available Shoot by name. This sets up the context, configures the kubeconfig file of the Shoot cluster and downloads the cloud provider credentials. Subsequent commands will execute in this context.

$ gardenctl target shoot <target-shoot>

This uses the cloud provider credentials to spin up the bastion and to open a shell on the target instance.

$ gardenctl ssh <target-node>

SSH with manually created Bastion on AWS

In case you are not using gardenctl or want to control the bastion instance yourself, you can also manually set it up. The steps described here are generally the same as those used by gardenctl internally. Despite some cloud provider specifics they can be generalized to the following list:

  • Open port 22 on the target instance.
  • Create an instance / VM in a public subnet (bastion instance needs to have public ip address).
  • Set-up security groups, roles and open port 22 for the bastion instance.

The following diagram shows an overview how the SSH access to the target instance works:

SSH Bastion diagram

This guide demonstrates the setup of a bastion on AWS.

Prerequisites:

  • The AWS CLI is set up.
  • Obtain target instance-id (see here).
  • Obtain the VPC ID the Shoot resources are created in. This can be found in the Infrastructure CRD in the Shoot namespace in the Seed.
  • Make sure that port 22 on the target instance is open (default for Gardener deployed instances).
    • Extract security group via
    $ aws ec2 describe-instances --instance-ids <instance-id>
    
    • Check for rule that allows inbound connections on port 22:
    $ aws ec2 describe-security-groups --group-ids=<security-group-id>
    
    • If not available, create the rule with the following command:
    $ aws ec2 authorize-security-group-ingress --group-id <security-group-id>  --protocol tcp --port 22 --cidr 0.0.0.0/0
    

Create the Bastion Security Group

  • The common name of the security group is <shoot-name>-bsg. Create the security group:

    $ aws ec2 create-security-group --group-name <bastion-security-group-name>  --description ssh-access --vpc-id <VPC-ID>
    
  • Optionally, create identifying tags for the security group:

    $ aws ec2 create-tags --resources <bastion-security-group-id> --tags Key=component,Value=<tag>
    
  • Create permission in the bastion security group that allows ssh access on port 22.

    $ aws ec2 authorize-security-group-ingress --group-id <bastion-security-group-id>  --protocol tcp --port 22 --cidr 0.0.0.0/0
    
  • Create an IAM role for the bastion instance with the name <shoot-name>-bastions:

    $ aws iam create-role --role-name <shoot-name>-bastions
    

    The content should be:

{
"Version": "2012-10-17",
"Statement": [
    {
        "Effect": "Allow",
        "Action": [
            "ec2:DescribeRegions"
        ],
        "Resource": [
            "*"
        ]
    }
]
}
  • Create the instance profile with name <shoot-name>-bastions:

    $ aws iam create-instance-profile --instance-profile-name <name>
    
  • Add the created role to the instance profile:

    $ aws iam add-role-to-instance-profile --instance-profile-name <instance-profile-name> --role-name <role-name>
    

Create the bastion instance

Next, in order to be able to ssh into the bastion instance, the instance has to be set up with a user with a public ssh key. Create a user gardener that has the same Gardener-generated public ssh key as the target instance.

  • First, we need to get the public part of the Shoot ssh-key. The ssh-key is stored in a secret in the project namespace in the Garden cluster. The name is: <shoot-name>-ssh-publickey. Get the key via:

    $ kubectl get secret aws-gvisor.ssh-keypair -o json | jq -r .data.\"id_rsa.pub\"
    
  • A script handed over as user-data to the bastion ec2 instance can be used to create the gardener user and add the ssh-key. For your convenience, you can use the following script to generate the user-data.

#!/bin/bash -eu
saveUserDataFile () {
  ssh_key=$1

cat > gardener-bastion-userdata.sh <<EOF
#!/bin/bash -eu
id gardener || useradd gardener -mU
mkdir -p /home/gardener/.ssh
echo "$ssh_key" > /home/gardener/.ssh/authorized_keys
chown gardener:gardener /home/gardener/.ssh/authorized_keys
echo "gardener ALL=(ALL) NOPASSWD:ALL" >/etc/sudoers.d/99-gardener-user
EOF
}


if [ -p /dev/stdin ]; then
    # Read the ssh key from stdin (piped input)
    read -r input
    saveUserDataFile "$input"
else
    # No piped input: fall back to the clipboard content (macOS pbpaste)
    saveUserDataFile "$(pbpaste)"
fi
  • Use the script by handing-over the public ssh-key of the Shoot cluster:

    $ kubectl get secret aws-gvisor.ssh-keypair -o json | jq -r .data.\"id_rsa.pub\" | ./generate-userdata.sh
    

    This generates a file called gardener-bastion-userdata.sh in the same directory containing the user-data.

  • The following information is needed to create the bastion instance:

    bastion-IAM-instance-profile-name

    • Use the created instance profile with name <shoot-name>-bastions

    image-id

    • Possibly use the same image-id as for the target instance (or any other image). It has a cloud-provider-specific format (AWS: ami).

    ssh-public-key-name

    • This is the ssh key pair already created in the Shoot’s cloud provider account by Gardener during the Infrastructure CRD reconciliation.
    • The name is usually: <shoot-name>-ssh-publickey

    subnet-id

    • Choose a subnet that is attached to an Internet Gateway and NAT Gateway (bastion instance must have a public IP).
    • The Gardener created public subnet with the name <shoot-name>-public-utility-<xy> can be used. Please check the created subnets with the cloud provider.

    bastion-security-group-id

    • Use the id of the created bastion security group.

    file-path-to-userdata

    • Use the filepath to user-data file generated in the previous step.

    • bastion-instance-name

      • Optional to tag the instance.
      • Usually <shoot-name>-bastions
  • Create the bastion instance via:

$ aws ec2 run-instances --iam-instance-profile Name=<bastion-IAM-instance-profile-name> --image-id <image-id> --count 1 --instance-type t3.nano --key-name <ssh-public-key-name> --security-group-ids <bastion-security-group-id> --subnet-id <subnet-id> --associate-public-ip-address --user-data file://<file-path-to-userdata> --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=<bastion-instance-name>},{Key=component,Value=<mytag>}]' 'ResourceType=volume,Tags=[{Key=component,Value=<mytag>}]'

Capture the instance-id from the response and wait until the ec2 instance is running and has a public IP address.

Connecting to the target instance

Save the private key of the ssh-key-pair in a temporary local file for later use.

$ umask 077

$ kubectl get secret <shoot-name>.ssh-keypair -o json | jq -r .data.\"id_rsa\" | base64 -d > id_rsa.key

Use the private ssh key to ssh into the bastion instance.

$ ssh -i <path-to-private-key> gardener@<public-bastion-instance-ip> 

If that works, connect from your local terminal to the target instance via the bastion.

$ ssh  -i <path-to-private-key> -o ProxyCommand="ssh -W %h:%p -i <private-key> -o IdentitiesOnly=yes -o StrictHostKeyChecking=no gardener@<public-ip-bastion>" gardener@<private-ip-target-instance> -o IdentitiesOnly=yes -o StrictHostKeyChecking=no

Cleanup

Do not forget to clean up the created resources. Otherwise, Gardener will eventually fail to delete the Shoot.

4.2 - How to Debug a Pod

Your pod doesn’t run as expected. Are there any log files? Where? How can you debug a pod?

Introduction

Kubernetes offers powerful options to get more details about startup or runtime failures of pods as e.g. described in Application Introspection and Debugging or Debug Pods and Replication Controllers.

In order to identify pods with potential issues, you could, e.g., run kubectl get pods --all-namespaces | grep -iv Running to filter out the pods which are not in the state Running. One frequent error state is CrashLoopBackOff, which indicates that a pod crashes right after the start. Kubernetes then tries to restart the pod, but often the pod startup fails again.

Here is a short list of possible reasons which might lead to a pod crash:

  1. error during image pull caused by e.g. wrong/missing secrets or wrong/missing image
  2. the app runs in an error state caused e.g. by missing environmental variables (ConfigMaps) or secrets
  3. liveness probe failed
  4. too high resource consumption (memory and/or CPU) or too strict quota settings
  5. persistent volumes can’t be created/mounted
  6. the container image is not updated

Basically, the commands kubectl logs ... and kubectl describe ... with additional parameters are used to get more detailed information. By calling e.g. kubectl logs --help you get more detailed information about the command and its parameters.
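For a pod in CrashLoopBackOff it is often helpful to also look at the logs of the previous (crashed) container instance, for example:

kubectl logs <your-pod> --namespace <your-namespace>              # logs of the current container
kubectl logs <your-pod> --namespace <your-namespace> --previous   # logs of the previously crashed container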

In the next sections you’ll find some basic approaches to get some ideas what went wrong.

Remarks:

  • Even if the pods seem to be running, as the status Running indicates, a high Restarts counter shows potential problems
  • There is as well an interactive Tutorial Troubleshooting with Kubectl available which explains basic debugging activities
  • The examples below are deployed into the namespace default. In case you want to change it use the optional parameter --namespace <your-namespace> to select the target namespace. They require Kubernetes release ≥ 1.8.

Prerequisites

Your deployment was successful (no logical/syntactical errors in the manifest files), but the pod(s) aren’t running.

Error caused by wrong image name

You run kubectl describe pod <your-pod> --namespace <your-namespace> to get detailed information about the pod startup.

In the Events section, you get an error message like Failed to pull image ... and Reason: Failed. The pod is in state ImagePullBackOff.

The example below is based on a demo in the Kubernetes documentation. In all examples, the default namespace is used.

First, cleanup with

kubectl delete pod termination-demo

Next, create a resource based on the yaml content below

apiVersion: v1
kind: Pod 
metadata:
  name: termination-demo
spec:
  containers:
  - name: termination-demo-container
    image: debiann
    command: ["/bin/sh"]
    args: ["-c", "sleep 10 && echo Sleep expired > /dev/termination-log"]

kubectl describe pod termination-demo lists the following content in the Event section

Events:
  FirstSeen	LastSeen	Count	From							SubObjectPath					Type		Reason			Message
  ---------	--------	-----	----							-------------					--------	------			-------
  2m		2m		1	default-scheduler											Normal		Scheduled		Successfully assigned termination-demo to ip-10-250-17-112.eu-west-1.compute.internal
  2m		2m		1	kubelet, ip-10-250-17-112.eu-west-1.compute.internal							Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-sgccm" 
  2m		1m		4	kubelet, ip-10-250-17-112.eu-west-1.compute.internal	spec.containers{termination-demo-container}	Normal		Pulling			pulling image "debiann"
  2m		1m		4	kubelet, ip-10-250-17-112.eu-west-1.compute.internal	spec.containers{termination-demo-container}	Warning		Failed			Failed to pull image "debiann": rpc error: code = Unknown desc = Error: image library/debiann:latest not found
  2m		54s		10	kubelet, ip-10-250-17-112.eu-west-1.compute.internal							Warning		FailedSync		Error syncing pod
  2m		54s		6	kubelet, ip-10-250-17-112.eu-west-1.compute.internal	spec.containers{termination-demo-container}	Normal		BackOff			Back-off pulling image "debiann"

The error message with Reason: Failed tells you that there was an error while pulling the image. A closer look at the image name reveals a misspelling.

App runs in an error state caused by missing ConfigMaps or Secrets

This example illustrates the behavior in case of the app expecting environment variables but the corresponding Kubernetes artifacts are missing.

First, cleanup with

kubectl delete deployment termination-demo
kubectl delete configmaps app-env

Next, deploy this manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: termination-demo
  labels:
     app: termination-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: termination-demo
  template:
    metadata:
      labels:
        app: termination-demo
    spec:
      containers:
      - name: termination-demo-container
        image: debian
        command: ["/bin/sh"]
        args: ["-c", "sed \"s/foo/bar/\" < $MYFILE"]

Now, the command kubectl get pods lists the pod termination-demo-xxx in the state Error or CrashLoopBackOff. The command kubectl describe pod termination-demo-xxx shows that there was no error during startup, but gives no clue about what caused the crash.

Events:
  FirstSeen	LastSeen	Count	From							SubObjectPath					Type		Reason		Message
  ---------	--------	-----	----							-------------					--------	------		-------
  19m		19m		1	default-scheduler											Normal		Scheduled	Successfully assigned termination-demo-5fb484867d-xz2x9 to ip-10-250-17-112.eu-west-1.compute.internal
  19m		19m		1	kubelet, ip-10-250-17-112.eu-west-1.compute.internal							Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-sgccm" 
  19m		19m		4	kubelet, ip-10-250-17-112.eu-west-1.compute.internal	spec.containers{termination-demo-container}	Normal		Pulling		pulling image "debian"
  19m		19m		4	kubelet, ip-10-250-17-112.eu-west-1.compute.internal	spec.containers{termination-demo-container}	Normal		Pulled		Successfully pulled image "debian"
  19m		19m		4	kubelet, ip-10-250-17-112.eu-west-1.compute.internal	spec.containers{termination-demo-container}	Normal		Created		Created container
  19m		19m		4	kubelet, ip-10-250-17-112.eu-west-1.compute.internal	spec.containers{termination-demo-container}	Normal		Started		Started container
  19m		14m		24	kubelet, ip-10-250-17-112.eu-west-1.compute.internal	spec.containers{termination-demo-container}	Warning		BackOff		Back-off restarting failed container
  19m		4m		69	kubelet, ip-10-250-17-112.eu-west-1.compute.internal							Warning		FailedSync	Error syncing pod

The command kubectl logs termination-demo-xxx gives access to the output the application writes to stderr and stdout. In this case, you get an output like

/bin/sh: 1: cannot open : No such file

So you need to have a closer look at the application. In this case, the environment variable MYFILE is missing. To fix this issue you could e.g. add a ConfigMap to your deployment, as shown in the manifest listed below.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-env
data:
  MYFILE: "/etc/profile"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: termination-demo
  labels:
     app: termination-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: termination-demo
  template:
    metadata:
      labels:
        app: termination-demo
    spec:
      containers:
      - name: termination-demo-container
        image: debian
        command: ["/bin/sh"]
        args: ["-c", "sed \"s/foo/bar/\" < $MYFILE"]
        envFrom:
        - configMapRef:
            name: app-env 

Note that once you fix the error and re-run the scenario, you might still see the pod in CrashLoopBackOff status. This is because the container finishes the command sed ... and runs to completion. In order to keep the container in Running status, a long running task is required, e.g.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-env
data:
  MYFILE: "/etc/profile"
  SLEEP: "5"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: termination-demo
  labels:
     app: termination-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: termination-demo
  template:
    metadata:
      labels:
        app: termination-demo
    spec:
      containers:
      - name: termination-demo-container
        image: debian
        command: ["/bin/sh"]
        # args: ["-c", "sed \"s/foo/bar/\" < $MYFILE"]
        args: ["-c", "while true; do sleep $SLEEP; echo sleeping; done;"]
        envFrom:
        - configMapRef:
            name: app-env

Too high resource consumption or too strict quota settings

You can optionally specify the amount of memory and/or CPU your container gets during runtime. In case these settings are missing, the default request settings are taken: CPU: 0m (in milli CPU) and RAM: 0Gi, which indicates no limits other than the ones of the node(s) itself. Find more details in Configure Default Memory Requests and Limits for a Namespace.

In case your application needs more resources, Kubernetes distinguishes between requests and limit settings: requests specify the guaranteed amount of resource, whereas limit tells Kubernetes the maximum amount of resource the container might need. Mathematically both settings could be described by the relation 0 <= requests <= limit. For both settings you need to consider the total amount of resources the available nodes provide. For a detailed description of the concept see Resource Quality of Service in Kubernetes.
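
As an illustration, a container spec could declare both settings like this (the values are arbitrary examples):

resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"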

Use kubectl describe nodes to get a first overview of the resource consumption of your cluster. Of special interest are the figures indicating the amount of CPU and Memory Requests at the bottom of the output.
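
To see only the aggregated figures you could, for example, filter the output (the exact layout may differ slightly between Kubernetes versions):

kubectl describe nodes | grep -A 5 "Allocated resources"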

The next example demonstrates what happens in case the CPU request is too high to be satisfied by your cluster.

First, cleanup with

kubectl delete deployment termination-demo
kubectl delete configmaps app-env

Next, adapt the cpu request in the yaml below to be slightly higher than the remaining cpu resources in your cluster and deploy this manifest. In this example, 600m (milli CPUs) are requested in a Kubernetes system with a single 2-core worker node, which results in an error message.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: termination-demo
  labels:
     app: termination-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: termination-demo
  template:
    metadata:
      labels:
        app: termination-demo
    spec:
      containers:
      - name: termination-demo-container
        image: debian
        command: ["/bin/sh"]
        args: ["-c", "sleep 10 && echo Sleep expired > /dev/termination-log"]
        resources:
          requests:
            cpu: "600m" 

The command kubectl get pods lists the pod termination-demo-xxx in the state Pending. More details on why this happens can be found by using the command kubectl describe pod termination-demo-xxx:

$ kubectl describe po termination-demo-fdb7bb7d9-mzvfw
Name:           termination-demo-fdb7bb7d9-mzvfw
Namespace:      default
...
Containers:
  termination-demo-container:
    Image:      debian
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
    Args:
      -c
      sleep 10 && echo Sleep expired > /dev/termination-log
    Requests:
      cpu:        6
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-t549m (ro)
Conditions:
  Type           Status
  PodScheduled   False
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  9s (x7 over 40s)  default-scheduler  0/2 nodes are available: 2 Insufficient cpu.

Remark:

  • This example works similarly when specifying a memory request that is too high
  • In case you configured an autoscaler range when creating your Kubernetes cluster, another worker node will be started automatically, provided you haven’t reached the maximum number of worker nodes yet
  • If your app is running out of memory (the memory settings are too small), you will typically find an OOMKilled (Out Of Memory) message in the Events section of the kubectl describe pod ... output

Why was the container image not updated?

You applied a fix in your app, created a new container image and pushed it into your container repository. After redeploying your Kubernetes manifests you expected to get the updated app, but the same bug is still present in the new deployment.

This behavior is related to how Kubernetes decides whether to pull a new docker image or to use the cached one.

In case you didn’t change the image tag, the default image pull policy IfNotPresent tells Kubernetes to use the cached image (see Images).
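
If you really want Kubernetes to pull on every pod start, e.g. during development, you can set the pull policy explicitly in the container spec (a sketch; the image name is just an example):

containers:
- name: myapp
  image: cp-enablement/awesomeapp:1.0   # example image
  imagePullPolicy: Always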

As a best practice, you should not use the tag latest and should change the image tag whenever you change anything in your image (see Configuration Best Practices).

Find more details in FAQ Container Image not updating.

4.3 - tail -f /var/log/my-application.log

Aggregate log files from different pods

Problem

One thing that always bothered me was that I couldn’t get logs of several pods at once with kubectl. A simple tail -f <path-to-logfile> isn’t possible at all. Certainly you can use kubectl logs -f <pod-id>, but it doesn’t help if you want to monitor more than one pod at a time.

This is something you really need a lot, at least if you run several instances of a pod behind a deployment. This is even more so if you don’t have a Kibana setup or similar.

Solution

Luckily, there are smart developers out there who always come up with solutions. The finding of the week is a small bash script that allows you to aggregate log files of several pods at the same time in a simple way. The script is called kubetail and is available at GitHub.

5 - Applications

5.1 - Access a Port of a Pod Locally

Question

You deployed an application with a web UI or an internal endpoint in your Kubernetes (K8s) cluster. How can you access this endpoint without an external load balancer (e.g. Ingress)? This tutorial presents two options:

  • Using Kubernetes port forward
  • Using Kubernetes apiserver proxy

Please note that the options described here are mostly for quick testing or troubleshooting your application. For enabling access to your application in a productive environment, please refer to Service and Ingress.

Solution 1: Using Kubernetes port forward

You could use the port forwarding functionality of kubectl to access the pods from your local host without involving a service.

To access any pod follow these steps:

  • Run kubectl get pods
  • Note down the name of the pod in question as <your-pod-name>
  • Run kubectl port-forward <your-pod-name> <local-port>:<your-app-port>
  • Run a web browser or curl locally and enter the URL http(s)://localhost:<local-port>

In addition, kubectl port-forward allows you to use a resource name, such as a deployment or service name, to select a matching pod to port forward to. Find more details in the Kubernetes documentation.
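
For example, to forward a local port to a service or deployment instead of a single pod (the names are placeholders):

kubectl port-forward service/<your-service> 8080:80
kubectl port-forward deployment/<your-deployment> 8080:80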

The main drawback of this approach is that the pod’s name will change as soon as the pod is recreated. Moreover, you need to have a web browser on your client and you need to make sure that the local port is not already used by an application running on your system. Finally, port forwarding is sometimes canceled for non-obvious reasons, which makes this approach somewhat shaky. A more robust approach is to access the application using the apiserver proxy.

port-forward

Solution 2: Using apiserver proxy

There are several different proxies used with Kubernetes, the official documentation provides a good overview.

In this tutorial we are using the apiserver proxy to enable access to services running in Kubernetes without using an Ingress. In contrast to the first solution, a service is required for this solution.

Use the following URL to access a service via apiserver proxy. For details about apiserver proxy URLs read Discovering builtin services.

https://<cluster-master>/api/v1/namespaces/<namespace>/services/<service>:<service-port>/proxy/<service-endpoint>

Example:

  • cluster-master: api.testclstr.cpet.k8s.sapcloud.io, namespace: default, service: nginx-svc, service-port: 80, service-endpoint: /
    URL to access the service: http://api.testclstr.cpet.k8s.sapcloud.io/api/v1/namespaces/default/services/nginx-svc:80/proxy/
  • cluster-master: api.testclstr.cpet.k8s.sapcloud.io, namespace: default, service: docker-nodejs-svc, service-port: 4500, service-endpoint: /cpu?baseNumber=4
    URL to access the service: https://api.testclstr.cpet.k8s.sapcloud.io/api/v1/namespaces/default/services/docker-nodejs-svc:4500/proxy/cpu?baseNumber=4
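
If you do not want to handle the API server authentication in your HTTP client, you can also open a local proxy with kubectl and use plain localhost URLs (a sketch; the service name and port are taken from the first example above):

kubectl proxy --port=8080
# in a second terminal:
curl http://localhost:8080/api/v1/namespaces/default/services/nginx-svc:80/proxy/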

There are applications which do not yet support relative URLs, like Prometheus (as of end of November 2017). This typically leads to missing JavaScript objects when trying to open the URL in a browser. In this case, use the port-forward approach described above.

5.2 - Auditing Kubernetes for Secure Setup

A few insecure configurations in Kubernetes

Auditing Kubernetes for Secure Setup

teaser

Increasing the Security of all Gardener Stakeholders

In summer 2018, the Gardener project team asked Kinvolk to execute several penetration tests in its role as third-party contractor. The goal of this ongoing work is to increase the security of all Gardener stakeholders in the open source community. Following the Gardener architecture, the control plane of a Gardener managed shoot cluster resides in the corresponding seed cluster. This is a Control-Plane-as-a-Service with a network air gap.

Along the way we found various kinds of security issues, for example, due to misconfiguration or missing isolation, as well as two special problems with upstream Kubernetes and its Control-Plane-as-a-Service architecture.

Major Findings

From this experience, we’d like to share a few examples of security issues that could happen on a Kubernetes installation and how to fix them.

Alban Crequy (Kinvolk) and Dirk Marwinski (SAP SE) gave a presentation entitled Hardening Multi-Cloud Kubernetes Clusters as a Service at KubeCon 2018 in Shanghai presenting some of the findings.

Here is a summary of the findings:

  • Privilege escalation due to insecure configuration of the Kubernetes API server

    • Root cause: Same certificate authority (CA) is used for both the API server and the proxy that allows accessing the API server.
    • Risk: Users can get access to the API server.
    • Recommendation: Always use different CAs.
  • Exploration of the control plane network with malicious HTTP-redirects

    • Root cause: See detailed description below.

    • Risk: Provoked error message contains full HTTP payload from an existing endpoint which can be exploited. The contents of the payload depends on your setup, but can potentially be user data, configuration data, and credentials.

    • Recommendation:

      • Use the latest version of Gardener
      • Ensure the seed cluster’s container network supports network policies. Clusters that have been created with Kubify are not protected as Flannel is used there which doesn’t support network policies.
  • Reading private AWS metadata via Grafana

    • Root cause: By configuring a new custom data source in Grafana, it is possible to send HTTP requests to targets in the control plane network.
    • Risk: Users can get the “user-data” for the seed cluster from the metadata service and retrieve a kubeconfig for that Kubernetes cluster
    • Recommendation: Lockdown Grafana features to only what’s necessary in this setup, block all unnecessary outgoing traffic, move Grafana to a different network, lockdown unauthenticated endpoints

Scenario 1: Privilege Escalation with Insecure API Server

In most configurations, different components connect directly to the Kubernetes API server, often using a kubeconfig with a client certificate. The API server is started with the flag:

/hyperkube apiserver --client-ca-file=/srv/kubernetes/ca/ca.crt ...

The API server will check whether the client certificate presented by kubectl, kubelet, scheduler or another component is really signed by the configured certificate authority for clients.

The API server can have many clients of various kinds


However, it is possible to configure the API server differently for use with an intermediate authenticating proxy. The proxy will authenticate the client with its own custom method and then issue HTTP requests to the API server with additional HTTP headers specifying the user name and group name. The API server should only accept HTTP requests with HTTP headers from a legitimate proxy. To allow the API server to check incoming requests, you need to pass a list of certificate authorities (CAs) to it. Requests coming from a proxy are only accepted if they use a client certificate that is signed by one of the CAs of that list.

--requestheader-client-ca-file=/srv/kubernetes/ca/ca-proxy.crt
--requestheader-username-headers=X-Remote-User
--requestheader-group-headers=X-Remote-Group

API server clients can reach the API server through an authenticating proxy


So far, so good. But what happens if malicious user “Mallory” tries to connect directly to the API server and reuses the HTTP headers to pretend to be someone else?

What happens when a client bypasses the proxy, connecting directly to the API server?


With a correct configuration, Mallory’s kubeconfig will have a certificate signed by the API server certificate authority but not signed by the proxy certificate authority. So the API server will not accept the extra HTTP header “X-Remote-Group: system:masters”.

You only run into an issue when the same certificate authority is used for both the API server and the proxy. Then, any Kubernetes client certificate can be used to take the role of a different user or group, as the API server will accept the user and group headers.

The kubectl tool does not normally add those HTTP headers but it’s pretty easy to generate the corresponding HTTP requests manually.
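
For illustration only, such a request could be crafted with curl against a misconfigured API server (host and certificate paths are placeholders; with a correct configuration the extra headers are simply ignored because the client certificate is not signed by the proxy CA):

curl --cacert ca.crt --cert client.crt --key client.key \
  -H "X-Remote-User: mallory" \
  -H "X-Remote-Group: system:masters" \
  https://<api-server>/api/v1/namespaces/kube-system/secrets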

We worked on improving the Kubernetes documentation to make clearer that this configuration should be avoided.

Scenario 2: Exploration of the Control Plane Network with Malicious HTTP-Redirects

The API server is a central component of Kubernetes and many components initiate connections to it, including the Kubelet running on worker nodes. Most of the requests from those clients will end up updating Kubernetes objects (pods, services, deployments, and so on) in the etcd database but the API server usually does not need to initiate TCP connections itself.

The API server is mostly a component that receives requests


However, there are exceptions. Some kubectl commands will trigger the API server to open a new connection to the Kubelet. Kubectl exec is one of those commands. In order to get the standard I/Os from the pod, the API server will start an HTTP connection to the Kubelet on the worker node where the pod is running. Depending on the container runtime used, this can be done in different ways, but one way to do it is for the Kubelet to reply with an HTTP 302 redirection to the Container Runtime Interface (CRI). Basically, the Kubelet is telling the API server to get the streams directly from the CRI itself instead of forwarding them. The redirection from the Kubelet will only change the port and path of the URL; the IP address will not be changed because the Kubelet and the CRI component run on the same worker node.

But the API server also initiates some connections, for example, to worker nodes


It’s often quite easy for users of a Kubernetes cluster to get access to worker nodes and tamper with the Kubelet. They could be given explicit SSH access or they could be given a kubeconfig with enough privileges to create privileged pods or even just pods with “host” volumes.

In contrast, users — even those with “system:masters” permissions or “root” rights — are often not given access to the control plane. On setups like for example GKE or Gardener, the control plane is running on separate nodes, with a different administrative access. It could be hosted on a different cloud provider account. So users are not free to explore the internal network in the control plane.

What would happen if a user was tampering with the Kubelet to make it maliciously redirect kubectl exec requests to a different random endpoint? Most likely the given endpoint would not speak the streaming server protocol, so there would be an error. However, the full HTTP payload from the endpoint is included in the error message printed by kubectl exec.

The API server is tricked to connect to other components


The impact of this issue depends on the specific setup. But in many configurations, we could find a metadata service (such as the AWS metadata service) containing user data, configurations and credentials. The setup we explored had a different AWS account and a different EC2 instance profile for the worker nodes and the control plane. This issue allowed users to get access to the AWS metadata service in the context of the control plane, which they should not have access to.

We have reported this issue to the Kubernetes security mailing list, and the public pull request that addresses the issue (PR #66516) has been merged. It provides a way to enforce HTTP redirect validation (disabled by default).

But there are several other ways that users could trigger the API server to generate HTTP requests and get the reply payload back, so it is advised to isolate the API server and other components from the network as an additional precautionary measure. Depending on where the API server runs, this could be done with Kubernetes network policies, EC2 security groups or just iptables directly. Following the defense-in-depth principle, it is a good idea to apply the API server HTTP redirect validation when it is available, as well as firewall rules.

In Gardener, this has been fixed with Kubernetes network policies along with changes to ensure the API server does not need to contact the metadata service. You can see more details in the announcements on the Gardener mailing list. This is tracked in CVE-2018-2475.

To be protected from this issue, stakeholders should:

  • Use the latest version of Gardener
  • Ensure the seed cluster’s container network supports network policies. Clusters that have been created with Kubify are not protected as Flannel is used there which doesn’t support network policies.

Scenario 3: Reading Private AWS Metadata via Grafana

For our tests, we had access to a Kubernetes setup where users are not only given access to the API server in the control plane, but also to a Grafana instance that is used to gather data from their Kubernetes clusters via Prometheus. The control plane is managed and users don’t have access to the nodes that it runs. They can only access the API server and Grafana via a load balancer. The internal network of the control plane is therefore hidden to users.

Prometheus and Grafana can be used to monitor worker nodes


Unfortunately, that setup was not protecting the control plane network from nosy users. By configuring a new custom data source in Grafana, we could send HTTP requests to target the control plane network, for example the AWS metadata service. The reply payload is not displayed on the Grafana Web UI but it is possible to access it from the debugging console of the Chrome browser.

Credentials can be retrieved from the debugging console of Chrome


Adding a Grafana data source is a way to issue HTTP requests to arbitrary targets


In that installation, users could get the “user-data” for the seed cluster from the metadata service and retrieve a kubeconfig for that Kubernetes cluster.

There are many possible measures to avoid this situation: lockdown Grafana features to only what’s necessary in this setup, block all unnecessary outgoing traffic, move Grafana to a different network, lockdown unauthenticated endpoints.

Conclusion

The three scenarios above show pitfalls with a Kubernetes setup. A lot of them were specific to the Kubernetes installation: different cloud providers or different configurations will show different weaknesses. Users should no longer be given access to Grafana.

5.3 - Container Image not Pulled

Wrong Container Image or Invalid Registry Permissions

Problem

Two of the most common problems are specifying the wrong container image or trying to use private images without providing registry credentials.

Note: There is no observable difference in Pod status between a missing image and incorrect registry permissions. In either case, Kubernetes will report an ErrImagePull status for the Pods. For this reason, this article deals with both scenarios.

Example

Let’s see an example. We’ll create a pod named fail referencing a non-existent Docker image:

kubectl run -i --tty fail --image=tutum/curl:1.123456

The command prompt doesn’t return; you can press Ctrl+C to get back to your shell.

Error analysis

We can then inspect our Pods and see that we have one Pod with a status of ErrImagePull or ImagePullBackOff.

$ (minikube) kubectl get pods
NAME                      READY     STATUS         RESTARTS   AGE
client-5b65b6c866-cs4ch   1/1       Running        1          1m
fail-6667d7685d-7v6w8     0/1       ErrImagePull   0          <invalid>
vuejs-578574b75f-5x98z    1/1       Running        0          1d
$ (minikube) 

For some additional information, we can describe the failing Pod.

kubectl describe pod fail-6667d7685d-7v6w8

As you can see in the events section, your image can’t be pulled

Name:		fail-6667d7685d-7v6w8
Namespace:	default
Node:		minikube/192.168.64.10
Start Time:	Wed, 22 Nov 2017 10:01:59 +0100
Labels:		pod-template-hash=2223832418
		run=fail
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"fail-6667d7685d","uid":"cc4ccb3f-cf63-11e7-afca-4a7a1fa05b3f","a...
.
.
.
.
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath		Type		Reason			Message
  ---------	--------	-----	----			-------------		--------	------			-------
  1m		1m		1	default-scheduler				Normal		Scheduled		Successfully assigned fail-6667d7685d-7v6w8 to minikube
  1m		1m		1	kubelet, minikube				Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-9fr6r" 
  1m		6s		4	kubelet, minikube	spec.containers{fail}	Normal		Pulling			pulling image "tutum/curl:1.123456"
  1m		5s		4	kubelet, minikube	spec.containers{fail}	Warning		Failed			Failed to pull image "tutum/curl:1.123456": rpc error: code = Unknown desc = Error response from daemon: manifest for tutum/curl:1.123456 not found
  1m		<invalid>	10	kubelet, minikube				Warning		FailedSync		Error syncing pod
  1m		<invalid>	6	kubelet, minikube	spec.containers{fail}	Normal		BackOff			Back-off pulling image "tutum/curl:1.123456"

Why couldn’t Kubernetes pull the image? There are three primary candidates besides network connectivity issues:

  • The image tag is incorrect
  • The image doesn’t exist
  • Kubernetes doesn’t have permissions to pull that image

If you don’t notice a typo in your image tag, then it’s time to test using your local machine. I usually start by running docker pull on my local development machine with the exact same image tag. In this case, I would run docker pull tutum/curl:1.123456

If this succeeds, then it probably means that Kubernetes doesn’t have correct permissions to pull that image.

Add the Docker registry user/password to your cluster as a secret:

kubectl create secret docker-registry dockersecret --docker-server=https://index.docker.io/v1/ --docker-username=<username> --docker-password=<password> --docker-email=<email>
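
The secret then has to be referenced in the pod spec so that the kubelet uses it when pulling the image (a sketch; the container details are taken from the example above):

spec:
  containers:
  - name: fail
    image: tutum/curl:1.123456
  imagePullSecrets:
  - name: dockersecret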

If the exact image tag fails, then I will test without an explicit image tag - docker pull tutum/curl - which will attempt to pull the latest tag. If this succeeds, then that means the originally specified tag doesn’t exist. Go to the Docker registry and check which tags are available for this image.

If docker pull tutum/curl (without an exact tag) fails, then we have a bigger problem - that image does not exist at all in our image registry.

5.4 - Container Image not Updating

Updating Images in your cluster during development

Preface

A container image should use a fixed tag or the content hash of the image. It should not use the tags latest, head, canary, or other tags that are designed to be floating.

Problem

Many Kubernetes users have run into this problem. The story goes something like this:

  • Deploy any image using an image tag (e.g. cp-enablement/awesomeapp:1.0)
  • Fix a bug in awesomeapp
  • Build a new image and push it with the same tag (cp-enablement/awesomeapp:1.0)
  • Update your deployment
  • Realize that the bug is still present
  • Rinse and repeat steps 3 to 5 until you recognize this doesn’t work

The problem relates to how Kubernetes decides whether to do a docker pull when starting a container. Since we tagged our image as :1.0, the default pull policy is IfNotPresent. The Kubelet already has a local copy of cp-enablement/awesomeapp:1.0, hence it doesn’t attempt to do a docker pull. When the new Pods come up, they still use the old broken Docker image.

There are three ways to resolve this:

  • Switch to using the tag :latest (DO NOT DO THIS!)
  • Specify imagePullPolicy: Always (not recommended)
  • Use unique tags (best practice)

Solution

In the quest to automate myself out of a job, I created a bash script that can be run at any time to create a new tag and push the build result to the registry.

#!/usr/bin/env bash

# Set the docker image name and the corresponding repository
# Ensure that you change them in the deployment.yml as well.
# You must be logged in with docker login…
#
# CHANGE THIS TO YOUR Docker.io SETTINGS
#
PROJECT=awesomeapp
REPOSITORY=cp-enablement

# exit if any subcommand or pipeline returns a non-zero status.
set -e

# set debug mode
#set -x

# build my nodeJS app
#
npm run build

# get latest version IDs from the Docker.io registry and increment them
#
VERSION=$(curl https://registry.hub.docker.com/v1/repositories/$REPOSITORY/$PROJECT/tags  | sed -e 's/[][]//g' -e 's/"//g' -e 's/ //g' | tr '}' '\n'  | awk -F: '{print $3}' | grep v| tail -n 1)
VERSION=${VERSION:1}
((VERSION++))
VERSION="v$VERSION"


# build a new docker image
#
echo '>>> Building new image'
# Due to a bug in Docker we need to analyse the log to find out if build passed (see https://github.com/dotcloud/docker/issues/1875)
docker build --no-cache=true -t $REPOSITORY/$PROJECT:$VERSION . | tee /tmp/docker_build_result.log
RESULT=$(cat /tmp/docker_build_result.log | tail -n 1)
if [[ "$RESULT" != *Successfully* ]];
then
  exit 1
fi


echo '>>> Push new image'
docker push $REPOSITORY/$PROJECT:$VERSION

5.5 - Custom Seccomp Profile

Custom Seccomp profile

Context

Seccomp (secure computing mode) is a security facility in the Linux kernel for restricting the set of system calls applications can make.

Starting from Kubernetes v1.3.0 the Seccomp feature is in Alpha. To configure it on a Pod, the following annotations can be used:

  • seccomp.security.alpha.kubernetes.io/pod: <seccomp-profile> where <seccomp-profile> is the seccomp profile to apply to all containers in a Pod.
  • container.seccomp.security.alpha.kubernetes.io/<container-name>: <seccomp-profile> where <seccomp-profile> is the seccomp profile to apply to <container-name> in a Pod.

More details can be found in the PodSecurityPolicy documentation.

Installation of custom profile

By default, kubelet loads custom Seccomp profiles from /var/lib/kubelet/seccomp/. There are two ways in which Seccomp profiles can be added to a Node:

  • to be baked in the machine image
  • to be added at runtime.

This guide focuses on creating those profiles via a DaemonSet.

Create a file called seccomp-profile.yaml with the following content:

apiVersion: v1
kind: ConfigMap
metadata:
  name: seccomp-profile
  namespace: kube-system
data:
  my-profile.json: |
    {
      "defaultAction": "SCMP_ACT_ALLOW",
      "syscalls": [
        {
          "name": "chmod",
          "action": "SCMP_ACT_ERRNO"
        }
      ]
    }    

The policy above is a very simple one and not suitable for complex applications. The default Docker profile can be used as a reference. Feel free to modify it to your needs.

Apply the ConfigMap in your cluster:

$ kubectl apply -f seccomp-profile.yaml
configmap/seccomp-profile created

The next step is to create the seccomp installer DaemonSet. It’s going to copy the policy from above to /var/lib/kubelet/seccomp/my-profile.json.

Create a file called seccomp-installer.yaml with the following content:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: seccomp
  namespace: kube-system
  labels:
    security: seccomp
spec:
  selector:
    matchLabels:
      security: seccomp
  template:
    metadata:
      labels:
        security: seccomp
    spec:
      initContainers:
      - name: installer
        image: alpine:3.10.0
        command: ["/bin/sh", "-c", "cp -r -L /seccomp/*.json /host/seccomp/"]
        volumeMounts:
        - name: profiles
          mountPath: /seccomp
        - name: hostseccomp
          mountPath: /host/seccomp
          readOnly: false
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.1
      terminationGracePeriodSeconds: 5
      volumes:
      - name: hostseccomp
        hostPath:
          path: /var/lib/kubelet/seccomp
      - name: profiles
        configMap:
          name: seccomp-profile

Create the installer and wait until it’s ready on all Nodes:

$ kubectl apply -f seccomp-installer.yaml
daemonset.apps/seccomp-installer created

$ kubectl -n kube-system get pods -l security=seccomp
NAME                      READY   STATUS    RESTARTS   AGE
seccomp-installer-wjbxq   1/1     Running   0          21s

Create a Pod using custom Seccomp profile

Finally, we want to create a Pod which uses our new Seccomp profile my-profile.json.

Create a file called my-seccomp-pod.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: seccomp-app
  namespace: default
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: "localhost/my-profile.json"
    # you can specify seccomp profile per container. If you add another profile you can configure
    # it for a specific container - 'pause' in this case.
    # container.seccomp.security.alpha.kubernetes.io/pause: "localhost/some-other-profile.json"
spec:
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1

Create the Pod and see that it’s running:

$ kubectl apply -f my-seccomp-pod.yaml
pod/seccomp-app created

$ kubectl get pod seccomp-app
NAME         READY   STATUS    RESTARTS   AGE
seccomp-app  1/1     Running   0          42s

Troubleshooting

If an invalid or non-existing profile is used, then the Pod will be stuck in the ContainerCreating phase:

broken-seccomp-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: broken-seccomp
  namespace: default
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: "localhost/not-existing-profile.json"
spec:
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1
$ kubectl apply -f broken-seccomp-pod.yaml
pod/broken-seccomp created

$ kubectl get pod broken-seccomp
NAME            READY   STATUS              RESTARTS   AGE
broken-seccomp  0/1     ContainerCreating   0          2m

$ kubectl describe pod broken-seccomp
Name:               broken-seccomp
Namespace:          default
....
Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               18s               default-scheduler        Successfully assigned kube-system/broken-seccomp to docker-desktop
  Warning  FailedCreatePodSandBox  4s (x2 over 18s)  kubelet, docker-desktop  Failed create pod sandbox: rpc error: code = Unknown desc = failed to make sandbox docker config for pod "broken-seccomp": failed to generate sandbox security options
for sandbox "broken-seccomp": failed to generate seccomp security options for container: cannot load seccomp profile "/var/lib/kubelet/seccomp/not-existing-profile.json": open /var/lib/kubelet/seccomp/not-existing-profile.json: no such file or directory

Further reading

5.6 - Dockerfile Pitfalls

Common Dockerfile pitfalls

Using latest tag for an image

Many Dockerfiles use the FROM package:latest pattern at the top to pull the latest image from a Docker registry.

Bad Dockerfile

FROM alpine

While simple, using the latest tag for an image means that your build can suddenly break if that image gets updated. This can lead to problems where everything builds fine locally (because your local cache thinks it is the latest), while a build server may fail because some pipelines make a clean pull on every build. Additionally, troubleshooting can prove to be difficult, since the maintainer of the Dockerfile didn’t actually make any changes.

Good Dockerfile

A digest takes the place of the tag when pulling an image. This will ensure your Dockerfile remains immutable.

FROM alpine@sha256:7043076348bf5040220df6ad703798fd8593a0918d06d3ce30c6c93be117e430

Running apt/apk/yum update

Running apt-get install is something virtually every Debian-based Dockerfile needs in order to satisfy the external package requirements your code needs to run. But using apt-get, as an example, comes with its own problems.

apt-get upgrade

This will update all your packages to their latest versions, which can be bad because it prevents your Dockerfile from creating consistent, immutable builds.

Running apt-get update in a different line than your apt-get install command.

Running apt-get update as a single line entry will get cached by the build and won’t actually run every time you need to run apt-get install. Instead, make sure you run apt-get update in the same line as the apt-get install command with all the packages, to ensure everything is updated correctly.
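
A common pattern that avoids the stale cache problem combines both commands in a single RUN instruction and cleans up the package lists afterwards (a sketch; the package curl is just an example):

RUN apt-get update && apt-get install -y --no-install-recommends \
      curl \
 && rm -rf /var/lib/apt/lists/*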

Avoid big container images

Building small container images will reduce the time needed to start or restart pods. An image based on the popular Alpine Linux project is much smaller (~5 MB) than most distribution-based images. For most popular languages and products, there is usually an official Alpine Linux image, e.g. golang, nodejs and postgres.

$  docker images
REPOSITORY                                                      TAG                     IMAGE ID            CREATED             SIZE
postgres                                                        9.6.9-alpine            6583932564f8        13 days ago         39.26 MB
postgres                                                        9.6                     d92dad241eff        13 days ago         235.4 MB
postgres                                                        10.4-alpine             93797b0f31f4        13 days ago         39.56 MB

In addition, for compiled languages such as Go or C++ which do not require build-time tooling at runtime, it is recommended to avoid including build-time tooling in the final images. With Docker’s support for multi-stage builds this can be achieved with minimal effort. Such an example can be found here.
Google’s distroless images are also good base images.
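
A minimal sketch of such a multi-stage build for a Go application could look like the following (image tags and paths are just examples):

# build stage: contains the full Go toolchain
FROM golang:1.21-alpine AS builder
WORKDIR /src
COPY . .
RUN go build -o /app .

# final stage: contains only the compiled binary
FROM alpine:3.19
COPY --from=builder /app /app
ENTRYPOINT ["/app"]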

5.7 - Integrity and Immutability

Ensure that you get always the right image

Introduction

When transferring data among networked systems, trust is a central concern. In particular, when communicating over an untrusted medium such as the internet, it is critical to ensure the integrity and immutability of all the data a system operates on. Especially if you use Docker Engine to push and pull images (data) to a public registry.

This immutability offers me a guarantee that any and all containers that I instantiate will be absolutely identical at inception. Surprise surprise, deterministic operations.

A Lesson in Deterministic Ops

Docker Tags are about as reliable and disposable as this guy down here.

docker-labels

Seems simple enough. You have probably already deployed hundreds of YAMLs or started a countless number of Docker containers.

docker run --name mynginx1 -P -d nginx:1.13.9

or

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rss-site
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: front-end
          image: nginx:1.13.9
          ports:
            - containerPort: 80

But Tags are mutable and humans are prone to error. Not a good combination. Here we’ll dig into why the use of tags can be dangerous and how to deploy your containers across a pipeline and across environments, you guessed it, with determinism in mind.

I want to ensure that whether it’s today or 5 years from now, that specific deployment uses the very same image that I defined. Any updates or newer versions of an image should be executed as a new deployment. The solution: digest

A digest takes the place of the tag when pulling an image, for example, to pull the above image by digest, run the following command:

docker run --name mynginx1 -P -d nginx@sha256:4771d09578c7c6a65299e110b3ee1c0a2592f5ea2618d23e4ffe7a4cab1ce5de

You can now make sure that the same image is always loaded at every deployment. It doesn’t matter if the TAG of the image has been changed or not. This solves the problem of repeatability.
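
To find the digest of an image you have already pulled locally, you can, for example, list it with Docker:

docker images --digests nginx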

Content Trust

However, there’s an additional hidden danger: it is possible for an attacker to replace a server image with another one infected with malware.

docker-content-trust

Docker Content Trust gives you the ability to verify both the integrity and the publisher of all the data received from a registry over any channel.

Prior to version 1.8, Docker didn’t have a way to verify the authenticity of a server image. But in v1.8, a new feature called Docker Content Trust was introduced to automatically sign and verify the signature of a publisher.

So, as soon as a server image is downloaded, it is cross-checked with the signature of the publisher to see if someone tampered with it in any way. This solves the problem of trust.
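
Content trust is enabled via an environment variable. A minimal example, reusing the image tag from above, looks like this; with content trust enabled, docker pull refuses images that are not signed:

export DOCKER_CONTENT_TRUST=1
docker pull nginx:1.13.9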

In addition, you should scan all images for known vulnerabilities, but that topic could fill another book.

5.8 - Kubernetes Antipatterns

Common Antipatterns for Kubernetes and Docker

antipattern

This HowTo covers common kubernetes antipatterns that we have seen over the past months.

Running as root user

Whenever possible, do not run containers as the root user. One could be tempted to say that Kubernetes Pods and Nodes are well separated, but the host and the containers running on it share the same kernel. If a container is compromised, the root user in the container has full control over the underlying node.

Watch the very good presentation by Liz Rice at the KubeCon 2018

Use RUN groupadd -r anygroup && useradd -r -g anygroup myuser to create a group and add a user to it. Use the USER command to switch to this user. Note that you may also consider providing an explicit UID/GID if required.

For Example:

ARG GF_UID="500"
ARG GF_GID="500"

# add group & user
RUN groupadd -r -g $GF_GID appgroup && \
   useradd appuser -r -u $GF_UID -g appgroup

USER appuser

Store data or logs in containers

Containers are ideal for stateless applications and should be transient. This means that no data or logs should be stored in the container, as they are lost when the container is terminated. Use persistent volumes instead to persist data outside of containers. Using an ELK stack is another good option for storing and processing logs.

Using pod IP addresses

Each pod is assigned an IP address. It is necessary for pods to communicate with each other to build an application, e.g. an application must communicate with a database. Existing pods are terminated and new pods are constantly started. If you would rely on the IP address of a pod or container, you would need to update the application configuration constantly. This makes the application fragile. Create services instead. They provide a logical name that can be assigned independently of the varying number and IP addresses of containers. Services are the basic concept for load balancing within Kubernetes.
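
For example, instead of wiring a database pod’s IP address into the application, you would expose the database through a service and let the application connect to the stable service name (the names below are made up for illustration):

kubectl expose deployment mydatabase --port=5432 --name=mydatabase
# the application then connects to "mydatabase:5432" instead of a pod IP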

More than one process in a container

A Dockerfile provides CMD and ENTRYPOINT instructions to start the image. CMD is often used to wrap a script that performs some configuration and then starts the application. Do not try to start multiple processes with this script. It is important to consider the separation of concerns when creating docker images. Running multiple processes in a single pod makes managing your containers, collecting logs and updating each process more difficult. You can split the image into multiple containers and manage them independently - even in one pod. Bear in mind that Kubernetes only monitors the process with PID=1. If more than one process is started within a container, then these no longer fall under the control of Kubernetes.

Creating images in a running container

A new image can be created with the docker commit command. This is useful if changes have been made to the container and you want to persist them for later error analysis. However, images created like this are not reproducible and completely worthless for a CI/CD environment. Furthermore, another developer cannot recognize which components the image contains. Instead, always make changes to the Dockerfile, close existing containers and start a new container with the updated image.

Saving passwords in docker image 💀

Do not save passwords in a Dockerfile. They are in plain text and are checked into a repository. That makes them completely vulnerable, even if you are using a private repository like Artifactory. Always use Secrets or ConfigMaps to provision passwords, or inject them by mounting a persistent volume.
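
A minimal sketch with a generic secret injected as an environment variable (all names and values are just examples):

kubectl create secret generic db-credentials --from-literal=password=changeme

The secret can then be referenced in the container spec:

env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password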

Using the ’latest’ tag

Starting a container simply with the image tomcat is tempting. If no tag is specified, a container is started with the tomcat:latest image. This image may no longer be up to date and may refer to an older version instead. Running a production application requires complete control of the environment with exact versions of the image. Make sure you always use a tag or, even better, the sha256 hash of the image, e.g. tomcat@sha256:c34ce3c1fcc0c7431e1392cc3abd0dfe2192ffea1898d5250f199d3ac8d8720f. Why use the sha256 hash? Tags are not immutable and can be overwritten by a developer at any time. In this case you don’t have complete control over your image - which is bad.

Different images per environment

Don’t create different images for development, testing, staging and production environments. The image should be the source of truth and should only be created once and pushed to the repository. This image:tag should be used for different environments in the future.

Depend on start order of pods

Applications often depend on containers being started in a certain order. For example, a database container must be up and running before an application can connect to it. The application should be resilient to such changes, as the db pod can be unreachable or restarted at any time. The application container should be able to handle such situations without terminating or crashing.

Additional anti-patterns and patterns…

In the community, a vast amount of experience has been collected to improve the stability and usability of Docker and Kubernetes. Refer to the following link for more information.

5.9 - Namespace Isolation

…or DENY all traffic from other namespaces

You can configure a NetworkPolicy to deny all the traffic from other namespaces while allowing all the traffic coming from the same namespace the pod was deployed into.

There are many reasons why you may choose to employ Kubernetes network policies:

  • Isolate multi-tenant deployments
  • Regulatory compliance
  • Ensure containers assigned to different environments (e.g. dev/staging/prod) cannot interfere with each other

Kubernetes network policies are application-centric, compared to infrastructure/network-centric standard firewalls. There are no explicit CIDRs or IP addresses used for matching source or destination IPs. Network policies build upon labels and selectors, which are key concepts of Kubernetes used to organize (e.g. all DB-tier pods of an app) and select subsets of objects.

Example

We create two nginx HTTP servers in two namespaces and block all traffic between the two namespaces. E.g. you are unable to get content from customer1 if you are sitting in customer2.

Setup the namespaces

# create two namespaces for test purpose
kubectl create ns customer1
kubectl create ns customer2

# create a standard HTTP web server
kubectl run nginx --image=nginx --replicas=1 --port=80 -n=customer1
kubectl run nginx --image=nginx --replicas=1 --port=80 -n=customer2

# expose the port 80 for external access
kubectl expose deployment nginx --port=80 --type=NodePort -n=customer1
kubectl expose deployment nginx --port=80 --type=NodePort -n=customer2

Test without NP

Create a pod with curl preinstalled inside the namespace customer1

# create a "bash" pod in one namespace
kubectl run -i --tty client --image=tutum/curl -n=customer1

Try to curl the exposed nginx server to get the default index.html page. Execute this in the bash prompt of the pod created above.

# get the index.html from the nginx of the namespace "customer1" => success
curl http://nginx.customer1
# get the index.html from the nginx of the namespace "customer2" => success
curl http://nginx.customer2

Both calls are done in a pod within namespace customer1 and both nginx servers are always reachable, no matter in what namespace.


Test with NP

Install the NetworkPolicy from your shell

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
spec:
  podSelector:
    matchLabels:
  ingress:
  - from:
    - podSelector: {}
  • it applies the policy to ALL pods in the named namespace, as the spec.podSelector.matchLabels is empty and therefore selects all pods.
  • it allows traffic from ALL pods in the named namespace, as spec.ingress.from.podSelector is empty and therefore selects all pods.

Save the manifest above as network-policy.yaml and apply it to both namespaces:

kubectl apply -f ./network-policy.yaml -n=customer1
kubectl apply -f ./network-policy.yaml -n=customer2

After this, curl http://nginx.customer2 shouldn’t work anymore if you are calling from a pod inside the namespace customer1, and vice versa.

Note: This policy, once applied, will also disable all external traffic to these pods. For example, you can create a service of type LoadBalancer in namespace customer1 that matches the nginx pod. When you request the service by its <EXTERNAL_IP>:<PORT>, the network policy will deny the ingress traffic to the pod and the request will time out.

More

You can get more information on how to configure NetworkPolicies in the Kubernetes documentation.

5.10 - Orchestration of Container Startup

How to orchestrate startup sequence of multiple containers

Disclaimer

If an application depends on other services deployed separately, do not rely on a certain startup sequence of containers. Instead, ensure that the application can cope with unavailability of the services it depends on.

Introduction

Kubernetes offers a feature called InitContainers to perform some tasks during a pod’s initialization. In this tutorial we demonstrate how to use InitContainers to orchestrate the startup sequence of multiple containers. The tutorial uses the example app url-shortener, which consists of two components:

  • postgresql database
  • webapp, which depends on the postgresql database and provides two endpoints: create a short URL from a given location, and redirect from a given short URL to the corresponding target location.

This app represents a minimal example where an application relies on another service or database. In this example, if the application starts before the database is ready, the application will fail as shown below:

$ kubectl logs webapp-958cf5567-h247n
time="2018-06-12T11:02:42Z" level=info msg="Connecting to Postgres database using: host=`postgres:5432` dbname=`url_shortener_db` username=`user`\n"
time="2018-06-12T11:02:42Z" level=fatal msg="failed to start: failed to open connection to database: dial tcp: lookup postgres on 100.64.0.10:53: no such host\n"


$ kubectl get po -w
NAME                                READY     STATUS    RESTARTS   AGE
webapp-958cf5567-h247n   0/1       Pending   0         0s
webapp-958cf5567-h247n   0/1       Pending   0         0s
webapp-958cf5567-h247n   0/1       ContainerCreating   0         0s
webapp-958cf5567-h247n   0/1       ContainerCreating   0         1s
webapp-958cf5567-h247n   0/1       Error     0         2s
webapp-958cf5567-h247n   0/1       Error     1         3s
webapp-958cf5567-h247n   0/1       CrashLoopBackOff   1         4s
webapp-958cf5567-h247n   0/1       Error     2         18s
webapp-958cf5567-h247n   0/1       CrashLoopBackOff   2         29s
webapp-958cf5567-h247n   0/1       Error     3         43s
webapp-958cf5567-h247n   0/1       CrashLoopBackOff   3         56s

If the restartPolicy is set to Always (the default) in the yaml, Kubernetes will keep restarting the pod with an exponential back-off delay in case of failure.

Using InitContainers

To avoid such a situation, InitContainers can be defined, which are executed prior to the application container. If an InitContainer fails, the application container won’t be started.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      initContainers:  # check if DB is ready, and only continue when true
      - name: check-db-ready
        image: postgres:9.6.5
        command: ['sh', '-c',  'until pg_isready -h postgres -p 5432;  do echo waiting for database; sleep 2; done;']
      containers:
      - image: xcoulon/go-url-shortener:0.1.0
        name: go-url-shortener
        env:
        - name: POSTGRES_HOST
          value: postgres
        - name: POSTGRES_PORT
          value: "5432"
        - name: POSTGRES_DATABASE
          value: url_shortener_db
        - name: POSTGRES_USER
          value: user
        - name: POSTGRES_PASSWORD
          value: mysecretpassword
        ports:
        - containerPort: 8080

In the above example, the InitContainer uses the docker image postgres:9.6.5, which is different from the application container’s image. This also brings the advantage of not having to include unnecessary tools (e.g. pg_isready) in the application container.

With the introduction of the InitContainer, the pod startup will look like the following in case the database is not available yet:

$ kubectl get po -w
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-5cc79d6bfd-t9n8h   1/1       Running   0          5d
privileged-pod                      1/1       Running   0          4d
webapp-fdcb49cbc-4gs4n   0/1       Pending   0         0s
webapp-fdcb49cbc-4gs4n   0/1       Pending   0         0s
webapp-fdcb49cbc-4gs4n   0/1       Init:0/1   0         0s
webapp-fdcb49cbc-4gs4n   0/1       Init:0/1   0         1s


$ kubectl  logs webapp-fdcb49cbc-4gs4n
Error from server (BadRequest): container "go-url-shortener" in pod "webapp-fdcb49cbc-4gs4n" is waiting to start: PodInitializing
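
To see what the init container itself is doing while the pod is stuck in the Init phase, you can request its logs explicitly with the -c flag:

kubectl logs webapp-fdcb49cbc-4gs4n -c check-db-ready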

5.11 - Out-Dated HTML and JS Files Delivered

Why is my application always outdated?

Problem

After updating your HTML and JavaScript sources in your web application, the kubernetes cluster delivers outdated versions - why?

Preamble

By default, Kubernetes service pods are not accessible from the external network, but only from other pods within the same Kubernetes cluster.

The Gardener cluster has a built-in configuration for HTTP load balancing called Ingress, defining rules for external connectivity to Kubernetes services. Users who want external access to their Kubernetes services create an ingress resource that defines rules, including the URI path, backing service name, and other information. The Ingress controller can then automatically program a frontend load balancer to enable Ingress configuration.

nginx

Example Ingress Configuration

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: vuejs-ingress
spec:
  rules:
  - host: test.ingress.<GARDENER-CLUSTER>.<GARDENER-PROJECT>.shoot.canary.k8s-hana.ondemand.com
    http:
      paths:
      - backend:
          serviceName: vuejs-svc
          servicePort: 8080

where:

  • <GARDENER-CLUSTER>: The cluster name in the Gardener
  • <GARDENER-PROJECT>: Your project name in the Gardener

What is the underlying problem?

The ingress controller we are using is NGINX.

NGINX is a software load balancer, web server, and content cache built on top of open source NGINX.

NGINX caches content as specified in the HTTP cache headers. If these headers are missing, NGINX assumes the content can be cached forever and, in the worst case, never updates it.

Solution

In general you can avoid this pitfall with one of the solutions below:

  • use a cache buster + HTTP Cache-Control (preferred)
  • use HTTP-Cache-Control with a lower retention period
  • disable the caching in the ingress (just for dev purpose)

Setting the HTTP headers or setting up a cache buster is left to the reader as an exercise, since it depends on your web framework (e.g. Express/Node.js, Spring Boot, …).
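
Whichever approach you choose, you can verify what your ingress actually delivers by inspecting the response headers with curl. A quick check against the hostname from the example above (index.html is only an example path; the headers you see depend on your application and framework):

curl -sI https://test.ingress.<GARDENER-CLUSTER>.<GARDENER-PROJECT>.shoot.canary.k8s-hana.ondemand.com/index.html | grep -iE 'cache-control|expires|etag'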

Here is an example of how to disable caching for your ingress with an annotation in your ingress YAML (during development only).

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/cache-enable: "false"
  name: vuejs-ingress
spec:
  rules:
  - host: test.ingress.<GARDENER-CLUSTER>.<GARDENER-PROJECT>.shoot.canary.k8s-hana.ondemand.com
    http:
      paths:
      - backend:
          serviceName: vuejs-svc
          servicePort: 8080

5.12 - Storing Secrets in git 💀

Never ever commit a kubeconfig.yaml into GitHub

Problem

If you commit sensitive data, such as a kubeconfig.yaml or an SSH key, into a Git repository, you can remove it from the history. To entirely remove unwanted files from a repository’s history you can use the git filter-branch command.

The git filter-branch command rewrites your repository’s history, which changes the SHAs for the existing commits that you alter and for any dependent commits. Changed commit SHAs may affect open pull requests in your repository. I recommend merging or closing all open pull requests before removing files from your repository.

Warning: If someone has already checked out the repository, they of course have the secret on their computer. So ALWAYS revoke the OAuth token, password, or whatever credential it was immediately.

Purging a file from your repository’s history

Warning: If you run git filter-branch after stashing changes, you won’t be able to retrieve your changes with other stash commands. Before running git filter-branch, we recommend unstashing any changes you’ve made. To unstash the last set of changes you’ve stashed, run git stash show -p | git apply -R. For more information, see Git Tools Stashing.

To illustrate how git filter-branch works, we’ll show you how to remove your file with sensitive data from the history of your repository and add it to .gitignore to ensure that it is not accidentally re-committed.

Navigate into the repository’s working directory.

cd YOUR-REPOSITORY

Run the following command, replacing PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA with the path to the file you want to remove, not just its filename.

These arguments will:

  • Force Git to process, but not check out, the entire history of every branch and tag
  • Remove the specified file, as well as any empty commits generated as a result
  • Overwrite your existing tags
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' \
--prune-empty --tag-name-filter cat -- --all

Add your file with sensitive data to .gitignore to ensure that you don’t accidentally commit it again.

 echo "YOUR-FILE-WITH-SENSITIVE-DATA" >> .gitignore

Double-check that you’ve removed everything you wanted to from your repository’s history, and that all of your branches are checked out.
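
One way to perform this check is to search the rewritten history for the path you removed (a sketch; the command should print nothing if the purge succeeded):

git log --all --oneline -- PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA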

Once you’re happy with the state of your repository, force-push your local changes to overwrite your GitHub repository, as well as all the branches you’ve pushed up:

git push origin --force --all

In order to remove the sensitive file from your tagged releases, you’ll also need to force-push against your Git tags:

git push origin --force --tags

Warning: Tell your collaborators to rebase, not merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging.


5.13 - Using Prometheus and Grafana to Monitor K8s

How to deploy and configure Prometheus and Grafana to collect and monitor kubelet container metrics

Disclaimer

This post is meant to give a basic end-to-end description of deploying and using Prometheus and Grafana. Both applications offer a wide range of flexibility which needs to be considered in case you have specific requirements. Such advanced details are not in the scope of this post.

Introduction

Prometheus is an open-source systems monitoring and alerting toolkit for recording numeric time series. It fits both machine-centric monitoring as well as monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength.

Prometheus was the second hosted project to graduate within the CNCF.

The following characteristics make Prometheus a good match for monitoring Kubernetes clusters:

  • Pull-based monitoring
    Prometheus is a pull-based monitoring system, which means that the Prometheus server dynamically discovers and pulls metrics from your services running in Kubernetes.

  • Labels
    Prometheus and Kubernetes share the same label (key-value) concept that can be used to select objects in the system.
    Labels are used to identify time series, and sets of label matchers can be used in the query language (PromQL) to select the time series to be aggregated.

  • Exporters
    There are many exporters available which enable the integration of databases, or even of other monitoring systems that do not already provide a way to export metrics to Prometheus. One prominent exporter is the so-called node-exporter, which allows monitoring of hardware- and OS-related metrics of Unix systems.

  • Powerful query language
    The Prometheus query language PromQL lets the user select and aggregate time series data in real time. Results can either be shown as a graph, viewed as tabular data in the Prometheus expression browser, or consumed by external systems via the HTTP API.

Find query examples on Prometheus Query Examples.
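
For a first impression, two typical queries are sketched below; the exact metric names depend on the exporter versions deployed in your cluster (node-exporter and the kubelet's cAdvisor endpoint in this setup).

Overall CPU usage per node in percent, averaged over 5 minutes (node-exporter metrics):

100 * (1 - avg by (instance) (irate(node_cpu{mode="idle"}[5m])))

CPU usage per namespace, based on the kubelet/cAdvisor container metrics:

sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))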

One very popular open-source visualization tool, not only for Prometheus, is Grafana. Grafana is a metric analytics and visualization suite. It is popular for visualizing time series data for infrastructure and application analytics, but many use it in other domains including industrial sensors, home automation, weather, and process control [see the Grafana Documentation].

Grafana accesses data via Data Sources. The continuously growing list of supported backends includes Prometheus.

Dashboards are created by combining panels, e.g. Graph and Dashlist.

In this example we describe an end-to-end scenario including the deployment of Prometheus and a basic monitoring configuration, similar to the one provided for Kubernetes clusters created by Gardener.

If you miss elements on the Prometheus web page when accessing it via its service URL https://<your K8s FQN>/api/v1/namespaces/<your-prometheus-namespace>/services/prometheus-prometheus-server:80/proxy, this is probably caused by Prometheus issue #1583. To work around this issue, set up a port forward with kubectl port-forward -n <your-prometheus-namespace> <prometheus-pod> 9090:9090 on your client and access the Prometheus UI from there with your locally installed web browser. This issue is not relevant if you use the service type LoadBalancer.

Preparation

The deployment of Prometheus and Grafana is based on Helm charts.
Make sure to implement the Helm settings before deploying the Helm charts.
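
A minimal sketch of such a setup, assuming you use the (meanwhile archived) stable chart repository that the install commands below refer to; adjust if your Helm configuration differs:

helm repo add stable https://charts.helm.sh/stable
helm repo update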

The Kubernetes clusters provided by Gardener use role-based access control (RBAC). To authorize the Prometheus node-exporter to access hardware- and OS-relevant metrics of your cluster’s worker nodes, specific artifacts need to be deployed.

Bind the prometheus service account to the garden.sapcloud.io:monitoring:prometheus cluster role by running the command kubectl apply -f crbinding.yaml.

Content of crbinding.yaml

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: <your-prometheus-name>-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: garden.sapcloud.io:monitoring:prometheus
subjects:
- kind: ServiceAccount
  name: <your-prometheus-name>-server
  namespace: <your-prometheus-namespace>
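
You can verify the binding with kubectl auth can-i while impersonating the service account. This is only a sketch: it assumes your own user is allowed to impersonate service accounts and that the cluster role grants read access to nodes, which the node-based service discovery needs.

kubectl auth can-i list nodes \
  --as=system:serviceaccount:<your-prometheus-namespace>:<your-prometheus-name>-server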

Deployment of Prometheus and Grafana

Only minor changes are needed to deploy Prometheus and Grafana based on Helm charts.

Copy the following configuration into a file called values.yaml and deploy Prometheus: helm install <your-prometheus-name> --namespace <your-prometheus-namespace> stable/prometheus -f values.yaml

Typically, Prometheus and Grafana are deployed into the same namespace. There is no technical reason behind this so feel free to choose different namespaces.
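
If the namespace does not exist yet, create it first (monitoring is just a hypothetical name, use whatever fits your conventions):

kubectl create namespace monitoring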

Content of values.yaml for Prometheus:

rbac:
  create: false # Already created in Preparation step
nodeExporter:
  enabled: false # The node-exporter is already deployed by default

server:
  global:
    scrape_interval: 30s
    scrape_timeout: 30s

serverFiles:
  prometheus.yml:
    rule_files:
      - /etc/config/rules
      - /etc/config/alerts      
    scrape_configs:
    - job_name: 'kube-kubelet'
      honor_labels: false
      scheme: https

      tls_config:
      # This is needed because the kubelets' certificates are not generated
      # for a specific pod IP
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - target_label: __metrics_path__
        replacement: /metrics
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

    - job_name: 'kube-kubelet-cadvisor'
      honor_labels: false
      scheme: https

      tls_config:
      # This is needed because the kubelets' certificates are not generated
      # for a specific pod IP
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

    # Example scrape config for probing services via the Blackbox Exporter.
    #
    # Relabelling allows to configure the actual service scrape endpoint using the following annotations:
    #
    # * `prometheus.io/probe`: Only probe services that have a value of `true`
    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
        - role: service
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: kubernetes_name
    # Example scrape config for pods
    #
    # Relabelling allows to configure the actual service scrape endpoint using the following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`.
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: ${1}:${2}
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
    # Scrape config for service endpoints.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
    # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
    # to set this to `https` & most likely set the `tls_config` of the scrape config.
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: If the metrics are exposed on a different port to the
    # service then set this appropriately.
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: (.+)(?::\d+);(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
    # Add your additional configuration here...

Next, deploy Grafana. Since the deployment in this post is based on the Helm default values, the settings below are set explicitly in case the default changed. Deploy Grafana via helm install grafana --namespace <your-prometheus-namespace> stable/grafana -f values.yaml. Here, the same namespace is chosen for Prometheus and for Grafana.

Content of values.yaml for Grafana:

server:
  ingress:
    enabled: false
  service:
    type: ClusterIP

Check the running state of the pods on the Kubernetes Dashboard or by running kubectl get pods -n <your-prometheus-namespace>. In case of errors check the log files of the pod(s) in question.

The text output of Helm after the deployment of Prometheus and Grafana contains very useful information, e.g. the user and password of the Grafana Admin user. The credentials are stored as secrets in the namespace <your-prometheus-namespace> and can be decoded via kubectl get secret --namespace <your-prometheus-namespace> grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo.
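
For convenience, both values can be read directly from the secret. This is a sketch: the secret name grafana matches the Helm release name used above, and admin-user/admin-password are the chart's default keys.

kubectl get secret --namespace <your-prometheus-namespace> grafana -o jsonpath="{.data.admin-user}" | base64 --decode ; echo
kubectl get secret --namespace <your-prometheus-namespace> grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo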

Basic functional tests

To access the web UIs of both applications, use port forwarding.

Set up port forwarding to the Prometheus server pod on port 9090:

kubectl port-forward -n <your-prometheus-namespace> <your-prometheus-server-pod> 9090:9090

Open http://localhost:9090 in your web browser. Select Graph from the top tab and enter the following expression to show the overall CPU usage for a server (see Prometheus Query Examples):

100 * (1 - avg by(instance)(irate(node_cpu{mode='idle'}[5m])))

This should show some data in a graph.

To show the same data in Grafana, set up port forwarding for port 3000 to the Grafana pod and open the Grafana web UI by browsing to http://localhost:3000. Enter the credentials of the admin user.
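
The Grafana port forward is analogous to the Prometheus one; replace <your-grafana-pod> with the pod name reported by kubectl get pods:

kubectl port-forward -n <your-prometheus-namespace> <your-grafana-pod> 3000:3000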

Next, you need to enter the server name of your Prometheus deployment. This name is shown directly after the installation via helm.

Run

helm status <your-prometheus-name>

to find this name. Below, this server name is referenced as <your-prometheus-server-name>.

First, you need to add your Prometheus server as data source.

  • select Dashboards → Data Sources
  • select Add data source
  • enter Name: <your-prometheus-datasource-name>
    Type: Prometheus
    URL: http://<your-prometheus-server-name>
    Access: proxy
  • select Save & Test

In case of failure check the Prometheus URL in the Kubernetes Dashboard.

To add a Graph follow these steps:

  • in the left corner, select Dashboards → New to create a new dashboard
  • select Graph to create a new graph
  • next, select the Panel Title → Edit
  • select your Prometheus Data Source in the drop down list
  • enter the expression 100 * (1 - avg by(instance)(irate(node_cpu{mode='idle'}[5m]))) in the entry field A
  • select the floppy disk symbol (Save) on top

Now you should have a very basic Prometheus and Grafana setup for your Kubernetes cluster.

As a next step you can implement monitoring for your applications by instrumenting them with one of the Prometheus client libraries.
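
Once your application exposes its metrics (typically on a /metrics endpoint), the kubernetes-pods scrape job configured above picks it up automatically if you annotate the pod accordingly. A minimal sketch with hypothetical names, image, and port:

apiVersion: v1
kind: Pod
metadata:
  name: my-instrumented-app        # hypothetical pod name
  annotations:
    prometheus.io/scrape: "true"   # only annotated pods are scraped by the job above
    prometheus.io/path: "/metrics" # override if your app exposes metrics on a different path
    prometheus.io/port: "8080"     # port on which the app serves its metrics
spec:
  containers:
  - name: app
    image: my-registry/my-instrumented-app:0.1.0   # hypothetical image
    ports:
    - containerPort: 8080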