Navigating Cloud-Native Security - Lessons from a Recent Container Service Vulnerability

Sunday, September 12, 2021

4 minute read

The cloud-native landscape is constantly evolving, bringing immense benefits in agility and scale. However, with this evolution comes a complex and ever-changing threat landscape. Recently, a significant vulnerability was reported by Unit 42 concerning Azure Container Instances (ACI), a service designed to run containers in a multi-tenant environment. This incident offers valuable lessons for the entire community, and we at Gardener believe in sharing insights that can help strengthen collective security.

This particular vulnerability underscores the critical importance of vigilance, timely patching, and defense-in-depth, principles we have long championed within the Gardener project.

Understanding the ACI Vulnerability

As detailed in the Unit 42 report, the attack vector on ACI involved several stages, leveraging a combination of outdated software and architectural choices:

Outdated runc: The initial entry point exploited a version of runc from October 2016. This version was susceptible to CVE-2019-5736, a critical vulnerability allowing host takeover. This vulnerability was widely publicized in early 2019.
Lateral Movement via Kubelet Impersonation: After gaining node access, the next step involved attempting to impersonate the Kubelet to interact with the Kubernetes API server. The ACI clusters were reportedly running Kubernetes versions (v1.8.x - v1.10.x) from 2017/2018, which were, in principle, vulnerable to such an attack.
Exploiting a Custom Bridge Component: While the direct Kubelet impersonation didn’t work as initially expected, investigators found that a custom bridge component, designed to abstract the underlying Kubernetes, became the next target. This proprietary component inadvertently held a service account token with cluster-admin privileges. By capturing this token, attackers could gain full control over the Kubernetes cluster.
Control Plane Access: The report also noted that the API server appeared to be self-hosted, potentially allowing easier movement from the data plane to the control plane environment once cluster-admin privileges were obtained.
Alternative Attack Vector: A second distinct attack vector, also leveraging the custom bridge component, was identified, pointing to another area where security hardening could have prevented compromise.

Gardener’s Proactive Security Posture

The ACI incident highlights several threat vectors that the Gardener team has actively worked to mitigate over the years, often well in advance of them becoming widely exploited.

Timely Patching of Critical Vulnerabilities (e.g., runc CVE-2019-5736): When CVE-2019-5736 was pre-disclosed, the Gardener team treated it with utmost seriousness. We had announcements and patches prepared, rolling them out on the day of public disclosure. This rapid response is crucial for minimizing exposure to known high-severity vulnerabilities.
Hardening Against Exploits: The Kubelet impersonation vector mentioned in the ACI report is particularly relevant to Gardener. The underlying Kubernetes vulnerability (CVE-2020-8558, tracked as Kubernetes issue #85867) that could allow a compromised node/Kubelet to redirect API server traffic (like kubectl exec) was discovered and reported by Alban Crequy from Kinvolk. This discovery was made during a penetration test commissioned by the Gardener project, specifically asking to find loopholes in our seed clusters. We were able to implement mitigations in Gardener even before the upstream Kubernetes fix was available, further securing our seed cluster architecture. The second distinct attack vector was also discovered during such a penetration test and Gardener further hardened its network policies.
Principle of Least Privilege and Secure Component Design: The ACI bridge component’s cluster-admin token is a stark reminder of the dangers of overly privileged components, especially those interacting with user workloads. Within Gardener, we’ve invested heavily in mechanisms like the Gardener Seed Authorizer (as discussed in Gardener issue #1723). It goes beyond standard RBAC to strictly limit the capabilities of components and prevent lateral movement, ensuring that even if one part is compromised, the blast radius is contained. We also meticulously review and restrict permissions for all components.
Strict Separation of Concerns: A core architectural principle in Gardener is the strict separation between the control plane and the data plane - at all levels. Being an administrator in a shoot cluster does not grant any access to the underlying seed cluster’s control plane execution environment or the upper hierarchy of runtime and garden cluster, a critical defense against escalation.

Learning and Moving Forward

The ACI vulnerability is a powerful reminder that security is not a one-time task but a continuous process of vigilance, proactive hardening, and learning from every incident, whether our own or others’. No system is impenetrable, and the assumption that any single entity, regardless of size, has perfected security can lead to complacency.

At Gardener, we remain committed to:

Staying current: Diligently updating dependencies and core components.
Defense-in-depth: Implementing multiple layers of security controls.
Proactive discovery: Continuously testing and seeking out potential weaknesses.
Community collaboration: Sharing knowledge and contributing to upstream security efforts.

We believe that by fostering a culture of security awareness and investing in robust, layered defenses, we can build more resilient cloud-native systems for everyone. This recent industry event, while unfortunate for those affected, provides crucial learning points that reinforce our commitment to the security principles embedded in Gardener. We will continue to evolve Gardener’s security posture, always striving to stay ahead of emerging threats.