
GEP-0039: Live Control Plane Migration (Live CPM)

  • 📌 GEP Tracking Issue: https://github.com/gardener/enhancements/issues/39
  • 📖 GEP Link: https://github.com/gardener/enhancements/tree/main/geps/0039-live-control-plane-migration
  • ✍🏻 Author(s): @acumino (Sonu Kumar Singh), @ary1992 (Ashish Ranjan Yadav), @seshachalam-yv (Seshachalam), @shafeeqes (Shafeeque E S)
  • 🗓️ Presentation: 2026-02-09, 13:00 - 14:00 CET
  • 🎥 Recording: https://youtu.be/DdU8SNNf23o
  • 👨‍⚖️ Decisions:
    • No major technical decisions were finalized in this session.
    • Agreement to:
      • Revamp and update the proposal document, addressing the open questions listed below.
      • Clearly document assumptions, risks, and guarantees.
      • Hold a follow-up Technical Steering session in a few weeks to re-evaluate the updated proposal.
    • Key Discussion Points & Open Questions
      • Failure Handling & Recovery
        • What additional failure modes exist beyond those documented (e.g., ETCD scale-up failures)?
        • Can we always fall back to normal CPM, or are there cases requiring manual intervention?
        • Assumption: all failures except those explicitly documented should be retryable / recoverable.
        • Gardenlet restart behavior:
          • Current proposal resumes from the failed step.
          • This differs from usual reconciliation semantics (restart from beginning).
          • Question: could this lead to irrecoverable or inconsistent states?
      • ETCD-related Topics
        • 6-member ETCD risk:
          • With 3 members per seed, a 6-member cluster needs a quorum of 4, so neither seed can reach quorum on its own; this implies permanent quorum loss if seed-to-seed connectivity is lost.
        • ETCD APIs:
          • Separate APIs exist for member name prefix vs. externally managed members due to uniqueness constraints.
          • Question: can these be harmonized, or would that increase complexity?
        • ETCD member removal:
          • GEP-28 (SHSC) requires this as well.
          • Existing plan: etcd-druid removes members via HTTP calls to the backup-restore sidecar.
          • Question: can Live CPM reuse this approach instead of introducing EtcdOpsTask?
        • ETCD exposure:
          • Proposal doc is outdated and will be updated to reflect Istio-based exposure.
          • Open question: how are DNSRecords constructed in this model?
      • Networking & Connectivity
        • Is seed-to-seed connectivity guaranteed at all times?
        • VPN setup:
          • Why is an additional VPN seed server configuration needed on the destination seed?
          • Can we deploy directly in the “target” configuration from the start?
      • Scheduling & Latency Constraints
        • Scheduler ConfigMap distances are weights, not necessarily latency in ms.
        • How is the prevention of migrations to a “distant region” enforced for Live CPM?
        • The proposal should explain how the 180 ms latency threshold was derived.
      • Control Plane Components & Coordination
        • Lease management:
          • Can controllers simply be recreated in the destination seed instead of running in both seeds?
          • Would this simplify the implementation, and should the proposal be changed accordingly?
        • Gardenlet coordination:
          • Current design uses back-and-forth updates via .status.liveMigration.
          • Question: would conditions be a clearer and more robust coordination mechanism?
        • Gardenlet versions:
          • How is it enforced that gardenlets in both seeds run the same version?
          • What happens if a gardenlet upgrade occurs while a migration is already in progress?
      • Autoscaling & Resource Management
        • VPA recommendations:
          • Not yet considered.
          • Open question: do VPACheckpoints need to be transferred as part of migration?
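
The 6-member ETCD risk noted under "ETCD-related Topics" follows from majority-quorum arithmetic. The sketch below is only an illustration: the helper names `quorum` and `hasQuorum` are hypothetical and are not part of etcd, etcd-druid, or the proposal. It assumes the 3-members-per-seed layout from the notes and shows that a single seed cannot form a majority once the two seeds lose contact.

```go
package main

import "fmt"

// quorum returns the majority size for a cluster of n voting members.
// (Hypothetical helper for illustration only.)
func quorum(n int) int {
	return n/2 + 1
}

// hasQuorum reports whether a group of `reachable` members can still
// form a majority in a cluster of `total` voting members.
func hasQuorum(reachable, total int) bool {
	return reachable >= quorum(total)
}

func main() {
	// Assumed layout from the notes: 3 members on each of the two seeds.
	const perSeed, total = 3, 6

	fmt.Printf("quorum for %d members: %d\n", total, quorum(total))
	fmt.Printf("one seed alone (%d members) has quorum: %v\n",
		perSeed, hasQuorum(perSeed, total))
}
```

With an even split, losing the link between the seeds leaves both halves one member short of a majority, which is why the open question on guaranteed seed-to-seed connectivity matters for this design.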

Funded by the European Union – NextGenerationEU.

The views and opinions expressed are solely those of the author(s) and do not necessarily reflect the views of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them.