Post-Create Initialization of Machine Instance
Background
Today the driver.Driver facade represents the boundary between the machine-controller and its various provider-specific implementations.
We have abstract operations for creation/deletion and listing of machines (actually compute instances) but we do not correctly handle post-creation initialization logic. Nor do we provide an abstract operation to represent the hot update of an instance after creation.
We have found this to be necessary for several use cases. Today in the MCM AWS Provider, we already misuse driver.GetMachineStatus which is supposed to be a read-only operation obtaining the status of an instance.
Each AWS EC2 instance performs source/destination checks by default. For EC2 NAT instances these should be disabled. This is done by issuing a ModifyInstanceAttribute request with SourceDestCheck set to false. The MCM AWS Provider decodes the AWSProviderSpec, reads providerSpec.SrcAndDstChecksEnabled and correspondingly issues the call to modify the already launched instance. However, this should be done as an action after creating the instance and should not be part of the VM status retrieval.
Similarly, there is a pending PR to add Ipv6AddressCount and Ipv6PrefixCount to enable the assignment of an IPv6 address and an IPv6 prefix to instances. This requires constructing and issuing an AssignIpv6Addresses request after the EC2 instance is available.
We have other use cases such as MCM Issue#750, where there is a requirement to provide a way for consumers to add tags which can be hot-updated onto instances. This requirement can be generalized to also offer a convenient way to specify tags which can be applied to VMs, NICs, Devices etc.
We have a need for a "machine-instance-not-ready" taint, as described in MCM#740, which should only get removed once the post-creation updates are finished.
Objectives
We will split the fulfilment of this overall need into 2 stages of implementation.
Stage-A: Support post-VM creation initialization logic of the instance using a proposed Driver.InitializeMachine by permitting provider implementors to add initialization logic after VM creation, return a special new error code codes.Initialization for initialization errors, and correspondingly support a new machine operation stage InstanceInitialization which will be updated in the machine LastOperation. The triggerCreationFlow, a reconciliation sub-flow of the MCM responsible for orchestrating instance creation and updating machine status, will be changed to support this behaviour.
Stage-B: Introduction of Driver.UpdateMachine and enhancing the MCM, MCM providers and gardener extension providers to support hot update of instances through Driver.UpdateMachine. The MCM triggerUpdationFlow, a reconciliation sub-flow of the MCM which is supposed to be responsible for orchestrating instance update but is currently not used, will be updated to invoke the provider Driver.UpdateMachine on hot-updates to the Machine object.
Stage-A Proposal
Current MCM triggerCreationFlow
Today, reconcileClusterMachine, which is the main routine for the Machine object reconciliation, invokes triggerCreationFlow at the end when machine.Spec.ProviderID is empty or when machine.Status.CurrentStatus.Phase is empty or CrashLoopBackOff.
%%{ init: {
'themeVariables':
{ 'fontSize': '12px'}
} }%%
flowchart LR
other["..."]
-->chk{"machine ProviderID empty
OR
Phase empty or CrashLoopBackOff ?
"}--yes-->triggerCreationFlow
chk--no-->LongRetry["return machineutils.LongRetry"]
Today, the triggerCreationFlow is illustrated below with some minor details omitted/compressed for brevity.
NOTES
- The lastop below is an abbreviation for machine.Status.LastOperation. This, along with the machine phase, is generally updated on the Machine object just before returning from the method.
- Regarding phase=CrashLoopBackOff|Failed: the machine phase may either be CrashLoopBackOff or move to Failed if the difference between the current time and machine.CreationTimestamp has exceeded the configured MachineCreationTimeout.
%%{ init: {
'themeVariables':
{ 'fontSize': '12px'}
} }%%
flowchart TD
end1(("end"))
begin((" "))
medretry["return MediumRetry, err"]
shortretry["return ShortRetry, err"]
medretry-->end1
shortretry-->end1
begin-->AddBootstrapTokenToUserData
-->gms["statusResp,statusErr=driver.GetMachineStatus(...)"]
-->chkstatuserr{"Check statusErr"}
chkstatuserr--notFound-->chknodelbl{"Chk Node Label"}
chkstatuserr--else-->createFailed["lastop.Type=Create,lastop.state=Failed,phase=CrashLoopBackOff|Failed"]-->medretry
chkstatuserr--nil-->initnodename["nodeName = statusResp.NodeName"]-->setnodename
chknodelbl--notset-->createmachine["createResp, createErr=driver.CreateMachine(...)"]-->chkCreateErr{"Check createErr"}
chkCreateErr--notnil-->createFailed
chkCreateErr--nil-->getnodename["nodeName = createResp.NodeName"]
-->chkstalenode{"nodeName != machine.Name\n//chk stale node"}
chkstalenode--false-->setnodename["if unset machine.Labels['node']= nodeName"]
-->machinepending["if empty/crashloopbackoff lastop.type=Create,lastop.State=Processing,phase=Pending"]
-->shortretry
chkstalenode--true-->delmachine["driver.DeleteMachine(...)"]
-->permafail["lastop.type=Create,lastop.state=Failed,Phase=Failed"]
-->shortretry
subgraph noteA [" "]
permafail -.- note1(["VM was referring to stale node obj"])
end
style noteA opacity:0
subgraph noteB [" "]
setnodename-.- note2(["Proposal: Introduce Driver.InitializeMachine after this"])
end
Enhancement of MCM triggerCreationFlow
Relevant Observations on Current Flow
- Observe that we always perform a call to Driver.GetMachineStatus and only then conditionally perform a call to Driver.CreateMachine if no machine was found.
- Observe that after a successful call to Driver.CreateMachine, the machine phase is set to Pending, the LastOperation.Type is currently set to Create and the LastOperation.State is set to Processing before returning with a ShortRetry. The LastOperation.Description is (unfortunately) set to the fixed message: Creating machine on cloud provider.
- Observe that after an erroneous call to Driver.CreateMachine, the machine phase is set to CrashLoopBackOff or Failed (in case of creation timeout).
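The last observation, together with the note on phase=CrashLoopBackOff|Failed, can be sketched as a small decision function. This is only an illustration: the function name phaseAfterCreateError and the 20 minute timeout are assumptions made for the example, not MCM code or a documented default.

```go
package main

import (
	"fmt"
	"time"
)

// phaseAfterCreateError is a hypothetical helper. It returns the machine phase
// after a failed Driver.CreateMachine call: Failed once the machine is older
// than the creation timeout, CrashLoopBackOff otherwise.
func phaseAfterCreateError(machineAge, creationTimeout time.Duration) string {
	if machineAge > creationTimeout {
		return "Failed"
	}
	return "CrashLoopBackOff"
}

func main() {
	// Hypothetical value standing in for the configured MachineCreationTimeout.
	timeout := 20 * time.Minute
	for _, age := range []time.Duration{5 * time.Minute, 25 * time.Minute} {
		fmt.Printf("age=%s -> phase=%s\n", age, phaseAfterCreateError(age, timeout))
	}
}
```

Here machineAge corresponds to the difference between the current time and machine.CreationTimestamp.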
The following changes are proposed with a view towards minimal impact on current code and no introduction of a new Machine Phase.
MCM Changes
- We propose introducing a new machine operation Driver.InitializeMachine with the following signature:
```go
type Driver interface {
	// .. existing methods are omitted for brevity.

	// InitializeMachine call is responsible for post-create initialization of the provider instance.
	InitializeMachine(context.Context, *InitializeMachineRequest) error
}

// InitializeMachineRequest is the initialization request for machine instance initialization
type InitializeMachineRequest struct {
	// Machine object whose VM instance should be initialized
	Machine *v1alpha1.Machine

	// MachineClass backing the machine object
	MachineClass *v1alpha1.MachineClass

	// Secret backing the machineClass object
	Secret *corev1.Secret
}
```
- We propose introducing a new MC error code codes.Initialization indicating that the VM instance was created but there was an error in initialization after VM creation. The implementor of Driver.InitializeMachine can return this error code, indicating that InitializeMachine needs to be called again. The Machine Controller will change the phase to CrashLoopBackOff as usual when encountering a codes.Initialization error.
- We will introduce a new machine operation stage InstanceInitialization. In case of a codes.Initialization error:
  - machine.Status.LastOperation.Description will be set to InstanceInitialization,
  - machine.Status.LastOperation.ErrorCode will be set to codes.Initialization,
  - LastOperation.Type will be set to Create,
  - LastOperation.State will be set to Failed before returning with a ShortRetry.
- The semantics of Driver.GetMachineStatus will be changed. If the instance associated with the machine exists but was not initialized as expected, the provider implementations of GetMachineStatus should return an error: status.Error(codes.Initialization).
- If Driver.GetMachineStatus returns an error encapsulating codes.Initialization, then Driver.InitializeMachine will be invoked again in the triggerCreationFlow.
- As per the usual logic, the main machine controller reconciliation loop will re-invoke the triggerCreationFlow if the machine phase is CrashLoopBackOff.
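The interplay of these changes can be sketched as follows. This is a minimal sketch and not the MCM implementation: the Driver interface is trimmed (context and request types are left out), the sentinel errors stand in for status errors carrying codes.NotFound and codes.Initialization, and the Machine struct, fail helper and fakeDriver are hypothetical. It also assumes that initialization runs right after a successful create, and it omits node-name handling and the creation timeout.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors standing in for status errors with the respective codes.
var (
	ErrNotFound       = errors.New("NotFound")
	ErrInitialization = errors.New("Initialization") // the proposed new code
)

// Driver is a trimmed, hypothetical view of the driver facade.
type Driver interface {
	GetMachineStatus(machineName string) error
	CreateMachine(machineName string) error
	InitializeMachine(machineName string) error
}

// Machine carries only the status fields this sketch touches.
type Machine struct {
	Name, Phase                                 string
	OpType, OpState, OpDescription, OpErrorCode string
}

// fail records a failed Create operation and moves the machine to CrashLoopBackOff.
func fail(m *Machine, description string, err error) error {
	m.Phase, m.OpType, m.OpState, m.OpDescription = "CrashLoopBackOff", "Create", "Failed", description
	m.OpErrorCode = "Internal"
	if errors.Is(err, ErrInitialization) {
		m.OpErrorCode = "Initialization"
	}
	return err
}

// triggerCreationFlow creates the instance if it is missing and (re)runs
// initialization when the instance is new or reported as uninitialized.
func triggerCreationFlow(d Driver, m *Machine) error {
	statusErr := d.GetMachineStatus(m.Name)
	switch {
	case errors.Is(statusErr, ErrNotFound):
		if err := d.CreateMachine(m.Name); err != nil {
			return fail(m, "Creating machine on cloud provider", err)
		}
	case statusErr != nil && !errors.Is(statusErr, ErrInitialization):
		return fail(m, "Creating machine on cloud provider", statusErr)
	}
	if statusErr != nil { // instance was just created or is not initialized yet
		if err := d.InitializeMachine(m.Name); err != nil {
			return fail(m, "InstanceInitialization", err)
		}
	}
	m.Phase, m.OpType, m.OpState = "Pending", "Create", "Processing"
	m.OpDescription, m.OpErrorCode = "Creating machine on cloud provider", ""
	return nil
}

// fakeDriver fails initialization initFailures times before succeeding.
type fakeDriver struct {
	created, initialized bool
	initFailures         int
}

func (f *fakeDriver) GetMachineStatus(string) error {
	switch {
	case !f.created:
		return ErrNotFound
	case !f.initialized:
		return ErrInitialization
	}
	return nil
}

func (f *fakeDriver) CreateMachine(string) error { f.created = true; return nil }

func (f *fakeDriver) InitializeMachine(string) error {
	if f.initFailures > 0 {
		f.initFailures--
		return ErrInitialization
	}
	f.initialized = true
	return nil
}

func main() {
	d, m := &fakeDriver{initFailures: 1}, &Machine{Name: "machine-1"}
	for pass := 1; pass <= 2; pass++ {
		err := triggerCreationFlow(d, m)
		fmt.Printf("pass %d: phase=%s state=%s description=%q err=%v\n", pass, m.Phase, m.OpState, m.OpDescription, err)
	}
}
```

In the first pass the instance is created but initialization fails, so the machine ends up in CrashLoopBackOff with the InstanceInitialization stage recorded. In the second pass GetMachineStatus reports the instance as uninitialized, creation is skipped and only InitializeMachine is retried.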
Illustration
AWS Provider Changes
Driver.InitializeMachine
The implementation for the AWS Provider will look something like:
- After the VM instance is available, check providerSpec.SrcAndDstChecksEnabled, construct ModifyInstanceAttributeInput and call ModifyInstanceAttribute. In case of an error, return codes.Initialization instead of the current codes.Internal.
- Check providerSpec.NetworkInterfaces and, if Ipv6PrefixCount is not nil, construct AssignIpv6AddressesInput and call AssignIpv6Addresses. In case of an error, return codes.Initialization rather than the generic codes.Internal.
The existing Ipv6 PR will need modifications.
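The two initialization steps above could look roughly like the sketch below. To keep it self-contained it does not use the AWS SDK: ec2API is a hypothetical, simplified stand-in for the ModifyInstanceAttribute and AssignIpv6Addresses calls, ErrInitialization stands in for a status error carrying codes.Initialization, and the ProviderSpec and Instance types only model the fields mentioned in the text. Matching spec interfaces to instance interfaces by index is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInitialization stands in for the proposed codes.Initialization error code.
var ErrInitialization = errors.New("Initialization")

func isInitializationError(err error) bool { return errors.Is(err, ErrInitialization) }

// ec2API is a hypothetical, simplified stand-in for the two EC2 calls.
type ec2API interface {
	ModifyInstanceAttribute(instanceID string, sourceDestCheck bool) error
	AssignIpv6Addresses(networkInterfaceID string, prefixCount int64) error
}

// Simplified stand-ins for the provider spec and the launched instance.
type NetworkInterfaceSpec struct{ Ipv6PrefixCount *int64 }

type ProviderSpec struct {
	SrcAndDstChecksEnabled *bool
	NetworkInterfaces      []NetworkInterfaceSpec
}

type Instance struct {
	ID                  string
	NetworkInterfaceIDs []string
}

// initializeMachine applies the post-create steps and reports every failure
// as an initialization error so that the step can be retried.
func initializeMachine(client ec2API, spec ProviderSpec, inst Instance) error {
	if spec.SrcAndDstChecksEnabled != nil {
		if err := client.ModifyInstanceAttribute(inst.ID, *spec.SrcAndDstChecksEnabled); err != nil {
			return fmt.Errorf("%w: modify source/destination check: %v", ErrInitialization, err)
		}
	}
	for i, nic := range spec.NetworkInterfaces {
		if nic.Ipv6PrefixCount == nil {
			continue
		}
		if i >= len(inst.NetworkInterfaceIDs) {
			return fmt.Errorf("%w: instance has no network interface at index %d", ErrInitialization, i)
		}
		if err := client.AssignIpv6Addresses(inst.NetworkInterfaceIDs[i], *nic.Ipv6PrefixCount); err != nil {
			return fmt.Errorf("%w: assign IPv6 prefix: %v", ErrInitialization, err)
		}
	}
	return nil
}

// fakeEC2 records calls and can simulate a failing ModifyInstanceAttribute.
type fakeEC2 struct {
	calls      []string
	failModify bool
}

func (f *fakeEC2) ModifyInstanceAttribute(id string, check bool) error {
	if f.failModify {
		return errors.New("simulated API failure")
	}
	f.calls = append(f.calls, fmt.Sprintf("ModifyInstanceAttribute %s %t", id, check))
	return nil
}

func (f *fakeEC2) AssignIpv6Addresses(nicID string, n int64) error {
	f.calls = append(f.calls, fmt.Sprintf("AssignIpv6Addresses %s %d", nicID, n))
	return nil
}

func main() {
	disabled, one := false, int64(1)
	spec := ProviderSpec{
		SrcAndDstChecksEnabled: &disabled,
		NetworkInterfaces:      []NetworkInterfaceSpec{{Ipv6PrefixCount: &one}},
	}
	inst := Instance{ID: "i-example", NetworkInterfaceIDs: []string{"eni-example"}}
	client := &fakeEC2{}
	err := initializeMachine(client, spec, inst)
	fmt.Println(err, client.calls)
}
```

A real implementation would build ModifyInstanceAttributeInput and AssignIpv6AddressesInput and wrap failures with the MCM status package instead of a sentinel error.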
Driver.GetMachineStatus
- If providerSpec.SrcAndDstChecksEnabled is false, check ec2.Instance.SourceDestCheck. If it does not match, return status.Error(codes.Initialization).
- Check providerSpec.NetworkInterfaces and, if Ipv6PrefixCount is not nil, check ec2.Instance.NetworkInterfaces and verify that InstanceNetworkInterface.Ipv6Addresses has a non-nil slice. If this is not the case, return status.Error(codes.Initialization).
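The status checks above can be sketched as a pure function over the desired spec and the observed instance. As before, the types are simplified stand-ins rather than the AWS SDK or provider types, ErrInitialization stands in for status.Error(codes.Initialization), and checkInitialized is a hypothetical name. The sketch treats an empty Ipv6Addresses slice the same as a nil one and matches interfaces by index, both of which are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInitialization stands in for the proposed codes.Initialization error code.
var ErrInitialization = errors.New("Initialization")

func isInitializationError(err error) bool { return errors.Is(err, ErrInitialization) }

// Simplified stand-ins for the provider spec and the observed EC2 instance.
type NetworkInterfaceSpec struct{ Ipv6PrefixCount *int64 }

type ProviderSpec struct {
	SrcAndDstChecksEnabled *bool
	NetworkInterfaces      []NetworkInterfaceSpec
}

type InstanceNetworkInterface struct{ Ipv6Addresses []string }

type Instance struct {
	SourceDestCheck   *bool
	NetworkInterfaces []InstanceNetworkInterface
}

// checkInitialized returns an initialization error if the observed instance
// does not yet reflect the post-create settings requested in the spec.
func checkInitialized(spec ProviderSpec, inst Instance) error {
	if want := spec.SrcAndDstChecksEnabled; want != nil && !*want {
		if inst.SourceDestCheck == nil || *inst.SourceDestCheck != *want {
			return fmt.Errorf("%w: source/destination check not yet disabled", ErrInitialization)
		}
	}
	for i, nic := range spec.NetworkInterfaces {
		if nic.Ipv6PrefixCount == nil {
			continue
		}
		if i >= len(inst.NetworkInterfaces) || len(inst.NetworkInterfaces[i].Ipv6Addresses) == 0 {
			return fmt.Errorf("%w: network interface %d has no IPv6 addresses", ErrInitialization, i)
		}
	}
	return nil
}

func main() {
	disabled, enabled, one := false, true, int64(1)
	spec := ProviderSpec{
		SrcAndDstChecksEnabled: &disabled,
		NetworkInterfaces:      []NetworkInterfaceSpec{{Ipv6PrefixCount: &one}},
	}
	fresh := Instance{SourceDestCheck: &enabled, NetworkInterfaces: []InstanceNetworkInterface{{}}}
	ready := Instance{
		SourceDestCheck:   &disabled,
		NetworkInterfaces: []InstanceNetworkInterface{{Ipv6Addresses: []string{"2001:db8::1"}}},
	}
	fmt.Println("fresh:", checkInitialized(spec, fresh))
	fmt.Println("ready:", checkInitialized(spec, ready))
}
```

Keeping the check free of API calls keeps GetMachineStatus read-only, which is the property the proposal wants to restore.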
Instance Not Ready Taint
- Since the creation flow for machines will now be enhanced to correctly support post-creation startup logic, we should not schedule workload until this startup logic is complete. Even without this feature, we have a need for such a taint, as described in MCM#740.
- We propose a new taint node.machine.sapcloud.io/instance-not-ready which will be added as a node startup taint in gardener core KubeletConfiguration.RegisterWithTaints.
- The taint will then be removed by MCM in health check reconciliation once the machine becomes fully ready (when moving to the Running phase).
- We will add this taint as part of --ignore-taint in CA.
- We will introduce a disclaimer / prerequisite in the MCM FAQ to add this taint as part of the kubelet config under --register-with-taints; otherwise workload could get scheduled before the machine becomes Running.
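The taint removal in the health check could be sketched as below. The Taint struct is a simplified stand-in for the Kubernetes node taint type, the helper names are hypothetical, and the NoSchedule effect used in the example is an assumption, since the proposal does not name the effect.

```go
package main

import "fmt"

// InstanceNotReadyTaintKey is the taint key proposed in the text.
const InstanceNotReadyTaintKey = "node.machine.sapcloud.io/instance-not-ready"

// Taint is a simplified stand-in for the Kubernetes node taint type.
type Taint struct{ Key, Value, Effect string }

// withoutTaint returns the taints that do not have the given key and reports
// whether anything was removed.
func withoutTaint(taints []Taint, key string) ([]Taint, bool) {
	kept := make([]Taint, 0, len(taints))
	for _, t := range taints {
		if t.Key != key {
			kept = append(kept, t)
		}
	}
	return kept, len(kept) != len(taints)
}

// taintsForPhase removes the instance-not-ready taint only once the machine
// has reached the Running phase; otherwise the taints are left untouched.
func taintsForPhase(phase string, taints []Taint) ([]Taint, bool) {
	if phase != "Running" {
		return taints, false
	}
	return withoutTaint(taints, InstanceNotReadyTaintKey)
}

func main() {
	taints := []Taint{
		{Key: InstanceNotReadyTaintKey, Effect: "NoSchedule"}, // effect is an assumption
		{Key: "example.com/dedicated", Value: "infra", Effect: "NoSchedule"},
	}
	for _, phase := range []string{"Pending", "Running"} {
		remaining, changed := taintsForPhase(phase, taints)
		fmt.Printf("phase=%s changed=%t remaining=%d\n", phase, changed, len(remaining))
	}
}
```

The changed flag lets a caller skip the node update when there is nothing to remove.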
Stage-B Proposal
Enhancement of Driver Interface for Hot Updation
Kindly refer to the Hot-Update Instances design which provides elaborate detail.