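How Can a Deployment Be Upgraded In Place, Non-Invasively?
This article shows how to upgrade a Deployment in place in a non-invasive, native way.
A shell script is provided at the end of the article for reference.
In this article, "in-place upgrade" refers only to image updates.
The Kubernetes version used here is v1.27.3.
For the concept of in-place upgrades and how OpenKruise implements them, see the article "Analyzing Kruise's In-Place Upgrade from the Source Code".
Kubernetes project: https://github.com/kubernetes/kubernetes
Controller main entry point: cmd/kube-controller-manager/controller-manager.go
Controller code directory: pkg/controller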
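Problems to Solve
As we know, a Deployment controls upgrades by managing multiple ReplicaSets (RS). After we change the image, two RSs exist at the same time, one with the old image and one with the new image. Once the RS with the new image is healthy, the other RS is scaled down. During this process, the Pods are recreated as well.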
~|⇒ kubectl get deployment,rs,pod
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx 1/1 1 1 98s
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-54b596f5bf 1 1 1 98s
NAME READY STATUS RESTARTS AGE
pod/nginx-54b596f5bf-cw9n8 1/1 Running 0 98s
~|⇒ kubectl edit deployments.apps nginx # change the image
deployment.apps/nginx edited
~|⇒ kubectl get deployment,rs,pod # two RSs and two Pods now exist
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx 1/1 1 1 3m12s
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-54b596f5bf 1 1 1 3m12s
replicaset.apps/nginx-564768b864 1 1 0 2s
NAME READY STATUS RESTARTS AGE
pod/nginx-54b596f5bf-cw9n8 1/1 Running 0 3m12s
pod/nginx-564768b864-vzqfp 0/1 ContainerCreating 0 2s
~|⇒ kubectl describe deployments.apps nginx
Name: nginx
Namespace: default
CreationTimestamp: Mon, 04 Mar 2024 11:44:49 +0800
Labels: app=nginx
Annotations: deployment.kubernetes.io/revision: 2
Selector: name=nginx
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: name=nginx
Containers:
nginx:
Image: nginx:1.25.4
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: nginx-54b596f5bf (0/0 replicas created) # shows which RS is old and which is new
NewReplicaSet: nginx-564768b864 (1/1 replicas created)
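To achieve an in-place upgrade, the following problems must be solved:
- After modifying the Deployment's image field, prevent the resources from being recreated
- Update the containers in the Pod without recreating the Pod
- Keep the relevant information in the Deployment, ReplicaSet, and Pod consistent, and their status healthy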
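Solution
Updating the Container
Let's start with updating the container.
As mentioned in the article "Analyzing Kruise's In-Place Upgrade from the Source Code", changing the image of a container in a Pod does not recreate the Pod. Pods inherently support in-place upgrades.
Keeping the Information Consistent
This is also easy to solve: apply the same change to the Deployment, the ReplicaSet, and the Pod.
Preventing Resource Recreation
Preventing resource recreation is the crux of the problem.
We could change the controller's code, or resort to hacks (such as a webhook), but that is either inelegant or intrudes on Kubernetes' native logic.
One command meets our needs: rollout pause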
csi-driver-nfs|master ⇒ kubectl rollout pause --help
Mark the provided resource as paused.
Paused resources will not be reconciled by a controller. Use "kubectl rollout resume" to resume a paused resource.
Currently only deployments support being paused.
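This command pauses the Deployment. While a Deployment is paused, its controller does not act on changes to the pod template (no new ReplicaSet is created), which is exactly what we need.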
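As a rough illustration of what the command does: `kubectl rollout pause` sets the Deployment's `spec.paused` field to `true`, and `kubectl rollout resume` sets it back to `false`. The sketch below reproduces that with `kubectl patch`. The helper names `paused_patch` and `set_paused` and the `KUBECTL` variable are made up for this example and are not part of kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.
KUBECTL="${KUBECTL:-kubectl}"   # override to dry-run, e.g. KUBECTL=echo

# Build the merge patch that toggles .spec.paused.
paused_patch() {
    local paused="$1"           # "true" or "false"
    printf '{"spec":{"paused":%s}}' "$paused"
}

# Apply the patch: set_paused <deployment> <namespace> <true|false>
set_paused() {
    local name="$1" namespace="$2" paused="$3"
    $KUBECTL patch deployment "$name" -n "$namespace" \
        --type merge -p "$(paused_patch "$paused")"
}
```

With these definitions, `set_paused nginx default true` should have roughly the same effect as `kubectl rollout pause deployment nginx -n default`.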
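Analysis of the pause Feature (Source Code)
For a source-code walkthrough of the Deployment controller, see the article "Deployment Controller Source Code Analysis". Source location: pkg/controller/deployment
A Deployment is ultimately handled by the DeploymentController.syncDeployment method, which checks for the paused state and handles it: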
func (dc *DeploymentController) syncDeployment(ctx context.Context, key string) error {
    //...
    // A paused Deployment only goes through sync; no rollout is performed
    if d.Spec.Paused {
        return dc.sync(ctx, d, rsList)
    }
    //...
}
func (dc *DeploymentController) sync(ctx context.Context, d *apps.Deployment, rsList []*apps.ReplicaSet) error {
    // Responsible for updating the RS; this is the only part we look at
    newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, false)
    // ...
}
// The work ultimately ends up in this function
func (dc *DeploymentController) getNewReplicaSet(ctx context.Context, d *apps.Deployment, rsList, oldRSs []*apps.ReplicaSet, createIfNotExisted bool) (*apps.ReplicaSet, error) {
    logger := klog.FromContext(ctx)
    // Compare the Deployment's pod template with each RS's pod template
    // (ignoring the pod-template-hash label) to find an existing "new" RS,
    // i.e. an RS that does not need to be updated
    existingNewRS := deploymentutil.FindNewReplicaSet(d, rsList)
    // Highest existing revision
    maxOldRevision := deploymentutil.MaxRevision(oldRSs)
    // New revision
    newRevision := strconv.FormatInt(maxOldRevision+1, 10)
    // Note: if the new RS already exists, the information linking the RS and the
    // Deployment is synced so that the two stay consistent.
    // This includes:
    // deploy.annotations -> rs.annotations
    // rs.revision -> deploy.revision
    if existingNewRS != nil {
        rsCopy := existingNewRS.DeepCopy()
        // Sync annotations
        annotationsUpdated := deploymentutil.SetNewReplicaSetAnnotations(ctx, d, rsCopy, newRevision, true, maxRevHistoryLengthInChars)
        minReadySecondsNeedsUpdate := rsCopy.Spec.MinReadySeconds != d.Spec.MinReadySeconds
        if annotationsUpdated || minReadySecondsNeedsUpdate {
            rsCopy.Spec.MinReadySeconds = d.Spec.MinReadySeconds
            return dc.client.AppsV1().ReplicaSets(rsCopy.ObjectMeta.Namespace).Update(ctx, rsCopy, metav1.UpdateOptions{})
        }
        // Sync revision
        needsUpdate := deploymentutil.SetDeploymentRevision(d, rsCopy.Annotations[deploymentutil.RevisionAnnotation])
        // Update the progress condition
        cond := deploymentutil.GetDeploymentCondition(d.Status, apps.DeploymentProgressing)
        if deploymentutil.HasProgressDeadline(d) && cond == nil {
            msg := fmt.Sprintf("Found new replica set %q", rsCopy.Name)
            condition := deploymentutil.NewDeploymentCondition(apps.DeploymentProgressing, v1.ConditionTrue, deploymentutil.FoundNewRSReason, msg)
            deploymentutil.SetDeploymentCondition(&d.Status, *condition)
            needsUpdate = true
        }
        if needsUpdate {
            var err error
            if _, err = dc.client.AppsV1().Deployments(d.Namespace).UpdateStatus(ctx, d, metav1.UpdateOptions{}); err != nil {
                return nil, err
            }
        }
        return rsCopy, nil
    }
    // sync calls this with createIfNotExisted = false,
    // so the function returns here; the rest is omitted
    if !createIfNotExisted {
        return nil, nil
    }
    // ...
}
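From the code above, we can determine the order and method of our operations:
- kubectl rollout pause deployment xxx to pause the Deployment
- Modify the image field in the Pod
- Modify the image field in the RS
- Modify the image field in the Deployment
- kubectl rollout resume deployment xxx to resume the Deployment
Because we start from the Pod and work upward, whenever an update to an owning resource (per ownerReference) triggers a sync, the "pod template" check finds the owner already consistent with the resources it controls, so recreation is skipped.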
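The sequence above can be sketched as a small shell function. This is only an illustration under assumptions, not the script linked at the end of the article: the function name `inplace_update`, the `KUBECTL` override, and passing the ReplicaSet and Pod names explicitly are all choices made for this example. It relies on `kubectl patch` with its default strategic merge patch, which merges the `containers` list by container `name`.

```shell
#!/usr/bin/env bash
# Sketch only: the names below are illustrative, not from the article's script.
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print commands instead

# Strategic merge patch for a bare Pod (containers are merged by name).
pod_patch() {
    printf '{"spec":{"containers":[{"name":"%s","image":"%s"}]}}' "$1" "$2"
}

# The same change wrapped in .spec.template, for an RS or a Deployment.
template_patch() {
    printf '{"spec":{"template":%s}}' "$(pod_patch "$1" "$2")"
}

# inplace_update <deployment> <namespace> <container> <image> <rs> <pod>...
inplace_update() {
    local deploy="$1" ns="$2" container="$3" image="$4" rs="$5"
    shift 5
    local pod
    # 1. Pause first so the later template change does not start a rollout.
    $KUBECTL rollout pause deployment "$deploy" -n "$ns" || return 1
    # 2. Pods, 3. ReplicaSet, 4. Deployment: patch from the bottom up.
    for pod in "$@"; do
        $KUBECTL patch pod "$pod" -n "$ns" \
            -p "$(pod_patch "$container" "$image")" || return 1
    done
    $KUBECTL patch rs "$rs" -n "$ns" \
        -p "$(template_patch "$container" "$image")" || return 1
    $KUBECTL patch deployment "$deploy" -n "$ns" \
        -p "$(template_patch "$container" "$image")" || return 1
    # 5. Resume; the controller finds an RS whose template already matches.
    $KUBECTL rollout resume deployment "$deploy" -n "$ns"
}
```

For the example in this article, the call would look like `inplace_update nginx default nginx nginx:1.25 nginx-564768b864 nginx-564768b864-vzqfp`. Note that if a step fails midway, this sketch leaves the Deployment paused, so it would need to be resumed by hand.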
Practice
- Get the current resource information
csi-driver-nfs|master ⇒ kubectl get deployment,rs,pod
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx 1/1 1 1 5h18m
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-54b596f5bf 0 0 0 5h18m
replicaset.apps/nginx-564768b864 1 1 1 5h15m
NAME READY STATUS RESTARTS AGE
pod/nginx-564768b864-vzqfp 1/1 Running 0 5h15m
- Pause the Deployment
csi-driver-nfs|master ⇒ kubectl rollout pause deployment nginx
csi-driver-nfs|master ⇒ kubectl describe deployments.apps nginx
Name: nginx
Namespace: default
CreationTimestamp: Mon, 04 Mar 2024 11:44:49 +0800
Labels: app=nginx
Annotations: deployment.kubernetes.io/revision: 2
Selector: name=nginx
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: name=nginx
Containers:
nginx:
Image: nginx:1.25.4
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing Unknown DeploymentPaused # indicates that the Deployment is paused
OldReplicaSets: nginx-54b596f5bf (0/0 replicas created)
NewReplicaSet: nginx-564768b864 (1/1 replicas created)
Events: <none>
- Modify the image field in the Pod, nginx:1.25.4 -> nginx:1.25
After the change, the Pod is not recreated; restarts +1, revision +1
csi-driver-nfs|master ⇒ kubectl get deployment nginx -o jsonpath="{.spec.template.spec.containers[0]}"
{"image":"nginx:1.25.4","imagePullPolicy":"Always","name":"nginx","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}
csi-driver-nfs|master ⇒ kubectl edit pod nginx-564768b864-vzqfp
pod/nginx-564768b864-vzqfp edited
csi-driver-nfs|master ⇒ kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-564768b864-vzqfp 1/1 Running 1 (6s ago) 5h20m
csi-driver-nfs|master ⇒ kubectl get pod nginx-564768b864-vzqfp -o jsonpath='{.spec.containers[0]}'
{"image":"nginx:1.25","imagePullPolicy":"Always","name":"nginx","ports":[{"containerPort":80,"protocol":"TCP"}],"resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","volumeMounts":[{"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount","name":"kube-api-access-qwfr5","readOnly":true}]}%
- Modify the image field in the RS; the Pod is unchanged
csi-driver-nfs|master ⇒ kubectl edit rs nginx-564768b864
replicaset.apps/nginx-564768b864 edited
csi-driver-nfs|master ⇒ kubectl get rs
NAME DESIRED CURRENT READY AGE
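nginx-54b596f5bf 0 0 0 5h28m # this old RS is left over from an earlier change unrelated to this experiment; ignore it
nginx-564768b864 1 1 1 5h25m # watch whether our later operations cause this RS to be reclaimed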
csi-driver-nfs|master ⇒ kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-564768b864-vzqfp 1/1 Running 1 (4m56s ago) 5h25m
- Modify the image field in the Deployment; the Pod is unchanged
csi-driver-nfs|master ⇒ kubectl edit deployments.apps nginx
deployment.apps/nginx edited
csi-driver-nfs|master ⇒ kubectl get rs
NAME DESIRED CURRENT READY AGE
nginx-54b596f5bf 0 0 0 5h30m
nginx-564768b864 1 1 1 5h27m
csi-driver-nfs|master ⇒ kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-564768b864-vzqfp 1/1 Running 1 (7m4s ago) 5h27m
- Record the current resource information
  - Deployment status: DeploymentPaused
  - OldReplicaSets: nginx-54b596f5bf (0/0 replicas created)
  - NewReplicaSet: nginx-564768b864 (1/1 replicas created)
  - Deployment revision: deployment.kubernetes.io/revision: "2"
  - Deployment resource version: resourceVersion: "159028"
  - RS revision: deployment.kubernetes.io/revision: "2"
  - RS resource version: resourceVersion: "158921"
- Resume the Deployment
csi-driver-nfs|master ⇒ kubectl rollout resume deployment nginx
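- Check the resource information
  - Deployment status is NewReplicaSetAvailable; the RS status matches what was recorded above
  - Deployment revision matches what was recorded above
  - Deployment resource version has changed (because the status changed)
  - None of the RS information has changed
- Confirm that the in-place upgrade is complete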
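In-Place Upgrade Script
The script is available at https://github.com/Forget-C/demo/tree/main/inplaceupdate/scripts
Usage
The script takes 4 arguments:
- Deployment name
- Deployment namespace
- Name of the container in the Deployment
- Image for that container
All 4 arguments are required and must be given in this order.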
scripts|main⚡ ⇒ bash inplaceupdate.sh help
Usage: inplaceupdate.sh <name> <namespace> <container> <image>
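After the script runs, the image in the Pod, RS, and Deployment is changed, but the Pod is not deleted and its other attributes stay the same.
To check whether the in-place upgrade succeeded, verify that:
- the Pod's image has changed
- the Pod's restart count has increased by 1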
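These two checks can also be run from the command line by reading the Pod's image and restart count with `-o jsonpath`. The following is a minimal sketch, not part of the linked script: `pod_image`, `pod_restarts`, `check_inplace`, and the `KUBECTL` variable are hypothetical names for this example, and the restart count recorded before the upgrade has to be supplied by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.
KUBECTL="${KUBECTL:-kubectl}"

# pod_image <pod> <namespace> <container>: image currently set in the Pod spec
pod_image() {
    $KUBECTL get pod "$1" -n "$2" \
        -o jsonpath="{.spec.containers[?(@.name==\"$3\")].image}"
}

# pod_restarts <pod> <namespace> <container>: restart count of that container
pod_restarts() {
    $KUBECTL get pod "$1" -n "$2" \
        -o jsonpath="{.status.containerStatuses[?(@.name==\"$3\")].restartCount}"
}

# check_inplace <pod> <namespace> <container> <expected-image> <restarts-before>
# Succeeds if the image matches and the restart count has grown.
check_inplace() {
    local image restarts
    image="$(pod_image "$1" "$2" "$3")"
    restarts="$(pod_restarts "$1" "$2" "$3")"
    [ "$image" = "$4" ] && [ "$restarts" -gt "$5" ]
}
```

For example, `check_inplace nginx-564768b864-vzqfp default nginx nginx:1.25 0 && echo upgraded` would print `upgraded` only if the same Pod now runs the new image and its container has restarted at least once.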
Run
scripts|main⚡ ⇒ bash inplaceupdate.sh nginx default nginx nginx:1.25
deployment.apps/nginx paused
Pod nginx-54b596f5bf-qwgkl updated
Replicaset nginx-54b596f5bf updated
Deployment nginx updated
deployment.apps/nginx resumed
Deployment nginx change to nginx:1.25 completed successfully
Waiting for pods to be ready...
Pod nginx-54b596f5bf-qwgkl is ready
All pods are ready