Orphaned Resources

Background

Orphaned resources are resources that have already been deleted from the cluster's point of view but whose data still lives in etcd. A simple example to illustrate: a namespace was previously force-deleted, yet checking afterwards shows its resources still present:
root@master1:~# kubectl get pod -n kb-system
NAME                                            READY   STATUS                       RESTARTS      AGE
kb-addon-snapshot-controller-7df64b4d5b-jl56q   0/1     Error                        6718          255d
kubeblocks-77c594d646-nnl82                     0/1     CreateContainerConfigError   6 (31m ago)   255d
kubeblocks-dataprotection-f8dd6659b-2l8pd       0/1     CreateContainerConfigError   6 (31m ago)   255d
If we try to delete them again at this point, we are told they do not exist:
root@master1:~# kubectl delete ns kb-system
Error from server (NotFound): namespaces "kb-system" not found
root@master1:~# kubectl delete pod $(kubectl get pod -n kb-system | awk '{print $1}') -n kb-system
Error from server (NotFound): pods "NAME" not found
Error from server (NotFound): namespaces "kb-system" not found
Error from server (NotFound): namespaces "kb-system" not found
Error from server (NotFound): namespaces "kb-system" not found
This is what orphaned resources look like. They still consume resources inside Kubernetes and degrade system performance, for example:
API Server performance degradation;
etcd data bloat;
and it can even prevent the namespace from being deleted or re-created.
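The etcd-side bloat is easy to quantify: etcdctl reports each member's database size. A minimal check, assuming the ETCDCTL_* environment variables are exported as shown in the cleanup section below:

# the DB SIZE column shows how large each etcd member's database has grown
etcdctl endpoint status -w table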
Cleanup

The earlier deletion failed because the API Server first checks whether the namespace exists, and since we previously deleted it through the API, as far as the API Server is concerned the namespace is already gone.
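You can watch this check happen at the HTTP level by raising kubectl's verbosity; the 404 comes back from the API Server before anything touches etcd:

# -v=8 prints the HTTP requests/responses; expect a GET on the namespace -> 404 NotFound
kubectl delete ns kb-system -v=8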
So we need to delete the metadata directly in etcd. First export the etcd certificates, adjusting the paths to your environment:
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-master1.pem
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-master1-key.pem
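If you are unsure which certificates to use, the kube-apiserver static pod manifest records the paths it uses to reach etcd. The manifest location below assumes a kubeadm-style layout; other installers place it elsewhere:

# --etcd-cafile/--etcd-certfile/--etcd-keyfile point at working etcd client certs
grep -E 'etcd-(cafile|certfile|keyfile|servers)' /etc/kubernetes/manifests/kube-apiserver.yaml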
Then test the connection with etcdctl:
etcdctl member list
3f2f31fb84eeea53, started, etcd-master3, https://10.10.254.14:2380, https://10.10.254.14:2379, false
4c83f4ebe5783013, started, etcd-master2, https://10.10.254.13:2380, https://10.10.254.13:2379, false
8079aa45283ecb82, started, etcd-master1, https://10.10.254.12:2380, https://10.10.254.12:2379, false
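Deleting keys straight out of etcd is destructive and irreversible, so it is worth taking a snapshot first as a rollback path. A minimal sketch reusing the ETCDCTL_* environment above (the backup path is arbitrary):

# snapshot save targets the single endpoint set in ETCDCTL_ENDPOINTS
etcdctl snapshot save /var/backups/etcd-before-cleanup.db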
Next, look up the keys of the affected pods:
etcdctl get --prefix /registry/pods/ --keys-only | grep kb-system
/registry/pods/kb-system/kb-addon-snapshot-controller-7df64b4d5b-jl56q
/registry/pods/kb-system/kubeblocks-77c594d646-nnl82
/registry/pods/kb-system/kubeblocks-dataprotection-f8dd6659b-2l8pd
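Pods are usually not the only thing left behind. Before deleting anything, a dry-run count per resource type shows the full extent of the leftovers; this sketch assumes the same resource list as the delete loop below:

# count leftover keys under each resource prefix for the kb-system namespace
for prefix in pods namespaces configmaps secrets deployments replicasets services; do
  echo -n "$prefix: "
  etcdctl get --prefix /registry/$prefix/kb-system --keys-only | grep -c . || true
done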
Finally, run the deletion:
for prefix in pods namespaces configmaps secrets deployments replicasets services; do
  etcdctl del --prefix /registry/$prefix/kb-system/ || true
done
# the namespace object itself is stored at /registry/namespaces/kb-system (no trailing
# slash), so it is not matched by the prefix above and needs an explicit delete
etcdctl del /registry/namespaces/kb-system || true
Verify again: the pods in question are gone:
NAMESPACE     NAME                                READY   STATUS    RESTARTS         AGE
default       low-resource-app-6c94c7f769-6vj77   1/1     Running   7 (138m ago)     190d
kube-system   cilium-7b9l9                        1/1     Running   587 (138m ago)   257d
kube-system   cilium-kjdsz                        1/1     Running   587 (138m ago)   257d
kube-system   cilium-kk8v4                        1/1     Running   587 (138m ago)   257d
kube-system   cilium-operator-fddf55d65-pldkq     1/1     Running   672 (138m ago)   257d
kube-system   cilium-pbz67                        1/1     Running   588 (139m ago)   257d
kube-system   cilium-qxqnq                        1/1     Running   587 (139m ago)   257d
kube-system   cilium-xftk8                        1/1     Running   0                7m4s
kube-system   coredns-794b9d47c4-k54k8            1/1     Running   9 (138m ago)     257d
kube-system   coredns-794b9d47c4-zgvmk            1/1     Running   9 (138m ago)     257d
kube-system   kube-apiserver-master1              1/1     Running   558 (139m ago)   257d
kube-system   kube-apiserver-master2              1/1     Running   554 (139m ago)   257d
kube-system   kube-apiserver-master3              1/1     Running   555 (138m ago)   257d
kube-system   kube-controller-manager-master1     1/1     Running   11 (139m ago)    257d
kube-system   kube-controller-manager-master2     1/1     Running   9 (139m ago)     257d
kube-system   kube-controller-manager-master3     1/1     Running   9 (138m ago)     257d
kube-system   kube-proxy-4tsfp                    1/1     Running   9 (138m ago)     257d
kube-system   kube-proxy-g5fps                    1/1     Running   9 (138m ago)     257d
kube-system   kube-proxy-k2wf7                    1/1     Running   9 (139m ago)     257d
kube-system   kube-proxy-kzlmt                    1/1     Running   9 (138m ago)     257d
kube-system   kube-proxy-pj6kr                    1/1     Running   9 (138m ago)     257d
kube-system   kube-proxy-ssc49                    1/1     Running   9 (139m ago)     257d
kube-system   kube-scheduler-master1              1/1     Running   11 (139m ago)    257d
kube-system   kube-scheduler-master2              1/1     Running   9 (139m ago)     257d
kube-system   kube-scheduler-master3              1/1     Running   9 (138m ago)     257d
kube-system   metrics-server-75bf97fcc9-pvrkl     0/1     Running   4 (138m ago)     190d
kube-system   nodelocaldns-8gp4s                  1/1     Running   9 (138m ago)     257d
kube-system   nodelocaldns-cdpl2                  1/1     Running   9 (138m ago)     257d
kube-system   nodelocaldns-frqvs                  1/1     Running   9 (138m ago)     257d
kube-system   nodelocaldns-jr6b6                  1/1     Running   9 (139m ago)     257d
kube-system   nodelocaldns-kpvqg                  1/1     Running   9 (139m ago)     257d
kube-system   nodelocaldns-lprdx                  1/1     Running   9 (138m ago)     257d
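The same can be confirmed at the etcd level; once the namespace is fully cleaned up, the count below should come back as 0:

# any remaining key mentioning kb-system indicates leftover data
etcdctl get --prefix /registry/ --keys-only | grep -c kb-system || true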
How Orphaned/Ghost Resources Are Created

Deleting a resource in Kubernetes is an asynchronous process. Deleting a namespace, for example, first requires deleting every resource inside it: pods, deployments, and so on.
When kube-apiserver receives the delete request from kubectl or another client, it marks the namespace:
status:
  phase: Terminating
spec:
  finalizers:
  - kubernetes
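Both markers can be read back directly from a namespace stuck in deletion, as a quick sanity check using jsonpath:

# prints e.g. "Terminating" followed by ["kubernetes"]
kubectl get ns kb-system -o jsonpath='{.status.phase}{"\n"}{.spec.finalizers}{"\n"}'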
Only then are the child resources cleaned up as described above, and finally the namespace itself is deleted.
flowchart TD
A["kubectl delete ns"] --> B["API Server 标记 Namespace 为 Terminating"]
B --> C["Namespace Controller 开始删除该命名空间下的所有子资源"]
C --> D["所有子资源(Pods / Services / Deployments / CRDs)清空"]
D --> E["Finalizer 被移除"]
E --> F["etcd 删除 /registry/namespaces/<ns> 键"]
F --> G["Namespace 完全删除"]
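While the controller works through these steps, the namespace's status.conditions report what is still pending, which is a quick way to see where a deletion is stuck:

# lists condition type/status/message, e.g. NamespaceContentRemaining
kubectl get ns kb-system -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'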
If instead we force-delete, this controller-driven cleanup is bypassed. The child resources are not necessarily removed and may be left sitting in etcd, and whatever stays behind in etcd becomes an orphaned resource.
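For reference, the kind of force deletion that causes this is typically the finalize-API trick used on namespaces stuck in Terminating. It is shown here only to illustrate what gets bypassed: running it removes the finalizer without waiting for the controller's cleanup (requires jq):

# strip the finalizer and write the object back through the finalize subresource
kubectl get ns kb-system -o json \
  | jq 'del(.spec.finalizers)' \
  | kubectl replace --raw /api/v1/namespaces/kb-system/finalize -f -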
References
https://help.aliyun.com/zh/asm/support/how-do-i-delete-a-namespace-in-the-terminating-state
https://www.stackstate.com/blog/orphaned-resources-in-kubernetes-detection-impact-and-prevention-tips/
https://argo-cd.readthedocs.io/en/release-2.11/user-guide/orphaned-resources/