目录
一、背景
二、问题现象
三、问题分析
1. 单机整体分析
2. 单机分析初步结论
3. Kubelet进程资源占用分析
4. Kubelet分析初步结论
四、根因确定
五、修复验证测试
1. 手动卸载单条记录确认
2. 全部清理确认
六、总结
单机整体分析
-
硬件规格及资源:vCPU为48核、内存310GiB; -
操作系统内核:操作系统为云厂商提供的操作系统,内核版本4.19.91-26.6.al7.x86_64; -
近期负载:1分钟、5分钟、15分钟负载分别为48.03、43.91和42.94,与单机的48核vCPU来讲,负载正常(一般的排查经验是运行负载小于<1.5vCPU,则表明机器负载并不过高,但是高负载并完全对等CPU饱和)。
ps aux|wc -l
20135
$ ps aux|grep '['|wc -l # 内核线程总量
19798
$ ps aux|grep '[jbd2'|wc -l # 内核线程以 jbd2 开头的总量
12769
$ ps aux|grep -v '[jbd2'|grep ext4-rsv-conver |wc -l # 内核中 ext4-rsv-conver 的线程数量
6385
$ ps aux|grep '['|more
...
root 1951 0.0 0.0 0 0 ? S Feb01 0:00 [jbd2/vdfiw-8]
root 1952 0.0 0.0 0 0 ? S Feb01 0:00 [jbd2-ckpt/vdfiw]
root 1953 0.0 0.0 0 0 ? I< Feb01 0:00 [ext4-rsv-conver]
root 2078 0.0 0.0 0 0 ? I< 2023 4:26 [kworker/6:1H-kb]
root 3009 0.0 0.0 0 0 ? S Feb01 0:00 [jbd2/vdfjp-8]
root 3010 0.0 0.0 0 0 ? S Feb01 0:00 [jbd2-ckpt/vdfjp]
root 3011 0.0 0.0 0 0 ? I< Feb01 0:00 [ext4-rsv-conver]
...
-
jbd2/vdfiw-8是一个与jbd2(Journaling Block Device version 2)内核文件操作日志系统存储的的内核线程; -
jbd2-ckpt/vdfiw是一个特定于jbd2的内核线程,其中ckpt代表“checkpoint”(检查点)。在jbd2和使用它的文件系统(如ext4)中,检查点是一种机制,用于定期将日志中的信息转移到文件系统本身,以减少在系统崩溃后恢复所需的时间; -
ext4-rsv-conver则是ext4由实现用于处理来自写回的转换工作,即“需要未写入的扩展处理并保留事务的已完成IO”。
df |grep vdfiw
/dev/vdfiw 308521792 56841364 251664044 19% /var/lib/kubelet/plugins/kubernetes.io/csi/pv/d-bp1iau7367wkpf6xqdpv/globalmount
-
/var/lib/kubelet/plugins/kubernetes.io/csi/pv/d-bp1iau7367wkpf6xqdpv/globalmount,
-
/var/lib/container/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged/var/lib/container/kubelet/plugins/kubernetes.io/csi/pv/d-bp1iau7367wkpf6xqdpv/globalmount。请记住这个特殊目录,后续我们的突破口也将在这里。
$mount|grep vdfiw
/dev/vdfiw on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/d-bp1iau7367wkpf6xqdpv/globalmount type ext4 (ro,relatime)
/dev/vdfiw on /var/lib/container/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged/logxxx_host/var/lib/container/kubelet/plugins/kubernetes.io/csi/pv/d-bp1iau7367wkpf6xqdpv/globalmount type ext4 (ro,relatime)
$lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
vdiko ext4 0442c714-4a4d-4d2b-8771-038edaefcf3f /var/lib/container/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged/logxxx_host/var/lib/kubelet/pods/e2c23872-1f
vdb ext4 ef9f6dfa-d518-44f9-914f-b141ff423191 /var/lib/container/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged/logxxx_host/opt/hybrid-data
vdiau ext4 e403a2ef-2e1d-4cb7-9382-8ca33637c381 /var/lib/container/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged/logxxx_host/var/lib/kubelet/pods/da088c11-29
vdc ext4 34c4dbcf-d514-4948-94fe-81ccf6ae5c90 /var/lib/container/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged/logxxx_host/var/lib/container
vda
└─vda1 ext4 1612a49b-a79d-4e87-819a-a5bad80fe2a9 /var/lib/container/docker/overl
$ mount |grep '/dev/vd' |awk '{print $1}'|sort|uniq|more
/dev/vda1
/dev/vdaa
/dev/vdaaa
....
$mount |grep '/dev/vd' |awk '{print $1}'|sort|uniq|wc -l
6385
$ mount |grep "/dev/vd"| wc -l
26503
$ mount |grep '/dev/vd' |grep -v 9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be|wc -l
12777
$ docker inspect `docker ps -a | awk '{ print $1}'|grep -v CONTAINER` > /tmp/docker_inspect.log
$ grep 9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be /tmp/docker_inspect.log 定位到容器 ID e4261544f8eb
$ docker ps -a|grep e4261544f8eb
e4261544f8eb 889ceee008bf "/usr/local/ilogxxx…" 8 months ago Removal In Progress k8s_logxxx_logxxx-ds-hsbv9_kube-system_93e182b4-ed88-4344-8c8d-e22af0407be8_0
$ docker inspect e4261544f8e
[
{
: ,
: ,
: ,
], : [
"State": {
"Status": "dead",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": true,
"Pid": 0,
"ExitCode": 137,
"Error": "container e4261544f8eb355ee438c61a5c121217d54b54244d2957e495aa234f77444288: driver "overlay2" failed to remove root filesystem: unlinkat /var/lib/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged: device or resource busy",
"StartedAt": "2023-07-25T07:23:29.135183205Z",
"FinishedAt": "2023-08-15T06:18:44.69693858Z"
},
kubectl get ds logxxx-ds -o yaml -n kube-system
apiVersion: apps/v1
kind: DaemonSet
spec:
spec:
containers:
...
volumeMounts:
mountPath: /var/run
name: run
mountPath: /logxxx_host
mountPropagation: HostToContainer
name: root
readOnly: true
volumes:
hostPath:
path: /var/run
type: Directory
name: run
hostPath:
path: /
type: Directory
name: root
-
HostToContainer:宿主机上挂载到容器中的目录,宿主机目录中的变化,容器中单向可见。反之,容器中目录的变化对宿主机中的目录不可见。
-
Bidirectional:宿主机上挂载到容器中的目录,无论是在宿主机还是在容器中的变化,宿主机和容器中都可见。
单机分析初步结论
Kubelet进程资源占用分析
$ journalctl -u kubelet --no-pager --since="2024-04-12 15:25"
Apr 12 15:28:39 k8s-prd-standard-43.113 kubelet[3237254]: E0412 15:28:39.437193 3237254 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/diskplugin.csi.xxx.com^d-bp1dtkaipms3op820vnb podName: nodeName:}" failed. No retries permitted until 2024-04-12 15:30:41.437168443 +0800 CST m=+21254165.582404754 (durationBeforeRetry 2m2s). Error: "GetDeviceMountRefs check failed for volume "d-bp1dtkaipms3op820vnb" (UniqueName: "kubernetes.io/csi/diskplugin.csi.xxx.com^d-bp1dtkaipms3op820vnb") on node "k8s-prd-standard-43.113" : the device mount path "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/d-bp1dtkaipms3op820vnb/globalmount" is still mounted by other references [/var/lib/container/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged/logxxx_host/var/lib/kubelet/plugins/kubernetes.io/csi/pv/d-bp1dtkaipms3op820vnb/globalmount /var/lib/container/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged/logxxx_host/var/lib/container/kubelet/plugins/kubernetes.io/csi/pv/d-bp1dtkaipms3op820vnb/globalmount /var/lib/container/kubelet/plugins/kubernetes.io/csi/pv/d-bp1dtkaipms3op820vnb/globalmount]"
...
Kubelet分析初步结论
-
Kubelet删除挂载设备失败,失败原因是设备目录被logxxx容器占用,锲而不舍的持续删除,接近81%的CPU使用率消耗在了卸载本地大量挂载的云盘记录上; -
而删除异常的磁盘设备被云厂商日志产品单机实例logxxx-ds-hsbv9挂载占用,持续重复卸载失败。 -
挂载设备目录的的logxxx-ds-hsbv9容器,删除过程中卡在了容器目录的删除上。
"Error": "container e4261544f8eb355ee438c61a5c121217d54b54244d2957e495aa234f77444288: driver "overlay2" failed to remove root filesystem: unlinkat /var/lib/docker/overlay2/9caa9580e946898718f97e9d205d0ac34048d5242ea30c0bc859627483b664be/merged: device or resource busy",
$ rmdir /var/lib/container/docker/overlay2/9caa..64be/merged
rmdir: failed to remove ‘/var/lib/container/docker/overlay2/9caa..64be/merged’: Device or resource busy
bool __is_local_mountpoint(struct dentry *dentry)
{
struct mnt_namespace *ns = current->nsproxy->mnt_ns;
struct mount *mnt;
bool is_covered = false;
if (!d_mountpoint(dentry))
goto out;
down_read(&namespace_sem);
list_for_each_entry(mnt, &ns->list, mnt_list) {
is_covered = (mnt->mnt_mountpoint == dentry);
if (is_covered)
break;
}
up_read(&namespace_sem);
out:
return is_covered;
}
$ ls -hl /var/lib/container/docker/overlay2/9caa...64be/merged/logxxx_host/var/lib/kubelet/plugins/kubernetes.io/csi/pv/d-bp1iau7367wkpf6xqdpv/globalmount/
ls: reading directory /var/lib/container/docker/overlay2/9caa...64be/merged/logxxx_host/var/lib/kubelet/plugins/kubernetes.io/csi/pv/d-bp1iau7367wkpf6xqdpv/globalmount/: Input/output error
grep "vdfiw" /tmp/dmsg.log
[Mon Apr 8 10:19:15 2024] EXT4-fs warning (device vdfiw): htree_dirblock_to_tree:995: inode #2: lblock 0: comm find: error -5 reading directory block
[Mon Apr 8 10:19:15 2024] EXT4-fs error (device vdfiw): __ext4_get_inode_loc:4317: inode #2: block 1095: comm find: unable to read itable block
[Mon Apr 8 10:19:15 2024] EXT4-fs error (device vdfiw) in ext4_reserve_inode_write:5543: IO failure
[Mon Apr 8 10:19:21 2024] Aborting journal on device vdfiw-8.
[Mon Apr 8 10:19:21 2024] JBD2: Error -5 detected when updating journal superblock for vdfiw-8.
[Mon Apr 8 11:05:38 2024] EXT4-fs warning (device vdfiw): htree_dirblock_to_tree:995: inode #2: lblock 0: comm find: error -5 reading directory block
[Mon Apr 8 13:29:36 2024] EXT4-fs warning (device vdfiw): htree_dirblock_to_tree:995: inode #2: lblock 0: comm find: error -5 reading directory block
[Tue Apr 9 10:20:55 2024] EXT4-fs (vdfiw): error count since last fsck: 2
[Tue Apr 9 10:20:55 2024] EXT4-fs (vdfiw): initial error at time 1712543485: __ext4_get_inode_loc:4317
[Tue Apr 9 10:20:55 2024] EXT4-fs (vdfiw): last error at time 1712543485: ext4_reserve_inode_write:5543
[Wed Apr 10 10:48:53 2024] EXT4-fs (vdfiw): error count since last fsck: 2
[Wed Apr 10 10:48:53 2024] EXT4-fs (vdfiw): initial error at time 1712543485: __ext4_get_inode_loc:4317
[Wed Apr 10 10:48:53 2024] EXT4-fs (vdfiw): last error at time 1712543485: ext4_reserve_inode_write:5543
[Thu Apr 11 11:16:52 2024] EXT4-fs (vdfiw): error count since last fsck: 2
[Thu Apr 11 11:16:52 2024] EXT4-fs (vdfiw): initial error at time 1712543485: __ext4_get_inode_loc:4317
[Thu Apr 11 11:16:52 2024] EXT4-fs (vdfiw): last error at time 1712543485: ext4_reserve_inode_write:5543
[Fri Apr 12 09:49:08 2024] EXT4-fs warning (device vdfiw): htree_dirblock_to_tree:995: inode #2: lblock 0: comm find: error -5 reading directory block
[Fri Apr 12 09:49:08 2024] EXT4-fs error (device vdfiw): ext4_journal_check_start:61: Detected aborted journal
[Fri Apr 12 09:49:08 2024] EXT4-fs (vdfiw): Remounting filesystem read-only
[Fri Apr 12 09:49:39 2024] EXT4-fs warning (device vdfiw): htree_dirblock_to_tree:995: inode #2: lblock 0: comm find: error -5 reading directory block
[Fri Apr 12 09:56:20 2024] EXT4-fs warning (device vdfiw): htree_dirblock_to_tree:995: inode #2: lblock 0: comm find: error -5 reading directory block
[Fri Apr 12 11:44:50 2024] EXT4-fs (vdfiw): error count since last fsck: 4
[Fri Apr 12 11:44:50 2024] EXT4-fs (vdfiw): initial error at time 1712543485: __ext4_get_inode_loc:4317
[Fri Apr 12 11:44:50 2024] EXT4-fs (vdfiw): last error at time 1712887285: ext4_journal_check_start:61
[Fri Apr 12 14:36:00 2024] EXT4-fs warning (device vdfiw): htree_dirblock_to_tree:995: inode #2: lblock 0: comm ls: error -5 reading directory block
[Fri Apr 12 14:36:44 2024] EXT4-fs warning (device vdfiw): htree_dirblock_to_tree:995: inode #2: lblock 0: comm ls: error -5 reading directory block
手动卸载单条记录确认
ps aux|grep vdmgd
root 1078726 0.0 0.0 0 0 ? S 20:11 0:00 [jbd2/vdmgd-8]
root 1078727 0.0 0.0 0 0 ? S 20:11 0:00 [jbd2-ckpt/vdmgd]
$ df |grep vdmgd
/dev/vdmgd 295G 2.1G 293G 1% /var/lib/kubelet/plugins/kubernetes.io/csi/pv/d-bp1gr027v6iq2etcfsvk/globalmount
$ mount |grep d-bp1gr027v6iq2etcfsvk
/dev/vdmgd on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/d-bp1gr027v6iq2etcfsvk/globalmount type ext4 (rw,relatime)
/dev/vdmgd on /var/lib/container/docker/overlay2/f605aea15446c49d10108264b5f377a022e9c7c041487deb0c70189bf07a49c7/merged/logxxx_host/var/lib/container/kubelet/plugins/kubernetes.io/csi/pv/d-bp1gr027v6iq2etcfsvk/globalmount type ext4 (rw,relatime)
$ ls -hl /var/lib/container/docker/overlay2/f605aea15446c49d10108264b5f377a022e9c7c041487deb0c70189bf07a49c7/merged/logxxx_host/var/lib/container/kubelet/plugins/kubernetes.io/csi/pv/d-bp1gr027v6iq2etcfsvk/globalmount
ls: reading directory /var/lib/container/docker/overlay2/f605aea15446c49d10108264b5f377a022e9c7c041487deb0c70189bf07a49c7/merged/logxxx_host/var/lib/container/kubelet/plugins/kubernetes.io/csi/pv/d-bp1gr027v6iq2etcfsvk/globalmount: Input/output error
$ umount /var/lib/container/docker/overlay2/f605aea15446c49d10108264b5f377a022e9c7c041487deb0c70189bf07a49c7/merged/logxxx_host/var/lib/container/kubelet/plugins/kubernetes.io/csi/pv/d-bp1gr027v6iq2etcfsvk/globalmount
全部清理确认
-
日志采集单机容器实例挂载了Linux系统根目录,主机挂载记录全部传递到日志采集单机的容器实例中; -
运行的服务使用K8s持久存储挂载远程云盘,挂载到本地的磁盘开启了日志功能; -
单机运行的服务频繁申请远程云盘,云厂商提供的存储插件提前卸载远程云盘,使得磁盘挂载记录持续泄露; -
设备文件系统日志功能的内核线程,在远端云盘被删除后的异常场景,并不会异常终止,而是持续重试;
往期回顾
文 / David
关注得物技术,每周一、三、五更新技术干货
要是觉得文章对你有帮助的话,欢迎评论转发点赞~
未经得物技术许可严禁转载,否则依法追究法律责任。
“
扫码添加小助手微信
如有任何疑问,或想要了解更多技术资讯,请添加小助手微信:
本篇文章来源于微信公众号:得物技术
本文来自投稿,不代表TakinTalks稳定性技术交流平台立场,如若转载,请联系原作者。