Environment: while extracting an 8 GB archive inside a CentOS 7 virtual machine, the kernel reported the following error:
Message from syslogd@cosmo-01 at Apr 25 11:05:59 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#6 stuck for 21s! [xfs-data/dm-0:451]
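Analysis of the causes of the kernel soft lockup bug:
I searched online and analyzed the cause. The direct cause is that a CPU is so busy it cannot feed the watchdog in time, and the system then prints a CPU lockup message such as: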
kernel:BUG: soft lockup - CPU#0 stuck for 38s! [kworker/0:1:25758]
kernel:BUG: soft lockup - CPU#7 stuck for 36s! [java:16182]
......
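When many of these messages pile up in /var/log/messages, it helps to pull out the CPU, the stall time and the task automatically. Below is a minimal Python sketch, assuming the message format shown above (with or without the "NMI watchdog:" prefix). The names parse_soft_lockup and SoftLockup are hypothetical, and other kernel versions may word the line differently.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the watchdog's report line; it matches both
# "kernel:BUG: soft lockup ..." and "kernel:NMI watchdog: BUG: soft lockup ...".
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    cpu: int      # CPU that failed to feed the watchdog in time
    seconds: int  # how long it was stuck, as reported by the kernel
    comm: str     # task name; may itself contain '/' or ':' (e.g. kworker/0:1)
    pid: int


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a soft lockup report."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    # The greedy comm group backtracks to the last ':<digits>]', so the PID
    # is split off correctly even when the task name contains colons.
    return SoftLockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))


if __name__ == "__main__":
    for line in (
        "kernel:NMI watchdog: BUG: soft lockup - CPU#6 stuck for 21s! [xfs-data/dm-0:451]",
        "kernel:BUG: soft lockup - CPU#0 stuck for 38s! [kworker/0:1:25758]",
        "kernel:BUG: soft lockup - CPU#7 stuck for 36s! [java:16182]",
    ):
        print(parse_soft_lockup(line))
```

Grouping the parsed records by comm is a quick way to see whether one task (for example an XFS worker during heavy disk I/O) keeps showing up.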
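The kernel parameter kernel.watchdog_thresh (/proc/sys/kernel/watchdog_thresh) defaults to 10. If a CPU stays stuck for more than 2 × 10 seconds, the message above is printed. Note: the value cannot be set higher than 60.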
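As a sanity check of the 2 × watchdog_thresh arithmetic, here is a minimal Python sketch. The helper names soft_lockup_timeout and read_watchdog_thresh are hypothetical; the kernel facts it relies on are the ones above (default 10, report after twice the threshold, maximum 60) plus the documented behaviour that 0 disables lockup detection.

```python
from typing import Optional

DEFAULT_WATCHDOG_THRESH = 10  # kernel default, in seconds
MAX_WATCHDOG_THRESH = 60      # larger values are rejected by the sysctl


def soft_lockup_timeout(watchdog_thresh: int = DEFAULT_WATCHDOG_THRESH) -> Optional[int]:
    """Hypothetical helper: seconds a CPU may go without feeding the watchdog
    before "BUG: soft lockup" is printed, or None when detection is off (0)."""
    if not 0 <= watchdog_thresh <= MAX_WATCHDOG_THRESH:
        raise ValueError(
            f"watchdog_thresh must be in 0..{MAX_WATCHDOG_THRESH}, got {watchdog_thresh}"
        )
    if watchdog_thresh == 0:
        return None  # 0 disables lockup detection altogether
    return 2 * watchdog_thresh  # soft lockup threshold is twice watchdog_thresh


def read_watchdog_thresh(path: str = "/proc/sys/kernel/watchdog_thresh") -> int:
    """Hypothetical helper: read the current value from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    print(soft_lockup_timeout())    # 20
    print(soft_lockup_timeout(30))  # 60
```

With the default of 10, the "stuck for 21s" in the log above is just past the 20-second threshold; raising the value to 30 would only move that point to 60 seconds.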
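Increasing this value lengthens how long the watchdog waits to be fed, but it does not fix the problem; it only delays the message. Solving the problem still requires finding the root cause.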
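To make the problem easier to locate, you can also turn on panic by changing /proc/sys/kernel/panic from its default 0 to 1. Note that kernel.panic only controls how many seconds the kernel waits before rebooting after a panic (0 means it never reboots); to make a soft lockup itself trigger a panic, so that for example kdump can capture a crash dump, set /proc/sys/kernel/softlockup_panic to 1.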
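Searching online, I found that CPU lockups can have many causes:
* An underpowered server power supply, leaving the CPU voltage unstable and locking up the CPU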
https://ubuntuforums.org/showthread.php?t=2205211
I bought a small (500W) new power supply made by what I feel is a reputable company and made the swap.
GREAT NEWS: After replacing the power supply, the crashes completely stopped!
I wanted to wait a while just to be sure, but it is now a few weeks since the new powersupply went in, and I haven't had a single crash since.
The power supply is not something that I would normally worry about, but in this case it totally fixed my problem.
Thanks to those who read my post, and especially to those who responded.
* More vCPUs assigned than there are physical CPU cores
https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds
* The host running the VM has very high CPU load or disk I/O
* The VM itself has very high CPU load or disk I/O
https://www.centos.org/forums/viewtopic.php?t=60087
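* A bug related to enabling KVM (hardware virtualization) in the BIOS; disabling it solves the problem, but then the physical machine no longer supports virtualization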
https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds
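* A bug in the VM's NIC driver that locks up the CPU when handling peak (high-water-mark) traffic
* Overclocking enabled in the BIOS, making the voltage unstable while overclocked and CPU lockups more likely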
https://ubuntuforums.org/showthread.php?t=2205211
* A bug in the Linux kernel
https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds
* A bug in KVM
https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds
* clocksource tsc unstable on CentOS and cloud Linux with Hyper-V Virtualisation
https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds
Setting clocksource=jiffies on the kernel command line resolves it
* Intel C-States enabled in the BIOS; disabling them resolves it
https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds
https://support.citrix.com/article/CTX127395
http://blog.sina.com.cn/s/blog_906d892d0102vn26.html
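* Spread spectrum enabled in the BIOS
When the clock generator on the motherboard is running, the peaks of its pulses produce EMI (electromagnetic interference). The spread spectrum setting reduces the EMI produced by the clock generator by flattening the sharp pulse peaks into a smoother curve.
If you are not experiencing EMI problems, it is recommended to set this option to Disabled, which can improve system performance and stability;
otherwise set it to Enabled. If you overclock the CPU, this option must be disabled, because even a tiny drift in the pulses can lock up an overclocked CPU.
To stress this again: when the CPU is overclocked, spread spectrum must be disabled, otherwise the CPU is prone to locking up.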
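Two of the causes above, an unstable TSC clocksource and BIOS C-states, can be checked from inside the machine through sysfs. Below is a minimal read-only Python sketch, assuming the standard paths /sys/devices/system/clocksource/clocksource0 and /sys/devices/system/cpu/cpu0/cpuidle; read_clocksource and list_idle_states are hypothetical helpers. The cpuidle directory may be missing (for example in a VM or when no cpuidle driver is loaded), and seeing tsc or deep C-states is only a hint, not proof of the cause.

```python
from pathlib import Path
from typing import List, Tuple

# Standard sysfs locations (assumed; read-only access is enough).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def read_clocksource(base: Path = CLOCKSOURCE_DIR) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (current clocksource, available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def list_idle_states(base: Path = CPUIDLE_DIR) -> List[str]:
    """Hypothetical helper: names of cpu0's idle (C-)states in index order,
    or an empty list when the cpuidle directory does not exist."""
    if not base.is_dir():
        return []
    # Sort numerically so state10 comes after state2.
    states = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[len("state"):]))
    return [(s / "name").read_text().strip() for s in states]


if __name__ == "__main__":
    try:
        current, available = read_clocksource()
        print(f"current clocksource: {current} (available: {' '.join(available)})")
    except OSError as exc:  # not Linux, or sysfs not mounted
        print(f"cannot read clocksource: {exc}")
    print("cpu0 idle states:", ", ".join(list_idle_states()) or "none")
```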
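Soft lockup explained: a soft lockup means the bug does not bring the whole system down, but one or more processes (or kernel threads) are stuck in some state, usually in kernel space. In many cases this comes from problems in how kernel locks are used.
# Takes effect immediately, lost after a reboot
echo 30 > /proc/sys/kernel/watchdog_thresh
# Check the value
[root@git-node1 data]# tail -1 /proc/sys/kernel/watchdog_thresh
30
# Temporary change with sysctl (equivalent to the echo above)
sysctl -w kernel.watchdog_thresh=30
# Make it permanent: append to the configuration file
vi /etc/sysctl.conf
kernel.watchdog_thresh=30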
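Editing /etc/sysctl.conf by hand with vi works, but when the same change is rolled out to many machines it is easier to make it idempotent. Below is a minimal Python sketch; set_sysctl is a hypothetical helper that works on the file's text, replacing an existing uncommented kernel.watchdog_thresh line or appending one, so running it twice gives the same result.

```python
import re


def set_sysctl(conf_text: str, key: str, value: str) -> str:
    """Hypothetical helper: return conf_text with `key=value` set. An existing
    uncommented entry for `key` is replaced in place; otherwise a line is appended."""
    # [ \t]* instead of \s* so the match never spills across blank lines.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(conf_text):
        # A function replacement avoids treating backslashes in value as escapes.
        return pattern.sub(lambda _m: new_line, conf_text)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


if __name__ == "__main__":
    before = "vm.swappiness = 10\nkernel.watchdog_thresh = 10\n"
    print(set_sysctl(before, "kernel.watchdog_thresh", "30"), end="")
```

Write the result back to /etc/sysctl.conf as root, then run sysctl -p to load it without rebooting. Commented-out lines such as "# kernel.watchdog_thresh=10" are deliberately left alone.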