Understanding Linux Load Average and Troubleshooting Its Causes

Date: 2019-12-08

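When a system responds slowly, the usual first step is to check whether the load is too high with top or uptime. For example, uptime prints output like that in Figure 1: 23:47:19 is the current time, "up 260 days, 14:39" is how long the system has been running, "1 user" is the number of users currently logged in, and, most importantly, load average gives three values: the system load averaged over the past 1, 5, and 15 minutes. The English description of system load reads: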

System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.

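In other words, the load average is the average number of processes that were either runnable (running on a CPU or ready and waiting for one) or in uninterruptible sleep (typically waiting for I/O) during the interval. It is not normalized by the number of CPUs.

Figure 1: Output of the uptime command

A load average equal to the number of CPU cores is a reasonable state: on average no process has to wait and no CPU sits idle. The following command counts the CPUs: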

grep 'model name' /proc/cpuinfo | wc -l
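Because the load average is not normalized, dividing it by the CPU count makes the figure easier to interpret. The sketch below is an illustration rather than part of the original article: load_per_core is a hypothetical helper name, and the usage comment assumes awk, cut, nproc, and /proc/loadavg are available.

```shell
# load_per_core LOAD CORES
# Hypothetical helper: prints LOAD divided by CORES, rounded to two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live system, the 1-minute figure is the first field of /proc/loadavg
# and nproc prints the number of available processors:
#   load_per_core "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

A result near 1.00 corresponds to the state where the load equals the core count; values well above 1.00 suggest that processes are queuing.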

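The 1-, 5-, and 15-minute figures from uptime also reveal the trend. If the three values are close, the load is stable; if the 1-minute value is much higher than the 15-minute value, the load is rising; if it is much lower, the load is falling.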
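The trend rule can be sketched as a small comparison. This is a minimal illustration: load_trend is a hypothetical helper, and the default tolerance of 0.2 is an arbitrary assumption for what counts as "much higher", not a value from the article.

```shell
# load_trend ONE_MIN FIFTEEN_MIN [TOLERANCE]
# Hypothetical helper: compares the 1-minute and 15-minute load averages.
# TOLERANCE (default 0.2, an assumed value) is the gap treated as significant.
load_trend() {
    awk -v one="$1" -v fifteen="$2" -v tol="${3:-0.2}" 'BEGIN {
        if (one - fifteen > tol)      print "rising"
        else if (fifteen - one > tol) print "falling"
        else                          print "steady"
    }'
}

# On a live system the three averages are the first three fields of /proc/loadavg:
#   read -r one five fifteen rest < /proc/loadavg
#   load_trend "$one" "$fifteen"
```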

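Load average and CPU utilization are different things. A high load average does not imply high CPU utilization, because the load average counts not only the processes that are using a CPU but also the processes waiting for a CPU and the processes waiting for I/O.

CPU utilization measures how busy the CPU is per unit of time. Its relationship to load average falls into three scenarios:

CPU-intensive processes: heavy CPU use raises both CPU utilization and the load average.
I/O-intensive processes: waiting for I/O raises the load average, but CPU utilization does not necessarily rise.
Many processes competing for the CPU: the load average rises, and CPU utilization rises as well.

All three scenarios can be simulated with stress, a load-testing tool for Linux, and the root cause tracked down with mpstat and pidstat. mpstat reports CPU usage per processor, and pidstat reports statistics for Linux tasks; both ship in the sysstat package. Install them as follows: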

yum install stress sysstat

Commonly used stress options:

-c, --cpu N
spawn N workers spinning on sqrt()

-i, --io N
spawn N workers spinning on sync()

-t, --timeout N
timeout after N seconds
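Using only the options listed above, the three scenarios could be generated roughly as follows. This is a hedged sketch: the exact commands the author ran are not preserved in the text, so the function names, worker counts, and the 600-second timeout are all illustrative assumptions.

```shell
# Hypothetical wrappers around stress. Worker counts and the 600-second
# timeout are illustrative defaults, not values from the original article.

# Scenario 1: N workers spinning on sqrt() (CPU-intensive).
simulate_cpu() {
    stress --cpu "${1:-1}" --timeout "${2:-600}"
}

# Scenario 2: N workers spinning on sync() (I/O-intensive).
simulate_io() {
    stress --io "${1:-1}" --timeout "${2:-600}"
}

# Scenario 3: more CPU workers than processors, so workers queue for a CPU.
# Defaults to twice the number of available processors (an assumed ratio).
simulate_contention() {
    stress --cpu "${1:-$(( $(nproc) * 2 ))}" --timeout "${2:-600}"
}
```

Run one of these in one terminal, for example simulate_cpu 1 600, and watch uptime, mpstat, and pidstat in another.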

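1. Simulating a CPU-intensive process with stress

Check the load with uptime.

Check CPU usage with mpstat. The -P option is documented as follows:

-P { cpu [,...] | ON | ALL }
Indicate the processor number for which statistics are to be reported. cpu is the processor number. Note that processor 0 is the first processor. The ON keyword indicates that statistics are to be reported for every online processor, whereas the ALL keyword indicates that statistics are to be reported for all processors.
That is, -P selects which processors to report on. The trailing 5 sets the reporting interval to 5 seconds.

CPU utilization is very high. Use pidstat to find the process consuming the CPU: the stress process is using 99% of it. The -u option means "Report CPU utilization", 5 is the interval in seconds, and 1 means stop after one report; without the 1, pidstat keeps printing indefinitely.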
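The two monitoring commands described above can be wrapped as follows. This sketch is reconstructed from the option descriptions in the text rather than copied from the author's commands. The function names are hypothetical, and the default mpstat report count of 1 is an assumption, since the text gives only the 5-second interval.

```shell
# cpu_by_processor [INTERVAL] [COUNT]
# Per-processor CPU statistics for all processors, one report every INTERVAL
# seconds, COUNT reports in total (the default COUNT of 1 is an assumption).
cpu_by_processor() {
    mpstat -P ALL "${1:-5}" "${2:-1}"
}

# cpu_by_task [INTERVAL] [COUNT]
# Per-task CPU utilization: with the defaults, one report covering 5 seconds.
cpu_by_task() {
    pidstat -u "${1:-5}" "${2:-1}"
}
```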

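2. Simulating an I/O-intensive process with stress

Check the load with uptime; it is already very high (for a single-core CPU).

Use mpstat to find the cause of the rise: %iowait on CPU 0 reaches 88.31%, which means the CPU spends most of its time waiting for disk I/O, so the load is caused by I/O-bound processes. %iowait is documented as:

%iowait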
Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
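To make the %iowait figure concrete, it can be approximated from two snapshots of the aggregate cpu line in /proc/stat, whose sixth field is the cumulative iowait time. The sketch below is an approximation under stated assumptions, not mpstat's actual implementation: iowait_percent is a hypothetical helper, and it sums the user through steal fields as the total.

```shell
# iowait_percent BEFORE AFTER
# Hypothetical helper: BEFORE and AFTER are two samples of the aggregate
# "cpu ..." line from /proc/stat. Prints the share of elapsed CPU time
# spent in iowait between the two samples, as a percentage.
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user through steal
            totals[NR] = total
            iowait[NR] = $6                        # sixth field is iowait
        }
        END {
            elapsed = totals[2] - totals[1]
            if (elapsed > 0) printf "%.2f\n", 100 * (iowait[2] - iowait[1]) / elapsed
            else print "0.00"
        }'
}

# On a live system, sample five seconds apart:
#   before=$(head -n 1 /proc/stat); sleep 5; after=$(head -n 1 /proc/stat)
#   iowait_percent "$before" "$after"
```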

To locate the specific process, run pidstat; the culprit is stress-ng-hdd. The %wait column is documented as:

Percentage of CPU spent by the task while waiting to run.

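3. Simulating a large number of processes with stress

Check the load with uptime.

mpstat shows that CPU utilization is already very high.

Use pidstat to check CPU usage per process: all four stress processes spend a high percentage of their time waiting for a CPU (%wait).

When uptime reports a high load, the cause may be high CPU utilization or a large number of I/O-bound processes. Use mpstat to check CPU usage, then pidstat to pinpoint the specific processes.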
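The overall triage can be summarized as a small decision function. This is only a sketch of the reasoning above: diagnose_load is a hypothetical helper, its inputs are assumed to be read off uptime and mpstat by hand, and the 30% iowait and 80% busy thresholds are arbitrary assumptions rather than values from the article.

```shell
# diagnose_load LOAD1 CORES BUSY_PCT IOWAIT_PCT
# Hypothetical helper. LOAD1 is the 1-minute load average, CORES the CPU
# count, BUSY_PCT the non-idle, non-iowait CPU percentage, IOWAIT_PCT the
# %iowait value from mpstat. The 30 and 80 thresholds are assumed values.
diagnose_load() {
    awk -v load="$1" -v cores="$2" -v busy="$3" -v iowait="$4" 'BEGIN {
        if (load <= cores)     print "ok"       # load within CPU capacity
        else if (iowait >= 30) print "io"       # look for I/O-bound tasks
        else if (busy >= 80)   print "cpu"      # look for CPU-heavy or queued tasks
        else                   print "unclear"  # inspect mpstat and pidstat by hand
    }'
}
```

For example, diagnose_load 3.2 1 10 88.31 reports io, matching the second scenario, in which %iowait on a single-core machine reached 88.31%.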