About the esxtop command

Original article: http://www.yellow-bricks.com/esxtop/ by Duncan Epping

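esxtop metrics and their thresholds (reference values that the original author derived from the official documentation, testing, and operational experience)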

Metrics and Thresholds

Display | Metric | Threshold | Explanation
CPU | %RDY | 10 | Overprovisioning of vCPUs, excessive usage of vSMP, or a limit has been set (check %MLMTD). See Jason’s explanation for vSMP VMs.
CPU | %CSTP | 3 | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU | %SYS | 20 | The percentage of time spent by system services on behalf of the world. Most likely caused by a high-IO VM. Check other metrics and the VM for a possible root cause.
CPU | %MLMTD | 0 | The percentage of time the vCPU was ready to run but deliberately wasn’t scheduled because that would violate the "CPU limit" settings. If larger than 0, the world is being throttled due to the limit on CPU.
CPU | %SWPWT | 5 | VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment.
MEM | MCTLSZ | 1 | If larger than 0, the host is forcing VMs to inflate the balloon driver to reclaim memory because the host is overcommitted.
MEM | SWCUR | 1 | If larger than 0, the host has swapped memory pages in the past. Possible cause: overcommitment.
MEM | SWR/s | 1 | If larger than 0, the host is actively reading from swap (vswp). Possible cause: excessive memory overcommitment.
MEM | SWW/s | 1 | If larger than 0, the host is actively writing to swap (vswp). Possible cause: excessive memory overcommitment.
MEM | CACHEUSD | 0 | If larger than 0, the host has compressed memory. Possible cause: memory overcommitment.
MEM | ZIP/s | 0 | If larger than 0, the host is actively compressing memory. Possible cause: memory overcommitment.
MEM | UNZIP/s | 0 | If larger than 0, the host is accessing compressed memory. Possible cause: the host was previously overcommitted on memory.
MEM | N%L | 80 | If less than 80, the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and uses memory "remotely" via the "interconnect". Check "GST_ND(X)" to find out which NUMA nodes are used.
NETWORK | %DRPTX | 1 | Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization.
NETWORK | %DRPRX | 1 | Dropped packets received, hardware overworked. Possible cause: very high network utilization.
DISK | GAVG | 25 | Look at "DAVG" and "KAVG", as the sum of both is GAVG.
DISK | DAVG | 25 | Disk latency most likely caused by the array.
DISK | KAVG | 2 | Disk latency caused by the VMkernel; high KAVG usually means queuing. Check "QUED".
DISK | QUED | 1 | Queue maxed out. Possibly the queue depth is set too low. Check with the array vendor for the optimal queue depth value.
DISK | ABRTS/s | 1 | Aborts issued by the guest (VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, when paths have failed or the array is not accepting any IO for whatever reason.
DISK | RESETS/s | 1 | The number of commands reset per second.
DISK | CONS/s | 20 | SCSI reservation conflicts per second. If many SCSI reservation conflicts occur, performance could be degraded due to the lock on the VMFS.
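
The thresholds above lend themselves to a simple automated check. The following is a minimal Python sketch, not part of the original article: it covers only a subset of the metrics, the names THRESHOLDS and flag_metrics are made up for illustration, and it assumes a value counts as suspicious only when it is strictly above the threshold (or strictly below it for N%L), which the table itself does not specify.

```python
# Thresholds copied from the table above (subset only).
# "above": a value greater than the threshold is suspicious.
# "below": a value smaller than the threshold is suspicious (N%L).
THRESHOLDS = {
    "%RDY": (10.0, "above"),
    "%CSTP": (3.0, "above"),
    "%SWPWT": (5.0, "above"),
    "N%L": (80.0, "below"),
    "GAVG": (25.0, "above"),
    "DAVG": (25.0, "above"),
    "KAVG": (2.0, "above"),
}


def flag_metrics(sample):
    """Return the sorted names of metrics in `sample` that cross their threshold."""
    flagged = []
    for name, value in sample.items():
        if name not in THRESHOLDS:
            continue  # metric not covered by this sketch
        limit, direction = THRESHOLDS[name]
        if direction == "above" and value > limit:
            flagged.append(name)
        elif direction == "below" and value < limit:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical readings for one VM, typed in by hand.
example = {"%RDY": 14.2, "%CSTP": 0.5, "N%L": 62.0, "KAVG": 0.3}
print(flag_metrics(example))  # ['%RDY', 'N%L']
```

The values are reference points rather than hard limits, so a flagged metric is a hint to look closer, not a diagnosis.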


Basic usage

Log in through the local console or over SSH and run esxtop to start it:

esxtop

The default refresh interval is 5 seconds. Press s and enter a positive integer to change it:

s 2

Use the following keys to switch views and adjust the display:

c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device (includes NFS as of 4.0 Update 2)
v = disk VM
p = power states
V = only show virtual machine worlds
e = Expand/Rollup CPU statistics, show details of all worlds associated with group (GID)
k = kill world, for tech support purposes only!
l = limit display to a single group (GID), enables you to focus on one VM
# = limiting the number of entities, for instance the top 5
2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view
e = statistics broken down per world
6 = statistics broken down per world

Adding and removing fields

f <type the letter of the field, as shown in the on-screen prompt>

Changing the field order

o <type the field's letter to move it: upper case moves it left, lower case moves it right>

Saving the configuration

W

If you keep the default file name when saving, the saved configuration becomes the default.

Getting help

?

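In large environments esxtop itself can consume a lot of CPU because of the amount of data it has to collect and compute. A command-line option can lock esxtop to specific entities and specific information, which reduces the CPU it consumes: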

esxtop -l

For more information, see the reference linked from the original article.


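Capturing data in batch mode

First decide which information you need, add or remove fields accordingly (f), and save the selection to the configuration file (W).

Then run the following command to capture the data and save the result to a CSV file: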

esxtop -b -d 2 -n 100 > esxtopcapture.csv

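Here "-b" selects batch mode, "-d 2" sets a 2-second interval between samples, and "-n 100" takes 100 samples. Sampling 100 times at 2-second intervals covers 200 seconds of data. To capture all metrics, add the "-a" option.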
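
The relationship between the interval, the sample count, and the capture length is easy to get wrong when planning a longer run. The following Python sketch is only an illustration: the helper names build_batch_command and capture_seconds are made up, and it uses only the -b, -a, -d, and -n options described above.

```python
def build_batch_command(delay_s, iterations, all_counters=False):
    """Build the esxtop batch-mode argument list from the options described above."""
    args = ["esxtop", "-b"]
    if all_counters:
        args.append("-a")  # capture every counter, not only the saved field selection
    args += ["-d", str(delay_s), "-n", str(iterations)]
    return args


def capture_seconds(delay_s, iterations):
    """Approximate length of the capture: sampling interval times sample count."""
    return delay_s * iterations


print(" ".join(build_batch_command(2, 100)))  # esxtop -b -d 2 -n 100
print(capture_seconds(2, 100))                # 200
```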

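If there are many entities or the capture runs for a long time, the output can become very large. In that case it can be compressed with gzip: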

esxtop -b -a -d 2 -n 100 | gzip -9c > esxtopoutput.csv.gz

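Note that data captured this way does not include virtual machines created after the command was started, or virtual machines that were vMotioned in from another host. In this respect it behaves like the -l option.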
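
A gzip-compressed capture does not have to be unpacked before it is inspected. The following Python sketch, which is not from the original article, reads the header row straight from the compressed file. The helper name read_header is made up, and the tiny file written in the demonstration is invented data that stands in for a real esxtopoutput.csv.gz.

```python
import csv
import gzip
import os
import tempfile


def read_header(gz_path):
    """Return the column names from the first row of a gzip-compressed CSV."""
    with gzip.open(gz_path, "rt", newline="") as handle:
        return next(csv.reader(handle))


# Demonstration on a made-up file; a real capture has far more columns.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "esxtopoutput.csv.gz")
    with gzip.open(path, "wt", newline="") as handle:
        handle.write('"Time","Counter A","Counter B"\n"t0","1","2"\n')
    print(read_header(path))  # ['Time', 'Counter A', 'Counter B']
```

Counting the columns this way is a quick check of how wide a capture is before loading it into another tool.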


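Analyzing the data

There are several options. The official approach is to use Windows Performance Monitor or Excel. Tools that can visualize the data, such as visualEsxtop and esxplot, are also available at http://labs.vmware.com/flings/.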
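
For a quick look without any of these tools, a batch capture can also be summarized with a short script. The Python sketch below finds the peak value per column for one counter. It assumes the perfmon-style header layout (a timestamp column followed by headers shaped like \\host\Group Cpu(id:name)\% Ready); verify this against your own capture. The function name peak_by_counter and all sample values are made up for illustration.

```python
import csv
import io


def peak_by_counter(csv_text, counter_suffix):
    """Return {column header: max value} for columns whose header ends with counter_suffix."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    peaks = {}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank, truncated, or non-numeric cells
            name = header[i]
            peaks[name] = max(peaks.get(name, value), value)
    return peaks


# Invented sample in the assumed layout: one timestamp column, two VMs.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1001:vm-a)\% Ready","\\esx01\Group Cpu(1002:vm-b)\% Ready"',
    r'"01/01/2024 10:00:00","2.50","11.20"',
    r'"01/01/2024 10:00:02","4.75","9.80"',
])

for column, peak in peak_by_counter(SAMPLE, "% Ready").items():
    print(column, peak)
```

For a real capture, read the file contents into a string first and pass the counter name you are interested in as the suffix.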


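Miscellaneous

In practice the display may be incomplete because of the number of entities, the width of the fields, or the screen resolution. You can restrict the view by exporting the entity list, editing it, and importing it again: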

esxtop -export-entity filename

After exporting, edit the file and comment out the entries you do not need, then import it:

esxtop -import-entity filename

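The following commands look up the information for a specific virtual machine from the command line. Replace <virtualmachinename> with the name of your VM (untested):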

VMWID=`vm-support -x | grep <virtualmachinename> |awk '{gsub("wid=", "");print $1}'`
VMXCARTEL=`vsish -e cat /vm/$VMWID/vmxCartelID`
vsish -e cat /sched/memClients/$VMXCARTEL/SchedGroupID