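[OS] Introduction to NMON and How to Use It

1. Purpose

This document introduces the operating system monitoring tool nmon: its concepts, usage, and command-line parameters. It guides operations staff in using nmon to monitor resource usage on AIX/Linux systems, collect the monitoring results and generated data files, and produce system performance analysis reports.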

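2. Introduction to Nmon

Nmon (Nigel's Monitor) is a free tool from IBM for monitoring system resources on AIX and Linux. It collects server resource usage and writes it to a data file, which can then be analyzed statistically with an Excel-based analysis tool (nmon analyser).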

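2.1  Features

nmon displays all the important performance tuning information on a single screen and updates it dynamically. This efficient tool works on any dumb terminal, telnet session, or even a dial-up line. It also does not consume many CPU cycles, typically less than 2 percent (on newer machines, less than 1 percent). nmon displays the data on a dumb-terminal screen and refreshes it every two seconds; you can easily change this interval to a longer or shorter period. If you stretch the window and display the data in X Windows, VNC, PuTTY, or a similar window, nmon can output a large amount of information at once.

nmon can also capture the same data to a text file for later analysis and graphing. The output file uses a spreadsheet-friendly comma-separated (.csv) format.

nmon for Linux is now open source and hosted on SourceForge at http://nmon.sourceforge.net.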

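2.2  Components

Using nmon involves two cooperating parts: the nmon tool and the nmonanalyser analysis program. nmon generates the performance data file; nmonanalyser then takes that file as input, produces an Excel spreadsheet, and automatically generates the corresponding charts, so that changes in OS performance (CPU, I/O, memory, and so on) can be observed visually.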

2.3  Supported Environments

nmon runs on:

· AIX 4.1.5, 4.2.0, 4.3.2, and 4.3.3 (nmon Version 9a: this version is functionally frozen and will not be developed further.)
· AIX 5.1, 5.2, and 5.3 (nmon Version 10: this version supports AIX 5.3 and POWER5 processor-based machines, with support for SMT and shared-CPU micro-partitions.)
· Linux SUSE SLES 9, Red Hat EL 3 and 4, and Debian on pSeries p5 and OpenPower
· Linux SUSE, Red Hat, and many recent distributions on x86 (Intel and AMD in 32-bit mode)
· Linux SUSE and Red Hat on zSeries or mainframe

nmon is updated roughly every six months, or when new operating system releases become available.

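2.4  Capabilities

nmon gives AIX and Linux performance specialists a way to monitor and analyze performance data, including:

· CPU utilization
· Memory usage
· Kernel statistics and run queue information
· Disk I/O rates, transfers, and read/write ratios
· Free space in file systems
· Disk adapters
· Network I/O rates, transfers, and read/write ratios
· Paging space and paging rates
· CPU and AIX specifications
· Top resource-consuming processes
· IBM HTTP Web cache
· User-defined disk groups
· Machine details and resources
· Asynchronous I/O (AIX only)
· Workload Manager (WLM) (AIX only)
· IBM TotalStorage Enterprise Storage Server (ESS) disks (AIX only)
· Network File System (NFS)
· Dynamic LPAR (DLPAR) changes (pSeries p5 and OpenPower only, for AIX or Linux)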

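2.5  Obtaining the Software

Both nmon and nmonanalyser can be downloaded from the IBM Wiki.

1) nmon download:
Location: IBM Wiki
http://www-941.haw.ibm.com/collaboration/wiki/display/WikiPtype/nmon
[Screenshot: nmon download page]

For example, if the system under test is AIX 5.3, download nmon4aix12e.zip. The archive contains several files.
[Screenshot: contents of nmon4aix12e.zip]

In this package the nmon file is a shell script: when run, it calls the other files in the archive to generate the performance data, and the tool's parameters are passed to this nmon script.

2) nmonanalyser download:
Location: IBM Wiki
http://www-941.haw.ibm.com/collaboration/wiki/display/Wikiptype/nmonanalyser
[Screenshot: nmonanalyser download page]

For example, download version V3.3 of nmonanalyser.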

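3. Using Nmon

3.1  Downloading

1) Before downloading, determine the operating system release. This document uses the company server 192.168.40.212 as an example. Log in to the server and run a command to obtain the OS version information:

[Screenshot: OS version query output]

The output shows Enterprise Linux Server release 5.5.

2) Go to the nmon download page on SourceForge (http://nmon.sourceforge.net/pmwiki.php?n=Site.Download) and download the matching nmon package: nmon_linux_14g.tar.gz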

3.2  Installation

1) Log in as root and create a directory: # mkdir /nmon

2) Upload the downloaded nmon package via FTP to /nmon on server 192.168.40.212.

3) Change into that directory: # cd /nmon

4) Extract the archive: # tar xzvf nmon_linux_14g.tar.gz

5) Make the binary executable: # chmod +x nmon_x86_rhel54

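3.3  Interactive Screen

1) Run ./nmon_x86_rhel54 to enter the nmon monitoring screen:

[Screenshot: nmon start-up screen]

2) Press "c" to view CPU usage.

[Screenshot: CPU usage]

3) Press "m" to view memory usage.

[Screenshot: memory usage]

4) Press "d" to view disk I/O.

[Screenshot: disk I/O]

5) Press "h" to view the help information.

[Screenshot: help screen]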

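3.4  Usage

3.4.1 Real-time Monitoring

Once set up, run "# ./nmon" to start the program, then use single-key shortcuts to show the resource metrics you care about: "c" for CPU, "d" for disks, "t" for top processes, "m" for memory, "n" for network, and so on. The full list of shortcuts is available from the help screen (key "h"). On Linux this shows CPU, memory, and process information, including the user, system, wait, and idle CPU percentages, free memory, cache size, and per-process CPU consumption.

This mode is highly real-time and shows immediately how the system behaves under load: the utilization of each CPU, memory in use, network traffic, and disk reads and writes. All of these values refresh live.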

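3.4.2 Background Monitoring

To support performance testing, we often need to record resource consumption over a period of time. To do so, run the following command in a remote session:

/nmon/nmon_x86_rhel54 -f -N -m /nmon/log -s 30 -c 120
The parameters mean:
  -f write the output file in the standard format: <hostname>_YYMMDD_HHMM.nmon (for example sjfx212_120318_1723.nmon)
  -N include NFS sections
  -m change to this directory before saving the log file
  -s sample every n seconds, here 30
  -c number of samples to take, here 120, so the monitoring time is 120*30/3600 = 1 hour
    The formula for this count in terms of hours is c = h*3600/s; for example, to monitor for 10 hours with a sample every 30 seconds, c = 10*3600/30 = 1200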
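
The formula c = h*3600/s is easy to get wrong by hand, so a small helper can be useful. The following is a minimal sketch; the function name nmon_count is made up for illustration and is not part of nmon.

```shell
#!/bin/sh
# nmon_count HOURS INTERVAL_SECONDS
# Hypothetical helper: prints the value to pass to -c,
# computed as c = h * 3600 / s with integer arithmetic.
nmon_count() {
    hours=$1
    interval=$2
    echo $(( hours * 3600 / interval ))
}

nmon_count 1 30     # one hour at 30-second intervals
nmon_count 10 30    # ten hours at 30-second intervals
```

The result could then be substituted into the command line, for example -s 30 -c "$(nmon_count 10 30)".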

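Once started, the command creates the monitoring file in the directory given by -m (/nmon/log here) and keeps appending resource data until all 120 samples have been collected, that is, one hour of monitoring. This happens automatically with no manual intervention, so testers can carry on with other work. To stop monitoring early, find the process ID with "# ps -ef | grep nmon" and kill that process.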

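3.4.3 Scheduled Tasks

Besides short-term monitoring for performance tests, the system can also be monitored on a regular schedule as a reference for operations and maintenance. Set this up as follows:

1) Run: # crontab -e

2) Append the following line at the end:
0 8 * * 1,2,3,4,5  /nmon/
Meaning:
Monday to Friday, starting at 08:00, monitor for 10 hours (until 18:00) and write the output to /nmon/log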

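4. Nmon Monitoring Results

4.1  Generating the Result File

Background and scheduled monitoring produce files with the .nmon extension. These files record the system resource data and need to be interpreted with the analysis tool (nmon analyser).

1) Use an FTP tool to copy the result file /nmon/log/sjfx212_120318_1723.nmon from the server to the local machine.

2) Open the file nmon analyser v33g.xls from the nmon_analyser.zip package, click the "Analyse nmon data" button, and select the sjfx212_120318_1723.nmon file downloaded earlier.

Excel may have disabled macros. If so, click "Options" next to the security warning and allow macros to run:

[Screenshot: Excel macro security warning]

3) The analysis result file sjfx212_120318_1723.nmon.xlsx is generated together with charts that show the system resource usage visually.

[Screenshot: generated analysis workbook]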

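4.2  Main Performance Metrics

· System summary (Excel tab 'SYS_SUMM'): the blue line shows CPU utilization over time; the pink line shows disk I/O over time.

· Disk read/write summary (Excel tab 'DISK_SUMM'): blue is the disk read rate in KB/sec; purple is the disk write rate in KB/sec.

· Memory summary (Excel tab 'MEM'): the curve shows free memory (MB).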

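4.3  Worksheet Overview

A brief description of each worksheet in the generated result:

1) System summary: tab SYS_SUMM

The sheet shows the host name, the run date, system CPU usage (blue line), and system I/O (pink line). The left vertical axis is CPU utilization (user% + sys%), the horizontal axis is elapsed time (one hour in this example), and the right vertical axis is disk transfers (Disk xfers). Below the chart are summary statistics: system I/O (average, maximum, and time of the maximum over the period) and system CPU usage.

Option: Description
· User%: percentage of CPU time spent in user processes
· Wait%: percentage of CPU idle time during which all threads were blocked waiting for an I/O request to complete
· Sys%: percentage of CPU time spent in system threads and interrupts
· Idle%: percentage of time the CPU was idle
· CPU%: overall CPU utilization percentage

Remark: when a CPU is fully utilized, a balanced split between the categories is roughly:
65% - 70% User Time
30% - 35% System Time
0% - 5%   Idle Time

[Screenshot: SYS_SUMM sheet]

Notes:

If the CPU shows I/O wait, there may be an I/O or memory bottleneck. The main causes of I/O wait are:

· Insufficient memory causing frequent swapping, which creates an I/O bottleneck on swap space
· Poorly distributed data on the disks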

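2) System information: tab AAA

The sheet mainly contains the command that was run, the number of CPUs on the host (4), the operating system kernel version, the host name, and similar information.

[Screenshot: AAA sheet]

3) Detailed system information: tab BBBP

The sheet mainly contains the operating system version, disk information, CPU model and clock speed, memory information, network adapter information, and so on.

[Screenshot: BBBP sheet]

4) CPU usage: tabs CPU_ALL, CPU_SUMM, CPU001, CPU002, CPU003, CPU004

A summary of CPU usage on the host and the activity of each individual CPU.

[Screenshots: CPU summary and per-CPU sheets]

5) Disk read/write summary: tabs DISK_SUMM, DISKBSIZE, DISKBUSY, DISKREAD, DISKWRITE, DISKXFER

Read, write, and I/O statistics for the disks, and the read/write activity of each disk partition. The DISKBUSY sheet mainly reflects usage of the local disks, while DISK_SUMM covers all disks, both local and on external storage (ESS, EMC, FASt, and HDS).

[Screenshots: disk sheets]

6) Memory usage: tab MEM

Statistics on free and used memory, swap, cache, and so on.

[Screenshot: MEM sheet]

7) Network: tabs NET, NETPACKET

Shows the system's network activity and the number of packets read and written by each network adapter.

[Screenshot: NET sheet]

8) Processes: tab PROC

Shows the average number of running threads and of threads waiting to be switched in. For RunQueue, the run queue of each processor should not exceed 1 to 3 threads.

[Screenshots: PROC sheet]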

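9) Virtual memory activity: tab VM

A Linux-specific set of metrics taken mainly from /proc/vmstat. The two charts show paging and swap activity. If the system constantly pages out a large number of KB (pgpgout/s), it needs more memory.

[Screenshots: VM sheet]

10) Paging: tab PAGE

Records paging activity on AIX. This sheet mainly records the paging rate and the page scan:free ratio. The paging rate should not exceed 5 per second, and when the page scan:free ratio stays above 4, memory and paging space usage deserve close attention.

[Screenshots: PAGE sheet]

Explanation:

Heavy memory swapping severely degrades system performance, especially when database files are created on a file system (JFS and JFS2). In that case frequently accessed data exists both in the SGA and in the file cache. Caching the same data twice in memory lowers memory efficiency, which causes frequent swapping, creates an I/O bottleneck, and reduces overall system performance.

11) Sample times: tab ZZZZ

Records the points in time at which nmon sampled the system.

[Screenshot: ZZZZ sheet]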

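5. Nmon Monitoring Case Studies

This section describes the common symptoms of performance degradation found with nmon and how to handle them.

5.1  Common Symptoms and Causes

Performance degradation mainly means that system performance gradually declines over time (assuming that the system load does not change noticeably while this happens). CPU or memory usage that grows over time while the system is running also counts as performance degradation in the broad sense.

In production, end customers are usually the first to notice and report performance degradation. In the narrow sense, therefore, performance degradation refers to operating metrics that change over time, such as throughput falling, page response time rising, or both.

Some causes of performance degradation are listed below:

· Application resource usage. This is mainly a memory problem: memory fragmentation or memory leaks in the application server make garbage collection more expensive over time. Accumulating temporary files on disk can also increase disk access costs.

· Application design. Scalability or reliability flaws in the application design make the running cost grow over time or as business objects accumulate.

· Database access. This covers many types of problem, such as tuning parameters, table structure or index design, and garbage data. What they have in common is that the cost of particular database operations performed by the application grows over time.

· Server software resource usage. Although unlikely, application servers, database servers, and other server programs are software too and may have performance degradation problems of their own. Their own testing may have missed certain performance problems that are then triggered under a user's particular conditions, so that operating system resources used by the server program leak and performance degrades.

· Test case design. Performance testing can reveal "false" degradation. For example, the test case may be designed on the assumption that system load stays constant during the run, while the actual implementation makes the load, or the content processed by a particular page, grow over time. The test tool's report can then show performance degradation.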

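5.2  Worked Examples

5.2.1 Example 1

Figure 5-1  nmon operating system monitoring summary chart

The chart shows that over the 12-hour test, disk transfers (Disk xfers) gradually increased while CPU utilization gradually decreased. A closer look at the individual CPUs shows that the Wait percentage of CPU 1 rose markedly, as in Figure 5-2. This indicates that the falling CPU utilization was caused by waiting for disk I/O.

Figure 5-2  nmon single-CPU usage chart

Next, the disk transfer summary in Figure 5-3 shows that the volume of disk writes did not increase noticeably, but the volume of disk reads clearly grew over time.

Figure 5-3  nmon disk transfer summary chart

Based on the disk transfer summary, and after ruling out reads by the application server as the cause of the degradation, it is fairly certain that the growing disk reads were caused by the database.

Then, analyzing the output of the DB2 snapshot monitor shows that the proportion of physical reads for data and index pages in the DB2 buffer pool is very high, as in the following example: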

Buffer pool data logical reads             = 5502388

Buffer pool data physical reads            = 430671

Buffer pool temporary data logical reads   = 0

Buffer pool temporary data physical reads  = 0

……

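The buffer pool physical read ratio (that is, the buffer pool miss ratio) is about 7.8% (430671 / 5502388), far above the 1% warning threshold. Comparing snapshots taken at different times also shows that the physical read ratio tends to grow over time.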
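
The ratio can be recomputed from the two snapshot counters. This is only a sketch: the function name miss_ratio is made up for illustration, and it simply divides physical reads by logical reads for the data pages quoted in the snapshot.

```shell
#!/bin/sh
# miss_ratio LOGICAL_READS PHYSICAL_READS
# Hypothetical helper: prints physical reads as a percentage
# of logical reads, rounded to two decimals.
miss_ratio() {
    awk -v l="$1" -v p="$2" 'BEGIN { printf "%.2f\n", p * 100 / l }'
}

# Data page counters from the DB2 snapshot quoted in this example.
miss_ratio 5502388 430671
```

Running the same calculation on snapshots taken at different times makes the upward trend easy to track.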

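At this point the performance degradation can be suspected to stem from an improperly set DB2 buffer pool parameter. The database configuration shows BUFFPAGE set to 10000, which is clearly too small for the data volume used in this test case. After increasing BUFFPAGE tenfold to 100000 and rerunning the performance test, the degradation essentially disappeared.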

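5.2.2 Example 2

This example uses data collected by monitoring the data warehouse of the Hebei tax source management platform with nmon, analyzed with the results generated by nmonanalyser.

1) Collecting basic server information

[Screenshots: system information sheets]

From the data above, the basic configuration of the Hebei data warehouse server is:

· Host name: HE_SSGLY_DB_01
· Operating system version: AIX 5.3.0.44 build 5300-04
· Operating system kernel: HW-type=CHRP=Common H/W Reference Platform Bus=PCI LPAR=Dynamic Multi-Processor 64 bit
· Host model: IBM p5 595 (9119-595)
· Network configuration: IP Address: 75.16.16.191 Sub Netmask: 255.255.248.0 Gateway: 75.16.16.100
· Storage: EMC storage
· Physical memory: 49152 MB
· Network adapters: 2 adapters at 1024M/S

2) File system usage

[Screenshots: file system sheets]

From the data above, the following file system information is available for the server:

· File system sizes and usage
· File system mount points
· The file system type is JFS2 (Journaled File System 2)

3) System resource usage

[Screenshot: system summary chart]

The chart shows that CPU utilization is high during 8:00-11:30 and 14:00-17:20, while the periods of high I/O all fall between 21:00 and 5:50 the next day. This matches the server running ETL processing at night and mostly handling ordinary platform transactions during working hours. Looking further at CPU usage:

[Screenshots: CPU usage charts]

The high CPU utilization is mostly user processes. Looking at CPU-11 alone, between 22:00 and 0:00 the Wait percentage rises markedly while the user percentage is very low, and system I/O rises clearly at the same time. This indicates heavy disk I/O during that period, and that the drop in CPU utilization is due to waiting for disk I/O. In practice the server was running ETL data processing during this period, so there was indeed a large amount of data transfer and disk reading and writing.

[Screenshot omitted]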

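5.2.3 Example 3

This example uses nmon for real-time system monitoring.

Upload the nmon script to the server and run it directly: # ./nmon or # /tmp/nmon/nmon

[Screenshots: nmon running on AIX 5.3]

Press c, t, n, and m to see CPU usage, the top resource-consuming processes, network activity, and memory usage.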




Links:

NMON home page

http://www-941.haw.ibm.com/collaboration/wiki/display/Wikiptype/nmon

NMON_Analyser home page

http://www-941.haw.ibm.com/collaboration/wiki/display/Wikiptype/nmonanalyser

User Forum

http://www.ibm.com/developerworks/forums/dw_forum.jsp?forum=749&cat=56





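I. Introduction to nmon

nmon is a free tool for analyzing AIX and Linux performance (IBM developed it mainly for its own AIX operating system, but it also runs on Linux). nmon_analyser is a companion tool that converts the reports generated by nmon into Excel workbooks for viewing.

nmon displays all the important performance tuning information on a single screen and updates it dynamically. It does not consume much CPU, typically less than 2 percent.

Download URLs: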

http://nmon.sourceforge.net/pmwiki.php?n=Site.Download

http://sourceforge.net/projects/nmon/files/?source=navbar

 

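Be sure to choose the build that matches your machine.

Checking the Linux distribution version:

Linux has hundreds of distributions. To check the distribution release, taking CentOS as an example, run lsb_release -a.

This command works on distributions that have the LSB tools installed, including Red Hat, SuSE, and Debian.

Checking whether the system is 32-bit or 64-bit

Linux systems also come in 32-bit and 64-bit variants. Using the kernel query commands, run

uname -a  or  more /proc/version

An x86_64 after the kernel version indicates a 64-bit system.

# uname -a
x86_64 means a 64-bit kernel running a 64-bit system.
i386 or i686 means a 32-bit kernel running a 32-bit system.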
 

Ways to check the kernel:

1. uname -a    # show detailed kernel information

Linux localhost.localdomain 2.6.18-92.1.6.el5xen #1 SMP Wed Jun 25 12:56:52 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

2. cat /etc/issue    # show the distribution release

Scientific Linux SL release 5.2 (Boron)
Kernel \r on an \m

3. cat /proc/version

Linux version 2.6.18-92.1.6.el5xen (brewbuilder@norob.fnal.gov) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Wed Jun 25 12:56:52 EDT 2008

 

 

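Checking the operating system word size:

1. ls /    # if a lib64 directory exists, the operating system is 64-bit
2. getconf LONG_BIT    # prints 32 on a 32-bit system and 64 on a 64-bit system

On a 32-bit system, int and long are generally both 4 bytes.

On a 64-bit system, int is still 4 bytes, but long becomes 8 bytes.

On Linux, "getconf WORD_BIT" and "getconf LONG_BIT" report the number of bits in a word and in a long.
On a 64-bit system they should return 32 and 64 respectively.

3. In the output of uname -a, x86_64 indicates a 64-bit system; i386 and similar indicate a 32-bit system.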
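
These checks can be combined into one small script when picking an nmon build. This is a sketch under assumptions: arch_bits is a made-up helper, and its list of machine strings is illustrative rather than complete.

```shell
#!/bin/sh
# arch_bits MACHINE
# Hypothetical helper: maps a `uname -m` machine string
# to "64", "32", or "unknown".
arch_bits() {
    case "$1" in
        x86_64|amd64|aarch64|ppc64|ppc64le|s390x) echo 64 ;;
        i386|i486|i586|i686)                       echo 32 ;;
        *)                                         echo unknown ;;
    esac
}

machine=$(uname -m)
echo "machine:  $machine (kernel bits: $(arch_bits "$machine"))"
echo "LONG_BIT: $(getconf LONG_BIT)"
```

If the two lines disagree (for example a 64-bit kernel with a 32-bit userland), choose the nmon build that matches the userland.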

 

 

 

II. Download and Installation

The nmon download is just a single executable file, but the builds are separated by system type, so choose the one that matches your system.

Run more /etc/issue to check the system release.

First upload the downloaded nmon_x86_64_centos6 file to the Linux server.

[root@localhost source]#cp nmon_x86_64_centos6 /usr/bin

[root@localhost source]#cd /usr/bin

[root@localhost source]# chmod 755 nmon_x86_64_centos6

[root@localhost source]# ./nmon_x86_64_centos6 (run from the directory that contains nmon_x86_64_centos6)

[root@localhost source]# env

[root@localhost source]# mv nmon_x86_64_centos6 nmon

[root@localhost source]# nmon (nmon is now available as a global command)

 

Monitoring setup

15 minutes = 900 s (300 samples at 3-second intervals)

#alias nmon15='nmon -f -s3 -c300 -m /root/qumf/'

 

 

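Using the nmon command:

#nmon -s 300 -c 288 -f -m /tmp

-s 300: collect data every 300 seconds.

-c 288: collect 288 samples; 300*288 = 86400 seconds, exactly one day of data, so a single run produces one data file covering one day.

-m /tmp: the directory where the data file is written.

-f: write to a data file whose name includes the date and time.

Monitoring can also be automated, for example one directory per day and one file per hour, sampling every minute or every 5 minutes.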
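
One way to arrange "one directory per day, one file per hour" is a small wrapper started from cron every hour. The sketch below only prints the commands it would run (a dry run). The paths /home/nmon and /usr/local/nmon and the function name hourly_cmd are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical locations; adjust to your installation.
NMON_BASE=${NMON_BASE:-/home/nmon}
NMON_BIN=${NMON_BIN:-/usr/local/nmon}

# hourly_cmd DAY INTERVAL_SECONDS
# Prints the nmon command line that captures one hour of data
# into the per-day directory $NMON_BASE/DAY.
hourly_cmd() {
    day=$1
    interval=$2
    count=$(( 3600 / interval ))
    echo "$NMON_BIN -f -s $interval -c $count -m $NMON_BASE/$day"
}

today=$(date +%Y%m%d)
# Dry run: show what an hourly cron job would execute.
echo "mkdir -p $NMON_BASE/$today"
hourly_cmd "$today" 60
```

To use it for real, the two printed commands would be executed instead of echoed, and the script scheduled with a crontab entry that fires at minute 0 of every hour.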

 

 

 

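Parameter summary:
-s10 collect data every 10 seconds.
-c60 collect 60 samples, i.e. ten minutes of data.
-f write to a data file whose name includes its creation time.
-m the directory in which the data file is stored.

nmon -f -s 10 -c 60

With these values the run lasts 10*60 = 600 seconds, exactly 10 minutes, so a single run produces one file containing 10 minutes of data. The command creates the output file in the current directory, named <hostname>_date_time.nmon, where <hostname> is the host name of the machine.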

 

More usage
crontab -e
0 0 * * * /usr/local/nmon -s300 -c288 -f -m /home/nmon/ > /dev/null 2>&1
Meaning:
300*288 = 86400 seconds, exactly one day of data.
0 8 * * 1,2,3,4,5 /usr/local/nmon -f -N -m /home/nmon/log -s 30 -c 1200
Meaning:
Monday to Friday, starting at 08:00, monitor for 10 hours (until 18:00) and write the output to /home/nmon/log.

 

 

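Example 1:

Collect data automatically, one file per day.
Add one entry to crontab:
0 0 * * * nmon -s300 -c288 -f -m /home/ > /dev/null 2>&1
300*288 = 86400 seconds, exactly one day of data.

Example 2:
A. Run: # crontab -e
B. Append the following line at the end:
 0 8 * * 1,2,3,4,5 /nmon/script/nmon_x86_rhel52 -f -N -m /nmon/log -s 30 -c 1200
Meaning:
 Monday to Friday, starting at 08:00, monitor for 10 hours (until 18:00) and write the output to /nmon/log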

 

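III. nmon Data Collection

1. Data collection

To monitor system usage over a period of time and record the results, run the following command:

# ./nmon -f -t -s30 -c 180

Parameters:

· -f: write the output file in the standard format: <hostname>_YYMMDD_HHMM.nmon
· -t: include the top resource-consuming processes in the output
· -s30: collect data every 30 seconds
· -c180: collect 180 samples in total

After you press Enter, a file named hostname_timeSeries.nmon is created automatically in the current directory; nmon names its output files <server name>_<date and time>.nmon.

Note:

Once started, the command creates the monitoring file in the current directory and keeps appending resource data until all 180 samples have been collected, that is, 90 minutes of monitoring. This happens automatically with no manual intervention, so testers can carry on with other work. To stop monitoring early, find the process ID with "# ps -ef | grep nmon" and kill that process.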

 

[root@localhost source]# hostname

linux_test

[root@localhost source]# ./nmon -f -s 10 -c 60

[root@localhost source]# ps -ef | grep nmon

root 17815 1 0 08:22 pts/1 00:00:00 ./nmon -f -s 10 -c 60

root 17888 6977 0 08:22 pts/1 00:00:00 grep nmon

[root@localhost source]# ls linux_test_120724_0822.nmon

linux_test_120724_0822.nmon

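After running the command, a file is created in the current directory whose name starts with the host name linux_test, followed by the run date and time, and ends in .nmon. The ps command shows the corresponding nmon process; once the 10 minutes have passed, that process disappears.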
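
When many capture files accumulate, it can help to split a file name back into host, date, and time. The following is a minimal sketch using only POSIX parameter expansion; parse_nmon_name is a made-up helper name, and it assumes the <hostname>_<YYMMDD>_<HHMM>.nmon pattern seen in the listing above.

```shell
#!/bin/sh
# parse_nmon_name FILE
# Hypothetical helper: splits <hostname>_<YYMMDD>_<HHMM>.nmon into parts.
# Host names may contain underscores, so parsing starts from the right.
parse_nmon_name() {
    base=$(basename "$1" .nmon)
    hhmm=${base##*_}        # text after the last underscore
    rest=${base%_*}         # everything before the last underscore
    yymmdd=${rest##*_}      # date field
    host=${rest%_*}         # remaining prefix is the host name
    echo "host=$host date=$yymmdd time=$hhmm"
}

parse_nmon_name linux_test_120724_0822.nmon
```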

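linux_test_120724_0822.nmon is the generated data file, and all the collected information is recorded in it. Viewing it with more shows text that is hard to read directly, so it needs to be converted into a readable Excel file. First copy linux_test_120724_0822.nmon to the local Windows machine. Then download nmonanalyser to the same machine from http://www.ibm.com/developerworks/wikis/display/Wikiptype/nmonanalyser

After extracting the archive you will find two files: a Word document describing nmonanalyser, and the nmonanalyser Excel workbook.

The Excel workbook is the file we need; double-click it to open it.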

 

IV. Generating Graphical Results

Download nmon analyser (a free tool that generates the performance report):

Download URL:

https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/b7fc61a1-eef9-4756-8028-6e687997f176/attachment/721e9797-b5fc-41d7-9b2f-5bd2aa2c8f7d/media/nmon_analyser_34a.zip

 

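Extracting the archive yields nmon analyser v34a.xls.

Double-click nmon analyser v34a.xls to open it.

Lower Excel's macro security level so that the workbook's macros can run.

In the nmon_analyser workbook, load the nmon data and save the resulting Excel file.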







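  1. Overview

  Monitoring is an indispensable part of diagnosing system problems and tuning system performance. Watching operating system resource usage with an OS monitoring tool indirectly reflects how each server program is running, and analyzing the results helps to quickly narrow down the scope of a problem or locate a performance bottleneck.

  nmon is a monitoring and analysis tool widely used on AIX and various Linux systems. Compared with other system resource monitoring tools, the information nmon records is fairly comprehensive: it captures system resource usage in real time while the system is running and can write the results to a file, from which the nmon_analyser tool then produces data sheets and graphical results.

  The data recorded by nmon covers the following areas (which are also the resources we focus on when looking for problems):

  ● CPU utilization
  ● Memory usage
  ● Disk I/O rates, transfers, and read/write ratios
  ● File system usage
  ● Network I/O rates, transfers, and read/write ratios, error statistics, and packet sizes
  ● Top resource-consuming processes
  ● Machine details and resources
  ● Paging space and paging I/O rates
  ● User-defined disk groups
  ● Network file systems

  In addition, on AIX nmon can monitor some other information, such as asynchronous I/O.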

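  2. Downloading and installing nmon

  nmon can be downloaded free of charge from IBM's website at http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmon.

  To install nmon:

  1) Log in to the system as root.

  2) Create a directory: # mkdir /test

  3) Upload nmon to /test via FTP, or copy it into /test from other media.

  4) Make it executable: # chmod +x nmon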

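  3. nmon data collection

  3.1 Data collection

  To monitor system usage over a period of time and record the results, run the following command:

  # ./nmon -f -t -s 30 -c 180

  ● -f: write the output file in the standard format: <hostname>_YYMMDD_HHMM.nmon
  ● -t: include the top resource-consuming processes in the output
  ● -s 30: collect data every 30 seconds
  ● -c 180: collect 180 samples in total

  After you press Enter, a file named hostname_timeSeries.nmon is created automatically in the current directory. For example, if the host name is test1, the generated file is test1_090308_1313.nmon.

  The sort command can be used to convert the nmon result file into a csv file:

  # sort -A test1_090308_1313.nmon > test1_090308_1313.csv

  After the sort command completes, the file test1_090308_1313.csv is created in the current directory.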
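
  Note that -A is an option of the AIX sort command (byte-order ASCII collation); GNU sort on Linux does not have it. A common equivalent on Linux is to force the C locale. The sketch below demonstrates this on a tiny stand-in file with made-up values rather than a real capture.

```shell
#!/bin/sh
# Build a tiny stand-in for an nmon capture (made-up values, illustration only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
ZZZZ,T0001,13:13:05,08-MAR-2009
CPU_ALL,T0001,10.0,5.0,1.0,84.0
MEM,T0001,2048.0,512.0
ZZZZ,T0002,13:13:35,08-MAR-2009
CPU_ALL,T0002,12.0,6.0,2.0,80.0
MEM,T0002,2040.0,512.0
EOF

# Byte-order sort groups all lines of one type (same first field) together.
sorted=$(LC_ALL=C sort "$sample")
echo "$sorted"
rm -f "$sample"
```

  After sorting, all CPU_ALL lines come first, then MEM, then ZZZZ, which is the grouping a manual spreadsheet import relies on.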

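  3.2 Generating graphical results

  To analyze the results captured by nmon, IBM also provides a graphical analysis tool, nmon_analyser. The nmon analyser.xls workbook converts the monitoring result file into an Excel file, which makes it easy to analyze the usage of each system resource.

  To use nmon analyser.xls:

  (1) Open nmon analyser.xls.

  (2) Adjust Excel macro security: Tools > Macro > Security

  (change the security level and the trusted publishers)

  (select) Security level: Low

  (check) Trust all installed add-ins and templates

  (check) Trust access to Visual Basic Project

  (3) After making the changes, click OK, close nmon analyser.xls, and reopen it.

  (4) Click the "Analyse nmon data" button and load the test1_090308_1313.csv file generated earlier.

  [Screenshot: analysis result]

  This completes the brief description of nmon and how to use it; you can analyze your own systems from the results you collect.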






nmon for Linux - nmon is short for Nigel's performance Monitor for Linux on POWER, x86, x86_64, Mainframe & now ARM (Raspberry Pi)


STOP PRESS: nmon for Linux Hits 500,000 downloads July 2017

This systems administrator, tuner, benchmark tool gives you a huge amount of important performance information in one go. It can output the data in two ways:

  1. On screen (console, telnet, VNC, PuTTY or X Windows) using curses for low CPU impact, updated once every two seconds. You hit single characters on your keyboard to enable/disable the various sorts of data.
  • You can display the CPU, memory, network, disks (mini graphs or numbers), file systems, NFS, top processes, resources (Linux version & processors) and, on Power, micro-partition information.
  2. Save the data to a comma separated file for analysis and longer term data capture.
  • Use nmonchart (from the nmon website) to generate a Googlechart webpage.
  • Use this together with the nmon Analyser Microsoft Excel spreadsheet, which loads the nmon output file and automatically creates dozens of graphs ready for you to study or write performance reports.
  • Filter this data and add it to an rrd database (using an excellent freely available utility called rrdtool). This graphs the data to .gif or .png files plus generates the webpage .html file, and you can then put the graphs directly on a website automatically on AIX with no need of a Windows based machine.
  • Directly put the data into an rrd database or other database for your own analysis
Latest version nmon for Linux is 16f
Download the precompiled binaries or nmon source code

 


More details

 

  • nmon is a single binary for
    • each operating system (Red Hat, SUSE, Ubuntu, Fedora, OpenSUSE etc.) and
    • each platform (Power, Mainframe, arm, x86 or x86_64).
  • Installing is very easy - just start the right executable.
    • Or rename the version you need to /usr/bin/nmon and then type: nmon
  • Why use five or six tools when one free tool can give you everything you need!!
  • For the pre-compiled versions - click on Download
  • For the source code & compiling - click on Compiling nmon

 


On-screen

When using nmon via a terminal session you can see the performance data directly on the screen, updated every second. If possible, stretch the terminal window to be longer so you can see more stats at one time. The sample screen on the original page came from a Raspberry Pi 2 running Ubuntu 15.10 and nmon v16b, with "cCUd" typed to display the data.

 


Data Analysis

Once you save the nmon data you have a number of options to analyse and graph the statistics:

  • nmonchart tool/script
    • Nigel's nmonchart tool quickly and simply converts an nmon output file to a webpage file (.html) that you can open with a browser directly or add to a website to share.
    • It takes a second or two and generates very nice looking graphs.
    • It is implemented in Korn shell script so you can add features (please share your updates).
    • The clever part is using the Google.com Charting Javascript Library and your browser to do the actual graphing.
    • This works on your PC, tablet or even larger mobile phone regardless of operating system.
  • nmon Analyser Excel Spread-sheet
    • This is the original tool and has been developed over many years by Stephen Atkins
    • You can request support via the Performance Tools Forum
    • However, Linux users might not like the idea of using the Microsoft Excel Spreadsheet and automating the creation of graphs can be tricky.
    • Sample graphs out of the many:
      • CPU Compared to Disk I/O
      • Disk Read and Write with I/O per second
      • Hot Disk analysis with Average, Weighted Average and Peak values
      • Network Read (top half) and Write (bottom half) Transfer Rates
  • nmon Consolidator Excel Spread-sheet
    • This is a newer tool and can combine nmon output files. It is by Stephen Atkins
    • Again it is a Microsoft Excel Spreadsheet
  • nmon2rrd
    • Microsoft free tool
    • This tool uses the excellent rrdtool to generate all the graphs and a website .html file.
    • Download it from the nmon for AIX Wiki
    • This allows the automated analysis on many machines and viewing via a Browser.

 


Now - Open Source

nmon for Linux is a single source code file of 5000 lines and single makefile. This will enable you to compile nmon for your precise Linux version (if you can't find what you want in the binaries) and open a few other possibilities:

  • Fixing my code - be gentle, please.
  • Removing magic numbers i.e. constants that can catch us out as machines get larger
  • Developing for some strange environments like machines with no disks, blades that boot from NFS, internal Linux based engines within disks subsystems, embedded machines.
  • Who knows we may get nmon for Linux within the Linux Distro's - any one know how to go about that?

Thanks for your support, suggestions, testing and I hope this starts a whole new wave of development and interest.

 


History

 

  • nmon for Linux was an internal project at IBM for many years and was released to open source under GPL on 27th July 2009.
  • Sourceforge.net is being used to host the project, see http://sourceforge.net/projects/nmon 
  • nmon for AIX has a similar online look and file format but was always completely different source code.
    • It is now integrated into the AIX topas command from
      • AIX 5.3 TL09
      • AIX 6.1 TL02.
    • nmon for AIX is not open source.
    • For more information see the nmon for AIX Wiki



Documentation

nmon of Linux Documentation - Updated 21st Nov 2016

Ha ha ha ha - you are joking right :-)


This page contains the following sections:

  1. Hardware and Linux Supported
  2. Getting Started via YouTube Videos - Including nmon for Linux and nmon for AIX
  3. Getting Started - If you prefer to read the absolute minimum
  4. nmon Command help - nmon for Linux -? and -h command output for the full syntax
  5. nmon Support
  6. Other sources


Hardware and Linux Supported

  • Platforms = hardware 
    • POWER
    • x86_64 = AMD64 - 64 bit
    • Mainframe
    • x86 = 32 bit dropping off rapidly
    • ARM Raspberry Pi 2+3
    • Others . . . include embedded processors running Linux
  • Linux Distro’s
    • Ubuntu
    • Debian
    • SUSE SLES
    • OpenSUSE
    • Red Hat RHEL
    • Fedora
    • Centos
    • Many others . . .

 



YouTube Videos about nmon for Linux

Many People prefer to watch a YouTube Video to learn - here are the six videos on nmon for both Linux and AIX operating systems. These are all the details you need to know to use nmon well. Note: nmonchart creates a .html file of all the graphs - it is one of many nmon graphing tools.

nmon for Linux

  1. nmon for Linux Starter Pack 20 minutes
  2. nmon for Linux Data Capture 15 minutes
  3. nmonchart to graph your nmon data files 22 minutes

Many nmon users use both Linux and AIX so here are the AIX equivalent videos and many nmon for Linux user use the nmon Analyser (Microsoft Excel spreadsheet)

nmon for AIX

  1. nmon Starter Pack Monitoring Online 14 minutes
  2. nmon Starter Pack for AIX Data Capture 15 minutes
  3. nmon Starter Pack for AIX Analyser 10 minutes

 



nmon for Linux Getting Started - If you prefer to read the absolute minimum

Below assumes that you are logged on your system, that you have renamed your nmon binary file to just "nmon", that the nmon file has execute permission (chmod ugo+x nmon) and it is in your PATH.

Using nmon for Linux Online

  • Just start nmon for Linux with: nmon
  • To stop it, just type: q
  • To get on screen hints type: h
    • and h again to remove the hints
  • Most of the rest are toggled commands i.e. type c to see the CPU stats and type c again to remove CPU stats.
  • The various stats come out in a set order (you can't control this) starting with CPU then memory and finally top processes at the bottom as there can be many processes this tends to fill up the rest of the window
  • Note if you make the window larger you can see more lines of output - this works in X Windows, VNC and PuTTY.
  • For memory stats type m
  • For disk graphs type d and you will see a 50 column graph of the read and write busy percentages
  • For disk numbers type D and if you type D again you see different information eventually typing D will close this section
  • For top processes there are different modes for the order of displaying the processes and different information, See the top line of the Top Processes section for further details.

Using nmon for Linux in data capture mode

  • Start by capturing a small sample file. Type: nmon -f -s2 -c 30
    • The -f means you want the data saved to a file and not displayed on the screen.
    • The -s 2 means you want to capture data every 2 seconds
    • The -c 30 means you want thirty data points or snap shots
    • This means that after a few seconds collecting the configuration nmon for Linux will run for 2 x 30 = 60 seconds and stop. At the end some further configuration data is collected.
  • As nmon for Linux starts up, it briefly checks your system and options and then disconnects from your terminal session.
  • It then runs like a daemon process in the background. The point is that if you log out or get disconnected then nmon will complete the data file capture - this is a good thing.
  • If you want to be sure nmon is still running you can't use a simple "ps" because it is not associated with your log on session. Use "ps -ef | grep nmon" instead.
  • It is a common mistake to try to use the nmon for Linux output file before nmon has finished. This results in either incomplete data, which messes up a later tool, or, if you asked for a longer time between snapshots, possibly no data in the file at all, which confuses all tools trying to analyse the file.
  • Once nmon for Linux has finished, and to build confidence, try:
  1. Use: grep ZZZZ yourfile.nmon
    • This should output one line for each snapshot with the date and time it happened.
  2. Edit the nmon file with vi. You will notice it is a simple text file. The start of each line defines the content of the line and then the values are separated with commas. This means the file can be imported into a spreadsheet. If you want to manually import the file, make sure you sort the file first (with the um er "sort" command). This sort means all the lines of a particular type are together. A sort is not required by most of the nmon for Linux analysing tools as they perform the function themselves.
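
The ZZZZ check can be turned into a quick completeness test: the number of ZZZZ lines should equal the -c value you asked for. The sketch below runs against a small stand-in file with made-up values, since a real capture is not available here; with a real file, point the grep commands at yourfile.nmon instead.

```shell
#!/bin/sh
# Stand-in capture with made-up values (illustration only).
capture=$(mktemp)
cat > "$capture" <<'EOF'
AAA,progname,nmon
ZZZZ,T0001,08:22:10,24-JUL-2012
CPU_ALL,T0001,3.1,1.2,0.4,95.3
ZZZZ,T0002,08:22:20,24-JUL-2012
CPU_ALL,T0002,2.9,1.0,0.2,95.9
ZZZZ,T0003,08:22:30,24-JUL-2012
CPU_ALL,T0003,3.4,1.1,0.3,95.2
EOF

# One ZZZZ line per snapshot; field 3 holds the time of the snapshot.
snapshots=$(grep -c '^ZZZZ' "$capture")
first=$(grep '^ZZZZ' "$capture" | head -n 1 | cut -d, -f3)
last=$(grep '^ZZZZ' "$capture" | tail -n 1 | cut -d, -f3)
echo "snapshots=$snapshots first=$first last=$last"
rm -f "$capture"
```

If the count is lower than expected, nmon was probably still running or was stopped early.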

 




 # ./nmon -h

Hint for nmon version 16d
        Full Help Info : nmon -h

        On-screen Stats: nmon
        Data Collection: nmon -f [-s <seconds>] [-c <count>] [-t|-T]
        Capacity Plan  : nmon -x

Interactive-Mode:
        Read the Welcome screen & at any time type: "h" for more help
        Type "q" to exit nmon

For Data-Collect-Mode
        -f            Must be the first option on the line (switches off interactive mode)
                      Saves data to a CSV Spreadsheet format .nmon file in then local directory
                      Note: -f sets a defaults -s300 -c288    which you can then modify
        Further Data Collection Options:
        -s <seconds>  time between data snapshots
        -c <count>    of snapshots before exiting
        -t            Includes Top Processes stats (-T also collects command arguments)
        -x            Capacity Planning=15 min snapshots for 1 day. (nmon -ft -s 900 -c 96)

End of Hints


Full Help Information for nmon 16d


For Interactive and Data Collection Mode:

        User Defined Disk Groups (DG) - This works in both modes         It is a work around Linux issues, where disks & partitions are mixed up in /proc files         & drive driver developers use bizarre device names, making it trick to separate them.         -g  Use this file to define the groups                       - On each line: group-name    (space separated list)                       - Example line: database sdb sdc sdd sde                       - Up to 64 disk groups, 512 disks per line                       - Disks names can appear more than one group         -g auto       - Will generate a file called "auto" with just disks from "lsblk|grep disk" output          For Interactive use define the groups then type: g or G          For Data Capture defining the groups switches on data collection

Data-Collect-Mode = spreadsheet format (i.e. comma separated values)

        Note: Use only one of f, F, R, x, X or z to switch on Data Collection mode         Note: Make it the first argument then use other options to modify the defaults         Note: Don't collect data that you don't want - it just makes the files too large         Note: Too many snapshots = too much data and crashes Analyser and other tools         Note: 500 to 800 snapshots make a good graph on a normal size screen         Recommended normal minimal options: snapshots every 2 minutes all day:                 Simple capture:      nmon -f  -s 120 -c 720                 With Top Procs:      nmon -fT -s 120 -c 720                 Set the directory:   nmon -fT -s 120 -c 720 -m /home/nag/nmon                 Capture a busy hour: nmon -fT -s   5 -c 720 -m /home/nag/nmon

For Data-Collect-Mode Options

        -f            spreadsheet output format [note: default -s300 -c288]                          output file is _YYYYMMDD_HHMM.nmon         -F  same as -f but user supplied filename                          Not recommended as the default file name is perfect         The other options in alphabetical order:         -a            Include Accelerator GPU stats         -b            Online only: for black and white mode (switch off colour)         -c    The number of snapshots before nmon stops         -d     To set the maximum number of disks [default 256]                       Ignores disks if the systems has 100's of disk or the config is odd!         -D            Use with -g to add the Disk Wait/Service Time & in-flight stats         -f and -F     See above         -g  User Defined Disk Groups (see above) - Data Capture: Generates  BBBG & DG lines         -g auto       See above but makes the file "auto" for you of just the disks like sda etc.         -h            This help output         -I   Set the ignore process & disks busy threshold (default 0.1%)                       Don't save or show proc/disk using less than this percent         -l       Disks per line in data capture to avoid spreadsheet width issues. Default 150. EMC=64.         -m  nmon changes to this directory before saving to file                       Useful when starting nmon via cron         -M              Adds MHz stats for each CPU thread. Some POWER8 model CPU cores can be different frequencies         -N            Include NFS Network File System for V2, V3 and V4         -p            nmon outputs the PID when it starts. Useful in scripts to capture the PID for a later safe stop.         -r   Use in a benchmark to record the run details for later analysis [default hostname]         -R              Old rrdtool format used by some - may be removed in the future. 
If you use this email Nigel         -s   Time between snap shots - with "-c count" decides duration of the data capture         -t            Include Top Processes in the output         -T            As -t plus it saves command line arguments in UARG section         -U            Include the Linux 10 CPU utilisation stats (CPUUTIL lines in the file)         -V            Print nmon version & exit immediately         To manually load nmon files into a spreadsheet:                 sort -A *nmon >stats.csv                 Transfer the stats.csv file to your PC                 Start spreadsheet & then Open with type=comma-separated-value ASCII file                 This puts every datum in a different cell                 Now select the data of one type (same 1st column) and graph it                 The nmon Analyser & other tools do not need the file sorted.

Capacity Planning mode - use cron to run each day

        -x            Sensible spreadsheet output for one day
                      Every 15 mins for 1 day ( i.e. -ft -s 900 -c 96)
        -X            Sensible spreadsheet output for busy hour
                      Every 30 secs for 1 hour ( i.e. -ft -s 30 -c 120)
        -z            Like -x but the output saved in /var/perf/tmp assuming root user

Interactive Mode Keys in Alphabetical Order

    Start nmon then type the letters below to switch on & off particular stats
    The stats are always in the same order on-screen
    To see more stats: make the font smaller or use two windows

        Key --- Toggles on off to control what is displayed ---
        b   = Black and white mode (or use -b command line option)
        c   = CPU Utilisation stats with bar graphs (CPU core threads)
        C   = CPU Utilisation as above but concise wide view (up to 192 CPUs)
        d   = Disk I/O Busy% & Graphs of Read and Write KB/s
        D   = Disk I/O Numbers including Transfers, Average Block Size & Peaks (type: 0 to reset)
        g   = User Defined Disk Groups            (assumes -g  when starting nmon)
        G   = Change Disk stats (d) to just disks (assumes -g auto   when starting nmon)
        h   = This help information
        j   = File Systems including Journal File Systems
        k   = Kernel stats Run Queue, context-switch, fork, Load Average & Uptime
        l   = Long term Total CPU (over 75 snapshots) via bar graphs
        L   = Large and =Huge memory page stats
        m   = Memory & Swap stats
        M   = MHz for machines with variable frequency 1st=Threads 2nd=Cores 3=Graphs
        n   = Network stats & errors (if no errors it disappears)
        N   = NFS - Network File System
              1st NFS V2 & V3, 2nd=NFS4-Client & 3rd=NFS4-Server
        o   = Disk I/O Map (one character per disk pixels showing how busy it is)
              Particularly good if you have 100's of disks
        p   = PowerVM LPAR Stats from /proc/ppc64/lparcfg
        q   = Quit
        r   = Resources: Machine type, name, cache details & OS version & Distro + LPAR
        t   = Top Processes: select the data & order 1=Basic, 3=Perf 4=Size 5=I/O=root only
        u   = Top Process with command line details
        U   = CPU utilisation stats - all 10 Linux stats:
              user, user_nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice
        v   = Experimental Verbose mode - tries to make recommendations
        V   = Virtual Memory stats

        Key --- Other Interactive Controls ---
        +   = Double the screen refresh time
        -   = Halves the screen refresh time
        0   = Reset peak counts to zero (peak highlight with ">")
        1   = Top Processes mode 1 Nice, Priority, Status
        3   = Top Processes mode 3 CPU, Memory, Faults
        4   = Top Processes mode 4 as 3 but order by memory
        5   = Top Processes mode 5 as 3 but order by I/O (if root user)
        6   = Highlights 60% row on Long Term CPU view
        7   = Highlights 70% row on Long Term CPU view
        8   = Highlights 80% row on Long Term CPU view
        9   = Highlights 90% row on Long Term CPU view
        .   = Minimum mode i.e. only busy disks and processes shown
        space = Refresh screen now

Interactive Start-up Control

        If you find you always type the same toggles every time you start
        then place them in the NMON shell variable. For example:
         export NMON=cmdrtn

Other items for Interactive and Data Collection mode:

        a) To limit the processes nmon lists (online and to a file)
            either set NMONCMD0 to NMONCMD63 to the program names
            or use -C cmd:cmd:cmd etc. example: -C ksh:vi:syncd

Other items for Data Collection mode:

        b) To you want to stop nmon use: kill -USR2
        c) Use -p and nmon outputs the background process pid
        d) If you want to pipe nmon output to other commands use a FIFO:
            mkfifo /tmp/mypipe
            nmon -F /tmp/mypipe &
            tail -f /tmp/mypipe
        e) If nmon fails please report it with:
           1) nmon version like: 16d
           2) the output of: cd /proc; cat cpuinfo meminfo partitions stat vmstat
           3) some clue of what you were doing
           4) I may ask you to run the debug version or collect data files
        f) If box & line characters are letters then check: terminal emulator & $TERM
        g) External Data Collectors - nmon will execute a command or script at each snapshot time
           They must output to a different file which is merge afterwards with the nmon output
           Set the following shell variables:
            NMON_START  = script to generate CVS Header test line explaining the columns
                 Generate: TabName,DataDescription,Column_name_and_units,Column_name_and_units ...
            NMON_SNAP   = script for each snapshots data, the parameter is the T0000 snapshot number
                 Generate: TabName,T00NN,Data,Data,Data ...
            NMON_END    = script to clean up or finalise the data
            NMON_ONE_IN = call NMON_START less often (if it is heavy in CPU terms)
            Once capture done: cat nmon-file data-file >merged-file ; ready for Analyser or other tools
            The nmon Analyser will automatically do its best to graph the data on a new Tab sheet

        Developer: Nigel Griffiths     See http://nmon.sourceforge.net
        Feedback welcome - On the current release only
        No warranty given or implied. Copyright GPLv3
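
Item g) above describes two line shapes that an external data collector is expected to produce. The following Python sketch only illustrates those shapes as the help text gives them; the function names, the tab name `MYDATA` and the column names are hypothetical, and how a real collector script is invoked is not shown here.

```python
def header_line(tab_name, description, columns):
    """Header shape from the help text: TabName,DataDescription,Column,Column ..."""
    return ",".join([tab_name, description, *columns])


def snapshot_line(tab_name, snapshot, values):
    """Per-snapshot shape from the help text: TabName,T00NN,Data,Data ..."""
    return ",".join([tab_name, snapshot, *(str(v) for v in values)])


# Hypothetical tab and columns, for illustration only.
print(header_line("MYDATA", "My custom stats", ["queue_len", "latency_ms"]))
print(snapshot_line("MYDATA", "T0001", [3, 12.5]))
```

Lines of this kind would go to the separate data file that is later concatenated with the nmon output, as the help text describes.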

 



nmon for Linux Support

If you:

  • need help running nmon or understanding the data saved
  • have a suggestion for improvements
  • have bug fixes or want to report errors
  • want to extend nmon for Linux on to other platforms
  • want to include nmon on your distro (not a problem, I would just like to know)

then please get in touch at the nmon for Linux - Help Forum.

Alternatively, use the Performance Tools Forum.





NMON_Analyser User Guide for V4.6


Preface

NMON_Analyser is designed to complement NMON (Nigel’s Monitor) in analysing and reporting performance problems; it produces graphs for virtually all sections of output created using the "spreadsheet output" mode of NMON as well as doing some additional analyses for ESS, EMC and FAStT subsystems. It will also work with files produced by topasout and with other tools that produce data in "NMON" format. It is written in VBA for Excel and will work with Excel 2007 or later. It may also work on Excel 2003 with the required Microsoft updates to support .xlsx files: https://support.microsoft.com/kb/924074?wa=wsignin1.0

 

NMON was originally written by Nigel Griffiths (nag@uk.ibm.com) and is now (since AIX 5.3 TL09 and AIX 6.1 TL02) part of topas. NMON_Analyser was originally written by Stephen Atkins with contributions from many people, including Ralf Schmidt-Dannert and Markus Fehling, both of IBM. NMON_Analyser is currently maintained by Ron McCargar (mccargar@us.ibm.com).

 

Support for NMON_Analyser is provided on a best efforts basis.  Please direct questions to the User Forum (see below) rather than contact the author direct.    

 

Links:

NMON_Analyser home page https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/nmon_analyser

User Forum https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000749&ps=25


Collecting data using NMON

Please make sure you have the latest versions of both NMON and NMON_Analyser before starting a new engagement.  If you want automatic notification of a new release of the Analyser send a note to steve_atkins@uk.ibm.com and I’ll add you to my distribution list.  Nigel maintains a similar list for NMON.

 

You will need to have root privileges in order to get a complete set of data on the BBBP sheet.  In order to collect data for the DISKBUSY sheets you need to make sure that iostat data collection is enabled:

chdev -l sys0 -a iostat=true

 

For spreadsheet output mode (comma separated values) use the following flags when invoking nmon:

-f    spreadsheet output format [note: default -s300 -c288]
      Output file is _YYYYMMDD_HHMM.nmon
-F    same as -f but user supplied filename
-c    number of snapshots
-d    requests disk service and wait times (DISKSERV and DISKWAIT)
-i    Ignore processes using less than this amount of CPU when generating TOP section; useful for reducing data volumes
-g    file containing disk group definitions
-l    number of hdisks per sheet - defaults to 150, maximum 250. See notes
-m    NMON changes to this directory before saving the file
-r    goes into spreadsheet file [default hostname]
-s    interval between snap shots
-x    capacity planning (15 mins for 1 day = -fdt -s900 -c96)
-t    include top processes in the output
-T    as -t plus saves command line arguments in UARG section
-A    include data for async I/O (PROCAIO) sections
-D    prevents DISK sections being produced (useful when Disk Groups are being used because there are too many hdisks to process)
-E    stops ESS sections being produced (necessary when Disk Groups are being used because there are too many vpaths to process)
-J    prevents JFS sections being produced (prevents Excel errors when you have more than 255 filesystems)
-L    includes LARGEPAGE section
-N    include NFS sections
-S    include WLM sections with subclasses
-W    include WLM sections without subclasses
-Y    include SUMMARY section (very efficient alternative to -t if PID level data is not required)

 

        example: nmon_aix51 -F asterix.nmon -r Test1 -s6 -c12


Notes:

1. The -f (or -F) flag must appear first.

2. The value of the -l flag controls the number of hdisks per sheet on the DISK sheets and per line on the BBBD sheet.   There are two factors to consider when choosing this value.  Excel has a limit of 256 columns per sheet; however, both NMON and NMON_Analyser use some columns, so the upper limit is really 250.   The second factor is that Excel VBA has an upper limit of 2048 bytes for input line length.   This particularly affects users of EMC systems that use long hdisk names (e.g. hdiskpower123).   The default of 150 is safe for such systems.   Other users may set the value to 250 in order to reduce the number of output sheets.

3. Consider the value of the -s flag very carefully. The shorter the interval between snapshots, the more variable the values for each resource will be. If you use an interval of 1 second, don’t be surprised to see many of your disks hitting 100% busy for short periods. For normal monitoring, 10-minute intervals (-s 600) provide a good balance.

4. The graphs produced by NMON_Analyser look best when the number of snapshots (specified by the -c flag) is 300 or less.    

5. The TOP section (produced by specifying the -t flag) can generate large amounts of output and the size of the output can grow exponentially if a large value is specified for the -c flag.   If you want the TOP section then specify no more than 250 snapshots - ideally less.
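
The interaction between -s and -c in the notes above is simple arithmetic: the capture lasts interval times snapshot count. A small Python sketch follows, assuming the 300-snapshot guideline from note 4 as the default limit; the function name `capture_plan` is hypothetical.

```python
def capture_plan(interval_seconds, snapshots, max_snapshots=300):
    """Return (duration in hours, True if the snapshot count is within max_snapshots)."""
    duration_hours = interval_seconds * snapshots / 3600
    return duration_hours, snapshots <= max_snapshots


print(capture_plan(900, 96))   # the -x preset: -s900 -c96
print(capture_plan(300, 288))  # the -f defaults: -s300 -c288
```

Passing `max_snapshots=250` applies the stricter guideline from note 5 when the TOP section is being collected.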

Collecting data using topas (xmwlm)

You need AIX V5.3 TL5 Service pack 4 with APAR IY87993 or later. Note that only the output using the -a flag can be analysed. In particular, cross-partition statistics cannot be analysed; if you wish to get a report for the entire machine, collect data from each LPAR separately and then use NMON_Consolidator to merge the data. Commands like the following are required to collect the data.

 

topasout -a /etc/perf/daily/xmwlm.yymmdd

:

cp /etc/perf/daily/xmwlm.yymmdd_01 hostname.ddmmyy.topasout.csv

Using NMON_Analyser

· FTP the input file to your PC, ideally using the ASCII or TEXT options to make sure that lines are terminated with the CRLF characters required by Windows applications.

· Open the NMON_Analyser spreadsheet and specify the options you want on the "Analyser" and "Settings" sheets (see below). Save the spreadsheet if you want to make these options your personal defaults.

· Click on the "Analyse nmon data" button and find/select the .nmon file(s) to be processed. You may select several files in the same directory. If you wish to process several files in different directories you may wish to consider using the "FILELIST" option described below.

· You may see the message SORT command failed for "filename" if the file has >65K lines and the filename (or directory name) contains blanks or special characters. Either rename the file/directory or just pre-sort the file before using the Analyser.

Analyser options

 

GRAPHS  The first option is either ALL or LIST. If the value is LIST then only those sheets which appear in the LIST on the Settings sheet will have graphs drawn for them. This option is particularly useful if the graphs are to be printed/published or to reduce the amount of memory/fonts/disk space required when analysing files from large systems.

The second is either CHARTS, PICTURES, PRINT or WEB. The meanings of these are as follows:

· CHARTS produces Excel charts in-place on the selected sheets

· PICTURES graphs will be produced on a separate "Charts" sheet as pictures. Selecting this option can reduce the size of the output file by up to 90%.

· PRINT implies PICTURES. Pictures will be printed to the designated printer (see "Printing Options" below)

· WEB implies PICTURES. Automated web publishing (see "Web Publishing" below)

INTERVALS  specifies the first and last time interval to be processed. Intervals outside this range will be discarded after parsing. Note that these are numbers between 1-999999 and are not time values. Setting a value of 2 for the first interval is useful in discarding the very large numbers that often appear at the start of an NMON collection run with AIX. If you have used a splitter program on the input file, or if you are analysing data from a LINUX system, then you should leave this as 1.

TIMES  specifies the first and last time/date to be processed. Samples outside this range will be discarded after parsing. They can be specified in any form recognised by Excel as time/date values; e.g.

14:00:10         16:15:30

4-Aug-12         6-Aug-12

18:00 28/6/12    04:00

    Notes: 

· Specifying a date without a time is the same as specifying a time of 00:00.

· If the second time is less than the first then Analyser will assume the second time is in the next calendar day

· If there are no qualifying intervals in the file a message will be issued:

"Invalid values for FIRST/LAST - values reset to 1/999999"

and the entire file will be processed.
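
The next-calendar-day rule in the notes above can be sketched in Python as follows. This is an illustration of the rule as described, not the Analyser's VBA code; `resolve_window` is a hypothetical name, and the date is taken from the third TIMES example.

```python
from datetime import date, datetime, time, timedelta


def resolve_window(day, first, last):
    """Combine a date with first/last times; roll `last` into the next day if it is earlier."""
    start = datetime.combine(day, first)
    end = datetime.combine(day, last)
    if end < start:
        end += timedelta(days=1)
    return start, end


start, end = resolve_window(date(2012, 6, 28), time(18, 0), time(4, 0))
print(start, end)
```

With a first time of 18:00 and a last time of 04:00, the window ends on the following day.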

MERGE  specifying YES here results in NMON_Analyser merging all of the input files to form a single file. The input files must be unsorted. By default the Analyser will delete the TOP and UARG sections during the merge process; specify TOP to prevent this from happening but be aware that processing time will be increased and that if the TOP section exceeds the maximum number of lines per sheet (depending on the version of Excel) then data will be lost. Specify KEEP to stop the merged file from being deleted at the end of the run. Specify ONLY if you simply want to create a merged file for future analysis.

PIVOT  specifying YES here results in NMON_Analyser creating a Pivot Chart from the specified sheet after all other processing (including printing/publishing the other charts) has completed. See additional parameters on the "Settings" sheet.

ESS  specifying NO here results in NMON_Analyser bypassing the additional analysis performed for ESS subsystems. This will result in faster analysis and can allow larger files to be analysed successfully when "out of memory" errors occur.

FILELIST  the name of a control file containing a list of nmon output files to be processed by the Analyser. Leave this field blank for normal operation. The name must be fully qualified (e.g. c:\nmon\testcases\filenames.txt). The names specified in the text file must contain full path information. Wildcard characters may be included in the filenames so long as they conform to Windows standards. For example:

c:\nmon\testcases\*.nmon

NB: if you save the spreadsheet with a value in this field, the Analyser will automatically begin execution the next time you open it. This is defined as "batch mode" (see Appendix). You can stop the execution by pressing Ctrl+Break or by deleting/renaming the Control File.

Batch Processing Options

The following fields can be found on the "Settings" sheet.

REPROC  Change this to NO if you want to bypass processing of input files which may have been processed in a previous run. This is useful if you make use of wildcards in the batch control file. Note that the REPROC option only takes effect when you have specified more than one input file.

OUTDIR  the name of an existing directory in which output files will be saved by default. This is primarily intended for batch operation (see FILELIST above) but also works to set the default directory for interactive sessions. If the directory does not exist (or OUTDIR is blank) then output files are saved back to the same directory as the corresponding input files.

Example: C:\NMON\Analyser\Output\

Formatting Options

The following fields can be found on the "Settings" sheet.

BBBFont  Enter the name of a fixed pitch font to be used for formatting the BBBC and BBBP sheets. Courier provides acceptable results.

GWIDTH  Change the values in this row to make the generated graphs bigger or smaller. The default value of 0 means that the Analyser will dynamically size the graphs according to your screen size, font settings or page size. Be careful not to set a value larger than your page width when printing.

GHEIGHT  Change the values in this row to make the generated graphs bigger or smaller. If you specify a value here you must also specify a value for GWIDTH.

LIST  A comma-separated list of sheets for which the Analyser is to draw graphs. Only used if the GRAPHS option is set to LIST. The list can contain any valid wildcard characters recognised by Excel, e.g. "EMC*"

Note: graphs are always drawn for SYS_SUMM, CPU_SUMM and DISK_SUMM

NOLIST  The default is KEEP. If you change this to DELETE then all sheets which do not appear in LIST will be deleted after analysis. This can dramatically reduce the size of files that are to be kept for long periods.

CPUmax  Specifies the maximum number of CPUnnn, PCPUnnn and SCPUnnn sheets that will be generated. The default value of 0 will choose all sheets for an LPAR using dedicated processors and a number equal to the SMT mode for an LPAR using shared processors.

REORDER  specifying YES here results in NMON_Analyser reordering the sheets to improve navigation to more relevant information.

TOPDISKS  the maximum number of hdisks/vpaths to include on disk graphs. A value of 0 produces graphs containing all the hdisks on a sheet (up to 250). Graphs containing more than 50 hdisks will be automatically scaled to fit and may therefore exceed the size of the screen.

xToD  Format to be used for timestamps on Time of Day graphs. Anything acceptable to Excel as a Number Format Code may be entered. Default is hh:mm. Note that the date is also available within the timestamp and you may therefore use something like dd-mmm-yy hh:mm if, for example, you have merged multiple NMON files together. If you use something other than the default string you may need to increase the value of GHEIGHT; experiment with different values if you don’t see what you expect.

SORTDEFAULT  This setting indicates whether the 1st graph on 'default' sheets (ones not handled elsewhere by the Analyser) is sorted. Note: If Yes, this will also reorder the columns.

Pivot Chart

These parameters are used to construct a pivot chart. The required parameters are: Sheetname, PageField, RowField, ColumnField, DataField and xlFunction (can be COUNT, SUM, MIN, AVG, MAX). This is primarily useful for the TOP and SUMMARY sheets but might prove useful for other, possibly user-supplied, data sheets.

Printing Options

The following fields can be found on the "Settings" sheet. Note - these only take effect if you select PRINT for the OUTPUT option on the Analyser sheet.

LSCAPE  Change to YES if you want the Analyser to set the page orientation to Landscape. By default the Analyser will fit one chart per page when printing landscape.

COPIES  Set to the number of copies to be printed.

PRINTER  The name of the printer. Specifying a value of PREVIEW will cause the Analyser to invoke the Excel print preview function, which is useful for testing. You may also specify DEFAULT to print to the system default printer, or the name of a network printer.

The Analyser adds page headers and footers.

Web Publishing Options

The following fields can be found on the "Settings" sheet.

PNG  Change to NO if your browser can’t handle the PNG graphics format. Graphics will be generated as GIF files.

SUBDIR  If this is YES then all supporting files, such as background textures and graphics, are organized in a separate folder. If this is NO then supporting files are saved in the same folder as the Web page.

WEBDIR  the name of an existing directory in which HTML files will be saved by default. If the directory does not exist (or OUTDIR is blank) then output files are saved back to the same directory as the corresponding input file.

Example: C:\NMON\Analyser\HTML\  

Interpreting the output sections

Notes on the "Weighted Average" as used in the Analyser

Several graphs produced by the Analyser show average, weighted average and maximum values.   Although everyone understands averages and maximums, the concept of a weighted average is a little more difficult to grasp.  

 

One of the problems we are faced with in analysing sample data is that resources on the target system may be idle for long periods during the collection.  For example, the NMON data collection may be started some time before the system reaches peak utilisation and may not be stopped until the workload being monitored has long since finished.   Although this does not affect the maximums it can severely affect the accuracy of the averages.

 

The idea of a weighted average is to apply a weighting factor to each snapshot to indicate how relevant that snapshot is to the average.    In NMON_Analyser, we use the value of the measurement itself as the weighting factor.   In effect, this produces a figure that shows how busy a resource is when it is active.  For example:  a database log disk is only active during the middle part of a benchmark.   We record the following figures for %tm_act (DISKBUSY):

 

Snapshot    %tm_act
1           0
2           0
3           0
4           0
5           0
6           6.1
7           6.3
8           6.5
9           9.1
10          5.9
11          0
12          0
13          0
14          0
15          0

 

The average for this set of data is 2.3 and the weighted average is 7.0.   The weighted average gives a better picture of how busy the disk is while logging is taking place.   NMON_Analyser uses the weighted average as a sort key when sorting the contents of disk sheets.
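
The figures above can be reproduced with a short Python sketch. It reads "the value of the measurement itself as the weighting factor" as sum(x*x) / sum(x); this is an interpretation that matches the quoted results, not code taken from the Analyser, and the function names are hypothetical.

```python
def average(values):
    """Plain arithmetic mean over all snapshots."""
    return sum(values) / len(values)


def weighted_average(values):
    """Weight each sample by its own value: sum(x*x) / sum(x)."""
    total = sum(values)
    if total == 0:
        return 0.0
    return sum(v * v for v in values) / total


# %tm_act values for the 15 snapshots listed above.
tm_act = [0, 0, 0, 0, 0, 6.1, 6.3, 6.5, 9.1, 5.9, 0, 0, 0, 0, 0]
print(round(average(tm_act), 1), round(weighted_average(tm_act), 1))
```

Because idle snapshots carry a weight of zero, they drop out of the weighted figure entirely, which is why it tracks how busy the disk is while it is active.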

 

Note that, occasionally, NMON generates very large numbers for the first interval and this heavily skews the weighted average values.   If you see this problem occurring then change the  value for the FIRST parameter to 2 in order to exclude the first interval completely.   This is only a problem for AIX.

SYS_SUMM

This section is entirely generated by the Analyser and contains a useful summary of data taken from other sheets. Note that the avg/max values for User%, Sys%, Wait% and Idle% are independent and will not add up to 100%. The CPU% column shows User% + Sys% for each line.

 

For non-partitioned or dedicated CPU partitions the graph shows the total CPU Utilisation (%usr + %sys) together with the Disk I/O rate (taken from the DISKXFER sheet) by time of day.   For micro-partitions the graph shows the number of physical CPUs being used instead of CPU%.

 

The value "Max:Avg" is simply the maximum value divided by the average. If monitored over a long period of time the value for CPU% can be useful in spotting a system reaching saturation level (the ratio will steadily decrease). If you have historical data then the value can also be useful in determining how much latent demand is present in a system running at saturation level.
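
As a quick illustration of why the ratio falls as a system approaches saturation, here is a minimal Python sketch. The CPU% samples are invented for the example and `max_to_avg` is a hypothetical name.

```python
def max_to_avg(values):
    """Maximum divided by the average of the samples."""
    return max(values) / (sum(values) / len(values))


# Hypothetical CPU% samples: a lightly loaded system versus one near saturation.
light = [10, 20, 80, 30]
busy = [85, 90, 95, 90]
print(round(max_to_avg(light), 2), round(max_to_avg(busy), 2))
```

When the average climbs towards the maximum, the ratio moves towards 1.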

AAA

The AAA section is generated by NMON at the start of the data collection and contains information about the system and NMON itself. The contents vary by release; the following is for 12e.

 

AIX             this is the release / maintenance level of AIX being used on the target system as reported by the lslpp command.
build           the particular build of nmon used to collect this data
command         the command line used to invoke NMON and the date when it finished.
cpus            the number of CPUs in the system and the number active at the start of data collection.
date            date at the start of the collection.
disks_per_line  the setting of the -l flag on the NMON command line or the default value.
hardware        the processor technology used in the target system.
host            the hostname of the target system.
Interval        the time (in seconds) between snapshots.
kernel          information from the kernel - useful in identifying the type of kernel (32-bit or 64-bit) and whether this is an LPAR.
LPARNumberName  the LPAR number followed by the name
MachineType     machine type and model of the system
progname        the name of the NMON executable.
runname         taken from the NMON command line if specified using the -r parameter, else defaults to hostname.
SerialNumber    the machine serial number
snapshots       the number of snapshots - this is used by the Analyser. The Analyser will modify this value to match the number of snapshots actually found in the input file.
steal           this value will be a 1 if running on Linux and Steal% CPU is provided in the CPUnnn and CPU_ALL data.
subversion      detailed information about the nmon version used to collect this data, including the date and time it was created.
time            time as shown by the system clock at the start of the collection. Also see date value.
timestampsize   the number of characters used for timestamps in each record. The default is 5.
TL              the Technology Level of the AIX release
user            the name of the user executing the NMON command.
version         the version of NMON used to collect this data.
VIOS            the release/maintenance level if this is a VIOS LPAR.
analyser        the version of NMON_Analyser used to generate the output file together with the elapsed time (in seconds) for processing this file.
environment     the version of Excel you are using.
parms           the values of most of the user options specified on the "Analyser" sheet.
settings        the values of most of the user options specified on the "Settings" sheet.
elapsed         the execution time of the Analyser. I use this for tuning.
NodeName        the LPAR’s node name

 

NMON_Analyser deletes the NOTES lines generated by NMON.

BBBB

The BBBB sheet lists all of the disks listed in the ODM together with the capacity (in Gbytes) and the adapter type (SCSI/SSA/Fibre) as reported by lsdev.   Note that some fibre-attached devices do not report their capacity to AIX.

NMON_Analyser deletes the column containing the sort key generated by NMON on all BBB sheets.

BBBC

The BBBC sheet shows the output from the lspv command for all local disks at the start of the data collection. The Analyser highlights the hdisk name using a bold font and sets the sheet to use the fixed-pitch font specified on the NLS sheet (default Courier) in order to improve readability.

BBBD

The BBBD sheet shows a list of all I/O adapters listed in the ODM together with the hdisks addressed through that adapter.  

BBBE

The BBBE sheet contains data extracted from the lsdev command and shows the mapping between vpaths and hdisks. NMON_Analyser uses this information to construct the ESSBUSY, ESSBSIZE and ESSWSIZE sheets.

BBBG

The BBBG sheet contains details of the NMON disk group mappings.

BBBL

The BBBL sheet is only produced if the operating system is running in a partition and contains details of the configuration of the LPAR at the start of the collection run.

BBBN

The BBBN sheet describes each network adapter in the system and shows the name, speed and MTU size.

BBBP

The BBBP sheet contains the un-interpreted output from the emstat and lsattr commands.  Note that to get output from these commands requires NMON to be running with root privileges.

The Analyser sets the sheet to use the fixed-pitch font specified on the NLS sheet (default Courier) in order to improve readability.  

BBBR

This sheet records dynamic LPAR reconfiguration events during the collection run.

BBBV

This sheet lists all of the volume groups present at the start of the collection run.

CPUnnn

These sheets show %usr, %sys, %wait and %idle by time of day for each logical processor.  Note that for micropartitions the Idle% and Wait% figures will include times when the physical processor was ceded to the shared pool.

NMON_Analyser generates a graph and a column headed "CPU%" containing the sum of %usr and %sys for use on the CPU_SUMM sheet. The Analyser also adds blank intervals for CPUs that are varied online during the collection interval so that the graphs cover the entire collection period.

If REORDER is set to YES on the Control sheet the Analyser will move all CPUnnn sheets to the end of the file.

CPU_ALL

This sheet shows the average utilisation for all physical threads by time of day.   Note that for micro-partitions CPU% is a measure of utilisation vs the entitlement but is adjusted by libperfstat so that it never exceeds 100%; this makes the numbers virtually useless for analysis of uncapped partitions and you may choose to look at the charts on the LPAR sheet instead.  Note also that micro-partitions generally record very little Idle% or Wait% because they will normally cede their timeslice to other LPARs rather than waste CPU time waiting for work.    If the CPUmax value is lower than the number of physical threads in use then the graph title will show the number of threads excluded.

 

The second graph shows the number of active CPUs by time of day and is useful in determining whether CPUs have been varied on/off during the collection period. If the system has SMT enabled then the Analyser shows logical CPUs rather than physical CPUs and the legend reports "SMT=ON".

CPU_SUMM

The Analyser generates this sheet from data on the CPUnnn sheets.   It gives a breakdown of CPU Utilisation by thread (logical processor) and by core over the collection period.   The chart can be very useful in identifying situations in which the system is thread-starved (i.e. too few threads to fully utilise the logical processors) or where the workload is dominated by a small number of  single-threaded processes.

 

Note that if CPUs have been dynamically reconfigured during the collection period, these figures reflect only those intervals when the CPU was varied on.

General notes for DISK, ESS, EMC, FASt and DG sheets

The DISK sheets record device statistics for each hdisk in the system. If there are more hdisks in the system than the -l value allows per sheet (see "Collecting data using NMON" above), NMON generates multiple output sections. DISKBUSY will therefore contain device busy statistics for the first group of hdisks, DISKBUSY1 for the next, DISKBUSY2 for the next and so on.
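
The section naming can be sketched in Python as follows, assuming hdisks are assigned to sections in order and using the -l default of 150 from the notes on collecting data. `disk_sheet_name` is a hypothetical helper, not something NMON provides.

```python
def disk_sheet_name(disk_index, disks_per_sheet=150, prefix="DISKBUSY"):
    """Name of the section holding the hdisk at 0-based position disk_index."""
    group = disk_index // disks_per_sheet
    return prefix if group == 0 else f"{prefix}{group}"


print(disk_sheet_name(0), disk_sheet_name(150), disk_sheet_name(320))
```

With 150 hdisks per section, the hdisk at position 320 falls in the third group and would therefore appear on DISKBUSY2.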

 

The Analyser will normally sort the contents of the sheet in ascending sequence using the weighted average values as a sort key.  However, if a storage subsystem is detected then sorting of the DISK sheets is disabled and the ESS/EMC/FASt/DG sheet contents are sorted instead.   

 

If the number of hdisks (or vpaths) on the sheet exceeds the value specified for TOPDISKS (see "Analyser options" above), the graph will only show information for the specified number of disks and a warning will appear in the graph title.

 

If REORDER is set to YES on the Control sheet the Analyser will move all but the DISKBUSY, DISK_SUMM and DISKSERV sheets to the end of the file when a storage subsystem or Disk Group is detected.   The rationale behind this being that the ESS/EMC/FASt/DG sheets contain the most useful data and that the DISK sheets merely replicate it.   However, the DISKBUSY sheet is useful for checking the activity on system disks (normally local) and the DISK_SUMM sheet gives total data rates for the system (local + subsystem disks).

 Note that NMON and NMON_Analyser can only handle a maximum of 250 vpaths in a system.  If you have more than this then you need to use the Disk Groups feature of NMON to select the vpaths that are of primary interest.

EMC/PowerPath subsystems

NMON_Analyser detects the presence of an EMC/PowerPath subsystem by scanning the input file for the string "hdiskpower" before starting the analysis.   EMC PowerPath creates devices called "hdiskpowern" which each map to multiple hdisks.   They are therefore comparable to the vpaths generated by ESS/SDD.   However, unlike vpaths, hdiskpower devices appear to AIX (and therefore to NMON) as real disks, so NMON records their activity on the DISK sheets.

 

NMON_Analyser removes all hdiskpower entries from DISK sheets and moves them to new sheets beginning with the letters EMC.   For example, hdiskpower entries found on DISKBUSY1 are simply moved to a new sheet called EMCBUSY1.  If a DISK sheet contains only hdiskpower devices, that sheet is simply renamed.  
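The move described above can be pictured with a small Python sketch. This is not the Analyser's own code (the Analyser is an Excel workbook); the function name `split_emc_columns` and the dictionary layout are hypothetical, chosen only to illustrate how hdiskpower columns end up on an EMC sheet and how a sheet holding only hdiskpower devices is effectively renamed.

```python
def split_emc_columns(sheet_name, columns):
    """Split one DISK sheet into its DISK and EMC parts.

    sheet_name: e.g. "DISKBUSY1" (assumed to start with "DISK").
    columns:    hypothetical layout, device name -> list of per-interval values.
    Returns a dict of sheet name -> columns.
    """
    # "DISKBUSY1" becomes "EMCBUSY1", as in the example in the text
    emc_name = "EMC" + sheet_name[len("DISK"):]
    emc = {d: v for d, v in columns.items() if d.startswith("hdiskpower")}
    rest = {d: v for d, v in columns.items() if not d.startswith("hdiskpower")}
    if not emc:
        return {sheet_name: rest}   # nothing to move
    if not rest:
        return {emc_name: emc}      # only hdiskpower devices: sheet is renamed
    return {sheet_name: rest, emc_name: emc}
```

For example, a DISKBUSY1 sheet holding `hdisk0` and `hdiskpower3` would yield a DISKBUSY1 sheet with `hdisk0` and an EMCBUSY1 sheet with `hdiskpower3`.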

 

All device statistics reported by the Analyser (e.g. System I/O rates) are correct and, as NMON excludes hdiskpower activity from the IOADAPT statistics, these figures are also correct.

FAStT subsystems

NMON_Analyser detects the presence of a FAStT subsystem by scanning the input file for the string "dac" before starting the analysis.  They are handled in the same way as EMC/PowerPath subsystems except that the dac devices are moved to sheets with names beginning "FASt".

Note that the Analyser is not able to correctly handle systems having both EMC and FASt subsystems.

DGBUSY

This sheet records the average value of device busy for each hdisk in the NMON Disk Group.

DGREAD

This sheet records the average data rate (Kbytes/sec) for read operations to each NMON Disk Group.

DGSIZE

This sheet records the average data transfer size (block size), in Kbytes, for read/write operations to each NMON Disk Group.  

DGWRITE

This sheet records the average data rate (Kbytes/sec) for write operations to each NMON Disk Group.

DGXFER

This sheet records the total I/O operations per second to each NMON Disk Group.

DISKBSIZE

These sheets record the average data transfer size (block size), in Kbytes, for read/write operations on each hdisk in the system.  If this number is not very close to the stripe size for the device there may be a problem that could be solved by increasing the value of numclust.

DISKBUSY

These sheets record device busy for each hdisk in the system.  This is the same as the %tm_act value recorded by iostat. Note that if this sheet contains all zero values then it means you forgot to enable iostat collection before starting nmon:

chdev -l sys0 -a iostat=true

DISKREAD

These sheets record the data rate (Kbytes/sec) for read operations on each hdisk in the system.

DISKSERV

These sheets record the service times (in milliseconds) for read/write transfers to each hdisk in the system.

DISKWAIT

These sheets record the queue times (in milliseconds) for read/write transfers to each hdisk in the system.

DISKWRITE

These sheets record the data rate (Kbytes/sec) for write operations on each hdisk in the system.

DISKXFER

These sheets record the I/O operations per second for each hdisk in the system.   This is the same as the tps value recorded by iostat.

DISK_SUMM

The Analyser creates this sheet.   It shows the total data rates (reads and writes) in Kbytes/sec plus total I/O rates for all hdisks in the system.    The figures on this sheet are accurate for all systems including ESS, EMC, FASt and HDS configurations.   

 

These data are displayed on the chart; the IO/sec data are also graphically displayed on the AAA sheet.

DONATE

This sheet records physical processor usage and donation to the shared pool.  Only present for dedicated LPARs running on  POWER6 systems.

EMCBSIZE/FAStBSIZE

This sheet records the average data transfer size (blocksize), in Kbytes, for read/write operations to each esoteric device in a system using EMC/PowerPath or FAStT.

EMCBUSY/FAStBUSY

These sheets record device busy for each esoteric device in a system using EMC/PowerPath or FAStT.  

EMCREAD/FAStREAD

These sheets record the data rate (Kbytes/sec) for read operations to each esoteric device in a system using EMC/PowerPath or FAStT.

EMCWRITE/FAStWRITE

This sheet records the data rate (Kbytes/sec) for write operations to each esoteric device in a system using EMC/PowerPath or FAStT.

EMCXFER/FAStXFER

These sheets record the I/O operations per second to each esoteric device in a system using EMC/PowerPath or FAStT.

EMCSERV/FAStSERV

The Analyser creates this sheet.   It shows estimated service times (not response times) for each esoteric device over the collection interval.   The service time is derived from the device busy and the transfer rate taken from the corresponding BUSY and XFER sheets.   Intervals where the transfer rate is below SVCXLIM are ignored in order to improve the accuracy of the estimate.
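The text does not give the exact formula the Analyser uses. A common way to derive a service time from device busy and transfer rate is the utilisation law (busy time per second divided by transfers per second), and the Python sketch below assumes that approach. The function name and the per-interval averaging are assumptions for illustration only.

```python
def estimate_service_time_ms(busy_pct, xfers_per_sec, svcxlim=0.0):
    """Estimate the mean service time in milliseconds for one device.

    busy_pct:      per-interval device busy values (0-100) from a BUSY sheet.
    xfers_per_sec: per-interval I/O operations per second from an XFER sheet.
    svcxlim:       intervals with a transfer rate below this are ignored.
    Returns None when no interval qualifies.
    """
    estimates = []
    for busy, xfer in zip(busy_pct, xfers_per_sec):
        if xfer <= 0 or xfer < svcxlim:
            continue  # too little activity for a meaningful estimate
        # busy% of one second is busy * 10 ms; spread it over the transfers
        estimates.append(busy * 10.0 / xfer)
    if not estimates:
        return None
    return sum(estimates) / len(estimates)
```

With a device that is 50% busy while performing 100 transfers per second, the estimate is 5 ms per transfer.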

ERROR

This sheet shows all errors reported by nmon during the collection run.   FIRST/LAST intervals are ignored for this sheet.   Timestamps are not converted to time values.

ESSBSIZE

This sheet is only present if ESS is set to YES and records the average data transfer size (blocksize), in Kbytes, for read/write operations on each vpath in the system. The data on this sheet are calculated by NMON_Analyser as the average of the DISKBSIZE values for component hdisks as recorded on the BBBE sheet.

ESSBUSY

This sheet is only present if ESS is set to YES and records device busy for each vpath in a system using ESS.  The data on this sheet are calculated by NMON_Analyser as the average of the DISKBUSY values for component hdisks as recorded on the BBBE sheet.  

ESSREAD

This sheet records the data rate (Kbytes/sec) for read operations on each vpath in the system.  This information is provided by NMON.

ESSWRITE

This sheet records the data rate (Kbytes/sec) for write operations on each vpath in the system.  This information is provided by NMON.

ESSXFER

These sheets record the I/O operations per second for each vpath in the system.   This information is provided by NMON.

FCREAD

This sheet records the data rate (MBytes/sec) for read operations on each Fibre Channel adapter in the system.  

FCWRITE

This sheet records the data rate (MBytes/sec) for write operations on each Fibre Channel adapter in the system.

FCXFERIN

This sheet records the read operations per second for each Fibre Channel adapter in the system.

FCXFEROUT

This sheet records the write operations per second for each Fibre Channel adapter in the system.   

FILE

This sheet contains a subset of the fields reported by NMON on the Kernel Internal Statistics panel.   These are the same values as reported by the sar command.

 

All fields are rates/sec.

iget  translations of i-node numbers to pointers to the i-node structure of a file or device. This is reported as iget/s by the sar -a command. Calls to iget occur when a call to namei has failed to find a pointer in the i-node cache. This figure should therefore be as close to 0 as possible.

namei  calls to the directory search routine that finds the address of a v-node given a path name. This is reported as lookuppn/s by the sar -a command.

dirblk  number of 512-byte blocks read by the directory search routine to locate a directory entry for a specific file. This is reported as dirblk/s by the sar -a command.

readch  characters transferred by read system call. This is reported as rchar/s by the sar -c command.

writech  characters transferred by write system call. This is reported as wchar/s by the sar -c command.

ttyrawch  tty input queue characters. This is reported as rawch/s by the sar -y command.

ttycanch  tty canonical input queue characters. This field is always 0 (zero) for AIX Version 4 and later versions.

ttyoutch  tty output queue characters. This is reported as outch/s by the sar -y command.

 

NMON_Analyser produces two graphs - one showing rates/sec for readch and writech by time of day and one showing rates/sec for iget, namei and dirblk.  

FRCA

This sheet is only generated if FRCA is loaded on the target system. NMON_Analyser produces a graph showing the cache hit ratio (as a percentage).  If FRCA is not loaded, NMON generates no data and the Analyser consigns the redundant header record to the StrayLines sheet.

IOADAPT

For each I/O adapter listed on the BBBC sheet, this sheet contains the data rates for both read and write operations (Kbytes/sec) and the total number of I/O operations performed. On AIX 5.1 and later, this information is reported by the iostat -A command. NMON_Analyser reorders the columns on the sheet for easier graphing.

 

The Analyser generates three graphs.   Note the area charts can be easily converted to line charts if required.   Simply right click on the white space within the chart area, then select Chart Type>Line>OK.

IP

This sheet only appears for topasout.  

JFSFILE

For each file system, this sheet shows what percentage of the space allocation is being used during each interval.  These figures are the same as the %Used value reported by the df command. The column headings show the mount point; sheet BBBC can be used to cross-reference to the file system/LV.  

JFSINODE

For each file system, this sheet shows what percentage of the Inode allocation is being used during each interval.  These figures are the same as the %Iused value reported by the df command. The column headings show the mount point; sheet BBBC can be used to cross-reference to the file system/LV.

LAN

This sheet only appears for topasout.  

LARGEPAGE

The graph shows Usedpages and Freepages over time.

 

Columns on the sheet are as follows:

 

Freepages  the number of large pages on the free list.

Usedpages  the number of large pages currently in use.

Pages  the number of large pages in the pool.

HighWater  the maximum number of pages used since the last reboot.

SizeMB  the size of a large page in Mbytes.

 

LPAR

The first graph shows the number of physical processors used by this partition vs the entitlement.   For an uncapped partition the number of physical processors may exceed the entitlement but can never exceed the number of virtual processors allocated to the partition.   For AIX the graph also shows the number of unfolded virtual processors (AIX will "fold" - stop dispatching work to - excess processors in order to minimise scheduling costs).

 

Note that the ratio of physical processor to entitlement (shown as %entc in the output of the lparstat command) will generally be higher than CPU% on the CPU_ALL sheet.   The reason for this is that a partition that is within its entitlement may wait for a short period of time before ceding a processor that enters an I/O wait or becomes idle.   This can eliminate unnecessary context switches.

 

The second graph shows CPU utilisation as a percentage of virtual processors – for AIX this is broken down in to usr%, sys% and wait%.   This level of detail is not available for Linux or releases of NMON prior to version 12.

 

The third graph is only present for AIX systems and shows CPU utilisation of the shared pool by this and other partitions.   The area marked "UsedPoolCPU%" represents the percentage of the shared pool that has been used by this partition, while the area marked "other%" represents the percentage used by all other partitions.   Note that if the partition is not authorised to see utilisation of the shared pool then the pool will appear to be 100% utilised.

 

Columns on the sheet are as follows:

PhysicalCPU  physical cores consumed by AIX

VirtualCPUs  number of Virtual CPUs allocated to the LPAR

logicalCPUs  number of threads (i.e. Virtual CPUs multiplied by SMT mode)

poolCPUs  number of cores in the pool that this LPAR occupies

entitled  the number of cores guaranteed to be available to this LPAR

weight  the priority of this LPAR when competing for unused CPU cycles

PoolIdle  the number of  unused cores in the Pool

usedAllCPU%  percentage of active cores in the machine that this LPAR is using

usedPoolCPU  percentage of cores in the pool that this LPAR is using

SharedCPU  1 if the LPAR is not a dedicated CPU

Capped  1 if the LPAR is capped

EC_User%  percentage of Entitlement used in User mode

EC_Sys%  percentage of Entitlement used in System mode

EC_Wait%  percentage of Entitlement waiting for I/O

EC_Idle%  percentage of Entitlement used in Idle mode

VP_User%  percentage of Virtual CPU used in User mode

VP_Sys%  percentage of Virtual CPU used in System mode

VP_Wait%  percentage of Virtual CPU waiting for I/O

VP_Idle%  percentage of Virtual CPU spent in Idle mode

Folded  no. of Virtual CPUs unused for efficiency reasons

CPU_Pool_id  the Id of the CPU pool, useful if there are multiple pools

MEM

The main graph on this sheet shows the amount of Real Free memory in Mbytes by time of day.  This would be the same as dividing the fre values reported by vmstat over the same interval by 256. The small graph shows the amount of real memory. This is useful in determining if dynamic reconfiguration has been used during the collection period.  

 

For AIX, other columns on the sheet are as follows:

 

Real Free  the percentage of real pages on the free list.

Virtual Free  the percentage of unallocated virtual slots on the paging spaces.

Real Free (MB)  the amount of memory on the free list in Mbytes.

Virtual Free (MB)  the amount of unallocated space on the paging spaces.

Real Total (MB)  the total amount of memory available to AIX.

Virtual Total (MB)  the total amount of space allocated for paging spaces.

 

Note: you can calculate the amount of memory used during an interval simply by subtracting the Real Free (MB) value from the Real Total (MB) value.    This will, however, include file pages.   The graph on the MEMUSE sheet gives a more accurate assessment of memory used by programs (computational pages).
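The two calculations mentioned for this sheet are simple enough to show as a Python sketch. The divisor of 256 follows from a 4 KB page size (256 pages per Mbyte); the function names are hypothetical.

```python
PAGE_SIZE_KB = 4
PAGES_PER_MB = 1024 // PAGE_SIZE_KB  # 256 pages of 4 KB in one Mbyte

def free_pages_to_mb(fre_pages):
    """Convert the vmstat 'fre' value (4 KB pages) to Mbytes."""
    return fre_pages / PAGES_PER_MB

def memory_used_mb(real_total_mb, real_free_mb):
    """Memory in use during an interval, including file pages."""
    return real_total_mb - real_free_mb
```

For instance, a fre value of 51200 pages corresponds to 200 Mbytes of Real Free memory.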

MEMUSE

Except for %comp, the values on this sheet are the same as would be reported by the vmtune command.  

 

%numperm  the percentage of real memory allocated to file pages.

%minperm  value specified on the vmtune command or system default of 20%. This will normally be constant for the run unless the vmtune or rmss commands are used during collection.

%maxperm  value specified on the vmtune command or system default of 80%. This will normally be constant for the run unless the vmtune or rmss commands are used during collection.

minfree  the minimum number of pages AIX is to keep on the free list.  Specified on the vmtune command or system default of maxfree - 8.

maxfree  the maximum number of pages AIX will steal in order to replenish the free list.  Specified on the vmtune command or system default.

%comp  the percentage of real memory allocated to computational pages. NMON_Analyser calculates this value.  Computational pages are those backed by page space and include working storage and program text segments.   They exclude data, executable and shared library files.

 

The Analyser generates two graphs.  The first shows the split between computational and file pages by time of day.  The second plots the values of %numperm, %minperm, %maxperm and %comp.

If %numperm falls below %minperm then computational pages will be stolen.   If %numperm rises above %maxperm then computational pages cannot be stolen.  Low values for both %minperm and %maxperm indicate that the system has been tuned for a database server.   You may also want to check the setting of STRICT_MAXPERM on the BBBP sheet (if present).

MEMNEW

The graph shows the allocation of memory split into the three major categories: pages used by user processes, file system cache, and pages used by the system (kernel).

 

Process%  the percentage of real memory allocated to user processes

FSCache%  the percentage of real memory allocated to file system cache

System%  the percentage of real memory used by system segments

Free%  the percentage of unallocated real memory

User%  the percentage of real memory used by non-system segments

MEMPAGES4K/64K/16MB/16GB

These sheets are only present for AIX.  They show various statistics for different page sizes in use within the system.    The Analyser will delete the sheets for page sizes not currently in use.

 

For the MEMPAGES64KB sheet a graph is drawn showing the use of  both 4KB and 64KB pages within the system – starting with AIX V6.1 (and supporting hardware) these page sizes are selected dynamically based on memory access patterns.

MEMREAL

This sheet only appears for topasout. The Analyser adds a column showing Real Free memory in Mbytes.   

MEMVIRT

This sheet only appears for topasout.

NET 

This sheet shows the data rates, in Kbytes/sec, for each network adapter in the system (including SP switch if present).  This is the same as produced by the netpmon -O dd command.  NMON_Analyser adds one column for each adapter showing the total data rate (read + write) and two columns showing Total Read and Total Write.  Note that the Total Write is calculated as a negative number for graphing.

 

The Analyser generates three graphs.   The first graph shows total network traffic broken down as Total-Read and Total-Write.  The writes are shown below the X-Axis.

 

Note the area chart can be easily converted to a line chart if required.   Simply right click on the white space within the chart area, then select Chart Type>Line>OK.

NETPACKET

This sheet shows the number of read/write network packets for each adapter. This is the same as produced by the netpmon -O dd command.

NETSIZE

This sheet shows the average packet size in bytes for each network adapter in the system.

NFS sheets

There are separate sheets for NFS2, NFS3 and NFS4 client/server.    The Analyser will delete empty sheets.

PAGE

This sheet has the paging statistics as recorded by NMON.

 

faults  the number of page faults per second. This is not a count of page faults that generate I/O, because some page faults can be resolved without I/O.

pgin  the total rate/sec of page-in operations to both paging space and file systems during the interval.

pgout  the total rate/sec of page-out operations to both paging space and file systems during the interval.

pgsin  the rate/sec of page-in operations from paging space during the interval.  This is the same as the pi value reported by vmstat. If pgsin is consistently higher than pgsout this may indicate thrashing.

pgsout  the rate/sec of page-out operations to paging space during the interval. This is the same as the po value reported by vmstat.

reclaims  from NMON 10 onwards this field is the same as the fr value reported by vmstat and represents the number of pages/sec freed by the page-replacement routine.

scans  the number of pages/sec examined by the page replacement routine.  This is the same as the sr value reported by vmstat. Page replacement is initiated when the number of free pages falls below minfree and stops when the number of free pages exceeds maxfree.

cycles  the number of times/sec the page replacement routine had to scan the entire Page Frame Table in order to replenish the free list.  This is the same as the cy value reported by vmstat but note that vmstat reports this number as an integer whereas nmon reports it as a real number.

fsin  calculated by the Analyser as pgin-pgsin for graphing

fsout  calculated by the Analyser as pgout-pgsout for graphing

sr/fr  calculated by the Analyser as scans/reclaims for graphing

 

NMON_Analyser produces two graphs.   The first shows paging operations to/from paging space.  The ideal here would be no more than 5 operations/sec per page space (see the BBBC sheet for details).   The second graph shows the scan:free ratio (sr/fr).   Memory may be over-committed when this figure is >4, although you also need to examine the MEM and PAGE sheets.
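The three derived columns and the >4 rule of thumb can be expressed as a short Python sketch. The function names are hypothetical, and returning None when reclaims is zero is an assumption made here to avoid a division by zero; the text does not say how the Analyser treats that case.

```python
def derive_page_columns(pgin, pgout, pgsin, pgsout, scans, reclaims):
    """Compute the derived PAGE columns for one interval (all rates/sec)."""
    return {
        "fsin": pgin - pgsin,      # page-ins from file systems only
        "fsout": pgout - pgsout,   # page-outs to file systems only
        # assumption: no ratio is reported when nothing was reclaimed
        "sr/fr": scans / reclaims if reclaims > 0 else None,
    }

def may_be_overcommitted(sr_fr, threshold=4.0):
    """Apply the scan:free rule of thumb (sr/fr > 4)."""
    return sr_fr is not None and sr_fr > threshold
```

A ratio above the threshold is only a hint; as noted above, the MEM and PAGE data should be examined as well.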

POOLS

This sheet contains information about the shared pool in which the LPAR is running.   Most of the data will only be present if "Allow performance information collection" is set in the LPAR properties.

  

shcpus_in_sys  the number of cores allocated to the global shared pool

max_pool_capacity  the maximum number of VPs defined for this pool

entitled_pool_capacity  the entitlement for this pool (includes reserve entitlement)

pool_max_time  same as max_pool_capacity but may vary in value if the pool definition is changed during the collection run.

pool_busy_time  the average number of cores in use by this shared pool during the interval

shcpu_tot_time  the average number of cores available to the global shared pool (including shared dedicated resources?) during the interval

shcpu_busy_time  the average number of cores in use within the global shared pool

pool_id  the identifier of this pool

entitled  the entitlement of this LPAR

PROC

This sheet contains a subset of the fields reported by NMON on the Kernel Internal Statistics panel.  The RunQueue and Swap-in fields are average values for the interval. All other fields are rates/sec:

  

RunQueue  the average number of kernel threads in the run queue. This is reported as runq-sz by the sar -q command and is reported as RunQueue on the nmon Kernel Internal Statistics panel.   A value that exceeds 3x the number of CPUs may indicate CPU constraint.

Swap-in  the average number of kernel threads waiting to be paged in. This is reported as swpq-sz by the sar -q command.

pswitch  the number of context switches. This is reported as pswch/s by the sar -w command.

syscall  the total number of system calls. This is reported as scall/s by the sar -c command.

read  the number of read system calls. This is reported as sread/s by the sar -c command.

write  the number of write system calls. This is reported as swrit/s by the sar -c command.

fork  the number of fork system calls. This is reported as fork/s by the sar -c command.

exec  the number of exec system calls. This is reported as exec/s by the sar -c command.

rcvint  the number of tty receive interrupts. This is reported as rcvin/s by the sar -y command.

xmtint  the number of tty transmit interrupts. This is reported as xmtin/s by the sar -y command.

sem  the number of IPC semaphore primitives (creating, using and destroying). This is reported as sema/s by the sar -m command.

msg  the number of IPC message primitives (sending and receiving). This is reported as msg/s by the sar -m command.

 

NMON_Analyser produces three graphs - one showing the average length of the RunQueue and the number of swap-ins by time of day, another showing rates/sec for pswitch and syscalls by time of day and a third showing rates/sec for forks and execs.

 

The graph for forks/execs can be useful when monitoring web server systems.  

PROCAIO

This sheet contains information about the number of asynchronous I/O processes available and active (i.e. using more than 0.1% of the CPU).   It also shows the amount of CPU being used by the AIO processes during the collection interval.

 

Two graphs are produced.   The second uses two y-axes.   The number of running aio processes is shown against the first axis and the amount of cpu used is shown against the second.

RAWCPUTOTAL, RAWLPAR

These sheets contain a dump of various counters such as context switches and phantom interrupts.  

TCPUDP

This sheet only appears for topasout.

TOP

This sheet is only generated if you specify the -t  flag on the NMON command line.  The output is similar to that produced using the ps v command.   Note that, because of the limitation of having only 65,000 lines on a single sheet, some data may be omitted for very large files and this may mean that entire PIDs or even commands may be missing from the analysis.

 

Note that data are only present for processes that consumed a significant amount of CPU during an interval.  The TOP sheet does not represent a complete view of the system.

 

NMON_Analyser does the following:

· Reorders the columns for easier processing.

· Sorts the data on the sheet into COMMAND name order - using TIME as a minor sort key.

· Creates a table at the end of the sheet summarising the data by command name and used for graphing.

 

You can see the detail section by scrolling to the top of the sheet.   The summary table is largely obscured by the graphs and so you will need to move (or delete) them for easier viewing.

 

PID  in the detail section this is the process ID of a specific invocation of a command.  In the summary table this is the command name.

%CPU  in the detail section this is the utilisation of a single processor (rather than of the system) by that PID during the interval. In the summary table this is the average amount of CPU used by all invocations of the command during the collection period.

%Usr  in the detail section this is the average amount of User-mode CPU used by that PID during the interval.

%Sys  in the detail section this is the average amount of Kernel-mode CPU used by that PID during the interval.

Threads  the number of (software) threads being used by this command.

Size  the average amount of paging space (in Kbytes) allocated for the data section (private segment + shared library data pages) for one invocation of this command.  This is the same as the SIZE figure on the ps v command.  Note that if Size is greater than ResData it means some working segment pages are currently paged out.

ResText  the average amount of real memory (in Kbytes) used for the code segments of one invocation of this command.   Note that multiple concurrent invocations will normally share these pages.

ResData  the average amount of real memory (in Kbytes) used for the data segments of one invocation of this command.  A method of calculating real memory usage for a command is ResText + (ResData * N), where N is the number of concurrent invocations.

CharIO  this is the count of bytes/sec being passed via the read and write system calls. The bulk of this is reading and writing to disks but also includes data to/from terminals, sockets and pipes. Use this to work out which processes are doing the I/O.

%RAM  this is an indication of what percentage of real memory this command is using. This is (ResText + ResData) / Real Mem; it is the same as the %MEM value on the ps v command.    Due to rounding/truncation, and the large amounts of memory in modern systems, this is usually 0.

Paging  sum of all page faults for this process.  Use this to identify which process is causing paging but note that the figure includes asynchronous I/O and can be misleading.

Command  name of the command

WLMClass  name of the Workload Partition or Workload Manager superclass to which this command has been allocated (64-bit kernel only).

IntervalCPU  generated by the Analyser.   In the detail section this shows the total amount of CPU used by all invocations of a command in the time interval.  It is calculated as the sum of CPU used by all PIDs running the same command divided by the number of active processors (physical cores) available during the interval. In the summary section this is broken down as Average, Weighted Average and Maximum and is used to generate the graph.

WSet  generated by the Analyser.   In the detail section this shows the total amount of memory used by all invocations of a command recorded in the time interval.  It is calculated as ResText + (ResData * N) (where "N" is the number of copies of this command running concurrently during the interval).  In the summary section this is broken down as Minimum, Average and Maximum and is used to generate the graph.

User  generated by the Analyser if a UARG sheet is present.  This contains the name of the user running the process.

Arg  generated by the Analyser if a UARG sheet is present.  This contains the complete argument string entered for the command.
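The IntervalCPU and WSet definitions above can be sketched in Python for a single interval. This is an illustration, not the Analyser's implementation: the row layout and function name are hypothetical, and taking ResText from the first PID and averaging ResData across PIDs are assumptions made to apply the formula ResText + (ResData * N).

```python
from collections import defaultdict

def summarise_interval(rows, active_cores):
    """Summarise the TOP rows of one interval by command name.

    rows: list of dicts with keys "Command", "%CPU", "ResText", "ResData"
          (one dict per PID; a hypothetical layout for this sketch).
    active_cores: number of active physical cores during the interval.
    """
    by_command = defaultdict(list)
    for row in rows:
        by_command[row["Command"]].append(row)

    summary = {}
    for command, procs in by_command.items():
        n = len(procs)  # N: copies of the command running concurrently
        interval_cpu = sum(p["%CPU"] for p in procs) / active_cores
        res_text = procs[0]["ResText"]  # code pages are normally shared
        res_data = sum(p["ResData"] for p in procs) / n  # average per copy
        summary[command] = {
            "IntervalCPU": interval_cpu,
            "WSet": res_text + res_data * n,
        }
    return summary
```

Two java processes using 150% and 50% of a single CPU on a 4-core system give an IntervalCPU of 50, which is why this figure is easier to compare with system-wide utilisation than the per-PID %CPU.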

 

The Analyser generates four graphs using data in the generated table:

· A graph showing Average, Weighted Average and Maximum CPU Utilisation by command

· A graph showing Minimum, Average and Maximum Memory Utilisation by command

· A graph showing Average, Weighted Average and Maximum CHARIO by command

· A graph showing the CPU utilisation for each PID for each interval as a scatter chart.  Note that this chart is only produced if there are fewer than 32,000 lines on the TOP sheet. See below for notes on interpreting this chart.

Interpreting the %Processor by PID chart

The purpose of the chart is to provide a link to the UARG sheet so that you can discover precisely which invocation of a command was responsible for using the CPU. It shows the processor utilisation (utilisation of a single CPU) by each PID captured on the TOP sheet.  Note that a process can use more than 100% of a single CPU if it is multi-threaded.   

 

Active PIDs will create a cluster of points on the chart.    The highest point will show the maximum amount of CPU used during any one snapshot.   To find out which PID a point refers to, move the mouse to position the cursor above it and Excel will display a coordinate pair.   The first coordinate is the PID – use this to refer to the UARG sheet to find precisely which command was being executed.

UARG

This sheet has the first 1,000 commands executed during the collection period.   The commands are listed in time order.   Note that commands appearing in the first interval may have been executing prior to the start of the collection.

 

PID  the process ID of a specific invocation of a command

PPID  the parent process ID

COMM  the command being executed

THCOUNT  the number of threads started by this process

USER  the name of the user running this process

GROUP  the name of the group to which the user belongs

FullCommand  the full command string entered by the user

VM

This sheet is only present for Linux systems and contains a dump of the /proc/vmstat file values.

The two graphs show file-backed paging (pgpgin/pgpgout) and swap space activity.

WPAR sheets

These sheets are only present for AIX V6 and record data for each Workload Partition in the system.    They are not currently generated by the topas version of nmon.

WLM sheets

These sheets contain details of CPU, Memory and I/O bandwidth used by each Superclass/Subclass defined to WLM during the collection run.   The Analyser will extract subclass data and create a new set of sheets for each class with more than one subclass.  These sheets will be named "WLMCPU.class" etc.

If this is a Micro-partition then the Analyser will also create a set of WLMPCPU sheets which will show the physical processor utilisation rather than %CPU utilisation.

ZZZZ

The Analyser uses the information on this sheet to automatically convert all NMON time stamps to actual time of day for easier analysis. For NMON10 or later a column is added which contains the date and time as a single value and this is used for the conversions.  The number of rows on this sheet is used by the Analyser to reset the "snapshots" value on the AAA sheet in case the nmon run was terminated with SIGUSR2.
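The timestamp conversion can be illustrated outside Excel with a short Python sketch. It assumes ZZZZ records of the form `ZZZZ,T0001,14:05:01,21-MAR-2012` (label, time, date); that layout, the function name and the month table are assumptions for this sketch and should be checked against your own nmon output.

```python
from datetime import datetime

# Month abbreviations are mapped by hand so that parsing does not
# depend on the locale of the machine running the script.
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

def parse_zzzz(lines):
    """Map each snapshot label (e.g. 'T0001') to a datetime."""
    stamps = {}
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 4 or fields[0] != "ZZZZ":
            continue  # not a ZZZZ record, or no date column present
        label, clock, date = fields[1], fields[2], fields[3]
        hour, minute, second = (int(part) for part in clock.split(":"))
        day, month, year = date.split("-")
        stamps[label] = datetime(int(year), MONTHS[month.upper()], int(day),
                                 hour, minute, second)
    return stamps
```

The number of entries in the returned dictionary corresponds to the number of rows on the ZZZZ sheet, that is, the number of snapshots actually written.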

 Error Handling

Error handling in NMON_Analyser is rudimentary.    The Analyser can handle many input file errors, but occasionally the analysis will halt leaving you staring at a dialog box.  Should this happen, please accept my apologies.  However, before sending me a copy of the input file, please read the following:

Common problems

The most commonly reported problems arise from invalid input files.   We also get problems reported where, for whatever reason, lines have been truncated, split or even duplicated.   NMON_Analyser attempts to trap these errors and will report them on the "StrayLines" sheet.  Check this sheet if the run stops unexpectedly.

 

· 'No valid input! NMON run may have failed.'

 

The most common cause of this message is that the NMON run failed and there really is no valid input.   NMON initialises the output file by writing all of the section headings.   If it subsequently fails, you will get an output file that consists purely of headings  - with no data.   Check the file by loading it into a word processor or, indeed, a spreadsheet (as a .csv file) before you send it to me.

 

· 'Unexpected end of file.'

 

This is only reported when processing files containing more lines than can be stored on one sheet and when SORTINP is set to NO. It is usually caused by the fact that lines are being terminated with a single character (CR, or LF for files coming straight from a Unix system) instead of CRLF. Change your FTP option to ASCII or TEXT when sending the file to your PC. This problem only shows up with large files because the Analyser uses a different technique to read them than that used to read smaller files.
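If the file has already been transferred in binary mode, the line endings can be repaired on the PC instead of repeating the transfer. This is a minimal sketch; the function names `to_crlf` and `convert_file` are hypothetical, and it reads the whole file into memory, which assumes the file is of manageable size.

```python
def to_crlf(data: bytes) -> bytes:
    """Normalise every line ending (CRLF, lone CR, lone LF) to CRLF."""
    # First reduce all three forms to LF, then expand LF to CRLF,
    # so that existing CRLF pairs are not doubled.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def convert_file(src: str, dst: str) -> None:
    """Read src in binary mode and write a CRLF-terminated copy to dst."""
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(to_crlf(data))
```

Writing to a separate destination file keeps the original, unmodified input available in case it has to be sent with a problem report.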

 

Strangely, one of the most common problems I get is caused by the fact that the Analyser can't handle files with a single data interval very well. If you want to test the package just let NMON run for a few minutes to get a reasonable data sample!

Known bugs/problems (V3.4, topas_nmon)

· When analysing systems with a very large number of disks, Excel can stop with error "No more fonts can be added" or "Insufficient Resources". Set GRAPHS to LIST and select only those sheets you are interested in using the LIST option on the Settings sheet.

· The data can be sorted incorrectly with some versions of nmon (notably 14g) which generate variable length timestamps. Process each file separately and specify MERGE=YES to get the data sorted correctly.

· The PIVOT option does not seem to work with Excel 2007 or later.

· The Analyser will crash if you edit the .csv file using Excel prior to the run and the file contains a TOP section. If you need to edit the input file, use a word processor.

· There are some issues with processing files from systems having both ESS and EMC subsystems attached.

· When analysing ESS subsystems with more than 253 vpaths some vpath data will be missing from the output. Use NMON Disk Groups to combine several vpaths into a single unit for reporting and use the -E flag to prevent the ESS sections from being produced. Alternatively, set ESS to NO on the Control sheet and ignore warning messages about data truncation: only the first 253 vpaths will appear in the output.
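The sorting problem with variable length timestamps can be illustrated with a short Python sketch. It rests on an assumption the text does not confirm: that "variable length" means snapshot labels written without zero padding (T1, T2, T10 rather than T0001, T0002, T0010). Under that assumption a plain text sort puts the labels in the wrong order, while a sort on the numeric part does not. The function name `snapshot_number` is hypothetical.

```python
def snapshot_number(label: str) -> int:
    """Numeric part of a snapshot label such as 'T0002' or 'T10'."""
    return int(label.lstrip("T"))


labels = ["T10", "T2", "T1"]

# Text order compares character by character, so "T10" sorts before "T2".
as_text = sorted(labels)

# Numeric order restores the sequence in which the snapshots were taken.
by_number = sorted(labels, key=snapshot_number)
```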

How to report a problem 

Post the relevant information on the nmon forum (see the link in the introduction).  It can help to include a copy of the original, unmodified .nmon input file, plus the incomplete output spreadsheet, as a compressed (zipped) file but please consider your system security before doing so.

 

If you have the ability to capture a screenshot then a copy of any dialog boxes also proves useful on occasion.

 

Note: Development is currently done on Microsoft Excel 2003 (11.8320.8221) SP3.  It may not be possible to fix problems arising from the use of different releases.

Excel/VBA Resources/Links

This is a good source for Excel tips and it has some VBA examples as well:

http://exceltips.vitalnews.com/

 

This is the home of an excellent reference book:

http://www.exceltip.com/

 

And of course

http://www.microsoft.com


Appendix: Notes on Batch Operation

If you regularly process large numbers of files, the operation of NMON_Analyser can be completely automated. Simply create a text file containing a list of nmon file names (using wild card characters as appropriate) and enter the name of this file into the FILELIST field of the Analyser control sheet. Specify the name of an existing directory in the OUTDIR field if you want all of the output files to end up in one place. Save the NMON_Analyser spreadsheet under a new name (this is recommended so that you can still use NMON_Analyser for interactive sessions).
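The list file can be written by hand or generated by a script. The sketch below writes one file name or wildcard pattern per line. The function name `write_filelist` is hypothetical, and the CRLF line endings are an assumption based on the file being read on Windows.

```python
import tempfile
from pathlib import Path


def write_filelist(control_path, patterns):
    """Write one nmon file name or wildcard pattern per line."""
    text = "".join(pattern + "\r\n" for pattern in patterns)
    Path(control_path).write_bytes(text.encode("ascii"))


# Demonstration in a temporary directory, so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    control = Path(tmp) / "Control.txt"
    write_filelist(control, [r"d:\NMON\RawData\*.csv"])
    written = control.read_bytes()
```

In real use the path passed to `write_filelist` would be the same one entered in the FILELIST field of the control sheet.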

Now create Windows .bat files to invoke Excel (see the samples below).   

 

After processing the last input file, the Analyser will automatically close down Excel.   Note, however, that this only happens if you load a copy of the Analyser that has a saved FILELIST name and if there are no other open spreadsheets.   This allows you to use the FILELIST option safely during an interactive session.

Sample .bat files

These sample batch control files are designed to use the pscp program from the PuTTY suite written and maintained by Simon Tatham at http://www.chiark.greenend.org.uk/~sgtatham/putty/

My thanks to Jamie Dennis for providing them.

getcsv.bat

cd \NMON\RawData

del *.csv

d:\progra~1\putty\pscp -p -r -l userid host://Performance/NMON/Rawdata/*.csv .

analyser.bat

D:

cd \NMON\FinishedData

del d:\NMON\FinishedData\*.xls

"D:\NMON\nmon analyser batch.xls"

putxls.bat

D:

cd \NMON\FinishedData

d:\progra~1\putty\pscp -p -r -l userid *.xls host:/Performance/NMON/

Control.txt

d:\NMON\RawData\*.csv

NMON Analyser Batch.xls

OUTDIR      d:\NMON\FinishedData\

FILELIST    d:\NMON\FinishedData\Control.txt
