tsar Usage Guide

Date: 2019-12-20


Common commands

tsar --nginx --live -i 1
Shows the live nginx status, sampling once per second.

System modules

cpu field meanings

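user: Percentage of time the CPU spends running user processes. In general, the more CPU time goes to user space, the better.
sys: Percentage of time the CPU spends running in the kernel. High system CPU usage indicates a bottleneck somewhere in the system; lower is generally better.
wait: Percentage of time the CPU spends waiting for I/O operations to complete. A system should not spend much time waiting on I/O; if it does, I/O is a bottleneck.
hirq: Percentage of time spent handling hardware interrupts
sirq: Percentage of time spent handling software interrupts
util: Total percentage of time the CPU is in use
nice: Percentage of time spent running user processes whose priority has been adjusted (niced)
steal: Percentage of time a virtual CPU is forced to wait (involuntary wait) while the hypervisor services another virtual processor
ncpu: Total number of CPUs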

Collection method

CPU usage is computed from the counters in /proc/stat. The basic format of that file is:

cpu 67793686 1353560 66172807 4167536491 2705057 0 195975 609768
cpu0 10529517 944309 11652564 835725059 2150687 0 74605 196726
cpu1 14380773 127146 13908869 832565666 150815 0 31780 108418

The cpu line is the aggregate; cpu0, cpu1 and so on are the individual CPUs. Each line carries 8 values, measured in ticks:

User time: 67793686
Nice time: 1353560
System time: 66172807
Idle time: 4167536491
Waiting time: 2705057
Hard IRQ time: 0
SoftIRQ time: 195975
Steal time: 609768

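Total CPU time = user + system + nice + idle + iowait + irq + softirq + steal
Usage of each state = CPU time in that state / total CPU time * 100%
The overall CPU utilization (util) is the special case. The current formula is:
util = 1 - idle - iowait - steal
where idle, iowait and steal are the fractions of total CPU time spent in those states.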
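The formulas above can be sketched in Python as follows. This is a minimal illustration, not tsar's own code: it assumes the percentages are computed on the difference between two samples (the counters in /proc/stat are cumulative), and the helper names parse_cpu_line and cpu_usage as well as the two sample lines are made up for the example.

```python
# Field order follows the eight counters listed for /proc/stat above.
FIELDS = ("user", "nice", "sys", "idle", "wait", "hirq", "sirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu ...' line into a dict of tick counters."""
    values = line.split()[1:1 + len(FIELDS)]
    return dict(zip(FIELDS, map(int, values)))


def cpu_usage(prev, curr):
    """Per-state percentages between two samples (assumption: delta-based)."""
    delta = {name: curr[name] - prev[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return {}
    usage = {name: 100.0 * ticks / total for name, ticks in delta.items()}
    # util = 1 - idle - iowait - steal, expressed in percent
    usage["util"] = 100.0 - usage["idle"] - usage["wait"] - usage["steal"]
    return usage


# Hypothetical samples taken one interval apart.
prev = parse_cpu_line("cpu 100 0 50 800 50 0 0 0")
curr = parse_cpu_line("cpu 200 0 100 1500 100 0 0 100")
usage = cpu_usage(prev, curr)
print(round(usage["user"], 2), round(usage["util"], 2))
```

On a live system the two lines would come from reading the first line of /proc/stat twice, one sampling interval apart.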

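mem field meanings

free: Amount of free physical memory
used: Amount of memory in use
buff: Amount of memory used by buffers. A buffer is something that has yet to be "written" to disk.
cach: The operating system keeps frequently accessed data in the cache to speed up execution. A cache is something that has been "read" from the disk and stored for later use.
total: Total amount of system memory
util: Memory utilization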

Collection method

The memory counters are in /proc/meminfo. The key entries are:

MemTotal: 7680000 kB
MemFree: 815652 kB
Buffers: 1004824 kB
Cached: 4922556 kB

These are self-explanatory. Memory utilization is computed as:
util = (total - free - buff - cach) / total * 100%
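As a worked example of the utilization formula, the sketch below parses the four sample entries shown above and applies it. The function names parse_meminfo and mem_util are illustrative and not part of tsar.

```python
def parse_meminfo(text):
    """Map each 'Name: value kB' line to an integer number of kB."""
    counters = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            counters[name.strip()] = int(rest.split()[0])
    return counters


def mem_util(counters):
    """util = (total - free - buff - cach) / total * 100"""
    total = counters["MemTotal"]
    used = total - counters["MemFree"] - counters["Buffers"] - counters["Cached"]
    return 100.0 * used / total


SAMPLE = """MemTotal: 7680000 kB
MemFree: 815652 kB
Buffers: 1004824 kB
Cached: 4922556 kB"""

print(round(mem_util(parse_meminfo(SAMPLE)), 2))
```

With the sample values, 936968 kB out of 7680000 kB count as used, which is roughly 12.2%.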

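load field meanings

load1: System load average over one minute
load5: System load average over five minutes
load15: System load average over fifteen minutes
runq: Number of tasks in the run queue at sampling time; same meaning as procs_running in /proc/stat
plit: Number of active tasks in the system at sampling time (tasks that have already finished are not counted)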

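Collection method

The load data is kept in /proc/loadavg:
0.00 0.01 0.00 1/271 23741
The values are the 1-minute load, the 5-minute load, the 15-minute load, running tasks / total tasks, and the most recently assigned PID.
Collecting the first five values is enough to obtain all of this information.
Note: the system load counts as high only when the load divided by the number of CPU cores is greater than 1.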
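The parsing and the per-core rule of thumb can be sketched as below. The names parse_loadavg and load_is_high are hypothetical, and the mapping of the running/total pair onto runq and plit is an assumption based on the field descriptions above.

```python
def parse_loadavg(text):
    """Split a /proc/loadavg line into the five values tsar needs."""
    load1, load5, load15, tasks = text.split()[:4]
    running, total = tasks.split("/")
    return {
        "load1": float(load1),
        "load5": float(load5),
        "load15": float(load15),
        "runq": int(running),   # assumed mapping: running tasks
        "plit": int(total),     # assumed mapping: total tasks
    }


def load_is_high(load, ncpu):
    """Rule of thumb from the text: high only if load per core exceeds 1."""
    return load / ncpu > 1


sample = parse_loadavg("0.00 0.01 0.00 1/271 23741")
print(sample["runq"], sample["plit"], load_is_high(sample["load1"], 2))
```

For instance, a load of 9 on an 8-core machine counts as high, while a load of 4 on the same machine does not.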

traffic field meanings

bytin: Inbound traffic in bytes/s
bytout: Outbound traffic in bytes/s
pktin: Inbound packets/s
pktout: Outbound packets/s

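Collection method

The traffic counters come from /proc/net/dev:

face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo:1291647853895 811582000 0 0 0 0 0 0 1291647853895 811582000 0 0 0 0 0 0
eth0:853633725380 1122575617 0 0 0 0 0 0 1254282827126 808083790 0 0 0 0 0 0

The header line labels the fields, and each following line represents one network interface. tsar mainly collects the inbound and outbound bytes and packets.
Note that tsar only collects data for interfaces whose names start with eth or em; interfaces such as lo are ignored. Traffic is measured in bytes.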
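The interface filter and column selection can be illustrated with the short sketch below. It only extracts the cumulative counters; turning them into per-second rates would require the difference between two samples divided by the interval, which is an assumption about tsar's behavior rather than something stated here. The function name parse_net_dev is hypothetical.

```python
def parse_net_dev(text, prefixes=("eth", "em")):
    """Sum receive/transmit bytes and packets over eth*/em* interfaces."""
    totals = {"bytin": 0, "pktin": 0, "bytout": 0, "pktout": 0}
    for line in text.splitlines():
        name, sep, counters = line.partition(":")
        name = name.strip()
        if not sep or not name.startswith(prefixes):
            continue  # header lines, lo and other interfaces are skipped
        values = [int(v) for v in counters.split()]
        totals["bytin"] += values[0]    # receive bytes
        totals["pktin"] += values[1]    # receive packets
        totals["bytout"] += values[8]   # transmit bytes
        totals["pktout"] += values[9]   # transmit packets
    return totals


SAMPLE = """face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo:1291647853895 811582000 0 0 0 0 0 0 1291647853895 811582000 0 0 0 0 0 0
eth0:853633725380 1122575617 0 0 0 0 0 0 1254282827126 808083790 0 0 0 0 0 0"""

print(parse_net_dev(SAMPLE))
```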

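tcp field meanings

active: Number of actively opened TCP connections
pasive: Number of passively opened TCP connections
iseg: Number of TCP segments received
outseg: Number of TCP segments sent
EstRes: Number of resets that have occurred at ESTABLISHED
AtmpFa: Number of failed connection attempts
CurrEs: Number of TCP connections currently in the ESTABLISHED state
retran: Retransmission rate of the system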

Collection method

The TCP counters are in /proc/net/snmp:

Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 31702170 14416937 935062 772446 16 1846056224 1426620266 448823 0 5387732

The fields of interest are ActiveOpens/PassiveOpens/AttemptFails/EstabResets/CurrEstab/InSegs/OutSegs/RetransSegs.
The retransmission rate deserves a closer look. It is computed as:
retran = (RetransSegs - last RetransSegs) / (OutSegs - last OutSegs) * 100%
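A small sketch of the retransmission formula follows. It pairs the header and value lines of the Tcp section and then applies the delta-based ratio. The helper names and the second sample (1000 more segments sent, 5 of them retransmitted) are invented for illustration.

```python
def parse_tcp_counters(text):
    """Pair the 'Tcp:' header line with the 'Tcp:' value line."""
    rows = [line.split() for line in text.splitlines() if line.startswith("Tcp:")]
    header, values = rows[0], rows[1]
    return dict(zip(header[1:], map(int, values[1:])))


def retran_rate(prev, curr):
    """retran = delta(RetransSegs) / delta(OutSegs) * 100"""
    out_delta = curr["OutSegs"] - prev["OutSegs"]
    if out_delta <= 0:
        return 0.0  # nothing was sent during the interval
    return 100.0 * (curr["RetransSegs"] - prev["RetransSegs"]) / out_delta


SAMPLE = """Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 31702170 14416937 935062 772446 16 1846056224 1426620266 448823 0 5387732"""

prev = parse_tcp_counters(SAMPLE)
# Hypothetical next sample.
curr = dict(prev, OutSegs=prev["OutSegs"] + 1000, RetransSegs=prev["RetransSegs"] + 5)
print(retran_rate(prev, curr))
```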

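udp field meanings

idgm: Number of UDP datagrams received
odgm: Number of UDP datagrams sent
noport: Number of datagrams received by the UDP layer whose destination address or destination port does not exist
idmerr: Number of invalid datagrams received by the UDP layer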

Collection method

The UDP data comes from the same file as TCP, /proc/net/snmp:

Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 31609577 10708119 0 159885874

io field meanings

rrqms: The number of read requests merged per second that were issued to the device.
wrqms: The number of write requests merged per second that were issued to the device.
rs: The number of read requests that were issued to the device per second.
ws: The number of write requests that were issued to the device per second.
rsecs: The number of sectors read from the device per second.
wsecs: The number of sectors written to the device per second.
rqsize: The average size (in sectors) of the requests that were issued to the device.
qusize: The average queue length of the requests that were issued to the device.
await: The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
svctm: The average service time (in milliseconds) for I/O requests that were issued to the device.
util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

Collection method

The I/O counters are in /proc/diskstats, for example:

202 0 xvda 12645385 1235409 416827071 59607552 193111576 258112651 3679534806 657719704 0 37341324 717325100
202 1 xvda1 421 2203 3081 9888 155 63 421 1404 0 2608 11292

The fields on each line are:

major: Major device number
minor: Minor device number. Together, the device numbers identify the type of device and the specific disk or partition.
name: Device name
rd_ios: Number of reads completed. This is the total number of reads completed successfully.
rd_merges: Number of reads merged. Adjacent reads and writes may be merged for efficiency, so two 4K reads may become one 8K read before it is ultimately handed to the disk, and it is then counted (and queued) as only one I/O.
rd_sectors: Number of sectors read. This is the total number of sectors read successfully.
rd_ticks: Number of milliseconds spent reading. This is the total number of milliseconds spent by all reads.
wr_ios: Number of writes completed. This is the total number of writes completed successfully.
wr_merges: Number of writes merged. Reads and writes which are adjacent to each other may be merged for efficiency. Thus two 4K reads may become one 8K read before it is ultimately handed to the disk, and so it will be counted (and queued) as only one I/O.
wr_sectors: Number of sectors written. This is the total number of sectors written successfully.
wr_ticks: Number of milliseconds spent writing. This is the total number of milliseconds spent by all writes.
cur_ios: Number of I/Os currently in progress. The only field that should go to zero. Incremented as requests are given to appropriate request_queue_t and decremented as they finish.
ticks: Number of milliseconds spent doing I/O
aveq: Weighted number of milliseconds spent doing I/O

Every field listed above can be computed from these counters:

double n_ios = rd_ios + wr_ios;
double n_ticks = rd_ticks + wr_ticks;
double n_kbytes = (rd_sectors + wr_sectors) / 2;
st_array[0] = rd_merges / (inter * 1.0);
st_array[1] = wr_merges / (inter * 1.0);
st_array[2] = rd_ios / (inter * 1.0);
st_array[3] = wr_ios / (inter * 1.0);
st_array[4] = rd_sectors / (inter * 2.0);
st_array[5] = wr_sectors / (inter * 2.0);
st_array[6] = n_ios ? n_kbytes / n_ios : 0.0;
st_array[7] = aveq / (inter * 1000);
st_array[8] = n_ios ? n_ticks / n_ios : 0.0;
st_array[9] = n_ios ? ticks / n_ios : 0.0;
st_array[10] = ticks / (inter * 10.0);

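Notes:

A sector is usually 512 bytes, which is why some values are divided by 2 (two sectors make one KB).
ws is the number of writes that actually reach the I/O device, while wrqms is the number of write requests that were merged. The two cannot be compared in size, because there is no way to know in advance how many requests can be merged. For example, suppose 100 read system calls are issued, each reading 4K, and all 100 are contiguous. Because a disk typically allows a maximum request size of 256KB, the block layer merges these 100 reads into 2 requests, one of 256KB and one of 144KB. rrqms is then 100, because all 100 requests were merged, regardless of how many requests they ended up as; rs is 2, because 2 requests remain in the end.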
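The arithmetic of the merge example can be checked with a few lines of Python. This is only a toy model of the sizes involved, assuming a 256KB maximum request size as stated above; it does not reproduce how the block layer actually merges requests, and merge_contiguous_reads is a made-up name.

```python
def merge_contiguous_reads(count, size_kb, max_request_kb=256):
    """Split count contiguous reads of size_kb into requests of at most max_request_kb."""
    remaining_kb = count * size_kb
    requests = []
    while remaining_kb > 0:
        chunk = min(remaining_kb, max_request_kb)
        requests.append(chunk)
        remaining_kb -= chunk
    return requests


requests = merge_contiguous_reads(100, 4)
print(requests, len(requests))
```

The 400KB of contiguous reads end up as one 256KB request and one 144KB request, which matches the rs value of 2 in the example.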

partition field meanings

bfree: Free bytes in the partition
bused: Bytes in use in the partition
btotl: Total size of the partition
util: Partition utilization

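Collection method

tsar first reads the partition list from /etc/mtab, then calls statfs on each partition to query the file system information, which includes:

struct statfs {
    long f_type;
    long f_bsize;
    long f_blocks;
    long f_bfree;
    long f_bavail;
    long f_files;
    long f_ffree;
    fsid_t f_fsid;
    long f_namelen;
};

From this, tsar can compute what it needs. The size of a partition in bytes = number of blocks * block size = f_blocks * f_bsize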
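The same calculation can be sketched in Python. The pure function below follows the blocks * block size rule from the text; treating bused as total minus f_bfree and util as bused / btotl is an assumption, since the text does not say which free-block counter tsar uses. The wrapper relies on os.statvfs, Python's counterpart to the C call, where block counts are expressed in units of f_frsize.

```python
import os


def partition_usage(block_size, f_blocks, f_bfree):
    """Bytes = blocks * block size; util is assumed to be used / total."""
    btotl = f_blocks * block_size
    bfree = f_bfree * block_size
    bused = btotl - bfree
    util = 100.0 * bused / btotl if btotl else 0.0
    return {"bfree": bfree, "bused": bused, "btotl": btotl, "util": util}


def usage_for_mount(path):
    """Query a mounted file system (Unix only)."""
    st = os.statvfs(path)
    # os.statvfs reports f_blocks and f_bfree in units of f_frsize.
    return partition_usage(st.f_frsize, st.f_blocks, st.f_bfree)


# Hypothetical file system: 1000 blocks of 4096 bytes, 250 of them free.
example = partition_usage(4096, 1000, 250)
print(example)
```

Calling usage_for_mount("/") on a Linux host would report the same four values for the root file system.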

pcsw field meanings

cswch: Number of context switches
proc: Number of newly created processes

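Collection method

The counters are in /proc/stat:

ctxt 19873315174
processes 296444211

These are the number of context switches and the number of processes created, respectively.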
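Since both counters are cumulative, a per-second view needs two samples. The sketch below assumes that tsar reports the difference divided by the sampling interval; that assumption, the function names and the second sample are all illustrative.

```python
def parse_pcsw(text):
    """Pick the ctxt and processes counters out of /proc/stat content."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("ctxt", "processes"):
            counters[parts[0]] = int(parts[1])
    return counters


def pcsw_rates(prev, curr, interval):
    """Context switches and new processes per second over the interval."""
    return {
        "cswch": (curr["ctxt"] - prev["ctxt"]) / interval,
        "proc": (curr["processes"] - prev["processes"]) / interval,
    }


prev = parse_pcsw("ctxt 19873315174\nprocesses 296444211")
# Hypothetical sample taken 5 seconds later.
curr = parse_pcsw("ctxt 19873320174\nprocesses 296444221")
rates = pcsw_rates(prev, curr, 5)
print(rates)
```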

tcpx field meanings

recvq sendq est twait fwait1 fwait2 lisq lising lisove cnest ndrop edrop rdrop pdrop kdrop
which stand for, respectively:
tcprecvq tcpsendq tcpest tcptimewait tcpfinwait1 tcpfinwait2 tcplistenq tcplistenincq tcplistenover tcpnconnest tcpnconndrop tcpembdrop tcprexmitdrop tcppersistdrop tcpkadrop

Collection method

The counters come from /proc/net/netstat and /proc/net/snmp. The data used includes:

TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSPassive PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPPrequeued TCPDirectCopyFromBacklog TCPDirectCopyFromPrequeue TCPPrequeueDropped TCPHPHits TCPHPHitsToUser TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLoss TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPForwardRetrans TCPSlowStartRetrans TCPTimeouts TCPRenoRecoveryFail TCPSackRecoveryFail TCPSchedulerFailed TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnSyn TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures
TcpExt: 0 0 0 80 539 0 0 0 0 0 3733709 51268 0 0 0 80 5583301 5966 104803 146887 146887 6500405 39465075 2562794034 0 689613557 2730596 540646233 234702206 0 44187 2066 94 240 0 114 293 1781 7221 60514 185158 2 2 3403 400 107505 5860 24813 174014 0 2966 7 168787 106151 40 32851 2 0 2180 9862 0 15999 0 0 0

tsar simply locates the fields it needs and reads their values.

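percpu ncpu field meanings

The fields are the same as in the cpu module, except that data is collected for each individual CPU.

Collection method

Same as the cpu module

pernic field meanings

The fields are the same as in the traffic module, except that data is collected for each individual network interface.

Collection method

Same as the traffic module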

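Application modules

proc field meanings

user: User-mode CPU usage of the process
sys: Kernel-mode CPU usage of the process
total: Total CPU usage of the process
mem: Memory usage of the process, as a percentage
RSS: Resident set size of the process: the part of its memory that is held in physical memory and has not been swapped out to disk. It includes code, data and stack.
read: Bytes read by the process through I/O
write: Bytes written by the process through I/O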

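Collection method

Counter files

/proc/pid/stat: CPU information for the process
/proc/pid/status: memory information for the process
/proc/pid/io: read and write I/O information for the process

Note: the name of the process to monitor must be configured in /etc/tsar/tsar.conf as mod_proc on procname. tsar then looks up the pid of procname and collects its data.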

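nginx field meanings

Accept: Total number of new connections accepted
Handle: Total number of connections handled
Reqs: Total number of requests
Active: Number of active connections, equal to read + write + wait
Read: Number of connections reading request data
Write: Number of connections writing response data to clients
Wait: Number of keep-alive connections waiting
Qps: Requests handled per second
Rt: Average response time in ms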

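Collection method

tsar collects this data by requesting a specific address that nginx exposes through its status module configuration. For details, see: https://github.com/taobao/tsar-mod_nginx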

location = /nginx_status { stub_status on; }

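The response looks like this:

Active connections: 1
server accepts handled requests request_time
 24 24 7 0
Reading: 0 Writing: 1 Waiting: 0

Make sure nginx is configured with this location and that curl http://localhost/nginx_status returns the data above.
If nginx does not listen on port 80, the port must be specified in the configuration file /etc/tsar/tsar.conf by changing mod_nginx on to mod_nginx on 8080

The nginx_code and nginx_domain modules work similarly. Their configuration is: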

req_status_zone server "$host" 20M;
req_status server;
location /traffic_status {
    req_status_show;
}

Requesting curl http://localhost/traffic_status returns data with the following fields:
localhost,0,0,2,2,2,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0

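The fields of the response are:

kv Value of the variable defined by the req_status_zone directive; here it is the domain
bytes_in_total Total traffic received from clients
bytes_out_total Total traffic sent to clients
conn_total Total number of connections handled
req_total Total number of requests handled
2xx Total number of 2xx requests
3xx Total number of 3xx requests
4xx Total number of 4xx requests
5xx Total number of 5xx requests
other Total number of other requests
rt_total Sum of response times (rt)
upstream_req Total number of requests that had to go to an upstream
upstream_rt Total rt spent on upstream access
upstream_tries Total number of upstream attempts
200 Total number of 200 requests
206 Total number of 206 requests
302 Total number of 302 requests
304 Total number of 304 requests
403 Total number of 403 requests
404 Total number of 404 requests
416 Total number of 416 requests
499 Total number of 499 requests
500 Total number of 500 requests
502 Total number of 502 requests
503 Total number of 503 requests
504 Total number of 504 requests
508 Total number of 508 requests
detail_other Total number of requests whose status code is not one of the 13 listed above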
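To make the field order concrete, the sketch below maps the sample line returned by /traffic_status onto the 28 field names listed above. The function name parse_traffic_status is hypothetical, and the code assumes one comma-separated record per line, as in the sample.

```python
FIELD_NAMES = [
    "kv", "bytes_in_total", "bytes_out_total", "conn_total", "req_total",
    "2xx", "3xx", "4xx", "5xx", "other",
    "rt_total", "upstream_req", "upstream_rt", "upstream_tries",
    "200", "206", "302", "304", "403", "404", "416", "499",
    "500", "502", "503", "504", "508", "detail_other",
]


def parse_traffic_status(line):
    """Map one comma-separated record onto the documented field names."""
    parts = line.strip().split(",")
    if len(parts) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, got %d" % (len(FIELD_NAMES), len(parts)))
    record = {"kv": parts[0]}
    record.update((name, int(value)) for name, value in zip(FIELD_NAMES[1:], parts[1:]))
    return record


SAMPLE = "localhost,0,0,2,2,2,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0"
record = parse_traffic_status(SAMPLE)
print(record["kv"], record["req_total"], record["2xx"], record["200"])
```

In the sample, the domain localhost has handled 2 connections and 2 requests, both answered with status 200.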

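If there are too many domains, or the port is not 80, a dedicated configuration is needed. The configuration file looks like this:
port=8080 # nginx port
top=10 # maximum number of domains to collect, ranked by total request count
domain=a.com b.com # list of specific domains to collect, separated by spaces, commas or tabs
Then specify the path of this configuration file in /etc/tsar/tsar.conf: mod_nginx_domain on /tmp/my.conf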

squid field meanings