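Note: this article is a translation of Chapter 5, "Useful SystemTap Scripts".
Note: this article is not yet complete. It was posted early purely to test the new table-of-contents feature, and this remark will be deleted once the article is finished.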
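This chapter lists several SystemTap scripts that you can use to monitor and investigate different subsystems. Once you install the systemtap-testsuite RPM package, all of these scripts are available under the /usr/share/systemtap/testsuite/systemtap.examples/ directory.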
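The following sections showcase scripts that trace network-related functions and build a profile of network activity.
This section describes how to profile network activity. nettop.stp gives a view of how much network traffic each process on a machine is generating.
nettop.stp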
    #! /usr/bin/env stap

    global ifxmit, ifrecv
    global ifmerged

    probe netdev.transmit
    {
      ifxmit[pid(), dev_name, execname(), uid()] <<< length
    }

    probe netdev.receive
    {
      ifrecv[pid(), dev_name, execname(), uid()] <<< length
    }

    function print_activity()
    {
      printf("%5s %5s %-7s %7s %7s %7s %7s %-15s\n",
             "PID", "UID", "DEV", "XMIT_PK", "RECV_PK",
             "XMIT_KB", "RECV_KB", "COMMAND")

      foreach ([pid, dev, exec, uid] in ifrecv) {
        ifmerged[pid, dev, exec, uid] += @count(ifrecv[pid,dev,exec,uid]);
      }
      foreach ([pid, dev, exec, uid] in ifxmit) {
        ifmerged[pid, dev, exec, uid] += @count(ifxmit[pid,dev,exec,uid]);
      }
      foreach ([pid, dev, exec, uid] in ifmerged-) {
        n_xmit = @count(ifxmit[pid, dev, exec, uid])
        n_recv = @count(ifrecv[pid, dev, exec, uid])
        printf("%5d %5d %-7s %7d %7d %7d %7d %-15s\n",
               pid, uid, dev, n_xmit, n_recv,
               n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0,
               n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0,
               exec)
      }

      print("\n")

      delete ifxmit
      delete ifrecv
      delete ifmerged
    }

    probe timer.ms(5000), end, error
    {
      print_activity()
    }
Note that function print_activity() uses the following expressions:
    n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0
    n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0
These expressions are if/else conditionals. The second one is a more concise way of writing the following pseudocode:
    if n_recv != 0 then
      @sum(ifrecv[pid, dev, exec, uid])/1024
    else
      0
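The same merge, sort, and guarded conversion can be sketched outside SystemTap. The following Python version is only an illustration: the function name summarize and the dictionary-of-lists inputs are assumptions made for this example, standing in for the ifxmit and ifrecv aggregates.

```python
from collections import defaultdict

def summarize(xmit, recv):
    """xmit/recv map (pid, dev, exec, uid) -> list of packet lengths in bytes."""
    merged = defaultdict(int)
    for key, lengths in recv.items():
        merged[key] += len(lengths)
    for key, lengths in xmit.items():
        merged[key] += len(lengths)

    rows = []
    # Busiest key first, like `foreach (... in ifmerged-)`.
    for key in sorted(merged, key=merged.get, reverse=True):
        n_xmit = len(xmit.get(key, []))
        n_recv = len(recv.get(key, []))
        # Conditional expression: only read the sum when packets were seen.
        xmit_kb = sum(xmit[key]) // 1024 if n_xmit else 0
        recv_kb = sum(recv[key]) // 1024 if n_recv else 0
        rows.append((key, n_xmit, n_recv, xmit_kb, recv_kb))
    return rows

# Hypothetical sample data, not taken from a real trace.
demo_xmit = {(1, "eth0", "a", 0): [1024, 2048]}
demo_recv = {(2, "lo", "b", 0): [512]}
for row in summarize(demo_xmit, demo_recv):
    print(row)
```

The guard matters because a key may appear in only one of the two tables; without it the sketch would look up a key that was never recorded.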
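nettop.stp tracks which processes are generating network traffic on the system, and provides the following information about each process (these are the columns printed by print_activity()):
    PID: the ID of the listed process.
    UID: the user ID of the process owner.
    DEV: the network device the process used to send or receive data (for example, eth0 or lo).
    XMIT_PK: the number of packets transmitted by the process.
    RECV_PK: the number of packets received by the process.
    XMIT_KB: the amount of data sent by the process, in kilobytes.
    RECV_KB: the amount of data received by the process, in kilobytes.
    COMMAND: the name of the executable.
nettop.stp provides network profile sampling every 5 seconds. You can change this interval by editing probe timer.ms(5000). Example 5.1, "nettop.stp Sample Output" contains an excerpt of the output from nettop.stp over a 20-second period.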
Example 5.1. nettop.stp Sample Output
    [...]
      PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
        0     0 eth0          0       5       0       0 swapper
    11178     0 eth0          2       0       0       0 synergyc

      PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
     2886     4 eth0         79       0       5       0 cups-polld
    11362     0 eth0          0      61       0       5 firefox
        0     0 eth0          3      32       0       3 swapper
     2886     4 lo            4       4       0       0 cups-polld
    11178     0 eth0          3       0       0       0 synergyc

      PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
        0     0 eth0          0       6       0       0 swapper
     2886     4 lo            2       2       0       0 cups-polld
    11178     0 eth0          3       0       0       0 synergyc
     3611     0 eth0          0       1       0       0 Xorg

      PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
        0     0 eth0          3      42       0       2 swapper
    11178     0 eth0         43       1       3       0 synergyc
    11362     0 eth0          0       7       0       0 firefox
     3897     0 eth0          0       1       0       0 multiload-apple
    [...]
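This section describes how to trace functions called from the kernel's net/socket.c file. This task helps you identify, in finer detail, how each process interacts with the network at the kernel level.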
socket-trace.stp
    #! /usr/bin/env stap

    probe kernel.function("*@net/socket.c").call {
      printf ("%s -> %s\n", thread_indent(1), ppfunc())
    }
    probe kernel.function("*@net/socket.c").return {
      printf ("%s <- %s\n", thread_indent(-1), ppfunc())
    }
socket-trace.stp is identical to Example 3.6, "thread_indent.stp", which was used earlier in SystemTap Functions to illustrate how thread_indent() works.
Example 5.2. socket-trace.stp Sample Output
    [...]
    0 Xorg(3611): -> sock_poll
    3 Xorg(3611): <- sock_poll
    0 Xorg(3611): -> sock_poll
    3 Xorg(3611): <- sock_poll
    0 gnome-terminal(11106): -> sock_poll
    5 gnome-terminal(11106): <- sock_poll
    0 scim-bridge(3883): -> sock_poll
    3 scim-bridge(3883): <- sock_poll
    0 scim-bridge(3883): -> sys_socketcall
    4 scim-bridge(3883): -> sys_recv
    8 scim-bridge(3883): -> sys_recvfrom
    12 scim-bridge(3883):-> sock_from_file
    16 scim-bridge(3883):<- sock_from_file
    20 scim-bridge(3883):-> sock_recvmsg
    24 scim-bridge(3883):<- sock_recvmsg
    28 scim-bridge(3883): <- sys_recvfrom
    31 scim-bridge(3883): <- sys_recv
    35 scim-bridge(3883): <- sys_socketcall
    [...]
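Example 5.2, "socket-trace.stp Sample Output" contains a 3-second excerpt of the output from socket-trace.stp. For more information about the output that thread_indent() produces for this script, refer to Example 3.6, "thread_indent.stp" in SystemTap Functions.
This section illustrates how to monitor incoming TCP connections. This task is useful for identifying unauthorized, suspicious, or otherwise unwanted network access requests in real time.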
tcp_connections.stp
    #! /usr/bin/env stap

    probe begin {
      printf("%6s %16s %6s %6s %16s\n",
             "UID", "CMD", "PID", "PORT", "IP_SOURCE")
    }

    probe kernel.function("tcp_accept").return?,
          kernel.function("inet_csk_accept").return? {
      sock = $return
      if (sock != 0)
        printf("%6d %16s %6d %6d %16s\n", uid(), execname(), pid(),
               inet_get_local_port(sock), inet_get_ip_source(sock))
    }
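While tcp_connections.stp is running, it prints the following information about each incoming TCP connection accepted by the system, in real time (these are the columns in the header the script prints):
    UID: the current user ID.
    CMD: the command accepting the connection.
    PID: the process ID of that command.
    PORT: the local port used by the connection.
    IP_SOURCE: the IP address from which the TCP connection originated.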
Example 5.3. tcp_connections.stp Sample Output
       UID              CMD    PID   PORT        IP_SOURCE
         0             sshd   3165     22      10.64.0.227
         0             sshd   3165     22      10.64.0.227
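This section illustrates how to monitor TCP packets received by the system. This is useful for analyzing the network traffic generated by applications running on the system.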
tcpdumplike.stp
    #! /usr/bin/env stap

    // A TCP dump like example

    probe begin, timer.s(1) {
      printf("-----------------------------------------------------------------\n")
      printf(" Source IP Dest IP SPort DPort U A P R S F \n")
      printf("-----------------------------------------------------------------\n")
    }

    probe udp.recvmsg /* ,udp.sendmsg */ {
      printf(" %15s %15s %5d %5d UDP\n",
             saddr, daddr, sport, dport)
    }

    probe tcp.receive {
      printf(" %15s %15s %5d %5d %d %d %d %d %d %d\n",
             saddr, daddr, sport, dport, urg, ack, psh, rst, syn, fin)
    }
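While tcpdumplike.stp is running, it prints the following information about each received TCP packet, in real time:
    The source and destination IP addresses (saddr and daddr, respectively).
    The source and destination ports (sport and dport, respectively).
    The packet flags.
To determine the flags used by a packet, tcpdumplike.stp reads the following values in its tcp.receive probe:
    urg: urgent
    ack: acknowledgement
    psh: push
    rst: reset
    syn: synchronize
    fin: finished
Each of these values is 1 or 0, indicating whether the packet uses the corresponding flag.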
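To make the six 0/1 columns concrete, here is a small Python sketch that derives them from a raw TCP flag byte. The bit masks are the standard TCP header flag positions; the names FLAG_BITS and flag_columns are invented for this illustration and are not part of any SystemTap tapset.

```python
# Standard bit masks for the six classic TCP header flags.
FLAG_BITS = [("U", 0x20), ("A", 0x10), ("P", 0x08),
             ("R", 0x04), ("S", 0x02), ("F", 0x01)]

def flag_columns(flag_byte):
    """Return the six 0/1 values in U A P R S F column order."""
    return [1 if flag_byte & mask else 0 for _, mask in FLAG_BITS]

print(flag_columns(0x18))  # ACK + PSH, a typical data segment
print(flag_columns(0x12))  # SYN + ACK, the reply to a connection request
```

A row reading 0 1 1 0 0 0 therefore describes a packet with ACK and PSH set, and 0 1 0 0 1 0 describes a SYN/ACK.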
Example 5.4. tcpdumplike.stp Sample Output
    -----------------------------------------------------------------
           Source IP         Dest IP SPort DPort U A P R S F
    -----------------------------------------------------------------
      209.85.229.147       10.0.2.15    80 20373 0 1 1 0 0 0
      92.122.126.240       10.0.2.15    80 53214 0 1 0 0 1 0
      92.122.126.240       10.0.2.15    80 53214 0 1 0 0 0 0
      209.85.229.118       10.0.2.15    80 63433 0 1 0 0 1 0
      209.85.229.118       10.0.2.15    80 63433 0 1 0 0 0 0
      209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
      209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
      209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
      209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
      209.85.229.147       10.0.2.15    80 21141 0 1 1 0 0 0
      209.85.229.118       10.0.2.15    80 63433 0 1 1 0 0 0
    [...]
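The Linux network stack can discard packets for various reasons. Some Linux kernels include a tracepoint, kernel.trace("kfree_skb"), which makes it easy to track where packets are discarded. dropwatch.stp uses kernel.trace("kfree_skb") to trace packet discards; the script summarizes the locations that discarded packets in every 5-second interval.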
dropwatch.stp
    #! /usr/bin/env stap

    ############################################################
    # Dropwatch.stp
    # Author: Neil Horman <nhorman@redhat.com>
    # An example script to mimic the behavior of the dropwatch utility
    # http://fedorahosted.org/dropwatch
    ############################################################

    # Array to hold the list of drop points we find
    global locations

    # Note when we turn the monitor on and off
    probe begin { printf("Monitoring for dropped packets\n") }
    probe end { printf("Stopping dropped packet monitor\n") }

    # increment a drop counter for every location we drop at
    probe kernel.trace("kfree_skb") { locations[$location] <<< 1 }

    # Every 5 seconds report our drop locations
    probe timer.sec(5)
    {
      printf("\n")
      foreach (l in locations-) {
        printf("%d packets dropped at %s\n",
               @count(locations[l]), symname(l))
      }
      delete locations
    }
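kernel.trace("kfree_skb") traces the places in the kernel where network packets are dropped. It has two arguments: a pointer to the buffer being freed ($skb) and the location in kernel code where the buffer is being freed ($location). Where possible, the dropwatch.stp script reports the function that contains $location. The information needed to map $location back to a function is not included in the instrumentation by default. On SystemTap 1.4, the --all-modules option includes the required mapping information, and the following command can be used to run the script: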
stap --all-modules dropwatch.stp
On older versions of SystemTap, you can use the following command to emulate the --all-modules option:
    stap -dkernel \
    `cat /proc/modules | awk 'BEGIN { ORS = " " } {print "-d"$1}'` \
    dropwatch.stp
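The awk fragment in that command simply turns the first field of every /proc/modules line into a -d option. A rough Python equivalent is sketched below; the function name module_flags and the two module lines in the sample are hypothetical, chosen only to show the shape of the result.

```python
def module_flags(proc_modules_text):
    """Build the -d options for every module named in /proc/modules text."""
    flags = ["-dkernel"]
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            flags.append("-d" + fields[0])  # first field is the module name
    return flags

# Hypothetical excerpt; real input would be read from /proc/modules.
sample = "tun 12345 0 - Live 0x0\nnf_nat 23456 1 - Live 0x0\n"
command = ["stap"] + module_flags(sample) + ["dropwatch.stp"]
print(" ".join(command))
```

On a real system the text would come from open("/proc/modules").read(), and the resulting list could be passed to subprocess.run().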
Running the dropwatch.stp script for 15 seconds produces output similar to Example 5.5, "dropwatch.stp Sample Output".
Example 5.5. dropwatch.stp Sample Output
    Monitoring for dropped packets

    1762 packets dropped at unix_stream_recvmsg
    4 packets dropped at tun_do_read
    2 packets dropped at nf_hook_slow

    467 packets dropped at unix_stream_recvmsg
    20 packets dropped at nf_hook_slow
    6 packets dropped at tun_do_read

    446 packets dropped at unix_stream_recvmsg
    4 packets dropped at tun_do_read
    4 packets dropped at nf_hook_slow
    Stopping dropped packet monitor
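When the script is compiled on one machine and run on another, the --all-modules option and the /proc/modules file are not available, so the symname function prints the raw address instead. To make the raw addresses of packet drops more meaningful, refer to the /boot/System.map-`uname -r` file. This file lists the starting address of each function, which lets you map the addresses in the output of Example 5.5, "dropwatch.stp Sample Output" to a specific function name. Given the following snippet of the /boot/System.map-`uname -r` file, the address 0xffffffff8149a8ed maps to the function unix_stream_recvmsg, because it falls between that function's starting address and the start of the next symbol:
    [...]
    ffffffff8149a420 t unix_dgram_poll
    ffffffff8149a5e0 t unix_stream_recvmsg
    ffffffff8149ad00 t unix_find_other
    [...]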
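The manual lookup amounts to finding the last symbol whose starting address is not greater than the raw address. A minimal Python sketch of that search, using only the three symbols from the snippet (the helper names parse_system_map and lookup are made up for this example):

```python
import bisect

def parse_system_map(text):
    """Return sorted (address, name) pairs from System.map-style lines."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and "[...]" elision markers
        addr, _type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def lookup(symbols, address):
    """Name of the last symbol starting at or below address, or None."""
    starts = [addr for addr, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    return symbols[i][1] if i >= 0 else None

SNIPPET = """\
[...]
ffffffff8149a420 t unix_dgram_poll
ffffffff8149a5e0 t unix_stream_recvmsg
ffffffff8149ad00 t unix_find_other
[...]
"""
symbols = parse_system_map(SNIPPET)
print(lookup(symbols, 0xffffffff8149a8ed))
```

With a full System.map file, an address beyond the last listed symbol would still resolve to that last symbol, so results near the end of the table should be treated with care.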
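The following sections showcase scripts that monitor disk and I/O activity.
This section describes how to identify which processes are performing the heaviest disk reads and writes.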
disktop.stp
    #!/usr/bin/env stap
    #
    # Copyright (C) 2007 Oracle Corp.
    #
    # Get the status of reading/writing disk every 5 seconds,
    # output top ten entries
    #
    # This is free software,GNU General Public License (GPL);
    # either version 2, or (at your option) any later version.
    #
    # Usage:
    #  ./disktop.stp
    #
    global io_stat,device
    global read_bytes,write_bytes

    probe vfs.read.return {
      if ($return>0) {
        if (devname!="N/A") {/*skip read from cache*/
          io_stat[pid(),execname(),uid(),ppid(),"R"] += $return
          device[pid(),execname(),uid(),ppid(),"R"] = devname
          read_bytes += $return
        }
      }
    }

    probe vfs.write.return {
      if ($return>0) {
        if (devname!="N/A") { /*skip update cache*/
          io_stat[pid(),execname(),uid(),ppid(),"W"] += $return
          device[pid(),execname(),uid(),ppid(),"W"] = devname
          write_bytes += $return
        }
      }
    }

    probe timer.ms(5000) {
      /* skip non-read/write disk */
      if (read_bytes+write_bytes) {

        printf("\n%-25s, %-8s%4dKb/sec, %-7s%6dKb, %-7s%6dKb\n\n",
               ctime(gettimeofday_s()),
               "Average:", ((read_bytes+write_bytes)/1024)/5,
               "Read:",read_bytes/1024,
               "Write:",write_bytes/1024)

        /* print header */
        printf("%8s %8s %8s %25s %8s %4s %12s\n",
               "UID","PID","PPID","CMD","DEVICE","T","BYTES")
      }
      /* print top ten I/O */
      foreach ([process,cmd,userid,parent,action] in io_stat- limit 10)
        printf("%8d %8d %8d %25s %8s %4s %12d\n",
               userid,process,parent,cmd,
               device[process,cmd,userid,parent,action],
               action,io_stat[process,cmd,userid,parent,action])

      /* clear data */
      delete io_stat
      delete device
      read_bytes = 0
      write_bytes = 0
    }

    probe end{
      delete io_stat
      delete device
      delete read_bytes
      delete write_bytes
    }
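disktop.stp outputs the ten processes responsible for the heaviest reads and writes to disk. Example 5.6, "disktop.stp Sample Output" displays sample output from this script, which includes the following data for each listed process:
    UID: the user ID. A user ID of 0 refers to the root user.
    PID: the ID of the listed process.
    PPID: the process ID of the listed process's parent.
    CMD: the name of the listed process.
    DEVICE: the storage device the listed process is reading from or writing to.
    T: the type of action performed by the listed process; W refers to write, R refers to read.
    BYTES: the amount of data read from or written to disk.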
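The time and date in the output of disktop.stp come from the functions ctime() and gettimeofday_s(). gettimeofday_s() returns the number of seconds that have passed since the Unix epoch (January 1, 1970), and ctime() converts that count into calendar time. Together they give a fairly accurate human-readable timestamp for the output.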
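The relationship between the two functions can be imitated in Python. This is only an analogy: it assumes that the SystemTap ctime() renders the time in GMT, as the tapset reference describes, and the helper name ctime_gmt is invented here.

```python
import time

def ctime_gmt(seconds):
    """Format seconds since the Unix epoch as a GMT calendar string."""
    return time.asctime(time.gmtime(seconds))

print(ctime_gmt(0))                 # the epoch itself
print(ctime_gmt(int(time.time())))  # "now", like ctime(gettimeofday_s())
```

The second call plays the role of ctime(gettimeofday_s()) in the script: a seconds counter goes in and a readable date comes out.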
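In this script, $return is a local variable that stores the actual number of bytes each process reads from or writes to the virtual file system. $return can only be used in return probes (for example, vfs.read.return).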
Example 5.6. disktop.stp Sample Output
    [...]
    Mon Sep 29 03:38:28 2008 , Average:  19Kb/sec, Read:       7Kb, Write:     89Kb

         UID      PID     PPID                       CMD   DEVICE    T        BYTES
           0    26319    26294                   firefox     sda5    W        90229
           0     2758     2757           pam_timestamp_c     sda5    R         8064
           0     2885        1                     cupsd     sda5    W         1678

    Mon Sep 29 03:38:38 2008 , Average:   1Kb/sec, Read:       7Kb, Write:      1Kb

         UID      PID     PPID                       CMD   DEVICE    T        BYTES
           0     2758     2757           pam_timestamp_c     sda5    R         8064
           0     2885        1                     cupsd     sda5    W         1678
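This section describes how to monitor the amount of time each process takes to read from or write to any file. This is useful for determining which files are slow to load on a given system.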
iotime.stp
    #! /usr/bin/env stap

    /*
     * Copyright (C) 2006-2007 Red Hat Inc.
     *
     * This copyrighted material is made available to anyone wishing to use,
     * modify, copy, or redistribute it subject to the terms and conditions
     * of the GNU General Public License v.2.
     *
     * You should have received a copy of the GNU General Public License
     * along with this program. If not, see <http://www.gnu.org/licenses/>.
     *
     * Print out the amount of time spent in the read and write systemcall
     * when each file opened by the process is closed. Note that the systemtap
     * script needs to be running before the open operations occur for
     * the script to record data.
     *
     * This script could be used to find out which files are slow to load
     * on a machine. e.g.
     *
     * stap iotime.stp -c 'firefox'
     *
     * Output format is:
     * timestamp pid (executable) info_type path ...
     *
     * 200283135 2573 (cupsd) access /etc/printcap read: 0 write: 7063
     * 200283143 2573 (cupsd) iotime /etc/printcap time: 69
     *
     */

    global start
    global time_io

    function timestamp:long() { return gettimeofday_us() - start }

    function proc:string() { return sprintf("%d (%s)", pid(), execname()) }

    probe begin { start = gettimeofday_us() }

    global filehandles, fileread, filewrite

    probe syscall.open.return {
      filename = user_string($filename)
      if ($return != -1) {
        filehandles[pid(), $return] = filename
      } else {
        printf("%d %s access %s fail\n", timestamp(), proc(), filename)
      }
    }

    probe syscall.read.return {
      p = pid()
      fd = $fd
      bytes = $return
      time = gettimeofday_us() - @entry(gettimeofday_us())
      if (bytes > 0)
        fileread[p, fd] += bytes
      time_io[p, fd] <<< time
    }

    probe syscall.write.return {
      p = pid()
      fd = $fd
      bytes = $return
      time = gettimeofday_us() - @entry(gettimeofday_us())
      if (bytes > 0)
        filewrite[p, fd] += bytes
      time_io[p, fd] <<< time
    }

    probe syscall.close {
      if ([pid(), $fd] in filehandles) {
        printf("%d %s access %s read: %d write: %d\n",
               timestamp(), proc(), filehandles[pid(), $fd],
               fileread[pid(), $fd], filewrite[pid(), $fd])
        if (@count(time_io[pid(), $fd]))
          printf("%d %s iotime %s time: %d\n", timestamp(), proc(),
                 filehandles[pid(), $fd], @sum(time_io[pid(), $fd]))
      }
      delete fileread[pid(), $fd]
      delete filewrite[pid(), $fd]
      delete filehandles[pid(), $fd]
      delete time_io[pid(),$fd]
    }
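iotime.stp tracks each time a system call opens, closes, reads from, or writes to a file. For each file that a system call accesses, iotime.stp counts the number of microseconds the reads and writes take to finish and tracks the amount of data (in bytes) read from or written to the file.
iotime.stp can also use the local variable $count to track the amount of data (in bytes) that a system call attempts to read or write, although the listing above records only $return. Note that $return (as used in disktop.stp from Section 5.2.1, "Summarizing Disk Read/Write Traffic") stores the actual amount of data read or written. $count can only be used in probes that track data reads or writes (for example, syscall.read and syscall.write).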
Example 5.7. iotime.stp Sample Output
    [...]
    825946 3364 (NetworkManager) access /sys/class/net/eth0/carrier read: 8190 write: 0
    825955 3364 (NetworkManager) iotime /sys/class/net/eth0/carrier time: 9
    [...]
    117061 2460 (pcscd) access /dev/bus/usb/003/001 read: 43 write: 0
    117065 2460 (pcscd) iotime /dev/bus/usb/003/001 time: 7
    [...]
    3973737 2886 (sendmail) access /proc/loadavg read: 4096 write: 0
    3973744 2886 (sendmail) iotime /proc/loadavg time: 11
    [...]
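Example 5.7, "iotime.stp Sample Output" prints the following data on each line:
    A timestamp, in microseconds since the script started.
    The process ID and process name.
    An access or iotime flag.
    The file accessed.
If a process was able to read or write any data, a pair of access and iotime lines appears together. The timestamp on the access line refers to the time at which the given process accessed the file; the end of that line shows the number of bytes read and written. The iotime line shows the amount of time, in microseconds, that the process spent performing the reads or writes.
If an access line is not followed by an iotime line, the process did not read or write any data.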
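For post-processing, the pairing rule described above can be applied mechanically. The Python sketch below joins each access line with the iotime line that follows it; the regular expression and the record layout are assumptions based only on the output format shown in the sample, and paths containing spaces are not handled.

```python
import re

LINE = re.compile(r"^(\d+) (\d+) \((.+?)\) (access|iotime) (\S+) (.*)$")

def pair_records(lines):
    """Join each access line with the iotime line that follows it, if any."""
    records = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # "[...]" markers and unrelated text
        _ts, pid, name, kind, path, rest = m.groups()
        if kind == "access" and "read:" in rest:
            nums = re.findall(r"\d+", rest)
            records.append({"pid": int(pid), "cmd": name, "path": path,
                            "read": int(nums[0]), "write": int(nums[1]),
                            "time_us": None})  # stays None without iotime
        elif kind == "iotime" and records:
            last = records[-1]
            if last["pid"] == int(pid) and last["path"] == path:
                last["time_us"] = int(rest.split()[-1])
    return records

sample = [
    "825946 3364 (NetworkManager) access /sys/class/net/eth0/carrier read: 8190 write: 0",
    "825955 3364 (NetworkManager) iotime /sys/class/net/eth0/carrier time: 9",
    "[...]",
    "117061 2460 (pcscd) access /dev/bus/usb/003/001 read: 43 write: 0",
]
for rec in pair_records(sample):
    print(rec)
```

A record whose time_us is still None corresponds to an access line with no iotime line after it.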
This section describes how to track the cumulative amount of I/O on the system.
traceio.stp
    #! /usr/bin/env stap
    # traceio.stp
    # Copyright (C) 2007 Red Hat, Inc., Eugene Teo <eteo@redhat.com>
    # Copyright (C) 2009 Kai Meyer <kai@unixlords.com>
    #   Fixed a bug that allows this to run longer
    #   And added the humanreadable function
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License version 2 as
    # published by the Free Software Foundation.
    #

    global reads, writes, total_io

    probe vfs.read.return {
      if ($return > 0) {
        reads[pid(),execname()] += $return
        total_io[pid(),execname()] += $return
      }
    }

    probe vfs.write.return {
      if ($return > 0) {
        writes[pid(),execname()] += $return
        total_io[pid(),execname()] += $return
      }
    }

    function humanreadable(bytes) {
      if (bytes > 1024*1024*1024) {
        return sprintf("%d GiB", bytes/1024/1024/1024)
      } else if (bytes > 1024*1024) {
        return sprintf("%d MiB", bytes/1024/1024)
      } else if (bytes > 1024) {
        return sprintf("%d KiB", bytes/1024)
      } else {
        return sprintf("%d B", bytes)
      }
    }

    probe timer.s(1) {
      foreach([p,e] in total_io- limit 10)
        printf("%8d %15s r: %12s w: %12s\n",
               p, e, humanreadable(reads[p,e]),
               humanreadable(writes[p,e]))
      printf("\n")
      # Note we don't zero out reads, writes and total_io,
      # so the values are cumulative since the script started.
    }
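traceio.stp prints the top ten executables generating I/O traffic over time. It also tracks the cumulative amount of I/O reads and writes done by those ten executables. This information is tracked and printed at 1-second intervals, in descending order.
Note that traceio.stp also uses the local variable $return, which is also used by disktop.stp from Section 5.2.1, "Summarizing Disk Read/Write Traffic".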
Example 5.8. traceio.stp Sample Output
    [...]
               Xorg r:   583401 KiB w:        0 KiB
           floaters r:       96 KiB w:     7130 KiB
    multiload-apple r:      538 KiB w:      537 KiB
               sshd r:       71 KiB w:       72 KiB
    pam_timestamp_c r:      138 KiB w:        0 KiB
            staprun r:       51 KiB w:       51 KiB
              snmpd r:       46 KiB w:        0 KiB
              pcscd r:       28 KiB w:        0 KiB
         irqbalance r:       27 KiB w:        4 KiB
              cupsd r:        4 KiB w:       18 KiB

               Xorg r:   588140 KiB w:        0 KiB
           floaters r:       97 KiB w:     7143 KiB
    multiload-apple r:      543 KiB w:      542 KiB
               sshd r:       72 KiB w:       72 KiB
    pam_timestamp_c r:      138 KiB w:        0 KiB
            staprun r:       51 KiB w:       51 KiB
              snmpd r:       46 KiB w:        0 KiB
              pcscd r:       28 KiB w:        0 KiB
         irqbalance r:       27 KiB w:        4 KiB
              cupsd r:        4 KiB w:       18 KiB
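The bookkeeping behind traceio.stp (three counters that are never reset, ranked by total traffic) can be modelled in a few lines of Python. This is a sketch for illustration only: the function names record and top, and the sample events, are invented here rather than taken from the script.

```python
from collections import Counter

reads, writes, total_io = Counter(), Counter(), Counter()

def record(pid, name, nbytes, is_write):
    """Accumulate one successful read or write; counters are never reset."""
    if nbytes <= 0:
        return  # mirrors the `$return > 0` check in the probes
    key = (pid, name)
    (writes if is_write else reads)[key] += nbytes
    total_io[key] += nbytes

def top(n=10):
    """Return up to n (pid, name, read_bytes, write_bytes), busiest first."""
    return [(p, e, reads[(p, e)], writes[(p, e)])
            for (p, e), _ in total_io.most_common(n)]

# Hypothetical events, not taken from a real trace.
record(1, "a", 100, False)
record(2, "b", 300, True)
record(1, "a", 50, True)
record(3, "c", -1, False)   # failed call, ignored
print(top())
```

Because nothing is cleared between reports, each printed value grows monotonically, which is why consecutive blocks in the sample output differ only slightly.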
This section describes how to monitor I/O activity on a specific device.
traceio2.stp
    #! /usr/bin/env stap

    global device_of_interest

    probe begin {
      /* The following is not the most efficient way to do this.
         One could directly put the result of usrdev2kerndev()
         into device_of_interest. However, want to test out
         the other device functions */
      dev = usrdev2kerndev($1)
      device_of_interest = MKDEV(MAJOR(dev), MINOR(dev))
    }

    probe vfs.write, vfs.read
    {
      if (dev == device_of_interest)
        printf ("%s(%d) %s 0x%x\n",
                execname(), pid(), ppfunc(), dev)
    }
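traceio2.stp takes one argument: the whole device number. To get this number, use stat -c "0x%D" directory, where directory is located on the device you wish to monitor.
The usrdev2kerndev() function converts the whole device number into the format understood by the kernel. The output produced by usrdev2kerndev() is used in conjunction with the MKDEV(), MINOR(), and MAJOR() functions to determine the major and minor numbers of a specific device.
The output of traceio2.stp includes the name and ID of any process performing a read or write, the function it is executing (vfs_read or vfs_write), and the kernel device number.
The following example is an excerpt from the full output of stap traceio2.stp 0x805, where 0x805 is the whole device number of /home. /home resides on /dev/sda5, which is the device we wish to monitor.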
Example 5.9. traceio2.stp Sample Output
    [...]
    synergyc(3722) vfs_read 0x800005
    synergyc(3722) vfs_read 0x800005
    cupsd(2889) vfs_write 0x800005
    cupsd(2889) vfs_write 0x800005
    cupsd(2889) vfs_write 0x800005
    [...]
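The step from the 0x805 given on the command line to the 0x800005 shown in the output follows from how the two encodings pack the major and minor numbers. The Python sketch below assumes the usual Linux conventions (a 20-bit minor field inside the kernel, and the userspace layout that stat reports); MINORBITS, mkdev, and user_to_kernel are names chosen for this illustration, not tapset functions.

```python
MINORBITS = 20  # assumption: width of the minor field in the kernel's dev_t

def mkdev(major, minor):
    """Combine major and minor in the kernel-internal layout."""
    return (major << MINORBITS) | minor

def user_to_kernel(dev):
    """Decode a userspace device number and re-encode it for the kernel."""
    major = (dev & 0xfff00) >> 8
    minor = (dev & 0xff) | ((dev >> 12) & 0xfff00)
    return mkdev(major, minor)

print(hex(user_to_kernel(0x805)))
```

For 0x805 the major number is 8 and the minor number is 5, so the kernel-side value is (8 << 20) | 5, which is the 0x800005 seen in every line of the sample output.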
This section describes how to monitor reads from and writes to a file in real time.
inodewatch.stp
    #! /usr/bin/env stap

    probe vfs.write, vfs.read
    {
      # dev and ino are defined by vfs.write and vfs.read
      if (dev == MKDEV($1,$2) # major/minor device
          && ino == $3)
        printf ("%s(%d) %s 0x%x/%u\n",
          execname(), pid(), ppfunc(), dev, ino)
    }
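The script compares against three positional arguments: $1 and $2 are the major and minor device numbers and $3 is the inode number. One way to obtain those three values for a given file is sketched below in Python; the helper name inodewatch_args is hypothetical and the path "/" is only a placeholder for the file you want to watch.

```python
import os

def inodewatch_args(path):
    """Return (major, minor, inode) for the file at path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino

# "/" is a placeholder; substitute the file to be monitored.
major, minor, inode = inodewatch_args("/")
print("stap inodewatch.stp %d %d %d" % (major, minor, inode))
```

The printed line shows the three numbers in the order the script expects them.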