A talk at GopherCon 2017 showed how to implement a simple strace in Go. This article is a write-up based on that talk.
First, here is Wikipedia's definition of a system call:
In computing, a system call is the programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on. This may include hardware-related services (for example, accessing a hard disk drive), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.
Any program that runs on an operating system inevitably deals with system calls. Take the most common example, fmt.Println("hello world"): it uses the write system call. Let's look at the source:
// fmt: Println(a...) simply calls Fprintln(os.Stdout, a...)
func Fprintln(w io.Writer, a ...interface{}) (n int, err error) {
	p := newPrinter()
	p.doPrintln(a)
	// for Println, the writer is stdout
	n, err = w.Write(p.buf)
	p.free()
	return
}

// os
Stdout = NewFile(uintptr(syscall.Stdout), "/dev/stdout")

// The actual write method just calls syscall.Write().
// (Simplified; newer Go versions route this through internal/poll.)
func (f *File) write(b []byte) (n int, err error) {
	if len(b) == 0 {
		return 0, nil
	}
	return fixCount(syscall.Write(f.fd, b))
}
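Another example is zero-copy, a term we hear often. Let's look at what problem zero-copy solves. Consider sending a file over a socket the traditional way: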
read(file, tmp_buf, len);
write(socket, tmp_buf, len);
A borrowed diagram illustrates the problem:
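1. read() causes a context switch from user mode to kernel mode. The DMA (direct memory access) engine reads the file contents from disk and stores them in a kernel-space buffer.
2. The data is copied from the kernel buffer into the user buffer (tmp_buf), and read() returns, switching back to user mode.
3. write() causes another context switch, and the data is copied from the user buffer into a kernel-space buffer again (the socket buffer).
4. write() returns, the fourth context switch. The DMA engine then passes the data from the kernel buffer to the protocol engine; typically it is queued to wait for transmission.

The data is copied back and forth between user space and kernel space even though the program never looks at it, and those copies are unnecessary.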
The usual solutions are mmap and sendfile; see this article for details.
By now we should have a basic understanding of system calls.
strace is a tool for viewing the system calls a process makes. Typical usage:
strace <bin>
strace -p <pid>
# count calls, errors and time for each system call
strace -c <bin>
# for example:
strace -c echo hello
hello
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         3           open
  0.00    0.000000           0         5           close
  0.00    0.000000           0         4           fstat
  0.00    0.000000           0         7           mmap
  0.00    0.000000           0         4           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         3         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    34         3 total
strace is built on the ptrace system call, so let's look at what ptrace is.
The man page describes it as follows:
The ptrace() system call provides a means by which one process (the "tracer") may observe and control the execution of another process (the "tracee"), and examine and change the tracee's memory and registers. It is primarily used to implement breakpoint debugging and system call tracing.
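In short, it gives the tracer three main capabilities: controlling the tracee's execution (for example, stopping it at each system call), reading and modifying its memory, and reading and modifying its registers. The prototype is: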
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);

Common requests include PTRACE_ATTACH, PTRACE_SYSCALL, PTRACE_PEEKTEXT and PTRACE_PEEKDATA.
The tracer issues a PTRACE_ATTACH request with the PID of the process to trace, waits for that process to stop, and then calls PTRACE_SYSCALL.
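The tracee keeps running until it reaches a system call, at which point the kernel stops it. The tracer learns through waitpid that the tracee has stopped with SIGTRAP, and can now print information from the tracee's memory and registers.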
Next, the tracer calls PTRACE_SYSCALL again, and the tracee continues until it exits the current system call, where it stops once more.
Note that the tracer is notified both when the tracee enters a system call and when it exits it, so every system call produces two stops.
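The steps above can be sketched in Go for the attach case, which is what strace -p does (the program below starts its own child instead). This is a minimal sketch, not code from the talk: traceSyscalls is a hypothetical helper; it assumes linux/amd64 (Orig_rax holds the syscall number, Rax the return value at the exit stop), follows only the single thread whose ID is pid, and assumes the first stop after attaching is a syscall entry. Attaching to a process you did not start may require root or kernel.yama.ptrace_scope=0.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"syscall"
)

// traceSyscalls (hypothetical helper) attaches to a running process and
// prints every syscall number together with its return value.
func traceSyscalls(pid int) error {
	// The tracer is a thread, not a process: all ptrace requests must come
	// from the thread that attached, so pin this goroutine to it.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// PTRACE_ATTACH: the kernel sends the tracee SIGSTOP.
	if err := syscall.PtraceAttach(pid); err != nil {
		return err
	}
	var ws syscall.WaitStatus
	// Wait until the tracee has actually stopped.
	if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil {
		return err
	}

	var regs syscall.PtraceRegs
	entering := true
	sig := 0 // signal to deliver when resuming (0 = none; drops the SIGSTOP)
	for {
		// PTRACE_SYSCALL: resume until the next syscall entry or exit.
		if err := syscall.PtraceSyscall(pid, sig); err != nil {
			return err
		}
		sig = 0
		if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil {
			return err
		}
		if ws.Exited() || ws.Signaled() {
			if !entering {
				fmt.Println() // finish the last "syscall N" line
			}
			return nil
		}
		if ws.StopSignal() != syscall.SIGTRAP {
			// A real signal, not a syscall stop: pass it on at the next resume.
			sig = int(ws.StopSignal())
			continue
		}
		if err := syscall.PtraceGetRegs(pid, &regs); err != nil {
			return err
		}
		if entering {
			fmt.Printf("syscall %d", regs.Orig_rax)
		} else {
			// At the exit stop, RAX holds the return value (-errno on failure).
			fmt.Printf(" = %d\n", int64(regs.Rax))
		}
		entering = !entering
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: attach <pid>")
		os.Exit(1)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad pid:", err)
		os.Exit(1)
	}
	if err := traceSyscalls(pid); err != nil {
		fmt.Fprintln(os.Stderr, "trace:", err)
		os.Exit(1)
	}
}
```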
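With this background, the presenter live-coded a Go version of strace. It must be built on linux/amd64 (register fields such as Orig_rax are amd64-specific), and the counter uses libseccomp-golang, a cgo binding that needs the libseccomp C library installed.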
// strace.go
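package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
)

func main() {
	var err error
	var regs syscall.PtraceRegs
	var ss syscallCounter
	ss = ss.init()

	// ptrace requests must come from the OS thread that started tracing,
	// so keep this goroutine on a single thread.
	runtime.LockOSThread()

	fmt.Println("Run: ", os.Args[1:])

	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	cmd.Stdin = os.Stdin
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Ptrace: true, // the child calls ptrace(PTRACE_TRACEME) before execve
	}
	if err = cmd.Start(); err != nil {
		panic(err)
	}
	// The child stops with SIGTRAP right after execve, so Wait reports
	// "stop signal: trace/breakpoint trap"; this is expected.
	err = cmd.Wait()
	if err != nil {
		fmt.Printf("Wait err %v \n", err)
	}

	pid := cmd.Process.Pid
	exit := true

	for {
		// PTRACE_SYSCALL stops the tracee both on syscall entry and exit,
		// so this flag makes sure each syscall is printed only once.
		if exit {
			err = syscall.PtraceGetRegs(pid, &regs)
			if err != nil {
				break
			}
			// Orig_rax holds the syscall number.
			name := ss.getName(regs.Orig_rax)
			fmt.Printf("name: %s, id: %d \n", name, regs.Orig_rax)
			ss.inc(regs.Orig_rax)
		}

		// The PTRACE_SYSCALL request described above.
		err = syscall.PtraceSyscall(pid, 0)
		if err != nil {
			panic(err)
		}

		// Wait until the tracee stops again (next syscall entry or exit);
		// ptrace requests are only valid while the tracee is stopped.
		var ws syscall.WaitStatus
		_, err = syscall.Wait4(pid, &ws, 0, nil)
		if err != nil {
			panic(err)
		}
		if ws.Exited() || ws.Signaled() {
			break
		}
		exit = !exit
	}

	ss.print()
}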
// syscallcounter.go: the counter that collects syscall statistics
package main

import (
	"fmt"
	"os"
	"text/tabwriter"

	"github.com/seccomp/libseccomp-golang"
)

type syscallCounter []int

// Must exceed the largest syscall number on the target kernel; newer
// x86_64 kernels define syscalls numbered above 400.
const maxSyscalls = 512

func (s syscallCounter) init() syscallCounter {
	s = make(syscallCounter, maxSyscalls)
	return s
}

func (s syscallCounter) inc(syscallID uint64) error {
	// syscallID indexes s, so it must be below maxSyscalls.
	if syscallID >= maxSyscalls {
		return fmt.Errorf("invalid syscall ID (%x)", syscallID)
	}
	s[syscallID]++
	return nil
}

// print writes "count|name" for every syscall seen, in syscall-number order.
func (s syscallCounter) print() {
	w := tabwriter.NewWriter(os.Stdout, 0, 0, 8, ' ', tabwriter.AlignRight|tabwriter.Debug)
	for k, v := range s {
		if v > 0 {
			name, _ := seccomp.ScmpSyscall(k).GetName()
			fmt.Fprintf(w, "%d\t%s\n", v, name)
		}
	}
	w.Flush()
}

// getName maps a syscall number to its name using libseccomp.
func (s syscallCounter) getName(syscallID uint64) string {
	name, _ := seccomp.ScmpSyscall(syscallID).GetName()
	return name
}
The result of running it on echo hello:
Run: [echo hello]
Wait err stop signal: trace/breakpoint trap
name: execve, id: 59
name: brk, id: 12
name: access, id: 21
name: mmap, id: 9
name: access, id: 21
name: open, id: 2
name: fstat, id: 5
name: mmap, id: 9
name: close, id: 3
name: access, id: 21
name: open, id: 2
name: read, id: 0
name: fstat, id: 5
name: mmap, id: 9
name: mprotect, id: 10
name: mmap, id: 9
name: mmap, id: 9
name: close, id: 3
name: mmap, id: 9
name: arch_prctl, id: 158
name: mprotect, id: 10
name: mprotect, id: 10
name: mprotect, id: 10
name: munmap, id: 11
name: brk, id: 12
name: brk, id: 12
name: open, id: 2
name: fstat, id: 5
name: mmap, id: 9
name: close, id: 3
name: fstat, id: 5
hello
name: write, id: 1
name: close, id: 3
name: close, id: 3
1|read
1|write
3|open
5|close
4|fstat
7|mmap
4|mprotect
1|munmap
3|brk
3|access
1|execve
1|arch_prctl
Comparing the results, the per-syscall counts are exactly the same as in the strace -c output.
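The counter above needs libseccomp (and therefore cgo) only to turn syscall numbers into names. If cgo is not an option, a lookup table does the same job. This is a minimal sketch, not code from the talk: syscallNames is a hypothetical, deliberately partial table of x86_64 syscall numbers covering the calls in the echo example plus exit_group, and mapCounter is a hypothetical replacement for the fixed-size slice, so no syscall number can fall outside its range.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"text/tabwriter"
)

// syscallNames is a hypothetical, partial table of x86_64 syscall numbers.
var syscallNames = map[uint64]string{
	0: "read", 1: "write", 2: "open", 3: "close", 5: "fstat",
	9: "mmap", 10: "mprotect", 11: "munmap", 12: "brk", 21: "access",
	59: "execve", 158: "arch_prctl", 231: "exit_group",
}

// syscallName returns the known name, or a placeholder for unknown numbers.
func syscallName(id uint64) string {
	if name, ok := syscallNames[id]; ok {
		return name
	}
	return fmt.Sprintf("syscall_%d", id)
}

// mapCounter counts calls per syscall number; a map has no upper bound.
type mapCounter map[uint64]int

func (c mapCounter) inc(id uint64) { c[id]++ }

// print writes "count|name" lines ordered by syscall number.
func (c mapCounter) print() {
	ids := make([]uint64, 0, len(c))
	for id := range c {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	w := tabwriter.NewWriter(os.Stdout, 0, 0, 8, ' ', tabwriter.AlignRight|tabwriter.Debug)
	for _, id := range ids {
		fmt.Fprintf(w, "%d\t%s\n", c[id], syscallName(id))
	}
	w.Flush()
}

func main() {
	c := mapCounter{}
	for _, id := range []uint64{59, 12, 21, 9, 21, 1} {
		c.inc(id)
	}
	c.print()
}
```

A complete table could be generated from the kernel source file arch/x86/entry/syscalls/syscall_64.tbl.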