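PouchContainer is an open-source container runtime from Alibaba Group. It offers strong isolation and portability, and it can help enterprises containerize their existing workloads quickly and improve the utilization of their physical resources.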
package main

import "fmt"

func main() {
	waitCh := make(chan struct{})
	go func() {
		fmt.Println("Hi, Pouch. I'm new gopher!")
		waitCh <- struct{}{}
	}()

	// block until the goroutine has finished its job
	<-waitCh
}
Normally, once a gopher finishes its task it returns to its cage and waits for your next call. Sometimes, however, a gopher stays out for a very long time.
package main

import (
	"log"
	"net/http"
	"os/exec"
	"strings"
)

func main() {
	// /exec?cmd=xx&args=yy runs the shell command in the host
	http.HandleFunc("/exec", func(w http.ResponseWriter, r *http.Request) {
		defer func() { log.Printf("finish %v\n", r.URL) }()
		out, err := genCmd(r).CombinedOutput()
		if err != nil {
			w.WriteHeader(500)
			w.Write([]byte(err.Error()))
			return
		}
		w.Write(out)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}

// genCmd builds the command from the "cmd" and "args" form values.
func genCmd(r *http.Request) (cmd *exec.Cmd) {
	var args []string
	if got := r.FormValue("args"); got != "" {
		args = strings.Split(got, " ")
	}

	if c := r.FormValue("cmd"); len(args) == 0 {
		cmd = exec.Command(c)
	} else {
		cmd = exec.Command(c, args...)
	}
	return
}
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	logGoNum()

	// without sender and blocking....
	var ch chan int
	go func(ch chan int) {
		<-ch
	}(ch)

	for range time.Tick(2 * time.Second) {
		logGoNum()
	}
}

func logGoNum() {
	log.Printf("goroutine number: %d\n", runtime.NumGoroutine())
}
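Goroutine leaks arise in many different scenarios. The rest of this article uses the Pouch Logs API as an example to show how to detect a goroutine leak and how to fix it.
To make the problem easier to follow, the code of the Pouch Logs HTTP handler is simplified here: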
func logsContainer(ctx context.Context, w http.ResponseWriter, r *http.Request) {
	...
	writeLogStream(ctx, w, msgCh)
	return
}

func writeLogStream(ctx context.Context, w http.ResponseWriter, msgCh <-chan Message) {
	for {
		select {
		case <-ctx.Done():
			return
		case msg, ok := <-msgCh:
			if !ok {
				return
			}
			w.Write(msg.Byte())
		}
	}
}
# step 1: create background job
pouch run -d busybox sh -c "while true; do sleep 1; done"

# step 2: follow the log and stop it after 3 seconds
# (the URL is quoted so that the shell does not treat & as a job separator)
curl -m 3 "{ip}:{port}/v1.24/containers/{container_id}/logs?stdout=1&follow=1"

# step 3: after 3 seconds, dump the stack info
curl -s "{ip}:{port}/debug/pprof/goroutine?debug=2" | grep -A 10 logsContainer
github.com/alibaba/pouch/apis/server.(*Server).logsContainer(0xc420330b80, 0x251b3e0, 0xc420d93240, 0x251a1e0, 0xc420432c40, 0xc4203f7a00, 0x3, 0x3)
	/tmp/pouchbuild/src/github.com/alibaba/pouch/apis/server/container_bridge.go:339 +0x347
github.com/alibaba/pouch/apis/server.(*Server).(github.com/alibaba/pouch/apis/server.logsContainer)-fm(0x251b3e0, 0xc420d93240, 0x251a1e0, 0xc420432c40, 0xc4203f7a00, 0x3, 0x3)
	/tmp/pouchbuild/src/github.com/alibaba/pouch/apis/server/router.go:53 +0x5c
github.com/alibaba/pouch/apis/server.withCancelHandler.func1(0x251b3e0, 0xc420d93240, 0x251a1e0, 0xc420432c40, 0xc4203f7a00, 0xc4203f7a00, 0xc42091dad0)
	/tmp/pouchbuild/src/github.com/alibaba/pouch/apis/server/router.go:114 +0x57
github.com/alibaba/pouch/apis/server.filter.func1(0x251a1e0, 0xc420432c40, 0xc4203f7a00)
	/tmp/pouchbuild/src/github.com/alibaba/pouch/apis/server/router.go:181 +0x327
net/http.HandlerFunc.ServeHTTP(0xc420a84090, 0x251a1e0, 0xc420432c40, 0xc4203f7a00)
	/usr/local/go/src/net/http/server.go:1918 +0x44
github.com/alibaba/pouch/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0xc4209fad20, 0x251a1e0, 0xc420432c40, 0xc4203f7a00)
	/tmp/pouchbuild/src/github.com/alibaba/pouch/vendor/github.com/gorilla/mux/mux.go:133 +0xed
net/http.serverHandler.ServeHTTP(0xc420a18d00, 0x251a1e0, 0xc420432c40, 0xc4203f7800)
Go's net/http package can tell a handler when the client connection has been closed:
// HTTP Handler Interceptors
func withCancelHandler(h handler) handler {
	return func(ctx context.Context, rw http.ResponseWriter, req *http.Request) error {
		// https://golang.org/pkg/net/http/#CloseNotifier
		if notifier, ok := rw.(http.CloseNotifier); ok {
			var cancel context.CancelFunc
			ctx, cancel = context.WithCancel(ctx)

			waitCh := make(chan struct{})
			defer close(waitCh)

			closeNotify := notifier.CloseNotify()
			go func() {
				select {
				case <-closeNotify:
					cancel()
				case <-waitCh:
				}
			}()
		}
		return h(ctx, rw, req)
	}
}
CloseNotify does not work for hijacked connections. After Hijack, everything about the connection is handed over to the actual handler, and the HTTP server gives up control of the data.
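For non-hijacked connections, the same cancel-on-disconnect behaviour can also be obtained from the request itself: the Go documentation marks http.CloseNotifier as deprecated and recommends Request.Context, which net/http cancels when the client connection closes. The sketch below is not Pouch code. The handler, msgCh and handlerExitsOnDisconnect are hypothetical names made up for illustration, and the 200ms client timeout stands in for `curl -m 3`.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// handlerExitsOnDisconnect starts a throwaway test server whose handler
// waits on a channel that never delivers, lets the client give up after a
// short timeout, and reports whether the handler returned afterwards.
func handlerExitsOnDisconnect() bool {
	exited := make(chan struct{}, 1)
	msgCh := make(chan []byte) // hypothetical log source; nothing is ever sent

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() { exited <- struct{}{} }()
		ctx := r.Context() // canceled by net/http once the client connection closes
		for {
			select {
			case <-ctx.Done():
				return
			case msg := <-msgCh:
				w.Write(msg)
			}
		}
	}))
	defer srv.Close()

	// The client abandons the request after 200ms.
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		return false
	}
	if resp, err := http.DefaultClient.Do(req); err == nil {
		resp.Body.Close()
	}

	// Give the server a moment to notice the closed connection.
	select {
	case <-exited:
		return true
	case <-time.After(5 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("handler exited after disconnect:", handlerExitsOnDisconnect())
}
```

Like CloseNotify, this relies on the HTTP server still owning the connection, so it is not expected to help once the connection has been hijacked.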
Can this kind of detection be automated? The following sections explain how, using commonly used analysis tools.
goroutine 93 [chan receive]:
github.com/alibaba/pouch/daemon/mgr.NewContainerMonitor.func1(0xc4202ce618)
	/tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container_monitor.go:62 +0x45
created by github.com/alibaba/pouch/daemon/mgr.NewContainerMonitor
	/tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container_monitor.go:60 +0x8d

goroutine 94 [chan receive]:
github.com/alibaba/pouch/daemon/mgr.(*ContainerManager).execProcessGC(0xc42037e090)
	/tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container.go:2177 +0x1a5
created by github.com/alibaba/pouch/daemon/mgr.NewContainerManager
	/tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container.go:179 +0x50b
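The first line of a goroutine stack usually contains the goroutine ID, and the lines after it are the call stack itself. With the call stack in hand, we can check for leaks by keyword matching.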
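The keyword matching described above can be sketched roughly as follows. This is only an illustration: matchStacks is a hypothetical helper, not part of Pouch, and it assumes the dump is the text served by /debug/pprof/goroutine?debug=2, where goroutines are separated by blank lines.

```go
package main

import (
	"fmt"
	"strings"
)

// matchStacks splits a goroutine dump into per-goroutine blocks and returns
// the header line (the one carrying the goroutine ID) of every block whose
// text mentions keyword.
func matchStacks(dump, keyword string) []string {
	var headers []string
	for _, block := range strings.Split(strings.TrimSpace(dump), "\n\n") {
		if !strings.Contains(block, keyword) {
			continue
		}
		headers = append(headers, strings.SplitN(block, "\n", 2)[0])
	}
	return headers
}

func main() {
	// Sample taken from the dump shown above.
	dump := `goroutine 93 [chan receive]:
github.com/alibaba/pouch/daemon/mgr.NewContainerMonitor.func1(0xc4202ce618)
	/tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container_monitor.go:62 +0x45

goroutine 94 [chan receive]:
github.com/alibaba/pouch/daemon/mgr.(*ContainerManager).execProcessGC(0xc42037e090)
	/tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container.go:2177 +0x1a5
`
	for _, header := range matchStacks(dump, "execProcessGC") {
		fmt.Println(header)
	}
}
```

In a leak check, the keyword would be the function that is expected to have returned, for example logsContainer; any match after the client has gone away points to a suspect goroutine.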
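Overall, the debug endpoint approach suits integration tests: the test cases and the target service do not run in the same process, so the goroutine stacks of the target process have to be dumped to find out whether anything has leaked.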
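When the test case and the target function or service run in the same process, a change in the number of goroutines can be used to tell whether a leak exists.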
func TestXXX(t *testing.T) {
	orgNum := runtime.NumGoroutine()
	defer func() {
		// the goroutine count must be back at its starting value
		if got := runtime.NumGoroutine(); orgNum != got {
			t.Fatalf("xxx: goroutine number before=%d, after=%d", orgNum, got)
		}
	}()

	...
}