Golang Killer Tools: Performance Profiling with PProf

Date: 2019-11-06


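Preface

You have written tons of code and implemented hundreds of endpoints. Functional testing passed, and the service finally went live.

And then the performance turns out to be poor. What the heck? 😭

Time for some performance analysis.

PProf

To optimize performance, start with the toolchain that Go itself provides and use it as the basis for your analysis. This article walks you through learning and using Go's own back garden, covering the following:

runtime/pprof: collects runtime data from a (non-server) program for analysis
net/http/pprof: collects runtime data from an HTTP server for analysis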
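For a program that is not an HTTP server, the runtime/pprof route could be sketched roughly as follows. The helpers busyWork and profileCPU and the output file name cpu.prof are hypothetical and exist only for illustration; pprof.StartCPUProfile and pprof.StopCPUProfile are the standard library calls.

```go
package main

import (
	"bytes"
	"log"
	"os"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound workload used only for illustration.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// profileCPU runs work with CPU profiling enabled and returns the collected
// profile, which is gzip-compressed profile.proto data.
func profileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	// StopCPUProfile returns only after the profile has been fully written.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

func main() {
	prof, err := profileCPU(func() { busyWork(200_000_000) })
	if err != nil {
		log.Fatal(err)
	}
	// Assumption: the profile is saved as cpu.prof in the working directory.
	if err := os.WriteFile("cpu.prof", prof, 0o644); err != nil {
		log.Fatal(err)
	}
	log.Printf("wrote %d bytes to cpu.prof", len(prof))
}
```

The resulting file can then be opened with go tool pprof cpu.prof, in the same way as the profiles fetched over HTTP later in this article.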

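What it is

pprof is a tool for visualizing and analyzing profiling data.

pprof reads a collection of profiling samples in profile.proto format and generates reports that visualize the data and help you analyze it (both text and graphical reports are supported).

profile.proto is a Protocol Buffer v3 descriptor file. It describes a set of call stacks together with symbolization information, and it represents the sampled call stacks used for statistical analysis. It is a very common format for stack-trace profiles.

Supported usage modes

Report generation
Interactive terminal use
Web interface

What it can do

CPU Profiling: samples the CPU usage (including registers) of the monitored application at a fixed frequency, showing where the application spends time while it is actively consuming CPU cycles
Memory Profiling: records a stack trace whenever the application makes a heap allocation; used to monitor current and historical memory usage and to check for memory leaks
Block Profiling: records where goroutines block waiting on synchronization (including timer channels)
Mutex Profiling: reports mutex contention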

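A simple example

We will write a simple, slightly flawed example to use for a basic first-pass analysis of a program.

Write the demo files

(1) demo.go, with the following contents: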

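package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // blank import registers the /debug/pprof handlers on http.DefaultServeMux

    "github.com/EDDYCJY/go-pprof-example/data"
)

func main() {
    // Generate load in the background: call data.Add in an endless loop and log each result.
    go func() {
        for {
            log.Println(data.Add("https://github.com/EDDYCJY"))
        }
    }()

    // A nil handler means http.DefaultServeMux, which now includes the pprof endpoints.
    // ListenAndServe only returns on error, so report that error instead of dropping it.
    log.Fatal(http.ListenAndServe("0.0.0.0:6060", nil))
}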

(2) data/d.go, with the following contents:

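package data

// datas keeps every string passed to Add and is never trimmed,
// so memory use grows for as long as the program runs.
var datas []string

// Add copies str, appends the copy to datas, and returns the copy.
func Add(str string) string {
    data := []byte(str)
    sData := string(data)
    datas = append(datas, sData)

    return sData
}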

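Run this program. Because of the blank import of net/http/pprof, your HTTP service gains a /debug/pprof endpoint that you can use to observe the state of the application.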
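The demo relies on http.DefaultServeMux. If a service uses its own mux instead, the pprof handlers can be attached explicitly. The following is a minimal sketch under that assumption: newDebugMux and statusOf are hypothetical helper names, and binding to 127.0.0.1 is a choice made for this sketch, not something the demo does.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a mux that serves only the pprof handlers.
// Note: importing net/http/pprof still registers the same handlers on
// http.DefaultServeMux as a side effect of the package's init function.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Index also serves the named profiles such as heap, goroutine and block.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusOf issues an in-memory GET request against h and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	log.Printf("self-check: GET /debug/pprof/ -> %d", statusOf(mux, "/debug/pprof/"))
	// Assumption: the debug endpoints should be reachable from localhost only.
	log.Fatal(http.ListenAndServe("127.0.0.1:6060", mux))
}
```

Keeping the profiling endpoints on a separate mux and port is one way to avoid exposing them alongside the public routes of a service.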

Analysis

1. Via the web interface

To see the current overview, visit http://127.0.0.1:6060/debug/pprof/

/debug/pprof/

profiles:
0    block
5    goroutine
3    heap
0    mutex
9    threadcreate

full goroutine stack dump

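This page has a number of subpages. Let's dig deeper and see what we can get.

cpu (CPU Profiling): $HOST/debug/pprof/profile, runs CPU profiling for 30s by default and produces a profile file for analysis
block (Block Profiling): $HOST/debug/pprof/block, shows the stack traces that led to blocking on synchronization
goroutine: $HOST/debug/pprof/goroutine, shows the stack traces of all current goroutines
heap (Memory Profiling): $HOST/debug/pprof/heap, shows the memory allocations of live objects
mutex (Mutex Profiling): $HOST/debug/pprof/mutex, shows the stack traces of the holders of contended mutexes
threadcreate: $HOST/debug/pprof/threadcreate, shows the stack traces that led to the creation of new OS threads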
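The named profiles on the overview page, together with their counts, correspond to the profiles registered with runtime/pprof, so the same numbers can be read from inside the process. A small sketch, where profileCounts is a hypothetical helper; the CPU profile is collected separately and does not appear in this list:

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// profileCounts returns the current count of every registered profile,
// keyed by profile name (for example "goroutine" or "heap").
func profileCounts() map[string]int {
	counts := make(map[string]int)
	for _, p := range pprof.Profiles() {
		counts[p.Name()] = p.Count()
	}
	return counts
}

func main() {
	// Map iteration order is unspecified, so the rows may print in any order.
	for name, n := range profileCounts() {
		fmt.Printf("%-12s %d\n", name, n)
	}
}
```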

2. Via the interactive terminal

(1) go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60

$ go tool pprof http://localhost:6060/debug/pprof/profile\?seconds\=60

Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=60
Saved profile in /Users/eddycjy/pprof/pprof.samples.cpu.007.pb.gz
Type: cpu
Duration: 1mins, Total samples = 26.55s (44.15%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

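After running this command you have to wait 60 seconds (adjust the seconds value as needed) while pprof performs CPU profiling. When it finishes, pprof enters its interactive command mode by default, where you can view or export the results. Type help at the (pprof) prompt for a description of the commands.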

(pprof) top10
Showing nodes accounting for 25.92s, 97.63% of 26.55s total
Dropped 85 nodes (cum <= 0.13s)
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
    23.28s 87.68% 87.68%     23.29s 87.72%  syscall.Syscall
     0.77s  2.90% 90.58%      0.77s  2.90%  runtime.memmove
     0.58s  2.18% 92.77%      0.58s  2.18%  runtime.freedefer
     0.53s  2.00% 94.76%      1.42s  5.35%  runtime.scanobject
     0.36s  1.36% 96.12%      0.39s  1.47%  runtime.heapBitsForObject
     0.35s  1.32% 97.44%      0.45s  1.69%  runtime.greyobject
     0.02s 0.075% 97.51%     24.96s 94.01%  main.main.func1
     0.01s 0.038% 97.55%     23.91s 90.06%  os.(*File).Write
     0.01s 0.038% 97.59%      0.19s  0.72%  runtime.mallocgc
     0.01s 0.038% 97.63%     23.30s 87.76%  syscall.Write

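flat: time spent in the function itself, excluding the functions it calls
flat%: flat as a percentage of the total CPU time sampled
sum%: running total of flat% for this row and every row above it
cum: time spent in the function plus all the functions it calls
cum%: cum as a percentage of the total CPU time sampled

The last column is the function name. For example, main.main.func1 has a flat of only 0.02s but a cum of 24.96s, because nearly all of its time is spent in the functions it calls. In most cases these five columns are enough to understand how an application is behaving and to decide where to optimize 🤔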
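The relationship between flat and sum% can be checked by hand. The sketch below recomputes sum% for the first three rows of the top10 output above from their flat values and the 26.55s total; cumulativePercent is a hypothetical helper written for this check and is not part of pprof.

```go
package main

import "fmt"

// cumulativePercent returns, for each row, the running total of the flat
// values up to and including that row, as a percentage of total.
func cumulativePercent(flat []float64, total float64) []float64 {
	out := make([]float64, len(flat))
	running := 0.0
	for i, f := range flat {
		running += f
		out[i] = running / total * 100
	}
	return out
}

func main() {
	// flat values (in seconds) of syscall.Syscall, runtime.memmove and
	// runtime.freedefer, taken from the top10 output above.
	flat := []float64{23.28, 0.77, 0.58}
	for i, p := range cumulativePercent(flat, 26.55) {
		fmt.Printf("row %d: sum%% = %.2f\n", i+1, p)
	}
}
```

This prints 87.68, 90.58 and 92.77, matching the sum% column of those three rows.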

(2) go tool pprof http://localhost:6060/debug/pprof/heap

$ go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /Users/eddycjy/pprof/pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.008.pb.gz
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 837.48MB, 100% of 837.48MB total
      flat  flat%   sum%        cum   cum%
  837.48MB   100%   100%   837.48MB   100%  main.main.func1

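-inuse_space: analyzes the memory the application currently holds, that is, memory that has been allocated and not yet freed
-alloc_objects: analyzes the objects the application has allocated over its lifetime, including those already freed, which makes short-lived temporary allocations visible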
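The difference between "in use" and "allocated" can be illustrated outside pprof with runtime.MemStats, where TotalAlloc is cumulative and HeapAlloc covers only what is still on the heap. This is an analogy, not how pprof computes its sample types, and churn and sink are hypothetical names used only in this sketch.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is package-level so that the buffers below are heap-allocated.
var sink []byte

// churn allocates n buffers of size bytes without keeping any of them.
// It returns the bytes allocated during the call (cumulative) and the
// heap bytes still allocated after a final garbage collection.
func churn(n, size int) (allocated, liveAfter uint64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
	sink = nil // drop the last buffer so nothing from the loop stays reachable
	runtime.GC()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.HeapAlloc
}

func main() {
	allocated, live := churn(64, 1<<20)
	fmt.Printf("allocated during call: %d MiB\n", allocated>>20)
	fmt.Printf("heap still allocated:  %d KiB\n", live>>10)
}
```

A program whose allocation totals are large while its in-use figures stay small is creating a lot of garbage. In the demo above the opposite holds, because the in-use figure itself keeps growing.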

(3) go tool pprof http://localhost:6060/debug/pprof/block

(4) go tool pprof http://localhost:6060/debug/pprof/mutex
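Block and mutex profiling are both off by default in the Go runtime, which is consistent with the 0 shown for block and mutex on the overview page. A sketch of how a server might switch them on before serving the endpoints follows; enableContentionProfiles and currentMutexFraction are hypothetical helpers, and the rate of 1 is chosen for demonstration, since recording every event adds overhead.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
	"runtime"
)

// enableContentionProfiles turns on block and mutex profiling and returns
// the mutex profile fraction that was in effect before the call.
func enableContentionProfiles(blockRate, mutexFraction int) int {
	// blockRate 1 records every blocking event; a value <= 0 turns it off.
	runtime.SetBlockProfileRate(blockRate)
	// On average 1/mutexFraction of contention events are reported; 0 turns it off.
	return runtime.SetMutexProfileFraction(mutexFraction)
}

// currentMutexFraction reads the mutex profile fraction without changing it.
func currentMutexFraction() int {
	return runtime.SetMutexProfileFraction(-1)
}

func main() {
	prev := enableContentionProfiles(1, 1)
	log.Printf("mutex profile fraction: %d -> %d", prev, currentMutexFraction())
	// Assumption: the debug endpoints are served on localhost only.
	log.Fatal(http.ListenAndServe("127.0.0.1:6060", nil))
}
```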

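3. The PProf visualization interface

This is the section to look forward to. Before we get there, we need to write a simple test case and run it.

Write the test case

(1) Create data/d_test.go with the following contents: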

package data

import "testing"

const url = "https://github.com/EDDYCJY"

func TestAdd(t *testing.T) {
    s := Add(url)
    if s == "" {
        t.Errorf("Test.Add error!")
    }
}

func BenchmarkAdd(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Add(url)
    }
}

(2) Run the test case

$ go test -bench=. -cpuprofile=cpu.prof
pkg: github.com/EDDYCJY/go-pprof-example/data
BenchmarkAdd-4       10000000           187 ns/op
PASS
ok      github.com/EDDYCJY/go-pprof-example/data    2.300s

The -memprofile flag is worth a look too.