Performance Testing in Golang

This article introduces performance testing (benchmarks) in Golang.

Using the testing package

Consider the following bench_test.go:

package bench

import "testing"

func Fib(n int) int {
	switch n {
	case 0:
		return 0
	case 1:
		return 1
	default:
		return Fib(n-1) + Fib(n-2)
	}
}

func BenchmarkFib20(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Fib(20) // run the Fib function b.N times
	}
}

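Run the benchmark. -bench takes a regular expression that selects which benchmarks to run (`.` matches all of them). -run only filters ordinary Test functions, and no Test function matches ^BenchmarkFib20$, so only the benchmark runs: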

> $ go test -bench=. -run=^BenchmarkFib20$
goos: darwin
goarch: amd64
pkg: github.com/liangyaopei/GolangTester/bench
BenchmarkFib20-8           25608             46494 ns/op
PASS
ok      github.com/liangyaopei/GolangTester/bench       1.678s

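The -8 suffix in BenchmarkFib20-8 is the value of GOMAXPROCS, which defaults to the number of logical CPUs. You can set it with the -cpu flag. The following command runs the benchmark with GOMAXPROCS set to 1, 2 and 4 in turn: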

> $ go test -bench=. -cpu=1,2,4 -run=^BenchmarkFib20$

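The number 25608 is the iteration count: the loop body ran 25608 times (b.N = 25608), and each iteration took 46494 ns on average (46494 ns/op). You do not choose b.N yourself. The testing framework keeps increasing it until the benchmark has run long enough (1s by default).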
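To watch the framework pick b.N, you can drive the benchmark from an ordinary program with testing.Benchmark, which runs a benchmark function outside go test. The sketch below records every b.N the framework tries. The names BenchmarkFib20Traced and seenN are hypothetical, added only for illustration.

```go
package main

import (
	"fmt"
	"testing"
)

func Fib(n int) int {
	switch n {
	case 0:
		return 0
	case 1:
		return 1
	default:
		return Fib(n-1) + Fib(n-2)
	}
}

// seenN is a hypothetical helper that records each b.N passed in by the framework.
var seenN []int

func BenchmarkFib20Traced(b *testing.B) {
	seenN = append(seenN, b.N)
	for n := 0; n < b.N; n++ {
		Fib(20)
	}
}

func main() {
	// testing.Benchmark first calls the function with b.N = 1, then keeps
	// calling it with a larger b.N until the run reaches the benchmark time.
	r := testing.Benchmark(BenchmarkFib20Traced)
	fmt.Println("b.N values tried:", seenN)
	fmt.Println("result:", r.String())
}
```

The last value in seenN is the iteration count reported in the result line.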

Other flags

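Flag	Purpose	Example
-benchtime	How long to run each benchmark (default 1s)	`go test -bench=. -benchtime=10s -run=^BenchmarkFib20$`
-count	How many times to run each benchmark	`go test -bench=. -count=10 -run=^BenchmarkFib20$`
-benchmem	Report memory allocation statistics (B/op and allocs/op)	`go test -bench=. -benchmem -run=^BenchmarkFib20$`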
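The -benchmem flag turns on allocation reporting for every benchmark in the run. A single benchmark can opt in instead by calling b.ReportAllocs().

The sketch below uses a hypothetical BenchmarkAlloc that allocates a 1 KiB slice on each iteration. It stores the slice in a package-level variable, so the allocation escapes to the heap. The benchmark runs through testing.Benchmark, and the B/op and allocs/op figures are read from the result.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level, so each slice assigned to it escapes to the heap.
var sink []byte

// BenchmarkAlloc is a hypothetical benchmark that allocates on every iteration.
func BenchmarkAlloc(b *testing.B) {
	b.ReportAllocs() // same effect as -benchmem, but only for this benchmark
	for i := 0; i < b.N; i++ {
		sink = make([]byte, 1024)
	}
}

func main() {
	r := testing.Benchmark(BenchmarkAlloc)
	// MemString formats the B/op and allocs/op columns.
	fmt.Println(r.String(), r.MemString())
}
```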

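Saving the test as a binary

`go test -c` compiles the test code into a binary (named after the package with a .test suffix), so you can run it again later without recompiling.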

CPU, memory and block profiles

-cpuprofile=$FILE writes a CPU profile to $FILE.
-memprofile=$FILE writes a memory profile to $FILE; -memprofilerate=N sets runtime.MemProfileRate to N.
-blockprofile=$FILE writes a block profile to $FILE.

Example:

> $ go test -bench=.  -run=^BenchmarkFib20$ -cpuprofile=profile
> $ go tool pprof profile

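Using pprof

A profile records where a program spends its resources over an entire run, and you use it to locate performance problems.

pprof comes from Google Perf Tools and has been integrated into the Go runtime. It has two parts:

runtime/pprof: a package built into every Go program
go tool pprof: a tool for reading and analyzing profile files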

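Types of profiles

CPU Profile

While the program runs, the CPU profiler interrupts it every 10ms and records the stack trace of the currently running goroutines.

Memory profiling

Memory profiling records the stack trace at the point where a heap allocation is made.

Like the CPU profile, it is sample-based and does not record every allocation. By default the runtime aims to sample one allocation per 512 KiB of heap allocated (runtime.MemProfileRate = 512 * 1024).

Because memory profiling is sampled, and because it tracks allocations rather than memory still in use, it is hard to use it to determine how much memory the whole program uses.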
Block (or blocking) profiling

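Block profiling records how long goroutines spend waiting for shared resources, such as channel operations and mutexes. You can use it to find concurrency bottlenecks in a program.

Mutex contention profiling

Mutex contention profiling records operations that are delayed by contention on mutexes.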

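One profile at a time

Profiling is not free, so enable only one kind of profile at a time. If several profiles are enabled together, they interfere with one another's measurements.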

A pprof example

Consider the following example, words.go:

package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"unicode"

	"github.com/pkg/profile"
)

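// readbyte reads a single byte from r. On an unbuffered *os.File,
// every call is a separate read system call.
func readbyte(r io.Reader) (rune, error) {
	var buf [1]byte
	_, err := r.Read(buf[:])
	return rune(buf[0]), err
}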

func main() {
	defer profile.Start().Stop() // CPU profiling by default; the profile path is logged at start and stop

	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatalf("could not open file %q: %v", os.Args[1], err)
	}

	words := 0
	inword := false
	for {
		r, err := readbyte(f)
		if err == io.EOF {
			break
		}

		if err != nil {
			log.Fatalf("could not read file %q: %v", os.Args[1], err)
		}

		if unicode.IsSpace(r) && inword {
			words++
			inword = false
		}
		inword = unicode.IsLetter(r)
	}
	fmt.Printf("%q: %d words\n", os.Args[1], words)
}

Run the program with profiling enabled:

> $ go run words.go whales.txt
2021/02/06 14:38:20 profile: cpu profiling enabled, /var/folders/q9/5v5tz4_92gd343hvb3mb1_s40000gn/T/profile247619673/cpu.pprof
"whales.txt": 181276 words
2021/02/06 14:38:21 profile: cpu profiling disabled, /var/folders/q9/5v5tz4_92gd343hvb3mb1_s40000gn/T/profile247619673/cpu.pprof


Read the profile with go tool pprof:

> $ go tool pprof /var/folders/q9/5v5tz4_92gd343hvb3mb1_s40000gn/T/profile247619673/cpu.pprof
Type: cpu
Time: Feb 6, 2021 at 2:38pm (CST)
Duration: 1.63s, Total samples = 1.29s (79.06%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1.29s, 100% of 1.29s total
      flat  flat%   sum%        cum   cum%
     1.29s   100%   100%      1.29s   100%  syscall.syscall
         0     0%   100%      1.29s   100%  internal/poll.(*FD).Read
         0     0%   100%      1.29s   100%  internal/poll.ignoringEINTR
         0     0%   100%      1.29s   100%  main.main
         0     0%   100%      1.29s   100%  main.readbyte (inline)
         0     0%   100%      1.29s   100%  os.(*File).Read
         0     0%   100%      1.29s   100%  os.(*File).read (inline)
         0     0%   100%      1.29s   100%  runtime.main
         0     0%   100%      1.29s   100%  syscall.Read
         0     0%   100%      1.29s   100%  syscall.read


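syscall.syscall accounts for all of the CPU time, because every call to readbyte(f) makes a system call to read a single byte. The list command confirms that, inside main.main, the time is spent on the readbyte(f) line: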

(pprof) list main.main
Total: 1.29s
ROUTINE ======================== main.main in /Users/liangyaopei/workspace/GolangTester/profile/words.go
         0      1.29s (flat, cum)   100% of Total
         .          .     25:   }
         .          .     26:
         .          .     27:   words := 0
         .          .     28:   inword := false
         .          .     29:   for {
         .      1.29s     30:           r, err := readbyte(f)
         .          .     31:           if err == io.EOF {
         .          .     32:                   break
         .          .     33:           }
         .          .     34:
         .          .     35:           if err != nil {


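Common go tool pprof commands

Command	Description
top	Lists the top 10 functions ranked by the profile's metric: execution time for a CPU profile, memory usage for a memory profile.
traces	Prints every sampled call stack together with its metrics.
list	Shows the annotated source of the functions matching a regular expression, with the metric for each line (as in list main.main above).
web	Renders the call graph and opens it in a browser (requires Graphviz).
go tool pprof -http=:8080	Run from the shell rather than inside pprof: serves an interactive browser view of the profile on port 8080.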

Compiler optimizations

The Go compiler's optimizations fall mainly into three areas:

Escape analysis
Function inlining
Dead code elimination
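Dead code elimination matters for benchmarks. Suppose a call's result is never used and the compiler can prove the call has no side effects, for example after inlining it. The compiler may then remove the call, and the benchmark measures an empty loop. A recursive function like Fib is not inlined, so the risk is greatest for small, inlinable functions.

A common guard is to store the result in a package-level variable. Here it is sketched with the hypothetical names BenchmarkFib20Sink and Result.

```go
package main

import (
	"fmt"
	"testing"
)

func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// Result is package-level, so the compiler cannot treat the value
// stored into it as unused.
var Result int

func BenchmarkFib20Sink(b *testing.B) {
	var r int
	for n := 0; n < b.N; n++ {
		r = Fib(20) // keep the result so the call cannot be discarded
	}
	Result = r
}

func main() {
	res := testing.Benchmark(BenchmarkFib20Sink)
	fmt.Println(res.String(), "Fib(20) =", Result)
}
```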

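go build flags

Flag	Purpose	Example
-m	Print the compiler's optimization decisions, such as escape analysis and inlining	`go build -gcflags=-m example.go`
-l	Control inlining	`-gcflags=-l` disables inlining; `-gcflags='-l -l'` selects a more aggressive inlining level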
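To see escape analysis at work, compile a small file with go build -gcflags=-m. In the sketch below, the type and function names are hypothetical.

newPointPtr returns the address of a local variable, so that variable must outlive the call. The report is expected to flag it with a line such as `moved to heap: p`.
newPointVal returns a copy, so p can stay on the stack.

The exact wording of the report may vary between Go versions. Adding -l keeps inlining from mixing copies of these functions into the report for main.

```go
package main

import "fmt"

type point struct{ x, y int }

// newPointPtr returns a pointer to a local variable, so p must outlive
// the call; escape analysis is expected to move it to the heap.
func newPointPtr(x, y int) *point {
	p := point{x, y}
	return &p
}

// newPointVal returns p by value, so p can stay on the stack.
func newPointVal(x, y int) point {
	p := point{x, y}
	return p
}

func main() {
	a := newPointPtr(1, 2)
	b := newPointVal(3, 4)
	fmt.Println(a.x+a.y, b.x+b.y) // 3 7
}
```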

References

High Performance Go Workshop

My WeChat official account: lyp分享的地方

My Zhihu column: zhuanlan.zhihu.com/c_127546654…

My blog: www.liangyaopei.com

Github Page: liangyaopei.github.io/