2019.9.10 An Analysis of Golang Garbage Collection

Date: 2019-11-15

Reposted from http://legendtkl.com/2017/04/28/golang-gc/

See also https://studygolang.com/articles/14497

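1. The Evolution of Golang GC

Ever since its first release, Go's GC has drawn more criticism than any other part of the language. Nearly every release, however, has come with GC improvements. Some of the more important changes are listed below.

v1.1 STW
v1.3 Mark STW, concurrent sweep
v1.5 Tri-color marking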
v1.8 hybrid write barrier

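2. A Brief Introduction to GC Algorithms

This section introduces four classic GC algorithms: reference counting, mark-sweep (mark & sweep), copying (Copying Garbage Collection), and generational collection (Generational Garbage Collection).

2.1 Reference Counting

The idea behind reference counting is simple: each cell keeps a field that records how many other cells point to it (like the in-degree of a node in a directed graph). When the count drops to 0, the cell is reclaimed. Reference counting is incremental, so it spreads the cost of memory management over the whole run of the program. C++'s shared_ptr uses reference counting.

A reference counting implementation usually keeps all cells in a pool, such as a free list. This chains all the cells together so that they can be counted. A newly allocated cell starts with a count of 1 (not 0, because an allocation is normally written as ptr = new object). Each time a pointer is set to point to the cell, its count is incremented; each time a pointer to it is deleted, its count is decremented. When the count reaches 0, the cell is reclaimed. Simple as this sounds, an implementation has many details to consider. For example, when a cell is deleted, the counts of all the cells it points to must be decremented as well. If one of those counts then drops to 0, should the deletion proceed recursively, or should some other strategy be used? Recursive processing can cause the system to thrash. These details are not discussed here; see the references at the end of the article.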
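The counting rules above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: cell, newCell, link and release are hypothetical names, and the cascade of decrements is handled with an explicit worklist rather than recursion, which is one possible answer to the question raised above and not the only one.

```go
package main

import "fmt"

// cell is a hypothetical heap cell with a reference count and outgoing pointers.
type cell struct {
	name     string
	refs     int
	children []*cell
	freed    bool
}

// newCell returns a cell whose count starts at 1, standing for the
// pointer that receives the result of the allocation.
func newCell(name string) *cell { return &cell{name: name, refs: 1} }

// link stores a pointer to child inside parent and bumps child's count.
func link(parent, child *cell) {
	child.refs++
	parent.children = append(parent.children, child)
}

// release drops one pointer to c. Cells whose count reaches 0 are freed,
// and the pointers they held are released in turn through a worklist.
// It returns the number of cells freed.
func release(c *cell) int {
	freed := 0
	pending := []*cell{c}
	for len(pending) > 0 {
		cur := pending[len(pending)-1]
		pending = pending[:len(pending)-1]
		cur.refs--
		if cur.refs > 0 {
			continue
		}
		cur.freed = true
		freed++
		pending = append(pending, cur.children...)
		cur.children = nil
	}
	return freed
}

func main() {
	a, b, c := newCell("a"), newCell("b"), newCell("c")
	link(a, b) // b.refs == 2
	link(b, c) // c.refs == 2
	release(b) // drop the local pointer to b; a still holds it
	release(c) // drop the local pointer to c; b still holds it
	fmt.Println(release(a)) // a, b and c are freed in a cascade
}
```

If two cells are linked to each other and both local pointers are released, neither count reaches 0 in this sketch, which is the cycle problem listed among the disadvantages.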

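Advantages

Incremental. Memory management is interleaved with the execution of the user program, which spreads the cost of GC over the whole program. Unlike mark-sweep, it does not need STW (Stop The World: suspending the user program during GC).
Easy to implement.
Memory cells are reclaimed promptly. Other garbage collection algorithms, by contrast, collect only when the heap is exhausted or a threshold is reached.

Disadvantages

Plain reference counting cannot handle cyclic references. This is probably its most criticized weakness. Many solutions have emerged, however, such as weak references.
Maintaining the counts slows the program down. Every update or deletion of a pointer must adjust the counts of the cells involved, a cost that tracing collectors do not pay.
A cell pool implemented as a free list is not cache-friendly, which leads to frequent cache misses and slows the program down.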

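2.2 Mark-Sweep

Mark-sweep was the first automatic memory management algorithm and is a tracing collector. The idea dates back to 1960, which makes it a very old algorithm. A cell is not reclaimed the moment it becomes garbage; it stays unreachable until a threshold is reached or a fixed interval has elapsed. At that point the system suspends the user program (STW) and runs the collector. The collector performs a global traversal of all live cells to determine which cells can be reclaimed. The algorithm has two parts: mark and sweep. The mark phase marks all live cells, and the sweep phase reclaims the garbage cells.

[Figure: visualization of mark and sweep]

The advantages of mark-sweep are those of tracing collectors in general: it avoids the weaknesses of reference counting (no handling of cyclic references, and the need to maintain counts). The disadvantage is equally clear: it needs STW.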
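The two phases can be illustrated with a toy heap. This is a minimal sketch, not how the Go runtime does it: object, heap, mark and collect are hypothetical names, and the whole cycle runs as one stop-the-world call.

```go
package main

import "fmt"

// object is a hypothetical heap cell with a mark bit and outgoing pointers.
type object struct {
	marked bool
	refs   []*object
}

// heap tracks every allocated object so that sweep can visit them all.
type heap struct {
	objects []*object
}

func (h *heap) alloc() *object {
	o := &object{}
	h.objects = append(h.objects, o)
	return o
}

// mark sets the mark bit on everything reachable from o.
func mark(o *object) {
	if o == nil || o.marked {
		return
	}
	o.marked = true
	for _, r := range o.refs {
		mark(r)
	}
}

// collect runs one cycle: mark from the roots, then sweep the whole heap.
// It returns the number of objects reclaimed.
func (h *heap) collect(roots ...*object) int {
	for _, r := range roots {
		mark(r)
	}
	live := h.objects[:0]
	freed := 0
	for _, o := range h.objects {
		if o.marked {
			o.marked = false // clear the bit for the next cycle
			live = append(live, o)
		} else {
			freed++
		}
	}
	h.objects = live
	return freed
}

func main() {
	h := &heap{}
	root, a, b, c := h.alloc(), h.alloc(), h.alloc(), h.alloc()
	root.refs = []*object{a}
	b.refs = []*object{c} // b and c form an unreachable cycle
	c.refs = []*object{b}
	fmt.Println(h.collect(root), len(h.objects))
}
```

Note that the unreachable cycle between b and c is reclaimed, which plain reference counting cannot do.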

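Tri-color Marking

Tri-color marking improves the mark phase. It works as follows:

1. Initially all objects are white.
2. Starting from the roots, scan the reachable objects, mark them grey, and put them in the pending queue.
3. Take a grey object from the queue, mark the objects it references grey and enqueue them, then mark the object itself black.
4. Repeat step 3 until the grey queue is empty. The objects that are still white are garbage and are reclaimed.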
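The four steps translate almost directly into code. The following is a sketch under stated assumptions: node and markTriColor are hypothetical names, the heap is just a slice of all nodes, and the roots themselves are shaded grey in step 2.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	grey
	black
)

// node is a hypothetical heap object.
type node struct {
	name  string
	color color
	refs  []*node
}

// markTriColor colors the graph and returns the nodes left white (garbage).
func markTriColor(all, roots []*node) []*node {
	for _, n := range all { // step 1: everything starts white
		n.color = white
	}
	var queue []*node
	shade := func(n *node) {
		if n.color == white {
			n.color = grey
			queue = append(queue, n)
		}
	}
	for _, r := range roots { // step 2: shade what the roots reach
		shade(r)
	}
	for len(queue) > 0 { // steps 3 and 4: drain the grey queue
		n := queue[0]
		queue = queue[1:]
		for _, ref := range n.refs {
			shade(ref)
		}
		n.color = black
	}
	var garbage []*node
	for _, n := range all {
		if n.color == white {
			garbage = append(garbage, n)
		}
	}
	return garbage
}

func main() {
	a, b := &node{name: "a"}, &node{name: "b"}
	c, d := &node{name: "c"}, &node{name: "d"}
	a.refs = []*node{b}
	c.refs = []*node{d} // c and d are unreachable from the root
	for _, g := range markTriColor([]*node{a, b, c, d}, []*node{a}) {
		fmt.Println("garbage:", g.name)
	}
}
```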

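[Figure: visualization of the tri-color marking process]

An obvious benefit of tri-color marking is that the user program and the mark phase can run concurrently. For details, see the paper "On-the-fly garbage collection: an exercise in cooperation". Go's GC implementation is also based on this paper, as discussed later.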

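2.3 Copying

Copying collection is also a tracing algorithm. It divides the heap into two equal halves (semi-spaces): one holds the current data, the other holds data that has already been discarded. A copying collection starts by flipping the roles of the two semi-spaces. The collector then traverses the live data structures in the old semi-space, Fromspace, and copies each cell into the new semi-space, Tospace, the first time it visits that cell. Once all live cells in Fromspace have been visited, the collector has built a replica of the live data structures in Tospace, and the user program can resume.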
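A semi-space copy can be sketched with slices standing in for the two halves. This is an illustration only: cell and copyCollect are hypothetical names, pointers are modeled as slice indexes (-1 for nil), and the breadth-first scan over Tospace follows Cheney's well-known scheme, which the text above does not name.

```go
package main

import "fmt"

// cell is a hypothetical heap cell. left and right are indexes into the
// semi-space that holds the cell; -1 means nil.
type cell struct {
	value   int
	left    int
	right   int
	forward int // index in Tospace once copied, -1 before that
}

// copyCollect copies everything reachable from root out of from into a
// fresh Tospace and returns it together with the root's new index.
func copyCollect(from []cell, root int) (to []cell, newRoot int) {
	to = make([]cell, 0, len(from))
	// evacuate copies a cell on its first visit and returns its new index.
	evacuate := func(i int) int {
		if i < 0 {
			return -1
		}
		if from[i].forward >= 0 {
			return from[i].forward // already copied
		}
		to = append(to, from[i])
		from[i].forward = len(to) - 1
		return len(to) - 1
	}
	newRoot = evacuate(root)
	// Scan Tospace: every copied cell still holds Fromspace indexes,
	// so evacuate its children and rewrite the fields.
	for scan := 0; scan < len(to); scan++ {
		l := evacuate(to[scan].left)
		r := evacuate(to[scan].right)
		to[scan].left, to[scan].right = l, r
	}
	return to, newRoot
}

func main() {
	from := []cell{
		{value: 1, left: 2, right: -1, forward: -1},   // root
		{value: 99, left: -1, right: -1, forward: -1}, // garbage
		{value: 2, left: 0, right: -1, forward: -1},   // points back to root
		{value: 98, left: -1, right: -1, forward: -1}, // garbage
	}
	to, root := copyCollect(from, 0)
	fmt.Println(len(to), root, to[root].value, to[to[root].left].value)
}
```

The live cells end up packed at the start of Tospace, so a new allocation only needs to append at the end.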

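Advantages

All live data structures are packed together at the bottom of Tospace, so there is no memory fragmentation.
New memory can be obtained simply by bumping the free-space pointer.

Disadvantages

Memory is not fully used: half of the memory space is always wasted.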

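2.4 Generational Collection

A main problem of tracing collectors (mark-sweep, copying) is that they waste time on long-lived objects, which do not need to be scanned frequently. At the same time, memory allocation obeys the observation that "most objects die young". Based on these two points, a generational collector places objects into two (or more) regions of the heap according to their lifetime; these regions are the generations. The young generation is collected noticeably more often than the old generation.

Objects are allocated in the young generation. If an object later turns out to be long-lived, it is moved to the old generation, a process called promotion. With continual promotion, the young generation ends up taking a fairly small share of the whole heap. Concentrating collection effort on the young generation is therefore relatively more efficient, and STW pauses are shorter.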
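Promotion can be illustrated with a toy two-generation heap. All names here (object, genHeap, promoteAge, minorGC) are hypothetical, and the live flag simply stands in for reachability, which a real collector would establish by tracing.

```go
package main

import "fmt"

// object is a hypothetical heap object. The live flag stands in for
// "still reachable".
type object struct {
	age  int
	live bool
}

// genHeap is a toy heap with a young and an old generation.
type genHeap struct {
	young, old []*object
	promoteAge int // survivals needed before promotion (hypothetical knob)
}

// minorGC collects only the young generation: dead objects are dropped,
// survivors age by one, and old-enough survivors are promoted.
func (h *genHeap) minorGC() (freed, promoted int) {
	survivors := h.young[:0]
	for _, o := range h.young {
		switch {
		case !o.live:
			freed++
		case o.age+1 >= h.promoteAge:
			o.age++
			h.old = append(h.old, o)
			promoted++
		default:
			o.age++
			survivors = append(survivors, o)
		}
	}
	h.young = survivors
	return freed, promoted
}

func main() {
	h := &genHeap{promoteAge: 2}
	h.young = []*object{{live: true}, {live: false}, {live: false}}
	fmt.Println(h.minorGC()) // dead objects freed, survivor ages
	fmt.Println(h.minorGC()) // survivor reaches promoteAge
	fmt.Println(len(h.young), len(h.old))
}
```

The old generation is never touched by minorGC, which is where the saving comes from.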

Advantages

Better performance.

Disadvantages

Complex to implement.

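3. Golang GC

3.1 Overview

Before walking through Go's actual garbage collection process, let's look at a few basic questions.

1. When is GC triggered?

When an object larger than 32 KB is allocated on the heap, the runtime checks whether the conditions for a collection are met and, if so, starts one. (In the Go 1.8 source, the elided small-object branch can also set shouldhelpgc when the allocator has to fetch a fresh span.)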

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	shouldhelpgc := false
	// The object is a small one (at most 32 KB).
	if size <= maxSmallSize {
		...
	} else {
		shouldhelpgc = true
		...
	}
	...
	// gcShouldStart() checks the trigger condition.
	if shouldhelpgc && gcShouldStart(false) {
		// gcStart() performs the garbage collection.
		gcStart(gcBackgroundMode, false)
	}
}

The above is automatic garbage collection. There is also explicit garbage collection, triggered by calling runtime.GC(), which blocks the caller.

// GC runs a garbage collection and blocks the caller until the
// garbage collection is complete. It may also block the entire
// program.
func GC() {
	gcStart(gcForceBlockMode, false)
}
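From user code, the blocking behavior of runtime.GC() can be observed through runtime.MemStats. The sketch below uses only the public runtime API; forcedCycles is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// forcedCycles calls runtime.GC n times and reports how many completed
// collections the runtime counted in between (MemStats.NumGC).
func forcedCycles(n int) uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		runtime.GC() // returns only after the collection has finished
	}
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	fmt.Println("collections during 3 explicit calls:", forcedCycles(3))
}
```

Because each call waits for a full collection, the counter is expected to advance by at least one per call; background collections may add more.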

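2. GC trigger conditions

The part to focus on is the middle of the condition in the code below: forceTrigger || memstats.heap_live >= memstats.gc_trigger. forceTrigger is the flag for a forced GC; the second half means that the live heap has reached the GC trigger threshold set at initialization. heap_live is kept up to date on every malloc and free, which is not covered further here.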

// gcShouldStart returns true if the exit condition for the _GCoff
// phase has been met. The exit condition should be tested when
// allocating.
//
// If forceTrigger is true, it ignores the current heap size, but
// checks all other conditions. In general this should be false.
func gcShouldStart(forceTrigger bool) bool {
	return gcphase == _GCoff && (forceTrigger || memstats.heap_live >= memstats.gc_trigger) && memstats.enablegc && panicking == 0 && gcpercent >= 0
}

// Set the GC trigger threshold at initialization.
func gcinit() {
	_ = setGCPercent(readgogc())
	memstats.gc_trigger = heapminimum
	...
}

// The percentage x is passed in through GOGC at startup.
// The trigger threshold is x% of defaultHeapMinimum (4 MB by default).
func readgogc() int32 {
	p := gogetenv("GOGC")
	if p == "off" {
		return -1
	}
	if n, ok := atoi32(p); ok {
		return n
	}
	return 100
}

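3. The main garbage collection flow

Go uses tri-color marking. The main flow is:

All objects start out white.
Starting from the roots, find all reachable objects, mark them grey, and put them in the pending queue.
Walk the grey queue: mark each object's referents grey and enqueue them, then mark the object itself black.
Once the grey queue has been processed, sweep.

The detailed process is shown in the figure below; see [9] for details.

[Figure: the stages of a Go GC cycle]

A few notes on the figure:

Traversal starts from the roots, which include global pointers and the pointers on goroutine stacks.
Marking has two steps.
1. Traverse from the roots, marking objects grey, and walk the grey queue.
2. Re-scan the global pointers and the stacks. Because marking runs concurrently with the user program, new objects may be allocated during step 1; these have to be recorded through the write barrier, and the re-scan then does a final check.
Stop The World happens twice.
1. The first is when the GC is about to start. This is mainly preparation, such as enabling the write barrier.
2. The second is the re-scan mentioned above. Without STW at this point, marking would never finish.

The stages in the figure correspond to the following GC phases:

Off: _GCoff
Stack scan ~ Mark: _GCmark
Mark termination: _GCmarktermination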

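3.2 Write barrier

The write barrier could fill an article of its own, so it gets only a brief introduction here; the focus of this article remains Go's GC. A write barrier in garbage collection can be thought of as a piece of code that the compiler deliberately inserts at write operations. There is a corresponding read barrier as well.

Why is a write barrier needed? Simply because, with a collector that runs concurrently with the user program, the user program keeps modifying memory, so those modifications have to be recorded.

Through Go 1.7, the write barrier was the classic Dijkstra-style insertion write barrier [Dijkstra '78], and most of the STW time was spent in the stack re-scan. Starting with 1.8, a hybrid write barrier (a combination of the Yuasa-style deletion write barrier [Yuasa '90] and the Dijkstra-style insertion write barrier [Dijkstra '78]) is used to avoid the re-scan. See 17503-eliminate-rescan for details.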
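The effect of the hybrid barrier can be shown on a toy model. This sketch follows the barrier as described in the 17503-eliminate-rescan proposal (shade the old value of the slot, and shade the new pointer while the writing goroutine's stack is still grey). The types and names below (object, collector, writePointer, stackGrey) are hypothetical and have nothing to do with the runtime's actual data structures.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	grey
	black
)

// object is a hypothetical heap object with a single pointer field.
type object struct {
	color color
	field *object
}

// collector holds the grey queue of a toy concurrent marker.
type collector struct {
	marking bool
	grey    []*object
}

// shade turns a white object grey and queues it for scanning.
func (c *collector) shade(o *object) {
	if o != nil && o.color == white {
		o.color = grey
		c.grey = append(c.grey, o)
	}
}

// writePointer performs *slot = ptr with a hybrid barrier in front.
// stackGrey says whether the writing goroutine's stack is still unscanned.
func (c *collector) writePointer(slot **object, ptr *object, stackGrey bool) {
	if c.marking {
		c.shade(*slot) // deletion (Yuasa) part: keep the old referent alive
		if stackGrey {
			c.shade(ptr) // insertion (Dijkstra) part: keep the new referent alive
		}
	}
	*slot = ptr
}

func main() {
	c := &collector{marking: true}
	holder := &object{color: black} // already scanned by the marker
	w := &object{}                  // white, so far referenced only from a stack
	c.writePointer(&holder.field, w, true)
	fmt.Println(w.color == grey, len(c.grey))
}
```

Without the barrier, w would be hidden behind a black object and could be swept even though it is reachable.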

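3.3 Marking

The source below is still based on go1.8rc3. The GC code in this version changed considerably from earlier versions, so the discussion sticks to the main flow as far as possible. The garbage collection code is concentrated in the function gcStart().

// gcStart is the entry point of the GC; what it does depends on gcMode.
// 1. gcMode == gcBackgroundMode (runs in the background, i.e. concurrently): _GCoff -> _GCmark
// 2. Otherwise _GCoff -> _GCmarktermination; this is an explicitly requested GC.
func gcStart(mode gcMode, forceTrigger bool) {
	...
}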

1. STW phase 1

The preparation before the GC starts.

func gcStart(mode gcMode, forceTrigger bool) {
	...
	// Start the mark workers in the background.
	if mode == gcBackgroundMode {
		gcBgMarkStartWorkers()
	}
	...
	// Stop The World
	systemstack(stopTheWorldWithSema)
	...
	if mode == gcBackgroundMode {
		// Preparation before the GC starts.

		// Set the GC phase; setGCPhase also enables the write barrier.
		setGCPhase(_GCmark)

		gcBgMarkPrepare() // Must happen before assist enable.
		gcMarkRootPrepare()

		// Mark all active tinyalloc blocks. Since we're
		// allocating from these, they need to be black like
		// other allocations. The alternative is to blacken
		// the tiny block on every allocation from it, which
		// would slow down the tiny allocator.
		gcMarkTinyAllocs()

		// Start The World
		systemstack(startTheWorldWithSema)
	} else {
		...
	}
}

2. Mark

The mark phase runs concurrently with the user program; it is carried out by mark workers that stay running in the background.

func gcStart(mode gcMode, forceTrigger bool) {
	...
	// Start the mark workers in the background.
	if mode == gcBackgroundMode {
		gcBgMarkStartWorkers()
	}
}

func gcBgMarkStartWorkers() {
	// Background marking is performed by per-P G's. Ensure that
	// each P has a background GC G.
	for _, p := range &allp {
		if p == nil || p.status == _Pdead {
			break
		}
		if p.gcBgMarkWorker == 0 {
			go gcBgMarkWorker(p)
			notetsleepg(&work.bgMarkReady, -1)
			noteclear(&work.bgMarkReady)
		}
	}
}

// gcBgMarkWorker stays in the background all the time. It sleeps most of
// the time and is scheduled by gcController.
func gcBgMarkWorker(_p_ *p) {
	for {
		// Park the current goroutine until certain conditions are met.
		gopark(...)
		...
		// The mark work itself.
		systemstack(func() {
			// Mark our goroutine preemptible so its stack
			// can be scanned. This lets two mark workers
			// scan each other (otherwise, they would
			// deadlock). We must not modify anything on
			// the G stack. However, stack shrinking is
			// disabled for mark workers, so it is safe to
			// read from the G stack.
			casgstatus(gp, _Grunning, _Gwaiting)
			switch _p_.gcMarkWorkerMode {
			default:
				throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
			case gcMarkWorkerDedicatedMode:
				gcDrain(&_p_.gcw, gcDrainNoBlock|gcDrainFlushBgCredit)
			case gcMarkWorkerFractionalMode:
				gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
			case gcMarkWorkerIdleMode:
				gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
			}
			casgstatus(gp, _Gwaiting, _Grunning)
		})
		...
	}
}

The marking code of the mark phase is implemented mainly in the function gcDrain().

// gcDrain scans roots and objects in work buffers, blackening grey
// objects until all roots and work buffers have been drained.
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
	...
	// Drain root marking jobs.
	if work.markrootNext < work.markrootJobs {
		for !(preemptible && gp.preempt) {
			job := atomic.Xadd(&work.markrootNext, +1) - 1
			if job >= work.markrootJobs {
				break
			}
			markroot(gcw, job)
			if idle && pollWork() {
				goto done
			}
		}
	}

	// Handle heap marking.
	// Drain heap marking jobs.
	for !(preemptible && gp.preempt) {
		...
		// Take an object from the grey queue.
		var b uintptr
		if blocking {
			b = gcw.get()
		} else {
			b = gcw.tryGetFast()
			if b == 0 {
				b = gcw.tryGet()
			}
		}
		if b == 0 {
			// work barrier reached or tryGet failed.
			break
		}
		// Scan the objects referenced by the grey object, mark them
		// grey, and put them on the grey queue.
		scanobject(b, gcw)
	}
}

3. Mark termination (STW phase 2)

The mark termination phase stops the world. It is implemented in gcMarkTermination(). As of 1.8, goroutine stacks are no longer re-scanned. There are many details, which are not covered here.

func gcMarkTermination() {
	// World is stopped.
	// Run gc on the g0 stack. We do this so that the g stack
	// we're currently running on will no longer change. Cuts
	// the root set down a bit (g0 stacks are not scanned, and
	// we don't need to scan gc's internal state). We also
	// need to switch to g0 so we can shrink the stack.
	systemstack(func() {
		gcMark(startTime)
		// Must return immediately.
		// The outer function's stack may have moved
		// during gcMark (it shrinks stacks, including the
		// outer function's stack), so we must not refer
		// to any of its variables. Return back to the
		// non-system stack to pick up the new addresses
		// before continuing.
	})
	...
}

3.4 Sweeping

Sweeping is much simpler by comparison.

func gcSweep(mode gcMode) {
	...
	// Blocking mode.
	if !_ConcurrentSweep || mode == gcForceBlockMode {
		// Special case synchronous sweep.
		...
		// Sweep all spans eagerly.
		for sweepone() != ^uintptr(0) {
			sweep.npausesweep++
		}
		// Do an additional mProf_GC, because all 'free' events are now real as well.
		mProf_GC()
		mProf_GC()
		return
	}

	// Concurrent mode.
	// Background sweep.
	lock(&sweep.lock)
	if sweep.parked {
		sweep.parked = false
		ready(sweep.g, 0, true)
	}
	unlock(&sweep.lock)
}

For concurrent sweeping, bgsweep() is started when the GC is initialized and then loops in the background forever.

func bgsweep(c chan int) {
	sweep.g = getg()

	lock(&sweep.lock)
	sweep.parked = true
	c <- 1
	goparkunlock(&sweep.lock, "GC sweep wait", traceEvGoBlock, 1)

	for {
		for gosweepone() != ^uintptr(0) {
			sweep.nbgsweep++
			Gosched()
		}
		lock(&sweep.lock)
		if !gosweepdone() {
			// This can happen if a GC runs between
			// gosweepone returning ^0 above
			// and the lock being acquired.
			unlock(&sweep.lock)
			continue
		}
		sweep.parked = true
		goparkunlock(&sweep.lock, "GC sweep wait", traceEvGoBlock, 1)
	}
}

func gosweepone() uintptr {
	var ret uintptr
	systemstack(func() {
		ret = sweepone()
	})
	return ret
}

Both blocking and concurrent sweeping do their work through sweepone(). This is easy to follow if you are familiar with the previous article on Golang memory management. Memory management is span-based, and mheap_ is a global variable in which every allocated object is recorded. During marking we only need to find the span that holds an object and set the mark there; during sweeping we scan the spans, free the blocks that were not marked, and return a span with nothing marked in it to the heap.

// sweeps one span
// returns number of pages returned to heap, or ^uintptr(0) if there is nothing to sweep
func sweepone() uintptr {
	...
	for {
		s := mheap_.sweepSpans[1-sg/2%2].pop()
		...
		if !s.sweep(false) {
			// Span is still in-use, so this returned no
			// pages to the heap and the span needs to
			// move to the swept in-use list.
			npages = 0
		}
	}
}

// Sweep frees or collects finalizers for blocks not marked in the mark phase.
// It clears the mark bits in preparation for the next GC round.
// Returns true if the span was returned to heap.
// If preserve=true, don't return it to heap nor relink in MCentral lists;
// caller takes care of it.
func (s *mspan) sweep(preserve bool) bool {
	...
}

3.5 Miscellaneous

1. gcWork

This part covers the task queue, in other words the management of grey objects. Each P has a gcw that manages grey objects (get and put); the type of gcw is gcWork. The core of gcWork is wbuf1 and wbuf2, which store the grey objects, also called work (the term used from here on).

type p struct {
	...
	gcw gcWork
}

type gcWork struct {
	// wbuf1 and wbuf2 are the primary and secondary work buffers.
	wbuf1, wbuf2 wbufptr

	// Bytes marked (blackened) on this gcWork. This is aggregated
	// into work.bytesMarked by dispose.
	bytesMarked uint64

	// Scan work performed on this gcWork. This is aggregated into
	// gcController by dispose and may also be flushed by callers.
	scanWork int64
}

Since each P has its own work buffer, is there also a global work list? Yes. Binding a work buffer to each P has the same benefit as a cache: no locking is needed.

var work struct {
	full  uint64                   // lock-free list of full blocks workbuf
	empty uint64                   // lock-free list of empty blocks workbuf
	pad0  [sys.CacheLineSize]uint8 // prevents false-sharing between full/empty and nproc/nwait
	...
}

So why two work buffers (wbuf1 and wbuf2)? Take an example: to get a work item, first take from wbuf1; if wbuf1 is empty, swap it with wbuf2 and get again. At other times, full or empty buffers are moved between the local work buffers and the global work list. The benefit is that get rarely has to go to the global work list, where multiple goroutines would contend. Interestingly, the global work list is lock-free, implemented with atomic operations such as CAS. A few functions below give a feel for gcWork.

Initialization.

func (w *gcWork) init() {
	w.wbuf1 = wbufptrOf(getempty())
	wbuf2 := trygetfull()
	if wbuf2 == nil {
		wbuf2 = getempty()
	}
	w.wbuf2 = wbufptrOf(wbuf2)
}

put.

// put enqueues a pointer for the garbage collector to trace.
// obj must point to the beginning of a heap object or an oblet.
func (w *gcWork) put(obj uintptr) {
	flushed := false
	wbuf := w.wbuf1.ptr()
	if wbuf == nil {
		w.init()
		wbuf = w.wbuf1.ptr()
		// wbuf is empty at this point.
	} else if wbuf.nobj == len(wbuf.obj) {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		wbuf = w.wbuf1.ptr()
		if wbuf.nobj == len(wbuf.obj) {
			putfull(wbuf)
			wbuf = getempty()
			w.wbuf1 = wbufptrOf(wbuf)
			flushed = true
		}
	}

	wbuf.obj[wbuf.nobj] = obj
	wbuf.nobj++
}

get.

// get dequeues a pointer for the garbage collector to trace, blocking
// if necessary to ensure all pointers from all queues and caches have
// been retrieved. get returns 0 if there are no pointers remaining.
//go:nowritebarrier
func (w *gcWork) get() uintptr {
	wbuf := w.wbuf1.ptr()
	if wbuf == nil {
		w.init()
		wbuf = w.wbuf1.ptr()
		// wbuf is empty at this point.
	}
	if wbuf.nobj == 0 {
		w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
		wbuf = w.wbuf1.ptr()
		if wbuf.nobj == 0 {
			owbuf := wbuf
			wbuf = getfull()
			if wbuf == nil {
				return 0
			}
			putempty(owbuf)
			w.wbuf1 = wbufptrOf(wbuf)
		}
	}

	// TODO: This might be a good place to add prefetch code

	wbuf.nobj--
	return wbuf.obj[wbuf.nobj]
}
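The lock-free global list mentioned earlier relies on CAS. As an illustration of the general technique only, here is a minimal CAS-based stack (often called a Treiber stack) written with sync/atomic. It is not the runtime's own lock-free list, whose implementation is not shown in this article; node and stack are hypothetical names, and atomic.Pointer requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// node is one entry of the stack.
type node struct {
	val  int
	next *node
}

// stack is a lock-free LIFO list; head is only ever changed with CAS.
type stack struct {
	head atomic.Pointer[node]
}

// push retries until it manages to swing head from the value it saw
// to the new node.
func (s *stack) push(v int) {
	n := &node{val: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return
		}
	}
}

// pop retries until it manages to swing head to the next node, or
// reports false when the stack is empty.
func (s *stack) pop() (int, bool) {
	for {
		old := s.head.Load()
		if old == nil {
			return 0, false
		}
		if s.head.CompareAndSwap(old, old.next) {
			return old.val, true
		}
	}
}

func main() {
	var s stack
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func(base int) {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				s.push(base + i)
			}
		}(g * 100)
	}
	wg.Wait()
	n := 0
	for _, ok := s.pop(); ok; _, ok = s.pop() {
		n++
	}
	fmt.Println("popped:", n)
}
```

A goroutine that loses the race simply reloads head and tries again, so no goroutine ever blocks while holding a lock.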

2. forcegc

Two ways of triggering GC were covered above: automatic detection and an explicit call by the user. In addition, Go itself monitors the running state: if no GC has run for more than two minutes, it triggers one. The monitoring function is sysmon(), which is started from the main goroutine.

// The main goroutine
func main() {
	...
	systemstack(func() {
		newm(sysmon, nil)
	})
}

// Always runs without a P, so write barriers are not allowed.
func sysmon() {
	...
	for {
		now := nanotime()
		unixnow := unixnanotime()

		lastgc := int64(atomic.Load64(&memstats.last_gc))
		if gcphase == _GCoff && lastgc != 0 && unixnow-lastgc > forcegcperiod && atomic.Load(&forcegc.idle) != 0 {
			lock(&forcegc.lock)
			forcegc.idle = 0
			forcegc.g.schedlink = 0
			injectglist(forcegc.g) // put the forcegc goroutine on the runnable queue
			unlock(&forcegc.lock)
		}
	}
}

var forcegcperiod int64 = 2 * 60 * 1e9 // two minutes
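The time check in sysmon can be mirrored from user code with the public runtime API. In the sketch below, shouldForceGC and forceGCPeriod are hypothetical names that copy only the time part of the condition above; the timestamp of the last collection comes from runtime.MemStats.LastGC, which is expressed in nanoseconds since 1970.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// forceGCPeriod mirrors the two-minute forcegcperiod in the excerpt.
const forceGCPeriod = 2 * time.Minute

// shouldForceGC reports whether more than period has passed since the
// last collection. Both timestamps are in nanoseconds since 1970; a
// lastGC of 0 means no collection has happened yet.
func shouldForceGC(lastGC, now int64, period time.Duration) bool {
	return lastGC != 0 && now-lastGC > int64(period)
}

func main() {
	runtime.GC() // make sure at least one collection has happened
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	last := int64(ms.LastGC)
	fmt.Println("force now:", shouldForceGC(last, time.Now().UnixNano(), forceGCPeriod))
	later := time.Now().Add(3 * time.Minute).UnixNano()
	fmt.Println("force in three minutes:", shouldForceGC(last, later, forceGCPeriod))
}
```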

4. References

[1] 《Go 語言學習筆記》 (Go Language Study Notes)
[2] 《垃圾收集》 (Garbage Collection), Douban
[3] Tracing Garbage Collection, Wikipedia
[4] "On-the-fly garbage collection: an exercise in cooperation", Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, et al.
[5] Garbage Collection
[6] Tracing Garbage Collection
[7] Copying Garbage Collection, YouTube
[8] Generational Garbage Collection, YouTube
[9] golang gc talk
[10] 17503-eliminate-rescan
