Golang Memory Allocation

Date: 2020-01-06

Tags: golang, memory allocation; Category: Go

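Go ships with a built-in runtime that manages memory itself instead of relying on the traditional allocation approach. The allocator was originally based on the tcmalloc design and has been revised step by step since then. Managing memory itself lets the runtime use better memory usage patterns, such as memory pools and preallocation, and so avoid the performance cost of making a system call for every allocation.

1. Basic strategy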

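Request one large block of memory from the operating system at a time, then split it into small blocks of specific sizes and link them into lists. (The structure is an array of singly linked lists: each array element is one list, and every block in a given list has the same size.)
When allocating memory for an object, take a small block from the list of a suitable size. This avoids asking the operating system for memory every time and reduces system calls.
When reclaiming an object's memory, return the small block to its original list for reuse. If too much memory sits idle, return part of it to the operating system to lower overall overhead.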
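
The three steps above can be sketched as a toy pool. This is only an illustration, not the runtime's code: the names pool, block, classOf, and refill are hypothetical, a Go slice stands in for the large block obtained from the operating system, and returning idle memory to the operating system is not modeled.

```go
package main

import "fmt"

const (
	classStep  = 8   // assumption: every size class is a multiple of 8 bytes
	numClasses = 4   // classes for 8, 16, 24 and 32 bytes
	chunkSize  = 256 // bytes taken from the "OS" per refill
)

// block is one fixed-size slot on a free list.
type block struct {
	next *block
	data []byte
}

// pool keeps one singly linked free list per size class.
type pool struct {
	free   [numClasses]*block
	chunks int // number of large chunks requested so far
}

// classOf returns the free-list index for a request of 1..32 bytes.
func classOf(size int) int { return (size+classStep-1)/classStep - 1 }

// refill carves a fresh chunk into equal blocks and links them onto list c.
func (p *pool) refill(c int) {
	chunk := make([]byte, chunkSize) // stands in for one large OS allocation
	p.chunks++
	blockSize := (c + 1) * classStep
	for off := 0; off+blockSize <= chunkSize; off += blockSize {
		p.free[c] = &block{next: p.free[c], data: chunk[off : off+blockSize]}
	}
}

// alloc pops a block from the matching list, refilling it only when empty.
func (p *pool) alloc(size int) *block {
	c := classOf(size)
	if p.free[c] == nil {
		p.refill(c)
	}
	b := p.free[c]
	p.free[c] = b.next
	b.next = nil
	return b
}

// release pushes a block back onto its original list for reuse.
func (p *pool) release(b *block) {
	c := classOf(len(b.data))
	b.next = p.free[c]
	p.free[c] = b
}

func main() {
	var p pool
	a := p.alloc(20) // served from the 24-byte list
	p.release(a)
	b := p.alloc(17) // reuses the block that was just released
	fmt.Println(len(b.data), a == b, p.chunks)
}
```

Only the first allocation in a class touches the backing chunk; later requests are served from the list, which is the point of the strategy.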

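1.1 Memory blocks

span: the large block of memory obtained from the operating system as described above, made up of several pages with contiguous addresses;

object: a small block of memory cut from a span at a specific size; each one can store a single object;

In terms of purpose, spans serve internal management and objects serve object allocation.

About spans

The allocator distinguishes spans of different sizes by page count. For example, spans are stored in a management array keyed by page count, with the page count used as the index;

A span's size is not fixed. When no idle span of a suitable size is available, the allocator returns a span with more pages and trims it; the surplus pages form a new span that is put back into the management array;

The allocator can also merge adjacent idle spans to build larger blocks, which reduces fragmentation and allows a more flexible allocation strategy.

Sizes of allocated memory blocks

The relevant definitions can be found in $GOROOT/src/runtime/malloc.go.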

// malloc.go
_PageShift = 13
_PageSize  = 1 << _PageShift // 8 KB

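Objects used for storage are divided into n size classes in multiples of 8 bytes. For example, a 24-byte object can hold anything from 17 to 24 bytes. This wastes some memory, but it reduces the number of small-block sizes and so simplifies allocation and reuse management.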
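
A minimal sketch of that rounding rule follows. The helper roundUp8 is hypothetical and only models the "multiple of 8" description above; the real table generated by mksizeclasses.go is not a plain multiple-of-8 progression, so treat this as an illustration of the trade-off rather than the runtime's lookup.

```go
package main

import "fmt"

// roundUp8 rounds size up to the next multiple of 8 bytes.
func roundUp8(size uintptr) uintptr { return (size + 7) &^ 7 }

func main() {
	for _, size := range []uintptr{1, 8, 17, 24, 25} {
		slot := roundUp8(size)
		// slot-size is the internal fragmentation paid for having fewer sizes.
		fmt.Printf("request %2d bytes -> %2d-byte object, %d bytes unused\n", size, slot, slot-size)
	}
}
```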

The allocator also packs several tiny objects into a single object block to save memory.

// malloc.go
_NumSizeClasses = 67

// mheap.go
type mspan struct {
    next *mspan     // doubly linked list: next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none
    list *mSpanList // for debugging. TODO: Remove.

    // starting page number = (address >> _PageShift)
    startAddr uintptr // address of first byte of span aka s.base()
    npages    uintptr // number of pages in span

    // list of objects waiting to be allocated
    manualFreeList gclinkptr // list of free objects in mSpanManual spans
}

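When the allocator is initialized, it builds lookup tables that map sizes to size classes, including the number of span pages to cut up for each class.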

// msize.go

// Malloc small size classes.
//
// See malloc.go for overview.
// See also mksizeclasses.go for how we decide what size classes to use.

package runtime

// Returns size of the memory block that mallocgc will allocate if you ask for the size.
func roundupsize(size uintptr) uintptr {
    if size < _MaxSmallSize {
        if size <= smallSizeMax-8 {
            return uintptr(class_to_size[size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]])
        } else {
            return uintptr(class_to_size[size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]])
        }
    }
    if size+_PageSize < size {
        return size
    }
    return round(size, _PageSize)
}

An object whose size exceeds a specific threshold is treated specially as a large object.

// malloc.go
_MaxSmallSize = 32 << 10 // 32 KB

Objects are therefore classified as follows:

Tiny objects: size < 16 bytes;
Small (ordinary) objects: 16 bytes to 32 KB;
Large objects: size > 32 KB;
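
The classification can be written down as a small function. This is a sketch: classify is a hypothetical helper, and the extra pointer-free condition for tiny objects is taken from the noscan check in the mallocgc listing later in this article, not from the three-line list above.

```go
package main

import "fmt"

const (
	maxTinySize  = 16       // tiny threshold quoted above
	maxSmallSize = 32 << 10 // 32 KB, the value of _MaxSmallSize
)

// classify names the allocation path a request of the given size would take.
func classify(size uintptr, hasPointers bool) string {
	switch {
	case size > maxSmallSize:
		return "large"
	case size < maxTinySize && !hasPointers:
		return "tiny"
	default:
		return "small"
	}
}

func main() {
	fmt.Println(classify(8, false))         // pointer-free and under 16 bytes
	fmt.Println(classify(8, true))          // contains pointers, so not tiny
	fmt.Println(classify(4096, false))      // between 16 bytes and 32 KB
	fmt.Println(classify(32<<10+1, false))  // above 32 KB
}
```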

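1.2 Memory allocator

The allocator consists of three components

cache: every runtime worker thread is bound to a cache, which is used for lock-free object allocation. (Central is really a cache as well, but what it caches is not small object blocks; it caches groups of memory pages, and a runtime page is 8 KB as _PageSize above shows.)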

// mcache.go
type mcache struct {
    // manages the spans used for allocation, indexed by spanClass
    alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass
}

central: supplies every cache with backup spans that are already cut into objects.

// mcentral.go
type mcentral struct {
    spanclass spanClass // size class

    // list of spans that still have free objects
    nonempty mSpanList // list of spans with a free object, ie a nonempty free list
    // list of spans with no free objects, or spans taken by a cache
    empty mSpanList // list of spans with no free objects (or cached in an mcache)
}

heap: manages idle spans and requests new memory from the operating system when needed (the heap allocator manages memory in 8192-byte pages).

type mheap struct {
    // Idle-span lists (fields not shown in this excerpt): free is an array of
    // lists holding idle spans of 127 pages or fewer, and freelarge holds idle
    // spans with more than 127 pages.
    largealloc uint64                  // bytes allocated for large objects
    largefree  uint64                  // bytes freed for large objects (>maxsmallsize)
    nlargefree uint64                  // number of frees for large objects (>maxsmallsize)
    nsmallfree [_NumSizeClasses]uint64 // number of frees for small objects (<=maxsmallsize)

    // each central corresponds to one size class
    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
    }
}

Each thread has one cache of its own, which holds small objects. All threads share Central and Heap.

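Virtual address space

Both memory allocation and garbage collection depend on contiguous addresses, so the runtime reserves virtual address space up front for allocation. When the reservation is made, the operating system promises the addresses but does not back them with physical memory right away. The virtual address space is divided into three regions:

Purpose                               Region   Size     Related fields
Array of span pointers, one per page  spans    512 MB   spans_mapped
GC mark bitmap                        bitmap   32 GB    bit_map
User memory allocation area           arena    512 GB   arena_start, arena_used, arena_end

Together the three arrays form a high-performance memory management structure. Memory is requested from the operating system at arena addresses, so the arena's size sets the upper limit on allocatable user memory; bitmap provides 4 mark bits per object to hold pointer, GC mark, and similar information; when a span is created, the corresponding slots in spans are filled in page by page. The attributes of these regions are stored in the heap, including the advancing allocation positions mapped/used.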
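
The region sizes follow from the arena size by simple arithmetic, sketched below. The function names are hypothetical, and one assumption is labeled in the code: the 4 mark bits are counted per pointer-sized word, which is the reading that reproduces the 32 GB figure.

```go
package main

import "fmt"

const (
	pageShift   = 13 // 8 KB pages, as in _PageShift
	ptrSize     = 8  // bytes per pointer on a 64-bit machine
	bitsPerWord = 4  // assumption: 4 mark bits per pointer-sized word
)

// spansBytes is the size of an array holding one span pointer per arena page.
func spansBytes(arena uint64) uint64 { return (arena >> pageShift) * ptrSize }

// bitmapBytes is the size of a bitmap with bitsPerWord bits per arena word.
func bitmapBytes(arena uint64) uint64 { return arena / ptrSize * bitsPerWord / 8 }

// spanIndex is the slot in the spans array that covers addr.
func spanIndex(addr, arenaStart uint64) uint64 { return (addr - arenaStart) >> pageShift }

func main() {
	arena := uint64(512) << 30 // 512 GB
	fmt.Println("spans region (MB): ", spansBytes(arena)>>20)
	fmt.Println("bitmap region (GB):", bitmapBytes(arena)>>30)

	start := uint64(0xc000000000) // hypothetical arena start address
	fmt.Println("page index:", spanIndex(start+3*8192+100, start))
}
```

Because the spans slot is computed from the address alone, looking up the span that owns any pointer is a shift and an array read.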

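[Figure: relationships between the modules]

1.3 Memory allocation flow

From the object's point of view:

1. Compute the size class of the object to be allocated;

2. Find the span of that size class in the cache.alloc array;

3. Take a free object from span.freelist; if span.freelist is empty, get a new span from central;

4. If central.nonempty is empty, get a span from heap.free/freelarge and cut it into an object list;

5. If the heap has no idle span of a suitable size, request a new block of memory from the operating system.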

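Release flow:

1. Return each object marked as reclaimable to the freelist of the span it belongs to;

2. Put the span back in central, where any cache can pick it up again;

3. If a span has got all of its objects back, return it to the heap so that it can be cut up again and reused;

4. Periodically scan the heap for spans that have been idle for a long time and release the memory they occupy.

(Note: the above does not cover large objects, which are allocated from and returned to the heap directly.)

A cache is private to one worker thread and is never shared, which is the core of high-performance lock-free allocation. Central raises object utilization across caches and avoids waste: once a reclaim returns a span to central, any other cache can pick it up. Returning spans to the heap balances memory among objects of different size classes.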

2. Allocator initialization

Initialization flow:

func mallocinit() {
    testdefersizes()

    if heapArenaBitmapBytes&(heapArenaBitmapBytes-1) != 0 {
        // heapBits expects modular arithmetic on bitmap addresses to work.
        throw("heapArenaBitmapBytes not a power of 2")
    }

    // Copy class sizes out for statistics table.
    for i := range class_to_size {
        memstats.by_size[i].size = uint32(class_to_size[i])
    }

    // Check physPageSize.
    if physPageSize == 0 {
        // The OS init code failed to fetch the physical page size.
        throw("failed to get system page size")
    }
    if physPageSize < minPhysPageSize {
        print("system page size (", physPageSize, ") is smaller than minimum page size (", minPhysPageSize, ")\n")
        throw("bad system page size")
    }
    if physPageSize&(physPageSize-1) != 0 {
        print("system page size (", physPageSize, ") must be a power of 2\n")
        throw("bad system page size")
    }

    // Initialize the heap.
    mheap_.init()
    // Bind an mcache to the current thread (M).
    _g_ := getg()
    _g_.m.mcache = allocmcache()

    // Create initial arena growth hints.
    if sys.PtrSize == 8 && GOARCH != "wasm" {
        // On a 64-bit machine:
        // 1. Starting from the middle of the address space makes it easy
        //    to grow a contiguous range without running into another mapping.
        //
        // 2. This makes Go heap addresses easier to recognize when debugging.
        //
        // 3. Stack scanning in gccgo is still conservative, so it is important
        //    that addresses be distinguishable from other data.
        //
        // On AIX, mmap for 64-bit processes starts handing out reserved
        // addresses at 0x0A00000000000000; if that fails,
        // 0x1c00000000000000 through 0x7fc0000000000000 are tried.
        for i := 0x7f; i >= 0; i-- {
            var p uintptr
            switch {
            case GOARCH == "arm64" && GOOS == "darwin":
                p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
            case GOARCH == "arm64":
                p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
            case GOOS == "aix":
                if i == 0 {
                    // We don't use addresses directly after 0x0A00000000000000
                    // to avoid collisions with other mmaps done by non-Go programs.
                    continue
                }
                p = uintptr(i)<<40 | uintptrMask&(0xa0<<52)
            case raceenabled:
                // The TSAN runtime requires the heap to be in the range
                // [0x00c000000000, 0x00e000000000).
                p = uintptr(i)<<32 | uintptrMask&(0x00c0<<32)
                if p >= uintptrMask&0x00e000000000 {
                    continue
                }
            default:
                p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
            }
            hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
            hint.addr = p
            hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
        }
    } else {
        // On a 32-bit machine, we care much more about keeping the usable
        // heap contiguous.
        //
        // 1. We reserve space for all heapArenas up front so they don't get
        //    interleaved with the heap. They're about 258MB.
        //
        // 2. We hint the heap to start right above the end of the binary so
        //    we have the best chance of keeping it contiguous.
        //
        // 3. We try to stake out a reasonably large initial heap reservation.

        const arenaMetaSize = (1 << arenaBits) * unsafe.Sizeof(heapArena{})
        meta := uintptr(sysReserve(nil, arenaMetaSize))
        if meta != 0 {
            mheap_.heapArenaAlloc.init(meta, arenaMetaSize)
        }

        procBrk := sbrk0()

        p := firstmoduledata.end
        if p < procBrk {
            p = procBrk
        }
        if mheap_.heapArenaAlloc.next <= p && p < mheap_.heapArenaAlloc.end {
            p = mheap_.heapArenaAlloc.end
        }
        p = round(p+(256<<10), heapArenaBytes)
        // Because we're worried about fragmentation on 32-bit, we try to
        // make a large initial reservation.
        arenaSizes := []uintptr{
            512 << 20,
            256 << 20,
            128 << 20,
        }
        for _, arenaSize := range arenaSizes {
            a, size := sysReserveAligned(unsafe.Pointer(p), arenaSize, heapArenaBytes)
            if a != nil {
                mheap_.arena.init(uintptr(a), size)
                p = uintptr(a) + size // For hint below
                break
            }
        }
        hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
        hint.addr = p
        hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
    }
}

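The rough flow:

1. Build the object size class lookup tables;

2. Compute the sizes of the relevant regions and try to reserve address space starting from a specified location;

3. Store the region information in the heap, including start positions and sizes;

4. Initialize the heap's other fields.

Here are the details of the address reservation: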

// mem_linux.go
func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
    // _PROT_NONE: the pages cannot be accessed.
    p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
    if err != 0 {
        return nil
    }
    return p
}

func sysMap(v unsafe.Pointer, n uintptr, sysStat *uint64) {
    mSysStatInc(sysStat, n)

    // _MAP_FIXED: the mapping must use the specified start address.
    p, err := mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
    if err == _ENOMEM {
        throw("runtime: out of memory")
    }
    if p != v || err != 0 {
        throw("runtime: cannot map pages in arena address space")
    }
}

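The mmap() function asks the operating system kernel to create a new virtual memory region; the caller can specify its start address and length.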
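
The reserve-then-commit idea can be tried from user code. The sketch below assumes a Linux or similar Unix system and uses the standard syscall package; reserve, commit, and release are hypothetical helper names. Note one deliberate difference from sysMap: the sketch commits pages with mprotect rather than remapping them with _MAP_FIXED, because syscall.Mmap does not take a target address.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// reserve asks the kernel for n bytes of address space that cannot be
// accessed yet (PROT_NONE), similar in spirit to sysReserve.
func reserve(n int) ([]byte, error) {
	return syscall.Mmap(-1, 0, n, syscall.PROT_NONE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// commit makes a previously reserved range readable and writable.
func commit(region []byte) error {
	return syscall.Mprotect(region, syscall.PROT_READ|syscall.PROT_WRITE)
}

// release unmaps the whole region.
func release(region []byte) error { return syscall.Munmap(region) }

func main() {
	region, err := reserve(1 << 20) // 1 MB of addresses, no usable memory yet
	if err != nil {
		fmt.Println("reserve failed:", err)
		return
	}
	defer release(region)

	// Commit only the first page; the rest stays inaccessible.
	first := region[:os.Getpagesize()]
	if err := commit(first); err != nil {
		fmt.Println("commit failed:", err)
		return
	}
	first[0] = 42
	fmt.Println("wrote to committed page:", first[0])
}
```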

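3. Memory allocation

The compiler is responsible for keeping objects in registers and on the stack wherever it can, which improves performance and eases pressure on the garbage collector.

Take the new function as an example of memory allocation: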

// test.go
package main

func test() *int {
    x := new(int)
    *x = 0xAABB
    return x
}

func main() {
    println(*test())
}

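With inlining enabled, which is the default:

Inlining is a classic optimization that avoids the costs of stack and preemption checks.

Without inlining, new calls newobject to allocate memory on the heap. The object has to be passed between two stack frames, so it is allocated on the heap rather than returned as data in a stack frame that is no longer valid. Once test is inlined, x effectively becomes a local variable in main's stack frame, and no heap operation is needed.

Go supports escape analysis. At compile time it builds a call graph to determine whether a local variable is referenced from outside, and from that decides whether the variable can be allocated directly on the stack.

The compiler flag -gcflags "-m" prints optimization decisions, including inlining and escape analysis. When benchmarking, running go test with the -benchmem flag reports heap allocation counts.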
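
Heap allocations can also be counted directly. The following sketch uses testing.AllocsPerRun from the standard library; escapes, staysLocal, countAllocs, and sink are hypothetical names introduced for the illustration. Inlining of escapes is disabled on purpose so that its pointer has to cross a stack frame boundary.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable that keeps the escaping pointer reachable.
var sink *int

// escapes returns a pointer to its local, so the value must live on the heap.
//
//go:noinline
func escapes() *int {
	x := new(int)
	*x = 0xAABB
	return x
}

// staysLocal never lets the pointer leave the function, so the compiler is
// free to keep the value on the stack.
func staysLocal() int {
	x := new(int)
	*x = 0xAABB
	return *x
}

// countAllocs reports the average number of heap allocations per call of f.
func countAllocs(f func()) float64 { return testing.AllocsPerRun(1000, f) }

func main() {
	fmt.Println("escapes:   ", countAllocs(func() { sink = escapes() }))
	fmt.Println("staysLocal:", countAllocs(func() { _ = staysLocal() }))
}
```

On a typical toolchain the first count is at least one allocation per call and the second is zero, but the exact figures depend on the compiler version and flags.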

3.1 How newobject allocates memory

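// mcache.go

// Per-thread (in Go, per-P) cache for small objects. No locking needed
// because it is per-thread (per-P). mcaches are allocated from non-GC'd
// memory, so any heap pointers must be specially handled.
//go:notinheap
type mcache struct {
    ...
    // Allocator cache for tiny objects w/o pointers. See "Tiny allocator" comment in malloc.go.
    // tiny points to the beginning of the current tiny block, or nil if
    // there is no current tiny block.
    //
    // tiny is a heap pointer. Since mcache is in non-GC'd memory, we handle
    // it by clearing it in releaseAll during mark termination.
    tiny             uintptr
    tinyoffset       uintptr
    local_tinyallocs uintptr // number of tiny allocs not counted in other stats

    // The rest is not accessed on every malloc.
    alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass
}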

Implementation of the built-in new function:

// malloc.go
// implementation of new builtin
// compiler (both frontend and SSA backend) knows the signature
// of this function
func newobject(typ *_type) unsafe.Pointer {
    return mallocgc(typ.size, typ, true)
}

// Allocate an object of size bytes.
// Small objects are allocated from the per-P cache's free lists.
// Large objects (> 32 kB) are allocated straight from the heap.
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    if gcphase == _GCmarktermination { // garbage-collection related
        throw("mallocgc called with gcphase == _GCmarktermination")
    }

    if size == 0 {
        return unsafe.Pointer(&zerobase)
    }
    if debug.sbrk != 0 {
        align := uintptr(16)
        if typ != nil {
            align = uintptr(typ.align)
        }
        // persistentalloc is a wrapper around sysAlloc that can allocate
        // small chunks. There is no associated free operation. It is
        // intended for function/type/debug-related persistent data. If
        // align is 0, the default align (currently 8) is used. The
        // returned memory is zeroed.
        return persistentalloc(size, align, &memstats.other_sys)
    }

    // assistG is the G to charge for this allocation, or nil if GC is
    // not currently active.
    var assistG *g

    ...

    // Set mp.mallocing to keep from being preempted by GC.
    mp := acquirem()
    if mp.mallocing != 0 {
        throw("malloc deadlock")
    }
    if mp.gsignal == getg() {
        throw("malloc during signal")
    }
    mp.mallocing = 1

    shouldhelpgc := false
    dataSize := size

    // the cache bound to the current thread
    c := gomcache()
    var x unsafe.Pointer
    // noscan is true when typ is nil or the type contains no pointers
    noscan := typ == nil || typ.kind&kindNoPointers != 0
    // small objects (tiny objects included)
    if size <= maxSmallSize {
        // tiny objects: pointer-free and under 16 bytes, so no scanning is needed
        if noscan && size < maxTinySize {
            // Tiny allocator.
            // The tiny allocator combines several tiny allocation requests
            // into a single memory block. The block is freed when all
            // subobjects are unreachable. Subobjects must be noscan (have
            // no pointers), which bounds the amount of memory that can be
            // wasted.
            // The size of the block used for combining (maxTinySize) is
            // tunable. It is currently 16 bytes.
            // The main targets of the tiny allocator are small strings and
            // standalone escaping variables. On a json benchmark the
            // allocator reduces the number of allocations by about 12% and
            // the heap size by about 20%.
            off := c.tinyoffset
            // Align tiny pointer for required (conservative) alignment.
            if size&7 == 0 {
                off = round(off, 8)
            } else if size&3 == 0 {
                off = round(off, 4)
            } else if size&1 == 0 {
                off = round(off, 2)
            }
            // If the tiny block bound to this mcache has enough room left,
            // allocate from it directly and return.
            if off+size <= maxTinySize && c.tiny != 0 {
                // Return the pointer and advance the offset for the next allocation.
                x = unsafe.Pointer(c.tiny + off)
                c.tinyoffset = off + size
                c.local_tinyallocs++
                mp.mallocing = 0
                releasem(mp)
                return x
            }
            // The current tiny block is too full: allocate a new
            // maxTinySize block, that is, take one 16-byte object from the
            // span.freelist of sizeclass 2.
            span := c.alloc[tinySpanClass]
            // Try to get memory from allocCache; 0 means none was found.
            v := nextFreeFast(span)
            if v == 0 {
                // Nothing in allocCache: nextFree tries to get a new span of
                // this class from mcentral, swaps it in for the exhausted
                // span, and allocates from it. nextFree is analyzed later.
                v, _, shouldhelpgc = c.nextFree(tinySpanClass)
            }
            x = unsafe.Pointer(v)
            (*[2]uint64)(x)[0] = 0
            (*[2]uint64)(x)[1] = 0
            // See if we need to replace the existing tiny block with the
            // new one based on the amount of remaining free space. The new
            // block's tinyoffset would equal size, so comparing offsets is enough.
            if size < c.tinyoffset || c.tiny == 0 {
                // replace with the new block
                c.tiny = uintptr(x)
                c.tinyoffset = size
            }
            // a whole new tiny block has been consumed
            size = maxTinySize
        } else {
            // Allocation of ordinary small objects starts here.

            // First look up the tables to determine the sizeclass.
            var sizeclass uint8
            if size <= smallSizeMax-8 {
                sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
            } else {
                sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
            }
            size = uintptr(class_to_size[sizeclass])
            spc := makeSpanClass(sizeclass, noscan)
            // Find the span of this class and take an object from it.
            span := c.alloc[spc]
            // As with tiny allocation, try allocCache first; 0 means none was found.
            v := nextFreeFast(span)

            // No object available: get a new span from central.
            if v == 0 {
                v, span, shouldhelpgc = c.nextFree(spc)
            }
            x = unsafe.Pointer(v)
            if needzero && span.needzero != 0 {
                memclrNoHeapPointers(unsafe.Pointer(v), size)
            }
        }
    } else {
        // Large object allocation starts here.

        // Unlike tiny and small objects, large objects are allocated
        // directly from mheap.
        var s *mspan
        shouldhelpgc = true
        systemstack(func() {
            s = largeAlloc(size, needzero, noscan)
        })
        s.freeindex = 1
        s.allocCount = 1
        // span.start is actually derived from address >> pageshift.
        x = unsafe.Pointer(s.base())
        size = s.elemsize
    }

    // bitmap marking ...
    // check the trigger condition and start garbage collection ...

    return x
}
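
The tiny-allocator branch of mallocgc above can be isolated into a small model. This is a sketch, not the runtime's code: tinyBlock and alignUp are hypothetical names, and the model only tracks offsets inside one 16-byte block, leaving out the replacement of a full block.

```go
package main

import "fmt"

const maxTinySize = 16

// tinyBlock models the 16-byte block that mcache.tiny points at.
type tinyBlock struct {
	offset uintptr // next free byte, like mcache.tinyoffset
}

// alignUp rounds off up to a multiple of align, which must be a power of two.
func alignUp(off, align uintptr) uintptr { return (off + align - 1) &^ (align - 1) }

// alloc returns the offset at which a size-byte object is placed, or
// ok=false when the block cannot hold it and a fresh block would be needed.
func (b *tinyBlock) alloc(size uintptr) (off uintptr, ok bool) {
	off = b.offset
	// Same conservative alignment rule as the listing: derive it from size.
	switch {
	case size&7 == 0:
		off = alignUp(off, 8)
	case size&3 == 0:
		off = alignUp(off, 4)
	case size&1 == 0:
		off = alignUp(off, 2)
	}
	if off+size > maxTinySize {
		return 0, false
	}
	b.offset = off + size
	return off, true
}

func main() {
	var b tinyBlock
	for _, size := range []uintptr{1, 4, 8, 2} {
		off, ok := b.alloc(size)
		fmt.Printf("size=%d offset=%d fits=%v\n", size, off, ok)
	}
}
```

The padding inserted by the alignment step is why a block can fill up before 16 bytes of payload have been stored.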

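The basic idea of the code:

1. Determine whether the object is a large, small, or tiny object.

2. If it is a tiny object:

Find the mspan of the matching size class directly in mcache.alloc;

If the current mspan has enough room, allocate and update the mspan's fields (done in nextFreeFast);

If the current mspan does not have enough room, get a new mspan of the same size class from mcentral, swap it in for the old one, then allocate and update the mspan's fields;

A tiny object must not contain pointers: several tiny objects are packed into one object, and that arrangement clearly cannot cope with garbage-collector scanning. In addition, the allocator takes one 16-byte object from span.freelist and then uses an offset to record where the next allocation goes.

3. If it is a small object, the logic is much the same as for tiny objects:

First look up the tables to determine the object's size class, then find the mspan of that size class;

If the current mspan has enough room, allocate and update the mspan's fields (done in nextFreeFast);

If the current mspan does not have enough room, get a new mspan of the same size class from mcentral, swap it in for the old one, then allocate and update the mspan's fields;

4. If it is a large object, allocate it directly from mheap. This is done by largeAlloc, which is also in malloc.go and is shown next: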

func largeAlloc(size uintptr, needzero bool, noscan bool) *mspan {
    // print("largeAlloc size=", size, "\n")

    // overflow check
    if size+_PageSize < size {
        throw("out of memory")
    }

    // compute the number of pages the object needs
    npages := size >> _PageShift
    if size&_PageMask != 0 {
        npages++
    }

    // Deduct credit for this span allocation and sweep if
    // necessary. mHeap_Alloc will also sweep npages, so this only
    // pays the debt down to npage pages.
    deductSweepCredit(npages*_PageSize, npages)

    // the actual allocation
    s := mheap_.alloc(npages, makeSpanClass(0, noscan), true, needzero)
    if s == nil {
        throw("out of memory")
    }
    s.limit = s.base() + size
    // record the allocated span in the bitmap
    heapBitsForAddr(s.base()).initSpan(s)
    return s
}
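
The page-count step in largeAlloc is a round-up division by the page size. A minimal sketch, with pagesFor as a hypothetical helper name and the 8 KB page size taken from _PageShift = 13:

```go
package main

import "fmt"

const (
	pageShift = 13
	pageSize  = 1 << pageShift
	pageMask  = pageSize - 1
)

// pagesFor returns how many whole pages are needed to hold size bytes.
func pagesFor(size uintptr) uintptr {
	n := size >> pageShift
	if size&pageMask != 0 {
		n++ // a partial last page still costs a full page
	}
	return n
}

func main() {
	for _, size := range []uintptr{32<<10 + 1, 40 << 10, 100000} {
		n := pagesFor(size)
		fmt.Printf("size=%d pages=%d unused tail=%d bytes\n", size, n, n*pageSize-size)
	}
}
```

The unused tail is the gap between s.limit (base plus the requested size) and the end of the span.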

Next, the implementation of mheap_.alloc():

// mheap.go
// alloc allocates a new span of npage pages from the GC'd heap.
// Either large must be true or spanclass must indicates the span's size class and scannability.
// If needzero is true, the memory for the returned span will be zeroed.
func (h *mheap) alloc(npage uintptr, spanclass spanClass, large bool, needzero bool) *mspan {
    // Don't do any operations that lock the heap on the G stack.
    // It might trigger stack growth, and the stack growth code needs
    // to be able to allocate heap.
    var s *mspan
    systemstack(func() {
        s = h.alloc_m(npage, spanclass, large)
    })

    if s != nil {
        if needzero && s.needzero != 0 {
            memclrNoHeapPointers(unsafe.Pointer(s.base()), s.npages<<_PageShift)
        }
        s.needzero = 0
    }
    return s
}

mheap.alloc_m() allocates a new span of the given page count from the heap and records the object's sizeclass in HeapMap and HeapMapCache.

// mheap.go
func (h *mheap) alloc_m(npage uintptr, spanclass spanClass, large bool) *mspan {
    _g_ := getg()
    if _g_ != _g_.m.g0 {
        throw("_mheap_alloc not on g0 stack")
    }
    lock(&h.lock)

    // sweeping and block state marking omitted ...

    // get a span with the requested number of pages from the heap
    s := h.allocSpanLocked(npage, &memstats.heap_inuse)
    if s != nil {
        // Record span info, because gc needs to be
        // able to map interior pointer to containing span.
        atomic.Store(&s.sweepgen, h.sweepgen)
        h.sweepSpans[h.sweepgen/2%2].push(s) // Add to swept in-use list.
        s.state = _MSpanInUse
        s.allocCount = 0
        s.spanclass = spanclass
        // reset the span's state
        if sizeclass := spanclass.sizeclass(); sizeclass == 0 {
            s.elemsize = s.npages << _PageShift
            s.divShift = 0
            s.divMul = 0
            s.divShift2 = 0
            s.baseMask = 0
        } else {
            s.elemsize = uintptr(class_to_size[sizeclass])
            m := &class_to_divmagic[sizeclass]
            s.divShift = m.shift
            s.divMul = m.mul
            s.divShift2 = m.shift2
            s.baseMask = m.baseMask
        }

        // update stats, sweep lists
        h.pagesInUse += uint64(npage)
        if large {
            // update the large-object fields of mheap
            memstats.heap_objects++
            mheap_.largealloc += uint64(s.elemsize)
            mheap_.nlargealloc++
            atomic.Xadd64(&memstats.heap_live, int64(npage<<_PageShift))
            // Swept spans are at the end of lists.
            // Choose the busy or busylarge list by page count and append to its end.
            if s.npages < uintptr(len(h.busy)) {
                h.busy[s.npages].insertBack(s)
            } else {
                h.busylarge.insertBack(s)
            }
        }
    }
    // gc trace marking omitted ...
    unlock(&h.lock)
    return s
}

mheap.allocSpanLocked() allocates a span of the given size and removes the allocated span from the free list.

// mheap.go
func (h *mheap) allocSpanLocked(npage uintptr, stat *uint64) *mspan {
    var list *mSpanList
    var s *mspan

    // Try in fixed-size lists up to max.
    // First try a span with exactly the requested page count; if there is
    // none, try spans with more pages.
    for i := int(npage); i < len(h.free); i++ {
        list = &h.free[i]
        if !list.isEmpty() {
            s = list.first
            list.remove(s)
            goto HaveSpan
        }
    }
    // Best fit in list of large spans.
    // Find a suitable span node in freelarge; allocLarge is analyzed below.
    s = h.allocLarge(npage) // allocLarge removed s from h.freelarge for us
    if s == nil {
        // No suitable span in freelarge, so memory has to come from the
        // system; grow is analyzed below.
        if !h.grow(npage) {
            return nil
        }
        // After growing, look in freelarge again.
        s = h.allocLarge(npage)
        if s == nil {
            return nil
        }
    }

HaveSpan:
    // A span with a suitable page count was found.
    // Mark span in use. Omitted ...

    if s.npages > npage {
        // Trim extra and put it back in the heap.
        // Create a span of s.npages - npage pages and put it back in the heap.
        t := (*mspan)(h.spanalloc.alloc())
        t.init(s.base()+npage<<_PageShift, s.npages-npage)
        // update the fields of the span s that was obtained
        s.npages = npage
        h.setSpan(t.base()-1, s)
        h.setSpan(t.base(), t)
        h.setSpan(t.base()+t.npages*pageSize-1, t)
        t.needzero = s.needzero
        s.state = _MSpanManual // prevent coalescing with s
        t.state = _MSpanManual
        h.freeSpanLocked(t, false, false, s.unusedsince)
        s.state = _MSpanFree
    }
    s.unusedsince = 0
    // record s in the spans and arenas arrays
    h.setSpans(s.base(), npage, s)

    *stat += uint64(npage << _PageShift)
    memstats.heap_idle -= uint64(npage << _PageShift)

    //println("spanalloc", hex(s.start<<_PageShift))
    if s.inList() {
        throw("still in list")
    }
    return s
}

mheap.allocLarge() looks in mheap's freelarge tree for a span with the requested number of pages and removes that span from the tree; it returns nil if none is found.

// mheap.go
func (h *mheap) allocLarge(npage uintptr) *mspan {
    // Search treap for smallest span with >= npage pages.
    return h.freelarge.remove(npage)
}

// h.freelarge.remove above calls this function.
// It is a typical binary-tree search.
func (root *mTreap) remove(npages uintptr) *mspan {
    t := root.treap
    for t != nil {
        if t.spanKey == nil {
            throw("treap node with nil spanKey found")
        }
        if t.npagesKey < npages {
            t = t.right
        } else if t.left != nil && t.left.npagesKey >= npages {
            t = t.left
        } else {
            result := t.spanKey
            root.removeNode(t)
            return result
        }
    }
    return nil
}
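
The goal stated in the comment, finding the smallest span with at least npage pages, is a lower-bound search in an ordered tree. The sketch below shows that search on a plain, unbalanced binary search tree; it is not the runtime's treap, the node, insert, and bestFit names are hypothetical, and the node is not removed from the tree.

```go
package main

import "fmt"

// node is one idle span keyed by its page count.
type node struct {
	npages      uintptr
	left, right *node
}

// insert adds a page count to the binary search tree and returns the root.
func insert(t *node, npages uintptr) *node {
	if t == nil {
		return &node{npages: npages}
	}
	if npages < t.npages {
		t.left = insert(t.left, npages)
	} else {
		t.right = insert(t.right, npages)
	}
	return t
}

// bestFit returns the node with the smallest npages >= want, or nil.
func bestFit(t *node, want uintptr) *node {
	var best *node
	for t != nil {
		if t.npages < want {
			t = t.right // too small: everything on the left is smaller still
		} else {
			best = t   // fits; remember it and look for a tighter fit
			t = t.left
		}
	}
	return best
}

func main() {
	var root *node
	for _, n := range []uintptr{200, 130, 500, 150, 300} {
		root = insert(root, n)
	}
	if n := bestFit(root, 140); n != nil {
		fmt.Println("best fit for 140 pages:", n.npages)
	}
	fmt.Println("fit for 501 pages exists:", bestFit(root, 501) != nil)
}
```

Choosing the tightest fit keeps larger idle spans intact for later large requests.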

mheap.grow(): in mheap.allocSpanLocked, if no suitable span node is found in freelarge, memory has to be obtained from the system. The implementation of that function follows.

func (h *mheap) grow(npage uintptr) bool {
    ask := npage << _PageShift
    // request memory from the system; sysAlloc is traced below
    v, size := h.sysAlloc(ask)
    if v == nil {
        print("runtime: out of memory: cannot allocate ", ask, "-byte block (", memstats.heap_sys, " in use)\n")
        return false
    }

    // Create a fake "in use" span and free it, so that the
    // right coalescing happens.
    // Create a span to manage the memory that was just obtained.
    s := (*mspan)(h.spanalloc.alloc())
    s.init(uintptr(v), size/pageSize)
    h.setSpans(s.base(), s.npages, s)
    atomic.Store(&s.sweepgen, h.sweepgen)
    s.state = _MSpanInUse
    h.pagesInUse += uint64(s.npages)
    // Free the new span so that it joins the heap's idle spans.
    h.freeSpanLocked(s, false, true, 0)
    return true
}

mheap.sysAlloc():

func (h *mheap) sysAlloc(n uintptr) (v unsafe.Pointer, size uintptr) {
    n = round(n, heapArenaBytes)

    // First, try the arena pre-reservation.
    // Take memory of the requested size from arena; nil means none was available.
    v = h.arena.alloc(n, heapArenaBytes, &memstats.heap_sys)
    if v != nil {
        // Got the memory from arena; jump to the mapped step.
        size = n
        goto mapped
    }

    // Try to grow the heap at a hint address.
    for h.arenaHints != nil {
        hint := h.arenaHints
        p := hint.addr
        if hint.down {
            p -= n
        }
        if p+n < p {
            // We can't use this, so don't ask.
            // (The address range would overflow.)
            v = nil
        } else if arenaIndex(p+n-1) >= 1<<arenaBits {
            // Outside addressable heap. Can't use.
            v = nil
        } else {
            // This hint is usable: reserve address space from the system via mmap.
            v = sysReserve(unsafe.Pointer(p), n)
        }
        if p == uintptr(v) {
            // Success. Update the hint.
            if !hint.down {
                p += n
            }
            hint.addr = p
            size = n
            break
        }
        // Failed. Discard this hint and try the next.
        //
        // TODO: This would be cleaner if sysReserve could be
        // told to only return the requested address. In
        // particular, this is already how Windows behaves, so
        // it would simplify things there.
        if v != nil {
            sysFree(v, n, nil)
        }
        h.arenaHints = hint.next
        h.arenaHintAlloc.free(unsafe.Pointer(hint))
    }

    if size == 0 {
        if raceenabled {
            // The race detector assumes the heap lives in
            // [0x00c000000000, 0x00e000000000), but we
            // just ran out of hints in this region. Give
            // a nice failure.
            throw("too many address space collisions for -race mode")
        }

        // All of the hints failed, so we'll take any
        // (sufficiently aligned) address the kernel will give
        // us.
        v, size = sysReserveAligned(nil, n, heapArenaBytes)
        if v == nil {
            return nil, 0
        }

        // Create new hints for extending this region.
        hint := (*arenaHint)(h.arenaHintAlloc.alloc())
        hint.addr, hint.down = uintptr(v), true
        hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
        hint = (*arenaHint)(h.arenaHintAlloc.alloc())
        hint.addr = uintptr(v) + size
        hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
    }

    // Check for bad pointers or pointers we can't use.
    {
        var bad string
        p := uintptr(v)
        if p+size < p {
            bad = "region exceeds uintptr range"
        } else if arenaIndex(p) >= 1<<arenaBits {
            bad = "base outside usable address space"
        } else if arenaIndex(p+size-1) >= 1<<arenaBits {
            bad = "end outside usable address space"
        }
        if bad != "" {
            // This should be impossible on most architectures,
            // but it would be really confusing to debug.
            print("runtime: memory allocated by OS [", hex(p), ", ", hex(p+size), ") not in usable address space: ", bad, "\n")
            throw("memory reservation exceeds address space limit")
        }
    }

    if uintptr(v)&(heapArenaBytes-1) != 0 {
        throw("misrounded allocation in sysAlloc")
    }

    // Back the reservation.
    sysMap(v, size, &memstats.heap_sys)

mapped:
    // Create arena metadata.
    // Compute the L1 and L2 indexes into arenas from the address v.
    for ri := arenaIndex(uintptr(v)); ri <= arenaIndex(uintptr(v)+size-1); ri++ {
        l2 := h.arenas[ri.l1()]
        if l2 == nil {
            // Allocate an L2 arena map.
            l2 = (*[1 << arenaL2Bits]*heapArena)(persistentalloc(unsafe.Sizeof(*l2), sys.PtrSize, nil))
            if l2 == nil {
                throw("out of memory allocating heap arena map")
            }
            atomic.StorepNoWB(unsafe.Pointer(&h.arenas[ri.l1()]), unsafe.Pointer(l2))
        }

        // A non-nil arenas[ri.l1()][ri.l2()] means it was already initialized.
        if l2[ri.l2()] != nil {
            throw("arena already initialized")
        }
        var r *heapArena
        // Allocate the heapArena metadata from heapArenaAlloc.
        r = (*heapArena)(h.heapArenaAlloc.alloc(unsafe.Sizeof(*r), sys.PtrSize, &memstats.gc_sys))
        if r == nil {
            r = (*heapArena)(persistentalloc(unsafe.Sizeof(*r), sys.PtrSize, &memstats.gc_sys))
            if r == nil {
                throw("out of memory allocating heap arena metadata")
            }
        }

        // Store atomically just in case an object from the
        // new heap arena becomes visible before the heap lock
        // is released (which shouldn't happen, but there's
        // little downside to this).
        atomic.StorepNoWB(unsafe.Pointer(&l2[ri.l2()]), unsafe.Pointer(r))
    }
    // ...
    return
}

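That completes the allocation flow for large objects.

3.2 Allocation of small and tiny objects

nextFreeFast() returns an available address in the span, or 0 if none is found.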

func nextFreeFast(s *mspan) gclinkptr {
    // count the zero bits in s.allocCache starting from the low end
    theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
    if theBit < 64 {

        result := s.freeindex + uintptr(theBit)
        if result < s.nelems {
            freeidx := result + 1
            if freeidx%64 == 0 && freeidx != s.nelems {
                return 0
            }
            // update the bitmap and the index of the next available slot
            s.allocCache >>= uint(theBit + 1)
            s.freeindex = freeidx
            s.allocCount++
            // return the address of the memory that was found
            return gclinkptr(result*s.elemsize + s.base())
        }
    }
    return 0
}
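
The bit trick in nextFreeFast can be reproduced with math/bits. This is a sketch under simplifying assumptions: slotCache is a hypothetical type, a set bit means "free" as in the listing, and the nelems bound and the refill of the 64-bit window are left out.

```go
package main

import (
	"fmt"
	"math/bits"
)

// slotCache mimics mspan.allocCache: bit i set means slot freeIndex+i is free.
type slotCache struct {
	cache     uint64
	freeIndex uint
}

// next returns the index of the next free slot, or ok=false when the cached
// 64-slot window has no free slot left.
func (s *slotCache) next() (idx uint, ok bool) {
	bit := uint(bits.TrailingZeros64(s.cache)) // 64 when cache == 0
	if bit == 64 {
		return 0, false
	}
	idx = s.freeIndex + bit
	s.cache >>= bit + 1 // drop the used slot and everything before it
	s.freeIndex = idx + 1
	return idx, true
}

func main() {
	s := slotCache{cache: 0x14} // binary 10100: slots 2 and 4 are free
	for {
		idx, ok := s.next()
		if !ok {
			break
		}
		fmt.Println("allocated slot", idx)
	}
}
```

One count-trailing-zeros instruction replaces a loop over slots, which is why this path is the fast one.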

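mcache.nextFree(): this function is entered when nextFreeFast cannot find suitable memory. If nextFree finds an unused object in the cached span, it returns it. Otherwise it calls refill to get a span of the matching size class from central, and then returns an unused object from the new span.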

// mcache.go
func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
    // first find the span of this class in mcache
    s = c.alloc[spc]
    shouldhelpgc = false
    // find a suitable index in the current span
    freeIndex := s.nextFreeIndex()
    if freeIndex == s.nelems {
        // The span is full.
        if uintptr(s.allocCount) != s.nelems {
            println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
            throw("s.allocCount != s.nelems && freeIndex == s.nelems")
        }
        // Call refill to get a usable span from mcentral and replace the
        // span currently held in mcache.
        systemstack(func() {
            c.refill(spc)
        })
        shouldhelpgc = true
        s = c.alloc[spc]

        // look for a suitable index again in the new span
        freeIndex = s.nextFreeIndex()
    }

    if freeIndex >= s.nelems {
        throw("freeIndex is not valid")
    }

    // compute the memory address and update the span's fields
    v = gclinkptr(freeIndex*s.elemsize + s.base())
    s.allocCount++
    if uintptr(s.allocCount) > s.nelems {
        println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
        throw("s.allocCount > s.nelems")
    }
    return
}

mcache.refill()

refill gets a span for the given size class and installs it as mcache's new span for that class.

// mcache.go
func (c *mcache) refill(spc spanClass) {
    _g_ := getg()

    _g_.m.locks++
    // Return the current cached span to the central lists.
    s := c.alloc[spc]

    if uintptr(s.allocCount) != s.nelems {
        throw("refill of span with free space remaining")
    }

    // check whether s is the placeholder empty span
    if s != &emptymspan {
        s.incache = false
    }
    // Get a new cached span from the central lists to replace the old one.
    s = mheap_.central[spc].mcentral.cacheSpan()
    if s == nil {
        throw("out of memory")
    }

    if uintptr(s.allocCount) == s.nelems {
        throw("span has no free space")
    }
    // update mcache's span
    c.alloc[spc] = s
    _g_.m.locks--
}

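If mcentral has no matching span either, the heap is grown, which follows the same path as the mheap.alloc analysis above.

4. Summary

1. Determine the object's size:

2. If it is a tiny object:

Find the mspan of the matching size class in mcache.alloc;
If the current mspan has enough room, allocate and update the mspan's fields (done in nextFreeFast);
If the current mspan does not have enough room, get a new mspan of the same size class from mcentral, swap it in for the old one, then allocate and update the mspan's fields;
If mcentral does not have enough spans of that size class, request one from mheap;
If no span of that size class is left, find a span of a nearby size class, cut it, and allocate from it;
If no span of a nearby size class can be found, request memory from the system and add it to mheap;

3. If it is a small object, the logic is much the same as for tiny objects:

Look up the tables to determine the object's size class, then find the mspan of that size class;
If the mspan has enough room, allocate and update the mspan's fields (done in nextFreeFast);
If the current mspan does not have enough room, get a new mspan of the same size class from mcentral, swap it in for the old one, then allocate and update the mspan's fields;
If mcentral does not have enough spans of that size class, request one from mheap;
If no span of that size class is left, find a span of a nearby size class, cut it, and allocate from it;
If no span of a nearby size class can be found, request memory from the system and add it to mheap;

4. If it is a large object, allocate it directly from mheap.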