Five things that make Go fast (an amateur translation)

Original article: https://dave.cheney.net/2014/06/07/five-things-that-make-go-fast

The translation is placed below each short passage.

 

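Anthony Starks has remixed my original Google Present based slides using his fantastic Deck presentation tool. You can check out his remix over on his blog, mindchunk.blogspot.com.au/2014/06/remixing-with-deck.

Anthony Starks: a person's name
remix: to mix again, to rework
original: the first version; initial
Present: to show; the present (now)
based: built on
slides: to slide, to fall; presentation slides
fantastic: strange; excellent
Deck: (n.) a ship's deck; (vt.) to decorate; also a proper name
presentation: a display; a description, statement, or introduction
check out: to inspect; to pay and leave
Anthony Starks used a presentation tool he built himself to rework and enhance the slides from my original article. You can find the tool and the content on his blog.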

I was recently invited to give a talk at Gocon, a fantastic Go conference held semi-annually in Tokyo, Japan. Gocon 2014 was an entirely community-run one day event combining training and an afternoon of presentations surrounding the theme of Go in production.

recently: not long ago
invited to: asked to do something
conference: a formal meeting
semi-annually: twice a year
entirely: completely
community-run: operated by the community
combining: joining together
presentations: talks, reports, performances
surround: to be all around

Recently I was invited to speak at Gocon, a Go conference held every six months in Tokyo, Japan. Gocon 2014 was a one-day event run entirely by the community, combining training with an afternoon of talks on the theme "Go in production".

The following is the text of my presentation. The original text was structured to force me to speak slowly and clearly, so I have taken the liberty of editing it slightly to be more readable.

I want to thank Bill Kennedy, Minux Ma, and especially Josh Bleecher Snyder, for their assistance in preparing this talk.

structure: to organise, to arrange
force: to compel
liberty: freedom
slightly: a little

Below is the text of my talk. The original was structured so that I would speak slowly and clearly, so I have edited it slightly to make it more readable.


Good afternoon.

My name is David.

I am delighted to be here at Gocon today. I have wanted to come to this conference for two years and I am very grateful to the organisers for extending me the opportunity to present to you today.

delighted: very pleased
organisers: the people who arrange an event
opportunity: a chance

I am very happy to be at Gocon today. I have been hoping to attend this conference for two years, and I am grateful to the organisers for the chance to share with you.

Gocon 2014
I want to begin my talk with a question.

Why are people choosing to use Go ?

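When people talk about their decision to learn Go, or use it in their product, they have a variety of answers, but there are always three that are at the top of their list:

variety: a range of different kinds

To begin, I have a question: why do people choose Go?

When they explain why they learned Go or use it in production, they give many answers, but three come up most often.

Gocon 2014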

  • concurrency
  • ease of deployment
  • performance

These are the top three.

The first, Concurrency.

Go’s concurrency primitives are attractive to programmers who come from single threaded scripting languages like Nodejs, Ruby, or Python, or from languages like C++ or Java with their heavyweight threading model.

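primitive: original, early, simple; here, a basic building block
attractive: appealing
single threaded: running on one thread
scripting: written as scripts
heavyweight: heavy, costly

First, concurrency.

Go's simple concurrency primitives appeal to programmers coming from single-threaded scripting languages such as Nodejs, Ruby, or Python, and to those coming from the heavyweight threading models of C++ or Java.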

Ease of deployment.

We have heard today from experienced Gophers who appreciate the simplicity of deploying Go applications.

appreciate: to be grateful for; to value

Second, ease of deployment.

Today we have heard from experienced Go users (Gophers) who value how simple it is to deploy Go applications.

Gocon 2014

This leaves Performance.

I believe an important reason why people choose to use Go is because it is fast.

That leaves performance.

I believe an important reason people choose Go is that it is fast.

Gocon 2014 (4)

For my talk today I want to discuss five features that contribute to Go’s performance.

I will also share with you the details of how Go implements these features.

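feature: a characteristic
contribute: to help bring about

In today's talk I want to discuss five features that contribute to Go's performance.
I will also share the details of how Go implements these features.

1. Values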

Gocon 2014 (5)

The first feature I want to talk about is Go’s efficient treatment and storage of values.

efficient: working with little waste
treatment: handling
storage: keeping something for later use

The first feature is how efficiently Go handles and stores values.

Gocon 2014 (6)

This is an example of a value in Go. When compiled, gocon consumes exactly four bytes of memory.

Let’s compare Go with some other languages

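compiled: translated into machine code
consumes: uses up
compare: to set side by side

This is an example of defining a value in Go. Once compiled, gocon occupies exactly 4 bytes of memory.

Let's compare with some other languages.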
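The four-byte claim can be checked directly. This is a minimal sketch, assuming the slide declares gocon as an int32 (the type is not stated in the text, and SizeOfGocon is a name made up for this example):

```go
package main

import (
	"fmt"
	"unsafe"
)

// gocon stands in for the value on the slide. The int32 type is an
// assumption, chosen because it matches the four bytes the talk mentions.
var gocon int32 = 2014

// SizeOfGocon reports how many bytes the variable occupies in memory.
func SizeOfGocon() uintptr {
	return unsafe.Sizeof(gocon)
}

func main() {
	fmt.Println("bytes used by gocon:", SizeOfGocon())
}
```

unsafe.Sizeof is evaluated at compile time, so the check itself adds no run-time cost.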

Gocon 2014 (7)

Due to the overhead of the way Python represents variables, storing the same value using Python consumes six times more memory.

This extra memory is used by Python to track type information, do reference counting, etc

Let’s look at another example:

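overhead: (literally) above the head; here, extra cost
represents: expresses, stands for
track: a trace; to follow or keep a record of
reference: something that refers to a value
etc: and so on

Because of the way Python represents variables, storing the same value takes six times more memory.

The extra space is used to track type information, keep reference counts, and so on.

Let's look at the next example: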

Gocon 2014 (8)

Similar to Go, the Java int type consumes 4 bytes of memory to store this value.

However, to use this value in a collection like a List or Map, the compiler must convert it into an Integer object.

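collection: a gathering; here it means a collection, a kind of data type

Like Go, Java's int type uses 4 bytes to store this value.

However, to use the value in a collection such as a List or Map, the compiler must convert it into an Integer object.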

Gocon 2014 (9)

So an integer in Java frequently looks more like this and consumes between 16 and 24 bytes of memory.

Why is this important ? Memory is cheap and plentiful, why should this overhead matter ?

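frequently: often
plentiful: abundant

So an integer in Java often looks like this, and takes between 16 and 24 bytes.

Why does this matter? Memory is cheap and plentiful, so why should this overhead be important?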

Gocon 2014 (10)

This is a graph showing CPU clock speed vs memory bus speed.

Notice how the gap between CPU clock speed and memory bus speed continues to widen.

The difference between the two is effectively how much time the CPU spends waiting for memory.

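lags: falls behind
graph: a chart
gap: an opening, a difference
widen: to become wider

This graph shows CPU clock speed against memory bus speed.
Notice how the gap between CPU clock speed and memory bus speed keeps widening.
The difference between the two is effectively how much time the CPU spends waiting for memory.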

Gocon 2014 (11)

Since the late 1960’s CPU designers have understood this problem.

Their solution is a cache, an area of smaller, faster memory which is inserted between the CPU and main memory.

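Since the late 1960s, CPU designers have understood this problem.
Their solution was to add a cache: a smaller, faster area of memory placed between the CPU and main memory.

Translator's note: much of the rapid progress in CPU performance has depended on caches.
The more useful data a cache holds, the better it performs, and better cache performance means better program performance.
In dynamically typed languages, the elements of an array can each have a different type, so they are typically stored separately on the heap rather than in one contiguous array, which leaves the CPU cache with little to work with.
This is because the cache loads a contiguous region of memory at a time.
For data scattered across different addresses the cache cannot help, and the CPU has to go to main memory.
Reading a value from the cache takes about 3 CPU clock cycles, while reading one from main memory takes about 100, so the drop in performance is to be expected.

So when defining variables, use the smallest type that fits, so that values stay in the CPU cache rather than in slower main memory.

Reference article on caches: https://blog.51cto.com/wingeek/274006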

Gocon 2014 (12)

This is a Location type which holds the location of some object in three dimensional space. It is written in Go, so each Location consumes exactly 24 bytes of storage.

We can use this type to construct an array type of 1,000 Locations, which consumes exactly 24,000 bytes of memory.

Inside the array, the Location structures are stored sequentially, rather than as pointers to 1,000 Location structures stored randomly.

This is important because now all 1,000 Location structures are in the cache in sequence, packed tightly together.

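Location: a position
dimensional: relating to dimensions
construct: to build
structures: organised groups of data
sequentially: one after another
sequence: an order
packed: filled, pressed together
tightly: closely

This is a Location type, which holds the position (X, Y, Z) of an object in three-dimensional space. It is written in Go, so each Location uses exactly 24 bytes of storage.
We can use this type to build an array type of 1,000 Locations, which uses exactly 24,000 bytes of memory.
Inside the array the Location structures are stored one after another, not as pointers to 1,000 Location structures stored at random addresses.
This matters because all 1,000 Location structures now sit in the cache in sequence, packed tightly together.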
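The sizes quoted above can be reproduced with a short program. This is a sketch under one assumption: the three coordinates are float64 fields, which is what gives 24 bytes per Location. The helper function names are invented for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Location holds a position in three-dimensional space. The float64
// field type is an assumption: three 8-byte fields give 24 bytes.
type Location struct {
	X, Y, Z float64
}

// Locations is an array type: 1,000 Location values laid out
// back to back, with no pointers in between.
type Locations [1000]Location

// LocationSize returns the size in bytes of a single Location.
func LocationSize() uintptr { return unsafe.Sizeof(Location{}) }

// LocationsSize returns the size in bytes of the whole array.
func LocationsSize() uintptr { return unsafe.Sizeof(Locations{}) }

func main() {
	fmt.Println("one Location:", LocationSize(), "bytes")
	fmt.Println("1,000 Locations:", LocationsSize(), "bytes")
}
```

A slice of pointers, []*Location, would instead hold 1,000 pointers to structures that may be scattered across the heap, which is the layout the talk warns against.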

Gocon 2014 (13)

Go lets you create compact data structures, avoiding unnecessary indirection.

Compact data structures utilise the cache better.

Better cache utilisation leads to better performance.

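compact: (n.) an agreement; (adj.) tightly packed
avoiding: keeping away from
indirection: reaching a value through a pointer rather than directly
utilise: to make use of

Go lets you create compact data structures and avoid unnecessary indirection.
Compact data structures make better use of the cache.
Better cache utilisation leads to better performance.

2. Inlining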

Gocon 2014 (14)

Function calls are not free.

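Function calls come at a cost.

Translator's note: a function call has overhead.
This overhead falls roughly into two parts: passing arguments and saving the current program context.
For argument passing, the more arguments there are, the greater the cost.
For saving the context, the more complex the function, the greater the cost.
If a function is so simple that the work it does costs far less than the call itself, the call is a poor bargain.
The purpose of inlining is for the compiler to embed the code of such very simple callees directly into the calling function.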

Gocon 2014 (15)

Three things happen when a function is called.

A new stack frame is created, and the details of the caller recorded.

Any registers which may be overwritten during the function call are saved to the stack.

The processor computes the address of the function and executes a branch to that new address.

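procedure: a series of steps
stack frame: the block of stack memory used by one function call
registers: here, small storage locations inside the CPU
processor: here, the CPU
execute: to carry out

Three things happen when a function is called.

1. A new stack frame is created, and the details of the caller are recorded.
2. Any registers that may be overwritten during the call are saved to the stack.
3. The processor computes the address of the function and branches to that new address.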

Gocon 2014 (16)

Because function calls are very common operations, CPU designers have worked hard to optimise this procedure, but they cannot eliminate the overhead.

Depending on what the function does, this overhead may be trivial or significant.

A solution to reducing function call overhead is an optimisation technique called Inlining.

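unavoidable: impossible to avoid
unavoidable overhead: a cost that cannot be avoided
common: widespread, usual, shared
operations: actions
optimise: to make as good as possible
eliminate: to remove completely
depend on: to rely on
trivial: unimportant
significant /sɪg'nɪfɪk(ə)nt/: important
reduce: to make smaller
optimisation: the act of optimising
optimist: a person who expects the best (an unrelated word)
mathematical optimization: finding the best solution to a problem
technique: a method, a skill
Inlining: replacing a call with the body of the called function

Because function calls are such common operations, CPU designers have worked hard to optimise them, but they cannot eliminate the overhead.

Whether the overhead is trivial or significant depends on what the function does.

Inlining is an optimisation technique for reducing function call overhead.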

Gocon 2014 (17)

The Go compiler inlines a function by treating the body of the function as if it were part of the caller.

Inlining has a cost; it increases binary size.

It only makes sense to inline when the overhead of calling a function is large relative to the work the function does, so only simple functions are candidates for inlining.

Complicated functions are usually not dominated by the overhead of calling them and are therefore not inlined.

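treating: handling
increases: makes larger
binary: here, the compiled executable file
sense: judgment; "makes sense" means is reasonable
relative: compared with
candidate: something eligible to be chosen
complicated: complex in structure
dominated: here, mostly determined by
therefore: for that reason

The Go compiler inlines a function by treating its body as if it were part of the caller.

Inlining has a cost: it increases the size of the binary.

Inlining only makes sense when the call overhead is large relative to the work the function does, so only simple functions are candidates.

For complicated functions the call overhead is usually negligible compared with the work they do, so they are not inlined.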

Gocon 2014 (18)

This example shows the function Double calling util.Max.

To reduce the overhead of the call to util.Max, the compiler can inline util.Max into Double, resulting in something like this

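This example shows the function Double calling util.Max.

To reduce the overhead of calling util.Max, the compiler can inline it into Double, with a result like this: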
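Since the slide itself is not reproduced here, the following is a reconstruction rather than the author's exact code. To keep it in one file, Max is a local function standing in for util.Max, and DoubleInlined is a hand-written approximation of what the compiler effectively produces after inlining:

```go
package main

import "fmt"

// Max stands in for util.Max from the talk; it is defined locally
// here so that the example fits in a single file.
func Max(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// Double calls Max, paying the function call overhead each time.
func Double(a, b int) int {
	return 2 * Max(a, b)
}

// DoubleInlined approximates Double after inlining: the body of Max
// has been pasted into the caller, so no call remains.
func DoubleInlined(a, b int) int {
	temp := b
	if a > b {
		temp = a
	}
	return 2 * temp
}

func main() {
	fmt.Println(Double(3, 7), DoubleInlined(3, 7))
}
```

Both functions return the same result for every input. Only the call has been removed.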

Gocon 2014 (19)

After inlining there is no longer a call to util.Max, but the behaviour of Double is unchanged.

Inlining isn’t exclusive to Go. Almost every compiled or JITed language performs this optimisation. But how does inlining in Go work?

The Go implementation is very simple. When a package is compiled, any small function that is suitable for inlining is marked and then compiled as usual.

Then both the source of the function and the compiled version are stored.

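exclusive: limited to one; high-end
JITed: compiled just in time, that is, while the program runs
perform: to carry out, to complete
implementation: the way something is carried out
suitable: appropriate

After inlining there is no longer a call to util.Max, but the behaviour of Double is unchanged.
Inlining is not exclusive to Go. Almost every compiled or JITed language performs this optimisation. But how does inlining work in Go?
The Go implementation is very simple. When a package is compiled, any small function suitable for inlining is marked and then compiled as usual.
Both the source of the function and the compiled version are then stored.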

Gocon 2014 (20)

This slide shows the contents of util.a. The source has been transformed a little to make it easier for the compiler to process quickly.

When the compiler compiles Double it sees that util.Max is inlinable, and the source of util.Max is available.

Rather than insert a call to the compiled version of util.Max, it can substitute the source of the original function.

Having the source of the function enables other optimizations.

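This slide shows the contents of util.a. The source has been transformed slightly so that the compiler can process it quickly.

When the compiler compiles Double, it sees that util.Max is inlinable and that the source of util.Max is available.

Rather than inserting a call to the compiled version of util.Max, it can substitute the source of the original function. Having the source also enables other optimisations.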

Gocon 2014 (21)

In this example, although the function Test always returns false, Expensive cannot know that without executing it.

When Test is inlined, we get something like this

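dead code elimination: removing code that can never run
executing: carrying out

In this example, although Test always returns false, Expensive cannot know that without executing it. When Test is inlined, we get something like this: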

Gocon 2014 (22)

The compiler now knows that the expensive code is unreachable.

Not only does this save the cost of calling Test, it saves compiling or running any of the expensive code that is now unreachable.

The Go compiler can automatically inline functions across files and even across packages. This includes code that calls inlinable functions from the standard library.

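automatically: without manual action
standard library: the library shipped with the language

The compiler now knows that the expensive code is unreachable. This saves not only the cost of calling Test but also compiling or running any of the expensive code that is now unreachable.

The Go compiler can automatically inline functions across files and even across packages, including inlinable functions from the standard library.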
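The Test and Expensive example can be sketched as below. The slide is not reproduced in the text, so the body of Expensive is a placeholder: a counter named expensiveRuns, introduced only for this illustration, stands in for the costly work.

```go
package main

import "fmt"

// expensiveRuns counts how often the "expensive" branch executes.
// It is a placeholder for real costly work, added for this sketch.
var expensiveRuns int

// Test always returns false, but a caller can only know that by
// calling it, unless the compiler inlines it.
func Test() bool {
	return false
}

// Expensive guards its costly branch with Test. Once Test is inlined,
// the condition becomes the constant false and the compiler can treat
// the branch as unreachable.
func Expensive() {
	if Test() {
		expensiveRuns++ // stands in for the expensive code
	}
}

func main() {
	Expensive()
	fmt.Println("expensive branch ran", expensiveRuns, "times")
}
```

Whether the branch is actually removed depends on the compiler version and its inlining decisions. The observable behaviour is the same either way.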

3. Escape analysis

Gocon 2014 (23)

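What escaping means: http://www.javashuo.com/article/p-yvjakqkn-bt.html

An introduction to escape analysis: https://www.iteye.com/topic/473355

Translator's note: put simply, an object created inside a method escapes when it is referenced from outside and that reference still exists after the method has finished, so the object cannot be reclaimed by the GC. The result is more memory in use, more work for the GC, and worse performance.

The best approach, for the sake of the GC and of performance: whenever an object can be created and kept inside a method, do not create it outside.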

Mandatory garbage collection makes Go a simpler and safer language.

This does not imply that garbage collection makes Go slow, or that garbage collection is the ultimate arbiter of the speed of your program.

What it does mean is memory allocated on the heap comes at a cost. It is a debt that costs CPU time every time the GC runs until that memory is freed.

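mandatory: required, compulsory
garbage: rubbish
collection: gathering
imply: to suggest
ultimate: final
arbiter: a judge, the one who decides
allocate: to assign
heap: a region of memory for dynamic allocation
debt: something owed

Mandatory garbage collection (GC) makes Go a simpler and safer language. This does not imply that garbage collection makes Go slow, or that garbage collection is what ultimately decides the speed of your program.

What it does mean is that memory allocated on the heap has a cost: it costs CPU time every time the GC runs, until that memory is freed.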

Gocon 2014 (24)

There is however another place to allocate memory, and that is the stack.

Unlike C, which forces you to choose if a value will be stored on the heap, via malloc, or on the stack, by declaring it inside the scope of the function, Go implements an optimisation called escape analysis.

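allocate: to assign
force: (n.) strength, power; (vt.) to compel, to impose
via: by way of
malloc: the dynamic memory allocation function in C; the name is short for memory allocation
declare: to state, to announce
scope: range, extent
implement: to put into effect

There is, however, another place to allocate memory: the stack.
Unlike C, which forces you to choose whether a value is stored on the heap (via malloc) or on the stack (by declaring it inside the scope of a function), Go implements an optimisation called escape analysis.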

Gocon 2014 (25)

Escape analysis determines whether any references to a value escape the function in which the value is declared.

If no references escape, the value may be safely stored on the stack.

Values stored on the stack do not need to be allocated or freed.

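Let's look at some examples.

determines: decides, establishes
reference: a mention; here, something that refers to a value
declare: to state, to announce

Escape analysis determines whether any reference to a value escapes the function in which the value is declared.
If no reference escapes, the value can safely be stored on the stack.
Values stored on the stack do not need to be allocated or freed.
Let's look at some examples.

Stack and heap?

Translator's note: the stack is scoped to a function's locals. It is reclaimed automatically when the function returns, is managed by the CPU, and is efficient.
The heap has to be managed by the program and is less efficient.
An article covers this in detail: Memory stack vs heap
So even with a GC, values that do not need to be passed out should be kept inside the function where possible.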

Gocon 2014 (26)

Sum adds the numbers between 1 and 100 and returns the result. This is a rather unusual way to do this, but it illustrates how Escape Analysis works.

Because the numbers slice is only referenced inside Sum, the compiler will arrange to store the 100 integers for that slice on the stack, rather than the heap.

There is no need to garbage collect numbers, it is automatically freed when Sum returns.

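illustrate: to explain by example
arrange: to put in order, to organise

Sum adds up the numbers from 1 to 100 and returns the result. It is an unusual way to do it, but it shows how escape analysis works.

Because the numbers slice is referenced only inside Sum, the compiler arranges to store its 100 integers on the stack rather than the heap. numbers needs no garbage collection; it is freed automatically when Sum returns.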
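A version of Sum consistent with this description might look like the following. It is a reconstruction, since the slide is not reproduced here; only the names Sum and numbers come from the text.

```go
package main

import "fmt"

// Sum adds the numbers from 1 to 100 in a deliberately roundabout
// way: it first fills a slice, then adds up its elements.
func Sum() int {
	const count = 100
	// numbers is only ever referenced inside Sum, so escape analysis
	// can keep its backing storage on the stack.
	numbers := make([]int, count)
	for i := range numbers {
		numbers[i] = i + 1
	}

	var sum int
	for _, n := range numbers {
		sum += n
	}
	return sum
}

func main() {
	fmt.Println(Sum())
}
```

Only the int result leaves the function. No reference to numbers does, which is the property escape analysis looks for.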

Gocon 2014 (27)

This second example is also a little contrived. In CenterCursor we create a new Cursor and store a pointer to it in c.

Then we pass c to the Center() function which moves the Cursor to the center of the screen.

Then finally we print the X and Y locations of that Cursor.

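Even though c was allocated with the new function, it will not be stored on the heap, because no reference to c escapes the CenterCursor function.

contrived: unnatural, forced
The second example is also a little contrived. In CenterCursor we create a new Cursor and store a pointer to it in c. We then pass c to Center(), which moves the Cursor to the centre of the screen.
Finally we print the X and Y position of that Cursor.
Although c was allocated with the new function, it is not stored on the heap, because no reference to c escapes CenterCursor.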
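The CenterCursor example can be sketched as follows. The screen dimensions (640 by 480) and the field names are assumptions made for this illustration, and the function is adapted to return the coordinates so the caller can print them, whereas the talk describes printing inside CenterCursor.

```go
package main

import "fmt"

// Width and Height are assumed screen dimensions for this sketch.
const Width, Height = 640, 480

// Cursor records a position on the screen.
type Cursor struct {
	X, Y int
}

// Center moves the cursor to the middle of the screen. It uses the
// pointer it is given but does not keep a copy of it.
func Center(c *Cursor) {
	c.X = Width / 2
	c.Y = Height / 2
}

// CenterCursor allocates a Cursor with new, centres it, and returns
// only the coordinates. No reference to c leaves this function, so
// the Cursor does not need to live on the heap.
func CenterCursor() (int, int) {
	c := new(Cursor)
	Center(c)
	return c.X, c.Y
}

func main() {
	x, y := CenterCursor()
	fmt.Println(x, y)
}
```

The point of the example is that new does not by itself mean a heap allocation in Go. The compiler decides based on whether references escape.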
Escape analysis in action:

Gocon 2014 (28)

Go’s optimisations are always enabled by default. You can see the compiler’s escape analysis and inlining decisions with the -gcflags=-m switch.

Because escape analysis is performed at compile time, not run time, stack allocation will always be faster than heap allocation, no matter how efficient your garbage collector is.

I will talk more about the stack in the remaining sections of this talk.

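perform: to carry out
remaining: left over
section: a part, a chapter

Go's optimisations are enabled by default. You can see the compiler's escape analysis and inlining decisions with the -gcflags=-m switch.
Because escape analysis is performed at compile time, not at run time, stack allocation will always be faster than heap allocation, however efficient the garbage collector is.
I will say more about the stack in the remaining sections of this talk.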
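To try the switch yourself, a small program with one value that stays local and one that escapes is enough. The type and function names below are invented for this sketch. Building it with `go build -gcflags=-m` should make the compiler print its decisions, though the exact wording of that output varies between Go versions and is not reproduced here.

```go
package main

import "fmt"

// point is a small value type used only for this illustration.
type point struct {
	x, y int
}

// onStack creates a point that never leaves the function, so escape
// analysis can keep it on the stack.
func onStack() int {
	p := point{1, 2}
	return p.x + p.y
}

// escapes returns a pointer to its local variable. The reference
// outlives the call, so p has to be placed on the heap.
func escapes() *point {
	p := point{1, 2}
	return &p
}

func main() {
	fmt.Println(onStack(), escapes().x)
}
```

Comparing the diagnostics for the two functions is a quick way to see which allocations the garbage collector will have to track.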

4. Goroutines

Gocon 2014 (30)

What is a goroutine? https://www.jianshu.com/p/7ebf732b6e1f

https://baijiahao.baidu.com/s?id=1620972759226100794&wfr=spider&for=pc

Go has goroutines. These are the foundations for concurrency in Go.

I want to step back for a moment and explore the history that leads us to goroutines.

In the beginning computers ran one process at a time. Then in the 60’s the idea of multiprocessing, or time sharing became popular.

In a time-sharing system the operating systems must constantly switch the attention of the CPU between these processes by recording the state of the current process, then restoring the state of another.

This is called process switching.

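foundations: the base on which something is built
concurrency: doing several things in overlapping time
explore: to investigate
constantly: without stopping
switch: to change over

Go has goroutines, the foundation of concurrency in Go.

I want to step back and explore the history that leads to goroutines. In the beginning, computers ran one process at a time. Then, in the 1960s, multiprocessing, or time sharing, became popular. In a time-sharing system the operating system must constantly switch the CPU between processes, recording the state of the current process and then restoring the state of another.

This is called process switching.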

Gocon 2014 (29)

The cost of a process switch

There are three main costs of a process switch.

First, the kernel needs to store the contents of all the CPU registers for that process, then restore the values for another process.

The kernel also needs to flush the CPU’s mappings from virtual memory to physical memory as these are only valid for the current process.

Finally there is the cost of the operating system context switch, and the overhead of the scheduler function to choose the next process to occupy the CPU.

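kernel: the core of the operating system
virtual: not physical
valid: in effect, usable
scheduler: the code that decides what runs next
occupy: to take up

A process switch has three main costs.
First, the kernel needs to store the contents of all the CPU registers for that process, then restore the values for another process.
The kernel also needs to flush the CPU's virtual-to-physical memory mappings, as these are valid only for the current process.
Finally there is the cost of the operating system context switch, and the overhead of the scheduler choosing the next process to occupy the CPU.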

Gocon 2014 (31)

Processor registers

There are a surprising number of registers in a modern processor. I have difficulty fitting them on one slide, which should give you a clue how much time it takes to save and restore them.

Because a process switch can occur at any point in a process’ execution, the operating system needs to store the contents of all of these registers because it does not know which are currently in use.

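clue: a hint
occur: to happen
execution: the running of a program

There is a surprising number of registers in a modern processor. I had trouble fitting them all on one slide, which should give you a sense of how long it takes to save and restore them.

Because a process switch can occur at any point in a process's execution, the operating system has to store the contents of all of these registers, as it does not know which are currently in use.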

Gocon 2014 (32)

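Threads

This led to the development of threads, which are conceptually the same as processes, but share the same memory space.

As threads share address space, they are lighter than processes so are faster to create and faster to switch between.

This led to the development of threads, which are conceptually the same as processes but share one memory space.

Because threads share an address space, they are lighter than processes and so are faster to create and to switch between.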

Gocon 2014 (33)

Goroutines take the idea of threads a step further.

Goroutines are cooperatively scheduled, rather than relying on the kernel to manage their time sharing.

The switch between goroutines only happens at well defined points, when an explicit call is made to the Go runtime scheduler.

The compiler knows the registers which are in use and saves them automatically.

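cooperatively: by working together
relying: depending
defined: (adj.) clear; (v.) given a precise meaning
explicit: stated clearly

Goroutines take the idea of threads a step further.

Goroutines are cooperatively scheduled, rather than relying on the kernel to manage their time sharing.

Switches between goroutines happen only at well-defined points, when an explicit call is made to the Go runtime scheduler.

The compiler knows which registers are in use and saves them automatically.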

Gocon 2014 (34)

While goroutines are cooperatively scheduled, this scheduling is handled for you by the runtime.

Places where Goroutines may yield to others are:

  • Channel send and receive operations, if those operations would block.
  • The Go statement, although there is no guarantee that new goroutine will be scheduled immediately.
  • Blocking syscalls like file and network operations.
  • After being stopped for a garbage collection cycle.
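
block: a chunk; to stop from proceeding; in bulk
statement: a declaration; here, a language construct
guarantee: a promise

Although goroutines are cooperatively scheduled, the runtime handles this scheduling for you.

Places where goroutines may yield to others are:

  • Channel send and receive operations, if those operations would block.
  • The go statement, although there is no guarantee that the new goroutine will be scheduled immediately.
  • Blocking syscalls such as file and network operations.
  • After being stopped for a garbage collection cycle.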
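Two of the scheduling points listed above, the go statement and blocking channel operations, can be seen in a few lines. This is an illustrative sketch. The function name Squares is invented for the example and is not from the talk.

```go
package main

import "fmt"

// Squares starts a producer goroutine with the go statement and
// collects its results over an unbuffered channel.
func Squares(n int) []int {
	// Unbuffered: a send blocks until a receiver is ready, and a
	// receive blocks until a value is sent. Each of these is a point
	// where the running goroutine may yield to another.
	c := make(chan int)

	go func() {
		for i := 1; i <= n; i++ {
			c <- i * i // may block until the receiver is ready
		}
		close(c)
	}()

	var out []int
	for v := range c { // blocks until a value arrives or c is closed
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(Squares(5))
}
```

Neither goroutine contains an explicit call to the scheduler. The channel operations hand control over as needed.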

Gocon 2014 (35)

This is an example to illustrate some of the scheduling points described in the previous slide.

The thread, depicted by the arrow, starts on the left in the ReadFile function. It encounters os.Open, which blocks the thread while waiting for the file operation to complete, so the scheduler switches the thread to the goroutine on the right hand side.

Execution continues until the read from the c chan blocks, and by this time the os.Open call has completed so the scheduler switches the thread back to the left hand side and continues to the file.Read function, which again blocks on file IO.

The scheduler switches the thread back to the right hand side for another channel operation, which has unblocked during the time the left hand side was running, but it blocks again on the channel send.

Finally the thread switches back to the left hand side as the Read operation has completed and data is available.

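illustrate: to add pictures to; to explain, to make clear
previous: earlier
depicted: shown
arrow: a pointer mark
encounters: meets
execution: carrying out; also, putting to death

This example demonstrates some of the scheduling points described on the previous slide.
The thread, shown by the arrow, starts on the left in the ReadFile function. It reaches os.Open, which blocks the thread while waiting for the file operation to complete, so the scheduler switches the thread to the goroutine on the right.
Execution continues until the read from the c chan blocks. By then the os.Open call has completed, so the scheduler switches the thread back to the left and continues to file.Read, which again blocks on file IO.
The scheduler switches the thread back to the right for another channel operation, which was unblocked while the left side was running, but it blocks again on the channel send.
Finally, once the Read operation has completed and data is available, the thread switches back to the left.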

Gocon 2014 (36)

This slide shows the low level runtime.Syscall function which is the base for all functions in the os package.

Any time your code results in a call to the operating system, it will go through this function.

The call to entersyscall informs the runtime that this thread is about to block.

This allows the runtime to spin up a new thread which will service other goroutines while this current thread is blocked.

This results in relatively few operating system threads per Go process, with the Go runtime taking care of assigning a runnable Goroutine to a free operating system thread.

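informs: tells
spin: to turn quickly; "spin up" means to start
relatively: comparatively
assigning: allocating

This slide shows the low-level runtime.Syscall function, which is the base for all functions in the os package.
Any time your code results in a call to the operating system, it goes through this function.
The call to entersyscall tells the runtime that this thread is about to block.
This lets the runtime start a new thread to service other goroutines while the current thread is blocked.
As a result, each Go process uses relatively few operating system threads, with the Go runtime assigning runnable goroutines to free operating system threads.

5. Segmented and copying stacks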

Gocon 2014 (37)

In the previous section I discussed how goroutines reduce the overhead of managing many, sometimes hundreds of thousands of concurrent threads of execution.

There is another side to the goroutine story, and that is stack management, which leads me to my final topic.

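segment: a section, a piece
segmented: divided into pieces
copying: duplicating
segmented and copying stacks: stacks that grow by adding segments or by being copied

In the previous section I discussed how goroutines reduce the overhead of managing many, sometimes hundreds of thousands of, concurrent threads of execution.

There is another side to the goroutine story, stack management, which leads me to my final topic.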

Gocon 2014 (39)

This is a diagram of the memory layout of a process. The key thing we are interested in is the location of the heap and the stack.

Traditionally inside the address space of a process, the heap is at the bottom of memory, just above the program (text) and grows upwards.

The stack is located at the top of the virtual address space, and grows downwards.

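diagram: a chart
layout: an arrangement
Traditionally: by long-standing practice
located: placed
virtual: not physical

This is a diagram of the memory layout of a process. The key thing we care about is where the heap and the stack are.
Traditionally, inside the address space of a process, the heap is at the bottom of memory, just above the program (text), and grows upwards.
The stack is at the top of the virtual address space and grows downwards.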

Gocon 2014 (40)

Because the heap and stack overwriting each other would be catastrophic, the operating system usually arranges to place an area of unwritable memory between the stack and the heap to ensure that if they did collide, the program will abort.

This is called a guard page, and effectively limits the stack size of a process, usually in the order of several megabytes.

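guard: to protect, to watch over
guard page: a protected page of memory
catastrophic /ˌkætəˈstrɔfɪk/: disastrous
arrange: to put in order, to organise
collide: to crash into each other
abort: to end early, to terminate

Because the heap and stack overwriting each other would be catastrophic, the operating system usually places an area of unwritable memory between them, so that if they did collide the program would abort.
This is called a guard page, and it effectively limits the stack size of a process, usually to something on the order of several megabytes.

The next slide shows thread stacks and guard pages.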

Gocon 2014 (41)

We’ve discussed that threads share the same address space, so for each thread, it must have its own stack.

Because it is hard to predict the stack requirements of a particular thread, a large amount of memory is reserved for each thread’s stack along with a guard page.

The hope is that this is more than will ever be needed and the guard page will never be hit.

The downside is that as the number of threads in your program increases, the amount of available address space is reduced.

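predict: to forecast
particular: specific
reserved: set aside
increases: grows

We have discussed that threads share one address space, so each thread must have its own stack. Because it is hard to predict how much stack a particular thread will need, a large amount of memory is reserved for each thread's stack, along with a guard page. The hope is that this is more than will ever be needed and that the guard page will never be hit. The downside is that as the number of threads in your program grows, the amount of available address space shrinks.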

Gocon 2014 (42)

We’ve seen that the Go runtime schedules a large number of goroutines onto a small number of threads, but what about the stack requirements of those goroutines ?

Instead of using guard pages, the Go compiler inserts a check as part of every function call to check if there is sufficient stack for the function to run. If there is not, the runtime can allocate more stack space.

Because of this check, a goroutine's initial stack can be made much smaller, which in turn permits Go programmers to treat goroutines as cheap resources.

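sufficient: enough
initial: first, starting
permit: to allow
treat: to regard, to deal with

We have seen that the Go runtime schedules a large number of goroutines onto a small number of threads, but what about the stack requirements of those goroutines?
Instead of using guard pages, the Go compiler inserts a check into every function call to test whether there is enough stack for the function to run. If there is not, the runtime can allocate more stack space.
Because of this check, a goroutine's initial stack can be made much smaller, which in turn lets Go programmers treat goroutines as cheap resources, without worrying that heavy use will exhaust memory.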
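Small initial stacks are what make it practical to start goroutines by the thousand. The sketch below starts n goroutines and waits for them all. The function name RunMany and the figure of 10,000 are choices made for this example, not numbers from the talk.

```go
package main

import (
	"fmt"
	"sync"
)

// RunMany starts n goroutines, each of which records that it ran,
// and returns how many completed.
func RunMany(n int) int {
	var wg sync.WaitGroup
	// Each goroutine writes only to its own element, so no locking
	// is needed for the slice.
	done := make([]int, n)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			done[i] = 1
		}(i)
	}
	wg.Wait() // block until every goroutine has finished

	total := 0
	for _, v := range done {
		total += v
	}
	return total
}

func main() {
	fmt.Println(RunMany(10000), "goroutines completed")
}
```

Starting the same number of operating system threads, each with a stack of several megabytes reserved, would use far more address space.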

Gocon 2014 (43)

This is a slide that shows how stacks are managed in Go 1.2.

When G calls to H there is not enough space for H to run, so the runtime allocates a new stack frame from the heap, then runs H on that new stack segment. When H returns, the stack area is returned to the heap before returning to G.

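frame: a framework, a structure
stack frame: the block of stack memory used by one function call
This slide shows how stacks are managed in Go 1.2.
When G calls H, there is not enough space for H to run, so the runtime allocates a new stack frame from the heap and runs H on that new stack segment. When H returns, the stack area is returned to the heap before control returns to G.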

Gocon 2014 (44)

This method of managing the stack works well in general, but for certain types of code, usually recursive code, it can cause the inner loop of your program to straddle one of these stack boundaries.

For example, in the inner loop of your program, function G may call H many times in a loop.

Each time this will cause a stack split. This is known as the hot split problem.

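in general: usually
certain types: some kinds
recursive: calling itself
cause: to bring about
straddle: to sit across
boundaries: edges, limits

This way of managing the stack generally works well, but for certain kinds of code, usually recursive code, it can cause the inner loop of your program to straddle one of these stack boundaries.
For example, in the inner loop of your program, function G may call H many times in a loop, and each call causes a stack split. This is known as the hot split problem.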

Gocon 2014 (45)

To solve hot splits, Go 1.3 has adopted a new stack management method.

Instead of adding and removing additional stack segments, if the stack of a goroutine is too small, a new, larger, stack will be allocated.

The old stack’s contents are copied to the new stack, then the goroutine continues with its new larger stack.

After the first call to H the stack will be large enough that the check for available stack space will always succeed.

This resolves the hot split problem.

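To solve hot splits, Go 1.3 adopted a new stack management method.
Instead of adding and removing extra stack segments, if a goroutine's stack is too small, a new, larger stack is allocated.
The contents of the old stack are copied to the new one, and the goroutine continues on its new, larger stack.
After the first call to H the stack is large enough that the check for available stack space always succeeds.
This resolves the hot split problem: the stack no longer has to be split and reallocated over and over.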
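Stack growth is invisible to the program, but its effect can be observed: a goroutine can recurse far deeper than a small initial stack would allow. The sketch below is an illustration only. The names depth and DeepCall and the depth of 100,000 are arbitrary choices for this example.

```go
package main

import "fmt"

// depth recurses n times. Each call needs its own stack frame, so a
// large n requires far more stack than a goroutine starts with.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

// DeepCall runs depth on a fresh goroutine. The goroutine starts with
// a small stack, and the runtime grows it as the recursion deepens.
func DeepCall(n int) int {
	result := make(chan int)
	go func() {
		result <- depth(n)
	}()
	return <-result
}

func main() {
	fmt.Println(DeepCall(100000))
}
```

The program never states how much stack it needs. The check inserted at each function call lets the runtime enlarge the stack on demand.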

6. Summary

Gocon 2014 (46)

Values, Inlining, Escape Analysis, Goroutines, and segmented/copying stacks.

These are the five features that I chose to speak about today, but they are by no means the only things that make Go a fast programming language, just as there are more than three reasons that people cite as their reason to learn Go.

As powerful as these five features are individually, they do not exist in isolation.

For example, the way the runtime multiplexes goroutines onto threads would not be nearly as efficient without growable stacks.

Inlining reduces the cost of the stack size check by combining smaller functions into larger ones.

Escape analysis reduces the pressure on the garbage collector by automatically moving allocations from the heap to the stack.

Escape analysis also provides better cache locality.

Without growable stacks, escape analysis might place too much pressure on the stack.

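cite: to quote, to mention
individually: one by one
isolation: separation from others
multiplex: to carry many things over a few channels
pressure: load, strain
garbage: rubbish
allocation: assigning, setting aside

These are the five features I chose to talk about today, but they are by no means the only things that make Go a fast language, just as there are more than three reasons people give for learning Go.
As powerful as these five features are individually, they do not exist in isolation.
For example, the way the runtime multiplexes goroutines onto threads would not be nearly as efficient without growable stacks.
Inlining reduces the cost of the stack size check by combining smaller functions into larger ones.
Escape analysis reduces pressure on the garbage collector by automatically moving allocations from the heap to the stack.
Escape analysis also provides better cache locality.
Without growable stacks, escape analysis might put too much pressure on the stack.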

Gocon 2014 (47)

* Thank you to the Gocon organisers for permitting me to speak today
* twitter / web / email details
* thanks to @offbymany, @billkennedy_go, and Minux for their assistance in preparing this talk.
