MIT 6.824 Study Notes 3: Go Concurrency Explained

Date: 2019-11-20


I previously read an introduction to Go concurrency (https://www.cnblogs.com/pdev/p/10936485.html), but it was too brief. Let's look at something more in-depth.

Recall the simple crawler we wrote in http://www.javashuo.com/article/p-kacbshul-ew.html. It used two of Go's concurrency styles:

1. Goroutines and Go channels (ConcurrentChannel): a concurrency style characteristic of Go that can simplify programming

1.1 Go routines

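Goroutines can be thought of as lightweight threads. Creating one is very simple: just put the go keyword in front of a function call. To show how simple it is, we start the finder function twice with go and have each call print the item it picks from the mine (for example "ore").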

package main
import (
    "fmt"
    "time"
    "math/rand"
)

func finder(mines [5]string, coreid int) {
    <-time.After(time.Second * time.Duration(coreid))
    rand.Seed(time.Now().UnixNano())
    idx := rand.Intn(5)
    fmt.Println(time.Now(), coreid, mines[idx])
}

func main() {
    theMine := [5]string{"rock", "ore", "gold", "copper", "silver"}
    go finder(theMine, 1)
    go finder(theMine, 2)
    <-time.After(time.Second * 3) //you can ignore this for now
    fmt.Println(time.Now(), "END")
}

The program's output is as follows:

F:\My Drive\19summer\6824>go run gor.go
2019-08-01 17:45:41.0985917 -0500 CDT m=+1.001057201 1 ore
2019-08-01 17:45:42.0986489 -0500 CDT m=+2.001114401 2 ore
2019-08-01 17:45:43.0987061 -0500 CDT m=+3.001171601 END

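The timestamps show that the two finders run concurrently: finder 2 prints at the 2-second mark, not at the 3-second mark it would reach if it had to wait for finder 1 to finish first.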

But these two goroutines are independent of each other. What if they need to exchange information? That is what Go channels are for.

1.2 Go Channel

Channels let goroutines communicate with each other. You can think of a channel as a pipe: goroutines can send messages into it and can receive messages that other goroutines sent.

myFirstChannel := make(chan string)

Goroutines can send messages to a channel and receive messages from it. Both are done with the arrow operator (<-), which points in the direction the data flows.

myFirstChannel <- "hello" // Send

myVariable := <-myFirstChannel // Receive

Let's look at another program:

package main
import (
    "fmt"
    "time"
)

func main() {
    theMine := [5]string{"ore1", "ore2", "ore3", "ore4", "ore5"}
    oreChan := make(chan string)

    // Finder
    go func(mine [5]string) {
        for _, item := range mine {
            oreChan <- item //send
            fmt.Println("Miner: Send " + item + " to breaker")
        }
    }(theMine)

    // Ore Breaker
    go func() {
        for i := 0; i < 5; i++ {
            foundOre := <-oreChan //receive
            <-time.After(time.Nanosecond * 10)
            fmt.Println("Miner: Receive " + foundOre + " from finder")
        }
    }()

    <-time.After(time.Second * 5) // Again, ignore this for now
}

The program's output is as follows:

F:\My Drive\19summer\6824>go run gor2.go
Miner: Send ore1 to breaker
Miner: Receive ore1 from finder
Miner: Send ore2 to breaker
Miner: Receive ore2 from finder
Miner: Send ore3 to breaker
Miner: Receive ore3 from finder
Miner: Send ore4 to breaker
Miner: Receive ore4 from finder
Miner: Send ore5 to breaker
Miner: Receive ore5 from finder

As you can see, goroutines can now communicate with each other through a Go channel!

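The <-time.After(time.Nanosecond * 10) between the receive and fmt.Println is only there to make the command-line output easier to follow. Without it, the sender's and the receiver's Println calls race with each other, so the order printed on the command line may differ from the order in which the channel operations actually happened.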

1.3 Blocking Go Channels

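By default, both sending to and receiving from a channel block (such a channel is called unbuffered). In other words, an unbuffered channel suspends the current goroutine on a send or a receive unless the other side is already ready. Channels block goroutines in a number of situations. This gives goroutines a brief moment of synchronization before each goes on happily running on its own.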

Blocking on a Send: once a goroutine (gopher) sends a value on a channel, it blocks until another goroutine receives that value from the channel.

Blocking on a Receive: similarly to a send, a goroutine blocks while waiting to receive from a channel that is empty.

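Blocking can be a little confusing at first, but you can picture it as a transaction between two goroutines (gophers). Whether a gopher is waiting to receive money or to hand it over, it has to wait for the other party to show up.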

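Now that we have seen the different ways goroutines can block when communicating over a channel, let's discuss the two kinds of channels: unbuffered and buffered. Which kind you choose can change how the program behaves.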

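Unbuffered channels: all the earlier examples used unbuffered channels. What sets them apart is that only one value can pass through at a time, handed directly from the sender to the receiver. An unbuffered channel never stores anything, so len(channel) always reports 0 (and so does cap(channel)).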

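Buffered channels: in concurrent programs, timing is not always perfectly coordinated. In the mining example, we might find that in the time the breaker gopher needs to process one piece of ore, the finder gopher has already found 3 more. So that the finder gopher does not waste a lot of time waiting to hand ore to the breaker gopher, we can use a buffered channel. Let's start by creating a buffered channel with a capacity of 3.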

bufferedChan := make(chan string, 3)

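Buffered and unbuffered channels work similarly, with one difference: we can send 3 values to this buffered channel before any other goroutine has to take one out. No send blocks until the buffer is full, and only the 4th send blocks. In other words, a buffered channel blocks senders once it reaches capacity.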

An unbuffered channel can be understood as make(chan string, 0).

For example, consider the following program:

package main
import (
    "fmt"
    "time"
)

func main() {
    bufferedChan := make(chan string, 3)

    go func() {
        bufferedChan <-"first"
        fmt.Println("Sent 1st")
        bufferedChan <-"second"
        fmt.Println("Sent 2nd")
        bufferedChan <-"third"
        fmt.Println("Sent 3rd")
    }()

    <-time.After(time.Second * 1)

    go func() {
        firstRead := <- bufferedChan
        fmt.Println("Receiving..")
        fmt.Println(firstRead)
        secondRead := <- bufferedChan
        fmt.Println(secondRead)
        thirdRead := <- bufferedChan
        fmt.Println(thirdRead)
    }()

    <-time.After(time.Second * 5) // Again, ignore this for now
}

The output is as follows:

F:\My Drive\19summer\6824>go run gor2.go
Sent 1st
Sent 2nd
Sent 3rd
Receiving..
first
second
third

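Compared with the first example, this is a big improvement! Each function now runs independently in its own goroutine, and every time a piece of ore is processed it is carried on to the next stage of the mining pipeline. Note that all three sends finish before the receiver even starts, because the buffer has room for three values.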

In fact, a buffered channel is first-in, first-out, so we can treat it as a thread-safe queue:

package main

import "fmt"

func main() {
    ch := make(chan int, 3)
    ch <- 1
    ch <- 2
    ch <- 3

    fmt.Println(<-ch) // 1
    fmt.Println(<-ch) // 2
    fmt.Println(<-ch) // 3
}
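To make the queue view concrete, here is a small sketch that inspects a channel with the built-in len and cap functions. The helper name describe is an assumption made for this illustration only.

```go
package main

import "fmt"

// describe (hypothetical helper) returns how many values are currently
// queued in ch and how many it can hold.
func describe(ch chan int) (queued, capacity int) {
	return len(ch), cap(ch)
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 3)
	buffered <- 1
	buffered <- 2

	q, c := describe(unbuffered)
	fmt.Println(q, c) // 0 0

	q, c = describe(buffered)
	fmt.Println(q, c) // 2 3

	<-buffered // take one value out of the queue
	q, c = describe(buffered)
	fmt.Println(q, c) // 1 3
}
```

In a concurrent program the value of len can be stale by the time it is used, so it is better suited to diagnostics than to deciding whether a send will block.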

1.4 Some Other Concepts

Anonymous goroutines

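We can create an anonymous function and run it in its own goroutine as shown below. If a function only needs to be called once, this lets it run in its own goroutine without a formal function declaration.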

go func() {
    fmt.Println("I'm running in my own go routine")
}()

This looks very much like an ordinary anonymous function definition.

The main function is a goroutine

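The main function really does run in its own goroutine! More importantly, once main returns, every other goroutine that is still running is shut down. That is why we put a timer at the end of main: it creates a channel and sends a value on it after 5 seconds. With the line below, the main goroutine blocks and gives the other goroutines 5 seconds to run. Without it, the main goroutine would finish too early and the finder would never get a chance to execute.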

<-time.After(time.Second * 5) // Receiving from channel after 5 sec

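Waiting for a fixed time is not a good approach, though. It would be better to have something like Python's thread.join() that blocks the main thread until all child threads have finished.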

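There is a way to block main until all other goroutines have finished. The usual approach is to create a done channel that main blocks on while waiting to read. Once the work is finished, a value is sent on this channel and the program ends.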

func main() {
    doneChan := make(chan string)

    go func() {
        // Do some work…
        doneChan <- "I'm all done!"
    }()

    <-doneChan // block until go routine signals work is done
}
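When there are several goroutines to wait for, the standard library's sync.WaitGroup plays a role similar to joining every thread. The sketch below is one possible way to use it; the helper name runWorkers and the squaring "work" are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers (hypothetical helper) starts n goroutines and returns only
// after all of them have finished. Each worker writes to its own slot of
// the result slice, so no extra locking is needed.
func runWorkers(n int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1) // register one more goroutine to wait for
		go func(id int) {
			defer wg.Done() // mark this worker as finished
			results[id] = id * id
		}(i)
	}

	wg.Wait() // blocks until every worker has called Done
	return results
}

func main() {
	fmt.Println(runWorkers(5)) // [0 1 4 9 16]
}
```

Compared with a done channel, a WaitGroup does not need to know in advance who signals completion, only how many goroutines were registered with Add.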

Channels can be ranged over

In the earlier example, the miner read from the channel in a for loop that ran a fixed 5 times. What if we don't know exactly how many pieces of ore the finder will send?

Just as you can iterate over a collection type (such as a slice), you can also range over a channel. Updating the earlier miner function, we can write:

// Ore Breaker
go func() {
    for foundOre := range oreChan {
        fmt.Println("Miner: Received " + foundOre + " from finder")
    }
}()

Since the miner needs to read everything the finder sends, ranging over the channel ensures that we receive all the data that was sent.

Note that ranging over a channel blocks until new data is sent on the channel. The following program deadlocks:

package main

import "fmt"

func main() {
    ch := make(chan int, 3)
    ch <- 1
    ch <- 2
    ch <- 3

    for v := range ch {
        fmt.Println(v)
    }
}

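The reason is that range does not stop reading until the channel is closed. Once the buffered channel runs dry, range blocks the current goroutine, and because no other goroutine exists to send more data, the runtime reports a deadlock. To keep the reading goroutine from blocking after all the data has been sent, close the channel with close(channel), as in the following program: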

ch := make(chan int, 3)
ch <- 1
ch <- 2
ch <- 3

// Explicitly close the channel
close(ch)

for v := range ch {
    fmt.Println(v)
}

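A closed channel no longer accepts data; it is effectively read-only. We can still receive the values that remain in a closed channel, but we can no longer send to it (attempting to do so panics).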
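Outside a range loop, the two-value form of the receive operator tells a reader whether the channel has been closed and emptied. The sketch below illustrates this; the helper name drain is an assumption made for the example.

```go
package main

import "fmt"

// drain (hypothetical helper) receives from ch until it is closed and empty,
// and returns everything it received.
func drain(ch <-chan int) []int {
	var got []int
	for {
		v, ok := <-ch // ok is false once ch is closed and holds no more values
		if !ok {
			return got
		}
		got = append(got, v)
	}
}

func main() {
	ch := make(chan int, 3)
	ch <- 1
	ch <- 2
	close(ch)

	fmt.Println(drain(ch)) // [1 2]

	// Receiving from a closed, empty channel returns immediately
	// with the zero value and ok == false.
	v, ok := <-ch
	fmt.Println(v, ok) // 0 false
}
```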

Non-blocking reads and writes on a channel (no need to worry about blocking on an empty or full channel)

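There is a trick: Go's select statement with a default case gives a non-blocking read from a channel. With this construct, the goroutine reads from the channel if a value is available and otherwise runs the default branch.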

myChan := make(chan string)

go func(){
    myChan <- "Message!"
}()


select {
    case msg := <- myChan:
        fmt.Println(msg)
    default:
        fmt.Println("No Msg")
}

<-time.After(time.Second * 1)

select {
    case msg := <- myChan:
        fmt.Println(msg)
    default:
        fmt.Println("No Msg")
}

The program's output is as follows:

No Msg
Message!

A non-blocking write uses the same select statement. The only difference is that the case looks like a send rather than a receive.

select {
    case myChan <- "message":
        fmt.Println("sent the message")
    default:
        fmt.Println("no message sent")
}
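As a runnable illustration of the non-blocking write, the sketch below wraps the select in a small function and tries it against a buffered channel of capacity 1. The name trySend is an assumption introduced here, not something from the original notes.

```go
package main

import "fmt"

// trySend (hypothetical helper) attempts a send without blocking and
// reports whether the value was accepted.
func trySend(ch chan<- string, msg string) bool {
	select {
	case ch <- msg:
		return true
	default: // buffer full or no receiver ready: give up immediately
		return false
	}
}

func main() {
	ch := make(chan string, 1)

	fmt.Println(trySend(ch, "first"))  // true: the buffer had room
	fmt.Println(trySend(ch, "second")) // false: the buffer is full
	<-ch                               // free one slot
	fmt.Println(trySend(ch, "third"))  // true again
}
```

A dropped message is simply lost with this pattern, so it fits cases where skipping a value is acceptable.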

1.5 Concurrency and Parallelism

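When GOMAXPROCS is 1 (the default before Go 1.5), all goroutines run on a single thread (one CPU core). In that case two goroutines are concurrent but not parallel. Within one OS thread, a goroutine traditionally kept the CPU until it blocked and did not give time to the other goroutines on that thread. This is how the Go runtime schedules goroutines, and we can also schedule manually with the runtime package. Note that since Go 1.5 GOMAXPROCS defaults to the number of available CPU cores, and since Go 1.14 the scheduler can also preempt a long-running goroutine, so the single-core behavior described here needs GOMAXPROCS set to 1 on current versions.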

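The earlier programs with sleeps look "parallel" in their timing because sleeping blocks the current goroutine, which then yields to the others. The result is logical parallelism, that is, concurrency. In the following program, by contrast, the two goroutines run one after the other and the printed result is reported to be the same every time (the language itself does not guarantee this order):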

package main

import "fmt"

var quit chan int = make(chan int)

func loop() {
    for i := 0; i < 10; i++ {
        fmt.Printf("%d ", i)
    }
    quit <- 0
}

func main() {
    go loop()
    go loop()

    for i := 0; i < 2; i++ {
        <- quit
    }
}

F:\My Drive\19summer\6824>go run gor2.go
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9

Here is another interesting example: https://segmentfault.com/q/1010000000207474

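To get true multi-core parallelism, we use the runtime package (which exposes controls for the goroutine scheduler) to state explicitly that two cores should be used. On Go 1.5 and later all available cores are already used by default, so the call only matters when you want a different limit. There are two approaches: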

1. Specify how many cores to use

package main
import (
    "fmt"
    "runtime"
)

var quit chan int = make(chan int)

func loop(coreid int) {
    for i := 0; i < 1000; i++ { // run many iterations so the interleaving is observable
        fmt.Printf("%d-%d ", coreid, i)
    }
    quit <- 0
}

func main() {
    runtime.GOMAXPROCS(2) // use at most 2 cores

    go loop(0)
    go loop(1)

    for i := 0; i < 2; i++ {
        <- quit
    }
}

The output is an irregular interleaving of the two goroutines, which shows they are running truly in parallel.

2. Explicitly yield CPU time (with GOMAXPROCS at 1 this still runs on a single core; manually switching between goroutines only makes it look "parallel")

package main
import (
    "fmt"
    "runtime"
)

var quit chan int = make(chan int)

func loop(coreid int) {
    for i := 0; i < 10; i++ { // 10 iterations are enough to see the alternation
        runtime.Gosched() // explicitly yield the CPU to other goroutines
        fmt.Printf("%d-%d ", coreid, i)
    }
    quit <- 0
}

func main() {
    go loop(0)
    go loop(1)

    for i := 0; i < 2; i++ {
        <- quit
    }
}

The output alternates very regularly:
F:\My Drive\19summer\6824>go run gor2.go
1-0 0-0 1-1 0-1 1-2 0-2 1-3 0-3 1-4 0-4 1-5 0-5 1-6 0-6 1-7 0-7 1-8 0-8 1-9 0-9

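A few functions from the runtime package:

Gosched: yields the processor so that other goroutines can run
NumCPU: returns the number of logical CPUs usable by the current process
GOMAXPROCS: sets the maximum number of CPUs that can execute simultaneously and returns the previous setting
Goexit: terminates the calling goroutine (deferred calls still run)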

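We know that "a process is the smallest unit of resource allocation, and a thread is the smallest unit of CPU scheduling." So how do goroutines relate to threads? See this passage from the official Go documentation (https://golang.org/doc/faq#goroutines):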

Why goroutines instead of threads?

Goroutines are part of making concurrency easy to use. The idea, which has been around for a while, is to multiplex independently executing functions—coroutines（協程）—onto a set of threads. When a coroutine blocks, such as by calling a blocking system call, the run-time automatically moves other coroutines on the same operating system thread to a different, runnable thread so they won't be blocked. The programmer sees none of this, which is the point. The result, which we call goroutines, can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes.

To make the stacks small, Go's run-time uses resizable, bounded stacks. A newly minted goroutine is given a few kilobytes, which is almost always enough. When it isn't, the run-time grows (and shrinks) the memory for storing the stack automatically, allowing many goroutines to live in a modest amount of memory. The CPU overhead averages about three cheap instructions per function call. It is practical to create hundreds of thousands of goroutines in the same address space. If goroutines were just threads, system resources would run out at a much smaller number.
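The FAQ's remark that hundreds of thousands of goroutines are practical can be tried out with a sketch like the one below. The helper name spawn and the count of 100000 are assumptions chosen for illustration, and actual memory use depends on the machine and Go version.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawn (hypothetical helper) starts n goroutines that each increment a
// shared counter once, waits for all of them, and returns the final count.
func spawn(n int) int64 {
	var counter int64
	var wg sync.WaitGroup

	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&counter, 1) // atomic, so concurrent increments are safe
		}()
	}

	wg.Wait()
	return atomic.LoadInt64(&counter)
}

func main() {
	fmt.Println(spawn(100000)) // 100000
}
```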

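A coroutine can be understood as a single thread "hyper-threading" itself through context switches, so that it works on two jobs concurrently. ( https://www.liaoxuefeng.com/wiki/897692888725344/923057403198272 )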

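Processes and threads are both scheduled by the kernel. They have the notion of CPU time slices and are scheduled preemptively (with a variety of scheduling algorithms).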

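Coroutines (user-level threads) are transparent to the kernel: the system does not know they exist, and they are scheduled entirely by the user's own program. Because the user program is in control, it is hard to force CPU control over to another task the way preemptive scheduling does. Usually only cooperative scheduling is possible, where a coroutine must voluntarily hand over control before other coroutines can run.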

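In essence, a goroutine is a coroutine. The difference is that Go wraps and handles goroutine scheduling in the runtime, around system calls, and in other places: when a goroutine runs for a long time or makes a system call, its CPU (P) is handed over so that other goroutines can be scheduled and run. In other words, Go supports coroutines at the language level. Native language-level support for coroutines is one of Go's hallmarks, and putting the go keyword in front of a function or method call is all it takes to create one.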

http://www.javashuo.com/article/p-xzfhmmyd-er.html

Suppose we start three goroutines but allot only two cores (two threads). What happens? Let's write a program to find out:

package main

import (
    "fmt"
    "runtime"
)

var quit chan int = make(chan int)

func loop(id int) { // id: the label of this goroutine
    for i := 0; i < 100; i++ { // print this goroutine's label 100 times
        fmt.Printf("%d ", id)
    }
    quit <- 0
}

func main() {
    runtime.GOMAXPROCS(2) // use at most 2 cores at the same time

    for i := 0; i < 3; i++ { // start three goroutines
        go loop(i)
    }

    for i := 0; i < 3; i++ {
        <- quit
    }
}


The output varies from run to run:
F:\My Drive\19summer\6824>go run gor2.go
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
F:\My Drive\19summer\6824>go run gor2.go
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F:\My Drive\19summer\6824>