用Golang處理每分鐘百萬級請求

時間 2019-11-09

標籤 golang 處理每分鐘百萬請求欄目 Go 简体版

原文原文鏈接

翻譯原文連接轉帖／轉載請註明出處html

原文連接@medium.com 發表於2017/08/30git

我在防垃圾郵件，防病毒和防惡意軟件領域已經工做了15年，先後在好幾個公司任職。我知道這些系統最後都會由於要處理海量的數據而變得很是複雜。github

我如今是smsjunk.com的CEO而且是KnowBe4的首席架構師。這兩個公司在網絡安全領域都很是活躍。golang

有趣的是，在過去10年裏做爲一個碼農，全部我經歷過的網站後臺開發用的幾乎都是用Ruby on Rails。不要誤解，我很喜歡Ruby on Rails而且認爲它是一個很是棒的開發環境。每每在一段時間後，你開始以ruby的方式來設計系統。這時你會忘記利用多線程，並行，快速執行（fast executions）和較小的內存開銷（small memory overhead），軟件的架構會變得簡單而高效。不少年來，我一直是C/C++，Delphi和C#的開發者。我開始意識到使用正確的工具，工做會變得簡單不少。docker

我對語言和框架並非很熱衷。我相信效率，產出和代碼的可維護性取決於你如何架構一個簡潔地解決方案。編程

問題

在開發咱們的匿名遙測和分析系統時，咱們的目標是可以處理從上百萬個端點發來的大量POST請求。HTTP請求處理函數會收到包含不少載荷（payloads）的JSON文檔。這些載荷（payloads）須要被寫到Amazon S3上，接着由map-reduce系統來處理。json

一般咱們會建立一個worker池架構（worker-tier architecture)。利用以下的一些工具：緩存

而後設置兩個集羣，一個用做處理HTTP請求，另一個用做workers。這樣咱們可以根據處理的後臺工做量進行擴容。

從一開始咱們小組就以爲應該用Go來實現，由於在討論階段咱們估計這可能會是一個處理很是大流量的系統。我已經使用Go語言兩年並用它在工做中開發了一些系統，但它們都沒有處理過這麼大的負載（load）。

咱們首先建立了幾個結構來定義HTTP請求的載荷。咱們經過POST請求接收這些載荷，而後用一個函數上傳到S3 bucket。

type PayloadCollection struct {
    WindowsVersion  string    `json:"version"`
    Token           string    `json:"token"`
    Payloads        []Payload `json:"data"`
}

type Payload struct {
    // [redacted]
}

func (p *Payload) UploadToS3() error {
    // the storageFolder method ensures that there are no name collision in
    // case we get same timestamp in the key name
    storage_path := fmt.Sprintf("%v/%v", p.storageFolder, time.Now().UnixNano())

    bucket := S3Bucket

    b := new(bytes.Buffer)
    encodeErr := json.NewEncoder(b).Encode(payload)
    if encodeErr != nil {
        return encodeErr
    }

    // Everything we post to the S3 bucket should be marked 'private'
    var acl = s3.Private
    var contentType = "application/octet-stream"

    return bucket.PutReader(storage_path, b, int64(b.Len()), contentType, acl, s3.Options{})
}

簡單地使用Goroutines

一開始咱們用了最簡單的方法來實現POST請求的處理函數。咱們嘗試經過goroutine來並行處理請求。

func payloadHandler(w http.ResponseWriter, r *http.Request) {

    if r.Method != "POST" {
        w.WriteHeader(http.StatusMethodNotAllowed)
        return
    }

    // Read the body into a string for json decoding
    var content = &PayloadCollection{}
    err := json.NewDecoder(io.LimitReader(r.Body, MaxLength)).Decode(&content)
    if err != nil {
        w.Header().Set("Content-Type", "application/json; charset=UTF-8")
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    // Go through each payload and queue items individually to be posted to S3
    for _, payload := range content.Payloads {
        go payload.UploadToS3()   // <----- DON'T DO THIS
    }

    w.WriteHeader(http.StatusOK)
}

對於適量的負載，這個方案應該沒有問題。可是負載增長之後這個方法就不能很好地工做。當咱們把這個版本部署到生產環境中後，咱們遇到了比預期大一個數量級的請求量。咱們徹底低估了流量。

這個方法有些不盡如人意。它沒法控制建立goroutine的數量。由於咱們每分鐘收到了一百萬個POST請求，上面的代碼很快就奔潰了。

再次嘗試

咱們須要一個不一樣的解決方案。在一開始，咱們就討論到須要把HTTP請求處理函數寫的簡潔，而後把處理工做轉移到後臺。固然，這是你在Ruby on Rails世界裏必須作的，不然你會阻塞全部worker的工做（例如puma，unicorn，passenger等等，咱們這裏就不繼續討論JRuby了）。咱們須要用到Resque，Sidekiq，SQS等經常使用的解決方案。這個列表能夠很長，由於有許多方法來完成這項任務。

第二個版本是建立帶緩衝的channel。這樣咱們能夠把工做任務放到隊列裏而後再上傳到S3。由於能夠控制隊列的長度而且有充足的內存，咱們以爲把工做任務緩存在channel隊列裏應該沒有問題。

var Queue chan Payload

func init() {
    Queue = make(chan Payload, MAX_QUEUE)
}

func payloadHandler(w http.ResponseWriter, r *http.Request) {
    ...
    // Go through each payload and queue items individually to be posted to S3
    for _, payload := range content.Payloads {
        Queue <- payload
    }
    ...
}

而後咱們須要從隊列裏提取工做任務並進行處理。代碼下圖所示：

func StartProcessor() {
    for {
        select {
        case job := <-Queue:
            job.payload.UploadToS3()  // <-- STILL NOT GOOD
        }
    }
}

坦白的說，我不知道咱們當時在想什麼。這確定是熬夜喝紅牛的結果。這個方法並無給咱們帶來任何幫助。隊列僅僅是將問題延後了。咱們的處理函數（processor）一次僅上傳一個載荷（payload），而接收請求的速率比一個處理函數上傳S3的能力大太多了，帶緩衝的channel很快就到達了它的極限。從而阻塞了HTTP請求處理函數往隊列裏添加更多的工做任務。

咱們僅僅是延緩了問題的觸發。系統在倒計時，最後仍是崩潰了。在這個有問題的版本被部署以後，系統的延遲以恆定速度在不停地增加。

更好的解決辦法

咱們決定使用Go channel的經常使用編程模式。使用一個兩級channel系統，一個用來存聽任務隊列，另外一個用來控制處理任務隊列的併發量。

這裏的想法是根據一個可持續的速率將S3上傳並行化。這個速率不會使機器變慢或者致使S3的鏈接錯誤。咱們選擇了一個Job／Worker模式。若是大家對Java，C#等語言熟悉的話，能夠把它想象成是Go語言用channel來實現的工做線程池。

var (
    MaxWorker = os.Getenv("MAX_WORKERS")
    MaxQueue  = os.Getenv("MAX_QUEUE")
)

// Job represents the job to be run
type Job struct {
    Payload Payload
}

// A buffered channel that we can send work requests on.
var JobQueue chan Job

// Worker represents the worker that executes the job
type Worker struct {
    WorkerPool  chan chan Job
    JobChannel  chan Job
    quit        chan bool
}

func NewWorker(workerPool chan chan Job) Worker {
    return Worker{
        WorkerPool: workerPool,
        JobChannel: make(chan Job),
        quit:       make(chan bool)}
}

// Start method starts the run loop for the worker, listening for a quit channel in
// case we need to stop it
func (w Worker) Start() {
    go func() {
        for {
            // register the current worker into the worker queue.
            w.WorkerPool <- w.JobChannel

            select {
            case job := <-w.JobChannel:
                // we have received a work request.
                if err := job.Payload.UploadToS3(); err != nil {
                    log.Errorf("Error uploading to S3: %s", err.Error())
                }

            case <-w.quit:
                // we have received a signal to stop
                return
            }
        }
    }()
}

// Stop signals the worker to stop listening for work requests.
func (w Worker) Stop() {
    go func() {
        w.quit <- true
    }()
}

咱們修改了HTTP請求處理函數來建立一個含有載荷（payload）的Job結構，而後將它送到一個叫JobQueue的channel。worker會對它們進行處理。

func payloadHandler(w http.ResponseWriter, r *http.Request) {

    if r.Method != "POST" {
        w.WriteHeader(http.StatusMethodNotAllowed)
        return
    }

    // Read the body into a string for json decoding
    var content = &PayloadCollection{}
    err := json.NewDecoder(io.LimitReader(r.Body, MaxLength)).Decode(&content)
    if err != nil {
        w.Header().Set("Content-Type", "application/json; charset=UTF-8")
        w.WriteHeader(http.StatusBadRequest)
        return
    }

    // Go through each payload and queue items individually to be posted to S3
    for _, payload := range content.Payloads {

        // let's create a job with the payload
        work := Job{Payload: payload}

        // Push the work onto the queue.
        JobQueue <- work
    }

    w.WriteHeader(http.StatusOK)
}

在初始化服務的時候，咱們建立了一個Dispatcher而且調用了Run()函數來建立worker池。這些worker會監聽JobQueue上是否有新的任務並進行處理。

dispatcher := NewDispatcher(MaxWorker)
dispatcher.Run()

下面是咱們的dispatcher實現代碼：

type Dispatcher struct {
    // A pool of workers channels that are registered with the dispatcher
    WorkerPool chan chan Job
}

func NewDispatcher(maxWorkers int) *Dispatcher {
    pool := make(chan chan Job, maxWorkers)
    return &Dispatcher{WorkerPool: pool}
}

func (d *Dispatcher) Run() {
    // starting n number of workers
    for i := 0; i < d.maxWorkers; i++ {
        worker := NewWorker(d.pool)
        worker.Start()
    }

    go d.dispatch()
}

func (d *Dispatcher) dispatch() {
    for {
        select {
        case job := <-JobQueue:
            // a job request has been received
            go func(job Job) {
                // try to obtain a worker job channel that is available.
                // this will block until a worker is idle
                jobChannel := <-d.WorkerPool

                // dispatch the job to the worker job channel
                jobChannel <- job
            }(job)
        }
    }
}

這裏咱們提供了建立worker的最大數目做爲參數，並把這些worker加入到worker池裏。由於咱們已經在docker化的Go環境裏使用了Amazon的Elasticbeanstalk而且嚴格按照12-factor方法來配置咱們的生產環境，這些參數值能夠從環境變量裏得到。咱們能夠方便地控制worker數目和任務隊列的長度。咱們能夠快速地調整這些值而不須要從新部署整個集羣。

var (
  MaxWorker = os.Getenv("MAX_WORKERS")
  MaxQueue  = os.Getenv("MAX_QUEUE")
)

部署了新版本以後，咱們看到系統延遲一會兒就降到了能夠忽略的量級。同時處理請求的能力也大幅攀升。

在Elastic Load Balancers熱身後幾分鐘，咱們看到Elasticbeanstalk應用開始處理將近每分鐘一百萬個請求。咱們的流量一般在早上的時候會攀升至超過每分鐘一百萬個請求。同時，咱們也將服務器的數目從100臺縮減到了20臺。

經過合理地配置集羣和auto-scaling，咱們可以作到只配置4臺EC2 c4.Large實例。而後當CPU使用率持續5分鐘在90%以上時用Elastic Auto-Scaling來建立新的實例。

結束語

對我來講簡潔（simplicity）是第一位的。咱們能夠利用無數隊列，不少後臺worker以及複雜的部署來設計一個複雜系統，最終咱們仍是使用了Elasticbeanstalk auto-scaling的強大功能和Go語言提供的應對併發的簡單方法。用僅僅4臺機器（可能還不如個人MacBook Pro強大）來處理每分鐘一百萬次POST請求對Amazon S3進行寫操做。

每項任務都有對應的正確工具。當你的Ruby on Rails系統須要一個很強大的HTTP請求處理器，能夠嘗試看看ruby生態系統之外的其它更強大的選項。

相關標籤/搜索

每日一句

每一个你不满意的现在，都有一个你没有努力的曾经。