The graceful restart problem of golang HTTP services

Background

It is good for a web service's stability and availability if it can isolate its own changes (restarts, upgrades) from the traffic it serves; that is why graceful restart mechanisms exist. Implementations vary, but the principle is roughly the same. The specific implementation I use is at github.com/cgCodeLife/…. While testing it I noticed that during a graceful hot upgrade of the service, the golang client occasionally reports errors such as EOF, read: connection reset by peer, and idle connections being closed. This article combines my own test code with those observations to analyse the underlying cause. I am publishing it as a place for open discussion, in the hope that someone can offer better suggestions or point out my mistakes so we can learn together.

Conclusion

Although a web service can be made to restart gracefully, the result is not perfect: the client will occasionally hit connection errors, regardless of the level of concurrency.

Test environment

golang client
golang version: 1.10
HTTP protocol: 1.1
Keep-alive: both enabled and disabled were tried
Concurrency: both 1 and 30 were tried
Requests per connection: 1 and 1000; in the runs with a single request per connection, no connection problems were observed on the client side
Request type: POST of a string of a dozen or so bytes
golang server
golang version: 1.10
Response data: the server's own process ID, about 7 bytes

Problem analysis

golang client code

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http"
	"sync"

	log "github.com/sirupsen/logrus"
)

func main() {
	var wg sync.WaitGroup
	var count int
	var rw sync.RWMutex
TEST:
	for i := 0; i < 30; i++ { // 30 concurrent client goroutines
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Keep-alive enabled; each goroutine has its own Transport,
			// i.e. its own connection pool.
			tr := http.Transport{DisableKeepAlives: false}
			client := &http.Client{Transport: &tr}
			for i := 0; i < 1000; i++ { // 1000 requests per connection
				f, err := ioutil.ReadFile("data")
				if err != nil {
					fmt.Println("read file err", err)
					return
				}
				fmt.Println(len(f))
				reader := bytes.NewReader(f)
				rw.Lock()
				count += 1
				index := count
				rw.Unlock()
				resp, err := client.Post("http://0.0.0.0:8888", "application/x-www-form-urlencoded", reader)
				if err != nil {
					rw.RLock()
					currentCount := count
					rw.RUnlock()
					// Abort the whole test on the first connection error.
					log.Fatal(err, index, currentCount)
				}
				data, err := ioutil.ReadAll(resp.Body)
				resp.Body.Close() // close inside the loop (not defer) so the connection can be reused
				if err != nil {
					log.Fatal(err)
				}
				log.Printf("data[%s]", string(data))
			}
		}()
	}
	wg.Wait()
	goto TEST // repeat the whole round forever
}

golang server code

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
	"strconv"

	graceful "github.com/cgCodeLife/graceful2"
	log "github.com/sirupsen/logrus"
)

func main() {
	server := graceful.NewServer()
	handler := http.HandlerFunc(handle)
	server.Register("0.0.0.0:8888", handler)
	err := server.Run()
	if err != nil {
		log.Fatal(err)
	}
}

// handle drains the request body and replies with the worker's own pid,
// which makes it easy to see which process served a given request.
func handle(w http.ResponseWriter, r *http.Request) {
	defer r.Body.Close()
	_, err := ioutil.ReadAll(r.Body)
	if err != nil {
		fmt.Printf("read body error[%s] pid[%d]\n", err, os.Getpid())
	}

	w.Write([]byte(strconv.Itoa(os.Getpid())))
}




Screenshots from the experiments

1 connection, 1 request, concurrency 1
1 connection, 1000 requests, concurrency 1
1 connection, 1 request, concurrency 30 (connection resources should have been exhausted, yet no EOF/reset connection problems were triggered)
1 connection, 1000 requests, concurrency 30
Here is a brief description of how the graceful implementation I use works. It follows a master-worker model: the master process stays resident and does nothing but handle signals and forward a terminate signal to the worker; the worker serves the web traffic and, on receiving that signal, performs a Shutdown. That is the whole of the logic.
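To make that concrete, below is a minimal sketch of the worker side of such a scheme. It is my own simplified illustration, not the code of github.com/cgCodeLife/graceful2: the master process and the listener hand-off are omitted, and only the part that matters here is kept, namely catching the terminate signal and calling Shutdown.

package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// The worker serves the HTTP traffic.
	srv := &http.Server{Addr: "0.0.0.0:8888", Handler: http.DefaultServeMux}

	go func() {
		// Wait for the terminate signal forwarded by the master process.
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGTERM)
		<-sig

		// Stop accepting new connections, then wait (up to 30s here) for
		// in-flight requests to finish and idle connections to be closed.
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		srv.Shutdown(ctx)
	}()

	// Returns http.ErrServerClosed once Shutdown has been called.
	srv.ListenAndServe()
}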
Now let's look at the Shutdown code, starting at line 2536 of src/net/http/server.go:

// shutdownPollInterval is how often we poll for quiescence
// during Server.Shutdown. This is lower during tests, to
// speed up tests.
// Ideally we could find a solution that doesn't involve polling,
// but which also doesn't have a high runtime cost (and doesn't
// involve any contentious mutexes), but that is left as an
// exercise for the reader.
var shutdownPollInterval = 500 * time.Millisecond

// Shutdown gracefully shuts down the server without interrupting any
// active connections. Shutdown works by first closing all open
// listeners, then closing all idle connections, and then waiting
// indefinitely for connections to return to idle and then shut down.
// If the provided context expires before the shutdown is complete,
// Shutdown returns the context's error, otherwise it returns any
// error returned from closing the Server's underlying Listener(s).
//
// When Shutdown is called, Serve, ListenAndServe, and
// ListenAndServeTLS immediately return ErrServerClosed. Make sure the
// program doesn't exit and waits instead for Shutdown to return.
//
// Shutdown does not attempt to close nor wait for hijacked
// connections such as WebSockets. The caller of Shutdown should
// separately notify such long-lived connections of shutdown and wait
// for them to close, if desired. See RegisterOnShutdown for a way to
// register shutdown notification functions.
func (srv *Server) Shutdown(ctx context.Context) error {
	atomic.AddInt32(&srv.inShutdown, 1)
	defer atomic.AddInt32(&srv.inShutdown, -1)

	srv.mu.Lock()
	lnerr := srv.closeListenersLocked()
	srv.closeDoneChanLocked()
	for _, f := range srv.onShutdown {
		go f()
	}
	srv.mu.Unlock()

	ticker := time.NewTicker(shutdownPollInterval)
	defer ticker.Stop()
	for {
		if srv.closeIdleConns() {
			return lnerr
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}



Shutdown does two main things:
1. stop listening
2. close all idle connections
Where do idle connections come from? A conn is marked active when a request arrives and is set back to idle once that request has been handled. So my understanding is that when many requests are sent back to back on one connection, it is easy to hit the case where, at the moment Shutdown scans the connections, this one has just become idle, but while it is being closed the client is already sending the next request on it, and that request gets reset. I therefore suspect that when golang closes the connection it shuts down both the read and the write side of the socket fd at the same time (I ran a separate small experiment on Shutdown to check this: juejin.im/post/5d033c…). In that situation the client sees connection errors.
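One way to check this hypothesis from the client side is to watch whether the failing requests went out on reused keep-alive connections. The following is a small diagnostic sketch of my own (not part of the test code above), using net/http/httptrace; if the reasoning is right, the EOF/reset errors should coincide with reused=true, i.e. with a request racing against Shutdown closing an idle connection.

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptrace"
)

func main() {
	// Log, for every request, whether it was sent on a reused keep-alive
	// connection and how long that connection had been idle.
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			fmt.Printf("reused=%v wasIdle=%v idleTime=%v\n",
				info.Reused, info.WasIdle, info.IdleTime)
		},
	}

	req, err := http.NewRequest("POST", "http://0.0.0.0:8888",
		bytes.NewReader([]byte("hello world")))
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// During a graceful restart this is where EOF / connection reset
		// shows up; it should coincide with reused=true above.
		fmt.Println("request error:", err)
		return
	}
	resp.Body.Close()
}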

Verification

One connection sending requests continuously; the packet capture is as follows (8888 is the server port):
Clearly, the service is gone before its connections have been fully released.
One connection sending a request once per second:
A complete four-way close handshake.

Solution

My view is that trying to eliminate this problem entirely on the server side is not really workable, because it is the client that decides when to send; the client therefore has to cooperate and handle these network errors itself, as sketched below.
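As one example of such cooperation, here is a hedged sketch of my own suggestion (not something the graceful library provides): retry a failed POST once when the error matches the patterns seen during graceful restarts. Matching on the error message text is crude, and retrying is only acceptable when the request is replayable, but it illustrates the idea.

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http"
	"strings"
)

// postWithRetry retries a POST once when the error looks like a keep-alive
// connection that the server closed between requests (EOF / connection
// reset). Whether a retry is actually safe depends on the request being
// replayable; that judgment stays with the caller.
func postWithRetry(client *http.Client, url, contentType string, body []byte) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt < 2; attempt++ {
		resp, err := client.Post(url, contentType, bytes.NewReader(body))
		if err == nil {
			return resp, nil
		}
		lastErr = err
		msg := err.Error()
		// Only retry the errors observed during graceful restarts; anything
		// else is returned to the caller immediately.
		if !strings.Contains(msg, "EOF") &&
			!strings.Contains(msg, "connection reset by peer") {
			break
		}
	}
	return nil, lastErr
}

func main() {
	resp, err := postWithRetry(http.DefaultClient, "http://0.0.0.0:8888",
		"application/x-www-form-urlencoded", []byte("hello world"))
	if err != nil {
		fmt.Println("request failed after retry:", err)
		return
	}
	defer resp.Body.Close()
	data, _ := ioutil.ReadAll(resp.Body)
	fmt.Printf("data[%s]\n", data)
}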