Why a Go HTTP Framework's Performance Dropped Sharply: An Analysis

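    I have recently been developing a web framework. After the business team started using it, they told us that QPS would not go up under load testing. I was puzzled: httprouter plus net/http's http.Server should not perform this badly. Load-test results published online are all above 100k QPS, so could a few middlewares really drag performance down that far? After some investigation I found something interesting.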

    First, I load-tested a bare hello world handler, without writing a single log line per request, and then opened pprof to see where the time went.

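    Why was futex so high in the profile? Seeing addtimer and deltimer among the hot functions reminded me of a timer I once implemented myself, so I suspected the timeouts. I checked the Go version: go1.9. By default the framework sets four timeouts on every connection (ReadTimeout, WriteTimeout, IdleTimeout and ReadHeaderTimeout), which puts heavy pressure on the timer lock whenever timer callbacks are added and removed.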

    Let's look at what actually happens when timeouts are set in the usual way. Below is a minimal example that, for safety, sets timeouts on every connection.

package main

import (
	"fmt"
	"github.com/julienschmidt/httprouter"
	"log"
	"net/http"
	"time"
)

func Index(w http.ResponseWriter, r *http.Request, _ httprouter.Params) {
	fmt.Fprint(w, "Welcome!\n")
}

func Hello(w http.ResponseWriter, r *http.Request, ps httprouter.Params) {
	fmt.Fprintf(w, "hello, %s!\n", ps.ByName("name"))
}

func main() {

	router := httprouter.New()
	router.GET("/", Index)
	router.GET("/hello/:name", Hello)

	srv := &http.Server{
		ReadTimeout:       5 * time.Second,
		WriteTimeout:      10 * time.Second,
		ReadHeaderTimeout: 10 * time.Second,
		IdleTimeout:       10 * time.Second,
		Addr:              "0.0.0.0:8998",
		Handler:           router,
	}

	log.Fatal(srv.ListenAndServe())
}

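    After ListenAndServe() accepts a connection, it runs conn.serve() for it in a new goroutine. Depending on which timeouts are configured, the server calls SetReadDeadline and SetWriteDeadline on the underlying net.Conn. The relevant code is in net/http/server.go. The excerpt below shows the TLS handshake path; the same deadline calls are made again each time a request is read: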

// Serve a new connection.
func (c *conn) serve(ctx context.Context) {
	...

	if tlsConn, ok := c.rwc.(*tls.Conn); ok {
		if d := c.server.ReadTimeout; d != 0 {
			c.rwc.SetReadDeadline(time.Now().Add(d)) // set the read deadline
		}
		if d := c.server.WriteTimeout; d != 0 {
			c.rwc.SetWriteDeadline(time.Now().Add(d)) // set the write deadline
		}
		if err := tlsConn.Handshake(); err != nil {
			c.server.logf("http: TLS handshake error from %s: %v", c.rwc.RemoteAddr(), err)
			return
		}
		c.tlsState = new(tls.ConnectionState)
		*c.tlsState = tlsConn.ConnectionState()
		if proto := c.tlsState.NegotiatedProtocol; validNPN(proto) {
			if fn := c.server.TLSNextProto[proto]; fn != nil {
				h := initNPNRequest{tlsConn, serverHandler{c.server}}
				fn(c.server, tlsConn, h)
			}
			return
		}
	}
   ...

    From there, conn.SetReadDeadline goes through fd.SetReadDeadline in internal/poll/fd_poll_runtime.go and ends up in poll_runtime_pollSetDeadline in runtime/netpoll.go, which is linked in as internal/poll.runtime_pollSetDeadline. This function is the key part:

//go:linkname poll_runtime_pollSetDeadline internal/poll.runtime_pollSetDeadline
func poll_runtime_pollSetDeadline(pd *pollDesc, d int64, mode int) {
	lock(&pd.lock)
	if pd.closing {
		unlock(&pd.lock)
		return
	}
	pd.seq++ // invalidate current timers
	// Reset current timers.
	if pd.rt.f != nil {
		deltimer(&pd.rt)
		pd.rt.f = nil
	}
	if pd.wt.f != nil {
		deltimer(&pd.wt)
		pd.wt.f = nil
	}
	// Setup new timers.
	if d != 0 && d <= nanotime() {
		d = -1
	}
	if mode == 'r' || mode == 'r'+'w' {
		pd.rd = d
	}
	if mode == 'w' || mode == 'r'+'w' {
		pd.wd = d
	}
	if pd.rd > 0 && pd.rd == pd.wd {
		pd.rt.f = netpollDeadline
		pd.rt.when = pd.rd
		// Copy current seq into the timer arg.
		// Timer func will check the seq against current descriptor seq,
		// if they differ the descriptor was reused or timers were reset.
		pd.rt.arg = pd
		pd.rt.seq = pd.seq
		addtimer(&pd.rt)
	} else {
		if pd.rd > 0 {
			pd.rt.f = netpollReadDeadline // callback for the read deadline
			pd.rt.when = pd.rd
			pd.rt.arg = pd
			pd.rt.seq = pd.seq
			addtimer(&pd.rt)             // add to the runtime timers
		}
		if pd.wd > 0 {
			pd.wt.f = netpollWriteDeadline // callback for the write deadline
			pd.wt.when = pd.wd
			pd.wt.arg = pd
			pd.wt.seq = pd.seq
			addtimer(&pd.wt)             // add to the runtime timers
		}
	}
	// If we set the new deadline in the past, unblock currently pending IO if any.
	var rg, wg *g
	atomicstorep(unsafe.Pointer(&wg), nil) // full memory barrier between stores to rd/wd and load of rg/wg in netpollunblock
	if pd.rd < 0 {
		rg = netpollunblock(pd, 'r', false)
	}
	if pd.wd < 0 {
		wg = netpollunblock(pd, 'w', false)
	}
	unlock(&pd.lock)
	if rg != nil {
		netpollgoready(rg, 3)
	}
	if wg != nil {
		netpollgoready(wg, 3)
	}
}

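    The main work here is to remove any timers already attached to the descriptor and then add new ones, with netpollReadDeadline or netpollWriteDeadline as the callback. As the code shows, timers are added and removed with addtimer(&pd.rt) and deltimer(&pd.rt).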

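    Now for the core question: why are things so slow once timeouts are set? Look at the implementation of addtimer. The timers are kept in a 4-ary min-heap, and every time a timeout is added, a single global timers structure has to be locked. When QPS is high and one request takes that lock several times, how could performance be good?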

type timer struct {
	i int // heap index

	// Timer wakes up at when, and then at when+period, ... (period > 0 only)
	// each time calling f(arg, now) in the timer goroutine, so f must be
	// a well-behaved function and not block.
	when   int64
	period int64
	f      func(interface{}, uintptr)
	arg    interface{}
	seq    uintptr
}

var timers struct {
	lock         mutex
	gp           *g
	created      bool
	sleeping     bool
	rescheduling bool
	sleepUntil   int64
	waitnote     note
	t            []*timer
}

// addtimer adds a timer

func addtimer(t *timer) {
	lock(&timers.lock)
	addtimerLocked(t)
	unlock(&timers.lock)
}

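    What can be done about lock contention? Lock sharding is a very common approach. Starting with go1.10, the single timers structure became an array of 64 buckets, so timers are spread across 64 locks and contention naturally drops. The definitions in runtime/time.go in 1.10 are shown below: each P queues timers into its own bucket, and one bucket may be shared by several Ps when GOMAXPROCS exceeds 64: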

// Package time knows the layout of this structure.
// If this struct changes, adjust ../time/sleep.go:/runtimeTimer.
// For GOOS=nacl, package syscall knows the layout of this structure.
// If this struct changes, adjust ../syscall/net_nacl.go:/runtimeTimer.
type timer struct {
	tb *timersBucket // the bucket the timer lives in
	i  int           // heap index

	// Timer wakes up at when, and then at when+period, ... (period > 0 only)
	// each time calling f(arg, now) in the timer goroutine, so f must be
	// a well-behaved function and not block.
	when   int64
	period int64
	f      func(interface{}, uintptr)
	arg    interface{}
	seq    uintptr
}

// timersLen is the length of timers array.
//
// Ideally, this would be set to GOMAXPROCS, but that would require
// dynamic reallocation
//
// The current value is a compromise between memory usage and performance
// that should cover the majority of GOMAXPROCS values used in the wild.
const timersLen = 64 // 64 buckets

// timers contains "per-P" timer heaps.
//
// Timers are queued into timersBucket associated with the current P,
// so each P may work with its own timers independently of other P instances.
//
// Each timersBucket may be associated with multiple P
// if GOMAXPROCS > timersLen.
var timers [timersLen]struct {
	timersBucket

	// The padding should eliminate false sharing
	// between timersBucket values.
	pad [sys.CacheLineSize - unsafe.Sizeof(timersBucket{})%sys.CacheLineSize]byte
}

[Figure: the timer data structure as of go1.10 (image taken from the web).]

 

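    In summary: many HTTP server frameworks post very high QPS numbers online, but their demos do not set any timeouts, so real-world figures can be much lower. If you need timeouts in production, pay attention to the Go version; under high QPS it is best to use 1.10 or later. In our case, with no other changes, simply upgrading Go to 1.10 nearly doubled QPS.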
