OpenResty Best Practices (2)

Date: 2019-11-16

This article is published by the NetEase Cloud Community with the authorization of its author, Tang Xiaojing.

Combining Lua coroutines with the nginx event mechanism

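The earlier part of this article covered Lua and nginx background at length: the nginx process architecture, the nginx event loop, Lua coroutines, and how Lua coroutines interact with C code. With that background in place, this section explains how Lua coroutines cooperate with the nginx event mechanism.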

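Judging from its architecture and event-driven design, nginx's concurrency model can be summarized as: single worker + many connections + epoll + callbacks. Each nginx worker handles a large number of connections at the same time, each connection corresponds to one HTTP request, and each HTTP request corresponds to one structure in nginx (ngx_http_request_t):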

struct ngx_http_request_s {
    uint32_t                          signature;         /* "HTTP" */

    ngx_connection_t                 *connection;

    void                            **ctx;
    void                            **main_conf;
    void                            **srv_conf;
    void                            **loc_conf;

    ngx_http_event_handler_pt         read_event_handler;
    ngx_http_event_handler_pt         write_event_handler;

    ...
};

The core member of this structure is ngx_connection_t *connection, which is defined as follows:

struct ngx_connection_s {
    void               *data;
    ngx_event_t        *read;      // event structure for epoll read events
    ngx_event_t        *write;     // event structure for epoll write events

    ngx_socket_t        fd;        // socket fd of the TCP connection

    ngx_recv_pt         recv;
    ngx_send_pt         send;
    ngx_recv_chain_pt   recv_chain;
    ngx_send_chain_pt   send_chain;

    ngx_listening_t    *listening;

    ...
};

As the structures above show, the read and write events in each request's ngx_connection_t are tied to epoll. The core of nginx's epoll event handling looks like this:

    ...

    events = epoll_wait(ep, event_list, (int) nevents, timer);

    for (i = 0; i < events; i++) {
        c = event_list[i].data.ptr;

        instance = (uintptr_t) c & 1;
        // take the active event from epoll and recover its ngx_connection_t
        c = (ngx_connection_t *) ((uintptr_t) c & (uintptr_t) ~1);

        ...

        rev = c->read;
        rev->handler(rev);

        ...

        wev = c->write;
        wev->handler(wev);

        ...
    }

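In its epoll loop, nginx calls epoll_wait to fetch the active events that epoll is managing, then casts the pointer c to obtain the ngx_connection_t, and from it the connection and the callbacks of its read and write events. In other words, requests and event handling are linked through the relationships between members of C structures, which is what makes concurrent request processing possible. This is essentially the same as object-oriented code in a high-level language; only the way modules and member variables are reached differs.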

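If we bring in Lua's coroutine mechanism, Lua code can call coroutine.yield to suspend itself whenever it would block, and the suspended coroutine can be resumed with coroutine.resume once the blocking operation is ready. This avoids writing callbacks in Lua code. Deciding when to resume a coroutine can be left to the epoll mechanism at the C level, which ties event-driven processing and coroutines together. All that remains is to work out how to merge the Lua land wrapped by lua_State with the epoll mechanism in C land.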
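
The yield/resume idea can be sketched in plain Lua. This is a toy model under stated assumptions, not how lua-nginx-module is implemented: fake_read, run, handler, and the pending queue are hypothetical names, and the queue merely stands in for epoll readiness notifications.

```lua
-- Toy model: a "blocking" call yields, and a loop standing in for the
-- epoll event loop resumes the coroutine once its data is "ready".
-- All names here (fake_read, run, handler, pending) are hypothetical.

local pending = {}   -- coroutines parked on a simulated I/O wait

-- Looks synchronous to the caller, but suspends the current coroutine.
local function fake_read(key)
    -- Hand control back to the scheduler, naming what we are waiting for.
    local data = coroutine.yield(key)
    return data
end

-- Start each handler in its own coroutine, then drain the wait queue.
local function run(handlers, ready_data)
    local results = {}
    for id, handler in ipairs(handlers) do
        local co = coroutine.create(handler)
        local ok, key = coroutine.resume(co, id, fake_read, results)
        assert(ok, key)
        if coroutine.status(co) == "suspended" then
            pending[#pending + 1] = { co = co, key = key }
        end
    end
    -- "Event loop": each parked coroutine is resumed with its data,
    -- as a read-event callback would do once epoll reports readiness.
    while #pending > 0 do
        local item = table.remove(pending, 1)
        local ok, key = coroutine.resume(item.co, ready_data[item.key])
        assert(ok, key)
        if coroutine.status(item.co) == "suspended" then
            pending[#pending + 1] = { co = item.co, key = key }
        end
    end
    return results
end

-- Request logic written sequentially, with no callbacks.
local function handler(id, read, results)
    local body = read("req" .. id)
    results[id] = "handled " .. body
end

local results = run({ handler, handler }, { req1 = "A", req2 = "B" })
print(results[1], results[2])
```

The handler reads as straight-line code even though each read suspends it; only the scheduler knows about the wait queue.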

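This is in fact how lua-nginx-module handles the relationship between coroutines and nginx's event-driven model. lua-nginx-module creates one Lua VM (lua_State) per nginx worker, so each worker is bound to a single Lua VM. When a Lua script needs to take part in request processing, a Lua coroutine is created from the worker's VM to run the logic. The coroutine yields when it would block or needs to be suspended, and ends when its logic completes; a suspended coroutine is woken up again the next time epoll dispatches the relevant event. The core of rewrite_by_lua is shown below: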

static ngx_int_t
ngx_http_lua_rewrite_by_chunk(lua_State *L, ngx_http_request_t *r)
{
    co = ngx_http_lua_new_thread(r, L, &co_ref);

    lua_xmove(L, co, 1);
    ngx_http_lua_get_globals_table(co);
    lua_setfenv(co, -2);

    ngx_http_lua_set_req(co, r);       // binds the coroutine to its ngx_http_request_t

    ...

    rc = ngx_http_lua_run_thread(L, r, ctx, 0);  // runs the Lua script that handles the rewrite logic

    if (rc == NGX_ERROR || rc > NGX_OK) {
        return rc;
    }

    ...
}

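The snippet above shows how a coroutine is bound to an nginx request, so all that is left is to decide, inside ngx_http_lua_run_thread (in practice, inside the Lua script), when Lua execution should be suspended. Most Lua scripts do one of two kinds of work: rewriting logic based on request information, or talking to a backend over a TCP connection. Rewriting logic usually involves no blocking I/O, so the script finishes quickly and returns to C land with no suspend/resume cycle. For the second kind, lua-nginx-module provides the cosocket API, which wraps the TCP API and calls coroutine.yield at the appropriate moments (on an I/O error, when the body has been read completely, when proxy_buffers is full, and so on; see the ngx_http_lua_socket_tcp.c source for the implementation details).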
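
As an illustration of the second kind of work, here is a minimal sketch of a cosocket round trip. It is meant to be called from an OpenResty handler, where the global ngx table exists. The function name ping_backend and the PING line protocol are assumptions made for this example; the ngx.socket.tcp methods used are part of the cosocket API.

```lua
-- Sketch of a cosocket round trip for use inside an OpenResty handler
-- (for example content_by_lua). `ping_backend` and the PING line
-- protocol are hypothetical. Each I/O call may yield the current
-- coroutine until nginx sees the socket become ready, yet the code
-- reads sequentially, with no callbacks.
local function ping_backend(host, port)
    local sock = ngx.socket.tcp()
    sock:settimeout(1000)                      -- milliseconds

    local ok, err = sock:connect(host, port)   -- may yield while connecting
    if not ok then
        return nil, "connect failed: " .. (err or "unknown")
    end

    local bytes, send_err = sock:send("PING\r\n")
    if not bytes then
        sock:close()
        return nil, "send failed: " .. (send_err or "unknown")
    end

    local line, recv_err = sock:receive()      -- reads one line; may yield
    if not line then
        sock:close()
        return nil, "receive failed: " .. (recv_err or "unknown")
    end

    sock:setkeepalive(10000, 100)              -- put the connection back in the pool
    return line
end
```

A handler would call it as local reply, err = ping_backend("127.0.0.1", 6379) and branch on err; the worker keeps serving other requests while this coroutine waits.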

To sum up, by combining Lua coroutines with nginx's event-driven mechanism, OpenResty makes it easy to extend nginx with Lua scripts.

OpenResty hooks (programming hooks)

init_by_lua

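This phase is mainly used to preload Lua modules, for example loading the JSON module globally with require 'cjson.safe', and to initialize global lua_shared_dict data. Because it runs in the master process before the workers are forked, the loaded data can benefit from the operating system's copy-on-write mechanism. Reloading nginx re-runs the code in this phase.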

init_worker_by_lua

This phase can be used to set up per-worker timers, heartbeat checks, and the like.

rewrite_by_lua

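One of the most frequently used hooks in practice. It is suited to request redirection and rewriting logic, such as rewriting the Host header, the request arguments, or the request path.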

access_by_lua

This phase is for access-control logic such as dynamic rate limiting, throttling, and anti-hotlinking.

content_by_lua

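This phase generates the content of the HTTP response. It conflicts with the proxy_pass directive: only one of the two can be used in the same location. It can be used to talk to backends dynamically, such as MySQL, Redis, or Kafka, or to generate HTTP content dynamically, for example reimplementing in Lua the slice functionality that nginx provides in C, to split large files into pieces.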

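balancer_by_lua

This phase can be used to set the upstream address for proxy_pass dynamically, for example implementing in Lua a consistent-hash backend selection algorithm with health checking, which marks an address as available or unavailable based on the upstream's responses.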

body_filter_by_lua

Filters and transforms the response body, for example gzip-compressing a chunked body. It can also set ngx.var.limit_rate dynamically based on the body size.

header_filter_by_lua

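Adjusts the response headers sent to the client; also one of the most commonly used hooks. Examples include setting the Server response header or modifying the Cache-Control caching header.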

log_by_lua

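On the one hand, this phase can set the values of fields written to the nginx log; on the other, log data can be shipped to a designated HTTP server with cosockets. Note that cosockets are not available directly in log_by_lua, so the send has to be scheduled through a zero-delay ngx.timer.at callback. Because the response headers and body have already been sent to the client, work done in this phase does not affect the client's response time.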
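
To make a few of these hooks concrete, here is a sketch of handler functions that the matching *_by_lua_block directives might call. The table name hooks, the host name backend.internal, the paths, and the example.com referer rule are all assumptions made for illustration; the ngx.req, ngx.var, ngx.header, and ngx.exit calls are lua-nginx-module APIs and only exist inside OpenResty.

```lua
-- Hypothetical handlers, one per phase. In nginx.conf each would be
-- invoked from the matching directive once this table is exposed as
-- a module (for example from access_by_lua_block).
local hooks = {}

-- rewrite phase: rewrite the Host header and the request path.
function hooks.rewrite()
    ngx.req.set_header("Host", "backend.internal")
    if ngx.var.uri == "/old" then
        ngx.req.set_uri("/new")
    end
end

-- access phase: a crude anti-hotlinking check on the Referer header.
function hooks.access()
    local referer = ngx.var.http_referer
    if referer and not referer:find("^https?://example%.com/") then
        return ngx.exit(ngx.HTTP_FORBIDDEN)
    end
end

-- header filter phase: adjust a response header sent to the client.
function hooks.header_filter()
    ngx.header["Cache-Control"] = "max-age=60"
end
```

Keeping the logic in plain functions like this also makes it possible to exercise the handlers against a stubbed ngx table outside nginx.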

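Common pitfalls when writing Lua for OpenResty

elseif: not the same as else if;
and & or: there is no ternary (?:) expression, and the a and b or c idiom breaks when b is false or nil; in Lua, 0 is true;
no continue: Lua has no continue statement; use if and else instead;
. & :: object.method and object:method behave differently; object:method is syntactic sugar that passes the object as the first argument, self;
forgot return _M: if a module forgets the final return _M, require returns true instead of the module table, and using the result fails with an "attempt to index a boolean value" error.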
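
The pitfalls above can be reproduced in a few lines of plain Lua. This is an illustrative sketch; the variable names and the demo_no_return module are made up for the example, and package.preload is used only to simulate a module file without touching the file system.

```lua
-- 1. `a and b or c` is not a real ternary: it falls through to c
--    whenever b is false or nil.
local enabled = true
local wrong = enabled and false or "default"
local right
if enabled then right = false else right = "default" end

-- 2. 0 is truthy; only false and nil are falsy.
local zero_is_true = (0 and true) or false

-- 3. No `continue`: wrap the loop body in an if instead.
local odds = {}
for i = 1, 6 do
    if i % 2 == 1 then          -- acts as "continue" when i is even
        odds[#odds + 1] = i
    end
end

-- 4. obj:method(...) is sugar for obj.method(obj, ...).
local counter = { n = 0 }
function counter:incr(step)     -- implicit first parameter `self`
    self.n = self.n + step
    return self.n
end
local via_colon = counter:incr(2)            -- self = counter, step = 2
local via_dot   = counter.incr(counter, 3)   -- same call, self passed by hand
local dot_ok = pcall(counter.incr, 5)        -- self = 5: indexing a number fails

-- 5. A module chunk that forgets `return _M` makes require() yield true.
package.preload["demo_no_return"] = function()
    local _M = {}
    function _M.hello() return "hi" end
    -- missing: return _M
end
local mod = require("demo_no_return")

print(wrong, right, zero_is_true, #odds, via_colon, via_dot, dot_ok, mod)
```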

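OpenResty programming optimizations

do local statement: declare variables local wherever possible; this speeds up variable lookup and avoids polluting the global namespace;
do not use blocked api: do not call APIs that block the Lua coroutine, such as Lua's native socket library, because they block the whole nginx worker;
use ngx.ctx instead of ngx.var: ngx.var goes through nginx's variable lookup system and is much less efficient than ngx.ctx;
decrease table resize: avoid resizing Lua tables; with LuaJIT a table of a given size can be preallocated up front (table.new). Frequent string concatenation with .. is a similar case: when the preallocated memory runs out, Lua has to grow the buffer dynamically (much like a C++ vector), which is inefficient;
use lua-resty-core: use the lua-resty-core API, which is implemented with LuaJIT's FFI and is more efficient than the classic C/Lua interaction;
use jit support function: avoid functions that cannot be JIT-compiled; see the LuaJIT documentation for which functions the JIT does not support;
ffi: for C interfaces you implement yourself, exposing them to Lua through FFI is also recommended.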
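
Two of these tips, local caching and avoiding repeated reallocation, can be sketched in plain Lua. The helper name join_lines is hypothetical. table.new is a LuaJIT extension, so the sketch falls back to an ordinary table constructor when it is not available.

```lua
-- Cache library functions in locals: a local is resolved directly,
-- while a global costs a table lookup on every use.
local concat = table.concat
local format = string.format

-- table.new is a LuaJIT extension; fall back to a plain constructor
-- when it is missing (for example under stock Lua).
local has_new, new_tab = pcall(require, "table.new")
if not has_new then
    new_tab = function(narr, nrec) return {} end
end

-- Build a string from n pieces: collect them in a presized array and
-- join once, instead of growing an intermediate string with `..`.
local function join_lines(n)
    local buf = new_tab(n, 0)       -- array part presized for n items
    for i = 1, n do
        buf[i] = format("line %d", i)
    end
    return concat(buf, "\n")
end

print(join_lines(3))
```

Joining once at the end means only one result string is allocated, whereas a loop of .. concatenations creates a new, longer string on every iteration.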

Commonly confused and misconfigured nginx settings

so_keepalive

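Used in the listen directive to probe whether connections are still alive. In client/server software built on TCP, if either side crashes, goes down, has its network cable unplugged, or sits behind a failed router while the connection is idle, the other side cannot know that the TCP connection is dead unless it keeps sending data on the connection and gets an error back. This is often not what we want. We want both server and client to detect a dead connection promptly, clean up gracefully, and report the error to the user.

There have long been two techniques for detecting an abnormal disconnect promptly: Keepalive, implemented in the TCP layer, and heartbeat packets, implemented by the application layer itself.

TCP does not enable Keepalive by default, because Keepalive consumes extra bandwidth and traffic. The amount is negligible, but it adds cost in environments billed by traffic. In addition, a poorly tuned Keepalive may drop a healthy TCP connection because of a brief network glitch. Moreover, the default Keepalive idle timeout is 7,200,000 milliseconds, that is, 2 hours, with 5 probes in the configuration shown here (stock Linux kernels default to 9). The system keepalive settings are: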

net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_time = 7200

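If so_keepalive is not set on listen, the socket is left with the operating system's settings, and with the defaults above it takes about 2 hours before such a dead connection is cleaned up. If the listen directive includes

so_keepalive=30m::10

then once a connection has been idle for half an hour it is probed every 75s (the interval field is omitted, so the system value applies), and the connection is released after 10 failed probes.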
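
The effect of the two configurations can be compared with a little arithmetic. The sketch below approximates the worst-case time to notice a dead peer as the idle time plus one interval per failed probe. detection_seconds is a hypothetical helper written for this example, not an nginx or kernel API, and the figure is a rough upper bound.

```lua
-- Rough worst-case time to notice a dead peer with TCP keepalive:
-- idle time before the first probe, plus one interval per failed probe.
-- `detection_seconds` is a hypothetical helper, not an nginx API.
local function detection_seconds(idle, interval, probes)
    return idle + interval * probes
end

-- so_keepalive=30m::10, with the interval left at the system value of 75s
local tuned = detection_seconds(30 * 60, 75, 10)

-- system values from the sysctl listing: 7200s idle, 75s interval, 5 probes
local system = detection_seconds(7200, 75, 5)

print(tuned, system)
```

Under these assumptions the tuned listener gives up on a dead connection after roughly 42 minutes instead of a little over two hours.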

sendfile/directio

sendfile

copies data between one file descriptor and another. Because this copying is done within the kernel, sendfile() is more efficient than the combination of read(2) and write(2), which would require transferring data to and from user space.

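As the Linux documentation shows, when nginx serves files cached on disk, sendfile lets it send the disk contents straight to the network card, avoiding read and write operations in user space.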

directio

Enables the use of the O_DIRECT flag (FreeBSD, Linux), the F_NOCACHE flag (macOS), or the directio() function (Solaris), when reading files that are larger than or equal to the specified size. The directive automatically disables (0.7.15) the use of sendfile for a given request

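With directio, file reads bypass the Linux file cache: the page cache is not used and data is read directly from disk. When aio is enabled together with directio, files smaller than the directio size are sent with sendfile, while files larger than or equal to the directio size are sent using aio. In other words, aio and directio suit large file downloads: large files do not belong in the operating system's buffers/cache, where they would waste memory, and Linux AIO (asynchronous disk I/O) requires direct I/O anyway.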

proxy_request_buffering

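Controls how the client request body is handled. When set to on, nginx receives the entire body from the client before processing the request. For example, when nginx acts as a reverse proxy for client uploads, it receives the full body first and then forwards it upstream, so that nginx can retry the upload several times if the upstream fails. The downside is that with large bodies and a heavily loaded nginx there will be a lot of disk writes, and disk capacity requirements are higher too. When set to off, the transfer becomes streaming, chunk by chunk, and a transfer error more often requires the client to retry.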

proxy_buffer_size

Sets the size of the buffer used for reading the first part of the response received from the proxied server. This part usually contains a small response header. By default, the buffer size is equal to one memory page. This is either 4K or 8K, depending on a platform.

proxy_buffers

Sets the number and size of the buffers used for reading a response from the proxied server, for a single connection. By default, the buffer size is equal to one memory page. This is either 4K or 8K, depending on a platform.

proxy_buffering

Enables or disables buffering of responses from the proxied server.

When buffering is enabled, nginx receives a response from the proxied server as soon as possible, saving it into the buffers set by the proxy_buffer_size and proxy_buffers directives. If the whole response does not fit into memory, a part of it can be saved to a temporary file on the disk. Writing to temporary files is controlled by the proxy_max_temp_file_size and proxy_temp_file_write_size directives.

When buffering is disabled, the response is passed to a client synchronously, immediately as it is received. nginx will not try to read the whole response from the proxied server. The maximum size of the data that nginx can receive from the server at a time is set by the proxy_buffer_size directive.

With proxy_buffering on, the upstream response can be handled using both the proxy_buffer_size and proxy_buffers buffers; with proxy_buffering off, only the single proxy_buffer_size buffer is available.

proxy_busy_buffers_size

When buffering of responses from the proxied server is enabled, limits the total size of buffers that can be busy sending a response to the client while the response is not yet fully read. In the meantime, the rest of the buffers can be used for reading the response and, if needed, buffering part of the response to a temporary file. By default, size is limited by the size of two buffers set by the proxy_buffer_size and proxy_buffers directives.

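Sending the upstream response on to the client also needs buffer space, namely for the part that has been sent to the client but not yet acknowledged. These buffers are also allocated from proxy_buffers, and this directive limits how much of proxy_buffers can be used that way.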

keepalive

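Keepalive can be configured both on the client-facing side in nginx.conf and in an upstream block. On the client-facing side, it means that after nginx, acting as the HTTP server, has sent the response, it does not close the connection but leaves it in the ESTAB state, i.e. keepalive. In an upstream block, the keepalive directive makes nginx keep its connections to the upstream servers open for reuse, with requests sent upstream asking the server to keep the connection alive. Note that both the server and the client must enable keepalive for persistent connections to work. The keepalive directive also needs to be used together with the following two directives: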

keepalive_requests 100;
keepalive_timeout 65;

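keepalive_requests is the number of requests a persistent connection can serve before it is closed; keepalive_timeout is how long a persistent connection may stay idle before it can be closed. Setting keepalive_timeout too high increases the number of connections in the ESTAB state on the nginx server.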

nginx maintenance and upgrades

The mapping between nginx signals and nginx operations is as follows:

nginx operation	signal
reload	SIGHUP
reopen	SIGUSR1
stop	SIGTERM
quit	SIGQUIT
hot update	SIGUSR2 & SIGWINCH & SIGQUIT

stop vs quit

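stop sends SIGTERM, which demands a forced exit; quit sends SIGQUIT, which requests a graceful exit. The difference is that when a worker process receives the SIGQUIT message (the signal is not sent to it directly, hence "message"), it closes its listening sockets, closes currently idle connections (those that can be preempted), processes all timer events ahead of schedule, and then exits. Unless there is a special reason, use quit rather than stop.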

reload

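After the master process receives SIGHUP, it re-parses the configuration file, allocates shared memory, and performs a series of other tasks, then spawns a new batch of worker processes, and finally sends the message corresponding to SIGQUIT to the old workers, achieving a seamless restart. If parsing fails while the master is re-reading the configuration file, it rolls back to the original configuration; the reload has failed and the old workers keep serving.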

reopen

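After the master process receives SIGUSR1, it reopens all files it has open (such as logs) and then sends SIGUSR1 to each worker process, which does the same on receipt. reopen can be used for log rotation; the official nginx documentation offers this recipe: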

$ mv access.log access.log.0
$ kill -USR1 `cat master.nginx.pid`
$ sleep 1
$ gzip access.log.0    # do something with access.log.0

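The sleep 1 here is required: between the master sending the SIGUSR1 message to the workers and the workers actually reopening access.log there is a time window during which workers are still writing to access.log.0. Sleeping for 1s keeps access.log.0 complete (compressing immediately without the sleep could well lose log entries).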

hot update

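Sometimes we need to hot-upgrade the binary. nginx was designed with this capability, but it cannot be done through the nginx command line; the signals have to be sent by hand.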

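First send SIGUSR2 to the current master process. The master renames nginx.pid to nginx.pid.oldbin and forks a new process, which uses the execve system call to replace its process image with the new nginx ELF binary and becomes the new master. Once the new master is up, it parses the configuration file and so on, then forks new worker processes that start serving.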

Next send SIGWINCH to the old master, which then sends the SIGQUIT message to its workers so that they exit. Sending either SIGWINCH or SIGQUIT to a master makes its workers exit, but SIGWINCH does not make the master itself exit.

Finally, once we consider the old master's job done, we can send it SIGQUIT to make it exit.