Approaches to Zero-Downtime Service Updates

Date: 2019-11-10


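Starting with a question

Let me toss out a question to get things going. Take a statically compiled application, written in C, C++, Golang, or some other language. After we fix a bug or add a new feature, how do we update the application without taking the service offline?

It is an ordinary question, but some people think ordinary questions all the way through. Newton, for example, got hit by an apple, did a great deal of thinking about it, and ended up discovering the law of universal gravitation.

What would you do if an apple hit you?

Joking aside, would getting hit by an apple just knock us silly?

So let's return to the question:

After we fix a bug or add a new requirement, how do we upgrade the server application silky-smoothly, without interrupting service?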

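This question implies the following:

C, C++, and Go are statically compiled languages: all instructions are compiled into the executable file. Upgrading means building a new executable to replace the old one, so how does a process that is already running load the new image (the executable program file) and execute it?

Business logic that is in flight must not be interrupted, and connections that are being served must not be cut off abruptly.

We call this kind of silky-smooth application upgrade a hot update.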

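To put it as a picture:

You are riding in a truck, and the truck is doing 150 km/h.

Then a tire blows.

Then the driver says: just change it, I'm not stopping. Be careful while you do it.

"Oh, I get it, Lee. In situations like these we can't fall back on the cure-all 'restart' to solve the problem."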

First approach: thoughts prompted by gray release and A/B testing

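Gray release (also known as canary release) is a release method that transitions smoothly between black and white. A/B testing can be run on top of it: some users keep using product feature A while others start using product feature B, and if users have no objection to B, the scope is widened step by step until all users have been migrated to B. A gray release keeps the overall system stable, because problems can be found and corrected during the initial rollout, which limits their impact. nginx is one way to implement a gray release.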
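The gradual widening described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original article: users are identified by a string ID, and the names `bucket`, `variant`, and `rollout_percent` are hypothetical. Each user is hashed into a stable bucket, so a user who has been moved to B stays on B as the percentage grows.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in the range [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def variant(user_id: str, rollout_percent: int) -> str:
    """Return "B" for users inside the rollout percentage, otherwise "A"."""
    return "B" if bucket(user_id) < rollout_percent else "A"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for percent in (0, 10, 50, 100):
        on_b = sum(variant(u, percent) == "B" for u in users)
        print(f"rollout {percent:3d}%: {on_b} of {len(users)} users on B")
```

Because the bucket depends only on the user ID, raising `rollout_percent` never moves a user from B back to A.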

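nginx is reverse-proxy software that forwards requests from the public network to business servers on the internal network. In a layered system design we usually place nginx in the access layer; LVS, F5, Apache, and similar tools can forward user requests as well. For example, consider this nginx configuration: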

http {
    upstream cluster {
        ip_hash;
        server 192.168.2.128:8086 weight=1 fail_timeout=15 max_fails=3;
        server 192.168.2.130:8086 weight=2 fail_timeout=15 max_fails=3;
    }

    server {
        listen 8080;

        location / {
            proxy_pass http://cluster;
        }
    }
}

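Every request to port 8080 is forwarded to the upstream defined as cluster, and the upstream uses an IP-hash strategy to pass it on to the service on port 8086 of 192.168.2.128 or 192.168.2.130. ip_hash is what is configured here; nginx supports other strategies as well.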
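To make the idea of IP-hash routing concrete, here is a simplified Python sketch. It is not nginx's actual algorithm: the CRC32 hash and the `pick_server` function are assumptions made for illustration. It does mirror two documented properties of `ip_hash`: for IPv4 the key is the first three octets of the client address, and requests from the same client keep going to the same server.

```python
import zlib

# (address, weight) pairs taken from the upstream block above.
SERVERS = [("192.168.2.128:8086", 1), ("192.168.2.130:8086", 2)]


def pick_server(client_ip: str, servers=SERVERS) -> str:
    """Choose a backend from a hash of the client's first three IPv4 octets."""
    key = ".".join(client_ip.split(".")[:3])
    total_weight = sum(weight for _, weight in servers)
    slot = zlib.crc32(key.encode("ascii")) % total_weight
    # Walk the servers, giving each one as many slots as its weight.
    for address, weight in servers:
        if slot < weight:
            return address
        slot -= weight
    raise AssertionError("unreachable: slot is always below total_weight")


if __name__ == "__main__":
    for ip in ("10.0.0.1", "10.0.0.200", "10.0.1.1", "172.16.5.9"):
        print(ip, "->", pick_server(ip))
```

Clients in the same /24 network share a key, so `10.0.0.1` and `10.0.0.200` always land on the same backend.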

So how do we use nginx to upgrade a service silky-smoothly?

Take this nginx configuration:

http {
    upstream cluster {
        ip_hash;
        server 192.168.2.128:8086 weight=1 fail_timeout=15 max_fails=3;
        server 192.168.2.130:8086 weight=2 fail_timeout=15 max_fails=3;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://cluster;
        }
    }
}

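Suppose our service is deployed on 192.168.2.128. After fixing a bug or adding a new feature, we deploy a new instance of the service (say, on 192.168.2.130). We can then change the nginx configuration to the one above and run nginx -s reload to load it. Existing connections and services are not cut off, yet the new service can already start taking traffic. This is gray release through nginx, and testing done this way is called A/B testing. So how do we shut the old service down completely?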

Change the nginx configuration as follows, adding the down parameter to the corresponding server in the upstream:

http {
    upstream cluster {
        ip_hash;
        server 192.168.2.128:8086 weight=1 fail_timeout=15 max_fails=3 down;
        server 192.168.2.130:8086 weight=2 fail_timeout=15 max_fails=3;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://cluster;
        }
    }
}

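After waiting a while, the service on 192.168.2.128 can be stopped.

That is a silky-smooth approach built on nginx at the access layer. The same idea applies to other tools such as LVS and Apache, and it can also be implemented through DNS, ZooKeeper, etcd, and so on: the point is to direct all traffic to the new system.

Gray release solves the problem by forwarding traffic to a new system. But what about an application like nginx itself, or a case where I simply have to upgrade the image on this very machine? That requires a real hot update, and the questions to consider are: what if the old service has cached data? What if it is in the middle of handling business logic?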

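Second approach: nginx's hot update scheme

nginx uses a master/worker multi-process model. The master process manages the nginx processes as a whole, handling shutdown, log reopening, hot updates, and so on; the worker processes handle user requests.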

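All the listening ports configured in nginx are first set up in the master process, which creates each socket (sfd) and then calls bind and listen on it. When the master forks the worker processes, every worker inherits these listening sockets and adds them all to epoll. When a client connection then arrives, a worker process handles it; in other words, user requests are served by the workers.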
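The model above can be sketched in Python as a minimal pre-fork server. This is an illustration rather than nginx's implementation: `run_master` and `serve_once` are hypothetical names, each worker handles a single connection with a blocking accept instead of an epoll loop, and the code assumes a Unix-like system where `os.fork` is available.

```python
import os
import socket


def serve_once(listener: socket.socket) -> None:
    """Worker body: accept one connection and report which process served it."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(f"handled by worker {os.getpid()}\n".encode())


def run_master(num_workers: int = 2):
    """Create the listening socket in the master, then fork workers that inherit it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]
    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: the listening socket is already open, inherited across fork.
            try:
                serve_once(listener)
            finally:
                os._exit(0)
        worker_pids.append(pid)
    return listener, port, worker_pids


if __name__ == "__main__":
    listener, port, pids = run_master(2)
    listener.close()  # the master never accepts; the workers hold their own copies
    for _ in pids:
        with socket.create_connection(("127.0.0.1", port)) as client:
            print(client.makefile().readline().strip())
    for pid in pids:
        os.waitpid(pid, 0)
```

The master only sets the socket up and supervises; every reply comes from one of the forked workers.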

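With that background on nginx's I/O model in place, let's look at nginx's hot update scheme.

The upgrade steps:

Step 1: upgrade the nginx binary. First replace the old nginx executable with the new one, then send the USR2 signal to the nginx master process to tell it to start upgrading the executable. The master renames its old pid file by appending the .oldbin suffix, then forks a child that calls an exec function to bring up the new master and worker processes, and the new master's pid is written to the pid file.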

UID        PID  PPID  C STIME TTY          TIME CMD
root      4584     1  0 Oct17 ?        00:00:00 nginx: master process /usr/local/apigw/apigw_nginx/nginx
root     12936  4584  0 Oct26 ?        00:03:24 nginx: worker process
root     12937  4584  0 Oct26 ?        00:00:04 nginx: worker process
root     12938  4584  0 Oct26 ?        00:00:04 nginx: worker process
root     23692  4584  0 21:28 ?        00:00:00 nginx: master process /usr/local/apigw/apigw_nginx/nginx
root     23693 23692  3 21:28 ?        00:00:00 nginx: worker process
root     23694 23692  3 21:28 ?        00:00:00 nginx: worker process
root     23695 23692  3 21:28 ?        00:00:00 nginx: worker process

The exec family of functions is described in the manual page below:

NAME
       execl, execlp, execle, execv, execvp, execvpe - execute a file

SYNOPSIS
       #include <unistd.h>

       extern char **environ;

       int execl(const char *path, const char *arg, ...
                       /* (char  *) NULL */);
       int execlp(const char *file, const char *arg, ...
                       /* (char  *) NULL */);
       int execle(const char *path, const char *arg, ...
                       /*, (char *) NULL, char * const envp[] */);
       int execv(const char *path, char *const argv[]);
       int execvp(const char *file, char *const argv[]);
       int execvpe(const char *file, char *const argv[],
                       char *const envp[]);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       execvpe(): _GNU_SOURCE

DESCRIPTION
       The exec() family of functions replaces the current process image with a new process image.  The functions described in this manual page are front-ends for execve(2).  (See the manual page for execve(2) for further details about the replacement of the current process image.)

       The initial argument for these functions is the name of a file that is to be executed.

       The const char *arg and subsequent ellipses in the execl(), execlp(), and execle() functions can be thought of as arg0, arg1, ..., argn.  Together they describe a list of one or more pointers to null-terminated strings that represent the argument list available to the executed program.  The first argument, by convention, should point to the filename associated with the file being executed.  The list of arguments must be terminated by a null pointer, and, since these are variadic functions, this pointer must be cast (char *) NULL.

       The execv(), execvp(), and execvpe() functions provide an array of pointers to null-terminated strings that represent the argument list available to the new program.  The first argument, by convention, should point to the filename associated with the file being executed.  The array of pointers must be terminated by a null pointer.

       The execle() and execvpe() functions allow the caller to specify the environment of the executed program via the argument envp.  The envp argument is an array of pointers to null-terminated strings and must be terminated by a null pointer.  The other functions take the environment for the new process image from the external variable environ in the calling process.
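What makes exec usable for a hot update is that open file descriptors can survive it, so a freshly started image can keep serving on a socket that was bound earlier. The Python sketch below illustrates that mechanism only; it is not nginx's code. The function `exec_new_image`, the inline `CHILD` program, and the `LISTEN_FD` environment variable are all hypothetical names chosen for this example, and the code assumes a Unix-like system.

```python
import os
import socket
import sys

# Program run by the "new image": it rebuilds a socket object from the
# inherited descriptor number and serves one connection on it.
CHILD = r"""
import os, socket
listener = socket.socket(fileno=int(os.environ["LISTEN_FD"]))
conn, _ = listener.accept()
conn.sendall(b"served by new image, pid %d\n" % os.getpid())
conn.close()
"""


def exec_new_image(listener: socket.socket) -> int:
    """Fork, then exec a new program that inherits the listening socket."""
    fd = listener.fileno()
    os.set_inheritable(fd, True)  # Python marks descriptors close-on-exec by default
    env = dict(os.environ, LISTEN_FD=str(fd))
    pid = os.fork()
    if pid == 0:
        try:
            # Replace the child's process image; the descriptor stays open.
            os.execve(sys.executable, [sys.executable, "-c", CHILD], env)
        finally:
            os._exit(127)  # only reached if execve failed
    return pid


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    port = listener.getsockname()[1]
    child_pid = exec_new_image(listener)
    with socket.create_connection(("127.0.0.1", port)) as client:
        print(client.makefile().readline().strip())
    os.waitpid(child_pid, 0)
```

The parent never calls accept here, so the reply can only come from the exec'd program, which never called bind or listen itself.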

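Step 2: from this point on, all worker processes (old and new) keep accepting requests. Now send the WINCH signal to the old nginx master process. The master tells its worker processes to shut down gracefully, and each worker exits once it has finished handling its connections.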

UID        PID  PPID  C STIME TTY          TIME CMD
root      4584     1  0 Oct17 ?        00:00:00 nginx: master process /usr/local/apigw/apigw_nginx/nginx
root     12936  4584  0 Oct26 ?        00:03:24 nginx: worker process
root     12937  4584  0 Oct26 ?        00:00:04 nginx: worker process
root     12938  4584  0 Oct26 ?        00:00:04 nginx: worker process
root     23692  4584  0 21:28 ?        00:00:00 nginx: master process /usr/local/apigw/apigw_nginx/nginx

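If an old worker process still has connections to handle, it does not exit immediately; it exits only after they have been dealt with.

Step 3: after some time, only the new worker processes are handling new connections.

Note that the old master process does not close its listen sockets: if something goes wrong and a rollback is needed, the old master must be able to start its worker processes again.

Step 4: if the upgrade succeeds, send the QUIT signal to the old master process to stop it. If the new master process exits (unexpectedly), the old master removes the .oldbin suffix from its pid file.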
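The four steps boil down to a short sequence of signals, which the following Python sketch lays out. The `UPGRADE` list follows the steps above. The `ROLLBACK` list is not from this article: it reflects nginx's documented procedure (HUP to the old master, then QUIT to the new one) and should be read as an assumption here. The helper `run_steps` and its `send` and `pause` parameters are hypothetical names; in practice an operator would check the service between steps rather than send everything at once.

```python
import os
import signal

# (target master, signal, purpose), in the order the steps above describe.
UPGRADE = [
    ("old", signal.SIGUSR2, "start a new master and workers from the new binary"),
    ("old", signal.SIGWINCH, "ask the old workers to shut down gracefully"),
    ("old", signal.SIGQUIT, "stop the old master once the upgrade looks good"),
]

# Assumed rollback path, usable while the old master is still running.
ROLLBACK = [
    ("old", signal.SIGHUP, "have the old master start its workers again"),
    ("new", signal.SIGQUIT, "gracefully stop the new master and its workers"),
]


def run_steps(steps, pids, send=os.kill, pause=lambda purpose: None):
    """Send each signal to its target master, calling pause() after every step."""
    for target, sig, purpose in steps:
        send(pids[target], sig)
        pause(purpose)


if __name__ == "__main__":
    # PIDs taken from the ps listings above; a dry run that only prints commands.
    pids = {"old": 4584, "new": 23692}

    def dry_run(pid, sig):
        print(f"kill -{sig.name[3:]} {pid}")

    run_steps(UPGRADE, pids, send=dry_run)
```

Passing `send` and `pause` as parameters keeps the sequence testable and lets a real tool wait for confirmation between steps.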

The key steps and commands are described below:

Commands

Signals for the master process