在工做中,咱們常常須要重啓PHP-FPM,那麼這個重啓過程都發生了那些事情呢?讓咱們從PHP源碼中一探究竟吧。php
運行環境: Mac 10.14.2 + PHP 7.3.7segmentfault
信號在fpm的重啓中扮演着重要的角色。那什麼是信號呢?異步
信號是由用戶、系統或者進程發送給目標進程的信息,以通知目標進程某個狀態的改變或系統異常。Linux信號可由以下條件產生:socket
- 對於前臺進程,用戶能夠經過輸入特殊的終端字符來給它發送信號。
- 系統異常。好比浮點異常和非法內存段訪問。
- 系統狀態變化。好比 alarm 定時器到期將引發 SIGALARM 信號。
- 運行 kill 命令或調用 kill 函數
在PHP-FPM中,用戶經過kill
命令來重啓fpm,master進程也是經過kill()
函數向worker進程發送信號來結束進程。fpm的重啓分爲優雅重啓(kill -SIGUSR2
)和強制重啓(kill -SIGTERM
)兩種,下面是以優雅重啓爲例,master進程將收到SIGUSR2
信號。ide
master進程信號初始化函數fpm_signals_init_main()
主要作了兩件事情:函數
經過socketpair()
來建立這一對雙全工的unix_socket,其中sp[0]
的可讀事件在fpm_event_loop()
中被註冊到事件隊列中,其回調函數爲fpm_got_signal()
,這樣往sp[1]
寫入數據時將觸發sp[0]
的可讀事件回調。對這倆unix_socket還有兩個操做:php-fpm
fcntl(fd, F_SETFL, old_flags|O_NONBLOCK)
,這樣當fd不可讀或不可寫的時候,read()
、write()
不會阻塞,而是直接返回-1,errno設爲EAGAIN。fcntl(fd, F_SETFD, FD_CLOEXEC)
,這樣當進程調用exec()
族函數前會關閉該fd。這麼作是爲了防止文件描述符的泄露,由於調用exec()
族函數會用新程序替換掉當前進程執行的程序,進程的正文、數據、堆和棧段都會被替換,這就致使原先保存文件描述符的變量不存在了,也就沒法關閉「老進程「的fd,致使文件描述符泄露。註冊的信號有SIGTERM
、SIGINT
、SIGUSR1
、SIGUSR2
、SIGCHLD
、SIGQUIT
六種。oop
int fpm_signals_init_main() /* {{{ */ { struct sigaction act; if (0 > socketpair(AF_UNIX, SOCK_STREAM, 0, sp)) { zlog(ZLOG_SYSERROR, "failed to init signals: socketpair()"); return -1; } if (0 > fd_set_blocked(sp[0], 0) || 0 > fd_set_blocked(sp[1], 0)) { zlog(ZLOG_SYSERROR, "failed to init signals: fd_set_blocked()"); return -1; } if (0 > fcntl(sp[0], F_SETFD, FD_CLOEXEC) || 0 > fcntl(sp[1], F_SETFD, FD_CLOEXEC)) { zlog(ZLOG_SYSERROR, "falied to init signals: fcntl(F_SETFD, FD_CLOEXEC)"); return -1; } memset(&act, 0, sizeof(act)); act.sa_handler = sig_handler; sigfillset(&act.sa_mask); if (0 > sigaction(SIGTERM, &act, 0) || 0 > sigaction(SIGINT, &act, 0) || 0 > sigaction(SIGUSR1, &act, 0) || 0 > sigaction(SIGUSR2, &act, 0) || 0 > sigaction(SIGCHLD, &act, 0) || 0 > sigaction(SIGQUIT, &act, 0)) { zlog(ZLOG_SYSERROR, "failed to init signals: sigaction()"); return -1; } return 0; }
worker進程信號初始化函數fpm_signals_init_child()
主要作了三件事情:ui
這對unix_socket繼承自master進程,worker進程用不到它們。.net
sig_soft_quit()
,sa_flags
變量設爲SA_RESTART
表示信號處理函數返回後從新調用被中斷的系統調用,這樣worker進程正在處理中的事情不會受到影響。SIG_DFL
,即採用默認行爲。調用zend_signal_init()
,這個不展開講了。
int fpm_signals_init_child() /* {{{ */ { struct sigaction act, act_dfl; memset(&act, 0, sizeof(act)); memset(&act_dfl, 0, sizeof(act_dfl)); act.sa_handler = &sig_soft_quit; act.sa_flags |= SA_RESTART; act_dfl.sa_handler = SIG_DFL; close(sp[0]); close(sp[1]); if (0 > sigaction(SIGTERM, &act_dfl, 0) || 0 > sigaction(SIGINT, &act_dfl, 0) || 0 > sigaction(SIGUSR1, &act_dfl, 0) || 0 > sigaction(SIGUSR2, &act_dfl, 0) || 0 > sigaction(SIGCHLD, &act_dfl, 0) || 0 > sigaction(SIGQUIT, &act, 0)) { zlog(ZLOG_SYSERROR, "failed to init child signals: sigaction()"); return -1; } zend_signal_init(); return 0; }
master進程收到SIGUSR2
信號後將回調sig_handler()
進行信號處理。咱們能夠看到SIGUSR2
被映射爲2
,並寫入到 sp[1]
。
static void sig_handler(int signo) /* {{{ */ { static const char sig_chars[NSIG + 1] = { [SIGTERM] = 'T', [SIGINT] = 'I', [SIGUSR1] = '1', [SIGUSR2] = '2', [SIGQUIT] = 'Q', [SIGCHLD] = 'C' }; char s; int saved_errno; if (fpm_globals.parent_pid != getpid()) { /* prevent a signal race condition when child process have not set up it's own signal handler yet */ return; } saved_errno = errno; s = sig_chars[signo]; zend_quiet_write(sp[1], &s, sizeof(s)); //實際調用write() errno = saved_errno; }
當往sp[1]
寫入數據後,sp[0]
變爲可讀,觸發事件回調fpm_got_signal()
。從sp[0]
讀取到寫入的數據 2
,以後調用fpm_pctl()
來進行重啓操做。
static void fpm_got_signal(struct fpm_event_s *ev, short which, void *arg) /* {{{ */ { char c; int res, ret; int fd = ev->fd; do { do { res = read(fd, &c, 1); } while (res == -1 && errno == EINTR); if (res <= 0) { if (res < 0 && errno != EAGAIN && errno != EWOULDBLOCK) { zlog(ZLOG_SYSERROR, "unable to read from the signal pipe"); } return; } switch (c) { case 'C' : /* SIGCHLD */ zlog(ZLOG_DEBUG, "received SIGCHLD"); fpm_children_bury(); break; ...... case '2' : /* SIGUSR2 */ zlog(ZLOG_DEBUG, "received SIGUSR2"); zlog(ZLOG_NOTICE, "Reloading in progress ..."); fpm_pctl(FPM_PCTL_STATE_RELOADING, FPM_PCTL_ACTION_SET); break; } if (fpm_globals.is_child) { break; } } while (1); return; }
由下面的fpm_pctl()
代碼可知,對於FPM_PCTL_ACTION_SET
操做只有當fpm狀態fpm_state
爲正常時(FPM_PCTL_STATE_NORMAL
),重啓操做才能進行下去。
以後將重置已發送信號(fpm_signal_sent=0
),並設置fpm當前狀態爲FPM_PCTL_STATE_RELOADING
,而後調用fpm_pctl_action_next()
進行下一步操做。
void fpm_pctl(int new_state, int action) /* {{{ */ { switch (action) { case FPM_PCTL_ACTION_SET : if (fpm_state == new_state) { /* already in progress - just ignore duplicate signal */ return; } switch (fpm_state) { /* check which states can be overridden */ case FPM_PCTL_STATE_NORMAL : /* 'normal' can be overridden by any other state */ break; case FPM_PCTL_STATE_RELOADING : /* 'reloading' can be overridden by 'finishing' */ if (new_state == FPM_PCTL_STATE_FINISHING) break; case FPM_PCTL_STATE_FINISHING : /* 'reloading' and 'finishing' can be overridden by 'terminating' */ if (new_state == FPM_PCTL_STATE_TERMINATING) break; case FPM_PCTL_STATE_TERMINATING : /* nothing can override 'terminating' state */ zlog(ZLOG_DEBUG, "not switching to '%s' state, because already in '%s' state", fpm_state_names[new_state], fpm_state_names[fpm_state]); return; } fpm_signal_sent = 0; fpm_state = new_state; zlog(ZLOG_DEBUG, "switching to '%s' state", fpm_state_names[fpm_state]); /* fall down */ case FPM_PCTL_ACTION_TIMEOUT : fpm_pctl_action_next(); break; case FPM_PCTL_ACTION_LAST_CHILD_EXITED : fpm_pctl_action_last(); break; } }
此階段能夠當作是三個升級信號的發送過程:
SIGQUIT
信號,worker進程收到後會進行優雅關閉,並設置一個超時時爲process_control_timeout
的定時器事件,關於process_control_timeout
能夠看我另一篇文章【PHP】配置文件中的超時時間解析,定時器超時後最終將調用fpm_pctl(FPM_PCTL_STATE_UNSPECIFIED, FPM_PCTL_ACTION_TIMEOUT);
,從action名稱能夠看出是要進行超時的操做。fpm_pctl()
源碼可知,action FPM_PCTL_ACTION_TIMEOUT
仍然調用fpm_pctl_action_next()
,只不過此次SIGQUIT
信號會升級爲SIGTERM
發送給worker進程,定時器超時時間變爲1s。SIGTERM
會升級爲終極信號SIGKILL
。SIGKILL
信號相比SIGTERM
是不可被捕獲或者忽略的,它將強行終止worker進程。static void fpm_pctl_action_next() /* {{{ */ { int sig, timeout; if (!fpm_globals.running_children) { fpm_pctl_action_last(); } if (fpm_signal_sent == 0) { if (fpm_state == FPM_PCTL_STATE_TERMINATING) { sig = SIGTERM; } else { sig = SIGQUIT; } timeout = fpm_global_config.process_control_timeout; } else { if (fpm_signal_sent == SIGQUIT) { sig = SIGTERM; } else { sig = SIGKILL; } timeout = 1; } // 實際調用kill() fpm_pctl_kill_all(sig); fpm_signal_sent = sig; fpm_pctl_timeout_set(timeout); }
worker進程主要處理master發送過來的三個信號,即SIGQUIT
、SIGTERM
、SIGKILL
。
SIGQUIT
信號的回調事件是sig_soft_quit()
。它首先會關閉listening_socket
,而且將in_shutdown
置爲1,這樣accept()
系統調用將當即返回-1,worker進程再也不接收請求,開始結束進程的操做。static void sig_soft_quit(int signo) /* {{{ */ { int saved_errno = errno; /* closing fastcgi listening socket will force fcgi_accept() exit immediately */ close(fpm_globals.listening_socket); if (0 > socket(AF_UNIX, SOCK_STREAM, 0)) { zlog(ZLOG_WARNING, "failed to create a new socket"); } // 設置in_shutdown=1 fpm_php_soft_quit(); errno = saved_errno; } int fcgi_accept_request(fcgi_request *req) { while (1) { if (req->fd < 0) { while (1) { if (in_shutdown) { return -1; } ...... req->fd = accept(listen_socket, (struct sockaddr *)&sa, &len); ...... } } else { fcgi_close(req, 1, 1); } } }
SIGTERM
信號採用SIG_DFL
默認處理方式,即終止進程,能夠被阻塞、捕獲、忽略。SIGKILL
信號不能被捕獲或者忽略,將強行終止worker進程。worker進程的狀態發生變化時,被終止或者暫停,內核會向master進程發送一個異步通知,即SIGCHLD
信號,由信號處理函數fpm_got_signal()
可知將執行fpm_children_bury()
。
下面將fpm_children_bury()
的代碼拆解到對應部分下。
在這裏先介紹下waitpid()
是幹嗎的:
當子進程結束的時候,內核會爲終止子進程保存必定量的信息,這些信息至少包括進程ID、該進程的的終止狀態、以及該進程使用的CPU時間總量。一個已經終止、可是其父進程還沒有對其進行善後處理(獲取終止子進程的有關信息,釋放它仍佔用的資源)的進程會成爲殭屍進程。殭屍進程的進程號會被一直佔用着,可是系統所能使用的進程號是有限的,因此若是有大量的殭屍進程產生,將由於沒有可用的進程號而致使系統不能產生新的進程。
wait()
或waitpid()
就可讓父進程獲取到這些信息,並被內核釋放掉。
// 最外層循環 while ( (pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) { ...... }
master進程經過waitpid()
獲取到終止的worker進程的pid
和終止狀態status
後,將對status
進行一些判斷
WTERMSIG(status)
來獲取時子進程終止的信號編號。request_slowlog_timeout
後,master進程的心跳檢測模塊會給worker進程發送SIGSTOP
信號,worker進程被暫停,狀態發生變化,內核向master進程發送SIGCHLD
信號,以後就會執行到這裏。最後將調用fpm_php_trace()
函數來打印致使請求slow的堆棧信息。if (WIFEXITED(status)) { snprintf(buf, sizeof(buf), "with code %d", WEXITSTATUS(status)); /* if it's been killed because of dynamic process management * don't restart it automaticaly */ if (child && child->idle_kill) { restart_child = 0; } // 調用fpm_php_trace() if (WEXITSTATUS(status) != FPM_EXIT_OK) { severity = ZLOG_WARNING; } } else if (WIFSIGNALED(status)) { const char *signame = fpm_signal_names[WTERMSIG(status)]; const char *have_core = WCOREDUMP(status) ? " - core dumped" : ""; if (signame == NULL) { signame = ""; } snprintf(buf, sizeof(buf), "on signal %d (%s%s)", WTERMSIG(status), signame, have_core); /* if it's been killed because of dynamic process management * don't restart it automaticaly */ if (child && child->idle_kill && WTERMSIG(status) == SIGQUIT) { restart_child = 0; } if (WTERMSIG(status) != SIGQUIT) { /* possible request loss */ severity = ZLOG_WARNING; } } else if (WIFSTOPPED(status)) { zlog(ZLOG_NOTICE, "child %d stopped for tracing", (int) pid); if (child && child->tracer) { child->tracer(child); } continue; }
child = fpm_child_find(pid); if (child) { struct fpm_worker_pool_s *wp = child->wp; struct timeval tv1, tv2; // 資源釋放 fpm_child_unlink(child); fpm_scoreboard_proc_free(wp->scoreboard, child->scoreboard_i); fpm_clock_get(&tv1); timersub(&tv1, &child->started, &tv2); ...... // 關閉標準輸出、標準錯誤 fpm_child_close(child, 1 /* in event_loop */); // 在後文中詳解 fpm_pctl_child_exited(); ...... } else { zlog(ZLOG_ALERT, "oops, unknown child (%d) exited %s. Please open a bug report (https://bugs.php.net).", pid, buf); }
從fpm_pctl_child_exited()
源碼可知,若是這是最後一個worker進程的終止,將調用fpm_pctl(FPM_PCTL_STATE_UNSPECIFIED, FPM_PCTL_ACTION_LAST_CHILD_EXITED);
。
int fpm_pctl_child_exited() /* {{{ */ { if (fpm_state == FPM_PCTL_STATE_NORMAL) { return 0; } if (!fpm_globals.running_children) { fpm_pctl(FPM_PCTL_STATE_UNSPECIFIED, FPM_PCTL_ACTION_LAST_CHILD_EXITED); } return 0; }
繼續追蹤源碼會發現,在重啓操做中最後會調用fpm_pctl_exec()
。
execvp()
函數將從新執行php-fpm
程序,當前進程的正文、數據、堆和棧段都將被替換掉。
static void fpm_pctl_exec() /* {{{ */ { fpm_cleanups_run(FPM_CLEANUP_PARENT_EXEC); execvp(saved_argv[0], saved_argv); // 正常狀況不會走到這裏 zlog(ZLOG_SYSERROR, "failed to reload: execvp() failed"); exit(FPM_EXIT_SOFTWARE); }
至此,PHP-FPM就完成了重啓。
PHP打印了不少Debug日誌,你們能夠在php-fpm.conf中將log_level
選項設置爲debug
來開啓。下面是debug日誌的例子,能夠對照着理解下上文內容。
[16-Jul-2019 16:51:40.248439] DEBUG: pid 36507, fpm_got_signal(), line 110: received SIGUSR2 [16-Jul-2019 16:51:40.248711] NOTICE: pid 36507, fpm_got_signal(), line 111: Reloading in progress ... [16-Jul-2019 16:51:40.248909] DEBUG: pid 36507, fpm_pctl(), line 229: switching to 'reloading' state [16-Jul-2019 16:51:40.249112] DEBUG: pid 36507, fpm_pctl_kill_all(), line 157: [pool www] sending signal 3 SIGQUIT to child 36508 [16-Jul-2019 16:51:40.249360] DEBUG: pid 36507, fpm_pctl_kill_all(), line 166: 1 child(ren) still alive [16-Jul-2019 16:51:40.249624] DEBUG: pid 36507, fpm_event_loop(), line 417: event module triggered 1 events [16-Jul-2019 16:51:40.256626] DEBUG: pid 36507, fpm_got_signal(), line 74: received SIGCHLD [16-Jul-2019 16:51:40.256968] DEBUG: pid 36507, fpm_children_bury(), line 259: [pool www] child 36508 exited with code 0 after 16.412179 seconds from start [16-Jul-2019 16:51:40.257411] NOTICE: pid 36507, fpm_pctl_exec(), line 96: reloading: execvp("/usr/local/Cellar/php/7.3.7/sbin/php-fpm", {"/usr/local/Cellar/php/7.3.7/sbin/php-fpm", "--fpm-config=/usr/local/etc/php/7.3.7/php-fpm.conf", "--pid=/usr/local/Cellar/php/7.3.7/var/run/php-fpm.pid"}) [16-Jul-2019 16:51:40.319184] DEBUG: pid 36507, fpm_unix_init_main(), line 518: The calling process is waiting for the master process to ping via fd=4 [16-Jul-2019 16:51:40.321064] DEBUG: pid 36699, fpm_scoreboard_init_main(), line 38: got clock tick '100' [16-Jul-2019 16:51:40.321588] NOTICE: pid 36699, fpm_sockets_init_main(), line 417: using inherited socket fd=7, "127.0.0.1:9001" [16-Jul-2019 16:51:40.321588] NOTICE: pid 36699, fpm_sockets_init_main(), line 417: using inherited socket fd=7, "127.0.0.1:9001" [16-Jul-2019 16:51:40.321782] DEBUG: pid 36699, fpm_socket_af_inet_socket_by_addr(), line 290: Found address for 127.0.0.1, socket opened on 127.0.0.1 [16-Jul-2019 16:51:40.321969] DEBUG: pid 36699, fpm_event_init_main(), line 335: event module is kqueue and 1 fds have been reserved [16-Jul-2019 16:51:40.322374] NOTICE: pid 36699, fpm_init(), line 83: fpm is running, pid 36699 [16-Jul-2019 16:51:40.322505] DEBUG: pid 36699, main(), line 1858: Sending "1" (OK) to parent via fd=5 [16-Jul-2019 16:51:40.322648] DEBUG: pid 36507, fpm_unix_init_main(), line 537: I received a valid acknowledge from the master process, I can exit without error [16-Jul-2019 16:51:40.322977] DEBUG: pid 36699, fpm_children_make(), line 428: [pool www] child 36702 started [16-Jul-2019 16:51:40.323302] DEBUG: pid 36699, fpm_event_loop(), line 364: 1296 bytes have been reserved in SHM [16-Jul-2019 16:51:40.323498] NOTICE: pid 36699, fpm_event_loop(), line 365: ready to handle connections