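I had never paid much attention to this problem until it came up in recent job interviews. I interviewed at two companies and both of them asked about it, so I had to start taking it seriously. If the CLOSE_WAIT state is unfamiliar to you, start by looking at the TCP state transition diagram.
A socket can be closed in two ways: active closure and passive closure. The former is a close initiated by the local host. The latter means the local host detects that the remote host has initiated a close, responds to it, and thereby closes the whole connection. Extracting the close-related part of the state transitions gives the diagram below:
[Figure: the connection-close portion of the TCP state transition diagram]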
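Cause
Using the diagram, let us work out when a connection is in the CLOSE_WAIT state.
In a passive close, the connection is in CLOSE_WAIT from the moment the peer's FIN has been received until the local side sends its own FIN.
Normally the CLOSE_WAIT state should be short-lived, much like SYN_RCVD. In some situations, however, connections stay in CLOSE_WAIT for a long time.
A large number of CLOSE_WAIT connections usually means that the peer has closed its socket while our side, busy reading or writing, never closed its own end. The code has to check the socket: if read returns 0, close the connection; if read returns a negative value, check errno, and if it is not EAGAIN, close the connection as well.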
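The rule above can be sketched as a small helper. This is only a minimal sketch, not code from the original test program: the name `read_or_close` and its return convention are made up for illustration, and treating EWOULDBLOCK and EINTR the same way as EAGAIN is an added assumption.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read from fd and close it when the peer is gone.
 * Returns  >0  number of bytes read,
 *           0  no data right now (transient error, try again later),
 *          -1  the connection was closed; fd must not be used again. */
ssize_t read_or_close(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n > 0)
        return n;                       /* normal data */

    /* Assumption: EWOULDBLOCK and EINTR are treated like EAGAIN. */
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return 0;                       /* keep the socket open */

    /* n == 0: the peer sent FIN, so we stay in CLOSE_WAIT until we close.
     * n < 0 with any other errno: treat the connection as broken. */
    close(fd);
    return -1;
}
```

A caller would invoke this whenever the descriptor becomes readable and drop its own bookkeeping for the connection when it returns -1.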
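Reference 4 describes producing CLOSE_WAIT connections by sending SYN-FIN segments. I have not tried this experimentally, but personally I think the protocol stack will drop such illegal segments. Interested readers are welcome to test it and tell me the result ;-)
To illustrate the problem more clearly, let us write a test program. Note that this test program is deliberately flawed.
All we need is a situation where the peer has closed its socket while we are still reading, or where we simply never close the socket.
server.c: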
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAXLINE 80
#define SERV_PORT 8000

int main(void)
{
    struct sockaddr_in servaddr, cliaddr;
    socklen_t cliaddr_len;
    int listenfd, connfd;
    char buf[MAXLINE];
    char str[INET_ADDRSTRLEN];
    int i, n;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);

    /* Allow quick restarts of the test server on the same port. */
    int opt = 1;
    setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);

    bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
    listen(listenfd, 20);

    printf("Accepting connections ...\n");
    while (1) {
        cliaddr_len = sizeof(cliaddr);
        connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len);
        /* The inner loop is disabled, so each connection is read only once. */
        //while (1)
        {
            n = read(connfd, buf, MAXLINE);
            if (n == 0) {
                printf("the other side has been closed.\n");
                break;
            }
            if (n < 0) {
                perror("read");
                break;
            }
            printf("received from %s at PORT %d\n",
                   inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
                   ntohs(cliaddr.sin_port));
            for (i = 0; i < n; i++)
                buf[i] = toupper((unsigned char)buf[i]);
            write(connfd, buf, n);
        }
        /* The socket is deliberately not closed here; adding a sleep
           before close() reproduces the problem as well. */
        //sleep(5);
        //close(connfd);
    }
    return 0;
}
client.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAXLINE 80
#define SERV_PORT 8000

int main(int argc, char *argv[])
{
    struct sockaddr_in servaddr;
    char buf[MAXLINE];
    int sockfd, n;
    char *str;

    if (argc != 2) {
        fputs("usage: ./client message\n", stderr);
        exit(1);
    }
    str = argv[1];

    sockfd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
    servaddr.sin_port = htons(SERV_PORT);

    connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

    /* Send the message, print the reply, then close (this sends the FIN). */
    write(sockfd, str, strlen(str));
    n = read(sockfd, buf, MAXLINE);
    printf("Response from server:\n");
    fflush(stdout);
    if (n > 0)
        write(STDOUT_FILENO, buf, n);
    write(STDOUT_FILENO, "\n", 1);

    close(sockfd);
    return 0;
}
The result is as follows:
debian-wangyao:~$ ./client a
Response from server:
A
debian-wangyao:~$ ./client b
Response from server:
B
debian-wangyao:~$ ./client c
Response from server:
C
debian-wangyao:~$ netstat -antp | grep CLOSE_WAIT
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        1      0 127.0.0.1:8000          127.0.0.1:58309         CLOSE_WAIT  6979/server
tcp        1      0 127.0.0.1:8000          127.0.0.1:58308         CLOSE_WAIT  6979/server
tcp        1      0 127.0.0.1:8000          127.0.0.1:58307         CLOSE_WAIT  6979/server
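The same check can be done from a program instead of netstat. The sketch below assumes Linux, where /proc/net/tcp lists IPv4 TCP sockets with a hexadecimal state column in which 08 means CLOSE_WAIT (IPv6 sockets are in /proc/net/tcp6 and are not covered here). The function names `tcp_state_of_line` and `count_close_wait` are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

#define TCP_STATE_CLOSE_WAIT 0x08   /* state code used in /proc/net/tcp */

/* Parse the "st" column of one /proc/net/tcp row.
 * Returns -1 for the header line or anything that is not a socket entry. */
int tcp_state_of_line(const char *line)
{
    unsigned int st;

    /* Row layout: "sl: local_address rem_address st ..." */
    if (sscanf(line, " %*d: %*s %*s %x", &st) != 1)
        return -1;
    return (int)st;
}

/* Count the sockets in CLOSE_WAIT listed in the given file.
 * Returns -1 if the file cannot be opened. */
int count_close_wait(const char *path)
{
    char line[512];
    int count = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (tcp_state_of_line(line) == TCP_STATE_CLOSE_WAIT)
            count++;
    }
    fclose(fp);
    return count;
}
```

Calling `count_close_wait("/proc/net/tcp")` while the flawed server is running should report one entry per leaked connection, though it counts every process on the machine, not just the test server.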
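Solutions
The basic idea is to detect sockets that the peer has already closed, and then close them.
1. The code has to check the socket: as soon as read returns 0, close the connection; if read returns a negative value, check errno, and if it is not EAGAIN, close the connection as well. (Note: Figure 7.6 in section 7.5 of UNP shows that select can detect that the peer has sent a FIN; combined with this rule, CLOSE_WAIT connections can be handled.)
2. Give every socket a timestamp, last_update, and set it to the current time whenever data is received or sent successfully. Periodically check all the timestamps; if the difference between a timestamp and the current time exceeds a threshold, close that socket.
3. Use a heartbeat thread that periodically sends a heartbeat packet in an agreed format over the socket. If an RST segment comes back, the peer has already closed its socket, so we close ours too.
4. Set the SO_KEEPALIVE option and tune the kernel parameters.
The prerequisite is to enable the KEEPALIVE mechanism on the socket:
// Enable KEEPALIVE on the socket connection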
int iKeepAlive = 1;
setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (void *)&iKeepAlive, sizeof(iKeepAlive));
tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
The number of seconds between TCP keep-alive probes.
tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end.
tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are only sent when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connection is terminated after approximately an additional 11 minutes (9 probes an interval of 75 seconds apart) when keep-alive is enabled.
echo 120 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 2 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 1 > /proc/sys/net/ipv4/tcp_keepalive_probes
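As a rough check of what these values buy, the time before a dead peer is noticed is approximately the idle time plus one interval per unanswered probe, which matches the "additional 11 minutes" in the man page text. The helper names below are hypothetical and the formula is an approximation, not an exact kernel guarantee.

```c
#include <stdio.h>

/* Approximate time, in seconds, before keep-alive gives up on a dead peer:
 * idle time before the first probe plus one interval per unanswered probe. */
long keepalive_detect_seconds(long keepalive_time, long intvl, long probes)
{
    return keepalive_time + intvl * probes;
}

/* Print the budget for the kernel defaults and for the tuned values. */
void print_keepalive_budget(void)
{
    printf("defaults: %ld s\n", keepalive_detect_seconds(7200, 75, 9));
    printf("tuned:    %ld s\n", keepalive_detect_seconds(120, 2, 1));
}
```

With the defaults this gives 7875 seconds (about 2 hours 11 minutes); with the values written to /proc above it drops to 122 seconds.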
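Besides changing the kernel parameters, which affects every socket on the system, you can set the same values per socket with setsockopt. See man 7 tcp for these options (SO_KEEPALIVE itself is described in man 7 socket).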
/* Requires <netinet/tcp.h> for the TCP_KEEP* option names. */
int KeepAliveProbes = 1;
int KeepAliveIntvl = 2;
int KeepAliveTime = 120;
setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, (void *)&KeepAliveProbes, sizeof(KeepAliveProbes));
setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, (void *)&KeepAliveTime, sizeof(KeepAliveTime));
setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, (void *)&KeepAliveIntvl, sizeof(KeepAliveIntvl));
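Method 2 from the list above (the last_update timestamp) involves no kernel settings at all, so a short sketch may help. This is one possible shape under stated assumptions, not the author's implementation: the `struct conn` layout, the function names `conn_touch` and `conn_sweep`, and the convention that fd -1 marks a free slot are all invented for illustration.

```c
#include <time.h>
#include <unistd.h>

/* Hypothetical per-connection record. */
struct conn {
    int    fd;           /* -1 marks a free slot */
    time_t last_update;  /* time of the last successful read or write */
};

/* Call after every successful read() or write() on the connection. */
void conn_touch(struct conn *c, time_t now)
{
    c->last_update = now;
}

/* Close every connection that has been idle for more than max_idle seconds.
 * Returns the number of sockets closed. */
int conn_sweep(struct conn *table, int n, time_t now, double max_idle)
{
    int closed = 0;

    for (int i = 0; i < n; i++) {
        if (table[i].fd < 0)
            continue;                       /* unused slot */
        if (difftime(now, table[i].last_update) > max_idle) {
            close(table[i].fd);             /* leaves CLOSE_WAIT if the peer had closed */
            table[i].fd = -1;
            closed++;
        }
    }
    return closed;
}
```

A server would call `conn_sweep` from a periodic timer, for example once per pass of its select loop. The threshold is a trade-off: too short and slow but healthy clients get disconnected, too long and leaked sockets linger.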
References:
http://blog.chinaunix.net/u/20146/showart_1217433.html
http://blog.csdn.net/eroswang/archive/2008/03/10/2162986.aspx
http://haka.sharera.com/blog/BlogTopic/32309.htm
http://learn.akae.cn/media/ch37s02.html
http://faq.csdn.net/read/208036.html
http://www.cndw.com/tech/server/2006040430203.asp
http://davidripple.bokee.com/1741575.html
http://doserver.net/post/keepalive-linux-1.php
man 7 tcp