Solving the PHP socket "Address already in use" problem

Date: 2019-11-06


In order for a network connection to close, both ends have to send FIN (final) packets, which indicate they will not send any additional data, and both ends must ACK (acknowledge) each other's FIN packets. The FIN packets are initiated by the application performing a close(), a shutdown(), or an exit(). The ACKs are handled by the kernel after the close() has completed. Because of this, it is possible for the process to complete before the kernel has released the associated network resource, and this port cannot be bound to another process until the kernel has decided that it is done.

Figure 1

Figure 1 shows all of the possible states that can occur during a normal closure, depending on the order in which things happen. Note that if you initiate closure, there is a TIME_WAIT state that is absent from the other side. This TIME_WAIT is necessary in case the ACK you sent wasn't received, or in case spurious packets show up for other reasons. I'm really not sure why this state isn't necessary on the other side, when the remote end initiates closure, but this is definitely the case. TIME_WAIT is the state that typically ties up the port for several minutes after the process has completed. The length of the associated timeout varies between operating systems and may be dynamic on some of them, but typical values are in the range of one to four minutes.

If both ends send a FIN before either end receives it, both ends will have to go through TIME_WAIT.

Normal Closure of Listen Sockets

A socket which is listening for connections can be closed immediately if there are no connections pending, and the state proceeds directly to CLOSED. If connections are pending, however, FIN_WAIT_1 is entered, and a TIME_WAIT is inevitable.

Note that it is impossible to completely guarantee a clean closure here. While you can check the connections using a select() call before closure, a tiny but real possibility exists that a connection could arrive after the select() but before the close().

Abnormal Closure

If the remote application dies unexpectedly while the connection is established, the local end will have to initiate closure. In this case TIME_WAIT is unavoidable. If the remote end disappears due to a network failure, or the remote machine reboots (both are rare), the local port will be tied up until each state times out. Worse, some older operating systems do not implement a timeout for FIN_WAIT_2, and it is possible to get stuck there forever, in which case restarting your server could require a reboot.

If the local application dies while a connection is active, the port will be tied up in TIME_WAIT. This is also true if the application dies while a connection is pending.

Strategies for Avoidance

SO_REUSEADDR

You can use setsockopt() to set the SO_REUSEADDR socket option, which explicitly allows a process to bind to a port which remains in TIME_WAIT (it still only allows a single process to be bound to that port). This is both the simplest and the most effective option for reducing the "address already in use" error.
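As a rough illustration of the option described here, the following C sketch creates a listening socket with SO_REUSEADDR set before bind(). The helper name listen_reuseaddr, the backlog of 16, and the choice to bind to INADDR_ANY are assumptions made for this example; they do not come from the original article.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP listener on the given port with
 * SO_REUSEADDR enabled. Returns the listening descriptor, or -1 on error. */
int listen_reuseaddr(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* The option must be set before bind() to have any effect on it. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* all local addresses */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {                 /* backlog 16 is arbitrary */
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would call something like listen_reuseaddr(8080) at startup and pass the returned descriptor to accept(). Passing port 0 lets the kernel pick a free port, which is convenient for experiments.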

Oddly, using SO_REUSEADDR can actually lead to more difficult "address already in use" errors. SO_REUSEADDR permits you to use a port that is stuck in TIME_WAIT, but you still cannot use that port to establish a connection to the last place it connected to. What? Suppose I pick local port 1010, and connect to foobar.com port 300, and then close locally, leaving that port in TIME_WAIT. I can reuse local port 1010 right away to connect to anywhere except for foobar.com port 300.

A situation where this might be a problem is if my program is trying to find a reserved local port (< 1024) to connect to some service which likes reserved ports. If I used SO_REUSEADDR, then each time I run the program on my machine, I'll keep getting the same local reserved port, even if it is stuck in TIME_WAIT, and I risk getting a "connect: Address already in use" error if I go back to any place I've been to in the last few minutes. The solution here is to avoid SO_REUSEADDR.

Some folks don't like SO_REUSEADDR because it has a security stigma attached to it. On some operating systems it allows the same port to be used with a different address on the same machine by different processes at the same time. This is a problem because most servers bind to the port but not to a specific address; instead they use INADDR_ANY (this is why things show up in netstat output as *.8080). So if the server is bound to *.8080, a malicious user on the local machine can bind to local-machine.8080, which will intercept all of your connections since it is more specific. This is only a problem on multi-user machines that don't have restricted logins; it is NOT a vulnerability from outside the machine. And it is easily avoided by binding your server to the machine's address.
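Binding to the machine's address rather than INADDR_ANY might look roughly like the sketch below. The helper name bind_to_address and the use of inet_pton() on a dotted-quad string are assumptions chosen for illustration; a real server would take the address from its own configuration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to one specific local IPv4
 * address instead of INADDR_ANY, so a more specific bind by another
 * local process cannot shadow it.
 * Returns the bound descriptor, or -1 on failure. */
int bind_to_address(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    /* inet_pton() returns 1 only for a valid dotted-quad address. */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After a successful call, netstat would be expected to list the socket under that address rather than under the wildcard entry.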

Additionally, others don't like that a busy server may have hundreds or thousands of these TIME_WAIT sockets stacking up and using kernel resources. For these reasons, there's another option for avoiding this problem.

Client Closes First

Looking at the diagram above, it is clear that TIME_WAIT can be avoided if the remote end initiates the closure. So the server can avoid problems by letting the client close first. The application protocol must be designed so that the client knows when to close. The server can safely close in response to an EOF from the client; however, it will also need to set a timeout when it is expecting an EOF, in case the client has left the network ungracefully. In many cases simply waiting a few seconds before the server closes will be adequate.
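One way to sketch "wait for the client's EOF, but not forever" is with poll() and recv(), as below. The function name close_after_peer_eof and its return convention are invented for this example, and discarding any late data from the peer is a simplifying assumption.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until the peer closes its end (recv() returns
 * 0) or until no activity is seen for timeout_ms, then close fd.
 * Returns 1 if the peer closed first, 0 on timeout, -1 on error.
 * The timeout restarts whenever the peer sends more data. */
int close_after_peer_eof(int fd, int timeout_ms)
{
    char buf[512];
    int result;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0) { result = 0; break; }        /* timed out */
        if (ready < 0) {
            if (errno == EINTR) continue;             /* retry after signal */
            result = -1;
            break;
        }

        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n == 0) { result = 1; break; }            /* EOF: peer closed */
        if (n < 0) {
            if (errno == EINTR) continue;
            result = -1;
            break;
        }
        /* n > 0: late data from the peer is discarded in this sketch. */
    }

    close(fd);
    return result;
}
```

A return value of 1 means the remote end initiated the closure, which is the case in which the local side is expected to skip TIME_WAIT.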

It probably makes more sense to call this method "Remote Closes First", because otherwise it depends on what you are calling the client and the server. If you are developing some system where a cluster of client programs sit on one machine and contact a variety of different servers, then you would want to foist the responsibility for closure onto the servers, to protect the resources on the client.

For example, I wrote a script that uses rsh to contact all of the machines on our network, and it does it in parallel, keeping some number of connections open at all times. rsh source ports are arbitrary available ports less than 1024. I initially used "rsh -n", which it turns out causes the local end to close first. After a few tests, every single free port less than 1024 was stuck in TIME_WAIT and I couldn't proceed. Removing the "-n" option causes the remote (server) end to close first (understanding why is left as an exercise for the reader), and should've eliminated the TIME_WAIT problem. However, without the -n, rsh can hang waiting for input. And, if you close input at the local end, this can again result in the port going into TIME_WAIT. I ended up avoiding the system-installed rsh program, and developing my own implementation in Perl. My current implementation, multi-rsh, is available for download.

Reduce Timeout

If (for whatever reason) neither of these options works for you, it may also be possible to shorten the timeout associated with TIME_WAIT. Whether this is possible and how it should be accomplished depends on the operating system you are using. Also, making this timeout too short could have negative side-effects, particularly in lossy or congested networks.

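Starting from "address already in use"

1 The problem

Where the problem comes from: quite often, when a server restarts or crashes, it runs into "Address already in use". After a few minutes it can be started again.

This raises two questions:

A) Why does this happen?

B) How can it be fixed so that the server can start immediately?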

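2 Analysis

It turns out that when the server restarts or crashes, its connections enter the TIME_WAIT state and stay there for 2MSL. During that time the port cannot be bound again, so the server cannot be restarted.

So why does the server end up in TIME_WAIT rather than CLOSED? That comes down to how a TCP connection is closed.

2.1 How a TCP connection is closed

In TCP, the side that performs the active close enters the TIME_WAIT state. In the example in the figure, it is the client that enters TIME_WAIT.

After entering TIME_WAIT, that side waits for 2MSL (MSL is the Maximum Segment Lifetime; it is 2 min, 1 min, or 30 s depending on the implementation, and RFC 793 suggests 2 min).

For reference, the TCP connection state transition diagram is shown below.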

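2.2 What TIME_WAIT is for

TIME_WAIT serves two purposes:

1) If the final ACK sent by the active closer is lost, the other side retransmits its FIN. The TIME_WAIT state keeps the connection state around so that this retransmission can be handled.

–If the active closer simply tore down the connection, TCP would no longer have any information about it when the retransmitted FIN arrived, so it would answer with an RST (reset). That would put the peer into an error state instead of an orderly termination.

–The 2MSL timer is restarted, in case that ACK is lost again.

2) It gives "stray segments" from the connection time to disappear from the network.

Because of delays and similar factors, packets in the network may arrive after the connection has been closed. Without a TIME_WAIT state, such a packet could cause trouble if both of the following hold:

A) A new connection has been established with the same 4-tuple as the previous one, that is, the same Src_IP, Src_Port, Dst_IP, and Dst_Port.

B) The sequence number of the delayed packet happens to fall inside the acceptable window of the new connection.

When both conditions are met, the packet is accepted and corrupts the new connection.

Entering TIME_WAIT and waiting for 2MSL gives the stray segments in the network time to disappear.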

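2.3 How can the TIME_WAIT state be ended?

There is a notion called TIME_WAIT Assassination. Two situations lead to it.

A) Accidental termination.

As the figure below shows, a delayed segment arrives while HOST1, which performed the active close, is in TIME_WAIT. Because the sequence number of the delayed segment is outside the window HOST1 can currently accept, HOST1 sends an ACK telling the peer which sequence number it expects. The peer has already closed and is in the CLOSED state, so when it receives that ACK it replies to HOST1 with an RST. This makes HOST1 leave TIME_WAIT immediately.

The TIME_WAIT state has been assassinated. Is there a way to avoid this?

Yes. Some implementations simply ignore RST segments while in the TIME_WAIT state.

B) Deliberate termination.

You can call setsockopt to set SO_LINGER so that the connection is closed directly, without the four-way close sequence and without entering TIME_WAIT.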

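About SO_LINGER

–By default, when an application closes a connection, the close or closesocket call returns immediately. If data remains in the socket buffer, the system tries to deliver it to the peer, but the application does not know whether the delivery succeeded.

–When lingering is enabled with a nonzero linger time, a successful return from close only tells us that the data we sent (and the FIN) has been acknowledged by the peer TCP. It does not tell us whether the peer application has read the data. If the socket is non-blocking, it will not wait for close to complete.

–The SO_LINGER option is what changes the default behavior.

The SO_LINGER structure

struct linger {
    int l_onoff;  /* 0 = off, nonzero = on */
    int l_linger; /* linger time, in seconds */
};

–If l_onoff is 0, the option is off and the value of l_linger is ignored. This is the default: close returns immediately.

–If l_onoff is nonzero and l_linger is 0, TCP aborts the connection when the socket is closed. TCP discards any data still in the socket send buffer and sends an RST to the peer instead of the normal four-segment termination sequence. This avoids the TIME_WAIT state.

–If l_onoff is nonzero and l_linger is nonzero, the kernel lingers for a while (determined by l_linger) when the socket is closed. If data remains in the socket send buffer, the process sleeps until either all of the data has been sent and acknowledged by the peer, after which the normal termination sequence follows, or the linger time expires. In this case it is very important for the application to check the return value of close: if the time expires before the data has been sent and acknowledged, close fails with the error EWOULDBLOCK and any data left in the socket send buffer is lost.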
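The three l_onoff / l_linger combinations, and the advice to check what close returns, could be exercised with a pair of small helpers like the ones below. The names set_linger and close_reporting are made up for this sketch, and printing the error with perror() is only one possible way to surface it.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: configure SO_LINGER on fd.
 *   enabled == 0               -> default behavior, close() returns at once
 *   enabled != 0, seconds == 0 -> abortive close (RST, no TIME_WAIT)
 *   enabled != 0, seconds  > 0 -> close() may block for up to `seconds`
 * Returns 0 on success, -1 on error (as setsockopt() does). */
int set_linger(int fd, int enabled, int seconds)
{
    struct linger lg;
    lg.l_onoff = enabled;
    lg.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Hypothetical helper: close fd and report a failure instead of
 * silently ignoring it. With a nonzero linger time, a failed close()
 * can mean that unsent data was lost. */
int close_reporting(int fd)
{
    if (close(fd) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

For example, set_linger(fd, 1, 5) followed by close_reporting(fd) asks the kernel to wait up to five seconds for pending data to be acknowledged and reports whether the close succeeded.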

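2.4 Conclusion about the TIME_WAIT state

A robust application should never interfere with the TIME_WAIT state; it is an important part of TCP's reliability mechanism.

3 Analysis of the server problem

With the background on TIME_WAIT above, we now know that when the server restarts or crashes, it is the side that performs the active close. It therefore enters the TIME_WAIT state, and this is what prevents the server from restarting.

So can we restart it right away? Yes, we can.

4 How to restart the server immediately

Set SO_REUSEADDR before calling bind.

That might look like the end of the story. But as we have just seen, TIME_WAIT serves two purposes, so could reusing the address on the server cause problems?

The answer is yes, it could. If the 4-tuple is the same and the sequence number of a delayed packet falls inside the acceptable window of the new connection, problems can arise.

Someone asked this very question on Stack Overflow:

Using SO_REUSEADDR - What happens to previously open socket?

The answer was: The SO_REUSEADDR option overrides that behavior, allowing you to reuse the port immediately.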

Effectively, you're saying: "I understand the risks and would like to use the port anyway."

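Analysis of the TCP TIME_WAIT state on Linux

Today I ran into a port problem. In socket programming it is worth noting that calling close(sock_id) does not release the socket sock_id immediately. This is a property of the TCP protocol, mainly intended to give both sides enough time to carry out the "four-way" close.

Let's review the TCP handshake sequence from computer networking:

So after close() is called, the socket moves from the ESTABLISHED state through the closing states into TIME_WAIT. During this period the port is not released, and calling bind() on this port fails with "can't bind server socket: address already in use". The length of time the protocol stays in TIME_WAIT can be changed; for one way to do it, see the settings below.

The following snippet is quoted from: "Dealing with large numbers of TIME_WAIT connections on Linux"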

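[root@web02 ~]# vi /etc/sysctl.conf
Add the following:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies = 1
Apply the kernel parameters:
[root@web02 ~]# sysctl -p
Notes:
net.ipv4.tcp_syncookies=1  Enables SYN cookies, which the kernel uses when the SYN queue overflows.
net.ipv4.tcp_tw_recycle=1 and net.ipv4.tcp_tw_reuse=1  Enable recycling and reuse of TIME-WAIT sockets, which is very effective for web servers with a large number of connections. (tcp_tw_recycle was removed in Linux 4.12.)
net.ipv4.tcp_fin_timeout=30  Reduces the time spent in the FIN-WAIT-2 state so that the system can handle more connections.
net.ipv4.tcp_keepalive_time=1800  Reduces the TCP keepalive probe time so that the system can handle more connections.
net.ipv4.tcp_max_syn_backlog=8192  Increases the length of the TCP SYN queue so that the system can handle more concurrent connections.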

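1. If you still want to call bind(), you can get around the TIME_WAIT state by using setsockopt to allow the port to be reused, so that bind() no longer fails. For example:

int sock, opt = 1;  /* opt = 0 disables reuse */
sock = socket(...);
setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));  /* must come before bind() */
bind(...);

For the details of setsockopt, see its prototype: int setsockopt(int socket, int level, int option_name, const void *option_value, socklen_t option_len);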

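2. Disable the TIME_WAIT state.

The SO_LINGER option of setsockopt controls whether closing the socket is delayed.
struct linger {
    int l_onoff;  /* 0 = off, nonzero = on */
    int l_linger; /* linger time, in seconds */
};

#include <netinet/in.h>
#include <sys/socket.h>

int server_fd = socket(AF_INET, SOCK_STREAM, 0);

/* Allow the port to be bound again while old connections are in TIME_WAIT. */
int opt = 1;
setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

/* l_onoff = 1 with l_linger = 0 requests an abortive close. */
struct linger li;
li.l_onoff = 1;
li.l_linger = 0;
setsockopt(server_fd, SOL_SOCKET, SO_LINGER, &li, sizeof(li));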

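In my testing, the code above forcibly closes the socket immediately. After running sudo netstat -anp | grep 8080 in a terminal, port 8080 does not show up in the TIME_WAIT state.

Suppose your program follows the client/server model. With the code above, when the server closed the socket, the client did not report "peer reset" and exit on its own. My guess is that once the delayed close is disabled, the "four-way" close is not performed: the server simply disconnects by itself without notifying the client. This is an observation from my own test, and it differs from the behavior described earlier, where a zero linger time makes TCP send an RST to the peer.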