Most people know about TCP keepalive. This article assumes the reader already knows how keepalive gets triggered, and looks at what the socket user sees once it fires.
Edit /etc/sysctl.conf
ubuntu# vim /etc/sysctl.conf
ubuntu# sysctl -p
fs.file-max = 131072
net.ipv4.tcp_keepalive_time = 10
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
Verify
ubuntu# sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 10
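The same three values can also be checked from Python, since each `net.ipv4.*` sysctl is exposed as a file under `/proc/sys/net/ipv4/` on Linux. The following is a minimal sketch under that assumption; the helper name `read_keepalive_sysctls` and its `base_dir` parameter are made up for illustration.

```python
import os

KEEPALIVE_KEYS = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
)


def read_keepalive_sysctls(base_dir="/proc/sys/net/ipv4"):
    """Return {name: int} for the keepalive sysctls found under base_dir."""
    values = {}
    for key in KEEPALIVE_KEYS:
        path = os.path.join(base_dir, key)
        try:
            with open(path) as f:
                values[key] = int(f.read().strip())
        except (IOError, OSError, ValueError):
            # Not Linux, or the entry is unreadable: skip it rather than guess.
            continue
    return values


if __name__ == "__main__":
    print(read_keepalive_sysctls())
```

On a host configured as above, the returned dictionary should hold 10, 5 and 3 for time, interval and probes.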
tcp_server.py
import socket
import sys

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_address = ('localhost', 22345)
sock.bind(server_address)
sock.listen(1)
connection, client_address = sock.accept()
while True:
    data = connection.recv(1024)
    print("data", data)
    if not data:  # empty result means the peer closed; avoid spinning forever
        break
tcp_client.py
import socket
import sys
import time

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)  # only the client probes
server_address = ('localhost', 22345)
sock.connect(server_address)
time.sleep(999999999)  # stay idle so that only keepalive traffic appears
Because tcp_client enables SO_KEEPALIVE, it is tcp_client that sends keepalive probes to tcp_server.
If tcp_server enables SO_KEEPALIVE instead, tcp_server sends the probes to tcp_client.
If both tcp_server and tcp_client enable it, probes are sent in both directions.
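The sysctl values are system-wide defaults. On Linux the same three knobs can also be overridden per socket with `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT`. The sketch below assumes a Linux-like platform and skips any option the local `socket` module does not expose; the helper name `enable_keepalive` is hypothetical and not part of the scripts above.

```python
import socket


def enable_keepalive(sock, idle=10, interval=5, count=3):
    """Turn on SO_KEEPALIVE and, where the platform exposes them, override
    the idle time, probe interval and probe count for this one socket.

    Returns the per-socket values read back from the kernel.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in wanted:
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # option not available on this platform
        sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        applied[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    return applied


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print(enable_keepalive(s))
    finally:
        s.close()
```

Setting the options per socket keeps the experiment from depending on `/etc/sysctl.conf`, which matters when the code runs inside a container whose sysctls differ from the host's.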
To see keepalive actually take effect, use docker to simulate an unplugged network cable.
ubuntu# sudo docker run -it \
    --volume=//home/enjolras/code_repo/python/keepalive_test://home/enjolras/code_repo/python/keepalive_test \
    --detach=true \
    --name=tcp_server \
    --privileged=true \
    --network=multi-host-network \
    ubuntu_with_python
08f89dcff3547bb15c7aed975dfa5a0821e4d0246d6d812e02fd1470f3cef6c3
ubuntu# sudo docker run -it \
    --volume=//home/enjolras/code_repo/python/keepalive_test://home/enjolras/code_repo/python/keepalive_test \
    --detach=true \
    --name=tcp_client \
    --privileged=true \
    --network=multi-host-network \
    ubuntu_with_python
tcp_server.py
import socket
import sys

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_address = ('0.0.0.0', 22345)
sock.bind(server_address)
sock.listen(1)
connection, client_address = sock.accept()
connection.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)  # server probes too
data = connection.recv(1024)
print("data", data)
tcp_client.py
import socket
import sys
import time

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
server_address = ('tcp_server', 22345)  # container name, resolved on the docker network
sock.connect(server_address)
time.sleep(999999999)
The capture shows tcp_server and tcp_client sending keepalive probes to each other.
root@0b3f1ee81446:/# tcpdump -i any port 22345
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
12:29:34.491239 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [S], seq 2347845399, win 28200, options [mss 1410,sackOK,TS val 951128354 ecr 0,nop,wscale 7], length 0
12:29:34.491279 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [S.], seq 1169988006, ack 2347845400, win 27960, options [mss 1410,sackOK,TS val 2298965862 ecr 951128354,nop,wscale 7], length 0
12:29:34.491299 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951128354 ecr 2298965862], length 0
12:29:44.666952 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2298976038 ecr 951128354], length 0
12:29:44.666969 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951138530 ecr 2298965862], length 0
12:29:44.666978 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2298976038 ecr 951128354], length 0
12:29:44.666987 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951138530 ecr 2298976038], length 0
12:29:54.907019 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2298986278 ecr 951138530], length 0
12:29:54.907054 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951148770 ecr 2298976038], length 0
12:29:54.907059 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951148770 ecr 2298976038], length 0
12:29:54.907062 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2298986278 ecr 951138530], length 0
Cut the network between tcp_server and tcp_client.
ubuntu# docker network disconnect multi-host-network tcp_client
After three consecutive probes go unanswered, tcp_server sends an RST to tcp_client.
12:31:47.547010 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951261408 ecr 2299088676], length 0
12:31:47.547019 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2299098916 ecr 951251168], length 0
12:31:47.547061 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951261408 ecr 2299098916], length 0
12:31:57.787226 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2299109156 ecr 951261408], length 0
12:32:02.906612 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2299114276 ecr 951261408], length 0
12:32:08.026829 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2299119396 ecr 951261408], length 0
12:32:13.146776 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [R.], seq 1, ack 1, win 219, options [nop,nop,TS val 2299124516 ecr 951261408], length 0
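The timing in this capture follows from the three sysctl values: the first probe goes out after `tcp_keepalive_time` seconds of silence, further probes follow every `tcp_keepalive_intvl` seconds, and the connection is reset one interval after the last of `tcp_keepalive_probes` unanswered probes. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def keepalive_timeline(idle, interval, probes):
    """Return (probe_offsets, reset_offset), in seconds after the last packet
    exchanged on the connection, assuming no probe is ever answered."""
    probe_offsets = [idle + i * interval for i in range(probes)]
    reset_offset = idle + probes * interval
    return probe_offsets, reset_offset


if __name__ == "__main__":
    offsets, reset = keepalive_timeline(idle=10, interval=5, probes=3)
    print("probes at", offsets, "reset at", reset)
```

For the settings used here this gives probes at 10, 15 and 20 seconds and a reset at 25 seconds. That lines up with the capture: the last exchange is at 12:31:47, the probes leave at 12:31:57, 12:32:02 and 12:32:08, and the RST follows at 12:32:13. The real gaps are slightly longer than the configured values, so treat the result as approximate.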
Once the keepalive mechanism detects that the socket is in a bad state, the caller is notified through an exception or an error code.
3f1ee81446:/home/enjolras/code_repo/python/keepalive_test# python tcp_serv
Traceback (most recent call last):
  File "tcp_server.py", line 11, in <module>
    data = connection.recv(1024)
socket.error: [Errno 110] Connection timed out
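In a real server the traceback above would normally be caught instead of ending the process. The sketch below shows one way to tell the three outcomes of a blocking `recv` apart; `recv_status` and its status strings are hypothetical names. Note that `ETIMEDOUT` is the errno the keepalive failure produced in this run, but a retransmission timeout reports the same errno, so the status is named after the errno and not after keepalive.

```python
import errno
import socket


def recv_status(conn, bufsize=1024):
    """Call recv once and return a (status, payload) pair.

    status is "data", "closed" (orderly shutdown by the peer),
    "timed_out" (errno ETIMEDOUT) or "error" (any other errno).
    """
    try:
        data = conn.recv(bufsize)
    except socket.error as exc:  # socket.error is an alias of OSError on Python 3
        if exc.errno == errno.ETIMEDOUT:
            return "timed_out", None
        return "error", exc.errno
    if not data:
        return "closed", None
    return "data", data


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"hello")
    print(recv_status(b))
    a.close()
    print(recv_status(b))
    b.close()
```

The caller can then decide per status whether to reconnect, log, or clean up the connection.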
import socket
import sys
import select

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_address = ('0.0.0.0', 22345)
sock.bind(server_address)
sock.listen(1)
connection, client_address = sock.accept()
connection.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
readable, writable, exceptional = select.select([connection], [], [])  # block until an event
print("readable", readable, writable, exceptional)
data = connection.recv(1024)
print("data", data)
3f1ee81446:/home/enjolras/code_repo/python/keepalive_test# python tcp_serv
('readable', [<socket._socketobject object at 0x7f4e3d5037c0>], [], [])
Traceback (most recent call last):
  File "tcp_server.py", line 14, in <module>
    data = connection.recv(1024)
socket.error: [Errno 110] Connection timed out
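The output above shows that "readable" does not mean "data is available": the socket is reported as readable and the following `recv` then raises. The same holds for an orderly close, which a local socket pair can demonstrate without any network setup. This is a minimal sketch; `wait_and_read` is a hypothetical helper, and the timeout is added only so the demo never blocks.

```python
import select
import socket


def wait_and_read(conn, timeout, bufsize=1024):
    """Wait up to `timeout` seconds for `conn` to become readable.

    Returns None if nothing happened, otherwise whatever recv returns
    (an empty result means the peer closed). Errors such as ETIMEDOUT
    are not caught here and propagate from recv as exceptions.
    """
    readable, _, _ = select.select([conn], [], [], timeout)
    if not readable:
        return None
    return conn.recv(bufsize)


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(wait_and_read(b, 0.1))  # nothing sent yet
    a.sendall(b"ping")
    print(wait_and_read(b, 0.1))  # data is available
    a.close()
    print(wait_and_read(b, 0.1))  # peer closed: readable, but recv is empty
    b.close()
```

An event loop therefore has to treat a readable event as "call recv and look at the result", covering data, end of stream, and an error alike.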
No experiment was run for this case; the behavior should be the same as with select.
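When the TCP heartbeat (keepalive) detects that the connection is broken, the application layer is notified through a readable event. With neither a TCP heartbeat nor an application-level heartbeat, the application layer has no way of knowing the real state of the connection.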
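For comparison, the bookkeeping behind an application-level heartbeat can be very small. The sketch below is an assumption about how one might structure it, not something taken from the experiments above: `HeartbeatMonitor` is a hypothetical class that only tracks when the peer was last heard from, and it leaves the actual sending and receiving of heartbeat messages to the caller.

```python
import time


class HeartbeatMonitor(object):
    """Tracks when the peer was last heard from (hypothetical helper)."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout    # seconds of silence tolerated
        self.clock = clock        # injectable so the logic can be tested
        self.last_seen = clock()

    def mark_alive(self):
        """Call whenever any message, including a heartbeat, arrives."""
        self.last_seen = self.clock()

    def is_dead(self):
        """True once the peer has been silent for longer than `timeout`."""
        return self.clock() - self.last_seen > self.timeout


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=25)
    monitor.mark_alive()
    print(monitor.is_dead())
```

Unlike TCP keepalive, a check like this also notices a peer whose kernel still answers probes while the application itself is stuck.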