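At work today I ran into a strange problem. It felt strange because I had never seen anything like it before, a bit like someone who has never made a phone call picking up a phone and suddenly finding that, with one wire or no wire at all, they can chat with someone a thousand miles away.
First, a tool recommended by a colleague on the Microsoft cloud team. It gives you another way to test connectivity when a server blocks ping (ICMP echo replies):
1. Testing connectivity with paping (Linux):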
wget http://www.updateweb.cn/softwares/paping_1.5.5_x86-64_linux.tar.gz
or
wget https://zhangtaostorage.blob.core.chinacloudapi.cn/share/paping_1.5.5_x86-64_linux.tar.gz
This is a compressed archive. To extract it: tar zvxf paping_1.5.5_x86-64_linux.tar.gz
Usage: ./paping -p 80 -c 500 www.xxx.com (this example runs 500 connectivity tests against port 80 of the target host)
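To show what a TCP "ping" of this kind boils down to, here is a minimal Python sketch. It is not paping's implementation, just an illustration under the assumption that a probe counts as successful when the TCP handshake completes; the names tcp_ping and tcp_ping_series are made up for this example.

```python
import socket
import time


def tcp_ping(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        # A completed three-way handshake plays the role of the ping reply.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return None


def tcp_ping_series(host, port, count=5, timeout=2.0):
    """Run `count` probes and return (successes, failures, latencies_ms)."""
    latencies = []
    failures = 0
    for _ in range(count):
        rtt = tcp_ping(host, port, timeout)
        if rtt is None:
            failures += 1
        else:
            latencies.append(rtt)
    return len(latencies), failures, latencies
```

Calling tcp_ping_series("www.xxx.com", 80, count=500) would roughly correspond to the paping command above. A run where some probes return None while a neighbouring machine sees none fail is the kind of pattern discussed in the rest of this post.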
2. Testing connectivity with psping (Windows):
PsPing download: http://www.updateweb.cn/softwares/PSTools.zip
OR
PsPing download: http://technet.microsoft.com/en-us/sysinternals/jj729731
Then put it in the C:\Windows\system32 directory.
Then run it from a cmd prompt: psping ipaddress:port
For example:
-----------------------------------------------------------------------
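Back to the main topic:
In the first screenshot you can see a "connection timed out" message, while another machine on the same network connected without any trouble. My first reaction was that a firewall or a network blacklist was blocking the traffic. After several teams investigated together, that guess turned out to be wrong.
The fix in the end was:
Check the kernel parameter settings on your Linux system: sysctl -a | grep tcp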
sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_tw_recycle=0
Change the Linux kernel parameters to:
sysctl -w net.ipv4.tcp_timestamps=0
sysctl -w net.ipv4.tcp_tw_recycle=0
That is all it took; the network was smooth again right away.
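If you want to check these two switches from a script rather than by eye, something like the following Python sketch could work. It assumes the usual /proc/sys layout; the function names are made up for this example. As far as I know, tcp_tw_recycle was removed in Linux 4.12, so on newer kernels the file simply does not exist and the helper returns None.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the integer value of a sysctl such as net.ipv4.tcp_timestamps,
    or None if the kernel does not expose it."""
    path = Path(root, *name.split("."))
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def recycle_drop_possible(root="/proc/sys"):
    """True when this host has both tcp_timestamps and tcp_tw_recycle on."""
    ts = read_sysctl("net.ipv4.tcp_timestamps", root)
    recycle = read_sysctl("net.ipv4.tcp_tw_recycle", root)
    return bool(ts) and bool(recycle)
```

Run on the server side, recycle_drop_possible() flags the combination that causes the drops. On a client, only read_sysctl("net.ipv4.tcp_timestamps") matters, since that is the value changed above.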
Appendix:
---------------------------------------------------------
PsPing v2.01 usage notes
By Mark Russinovich
Published: January 29, 2014
Introduction
PsPing implements Ping functionality, TCP ping, latency and bandwidth measurement.
Installation
Copy PsPing onto your executable path. Typing "psping" displays its usage syntax.
Using PsPing
PsPing implements Ping functionality, TCP ping, latency and bandwidth measurement. Use the following command-line options to show the usage for each test type:
Usage: psping -? [i|t|l|b]
-? I    Usage for ICMP ping.
-? T    Usage for TCP ping.
-? L    Usage for latency test.
-? B    Usage for bandwidth test.
ICMP ping usage: psping [[-6]|[-4]] [-h [buckets | <val1>,<val2>,...]] [-i <interval>] [-l <requestsize>[k|m]] [-q] [-t|-n <count>] [-w <count>] <destination>
-h    Print histogram (default bucket count is 20). If you specify a single argument, it's interpreted as a bucket count and the histogram will contain that number of buckets covering the entire time range of values. Specify a comma-separated list of times to create a custom histogram (e.g. "0.01,0.05,1,5,10").
-i    Interval in seconds. Specify 0 for fast ping.
-l    Request size. Append 'k' for kilobytes and 'm' for megabytes.
-n    Number of pings or append 's' to specify seconds e.g. '10s'.
-q    Don't output during pings.
-t    Ping until stopped with Ctrl+C and type Ctrl+Break for statistics.
-w    Warmup with the specified number of iterations (default is 1).
-4    Force using IPv4.
-6    Force using IPv6.
For high-speed ping tests use -q and -i 0.
TCP ping usage: psping [[-6]|[-4]] [-h [buckets | <val1>,<val2>,...]] [-i <interval>] [-l <requestsize>[k|m]] [-q] [-t|-n <count>] [-w <count>] <destination:destport>
-h    Print histogram (default bucket count is 20). If you specify a single argument, it's interpreted as a bucket count and the histogram will contain that number of buckets covering the entire time range of values. Specify a comma-separated list of times to create a custom histogram (e.g. "0.01,0.05,1,5,10").
-i    Interval in seconds. Specify 0 for fast ping.
-l    Request size. Append 'k' for kilobytes and 'm' for megabytes.
-n    Number of pings or append 's' to specify seconds e.g. '10s'.
-q    Don't output during pings.
-t    Ping until stopped with Ctrl+C and type Ctrl+Break for statistics.
-w    Warmup with the specified number of iterations (default is 1).
-4    Force using IPv4.
-6    Force using IPv6.
For high-speed ping tests use -q and -i 0.
TCP and UDP latency usage:
server: psping [[-6]|[-4]] [-f] <-s source:sourceport>
client: psping [[-6]|[-4]] [-f] [-u] [-h [buckets | <val1>,<val2>,...]] [-r] <-l requestsize[k|m]> <-n count> [-w <count>] <destination:destport>
-f    Open source firewall port during the run.
-u    UDP (default is TCP).
-h    Print histogram (default bucket count is 20). If you specify a single argument, it's interpreted as a bucket count and the histogram will contain that number of buckets covering the entire time range of values. Specify a comma-separated list of times to create a custom histogram (e.g. "0.01,0.05,1,5,10").
-l    Request size. Append 'k' for kilobytes and 'm' for megabytes.
-n    Number of sends/receives. Append 's' to specify seconds e.g. '10s'.
-r    Receive from the server instead of sending.
-w    Warmup with the specified number of iterations (default is 5).
-4    Force using IPv4.
-6    Force using IPv6.
-s    Server listening address and port.
The server can serve both latency and bandwidth tests and remains active until you terminate it with Control-C.
TCP and UDP bandwidth usage:
server: psping [[-6]|[-4]] [-f] <-s source:sourceport>
client: psping [[-6]|[-4]] [-f] [-u] [-h [buckets | <val1>,<val2>,...]] [-r] <-l requestsize[k|m]> <-n count> [-i <outstanding>] [-w <count>] <destination:destport>
-f    Open source firewall port during the run.
-u    UDP (default is TCP).
-b    Bandwidth test.
-h    Print histogram (default bucket count is 20). If you specify a single argument, it's interpreted as a bucket count and the histogram will contain that number of buckets covering the entire time range of values. Specify a comma-separated list of times to create a custom histogram (e.g. "0.01,0.05,1,5,10").
-i    Number of outstanding I/Os (default is min of 16 and 2x CPU cores).
-l    Request size. Append 'k' for kilobytes and 'm' for megabytes.
-n    Number of sends/receives. Append 's' to specify seconds e.g. '10s'.
-r    Receive from the server instead of sending.
-w    Warmup for the specified iterations (default is 2x CPU cores).
-4    Force using IPv4.
-6    Force using IPv6.
-s    Server listening address and port.
The server can serve both latency and bandwidth tests and remains active until you terminate it with Control-C.
Examples
This command executes an ICMP ping test for 10 iterations with 3 warmup iterations:
psping -n 10 -w 3 marklap
To execute a TCP connect test, specify the port number. The following command executes connect attempts against the target as quickly as possible, only printing a summary when finished with the 100 iterations and 1 warmup iteration:
psping -n 100 -i 0 -q marklap:80
To configure a server for latency and bandwidth tests, simply specify the -s option and the source address and port the server will bind to:
psping -s 192.168.2.2:5000
A buffer size is required to perform a TCP latency test. This example measures the round trip latency of sending an 8KB packet to the target server, printing a histogram with 100 buckets when completed:
psping -l 8k -n 10000 -h 100 192.168.2.2:5000
This command tests bandwidth to a PsPing server listening at the target IP address for 10 seconds and produces a histogram with 100 buckets. Note that the test must run for at least one second after warmup for a histogram to generate. Simply add -u to have PsPing perform a UDP bandwidth test.
psping -b -l 8k -n 10000 -h 100 192.168.2.2:5000
---------------------------
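Appendix 2:
connect failures caused by tcp_tw_recycle and tcp_timestamps
Recently a series of connect failures showed up in production. After analysis and experiments, we confirmed that they are related to the proc parameters tcp_tw_recycle and tcp_timestamps.
1. Symptoms
First symptom: module A reaches service S through a NAT gateway without problems, while module B, going through the same NAT gateway to service S, frequently fails to connect. Packet captures show that service S receives the SYN packet but never replies with a SYN-ACK. In addition, module A has TCP timestamps disabled, while module B has them enabled.
Second symptom: module C (timestamps enabled) runs on several hosts and reaches the same service S through a NAT gateway with a single egress IP. Host C1 connects successfully, while host C2 fails.
2. Analysis
Given these symptoms, the problem is clearly related to TCP timestamps. Reading the Linux 2.6.32 kernel source shows that when tcp_tw_recycle and tcp_timestamps are both enabled, the timestamps carried by connect requests from the same source IP within 60 s must be increasing.
Source function: tcp_v4_conn_request(), the server-side handler for the SYN packet of the TCP three-way handshake.
Source snippet: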
if (tmp_opt.saw_tstamp &&
    tcp_death_row.sysctl_tw_recycle &&
    (dst = inet_csk_route_req(sk, req)) != NULL &&
    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
    peer->v4daddr == saddr) {
        if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
            (s32)(peer->tcp_ts - req->ts_recent) >
                                        TCP_PAWS_WINDOW) {
                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
                goto drop_and_release;
        }
}
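tmp_opt.saw_tstamp: the incoming SYN carries the TCP timestamp option (the peer supports tcp_timestamps)
sysctl_tw_recycle: tcp_tw_recycle is enabled on this host
TCP_PAWS_MSL: 60 s; this condition means the previous TCP communication from this source IP happened within the last 60 s
TCP_PAWS_WINDOW: 1; this condition means the timestamp recorded from the previous TCP communication with this source IP is larger than the timestamp in the current SYN
Analysis: hosts client1 and client2 reach serverN through a NAT gateway (one IP address). Because the timestamp is the time elapsed since system boot, client1 and client2 carry different timestamps. According to the SYN handling code above, when tcp_tw_recycle and tcp_timestamps are both enabled, the host with the larger timestamp connects to serverN successfully, while the host with the smaller timestamp fails.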
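The check described above can be mimicked in a few lines of Python. This is only a sketch under assumptions, not kernel code: the Server class, its remember() helper, the address 203.0.113.7 and the sample timestamps are all made up for illustration, and the point at which the real kernel records a peer's timestamp is decided elsewhere in the stack, whereas here it is a manual call.

```python
TCP_PAWS_MSL = 60      # seconds, as in the excerpt
TCP_PAWS_WINDOW = 1


def s32(value):
    """Interpret a 32-bit difference as signed, like the (s32) cast in C."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


class Server:
    """Toy model of the per-source-IP timestamp cache (hypothetical)."""

    def __init__(self):
        self.peers = {}  # source IP -> (tcp_ts, tcp_ts_stamp)

    def remember(self, src_ip, tsval, now):
        """Record the last timestamp seen from src_ip at wall time `now`."""
        self.peers[src_ip] = (tsval, now)

    def accept_syn(self, src_ip, tsval, now):
        """Return False when the SYN would be silently dropped."""
        peer = self.peers.get(src_ip)
        if peer is not None:
            tcp_ts, tcp_ts_stamp = peer
            if (now < tcp_ts_stamp + TCP_PAWS_MSL
                    and s32(tcp_ts - tsval) > TCP_PAWS_WINDOW):
                return False
        return True


server = Server()
nat_ip = "203.0.113.7"  # stands in for the single NAT egress IP
server.remember(nat_ip, tsval=5_000_000, now=100)            # client1, long uptime
print(server.accept_syn(nat_ip, tsval=5_000_500, now=101))   # client1 again: True
print(server.accept_syn(nat_ip, tsval=20_000, now=102))      # client2, smaller timestamp: False
print(server.accept_syn(nat_ip, tsval=20_500, now=161))      # client2 after 60 s: True
```

Because the cache is keyed by source IP only, the server cannot tell client1 and client2 apart behind the NAT gateway, which is why the host with the smaller timestamp is the one that loses.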
Parameters: /proc/sys/net/ipv4/tcp_timestamps - turns the timestamp option on or off
/proc/sys/net/ipv4/tcp_tw_recycle - shortens the timeout after which TIME_WAIT sockets are released
3. Solution
echo 0 > /proc/sys/net/ipv4/tcp_tw_recycle;
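tcp_tw_recycle is off by default, but many servers turn it on to improve performance.
To solve the problem above, I personally recommend disabling tcp_tw_recycle rather than timestamps. With TCP timestamps disabled, enabling tcp_tw_recycle has no effect, whereas TCP timestamps can be enabled and work on their own.
Source function: tcp_time_wait()
Source snippet: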
if (tcp_death_row.sysctl_tw_recycle && tp->rx_opt.ts_recent_stamp)
        recycle_ok = icsk->icsk_af_ops->remember_stamp(sk);
......
if (timeo < rto)
        timeo = rto;
if (recycle_ok) {
        tw->tw_timeout = rto;
} else {
        tw->tw_timeout = TCP_TIMEWAIT_LEN;
        if (state == TCP_TIME_WAIT)
                timeo = TCP_TIMEWAIT_LEN;
}
inet_twsk_schedule(tw, &tcp_death_row, timeo,
                   TCP_TIMEWAIT_LEN);
When timestamps and tw_recycle are both enabled, the timeout for releasing a TIME_WAIT socket depends on the RTO; otherwise the timeout is TCP_TIMEWAIT_LEN, i.e. 60 s.
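The branch in the snippet is easier to follow with concrete numbers. The Python sketch below mirrors only the lines quoted above. It assumes times are plain seconds (the kernel works in jiffies), and the function name, the state strings and the 0.7 s value used in the note are illustrative choices of mine.

```python
TCP_TIMEWAIT_LEN = 60.0  # seconds
TCP_TIME_WAIT = "TIME_WAIT"
TCP_FIN_WAIT2 = "FIN_WAIT2"


def time_wait_timers(recycle_ok, rto, timeo, state):
    """Return (tw_timeout, timeo) following the branch in the excerpt."""
    if timeo < rto:
        timeo = rto
    if recycle_ok:
        # Fast recycling: the socket is released after roughly one rto.
        tw_timeout = rto
    else:
        tw_timeout = TCP_TIMEWAIT_LEN
        if state == TCP_TIME_WAIT:
            timeo = TCP_TIMEWAIT_LEN
    return tw_timeout, timeo
```

With rto = 0.7, time_wait_timers(True, 0.7, 0.0, TCP_TIME_WAIT) gives (0.7, 0.7), while the same call with recycle_ok=False gives (60.0, 60.0), which is the gap that makes tcp_tw_recycle look attractive on busy servers.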
The kernel documentation describes this parameter as follows:
tcp_tw_recycle - BOOLEAN
Enable fast recycling TIME-WAIT sockets. Default value is 0.
It should not be changed without advice/request of technical
experts.
Original link: http://blog.sina.com.cn/u/2015038597
-----------------------------
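Appendix 3:
I. Symptoms
1. HTTP access to the site from the company intranet:
Linux hosts fail: curl and packet captures show that the server does not respond to requests from Linux clients, no TCP connection can be established, and the browser reports "unable to connect to the server".
Windows hosts work normally.
2. Degraded HTTP access quality:
Monitoring by 基調 shows that access quality dropped after the new architecture went live, mainly in these ways:
2.1. Visitors see "unable to connect to the server".
2.2. Only a few people hit this failure, and not on every visit during the day; it comes and goes.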
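II. Troubleshooting
I searched Google directly for the keywords "server cannot establish TCP connection".
I went through a few pages of results.
What I found matched the symptoms on our intranet exactly, but for various reasons (1. weak background knowledge in this area; 2. no time to verify the configuration) I did not follow up on it.
The problem then dragged on for a very long time... and all along we assumed it was an internal equipment issue.
Eventually, out of options, I took the plunge and enabled the setting "net.ipv4.tcp_timestamps = 0" in production. After some testing, the fault was gone, and the machine that used to fail now connected on every attempt!
I still do not fully understand the mechanism, only the gist: among users behind NAT (sharing an egress IP with others), if your timestamp is smaller than someone else's, the server will not respond to your TCP requests. To avoid this, set net.ipv4.tcp_timestamps = 0 (in /etc/sysctl.conf).
III. Summary
While studying later, I came across a more detailed blog post that explains this thoroughly and also raises new questions: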
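====== Excerpt ======
In fact, a Linux server does not apply this timestamp check by default. Whether Linux enables the behaviour depends on tcp_timestamps and tcp_tw_recycle. Since tcp_timestamps is on by default, the behaviour is effectively activated as soon as tcp_tw_recycle is turned on.
So what is net.ipv4.tcp_tw_recycle? A quick search shows that it is basically the parameter that controls recycling of TIME_WAIT connections.
When net.ipv4.tcp_timestamps is left unset (it defaults to on) and net.ipv4.tcp_tw_recycle is also enabled, this nasty error appears. Note, though, that it only shows up in NAT environments. On top of that, most blogs, and quite a few experts, have recommended enabling net.ipv4.tcp_tw_recycle ...
====== Excerpt ======
IV. Open items
1. (Unverified) Whether tw_recycle stops working once timestamps are disabled
2. (Unverified) A different way to deal with too many TIME_WAIT connections: net.ipv4.tcp_max_tw_buckets = 10000 sets an upper limit. The downside is that the system log will report: TCP: time wait bucket table overflow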
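Before settling on a tcp_max_tw_buckets value, it helps to know how many TIME_WAIT sockets the machine actually holds. The sketch below counts them from text in the /proc/net/tcp format, where the fourth column is the hex state and 06 means TIME_WAIT. The function names and the 0.9 warning ratio are my own assumptions, and the 10000 default simply echoes the value mentioned above.

```python
TIME_WAIT_STATE = "06"  # hex state code for TIME_WAIT in /proc/net/tcp


def count_time_wait(proc_net_tcp_text):
    """Count TIME_WAIT rows in text formatted like /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == TIME_WAIT_STATE:
            count += 1
    return count


def near_bucket_limit(time_wait_count, max_tw_buckets=10000, ratio=0.9):
    """True when the count has reached `ratio` of the configured limit."""
    return time_wait_count >= max_tw_buckets * ratio
```

On a live Linux host you could feed it open("/proc/net/tcp").read(). If near_bucket_limit() comes back True regularly, the overflow message mentioned above is likely to start appearing in the system log.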