Environment:
dotnet core 1.0.1
CentOS 7.2
During a routine server check today, I found one service throwing a large number of exceptions.
The exception message was:
LockStatusPushError&&Message:One or more errors occurred. (An error occurred while sending the request. Too many open files)&InnerMessageAn error occurred while sending the request. Too many open files&
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.Wait()
   at CommonHelper.HttpHelper.HttpRequest(String Url, String Method, String ContentType, Byte[] data, Encoding encoding)
   at CommonHelper.HttpHelper.PostForm(String Url, Dictionary`2 para, Encoding encoding)
   at CommonHelper.HttpHelper.PostForm(String Url, Dictionary`2 para)
   at DeviceService.Program.LockStatusPushMethod()
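My first guess was that the program had opened too many files (ports or pipes) and exceeded the system limit.
Running ulimit -n showed a limit of 65535, which is a normal value.
Running lsof | wc -l to count the currently open files was very slow, and the result showed more than 5 million entries system-wide.
I then checked the dotnet program alone and found it accounted for more than 4 million of them.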
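The lsof total is hard to compare directly with the ulimit -n value. ulimit -n is a per-process limit, while lsof | wc -l counts rows for the whole system, and when lsof prints a thread ID column next to the PID (as some versions do by default) the same descriptor can show up once per thread. A minimal sketch of a more direct comparison, assuming a Linux /proc filesystem; the function names count_fds and soft_nofile_limit are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: compares a process's real descriptor count with its own limit.

# Count the descriptors a process holds by listing /proc/<pid>/fd.
# Prints 0 if the directory does not exist or cannot be read.
count_fds() {
    local pid="$1"
    ls "/proc/${pid}/fd" 2>/dev/null | wc -l
}

# Print the soft "Max open files" limit that applies to that process.
soft_nofile_limit() {
    local pid="$1"
    awk '/^Max open files/ {print $4}' "/proc/${pid}/limits"
}

pid="${1:-$$}"   # falls back to the current shell when no PID is given
echo "pid=${pid} open_fds=$(count_fds "$pid") soft_limit=$(soft_nofile_limit "$pid")"
```

Passing the dotnet PID as the first argument (12208 in the listing from this incident) would show how close that process is to its own limit. Reading another user's /proc entries normally requires root.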
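I saved the current list of open files with lsof>>/tmp/lsof.log for later analysis.
In the exported file, the dotnet program had a large number of socket connections in the CLOSE_WAIT state, all with port 80 of the HTTP server that the program calls as their destination: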
dotnet 12208 20425 root 216r FIFO 0,8 0t0 2273974 pipe
dotnet 12208 20425 root 217w FIFO 0,8 0t0 2273974 pipe
dotnet 12208 20425 root 218u IPv4 2274459 0t0 TCP txk-web:44336->txk-web:http (CLOSE_WAIT)
dotnet 12208 20425 root 219r FIFO 0,8 0t0 2274460 pipe
dotnet 12208 20425 root 220w FIFO 0,8 0t0 2274460 pipe
dotnet 12208 20425 root 221u IPv4 2271144 0t0 TCP txk-web:44340->txk-web:http (CLOSE_WAIT)
dotnet 12208 20425 root 222r FIFO 0,8 0t0 2273977 pipe
dotnet 12208 20425 root 223w FIFO 0,8 0t0 2273977 pipe
dotnet 12208 20425 root 224u IPv4 2274462 0t0 TCP txk-web:44344->txk-web:http (CLOSE_WAIT)
dotnet 12208 20425 root 225r FIFO 0,8 0t0 2271147 pipe
dotnet 12208 20425 root 226w FIFO 0,8 0t0 2271147 pipe
dotnet 12208 20425 root 227u IPv4 2272624 0t0 TCP txk-web:44348->txk-web:http (CLOSE_WAIT)
dotnet 12208 20425 root 228r FIFO 0,8 0t0 2272625 pipe
dotnet 12208 20425 root 229w FIFO 0,8 0t0 2272625 pipe
dotnet 12208 20425 root 230u IPv4 2273985 0t0 TCP txk-web:44352->txk-web:http (CLOSE_WAIT)
dotnet 12208 20425 root 231r FIFO 0,8 0t0 2271150 pipe
dotnet 12208 20425 root 232w FIFO 0,8 0t0 2271150 pipe
dotnet 12208 20425 root 233u IPv4 2272627 0t0 TCP txk-web:44356->txk-web:http (CLOSE_WAIT)
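With millions of rows in the saved file, a summary is easier to read than the raw listing. A rough sketch that counts CLOSE_WAIT sockets per destination, assuming the lsof column layout shown above (the connection in the second-to-last field, the state in the last); summarize_close_wait is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only: reads lsof output on stdin and prints "count destination"
# for every TCP socket in CLOSE_WAIT, most frequent destination first.
summarize_close_wait() {
    awk '/\(CLOSE_WAIT\)$/ {
             # The field before the state looks like local:port->remote:port
             split($(NF-1), ends, "->")
             count[ends[2]]++
         }
         END { for (dest in count) print count[dest], dest }' | sort -rn
}
```

It could be run as summarize_close_wait < /tmp/lsof.log. If lsof listed one row per thread, the counts are inflated by the same factor, so they are best read as proportions rather than exact socket counts.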
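That narrowed the cause down to the HTTP calls.
Next I checked the program's log and found that the HTTP endpoint it needs to call was returning 500 errors.
After an error the program retries the request (the logic requires a retry). The retry interval was 100 ms, which is so short that a very large number of requests were issued in a short time.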
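First, an explanation of CLOSE_WAIT:
When the peer closes the connection first, our side of the connection enters the CLOSE_WAIT state. Our side must then close its own end as well for the connection to be shut down properly; until it does, the socket stays in CLOSE_WAIT and keeps occupying a file descriptor.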
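My initial list of possible causes:
1. The program does not release resources after an exception is thrown
2. A bug in the lower layers of dotnet core
3. The nginx proxy forcibly closes my connection without sending me a close acknowledgement
4. The HTTP request times out (very unlikely, since the HTTP endpoint is on the same machine)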
I started with the code. My HTTP access method looks like this:
private static byte[] HttpRequest(string Url, string Method, string ContentType, byte[] data, Encoding encoding)
{
    WebResponse response = null;
    HttpWebRequest request = null;
    byte[] result = null;
    try
    {
        request = (HttpWebRequest)WebRequest.Create(Url);
        // The header name is "User-Agent"; the original code set "UserAgent"
        request.Headers["User-Agent"] = @"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)";
        request.Accept = @"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
        request.Method = Method;
        request.ContentType = ContentType;
        if (data != null)
        {
            var reqStreamAsync = request.GetRequestStreamAsync();
            //reqStreamAsync.Wait();
            // .Result blocks until the request stream is available
            using (Stream reqStream = reqStreamAsync.Result)
            {
                reqStream.Write(data, 0, data.Length);
            }
        }
        var reqAsync = request.GetResponseAsync();
        //reqAsync.Wait();
        using (response = reqAsync.Result)
        using (Stream stream = response.GetResponseStream())
        {
            // Read the response body one byte at a time until end of stream
            List<byte> byteArr = new List<byte>();
            int tmp = -1;
            while ((tmp = stream.ReadByte()) >= 0)
            {
                byteArr.Add((byte)tmp);
            }
            result = byteArr.ToArray();
        }
    }
    finally
    {
        // HttpWebRequest is not IDisposable, so Abort() is the only cleanup it offers
        if (request != null)
        {
            request.Abort();
            request = null;
        }
        if (response != null)
        {
            response.Dispose();
            response = null;
        }
    }
    return result;
}
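Looking at the code, my first thought was that HttpWebRequest was neither wrapped in a using block nor disposed with Dispose().
When I tried to fix that, it turned out the class does not implement IDisposable at all, so it cannot be released manually.
A search on Baidu led to the conclusion that Abort() is the only option. I added it to the finally block, added Dispose() for the WebResponse while I was there, and tried again: no effect.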
After that I edited /etc/sysctl.conf on CentOS
and added the following keepalive settings as an experiment:
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=2
net.ipv4.tcp_keepalive_intvl=2
Then I reloaded the configuration with sysctl -p and tried again: the problem remained.
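One likely reason this made no difference: the keepalive settings only control probing of idle connections on sockets that have SO_KEEPALIVE enabled, whereas a socket in CLOSE_WAIT stays in that state until the process that owns it closes it. To at least rule out the settings not having been loaded, the values the kernel is using can be read back. A small sketch, assuming the standard /proc/sys layout on Linux; show_keepalive is a made-up name:

```shell
#!/usr/bin/env bash
# Sketch only: prints the keepalive values currently active in the kernel.
show_keepalive() {
    local name path
    for name in tcp_keepalive_time tcp_keepalive_probes tcp_keepalive_intvl; do
        path="/proc/sys/net/ipv4/${name}"
        if [ -r "$path" ]; then
            echo "net.ipv4.${name} = $(cat "$path")"
        else
            # Some containers or non-Linux systems do not expose these files
            echo "net.ipv4.${name} = (not readable)"
        fi
    done
}

show_keepalive
```

If the three printed values match what was written to /etc/sysctl.conf, the configuration was applied and the cause lies elsewhere.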
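Next I suspected that the program was not releasing the HttpWebRequest,
so I added GC.Collect() to the finally block of the HTTP access method, hoping to force a collection: still no use.
In the end I gave up looking for the cause and simply added a delay at the retry site: if the HTTP request fails, Thread.Sleep(10000);
That works around the problem for now.
The problem was never properly solved.
If anyone knows the cause, I would be glad to discuss it. Thanks.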
Update, 2017.04.07
Today I switched to HttpClient for the HTTP communication
and found that the problem went away.
The code is below. Corrections are welcome.
private static async Task<byte[]> HttpRequest(string Url, HttpMethodEnum HttpMethod, string ContentType, byte[] data)
{
    // A new HttpClient per call is kept from the original version;
    // sharing one long-lived instance is generally the recommended pattern.
    using (HttpClient http = new HttpClient())
    {
        http.DefaultRequestHeaders.Add("User-Agent", @"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)");
        http.DefaultRequestHeaders.Add("Accept", @"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        HttpResponseMessage message = null;
        if (HttpMethod == HttpMethodEnum.POST)
        {
            // ByteArrayContent replaces the MemoryStream + StreamContent pair
            using (HttpContent content = new ByteArrayContent(data ?? new byte[0]))
            {
                content.Headers.Add("Content-Type", ContentType);
                message = await http.PostAsync(Url, content);
            }
        }
        else if (HttpMethod == HttpMethodEnum.GET)
        {
            message = await http.GetAsync(Url);
        }
        if (message == null)
        {
            // Any other method is not handled, as in the original
            return null;
        }
        // Dispose the response on every path; the original only did so for 200 responses
        using (message)
        {
            if (message.StatusCode != System.Net.HttpStatusCode.OK)
            {
                return null;
            }
            // Reads the whole body; a single Stream.Read call may return fewer bytes than requested
            return await message.Content.ReadAsByteArrayAsync();
        }
    }
}