原文地址:http://blog.csdn.net/directionofear/article/details/8033506 html
若是Web應用程序常常遇到的問題按頻率排名的話,我以爲 web
第一名unhandled exception服務器
第二名high memorysession
第三名high cpu asp.net
這篇文章介紹web應用程序中cpu使用率太高問題相應的數據收集方式和調試問題的方法。 工具
對ASP.NET Web應用程序CPU使用率太高的問題,從宏觀上分分類,大概就這麼幾種狀況,性能
經過如下命令行建立性能計數信息,該命令的意思是在系統性能計數器裏面添加如下性能計數分類中的全部計數器,生成的日誌大小最大爲500M,達到該上限計數會中止,採樣時間間隔爲15秒,生成的日誌存放在c:\PerfLog\Perf-15Sec.blg文件中。 測試
Processor
Memory
Process
Asp.NET
ASP.NETApplications
.NET CLRExceptions
.NET CLRMemory
.NET CLRLoading 網站
logman.execreate counter Perf-15Sec -f bincirc -max 500 -c "\Processor(*)\*""\Memory\*" "\Process(*)\*" "\ASP.Net(*)\*" "\ASP.Net Applications(*)\*" "\.NET CLRExceptions(*)\*" "\.NET CLR Memory(*)\*" "\.NET CLRLoading(*)\*" -si 00:00:15 -o C:\PerfLog\Perf-15Sec.blg ui
重現該問題前經過以下命令開始計數,
logman.exestart Perf-15Sec
重現問題後經過以下計數中止計數。
logman.exe stop Perf-15Sec
收集性能計數可以從統計信息的角度對問題進行大概的定位,若是要查看程序內部邏輯,須要收集user dump,經過調試器查看程序詳細信息。
收集dump的工具較多,對於high cpu的問題能夠選擇如下幾種,其中Procdump是專門針對high cpu問題而設計,命令和節奏控制比較簡單,推薦使用。
Procdump
下載地址
http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx
在命令行裏面運行如下命令,意思是當目標進程 w3wp的CPU使用率高於80%連續超過10秒鐘的時候,連續收集三個dump到目標路徑c:\dump目錄下。參數須要根據須要自行調整。
procdump -c 80 -s 10 -n 3 -o w3wpc:\dump
Debug Diag
下載地址
http://www.microsoft.com/en-us/download/details.aspx?id=26798
DebugDiag有圖形界面,能夠配置相似的參數規則來抓取不一樣類型的dump。好比抓進程高cpu使用率的狀況。
添加一個performnce規則,按照cpu的performance counter來添加相應的參數。(能夠添加多種條件)
Adplus
Adplus包含在windbg安裝包中,
下載地址
http://www.microsoft.com/en-us/download/details.aspx?id=8279
在問題發生的時候,打開命令行執行如下命令,
adplus -hang -pnw3wp.exe -quiet
調試high cpu的問題就是要弄清楚cpu時間都花在了什麼地方。CPU使用率就是一個統計數據,在採樣時間內cpu忙的時間所佔的百分比。
如何查看cpu時間具體花在了什麼地方?
首先咱們要弄清楚是哪一個或者哪幾個進程佔用了cpu的處理時間。咱們看到機器的cpu使用率偏高,通常都是注意到機器總體的使用率,這個時候性能計數能夠幫助咱們定位high cpu的進程都有哪些。
打開日誌查看process/processortime對於各個進程不一樣數值,就能夠看出來在特定時間哪一個進程在佔用cpu資源。
肯定了進程後,要看進程中到底在作什麼。咱們研究的是asp.net web應用程序,這裏假設發現w3wp進程佔用了大量的cpu資源。
接下來咱們要搞清楚,咱們的cpu資源使用大概落在咱們以前分類的那一種或者幾種類型中。
請求過多的狀況通常表現爲
經過!threadpool命令查看有多少個請求在queue中等待,多少個工做線程在運行,多少個空閒,當前的CPU佔用率爲多少,這裏要注意這個CPU佔用率是當前計算機總體的佔用率,不針對當前進程。
0:000> .load psscor2
0:000> !threadpool
Work Request in Queue: 129
--------------------------------------
Number of Timers: 56
--------------------------------------
CPU utilization 99%
--------------------------------------
Worker Thread: Total: 49 Running: 24 Idle: 25MaxLimit: 400 MinLimit: 4
Completion Port Thread:Total: 1 Free: 1 MaxFree: 8 CurrentLimit: 0 MaxLimit: 400 MinLimit: 4
經過!runaway命令來查看各個線程消耗CPU時間的狀況。
0:000> !runaway
User Mode Time
Thread Time
26:2bc 0 days 0:00:23.781
14:162c 0 days 0:00:22.562
31:1458 0 days 0:00:22.328
25:8c4 0 days 0:00:21.984
13:1708 0 days 0:00:21.937
38:b44 0 days 0:00:21.781
27:eec 0 days 0:00:21.734
34:e6c 0 days 0:00:21.328
20:70c 0 days 0:00:20.890
53:3c8 0 days 0:00:20.093
15:4bc 0 days 0:00:19.578
18:16a0 0 days 0:00:19.531
19:ef8 0 days 0:00:18.937
58:fd4 0 days 0:00:17.296
63:11b0 0 days 0:00:15.843
17:be4 0 days 0:00:15.484
64:15c0 0 days 0:00:14.984
65:c30 0 days 0:00:14.875
70:1058 0 days 0:00:14.031
73:150 0 days 0:00:13.609
66:708 0 days 0:00:13.593
69:f80 0 days 0:00:13.359
71:174c 0 days 0:00:13.171
68:bc0 0 days 0:00:12.796
能夠經過如下命令來查看managed線程的當前調用棧信息。
~*e!clrstack
這種狀況在壓力測試時,或者網站請求負載平衡出現問題的時,或者業務量較大的時段有可能發生。在這種狀況下經過更改代碼來提高性能可能有些得不償失了,更實際一點的建議是scale up(提高機器硬件性能)或者scaleout(擴展機器將請求分流)。
無限循環或過多循環表現爲
爲了肯定問題咱們通常會抓多個dump文件,中間存在必定時間間隔,從十幾秒到幾分鐘,基於high CPU持續時間而定。
測試程序
protected void Page_Load(object sender, EventArgs e) { System.Diagnostics.PerformanceCounter counter = new System.Diagnostics.PerformanceCounter(); counter.CategoryName = "Processor"; counter.CounterName = "% Processor Time"; counter.InstanceName = "_Total"; while (true) { while (counter.NextValue() < 70) ; } }
第一個dump中的cpu使用相關數據,
0:014> !runaway
User Mode Time
Thread Time
17:10f8 0 days 0:01:16.268
12:c64 0 days 0:00:07.659
7:908 0 days 0:00:00.109
23:a6c 0 days 0:00:00.015
9:175c 0 days 0:00:00.015
...
2:528 0 days 0:00:00.000
0:014> .time
Debug session time: Sat Sep 29 21:33:09.341 2012 (UTC + 8:00)
System Uptime: 6 days 22:15:37.061
Process Uptime: 0 days 0:04:56.007
Kernel time: 0 days 0:00:26.223
User time: 0 days 0:01:24.100
0:017> !clrstack
OS Thread Id: 0x10f8 (17)
Child-SP RetAddr Call Site
00000079fda8dc50 000007fe99ff648d Microsoft.Win32.RegistryKey.InternalGetValue(System.String, System.Object, Boolean, Boolean)
00000079fda8dcf0 000007fe9872fa76 Microsoft.Win32.RegistryKey.GetValue(System.String)
00000079fda8dd40 000007fe9872ec05 System.Diagnostics.PerformanceMonitor.GetData(System.String)
00000079fda8ddc0 000007fe9872ca17 System.Diagnostics.PerformanceCounterLib.GetPerformanceData(System.String)
00000079fda8de20 000007fe9872c8d2 System.Diagnostics.PerformanceCounterLib.GetCategorySample(System.String)
00000079fda8de70 000007fe986b59a4 System.Diagnostics.PerformanceCounterLib.GetCategorySample(System.String, System.String)
00000079fda8deb0 000007fe986b52c1 System.Diagnostics.PerformanceCounter.NextSample()
00000079fda8df90 000007fe3d070a76 System.Diagnostics.PerformanceCounter.NextValue()
00000079fda8e180 000007fe3d0d0f39 ASP.cpu70_aspx.Page_Load(System.Object, System.EventArgs)
00000079fda8e1d0 000007fe97399f8a System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
00000079fda8e200 000007fe97390744 System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
00000079fda8e230 000007fe973907a2 System.Web.UI.Control.OnLoad(System.EventArgs)
...
00000079fda8f660 000007fe9c8cb31b DomainNeutralILStubClass.IL_STUB(Int64, Int64, Int64, Int32)
第二個dump中cpu使用相關數據,
0:014> !runaway
User Mode Time
Thread Time
17:10f8 0 days 0:02:14.706
12:c64 0 days 0:00:13.665
7:908 0 days 0:00:00.109
9:175c 0 days 0:00:00.015
...
1:1020 0 days 0:00:00.000
0:017> .time
Debug session time: Sat Sep 29 21:35:58.423 2012 (UTC + 8:00)
System Uptime: 6 days 22:18:26.143
Process Uptime: 0 days 0:07:45.089
Kernel time: 0 days 0:00:46.472
User time: 0 days 0:02:28.559
0:017> !clrstack
OS Thread Id: 0x10f8 (17)
Child-SP RetAddr Call Site
00000079fda8db20 000007fe9a9ee566 DomainNeutralILStubClass.IL_STUB(Microsoft.Win32.SafeHandles.SafeRegistryHandle, System.String, Int32[], Int32 ByRef, Byte[], Int32 ByRef)
00000079fda8dc50 000007fe99ff648d Microsoft.Win32.RegistryKey.InternalGetValue(System.String, System.Object, Boolean, Boolean)
00000079fda8dcf0 000007fe9872fa76 Microsoft.Win32.RegistryKey.GetValue(System.String)
00000079fda8dd40 000007fe9872ec05 System.Diagnostics.PerformanceMonitor.GetData(System.String)
00000079fda8ddc0 000007fe9872ca17 System.Diagnostics.PerformanceCounterLib.GetPerformanceData(System.String)
00000079fda8de20 000007fe9872c8d2 System.Diagnostics.PerformanceCounterLib.GetCategorySample(System.String)
00000079fda8de70 000007fe986b59a4 System.Diagnostics.PerformanceCounterLib.GetCategorySample(System.String, System.String)
00000079fda8deb0 000007fe986b52c1 System.Diagnostics.PerformanceCounter.NextSample()
00000079fda8df90 000007fe3d070a76 System.Diagnostics.PerformanceCounter.NextValue()
00000079fda8e180 000007fe3d0d0f39 ASP.cpu70_aspx.Page_Load(System.Object, System.EventArgs)
00000079fda8e1d0 000007fe97399f8a System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
...
00000079fda8f660 000007fe9c8cb31b DomainNeutralILStubClass.IL_STUB(Int64, Int64, Int64, Int32)
根據抓取dump的時間來看(.time命令輸出),
兩個dump中該進程用戶模式下消耗的cpu時間爲2:28.559 - 1:24.110 = 1:04.449
兩個dump中17號線程消耗cpu時間爲2:14.706 - 1:16.268 = 0:58.438
由此能夠看到在該段時間內絕大多數的cpu時間是17號線程消耗的。所以咱們能夠把關注點放在17號線程上。
對比兩次調用棧信息能夠看到都是在cpu70這個頁面Page_Load方法中。
GC頻繁的狀況的表現
測試程序
<%@ Page Language="C#" AutoEventWireup="true" Inherits="System.Web.UI.Page" %> <script runat="server"> protected void btnStart_Click(object sender, EventArgs e) { int number =0; if (int.TryParse(txtNumber.Text, out number)) { string dates = "<Dates>"; for (int i = 0; i < number; i++) { for (int j = 0; j < i; j++) { DateTime dt = DateTime.Now.Add(new TimeSpan(j, 0, 0, 0)); dates += "<Date><Month>" + dt.Month + "</Month><Day>" + dt.Day + "</Day><DayOfWeek>" + dt.DayOfWeek + "</DayOfWeek><Date>"; dates += "<Date><Month>" + dt.Month + "</Month><Day>" + dt.Day + "</Day><DayOfWeek>" + dt.DayOfWeek + "</DayOfWeek><Date></br>"; } dates += "</Dates>"; } Response.Write(dates); } } </script> <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head runat="server"> <title></title> </head> <body> <form id="form1" runat="server"> <div> <asp:TextBox ID="txtNumber" runat="server"/> <asp:Button ID="btnStart" Text="Start" OnClick="btnStart_Click" runat="server"/> </div> </form> </body> </html>
對於該問題初步的定位能夠參照如下性能計數器,
要關注這些性能計數的點
經過查看性能計數咱們懷疑此次問題可能與GC有關,接下來查看dump文件。
先查看GC線程在作什麼(這裏服務器默認採用server mode,每一個cpu覈對應一個gc線程)
如下是一個GC沒有作事的狀態做爲參照。
0:013> kL
Child-SP RetAddr Call Site
0000009d`d308f628 000007fe`bea310ea ntdll!ZwWaitForSingleObject+0xa
0000009d`d308f630 000007fe`9a3bd220 KERNELBASE!WaitForSingleObjectEx+0x92
0000009d`d308f6d0 000007fe`9a3bd10f mscorwks!CLREvent::WaitEx+0x174
0000009d`d308f720 000007fe`9a9cb437 mscorwks!CLREvent::WaitEx+0x63
0000009d`d308f7d0 000007fe`9a46d4d3 mscorwks!SVR::gc_heap::gc_thread_function+0x47
0000009d`d308f830 000007fe`c0e6167e mscorwks!SVR::gc_heap::gc_thread_stub+0x93
0000009d`d308f880 000007fe`c1a4c3f1 KERNEL32!BaseThreadInitThunk+0x1a
0000009d`d308f8b0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
先查看那個線程是gc線程
0:013> !threads -special
ThreadCount: 8
UnstartedThread: 0
BackgroundThread: 8
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
PreEmptive Lock
ID OSID ThreadOBJ State GC GC Alloc Context Domain Count APT Exception
7 1 1534 0000009dd2bfd260 8220 Enabled 0000000000000000 0000009dd2be2c90 0 Ukn
14 2 1764 0000009dd2c02ef0 b220 Enabled 0000000000000000 0000009dd2be2c90 0 MTA
11 3 5dc 0000009dd2c33bc0 880a220 Enabled 0000000000000000 0000009dd2be2c90 0 MTA
16 4 1048 0000009dd2c54e30 80a220 Enabled 0000000000000000 0000009dd2be2c90 0 MTA
17 5 7f4 0000009dd2c55c80 1220 Enabled 0000000000000000 0000009dd2be2c90 0 Ukn
18 6 1448 0000009dd2c9c1a0 180b220 Enabled 0000000000000000 0000009dd2c56ba0 1 MTA
9 7 768 0000009dd3b56150 220 Enabled 0000000000000000 0000009dd2be2c90 0 Ukn
19 8 1074 0000009dd19066d0 180b220 Enabled 0000000000000000 0000009dd2be2c90 0 MTA
OSID Special thread type
10 cc0 Gate
11 5dc IOCompletion
12 1344 DbgHelper
13 107c GC SuspendEE
14 1764 Finalizer
16 1048 Timer
17 7f4 ADUnloadHelper
18 1448 ThreadpoolWorker
19 1074 ThreadpoolWorker
GC13號線程調用棧,與不作事的時候調用棧不同,說明正在GC。
0:013> k
Child-SP RetAddr Call Site
0000009d`d308f5a0 000007fe`9a44dfed mscorwks!SVR::gc_heap::plan_phase+0x499
0000009d`d308f700 000007fe`9a968d35 mscorwks!SVR::gc_heap::gc1+0x6d
0000009d`d308f760 000007fe`9a9cb46f mscorwks!SVR::gc_heap::garbage_collect+0x2f5
0000009d`d308f7d0 000007fe`9a46d4d3 mscorwks!SVR::gc_heap::gc_thread_function+0x7f
0000009d`d308f830 000007fe`c0e6167e mscorwks!SVR::gc_heap::gc_thread_stub+0x93
0000009d`d308f880 000007fe`c1a4c3f1 KERNEL32!BaseThreadInitThunk+0x1a
0000009d`d308f8b0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
接下來咱們要看爲何會這麼頻繁的發生GC。
先看看目前運行的請求都有哪些,發現只有18號線程(這裏咱們爲了演示只用了一個請求,現實狀況應該會有較多請求)
0:013> !aspxpages
HttpContext Timeout Completed Running ThreadId ReturnCode Verb RequestPath+QueryString
0x0000009d800ed140 19200 Sec no 652 Sec 18 200 POST /high_gc.aspx
Total 1 HttpContext objects
查看一個18號線程,原來該線程在嘗試在大對象對上分配內存而觸發的了GC。
0:018> !dumpstack
OS Thread Id: 0x1448 (18)
Child-SP RetAddr Call Site
0000009dd3e2d4b8 000007febea3152c ntdll!ZwSetEvent+0xa
0000009dd3e2d4c0 000007fe9a770218 KERNELBASE!SetEvent+0xc
0000009dd3e2d4f0 000007fe9a689cac mscorwks!CLREvent::Set+0x128
0000009dd3e2d570 000007fe9a81362c mscorwks!SVR::GCHeap::GarbageCollectGeneration+0x1fc
0000009dd3e2d5d0 000007fe9a332ae2 mscorwks!SVR::gc_heap::try_allocate_more_space+0x1ac
0000009dd3e2d6c0 000007fe9a35df7f mscorwks!SVR::gc_heap::allocate_more_space+0x32
0000009dd3e2d710 000007fe9a40baf8 mscorwks!SVR::gc_heap::allocate_large_object+0x5f
0000009dd3e2d780 000007fe9a40be0e mscorwks!SVR::GCHeap::Alloc+0x198
0000009dd3e2d7c0 000007fe9a89b143 mscorwks!SlowAllocateString+0x7e
0000009dd3e2d840 000007fe97de9566 mscorwks!FramedAllocateString+0xd3
0000009dd3e2da00 000007fe3ac9130d mscorlib_ni!System.String.ConcatArray(System.String[], Int32)+0x16
0000009dd3e2da60 000007fe89b592b3 App_Web_brwfhign!ASP.high_gc_aspx.btnStart_Click(System.Object, System.EventArgs)+0x53d
0000009dd3e2dbd0 000007fe89b5978c System_Web_ni!System.Web.UI.WebControls.Button.OnClick(System.EventArgs)+0x73
0000009dd3e2dc10 000007fe892f3a2d System_Web_ni!System.Web.UI.WebControls.Button.RaisePostBackEvent(System.String)+0x8c
…
0000009dd3e2f5c0 000007fec0e6167e mscorwks!Thread::intermediateThreadProc+0x78
0000009dd3e2f790 000007fec1a4c3f1 KERNEL32!BaseThreadInitThunk+0x1a
0000009dd3e2f7c0 0000000000000000 ntdll!RtlUserThreadStart+0x1d
經過調用棧來看應該是在分配內存給string類型的數據,請求還在進行中,該string應該還在線程棧上,接下來查看該線程棧找到該對象,
0:018> !dso
OS Thread Id: 0x1448 (18)
RSP/REG Object Name
0000009dd3e2d618 0000009d8000b5d8 System.RuntimeType
0000009dd3e2d628 0000009d8000b5d8 System.RuntimeType
0000009dd3e2d630 0000009d800aea00 System.RuntimeType
0000009dd3e2d658 0000009d8000b5d8 System.RuntimeType
0000009dd3e2d7c8 0000009d800efab0 System.Web.UI.WebControls.Button
0000009dd3e2d7e0 0000009d8000b458 System.Reflection.Module
0000009dd3e2d7f8 0000009d80009648 System.RuntimeType
0000009dd3e2d8b0 0000009d800f5178 System.RuntimeType
0000009dd3e2d938 0000009d800efab0 System.Web.UI.WebControls.Button
0000009dd3e2d958 0000009d800baee0 System.Web.HttpApplication+PipelineStepManager
0000009dd3e2d9c0 0000009d800baee0 System.Web.HttpApplication+PipelineStepManager
0000009dd3e2d9c8 0000009d800ed140 System.Web.HttpContext
0000009dd3e2da20 0000009d800ed140 System.Web.HttpContext
0000009dd3e2da48 0000009d800efab0 System.Web.UI.WebControls.Button
0000009dd3e2da98 0000009dc04f2db0 System.String
0000009dd3e2dac0 0000009dc04f2db0 System.String
0000009dd3e2db28 0000009d8012ae88 System.Object[] (System.Object[])
0000009dd3e2db38 0000009d8012ae88 System.Object[] (System.Object[])
0000009dd3e2db48 0000009d8012ae88 System.Object[] (System.Object[])
0000009dd3e2db58 0000009dc04f2db0 System.String
0000009dd3e2db90 0000009dc03fbfa0 System.String
隨便挑了一個string都1M左右,大於85,000byte就是大對象。
0:018> !objsize 0000009dc04f2db0
sizeof(0000009dc04f2db0) = 1,011,336 ( 0xf6e88) bytes (System.String)
經過!do命令把這個字符串輸出+回頭查看代碼邏輯=解決問題
0:018> !do 0000009dc04f2db0Name: System.StringMethodTable: 000007fe97f17c20EEClass: 000007fe97b1e550Size: 1011330(0xf6e82) bytesGC Generation: 3 (C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)String: <Dates></Dates><Date><Month>9</Month><Day>29</Day><DayOfWeek>Saturday</DayOfWeek>……………省略1M字符串