【問題描述】:sql
前段時間有一項目反饋,經常使用的審批功能有時的響應較慢,多個管理員功能不按期的出現客戶端無響應的情況,而且管理員功能一旦出現卡死,也會影響到普通的業務用戶導致不少用戶沒法操做。該系統爲混合場景,管理員功能使用的CS客戶端,業務用戶使用的是WEB。數據庫
【環境信息】:緩存
應用服務器:Windows Server 2008 R2 IIS7 .NET 4.0服務器
數據庫服務器:AIX Oracle 10.2.0.4網絡
【處理過程】:session
通過跟蹤發現兩個狀況:app
在出現系統掛起時,抓取應用服務器的dump文件後,發現存在明顯的線程阻塞:ide
下圖是線程阻塞對應windbg的log日誌性能
Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [D:\dumps\w3wp.DMP]
User Mini Dump File with Full Memory: Only application data is available
Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 7 Version 7601 (Service Pack 1) MP (8 procs) Free x64
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Machine Name:
Debug session time: Tue Apr 14 14:22:10.000 2015 (UTC + 8:00)
System Uptime: 19 days 20:26:50.903
Process Uptime: 0 days 5:33:15.000
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
......................................
ntdll!NtWaitForSingleObject+0xa:
00000000`7711135a c3 ret
0:000> .load D:\dumps\sos.dll 0:000> !tp CPU utilization: 0% Worker Thread: Total: 29 Running: 6 Idle: 23 MaxLimit: 800 MinLimit: 8 Work Request in Queue: 0 -------------------------------------- Number of Timers: 43 --------------------------------------
Completion Port Thread:Total: 1 Free: 1 MaxFree: 16 CurrentLimit: 0 MaxLimit: 800 MinLimit: 8
0:000> !syncblk Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner 47 0000000005532488 9 1 000000001385b1e0 93a0 105 000000017ffefbe0 System.Object -----------------------------
Total 6125
CCW 4
RCW 11
ComClassFactory 0
Free 4333
0:000> ~105s
ntdll!NtWaitForSingleObject+0xa:
00000000`7711135a c3 ret
0:105> !clrstack
OS Thread Id: 0x93a0 (105)
Child SP IP Call Site
000000000fbac038 000000007711135a [NDirectMethodFrameStandalone: 000000000fbac038] System.Data.Common.UnsafeNativeMethods.OCILobRead(System.Data.OracleClient.OciHandle, System.Data.OracleClient.OciHandle, System.Data.OracleClient.OciHandle, UInt32 ByRef, UInt32, IntPtr, UInt32, IntPtr, IntPtr, UInt16, CHARSETFORM)
000000000fbabfb0 000007fef3455640 DomainNeutralILStubClass.IL_STUB_PInvoke(System.Data.OracleClient.OciHandle, System.Data.OracleClient.OciHandle, System.Data.OracleClient.OciHandle, UInt32 ByRef, UInt32, IntPtr, UInt32, IntPtr, IntPtr, UInt16, CHARSETFORM)*** WARNING: Unable to verify checksum for System.Data.OracleClient.ni.dll
000000000fbac0f0 000007fef344ea9b System.Data.OracleClient.TracedNativeMethods.OCILobRead(System.Data.OracleClient.OciHandle, System.Data.OracleClient.OciHandle, System.Data.OracleClient.OciHandle, Int32 ByRef, UInt32, IntPtr, UInt32, UInt16, CHARSETFORM)
000000000fbac1d0 000007fef347c63d System.Data.OracleClient.OracleLob.Read(Byte[], Int32, Int32)
000000000fbac2a0 000007fef347d483 System.Data.OracleClient.OracleLob.get_Value()
000000000fbac320 000007fef34721b6 System.Data.OracleClient.OracleColumn.GetValue(System.Data.OracleClient.NativeBuffer_RowBuffer)
000000000fbac3d0 000007fef346efd2 System.Data.OracleClient.OracleDataReader.GetValues(System.Object[])
000000000fbac420 000007fedd16faec System.Data.ProviderBase.SchemaMapping.LoadDataRow()*** WARNING: Unable to verify checksum for System.Data.ni.dll
000000000fbac460 000007fedd1674fd System.Data.Common.DataAdapter.FillLoadDataRow(System.Data.ProviderBase.SchemaMapping)
000000000fbac4f0 000007fedd1673d4 System.Data.Common.DataAdapter.FillFromReader(System.Data.DataSet, System.Data.DataTable, System.String, System.Data.ProviderBase.DataReaderContainer, Int32, Int32, System.Data.DataColumn, System.Object)
000000000fbac5b0 000007fedd167271 System.Data.Common.DataAdapter.Fill(System.Data.DataSet, System.String, System.Data.IDataReader, Int32, Int32)
000000000fbac670 000007fedd167ef6 System.Data.Common.DbDataAdapter.FillInternal(System.Data.DataSet, System.Data.DataTable[], Int32, Int32, System.String, System.Data.IDbCommand, System.Data.CommandBehavior)
000000000fbac720 000007fedd167cc4 System.Data.Common.DbDataAdapter.Fill(System.Data.DataSet, Int32, Int32, System.String, System.Data.IDbCommand, System.Data.CommandBehavior)
000000000fbac7d0 000007fedd167ac2 System.Data.Common.DbDataAdapter.Fill(System.Data.DataSet)
000000000fbac880 000007ff002393ee xxxxxx.Platform.Core.DataAccess.Database.GetDataSet(System.String, System.String, System.Data.IDbDataParameter[], Boolean)*** ERROR: Module load completed but symbols could not be loaded for xxxxxx.Platform.Core.DataAccess.dll
000000000fbac950 000007ff002391cd xxxxxx.Platform.Core.DataAccess.Database.GetDataSet(System.String, System.Data.IDbDataParameter[], Boolean)
000000000fbac990 000007ff0023906a xxxxxx.Platform.Core.DataAccess.Database.ExecuteDataSet(System.String, System.Data.IDbDataParameter[])
000000000fbac9f0 000007ff00238810 xxxxxx.Platform.Core.DataAccess.Database.ExecuteDataSet(System.String, System.Object[])
000000000fbaca60 000007ff01566749 xxxxxx.Platform.Workflow.Server.Designer.OperateDataBaseManager.GetPacekage(System.String)*** WARNING: Unable to verify checksum for xxxxxx.Platform.Workflow.Server.dll
*** ERROR: Module load completed but symbols could not be loaded for xxxxxx.Platform.Workflow.Server.dll
000000000fbacab0 000007ff01566548 xxxxxx.Platform.Workflow.Server.Designer.OperateDataBaseManager.Load(System.String)
000000000fbacb00 000007ff0024bbfe xxxxxx.Platform.Resource.ResourceManager.MetadataService.GetCheckedinVersionContent(System.String, System.Data.DataRow, System.String ByRef)*** ERROR: Module load completed but symbols could not be loaded for xxxxxx.Platform.Resource.MetadataManager.dll
000000000fbacb60 000007ff0024b842 xxxxxx.Platform.Resource.ResourceManager.MetadataService.GetMetadataContentString(System.String, System.String ByRef, System.String)
000000000fbacbe0 000007ff0024b53b xxxxxx.Platform.Resource.ResourceManager.MetadataService.GetMetadataContentFromDB(System.String)
000000000fbacc40 000007ff00243d1a xxxxxx.Platform.Resource.ResourceManager.MetadataCache.GetObject(System.String, xxxxxx.Platform.AppFramework.Service.xxxState)
000000000fbaccb0 000007ff00fb036b xxxxxx.Platform.Resource.MetadataManager.MetadataRestFul.TryGetNewVersionContent(System.IO.BinaryReader, System.IO.BinaryWriter)
000000000fbad418 000007fef98d10b4 [DebuggerU2MCatchHandlerFrame: 000000000fbad418]
000000000fbad500 000007fef98d10b4 [CustomGCFrame: 000000000fbad500]
000000000fbad4c8 000007fef98d10b4 [GCFrame: 000000000fbad4c8]
000000000fbad458 000007fef98d10b4 [GCFrame: 000000000fbad458]
000000000fbad828 000007fef98d10b4 [HelperMethodFrame_PROTECTOBJ: 000000000fbad828] System.RuntimeMethodHandle._InvokeMethodFast(System.IRuntimeMethodInfo, System.Object, System.Object[], System.SignatureStruct ByRef, System.Reflection.MethodAttributes, System.RuntimeType)
000000000fbad970 000007fef895587f System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo, Boolean)*** WARNING: Unable to verify checksum for mscorlib.ni.dll
000000000fbadab0 000007fef89a0426 System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
000000000fbadb00 000007ff00b4f3b3 xxxxxx.Platform.AppFramework.Service.xxxRestfulContext.InternalInvoke(System.String, System.Reflection.MethodInfo, System.Object, System.Object[], Int32[] ByRef, System.Object[] ByRef)*** ERROR: Module load completed but symbols could not be loaded for xxxxxx.Platform.AppFramework.RestfulService.dll
000000000fbadb60 000007ff00b4ebf0 xxxxxx.Platform.AppFramework.Service.xxxRestfulContext.Invoke(System.String, System.String, System.String, Boolean, System.Object[], Int32[] ByRef, System.Object[] ByRef)
000000000fbadc40 000007ff00c28ae8 xxxxxx.Platform.AppFramework.Service.xxxRestfulContext.Invoke(System.String, System.String, System.String, Boolean, System.Object[])
000000000fbadca0 000007ff00c2876b xxxxxx.Platform.AppFramework.RESTFulWebService.xxxHttpWebHandler.InvokeStream(System.IO.BinaryReader, System.Web.HttpContext, Boolean)*** ERROR: Module load completed but symbols could not be loaded for xxxxxx.Platform.AppFramework.RESTFulWebService.dll
000000000fbadd80 000007ff00b4de0c xxxxxx.Platform.AppFramework.RESTFulWebService.xxxHttpWebHandler.ProcessRequest(System.Web.HttpContext)
000000000fbade70 000007fef1eb5c25 System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()*** WARNING: Unable to verify checksum for System.Web.ni.dll
000000000fbadf30 000007fef1e8337a System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
000000000fbadfe0 000007fef1e97030 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
000000000fbae0a0 000007fef1e82879 System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
000000000fbae100 000007fef1e8777c System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
000000000fbae180 000007fef1e86477 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
000000000fbae2b0 000007fef25ed6dd DomainNeutralILStubClass.IL_STUB_COMtoCLR(Int64, Int32, Int32 ByRef)
000000000fbae5e8 000007fef993e93e [GCFrame: 000000000fbae5e8]
000000000fbae628 000007fef993e93e [ContextTransitionFrame: 000000000fbae628]
000000000fbae668 000007fef993e93e [GCFrame: 000000000fbae668]
000000000fbae850 000007fef993e93e [ComMethodFrame: 000000000fbae850]
0:105> ~* e!clrstack OS Thread Id: 0x2aec (0) Unable to walk the managed stack. The current thread is likely not a managed thread. You can run !threads to get a list of managed threads in the process OS Thread Id: 0x15f0 (106)
Child SP IP Call Site
000000000ed5df70 00000000771118ca [GCFrame: 000000000ed5df70]
000000000ed5e1c8 00000000771118ca [GCFrame: 000000000ed5e1c8]
000000000ed5e178 00000000771118ca [HelperMethodFrame: 000000000ed5e178] System.Threading.Monitor.Enter(System.Object)
000000000ed5e2d0 000007ff00243ce8 xxxxxx.Platform.Resource.ResourceManager.MetadataCache.GetObject(System.String, xxxxxx.Platform.AppFramework.Service.GSPState)
000000000ed5e340 000007ff00d40c8d xxxxxx.Platform.Resource.ResourceManager.DefaultValueManager.IsFormMobileForm(System.String)
000000000ed5e380 000007ff00d40ac0 xxxxxx.Platform.Resource.ResourceManager.DefaultValueManager.GetDefaultValueXmlExactly(System.String, System.String, System.String)
000000000ed5e3d0 000007ff00d409df xxxxxx.Platform.Engine.DataAccess.Common.DAEUtils.GetDefaultValueXml(System.String, System.String, System.String)
000000000ed5e420 000007ff00d40583 xxxxxx.Platform.Engine.DataAccess.GSPDataAccessEngine.NewObjectWithDefault(xxxxxx.Platform.Resource.Metadata.DataModel.GspDataModel, System.String, System.String)
000000000ed5e500 000007ff0091495d xxxxxx.Platform.BizComponent.JSProxy.BizComponentRESTFulService.InvokeInternal(System.String, System.Object[], System.Collections.Generic.Dictionary`2<Int32,System.Object> ByRef, Boolean ByRef)
000000000ed5e590 000007ff0023fe16 xxxxxx.Platform.BizComponent.JSProxy.BizComponentRESTFulService.Get(System.Collections.Generic.Dictionary`2<System.String,System.String>)
000000000ed5e610 000007ff002276d8 xxxxxx.Platform.WebFramework.RestFulService.RESTFulHandlerForWeb.ProcessRequest(System.Web.HttpContext)
000000000ed5e840 000007fef1eb5c25 System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
000000000ed5e900 000007fef1e8337a System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
000000000ed5e9b0 000007fef1e97030 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
000000000ed5ea70 000007fef1e82879 System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
000000000ed5ead0 000007fef1e8777c System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
000000000ed5eb50 000007fef1e86477 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
000000000ed5ec80 000007fef25ed6dd DomainNeutralILStubClass.IL_STUB_COMtoCLR(Int64, Int32, Int32 ByRef)
000000000ed5efb8 000007fef993e93e [GCFrame: 000000000ed5efb8]
000000000ed5eff8 000007fef993e93e [ContextTransitionFrame: 000000000ed5eff8]
000000000ed5f038 000007fef993e93e [GCFrame: 000000000ed5f038]
000000000ed5f220 000007fef993e93e [ComMethodFrame: 000000000ed5f220] 測試
下圖爲阻塞源頭105號線程(被數據庫返回阻塞),執行SQL對應的程序代碼:
下圖爲線程阻塞的同步程序代碼:
至此問題應該比較清楚了,執行SQL時間長的線程阻塞了其餘線程,而線程同步鎖的粒度太大(僅爲了作數據緩存時,防止插入重複值,結果鎖定了全部的元數據對象)。可是,該SQL腳本真的有問題嗎?
【網絡分析】:
將該SQL放到plsql中執行,客戶端響應時間很長,通過對不一樣單據進行測試後發現,當LOB類型的數據超過16K之後問題就會出現,但該SQL在測試環境並無問題。前面已經跟蹤過數據庫服務器,沒有發現什麼問題,所以目標開始轉移到應用服務器到數據庫服務器的網絡環境上。
直接使用ADO.NET和JDBC分別直接訪問數據庫,均發現響應延遲問題。與客戶方IT人員溝通了解到,從應用服務器到數據庫通過了一個思科的防火牆。咱們就分別在應用服務器、應用服務器端的交換機、數據庫服務器端的交換機、數據庫服務器,4個點進行網絡抓包。通過對比發現,通過防火牆先後兩個交換機的數據包明顯有問題:有很是明顯的包亂序、包數量不一致、window size遞減的狀況;而且全部通過防火牆的網絡包,window size的值均從原來的64k篡改成16k;
以下圖:
把這個狀況反饋給思科的代理後,他們調整防火牆設置:好像是改成按流量控制,不是特別懂。
至此,問題完全解決!