教程:官方文檔https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/debug-memory-leak
環境:Linux、Docker、.NET Core 3.1 SDK及更高版本
示例代碼:https://github.com/dotnet/samples/tree/master/core/diagnostics/DiagnosticScenarios
html
示例中包含了泄漏內存、死鎖線程、CPU佔用太高等接口,方便學習。
ios
[root@localhost ~]# cd Diagnostic_scenarios_sample_debug_target [root@localhost Diagnostic_scenarios_sample_debug_target]# dotnet build -c Release Microsoft (R) Build Engine version 16.7.0+7fb82e5b2 for .NET ... Build succeeded. 0 Warning(s) 0 Error(s) Time Elapsed 00:00:05.62
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-buster-slim AS base WORKDIR /app EXPOSE 80 COPY bin/Release/netcoreapp3.1 . ENTRYPOINT ["dotnet", "DiagnosticScenarios.dll"]
[root@localhost Diagnostic_scenarios_sample_debug_target]# vim Dockerfile [root@localhost Diagnostic_scenarios_sample_debug_target]# docker build -t dumptest . Sending build context to Docker daemon 1.47MB Step 1/5 : FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-buster-slim AS base ... Successfully built 928e512d9be4 Successfully tagged dumptest:latest [root@localhost Diagnostic_scenarios_sample_debug_target]# docker images REPOSITORY TAG IMAGE ID CREATED SIZE dumptest latest 928e512d9be4 5 seconds ago 267MB
--privileged=true
參數的做用是給容器賦予root權限,後面createdump須要該支持。git
[root@localhost Diagnostic_scenarios_sample_debug_target]# docker run --name diagnostic --privileged=true -p 888:80 -d dumptest a68efbfbfb13117142bac9b0c6fb9f4c5124e4490eaf4850dd17fdc612e2cfda [root@localhost Diagnostic_scenarios_sample_debug_target]# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES a68efbfbfb13 dumptest "dotnet DiagnosticSc…" 24 seconds ago Up 23 seconds 0.0.0.0:888->80/tcp diagnostic
看看可否訪問成功 http://192.168.0.161:888/api/values
github
不少時候問題出如今生產環境,而測試環境又很難重現,因此問題出現時優先生成一個dump文件,而後去恢復服務(重啓解決大部分問題啊😂),最後再來慢慢分析問題是一個比較合理的方案。
web
經過示例中提供的接口生成一些死鎖線程和內存泄漏。
http://192.168.0.161:888/api/diagscenario/deadlock
http://192.168.0.161:888/api/diagscenario/memleak/101024
docker
進入示例容器,經過find
找到netcore自帶的createdump工具;
執行createdump路徑 PID
命令建立dump文件(若是容器內只有一個應用,通常PID默認爲1,也可使用top命令來查看PID)
容器佔用越大createdump越慢,建立完以後退出容器,將coredump.1文件拷貝到宿主機慢慢分析。vim
[root@localhost dumpfile]# docker exec -it diagnostic bash root@6b6f0a7ebe10:/app# find / -name createdump /usr/share/dotnet/shared/Microsoft.NETCore.App/3.1.10/createdump root@6b6f0a7ebe10:/app# /usr/share/dotnet/shared/Microsoft.NETCore.App/3.1.10/createdump 1 Writing minidump with heap to file /tmp/coredump.1 Written 2920263680 bytes (712955 pages) to core file root@6b6f0a7ebe10:/app# exit [root@localhost dumpfile]# docker cp 6b6f0a7ebe10:/tmp/coredump.1 /mnt/dumpfile/coredump.1 [root@localhost dumpfile]# ls coredump.1
dotnet-dump
工具依賴dotnet的sdk,若是宿主機中安裝了sdk能夠直接在宿主機中分析;
若是不想污染宿主機環境能夠拉取一個sdk鏡像mcr.microsoft.com/dotnet/core/sdk:3.1
,建立一個臨時環境用於分析。
api
須要把用於存放coredump.1文件的目錄掛載到容器,或者本身cp進去。
建立好容器以後須要安裝dotnet-dump
工具bash
[root@iZwz9883hajros0tZ ~]# docker run --rm -it -v /mnt/dumpfile:/tmp/coredump mcr.microsoft.com/dotnet/core/sdk:3.1 root@6e560e9f5d55:/# cd /tmp/coredump root@6e560e9f5d55:/tmp/coredump# ls coredump.1 root@6e560e9f5d55:/tmp/coredump# dotnet tool install -g dotnet-dump Tools directory '/root/.dotnet/tools' is not currently on the PATH environment variable. ... export PATH="$PATH:/root/.dotnet/tools" You can invoke the tool using the following command: dotnet-dump Tool 'dotnet-dump' (version '5.0.152202') was successfully installed. root@6e560e9f5d55:/tmp/coredump# export PATH="$PATH:/root/.dotnet/tools" root@6e560e9f5d55:/tmp/coredump#
利用clrstack
命令輸出調用堆棧。session
root@6e560e9f5d55:/tmp/coredump# dotnet-dump analyze coredump.1 Loading core dump: coredump.1 ... Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command. Type 'quit' or 'exit' to exit the session. > clrstack -all OS Thread Id: 0x311 Child SP IP Call Site 00007FCDE06DF530 00007fd1521e600c [GCFrame: 00007fcde06df530] 00007FCDE06DF620 00007fd1521e600c [GCFrame: 00007fcde06df620] 00007FCDE06DF680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcde06df680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) 00007FCDE06DF7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1() 00007FCDE06DF800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44] 00007FCDE06DF820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] 00007FCDE06DF870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93] 00007FCDE06DFBD0 00007fd1512f849f [GCFrame: 00007fcde06dfbd0] 00007FCDE06DFCA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcde06dfca0] OS Thread Id: 0x312 Child SP IP Call Site 00007FCDDFEDE530 00007fd1521e600c [GCFrame: 00007fcddfede530] 00007FCDDFEDE620 00007fd1521e600c [GCFrame: 00007fcddfede620] 00007FCDDFEDE680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcddfede680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) 00007FCDDFEDE7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1() 00007FCDDFEDE800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44] 00007FCDDFEDE820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] 00007FCDDFEDE870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93] 00007FCDDFEDEBD0 00007fd1512f849f [GCFrame: 00007fcddfedebd0] 00007FCDDFEDECA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcddfedeca0] ...
300多個線程的調用堆棧大多數線程共享一個公共調用堆棧,
該調用堆棧彷佛顯示請求傳入了死鎖方法,而死鎖方法繼而又調用了 Monitor.ReliableEnter,此方法表示這些線程正試圖進入鎖定,然而這個obj可能已被其餘線程獲取了排它鎖了。
光看這個調用堆棧,太難看出問題... 可是結合源碼就很容易看出DeadlockFunc這個方法早就已經將o1,o2兩個object形成了交叉死鎖,而後後面啓動的300個線程都試圖獲取獲取鎖定,結果就是無限等待。
00007FCDDEEDC530 00007fd1521e600c [GCFrame: 00007fcddeedc530] 00007FCDDEEDC620 00007fd1521e600c [GCFrame: 00007fcddeedc620] 00007FCDDEEDC680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcddeedc680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) 00007FCDDEEDC7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1() 00007FCDDEEDC800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44] 00007FCDDEEDC820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] 00007FCDDEEDC870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93] 00007FCDDEEDCBD0 00007fd1512f849f [GCFrame: 00007fcddeedcbd0] 00007FCDDEEDCCA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcddeedcca0]
利用syncblk
命令找出實際持有排它鎖的線程。
> syncblk Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner 25 0000000000E59BB8 603 1 00007FCE8C00FB30 1e8 14 00007fcea81ffb98 System.Object 26 0000000000E59C00 3 1 00007FD0C8001FA0 1e9 15 00007fcea81ffbb0 System.Object ----------------------------- Total 326 Free 307
Owning Thread Info
列下面有3個子列,第一列是地址,第二列是操做系統線程 ID,第三列是線程索引,也能夠經過threads
命令拿到線程索引。
經過setthread
命令將線程切換到0x1e8上,而後用clrstack
查看它的調用堆棧。
> setthread 14 > clrstack OS Thread Id: 0x1e8 (14) Child SP IP Call Site 00007FCE77FFE500 00007fd1521e600c [GCFrame: 00007fce77ffe500] 00007FCE77FFE5F0 00007fd1521e600c [GCFrame: 00007fce77ffe5f0] 00007FCE77FFE650 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fce77ffe650] System.Threading.Monitor.Enter(System.Object) 00007FCE77FFE7A0 00007FD0DB4F5D0C testwebapi.Controllers.DiagScenarioController.DeadlockFunc() 00007FCE77FFE7E0 00007FD0DB4F5B27 testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_0() 00007FCE77FFE800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44] 00007FCE77FFE820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] 00007FCE77FFE870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93] 00007FCE77FFEBD0 00007fd1512f849f [GCFrame: 00007fce77ffebd0] 00007FCE77FFECA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fce77ffeca0]
這個線程的調用堆棧跟上面的300多個比起來多了一些信息,能夠看出它經過DiagScenarioController.DeadlockFunc中的Monitor.Enter方法得到了某個object得排它鎖。
SOS命令:https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/dotnet-dump#dotnet-dump-analyze
檢查當前全部託管類型的統計信息 -min [byte]
可選參數能夠限制統計範圍
root@6e560e9f5d55:/tmp/coredump# dotnet-dump analyze coredump.1 Loading core dump: coredump.1 ... Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command. Type 'quit' or 'exit' to exit the session. > dumpheap -stat Statistics: MT Count TotalSize Class Name 00007fd0dc2b3ab0 1 24 System.Collections.Generic.ObjectEqualityComparer`1[[System.Threading.ThreadPoolWorkQueue+WorkStealingQueue, System.Private.CoreLib]] 00007fd0dc29ca28 1 24 System.Collections.Generic.ObjectEqualityComparer`1[[Microsoft.AspNetCore.Mvc.ModelBinding.Metadata.DefaultModelMetadata, Microsoft.AspNetCore.Mvc.Core]] ... 00007fd0d7f85510 482 160864 System.Object[] 0000000000da9d80 19191 7512096 Free 00007fd0dc2b3718 2 8388656 testwebapi.Controllers.Customer[] 00007fd0dadbeb58 1010240 24245760 testwebapi.Controllers.Customer 00007fd0d7f90f90 1012406 95190854 System.String
可在此處看到大多數是String或Customer對象。String對象佔用了90MB左右的空間
使用方法表mt
分析佔用最多的System.String類型
>dumpheap -mt 00007f61876a0f90 00007fcfabc376f8 00007fd0d7f90f90 94 00007fcfabc37770 00007fd0d7f90f90 94 00007fcfabc377e8 00007fd0d7f90f90 94 00007fcfabc37860 00007fd0d7f90f90 94 00007fcfabc378d8 00007fd0d7f90f90 94 ... Statistics: MT Count TotalSize Class Name 00007fd0d7f90f90 1012406 95190854 System.String Total 1012406 objects
一百萬個對象,大小都是94byte,使用gcroot
命令隨便查看幾個對象的根。
> gcroot -all 00007fcfabc378d8 HandleTable: 00007FD1508815F8 (pinned handle) -> 00007FD0A7FFF038 System.Object[] -> 00007FCEA830A230 testwebapi.Controllers.Processor -> 00007FCEA830A248 testwebapi.Controllers.CustomerCache -> 00007FCEA830A260 System.Collections.Generic.List`1[[testwebapi.Controllers.Customer, DiagnosticScenarios]] -> 00007FD0B821F0A8 testwebapi.Controllers.Customer[] -> 00007FCFABC378C0 testwebapi.Controllers.Customer -> 00007FCFABC378D8 System.String Found 1 roots. >
基本上就是Customer,CustomerCache,Processor這幾個對象形成的內存佔用太高。
下一篇利用perf分析容器中CPU佔用的問題。
參考:
調試內存泄漏教程 | Microsoft Docs https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/debug-memory-leak
dotnet core調試docker下生成的dump文件 https://www.cnblogs.com/iamsach/p/10118628.html