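Recently our program started hanging on exit. Investigation showed that it was deadlocking while unloading a DLL. The flow is roughly this: when our DLL is loaded it creates a worker thread, and when it is unloaded it sets a quit flag and waits for that worker thread to finish. To study this classic deadlock I wrote a small simulation program. The dump file and the sample code are in the attachment.
Main program: WaitDllUnloadExe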
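//WaitDllUnloadExe.cpp
#include "stdafx.h"
#include "windows.h"
int _tmain(int argc, _TCHAR* argv[])
{
    // Loading the DLL runs DllMain(DLL_PROCESS_ATTACH), which starts the worker threads.
    HMODULE module = LoadLibraryA(".\\DllUnload.dll");
    // Give the worker threads some time to run.
    Sleep(5000);
    // Unloading runs DllMain(DLL_PROCESS_DETACH); in the dump this call never returns.
    FreeLibrary(module);
    return 0;
}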
DLL project: DllUnload
// dllmain.cpp
#include "stdafx.h"
#include "process.h"
HANDLE g_hThread;      // handle of the long-running worker thread
bool g_quit = false;   // set in DLL_PROCESS_DETACH to ask the worker to stop

// Long-running worker: loops until g_quit is set.
unsigned __stdcall procThread(void *) {
    while ( !g_quit )
    {
        OutputDebugStringA("procThread running.\n");
        Sleep(100);
    }
    OutputDebugStringA("==========================procThread quitting.\n");
    return 0;
}

// Short-lived thread: exits by itself while the DLL is still loaded,
// so its DLL_THREAD_DETACH notification is delivered normally.
unsigned __stdcall quitDemoProc(void *) {
    int idx = 0;
    while ( idx++ < 5 )
    {
        OutputDebugStringA("quitDemoProc running!!!!!!!!.\n");
        Sleep(100);
    }
    OutputDebugStringA("--------------------------------------------------quitDemoProc quitting.\n");
    return 0;
}

BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved ) {
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        {
            g_hThread = (HANDLE)_beginthreadex(NULL, 0, &procThread, NULL, 0, NULL);
            CloseHandle((HANDLE)_beginthreadex(NULL, 0, &quitDemoProc, NULL, 0, NULL));
        }
        break;
    case DLL_THREAD_ATTACH:   // falls through: attach prints the same message as detach
    case DLL_THREAD_DETACH:
        {
            OutputDebugStringA("------------DLL_THREAD_DETACH called.\n");
        }
        break;
    case DLL_PROCESS_DETACH:
        {
            OutputDebugStringA("------------DLL_PROCESS_DETACH begin wait...\n");
            g_quit = true;
            // Deadlock: this wait runs while DllMain is still executing,
            // and the worker cannot finish exiting until DllMain returns.
            WaitForSingleObject(g_hThread, INFINITE);
            OutputDebugStringA("------------DLL_PROCESS_DETACH end wait...\n");
        }
        break;
    }
    return TRUE;
}
Click here to download the test project.
Open the dump file in WinDbg, then use ~*kvn to list the call stacks of all threads.
0 Id: 1918.1924 Suspend: 1 Teb: 7efdd000 Unfrozen
# ChildEBP RetAddr Args to Child
00 004af6f4 76150816 00000038 00000000 00000000 ntdll!NtWaitForSingleObject+0x15 (FPO: [3,0,0])
01 004af760 76781194 00000038 ffffffff 00000000 KERNELBASE!WaitForSingleObjectEx+0x98 (FPO: [Non-Fpo])
02 004af778 76781148 00000038 ffffffff 00000000 kernel32!WaitForSingleObjectExImplementation+0x75 (FPO: [Non-Fpo])
*** WARNING: Unable to verify checksum for DllUnload.dll
03 004af78c 6d0c15eb 00000038 ffffffff 00000000 kernel32!WaitForSingleObject+0x12 (FPO: [Non-Fpo])
04 004af86c 6d0c1e2b 6d0b0000 00000000 00000000 DllUnload!DllMain+0xdb (FPO: [Non-Fpo]) (CONV: stdcall) [c:\users\bianchengnan\documents\visual studio 2012\projects\waitdllunloadexe\dllunload\dllmain.cpp @ 55]
05 004af8b0 6d0c1d4f 6d0b0000 00000000 00000000 DllUnload!__DllMainCRTStartup+0xcb (FPO: [Non-Fpo]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\crtdll.c @ 508]
06 004af8c4 77139930 6d0b0000 00000000 00000000 DllUnload!_DllMainCRTStartup+0x1f (FPO: [Non-Fpo]) (CONV: stdcall) [f:\dd\vctools\crt_bld\self_x86\crt\src\crtdll.c @ 472]
07 004af8e4 77160000 6d0c10f0 6d0b0000 00000000 ntdll!LdrpCallInitRoutine+0x14
08 004af96c 77141221 6d0b0000 004af990 750227be ntdll!LdrpUnloadDll+0x375 (FPO: [Non-Fpo])
09 004af9b0 76151da7 6d0b0000 7efde000 004afaa4 ntdll!LdrUnloadDll+0x4a (FPO: [Non-Fpo])
*** WARNING: Unable to verify checksum for WaitDllUnloadExe.exe
0a 004af9c0 003a1425 6d0b0000 00000000 00000000 KERNELBASE!FreeLibrary+0x15 (FPO: [Non-Fpo])
0b 004afaa4 003a1989 00000001 0059a650 0059cf30 WaitDllUnloadExe!wmain+0x55 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\users\bianchengnan\documents\visual studio 2012\projects\waitdllunloadexe\waitdllunloadexe\waitdllunloadexe.cpp @ 13]
0c 004afaf4 003a1b7d 004afb08 767833ca 7efde000 WaitDllUnloadExe!__tmainCRTStartup+0x199 (FPO: [Non-Fpo]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 533]
0d 004afafc 767833ca 7efde000 004afb48 77139ed2 WaitDllUnloadExe!wmainCRTStartup+0xd (FPO: [Non-Fpo]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 377]
0e 004afb08 77139ed2 7efde000 75022546 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
0f 004afb48 77139ea5 003a107d 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
10 004afb60 00000000 003a107d 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
1 Id: 1918.594 Suspend: 1 Teb: 7efda000 Unfrozen
# ChildEBP RetAddr Args to Child
00 0090fc68 77138dd4 00000040 00000000 00000000 ntdll!NtWaitForSingleObject+0x15 (FPO: [3,0,0])
01 0090fccc 77138cb8 00000000 00000000 0059c5b8 ntdll!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
02 0090fcf4 7715d349 772020c0 75d82382 00000000 ntdll!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
03 0090fd8c 7715d5c2 00000000 00000000 0090fdac ntdll!LdrShutdownThread+0x50 (FPO: [Non-Fpo])
04 0090fd9c 0f78e099 00000000 0059ec48 0090fde8 ntdll!RtlExitUserThread+0x2a (FPO: [Non-Fpo])
05 0090fdac 0f78e007 00000000 d910e7ee 00000000 MSVCR110D!_endthreadex+0x39 (FPO: [Non-Fpo])
06 0090fde8 0f78e1d1 0059ec48 0090fe00 767833ca MSVCR110D!_beginthreadex+0x1a7 (FPO: [Non-Fpo])
07 0090fdf4 767833ca 0059ec48 0090fe40 77139ed2 MSVCR110D!_endthreadex+0x171 (FPO: [Non-Fpo])
08 0090fe00 77139ed2 0059c5b8 75d8204e 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
09 0090fe40 77139ea5 0f78e120 0059c5b8 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
0a 0090fe58 00000000 0f78e120 0059c5b8 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
# 2 Id: 1918.1960 Suspend: 1 Teb: 7efd7000 Unfrozen
# ChildEBP RetAddr Args to Child
00 00a4f904 7719f826 75ec273a 00000000 00000000 ntdll!DbgBreakPoint (FPO: [0,0,0])
01 00a4f934 767833ca 00000000 00a4f980 77139ed2 ntdll!DbgUiRemoteBreakin+0x3c (FPO: [Non-Fpo])
02 00a4f940 77139ed2 00000000 75ec278e 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
03 00a4f980 77139ea5 7719f7ea 00000000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
04 00a4f998 00000000 7719f7ea 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
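Thread 0 is the main thread (thread ID 0x1924), thread 1 is the worker thread (thread ID 0x594), and thread 2 (thread ID 0x1960) is the remote thread that WinDbg injects in order to break into the debugger. The IDs shown by the debugger are hexadecimal.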
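Since the `Id: 1918.594` token in each thread header is `process.thread` in hexadecimal, it can help to convert it to decimal when matching against tools that show decimal IDs. The following is a minimal sketch. `DebuggerId` and `parseDebuggerId` are hypothetical names used for illustration and are not part of any debugger API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Process and thread IDs taken from a WinDbg thread header.
struct DebuggerId {
    std::uint32_t pid;
    std::uint32_t tid;
};

// Parses a "pid.tid" token such as "1918.594". Both halves are
// hexadecimal without a 0x prefix, as printed by the ~ command.
// Hypothetical helper, written only for this example.
inline DebuggerId parseDebuggerId(const std::string& token) {
    const std::size_t dot = token.find('.');
    assert(dot != std::string::npos && "expected a pid.tid token");
    DebuggerId id{};
    id.pid = static_cast<std::uint32_t>(std::stoul(token.substr(0, dot), nullptr, 16));
    id.tid = static_cast<std::uint32_t>(std::stoul(token.substr(dot + 1), nullptr, 16));
    return id;
}
```

For the dump above, `parseDebuggerId("1918.1924")` gives process 6424 and thread 6436 in decimal, and the worker thread 0x594 is 1428.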
Thread 0 is blocked in its call to WaitForSingleObject. Let's see what it is waiting for by entering !handle 0x38 f:
!handle 0x38 f
Handle 00000038
Type Thread
Attributes 0
GrantedAccess 0x1fffff:
Delete,ReadControl,WriteDac,WriteOwner,Synch
Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,Impersonate,DirectImpersonate
HandleCount 5
PointerCount 8
Name <none>
Object specific information
Thread Id 1918.594
Priority 10
Base Priority 0
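So thread 0 is waiting for the thread whose ID is 0x594. Our code does indeed contain WaitForSingleObject(g_hThread, INFINITE);. Now let's look at thread 1. Its call stack shows that it has already called _endthreadex() and is shutting down. During shutdown it tries to enter a critical section and ends up waiting in ntdll!NtWaitForSingleObject(). The handle it is waiting on is 0x40. Enter !handle 0x40 f to see information about that handle.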
0:002> !handle 0x40 f
Handle 00000040
Type Event
Attributes 0
GrantedAccess 0x100003:
Synch
QueryState,ModifyState
HandleCount 2
PointerCount 4
Name <none>
Object specific information
Event Type Auto Reset
Event is Waiting
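Handle 0x40 turns out to be an Event object. We will set that aside for a moment and try the go-to command for debugging deadlocks, !cs -l, because the call stack shows that thread 1 is blocked inside RtlEnterCriticalSection.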
0:002> !cs -l
-----------------------------------------
DebugInfo = 0x77204360
Critical section = 0x772020c0 (ntdll!LdrpLoaderLock+0x0)
LOCKED
LockCount = 0x1
WaiterWoken = No
OwningThread = 0x00001924
RecursionCount = 0x1
LockSemaphore = 0x40
SpinCount = 0x00000000
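The output shows one locked critical section, ntdll!LdrpLoaderLock (the loader lock), owned by thread 0 (thread ID 0x00001924). What is more, the LockSemaphore member of this critical section is exactly the handle value that thread 1 is waiting on. That reminded me that Windows via C/C++ (《Windows核心編程》) describes the layout of a critical section and explains that LockSemaphore is actually an Event object; see chapter 8, section 8.4.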
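At this point the picture is clear. Thread 0 is inside DllMain() (with ul_reason_for_call set to DLL_PROCESS_DETACH), waiting for thread 1 to finish. Thread 1, as it exits, also has to call DllMain(), this time with ul_reason_for_call set to DLL_THREAD_DETACH. Calls to DllMain() are serialized, so thread 1 cannot make its call until thread 0 releases the lock. Thread 0, however, is waiting indefinitely for thread 1 to finish, hence the deadlock.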
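Note: calling DisableThreadLibraryCalls(hModule); in DllMain() does not help either. As the stack of thread 1 shows, the exiting thread still has to acquire the loader lock in LdrShutdownThread, whether or not this DLL asked to be notified. See the relevant analysis in Windows via C/C++ (《Windows核心編程》).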
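The circular wait described above can be modeled with portable C++ primitives. This is only a sketch under stated assumptions: `loaderLock` stands in for ntdll!LdrpLoaderLock, `workerCanExit` is a hypothetical name, and the worker's wait is bounded with `try_lock_for` so that the demo terminates instead of hanging like the real program.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>

// Returns true if the simulated worker thread manages to finish its exit path.
// releaseLockBeforeWaiting = false models waiting inside DllMain.
// releaseLockBeforeWaiting = true models waiting after DllMain has returned.
inline bool workerCanExit(bool releaseLockBeforeWaiting,
                          std::chrono::milliseconds patience) {
    assert(patience.count() > 0);
    std::timed_mutex loaderLock;  // stand-in for the loader lock (assumption)

    // "Thread 0": running DllMain(DLL_PROCESS_DETACH), so it owns the lock.
    std::unique_lock<std::timed_mutex> held(loaderLock);

    // "Thread 1": needs the same lock on exit to deliver DLL_THREAD_DETACH.
    std::future<bool> workerExited =
        std::async(std::launch::async, [&loaderLock, patience] {
            if (!loaderLock.try_lock_for(patience)) {
                return false;  // still blocked; the real thread waits forever
            }
            loaderLock.unlock();
            return true;
        });

    if (releaseLockBeforeWaiting) {
        held.unlock();
    }
    // "Thread 0" waits for the worker; the real code waits with INFINITE.
    return workerExited.get();
}
```

With the lock held during the wait, `workerCanExit(false, ...)` returns false however long the worker is willing to wait. With the lock released first, the worker gets through.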
I found the definition of the critical section structure in winnt.h:
typedef struct _RTL_CRITICAL_SECTION {
    PRTL_CRITICAL_SECTION_DEBUG DebugInfo;
    //
    // The following three fields control entering and exiting the critical
    // section for the resource
    //
    LONG LockCount;
    LONG RecursionCount;
    HANDLE OwningThread; // from the thread's ClientId->UniqueThread
    HANDLE LockSemaphore;
    ULONG_PTR SpinCount; // force size on 64-bit systems when packed
} RTL_CRITICAL_SECTION, *PRTL_CRITICAL_SECTION;
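The `!cs` command prints LockCount already interpreted. When the raw field is read straight from memory, the debugger documentation for Windows Server 2003 SP1 and later describes it as bit-encoded: the lowest bit is 0 when the section is locked, the next bit is 0 when a waiter has been woken, and the remaining bits hold the ones-complement of the number of waiting threads. Below is a small sketch of that decoding. `LockCountInfo` and `decodeLockCount` are hypothetical names, not Windows APIs.

```cpp
#include <cstdint>

// Interpreted view of the raw RTL_CRITICAL_SECTION::LockCount value.
struct LockCountInfo {
    bool locked;
    bool waiterWoken;
    std::uint32_t waitingThreads;
};

// Decodes a raw LockCount using the layout documented for
// Windows Server 2003 SP1 and later. Hypothetical helper for illustration.
inline LockCountInfo decodeLockCount(std::int32_t rawLockCount) {
    const std::uint32_t bits = static_cast<std::uint32_t>(rawLockCount);
    LockCountInfo info{};
    info.locked = (bits & 0x1u) == 0;       // bit 0 clear: section is locked
    info.waiterWoken = (bits & 0x2u) == 0;  // bit 1 clear: a waiter was woken
    info.waitingThreads = (~bits) >> 2;     // ones-complement of the upper bits
    return info;
}
```

Under this layout a raw value of -1 means unlocked with no waiters, and a raw value of -6 decodes to locked, no waiter woken, one thread waiting, which would match the `LockCount = 0x1` and `WaiterWoken = No` lines in the `!cs -l` output above.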
Don't wait for a thread to finish inside DllMain().
Use !cs -l to debug critical section deadlocks. It works a treat.
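One common way to follow the first point is to move thread start-up and shutdown out of DllMain() into functions that the host calls explicitly, after LoadLibrary and before FreeLibrary. The sketch below uses standard C++ threads so that it is self-contained. The `Worker` class and its method names are assumptions for illustration, not code from the test project. In a real DLL, `start()` and `stop()` would sit behind exported functions.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical replacement for the thread management done in DllMain.
class Worker {
public:
    // Called by the host after the DLL has been loaded.
    void start() {
        assert(!thread_.joinable() && "start() called twice");
        quit_.store(false);
        thread_ = std::thread([this] {
            while (!quit_.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    // Called by the host before FreeLibrary. Blocking here is safe because
    // the caller is ordinary code rather than DllMain, so the loader lock
    // is not held and the worker can complete its exit path.
    void stop() {
        quit_.store(true);
        if (thread_.joinable()) {
            thread_.join();
        }
    }

    bool running() const { return thread_.joinable(); }

private:
    std::atomic<bool> quit_{false};  // atomic instead of a plain bool flag
    std::thread thread_;
};
```

With this shape, DLL_PROCESS_DETACH has nothing left to wait for. The cost is that the host must remember to call `stop()` before unloading.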
See Windows via C/C++ (《Windows核心編程》): chapter 8 (critical sections, especially section 8.4.1) and chapter 20 (DLLs, especially section 20.2.5).