Debugging a Deadlock Caused by TerminateThread

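An updater program in our project occasionally deadlocked. The dump showed it was stuck inside ShellExecuteExW. I didn't have much experience with this kind of problem, so at first I couldn't tell why.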

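From the symptoms, it looked as though the thread that owned a critical section had exited unexpectedly. A careful look through the project's code turned up calls to TerminateThread() that forcibly killed threads. That was suspicious, so I wrote a test program that reproduces the problem.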

Reproducing the problem

Reproduction steps

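The main program loads a DLL and calls one of its exported functions to create a thread. It then kills that thread with TerminateThread() and calls RunProcess(), a wrapper around ShellExecuteEx(), to launch a new process. That call hangs inside ShellExecuteEx(). To make the problem easier to reproduce, DllMain() deliberately sleeps for 5 seconds when ul_reason_for_call is DLL_THREAD_DETACH.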

Code excerpts

Main project: testTerminateThread

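//testTerminateThread.cpp
#include "stdafx.h"
#include <windows.h>   // also pulls in shellapi.h (ShellExecuteEx)
#include <tchar.h>
#include <process.h>

// Signature of GenerateThread() exported by testDll.dll.
typedef HANDLE (*pfnGenerateThread)();

// Launches app_name via ShellExecuteEx and returns the new process handle.
HANDLE RunProcess(const TCHAR* app_name, const TCHAR* cmd)
{
  SHELLEXECUTEINFO shex = {sizeof(SHELLEXECUTEINFO)};
  shex.fMask = SEE_MASK_NOCLOSEPROCESS;
  shex.lpVerb = _T("open");
  shex.lpFile = app_name;
  shex.lpParameters = cmd;
  shex.lpDirectory = NULL;
  shex.nShow = SW_NORMAL;

  if (!::ShellExecuteEx(&shex))
  {
    return INVALID_HANDLE_VALUE;
  }

  return shex.hProcess;
}

int _tmain(int argc, _TCHAR* argv[])
{
  while ( 1 )
  {
    HMODULE hModule = LoadLibrary(_T("testDll.dll"));
    if ( NULL == hModule )
      return 0;

    pfnGenerateThread pfn = (pfnGenerateThread)GetProcAddress(hModule, "GenerateThread");
    if ( NULL == pfn )
      return 0;

    HANDLE hThread = pfn();

    // Give the thread time to run. By now it has returned and is sleeping
    // inside DllMain(DLL_THREAD_DETACH) while holding the loader lock.
    Sleep(1000);

    // Forcibly kill the thread while it still owns the loader lock.
    BOOL bOk = TerminateThread(hThread, 0);
    CloseHandle(hThread);

    // Deadlocks in here: ShellExecuteEx waits for the orphaned loader lock.
    HANDLE hProcess = RunProcess(argv[0], NULL);
    if (hProcess != NULL && hProcess != INVALID_HANDLE_VALUE)
      CloseHandle(hProcess);

    FreeLibrary(hModule);
  }

  return 0;
}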

DLL project: testDll

// DllMain.cpp
#include "stdafx.h"
#include <windows.h>

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
  switch (ul_reason_for_call)
  {
  case DLL_PROCESS_ATTACH:
    OutputDebugStringW(L"====> DLL_PROCESS_ATTACH called.\n");
    break;
  case DLL_THREAD_ATTACH:
    OutputDebugStringW(L"----> DLL_THREAD_ATTACH called.\n");
    break;
  case DLL_THREAD_DETACH:
    OutputDebugStringW(L"<---- DLL_THREAD_DETACH called.\n");
    // DllMain runs with the loader lock (ntdll!LdrpLoaderLock) held.
    // Sleep 5 seconds to widen the window in which this thread owns it.
    Sleep(5000);
    break;
  case DLL_PROCESS_DETACH:
    OutputDebugStringW(L"<==== DLL_PROCESS_DETACH called.\n");
    break;
  }
  return TRUE;
}
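// testDll.cpp
#include "stdafx.h"
#include <stdio.h>
#include <process.h>
#include <windows.h>

void OutputCurrentThreadId()
{
  WCHAR szBuffer[1024];
  swprintf_s(szBuffer, L"thread [0x%lx], running & exiting...\n", GetCurrentThreadId());
  OutputDebugStringW(szBuffer);
}

unsigned __stdcall testProc(void *)
{
  OutputCurrentThreadId();
  // Returning ends the thread, which triggers DllMain(DLL_THREAD_DETACH).
  return 0;
}

// Exported with an undecorated name so GetProcAddress(hModule, "GenerateThread")
// can find it (a .def file would work as well).
extern "C" __declspec(dllexport) HANDLE GenerateThread()
{
  // _beginthreadex returns 0 on failure.
  HANDLE hThread = (HANDLE)_beginthreadex(NULL, 0, &testProc, NULL, 0, NULL);
  return hThread;
}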

Analysis

Before running the test program, open DbgView to capture its debug output. Then run the test program.

[Screenshot: DebugView output]

The log shows that the test thread we started has thread ID 0x1400.

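Once the program hung, I attached WinDbg to it. After attaching, I ran ~*kvn to list every thread together with its call stack. Only one application thread remains, thread 0; thread 1 was created by WinDbg when it attached to the process.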

0:001> ~*kvn

   0  Id: 18c0.1008 Suspend: 1 Teb: 7ffdf000 Unfrozen
 # ChildEBP RetAddr Args to Child 
00 002bf614 775a6a64 77592278 00000064 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
01 002bf618 77592278 00000064 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
02 002bf67c 7759215c 00000000 00000000 00000001 ntdll!RtlpWaitOnCriticalSection+0x13e (FPO: [Non-Fpo])
03 002bf6a4 775c00e1 77637340 77bf1b77 00000000 ntdll!RtlEnterCriticalSection+0x150 (FPO: [Non-Fpo])
04 002bf6dc 75587bc3 00000001 00000000 002bf704 ntdll!LdrLockLoaderLock+0xe4 (FPO: [Non-Fpo])
05 002bf728 7679215d 00000000 002bf73c 00000104 KERNELBASE!GetModuleFileNameW+0x75 (FPO: [Non-Fpo])
06 002bf948 76792112 002bfbb0 002bf968 7ffdb000 SHELL32!InRunDllProcess+0x39 (FPO: [Non-Fpo])
*** WARNING: Unable to verify checksum for C:\Users\BianChengNan\Documents\Visual Studio 2012\Projects\testTerminateThread\Debug\testTerminateThread.exe
07 002bf95c 013714db 002bfa44 002bfcbc 002bfbc0 SHELL32!ShellExecuteExW+0x51 (FPO: [Non-Fpo])
08 002bfbb0 01371685 000ac518 00000000 00000000 testTerminateThread!RunProcess+0xdb (FPO: [Non-Fpo]) (CONV: cdecl) [c:\users\bianchengnan\documents\visual studio 2012\projects\testterminatethread\testterminatethread\testterminatethread.cpp @ 28]
09 002bfcbc 01371c69 00000001 000ac510 000ae660 testTerminateThread!wmain+0xc5 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\users\bianchengnan\documents\visual studio 2012\projects\testterminatethread\testterminatethread\testterminatethread.cpp @ 59]
0a 002bfd0c 01371e5d 002bfd20 758ced6c 7ffdb000 testTerminateThread!__tmainCRTStartup+0x199 (FPO: [Non-Fpo]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 533]
0b 002bfd14 758ced6c 7ffdb000 002bfd60 775c37eb testTerminateThread!wmainCRTStartup+0xd (FPO: [Non-Fpo]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 377]
0c 002bfd20 775c37eb 7ffdb000 77bf10cb 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
0d 002bfd60 775c37be 01371082 7ffdb000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
0e 002bfd78 00000000 01371082 7ffdb000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

# 1 Id: 18c0.193c Suspend: 1 Teb: 7ffde000 Unfrozen
 # ChildEBP RetAddr Args to Child 
00 0133fbac 775ff20f 76a71677 00000000 00000000 ntdll!DbgBreakPoint (FPO: [0,0,0])
01 0133fbdc 758ced6c 00000000 0133fc28 775c37eb ntdll!DbgUiRemoteBreakin+0x3c (FPO: [Non-Fpo])
02 0133fbe8 775c37eb 00000000 76a71183 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
03 0133fc28 775c37be 775ff1d3 00000000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
04 0133fc40 00000000 775ff1d3 00000000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

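The call stack shows the program stuck in ShellExecuteExW. ShellExecuteExW calls SHELL32!InRunDllProcess, which calls GetModuleFileNameW. That call goes through ntdll!LdrLockLoaderLock and ends up waiting in RtlEnterCriticalSection on the critical section at 0x77637340.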

Run !cs -l to list the locked critical sections:

0:001> !cs -l
-----------------------------------------
DebugInfo          = 0x77637540
Critical section   = 0x77637340 (ntdll!LdrpLoaderLock+0x0)
LOCKED
LockCount          = 0x1
WaiterWoken        = No
OwningThread       = 0x00001400
RecursionCount     = 0x1
LockSemaphore      = 0x64
SpinCount          = 0x00000000

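Note that OwningThread is 0x00001400. That is exactly our test thread: it matches the thread ID we saw in DbgView, and it no longer appears in the ~*kvn thread list. The thread had already been killed, but before it died it acquired the process loader lock 0x77637340 (ntdll!LdrpLoaderLock+0x0).

It took the lock when it returned from testProc and the loader called DllMain with DLL_THREAD_DETACH. TerminateThread then killed it during the 5-second Sleep, and a terminated thread never releases the critical sections it owns. From that point on, every attempt to take the loader lock waits forever, including the GetModuleFileNameW call inside ShellExecuteExW.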
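The key step above is noticing that the lock owner 0x1400 is missing from the threads listed by ~*kvn, where only 0x1008 and 0x193c are alive. As a rough illustration, here is a hypothetical C++ helper that performs the same cross-check on pasted debugger text. The names LiveThreadIds, OwningThreadIds and OrphanedOwners are made up for this sketch and are not WinDbg features. The helper assumes the output formats shown above and would likely need adjusting for other WinDbg versions.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper. Collects thread IDs from "~*" output lines such as
// "   0  Id: 18c0.1008 Suspend: 1 ...". The hex value after the dot is the
// thread ID; the part before it is the process ID.
std::set<unsigned long> LiveThreadIds(const std::string& threadsOutput) {
    std::set<unsigned long> ids;
    const std::regex re(R"(Id:\s*[0-9a-fA-F]+\.([0-9a-fA-F]+))");
    for (std::sregex_iterator it(threadsOutput.begin(), threadsOutput.end(), re), end;
         it != end; ++it) {
        ids.insert(std::stoul((*it)[1].str(), nullptr, 16));
    }
    return ids;
}

// Hypothetical helper. Collects owners of locked critical sections from
// "!cs -l" output lines such as "OwningThread       = 0x00001400".
std::vector<unsigned long> OwningThreadIds(const std::string& csOutput) {
    std::vector<unsigned long> owners;
    const std::regex re(R"(OwningThread\s*=\s*0x([0-9a-fA-F]+))");
    for (std::sregex_iterator it(csOutput.begin(), csOutput.end(), re), end;
         it != end; ++it) {
        owners.push_back(std::stoul((*it)[1].str(), nullptr, 16));
    }
    return owners;
}

// A locked critical section whose owner is not in the live thread list is
// orphaned: its owner died (for example via TerminateThread) without leaving it.
std::vector<unsigned long> OrphanedOwners(const std::string& threadsOutput,
                                          const std::string& csOutput) {
    const std::set<unsigned long> live = LiveThreadIds(threadsOutput);
    std::vector<unsigned long> orphaned;
    for (unsigned long owner : OwningThreadIds(csOutput)) {
        if (live.count(owner) == 0) {
            orphaned.push_back(owner);
        }
    }
    return orphaned;
}
```

Feeding it the ~*kvn and !cs -l output from this article should report 0x1400 as the single orphaned owner. An owner that does appear in the thread list points to an ordinary lock-ordering deadlock instead, which calls for a different line of investigation.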

That explains the deadlock.

Summary

  • Don't casually use TerminateThread to kill threads!

  • WinDbg is an indispensable debugging tool on Windows.

  • !cs -l quickly locates the locked critical section behind a deadlock.
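The usual alternative to TerminateThread is to ask the thread to stop and then wait for it to return on its own. Below is a minimal, portable C++ sketch of that idea using std::thread and std::atomic<bool>. StoppableWorker is a hypothetical name, not something from the original project. In Win32 code the same pattern is commonly written with an event (CreateEvent/SetEvent) that the worker checks, followed by WaitForSingleObject on the thread handle.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker. Instead of being killed from outside, it polls a stop
// flag and returns normally, so any locks it holds are released and the
// regular thread-exit path runs to completion.
class StoppableWorker {
public:
    // Starts the worker; does nothing if it is already running.
    void Start() {
        if (thread_.joinable()) return;
        stop_.store(false);
        finishedCleanly_.store(false);
        thread_ = std::thread([this] {
            while (!stop_.load()) {
                iterations_.fetch_add(1);  // stand-in for real work
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            // Reached only by leaving the loop normally, never by being killed.
            finishedCleanly_.store(true);
        });
    }

    // Asks the worker to stop, then waits for it to exit by itself.
    void Stop() {
        stop_.store(true);
        if (thread_.joinable()) thread_.join();
    }

    ~StoppableWorker() { Stop(); }

    bool FinishedCleanly() const { return finishedCleanly_.load(); }
    long Iterations() const { return iterations_.load(); }

private:
    std::atomic<bool> stop_{false};
    std::atomic<bool> finishedCleanly_{false};
    std::atomic<long> iterations_{0};
    std::thread thread_;
};
```

Usage is `StoppableWorker w; w.Start(); ... w.Stop();`. The trade-off is that the worker must check the flag often enough for Stop() to return promptly. In exchange, the thread is never cut off in the middle of holding a lock such as the loader lock.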
