For better maintainence, you will find updates at http://www.codeplex.com/easyhook. This article belongs to a project that also supports kernel mode hooking.
Please
note that this article is designed to be read parallel with the source
code. Also I highly recommend to read the first uninformed article
(link follows), otherwise you won't understand much of the
work covered here. This is no tutorial and beginners should not read
it. There are so much
things to know about PatchGuard that it is impossible to cover them all
in one single article. Also I don't want to repeat other guys work!
You will find a PDF version of this documentation here.
Table of Content
1 Introduction to PatchGuard
2 Disarming PatchGuard 3
3 Using and compiling the drivers
4 Ideas for PatchGuard 4
5 Windows timer internals
If
you want awesome technical articles about how PatchGuard is working, I
can only recommend the following (mine is about how to disable it and
not that much about how PatchGuard is working):
• PatchGuard 1: http://www.uninformed.org/?v=3&a=3&t=pdf (recommended)
• PatchGuard 2: http://www.uninformed.org/?v=6&a=1&t=pdf
• PatchGuard 3: http://www.uninformed.org/?v=8&a=5&t=pdf
There
is another thing to mention. You can of course disable PatchGuard in a
DOCUMENTED, STABLE and EASY manner, by running the following commands
in a root-shell and restarting the PC afterwards:
Bcdedit /debug ON
Bcdedit /dbgsettings SERIAL DEBUGPORT:1 BAUDRATE:115200 /start AUTOENABLE /noumex
“noumex” will disable user mode exceptions for kernel debuggers which
in fact would Visual Studio prevent from working. “AUTOENABLE” will
force PatchGuard to be disabled, because even if you don’t attach a
kernel debugger, you could do it at any time, and that is enough. Don’t
use this setting to write kernel patching software for end-users. The
DEBUG switches will have many side-effects. Mainly with Visual Studio
and probably other debuggers, but also with DRM content playback, I
suppose. I also noticed an annoying system slowdown and a huge overall
latency. The boot time will be increased too, probably because Windows
is waiting for a debugger…
Why is PatchGuard disabled with these
settings? Simply, because to set breakpoints, you will have to
overwrite kernel code, for example, with INT3 and that would already be
enough for PatchGuard to BSOD. Another aspect is that a debugger is a
common way to explore PatchGuard ;-).
This
driver is NOT intended to be used in any end-user scenarios. It has
been tested on Windows Vista x64 (all updates, 01.08.2008) and on
Windows Vista x64 SP1 (all updates, 01.08.2008). It is known to not
work on an out-dated Windows, so make sure that all PatchGuard
related updates (better all updates), released before the above dates,
are installed. Any future Windows Update may shot this approach to
hell, and would BSOD the systems of your customers in the worst case!
This is not limited to my approach… There is no way to bypass
PatchGuard on end-user PCs, but only on your own, where you have
control about updates and may hide all future PatchGuard related ones,
for example! The Symantec Showcase, how I like to call it, has proven
that you only can rely on documented things, especially when dealing
with the kernel. Like Microsoft said to Symantec, that they will
release an update, BSODing related PCs, if Symantec puts PatchGuard
bypassing code into their products, I am saying to you: Don’t ever
release a PatchGuard bypassing or 64-Bit undocumented kernel hooking
product to customers! An option may be the PatchGuard API, but I think
it is not publicly available. BTW I disagree to Symantec’s opinion that
they need to use undocumented kernel patching to develop security
products. The only thing is that they already had such products and it
would be very expensive for them to reinvent the wheel, by not using
kernel patching. Microsoft just did the right thing by not listening to
Symantec. Undocumented kernel patching can never be an option to raise
security in a trusted environment like Vista 64-Bit. It will probably
even lower security AND stability. It is Microsoft’s duty to protect
the kernel against malware and this will already work very well with
UAC, enforced driver signing (it really was a burden to get my test
driver installed the first time, so I think the kernel security
performs quite well) and PatchGuard. The story that “malware is able to
bypass PatchGuard” is something strange. I never saw a signed virus.
And even if so, there is no way to do something against it if malware
already is in the kernel. In this case you just lost the fight. Malware
instead should NEVER get into the kernel; this is what security
software should care about and not some kind of unstable, undocumented
and also unsafe or even useless Post-Mortem-Security.
The following data and structures are protected by PatchGuard:
• Modifying system service tables, for example, by hooking KeServiceDescriptorTable
• Modifying the interrupt descriptor table (IDT)
• Modifying the global descriptor table (GDT)
• Using kernel stacks that are not allocated by the kernel
• Patching any part of the kernel (detected only on AMD64-based systems)
PG
uses a system check routine that validates all the above structures and
data. Basically PatchGuard is about how to protect this system check
routine from being cracked. Cracking means to violate at least one of
the above statements without knowledge of PatchGuard. This article
describes how to violate ALL of them by completely disabling PatchGuard!
In
the following I will refer to so called “code paths”. Those are meant
to invoke the system check routine. The simplest way is to directly
call it, like you would call any other API. But PatchGuard is about
making it as hard as possible to
• Find out how the system check routine is invoked
• Interfere with this invocation (by preventing/redirecting it)
• Disassemble the system check routine or find out its entry point address
To
accomplish this task, PatchGuard uses various tricks and uncommon
processor behavior to obfuscate as much as possible. The system check
routine itself will be executed every few minutes.
The following is a diagram for the code logic of PatchGuard:
The
orange arrow applies to PatchGuard 3 only. As you can see, PatchGuard
uses the general purpose mechanism for delayed code execution (timers).
There are ten PatchGuard related DPC routines:
CmpEnableLazyFlushDpcRoutine
CmpLazyFlushDpcRoutine
ExpTimeRefreshDpcRoutine
ExpTimeZoneDpcRoutine
ExpCenturyDpcRoutine
ExpTimerDpcRoutine
IopTimerDispatch
IopIrpStackProfilerTimer
KiScanReadyQueues
PopThermalZoneDpc
But
keep in mind that those are also executing important system code if
PatchGuard is not invoked. This switch is made by passing an invalid
pointer as
DeferredContext
. One may say that this is
buggy, because dereferencing an invalid pointer at dispatch level
throws a non-catchable trap resulting in a bug check. But PatchGuard is
using a so called non-canonical pointer. Such a pointer does not follow
the x64 processor specification that requires a pointer to have to
upper 16 bits either set to one or zero (that is 0x0000 or 0xFFFF). A
non-canonical pointer starts with 0x6238, 0xF10A and so on.
Dereferencing them will instead cause a general protection fault, which
is catchable and in case of PatchGuard, executes the system check
routine.
PatchGuard 3 further has the ability to resume
execution in the DPC routine after a #GP, probably raising another one,
which again could decide to invoke the system check routine or continue
execution. PatchGuard 3 also may execute the system check routine
without raising an exception at all. What it does for a particular DPC
depends on the encrypted
DeferredContext
and probably also on the DPC itself. So we just don’t know!
The
following is a code flow diagram of PatchGuard. The green boxes apply
to all PatchGuard versions whereas the yellow box only applies since
PatchGuard 2 and the orange boxes to PatchGuard 3. Please keep in mind
that most of the stuff is just “guessing”. I am optimistic that the
following diagram will be a good approximation:
As
you can see, PG3 has become much more flexible and thus much harder to
bypass. My driver will attack the red arrows:
“DPC Dispatcher” to “PatchGuard DPC”The driver filters all DPCs with PatchGuard specific parameters as DeferredContext
.
Further it catches the non-SEH code path of PatchGuard which
previously has been overwritten with unhandled breakpoints, finding
their way back to the exception handler in my driver! Of course there
are still some DPCs left and this is why we need to add further
blocking.
“Dynamic stub” to “System check” and “PatchGuard DPC to System check”By
overwriting the dynamic stubs and parts of the PatchGuard DPCs with
breakpoints, execution continues in the DPC interceptor’s
exception handler instead of the system check routine.
“ExpWorkerThread” to “Dynamic stub” and “Unknown”
Another hook is applied to the worker thread queue. This is filtering all dynamic worker routines and wrapping all others in a try-except statement. Some special kinds of system check invocations which seem to be incompatible with the next blocking mechanism are always originated from the worker queue. But as I said, we filter them, so their incompatibility won’t cause any harm...
“System check” to “KeBugCheckEx”
Only PatchGuard methods raised over ExQueueWorkItem
get here. It is a burden to reproduce this case because you have to restart 10 – 30 times…
This block just suspends all CRITICAL_STRUCTURE_CORRUPTION
s. So they never cause the system to BSOD.
The problem was that the driver caught the bug check attempt but failed to suspend the calling worker thread. Then I also hooked ExpWorkerThread
and since that step, only suspend able calls to KeBugCheckEx
are made.
By patching some of the code paths, I could extract interesting call stacks on their way to the system check routine.
The
following is one of the longest possible call stacks in case of
PatchGuard. It raises two handled #GPs and finally invokes the system
check routine directly from the DPC without exception handling:
nt!KeBugCheckEx
nt! ?? ::FNODOBFM::`string'+0x12767
nt!KiExceptionDispatch+0xae
nt!KiBreakpointTrap+0xb7
nt!ExpTimeRefreshDpcRoutine+0x1e6
nt!_C_specific_handler+0x8c
nt!RtlpExecuteHandlerForException+0xd
nt!RtlDispatchException+0x228
nt!KiDispatchException+0xc2
nt!KiExceptionDispatch+0xae
nt!KiGeneralProtectionFault+0xcd
nt!ExpTimeRefreshDpcRoutine+0xf1
nt!_C_specific_handler+0x140
nt!RtlpExecuteHandlerForUnwind+0xd
nt!RtlUnwindEx+0x233
nt!_C_specific_handler+0xcc
nt!RtlpExecuteHandlerForException+0xd
nt!RtlDispatchException+0x228
nt!KiDispatchException+0xc2
nt!KiExceptionDispatch+0xae
nt!KiGeneralProtectionFault+0xcd
nt!KiCustomRecurseRoutine0+0xd
nt!KiCustomRecurseRoutine9+0xd
nt!KiCustomRecurseRoutine8+0xd
nt!KiCustomRecurseRoutine7+0xd
nt!KiCustomAccessRoutine7+0x22
nt!ExpTimeRefreshDpcRoutine+0x54 <== PatchGuard DPC
nt!KiRetireDpcList+0x155
nt!KiIdleLoop+0x5f
nt!KiSystemStartup+0x1d4
The interesting thing about this one is that the #GPs were
handled by PatchGuard, but didn’t invoke the system check routine. So
we have proven that an access violation is not necessarily invoking the
SCR since PatchGuard 3.
The next one in contrast is one of the
shortest possible call stacks. It invokes the system check routine over
a resident method (copied into non-paged pool). You probably see
where this article will go to, because my driver is also listed in the
call stack:
nt!KeBugCheckEx
nt! ?? ::FNODOBFM::`string'+0x12767
nt!KiExceptionDispatch+0xae
nt!KiBreakpointTrap+0xb7
0xfffffa80010b3cc9 <== PatchGuard’s dynamic method
0xfffffa80010bbc00 <== PatchGuard’s optional intro
PG3Disable!VistaAll_DpcInterceptor+0x34
nt!KiRetireDpcList+0x117
nt!KiIdleLoop+0x62
…
Someone who already tried to disable PatchGuard may wonder how I got these call stacks. The
KiBreakPointTrap
should
be self explaining. I patched the code paths with unhandled
breakpoints. So instead of invoking the system check routine,
KeBugCheckEx
will create a memory dump and we can work with it in a post-mortem session using WinDbg.
One way to filter PatchGuard DPCs was to detect its special
DeferredContext
values
and cancel related timers. Since PatchGuard 3, this alone is not
sufficient enough. But it is still a good starting point to filter out
most of PatchGuard’s DPCs before they actually raise the system check
routine.
The following code snipped is able to tell whether a given pointer is a PatchGuard context or not:
BOOLEAN CheckSubValue(ULONGLONG InValue)
{
ULONG i;
ULONG Result;
UCHAR* Chars = (UCHAR*)&InValue;
Result = 0;
for(i = 0; i < 8; i++)
{
Result += ((Chars[i] & 0xF0) >> 4) + (Chars[i] & 0x0F);
}
if(Result < 70)
return TRUE;
return FALSE;
}
BOOLEAN PgIsPatchGuardContext(void* Ptr)
{
ULONGLONG Value = (ULONGLONG)Ptr;
UCHAR* Chars = (UCHAR*)&Value;
LONG i;
if((Value & 0xFFFF000000000000) == 0xFFFF000000000000)
return FALSE;
if((Value & 0xFFFF000000000000) == 0)
return FALSE;
if(CheckSubValue(Value) || CheckSubValue(~Value))
return FALSE;
if(Ptr == NULL)
return FALSE;
for(i = 7; i >= 0; i -= 2)
{
if(Chars[i] != 0)
return TRUE;
}
return FALSE;
}
The
problem is that our task here is NOT to filter non-canonical pointers.
Our task is to distinguish between PatchGuard context parameters and
any other common
DeferredContext
value such as zero, some
table indices like “1, 2, 3, 4, 5, …”, Unicode sequences, etc. All in
all the above code detects pseudo random values.
With this
ability in mind, we can now look at how to patch the DPC dispatcher,
which is the only way to apply these custom checks.
In order to disable PatchGuard 3, we will have to block all DPCs with a PatchGuard specific context and to catch the exceptions raised by unhandled breakpoints. But there still seem to be code paths left, running in a worker queue, executing the system check routine and finally raising the bug check. For this purpose we will also hook
KeBugCheckEx
and suspend all threads raising a
CRITICAL_STRUCTURE_CORRUPTION
. Please note that the latter is a very rare case and won’t impact system performance. But this is still not enough. Some of the
KeBugCheckEx
calls are not suspend able. I solved this by also hooking
ExpWorkerThread
, filtering out dynamic stubs and wrapping all others in a try-except statement.
It is time to mention that
KiRetireDpcList
and
KiTimerExpiration
are the only points in the kernel which are responsible for dispatching queued DPCs. If you look at the disassembly for
KiTimerExpiration
and
KiRetireDpcList
, which is quite too long for showing, you will find the following four indirect call code blocks:
nt!KiTimerExpiration+0x888:
488b5308 mov rdx, qword ptr [rbx+8]
488b4bf8 mov rcx, qword ptr [rbx-8]
4d8bcc mov r9, r12
4c8bc7 mov r8, rdi
ff13 call qword ptr [rbx]
4084f6 test sil, sil
742c je nt!KiTimerExpiration+0x8d3
nt!KiTimerExpiration+0x679:
488b5308 mov rdx,qword ptr [rbx+8]
488b4bf8 mov rcx,qword ptr [rbx-8]
4d8bcc mov r9,r12
4d8bc5 mov r8,r13
ff13 call qword ptr [rbx]
4084ed test bpl, bpl
742c je nt!KiTimerExpiration+0x7e5
nt!KiTimerExpiration+0x799:
488b5308 mov rdx,qword ptr [rbx+8]
488b4bf8 mov rcx,qword ptr [rbx-8]
4d8bcc mov r9,r12
4d8bc5 mov r8,r13
ff13 call qword ptr [rbx]
4084ed test bpl, bpl
742c je nt!KiTimerExpiration+0x7e5
nt!KiRetireDpcList+0x145:
4d8bcc mov r9, r12
4c8bc5 mov r8, rbp
488bd6 mov rdx, rsi
488bcf mov rcx, rdi
ff542470 call qword ptr [rsp+70h]
4584ff test r15b, r15b
742b je nt!KiRetireDpcList+0x185
What we can see is that they are very similar and in all
tested kernel images, these four code blocks were unique. So what are
those blocks actually doing? They will invoke the user defined DPC
routine…
But how does the thing look like under Windows Vista SP1? The following are the only two points in the SP1 kernel, executing user DPCs:
Nt!KiTimerListExpire+0x31a:
458b4e04 mov r9d,dword ptr [r14+4]
458b06 mov r8d,dword ptr [r14]
4189ac24a0370000 mov dword ptr [r12+37A0h], ebp
488b5308 mov rdx, qword ptr [rbx+8]
488b4bf8 mov rcx, qword ptr [rbx-8]
ff13 call qword ptr [rbx]
4084ff test dil, dil
0f856c8ffdff jne nt! ?? ::FNODOBFM::`string'+0x39742
nt!KiRetireDpcList+0x107:
4d8bce mov r9,r14
4d8bc5 mov r8,r13
498bd4 mov rdx,r12
488bcb mov rcx, rbx
ff542470 call qword ptr [rsp+70h]
4584ff test r15b,r15b
0f856e7ffdff jne nt! ?? ::FNODOBFM::`string'+0x39888
One
will observe, that even if the whole underlying code logic has changed
(only two DPC invocations, another method for expiring timers, far
jumps at the end, etc.), it does still look very similar. Also the
chance that a usual Windows Update changes the machine code for
this part is quite small. Before developing any flexible mechanism we
just search for the plain bytes. This works and is much more
stable. For every update that changes the bytes we can simply add
additional search vectors, currently I only know those two…
From
now on I focus on Service Pack 1, because it only has two methods
to patch and is more up to date. The code without Service Pack is quite
similar and you should have no problems to understand it, once you got
it for SP1.
How we can patch
KiTimerExpiration
and
KiRetireDpcList
if
we nowhere can insert a JMP instruction without messing up the whole
code logic? We will overwrite the last MOV instruction too and emulate
it in a proper jumper table before we continue execution in our
interception method:
Nt!KiTimerListExpire+0x31a:
458b4e04 mov r9d,dword ptr [r14+4]
458b06 mov r8d,dword ptr [r14]
4189ac24a0370000 mov dword ptr [r12+37A0h], ebp
488b5308 mov rdx, qword ptr [rbx+8]
90 nop
E8XXXXXXXX call TIMER_FIX
4084ff test dil, dil
0f856c8ffdff jne nt! ?? ::FNODOBFM::`string'+0x39742
nt!KiRetireDpcList+0x107:
4d8bce mov r9,r14
4d8bc5 mov r8,r13
498bd4 mov rdx,r12
90 nop
90 nop
E8XXXXXXXX call DPC_FIX
4584ff test r15b,r15b
0f856e7ffdff jne nt! ?? ::FNODOBFM::`string'+0x39888
Remember that this operation can and has to be performed atomically, by using a 64-Bit wide MOV instruction!
Some
of you may probably ask their selves, where the hell do we wanna jump
to with a 32-Bit offset and the custome image section (where our driver is located) being
hundred gigabytes away?! I also thought much about it and there are only two
solutions. The first one is a sledgehammer approach and I don’t like
such. This is by stopping all CPUs except one, raising IRQL to
HIGH_LEVEL
and putting an absolute call instruction in the two code blocks above.
This is the only way one could replace more than eight bytes as an
atomic operation. The other way is to hijack one of those ten very same
KiCustomAccessRoutines
, PatchGuard is using to obfuscate
the call stack. So what we do here is placing a jumper as first
instruction of such a routine and redirecting it to another one. Now we
can work with the other bytes, and there are quite enough, to build our
jump table. Of course I know that even the atomic MOV instruction is
not 100% safe. But the chance that a thread will be between the MOV and
CALL, which our patch overwrites, and all this within such a short
timeframe, is quite small. The NOPs have to be placed in front of the
CALL because if the changes are rolled back when the driver is
unloading, any code that currently is within the CALL instruction (this
chance is in contrast pretty high) will still return to valid code, the
TEST instruction, because the return address stays the same.
The
jump table has to do two things. Firstly it should recover the
overwritten MOV instruction and then invoke our interception method
using an absolute far jumper. In case of SP1 we have to take care of
two different cases and the code we place in such a custom access
routine may look like this:
nt!KiCustomAccessRoutine4:
E9XXXXXXXX jmp KiCustomAccessRoutine0
TIMER_FIX:
488b4bf8 mov rcx,qword ptr [rbx-8]
eb03 jmp INTERCEPTOR
DPC_FIX:
488bcb mov rcx,rbx
INTERCEPTOR:
48b8XXXXXXXXXXXXXXXX mov rax, VistaAll_DpcInterceptor
ffe0 jmp rax
Well, that’s the entire mystic about patching the DPC
invocation code. Now ANY DPC is going through our interceptor. You
probably can imagine that there are a lot, because most interrupts will
also raise a DPC; thus, the interceptor has to be executed as fast as
possible.
Now we should take a look at the DPC interceptor:
void VistaAll_DpcInterceptor(
PKDPC InDpc,
PVOID InDeferredContext,
PVOID InSystemArgument1,
PVOID InSystemArgument2)
{
ULONGLONG Routine = (ULONGLONG)InDpc->DeferredRoutine;
__try
{
if((Routine >= 0xFFFFFA8000000000) &&
(Routine <= 0xFFFFFAA000000000))
{
}
else if(KeContainsSymbol((void*)Routine))
{
if(!PgIsPatchGuardContext(InDeferredContext))
InDpc->DeferredRoutine(
InDpc,
InDeferredContext,
InSystemArgument1,
InSystemArgument2);
}
else
InDpc->DeferredRoutine(
InDpc,
InDeferredContext,
InSystemArgument1,
InSystemArgument2);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
}
}
The
first thing we check is whether the DPC routine in either in the
confines of the kernel image, where all of the ten DPCs reside, or in
the memory pool, where the dynamic invocation stubs reside. This
way we can reduce the chance that we are cancelling non-PatchGuard
DPCs, because the kernel is very unlikely to have pseudo random numbers
as
DeferredContext
. Also I don’t know any sane driver that needs dynamic DPCs. After this check we can just skip all obvious PatchGuard DPCs and
DeferredRoutines
residing in dynamic memory.
You
may wonder about the exception frame. I mentioned earlier, that
PatchGuard 3 introduces a code path that is not using exceptions and
also only canonical
DeferredContexts
. This way it would pass our filter and get to
KeBugCheckEx
.
The problem now is that PatchGuard may decide to directly invoke it
(not using a worker thread), thus our hook would run at DPC level and
cause a bug check (because a wait attempt at DPC level is not
permitted). I solved this by overwriting some fingerprints with
breakpoints, which actually converts such non-SEH code paths into
SEH ones, because unhandled breakpoints throw a catchable
exception! The following is the prototype of all memory resident
dynamic methods:
nt!KiTimerDispatch:
6690 xchg ax,ax
9c pushfq
4883ec20 sub rsp,20h
8b442420 mov eax,dword ptr [rsp+20h]
4533c9 xor r9d,r9d
4533c0 xor r8d,r8d
4889442430 mov qword ptr [rsp+30h],rax
488b4140 mov rax,qword ptr [rcx+40h]
48b90000000000f8ffff mov mov rcx,0FFFFF80000000000h
4833c2 xor rax,rdx
480bc1 or rax,rcx
48b9f048311148315108 mov rcx,8513148113148F0h <<<<<<
488b10 mov rdx,qword ptr [rax]
c700f0483111 mov dword ptr [rax],113148F0h
4833d1 xor rdx,rcx
488bc8 mov rcx,rax
ffd0 call rax
4883c420 add rsp,20h
59 pop rcx
c3 ret
The driver will search for the bold printed byte sequence and overwrite it with breakpoints; that’s all!
Please
note that the exception frame doesn’t bring any advantage for the SEH
code path. I tried to catch it without the DPC filtering but the Driver
Verifier does not seem to like it. Fortunately, the SEH code path does
ALWAYS use non-canonical
DeferredContext
values and that’s why we can filter it entirely.
Patching the DPC code alone is not enough. PatchGuard 3 also uses worker items to accomplish the same task. Our worker thread interceptor looks something similar, but is not the same:
VOID VistaAll_ExpWorkerThreadInterceptor(
PWORKER_THREAD_ROUTINE InRoutine,
VOID* InContext,
VOID* InRSP)
{
ULONGLONG Val = (ULONGLONG)InRoutine;
if((Val >= 0xfffffa8000000000) && (Val <= 0xfffffaa000000000))
return;
__try
{
InRoutine(InContext);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
}
}
What we do here is to filter out all routines residing in dynamic memory, just like we did it in the DPC hook. Additionally we wrap all other routines in a try-except statement. The context of a PatchGuard worker thread is canonical, so we have no chance to filter it out explicitly.
What I experienced so far is that PatchGuard is using dynamic methods also for the worker queue. After blocking them like above, there were still some bug checks left. They were originated from ExpWorkerThread
, but never gone through our handler! This is something strange because the disassembly shows that we only have on single point where the worker items are actually called. I only can imagine that PatchGuard is again using exceptions to redirect execution to a special exception handler, just like it did for the DPC handler. And this is why I also wrap the calls in a try-except statement!
The procedure of patching the worker queue is very similar to the DPC queue. We have the following bytes to patch:
nt!ExpWorkerThread+0x11a:
5c pop rsp
2470 and al,70h
4c8b6318 mov r12,qword ptr [rbx+18h]
488b7b10 mov rdi,qword ptr [rbx+10h]
>498bcc mov rcx,r12
>ffd7 call rdi
4c8d9ee0020000 lea r11,[rsi+2E0h]
4d391b cmp qword ptr [r11],r11
And redirect them to the very same jump table. The difference is that we don’t restore the registers in the jump table but in a prepared jump target in our driver:
VistaSp0_ExpWorkerThread_Fix PROC
mov rcx, rdi
mov rdx, r12
mov r8, rsp
jmp VistaAll_ExpWorkerThreadInterceptor
VistaSp0_ExpWorkerThread_Fix ENDP
This will call the above interceptor and we are done!
With
the work done so far, we are able to prevent, let’s say, 95% of all
system check routine invocations. The other 5% will pass our DPC filter
and are still causing a BSOD. I said earlier that we are going to hook
KeBugCheckEx
to
solve this issue. Well it is not that easy, because PatchGuard actually
overwrites the method with a fresh copy, before invoking it. So we need
to hook a subroutine. If we look at the first instructions
nt!KeBugCheckEx:
48894c2408 mov qword ptr [rsp+8],rcx
4889542410 mov qword ptr [rsp+10h],rdx
4c89442418 mov qword ptr [rsp+18h],r8
4c894c2420 mov qword ptr [rsp+20h],r9
9c pushfq
4883ec30 sub rsp,30h
fa cli
65488b0c2520000000 mov rcx,qword ptr gs:[20h]
4881c120010000 add rcx,120h
e8c1050000 call nt!RtlCaptureContext
, we see that
RtlCaptureContext
seems to be perfect for our task. This is because
KeBugCheckEx
is calling it very early and thus the system is still in a sane state. In order to hook
RtlCaptureContext
, we need to place a jumper as usual:
Nt!RtlCaptureContext:
50 push rax
48b9Xxx mov rax, PG3Disable! RtlCaptureContext_Hook
ffe0 jmp rax
The
difference is that we are hooking a context capturing method and thus
have to ensure that this context in particular is NOT changed by our
hook. This is why we have to backup volatile registers. The jumper will
continue execution in a native interception stub which was built only
for the purpose of hooking
RtlCaptureContext
:
RtlCaptureContext_Hook PROC
; call high level handler without messing up the context structure...
pushfq
push rcx
push rdx
push r8
push r9
push r10
push r11
mov rcx, qword ptr[rsp + 136]
mov rdx, qword ptr[rsp + 8 * 8]
sub rsp, 32
call KeBugCheck_Hook
mov qword ptr [rsp], rax
add rsp, 32
pop r11
pop r10
pop r9
pop r8
pop rdx
pop rcx
popfq
pop rax
; recover destroyed bytes of RtlCaptureContext
pushfq
mov word ptr [rcx+38h],cs
mov word ptr [rcx+3Ah],ds
mov word ptr [rcx+3Ch],es
mov word ptr [rcx+42h],ss
; jump behind destroyed bytes... (return value of KeBugCheck_Hook)
jmp qword ptr[rsp - 32 - 8 * 7 + 8]
RtlCaptureContext_Hook ENDP
Firstly, it safely calls our
KeBugCheck_Hook
method with the bug check code as first parameter and the caller of
RtlCaptureContext
as second one. Secondly, it recovers the overwritten instructions of
RtlCaptureContext
and finally continues execution behind the jumper.
If you now look at the
KeBugCheck_Hook
ULONGLONG KeBugCheck_Hook(ULONGLONG InBugCode, ULONGLONG InCaller)
{
FAST_MUTEX WaitAlways;
if((InCaller >= KeBugCheckEx_Sym) &&
(InCaller <= KeBugCheckEx_Sym + 100))
{
if(InBugCode == CRITICAL_STRUCTURE_CORRUPTION)
{
EnableInterrupts();
ExInitializeFastMutex(&WaitAlways);
ExAcquireFastMutex(&WaitAlways);
ExAcquireFastMutex(&WaitAlways);
}
}
return RtlCaptureContext_Sym + 14;
}
, you may observe that it just checks whether the caller is
KeBugCheckEx
and the bug check code is
CRITICAL_STRUCTURE_CORRUPTION
.
If that is the case, it reenables interrupts, and block the thread
forever. We are only able to do this, because we eliminated the chance,
that this bug check was raised though the DPC dispatcher and it
definitely would be, if we skip our DPC filter! If the caller was any
other symbol, we just execute
RtlCaptureContext
(). I
assume that every thread switch will call this method, so this is the
next critical execution path of Windows which we are hooking…
I am
sure someone will ask whether this doesn’t lead to an endless thread
creation loop with a period of some mintues. Again, we can only do
this, because we disabled the DPC code path already. As the invokation
method is randomly choosen, we cause one or two orphaned threads
until PatchGuard is never executed again (because it does not always
use
ExQueueWorkItem
and if we block all other code paths, there is a point when no PatchGuard contexts are left).
OPERATION SUCCEEDED, PATIENT DEADSo
what we have done so far? We disabled PatchGuard 3 on Windows Vista
SP1, all updates installed. Of course the patches we applied were not
that common coding style ;-). But everyone will agree that potential
malware is written like that and actually the patches are very stable
for a given OS. You may rollback all changes after approx. 20 minutes.
The reason is that PatchGuard will only add a new code path, if the
system check routine is invoked. So when we block its execution for a
reasonable period of time, there is nothing to block anymore… My driver
does not support rolling back the changes so far and in fact never
will. This is also why you can’t unload it after patching.
If
you never developed a driver for 64-Bit, you have several obstacles to
solve. Therefore I prepared a clean Windows installation. Now I will
describe how you can build my drivers and install them.
As first step you should download the latest Windows Driver Kit from
https://connect.microsoft.com/ and install it to
“C:\WinDDK”
for example. You have to register and sign in to Microsoft Passport I believe but after all it is still free.
Then you have to add a system wide environment variable called
“WDKROOT”
with the value of your installation path.
From now on I assume that you extracted the DisablePatchGuard archive to
“C:\PGDisable”
; so that the project solution is located in
“C:\PGDisable\PGDisable.sln”
.
To install a driver on Windows Vista, you have to sign it. Now open a root-shell and type the following commands:
> Bcdedit –set TESTSIGNING ON
You need to restart the PC. After this we are ready to create a test certificate:
> cd c:\winddk\bin\SelfSign
> MakeCert -r -pe -ss PGDisableCertStore -n "CN=PGDisableCert" “c:\PGDisable\PGDisableCert.cer”
Now
open the certificate in the Explorer by double clicking it. Currently
the certificate is an untrusted one. To make Windows trusting the
certificate, click
“Install certificate”
– “Next” – “place all certificates in the following store” – “Trusted
Root Certification Authorities” – “OK” – “Next” – “Finish”
and do the same procedure to add it to the
“Trusted Publishers”
store.
The project comes with a post-build event that relies on the fact that a certificate named
“PGDisableCert”
is in the
“PGDisableCertStore”
. This will automatically sign the drivers after each build and you have to care about nothing.
We are done and you can start building the project!
To
keep the code simple, I didn’t provide much service management
functionality. The application checks whether the driver is already
installed. If not, it installs the driver. If yes, it just
ensures that it is running.
But if you want to frequently
recompile it, you need to make sure that the driver has been unloaded
and removed from the service control manager, before you restart the
application. Otherwise your newly compiled driver won’t be loaded
because the old one is already there…
If you want to remove the driver from the service manager, just type
“sc delete PG3Disable.sys”
into a root shell. If the driver is currently running, this command
will still succeed, but the driver actually stays installed. To prevent
this you also have to stop it by using
“sc stop PG3Disable.sys”
.
But keep in mind that due to the missing rollback of patches you won’t
be able to load the driver again in the current system session (if you
disabled PatchGuard)!
During development a frequent system restart is common. This is why I develop such drivers in a virtual machine.
The drivers in fact are very easy to use. The following table shows the supported control codes for the PG3Disable-Driver:
IOCTL_PATCHGUARD_DUMPWrites a list of fingerprints to “C:\patchguard.log”
. If you execute this command after disabling PatchGuard, the file should only contain eight custom access routines.
IOCTRL_PATCHGUARD_DISABLEThis silently disables PatchGuard on success.
Both control codes have no parameters and not return value. The only thing is the status code available through
GetLastError
:
ERROR_SUCCESSOperation has been completed successfully.
ERROR_NOT_SUPPORTEDThis
is returned either in case your system is not supported or PatchGuard
was already patched by the driver. For stability reasons, the driver
may decide to refuse patching sometimes; ten just try it 5-10 minutes
later.
All other error values are not explicitly raised by my code but may be returned by invoked kernel APIs.
The
PG2Disable-Driver works very similar but there are two main
differences. Firstly, you can disable PatchGuard multiple times and
will always get
ERROR_SUCCESS
as result. Secondly, you may load/unload the driver as often as you like. Furthermore it exports an additional command:
IOCTL_PATCHGUARD_PROBEInstalls a test hook for KeCancelTimer
. Please note that your system will BSOD if PatchGuard is not already disabled!
Please note that PG2Disable won’t work on Windows Vista SP1.
I
really appreciate what Microsoft has done so far in order to harden
PatchGuard. I don’t think that it takes much effort to raise the bar of
work, necessary to disable PatchGuard, to a degree that can be
considered as non-exploitable. I would eliminate the DPC dispatcher as
the main point of failure. Microsoft should make critical symbols
invisible through the debugger. This could possibly be done by
exporting proper debugging APIs over service interrupts which are using
all required symbols internally; this way the debugger doesn’t need to
know about them but can still keep all functionality. Also some sort of
a private way for delayed code execution would improve the whole thing.
A practical approach to realize this idea would be to define a macro named
“PATCHGUARD_INVOKATION”
,
for example, and use it all over the windows source code, but only for
unexported APIs of higher order (not called inside any public API).
Then a pre-build event could automatically replace such a macro with
randomly generated invocation stubs, or even do nothing for most
occurrences. One could base the randomness on a constant value, so that
major updates will use different constant values resulting in totally
different PatchGuard invocation stubs, while minor updates won’t cause
any changes to PatchGuard related sections in the binaries.
The
form of delayed execution could be further improved by adding additional code
to the internal interrupt handlers and only invoke PatchGuard after a
counter has been incremented X times. Then this code again can check in
a more expensive way whether it is “time” to do a system check or not.
Even if one can still fingerprint many parts of PatchGuard, one could
not disable it in a stable manner, because there are always some parts
missing.
Further, multiple versions of the system check routine,
which then of course shall go through a code morphing engine, would do
their part.
Remember, the goal is not to make PatchGuard
unexploitable on a single machine. The goal is to prevent malware from
disabling it in an automated manner on a wide range of machines.
Now
I want to write a little bit about my first investigations of
PatchGuard 3. At the beginning I thought I could just cancel all the
PatchGuard timers and I’d be done. As you can see I was mistaking, but
that’s what programming is about ;-).
I don’t want to connect
this with disabling PatchGuard. The reason is that PG3 actually is not
exploitable with a timer cancelling approach. PG2 in fact is and the
PG2Disable-Driver has all of the code required to do this and also
shows how to extract the unexported kernel symbols in a stable manner
(not working since Service Pack 1; here you’d have to use
fingerprinting like it is done for
KiRetireDpcList
, for example).
On
the surface, there was nothing in the net that showed how to enumerate
timers. Also the WDK documentation contained nothing about it. So I
finally disassembled some of the timer routines like
KeSetTimerEx
,
KeCancelTimer
, etc.
KeCancelTimer
seems to be the best starting point, because it is so small:
KIRQL OldIrql = 0;
BOOLEAN Existed = FALSE;
PKSPIN_LOCK_QUEUE LockArray = NULL;
ULONG LockIndex = 0;
KTIMER_TABLE_ENTRY* TimerEntry = NULL;
OldIrql = KiAcquireDispatcherLockRaiseToSynch();
Existed = InTimer->Header.Inserted;
if(Existed)
{
LockArray = KeGetPcr()->LockArray;
LockIndex = ((InTimer->Header.Hand / sizeof(KSPIN_LOCK_QUEUE)) & 0x0F) + LockQueueTimerTableLock;
KeAcquireQueuedSpinLockAtDpcLevel(&LockArray[LockIndex]);
InTimer->Header.Inserted = FALSE;
if(RemoveEntryList(&InTimer->TimerListEntry))
{
TimerEntry = &KiTimerTableListHead[InTimer->Header.Hand];
if(TimerEntry == (KTIMER_TABLE_ENTRY*)TimerEntry->Entry.Flink)
{
TimerEntry->Time.HighPart = 0xFFFFFFFF;
}
}
KeReleaseQueuedSpinLockFromDpcLevel(&LockArray[LockIndex]);
}
KiReleaseDispatcherLockFromSynchLevel();
KiExitDispatcher(OldIrql);
return Existed;
I
already inserted the source code guessed from the disassembly. I don’t
want to explain how you get it, because this is mainly based on
experience with compiler building and (dis-)assemblers.
As you
can see, the C-Code for it is quite straightforward. Though, there is a
bug in the timer management, but I will come back to that later. The
following is a little documentation about totally undocumented and
unexported kernel symbols.
KIRQL KiAcquireDispatcherLockRaiseToSynch()Probably locks the timer and/or DPC database. Raises the IRQL to DISPATCH_LEVEL
and returns the previous state.
void KeAcquireQueuedSpinLockAtDpcLevel(PKSPIN_LOCK_QUEUE)Is similar to the publicly available KeAcquireInStackQueuedSpinLock
. You may use this method to acquire any of the locks in KPCR::LockArray
. Please note that this method shall be called at DISPATCH_LEVEL
only. The following constants may be helpful to index the right one:
LockQueueDispatcherLock 0
LockQueueExpansionLock 1
LockQueuePfnLock 2
LockQueueSystemSpaceLock 3
LockQueueVacbLock 4
LockQueueMasterLock 5
LockQueueNonPagedPoolLock 6
LockQueueIoCancelLock 7
LockQueueWorkQueueLock 8
LockQueueIoVpbLock 9
LockQueueIoDatabaseLock 10
LockQueueIoCompletionLock 11
LockQueueNtfsStructLock 12
LockQueueAfdWorkQueueLock 13
LockQueueBcbLock 14
LockQueueMmNonPagedPoolLock 15
LockQueueTimerTableLock 17
void KeReleaseQueuedSpinLockFromDpcLevel (PKSPIN_LOCK_QUEUE)Is similar to the publicly available KeReleaseInStackQueuedSpinLock
. You have to call it to release any of the previously acquired locks in KPCR::LockArray
. Please note that this method shall be called at DISPATCH_LEVEL
only.
void KiReleaseDispatcherLockFromSynchLevel()Probably releases the timer/DPC lock at DISPATCH_LEVEL
and does NOT change the IRQL.
void KiExitDispatcher(KIRQL InOldIrql)Shall be called at DISPATCH_LEVEL
and will lower the IRQL to InOldIrql
. I think this method was not combined with the previous method, to allow performing more operations at DISPATCH_LEVEL
(without holding the lock) before actually lowering the IRQL. I
recently read in some Microsoft paper, that it also may schedule a new
thread, now after the DPC level operation is done…
With this information in mind, we are ready to build our own enumeration method:
PKSPIN_LOCK_QUEUE KeTimerIndexToLockQueue(UCHAR InTimerIndex)
{
return &(KeGetPcr()->LockArray[((InTimerIndex / sizeof(KSPIN_LOCK_QUEUE)) & 0x0F) + LockQueueTimerTableLock]);
}
OldIrql = KiAcquireDispatcherLockRaiseToSynch();
for(Index = 0; Index < TIMER_TABLE_SIZE; Index++)
{
LockQueue = KeTimerIndexToLockQueue((UCHAR)(Index & 0xFF));
KeAcquireQueuedSpinLockAtDpcLevel(LockQueue);
TimerListHead = &KiTimerTableListHead[Index];
TimerList = TimerListHead->Entry.Flink;
while(TimerList != (PLIST_ENTRY)TimerListHead)
{
Timer = CONTAINING_RECORD(TimerList, KTIMER, TimerListEntry); TimerList = TimerList->Flink;
}
KeReleaseQueuedSpinLockFromDpcLevel(LockQueue);
}
KiReleaseDispatcherLockFromSynchLevel();
KiExitDispatcher(OldIrql);
If you now want to cancel a timer during enumeration, you could use the following code snippet:
Timer->Header.Inserted = FALSE;
if(RemoveEntryList(&Timer->TimerListEntry))
TimerListHead->Time.HighPart = 0xFFFFFFFF;
Since PatchGuard 2, the timer DPCs are encrypted, so don’t
try to dereference the pointer. The PG2Disable-Driver shows you how to
obtain the two internal decryption keys
KiWaitNever
and
KiWaitAlways
. With those symbols you may decrypt the KDPC pointer with the following code:
ULONGLONG RDX = (ULONGLONG)Timer->Dpc;
RDX ^= InKiWaitNever;
RDX = _rotl64(RDX, *KiWaitNever & 0xFF);
RDX ^= (ULONGLONG)Timer;
RDX = _byteswap_uint64(RDX);
RDX ^= *KiWaitAlways;
return (KDPC*)RDX;
Now you
are able to enumerate all Windows timers in a stable and interlocked
way, just like the OS does it. Please note that you can’t call
KeCancelTimer
during
the enumeration as this would cause a deadlock! The PG2Disable-Driver
may write all the timer information to a log file, even if we are
running at DPC level during enumeration.
Now we can compare our both code parts. In the code of
KeCancelTimer
we had:
TimerEntry = &KiTimerTableListHead[InTimer->Header.Hand];
And in our enumeration:
TimerListHead = &KiTimerTableListHead[Index];
Well, they look equal on the surface. But the difference here is that
Index
in our enumeration actually has a range from zero to
511
. This is because the public WDK constant
TIMER_TABLE_SIZE
has a value of
512
. Now you might see the problem:
InTimer->Header.Hand
is only one byte wide according to the publicly available
DISPATCH_HEADER
structure. This causes Hand to overflow if the timer is placed in a linked list with an index greater than
255
. This also explains the strange switch we extracted from
KeCancelTimer
, which again checks whether the linked list is empty, even if the use of
RemoveEntryList
already
proved it. Redmon probably realized that there is something wrong and
applied this workaround to make sure that only the timestamp of empty
timer lists is reset.
But actually this seems to cause no big
trouble at all. Just a funny thing that we discovered the reason of
such a bug with plain reverse engineering.