Introduction
It seems that much of the malware analysis and reverse engineering world overlooks some of the extended features that the processor provides. This write-up explains the system debug MSRs (Model Specific Registers) on both AMD and Intel, and how these features can be leveraged by user-mode debuggers, not just by code running at CPL 0.
Note that certain debug feature MSRs differ between Intel and AMD processors. For example, certain Intel CPUs provide a stack of several last branch records, whereas AMD processors have traditionally provided only a single from/to pair. In either case, those additional records cannot be read from code running at CPL 3, so they are outside the scope of this discussion.
Background
The author assumes that you have a decent knowledge of the Windows debugger APIs, Windows internals, and assembly. Last branch recording and branch tracing deserve a solid place in the malware analyst's or software reverse engineer's arsenal for analyzing ring 3 code. Windows itself provides several back doors for using these techniques from user mode. The goal of this article is to explain how to use these features and incorporate them into your own debuggers and analysis tools.
Branch tracing
A branch is an instruction that can conditionally or unconditionally transfer control flow: for example, any conditional jump, unconditional jump, call, ret, far call, far jump, iret, retf, int n, syscall, sysexit, icebp, etc.
The term 'branch taken' means that the branch actually changed the control flow. An unconditional branch is always taken. A conditional jump (for example, one following a bitwise comparison), however, is not always taken; whether it is depends on the result of the prior comparison.
As you hopefully already know, the processor single-step feature (EFLAGS.TF=1) causes a #DB exception to occur after every instruction boundary is reached. This type of exception is known as a trap, meaning the instruction pointer pushed onto the interrupt handler stack points to the next instruction to be executed.
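Simple x86 example:
pushfd
or dword ptr [esp], 0x100   ; set EFLAGS.TF (bit 8)
popfd                       ; TF is now set; the first trap follows the next instruction
inc eax                     ; #DB trap after this instruction
push ebx                    ; ...and after this one, and so on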
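A debugger normally does not inject pushfd/popfd into the target; it sets TF in the target thread's saved context instead. The C sketch below assumes a Win32 debugger that already holds a handle to the suspended target thread; `set_trap_flag` and `enable_single_step` are hypothetical helper names. The bit arithmetic is portable, while the Win32 part compiles only on Windows (x86/x64).

```c
#include <stdint.h>

#define EFLAGS_TF 0x100u /* EFLAGS bit 8: trap flag */

/* Return an EFLAGS value with the trap flag set. */
static uint32_t set_trap_flag(uint32_t eflags)
{
    return eflags | EFLAGS_TF;
}

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: arm single-stepping on a suspended thread.
   Once resumed, the thread raises EXCEPTION_SINGLE_STEP after its
   next instruction. */
static BOOL enable_single_step(HANDLE thread)
{
    CONTEXT ctx;
    ctx.ContextFlags = CONTEXT_CONTROL; /* EFlags lives in the control set */
    if (!GetThreadContext(thread, &ctx))
        return FALSE;
    ctx.EFlags = set_trap_flag(ctx.EFlags);
    return SetThreadContext(thread, &ctx);
}
#endif
```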
The DebugCtl MSR provides a bit (BTF) that, when set along with EFLAGS.TF=1, raises the #DB trap (a single-step exception) only after a branch instruction completes, instead of after every instruction. This occurs only if the branch is taken. The instruction pointer pushed onto the handler stack is then that of the branch destination, which is what you see as EIP/RIP in the Windows debugger CONTEXT structure.
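Simple x86 example (EFLAGS.TF=1 and DebugCtl.BTF=1):
push ebx
push eax
call ecx        ; taken branch: #DB with EIP = the address in ECX
xor eax, eax    ; non-branch instructions raise no #DB
inc ebx
pop eax
pop ebx
ret             ; taken branch: #DB with EIP = the return address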
Now, how do we access DebugCtl from user mode? It's simple: Windows provides access to both the BTF and LBR bits of DebugCtl via bits 8 and 9 of DR7. If interested, see KiRestoreDebugRegisterState.
- Bit 8 of DR7 represents bit 0 of DebugCtl. This is the LBR bit (last branch record, explained below).
- Bit 9 of DR7 represents bit 1 of DebugCtl. This is the BTF bit (single-step on branches).
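A debugger can set these DR7 bits through the thread context like any other debug register field. The sketch below assumes a Win32 debugger holding a handle to the suspended target thread; `debugctl_from_dr7` and `set_branch_bits` are hypothetical names. `debugctl_from_dr7` only models the DR7-to-DebugCtl mapping described above; it does not read the real MSR, which is not possible at CPL 3.

```c
#include <stdint.h>

#define DR7_LBR (1u << 8) /* maps to DebugCtl bit 0 (LBR) */
#define DR7_BTF (1u << 9) /* maps to DebugCtl bit 1 (BTF) */

/* Model of the mapping: which DebugCtl bits a given DR7 value requests. */
static uint32_t debugctl_from_dr7(uint64_t dr7)
{
    uint32_t debugctl = 0;
    if (dr7 & DR7_LBR)
        debugctl |= 1u << 0;
    if (dr7 & DR7_BTF)
        debugctl |= 1u << 1;
    return debugctl;
}

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: request LBR and/or BTF for a suspended thread
   by editing DR7 in its debug-register context. */
static BOOL set_branch_bits(HANDLE thread, BOOL lbr, BOOL btf)
{
    CONTEXT ctx;
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
    if (!GetThreadContext(thread, &ctx))
        return FALSE;
    if (lbr)
        ctx.Dr7 |= DR7_LBR;
    if (btf)
        ctx.Dr7 |= DR7_BTF;
    return SetThreadContext(thread, &ctx);
}
#endif
```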
As you can imagine, this can speed up a running trace considerably. When looking for a difference in control flow or a bug, the answer most likely lies in which branches are taken and which are not, and by tracing only branches you can get through hundreds of thousands of instructions per second instead of generating an interrupt at every instruction boundary.
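To make the saving concrete, here is a toy C model of the BTF example above. It does not emulate the processor; it only counts traps: with TF alone every instruction traps, while with TF and BTF only taken branches do. `struct insn` and `count_traps` are hypothetical names, and the model ignores the callee's instructions as well as interrupts and exceptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* One executed instruction in the toy trace. */
struct insn {
    const char *text;
    bool is_branch;
    bool taken;
};

/* Count #DB traps for a straight-line trace.
   TF alone:  one trap per instruction.
   TF + BTF:  one trap per taken branch only. */
static size_t count_traps(const struct insn *trace, size_t n, bool btf)
{
    size_t traps = 0;
    for (size_t i = 0; i < n; i++) {
        if (!btf || (trace[i].is_branch && trace[i].taken))
            traps++;
    }
    return traps;
}
```

For the eight listed instructions, where `call ecx` and `ret` are the only taken branches, `count_traps` returns 8 without BTF and 2 with it. A not-taken conditional jump would count under TF alone but not under BTF.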
You may have noticed that this leaves us with a problem. The instruction pointer pushed onto the handler stack is that of the branch destination, so RIP/EIP in your user-mode CONTEXT structure is the destination as well. What if we want to know the location of the branching instruction itself? This is where the last branch record, also known as LBR, comes in.
Let's imagine you are already branch tracing a program with your user-mode debugger or analysis tool. You have bit 9 of DR7 set to enable branch tracing, and the trap flag set as well. Here is what to do: additionally set the LBR bit via DR7 (bit 8, as shown above). When a #DB exception occurs due to a taken branch, analyze EIP/RIP in your CONTEXT. As stated before, that is your destination instruction. Now for the yummy part of the article: the address of the branching instruction itself is tucked away by Windows in EXCEPTION_RECORD->ExceptionInformation[0], provided of course that you properly enabled LBR. This is the virtual address of the branching instruction that branched to your current instruction pointer.
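The handling described above could look roughly like the following C sketch. It assumes a Win32 debugger of the same bitness as the target (x86 or x64) that has received the debug event and holds a handle to the faulting thread; `struct branch_event`, `branch_event_has_source`, and `on_branch_trap` are hypothetical names. Re-arming TF, LBR, and BTF on every trap is a defensive assumption of this sketch, not something the article states.

```c
#include <stdbool.h>
#include <stdint.h>

/* One taken branch as seen by the debugger. */
struct branch_event {
    uintptr_t from; /* branching instruction: ExceptionInformation[0] */
    uintptr_t to;   /* destination: EIP/RIP from the thread CONTEXT */
};

/* A zero source means no LBR data was supplied (LBR not enabled,
   or the exception did not come through the int 01 handler). */
static bool branch_event_has_source(const struct branch_event *e)
{
    return e->from != 0;
}

#ifdef _WIN32
#include <windows.h>

#define DR7_LBR (1u << 8)
#define DR7_BTF (1u << 9)
#define EFLAGS_TF 0x100u

/* Hypothetical handler for one debug event while branch tracing.
   Returns false if the event is not a single-step exception. */
static bool on_branch_trap(const DEBUG_EVENT *ev, HANDLE thread,
                           struct branch_event *out)
{
    const EXCEPTION_RECORD *er = &ev->u.Exception.ExceptionRecord;
    CONTEXT ctx;

    if (ev->dwDebugEventCode != EXCEPTION_DEBUG_EVENT ||
        er->ExceptionCode != EXCEPTION_SINGLE_STEP)
        return false;

    ctx.ContextFlags = CONTEXT_CONTROL | CONTEXT_DEBUG_REGISTERS;
    if (!GetThreadContext(thread, &ctx))
        return false;

#ifdef _WIN64
    out->to = (uintptr_t)ctx.Rip;
#else
    out->to = (uintptr_t)ctx.Eip;
#endif
    out->from = (uintptr_t)er->ExceptionInformation[0];

    /* Re-arm tracing before ContinueDebugEvent (assumed necessary,
       since TF is typically cleared once the single step is reported). */
    ctx.EFlags |= EFLAGS_TF;
    ctx.Dr7 |= DR7_LBR | DR7_BTF;
    return SetThreadContext(thread, &ctx) != 0;
}
#endif
```

Each `branch_event` is one edge of the control-flow trace; logging the `from -> to` pairs gives exactly the information that the destination-only CONTEXT lacks.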
I got a little creative with this myself, and since I couldn't find any in-depth articles on the web about these features, I decided to write one.
I was analyzing a small piece of software for a friend and noticed that, prior to calling into ws2_32.send(), it would clear the stack, set up a fake return address, and then JMP to send() so that the original return address was never pushed onto the stack. That made finding where the call originated a royal pain.
LBR to the rescue
Here is how we can easily overcome this problem. Some of you may already see how by this point, but read on for important details.
- First, we must enable LBR on the thread we are analyzing. In this case we do not need the BTF feature, so set only bit 8 of DR7 for the thread.
- Next, we must establish a breakpoint on ws2_32.send() or whatever you are analyzing. Here is the important part: the exception raised MUST be a #DB exception. This is because the only Windows interrupt handler that inserts LastBranchFromIp into EXCEPTION_RECORD->ExceptionInformation[0] is the Windows int 01 handler. The Windows int 3 handler does not do this for us, so if you use int 3, your ExceptionInformation[0] member will be empty.
You can use either a debug register breakpoint or ICEBP (int 01 with no DPL check). I would personally recommend ICEBP, and here is why: the code could IRET to send() with the resume flag set, and if the user of your debugger placed a debug register breakpoint on the first instruction of send(), it would be ignored! And let's face it, a lot of us like to put breakpoints on the first instruction of an API.
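One way to plant the ICEBP breakpoint recommended above is to overwrite the first byte of send() with 0xF1, much as a debugger plants int 3 (0xCC). The C sketch below assumes a Win32 debugger attached to the target; `struct icebp_bp`, `icebp_set`, `icebp_hit_address`, and `icebp_is_ours` are hypothetical names, and locating send() inside the target process is not shown. Because ICEBP is a trap, the reported instruction pointer is one byte past the breakpoint, which is the arithmetic the portable helpers capture.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_ICEBP 0xF1 /* int 01 without a DPL check: raises #DB */

/* Hypothetical software breakpoint record. */
struct icebp_bp {
    uintptr_t address;  /* e.g. the first instruction of send() */
    uint8_t saved_byte; /* original byte replaced by 0xF1 */
};

/* ICEBP is a trap: the reported IP is one byte past the 0xF1 byte. */
static uintptr_t icebp_hit_address(uintptr_t trap_ip)
{
    return trap_ip - 1;
}

static bool icebp_is_ours(const struct icebp_bp *bp, uintptr_t trap_ip)
{
    return icebp_hit_address(trap_ip) == bp->address;
}

#ifdef _WIN32
#include <windows.h>

/* Save the original byte at bp->address and replace it with ICEBP. */
static BOOL icebp_set(HANDLE process, struct icebp_bp *bp)
{
    const uint8_t icebp = OPCODE_ICEBP;
    SIZE_T n;
    if (!ReadProcessMemory(process, (LPCVOID)bp->address,
                           &bp->saved_byte, 1, &n))
        return FALSE;
    if (!WriteProcessMemory(process, (LPVOID)bp->address, &icebp, 1, &n))
        return FALSE;
    return FlushInstructionCache(process, (LPCVOID)bp->address, 1);
}
#endif
```

When an EXCEPTION_SINGLE_STEP event arrives with an instruction pointer for which `icebp_is_ours` is true, ExceptionInformation[0] should hold the address of the JMP that entered send(). The debugger would then write `saved_byte` back, set EIP/RIP to `bp->address`, and continue.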
Points of Interest - VM detection
Besides analyzing a branch to a function that was set up with a bogus stack, the LBR feature works as a pretty decent method of detecting whether your program is running under a hypervisor. This works because most virtualization software, including both VMware and VirtualBox, does not virtualize LBR (even though it is possible and supported).
Here is a rough example. I will leave the logic of the exception handler up to you.
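RunningInHyperVisor PROC
; Assumes EAX points to a zeroed x86 CONTEXT and ECX holds a thread handle
; (e.g. GetCurrentThread()); SetThreadContext is declared via PROTO/includes.
mov dword ptr [eax], 00010010h   ; ContextFlags = CONTEXT_DEBUG_REGISTERS
mov dword ptr [eax+18h], 100h    ; CONTEXT.Dr7 = bit 8 (DebugCtl.LBR)
push eax                         ; lpContext
push ecx                         ; hThread
call SetThreadContext            ; stdcall: the callee pops both arguments
db 0EBh, 00h                     ; jmp short $+2: a taken branch to the next byte
db 0F1h                          ; ICEBP: #DB through the int 01 handler
ret                              ; reached if the handler continues execution
RunningInHyperVisor ENDP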
At this point you would examine the contents of ExceptionInformation[0]. If running under a VM, it will not contain the virtual address of the most recent branch (the short JMP just before the ICEBP).
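For the exception handler left open above, a vectored handler is one option. The C sketch below assumes the handler is registered with AddVectoredExceptionHandler before RunningInHyperVisor runs, in a 32-bit process to match the x86 listing; `lbr_probe_handler`, `classify_lbr`, and the two globals are hypothetical names. Because the probe ends in the bytes EB 00 F1, the short JMP starts three bytes before the instruction pointer reported for the ICEBP trap, which gives the value ExceptionInformation[0] is expected to hold.

```c
#include <stdint.h>

enum lbr_probe_result { LBR_RECORDED, LBR_MISSING, LBR_MISMATCH };

/* Compare the reported branch source with the address of the short JMP. */
static enum lbr_probe_result classify_lbr(uintptr_t from, uintptr_t jmp_address)
{
    if (from == 0)
        return LBR_MISSING; /* no LBR data supplied at all */
    return from == jmp_address ? LBR_RECORDED : LBR_MISMATCH;
}

#ifdef _WIN32
#include <windows.h>

/* Filled in by the handler, read after RunningInHyperVisor returns. */
static volatile ULONG_PTR g_last_branch_from;
static volatile ULONG_PTR g_expected_jmp;

/* Hypothetical handler, registered beforehand with
   AddVectoredExceptionHandler(1, lbr_probe_handler). */
static LONG CALLBACK lbr_probe_handler(PEXCEPTION_POINTERS info)
{
    if (info->ExceptionRecord->ExceptionCode != EXCEPTION_SINGLE_STEP)
        return EXCEPTION_CONTINUE_SEARCH;

    g_last_branch_from = info->ExceptionRecord->ExceptionInformation[0];
    /* EB 00 F1: the JMP starts 3 bytes before the IP after the ICEBP. */
#ifdef _WIN64
    g_expected_jmp = (ULONG_PTR)info->ContextRecord->Rip - 3;
#else
    g_expected_jmp = (ULONG_PTR)info->ContextRecord->Eip - 3;
#endif
    /* ICEBP is a trap, so execution resumes at the RET. */
    return EXCEPTION_CONTINUE_EXECUTION;
}
#endif
```

After the probe returns, `classify_lbr(g_last_branch_from, g_expected_jmp)` yielding LBR_RECORDED suggests LBR is live, while LBR_MISSING or LBR_MISMATCH is the signal the article associates with running under a hypervisor.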