x64 Memory Access Monitor

23 Apr 2019 | CPOL | 4 min read
This article shows how to automatically catch memory accesses (reads and writes) to a memory region and log them to a file.

Introduction

The memory access monitor is implemented as a DLL that is injected into the target process. I extended the command-line interface of the tool described in my previous article, https://www.codeproject.com/Articles/1266083/x64-API-Hooker-plus-Disassembler, so that it can inject and eject our DLL. I include the existing source (with some bug fixes; I wonder how it ever worked...) together with the source of the monitor DLL. The DLL itself is 64-bit, but it can be made 32-bit with some minor modifications.

Using the Code

We will use a vectored exception handler to catch the access violations caused by reads and writes. A process-wide exception handler is added with the AddVectoredExceptionHandler function:

C++
PVOID WINAPI AddVectoredExceptionHandler(
  _In_ ULONG                       FirstHandler,
  _In_ PVECTORED_EXCEPTION_HANDLER VectoredHandler
);

The first parameter determines the order in which multiple vectored handlers are called: a nonzero value puts our handler first, zero puts it last. If the process we are going to monitor has already registered its own vectored handler, it is important to pass a nonzero value (TRUE in our code), so that we catch our read/write exceptions and handle them before that handler sees them; otherwise it might become irritated and call TerminateProcess without a word.

A vectored exception handler is process-wide and applies to all threads in the process, so we need to synchronize execution between threads or our monitor will break. MSDN recommends against using synchronization objects or allocating memory within the handler (see the Remarks section of its documentation), so I decided to implement a simple spin lock taken from Wikipedia (you will see the code later).

Memory region to be monitored is represented by the following struct:

C++
struct MONITOR_ENTRY
{
    UCHAR *Start;       // start address of region
    DWORD Size;         // size of region
    FILE *File;         // each region has associated file to which we write memory read/writes
    int Counter;        // r/w access counter
};
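
Finding which monitored region a faulting address belongs to is a half-open range check. The sketch below is a portable illustration only, not part of the DLL: Region and FindRegion are hypothetical names, and fixed-width integers stand in for the Windows types.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for MONITOR_ENTRY (no file handle).
struct Region
{
    std::uintptr_t Start;   // start address of region
    std::uint32_t  Size;    // size of region in bytes
    int            Counter; // r/w access counter
};

// Returns the index of the region containing Address, or -1 if none does.
// Each range is half-open: [Start, Start + Size).
inline int FindRegion(const Region *Entries, std::size_t Count, std::uintptr_t Address)
{
    for (std::size_t i = 0; i < Count; ++i)
    {
        // Comparing the offset against Size avoids overflow in Start + Size.
        if (Address >= Entries[i].Start &&
            Address - Entries[i].Start < Entries[i].Size)
        {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The handler shown later performs the same comparison inline on g_Entry before it decides whether an access violation belongs to the monitor.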

When we start monitoring, we change the protection of the region to PAGE_EXECUTE only, so if the region contains code, it is still allowed to execute. We register our exception handler, which is called whenever the process tries to read or write this memory region. The exception handler has the following prototype:

C++
LONG NTAPI Handler(EXCEPTION_POINTERS *ExceptionInfo);

And EXCEPTION_POINTERS structure:

C++
typedef struct _EXCEPTION_POINTERS {
    PEXCEPTION_RECORD ExceptionRecord;
    PCONTEXT ContextRecord;
} EXCEPTION_POINTERS, *PEXCEPTION_POINTERS;

ContextRecord holds the thread context at the moment the exception occurred, and ExceptionRecord holds information about the exception. We can modify the thread context (e.g., the Rax register value); when we return from the handler, Windows applies the modified context before it resumes the thread. To signal that the exception is handled and execution should continue, we return EXCEPTION_CONTINUE_EXECUTION from the handler. When we are not interested in an exception (e.g., one that should be handled by the target process itself), we return EXCEPTION_CONTINUE_SEARCH.

When a read or write attempt occurs, we catch an EXCEPTION_ACCESS_VIOLATION exception (the exception code is stored in ExceptionInfo->ExceptionRecord->ExceptionCode). To handle it, we need:

  1. Address of instruction that caused exception
  2. Address of inaccessible data
  3. Access type (read / write)

The instruction address is retrieved from the thread context (ExceptionInfo->ContextRecord->Rip), the data address is stored in ExceptionInfo->ExceptionRecord->ExceptionInformation[1], and the access type is stored in ExceptionInfo->ExceptionRecord->ExceptionInformation[0] (0 for a read, 1 for a write). Refer to the EXCEPTION_RECORD documentation for more details. The actions we perform are listed below:

  1. Acquire the lock
  2. Suspend all other threads (we can't change protection on the fly, since another thread might be executing code inside our region)
  3. Change protection of the region to PAGE_READWRITE, so we can read the bytes of the instruction that caused the access violation
  4. Copy this instruction to a buffer (if RIP-relative addressing is used, we need to modify it a little while preserving its effect)
  5. Append an invalid opcode (the UD2 instruction) right after the copied instruction
  6. Modify the instruction pointer so that it points to our buffer
  7. Continue execution (without releasing the lock)

The thread continues execution inside our buffer, executes the copied instruction, and then attempts to execute the UD2 instruction. This triggers another exception, EXCEPTION_ILLEGAL_INSTRUCTION. Now our actions are:

  1. Change protection of region back to PAGE_EXECUTE
  2. Modify instruction pointer so it will point to instruction that immediately follows the original instruction that caused access violation
  3. Resume all other threads
  4. Release lock
  5. Continue execution

We need to make one clarification: transfer control instructions like jmp qword ptr [rax] can be executed without read permission, though they implicitly reference memory.
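
The two-phase sequence described above can be modelled without any Windows calls. The following is a toy state machine under stated assumptions: StepModel, ClassifyAccess, OnAccessViolation and OnIllegalInstruction are hypothetical names, and the model only tracks the protection, lock and instruction-pointer bookkeeping without touching real memory. The access codes 0 (read), 1 (write) and 8 (execute, DEP violation) are the values documented for ExceptionInformation[0].

```cpp
#include <cstdint>

enum class Protection { Execute, ReadWrite };
enum class Access { Read, Write, Execute, Unknown };

// Maps ExceptionInformation[0] to an access type.
inline Access ClassifyAccess(std::uint64_t Info0)
{
    switch (Info0)
    {
    case 0:  return Access::Read;
    case 1:  return Access::Write;
    case 8:  return Access::Execute;
    default: return Access::Unknown;
    }
}

// Hypothetical model of the monitor's state between the two exceptions.
struct StepModel
{
    Protection    RegionProtection = Protection::Execute;
    bool          Locked = false;
    std::uint64_t Ud2Address = 0;       // address of UD2 after the copied instruction
    std::uint64_t NextInstruction = 0;  // where to resume in the original code

    // Phase 1: access violation. Returns the new instruction pointer.
    std::uint64_t OnAccessViolation(std::uint64_t Rip, std::uint32_t InstSize,
                                    std::uint64_t Buffer, std::uint32_t TrampSize)
    {
        Locked = true;                             // the lock stays held
        RegionProtection = Protection::ReadWrite;  // let the copied instruction run
        Ud2Address = Buffer + TrampSize;
        NextInstruction = Rip + InstSize;
        return Buffer;                             // continue inside the buffer
    }

    // Phase 2: illegal instruction. Returns true and updates Rip if the
    // exception came from our own UD2; otherwise leaves everything untouched.
    bool OnIllegalInstruction(std::uint64_t &Rip)
    {
        if (!Locked || Rip != Ud2Address) return false;
        RegionProtection = Protection::Execute;    // re-arm the trap
        Rip = NextInstruction;
        Locked = false;                            // release the lock
        return true;
    }
};
```

Note that the lock is taken in one exception and released in the next one on the same thread, which is why a plain non-recursive spin lock is sufficient here.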

Now let's look at the actual code of our monitor DLL. DllMain catches thread creation and termination in the target process:

C++
extern BOOL g_Update;

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        break;
    case DLL_THREAD_ATTACH:
        g_Update = TRUE;
        break;
    case DLL_THREAD_DETACH:
        g_Update = TRUE;
        break;
    case DLL_PROCESS_DETACH:
        break;
    default:
        break;
    }
    return TRUE;
}

Auxiliary functions to handle instruction bytes:

C++
/*
UD2
*/
UCHAR g_InvalidOpcode[] = { 0x0F, 0x0B };

/*
pop        r64
*/
UCHAR g_RegisterRestore[] = { 0x48, 0x58 };

/*
push    r64
push                        low dword            // sign extended to 64bit before push
mov dword ptr [rsp + 4],    high dword
pop        r64
*/
UCHAR g_RegisterOverride[] = { 0x48, 0x50,
                                0x68,                            0x00, 0x00, 0x00, 0x00,
                                0xc7, 0x44, 0x24, 0x04,            0x00, 0x00, 0x00, 0x00,
                                0x48, 0x58 };

#define REGISTER_OVERRIDE_SIZE                        sizeof(g_RegisterOverride)
#define REGISTER_RESTORE_SIZE                        sizeof(g_RegisterRestore)
#define INVALID_OPCODE_SIZE                            sizeof(g_InvalidOpcode)

void GenerateInvalidOpcode(UCHAR *Bytes)
{
    memcpy(Bytes, g_InvalidOpcode, INVALID_OPCODE_SIZE);
}

void GenerateRegisterOverride(DWORD Register, DWORD64 Value, UCHAR *OverBytes)
{
    memcpy(OverBytes, g_RegisterOverride, REGISTER_OVERRIDE_SIZE);

    OverBytes[1] += Register;
    *((INT32*)(OverBytes + 3)) = Value;
    *((INT32*)(OverBytes + 11)) = Value >> 32;
    OverBytes[16] += Register;
}

void GenerateRegisterRestore(DWORD Register, UCHAR *RestBytes)
{
    memcpy(RestBytes, g_RegisterRestore, REGISTER_RESTORE_SIZE);

    RestBytes[1] += Register;
}

void GenerateTrampoline(UCHAR *Ptr, UCHAR *Bytes, DWORD Size, 
                        bool rip, int index, DWORD *pTrampSize)
{
    DWORD64 Address;
    DWORD TrampSize;
    INT32 Offset;
    UCHAR Rex, Lock, Prefix, Prefix0F;
    UCHAR Opcode;
    UCHAR Modrm;
    DWORD AddrReg;
    DWORD Reg;
    DWORD i, j, pi;

    i = 0;
    j = 0;

    if (rip)
    {
        if (Bytes[i] == 0xF0)
        {
            Lock = Bytes[i];
            ++i;
        }
        else Lock = 0;

        if ((Bytes[i] == 0x66) || (Bytes[i] == 0xF2) || (Bytes[i] == 0xF3))
        {
            Prefix = Bytes[i];
            ++i;
        }
        else Prefix = 0;

        if ((Bytes[i] >= 0x40) && (Bytes[i] <= 0x4F))
        {
            Rex = Bytes[i];
            ++i;
        }
        else Rex = 0;

        if (Bytes[i] == 0x0F)
        {
            Prefix0F = Bytes[i];
            ++i;
        }
        else Prefix0F = 0;

        Opcode = Bytes[i];
        ++i;

        Modrm = Bytes[i];
        ++i;

        Offset = *((INT32*)&Bytes[i]);
        i += sizeof(Offset);

        pi = Size - i;
        i += pi;

        TrampSize = REGISTER_OVERRIDE_SIZE + (i - sizeof(Offset)) + REGISTER_RESTORE_SIZE;
        if ((Ptr + TrampSize + INVALID_OPCODE_SIZE) > (Ptr + BUFFER_SIZE))
        {
            fprintf(g_Entry[index].File, "buffer overflow\n");
            TerminateProcess(GetCurrentProcess(), 0);
        }

        Address = (DWORD64)(Bytes + Size + Offset);

        Reg = (Modrm & 0x38) >> 3;

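        // Pick a scratch register (AddrReg) that will hold the absolute address.
        // It must differ from Reg, and the mapping below never yields
        // 0 (rax), 4 (rsp, which would require a SIB byte) or
        // 5 (rbp, which with mod == 00 means disp32 / RIP-relative).
        //
        // Reg -> AddrReg:
        // 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 2,
        // 4 -> 3, 5 -> 6, 6 -> 7, 7 -> 6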

        AddrReg = ((Reg == 7) || (Reg == 3) || (Reg == 4)) ? (Reg - 1) : (Reg + 1);

        GenerateRegisterOverride(AddrReg, Address, &Ptr[j]);
        j += REGISTER_OVERRIDE_SIZE;

        if (Lock)
        {
            Ptr[j] = Lock;
            ++j;
        }

        if (Prefix)
        {
            Ptr[j] = Prefix;
            ++j;
        }

        if (Rex)
        {
            Ptr[j] = Rex;
            ++j;
        }

        if (Prefix0F)
        {
            Ptr[j] = Prefix0F;
            ++j;
        }

        Ptr[j] = Opcode;
        ++j;

        Ptr[j] = AddrReg | (Reg << 3);
        ++j;

        memcpy(&Ptr[j], &Bytes[i - pi], pi);
        j += pi;

        GenerateRegisterRestore(AddrReg, &Ptr[j]);
        j += REGISTER_RESTORE_SIZE;
    }
    else
    {
        TrampSize = Size;
        if ((Ptr + TrampSize + INVALID_OPCODE_SIZE) > (Ptr + BUFFER_SIZE))
        {
            fprintf(g_Entry[index].File, "buffer overflow\n");
            TerminateProcess(GetCurrentProcess(), 0);
        }

        memcpy(&Ptr[j], &Bytes[i], Size);
        j += Size;
    }

    GenerateInvalidOpcode(&Ptr[j]);
    *pTrampSize = TrampSize;
}
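
To make the rewrite concrete, the sketch below reproduces the byte-level steps for the simplest case, an instruction without prefixes such as mov eax, [rip+0x10] (8B 05 10 00 00 00). It is a portable illustration with fixed-width types; ScratchRegister, RipTarget, EncodeOverride and RewriteModrm are hypothetical helper names, although their logic mirrors the functions above.

```cpp
#include <cstdint>
#include <cstring>

// Picks a base register different from the ModRM.reg operand.
// The result is never 0, 4 or 5, matching the table in GenerateTrampoline.
inline std::uint32_t ScratchRegister(std::uint32_t Reg)
{
    return (Reg == 7 || Reg == 3 || Reg == 4) ? Reg - 1 : Reg + 1;
}

// Absolute address referenced by a RIP-relative operand: the displacement
// is relative to the address of the *next* instruction.
inline std::uint64_t RipTarget(std::uint64_t InstAddress, std::uint32_t InstSize,
                               std::int32_t Disp32)
{
    return InstAddress + InstSize + static_cast<std::int64_t>(Disp32);
}

// Emits: push r64; push imm32 (low dword, sign-extended by the CPU);
//        mov dword ptr [rsp+4], imm32 (high dword); pop r64.
// Register must be 0..7 and Out must have room for 17 bytes.
inline void EncodeOverride(std::uint32_t Register, std::uint64_t Value, std::uint8_t *Out)
{
    static const std::uint8_t Template[17] = {
        0x48, 0x50,
        0x68, 0, 0, 0, 0,
        0xC7, 0x44, 0x24, 0x04, 0, 0, 0, 0,
        0x48, 0x58 };
    std::memcpy(Out, Template, sizeof(Template));

    const std::uint32_t Low  = static_cast<std::uint32_t>(Value);
    const std::uint32_t High = static_cast<std::uint32_t>(Value >> 32);
    Out[1] = static_cast<std::uint8_t>(Out[1] + Register);
    std::memcpy(Out + 3, &Low, sizeof(Low));     // assumes a little-endian host, as on x64
    std::memcpy(Out + 11, &High, sizeof(High));
    Out[16] = static_cast<std::uint8_t>(Out[16] + Register);
}

// Rewrites ModRM from [rip+disp32] (mod=00, rm=101) to [scratch] (mod=00, rm=scratch),
// keeping the reg field unchanged.
inline std::uint8_t RewriteModrm(std::uint8_t Modrm)
{
    const std::uint32_t Reg = (Modrm & 0x38u) >> 3;
    return static_cast<std::uint8_t>(ScratchRegister(Reg) | (Reg << 3));
}
```

For the example instruction, ModRM is 0x05, so the operand register is eax (reg = 0), the scratch register is rcx (1), and the rewritten instruction is 8B 01, i.e. mov eax, [rcx], executed between the override and restore sequences. If the instruction sits at 0x1000, the value loaded into rcx is 0x1000 + 6 + 0x10 = 0x1016.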

Auxiliary functions to update, suspend and resume threads:

C++
DWORD g_ThreadId[100];
DWORD g_ThreadIdCount;

HANDLE g_ThreadHandle[100];
DWORD g_ThreadHandleCount;

void UpdateThreads()
{
    HANDLE hThreadSnap;
    THREADENTRY32 te32;
    hThreadSnap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    te32.dwSize = sizeof(THREADENTRY32);
    Thread32First(hThreadSnap, &te32);
    g_ThreadIdCount = 0;
    do
    {
        if ((te32.th32OwnerProcessID == GetCurrentProcessId()) && 
            (te32.th32ThreadID != GetCurrentThreadId()))
        {
            if (g_ThreadIdCount == ARRAYSIZE(g_ThreadId))
            {
                fprintf(g_File, "Array for thread ids is too small\n");
                TerminateProcess(GetCurrentProcess(), 0);
            }
            g_ThreadId[g_ThreadIdCount] = te32.th32ThreadID;
            ++g_ThreadIdCount;
        }
    } while (Thread32Next(hThreadSnap, &te32));
    CloseHandle(hThreadSnap);

    fprintf(g_File, "thread count updated: %d\n\n", g_ThreadIdCount);
    fflush(g_File);
}

void SuspendThreads()
{
    g_ThreadHandleCount = 0;
    for (int i = 0; i < g_ThreadIdCount; ++i)
    {
        if (g_ThreadId[i] != GetCurrentThreadId())
        {
            HANDLE hThread = OpenThread(THREAD_ALL_ACCESS, FALSE, g_ThreadId[i]);
            if (!hThread) continue;     // thread may have exited since the last update
            g_ThreadHandle[g_ThreadHandleCount] = hThread;
            SuspendThread(hThread);
            ++g_ThreadHandleCount;
        }
    }
    if (g_ThreadHandleCount) Sleep(THREAD_DELAY);  // wait, SuspendThread is asynchronous
}

void ResumeThreads()
{
    for (int i = 0; i < g_ThreadHandleCount; ++i)
    {
        ResumeThread(g_ThreadHandle[i]);
        CloseHandle(g_ThreadHandle[i]);
    }
    if (g_ThreadHandleCount) Sleep(THREAD_DELAY);   // wait, ResumeThread is asynchronous
}

Spinlock is implemented in ASM:

PUBLIC spin_lock
PUBLIC spin_unlock

.data
    locked dd 0

.code

spin_lock PROC
    mov eax, 1
    xchg eax, [locked]
    test eax, eax
    jnz spin_lock
    ret
spin_lock ENDP

spin_unlock PROC
    xor eax, eax
    xchg eax, [locked]
    ret
spin_unlock ENDP

END

And declared for use from C++:

C++
extern "C"
{
    void spin_lock();
    void spin_unlock();
}
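
For comparison only, the same exchange-based lock can be written in portable C++ with std::atomic. This is a sketch assuming a C++11 compiler; the SpinLock class and its method names are hypothetical and are not what the DLL uses.

```cpp
#include <atomic>

class SpinLock
{
public:
    // Spins until the previous value was 0, i.e. until this caller is the
    // one that changed the flag from 0 to 1 (same idea as xchg in spin_lock).
    void lock()
    {
        while (m_Locked.exchange(1, std::memory_order_acquire) != 0)
        {
            // busy-wait, like the jnz loop in the MASM version
        }
    }

    // Single attempt: returns true if the lock was free and is now held.
    bool try_lock()
    {
        return m_Locked.exchange(1, std::memory_order_acquire) == 0;
    }

    // Releases the lock by storing 0 (same idea as spin_unlock).
    void unlock()
    {
        m_Locked.store(0, std::memory_order_release);
    }

private:
    std::atomic<int> m_Locked{0};
};
```

Like the assembly version, this lock is not recursive and does not record an owner, so it may be acquired in one handler invocation and released in a later one.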

Global variables to hold information about memory ranges, addresses, etc.

C++
MONITOR_ENTRY g_Entry[100];
DWORD g_EntryCount;

FILE *g_File;
DWORD g_index;
PVOID g_Handler;
UCHAR *g_NextInstructionAddress;
UCHAR *g_InvalidOpcodeAddress;
UCHAR *g_DataAddress;
UCHAR *g_Buffer;
DWORD g_Access;
DWORD g_TicksBegin;
BOOL g_Stopped;
BOOL g_Update;

Exported function to start monitor. Memory ranges are constructed from array of strings that hold module names:

C++
__declspec(dllexport) void StartMonitor()
{
    DWORD OldProtect;
    IMAGE_NT_HEADERS64 *Headers;

    read_spec(L"data.bin");

    char* Modules[] = { "{this}" };
    char Buffer[MAX_PATH];
    char *ModuleName;
    int i;

    for (i = 0; (i < ARRAYSIZE(Modules)) && (i < ARRAYSIZE(g_Entry)); ++i)
    {
        if (!strcmp(Modules[i], "{this}")) ModuleName = NULL;
        else ModuleName = Modules[i];

        g_Entry[i].Start = (UCHAR*)GetModuleHandleA(ModuleName);

        if (!ModuleName)
        {
            GetModuleFileNameA((HMODULE)g_Entry[i].Start, Buffer, sizeof(Buffer));
            ModuleName = Buffer + strlen(Buffer) - 1;
            while (*ModuleName != '\\') --ModuleName;
            ++ModuleName;
        }
        else
        {
            strcpy(Buffer, ModuleName);
            ModuleName = Buffer;
        }

        strcat(ModuleName, ".txt");
        g_Entry[i].File = fopen(ModuleName, "w");
        if (!g_Entry[i].File) TerminateProcess(GetCurrentProcess(), 0);

        Headers = (IMAGE_NT_HEADERS64*)((UCHAR*)g_Entry[i].Start + 
                  ((IMAGE_DOS_HEADER*)g_Entry[i].Start)->e_lfanew);
        g_Entry[i].Size = Headers->OptionalHeader.SizeOfImage;
        if (!VirtualProtect(g_Entry[i].Start, g_Entry[i].Size, PAGE_EXECUTE, &OldProtect))
        {
            fprintf(g_Entry[i].File, "VirtualProtect\n");
            TerminateProcess(GetCurrentProcess(), 0);
        }
        g_Entry[i].Counter = 0;
    }
    g_EntryCount = i;
    g_Stopped = FALSE;

    g_File = fopen("default.txt", "w");
    if (!g_File) TerminateProcess(GetCurrentProcess(), 0);

    fprintf(g_File, "StartMonitor : %d\n\n", GetCurrentThreadId());
    fflush(g_File);

    g_Buffer = (UCHAR*)VirtualAlloc(NULL, BUFFER_SIZE, MEM_RESERVE | 
                MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (!g_Buffer)
    {
        fprintf(g_File, "VirtualAlloc\n");
        TerminateProcess(GetCurrentProcess(), 0);
    }

    g_TicksBegin = GetTickCount();
    g_Handler = AddVectoredExceptionHandler(TRUE, Handler);
    if (!g_Handler)
    {
        fprintf(g_File, "AddVectoredExceptionHandler\n");
        TerminateProcess(GetCurrentProcess(), 0);
    }
}

Exported function to stop monitor. Actions that we perform:

  1. Acquire lock
  2. Suspend all other threads (because we can't change protection on the fly, in case some thread executes code inside our region)
  3. Change protection to PAGE_EXECUTE_READWRITE
  4. Resume all other threads
  5. Release lock
  6. Remove our exception handler and clean up

C++
__declspec(dllexport) void StopMonitor()
{
    spin_lock();
    UpdateThreads();
    SuspendThreads();
    DWORD OldProtect;
    for (int i = 0; i < g_EntryCount; ++i)
    {
        if (!VirtualProtect(g_Entry[i].Start, g_Entry[i].Size, 
            PAGE_EXECUTE_READWRITE, &OldProtect))
        {
            fprintf(g_File, "VirtualProtect\n");
            TerminateProcess(GetCurrentProcess(), 0);
        }
    }
    g_Stopped = TRUE;
    ResumeThreads();
    spin_unlock();
    RemoveVectoredExceptionHandler(g_Handler);
    Sleep(THREAD_DELAY * 5);            // wait, windows is asynchronous :-)
    for (int i = 0; i < g_EntryCount; ++i)
    {
        fclose(g_Entry[i].File);
    }
    free_spec();
    VirtualFree(g_Buffer, 0, MEM_RELEASE);
    fprintf(g_File, "StopMonitor : %d, %d\n\n", GetCurrentThreadId(), 
            GetTickCount() - g_TicksBegin);
    fclose(g_File);
}

And the handler itself. Note that the fprintf calls can be replaced by functions that write to a buffer that is flushed to a file on disk when it is full. Also, we handle MSVC_EXCEPTION (the exception used to set a thread name, as the THREADNAME_INFO cast shows) just for fun; it serves no purpose in our memory monitor.

C++
LONG NTAPI Handler(EXCEPTION_POINTERS *ExceptionInfo)
{
    Buffer code_buf;
    Instruction inst;
    UCHAR *InstAddress, *DataAddress;
    DWORD InstSize, TrampSize, ExcCode, OldProtect, i, Access;
    
    ExcCode = ExceptionInfo->ExceptionRecord->ExceptionCode;
    if (ExcCode == EXCEPTION_ACCESS_VIOLATION)
    {
        InstAddress = (UCHAR*)ExceptionInfo->ContextRecord->Rip;
        Access = ExceptionInfo->ExceptionRecord->ExceptionInformation[0];
        DataAddress = (UCHAR*)ExceptionInfo->ExceptionRecord->ExceptionInformation[1];

        for (i = 0; i < g_EntryCount; ++i)
        {
            if ((DataAddress >= (UCHAR*)g_Entry[i].Start) && 
               (DataAddress < ((UCHAR*)g_Entry[i].Start + g_Entry[i].Size)))
            {
                spin_lock();
                if (g_Stopped)
                {
                    spin_unlock();
                    return EXCEPTION_CONTINUE_EXECUTION;
                }

                if (Access == 0) fprintf(g_Entry[i].File, "Access: READ\n");
                else if (Access == 1) fprintf(g_Entry[i].File, "Access: WRITE\n");
                else
                {
                    fprintf(g_Entry[i].File, "Access: EXECUTE\n");
                    TerminateProcess(GetCurrentProcess(), 0);
                }

                fprintf(g_Entry[i].File, "Counter: %d\n", g_Entry[i].Counter);
                ++(g_Entry[i].Counter);
                fprintf(g_Entry[i].File, "Thread Id: %d\n", GetCurrentThreadId());
                fprintf(g_Entry[i].File, "Instruction Address: %p\n", InstAddress);
                fprintf(g_Entry[i].File, "Data Address: %p\n", DataAddress);
                if (g_Update)
                {
                    UpdateThreads();
                    g_Update = FALSE;
                }
                SuspendThreads();
                if (!VirtualProtect(g_Entry[i].Start, g_Entry[i].Size, 
                    PAGE_READWRITE, &OldProtect))
                {
                    fprintf(g_Entry[i].File, "VirtualProtect\n");
                    TerminateProcess(GetCurrentProcess(), 0);
                }

                if (Access == 1) fprintf(g_Entry[i].File, "Data Before: ");
                else fprintf(g_Entry[i].File, "Data: ");
                for (int j = 0; j < VAR_SIZE; ++j)
                {
                    fprintf(g_Entry[i].File, "%02hhX ", DataAddress[j]);
                }
                fprintf(g_Entry[i].File, "\n");
                fflush(g_Entry[i].File);

                c_MakeBuffer(InstAddress, 100, (Encoding)0, &code_buf);
                inst_set_params(&inst, MODE_64, C_TRUE, &code_buf, NULL, 
                                SHOW_ADDRESS | SHOW_LOWER | SHOW_PSEUDO);
                if (!decode(&inst))
                {
                    fprintf(g_Entry[i].File, "decode\n");
                    TerminateProcess(GetCurrentProcess(), 0);
                }
                InstSize = code_buf.i;
                GenerateTrampoline(g_Buffer, InstAddress, InstSize, inst.rip, i, &TrampSize);
                GenerateInvalidOpcode(g_Buffer + TrampSize);
                ExceptionInfo->ContextRecord->Rip = (DWORD64)g_Buffer;
                g_NextInstructionAddress = InstAddress + InstSize;
                g_InvalidOpcodeAddress = g_Buffer + TrampSize;
                g_DataAddress = DataAddress;
                g_Access = Access;
                g_index = i;

                return EXCEPTION_CONTINUE_EXECUTION;
            }
        }
    }
    else if (ExcCode == EXCEPTION_ILLEGAL_INSTRUCTION)
    {
        if (ExceptionInfo->ContextRecord->Rip == (DWORD64)g_InvalidOpcodeAddress)
        {
            i = g_index;
            DataAddress = g_DataAddress;
            Access = g_Access;

            if (Access == 1)
            {
                fprintf(g_Entry[i].File, "Data After: ");
                for (int j = 0; j < VAR_SIZE; ++j)
                {
                    fprintf(g_Entry[i].File, "%02hhX ", DataAddress[j]);
                }
                fprintf(g_Entry[i].File, "\n");
            }

            fprintf(g_Entry[i].File, "\n");
            fflush(g_Entry[i].File);

            if (!VirtualProtect(g_Entry[i].Start, g_Entry[i].Size, PAGE_EXECUTE, &OldProtect))
            {
                fprintf(g_Entry[i].File, "VirtualProtect\n");
                TerminateProcess(GetCurrentProcess(), 0);
            }

            ExceptionInfo->ContextRecord->Rip = (DWORD64)g_NextInstructionAddress;
            ResumeThreads();
            spin_unlock();

            return EXCEPTION_CONTINUE_EXECUTION;
        }
    }
    else if (ExcCode == MSVC_EXCEPTION)
    {
        THREADNAME_INFO *info = 
          (THREADNAME_INFO*)ExceptionInfo->ExceptionRecord->ExceptionInformation;

        fprintf(g_File, "Thread Exception: %x %d %p\n", 
          ExcCode, GetCurrentThreadId(), ExceptionInfo->ContextRecord->Rip);
        if (info->szName) fprintf(g_File, "Name: %s\n", info->szName);
        fprintf(g_File, "Id: %d\n\n", info->dwThreadID);
        fflush(g_File);

        return EXCEPTION_CONTINUE_SEARCH;
    }
    fprintf(g_File, "Skip Exception: %x %d %p\n\n", ExcCode, 
            GetCurrentThreadId(), ExceptionInfo->ContextRecord->Rip);
    fflush(g_File);
    return EXCEPTION_CONTINUE_SEARCH;
}

As you can see, we have one default file plus one file for each memory region. The contents of the default file might look like:

Image 1

And the contents of the file for a memory range might look like:

Image 2
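
The earlier remark that the fprintf calls could be replaced by a buffered writer can be sketched as follows. This is a minimal, single-threaded illustration (the handler already serialises access with the spin lock); LogBuffer, its 4096-byte capacity and its method names are assumptions, not part of the article's source.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

class LogBuffer
{
public:
    explicit LogBuffer(std::FILE *File) : m_File(File), m_Used(0) {}

    // Appends Text to the buffer; flushes first when the text would not fit.
    void Append(const char *Text)
    {
        const std::size_t Length = std::strlen(Text);
        if (Length > sizeof(m_Data) - m_Used) Flush();
        if (Length > sizeof(m_Data))
        {
            std::fwrite(Text, 1, Length, m_File);   // too large to buffer at all
            return;
        }
        std::memcpy(m_Data + m_Used, Text, Length);
        m_Used += Length;
    }

    // Writes the buffered bytes to the file and empties the buffer.
    void Flush()
    {
        if (m_Used) std::fwrite(m_Data, 1, m_Used, m_File);
        std::fflush(m_File);
        m_Used = 0;
    }

    // Number of bytes currently waiting in the buffer.
    std::size_t Used() const { return m_Used; }

private:
    std::FILE  *m_File;
    std::size_t m_Used;
    char        m_Data[4096];
};
```

With such a helper, one instance per MONITOR_ENTRY would replace the File member, and StopMonitor would call Flush before closing the file; both are design suggestions rather than something the original code does.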

To start the monitor, we pass the following commands to our tool:

inject Monitor.dll
add kernel32.dll export AddVectoredExceptionHandler
addh Handlers.dll : AddVectoredExceptionHandlerHandler to kernel32.dll : 
                    AddVectoredExceptionHandler
wait async

To stop the monitor, we use:

eject-stop
remove kernel32.dll : AddVectoredExceptionHandler

Basically, that's it! Thank you for reading.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)