Tracing and Logging Technologies on Windows. Part 3 - User Mode Handles in the Kernel

Maxim Kartavenkov

5 Jul 2023 | CPOL | 24 min read

This part continues the discussion of simple tracing mechanisms for kernel drivers: the driver writes its output into file, pipe and console handles passed to it from the host application.

In the first article, we discussed redirecting the console output of an application into files or pipes. In this article, we show how to pass those handles to drivers, and we uncover the pitfalls and possible issues along the way. The related examples are implemented in C++ and C#.

Introduction
Communication with the driver
Basic implementation
Handles and platform issue
Closing handle and tracking host process exits
Pipe handles
Console handles
Console output on user mode
Console output implementation in the kernel
Identify handle type
Process space issue with console implementation
Switching into host process space
Creating host process space thread in the kernel
Application lockdown issue with user space thread
Track starting and stopping process threads
Getting number of threads for the process
Code samples

Introduction

From the previous article, we already know that a driver can use handles in the kernel the same way as in user mode, by opening shared objects by their names. Those were named objects which were opened from user space and from kernel space separately. Opening shared objects with the global name prefix does not always work, because you have to think about possible security issues between different users: for example, drivers run under the local system account, Windows services can run under the local service account, and particular applications can run as guests. Moreover, since Windows Vista, creating mapped sections in the global namespace requires the SeCreateGlobalPrivilege privilege to be enabled.

What if we pass the handle of a file object to the driver and the driver writes its output into it? Using user-space handles in the kernel has some pitfalls, and you should be aware of the possible issues when you plan to do so. The issues are described here; by handling them properly, you can make your application and driver work correctly.

Communication with the Driver

To start, we should design the communication with the driver, as we are planning to pass it a handle we created ourselves. For small amounts of input or output data, a good way to do that is the Device Input and Output Control (IOCTL) mechanism. We need the ability to enable and disable output into our user-space handle. This could be split into separate device IOCTL messages, but I decided to put it into a single call with the following structure as input.

C++

#include <pshpack1.h>
typedef struct _APP_HANDLE_INFO {
    // Application Handle
    HANDLE    Handle;
    // Enable Or Disable
    BOOLEAN Enabled;
}APP_HANDLE_INFO,*PAPP_HANDLE_INFO;
#include <poppack.h>

The same structure in C#:

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct APP_HANDLE_INFO
{
    // Application Handle
    [MarshalAs(UnmanagedType.SysInt)]
    public IntPtr Handle;
    // Enable Or Disable
    [MarshalAs(UnmanagedType.U1)]
    public bool Enabled;
};

The structure holds the handle to our object and a boolean variable which enables or disables this handle as the output target. Later, I will explain why we use a structure here. The structure is passed with the DeviceIoControl function, which allows communication with the driver. We also prepare the control code which our driver checks in its dispatch routine.

C++

#define IOCTL_DRIVER_CONFIGURE_HANDLE_OUTPUT        \
        CTL_CODE( FILE_DEVICE_UNKNOWN, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS )
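
To see what the macro actually produces, here is a small portable sketch that reproduces the CTL_CODE bit layout without the Windows headers. The helper name MakeCtlCode and the k-prefixed constants are made up for this illustration; the numeric values are the ones the SDK headers define for FILE_DEVICE_UNKNOWN, METHOD_BUFFERED and FILE_ANY_ACCESS.

```cpp
#include <cstdint>

// Values as defined in the Windows SDK headers.
constexpr std::uint32_t kFileDeviceUnknown = 0x00000022; // FILE_DEVICE_UNKNOWN
constexpr std::uint32_t kMethodBuffered    = 0;          // METHOD_BUFFERED
constexpr std::uint32_t kFileAnyAccess     = 0;          // FILE_ANY_ACCESS

// Same bit layout as the CTL_CODE macro:
// DeviceType[31:16] | Access[15:14] | Function[13:2] | Method[1:0]
// (MakeCtlCode is a hypothetical helper name used only in this sketch.)
constexpr std::uint32_t MakeCtlCode(std::uint32_t deviceType,
                                    std::uint32_t function,
                                    std::uint32_t method,
                                    std::uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

// Mirrors IOCTL_DRIVER_CONFIGURE_HANDLE_OUTPUT from the article.
constexpr std::uint32_t kIoctlConfigureHandleOutput =
    MakeCtlCode(kFileDeviceUnknown, 0x801, kMethodBuffered, kFileAnyAccess);

// The driver and the application must agree on this exact value.
static_assert(kIoctlConfigureHandleOutput == 0x00222004,
              "unexpected control code value");
```

Function codes from 0x800 upward are the range reserved for vendor-defined requests, which is why the article uses 0x801.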

To handle the IOCTL in the driver, we should prepare a dispatch routine for IRP_MJ_DEVICE_CONTROL, which receives our call. In that routine, we check the control code, and if it equals IOCTL_DRIVER_CONFIGURE_HANDLE_OUTPUT, we process the input structure.

C++

EXTERN_C NTSTATUS DriverDispatchDeviceControl(IN PDEVICE_OBJECT pDO, IN PIRP Irp)
{
    PAGED_CODE();
    UNREFERENCED_PARAMETER(pDO);

    NTSTATUS Status = STATUS_SUCCESS;
    PIO_STACK_LOCATION Stack = IoGetCurrentIrpStackLocation(Irp);
    ULONG ControlCode = Stack->Parameters.DeviceIoControl.IoControlCode;

    Irp->IoStatus.Information =
        Stack->Parameters.DeviceIoControl.OutputBufferLength;

    switch (ControlCode) {
    case IOCTL_DRIVER_CONFIGURE_HANDLE_OUTPUT:
    {
        DbgPrint("%S: IOCTL_DRIVER_CONFIGURE_HANDLE_OUTPUT \n", DRIVER_NAME);
        Irp->IoStatus.Status = STATUS_SUCCESS;
        Irp->IoStatus.Information = 0;

        //... Process control message

        break;
    }
    default:
        Irp->IoStatus.Status = STATUS_INVALID_PARAMETER;
        Irp->IoStatus.Information = 0;
        break;
    }
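    // Read the final status before completing: the IRP must not be touched afterwards
    Status = Irp->IoStatus.Status;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return Status;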
}

Basic Implementation

The simplest handle which we can pass to our driver is a file handle. We open the file in our console application, and the driver writes information to it.

C++

APP_HANDLE_INFO info = {0};
info.Enabled = TRUE;
info.Handle = CreateFile(_T("d:\\mylog.txt"), (GENERIC_READ | GENERIC_WRITE), 
    FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwBytesReturned = 0;
// Enable Shared Handle
if (DeviceIoControl(hDevice,IOCTL_DRIVER_CONFIGURE_HANDLE_OUTPUT,
    &info, sizeof(info), NULL,0, &dwBytesReturned,NULL) == 0) {
    _tprintf(_T("DeviceIOControl Failed %d\n"),GetLastError());
}

In the code above, we pass the file handle to the driver with the described method. The same implementation in C# looks like this:

var file = new FileStream(@"d:\mylog.txt", FileMode.Create, 
                          FileAccess.ReadWrite, FileShare.Read);

APP_HANDLE_INFO info = new APP_HANDLE_INFO();
info.Handle = file.SafeFileHandle.DangerousGetHandle();
info.Enabled = true;

int Size = Marshal.SizeOf(info);
IntPtr ptr = Marshal.AllocCoTaskMem(Size);
Marshal.StructureToPtr(info, ptr, false);

// Enable Shared Handle
if (!DeviceIoControl(hDevice, IOCTL_DRIVER_CONFIGURE_HANDLE_OUTPUT,
    ptr, Size, IntPtr.Zero, 0, out BytesReturned, IntPtr.Zero))
{
    Console.WriteLine("DeviceIOControl Failed {0}", DrvCommon.GetLastError());
}

In the driver, we are unable to use that handle directly; we must reopen it for kernel usage. First, when we receive a handle from user mode, we need to reference the underlying object in the kernel. For that, we use the ObReferenceObjectByHandle kernel API. Its purpose is loosely similar to that of the user mode DuplicateHandle API.

C++

// Create Object Reference
Irp->IoStatus.Status = 
    ObReferenceObjectByHandle(
        target->Handle, 0, 
        *IoFileObjectType, UserMode, (PVOID *)&s_pUserObject, NULL);

When we reference the user object in the kernel, its internal reference counter is increased and the reference is held from kernel mode, so we do not have to worry that the driver crashes because the handle becomes invalid when the user mode application exits unexpectedly. To decrease the reference count, use the ObDereferenceObject API. After referencing the object, we need to open another handle to it for use in the kernel. That is done with the ObOpenObjectByPointer API.

C++

// Open Kernel Handle
Irp->IoStatus.Status = 
    ObOpenObjectByPointer(
        s_pUserObject,OBJ_KERNEL_HANDLE,
        NULL,GENERIC_WRITE,*IoFileObjectType,KernelMode,
        &s_hOutputKernelHandle
    );

Now we can use the handle variable s_hOutputKernelHandle in kernel mode. Such handles must be closed with the ZwClose function after use.

We should prepare a routine which closes all user handle objects. It is called once we receive from the application an IOCTL with the Enabled field of the APP_HANDLE_INFO structure equal to FALSE.

C++

EXTERN_C VOID CloseUserOutputHandle() {
    PAGED_CODE();
    if (s_hOutputKernelHandle) {
        ZwClose(s_hOutputKernelHandle);
    }
    if (s_pUserObject) {
        ObDereferenceObject(s_pUserObject);
    }
    s_hOutputKernelHandle = NULL;
    s_pUserObject = NULL;
    s_hOutputUserHandle = NULL;
}

To write data, we are going to use the ZwWriteFile API with the handle which we opened previously.

C++

IO_STATUS_BLOCK iosb = { 0 };
CHAR text[] = "Hello From Driver :)\n";
Status = ZwWriteFile(s_hOutputKernelHandle,
    NULL, NULL, NULL, &iosb, (PVOID)text, (ULONG)strlen(text), NULL, NULL);

For testing purposes, we call that in the IRP_MJ_DEVICE_CONTROL dispatch routine right after setting the user handle, and check the result. When we start the application, the file mylog.txt appears on drive D:, and once we call DeviceIoControl, that file contains the text written by the driver.

Handles and Platform Issue

The APP_HANDLE_INFO structure, as defined for the IOCTL in our driver, is fine, but consider the following situation: we have an x64 operating system, so in the kernel we install the x64 driver, in which HANDLE is 64 bits wide. At the same time, user mode applications which call the driver can be built as x86 or as x64, so the HANDLE in each of those applications has a different size. As a quiz, in the picture below, find which process runs as an x86 build and which as x64.

Note: If you try the sample driver, keep in mind that the test application looks for the compiled driver in its target folder, which is x86 or x64 depending on the compilation platform. So, to try an x86 test application on the 64-bit Windows platform, you should place the x64 driver into the application's folder, or install the driver with the x64 application with the uninstallation code line commented out. The driver test application has some command line arguments, so you can use them to check the implementation notes.

To demonstrate this issue, I decided to use a structure as the input to the driver. There are two ways to handle it. The first is pretty simple: make the Handle field of the structure a fixed-size type instead of HANDLE, whose size varies with the application platform. The structure then becomes:

C++

#include <pshpack1.h>
typedef struct _APP_HANDLE_INFO {
    // Application Handle
    PVOID64    Handle;
    // Enable Or Disable
    BOOLEAN Enabled;
}APP_HANDLE_INFO,*PAPP_HANDLE_INFO;
#include <poppack.h>

And the same structure in C#:

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct APP_HANDLE_INFO
{
    // Application Handle
    [MarshalAs(UnmanagedType.U8)]
    public long Handle;
    // Enable Or Disable
    [MarshalAs(UnmanagedType.U1)]
    public bool Enabled;
};

Another way is not so simple but more graceful. The structure stays as it is for the application, and the changes are made on the driver side. Since the input may be one of two different structures, for the x64 and x86 platforms, and our driver has an x64 target, we need to convert the different inputs into something common for the driver, which is the structure for the x64 platform. The structure which an x86 application delivers to the driver looks like this:

C++

#include <pshpack1.h>
typedef struct _APP_HANDLE_INFO32 {
    // Input Handle
    ULONG32    Handle;
    // Enable Or Disable
    BOOLEAN Enabled;
}APP_HANDLE_INFO32,*PAPP_HANDLE_INFO32;
#include <poppack.h>

We can detect an x86 process with the IoIs32bitProcess function. If the application is 32-bit, we receive the APP_HANDLE_INFO32 structure, convert its fields into the x64 structure, and use the latter for processing. The code of that algorithm:

C++

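PAPP_HANDLE_INFO target = (PAPP_HANDLE_INFO)Irp->AssociatedIrp.SystemBuffer;
// Input buffer length (assumed definition: the original snippet uses cch without declaring it)
ULONG cch = IoGetCurrentIrpStackLocation(Irp)->Parameters.DeviceIoControl.InputBufferLength;
APP_HANDLE_INFO _info = {0};
// This is only needed for the x64 build
#if defined(_WIN64)
// If the host process is a 32-bit application
if (IoIs32bitProcess(Irp) && cch >= sizeof(APP_HANDLE_INFO32)) {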
    PAPP_HANDLE_INFO32 _info32 = (PAPP_HANDLE_INFO32)target;
    // Fill X64 structure
    target = &_info;
    target->Handle = Handle32ToHandle((const void * __ptr32)_info32->Handle);
    target->Enabled = _info32->Enabled;
    cch = sizeof(_info);
}
#endif

We compile that code block only when the driver is built for the x64 platform, because in an x86 driver build the APP_HANDLE_INFO structure already has the same layout as APP_HANDLE_INFO32.
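
The size difference between the two layouts, and the conversion itself, can be checked outside of the driver. The following is only a portable sketch: the type names AppHandleInfo32 and AppHandleInfo64 and the helper NormalizeHandleInfo are hypothetical stand-ins for the article's structures, and the callerIs32Bit flag plays the role of the IoIs32bitProcess result. The 32-bit handle value is sign-extended, which is what Handle32ToHandle does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
// Layout sent by a 32-bit caller (mirrors APP_HANDLE_INFO32).
struct AppHandleInfo32 {
    std::uint32_t Handle;
    std::uint8_t  Enabled;
};
// Layout used by a 64-bit driver (mirrors the x64 APP_HANDLE_INFO).
struct AppHandleInfo64 {
    std::uint64_t Handle;
    std::uint8_t  Enabled;
};
#pragma pack(pop)

// With 1-byte packing the two layouts differ only by the handle width.
static_assert(sizeof(AppHandleInfo32) == 5, "32-bit layout is 4 + 1 bytes");
static_assert(sizeof(AppHandleInfo64) == 9, "64-bit layout is 8 + 1 bytes");

// Hypothetical helper: converts the caller's buffer into the 64-bit layout.
// Returns false when the buffer is too small for the caller's layout.
inline bool NormalizeHandleInfo(const void* input, std::size_t length,
                                bool callerIs32Bit, AppHandleInfo64& out) {
    assert(input != nullptr);
    if (callerIs32Bit) {
        if (length < sizeof(AppHandleInfo32)) {
            return false;
        }
        AppHandleInfo32 in32;
        std::memcpy(&in32, input, sizeof(in32));
        // Sign-extend the 32-bit handle value to 64 bits.
        out.Handle = static_cast<std::uint64_t>(
            static_cast<std::int64_t>(static_cast<std::int32_t>(in32.Handle)));
        out.Enabled = in32.Enabled;
        return true;
    }
    if (length < sizeof(AppHandleInfo64)) {
        return false;
    }
    std::memcpy(&out, input, sizeof(out));
    return true;
}
```

Checking the length against the caller's own layout matters here: a 5-byte buffer is valid input from a 32-bit process but must be rejected when it comes from a 64-bit one.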

Closing Handle and Tracking Host Process Exits

As mentioned, to close the handle output, we call the same IOCTL with the same structure but with the Enabled field set to FALSE. If the application is about to exit without calling the IOCTL to close our handle, then we should check in the IRP_MJ_CLOSE dispatch routine whether the driver handle is being closed by the process which passed the user handle, or by the system. We can do this in the following way: if the PID of the caller process is the same as the PID of the process which set the user handle, we call the closing routine and release the user handles.

C++

// Get current process PID
PEPROCESS process = PsGetCurrentProcess();
HANDLE pid = process ? PsGetProcessId(process) : 0;
BOOLEAN bClose = FALSE;
    
if (STATUS_SUCCESS ==
    KeWaitForSingleObject(&s_LockUserHandle, Executive, KernelMode, FALSE, NULL)) {
    // Compare the Id with IOCTL Id
    bClose = (pid != 0 && s_hUserPID == pid);
    KeReleaseMutex(&s_LockUserHandle, FALSE);
}
if (bClose) {
    // Close user handle
    CloseUserOutputHandle();
}

In our implementation during the registration of the user mode handle with IOCTL, we should also save the process PID.
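
The bookkeeping behind that idea can be modelled in plain user mode C++. This is only a sketch of the logic, not the driver code: the class OutputOwner and its methods are hypothetical, std::mutex stands in for the kernel mutex s_LockUserHandle, and the integer values stand in for s_hUserPID and the registered handle.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// User mode model of the driver's bookkeeping; all names are hypothetical.
class OutputOwner {
public:
    // IOCTL path with Enabled == TRUE: remember who registered the handle.
    void Register(std::uint64_t pid, std::uint64_t handle) {
        assert(pid != 0 && handle != 0);
        std::lock_guard<std::mutex> guard(lock_);
        pid_ = pid;
        handle_ = handle;
    }

    // IRP_MJ_CLOSE path or process exit callback: returns true only when
    // the given process is the registered owner, and forgets the owner so
    // that the caller releases the output handle exactly once.
    bool ReleaseIfOwner(std::uint64_t pid) {
        std::lock_guard<std::mutex> guard(lock_);
        if (pid == 0 || pid != pid_) {
            return false;
        }
        pid_ = 0;
        handle_ = 0;
        return true;
    }

    // True while some process has an output handle registered.
    bool HasOutput() const {
        std::lock_guard<std::mutex> guard(lock_);
        return handle_ != 0;
    }

private:
    mutable std::mutex lock_;
    std::uint64_t pid_ = 0;
    std::uint64_t handle_ = 0;
};
```

One design choice in this model differs slightly from the snippets in this article: the owner is cleared inside the lock, so two concurrent release attempts for the same PID cannot both decide to close the handle.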

The application may also exit unexpectedly: it may crash, a debugger may end the process, or someone may kill it from Task Manager. We must keep such situations in mind, as our handle in the driver then becomes invalid and we must close it. To handle that in our driver, we can use the PsSetCreateProcessNotifyRoutine function. Like DbgSetDebugPrintCallback, which was mentioned in the previous article, it has two arguments: the first is the callback function pointer, and the second is a boolean variable which enables or disables that callback.

C++

// Set Process Callback
PsSetCreateProcessNotifyRoutine(CreateProcessNotifyCallback,FALSE);

The value FALSE of the second argument means that the callback is enabled. In the driver unload routine, we should call this function again with the second argument set to TRUE. The callback function receives the parent process PID, the target process PID, and a flag which indicates whether the target process is starting or exiting. To see how that callback works, we can add a DbgPrint call to it and watch the output in DbgView.

C++

EXTERN_C VOID CreateProcessNotifyCallback(HANDLE ParentId, 
                                          HANDLE ProcessId, BOOLEAN Create) {
    PAGED_CODE();
    UNREFERENCED_PARAMETER(ParentId);
    DbgPrint("Process: %lld %s\n", ProcessId, Create ? "Started" : "Exits");
}

The result of the callback usage is shown in the next screenshot:

Since we save the process PID when the user mode handle is registered with the IOCTL, we can close the opened handle once that process exits.

C++

EXTERN_C VOID CreateProcessNotifyCallback
         (HANDLE ParentId,HANDLE ProcessId,BOOLEAN Create) {
    PAGED_CODE();
    UNREFERENCED_PARAMETER(ParentId);
    DbgPrint("Process: %lld %s\n",ProcessId, Create ? "Started" : "Exits");
    
    // We are only interested in process exits
    if (!Create) {
        BOOLEAN bClose = FALSE;
        if (STATUS_SUCCESS ==
            KeWaitForSingleObject(&s_LockUserHandle, Executive, 
                                  KernelMode, FALSE, NULL)) {
            // Detach our object if target process exits
            bClose = (ProcessId && s_hUserPID == ProcessId);
            KeReleaseMutex(&s_LockUserHandle, FALSE);
        }
        if (bClose) {
            CloseUserOutputHandle();
        }
    }
}

Pipe Handles

I agree that writing data from the driver into an application's file handles is not very useful. Another kind of handle which we can try for communication is the pipe. We create a pipe pair and pass the write end of the pipe to the driver. In the application, we wait for data on the read end, and when any data appears, we just write it into the console window. There is no need to change the driver code. All changes are made in the driver control application.

C++

HANDLE handle = NULL;
HANDLE hReadPipe = NULL;
DWORD dwBytesReturned = 0;

CreatePipe(&hReadPipe, &handle, NULL, 0);
APP_HANDLE_INFO info = {0};
info.Handle = handle;
info.Enabled = TRUE;

// Enable Shared Handle
if (DeviceIoControl(hDevice,IOCTL_DRIVER_CONFIGURE_HANDLE_OUTPUT,
    &info, sizeof(info), NULL,0, &dwBytesReturned,NULL) == 0) {
    _tprintf(_T("DeviceIOControl Failed %d\n"),GetLastError());
}

The C# implementation of that part follows.

var server = new AnonymousPipeServerStream(PipeDirection.Out,
                                HandleInheritability.Inheritable);
APP_HANDLE_INFO info = new APP_HANDLE_INFO();
info.Handle = server.SafePipeHandle.DangerousGetHandle();
info.Enabled = true;
int Size = Marshal.SizeOf(info);
IntPtr ptr = Marshal.AllocCoTaskMem(Size);
Marshal.StructureToPtr(info, ptr, false);

// Enable Shared Handle
DeviceIoControl(hDevice, IOCTL_DRIVER_CONFIGURE_HANDLE_OUTPUT,
    ptr, Size, IntPtr.Zero, 0, out BytesReturned, IntPtr.Zero);

Now receiving and displaying data:

C++

HANDLE hHandles[] = { g_hQuit, hReadPipe };
DWORD dwTimeOutput = 1000;
HANDLE hCurrentHandle = GetStdHandle(STD_ERROR_HANDLE);
bool bExit = false;
while (!bExit) {
    DWORD dwRead = 0;
    BYTE buf[1024] = { 0 };
    if (hReadPipe) {
        // Check for any information available on a pipe
        if (PeekNamedPipe(hReadPipe, buf, sizeof(buf), &dwRead, NULL, NULL) 
            && dwRead) {
            // Pull data From pipe
            if (!ReadFile(hReadPipe, buf, sizeof(buf), &dwRead, NULL) || 
                dwRead == 0) {
                break;
            }
        }
    }
    // If something was read, write it to stderr
    if (dwRead) {
        WriteFile(hCurrentHandle, buf, dwRead, &dwRead, NULL);
    }

    // Check quit event
    DWORD _result = WaitForMultipleObjects
                    (_countof(hHandles),hHandles,FALSE,dwTimeOutput);
        
    if (WAIT_OBJECT_0 == _result) {
        bExit = true;
    }

    // Check If Key Were Pressed
    if (_kbhit()) {
        bExit = true;
    }
}

We wait for the pipe handle or quit event to be signaled. And once data is received through the pipe, we output it into stderr.

.NET implementation looks different, as we must open the receiver pipe separately, with the string identifier from the server pipe.

var pipe = new AnonymousPipeClientStream(PipeDirection.In, 
                                        server.GetClientHandleAsString());
CancellationTokenSource cancel = new CancellationTokenSource();
Stream console = Console.OpenStandardOutput();
pipe.CopyToAsync(console, 4096, cancel.Token);
while (true)
{
    // Wait Until Quit
    if (g_evQuit.WaitOne(1000)) break;
    // Check If Key Were Pressed
    if (Console.KeyAvailable)
    {
        Console.ReadKey();
        break;
    }
}
cancel.Cancel();
cancel.Dispose();
console.Dispose();

For the output, we open the console stream and call the CopyToAsync method which performs all of the work. Once the quit event occurs, we cancel the async task and dispose of all the streams.
The result of execution is as below:

In the driver, we have the same callback functionality for receiving DbgPrint output, which is now passed directly to the application through the pipe handle. That way of using user mode handles is much better, right?

Console Handles

Okay, how about the console? We can also get a console handle from the GetStdHandle() function. As we know from the previous article, that handle can be used with the WriteFile API. So let's pass that handle to the driver and check the result. When you pass the console handle, you get the STATUS_NOT_SUPPORTED error from the following code in the driver.

C++

Irp->IoStatus.Status = 
    ObOpenObjectByPointer(
        s_pUserObject,OBJ_KERNEL_HANDLE,
        NULL,GENERIC_WRITE,*IoFileObjectType,KernelMode,
        &s_hOutputKernelHandle
    );

If we omit the GENERIC_WRITE access flag, the above code works fine. But then we get the STATUS_ACCESS_DENIED error when writing the output text.

C++

// Write text output 
Status = ZwWriteFile(s_hOutputKernelHandle,
    NULL, NULL, NULL, &iosb, (PVOID)text, (ULONG)strlen(text), NULL, NULL);

So we can’t open the handle for writing, because the write operation is not supported. Can we make it work, or is it not possible? To answer that question, we should understand what the console is and how we can work with it.

First, let’s check the object name of the console handle which we pass to our driver. We can do this by calling the NtQueryObject API with ObjectNameInformation as the ObjectInformationClass argument. That request returns a structure of type OBJECT_NAME_INFORMATION.

C++

typedef struct _OBJECT_NAME_INFORMATION {
    UNICODE_STRING Name;
} OBJECT_NAME_INFORMATION, *POBJECT_NAME_INFORMATION;

ULONG ObjectNameInformation = 1;

The structure contains a Unicode string and therefore has a variable size. That's why we should call the NtQueryObject API twice: the first time to retrieve the required size, and the second time to fill the allocated structure.

C++

HANDLE hHandle = GetStdHandle(STD_OUTPUT_HANDLE);
ULONG _size = 0;
NtQueryObject(hHandle, ObjectNameInformation, NULL, 0, &_size);
// We allocate more space for zero ending unicode string
POBJECT_NAME_INFORMATION text = (POBJECT_NAME_INFORMATION)malloc(_size + 2);
if (text) {
    memset(text, 0x00, _size + 2);
    NtQueryObject(hHandle, ObjectNameInformation, text, _size, &_size);
    if (text->Name.Length > 0 && text->Name.Buffer) {
        wprintf(L"Console Object Name: \"%s\"\n", text->Name.Buffer);
    }
    free(text);
}

That API is exported by ntdll.dll and is available in user mode; in the kernel, it is possible to use the ZwQueryObject or ObQueryNameString functions.
The result of code execution is as follows:

So the handle which we received is a handle to the console driver. In Process Explorer, we can see a couple of instances of ConDrv objects.

Those instances are used for communicating with the application as the stdin, stdout and stderr handles; another three are used to communicate with the broker, which is the middle level between the console host process and the application.

Console Output on User Mode

If we look at our test console application in Task Manager, we see two different processes: one is our application and the other is the console host window.

That child process is created during console creation, which happens automatically for a console application or manually through the AllocConsole API, as we already discussed in the previous article. We write output to the console handles in the application, and the broker object forwards the messages to the console process, which displays the characters.

If there is no way to write data into the received handle as into a regular file, then it is possible to communicate with the console driver by IOCTL, just as we do with our own driver when passing the handle structure and enabling output. We can do it in kernel mode and in user mode. In the kernel, we have the ZwDeviceIoControlFile API. To call the console driver from user mode, we have the similar NtDeviceIoControlFile API, which is exported from the ntdll.dll library.

The IOCTL for console input and output is IOCTL_CONDRV_ISSUE_USER_IO, which is defined as follows:

C++

#define IOCTL_CONDRV_ISSUE_USER_IO \
    CTL_CODE(FILE_DEVICE_CONSOLE, 5, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
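
For readers who meet this control code as a raw number in a debugger, the following portable sketch splits a control code back into its fields. The struct CtlCodeFields and the function DecodeCtlCode are hypothetical names for this illustration; the values 0x50 for FILE_DEVICE_CONSOLE and 2 for METHOD_OUT_DIRECT are the ones from the SDK headers.

```cpp
#include <cstdint>

// Fields packed into a control code by the CTL_CODE macro.
struct CtlCodeFields {
    std::uint32_t deviceType; // bits 31..16
    std::uint32_t access;     // bits 15..14
    std::uint32_t function;   // bits 13..2
    std::uint32_t method;     // bits 1..0
};

// Hypothetical helper for this sketch: splits a control code into fields.
constexpr CtlCodeFields DecodeCtlCode(std::uint32_t code) {
    return CtlCodeFields{ code >> 16,
                          (code >> 14) & 0x3u,
                          (code >> 2) & 0xFFFu,
                          code & 0x3u };
}

// IOCTL_CONDRV_ISSUE_USER_IO composed by hand:
// FILE_DEVICE_CONSOLE (0x50), function 5, METHOD_OUT_DIRECT (2), FILE_ANY_ACCESS (0).
constexpr std::uint32_t kIoctlCondrvIssueUserIo =
    (0x50u << 16) | (0u << 14) | (5u << 2) | 2u;

static_assert(kIoctlCondrvIssueUserIo == 0x00500016,
              "unexpected control code value");
static_assert(DecodeCtlCode(kIoctlCondrvIssueUserIo).method == 2,
              "the transfer method should decode as METHOD_OUT_DIRECT");
```

So a value of 0x00500016 in a trace is this console request: device type 0x50, function 5, transfer method 2.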

That IOCTL accepts the structure CD_USER_DEFINED_IO as input argument.

C++

typedef struct _CD_USER_DEFINED_IO {
    HANDLE Client;
    ULONG InputCount;
    ULONG OutputCount;
    CD_IO_BUFFER Buffers[ANYSIZE_ARRAY];
} CD_USER_DEFINED_IO, *PCD_USER_DEFINED_IO;

This is the basic structure which describes the list of input and output buffers. We must allocate it together with the CD_IO_BUFFER structures that follow it; their number equals the sum of the InputCount and OutputCount fields.

C++

typedef struct _CD_IO_BUFFER {
    ULONG_PTR Size;
    PVOID Buffer;
} CD_IO_BUFFER, *PCD_IO_BUFFER;

The input buffers come first, followed by the output buffers. Each buffer entry contains a pointer to the argument data and the size of that argument. We have two input buffers: the first describes the message structure which we pass to the console driver, and the second is the text. For output, we have only a reference to the structure which receives the number of characters written to the console.

C++

typedef struct _CONSOLE_MSG_HEADER {
    ULONG ApiNumber;
    ULONG ApiDescriptorSize;
} CONSOLE_MSG_HEADER, *PCONSOLE_MSG_HEADER;

#include <pshpack4.h>
typedef struct _CONSOLE_WRITECONSOLE_MSG {
    OUT ULONG NumRecords;
    IN BOOLEAN Unicode;
} CONSOLE_WRITECONSOLE_MSG, *PCONSOLE_WRITECONSOLE_MSG;
#include <poppack.h>

typedef struct _CONSOLE_MSG {
    CONSOLE_MSG_HEADER Header;
    CONSOLE_WRITECONSOLE_MSG Msg;
}CONSOLE_MSG, *PCONSOLE_MSG;

Console message contains the header and the payload. The payload has a large union but we are interested in only the console write text operation. The header has an ApiNumber field which defines the operation with the console. For the text output, it should be set to API_NUMBER_WRITECONSOLE value, which is defined next.

C++

#define API_NUMBER_WRITECONSOLE    0x01000006

The NumRecords field of the payload receives on output the number of processed characters.
Next step is to prepare all together and fill up the buffers. Now let's see how that can be implemented.

C++

HMODULE hDll = LoadLibraryW(L"ntdll.dll");
// Console output with IOCTL
typedef NTSTATUS(NTAPI * PFN_NtDeviceIoControlFile) (
    HANDLE, HANDLE, PIO_APC_ROUTINE, PVOID,
    PIO_STATUS_BLOCK, ULONG, PVOID, ULONG, PVOID, ULONG);

PFN_NtDeviceIoControlFile NtDeviceIoControlFile = 
    (PFN_NtDeviceIoControlFile)GetProcAddress(hDll, "NtDeviceIoControlFile");

In the code above, we load the ntdll library and initialize the exported API.

C++

CHAR text[] = "This text will be output into console with direct IOCTL\n";
// Total size of the data: CD_USER_DEFINED_IO with two additional CD_IO_BUFFER 
size_t size = sizeof(CD_USER_DEFINED_IO) + 2 * sizeof(CD_IO_BUFFER);
PCD_USER_DEFINED_IO buffer = (PCD_USER_DEFINED_IO)malloc(size);
HANDLE handle = GetStdHandle(STD_OUTPUT_HANDLE);

We allocate a buffer which holds CD_USER_DEFINED_IO with its one embedded CD_IO_BUFFER plus two additional CD_IO_BUFFER structures, as we have two input buffers and one output buffer. Now we prepare the structures and fill those buffers.

C++

IO_STATUS_BLOCK iosb = { 0 };
memset(buffer, 0x00, size);
// Initialize message structure
CONSOLE_MSG msg = { API_NUMBER_WRITECONSOLE, sizeof(CONSOLE_WRITECONSOLE_MSG), 0, 0 };
// The Client field is not used here
buffer->Client = NULL;
// Two Input buffers
buffer->InputCount = 2;
// One Output
buffer->OutputCount = 1;
// First Input Message Structure
buffer->Buffers[0].Buffer = &msg;
buffer->Buffers[0].Size = sizeof(msg);
// Second Buffer of the text string
buffer->Buffers[1].Buffer = (PVOID)text;
buffer->Buffers[1].Size = strlen(text);
// The Output resulted number of characters
buffer->Buffers[2].Buffer = &msg.Msg;
buffer->Buffers[2].Size = sizeof(msg.Msg);
// Call API
NTSTATUS Status = NtDeviceIoControlFile(handle,
    NULL, NULL, NULL, &iosb, IOCTL_CONDRV_ISSUE_USER_IO, buffer, (ULONG)size, NULL, 0);

free(buffer);

The result of code execution is as below:

Under a debugger breakpoint, you can check that after the NtDeviceIoControlFile call the NumRecords field of the output buffer equals the length of the text passed as input, which means that the entire string has been written to the console buffer.
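
The size and layout arithmetic used above can also be verified without touching the console. The following portable sketch mirrors the article's structures with fixed-width standard types; the type names and the helpers UserIoSize and BuildWriteRequest are hypothetical, and nothing here sends the request anywhere. It only shows that a header with three descriptors occupies the same number of bytes as sizeof(CD_USER_DEFINED_IO) plus two extra CD_IO_BUFFER entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Portable mirrors of the structures shown in the article.
struct CdIoBuffer {
    std::uintptr_t Size;
    void*          Buffer;
};
struct CdUserDefinedIo {
    void*         Client;
    std::uint32_t InputCount;
    std::uint32_t OutputCount;
    CdIoBuffer    Buffers[1]; // ANYSIZE_ARRAY: more entries follow in memory
};
struct ConsoleMsgHeader {
    std::uint32_t ApiNumber;
    std::uint32_t ApiDescriptorSize;
};
struct ConsoleWriteConsoleMsg {
    std::uint32_t NumRecords;
    std::uint8_t  Unicode;
};
struct ConsoleMsg {
    ConsoleMsgHeader       Header;
    ConsoleWriteConsoleMsg Msg;
};
static_assert(sizeof(ConsoleMsg) == 16, "header and payload are 8 bytes each");

// Bytes needed for the header plus inputCount + outputCount descriptors.
inline std::size_t UserIoSize(std::uint32_t inputCount, std::uint32_t outputCount) {
    return offsetof(CdUserDefinedIo, Buffers) +
           (static_cast<std::size_t>(inputCount) + outputCount) * sizeof(CdIoBuffer);
}

// Allocates and fills a write request: two inputs (message, text) and one
// output (the payload that receives NumRecords). Release with std::free.
inline CdUserDefinedIo* BuildWriteRequest(ConsoleMsg& msg, const char* text) {
    assert(text != nullptr);
    auto* io = static_cast<CdUserDefinedIo*>(std::calloc(1, UserIoSize(2, 1)));
    if (io == nullptr) {
        return nullptr;
    }
    io->InputCount = 2;
    io->OutputCount = 1;
    CdIoBuffer* slots = io->Buffers;
    slots[0].Size = sizeof(msg);                 // input 1: message structure
    slots[0].Buffer = &msg;
    slots[1].Size = std::strlen(text);           // input 2: text to print
    slots[1].Buffer = const_cast<char*>(text);
    slots[2].Size = sizeof(msg.Msg);             // output: payload with NumRecords
    slots[2].Buffer = &msg.Msg;
    return io;
}
```

Computing the size from the offset of Buffers instead of sizeof keeps the formula correct for any number of descriptors, including the case where only one is needed.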

The .NET implementation is a little harder, as all pointers must be initialized properly. First, define the required structure types. In the CD_USER_DEFINED_IO C# wrapper, we do not declare the array of CD_IO_BUFFER that follows the OutputCount field, as we are going to write those values manually.

[StructLayout(LayoutKind.Sequential, Pack = 0)]
class CD_USER_DEFINED_IO
{
    public IntPtr Client;
    [MarshalAs(UnmanagedType.U4)]
    public int InputCount;
    [MarshalAs(UnmanagedType.U4)]
    public int OutputCount;
}

All other structures are defined in the following way:

[StructLayout(LayoutKind.Sequential, Pack = 4, Size = 8)]
class CONSOLE_MSG_HEADER
{
    [MarshalAs(UnmanagedType.U4)]
    public int ApiNumber;
    [MarshalAs(UnmanagedType.U4)]
    public int ApiDescriptorSize;

    public CONSOLE_MSG_HEADER(int ApiNumber, int ApiDescriptorSize)
    {
        this.ApiNumber = ApiNumber;
        this.ApiDescriptorSize = ApiDescriptorSize;
    }
}

[StructLayout(LayoutKind.Sequential, Pack = 4, Size = 8)]
class CONSOLE_WRITECONSOLE_MSG
{
    [MarshalAs(UnmanagedType.U4)]
    public int NumRecords;
    [MarshalAs(UnmanagedType.Bool)]
    public bool Unicode;

    public CONSOLE_WRITECONSOLE_MSG(int NumRecords, bool Unicode)
    {
        this.NumRecords = NumRecords;
        this.Unicode = Unicode;
    }
}

[StructLayout(LayoutKind.Sequential)]
class CONSOLE_MSG
{
    public CONSOLE_MSG_HEADER Header;
    public CONSOLE_WRITECONSOLE_MSG Msg;

    public CONSOLE_MSG
    (int ApiNumber, int ApiDescriptorSize, int NumRecords, bool Unicode)
    {
        Header = new CONSOLE_MSG_HEADER(ApiNumber, ApiDescriptorSize);
        Msg = new CONSOLE_WRITECONSOLE_MSG(NumRecords, Unicode);
    }
}

Filling up the structures with the data.

// Prepare Structures 
CONSOLE_MSG msg = new CONSOLE_MSG(API_NUMBER_WRITECONSOLE,
                  Marshal.SizeOf(typeof(CONSOLE_WRITECONSOLE_MSG)), 0, false);

CD_USER_DEFINED_IO buffer = new CD_USER_DEFINED_IO();
buffer.Client = IntPtr.Zero;
buffer.InputCount = 2;
buffer.OutputCount = 1;

Next step is to allocate the required pointers and fill up the buffers.

// We need 4 pointers
IntPtr[] ptr = new IntPtr[4];
// Here is the sizes of allocated memory for quick access
int[] Sizes = new int[] { StructureSize, Marshal.SizeOf(msg),
    text.Length, Marshal.SizeOf(msg.Msg) };
// Allocate memory
ptr[0] = Marshal.AllocHGlobal(Sizes[0]);
ptr[1] = Marshal.AllocHGlobal(Sizes[1]);
ptr[2] = Marshal.StringToHGlobalAnsi(text);
ptr[3] = Marshal.AllocHGlobal(Sizes[3]);

// Setup Pointers
Marshal.StructureToPtr(buffer, ptr[0], false);
Marshal.StructureToPtr(msg, ptr[1], false);
Marshal.StructureToPtr(msg.Msg, ptr[3], false);

IntPtr p = ptr[0] + StructureSize - 3 * IoBufferSize;
for (int i = 0; i < 3; i++)
{
    Marshal.WriteIntPtr(p, (IntPtr)Sizes[i + 1]);
    Marshal.WriteIntPtr(p + IntPtr.Size, ptr[i + 1]);
    p += IoBufferSize;
}

In the code above, we allocate extra space to hold the 3 CD_IO_BUFFER structures. After we copy CD_USER_DEFINED_IO to the pointer, we advance the pointer to the end of the copied fields and manually fill in the CD_IO_BUFFER structures by calling Marshal.WriteIntPtr for each field. So now we are ready to call the NtDeviceIoControlFile API.

// Status
IO_STATUS_BLOCK iosb = new IO_STATUS_BLOCK();
// Console
IntPtr handle = GetStdHandle(STD_OUTPUT_HANDLE);
        
// Write Output
int Status = NtDeviceIoControlFile(handle,
    IntPtr.Zero, IntPtr.Zero, IntPtr.Zero, 
    iosb, IOCTL_CONDRV_ISSUE_USER_IO, ptr[0], StructureSize, IntPtr.Zero, 0);

Marshal.PtrToStructure(ptr[3], msg.Msg);

// msg.Msg.NumRecords - contains number of characters output
// same as iosb.Information

The NtDeviceIoControlFile API in C# has the following declaration:

[DllImport("ntdll.dll")]
[return: MarshalAs(UnmanagedType.U4)]
static extern int NtDeviceIoControlFile(
    [In] IntPtr FileHandle,
    [In, Optional] IntPtr Event,
    [In, Optional] IntPtr ApcRoutine,
    [In, Optional] IntPtr ApcContext,
    [Out, MarshalAs(UnmanagedType.LPStruct)] IO_STATUS_BLOCK IoStatusBlock,
    [In, MarshalAs(UnmanagedType.U4)] int IoControlCode,
    [In] IntPtr InputBuffer,
    [In, MarshalAs(UnmanagedType.U4)] int InputBufferLength,
    [In] IntPtr OutputBuffer,
    [In, MarshalAs(UnmanagedType.U4)] int OutputBufferLength
    );

The result of the execution is the same as in the C++ implementation.
If you want to find out more about the console internals, you can check the Microsoft Terminal project on GitHub.

Console Output Implementation in the Kernel

Console output with the IOCTL works, so now it is time to try it in the kernel. We have already figured out that in kernel mode we can open the object without specifying the GENERIC_WRITE flag, and to write text to the console window, we can call the ZwDeviceIoControlFile API. The code looks similar to the user mode implementation.

C++

size_t size = sizeof(CD_USER_DEFINED_IO) + 2 * sizeof(CD_IO_BUFFER);
PCD_USER_DEFINED_IO buffer = (PCD_USER_DEFINED_IO)ExAllocatePool(NonPagedPool, size);

if (buffer) {
    memset(buffer, 0x00, size);
    // Prepare console arguments
    CONSOLE_MSG msg = { API_NUMBER_WRITECONSOLE, 
                        sizeof(CONSOLE_WRITECONSOLE_MSG), 0, 0 };

    buffer->Client = NULL;
    buffer->InputCount = 2;
    buffer->OutputCount = 1;
    buffer->Buffers[0].Buffer = &msg;
    buffer->Buffers[0].Size = sizeof(msg);
    buffer->Buffers[1].Buffer = (PVOID)text;
    buffer->Buffers[1].Size = strlen(text);
    buffer->Buffers[2].Buffer = &msg.Msg;
    buffer->Buffers[2].Size = sizeof(msg.Msg);
    // Call console output
    Status = ZwDeviceIoControlFile(s_hOutputKernelHandle,
        NULL, NULL, NULL, &iosb, IOCTL_CONDRV_ISSUE_USER_IO, 
                    buffer, (ULONG)size, NULL, 0);

    ExFreePool(buffer);
}
else {
    Status = STATUS_NO_MEMORY;
}

After we integrate that code into the driver and pass the console handle, we get the output in our test application.

Identify Handle Type

Because we keep a common implementation for all handle types, and the passed handle can be a pipe, a file, or a console, we need a way to identify the object and choose the output method accordingly: either the regular write function or the device control function. The simplest approach is this: if we are unable to open the object with the GENERIC_WRITE flag, treat it as a console; otherwise, write the output in the regular way. But let’s check other ways to identify such objects.

One of those ways we already know: retrieving the name of the object with the NtQueryObject API. For the console, we already know what it returns, so let's look at what we get for other object types.

Requesting the object name with NtQueryObject is not safe for every object: in user mode, the call can hang on a pipe object if that pipe is blocked waiting for data. When you design your own application, you can handle that, but if your application inspects the handles of another process, you should be aware of it. That's why retrieving object names with the NtQueryObject API is not publicly documented. In kernel mode, such issues do not appear, as we can access the object header structure directly.

You may know NtQueryInformationFile, the function for retrieving file information. It supports a request for the file name: FileNameInformation. On that request, the function fills a pre-allocated structure of the type FILE_NAME_INFORMATION.

C++

typedef struct _FILE_NAME_INFORMATION {
    ULONG FileNameLength;
    WCHAR FileName[1];
} FILE_NAME_INFORMATION, *PFILE_NAME_INFORMATION;

FILE_INFORMATION_CLASS FileNameInformation = (FILE_INFORMATION_CLASS)9;

The usage prototype is:

C++

HANDLE handle = GetStdHandle(STD_OUTPUT_HANDLE);
IO_STATUS_BLOCK iosb = { 0 };

ULONG _size = 1024 + sizeof(FILE_NAME_INFORMATION);
PFILE_NAME_INFORMATION information = (PFILE_NAME_INFORMATION)malloc(_size);
if (information) {
    // Zero the buffer so a name that fits stays null-terminated
    memset(information, 0x00, _size);
    NtQueryInformationFile(handle, &iosb, information, _size, FileNameInformation);
    if (information->FileNameLength) {
        wprintf(L"Console Object Name: \"%ls\"\n", information->FileName);
    }
    free(information);
}

If we run such requests and compare them with what we got from the previous code example, we receive the following output:

Implementation of the same code in C# is done in the following way:

int size = 1024 + Marshal.SizeOf<FILE_NAME_INFORMATION>();
IntPtr p = Marshal.AllocCoTaskMem(size);

IO_STATUS_BLOCK iosb = new IO_STATUS_BLOCK();
if (p != IntPtr.Zero)
{
    try
    {
        // Clear Memory as Marshal.Alloc not doing so
        ZeroMemory(p, size);
        // Request Information Block
        NtQueryInformationFile(handle, iosb,
                    p, size, FileNameInformation);
        // Length in bytes
        int FileNameLength = Marshal.ReadInt32(p);
        if (FileNameLength > 0 && FileNameLength < size - 4)
        {
            string FileName = Marshal.PtrToStringUni(p + 4, (FileNameLength >> 1));
            Console.WriteLine("{0} FileInformation: \"{1}\"", info, FileName);
        }
    }
    finally
    {
        Marshal.FreeCoTaskMem(p);
    }
}
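
The length handling in the snippets above is easy to get wrong, because FileNameLength is expressed in bytes while the name consists of 2-byte characters. The following portable C++ sketch illustrates the same parsing on a plain byte buffer. It assumes the layout shown in FILE_NAME_INFORMATION (a 4-byte length followed by UTF-16 code units); the function names ReadLengthPrefixedName and MakeNameBuffer are hypothetical helpers, not Windows APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Reads a name stored as: 4-byte length in BYTES, then UTF-16 code units.
// Returns an empty string if the length is zero or does not fit the buffer.
std::u16string ReadLengthPrefixedName(const unsigned char* buffer, std::size_t size) {
    if (size < sizeof(std::uint32_t)) return {};
    std::uint32_t lengthInBytes = 0;
    std::memcpy(&lengthInBytes, buffer, sizeof(lengthInBytes));
    if (lengthInBytes == 0 || lengthInBytes > size - sizeof(std::uint32_t)) return {};
    // Convert the byte count to a character count before building the string
    std::u16string name(lengthInBytes / sizeof(char16_t), u'\0');
    std::memcpy(&name[0], buffer + sizeof(std::uint32_t), name.size() * sizeof(char16_t));
    return name;
}

// Builds a sample buffer in the same layout, for trying the reader out.
std::vector<unsigned char> MakeNameBuffer(const std::u16string& name) {
    std::uint32_t lengthInBytes = static_cast<std::uint32_t>(name.size() * sizeof(char16_t));
    std::vector<unsigned char> buffer(sizeof(lengthInBytes) + lengthInBytes);
    std::memcpy(buffer.data(), &lengthInBytes, sizeof(lengthInBytes));
    std::memcpy(buffer.data() + sizeof(lengthInBytes), name.data(), lengthInBytes);
    return buffer;
}
```

The bounds check mirrors the one in the C# code: a reported length that exceeds the space after the length field is rejected instead of being trusted.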

The NtQueryInformationFile function, together with an NtQueryObject request for the object name, is used by the GetFinalPathNameByHandle API, which builds the full path to a file from a given handle. That function is less prone to hanging than NtQueryObject.

As we can see, the NtQueryInformationFile function only works for files, so it is not suitable in our case.

Another way is to get the type of the device by its handle. The device type definitions can be found in the winioctl.h header file; their names start with FILE_DEVICE_*.

C++

//...
#define FILE_DEVICE_VMBUS               0x0000003E
#define FILE_DEVICE_CRYPT_PROVIDER      0x0000003F
#define FILE_DEVICE_WPD                 0x00000040
#define FILE_DEVICE_BLUETOOTH           0x00000041
#define FILE_DEVICE_MT_COMPOSITE        0x00000042
#define FILE_DEVICE_MT_TRANSPORT        0x00000043
#define FILE_DEVICE_BIOMETRIC           0x00000044
#define FILE_DEVICE_PMI                 0x00000045
#define FILE_DEVICE_EHSTOR              0x00000046
#define FILE_DEVICE_DEVAPI              0x00000047
#define FILE_DEVICE_GPIO                0x00000048
#define FILE_DEVICE_USBEX               0x00000049
#define FILE_DEVICE_CONSOLE             0x00000050
#define FILE_DEVICE_NFP                 0x00000051
#define FILE_DEVICE_SYSENV              0x00000052
#define FILE_DEVICE_VIRTUAL_BLOCK       0x00000053
#define FILE_DEVICE_POINT_OF_SERVICE    0x00000054
#define FILE_DEVICE_STORAGE_REPLICATION 0x00000055
#define FILE_DEVICE_TRUST_ENV           0x00000056
//...

Depending on the device type value we get, we call either the IOCTL or the file write function. To retrieve the device type, we can use the NtQueryVolumeInformationFile API. This function is exported from the ntdll library for user mode applications. In kernel mode, the same API is available with the Zw prefix: ZwQueryVolumeInformationFile. We should call this function with a FileFsDeviceInformation request. On that request, the API fills the passed FILE_FS_DEVICE_INFORMATION structure.

C++

typedef struct _FILE_FS_DEVICE_INFORMATION {
    DEVICE_TYPE DeviceType;
    ULONG       Characteristics;
} FILE_FS_DEVICE_INFORMATION, *PFILE_FS_DEVICE_INFORMATION;

ULONG FileFsDeviceInformation = 4;

We are interested in the DeviceType field of the structure above. Here is a code example of calling this API for the console handle.

C++

FILE_FS_DEVICE_INFORMATION console_info = { 0 };
IO_STATUS_BLOCK iosb = { 0 };
HANDLE handle = GetStdHandle(STD_OUTPUT_HANDLE);
NtQueryVolumeInformationFile(handle, &iosb, 
    &console_info, sizeof(console_info), FileFsDeviceInformation);

The device types reported for different object handles are displayed in the next screenshot.

If you look up those values in the definitions, you can see that they correspond to FILE_DEVICE_DISK, FILE_DEVICE_NAMED_PIPE and FILE_DEVICE_CONSOLE. So, in the driver code, we are also able to set up the processing method based on that information.

In .NET, we can implement the above code as well.
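
As an illustration of how the driver could choose the processing method, here is a minimal portable C++ sketch. The three constants carry the values that winioctl.h defines for FILE_DEVICE_DISK, FILE_DEVICE_NAMED_PIPE and FILE_DEVICE_CONSOLE; the OutputMethod enumeration and the SelectOutputMethod function are hypothetical names introduced only for this example and are not taken from the sample driver.

```cpp
#include <cassert>
#include <cstdint>

// Device type values as defined in winioctl.h
constexpr std::uint32_t kFileDeviceDisk      = 0x00000007;
constexpr std::uint32_t kFileDeviceNamedPipe = 0x00000011;
constexpr std::uint32_t kFileDeviceConsole   = 0x00000050;

// Hypothetical set of output strategies
enum class OutputMethod { FileWrite, ConsoleIoctl, Unsupported };

// Maps the DeviceType field to the way the text should be written.
OutputMethod SelectOutputMethod(std::uint32_t deviceType) {
    switch (deviceType) {
    case kFileDeviceDisk:
    case kFileDeviceNamedPipe:
        // Files and pipes accept a regular write call
        return OutputMethod::FileWrite;
    case kFileDeviceConsole:
        // The console needs the device control request
        return OutputMethod::ConsoleIoctl;
    default:
        return OutputMethod::Unsupported;
    }
}
```

Treating unknown device types as unsupported is a design choice of this sketch; a driver could equally fall back to the regular write path.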

int Size = Marshal.SizeOf(typeof(FILE_FS_DEVICE_INFORMATION));
IntPtr ptr = Marshal.AllocCoTaskMem(Size);
IO_STATUS_BLOCK iosb = new IO_STATUS_BLOCK();

IntPtr handle = GetStdHandle(STD_OUTPUT_HANDLE);
NtQueryVolumeInformationFile(handle, iosb, ptr, Size, FileFsDeviceInformation);
Marshal.PtrToStructure(ptr, console_info);
// Release the unmanaged buffer once the data is copied
Marshal.FreeCoTaskMem(ptr);

We allocate unmanaged memory of the FILE_FS_DEVICE_INFORMATION structure size and pass the pointer as an argument to the NtQueryVolumeInformationFile API. After the function returns, we convert the pointer data into the actual structure variable.

Process Space Issue with Console Implementation

So, we got the output from the driver into the process console. That is very cool, but we write the text to the console right after we receive the IOCTL from the application, which means the call runs on the application's thread, in the application's process space. But what if we call our API from the DbgPrint callback? In that case, we see no output, and the ZwDeviceIoControlFile API returns the error code STATUS_NOT_SUPPORTED. The result is displayed in the next screenshot.

Switching into Host Process Space

There are two ways to solve the situation above. The first one is the simple one, as usual: for the duration of the IOCTL call to the console driver, we switch the thread context into our application process.

That can be done with the KeStackAttachProcess API.

C++

KAPC_STATE State = { 0 };
// Switch to Target Console Process 
KeStackAttachProcess(s_Process, &State);

Before that, while setting up the handle, we should save the process structure to use it in that function. The s_Process variable has the type PEPROCESS.

C++

if (NT_SUCCESS(Irp->IoStatus.Status)) {
    s_Process = PsGetCurrentProcess();
}

Right after the call, we switch the context back with the KeUnstackDetachProcess API.

C++

// Switch Process Back  
KeUnstackDetachProcess(&State);

The code that runs while attached to another process address space should be very simple. Microsoft does not recommend calling any drivers within the attached context block, but here we are calling the console object which was created in that process.
You can see the result of the solution in the next screenshot.

Anyway, if you prefer not to use those APIs, you can try another method to solve the described issue.

Creating Host Process Space Thread in the Kernel

The second solution is more complex, but it shows more interesting internals of the system. We can create a kernel thread which will be executed under the host process. Sounds impossible? If we look at PsCreateSystemThread, the kernel function for creating threads, it has the input parameter ProcessHandle, in which you can specify the host process handle, and the thread will be created in that process space. We create that thread when the driver receives the IOCTL that enables the user handle.

C++

HANDLE hThread;
KeResetEvent(&s_EvQuit);
// Start Output Thread
Status = PsCreateSystemThread(&hThread, 0, NULL, 
         ZwCurrentProcess(), NULL, ConsoleHandlerThread, NULL);
if (NT_SUCCESS(Status)) {
    Status = ObReferenceObjectByHandle(hThread, GENERIC_READ | GENERIC_WRITE,
        NULL, KernelMode, (PVOID *)&s_pThreadObject, NULL);
    // The thread handle is not needed once we hold (or failed to get)
    // the object reference, so close it to avoid leaking it
    ZwClose(hThread);
}
if (!NT_SUCCESS(Status)) {
    // Set drop event once we have error
    KeSetEvent(&s_EvQuit, IO_NO_INCREMENT, FALSE);
}

We shut down that thread once the IOCTL disables the handle. The thread quits when the specified event is set to the signaled state. After signaling that event, we wait until the thread exits.

C++

KeSetEvent(&s_EvQuit, IO_NO_INCREMENT, FALSE);
// The referenced object is a thread, so keep it as an untyped object pointer
PVOID pThread = NULL;
if (STATUS_SUCCESS ==
    KeWaitForSingleObject(&s_LockUserHandle, Executive, KernelMode, FALSE, NULL)) {
    pThread = s_pThreadObject;
    s_pThreadObject = NULL;
    KeReleaseMutex(&s_LockUserHandle, FALSE);
}
if (pThread) {
    KeWaitForSingleObject(pThread, Executive, KernelMode, FALSE, NULL);
    ObDereferenceObject(pThread);
}

Now we need to prepare the text output processing. We put the text messages into a list and signal an event so that the thread starts processing messages from the list.

C++

size_t cch = strlen(text) + 1;
PIO_OUTPUT_TEXT item = (PIO_OUTPUT_TEXT)ExAllocatePool
                       (NonPagedPool, sizeof(IO_OUTPUT_TEXT));
if (item) {
    memset(item, 0x00, sizeof(IO_OUTPUT_TEXT));
    // Put text into Thread Queue
    // Wait For List Mutex 
    if (STATUS_SUCCESS == (Status = KeWaitForSingleObject(
            &s_ListLock, Executive, KernelMode, FALSE,
            PASSIVE_LEVEL != KeGetCurrentIrql() ? &time_out : NULL))) {
        item->Text = (CHAR*)ExAllocatePool(NonPagedPool, cch);
        // Remember the result locally: after the entry is queued
        // the thread may free it at any moment
        BOOLEAN bQueued = (item->Text != NULL);
        if (bQueued) {
            memcpy(item->Text, text, cch);
            // Insert entry into the list
            InsertTailList(&s_List, &(item->Entry));
            // Notify that we have some data
            KeSetEvent(&s_EvHaveData, IO_NO_INCREMENT, FALSE);
        }
        KeReleaseMutex(&s_ListLock, FALSE);
        if (!bQueued) {
            // Text allocation failed, drop the entry
            ExFreePool(item);
            Status = STATUS_NO_MEMORY;
        }
    }
    else {
        ExFreePool(item);
    }
}

The output is handed over to the thread when the code is executing at an IRQL higher than the passive level, or when the PID of the calling process does not match the target process PID. The IO_OUTPUT_TEXT structure in the current implementation contains the Entry field, which is used to put elements into the list.

C++

typedef struct
{
    // List Entry
    LIST_ENTRY        Entry;
    // Text For Output
    CHAR          *   Text;
}IO_OUTPUT_TEXT,*PIO_OUTPUT_TEXT;
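
The routing rule described above (write directly, or hand the text over to the thread) can be summarized in a few lines. This is only a sketch of the condition as stated in the text: the function name ShouldQueueToThread and its parameters are hypothetical, and the constant reflects that PASSIVE_LEVEL is IRQL 0.

```cpp
#include <cassert>
#include <cstdint>

// PASSIVE_LEVEL is the lowest IRQL and has the value 0
constexpr unsigned kPassiveLevel = 0;

// Returns true when the text must go through the queue and the
// host-process thread instead of being written by the caller.
bool ShouldQueueToThread(unsigned currentIrql,
                         std::uint64_t callerPid,
                         std::uint64_t targetPid) {
    // Elevated IRQL: the caller must not perform the console I/O itself
    if (currentIrql > kPassiveLevel) return true;
    // Different process: the console handle is not valid in this context
    return callerPid != targetPid;
}
```

Only a caller running at passive level inside the target process takes the direct path; every other caller goes through the queue.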

The actual thread code looks like:

C++

// Console Thread Function
VOID ConsoleHandlerThread(PVOID Context)
{
    PAGED_CODE();
    UNREFERENCED_PARAMETER(Context);
    PVOID hEvents[2] = { 0 };
    hEvents[0] = &s_EvHaveData;
    hEvents[1] = &s_EvQuit;

    while (TRUE) {
        // Wait For Event Of Quit Or Data Arriving
        NTSTATUS Status = KeWaitForMultipleObjects(2, hEvents, 
            WaitAny, Executive, KernelMode, FALSE, NULL, NULL);
        if (Status == STATUS_WAIT_0) {
            while (TRUE) {
                PIO_OUTPUT_TEXT item = NULL;
                // Lock List Mutex
                if (STATUS_SUCCESS == KeWaitForSingleObject(&s_ListLock, 
                                        Executive, KernelMode, FALSE, 0)) {
                    if (!IsListEmpty(&s_List)) {
                        // Extract record from start of the list 
                        PLIST_ENTRY entry = s_List.Flink;
                        if (entry) {
                            item = CONTAINING_RECORD(entry, IO_OUTPUT_TEXT, Entry);
                            RemoveEntryList(entry);
                        }
                    }
                    if (!item) {
                        // Reset data flag if no records 
                        // as we have manual reset event
                        KeResetEvent(&s_EvHaveData);
                    }
                    KeReleaseMutex(&s_ListLock, FALSE);
                }
                if (!item) break;
                if (item->Text) {
                    // Actual Output Writing
                    WriteUserHandleOutputText(item->Text);
                    ExFreePool(item->Text);
                }
                ExFreePool(item);
            }
        } else {
            break;
        }
    }
    // Just mark that we are done
    KeSetEvent(&s_EvQuit, IO_NO_INCREMENT, FALSE);
    PsTerminateSystemThread(STATUS_SUCCESS);
}
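
The queue handling in the thread can be tried out in user mode with a simplified analogue. The sketch below is not the driver code: it swaps the kernel mutex for std::mutex, the LIST_ENTRY list for std::deque, and the manual-reset event for a plain boolean flag, and the OutputQueue class name is hypothetical. It only demonstrates the same ordering of steps: take the lock, pop the oldest entry, and clear the "have data" flag when the list turns out to be empty.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

class OutputQueue {
public:
    // Producer: append the text and raise the flag (KeSetEvent analogue).
    void Push(std::string text) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(text));
        haveData_ = true;
    }

    // Consumer: take the oldest entry. When nothing is left, the flag
    // is cleared under the lock (KeResetEvent analogue).
    bool Pop(std::string& text) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) {
            haveData_ = false;
            return false;
        }
        text = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    bool HaveData() {
        std::lock_guard<std::mutex> lock(mutex_);
        return haveData_;
    }

    // Inner loop of the thread: process entries until the queue is empty.
    std::string Drain() {
        std::string all, text;
        while (Pop(text)) all += text;
        return all;
    }

private:
    std::mutex mutex_;
    std::deque<std::string> items_;
    bool haveData_ = false;
};
```

Clearing the flag while the lock is held matters: a producer cannot slip an entry in between the emptiness check and the reset, so no queued message is left without a pending signal.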

Now we can start the test application and see the result.

Application Lockdown Issue with User Space Thread

The application with the additional kernel-created thread also works well, and if you open the Process Explorer tool, you can see the additional thread with the start address drv2.sys!ConsoleHandlerThread. You will not see that thread in the Visual Studio Threads window when you pause the execution. Moreover, that thread cannot be terminated from Process Explorer. So, if our application exits unexpectedly and doesn't call the IOCTL to disable the handle, the application hangs, as the process thread in the kernel is still active. You can reproduce that situation with the example application by exiting the main thread right after calling the IOCTL that enables the handle. The ‘quit’ command line argument can be used for that.

As you can see in the picture, the application does not contain any user mode threads, but it is still running, and it cannot be closed by either Task Manager or Process Explorer.
To avoid such a situation, we can add an additional IOCTL.

C++

// IOCTL Fallback For quit if hang happening
#define IOCTL_DRIVER_SHUTDOWN_OUTPUT        \
        CTL_CODE( FILE_DEVICE_UNKNOWN, 0x802, METHOD_BUFFERED, FILE_ANY_ACCESS )
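
For reference, the numeric value behind that definition can be worked out by hand. The CTL_CODE macro packs the device type into bits 16 and up, the access into bits 14-15, the function number into bits 2-13 and the method into bits 0-1. The sketch below repeats that composition as a constexpr function; MakeCtlCode and the k-prefixed constants are names chosen for this example, while the values are the ones from the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Same bit layout as the CTL_CODE macro
constexpr std::uint32_t MakeCtlCode(std::uint32_t deviceType, std::uint32_t function,
                                    std::uint32_t method, std::uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

constexpr std::uint32_t kFileDeviceUnknown = 0x00000022; // FILE_DEVICE_UNKNOWN
constexpr std::uint32_t kMethodBuffered    = 0;          // METHOD_BUFFERED
constexpr std::uint32_t kFileAnyAccess     = 0;          // FILE_ANY_ACCESS

// Value of IOCTL_DRIVER_SHUTDOWN_OUTPUT as defined above
constexpr std::uint32_t kIoctlShutdownOutput =
    MakeCtlCode(kFileDeviceUnknown, 0x802, kMethodBuffered, kFileAnyAccess);

static_assert(kIoctlShutdownOutput == 0x00222008, "unexpected IOCTL value");
```

Knowing the raw value (0x222008) helps when the control code shows up in a debugger or in an I/O trace.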

By receiving this IOCTL, the driver will stop the thread if it exists.

C++

case IOCTL_DRIVER_SHUTDOWN_OUTPUT:
    CloseUserOutputHandle();
    break;

In the application, we trigger that with an additional command line argument: ‘shutdown’.

C++

// Shutdown fallback
if (argc > 1 && _stricmp(argv[1],"shutdown") == 0 ) {

    if (DeviceIoControl(hDevice,IOCTL_DRIVER_SHUTDOWN_OUTPUT,
        NULL, 0, NULL,0, &dwBytesReturned,NULL) == 0) {
        _tprintf(_T("DeviceIOControl Failed %d\n"),GetLastError());
    }
    CloseHandle(hDevice);
    return 0;
}

The C# implementation has the same handler for the same argument.

Track Starting and Stopping Process Threads

The process start and stop callback does not help in the situation described above, because our process does not quit while the thread is outstanding. However, the system provides an API which sets up a callback to track thread creation and exit. Its usage differs from the previous callback initialization functions and their arguments: for this callback, there are two different APIs, one to enable it and one to disable it.

To enable the callback, we call the PsSetCreateThreadNotifyRoutine API and pass the callback function address.

C++

// Set Thread Callback
PsSetCreateThreadNotifyRoutine(CreateThreadNotifyRoutine);

We enable the callback in the DriverEntry implementation. To disable our callback, we use the PsRemoveCreateThreadNotifyRoutine API and pass the same callback address. The actual callback function looks as follows.

C++

EXTERN_C VOID CreateThreadNotifyRoutine(HANDLE ProcessId, 
                                        HANDLE ThreadId, BOOLEAN Create) {
    PAGED_CODE();
    // HANDLE values are pointer-sized, so widen them explicitly for %llu
    DbgPrint("Thread %llu in Process: %llu %s\n", 
        (ULONGLONG)(ULONG_PTR)ThreadId, (ULONGLONG)(ULONG_PTR)ProcessId,
        Create ? "Started" : "Exits");
}

As arguments, we receive the process ID, the thread ID and a boolean flag which tells whether the thread is being created or is exiting. We add the DbgPrint call to track how that callback works.

In the actual callback, we need to track the number of threads of our process as threads start and exit, and once only our kernel thread remains active, close it so that the application can quit properly. Let’s extend the thread callback functionality for that.

C++

BOOLEAN bClose = FALSE;
if (STATUS_SUCCESS ==
    KeWaitForSingleObject(&s_LockUserHandle, Executive, KernelMode, FALSE, NULL)) {
    // Our Process Threads Changed
    if (s_pThreadObject && ProcessId && s_hUserPID == ProcessId) {
        if (!Create) {
            if (1 >= --s_nThreadsCount) {
                bClose = TRUE;
            }
        }
        else {
            s_nThreadsCount++;
        }
    }
    KeReleaseMutex(&s_LockUserHandle, FALSE);
}
// Finally close user handle
if (bClose) {
    CloseUserOutputHandle();
}

In the function, we check the threads of the specified process only. When a thread is created, we increment the thread counter; when a thread exits, we decrement it.
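
The counting rule can be exercised in isolation with a short user mode sketch. It leaves out the mutex and the s_pThreadObject check from the driver code and keeps only the arithmetic; the ThreadTracker structure and its member names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

struct ThreadTracker {
    std::uint64_t targetPid;   // process that owns the output handle
    int           threadCount; // current number of threads, kernel thread included

    // Mirrors the callback logic: returns true when the output should be
    // closed because only the driver-created thread is left.
    bool OnThreadEvent(std::uint64_t processId, bool create) {
        // Ignore other processes, as the driver code does
        if (processId == 0 || processId != targetPid) return false;
        if (create) {
            ++threadCount;
            return false;
        }
        // Same test as "1 >= --s_nThreadsCount"
        return --threadCount <= 1;
    }
};
```

For a process that starts with three threads (two user mode threads plus the kernel thread), the second thread exit brings the counter down to one and triggers the close.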

Getting Number of Threads for the Process

For the logic above to work, we need the initial number of threads of the process (the s_nThreadsCount variable in the code) at the moment the IOCTL enables output to the handle. In user mode, the Tool Help library API allows us to do that, but it is not available in the kernel. So maybe there is another API which we can use? Yes: that function is NtQuerySystemInformation for user mode applications and ZwQuerySystemInformation for kernel mode. We should request SystemProcessInformation as the SYSTEM_INFORMATION_CLASS. On success, the function fills the buffer allocated by the caller with SYSTEM_PROCESS_INFORMATION structures. If the buffer is too small, the function fails with the code STATUS_INFO_LENGTH_MISMATCH and returns the number of bytes required to hold all the information.

C++

NTSTATUS Status;
ULONG Length = 0x10000;
PVOID p = NULL;
// Allocate memory 
while (TRUE) {
    Status = STATUS_NO_MEMORY;
    ULONG Size = Length;
    p = realloc(p,Size);
    if (p) {
        Status = NtQuerySystemInformation(SystemProcessInformation,p,Size,&Length);
        if (Status == STATUS_INFO_LENGTH_MISMATCH) {
            // Align Memory
            Length = (Length + 0x1FFF) & 0xFFFFE000;
            continue;
        }
    }
    break;
}
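
The "Align Memory" step in the loop rounds the required length up to the next multiple of 0x2000 bytes (8 KB) before the buffer is reallocated. A minimal sketch of the same expression, with a hypothetical function name, makes the effect visible:

```cpp
#include <cassert>
#include <cstdint>

// Same expression as in the loop above: add (0x2000 - 1), then clear
// the low 13 bits, which rounds up to a multiple of 0x2000.
constexpr std::uint32_t RoundUpTo8K(std::uint32_t length) {
    return (length + 0x1FFFu) & 0xFFFFE000u;
}

static_assert(RoundUpTo8K(0x10000) == 0x10000, "already aligned values stay the same");
static_assert(RoundUpTo8K(0x10001) == 0x12000, "one byte over moves to the next 8 KB step");
```

Rounding up leaves some slack in the buffer, which reduces the chance that another retry is needed if new processes or threads appear between two calls.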

Once we get the structures, we can enumerate them by shifting the structure pointer by the number of bytes specified in the NextEntryOffset field of the current structure. If that value is equal to zero, we have reached the last entry. The following code example displays the processes and their thread counts in a user mode application.

C++

SYSTEM_PROCESS_INFORMATION * pi = (SYSTEM_PROCESS_INFORMATION *)p;
UCHAR * end = (UCHAR*)p + Length;
// Check each process information structure
while (pi && (UCHAR*)pi < end) {
    WCHAR temp[512] = {0};
    memset(temp,0x00,sizeof(temp));
    Length = pi->ImageName.Length;
    // Copy FileName
    if (pi->ImageName.Buffer && Length) {
        if (Length > sizeof(temp) - 2) {
            Length = sizeof(temp) - 2;
        }
        memcpy_s(temp,sizeof(temp),pi->ImageName.Buffer,Length);
    }
    // Print Output Process Information
    wprintf(L"Process [%d]\t'%s' Threads: %d\n", 
        HandleToUlong(pi->UniqueProcessId),temp, pi->NumberOfThreads);
    // Last entry 
    if (!pi->NextEntryOffset) {
        break;
    }
    // Shift to next structure
    pi = (SYSTEM_PROCESS_INFORMATION *)(((UCHAR*)pi) + pi->NextEntryOffset);
}

You can see the result of the code execution in the next picture.
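
The NextEntryOffset walk itself does not depend on Windows, so it can be tried on a hand-made buffer. The sketch below uses a deliberately simplified record (the real SYSTEM_PROCESS_INFORMATION has many more fields), and the names ProcessRecord, FindThreadCount and MakeSampleBuffer are hypothetical. The point it illustrates is the order of the steps: compare the current record first, then check NextEntryOffset for zero, and only then advance, so that the last record is not skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for SYSTEM_PROCESS_INFORMATION (assumption)
struct ProcessRecord {
    std::uint32_t NextEntryOffset;  // bytes to the next record, 0 = last
    std::uint32_t NumberOfThreads;
    std::uint64_t UniqueProcessId;
};

// Returns the thread count for the given process ID, or -1 if not found.
int FindThreadCount(const std::vector<unsigned char>& buffer, std::uint64_t pid) {
    std::size_t offset = 0;
    while (offset + sizeof(ProcessRecord) <= buffer.size()) {
        ProcessRecord rec;
        std::memcpy(&rec, buffer.data() + offset, sizeof(rec));
        if (rec.UniqueProcessId == pid) {
            return static_cast<int>(rec.NumberOfThreads);
        }
        if (rec.NextEntryOffset == 0) {
            break;  // last entry was already compared above
        }
        offset += rec.NextEntryOffset;
    }
    return -1;
}

// Builds three records with different strides, like the variable-sized
// entries returned by the real API.
std::vector<unsigned char> MakeSampleBuffer() {
    const ProcessRecord records[] = { {24, 3, 100}, {40, 5, 200}, {0, 7, 300} };
    std::vector<unsigned char> buffer;
    for (const ProcessRecord& rec : records) {
        std::size_t start = buffer.size();
        std::size_t span = rec.NextEntryOffset ? rec.NextEntryOffset : sizeof(rec);
        buffer.resize(start + span, 0);
        std::memcpy(buffer.data() + start, &rec, sizeof(rec));
    }
    return buffer;
}
```

With this sample buffer, the lookup for the process with ID 300 succeeds even though it is the final record, and an unknown ID yields -1.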

We can do the same in C# by making a wrapper for the NtQuerySystemInformation function and the SYSTEM_PROCESS_INFORMATION structure. First, we allocate memory.

int Status;
int Length = 0x10000;
IntPtr p = IntPtr.Zero;
// Allocate memory 
while (true)
{
    Status = STATUS_NO_MEMORY;
    int Size = Length;
    p = Marshal.ReAllocCoTaskMem(p, Size);
    if (p != IntPtr.Zero)
    {
        ZeroMemory(p, Size);
        Status = NtQuerySystemInformation(SystemProcessInformation, p, Size, out Length);
        if (Status == STATUS_INFO_LENGTH_MISMATCH)
        {
            // Align Memory
            Length = (int)(((long)Length + 0x1FFF) & 0xFFFFE000);
            continue;
        }
    }
    break;
}

And then, iterate through the resulting process information structures.

IntPtr pi = p;
IntPtr end = p + Length;
// Check each process information structure
while (pi != IntPtr.Zero && pi.ToInt64() < end.ToInt64())
{
    var info = Marshal.PtrToStructure<SYSTEM_PROCESS_INFORMATION>(pi);
    string temp = "";
    // Copy FileName
    if (info.ImageName.Length > 0 
        && info.ImageName.Length < info.ImageName.MaximumLength)
    {
        temp = info.ImageName.Buffer;
    }
    // Print Output Process Information
    Console.WriteLine("Process [{0}]\t'{1}' Threads: {2}",
        info.UniqueProcessId.ToInt32(), temp, info.NumberOfThreads);
    // Last entry 
    if (info.NextEntryOffset == 0)
    {
        break;
    }
    // Shift to next structure
    pi = pi + info.NextEntryOffset;
}

Once we run the code, the result is the same as in the C++ sample.
So now we are ready to integrate that into the driver's code and initialize the s_nThreadsCount variable mentioned above.

C++

NTSTATUS Status = STATUS_SUCCESS;
typedef NTSTATUS(NTAPI * PFN_ZwQuerySystemInformation)(ULONG, PVOID, ULONG, PULONG);
UNICODE_STRING Name = { 0 };
RtlInitUnicodeString(&Name, L"ZwQuerySystemInformation");
PFN_ZwQuerySystemInformation ZwQuerySystemInformation = 
            (PFN_ZwQuerySystemInformation)MmGetSystemRoutineAddress(&Name);
if (ZwQuerySystemInformation) {
    ULONG Length = 0x10000;
    PVOID p = NULL;
    while (TRUE) {
        ULONG Size = Length;
        p = ExAllocatePool(NonPagedPool, Size);
        if (p) {
            Status = ZwQuerySystemInformation
                     (SystemProcessInformation, p, Size, &Length);
            if (Status != STATUS_INFO_LENGTH_MISMATCH) {
                break;
            }
            ExFreePool(p);
            p = NULL;
            Length = (Length + 0x1FFF) & 0xFFFFE000;
        }
        else {
            Status = STATUS_NO_MEMORY;
            break;
        }
    }
    if (NT_SUCCESS(Status)) {
        Status = STATUS_NOT_FOUND;
        SYSTEM_PROCESS_INFORMATION * pi = (SYSTEM_PROCESS_INFORMATION *)p;
        UCHAR * end = (UCHAR *)p + Length;
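        while (pi && (UCHAR *)pi < end) {
            if (pi->UniqueProcessId == hPID) {
                nCount = pi->NumberOfThreads;
                Status = STATUS_SUCCESS;
                break;
            }
            // Check for the last entry before advancing,
            // so the final record is compared as well
            if (!pi->NextEntryOffset) {
                break;
            }
            pi = (SYSTEM_PROCESS_INFORMATION *)(((UCHAR *)pi) + pi->NextEntryOffset);
        }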
    }
    if (p) {
        ExFreePool(p);
    }
}

We use the MmGetSystemRoutineAddress API to get the ZwQuerySystemInformation function pointer dynamically. We put that implementation into a separate function and call it from the IOCTL handler to initialize the thread count variable.

C++

s_nThreadsCount = GetProcessThreadsCount(pid);

After those changes, we can see that the driver properly handles unexpected application exits and closes the internal kernel thread.

Code Samples

The code for this part is available for download. If you want to try out the sample drivers for this part, they can be compiled from the sources. The project is configured to compile with the WDK toolset from Visual Studio. During the build, it creates the test certificate drvN.cer and signs the driver. To be able to use the driver, you need to install that certificate on your PC and enable test mode on the system, or disable driver signature checking. The driver test application works only when run as administrator, as it loads and unloads the driver, which requires admin rights.

History

5th July, 2023: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
