The source code for this article can be downloaded from http://www.starjourneygames.com/demo part 3.zip
Beginning the Main Module
In 64-bit assembly, unlike 32-bit assembly, relevant processor directives (such as .686P) are few and far between; I have not yet had occasion to use one. With this being the case, the main module begins with the following:
include constants.asm ;
include externals.asm ;
include macros.asm ;
include structuredefs.asm ;
include wincons.asm ;
.data ;
include lookups.asm ;
include riid.asm ;
include routers.asm ;
include strings.asm ;
include structures.asm ;
include variables.asm ;
.code ;
Utility files, such as typedefs, structure declarations, and macros, are included before the .data directive. These files contain no code or data; they only define macros, structures, and constants for the assembler, and they generate no actual output. That is why they can precede .data: at that point the assembler doesn't yet have any idea where to place generated output and thus would reject actual code or data.
The entry point (discussed next) is selected by name at link time, not by position in the file, so the ordering of the code and data sections is irrelevant. This app places data first.
Each of the include files listed above is self-explanatory as to its content. The accompanying source code contains the complete files.
The Windows Entry Point
An application's WinMain function isn't really its entry point. In an assembly language application, the true entry point is whichever procedure is named in the linker's /entry option (Startup in this article's sample code). WinMain is purely superfluous and is not actually required at all. I don't use it; in an assembly language application it provides no benefit over creating the app without it. While one could argue that the parameters passed to WinMain might be critical to initializing an application, that argument is zero sum when it comes to an assembly app because you, the developer, need to set up that call to WinMain if you're going to use it. If you have to retrieve all the information typically sent to WinMain before calling it, why use the function at all?
That said, there is certainly no measurable harm in including WinMain if you want to do it. You may have sound reasons for wanting to include it, so add it if you feel it's necessary. Just be aware that your own startup code, which is where execution actually begins, must manually set up the WinMain call.
To call WinMain, the nCmdShow parameter is retrieved from the wShowWindow field of the STARTUPINFO structure that GetStartupInfo fills in.
GetCommandLine returns the lpCmdLine parameter; pass that value as-is to WinMain.
hPrevInstance is always null, per MSDN documentation for WinMain.
hInstance is retrieved by calling GetModuleHandle (0).
Below is the complete initialization source for calling WinMain. Note that even if you're not going to include WinMain, you may still need some or all of the parameters that are passed to it. The discussion that follows covers retrieval of this information, with or without implementation of WinMain.
Declare the STARTUPINFO structure in structuredefs.asm (or wherever you prefer to put it):
STARTUPINFO struct
cb qword sizeof ( STARTUPINFO )
lpReserved qword ?
lpDesktop qword ?
lpTitle qword ?
dwX dword ?
dwY dword ?
dwXSize dword ?
dwYSize dword ?
dwXCountChars dword ?
dwYCountChars dword ?
dwFillAttribute dword ?
dwFlags dword ?
wShowWindow word ?
cbReserved2 word 3 dup ( ? )
lpReserved2 qword ?
hStdInput qword ?
hStdOutput qword ?
hStdError qword ?
STARTUPINFO ends
In the externals.asm file, declare the functions to be called for initialization:
extrn __imp_GetCommandLineA:qword
GetCommandLine textequ <__imp_GetCommandLineA>
extrn __imp_GetModuleHandleA:qword
GetModuleHandle textequ <__imp_GetModuleHandleA>
extrn __imp_GetStartupInfoA:qword
GetStartupInfo textequ <__imp_GetStartupInfoA>
If you’re using Unicode, you should declare GetCommandLineW, GetModuleHandleW, and GetStartupInfoW instead of the “A” functions shown above.
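For a Unicode build, the same three declarations might look like the sketch below. It assumes the alias names on the left are kept unchanged, so the rest of the source does not need to be touched; only the imported symbols differ.

```asm
; Unicode variants of the three startup imports (sketch; alias names kept from the ANSI version)
extrn __imp_GetCommandLineW:qword
GetCommandLine textequ <__imp_GetCommandLineW>
extrn __imp_GetModuleHandleW:qword
GetModuleHandle textequ <__imp_GetModuleHandleW>
extrn __imp_GetStartupInfoW:qword
GetStartupInfo textequ <__imp_GetStartupInfoW>
```

With these in place, any string handed to or returned from these functions is a wide string, so it must be declared and scanned as words rather than bytes.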
With the required functions now a known quantity to the assembler, variables need to be declared for holding the parameters to pass to WinMain. These are placed in the file variables.asm:
hInstance qword ?
lpCmdLine qword ?
In the file structures.asm, declare the STARTUPINFO instance:
startup_info STARTUPINFO <> ; cb is already set in the structure declaration
The entry point to the application can then be coded as follows (Startup can be renamed to anything you like - if you're going to call it WinMain, be aware that it doesn't inherently conform to the documentation for WinMain):
.code
align qword
Startup proc ; Declare the startup function; this is declared as /entry in the linker command line
local holder:qword ; Required for the WinCall macro
xor rcx, rcx ; The first parameter (NULL) always goes into RCX
WinCall GetModuleHandle, 1, rcx ; 1 parameter is passed to this function
mov hInstance, rax ; RAX always holds the return value when calling Win32 functions
WinCall GetCommandLine, 0 ; No parameters on this call
mov lpCmdLine, rax ; Save the command line string pointer
lea rcx, startup_info ; Set lpStartupInfo
WinCall GetStartupInfo, 1, rcx ; Get the startup info
xor rax, rax ; Zero all bits of RAX
mov ax, startup_info.wShowWindow ; Get the incoming nCmdShow
; Since this is the last "setup" call, there is no reason to place nCmdShow into a memory variable then
; pull it right back out again to pass in a register. Register-to-register moves are far faster than
; memory accesses, so all that needs to be done is to move RAX into R9 for the call to WinMain.
mov r9, rax ; Set nCmdShow
mov r8, lpCmdLine ; Set lpCmdLine
xor rdx, rdx ; Zero RDX for hPrevInst
mov rcx, hInstance ; Set hInstance
call WinMain ; Execute call to WinMain
xor rax, rax ; Zero final return – or use the return from WinMain
ret ; Return to caller
Startup endp ; End of startup procedure
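One detail worth knowing about nCmdShow: per the Windows documentation for STARTUPINFO, wShowWindow is only meaningful when the STARTF_USESHOWWINDOW bit is set in dwFlags. The fragment below is a hedged sketch of how the nCmdShow load in Startup could account for that. The two equates are not part of the article's source and would need to be added (for example, to wincons.asm), and show_ready is a label name invented for this illustration.

```asm
STARTF_USESHOWWINDOW equ 00000001h                 ; Value from the Windows headers
SW_SHOWDEFAULT       equ 10                        ; Value from the Windows headers

 mov   r9d, SW_SHOWDEFAULT                         ; Assume no explicit show state was supplied
 test  startup_info.dwFlags, STARTF_USESHOWWINDOW  ; Is wShowWindow valid?
 jz    show_ready                                  ; No; keep the default
 movzx r9d, startup_info.wShowWindow               ; Yes; zero-extend the word into R9
show_ready:                                        ; R9 now holds nCmdShow
```

Writing to R9D clears the upper half of R9, so no separate zeroing instruction is needed.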
NOTE: If you're strictly adhering to 64-bit calling convention, then the startup code's call to WinMain should use the WinCall macro and not a direct call instruction. I don't do this in my apps, and I won't be doing it in this sample code. Stack usage means memory usage, and my first rule of programming is to avoid memory hits. When accessing the shadow area on the stack for RCX, RDX, R8, and R9 during a call, indirect addressing must be used, which further slows the app. As mentioned earlier, the trepidation over violating the 64-bit calling convention may be strong; however, it is unfounded as far as necessity is concerned. I simply see no benefit to using that convention within my app's local functions: it requires a relatively large amount of setup code, which, across an entire app, takes too much time to execute; it costs memory, and it adds to development time. Any function declared within my app (including this one) does not use the 64-bit calling convention directly. Parameters are still passed in RCX, RDX, R8, and R9 for the first four, but space for shadowing them is not reserved on the stack. For additional parameters, I simply use other registers. The stack is not used for any parameter data. Some will call this reckless, but “reckless” is only relative to the standard it is being compared against.
Further, in my own apps, calls to local functions use a single pointer (in RCX) pointing to a structure that holds all the parameters to be passed to that function. Doing things any other way, the first thing most local functions would need to do would be to save the incoming parameters in local variables for later access; from the CPU’s perspective this is little different from using the 64-bit convention as it was created. I’m not doing that here; individual registers carry the parameters into local functions as they’re called. The reason for this is that from a tutorial standpoint it’s much more confusing to work with a pointer to an array of parameters for every function, many of which will be pointers to other things. When is enough, enough? So it is again reiterated: modify the code as required to suit your own preferences.
The Almighty RAX
Born in the earliest recorded incarnations of Intel CPU design, the RAX register (building on EAX, which built on AX) always holds the return value from a function. I have not seen an exception to this rule in any WinAPI function or method, no matter what it is. Even drivers use it across the board. All functions within any application I write do the same, so it will apply here: no matter what you call, when you call it, or where you call it from, RAX holds the return value.
Registering the Window Class
To create the app's main window, its window class has to be registered. This necessitates declaring the WNDCLASSEX structure, which in this sample code will be done in the file structuredefs.asm. As always, you're free to move things around and rename files as you see fit.
The following is placed in the structuredefs.asm file:
WNDCLASSEX struct
cbSize dword ?
dwStyle dword ?
lpfnCallback qword ?
cbClsExtra dword ?
cbWndExtra dword ?
hInst qword ?
hIcon qword ?
hCursor qword ?
hbrBackground qword ?
lpszMenuName qword ?
lpszClassName qword ?
hIconSm qword ?
WNDCLASSEX ends
Nesting structures will be covered when the subject of DirectX structures is delved into – when there are actual examples to work with.
Mind your D’s and Q’s – be careful when copying code to accurately type dword and qword declarations. One typo here could (and probably would) crash the app.
Strictly for the sake of keeping the source readable, I use a single constant to represent long chains of flags that are logically OR’d together. In this app, WNDCLASSEX.dwStyle will equate to:
classStyle equ CS_VREDRAW OR CS_HREDRAW OR CS_DBLCLKS OR CS_OWNDC OR CS_PARENTDC
(The CS_xxx constants come from the Windows header files; they're declared in wincons.asm in the source code attached to this article.)
OR is an operator reserved by the assembler; it functions the same as the | character in most other languages. I add the above line to my constants.asm file; you can place it (if you use it at all) wherever you like.
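For reference, the class style entries in wincons.asm might look like the sketch below. The numeric values are the ones defined in the Windows headers; the layout of the file itself is an assumption.

```asm
; Class style values as defined in the Windows headers
CS_VREDRAW  equ 0001h
CS_HREDRAW  equ 0002h
CS_DBLCLKS  equ 0008h
CS_OWNDC    equ 0020h
CS_PARENTDC equ 0080h

; The combined constant is evaluated at assembly time and works out to 00ABh
classStyle  equ CS_VREDRAW OR CS_HREDRAW OR CS_DBLCLKS OR CS_OWNDC OR CS_PARENTDC
```

Because the expression is resolved by the assembler, no OR instructions are executed at runtime; the structure field simply receives the finished value.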
With this done, the actual WNDCLASSEX structure, which will be used to create the main window, can now be declared. In this sample code, it's done in the structures.asm file. All fixed data that can be placed directly into the structure fields is placed there. There's no reason to move data around any more than is required, so on-the-fly initialization where it isn't required makes no sense at all. With WNDCLASSEX, the hInst, hIcon, hCursor, hbrBackground, and hIconSm fields won't be known until runtime, so they're declared with ? and filled in later.
There are several options when declaring the actual data for the WNDCLASSEX structure (which will be named wcl). The most traditional form can be used:
wcl WNDCLASSEX <sizeof (WNDCLASSEX), [. . .]>
Using the format above, all fields must be accounted for if any are declared (however this will vary depending on the actual assembler being used).
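Written out in full, the traditional form could look like the single line below. This is a sketch that assumes ML64 accepts one initializer per field in declaration order; the runtime-only fields and the menu name are simply set to 0 here.

```asm
; cbSize, dwStyle, lpfnCallback, cbClsExtra, cbWndExtra, hInst, hIcon, hCursor, hbrBackground, lpszMenuName, lpszClassName, hIconSm
wcl WNDCLASSEX <sizeof WNDCLASSEX, classStyle, mainCallback, 0, 0, 0, 0, 0, 0, 0, mainClass, 0>
```

The comment line is there only to make the position of each value checkable against the structure definition, which is exactly the readability problem the field-per-line form avoids.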
The way data is declared in assembly language allows for some flexibility in declaring structures, and at least for me, this comes in handy. Declaring wcl as a label of type WNDCLASSEX allows me to list each field separately, while still having the debugger recognize the structure as WNDCLASSEX. In the source code, I prefer having access to the individual fields per-line; it keeps the source cleaner and makes things easier to read and update. Of course, doing things this way also opens the door to errors; if the field-by-field declaration doesn’t exactly match the WNDCLASSEX definition, there are going to be problems. So using this method may not be for everybody. If you don’t like it, just use the first form shown above.
wcl label WNDCLASSEX
dword sizeof ( WNDCLASSEX ) ; cbSize
dword classStyle ; dwStyle
qword mainCallback ; lpfnCallback
dword 0 ; cbClsExtra
dword 0 ; cbWndExtra
qword ? ; hInst
qword ? ; hIcon
qword ? ; hCursor
qword ? ; hbrBackground
qword 0 ; lpszMenuName (no class menu; mainName is the window title, not a menu resource)
qword mainClass ; lpszClassName
qword ? ; hIconSm
The function mainCallback is the callback function for the window class; it will be discussed next. The variable mainClass is the class name string, which I define in the strings.asm file, along with the main window name (window text), as:
mainClass byte 'DemoMainClass', 0 ; Main window class name
mainName byte 'Demo Window', 0 ; Main window title bar text
Note that the terminating 0 must be declared at the end of the string. The assembler doesn’t automatically place it.
From this point forward, assume that any constant used in Win32 calls must be declared somewhere in your source. I put my Win32 constants in the file wincons.asm, and constants that are unique to my app in the file constants.asm.
To fill in hCursor, LoadImage is called as follows. In constants.asm, declare:
lr_cur equ LR_SHARED OR LR_VGACOLOR OR LR_DEFAULTSIZE
This is an optional step. If you prefer to use the values directly, simply replace lr_cur below with LR_SHARED OR LR_VGACOLOR OR LR_DEFAULTSIZE.
xor r11, r11 ; Set cyDesired; uses default if zero: XOR R11 with itself zeroes the register
xor r9, r9 ; Set cxDesired; uses default if zero: XOR R9 with itself zeroes the register
mov r8, image_cursor ; Set uType
mov rdx, ocr_normal ; Set lpszName
xor rcx, rcx ; Set hInstance to 0 for a global Windows cursor
WinCall LoadImage, 6, rcx, rdx, r8, r9, r11, lr_cur ; Load the standard cursor
mov wcl.hCursor, rax ; Set wcl.hCursor
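The call above relies on a LoadImage declaration and a handful of constants that are not shown in the text. A hedged sketch of what those entries might look like follows; the numeric values come from the Windows headers, while the lowercase names simply mirror the ones used in the snippet above.

```asm
; externals.asm
extrn __imp_LoadImageA:qword
LoadImage textequ <__imp_LoadImageA>

; wincons.asm (values from the Windows headers)
image_icon     equ 1        ; IMAGE_ICON
image_cursor   equ 2        ; IMAGE_CURSOR
ocr_normal     equ 32512    ; OCR_NORMAL
LR_DEFAULTSIZE equ 0040h
LR_VGACOLOR    equ 0080h
LR_SHARED      equ 8000h
```

For a Unicode build, the import would be __imp_LoadImageW instead.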
NOTE: many data structures requiring runtime initialization will be of dword, or smaller, size. You must pay close attention to this, as assembly will not stop you from writing a qword into a dword location. It’ll simply overwrite the next four bytes after the dword when that isn’t what you want to do. If wcl.hCursor were a dword, then the 32-bit EAX, not the 64-bit RAX, would be written there. If it were a 2-byte word, then the 16-bit AX would be written, and if it were a single 8-bit byte, then AL would be written.
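To make the size matching concrete, the sketch below stores the same return value into fields of three different widths, using the wcl and startup_info declarations from this article. The stores are for illustration only; the last line is commented out because it shows the mistake, not the fix.

```asm
 mov wcl.hCursor, rax                  ; qword field: the full 64-bit RAX
 mov wcl.cbClsExtra, eax               ; dword field: the 32-bit EAX
 mov startup_info.wShowWindow, ax      ; word field: the 16-bit AX

; mov qword ptr wcl.cbClsExtra, rax    ; WRONG: writes 8 bytes, so it also overwrites cbWndExtra
```

The size override in the last line is what defeats the assembler's knowledge of the field; once it is there, nothing stops the extra four bytes from being written.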
There are countless references to Intel architecture registers online. Use them as needed. The hardware designers, as a predominant rule, are very consistent in their naming of things, so it won’t take long to memorize how the registers work. Just keep using them; the information will stick.
hInstance is assigned when the app starts up; anybody with any appreciable WinAPI experience knows that it’s used a lot. It’ll be required when initializing DirectX, among other things. It isn't used when loading the standard cursor because it's a stock object provided by Windows. If hInstance is specified, Windows will search the calling application for the resource. It won't find it, and the call will fail.
In the call above, the WinCall macro is invoked. LoadImage takes six parameters, so six parameters are specified. 64-bit calling convention requires that RCX, RDX, R8, and R9 hold the first four parameters, meaning these cannot be altered – other registers cannot be used in their place. R11 is arbitrary; any open register except RAX or R10 (which the WinCall macro uses internally) can be used to store the cyDesired parameter because it’s not one of the first four parameters. The WinCall macro will properly set up the stack for the call to LoadImage, caring nothing for which registers are used to carry values on entry into the macro. Just remember R10 is used by the macro itself as a taxi service so don't use it, or RAX, to give your parameters a sendoff into WinCall.
For setting the hIcon and hIconSm fields, LoadImage is used the same as shown above; just change the cxDesired and cyDesired parameters to the appropriate dimensions of the icon; uType is set to image_icon, and of course lpszName is set to the name of the resource as it’s declared in your resource file. hInstance must be set to the application's hInstance, since these resources are part of the application - they are not Windows-global. The resource file is compiled by RC.EXE so nothing changes from its format in any C++ application (or any other language that handles resource files the same as C++). The resource file lines for the icons are shown below:
LARGE_ICON ICON DISCARDABLE "LargeIcon.ico"
SMALL_ICON ICON DISCARDABLE "SmallIcon.ico"
In the assembly source, declare the names of the resources (I place these in the strings.asm file):
LargeIconResource byte 'LARGE_ICON', 0 ;
SmallIconResource byte 'SMALL_ICON', 0 ;
Pointers to the variables LargeIconResource and SmallIconResource are passed in turn to each of the two LoadImage calls that load the icons. The results are placed into wcl.hIcon and wcl.hIconSm respectively. These are qword fields so RAX, on return from LoadImage, is assigned to each. To actually load the pointers to the strings, use the assembly lea instruction. This is “load effective address;” it was designed to perform on-the-fly calculation of memory address, and it will be used to its full potential later in the app. For now, it simply loads the pointer with no calculation:
lea rdx, LargeIconResource ;
lea is used here rather than “mov rdx, offset LargeIconResource” because lea encodes the address relative to the instruction pointer, which is more compact than loading a full 64-bit address as an immediate value.
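Putting the icon steps together, the two LoadImage calls could be sketched as follows. The 32x32 and 16x16 dimensions are assumptions made for this illustration, and fuLoad is passed as 0 (LR_DEFAULTCOLOR); adjust both to suit your own icons. The final two instructions fill in wcl.hInst, which is also unknown until runtime.

```asm
 mov r11, 32                                        ; Set cyDesired (assumed large icon height)
 mov r9, 32                                         ; Set cxDesired (assumed large icon width)
 mov r8, image_icon                                 ; Set uType
 lea rdx, LargeIconResource                         ; Set lpszName
 mov rcx, hInstance                                 ; The resource lives in this application
 WinCall LoadImage, 6, rcx, rdx, r8, r9, r11, 0     ; Load the large icon
 mov wcl.hIcon, rax                                 ; Set wcl.hIcon

 mov r11, 16                                        ; Set cyDesired (assumed small icon height)
 mov r9, 16                                         ; Set cxDesired (assumed small icon width)
 mov r8, image_icon                                 ; Set uType
 lea rdx, SmallIconResource                         ; Set lpszName
 mov rcx, hInstance                                 ; The resource lives in this application
 WinCall LoadImage, 6, rcx, rdx, r8, r9, r11, 0     ; Load the small icon
 mov wcl.hIconSm, rax                               ; Set wcl.hIconSm

 mov rax, hInstance                                 ; Memory-to-memory moves don't exist, so go through RAX
 mov wcl.hInst, rax                                 ; Set wcl.hInst
```

All registers are reloaded before the second call because none of the volatile registers can be assumed to survive the first one.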
With all the above handled, the window class can now be registered:
lea rcx, wcl ; Set lpWndClass
WinCall RegisterClassEx, 1, rcx ; Register the window class (RegisterClassEx must be declared in externals.asm)
The return value is not used by this app.
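The Win32 function that registers a WNDCLASSEX is RegisterClassEx (RegisterClassExA for ANSI builds). A declaration in the same style as the other externals, plus an optional failure check, might look like the sketch below. The alias name and the register_failed label are assumptions made for this illustration.

```asm
; externals.asm
extrn __imp_RegisterClassExA:qword
RegisterClassEx textequ <__imp_RegisterClassExA>

; Startup code
 lea  rcx, wcl                       ; Set lpWndClass
 WinCall RegisterClassEx, 1, rcx     ; Register the window class
 test ax, ax                         ; The returned class atom is 16 bits; zero means failure
 jz   register_failed                ; Hypothetical label for error handling
```

Even if the atom itself is never used, a zero return is the earliest sign that a field in wcl is wrong.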
Registering the window class finally allows creation of the window that will serve as the DirectX render target. Win32 is Win32 (64 bits or otherwise) so logically the window creation process continues the same as it would in a C++ application.
The complete startup code is in the attached source for this article.
Your Window to the Future
This section will focus primarily on the callback function for the main window. The main difference between this app and the average C++ application is that here, the switch statement will not be used. I have always disliked that statement, finding it clunky, primitive, and quite inefficient (given that I look at everything from the viewpoint of what is actually executing on the CPU).
In place of switch, the CPU's group of scan instructions is used. These are CPU-level instructions that scan a given array of bytes (or words, dwords, or qwords) in memory until a match is found (or not). The value to scan for is always contained in RAX for qwords, EAX for dwords, AX for words and AL for bytes. RCX holds the number of qwords, etc. to scan, and RDI points to the location in memory to begin scanning.
One of the very convenient uses for the scan instructions is sizing a string. The following code performs this task:
mov rdi, <location to scan> ; Set the location to begin scanning
mov rcx, -1 ; Set the max unsigned value for the # of bytes to scan
xor al, al ; Zero AL as the value to scan for
repnz scasb ; Stop scanning when a match is encountered or the RCX counter reaches zero
not rcx ; Apply a logical NOT to the negative RCX count
dec rcx ; Adjust for overshot (the actual match of 0 is counted and needs to be undone)
Whatever is encountered in memory that stops the scan (if repnz is used, for “repeat while not zero [repeat while the CPU’s zero flag is clear]”), RDI will point at the next byte, word, dword, or qword after that value. In the sample code above, when the scan completes, RDI will point at the byte immediately after the string’s terminating zero. For Unicode strings, the code would look like this:
mov rdi, <location to scan> ; Set the location to begin scanning
mov rcx, -1 ; Set the max unsigned value for the # of words to scan
xor ax, ax ; Wide strings use a 16-bit word as a terminator
repnz scasw ; Scan words, not bytes
not rcx ; Apply a logical NOT to the negative RCX count
dec rcx ; Adjust for overshot
RCX then holds the length of the string. However, all good things come with a down side; if you forget to add the terminating 0 after a string (whether that string is wide or ANSI), the wrong size will be returned as the scan will continue until either RCX reaches 0 (that’s potentially a whole lot of bytes to scan), or the next 0 value is encountered somewhere in memory after the scan start position. Still, in such a situation, forgetting the terminating 0 is not going to end well regardless of the approach used to sizing the string.
In my apps, I typically precede all strings with a size qword so that I can simply lodsq that size qword into RAX, leaving RSI pointing at the string start. (All the lods? instructions move data into RAX, EAX, AX, or AL.) If a string is static and isn't going to change (e.g. the window class name), there is no point in sizing it at all at runtime, let alone multiple times, when the correct size can be set at assembly time.
You won't always want to use this approach. If you have a limited-size buffer and want to be sure you don't scan beyond it, then you'll have to set RCX to the buffer size. However, doing this will force you to calculate the string size by either subtracting the ending count in RCX from the starting count (then subtracting one for the terminator), or by subtracting the starting pointer from the ending pointer in RDI (again minus one, since RDI ends up one byte past the terminator).
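A bounded scan using the count-based calculation can be sketched as follows. The names buffer, buffer_size, and not_terminated are hypothetical; buffer_size is assumed to be a nonzero constant, and the direction flag is assumed to be clear, as it normally is under Windows.

```asm
 lea rdi, buffer          ; Set the location to begin scanning (hypothetical byte array)
 mov rcx, buffer_size     ; Limit the scan to the buffer (hypothetical nonzero constant)
 mov rdx, rcx             ; Remember the starting count
 xor al, al               ; Zero AL as the value to scan for
 repnz scasb              ; Stop on a match or when RCX reaches zero
 jnz not_terminated       ; Zero flag clear: no terminator inside the buffer (hypothetical label)
 sub rdx, rcx             ; Bytes examined, including the terminator
 dec rdx                  ; RDX now holds the string length
```

The jnz test is the part the unbounded version has no need for: with a finite count, running out of buffer and finding the terminator are two different outcomes, and only the zero flag tells them apart.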
Call Me Right Back, K?
Regarding the window callback, a lookup table contains all the messages handled by the callback. This table always begins with a qword that holds the entry count. The offset into the table is then calculated, and the same offset into a corresponding router table holds the location of the code for handling that message.
mov rax, rdx ; Set the value to scan (incoming message)
lea rdi, message_table ; Point RDI @ the table start (entry count qword)
mov rcx, [ rdi ] ; Load the entry count
scasq ; Skip over the entry count qword
mov rsi, rdi ; Save pointer @ table start
repnz scasq ; Scan until match found or counter reaches 0
jnz call_default ; No match; use DefWindowProc
sub rdi, rsi ; Get the offset from the first entry of the table (not including entry count qword)
lea rax, message_router ; Point RAX at the base of the router table
call qword ptr [ rax + rdi - 8 ] ; Call the handler
For this app, only four messages will be initially handled: WM_ERASEBKGND, WM_PAINT, WM_CLOSE, and WM_DESTROY.
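As with every other Win32 constant, these four message identifiers have to be declared somewhere in the source. A sketch of the wincons.asm entries, using the values from the Windows headers:

```asm
; Window message identifiers as defined in the Windows headers
WM_DESTROY    equ 0002h
WM_PAINT      equ 000Fh
WM_CLOSE      equ 0010h
WM_ERASEBKGND equ 0014h
```

The lookup table stores each of these as a qword so that the incoming message in RDX can be compared directly with scasq.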
The handler for WM_ERASEBKGND does nothing but return TRUE (1) in RAX. Returning TRUE for this message tells Windows that all handling for the message is complete and nothing more needs to be done. (Relative to all Windows messages a callback might process, the value that your app must return is FALSE (0) far more often than not, but it’s never safe to assume – always check the documentation for each message handled; some require a very different and specific return value depending on how you handled the message. This is especially true in dialog boxes, in particular with NM_ notifications sent through WM_NOTIFY.)
There are cases where an application engaging in complex drawing operations may want to know when the background is being erased, but even these odd men out may no longer exist. The WM_ERASEBKGND message itself is an ancient artefact of a bygone era. In the earliest days of Windows, just drawing a stock window with a frame could seriously tax the graphics adapter. As such, a slew of tricks and compensations had to be employed to speed up the process when and where that could be done. One of these innovations was the idea of identifying the exact portion of a window’s client area that actually needed to be redrawn. This “update region” often does not include the entire client area, and any time a few CPU cycles could be saved, it was worth doing.
So the concept of “update areas” or “update regions” (a region is a collection of rectangles) was created. Within the client area, a region was declared as “invalid,” meaning it had to be redrawn. In modern times, the code required to process update regions is arguably slower and bulkier than simply redrawing the entire client area off screen then drawing it in one shot onto the window itself. DirectX employs this “double buffering” technique extensively to avoid flicker, which comes from “erase then redraw” on-screen. Still, even in Windows 10, the remnants of the old window painting system remain; the handler for the WM_PAINT message must explicitly “validate” the update area. If it doesn't, WM_PAINT messages will repeat forever, flowing into a window's callback function fast and furious, dragging down the performance of the entire application. This sample code's WM_PAINT handler uses the Win32 ValidateRect function as its only task, since DirectX takes over all drawing in the window client area.
A Volatile Situation
Note that per MSDN, the following registers are considered nonvolatile – any function called will preserve their values across its entire execution:
R12, R13, R14, R15; RDI, RSI, RBX, RBP, RSP
These must be saved then restored if they’re altered in the window callback (or any other) function. For this application, all of the nonvolatile registers are saved regardless of the message being handled - saving is done before the incoming message is even looked at. You may or may not want to alter this behavior in your own application’s internal functions, but when you call any Windows function, you have to be aware that the contents of volatile registers can never be relied on to persist across the call. One function or another may indeed leave a volatile register unchanged on its return, but specs are specs and that behavior could easily change at any time, especially given the any-time-is-a-good-time update policy for Windows 10.
The complete function for the window callback is shown below, noting that the assembler will handle saving and restoring RBP and RSP for each function declared:
mainCallback proc ;
local holder:qword, hwnd:qword, message:qword, wParam:qword, lParam:qword
; Save nonvolatile registers
push rbx ;
push rsi ;
push rdi ;
push r12 ;
push r13 ;
push r14 ;
push r15 ;
; Save the incoming parameters
mov hwnd, rcx ;
mov message, rdx ;
mov wParam, r8 ;
mov lParam, r9 ;
; Look up the incoming message
mov rax, rdx ; Set the value to scan (incoming message)
lea rdi, message_table ; Point RDI @ the table start (entry count qword)
mov rcx, [ rdi ] ; Load the entry count
scasq ; Skip over the entry count qword
mov rsi, rdi ; Save pointer @ table start
repnz scasq ; Scan until match found or counter reaches 0
jnz call_default ; No match; use DefWindowProc
sub rdi, rsi ; Get the offset from the first entry of the table (not including entry count qword)
lea rax, message_router ; Point RAX at the base of the router table
call qword ptr [ rax + rdi - 8 ] ; Call the handler
jmp callback_done ; Skip default handler
call_default: ; The only changed register holding incoming parameters is RCX so only reset that
mov rcx, hwnd ; Set hWnd (the local is declared as hwnd)
WinCall DefWindowProc, 4, rcx, rdx, r8, r9 ; Call the default handler
callback_done: pop r15 ;
pop r14 ;
pop r13 ;
pop r12 ;
pop rdi ;
pop rsi ;
pop rbx ;
ret ; Return to caller
mainCallback endp ; End procedure declaration
When popping registers, they must be popped in the exact reverse order from how they were saved (pushed). Intel architecture uses a LIFO stack model: last in, first out. Each push instruction (assuming a qword is pushed) first backs RSP up (moves it toward 0) by 8 bytes, then stores that qword in memory at the location RSP now points to. The “lowest” address on the stack is the “top” of the stack. Entry into this callback will place items on the stack in the order they're pushed: the lowest (closest to 0) address holds R15; the highest holds RBX (referencing the code above).
One of the most common bugs in an assembly language application is forgetting to pop an item that was pushed onto the stack. You’re not in Kansas anymore, Toto, so you have to do these things manually. Extra power carries extra responsibility. When this occurs, the return from the function (the ret instruction) will itself pop the qword off the top of the stack and jump to whatever address it holds. So inadvertently leaving even one qword on the stack will completely demolish the return from the function and typically send the app careening down to Mother Earth in flames.
The code above represents the entirety of the callback function (minus the individual message handlers) for the main window. Each handler can be thought of as a “subroutine” in the original BASIC language, for those who can remember that far back. The handler code for each message is actually part of the mainCallback function – it lives inside the function itself. Since all the handlers are coded after the ret instruction, they will never execute unless explicitly called.
Within the handlers, the only oddity is the use of the ret instruction to return from the handler into the main code of the function. From the assembler’s viewpoint, you can’t do this. The assembler sees ret and it assumes you’re returning from the function itself. As such, it will insert code that attempts to restore RBP and RSP, doing so at a location where you most definitely don’t want that happening. This is where I employ a semi-unorthodox method: I directly encode the ret statement as:
byte 0C3h ; Return to caller
Alternatively, you could use a textequ statement to change that to something more palatable, like:
HandlerReturn textequ <byte 0C3h> ;
Text equates evaporate beyond the source code level; the assembler will simply replace all occurrences of HandlerReturn with byte 0C3h. You just won’t have to look at it in your source code.
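As a sketch of how that reads in practice, here is the WM_ERASEBKGND handler written with the text equate. Where the equate is declared is an assumption; macros.asm would be a natural home for it.

```asm
HandlerReturn textequ <byte 0C3h>    ; Declared once (assumed to live in macros.asm)

 align qword                         ;
main_wm_erasebkgnd label near        ;
 mov rax, 1                          ; Set TRUE return
 HandlerReturn                       ; Expands to byte 0C3h: a bare ret with no epilogue
```

The generated bytes are identical to the hard-coded version; only the source changes.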
At the CPU level, the call instruction doesn't do anything with RBP or RSP directly. The instruction itself simply pushes the address of the next instruction after call, then jumps to wherever you're calling. Correspondingly, ret doesn't do anything except pop whatever value is waiting at the top of the stack and jump to that value, as an address. (The code to restore RBP, reset RSP [from its setup for local variable usage], and return is generated by the assembler.) The CPU has no concept of what a function is; the assembler gives all of that meaning completely separately from the CPU. So hard-coding 0C3h as the ret statement prevents the assembler from trying to helpfully crash your program by resetting RBP and RSP in preparation for a return from the function. “But wait, I'm still on the potty!” Encoding the byte directly in the code stream is just another benefit of the flexibility afforded by the very direct assembly language data model (which is “no model at all”).
If you don't like this method, you'll have to encode a separate function for each message you actually process. Then you can simply call as required, but you'll have to reload the parameters coming into the mainCallback function before doing so (to the extent that each handler needs the information). This, however, carries the extra overhead of setting up and tearing down a function (saving registers, etc.) as well as placing mainCallback's incoming parameters back into the required registers for passing to each message handling function. It all adds up to a lot of extra overhead just for the sake of ritual. It's hardly worth it. Avoiding this extra overhead is the reason for using “in-function handlers” for message processing. Each handler has direct access to the mainCallback local variables (which hold the incoming parameters) because, from the assembler's viewpoint, each handler is still part of mainCallback.
Accessing local variables – in particular, the incoming parameters to mainCallback – is no problem at all. Within each handler, you’re still in the “space” of the mainCallback function, therefore all its local variables are fully intact and accessible.
The lookup table message_table is shown below:
message_table qword (message_table_end - message_table_start) / 8
message_table_start qword WM_ERASEBKGND
qword WM_PAINT
qword WM_CLOSE
qword WM_DESTROY
message_table_end label byte ; Any size declaration will work; byte is used as the smallest choice
The router list for the callback function is:
message_router qword main_wm_erasebkgnd
qword main_wm_paint
qword main_wm_close
qword main_wm_destroy
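The two tables are parallel: entry n of message_table pairs with entry n of message_router, and the count stored at message_table works out to 4 here (four qwords, 32 bytes, divided by 8). The following is only a sketch of one way the pair could be consumed, not necessarily the lookup used in the accompanying source. It assumes the incoming message ID is held in a local named message and that a no_handler label leads to the default processing; both names are hypothetical.

```asm
              ; Sketch only. RDI is nonvolatile in the Windows x64 convention,
              ; so the enclosing function must save and restore it.
              lea     rdi, message_table_start   ; RDI -> first message ID in the table
              mov     rcx, message_table         ; RCX = number of entries
              mov     rax, message               ; RAX = message ID to find (hypothetical local)
              cld                                ; Scan forward
              repne   scasq                      ; Compare RAX with each qword until match or RCX = 0
              jne     no_handler                 ; No entry matched
              lea     rax, message_table_start   ; RAX -> start of the ID list
              sub     rdi, rax                   ; RDI = byte offset just past the matching entry
              lea     rax, message_router        ; RAX -> start of the router list
              call    qword ptr [rax + rdi - 8]  ; Call the handler at the same index
              ; ...the handler returns here, with its result in RAX
no_handler:                                      ; Hypothetical label: default processing goes here
```

Because the count is computed from the two labels, adding a message only requires a new line in each table; nothing else has to be updated.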
The WM_ERASEBKGND handler is shown below:
align qword ;
main_wm_erasebkgnd label near ;
mov rax, 1 ; Set TRUE return
byte 0C3h ; Return to caller
WM_PAINT: with DirectX handling all of the drawing for the main client area, the only thing the WM_PAINT handler needs to do is to validate the invalid client area.
First, ensure the ValidateRect function is declared in the externals.asm file:
extrn __imp_ValidateRect:qword
ValidateRect textequ <__imp_ValidateRect>
The WM_PAINT handler consists of a single call to ValidateRect:
align qword ;
main_wm_paint label near ;
xor rdx, rdx ; Zero lpRect to validate the entire client area
mov rcx, hwnd ; Set window handle
WinCall ValidateRect, 2, rcx, rdx ; Validate the invalid area
xor rax, rax ; Set FALSE return
byte 0C3h ; Return to caller
Failure to validate the update area will cause a torrent of WM_PAINT messages to flow into the window’s callback function; Windows will continue sending them forever, thinking there’s an update area still needing to be redrawn.
The WM_CLOSE handler destroys the window. The DestroyWindow function needs to be declared in the externals.asm file, or wherever you’re keeping your externals:
extrn __imp_DestroyWindow:qword
DestroyWindow textequ <__imp_DestroyWindow>
The WM_CLOSE handler follows:
align qword ;
main_wm_close label near ;
mov rcx, hwnd ; Set the window handle
WinCall DestroyWindow, 1, rcx ; Destroy the main window
; Here it's assumed that the call succeeded. If it failed,
; RAX will be 0 and GetLastError will need a call to figure
; out what's blocking the window from being destroyed.
xor rax, rax ; DestroyWindow leaves RAX at TRUE if it succeeds so it needs to be zeroed here
byte 0C3h ; Return to caller
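If you would rather not assume success, a minimal sketch of the check follows. It assumes GetLastError has been declared in externals.asm using the same pattern as DestroyWindow; that declaration and the close_done label are additions for illustration, not part of the accompanying source.

```asm
; In externals.asm (assumed, following the DestroyWindow pattern):
;   extrn __imp_GetLastError:qword
;   GetLastError textequ <__imp_GetLastError>

              mov     rcx, hwnd                  ; Set the window handle
              WinCall DestroyWindow, 1, rcx      ; Destroy the main window
              test    eax, eax                   ; DestroyWindow returns a BOOL: nonzero on success
              jnz     close_done                 ; Success -- nothing more to do
              WinCall GetLastError, 0            ; EAX = error code describing the failure
              ; Inspect or log EAX here before it is overwritten
close_done:   xor     rax, rax                   ; Return 0 from this message
              byte    0C3h                       ; Return to caller
```

The test uses EAX rather than RAX because a BOOL return value is 32 bits wide.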
WM_DESTROY is a notification, sent after the window has been removed from the screen. Its handler is where the DirectX closeouts occur. The local ShutdownDirectX function will be covered in Part IV, which discusses DirectX initialization and shutdown:
align qword ;
main_wm_destroy label near ;
; The DirectX shutdown function is commented out below; it
; has not been covered yet as of part III of the series. It
; will be uncommented and coded in the source for part IV.
; call ShutdownDirectX ; Calling a local function does not require the WinCall macro
xor rax, rax ; Return 0 from this message
byte 0C3h ; Return to caller
The callback function is closed out with a single line:
main_callback endp
The only task remaining, to make the code presented so far a complete application, is the actual startup code. WinMain is not used, so the startup code moves directly into creating the main window, then enters the message loop. The entire block of entry code is shown below:
;*******************************************************************************
;
; DEMO - Stage 1 of DirectX assembly app: create main window
;
; Chris Malcheski 07/10/2017
include constants.asm ;
include externals.asm ;
include macros.asm ;
include structuredefs.asm ;
include wincons.asm ;
.data ;
include lookups.asm ;
include riid.asm ;
include routers.asm ;
include strings.asm ;
include structures.asm ;
include variables.asm ;
.code ;
Startup proc ; Declare the startup function; this is declared as /entry in the linker command line
local holder:qword ; Required for the WinCall macro
xor rcx, rcx ; The first parameter (NULL) always goes into RCX
WinCall GetModuleHandle, 1, rcx ; 1 parameter is passed to this function
mov hInstance, rax ; RAX always holds the return value when calling Win32 functions
WinCall GetCommandLine, 0 ; No parameters on this call
mov r8, rax ; Save the command line string pointer
lea rcx, startup_info ; Set lpStartupInfo
WinCall GetStartupInfo, 1, rcx ; Get the startup info
xor r9, r9 ; Zero all bits of R9
mov r9w, startup_info.wShowWindow ; Get the incoming nCmdShow
xor rdx, rdx ; Zero RDX for hPrevInst
mov rcx, hInstance ; Set hInstance
; RCX, RDX, R8, and R9 are now set exactly as they would be on entry to the WinMain function. WinMain is not
; used, so the code after this point proceeds exactly as it would inside WinMain.
; Load the cursor image
xor r11, r11 ; Set cyDesired; uses default if zero: XOR R11 with itself zeroes the register
xor r9, r9 ; Set cxDesired; uses default if zero: XOR R9 with itself zeroes the register
mov r8, image_cursor ; Set uType
mov rdx, ocr_normal ; Set lpszName
xor rcx, rcx ; Set hInstance
WinCall LoadImage, 6, rcx, rdx, r8, r9, r11, lr_cur ; Load the standard cursor
mov wcl.hCursor, rax ; Set wcl.hCursor
; Load the large icon
mov r11, 32 ; Set cyDesired
mov r9, 32 ; Set cxDesired
mov r8, image_icon ; Set uType
lea rdx, LargeIconResource ; Set lpszName
mov rcx, hInstance ; Set hInstance
WinCall LoadImage, 6, rcx, rdx, r8, r9, r11, lr_cur ; Load the large icon
mov wcl.hIcon, rax ; Set wcl.hIcon
; Load the small icon
mov r11, 32 ; Set cyDesired
mov r9, 32 ; Set cxDesired
mov r8, image_icon ; Set uType
lea rdx, SmallIconResource ; Set lpszName
mov rcx, hInstance ; Set hInstance
WinCall LoadImage, 6, rcx, rdx, r8, r9, r11, lr_cur ; Load the small icon
mov wcl.hIconSm, rax ; Set wcl.hIconSm
; Register the window class
lea rcx, wcl ; Set lpWndClass
WinCall RegisterClassEx, 1, rcx ; Register the window class
; Create the main window
xor r15, r15 ; Set hWndParent
mov r14, 450 ; Set nHeight
mov r13, 800 ; Set nWidth
mov r12, 100 ; Set y
mov r11, 100 ; Set x
mov r9, mw_style ; Set dwStyle
lea r8, mainName ; Set lpWindowName
lea rdx, mainClass ; Set lpClassName
xor rcx, rcx ; Set dwExStyle
WinCall CreateWindowEx, 12, rcx, rdx, r8, r9, r11, r12, r13, r14, r15, 0, hInstance, 0
mov main_handle, rax ; Save the main window handle
; Ensure main window displayed and updated
mov rdx, sw_show ; Set nCmdShow
mov rcx, rax ; Set hWnd
WinCall ShowWindow, 2, rcx, rdx ; Display the window
mov rcx, main_handle ; Set hWnd
WinCall UpdateWindow, 1, rcx ; Ensure window updated
; Execute the message loop
wait_msg: xor r9, r9 ; Set wMsgFilterMax
xor r8, r8 ; Set wMsgFilterMin
xor rdx, rdx ; Set hWnd
lea rcx, mmsg ; Set lpMessage
WinCall PeekMessage, 4, rcx, rdx, r8, r9, pm_remove
test rax, rax ; Anything waiting?
jnz proc_msg ; Yes -- process the message
; call RenderScene ; <--- Placeholder; will uncomment and implement in Part IV article
jmp wait_msg ; Nothing was waiting, so skip the dispatch and poll again
proc_msg: lea rcx, mmsg ; Set lpMessage
WinCall TranslateMessage, 1, rcx ; Translate the message
lea rcx, mmsg ; Set lpMessage
WinCall DispatchMessage, 1, rcx ; Dispatch the message
jmp wait_msg ; Reloop for next message
breakout: xor rax, rax ; Zero final return – or use the return from WinMain
ret ; Return to caller
Startup endp ; End of startup procedure
include callbacks.asm ;
end ; Declare end of module
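One thing the listing leaves open is how the loop ever reaches breakout: nothing posts or checks for WM_QUIT yet. A minimal sketch of the usual Win32 pattern follows, under several assumptions that are not taken from the accompanying source: that PostQuitMessage is declared in externals.asm the same way as DestroyWindow, that WM_QUIT (12h) is defined alongside the other WM_ constants, and that the MSG structure declares its message field as a dword named message.

```asm
; In externals.asm (assumed, following the DestroyWindow pattern):
;   extrn __imp_PostQuitMessage:qword
;   PostQuitMessage textequ <__imp_PostQuitMessage>

; In main_wm_destroy, before zeroing RAX:
              xor     rcx, rcx                   ; Set nExitCode
              WinCall PostQuitMessage, 1, rcx    ; Queue WM_QUIT for the message loop

; In the message loop, at the top of proc_msg:
proc_msg:     cmp     mmsg.message, WM_QUIT      ; Field name 'message' is an assumption
              je      breakout                   ; WM_QUIT retrieved -- leave the loop
              lea     rcx, mmsg                  ; Set lpMessage
              WinCall TranslateMessage, 1, rcx   ; Translate the message
```

With something along these lines in place, closing the window leads from WM_CLOSE to WM_DESTROY to WM_QUIT, and the process exits through breakout instead of polling forever.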
As a skeleton program, the demo application is now complete.
As stated in the README.TXT file (in the accompanying .ZIP file), the source code for this article will be improved considerably in later installments, so don't spend too much time modifying it. Not yet. This code was written to convey concepts and serve as a tutorial; efficiency was not its focus. That will change in Part IV, which will cover initializing DirectX.
In addition, several handlers will be added to the callback function, to create a custom window frame.
Note that DirectX goes absolutely bonkers with constant declarations and nested structures. (DirectX 11 is used, because DirectX 12 only runs on Windows 10.) Because of the sheer volume of typing to move those declarations and definitions over to assembly, they will no longer be detailed in subsequent articles after this one. It's presumed that by now, you get the point and understand the basics of declaring structures, constants, etc. in assembly - with the exception of nesting structures, which will be covered in the next article.
An assembly language app is much like screen printing: MOST of the time involved is in the initial setup - getting your externals declared, your structures defined, etc. All of this is one-time work and can be used for an infinite number of applications without having to be repeated. As this series continues, the accompanying source code will do much of that work for you. Feel free to copy and paste the nasty declarations and definitions as desired so you don't have to research and type them all manually.
The pace will pick up in Part IV, where DirectX will be initialized, then shut down when the main window is destroyed. For now, the message loop will render only a blank scene with no vertices.