Introduction
The OX kernel features its own custom boot loader, designed to boot a 32 bit
protected mode kernel. The loader is implemented in two stages: stage 1, which
is in s1.s, and stage 2, which is in s2.s. The first stage is a traditional
loader whose code is organized at address 0x7C00, the address at which the PC
BIOS loads the stage 1 loader. Stage 1 necessarily consists of 512 bytes of
16 bit assembler and uses the BIOS int 13 interrupt to load stage 2. Stage 2 is
a larger 16 bit assembler program consisting of 16384 bytes. Thus, stage 1 and
stage 2 combined are 16 * 1024 + 512 = 16896 bytes, or 0x4200. The kernel
follows and is a 32 bit executable whose maximum size is just under 512 KB. The
vmox.img image contains the s1 stage 1 loader followed immediately by the
stage 2 loader followed immediately by the 32 bit ox kernel. The image is then
padded with null bytes (value 0x0) out to 1474560 bytes, which is the size of a
classic 1.44 MB floppy disk. This is needed so that the kernel can be booted
from a Virtual Box virtual machine or a Bochs PC emulator using a floppy boot.
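The layout arithmetic above can be sketched in C. This is only an illustration: the sizes come from the text, but the constant and function names are hypothetical and are not taken from mkboot.c.

```c
#include <stddef.h>

/* Sizes taken from the text; the names are illustrative, not from mkboot.c. */
#define S1_SIZE      512u            /* stage 1: one boot sector   */
#define S2_SIZE      (16u * 1024u)   /* stage 2: 16 KB             */
#define FLOPPY_SIZE  1474560u        /* 1.44 MB floppy: 2880 * 512 */

/* Offset in the image at which the 32 bit kernel starts (0x4200). */
static size_t kernel_offset(void)
{
    return S1_SIZE + S2_SIZE;
}

/* Number of 0x0 padding bytes needed after a kernel of kernel_size bytes.
 * Returns (size_t)-1 if the kernel does not fit in the floppy image. */
static size_t padding_bytes(size_t kernel_size)
{
    size_t used = kernel_offset() + kernel_size;

    if (used > FLOPPY_SIZE)
        return (size_t)-1;
    return FLOPPY_SIZE - used;
}
```

For a kernel of 0x2746d bytes, for example, padding_bytes returns the number of null bytes that fill the rest of the image.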
The program that creates the floppy disk image is supplied with the ox kernel
distribution and is called mkboot. Its source is in boot/mkboot.c. There is
also a tool for extracting regions of a binary file, called get_data, whose
source is in boot/elf/get_data.c. The get_data program can be used to retrieve
sections of the vmox.img file. For example:
./get_data vmox.img 0x0 0x200 s1
./get_data vmox.img 0x200 0x4000 s2
./get_data vmox.img 0x4200 0x2746d vmox.boot
The above commands extract the binary files from the vmox.img boot disk. The
first retrieves the 512 bytes of the stage 1 boot loader in a file called s1.
The second retrieves the 16384 bytes of the stage 2 boot loader in a file
called s2. The third retrieves the 32 bit kernel image in a file called
vmox.boot. Thus, the arguments to the get_data program are the image file name,
followed by the offset in hex within the file at which to start reading,
followed by the length in hex, followed by the name of the output file that
receives the binary data.
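The core of such an extraction tool can be sketched as follows. This is not the code from boot/elf/get_data.c, which is not shown here; it is a minimal illustration of copying a byte range out of a file, and the function name extract_region is hypothetical.

```c
#include <stdio.h>

/* Copy len bytes starting at offset from in_path into out_path.
 * Returns 0 on success, -1 on any error (including a short read). */
static int extract_region(const char *in_path, long offset, long len,
                          const char *out_path)
{
    FILE *in = fopen(in_path, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(out_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    int rc = 0;
    if (fseek(in, offset, SEEK_SET) != 0)
        rc = -1;

    char buf[512];
    while (rc == 0 && len > 0) {
        size_t want = len < (long)sizeof buf ? (size_t)len : sizeof buf;
        size_t got = fread(buf, 1, want, in);

        /* End of file before len bytes were copied counts as an error. */
        if (got == 0 || fwrite(buf, 1, got, out) != got)
            rc = -1;
        len -= (long)got;
    }

    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

Under these assumptions, extract_region("vmox.img", 0x200, 0x4000, "s2") corresponds to the second command above.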
The get_data tool can also be used to extract ELF
sections from an ELF file. Given a 32 bit statically linked ELF image (as in
the vmox kernel), the various section details can be viewed by first using:
readelf -e vmox
For example, the program headers may be:
Type       Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
LOAD       0x001000 0x00100000 0x00100000 0x1f8c8 0x1f8c8 R E 0x1000
LOAD       0x021000 0x00120000 0x00120000 0x00e70 0x2d9c0 RW  0x1000
LOAD       0x0220d4 0x080480d4 0x080480d4 0x00024 0x00024 R   0x1000
NOTE       0x0220d4 0x080480d4 0x080480d4 0x00024 0x00024 R   0x4
GNU_STACK  0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4
Note that the image has three loadable sections (the LOAD entries) and that
these are the ones that are placed in memory by the loader. Offset gives where
in the image file each section starts, and FileSiz indicates its size within
the image file. PhysAddr is the physical address at which the section is placed
in memory, and MemSiz is the size of the section in memory. If FileSiz is less
than MemSiz, the difference must be zeroed out by the loader. Since get_data
takes an offset within the file, the first section can be extracted with the
following command:
./get_data vmox 0x001000 0x1f8c8 vmox.section1
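The relationship between FileSiz and MemSiz can be made concrete with a short C sketch. The struct below is a simplified, illustrative view of a program header (it is not the on-disk ELF layout), and the values are the three LOAD entries from the readelf output above.

```c
#include <stdint.h>

/* The program header fields that matter to the loader.
 * Simplified for illustration; not the on-disk ELF layout. */
struct load_segment {
    uint32_t offset;   /* where the data starts in the file */
    uint32_t paddr;    /* where it is placed in physical RAM */
    uint32_t filesz;   /* bytes present in the file          */
    uint32_t memsz;    /* bytes occupied in memory           */
};

/* The three LOAD entries from the readelf output above. */
static const struct load_segment vmox_segments[3] = {
    { 0x001000, 0x00100000, 0x1f8c8, 0x1f8c8 },
    { 0x021000, 0x00120000, 0x00e70, 0x2d9c0 },
    { 0x0220d4, 0x080480d4, 0x00024, 0x00024 },
};

/* Number of bytes the loader must zero after copying filesz bytes. */
static uint32_t zero_fill_bytes(const struct load_segment *seg)
{
    return seg->memsz > seg->filesz ? seg->memsz - seg->filesz : 0;
}
```

Only the second entry needs zeroing here: 0x2d9c0 - 0x00e70 = 0x2cb50 bytes, which is where the kernel's uninitialized data lives.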
The stage 1 loader loads the stage 2 loader and then jumps to the stage 2
loader's starting address in memory. Stage 2 loads the entire region from the
floppy drive starting with sector 1, not 0, as sector 0 is the stage 1 loader.
One can see from the stage 2 loader that the kernel's starting base address
is:
%define _K_BASE 0x14000
This is the address just past the stage 2 loader, which is loaded at memory
segment 0x1000 (address 0x10000). The 32 bit kernel image is therefore at
physical address 0x14000, given 0x4000 (16384) bytes for stage 2. The loaded
image then spans memory from segment 0x1000 to segment 0x9000. Stage 2 thus
consists of three main parts:
- initialization code for protected mode, re-enabling real mode, and enabling
the a20 line
- loading the stage 2 and kernel image from floppy disk
- converting the 32 bit ELF image into a flat binary and subsequently jumping
to the kernel start.
The initialization code is derived from John Fine's boot loader [1]. It puts
the processor into protected mode and enables the a20 line. The a20 line allows
the processor to access memory above 1 MB. The code to do this is in the second
stage boot loader in the file s2.s:
cli
enable_a20
enable_pmode
enable_rmode
sti
Note that in this loader, NASM assembler macros were utilized to make the code
more readable and modular. One can view the details of the macros by searching
for them in the sources. Additional GDT setup logic was derived from the boot
loader originally developed by Gareth Owen [2] for GazOS. The software from [1]
is in the public domain and the software from [2] is licensed under GPLv2. The
OX boot loader is therefore available as open source under GPLv2 as well. The
GDT logic sets up three segment descriptors: one for a NULL segment, one for
code and one for data. This is needed by the enable_pmode logic.
Both the first stage and second stage loaders require logic to read from the
floppy drive to retrieve the binaries and execute them. The s1 loader loads the
s2 loader, and both the s1 and s2 loaders are flat 16 bit binaries. This means
executing the code is a matter of copying the code from the floppy drive into
the proper location in RAM and jumping to it. The s2 loader, however, can load
either a 32 bit flat binary kernel or a 32 bit statically linked ELF kernel.
ELF is of course the preferred format, given that it allows the loader to
initialize the kernel's .bss segment and other uninitialized memory. The code
for reading from floppy and loading into RAM is as follows in s1:
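%define _S2_LEN 0x21
%define _S2_BASE 0x600
%define _S2_LOAD_SEG 0x60
%define _S2_SIGN_OFF 0x3FFE
mov bx,_S2_LOAD_SEG
mov es,bx                        ; es = segment that receives stage 2
mov ax,1                         ; first sector to read (sector 0 is stage 1)
mov cx,[load_len]                ; the loop stops when ax reaches this value
mov di,1
load_s2:
call sector_read                 ; read one sector via BIOS int 0x13
inc ax                           ; next sector
mov bx,es
add bx,32                        ; 32 paragraphs = 512 bytes = one sector
mov es,bx
cmp ax,cx
jne load_s2
call turn_off_floppy
mov ax,[_S2_BASE + _S2_SIGN_OFF] ; last word of the stage 2 image
cmp ax,0xAA55                    ; expected signature
jne .load_err
mov si,BOOT_MSG
call bprint
exec_bin_kernel _S2_BASE         ; jump to stage 2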
Note that the load_s2 code relies on a routine called sector_read, which uses
the BIOS int 0x13 to read the floppy drive. The code reads sequentially from
the drive and copies the data into RAM. The macro exec_bin_kernel _S2_BASE then
jumps to that address in memory, immediately executing the stage 2 loader.
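The address arithmetic of the load_s2 loop can be modelled in C. This sketch assumes that sector_read stores each sector at offset 0 of the segment held in ES, which the listing does not show; the function names are illustrative only.

```c
#include <stdint.h>

#define S2_LOAD_SEG  0x60u     /* _S2_LOAD_SEG in s1 */
#define S2_SIGN_OFF  0x3FFEu   /* _S2_SIGN_OFF in s1 */

/* Real mode segment to physical address: segment * 16. */
static uint32_t seg_to_phys(uint16_t seg)
{
    return (uint32_t)seg << 4;
}

/* Physical address at which the loop stores the given sector (1..32).
 * ES starts at 0x60 and grows by 32 paragraphs (512 bytes) per sector. */
static uint32_t s2_sector_dest(unsigned sector)
{
    uint16_t es = (uint16_t)(S2_LOAD_SEG + 32u * (sector - 1u));

    return seg_to_phys(es);
}

/* Physical address of the 0xAA55 signature word checked after the load. */
static uint32_t s2_signature_addr(void)
{
    return seg_to_phys(S2_LOAD_SEG) + S2_SIGN_OFF;
}
```

Under that assumption sector 1 lands at 0x600 (_S2_BASE), sector 32 at 0x4400, and the signature occupies the last two bytes of that final sector.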
The stage 2 loader has the following logic to load the
kernel using a more advanced macro to load from segment 0x1000 to segment
0x9000:
%define _K_LEN 0x9000
%define _K_BASE 0x14000
%define _K_LOAD_SEG 0x1000
%define _K_END_SEG _K_LOAD_SEG + _K_LEN
%define _S1_BASE 0x7C00
%define _S2_CURR_SECT 0x1
%define _S2_CURR_HEAD 0x0
%define _S2_CURR_TRACK 0x0
%define _S2_NR_SECT 0x12
read_tracks _S2_CURR_SECT, _S2_CURR_HEAD, _S2_CURR_TRACK,_K_LOAD_SEG,_K_END_SEG,_S2_NR_SECT,boot_drive
The macro read_tracks does the heavy lifting. Note that the load starts at
address 0x10000 (segment 0x1000) and continues to address 0x90000 (segment
0x9000). Thus 0x90000 - 0x10000 = 0x80000 bytes are available in total. Because
the kernel actually starts at address 0x14000, there are
8 * 16^4 - 4 * 16^3 = 507904 bytes available for the 32 bit kernel. As an
optimization, there is no need to start by reading from the first sector on
disk; the load could start from offset 0x4200 on the floppy drive, since as
written this loader loads the stage 2 loader twice. However, for the purposes
of the current OX kernel this is acceptable, as the kernel is about 177773
bytes and, stripped, about 140032 bytes. The kernel currently includes a file
system and memory allocator as well as rudimentary process handling (e.g.,
fork, exec, scheduling).
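The capacity calculation can be checked with a few lines of C. The constants mirror the values quoted in the text; K_LIMIT_SEG and the function names are illustrative and do not appear in s2.s.

```c
#include <stdint.h>

#define K_LOAD_SEG   0x1000u    /* first segment the image is read into   */
#define K_LIMIT_SEG  0x9000u    /* segment at which the text says it ends */
#define K_BASE       0x14000u   /* _K_BASE: where the kernel image begins */

/* Total bytes between the first and last load segment (0x80000). */
static uint32_t load_window_bytes(void)
{
    return (K_LIMIT_SEG - K_LOAD_SEG) << 4;
}

/* Bytes left for the kernel once the stage 2 copy is accounted for. */
static uint32_t kernel_capacity_bytes(void)
{
    return (K_LIMIT_SEG << 4) - K_BASE;
}

/* Nonzero if an ELF image of kernel_size bytes fits in the load window. */
static int kernel_fits(uint32_t kernel_size)
{
    return kernel_size <= kernel_capacity_bytes();
}
```

A 177773 byte kernel fits comfortably, while anything above 507904 bytes would not.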
Unlike the load of the stage 2 loader, loading the 32 bit kernel requires the
loader to parse and relocate the kernel in RAM prior to jumping to it.
Effectively, what has to happen in 16 bit unreal mode is that the kernel image
has to be converted from 32 bit ELF to a 32 bit flat binary, with proper
zeroing out of the .bss segment. In this process, the kernel segments are
relocated to the physical addresses that the kernel will run from. These
locations are important, as the memory allocator inside the kernel needs to be
aware of which physical memory locations belong to the kernel. The conversion
of the kernel from ELF to a flat binary is done in the following macro call:
mov edx,_K_BASE
exec_elf_kernel edx,nr_sections,kernel_start
After this call, the kernel has been relocated to its starting place in memory
and the address of its start routine _start is stored in the kernel_start
variable. To see the expected value of kernel_start at the command line, run:
nm vmox | grep _start
or use readelf -e vmox
and look for "Entry point address:" in the output. These values are given in
hex. In the s2.s sources, you can print the address using:
mov eax, [kernel_start]
print_reg eax
which will also print the entry point. The two values must match for the load
to work, as this is the value of e_entry in the ELF header, which is the
address of the C _start routine.
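For reference, the e_entry field sits at byte offset 24 of a 32 bit ELF header. The following C sketch reads it from a buffer holding the start of a little-endian ELF32 image; it is an illustration of where the value lives, not the logic used by exec_elf_kernel, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ELF32_EHDR_SIZE  52u   /* size of an ELF32 file header    */
#define ELF32_ENTRY_OFF  24u   /* byte offset of e_entry within it */

/* Read e_entry from the first bytes of a little-endian ELF32 image.
 * Returns 0 on success, -1 if the buffer is too short or not ELF32 LE. */
static int elf32_entry(const uint8_t *img, size_t len, uint32_t *entry)
{
    if (len < ELF32_EHDR_SIZE)
        return -1;
    if (img[0] != 0x7F || img[1] != 'E' || img[2] != 'L' || img[3] != 'F')
        return -1;
    if (img[4] != 1 || img[5] != 1)   /* 32 bit class, little-endian data */
        return -1;

    const uint8_t *p = img + ELF32_ENTRY_OFF;
    *entry = (uint32_t)p[0]
           | (uint32_t)p[1] << 8
           | (uint32_t)p[2] << 16
           | (uint32_t)p[3] << 24;
    return 0;
}
```

The value returned should agree with the "Entry point address:" line printed by readelf.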
Note that in C, the starting function is not main; it is a function called
_start, which in the case of a kernel contains logic for initializing the
kernel. In normal C programs, it contains initialization code for starting up a
process to run on the native operating system. The exec_elf_kernel macro uses
the ELF header to find the loadable sections and move them into memory. If a
section has a file size smaller than its memory size, the difference is zeroed
out. Since this is happening in so-called unreal mode, the processor is in
16 bit mode with the a20 line enabled so that 32 bit memory access is possible.
As an aside, it is important to note that the memmove implementation used by
exec_elf_kernel in the stage 2 loader had a bug: it could not correctly load a
kernel larger than a 16 bit segment, as the loop instruction was not properly
set up to use the ecx register. To fix this, the assembler prefix 'a32' was
added to have the assembler generate 16 bit looping logic that uses ecx, not
cx, for its counter. This is why there are effectively two sources for s2.s,
with one designated s2.s.DEBUG, which was a debug version used to find this
issue. Debugging a loader is difficult, as there is no debugger that works at
that level, so print statements must be used to output register contents and
trace through the program to see its progress. The .DEBUG file will help if
this is needed in the future.
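The per-section work described above, and the effect of counting with cx instead of ecx, can be sketched in C. This is a hosted model for illustration only; the real logic is NASM code in s2.s, and the function names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy filesz bytes of a section to dest and zero the remaining
 * memsz - filesz bytes. Assumes memsz >= filesz. */
static void load_segment(uint8_t *dest, const uint8_t *src,
                         uint32_t filesz, uint32_t memsz)
{
    memmove(dest, src, filesz);
    memset(dest + filesz, 0, memsz - filesz);
}

/* Count that a copy loop sees when only the low 16 bits (cx) of a
 * 32 bit byte count are used as the loop counter. */
static uint32_t count_seen_by_cx(uint32_t count)
{
    return count & 0xFFFFu;
}
```

For the first section in the example headers (0x1f8c8 bytes), a counter truncated to 16 bits would cover only 0xf8c8 bytes, which illustrates why larger kernels failed to load before the fix.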
The load places each section at the physical address
determined by the compilation and linking of the 32 bit kernel. Thus, the kernel
is linked using the following directives:
cc $(OBJS) -o vmox -m32 -nostdinc -nostdlib -nostartfiles -nodefaultlibs -static -Ttext 0x100000
The directive -m32 compiles to 32 bits (as the development machine is now
64 bits). The directives -nostdinc, -nostdlib, -nostartfiles and -nodefaultlibs
instruct the compiler not to use the standard includes, standard libraries,
standard startup files, and default libraries. This is required since the
kernel cannot use user level libraries. The directive -static links the
executable without any shared objects. The directive -Ttext 0x100000 instructs
the linker to organize the code/data relative to this physical address in
memory, which is the 1 MB mark in RAM. This address is used by the
exec_elf_kernel macro to place the kernel at that offset in memory.
The actual space used can be computed using:
size vmox
which may return:
   text    data     bss     dec     hex filename
 129257    3696  183104  316057   4d299 vmox
Note that the total is 0x4d299, or 316057 in decimal. The bss segment is the
largest component and is most directly affected by the variable
BLOCK_ARRAY_SIZE, which is the size of the buffer cache in the file system. See
the file include/ox/fs/block.h for more details. Knowing the size of the kernel
in RAM and its location at 0x100000, the memory allocator can start assigning
RAM pages to user and dynamic kernel memory users at an address greater than
0x100000 + kernel_memory_size (0x4d299). For example, address 0x200000 would
likely be a good place to start.
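The lowest usable page above the kernel can be computed by rounding the end of the kernel up to a page boundary. The sketch below assumes 4 KB pages, which the text does not state; the names are illustrative and not from the OX sources.

```c
#include <stdint.h>

#define KERNEL_BASE   0x100000u   /* -Ttext address from the link line */
#define OX_PAGE_SIZE  0x1000u     /* 4 KB pages (an assumption)        */

/* Round addr up to the next page boundary. */
static uint32_t page_align_up(uint32_t addr)
{
    return (addr + OX_PAGE_SIZE - 1u) & ~(OX_PAGE_SIZE - 1u);
}

/* First page-aligned address past a kernel of kernel_mem_size bytes. */
static uint32_t first_free_page(uint32_t kernel_mem_size)
{
    return page_align_up(KERNEL_BASE + kernel_mem_size);
}
```

For the 0x4d299 byte total reported by size, this gives 0x14e000, so starting the allocator at 0x200000 leaves ample headroom for the kernel to grow.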
Once the kernel_start address is obtained and the kernel has been converted
from ELF to a flat binary and relocated to its starting address in physical
RAM, the second stage loader re-enables protected mode and jumps to the
kernel_start address. This effectively executes the 32 bit kernel.
An example 32 bit "bare bones" kernel can be as follows:
Basic Kernel
void main ( void );

void
_start( void )
{
    main();
}

char *message = "--> Executing test kernel <--\n";

#define D_SIZE 0x76a0b
char data[D_SIZE] = {0};

void
main ( void )
{
    char *vram = (char *)0xB8000;

    while(*message) {
        *vram++ = *message++;
        *vram++ = 0x7;
    }
    for( ; ; )
        ;
}
Note that the kernel simply prints "--> Executing test kernel <--" by writing
directly to video RAM at address 0xB8000 and then idles the CPU. It has a large
.bss segment because it declares a large character array. This tests the ELF
loader's ability to zero out those bytes; otherwise, the array has no impact.
Loading in Virtual Box
Testing low level programming can now be done using
virtual machines. This removes the requirement of running the software on real
hardware and potentially making a mistake on real hardware that may cause
undesirable consequences such as damaging the data on a hard drive. Since the
OX boot loader loads from floppy we can actually simulate a floppy boot using
Virtual Box. Given the image file vmox.img, follow the instructions in [3]. The
instructions walk you through how to select a floppy drive media for boot and
Virtual Box should just read the image from the drive and load it in the
virtual machine.
References
[1] Fine, J. S. (1999). Protected mode programming examples, system utilities,
building embedded systems. Retrieved April 14, 2013, from
http://geezer.osdevbrasil.net/johnfine/index.htm
[2] Owen, G. (1999). GazOS. Retrieved May 1999, from
http://gazos.sourceforge.net/
[3] Hoffman, C. (2013). How Do I Use a Floppy Disc Image in VirtualBox?
Retrieved April 14, 2013, from
http://www.ehow.com/how_8456703_do-floppy-disc-image-virtualbox.html