Thursday, June 7, 2012

Write your own Kernel: BootLoader stub

Since I was young, one of my dreams was to write my own operating system.
It was still the DOS era, and as a young boy I started studying lot of tech stuff trying to figure out what was really happening inside my computer.

Actually, at the time, it was too much 'tech' stuff for me, I didn't have the necessary knowledge to understand the subtle details behind a low-level programming approach.

Then I took my Computer Engineering degree, the technical details are no longer obscure, but since then, the old dream remained just... an old dream!

Yesterday, while messing up with a broken hard disk trying to recover some data, a funny idea came up into my mind, and I launched:


dd if=/dev/sda of=MBR count=1 bs=512 > MBR

This dumped into a file the content of that disk's Master Boot Record (MBR). Of course, I couldn''t resist this, and I disassembled it:


objdump -D -b binary -mi386 -Maddr16,data16 MBR

While staring at that code, my old dream came back to me, and so I started messing around a bit. The outcome, was a small bootloader stub, which is actually working, and which I am here describing... In the future, I will surely spend some time extending this, but this will remain just a project-for-fun!
And yes, this is somewhat reinventing the wheel, but for a low-level programmer, this is really funny!

So, the work setting is this:
As for VirtualBox configuration, I have set a 40MiB virtual IDE Hard Disk, FAT32 partitioned, for future development (my legacy machine will mount a similar one), and a floppy device loading a binary raw image (the actual bootloader + early kernel).

So, let's now start analyzing the BootLoader's early stub. This is a 'stage 1' bootloader which simply prints a "Hello World" message, waits for a keystroke and then reboots the system. No actual loading of any stage 2 loader, nor mode change.

Given the hardware architecture we are targeting, we have some constraints about the bootloader to keep in mind for writing it successfully:
  • The bootloader is loaded by the BIOS (which is loaded into RAM at startup as well), given some preconditions
  • Both code and data must be placed within one sector of the booting disk (the MBR), which is 512B
  • The last two bytes of this sector must have the value 0xaa55, otherwise the sector will not be considered as bootable
  • The BIOS will load the bootloader at address 0x0000:0x7c00, so the code must be relocated to that address, otherwise it won't work.
  • The %dl register will contain the disk number, useful for reading additional data from it. Nevertheless, I am ignoring this up to this up to this stage.
  • Of course, we run in real mode, so we have 16-bit code and we can do almost everything! :)
So, our MBR will contain both code and data. in the future, it will contain some partition table as well, but since it is stored into a floppy disk, we must provide a Disk Description Table (DDT) as well, to make it a valid floppy. Additionally, the first byte of the MBR will be part of the first instruction which will get executed, so we have to properly merge these things. The beginning of our boot1.S code is therefore this:

 .code16
 .text

.globl _start; _start:

 jmp stage1_start
 
OEMLabel:  .string "BOOT"
BytesPerSector:  .short 512
SectorsPerCluster: .byte 1
ReservedForBoot: .short 1
NumberOfFats:  .byte 2
RootDirEntries:  .short 224
LogicalSectors:  .short 2880
MediumByte:  .byte 0x0F0
SectorsPerFat:  .short 9
SectorsPerTrack: .short 18
Sides:   .short 2
HiddenSectors:  .int 0
LargeSectors:  .int 0
DriveNo:  .short 0
Signature:  .byte 41   #41 = floppy
VolumeID:  .int 0x00000000   # any value
VolumeLabel:  .string "myOS       "
FileSystem:  .string "FAT12   "

 stage1_start:

The .code16 and .text are things to make GNU assembler (gas) produce valid code. In particular, .code16 tells the assembler to produce 16-bits code (the default would be 32-bits, of course!) and .text ensures that we have everything into one single section (we don't actually mind which one, as they will be stripped, later on). _start is an actual symbol which describes the entry point for the executable, but we will be using this in a different way. The first instruction, jmp, tells the machine to skip "executing" the DDT, so we can have a correctly executing program, keeping the correct format for a floppy (of course, this bites some bytes out of the small 512B available space).

The code now must setup the runtime stack, since it will allow using the "call" instruction for calling subroutines. This looks like this:

 cli              
 movw $0x07C0, %ax
 addw $544, %ax  # 8K buffer
 mov %ax, %ss
 movw $4096, %sp
 sti

This creates a 4KiB stack space above the bootloader, which is (hardcoded) 8KiB large (544 paragraphs). We use the cli and sti couple, in order to disable and then re-enable interrupts, since it is not safe here to perform any interrupt operation, before having the stack correctly set up.

Then, we display our dummy message, "Hello World", and then reboot.

 cld
 movw $hello_s, %si
 call  print_string
 jmp reboot
 hlt

cld clears the direction flag, so that the internal implementation of print_string will read the string placed into %si from the beginning to the end. The final hlt is actually never executed, but placing it there is a good practice, nevertheless. So, let's see how do print_string function and reboot subroutine work.

1:
 movw $0x0001, %bx
 movb $0x0e, %ah
 int $0x10
print_string:
 lodsb
 cmpb $0, %al
 jne 1b
 ret

This routine is actually nicely optimized in space (remember, a bootloader suffers from space!). The funny part is that its entry point is in the middle of its code!
So, when we call it, a lodsb instruction gets executed, which loads one byte from %si into %al and increments %si by one: we read one character of the string from memory!
cmpb checks whether the byte just read is 0, i.e., if it is a NUL terminating character, the end of the string. If not, it goes executing from the 1: label.
There, we find the int $0x10 instruction, which generates an interrupt and searches into the Interrupt Vector Table the entry 0x10, which is associated with the BIOS teletype function family. The value 0x0e stored into %ah tells the BIOS to activate the screen printing function, which displays on screen exactly the caracter stored into %al, using the old-fashioned Codepage Font (which is usually hard-coded within the BIOS itself).
This process goes on until a '\0' is found in the string, then a ret is executed.

Rebooting the system is far more easy:
reboot:
 movw $reboot_s, %si
 call print_string
 movw $0,%ax
 int $0x16 # Wait for keystroke
 movw $0,%ax
 int $0x19 # Reboot the system

A nice string asking the user to press any key is shown, using the same function as before. Then, a keystroke is waited for, using the BIOS int $0x16 which, having %ax == 0, activates the "get keystroke" BIOS function. The BIOS does not return from the interrupt routine until a key is pressed, and then int $0x19 is executed, i.e. the "bootstrap loader" interrupt.


The end of stage1.S goes like this:

 hello_s: .string "Hello World!\n\r"
 reboot_s: .string "Press any key to reboot..."

 . = _start + 0x0200 - 2
 .short 0x0AA55 #Boot Sector signature

The two strings which we want to display are declared, so that we can load their addresses for the printing function.
Then, the line:
. = _start + 0x0200 - 2
tells gas, the GNU Assembler, to move the location of the code being generated (the '.' variable) 512B after _start, and then 2B backwards, which is 2 bytes before the end of the MBR. At this point, we make gas emit the 0xAA55 signature, which tells the BIOS that the current disk is bootable, and we are done: this is our bootloader stub!

Now, the last part, is to actually compile this code. The GNU Compiling Toolchain is targeted at 32/64-bits executables, so by defaults it usually produces ELF Programs. Which we do not want here, as we just need a stream of bytes representing the instructions to be executed.

gas alone cannot do this: it creates headers and everything and cannot be disabled. But we can rely on ld, the GNU linker, asking it to produce a raw binary. So, the two steps are:

as boot.S -c -o boot.o

which creates an ELF executable, containing our code. Then:

ld --oformat binary --Ttext 0x7C00 -o boot.bin boot.o

which, with some magic, strips every header (--oformat binary) and relocates the code starting from the address 0x7c00 (--Ttext 0x7C00) for the text section, which (considering our source) contains our whole bootloader, whose first instruction jumps to the actual initialization code.

Making VirtualBox launch our boot.bin image, this is the actual outcome:



So, now, the next step is to write a second stage bootloader, and make the first stage one be able to load it and transfer control to it!

3 comments:

  1. "Since I was young, one of my dreams was to write my own operating system." ... altro che gli altri piccoli aspiranti ingegneri che giocavano banalmente con i lego XD ...
    Lascia perdere i paper e scrivi il secondo step che sono curiosa XD !

    ReplyDelete
  2. Hey... really useful blog post... especially because it is illustrated in AT&T syntax... but the problem is "setup the runtime stack"... I know the basic concept of stack, SS and SP... but I can't create a picture of that code in my mind... Please, Can you more elaborate on stack setup??.. Thank you .. :)

    ReplyDelete
    Replies
    1. Hi,
      if you know the basics about SP, than that should be easy enough to get. When you power up your processor, the SP is logically zero, so you cannot really use any stack, because when you push something on it, there's actually no valid address pointed by SP.
      So, before executing any 'call' instruction (which writes on stack the return address) I set up a stack.
      If you have more specific questions, feel free to ask! :)

      Delete