My OS, the journey so far

As I’m walking through the great adventure of building my own OS, I occasionally have some questions and notes to write down. So this is my “journey so far…” post. You can follow my progress on GitHub, because I’m pushing updates more often than I write these blog posts.

Those who said that it’s hard to write an OS were simply wrong. Let me tell you one thing – there’s a big difference between hard and a-lot-to-do. Hard is when you have to twist your brain around every single moment, but writing an OS is just a lot of code to write and a lot of articles to read.

So for starters (and for people like me, who have NO background in computer science) – here are a few things that you’ll have to find out yourself:

  1. Basically Assembly (and by that I mean CPU instructions) only works with registers and RAM (well, there are some old port I/O instructions, but they have largely been superseded by memory-mapped devices);
  2. Registers – there are two kinds of registers:
    1. CPU registers, known by their Assembly mnemonics: AX, EAX, SP, ESP, etc.;
    2. Memory addresses for memory mapped device “registers”;
  3. Of course, at boot time you’ve got some BIOS functions to work with I/O, but you’re stuck in real mode (with only 1MiB of RAM available for use, and not even all of it);
  4. Memory and any other storage is just a huge array of bytes ready to be chewed by your code. But to get to the storage you first have to set up a lot of things, like a storage driver that knows how to communicate with the storage controller;
  5. Oh yes, and by the way – in standard PC architecture the CPU is not the only programmable chip, you have dozens of them – PIC, APIC to name a few. Every one of them loves to be programmed – it’s not hard, it’s fun and challenging at the same time.

The Boot process

There are currently two ways to get your OS up and running. The old way – using BIOS and the new way – using (U)EFI.

BIOS way:

  1. Write an MBR (Wikipedia) bootcode. It’s the code that’s located at the very beginning of your storage device – the 1st sector (512 bytes). The BIOS will always load it from there. You’ll have to learn some Assembly though, because it’s too small for any C code, no matter how good the size optimizations are – you have only 440 bytes to spare, the rest is a partition table and some other reserved bytes.
    1. It’s loaded at address 0x7C00 by BIOS (including the partition table);
    2. It should set up the stack pointer (just pick a location in memory where the stack will be – remember it grows downward, and in 16-bit real mode each push moves it by 2 bytes, so if you set it at 0x7C00 then the first push instruction will move the stack pointer to 0x7BFE, and so on);
    3. It should relocate itself to somewhere else to give space for the next boot code in the chain (0x600 was my choice);
    4. It should validate the storage structure (for example, if it’s on a hard drive, you’ll check all 4 of the partition entries to see which one is bootable);
    5. It should read partition boot code (the VBR) from a bootable partition and pass the control to it (with a simple jump).
  2. Write a VBR (Wikipedia) bootcode. This is the boot code that is located at the beginning of a bootable partition (again 1 sector – 512 bytes). Note: if the partition is an extended partition, this code should work the same way as the MBR and load the next VBR in the chain. And this is also done in Assembly.
    1. Your MBR should load it at 0x7C00 (well, it’s up to you where you put it – you have complete freedom at any stage of booting :) );
    2. Again, it should relocate itself (you can overwrite MBR code, as it’s not needed any more) to free up some space;
    3. It can do some tests to see how much memory you really have (read the memory map using BIOS functions);
    4. It should parse the file system on current partition and find your bootloader file;
    5. Read the bootloader from file system into memory;
    6. And pass the control to it
  3. Write a bootloader. This one does not have such drastic size restrictions, but remember that you’re in real mode, so you might have only around 600KiB of RAM to work with. There must also be glue code at the beginning of your bootloader, because it’s not a regular executable file – it’s a flat binary that has to know where it’s going to be loaded so that it can do its jumps correctly. The glue code is simply a piece of Assembly placed at the very beginning of the binary image that knows about your C main() function and can call it. There is one catch though – most C compilers emit 32-bit (protected mode) or 64-bit (long mode) code, and the mainstream ones have little or no proper support for 16-bit real mode, so you’ll have to do a protected mode jump (or long mode jump) in your glue code before executing main(). And this is where the scary thing happens – you’ll lose all access to BIOS functions (so you won’t be able to read data from any storage any more). But don’t worry – there are a few things you should do and you’ll probably be back on track with more power than the BIOS can offer you:
    1. Prepare stack and segments (it’s always a good practice to re-do it in every boot code you load);
    2. Enable the A20 line (yes, this is one funky thing that has been haunting the PC industry for like 30 years now, but it can be as little as 3 instructions);
    3. Do a protected mode (or even long mode) jump in your glue code (you really have to do a far jump to complete the switch into any of these modes);
    4. Set up interrupts – not a big deal. The Interrupt Descriptor Table (IDT) and Global Descriptor Table (GDT) are just arrays of data structures, and you pass their memory location to the CPU with simple instructions like lidt or lgdt;
    5. … well, this is where I’ve been so far. Next up – read the ACPI tables (they are located in RAM), find the APIC, disable the old-school PIC, write an AHCI driver and you’re ready to read all the data from your SATA hard drive, write an EHCI driver to read from USB flash devices, etc. Also, once you’re in protected mode or long mode, you have access to all of the RAM that’s installed in your PC.
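
Two of the steps in the BIOS list are really just table walks, so here is a rough sketch of them in C. It is only an illustration, not the code from the GitHub repo (the real MBR and VBR code has to be Assembly, as noted above). It assumes the classic MBR layout (partition table at offset 446, boot signature 0x55 0xAA at offset 510) and the 20-byte memory map records that the BIOS call INT 15h, EAX=0xE820 returns. The names mbr_find_bootable, e820_usable_bytes and struct e820_entry are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic MBR layout (assumed): 4 entries of 16 bytes at offset 446,
 * followed by the boot signature 0x55 0xAA at offset 510. */
#define MBR_TABLE_OFFSET 446
#define MBR_ENTRY_SIZE   16
#define MBR_ENTRY_COUNT  4
#define MBR_ACTIVE_FLAG  0x80 /* status byte of a bootable entry */

/* Read a little-endian 32-bit value one byte at a time. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the index (0..3) of the first entry marked
 * bootable and stores its starting LBA, or returns -1 if the sector has no
 * boot signature or no bootable entry. */
int mbr_find_bootable(const uint8_t *sector, uint32_t *start_lba)
{
    assert(sector != NULL);
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;
    for (int i = 0; i < MBR_ENTRY_COUNT; i++) {
        const uint8_t *entry = sector + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;
        if (entry[0] == MBR_ACTIVE_FLAG) {
            if (start_lba != NULL)
                *start_lba = read_le32(entry + 8); /* LBA of the first sector */
            return i;
        }
    }
    return -1;
}

/* One record of the BIOS memory map (assumed 20-byte E820 layout). */
struct e820_entry {
    uint64_t base;   /* start address of the region */
    uint64_t length; /* size of the region in bytes */
    uint32_t type;   /* 1 = usable RAM, other values mark reserved or special regions */
} __attribute__((packed));

/* Hypothetical helper: sums the length of all regions marked usable. */
uint64_t e820_usable_bytes(const struct e820_entry *map, size_t count)
{
    uint64_t total = 0;
    assert(map != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (map[i].type == 1)
            total += map[i].length;
    }
    return total;
}
```

The MBR helper reads the LBA byte by byte, so it does not depend on the byte order of the machine running it. The memory map struct is read directly, which is fine on x86 but assumes a little-endian host.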

(U)EFI way:

As far as I understand, (U)EFI is really neat – if you’re going 64-bit, all you have to do is write a small application in C that loads your kernel from any partition you wish, because you’re already in long mode and the ACPI tables are reachable through the (U)EFI data structures passed as arguments to your efi_main() function in C. So no glue code, no memory mapping, no real mode, no BIOS functions. You might also get some drivers from the motherboard vendor, so you don’t have to write an AHCI driver yourself.

My way:

The thing I’m currently doing is actually combining both of them. I’m writing MBR boot code that can read the EFI GPT (GUID Partition Table) and load bootloader code from the BBP (BIOS Boot Partition), which is a special partition meant to enable GPT on legacy BIOS-driven hardware (and also in emulators like Bochs, which don’t support (U)EFI yet). The main goal is to create an EFI emulator for legacy BIOS (that’s a long-term goal for now), but I might as well just write a simple bootloader that can run in parallel with an EFI implementation. Anyway, the adventure currently resides in ACPI town, so I’m wrapping my head around that, as I’m trying to get from the ACPI tables to a full AHCI implementation to read data from the hard drive. Why am I doing it? Mostly to learn all the ins and outs of PC architecture.
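
To give an idea of what “reading the GPT” and “finding the ACPI tables” boil down to, here is a minimal sketch in C. It is an illustration under assumptions, not the code from the repo: it assumes 128-byte GPT partition entries (the real entry size is stored in the GPT header), it works on buffers that have already been read into memory, and it uses memcmp from the hosted C library, which you would have to replace on bare metal. The function names gpt_find_bios_boot and rsdp_find are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_ENTRY_SIZE 128 /* assumed; the real size is stored in the GPT header */
#define RSDP_V1_LENGTH 20  /* bytes covered by the ACPI 1.0 checksum */

/* BIOS Boot Partition type GUID 21686148-6449-6E6F-744E-656564454649
 * in its on-disk byte order, which spells "Hah!IdontNeedEFI". */
static const uint8_t BIOS_BOOT_GUID[16] = {
    0x48, 0x61, 0x68, 0x21, 0x49, 0x64, 0x6F, 0x6E,
    0x74, 0x4E, 0x65, 0x65, 0x64, 0x45, 0x46, 0x49
};

/* Read a little-endian 64-bit value one byte at a time. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: scans a GPT partition entry array for the BIOS Boot
 * Partition. Returns its index and stores its first and last LBA, or
 * returns -1 if there is no such partition. */
int gpt_find_bios_boot(const uint8_t *entries, size_t count,
                       uint64_t *first_lba, uint64_t *last_lba)
{
    assert(entries != NULL);
    for (size_t i = 0; i < count; i++) {
        const uint8_t *entry = entries + i * GPT_ENTRY_SIZE;
        if (memcmp(entry, BIOS_BOOT_GUID, 16) == 0) { /* type GUID at offset 0 */
            if (first_lba != NULL)
                *first_lba = read_le64(entry + 32);
            if (last_lba != NULL)
                *last_lba = read_le64(entry + 40);
            return (int)i;
        }
    }
    return -1;
}

/* Hypothetical helper: looks for the ACPI root pointer in a memory region.
 * The signature "RSD PTR " sits on a 16-byte boundary and the first
 * 20 bytes must sum to zero (mod 256). Returns the offset or -1. */
long rsdp_find(const uint8_t *region, size_t size)
{
    assert(region != NULL);
    for (size_t off = 0; off + RSDP_V1_LENGTH <= size; off += 16) {
        const uint8_t *p = region + off;
        uint8_t sum = 0;
        if (memcmp(p, "RSD PTR ", 8) != 0)
            continue;
        for (size_t i = 0; i < RSDP_V1_LENGTH; i++)
            sum = (uint8_t)(sum + p[i]);
        if (sum == 0)
            return (long)off;
    }
    return -1;
}
```

On a BIOS machine the region to pass to rsdp_find would be the BIOS area from 0xE0000 to 0xFFFFF (or the first KiB of the EBDA). The partition entries come from the LBA given in the GPT header, which itself lives at LBA 1.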

The environment

I must say it’s a lot of fun. First of all, remember that C is just a language that defines simple data types, control structures and simple operators (like +, -, etc.). The functions that you’ve seen before, like fopen(), malloc() and so on, are part of the C standard library, which is implemented by the OS vendor, so they are not available in bare-bones C. But don’t worry – you don’t need them while running on bare metal, because you can do a lot of neat tricks like this:

volatile char *vidmem = (volatile char *) 0xb8000; // memory-mapped video memory (VGA text mode)

which gives you access to the text-mode screen: an 80 x 25 array of 2-byte cells, each holding one character on the screen plus its color attribute. Or even this:

/**
* GDT Entry structure
*/
struct gdt_entry_struct {
	uint16 limit_low;			// The lower 16 bits of the limit.
	uint16 base_low;			// The lower 16 bits of the base.
	uint8  base_middle;			// The next 8 bits of the base.
	uint8  access;				// Access flags, determine what ring this segment can be used in.
	uint8  granularity;			// Upper 4 bits of the limit plus the granularity/size flags.
	uint8  base_high;			// The last 8 bits of the base.
} __attribute__((packed));
typedef struct gdt_entry_struct gdt_entry_t;

gdt_entry_t *gdt = (gdt_entry_t *)0x800;
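
Both tricks can be wrapped in small helpers. The following is a minimal sketch, not the code from the repo: vga_put_string and gdt_set_entry are names made up for this example, the struct repeats the layout above with the <stdint.h> types instead of my own uint8/uint16 typedefs, and the buffer is passed in as a parameter so the same code can be tried against an ordinary array on a hosted system. It also assumes a little-endian machine (which x86 is), so that the character lands in the low byte of each cell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80u
#define VGA_ROWS 25u

/* Hypothetical helper: writes str at (row, col) into a text-mode buffer.
 * Each 16-bit cell holds the character in the low byte and the color
 * attribute in the high byte. Output stops at the end of the screen. */
void vga_put_string(volatile uint16_t *buf, unsigned row, unsigned col,
                    const char *str, uint8_t attr)
{
    size_t pos = (size_t)row * VGA_COLS + col;
    assert(buf != NULL && str != NULL);
    while (*str != '\0' && pos < (size_t)VGA_COLS * VGA_ROWS) {
        buf[pos] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)*str);
        pos++;
        str++;
    }
}

/* Same layout as the GDT entry structure above, with <stdint.h> types. */
struct gdt_entry {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_middle;
    uint8_t  access;
    uint8_t  granularity;
    uint8_t  base_high;
} __attribute__((packed));

/* Hypothetical helper: splits a 32-bit base and a 20-bit limit across the
 * descriptor fields. The upper 4 bits of flags go into the high nibble
 * of the granularity byte. */
void gdt_set_entry(struct gdt_entry *e, uint32_t base, uint32_t limit,
                   uint8_t access, uint8_t flags)
{
    assert(e != NULL);
    assert(limit <= 0xFFFFF); /* the limit field is only 20 bits wide */
    e->limit_low   = (uint16_t)(limit & 0xFFFF);
    e->base_low    = (uint16_t)(base & 0xFFFF);
    e->base_middle = (uint8_t)((base >> 16) & 0xFF);
    e->access      = access;
    e->granularity = (uint8_t)(((limit >> 16) & 0x0F) | (flags & 0xF0));
    e->base_high   = (uint8_t)((base >> 24) & 0xFF);
}
```

On real hardware you would pass (volatile uint16_t *)0xB8000 as the buffer, with 0x07 as the attribute for light grey on black. A flat 32-bit ring 0 code segment would be gdt_set_entry(&entry, 0, 0xFFFFF, 0x9A, 0xC0), where entry is one slot of your GDT.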

Also mixing Assembly with C is a normal thing and not that hard at all, just remember about calling conventions if you want to pass arguments to and from assembly.

OK that’s it for now. Till the next time.
