I ain’t dead, I’m still researching. It’s kind of a never-ending story, but I still need to write a few things down. Today it’s all about memory layout and storage (hard disk drives).
For those unfamiliar with what I’m about to write, please read the following:
- My first post, where I said that I want to try out this thing called “writing your own OS”, and where I gained my 15 minutes of fame over the internet (actually 3 days according to Google Analytics – 70k unique visitors, whoa!);
- My second post, where I try to clear my mind up about all the memory maps and stuff and how to create a Cygwin cross-compiler;
- My third post, where I try to clarify my actions with the rest of the world;
- My fourth post, where I’ve summarized everything I read as a response to my first post;
- My post about a MinGW Cross-Compiler.
I’ve made my decision: GPT and EFI boot it is. Without a decent emulator that supports (U)EFI, though, I’d be stuck with real hardware and painful debugging options. But I’m a man with a plan. I have to learn assembly anyway, so why not try my luck with another crazy idea? My idea is to build a backwards-compatible MBR that looks for the (officially recognized) BIOS Boot partition (BBP), which contains a stage 2 loader that will emulate some EFI functionality and be able to load EFI binaries from the EFI boot partition. What d’ya say about that? In my native language they say “two rabbits with one shot”. The chain-loading flow:
- BIOS loads my MBR
- My MBR loads the GPT header (at LBA 1) and does some validation tests
- My MBR loads the next sector (LBA 2), which should contain up to 4 partition entries
- My MBR then looks for the BBP among those first four partition entries
- Then (and only then) the MBR relocates only the vital BBP loader instructions, leaving around 510 KB (up to 630 KB if the EBDA area is free) of memory available
- The MBR loads the BBP code (the BBP is not formatted, it’s treated as raw binary code) and jumps to its entry point – voila!
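The lookup steps in the list above can be sketched in C before squeezing them into assembly. This is only an illustration, not the MBR2GPT source: the function names (`gpt_header_looks_valid`, `find_bbp`, `read_le64`) are made up here, and it assumes 512-byte sectors and 128-byte partition entries, so that one sector holds exactly four of them. The BIOS Boot partition type GUID is 21686148-6449-6E6F-744E-656564454649, which on disk spells “Hah!IdontNeedEFI” in ASCII.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE    512
#define GPT_ENTRY_SIZE 128   /* assumed entry size: 4 entries per sector */

/* BIOS Boot partition type GUID as stored on disk
 * (the first three GUID fields are little-endian). */
static const uint8_t BBP_TYPE_GUID[16] = {
    0x48, 0x61, 0x68, 0x21, 0x49, 0x64, 0x6F, 0x6E,
    0x74, 0x4E, 0x65, 0x65, 0x64, 0x45, 0x46, 0x49
};

/* Read a little-endian 64-bit value byte by byte (no alignment needed). */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* The header sector at LBA 1 must start with the signature "EFI PART". */
static int gpt_header_looks_valid(const uint8_t header[SECTOR_SIZE])
{
    return memcmp(header, "EFI PART", 8) == 0;
}

/* Scan the four entries of one sector for the BBP type GUID.
 * Returns the entry index (0..3), or -1 if no BBP was found. */
static int find_bbp(const uint8_t entries[SECTOR_SIZE],
                    uint64_t *first_lba, uint64_t *last_lba)
{
    assert(first_lba != NULL && last_lba != NULL);
    for (int i = 0; i < SECTOR_SIZE / GPT_ENTRY_SIZE; i++) {
        const uint8_t *e = entries + i * GPT_ENTRY_SIZE;
        if (memcmp(e, BBP_TYPE_GUID, 16) == 0) {
            *first_lba = read_le64(e + 32);  /* StartingLBA field */
            *last_lba  = read_le64(e + 40);  /* EndingLBA field (inclusive) */
            return i;
        }
    }
    return -1;
}
```

Because the ending LBA is inclusive, the partition length in sectors is `last_lba - first_lba + 1`.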
Oh yes, and to save more space and keep the BIOS logo on screen (unless the BIOS changes the video mode right before entering the MBR, which I still have to test), it’s dead silent. Only if something goes wrong do I switch the video mode and print a simple error code. It might not be the best style of MBR development, but it’s only there for slight backwards compatibility, and mostly because no x86 emulator supports (U)EFI … yet (well, Qemu kind of does, but I’m still sticking with Bochs).
I was afraid, but not stupid, so getting a grip on assembly programming took some 2-3 days and now I’m feeling quite confident. I can think up an idea, look up some examples on the net, search Intel’s manual for an instruction that suits my needs, and write some code. Overall, I’m starting to understand what’s happening.
I still don’t know any do’s and don’ts. For example, why does every single MBR start with XOR AX, AX and not MOV AX, 0? Is XOR faster or smaller in size? And there were a few drawbacks. One, for example, is doing 64-bit integer arithmetic in 16-bit assembly.
So, how do you do a 64-bit integer division (needed for LBA -> CHS calculations)? Or, if I drop CHS support and go straight to extended read, how do I subtract the start LBA from the end LBA to calculate the length? I don’t quite get this one, so I’m making a stupid assumption for now: I just check whether the high dword is 0 and that’s it. So our BBP should be located at the beginning of the disk. Actually, this high/low structure arithmetic has bothered me a lot lately. For example, I mentioned QueryPerformanceCounter in my previous post. It returns a structure called LARGE_INTEGER, which is basically a struct with the low dword and high dword of a high-resolution tick count. In combination with QueryPerformanceFrequency you could calculate time in seconds, milliseconds or microseconds, but I don’t know how to do this split division. 🙁
Yes, by the way, I just found out that in Real Mode you are not stuck with 16-bit operands; it’s the memory addressing that’s stuck at 16 bits. In another discussion I found that this isn’t quite true either. Confusion! I also found an article with a pretty clear explanation of x86 registers and their usage. Good for me!
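The usual trick with high/low arithmetic is to carry (or borrow) between the halves. Here is a minimal sketch in C, not the MBR2GPT code. The names `u64_parts`, `sub64`, `div64_by_16` and `lba_to_chs` are invented for this example, and the division assumes a divisor that fits in 16 bits, which holds for sectors-per-track and head counts. Subtraction maps to SUB on the low halves followed by SBB on the high halves. Division is schoolbook long division where each “digit” is a 16-bit word and each step is one DIV r16 on DX:AX.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value kept as two 32-bit halves, like LARGE_INTEGER. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_parts;

/* a - b, borrowing from the high half when the low half wraps.
 * In assembly: SUB on the low halves, then SBB on the high halves. */
static u64_parts sub64(u64_parts a, u64_parts b)
{
    u64_parts r;
    uint32_t borrow = (a.lo < b.lo);   /* 1 if the low subtraction wraps */
    r.lo = a.lo - b.lo;                /* wraps modulo 2^32 */
    r.hi = a.hi - b.hi - borrow;
    return r;
}

/* Long division of a 64-bit value by a 16-bit divisor.
 * The digits are four 16-bit words, most significant first. Each step
 * divides (remainder:digit), a 32-bit number, by the divisor. The
 * remainder is always smaller than the divisor, so every partial
 * quotient fits in 16 bits, exactly as DIV r16 requires. */
static u64_parts div64_by_16(u64_parts n, uint16_t divisor, uint16_t *rem_out)
{
    uint16_t digit[4];
    uint32_t rem = 0;
    u64_parts q;

    assert(divisor != 0);
    digit[0] = (uint16_t)(n.hi >> 16);
    digit[1] = (uint16_t)(n.hi & 0xFFFFu);
    digit[2] = (uint16_t)(n.lo >> 16);
    digit[3] = (uint16_t)(n.lo & 0xFFFFu);

    for (int i = 0; i < 4; i++) {
        uint32_t cur = (rem << 16) | digit[i];  /* bring the next digit down */
        digit[i] = (uint16_t)(cur / divisor);
        rem = cur % divisor;
    }
    if (rem_out)
        *rem_out = (uint16_t)rem;

    q.hi = ((uint32_t)digit[0] << 16) | digit[1];
    q.lo = ((uint32_t)digit[2] << 16) | digit[3];
    return q;
}

/* LBA -> CHS with two such divisions:
 * sector = lba % spt + 1, head = (lba / spt) % heads,
 * cylinder = (lba / spt) / heads. */
static void lba_to_chs(u64_parts lba, uint16_t heads, uint16_t spt,
                       u64_parts *cyl, uint16_t *head, uint16_t *sector)
{
    uint16_t r;
    u64_parts t = div64_by_16(lba, spt, &r);
    *sector = (uint16_t)(r + 1);            /* CHS sectors are 1-based */
    *cyl = div64_by_16(t, heads, head);
}
```

On Windows, LARGE_INTEGER also exposes the whole value as the 64-bit `QuadPart` member, so a compiler with 64-bit integer support can divide it directly without any splitting.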
Saving some memory
If you had only 50 cents, what would you do with them? You’d spend them wisely. So if you have 1 MB of RAM, how do you use it? Well, once you have full control of your program written in assembly, you can do a lot of tricks, like relocation. To gain more contiguous space for loading the bootloader, an MBR always relocates itself. But as far as I’ve seen around the internet, every one of the MBR sources does the same thing: it relocates at the very beginning. Why?
I took a different approach. I do all the necessary validation checks first (staying where the BIOS loaded me) and then do only a partial relocation: I relocate just the last 16 bytes of my code, the ones that do the actual copying of the BBP into memory. Everything else is not relevant any more. I’ve done my checks, I don’t need that garbage any more. So I end up with a precious 510 KB of RAM for my bootloader. 510 KB is actually quite a lot, at least for my plan of emulating some simple (U)EFI functionality: reading a FAT partition and parsing a PE header, that is.
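For a rough feel of where a figure like 510 KB can come from, here is a small sketch in C. The addresses are assumptions based on the commonly documented real-mode memory map, not values taken from the MBR2GPT source: free RAM starting right after the BIOS Data Area at 0x00500, a conservative upper limit of 0x80000 below a possible EBDA, and video memory starting at 0xA0000. The helper names `linear` and `free_kib` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset. */
static uint32_t linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* ASSUMED layout (conventional PC memory map, not from MBR2GPT). */
#define FREE_BASE  0x00500u  /* first byte after the BIOS Data Area */
#define SAFE_TOP   0x80000u  /* conservative limit below a possible EBDA */
#define VIDEO_BASE 0xA0000u  /* start of video memory */

/* Whole KiB available between FREE_BASE and the given top address. */
static uint32_t free_kib(uint32_t top)
{
    assert(top >= FREE_BASE);
    return (top - FREE_BASE) / 1024u;
}
```

Under these assumptions `free_kib(SAFE_TOP)` gives 510 and `free_kib(VIDEO_BASE)` gives 638. The `linear` helper also shows why 0x0000:0x7C00 and 0x07C0:0x0000 name the same byte: both translate to 0x7C00.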
Before I give out the source of my MBR2GPT project, I must talk about development in general.
When you’re working on a project that deals with non-human-readable data structures, you need to build your own toolkit to make those structures readable. For example, this autumn I was working on an art project called “Emografs” (see the video) that was part of the Staro Riga 2012 light festival. I developed software that:
- Reads the human pulse from a finger with an oximeter;
- Does audio mixing and time-stretching on audio loops;
- Sends a metronome signal to the video mixing console;
- Controls the Arduino that controls small LEDs and a servo that turns a kaleidoscope (I also wrote the code for this one).
Of course, to create some time-stretching I had a few easy but ugly options, like:
- classical time-stretch, that also changes the pitch;
- granular time-stretch, which left all the beat transients too smeared out;
- loop sliced audio files (like REX files).
I went with the last option, which also meant I had to come up with my own audio file format and a utility (with a GUI!) that can load wave files and slice them.
So, back to our topic. Working with hard disk drives or disk images is almost the same: it’s just a huge array of bytes that are not human readable. So I rewrote my diskutils (see previous posts) with a plan to create a GUI disk/image editor. The GUI version is still a work in progress, but the console applications are a must-have right now, as they can build GPT-formatted images that can be used directly with Bochs.
- Windows binaries
- VSExpress source (soon to be uploaded on GIT)
Usage is simple: you can get all the options with the -h switch. Also a little explanation:
- buildimg – an image builder that builds GPT disk images
- diskdump – the unified disk-to-image and image-to-disk dump utility
- diskedit – a work-in-progress GUI editor for disks and images
- disklib – a shared library with some useful functions
To compile diskedit you need to have a compiled FLTK static library and FLTK source at hand.
OK, it’s the thing you’ve all been waiting for. My MBR2GPT source – download it here. It depends on:
- the MinGW cross-compiler I built in this post;
- the previously mentioned diskutils buildimg executable, which merges all the binary images into a disk image used with Bochs;
- and Bochs of course, with a debugger if you’d like to tinker around.
Of course it’s half-baked: there are no CRC32 checks, and it loads only the (F-U Phoenix!) 127 sectors from the BBP, so I still have to work out the extended-extended read (a.k.a. multiple reads in a loop) to load all those 510 KB I promised. 🙂
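For reference, the missing GPT header CRC32 check could look roughly like this in C. It is a sketch under stated assumptions, not code from MBR2GPT: the function names (`gpt_crc32`, `gpt_header_crc_ok`, `read_le32`) are invented here, the header is assumed to sit in a 512-byte sector, and a slow bit-by-bit CRC is used because a lookup table would never fit into an MBR. GPT uses the common CRC-32 with the reflected polynomial 0xEDB88320, computed over HeaderSize bytes with the CRC field itself set to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t gpt_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, else zero */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

/* Read a little-endian 32-bit value byte by byte. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* HeaderSize lives at offset 12 and HeaderCRC32 at offset 16.
 * Returns 1 if the stored CRC matches, 0 otherwise. */
static int gpt_header_crc_ok(const uint8_t sector[512])
{
    uint32_t size = read_le32(sector + 12);
    uint32_t stored = read_le32(sector + 16);
    uint8_t copy[512];

    if (size < 92 || size > 512)    /* 92 bytes is the minimum header size */
        return 0;
    memcpy(copy, sector, size);
    memset(copy + 16, 0, 4);        /* the CRC field counts as zero */
    return gpt_crc32(copy, size) == stored;
}
```

A quick sanity check for any CRC-32 routine of this variant is the ASCII string "123456789", which must yield 0xCBF43926.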
Here are the new source files. Changes made:
- It’s dead silent now, to save more space – I eventually ran out of the 440 bytes, so I removed the text display functions
- It copies all 480 KiB from the BIOS Boot Partition in a loop, 64 KiB per pass
- It has neat data structures instead of defined constants for memory locations
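The looped copy from the list above boils down to splitting one big read into several INT 13h AH=42h (extended read) calls, each described by a 16-byte Disk Address Packet. The following C sketch only plans those packets; it is an illustration, not the mbr2gpt source. The names `dap_t` and `plan_reads` are made up, and the destination address and sectors-per-pass value are parameters rather than the real ones from my code.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Disk Address Packet for INT 13h, AH=42h (extended read); 16 bytes. */
typedef struct {
    uint8_t  size;      /* size of this packet: 0x10 */
    uint8_t  reserved;  /* always 0 */
    uint16_t count;     /* number of sectors to read */
    uint16_t offset;    /* destination offset */
    uint16_t segment;   /* destination segment */
    uint64_t lba;       /* first sector to read */
} dap_t;

/* Fill `out` with one packet per pass needed to load `total` sectors,
 * starting at `first_lba`, to the linear address `dest`.
 * Returns the number of packets, or -1 if `out` is too small. */
static int plan_reads(uint64_t first_lba, uint32_t total, uint32_t dest,
                      uint32_t per_read, dap_t *out, int max_packets)
{
    int n = 0;

    assert(per_read >= 1 && per_read <= 128);  /* at most 64 KiB per pass */
    assert(dest % 16 == 0);                    /* so every offset can be 0 */
    assert((uint64_t)dest + (uint64_t)total * SECTOR_SIZE <= 0x100000u);

    while (total > 0) {
        uint32_t chunk = total < per_read ? total : per_read;
        if (n == max_packets)
            return -1;
        out[n].size = 0x10;
        out[n].reserved = 0;
        out[n].count = (uint16_t)chunk;
        out[n].offset = 0;
        out[n].segment = (uint16_t)(dest >> 4);  /* advance by segment */
        out[n].lba = first_lba;
        first_lba += chunk;
        dest += chunk * SECTOR_SIZE;
        total -= chunk;
        n++;
    }
    return n;
}
```

Advancing the segment instead of the offset keeps every pass inside one 64 KiB window, so a transfer never wraps around the end of a segment. With `per_read` set to 127 the same loop also covers BIOSes that refuse larger requests.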
Also I’ve posted my DiskUtils on GitHub
Uploaded mbr2gpt on GitHub too.
To be continued…
XOR AX, AX is faster than MOV AX, 0 because on almost every x86 CPU the instruction:
XOR REG, REG (same register)
is treated as a special case, and the CPU just clears the register instead of XORing.
It is also the method recommended by some Intel documents.
http://www.agner.org/optimize/ is supposed to be the ultimate reference for x86 optimization. It might serve you well.
Division of larger numbers (64-bit) represented by smaller digits (for example, four 16-bit ones) is taught as early as primary school: http://en.wikipedia.org/wiki/Long_division#Method
OK, this technique would be clear on paper (and yes, I was taught it at school), but the picture doesn’t come together for me: how do I carry it over to high/low structures? :/ I guess in my case there’s some kind of missing link in how to shift the digits between the high and low parts. Anyway, thanks for the comment.