Assembly and C

While writing my OS I’ve spent a fair amount of time playing with the possibilities that open up when you mix these two languages. So I wanted to share what I’ve learned.

Calling conventions

The first thing you must know is that C compiles into machine code, and Assembly language basically IS machine code, only in a human-readable form. The only thing lower than Assembly is the raw bytes of instructions and operands. Assembly is also worse than C in terms of readability and maintainability. C is far more abstract than machine code, so the compiler follows some hidden rules whenever its code has to talk to other pieces of machine code.

OK, enough of this gibberish: there is this thing called “calling conventions”, which says how arguments are passed to functions and how results come back, so that you can mix many different languages together in harmony. The most popular one on 32-bit x86 (and basically the de facto standard for C there) is cdecl, and I’m going to stick to that one.

So what happens when you call a cdecl function:

  1. the caller pushes all the arguments onto the stack in reverse order (rightmost goes first)
  2. the function gets called with the call instruction, which:
    1. pushes the return address (the eip of the instruction after the call) onto the stack
    2. jumps to the function’s location in memory
  3. the function reads its arguments from the stack (accessing them directly) and does its magic
  4. the function puts the return value into the eax register
  5. the function returns with the ret instruction, which:
    1. pops the saved return address from the stack into eip
    2. jumps back to that location
  6. the caller does the cleanup: it removes the previously pushed arguments from the stack (you can pop them, or just add to the stack pointer (esp))
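
To make the sequence concrete, here is a small sketch in plain C that models it with an array standing in for the stack. Everything in it (the names sim_call, sim_cfunct, sim_stack_depth, the slot count, the fake return address) is made up for illustration; a real call uses the hardware stack and registers, not variables like these.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack: it grows downward, one slot is one dword. */
#define STACK_SLOTS 64
static int32_t stack[STACK_SLOTS];
static int sp = STACK_SLOTS;      /* index of the top of the stack ("esp") */
static int32_t eax;               /* stands in for the eax register */

static void push(int32_t v) { assert(sp > 0); stack[--sp] = v; }
static int32_t pop(void)    { assert(sp < STACK_SLOTS); return stack[sp++]; }

/* The callee: on entry stack[sp] holds the return address,
 * so the arguments sit in the slots right above it. */
static void sim_cfunct(void) {
    int32_t a = stack[sp + 1];
    int32_t b = stack[sp + 2];
    int32_t c = stack[sp + 3];
    eax = a + b + c;              /* step 4: result goes into "eax" */
}

/* The caller: walks through steps 1 to 6 of the list. */
int32_t sim_call(int32_t a, int32_t b, int32_t c) {
    push(c);                      /* step 1: arguments, rightmost first */
    push(b);
    push(a);
    push(0x1234);                 /* step 2: "call" pushes a (fake) return address */
    sim_cfunct();                 /* steps 3 and 4 happen inside the callee */
    (void)pop();                  /* step 5: "ret" pops the return address */
    sp += 3;                      /* step 6: caller cleanup, like add esp, 12 */
    return eax;
}

/* Number of slots currently in use; 0 means the stack is balanced. */
int sim_stack_depth(void) { return STACK_SLOTS - sp; }
```

Calling sim_call(1, 2, 3) leaves the sum in the modelled eax and the stack back where it started, which is exactly the property the cleanup step exists to guarantee.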

Voila – simple as that. OK, now to the candy store.

In practice

The assembly dialect I’m using here is NASM for x86 protected mode, which uses the “instruction destination, source” operand order.

The C function:

// In C all the functions that are not marked as static are global
int cfunct(int a, int b, int c){
	return a + b + c;
}

Called from assembly:

[extern cfunct]         ; import the C function
call_cfunct:
	push 3              ; c
	push 2              ; b
	push 1              ; a
	call cfunct         ; call the C function
	                    ; now eax holds the return value
	add esp, 12         ; stack cleanup, we did the mess,
	                    ; so we clean it up and remember
	                    ; we pushed 3 integers (32bit, dwords)
	                    ; that means it's 12 bytes

Or the other way around. The assembly function:

[global asmfunct]       ; export the function (label) to C (linker)
asmfunct:
	push ebp            ; save base pointer
	mov ebp, esp        ; use the current stack pointer as our base pointer
	add ebp, 8          ; skip 8 bytes (the first 2 values on the
	                    ; stack are the old base pointer and the
	                    ; return address)
; do the magic:
	mov eax, [ebp]      ; get argument a
	add eax, [ebp + 4]  ; add argument b (32bits = 4 bytes)
	add eax, [ebp + 8]  ; add argument c
	pop ebp             ; restore the old base pointer
	ret                 ; return to caller
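
If the 8-byte skip looks arbitrary, the following C sketch lays out what sits on the stack right after push ebp. The struct cdecl_frame and the function frame_sum are hypothetical names invented for this illustration; they only mirror the layout on 32-bit x86 and are not something the compiler or NASM provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What the stack looks like inside asmfunct right after "push ebp",
 * lowest address first. All fields are 4 bytes, so there is no padding. */
struct cdecl_frame {
    uint32_t saved_ebp;    /* [esp]      pushed by "push ebp" */
    uint32_t return_eip;   /* [esp + 4]  pushed by "call" */
    int32_t  a;            /* [esp + 8]  first argument */
    int32_t  b;            /* [esp + 12] second argument */
    int32_t  c;            /* [esp + 16] third argument */
};

/* Mirrors the assembly: take the frame address, skip 8 bytes,
 * then read three dwords at offsets 0, 4 and 8. */
int32_t frame_sum(const struct cdecl_frame *frame) {
    const unsigned char *ebp = (const unsigned char *)frame; /* mov ebp, esp */
    int32_t a, b, c;

    ebp += 8;                     /* add ebp, 8 */
    memcpy(&a, ebp, 4);           /* mov eax, [ebp] */
    memcpy(&b, ebp + 4, 4);       /* add eax, [ebp + 4] */
    memcpy(&c, ebp + 8, 4);       /* add eax, [ebp + 8] */
    return a + b + c;
}
```

Here offsetof(struct cdecl_frame, a) is 8, which is where the constant in add ebp, 8 comes from.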

Called from C:

// Import the Assembly function
extern int asmfunct(int, int, int);

void call_asmfunct(){
	int r = asmfunct(9, 8, 7);
}

Neat!

The inline assembly

Now if you’re writing an OS yourself, you’re probably going down the GNU toolchain road, and there things get a little messy with assembly, because you have to use GAS syntax (which is “instruction source, destination”, plus a lot of % and $ signs). I don’t like it very much, but GCC does have some good parts. There is a lot more to the inline asm features, but a simple example looks like this:

int x = 1;
int y = 2;
int r = 0;
// syntax:
// assembly commands : outputs (return values) : inputs (arguments)
// %0 is r and %1 is x (both in eax), %2 is y (in ecx)
asm volatile("add %2, %0" : "=a"(r) : "a"(x), "c"(y));
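
For something you can drop into a file and compile, here is a sketch of the same addition wrapped in a function. The name add_inline is made up, and the constraints are my own choice rather than the ones used above: "r" lets GCC pick any register, and "0" ties the first input to the output operand. On anything that is not GCC-compatible x86 it falls back to plain C, so treat it as an illustration and not as the one right way to write it.

```c
/* Adds two ints using GCC extended inline asm on x86,
 * with a plain C fallback for other compilers and targets. */
int add_inline(int x, int y) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int r;
    __asm__ volatile(
        "addl %2, %0"      /* AT&T order: source first, destination last */
        : "=r"(r)          /* %0: output, any general purpose register */
        : "0"(x),          /* %1: input x, placed in the same register as %0 */
          "r"(y)           /* %2: input y, any general purpose register */
        : "cc"             /* the add instruction changes the flags */
    );
    return r;
#else
    return x + y;          /* fallback: no inline asm available */
#endif
}
```

Because x starts out in the output register, the single addl leaves x + y there, and add_inline(1, 2) gives 3.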