Assembly and C

While writing my OS I’ve spent a fair amount of time playing with the possibilities that open up when you mix these two languages, so I wanted to share what I’ve learned.

Calling conventions

The first thing you must know is that C compiles into machine code, and Assembly language basically IS machine code, only in a human-readable form. The only thing lower than Assembly is the raw bytes of instructions and operands. Assembly is also worse than C in terms of readability and maintainability. C, in turn, is far more abstract than machine code, so the compiler relies on some hidden conventions to communicate with other pieces of machine code.

OK, enough of this gibberish. There is this thing called “calling conventions”: the rules that say how arguments and return values are passed to and from functions, so that you can mix many different languages together in harmony. The most popular one for 32-bit x86 (and basically the de facto standard for C there) is cdecl, and I’m going to stick to that one.

So what happens when you call a cdecl function:

  1. the caller pushes all the arguments on the stack in reverse order (rightmost goes first)
  2. the function gets called with the call instruction, which:
    1. pushes the return address (the eip value of the next instruction) on the stack
    2. jumps to the function’s location in memory
  3. the function reads its arguments from the stack (accessing them directly) and does its magic
  4. the function puts the return value into the eax register
  5. the function returns with the ret instruction, which:
    1. pops the saved instruction pointer (eip) value from the stack
    2. jumps back to that location
  6. the caller does the cleanup: it removes the previously pushed arguments from the stack (you can pop them, or you can increment the stack pointer (esp))
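
The steps above can be sketched as a toy simulation in plain C. This is only a model, not real machine behaviour: the array stack_mem, the variables esp and eax, the helpers push, pop and peek, and the return address 0x1000 are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack: it grows downwards, one slot is 4 bytes. */
static uint32_t stack_mem[64];
static uint32_t esp = sizeof stack_mem;  /* byte offset of the stack top */
static uint32_t eax;                     /* stands in for the eax register */

static void push(uint32_t value) {
    assert(esp >= 4);                    /* no room left would be an overflow */
    esp -= 4;
    stack_mem[esp / 4] = value;
}

static uint32_t pop(void) {
    assert(esp + 4 <= sizeof stack_mem); /* popping an empty stack is a bug */
    uint32_t value = stack_mem[esp / 4];
    esp += 4;
    return value;
}

/* Read the dword at [esp + offset] without popping it. */
static uint32_t peek(uint32_t offset) {
    return stack_mem[(esp + offset) / 4];
}

/* Hypothetical callee that sums its three arguments. */
static void callee(void) {
    /* [esp] is the return address, the arguments start at [esp + 4] */
    eax = peek(4) + peek(8) + peek(12);  /* steps 3 and 4 */
}

uint32_t simulate_call(uint32_t a, uint32_t b, uint32_t c) {
    push(c);                             /* step 1: rightmost goes first */
    push(b);
    push(a);
    push(0x1000);                        /* step 2: made-up return address */
    callee();
    (void)pop();                         /* step 5: ret pops the return address */
    esp += 12;                           /* step 6: caller drops 3 dwords */
    return eax;
}
```

Calling simulate_call(1, 2, 3) returns 6 and leaves esp where it started, which is the whole point of step 6: whoever pushed the arguments is responsible for removing them.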

Voila – simple as that. OK, now to the candy store.

In practice

The assembly dialect I’m using here is 32-bit x86 (protected mode) in NASM syntax, where the operand order is “instruction destination, source”.

The C function:

// In C all the functions that are not marked as static are global
int cfunct(int a, int b, int c){
	return a + b + c;
}

Called from assembly:

[extern cfunct]         ; import the C function
call_cfunct:
	push 3              ; c (rightmost argument goes first)
	push 2              ; b
	push 1              ; a
	call cfunct         ; call the C function
	                    ; now eax holds the return value (6)
	add esp, 12         ; stack cleanup: we made the mess, so we clean
	                    ; it up. We pushed 3 integers (32-bit dwords),
	                    ; that means 12 bytes
	ret                 ; return to whoever called call_cfunct

Or the other way around. The assembly function:

[global asmfunct]       ; export the function (label) to C (linker)
asmfunct:
	push ebp            ; save the caller's base pointer
	mov ebp, esp        ; use the current stack pointer as our base pointer
	add ebp, 8          ; skip the first 2 values on the stack (the old
	                    ; base pointer and the return address), so that
	                    ; ebp points at the first argument
	; do the magic:
	mov eax, [ebp]      ; eax = argument a
	add eax, [ebp + 4]  ; eax += argument b (32 bits = 4 bytes)
	add eax, [ebp + 8]  ; eax += argument c
	pop ebp             ; restore the old base pointer
	ret                 ; return to the caller
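
To see where those offsets come from, here is a rough model in C of the stack as asmfunct sees it right after its push ebp. The struct cdecl_frame and the helpers read_slot and sum_args are hypothetical names invented for this sketch. A real stack is not a C struct, this only mirrors the layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the stack right after "push ebp":
   lowest address first, one 32-bit slot per field. */
struct cdecl_frame {
    uint32_t saved_ebp;       /* [esp + 0]  pushed by asmfunct itself */
    uint32_t return_address;  /* [esp + 4]  pushed by the call instruction */
    uint32_t a;               /* [esp + 8]  first argument */
    uint32_t b;               /* [esp + 12] second argument */
    uint32_t c;               /* [esp + 16] third argument */
};

/* Read one dword at a byte offset from the start of the frame. */
static uint32_t read_slot(const struct cdecl_frame *frame, size_t offset) {
    uint32_t value;
    memcpy(&value, (const unsigned char *)frame + offset, sizeof value);
    return value;
}

/* Same arithmetic as the assembly: skip two slots, then sum three dwords. */
uint32_t sum_args(const struct cdecl_frame *frame) {
    return read_slot(frame, 8) + read_slot(frame, 12) + read_slot(frame, 16);
}
```

The first argument sits 8 bytes above the stack top because two dwords (the saved base pointer and the return address) lie in front of it, which is why the assembly adds 8 to ebp before reading anything.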

Called from C:

// Import the Assembly function
extern int asmfunct(int, int, int);

void call_asmfunct(){
	int r = asmfunct(9, 8, 7);
}

Neat!

The inline assembly

Now, if you’re writing an OS yourself, you’re probably going down the GNU toolchain road, and there things get a little messy with assembly, because GCC’s inline assembly uses GAS (AT&T) syntax by default (which is “instruction source, destination” plus a lot of % and $ signs). I don’t like it very much, but there are some good parts in GCC. You can read more on the inline asm features here, but a simple example looks like this:

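int x = 1;
int y = 2;
int r = 0;
// syntax:
// assembly template : output operands : input operands
// %0 is r (eax), %1 is x (eax), %2 is y (ecx);
// AT&T order is "source, destination", so this does eax += ecx
asm volatile("add %2, %0" : "=a"(r) : "a"(x), "c"(y));
// now r == 3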
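
GCC’s extended asm also accepts symbolic operand names in place of operand numbers, which makes it harder to mix the operands up. Below is a minimal sketch of the same kind of addition wrapped in a function. The name add_with_asm and the plain C fallback are my own additions, and the asm path assumes GCC or Clang targeting x86 or x86-64.

```c
/* Hypothetical helper: returns x + y using GCC extended inline assembly.
   On other compilers or CPUs it falls back to plain C. */
int add_with_asm(int x, int y) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    int r = x;
    __asm__ volatile("addl %[src], %[dst]"  /* AT&T order: source, destination */
                     : [dst] "+r"(r)         /* "+": both read and written */
                     : [src] "r"(y)          /* any general purpose register */
                     : "cc");                /* add modifies the flags */
    return r;
#else
    return x + y;  /* fallback so the sketch still builds elsewhere */
#endif
}
```

With the "r" constraint the compiler picks the registers itself, so nothing here depends on eax or ecx being free. add_with_asm(1, 2) returns 3.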

9 Comments

  1. The GNU assembler (both inline and standalone) has long been able to use the (ugly) Intel syntax as well. You are not obliged to use the (pretty) GAS syntax.

    That inline asm fragment will not work at all. There is a syntax error (what is that asterisk next to r?) and the register numbers are wrong (the add instruction will simply add the “a” register to the “a” register). It would be nicer to use proper names instead of operand numbers.

    And anyway, why are you describing cdecl, which is the i386 calling convention? Shouldn’t you be writing about x86_64?

    1. As I already wrote, I did not test these examples, I just wrote them from memory.

      In reply to your comment: the instruction will add ECX (which will hold the value of the C variable y) to EAX (which will hold x) and return the EAX value in the C variable r (at least that is what I was hoping to get), but you are right: *r will write the result to some address that is stored at memory address 0x0000 (because r is 0 and *r yields the value the pointer points to, i.e. address 0). It should still work, though, only there will be trouble.

      As for x86 versus x86_64, I am still fighting with paging, and once I get to long mode I will play around with assembly in 64-bit mode. Admittedly not much new comes along there, only the RXX registers and 8-byte stack slots.
