Sunday, December 7, 2014

x86 Assembly Notes - The C Calling Convention

One major part of x86 assembly is the ability to invoke subroutines.
This educational note is intended to help the reader understand why a subroutine has a prolog and an epilog, and how the stack changes during a call. Only a slight understanding of x86 assembly is assumed.

Prepared by Globeriz (Globeriz Project)

It all starts with the stack.
The stack works on a LIFO (last-in, first-out) basis. This means the last value pushed onto the stack will be the first value popped from it. e.g.:
push 3 ;push 0x00000003 to stack
push 2 ;push 0x00000002 to stack
push 1 ;push 0x00000001 to stack
pop eax  ;pop last pushed value from stack (i.e. 1) to eax

First, familiarize yourself with the push and pop instructions.
push - for pushing a value onto the stack
It first decreases esp (the stack pointer register) by 4 ( sizeof (DWORD) ), then stores the value at the address that esp now holds.
In terms of simpler instructions, push eax is equivalent to:
sub esp, 4 ;esp = esp - 4 (i.e. esp-4 => esp)
mov [esp], eax ;*(DWORD *)esp = eax
Meanwhile in stack:
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Value
X-4             esp-4            ?           X-4             esp              eax
X               esp              a           X               esp+4            a
X+4             esp+4            b           X+4             esp+8            b
BEFORE push eax                              AFTER push eax
N.B. push eax does not alter the value of eax after execution.

pop - for popping (getting the last pushed value) from the stack
It first copies the last pushed value on the stack ( i.e. [esp] ) into the destination register or memory location, then increases esp by 4.
In terms of simpler instructions, pop eax is equivalent to:
mov eax, [esp] ;  eax = *(DWORD *)esp
add esp, 4 ; esp = esp + 4
Note how pop eax cancels out the effect of push eax: esp returns to the value it had before the push.
Meanwhile in stack:
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Value
X-8             esp-4            ?           X-8             esp-8            ?
X-4             esp              eax         X-4             esp-4            eax
X               esp+4            a           X               esp              a
X+4             esp+8            b           X+4             esp+4            b
BEFORE pop eax                               AFTER pop eax
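
To see push and pop working together, here is a minimal sketch in C that models the stack as a byte array. The Machine struct, STACK_BYTES and the push32/pop32 helpers are names made up for this note (they are not part of any real API), and esp is modelled as an offset into the array rather than a real memory address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u   /* arbitrary size chosen for this sketch */

/* Hypothetical toy machine: stack memory plus a stack pointer. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;   /* offset into mem; the stack grows toward lower offsets */
} Machine;

static void machine_init(Machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->esp = STACK_BYTES;   /* empty stack: esp sits just past the highest slot */
}

/* push value:  sub esp, 4   then   mov [esp], value */
static void push32(Machine *m, uint32_t value) {
    assert(m->esp >= 4);                  /* refuse to overflow the toy stack */
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &value, 4);
}

/* pop:  mov result, [esp]   then   add esp, 4 */
static uint32_t pop32(Machine *m) {
    uint32_t value;
    assert(m->esp + 4 <= STACK_BYTES);    /* refuse to pop an empty stack */
    memcpy(&value, &m->mem[m->esp], 4);
    m->esp += 4;
    return value;
}
```

With this model, pushing 3, 2 and 1 and then calling pop32 three times returns 1, then 2, then 3, and esp ends up back at STACK_BYTES, which is the LIFO behaviour described above.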

Now, after understanding the push and pop instructions and their relations with the stack, it is time to introduce the C calling convention in x86 assembly.

There are TWO parties involved in a call, namely the caller (main routine) and the callee (subroutine).

The C calling convention for the caller is as follows:
STEP P1> Save the registers that the subroutine is allowed to alter (by pushing them onto the stack), so that they can be restored after returning from the subroutine. In the C calling convention these are eax, ecx and edx; the caller only needs to save the ones whose values it still needs after the call.
STEP P2> Pass the necessary parameters to the subroutine. The parameters are pushed onto the stack just before the call instruction, in the REVERSE of the order in which they appear in the function's parameter list (i.e. the LAST parameter of the function is pushed FIRST).
STEP P3> Use the call instruction to invoke the subroutine. The call instruction first pushes the return address (the value of eip at that moment, i.e. the address of the instruction right after the call in the caller) onto the stack, and then branches to the subroutine address to execute the code in the callee.
Conceptually, call eax is equivalent to the following (push eip is not a real instruction; it is written here only to show the effect):
  push eip
  jmp eax
Then the CPU enters callee (subroutine).

After returning from the subroutine (the return value can be found in the eax register, so read or copy it before eax is restored in STEP E2):
STEP E1> Remove (deallocate) the parameters from the stack by increasing esp by the total size of the parameters.
STEP E2> Restore the saved registers (by popping them from the stack), since they may have been altered in the subroutine. N.B. the registers must be popped in the REVERSE of the order in which they were pushed.

The following assembly code demonstrates the role of caller:
...
push eax ;saving registers
push ecx ;
push edx ;
  push param3 ;passing parameters
  push param2 ;
  push param1 ;first parameter pushed last
    call _subroutine ;invoke _subroutine(param1, param2, param3)
  add esp, 0x0C ;remove the 3 parameters (3 * 4 bytes) from stack
  ;eax holds the return value here: use or copy it before pop eax overwrites it
pop edx ;restoring registers
pop ecx ;
pop eax ;first saved register pops last
...
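
The caller's steps can also be traced in a small C model. This is only a sketch under simplifying assumptions: the Cpu struct, the push/pop helpers, callee_sum3 and RETURN_ADDRESS are all invented for this note, the stack is an array of 32-bit slots (so one slot stands for 4 bytes and add esp, 0x0C becomes esp += 3), and the callee is a stand-in that sums its parameters and deliberately clobbers ecx and edx.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS    32u      /* arbitrary size for this sketch */
#define RETURN_ADDRESS 0x1000u  /* made-up address of the instruction after call */

/* Hypothetical toy CPU: a slot-indexed stack and the caller-saved registers. */
typedef struct {
    uint32_t slot[STACK_WORDS];
    uint32_t esp;               /* index of the top slot; grows downward */
    uint32_t eax, ecx, edx;
} Cpu;

static void push(Cpu *c, uint32_t v) {
    assert(c->esp > 0);
    c->slot[--c->esp] = v;
}

static uint32_t pop(Cpu *c) {
    assert(c->esp < STACK_WORDS);
    return c->slot[c->esp++];
}

/* Stand-in callee. On entry slot[esp] is the return address and
   slot[esp+1..esp+3] are param1..param3. It leaves the sum in eax,
   clobbers ecx and edx, and pops the return address like ret. */
static uint32_t callee_sum3(Cpu *c) {
    c->eax = c->slot[c->esp + 1] + c->slot[c->esp + 2] + c->slot[c->esp + 3];
    c->ecx = 0xDEADBEEFu;
    c->edx = 0xDEADBEEFu;
    return pop(c);
}

/* Caller side of the convention; returns the callee's result. */
static uint32_t caller(Cpu *c, uint32_t p1, uint32_t p2, uint32_t p3) {
    uint32_t result, return_to;

    push(c, c->eax);            /* STEP P1: save registers */
    push(c, c->ecx);
    push(c, c->edx);
    push(c, p3);                /* STEP P2: last parameter first */
    push(c, p2);
    push(c, p1);
    push(c, RETURN_ADDRESS);    /* STEP P3: call pushes the return address... */
    return_to = callee_sum3(c); /* ...and jumps to the callee */
    assert(return_to == RETURN_ADDRESS);

    result = c->eax;            /* copy the return value before eax is restored */
    c->esp += 3;                /* STEP E1: add esp, 0x0C */
    c->edx = pop(c);            /* STEP E2: restore in reverse order */
    c->ecx = pop(c);
    c->eax = pop(c);
    return result;
}
```

Calling caller with parameters 1, 2 and 3 on a Cpu whose esp starts at STACK_WORDS returns 6, leaves esp where it started, and restores eax, ecx and edx to their values from before the call.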

The rules for the callee are as follows:
STEP P1> Preserve ebp (the base pointer) by pushing it, then copy the value of esp to ebp. Inside the subroutine, ebp holds a copy of the stack pointer as it was right after ebp was saved, and it is used as a fixed reference point for the parameters and local variables. ebp is preserved because the caller does not expect it to be altered by the subroutine, so its value must be saved in order to restore it later.
STEP P2> Allocate space for local variables by decreasing esp by the total size of the local variables required. After this, parameters and local variables can be located at known constant offsets (See Appendix A) from ebp.
STEP P3> Preserve the callee-saved registers that the subroutine is going to use. In the C calling convention these are ebx, edi and esi.

After the steps above (the prolog), the CPU proceeds to the main body of the subroutine. After executing the main body, the following steps are taken to return properly to the caller (main routine).

STEP E1> Copy the return value to eax.
STEP E2> Restore the registers saved in the prolog with pop instructions (in the REVERSE of the order pushed).
STEP E3> Remove (deallocate) the local variables by copying ebp to esp (this reverses STEP P2).
STEP E4> Right before the ret instruction, restore the caller's ebp by popping it from the stack.
STEP E5> Use the ret instruction to return to the caller (main routine). The ret instruction pops the return address from the stack into eip, so execution continues in the caller at the instruction right after the call.

These steps are known as the epilog. After the epilog, the CPU returns to the caller.

The following assembly code demonstrates the role of callee:
_subroutine:
  ; prolog
  push ebp ;preserving ebp
  mov ebp, esp ;preparing ebp as a reference point
    sub esp, 0x08 ;allocate space for local variables
      push ebx ;preserving registers
      push edi
      push esi
  ; end of prolog
  ... ; main body
  ;epilog
      pop esi ;restoring registers
      pop edi
      pop ebx
    mov esp, ebp ;deallocate local variables
  pop ebp ; restoring ebp
  ret ; return to main routine
  ;end of epilog
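
The prolog and epilog can be checked with a similar C model. Again this is a sketch rather than real generated code: the Cpu struct and the load/store/push/pop helpers are invented for this note, esp and ebp are byte offsets into an array instead of real addresses, and the main body (adding the three parameters via one local variable) is just an assumed example of what a subroutine might do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 128u   /* arbitrary size for this sketch */

/* Hypothetical toy CPU with the registers the prolog and epilog touch. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp, ebp;             /* byte offsets into mem */
    uint32_t eax, ebx, edi, esi;
} Cpu;

static uint32_t load(const Cpu *c, uint32_t addr) {
    uint32_t v;
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&v, &c->mem[addr], 4);
    return v;
}

static void store(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&c->mem[addr], &v, 4);
}

static void push(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    store(c, c->esp, v);
}

static uint32_t pop(Cpu *c) {
    uint32_t v = load(c, c->esp);
    c->esp += 4;
    return v;
}

/* Models _subroutine(param1, param2, param3).
   Expects [esp] = return address and [esp+4], [esp+8], [esp+C] = parameters.
   Returns the popped return address, i.e. the value ret would put in eip. */
static uint32_t subroutine(Cpu *c) {
    /* prolog */
    push(c, c->ebp);               /* push ebp */
    c->ebp = c->esp;               /* mov ebp, esp */
    c->esp -= 8;                   /* sub esp, 0x08 */
    push(c, c->ebx);
    push(c, c->edi);
    push(c, c->esi);

    /* main body: local_var1 = param1 + param2, result = local_var1 + param3 */
    store(c, c->ebp - 4, load(c, c->ebp + 8) + load(c, c->ebp + 12));
    c->ebx = c->edi = c->esi = 0;  /* the body may freely clobber these */
    c->eax = load(c, c->ebp - 4) + load(c, c->ebp + 16);   /* STEP E1 */

    /* epilog */
    c->esi = pop(c);               /* STEP E2 */
    c->edi = pop(c);
    c->ebx = pop(c);
    c->esp = c->ebp;               /* STEP E3: mov esp, ebp */
    c->ebp = pop(c);               /* STEP E4: pop ebp */
    return pop(c);                 /* STEP E5: ret */
}
```

If the parameters 10, 20 and 30 and a return address are pushed before calling subroutine, it leaves 60 in eax, restores ebp, ebx, edi and esi to their entry values, and leaves esp pointing at param1, exactly 4 bytes above where it was on entry.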

Appendix A (stack inside subroutine main body, after prolog and before epilog)
Memory relative to ebp (hex)  Value
ebp-14                        esi (saved)
ebp-10                        edi (saved)
ebp-C                         ebx (saved)
ebp-8                         local_var2
ebp-4                         local_var1
ebp                           ebp (saved)
ebp+4                         eip (return address)
ebp+8                         param1
ebp+C                         param2
ebp+10                        param3
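
The offsets in Appendix A follow a simple pattern, which can be written down as arithmetic. The helper names below are made up for this note, and the sketch assumes that every parameter, local variable and saved register occupies exactly 4 bytes, as in the example subroutine.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset from ebp of the n-th parameter (n starts at 1).
   The first parameter sits above the saved ebp at [ebp]
   and the return address at [ebp+4]. */
static int32_t param_offset(uint32_t n) {
    assert(n >= 1);
    return 8 + 4 * (int32_t)(n - 1);
}

/* Byte offset from ebp of the n-th 4-byte local variable (n starts at 1). */
static int32_t local_offset(uint32_t n) {
    assert(n >= 1);
    return -4 * (int32_t)n;
}

/* Byte offset from ebp of the k-th register saved in the prolog
   (k starts at 1, in push order), given the number of bytes
   reserved for locals by sub esp. */
static int32_t saved_reg_offset(uint32_t local_bytes, uint32_t k) {
    assert(k >= 1);
    return -(int32_t)local_bytes - 4 * (int32_t)k;
}
```

For the example subroutine, param_offset(3) gives 0x10 (param3 at ebp+10), local_offset(2) gives -8 (local_var2 at ebp-8), and saved_reg_offset(8, 3) gives -0x14 (esi at ebp-14), matching the table.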

Appendix B (entire stack change for calling process)

...;caller
push eax ;saving registers
push ecx ;
push edx ;
  push param3 ;passing parameters
  push param2 ;
  push param1 ;first parameter pushed last
    call _subroutine ;invoke _subroutine(param1, param2, param3)
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Value
X-4             esp-4            ?           X-4             esp              eax
X               esp              ?           X               esp+4            ?
Original state                               AFTER push eax
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Value
X-C             esp-4            ?           X-C             esp              edx
X-8             esp              ecx         X-8             esp+4            ecx
X-4             esp+4            eax         X-4             esp+8            eax
X               esp+8            ?           X               esp+C            ?
AFTER push ecx                               AFTER push edx
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Value
X-18            esp-8            ?           X-18            esp-4            ?
X-14            esp-4            ?           X-14            esp              param2
X-10            esp              param3      X-10            esp+4            param3
X-C             esp+4            edx         X-C             esp+8            edx
X-8             esp+8            ecx         X-8             esp+C            ecx
X-4             esp+C            eax         X-4             esp+10           eax
X               esp+10           ?           X               esp+14           ?
AFTER push param3                            AFTER push param2
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Value
X-1C            esp-4            ?           X-1C            esp              eip
X-18            esp              param1      X-18            esp+4            param1
X-14            esp+4            param2      X-14            esp+8            param2
X-10            esp+8            param3      X-10            esp+C            param3
X-C             esp+C            edx         X-C             esp+10           edx
X-8             esp+10           ecx         X-8             esp+14           ecx
X-4             esp+14           eax         X-4             esp+18           eax
X               esp+18           ?           X               esp+1C           ?
AFTER push param1                            AFTER call _subroutine
Inside subroutine:
  ; prolog
  push ebp ;preserving ebp
  mov ebp, esp ;preparing ebp as a reference point
    sub esp, 0x08 ;allocate space for local variables
      push ebx ;preserving registers
      push edi
      push esi
  ; end of prolog 
  ...
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Relative to ebp  Value
X-24            esp-4            ?           X-24            esp-4            ebp-4            ?
X-20            esp              ebp         X-20            esp              ebp              ebp
X-1C            esp+4            eip         X-1C            esp+4            ebp+4            eip
X-18            esp+8            param1      X-18            esp+8            ebp+8            param1
X-14            esp+C            param2      X-14            esp+C            ebp+C            param2
X-10            esp+10           param3      X-10            esp+10           ebp+10           param3
X-C             esp+14           edx         X-C             esp+14           ebp+14           edx
X-8             esp+18           ecx         X-8             esp+18           ebp+18           ecx
X-4             esp+1C           eax         X-4             esp+1C           ebp+1C           eax
X               esp+20           ?           X               esp+20           ebp+20           ?
AFTER push ebp                               AFTER mov ebp, esp
Memory Address  Relative to esp  Relative to ebp  Value       Memory Address  Relative to esp  Relative to ebp  Value
X-2C            esp-4            ebp-C            ?           X-2C            esp              ebp-C            ebx
X-28            esp              ebp-8            ?           X-28            esp+4            ebp-8            ?(var2)
X-24            esp+4            ebp-4            ?           X-24            esp+8            ebp-4            ?(var1)
X-20            esp+8            ebp              ebp         X-20            esp+C            ebp              ebp
X-1C            esp+C            ebp+4            eip         X-1C            esp+10           ebp+4            eip
X-18            esp+10           ebp+8            param1      X-18            esp+14           ebp+8            param1
X-14            esp+14           ebp+C            param2      X-14            esp+18           ebp+C            param2
X-10            esp+18           ebp+10           param3      X-10            esp+1C           ebp+10           param3
X-C             esp+1C           ebp+14           edx         X-C             esp+20           ebp+14           edx
X-8             esp+20           ebp+18           ecx         X-8             esp+24           ebp+18           ecx
X-4             esp+24           ebp+1C           eax         X-4             esp+28           ebp+1C           eax
X               esp+28           ebp+20           ?           X               esp+2C           ebp+20           ?
AFTER sub esp, 0x08                                           AFTER push ebx
Memory Address  Relative to esp  Relative to ebp  Value       Memory Address  Relative to esp  Relative to ebp  Value
X-38            esp-8            ebp-18           ?           X-38            esp-4            ebp-18           ?
X-34            esp-4            ebp-14           ?           X-34            esp              ebp-14           esi
X-30            esp              ebp-10           edi         X-30            esp+4            ebp-10           edi
X-2C            esp+4            ebp-C            ebx         X-2C            esp+8            ebp-C            ebx
X-28            esp+8            ebp-8            var2        X-28            esp+C            ebp-8            var2
X-24            esp+C            ebp-4            var1        X-24            esp+10           ebp-4            var1
X-20            esp+10           ebp              ebp         X-20            esp+14           ebp              ebp
X-1C            esp+14           ebp+4            eip         X-1C            esp+18           ebp+4            eip
X-18            esp+18           ebp+8            param1      X-18            esp+1C           ebp+8            param1
X-14            esp+1C           ebp+C            param2      X-14            esp+20           ebp+C            param2
X-10            esp+20           ebp+10           param3      X-10            esp+24           ebp+10           param3
X-C             esp+24           ebp+14           edx         X-C             esp+28           ebp+14           edx
X-8             esp+28           ebp+18           ecx         X-8             esp+2C           ebp+18           ecx
X-4             esp+2C           ebp+1C           eax         X-4             esp+30           ebp+1C           eax
X               esp+30           ebp+20           ?           X               esp+34           ebp+20           ?
AFTER push edi                                                AFTER push esi
Then the CPU enters the main body of the subroutine. After the main body:
  ...
  ;epilog
      pop esi ;restoring registers
      pop edi
      pop ebx
    mov esp, ebp ;deallocate local variables
  pop ebp ; restoring ebp
  ret ; return to main routine
  ;end of epilog
Memory Address  Relative to esp  Relative to ebp  Value       Memory Address  Relative to esp  Relative to ebp  Value
X-38            esp-8            ebp-18           ?           X-38            esp-C            ebp-18           ?
X-34            esp-4            ebp-14           esi(x)      X-34            esp-8            ebp-14           esi(x)
X-30            esp              ebp-10           edi         X-30            esp-4            ebp-10           edi(x)
X-2C            esp+4            ebp-C            ebx         X-2C            esp              ebp-C            ebx
X-28            esp+8            ebp-8            var2        X-28            esp+4            ebp-8            var2
X-24            esp+C            ebp-4            var1        X-24            esp+8            ebp-4            var1
X-20            esp+10           ebp              ebp         X-20            esp+C            ebp              ebp
X-1C            esp+14           ebp+4            eip         X-1C            esp+10           ebp+4            eip
X-18            esp+18           ebp+8            param1      X-18            esp+14           ebp+8            param1
X-14            esp+1C           ebp+C            param2      X-14            esp+18           ebp+C            param2
X-10            esp+20           ebp+10           param3      X-10            esp+1C           ebp+10           param3
X-C             esp+24           ebp+14           edx         X-C             esp+20           ebp+14           edx
X-8             esp+28           ebp+18           ecx         X-8             esp+24           ebp+18           ecx
X-4             esp+2C           ebp+1C           eax         X-4             esp+28           ebp+1C           eax
X               esp+30           ebp+20           ?           X               esp+2C           ebp+20           ?
AFTER pop esi                                                 AFTER pop edi
N.B. (x) marks a value that is still in memory but now lies below esp, i.e. it has been popped and is no longer part of the stack.
Memory Address  Relative to esp  Relative to ebp  Value       Memory Address  Relative to esp  Relative to ebp  Value
X-34            esp-C            ebp-14           esi(x)      X-34            esp-14           ebp-14           esi(x)
X-30            esp-8            ebp-10           edi(x)      X-30            esp-10           ebp-10           edi(x)
X-2C            esp-4            ebp-C            ebx(x)      X-2C            esp-C            ebp-C            ebx(x)
X-28            esp              ebp-8            var2        X-28            esp-8            ebp-8            var2(x)
X-24            esp+4            ebp-4            var1        X-24            esp-4            ebp-4            var1(x)
X-20            esp+8            ebp              ebp         X-20            esp              ebp              ebp
X-1C            esp+C            ebp+4            eip         X-1C            esp+4            ebp+4            eip
X-18            esp+10           ebp+8            param1      X-18            esp+8            ebp+8            param1
X-14            esp+14           ebp+C            param2      X-14            esp+C            ebp+C            param2
X-10            esp+18           ebp+10           param3      X-10            esp+10           ebp+10           param3
X-C             esp+1C           ebp+14           edx         X-C             esp+14           ebp+14           edx
X-8             esp+20           ebp+18           ecx         X-8             esp+18           ebp+18           ecx
X-4             esp+24           ebp+1C           eax         X-4             esp+1C           ebp+1C           eax
X               esp+28           ebp+20           ?           X               esp+20           ebp+20           ?
AFTER pop ebx                                                 AFTER mov esp, ebp
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Value
X-2C            esp-10           ebx(x)      X-2C            esp-14           ebx(x)
X-28            esp-C            var2(x)     X-28            esp-10           var2(x)
X-24            esp-8            var1(x)     X-24            esp-C            var1(x)
X-20            esp-4            ebp(x)      X-20            esp-8            ebp(x)
X-1C            esp              eip         X-1C            esp-4            eip(x)
X-18            esp+4            param1      X-18            esp              param1
X-14            esp+8            param2      X-14            esp+4            param2
X-10            esp+C            param3      X-10            esp+8            param3
X-C             esp+10           edx         X-C             esp+C            edx
X-8             esp+14           ecx         X-8             esp+10           ecx
X-4             esp+18           eax         X-4             esp+14           eax
X               esp+1C           ?           X               esp+18           ?
AFTER pop ebp                                AFTER ret
Then the CPU returns to the main routine to continue execution:
  add esp, 0x0C ;restoring stack to the state before the call
pop edx ;restoring registers
pop ecx ;
pop eax ;first saved register pops last
...
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Value
X-1C            esp-10           eip(x)      X-1C            esp-14           eip(x)
X-18            esp-C            param1(x)   X-18            esp-10           param1(x)
X-14            esp-8            param2(x)   X-14            esp-C            param2(x)
X-10            esp-4            param3(x)   X-10            esp-8            param3(x)
X-C             esp              edx         X-C             esp-4            edx(x)
X-8             esp+4            ecx         X-8             esp              ecx
X-4             esp+8            eax         X-4             esp+4            eax
X               esp+C            ?           X               esp+8            ?
AFTER add esp, 0x0C                          AFTER pop edx
Memory Address  Relative to esp  Value       Memory Address  Relative to esp  Value
X-C             esp-8            edx(x)      X-C             esp-C            edx(x)
X-8             esp-4            ecx(x)      X-8             esp-8            ecx(x)
X-4             esp              eax         X-4             esp-4            eax(x)
X               esp+4            ?           X               esp              ?
AFTER pop ecx                                AFTER pop eax (Original state)