Skip to content

Common Assembly Structure

This section lists common control-flow structures and how they typically appear in RISC-V assembly (examples compiled on Compiler Explorer with a RISC-V 32-bit GCC).

Define a function over positive integers:

f(n)={n/2if n0(mod2)3n+1if n1(mod2)f(n) = \begin{cases} n/2 & \text{if } n \equiv 0 \pmod{2}\\ 3n+1 & \text{if } n \equiv 1 \pmod{2} \end{cases}

The Collatz conjecture states that repeatedly applying ff eventually reaches 1 for any positive integer.

int collatz(int n) {
int steps = 0;
while (n > 1) {
if (n % 2 == 0) {
n = n / 2;
} else {
n = n * 3 + 1;
}
steps++;
}
return steps;
}

Code 4.7: C source (collatz.c).

collatz:
mv a4, a0
li a5, 1
li a0, 0 # steps = 0
bgt a4, a5, .L4
ret
.L3:
srai a4, a5, 1 # n = (3n + 1) / 2
addi a0, a0, 2 # steps += 2
.L4:
slli a5, a4, 1 # a5 = 2n
add a5, a5, a4 # a5 = 2n + n = 3n
andi a3, a4, 1 # a3 = n & 1
addi a5, a5, 1 # a5 = 3n + 1
bne a3, zero, .L3
li a5, 1
srai a4, a4, 1 # n = n / 2
add a0, a0, a5 # steps++
bne a4, a5, .L4
ret

Code 4.8: Example assembly (collatz.s).

  • Swapping the order of the andi a3, a4, 1 and addi a5, a5, 1 instructions would not change the program behavior. Why does GCC choose to compute a3 earlier?
  • In the C source, the odd case is written after the even case. Why does the assembly place the .L3 block (odd path) before the even path?

In assembly there is no “function” as a special language construct—any block of instructions can be called as a function if it follows the calling convention.

A simplified RV32I calling-convention summary (see Table 4.1):

  • Use a0–a7 for up to 8 integer-like arguments.
  • Use a0–a1 for return values (a0 for 32-bit return).
  • Caller-saved registers (ra, t0–t6, a0–a7) may be clobbered by callees.
  • Callee-saved registers (s0–s11, and sp is managed by the compiler) must be preserved by the callee if used.

int fib(int n) {
if (n <= 1)
return n;
return fib(n - 1) + fib(n - 2);
}

Code 4.9: C source (fib.c).

A typical stack frame (as generated by the compiler) may look like this (16-byte aligned):

Stack offsetContents
12(sp)saved ra
8(sp)saved s0
4(sp)saved s1
0(sp)padding (alignment)

Figure 4.2: Example stack frame layout for fib.

fib:
addi sp,sp,-16
sw ra,12(sp) # save ra
sw s0,8(sp) # save s0
mv s0,a0 # s0 = n
li a5,1
ble a0,a5,.L1
sw s1,4(sp) # save s1
addi a0,a0,-1
call fib # a0 = fib(n-1)
mv s1,a0 # s1 = a0
addi a0,s0,-2
call fib # a0 = fib(n-2)
add a0,s1,a0 # a0 += s1
lw s1,4(sp) # restore s1
.L1:
lw ra,12(sp) # restore ra
lw s0,8(sp) # restore s0
addi sp,sp,16
jr ra

Code 4.10: Example assembly (fib.s).

  • When computing fib(4), how many times is fib called in total? What is the difference between the minimum and maximum values of sp (how many bytes does the deepest stack consume)?
  • If you want to run this program in Ripes, what changes are needed? Is it sufficient to simply replace jr ra with a self-loop?
  • Why is saving s1 delayed until later (after the base-case check), instead of saving it immediately after saving ra and s0? What benefit does this bring?
  • Try compiling with -O2. Does the generated code change significantly? If so, why is it better?