Common Assembly Structure
This section lists common control-flow structures and how they typically appear in RISC-V assembly (examples compiled on Compiler Explorer with a RISC-V 32-bit GCC).
Branches and loops
Section titled “Branches and loops”Define a function over positive integers:
The Collatz conjecture states that repeatedly applying eventually reaches 1 for any positive integer.
Example: Collatz steps (C vs assembly)
Section titled “Example: Collatz steps (C vs assembly)”int collatz(int n) { int steps = 0; while (n > 1) { if (n % 2 == 0) { n = n / 2; } else { n = n * 3 + 1; } steps++; } return steps;}Code 4.7: C source (collatz.c).
collatz: mv a4, a0 li a5, 1 li a0, 0 # steps = 0 bgt a4, a5, .L4 ret.L3: srai a4, a5, 1 # n = (3n + 1) / 2 addi a0, a0, 2 # steps += 2.L4: slli a5, a4, 1 # a5 = 2n add a5, a5, a4 # a5 = 2n + n = 3n andi a3, a4, 1 # a3 = n & 1 addi a5, a5, 1 # a5 = 3n + 1 bne a3, zero, .L3 li a5, 1 srai a4, a4, 1 # n = n / 2 add a0, a0, a5 # steps++ bne a4, a5, .L4 retCode 4.8: Example assembly (collatz.s).
Questions
Section titled “Questions”- Swapping the order of the
andi a3, a4, 1andaddi a5, a5, 1instructions would not change the program behavior. Why does GCC choose to computea3earlier? - In the C source, the odd case is written after the even case. Why does the assembly place the
.L3block (odd path) before the even path?
Function calls and recursion
Section titled “Function calls and recursion”In assembly there is no “function” as a special language construct—any block of instructions can be called as a function if it follows the calling convention.
A simplified RV32I calling-convention summary (see Table 4.1):
- Use
a0–a7for up to 8 integer-like arguments. - Use
a0–a1for return values (a0for 32-bit return). - Caller-saved registers (
ra,t0–t6,a0–a7) may be clobbered by callees. - Callee-saved registers (
s0–s11, andspis managed by the compiler) must be preserved by the callee if used.
Example: Fibonacci recursion
Section titled “Example: Fibonacci recursion”int fib(int n) { if (n <= 1) return n; return fib(n - 1) + fib(n - 2);}Code 4.9: C source (fib.c).
A typical stack frame (as generated by the compiler) may look like this (16-byte aligned):
| Stack offset | Contents |
|---|---|
| 12(sp) | saved ra |
| 8(sp) | saved s0 |
| 4(sp) | saved s1 |
| 0(sp) | padding (alignment) |
Figure 4.2: Example stack frame layout for fib.
fib: addi sp,sp,-16 sw ra,12(sp) # save ra sw s0,8(sp) # save s0 mv s0,a0 # s0 = n li a5,1 ble a0,a5,.L1 sw s1,4(sp) # save s1 addi a0,a0,-1 call fib # a0 = fib(n-1) mv s1,a0 # s1 = a0 addi a0,s0,-2 call fib # a0 = fib(n-2) add a0,s1,a0 # a0 += s1 lw s1,4(sp) # restore s1.L1: lw ra,12(sp) # restore ra lw s0,8(sp) # restore s0 addi sp,sp,16 jr raCode 4.10: Example assembly (fib.s).
Questions
Section titled “Questions”- When computing
fib(4), how many times isfibcalled in total? What is the difference between the minimum and maximum values ofsp(how many bytes does the deepest stack consume)? - If you want to run this program in Ripes, what changes are needed? Is it sufficient to simply replace
jr rawith a self-loop? - Why is saving
s1delayed until later (after the base-case check), instead of saving it immediately after savingraands0? What benefit does this bring? - Try compiling with
-O2. Does the generated code change significantly? If so, why is it better?