AArch64/ARM64 Tutorial

Chapter 26: Functions

What is a function? A function is a subroutine of code that accepts input(s), processes the input(s) to perform one or more tasks, and gives back output(s) depending on how the task(s) went. Of course, this is very general. Some functions don't take an input (the task performed is always the same), and some functions don't give an output.

To execute a function, you "call" it. This means you have to set up the proper inputs beforehand. Once the inputs have been set, you can call the function. In general, functions are very large and contain complex tasks that should not be "handwritten". The C library has many built-in functions such as printf (print text to console/terminal).

You can of course write your own functions for custom tasks, but when I speak of functions, it's for the very complex ones that should *not* be handwritten. These can be functions from something such as the C library or, if you are doing some video game cheats, functions that are already built into the game.

A majority of functions contain what is called a Prologue. A Prologue is simply a list of instructions that will modify the SP register for the purpose of backing up some registers, and/or allocating some memory for usage.

Template of a Prologue:
stp fp, lr, [sp, #-0xXX]! //The first instruction of a prologue is usually a store-pair instruction of fp & lr to the stack. The offset is negative because the stack grows downward. "0xXX" size can vary. Some functions instead begin with "sub sp, sp, #0xXX" and store fp & lr right after.
mov fp, sp //This *may* be the last instruction of the prologue. Sometimes not, but it is normally present somewhere in the prologue after the above instruction.

A majority of functions end in an Epilogue. An Epilogue is essentially the opposite of the Prologue. It will retrieve back any old data that was previously saved during the Prologue. Any allocated memory will be de-allocated. Also, fp, lr, and sp will be set to their pre-Prologue values.

Template of an Epilogue:
ldp fp, lr, [sp], #0xXX //This will usually be the 2nd-to-last instruction of the epilogue. 0xXX here will match the 0xXX value used in the prologue
ret //*ALWAYS* last instruction of an epilogue

We won't go over all the instructions in detail in this Chapter. I'm simply providing the templates so you know how to spot a Function in a Program. Functions, for the most part, are called via a bl instruction.

Branch & Link:
bl label

This is just like a regular unconditional branch (b label) instruction except the address of the instruction directly *underneath* the bl instruction is placed into the Link Register (lr aka x30).

Example:
bl somewhere
add w1, w3, w3

In the above example, once the bl instruction executes, x30 (lr) will be written with the address of the add instruction, and the CPU will jump to the location of "somewhere".

Let's say the "bl somewhere" instruction resides in memory at address 0x400F0. When it gets executed, the address value of 0x400F4 (where the add instruction resides at) will be placed into x30.
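The address arithmetic above can be double-checked with a tiny C sketch. It relies on the fact that every AArch64 instruction is 4 bytes long, so the value placed into x30 is simply the address of the bl plus 4. The function name is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: the address that bl places into x30 (lr).
   Every AArch64 instruction is 4 bytes, so it is the bl address + 4. */
uint64_t link_address_after_bl(uint64_t bl_address)
{
    return bl_address + 4;
}
```

Calling link_address_after_bl(0x400F0) gives 0x400F4, the address of the add instruction in the example.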

This is important to understand because this mechanism is how functions can return back to wherever they were called from. As you saw earlier, the Epilogue always ends in the ret (return) instruction.

Return:
ret xD //Extended Register usage only

The ret instruction is just an unconditional branch, but it branches to the Memory Address in xD. Optionally, you can omit the usage of "xD" in the ret instruction if you want ret to utilize x30 (lr).

So very broadly, a program utilizes a function like this...

Setup Input Values
bl function_location
Prologue of Function is executed
Function executes task(s)
Epilogue of Function is executed
ret (last instruction of Epilogue)
Execution of CPU is back to where it started

Function Arguments:

We talked about how most functions need input value(s) beforehand. We call those Arguments.

When a function requires Arguments, certain GPRs (and/or FPRs) must be filled with certain values (exact values depend on the function and what task is being done with said function). x0 is used for the 1st argument, x1 is used for the 2nd arg, etc. View the list below.

x0 = 1st Argument
x1 = 2nd Argument
x2 = 3rd Argument
x3 = 4th Argument
x4 = 5th Argument
x5 = 6th Argument
x6 = 7th Argument
x7 = 8th Argument
f0 = 1st Argument that's a Float or Vector
.. ..
f7 = 8th Argument

Of course, the above list is entirely function dependent. If a function requires zero args, then the above is completely irrelevant. If a function requires 3 GPR args, then args go in x0, x1, and x2. If a function requires 1 GPR arg and 2 Float Args, then arguments go in x0, f0, and f1.
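To tie the register list to something concrete, here is a small C sketch of a function with 1 integer arg and 2 Float args. The function itself is made up for illustration; the register notes in the comment describe where a compiler following the convention above would expect each value.

```c
/* Hypothetical function: 1 integer arg and 2 floating-point args.
   Following the list above, the values would arrive in:
     count  -> x0
     scale  -> f0 (written d0 in real assembly for a double)
     offset -> f1 (d1)
   and the double result would be handed back in f0 (d0). */
double scale_and_offset(long count, double scale, double offset)
{
    return (double)count * scale + offset;
}
```

Note how the integer and Float args are numbered separately: the first Float arg uses f0 even though it is the 2nd arg overall.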

Let's talk about the commonly known printf function. This function will output a string of text to the console/terminal. It requires, at the very least, 1 argument. Which is the memory address to the string of text.

Example:
adrp x0, location_of_string
add x0, x0, :lo12: location_of_string
bl printf
..
..
location_of_string:
.asciz "Hi there.\n"

As you can see since there was only 1 argument, it goes into x0. Printf *can* have more than 1 argument. These additional arguments are notated within the string via what is called a Format Specifier.

Format specifier follows this structure~
%[flags][width][.precision][length]specifier
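As a rough illustration of that structure, the C sketch below fills three buffers using different parts of a format specifier. It uses snprintf, which accepts the same format specifiers as printf but writes into a buffer instead of the console. The function name and the sample values are assumptions chosen just for this demo.

```c
#include <stdio.h>

/* Hypothetical demo of %[flags][width][.precision][length]specifier.
   Each buffer must hold at least 16 bytes. */
void specifier_demo(char a[16], char b[16], char c[16])
{
    snprintf(a, 16, "%05d", 42);       /* flag 0, width 5       -> "00042"    */
    snprintf(b, 16, "%8.3f", 3.14159); /* width 8, precision 3  -> "   3.142" */
    snprintf(c, 16, "%lx", 255UL);     /* length l, specifier x -> "ff"       */
}
```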

This page HERE does a great breakdown of printf and format specifiers. View the specifier chart on the site. You can see how these format specifiers can specify something such as an integer value. We can make the printf function display desired integer values by...

Having the format specifier(s) in the string
Supplying the value(s) for the specifier(s) via Register Arguments

Example:
adrp x0, location_of_string
add x0, x0, :lo12: location_of_string
mov x1, #100
mov x2, #5
bl printf
..
..
location_of_string:
.asciz "Hello, I am %d years old. I own %d cars.\n"

We can see that the string in the example contains 2 format specifiers. Therefore it has 3 total arguments. x0 contains the memory address of the string. x1 contains the value we will use for the 1st specifier (1st %d). x2 contains the value we will use for the 2nd specifier (2nd %d).

The use of specifiers allows us to change what values we apply to the string without needing a unique string for every unique combination of values. Anyway, when the string does print to the console, it will be as such (followed by a new line)...

Hello, I am 100 years old. I own 5 cars.
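For comparison, here is roughly what the same call looks like from the C side. This is only a sketch: it uses snprintf so the text lands in a buffer rather than on the console, and the helper name is made up. With printf itself, the format string would be the 1st argument (x0), age the 2nd (x1), and cars the 3rd (x2), exactly as in the assembly above.

```c
#include <stdio.h>

/* Hypothetical helper: builds the same text the assembly example prints.
   snprintf takes two extra leading arguments (buffer and buffer size),
   so here the format string and values sit two argument slots later
   than they would for printf. */
int build_sentence(char *out, size_t out_size, int age, int cars)
{
    return snprintf(out, out_size,
                    "Hello, I am %d years old. I own %d cars.\n",
                    age, cars);
}
```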

Let's look at some pics for you to get a better idea. Here's a pic of the above right before the bl instruction is executed.

We can see in the above pic that x0 (designated by green arrow) contains the Memory Address that points to our String. x1 contains the integer value used for the first format specifier (designated by red arrow). x2 contains the integer value used for the second format specifier (designated by blue arrow).

We can observe memory at x0's address to confirm that it does indeed point to the String.

We see the string is outlined in magenta. I drew the outline to include the null byte after 0xA (0xA is the ASCII code for a line feed, i.e. a new line). Whenever printf uses a String, it must end in a null byte. Don't worry, your Assembler will do this for you as long as you use the .asciz Pseudo-Op.

We see the hex contents of....

48 65 6C 6C 6F 2C 20 49 20 61 6D 20 25 64 20 79 65 61 72 73 20 6F 6C 64 2E 20 49 20 6F 77 6E 20 25 64 20 63 61 72 73 2E 0A

If you take the above hex values and plug them into a Hex to ASCII converter, it will display our String.
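If you don't have a converter handy, a few lines of C can do the same job. The sketch below is a minimal, hypothetical converter (the name and interface are my own): it reads space-separated hex byte pairs and writes the matching characters into a buffer.

```c
#include <stdlib.h>

/* Hypothetical converter: turns space-separated hex bytes
   ("48 65 6C ...") into characters. Writes at most out_size - 1
   characters plus a terminating null byte; returns the count written. */
size_t hex_to_ascii(const char *hex, char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return 0;
    while (*hex != '\0' && n + 1 < out_size) {
        char *end;
        unsigned long byte = strtoul(hex, &end, 16); /* skips leading spaces */
        if (end == hex)
            break;                                   /* no more hex digits */
        out[n++] = (char)byte;
        hex = end;
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the hex contents shown above yields 41 characters, starting with "Hello," and ending with the 0x0A new line byte.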

A breakpoint in GDB has been set for the instruction (mov; Address 0x400600) after the bl instruction. The program is allowed to run, thus printf has been executed. Let's take a look at the produced String in the console/terminal.

And here's a pic of GDB after hitting the breakpoint (printf function completed and returned back to us).

We see x0 (outlined in green) contains the byte length of the produced string (excluding the required ending null byte). This is printf's return value. Let's now dive into Return Values....

Return Values:

Most functions will return value(s) to let you know if the task was successful, denied, failed, etc. Return values are almost always returned in x0 and/or f0. If necessary, more return values are placed into x1 and/or f1. Usually a function will write the return values to the appropriate registers within the Epilogue or right before the Epilogue.

In regards to the printf function, these are the following possible return value scenarios...

If successful, number of printed characters is returned
If unsuccessful, a negative number is returned

On point #2, the exact negative number varies per architecture. For our examples, we will just pretend it's always -1.

In regards to the snippet of code we covered with the pics from earlier, when printf has completed and returned to us, x0 contains the value of 41 since the above string is 41 characters in length when the format specifiers are replaced with the values provided in x1 and x2.

If, for whatever reason, printf failed, w0 (the lower 32 bits of x0) would be -1.
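The count of 41 can be confirmed without printing anything. The C sketch below leans on a standard snprintf behavior (C99 and later): when given a NULL buffer and a size of 0, it writes nothing and only returns how many characters the finished string would contain. The helper name is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: the character count printf would report for
   the example string with the given values. Nothing is printed. */
int sentence_length(int age, int cars)
{
    return snprintf(NULL, 0,
                    "Hello, I am %d years old. I own %d cars.\n",
                    age, cars);
}
```

With the values from the example (100 and 5) this gives 41. The count follows the values, so 7 and 12 would give 40.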

When calling functions and receiving return value(s), you should write your program/code to check for any possible errors. For printf, you could write out something like this...

..
bl printf
cmp w0, #0 //Check if printf was a success. printf returns a 32-bit int, so test w0 rather than x0
blt error //If w0 is a negative value, we have an error
..
..
..
error: //Spot in source file to handle printf errors
..

As you can see, when you need to check the return values, it should be done immediately (first instructions underneath the bl).
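One detail worth a quick sketch: printf's return value is a C int, which is 32 bits wide. Under the ARM64 calling convention only the lower 32 bits of x0 (that is, w0) are guaranteed to hold it, so the sign test should look at the 32-bit value. The C below is a hypothetical illustration (both function names are my own) of what can go wrong if the value is instead read as a zero-extended 64-bit number.

```c
#include <stdint.h>

/* Mirrors "cmp + blt error": returns 1 if a printf-style return
   value signals failure. The value is an int, i.e. 32 bits wide. */
int printf_failed(int return_value)
{
    return return_value < 0;
}

/* Shows why the width matters: the 32-bit pattern of -1 (0xFFFFFFFF),
   when zero-extended to 64 bits, becomes a large positive number. */
int looks_negative_as_64bit(int return_value)
{
    int64_t widened = (int64_t)(uint32_t)return_value; /* zero-extend */
    return widened < 0;
}
```

printf_failed(-1) reports a failure, while looks_negative_as_64bit(-1) does not, which is the kind of missed error a 64-bit compare could cause.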

NOTE: In normal programs, there are rarely error checks after printf and/or puts. This is because those functions rarely fail in practice. More often they "fail" due to the User unknowingly putting in the wrong inputs, which produces a different string than what they were expecting. The function still produced the correct string and therefore did not fail; it was just User input ignorance.

What's great about ARM64 is that there are dedicated instructions for quick error checking that can sometimes replace the use of cmp-branch.

Compare, Branch if Zero
cbz rD, label //rD = wD or xD; branches if the register is zero

Compare, Branch if Not Zero
cbnz rD, label //rD = wD or xD; branches if the register is not zero

Now for the case of checking printf return values, we *cannot* use either of the two above instructions since we are checking if a value is *less than* zero, but these instructions are definitely handy for many other functions where their return values operate differently than printf.

Calling Convention; Terminology:

Note: Caller vs Callee will be explained shortly

We need to discuss how all the GPRs and FPRs are categorized within the ARM64 architecture.

x0 thru x7 are known as the Parameter & Result Registers. Obviously these are the GPRs used for Args & Return Values in Functions.

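x8 (xr) is an indirect result register. When a function returns something too large to fit in registers (such as a big C struct), the caller passes the memory address for the result in x8. x8 will also matter for you if you happen to use emulated syscalls in your source. The syscall identifier number (tells QEMU which syscall to use) goes into x8. And yes, it's called an indirect RESULT register even though it's used for an input in syscalls.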

x9 thru x15 are the caller saved temporary registers. If there is data in these registers that needs to stay intact, the caller must place the data somewhere safe before calling the function. The values in these registers are not preserved across function calls.

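x16 (ip0) and x17 (ip1) are the intra-procedure-call scratch registers. The linker may use them in small pieces of glue code (veneers, PLT stubs) that run between the bl and the first instruction of the function, so never expect their values to survive a function call. Other than that, they can be treated as extra temporary registers.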

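x18 (pr) is known as the platform register. Some operating systems reserve it for their own use, so portable code should leave it alone. On platforms that don't reserve it, it is basically just another scratch register. Rarely used.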

x19 thru x28 are the callee saved registers. These registers get preserved throughout function calls. These registers must be backed-up/saved by the Callee before being used. The Caller can place data in these registers before a function is executed. When said function has returned, the data is intact. These registers are also known as the Non-Volatile Registers.

x29 (fp) is the frame pointer. We will explain this in the next chapter.

x30 (lr) is the link register. You already know what this is. It holds the address for the function to know where to return back to once it's completed.

f0 thru f7 are the Parameter & Result Registers for Float specific Args & Return Values.

f8 thru f15 are the callee saved registers (similar to x19 thru x28 for GPRs). IMPORTANT: Only the lower 64-bits of the FPR(s) are preserved!

f16 thru f31 are scratch registers and are treated similar to x9 thru x15. If there are values in these registers that need to be preserved, then that is the responsibility of the caller.
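The categories above can be condensed into a small lookup, sketched here in C. The function names and the wording of the returned labels are my own; the ranges come straight from the lists above.

```c
/* Hypothetical lookup: role of GPR xN, per the list above. */
const char *gpr_role(int n)
{
    if (n < 0 || n > 30) return "not a GPR";
    if (n <= 7)  return "parameter/result";
    if (n == 8)  return "indirect result";
    if (n <= 15) return "caller-saved temporary";
    if (n <= 17) return "scratch (ip0/ip1)";
    if (n == 18) return "platform register";
    if (n <= 28) return "callee-saved";
    if (n == 29) return "frame pointer";
    return "link register";
}

/* Returns 1 if xN belongs to the callee-saved group (x19 thru x28). */
int gpr_is_callee_saved(int n)
{
    return n >= 19 && n <= 28;
}

/* Returns 1 if fN belongs to the callee-saved group (f8 thru f15).
   Remember that only the lower 64 bits are preserved. */
int fpr_is_callee_saved(int n)
{
    return n >= 8 && n <= 15;
}
```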

Caller vs Callee

Caller vs Callee? What do these terms mean?

Let's say you have a program and it calls a function. The "code" or that portion of the program that sets the inputs, calls the function, and processes the outputs, is known as the CALLER.

The CALLEE is the set of instructions within the function itself, which of course includes the prologue and epilogue.

Keep in mind the Callee can become a new Caller. This is the case whenever a function calls another function from within itself. The function that calls the new function is the caller. The new function is the callee.

The Caller is responsible for...

Setting up any Input Values
Backing up x9 thru x15, and f16 thru f31 if there is data in them that the program wants to be saved (the same goes for x0 thru x7 and f0 thru f7, since those carry args and return values)
Calling the function (i.e. bl function)
Processing/Verifying the return values

The Callee is responsible for...

Prologue
Backing up any of x19 thru x28 and f8 thru f15 that it wants to use (a common use is saving its inputs there); the old values in these registers must be saved before the Callee uses them, and restored in the Epilogue
Epilogue
Return back to caller (ret; part of the Epilogue)

An alternative name for Caller is Parent. And for Callee is Child.

Register Safety; File Management Demonstration

We can use the callee-saved registers to keep data intact throughout multiple function calls. The best way to explain/show this is via file management function calls. We will use supplied C library functions to open a file, read it, close it.

When opening files, we need to supply the file name, which will reside in Memory. We also need to provide the permissions we want to open the file with (read, write, etc). Once the file has been opened, we will be given what is called a File Pointer (fp for short; don't confuse this with x29).

Once we have the fp, we use that plus the following items to read a file.

Memory Address to place read contents to
Size (always 1)
Count (the actual size; the amount of bytes to read from the file)

Reading a file means dumping its contents to Memory. Once we have read the file, we will need to close it. To close a file, we need its file pointer. Thus, after opening the file, we need to save the fp to a callee-saved register so we don't lose it when reading the file.

fopen = C function to open files
fread = C function to read files
fclose = C function to close files

We will open the file at /example/test.txt.
We will read the first 8 bytes of the file.
We will close the file.

Let's review the args and return values for each function...

Args for fopen:

x0 = Memory Address where file name is located
x1 = Memory Address to the permissions string. i.e. asciz string of "r" is for read

Return Values for fopen (x0):

0 = Error
Anything not zero = fp

Args for fread:

x0 = Memory Address to dump (read) file's contents to
x1 = Size (always 1)
x2 = Count (amount of bytes to dump/read)
x3 = fp

Return Values for fread (x0):

Count (x2 arg of fread) = Success
Anything less = Error, or the end of the file was reached before Count bytes were read

Args for fclose:

x0 = fp

Return Values for fclose (x0):

0 = Success
Anything not zero = Error

With all of that being said, here is the source...

//start of example
file_name_location:
.asciz "/example/test.txt" //File Name for fopen
..
..
read_perms_location:
.asciz "r" //Read permissions for fopen
.align 2 //Block needs to be aligned to make sure ARM64 instructions below are word-aligned
..
..
//Actual start of program, the above is just the region of memory where the strings reside at
adrp x0, file_name_location
add x0, x0, :lo12: file_name_location
adrp x1, read_perms_location
add x1, x1, :lo12: read_perms_location
bl fopen
cbz x0, error //If zero, error occurred
mov x19, x0 //Save file pointer into x19 callee-saved register!!!

movz w0, #0xCC00, lsl #16 //Pretend the Memory Address we are dumping the contents to is 0xCC000000.
mov w1, #1 //Size; always 1
mov w2, #8 //Dump 8 bytes
mov x3, x19 //file pointer
bl fread
cmp x0, #8 //return value must equal count
bne error

mov x0, x19 //file pointer
bl fclose
cbnz w0, error //If not zero, error occurred
//end of example
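For readers who know some C, here is a rough C counterpart of the assembly above. It is a sketch, not a line-for-line translation: the function name, the parameters, and the 0 / -1 return scheme are my own, and unlike the assembly (whose error handler isn't shown) it closes the file when fread comes up short. The comments point back to the matching registers and instructions.

```c
#include <stdio.h>

/* Hypothetical C counterpart of the assembly example.
   Returns 0 on success, -1 on any error. */
int read_first_bytes(const char *path, void *dest, size_t count)
{
    FILE *file = fopen(path, "r");            /* x0 = path, x1 = "r"          */
    if (file == NULL)                          /* cbz: zero means error        */
        return -1;

    /* "file" has to survive the fread call, which is why the
       assembly parks it in the callee-saved register x19. */
    size_t got = fread(dest, 1, count, file);  /* x0..x3 = dest, 1, count, fp  */
    if (got != count) {                        /* cmp + bne                    */
        fclose(file);
        return -1;
    }

    if (fclose(file) != 0)                     /* cbnz: non-zero means error   */
        return -1;
    return 0;
}
```

A call such as read_first_bytes("/example/test.txt", buffer, 8) would match the example, assuming buffer has room for at least 8 bytes.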

When you need to use the callee-saved registers, you start with x19 and work your way up, with x28 being the last callee-saved GPR. As an fyi, an alternative name for these registers (other than non-volatile) is the Global Variable Registers, GVR for short.
