AArch64/ARM64 Tutorial

Chapter 6: Instruction Format

Now let's move on to basic Instruction Format. When writing Instructions in a Source File, certain Formats must be followed or else the Assembler will output an error. There is one major factor to discuss first: the GPRs. The GPRs have two different "modes": Extended and Non-Extended. Extended is when an Instruction utilizes all 64 bits of the GPR. When a GPR is in extended use, it is written in a Source File as such...

xD

D = Register's number. For example, GPR 5 in extended form is x5. GPR 22 in extended form is x22.

Non-Extended is when an Instruction will utilize only the lower (righthand-side) 32-bits of the GPR. When a GPR is in non-extended use, it is written as such...

wD

D = Register's number. For example GPR 11 in non-extended form is w11.
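
As a rough illustration of how the two views relate, the Python sketch below models a 64-bit GPR as a plain integer. The helper names `x_view` and `w_view` and the register contents are made up for this example; they are not part of any assembler or library.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # all 64 bits of a GPR
MASK32 = 0xFFFFFFFF          # lower (righthand-side) 32 bits

def x_view(gpr):
    """Extended view (xD): all 64 bits of the register."""
    return gpr & MASK64

def w_view(gpr):
    """Non-extended view (wD): only the lower 32 bits."""
    return gpr & MASK32

gpr5 = 0x1122334455667788  # hypothetical contents of GPR 5
print(hex(x_view(gpr5)))   # what x5 refers to: 0x1122334455667788
print(hex(w_view(gpr5)))   # what w5 refers to: 0x55667788
```

This only models which bits each name refers to. It says nothing about what an instruction does to the upper 32 bits when it writes to wD.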

Most instructions have a Destination Register. The Destination Register is the Register that holds the result of an executed instruction, while the Source Register is the Register that is used to compute the result for the Destination Register. Some instructions will have one Source Register, while others will have two. The general formats shown below all have exactly one Destination Register.

For most instructions, there are essentially 4 main formats...

NOTE: "r" can be "x" (extended) or "w" (non-extended). 

Format 1:
rD, rA, rB

rD = Destination Register
rA = 1st Source Register
rB = 2nd Source Register

Keep in mind this is not an actual instruction or an exact format. It is just a very general view of any instruction that uses two source registers to compute a value for the destination register. Now let's look at the format of an instruction with just one source register...

Format 2:
rD, rA

rD = Destination Register
rA = Source Register

Moving on to another Format that also uses one source register, but comes with something a bit different...

Format 3:
rD, rA, VALUE

rD = Destination Register
rA = Source Register
VALUE = Immediate Value

Now let's look at an example with zero source registers...

Format 4:
rD, VALUE

rD = Destination Register
VALUE = Immediate Value


Immediate Value (VALUE) is a numerical value that is **not** represented by what's in a Register. Think of it as writing a Value from 'scratch'. The implementation of Immediate Values allows the AArch64 language to have instructions that can provide more flexibility with less register usage.
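
To make the 4 general formats a little more concrete, here is a small Python sketch that looks at an operand list and reports which format it matches. The function `classify_operands` is a hypothetical helper written for this tutorial, and it assumes GPR numbers 0 through 30. A real Assembler performs far more checking than this.

```python
import re

# Hypothetical pattern: "x" or "w" followed by a GPR number 0..30 (an assumption).
REGISTER = re.compile(r"[xw](?:[12]?[0-9]|30)")

FORMATS = {
    ("reg", "reg", "reg"): 1,  # rD, rA, rB
    ("reg", "reg"): 2,         # rD, rA
    ("reg", "reg", "imm"): 3,  # rD, rA, VALUE
    ("reg", "imm"): 4,         # rD, VALUE
}

def classify_operands(operands):
    """Return the format number (1-4) an operand list matches, or None."""
    kinds = []
    for part in operands.split(","):
        part = part.strip()
        if REGISTER.fullmatch(part):
            kinds.append("reg")
            continue
        try:
            int(part.lstrip("#"), 0)  # accepts decimal or 0x-prefixed hex
        except ValueError:
            return None  # neither a register nor a number
        kinds.append("imm")
    return FORMATS.get(tuple(kinds))

print(classify_operands("x0, x1, x2"))    # 1
print(classify_operands("w3, w4"))        # 2
print(classify_operands("x5, x6, 0x10"))  # 3
print(classify_operands("w7, 12"))        # 4
```

Just like the formats themselves, the operand lists above are not actual instructions. They only show the shape of what follows an instruction's name.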

Before continuing further into Immediate Values, it's vital that you understand Signed vs Unsigned values. Signed means that negative numbers are possible. So how do we know if a number is negative? Both of the following must be true:

  1. The first (leftmost, most significant) bit of the value contained in the register is a 1.
  2. You specify the values to be treated as Signed (this is determined by conditional branches, which will be covered in Chapter 10).

Let's look at a value in a non-extended Register (wD). Assuming point 2 from above is true, if the very first bit of the Hex word value is a 1, then the value is negative.

Example of a negative value in wD:
0xFFFFFF18

To confirm this is a negative value, simply take the first hex digit (F) and convert it to binary, which is 1111. Since the very first (leading) bit of this 4-bit value is 1, the hex value is negative. Remember! This is a negative value if and only if point 2 from above is also true. If not, then the example value is a positive value.


Signed Range in Non-Extended GPRs:
0x80000000 thru 0xFFFFFFFF (Decimal -2,147,483,648 thru -1)
0x00000000 = zero
0x00000001 thru 0x7FFFFFFF (Decimal 1 thru 2,147,483,647)

Unsigned Range in Non-Extended GPRs:
0x00000000 = zero
0x00000001 thru 0xFFFFFFFF (Decimal 1 thru 4,294,967,295)

Regarding the above example of wD (0xFFFFFF18): this hex value (when Signed) is -232 in decimal form.
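
The conversion can be checked with a few lines of Python. This is only a sketch of the arithmetic (two's complement); the function name `to_signed` is made up for this example.

```python
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a Signed number of the given width."""
    sign_bit = 1 << (bits - 1)       # the first (leftmost) bit
    if value & sign_bit:             # leading bit is 1, so the value is negative
        return value - (1 << bits)
    return value                     # leading bit is 0, value is unchanged

w = 0xFFFFFF18
print(w)                 # treated as Unsigned: 4294967064
print(to_signed(w, 32))  # treated as Signed: -232
```

The same 32 bits give two different numbers. Which one you get depends entirely on whether the value is treated as Signed or Unsigned.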

For an extended register, since it is 64-bits in size, the Signed & Unsigned range is a bit different but follows the same principle...

Signed Range in Extended GPRs:
0x8000000000000000 thru 0xFFFFFFFFFFFFFFFF (Decimal -9,223,372,036,854,775,808 thru -1)
0x0000000000000000 = zero
0x0000000000000001 thru 0x7FFFFFFFFFFFFFFF (Decimal 1 thru 9,223,372,036,854,775,807)

Unsigned Range in Extended GPRs:
0x0000000000000000 = zero
0x0000000000000001 thru 0xFFFFFFFFFFFFFFFF (Decimal 1 thru 18,446,744,073,709,551,615)
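
If you'd like to double-check the limits listed above, they can be derived from the bit width alone. The following Python sketch does that; `signed_range` and `unsigned_range` are names invented for this example.

```python
def signed_range(bits):
    """Lowest and highest Signed value that fits in the given number of bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def unsigned_range(bits):
    """Lowest and highest Unsigned value that fits in the given number of bits."""
    return 0, (1 << bits) - 1

for bits in (32, 64):  # Non-Extended and Extended GPR sizes
    print(bits, "Signed:", signed_range(bits), "Unsigned:", unsigned_range(bits))
```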


Referring back to VALUE (Immediate Values aka Values from Scratch), there are some quirks in regard to its number range. VALUE comes in all sorts of "flavors". We are going to cover 6 flavors that are easy for the beginner to decipher. As you continue through this tutorial, you will learn about the other "flavors".

Signed 32-bit Immediate Values are actually 16-bit halfword values, but negative ones need to be lengthened (sign-extended) to 32 bits in size when written.

Signed Immediate Value 32-bit Range:
0xFFFF8000 thru 0xFFFFFFFF (Decimal -32768 thru -1)
0x0000
0x0001 thru 0x7FFF (Decimal 1 thru 32767)

Signed Immediate Value 64-bit Range: (values follow the same principle as the 32-bit range, except that negative numbers are written 64 bits in size)
0xFFFFFFFF80000000 thru 0xFFFFFFFFFFFFFFFF (Decimal -2147483648 thru -1)
0x00000000
0x00000001 thru 0x7FFFFFFF (Decimal 1 thru 2147483647)
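
The lengthening of negative values shown in the two Signed ranges above is a sign extension: the leading bit is copied into every new bit on the left. Below is a minimal Python sketch of that idea. The function `sign_extend` is a hypothetical helper, not something the Assembler provides.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a from_bits-wide pattern to to_bits by copying its leading bit leftward."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value & (1 << (from_bits - 1)):                # leading bit set: negative
        value |= ((1 << to_bits) - 1) & ~low_mask     # fill the new upper bits with 1s
    return value

print(hex(sign_extend(0x8000, 16, 32)))      # 0xffff8000
print(hex(sign_extend(0x7FFF, 16, 32)))      # 0x7fff (positive, nothing to fill)
print(hex(sign_extend(0x80000000, 32, 64)))  # 0xffffffff80000000
```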

Unsigned 32-bit and 64-bit values follow the range you would expect. 

Unsigned Immediate Value 32-bit Range:
0x00000000
0x00000001 thru 0xFFFFFFFF (Decimal 1 thru 4,294,967,295)

Unsigned Immediate Value 64-bit Range:
0x0000000000000000
0x0000000000000001 thru 0xFFFFFFFFFFFFFFFF

12-bit Signed values come with some quirks. When negative, a 12-bit value must still be written as a full word (32-bit) value when it is used as an Immediate Value for non-extended usage. For Extended usage, negative 12-bit values must be written in 64-bit form.

Signed Immediate Value 12-bit Range for Non-Extended Use:
0xFFFFF800 thru 0xFFFFFFFF
0x000
0x001 thru 0x7FF

Signed Immediate Value 12-bit Range for Extended Use:
0xFFFFFFFFFFFFF800 thru 0xFFFFFFFFFFFFFFFF
0x000
0x001 thru 0x7FF

Unsigned Immediate Value 12-bit Range:
0x000
0x001 thru 0xFFF
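
To see how one Signed 12-bit value ends up with two different spellings, here is a short Python sketch. The helper `spell_imm12` is hypothetical. It simply follows the 12-bit ranges listed above and pads negative values out to the register size.

```python
def spell_imm12(value, reg_bits):
    """Spell a Signed 12-bit value as hex for a 32-bit (w) or 64-bit (x) register."""
    if not -0x800 <= value <= 0x7FF:
        raise ValueError("outside the Signed 12-bit range")
    mask = (1 << reg_bits) - 1         # 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF
    return "0x{:X}".format(value & mask)

print(spell_imm12(-0x800, 32))  # 0xFFFFF800
print(spell_imm12(-0x800, 64))  # 0xFFFFFFFFFFFFF800
print(spell_imm12(0x7FF, 64))   # 0x7FF (positive values look the same in both)
```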


Let's revisit the Zero Register that was briefly mentioned back in Chapter 4. It is a read-only register that is always zero. There are rare occasions where you would need to use this register to represent zero if an instruction doesn't permit Immediate Value usage. It doesn't have a GPR number assigned to it. It has the following names: wzr (non-extended) and xzr (extended).

You are free to use wzr/xzr in place of writing zero for an Immediate Value.


Clearing the air on some confusing Terminology:

There's some important terminology that we need to discuss before continuing further. Across the web you will see the term "ARM64" or "AArch64" used in place of ARMv8. For starters, AArch64 is a specific instruction set within the entire ARMv8 Architecture. It is the standard/regular/default instruction set of ARMv8. It is the instruction set that you will learn about in this tutorial. ARMv8 contains a mode with the ability to essentially use ARMv7 instructions. This is known as AArch32 or ARM32. There is also a 3rd instruction set known as Thumb32 (T32 for short). Thumb32 is a more basic instruction set that contains both 16-bit and 32-bit width instructions. Thus, it is a variable-length instruction set.

The term ARM64 is just an alternate name for AArch64. In my opinion, AArch64 can be a confusing term, because it can apply to ARMv9 as well. ARMv9 has its own AArch64 instruction set that has more features and enhancements than ARMv8's AArch64 instruction set. ARMv9's AArch64 mode contains some instructions that cannot be used in ARMv8.

So to recap, here are the terms...
AArch64/ARM64 = The alias used for the main instruction set in ARMv8 and ARMv9; but they are not exactly the same
AArch32/ARM32 = 32-bit instruction set present in ARMv8 and ARMv9 that mimics ARMv7 very closely
Thumb32/T32 = Variable width (16 & 32-bit) instruction set present in ARMv8 and ARMv9.

In conclusion, from this point forward, this tutorial will use the term "ARM64" when referencing ARMv8's AArch64 Instruction Set.


Final Notes: Zero can always be written as just 0. Leading zero digits can be omitted if desired. Because leading zero digits can be omitted, the unsigned 12-bit range works for both extended and non-extended use.

Some considerations:
Alternatively, you can write the numbers in Decimal form into your Source File(s). You can also write out negative hex numbers with the minus (-) symbol. For example, you can write out -1 in Hex as -0x1.
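
The Python sketch below shows that these different spellings all describe the same bits in a non-extended register. It only mirrors the number arithmetic and is not a description of how any particular Assembler parses numbers. The function name `as_w_pattern` is made up for this example.

```python
def as_w_pattern(text):
    """Parse a decimal or hex number (optionally negative) into its 32-bit pattern."""
    return int(text, 0) & 0xFFFFFFFF  # base 0 accepts both "232" and "0xE8" styles

print(hex(as_w_pattern("-0x1")))        # 0xffffffff
print(hex(as_w_pattern("-232")))        # 0xffffff18
print(hex(as_w_pattern("0xFFFFFF18")))  # 0xffffff18
```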

