Chapter 9: Basic Loads & Stores; Big vs Little Endian
In previous Chapters we've mentioned how Data can reside in Memory. How do we modify the contents that are residing in Memory? How do we figure out what's currently in Memory? Let's begin.
Simple Vocab Guideline~
Let's cover Store instructions first.
Store Double-Word:
str xD, [xA, UIMM12]
str xD, [xA, sSIMM12]***
str xD, [xA, xB]
***sSIMM12 stands for scaled 12-bit Signed Offset. Scaled means that the Immediate Offset must be a certain multiple. The size of the multiple is dependent on the data size used within the Destination Register. Since the Destination Register is for a Double-Word, the sSIMM12 must have a value that is a multiple of 8.
For *any* load or store instruction, a Memory Address must first be calculated via the addition of the two source registers, or the source register and the Immediate Value. The result of this equation is known as the Effective Address.
Effective Address calculation guide (applies to all loads and stores in this Chapter):
xA + UIMM12 = Effective Address
xA + sSIMM12 = Effective Address
xA + xB = Effective Address
Example:
str x29, [x3, #0x1B0]
x29 = 0xFFFFFFFF80000001
x3 = 0x0040A0000000
The Effective Address is 0x0040A0000000 + 0x01B0, which is 0x0040A00001B0.
Once the instruction has executed, the value 0xFFFFFFFF80000001 is stored to Memory Address 0x0040A00001B0. Whatever double-word value was there beforehand is now overwritten.
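If you'd like to double-check the arithmetic, here is a small Python sketch of the Effective Address calculation from the example above. Python is only being used as a calculator here. The function name effective_address is made up for illustration, and wrapping the result to 64 bits is an assumption of this model.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # assumption: addresses wrap at 64 bits in this model

def effective_address(base, offset):
    """Model xA + Immediate (or xA + xB) for a load or store."""
    return (base + offset) & MASK64

x3 = 0x0040A0000000                 # base register value from the example
ea = effective_address(x3, 0x1B0)   # str x29, [x3, #0x1B0]
print(hex(ea))
```

The same helper covers the register + register form as well: pass the value of xB as the offset.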
Okay, so now that you know how to store a full double-word, how do we store a single word? Well, we still use the same instruction (str), but we will use a non-extended Register as the Destination Register.
Store Word:
str wD, [xA, UIMM12]
str wD, [xA, sSIMM12] //Offset must be a multiple of 4
str wD, [xA, xB]
This stores the ENTIRE 32-bits of the non-extended register wD to the Effective Address.
In order to Store a halfword, we have to use the strh instruction.
Store Halfword:
strh wD, [xA, UIMM12]
strh wD, [xA, sSIMM12] //Offset must be a multiple of 2
strh wD, [xA, xB]
This stores the LOWER 16-bits of the non-extended register wD to the Effective Address.
To store just a Byte, we use the strb instruction.
Store Byte:
strb wD, [xA, UIMM12]
strb wD, [xA, SIMM12] //Scaled offset isn't applicable because it's a multiple of 1
strb wD, [xA, xB]
This stores the LOWER 8-bits of the non-extended register wD to the Effective Address.
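To see which part of the register each store instruction actually sends to memory, here is a rough Python sketch. The helper name stored_value and the sample register value are made up for illustration. The sketch only models the "ENTIRE" vs "LOWER" bits rule described above, not the memory write itself.

```python
def stored_value(reg, size_bytes):
    """Return the low size_bytes of a register, i.e. the bits a store writes."""
    return reg & ((1 << (8 * size_bytes)) - 1)

x0 = 0x0123456789ABCDEF  # arbitrary sample register contents

print(hex(stored_value(x0, 8)))  # str  xD: all 64 bits
print(hex(stored_value(x0, 4)))  # str  wD: the 32 bits of wD
print(hex(stored_value(x0, 2)))  # strh wD: lower 16 bits
print(hex(stored_value(x0, 1)))  # strb wD: lower 8 bits
```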
Little Endian annoyances:
Before we discuss load instructions, we need to cover Big vs Little Endian. Back in Chapter 3, the way you learned to navigate and observe memory was Big Endian. Big Endian makes the most intuitive sense: the most significant byte of a value sits at the lowest address, so a value in memory reads exactly like a book, left to right, top to bottom.
Little Endian is essentially the opposite of that: the least significant byte of a value sits at the lowest address. Unfortunately, ARM64 uses Little Endian.
Anyway, let's pretend you have the value 0x0123456789ABCDEF in register x0. Let's also pretend that x1 contains the value (Memory Address) 0x40008002B0.
If we perform the instruction "str x0, [x1]", the eight bytes starting at 0x40008002B0 are now, from lowest address to highest: EF CD AB 89 67 45 23 01
As you can see, the value was flipped: all the bytes within the double-word were stored backwards.
Now pretend memory is re-cleared (re-zeroed). If we store just the word of w0 (lower 32-bits of the entire GPR, which is 0x89ABCDEF) to x1 (instruction would be "str w0, [x1]"), the eight bytes starting at 0x40008002B0 are now: EF CD AB 89 00 00 00 00
The bytes themselves are still "intact", but all bytes of the word were flipped from left-right to right-left. IMPORTANT: Only the word value (0xEFCDAB89) was written to memory. The zeroes after it were *NOT* written by the store word instruction. They are simply left over from clearing memory.
Okay, pretend memory is re-cleared (re-zeroed) again. Now, if we store the halfword of w0 (lower 16 bits of w0, which is 0xCDEF) to x1 (instruction would be "strh w0, [x1]"), the eight bytes starting at 0x40008002B0 are now: EF CD 00 00 00 00 00 00
Just like with the double-word and word storing, the bytes are "intact" but flipped. IMPORTANT: Only the halfword value (0xEFCD) was written to memory. The zeroes after it were *NOT* written by the strh instruction.
Single bytes load/store in Little Endian exactly how they would in Big Endian, since there is nothing to flip. Let's say memory is re-cleared yet again. We store the byte of w0 (lower 8 bits of w0, which is 0xEF) to x1 (instruction would be "strb w0, [x1]"). The eight bytes starting at 0x40008002B0 are now: EF 00 00 00 00 00 00 00
IMPORTANT: Only the byte value (0xEF) was written to memory. The zeroes after it were *NOT* written by the strb instruction.
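If the flipping still feels abstract, the following Python sketch prints the exact bytes each of the four stores above would place in memory, lowest address first. It is only a model of the byte order. The helper name bytes_in_memory is made up for illustration.

```python
x0 = 0x0123456789ABCDEF  # register value used in the examples above

def bytes_in_memory(reg, size_bytes):
    """Bytes a Little Endian store writes, lowest address first."""
    value = reg & ((1 << (8 * size_bytes)) - 1)   # keep only the stored bits
    return value.to_bytes(size_bytes, "little")

for name, size in (("str x0", 8), ("str w0", 4), ("strh w0", 2), ("strb w0", 1)):
    print(f"{name:8}", bytes_in_memory(x0, size).hex(" ").upper())
```

Swapping "little" for "big" in to_bytes shows what the same stores would look like on a Big Endian machine.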
Still confused? Let's say x0 and x1 still hold the same values from earlier and memory is re-cleared, but this time we execute the following four instructions..
str x0, [x1] //Store 0x0123456789ABCDEF at 0x40008002B0
str w0, [x1, #0x10] //Store 0x89ABCDEF at 0x40008002C0
strh w0, [x1, #0x20] //Store 0xCDEF at 0x40008002D0
strb w0, [x1, #0x30] //Store 0xEF at 0x40008002E0
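To picture what memory looks like after those four instructions, here is a Python sketch that replays them on a small block of pretend memory and prints it in rows of 16 bytes. The bytearray standing in for RAM and the store helper are assumptions made purely for illustration.

```python
BASE = 0x40008002B0          # address held in x1
memory = bytearray(0x40)     # 64 zeroed bytes standing in for RAM at BASE
x0 = 0x0123456789ABCDEF

def store(value, address, size_bytes):
    """Write the low size_bytes of value at address, Little Endian."""
    start = address - BASE
    low = value & ((1 << (8 * size_bytes)) - 1)
    memory[start:start + size_bytes] = low.to_bytes(size_bytes, "little")

store(x0, BASE, 8)           # str  x0, [x1]
store(x0, BASE + 0x10, 4)    # str  w0, [x1, #0x10]
store(x0, BASE + 0x20, 2)    # strh w0, [x1, #0x20]
store(x0, BASE + 0x30, 1)    # strb w0, [x1, #0x30]

for row in range(0, len(memory), 16):
    print(f"{BASE + row:#x}:", memory[row:row + 16].hex(" ").upper())
```

Each row starts with the flipped bytes of the value that was stored there, and every byte the stores did not touch stays zero.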
Before we cover specific Load instructions, let's discuss what they generally are. Load instructions still calculate the Effective Address via the same method as Store instructions do. Once the Effective Address is calculated, the load will occur. It's important that you understand that Load = Copy-Paste from Memory to Register. Whatever was in the Register beforehand is now overwritten. It's simply the opposite of Storing.
Little Endian is still applicable. For example....
0xEFCDAB8967452301 residing in Memory will be loaded into Register as 0x0123456789ABCDEF
0xEFCDAB89 residing in memory will be loaded into Register as 0x89ABCDEF
0xEFCD residing in memory will be loaded into Register as 0xCDEF
0xEF residing in memory will be loaded into Register as 0xEF
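The four cases above can be checked with a short Python sketch. It only models the byte order of a load. The function name load is made up for illustration.

```python
def load(raw_bytes):
    """Value a Little Endian load places in the register for these memory bytes."""
    return int.from_bytes(raw_bytes, "little")

for in_memory in ("EFCDAB8967452301", "EFCDAB89", "EFCD", "EF"):
    value = load(bytes.fromhex(in_memory))
    print(f"0x{in_memory} in memory -> {value:#x} in register")
```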
Load Double-Word:
ldr xD, [xA, UIMM12]
ldr xD, [xA, sSIMM12] //Offset must be multiple of 8
ldr xD, [xA, xB]
This is essentially the opposite of a Store Double-Word Instruction. The Effective Address must be calculated just like a Store Instruction would. The double-word value located at the Effective Address is COPY-PASTED into xD. Whatever value that was in xD beforehand, is now overwritten.
Example:
ldr x11, [x13, #0xFC]
x13 = 0x0041CF000B8
The Effective Address is 0x41CF000B8 + 0xFC = 0x41CF001B4
The double-word value located at 0x41CF001B4 will be copy-pasted into x11.
Load Word:
ldr wD, [xA, UIMM12]
ldr wD, [xA, sSIMM12] //Offset must be multiple of 4
ldr wD, [xA, xB]
To load a Word, we still use the same instruction name (ldr), but the Destination Register is now non-extended. It's important to note that in the Load Word instruction, the upper 32-bits (the Extended-only portion) of the Destination Register (xD) are *ALWAYS* set to zero. In fact, this is true for *EVERY* load instruction where the Destination Register is non-extended (wD).
Example:
ldr w11, [x17, #0x50]
x17 = 0x400C81050
The Effective Address is 0x400C81050 + 0x50 = 0x400C810A0
The word value located at 0x400C810A0 will be copy-pasted into w11. The Extended only portion bits (upper 32-bits of x11) are nulled.
Load Halfword:
ldrh wD, [xA, UIMM12]
ldrh wD, [xA, sSIMM12] //Offset must be multiple of 2
ldrh wD, [xA, xB]
Whenever a ldrh instruction is executed, the Extended only portion bits (upper 32-bits of xD) are set to zero. Also the upper 16-bits of wD are set to zero as well. Thus, in any ldrh instruction, wD always results as 0x0000XXXX with XXXX being the halfword value that was loaded.
Load Byte:
ldrb wD, [xA, UIMM12]
ldrb wD, [xA, SIMM12] //Scaled offset isn't applicable because it's a multiple of 1
ldrb wD, [xA, xB]
Whenever a ldrb instruction is executed, the Extended only portion bits (upper 32-bits of xD) are set to zero. Also the upper 24-bits of wD are set to zero as well. Thus, in any ldrb instruction, wD always results as 0x000000XX with XX being the byte value that was loaded.
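Here is a Python sketch of what the full 64-bit register ends up holding after each kind of load, including the zeroed upper bits. The register is deliberately filled with 0xFF bytes beforehand so you can see that nothing of the old value survives. The helper name load_zero_extended and the sample memory bytes are assumptions for illustration.

```python
def load_zero_extended(memory_bytes, size_bytes):
    """Model ldr xD / ldr wD / ldrh / ldrb: Little Endian load, upper bits zeroed."""
    return int.from_bytes(memory_bytes[:size_bytes], "little")

mem = bytes.fromhex("EFCDAB8967452301")  # pretend bytes at the Effective Address

for name, size in (("ldr  x11", 8), ("ldr  w11", 4), ("ldrh w11", 2), ("ldrb w11", 1)):
    x11 = 0xFFFFFFFFFFFFFFFF             # old contents, about to be overwritten
    x11 = load_zero_extended(mem, size)
    print(f"{name}: x11 = {x11:#018x}")
```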
One thing we haven't discussed yet in our store and load examples is using load/store instructions that have the same numbered Destination & Source Register.
Example 1:
str x5, [x5, #0xF20]
Here we have a store double-word instruction where the Destination and Source Register use the same GPR. It operates the same way as any other basic store instruction. Calculate the Effective Address (x5 + 0xF20), then the instruction simply stores x5 to the Effective Address.
Example 2:
ldrb w3, [x3]
This one is a bit less simple. Even though the Destination and Source Register use the same GPR, they are using different forms (non-extended vs extended). The Effective Address is taken from x3 first, and then the byte value located at that address is loaded into w3. Thus, x3 no longer holds its original value (the address). It now holds 0x00000000000000XX, with XX being the byte value that was loaded.
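Example 2 can be mimicked in a couple of lines of Python. This is only a sketch: the dictionary standing in for memory, the sample address, and the sample byte are all made up for illustration.

```python
memory = {0x40008002B0: 0xEF}   # one byte of pretend memory

x3 = 0x40008002B0               # x3 starts out holding the address
original_x3 = x3

# ldrb w3, [x3]: the address in x3 is read first, then x3 is replaced
# by the loaded byte with all upper bits zeroed.
x3 = memory[x3] & 0xFF

print(hex(original_x3), "->", hex(x3))
```

After the load, the address is gone from x3, so keep a copy in another register first if you still need it.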
Okay so at this point in your Assembly Journey, you have learned basic integer instructions (i.e. add) and basic load+store instructions. Now let's go over a simple exercise in which you will use your newly learned skills.
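We want to write a Source of code that will do the following: load a word value from memory, increment it by 2, then store the result back to the same spot in memory.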
We will pretend that x5 + 0x400 is the exact address of where our word value resides in memory. We will use w0 as the register that gets incremented. Our first instruction will be a ldr instruction, which is this...
ldr w0, [x5, #0x400]
After this instruction has executed, w0 now contains our word value. Let's increment it by 2. This will require a simple add instruction that uses an Immediate Value of 2....
add w0, w0, #2
In the add instruction, we've used the same register for both the Destination Register & Source Register. We simply want to increment the loaded value by 2, so the add result is written right back into w0. Okay great, our value has been incremented, now we just need to store it back to where we've loaded it from...
str w0, [x5, #0x400]
The str instruction will store the word value back to memory. Here's the entire Source with comments...
ldr w0, [x5, #0x400] //Load our value from Memory
add w0, w0, #2 //Increment value by 2
str w0, [x5, #0x400] //Store value back to Memory
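If you want to trace the exercise by hand, the Python sketch below models the same load, add, store sequence on a small block of pretend memory. The bytearray, the sample address, and the starting value are assumptions for illustration. The 32-bit mask reflects that w0 is only 32 bits wide, so the addition wraps around.

```python
MASK32 = 0xFFFFFFFF  # w0 holds 32 bits

def increment_word(memory, address, amount=2):
    """Model ldr w0 / add w0, w0, #amount / str w0 on a bytearray."""
    w0 = int.from_bytes(memory[address:address + 4], "little")   # ldr w0, [...]
    w0 = (w0 + amount) & MASK32                                   # add w0, w0, #2
    memory[address:address + 4] = w0.to_bytes(4, "little")        # str w0, [...]
    return w0

ram = bytearray(16)                             # pretend memory, all zero
ram[8:12] = (0x00000010).to_bytes(4, "little")  # our word value lives at index 8

print(hex(increment_word(ram, 8)))   # value now in w0
print(ram[8:12].hex(" ").upper())    # the word as it sits in memory
```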
Final Note:
For the case where an Immediate Value is 0, you do not need to include it when writing out the instruction. For example...
str x9, [x22, #0]
...Can be written as...
str x9, [x22]