AArch64/ARM64 Tutorial

Chapter 22: Load & Store Pair

You already know the basics of loads and stores. Now let's go over Load Pair and Store Pair. These instructions are worth learning now because they are used heavily once you reach Chapter 27: The Stack.


Store Pair Non-Extended:
stp wD, wA, [xB, sSIMM7] // 7-bit Signed Offset, scaled by 4: a multiple of 4 from -256 to 252

sSIMM7 is added to xB to create the Effective Address (EA). wD is stored at the EA. wA is stored at EA+4. If the offset is omitted, as in the example below, it is 0 and the EA is simply xB.
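The offset rule and the two store addresses can be sketched in a few lines of Python. This is only a model for checking your own arithmetic, not how an assembler is implemented; the function name `stp_w_effective_address` is made up for this illustration.

```python
def stp_w_effective_address(xb: int, offset: int = 0) -> int:
    """Return the EA for `stp wD, wA, [xB, #offset]` (32-bit registers).

    Raises ValueError if the offset cannot be encoded.
    """
    # The 7-bit signed immediate is scaled by 4, so the usable offsets
    # are the multiples of 4 from -256 to 252.
    if offset % 4 != 0 or not -256 <= offset <= 252:
        raise ValueError(f"offset {offset} is not encodable for 32-bit stp")
    return (xb + offset) & 0xFFFF_FFFF_FFFF_FFFF  # addresses are 64 bits wide


ea = stp_w_effective_address(0x40007FFC30, 8)
print(hex(ea), hex(ea + 4))  # addresses where wD and wA would be stored
```

With a base of 0x40007FFC30 and an offset of 8, wD would land at 0x40007FFC38 and wA at 0x40007FFC3C.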

Example:
stp w1, w5, [x12]

Pretend w1 = 0x11112222
Pretend w5 = 0x000000A0
Pretend x12 = 0x40007FFC30

The above instruction will store 0x11112222 at 0x40007FFC30, and 0x000000A0 at 0x40007FFC34.

In the pic below, we see that w1 (outlined in green), w5 (outlined in red), and x12 (outlined in blue) have the values of the above example. We also examine the double-word of memory at x12's address (outlined in magenta). It is currently all zeroes.

Now let's execute the instruction and view memory at x12's address again...

We see that w1's value was stored to 0x40007FFC30 (represented by the green arrow), and w5's value was stored to 0x40007FFC34 (represented by the red arrow). Because memory is Little Endian, each word is stored least-significant byte first, so reading byte by byte from 0x40007FFC30 upward, the two consecutive words appear in memory as 22 22 11 11 A0 00 00 00.
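If you want to reproduce that byte layout without a debugger, here is a small Python sketch of the same store. It models memory as a dictionary from address to byte value, which is an assumption made purely for this illustration; `stp_w` is a made-up helper name.

```python
def stp_w(mem: dict, wd: int, wa: int, ea: int) -> None:
    """Model `stp wD, wA, [EA]`: two little-endian 32-bit stores."""
    data = wd.to_bytes(4, "little") + wa.to_bytes(4, "little")
    for i, byte in enumerate(data):
        mem[ea + i] = byte  # wD fills EA..EA+3, wA fills EA+4..EA+7


mem = {}  # sparse byte-addressed memory: address -> byte value
stp_w(mem, 0x11112222, 0x000000A0, 0x40007FFC30)

# Dump the eight bytes starting at x12's address, lowest address first.
dump = " ".join(f"{mem[0x40007FFC30 + i]:02X}" for i in range(8))
print(dump)  # 22 22 11 11 A0 00 00 00
```

The least-significant byte of each register ends up at the lowest address of its word, which is why the dump looks "swapped" compared to the register values.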


Store Pair Extended:
stp xD, xA, [xB, sSIMM7] // 7-bit Signed Offset, scaled by 8: a multiple of 8 from -512 to 504

This does the same thing as the non-extended version of stp, but the entire double-words of xD and xA are now stored consecutively. xB + sSIMM7 = the Effective Address. xD is stored at the EA. xA is stored at EA+8.
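As a rough sketch of the extended form, the Python model below stores two 64-bit values with a negative offset. The register values and the offset of -16 are hypothetical, chosen only for illustration, and `stp_x` is a made-up helper name.

```python
def stp_x(mem: dict, xd: int, xa: int, xb: int, offset: int = 0) -> None:
    """Model `stp xD, xA, [xB, #offset]`: two little-endian 64-bit stores."""
    ea = xb + offset
    data = xd.to_bytes(8, "little") + xa.to_bytes(8, "little")
    for i, byte in enumerate(data):
        mem[ea + i] = byte  # xD fills EA..EA+7, xA fills EA+8..EA+15


mem = {}  # sparse byte-addressed memory: address -> byte value
# Hypothetical values: x1 = 0x1122334455667788, x5 = 0xA0, x12 = 0x40007FFC30
stp_x(mem, 0x1122334455667788, 0x00000000000000A0, 0x40007FFC30, -16)

ea = 0x40007FFC30 - 16  # 0x40007FFC20
first = bytes(mem[ea + i] for i in range(8))       # bytes written from xD
second = bytes(mem[ea + 8 + i] for i in range(8))  # bytes written from xA
print(first.hex(), second.hex())  # 8877665544332211 a000000000000000
```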


Load Pair Non-Extended:
ldp wD, wA, [xB, sSIMM7] // 7-bit Signed Offset, scaled by 4: a multiple of 4 from -256 to 252

The double-word located at the Effective Address (xB + sSIMM7) is split into two words. The word at the EA (the lower address) is loaded into wD. The word at EA+4 is loaded into wA.
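To see which word goes to which register, this Python sketch loads back the eight bytes that the earlier stp example left in memory. It treats memory as a plain `bytes` object starting at address 0, which is a simplification for illustration; `ldp_w` is a made-up helper name.

```python
def ldp_w(mem: bytes, ea: int) -> tuple:
    """Model `ldp wD, wA, [EA]`: two little-endian 32-bit loads."""
    wd = int.from_bytes(mem[ea:ea + 4], "little")      # word at EA
    wa = int.from_bytes(mem[ea + 4:ea + 8], "little")  # word at EA+4
    return wd, wa


# Bytes as they sit in memory, lowest address first.
mem = bytes.fromhex("22221111A0000000")
wd, wa = ldp_w(mem, 0)
print(hex(wd), hex(wa))  # 0x11112222 0xa0
```

The word at the lower address always goes to the first register named in the instruction, so an ldp with the same operands undoes the matching stp.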


Load Pair Extended:
ldp xD, xA, [xB, sSIMM7] // 7-bit Signed Offset, scaled by 8: a multiple of 8 from -512 to 504

The quad-word located at the Effective Address (xB + sSIMM7) is split into two double-words. The double-word at the EA (the lower address) is loaded into xD. The double-word at EA+8 is loaded into xA.

Example:
ldp x0, x10, [x9, #0x8]

Pretend that x9 = 0x40007FFDD0. What's in x0 and x10 will be replaced once the instruction executes.

Here's a pic from before the instruction executes. We can see the quadword value at the Effective Address (0x40007FFDD8) outlined in magenta. x0 is outlined in green. x10 is outlined in red.

Now let's execute the instruction.

We can see the quadword was split into 2 double-words. The first double-word (at the EA, 0x40007FFDD8) was loaded into x0. This is represented by the green arrow. The second double-word (at EA+8, 0x40007FFDE0) was loaded into x10. This is represented by the red arrow. Remember that Little Endian caused the "byte swaps" within each double-word before each double-word was placed into its appropriate register.
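The same load can be sketched in Python. The memory contents below are hypothetical (they are not the values from the pic) and were picked so the byte order is easy to follow; `ldp_x` is a made-up helper name, and memory is again modeled as a dictionary from address to byte.

```python
def ldp_x(mem: dict, xb: int, offset: int = 0) -> tuple:
    """Model `ldp xD, xA, [xB, #offset]`: two little-endian 64-bit loads."""
    ea = xb + offset

    def load64(addr: int) -> int:
        return int.from_bytes(bytes(mem[addr + i] for i in range(8)), "little")

    return load64(ea), load64(ea + 8)  # (value for xD, value for xA)


x9 = 0x40007FFDD0
# Hypothetical quadword at x9 + 8, listed lowest address first.
quadword = bytes.fromhex("0102030405060708" "1112131415161718")
mem = {x9 + 8 + i: b for i, b in enumerate(quadword)}

x0, x10 = ldp_x(mem, x9, 0x8)  # ldp x0, x10, [x9, #0x8]
print(f"{x0:#018x} {x10:#018x}")  # 0x0807060504030201 0x1817161514131211
```

The byte at the lowest address of each double-word becomes the least-significant byte of the register, which is the "byte swap" effect described above.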


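FINAL CHAPTER NOTE: stp and ldp are able to use pre-indexing and post-indexing. With pre-indexing, the offset is added to the base register first and the updated base is used as the Effective Address. With post-indexing, the unmodified base is used as the Effective Address and the offset is added to the base register afterward. The same offset rules apply (a multiple of 4 for w registers, a multiple of 8 for x registers).

Examples:
stp x2, x4, [x7, #0x28]! // pre-index: x7 = x7 + 0x28, then x2 and x4 are stored at the new x7
ldp w15, w16, [x22], #20 // post-index: w15 and w16 are loaded from x22, then x22 = x22 + 20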
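The difference between the two indexing modes comes down to when the base register is updated. Here is a minimal Python sketch of just that address arithmetic. The starting values for x7 and x22 are hypothetical, and the helper names `pre_index` and `post_index` are made up for this illustration.

```python
def pre_index(base: int, offset: int) -> tuple:
    """[xB, #offset]! : update the base first, then access memory there."""
    new_base = base + offset
    return new_base, new_base  # (address accessed, base after the instruction)


def post_index(base: int, offset: int) -> tuple:
    """[xB], #offset : access memory at the old base, then update the base."""
    return base, base + offset  # (address accessed, base after the instruction)


# Hypothetical starting values: x7 = 0x1000, x22 = 0x2000
ea_stp, x7_after = pre_index(0x1000, 0x28)   # stp x2, x4, [x7, #0x28]!
ea_ldp, x22_after = post_index(0x2000, 20)   # ldp w15, w16, [x22], #20
print(hex(ea_stp), hex(x7_after))    # 0x1028 0x1028
print(hex(ea_ldp), hex(x22_after))   # 0x2000 0x2014
```

In both cases the base register ends up changed; only the address used for the memory access differs.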

