AArch64/ARM64 Tutorial

Chapter 25: Float Loads & Stores

The one category of float instructions we haven't covered yet is floating point loads and stores. For plain float loads/stores (regardless of precision), you use ldr to load and str to store. The Effective Address is calculated using the same method as Integer loads/stores. This means the base register (the register inside the brackets that supplies the address) for all floating point loads & stores is a GPR. It's important to know that only the extended (64-bit, "x") registers can be used as base registers in floating point loads/stores.

IMPORTANT NOTE: No data conversions (float to int, int to float, precision changes, etc.) occur during any float load/store. Say you store a double precision (64-bit) float to memory. It will reside in memory in its double-precision Hex form. When you then load that value from memory into an FPR, what you see in Memory is exactly what gets placed into the FPR.
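
To see the "no conversion" rule from the C side, here is a minimal sketch. The function names double_bits and bits_to_double are made up for this illustration, and it assumes a C double is a 64-bit IEEE-754 value (true on AArch64). It only models what "str dX" and "ldr dX" do to the bits; it is not how the instructions are implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw 64-bit pattern of a double, the way
   "str d0, [x0]" would leave it in memory. No conversion, just the bits. */
uint64_t double_bits(double value)
{
    uint64_t bits;
    assert(sizeof bits == sizeof value);   /* assumes a 64-bit double */
    memcpy(&bits, &value, sizeof bits);    /* byte-for-byte copy */
    return bits;
}

/* Hypothetical helper: reinterpret a raw 64-bit pattern as a double, the way
   "ldr d0, [x0]" would place it into the FPR. */
double bits_to_double(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);   /* byte-for-byte copy */
    return value;
}
```

For instance, double_bits(1.0) gives 0x3FF0000000000000, which is the Hex form of 1.0 in double precision, and feeding that same pattern to bits_to_double gives 1.0 back.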

You have to make sure you designate the correct precision. You do this by how you write the FPR's name within the instruction (h, s, d, or q).

For example, here is a load instruction that loads a single precision float into FPR7 from the memory address at x0+0x20:
ldr s7, [x0, #0x20]

The "s" in s7 designates that the word at x0+0x20 is loaded into the lower 32-bits of FPR 7.

IMPORTANT NOTE: In any float load instruction, the unused/unrelated upper bits are always nulled out (set to zero).

X = Destination Register

ldr hX = Upper 112 bits set to null
ldr sX = Upper 96 bits set to null
ldr dX = Upper 64 bits set to null
ldr qX = Nothing set to null, entire register is loaded with the data located at the Effective Address

So in the above example of "ldr s7, [x0, #0x20]", FPR 7's lower 32-bits get loaded with the word value located at x0+0x20. Then the upper 96-bits are nulled. Keep in mind that the data loaded doesn't have to be a Float compliant Hex Number. Remember, no conversions take place.
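
The nulling rule can be modelled in a few lines of C. This is only a sketch: fpr_t and model_ldr are names invented for the illustration, the 128-bit FPR is represented as two 64-bit halves, and the host is assumed to be little-endian (as AArch64 normally is) so that the first bytes in memory land in the lowest bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit FPR as two 64-bit halves. */
typedef struct {
    uint64_t lo;   /* bits 0..63   */
    uint64_t hi;   /* bits 64..127 */
} fpr_t;

/* Model of "ldr hX/sX/dX/qX, [addr]": copy `size` bytes (2, 4, 8 or 16)
   into the low end of the register and leave everything above them zero.
   Assumes a little-endian host. */
fpr_t model_ldr(const void *addr, size_t size)
{
    unsigned char raw[16] = {0};   /* whole register starts out nulled */
    fpr_t reg;
    assert(size == 2 || size == 4 || size == 8 || size == 16);
    memcpy(raw, addr, size);       /* only the low `size` bytes are loaded */
    memcpy(&reg.lo, raw, 8);
    memcpy(&reg.hi, raw + 8, 8);
    return reg;
}
```

Loading the word 0xDEADBEEF with size 4 (the "ldr sX" case) leaves lo equal to 0x00000000DEADBEEF and hi equal to 0, even though 0xDEADBEEF was never meant to be a float.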

Example of loading a double precision float into FPR31 from the address at x22 minus 0x10:
ldr d31, [x22, #-0x10]

The double-word (64-bits) located at the Effective Address is placed into the lower 64-bits of FPR 31. Upper 64-bits are nulled.

Example of storing the quad precision float of FPR27 to the address in x10:
str q27, [x10]

The entire FPR (128-bits, quadword) of FPR27 is stored to the Effective Address (x10).

NOTE: You can use Pre and Post Indexing if desired.

You can also do paired loads/stores (ldp and stp). The Immediate offset, if used, is a scaled signed 7-bit value. Pre and post indexing is allowed. ldp and stp *cannot* be used on half-precision float values!

Examples:
ldp s7, s3, [x22] //Loads word value at x22 into s7, loads word value at x22+4 into s3.
stp q1, q2, [x3, #0x10]! //Pre-index: stores quadword value of q1 at x3+0x10, stores quadword value of q2 at x3+0x20. x3 itself ends up incremented by 0x10.
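
If the addresses in the two examples above are hard to keep straight, this small C model may help. The type and function names (pair_access_t, index_mode_t, pair_addresses) are invented for the sketch. It only does the address arithmetic for ldp/stp and does not check whether the offset is encodable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where each register of the pair goes,
   and what the base register holds afterwards. */
typedef struct {
    uint64_t first;       /* address used for the first register  */
    uint64_t second;      /* address used for the second register */
    uint64_t base_after;  /* base register value after the instruction */
} pair_access_t;

typedef enum { OFFSET_ONLY, PRE_INDEX, POST_INDEX } index_mode_t;

/* reg_size is the bytes per register: 4 (s), 8 (d) or 16 (q). */
pair_access_t pair_addresses(uint64_t base, int64_t offset,
                             unsigned reg_size, index_mode_t mode)
{
    pair_access_t r;
    uint64_t start;
    assert(reg_size == 4 || reg_size == 8 || reg_size == 16);
    /* Post-index accesses memory at the old base; the others add the offset first. */
    start = (mode == POST_INDEX) ? base : base + (uint64_t)offset;
    r.first = start;
    r.second = start + reg_size;   /* second register sits right after the first */
    /* Only pre- and post-index write the new address back to the base register. */
    r.base_after = (mode == OFFSET_ONLY) ? base : base + (uint64_t)offset;
    return r;
}
```

With base 0x1000, offset 0x10, reg_size 16 and PRE_INDEX (the stp example with x3 = 0x1000), the model gives first = 0x1010, second = 0x1020 and base_after = 0x1010.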

Also, since there are no data conversions done on the FPRs for loads/stores, you can use the FPRs for fast integer loops.

Example:
//Move 40 words (10 quadwords) of data from start address in x5 to start address in x7
mov w13, #10 //Set loop count to 10 (done once, before the loop label, so it isn't reset every pass)
loop:
ldr q0, [x5], #0x10 //Load quadword *then* increment x5 address by 0x10
str q0, [x7], #0x10 //Store quadword *then* increment x7 address by 0x10
subs w13, w13, #1 //Decrement loop counter, update condition flags
bne loop //Repeat loop when w13 =/= 0
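
For comparison, here is roughly what that loop does, written as a C sketch. The function name copy_quadwords is made up for the illustration, and the 16-byte memcpy calls stand in for the q0 load and store. One difference to note: the assembly loop always runs at least once, while this version does nothing if the count is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `quadwords` 16-byte blocks from src to dst,
   one block per pass, mirroring the ldr q0 / str q0 loop. */
void copy_quadwords(uint32_t *dst, const uint32_t *src, size_t quadwords)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    assert(dst != NULL && src != NULL);
    while (quadwords != 0) {
        unsigned char q[16];   /* stands in for q0 */
        memcpy(q, s, 16);      /* ldr q0, [x5], #0x10 */
        memcpy(d, q, 16);      /* str q0, [x7], #0x10 */
        s += 16;               /* post-index increment of x5 */
        d += 16;               /* post-index increment of x7 */
        quadwords--;           /* subs w13, w13, #1 */
    }
}
```

Calling copy_quadwords(dst, src, 10) moves 40 words, the same amount as the assembly example. The data passes through untouched because, as with the FPR, nothing interprets the bytes as floats.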

Immediate Value rules:
In the non-pair Float Loads & Stores, if an Immediate Value is used, it must follow one of these criteria:

Unscaled 9-bit Signed Offset
Scaled 12-bit Unsigned Offset

Pre and post indexing is only allowed for the unscaled offset. Regarding the scaled 12-bit offset, "scaled" means the offset must be divisible by the size of the register being loaded or stored. So for example, using a half precision FPR means the offset must be divisible by 2. Using a single precision FPR, the offset must be divisible by 4. Using a double precision FPR, the offset must be divisible by 8. Using the entire FPR (quad), the offset must be divisible by 16 (0x10).

For ldp and stp, there is only one type of offset rule: a Scaled 7-bit Signed offset. The same scaling concept applies. If the pair is single precision FPRs, the offset must be divisible by 4. If the pair is double precision FPRs, the offset must be divisible by 8. If the pair is quad/entire FPRs, the offset must be divisible by 16.
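
The three offset rules above can be written down as simple range checks. The following C sketch uses invented function names and takes the register size in bytes (2 for h, 4 for s, 8 for d, 16 for q). It assumes the usual two's-complement ranges for the field widths: 9-bit signed is -256 to 255, 12-bit unsigned is 0 to 4095, and 7-bit signed is -64 to 63, with the scaled fields counted in units of the register size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scaled 12-bit unsigned offset (ldr/str without pre/post indexing). */
bool fits_scaled_unsigned12(int64_t offset, unsigned size)
{
    int64_t step = (int64_t)size;
    assert(size == 2 || size == 4 || size == 8 || size == 16);
    return offset >= 0 && offset % step == 0 && offset / step <= 4095;
}

/* Unscaled 9-bit signed offset (also the only form usable with pre/post indexing). */
bool fits_unscaled_signed9(int64_t offset)
{
    return offset >= -256 && offset <= 255;
}

/* Scaled 7-bit signed offset (ldp/stp; no half-precision pairs). */
bool fits_pair_scaled_signed7(int64_t offset, unsigned size)
{
    int64_t step = (int64_t)size;
    assert(size == 4 || size == 8 || size == 16);
    return offset % step == 0 && offset / step >= -64 && offset / step <= 63;
}
```

As a quick check against the earlier examples: 0x20 with size 4 passes the scaled 12-bit test ("ldr s7, [x0, #0x20]"), -0x10 passes the unscaled 9-bit test ("ldr d31, [x22, #-0x10]"), and 0x10 with size 16 passes the pair test ("stp q1, q2, [x3, #0x10]!").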

If these rules aren't followed, the Assembler will attempt to modify the instruction on the fly to be compliant. If it fails, you will get an Error.

Float Register Safety:
Back in Chapter 23, we went over a very simple template to follow for Register Safety, but it was for GPRs only. Here is a very basic beginner FPR version of that...

FPR0 thru FPR7 = Semi Safe; you may need to backup and restore values later
FPR8 thru FPR31 = Not safe; values must be backed up then later restored
