Working with Floats

Beginner or intermediate ASM coders may have done little or no work with floating point values yet. This thread will cover some basics of working with floats.

Chapter 1: Fundamentals

There are 32 floating point registers aka FPRs (f0 thru f31). Floating point values in the FPRs are displayed in their 64-bit (double-word) hexadecimal form. Float values can be used as either single precision or double precision. Single precision floats have less accuracy, but they only take up a word of space when residing in memory (in their 32-bit hexadecimal form). Double precision floats are for precise work, at the cost of taking up two words of space in memory.

Picture of the FPRs:

When an FPR value is stored to memory as single precision, its converted 32-bit (word) hexadecimal value is what gets stored. When an FPR value is stored to memory as double precision, exactly what you see in the FPR is what gets stored to memory (double word).

Example of a Single Precision float in an FPR

Code:

`40B3880000000000 #Converted to decimal = 5000`

If this FPR value was stored to memory (converted to its 32 bit form), it will show a word value of 0x459C4000.
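The conversion above can be double-checked on a PC. Here's a minimal Python sketch using only the standard `struct` module. The helper names (`fpr_to_float`, `float_to_single_word`) are made up for illustration and are not part of any Wii tool.

```python
import struct

def fpr_to_float(fpr_hex: str) -> float:
    """Interpret a 16-digit FPR hex string as a big-endian 64-bit double."""
    return struct.unpack(">d", bytes.fromhex(fpr_hex))[0]

def float_to_single_word(value: float) -> str:
    """Return the 32-bit word a single precision store would write, as hex."""
    return struct.pack(">f", value).hex().upper()

value = fpr_to_float("40B3880000000000")
print(value)                        # 5000.0
print(float_to_single_word(value))  # 459C4000
```

This only mirrors the IEEE-754 conversion; it does not emulate the console itself.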

Question:

How do I tell if a FPR value is in single precision?

Answer:

If the last 7 hex digits of the FPR value are all zero and the digit just before them (the first digit of the second 32 bit segment) is an even hex digit (0,2,4,6,8,a,c,e), then the value is in single precision. See the guide below...

Code:

`Guide:`

XXXXXXXXY0000000

XXXXXXXXY = The value in fpr while Y is an even hex digit
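The guide boils down to "the low 29 bits of the 64-bit value are zero" (7 hex digits = 28 bits, plus the lowest bit of Y). Here's a small Python sketch of that check. The function name `looks_single_precision` is hypothetical, and it only applies the rule from the guide (it does not check whether the exponent fits in single precision range).

```python
def looks_single_precision(fpr_hex: str) -> bool:
    """Apply the guide: the low 29 bits of the FPR value must all be zero."""
    bits = int(fpr_hex, 16)
    return (bits & 0x1FFFFFFF) == 0

print(looks_single_precision("40B3880000000000"))  # True
print(looks_single_precision("40B3880B8723A502"))  # False
```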

Let's look at an example of a Double Precision float in an fpr...

Code:

`40B3880B8723A502 #Converted to decimal = 5000.045030811105`

If you took this fpr value and stored it to memory as double precision, it stores as a doubleword hexadecimal value of 0x40B3880B8723A502 (exactly what you see in the fpr).

Remember when I said double precision is used to maintain high accuracy? Well, if you were to take this double precision float value from its fpr and store it to memory as single precision, it would store as 0x459C405C.

Convert 0x459C405C to decimal and you get 5000.044921875. Accuracy is now off by roughly 0.0001!
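The precision loss can be reproduced on a PC. This is a minimal Python sketch with the standard `struct` module; the variable names are just for illustration.

```python
import struct

fpr = bytes.fromhex("40B3880B8723A502")
as_double = struct.unpack(">d", fpr)[0]

# What a single precision store would write to memory
single_word = struct.pack(">f", as_double)
as_single = struct.unpack(">f", single_word)[0]

print(single_word.hex().upper())  # 459C405C
print(as_single)                  # 5000.044921875
print(as_double - as_single)      # the lost accuracy, roughly 0.0001
```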

You will notice that there is an entirely separate second 64 bit segment in each fpr. For now we will not cover this second 64 bit segment as that deals with what's called 'Paired Singles'. Float instructions (that are NOT paired single instructions) only use the first 64 bit segment. Paired single instructions are any instructions whose mnemonic begins with the letters 'ps'.

You will need a good 32 bit and 64 bit float converter. Here's a 64 bit converter (only allows one way converting. ie: 64bit float -> 32bit float & decimal)...

https://babbage.cs.qc.cuny.edu/IEEE-754.old/64bit.html

The converter is handy if you need to take the 64 bit value of a floating point register and find out what word value it would be if it was stored to memory as single precision.

Also here's a 32 bit float converter, obviously this can only be used for single precision. (Allows both ways of conversion for single precision 32-bit float to/from decimal)

https://www.h-schmidt.net/FloatConverter/IEEE754.html

Question:

How do I know which precision to use?

Answer:

Use the precision that corresponds to the default instruction of your code address. Both single precision and double precision instructions will be covered in the next chapters, showing you how to tell the difference between the two.

FYI:

Aldelaro's Dolphin-memory-engine comes with options to search for floats. You can only search using a decimal representation to look for single precision 32-bit float values in memory (word values).

Another Question:

What about signed vs logical?

Answer:

All Floating Point instructions treat their values as Signed for all operations. If a FPR value is negative, it is treated negative under every Float Instruction. If it's positive, it will be treated positive under every Float Instruction. Bit 0 of any floating point number in an FPR will tell you if a value is negative or not (if Bit 0 = 1, value is negative).

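To keep things super simple, if you see a typical value in an FPR start with 0xB or 0xC, it's a negative value. FPR values starting with 0x3 or 0x4 are positive. More generally, any value whose first hex digit is 8 thru F has Bit 0 set and is therefore negative.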
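If you'd rather test the sign bit than eyeball the first digit, here's a tiny Python sketch. The function name `fpr_is_negative` is hypothetical. Bit 0 in PowerPC numbering is the most significant bit, so it's bit 63 when counting from the right.

```python
def fpr_is_negative(fpr_hex: str) -> bool:
    """True when Bit 0 (the most significant bit) of the 64-bit FPR value is set."""
    return ((int(fpr_hex, 16) >> 63) & 1) == 1

print(fpr_is_negative("C06C800000000000"))  # True
print(fpr_is_negative("40B3880000000000"))  # False
```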

If you see a value in the FPR start with 0xFFF8, that deals with float-to-integer conversions which will be covered in Chapter 5. That is a Broadway specific feature and has nothing to do with standard floating point arithmetic.

Chapter 2: Storing/Loading

Here are the two basic instructions for loading and storing floating points.

stfs fD, SIMM (rA) #NOTE: use stfd for double precision

This will store the value of fD as a single precision float (word) to the memory address calculated by SIMM+rA.

lfs fD, SIMM (rA) #NOTE: use lfd for double precision

This will load the float word value as single precision from the memory address calculated by SIMM+rA and store it in fD.

IMPORTANT NOTE: The Effective Address (SIMM + rA) of all store & load float instructions MUST be divisible by 4. If not, an Alignment Exception will occur.

Let's look at an example where you have a single precision loading float instruction 'lfs f2, 0x20 (r5)', and you want to manually modify the value that gets loaded into f2.

#rX = Whatever is a safe register for use in your code

Code:

`lis rX, 0x4201 #Single precision float value for decimal value of 32.3`

ori rX, rX, 0x3333

stw rX, 0x20 (r5) #Write over what's going to be loaded from memory

lfs f2, 0x20 (r5) #Now load it; default instruction
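To get the two halves for the lis/ori pair without a web converter, something like the following Python sketch works. The helper name `lis_ori_halves` is made up for illustration.

```python
import struct

def lis_ori_halves(value: float) -> tuple:
    """Return the (upper, lower) 16-bit halves of a single precision float."""
    word = struct.unpack(">I", struct.pack(">f", value))[0]
    return word >> 16, word & 0xFFFF

upper, lower = lis_ori_halves(32.3)
print(f"lis rX, 0x{upper:04X}")      # lis rX, 0x4201
print(f"ori rX, rX, 0x{lower:04X}")  # ori rX, rX, 0x3333
```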

Here's a picture showing the above source where I have already set r3 to 0x42013333, set r5 to 0x80D00100. So we only need to look at the last two instructions. The following picture shows you what everything looks like right before executing the stw instruction.

r3 is circled in red. Its value will be stored to the spot in memory (circled in blue). On to the next picture where the stw instruction has been executed.

The 32-bit float value is now in memory. It can now be loaded into an FPR (f2) via the lfs instruction...

If you are working with a double precision loading float instruction, it's gonna require a bit more work. Let's say our default instruction is lfd f3, 0x64 (r14).

WWWWXXXXYYYYZZZZ = Desired manually written 64 bit double precision float value

Code:

`lis rX, 0xWWWW #Set upper 32 bits`

ori rX, rX, 0xXXXX

stw rX, 0x64 (r14) #Store upper bits

lis rX, 0xYYYY #Set lower 32 bits

ori rX, rX, 0xZZZZ

stw rX, 0x68 (r14) #Store lower bits, Remember we have to increment the address by 0x4, due to double word value in memory

lfd f3, 0x64 (r14) #Default instruction
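For the double precision case, the WWWWXXXX and YYYYZZZZ words can be worked out the same way. Here's a minimal Python sketch; the helper name `double_words` is hypothetical.

```python
import struct

def double_words(value: float) -> tuple:
    """Split a double into its upper and lower 32-bit words, as hex strings."""
    raw = struct.pack(">d", value).hex().upper()
    return raw[:8], raw[8:]

upper, lower = double_words(1 / 3)
print(upper, lower)  # 3FD55555 55555555
```

The upper word goes to the lower address (0x64 in the listing above) and the lower word goes 4 bytes after it (0x68).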

Here's a picture showing you the 64-bit double precision float of 0x3FD5555555555555 (circled in red) in the memory that is GOING to be loaded into f3.

Now this next picture shows the 'lfd f3, 0x0064 (r14)' executed by the CPU.

Chapter 3: Basic Math plus other instructions

There are a lot more math-based instructions for floats than for integers in Broadway PPC.

Important NOTE for the following 4 instructions: To make the instruction double precision, remove the final 's' from the instruction mnemonic. Example: fadds -> fadd

fadds fD, fA, fB #fA is added with fB. Result in fD.

fmuls fD, fA, fB #fA is multiplied with fB. Result in fD.

fsubs fD, fA, fB #fA minus fB. Result in fD.

fdivs fD, fA, fB #fA divided by fB. Result in fD.

-----

Need to copy a value from floating point register to another? Simple.

fmr fD, fA #fA's value is copied to fD

Need to flip a positive value to be negative, or vice versa? Easy to do.

fneg fD, fA #fA's value is flipped and result is written to fD.

Need to make any possible negative results positive (their absolute value); easy peasy

fabs fD, fA #The absolute value of fA is placed in fD

Need to do the opposite of that (negative absolute), say no more

fnabs fD, fA #The negative absolute value of fA is placed in fD

Gotta round the float to its single precision value? Here ya go

frsp fD, fA #The rounded single precision value of fA is placed in fD

-----

Important NOTE about Square Roots:

If you are needing to get the square root of a floating point value, it's a bit of a nuisance with Broadway. Broadway does not come with a dedicated floating-point square root instruction. You instead need the following two instructions.

frsqrte fD, fA #Floating Point Reciprocal of Square Root Estimate

fres fD, fA #Floating Point Single Precision Reciprocal Estimate

If you are unfamiliar with what a 'Reciprocal' of a number is, here's a quick lesson.

The reciprocal of 4 is 1/4. The reciprocal of 257 is 1/257. Simple. Just change the whole number into a fraction where the number is the denominator and the numerator is the value of 1.

Thus, if you take the reciprocal of that reciprocal, the result is the whole number (i.e. reciprocal of 1/64 = 64).

Now that you understand reciprocals, we can solve the Broadway floating point square root instruction issue. Here's an example snippet of code where you have a float value in f5 and you need to take the square root of it and have the result in f6.

Code:

`frsqrte f6, f5 #Get reciprocal of the square root of f5, place result in f6`

fres f6, f6 #Now take the reciprocal of the reciprocal to get square root value
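The math behind the two-instruction trick is sqrt(x) = 1 / (1 / sqrt(x)). Here's a small Python sketch of the same idea. The function name is made up, and Python does the math at full double precision, whereas frsqrte and fres are estimate instructions, so results on the console are approximations.

```python
import math

def sqrt_via_reciprocals(x: float) -> float:
    """Mirror the frsqrte + fres idea: sqrt(x) = 1 / (1 / sqrt(x))."""
    recip_sqrt = 1.0 / math.sqrt(x)  # the value frsqrte estimates
    return 1.0 / recip_sqrt          # the value fres estimates

print(sqrt_via_reciprocals(16.0))  # 4.0
```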

Please note that frsqrte can handle both double precision and single precision float values. However, the fres instruction can only utilize single precision floats. If you have a double precision float value and don't mind the square root result being in single precision, then use the following source.

Code:

`#fY is in double precision and we need its square root value as single precision in fZ`

frsqrte fZ, fY

frsp fZ, fZ #Take double precision result and round it to single precision

fres fZ, fZ #Square root value now in fZ

If that doesn't work for you and you must have the square root result in double precision, we need to do some 'manual math'

Code:

`#fY is double precision and need to keep it in double precision and have result in fZ.`

#rX is a scrap general purpose register that is safe for use

#fX is a scrap float register that is safe for use

frsqrte fZ, fY

lis rX, 0x3FF0 #Manually write out double precision float value of 1 (0x3FF0000000000000)

stw rX, -0x8 (sp)

li rX, 0

stw rX, -0x4 (sp)

lfd fX, -0x8 (sp) #fX now equals 0x3FF0000000000000

fdiv fZ, fX, fZ #1/fZ to get reciprocal. Result back in fZ

Helpful tips for Beginners:

Setting a FPR value to 0~

fsub fX, fX, fX #Subtract FPR by itself, result will be 0 as long as fX holds a normal number (not NaN or infinity). Use fsubs for single precision

Setting a FPR value to 1~

fdiv fX, fX, fX #Will work as long as fX isn't 0 (and isn't NaN or infinity). Use fdivs for single precision

Chapter 4: Float Comparisons

What if we need to do some comparison work with floats? It's similar to integer or GPR comparisons, but requires some extra work.

Floating Compare Ordered

fcmpo crf, fD, fA

This will do a comparison of fD vs fA. crf = the Condition Field within the Condition Register that will be used. For more details of the Condition Register and CR fields view this thread HERE.

You need to specify the Condition Field. When you have done integer comparisons in the past (i.e. cmpwi), you have probably never run into a situation where you had to specify a particular CR field. By default, if no CR field is specified, cr0 is used. For basic Wii code usage when doing Float Comparisons, cr1 is what you want to use. If a CR field is not specified in your Float Comparison Instruction, the assembler will output an error.

Example Float Comparisons:

Code:

`#Compare f1 vs f2, if equal then branch. The branch is less likely to occur`

fcmpo cr1, f1, f2

beq- cr1, some_label

Code:

`#Compare f0 vs f30, if f0 is greater than f30, branch. The branch is most likely to occur`

fcmpo cr1, f0, f30

bgt+ cr1, some_label

You have to remember to also specify the CR field on any branch instructions that are meant to be correlated with your float comparison instructions.

It's important to note that there's no such thing as logical vs signed when it comes to floating comparisons. As mentioned in Chapter 1, Bit 0 of the floating point value will let you know if a value is positive or not. Here's a snippet of source that checks if a float is negative~

Code:

`#Example that assumes f13 is safe, and you want to check if f7 is negative, adjust which registers to use for your code`

fsub f13, f13, f13 #Subtract f13 by itself to make it zero; use fsubs for single precision

fcmpo cr1, f7, f13 #Now check f7's value against zero; place desired branch below

Note:

There's another type of floating compare instruction: fcmpu (floating compare unordered). The difference between fcmpo and fcmpu is how the FPSCR (floating point status control register) is modified when a NaN (not a number) is present in the comparison. Thus, in simple terms, don't worry about fcmpu, just use fcmpo.

IMPORTANT NOTE regarding RC (.) shortcut for floating point instructions:

If a floating point instruction is used with the 'Record' shortcut/feature (as long as said instruction allows it), cr1 is affected instead of cr0. HOWEVER, comparison results are NOT set in cr1. Instead, certain bits from what is known as the Floating Point Status Control Register are copied over to cr1. To keep it simple, for Gecko ASM Codes, do NOT use the RC (.) shortcut for any floating point instructions.

Chapter 5: Conversions Pt 1: (floats to integers)

There are two different instructions you can choose from to convert a floating point value to an integer.

fctiw fD, fA #fA is converted into integer form, result is placed into fD; the current rounding mode is applied, which by default is round to nearest (i.e. decimal value 5.8 rounds to 6)

fctiwz fD, fA #Same as instruction above but the value is rounded towards zero (i.e. decimal value 5.8 rounds down to 5)

Fyi: The rounding examples are shown in decimal representation for ease of readability. Obviously the integer values will be in hex form.

NOTE: When values are converted, do NOT concern yourself with the upper 32 bits of the fpr. The lower 32 bits will hold the result.

Example: Converting f1's value of 0x403EDF5840000000 to an integer. Place result in f13, apply standard rounding

Code:

`fctiw f13, f1`

Here's a picture of the value 0x403EDF5840000000 in f1 right before the fctiw is executed.

Once this instruction has executed, f13 will result in 0xFFF800000000001F. The following pic shows this...

As you can see in f13, the lower 32 bits (0x0000001F) is our result (31 in decimal). The exact decimal conversion of 0x403EDF5840000000 is 30.872440338134766. Since standard rounding was used, the result was rounded up to 31 (0x1F). If you had instead used the fctiwz instruction, the result would have been rounded down to 30 (0x1E).
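The two rounding behaviors can be tried out on a PC. This is a rough Python sketch with made-up function names. It assumes the default round-to-nearest mode for fctiw and does not model values that fall outside the 32-bit integer range.

```python
import struct

def fctiw_result(value: float) -> int:
    """Round to nearest (ties to even), the default rounding mode."""
    return round(value)

def fctiwz_result(value: float) -> int:
    """Round toward zero (drop the fractional part)."""
    return int(value)

def fpr_low_word(fpr_hex: str) -> int:
    """Pull the integer result out of the lower 32 bits of an FPR value."""
    return int(fpr_hex, 16) & 0xFFFFFFFF

value = struct.unpack(">d", bytes.fromhex("403EDF5840000000"))[0]
print(hex(fctiw_result(value)))               # 0x1f
print(hex(fctiwz_result(value)))              # 0x1e
print(hex(fpr_low_word("FFF800000000001F")))  # 0x1f
```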

~How to store the result to memory:

There are two different ways to store the integer result.

Method #1: stfiwx

stfiwx fD, rA, rB

rA + rB = effective address. The lower 32 bits of fD (the integer result) are stored to the effective address.

Method #2: stfd (instruction example plus syntax shown in Chapter 2: Storing & Loading)

Both examples store the result in f13 to memory address 0x80001554. Assume f13, r11 and r12 are safe for use.

Using Method #1:

Code:

`lis r12, 0x8000`

li r11, 0x1554

stfiwx f13, r11, r12

Pictures for Method #1

Pic of right before stfiwx is executed:

r11 already set designated by blue arrow

r12 already set designated by orange arrow

f13 (integer value) designated by red arrow

Spot in memory where integer value will be stored at is circled in magenta

Picture of the stfiwx executed by the CPU. Integer in memory circled in magenta

-----------------------------------

Using Method #2:

Code:

`lis r12, 0x8000`

stfd f13, 0x1550 (r12) #Since the entire double-word is being stored, we need to store it at 0x80001550 instead of 0x80001554, so that the lower 32 bits (the integer) land at 0x80001554.

Pictures for Method #2

Pic of right before stfd is executed

r12 already set designated by blue arrow

f13 (integer) designated by red arrow

spot in memory where f13's value will be stored at, circled in magenta

Picture of the stfd instruction executed by the CPU. f13's value in memory circled in magenta

As you can see, the entire value of f13 gets stored as a double word to memory. The lower 32 bits of this value are the integer.
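Here's a small Python sketch of why the stfd has to target 0x80001550. It uses a fake chunk of memory (a `bytearray`); the `BASE` address, `stfd`, and `read_word` names are assumptions made up for this illustration.

```python
import struct

BASE = 0x80001500          # start of a small fake memory window (assumption)
memory = bytearray(0x100)  # covers 0x80001500 thru 0x800015FF

def stfd(fpr_bits: int, address: int) -> None:
    """Store all 64 bits big-endian, the way a double-word store lays them out."""
    struct.pack_into(">Q", memory, address - BASE, fpr_bits)

def read_word(address: int) -> int:
    """Read back one 32-bit word from the fake memory."""
    return struct.unpack_from(">I", memory, address - BASE)[0]

stfd(0xFFF800000000001F, 0x80001550)
print(hex(read_word(0x80001550)))  # 0xfff80000
print(hex(read_word(0x80001554)))  # 0x1f
```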

-------------------------------

Pros and Cons of stfiwx Method vs stfd Method:

- stfiwx requires 2 GPR's to use but only uses one word of space in memory

- stfd only requires 1 GPR but at the expense of using a double-word of space in memory.

Chapter 6: Conversions Pt 2: (integers to floats)

Do you need to convert an integer value to its FPR value? Here's a snippet of code to do that (thank you salmon01)~

Code:

`#r3 holds your integer value to convert, result will be placed into f1. `

#This code assumes r3, r4, f1, and f2 are all safe. Adjust which registers are used according to your code's safety requirements

lis r4, 0x4330

stw r4, -0x8 (sp)

lis r4, 0x8000

stw r4, -0x4 (sp)

lfd f2, -0x8 (sp) # load magic double into f2

xoris r3, r3, 0x8000 # flip sign bit

stw r3, -0x4 (sp) # store lower half (upper half already stored)

lfd f1, -0x8 (sp)

fsub f1, f1, f2 # complete conversion
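If you're wondering why this works: a double with an upper word of 0x43300000 has the value 2^52 plus whatever is in its lower word, and flipping the sign bit of r3 adds 2^31 to the signed value. Subtracting the magic double (2^52 + 2^31) leaves just the integer. Here's a Python sketch of the same steps; the function names are made up for illustration.

```python
import struct

MAGIC_BITS = 0x4330000080000000  # the "magic double": 2**52 + 2**31

def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit integer as a big-endian double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def int_to_float(r3: int) -> float:
    r3 &= 0xFFFFFFFF                     # treat r3 as a 32-bit register
    low = r3 ^ 0x80000000                # xoris r3, r3, 0x8000 (flip sign bit)
    combined = (0x43300000 << 32) | low  # the two stw's build this double-word
    return bits_to_double(combined) - bits_to_double(MAGIC_BITS)  # fsub f1, f1, f2

print(int_to_float(5000))  # 5000.0
print(int_to_float(-7))    # -7.0
```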

Keep in mind that the result in f1 is in double-precision. If you need this result in single-precision, add this instruction as well...

Code:

`frsp f1, f1`

The frsp instruction will round the double precision float to a single precision float. When stepping thru in Dolphin, you won't notice any changes in f1 when this occurs, but this is needed for real Hardware.

For more info about Dolphin's incorrect FPR emulation view the following threads~

https://mkwii.com/showthread.php?tid=1870 (Chapter 10)

https://mkwii.com/showthread.php?tid=1886

Chapter 7: Paired Singles

Remember that second 64 bit segment that was mentioned at the beginning of the thread? That deals with paired singles. Paired singles are just two single precision floats that can be manipulated simultaneously.

The first 64 bit segment of the fpr is the first paired single. And the second 64 bit segment is the second paired single. Paired singles are ALWAYS single precision!

If a paired single is stored to memory, both single precision floats are converted to their hexadecimal word value and stored to memory as a double word.

Example~

Code:

`C06C800000000000 4170000000000000 is the value in the fpr`

When written to memory, it will have a double word value of 0xC36400004B800000.
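The example above can be verified with a short Python sketch. The function name `paired_single_to_memory` is hypothetical, and this is a simplified model that just converts each half to a single precision word and joins them.

```python
import struct

def paired_single_to_memory(ps0_hex: str, ps1_hex: str) -> str:
    """Convert both 64-bit halves to single precision and join them as one double word."""
    ps0 = struct.unpack(">d", bytes.fromhex(ps0_hex))[0]
    ps1 = struct.unpack(">d", bytes.fromhex(ps1_hex))[0]
    return struct.pack(">ff", ps0, ps1).hex().upper()

print(paired_single_to_memory("C06C800000000000", "4170000000000000"))  # C36400004B800000
```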

Here's a full tutorial on Paired Singles - https://mkwii.com/showthread.php?tid=1870

Happy coding!