Paired Singles

Chapter 1: Intro

Paired Singles are probably the most difficult aspect of PPC Assembly to understand.

First things first: you should already know the basics of PPC ASM, be able to make codes, and understand the fundamentals of Floating Points + FPRs. I have a tutorial covering Floats which you can find HERE. Read that first before coming back here.

Chapter 2: Paired Singles in a FPR

With normal floating point instructions, you have only been working with the first 64-bit segment. This first segment which all normal floating point instructions work on is known as Paired Single 0 (or ps0 for short). The second segment is Paired Single 1 (ps1). Paired Single instructions utilize both ps0 and ps1 simultaneously, and the floats in ps0 & ps1 can only be of single precision. Hence the term Paired Singles (a pair of single precision floats). Using paired single instructions when a double precision float is present in an FPR will cause undefined behavior/results and could lead to an exception.

Chapter 3: Paired Singles in Memory

Since the values in ps0 and ps1 are single precision, each paired single (if stored/loaded to/from memory) is a word value.

Example: You have this value in f1 - C06C800000000000 4170000000000000

- ps0 = C06C800000000000

- ps1 = 4170000000000000

If this paired single value is stored to memory, it will appear as 0xC36400004B800000

- 0xC3640000 is the ps0 value

- 0x4B800000 is the ps1 value

It's important to understand that paired singles are always two words in memory linked together as a double word.
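If you want to double check that conversion yourself, here is a rough sketch in Python (not PPC ASM). It only models the arithmetic of going from the 64-bit form to the word form, and the function names are made up for this tutorial.

```python
import struct

def double_hex_to_single_hex(double_hex: str) -> str:
    """Take a float in 64-bit hex form and return it in 32-bit (word) hex form."""
    value = struct.unpack(">d", bytes.fromhex(double_hex))[0]
    return struct.pack(">f", value).hex().upper()

def paired_single_in_memory(ps0_hex: str, ps1_hex: str) -> str:
    """Glue the two word values together as the double word seen in memory."""
    return "0x" + double_hex_to_single_hex(ps0_hex) + double_hex_to_single_hex(ps1_hex)

# The f1 example from this chapter
print(paired_single_in_memory("C06C800000000000", "4170000000000000"))
```

This should print the same double word value shown in the example above, with ps0 as the upper word and ps1 as the lower word.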

Chapter 4: Storing/Loading

Here are some basic instructions for loading and storing paired singles.

psq_l fD, VALUE (rA), W, I

#VALUE is signed

The above example will load the paired single value located at rA+VALUE into fD. The single precision floating point value located at rA+VALUE will be loaded in ps0 and the single precision floating point value located at rA+VALUE+4 will be loaded into ps1.

psq_st fD, VALUE (rA), W, I

#VALUE is signed

The above is a basic paired single storing instruction. Value of ps0 is stored to memory location rA+VALUE and the value of ps1 is stored at rA+VALUE+4.

VALUE is NOT a 16-bit signed range. For paired single loading/storing instructions it's actually a 12-bit signed range (0xFFFFF800 thru 0x000007FF, which is -2048 thru 2047 in decimal). If this range isn't followed, your assembler will throw an error.
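To make the range concrete, here is a small Python sketch (not PPC ASM) that checks whether an offset fits and shows the 12-bit pattern it would occupy. The function names are hypothetical and this is only a model of the range rule, not how any particular assembler is written.

```python
def fits_psq_offset(value: int) -> bool:
    """True if value fits the 12-bit signed range used by psq_l / psq_st."""
    return -0x800 <= value <= 0x7FF

def encode_psq_offset(value: int) -> int:
    """Return the 12-bit two's complement pattern for a valid offset."""
    if not fits_psq_offset(value):
        raise ValueError("offset is outside the 12-bit signed range")
    return value & 0xFFF

print(fits_psq_offset(0x0088))       # a small positive offset fits
print(fits_psq_offset(0x0800))       # one past the max does not
print(hex(encode_psq_offset(-4)))    # negative offsets wrap into 12 bits
```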

Note: The format I have shown above for the two instructions is the format that all PPC assemblers use. For whatever reason, Dolphin (in Code view) uses a slightly different format.

What are the W and I values?

W can only be one of two values: 0 or 1. If W is 1, only ps0 is loaded/stored. On a load, ps1 of the FPR is always set to the floating point value of 1 (0x3FF0000000000000) regardless of what was in the FPR or memory beforehand. On a store, only ps0 is written to memory and ps1 is not stored. If W is 0, the loading/storing of both ps0 and ps1 acts normally.

The I value represents which Graphics Quantization Register (GQR) will be used. To keep things simple for now, for basic loading/storing of paired singles, simply set the I value to 0. This will be fine for most Wii games. More information covering the nitty gritty of GQRs will be discussed later in Chapter 8.
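Here is a rough Python model (not PPC ASM) of what a paired single load does with the W value. It assumes the chosen GQR is set to the Float type (no scaling), and the function name psq_l_float is made up for this tutorial.

```python
import struct

def psq_l_float(memory: bytes, offset: int, w: int):
    """Model of psq_l when the GQR type is Float (assumption: no scaling).

    Returns (ps0, ps1) as Python floats.
    """
    ps0 = struct.unpack_from(">f", memory, offset)[0]
    if w == 1:
        ps1 = 1.0  # ps1 is forced to 1.0, memory at offset+4 is ignored
    else:
        ps1 = struct.unpack_from(">f", memory, offset + 4)[0]
    return ps0, ps1

memory = bytes.fromhex("C36400004B800000")  # the double word from Chapter 3
print(psq_l_float(memory, 0, 0))  # both words are loaded
print(psq_l_float(memory, 0, 1))  # only the first word is loaded, ps1 becomes 1.0
```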

Chapter 5: Basic Math & Single Parameter Instructions

Like normal floating point instructions, there are plenty of paired single instructions for all sorts of math.

ps_add fD, fA, fB

#fA ps0 + fB ps0 = fD ps0

#fA ps1 + fB ps1 = fD ps1

Other math instructions:

ps_sub (subtracting)

ps_mul (multiplying)

ps_div (dividing)

You also have some basic single parameter instructions...

ps_mr fD, fA

#fA ps0's value is copied to fD ps0

#fA ps1's value is copied to fD ps1

ps_abs fD, fA

#absolute value of fA ps0 is placed into fD ps0

#absolute value of fA ps1 is placed into fD ps1

ps_nabs fD, fA

#same as ps_abs but using negative absolute value instead

ps_neg fD, fA

#fA ps0 (if positive) is changed to its negative value (i.e. 1 to -1), result placed in fD ps0. If negative, then the value is changed to positive (i.e. -1 to 1). The same applies to fA ps1, with the result placed into fD ps1.
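If it helps to see the "both segments at once" idea in action, here is a loose Python model (not PPC ASM) where an FPR is just a (ps0, ps1) tuple. The function names mirror the instruction names, but this is only a sketch of the math. It ignores single precision rounding and special cases such as NaN or dividing by zero.

```python
def ps_add(fa, fb):  return (fa[0] + fb[0], fa[1] + fb[1])
def ps_sub(fa, fb):  return (fa[0] - fb[0], fa[1] - fb[1])
def ps_mul(fa, fb):  return (fa[0] * fb[0], fa[1] * fb[1])
def ps_div(fa, fb):  return (fa[0] / fb[0], fa[1] / fb[1])

def ps_mr(fa):       return (fa[0], fa[1])            # plain copy
def ps_abs(fa):      return (abs(fa[0]), abs(fa[1]))  # always positive
def ps_nabs(fa):     return (-abs(fa[0]), -abs(fa[1]))  # always negative
def ps_neg(fa):      return (-fa[0], -fa[1])          # flip the sign

f1 = (1.0, 2.0)
f2 = (0.5, -4.0)
print(ps_add(f1, f2))   # ps0 with ps0, ps1 with ps1
print(ps_nabs(f2))
print(ps_neg(f2))
```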

Chapter 6: Comparisons

ps_cmpo is the instruction for ordered paired single comparisons. There is also the ps_cmpu instruction for unordered comparisons.

The difference between ordered and unordered comparisons is how the FPSCR (Floating-Point Status and Control Register) is modified when a NaN (Not a Number) is present in the comparison. In simple terms, if none of that means anything to you, stick with ordered comparisons only (ps_cmpo).

There are two types of ordered comparisons. One for ps0 and one for ps1

ps_cmpo0 crf, fA, fB

#fA ps0's value is compared to fB ps0's value

ps_cmpo1 crf, fA, fB

#fA ps1's value is compared to fB ps1's value

Just like with floating point instructions, you need to specify the condition register field. To keep it simple, use cr1.

Example comparison:

#Let's say f3 ps1 has the value of 1.0 in decimal form, and f0 ps1 has the value of 3.5 in decimal form.

Code:

`ps_cmpo1 cr1, f3, f0`

`bgt- cr1, some_label #Branch must name cr1, otherwise it checks cr0`

In the above source, the branch would NOT be taken because 1.0 is less than 3.5.

Chapter 7: Other Instructions

Paired Singles has some nifty instructions to move values across the paired single segments. There are 4 different instructions you can use.

ps_merge00 fD, fA, fB

#fA ps0 is copied to fD ps0. fB ps0 is copied to fD ps1

ps_merge01 fD, fA, fB

#fA ps0 is copied to fD ps0. fB ps1 is copied to fD ps1

ps_merge10 fD, fA, fB

#fA ps1 is copied to fD ps0. fB ps0 is copied to fD ps1

ps_merge11 fD, fA, fB

#fA ps1 is copied to fD ps0. fB ps1 is copied to fD ps1

You can use ps_merge10 to swap the ps values within an FPR. This could come in handy for you.

Example (swap the ps values of f0)~

Code:

`ps_merge10 f0, f0, f0`

Here are two instructions that allow you to do basic addition across the paired single segments. Obviously if the value(s) are negative, the instructions can be used as subtraction instead.

ps_sum0 fD, fA, fC, fB

#fA ps0 + fB ps1 = fD ps0

#fC ps1 = fD ps1

ps_sum1 fD, fA, fC, fB

#fA ps0 + fB ps1 = fD ps1

#fC ps0 = fD ps0

Here's an example of adding ps0 and ps1 of an FPR together and having the result stored back in ps0 of the same FPR. This could be useful for something such as writing some source involving basic Trigonometry. We'll use f5 for the following example.

Code:

`ps_sum0 f5, f5, f5, f5`

If for whatever reason you wanted the result in ps1 of f5, just use ps_sum1 instead.
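The merge and sum instructions are easy to mix up, so here is a rough Python model (not PPC ASM) of all six, again treating an FPR as a (ps0, ps1) tuple. The function names mirror the instructions and the operand order (fA, fC, fB) follows the format shown above. It is just a sketch for checking your own understanding.

```python
def ps_merge00(fa, fb): return (fa[0], fb[0])
def ps_merge01(fa, fb): return (fa[0], fb[1])
def ps_merge10(fa, fb): return (fa[1], fb[0])
def ps_merge11(fa, fb): return (fa[1], fb[1])

def ps_sum0(fa, fc, fb):
    # fD ps0 = fA ps0 + fB ps1, fD ps1 = fC ps1
    return (fa[0] + fb[1], fc[1])

def ps_sum1(fa, fc, fb):
    # fD ps0 = fC ps0, fD ps1 = fA ps0 + fB ps1
    return (fc[0], fa[0] + fb[1])

f0 = (1.0, 2.0)
print(ps_merge10(f0, f0))    # swaps ps0 and ps1 of f0

f5 = (3.0, 4.0)
print(ps_sum0(f5, f5, f5))   # ps0 + ps1 lands in ps0, ps1 is kept
print(ps_sum1(f5, f5, f5))   # ps0 + ps1 lands in ps1, ps0 is kept
```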

Chapter 8: Scaling, Quantization, and the GQRs

When a paired single load instruction of any type is executed, this is known as Dequantization (or Dequantizing). If it's a paired single store instruction of any type, it is known as Quantization (or Quantizing). Quantization/Dequantization is just a fancy term for converting between integer values in memory and float values in an FPR using a set scale (or range).

The use of Scaling allows you to have integer values (no larger than halfwords) in memory and utilize them as floating points (by a specified range) in the paired single segments of an FPR.

Revisiting Chapter 4 regarding the paired single loading/storing instructions, the I value stands for which Graphics Quantization Register (GQR) to use. There are 8 total GQRs. The I value can be any number from 0 to 7.

Each GQR is 32 bits of data. You can modify a GQR to affect how scaling is done during quantization and/or dequantization. Wii games will usually set all the GQR values during their early boot sequence. You can modify any GQR value but be sure to restore its value at some point later in your source/code.

You can write a value to the GQRs via the mtspr instruction. Here are the spr numbers for the GQRs

- 912 = GQR0

- 913 = GQR1

- 914 = GQR2

- 915 = GQR3

- 916 = GQR4

- 917 = GQR5

- 918 = GQR6

- 919 = GQR7

To write a register value to the GQR...

mtspr XXX, rD

#XXX = GQR's spr number, rD contains the value to copy to the GQR

To backup the value of a GQR to a register...

mfspr rD, XXX

#rD is the register where the GQR's value will be copied to, XXX = GQR's spr number

Structure of GQR:

- Bits 0 & 1 = Unused

- Bits 2 thru 7 = Scale Value for Dequantization; 6 bit value (L_Scale)

- Bits 8 thru 12 = Unused

- Bits 13 thru 15 = Dequantization Type; 3 bit value (L_Type)

- Bits 16 & 17 = Unused

- Bits 18 thru 23 = Scale Value for Quantization; 6 bit value (R_Scale)

- Bits 24 thru 28 = Unused

- Bits 29 thru 31 = Quantization Type; 3 bit value (R_Type)

L_Type and R_Type can only be 5 different values.

- 0x0 = Float (no scaling)

- 0x4 = Unsigned Byte

- 0x5 = Unsigned Halfword

- 0x6 = Signed Byte

- 0x7 = Signed Halfword
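Working out a GQR value by hand from the bit layout is error prone, so here is a small Python sketch (not PPC ASM) that packs and unpacks the four fields. Keep in mind that the bit numbers above use the PowerPC convention where bit 0 is the most significant bit. The function names build_gqr and decode_gqr are made up for this tutorial.

```python
VALID_TYPES = (0x0, 0x4, 0x5, 0x6, 0x7)

def build_gqr(l_scale: int, l_type: int, r_scale: int, r_type: int) -> int:
    """Pack the dequantization (L) and quantization (R) fields into a 32-bit GQR value."""
    for scale in (l_scale, r_scale):
        if not 0 <= scale <= 0x3F:
            raise ValueError("scale must fit in 6 bits")
    for gqr_type in (l_type, r_type):
        if gqr_type not in VALID_TYPES:
            raise ValueError("type must be 0x0, 0x4, 0x5, 0x6, or 0x7")
    # Bits 2-7 -> shift 24, bits 13-15 -> shift 16, bits 18-23 -> shift 8, bits 29-31 -> shift 0
    return (l_scale << 24) | (l_type << 16) | (r_scale << 8) | r_type

def decode_gqr(value: int):
    """Return (l_scale, l_type, r_scale, r_type) from a 32-bit GQR value."""
    return ((value >> 24) & 0x3F, (value >> 16) & 0x7, (value >> 8) & 0x3F, value & 0x7)

gqr = build_gqr(l_scale=0x08, l_type=0x6, r_scale=0x08, r_type=0x6)
print(hex(gqr))                      # full 32-bit value
print(hex(gqr >> 16), hex(gqr & 0xFFFF))  # upper half for lis, lower half for ori
print(decode_gqr(gqr))
```

The upper halfword is what you would give to lis and the lower halfword is what you would give to ori before the mtspr.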

L_Scale & R_Scale structure:

This is a value (anywhere from 0x00 to 0x3F) that is plugged into the following formula:

2^scale = DQ (dequant/quant value)

During Dequantization:

Integer value (determined by L_Type) in memory is divided by DQ, that result is converted to a single precision float, and placed into its paired single segment of the FPR.

During Quantization:

Single precision float value of the paired single segment is multiplied by DQ, that result is converted to its integer value (determined by R_Type), and placed into memory.
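Here is a rough Python model (not PPC ASM) of the scaling math for the four integer types. Note that in this sketch the float is multiplied by DQ before it is turned into an integer, since converting first would throw away the fraction. Several things here are my own assumptions for the sake of a simple model: the scale is kept to small positive values, the fraction is simply dropped, and out of range results are clamped to the integer type's limits. The function names are made up for this tutorial.

```python
# GQR type -> (size in bytes, signed?)
INT_TYPES = {0x4: (1, False), 0x5: (2, False), 0x6: (1, True), 0x7: (2, True)}

def dequantize(raw: bytes, gqr_type: int, scale: int) -> float:
    """Integer in memory -> float in a paired single segment."""
    size, signed = INT_TYPES[gqr_type]
    integer = int.from_bytes(raw[:size], "big", signed=signed)
    return integer / (2 ** scale)

def quantize(value: float, gqr_type: int, scale: int) -> bytes:
    """Float in a paired single segment -> integer in memory."""
    size, signed = INT_TYPES[gqr_type]
    integer = int(value * (2 ** scale))      # scale first, then drop the fraction (assumption)
    bits = 8 * size
    low = -(1 << (bits - 1)) if signed else 0
    high = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    integer = max(low, min(high, integer))   # clamp to the type's range (assumption)
    return integer.to_bytes(size, "big", signed=signed)

print(dequantize(b"\xEF", 0x6, 8))           # signed byte 0xEF with scale 8
print(quantize(-0.06640625, 0x6, 8).hex())   # and back again
```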

Confused? Yeah I don't blame you. Let's go over an example use of scaling.

Let's say you are making a code for your game that will utilize the GCN C stick values. You find where these values are in memory. There's a byte value for the X axis and a byte value right next to it for the Y axis. You move the sticks around and have come to the conclusion of the following ranges...

X axis range:

0x80 (furthest left) thru 0x00 (neutral) thru 0x7F (furthest right)

Y axis range:

0x80 (furthest down) thru 0x00 (neutral) thru 0x7F (furthest up)

Thus the two bytes are signed. We want to load these two byte values and scale them to floating points so we can perform quicker and more complex math on them compared to using normal integer instructions.

First we need to set up GQR7. The L_Type and R_Type will be set to 0x6 (signed byte). Now for L_Scale and R_Scale, what value do we use? Well the range of 0x00 thru 0xFF is a total of 256 unique values. A byte is 8 bits. If you perform the Scale equation of 2^8, you get a total of 256, matching the 256 unique values. So L_Scale and R_Scale will be 0x08. Now let's write to GQR7.

Code:

`lis r0, 0x0806 #Assuming r0 is safe for use in this example source`

`ori r0, r0, 0x0806`

`mtspr 919, r0 #Write new value to GQR7`

GQR7 is set, let's load the two bytes into the paired segments of an FPR. We will use f18 as an example. The bytes are located at r9 + 0x0088.

Code:

`psq_l f18, 0x0088 (r9), 0, 7`

Let's assume during the load, the X axis was a value of 0x00 (neutral) and the Y axis was 0xEF (slightly down). f18 would have a value of...

- ps0 (X axis) 0x0000000000000000

- ps1 (Y axis) 0xBFB1000000000000

Convert these to decimal and you get 0.0, & -0.06640625

With the scale set to match the bit width of the integer (8 for bytes, 16 for halfwords), scaling uses a range of -0.5 to 0.5 for signed bytes/halfwords and a range of 0.0 to 1.0 for unsigned (logical) bytes/halfwords. The ranges will be correct in response to your integers as long as you set the scale values correctly and you know the ranges of the integers beforehand.

Let's say we load them again but at a different point in time. X axis is 0x7F (max right), and Y axis is 0x33 (moderate up). f18 would have a value of...

- ps0 (X axis) 0x3FDFC00000000000

- ps1 (Y axis) 0x3FC9800000000000

Convert those to decimal and you get 0.49609375, & 0.19921875. If you are wondering why 0x7F didn't convert to 0.5 (max float) that's because 0x00 (0.0) is a possible value in the range and has to be accounted for. If any of the stick integer values were 0x80, that would result in -0.5.
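You can check the C stick numbers above with a few lines of Python (not PPC ASM). This sketch treats the byte as signed, divides by 2^8, and prints the result in the 64-bit hex form that Dolphin displays. The function name is made up for this tutorial.

```python
import struct

def signed_byte_to_dolphin_hex(byte_value: int, scale: int = 8) -> str:
    """Dequantize a signed byte and show it in 64-bit hex form."""
    signed = byte_value - 0x100 if byte_value >= 0x80 else byte_value
    value = signed / (2 ** scale)
    return "0x" + struct.pack(">d", value).hex().upper()

for stick_byte in (0x00, 0xEF, 0x7F, 0x33, 0x80):
    print(hex(stick_byte), signed_byte_to_dolphin_hex(stick_byte))
```

The hex values printed for 0x00, 0xEF, 0x7F, and 0x33 should match the ps0/ps1 values listed in this chapter.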

If you were using something such as Halfword values where they had a range of 0x0000 thru 0xFFFF (65536 total possible values), you would use the scale formula of 2^16. 16 for 16 bits (halfword)

Chapter 9: Going back over normal Float Instructions in regards to ps1

When working in the past with normal float instructions, your only concern was what was in ps0. However it's important to note that a lot of normal float instructions (mostly single precision float instructions) affect ps1 as well. Having this knowledge can help you optimize a source by using normal float instructions along with your paired single instructions.

Load Floating Single (lfs)~

Whenever an lfs instruction is executed, the single precision float that is loaded into ps0, is also loaded into ps1. Thus ps0 and ps1 are now the same value.

Example: r31's value is a memory address that points to the word value of 0x3F800000; load that value as a single precision float into f0

Code:

`lfs f0, 0 (r31)`

After the instruction has executed, both ps0 and ps1 of the FPR are now 0x3FF0000000000000.
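Here is a tiny Python model (not PPC ASM) of that lfs behavior. It takes the word from memory, converts it to the 64-bit form, and places the same value in both segments. The function name simply mirrors the instruction and this is only a sketch of the result you would see in Dolphin.

```python
import struct

def lfs(word_hex: str):
    """Model of lfs: returns (ps0, ps1) in 64-bit hex form."""
    value = struct.unpack(">f", bytes.fromhex(word_hex))[0]
    as_double = struct.pack(">d", value).hex().upper()
    return (as_double, as_double)   # same value duplicated into both segments

print(lfs("3F800000"))
```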

---

Single Precision Floating Point Math~

ALL math based single precision floating point instructions will only use the ps0 values of the source registers for the calculation, but the destination register will have the result be placed in both ps0 and ps1.

---

fres (can only be used on single precision floats)~

fres will use the ps0 values of the source register for the calculation. Destination register will have the result placed in both ps0 & ps1.

---

All other instructions that work on (or can work on) single floats (frsp, fmr, etc) only produce a result in ps0 while ps1 will be left undefined.

Shoutout to CLF78 for help with the contents in this Chapter!

Chapter 10: Dolphin Inaccuracies

Something you should know...

Dolphin actually displays the FPRs incorrectly. An FPR is actually just one 64 bit segment. The second segment you see on Dolphin doesn't actually exist on the Wii's hardware.

On real hardware, Single Precision floats take up just 32 bits of data in the FPR, thus these single precision floats are in their word form (like what you see in memory). Double Precision floats are still 64 bits of data, thus they take up the entire FPR.

For whatever reason, the Devs of Dolphin made the FPRs be displayed as two 64 bit segments and have any single precision floats be displayed in 64-bit form. Paired Singles are two single precision floats, and in Dolphin, the first float is in the first 64 bit segment and the second float is in the other 64 bit segment.

What occurs on real hardware regarding Paired Singles is that the 64-bit FPR is split into two halves. The upper 32 bits is the first paired single while the lower 32 bits is the second.

Thus this inaccuracy of the FPRs can allow a coder to write up source to manipulate the incorrect outputs of instructions such as using fmr with a double precision float and messing with the destination register ps1's leftover value, which would have been erased/replaced when done on real hardware.

It's nothing to worry about though as replicating this 'bug' can only be done with poorly handwritten assembly. Thus the quirky FPR usage by Dolphin won't cause any issues at all running your Wii games.

For more info about this 'bug', here's a small write up with a demo code + source - https://mkwii.com/showthread.php?tid=1886

Chapter 11: Real World Example; Conclusion

Still confused by Paired Singles? Or not sure how some of these instructions can optimize one of your sources? It may be best to look at a real code (not some source example) that uses multiple different types of paired single instructions where using such instructions improves that source from just using plain jane floating point instructions.

Here is an MKWii code I have created that fits such requirements - https://mkwii.com/showthread.php?tid=1260

Summary of the code:

- Purpose is to allow the user to spin the WiFi globe in the direction depending on their analog stick

- Analog Stick X and Y values reside in memory as a paired single and work in a value range of -1 to 1.

- Longitude and Latitude of the globe reside in memory as a paired single.

- Load the Longitude & Latitude and update its values depending on the Analog Stick X and Y current values