Paired Singles

Chapter 1: Intro

Paired Singles are probably the most difficult aspect of PPC Assembly to understand.

First things first: you should already know the basics of PPC ASM, be able to make codes, and understand the fundamentals of Floating Points + FPRs. I have a tutorial covering Floats which you can find HERE. Read that first before coming back here.

Chapter 2: Paired Singles in a FPR

With normal floating point instructions, you have only been working with the first 64-bit segment. This first segment which all normal floating point instructions work on is known as Paired Single 0 (or ps0 for short). The second segment is Paired Single 1 (ps1). Paired Single instructions utilize both ps0 and ps1 simultaneously, and the floats in ps0 & ps1 can only be of single precision. Hence the term Paired Singles (a pair of single precision floats). Using paired single instructions when a double precision float is present in an FPR will cause undefined behavior/results and could lead to an exception.

Chapter 3: Paired Singles in Memory

Since the values in ps0 and ps1 are single precision, each paired single (if stored/loaded to/from memory) is a word value.

Example: You have this value in f1 - C06C800000000000 4170000000000000

- ps0 = C06C800000000000

- ps1 = 4170000000000000

If this paired single value were stored to memory, it would be shown as 0xC36400004B800000

- 0xC3640000 is the ps0 value

- 0x4B800000 is the ps1 value

It's important to understand that paired singles are always two words in memory linked together as a double word.
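
If you want to double check the example above yourself, here is a small Python sketch (not PPC ASM, just a model of the conversion). It takes the two 64-bit values shown for f1, re-encodes each one as a single precision word, and joins them into the double word you would see in memory. The function name is made up for this illustration.

```python
import struct

def fpr_double_to_single_word(double_hex: str) -> str:
    """Re-encode a 64-bit double bit pattern as a 32-bit single bit pattern."""
    value = struct.unpack(">d", bytes.fromhex(double_hex))[0]
    return struct.pack(">f", value).hex().upper()

ps0_word = fpr_double_to_single_word("C06C800000000000")  # ps0 from the example
ps1_word = fpr_double_to_single_word("4170000000000000")  # ps1 from the example

# Two words linked together as one double word in memory
print("0x" + ps0_word + ps1_word)  # 0xC36400004B800000
```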

Chapter 4: Storing/Loading

Here are some basic instructions for loading and storing paired singles.

psq_l fD, SIMM (rA), W, I

The above example will load the paired single value located at rA+SIMM into fD. The single precision floating point value located at rA+SIMM will be loaded in ps0 and the single precision floating point value located at rA+SIMM+4 will be loaded into ps1.

psq_st fD, SIMM (rA), W, I

The above is a basic paired single storing instruction. Value of ps0 is stored to memory location rA+SIMM and the value of ps1 is stored at rA+SIMM+4.

SIMM is NOT a 16-bit signed range. For paired single loading/storing instructions it's actually a 12-bit signed range (0xFFFFF800 thru 0x000007FF). If this range isn't followed, your assembler will throw an error.
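
As a quick sanity check of that range, here is a Python sketch with two hypothetical helpers. One turns a 32-bit hex value into its signed number, the other tells you whether an offset fits the 12-bit signed SIMM.

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit hex value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def fits_psq_offset(offset: int) -> bool:
    """True if the offset fits the 12-bit signed SIMM of psq_l/psq_st."""
    return -0x800 <= offset <= 0x7FF

print(to_signed32(0xFFFFF800))   # -2048, the lowest usable offset
print(fits_psq_offset(0x7FF))    # True, the highest usable offset
print(fits_psq_offset(0x800))    # False, the assembler would reject this
```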

Note: The format I have shown above for the two instructions is the format that all PPC assemblers use. For whatever reason, Dolphin (in Code view) uses a slightly different format.

What are the W and I values?

W can only be one of two values: 0 or 1. If W is 0, both ps0 and ps1 are loaded/stored normally. If W is 1, only ps0 is transferred: on a load, ps1 of the FPR is always set to the floating point value of 1 (0x3FF0000000000000) regardless of what is in memory, and on a store, only ps0 is written to memory (ps1 is not stored).

The I value represents which Graphics Quantization Register (GQR) will be used. To keep things simple for now, for basic loading/storing of paired singles, simply set the I value to 0. This will be fine for most Wii games. More information covering the nitty gritty of GQRs will be discussed later in Chapter 8.
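
To picture what W does on a load, here is a rough Python model of psq_l. It assumes the chosen GQR is set to plain floats (no scaling), and the function name and the memory contents are just for illustration (the bytes are reused from the Chapter 3 example).

```python
import struct

def psq_l_float(memory: bytes, offset: int, w: int):
    """Model of psq_l with an unscaled (float) GQR. Returns (ps0, ps1)."""
    ps0 = struct.unpack_from(">f", memory, offset)[0]
    if w == 1:
        ps1 = 1.0  # W = 1: ps1 is forced to 1.0, memory at offset+4 is ignored
    else:
        ps1 = struct.unpack_from(">f", memory, offset + 4)[0]
    return ps0, ps1

memory = bytes.fromhex("C36400004B800000")
print(psq_l_float(memory, 0, 0))  # (-228.0, 16777216.0)
print(psq_l_float(memory, 0, 1))  # (-228.0, 1.0)
```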

Chapter 5: Basic Math & Single Parameter Instructions

Just like normal floating point instructions, there are plenty of paired single instructions for all sorts of math.

ps_add fD, fA, fB

#fA ps0 + fB ps0 = fD ps0

#fA ps1 + fB ps1 = fD ps1

Other math instructions:

ps_sub (subtracting)

ps_mul (multiplying)

ps_div (dividing)

You also have some basic single parameter instructions...

ps_mr fD, fA

#fA ps0's value is copied to fD ps0

#fA ps1's value is copied to fD ps1

ps_abs fD, fA

#absolute value of fA ps0 is placed into fD ps0

#absolute value of fA ps1 is placed into fD ps1

ps_nabs fD, fA

#same as ps_abs but using negative absolute value instead

ps_neg fD, fA

#fA ps0 (if positive) is changed to its negative value (i.e. 1 to -1), result placed in fD ps0. If negative, then the value is changed to positive (i.e. -1 to 1). The same applies to fA ps1, with the result placed into fD ps1.
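
If it helps, you can think of every instruction in this chapter as the same operation done twice, once per segment. Below is a Python sketch that models an FPR as a (ps0, ps1) tuple. This is only a mental model: Python floats are double precision, so single precision rounding is ignored, and the register values are made up.

```python
def ps_add(fa, fb): return (fa[0] + fb[0], fa[1] + fb[1])
def ps_sub(fa, fb): return (fa[0] - fb[0], fa[1] - fb[1])
def ps_mul(fa, fb): return (fa[0] * fb[0], fa[1] * fb[1])
def ps_div(fa, fb): return (fa[0] / fb[0], fa[1] / fb[1])

def ps_mr(fa):   return (fa[0], fa[1])
def ps_abs(fa):  return (abs(fa[0]), abs(fa[1]))
def ps_nabs(fa): return (-abs(fa[0]), -abs(fa[1]))
def ps_neg(fa):  return (-fa[0], -fa[1])

f1 = (1.5, -2.0)  # hypothetical register contents (ps0, ps1)
f2 = (0.5, 4.0)

print(ps_add(f1, f2))  # (2.0, 2.0)
print(ps_div(f1, f2))  # (3.0, -0.5)
print(ps_nabs(f1))     # (-1.5, -2.0)
print(ps_neg(f1))      # (-1.5, 2.0)
```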

Chapter 6: Comparisons

ps_cmpo is the instruction for ordered paired single comparisons. There is also the ps_cmpu instruction for unordered comparisons.

The difference between ordered and unordered comparisons is how the FPSCR (floating point status control register) is modified when a NaN (not a number) is present in the comparison. Thus, in simple terms if you have no idea about what I just stated, stick with ordered comparisons only (ps_cmpo).

There are two types of ordered comparisons. One for ps0 and one for ps1

ps_cmpo0 crf, fD, fA

#fD ps0's value is compared to fA ps0's value

ps_cmpo1 crf, fD, fA

#fD ps1's value is compared to fA ps1's value

Just like with floating point instructions, you need to specify the condition register field. To keep it simple, use cr1.

Example comparison:

#Let's say f3 ps1 has the value of 1.0 in decimal form, and f0 ps1 has the value of 3.5 in decimal form.

Code:

ps_cmpo1 cr1, f3, f0

bgt- cr1, some_label

In the above source, the branch would NOT be taken because 1.0 is less than 3.5.
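
Here is the same comparison as a Python sketch. The function returns which condition would be set in the condition register field ("LT", "GT", "EQ", or "UN" when a NaN is involved). The ps0 values of f3 and f0 are not given in the example, so 0.0 is used as a placeholder.

```python
import math

def ps_cmpo1(first, second) -> str:
    """Compare ps1 of the first register against ps1 of the second."""
    a, b = first[1], second[1]
    if math.isnan(a) or math.isnan(b):
        return "UN"  # unordered, a NaN was present
    if a < b:
        return "LT"
    if a > b:
        return "GT"
    return "EQ"

f3 = (0.0, 1.0)  # ps0 is a placeholder, ps1 = 1.0
f0 = (0.0, 3.5)  # ps0 is a placeholder, ps1 = 3.5

result = ps_cmpo1(f3, f0)
branch_taken = (result == "GT")  # models bgt- cr1, some_label
print(result, branch_taken)      # LT False
```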

Chapter 7: Other Instructions

Paired Singles have some nifty instructions to move values across the paired single segments. There are 4 different instructions you can use.

ps_merge00 fD, fA, fB

#fA ps0 is copied to fD ps0. fB ps0 is copied to fD ps1

ps_merge01 fD, fA, fB

#fA ps0 is copied to fD ps0. fB ps1 is copied to fD ps1

ps_merge10 fD, fA, fB

#fA ps1 is copied to fD ps0. fB ps0 is copied to fD ps1

ps_merge11 fD, fA, fB

#fA ps1 is copied to fD ps0. fB ps1 is copied to fD ps1

You can use ps_merge10 to swap the ps values within an FPR. This could come in handy for you.

Example (swap the ps values of f0)~

Code:

ps_merge10 f0, f0, f0
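
A Python sketch of the four merge instructions, again treating an FPR as a (ps0, ps1) tuple. The two digits in the instruction name tell you which segment is taken from fA and which from fB. The value in f0 is made up.

```python
def ps_merge00(fa, fb): return (fa[0], fb[0])
def ps_merge01(fa, fb): return (fa[0], fb[1])
def ps_merge10(fa, fb): return (fa[1], fb[0])
def ps_merge11(fa, fb): return (fa[1], fb[1])

f0 = (1.0, 2.0)  # hypothetical (ps0, ps1)

# ps_merge10 f0, f0, f0 swaps the two segments
f0 = ps_merge10(f0, f0)
print(f0)  # (2.0, 1.0)
```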

Here are two instructions that allow you to do basic addition across the paired single segments. Obviously if the value(s) are negative, the instructions can be used as subtraction instead.

ps_sum0 fD, fA, fC, fB

#fA ps0 + fB ps1 = fD ps0

#fC ps1 = fD ps1

ps_sum1 fD, fA, fC, fB

#fA ps0 + fB ps1 = fD ps1

#fC ps0 = fD ps0

Here's an example of adding ps0 and ps1 of an FPR together and having the result stored back in ps0 of the same FPR. We'll use f5 for the following example.

Code:

ps_sum0 f5, f5, f5, f5

If for whatever reason you wanted the result in ps1 of f5, just use ps_sum1 instead.
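
Here is the f5 example as a Python sketch, using the same (ps0, ps1) tuple model. The argument order matches the instruction format (fA, fC, fB), and the starting value of f5 is made up.

```python
def ps_sum0(fa, fc, fb):
    """fD ps0 = fA ps0 + fB ps1, fD ps1 = fC ps1"""
    return (fa[0] + fb[1], fc[1])

def ps_sum1(fa, fc, fb):
    """fD ps0 = fC ps0, fD ps1 = fA ps0 + fB ps1"""
    return (fc[0], fa[0] + fb[1])

f5 = (3.0, 4.0)  # hypothetical (ps0, ps1)

print(ps_sum0(f5, f5, f5))  # (7.0, 4.0) -> sum lands in ps0, ps1 untouched
print(ps_sum1(f5, f5, f5))  # (3.0, 7.0) -> sum lands in ps1, ps0 untouched
```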

Chapter 8: Scaling, Quantization, and the GQRs

When a paired single load instruction of any type is executed, this is known as Dequantization (or Dequantizing). If it's a paired single store instruction of any type, it is known as Quantization (or Quantizing). Quantization/Dequantization is just a fancy term for taking integer values and having them in a scale (or range) that can be converted back to or from a float value.

The use of Scaling allows you to have integer values (no larger than halfwords) in memory and utilize them as floating points (by a specified range) in the paired single segments of an FPR.

Revisiting Chapter 4 regarding the paired single loading/storing instructions, the I value stands for which Graphics Quantization Register (GQR) to use. There are 8 total GQRs. The I value can be any number from 0 to 7.

Each GQR is 32 bits of data. You can modify a GQR to affect how scaling is done during quantization and/or dequantization. Wii Games will usually set all the GQR values during their early boot sequence. You can modify any GQR value but be sure to restore its value at some point later in your source/code.

You can write a value to the GQRs via the mtspr instruction. Here are the SPR numbers for the GQRs:

- 912 = GQR0

- 913 = GQR1

- 914 = GQR2

- 915 = GQR3

- 916 = GQR4

- 917 = GQR5

- 918 = GQR6

- 919 = GQR7

To write a register value to the GQR...

mtspr XXX, rD

#XXX = GQR's spr number, rD contains the value to copy to the GQR

To backup the value of a GQR to a register...

mfspr rD, XXX

#rD is the register where the GQR's value will be copied to, XXX = GQR's spr number

Structure of GQR:

- Bits 0 & 1 = Unused

- Bits 2 thru 7 = Scale Value for Dequantization; 6 bit value (L_Scale)

- Bits 8 thru 12 = Unused

- Bits 13 thru 15 = Dequantization Type; 3 bit value (L_Type)

- Bits 16 & 17 = Unused

- Bits 18 thru 23 = Scale Value for Quantization; 6 bit value (R_Scale)

- Bits 24 thru 28 = Unused

- Bits 29 thru 31 = Quantization Type; 3 bit value (R_Type)

L_Type and R_Type can only be 5 different values.

- 0x0 = Float (no scaling)

- 0x4 = Unsigned Byte

- 0x5 = Unsigned Halfword

- 0x6 = Signed Byte

- 0x7 = Signed Halfword
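
To see how those fields line up inside the 32-bit value, here is a Python sketch that packs and unpacks a GQR. Remember that PPC numbers bits from the left (bit 0 is the most significant bit), so bits 2 thru 7 sit at a left shift of 24, bits 13 thru 15 at 16, bits 18 thru 23 at 8, and bits 29 thru 31 at 0. The function names are hypothetical.

```python
VALID_TYPES = (0x0, 0x4, 0x5, 0x6, 0x7)

def pack_gqr(l_scale: int, l_type: int, r_scale: int, r_type: int) -> int:
    """Build a 32-bit GQR value from its four fields."""
    if l_type not in VALID_TYPES or r_type not in VALID_TYPES:
        raise ValueError("type must be 0x0, 0x4, 0x5, 0x6, or 0x7")
    if not (0 <= l_scale <= 0x3F and 0 <= r_scale <= 0x3F):
        raise ValueError("scale is a 6 bit value")
    return (l_scale << 24) | (l_type << 16) | (r_scale << 8) | r_type

def unpack_gqr(value: int):
    """Return (L_Scale, L_Type, R_Scale, R_Type) from a GQR value."""
    return ((value >> 24) & 0x3F, (value >> 16) & 0x7,
            (value >> 8) & 0x3F, value & 0x7)

# Signed bytes with a scale of 8 for both loads and stores
print(hex(pack_gqr(0x08, 0x6, 0x08, 0x6)))  # 0x8060806
print(unpack_gqr(0x08060806))               # (8, 6, 8, 6)
```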

L_Scale & R_Scale structure:

This is the value (anywhere from 0x00 to 0x3F) that represents the input of "scale" in the following formula:

2^scale = DQ (dequant/quant value)

During Dequantization:

Integer value (determined by L_Type) in memory is divided by DQ, that result is converted to a single precision float, and placed into its paired single segment of the FPR.

During Quantization:

Single precision float value of the paired single segment is multiplied by DQ, that result is converted to its integer value (determined by R_Type), and placed into memory.
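
Here is a rough Python model of both directions. Treat it as a sketch with stated assumptions, not a hardware-exact implementation: it scales first and then converts to an integer on stores (my understanding of the hardware order), it truncates toward zero, it clamps results that don't fit the integer type, and it only models scale values 0 thru 31. The function names are made up.

```python
# type value -> (size in bytes, signed?)
TYPES = {0x4: (1, False), 0x5: (2, False), 0x6: (1, True), 0x7: (2, True)}

def dequantize(raw: int, gqr_type: int, scale: int) -> float:
    """Load side: integer from memory divided by 2^scale."""
    if not 0 <= scale <= 31:
        raise ValueError("only scales 0 thru 31 are modelled in this sketch")
    size, signed = TYPES[gqr_type]
    bits = size * 8
    raw &= (1 << bits) - 1
    if signed and raw >= 1 << (bits - 1):
        raw -= 1 << bits  # apply the sign
    return raw / (2 ** scale)

def quantize(value: float, gqr_type: int, scale: int) -> int:
    """Store side: float multiplied by 2^scale, then converted to an integer."""
    if not 0 <= scale <= 31:
        raise ValueError("only scales 0 thru 31 are modelled in this sketch")
    size, signed = TYPES[gqr_type]
    bits = size * 8
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    scaled = int(value * (2 ** scale))      # assumption: truncate toward zero
    clamped = max(low, min(high, scaled))   # assumption: clamp out-of-range results
    return clamped & ((1 << bits) - 1)

print(dequantize(0xEF, 0x6, 8))             # -0.06640625
print(hex(quantize(-0.06640625, 0x6, 8)))   # 0xef
```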

Confused? Yeah I don't blame you. Let's go over an example use of scaling.

Let's say you are making a code for your game that will utilize the GCN C stick values. You find where these values are in memory. There's a byte value for the X axis and a byte value right next to it for the Y axis. You move the sticks around and have come to the conclusion of the following ranges...

X axis range:

0x80 (furthest left) thru 0x00 (neutral) thru 0x7F (furthest right)

Y axis range:

0x80 (furthest down) thru 0x00 (neutral) thru 0x7F (furthest up)

Fyi: You can test this on MKWii. Load up Dolphin and the Dolphin-memory-engine. Be sure only the Keyboard for GCN emulation is being used as port/player 1 and go to the byte addresses for your region.

- NTSC-U: 80343E84 = X axis, 80343E85 = Y axis

- PAL: 80348204 = X axis, 80348205 = Y axis

- NTSC-J: 80347B84 = X axis, 80347B85 = Y axis

- NTSC-K: 80336204 = X axis, 80336205 = Y axis

Both bytes are signed since the neutral position is 0. We want to load these two byte values and scale them to floating points so we can perform quicker and more complex math on them compared to using normal integer instructions.

First we need to setup GQR7. The L_Type and R_Type will be set to 0x6 (signed byte). Now for L_Scale and R_Scale, what value do we use? Well the range of 0x00 thru 0xFF is a total of 256 unique values. A byte is 8 bits. If you perform the Scale equation of 2^8, you get a total of 256, matching the 256 unique values. Therefore, L_Scale and R_Scale will be 0x08 since scale's value used in the formula of 2^scale was 8. Now let's write the GQR7.

Code:

lis r0, 0x0806 #Assuming r0 is safe for use in this example source

ori r0, r0, 0x0806

mtspr 919, r0 #Write new value to GQR7

GQR7 is set, let's load the two bytes into the paired segments of an FPR. We will use f18 as an example. Pretend the bytes are located at r9 + 0x0088.

Code:

psq_l f18, 0x0088 (r9), 0, 7

Let's assume during the load, the X axis was a value of 0x00 (neutral) and the Y axis was 0xEF (slightly down). f18 would have a value of...

- ps0 (X axis) 0x0000000000000000

- ps1 (Y axis) 0xBFB1000000000000

Convert these to decimal and you get 0.0, & -0.06640625

With the scale set to the bit width of the integer (8 for bytes, 16 for halfwords), scaling gives a range of -0.5 to just under 0.5 for signed bytes/halfwords and a range of 0.0 to just under 1.0 for unsigned (logical) bytes/halfwords. The ranges will be correct in response to your integers as long as you set the scale values correctly and you know the ranges of the integers beforehand.

Let's say we load them again but at a different point in time. X axis is 0x7F (max right), and Y axis is 0x33 (moderate up). f18 would have a value of...

- ps0 (X axis) 0x3FDFC00000000000

- ps1 (Y axis) 0x3FC9800000000000

Convert those to decimal and you get 0.49609375, & 0.19921875. If you are wondering why 0x7F didn't convert to 0.5 (max float) that's because 0x00 (0.0) is a possible value in the range and has to be accounted for. If any of the stick integer values were 0x80, that would result in -0.5.
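
If you want to reproduce the numbers from this C stick example, here is a Python sketch. It applies the signed byte / 2^8 scaling and prints both the decimal value and the 64-bit pattern in the same style as the values shown for f18 above. The helper names are made up.

```python
import struct

def signed_byte(raw: int) -> int:
    """Interpret a byte (0x00 thru 0xFF) as a signed value."""
    return raw - 0x100 if raw & 0x80 else raw

def stick_to_fpr(raw: int, scale: int = 8):
    """Return (decimal value, 64-bit hex pattern) for a signed stick byte."""
    value = signed_byte(raw) / (2 ** scale)
    return value, "0x" + struct.pack(">d", value).hex().upper()

for raw in (0x00, 0xEF, 0x7F, 0x33, 0x80):
    value, pattern = stick_to_fpr(raw)
    print(f"0x{raw:02X} -> {value} -> {pattern}")
```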

If you were using something such as Halfword values where they had a range of 0x0000 thru 0xFFFF (65536 total possible values), you would use the scale formula of 2^16. 16 for 16 bits (halfword)

Chapter 9: Dolphin Inaccuracies, HID2 PSE, Re-visit of Float Instructions

Something you should know...

As of May 2024, Dolphin displays the FPRs incorrectly. An FPR is actually just one 64 bit segment. ps0 is the first 32-bits, and ps1 is the second 32-bits. This has forced me to incorrectly teach Beginners about the basics of floats on purpose.

Now that you know the true width of an FPR, we need to revisit a Float basic that I have yet to mention: the PSE (Paired Single Enable) bit in the HID2 SPR. This bit obviously enables the Paired Single Instructions, but it also changes the effects/operations of every single-float instruction, and many other float instructions.

Under normal circumstances, your Wii Game is running with this PSE bit high. As an fyi, you cannot just simply toggle this bit on/off at will. Whenever you want to change this bit, you need to invalidate the entire I-Cache, disable it, change the bit, and re-enable the I-Cache.

Anyway, when this PSE bit is low, Broadway will always read any FPR as a whole. Meaning when single-floats are placed into an FPR, they are changed to their 64-bit equivalent. This is important to understand.

Here is a list of operational descriptions of every Float instruction that's affected by this PSE bit. I didn't bother listing Paired Singles, because when PSE is low, they are simply illegal.

fadds, fdivs, fmuls, etc (any math based single-float instruction):

HID2 PSE low: Entire FPRs used. If Source Registers are double-precision, operation is done first, then fD is converted to 64-bit Single Precision. Result is always 64-bit Single Precision.

HID2 PSE high: ps0's of Source Registers used for operation. ps0 and ps1 of fD gets the 32-bit Single Precision result.

fmr:

HID2 PSE low: Entire FPR of fA copied to fD.

HID2 PSE high: ps0 of fA copied to ps0 of fD. ps1 of fD left UNCHANGED.

fres

HID2 PSE low: Entire FPRs used. If Source Register is double-precision, the operation is done first, then fD is changed to a 64-bit Single.

HID2 PSE high: ps0 of Source Register used for operation. ps0 and ps1 of fD gets the 32-bit Single Precision result.

frsqrte:

HID2 PSE low: Entire FPRs used. If Source Register was double precision, result REMAINS as double precision.

HID2 PSE high: ps0 of Source Register used for operation. ps0 of fD gets 32-bit Single result. ps1 of fD is left UNCHANGED.

frsp:

HID2 PSE low: Entire FPRs used. Result is always a 64-bit Single.

HID2 PSE high: ps0 of Source Register used for operation. ps0 of fD gets 32-bit Single result. ps1 of fD is left UNDEFINED (junk value).

lfs:

HID2 PSE low: 32-bit Single float changed to 64-bit width. Placed into FPR.

HID2 PSE high: 32-bit Single float placed into both ps0 and ps1 of FPR.

lfd:

HID2 PSE low: 64-bit width float placed into FPR.

HID2 PSE high: 64-bit width float changed to 32-bit single. That single float is placed into ps0 of FPR with ps1 being left UNDEFINED (junk)

stfs:

HID2 PSE low: 64-bit float changed to 32-bit Single Precision float, then stored to EA.

HID2 PSE high: ps0 of FPR stored to EA.

stfd:

HID2 PSE low: 64-bit float stored to EA.

HID2 PSE high: ps0 is changed to 64-bits, then stored to EA.

First fyi: From very limited personal testing, it appears most of the time, junk values result as 0x3F800000 (1.0). Just something to note.

Second fyi: fabs, fneg, and fnabs operate the same regardless of HID2 PSE. For fabs, bit 0 of the FPR is set low. For fnabs, bit 0 of the FPR is set high. And finally for fneg, bit 0 of the FPR is flipped.
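
Since those three instructions only touch the sign bit, they are easy to model as plain bit operations. Here is a Python sketch working on the raw 64-bit pattern of an FPR. PPC's "bit 0" is the most significant bit, so it is bit 63 when counting from the right. The function names are hypothetical.

```python
SIGN_BIT = 1 << 63            # PPC bit 0 of a 64-bit FPR
MASK64 = 0xFFFFFFFFFFFFFFFF

def fabs_bits(bits: int) -> int:
    return bits & ~SIGN_BIT & MASK64   # bit 0 set low

def fnabs_bits(bits: int) -> int:
    return (bits | SIGN_BIT) & MASK64  # bit 0 set high

def fneg_bits(bits: int) -> int:
    return (bits ^ SIGN_BIT) & MASK64  # bit 0 flipped

print(hex(fabs_bits(0xBFB1000000000000)))   # 0x3fb1000000000000
print(hex(fnabs_bits(0x3FF0000000000000)))  # 0xbff0000000000000
print(hex(fneg_bits(0xBFF0000000000000)))   # 0x3ff0000000000000
```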

Keep in mind this is how Real Hardware does it. Dolphin is not accurate here. As you can see, I have basically thrown a curveball at everything you've learned in the past.

Shoutout to CLF78 for help with the contents in this Chapter!

Chapter 10: Real World Examples

Still confused by Paired Singles? Or not sure how some of these instructions can optimize one of your sources? It may be best to look at a real code that uses multiple different types of paired single instructions where using such instructions improves that source from just using plain jane floating point instructions.

Here is an MKWii code I have created that utilizes some paired single instructions - https://mkwii.com/showthread.php?tid=1260

Summary of the code:

- Purpose is to allow the user to spin the WiFi globe in the direction depending on their analog stick

- Analog Stick X and Y values reside in memory as a paired single and work in a value range of -1 to 1.

- Longitude and Latitude of the globe resides in memory as a paired single.

- Load the Longitude & Latitude and update its values depending on the Analog Stick X and Y current values

Here is another MKWii code (created by CLF78) that uses a quantized paired single load - https://mariokartwii.com/showthread.php?tid=1887

Chapter 11: Exercise

The following exercise will help you understand more about...

- Paired Single Loads & Stores (psq_l & psq_st)

- Paired Single Math Operations (ps_mul & ps_sub)

- Paired Single Math across Segments (ps_sum0)

A great way to get used to programming with Paired Singles is making a Speed-O-Meter from XYZ Coordinates. We will do this for the PAL MKWii game. We will pretend that there is no spot in Memory where the game keeps any kind of Speed measurement.

This type of Speed-O-Meter is better than the typical type because the typical type uses Engine Speed. The one we will make (XYZ type) will factor in events such as boost panels, conveyor belts, air movement, etc.

First we need to understand what XYZ coordinates are. XYZ coordinates are a measurement/value that can give you the location of an object in a 3 dimensional space.

In a two dimensional space (like a piece of graph paper), going left & right is the X coordinate. Going up and down is the Y coordinate.

Picture~

Two dimensions is simple enough. Now, we will cover a 3 dimensional space. X & Y are still the same. However, we have a new coordinate for forward & backward motion.

Imagine you are holding that piece of graph paper in front of your face. Left & Right is X. Up & Down is Y. Going forward (through the paper) and back is the Z coordinate.

Another way to remember how to differentiate Y vs Z is that Y is always for elevation.

If we were to think of it like a Compass, X is West & East, and Z (**NOT** Y) is North & South.

XYZ coordinates (from my personal experience) are *ALWAYS* stored as single precision floats consecutively in memory for any Wii Game that uses them. Therefore, using Paired Singles makes perfect sense for this exercise.

Important NOTE: For the majority of XYZ coordinates used in Wii games, increasing vs decreasing values don't work how one would expect. For example, in the DBZ BT3 game, going up in elevation (Y coordinate), the float value decreases.

Luckily for MKWii, going up in Y increases the float. Which makes more sense from an intuition standpoint. If you are ever in need of finding the XYZ coordinates of a game, searching for the Y coordinate is the easiest route. This is because you have no idea how to search for X & Z because you have no reference point of what "north" is on your "compass".

This whole 'increase vs decrease for Y up vs down' situation isn't relevant for our Speed-O-Meter code, but I thought I should mention this since we are covering XYZ usage.

To keep this exercise from being too long, we are not going to 'find' the XYZ coordinates 'manually'. There is already a method to load them up based on the slot of the player.

When making a speed-o-meter, we can't use the metric of kmh or mph. The game doesn't understand kilometers or miles. We can however, do a metric of "units" per frame. Units per Hour would yield too large of a number for practical use.

To calculate the change of XYZ coordinates (or speed) from one instance of time to another instance, we use the following formula (credits to JoshuaMK for supplying this when I needed it for DBZ BT3)

XYZ Speed = sqrt{[(x2 - x1)^2] + [(y2 - y1)^2] + [(z2 - z1)^2]}

x1 = X coordinate at time instance #1 (or frame #1)

y1 = Y coordinate at time instance #1 (or frame #1)

z1 = Z coordinate at time instance #1 (or frame #1)

x2 = X coordinate at time instance #2 (or frame #2)

y2 = Y coordinate at time instance #2 (or frame #2)

z2 = Z coordinate at time instance #2 (or frame #2)
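
Before we write any ASM, here is the formula as a small Python sketch so you can see it work on plain numbers. The function name and the coordinates are made up for this illustration.

```python
import math

def xyz_speed(frame1, frame2) -> float:
    """Distance travelled between two (x, y, z) positions, in units per frame."""
    x1, y1, z1 = frame1
    x2, y2, z2 = frame2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)

last_frame = (0.0, 0.0, 0.0)      # hypothetical position at frame #1
current_frame = (3.0, 4.0, 12.0)  # hypothetical position at frame #2

print(xyz_speed(last_frame, current_frame))  # 13.0
```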

---

We will write out a C0 gecko code since we know that executes exactly once per frame (aka 60 times a second). The following snippet of code can be used anywhere to load a specific slot/player's XYZ coordinates.

Code:

#Set desired slot value, 0 (P1; you) used for this source

li r11, 0

#Set r12 to Point to the XYZs

lis r12, 0x809C #Pal address specific, fyi

lwz r12, 0x18F8 (r12) #Pal address specific, fyi

cmpwi r12, 0 #Added in for C0 code, when not in a race/battle, no valid pointer will exist

beqlr- #If invalid, end C0 code

lwz r12, 0x0020 (r12)

rlwinm r11, r11, 2, 0, 29

lwzx r12, r12, r11

lwz r12, 0 (r12)

lwz r12, 0x8 (r12)

lwz r12, 0x90 (r12)

lwz r12, 0x4 (r12) #At this point r12 + 0x68 points directly to the XYZ coordinates

The above code is doing what is called 'pointer level' loading. It comes from Seeky's mkw-structures github repo. Specifically, the player.h page - https://github.com/SeekyCt/mkw-structure...r/player.h

The repo can help you pinpoint all sorts of data and how to reference that data later. Anyway, going back to the code at hand....
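
If pointer level loading is new to you, here is the same chain of loads as a Python sketch. The offsets are the ones from the ASM above, but the memory contents are completely fake (made-up addresses in a dictionary) just so the walk can be followed. `read32` stands in for a 32-bit load (lwz).

```python
def xyz_address(read32, slot: int, base_pointer: int = 0x809C18F8):
    """Follow the pointer chain; returns the address of the XYZs or None."""
    p = read32(base_pointer)      # lis + lwz r12, 0x18F8 (r12)
    if p == 0:                    # cmpwi r12, 0 / beqlr-
        return None
    p = read32(p + 0x20)          # lwz r12, 0x0020 (r12)
    p = read32(p + slot * 4)      # rlwinm (slot * 4) + lwzx
    p = read32(p + 0x0)           # lwz r12, 0 (r12)
    p = read32(p + 0x8)           # lwz r12, 0x8 (r12)
    p = read32(p + 0x90)          # lwz r12, 0x90 (r12)
    p = read32(p + 0x4)           # lwz r12, 0x4 (r12)
    return p + 0x68               # XYZ coordinates start here

# Fake memory, every address/value below is hypothetical
fake_memory = {
    0x809C18F8: 0x81000000,
    0x81000020: 0x81001000,
    0x81001000: 0x81002000,  # entry for slot 0
    0x81002000: 0x81003000,
    0x81003008: 0x81004000,
    0x81004090: 0x81005000,
    0x81005004: 0x81006000,
}

address = xyz_address(fake_memory.__getitem__, 0)
print(hex(address))  # 0x81006068
```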

We need to write some code that will load up saved XYZ coordinates (last frame's aka frame 1) from a safe unused spot (like the EVA) and then store the current frame's XYZ overwriting the old ones present in the EVA. We will need 4 float registers simultaneously. 2 FPRs for the last frame's XYZ, and 2 FPRs for the current frame's.

The very first frame this occurs, the frame 1 XYZs will all be null. Which is fine, since it's only 1 frame, you won't see this affect your speedometer in real time.

We will have the XYZs (that temp reside in EVA) be kept at the following addresses.

- 0x800007F0 = X

- 0x800007F4 = Y

- 0x800007F8 = Z

Let's write out that portion of the code..

Code:

#Load up current frame's XYZs into f10 & f11

psq_l f10, 0x68 (r12), 0, 0

lfs f11, 0x70 (r12)

#Set EVA Upper

lis r12, 0x8000

#Load last frame's XYZs from EVA

psq_l f12, 0x7F0 (r12), 0, 0

lfs f13, 0x7F8 (r12)

#Write in current frame's XYZs to update EVA

psq_st f10, 0x07F0 (r12), 0, 0

stfs f11, 0x07F8 (r12)

Alright, we got last frame's and current frame's XYZs in f10 thru f13. The EVA has been updated. Now we can do the formula...

Code:

#Do the formula: sqrt{[(x2 - x1)^2] + [(y2 - y1)^2] + [(z2 - z1)^2]}

#X2 = f10 ps0

#X1 = f12 ps0

#Y2 = f10 ps1

#Y1 = f12 ps1

#Z2 = f11 ps0

#Z1 = f13 ps0

#Perform X2 minus X1, and Y2 minus Y1

ps_sub f10, f10, f12

#Z2 minus Z1

fsubs f11, f11, f13

#Raise X and Y to power of 2

ps_mul f10, f10, f10

#Add X + Y

ps_sum0 f10, f10, f10, f10

#Raise Z to the power of 2, then add Z to (X+Y)

fmadds f10, f11, f11, f10

#Get Square Root

frsqrte f10, f10

fres f10, f10
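
To follow what each register holds after every instruction above, here is a Python sketch that mirrors the ASM one line at a time, with paired FPRs as (ps0, ps1) tuples. Two things are assumptions of this sketch: frsqrte and fres are modelled with exact math (the real instructions only give estimates), and the zero case is guarded so Python doesn't divide by zero.

```python
import math

def speed_from_registers(f10, f11, f12, f13):
    """f10 = (x2, y2), f11 = z2, f12 = (x1, y1), f13 = z1"""
    f10 = (f10[0] - f12[0], f10[1] - f12[1])  # ps_sub f10, f10, f12
    f11 = f11 - f13                           # fsubs f11, f11, f13
    f10 = (f10[0] * f10[0], f10[1] * f10[1])  # ps_mul f10, f10, f10
    f10 = (f10[0] + f10[1], f10[1])           # ps_sum0 f10, f10, f10, f10
    total = f11 * f11 + f10[0]                # fmadds f10, f11, f11, f10 (ps0 only)
    if total == 0.0:
        return 0.0                            # not moving at all
    rsqrt = 1.0 / math.sqrt(total)            # frsqrte f10, f10 (exact here)
    return 1.0 / rsqrt                        # fres f10, f10 (exact here)

# Hypothetical coordinates: current frame (3, 4, 12), last frame (0, 0, 0)
print(speed_from_registers((3.0, 4.0), 12.0, (0.0, 0.0), 0.0))  # roughly 13
```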

We got our speed calculated, now we just need a way to display it in the race timer for us to see. First we need to convert the float result to an integer. We will store that to a different spot in the EVA so we don't write over the XYZs. We'll use the next available EVA address of 0x800007FC.

After completing the C0 code's source, we will write out a new code (C2) that is hooked at Joshua MK's Millisecond Modifier address. That C2 code will load the integer and use that value for the milliseconds of the race timer.

Let's finish off the rest of this C0 code now...

Code:

#Convert float to integer, no need for a fabs instruction, the result is always positive; use standard rounding in the conversion

fctiw f10, f10

#Store to EVA

li r11, 0x07FC

stfiwx f10, r12, r11

#End C0

#blr #uncomment if *NOT* using pyiiasmh, adjust compiled code accordingly to be a proper C0 code

Just to recap here is the final completed C0 code's source..

Code:

#Set desired slot value, 0 (P1; you) used for this source, adjust this to your needs

li r11, 0

#Set r12 to Point to the XYZs

lis r12, 0x809C #Pal address specific, fyi

lwz r12, 0x18F8 (r12) #Pal address specific, fyi

cmpwi r12, 0 #Added in for C0 code, when not in a race/battle, no valid pointer will exist

beqlr- #If invalid, end C0 code

lwz r12, 0x0020 (r12)

rlwinm r11, r11, 2, 0, 29

lwzx r12, r12, r11

lwz r12, 0 (r12)

lwz r12, 0x8 (r12)

lwz r12, 0x90 (r12)

lwz r12, 0x4 (r12) #At this point r12 + 0x68 points directly to the XYZ coordinates

#Load up current frame's XYZs into f10 & f11

psq_l f10, 0x68 (r12), 0, 0

lfs f11, 0x70 (r12)

#Set EVA Upper

lis r12, 0x8000

#Load last frame's XYZs from EVA

psq_l f12, 0x7F0 (r12), 0, 0

lfs f13, 0x7F8 (r12)

#Write in current frame's XYZs to update EVA

psq_st f10, 0x07F0 (r12), 0, 0

stfs f11, 0x07F8 (r12)

#Do the formula: sqrt{[(x2 - x1)^2] + [(y2 - y1)^2] + [(z2 - z1)^2]}

#X2 = f10 ps0

#X1 = f12 ps0

#Y2 = f10 ps1

#Y1 = f12 ps1

#Z2 = f11 ps0

#Z1 = f13 ps0

#Perform X2 minus X1, and Y2 minus Y1

ps_sub f10, f10, f12

#Z2 minus Z1

fsubs f11, f11, f13

#Raise X and Y to power of 2

ps_mul f10, f10, f10

#Add X + Y

ps_sum0 f10, f10, f10, f10

#Raise Z to the power of 2, then add Z to (X+Y)

fmadds f10, f11, f11, f10

#Get Square Root; frsqrte gives an estimate of 1/sqrt(f10), then fres takes the reciprocal of that estimate

frsqrte f10, f10

fres f10, f10

#Convert float to integer, no need for a fabs instruction, the result is always positive; use standard rounding in the conversion

fctiw f10, f10

#Store to EVA

li r11, 0x07FC

stfiwx f10, r12, r11

#End C0

#blr #uncomment if *NOT* using pyiiasmh, adjust compiled code accordingly to be a proper C0 code

Moving onto the C2 code, we will load the integer from the EVA and replace the GPR (r28) with the new integer. This will change the output of the milliseconds on the timer.

Code:

#C2 Hook Address

#PAL = 807F84F8

#Set EVA Upper

lis r28, 0x8000

#Load integer from EVA, place in r28 to replace r28's original millisecond value

lwz r28, 0x7FC (r28)

And that's it for the C2 Code. Combining the two codes together, this is the final compiled code (PAL only)...

Code:

XYZ Speed-O-Meter from Scratch [Vega]

NOTE: Dolphin Only. Crashes on console. To fix: Use a C2 Hook Address that only occurs per frame in a race.

Works everywhere. Choose the slot you want to show the XYZ speed of.

X = Slot

PAL

C0000000 0000000F

3960000X 3D80809C

818C18F8 2C0C0000

4D820020 818C0020

556B103A 7D8C582E

818C0000 818C0008

818C0090 818C0004

E14C0068 C16C0070

3D808000 E18C07F0

C1AC07F8 F14C07F0

D16C07F8 114A6028

ED6B6828 114A02B2

114A5294 ED4B52FA

FD405034 ED405030

FD40501C 396007FC

7D4C5FAE 4E800020

C27F84F8 00000002

3F808000 839C07FC

60000000 00000000

Code created by: Vega

Credits: JoshuaMK (XYZ formula, Millisecond Modifier), Seeky (mkw-structures & player.h), Stebler (player.h)

Success!! For anyone not familiar with MKW Speed-O-Meters, Funky Kong's max speed drifting under normal conditions is 84 units/frame.

Fyi, if you don't want to include the Y coordinate (elevation) for your Speed-O-Meter, use this formula instead...

sqrt{[(x2 - x1)^2] + [(z2 - z1)^2]}

Happy coding!