Paired Singles

Chapter 1: Intro

Paired Singles are probably the most difficult aspect of PPC Assembly to understand.

First things first: you need to already know the basics of PPC ASM, be able to make codes, and understand the fundamentals of Floating Points + FPRs. I have a great tutorial covering Floats which you can find HERE. Read that first before coming back here.

Chapter 2: Paired Singles in a FPR

With normal floating point instructions, you have only been working with the first 64-bit segment. This first segment which all normal floating point instructions work on is known as Paired Single 0 (or ps0 for short). The second segment is Paired Single 1 (ps1). Paired Single instructions utilize both ps0 and ps1 simultaneously, and the floats in ps0 & ps1 can only be of single precision. Hence the term Paired Singles (a pair of single precision floats). Using paired single instructions when a double precision float is present in an FPR will cause undefined behavior/results and could lead to an exception.

Chapter 3: Paired Singles in Memory

Since the values in ps0 and ps1 are single precision, each paired single (if stored/loaded to/from memory) is a word value.

Example: You have this value in f1 - C06C800000000000 4170000000000000

- ps0 = C06C800000000000

- ps1 = 4170000000000000

If this paired single value is stored to memory, it will be shown as 0xC36400004B800000

- 0xC3640000 is the ps0 value

- 0x4B800000 is the ps1 value

It's important to understand that paired singles are always two words in memory linked together as a double word.
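If you want to double-check the conversion in the example above without booting a game, here is a minimal Python sketch (Python is not part of the usual Wii toolchain, and the helper name is made up for this illustration). It takes the two 64-bit values as Dolphin displays them and re-encodes each one as its 32-bit single precision word.

```python
import struct

def double_bits_to_single_word(bits64):
    """Reinterpret a 64-bit double pattern, then re-encode it as a 32-bit single."""
    value = struct.unpack(">d", bits64.to_bytes(8, "big"))[0]
    return int.from_bytes(struct.pack(">f", value), "big")

ps0_word = double_bits_to_single_word(0xC06C800000000000)  # -228.0
ps1_word = double_bits_to_single_word(0x4170000000000000)  # 16777216.0

# Two words linked together as one double word, ps0 first
memory_dword = (ps0_word << 32) | ps1_word

print(f"ps0 = 0x{ps0_word:08X}, ps1 = 0x{ps1_word:08X}")
print(f"memory = 0x{memory_dword:016X}")
```

The last line prints the same double word value given in the example.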

Chapter 4: Storing/Loading

Here are some basic instructions for loading and storing paired singles.

psq_l fD, VALUE (rA), W, I

#VALUE is signed

The above example will load the paired single value located at rA+VALUE into fD. The single precision floating point value located at rA+VALUE will be loaded in ps0 and the single precision floating point value located at rA+VALUE+4 will be loaded into ps1.

psq_st fD, VALUE (rA), W, I

#VALUE is signed

The above is a basic paired single storing instruction. Value of ps0 is stored to memory location rA+VALUE and the value of ps1 is stored at rA+VALUE+4.

VALUE is NOT a 16-bit signed range. For paired single loading/storing instructions it's actually a 12-bit signed range (0xFFFFF800 thru 0x000007FF). If this range isn't followed, your compiler will throw an error.

Note: The format I have shown above for the two instructions is the format that all PPC compilers use. For whatever reason, Dolphin (in Code view) uses a slightly different format.

What are the W and I values?

W can only be two values: 0 or 1. If W is 0, both ps0 and ps1 are loaded/stored normally. If W is 1, only one value is transferred. A load fills ps0 from memory and sets ps1 to the floating point value of 1 (0x3FF0000000000000) regardless of what was in the FPR beforehand. A store writes only ps0 to memory; ps1 is not stored.

The I value represents which Graphics Quantization Register (GQR) will be used. To keep things simple for now, for basic loading/storing of paired singles, simply set the I value to 0. This will be fine for most Wii games. More information covering the nitty gritty of GQRs will be discussed later in Chapter 8.
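To see why VALUE only gets 12 bits, it helps to look at how the instruction is packed. The following Python sketch is a rough encoder for psq_l and psq_st. The field layout (6-bit opcode, fD, rA, then W, I, and the 12-bit VALUE) and the function name are my own reading of the encoding and not something stated earlier in this tutorial, so treat it as an assumption. As a sanity check, the two printed words do match words that appear in the compiled Speed-O-Meter code in Chapter 12.

```python
PSQ_L = 56   # primary opcode assumed for psq_l
PSQ_ST = 60  # primary opcode assumed for psq_st

def encode_psq(opcode, fD, rA, value, W, I):
    """Build the 32-bit machine word for a psq_l / psq_st style instruction."""
    if not -0x800 <= value <= 0x7FF:
        raise ValueError("VALUE must fit in a 12-bit signed range")
    if W not in (0, 1) or not 0 <= I <= 7:
        raise ValueError("W must be 0 or 1, I must be 0 thru 7")
    # W and I take up 4 of the 16 low bits, which leaves 12 bits for VALUE
    return (opcode << 26) | (fD << 21) | (rA << 16) | (W << 15) | (I << 12) | (value & 0xFFF)

# psq_l f10, 0x68 (r12), 0, 0
print(f"{encode_psq(PSQ_L, 10, 12, 0x68, 0, 0):08X}")
# psq_st f10, 0x07F0 (r12), 0, 0
print(f"{encode_psq(PSQ_ST, 10, 12, 0x7F0, 0, 0):08X}")
```

Passing a VALUE such as 0x800 raises an error here, similar to what your compiler does when the range isn't followed.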

Chapter 5: Basic Math & Single Parameter Instructions

Like normal floating point instructions, paired singles have plenty of instructions for all sorts of math.

ps_add fD, fA, fB

#fA ps0 + fB ps0 = fD ps0

#fA ps1 + fB ps1 = fD ps1

Other math instructions:

ps_sub (subtracting)

ps_mul (multiplying)

ps_div (dividing)

You also have some basic single parameter instructions...

ps_mr fD, fA

#fA ps0's value is copied to fD ps0

#fA ps1's value is copied to fD ps1

ps_abs fD, fA

#absolute value of fA ps0 is placed into fD ps0

#absolute value of fA ps1 is placed into fD ps1

ps_nabs fD, fA

#same as ps_abs but using negative absolute value instead

ps_neg fD, fA

#fA ps0 has its sign flipped and the result is placed in fD ps0. A positive value becomes negative (i.e. 1 to -1) and a negative value becomes positive (i.e. -1 to 1). The same applies to fA ps1, with the result placed in fD ps1.
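If it helps to picture these instructions, here is a small Python model of them. It is only a sketch: an FPR is represented as a (ps0, ps1) tuple, the function names simply mirror the mnemonics, and the single() helper is my own addition to imitate single precision rounding.

```python
import struct

def single(x):
    """Round a Python float to single precision, like a ps segment would hold."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def ps_add(fA, fB):
    return (single(fA[0] + fB[0]), single(fA[1] + fB[1]))

def ps_sub(fA, fB):
    return (single(fA[0] - fB[0]), single(fA[1] - fB[1]))

def ps_mr(fA):
    return (fA[0], fA[1])

def ps_abs(fA):
    return (abs(fA[0]), abs(fA[1]))

def ps_nabs(fA):
    return (-abs(fA[0]), -abs(fA[1]))

def ps_neg(fA):
    return (-fA[0], -fA[1])

f1 = (1.5, -2.0)
f2 = (0.25, 4.0)

print(ps_add(f1, f2))   # (1.75, 2.0)
print(ps_sub(f1, f2))   # (1.25, -6.0)
print(ps_abs(f1))       # (1.5, 2.0)
print(ps_nabs(f1))      # (-1.5, -2.0)
print(ps_neg(f1))       # (-1.5, 2.0)
```

Every operation works on ps0 and ps1 independently; nothing crosses between the two segments.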

Chapter 6: Comparisons

ps_cmpo is the instruction for ordered paired single comparisons. There is also the ps_cmpu instruction for unordered comparisons.

The difference between ordered and unordered comparisons is how the FPSCR (Floating-Point Status and Control Register) is modified when a NaN (not a number) is present in the comparison. In simple terms, if you have no idea what I just stated, stick with ordered comparisons only (ps_cmpo).

There are two types of ordered comparisons. One for ps0 and one for ps1

ps_cmpo0 crf, fD, fA

#fD ps0's value is compared to fA ps0's value

ps_cmpo1 crf, fD, fA

#fD ps1's value is compared to fA ps1's value

Just like with floating point instructions, you need to specify the condition register field. To keep it simple, use cr1.

Example comparison:

#Let's say f3 ps1 has the value of 1.0 in decimal form, and f0 ps1 has the value of 3.5 in decimal form.

Code:

ps_cmpo1 cr1, f3, f0

bgt- cr1, some_label

In the above source, the branch would NOT be taken because 1.0 is less than 3.5.
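The same comparison can be sketched in Python. This is only a model under my own assumptions: an FPR is a (ps0, ps1) tuple, and the condition register field is a small dict with "lt", "gt", "eq", and "un" (unordered, set when a NaN is involved) entries.

```python
import math

def ps_cmpo1(fA, fB):
    """Compare the ps1 values and return the bits a condition register field would hold."""
    a, b = fA[1], fB[1]
    if math.isnan(a) or math.isnan(b):
        return {"lt": False, "gt": False, "eq": False, "un": True}
    return {"lt": a < b, "gt": a > b, "eq": a == b, "un": False}

f3 = (0.0, 1.0)  # ps1 = 1.0
f0 = (0.0, 3.5)  # ps1 = 3.5

cr1 = ps_cmpo1(f3, f0)
print(cr1["gt"])  # False, so a bgt- would not branch
print(cr1["lt"])  # True, so a blt- would branch
```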

Chapter 7: Other Instructions

Paired Singles has some nifty instructions to move values across the paired single segments. There are 4 different instructions you can use.

ps_merge00 fD, fA, fB

#fA ps0 is copied to fD ps0. fB ps0 is copied to fD ps1

ps_merge01 fD, fA, fB

#fA ps0 is copied to fD ps0. fB ps1 is copied to fD ps1

ps_merge10 fD, fA, fB

#fA ps1 is copied to fD ps0. fB ps0 is copied to fD ps1

ps_merge11 fD, fA, fB

#fA ps1 is copied to fD ps0. fB ps1 is copied to fD ps1

You can use ps_merge10 to swap the ps values within an FPR. This could come in handy for you.

Example (swap the ps values of f0)~

Code:

`ps_merge10 f0, f0, f0`

Here are two instructions that allow you to do basic addition across the paired single segments. Obviously if the value(s) are negative, the instructions can be used as subtraction instead.

ps_sum0 fD, fA, fC, fB

#fA ps0 + fB ps1 = fD ps0

#fC ps1 = fD ps1

ps_sum1 fD, fA, fC, fB

#fA ps0 + fB ps1 = fD ps1

#fC ps0 = fD ps0

Here's an example of adding ps0 and ps1 of an FPR together and having the result stored back in ps0 of the same FPR. This could be useful for something such as writing some source involving the Pythagorean Theorem. We'll use f5 for the following example.

Code:

`ps_sum0 f5, f5, f5, f5`

If for w/e reason you wanted the result in ps1 of f5, just use ps_sum1 instead.
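Here is a quick Python model of the swap and the cross-segment sums, in case seeing actual numbers helps. Again an FPR is just a (ps0, ps1) tuple, the function names mirror the mnemonics, and the values are made up for the illustration.

```python
def ps_merge10(fA, fB):
    # fA ps1 goes to fD ps0, fB ps0 goes to fD ps1
    return (fA[1], fB[0])

def ps_sum0(fA, fC, fB):
    # fA ps0 + fB ps1 goes to fD ps0, fC ps1 goes to fD ps1
    return (fA[0] + fB[1], fC[1])

def ps_sum1(fA, fC, fB):
    # fC ps0 goes to fD ps0, fA ps0 + fB ps1 goes to fD ps1
    return (fC[0], fA[0] + fB[1])

f0 = (1.0, 2.0)
print(ps_merge10(f0, f0))    # (2.0, 1.0), the ps values are swapped

f5 = (9.0, 16.0)
print(ps_sum0(f5, f5, f5))   # (25.0, 16.0), sum lands in ps0
print(ps_sum1(f5, f5, f5))   # (9.0, 25.0), sum lands in ps1
```

Note the argument order of ps_sum0 and ps_sum1 follows the instruction format shown earlier (fA, fC, fB).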

Chapter 8: Scaling, Quantization, and the GQRs

When a paired single load instruction of any type is executed, this is known as Dequantization (or Dequantizing). If it's a paired single store instruction of any type, it is known as Quantization (or Quantizing). Quantization/Dequantization is just a fancy term for converting between integer values in memory and float values in an FPR using a scale (or range).

The use of Scaling allows you to have integer values (no larger than halfwords) in memory and utilize them as floating points (by a specified range) in the paired single segments of an FPR.

Revisiting Chapter 4 regarding the paired single loading/storing instructions, the I value stands for which Graphics Quantization Register (GQR) to use. There are 8 total GQRs. The I value can be any number from 0 to 7.

Each GQR is 32 bits of data. You can modify a GQR to affect how scaling is done during quantization and/or dequantization. Wii Games will usually set all the GQR values during their early boot sequence. You can modify any GQR value but be sure to restore its value at some point later in your source/code.

You can write a value to the GQR's via the mtspr instruction. Here are the spr numbers for the GQRs

- 912 = GQR0

- 913 = GQR1

- 914 = GQR2

- 915 = GQR3

- 916 = GQR4

- 917 = GQR5

- 918 = GQR6

- 919 = GQR7

To write a register value to the GQR...

mtspr XXX, rD

#XXX = GQR's spr number, rD contains the value to copy to the GQR

To backup the value of a GQR to a register...

mfspr rD, XXX

#rD is the register where the GQR's value will be copied to, XXX = GQR's spr number

Structure of GQR:

- Bits 0 & 1 = Unused

- Bits 2 thru 7 = Scale Value for Dequantization; 6 bit value (L_Scale)

- Bits 8 thru 12 = Unused

- Bits 13 thru 15 = Dequantization Type; 3 bit value (L_Type)

- Bits 16 & 17 = Unused

- Bits 18 thru 23 = Scale Value for Quantization; 6 bit value (R_Scale)

- Bits 24 thru 28 = Unused

- Bits 29 thru 31 = Quantization Type; 3 bit value (R_Type)

L_Type and R_Type can only be 5 different values.

- 0x0 = Float (no scaling)

- 0x4 = Unsigned Byte

- 0x5 = Unsigned Halfword

- 0x6 = Signed Byte

- 0x7 = Signed Halfword
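To make the bit layout less abstract, here is a Python sketch that packs the four fields into a 32-bit GQR value. The function name is made up, and the shifts are simply my translation of the bit ranges listed above (bit 0 being the most significant bit), so double-check it against your own notes before relying on it.

```python
VALID_TYPES = (0x0, 0x4, 0x5, 0x6, 0x7)

def pack_gqr(l_scale, l_type, r_scale, r_type):
    """Pack L_Scale/L_Type (upper halfword) and R_Scale/R_Type (lower halfword)."""
    for scale in (l_scale, r_scale):
        if not 0 <= scale <= 0x3F:
            raise ValueError("scale must be a 6 bit value")
    for kind in (l_type, r_type):
        if kind not in VALID_TYPES:
            raise ValueError("type must be 0x0, 0x4, 0x5, 0x6, or 0x7")
    load_half = (l_scale << 8) | l_type    # bits 2-7 and bits 13-15
    store_half = (r_scale << 8) | r_type   # bits 18-23 and bits 29-31
    return (load_half << 16) | store_half

# Signed bytes with a scale of 8 for both loading and storing
print(f"0x{pack_gqr(0x08, 0x6, 0x08, 0x6):08X}")  # 0x08060806
```

The printed value can then be built in a register with lis + ori and written with mtspr.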

L_Scale & R_Scale structure:

This is the value (anywhere from 0x00 to 0x3F) that represents the input of "scale" in the following formula:

2^scale = DQ (dequant/quant value)

During Dequantization:

Integer value (determined by L_Type) in memory is divided by DQ, that result is converted to a single precision float, and placed into its paired single segment of the FPR.

During Quantization:

Single precision float value of the paired single segment is multiplied by DQ, that result is converted to its integer value (determined by R_Type), and placed into memory.
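Here is a rough Python model of both directions. Keep in mind this is a sketch built on assumptions of mine: the float is scaled first and then truncated toward zero, results that don't fit the integer type are clamped to its limits, and only small positive scale values are considered. All function names are made up.

```python
# Integer limits per type: 0x4/0x5 unsigned byte/halfword, 0x6/0x7 signed byte/halfword
TYPE_RANGES = {0x4: (0, 0xFF), 0x5: (0, 0xFFFF), 0x6: (-0x80, 0x7F), 0x7: (-0x8000, 0x7FFF)}

def type_bits(kind):
    return 8 if kind in (0x4, 0x6) else 16

def dequantize(raw, l_type, l_scale):
    """Integer from memory -> float for a ps segment (load)."""
    bits = type_bits(l_type)
    if l_type in (0x6, 0x7) and raw & (1 << (bits - 1)):
        raw -= 1 << bits                    # treat the raw bits as a signed integer
    return raw / (2 ** l_scale)

def quantize(value, r_type, r_scale):
    """Float from a ps segment -> integer for memory (store)."""
    low, high = TYPE_RANGES[r_type]
    scaled = int(value * (2 ** r_scale))    # int() truncates toward zero
    clamped = max(low, min(high, scaled))   # assumed saturation
    return clamped & ((1 << type_bits(r_type)) - 1)

print(dequantize(0x40, 0x4, 8))        # 0.25
print(hex(quantize(0.25, 0x4, 8)))     # 0x40
print(hex(quantize(-0.5, 0x6, 8)))     # 0x80
print(hex(quantize(0.75, 0x6, 8)))     # 0x7f (192 doesn't fit a signed byte)
```

Notice that quantize() and dequantize() undo each other as long as the value fits the integer type.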

Confused? Yeah I don't blame you. Let's go over an example use of scaling.

Let's say you are making a code for your game that will utilize the GCN C stick values. You find where these values are in memory. There's a byte value for the X axis and a byte value right next to it for the Y axis. You move the sticks around and have come to the conclusion of the following ranges...

X axis range:

0x80 (furthest left) thru 0x00 (neutral) thru 0x7F (furthest right)

Y axis range:

0x80 (furthest down) thru 0x00 (neutral) thru 0x7F (furthest up)

Fyi: You can test this on MKWii. Load up Dolphin and the Dolphin-memory-engine. Be sure only the Keyboard for GCN emulation is being used as port/player 1 and go to the byte addresses for your region.

- NTSC-U: 80343E84 = X axis, 80343E85 = Y axis

- PAL: 80348204 = X axis, 80348205 = Y axis

- NTSC-J: 80347B84 = X axis, 80347B85 = Y axis

- NTSC-K: 80336204 = X axis, 80336205 = Y axis

Both bytes are signed since the neutral position is 0. We want to load these two byte values and scale them to floating points so we can perform quicker and more complex math on them compared to using normal integer instructions.

First we need to setup GQR7. The L_Type and R_Type will be set to 0x6 (signed byte). Now for L_Scale and R_Scale, what value do we use? Well the range of 0x00 thru 0xFF is a total of 256 unique values. A byte is 8 bits. If you perform the Scale equation of 2^8, you get a total of 256, matching the 256 unique values. Therefore, L_Scale and R_Scale will be 0x08 since scale's value used in the formula of 2^scale was 8. Now let's write the GQR7.

Code:

lis r0, 0x0806 #Assuming r0 is safe for use in this example source

ori r0, r0, 0x0806

mtspr 919, r0 #Write new value to GQR7

GQR7 is set, let's load the two bytes into the paired segments of an FPR. We will use f18 as an example. Pretend the bytes are located at r9 + 0x0088.

Code:

`psq_l f18, 0x0088 (r9), 0, 7`

Let's assume during the load, the X axis was a value of 0x00 (neutral) and the Y axis was 0xEF (slightly down). f18 would have a value of...

- ps0 (X axis) 0x0000000000000000

- ps1 (Y axis) 0xBFB1000000000000

Convert these to decimal and you get 0.0, & -0.06640625

With the scale set to the integer's bit width (8 for bytes, 16 for halfwords) as done here, scaling uses a range of -0.5 to 0.5 for signed bytes/halfwords and a range of 0.0 to 1.0 for unsigned (logical) bytes/halfwords. The ranges will be correct in response to your integers as long as you set the scale values correctly and you know the ranges of the integers beforehand.

Let's say we load them again but at a different point in time. X axis is 0x7F (max right), and Y axis is 0x33 (moderate up). f18 would have a value of...

- ps0 (X axis) 0x3FDFC00000000000

- ps1 (Y axis) 0x3FC9800000000000

Convert those to decimal and you get 0.49609375, & 0.19921875. If you are wondering why 0x7F didn't convert to 0.5 (max float) that's because 0x00 (0.0) is a possible value in the range and has to be accounted for. If any of the stick integer values were 0x80, that would result in -0.5.
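The stick values above can be reproduced with a few lines of Python, which is handy for checking your own numbers before testing in game. This is just a sketch with made-up helper names. It divides the signed byte by 2^8 and then shows the result in the 64-bit form that Dolphin displays.

```python
import struct

def stick_byte_to_float(raw, scale=8):
    """Treat the byte as signed, then divide by 2^scale (dequantization)."""
    signed = raw - 0x100 if raw >= 0x80 else raw
    return signed / (2 ** scale)

def as_double_bits(value):
    """64-bit pattern of a float, the way Dolphin shows a ps segment."""
    return int.from_bytes(struct.pack(">d", value), "big")

for raw in (0x00, 0xEF, 0x7F, 0x33, 0x80):
    value = stick_byte_to_float(raw)
    print(f"0x{raw:02X} -> {value} -> 0x{as_double_bits(value):016X}")
```

The 0x80 line shows the -0.5 minimum, while 0x7F stops just short of 0.5.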

If you were using something such as Halfword values where they had a range of 0x0000 thru 0xFFFF (65536 total possible values), you would use the scale formula of 2^16. 16 for 16 bits (halfword)

Chapter 9: Going back over normal Float Instructions in regards to ps1

When working in the past with normal float instructions, your only concern was what was in ps0. However it's important to note that a lot of normal float instructions (mostly single precision float instructions) affect ps1 as well. Having this knowledge can help you optimize a source by using normal float instructions along with your paired single instructions.

Load Floating Single (lfs)~

Whenever an lfs instruction is executed, the single precision float that is loaded into ps0, is also loaded into ps1. Thus ps0 and ps1 are now the same value.

Example: r31's value is a memory address that points to the word value of 0x3F800000; load that value as a single precision float into f0

Code:

`lfs f0, 0 (r31)`

After the instruction has executed, both ps0 and ps1 of the FPR are now 0x3FF0000000000000.
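Here is a small Python sketch of that lfs behavior, using the 64-bit form that Dolphin displays for each segment. The function names are made up for the illustration.

```python
import struct

def single_word_to_double_bits(word32):
    """Decode a 32-bit single from memory and show it as a 64-bit double pattern."""
    value = struct.unpack(">f", word32.to_bytes(4, "big"))[0]
    return int.from_bytes(struct.pack(">d", value), "big")

def lfs(word32):
    """Model of lfs: the loaded single ends up in both ps0 and ps1."""
    bits = single_word_to_double_bits(word32)
    return (bits, bits)

ps0, ps1 = lfs(0x3F800000)
print(f"ps0 = 0x{ps0:016X}, ps1 = 0x{ps1:016X}")
```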

---

Store floating single (stfs)~

This may be obvious, but it might as well be stated that the stfs instruction only stores ps0's value to memory.

---

Single Precision Floating Point Math~

ALL math based single precision floating point instructions will only use the ps0 values of the source registers for the calculation, but the destination register will have the result be placed in both ps0 and ps1.

---

Floating Reciprocal Estimate Single (fres; can only be used on single precision floats)~

fres will use the ps0 value of the source register for the calculation. Destination register will have the result placed in both ps0 & ps1.

---

Floating Round to Single (frsp; can only be used on double precision floats)~

Destination register will have the result placed in ps0 while ps1 is left undefined

---

Floating Move Register (fmr; can be used on both precisions)

When used on single precision, Source register ps0 is copied over to Destination register ps0. Destination register's ps1 is left UNCHANGED.

---

All other instructions (that can work on Single Precision ps0 values such as fabs, fnabs, fneg etc) will use Source Register's ps0 for the operation/input and the result/output will be placed in ps0 of the Destination Register while Destination Register's ps1 value is left undefined.

Shoutout to CLF78 for help with the contents in this Chapter!

Chapter 10: Dolphin Inaccuracies

Something you should know...

Dolphin actually displays the FPRs incorrectly. An FPR is actually just one 64 bit segment. The second segment you see on Dolphin doesn't actually exist on the Wii's hardware.

On real hardware, Single Precision floats take up just 32 bits of data in the FPR, thus these single precision floats are in their word form (like what you see in memory). Double Precision floats are still 64 bits of data, thus they take up the entire FPR.

For whatever reason, the Devs of Dolphin made the FPRs be displayed as two 64 bit segments and have any single precision floats be displayed in 64-bit form. Paired Singles are two single precision floats, and in Dolphin, the first float in the first 64 bit segment and the second float is in the other 64 bit segment.

What occurs on real hardware regarding Paired Singles is that the 64-bit FPR is split into two halves. The upper 32 bits are the first paired single while the lower 32 bits are the second.

This inaccuracy allows a coder to write source that exploits Dolphin's incorrect output. For example, using fmr with a double precision float leaves the destination register's ps1 value intact on Dolphin, whereas on real hardware that leftover value would have been erased/replaced.

It's nothing to worry about though as replicating this 'bug' can only be done with poorly handwritten assembly. Thus the quirky FPR usage by Dolphin won't cause any issues at all running your Wii games.

For more info about this 'bug', here's a small write up with a demo code + source - https://mkwii.com/showthread.php?tid=1886

Chapter 11: Real World Examples

Still confused by Paired Singles? Or not sure how some of these instructions can optimize one of your sources? It may be best to look at a real code that uses multiple different types of paired single instructions where using such instructions improves that source from just using plain jane floating point instructions.

Here is an MKWii code I have created that utilizes some paired single instructions - https://mkwii.com/showthread.php?tid=1260

Summary of the code:

- Purpose is to allow the user to spin the WiFi globe in the direction depending on their analog stick

- Analog Stick X and Y values reside in memory as a paired single and work in a value range of -1 to 1.

- Longitude and Latitude of the globe resides in memory as a paired single.

- Load the Longitude & Latitude and update its values depending on the Analog Stick X and Y current values

Here is another MKWii code (created by CLF78) that uses a quantized paired single load - https://mariokartwii.com/showthread.php?tid=1887

Chapter 12: Exercise

The following exercise will help you understand more about...

- Paired Single Loads & Stores (psq_l & psq_st)

- Paired Single Math Operations (ps_mul & ps_sub)

- Paired Single Math across Segments (ps_sum0)

A great way to get used to programming with Paired Singles is making a Speed-O-Meter from XYZ Coordinates. We will do this for the PAL MKWii game. We will pretend that there is no spot in Memory where the game keeps any kind of Speed measurement.

This type of Speed-O-Meter is better than the typical type because the typical type uses Engine Speed. The one we will make (XYZ type) will factor in events such as boost panels, conveyor belts, air movement, etc.

First we need to understand what XYZ coordinates are. XYZ coordinates are a measurement/value that can give you the location of an object in a 3 dimensional space.

In a two dimensional space (like a piece of graph paper), going left & right is the X coordinate. Going up and down is the Y coordinate.

Picture~

Two dimensions is simple enough. Now, we will cover a 3 dimensional space. X & Y are still the same. However we have a new coordinate for forward & backward motion.

Imagine you are holding that piece of graph paper in front of your face. Left & Right is X. Up & Down is Y. Going forward (through the paper) and back is the Z coordinate.

Another way to remember how to differentiate Y vs Z is that Y is always for elevation.

If we were to think of it like a Compass, X is West & East, and Z (**NOT** Y) is North & South.

XYZ coordinates (from my personal experience) are *ALWAYS* stored as single precision floats consecutively in memory for any Wii Game that uses them. Therefore, using Paired Singles makes perfect sense for this exercise.

Important NOTE: On the majority of XYZ coordinates used in Wii games, value increases vs decreases don't work how one would expect. For example, in the DBZ BT3 game, going up in elevation (Y coordinate), the float value decreases.

Luckily for MKWii, going up in Y increases the float. Which makes more sense from an intuition standpoint. If you are ever in need of finding the XYZ coordinates of a game, searching for the Y coordinate is the easiest route. This is because you have no idea how to search for X & Z because you have no reference point of what "north" is on your "compass".

This whole 'increase vs decrease for Y up vs down' situation isn't relevant for our Speed-O-Meter code, but I thought I should mention this since we are covering XYZ usage.

To keep this exercise from being too long, we are not going to 'find' the XYZ coordinates 'manually'. There is already a method to load them up based on the slot of the player.

When making a speed-o-meter, we can't use the metric of kmh or mph. The game doesn't understand kilometers or miles. We can, however, use a metric of "units" per frame. Units per Hour would yield too large of a number for practical use.

To calculate the change of XYZ coordinates (or speed) from one instance of time to another instance, we use the following formula (credits to JoshuaMK for supplying this when I needed it for DBZ BT3)

XYZ Speed = sqrt{[(x2 - x1)^2] + [(y2 - y1)^2] + [(z2 - z1)^2]}

x1 = X coordinate at time instance #1 (or frame #1)

y1 = Y coordinate at time instance #1 (or frame #1)

z1 = Z coordinate at time instance #1 (or frame #1)

x2 = X coordinate at time instance #2 (or frame #2)

y2 = Y coordinate at time instance #2 (or frame #2)

z2 = Z coordinate at time instance #2 (or frame #2)
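Before writing any assembly, it can help to see the formula on its own. Here is a plain Python version; the function name and the sample coordinates are made up for the illustration.

```python
import math

def xyz_speed(frame1, frame2):
    """Distance travelled between two frames, in units per frame."""
    x1, y1, z1 = frame1
    x2, y2, z2 = frame2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)

last_frame = (100.0, 50.0, 200.0)
this_frame = (103.0, 54.0, 212.0)

print(xyz_speed(last_frame, this_frame))  # 13.0
```

The differences are 3, 4, and 12, so the sum of the squares is 169 and the square root of that is 13.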

---

We will write out a C0 gecko code since we know that it executes exactly once per frame (aka 60 times a second). The following snippet of code can be used anywhere to load a specific slot/player's XYZ coordinates.

Code:

#Set desired slot value, 0 (P1; you) used for this source

li r11, 0

#Set r12 to Point to the XYZs

lis r12, 0x809C #Pal address specific, fyi

lwz r12, 0x18F8 (r12) #Pal address specific, fyi

cmpwi r12, 0 #Added in for C0 code, when not in a race/battle, no valid pointer will exist

beqlr- #If invalid, end C0 code

lwz r12, 0x0020 (r12)

rlwinm r11, r11, 2, 0, 29

lwzx r12, r12, r11

lwz r12, 0 (r12)

lwz r12, 0x8 (r12)

lwz r12, 0x90 (r12)

lwz r12, 0x4 (r12) #At this point r12 + 0x68 points directly to the XYZ coordinates

The above code is doing what is called 'pointer level' loading. It comes from Seeky's mkw-structures github repo. Specifically, the player.h page - https://github.com/SeekyCt/mkw-structure...r/player.h

The repo can help you pinpoint all sorts of data and how to reference that data later. Anyway, going back to the code at hand....

We need to write some code that will load up saved XYZ coordinates (last frame's aka frame 1) from a safe unused spot (like the EVA) and then store the current frame's XYZ overwriting the old ones present in the EVA. We will need 4 float registers simultaneously. 2 FPRs for the last frame's XYZ, and 2 FPRs for the current frame's.

The very first frame this occurs, the frame 1 XYZ's will be all null. Which is fine, since it's only 1 frame, you won't see this affect your speedometer in real time.

We will have the XYZs (that temp reside in EVA) be kept at the following addresses.

- 0x800007F0 = X

- 0x800007F4 = Y

- 0x800007F8 = Z

Let's write out that portion of the code..

Code:

#Load up current frame's XYZs into f10 & f11

psq_l f10, 0x68 (r12), 0, 0

lfs f11, 0x70 (r12)

#Set EVA Upper

lis r12, 0x8000

#Load last frame's XYZs from EVA

psq_l f12, 0x7F0 (r12), 0, 0

lfs f13, 0x7F8 (r12)

#Write in current frame's XYZs to update EVA

psq_st f10, 0x07F0 (r12), 0, 0

stfs f11, 0x07F8 (r12)

Alright, we got last frame's and current frame's XYZs in f10 thru f13. The EVA has been updated. Now we can do the formula...

Code:

#Do the formula: sqrt{[(x2 - x1)^2] + [(y2 - y1)^2] + [(z2 - z1)^2]}

#X2 = f10 ps0

#X1 = f12 ps0

#Y2 = f10 ps1

#Y1 = f12 ps1

#Z2 = f11 ps0

#Z1 = f13 ps0

#Perform X2 minus X1, and Y2 minus Y1

ps_sub f10, f10, f12

#Z2 minus Z1

fsubs f11, f11, f13

#Raise X and Y to power of 2

ps_mul f10, f10, f10

#Add X^2 + Y^2, result lands in ps0

ps_sum0 f10, f10, f10, f10

#Raise Z to the power of 2, then add Z^2 to (X^2 + Y^2)

fmadds f10, f11, f11, f10

#Get Square Root (estimate 1/sqrt, then take the reciprocal estimate of that)

frsqrte f10, f10

fres f10, f10
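If you want to trace what each register holds after every instruction, here is a Python walk-through of the same steps. It is only a model: FPRs that hold pairs are (ps0, ps1) tuples, the coordinates are made up, and math.sqrt is an exact square root, whereas frsqrte and fres are estimate instructions, so the value computed in game can differ slightly.

```python
import math

def ps_sub(fA, fB):
    return (fA[0] - fB[0], fA[1] - fB[1])

def ps_mul(fA, fC):
    return (fA[0] * fC[0], fA[1] * fC[1])

def ps_sum0(fA, fC, fB):
    return (fA[0] + fB[1], fC[1])

f10 = (103.0, 54.0)  # X2, Y2 (made-up current frame)
f11 = 212.0          # Z2
f12 = (100.0, 50.0)  # X1, Y1 (made-up last frame)
f13 = 200.0          # Z1

f10 = ps_sub(f10, f12)        # (3.0, 4.0)
f11 = f11 - f13               # fsubs: 12.0
f10 = ps_mul(f10, f10)        # (9.0, 16.0)
f10 = ps_sum0(f10, f10, f10)  # (25.0, 16.0)
total = f11 * f11 + f10[0]    # fmadds: 144.0 + 25.0 = 169.0
speed = math.sqrt(total)      # stands in for frsqrte + fres

print(f10, total, speed)
```

Only ps0 of f10 matters after the ps_sum0 step; the leftover 16.0 in ps1 is simply ignored by fmadds.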

We got our speed calculated, now we just need a way to display it in the race timer for us to see. First we need to convert the float result to an integer. We will store that to a different spot in the EVA so we don't write over the XYZs. We'll use the next available EVA address of 0x800007FC.

After completing the C0 code's source, we will write out a new code (C2) that is hooked at Joshua MK's Millisecond Modifier address. That C2 code will load the integer and use that value for the milliseconds of the race timer.

Let's finish off the rest of this C0 code now...

Code:

#Convert float to integer, no need for a fabs instruction, the result is always positive; use standard rounding in the conversion

fctiw f10, f10

#Store to EVA

li r11, 0x07FC

stfiwx f10, r12, r11

#End C0

#blr #uncomment if *NOT* using pyiiasmh, adjust compiled code accordingly to be a proper C0 code

Just to recap here is the final completed C0 code's source..

Code:

#Set desired slot value, 0 (P1; you) used for this source, adjust this to your needs

li r11, 0

#Set r12 to Point to the XYZs

lis r12, 0x809C #Pal address specific, fyi

lwz r12, 0x18F8 (r12) #Pal address specific, fyi

cmpwi r12, 0 #Added in for C0 code, when not in a race/battle, no valid pointer will exist

beqlr- #If invalid, end C0 code

lwz r12, 0x0020 (r12)

rlwinm r11, r11, 2, 0, 29

lwzx r12, r12, r11

lwz r12, 0 (r12)

lwz r12, 0x8 (r12)

lwz r12, 0x90 (r12)

lwz r12, 0x4 (r12) #At this point r12 + 0x68 points directly to the XYZ coordinates

#Load up current frame's XYZs into f10 & f11

psq_l f10, 0x68 (r12), 0, 0

lfs f11, 0x70 (r12)

#Set EVA Upper

lis r12, 0x8000

#Load last frame's XYZs from EVA

psq_l f12, 0x7F0 (r12), 0, 0

lfs f13, 0x7F8 (r12)

#Write in current frame's XYZs to update EVA

psq_st f10, 0x07F0 (r12), 0, 0

stfs f11, 0x07F8 (r12)

#Do the formula: sqrt{[(x2 - x1)^2] + [(y2 - y1)^2] + [(z2 - z1)^2]}

#X2 = f10 ps0

#X1 = f12 ps0

#Y2 = f10 ps1

#Y1 = f12 ps1

#Z2 = f11 ps0

#Z1 = f13 ps0

#Perform X2 minus X1, and Y2 minus Y1

ps_sub f10, f10, f12

#Z2 minus Z1

fsubs f11, f11, f13

#Raise X and Y to power of 2

ps_mul f10, f10, f10

#Add X^2 + Y^2, result lands in ps0

ps_sum0 f10, f10, f10, f10

#Raise Z to the power of 2, then add Z^2 to (X^2 + Y^2)

fmadds f10, f11, f11, f10

#Get Square Root (estimate 1/sqrt, then take the reciprocal estimate of that)

frsqrte f10, f10

fres f10, f10

#Convert float to integer, no need for a fabs instruction, the result is always positive; use standard rounding in the conversion

fctiw f10, f10

#Store to EVA

li r11, 0x07FC

stfiwx f10, r12, r11

#End C0

#blr #uncomment if *NOT* using pyiiasmh, adjust compiled code accordingly to be a proper C0 code

Moving onto the C2 code, we will load the integer from the EVA and replace the GPR (r28) with the new integer. This will change the output of the milliseconds on the timer.

Code:

#C2 Hook Address

#PAL = 807F84F8

#Set EVA Upper

lis r28, 0x8000

#Load integer from EVA, place in r28 to replace r28's original millisecond value

lwz r28, 0x7FC (r28)

And that's it for the C2 Code. Combining the two codes together, this is the final compiled code (PAL only)...

Code:

XYZ Speed-O-Meter from Scratch [Vega]

NOTE: Dolphin Only. Crashes on console. To fix: Use a C2 Hook Address that only occurs per frame in a race.

Works everywhere. Choose the slot you want to show the XYZ speed of.

X = Slot

PAL

C0000000 0000000F

3960000X 3D80809C

818C18F8 2C0C0000

4D820020 818C0020

556B103A 7D8C582E

818C0000 818C0008

818C0090 818C0004

E14C0068 C16C0070

3D808000 E18C07F0

C1AC07F8 F14C07F0

D16C07F8 114A6028

ED6B6828 114A02B2

114A5294 ED4B52FA

FD405034 ED405030

FD40501C 396007FC

7D4C5FAE 4E800020

C27F84F8 00000002

3F808000 839C07FC

60000000 00000000

Code created by: Vega

Credits: JoshuaMK (XYZ formula, Millisecond Modifier), Seeky (mkw-structures & player.h), Stebler (player.h)
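For filling in the X (slot) digit of the first code word, here is a tiny Python helper. It is just a sketch: the function name is made up, and the 0 thru 11 slot range assumes the usual maximum of 12 players in a room.

```python
def slot_word(slot):
    """Return the first code word (li r11, slot) for the chosen slot."""
    if not 0 <= slot <= 11:
        raise ValueError("slot is assumed to be 0 thru 11")
    return 0x39600000 | slot

print(f"{slot_word(0):08X}")   # 39600000 (P1; you)
print(f"{slot_word(11):08X}")  # 3960000B
```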

Success!! For anyone not familiar with MKW Speed-O-Meters, Funky Kong's max speed drifting under normal conditions is 84 units/frame.

Fyi, if you don't want to include the Y coordinate (elevation) for your Speed-O-Meter, use this formula instead...

sqrt{[(x2 - x1)^2] + [(z2 - z1)^2]}

Happy coding!