Extended Performance Monitor [stebler]
Posted by: stebler - 08-07-2021, 08:38 AM - Forum: Visual & Sound Effects
This displays the game's built-in performance monitor (from the EGG library) and adds one extra bar to it.
Overview of MKW's main loop:
1. Wait on the video interface to make sure we are at the beginning of a frame.
2. Call SceneManager::draw, which traverses the game engine's hierarchy and calls the draw method of every element. Those methods send commands to the GPU.
3. While the GPU is processing the commands, call SceneManager::calc, which traverses the hierarchy the same way and calls the calc method of every element to update its internal state for the next frame.
4. Wait for everything to complete.
What the bars measure:
Red: total frame time.
Green: SceneManager::draw.
Blue: time spent by the GPU.
Pink (custom bar): SceneManager::calc.
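To make the relationship between the loop steps and the bars concrete, here is a minimal Python model (not console code). The FakeClock and Bar classes, the run_frame function, and the exact points where the red bar starts and stops are assumptions made for illustration; the blue (GPU) bar is left out because GPU time is not modeled here.

```python
class FakeClock:
    """Hypothetical clock so the timings are deterministic."""
    def __init__(self):
        self.now = 0.0

    def advance(self, ms):
        self.now += ms


class Bar:
    """Hypothetical stand-in for one monitor bar: it records elapsed time."""
    def __init__(self, clock):
        self.clock = clock
        self.elapsed = 0.0

    def start(self):
        self._begin = self.clock.now

    def stop(self):
        self.elapsed = self.clock.now - self._begin


def run_frame(clock, draw_ms, calc_ms, wait_ms):
    red, green, pink = Bar(clock), Bar(clock), Bar(clock)
    red.start()                  # total frame time (assumed start point)
    green.start()
    clock.advance(draw_ms)       # stands in for SceneManager::draw
    green.stop()
    pink.start()
    clock.advance(calc_ms)       # stands in for SceneManager::calc
    pink.stop()
    clock.advance(wait_ms)       # wait for everything to complete
    red.stop()
    return {"red": red.elapsed, "green": green.elapsed, "pink": pink.elapsed}


print(run_frame(FakeClock(), 4.0, 6.0, 2.0))
```

The point of the sketch is only that the pink bar brackets calc the same way the green bar brackets draw, while red spans the whole frame.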
WIP decompilation of ProcessMeter: https://github.com/stblr/mkw/commit/868e...2806904e2d
Credits: chadderz, riidefi (documentation), Bean (previous performance monitor code).
PAL:
040092b8 38600180
c2238a7c 0000000a
3c80802a 60843d60
909e0154 3c80ff50
6084ffff 909e0168
3c804040 909e016c
3c803f80 909e0170
38800000 989e0174
387e0060 389e0158
3d80800a 618cef80
7d8903a6 4e800421
7fc3f378 00000000
c200960c 00000004
80750050 81830048
818c0038 7d8903a6
4e800421 81950000
60000000 00000000
c200962c 00000004
4e800421 80750050
81830048 818c0040
7d8903a6 4e800421
60000000 00000000
c200968c 00000004
4e800421 80750050
81830048 818c0044
7d8903a6 4e800421
60000000 00000000
c2009700 00000004
9bf5006b 80750050
38630154 81830000
818c0010 7d8903a6
4e800421 00000000
c2009764 00000004
80750050 38630154
81830000 818c0014
7d8903a6 4e800421
2c180000 00000000
c20097a4 00000004
4e800421 80750050
81830048 818c003c
7d8903a6 4e800421
60000000 00000000
04238f34 60000000
0423910c 48000120
NTSC-U:
04009278 38600180
c22386f8 0000000a
3c808029 6084f9f8
909e0154 3c80ff50
6084ffff 909e0168
3c804040 909e016c
3c803f80 909e0170
38800000 989e0174
387e0060 389e0158
3d80800a 618ceee0
7d8903a6 4e800421
7fc3f378 00000000
c20095cc 00000004
80750050 81830048
818c0038 7d8903a6
4e800421 81950000
60000000 00000000
c20095ec 00000004
4e800421 80750050
81830048 818c0040
7d8903a6 4e800421
60000000 00000000
c200964c 00000004
4e800421 80750050
81830048 818c0044
7d8903a6 4e800421
60000000 00000000
c20096c0 00000004
9bf5006b 80750050
38630154 81830000
818c0010 7d8903a6
4e800421 00000000
c2009724 00000004
80750050 38630154
81830000 818c0014
7d8903a6 4e800421
2c180000 00000000
c2009764 00000004
4e800421 80750050
81830048 818c003c
7d8903a6 4e800421
60000000 00000000
04238bb0 60000000
04238d88 48000120
NTSC-J:
04009214 38600180
c223899c 0000000a
3c80802a 60843700
909e0154 3c80ff50
6084ffff 909e0168
3c804040 909e016c
3c803f80 909e0170
38800000 989e0174
387e0060 389e0158
3d80800a 618ceea0
7d8903a6 4e800421
7fc3f378 00000000
c2009568 00000004
80750050 81830048
818c0038 7d8903a6
4e800421 81950000
60000000 00000000
c2009588 00000004
4e800421 80750050
81830048 818c0040
7d8903a6 4e800421
60000000 00000000
c20095e8 00000004
4e800421 80750050
81830048 818c0044
7d8903a6 4e800421
60000000 00000000
c200965c 00000004
9bf5006b 80750050
38630154 81830000
818c0010 7d8903a6
4e800421 00000000
c20096c0 00000004
80750050 38630154
81830000 818c0014
7d8903a6 4e800421
2c180000 00000000
c2009700 00000004
4e800421 80750050
81830048 818c003c
7d8903a6 4e800421
60000000 00000000
04238e54 60000000
0423902c 48000120
NTSC-K:
04009408 38600180
c2238df0 0000000a
3c808029 60841d60
909e0154 3c80ff50
6084ffff 909e0168
3c804040 909e016c
3c803f80 909e0170
38800000 989e0174
387e0060 389e0158
3d80800a 618cefe0
7d8903a6 4e800421
7fc3f378 00000000
c2009714 00000004
80750050 81830048
818c0038 7d8903a6
4e800421 81950000
60000000 00000000
c2009734 00000004
4e800421 80750050
81830048 818c0040
7d8903a6 4e800421
60000000 00000000
c2009794 00000004
4e800421 80750050
81830048 818c0044
7d8903a6 4e800421
60000000 00000000
c2009808 00000004
9bf5006b 80750050
38630154 81830000
818c0010 7d8903a6
4e800421 00000000
c200986c 00000004
80750050 38630154
81830000 818c0014
7d8903a6 4e800421
2c180000 00000000
c20098ac 00000004
4e800421 80750050
81830048 818c003c
7d8903a6 4e800421
60000000 00000000
042392a8 60000000
04239480 48000120
Source code:
Code: # replace at 800092b8 (PAL)
# increase the size of ProcessMeter to store an additional CpuMonitor
li r3, 0x180
Code: # inject at 80238a7c (PAL)
# set the vtable
.set CpuMonitor_vtable, 0x802a3d60
lis r4, CpuMonitor_vtable@h
ori r4, r4, CpuMonitor_vtable@l
stw r4, 0x154 + 0x0 (r30)
# set the color
.set color, 0xff50ffff # pink: rgba(255, 80, 255, 255)
lis r4, color@h
ori r4, r4, color@l
stw r4, 0x154 + 0x4 + 0x10 (r30)
# set the y position
.set posY, 0x40400000 # 3.0f
lis r4, posY@h
stw r4, 0x154 + 0x4 + 0x14 (r30)
# set the y dimension
.set dimY, 0x3f800000 # 1.0f
lis r4, dimY@h
stw r4, 0x154 + 0x4 + 0x18 (r30)
# set the flags
li r4, 0 # none
stb r4, 0x154 + 0x4 + 0x1c (r30)
# add it to the list
.set List_Append, 0x800aef80
addi r3, r30, 0x60 # the list
addi r4, r30, 0x154 + 0x4 # the bar
lis r12, List_Append@h
ori r12, r12, List_Append@l
mtctr r12
bctrl
mr r3, r30 # original instruction
Code: # inject at 8000960c (PAL)
# call ProcessMeter::measureBeginFrame after Display::beginFrame
lwz r3, 0x50 (r21)
lwz r12, 0x48 (r3)
lwz r12, 0x38 (r12)
mtctr r12
bctrl
lwz r12, 0x0 (r21) # original instruction
Code: # inject at 8000962c (PAL)
bctrl # original instruction
# call ProcessMeter::measureBeginRender after Display::beginRender and before SceneManager::draw
lwz r3, 0x50 (r21)
lwz r12, 0x48 (r3)
lwz r12, 0x40 (r12)
mtctr r12
bctrl
Code: # inject at 8000968c (PAL)
bctrl # original instruction
# call ProcessMeter::measureEndRender after ProcessMeter::draw and before Display::endRender
lwz r3, 0x50 (r21)
lwz r12, 0x48 (r3)
lwz r12, 0x44 (r12)
mtctr r12
bctrl
Code: # inject at 80009700 (PAL)
stb r31, 0x6b (r21) # original instruction
# start the timer of the custom CPU monitor before SceneManager::calc
lwz r3, 0x50 (r21)
addi r3, r3, 0x154
lwz r12, 0x0 (r3)
lwz r12, 0x10 (r12)
mtctr r12
bctrl
Code: # inject at 80009764 (PAL)
# stop the timer of the custom CPU monitor after SceneManager::calc
lwz r3, 0x50 (r21)
addi r3, r3, 0x154
lwz r12, 0x0 (r3)
lwz r12, 0x14 (r12)
mtctr r12
bctrl
cmpwi r24, 0x0 # original instruction
Code: # inject at 800097a4 (PAL)
bctrl # original instruction
# call ProcessMeter::endFrame after Display::endFrame
lwz r3, 0x50 (r21)
lwz r12, 0x48 (r3)
lwz r12, 0x3c (r12)
mtctr r12
bctrl
Code: # replace at 80238f34 (PAL)
nop # prevent the game from hiding the bars
Code: # replace at 8023910c (PAL)
b 0x120 # hide the ugly element on the left
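The "replace at" and "inject at" comments in the source correspond to the first line of each hex block. As a rough Python sketch (a model for reading the listing, not something that runs on the console), the function below decodes that first line for the two Gecko code types used here: 04 (32-bit write) and C2 (insert ASM). It assumes the default base address of 0x80000000; decode_gecko_line is a made-up helper name.

```python
BASE_ADDRESS = 0x80000000  # assumption: default Gecko base address


def decode_gecko_line(line):
    """Decode the first line of an 04 (32-bit write) or C2 (insert ASM) code."""
    first, second = (int(word, 16) for word in line.split())
    # The lowest bit of the type byte is part of the address offset.
    code_type = (first >> 24) & 0xFE
    address = BASE_ADDRESS + (first & 0x01FFFFFF)
    if code_type == 0x04:
        return ("write32", address, second)      # second = word to write
    if code_type == 0xC2:
        return ("insert_asm", address, second)   # second = number of 8-byte lines that follow
    raise ValueError("code type not handled in this sketch")


kind, address, value = decode_gecko_line("040092b8 38600180")
print(kind, hex(address), hex(value))   # write32 0x800092b8 0x38600180
kind, address, count = decode_gecko_line("c2238a7c 0000000a")
print(kind, hex(address), count)        # insert_asm 0x80238a7c 10
```

So "040092b8 38600180" is the "replace at 800092b8" with li r3, 0x180, and "c2238a7c 0000000a" is the "inject at 80238a7c" followed by ten lines of compiled instructions.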
Paired Singles
Posted by: Vega - 08-02-2021, 11:15 PM - Forum: PowerPC Assembly
Chapter 1: Intro
Paired Singles are probably the most strenuous aspect of PPC Assembly to understand.
First things first: you must already know the basics of PPC ASM, be able to make codes, and understand the fundamentals of Floating Points + FPRs. I have a great tutorial covering Floats which you can find HERE. Read that first before coming back here.
Chapter 2: Paired Singles in a FPR
With normal floating point instructions, you have only been working with the first 64-bit segment. This first segment which all normal floating point instructions work on is known as Paired Single 0 (or ps0 for short). The second segment is Paired Single 1 (ps1). Paired Single instructions utilize both ps0 and ps1 simultaneously, and the floats in ps0 & ps1 can only be of single precision. Hence the term Paired Singles (a pair of single precision floats). Using paired single instructions when a double precision float is present in an FPR will cause undefined behavior/results and could lead to an exception.
Chapter 3: Paired Singles in Memory
Since the values in ps0 and ps1 are single precision, each paired single (if stored/loaded to/from memory) is a word value.
Example: You have this value in f1 - C06C800000000000 4170000000000000
- ps0 = C06C800000000000
- ps1 = 4170000000000000
If this paired single value were stored to memory, it would be shown as 0xC36400004B800000
- 0xC3640000 is the ps0 value
- 0x4B800000 is the ps1 value
It's important to understand that paired singles are always two words in memory linked together as a double word.
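If you want to check the example above on your PC, here is a small Python sketch (a model only, not console code) that converts the two 64-bit register views into their 32-bit single precision words and joins them into the double word seen in memory. The function names are made up for this example.

```python
import struct


def double_bits_to_single_bits(bits64):
    """Reinterpret a 64-bit pattern as a double, then re-encode it as a single."""
    value = struct.unpack(">d", bits64.to_bytes(8, "big"))[0]
    return int.from_bytes(struct.pack(">f", value), "big")


def paired_single_in_memory(ps0_bits, ps1_bits):
    """ps0 word first, ps1 word second, together forming one double word."""
    return (double_bits_to_single_bits(ps0_bits) << 32) | double_bits_to_single_bits(ps1_bits)


memory_view = paired_single_in_memory(0xC06C800000000000, 0x4170000000000000)
print(f"0x{memory_view:016X}")  # 0xC36400004B800000
```

In decimal, ps0 is -228.0 and ps1 is 16777216.0; both fit exactly in single precision, so nothing is lost in the conversion.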
Chapter 4: Storing/Loading
Here are some basic instructions for loading and storing paired singles.
psq_l fD, SIMM (rA), W, I
The above example will load the paired single value located at rA+SIMM into fD. The single precision floating point value located at rA+SIMM will be loaded in ps0 and the single precision floating point value located at rA+SIMM+4 will be loaded into ps1.
psq_st fD, SIMM (rA), W, I
The above is a basic paired single storing instruction. Value of ps0 is stored to memory location rA+SIMM and the value of ps1 is stored at rA+SIMM+4.
SIMM is NOT a 16-bit signed range. For paired single loading/storing instructions it's actually a 12-bit signed range (0xFFFFF800 thru 0x000007FF). If this range isn't followed, your assembler will throw an error.
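As a quick illustration of that 12-bit signed range, here is a Python sketch of the check an assembler would have to make before encoding the displacement. encode_psq_displacement is a hypothetical helper, not part of any real assembler.

```python
def encode_psq_displacement(simm):
    """Return the 12-bit two's complement field for a psq_l/psq_st displacement."""
    if not -0x800 <= simm <= 0x7FF:
        raise ValueError("displacement does not fit in 12 signed bits")
    return simm & 0xFFF


print(hex(encode_psq_displacement(0x7FF)))   # 0x7ff (largest positive offset)
print(hex(encode_psq_displacement(-0x800)))  # 0x800 (most negative offset)
print(hex(encode_psq_displacement(-1)))      # 0xfff
```

Anything past 0x7FF (for example 0x800) raises the error, which mirrors the assembler refusing the instruction.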
Note: The format I have shown above for the two instructions is the format that all PPC assemblers use. For whatever reason, Dolphin (in Code view) uses a slightly different format.
What are the W and I values?
W can only be one of two values: 0 or 1. If W is 1, a load places the value from memory into ps0 and always sets ps1 to the floating point value of 1 (0x3FF0000000000000), regardless of what is in memory; a store writes only ps0 to memory. If W is 0, both ps0 and ps1 are loaded/stored normally.
The I value represents which Graphics Quantization Register (GQR) will be used. To keep things simple for now, for basic loading/storing of paired singles, simply set the I value to 0. This will be fine for most Wii games. More information covering the nitty gritty of GQRs will be discussed later in Chapter 8.
Chapter 5: Basic Math & Single Parameter Instructions
Like with normal floating point instructions, there are plenty of paired single instructions for all sorts of math.
ps_add fD, fA, fB
#fA ps0 + fB ps0 = fD ps0
#fA ps1 + fB ps1 = fD ps1
Other math instructions:
ps_sub (subtracting)
ps_mul (multiplying)
ps_div (dividing)
You also have some basic single parameter instructions...
ps_mr fD, fA
#fA ps0's value is copied to fD ps0
#fA ps1's value is copied to fD ps1
ps_abs fD, fA
#absolute value of fA ps0 is placed into fD ps0
#absolute value of fA ps1 is placed into fD ps1
ps_nabs fD, fA
#same as ps_abs but using negative absolute value instead
ps_neg fD, fA
#fA ps0 (if positive) is changed to its negative value (i.e. 1 to -1), result placed in fD ps0. If negative, then the value is changed to positive (i.e. -1 to 1). The same applies to fA ps1, with the result placed into fD ps1.
Chapter 6: Comparisons
ps_cmpo is the instruction for ordered paired single comparisons. There is also the ps_cmpu instruction for unordered comparisons.
The difference between ordered and unordered comparisons is how the FPSCR (floating point status control register) is modified when a NaN (not a number) is present in the comparison. Thus, in simple terms if you have no idea about what I just stated, stick with ordered comparisons only (ps_cmpo).
There are two types of ordered comparisons. One for ps0 and one for ps1
ps_cmpo0 crf, fD, fA
#fD ps0's value is compared to fA ps0's value
ps_cmpo1 crf, fD, fA
#fD ps1's value is compared to fA ps1's value
Just like with floating point instructions, you need to specify the condition register field. To keep it simple, use cr1.
Example comparison:
#Let's say f3 ps1 has the value of 1.0 in decimal form, and f0 ps1 has the value of 3.5 in decimal form.
Code: ps_cmpo1 cr1, f3, f0
bgt- cr1, some_label
In the above source, the branch would NOT be taken because 1.0 is less than 3.5.
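Here is a tiny Python model of that comparison (not console code). Each FPR is written as a (ps0, ps1) tuple, and the returned string stands for the bit that gets set in the condition register field; the "un" case covers a NaN being present. The function name simply mirrors the instruction.

```python
import math


def ps_cmpo1(a, b):
    """Compare the ps1 lanes of two (ps0, ps1) tuples."""
    x, y = a[1], b[1]
    if math.isnan(x) or math.isnan(y):
        return "un"   # unordered: at least one NaN
    if x < y:
        return "lt"
    if x > y:
        return "gt"
    return "eq"


f3 = (0.0, 1.0)   # ps1 = 1.0
f0 = (0.0, 3.5)   # ps1 = 3.5
print(ps_cmpo1(f3, f0))  # lt, so a bgt- on this field falls through
```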
Chapter 7: Other Instructions
Paired Singles has some nifty instructions to move values across the paired single segments. There are 4 different instructions you can use.
ps_merge00 fD, fA, fB
#fA ps0 is copied to fD ps0. fB ps0 is copied to fD ps1
ps_merge01 fD, fA, fB
#fA ps0 is copied to fD ps0. fB ps1 is copied to fD ps1
ps_merge10 fD, fA, fB
#fA ps1 is copied to fD ps0. fB ps0 is copied to fD ps1
ps_merge11 fD, fA, fB
#fA ps1 is copied to fD ps0. fB ps1 is copied to fD ps1
You can use ps_merge10 to swap the ps values within an FPR. This could come in handy for you.
Example (swap the ps values of f0)~
Code: ps_merge10 f0, f0, f0
Another handy trick with the ps_merge instructions is quick swapping of the ps0 values of two different FPRs. A beginner who needs to swap the ps0 values of f11 and f12 may write some code like this...
Code: fmr f13, f11 #Use f13 as a scratch register
fmr f11, f12 #Place f12 into f11
fmr f12, f13 #Place f11 saved value into f12
However with the ps_merge type instructions, you can get this done with just 2 instructions and withOUT a scratch register (note that the old ps1 values of f11 and f12 get overwritten)...
Code: ps_merge00 f12, f11, f12 #f12 ps0 = old f11 ps0, f12 ps1 = old f12 ps0
ps_merge11 f11, f12, f12 #f11 ps0 and ps1 = f12 ps1 (which is the old f12 ps0)
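To follow the two-instruction swap lane by lane, here is a small Python model (not console code). Each FPR is a (ps0, ps1) tuple and the function names mirror the instructions; swap_ps0 is a made-up wrapper for the two-instruction sequence above.

```python
def ps_merge00(a, b):
    return (a[0], b[0])


def ps_merge01(a, b):
    return (a[0], b[1])


def ps_merge10(a, b):
    return (a[1], b[0])


def ps_merge11(a, b):
    return (a[1], b[1])


def swap_ps0(f11, f12):
    f12 = ps_merge00(f11, f12)   # f12 = (old f11 ps0, old f12 ps0)
    f11 = ps_merge11(f12, f12)   # f11 = (old f12 ps0, old f12 ps0)
    return f11, f12


f11, f12 = swap_ps0((1.0, 10.0), (2.0, 20.0))
print(f11, f12)                              # (2.0, 2.0) (1.0, 2.0)
print(ps_merge10((3.0, 4.0), (3.0, 4.0)))    # (4.0, 3.0), the in-register swap
```

The ps0 values end up swapped, but the original ps1 values (10.0 and 20.0 here) are gone, so only use this trick when you don't care about ps1.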
---
Here are two instructions that allow you to do basic addition across the paired single segments.
ps_sum0 fD, fA, fC, fB
#fA ps0 + fB ps1 = fD ps0
#fC ps1 = fD ps1
ps_sum1 fD, fA, fC, fB
#fA ps0 + fB ps1 = fD ps1
#fC ps0 = fD ps0
Here's an example of adding ps0 and ps1 of an FPR together and having the result stored back in ps0 of the same FPR. We'll use f5 for the following example.
Code: ps_sum0 f5, f5, f5, f5
If for whatever reason you wanted the result in ps1 of f5, just use ps_sum1 instead.
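The operand order of ps_sum0 and ps_sum1 is easy to mix up, so here is a Python model of both (not console code). Each FPR is a (ps0, ps1) tuple and the arguments are given in the same order as in the instruction: fA, fC, fB.

```python
def ps_sum0(a, c, b):
    """fD ps0 = fA ps0 + fB ps1, fD ps1 = fC ps1."""
    return (a[0] + b[1], c[1])


def ps_sum1(a, c, b):
    """fD ps0 = fC ps0, fD ps1 = fA ps0 + fB ps1."""
    return (c[0], a[0] + b[1])


f5 = (1.5, 2.25)
print(ps_sum0(f5, f5, f5))  # (3.75, 2.25): sum lands in ps0, ps1 untouched
print(ps_sum1(f5, f5, f5))  # (1.5, 3.75): sum lands in ps1, ps0 untouched
```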
Chapter 8: Scaling, Quantization, and the GQRs
When a paired single load instruction of any type is executed, this is known as Dequantization (or Dequantizing). If it's a paired single store instruction of any type, it is known as Quantization (or Quantizing). Quantization/Dequantization is just a fancy term for taking integer values and putting them on a scale (or range) so they can be converted to or from a float value.
The use of Scaling allows you to have integer values (no larger than halfwords) in memory and utilize them as floating points (by a specified range) in the paired single segments of an FPR.
Revisiting Chapter 4 regarding the paired single loading/storing instructions, the I value stands for which Graphics Quantization Register (GQR) to use. There are 8 total GQRs. The I value can be any number from 0 to 7.
Each GQR is 32 bits of data. You can modify a GQR to affect how scaling is done during quantization and/or dequantization. Wii games will usually set all the GQR values during their early boot sequence. You can modify any GQR value, but be sure to restore its value at some point later in your source/code.
You can write a value to the GQRs via the mtspr instruction. Here are the SPR numbers for the GQRs:
- 912 = GQR0
- 913 = GQR1
- 914 = GQR2
- 915 = GQR3
- 916 = GQR4
- 917 = GQR5
- 918 = GQR6
- 919 = GQR7
To write a register value to the GQR...
mtspr XXX, rD
#XXX = GQR's spr number, rD contains the value to copy to the GQR
To backup the value of a GQR to a register...
mfspr rD, XXX
#rD is the register where the GQR's value will be copied to, XXX = GQR's spr number
Structure of GQR:
- Bits 0 & 1 = Unused
- Bits 2 thru 7 = Scale Value for Dequantization; 6 bit value (L_Scale)
- Bits 8 thru 12 = Unused
- Bits 13 thru 15 = Dequantization Type; 3 bit value (L_Type)
- Bits 16 & 17 = Unused
- Bits 18 thru 23 = Scale Value for Quantization; 6 bit value (R_Scale)
- Bits 24 thru 28 = Unused
- Bits 29 thru 31 = Quantization Type; 3 bit value (R_Type)
L_Type and R_Type can only be one of 5 different values.
- 0x0 = Float (no scaling)
- 0x4 = Unsigned Byte
- 0x5 = Unsigned Halfword
- 0x6 = Signed Byte
- 0x7 = Signed Halfword
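To see how the fields listed above land in the 32-bit register value, here is a Python sketch (a model only) that packs and unpacks a GQR. Bit 0 is the most significant bit, as in the structure list. pack_gqr and unpack_gqr are made-up helper names.

```python
VALID_TYPES = (0x0, 0x4, 0x5, 0x6, 0x7)


def pack_gqr(l_scale, l_type, r_scale, r_type):
    """Build a GQR value from its four fields."""
    for scale in (l_scale, r_scale):
        if not 0 <= scale <= 0x3F:
            raise ValueError("scale must fit in 6 bits")
    for type_ in (l_type, r_type):
        if type_ not in VALID_TYPES:
            raise ValueError("unknown quantization type")
    # Bits 2-7, 13-15, 18-23 and 29-31 counted from the most significant bit.
    return (l_scale << 24) | (l_type << 16) | (r_scale << 8) | r_type


def unpack_gqr(value):
    """Split a GQR value back into (l_scale, l_type, r_scale, r_type)."""
    return ((value >> 24) & 0x3F, (value >> 16) & 0x7,
            (value >> 8) & 0x3F, value & 0x7)


print(f"0x{pack_gqr(0x08, 0x6, 0x08, 0x6):08X}")  # 0x08060806
print(unpack_gqr(0x08060806))                     # (8, 6, 8, 6)
```

A scale of 8 with the signed byte type on both the load and store side packs to 0x08060806.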
L_Scale & R_Scale structure:
This is the value (anywhere from 0x00 to 0x3F) that represents the input of "scale" in the following formula:
2^scale = DQ (dequant/quant value)
During Dequantization:
Integer value (determined by L_Type) in memory is divided by DQ, that result is converted to a single precision float, and placed into its paired single segment of the FPR.
During Quantization:
Single precision float value of the paired single segment is multiplied by DQ, that result is converted to its integer value (determined by R_Type), and placed into memory.
Confused? Yeah I don't blame you. Let's go over an example use of scaling.
Let's say you are making a code for your game that will utilize the GCN C stick values. You find where these values are in memory. There's a byte value for the X axis and a byte value right next to it for the Y axis. You move the sticks around and have come to the conclusion of the following ranges...
X axis range:
0x80 (furthest left) thru 0x00 (neutral) thru 0x7F (furthest right)
Y axis range:
0x80 (furthest down) thru 0x00 (neutral) thru 0x7F (furthest up)
Fyi: You can test this on MKWii. Load up Dolphin and the Dolphin-memory-engine. Be sure only the Keyboard for GCN emulation is being used as port/player 1, and go to the byte addresses for your region.
- NTSC-U: 80343E84 = X axis, 80343E85 = Y axis
- PAL: 80348204 = X axis, 80348205 = Y axis
- NTSC-J: 80347B84 = X axis, 80347B85 = Y axis
- NTSC-K: 80336204 = X axis, 80336205 = Y axis
Both bytes are signed since the neutral position is 0. We want to load these two byte values and scale them to floating points so we can perform quicker and more complex math on them compared to using normal integer instructions.
First we need to set up GQR7. The L_Type and R_Type will be set to 0x6 (signed byte). Now for L_Scale and R_Scale, what value do we use? Well the range of 0x00 thru 0xFF is a total of 256 unique values. A byte is 8 bits. If you perform the Scale equation of 2^8, you get a total of 256, matching the 256 unique values. Therefore, L_Scale and R_Scale will be 0x08 since scale's value used in the formula of 2^scale was 8. Now let's write GQR7.
Code: lis r0, 0x0806 #Assuming r0 is safe for use in this example source
ori r0, r0, 0x0806
mtspr 919, r0 #Write new value to GQR7
GQR7 is set, let's load the two bytes into the paired segments of an FPR. We will use f18 as an example. Pretend the bytes are located at r9 + 0x0088.
Code: psq_l f18, 0x0088 (r9), 0, 7
Let's assume during the load, the X axis was a value of 0x00 (neutral) and the Y axis was 0xEF (slightly down). f18 would have a value of...
- ps0 (X axis) 0x0000000000000000
- ps1 (Y axis) 0xBFB1000000000000
Convert these to decimal and you get 0.0, & -0.06640625
Scaling uses a range of -0.5 to 0.5 for signed bytes/halfwords and a range of 0.0 to 1.0 for unsigned (logical) bytes/halfwords. The ranges will be correct in response to your integers as long as you set the scale values correctly and you know the ranges of the integers beforehand.
Let's say we load them again but at a different point in time. X axis is 0x7F (max right), and Y axis is 0x33 (moderate up). f18 would have a value of...
- ps0 (X axis) 0x3FDFC00000000000
- ps1 (Y axis) 0x3FC9800000000000
Convert those to decimal and you get 0.49609375, & 0.19921875. If you are wondering why 0x7F didn't convert to 0.5 (max float) that's because 0x00 (0.0) is a possible value in the range and has to be accounted for. If any of the stick integer values were 0x80, that would result in -0.5.
If you were using something such as Halfword values where they had a range of 0x0000 thru 0xFFFF (65536 total possible values), you would use the scale formula of 2^16. 16 for 16 bits (halfword)
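The stick values worked out above can be double checked on your PC with a short Python model of dequantization (not console code). It follows the "divide by 2^scale" rule from this chapter; dequantize and double_bits are made-up helper names, and the scale is treated as a plain non-negative number, as it is used in this tutorial.

```python
import struct


def dequantize(raw, bits, signed, scale):
    """Turn a raw integer from memory into the float a quantized load produces."""
    if signed and raw & (1 << (bits - 1)):
        raw -= 1 << bits            # apply two's complement sign
    return raw / (1 << scale)       # divide by DQ = 2^scale


def double_bits(value):
    """64-bit pattern of the value, in the same form as the register views above."""
    return struct.pack(">d", value).hex().upper()


for raw in (0x00, 0xEF, 0x7F, 0x33, 0x80):
    value = dequantize(raw, 8, True, 8)
    print(f"0x{raw:02X} -> {value} ({double_bits(value)})")
```

With a scale of 8 this gives -0.06640625 for 0xEF, 0.49609375 for 0x7F, 0.19921875 for 0x33 and -0.5 for 0x80, matching the values in this chapter. For signed halfwords with a scale of 16, dequantize(0x8000, 16, True, 16) also comes out as -0.5.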
Chapter 9: Dolphin Inaccuracies, HID2 PSE, Re-visit of Float Instructions
Something you should know...
As of May 2024, Dolphin displays the FPRs incorrectly. An FPR is actually just one 64 bit segment. ps0 is the first 32-bits, and ps1 is the second 32-bits. This has forced me to incorrectly teach Beginners about the basics of floats on purpose.
Now that you know the true width of an FPR, we need to revisit a Float basic that I have yet to mention. That is the PSE (Paired Single Enable) bit in the HID2 SPR. This bit obviously enables the Paired Single instructions, but it also changes the effects/operations of every single-float instruction, and many other float instructions.
Under normal circumstances, your Wii Game is running with this PSE bit high. As an fyi, you cannot just simply toggle this bit on/off at will. Whenever you want to change this bit, you need to invalidate the entire I-Cache, disable it, change the bit, and re-enable the I-Cache.
Anyway, when this PSE bit is low, Broadway will always read any FPR as a whole. Meaning when single-floats are placed into an FPR, they are changed to their 64-bit equivalent. This is important to understand.
Here is a list of operational descriptions of every Float instruction that's affected by this PSE bit. I didn't bother listing Paired Singles, because when PSE is low, they are simply illegal.
fadds, fdivs, fmuls, etc (any math based single-float instruction):
HID2 PSE low: Entire FPRs used. If Source Registers are double-precision, operation is done first, then fD is converted to 64-bit Single Precision. Result is always 64-bit Single Precision.
HID2 PSE high: ps0's of Source Registers used for operation. ps0 and ps1 of fD gets the 32-bit Single Precision result.
fmr:
HID2 PSE low: Entire FPR of fA copied to fD.
HID2 PSE high: ps0 of fA copied to ps0 of fD. ps1 of fD left UNCHANGED.
fres
HID2 PSE low: Entire FPRs used. If Source Register is double-precision, the operation is done first, then fD is changed to a 64-bit Single.
HID2 PSE high: ps0 of Source Register used for operation. ps0 and ps1 of fD gets the 32-bit Single Precision result.
frsqrte:
HID2 PSE low: Entire FPRs used. If Source Register was double precision, result REMAINS as double precision.
HID2 PSE high: ps0 of Source Register used for operation. ps0 of fD gets 32-bit Single result. ps1 of fD is left UNCHANGED.
frsp:
HID2 PSE low: Entire FPRs used. Result is always a 64-bit Single.
HID2 PSE high: ps0 of Source Register used for operation. ps0 of fD gets 32-bit Single result. ps1 of fD is left UNDEFINED (junk value).
lfs:
HID2 PSE low: 32-bit Single float changed to 64-bit width. Placed into FPR.
HID2 PSE high: 32-bit Single float placed into both ps0 and ps1 of FPR.
lfd:
HID2 PSE low: 64-bit width float placed into FPR.
HID2 PSE high: 64-bit width float changed to 32-bit single. That single float is placed into ps0 of FPR with ps1 being left UNDEFINED (junk)
stfs:
HID2 PSE low: 64-bit float changed to 32-bit Single Precision float, then stored to EA.
HID2 PSE high: ps0 of FPR stored to EA.
stfd:
HID2 PSE low: 64-bit float stored to EA.
HID2 PSE high: ps0 is changed to 64-bits, then stored to EA.
First fyi: From very limited personal testing, it appears most of the time, junk values result as 0x3F800000 (1.0). Just something to note.
Second fyi: fabs, fneg, and fnabs operate the same regardless of HID2 PSE. For fabs, bit 0 of the FPR is set low. For fnabs, bit 0 of the FPR is set high. And finally for fneg, bit 0 of the FPR is flipped.
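The sign bit behaviour in the second fyi can be shown with plain bit operations. This Python sketch (a model only) treats the FPR as a 64-bit pattern; "bit 0" in PowerPC numbering is the most significant bit. The function names just mirror the instructions.

```python
SIGN_BIT = 1 << 63  # "bit 0" of the FPR, counted from the most significant end


def fabs_bits(bits):
    return bits & ~SIGN_BIT   # sign bit set low


def fnabs_bits(bits):
    return bits | SIGN_BIT    # sign bit set high


def fneg_bits(bits):
    return bits ^ SIGN_BIT    # sign bit flipped


print(f"{fneg_bits(0x3FF0000000000000):016X}")   # BFF0000000000000 (1.0 becomes -1.0)
print(f"{fabs_bits(0xBFB1000000000000):016X}")   # 3FB1000000000000
print(f"{fnabs_bits(0x3FB1000000000000):016X}")  # BFB1000000000000
```

No other bits are touched, which is why these three behave the same whatever the PSE bit is set to.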
Keep in mind this is how Real Hardware does it. Dolphin is not accurate here. As you can see, I have basically thrown a curveball at everything you've learned in the past.
Shoutout to CLF78 for help with the contents in this Chapter!
Chapter 10: Real World Examples
Still confused by Paired Singles? Or not sure how some of these instructions can optimize one of your sources? It may be best to look at a real code that uses multiple different types of paired single instructions where using such instructions improves that source from just using plain jane floating point instructions.
Here is an MKWii code I have created that utilizes some paired single instructions - https://mkwii.com/showthread.php?tid=1260
Summary of the code:
- Purpose is to allow the user to spin the WiFi globe in a direction that depends on their analog stick
- Analog Stick X and Y values reside in memory as a paired single and work in a value range of -1 to 1.
- Longitude and Latitude of the globe resides in memory as a paired single.
- Load the Longitude & Latitude and update its values depending on the Analog Stick X and Y current values
Here is another MKWii code (created by CLF78) that uses a quantized paired single load - https://mariokartwii.com/showthread.php?tid=1887
Chapter 11: Exercise
The following exercise will help you understand more about...
- Paired Single Loads & Stores (psq_l & psq_st)
- Paired Single Math Operations (ps_mul & ps_sub)
- Paired Single Math across Segments (ps_sum0)
A great way to get used to programming with Paired Singles is making a Speed-O-Meter from XYZ Coordinates. We will do this for the PAL MKWii game. We will pretend that there is no spot in Memory where the game keeps any kind of Speed measurement.
This type of Speed-O-Meter is better than the typical type because the typical type uses Engine Speed. The one we will make (XYZ type) will factor in events such as boost panels, conveyor belts, air movement, etc.
First we need to understand what XYZ coordinates are. XYZ coordinates are a measurement/value that can give you the location of an object in a 3 dimensional space.
In a two dimensional space (like a piece of graph paper), going left & right is the X coordinate. Going up and down is the Y coordinate.
Two dimensions is simple enough. Now, we will cover a 3 dimensional space. X & Y are still the same. However we have a new coordinate for forward & backward motion.
Imagine you are holding that piece of graph paper in front of your face. Left & Right is X. Up & Down is Y. Going forward (through the paper) and back is the Z coordinate.
Another way to remember how to differentiate Y vs Z is that Y is always for elevation.
If we were to think of it like a Compass, X is West & East, and Z (**NOT** Y) is North & South.
XYZ coordinates (from my personal experience) are *ALWAYS* stored as single precision floats consecutively in memory for any Wii Game that uses them. Therefore, using Paired Singles makes perfect sense for this exercise.
Important NOTE: For the majority of XYZ coordinates used in Wii games, the direction in which the values increase vs decrease doesn't work how one would expect. For example, in the DBZ BT3 game, going up in elevation (Y coordinate) decreases the float value.
Luckily for MKWii, going up in Y increases the float. Which makes more sense from an intuition standpoint. If you are ever in need of finding the XYZ coordinates of a game, searching for the Y coordinate is the easiest route. This is because you have no idea how to search for X & Z because you have no reference point of what "north" is on your "compass".
This whole 'increase vs decrease for Y up vs down' situation isn't relevant for our Speed-O-Meter code, but I thought I should mention this since we are covering XYZ usage.
To keep this exercise from being too long, we are not going to 'find' the XYZ coordinates 'manually'. There is already a method to load them up based on the slot of the player.
When making a speed-o-meter, we can't use the metric of km/h or mph. The game doesn't understand kilometers or miles. We can, however, use a metric of "units" per frame. Units per hour would yield too large of a number for practical use.
To calculate the change of XYZ coordinates (or speed) from one instance of time to another instance, we use the following formula (credits to JoshuaMK for supplying this when I needed it for DBZ BT3)
XYZ Speed = sqrt{[(x2 - x1)^2] + [(y2 - y1)^2] + [(z2 - z1)^2]}
x1 = X coordinate at time instance #1 (or frame #1)
y1 = Y coordinate at time instance #1 (or frame #1)
z1 = Z coordinate at time instance #1 (or frame #1)
x2 = X coordinate at time instance #2 (or frame #2)
y2 = Y coordinate at time instance #2 (or frame #2)
z2 = Z coordinate at time instance #2 (or frame #2)
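For reference, here is the same formula as a few lines of Python (a PC-side sketch, not console code). xyz_speed is a made-up name; p1 is the (x1, y1, z1) position from the previous frame and p2 is the (x2, y2, z2) position from the current frame.

```python
import math


def xyz_speed(p1, p2):
    """Distance travelled between two frames, in game units per frame."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p1, p2)))


print(xyz_speed((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))  # 3.0
```

Moving 1 unit in X, 2 in Y and 2 in Z within one frame gives sqrt(1 + 4 + 4) = 3 units per frame.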
---
We will write out a C0 gecko code since we know that it executes exactly once per frame (aka 60 times a second). The following snippet of code can be used anywhere to load a specific slot/player's XYZ coordinates.
Code: #Set desired slot value, 0 (P1; you) used for this source
li r11, 0
#Set r12 to Point to the XYZs
lis r12, 0x809C #Pal address specific, fyi
lwz r12, 0x18F8 (r12) #Pal address specific, fyi
cmpwi r12, 0 #Added in for C0 code, when not in a race/battle, no valid pointer will exist
beqlr- #If invalid, end C0 code
lwz r12, 0x0020 (r12)
rlwinm r11, r11, 2, 0, 29
lwzx r12, r12, r11
lwz r12, 0 (r12)
lwz r12, 0x8 (r12)
lwz r12, 0x90 (r12)
lwz r12, 0x4 (r12) #At this point r12 + 0x68 points directly to the XYZ coordinates
The above code is doing what is called 'pointer level' loading. It comes from Seeky's mkw-structures github repo. Specifically, the player.h page - https://github.com/SeekyCt/mkw-structure...r/player.h
The repo can help you pinpoint all sorts of data and how to reference that data later. Anyway, going back to the code at hand....
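If the chain of lwz instructions is hard to follow, here is a Python model of the same pointer walk (not console code). The memory dictionary and every address inside it other than 0x809C18F8 are fake values invented for the example; read32 and xyz_address are made-up helper names.

```python
def read32(memory, address):
    """Stand-in for lwz: fetch the word stored at an address."""
    return memory[address]


def xyz_address(memory, slot):
    """Follow the PAL pointer chain and return the address of the X coordinate."""
    base = read32(memory, 0x809C18F8)          # lis + lwz 0x18F8
    if base == 0:
        return None                            # not in a race/battle (the beqlr- case)
    table = read32(memory, base + 0x20)        # lwz 0x0020
    pointer = read32(memory, table + slot * 4) # rlwinm + lwzx
    pointer = read32(memory, pointer + 0x0)
    pointer = read32(memory, pointer + 0x8)
    pointer = read32(memory, pointer + 0x90)
    pointer = read32(memory, pointer + 0x4)
    return pointer + 0x68                      # X here, Y at +4, Z at +8


# Fake memory contents, purely for illustration.
fake_memory = {
    0x809C18F8: 0x81000000,
    0x81000020: 0x81001000,
    0x81001004: 0x81002000,   # table entry for slot 1
    0x81002000: 0x81003000,
    0x81003008: 0x81004000,
    0x81004090: 0x81005000,
    0x81005004: 0x81006000,
}
print(hex(xyz_address(fake_memory, 1)))  # 0x81006068
```

Each read32 line corresponds to one lwz (or the lwzx) in the snippet above, so you can match them up one for one.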
We need to write some code that will load up saved XYZ coordinates (last frame's aka frame 1) from a safe unused spot (like the EVA) and then store the current frame's XYZ overwriting the old ones present in the EVA. We will need 4 float registers simultaneously. 2 FPRs for the last frame's XYZ, and 2 FPRs for the current frame's.
The very first frame this occurs, the frame 1 XYZs will be all null. Which is fine; since it's only 1 frame, you won't see this affect your speedometer in real time.
We will have the XYZs (that temporarily reside in EVA) be kept at the following addresses.
- 0x800007F0 = X
- 0x800007F4 = Y
- 0x800007F8 = Z
Let's write out that portion of the code..
Code: #Load up current frame's XYZs into f10 & f11
psq_l f10, 0x68 (r12), 0, 0
lfs f11, 0x70 (r12)
#Set EVA Upper
lis r12, 0x8000
#Load last frame's XYZs from EVA
psq_l f12, 0x7F0 (r12), 0, 0
lfs f13, 0x7F8 (r12)
#Write in current frame's XYZs to update EVA
psq_st f10, 0x07F0 (r12), 0, 0
stfs f11, 0x07F8 (r12)
Alright, we got last frame's and current frame's XYZs in f10 thru f13. The EVA has been updated. Now we can do the formula...
Code: #Do the formula: sqrt{[(x2 - x1)^2] + [(y2 - y1)^2] + [(z2 - z1)^2]}
#X2 = f10 ps0
#X1 = f12 ps0
#Y2 = f10 ps1
#Y1 = f12 ps1
#Z2 = f11 ps0
#Z1 = f13 ps0
#Perform X2 minus X1, and Y2 minus Y1
ps_sub f10, f10, f12
#Z2 minus Z1
fsubs f11, f11, f13
#Raise X and Y to power of 2
ps_mul f10, f10, f10
#Add X + Y
ps_sum0 f10, f10, f10, f10
#Raise Z to the power of 2, then add Z to (X+Y)
fmadds f10, f11, f11, f10
#Get Square Root: frsqrte estimates 1/sqrt, then fres estimates the reciprocal of that
frsqrte f10, f10
fres f10, f10
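If you want to sanity check the math outside the game, here is a Python sketch that follows the same steps as the assembly above. The function name and the sample coordinates are made up. One difference to keep in mind: Python's math.sqrt is exact, while frsqrte and fres are estimate instructions, so in-game values may differ slightly.

```python
import math

def speed_per_frame(prev, cur):
    """3D distance travelled between two frames, following the asm step by step."""
    x1, y1, z1 = prev
    x2, y2, z2 = cur
    dx, dy = x2 - x1, y2 - y1        # ps_sub: both halves of the pair at once
    dz = z2 - z1                     # fsubs
    dx2, dy2 = dx * dx, dy * dy      # ps_mul
    xy = dx2 + dy2                   # ps_sum0: ps0 + ps1
    total = dz * dz + xy             # fmadds
    return math.sqrt(total)          # exact sqrt; the asm uses estimate instructions

example = speed_per_frame((0.0, 0.0, 0.0), (3.0, 4.0, 12.0))
```

With those sample coordinates the squared terms are 9 + 16 + 144 = 169, so the result is 13 units for that frame.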
We got our speed calculated, now we just need a way to display it in the race timer for us to see. First we need to convert the float result to an integer. We will store that to a different spot in the EVA so we don't write over the XYZs. We'll use the next available EVA address of 0x800007FC.
After completing the C0 code's source, we will write out a new code (C2) that is hooked at Joshua MK's Millisecond Modifier address. That C2 code will load the integer and use that value for the milliseconds of the race timer.
Let's finish off the rest of this C0 code now...
Code:
#Convert float to integer, no need for a fabs instruction, the result is always positive; use standard rounding in the conversion
fctiw f10, f10
#Store to EVA
li r11, 0x07FC
stfiwx f10, r12, r11
#End C0
#blr #uncomment if *NOT* using pyiiasmh, adjust compiled code accordingly to be a proper C0 code
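For a feel of what ends up at 0x800007FC, here is a short Python sketch of the conversion. It assumes fctiw is running in round-to-nearest mode (the "standard rounding" mentioned above); the function name and the sample speed are made up.

```python
import struct

def to_eva_word(speed):
    """Round a non-negative speed to the nearest integer and pack it as a big-endian word."""
    value = round(speed)             # assumption: fctiw is in round-to-nearest mode
    return value, struct.pack(">i", value)

value, raw = to_eva_word(83.6)       # made-up speed
```

A speed of 83.6 becomes the integer 84, stored as the 32-bit word 0x00000054.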
Just to recap, here is the final completed C0 code's source...
Code:
#Set desired slot value, 0 (P1; you) used for this source, adjust this to your needs
li r11, 0
#Set r12 to Point to the XYZs
lis r12, 0x809C #Pal address specific, fyi
lwz r12, 0x18F8 (r12) #Pal address specific, fyi
cmpwi r12, 0 #Added in for C0 code, when not in a race/battle, no valid pointer will exist
beqlr- #If invalid, end C0 code
lwz r12, 0x0020 (r12)
rlwinm r11, r11, 2, 0, 29
lwzx r12, r12, r11
lwz r12, 0 (r12)
lwz r12, 0x8 (r12)
lwz r12, 0x90 (r12)
lwz r12, 0x4 (r12) #At this point r12 + 0x68 points directly to the XYZ coordinates
#Load up current frame's XYZs into f10 & f11
psq_l f10, 0x68 (r12), 0, 0
lfs f11, 0x70 (r12)
#Set EVA Upper
lis r12, 0x8000
#Load last frame's XYZs from EVA
psq_l f12, 0x7F0 (r12), 0, 0
lfs f13, 0x7F8 (r12)
#Write in current frame's XYZs to update EVA
psq_st f10, 0x07F0 (r12), 0, 0
stfs f11, 0x07F8 (r12)
#Do the formula: sqrt{[(x2 - x1)^2] + [(y2 - y1)^2] + [(z2 - z1)^2]}
#X2 = f10 ps0
#X1 = f12 ps0
#Y2 = f10 ps1
#Y1 = f12 ps1
#Z2 = f11 ps0
#Z1 = f13 ps0
#Perform X2 minus X1, and Y2 minus Y1
ps_sub f10, f10, f12
#Z2 minus Z1
fsubs f11, f11, f13
#Raise X and Y to power of 2
ps_mul f10, f10, f10
#Add X + Y
ps_sum0 f10, f10, f10, f10
#Raise Z to the power of 2, then add Z to (X+Y)
fmadds f10, f11, f11, f10
#Get Square Root: frsqrte estimates 1/sqrt, then fres estimates the reciprocal of that
frsqrte f10, f10
fres f10, f10
#Convert float to integer, no need for a fabs instruction, the result is always positive; use standard rounding in the conversion
fctiw f10, f10
#Store to EVA
li r11, 0x07FC
stfiwx f10, r12, r11
#End C0
#blr #uncomment if *NOT* using pyiiasmh, adjust compiled code accordingly to be a proper C0 code
Moving on to the C2 code, we will load the integer from the EVA and replace the value in r28 with it. This changes the milliseconds shown on the timer.
Code:
#C2 Hook Address
#PAL = 807F84F8
#Set EVA Upper
lis r28, 0x8000
#Load integer from EVA, place in r28 to replace r28's original millisecond value
lwz r28, 0x7FC (r28)
And that's it for the C2 Code. Combining the two codes together, this is the final compiled code (PAL only)...
Code:
XYZ Speed-O-Meter from Scratch [Vega]
NOTE: Dolphin Only. Crashes on console. To fix: Use a C2 Hook Address that only occurs per frame in a race.
Works everywhere. Choose the slot you want to show the XYZ speed of.
X = Slot
PAL
C0000000 0000000F
3960000X 3D80809C
818C18F8 2C0C0000
4D820020 818C0020
556B103A 7D8C582E
818C0000 818C0008
818C0090 818C0004
E14C0068 C16C0070
3D808000 E18C07F0
C1AC07F8 F14C07F0
D16C07F8 114A6028
ED6B6828 114A02B2
114A5294 ED4B52FA
FD405034 ED405030
FD40501C 396007FC
7D4C5FAE 4E800020
C27F84F8 00000002
3F808000 839C07FC
60000000 00000000
Code created by: Vega
Credits: JoshuaMK (XYZ formula, Millisecond Modifier), Seeky (mkw-structures & player.h), Stebler (player.h)
Success!! For anyone not familiar with MKW Speed-O-Meters, Funky Kong's max speed drifting under normal conditions is 84 units/frame.
Fyi, if you don't want to include the Y coordinate (elevation) for your Speed-O-Meter, use this formula instead...
sqrt{[(x2 - x1)^2] + [(z2 - z1)^2]}
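As a quick check of the XZ-only version, here is a Python sketch of that formula. The function name and the sample coordinates are made up for the example.

```python
import math

def speed_xz(prev, cur):
    """Horizontal speed only: ignore the Y (elevation) component."""
    dx = cur[0] - prev[0]
    dz = cur[2] - prev[2]
    return math.sqrt(dx * dx + dz * dz)

flat = speed_xz((0.0, 0.0, 0.0), (3.0, 50.0, 4.0))   # big Y change, ignored
```

Even with a Y change of 50 in the sample, the result is 5, since only X and Z count.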
Happy coding!