Broadway SPR Rules + Notes + Mini Guides + Misc Random Stuff
Posted by: Vega - 05-15-2022, 02:37 PM - Forum: Resources and References
- No Replies
I made this document when I needed a quick reference card for messing around with Broadway's special purpose registers. Might as well share it for anyone who needs it.
Sync and isync are synchronizing instructions. They play a huge role when modifying the Hardware Registers and certain other Special Purpose Registers.
Times when sync or isync are required~
1. sync is required BEFORE & AFTER clearing the L2E bit of the L2CR
2. isync is required BEFORE & AFTER modifying the ICE bit of HID0
3. isync is required BEFORE modifying ILOCK bit of HID0
4. sync is required BEFORE & AFTER modifying DCE bit of HID0**
5. sync is required BEFORE modifying DLOCK bit of HID0
6. isync is required AFTER any mtsr instruction when the segment register affects an Instruction EA
7. isync is required AFTER any mtsrin instruction when the segment register affects an Instruction EA
8. isync is required BEFORE & AFTER any mtsr instruction when the segment register affects a Data EA
9. isync is required BEFORE & AFTER any mtsrin instruction when the segment register affects a Data EA
10. sync is required BEFORE any modification to SDR1
11. isync is required AFTER any modification of SDR1
12. sync is required BEFORE modifying the POW bit of the MSR
13. isync is required AFTER modifying ANY of the following MSR bits....
POW, PR, ME, FP, SE, BE, IR, DR, RI, FE0, FE1
14. sync is required AFTER modifying the L2FM bit of HID4
15. sync is required AFTER reading the DMAQL bit(s) of HID2; you cannot write to these bits btw
16. sync is required AFTER writing the F bit high in the DMAL Register
17. isync is required AFTER writing to the IABR
18. sync is required AFTER writing to the DABR
19. isync is required AFTER modifying an IBAT Register***
20. isync is required BEFORE & AFTER modifying a DBAT Register****
21. isync is required BEFORE a tlbie instruction if the TLB entry affects Data
22. isync is required AFTER a tlbie instruction if the TLB entry affects Instructions
23. sync is required AFTER a tlbie instruction if the TLB entry affects Data
24. tlbsync is required AFTER a TLB invalidation
***You can have multiple IBAT Registers be modified consecutively, with a single isync at the end.
****You can have multiple DBAT Registers be modified consecutively, with a single isync at the beginning & end
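A few of the rules above can be kept in a small lookup table so the barriers are not forgotten. This is only a sketch in Python: the table name BARRIERS, the wrap helper, and the register choices r3 thru r6 are made up for illustration, and only a handful of the 24 rules are included. SPR numbers 528 thru 531 are IBAT0U, IBAT0L, IBAT1U and IBAT1L.

```python
# Barrier requirements copied from a few of the numbered rules above.
# Each entry is (barrier before, barrier after); None means no barrier is listed.
BARRIERS = {
    "HID0.ICE": ("isync", "isync"),   # rule 2
    "HID0.DCE": ("sync", "sync"),     # rule 4
    "SDR1":     ("sync", "isync"),    # rules 10 and 11
    "MSR.POW":  ("sync", "isync"),    # rules 12 and 13
    "IBAT":     (None, "isync"),      # rule 19
    "DBAT":     ("isync", "isync"),   # rule 20
}

def wrap(target, body):
    """Return the instruction list for `body` with the barriers for `target` added."""
    before, after = BARRIERS[target]
    lines = []
    if before:
        lines.append(before)
    lines.extend(body)
    if after:
        lines.append(after)
    return lines

# Several IBAT writes in a row only need the single trailing isync.
ibat_writes = ["mtspr 528, r3", "mtspr 529, r4", "mtspr 530, r5", "mtspr 531, r6"]
print("\n".join(wrap("IBAT", ibat_writes)))
```

The helper only builds text; it does not know which bit you are actually changing, so the rule numbers in the comments are still the thing to double check.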
References for above isync/sync rules~
1. Broadway User Manual page 319.
2. Broadway User Manual page 137 for the 'BEFORE'. Broadway User Manual page 319 for the 'AFTER'
3. Broadway User Manual pages 60 & 137.
4. Broadway User Manual page 136 for the 'BEFORE'. There is no reference for the 'AFTER'. It is not mentioned in the Broadway manual, but it's highly probable since you are changing how the Data Cache operates.
5. Broadway User Manual pages 60 & 136.
6. PowerPC Microprocessor Family: The Programming Environments, table 2-23 (page 2-43)
7. PowerPC Microprocessor Family: The Programming Environments, table 2-23 (page 2-43)
8. PowerPC Microprocessor Family: The Programming Environments, table 2-22 (page 2-41)
9. PowerPC Microprocessor Family: The Programming Environments, table 2-22 (page 2-41)
10. PowerPC Microprocessor Family: The Programming Environments, table 2-22 (page 2-42)
11. PowerPC Microprocessor Family: The Programming Environments, table 2-22 (page 2-42)
12. Broadway User Manual page 330
13. POW Bit - Broadway User Manual page 330
13. PR Bit - Broadway User Manual page 89
13. All other bits: PowerPC Microprocessor Family: The Programming Environments, tables 2-22 & 2-23 (pages 2-41 & 2-43)
14. Broadway User Manual Page 66
15. Broadway User Manual Page 66
16. Broadway User Manual Page 324
17. Broadway User Manual Page 177
18. No reference. High probability. Since an isync is required for the IABR, it would make sense for a sync to be required for the DABR.
19. PowerPC Microprocessor Family: The Programming Environments, table 2-23 (page 2-43)
20. PowerPC Microprocessor Family: The Programming Environments, table 2-22 (page 2-42)
21. PowerPC Microprocessor Family: The Programming Environments, table 2-22 (page 2-42)
22. PowerPC Microprocessor Family: The Programming Environments, table 2-23 (pages 2-43 & 2-44)
23. PowerPC Microprocessor Family: The Programming Environments, table 2-23 (pages 2-43 & 2-44)
24. Broadway User Manual Page 180
NOTE: From everything I have read so far on the Broadway Manual & other PPC manuals, there is never a circumstance that ever requires a back-to-back sync nor a circumstance that ever requires a back-to-back isync.
Other HID0 bit rules:
1. Never have SGE bit on whenever the Write Pipe is enabled (reference: Broadway User Manual page 326)
HID2 bit rules:
1. The entire Instruction Cache must be disabled then invalidated before modifying the LSQE, PSE, and/or LCE bits. (reference: Broadway User Manual page 64)
HID4 rules:
You cannot modify any HID4 L2CR related bits while L2CR is on!
Bit 0 of HID4 must always be written as high (1), it will always read as 1 (reference: Broadway User Manual page 65).
Once bits 3 & 4 are set, their value cannot be lowered (reference: Broadway User Manual page 66)
Bit 3 & 4 settings:
00 = bus max depth of 2
01 = depth of 3
10 = depth of 4
11 = reserved/unused
Thus if the setting is 10, it cannot be changed at all.
The L2FM field (bits 1 & 2) follows the same rule: once set, its value cannot be lowered (reference: Broadway User Manual page 66)
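The HID4 rules above can be written as a quick sanity check. This is a Python sketch, not anything from the manual: the names field and hid4_write_ok are made up, and it assumes the usual PowerPC numbering where bit 0 is the most significant bit of the 32-bit register.

```python
def field(value, first_bit, width):
    """Extract a field using PowerPC numbering, where bit 0 is the most significant bit."""
    shift = 32 - (first_bit + width)
    return (value >> shift) & ((1 << width) - 1)

def hid4_write_ok(old, new):
    """Check a proposed HID4 value against the rules listed above."""
    if field(new, 0, 1) != 1:                  # bit 0 must always be written as 1
        return False
    if field(new, 3, 2) == 0b11:               # 11 is reserved/unused
        return False
    if field(new, 3, 2) < field(old, 3, 2):    # bus depth cannot be lowered
        return False
    if field(new, 1, 2) < field(old, 1, 2):    # L2FM cannot be lowered
        return False
    return True

print(hid4_write_ok(0x80000000, 0x88000000))   # depth 00 -> 01
print(hid4_write_ok(0x90000000, 0x88000000))   # depth 10 -> 01
```

The first call raises the depth and passes; the second tries to lower it and is rejected.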
L2CR Mini Guides:
NOTE: Interrupts should always be masked (disabled) during any L2CR operations.
The following guides reference Section 9.1.3 (page 318) and Section 9.1.4 (page 319) of the Broadway User Manual.
Guide for L2 Cache Global Invalidation:
*Interrupts must be masked (disabled) throughout this entire process
*DPM bit of HID0 must be low throughout this entire process
1. Disable the L2CR by setting the L2E bit low
2. Initiate the Global Invalidation by setting the L2I bit high #Steps 1 and 2 MUST be done separately!
3. Run a Loop that constantly checks the L2IP bit. Once that bit is low, the Invalidation has been completed.
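The three steps above can be sketched against a toy register model. Everything here is an assumption made for illustration: the FakeL2CR class, the three-read delay before L2IP drops, and the bit positions (L2E as bit 0, L2I as bit 10, L2IP as bit 31, with bit 0 being the most significant bit). On real hardware this would be mfspr/mtspr with the sync rules from earlier in this post.

```python
L2E  = 1 << 31   # bit 0  (assumed position)
L2I  = 1 << 21   # bit 10 (assumed position)
L2IP = 1 << 0    # bit 31 (assumed position)

class FakeL2CR:
    """Toy stand-in for the L2CR: L2IP stays high for a few reads after L2I is set."""
    def __init__(self, value):
        self.value = value
        self.writes = []
        self._busy_reads = 0

    def write(self, value):
        # L2IP is treated as read-only, so a write never changes it directly.
        self.writes.append(value & ~L2IP)
        if value & L2I and not self.value & L2I:
            self._busy_reads = 3
        self.value = value & ~L2IP

    def read(self):
        if self._busy_reads > 0:
            self._busy_reads -= 1
            return self.value | L2IP
        return self.value

def global_invalidate(l2cr):
    l2cr.write(l2cr.read() & ~L2E)      # step 1: L2E low
    l2cr.write(l2cr.read() | L2I)       # step 2: L2I high, as a separate write
    polls = 0
    while l2cr.read() & L2IP:           # step 3: wait for L2IP to go low
        polls += 1
    return polls

reg = FakeL2CR(L2E)
polls = global_invalidate(reg)
print(polls, [hex(w) for w in reg.writes])
```

The write log shows two separate writes, which is the point of the note on step 2.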
Guide to Initialize L2CR:
*Interrupts must be masked (disabled) throughout this entire process
*DPM bit of HID0 must be low throughout this entire process
1. Globally Invalidate the L2 Cache (see above guide)
2. Disable L1 instruction cache of HID0
3. Turn on the L2CR by setting the L2E bit high
4. Restore L1 instruction cache of HID0
Guide to Configure L2CR:
*Interrupts must be masked (disabled) throughout this entire process
*DPM bit of HID0 must be low throughout this entire process
1. Turn on the L2CR (see above guide)
2. Disable L1 instruction cache of HID0
3. Set L2CR L2DO bit high #example bit, can be a diff config bit of your choice
4. Restore L1 instruction cache of HID0
Guide to Turn off L2CR:
*Interrupts must be masked (disabled) throughout this entire process
1. Simply set the L2E bit low
*If you plan on invalidating or re-enabling L2CR after this, DPM bit in HID0 must be low before you start the invalidation or before you re-enable.
Guide to go into Reduced Power Mode (doze, nap, sleep)
*Interrupts must be ON! for the entire guide
1. Set Desired Power Mode bit high on HID0.
2. Flip POW bit high in the MSR (don't forget your sync before and isync after!)
3. Broadway will enter new power mode in a few clock cycles
Reference: Chapter 10.2 of the Broadway User Manual (pages 327 thru 330)
Reduced Power Mode options:
Doze: Time Base & Decrementer still work
Nap: Time Base & Decrementer still work
Sleep: Time Base & Decrementer do NOT work
You can get out of Doze and Nap mode by setting the Decrementer to the desired value before going into said power mode. Once Decrementer goes below 0, the Decrementer exception will run. Be sure to write some custom code in the Decrementer Exception to get you back to full power mode and back to normal operations.
Miscellaneous fun fact~
While in supervisor mode (MSR: PR bit low) any mtspr instruction involving HID1 or PVR will execute as a nop (reference Broadway User Manual page 90).
Utilizing the Condition Register
Posted by: Vega - 05-15-2022, 01:07 PM - Forum: PowerPC Assembly
- Replies (2)
For Advanced ASM Coders
Chapter 1: Intro
Requirements:
- Understand the basics of using compare and branch instructions
- Understand binary/bits + Logical Operations
Are your codes ending up with "countless" branches and branch labels? Your codes are in need of some spring cleaning. Codes with an excessive number of branches can end up "unreadable", making it difficult for others to understand your code or help you debug any errors.
Chapter 2: Condition Register Fundamentals
When you execute any plain Jane comparison instruction such as 'cmpwi r0, 100', you are actually telling Broadway to run a check and then place the result of said check in Condition Register Field 0.
What is Condition Register Field 0? First things first. The register you see in Dolphin that is named "CR" is the Condition Register. It contains the results of previously executed Compare Instructions. Conditional Branch Instructions (i.e. beq) read the data of the Condition Register to determine whether or not a branch route/jump is taken.
The Condition Register contains 8 fields. Field 0 (cr0) thru Field 7 (cr7).
STUVWXYZ
S = cr0
T = cr1
U = cr2
etc etc..
Each field (crF) takes up one DIGIT (half-byte) in the CR. Thus, each crF contains 4 bits of data. You can specify which crF to place the result of the compare instruction in. By default, if no crF is specified in your compare instruction, then cr0 will be used.
For example, 'cmpwi r0, 100' is short for 'cmpwi cr0, r0, 100'.
If you wish to use cr7 instead of cr0, you would write the instruction as 'cmpwi cr7, r0, 100'.
An important thing that you must keep in mind is that if you make a comparison that is NOT using cr0, you must also specify the crF in the subsequent branch instruction.
Like this...
Code: cmpwi cr7, r0, 100
beq- cr7, store_data #Notice the specification of cr7 in the instruction
In conclusion, any crF that isn't cr0 must be specified in both compare and branch instructions.
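Since each crF is one hex digit of the CR, pulling a field out of a CR value you copied from Dolphin is a shift and a mask. A small Python sketch (the function name cr_field and the sample CR value are made up for illustration):

```python
def cr_field(cr, n):
    """Return the 4-bit field crN from a 32-bit CR value (cr0 is the leftmost hex digit)."""
    return (cr >> (28 - 4 * n)) & 0xF

cr = 0x20000008          # made-up CR value
print(cr_field(cr, 0))   # 2
print(cr_field(cr, 7))   # 8
```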
Chapter 3: Condition Register Field Bits and Examining Branch Instructions
Now that you know there are 8 crF's and how to use each one in your comparison + branch instructions, let's cover the crF bits and what each bit represents.
Each crF has 4 bits of data that use the following structure.
- bit 0 = Less-Than flag (LT)
- bit 1 = Greater-Than flag (GT)
- bit 2 = Equal flag (EQ)
- bit 3 = Summary Overflow flag (SO)
CR Bit Table
LT GT EQ SO crfX
0 1 2 3 crf0
4 5 6 7 crf1
8 9 10 11 crf2
12 13 14 15 crf3
16 17 18 19 crf4
20 21 22 23 crf5
24 25 26 27 crf6
28 29 30 31 crf7
Whenever a bit in the crF is high, the condition was true FROM THE MOST RECENT comparison instruction. Whenever a bit is low, the condition was false FROM THE MOST RECENT comparison.
Multiple bits can be flagged high and/or low from a comparison instruction. Now that you understand the crF bits, let's go over what branch instructions actually do.
Code: beq (branch if equal) = checks bit 2, if bit is high, branch is taken
bge (branch if greater than or equal) = checks bit 0, if bit is low, branch is taken
bgt (branch if greater than) = checks bit 1, if bit is high, branch is taken
ble (branch if less than or equal) = checks bit 1, if bit is low, branch is taken
blt (branch if less than) = checks bit 0, if it is high, branch is taken
bne (branch if not equal) = checks bit 2, if bit is low, branch is taken
bng (branch if not greater than) = equivalent to ble
bnl (branch if not less than) = equivalent to bge
bns (branch if not summary overflow) = checks bit 3, if bit is low, branch is taken
bso (branch if summary overflow) = checks bit 3, if bit is high, branch is taken
The branch instruction checks the bits of the crF that is specified in the instruction.
Example: 'bge- cr7' checks LT bit of cr7. If low, branch is taken.
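To see the bit checks in action, here is a small Python model of a signed compare followed by a branch decision. It is only a sketch: compare, taken and the BRANCHES table are illustrative names, the SO bit is simply passed in by hand, and beq is included alongside the mnemonics from the list.

```python
LT, GT, EQ, SO = 0, 1, 2, 3

def compare(a, b, so=0):
    """Model a signed compare: return the crF as [LT, GT, EQ, SO]."""
    return [int(a < b), int(a > b), int(a == b), so]

# mnemonic -> (bit checked, value that makes the branch taken)
BRANCHES = {
    "blt": (LT, 1), "bge": (LT, 0), "bnl": (LT, 0),
    "bgt": (GT, 1), "ble": (GT, 0), "bng": (GT, 0),
    "beq": (EQ, 1), "bne": (EQ, 0),
    "bso": (SO, 1), "bns": (SO, 0),
}

def taken(mnemonic, crf):
    """Return True if the branch would be taken for the given crF bits."""
    bit, wanted = BRANCHES[mnemonic]
    return crf[bit] == wanted

crf = compare(5, 100)
print(crf, taken("blt", crf), taken("bge", crf))
```

Notice that bge never looks at the GT or EQ bits. It only needs LT to be low.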
Chapter 4: Condition Register specific instructions
Before going into the CR specific instructions, we need to go over its 'format'. The 'format' of a typical CR instruction is this..
crXXX B, B, B #XXX = and, or, andc, orc, nor, xor, eqv
Under this format, you need to specify the exact bit of the entire Condition Register. The problem with this is that it now becomes a memory game and you have to refer to the earlier CR bit table provided in Chapter 3. Instead of doing that nonsense, you can use this handy formula...
B = 4*crX+ZZ
X = field number (0 thru 7)
ZZ = lt, gt, eq, or so
With this formula, all you need to remember is which Field you want to use and which bit type. So now the easier-to-remember 'format' is this..
crXXX 4*crX+ZZ, 4*crX+ZZ, 4*crX+ZZ
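The formula is plain arithmetic, so it is easy to check against the CR bit table from Chapter 3. A tiny Python sketch (cr_bit and FLAGS are made-up names):

```python
FLAGS = {"lt": 0, "gt": 1, "eq": 2, "so": 3}

def cr_bit(field, flag):
    """Evaluate B = 4*crX+ZZ for field number X and flag ZZ."""
    return 4 * field + FLAGS[flag]

print(cr_bit(7, "eq"))   # 30
print(cr_bit(0, "lt"))   # 0
```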
---
CR Based Instructions:
- Condition Register Logical OR~
cror crfD, crfA, crfB #crfA bit is logically OR'd with crfB bit. Result is written to crfD bit.
- Condition Register Logical AND~
crand crfD, crfA, crfB #crfA bit is logically AND'd with crfB bit. Result is written to crfD bit.
- Condition Register Logical NOR~
crnor crfD, crfA, crfB #crfA bit is logically NOR'd with crfB bit. Result is written to crfD bit.
- Condition Register Logical XOR~
crxor crfD, crfA, crfB #crfA bit is logically XOR'd with crfB bit. Result is written to crfD bit.
- Condition Register Logical EQV (XNOR)~
creqv crfD, crfA, crfB #crfA bit is logically XNOR'd with crfB bit. Result is written to crfD bit. Technically, the instruction does a XOR of crfA with crfB, then this temp result is complemented, then the result is written to crfD.
- Condition Register Logical AND with Complement~
crandc crfD, crfA, crfB #crfA bit is logically AND'd with the complemented crfB bit. Result is written to crfD bit.
- Condition Register Logical OR with Complement~
crorc crfD, crfA, crfB #crfA bit is logically OR'd with the complemented crfB bit. Result is written to crfD bit.
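If you want to try these out without a debugger, the instructions can be modelled on a plain 32-bit integer. This is a Python sketch with made-up helper names (get_bit, set_bit, cr_op); d, a and b are CR bit numbers 0 thru 31, with bit 0 being the leftmost bit.

```python
def get_bit(cr, b):
    """Read CR bit b, where bit 0 is the most significant bit."""
    return (cr >> (31 - b)) & 1

def set_bit(cr, b, v):
    """Return cr with bit b forced to v (0 or 1)."""
    mask = 1 << (31 - b)
    return (cr | mask) if v else (cr & ~mask)

OPS = {
    "cror":   lambda a, b: a | b,
    "crand":  lambda a, b: a & b,
    "crnor":  lambda a, b: 1 - (a | b),
    "crxor":  lambda a, b: a ^ b,
    "creqv":  lambda a, b: 1 - (a ^ b),
    "crandc": lambda a, b: a & (1 - b),
    "crorc":  lambda a, b: a | (1 - b),
}

def cr_op(name, cr, d, a, b):
    """Apply one CR logical instruction and return the new CR value."""
    return set_bit(cr, d, OPS[name](get_bit(cr, a), get_bit(cr, b)))

cr = set_bit(0, 2, 1)                       # cr0 EQ high, everything else low
print(hex(cr_op("crand", cr, 2, 2, 30)))    # cr0 EQ AND cr7 EQ -> cr0 EQ
```

Only the destination bit changes. The two source bits are left alone unless one of them is also the destination.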
Simplified Mnemonics:
- Setting a bit high (set cr0 EQ high)~
crset 4*cr0+eq #creqv 4*cr0+eq, 4*cr0+eq, 4*cr0+eq; crF bit is XNOR'd with itself and result written to same bit spot
- Setting a bit low (set cr0 EQ low)~
crclr 4*cr0+eq #crxor 4*cr0+eq, 4*cr0+eq, 4*cr0+eq; crF bit is XOR'd with itself and result written to same bit spot
- Copy-Pasting (Moving) a bit (copy cr0 EQ bit to cr7 EQ bit's spot)
crmove 4*cr7+eq, 4*cr0+eq #cror 4*cr7+eq, 4*cr0+eq, 4*cr0+eq; crF bit is OR'd with itself and result written to crfD
- Flip a bit (flip cr0 EQ bit and place result in cr7 EQ bit's spot)
crnot 4*cr7+eq, 4*cr0+eq #crnor 4*cr7+eq, 4*cr0+eq, 4*cr0+eq; crF bit is NOR'd with itself and result written to crfD
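The four simplified mnemonics are just shorthand for a base instruction with repeated operands. A short Python sketch that spells out the expansion as text (expand is a made-up helper, not an assembler feature):

```python
def expand(mnemonic, *bits):
    """Return the base CR instruction that a simplified mnemonic stands for."""
    if mnemonic == "crset":
        (d,) = bits
        return f"creqv {d}, {d}, {d}"
    if mnemonic == "crclr":
        (d,) = bits
        return f"crxor {d}, {d}, {d}"
    if mnemonic == "crmove":
        d, a = bits
        return f"cror {d}, {a}, {a}"
    if mnemonic == "crnot":
        d, a = bits
        return f"crnor {d}, {a}, {a}"
    raise ValueError(f"unknown mnemonic: {mnemonic}")

print(expand("crnot", "4*cr7+eq", "4*cr0+eq"))
```

For crmove and crnot the destination bit comes first and the source bit is repeated for both source operands.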
Also, the following instructions may be handy for you...
- mfcr rD #Contents of the CR are copied to rD
- mtcr rD #Contents of rD are copied to the CR
- mcrf crD, crA #Condition Field A is copied to Condition Field D
Chapter 5: Cleaning up some Code
Let's go over some basic examples of some "CR trickery" to help clean up code. Some examples below won't shorten the source at all (it will be the same compiled length), but the number of branches (plus label names) is reduced. This is accomplished by using multiple crF's and using Condition Register specific instructions.
Scenario 1:
If r4 = 1 and r10 = r31, then go to 'store_data'. Otherwise, go to 'dont_store'.
Typical Source
Code: cmpwi r4, 1
bne- dont_store
cmpw r10, r31
beq- store_data
New Source
Code: cmpwi r4, 1
cmpw cr7, r10, r31
crand 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
beq- store_data
Scenario 2:
If r4 = 1 or r10 = r31, then go to 'store_data'. Otherwise, go to 'dont_store'.
Typical Source
Code: cmpwi r4, 1
beq- store_data
cmpw r10, r31
bne- dont_store
New Source
Code: cmpwi r4, 1
cmpw cr7, r10, r31
cror 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
beq- store_data
Scenario 3:
If r4 = 1 and r10 =/= r31, then go to 'store_data'. Otherwise, end_code
Typical Source
Code: cmpwi r4, 1
bne- end_code
cmpw r10, r31
bne- store_data
New Source
Code: cmpwi r4, 1
cmpw cr7, r10, r31
crandc 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
beq- store_data
Scenario 4:
If r4 = 1 or r10 =/= r31, then go to 'store_data'.
Typical Source
Code: cmpwi r4, 1
beq- store_data
cmpw r10, r31
bne- store_data
New Source
Code: cmpwi r4, 1
cmpw cr7, r10, r31
crorc 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
beq- store_data
Scenario 5:
If r4 = 1 then r10 must =/= r31, or if r4 =/= 1 then r10 must = r31. If all requirements are met, go to 'store_data'. If not, go to end_code.
Typical Source
Code: cmpwi r4, 1
bne- make_sure_next_true
#r4 = 1, r10 must =/= r31
cmpw r10, r31
bne- store_data
b end_code
#r4 =/= 1, r10 must = r31
make_sure_next_true:
cmpw r10, r31
beq- store_data
New Source
Code: cmpwi r4, 1
cmpw cr7, r10, r31
crxor 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
beq- store_data
Scenario 6:
If r4 = 1, then r10 must = r31. However r4 can =/= 1 as long as r10 =/= r31. If all requirements are met go to 'store_data'. If not, go to end_code.
Typical source
Code: cmpwi r4, 1
bne- make_sure_next_false
#r4 = 1, r10 must = r31
cmpw r10, r31
beq- store_data
b end_code
#r4 =/= 1, r10 must =/= r31
make_sure_next_false:
cmpw r10, r31
bne- store_data
New source
Code: cmpwi r4, 1
cmpw cr7, r10, r31
creqv 4*cr0+eq, 4*cr0+eq, 4*cr7+eq
beq- store_data
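A quick way to convince yourself that each CR instruction matches its scenario is to run all four input combinations through both versions. In this Python sketch, a stands for "r4 = 1" and b stands for "r10 = r31" (both 0 or 1). The typical_N functions mirror the branch-based logic each scenario describes, and CR_LOGIC holds the bit logic of the CR instruction used in the new source. All names are made up for illustration.

```python
from itertools import product

def typical_1(a, b):
    return bool(b) if a else False       # r4 = 1 and r10 = r31

def typical_2(a, b):
    return True if a else bool(b)        # r4 = 1 or r10 = r31

def typical_3(a, b):
    return (not b) if a else False       # r4 = 1 and r10 =/= r31

def typical_4(a, b):
    return True if a else (not b)        # r4 = 1 or r10 =/= r31

def typical_5(a, b):
    return (not b) if a else bool(b)     # exactly one of the two is true

def typical_6(a, b):
    return bool(b) if a else (not b)     # both true or both false

CR_LOGIC = [
    ("crand",  typical_1, lambda a, b: a & b),
    ("cror",   typical_2, lambda a, b: a | b),
    ("crandc", typical_3, lambda a, b: a & (1 - b)),
    ("crorc",  typical_4, lambda a, b: a | (1 - b)),
    ("crxor",  typical_5, lambda a, b: a ^ b),
    ("creqv",  typical_6, lambda a, b: 1 - (a ^ b)),
]

def all_match():
    """Return True if every CR instruction agrees with its branch-based version."""
    return all(bool(cr(a, b)) == typical(a, b)
               for _, typical, cr in CR_LOGIC
               for a, b in product((0, 1), repeat=2))

print(all_match())
```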
Chapter 6: Final Example
Let's say you have a value in r3 and it must be a valid Memory Address. Meaning a valid mem80, mem81, or mem9 address. If the address is not valid in any way, branch to the LR. An efficient way to write it would be like this (pretend r4 thru r7 are safe)...
Code: lis r4, 0x8000 #0x80000000
lis r5, 0x817F #0x817FFFFF
ori r5, r5, 0xFFFF
addis r6, r4, 0x1000 #0x90000000
addis r7, r5, 0x1280 #0x93FFFFFF
cmplw r3, r4
cmplw cr5, r3, r5
cmplw cr6, r3, r6
cmplw cr7, r3, r7
cror 4*cr0+eq, 4*cr0+lt, 4*cr7+gt #Check if less than 0x80000000 ***or*** greater than 0x93FFFFFF; place result in cr0
crand 4*cr5+eq, 4*cr5+gt, 4*cr6+lt #Now check if it's in between 0x817FFFFF ***and*** 0x90000000; place result in cr5
cror 4*cr0+eq, 4*cr0+eq, 4*cr5+eq #If *any* of the two above conditions (cr0 and cr5) were true, branch to LR
beqlr-
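The same check can be walked through in Python to test a few addresses by hand. This is only a model of the source above: is_valid is a made-up name, each local variable stands for one CR bit, and unsigned compares are assumed because the source uses cmplw.

```python
def is_valid(addr):
    """Mirror the four cmplw results and three CR operations from the source above."""
    lo1, hi1 = 0x80000000, 0x817FFFFF
    lo2, hi2 = 0x90000000, 0x93FFFFFF
    cr0_lt = addr < lo1          # cmplw r3, r4
    cr5_gt = addr > hi1          # cmplw cr5, r3, r5
    cr6_lt = addr < lo2          # cmplw cr6, r3, r6
    cr7_gt = addr > hi2          # cmplw cr7, r3, r7
    cr0_eq = cr0_lt or cr7_gt    # first cror: outside the whole span
    cr5_eq = cr5_gt and cr6_lt   # crand: in the gap between the two regions
    cr0_eq = cr0_eq or cr5_eq    # second cror
    return not cr0_eq            # beqlr- returns early when cr0 EQ is set

for addr in (0x80001500, 0x81800000, 0x90000000, 0x94000000):
    print(hex(addr), is_valid(addr))
```

Note that the check says nothing about alignment. An address like 0x80001501 passes even though a word load from it would be unaligned.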
And that's pretty much it. Happy coding!
All About Cache
Posted by: Vega - 05-15-2022, 12:59 PM - Forum: PowerPC Assembly
- No Replies
This PowerPC tutorial will teach you the ins and outs of the Cache model of the Wii Broadway Chip, its instruction set, and how some of these instructions may need to be used for Gecko ASM Codes. This is a lengthy read, but every PPC Coder/dev should have a decent understanding of Broadway's Cache model.
Chapter 1: Understanding some Basics about Memory
There are two types of memory, Virtual & Physical. When Broadway executes in Virtual Memory, this is called Virtual Mode. When Broadway executes in Physical Memory, this is called Real Mode.
Virtual Memory is split into two categories:
- Virtual Cached Memory
- Virtual Uncached Memory
Virtual Cached Memory is your typical 'normal' memory that you are familiar with (i.e. 0x80000000 thru 0x817FFFFF & 0x90000000 thru 0x93FFFFFF).
Virtual Cached Memory is a representation of Physical Memory but it includes any cached content. Cached content may be 'old' or may be 'too new'. Therefore, what you see in Virtual Cached Memory may not be what is actually present in Physical Memory. Virtual Uncached Memory is a simple representation (copy) of Physical Memory.
Virtual Memory has to be split into Cached & Uncached so software always has the option to bypass cache.
Wii games won't run entirely in Real Mode due to lack of 'security'.
In Real Mode, all of memory has the same properties, and those properties cannot be adjusted from the Broadway default settings. With Virtual Mode, you can set different regions of memory to have a variety of different properties, and adjust said properties whenever you want.
Here's a list of Physical, Virtual Cached, and Virtual Uncached memory ranges for most Wii games.
- 0x00000000 thru 0x017FFFFF Physical Mem1
- 0x10000000 thru 0x13FFFFFF Physical Mem2
- 0x80000000 thru 0x817FFFFF Virtual Cached Mem1 (known as mem80 and mem81)
- 0x90000000 thru 0x93FFFFFF Virtual Cached Mem2 (known as mem9)
- 0xC0000000 thru 0xC17FFFFF Virtual Uncached Mem1
- 0xD0000000 thru 0xD3FFFFFF Virtual Uncached Mem2
The list doesn't include everything (like Hardware Memory), just the most common stuff that's relevant to Gecko ASM Codes.
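For the ranges in this list, going between the three views of the same memory is just a matter of swapping the top two address bits. A Python sketch (function names are made up, and it only applies to the Mem1/Mem2 ranges listed above, not to Hardware Memory or any custom BAT setup):

```python
def to_physical(addr):
    """Strip the top two address bits: 0x8/0x9/0xC/0xD prefixes map onto 0x0/0x1."""
    return addr & 0x3FFFFFFF

def to_cached(addr):
    """Return the Virtual Cached view (0x8.../0x9...) of an address."""
    return to_physical(addr) | 0x80000000

def to_uncached(addr):
    """Return the Virtual Uncached view (0xC.../0xD...) of an address."""
    return to_physical(addr) | 0xC0000000

print(hex(to_physical(0x80001500)))   # 0x1500
print(hex(to_uncached(0x90000000)))   # 0xd0000000
```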
Chapter 2: Structure of Cache Organization
There are two different cache systems in Broadway. L1 (Level 1) and L2 (Level 2). The L2 cache operates in a similar fashion but is larger. There's no need to deep dive into the intricacies of the L2 cache. The L1 cache will be the only cache unit covered in this thread. The L1 Cache is split into two categories:
- Data Cache
- Instruction Cache
Instruction Cache is for anything that contains executable instructions, simple enough. Data Cache is for any data that are part of any load/store mechanism. Executable instructions can also be included in the Data Cache. For example, if you write (i.e. store) a new instruction to memory, it will be utilized by both the Instruction and Data cache.
Here's the layout of a Data Cache set/page (each row is known as a 'way')
Way0 | 32-byte Aligned Physical Address | StateBits | 8 Words
Way1 | 32-byte Aligned Physical Address | StateBits | 8 Words
Way2 | 32-byte Aligned Physical Address | StateBits | 8 Words
.. ..
Way6 | 32-byte Aligned Physical Address | StateBits | 8 Words
Way7 | 32-byte Aligned Physical Address | StateBits | 8 Words
The Instruction Cache implements the same layout, but it uses a single "Valid" Bit in place of the State Bits. Each Way will contain a 32-byte aligned physical address. Even though the address is physical, it is always translated to its Virtual Address for usage. "8 words" means the 8 words of data/instructions that are at the 32-byte aligned address. 8 words = 32 byte block. This block is known as a Cache Block. Since every address has to be 32-byte aligned, this means nothing smaller than a 32-byte aligned block of memory can have unique State/Valid Bits.
8 ways (Way0 thru 7) make up one 'Set'. There are a total of 128 Sets for both the Instruction and Data Cache. Both Caches are 32KB in size (32 bytes x 8 ways x 128 = 32,768 bytes = 32KB).
Since every cache block is 32-byte aligned, this means that if you make a modification to the cache of, let's say, address 0x80001504, the cache for the words at addresses 0x80001500 thru 0x8000151C will all be affected simultaneously.
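The block math is easy to check with a few lines of Python. This is just arithmetic on the numbers given in this chapter; block_base and block_words are made-up helper names.

```python
BLOCK_SIZE = 32
WAYS = 8
SETS = 128

def block_base(addr):
    """Round an address down to the start of its 32-byte cache block."""
    return addr & ~(BLOCK_SIZE - 1)

def block_words(addr):
    """List the addresses of the 8 words that share a cache block with addr."""
    base = block_base(addr)
    return [base + 4 * i for i in range(BLOCK_SIZE // 4)]

print(hex(block_base(0x80001504)))                     # 0x80001500
print([hex(a) for a in block_words(0x80001504)][-1])   # 0x8000151c
print(BLOCK_SIZE * WAYS * SETS)                        # 32768
```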
Chapter 3: Cache Hits and Misses
It's crucial to understand that the Data Cache can only have new content added to it by store instructions. This includes any typical store instruction, but it also includes the dcbi, dcbz and dcbz_l instructions (these are treated as store instructions by Broadway). Content in the Data Cache is managed by a pseudo least-recently-used algorithm (aka PLRU).
The Instruction Cache gets content added to it by Broadway's Instruction Fetching mechanism only. It is impossible to control the Fetching mechanism directly. Therefore we cannot, at will, add in new content to the Instruction Cache. Just like the Data Cache, content in the Instruction Cache has its own PLRU.
The inner workings of the PLRU are not a concern for us Gecko Code creators. However, we do need to cover Cache Hits and Misses. Over time, the PLRU will fill instructions/data in the cache and later remove them so new data can use the Cache. The filling of the cache by the PLRU is usually referred to as 'pushing a block(s) onto the Cache'. We cannot change how the PLRU itself functions, but there are specific instructions we can use to forcefully edit Cache Blocks or push new blocks onto the Cache. This is covered in Chapter 5.
Whenever instructions/data is processed by Broadway, Broadway will check the L1 Cache (then the L2 Cache) to see if the specific memory address is in the Cache. If the address is present, this is known as a Cache Hit. If not, this is known as a Cache Miss. Cache misses severely degrade performance.
Chapter 4: State and Valid Bits
Each cache block (with its 32-byte aligned physical address) in the Data Cache will have one of the following state bits with it.
State bits--
- M = Modified
- E = Exclusive
- I = Invalid
Modified = Present in Virtual Cached Memory but not yet present in Physical Memory; will be written to physical memory sooner or later. When new blocks are placed into the Cache by the PLRU, they are tagged with the M bit. Please note that PPC Manuals will sometimes refer to a Data Cache block as "dirty" if it's tagged with the Modified (M) bit.
Exclusive = What's in Virtual Cached Memory is what's in Physical Memory. Please note that PPC Manuals will sometimes refer to a Data Cache block as "clean" if it's tagged with the Exclusive (E) bit.
Invalid = Old data that is now invalid, you can freely erase/modify this block w/o affecting anything. When the PLRU updates Data Cache, only blocks that are tagged with the I bit qualify to be removed from the Cache.
Each physical address in the instruction cache has a valid bit associated with it:
Valid = next time this address is used by an instruction, the value here is what will be used
Invalid = old data that is now invalid, will not be used, can be tossed whenever. When the PLRU updates Instruction Cache, only blocks that are tagged with the I bit qualify to be removed from the Cache.
Chapter 5: List of Cache Instructions
Broadway comes with the following cache instructions~
dcbf rD, rA = Data Cache Block Flush
dcbi rD, rA = Data Cache Block Invalidate
dcbst rD, rA = Data Cache Block Store
dcbt rD, rA = Data Cache Block Touch
dcbtst rD, rA = Data Cache Block Touch for Store
dcbz rD, rA = Data Cache Block Zero
dcbz_l rD, rA = Data Cache Block Zero then Lock
icbi rD, rA = Instruction Cache Block Invalidate
rD + rA = The address (aka Effective Address aka EA)
Note that in all instructions, if rD = r0, it will be treated as literal zero.
- dcbf For cache hits, if the block has an M bit, the data in the block is written to physical memory and an I bit replaces the M bit. If the block has an E bit, the bit is simply changed to I. For cache misses, no action is taken. Therefore you can use dcbf as a way to "erase" the Cache while making sure memory gets updated before the Cache is erased. For "erasing" Cache without updating memory, refer to dcbi.
- dcbst For cache hits, if the block has an E bit or I bit, no action is taken. If the block has an M bit, the data in the block is written to physical memory and the bit is changed to E. For cache misses, no action is taken. Pro-Tip: If you are familiar with BAT Registers and the region of memory is in a BAT that is marked as Write-Through (W bit high), you will never need the dcbst instruction for that region of memory, but performance of Broadway will be degraded.
- dcbi For cache hits, the state bit is always changed to I, regardless of what it was before. If the state bit was M, data that was going to be written to physical memory is now discarded. For cache misses, no action is taken. Therefore, dcbi can be used to "erase" the Cache and prevent the Cache from updating physical memory.
- dcbt This is used to give the Cache system a hint that an upcoming Load instruction needs to have its 32-byte aligned Address pushed onto the cache. Thus, this is only useful if you know the Load instruction will end up as a Cache Miss. For cache hits, no action is taken. Improper usage of this instruction (too many Cache hits) will degrade performance.
- dcbtst This is used to give the Cache system a hint that an upcoming Store instruction needs to have its 32-byte aligned Address pushed onto the cache. Thus, this is only useful if you know the Store instruction will end up as a Cache Miss. For cache hits, no action is taken. Improper usage of this instruction (too many Cache hits) will degrade performance.
- dcbz For cache hits, the contents of the Block (virtual memory) are zero'd, and state bit changed to M. For cache misses, a new block using the address referenced by the dcbz is pushed onto the Cache, then the Cache Block is zero'd (regardless of what is present in Physical Memory) & tagged with the M bit.
- dcbz_l does the same as above, but will then lock the cache where it can't be modified. This instruction is only legal when the Locked Cache (via HID2) is enabled, otherwise an exception will occur.
- icbi is the only instruction you have available to modify the instruction cache. For cache hits, the block is set to Invalid. For cache misses, no action is taken.
As mentioned in Chapter 3, dcbi, dcbz and dcbz_l are treated as store instructions. All other cache-related instructions are treated as load instructions. In conclusion, there are no cache-related instructions to force any updates (add new Blocks) to the Instruction Cache.
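The state changes described in this chapter can be played with in a tiny Python model. It is a sketch of the cache-hit cases only, written from the descriptions above: the Block class and the function names are made up, the memory argument is a 32-byte bytearray standing in for physical memory, and cache misses, locking (dcbz_l) and the PLRU are not modelled.

```python
class Block:
    """One Data Cache block: a state letter (M, E or I) plus its 32 bytes."""
    def __init__(self, state, data):
        self.state = state
        self.data = bytearray(data)

def dcbf(block, memory):
    if block.state == "M":
        memory[:] = block.data      # write back before invalidating
    block.state = "I"

def dcbst(block, memory):
    if block.state == "M":
        memory[:] = block.data      # write back, block stays in the cache
        block.state = "E"

def dcbi(block, memory):
    block.state = "I"               # pending data is discarded

def dcbz(block, memory):
    block.data[:] = bytes(32)       # zero the cached copy only
    block.state = "M"

mem = bytearray(32)
blk = Block("M", b"\x01" * 32)
dcbst(blk, mem)
print(blk.state, mem == blk.data)
```

After the dcbst the block is tagged E and physical memory matches the cached copy, which is exactly what Self-Modifying Code relies on.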
Chapter 6: Overwriting Executable Instructions
For Gecko ASM Codes, the only instance where we really need to worry about cache is if your code involves writing/re-writing new instructions that will be executed later on.
This is known as Self-Modifying Code.
When overwriting instructions, you need to ensure they get updated in physical memory before Broadway fetches them for execution. Or else there's a chance the instructions fetched will be the old instructions.
Here's a template for updating cache for writing in new executable instructions
Code: #rX = points to memory address of newly written executable instruction
dcbst 0, rX
icbi 0, rX
isync
- dcbst 0, rX = This will force the block (if M bit tagged) to be written to physical memory. State bit changed to E.
- icbi 0, rX = The old instruction may still be present (and marked Valid) in the Instruction Cache. Therefore, we tag it as Invalid
- isync = Broadway is an out-of-order execution CPU like any other modern CPU. Even with the icbi instruction, it's possible Broadway still fetched the older instruction. This isync instruction will force Broadway to purge its current fetched instructions and refetch. Thus forcing the new instruction to be fetched.
You do **NOT** need to 32-byte align the address (i.e. 0x8000151C -> 0x80001500) for rX when using the above example source. Broadway will handle that for you.
You also do **NOT** need to include the isync if the first newly written instruction is at least 5 will-be-executed instructions ahead of the icbi. This is because Broadway can only fetch up to 4 instructions at a time.
Fyi: If using the above snippet in a loop mechanism, you only need an isync at the end. Do not place it inside the loop. Also remember that Cache Blocks are 32-byte aligned. Therefore your address incrementation amounts for load and store instructions (in your loop) should be incrementing by 32.
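To plan such a loop, you can work out which cache blocks a run of newly written instructions touches. A Python sketch (flush_plan is a made-up name and the output is just descriptive text, not assembler syntax):

```python
def flush_plan(start, length):
    """List the steps needed to cover `length` bytes of new code at `start`."""
    first = start & ~0x1F
    last = (start + length - 1) & ~0x1F
    lines = []
    for block in range(first, last + 1, 32):     # step by one cache block
        lines.append(f"dcbst/icbi block 0x{block:08X}")
    lines.append("isync")                        # once, after the loop
    return lines

print("\n".join(flush_plan(0x8000151C, 8)))
```

Two instructions written at 0x8000151C and 0x80001520 straddle a block boundary, so two blocks need the dcbst + icbi treatment even though only 8 bytes were written.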
Chapter 7: In-Depth Explanation
To explain the entirety of why we need..
dcbst
icbi
isync
...for the case of Self-Modifying Code, we need to cover some complex aspects of Broadway that you may not be familiar with.
First understand that all Wii games configure virtual regions of memory via a mechanism called BAT registers.
We don't need to worry about what the BAT registers are exactly and how to use them. Just understand that all of usable physical memory is mapped twice virtually, once for data and once for instructions. (For more info on BATs, read this thread HERE)
Thus we have two virtual copies of the same physical memory. It's important to understand that there is no 'built-in' mechanism in Broadway that ensures these two copies of memory always match each other. That job falls to software (the program/game/codes/whatever).
The virtual memory that is used for Data is configured as "Write-Back" and "Cache-Enabled".
- Write-Back = store operations update the cached memory, but do not instantly update physical memory
- Write-Through = store operations update cached memory and also update physical memory right away. Performance is degraded.
Since the virtual memory for Data is also cache-enabled, it is referred to as Virtual Cached Memory. Therefore this memory includes all contents of the Data Cache.
The virtual memory for Instructions is also configured as Cache-Enabled (Write-Back/Through is not applicable here).
Anyway since Virtual Cached Memory, for the use of Data, is in Write-Back Mode, this presents a problem for Instruction execution. It can create scenarios where the Instruction Cache is "seeing" a different virtual memory copy than what the Data Cache is "seeing".
It's important to understand how Broadway fetches instructions for execution. The fetching mechanism will hit a virtual address, translate it to its physical address equivalent, and then search various units for the address's instruction. Broadway searches the following places...
- L1 Instruction Cache
- L2 Instruction Cache
- Physical Memory (may also be called System or Main memory in various manuals/websites)
Broadway checks the L1 Cache first. If the address isn't present there, it will then check the L2 Cache. If not present in the L2 Cache, physical memory is finally checked.
For Cache hits, Broadway will then check the address's valid bit in the cache. If the block is marked Valid, Broadway will use the instruction that is currently present in Virtual Cached Memory (the memory that the Instruction Cache "sees"). If the block is instead marked Invalid, Broadway will go directly to physical memory for the instruction, bypassing the L2 check if necessary.
Keep in mind that L1 and L2 cache are 'synced', whatever is in the L1 Cache is ALWAYS present in the L2 Cache. This is possible due to L2 being larger than L1.
Now that you understand how instruction fetching works, we need to cover the 'under the hood' stuff of store and load instructions via virtual cached memory.
So let's say you have any plain-jane basic store instruction (e.g. stw) that stores to plain-jane virtual cached memory. Welp, after that store instruction has executed, the physical address will be pushed onto the Data Cache and the data itself is written at the virtual cached memory address.
Now let's say you then execute a load instruction (e.g. lwz) as the very next instruction. Obviously, what you have just stored via stw is what will be loaded via lwz. That's because the previous store updated the Data Cache (with a new Block), therefore the load instruction will receive a Cache Hit and the contents to load are retrieved from the Data Cache (virtual cached memory).
Now let's say you store over an instruction. The only change that instantly occurs is in the Data Cache, which would be the Virtual Cached Memory that the Data Cache "sees". Physical memory doesn't update instantly since the memory in question is under Write-Back mode. Thus, the next time the instruction at that address is fetched, the old instruction will most likely be used instead.
Why is this?
This is because the newly written instruction won't be in the Instruction Cache's L1 + L2, meaning it's not present in the virtual memory that the Instruction Cache "sees". It will also not be present in Physical Memory.
Using the dcbst instruction will force the newly written instruction to also be written to Physical Memory. However, this instruction alone isn't enough. It is possible the old instruction is currently in the Instruction L1/L2 Cache and marked as Valid. That means the instruction fetching mechanism won't even bother checking Physical Memory, since the L1/L2 cache is basically saying "Hey we have the instruction! And it's valid! No need to check physical memory!"
Therefore to alleviate this possible problem, we use the icbi instruction to mark the old instruction in the L1/L2 cache as invalid. If the old instruction isn't in the Instruction Cache, then the icbi has zero effect (like a nop). The isync is needed just in case the old instruction was already fetched. It will force Broadway to re-fetch instructions so that the new instruction is guaranteed to be fetched. As mentioned earlier, an isync is not required if the modified new instruction is at least 5 would-be-executed instructions ahead of the icbi instruction.
In conclusion, these three instructions (dcbst, icbi, isync) ensure that your newly written instructions are the ones that get executed.
Still confused? Here's a picture:
rX = New Instruction to write
rY = Address in question
Yellow font shows the changes invoked by the respective instruction.
Instruction is the instruction that HAS executed. Regarding Fetcher Status, it gives you a basic summary of what is happening in regards to the Fetcher and the Instruction Queue. When Instructions are placed into the Queue, they are also placed into the I-Cache (if not present beforehand), and then marked as Valid.
'Not Present most likely' for rY D-Cache means that it's very unlikely that rY (with its cache block data) is already present in the D-Cache. Even if it is, we would have no idea (given the information from the diagram) of what its state bits would be.
'0x38000000 possibly' for rY I-Cache means we have no idea (given the information) if rY (with its cache block data) is present (regardless of Valid vs Invalid) in the I-Cache.
In conclusion, these three instructions (dcbst, icbi, isync) will always ensure your newly written instructions are visible to the instruction fetching mechanism.
Chapter 8: Possible Questions and Answers
Hey Vega, I've seen in some PPC Manuals and on some websites that a sync must be placed after dcbst. Do I need this sync instruction in my Self-Modifying Code?
TLDR Version: No
Want to know exactly why? Read below...
First, let's recap what isync (Instruction Synchronize) does. isync does the following...
- Waits for all prior instructions to complete
- Prevents future instructions from completing until isync itself completes
- Future instructions that were already fetched, are purged then refetched
Sync is sort of similar to isync, which brings in a lot of confusion. sync does the following...
- Waits for all prior instructions to complete
- Waits for all Memory Accesses caused by prior instructions to complete
- Prevents future instructions from completing until Sync itself completes
What do I mean by Memory Accesses? Are we talking about Loads and Stores?
No, we are not. Placing a sync after a store instruction does *NOT* ensure said store instruction reaches physical memory. This is the biggest misconception of sync. Only dcbf and dcbst ensure a store reaches physical memory (or actually writing to physical memory directly ofc, lol).
The term "Memory Accesses" refers to...
- Some external (non-PowerPC) device (i.e. Starlet, DSP engine, DMA engine, EEPROM chip) is accessing Memory (the 32-byte block in question) in any way.
- Another PowerPC core or processor is accessing Memory (the 32-byte block in question) in any way
- TLB invalidations (from the core that is executing the self modifying code itself, *NOT* some other core)
- Page Table R and C bits being accessed by other cores, processors, or external devices
If the self-modifying code is executing in an environment where any of the following is true...
- Page Tables are being utilized (which thus uses TLBs)
- There is another core within Broadway or another Broadway processor present (which is impossible since Broadway is single core uni-processor), and cache coherency between all cores and/or processors must be maintained
- Some external device (i.e. Starlet) is accessing the memory utilized by the self-modifying code
Then a sync would be required. Because typical self-modifying code that you would write for a regular Gecko Code doesn't meet any of these qualifications, a sync can be omitted safely.
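For comparison, the more conservative sequence that those manuals describe simply places a sync between the dcbst and the icbi. Below is a minimal sketch using the same rX placeholder as the Chapter 6 template. It is only shown as a fallback for the environments listed above, not as something a typical Gecko code needs.

```asm
#rX = points to memory address of newly written executable instruction
dcbst 0, rX
#Wait for the memory access caused by the dcbst to complete
sync
icbi 0, rX
isync
```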
What about using dcbf instead of dcbst?
This depends, and it's hard to answer due to an endless amount of factors. To keep it simple, for a C0 code, or for C2 codes that execute at or slower than once per frame, use dcbf. For C2 codes that execute quicker than once per frame, use dcbst.
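If you go the dcbf route, the template from Chapter 6 stays the same apart from the first instruction. A minimal sketch, again with rX as a placeholder register:

```asm
#rX = points to memory address of newly written executable instruction
#Write the block back to physical memory (if modified), then invalidate it in the Data Cache
dcbf 0, rX
icbi 0, rX
isync
```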
Chapter 9: Some neat tricks with Cache instructions
This chapter will contain some snippets of code to show some neat tricks you can do with the Cache. All tricks are sources meant to be compiled as C0 Gecko Codes. Fyi, these tricks will only work on a regular Wii Console.
---
Trick #1: Write word 1 to memory, load it back, and it will be a different value (0) than what was just stored
Summary:
Write null word to virtual address 0x80001500
Flush the block, so we know it's written to physical 0x00001500, and therefore the block is now left invalid
Write 1 to virtual address 0x80001500
Load word from physical (0xC0001500 which is direct physical copy of 0x00001500)
If value is *NOT* 1 (aka 0), game will light up disc drive to show success
Code:
#Disable INTs; not done correctly! Do not copy this for your regular cheat codes!
mfmsr r3
rlwinm r12, r3, 0, 17, 15
mtmsr r12
#Set r12 to 0x80000000
lis r12, 0x8000
#Make sure value at 0x80001500 is null beforehand
li r11, 0
stwu r11, 0x1500 (r12)
#Make sure null is also written to the physical memory
dcbf 0, r12
#Set value of 1
li r11, 1
#Set r10 as pointer to start of uncached memory
lis r10, 0xC000
#Store 1 to Virtual Cache memory
#This store will now push r12 on to the Cache assigned with the M bit.
#Fyi: Anything that has the M bit has not been sent to physical memory yet
stw r11, 0 (r12)
#Load up from uncached memory (exact copy of physical)
lwz r11, 0x1500 (r10)
#Check if r11 = 1. If not, light up disc drive
cmpwi r11, 1
beq- restore_ints
#Disc Drive
lis r12, 0xCD00
lwz r0, 0x00C0 (r12)
ori r0, r0, 0x0020
stw r0, 0x00C0 (r12)
#Restore INTs.
#Not done correctly! Do not copy this for your regular cheat codes!
restore_ints:
#Copy current MSR into r12
mfmsr r12
#Insert r3's EE bit into r12, overwriting r12's EE bit
rlwimi r12, r3, 0, 16, 16
#Update MSR
mtmsr r12
#End C0
#blr
---
Trick #2: Write zero to memory without using regular store instructions (this will actually write zero to an entire 32-byte aligned block). This isn't really a 'trick' per se since the dcbz instruction is supposed to behave in such a manner, but you get the idea.
Summary:
Write 1 to virtual address 0x80001500
Make block exclusive (via dcbst) to force update to physical memory
Do a temp load to prove the word value at the physical address is 1
Zero the cache block
Force block to be written to physical (via dcbst)
Load value from physical memory
It will equal 0 (disc drive lights up)
Code:
#Disable INTs; not done correctly! Do not copy this for your regular cheat codes!
mfmsr r3
rlwinm r12, r3, 0, 17, 15
mtmsr r12
#Write 1 to 0x80001500
lis r12, 0x8000
li r11, 1
stwu r11, 0x1500 (r12)
#Make sure updates are in physical memory, keep block exclusive
dcbst 0, r12
#At this moment, physical addr 0x00001500 = 1. We will verify this. If not true, skip lighting up disc drive
lis r10, 0xC000
lwz r11, 0x1500 (r10)
cmpwi r11, 1
bne- restore_ints
#Zero out cache block now, block now tagged with M
#This instruction zero's out the data for the block in cached memory
dcbz 0, r12
#However, let's make sure the changes also go the physical memory
dcbst 0, r12
#Now load value from physical, if null disc drive will light up
lwz r11, 0x1500 (r10)
cmpwi r11, 0
bne- restore_ints
#Disc Drive
lis r12, 0xCD00
lwz r0, 0x00C0 (r12)
ori r0, r0, 0x0020
stw r0, 0x00C0 (r12)
#Restore INTs.
#Not done correctly! Do not copy this for your regular cheat codes!
restore_ints:
#Copy current MSR into r12
mfmsr r12
#Insert r3's EE bit into r12, overwriting r12's EE bit
rlwimi r12, r3, 0, 16, 16
#Update MSR
mtmsr r12
#End C0
#blr
---
Trick #3: Write value of 1 to virtual, then immediately write 2 to physical afterwards. However with some cache trickery, when we load the word value from physical memory, it will be the stale value of 1.
Summary:
Flush block at 0x80001500
Write 1 to 0x80001500
Write 2 to 0xC0001500 immediately afterwards
dcbst on cache block to overwrite the 2 with the earlier value of 1
Load value from 0xC0001500
It will be 1 (not 2) and disc drive will light.
Code:
#Disable INTs; not done correctly! Do not copy this for your regular cheat codes!
mfmsr r3
rlwinm r12, r3, 0, 17, 15
mtmsr r12
#First flush the block to make sure it cannot be in the exclusive state after our write
lis r12, 0x8000
ori r12, r12, 0x1500
dcbf 0, r12
#Set r10 to 0xC000
lis r10, 0xC000
#Set values 1 and 2 in their registers
li r11, 1
li r9, 2
#Write 1 to 0x80001500 first!
#Then Write 2 to 0xC0001500 (physical)
stw r11, 0 (r12)
stw r9, 0x1500 (r10)
#Force cache block at 0x80001500 to update to physical memory now
#This will overwrite the newly written value of 2 present at 0x00001500/0xC0001500
dcbst 0, r12
#Now Load from physical & check. If it is not 2 (i.e. the old value of 1 won), disc drive will light up
lwz r11, 0x1500 (r10)
cmpwi r11, 2
beq- restore_ints
#Disc Drive
lis r12, 0xCD00
lwz r0, 0x00C0 (r12)
ori r0, r0, 0x0020
stw r0, 0x00C0 (r12)
#Restore INTs.
#Not done correctly! Do not copy this for your regular cheat codes!
restore_ints:
#Copy current MSR into r12
mfmsr r12
#Insert r3's EE bit into r12, overwriting r12's EE bit
rlwimi r12, r3, 0, 16, 16
#Update MSR
mtmsr r12
#End C0
#blr
And that's it for Cache, happy coding!
|
|
|
Draggable items code |
Posted by: dirtyfrikandel - 05-12-2022, 11:43 PM - Forum: Code Support / Help / Requests
- Replies (1)
|
|
Hey.
I need some help with creating a code that lets you drag any item, just like the draggable blueshell mod (https://mariokartwii.com/showthread.php?...=draggable)
For example, to drag a star I've tried the following:
1) Item Behaviour Modifier (https://mariokartwii.com/showthread.php?tid=386), but it also activates the power before dragging it.
048A61B8 00000002
2) I changed this code to the following (taken the PAL code for star from Item Behaviour Modifier):
068A61B8 00000008
00000002 00000000.
This does not activate the power up when pressing the item button (nice!), and the item actually drags behind the vehicle. However, when pressing the button again, it drops the item on the ground (like a banana)
I want to drag things like POW, Shocks, Bloopers, Megas in order to simulate Mario Kart 8 Deluxe, where you have 2 item slots.
Is this even possible, as these items have a standard behaviour value of 00 instead of 01 like the blueshell/red shell/green shell (https://wiki.tockdom.com/wiki/Filesystem...r_Modifier)?
Kind regards,
Dirtyfrikandel
|
|
|
Hello I am 267 |
Posted by: 267 - 05-03-2022, 08:45 AM - Forum: Introductions
- Replies (1)
|
|
Hi I'm Zi/StereoCat/Rosuku/Roskzy/Rosk/Sazry/Sazry666 I go by a few names
I'm bored and I feel like coming back to the mkwii community again but I'm not as active at all.
YT: ASL
Wiki: 267
|
|
|
[Code Request] Get Points After Time is Up |
Posted by: Zeem - 04-14-2022, 05:03 PM - Forum: Code Support / Help / Requests
- Replies (6)
|
|
After playing some battles online, I realized that players don't receive points until a few seconds after they hit their opponent. This is problematic because during the last 2-3 seconds of the battle, nobody can earn any points. I was thinking of a way this could possibly be fixed, it happens way too often when I'm battling online.
My idea is: a code that allows players to earn/lose points for 2 more seconds after time runs out.
If anyone can make this code, I would really appreciate it.
|
|
|
|