Remove invisible walls [jawa] |
Posted by: jawa - 11-19-2022, 07:37 PM - Forum: Visual & Sound Effects
- Replies (5)
|
|
Removes invisible walls.
WARNING! Halfpipes do not work as intended.
NTSC-U
C251508C 0000001C
7D2000A6 552C045E
7D800124 38210120
9421FF60 BDC10008
7E4802A6 2C030000
418200A4 48000011
636F7572 73652E6B
636C0000 7E2802A6
7F95E378 3A600000
7E138A14 89F00000
7E13AA14 89D00000
7E8F7214 2C140000
41820018 7C0F7000
41820008 48000060
3A730001 4BFFFFD4
7C701B78 82300008
82B0000C 7E318214
7EB58214 7E348B78
7C14A800 41800008
48000034 A274000E
39E0001F 7E737838
2C13001F 41810020
2C13000D 41820008
4800000C 39E00018
91F4000E 3A940010
4BFFFFC8 7E4803A6
B9C10008 382100A0
7D8000A6 512C0420
7D800124 00000000
PAL
C2519500 0000001C
7D2000A6 552C045E
7D800124 38210120
9421FF60 BDC10008
7E4802A6 2C030000
418200A4 48000011
636F7572 73652E6B
636C0000 7E2802A6
7F95E378 3A600000
7E138A14 89F00000
7E13AA14 89D00000
7E8F7214 2C140000
41820018 7C0F7000
41820008 48000060
3A730001 4BFFFFD4
7C701B78 82300008
82B0000C 7E318214
7EB58214 7E348B78
7C14A800 41800008
48000034 A274000E
39E0001F 7E737838
2C13001F 41810020
2C13000D 41820008
4800000C 39E00018
91F4000E 3A940010
4BFFFFC8 7E4803A6
B9C10008 382100A0
7D8000A6 512C0420
7D800124 00000000
NTSC-J
C2518E80 0000001C
7D2000A6 552C045E
7D800124 38210120
9421FF60 BDC10008
7E4802A6 2C030000
418200A4 48000011
636F7572 73652E6B
636C0000 7E2802A6
7F95E378 3A600000
7E138A14 89F00000
7E13AA14 89D00000
7E8F7214 2C140000
41820018 7C0F7000
41820008 48000060
3A730001 4BFFFFD4
7C701B78 82300008
82B0000C 7E318214
7EB58214 7E348B78
7C14A800 41800008
48000034 A274000E
39E0001F 7E737838
2C13001F 41810020
2C13000D 41820008
4800000C 39E00018
91F4000E 3A940010
4BFFFFC8 7E4803A6
B9C10008 382100A0
7D8000A6 512C0420
7D800124 00000000
NTSC-K
C2507520 0000001C
7D2000A6 552C045E
7D800124 38210120
9421FF60 BDC10008
7E4802A6 2C030000
418200A4 48000011
636F7572 73652E6B
636C0000 7E2802A6
7F95E378 3A600000
7E138A14 89F00000
7E13AA14 89D00000
7E8F7214 2C140000
41820018 7C0F7000
41820008 48000060
3A730001 4BFFFFD4
7C701B78 82300008
82B0000C 7E318214
7EB58214 7E348B78
7C14A800 41800008
48000034 A274000E
39E0001F 7E737838
2C13001F 41810020
2C13000D 41820008
4800000C 39E00018
91F4000E 3A940010
4BFFFFC8 7E4803A6
B9C10008 382100A0
7D8000A6 512C0420
7D800124 00000000
Code: # System::DVDArchive::getFile([DVDArchive* d_arc], char const*, unsigned int*)
# returns file buffer (r3 = void* outbuf)
.set INVISIBLE_WALL, 0x0D
.set SOUND_TRIGGER, 0x18
.set STACK, 0xA0
.macro disable_interrupts
mfmsr r9
rlwinm r12, r9, 0, 17, 15
mtmsr r12
.endm
.macro enable_interrupts
mfmsr r12
rlwimi r12, r9, 0, 16, 16
mtmsr r12
.endm
disable_interrupts
#default
addi sp, sp, 288
# push stack
stwu sp, -STACK (sp)
stmw r14, 0x8 (sp)
mflr r18
cmpwi r3, 0
beq end
check:
bl course_kcl_string
.string "course.kcl"
.align 2
course_kcl_string:
mflr r17
course_kcl:
# r17 = &"course.kcl", string 1
mr r21, r28 # r21 = filename, string 2
# li r20, 0 # r20 = strings_are_equal
li r19, 0 # r19 = strcmp counter
loop:
add r16, r19, r17
lbz r15, 0 (r16) # str1 + off
add r16, r19, r21
lbz r14, 0 (r16) # str2 + off
add r20, r15, r14
cmpwi r20, 0 # if chr == \0
beq success
cmpw r15, r14 # (str1 + off) == (str2 + off)
beq cont_add
b end
cont_add:
addi r19, r19, 1
b loop
success:
# course.kcl is loaded!
# r3 = void* buf#
mr r16, r3
lwz r17, 0x08 (r16) # r17 = SEC3.start
lwz r21, 0x0C (r16) # r21 = SEC4.start
add r17, r17, r16 # SEC3* sec3 = file_start + SEC3.start
add r21, r21, r16 # SEC4* sec4 = file_start + SEC4.start
mr r20, r17 # r20 = i = SEC3.start
loop2:
cmpw r20, r21 # i, SEC4
blt loop2_2
b end
loop2_2:
lhz r19, 0xE (r20) # kcl flag
li r15, 31 # and mask value
and r19, r19, r15 # kcl_type = kcl_flag & mask [0x1F] (isolate 5 LSB bits)
cmpwi r19, 0x1F
bgt- end
cmpwi r19, INVISIBLE_WALL
beq invis
b inc
invis:
li r15, SOUND_TRIGGER
stw r15, 0xE (r20) # stores the 32-bit value 0x00000018 at kcl_flag_ptr (sth would store only the 16-bit flag)
inc:
addi r20, r20, 0x10 # i += 0x10
b loop2
end:
mtlr r18
lmw r14, 0x8 (sp)
# pop stack
addi sp, sp, STACK
enable_interrupts
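For anyone who finds the loop easier to follow outside of Assembly, here is a rough Python sketch of what the source comments describe: walk the 0x10-byte entries between SEC3 and SEC4, isolate the 5 low bits of the flag at offset 0xE, and swap invisible walls for sound triggers. The function name and the tiny demo buffer are made up for illustration (it is not a real course.kcl), and the sketch assumes the intent is to rewrite only the 16-bit flag.

```python
import struct

INVISIBLE_WALL = 0x0D   # KCL type to look for (from the source above)
SOUND_TRIGGER = 0x18    # KCL type written in its place
ENTRY_SIZE = 0x10       # the loop advances by 0x10 per entry
FLAG_OFFSET = 0x0E      # 16-bit flag inside each entry
TYPE_MASK = 0x1F        # low 5 bits hold the KCL type


def patch_invisible_walls(kcl: bytearray) -> int:
    """Rewrite invisible-wall flags in place; return how many were changed."""
    # Big-endian section offsets stored at 0x08 and 0x0C of the file header.
    sec3, sec4 = struct.unpack_from(">II", kcl, 0x08)
    patched = 0
    for entry in range(sec3, sec4, ENTRY_SIZE):
        (flag,) = struct.unpack_from(">H", kcl, entry + FLAG_OFFSET)
        if flag & TYPE_MASK == INVISIBLE_WALL:
            # Assumption: only the 16-bit flag is meant to be replaced.
            struct.pack_into(">H", kcl, entry + FLAG_OFFSET, SOUND_TRIGGER)
            patched += 1
    return patched


# Synthetic buffer for illustration only: 0x10-byte header + three entries.
demo = bytearray(0x40)
struct.pack_into(">II", demo, 0x08, 0x10, 0x40)
for index, flag in enumerate((0x000D, 0x0001, 0x002D)):
    struct.pack_into(">H", demo, 0x10 + index * ENTRY_SIZE + FLAG_OFFSET, flag)
print(patch_invisible_walls(demo))
```

The third demo entry carries extra bits above the type field, which shows why the mask is applied before the comparison.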
|
|
|
Fast Race Music Modifier [Zeraora] |
Posted by: Zeraora - 11-16-2022, 11:24 PM - Forum: Visual & Sound Effects
- No Replies
|
|
This code allows the user to change all fast race music to any respective BRSTM.
NTSC-U:
0470A5D0 388000XX
PAL:
04712074 388000XX
NTSC-J:
047116E0 388000XX
NTSC-K:
0470041C 388000XX
XX = BRSTM Identifier
A list of the identifiers can be found on the wiiki.
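If you would rather not hand-edit the XX byte, a small Python helper can fill it in. This is only a sketch: the dictionary and function names are hypothetical, the addresses are copied from the code lines above, and the identifier used in the example call is just a placeholder value, not a real recommendation.

```python
# Hook addresses taken from the code lines above (04XXXXXX -> 0x80XXXXXX).
FAST_MUSIC_ADDRESS = {
    "NTSC-U": 0x8070A5D0,
    "PAL": 0x80712074,
    "NTSC-J": 0x807116E0,
    "NTSC-K": 0x8070041C,
}


def fast_music_code(region: str, brstm_id: int) -> str:
    """Return the one-line 32-bit write code for the given region and XX."""
    if not 0 <= brstm_id <= 0xFF:
        raise ValueError("XX must fit in one byte")
    address = FAST_MUSIC_ADDRESS[region]
    left = 0x04000000 | (address & 0x01FFFFFF)  # 04 = 32-bit RAM write
    right = 0x38800000 | brstm_id               # li r4, 0xXX
    return f"{left:08X} {right:08X}"


# Placeholder identifier; look up real values in the identifier list.
print(fast_music_code("PAL", 0x75))
```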
|
|
|
Normal Race Music Modifier [Zeraora] |
Posted by: Zeraora - 11-16-2022, 11:22 PM - Forum: Visual & Sound Effects
- No Replies
|
|
This code allows the user to change all normal race music to any respective BRSTM.
NTSC-U:
0470A53C 388000XX
PAL:
04711FE0 388000XX
NTSC-J:
0471164C 388000XX
NTSC-K:
04700388 388000XX
XX = BRSTM Identifier
A list of the identifiers can be found on the wiiki.
|
|
|
QEMU + GNU Debugger Basic Tutorial |
Posted by: Vega - 11-14-2022, 01:24 AM - Forum: Other
- No Replies
|
|
QEMU + GNU Debugger Basic Tutorial
Editor's NOTE: I'm very new to QEMU, GDB, and general assembling/linking. So if anybody has any improvements or corrections for this tutorial, please share them. Thank you.
Chapter 1: Intro
NOTE: Guide is for Linux only. Verified to work on Debian 10 & Debian 11.
Instead of using something like Dolphin to rig up an environment to test/simulate PowerPC code/instructions, you can instead use the QEMU emulator with the GNU Debugger.
The QEMU and GNU programs support a wide variety of architectures (and their Assembly languages), thus you can use those programs for the following...
- ARM 64-bit aka AArch64 (for Nintendo Switch)
- ARM 32-bit (for Starlet on Nintendo Wii)
- PPC 64-bit (for Xbox360 & PS3)
- PPC 32-bit (for Nintendo Gamecube, Broadway on Wii, & Wii U)
Keep in mind that the languages are "generic" for the most part. The Xbox360, PS3, Wii U, Wii, and Gamecube all use unique CPUs that have special/additional instructions & registers to their conventional counterparts. The Switch uses a "generic" Cortex-A57 CPU which uses the ARMv8-a (8.0) language.
The GNU Debugger can specify some CPUs. For example, you can specify PowerPC 32-bit CPU 750cl to try to mimic as closely as possible for Broadway. Regarding Nintendo Switch, you can specify its exact CPU (cortex-a57).
The great thing about QEMU+GDB is that you can test C code, not just basic Assembly files. The following guide will cover debugging a basic Hello World source written in C. Later on, a quick overview of debugging bare-bones Assembly files will also be covered.
Chapter 2: Software Installation
Update & Upgrade your System then Reboot
Code: sudo apt-get update
sudo apt-get upgrade
sudo reboot
Install the GNU Compiler Software for the desired architecture
ARM 64 bit...
Code: sudo apt-get install gcc-aarch64-linux-gnu binutils-aarch64-linux-gnu binutils-aarch64-linux-gnu-dbg
ARM 32 bit...
Code: sudo apt-get install gcc-arm-linux-gnueabihf binutils-arm-linux-gnueabihf binutils-arm-linux-gnueabihf-dbg
PPC 64 bit...
Code: sudo apt-get install gcc-powerpc64-linux-gnu binutils-powerpc64-linux-gnu binutils-powerpc64-linux-gnu-dbg
PPC 32 bit...
Code: sudo apt-get install gcc-powerpc-linux-gnu binutils-powerpc-linux-gnu binutils-powerpc-linux-gnu-dbg
Install QEMU Emulator & GNU Debugger
Code: sudo apt-get install qemu-user qemu-user-static gdb-multiarch build-essential
NOTE: There are other qemu packages such as qemu and qemu-system, we only need user and user-static.
Chapter 3: C file creation and compilation
Create the following Hello World C file. Save it as hello_world.c
Code: #include <stdio.h>
#include <stdlib.h>
int main() {
puts("hello world");
return EXIT_SUCCESS;
}
Compile an executable file from your C source
ARM64:
Code: aarch64-linux-gnu-gcc -ggdb3 -o hello_world hello_world.c -static -mcpu=cortex-a57
ARM32:
Code: arm-linux-gnueabihf-gcc -ggdb3 -o hello_world hello_world.c -static
NOTE: The tags of... "-mbig-endian -march=armv5te -mcpu=arm926ej-s" should be included, but I can't get the file to be compiled when these tags are applied. So if anyone is very familiar with GCC, please let me know how to remedy this.
PPC64:
Code: powerpc64-linux-gnu-gcc -ggdb3 -o hello_world hello_world.c -static
PPC32:
Code: powerpc-linux-gnu-gcc -ggdb3 -o hello_world hello_world.c -static -mcpu=750
NOTE: PPC64 and PPC32 default to big endian. Extra command tags for endianness are not required.
About command tags:
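-o = Set the name of the output file (here, the executable)
-ggdb3 = Emit debugging information in GDB's preferred format at the most detailed level (3)
-static = Link statically, so no shared libraries are needed at run time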
Chapter 4: Run C file
Launch the file on QEMU
ARM64:
Code: qemu-aarch64 -L /usr/aarch64-linux-gnu -g 1234 ./hello_world
ARM32:
Code: qemu-arm -L /usr/arm-linux-gnueabihf -g 1234 ./hello_world
PPC64:
Code: qemu-ppc64 -L /usr/powerpc64-linux-gnu -g 1234 ./hello_world
PPC32:
Code: qemu-ppc -L /usr/powerpc-linux-gnu -g 1234 ./hello_world
About command tags:
-L /usr/xxxxx = Set the prefix under which QEMU looks for the elf interpreter (and the target's libraries)
-g xxxxx = Set port number for GDB connection
QEMU & GDB need to run on a port. You can have multiple instances of QEMU+GDB programs running, but they cannot all use the same port.
At this moment you will notice that the terminal command to run QEMU is not doing anything...
This is exactly what you want to see. QEMU is waiting on the GNU Debugger to be launched. On a second terminal, launch the Debugger using the following terminal command. Do NOT close/exit the first terminal!
ARM64:
Code: gdb-multiarch -q --nh \
-ex 'set architecture aarch64' \
-ex 'set sysroot /usr/aarch64-linux-gnu' \
-ex 'file hello_world' \
-ex 'target remote localhost:1234' \
-ex 'break main' \
-ex continue \
-ex 'layout split' \
-ex 'layout next' \
-ex 'layout regs'
ARM32:
Code: gdb-multiarch -q --nh \
-ex 'set architecture arm' \
-ex 'set sysroot /usr/arm-linux-gnueabihf' \
-ex 'file hello_world' \
-ex 'target remote localhost:1234' \
-ex 'break main' \
-ex continue \
-ex 'layout split' \
-ex 'layout next' \
-ex 'layout regs'
PPC64:
Code: gdb-multiarch -q --nh \
-ex 'set architecture powerpc:common64' \
-ex 'set sysroot /usr/powerpc64-linux-gnu' \
-ex 'file hello_world' \
-ex 'target remote localhost:1234' \
-ex 'break main' \
-ex continue \
-ex 'layout split' \
-ex 'layout next' \
-ex 'layout regs'
PPC32:
Code: gdb-multiarch -q --nh \
-ex 'set architecture powerpc:common' \
-ex 'set sysroot /usr/powerpc-linux-gnu' \
-ex 'file hello_world' \
-ex 'target remote localhost:1234' \
-ex 'break main' \
-ex continue \
-ex 'layout split' \
-ex 'layout next' \
-ex 'layout regs'
About ex tags:
- set architecture selects the target architecture that gdb should assume
- set sysroot sets the directory that contains the targeted libraries, this must match the path you gave to -L in the qemu command
- file loads the executable along with its debugging symbols
- target remote machine:portnumber tells gdb what machine and port QEMU is running on. The port number used here must match what was used in the qemu terminal command
- break main sets a breakpoint on the main function
- continue tells gdb to go ahead and run the program, instead of leaving it stopped at the very first assembly instruction
- layout split will split the terminal into two halves where you can see more information simultaneously
- layout regs tells gdb to place GPRs + some SPRs into the upper half of the split layout
Since break main and continue are applied, this tells GDB to run the program, and stop at the first instruction at the main function.
Notice how the port number in the GDB terminal command matches what was used in the QEMU terminal command.
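Since the two terminal commands have to agree on both the port and the library directory, one way to avoid typos is to generate the pair from a single place. The Python sketch below only builds the command strings (it does not launch anything). The names TARGETS and session_commands are made up for this example, and the set architecture and layout commands are left out to keep it short.

```python
import shlex

# Emulator name and library prefix per target, copied from Chapter 4.
TARGETS = {
    "ARM64": ("qemu-aarch64", "/usr/aarch64-linux-gnu"),
    "ARM32": ("qemu-arm", "/usr/arm-linux-gnueabihf"),
    "PPC64": ("qemu-ppc64", "/usr/powerpc64-linux-gnu"),
    "PPC32": ("qemu-ppc", "/usr/powerpc-linux-gnu"),
}


def session_commands(target, port=1234, program="./hello_world"):
    """Return (qemu command, gdb command) sharing one port and one sysroot."""
    qemu, sysroot = TARGETS[target]
    qemu_argv = [qemu, "-L", sysroot, "-g", str(port), program]
    gdb_argv = [
        "gdb-multiarch", "-q", "--nh",
        "-ex", f"set sysroot {sysroot}",
        "-ex", f"file {program}",
        "-ex", f"target remote localhost:{port}",
        "-ex", "break main",
        "-ex", "continue",
    ]
    # shlex.join adds the quoting that the shell needs for multi-word arguments.
    return shlex.join(qemu_argv), shlex.join(gdb_argv)


qemu_cmd, gdb_cmd = session_commands("PPC32", port=4321)
print(qemu_cmd)
print(gdb_cmd)
```

Run the first printed line in one terminal and the second in another, the same way as the hand-written commands.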
Chapter 5: Basic GNU Debugging Commands, Stepping Thru the C File
At this point your C program is paused at 'main' waiting for further actions. Your GDB should look like this. Registers may or may not be available when you first boot GDB (we will address this shortly). For the picture below in my example, you will see our registers aren't available yet. You will also see we are at the start of our C program.
GDB comes with a large set of Debugging Commands, here's a quick list of useful ones~
GNU Debugging Commands:
- step = step C program by 1 line (for assembly files, this will do the same as stepi)
- stepi = step execution by 1 instruction
- next = step C program by 1 line, but step over function calls instead of entering them
- nexti = step execution by 1 instruction, but step over function calls when encountered
- break [function name] = set an instruction breakpoint at a function
- continue = allow program to run until the next instruction breakpoint
- quit = quit
- delete = delete all instruction breakpoints
- info vector = list FPRs as Vector data first, then as Float data
- layout next = swap to different view (C vs Assembly)
- layout prev = swap to your previous view
Okay so we have GDB running, let's practice some commands. We can switch to Assembly view of our C file by using this command..
Great. Let's swap back to C view using this command...
If registers aren't available upon boot, this can always be remedied via just 1 step (or 1 stepi when debugging a view of Assembly). The registers can now be seen. We can use the step command to step one line of C code, like this...
We can keep using the step command until the C source becomes unavailable. This is because our program has completed all execution. We can now use quit to exit GDB. Press Y when prompted.
When you have finally exited GDB, take a look at your terminal that was running QEMU. You will see that the QEMU process has been terminated.
NOTE: Be sure to properly exit GDB (via the quit command) or else you will need to use a different port next time you run QEMU.
Chapter 6: GDB Memory Commands
Before we dive into debugging an Assembly file, let's go over how to view memory on the GNU Debugger. Viewing memory is a bit complicated, you cannot (afaik) view memory live on a separate layout or terminal.
Memory command template:
x/nfu addr
x = all gdb memory commands must start with a lower case x.
n = The count of how many units to display. Default value is 1.
f = Display format. Default is x (for hex). d is for signed decimal. u is for unsigned decimal. o is for octal. f is for float.
u = unit type. b for bytes. h for halfwords. w for words. g for doublewords. Default value is w (words).
addr = memory address
If n, f, and u are all set to default values (which would also be the case if they were all omitted), then the slash (/) is NOT needed in the memory command.
Example memory command showing 4 hex words at address 0x1234C
x/4xw 0x1234C
Instead of having to fill in an address, you can instead reference a register using the "$" symbol.
Example memory command showing 2 hex words located at the Stack Pointer:
x/2xw $sp
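To make the x/nfu template a bit more concrete, here is a small Python sketch that assembles the command string from its parts. The function name x_command is hypothetical and it only covers the formats and unit types listed above.

```python
FORMATS = {"x", "d", "u", "o", "f"}   # display formats listed above
UNITS = {"b", "h", "w", "g"}          # unit types listed above


def x_command(addr, n=1, f="x", u="w"):
    """Build a gdb memory command; addr is an int or a register such as "$sp"."""
    if n < 1 or f not in FORMATS or u not in UNITS:
        raise ValueError("unsupported n/f/u value")
    target = addr if isinstance(addr, str) else hex(addr)
    if (n, f, u) == (1, "x", "w"):
        return f"x {target}"            # all defaults: the slash is not needed
    return f"x/{n}{f}{u} {target}"


print(x_command(0x1234C, n=4))   # 4 hex words at an address
print(x_command("$sp", n=2))     # 2 hex words at the Stack Pointer
print(x_command("$sp"))          # defaults only
```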
Chapter 7: Create Executable file from Bare-bones Assembly
Instead of writing the file in C, we can use an Assembly file without any standard libraries. Delete your original hello_world executable, so we can make a new one via Assembly.
Choose the following assembly source that you want to use and save it as hello_world.s. Be sure the "s" is lowercase. The source will use the emulated computer's system calls (via QEMU) to print a message.
ARM 64bit:
Code: .section .text
.global _start
_start:
/* syscall write(int fd, const void *buf, size_t count) */
mov x0, #1
ldr x1, =msg
ldr x2, =len
mov w8, #64 /*Syscall number for write for ARM64*/
svc #0
/* syscall exit(int status) */
mov x0, #0
mov w8, #93 /*Syscall number for exit for ARM64*/
svc #0
msg:
.asciz "Hello, ARM64!\n"
len = . - msg
ARM 32bit:
Code: .section .text
.global _start
_start:
/* syscall write(int fd, const void *buf, size_t count) */
mov r0, #1
ldr r1, =msg
ldr r2, =len
mov r7, #4 /*Syscall number for write for ARM32*/
svc #0
/* syscall exit(int status) */
mov r0, #0
mov r7, #1 /*Syscall number for exit for ARM32*/
svc #0
msg:
.asciz "Hello, ARM32!\n"
len = . - msg
PPC 64bit:
Code: .section .text
.global _start
.section ".opd","aw"
.align 3
_start:
.quad ._start,.TOC.@tocbase,0
.previous
.global ._start
._start:
/* syscall write(int fd, const void *buf, size_t count) */
li 3, 1
lis 4, msg@highest
ori 4,4, msg@higher
rldicr 4, 4, 32, 31
oris 4, 4, msg@h
ori 4, 4, msg@l
li 5, len
li 0, 4 /*Syscall number for write for PPC64*/
sc
/* syscall exit(int status) */
li 3, 0
li 0, 1 /*Syscall number for exit for PPC64*/
sc
msg:
.asciz "Hello, PPC64!\n"
len = . - msg
PPC 32bit:
Code: .section .text
.global _start
_start:
/* syscall write(int fd, const void *buf, size_t count) */
li 3, 1
lis 4, msg@ha
addi 4, 4, msg@l
li 5, len
li 0, 4 /*Syscall number for write for PPC32*/
sc
/* syscall exit(int status) */
li 3, 0
li 0, 1 /*Syscall number for exit for PPC32*/
sc
msg:
.asciz "Hello, PPC32!\n"
len = . - msg
--
Side note: View Chapter 9 for more info about syscalls
To assemble the source into an executable, the 2 following terminal commands are required...
ARM64:
Code: aarch64-linux-gnu-as -mcpu=cortex-a57 hello_world.s -o hello_world.o
aarch64-linux-gnu-ld hello_world.o -o hello_world
ARM32:
Code: arm-linux-gnueabihf-as -march=armv5te -mcpu=arm926ej-s -mbig-endian hello_world.s -o hello_world.o
arm-linux-gnueabihf-ld -EB hello_world.o -o hello_world
PPC64:
Code: powerpc64-linux-gnu-as -mregnames hello_world.s -o hello_world.o
powerpc64-linux-gnu-ld hello_world.o -o hello_world
PPC32:
Code: powerpc-linux-gnu-as -mregnames -m750cl hello_world.s -o hello_world.o
powerpc-linux-gnu-ld hello_world.o -o hello_world
NOTE: PPC64 and PPC32 default to big endian. Extra tags for endianness are not required.
Chapter 7 (continued): Launch file, Stepping Instructions
You will notice that the QEMU and GDB terminal commands have been tweaked since we are now using an Assembly file.
Launch QEMU~
ARM 64-bit:
Code: qemu-aarch64 -g 1234 ./hello_world
ARM 32-bit:
Code: qemu-arm -g 1234 ./hello_world
PPC 64-bit:
Code: qemu-ppc64 -g 1234 ./hello_world
PPC 32-bit:
Code: qemu-ppc -g 1234 ./hello_world
Launch GNU Debugger in a second terminal~
ARM64:
Code: gdb-multiarch -q --nh \
-ex 'set architecture aarch64' \
-ex 'file hello_world' \
-ex 'target remote localhost:1234' \
-ex 'layout split' \
-ex 'layout regs'
ARM32:
Code: gdb-multiarch -q --nh \
-ex 'set architecture arm' \
-ex 'file hello_world' \
-ex 'target remote localhost:1234' \
-ex 'layout split' \
-ex 'layout regs'
PPC64:
Code: gdb-multiarch -q --nh \
-ex 'set architecture powerpc:common64' \
-ex 'file hello_world' \
-ex 'target remote localhost:1234' \
-ex 'layout split' \
-ex 'layout regs'
PPC32:
Code: gdb-multiarch -q --nh \
-ex 'set architecture powerpc:common' \
-ex 'file hello_world' \
-ex 'target remote localhost:1234' \
-ex 'layout split' \
-ex 'layout regs'
Registers may or may not be available at this moment.
At this point you can start instruction stepping (via the stepi command). Let's step just one instruction...
Fyi, when you step, any register(s) that are changed by the stepped instruction will become highlighted (except in this case of initially making the Registers available). Similar concept to how registers will change to a red font in the Dolphin Emulator.
Stepping is cool enough, but let's view some memory. Let's take a look at the first 4 word values (as hexadecimal) of the Stack. Use the following command...
x/4xw $sp
Sweet! Feel free to play around with this to get a better feel.
Please NOTE that GDB will not allow you to step thru the actual system calls themselves. When you type stepi on an sc/svc, you will be taken two instructions ahead of said sc/svc. System calls are emulated and cannot be customized. More on syscalls in Chapter 9.
Chapter 8: Alternative Method for Assembly Files
Alternatively, you can do Assembly Files with just one terminal command. You will have to change the lowercase "s" in hello_world.s to be Capitalized (hello_world.S). However with this method, you are more limited on cpu and architecture specification.
ARM 64-bit:
Code: aarch64-linux-gnu-gcc -ggdb3 -nostdlib -o hello_world -static hello_world.S
ARM 32-bit:
Code: arm-linux-gnueabihf-gcc -ggdb3 -nostdlib -o hello_world -static hello_world.S
PPC 64-bit:
Code: powerpc64-linux-gnu-gcc -ggdb3 -nostdlib -o hello_world -static hello_world.S
PPC 32-bit:
Code: powerpc-linux-gnu-gcc -ggdb3 -nostdlib -o hello_world -static hello_world.S
-nostdlib = Do not link the standard system startup files or libraries. Only what is present in the source file(s) ends up in the executable.
Chapter 9: Syscall Tables
This chapter is present to address the use of syscalls in the barebones Assembly Source examples. Syscalls allow the user to handle tasks such as console input/output, memory allocation, and file management via bare bones assembly. QEMU cannot emulate custom syscalls. Instead, every CPU Architecture supported by Linux has its own standard syscall table, and QEMU will emulate these.
If you are familiar with tinkering with ISFS/IOS for Wii Files, then adapting to syscalls is very easy. They essentially operate the same (you use a file-open syscall to get an fd, and use the fd for future file based syscalls).
You can search around on Google and easily find the syscall table for your desired architecture.
I may expand this chapter/section in the future for a quick tut on using these syscalls with various examples.
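As a quick reference, the syscall numbers that the Chapter 7 sources rely on can be collected in one small table. The Python sketch below only restates values already used in those sources. The dictionary layout and the describe helper are made up for illustration, and it is nowhere near a full syscall table.

```python
# Values as used by the bare-bones sources in Chapter 7.
SYSCALLS = {
    "ARM64": {"number_register": "w8", "write": 64, "exit": 93, "trap": "svc #0"},
    "ARM32": {"number_register": "r7", "write": 4, "exit": 1, "trap": "svc #0"},
    "PPC64": {"number_register": "r0", "write": 4, "exit": 1, "trap": "sc"},
    "PPC32": {"number_register": "r0", "write": 4, "exit": 1, "trap": "sc"},
}


def describe(target, name):
    """Summarise how a source invokes the given syscall on the given target."""
    entry = SYSCALLS[target]
    return f"{target}: put {entry[name]} in {entry['number_register']}, then {entry['trap']}"


for target in SYSCALLS:
    print(describe(target, "write"))
```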
Chapter 10: Final Note
You may get unknown errors when quitting GDB. This will usually occur if you quit after you have called the sc/svc for the exit status when stepping. Simply press Y to quit the session and press N to deny core file creation on GDB.
|
|
|
All Items Can Land V2.0 Lite [MrBean, CLF78] |
Posted by: Deez Nutz - 11-13-2022, 11:41 PM - Forum: Incomplete & Outdated Codes
- Replies (1)
|
|
This is a lite version of the original All Items Can Land by MrBean, but it will not freeze when used with future fly or stalking.
With this version, items such as megas/shocks/pow/bloopers will not work when dropped.
NTSC U
0479D6B8 60000000
0478DD24 38600000
04787EE4 39800001
04787EE8 39600001
04787EEC 39400001
04787EF0 39200001
PAL
047A66C4 60000000
0468000C 38600000
04790EF0 39800001
04790EF4 39600001
04790EF8 39400001
04790EFC 39200001
NTSC J
047A5D30 60000000
04680014 38600000
0479055C 39800001
04790560 39600001
04790564 39400001
04790568 39200001
|
|
|
No Boundary Check V2.0 |
Posted by: Deez Nutz - 11-13-2022, 11:30 PM - Forum: Code Support / Help / Requests
- Replies (8)
|
|
Lets you drive anywhere out of the map without respawning.
Unlike the longer version by Anarion, my 3 line version of this code does not cause the camera to glitch out of bounds when there are more than 2 players.
NTSC U
0056F033 00000000
C259728C 00000003
B27D0334 A01D0334
PAL
00573E83 00000000
C25A22C4 00000003
B27D0334 A01D0334
NTSC J
00573803 00000000
C25A1C44 00000003
B27D0334 A01D0334
|
|
|
Item used Modifier [Unnamed] |
Posted by: Unnamed - 10-17-2022, 08:00 AM - Forum: Offline; Item
- No Replies
|
|
Item used Modifier [Unnamed]
If you have an Item in Inventory and you press your Fire button, you will instead use the Item you specify here.
NTSC-U
C278894C 00000002
3BE000YY 93E30090
3BE000XX 00000000
PAL
C2791958 00000002
3BE000YY 93E30090
3BE000XX 00000000
NTSC-J
C2790FC4 00000002
3BE000YY 93E30090
3BE000XX 00000000
NTSC-K
C277FD18 00000002
3BE000YY 93E30090
3BE000XX 00000000
XX/YY Item Values:
00/01 = Green Shell
01/01 = Red Shell
02/01 = Banana
03/01 = Fake Item Box
04/01 = Mushroom
05/01 = Triple Mushroom
06/01 = Bob-omb
07/01 = Blue Shell
08/01 = Lightning
09/01 = Star
0A/01 = Golden Mushroom
0B/01 = Mega Mushroom
0C/01 = Blooper
0D/01 = POW Block
0E/01 = Cloud
0F/01 = Bullet Bill
10/03 = Triple Green Shell
11/03 = Triple Red Shell
12/03 = Triple Banana
14/00 = Nothing
Source:
##########################################################
Addresses:
0x8078894C (NTSC-U)
0x80791958 (PAL)
0x80790FC4 (NTSC-J)
0x8077FD18 (NTSC-K)
##########################################################
li r31, 0xYY ## Load the Item Number in r31
stw r31, 0x90 (r3) ## overwrite Item number in Inventory
li r31, 0xXX ## Load the Item in r31
##########################################################
Code Creator: Unnamed
Code Credits: Bully
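For those who prefer to generate the code rather than fill in XX/YY by hand, here is a rough Python sketch. The names HOOKS and item_used_code are hypothetical. The hook lines and instruction words are copied from the compiled code above, and the example call uses the Star entry (09/01) from the value list.

```python
# First line of the compiled code per region, copied from above.
HOOKS = {
    "NTSC-U": 0xC278894C,
    "PAL": 0xC2791958,
    "NTSC-J": 0xC2790FC4,
    "NTSC-K": 0xC277FD18,
}


def item_used_code(region: str, xx: int, yy: int) -> str:
    """Fill the XX and YY bytes into the 3-line code for the given region."""
    for value in (xx, yy):
        if not 0 <= value <= 0xFF:
            raise ValueError("XX and YY must each fit in one byte")
    lines = [
        f"{HOOKS[region]:08X} 00000002",
        f"{0x3BE00000 | yy:08X} 93E30090",   # li r31, 0xYY ; stw r31, 0x90 (r3)
        f"{0x3BE00000 | xx:08X} 00000000",   # li r31, 0xXX
    ]
    return "\n".join(lines)


print(item_used_code("PAL", xx=0x09, yy=0x01))   # 09/01 = Star
```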
|
|
|
ARM Tutorial for Use with Starlet |
Posted by: Vega - 10-07-2022, 08:39 PM - Forum: Other
- Replies (1)
|
|
ARM Tutorial for Use with Starlet
First things first. Huge Thank You to Palapeli!
Author's Note: This thread is located in this subforum because the PPC Assembly Tutorial subforum is for PPC only. I don't see a need to make an ARM Assembly Tutorial subforum.
This is an ARM tutorial for the Starlet Core of the Wii. It is designed for those who are already experts in coding/programming with Broadway PowerPC on the Wii. This will cover the basics of the ARMv5 Assembly language, and some Starlet-specific attributes. You will also be provided with some programs/tools to assist you in testing ARM code on Starlet/IOS. For more info, read the following manuals---
NOTE: ONCE AGAIN, this is for those who are already experts in PPC (Broadway) Assembly! This is *not* a "1-0-1" tutorial, this is a "2-0-1" tutorial.
Chapter 1: Intro, Understanding the Starlet and Broadway Boot Sequences
While Broadway is the CPU that runs the Wii's Games, Channels, HBC apps, etc, there is another CPU in the background that actually is the 'Master' CPU. It is an ARM9 (arm926ej-s) core nicknamed "Starlet". Its official name is "IOP" which stands for Input-Output Processor. Starlet uses the ARMv5 (specifically ARMv5tej) ARM Assembly language.
When you power on the Wii, Starlet boots from an internal MASK ROM (boot0). The following boot sequences are performed~
boot0: Will decrypt and verify (via checksum) the first 48 blocks of the NAND (these blocks are boot1's Code). The generated checksum is verified against what is stored in the OTP (One Time Programmable Register). If the checksums don't match, boot0 will halt. Contents of the first 48 blocks of NAND are guaranteed by the manufacturer. Older boot1's had a strncmp bug in them which allowed the installation of Bootmii (as boot2).
boot1: Contains ARM Code that will initialize the GDDR3 memory, setup some Hollywood/Hardware Registers, and check the boot2 version number that is in the SEEPROM. If that number in SEEPROM is higher than what is in the TMD of boot2, then boot1 will halt. Otherwise, boot1 will load up boot2 and then proceed to execute the ARM code in boot2. There are two copies of boot2 in the NAND just in case one copy gets corrupted.
boot2: A stripped down IOS with a small amount of tasks. boot2 will check the Wii Menu's TMD (Title Metadata) to know which IOS to load for Starlet to run. Boot2 will then load up the System Menu's IOS into memory, and then IOS will be 'booted' and take 'control'.
IOS: Also known as boot3. Loads the Wii Menu into memory. Every menu/channel/etc has a "BS1" code that gets placed into memory at 0x00003400. This "BS1" code is a small snippet of PowerPC code, more on that later.
IOS's are single or multiple ELFs of ARM code that Starlet places into memory and runs the code as an "Operating System". Every IOS has a kernel which is placed into SRAM. SRAM is the memory dedicated for Starlet (under normal operations, Broadway cannot access this memory). Most other IOS's will have additional ELFs that are placed into MEM2 as 'modules/plugins'. Boot2 is an IOS with only a bare bones kernel.
There are different types of IOS for different purposes. Example: IOS80 runs Wii Menu 4.3. This means when the Wii Menu (Broadway/PPC) is running, IOS80 (Starlet) is also running in the background.
Once the Wii Menu is in memory, IOS will write some very elementary PPC code at the EXI Boot Base (a group of Hollywood Hardware Registers). Whenever something is written to the EXI Boot Base, it is copied over (automatically by hardware) to physical address 0xFFF00100 (Broadway's Reset Vector Address).
This is the following code (or something very similar, depending on the specific IOS) that is written to the EXI Boot Base~
Code: lis r3, entry@h #entry is some physical mem1 address
ori r3, r3, entry@l
mtsrr0 r3
lis r4, msr@h #msr is the value to give the Machine State Register after the rfi has executed
ori r4, r4, msr@l
mtsrr1 r4
rfi
'entry' is usually the address of 0x00003400. This is the address to the "BS1" boot code that was placed into memory earlier. 'msr' is the Machine State Register value and it is usually zero.
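To see what those seven instructions look like as raw 32-bit words, here is a Python sketch that encodes them for a given entry and msr value. The function name boot_stub is made up, and this is only an encoding of the listing above under the stated typical values. It is not a dump of what any particular IOS actually writes.

```python
def boot_stub(entry: int, msr: int) -> list[int]:
    """Encode the lis/ori/mtsrr0/lis/ori/mtsrr1/rfi sequence as 32-bit words."""
    def high(value):   # equivalent of @h
        return (value >> 16) & 0xFFFF

    def low(value):    # equivalent of @l
        return value & 0xFFFF

    return [
        0x3C600000 | high(entry),  # lis r3, entry@h
        0x60630000 | low(entry),   # ori r3, r3, entry@l
        0x7C7A03A6,                # mtsrr0 r3
        0x3C800000 | high(msr),    # lis r4, msr@h
        0x60840000 | low(msr),     # ori r4, r4, msr@l
        0x7C9B03A6,                # mtsrr1 r4
        0x4C000064,                # rfi
    ]


# Typical values from the text: entry = 0x00003400, msr = 0.
for word in boot_stub(0x00003400, 0):
    print(f"{word:08X}")
```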
After writing to the EXI Boot Base, IOS will power on the Broadway (via poking some Hollywood Registers). The Broadway chip boots up and is now running.
When Broadway is powered on from a Hard Reset, its MSR value is 0x00000040. The only bit high is the IP bit (exception prefix), which means exceptions use the addressing mode 0xFFFx_xxxx. This is why Broadway boots at 0xFFF00100 instead of at 0x00000100.
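The effect of the IP bit on the reset address can be shown with a couple of lines of Python. This is just a sketch of the arithmetic described above, and the constant and function names are made up for the example.

```python
MSR_IP = 0x00000040          # exception prefix bit of the MSR
RESET_VECTOR_OFFSET = 0x100  # offset of the reset vector


def exception_base(msr: int) -> int:
    """Exceptions are vectored to 0xFFFx_xxxx when IP is high, else to 0x000x_xxxx."""
    return 0xFFF00000 if msr & MSR_IP else 0x00000000


def reset_vector(msr: int) -> int:
    return exception_base(msr) + RESET_VECTOR_OFFSET


print(hex(reset_vector(0x00000040)))  # MSR after a Hard Reset
print(hex(reset_vector(0x00000000)))  # MSR with IP low
```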
Broadway executes the code starting at 0xFFF00100 and will jump to 0x00003400 with an MSR value of zero. Broadway is now at the "BS1" code. The BS1 contains code that will do some basic tasks such as setup some Broadway specific registers, adjust the MSR yet again, configure the Cache, and configure the BAT registers.
Once that has been completed, execution of Broadway will jump to 0x81330000 (now running in Virtual Cached Mode) and the Wii Menu is now officially running.
It's important to note that once Broadway has been powered on, IOS will recede into a very small loop waiting for tasks/interrupts that are called on by Hardware or Broadway (such as an IPC request for modifying a file on the NAND). When Broadway is running, Starlet (IOS) is running in this loop 99.9% of the time, basically doing nothing.
Chapter 2: Memory, Modes, Instruction Sets, Endianness, Registers
Memory regarding MEM1 and MEM2 (aka Broadway's memory) is straightforward. When referencing this memory in your ARM instructions, you would just use the physical address (i.e. 0x00001500). For Hollywood Registers, just use the physical address equivalent but with bit 8 set high (bit 8 as counted PowerPC-style from the most significant bit, which is the mask 0x00800000, i.e. 0x0D800064). However, using SRAM (Starlet's dedicated memory) is confusing....
Without getting into the nitty gritty of various SRAM mappings that can be utilized, SRAM is configured as such whenever a Wii app/game/etc is running...
- 0xFFFE0000 thru 0xFFFE7FFF = SRAM B aka SRAM 1 (32KB size)
- 0xFFFE8000 thru 0xFFFEFFFF = Junk, literally junk values, has no meaning or effect
- 0xFFFF0000 thru 0xFFFFFFFF = SRAM A aka SRAM 0 (64KB size)
The IOS Kernel executes code and function calls located at SRAM A (similar to how Wii games execute code at Mem80). Certain parts of SRAM A and all of SRAM B is used as Data/Storage (similar to mem9 on Broadway).
The above addresses are "Mirrors". They are not "real" addresses per se. Also, "Mirrors" are **not** Virtual Addresses. The Mirrors are done by hardware, not software. These Mirrors were implemented to comply with Starlet's hardware requirements for addressing in regards to resets and exceptions.
These are the physical/real/true ranges of SRAM (when Wii games/etc are running):
- 0x0D4E0000 thru 0x0D4E7FFF = SRAM B
- 0x0D4E8000 thru 0x0D4EFFFF = Junk
- 0x0D4F0000 thru 0x0D4FFFFF = SRAM A
Therefore, to change a physical address to its Mirrored equivalent, just add 0xF2B00000.
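If you'd rather not do the hex math by hand, here is a minimal Python sketch of the conversion. It only uses the ranges and the 0xF2B00000 offset listed above; the function names are made up for illustration and are not part of any IOS tool.

```python
# Sketch only: the offset and ranges come from the lists above.
# Function names are hypothetical, chosen just for this example.
MIRROR_OFFSET = 0xF2B00000
SRAM_PHYS_START = 0x0D4E0000
SRAM_PHYS_END = 0x0D4FFFFF

def phys_to_mirror(addr):
    """Convert a physical SRAM address to its mirrored address."""
    if not SRAM_PHYS_START <= addr <= SRAM_PHYS_END:
        raise ValueError("not a physical SRAM address")
    return (addr + MIRROR_OFFSET) & 0xFFFFFFFF

def mirror_to_phys(addr):
    """Convert a mirrored SRAM address back to its physical address."""
    phys = (addr - MIRROR_OFFSET) & 0xFFFFFFFF
    if not SRAM_PHYS_START <= phys <= SRAM_PHYS_END:
        raise ValueError("not a mirrored SRAM address")
    return phys

print(hex(phys_to_mirror(0x0D4F0000)))  # start of SRAM A
```

The range check is there so a MEM1/MEM2 address doesn't get "converted" by accident, since the mirrors only exist for SRAM.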
Regarding the IOS Kernel, it is the highest level of security of IOS and has all privileges. Think of it like the "inner/root core" of the entire IOS. IOS interacts with the AES, SHA, and HMAC engines from the Kernel. When IOS idles while the PPC is running, it is idling via the 'main/idle' loop (called thread 0) that resides in the kernel.
The rest of IOS, what is known as the Modules/Plugins, is located in MEM2 at 0x13400000 thru 0x13FFFFFF (12MB). Under normal conditions, Broadway cannot write to this region of memory.
When an exception occurs in Starlet/IOS, memory mapping starts at 0xFFFF0000. More on exceptions in the next Chapter.
Starlet can execute in 3 different modes---
- ARM Mode (all exceptions run in ARM Mode)
- Thumb Mode
- Jazelle Mode
ARM Mode is the standard 32-bit mode. Instructions are 32-bits in size. All instructions are available for usage.
Thumb Mode is when the processor will run 16-bit sized instructions. This is a reduced instruction set due to the 16-bit size of Thumb instructions. This is also known as Thumbv1 Mode in newer/modern ARM manuals. Starlet runs in this mode for the main loop of the Kernel.
Jazelle Mode is when the processor executes in a mode tailored for a Java Virtual Machine. Starlet *NEVER* runs in this mode. Therefore, it won't be covered.
Unlike other ARM cores, Starlet runs in Big Endian mode by default, but can be configured to run in Little Endian mode. This was done for compatibility with the Broadway chip.
In the Manuals linked in the first Chapter, the bit labeling is done in Little Endian 'fashion'. The most significant (far left) bit is named as bit 31, and least significant (far right) bit is named as bit 0.
Starlet has absolutely zero Floating Point functionality. Starlet does implement its own Cache and address translation.
Register List of Starlet (in ARM Mode)
- r0 thru r3 = Volatile Registers w/ r0 being used for function return values (think of these like r3 thru r10 of Broadway)
- r4 thru r10 = Non-Volatile Registers (think of these like r14 thru r31 of Broadway)
- r11 = Frame Pointer (points to bottom of current Frame; more on this in Chapter 12)
- r12 = Scrap register (similar to r0 of Broadway but without the stupid literal zero rule)
- r13 = Stack Pointer (like r1 for Broadway)
- r14 = Link Register (unlike Broadway, you can directly read/write to the LR)
- r15 = Program Counter (unlike Broadway, you can read/write to the PC)
- CPSR (Current Program Status Register)
Register List of Starlet (in Thumb Mode)
- r0 thru r7 only (r0 - r3 volatile, r4 - r6 non-volatile, r7 scrap)
- SP, LR, PC, CPSR
CPSR Breakdown (NOTE: Reverse the bit numbers if referencing an ARM manual!!!)
- Bit 0 = Negative/Less-Than; flipped high if a comparison results in a negative number
- Bit 1 = Zero; flipped high if a comparison results in a zero value
- Bit 2 = Carry/Borrow/Extend; flipped high if an unsigned addition went above its max and wrapped around. For subtractions and compares, it is flipped high when *no* borrow occurred.
- Bit 3 = Overflow; flipped high if the signed result of an instruction cannot be represented in 32 bits
- Bit 4 = Saturation; flipped high if an overflow and/or Saturation occurs for certain DSP-oriented instructions
- Bits 5 and 6 = Reserved
- Bit 7 = Jazelle Enable; read only. When high, Starlet runs in a super watered-down 8-bit variable mode. Do not write to this to enable/disable Jazelle
- Bits 8 thru 23 = Reserved
- Bit 24 = IRQ Disable (aka disable interrupts)
- Bit 25 = FIQ Disable
- Bit 26 = Thumb Enable; read only. When high, Starlet runs in a faster but limited mode where instructions (that can be used) are 16-bits in size. Do not write to this to enable/disable Thumb
- Bits 27 thru 31 = Processor Mode (more info on this in the next Chapter)
Chapter 3: Exceptions and Starlet Syscalls
Here's a list of all Exceptions with their Exception Vector Address plus a short Description....
- 0xFFFF0000 Reset Vector (for resets or cold boots)
- 0xFFFF0004 Undefined/Illegal Instruction (similar to a Program Exception in PPC)
- 0xFFFF0008 Software Interrupt (called whenever swi instruction is executed; similar to a System Call exception in PPC)
- 0xFFFF000C Prefetch/Instruction Abort (similar to a ISI exception in PPC)
- 0xFFFF0010 Data Abort (similar to a DSI exception in PPC)
- 0xFFFF0014 Reserved
- 0xFFFF0018 IRQ (interrupt request; similar to External Interrupt exception in PPC)
- 0xFFFF001C FIQ (fast interrupt)
At each Exception Vector Address is a single instruction that changes the Program Counter to automatically branch to the actual code/program that handles said exception. Before discussing Exceptions in further detail, we need to cover the various Processor Modes. Here is a list of all 7 of them...
CPSR Processor Mode bit value breakdown:
- 0x10 = User Mode
- 0x11 = Fast Interrupt Request Mode
- 0x12 = Interrupt Request Mode
- 0x13 = Supervisor Mode
- 0x17 = Abort Mode
- 0x1B = Undefined Mode
- 0x1F = System Mode
All modes except User mode are known as Privileged Modes. All Privileged Modes except System Mode are known as Exception Modes. User mode is the most restrictive. The only way for Software to exit out of User mode is through an Exception and the contents of that Exception contain instructions to switch to a Privileged Mode once the Exception has been finished. You can 'manually' enter into any Exception Mode with some CPSR related instructions, but Starlet is usually only in one of these modes due to an actual Exception occurring.
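To tie the mode values and the Privileged/Exception wording together, here is a small Python sketch that picks apart a CPSR value. Heads up: it uses the standard ARM manual bit numbering (bit 0 = least significant), so IRQ Disable is 0x80, FIQ Disable is 0x40, and Thumb is 0x20. Undefined Mode is 0x1B per the ARM architecture manual. The function name and the dictionary layout are made up for this example.

```python
# Sketch only: describe_cpsr and its return layout are hypothetical.
MODES = {
    0x10: "User", 0x11: "FIQ", 0x12: "IRQ", 0x13: "Supervisor",
    0x17: "Abort", 0x1B: "Undefined", 0x1F: "System",
}

def describe_cpsr(cpsr):
    """Decode the mode field and the I/F/T bits of a 32-bit CPSR value."""
    name = MODES.get(cpsr & 0x1F, "Invalid")
    return {
        "mode": name,
        # Everything except User mode is a Privileged Mode.
        "privileged": name not in ("User", "Invalid"),
        # Privileged Modes other than System are Exception Modes.
        "exception_mode": name not in ("User", "System", "Invalid"),
        "irq_disabled": bool(cpsr & 0x80),
        "fiq_disabled": bool(cpsr & 0x40),
        "thumb": bool(cpsr & 0x20),
    }

print(describe_cpsr(0x000000D3))
```

The value 0xD3 used in the print is just a sample: Supervisor Mode with both interrupt types disabled, running ARM instructions.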
When any exception occurs, the following takes place...
1. Exception Mode enabled and execution of Starlet is now in Physical Memory
2. Banked Registers enabled
3. SPSR Register enabled (what was in the CPSR, before the exception, is now copied into the SPSR)
4. CPSR value changed depending on type of Exception taken
1. The type of Exception Mode that gets set depends on the type of Exception that occurred. Here is a list...
- Reset = Supervisor Mode
- Undefined/Illegal = Undefined Mode
- Software Interrupt = Supervisor Mode
- Prefetch Abort = Abort Mode
- Data Abort = Abort Mode
- IRQ = Interrupt Request Mode
- FIQ = Fast Interrupt Request Mode
2. Banked Registers are certain Registers that are only visible/usable during the Exception Modes. The type and amount of Banked Registers allowed for usage depends on the type of Exception that was taken. In every Exception, SP (r13), and LR (r14) are banked. This means you are now using a different version of SP & LR. The original SP & LR are preserved and can only be used again once you leave the Exception. This is a crafty method implemented in ARM to easily save important data without the need to 'manually' save it. The FIQ Exception Mode is the only oddball. It will also include banking registers r8 thru r12.
Each mode has its own banked versions of SP and LR. Thus, there are 6 versions of SP & LR. The regular version, and the versions present for the 5 Exception Modes. Please note that User mode and System Mode do ***NOT*** bank any registers!
The banked LR will be set depending on the type of Exception that occurred (except for Reset, it will be an unpredictable value). Refer to chapter A2.6 of the ARMv6 manual for details of every banked LR value for each Exception.
All banked SP values are set by IOS sometime during initial environment setup/config. In order to change a banked SP, let's say the Banked IRQ SP register, you must be in IRQ Exception Mode.
3. The SPSR (Saved Program Status Register) gets filled with the CPSR's value that was present before the Exception was taken. Each Exception Mode has its own banked version of the SPSR. In totality, there are 5 SPSR's.
4. The CPSR will have some values changed. In every exception, T (thumb) and J (jazelle) bits are cleared. Thus, every exception executes in ARM mode! For every exception except FIQ and Reset, the FIQ bit is left unchanged. For FIQ and Reset, it is set high (FIQ disabled). For every exception, IRQ bit is set high (IRQ disabled).
A picture is worth a 1000 words. Here is a handy diagram of the Banked Registers~
When an exception has ended, SPSR is copied over to the CPSR. Therefore, whatever mode (User or System) you were in beforehand, is now restored. SP and LR (plus r8 thru r12 for FIQ) original versions/values are also restored.
For every IOS, the Undefined/Illegal instruction exception vector goes to the address of 0xFFFF1F24. This is the location of IOS's specialized Syscall Handler. There are 2 types of Syscalls for IOS.
Regular ARM syscalls:
This uses the swi instruction. Whenever Starlet executes the swi instruction, an exception will occur and Starlet will start execution at 0xFFFF0008.
IOS syscalls:
IOS handles various illegal instructions to represent special syscalls that will perform certain tasks. The 'base' syscall instruction has the compiled form of 0xE6000010. Each syscall has a number, starting at syscall 0. You insert the number into the instruction by shifting the syscall number to the left by 5 bits then logically ORing it into 0xE6000010.
Example (syscall 0x54)
- Shift 0x54 by 5 bits to the left. 0x54 => 0xA80
- Logically OR 0xA80 with 0xE6000010
- The result is 0xE6000A90.
Therefore to 'call' syscall 0x54, you would use the instruction of 0xE6000A90. Keep in mind some modules/plugins cannot execute certain syscalls due to lack of privilege. Syscall 0x54 can only be called from the Kernel or the /dev/es module.
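The shift-and-OR steps above can be sketched in a couple lines of Python. The function name is made up for this example; the only facts used are the 0xE6000010 base and the 5-bit shift from the text.

```python
# Sketch only: encode_ios_syscall is a hypothetical helper name.
SYSCALL_BASE = 0xE6000010

def encode_ios_syscall(number):
    """Build the 'illegal instruction' word that represents an IOS syscall."""
    if number < 0:
        raise ValueError("syscall numbers start at 0")
    # Shift the syscall number left by 5 bits, then OR it into the base.
    return SYSCALL_BASE | (number << 5)

print(hex(encode_ios_syscall(0x54)))
```

You would then place the resulting word in your source with a .long directive, since a regular assembler has no mnemonic for it.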
The Syscall Program Handler is located at 0xFFFF1F24. Starlet will execute this handler to make a new stack frame to backup registers, check if the instruction is a special syscall instruction, and then use a lookup table to determine which address/function to call based on the syscall number.
Link to WiiBrew article of syscalls and a decomp of the syscall handler - https://wiibrew.org/wiki/IOS/Syscalls
Chapter 3: Immediate Value Rules
While PPC has a signed or unsigned 16-bit immediate value range for instructions, which makes it super simple to write any 32-bit value in a register (lis+ori), the way the ARMv5 language implements immediate values can be a pain in the ass.
To load a value 'from scratch' into a register in ARM, you will typically use the mov instruction. Example~
Code: mov r0, #1 @To write comments, use the '@' symbol.
When writing any immediate value in any ARM instruction, it *MUST* have a hashtag sign pre-pended. Notes/comments are designated via the '@' symbol.
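You can use any value that can be expressed as an 8-bit field placed within a 32-bit 'space'. For example, the value 0xFF000000 is a legal immediate value. 0xFF fits into 8 bits. If the value consists of 2 or more binary 1's, the first and last binary 1 must fit within an 8-bit field. On top of that, the 8-bit field can only be rotated into place by an even number of bits (see the formula further below), so a value like 0x1FE (0xFF shifted left by 1) is invalid even though its 1's span only 8 bits.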
Therefore, something like the value of 0x000001C1 is invalid. 0x1C1 cannot be expressed in 8 bits.
Examples~
- 0 = valid
- 0x1 = valid
- 0xF7 = valid
- 0x17800000 = valid
- 0x0003FC00 = valid
- 0x101 = invalid
- 0x100C0045 = invalid
0xFFFFFFFF (-1) is an invalid value. However, there is an instruction called mvn (Move then Logically NOT). It is the same as mov but will Logically-NOT the value. Therefore, to load 0xFFFFFFFF into a register, you can do this...
Code: mvn r3, #0 @Loads zero then does a logical NOT of it, resulting in -1.
Therefore, for any 8-bit number in the allowed range, its "logical NOT'd" value is also valid.
Code: mvn r3, #255 @Loads 0xFF then does a logical NOT of it, resulting in 0xFFFFFF00 (-256)
Each 32-bit ARM instruction has 8 bits within them for an initial value and 4 bits as a rotation mechanism. If you want to learn how instructions actually process immediate values, formula is listed below.
V = n ror (2*r)
V = Immediate Value
n = 8-bit Initial Value
r = 4-bit Rotation Value
ror = Rotate Right (this is ROTATING, *not* shifting)
Example~
- Use an 8-bit value (n) of 0xFF
- Use a 4-bit value (r) of 0xB
First, take 0xB and multiply it by 2. You will get 0x16 which is 22 in decimal.
Now take the value of 0xFF and rotate it to the right by 22 bits.
You get the result of 0x0003FC00. This value is safe to use in the mov instruction
In case you are lazy and would rather just type a number into a script/program to check its validity, there's a person named Azeria who has made such a script.
Link - https://raw.githubusercontent.com/azeria...rotator.py
Enter in the number, in decimal form, script will tell you if you can use it or not.
If the number is invalid (and the mvn instruction won't work either), you will need to break it apart via multiple instructions. For example, the value of 511 is an invalid immediate value. Therefore you would do this to load it into, let's say, r0.
Code: mov r0, #255 @Hex is 0xFF
add r0, r0, #256 @Hex is 0x100. 255 + 256 = 511. r0 now = 511
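The immediate value rules apply to ***ALL*** data-processing instructions (add, sub, cmp, and, orr, etc.), not just mov. The offsets in load/store instructions are the exception; those are plain unsigned numbers (up to 0xFFF for ldr/str/ldrb/strb, up to 0xFF for the halfword and signed loads) that get added or subtracted.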
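If you want to check values yourself, here is a minimal Python sketch of the V = n ror (2*r) rule. This is my own take, not Azeria's script; the function names are made up for this example. It tries all 16 rotation values and reports whether a plain mov works, whether mvn works, or whether you need to split the value up.

```python
# Sketch only: helper names are hypothetical.
def rol32(value, bits):
    """Rotate a 32-bit value left (this undoes a rotate right)."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & 0xFFFFFFFF

def encode_arm_immediate(value):
    """Return (n, r) so that value == n ror (2*r), or None if impossible."""
    value &= 0xFFFFFFFF
    for r in range(16):
        n = rol32(value, 2 * r)
        if n <= 0xFF:
            return n, r
    return None

def how_to_load(value):
    """Return 'mov', 'mvn', or 'split' for a 32-bit value."""
    if encode_arm_immediate(value) is not None:
        return "mov"
    if encode_arm_immediate(~value) is not None:
        return "mvn"
    return "split"

for v in (0x0003FC00, 0x101, 0xFFFFFFFF, 511):
    print(hex(v), how_to_load(v))
```

A result of 'split' means you are stuck doing something like the mov + add pair shown above.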
Chapter 4: Multiple Input Variations for same Instruction
For Broadway, there are many different versions of instructions, such as differences for an instruction when it uses two source registers vs the type of instruction that use 1 source and 1 immediate value (i.e. add vs addi)
However for ARM, this is not the case. The add instruction can be written out in different forms. Instead of having add,addi,etc instructions, you just have add.
Example~
Code: add r0, r3, r4 @Add two source registers to place result in destination register
add r0, r3, #100 @Add source register and immediate value to place result in destination register
Chapter 5: Compares, Branches, and Conditional Instruction Execution
Compare and branch instructions work similarly to PPC. You can use label names just like how you did with PPC. There are a slew of simplified mnemonic conditional branches that can be used.
Example~
Code: cmp r4, #100 @Check r4 vs 100
beq some_label
Another Example~
Code: cmp r4, r12 @Check r4 vs r12
bgt some_label
How Compare Instructions actually operate:
When a comparison (cmp) instruction is executed, the second operand (Source Register or Immediate Value) is subtracted *FROM* the first register. For example..
Code: cmp r4, r12
Will do r4 minus r12. The result of this subtraction will flip the appropriate CPSR bits high (i.e. Negative/Less Than Bit).
ARM's cmp instruction doesn't allow specification of a signed vs unsigned comparison. However, there are a wide variety of branch instructions to compensate for this...
Here are all the Conditional Branch options.
- EQ = Equal
- NE = Not Equal
- GT = Greater Than (signed)
- LT = Less Than (signed)
- GE = Greater Than (signed) or Equal
- LE = Less Than (signed) or Equal
- HS = Unsigned Higher or Same
- LO = Unsigned Lower Than
- MI = Negative
- PL = Positive or Zero
- VS = Signed Overflow
- VC = Not Signed Overflow
- HI = Unsigned Higher
- LS = Unsigned Lower or Same
- CS = Carry Bit Set; same thing as HS
- CC = Carry Bit Clear; same thing as LO
- AL = Always (same as just a regular Branch - B)
Example~
Code: bhs some_label @branch to some_label if result unsigned higher or same
Unlike PPC, there are no branch hints you can use. There is also no dedicated Condition Register (let alone multiple CRs) for Starlet.
Almost all ARM instructions can have conditional operations applied to them too.
Example:
Code: cmp r10, r11 @Compare r10 vs r11
addne r3, r4, r5 @If r10 doesn't equal r11, execute the addition of r4 + r5 and place the result into r3.
Another example:
Code: cmp r7, #0x400 @Compare r7 vs 0x400
movlt r0, #1 @Set r0 to 1 if r7 is less than 0x400
ARM also comes with the cmn instruction. It's basically the opposite of cmp. Instead of a subtraction being done for the operation, an addition is performed.
Example:
Code: cmn r10, #0xC10
This would do the operation of r10 + 0xC10 to set the appropriate CPSR bits.
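For anyone who wants to see exactly which flag bits a compare sets, here is a rough Python model of cmp and cmn. It's only a sketch of the flag math (N, Z, C, V) on 32-bit values; the function names are made up. Note the ARM quirk for subtractions: Carry is high when *no* borrow happened.

```python
# Sketch only: cmp_flags / cmn_flags are hypothetical helper names.
MASK = 0xFFFFFFFF

def _sign(value):
    return (value >> 31) & 1

def cmp_flags(rn, op2):
    """Flags produced by 'cmp rn, op2', which computes rn - op2."""
    rn &= MASK
    op2 &= MASK
    result = (rn - op2) & MASK
    return {
        "N": _sign(result),
        "Z": int(result == 0),
        "C": int(rn >= op2),  # high means no borrow occurred
        "V": int(_sign(rn) != _sign(op2) and _sign(result) != _sign(rn)),
    }

def cmn_flags(rn, op2):
    """Flags produced by 'cmn rn, op2', which computes rn + op2."""
    rn &= MASK
    op2 &= MASK
    total = rn + op2
    result = total & MASK
    return {
        "N": _sign(result),
        "Z": int(result == 0),
        "C": int(total > MASK),  # high means the addition wrapped around
        "V": int(_sign(rn) == _sign(op2) and _sign(result) != _sign(rn)),
    }

print(cmp_flags(4, 5))
```

The conditional branches just test these bits. For example, beq checks Z, bhs checks C, and blt checks whether N and V differ.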
Chapter 6: Logical Operations plus additional Logical Shift/Rotation Feature
ARM comes with some instructions for logical operations. They are all pretty plain jane. Therefore, there is no need to really deep dive into them.
- orr = Logical OR
- and = Logical AND
- eor = Logical XOR
- bic = Bit Clear
- lsl = Shift Left
- lsr = Shift Right
- asr = Algebraic Shift Right (real name is Arithmetic Shift Right; just calling it Algebraic for those familiar with PPC)
- ror = Rotate Right
To mimic...
nand rD, rA, rB
Use....
and rD, rA, rB
mvn rD, rD
To mimic...
nor rD, rA, rB
Use...
orr rD, rA, rB
mvn rD, rD
To mimic...
not rD, rA
Use...
mvn rD, rA
To mimic...
rlwinm rD, rA, 0, 0x00008000 #Big Endian bit 16
Use...
and rD, rA, #0x00008000 @Keep only bit 16 of rA; use bic instead if you want to clear that bit and keep the rest
To mimic...
rlwimi rD, rA, 0, 0x00008000 #Big Endian bit 16
Use...
bic rD, rD, #0x00008000 @Erase previous bit 16 value, whatever it was, in rD
and rA, rA, #0x00008000 @Erase all previous bit values except bit 16 in rA
orr rD, rD, rA @Now OR the two registers together to insert rA's bit 16 into rD
Most instructions also allow the usage of an additional shift/rotation on their last source register before the instruction continues further calculations. No, you cannot use this feature on an immediate value.
Example:
Code: add r3, r4, r5, lsl #2
This will perform a 2-bit lefthand shift of r5's value *BEFORE* adding it to r4.
Another example:
Code: cmp r0, r3, lsr #27
This will perform a 27-bit righthand shift of r3's value *BEFORE* r0 is compared to it.
Here is a list of all operations you can use:
- lsl = shift left by bit amount
- lsr = shift right by bit amount
- asr = algebraic shift right by bit amount (similar to a srawi ppc instruction)
- ror = rotate right by bit amount
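Here is a quick Python sketch of what those four operations do to a 32-bit value, plus the add-with-shift idea from the example. It assumes shift amounts of 0 thru 31 and the helper names are made up for illustration.

```python
# Sketch only: models 32-bit registers with plain Python integers.
MASK = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left; bits pushed past bit 31 are lost."""
    return (value << amount) & MASK

def lsr(value, amount):
    """Logical shift right; zeros come in from the left."""
    return (value & MASK) >> amount

def asr(value, amount):
    """Arithmetic shift right; copies of the sign bit come in from the left."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> amount) & MASK

def ror(value, amount):
    """Rotate right; bits leaving the right side re-enter on the left."""
    amount %= 32
    value &= MASK
    return ((value >> amount) | (value << (32 - amount))) & MASK

def add_with_lsl(r4, r5, amount):
    """Models 'add r3, r4, r5, lsl #amount' and returns the new r3."""
    return (r4 + lsl(r5, amount)) & MASK

print(hex(add_with_lsl(0x1000, 3, 2)))
```

The shift only affects the value fed into the instruction. The register being shifted (r5 in the add example) keeps its original value.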
Chapter 7: Basic Loads & Stores
Just like with PPC, in order to modify contents in memory, it must be loaded into a register, the register modified, then stored back to memory. Instead of writing the source register in parenthesis, it must be enclosed within brackets.
Example:
Code: ldr r8, [r2] @Load the word at address in r2, place into r8
ldr is the default load instruction, it loads a word.
ldrh = Load Halfword
ldrsh = Load Halfword Signed (similar to lha instruction in PPC)
ldrb = Load Byte
ldrsb = Load Byte Signed
str is the default store instruction, it stores a word.
strh = Store Halfword
strb = Store Byte
There are no signed store instructions (strsh and strsb do not exist). A store only writes the lower 16 or 8 bits of the register, so there is nothing to sign extend. If you need the sign-extended 32-bit value in memory, sign extend it in the register first (like a PPC extsh followed by a stw) and then use str.
Example of load instruction using a source register and immediate value
Code: ldr r8, [r2, #0x10] @Load the word at address r2+0x10, place into r8
As you can see in the above example, the immediate value goes at the tail end of the instruction. Here's an example of a store halfword instruction that uses two source registers~
Code: strh r0, [r0, r1] @halfword of r0 is stored at address designated by r0+r1.
This would be similar to the sthx PPC instruction.
Chapter 8: Program Counter Details and Literal Pools
Unlike PowerPC, in ARMv5, the Program Counter (PC) is a General Purpose Register. You can freely read/write to it. Therefore, you can also use it as a loading/storing reference (PC + offset value).
In ARM Mode, the PC is the current instruction + 8.
In Thumb Mode, the PC is the current instruction + 4.
This is because the PC is always 2 instructions ahead of the current instruction that is going to execute.
Example (ARM mode assumed):
Code: ldr r0, [pc]
add r1, r2, r3
nop
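In the above example, when the ldr instruction executes, r0 will be loaded with the compiled form of the nop. That is "0xE320F000" if your assembler uses the dedicated nop encoding; an assembler strictly targeting ARMv5 emits "0xE1A00000" (mov r0, r0) instead.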
Another Example (ARM mode assumed):
Code: ldr r0, [pc, #0xC]
add r1, r2, r3
nop @PC + 0
nop @PC + 4
nop @PC + 8
bic r5, r6, #0xF @PC + 0xC
In the above example, the compiled form of the bic instruction will be loaded into r0 for the ldr instruction. To load exactly where you are currently at, in ARM mode, you would need to load at PC minus 0x8.
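The PC math is easy to get wrong, so here is a tiny Python sketch of it. The function names are made up, and 0x00001500 is just a pretend address for where the ldr sits. The second helper only covers ARM mode.

```python
# Sketch only: helper names and the sample address are hypothetical.
def pc_value(instruction_address, thumb=False):
    """Value the PC reads as while the instruction at this address executes."""
    return instruction_address + (4 if thumb else 8)

def pc_relative_target(instruction_address, offset):
    """Address accessed by 'ldr rX, [pc, #offset]' in ARM mode."""
    return pc_value(instruction_address) + offset

ldr_address = 0x00001500
print(hex(pc_relative_target(ldr_address, 0xC)))
```

With the ldr at 0x00001500, an offset of 0xC lands on 0x00001514, which is the sixth instruction slot, just like the bic in the example above.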
With the PC being a general purpose register, we can use it as an alternative method to PPC style BL Tricks.
Example (ARM mode assumed; pretend first instruction resides at memory address 0x00001500):
Code: ldr r0, [pc, #0x8]
add r1, r2, r3
nop
b the_end
@Address of test string
.long 0x00001514
@Test string, located at 0x00001514
.asciz "This is a test."
the_end:
The load instruction is loading the word value located at PC+8, which is 0x00001514. At address 0x00001514 is the string created by the asciz directive. Manually calculating address values in this manner is a pain. Therefore we can use what are called literal pools.
Example:
Code: ldr r0, =Test
add r1, r2, r3
nop
b the_end
Test:
.asciz "This is a test."
the_end:
HOWEVER, this is nothing more than a compiler trick! Whatever compiler (or instruction simulator) you are using, it will auto add the ".long 0xXXXXXXXX" somewhere in your source (usually at the end). The literal pool trick will not work for something such as manually writing over instructions in IOS. You may need to use a BL-Trick Lookup Table instead.
Chapter 9: Pre & Post Indexing of Loads/Stores
The ARM language doesn't have 'update' versions of loads/store instructions like with PPC. However the use of pre/post indexing can mimic this.
To mimic a PPC stbu instruction, you can do this...
Code: strb r4, [r0, #0x20]! @Take note of the "!" appended after the bracketed contents
The byte of r4 is stored to r0+0x20, *THEN* r0 is incremented by 0x20.
You also can do what is called post indexing. Example~
Code: ldr r1, [r6], #4
This will load the word located at r6 into r1. It is ***NOT*** loaded at r6+4. Once the word has been loaded, *THEN* r6 is incremented by 4. As you can see this is different than pre indexing.
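To make the difference between the three addressing forms concrete, here is a small Python model. Registers and memory are just dictionaries here, and the helper names are made up for this sketch.

```python
# Sketch only: regs maps register names to values, mem maps addresses to words.
def ldr_offset(regs, mem, rd, rn, imm):
    """ldr rd, [rn, #imm]  -> load from rn+imm, rn is left alone."""
    regs[rd] = mem[regs[rn] + imm]

def ldr_pre(regs, mem, rd, rn, imm):
    """ldr rd, [rn, #imm]! -> load from rn+imm, rn ends up as rn+imm."""
    regs[rn] += imm
    regs[rd] = mem[regs[rn]]

def ldr_post(regs, mem, rd, rn, imm):
    """ldr rd, [rn], #imm  -> load from rn, THEN rn becomes rn+imm."""
    address = regs[rn]
    regs[rd] = mem[address]
    regs[rn] = address + imm

mem = {0x100: 0xAAAA, 0x104: 0xBBBB}
regs = {"r1": 0, "r6": 0x100}
ldr_post(regs, mem, "r1", "r6", 4)
print(hex(regs["r1"]), hex(regs["r6"]))
```

Stores work the same way, just with the data moving in the other direction.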
Chapter 10: Multi Loading/Storing
ARM can't exactly replicate how PPC does multi loads/stores but it does come with some unique instructions that PPC cannot do. First things first, ARMv5 is only capable of doing multi loads/stores for word values only. You will use ldm for basic multi loading.
Example:
Code: ldm r1, {r3,r4,r5,r6}
This instruction will do the following...
- Load word at r1 + 0 into r3
- Load word at r1 + 4 into r4
- Load word at r1 + 8 into r5
- Load word at r1 + 12 into r6
Obviously this differs quite a bit from a PPC lmw instruction. You will also notice the source registers are enclosed in curly brackets instead of regular squared brackets. This is required for any multi load/store instruction.
Instead of writing out every register in the source register list, you can do this...
Code: ldm r1, {r3 - r6} @shorter & quicker to write
You can also force an update to r1 after the multi load, like this....
Code: ldm r1!, {r3 - r6}
Take NOTE of the Exclamation Point placed immediately after r1. After the words are loaded into r3 thru r6, r1 is then *incremented* by 16 (0x10). Incrementation is 4 bytes per every source register present in the instruction.
For basic multi storing, you use the stm instruction.
- stm r1, {r3 - r6} @Stores r3 at r1, r4 at r1+4, r5 at r1+8, r6 at r1+12
- stm r1!, {r3 - r6} @Same as above but r1 is incremented by 16 afterwards
There are a variety of extra options for multi loading/storing. Here are 8 more multi load/store instructions....
- ldmia = Load Multi with increase afterwards
- ldmib = Load Multi with increase before
- ldmda = Load Multi with decrease afterwards
- ldmdb = Load Multi with decrease before
- stmia = Store Multi with increase afterwards
- stmib = Store Multi with increase before
- stmda = Store Multi with decrease afterwards
- stmdb = Store Multi with decrease before
ldmia is the same instruction as ldm; ldm is just the simplified/alternative mnemonic. This holds with or without the "!" appended to the base register.
stmia is the same instruction as stm, again with or without the "!".
Examples:
- ldm r1, {r3 - r6} = ldmia r1, {r3 - r6}
- stm r1, {r3 - r6} = stmia r1, {r3 - r6}
- ldm r1!, {r3 - r6} = ldmia r1!, {r3 - r6}
The term "increase" means the loading/storing address is increased during the instruction. "Afterwards" means the incrementation of the loading/storing address (by 4) starts *AFTER* the first load/store.
"Decrease afterwards" is the same as above but the addresses are decreasing instead of increasing.
The term "before" means to increase/decrease the loading/storing address by 4 *BEFORE* the first load/store.
Examples:
- ldmib r0, {r7 - r9} @The address is increased by 4 *FIRST* before the first load (r0 itself is not modified). Thus, word at r0+4 is loaded into r7. Word at r0+8 loaded into r8. Word at r0+12, loaded into r9.
- stmda r0, {r7 - r9} @r9 is stored at r0. r8 is stored at r0-4. r7 is stored at r0-8.
- stmdb r0, {r7 - r9} @The address is decreased by 4 *FIRST* before the first store (r0 itself is not modified). Thus, r9 stored at r0-4. r8 stored at r0-8. r7 stored at r0-12.
You can also have the destination register be updated in these increase/decrease before/after type multi load/stores.
Example:
Code: ldmdb r10!, {r0, r1} @Word at r10-4 loaded into r1. Word at r10-8 loaded into r0. Afterwards, r10's value is decreased by 8 (2 source registers x 4 = 8)
Another Example:
Code: stmia r7!, {r4 - r6} @r4 stored at r7. r5 stored at r7+4. r6 stored at r7+8. Afterwards, r7's value is increased by 12 (3 source registers x 4 = 12)
There are also alternative mnemonics available. Here's a list of them.
- stmfd (store multiple full descending) = stmdb
- stmed (store multiple empty descending) = stmda
- stmfa (store multiple full ascending) = stmib
- stmea (store multiple empty ascending) = stmia
- ldmfd (load multiple full descending) = ldmia
- ldmed (load multiple empty descending) = ldmib
- ldmfa (load multiple full ascending) = ldmda
- ldmea (load multiple empty ascending) = ldmdb
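Since the four ia/ib/da/db variants are easy to mix up, here is a Python sketch that works out which address each register uses and what the base register ends up as. The function name and argument layout are made up for this example. The key rule it models: the lowest numbered register always uses the lowest address, no matter which variant you pick.

```python
# Sketch only: multi_transfer is a hypothetical helper, not an IOS function.
def multi_transfer(mode, base, reg_numbers, writeback=False):
    """Return ([(register, address), ...], final_base) for ldm/stm variants.

    mode is one of 'ia', 'ib', 'da', 'db'.
    """
    regs = sorted(reg_numbers)
    size = 4 * len(regs)
    start = {
        "ia": base,             # increase afterwards
        "ib": base + 4,         # increase before
        "da": base - size + 4,  # decrease afterwards
        "db": base - size,      # decrease before
    }[mode]
    pairs = [(reg, start + 4 * i) for i, reg in enumerate(regs)]
    if not writeback:
        final_base = base
    elif mode in ("ia", "ib"):
        final_base = base + size
    else:
        final_base = base - size
    return pairs, final_base

# stmdb sp!, {lr, pc} with sp = 0x0000C200 (lr is r14, pc is r15)
print(multi_transfer("db", 0x0000C200, [14, 15], writeback=True))
```

The sample call at the bottom is the same operation a push of lr and pc performs, since push is just stmdb sp!.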
Chapter 11: Loops
There is no Count (CTR) register for Starlet. You must use a general purpose register as a loop tracker. Since load/store instructions come with a Post-Indexing feature, you do not need to decrement the load and/or store start addresses beforehand.
Example of Basic Loop:
Code: @Set loop amount
mov r2, #10
@Do the loop
loop:
ldr r1, [r0], #4
str r1, [r8], #4
subs r2, r2, #1
bne loop
The subs instruction will execute a basic sub (subtract) instruction but also update the CPSR flag bits. This is essentially the same mechanism as using the Record (.) shortcut in PPC instructions. Once r2 = 0, the bne branch will not be taken.
You can append many instructions with 's' to force the instruction to update the CPSR flag bits.
Chapter 12: Stack, Prologues, Epilogues
Pushing and popping the stack is a bit different than doing it on PPC. Plus, the layout/structure of the Stack is also different. There are dedicated push and pop mnemonics.
push = stmfd sp! = stmdb sp!
pop = ldmfd sp! = ldmia sp!
Here's an example push instruction that backs up just the LR to a new stack frame (storing of PC is explained later)
Code: push {lr, pc}
It's better to look at this instruction in its stmdb alternative mnemonic form to understand what's going on underneath the hood.
Code: stmdb sp!, {lr, pc}
For starters let's pretend sp's value before the instruction = 0x0000C200. When the instruction executes, it will first temp decrease sp's value by 4. PC is stored at 0x0000C1FC. SP's value is temp decreased by 4 again. LR is stored at 0x0000C1F8. SP's value is then actually decreased by 8, due to 2 source registers in the stmdb instruction. After the instruction has executed SP is now 0x0000C1F8.
PC is stored because Stack Frames in ARM must always have a size divisible by 8. Now to pop this new frame, we do the following...
Code: pop {lr, pc}
Let's look at this pop instruction in its ldmia alternative mnemonic form...
Code: ldmia sp!, {lr, pc}
In this instruction the word at sp+0 is loaded into lr. Then PC is loaded from sp+4. Afterwards, SP's value is increased by 8. We are now 'back' to where we 'left off'.
Or are we? In fact, this is WRONG!!! We are loading PC's old value. The Program Counter keeps track of what's the next instruction that will execute.
By executing that above pop instruction, we would actually "branch" back to whatever instruction is present after the initial push instruction. Needless to say, your program/code will crash/fault because of this.
How do we fix this??? You don't have to fix the push instruction, just fix the pop instruction....
Code: pop {lr} @Grab back old LR, do not grab back the old PC!
add sp, sp, #4 @Need to add 4 to SP to compensate not loading the PC
Now that you understand basic pushing and popping, let's go over how prologues and epilogues are implemented in IOS. There are a plethora of methods (styles) to write prologues and epilogues in ARM. Regardless of the 'style' used, r4 thru r10 are the non-volatile registers aka the global variables. They are the equivalent of r14 thru r31 for Broadway. What's different is that the lower registers are used first instead of the higher ones. For example, in PPC if you make a stack frame and need to use 2 non-volatile registers, you throw r30 and r31 on the stack. In ARMv5 it's backwards. If you need two non-volatile registers, you throw r4 and r5 on the stack, **NOT** r9 and r10.
The majority of functions within IOS use the "Full Descending" type of stack pushing/popping. Which is perfect, because we can use the push and pop mnemonics.
In ARMv5 compliant prologues, r11 is used for what is called the Frame Pointer aka fp. It must point to the bottom of the current Stack Frame.
r12 is known as Inter-Procedural scratch register aka ip. It is used as a scratch register during prologues for the purpose of making a copy of sp immediately before a new frame (push/stmfd instruction) is created.
Example Prologue that's ARMv5 compliant (1 register being saved)
Code: mov ip, sp @Make a copy of soon-to-be old SP
push {r4, fp, ip, lr} @Backup 1 register, old fp, old sp (ip), and lr
sub fp, ip, #4 @Make fp (r11) point to bottom of the newly created frame
FP, IP, and LR (in that order) must always be pushed onto a new frame. At this point the Stack structure is as such...
SP+Offset | Item
SP | r4 (Top of new Frame)
SP+4 | old fp (will have address that points to bottom of old frame)
SP+8 | old sp aka old ip (will have address that points to top of old frame)
SP+0xC | function return lr
SP+0x10 | Top of Old Frame (where old SP is pointing to)
.. .. | Unknown Size
SP+?? | Bottom of Old Frame; old function return lr (where old FP is pointing to)
The current value in new fp (r11) at this state in time would be pointing at the address of SP + 0xC. Onto the epilogue...
And here's the corresponding epilogue~
Code: pop {r4, fp, ip, lr}
bx lr
If you don't need to ensure an instruction set change, then the epilogue can be changed to this...
pop {r4, fp, ip, pc}
For me personally, I think the whole idea of fp is redundant. As long as each frame contains its "old sp", then back-chaining is guaranteed. PowerPC does this right. Here's how I would personally write prologues/epilogues of custom functions.
Example prologue saving 1 register~
Code: push {r4, sp, lr, pc} @PC stored for stack size rules
Epilogue~
Code: pop {r4, sp, lr}
add sp, sp, #4 @Compensate for not including PC in the pop
bx lr
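When modifying IOS, be sure to follow the ARMv5 compliant style. Also be aware that the ARM manual lists a push/pop (stm/ldm with "!") that includes sp in its own register list as unpredictable, so test my personal style carefully before relying on it.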
If you are in the situation where you need extra space allocated in the new frame for something such as an output buffer for a child function (i.e. sprintf), here's an example prologue (2 registers + 0x30 buffer space)...
Code: mov ip, sp
push {r4, r5, fp, ip, lr}
sub sp, sp, #0x34 @Add 0x34 of space *not* 0x30, because we pushed an odd amount of registers onto the frame which would violate stack frame sizing rules
sub fp, ip, #4
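The 0x34 vs 0x30 math in that prologue can be sketched in Python like so. The function name is made up; the only rule being modeled is that the whole frame (pushed registers plus buffer) must have a size divisible by 8.

```python
# Sketch only: padded_buffer_size is a hypothetical helper name.
def padded_buffer_size(pushed_registers, buffer_size):
    """Amount to subtract from sp so the whole frame stays divisible by 8."""
    total = 4 * pushed_registers + buffer_size
    padding = (-total) % 8
    return buffer_size + padding

print(hex(padded_buffer_size(5, 0x30)))
```

With 5 pushed registers (20 bytes) and a 0x30 buffer you get 68 bytes, which is not divisible by 8, so 4 bytes of padding get tacked on. With an even amount of pushed registers, the 0x30 would be fine as-is.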
When ready to set up your output buffer, do this...
Code: mov rX, sp @rX = register being used for output buffer
Finally, here's the respective epilogue..
Code: add sp, sp, #0x34
pop {r4, r5, fp, ip, lr}
bx lr
Chapter 13: Exchanging between ARM and Thumb; Thumb Instruction Set
You have to use the bx, blx, or 'bx lr' instructions to switch between ARM mode and Thumb mode. Do not try to edit the Thumb mode bit in the CPSR. You will cause an exception.
Anyway, the Least Significant Bit (bit 0 in ARM manuals) in the target address of a bx/blx/'bx lr' determines which mode to run. If the LSB is high, Thumb mode will be activated. When it's low, ARM Mode will be activated. It's really that simple.
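As a quick illustration of the LSB rule, here is a Python sketch that tells you which mode a bx/blx target address would select and where execution actually starts. The function name is made up, and the sample address in the print is arbitrary.

```python
# Sketch only: bx_target is a hypothetical helper name.
def bx_target(target):
    """Return (mode, address) selected by a bx/blx to this target value."""
    if target & 1:
        # LSB high: switch to Thumb. The LSB itself is not part of the address.
        return "Thumb", target & ~1
    # LSB low: switch to (or stay in) ARM mode.
    return "ARM", target

print(bx_target(0x00001501))
```

So if you want to call a Thumb function located at some address, you would bx/blx to that address plus 1.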
Since Thumb instructions are limited to 16 bits, the instruction set is reduced. Read up on Chapter A6 of the ARM reference manual. Keep in mind that on chart A6.2.1, any instructions notated with "[2]" are ARMv6 only and do not work on Starlet.
Regarding immediate values, Thumb instructions only allow the basic unsigned range of 0x00 thru 0xFF (0 thru 255).
Chapter 14: Interrupts
To disable interrupts, do this...
mrs rX, cpsr
and rY, rX, #0xC0 @Keep rY somewhere safe
orr rX, rX, #0xC0
msr cpsr_c, rX
To restore them, do this...
mrs rX, cpsr
bic rX, rX, #0xC0
orr rX, rX, rY @rY can be scrapped now
msr cpsr_c, rX
NOTE: To forcefully enable interrupts (instead of restoration), simply remove the ORR instruction from the above restoration example.
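If the bit juggling above isn't obvious, here's a small Python model of it. It only mimics the AND/ORR/BIC arithmetic on a CPSR value (0xC0 = the I and F bits); the function names are hypothetical and nothing here touches real hardware.

```python
IRQ_FIQ_MASK = 0xC0  # CPSR bit 7 (I, IRQ disable) and bit 6 (F, FIQ disable)


def disable_interrupts(cpsr):
    """Return (new_cpsr, saved_bits), mirroring the disable sequence."""
    saved = cpsr & IRQ_FIQ_MASK        # and rY, rX, #0xC0
    new_cpsr = cpsr | IRQ_FIQ_MASK     # orr rX, rX, #0xC0
    return new_cpsr, saved


def restore_interrupts(cpsr, saved):
    """Put the I/F bits back to what they were before disabling."""
    cleared = cpsr & ~IRQ_FIQ_MASK     # bic rX, rX, #0xC0
    return cleared | saved             # orr rX, rX, rY


if __name__ == "__main__":
    cpsr = 0x9F  # System mode with IRQs already masked, FIQs enabled
    masked, saved = disable_interrupts(cpsr)
    print(hex(masked), hex(saved), hex(restore_interrupts(masked, saved)))
```

The point of keeping rY is that restoring doesn't blindly enable interrupts; whatever was already masked before stays masked afterwards.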
Chapter 15: MEM1 Store/Load issues with Starlet
Read this...
https://twitter.com/marcan42/status/1362...47?lang=en
Marcan, the co-creator of HBC and Bootmii, explains it perfectly. Thus, if you are doing any sort of memcpy, memset, memclear, etc., you must ensure every piece of data stored/loaded to/from MEM1 is done as a word value. Not only that, the word value(s) must always be stored/loaded via an address that is divisible by 4.
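As an illustration of the word-only rule, here's a Python sketch of how a single byte can be written using nothing but an aligned word load and an aligned word store (read, modify, write). The bytearray stands in for MEM1 and the helper name is made up; on real hardware you'd do the same thing with ldr/str plus shifts and masks.

```python
def store_byte_wordwise(mem, addr, value):
    """Write one byte at addr using only aligned 32-bit accesses (big-endian)."""
    base = addr & ~3                                   # word-aligned address
    word = int.from_bytes(mem[base:base + 4], "big")   # aligned word load
    shift = (3 - (addr & 3)) * 8                       # byte lane in a big-endian word
    word = (word & ~(0xFF << shift)) | ((value & 0xFF) << shift)
    mem[base:base + 4] = word.to_bytes(4, "big")       # aligned word store


if __name__ == "__main__":
    mem = bytearray.fromhex("1122334455667788")
    store_byte_wordwise(mem, 5, 0xAA)
    print(mem.hex())
```

A word-safe memcpy/memset follows the same idea: handle the unaligned head and tail with read-modify-write, and copy the middle as whole words.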
Chapter 16: Cache, Address Translation, Self Modifying Code
Starlet comes with a single Cache Unit. It is split into a 16KB Instruction Cache and 16KB Data Cache. Also just like Broadway, Starlet uses Cache Blocks of 32-byte size. However, the Cache Blocks are referred to as "Modified Virtual Address", "Lines", or "Single-Entries".
The term "Modified Virtual Address" can be confusing. Basically there is a hardware register (FCSE PID) in Starlet that does another address translation on top of your typical Virtual Address to Physical translation.
Don't fret though, the configuration of the FCSE PID is as such to where the Modified Virtual Address and Virtual Address are always equivalent.
Regarding typical address translation, the kernel and all modules (except ES,FS,STM, and DI) use Identical translation (Virtual Address exactly maps to Physical).
The other modules use a translation scheme in which the Virtual Address is the Physical Address but with the Most Significant Bit set low. For example, physical address 0xFFFE0500 is represented virtually via 0x7FFE0500.
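Here's the same translation rule written out as a quick Python sketch. The module names come straight from the list above; the function itself is hypothetical and only models the "MSB set low" scheme as described, nothing more.

```python
# Modules that do NOT use identical translation, per the text above
NON_IDENTITY_MODULES = {"ES", "FS", "STM", "DI"}


def physical_to_virtual(physical, module):
    """Return the Virtual Address a given module would use for a Physical Address."""
    if module in NON_IDENTITY_MODULES:
        return physical & 0x7FFFFFFF  # Most Significant Bit set low
    return physical                    # kernel and other modules: identical mapping


if __name__ == "__main__":
    print(hex(physical_to_virtual(0xFFFE0500, "ES")))
    print(hex(physical_to_virtual(0xFFFE0500, "KERNEL")))
```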
Going back to the Cache...Starlet doesn't use the MEI protocol, but its own protocol that is almost identical, just with a different naming system.
Clean (similar to Exclusive in Broadway, what's in Cached Memory is also in Physical/Real Memory)
Dirty (similar to Modified in Broadway, what's in Cached Memory hasn't been updated yet to Physical/Real Memory)
Invalid (just like Invalid in Broadway, Block will be cast out soon, can be tossed)
Starlet's cache 'algorithm' can be changed between two different settings. First setting is Pseudo Random, second setting is Round Robin. IOS uses the Pseudo Random setting.
Here's a list of Handy Cache Operations~
Invalidate both the entire ICache and DCache~
mcr p15, 0, rX, c7, c7, 0 @rX must be zero
Invalidate entire ICache~
mcr p15, 0, rX, c7, c5, 0 @rX must be zero
Invalidate ICache Line~
mcr p15, 0, rX, c7, c5, 1 @rX must be 32-byte aligned address!!!
Invalidate entire DCache~
mcr p15, 0, rX, c7, c6, 0 @rX must be zero
Invalidate DCache Line~
mcr p15, 0, rX, c7, c6, 1 @rX must be 32-byte aligned address!!!
Test and Clean Entire DCache~
loop:
mrc p15, 0, pc, c7, c10, 3 @This is an MRC (not MCR); with pc as the destination, only the condition flags get updated
bne loop
Clean DCache Line~
mcr p15, 0, rX, c7, c10, 1 @rX must be 32-byte aligned address!!!
Test, Clean, and Invalidate entire DCache (aka Test then Flush)~
loop:
mrc p15, 0, pc, c7, c14, 3 @This is an MRC (not MCR); with pc as the destination, only the condition flags get updated
bne loop
Clean and Invalidate DCache Line (aka Flush)~
mcr p15, 0, rX, c7, c14, 1 @rX must be 32-byte aligned address!!!
Prefetch ICache Line~
mcr p15, 0, rX, c7, c13, 1 @rX must be 32-byte aligned address!!!
The prefetch ICache operation is simply a cache hint for the Instruction fetcher. Regarding DCache hints, there is only the PLD (pre-load data). This provides the DCache with a hint for an upcoming Load instruction. It executes exactly like the PPC dcbt instruction. There is no version of PLD for Store Hints.
Unlike Broadway, there is no need for any special pre-flushing routine in regards to flushing the entire DCache. Just simply execute the instructions required and you're good to go.
---
Self Modifying Code is pretty simple. The following snippet shows how to overwrite a single instruction. Adjust source accordingly for multiple instruction rewrites.
Code: @Self Mod code
@rX = Address of Instruction
@Pretend rZ contains new instruction
@Write new instruction at rX
str rZ, [rX]
@Align address to 32-bytes; not required if address is already 32-byte aligned
bic rX, rX, #0x0000001F
@Clean Data Cache Block
mcr p15, 0, rX, c7, c10, 1
@Drain Write Buffer
mov rY, #0 @rY = a scrap register that is safe to use
mcr p15, 0, rY, c7, c10, 4
@Invalidate Instruction Cache Block
mcr p15, 0, rX, c7, c5, 1
@Drain Prefetch Buffer; **not** required if self modified instruction(s) is 5+ sequential instructions ahead
@Using any branch will force the prefetch buffer (in the instruction fetcher) to be drained
@Or else an IMB (Instruction Memory Barrier) instruction is required
b 0x4
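For multiple instruction rewrites, the clean/invalidate steps have to be repeated for every 32-byte line that the rewritten range touches. Here's a Python sketch (helper name is made up) that works out which line addresses those are, using the same bic-style alignment as the snippet above.

```python
CACHE_LINE = 32  # Starlet cache block size in bytes


def cache_lines(start, length):
    """List the 32-byte aligned line addresses covering [start, start + length).

    length must be at least 1.
    """
    first = start & ~(CACHE_LINE - 1)                # same as bic rX, rX, #0x1F
    last = (start + length - 1) & ~(CACHE_LINE - 1)  # line holding the final byte
    return list(range(first, last + CACHE_LINE, CACHE_LINE))


if __name__ == "__main__":
    # Two ARM instructions (8 bytes) that straddle a line boundary
    print([hex(a) for a in cache_lines(0x1010001C, 8)])
```

Each address in the returned list would get its own "Clean DCache Line" and "Invalidate ICache Line" operation, with the Drain Write Buffer in between as shown above.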
Chapter 17: Details of the Main/Idle Loop, and Writing in Custom ARM Code
As mentioned earlier in Chapter 1, Starlet is running in a loop waiting for tasks to do. This loop is called thread 0 or also called the Idle Thread.
Locating this loop is simple. Since Starlet is in this loop 99.99% of the time when your game is running, we can easily find the "Base Thread Pointer". We use a simple equation to calculate this Pointer.
Thread Pointer = 0xFFFE0000 + (0xB0 * threadnumber)
The loop is known as Thread 0, so the equation is this...
0xFFFE0000 + (0xB0 * 0)
0xFFFE0000 + 0 = 0xFFFE0000
Now we have the Thread Pointer. Using that, we can find other important information.
Thread Pointer + 0 = CPSR
Thread Pointer + 0x3C = SP
Thread Pointer + 0x40 = PC (what we need)
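Here's the equation and the three offsets above wrapped into a small Python helper, in case you want to look up other threads. Only the numbers come from this tutorial; the function and dictionary names are made up for illustration.

```python
THREAD_BASE = 0xFFFE0000   # Thread Pointer of thread 0
THREAD_STRIDE = 0xB0       # size of one thread entry

# Offsets listed above (other fields are not covered here)
FIELD_OFFSETS = {"cpsr": 0x00, "sp": 0x3C, "pc": 0x40}


def thread_pointer(thread_number):
    return THREAD_BASE + THREAD_STRIDE * thread_number


def thread_field_address(thread_number, field):
    return thread_pointer(thread_number) + FIELD_OFFSETS[field]


if __name__ == "__main__":
    print(hex(thread_field_address(0, "pc")))  # saved PC of the Idle Thread
```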
We can use something such as my Memory Editor code (HERE) to view 0xFFFE0040 on MKWii to get the current PC addresses of Starlet using Broadway. Your game must be patched using this method (HERE) for you to view SRAM!!! Obviously, convert 0xFFFE0040 to its real SRAM address. Then finally convert it to the usable Broadway virtual address.
0xCD4E0040 #View this address on the Memory Editor Code
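The conversion can be sketched in Python like this. Big assumption: both base constants below are inferred purely from the single example given here (0xFFFE0040 ending up as 0xCD4E0040), so treat them as placeholders to double check rather than documented values.

```python
SRAM_STARLET_BASE = 0xFFFE0000   # Starlet-side address used in this chapter
SRAM_REAL_BASE = 0x0D4E0000      # ASSUMPTION: inferred from the example above
BROADWAY_UNCACHED = 0xC0000000   # ASSUMPTION: inferred from the example above


def starlet_sram_to_broadway(addr):
    """Convert a Starlet SRAM address to the address Broadway would view."""
    offset = addr - SRAM_STARLET_BASE   # position inside the SRAM region
    real = SRAM_REAL_BASE + offset      # "real" SRAM address
    return real | BROADWAY_UNCACHED     # usable Broadway virtual address


if __name__ == "__main__":
    print(hex(starlet_sram_to_broadway(0xFFFE0040)))
```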
You will see the PC is constantly flickering between 3 different addresses. On cIOS249 (using IOS56 as base), these are the following addresses you will see...
- 0xFFFF0C6A
- 0xFFFF0C6C
- 0xFFFF0C6E
The 3 addresses you see on your screen may differ if your MKWii game is running on a different IOS. Anyway, this is the loop. It is executing the following ARM instructions..
Code: loop:
ldr r3, [r4, #0] @On cIOS249[56] this is loading the word value located at 0xFFFF9ECC
cmp r3, #0
beq loop
IMPORTANT NOTE: Starlet/IOS is in Thumb mode when executing this loop, hence the 0x2 increments to each address. It is also in System Mode (highest privileged mode possible)
Now onto the methods of writing in custom ARM Code for IOS to execute...
Method 1:
You could add in custom patches of code to an IOS and then run a game/channel/etc that uses said IOS. This can be cumbersome at times, as you would constantly need to repatch and then reinstall the IOS for testing new ARM Code.
Method 2:
Another method is to use Palapeli's /dev/sha exploit. It allows you to run a snippet of ARM code without having to interact with IOS directly. Everything is initiated via Broadway. Link - https://github.com/TheLordScruffy/saoirs...ot.cpp#L67
Here is an Assembly Source of Palapeli's exploit that can be used as a C2 Gecko Code for MKWii.
Code: #Exploit created by Palapeli
#THIS WORKS!
#exploit contents
06000000 0000001C
4903468D 49034788
49036209 47080000
10100000 00000A00 #entry point physical here
FFFF0014 00000000
#Custom ARM code
06000a00 00000014
e3a00536 e59010e0
e3811002 e58010e0
e12fff1e 00000000
#Custom ARM test code; shuts down the console by setting the Shutdown bit in GPIO
mov r0, #0x0D800000 @Set r0 upper bits to GPIO Starlet Address
ldr r1, [r0, #0xE0] @Load up GPIO
orr r1, r1, #0x0002 @Flip Shutdown Bit High
str r1, [r0, #0xE0] @Write new GPIO
bx lr @Return to exploit
#########
#PAL
#Hooked at Shared Item address - 807BA164
#Custom return codes
#0 (Green shell) = success
#1 (Red shell) = failed to open /dev/sha
#2 (Banana) = ios_ioctlv error
#Statements
.set ios_open, 0x801938f8
.set ios_ioctlv, 0x801945e0
.set ios_close, 0x80193ad8
.set entrypoint, 0xA00 #physical address for 0x80000A00, this is where the list of custom ARM instructions that we want executed will reside
#Push stack; no need to backup r0, r11, r12, CTR or LR
stwu sp, -0x0080 (sp)
stmw r4, 0x8 (sp) #r3 is the shared item, we are modifying this as a custom return code, dont push it on the stack
#Set r31 as 0x8019 for ios calls
lis r31, 0x8019
#Set r30 as 0x8000 for EVA work
lis r30, 0x8000
#r29 will be used for fd backup
#r28 used for custom return/error code backup
#Open sha via ios
bl open_sha
.string "/dev/sha"
.align 2
open_sha:
mflr r3
li r4, 0
ori r12, r31, ios_open@l
mtctr r12
bctrl
#backup fd
mr r29, r3
#check for errors
cmpwi r3, 0
li r3, 1
blt- the_end
#Setup register args for SHA_Init
mr r3, r29 #fd
li r4, 0 #Ioctl no
li r5, 1 #Amount of input buffers
li r6, 2 #Amount of output buffers
addi r7, r30, 0x1500 #Vector table/root
#Vector table/root is at 0x80001500
#layout of Vector
#0x0 = null
#0x4 = null
#0x8 = 0x7FFE0028
#0xC = null
#0x10 = null (physical address for 0x80000000); for cache safety
#0x14 = 0x4 (length; 32 bits)
#Fill in the Vector contents
li r0, 0
stw r0, 0 (r7)
stw r0, 0x4 (r7)
stw r0, 0xC (r7)
stw r0, 0x10 (r7)
lis r0, 0x7FFE #0xFFFE0028 + 0x80000000 (wraps to 0x7FFE0028). Needed because the ioctlv function call does a subtraction of 0x80000000 from this address to convert it to physical. Obviously the ioctlv is 'bad' code. Why not just set bit 0 low? Smh.
ori r0, r0, 0x0028
stw r0, 0x8 (r7)
li r0, 4
stw r0, 0x14 (r7)
#Call ios_ioctlv
ori r12, r31, ios_ioctlv@l
mtctr r12
bctrl
#check for errors
cmpwi r3, 0
li r28, 0
bge- close_ios
#Ioctlv failed, place return code of 2 in r28
li r28, 2
#IOS Close
close_ios:
mr r3, r29 #fd
ori r12, r31, ios_close@l
mtctr r12
bctrl
mr r3, r28 #I've never seen an error from closing IOS, so just place return code from IOS_Ioctlv as the final return code
#The end, if r3 = 0 success, 1 = failed opening /dev/sha, 2 = failed ioctlv call
the_end:
lmw r4, 0x8 (sp) #Recover r4 thru r31
addi sp, sp, 0x0080 #Pop stack
stw r3, 0x0020 (r23) #Shared Item Code default instruction
It is currently hooked to the Shared Item Address and will return certain items based on the condition of certain IOS calls (in regards to the exploit working or not). Adjust the addresses to the IOS calls accordingly, they are currently configured for PAL MKWii.
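To visualize what the PPC source builds at 0x80001500, here's a Python sketch that packs the same six words (big-endian) and shows the 32-bit wraparound behind the 0x7FFE0028 value. The layout is copied from the comments in the source above; the function name is made up.

```python
import struct


def build_sha_vector(target=0xFFFE0028):
    """Pack the 0x18-byte vector table that the PPC source writes at 0x80001500."""
    # ioctlv subtracts 0x80000000 to get a physical address, so we add it
    # beforehand and let the value wrap around at 32 bits.
    wrapped = (target + 0x80000000) & 0xFFFFFFFF
    # 0x0 null, 0x4 null, 0x8 wrapped address, 0xC null, 0x10 null, 0x14 length
    return struct.pack(">6I", 0, 0, wrapped, 0, 0, 4)


if __name__ == "__main__":
    print(build_sha_vector().hex())
```

The result is 0x18 bytes long, which matches the 0x80001500 thru 0x80001517 range used as temp space.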
What's also included is the Exploit contents itself that is packaged in an 06 String Write. This 06 Code must be present. The Exploit Contents are currently configured to run ARM 32-bit code that is present at 0x80000A00. Therefore, you will NEED to make another 06 String Write Gecko Code that contains ARM instructions at 0x80000A00. There is a provided example 06 String Write (at 0x80000A00) that does a demo of the exploit by shutting down the Console via Starlet/IOS.
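If you'd rather not hand-pack your own 06 String Write, here's a rough Python sketch that turns raw assembled ARM bytes into 06 code lines. It assumes the usual 06 layout (address word, byte count, then data zero-padded to a full 8-byte line) and a target address in the 0x80000000 thru 0x81FFFFFF range; the function name is made up.

```python
def gecko_string_write(address, payload):
    """Return the lines of an 06 String Write placing payload at address."""
    header = "%08X %08X" % (0x06000000 | (address & 0x01FFFFFF), len(payload))
    # Pad with zeros so the data always fills complete 8-byte code lines
    padded = payload + b"\x00" * (-len(payload) % 8)
    lines = [header]
    for i in range(0, len(padded), 8):
        left = padded[i:i + 4].hex().upper()
        right = padded[i + 4:i + 8].hex().upper()
        lines.append("%s %s" % (left, right))
    return lines


if __name__ == "__main__":
    # The shutdown demo from above: 5 ARM instructions, 0x14 bytes
    demo = bytes.fromhex("e3a00536e59010e0e3811002e58010e0e12fff1e")
    print("\n".join(gecko_string_write(0x80000A00, demo)))
```

Feeding it the shutdown demo reproduces the "Custom ARM code" 06 String Write shown earlier.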
The right-side word of the 2nd to the last line of the 06 String Write Exploit is the physical address (entry point) that you can alter if needed. It must be a physical address. In a nutshell the following Memory Addresses are used...
0x80000000 thru 0x8000001B #Location of Exploit ARM Contents, don't change this (other than the entry point if desired)
0x80000A00+ #Depends on the length of your custom ARM code; change this entry point if needed
0x80001500 thru 0x80001517 #Used as a temp space for IOS_Ioctlv usage, you should be familiar enough with PPC to change this if this happens to conflict with your other unrelated Codes
Important Note about the Exploit:
The exploit can only be run once. This is because Thread 0 is switched to ARM Mode after the exploit has been run. Therefore to run it beyond just once, you will need to rig something up to rewrite the Exploit's ARM contents to be in ARM mode next time the Exploit (code as a whole) is executed by Broadway again.
Chapter 18: Assembling ARMv5 with Devkit
You need to have DevkitPPC already installed and the environments set. Here is a Linux Debian based guide for that - https://mariokartwii.com/showthread.php?tid=1200
You can find various Windows and Ubuntu guides via Google. After you get that done, you need to install devkitARM. Here's the linux command for that...
sudo dkp-pacman -S devkitARM
Now that you have devkitARM installed, here is a guide to assemble ARMv5 assembly into a raw binary file.
Example of ARM assembly contents/instructions in a file called source.s:
1. Navigate to DevkitARM binutils
cd /where/you/installed/devkit/devkitpro/devkitARM/bin
2. Assemble the ARM instructions to object code. Your source file must have the ".s" extension.
./arm-none-eabi-as -march=armv5te -mcpu=arm926ej-s -mbig-endian /where/file/is/source.s -o /where/file/is/source.o
NOTE: To force Thumb mode only, add in "-mthumb"
3. Convert (strip) object code to raw binary file
./arm-none-eabi-objcopy -O binary /where/file/is/source.o /where/file/is/source.bin
Congrats, feel free to view the binary file on a Hex Editor to see the assembled instructions. If your source file has both ARM and Thumb instructions present, you will need to designate them via Assembler directives. Place ".arm" before your instructions in your source file to force the Assembler to assemble the instructions in ARM mode. Place ".thumb" before your Thumb instructions. As an alternative to providing "-mthumb" for the Step #2 command, you can instead slap a ".thumb" at the top of your source file.
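If you don't have a Hex Editor handy, a few lines of Python can print the assembled words instead. This sketch assumes the file holds 32-bit ARM instructions assembled with -mbig-endian; the function name is made up.

```python
def dump_words(data, base=0):
    """Format raw bytes as big-endian 32-bit words, one 'address: word' per line."""
    lines = []
    usable = len(data) - len(data) % 4   # ignore a trailing partial word
    for offset in range(0, usable, 4):
        word = int.from_bytes(data[offset:offset + 4], "big")
        lines.append("%08X: %08X" % (base + offset, word))
    return lines


if __name__ == "__main__":
    # Stand-in bytes for a file's contents; 'bx lr' followed by padding
    sample = bytes.fromhex("e12fff1e00000000")
    print("\n".join(dump_words(sample, base=0x00000A00)))
```

To use it on a real file, pass in open("/where/file/is/source.bin", "rb").read(). Thumb instructions will show up packed two per word.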
Example showing source file using nop in both ARM and Thumb form:
Code: .arm
nop
.thumb
nop
The above will assemble the nop into its 32-bit form, then the second nop into its 16-bit form.
Another option at your discretion is to insert the architecture name and cpu name in your source file instead of having to type it out in Step #2.
Example:
Code: .arch armv5te
.cpu arm926ej-s
.arm
nop
.thumb
nop
For the above example, your Step #2 command would be...
./arm-none-eabi-as -mbig-endian /where/file/is/source.s -o /where/file/is/source.o
---
How to disassemble (output will appear in your terminal screen)~
./arm-none-eabi-objdump -b binary -m armv5te -D -EB /where/file/is/source.bin
NOTE: The above will disassemble instructions to their ARM 32-bit form. Therefore, Thumb instructions, if present, will not be disassembled correctly.
NOTE: To force thumb only disassembly of instructions, add "-M force-thumb". If present, ARM 32-bit instructions will not be disassembled correctly.
Chapter 19: Instruction Simulator, Links
In the ARM Reference manual, there are many instructions present that obviously won't work for Starlet. In the Instruction Set chapters/sections, if an Instruction has the note "Version 6 and above" it obviously can't be used on Starlet.
There aren't any division instructions available. You will need to use some trickery to mimic division instructions. Luckily, there are multiply based instructions.
- mla rD, rA, rB, rC @[(rA x rB) + rC] = rD
- mul rD, rA, rB @rA x rB = rD
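One common bit of trickery is plain shift-and-subtract (restoring) division, which only needs shifts, compares, and subtractions, all of which map to instructions Starlet has. Here's a Python model of the algorithm for unsigned 32-bit values. It's a generic textbook technique, not something taken from IOS, and the function name is made up.

```python
def udiv32(numerator, divisor):
    """Unsigned 32-bit division via shift-and-subtract; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next bit of the numerator (a shift and an orr)
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= divisor:          # a cmp
            remainder -= divisor          # a conditional sub
            quotient |= 1 << bit          # a conditional orr
    return quotient, remainder


if __name__ == "__main__":
    print(udiv32(100, 7))
```

If the divisor is a known constant, multiplying by a precomputed reciprocal and shifting is usually faster, but the loop above works for any divisor.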
There isn't a good Starlet Emulator for use unfortunately. However, there are some ARM instruction simulators out there. Google is your friend on this one. There are some web browser based ones if you don't want to install anything on your computer.
Here is a decent web browser ARM instruction simulator - https://cpulator.01xz.net/?sys=arm
It's designed for ARMv7-a. The downsides to this simulator are that it can only run in Little Endian using Physical Addressing only, and switching between ARM & Thumb is not supported.
Now that you have completed this tutorial, you should...
1. Read more in depth of the Manuals provided in Chapter 1
2. Try out some snippets of code in a Simulator
3. Read these WiiBrew articles for Starlet and IOS~
You can then start tinkering around in IOS and try out your modifications on a real Wii.
And lastly, this is a handy Co Processor reference. This was a 'snapshot' of various co processor registers using Palapeli's exploit on cIOS249[56]. Thus all info was 'snapshotted' using the IOS Kernel. Read up on the Starlet Manual to understand the details of this reference.
FCSE PID = Null (identical translation for VA to MVA)
TTBR = 0x13850000
Domain Access = Set to Client on all 16 Fields; therefore all Access Permissions for every mapped Memory Region is based on the AP bits of the related Page Table Entry
C1 Details:
- L4 bit low (set T bit for Loads to PC)
- RR bit low (Round Robin algo disabled in Cache)
- V bit high (Exceptions map to 0xFFFF0000 thru 0xFFFF001C)
- I bit high (ICache enabled)
- R bit low (ROM protection low)
- S bit low (System protection low)
- B bit high (Big Endian)
- C bit high (DCache enabled)
- A bit high (Alignment Fault checks enabled for Data)
- M bit high (MMU on)
Cache Register Details:
- Ctype = Write Back; Register 7 for Cleaning, Format C for Cache Lockdown
- S bit high (Use Harvard cache)
- DCache size = 16KB
- DCache is 4-way
- DCache M bit low (Cache present; this MUST always be set low for Starlet regardless of Cache usage)
- DCache block size is 8 words (32 bytes)
- ICache size = 16KB
- ICache is 4-way
- ICache M bit low (Cache present; this MUST always be set low for Starlet regardless of Cache usage)
- ICache block size is 8 words (32 bytes)
TCM Memories = Both DTCM and ITCM low (not present)
Data TCM Region = Disabled, size and address set to Null
Instruction TCM Region = Disabled, size and address set to Null
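As a quick sanity check on the cache numbers in this reference, here's the arithmetic in Python: 16KB split into 32-byte blocks across 4 ways. The set index helper assumes the plain "address divided by line size, modulo number of sets" mapping, which is the usual arrangement for a set-associative cache; treat it as an assumption and verify against the Starlet manual.

```python
def cache_geometry(size_bytes=16 * 1024, ways=4, line_bytes=32):
    """Return (total_lines, sets) for a set-associative cache."""
    total_lines = size_bytes // line_bytes
    sets = total_lines // ways
    return total_lines, sets


def set_index(address, sets=128, line_bytes=32):
    """ASSUMPTION: index = (address / line size) mod number of sets."""
    return (address // line_bytes) % sets


if __name__ == "__main__":
    print(cache_geometry())
    print(hex(set_index(0xFFFF0C6A)))
```

Two addresses that share a set index compete for the same 4 ways, which is where the replacement setting (RR bit) comes into play.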
Chapter 20: Test Code; Conclusion
Using the simulator I linked in the previous chapter, here is a mock-up code you can step thru with. The code attempts to mimic a source that opens, reads, and closes a file via IOS calls.
Code: @Example snippet of code:
@1. open /dev/fs to allow us to open basic files on the NAND virtual filesystem
@2. open /shared2/sys/SYSCONF
@3. dump its contents to sram (0xFFFF1000)
@4. close the SYSCONF file
@Make fake SP address
mov sp, #0x00000A00
@Call source as a custom function
bl example_function
the_end:
nop @We will return here once source has been completed, if no errors were set
b the_end
@FUNCTION
@Make fake SP at 0x00000A00
example_function:
@Prologue, save 2 registers
push {r4, r5, sp, lr}
@Open /dev/fs
@Make basic lookup table, and backup its pointer
bl lookup_table
.asciz "/dev/fs"
.asciz "/shared2/sys/SYSCONF"
.align 2
lookup_table:
mov r5, lr
@Open /dev/fs
mov r0, lr
bl ios_open
cmp r0, #0
blt error_handler
@We don't need file descriptor from /dev/fs
@Open SYSCONF
add r0, r5, #8 @Point to /shared2/sys/SYSCONF in lookup table
mov r1, #1 @Read perms for IOS_Open
bl ios_open
cmp r0, #0
blt error_handler
mov r4, r0 @Backup fd
@Read SYSCONF
mov r1, #0xFF000000 @Set dump address to 0xFFFF1000
orr r1, r1, #0x00FF0000
orr r1, r1, #0x00001000
mov r2, #0x4000 @Set dump size to 0x4000 bytes
bl ios_read
cmp r0, #0x4000
bne error_handler
@Close SYSCONF
mov r0, r4 @fd
bl ios_close
cmp r0, #0
bne error_handler
@Epilogue
pop {r4, r5, sp, pc}
@Fake routine locations, so that you can run source on simulator
error_handler:
nop @Will end here if any errors occur
b error_handler
ios_open:
mov r0, #1 @Edit this to negative number to replicate ios_open error
bx lr
ios_read:
mov r0, #0x4000 @Edit this to anything other than 0x4000 for ios_read error
bx lr
ios_close:
mov r0, #0 @0 for ios_close being successful
bx lr
Please note that a BL Trick was used instead of a Literal Pool so the Simulator wouldn't add any extra content to the source.
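If you add more strings to the lookup table, the "add r0, r5, #8" style offsets need to be recalculated. Here's a Python sketch (function name is made up) that works out each .asciz offset and the padded table size after ".align 2", which on ARM means 4-byte alignment.

```python
def asciz_offsets(strings):
    """Return ({string: offset}, padded_size) for back-to-back .asciz strings."""
    offsets = {}
    position = 0
    for text in strings:
        offsets[text] = position
        position += len(text.encode("ascii")) + 1   # +1 for the null terminator
    padded_size = (position + 3) & ~3               # .align 2 -> next multiple of 4
    return offsets, padded_size


if __name__ == "__main__":
    table, size = asciz_offsets(["/dev/fs", "/shared2/sys/SYSCONF"])
    print(table, hex(size))
```

For the two strings in the test code, the second one lands at offset 8, matching the #8 used in the source.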
Happy Coding!
|
|
|
Load Directly into Solo TTs [vabold, stebler] |
Posted by: vabold - 09-19-2022, 08:13 PM - Forum: Time Trials & Battle
- No Replies
|
|
PAL
0400A978 38000001
C2530594 0000000A
901F0C0C 38000001
38600002 38800000
38A000WW 38C000XX
38E000YY 390000ZZ
981F0024 907F0B70
90DF0034 90FF0030
911F0B68 907F0B6C
3D808061 698CB420
7D8903A6 4E800421
60000000 00000000
C2634F40 00000002
38600124 907F0038
3800001F 00000000
NTSC-U
0400A938 38000001
C252BA4C 0000000A
901F0C0C 38000001
38600002 38800000
38A000WW 38C000XX
38E000YY 390000ZZ
981F0024 907F0B70
90DF0034 90FF0030
911F0B68 907F0B6C
3D808061 698CB420
7D8903A6 4E800421
60000000 00000000
C260408C 00000002
38600124 907F0038
3800001F 00000000
NTSC-J
0400A8D4 38000001
C252FF14 0000000A
901F0C0C 38000001
38600002 38800000
38A000WW 38C000XX
38E000YY 390000ZZ
981F0024 907F0B70
90DF0034 90FF0030
911F0B68 907F0B6C
3D808061 698CB420
7D8903A6 4E800421
60000000 00000000
C263468C 00000002
38600124 907F0038
3800001F 00000000
NTSC-K
0400AA80 38000001
C251E5EC 0000000A
901F0C0C 38000001
38600002 38800000
38A000WW 38C000XX
38E000YY 390000ZZ
981F0024 907F0B70
90DF0034 90FF0030
911F0B68 907F0B6C
3D808061 698CB420
7D8903A6 4E800421
60000000 00000000
C2623338 00000002
38600124 907F0038
3800001F 00000000
WW = 0, manual; 1, auto
XX = character ID
YY = vehicle ID
ZZ = course ID
|
|
|
|