About Stack Frames

For advanced ASM Coders.



Chapter 1. Intro

If you have been working on ASM codes for some time, you may have used one of the following sources below...

Code:
stwu sp, -0x50 (sp) #Push stack, make space for 18 registers
stmw r14, 0x8 (sp)

lmw r14, 0x8 (sp)
addi sp, sp, 0x50 #Pop stack


Code:
stwu sp, -0x80 (sp) #Push stack, make space for 29 registers
stmw r3, 0x8 (sp)

lmw r3, 0x8 (sp)
addi sp, sp, 0x80 #Pop stack


These sources are good methods to get extra registers to utilize for your ASM codes. However, it's better to create a custom stack frame instead of using one of the generic versions above, as you should only use as much stack space as your code requires. This tutorial will also teach you how to backup Floating Point Registers to the stack frame if you ever run into a situation that calls for it. You will also learn how to use Stack Frames as space for input and/or output values of Function Calls within your Code(s) instead of relying on the Exception Vector Area.



Chapter 2. Key Instructions; Requirements

You will need to have a basic understanding of Function Calls. Read this tutorial HERE first if you are not familiar with calling functions in your ASM Codes.

You will also need to fully understand some key instructions that you may not have used before, or else none of the other chapters will make much sense.

Those instructions are...
  • stwu
  • stmw
  • stfd
  • lmw
  • lfd

stwu, stmw, and lmw instructions are briefly explained in the Simple ASM Reference Page -> https://mariokartwii.com/showthread.php?tid=863

stwu is explained more in-depth in this tutorial --> https://mariokartwii.com/showthread.php?tid=975
stfd & lfd are explained in this tutorial --> https://mariokartwii.com/showthread.php?tid=1744

The following will provide examples with pics of stmw and lmw if you are not completely familiar with those yet.

Example of stmw:
Code:
stmw r29, 0x00EC (r5)

r5 = 0x80001500
Word of r29 stored at 0x800015EC
Word of r30 stored at 0x800015F0
Word of r31 stored at 0x800015F4

Picture of right before stmw is executed. Instruction is highlighted in green. r5 is outlined in red. r29 thru 31 in blue. Spot where the 3 words will be stored at is outlined in Magenta -

[Image: stack01.png]

Picture of once stmw has executed. Store of r29 shown by red arrow. Store of r30 shown by blue arrow. Store of r31 shown by magenta arrow.

[Image: stack02.png]

Example of lmw:
Code:
lmw r14, 0x1000 (r3)

r3 = 0x80243DD0

r14 thru r31 will be loaded starting at the word located at 0x80244DD0.
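
If you want to double check the addresses in the two examples above without booting the game, here is a minimal Python sketch of the address arithmetic. The helper name multiple_word_addresses is made up for this illustration, and it only models where each word goes (base + offset for the first register, then +4 per register up to r31), not the instructions themselves.

```python
def multiple_word_addresses(first_reg, offset, base):
    """Map each register touched by stmw/lmw to the address of its word.

    Hypothetical helper: first_reg is the register number in the instruction,
    offset is the displacement, base is the value of the base register.
    """
    start = (base + offset) & 0xFFFFFFFF  # address of the first word
    return {f"r{reg}": start + 4 * (reg - first_reg) for reg in range(first_reg, 32)}

# stmw r29, 0x00EC (r5) with r5 = 0x80001500
stmw_map = multiple_word_addresses(29, 0xEC, 0x80001500)

# lmw r14, 0x1000 (r3) with r3 = 0x80243DD0
lmw_map = multiple_word_addresses(14, 0x1000, 0x80243DD0)

for name, addr in stmw_map.items():
    print(name, hex(addr))
```

The same mapping applies to both instructions; the only difference is the direction of the transfer (registers to memory for stmw, memory to registers for lmw).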

Picture of right before lmw is executed. r3 outlined in red. r14 thru 31 outlined blue. Data that will be loaded outlined in magenta.

[Image: stack03.png]

Picture of once lmw has executed. Magenta arrow with the outlines shows the transfer of data from memory to the registers.

[Image: stack04.png]



Chapter 3. Key Elements

It's important to note that the 'push/pop the stack' method that you may have been using in your codes is technically not a 'correct' way of making a stack frame. However, since C2 (Insert) ASM Gecko Codes are what is known as "Inline Assembly", the push/pop method is perfectly valid. So essentially there are two different ways of creating Stack Frames: the "Inline" method/style (for your everyday Wii ASM Codes) and the Conventional method/style (what you will see most of the Wii game's functions use). The Conventional Method will be covered more in-depth in Chapter 11.

There are 3 key elements when talking about stack frames.
  • r1 aka sp (stack pointer)
  • Stack Frame
  • The Stack

The Stack Pointer (sp) is the register that holds the address which points to the current stack frame. The stack frame is an area of memory that holds values of registers that need to be preserved throughout function call(s). The value in sp is always a static memory address (mem80) and the memory address in sp must end in 0 or 8.

The Stack itself is the collection of the current and previous stack frames. When new stack frames are created, the Stack grows toward LOWER memory addresses. So visually in memory, the stack grows up. Thus, sp always points to the very 'top' of the entire Stack.

Since the stack grows toward lower memory addresses, anything before sp's current value in memory is free space to use. This free space is known as the stack's Negative Space (more on this in Chapter 9). Technically, there is a limit to how much of this space exists, but the limit is large and it's not a concern for what you will be learning in this tutorial.

Rules of Stack Frames~
  • Stack Frames must have a minimum size of 0x10 bytes.
  • Stack Frame size must be divisible by 0x8.
  • Stack Frames have a particular layout that must be followed

Note that these rules are not enforced by hardware. However, they should be followed for "proper etiquette".

Stack Layout using an example stack frame with only one register being backed up (r31), and with sp being the value of 0x80350140:

Mem Address, Offset to Sp, Description
0x80350140, (0x0), Top of Current Stack Frame, previous value of sp (pointer to previous frame) goes here
0x80350144, (0x4), Reserved for Link Register***
0x80350148, (0x8), Padding
0x8035014C, (0xC), r31 (Register storage)
0x80350150, (0x10), Previous Stack Frame

r31 (Register Storage) and Padding can be swapped in position within the layout if desired. Most stack frames you see created for codes on MarioKartWii.com have the padding after the register storage.

Since the top of the current stack frame (first word value at offset 0x0) always contains the previous/former value of sp, it will contain the value of 0x80350150 in our above example (word value of 0x80350150 is at mem address 0x80350140). Therefore, you can keep following the Stack visually down in memory to see all the previous Stack Frames of the entire Stack. This is known as 'Stack Tracing'.
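
Stack Tracing is just following a chain of pointers, so it can be sketched in a few lines of Python. This is only an illustration: memory is modeled as a plain dict of address -> word, the first entry comes from the example above, and the two older frames (0x80350180 and 0x803501C0) are invented values so the walk has something to follow. The difference between the saved sp and the current sp gives each frame's size.

```python
def trace_stack(memory, sp, max_frames=16):
    """Walk the saved-sp chain. Returns a list of (frame address, frame size)."""
    frames = []
    while sp in memory and len(frames) < max_frames:
        prev_sp = memory[sp]              # first word of a frame = previous sp
        frames.append((sp, prev_sp - sp))
        sp = prev_sp                      # step 'down' to the older frame
    return frames

# Hypothetical memory contents: only the first entry is from the tutorial.
memory = {
    0x80350140: 0x80350150,
    0x80350150: 0x80350180,  # made-up older frame
    0x80350180: 0x803501C0,  # made-up older frame
}

frames = trace_stack(memory, 0x80350140)
for addr, size in frames:
    print(hex(addr), "size", hex(size))
```

The walk stops as soon as it reaches an address the dict doesn't know about; on real hardware you would stop when the saved value no longer looks like a valid stack address.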

***This space of 4 bytes (offset 0x4) must NOT be tampered with when you create your own custom Stack Frames. More details covered in Chapter 10.

The picture below is the game being paused right after we finished a prologue of a particular function. The size of the Stack Frame is the bare minimum 0x10 bytes and r31 is the only register (excluding the LR) being saved. We can tell the size of this Frame by looking at the stwu instruction's SIMM of -0x10 and simply making that number positive to 0x10. There are other, proper methods of determining a Stack Frame's size, as reading an stwu instruction is unreliable, but that will be explained later.

The red arrow points to the current stack frame (top of the Stack). The entire Stack Frame is outlined in blue. The backed up r31 is outlined in magenta.

[Image: stack11.png]



Chapter 4. Creating Stack Frames (Inline Style for Gecko ASM Codes)

Let's take another look at the classic 'push/pop' the Stack Method for backing up the Global Variable Registers.

Code:
stwu sp, -0x50 (sp)
stmw r14, 0x8 (sp)

...

lmw r14, 0x8 (sp)
addi sp, sp, 0x50


To create a new stack frame, you first need to backup sp's current value. Based on the size of the new Frame you want to make, you need to store sp's value to a spot at the Stack's Negative Stack Space. You also need to update sp to decrement its value based on what's currently in sp and what the size of your new Frame will be. This can all be accomplished with just one stwu instruction.

Code:
stwu sp, -0x50 (sp) #Backup sp, decrement it to new value based on size of new Frame. Store it to Negative Space also based on the Frame size.

Confused? Let's look at the following pictures.

Right before the stwu instruction gets executed (sp outlined in red pointing to current frame that will become the previous frame once the stwu gets executed)...

[Image: stack05.png]

Here's the picture of once the stwu instruction has executed...

- sp outlined in red pointing to new Frame
- old Frame outlined in blue (notice how the value contained at the address in sp points to where the start of the older Frame is at)

[Image: stack06.png]

Here's a basic formula to know any current Stack Frame's size:
(Sp's old value - Sp's current value) = Size of Stack Frame

Sp's old value is always the first word value of what's in the current Stack Frame.
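
To see the stwu step and the size formula working together, here is a rough Python model. The function name stwu_sp and the starting sp of 0x80398D38 are assumptions made for this sketch, and memory is just a dict; nothing here is a real emulator.

```python
def stwu_sp(memory, sp, simm):
    """Model 'stwu sp, simm (sp)': store old sp at sp+simm, then update sp."""
    new_sp = (sp + simm) & 0xFFFFFFFF
    memory[new_sp] = sp   # old sp lands at the top of the new frame
    return new_sp

memory = {}
old_sp = 0x80398D38                  # hypothetical value before the push
sp = stwu_sp(memory, old_sp, -0x50)  # stwu sp, -0x50 (sp)

# (Sp's old value - Sp's current value) = Size of Stack Frame
frame_size = memory[sp] - sp
print(hex(sp), hex(frame_size))
```

Adding the frame size back to sp gives the old value again, which is exactly what the addi in the pop does.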

At this point, the new frame has already been created. The new frame has a size of 0x50 bytes. Now we need an instruction that will store the GVR's to our new frame...

Code:
stmw r14, 0x8 (sp)

This will store r14 thru r31 all onto the stack frame via just one instruction. The store occurs with an offset of 0x8 in reference to sp because the LR reserved spot is at 0x4 (more on the LR reserved spot in Chapter 10).

The following picture shows you what the stack now looks like once the stmw instruction has executed and everything is saved onto the new Frame. Magenta arrow and outlines give you a better visual.

[Image: stack07.png]

Also here's a basic keymap to go along with the above picture.

Mem Address, Offset, Register
80398CF0, 0x8, r14
80398CF4, 0xC, r15
80398CF8, 0x10, r16
80398CFC, 0x14, r17
... ...
80398D30, 0x48, r30
80398D34, 0x4C, r31

There was no padding needed at the end. r14 thru r31 fitted into the frame perfectly.



Chapter 5. Destroying Stack Frames (Inline Style for Gecko ASM Codes)

Destroying Frames (also known as popping) is a bit simpler than creating them. Referring back to Chapter 4, let's look at the following instruction...

Code:
lmw r14, 0x8 (sp) #Restore all the registers from the Frame

Getting our registers (r14 thru r31) back is easy to do with the lmw instruction.

Pic of before lmw is executed. You will see I wrote some li instructions above the lmw instruction to edit the GVR's so you can get a better visual for the picture after this one...

[Image: stack08.png]

Pic of after lmw has executed. Magenta arrow and outlines shows the data transfer from memory to the registers...

[Image: stack09.png]

Ok great, we got the Registers back. Now we need sp to be its previous value so it can point to the older Frame. A simple addi instruction will do the trick.

Code:
addi sp, sp, 0x50 #Pop the current Frame, sp will now point to Previous Frame

Remember when I said the Stack grows toward LOWER addresses? Well obviously a shrinking Stack would recede toward HIGHER addresses. This is why we use the addi instruction. It will increase sp's value so we can have it point to the previous Frame. We add 0x50 to sp because the Stack Frame that was most recently created was 0x50 bytes in size.

Pic of after addi has executed (sp outlined in red which now points to 'old' frame; 'old' frame is now current frame)...

[Image: stack10.png]

And all done! Now that you understand how Frames are Created and Destroyed, let's go over creating a custom stack frame.



Chapter 6. Custom Frame Sizing

Here's the basic formula of calculating the size you need based on how many GPR's you want to store (FPR's will be covered later).

Number of GPR's x 4 = Sub-Total 1
Sub-Total 1 + 8 = Sub-Total 2
Round Sub-Total 2 up to be divisible by 0x10.


Let's say we are making a new code and we need 5 extra registers for use. You start with r31 on your count and work towards lower register numbers until you hit the count of 5. Like this...

r31 = 1 free register
r30 = 2 free registers
r29 = 3
r28 = 4
r27 = 5

This tells us which register to use as the first register in our stmw instruction (r27). Now use the basic formula to calculate the Stack Size for 5 GPR's.

5 x 4 = 20 (0x14 in hex)
0x14 + 0x8 = 0x1C
Round 0x1C up to 0x20 so it's divisible by 0x10.

Our minimum required Stack Size is 0x20. We now know what two instructions to use to create our custom Frame.

Code:
stwu sp, -0x20 (sp) #Make new Frame with a size of 0x20
stmw r27, 0x8 (sp) #Backup 5 GPR's to new Frame

And to destroy (pop) the frame, these are the two instructions...

Code:
lmw r27, 0x8 (sp) #Restore the 5 GPR's from the Frame
addi sp, sp, 0x20 #Destroy (pop) the Frame
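
If you'd rather not do the sizing by hand each time, the formula from this chapter can be sketched as a small Python helper that spits out the four instructions. The name inline_frame is made up for this example, and the 1 to 29 limit is just an assumption based on the r3 thru r31 example from Chapter 1.

```python
def inline_frame(num_gprs):
    """Return (frame size, push lines, pop lines) for backing up num_gprs GPR's."""
    if not 1 <= num_gprs <= 29:
        raise ValueError("expected between 1 and 29 registers")
    first_reg = 32 - num_gprs                 # count down from r31
    size = (num_gprs * 4 + 8 + 0xF) & ~0xF    # x4, +8, round up to 0x10
    push = [f"stwu sp, -0x{size:X} (sp)", f"stmw r{first_reg}, 0x8 (sp)"]
    pop = [f"lmw r{first_reg}, 0x8 (sp)", f"addi sp, sp, 0x{size:X}"]
    return size, push, pop

size, push, pop = inline_frame(5)
print("\n".join(push))
print("...")
print("\n".join(pop))
```

Feeding it 18 or 29 gives back the two generic frames from Chapter 1 (sizes 0x50 and 0x80), which is a handy sanity check.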



Chapter 7. Storing Floats to a Stack Frame

Storing floats to a stack frame requires special procedures. Without going into endless technical detail, each FPR will take up 16 bytes of memory when storing it to a stack frame since each FPR's value has to be stored twice. Don't worry about attempting to understand what is going on 'under the hood', I will provide an easy to follow template that you can simply 'plug & use'. If you actually want to know what goes on under the hood, read the Paired Single Tutorial HERE after completing this tutorial.

Let's say you wanna just store 2 floating point registers (f1 and f2) to your stack frame. Since each FPR will take 16 bytes, let's run the calculations~

16 x 2 = 32 (0x20 in hex) #2 fprs will take up 32 bytes.

0x20 + 0x8 = 0x28

Round 0x28 up to 0x30 for aligned frame size.


Here's how we would create the stack frame, then store 2 FPRs to it...

Code:
stwu sp, -0x30 (sp)
stfd f1, 0x8 (sp) #Store f1 in double precision mode
psq_st f1, 0x10 (sp), 0, 0 #Store f1 in paired single mode
stfd f2, 0x18 (sp) #Store f2 in double precision mode
psq_st f2, 0x20 (sp), 0, 0 #Store f2 in paired single mode

At this point you are wondering what the hell does 'psq_st' mean, and what the hell is the '0, 0' part of the instructions. Do not worry about it, follow this template for storing an FPR in paired single mode...

Code:
psq_st fD, SIMM (sp), 0, 0

Replace fD with whatever FPR you are utilizing and adjust SIMM accordingly based on where you are storing in reference to the new Frame.

Alright let's say we are at the end of our code, recover the 2 floats...

Code:
psq_l f2, 0x20 (sp), 0, 0
lfd f2, 0x18 (sp)
psq_l f1, 0x10 (sp), 0, 0
lfd f1, 0x8 (sp)
addi sp, sp, 0x30

Follow this template for loading an FPR from the Frame in paired single mode...

Code:
psq_l fD, SIMM (sp), 0, 0

Adjust fD and SIMM accordingly to your needs.
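
Working out the SIMM values for several FPR's gets tedious, so here is a hedged Python sketch of the layout used in this chapter: each FPR gets 16 bytes, the double precision copy first and the paired single copy 8 bytes after it, starting at offset 0x8. The helper name fpr_frame is hypothetical, and it assumes the frame holds FPR's only.

```python
def fpr_frame(fprs):
    """Return (frame size, {fpr: (stfd offset, psq_st offset)})."""
    size = (16 * len(fprs) + 8 + 0xF) & ~0xF   # 16 bytes each, +8, round to 0x10
    slots = {}
    offset = 0x8                               # 0x0 = old sp, 0x4 = LR spot
    for fpr in fprs:
        slots[fpr] = (offset, offset + 8)      # double copy, then paired single copy
        offset += 16
    return size, slots

size, slots = fpr_frame(["f1", "f2"])
for fpr, (double_off, ps_off) in slots.items():
    print(f"stfd {fpr}, 0x{double_off:X} (sp)")
    print(f"psq_st {fpr}, 0x{ps_off:X} (sp), 0, 0")
```

The same offsets are used for the lfd and psq_l instructions when recovering the floats.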

This is a pain in the ass, but it's necessary. Float values could be 'corrupted' if the above special procedures aren't followed.



Chapter 8. Other Uses for Stack Space

Other than storing GPR's and FPR's to a Stack Frame, you can also store items such as input and/or output values from Function Calls that will be called in your ASM Code.

The function known as sprintf takes a string that's in C format and parses it so certain data values are placed into the string. The first argument for sprintf is where the parsed String will be dumped to. A great place for this would be the Stack Frame. No need to take up space in the Exception Vector Area.

Example:
Code:
addi r3, sp, 8 #Make the parsed string dump to the very start of our Stack Frame. Let's pretend all other sprintf args are already set in their respective registers.
lis r12, 0x8001 #PAL sprintf function address
ori r12, r12, 0x1A2C
mtctr r12
bctrl #Call it

For more details about sprintf, view this thread HERE.

Once the function has completed and no errors have occurred, the output for sprintf will be at the start of the Frame. You will need to know how much space in your Stack Frame to allocate for this. Be sure to make a Stack Frame that will have enough space!



Chapter 9. Negative Stack Space Tricks

I've mentioned earlier that the negative area of the Stack Pointer is safe to use. If your code doesn't include any function calls, you can simply store register values to the negative area w/o creating a new Stack Frame. The 'no function calls' condition matters because the prologue of any called function will create a new frame, and whatever is in the negative Stack area will be overwritten.

Example where you need 3 free registers, so you backup r29, r30, & r31~

Code:
stmw r29, -0xC (sp) #r29 stores at -0xC in reference to sp, r30 at -0x8, r31 at -0x4

...

lmw r29, -0xC (sp)

Another handy trick for using negative stack space is for Float to Integer conversions.

Let's say you have a float in f1, you wanna convert it to an integer back into f1. Then finally have that integer value loaded into r10. Here's how you can do that via Negative Stack Space...

Code:
fctiw f1, f1 #Convert float to integer, standard rounding
stfd f1, -0x8 (sp) #Store result to negative space, integer result is in lower 32 bits of the double-word
lwz r10, -0x4 (sp) #Load integer into r10
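
To show why the lwz uses -0x4 and not -0x8, here is a rough Python model of the three instructions above. It rests on a few assumptions: the FPSCR rounding mode is round-to-nearest (which Python's round() happens to match, ties going to the even number), NaN inputs are not modeled, and the upper word written by stfd is filled with an arbitrary placeholder since only the lower 32 bits hold the integer. The function names are made up for this sketch.

```python
import struct

def fctiw(value):
    """Round to nearest integer and clamp to the signed 32-bit range."""
    result = round(value)
    return max(-0x80000000, min(0x7FFFFFFF, result))

def stfd_then_lwz(int_result):
    """Store the 8-byte result big-endian, then read back the word at +4."""
    upper = 0xFFF80000  # placeholder only; don't rely on the upper word
    doubleword = struct.pack(">II", upper, int_result & 0xFFFFFFFF)
    low_word = doubleword[4:8]  # sp-0x4 is the second word of the store at sp-0x8
    return struct.unpack(">i", low_word)[0]

r10 = stfd_then_lwz(fctiw(3.7))
print(r10)
```

Because the Wii is big-endian, the lower 32 bits of the double-word sit at the higher address, which is why the load offset is 4 bytes past the store offset.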



Chapter 10. Explaining the purpose of the reserved LR Spot

As mentioned earlier in Chapter 3 in regards to the Stack Layout, you cannot use the word spot that is at sp+0x4. This is because when another function call gets executed, its prologue includes two instructions that will backup the LR into r0 then store r0 onto the LR reserved spot of the Previous Frame!

Confused?

Let's look at a picture we have looked at before: the pic that shows us right after completing a prologue of a function in the game, where the stack size was the bare minimum 0x10.

[Image: stack12.png]

r1 (sp) in the registers tab is outlined in Red. The value in r1 points to the start of the new frame. In the Memory Viewer, you can see the start of the new frame (what r1 points to) is also outlined in Red.

In the registers tab, the Link Register is outlined in blue. You can see in the Code View, the instructions that place LR's value in r0 then store it to the LR reserved spot of the PREVIOUS/OLD frame is also outlined in blue. The LR reserved spot itself (of the previous/old frame) in the Memory Viewer is outlined in blue as well.

As you can see, the saved LR is always placed in the LR reserved spot of the previous/old frame. Now that you understand the LR reserved spot, we can cover how to push and pop the Stack via the Conventional Method.



Chapter 11. Creating + Destroying Stack Frames (Conventional Method)

You could, as an option, push and pop the Stack via the conventional method, instead of the "Inline" method. In reality, it's never really needed for Gecko ASM Codes, because 99% of the time r11 and r12 are safe to write to. If you need to backup something such as the LR, you would just write it to r12 before the initial stack push like so...

Code:
#Make Stack to backup 2 registers, also need to backup LR
mflr r12
stwu sp, -0x10 (sp)
stmw r30, 0x8 (sp)

...

#Pop it
lmw r30, 0x8 (sp)
addi sp, sp, 0x10
mtlr r12

The reason why this "hacky" method of saving LR (and other registers such as r0 and CTR) is preferred is that it can reduce the size of the Stack and it reduces the number of instructions required. However, if you want to still use the conventional way, here's how...

Conventional Push Method (aka Prologue of a function)~
Code:
#Make Stack to backup 2 registers, use conventional method to backup LR
stwu sp, -0x10 (sp) #Make new frame
mflr r0 #Temp place LR into r0; r0 in normal operation is used as a scrap register
stw r0, 0x0014 (sp) #Store LR to its reserved spot in previous Frame
stmw r30, 0x8 (sp) #Backup 2 registers

Conventional Pop Method (aka Epilogue of a function)~
Code:
#Pop it
lmw r30, 0x8 (sp) #Recover registers
lwz r0, 0x0014 (sp) #Temp place old LR into r0
mtlr r0 #Recover old LR
addi sp, sp, 0x10 #Destroy frame, make sp point to old frame

Here's an easy formula/guide to always remember the proper 'offset' to use when storing the LR.

1. Take the offset value used in the stwu and make it positive (example: -0x10 to 0x10)
2. Now simply add 4 (0x10 + 4 = 0x14)

Super simple and obviously you would use that calculated offset when retrieving the LR during the pop.
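
The two steps above can be written as a tiny Python check. The helper name lr_save_offset is made up, and the sp values are borrowed from the Chapter 3 layout example purely for illustration. The point it demonstrates is that (new sp + frame size + 4) is the same address as (old sp + 4), i.e. the LR reserved spot of the previous Frame.

```python
def lr_save_offset(stwu_simm):
    """Offset from the NEW sp to the LR reserved spot of the previous Frame."""
    frame_size = -stwu_simm   # step 1: make the stwu offset positive
    return frame_size + 4     # step 2: add 4

old_sp = 0x80350150           # example value from the Chapter 3 layout
new_sp = old_sp - 0x10        # after stwu sp, -0x10 (sp)

offset = lr_save_offset(-0x10)
lr_spot = new_sp + offset
print(hex(offset), hex(lr_spot))
```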

As you can see this takes more instructions to execute compared to the Inline/Hacky method.

Happy coding!