ASM Tips n Trix
This thread is a list of mini-guides/tips to help you shorten or optimize your ASM codes. It's tailored toward coders who have recently started learning ASM.
I. Using Offset Values to complete Memory Addresses
Let's say we want to load the word from memory address 0x80001650. A beginner might write the following instructions...
Code:
lis r12, 0x8000
ori r12, r12, 0x1650
lwz r11, 0 (r12)
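This is not completely optimized. The ori instruction is unnecessary: the lower half of the address (0x1650) is below 0x8000, so it fits directly in the load's 16-bit signed offset. We can shorten this...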
Code:
lis r12, 0x8000
lwz r11, 0x1650 (r12)
As you can see, we have shortened the source by one instruction. Now let's go over a case where you need to write a load/store instruction, but the lower half of the address is 0x8000 or higher, so as an Offset Value (SIMM) it would exceed the 16-bit signed range (0xFFFF8000 thru 0x7FFF). We have the following source...
Code:
#Load word value from 0x8028CF08
lis r12, 0x8028 #Set the upper bits
ori r12, r12, 0xCF08 #Set lower bits too or else we will exceed the 16-bit signed range
lwz r11, 0 (r12) #Load word into r11
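Here's a simple trick if your offset value needs to be 0x8000 or higher. The CPU sign-extends the 16-bit offset, so 0xCF08 is treated as 0xFFFFCF08 (-0x30F8). Adding one to the upper half (+0x10000) cancels that out: 0x80290000 - 0x30F8 = 0x8028CF08.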
Code:
#Load word value from 0x8028CF08
lis r12, 0x8029 #Add one to your upper 16 bit original value (0x8028 + 1)
lwz r11, 0xFFFFCF08 (r12) #Simply pre-pend the offset value with 0xFFFF. This is known as 'sign-extending'.
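If you want to double-check the "add one to the upper half" arithmetic before writing a code, the small Python sketch below does the split for you. The function names split_address and rebuild are hypothetical helpers for illustration only. For what it's worth, GNU as performs the same adjustment with its @ha and @l operators.

```python
def split_address(addr):
    """Split a 32-bit address into a lis value and a signed 16-bit offset.

    The CPU sign-extends the load/store offset, so when the low half is
    0x8000 or higher we treat it as negative and add one to the high half.
    """
    low = addr & 0xFFFF
    if low >= 0x8000:
        low -= 0x10000                      # e.g. 0xCF08 -> -0x30F8 (0xFFFFCF08)
    high = ((addr - low) >> 16) & 0xFFFF    # upper half, bumped by one when low < 0
    return high, low


def rebuild(high, low):
    """Recompute the effective address that lis + lwz would produce."""
    return ((high << 16) + low) & 0xFFFFFFFF


for address in (0x80001650, 0x8028CF08):
    high, low = split_address(address)
    print(f"lis r12, 0x{high:04X} / lwz r11, {low:#x} (r12) -> 0x{rebuild(high, low):08X}")
```

Both addresses from this section round-trip through rebuild, and 0x8028CF08 comes out as lis 0x8029 with an offset of -0x30F8 (encoded as 0xCF08).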
II. 'Register into a Register'
Let's say we have the following instruction...
Code:
lwz r11, 0x00AC (r12)
Now suppose r12 is no longer needed after this instruction. Then there's no need to tie up r11, especially if we need that register for a different instruction later. Load the word back into r12 instead...
Code:
lwz r12, 0x00AC (r12)
III. Using a single lis instruction for multiple loading/storing
Let's say we have the following instructions...
Code:
lis r12, 0x8000
lwz r11, 0x1500 (r12)
lis r10, 0x8000
lwz r9, 0x1800 (r10)
We have a redundant instruction: both lis instructions load the same value (0x8000), just into two different registers. Do this instead...
Code:
lis r12, 0x8000
lwz r11, 0x1500 (r12)
lwz r9, 0x1800 (r12)
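Now we've saved an instruction and freed up r10. This works because the first lwz writes to r11, so r12 still holds 0x80000000 for the second load.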
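The same check works for any group of addresses: one lis is enough if every address is within a signed 16-bit offset of the same base. Here is a hedged Python sketch of that check. The name plan_shared_lis is hypothetical, and it ignores 32-bit wrap-around.

```python
def plan_shared_lis(addresses):
    """Return (lis value, offsets) if one lis base reaches every address, else None.

    A base B (a multiple of 0x10000) works only if every address minus B fits
    the signed 16-bit offset range -0x8000..0x7FFF.
    """
    lowest_base = max(addresses) - 0x7FFF      # B can't be lower than this
    highest_base = min(addresses) + 0x8000     # ...or higher than this
    base = (lowest_base + 0xFFFF) & ~0xFFFF    # round up to a multiple of 0x10000
    if base > highest_base or base > 0xFFFF0000:
        return None                            # addresses too far apart for one lis
    return base >> 16, [a - base for a in addresses]


print(plan_shared_lis([0x80001500, 0x80001800]))
print(plan_shared_lis([0x80001650, 0x8028CF08]))
```

For this section's example it returns lis 0x8000 with offsets 0x1500 and 0x1800. For 0x80001650 and 0x8028CF08 it returns None, because those addresses are too far apart to share a single lis.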
IV. Optimizing Codes made by Read Breakpoints
Let's say you set a Memory Read Breakpoint and it lands on the following default instruction...
Code:
lwz r5, 0x1778 (r30)
You want to change the value that ends up in r5. A beginner coder might write something like this...
Code:
li r5, 0xC #Custom r5 value
stw r5, 0x1778 (r30) #Make sure new r5 value is in memory
lwz r5, 0x1778 (r30) #Default Instruction
This is redundant. There's no need to take our new r5 value, store it to memory, and then immediately load it back from memory. Remove both the stw and lwz instructions. You are left with this...
Code:
li r5, 0xC
In some cases, a code may require that the new value also be in memory, not just in the register. If that's the case, write the source like this...
Code:
li r5, 0xC
stw r5, 0x1778 (r30)
There's still no need for the default instruction, because r5 already holds the value we just stored.
V. Optimizing Branch Routes
We have the following list of instructions...
Code:
cmpwi r21, 0x1
beq- the_label
b finish_code
the_label:
li r28, 0x14
finish_code:
stb r28, 0x2 (r30)
This is not fully optimized branch routing. There's no need for two labels and an extra unconditional branch. Flip the condition (beq becomes bne) and branch past the li instruction instead...
Code:
cmpwi r21, 0x1
bne+ finish_code
li r28, 0x14
finish_code:
stb r28, 0x2 (r30)
As you can see, if r21 is not equal to one, bne+ skips straight to finish_code. Otherwise, execution falls through to the li instruction. This is more efficient than making two separate branch labels/routes, and it removes one branch instruction.
VI. Avoiding Pushing/Popping the Stack
What some beginner coders do (when they need extra registers in a code) is 'push' and 'pop' the stack: save registers to the stack at the start of the code and restore them at the end. This adds extra instructions to every code that uses it. Having free registers is nice, but if you want to cut down the length of your code, you should avoid the push/pop stack method.
We know r11 and r12 are free to use without restoring them (99% of the time). You can also use a volatile register (r3 thru r10) and restore its original value at the end of your code. However, that only works if the register holds the same value every time the ASM instruction is executed (test this by hitting the breakpoint over and over again), and finding such a register is actually rare.
Instead, you can free up more registers (without restoring their original values) by looking ahead at the ASM instructions that follow your code's address. For example, let's say we have a code address of 0x80456000 and the following addresses and ASM instructions...
Code:
0x80456000 lwz r4, 0 (r5) #Default Instruction, Address of Code
0x80456004 add r23, r6, r9
0x80456008 mflr r0
0x8045600C cmpwi r31, 0x1
If the default instruction at your code's address is a load (lwz, lhz, etc.) and you can place it at the end of your source, you can use its destination register (r4 in our example) without restoration, because the load will overwrite it anyway. This doesn't apply if the load also uses that register as its base, e.g. lwz r4, 0 (r4).
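r23 is also free, because the add writes to it before anything reads it. Same with r0 (written by mflr). This only holds if these instructions run straight through, with no branches in between. Obviously, we can't use r5, r6, r9, or r31, because the following instructions read them as inputs. Even restoring their original values isn't really safe, since you can't count on them holding the same values every time.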
So with the instructions listed above, our free registers are r0, r4, r11, r12, and r23, which will most likely be enough to avoid pushing/popping the stack.
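For simple straight-line cases, the look-ahead above can be automated. This is a minimal Python sketch, not a real disassembler. free_registers and the READ_ONLY table are hypothetical. The sketch assumes the first register operand is the destination, except for a few compares/stores that only read. Following the rule of thumb above, it treats r11 and r12 as free, and it stops at the first branch because it can't follow control flow. Update-form and multi-register instructions (lwzu, stwu, lmw, stmw) are not handled. List the default instruction first, since you'll be moving it to the end of your code.

```python
import re

# Mnemonics whose register operands are all inputs (nothing is written).
# Hypothetical, incomplete subset of PowerPC, just enough for this example.
READ_ONLY = {"cmpwi", "cmplwi", "cmpw", "cmplw", "stw", "sth", "stb"}

REG = re.compile(r"\br(\d+)\b")


def free_registers(instructions, assumed_free=(11, 12)):
    """Registers written before they are read in straight-line code."""
    read_first, write_first = set(), set()
    for line in instructions:
        mnemonic = line.split()[0].rstrip(".+-")
        if mnemonic.startswith("b"):
            break                                # branch: flow past here is unknown
        regs = [int(n) for n in REG.findall(line)]
        if mnemonic in READ_ONLY:
            reads, writes = regs, []
        else:
            reads, writes = regs[1:], regs[:1]   # first register is the destination
        for r in reads:                          # inputs are read before the result is written
            if r not in write_first:
                read_first.add(r)
        for r in writes:
            if r not in read_first:
                write_first.add(r)
    return sorted(write_first | (set(assumed_free) - read_first))


code = ["lwz r4, 0 (r5)", "add r23, r6, r9", "mflr r0", "cmpwi r31, 0x1"]
print(free_registers(code))
```

On the four instructions from this section, it reports r0, r4, r11, r12, and r23, the same list worked out by hand above.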
VII. Optimizing conditions with the Record (dot) Shortcut
We have the following source...
Code:
lwz r5, 0x1AA8 (r31)
add r6, r6, r5
cmpwi r6, 0x0
bne+ some_label
Certain ASM instructions can have a dot (.) appended to them. This is known as 'Record'. The Record form sets cr0 as if you had followed the instruction with cmpwi rD, 0, where rD is the instruction's destination register. It only sets cr0, so it can't replace a compare that uses a different CR field. Please note that there's no way I can list all the instructions that do or do not have a Record form. Refer to an actual ASM handbook/reference for assistance.
The add instruction has a Record form (add.), so the cmpwi can be removed. Like this...
Code:
lwz r5, 0x1AA8 (r31)
add. r6, r6, r5 #Notice the dot appended to add
bne+ some_label
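To see what add. records, here is a small Python model, assuming 32-bit registers. The helper names are hypothetical, and the SO bit is left out. It treats the 32-bit result as signed, which is why a sum that wraps past 0x7FFFFFFF sets LT rather than GT.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def cmpwi_zero(rd):
    """cr0 (LT, GT, EQ) bits that cmpwi rD, 0 would set (SO not modeled)."""
    s = to_signed32(rd)
    return (s < 0, s > 0, s == 0)


def add_record(ra, rb):
    """Model add. rD, rA, rB: the 32-bit sum plus the cr0 bits it records."""
    rd = (ra + rb) & 0xFFFFFFFF
    return rd, cmpwi_zero(rd)


rd, (lt, gt, eq) = add_record(0x7FFFFFFF, 1)
print(hex(rd), "LT" if lt else "GT" if gt else "EQ")
```

bne+ some_label is taken whenever the EQ bit is clear, which is exactly the case where the old cmpwi r6, 0x0 would have found r6 non-zero.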