PowerPC Tutorial

Chapter 15: Bit Rotation, Shifting, Clearing, etc

Section 1: Introduction

This chapter is going to be a big jump in skill level. Many beginners have a hard time understanding bit rotation when learning PowerPC. Outside of float, exception, and cache instructions, the rotation instructions may be the most difficult to learn.

Now that you know a bit more about bits (no pun intended), and how to use bit values within instructions, we can move onto something more advanced than logical operations.

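PowerPC comes with 3 different bit-rotation instructions. They are..

rlwinm (Rotate Left Word Immediate Then AND with Mask)
rlwnm (Rotate Left Word Then AND with Mask)
rlwimi (Rotate Left Word Immediate Then Mask Insert)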

These instructions are sort of like PPC's Swiss Army Knife. They can do many different tasks such as ANDing, fast multiplication, fast division, check for even/odd, check for negative value, check for alignment of an address, etc.

Knowing how these instructions operate and when to use them in your code is half the battle with PowerPC. Needless to say, this chapter is an important one, and it is a hefty one.

Because these 3 instructions can do an array of different tasks, they can be "broken up" into many simplified mnemonics. We will first go over some of the commonly used simplified mnemonics because this will teach you about clearing and shifting. Afterwards, we will break down what each of the 3 instructions does from a standard mnemonic viewpoint.


Section 2: Bit Clearing

There are two simplified mnemonics of rlwinm for clearing bits. Clearing a bit simply means setting it to 0.

clrlwi rD, rA, XX

clrlwi is Clear Left Word Immediate. This instruction clears the upper (left-hand) XX bits of rA (starting at bit 0 and going right) and places the result in rD.

clrrwi rD, rA, XX

clrrwi is Clear Right Word Immediate. This clears the lower (right-hand) XX bits of rA (starting at bit 31 and going left) and places the result in rD.

Let's say we have a value in r5 and that value is 0x1AAB000F. This in binary is...

0001 1010 1010 1011 0000 0000 0000 1111

If we execute the Clear Right instruction of "clrrwi r5, r5, 22", we will need to zero-out the lower/right-hand 22 bits of r5. The 10 upper/left-hand bits are left alone. r5's result in binary would now be..

0001 1010 1000 0000 0000 0000 0000 0000

The lower 22 bits (bits 10 thru 31) are the bits that were cleared by the instruction. Convert the result back to hex, and r5 is now 0x1A800000. Notice how the 3rd digit of r5 went from A to 8. This is because bits 8 thru 11 went from 1010 (0xA) to 1000 (0x8).
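
If you want to double-check the clearing math on your PC, here is a minimal Python sketch. The helper names clrlwi and clrrwi below are just hypothetical stand-ins that mimic the two mnemonics on a 32-bit value; they are not part of any assembler or library.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def clrlwi(ra, n):
    """Model of 'clrlwi rD, rA, n': clear the n upper (left-hand) bits."""
    return ra & (MASK32 >> n)

def clrrwi(ra, n):
    """Model of 'clrrwi rD, rA, n': clear the n lower (right-hand) bits."""
    return ra & (MASK32 << n) & MASK32

# Same starting value as the example above
print(hex(clrrwi(0x1AAB000F, 22)))  # upper 10 bits survive
print(hex(clrlwi(0x1AAB000F, 22)))  # lower 10 bits survive
```

Both helpers boil down to a logical AND with a mask, which is exactly why they can be simplified mnemonics of rlwinm.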


Section 3: Bit Shifting

Clearing was easy enough, so let's dive into bit shifting. Instead of zeroing out bits, we will 'move/shift' them. We can shift them either left or right.

Here are two basic shifting instructions...

slwi rD, rA, XX #Shift Left Word Immediate
srwi rD, rA, XX #Shift Right Word Immediate

The XX value designates how many bits to shift by. XX can be anything from 1 to 31. There are also these shifting instructions..

slw rD, rA, rB #Shift Left Word
srw rD, rA, rB #Shift Right Word

It's the same thing as before, but the amount to shift is in a source register (rB) instead of an immediate value.

Let's go over the execution of a srwi instruction. Let's say r5 has a value of 0x0000FE1F. This in binary is...

0000 0000 0000 0000 1111 1110 0001 1111

Take note of the leftmost '1' (bit 16). If we execute the instruction "srwi r5, r5, 15", r5's result in binary is...

0000 0000 0000 0000 0000 0000 0000 0001

r5's result is 0x00000001. Notice where that '1' is now (bit 31). It was moved/shifted to the right by 15 bits. For rightward shifts, any bits that go beyond bit 31 are thrown away. The same rule applies if you shift bits toward the left and they go beyond bit 0. Notice bits 0 thru 14. These zero bits were placed in because the bit slots on the left-hand side were vacated by the rightward shift. You can call the zero bits the "replacement bits". For slw, slwi, srw, and srwi, the replacement bits are always zero.

Here's a picture that gives you a better visual of what's going on for srwi...

---

Let's take a look at an example of shifting left...

Example: "slwi r6, r6, 2" r6 starts off as 0x80000007. Binary form is...

1000 0000 0000 0000 0000 0000 0000 0111

After the instruction is executed, the result in binary is...

0000 0000 0000 0000 0000 0000 0001 1100

Hex result is 0x0000001C. The '1' that was in bit 0 is thrown away because it was shifted beyond bit 0. The three 1's in bits 29 thru 31 are shifted over to bits 27 thru 29, and bits 30 & 31 are the zero replacement bits.

Here's a picture...

IMAGE

---
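
Here is a small Python sketch that mimics the four logical shift instructions, in case you want to try other values. The function names are hypothetical helpers for illustration only. One assumption worth calling out: for slw and srw, the model uses the low 6 bits of rB as the shift amount and returns 0 for amounts of 32 or more, which is how the instructions are documented to behave.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def slwi(ra, n):
    """Model of 'slwi rD, rA, n': bits pushed past bit 0 are thrown away."""
    return (ra << n) & MASK32

def srwi(ra, n):
    """Model of 'srwi rD, rA, n': bits pushed past bit 31 are thrown away."""
    return (ra & MASK32) >> n

def slw(ra, rb):
    """Model of 'slw rD, rA, rB': shift amount comes from a register."""
    n = rb & 0x3F                 # only the low 6 bits of rB are used
    return 0 if n > 31 else slwi(ra, n)

def srw(ra, rb):
    """Model of 'srw rD, rA, rB': shift amount comes from a register."""
    n = rb & 0x3F
    return 0 if n > 31 else srwi(ra, n)

print(hex(srwi(0x0000FE1F, 15)))  # the srwi example from this section
print(hex(slwi(0x80000007, 2)))   # the slwi example from this section
```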

There's still another shifting instruction that is available to use. It is not a simplified mnemonic of rlwinm, but it should be talked about here. It is a standard-mnemonic instruction.

srawi rD, rA, XX

This is Shift Right Algebraic Word Immediate. It operates just like srwi, except that when the register bits are shifted to the right, bit 0's value (before the shift) is copied into the new bit slots that open up on the left-hand side. In other words, the replacement bits are copies of the sign bit instead of always being zero. Confused? Let's go over an example:

srawi r3, r3, 3

Pretend r3 starts off as 0xFFFFFFF0. r3 in binary is...

1111 1111 1111 1111 1111 1111 1111 0000

Take note of bit 0 (the leftmost bit, which is 1). Its value will be used as the 'copy value' for the new bit slots that will be opened up by the rightward shift.

Now let's say we execute the instruction. r3 in binary form is...

1111 1111 1111 1111 1111 1111 1111 1110

Notice bits 0 thru 2. These three bits are the replacement bits and are copies of bit 0's value (what bit 0 was before the rightward shift).

r3 as a result = 0xFFFFFFFE. Confused? Here's a picture...

IMAGE

Let's go over this srawi instruction via GDB. Here's a picture of right before the srawi gets executed.

IMAGE

r3 is outlined in orange. We see it is currently 0xFFFFFFF0. Now let's execute the srawi instruction!

IMAGE

As you can see r3's result is 0xFFFFFFFE.
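
For those who like to verify things outside of GDB, here is a rough Python model of the algebraic shift. The srawi function below is a hypothetical helper, and it only models the result register (it ignores the carry bit that the real instruction also updates).

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def srawi(ra, n):
    """Model of 'srawi rD, rA, n': shift right, copying bit 0 (the sign bit)
    into the bit slots that open up on the left-hand side."""
    ra &= MASK32
    if ra & 0x80000000:      # bit 0 is set, so the value is negative
        ra -= 1 << 32        # reinterpret as a negative Python integer
    return (ra >> n) & MASK32  # Python's >> on negative ints copies the sign

print(hex(srawi(0xFFFFFFF0, 3)))  # replacement bits are 1's
print(hex(srawi(0x7FFFFFF0, 3)))  # replacement bits are 0's
```

When bit 0 starts out as 0, srawi gives the same result as srwi. The two only differ for values with bit 0 set.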


Section 4: Fast Multiplication & Division

Standard multiply and divide instructions are slow compared to shifts, and they can degrade the performance of a program. Therefore, when possible, you should replace multiplication and division instructions with shifting instructions.

This replacement cannot be done for every multiply/divide instruction. It can be done if and only if the multiplier/divisor is a power of 2.

Multiply Conversion (XXX is a power of 2):
mulli rD, rA, XXX = slwi rD, rA, BB

Divide (unsigned/logical) Conversion (rB's value is a power of 2):
divwu rD, rA, rB = srwi rD, rA, BB

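Divide (signed) Conversion (rB's value is a power of 2):
divw rD, rA, rB = srawi rD, rA, BB
NOTE: srawi rounds toward negative infinity while divw rounds toward zero, so this conversion is only exact when rA is non-negative or evenly divisible by the divisor. For any signed rA, follow the srawi with "addze rD, rD" to get the same result as divw.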

Conversion of XXX/rB's value to BB (XXX/rB's value = 2 to the power of BB)
2 = 1
4 = 2
8 = 3
16 = 4
32 = 5
64 = 6
128 = 7
256 = 8
etc etc...
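
The conversion table is just a base-2 logarithm. Here is a short Python sketch, using a hypothetical helper named shift_amount, that computes BB and compares the shift against the real multiply/divide. It also shows why signed division needs extra care: an arithmetic shift rounds toward negative infinity, while a truncating divide rounds toward zero.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def shift_amount(power_of_two):
    """Return BB for a power-of-2 multiplier/divisor (e.g. 8 -> 3)."""
    if power_of_two <= 0 or power_of_two & (power_of_two - 1):
        raise ValueError("not a power of 2")
    return power_of_two.bit_length() - 1

x = 100
bb = shift_amount(8)
assert (x * 8) & MASK32 == (x << bb) & MASK32  # multiply vs left shift
assert x // 8 == x >> bb                       # unsigned divide vs right shift

# Signed caveat with a negative, non-divisible value
neg = -7
shifted = neg >> 1         # arithmetic shift right by 1
truncated = int(neg / 2)   # truncating divide by 2
print(shifted, truncated)  # the two results differ by one
```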


Section 5: Bit Rotation; How Shifting and Rotation Differ

Bit rotation is similar to shifting, except the bits that would have been tossed away are recycled: they wrap around to the other end of the register. Confused? Here's a basic picture of a leftward rotation...

IMAGE

Notice the arrow that connects the far left bit to the far right bit. No bits are tossed away. They are all kept.

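PowerPC comes with the following rotation simplified mnemonics...

rotlwi rD, rA, XX #Rotate Left Word Immediate
rotrwi rD, rA, XX #Rotate Right Word Immediate
rotlw rD, rA, rB #Rotate Left Word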

Let's use a rotlw for an example. We have the following instruction..
rotlw r5, r7, r9

Before the instruction executes...
r7 = 0x7F7F7F00
r9 = 9

What will occur is that r7's value will be rotated left by 9 bits, and the post-rotation result will be written to r5.

r7 in binary form is...
0111 1111 0111 1111 0111 1111 0000 0000

Each group of four is uniquely colored, take note.

Once the instruction has executed, r5 gets the result, which is 0xFEFE00FE. Let's examine r5 in Binary view..

1111 1110 1111 1110 0000 0000 1111 1110

The colors show us how the bits were rotated to come up with the new result.
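
Rotation is easy to model in Python if you want to experiment. The rotlw and rotrw helpers below are hypothetical names for illustration. The sketch also shows that rotating left by n bits is the same as rotating right by 32 - n bits, which is why there is no dedicated rotate-right instruction.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def rotlw(ra, n):
    """Rotate a 32-bit value left by n bits; nothing is thrown away."""
    n &= 31          # a rotation by 32 is the same as no rotation
    ra &= MASK32
    return ((ra << n) | (ra >> (32 - n))) & MASK32

def rotrw(ra, n):
    """Rotate right by n bits, expressed as a left rotation."""
    return rotlw(ra, 32 - n)

value = 0x7F7F7F00
print(hex(rotlw(value, 9)))   # left by 9 bits
print(hex(rotrw(value, 23)))  # right by 23 bits gives the same result
```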


Section 6: Disassembling rlwnm and rlwinm

Okay great, you understand how rotation works now, and you already know how Logical ANDing works (from previous Chapter). We can now go into exactly how rlwnm and rlwinm operate.

rlwnm rD, rA, rB, MB, ME
rlwinm rD, rA, SH, MB, ME

Let's go over rlwinm first since it uses an immediate value instead of a second source register. It should be a tad less difficult to learn. We have the following instruction...

rlwinm r0, r5, 3, 0, 28

So what do the values 3, 0, and 28 represent?

The 3 is the SH value. SH technically stands for shift, but this value is *NOT* used for shifting. This is confusing, of course, as many PPC manuals use the term "SH". Anyway, it's the number of bits to ROTATE to the left. The bit rotation is performed on rA (r5) first.

The 0 is the MB value. It stands for Mask Beginning. It's the first high bit of the applied Mask.

The 28 is the ME value. It stands for Mask End. It's the last high bit of the applied Mask.

What is the applied Mask? It is a 32-bit value in which every bit from MB thru ME is high (1). Any bits not part of the mask are low (0).

MB of 0 and ME of 28 would be a mask of 1111 1111 1111 1111 1111 1111 1111 1000. In hex this is 0xFFFFFFF8. Notice how the first high bit starts at bit 0 (MB), and ends at bit 28 (ME). Bits 29, 30, and 31 are low because they are not included in the MB->ME range.

This mask value will be Logically AND'd with the rotated value. The result of this operation is the final result and what is written to rD (r0).

To recap our rlwinm instruction:
Step 1: Rotate r5 left by 3
Step 2: Create a mask of high bits starting at bit 0 and ending at bit 28
Step 3: Logically AND Step1 result with Step 2's Mask.
Step 4: place Step 3 result into r0.

Sometimes it's visually easier to write the mask in hex form instead of using decimal MB and ME values. Like this...

rlwinm r0, r5, 3, 0xFFFFFFF8 #The assembler can understand this!

Let's say we have the value 0x81234567 in r5, and we executed the rlwinm from above. What we do first is ROTATE all the bits of r5 leftward/counter-clockwise by 3.

Before rotation (each group of four bits is color coded to help you visualize the rotation)~

1000 0001 0010 0011 0100 0101 0110 0111

After rotation~

0000 1001 0001 1010 0010 1011 0011 1100

As you can see bit 0 is now bit 29! So this rotated value in hex is now 0x091A2B3C. We take this current temporary rotated value, and logically AND it with our Mask (0xFFFFFFF8).

The final result (r0) is 0x091A2B38.

Let's review this example via GDB. Here's the rlwinm instruction right before it gets executed....

IMAGE

We see r5 is outlined in blue and r0 is outlined in magenta. Let's execute the rlwinm instruction now...

IMAGE

As you can see, r0 (outlined in magenta) has the result of 0x091A2B38.

Regarding the rlwnm instruction (Rotate Left Word Then And w/ Mask) it's the same procedure as rlwinm except instead of using an immediate value for the initial rotation, that value resides in a second source register.
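
The rotate-then-mask procedure can be sketched in Python as shown below. All function names here (rotl32, mask, rlwinm, rlwnm) are hypothetical helpers written for this tutorial, not anything provided by an assembler. One assumption to note: when MB is greater than ME, the model builds a "wrap-around" mask (bits MB thru 31 plus bits 0 thru ME), which is how the mask is documented to work.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def mask(mb, me):
    """High bits from bit mb thru bit me (bit 0 is the leftmost bit)."""
    left = MASK32 >> mb                      # bits mb thru 31 are high
    right = (MASK32 << (31 - me)) & MASK32   # bits 0 thru me are high
    if mb <= me:
        return left & right
    return left | right                      # wrap-around mask

def rlwinm(ra, sh, mb, me):
    """Steps 1-3 of the recap: rotate, build the mask, AND them."""
    return rotl32(ra, sh) & mask(mb, me)

def rlwnm(ra, rb, mb, me):
    """Same thing, but the rotate amount is the low 5 bits of rB."""
    return rotl32(ra, rb & 31) & mask(mb, me)

print(hex(mask(0, 28)))                     # the mask from the example
print(hex(rotl32(0x81234567, 3)))           # temporary rotated value
print(hex(rlwinm(0x81234567, 3, 0, 28)))    # what ends up in r0
```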


Section 7: Disassembling rlwimi

The 3rd standard mnemonic is rlwimi. Rotate Left Word Immediate Then Mask Insert. This still involves rotating the bits, but there is no logical AND'ing.

This instruction is going to be a bit difficult to explain as it even took me a while to figure this out, and there is no really good information anywhere on the net explaining this instruction in a 'beginner sense'. Let's look at an example instruction...

rlwimi r6, r4, 2, 0, 29

So you should already know that we will rotate the contents of r4 by 2 bits. Let's say BEFORE the rotation, r4 is this...

0x4455AA01

In binary that is....

0100 0100 0101 0101 1010 1010 0000 0001

After rotation, the temporary rotated value (r4 itself is not modified) is...

0001 0001 0101 0110 1010 1000 0000 0101

Which in hex is 0x1156A805. We know that the mask (0, 29) is 0xFFFFFFFC. With the rlwimi instruction, there is NO logical ANDing. What we need to do is look at what is CURRENTLY in r6 (the Destination Register).

Let's say r6 is 0x0000FFFF. Binary form is...

0000 0000 0000 0000 1111 1111 1111 1111

With our 0xFFFFFFFC mask, if a bit in the mask is 1, then whatever bits are in our rotated r4 register will replace whatever bits are in r6. If a bit in the mask is 0, then those bits in r6 are preserved (not replaced by bits in r4!)

With all of that being said, that means r6's bits 30 & 31 do *NOT* change. And the bits 0 thru 29 of our rotated r4 data replaces bits 0 thru 29 of r6. r6 will thus equal (in binary)..

0001 0001 0101 0110 1010 1000 0000 0111

Which in hex is 0x1156A807. Bits 0 thru 29 are the bits that were replaced by the rotated r4 value. Bits 30 & 31 are the bits that were preserved (not replaced).

Let's review this via GDB. Here's a pic of right before the rlwimi instruction gets executed...

IMAGE

r4 is outlined in blue, r6 is outlined in red. Let's execute the rlwimi instruction now...

IMAGE

As you can see, the result (r6) is 0x1156A807.

If you have a rlwimi instruction where the destination register is the same as the source register (let's say r4), then the rotated copy of r4 is inserted into the old r4 (its value BEFORE the rotation). Any bits outside of the mask keep their pre-rotation values.
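
Here is a Python sketch of the insert behavior, if it helps to see it as a formula. The helper names (rotl32, mask, rlwimi) are hypothetical and exist only for this illustration. The idea is: take the rotated source where the mask is 1, take the old destination where the mask is 0.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def mask(mb, me):
    """High bits from bit mb thru bit me (bit 0 is the leftmost bit)."""
    left = MASK32 >> mb
    right = (MASK32 << (31 - me)) & MASK32
    return (left & right) if mb <= me else (left | right)

def rlwimi(rd, ra, sh, mb, me):
    """rd is the CURRENT destination value; the new destination is returned."""
    m = mask(mb, me)
    inserted = rotl32(ra, sh) & m     # bits taken from the rotated source
    preserved = rd & ~m & MASK32      # bits kept from the old destination
    return inserted | preserved

r4 = 0x4455AA01
r6 = 0x0000FFFF
print(hex(rlwimi(r6, r4, 2, 0, 29)))  # the example from this section

# Destination same as source: the old value supplies the preserved bits
r3 = 0x12345678
print(hex(rlwimi(r3, r3, 8, 16, 23)))
```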


Section 8: Simplified Mnemonic Cheat Sheet

Name = Simplified Mnemonic = Standard Mnemonic

Extract & Left Justify Word Immediate = extlwi rD, rA, n, b (n > 0) = rlwinm rD, rA, b, 0, n - 1
Extract & Right Justify Word Immediate = extrwi rD, rA, n, b (n > 0) = rlwinm rD, rA, b + n, 32 - n, 31
Insert From Left Word Immediate = inslwi rD, rA, n, b (n > 0) = rlwimi rD, rA, 32 - b, b , (b + n) - 1
Insert From Right Word Immediate = insrwi rD, rA, n , b (n > 0) = rlwimi rD, rA, 32 - (b + n), b, (b + n) - 1
Rotate Left Word Immediate = rotlwi rD, rA, n = rlwinm rD, rA, n, 0, 31
Rotate Right Word Immediate = rotrwi rD, rA, n = rlwinm rD, rA, 32 - n, 0, 31
Rotate Left Word = rotlw rD, rA, rB = rlwnm rD, rA, rB, 0, 31
Shift Left Word Immediate = slwi rD, rA, n (n < 32) = rlwinm rD, rA, n, 0, 31 - n
Shift Right Word Immediate = srwi rD, rA, n (n < 32) = rlwinm rD, rA, 32 - n, n, 31
Clear Left Word Immediate = clrlwi rD, rA, n (n < 32) = rlwinm rD, rA, 0, n, 31
Clear Right Word Immediate = clrrwi rD, rA, n (n < 32) = rlwinm rD, rA, 0, 0, 31 - n
Clear Left And Shift Left Word Immediate = clrlslwi rD, rA, b, n (n ≤ b ≤ 31) = rlwinm rD, rA, n, b - n, 31 - n

• Extract — Select a field of n bits starting at bit position b in the source register; left or right justify this field in the target register; clear all other bits of the target register.

• Insert — Select a left- or right-justified field of n bits in the source register; insert this field starting at bit position b of the target register; leave other bits of the target register unchanged.

• Rotate — Rotate the contents of a register right or left n bits without masking.

• Shift — Shift the contents of a register right or left n bits, clearing vacated bits (logical shift).

• Clear — Clear the leftmost or rightmost n bits of a register.

• Clear left and shift left — Clear the leftmost b bits of a register, then shift the register left by n bits. This operation can be used to scale a (known non-negative) array index by the width of an element.
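
To convince yourself that the cheat sheet conversions hold up, here is a Python sketch that writes several of the simplified mnemonics directly from their rlwinm formulas and compares them against plain shifts and ANDs. All function names are hypothetical helpers for illustration.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def rotl32(x, n):
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def mask(mb, me):
    left = MASK32 >> mb
    right = (MASK32 << (31 - me)) & MASK32
    return (left & right) if mb <= me else (left | right)

def rlwinm(ra, sh, mb, me):
    return rotl32(ra, sh) & mask(mb, me)

# Simplified mnemonics, copied from the cheat sheet formulas
def extlwi(ra, n, b):   return rlwinm(ra, b, 0, n - 1)
def extrwi(ra, n, b):   return rlwinm(ra, b + n, 32 - n, 31)
def rotrwi(ra, n):      return rlwinm(ra, 32 - n, 0, 31)
def slwi(ra, n):        return rlwinm(ra, n, 0, 31 - n)
def srwi(ra, n):        return rlwinm(ra, 32 - n, n, 31)
def clrlwi(ra, n):      return rlwinm(ra, 0, n, 31)
def clrrwi(ra, n):      return rlwinm(ra, 0, 0, 31 - n)
def clrlslwi(ra, b, n): return rlwinm(ra, n, b - n, 31 - n)

x = 0x12345678
for n in range(32):
    assert slwi(x, n) == (x << n) & MASK32
    assert srwi(x, n) == x >> n
    assert clrlwi(x, n) == x & (MASK32 >> n)
    assert clrrwi(x, n) == x & (MASK32 << n) & MASK32
# Clear the left 8 bits, then shift left by 2
assert clrlslwi(x, 8, 2) == ((x & (MASK32 >> 8)) << 2) & MASK32
print("all cheat sheet checks passed")
```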


Section 9: Some Examples

To clear a specific bit (i.e. clearing only bit 21 of r3)

rlwinm r3, r3, 0, 22, 20 #mask of 0xFFFFFBFF

To overwrite a single bit of the Destination Register using what is in the Source Register (r3's bit 16 value overwrites r12's bit 16 value)

rlwimi r12, r3, 0, 16, 16 #mask of 0x00008000

To flip bits 30 & 31 regardless of what their values were beforehand, and re-insert them (the register in question is r4, with r0 used as a scrap register)

not r0, r4
rlwimi r4, r0, 0, 30, 31

To swap the upper and lower 16 bits (of r3)

rlwinm r3, r3, 16, 0, 31 #mask of 0xFFFFFFFF
simplified mnemonic: rotlwi r3, r3, 16 or rotrwi r3, r3, 16

To take a byte and copy it to another place in the same register (r3 used for example)
0x12345678 --> 0x12347878

rlwimi r3, r3, 8, 0x0000FF00
simplified mnemonic: insrwi r3, r3, 8, 16

To extract a byte and cut-paste it to another place in the register and all other data in the register is wiped
0x12345678 --> 0x00000034

rlwinm r3, r3, 16, 24, 31 #mask of 0x000000FF
simplified mnemonic: extrwi r3, r3, 8, 8
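
The examples above can be checked with a short Python sketch. As before, rotl32, mask, rlwinm, and rlwimi are hypothetical helper names that model the instructions, and the starting register values below are made up purely for the demonstration.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def rotl32(x, n):
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def mask(mb, me):
    left = MASK32 >> mb
    right = (MASK32 << (31 - me)) & MASK32
    return (left & right) if mb <= me else (left | right)

def rlwinm(ra, sh, mb, me):
    return rotl32(ra, sh) & mask(mb, me)

def rlwimi(rd, ra, sh, mb, me):
    m = mask(mb, me)
    return (rotl32(ra, sh) & m) | (rd & ~m & MASK32)

# Clear only bit 21 (MB > ME gives a wrap-around mask)
cleared = rlwinm(0xFFFFFFFF, 0, 22, 20)

# Flip bits 30 & 31 of r4, with r0 as the scrap register
r4 = 0x12345679
r0 = ~r4 & MASK32                  # not r0, r4
flipped = rlwimi(r4, r0, 0, 30, 31)

swapped = rlwinm(0x12345678, 16, 0, 31)                # swap 16-bit halves
copied = rlwimi(0x12345678, 0x12345678, 8, 16, 23)     # copy a byte
extracted = rlwinm(0x12345678, 16, 24, 31)             # extract a byte

for result in (cleared, flipped, swapped, copied, extracted):
    print(hex(result))
```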

NOTE: PowerPC does not come with clrlw/clrrw (clear left word / clear right word) instructions.

To achieve a 'clear left word' (ex: clrlw rD, rA, rB), execute these two instructions...
slw rD, rA, rB
srw rD, rD, rB #rD here is the result of the slw

For a 'clear right word', you do the opposite...
srw rD, rA, rB
slw rD, rD, rB #rD here is the result of the srw
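
Here is a quick Python sketch of the two-instruction trick. The names clrlw and clrrw are hypothetical (remember, these are not real PowerPC instructions), and slw/srw are simple models that assume the shift amount in rB is between 0 and 31.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def slw(ra, rb):
    """Model of slw for shift amounts 0 thru 31."""
    return (ra << rb) & MASK32

def srw(ra, rb):
    """Model of srw for shift amounts 0 thru 31."""
    return (ra & MASK32) >> rb

def clrlw(ra, rb):
    """Shift left to throw away the upper bits, then shift back."""
    return srw(slw(ra, rb), rb)

def clrrw(ra, rb):
    """Shift right to throw away the lower bits, then shift back."""
    return slw(srw(ra, rb), rb)

print(hex(clrlw(0x1AAB000F, 22)))
print(hex(clrrw(0x1AAB000F, 22)))
```

The results match what clrlwi and clrrwi give for the same value and the same bit count. The only difference is that the count lives in a register.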


Section 10: Conclusion

Welp that was a mouthful. Some coders prefer to use the simplified mnemonics while some prefer to use the standards. Do whatever works best for you. For me personally, I use the standard mnemonics for any task except bit clearing, and bit shifting.

