PowerPC Tutorial

Chapter 30: Barriers, Broadcasting

Section 1: Components of a PowerPC System

Being familiar with the typical components of a PowerPC system may help you understand more complex Assembly topics (Cache, Barriers, Broadcasting, etc.). We will go over the basics of Barriers and Broadcasting along with some primitive examples.

We've talked about the Nintendo Wii in some of the past chapters. Its PowerPC system (Broadway) is a single processor that contains just one core. While this is a simple design to understand and learn on, other PowerPC systems may be more complex. A system can have a single processor (CPU) or multiple processors, and those processors can have multiple cores.

A core is the most basic agent in a system. The core is responsible for the actual execution of instructions (which are fetched from the Program's memory). Each Core contains...

A processor (also called CPU) can contain a single core or multiple cores. Therefore a program could be devised to have multiple cores run different parts of a Program simultaneously. This is known as multi-threading.

Here is a general diagram of a system that has two Processors (each called a Cluster), and each Processor contains two cores.

Each core has its own L1 Cache. The two Cores of each Processor (aka Cluster) share an L2 Cache, and the two Processors share an L3 Cache. Finally, an agent known as the Bus connects the PowerPC System as a whole to Main (physical) Memory. Every PowerPC system will contain the Bus Agent and, of course, Main Memory.

You can see how a Cache Miss would degrade performance. Having to move "out" all the way to the Bus to check Main Memory severely slows down a Program.
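To make the cost of a miss concrete, here is a minimal C sketch of a load walking outward through the levels described above. The cycle counts are made up purely for illustration and do not describe any real PowerPC CPU. The names `level_cost` and `load_cost` are hypothetical, and adding up the cost of every level that missed is a simplification.

```c
#include <assert.h>

/* Memory levels, ordered from closest to the core to farthest away. */
enum { LEVEL_L1, LEVEL_L2, LEVEL_L3, LEVEL_MAIN_MEMORY, LEVEL_COUNT };

/* Illustrative access costs in cycles. These numbers are invented for
   this sketch and are NOT taken from any real PowerPC CPU. */
static const unsigned level_cost[LEVEL_COUNT] = { 3, 20, 60, 300 };

/* Return the total cost of a load that is first found at hit_level.
   Every level before it was checked and missed, so its cost is paid too. */
unsigned load_cost(int hit_level)
{
    assert(hit_level >= 0 && hit_level < LEVEL_COUNT);

    unsigned total = 0;
    for (int level = 0; level <= hit_level; level++) {
        total += level_cost[level];
    }
    return total;
}
```

With these invented numbers, a load that hits in L1 costs 3 cycles, while one that misses every cache and goes out over the Bus to Main Memory costs 383.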

Because a PowerPC system can have multiple cores, it's possible for two different L1 Data Cache units to each hold the same cache block while the block's state bits differ between them. This is known as Cache incoherency. It can later break a Program, causing unexpected exceptions or other strange behavior.

Another example of cache incoherency would be the L1 Data Cache and L2 Data Cache both possessing the same Cache Block, but once again, the state bits differ.


Section 2: Revisiting Self Modifying Code

To ensure cache coherency, a program may need to execute a broadcast. In simple terms, broadcasting means an agent needs to "tell" another agent about a cache update. This could be PPC core to PPC core, or PPC core to an external device (e.g. EEPROM chip, DMA unit, etc). Broadcasting is done via the bus.

The following instructions....

..only broadcast based on bit 29 of an SPR known as HID0 (they will broadcast if the bit is high). However, dcbz and dcbz_l will always broadcast (regardless of the HID0 settings) if the M bit is high for the virtual memory in question. This M bit has to do with what is called the "WIMG" settings of BAT Registers/Page Tables. You will learn about BAT Registers and Page Tables in the next 2 chapters.
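Keep in mind that PowerPC numbers register bits from the most significant end, so "bit 29" is not the same as `1 << 29`. The following C sketch shows which mask that bit corresponds to, assuming a 32-bit HID0 such as Broadway's. The helper names `ppc_bit_mask` and `hid0_broadcast_enabled` are hypothetical, and the code only inspects a value you already read from HID0; it does not read the SPR itself.

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC numbers the bits of a 32-bit register from the most significant
   bit (bit 0) down to the least significant bit (bit 31). */
uint32_t ppc_bit_mask(unsigned bit)
{
    assert(bit <= 31);
    return UINT32_C(1) << (31 - bit);
}

/* Nonzero when bit 29 of the given HID0 value is high.
   Assumption: hid0 holds a copy of a 32-bit HID0 register. */
int hid0_broadcast_enabled(uint32_t hid0)
{
    return (hid0 & ppc_bit_mask(29)) != 0;
}
```

Here `ppc_bit_mask(29)` is 0x00000004, so in Assembly the same bit can be isolated with a mask of 0x4.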

As an important fyi, the icbi instruction will NEVER broadcast. Revisiting our Self Modifying Code from the previous chapter, what would similar code look like if the Program was running on a multi processor/multi core system?

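dcbst rA, rB #Write the modified data cache block out to memory
sync #Wait for the dcbst to complete
icbi rA, rB #Invalidate the stale instruction cache block
isync #Purge the instructions that were already fetched
sync #Broadcast, since icbi never does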

NOTE: isync is required NO MATTER WHAT, even if the code (instructions) being modified is less than 5 sequential instructions away from the responsible store instruction.

Compared to the basic Self Modifying code from the previous chapter, we see two more instructions added. Both are syncs. What does sync do?

The sync instruction has the ability to broadcast (depending on HID0 bit 29). We'll assume that HID0 bit 29 is high for our new self modifying code example. Because sync broadcasts and icbi never broadcasts, a sync is placed after the icbi. That sync comes after the isync, rather than before it, because the purge of the IQ needs to occur asap.

Now you may ask, why have a sync after dcbst? Isn't it possible for dcbst to broadcast? Yes, that's correct, but there are "behind the scenes" tasks that need to update/notify other agents immediately after the dcbst. These behind the scenes tasks are also called Memory Accesses. What is a Memory Access? Are we talking about loads & stores? No, we are not. First, since I've noticed some beginners assume this: sync does *NOT* ensure stores reach physical memory. Second, Memory accesses are...


Section 3: Barriers

Since Broadcasting alone may not be enough, you may also need Barrier instructions. As mentioned in the previous chapters there are 3 PowerPC barrier instructions.

You've already briefly learned about isync and sync. eieio, in simple terms, prevents store gathering. If a PPC device is interacting with an I/O device, and there are back to back store instructions in the Program, an eieio instruction may be placed in between the stores like this:

stw
eieio #Prevent the two stw's from being gathered
stw

Sync also works for disabling store gathering, but eieio should be preferred as it's less performance-degrading than sync. Eieio is also used for Page Table code, which you will learn about in Chapter 32.

Sync should also be used when doing performance tests, to ensure that the code undergoing testing is not affected by outside/other code. Many PowerPC CPUs come equipped with a Performance Monitor (PM). The PM could be used to see if Code Snippet #1 is faster than Code Snippet #2. In short, a Program using the PM would look like this:

  1. Code setting up PM related registers
  2. sync
  3. Turn on PM
  4. Run Code Snippet #1 that is to be monitored
  5. sync
  6. Turn off PM, log results somewhere
  7. Repeat 1 thru 6 but step 4 will test Code Snippet #2.
  8. Compare results

isync is used for what is called context synchronization. What does this mean? The context is any attribute that gets assigned to an instruction (e.g. privilege). The MSR contains a Privilege (PR) Bit (Bit 17). When this bit is low, you are in "Supervisor Mode". When this bit is high, you are in "User Mode". In Supervisor Mode, you can access all Registers. We will discuss this Bit in more detail in the Next Chapter. Here we have the following code:

add #Executing in user mode
stw #Executing in user mode
mfmsr rX #Executing in user mode
rlwinm rX, rX, 0, 18, 16 #Flip PR bit low
mtmsr rX #Set Supervisor Mode
lwz
lwz

A beginner would assume the last two lwz's are executing in Supervisor Mode. HOWEVER, they are NOT! Remember how the Instruction Fetcher works. It's very possible that one or both lwz's were fetched before the mtmsr has completed, meaning they are going to be executed in User Mode! They were fetched with the attribute/privilege of "User Mode". If these lwz's are loading data from some Supervisor-Protected memory, then an Exception would occur.

How do we resolve this issue? We place an isync after the mtmsr instruction. This ensures that the instructions immediately after the mtmsr execute in Supervisor Mode.

add
stw
mfmsr rX
rlwinm rX, rX, 0, 18, 16 #Flip PR bit low
mtmsr rX
isync #Ensure below lwz's execute in Supervisor Mode
lwz #Executing in Supervisor Mode
lwz #Executing in Supervisor Mode
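If the rlwinm operands look odd (a mask that begins at bit 18 and ends at bit 16), the C sketch below models how the instruction builds its mask. When the begin bit is greater than the end bit, the mask wraps around the register, leaving only the bits in between (here, just bit 17) cleared. The function names are hypothetical, and this is a host-side model of the arithmetic only; it says nothing about timing or privilege.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_PR UINT32_C(0x00004000) /* MSR bit 17 (bit 0 is the MSB) */

/* Mask with PowerPC bits mb..me set (bit 0 is the most significant bit).
   When mb > me the mask wraps around: bits mb..31 and bits 0..me are set. */
uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb <= 31 && me <= 31);

    uint32_t from_mb  = (uint32_t)(UINT32_MAX >> mb);        /* bits mb..31 */
    uint32_t up_to_me = (uint32_t)(UINT32_MAX << (31 - me)); /* bits 0..me  */

    return (mb <= me) ? (from_mb & up_to_me) : (from_mb | up_to_me);
}

/* Rotate left by sh, avoiding a shift by 32 (undefined in C). */
uint32_t rotl32(uint32_t value, unsigned sh)
{
    sh &= 31;
    return (uint32_t)((value << sh) | (value >> ((32 - sh) & 31)));
}

/* Model of rlwinm rA, rS, SH, MB, ME: rotate rS left, then AND with the mask. */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

`ppc_mask(18, 16)` evaluates to 0xFFFFBFFF, so `rlwinm(msr, 0, 18, 16)` leaves every MSR bit untouched except PR, which is forced low.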

Alright, this should give you some beginner insight on Broadcasting and Barriers. There are more cases where Barrier instructions must be used, but those cases may differ depending on the specific PowerPC CPU in question. For such cases, it's best to refer to your PowerPC CPU's specific official manual.

