PowerPC Page Tables Tutorial
#1

Works for most 32-bit PPC chips including Broadway

Requirements:
I'm making this tutorial due to a recent uptick in conversation on PPC-related Discord servers about how to set these things up. There appears to be some confusion among some people. This tutorial should clear things up.

Under normal circumstances the BAT registers are enough to map out all the memory you need. However if that is not the case, then you will need to add in a Page Table.



Chapter 1: Fundamentals

What is a Page Table?

A page table is a region of memory made up of blocks of data called Page Table Entry Groups (PTEGs). Each PTEG contains 8 Page Table Entries (PTEs). Every PTEG address is 64-byte aligned. Each PTE within a PTEG is 64 bits in length (double-word) and contains the information required for a proper address translation (such as the upper bits of the physical address equivalent and the WIMG bits).

PTE bit breakdown~
Upper 32-bits
  • bit 0 = V aka Valid bit
  • bits 1 thru 24 = VSID
  • bit 25 = H (Hash Value 2 Used)
  • bits 26 thru 31 = API (Bits 4 thru 9 of the Virtual Effective Address)

Lower 32-bits
  • bits 0 thru 19 = RPN aka Real Page Number. These are the upper 20 bits of the Physical Address to be used in the Translation
  • bits 20 thru 22 = Reserved (Must be Null)
  • bit 23 = R
  • bit 24 = C
  • bits 25 thru 28 = WIMG (what you would use in BATs)
  • bit 29 = Reserved (Must be Null)
  • bits 30 & 31 = PP (what you would use in BATs)

V (Valid) bit is simple enough to understand. If the PTE is invalid, it won't be used by Broadway. Broadway will try to look for another valid PTE. 

VSID (Virtual Segment ID) is a randomly generated identifier used as an input to calculate what is called Hash Value 1 (more on this in Chapter 6)

H (Hash Value 2) is set when the PTE had to be placed using the 2nd hash because Hash Value 1 failed to find room. More on this in Chapter 6.

R (Referenced) and C (Changed) are bits that get updated by Broadway to keep history information of the PTE. You do not need to worry about how Broadway updates these.

WIMG (write-through, cache-inhibit, memory-coherence, guarded) and PP (page protection) bits serve the same purpose as they do in BAT Registers.
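
To make the bit breakdown above a little more concrete, here is a minimal Python sketch that packs the two PTE words by hand. The function names, the example EA/PA, and the VSID value are just placeholders picked for illustration (the VSID is borrowed from the Gecko Code in Chapter 8). It's a model for checking values on paper, not something that runs on the console.

```python
def pte_upper(vsid, api, h=0, valid=1):
    """Upper word: V (bit 0), VSID (bits 1-24), H (bit 25), API (bits 26-31)."""
    return ((valid & 1) << 31) | ((vsid & 0xFFFFFF) << 7) | ((h & 1) << 6) | (api & 0x3F)

def pte_lower(rpn, wimg=0, pp=0b10, r=1, c=1):
    """Lower word: RPN (bits 0-19), R (bit 23), C (bit 24), WIMG (bits 25-28), PP (bits 30-31)."""
    return ((rpn & 0xFFFFF) << 12) | ((r & 1) << 8) | ((c & 1) << 7) | ((wimg & 0xF) << 3) | (pp & 3)

ea = 0xA0001000            # hypothetical 4KB aligned effective address
pa = 0x00001000            # hypothetical physical address it should map to
vsid = 0x00CA701C          # example VSID (assumption, taken from Chapter 8)

api = (ea >> 22) & 0x3F    # EA bits 4 thru 9
upper = pte_upper(vsid, api)
lower = pte_lower(pa >> 12)  # RPN = upper 20 bits of the PA

print(hex(upper), hex(lower))
```

Keep in mind PowerPC numbers bits from the left (bit 0 is the most significant bit), which is why the V bit ends up as 0x80000000.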

--------

Special purpose registers known as Segment Registers contain the VSIDs. Permission related bits are also present and will change the meaning of the PP bits in a PTE. There are 16 segment registers (sr0 thru sr15).

SR bit breakdown~
  • bit 0 = Must be 0 or else a DSI exception will occur. Broadway does not support Direct-Store segments.
  • bit 1 = Supervisor protection bit (if set, PP 00 and 01 act as they usually do. If not set, PP 00 and 01 will be read & write)
  • bit 2 = User protection bit (if set, PP 00 and 01 act as they usually do. If not set, PP 00 and 01 will be read & write)
  • bit 3 = No execute bit (instructions in this segment's memory will not be executed)
  • bits 4 thru 7 = Reserved (Must be Null)
  • bits 8 thru 31 = VSID

If multiple SR's are to be used, then each SR must have a unique randomly generated VSID. You can have software generate these from calling some rand function, or have them predefined (generated by a third party) in a source.
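
As a rough illustration of the SR layout and the "unique VSID per SR" rule, here's a small Python sketch. The helper name make_sr and the use of random.sample are my own choices for the example, nothing more.

```python
import random

def make_sr(vsid, ks=0, kp=0, n=0):
    """Build a Segment Register value. Bit 0 stays 0 and bits 4-7 stay null."""
    return ((ks & 1) << 30) | ((kp & 1) << 29) | ((n & 1) << 28) | (vsid & 0xFFFFFF)

# 16 unique 24-bit VSIDs, one per SR (random.sample never repeats a value)
vsids = random.sample(range(1 << 24), 16)
segment_registers = [make_sr(v) for v in vsids]

for index, sr in enumerate(segment_registers):
    print("sr%d = 0x%08X" % (index, sr))
```

With all three protection bits left at 0, each SR value is simply its VSID.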

To break it down very generally, address translation occurs as such....

  1. Portions of the Effective (Virtual) Address are broken apart
  2. Based on the value of the certain portions, a select SR will be used.
  3. The SR is then broken into portions
  4. SR Portions are used with what's in SDR1 Register to generate some hashes
  5. Special hashes will calculate the PTEG Physical address (where to navigate within the Page Table)
  6. Each PTEG has 8 PTEs. They are all checked to see if a Valid one exists
  7. Valid PTE contains the information on how to translate plus the memory properties (WIMG + PP bits)
  8. Translation (physical address that will be used) is performed by taking the RPN bits in the found PTE and concatenating that with the lower 12 bits of the original Effective Address

Here is a chart of minimum recommended attributes for all allowed Page Table sizes~

[Image: pagetable01.png]

The chart starts at 8MB of memory (smallest possible Page Table, 64Kbytes) and ends at 4Gbyte of memory (largest possible Page Table, 32Mbytes).

Each PTE covers translation for 4KB of memory. A chunk of memory that is 16Mbytes in size would require 4,096 PTEs. Since each PTE is 8 bytes in size, that would mean 32Kbytes of memory are required for the PTEs alone. However, due to collision possibilities, you would need at least 4 times this amount. In conclusion, to cover 16Mbytes of total memory, you would need to allocate 128Kbytes of memory for the page table.
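
The arithmetic above can be sketched in a few lines of Python. The function name is hypothetical, and the 64KB floor is there because that's the smallest table size SDR1 can describe.

```python
PAGE_SIZE = 4096          # each PTE translates one 4KB page
PTE_SIZE = 8              # each PTE is 8 bytes
MIN_TABLE = 64 * 1024     # smallest Page Table size

def page_table_bytes(covered_bytes):
    """Recommended table size: 4x the bytes needed for the PTEs themselves."""
    ptes = covered_bytes // PAGE_SIZE
    return max(MIN_TABLE, ptes * PTE_SIZE * 4)

for megabytes in (8, 16, 4096):
    size = page_table_bytes(megabytes * 1024 * 1024)
    print("%d MB covered -> %d KB page table" % (megabytes, size // 1024))
```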

SDR1 is a special purpose register that contains the very start address of the entire Page Table and the input values for the special hashes that are used to calculate the PTEG address.

SDR1 bit breakdown~
  • Bits 0 thru 15 = HTABORG
  • Bits 16 thru 22 = Reserved (must remain null)
  • Bits 23 thru 31 = HTABMASK

HTABORG is the upper 16 bits of the physical address where the very start of the Page Table resides. In the above chart, the x's are don't care bit values, meaning they have no restrictions on what they can be when setting the physical start address. The larger the covered Memory Size, the more right-justified zero bits are required in the physical start address.

Bits 7 thru 15 within HTABORG are known as the "Maskable Bits". However many zeroes were required there is the number of high/one bits that must be set (right-justified) in HTABMASK.

As an fyi, BATs are faster than Page Tables. They also take priority over Page Tables. If a virtual address falls under both a BAT and the Page Table, the BAT will be used. You can also set up two different virtual addresses that translate to the same physical address (i.e. 0x80001500 -> 0x00001500 w/ BAT and also have 0xA0001500 -> 0x00001500 w/ Page Table).



Chapter 2: Allocating Memory

You will need quite a bit of memory for your page table entries, especially if you are planning to cover something such as 1+GB of virtual memory. For Mario Kart Wii, you can use something such as Egg::Heap::Alloc function to purchase you some memory for this (read note below)...

Example PAL (fill in mem_needed byte value)~

Code:
.set egg_alloc, 0x80229814
lis r12, egg_alloc@h
ori r12, r12, egg_alloc@l
mtlr r12
lis r3, mem_needed@h
ori r3, r3, mem_needed@l
lis r4, mem_needed@h
ori r4, r4, mem_needed@l #This is actually alignment. It needs to match r3
lwz r5, -0x5CA0 (r13) #PAL specific for Egg-Alloc
lwz r5, 0x0024 (r5)
blrl

In the above code, r3 returns pointer to Allocated Heap. Be sure to make this address physical before writing it to the HTABORG bits of SDR1.

IMPORTANT NOTE: The above code may not work for very large memory chunks due to natural function limitations. Function does work when asking for a 0x10000 chunk of memory with 0x10000 (64KB) required alignment



Chapter 3: SDR1 Configuration & TLB Invalidation

IMPORTANT NOTE: Be sure interrupts are masked (off) the entire time you are working on anything Page Table related (SDR1, SR, PTE construction, etc).

Before any page tables can be constructed, the TLB (Translation Lookaside Buffers) must be invalidated. TLBs are buffers in an on-chip unit that keep track of recently used PTEs. You cannot read/write to these directly. The only actions you can take are to invalidate a TLB by its index number, or to issue a tlbsync instruction to wait for all/any TLB invalidations to complete.

SDR1 configuration must be done in real mode (reference: PPC PEM Book Page 2-42 footnotes for Table 2-22). Once SDR1 has been configured, you can invalidate the TLBs. There are a total of 64 TLBs. Each TLB is referenced by an index number that is contained in bits 14 thru 19 of the Effective Address used in the Register of the tlbie (TLB invalidate entry) instruction. The first TLB starts at index 0 and ends at index 63. The following snippet of code configures SDR1 and then invalidates TLBs. It assumes you went into Real Mode via rfi with EE, IR, and DR of the MSR set low

Code:
#Setup SDR1
lis r3, sdr1_value@h #Remember that the page table root/start address (HTABORG) needs to be physical
ori r3, r3, sdr1_value@l
sync #Required per page 4-43 table 2-23 of the PPC PEM Book
mtspr 25, r3 #SDR1's SPR number is 25
isync #Required per page 4-43 table 2-23 of the PPC PEM Book

#Invalidate TLBs
li r0, 64
li r3, 0
mtctr r0
inval_tlb:
tlbie r3
addi r3, r3, 0x1000
bdnz+ inval_tlb
tlbsync #Required per page 202 section 5.4.3.2 of the Broadway Manual
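
If you want to convince yourself that stepping the register by 0x1000 really walks through all 64 indexes, here's a tiny Python model of the loop above. The helper name tlb_index is made up for the example.

```python
def tlb_index(ea):
    """Index used by tlbie: bits 14 thru 19 of the Effective Address."""
    return (ea >> 12) & 0x3F

# Same sequence of addresses the assembly loop feeds to tlbie
addresses = [i * 0x1000 for i in range(64)]
indices = [tlb_index(ea) for ea in addresses]

print(indices[0], indices[-1], len(set(indices)))
```

Only bits 14 thru 19 matter, so any address with the same value in those bits selects the same index.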



Chapter 4: Segment Registers Configuration

After you invalidate the TLBs, you can set up the Segment Registers. The first 4 bits of an Effective Address choose which Segment Register will be used. Therefore, by design, the following occurs..

Effective Address --> Segment Register Chosen
  • 0x0XXXXXXX --> sr0
  • 0x1XXXXXXX --> sr1
  • 0x2XXXXXXX --> sr2
  • .. ..
  • 0xEXXXXXXX --> sr14
  • 0xFXXXXXXX --> sr15

Normally, a coder/dev may write a series of cmpwi/branch instructions to take an input Effective Address and know which SR to configure. There's no need for that. Broadway comes with the mtsrin instruction

Move to Segment Register Indirect~
mtsrin rS, rB
Upper 4 bits of rB selects the SR
rS is copied into the SR
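
A quick Python sketch of the selection rule, assuming nothing beyond what's stated above. Both helper names are hypothetical.

```python
def sr_index(ea):
    """Which SR an Effective Address uses: its upper 4 bits."""
    return (ea >> 28) & 0xF

def mtsrin_selector(index):
    """Value to place in rB of mtsrin so that SR number `index` gets written."""
    return (index & 0xF) << 28

print(sr_index(0xA0001500))            # EA in the 0xAXXXXXXX range
print(hex(mtsrin_selector(10)))        # rB value that selects sr10
```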

Here's an example code that sets up every SR with all protection bits low (no restrictions on supervisor, user, or execute). It includes a lookup table where all 16 randomly generated VSIDs reside.

Code:
#Use a VSID lookup table
bl vsid_table
#Example VSID's listed in table. Ofc use your own, that are randomly generated.
.long 0x000
.long 0x111
.long 0x222
.long 0x333
.long 0x444
.long 0x555
.long 0x666
.long 0x777
.long 0x888
.long 0x999
.long 0xaaa
.long 0xbbb
.long 0xccc
.long 0xddd
.long 0xeee
.long 0xfff
vsid_table:
mflr r3

#Set loop count. 16 for 16 SR's
li r0, 16
mtctr r0

#Pre decrement for loop
addi r3, r3, -4

#Set r4 to 0x00000000, increment by 0x10000000 to select next SR for the mtsrin instruction
li r4, 0

#Loop. Write SR using mtsrin
write_sr_loop:
lwzu r0, 0x4 (r3) #Load VSID
mtsrin r0, r4 #Based on r4's upper 4 bits, select the SR and write to it with currently loaded VSID
addis r4, r4, 0x1000 #Increment r4 to use next SR for mtsrin instruction
bdnz+ write_sr_loop



Chapter 5: Clearing the Page Table

Before any page table is to be used, it should always be entirely zero'd. Here's a snippet of code that does that..


Code:
#r3 = *Physical* Start Address of the Page Table
#r4 = Size of entire Page Table in bytes
#Divide size by 4
srwi r4, r4, 2

#Pre Decrement Start Address for Loop
subi r3, r3, 4

#Set r0, to 0
li r0, 0

#Set Loop Amount
mtctr r4

#Loop
zero_table:
stwu r0, 0x4 (r3)
bdnz+ zero_table

Above code is for real mode use. Assumes r3 is physical.



Chapter 6: Algorithm, How PTEGs are Generated

In order for any Page Table to be constructed for use, it needs to be filled with PTEs at the correct spots within the Table. This is determined by an algorithm. This algorithm requires 2 inputs: the EA (along with the VSID from the SR it selects) and what's in SDR1.

First, here is a very broad overview of how an Effective Address is translated to its Physical Equivalent

[Image: pagetable02.png]

As you can see portions of the EA are broken up. Then the selected SR is utilized with the EA portions to make a temporary 52-bit Address. The VPN portion (upper 40 bits) of this 52-bit Address then goes through a series of operations and hashing. Here's a diagram to display that...

[Image: pagetable03.png]

The above chart can be broken down into the following steps~

  1. Lower 12-bits of the 52-bit address is placed aside. VPN is broken up into the VSID and Page-Index. These two items are used as the inputs for the Hash function
  2. The output of this Hash function is 19 bits in size. (upper 13 bits always result in null)
  3. HTABMASK of SDR1 is logically AND'd with upper 9 bits of Hash1
  4. Result from step 3 is logically OR'd with HTABORG's Maskable Bits
  5. Upper 7 bits of PTEG is the upper 7 bits of HTABORG
  6. Next 9 bits of PTEG is the result of Step 4
  7. Next 10 bits of PTEG is the lower 10 bits of Hash1 (step 2 result)
  8. Lower 6 bits of PTEG is always set low (64-byte aligned)

The following chart demonstrates how you can hand-generate a PTEG using just the EA and SDR1. The chart uses the following inputs...

SDR1 = 0x0F980007
Virtual Addr/EA = 0x00FFA01B

[Image: pagetable04.png]

As you can see the PTEG result is 0x0F9FF980. It's important to understand that the number of "1" bits in HTABMASK in SDR1 determines how many bits of Hash Value 1 are placed into the PTEG, starting at bit 15 and going leftward. The chart indicates this via the bracketed bits in the upper 9 bits of Hash Value 1, which in turn point to the bracketed bits in the PTEG.

The above chart showed how a Primary PTEG is generated. Different EAs can produce the same Hash Value 1 result and therefore the same PTEG, and a PTEG only has room for 8 PTEs. If the Primary PTEG is already full, Hash Value 1 must be logically NOT'd (bitwise negated, also known as a 1's complement).

This new Hash Value is known as Hash Value 2. In the above chart, it would replace what's in Hash Value 1 (steps beforehand aren't required anymore). The following Chart shows what occurs once Hash Value 2 needs to be used...

[Image: pagetable05.png]

Therefore if Primary PTEG couldn't be used, the new (Secondary) PTEG would be 0x0F980640.
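
Both PTEG results above can be double-checked with a short Python model of the steps in this chapter. One assumption to flag loudly: the text doesn't spell out the VSID behind the charts, so the sketch plugs in 0x00CA701C (the VSID from the Chapter 8 Gecko Code), which does reproduce both addresses. The function names are just for illustration.

```python
def hash_value_1(vsid, ea):
    """Lower 19 bits of the VSID XOR'd with the 16-bit Page-Index (EA bits 4-19)."""
    page_index = (ea >> 12) & 0xFFFF
    return (vsid & 0x7FFFF) ^ page_index

def pteg_address(sdr1, hash_value):
    htabmask = sdr1 & 0x1FF                  # SDR1 bits 23-31
    maskable = (sdr1 >> 16) & 0x1FF          # HTABORG bits 7-15
    upper9 = (hash_value >> 10) & 0x1FF      # upper 9 bits of the 19-bit hash
    lower10 = hash_value & 0x3FF             # lower 10 bits of the hash
    middle = (upper9 & htabmask) | maskable  # steps 3 and 4
    return (sdr1 & 0xFE000000) | (middle << 16) | (lower10 << 6)

SDR1 = 0x0F980007
EA = 0x00FFA01B
VSID = 0x00CA701C                            # assumption, see note above

hash1 = hash_value_1(VSID, EA)
hash2 = ~hash1 & 0x7FFFF                     # Hash Value 2 = 1's complement
primary = pteg_address(SDR1, hash1)
secondary = pteg_address(SDR1, hash2)

print(hex(primary), hex(secondary))
```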



Chapter 7: Constructing the Page Table

In order to construct a Page Table, you must write all PTEs for all possible PTEGs for your range of Covered Memory.

Summary of constructing part (or all) of the Page Table based on a single EA. Assumes you also have the SDR1 and the PA that you want to use for the EA translation.

1. Using EA, figure out which SR would be used
2. Grab SR data
3. Form Upper 32-bits of PTE by...
    a. Extract VSID from SR & API from EA
    b. Form temp upper PTE by inserting both VSID and API
    c. Finalize it by flipping bit 0 high (V/Valid bit)
4. Form Lower 32-bits of PTE by...
    a. Supply the PA (alternatively, you can supply one for identical translation by extracting the RPN bits from the EA)
    b. Insert desired WIMG bits
    c. Insert a high R bit, a high C bit, and desired PP bits
5. Generate PPC-Special Hash Value aka Hash Value 1
6. Generate PTEG Address by....
    a. Create a temp hash called tmp1 using Hash Value 1 & SDR1
    b. Create another temp hash called tmp2 using tmp1 and SDR1
    c. Create temp blank PTEG
    d. Insert blank PTEG, tmp2, & Hash Value 1
7. Using PTEG Address from Step 6, make sure there is an empty (invalid) PTE (out of 8)
8. If empty, write new PTE (will set it valid) that was formed from steps 3 and 4
9. If all 8 PTEs of PTEG are already valid, run a secondary special hash (Hash Value 2) to generate a different PTEG
10. Check 8 PTEs in Second PTEG, if none of those can be used, then halt
11. If one of the PTEs in the 2nd PTEG can be used, write new PTE but with H bit high to indicate Hash Value 2 was required
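
Before diving into assembly, the steps above can be modeled in Python with a bytearray standing in for the Page Table memory. This is only a sketch under a few assumptions: the table address 0x01520000 is made up, HTABMASK is 0 (64KB table), and the VSID is the one from the Chapter 8 Gecko Code. The function name insert_pte is hypothetical.

```python
import struct

def insert_pte(table, table_phys, sdr1, vsid, ea, pa, wimg=0, pp=0b10):
    """Write one PTE for ea -> pa. Returns the physical address of the PTE used."""
    api = (ea >> 22) & 0x3F                                # EA bits 4-9
    lower = (pa & 0xFFFFF000) | 0x180 | (wimg << 3) | pp   # RPN, R, C, WIMG, PP
    hash_value = (vsid & 0x7FFFF) ^ ((ea >> 12) & 0xFFFF)  # Hash Value 1
    for h_bit in (0, 1):                                   # primary, then secondary
        middle = ((hash_value >> 10) & sdr1 & 0x1FF) | ((sdr1 >> 16) & 0x1FF)
        pteg = (sdr1 & 0xFE000000) | (middle << 16) | ((hash_value & 0x3FF) << 6)
        for slot in range(8):
            offset = pteg - table_phys + slot * 8
            (word,) = struct.unpack_from(">I", table, offset)
            if not word & 0x80000000:                      # V bit low, slot is free
                upper = 0x80000000 | ((vsid & 0xFFFFFF) << 7) | (h_bit << 6) | api
                struct.pack_into(">II", table, offset, upper, lower)
                return pteg + slot * 8
        hash_value = ~hash_value & 0x7FFFF                 # switch to Hash Value 2
    raise RuntimeError("primary and secondary PTEGs are both full")

TABLE_PHYS = 0x01520000          # hypothetical 64KB aligned table address
SDR1 = TABLE_PHYS                # HTABMASK = 0 for a 64KB table
table = bytearray(0x10000)       # zeroed table

# One PTE per 4KB page: 0xA0000000-0xA07FFFFF -> 0x00000000-0x007FFFFF
used = [insert_pte(table, TABLE_PHYS, SDR1, 0x00CA701C,
                   0xA0000000 + page * 0x1000, page * 0x1000)
        for page in range(0x800)]

print(hex(used[0]), len(set(used)))
```

The loop at the bottom repeats the single-PTE routine once per 4KB aligned virtual address, which is all that "constructing the Page Table" really amounts to.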

The above must be done for every 4KB aligned virtual address that you plan to use. So for example, let's say you want to setup the following translation scheme...

Effective/Virtual Address Range | Physical Address Range
0xA0000000 thru 0xA07FFFFF | 0x00000000 thru 0x007FFFFF

The above would be for 8MB of covered memory. To construct all the PTEs, you would first need to construct the PTE for virtual address 0xA0000000, then 0xA0001000, then 0xA0002000, etc etc until the last address of 0xA07FF000. When constructing the PTEs be sure the correct physical address is used for each new 4KB aligned virtual address you are utilizing.

Example snippet of code for a single PTE construction~
Assumes all SR's are configured, TLB's invalidated, SDR1 configured, and you are in real mode with IR+DR low.

Code:
#r3 = Virtual/Effective EA (assumed to be 4KB aligned)
#r4 = SDR1
#r5 = Physical EA equivalent desired (assumed to be 4KB aligned)

#Figure out which SR (r6) to use
#We could use a list of cmpwi/beq's by grabbing the data from every individual SR, but......
#...Broadway has the mfsrin instruction HOORAY!
#mfsrin rD, rB
#The upper 4 bits in rB selects the SR to use in the instruction
#SR is then copied into rD
mfsrin r6, r3

#SR found. Create Upper PTE except H bit (r0)
#Use VSID from SR, API from EA. Flip V high.
rlwinm r0, r6, 7, 1, 24 #Extract VSID from Segment Register
rlwimi r0, r3, 10, 26, 31 #Extract API from Virtual EA and insert VSID+API
oris r0, r0, 0x8000 #Set valid (V) bit

#Create Lower PTE (r7)
#Use RPN bits from desired Physical Address
#Use desired WIMG and PP bits
#Set R and C high
#!!NOTE!! Example here uses WIMG of 0000 and PP of 10
clrrwi r7, r5, 12 #Extract RPN bits from desired PA
li r8, 0 #Set WIMG
rlwimi r7, r8, 3, 25, 28 #Insert WIMG
ori r7, r7, 0x0182 #R high, C high, PP set to 0b10; change accordingly to your needs

#Generate 19-bit Hash Value 1 (r8)
rlwinm r8, r3, 20, 16, 31 #Extract EA bits 4 thru 19, right justify it
clrlwi r9, r6, 13  #Extract the lower 19 bits (bits 13 thru 31 of the SR) of the VSID
xor r8, r8, r9

#Preset r12 to hold absent H bit
li r12, 0

#Calculate PTEG Address (r11)
#tmp1 = Hash Value[13-21] & HTABMASK
#tmp2 = tmp1 | HTABORG-Maskable
#PTEG = SDR1[0-6], tmp2 [rol'd 16], Hash Value[22-31 rol'd 6]
calc_pteg:
rlwinm r9, r8, 22, 23, 31 #Extract bits 13 thru 21 of the Hash Value & right justify it
and r10, r9, r4 #Create tmp1 Hash; logically AND r9 with HTABMASK (there's no need to extract HTABMASK out of SDR1 beforehand because r9 only has bits 23 thru 31 in use, so the HTABORG bits never affect the result)
rlwinm r11, r4, 16, 23, 31 #Extract "Maskable" bits of HTABORG of SDR1 & right justify it
or r10, r10, r11 #Create tmp2 Hash; tmp1 | HTABORG-Maskable
li r11, 0 #Set r11 to 0 for a fresh PTEG Address
rlwimi r11, r4, 0, 0, 6 #Insert SDR1 bits 0 thru 6 (HTABORG non-maskable) into  upper 7 bits of PTEG
rlwimi r11, r10, 16, 7, 15 #Insert tmp2 into PTEG bits 7 thru 15
rlwimi r11, r8, 6, 16, 25 #Insert Hash Value bits 22 thru 31 into PTEG bits 16 thru 25

#Set H bit in upper PTE
or r0, r0, r12 #Logically OR in possible H-bit into upper PTE

#Check if at least 1 out of 8 PTEs are not already in use. First non-valid one will be constructed
li r10, 8 #r10 safe to use now
subi r11, r11, 8 #Pre decrement for loop
mtctr r10
pte_valid_check:
lwzu r10, 0x8 (r11)
andis. r9, r10, 0x8000 #r9 safe to use now
beq- construct_pte
bdnz+ pte_valid_check

#Check if we are on Hash Value 1 or 2
cmpwi r12, 0x0040
beq- ERROR #If equal we already used both hashes!

#Hash Value 2 not used yet. Hash Value 2 is simply a 1's complement of P-Hash. All we need is a bitwise-negate or better known as a Logical-NOT
not r8, r8
li r12, 0x0040 #Set r12 to have H bit high next time we run calc_pteg
b calc_pteg #Re-do PTEG address calculation

#ERROR; currently configured to do a basic infinite loop, adjust this to your needs
ERROR:
nop
b ERROR

#We can construct the PTE. Do it!
construct_pte:
stw r7, 0x4 (r11)
eieio #To prevent the possibility of store gathering if it happens to be enabled
stw r0, 0 (r11)
sync #Make sure this store completes before a future load or store via a PTE occurs. Covers the possibility that PTEs are being modified while address translation is on

Alright, well that was a doozy. As you can see in the above source code, the mfsrin instruction was used to know which SR data to grab based on the EA. This is much more efficient than using a list of compare+branch instructions.

Move from Segment Register Indirect~
mfsrin rD, rB
Upper 4 bits of rB selects the SR 
SR is copied into rD
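
The rotate-and-mask instructions in the source above are easy to get wrong by one bit, so here's a small Python model of rlwinm and rlwimi that can be used to sanity check the operands. The helper names mirror the mnemonics, but this is just a sketch. The example SR and EA values are hypothetical.

```python
def rotl32(x, n):
    n &= 31
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def mask(mb, me):
    """Ones from bit mb thru bit me (bit 0 = leftmost). Wraps around if mb > me."""
    begin = 0xFFFFFFFF >> mb
    end = (0xFFFFFFFF << (31 - me)) & 0xFFFFFFFF
    return (begin & end) if mb <= me else (begin | end)

def rlwinm(rs, sh, mb, me):
    return rotl32(rs, sh) & mask(mb, me)

def rlwimi(ra, rs, sh, mb, me):
    m = mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & 0xFFFFFFFF)

sr = 0x00CA701C                         # example SR value (VSID only)
ea = 0xA0001000                         # example Effective Address

upper = rlwinm(sr, 7, 1, 24)            # VSID into PTE bits 1-24
upper = rlwimi(upper, ea, 10, 26, 31)   # API into PTE bits 26-31
upper |= 0x80000000                     # oris r0, r0, 0x8000

page_index = rlwinm(ea, 20, 16, 31)     # EA bits 4-19, right justified
hash1 = page_index ^ rlwinm(sr, 0, 13, 31)   # clrlwi r9, r6, 13 then xor

print(hex(upper), hex(hash1))
```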



Chapter 8: Wrapping Things Up; Example Gecko Code

When exiting real mode, be sure that IR and DR will be set high in the MSR after the rfi instruction has been executed. Also make sure EE is back to its original state.
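
As a small illustration of the MSR handling, here's the bit math in Python. The original MSR value below is hypothetical. All it needs for the demo is EE, IR, and DR high.

```python
MSR_EE = 0x8000   # External interrupts enable (bit 16)
MSR_IR = 0x0020   # Instruction address translation (bit 26)
MSR_DR = 0x0010   # Data address translation (bit 27)

def real_mode_msr(msr):
    """MSR value to place in srr1 before the rfi that enters real mode."""
    return msr & ~(MSR_EE | MSR_IR | MSR_DR) & 0xFFFFFFFF

original_msr = 0x00009032               # hypothetical MSR saved before going real mode
entering = real_mode_msr(original_msr)  # srr1 for the first rfi
leaving = original_msr                  # srr1 for the second rfi, restores EE/IR/DR

print(hex(entering), hex(leaving))
```

Saving the original MSR and writing it back untouched is the simplest way to guarantee EE returns to whatever state it was in.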

Here is a Gecko Code that uses a Page Table for 0xA0000000 thru 0xA07XXXXX (physical 0x00000000 thru 0x007XXXXX) translation. Once the Page Table has been fully constructed, a simple store instruction using the address of 0xA0001500 is completed. Obviously this works or else an exception (page fault) would occur.

-----

0xA0000000+ 8MB Page Table Example [Vega]

PAL
C200A42C 00000032
9421FFE0 BF810008
3D808022 618C9814
7D8803A6 3C600001
3C800001 80ADA360
80A50024 4E800021
38804000 38A3FFFC
38000000 7C8903A6
94050004 4200FFFC
48000005 7FE802A6
3BDF0024 57DE007E
7FDA03A6 7FC000A6
57C0045E 54000732
7C1B03A6 4C000064
6C638000 5464843E
38A4FFFF 7CA52078
7CA50034 20A50020
7C642B78 7C0004AC
7C9903A6 4C00012C
38000040 38600000
7C0903A6 7C001A64
38631000 4200FFF8
7C00046C 3CC000CA
60C6701C 7CCA01A4
3FA0A000 3B800800
7FA3EB78 546500FE
54C03870 506056BE
64008000 54A70026
39000000 51071E78
60E70182 5468A43E
54C9037E 7D084A78
39800000 5509B5FE
7D2A2038 548B85FE
7D4A5B78 39600000
508B000C 514B81DE
510B3432 7C006378
39400008 396BFFF8
7D4903A6 854B0008
75498000 41820024
4200FFF4 2C0C0040
41820010 7D0840F8
39800040 4BFFFFB0
60000000 4BFFFFFC
900B0000 90EB0004
379CFFFF 3BBD1000
4082FF60 7FDB03A6
3BFF0130 7FFA03A6
4C000064 38000007
3C60A000 90031500
BB810008 38210020
38600000 00000000

Code:
#Address Ports
#PAL = 8000A42C

#NOTES (read me)
#eieio and sync are not used here because SGE is low in HID0, the PTEs are being written for the first time ever while in real mode, and this is simply a demo code. Not for 'real world use'.

#Assembler Directives
.set egg_alloc, 0x80229814
.set page_table_size_bytes, 0x00010000
.set page_table_size_words, 0x4000
.set VSID, 0x00CA701C

#Inline Style Stack Frame to backup 4 Registers
#No need to save LR or r0
stwu sp, -0x0020 (sp)
stmw r28, 0x8 (sp)

#Call Egg Alloc
#Using 8MB of Covered Memory. Page Table will be 0x00010000 bytes (64Kbytes)
lis r12, egg_alloc@h
ori r12, r12, egg_alloc@l
mtlr r12
lis r3, page_table_size_bytes@h
lis r4, 0x0001 #LOL it works
lwz r5, -0x5CA0 (r13) #PAL specific for Egg-Alloc
lwz r5, 0x0024 (r5)
blrl

#Clear Table just in case allocated memory has junk in it
#r3 = Start Address of the Page Table
#r4 = Size of entire Page Table in words
li r4, page_table_size_words

#Pre Decrement Start Address for Loop
subi r5, r3, 4

#Set r0, to 0
li r0, 0

#Set Loop Amount
mtctr r4

#Loop
zero_table:
stwu r0, 0x4 (r5)
bdnz+ zero_table

#Go into Real Mode with EE, IR, and DR set low
bl get_pc #Get Program counter
get_pc:
mflr r31 #Will need this value later to get back to Virtual Mode

margin = real_mode - get_pc

addi r30, r31, margin #This points to instruction right after rfi, keep r31 intact for later
clrlwi r30, r30, 1 #Change address to physical
mtspr srr0, r30 #Place physical address into srr0
mfmsr r30 #Get MSR, keep it in r30 for later
rlwinm r0, r30, 0, 17, 15 #Flip EE low
rlwinm r0, r0, 0, 28, 25 #Flip IR, DR low
mtspr srr1, r0 #Place updated MSR into srr1
rfi #Go into real mode

#Setup SDR1; use 64KB aligned address (r3) returned from Egg_Alloc
#Address could be beyond 64KB aligned, we'll need to count trailing zeroes in the upper 16 bits to be sure
real_mode:
xoris r3, r3, 0x8000 #Make page table root address physical
srwi r4, r3, 16 #Temp shift to the right by 16 bits

#Count trailing zeros
subi r5, r4, 1
andc r5, r5, r4
cntlzw r5, r5
subfic r5, r5, 32

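#Or in Physical Heap Address with Trailing Zero value
#NOTE: this places the trailing zero *count* into the HTABMASK field. A 64KB table needs HTABMASK = 0, so this is only correct when the count is 0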
or r4, r3, r5

#Write it to SDR1
sync #Required per page 4-43 table 2-23 of the PPC PEM Book
mtspr 25, r4 #SDR1's SPR number is 25
isync #Required per page 4-43 table 2-23 of the PPC PEM Book

#Invalidate TLBs
li r0, 64
li r3, 0
mtctr r0
inval_tlb:
tlbie r3
addi r3, r3, 0x1000
bdnz+ inval_tlb
tlbsync #Required per page 202 section 5.4.3.2 of the Broadway Manual

#Setup Segment Register 10 for 0xA0000XXX Virtual Memory
#Set VSID, keep all other bits low. SR will simply be just the VSID then
lis r6, VSID@h
ori r6, r6, VSID@l #Using r6 because the upcoming code is designed to have the SR value in r6
mtsr 10, r6

#Set Initial EA in r29
lis r29, 0xA000

#Set PTE Construction Mega Loop Amount
#(0xA07FF - 0xA0000) + 1 = 0x800 amount of 4KB aligned addresses for PTE construction
li r28, 0x0800

#LOOP
mega_loop:
mr r3, r29
clrlwi r5, r3, 3 #Change 0xA to 0x0

#r3 = Virtual/Effective EA (assumed to be 4KB aligned)
#r4 = SDR1
#r5 = Physical EA equivalent desired (assumed to be 4KB aligned)
#r6 = Segment Register value (just the VSID here)

#Use VSID from SR, API from EA. Flip V high.
rlwinm r0, r6, 7, 1, 24 #Extract VSID from Segment Register
rlwimi r0, r3, 10, 26, 31 #Extract API from Virtual EA and insert VSID+API
oris r0, r0, 0x8000 #Set valid (V) bit

#Create Lower PTE (r7)
#Use RPN bits from desired Physical Address
#Use desired WIMG and PP bits
#Set R and C high
#!!NOTE!! Example here uses WIMG of 0000 and PP of 10
clrrwi r7, r5, 12 #Extract RPN bits from desired PA
li r8, 0 #Set WIMG
rlwimi r7, r8, 3, 25, 28 #Insert WIMG
ori r7, r7, 0x0182 #R high, C high, PP set to 0b10; change accordingly to your needs

#Generate 19-bit Hash Value 1 (r8)
rlwinm r8, r3, 20, 16, 31 #Extract EA bits 4 thru 19, right justify it
clrlwi r9, r6, 13  #Extract the lower 19 bits (bits 13 thru 31 of the SR) of the VSID
xor r8, r8, r9

#Preset r12 to hold absent H bit
li r12, 0

#Calculate PTEG Address (r11)
#tmp1 = Hash Value[13-21] & HTABMASK
#tmp2 = tmp1 | HTABORG-Maskable
#PTEG = SDR1[0-6], tmp2 [rol'd 16], Hash Value[22-31 rol'd 6]
calc_pteg:
rlwinm r9, r8, 22, 23, 31 #Extract bits 13 thru 21 of the Hash Value & right justify it
and r10, r9, r4 #Create tmp1 Hash; logically AND r9 with HTABMASK (there's no need to extract HTABMASK out of SDR1 beforehand because r9 only has bits 23 thru 31 in use, so the HTABORG bits never affect the result)
rlwinm r11, r4, 16, 23, 31 #Extract "Maskable" bits of HTABORG of SDR1 & right justify it
or r10, r10, r11 #Create tmp2 Hash; tmp1 | HTABORG-Maskable
li r11, 0 #Set r11 to 0 for a fresh PTEG Address
rlwimi r11, r4, 0, 0, 6 #Insert SDR1 bits 0 thru 6 (HTABORG non-maskable) into  upper 7 bits of PTEG
rlwimi r11, r10, 16, 7, 15 #Insert tmp2 into PTEG bits 7 thru 15
rlwimi r11, r8, 6, 16, 25 #Insert Hash Value bits 22 thru 31 into PTEG bits 16 thru 25

#Set H bit in upper PTE
or r0, r0, r12 #Logically OR in possible H-bit into upper PTE

#Check if at least 1 out of 8 PTEs are not already in use. First non-valid one will be constructed
li r10, 8 #r10 safe to use now
subi r11, r11, 8 #Pre decrement for loop
mtctr r10
pte_valid_check:
lwzu r10, 0x8 (r11)
andis. r9, r10, 0x8000 #r9 safe to use now
beq- construct_pte
bdnz+ pte_valid_check

#Check if we are on Hash Value 1 or 2
cmpwi r12, 0x0040
beq- ERROR #If equal we already used both hashes!

#Hash Value 2 not used yet. Hash Value 2 is simply a 1's complement of P-Hash. All we need is a bitwise-negate or better known as a Logical-NOT
not r8, r8
li r12, 0x0040 #Set r12 to have H bit high next time we run calc_pteg
b calc_pteg #Re-do PTEG address calculation

#ERROR; currently configured to do a basic infinite loop, adjust this to your needs
ERROR:
nop
b ERROR

#We can construct the PTE. Do it!
construct_pte:
stw r0, 0 (r11)
stw r7, 0x4 (r11)

#Decrement Mega Loop, update r29
subic. r28, r28, 1
addi r29, r29, 0x1000
bne+ mega_loop

#Leave Real Mode
virt_margin = virtual_mode - get_pc

#Restore very first original MSR (r30)
mtspr srr1, r30

#Original get_pc value still in r31, simply add virt_margin to it
addi r31, r31, virt_margin
mtspr srr0, r31
rfi

#Test the page table!
virtual_mode:
li r0, 7
lis r3, 0xA000
stw r0, 0x1500 (r3)

#Pop Inline Style Frame
lmw r28, 0x8 (sp)
addi sp, sp, 0x0020

#Original Instruction
li r3, 0



Chapter 9: Credits, Resources
  • Gaberboo (SR corrections, addition of eieio and sync for safety when writing PTEs)
  • NXP AN2791 PDF (Diagrams, Chapter 7 source with slight modifications)
  • PowerPC Microprocessor Family: The Programming Environments PDF (Diagrams, SR breakdown, SDR1 real mode and syncing rules)
  • Broadway User Manual (tlb invalidation and syncing)
  • PowerPC Compiler Writer's Guide (count trailing zero's source used in Gecko Code)
#2
Quote:Permission related bits are also present and will override the PP bits in a PTE
Not quite, these bits change the meaning of the PP bits, the key in the SR being set results in two PP bit combinations (00 no access, 01 read only) being reduced access, and not being set causes both these combinations to be read/write. PP bit combo 11 is always read only and 10 is always read/write
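
The rule described above fits in a small lookup. Here's a Python sketch of it. The function names are made up, and the key selection (supervisor bit when running as supervisor, user bit when running as user) is how the PowerPC manuals describe it.

```python
def effective_key(ks, kp, user_mode):
    """Pick the SR protection bit that applies to the current privilege level."""
    return kp if user_mode else ks

def page_access(key, pp):
    """Access allowed for a page, given the selected key and the PTE's PP bits."""
    if pp == 0b11:
        return "read only"
    if pp == 0b10:
        return "read/write"
    if key == 0:
        return "read/write"          # PP 00 and 01 with the key not set
    return "no access" if pp == 0b00 else "read only"

for key in (0, 1):
    for pp in range(4):
        print("key=%d pp=%d%d -> %s" % (key, pp >> 1, pp & 1, page_access(key, pp)))
```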

Quote:bit 0 = Must be 0 or else the SR will be used for an I/O device
Broadway doesn't support direct store segments and so simply triggers a DSI for any access in a segment with an SR that has this bit set and doesn't translate with a BAT. 

Quote:Bits 7 thru 15 within HTABORG is known as the "Maskable Bits". Meaning however many zeroes were required is the amount of high/one bits are required to be set in HTABMASK.
Broadway's response to not following this is to just OR HTABORG with the hash anyway, according to a comment in Dolphin's source code. (Note I do not encourage doing this.)

Quote:Page Tables cannot cover less than 8MB or more than 4Gbyte of memory.
The values are a recommendation based on physical memory, on the expectation at least some physical memory will be used in multiple pages (say if an operating system is providing a file to multiple programs) and/or the page table gets PTEs clumped in one spot (Which can be somewhat remedied with smart setting of VSID). Doing some math says the absolute limit of a 64 KiB page table is mapping 32 MiB worth of pages, but pushing the limits like this poses problems similar to when your computer disc space is full.

Quote:Please NOTE that you could use a double-float store mechanism or the dcbz instruction to clear the page table
Integer stores are going to be faster than storing doubles if you only write the first word of every PTE to zero, and I am willing to bet it's faster even if you write 0s to the entire table, since it's likely IBM optimized those more. I'll try to set up a timing comparison on console for this sometime.

Quote:construct_pte:
stw r0, 0 (r11)
stw r7, 0x4 (r11)
This is only ok if the page table isn't in use. A PTE should only be valid if it's ok for the processor to see it. With either address translation on, this requires the order to be swapped, an eieio placed between the two instructions to prevent out-of-order shenanigans, and a following sync so the processor waits for the PTE to be written before proceeding.

Sidenote, a PR for mkw-sp I coauthored featured usage of page tables https://github.com/stblr/mkw-sp/pull/495 (I wrote the original page table code, it looks a lot nicer than when I first wrote it)
#3
Thank you for the corrections, I appreciate it.

Regarding the integer vs float for zero-ing a table, if we were to ignore alignment, cache hits/misses, etc, integer stores and double-float stores have the same cycle latency (2:1). I remember a long time ago I was doing some quick tests on seeing how to zero a small block of memory quickly. IIRC, dcbz came out on top even with it being an Execution Serialization instruction. Then again, I have an awful memory (lol)

I've found a few optimized memcpy PowerPC 32-bit implementations on the web and all of them implemented double-float stores to some degree to speed up the memcpy.

Regarding eieio, that shouldn't be needed if SGE is low in HID0. A sync shouldn't be required if this is being set in Real Mode (translation off) and if so, the sync would simply need to be done sometime after all the PTEs are written and before translation is enabled. I'll add them regardless for the case that SGE is high in HID0 and/or PTEs are being rewritten while translation is on.