***Vega*** · (This post was last modified: 04-15-2024, 10:07 PM by Vega.)

Expanding on my previous thread HERE about missing info in the Broadway Manual, it appears I've run into some more information omitted from Version 1.0 of the Broadway Manual. I was running floating-point load and store tests on real hardware so I'd know how to code them into my Broadway PPC Instruction Simulator. I was getting some weird results, but after some tinkering, I found that not only do lfs and stfs behave differently depending on the HID2 PSE bit, so do lfd and stfd.

Here's what occurs~
lfs
PSE low: 32-bit Single Float loaded from EA. Converted to 64-bit form. Placed into fD
PSE high: 32-bit Single Float loaded from EA. Placed into both ps0 and ps1 of fD

lfd (PSE high stuff missing from Manual)
PSE low: 64-bit Double Float loaded from EA. Placed into fD
PSE high: 64-bit Double Float loaded from EA. Converted to 32-bit Single Precision. Placed into ps0 with ps1 being left undefined***

stfs
PSE low: Value in fS is converted to a 32-bit Single Precision float. That float is stored at EA.
PSE high: ps0 of fS stored at EA

stfd (PSE high stuff missing from Manual)
PSE low: fS stored at EA
PSE high: Value in fS's ps0 converted to 64-bit form. That 64-bit float is stored at EA.

***I tested a handful of numbers, and every time lfd was done (with HID2 PSE high), ps1 was always written as 0x3F800000. I doubt this is a "hardcoded" constant; most likely it's just undefined (junk).

Obviously, if there are data-type conversions (i.e. denorms), those still occur in accordance with what is stated in the Broadway Manual.
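To make the four cases above easier to compare side by side, here is a rough Python sketch of them. It is only a model of what I observed, not how the hardware is actually wired. The helper names (widen, narrow) and the way a register is represented are made up for illustration. It also leans on the host's float conversion, so rounding, denorms and NaN payloads are not modelled faithfully; it is only meant for values that fit exactly in both widths.

```python
import struct

def widen(single_bits):
    """32-bit single bit pattern -> 64-bit double bit pattern."""
    value = struct.unpack(">f", struct.pack(">I", single_bits))[0]
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def narrow(double_bits):
    """64-bit double bit pattern -> 32-bit single bit pattern (host rounding)."""
    value = struct.unpack(">d", struct.pack(">Q", double_bits))[0]
    return struct.unpack(">I", struct.pack(">f", value))[0]

# Register model (an assumption made for this sketch only):
#   PSE low : the FPR is one 64-bit double bit pattern (an int).
#   PSE high: the FPR is a (ps0, ps1) tuple of 32-bit single bit patterns,
#             with ps1 = None where it is described as undefined.

def lfs(word, pse):
    # PSE high: same single goes into ps0 and ps1. PSE low: widened to 64-bit.
    return (word, word) if pse else widen(word)

def lfd(dword, pse):
    # PSE high: narrowed to single, placed in ps0, ps1 undefined.
    return (narrow(dword), None) if pse else dword

def stfs(reg, pse):
    # PSE high: ps0 is stored as-is. PSE low: the double is narrowed first.
    return reg[0] if pse else narrow(reg)

def stfd(reg, pse):
    # PSE high: ps0 is widened to 64-bit form. PSE low: stored as-is.
    return widen(reg[0]) if pse else reg
```

For example, stfd(lfd(x, True), True) gives back x whenever x fits exactly in a single, which is why the round trip on its own hides the fact that a conversion took place.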

Here are two snippets of code that confirm this. They are hooked to the Shared Item Address and will only execute when HID2 PSE is high. I didn't want to set HID2 PSE manually because that requires I-Cache invalidation & disabling. Run these on a real Wii console, as Dolphin will not emulate this correctly.
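For anyone wondering how the "only execute when HID2 PSE is high" check works, here is a small Python sketch of the test the snippets perform with mfspr + andis. The andis. instruction ANDs the register with the 16-bit immediate shifted left by 16, so an immediate of 0x2000 tests 0x20000000 in the full register. The HID2 values passed in the usage lines are hypothetical examples, not readings from a console.

```python
PSE_UPPER_MASK = 0x2000  # the immediate the snippets give to andis.

def andis(value, imm16):
    """Model of andis.: AND with the immediate placed in the upper halfword."""
    return value & ((imm16 & 0xFFFF) << 16)

def pse_is_high(hid2_value):
    """True when the PSE bit (0x20000000) is set in a 32-bit HID2 value."""
    return andis(hid2_value, PSE_UPPER_MASK) != 0

# Hypothetical HID2 values, for illustration only
print(pse_is_high(0xA0000000))  # PSE bit set
print(pse_is_high(0x80000000))  # PSE bit clear
```

When the result is zero, the beq- in the snippets skips straight to the_end and nothing is printed.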

Snippet 1 does this:

Sets 0x40341000 00000000 in memory via GPRs
Uses lfd to load it
Stores it via stfs, psq_st, and stfd
Results of the 3 stores are printed to screen in that order

Snippet 2 does this:

Sets 0x40341000 00000000 in memory via GPRs
Uses lfs to load it
Stores it via stfs, psq_st, and stfd
Results of the 3 stores are printed to screen in that order

I am using the integer value 0x40341000 00000000 for the test because it is a normalized (valid) double-precision float. Also, if only the upper word is read as a single (0x40341000), that is a normalized (valid) single-precision float too. Since both floats are normalized and the double's biased exponent (0x403 = 1027) is greater than 896, there are zero data-type conversions. The only conversions are the standard width/precision changes mentioned earlier, which always occur.
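If you want to pick your own test values, the checks described above can be sketched in Python like this. The function names are made up for illustration. The classification follows the usual IEEE 754 field layout (8 exponent bits for single, 11 for double), and the 896 threshold is 1023 - 127, the difference between the double and single exponent biases.

```python
def _classify(exp, frac, exp_max):
    # Exponent field all zeros: zero or denormal. All ones: infinity or NaN.
    if exp == 0:
        return "zero" if frac == 0 else "denormal"
    if exp == exp_max:
        return "infinity" if frac == 0 else "nan"
    return "normal"

def classify_single(bits):
    return _classify((bits >> 23) & 0xFF, bits & 0x7FFFFF, 0xFF)

def classify_double(bits):
    return _classify((bits >> 52) & 0x7FF, bits & ((1 << 52) - 1), 0x7FF)

def double_exponent(bits):
    """Biased 11-bit exponent field of a double bit pattern."""
    return (bits >> 52) & 0x7FF

test_double = 0x4034100000000000
test_single = 0x40341000

print(classify_double(test_double), classify_single(test_single))
print(double_exponent(test_double), double_exponent(test_double) > 896)
```

A double whose exponent field is 896 or lower would land in single-precision denormal territory (or flush to zero) when narrowed, which is exactly the case I wanted to stay away from in these tests.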

Snippet 1 will result as:
41A08000 00000000 (41A08000 written from the stfs)
41A08000 3F800000 (written from the psq_st, this shows what was loaded via lfd, ps1 is most likely undefined/junk as explained earlier)
40341000 00000000 (stfd, this gets written because 41A08000 is converted to its 64-bit form)

Snippet 2 will result as:
40341000 00000000 (40341000 written from the stfs)
40341000 40341000 (written from psq_st, this shows what was loaded via lfs, ofc lfs loads same value into ps0 and ps1)
40068200 00000000 (stfd, this gets written because 40341000 is converted to its 64-bit form)
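The hex values above can be double-checked by hand. For normalized floats, the width change is just re-biasing the exponent by 896 (1023 - 127) and shifting the mantissa by 29 bits (52 - 23). Here is a Python sketch of that arithmetic. The function names are made up for illustration, and it deliberately refuses anything that is not a normalized value fitting exactly in both widths, so it says nothing about how the hardware rounds or handles denorms.

```python
EXP_REBIAS = 1023 - 127   # 896, difference between double and single bias
FRAC_SHIFT = 52 - 23      # 29, extra mantissa bits in a double

def widen_normal(s):
    """Normalized single bit pattern -> double bit pattern."""
    sign, exp, frac = s >> 31, (s >> 23) & 0xFF, s & 0x7FFFFF
    if not 0 < exp < 0xFF:
        raise ValueError("only normalized singles are handled")
    return (sign << 63) | ((exp + EXP_REBIAS) << 52) | (frac << FRAC_SHIFT)

def narrow_normal(d):
    """Double bit pattern -> single bit pattern, exact cases only."""
    sign, exp, frac = d >> 63, (d >> 52) & 0x7FF, d & ((1 << 52) - 1)
    if not 896 < exp <= 1150:
        raise ValueError("exponent outside the normalized single range")
    if frac & ((1 << FRAC_SHIFT) - 1):
        raise ValueError("low mantissa bits set, would need rounding")
    return (sign << 31) | ((exp - EXP_REBIAS) << 23) | (frac >> FRAC_SHIFT)

def fmt(d):
    """Format a 64-bit pattern the way the snippets' sprintf string does."""
    return "%08X %08X" % (d >> 32, d & 0xFFFFFFFF)

print("%08X" % narrow_normal(0x4034100000000000))  # lfd with PSE high
print(fmt(widen_normal(0x41A08000)))               # stfd in Snippet 1
print(fmt(widen_normal(0x40341000)))               # stfd in Snippet 2
```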

Here are the codes~

Snippet 1:

Code:
#C2 Address 807BA164
#Pick up box, see result on screen

.set doubleword1, 0x40341000
.set doubleword2, 0x00000000
.set HID2, 920
.set PSE, 0x2000 #low bits excluded

#Shared Item Default instruction
stw r3, 0x0020 (r23)

#Check PSE bit of HID2
mfspr r12, HID2
andis. r12, r12, PSE
beq- the_end

#Set r31 to 0x80001500
lis r31, 0x8000
ori r31, r31, 0x1500

#Set double word to store
lis r3, doubleword1@h
ori r3, r3, doubleword1@l
lis r4, doubleword2@h
ori r4, r4, doubleword2@l
stw r3, 0 (r31)
stw r4, 0x4 (r31)

#Prove lfd, stfs, and stfd
#Load float
lfd f1, 0 (r31) #Loads as 0x41A08000 3F800000

#Store float, stores as 0x41A08000
stfs f1, 0 (r31)

#Store paired single, stores as 0x41A08000 3F800000
psq_st f1, 0x8 (r31), 0, 0

#Store using stfd, will store as 40341000 00000000
stfd f1, 0x10 (r31)

#Load the fpr values for sprintf
lwz r5, 0 (r31)
lwz r6, 0x4 (r31)
lwz r7, 0x8 (r31)
lwz r8, 0xC (r31)
lwz r9, 0x10 (r31)
lwz r10, 0x14 (r31)

#Set r4 arg for sprintf
bl setsprintf
.asciz "%08X %08X\n%08X %08X\n%08X %08X"
.align 2
setsprintf:
mflr r4

#Set r3 arg for sprintf
addi r3, r31, 0x40

#Clear cr1 eq bit cuz no floats for sprintf
crclr 6

#Call sprintf
lis r12, 0x8001
ori r12, r12, 0x1A2C
mtctr r12
bctrl
addi r5, r31, 0x40

#Setup OSFatal args
bl setupfatal
.long 0xFFFFFFFF
.long 0
setupfatal:
mflr r3
addi r4, r3, 4

#Call OSFatal
lis r12, 0x801A
ori r12, r12, 0x4EC4
mtctr r12
bctr

the_end:

Snippet 2:

Code:
#C2 Address 807BA164
#Pick up box, see result on screen

.set doubleword1, 0x40341000
.set doubleword2, 0x00000000
.set HID2, 920
.set PSE, 0x2000 #low bits excluded

#Shared Item Default instruction
stw r3, 0x0020 (r23)

#Check PSE bit of HID2
mfspr r12, HID2
andis. r12, r12, PSE
beq- the_end

#Set r31 to 0x80001500
lis r31, 0x8000
ori r31, r31, 0x1500

#Set double word to store
lis r3, doubleword1@h
ori r3, r3, doubleword1@l
lis r4, doubleword2@h
ori r4, r4, doubleword2@l
stw r3, 0 (r31)
stw r4, 0x4 (r31)

#Prove lfs, stfs, and stfd
#Load float
lfs f1, 0 (r31) #Loads as 0x40341000 40341000

#Store float, stores as 0x40341000
stfs f1, 0 (r31)

#Store paired single, stores as 0x40341000 40341000
psq_st f1, 0x8 (r31), 0, 0

#Store using stfd, will store as 0x40068200 00000000
stfd f1, 0x10 (r31)

#Load the fpr values for sprintf
lwz r5, 0 (r31)
lwz r6, 0x4 (r31)
lwz r7, 0x8 (r31)
lwz r8, 0xC (r31)
lwz r9, 0x10 (r31)
lwz r10, 0x14 (r31)

#Set r4 arg for sprintf
bl setsprintf
.asciz "%08X %08X\n%08X %08X\n%08X %08X"
.align 2
setsprintf:
mflr r4

#Set r3 arg for sprintf
addi r3, r31, 0x40

#Clear cr1 eq bit cuz no floats for sprintf
crclr 6

#Call sprintf
lis r12, 0x8001
ori r12, r12, 0x1A2C
mtctr r12
bctrl
addi r5, r31, 0x40

#Setup OSFatal args
bl setupfatal
.long 0xFFFFFFFF
.long 0
setupfatal:
mflr r3
addi r4, r3, 4

#Call OSFatal
lis r12, 0x801A
ori r12, r12, 0x4EC4
mtctr r12
bctr

the_end:

In conclusion, now you know. If you find any of this incorrect, please let me know.
