Expanding on my previous thread about missing info in the Broadway Manual, it appears I've run into some more information omitted from Version 1.0 of the manual. I was running floating-point load and store tests on real hardware so I'd know how to code them into my Broadway PPC Instruction Simulator. I was getting some weird results, but after some tinkering I found out that not only do lfs and stfs vary with the HID2 PSE bit, so do lfd and stfd.
Here's what occurs~
lfs
PSE low: 32-bit Single Float loaded from EA. Converted to 64-bit form. Placed into fD
PSE high: 32-bit Single Float loaded from EA. Placed into both ps0 and ps1 of fD
lfd (PSE high stuff missing from Manual)
PSE low: 64-bit Double Float loaded from EA. Placed into fD
PSE high: 64-bit Double Float loaded from EA. Converted to 32-bit Single Precision. Placed into ps0 with ps1 being left undefined***
stfs
PSE low: Value in fS is converted to a 32-bit Single Precision float. That float is stored at EA.
PSE high: ps0 of fS stored at EA
stfd (PSE high stuff missing from Manual)
PSE low: fS stored at EA
PSE high: Value in fS's ps0 converted to 64-bit form. That 64-bit float is stored at EA.
***I tested a handful of numbers, and every time lfd was done (with HID2 PSE high), ps1 was written as 0x3F800000, which is 1.0 as a single-precision float. I doubt this is a "hardcoded" constant; most likely it's just undefined (junk).
Obviously, if there are data-type conversions (e.g. denorms), those still occur in accordance with what is stated in the Broadway Manual.
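For anyone who wants to play with the rules above without hardware, here is a minimal Python sketch of them. It is only a model of what I described, not how the silicon works: the dict-based register, the function names, and the PS1_AFTER_LFD constant are all made up for illustration, the constant is just the value I observed (treat it as undefined), and the denorm handling from the manual is left out. It also assumes round-to-nearest when narrowing, which doesn't matter for values that convert exactly.

```python
import struct

# Observed in ps1 after lfd with PSE high; undocumented, treat as junk.
PS1_AFTER_LFD = 0x3F800000

def single_to_double_bits(bits32):
    """Widen a 32-bit single image to its 64-bit double image (always exact)."""
    value = struct.unpack(">f", struct.pack(">I", bits32))[0]
    return struct.unpack(">Q", struct.pack(">d", value))[0]

def double_to_single_bits(bits64):
    """Narrow a 64-bit double image to a 32-bit single image.
    Assumption: round-to-nearest; struct raises OverflowError if the
    value is too large for a single, which real hardware would not do."""
    value = struct.unpack(">d", struct.pack(">Q", bits64))[0]
    return struct.unpack(">I", struct.pack(">f", value))[0]

def lfs(mem, ea, pse):
    word = int.from_bytes(mem[ea:ea + 4], "big")
    if pse:
        return {"ps0": word, "ps1": word}          # same single in both slots
    return {"fpr": single_to_double_bits(word)}    # converted to 64-bit form

def lfd(mem, ea, pse):
    dword = int.from_bytes(mem[ea:ea + 8], "big")
    if pse:
        return {"ps0": double_to_single_bits(dword), "ps1": PS1_AFTER_LFD}
    return {"fpr": dword}

def stfs(mem, ea, reg, pse):
    word = reg["ps0"] if pse else double_to_single_bits(reg["fpr"])
    mem[ea:ea + 4] = word.to_bytes(4, "big")

def stfd(mem, ea, reg, pse):
    dword = single_to_double_bits(reg["ps0"]) if pse else reg["fpr"]
    mem[ea:ea + 8] = dword.to_bytes(8, "big")

# Usage: lfd followed by stfd with PSE high round-trips the test value.
mem = bytearray(0x20)
mem[0:8] = bytes.fromhex("4034100000000000")
f1 = lfd(mem, 0, pse=True)
stfd(mem, 0x10, f1, pse=True)
print(mem[0x10:0x18].hex())
```

Keeping the PSE-low and PSE-high register views as separate dict keys is just the simplest way to mirror the list above; a real simulator would need one shared register representation.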
Here are two snippets of code that confirm this. They are hooked to the Shared Item Address and will only execute when HID2 PSE is high. I didn't want to set HID2 PSE manually because that requires I-Cache invalidation & disabling. Run these on a real Wii console, as Dolphin will not emulate this correctly.
Snippet 1 does this:
- Sets 0x40341000 00000000 in memory via GPRs
- Uses lfd to load it
- Stores it via stfs, psq_st, and stfd
- Results of the 3 stores are printed to screen in that order
Snippet 2 does this:
- Sets 0x40341000 00000000 in memory via GPRs
- Uses lfs to load it
- Stores it via stfs, psq_st, and stfd
- Results of the 3 stores are printed to screen in that order
I am using the integer value 0x40341000 00000000 for the test because it is a normalized (valid) double-precision float. Also, if its upper word is read as a single (0x40341000), that is a normalized (valid) single-precision float. Since both of these floats are normalized and the exponent bits are greater than 896, there are zero data-type conversions. The only conversions are the standard width/precision changes mentioned earlier, which always occur.
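If you want to double-check the test value yourself, here is a small Python sketch that pulls the IEEE 754 fields apart. The helper names are mine, and my reading of the 896 threshold is that it is 1023 - 127, so a double with a biased exponent above it is still a normal number once narrowed to single.

```python
def double_fields(bits64):
    """Split a 64-bit image into (sign, biased exponent, fraction)."""
    return bits64 >> 63, (bits64 >> 52) & 0x7FF, bits64 & ((1 << 52) - 1)

def single_fields(bits32):
    """Split a 32-bit image into (sign, biased exponent, fraction)."""
    return bits32 >> 31, (bits32 >> 23) & 0xFF, bits32 & ((1 << 23) - 1)

TEST_DOUBLE = 0x4034100000000000
TEST_SINGLE = TEST_DOUBLE >> 32          # upper word, 0x40341000

_, d_exp, d_frac = double_fields(TEST_DOUBLE)
_, s_exp, _ = single_fields(TEST_SINGLE)

# Normalized means the exponent field is neither all zeros nor all ones.
double_is_normalized = 0 < d_exp < 0x7FF
single_is_normalized = 0 < s_exp < 0xFF

# Above 896 the double does not turn into a denorm when narrowed.
above_896 = d_exp > 896

# The low 29 fraction bits are dropped when narrowing; all zero means exact.
narrowing_is_exact = (d_frac & ((1 << 29) - 1)) == 0

print(d_exp, s_exp)
print(double_is_normalized, single_is_normalized, above_896, narrowing_is_exact)
```

The single's exponent field of 128 becomes 128 - 127 + 1023 = 1024 when widened, so it clears 896 in 64-bit form as well.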
Snippet 1 will result as:
41A08000 00000000 (41A08000 written from the stfs)
41A08000 3F800000 (written from the psq_st, this shows what was loaded via lfd, ps1 is most likely undefined/junk as explained earlier)
40341000 00000000 (stfd, this gets written because 41A08000 is converted to its 64-bit form)
Snippet 2 will result as:
40341000 00000000 (40341000 written from the stfs)
40341000 40341000 (written from psq_st, this shows what was loaded via lfs, since lfs loads the same value into ps0 and ps1)
40068200 00000000 (stfd, this gets written because 40341000 is converted to its 64-bit form)
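The numbers in both result lists can be reproduced by hand with plain bit shifting. Below is a rough Python sketch that does the width changes directly on the bit fields: re-bias the exponent by 1023 - 127 and shift the fraction by 29 bits. The function names are made up, and the two helpers only handle normalized values that convert exactly, which is all the test value needs.

```python
PS1_AFTER_LFD = 0x3F800000  # what I observed in ps1 after lfd; undocumented

def widen(bits32):
    """Single image -> double image (normalized inputs only)."""
    sign = bits32 >> 31
    exponent = ((bits32 >> 23) & 0xFF) - 127 + 1023
    fraction = (bits32 & 0x7FFFFF) << 29
    return (sign << 63) | (exponent << 52) | fraction

def narrow(bits64):
    """Double image -> single image (only for values that fit exactly)."""
    sign = bits64 >> 63
    exponent = ((bits64 >> 52) & 0x7FF) - 1023 + 127
    fraction = (bits64 & ((1 << 52) - 1)) >> 29
    return (sign << 31) | (exponent << 23) | fraction

def screen(ps0, ps1, leftover_word):
    """The three printed lines: stfs, psq_st, stfd, all with PSE high.
    leftover_word is the second word at 0(r31), which stfs does not touch."""
    stored_double = widen(ps0)
    return [
        f"{ps0:08X} {leftover_word:08X}",
        f"{ps0:08X} {ps1:08X}",
        f"{stored_double >> 32:08X} {stored_double & 0xFFFFFFFF:08X}",
    ]

TEST = 0x4034100000000000
low_word = TEST & 0xFFFFFFFF

snippet1 = screen(narrow(TEST), PS1_AFTER_LFD, low_word)   # loaded via lfd
snippet2 = screen(TEST >> 32, TEST >> 32, low_word)        # loaded via lfs

print("\n".join(snippet1))
print("\n".join(snippet2))
```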
Here are the codes~
Snippet 1:
Code:
#C2 Address 807BA164
#Pick up box, see result on screen
.set doubleword1, 0x40341000
.set doubleword2, 0x00000000
.set HID2, 920
.set PSE, 0x2000 #low bits excluded
#Shared Item Default instruction
stw r3, 0x0020 (r23)
#Check PSE bit of HID2
mfspr r12, HID2
andis. r12, r12, PSE
beq- the_end
#Set r31 to 0x80001500
lis r31, 0x8000
ori r31, r31, 0x1500
#Set double word to store
lis r3, doubleword1@h
ori r3, r3, doubleword1@l
lis r4, doubleword2@h
ori r4, r4, doubleword2@l
stw r3, 0 (r31)
stw r4, 0x4 (r31)
#Prove lfd, stfs, and stfd
#Load float
lfd f1, 0 (r31) #Loads as 0x41A08000 3F800000
#Store float, stores as 0x41A08000
stfs f1, 0 (r31)
#Store paired single, stores as 0x41A08000 3F800000
psq_st f1, 0x8 (r31), 0, 0
#Store using stfd, will store as 40341000 00000000
stfd f1, 0x10 (r31)
#Load the fpr values for sprintf
lwz r5, 0 (r31)
lwz r6, 0x4 (r31)
lwz r7, 0x8 (r31)
lwz r8, 0xC (r31)
lwz r9, 0x10 (r31)
lwz r10, 0x14 (r31)
#Set r4 arg for sprintf
bl setsprintf
.asciz "%08X %08X\n%08X %08X\n%08X %08X"
.align 2
setsprintf:
mflr r4
#Set r3 arg for sprintf
addi r3, r31, 0x40
#Clear cr1 eq bit cuz no floats for sprintf
crclr 6
#Call sprintf
lis r12, 0x8001
ori r12, r12, 0x1A2C
mtctr r12
bctrl
addi r5, r31, 0x40
#Setup OSFatal args
bl setupfatal
.long 0xFFFFFFFF
.long 0
setupfatal:
mflr r3
addi r4, r3, 4
#Call OSFatal
lis r12, 0x801A
ori r12, r12, 0x4EC4
mtctr r12
bctr
the_end:
Snippet 2:
Code:
#C2 Address 807BA164
#Pick up box, see result on screen
.set doubleword1, 0x40341000
.set doubleword2, 0x00000000
.set HID2, 920
.set PSE, 0x2000 #low bits excluded
#Shared Item Default instruction
stw r3, 0x0020 (r23)
#Check PSE bit of HID2
mfspr r12, HID2
andis. r12, r12, PSE
beq- the_end
#Set r31 to 0x80001500
lis r31, 0x8000
ori r31, r31, 0x1500
#Set double word to store
lis r3, doubleword1@h
ori r3, r3, doubleword1@l
lis r4, doubleword2@h
ori r4, r4, doubleword2@l
stw r3, 0 (r31)
stw r4, 0x4 (r31)
#Prove lfs, stfs, and stfd
#Load float
lfs f1, 0 (r31) #Loads as 0x40341000 40341000
#Store float, stores as 0x40341000
stfs f1, 0 (r31)
#Store paired single, stores as 0x40341000 40341000
psq_st f1, 0x8 (r31), 0, 0
#Store using stfd, will store as 0x40068200 00000000
stfd f1, 0x10 (r31)
#Load the fpr values for sprintf
lwz r5, 0 (r31)
lwz r6, 0x4 (r31)
lwz r7, 0x8 (r31)
lwz r8, 0xC (r31)
lwz r9, 0x10 (r31)
lwz r10, 0x14 (r31)
#Set r4 arg for sprintf
bl setsprintf
.asciz "%08X %08X\n%08X %08X\n%08X %08X"
.align 2
setsprintf:
mflr r4
#Set r3 arg for sprintf
addi r3, r31, 0x40
#Clear cr1 eq bit cuz no floats for sprintf
crclr 6
#Call sprintf
lis r12, 0x8001
ori r12, r12, 0x1A2C
mtctr r12
bctrl
addi r5, r31, 0x40
#Setup OSFatal args
bl setupfatal
.long 0xFFFFFFFF
.long 0
setupfatal:
mflr r3
addi r4, r3, 4
#Call OSFatal
lis r12, 0x801A
ori r12, r12, 0x4EC4
mtctr r12
bctr
the_end:
In conclusion, now you know. If you find any of this incorrect, please let me know.