[WIP] YAPSxP: Yet Another PSX Emulator for PSP

**Exophase** · November 7th, 2006, 13:15

Originally Posted by hlide

IR0/1/2/3 are normally set from MAC0/1/2/3, but they are saturated, so they are always between -32768 and 32767. As long as MAC0/1/2/3 don't exceed those min and max values, IR0/1/2/3 are okay. If they do exceed them, IR0/1/2/3 would be saturated to those min/max values anyway, so I think they are indeed okay. Plus they are always loaded or stored as 16-bit words, so they fit in a float. It sounds as if the problem only concerns MAC0/1/2/3, TRX/Y/Z and R/G/BBK, which are always 32-bit words, if you need to load/store them as floats.

Yeah, you're right, with the saturation they should be okay.
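
To make the point above concrete, here is a minimal Python sketch (not code from either emulator). It round-trips values through IEEE-754 single precision with the standard `struct` module; the helper names `to_f32` and `saturate_ir` are made up for illustration.

```python
import struct

def to_f32(value: float) -> float:
    """Round a number to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def saturate_ir(mac: int) -> int:
    """Clamp a MAC value to the signed 16-bit range used for IR0/1/2/3."""
    return max(-32768, min(32767, mac))

# Every saturated IR value survives a trip through a float unchanged.
ir_is_exact = all(to_f32(v) == v for v in range(-32768, 32768))

# A 32-bit MAC value can lose its low bits: 2**24 + 1 has no exact float.
mac = (1 << 24) + 1
mac_is_exact = to_f32(mac) == mac

print(ir_is_exact, mac_is_exact)  # True False
```

So the 16-bit registers are safe in float form, while anything that needs more than 24 significant bits is not.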

Originally Posted by hlide

Yes, that's the main trouble: I cannot keep MAC0/1/2/3 as float registers, because a float can only hold a "24-bit integer" exactly while MAC0/1/2/3 need to store 1:31:0 bits. So if you use GPL (General Purpose Interpolation):
MAC1=A1[MAC1 + IR0 * IR1]
MAC2=A2[MAC2 + IR0 * IR2]
MAC3=A3[MAC3 + IR0 * IR3]
IR1=Lm_B1[MAC1]
IR2=Lm_B2[MAC2]
IR3=Lm_B3[MAC3]
[0,8,0] Cd0<-Cd1<-Cd2<- CODE
[0,8,0] R0<-R1<-R2<- Lm_C1[MAC1]
[0,8,0] G0<-G1<-G2<- Lm_C2[MAC2]
[0,8,0] B0<-B1<-B2<- Lm_C3[MAC3]

and call GPL successively, you are accumulating errors, but it is also true that IR1/2/3 and Rx/Gx/Bx are saturated anyway, so as long as the code doesn't need to look at MAC0/1/2/3 it should be okay.

For the interpolation they probably usually won't, isn't it usually used for color?
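
A rough Python sketch of why the saturation hides the float error. It follows only the simplified formulas quoted above (MAC = MAC + IR0 * IR, then IR = Lm_B[MAC]); fixed-point shifts, the colour FIFO and the flag bits are left out, and the function names and sample values are invented for the example.

```python
import struct

def f32(value: float) -> float:
    """Round to the nearest single-precision float, as a VFPU register would hold it."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def lm_b(value):
    """Saturate to the signed 16-bit IR range (assumed limits: -32768..32767)."""
    return max(-32768, min(32767, value))

def gpl_exact(mac, ir0, ir):
    """GPL accumulation with exact integers."""
    mac = [m + ir0 * i for m, i in zip(mac, ir)]
    return mac, [lm_b(m) for m in mac]

def gpl_float(mac, ir0, ir):
    """Same accumulation, but every value is rounded to single precision."""
    mac = [f32(f32(m) + f32(ir0 * i)) for m, i in zip(mac, ir)]
    return mac, [lm_b(m) for m in mac]

mac_in = [20_000_001, 100, -50_000_001]
mac_i, ir_i = gpl_exact(mac_in, 4, [4, 4, 4])
mac_f, ir_f = gpl_float(mac_in, 4, [4, 4, 4])

print(mac_i[0], mac_f[0])  # the large MAC values differ in the low bits
print(ir_i == ir_f)        # the saturated IR values still agree
```

The large MAC components come out slightly different in float form, but once they are clamped the IR results are identical, which is the case discussed here. Code that reads MAC1/2/3 back directly would still see the error.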

Originally Posted by hlide

I don't see any easy way to do it :/

Typically, setting FLAGS for MAC1/2/3 should look something like this?

vli.t C000, [1<<(30-12), 1<<(29-12), 1<<(28-12)] // bits A1p, A2p, A3p
vli.t C010, [1<<(27-12), 1<<(26-12), 1<<(25-12)] // bits A1n, A2n, A3n
vli.t C020, [MIN, MIN, MIN] // -32768.0
vli.t C030, [MAX, MAX, MAX] // +32767.0
vslt.t C020, MAC1MAC2MAC3, C020 // cn[i] = MAC[i] < MIN ? 1.0 : 0.0 (unsure about this instruction)
vsge.t C030, MAC1MAC2MAC3, C030 // cp[i] = MAC[i] >= MAX ? 1.0 : 0.0
vmul.t C020, C020, C010 // A[i]n = cn[i] * bit A[i]n (C010 holds the negative-overflow bits)
vmul.t C030, C030, C000 // A[i]p = cp[i] * bit A[i]p (C000 holds the positive-overflow bits)
vadd.t C020, C030, C020 // A[i] = A[i]p + A[i]n
vadd.s S020, S020, S021 // FLAGS = A[1] + A[2];
vadd.s S020, S020, S022 // FLAGS += A[3];
vf2i.s S020, S020
mfv t8, S020
or t9, t9, t8 // t9 contains (GTE FLAG >> 12).

...

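li t8, 0x7F87E // mask of the error bits, already shifted right by 12
and t8, t9, t8
bnel t8, zero, 1f // CHECKSUM = 1 if one of those bits is set
lui t8, 8 // delay slot, only executed when the branch is taken: t8 = 1<<(31-12)
1: or t9, t8, t9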
sll t9, t9, 12
mtv t9, FLAG

for IR1/2/3 with lm=0 (no negative limit):

vli.t C000, [1<<(24-12), 1<<(23-12), 1<<(22-12)] // bits B1, B2, B3
vli.t C010, [MAX, MAX, MAX] // 65535.0
vslt.t C020, C010, IR1IR2IR3 // 65535.0 < IR[i] ? 1.0 : 0.0
vmin.t IR1IR2IR3, IR1IR2IR3, C010 // IR[i] = min(IR[i], 65535.0)
vmul.t C020, C020, C000
vadd.s S020, S020, S021
vadd.s S020, S020, S022
vf2i.s S020, S020
mfv t8, S020
or t9, t9, t8 // t9 contains GTE FLAG.
...
blahblah

well i dunno if it works...

Well of course I didn't realize the existence of the vslt instructions, in fairness you just found them so that changes things a lot. What I was saying is that you can keep the GTE flags in their separate VFPU registers if possible (if you have room) and only do the conversions when you need it.

Originally Posted by hlide

well i use 6 matrixes for permanent usage so only matrix 0 and 1 are left for any transient purpose in my emulator. :/

I'd have to see what kind of layouts you can get away with. 2 matrices is still 32 registers right?

Originally Posted by hlide

well, I think you see the point why i cannot load/store GTE registers as float register :/

Yes, but since you can still statically determine within blocks what form the registers are taking, you can at least leave them cached as floating point registers in between certain GTE instructions, don't you think? And hope the error doesn't accumulate too much.

**hlide** · November 7th, 2006, 16:11

Originally Posted by Exophase

For the interpolation they probably usually won't, isn't it usually used for color?

I guess

Originally Posted by Exophase

Well of course I didn't realize the existence of the vslt instructions, in fairness you just found them so that changes things a lot. What I was saying is that you can keep the GTE flags in their separate VFPU registers if possible (if you have room) and only do the conversions when you need it.

don't worry, i really understand your point.

Actually, this code has some errors, and I just found a way to squeeze it further:

____________________
viim.s S011, 16384 # 1<<(26-12)
viim.s S010, 8192 # 1<<(25-12)
vadd.s S012, S011, S011 # 1<<(27-12)
viim.s S000, 8
vscl.t C000, C010, S000 # [1<<(30-12),1<<(29-12),1<<(28-12)]

lvi.t C020, [-2147483648.0, -2147483648.0, -2147483648.0] # TOFIX they must be 43-bit limit not 30-bit limit :/
lvi.t C030, [+2147483647.0, +2147483647.0, +2147483647.0] # TOFIX

vslt.t C020, C130, C020 # cn[i] = MAC[i] < -2147483648.0 ? 1.0 : 0.0
vsge.t C030, C130, C030 # cp[i] = MAC[i] >= +2147483647.0 ? 1.0 : 0.0

vdot.t S100, C030, C000 # Ap = cp[0] * (1<<(30-12)) + cp[1] * (1<<(29-12)) + cp[2] * (1<<(28-12))
vdot.t S133, C020, C010 # An = cn[0] * (1<<(27-12)) + cn[1] * (1<<(26-12)) + cn[2] * (1<<(25-12))
vadd.s S133, S100, S133 # (FLAGS >> 12) = (An + Ap)
...
___________________________

Funny to think that you can use a dot product to merge bits. Well, I hope it is only 1 cycle :/
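
The dot-product trick can be checked with a small Python sketch. This is an illustration, not VFPU code: the comparison limits are the placeholder ±2^31 values from the snippet (marked TOFIX there), and `mac_flags`, `AP_BITS` and `AN_BITS` are names invented for the example.

```python
# FLAG bit weights, already shifted right by 12 as in the VFPU code.
AP_BITS = (1 << (30 - 12), 1 << (29 - 12), 1 << (28 - 12))  # MAC1/2/3 positive overflow
AN_BITS = (1 << (27 - 12), 1 << (26 - 12), 1 << (25 - 12))  # MAC1/2/3 negative overflow

def dot(a, b):
    """Plain dot product, standing in for vdot.t."""
    return sum(x * y for x, y in zip(a, b))

def mac_flags(mac, lo=-2.0**31, hi=2.0**31 - 1):
    """Return (FLAG >> 12) for the MAC overflow bits; lo/hi are assumed limits."""
    cn = [1.0 if m < lo else 0.0 for m in mac]   # like vslt.t
    cp = [1.0 if m >= hi else 0.0 for m in mac]  # like vsge.t
    # Each weight is a distinct power of two and each factor is 0.0 or 1.0,
    # so the sum is the same as OR-ing the selected bits together.
    return int(dot(cp, AP_BITS) + dot(cn, AN_BITS))

flags = mac_flags([2.0**31, 0.0, -2.0**31 - 1])
print(hex(flags << 12))  # MAC1 positive overflow and MAC3 negative overflow
```

The sum stays far below 2^24, so it is exact even in single precision, and the final conversion to integer loses nothing.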

When you try to read FLAG (S333) with "cfc2 rd, FLAG", you just need to:
convert the FLAG content to an integer, test a mask to see whether the CHKSUM bit needs to be set to 1, and shift everything left by 12 bits. (I don't see why we would need to do this after every GTE instruction; doing it only when FLAG is read should be enough.) Of course, it would be even better if we could set the other bits only when FLAG is read, but that is more difficult because you need to pre-analyse the code to generate.
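
As a sketch of that read path in Python: the mask 0x7F87E and the 12-bit shift come from the code earlier in the thread, while `read_flag` and the constant names are hypothetical.

```python
ERROR_MASK = 0x7F87E         # error bits of FLAG, shifted right by 12 (value from the thread)
CHECKSUM = 1 << (31 - 12)    # FLAG bit 31, in the same shifted form

def read_flag(flags_shifted: int) -> int:
    """Build the 32-bit FLAG value from the cached (FLAG >> 12) integer."""
    if flags_shifted & ERROR_MASK:
        flags_shifted |= CHECKSUM  # summary bit, set only if an error bit is set
    return (flags_shifted << 12) & 0xFFFFFFFF

print(hex(read_flag(1 << 18)))  # 0xc0000000: bit 30 plus the summary bit
print(hex(read_flag(1 << 10)))  # 0x400000: bit 22 is outside the mask
```

Doing this only when the guest code reads FLAG keeps the per-instruction cost down to accumulating the bits.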

Originally Posted by Exophase

I'd have to see what kind of layouts you can get away with. 2 matrices is still 32 registers right?

Yes: 4 x 4 = 16 registers per matrix, so you need 2 matrixes for 32 registers. GTE has 64 registers, so 4 matrixes. My GPR and GTE registers are loaded and stored in VFPU registers. The 2 remaining matrixes are scratch matrixes. For MDEC (I'm planning to use the VFPU vbfy1/2 "butterfly" instructions, which I recently dug into, for the MDEC algorithm) I may need to "steal" some matrixes from GTE, since I don't think MDEC is used in conjunction with GTE. Keep in mind that GTE registers are in integer form; if you want all of them as floats, you will need nearly 8 matrixes if I remember correctly.

Originally Posted by Exophase

Yes, but since you can still statically determine within blocks what form the registers are taking, you can at least leave them cached as floating point registers in between certain GTE instructions, don't you think? And hope the error doesn't accumulate too much.

_______________ DeViL code >8>=>
lwc2 MAC1, 0(a0)
lwc2 MAC2, 4(a0)
lwc2 MAC3, 8(a0)
...

bltz a1, t0, 0f
nop
jal gte_gpl12
...
b 1f
...
0: jal gte_gpl0
...
1:
swc2 MAC1, 0(a2)
swc2 MAC2, 4(a2)
swc2 MAC3, 8(a2)
...
_______________

How do you know whether ?(a0) actually contains a [1:3:12] format or a [1:16:0] one before encountering the real GPL12 instruction?
And how will you store ?(a2)?

**zodttd** · November 9th, 2006, 03:03

I believe MDEC is completely separate from the GTE. So there should be no worries there, the two shouldn't "mix".

**Zaitmi** · November 12th, 2006, 23:58

Sorry, I'm a little slow... but is dynarec implemented in this emulator already?

**hlide** · November 13th, 2006, 01:29

The dynarec is 90% done; I still need to make the GTE, COP0 and HLE dynarec.
What is missing is complete hardware emulation (GPU, CDR, SIO, etc.).
So there is a dynarec but not an emulator. Normally I'm pairing up with another developer to make a complete emulator, and I'm waiting for SVN access so we can develop together.

**tsurumaru** · November 13th, 2006, 21:58

Originally Posted by hlide

The dynarec is 90% done; I still need to make the GTE, COP0 and HLE dynarec.
What is missing is complete hardware emulation (GPU, CDR, SIO, etc.).
So there is a dynarec but not an emulator. Normally I'm pairing up with another developer to make a complete emulator, and I'm waiting for SVN access so we can develop together.

Hi hlide.

This may be unfeasible, but if you have an almost complete dynarec, and the Anonymous Coder has an emulator without a working dynarec, why don't you ask Wraggy to put you guys in touch and see if you can't collaborate....?

**Exophase** · November 13th, 2006, 22:08

He wants to promote new emulators written from scratch.

**tsurumaru** · November 13th, 2006, 22:20

Originally Posted by Exophase

He wants to promote new emulators written from scratch.

Well, that's an honourable enough intention. I didn't realise though, is the Anonymous Coder's emulator a PCSX port?

**hlide** · November 13th, 2006, 22:31

Originally Posted by tsurumaru

Hi hlide.

This may be unfeasible, but if you have an almost complete dynarec, and the Anonymous Coder has an emulator without a working dynarec, why don't you ask Wraggy to put you guys in touch and see if you can't collaborate....?

Well, the way I want my dynarec to run is not totally compatible with his emulator, which is undoubtedly derived from a PCSX-like emulator, because my dynarec is very specific to the PSP. I mean, I cannot even use my dynarec on another MIPS-based platform. What you need is not to adapt the dynarec to the emulator but to adapt the emulator to the dynarec, if you want to retain all its potential. Which is basically the same as rewriting from scratch.

***wraggster*** · November 13th, 2006, 22:38

this is turning out to be a very interesting thread, not that i get a chance to read threads these days, too busy /too tired
