FAME SH4 is here!



fox68k
October 8th, 2004, 17:18
And finally, after a long time, I have just finished my Motorola 68000 emulation library, written entirely in SH4 assembly. ;D

I would love to see full emulation speed in many emulators out there, like NeoGeo CD and UAE4ALL among others. So I encourage every DC emu coder to use this lib and to help me with bug fixing. I will try my best. ;)

It is not completely tested, but I could wait no longer to see this lib out. Of course, it is in a very early development stage (alpha), so do not expect it to be bug-free.

You can download the library here (http://www.emumega.com/s68k/fame/fame.zip).

I will release a source package (including sources and notes about development) in a few days.

P.S.: The FAME site is not updated with the news yet.

Ian_micheal
October 8th, 2004, 18:36
Great news fox68k :o downloading it now..

quzar
October 8th, 2004, 19:33
Awesome. Gonna test it out.

Edit: is this API compatible with the x86 version of FAME?

qatmix
October 9th, 2004, 02:12
Excellent work Fox

ian can you email me a version of the neocd emu with sound enabled to test on the overclocked dc?

quzar
October 9th, 2004, 02:46
Excellent work Fox

ian can you email me a version of the neocd emu with sound enabled to test on the overclocked dc?

The sound is enabled in version 9.3. Or that is to say, the sound CPU is running; it's just not outputting properly. 9.3 runs at the speed that it should run at if sound were working with the current Z80 emulation core.

ron
October 9th, 2004, 04:42
FOX, that's incredible work. I promise it will be tested to the last drop and, of course, included in our emus for DC. Great work!!! And keep working on the core. PUXA ASTURIES!!!!!

arqueiro
October 11th, 2004, 04:54
Amazing work, FOX! We have been waiting for this!
This is a gift to the scene :P

fox68k
October 11th, 2004, 07:08
Awesome. Gonna test it out.

Edit: is this API compatible with the x86 version of FAME?

Fully compatible. 8)

Cheers.

qatmix
October 11th, 2004, 10:48
The sound is enabled in version 9.3. Or that is to say, the sound CPU is running; it's just not outputting properly. 9.3 runs at the speed that it should run at if sound were working with the current Z80 emulation core.


Wow, that means it runs super smooth and in a frame! Any news on progress with this emu and working sound?

dcsteve
October 11th, 2004, 18:07
Have any of you successfully been able to incorporate this into a DC emu? I'd like to request that someone compile it with the last release source of Genesis Plus for DC:

http://files.frashii.com/~sp00nz/Doom/files/BlackAura/gpdc-pvr-5.tar.bz2

It requires a version of KOS with SPU DMA. He's using 1.3.x incr 133.

quzar
October 11th, 2004, 21:24
Wow, that means it runs super smooth and in a frame! Any news on progress with this emu and working sound?

Actually, I guess I wasn't clear on that. Sound isn't working not because there isn't enough speed for it (that is what frameskip is for) but because it is literally broken. The sound processor is running, but not at full speed. This is how Ian got it to slow down after taking out my frame-limiting code =\.

Since Ian quit, I don't know if there are going to be any more updates to the emulator. The only thing I can think of would be to redo the Z80 emulation with another Z80 emulator that is proven to work on the DC. Either that, or take out some compiler flags that Ian used with it that I have heard to be problematic.

evilo
October 12th, 2004, 04:21
Since Ian quit, I don't know if there are going to be any more updates to the emulator. The only thing I can think of would be to redo the Z80 emulation with another Z80 emulator that is proven to work on the DC. Either that, or take out some compiler flags that Ian used with it that I have heard to be problematic.



Ian quit the scene?????????????

As for sound, I have started to implement it in the current WIP version (of the PS2 port, of course). I finally removed the stream functions (stream.c & stream.h) completely, and now directly update both CPUs' sound and mix them for each frame; the result is that it's starting to work, more or less. FM sound is more or less working, and SFX are working, but not in all games.
The YM2610 sound core is also almost 10 years old, so I am thinking of updating it with a more recent one.
Anyway, I'm also thinking of replacing the Z80 core, as it is strange that some games have working sound and others don't...

qatmix
October 12th, 2004, 11:02
So is the DC version dead???

quzar
October 12th, 2004, 15:11
Well, I'm pretty sure I am the DC coder with the second most experience with it. If evilo is willing to help me implement some of the things he did in the PS2 version, maybe I can try to make an update?

Cereal
October 12th, 2004, 22:07
It's at times like this I wish someone had a picture of a PS2 hugging a DC saying it will be ok.

quzar
October 12th, 2004, 22:38
It's at times like this I wish someone had a picture of a PS2 hugging a DC saying it will be ok.

considering we have all seen the picture of the PS2 beating the DC to within an inch of its life, that would be a nice change.

qatmix
October 13th, 2004, 03:38
Great stuff. It would be really sad not to get sound working on the NEO CD emu, as it's soooo close (ish).

evilo
October 13th, 2004, 09:12
Well, I'm pretty sure I am the DC coder with the second most experience with it. If evilo is willing to help me implement some of the things he did in the PS2 version, maybe I can try to make an update?

I will gladly help you with fixing the sound code. I already provided fixes to Ian for the Z80 core, and he also provided me with fixes for gfx issues. PS2 or DC or even GC or whatever platform it is, we are all in the same community anyway!
NeoCD/PS2 is also an open source project, so once the next release is out (once, in addition to other changes, I reach a good state for the sound support) you are free to pick it up from my site (neocd.ps2-scene.org) and port my modifications to the DC source code; or, if you want, contact me at my email address ([email protected]) and I will send you the current PS2 sound implementation. As for helping you more deeply: yes, but be aware that the PS2 version is NOT using the SDL library, and that I don't have the DC SDK installed, so apart from the code itself, I won't be able to go further with any assistance.

see you soon.

And best wishes to ian :)

Cereal
October 13th, 2004, 10:04
I've never seen a coder for another system show such great respect and friendliness toward other systems' coders. It's great to see these kinds of things in the community.

dcsteve
October 13th, 2004, 12:56
Wow, you seem to know a lot about fixing the sound. Do you think you can fix the sound in this Genesis emulator for Dreamcast? It works, but it keeps looping..

http://files.frashii.com/~sp00nz/Doom/files/BlackAura/gpdc-pvr-5.tar.bz2

It requires a version of KOS with SPU DMA. He's using 1.3.x incr 133.

Cereal
October 13th, 2004, 13:40
KOS is for DC only and he's a PS2 coder; why would he know anything about that?

GPF
October 13th, 2004, 13:43
well technically there is a branch of KOS for the PS2 :)

Don't know how mature the branch is though.

Cereal
October 13th, 2004, 13:46
well technically there is a branch of KOS for the PS2 :)

Don't know how mature the branch is though.

Holy Fly dog pooh batman, I never knew that. Did Dan make that also?

quzar
October 13th, 2004, 14:16
KOS has implementations for the DC, PS2, and GBA.

evilo: I'll keep an eye out for the next version and try to keep a lookout for the changes in it. At the moment I'm a bit busy with a little thing I'm trying to get working called DCPS ;)

Cereal
October 13th, 2004, 14:42
DCPS, eh? Let's just stomp any hype and rumors right now. That's not a PlayStation emulator for DC, right?

quzar
October 13th, 2004, 16:53
Ah, time to change the name.... I hadn't even thought of that :P

MetaFox
October 13th, 2004, 17:05
At the moment I'm a bit busy with a little thing I'm trying to get working called DCPS ;)

Working on the children's game contest, eh?

With the "DreamCast Play-doh Simulator"? ;D

Cereal
October 13th, 2004, 18:55
Dude, that would be totally awesome. Like playing with 3D Play-Doh and creating animations. A Mario Paint thing with clay. Wow, that would be awesome.

quzar
October 13th, 2004, 19:00
Well, based on my recent posts on DCEmulation, it's probably clear what I'm doing =P

That play-doh idea does sound cool though =P

Cereal
October 13th, 2004, 20:23
But why are you naming a CPS1 emulator, or whatever MAME thing you're working on, by that title?

quzar
October 13th, 2004, 23:37
DC+CPS = DCPS

I have a few novel ideas for how to get some of the larger games working even though it is a full emulator. My goal, though, is to get it up to 50% of real speed without changing out the CPU core, and without any sort of underclocking or frameskipping. I think I'm doing kinda well, because so far I've gotten it from 10% to around 20% speed (this is all with sound running).

Stef
October 14th, 2004, 01:10
wow, you seem to know alot about fixing the sound. Do you think you can fix the sound in this genesis emulator for dreamcast? It works but it keeps looping..

http://files.frashii.com/~sp00nz/Doom/files/BlackAura/gpdc-pvr-5.tar.bz2

It requires a version of KOS with SPU DMA. He's using 1.3.x incr 133.

I don't remember if this version still uses my basic sound code... but if it does, it's normal for the sound to loop a bit: the code is quite simple and can work correctly only if the emulator is running at the correct speed (with frame skip or not).
We could modify it to work even without 100% speed, but as 100% speed is (IMO) a must, it would be a waste of time... I think it's better to just wait until the emulator runs at the correct speed instead :)

BlackAura
October 14th, 2004, 01:51
It is still using your basic sound code. The version I was playing with has a different sound system, but it also has synchronisation issues. They just sound completely different - you get periods of terrible noise instead of looping.

One question about FAME. In the documentation it says:
The last field is a 32-bit pointer to the data of the memory region. The data pointed by it must be allocated in native (Motorola) format. If not, the data will be fetched incorrectly. Make sure of this fact.

However, the example program actually byte swaps the program it loads. Does that mean that FAME needs the program / RAM to be in Intel byte order, or Motorola byte order? It seems a little unclear on that point. And, if it's in Motorola byte order, are all the parameters (address, data) passed to the callback functions in Intel or Motorola byte order?

I'm around half way through trying to get the PC version of Genesis Plus working with FAME. I can switch the byte-orderness of the emulator by changing a #define anyway. All I really need to do is break the memory read/write functions up into separate functions, so FAME can call them by itself, and hook the CPU code up. Stef's adapter layer makes it pretty easy to change CPU emulators.

BlackAura
October 14th, 2004, 04:33
Well, I can't get it to work. It seems to be executing code, but it keeps resetting. As near as I can tell, it's writing to the bits of the M68k address space that cause the system to crash, which means something's gone quite badly wrong. Here's the debug output from trying to run Sonic 1:

PC:00000206 SP:00FFFE00
PC:00000206 SP:00FFFE00
Lockup 00A07F00 = 7C (000018C0)
reset!
PC:00000206 SP:00FFFE00
Lockup 00A07F00 = 7C (000018C0)
reset!
PC:00000206 SP:00FFFE00


PC:0000032E SP:00FFFE00 SR:2771
D0:0007FFFF A0:0004F534
D1:00004C0F A1:000001A4
D2:00000000 A2:00000000
D3:00000000 A3:00000000
D4:00000000 A4:00000000
D5:00000000 A5:00000000
D6:00000000 A6:00000000
D7:00000000 A7:00FFFE00

PC:0000 SP:0000
AF:0040 AF:0000
BC:0000 BC:0000
DE:0000 DE:0000
HL:0000 HL:0000
IX:FFFF IY:FFFF

The only place I'm doing any interfacing with FAME is in cpu68k, and I'm just using the existing memory read/write functions to do everything, so my setup is very simple:
struct M68K_CONTEXT cpu_contxt;
struct M68K_PROGRAM prg_fetch[] =
{
{ 0x000000, 0x3FFFFF, (unsigned)cart_rom },
{ 0xFF0000, 0xFFFFFF, (unsigned)work_ram },
{ -1, -1, NULL }
};

struct M68K_DATA data_rb[] = {
{ 0x000000, 0xFFFFFF, m68k_read_memory_8, NULL },
{ -1, -1, NULL, NULL }
};

struct M68K_DATA data_rw[] = {
{ 0x000000, 0xFFFFFF, m68k_read_memory_16, NULL },
{ -1, -1, NULL, NULL }
};

struct M68K_DATA data_wb[] = {
{ 0x000000, 0xFFFFFF, m68k_write_memory_8, NULL },
{ -1, -1, NULL, NULL }
};

struct M68K_DATA data_ww[] = {
{ 0x000000, 0xFFFFFF, m68k_write_memory_16, NULL },
{ -1, -1, NULL, NULL }
};

fox68k
October 14th, 2004, 14:04
One question about FAME. In the documentation it says:

However, the example program actually byte swaps the program it loads. Does that mean that FAME needs the program / RAM to be in Intel byte order, or Motorola byte order? It seems a little unclear on that point. And, if it's in Motorola byte order, are all the parameters (address, data) passed to the callback functions in Intel or Motorola byte order?

I'm around half way through trying to get the PC version of Genesis Plus working with FAME. I can switch the byte-orderness of the emulator by changing a #define anyway. All I really need to do is break the memory read/write functions up into separate functions, so FAME can call them by itself, and hook the CPU code up. Stef's adapter layer makes it pretty easy to change CPU emulators.

Every map region (fetch and data) has to be in big-endian (Motorola) byte order.

As for the memory read/write functions, take a look at the "Memory handling" section, where you will find enough information about parameters and other details.

If you have any doubts after that, do not hesitate to ask again.

fox68k
October 14th, 2004, 14:11
The only place I'm doing any interfacing with FAME is in cpu68k, and I'm just using the existing memory read/write functions to do everything, so my setup is very simple:
struct M68K_CONTEXT cpu_contxt;
struct M68K_PROGRAM prg_fetch[] =
{
    { 0x000000, 0x3FFFFF, (unsigned)cart_rom },
    { 0xFF0000, 0xFFFFFF, (unsigned)work_ram },
    { -1, -1, NULL }
};

struct M68K_DATA data_rb[] = {
    { 0x000000, 0xFFFFFF, m68k_read_memory_8, NULL },
    { -1, -1, NULL, NULL }
};

struct M68K_DATA data_rw[] = {
    { 0x000000, 0xFFFFFF, m68k_read_memory_16, NULL },
    { -1, -1, NULL, NULL }
};

struct M68K_DATA data_wb[] = {
    { 0x000000, 0xFFFFFF, m68k_write_memory_8, NULL },
    { -1, -1, NULL, NULL }
};

struct M68K_DATA data_ww[] = {
    { 0x000000, 0xFFFFFF, m68k_write_memory_16, NULL },
    { -1, -1, NULL, NULL }
};

Your code looks OK, but maybe the handler functions are not working in the expected way.

I have set up a little example with only one memory region. Take a look at the way the function handlers work:



/* Byte accesses flip the low address bit because memory is stored byte-swapped. */
int readbyte(int address, struct M68K_DATA *ds) {
    printf("Calling read byte memory handling...\n");
    printf("Address: %08X\n", address);
    printf("Read: %02X\n", ram[address ^ 1]);
    return ram[address ^ 1];
}

/* Word accesses read the whole 16-bit word directly; no swap is needed. */
int readword(int address, struct M68K_DATA *ds) {
    printf("Calling read word memory handling...\n");
    printf("Address: %08X\n", address);
    return ((unsigned short *)ram)[address >> 1];
}

void writebyte(int address, int data, struct M68K_DATA *ds) {
    printf("Calling write byte memory handling...\n");
    printf("Address: %08X Data: %08X\n", address, data);
    ram[address ^ 1] = data & 0xFF;
}

void writeword(int address, int data, struct M68K_DATA *ds) {
    printf("Calling write word memory handling...\n");
    printf("Address: %08X Data: %08X\n", address, data);
    ((unsigned short *)ram)[address >> 1] = data & 0xFFFF;
}

. . . .

// Defining memory map
M68K_PROGRAM p68ks[2] = {
{0x000000, 0x03FFFF, (unsigned)ram},
{-1,-1,NULL}
};

M68K_DATA d68_rb[2] = {
{0x000000, 0x03FFFF, readbyte, NULL},
{-1,-1, NULL, NULL}
};

M68K_DATA d68_rw[2] = {
{0x000000, 0x03FFFF, readword, NULL},
{-1,-1, NULL, NULL}
};

M68K_DATA d68_wb[2] = {
{0x000000, 0x03FFFF, writebyte, NULL},
{-1,-1, NULL, NULL}
};

M68K_DATA d68_ww[2] = {
{0x000000, 0x03FFFF, writeword, NULL},
{-1,-1, NULL, NULL}
};


I hope you find this useful.

Cheers.

BlackAura
October 14th, 2004, 20:12
Actually, I can't make any sense of it. Those handler functions work the same way as the ones in Genesis Plus anyway, but Genesis Plus stores everything in Intel byte order instead of Motorola byte order.

When it loads a ROM image, it runs through and does byte swapping, so if you're reading a single byte, you're actually reading from [address ^ 1], and you can read a word from memory and it'll be in native (Intel) byte order. That seems to be what your handlers are doing. As far as I can tell, this is what the example program included with FAME does too.

But that doesn't make sense if everything is stored in Motorola byte order. If the ROM or RAM image is in Motorola byte order, that means that it's stored exactly as it would be in a real system's memory, so you wouldn't need to change the address for byte accesses, and you'd need to byte-swap all word accesses.

This is what's confusing me. All the example code you've given seems to indicate that it works in Intel (little endian) byte order, but all the documentation and descriptions indicate that it works in Motorola (big endian) byte order.

I can only get the emulator to do anything if I store the ROM and RAM images in Intel byte order. It initialises the video hardware, changes the video mode, enables interrupts on the VDP and the CPU, and then it runs for about ten frames, and crashes by writing to an invalid memory address. If I'm storing things in Motorola byte order, the initial SP and IP are incorrect, and it does absolutely nothing.

fox68k
October 15th, 2004, 18:33
Well, I will try to explain how FAME works... and the process involved.

- We have a ROM image; we load it into RAM, so it is stored in little endian format. After that we do the byte swapping, so now the ROM located in the PC RAM is in big endian format, right? OK.

- But why did we byte-swap the ROM? Because CPU emulators (FAME in this case) usually fetch and decode the opcodes from that memory (ROM), so it has to be in big endian format (the 68k endianness). OK for the opcodes.

- Now it is time to read words. Every memory region is in big endian format, so read/write the whole word directly. No swap needed.

- And the last issue is about reading/writing bytes. Here, the point is the access type. Unlike with the word access, we are accessing a byte, which is inside the word, so the endianness matters in the read/write process. We are reading/writing in a little endian fashion, and the memory is in big endian format, so we have to byte-swap the access address.

This byte-swap is done internally for memory regions pointed to by the "data" pointer, but it is not performed for function handlers, so there it is the responsibility of the programmer.

- And now, about the debug output: this result for the PC and SP looks correct:

PC:00000206 SP:00FFFE00

The problem is probably in the read/write accesses. Take a look at how they work and whether they are compatible with the read/write process used by FAME.

Anyway, if you cannot fix the problem, get in contact with me by PM and I will try to see what is happening there. ;)
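
For readers following along, here is a minimal sketch of the byte-swapped memory scheme discussed above, assuming a little-endian host and an even image length. The names swap_words, read8, read16 and write8 are hypothetical and are not part of the FAME API. Real handlers would use the signatures shown in the example earlier in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap each pair of bytes in place, so that every 16-bit word of a raw
   68k image can be loaded as a native word on a little-endian host.
   Assumption: len is even. */
static void swap_words(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        uint8_t t = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = t;
    }
}

/* Byte access: flip the low address bit to find the byte inside the
   swapped word. */
static uint8_t read8(const uint8_t *mem, uint32_t addr)
{
    return mem[addr ^ 1];
}

static void write8(uint8_t *mem, uint32_t addr, uint8_t value)
{
    mem[addr ^ 1] = value;
}

/* Word access at an even address: no swap needed. Assembling the bytes
   this way gives the same value as a native 16-bit load on a
   little-endian host, without alignment concerns. */
static uint16_t read16(const uint8_t *mem, uint32_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}
```

With a raw image starting with the bytes 12 34 56 78, after swap_words the word at address 0 reads as 0x1234 and the bytes at addresses 0 and 1 read as 0x12 and 0x34, which is what a real 68k would see.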

BlackAura
October 15th, 2004, 22:45
Ah... I see. Just a difference in terminology. I would consider the raw image to be in big-endian format, and the byteswapped version to be in little-endian format. Which means FAME works exactly the way I thought it worked, which is the same way that C68k or StarScream already work, and the same way that Musashi had been configured for Genesis Plus anyway. No problem there then.

As far as I can see, the read/write handlers work the same way as FAME expects them to. A word access just does a word access, and a byte access flips the last bit of the address before doing anything. I have FAME handling all ROM accesses, and accesses to the most commonly used copy of RAM (FF0000 - FFFFFF), and the read/write handlers aren't being called for RAM or ROM access at all in the game I'm testing with (Sonic 1).

It seems to be reading from ROM or RAM correctly, but then it starts going weird. For Sonic 1, it seems that a loop that's supposed to fill a region of memory never stops running, and it eventually hits part of the address space that causes a real MegaDrive to lock up, or causes GP to reset. Most other games actually crash the entire emulator...

One question - what's the x86 ELF version supposed to be used for? I'm trying to use it on Linux, with GCC 3.3.3 and Binutils 2.14. I haven't tried the SH-4 version out yet, because it's usually easier to debug things on a PC...

fox68k
October 16th, 2004, 08:53
One question - what's the x86 ELF version supposed to be used for? I'm trying to use it on Linux, with GCC 3.3.3 and Binutils 2.14. I haven't tried the SH-4 version out yet, because it's usually easier to debug things on a PC...

It is for Linux, yes.

It is best to set up the emulator on PC when possible because it is easier, and once you have got it running, you only have to replace the library file.

Please let me know about your progress with the FAME integration so I can help with any problem/issue.

Cheers.

wraggster
November 4th, 2004, 12:32
Any newspostable news on the progress of FAME SH4?? :)

fox68k
November 8th, 2004, 04:01
I am currently fixing errors. Chui is helping me find them. He is debugging FAME by running it in the NeoCD/SDL DC emu.

Hopefully I will release a new version soon. :)

chui
November 8th, 2004, 04:28
We are trying to build a NeoCD/SDL 0.31 port with the FAME core and my new SDL variation, which uses DMA transfers for a speed-up.
;) :P ::)

quzar
November 8th, 2004, 21:07
Is it any faster with FAME than C68K? I'm going to be fixing the sfx issues as soon as evilo gets the sources to me, so I might need to get some extra speed out of it =P.

chui
November 9th, 2004, 07:54
I ran some benchmarks under neo4all x86 with FAME and it's very fast, but the blitter uses a lot of CPU.

I think we must optimize the NeoGeo blitter (video.c).

evilo
November 9th, 2004, 08:21
i run some benchs under neo4all x86 with fame and it's very fast, but blitter spend a lot of cpu.

i think, we must optimize neogeo blitter (video.c).



The gfx code is clearly the bottleneck in NeoCD; with good optimisation we could gain a lot there. The problem is that the code is already quite compact and difficult to optimize unless you know it very, very well. On PS2 the only optimisations I've managed to use are for-statement tricks (reversed loops, etc.) and that kind of stuff...
FYI, several weeks ago I tried to update the code with the latest from MAME (which, most importantly, uses the zoom table in ROM instead of dynamically rebuilding it), but the overall result is slower...

quzar
November 9th, 2004, 12:35
neo4all x86? ... wait, is that what you are calling neocd with fame in it?

chui
November 9th, 2004, 13:06
Yes, neo4all is NeoCD with FAME and my variation of SDL.

It now compiles and runs OK under Linux (x86) and Windows, without sound.

BlackAura
November 9th, 2004, 17:35
Does anyone know how complex (or otherwise) the NeoGeo graphics hardware is?

Genesis Plus / DC was suffering from pretty much the same problem - the video rendering code was taking up a huge chunk of CPU time. So (as an experiment) I re-wrote the rendering code using the Dreamcast's 3D hardware as a fast blitter, which gave a truly insane speedup - around as much as replacing Musashi with C68K did.

Some of the techniques I used are really only applicable to the Dreamcast or the Genesis, but they might be applicable to the NeoGeo as well, depending on what kind of effects the video hardware is capable of, and how accurate the emulation needs to be.

The idea is that you convert all the tile data into textures, and upload those to VRAM. Then, once per frame, you draw the background and sprite layers using those textures. On the Dreamcast, the graphics hardware will deal with graphics / sprite priorities automatically, but most systems (including the PS2) won't.

Looking at the rendering code, that's more or less how the rendering code works anyway. Once a frame, it draws the tiles to the screen each in one piece, and isn't scanline-accurate. That means that a hardware accelerated renderer might work perfectly (no graphics glitches).
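
As a rough illustration of the "convert tile data into textures" step, here is a minimal sketch that expands one 8x8, 4-bits-per-pixel tile into 16-bit texels. Everything about the layout is an assumption made for the example: 32 bytes per tile, the left pixel in the high nibble, colour index 0 treated as transparent, and ARGB1555 texels with bit 15 as the opaque flag. The function name tile_to_texture is hypothetical; a real renderer would also have to handle byte-swapped tile memory and whatever texture layout the hardware wants.

```c
#include <stdint.h>

#define TILE_W 8
#define TILE_H 8

/* Expand one 4bpp tile (assumed: 4 bytes per row, high nibble = left
   pixel) into 64 16-bit texels using a 16-entry palette.
   Assumed texel format: ARGB1555, bit 15 set = opaque. */
static void tile_to_texture(const uint8_t tile[32],
                            const uint16_t palette[16],
                            uint16_t out[TILE_W * TILE_H])
{
    for (int y = 0; y < TILE_H; y++) {
        for (int x = 0; x < TILE_W; x++) {
            uint8_t packed = tile[y * 4 + x / 2];
            uint8_t index = (x & 1) ? (uint8_t)(packed & 0x0F)
                                    : (uint8_t)(packed >> 4);
            /* Index 0 is assumed transparent: emit a texel with alpha 0. */
            out[y * TILE_W + x] = index
                ? (uint16_t)((palette[index] & 0x7FFF) | 0x8000)
                : 0;
        }
    }
}
```

Doing this once per changed tile, rather than once per frame, is what would make the texture approach cheap: the per-frame work is then only submitting quads that reference the cached textures.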

quzar
November 9th, 2004, 19:55
Does anyone know how complex (or otherwise) the NeoGeo graphics hardware is?

Genesis Plus / DC was suffering from pretty much the same problem - the video rendering code was taking up a huge chunk of CPU time. So (as an experiment) I re-wrote the rendering code using the Dreamcast's 3D hardware as a fast blitter, which gave a truly insane speedup - around as much as replacing Musashi with C68K did.

Some of the techniques I used are really only applicable to the Dreamcast or the Genesis, but they might be applicable to the NeoGeo as well, depending on what kind of effects the video hardware is capable of, and how accurate the emulation needs to be.

The idea is that you convert all the tile data into textures, and upload those to VRAM. Then, once per frame, you draw the background and sprite layers using those textures. On the Dreamcast, the graphics hardware will deal with graphics / sprite priorities automatically, but most systems (including the PS2) won't.

Looking at the rendering code, that's more or less how the rendering code works anyway. Once a frame, it draws the tiles to the screen each in one piece, and isn't scanline-accurate. That means that a hardware accelerated renderer might work perfectly (no graphics glitches).

Right now I'm working on fixing the sound using evilo's sources for the PS2 version, and it doesn't seem like speed will be an issue, since what IM did to slow down the emulator for 9.3 was turn on the Z80. If it still needs a lot of speed after the sound gets working, I'd love to try to implement hardware rendering for NeoCD, using your Genesis Plus previews as an example. From all the poking I have done at the renderer, it seems VERY simple compared to other systems that I have looked at (TG16 ugh, Genesis, CPS).

evilo
November 10th, 2004, 03:27
Does anyone know how complex (or otherwise) the NeoGeo graphics hardware is?

The idea is that you convert all the tile data into textures, and upload those to VRAM. Then, once per frame, you draw the background and sprite layers using those textures. On the Dreamcast, the graphics hardware will deal with graphics / sprite priorities automatically, but most systems (including the PS2) won't.



Yes, you're right, this is the best way to do it... A good point on the PS2 is that while you are uploading a texture through DMA, you can continue with the rest during the transfer! I don't know on what basis you are saying that the PS2 cannot handle this... but even though I'm not one who knows the PS2 deeply, if you asked me how to manage sprite priority using the hardware, I would quickly answer: using the Z buffer!

quzar:
Any results yet?? Note that for FM emulation (-> music in Bust-A-Move), the YM2610 can be a speed issue.

quzar
November 10th, 2004, 12:05
Did you get any sound out of the "full bak mame" part, or just out of the first part?

I spent most of my time yesterday just getting it to compile, as Ian had left the sources quite broken. Then I fixed an issue with VMU loading that I noticed once I started testing: if you don't have a VMU in, it crashes. This probably accounted for a lot of users who said that they couldn't get it to load any games.

I also made space for a mechanism to vary the cycles per vblank based on the output frequency of the console.

I think that one of the main problems you might be having is that you are only taking the sound directly from the YM2610 as opposed to passing it to the FM emulation (if I read your sound.c correctly).

I'll try to remove the streaming from the DC version using the SDL and replace it with OSD functions similar to what you use (grab audio buffer, stuff it into sound chip), and see what happens.

evilo
November 10th, 2004, 14:30
I actually get sound from 3 different tests (original flavour, part MAME and full MAME), with the same result except for speed, which decreases when I use the latest MAME source.

fm.c is actually the YM2610; don't confuse it with 2610intf.c, as that is only the CPU interface called by sound.c.

At this point I have both FM (so music) & SFX working, except in a few games that hang. From what I found, it's the IOP ("co-processor") of the PS2 that actually hangs, probably because of a memory alignment issue.

BlackAura
November 10th, 2004, 22:05
yes, you're right, this is the best way to do it... then a good point on the ps2, is that while you are uploading a texture through dma, you can continue with the rest during the transfert ! Then I don't know on which fact you are telling that the ps2 cannot handle this... but even If I'm not the one knowing deeply the ps2, if you ask me how to manage sprite priority using the hardware, I would quickly answer you using the Z buffer !

The DC's graphics hardware works a little differently. It doesn't really have a Z buffer, and the graphics hardware works by taking all the data for an entire scene, and then rendering it in one pass, automatically depth sorting everything. The main advantage that gives you is that you don't have to bother sorting transparent polygons - the hardware will automatically do that for you, so you can send the polygons in pretty much any order you like, and still get the same result.

The PS2 has a more conventional graphics chip, as far as I'm aware, which has a Z buffer, and renders things immediately like PC graphics cards do. This doesn't make a hardware-accelerated renderer impossible, but it might make it a little trickier to implement unless you can find some way to sort the data with little speed penalty.

Taking GP/DC as an example, the Genesis hardware deals with three layers (background 1, background 2, sprite), and each tile can have either low or high priority. On the Dreamcast, you don't really even need to worry about that - you just send the data to the video hardware any way you feel like (I did BG1, BG2, then sprites), and simply change the Z value appropriately. The hardware sorts it all out for you.

That would work on conventional hardware (like the PS2) too, were it not for the fact that the Z buffer doesn't take transparent geometry into account. You might draw a high-priority tile from BG1 first, which should appear above a low-priority tile from BG2. However, if you then try to draw a low-priority tile from BG2 beneath it, nothing will be drawn. The depth buffer values will have been set for the entire tile, not just the opaque parts, so you won't be able to see graphics behind other graphics, depending on the order you draw them.

This doesn't much matter for most 3D games, since they tend to be mostly opaque geometry, with transparency and blending usually used for special effects or 2D displays. To get them to appear properly, they're typically sorted by depth, and then sent in the correct order (usually with depth buffer writes turned off), from furthest to nearest.

The only easy way out of this is to find some way to efficiently sort the tiles into a depth order. For a Genesis, that means you'd need six rendering passes (Low priority BG1, BG2, Sprites, High priority BG1, BG2, Sprites) instead of 3, which is a lot more work to do. For a NeoGeo, the text layer is easy (draw it last). The sprite layer is more difficult, since each sprite has its own priority value, and they could be sent in any order the game feels like. You'd probably have to sort the sprites by depth first, then draw them in that order.

That said, the PS2 might have some way to deal with this kind of thing. I know that the GS is a very weird graphics chip, so it might have some mode that you can use to eliminate the need for depth sorting.
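
The "sort the sprites by depth first, then draw them in that order" idea could be sketched roughly like this, assuming each sprite carries an integer priority where a higher value means nearer to the viewer. The sprite_t layout and the function names are hypothetical. Since qsort is not guaranteed to be stable, the submission index is used as a tie-breaker so sprites with equal priority keep the order the game sent them in.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sprite record for illustration only. */
typedef struct {
    int priority; /* assumption: higher value = nearer to the viewer */
    int order;    /* submission index, filled in by sort_sprites */
    int x, y;
    int tile;
} sprite_t;

/* Compare for back-to-front drawing: lowest priority first, and for
   equal priorities, earliest submitted first. */
static int cmp_back_to_front(const void *a, const void *b)
{
    const sprite_t *sa = a;
    const sprite_t *sb = b;
    if (sa->priority != sb->priority)
        return (sa->priority > sb->priority) - (sa->priority < sb->priority);
    return (sa->order > sb->order) - (sa->order < sb->order);
}

/* Sort sprites so they can be drawn furthest to nearest. */
static void sort_sprites(sprite_t *sprites, size_t count)
{
    for (size_t i = 0; i < count; i++)
        sprites[i].order = (int)i;
    qsort(sprites, count, sizeof *sprites, cmp_back_to_front);
}
```

After sorting, drawing the array from first to last is the usual furthest-to-nearest order for transparent geometry mentioned above. Whether the sort is cheap enough per frame is exactly the speed question raised in this post.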

evilo
November 11th, 2004, 04:47
No, you're wrong. As on the DC, you just need to send the data for each component (bg, layer 1, 2, etc.) in any order you want and specify the Z value and alpha value, and the GS will do all the work for you... This is actually how I'm designing the in-game menu in NeoCD, as each part (shaded background, menu box, menu box gb, menu items, etc.) is on a separate plane. But I don't know why I didn't think to adapt it to the blitter...

Anyway, you should come to ps2dev and have a look at the documents or code samples. It's weird stuff for sure, but it's surely more complex / advanced than you think!

BlackAura
November 11th, 2004, 06:17
No, you're wrong. As on the DC, you just need to send the data for each component (bg, layer 1, 2, etc.) in any order you want and specify the Z value and alpha value, and the GS will do all the work for you...

I thought that was more the job of the EE / vector units.

Just had a look at some more info on the GS. Damn, that thing must be a pain to write software for. It seems to have been designed by some insane Japanese guy who'd been locked in a room for several years, and fed all kinds of strange hallucinogens.


But I don't know why I didn't think to adapt it to the blitter...

It's one of those things that seems obvious, but you just wouldn't think of it. I actually didn't think of it either - Dan Potter did, except he thought it might work well for a SNES emulator.

Everybody disagreed, because SNES games really require a very high level of accuracy, and many of the effects the SNES PPU uses just can't be simulated (quickly) using 3D hardware. The MD's VDP is a lot simpler, and most games don't rely on it doing funky things as much, so you can get away with it.

Since the existing renderer in NCD is really simplistic (little more than a simple tile blitter), a hardware blitter approach should work brilliantly.

However, I don't think it'll work on the Dreamcast. The NeoGeo CD has a palette of 4096 colours (out of a total of 64k), and it can display all of those at once. The Dreamcast's palette hardware has entries for 1024 colours, and since you need to describe the entire scene (including palettes) before it starts rendering, there's no way to display all the colours. Actually, there might be a way, but I don't think it'd be much fun.
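The mismatch can be put in numbers. The 4096 and 1024 figures come from the post; the 16-colours-per-palette split is an assumption added here for illustration.

```python
NEOGEO_COLOURS = 4096        # figure from the post
PVR_PALETTE_ENTRIES = 1024   # figure from the post
COLOURS_PER_PALETTE = 16     # assumption: 4-bit tiles, 16 colours per palette

palettes_needed = NEOGEO_COLOURS // COLOURS_PER_PALETTE          # 256
palettes_available = PVR_PALETTE_ENTRIES // COLOURS_PER_PALETTE  # 64
ratio = NEOGEO_COLOURS // PVR_PALETTE_ENTRIES                    # 4
```

Under that assumption the scene can need four times as many palettes as the hardware table holds at once.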

fox68k
December 16th, 2004, 17:29
Well, after some hard debugging work, here is a stable version of FAME.
Lots of bugs have been fixed and now a good bunch of Neo Geo CD games work on Neo4All and Genesis Plus perfectly.

I would love to hear from coders trying to use FAME in their projects.

Special thanks go here to Chui, for his excellent, unvaluable help and contribution to FAME. ;) And to BlackAura and Mekanaizer for their efforts trying to get FAME up in Genesis Plus.

Download the package here (http://perso.wanadoo.es/orayo/fame/fame.zip).

P.S.: Keep me updated on any bugs or problems with FAME. I will try to help you as soon as possible.

rattboi
December 18th, 2004, 00:21
Special thanks go here to Chui, for his excellent, unvaluable help and contribution to FAME. ;) And to BlackAura and Mekanaizer for their efforts trying to get FAME up in Genesis Plus.



lol, "unvaluable"

BlackAura
December 18th, 2004, 03:31
I have it working in Genesis Plus / DC. At least, I do on my PC test version. When I get around to merging it with the Dreamcast code again, I'll test it out there.

It does seem a little buggy though. A few games don't seem to work properly (they usually crash the entire emulator), or crash periodically, and a few games exhibit some weird bugs. I'll have to set up some decent debugging / tracing capability, so we can tell where FAME does things differently to the other CPU emulators I can use in Genesis Plus (at the moment, Musashi, C68k, and StarScream).

Stef
January 4th, 2005, 01:28
What about FAME's core, by the way? How is the speed?
I guess it can still help in NeoCD...
My C Z80 core is almost done, though it needs a lot of debugging :p
I really optimised it for code size, not really for speed.
But on Dreamcast, optimising for code size gives good results on speed ;)
I'm impatient to see it in action with Genesis Plus :)

qatmix
January 4th, 2005, 03:05
great stuff stef. A fast z80 will help in lots of emus.

warmtoe
January 4th, 2005, 14:05
I'm impatient to see it in action with Genesis Plus :)


Me too - got a link?!?!?!!

RockinB
January 4th, 2005, 17:13
You may know that SH4 assembly is a superset of SH2 assembly.

Now, is FAME SH4 compatible with SH2? If not, could it be made compatible?
Also very important: how many resources (memory) does it consume? As for Musashi m68k, I was shocked by its size...

Christuserloeser
January 4th, 2005, 17:14
That's unbelievably good news ;D Makes me very happy to hear that you're that advanced. Thanks for posting Stef :)

Christuserloeser
January 4th, 2005, 17:19
You may know that SH4 assembly is a superset of SH2 assembly.

Now, is FAME SH4 compatible with SH2? If not, could it be made compatible?
Also very important: how many resources (memory) does it consume? As for Musashi m68k, I was shocked by its size...

Talk about that with BlueCrab - he seems to know about SH2-to-SH4 as he's working on a SAT emu... basically it should be compatible.

Stef
January 5th, 2005, 02:11
Me too - got a link?!?!?!!


Not really, I was speaking about my own WIP version ;)
and for the moment I still have a lot of stuff to fix and complete in the Z80 core :)

Sakuragi
January 6th, 2005, 06:34
Wow, a fellow countryman, and what's more the author of the very good Gen :D. I haven't followed everything, but are you working on the Gen port or...?

Stef
January 7th, 2005, 02:02
Actually, I'm working on making Genesis Plus DC faster while staying in C for portability :)
I've already written a new 68000 core (C68K) for Genesis Plus, which noticeably improved its speed. But the Z80 is still quite slow, so I've rewritten a new Z80 core, again in C, but trying above all to optimise code size, to relieve the DC CPU's small cache :) In theory it should be faster than the old Z80 ;)
After the Z80, I may tackle a new software rendering engine for the VDP. BlackAura's hardware version is very fast, but has some limitations regarding special effects on scanlines or scrolling... that's why a software core that is faster than the current one would be a really good thing.

Sakuragi
January 7th, 2005, 05:36
Thank you for replying ^^ . I wish I knew my way around all this so I could help you too :-/ . Good luck to you in any case ;) . Do you think we could see a Mega Drive emulator running at full speed with correctly emulated sound arrive soon?

Stef
January 7th, 2005, 06:16
Thank you for replying ^^ . I wish I knew my way around all this so I could help you too :-/ . Good luck to you in any case ;) . Do you think we could see a Mega Drive emulator running at full speed with correctly emulated sound arrive soon?

Thanks.
Yes, I'm sure we'll get there ;)
When, I don't know, but we'll get there.

BlackAura
January 10th, 2005, 02:41
I actually have a lot of the special effects worked out already. It won't work correctly for every game, but most of them should look OK. As long as they don't rely on sprite masking, accurate shadow/hilight (I can only do an approximation of it), or accurate window emulation (again, only an approximation).

That reminds me... Got a quick explanation of how shadow/hilight mode works? I can actually render the stuff (approximately), but I can't find the appropriate VDP registers to turn it on and off again.

Stef
January 10th, 2005, 04:03
I actually have a lot of the special effects worked out already. It won't work correctly for every game, but most of them should look OK. As long as they don't rely on sprite masking, accurate shadow/hilight (I can only do an approximation of it), or accurate window emulation (again, only an approximation).

What types of special effects are actually working with your hardware-accelerated core?
As you said, some complex effects such as sprite masking / complex priorities / hilight / shadow can't be done, or not accurately... but I'm thinking about scanline-based scrolling or vertical cell scrolling. I guess you can't do those either, right?




That reminds me... Got a quick explanation of how shadow/hilight mode works? I can actually render the stuff (approximately), but I can't find the appropriate VDP registers to turn it on and off again.

Do you have the sega2.doc file (it also exists in an HTML version)? You'll find all the VDP register descriptions there.

BlackAura
January 11th, 2005, 01:21
Scanline scrolling works. Kinda. There's two ways to do it, and I've not yet decided which one is best. Basically, I need to benchmark them to find out which is faster.

The first way is to render each tile as 8 tristrips instead of one. That involves transferring 32 vertices and one polygon header (instead of 4+1), so it should be around 7 times slower. However, it would work pretty much perfectly, and it could probably be made a little faster.

The second way is to render each tile as one tristrip, but with 10 vertices instead of 4. This will work fine (and may actually look slightly better) for games that use scanline scrolling for distortion or wobble effects, such as heat haze (Sonic games), or using it to interpolate between tiles (Sonic 2 and 3 special stages). However, because each tile is still one object, large differences in the scroll values between adjacent scanlines will look ugly, because the PVR will interpolate between the scanlines. It's much faster than the previous method though - it should be just under 3 times slower than the standard per-tile renderer. This is the one I'm currently using.
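The relative costs quoted for the two methods can be checked with a rough count. This sketch assumes that a polygon header costs about the same to transfer as one vertex, which the post does not state.

```python
HEADER = 1  # assumption: one polygon header counts as one transfer unit, same as a vertex

def units(vertices):
    """Transfer units for one tile: its vertices plus a single polygon header."""
    return vertices + HEADER

standard = units(4)          # one 4-vertex tristrip per tile
eight_strips = units(8 * 4)  # first method: 8 tristrips, 32 vertices
single_strip = units(10)     # second method: one 10-vertex tristrip

ratio_eight = eight_strips / standard    # 6.6
ratio_single = single_strip / standard   # 2.2
```

By this count the first method moves about 6.6 times as much data per tile and the second about 2.2 times, roughly in line with the "around 7" and "under 3" estimates. Actual rendering cost will also depend on things this count ignores.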

Vertical cell scrolling should work, but I'm not sure how to make it interact with scanline scrolling. For the moment, I'm making the assumption that it doesn't, and I'm completely ignoring it when scanline scrolling is enabled. Obviously that's not correct, but I can't think of a good way to deal with that right now.

Changing the horizontal scroll values in the middle of a frame (Contra) will never work with the current renderer, because it needs scanline accuracy. I can deal with one mid-frame palette change per frame, but no more than that, because the code is somewhat... dumb. I could do better, but I'm not going to bother unless I find a game that actually needs it.

I could go to a per-scanline rendering system, but I think the additional overhead would probably kill any speed gains using the PVR would give us.

I don't have the sega2.doc file. I did at some point, but I was never able to actually open it with anything.

Stef
January 11th, 2005, 03:00
Scanline scrolling works. Kinda. There's two ways to do it, and I've not yet decided which one is best. Basically, I need to benchmark them to find out which is faster.

The first way is to render each tile as 8 tristrips instead of one. That involves transferring 32 vertices and one polygon header (instead of 4+1), so it should be around 7 times slower. However, it would work pretty much perfectly, and it could probably be made a little faster.


I remember I spoke with the author of PGen (Genesis emulator for PS2) about this method... but I was always convinced the x8 overhead in vertex transfers would be too heavy for it to be really interesting.
I never got any news from the author of PGen about his hardware-accelerated VDP core. I think he never had the time to do it :-/



The second way is to render each tile as one tristrip, but with 10 vertices instead of 4. This will work fine (and may actually look slightly better) for games that use scanline scrolling for distortion or wobble effects, such as heat haze (Sonic games), or using it to interpolate between tiles (Sonic 2 and 3 special stages). However, because each tile is still one object, large differences in the scroll values between adjacent scanlines will look ugly, because the PVR will interpolate between the scanlines. It's much faster than the previous method though - it should be just under 3 times slower than the standard per-tile renderer. This is the one I'm currently using.

Hehe, smart idea ;)
As you said, interpolation will probably give strange results with large scroll differences between 2 scanlines, but I guess that deserves some tests :)



Vertical cell scrolling should work, but I'm not sure how to make it interact with scanline scrolling. For the moment, I'm making the assumption that it doesn't, and I'm completely ignoring it when scanline scrolling is enabled. Obviously that's not correct, but I can't think of a good way to deal with that right now.


Unfortunately, vertical cell scrolling is almost always used together with scanline horizontal scrolling to simulate some rotation effect (Jungle Book logo...)... but since in my software renderer I always draw at least 1 tile scanline, I guess there is some way of doing it by using the first method for scanline scrolling, but again, it would then become too slow.
The whole point is to find the best compromise between speed and accuracy ;)



I could go to a per-scanline rendering system, but I think the additional overhead would probably kill any speed gains using the PVR would give us.


I agree, a software core would probably be more efficient here ;)



I don't have the sega2.doc file. I did at some point, but I was never able to actually open it with anything.

The original SEGA2.DOC was corrupted, but I do know an HTML version exists somewhere... I'll try to find it.

Mekanaizer
January 11th, 2005, 10:09
Hi.

Stef, if you want, PM me and I'll send you my working sh4_asm Z80 core. You can do whatever you want with it.

Stef and BlackAura, here is the Sega2.doc converted to htm:

http://mekanaizer.planetaclix.pt/sega2f.htm

edit: There are some errors in the doc. Corrections can be found on some genesisDEV forums and on some sites.

-Mekanaizer-

quzar
January 11th, 2005, 17:31
the z80 sh4 core you have is just the core from Ishmair's spectrum emulator modified to be a separate module, right?

either way, do you think i could get some of that my way? ;D

Stef
January 12th, 2005, 00:50
Hi.

Stef, if you want, PM me and I'll send you my working sh4_asm Z80 core. You can do whatever you want with it.

I'm not sure, but I believe I already have your ASM core.
Is it complete? And how does it perform?
The current one I have here isn't complete and isn't very fast (about 20 MHz Z80 on a 200 MHz SH-4).

fox68k
January 12th, 2005, 05:35
It would be nice if Mekanaizer provided support for that core. The scene needs a fast and well supported Z80 core for the DC.

Stef
January 13th, 2005, 00:35
the z80 sh4 core you have is just the core from Ishmair's spectrum emulator modified to be a separate module, right?

either way, do you think i could get some of that my way? ;D

Yeah, you're right, it's the core from Ishmair.
Unfortunately, even though the core is written in SH4 assembly, the overall structure isn't really optimised and needs major changes to become really fast.
I guess current C cores can perform as fast, if not faster.

Stef
January 14th, 2005, 01:26
Well, some news about my C Z80 core (aka CZ80 :p).
I'm currently working with the PC version of Genesis Plus for debugging. CZ80 is heavily optimised for size more than speed, since I wrote it first for the Dreamcast architecture.
The base Z80 core used in Genesis Plus (J.B Z80) takes about 224 KB when compiled at the -O1 level.
CZ80 takes only 35 KB... but as expected, it doesn't perform so well on PC, and I was a bit disappointed when I compared it to J.B Z80: CZ80 is only about 1.6 times faster :-/
I know that current C Z80 cores are already fast (compared to 68000 C cores) and the margins for optimisation are smaller, but anyway I was expecting better results.
I really hope to get at least a factor of 2 on DC...

Mekanaizer>
What about making your Z80 ASM core sources available to everyone? :)
I think it could bring a lot to the DC scene :)

quzar
January 14th, 2005, 02:20
1.6 times faster is still a very significant amount! and smaller size is always nice. will you be designing an easy wrapper like you did with c68k? also, is it API compliant with any other core? last question: when do you think it will be ready for download?

Stef
January 14th, 2005, 03:51
1.6 times faster is still a very significant amount! and smaller size is always nice. will you be designing an easy wrapper like you did with c68k? also, is it API compliant with any other core? last question: when do you think it will be ready for download?

I've already done the CPU interface, to make it easy to switch from another core. The overall structure looks like C68K and is close to the Z80 core currently used in Genesis Plus.
By the way, I just found a C Z80 core which should be quite fast, judging by the implementation. It's the core used in Generator, you can see it here: http://www.squish.net/cgi-bin/cvsweb.cgi/generator/cmz80/

CZ80 needs some debugging; I think it will be ready in 2 weeks (or more)... I'll provide some beta versions for testing anyway.

RockinB
January 14th, 2005, 13:10
You're joking, right? 1.6 times faster is more than one could expect.
I agree, the memory consumption of some CPU cores is really a pain. Some use LUTs wherever they can, although most entries are the same.

Your CZ80 would be very welcome for emus on SEGA Saturn; the performance gain and the smaller size will have a more significant impact than on Dreamcast.

This sh4 Z80 core needs to be published, that's for sure. I couldn't yet find Ishmair's sources on my own...

I really wonder why no one is making a dynarec core. On Saturn, this is the ultimate option to choose. I will start an SH2 dynamic recompiler and see how far I get.

quzar
January 14th, 2005, 13:28
I've already done the CPU interface, to make it easy to switch from another core. The overall structure looks like C68K and is close to the Z80 core currently used in Genesis Plus.
By the way, I just found a C Z80 core which should be quite fast, judging by the implementation. It's the core used in Generator, you can see it here: http://www.squish.net/cgi-bin/cvsweb.cgi/generator/cmz80/

CZ80 needs some debugging; I think it will be ready in 2 weeks (or more)... I'll provide some beta versions for testing anyway.


you are the second person in the past day to link me to cmz80. currently NeoDC uses mz80 (an old version, which is why it's broken), and I plan on trying to replace it with cmz80. How do you think the speed of CZ80 compares to that of cmz80?

BlackAura
January 15th, 2005, 22:10
As for a DBT (aka dynamic recompiler) CPU emulator... For systems as old as these, it's probably not a good idea unless you have no other choice. Most MegaDrive games don't rely on cycle-accurate timing from the 68k, so that might be a candidate for a DBT, but it doesn't run particularly fast (around 8MHz). The overhead of generating native code blocks, caching them, and maintaining that cache might be more than you'd gain by switching over. Even then, it would require writing yet another new CPU core, which would take a long time. We may not actually need to do that at all.
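To make the overhead concrete, here is a toy sketch of the translate-and-cache loop a DBT needs. The guest instruction set (`add`, `dec`, `jnz`, `halt`) and every function name are invented for illustration. A real recompiler would emit native SH4 code rather than Python source, and would also have to invalidate cached blocks when the guest writes over its own code.

```python
def translate_block(program, pc):
    """Translate the straight-line run starting at pc into one host function."""
    lines = ["def block(regs):"]
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "add":
            lines.append(f"    regs['acc'] += {arg}")
        elif op == "dec":
            lines.append("    regs['cnt'] -= 1")
        elif op == "jnz":
            # Branch ends the block: return the next guest PC.
            lines.append(f"    return {arg} if regs['cnt'] != 0 else {pc}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown op {op!r}")
    namespace = {}
    exec("\n".join(lines), namespace)   # "generate native code"
    return namespace["block"]

def run(program, regs):
    cache = {}           # guest PC -> translated block
    translations = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:               # cache miss: pay the translation cost once
            block = cache[pc] = translate_block(program, pc)
            translations += 1
        pc = block(regs)
    return translations

program = [("add", 2), ("dec", None), ("jnz", 0), ("halt", None)]
regs = {"acc": 0, "cnt": 5}
translations = run(program, regs)
```

The loop body runs five times but is translated only once. The gain depends on how often blocks are reused compared with the cost of translating and looking them up, which is the trade-off the post is weighing.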

For a Z80, I wouldn't even consider it. Not on a Dreamcast.

Stef
January 16th, 2005, 09:55
You're joking, right? 1.6 times faster is more than one could expect.
I agree, the memory consumption of some CPU cores is really a pain. Some use LUTs wherever they can, although most entries are the same.

Your CZ80 would be very welcome for emus on SEGA Saturn; the performance gain and the smaller size will have a more significant impact than on Dreamcast.

Well, 1.6 times faster isn't really much. I was expecting 2.0, but my optimisations were done first for Dreamcast, so I hope to have better results on that architecture :)

About the SEGA Saturn, I'm not sure that a simple C Z80 can be fast enough to be usable... I believe that only a strongly optimised assembly core can achieve acceptable speed. Anyway, I would be interested to know how CZ80 runs on that system :)





you are the second person in the past day to link me to cmz80. currently NeoDC uses mz80 (an old version, which is why it's broken), and I plan on trying to replace it with cmz80. How do you think the speed of CZ80 compares to that of cmz80?


I can't be sure without tests, but CMZ80 implements all the basic stuff needed to make a fast core. It even does the unbase PC trick on SP (stack pointer), which could possibly make it faster than CZ80 (which lacks it for now), but I guess the speed will be almost the same, since optimising SP has far less impact. That deserves some tests, I guess :)
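The thread does not spell out the PC trick. One common reading is that the core keeps the program counter as a direct position inside the host buffer backing the current memory bank, and only resolves the bank again on jumps. The sketch below illustrates that reading only: the 4 KiB bank size and all names are assumptions, and it is not taken from CZ80 or CMZ80.

```python
BANK_SHIFT = 12                      # assumption: 4 KiB banks
BANK_SIZE = 1 << BANK_SHIFT
BANK_MASK = BANK_SIZE - 1

class Core:
    def __init__(self, banks):
        self.banks = banks           # bank number -> bytes
        self.lookups = 0             # how many times the bank table was consulted
        self.set_pc(0)

    def set_pc(self, address):
        # "Rebase": resolve the bank once, then keep a raw offset into it.
        self.lookups += 1
        self.bank_no = address >> BANK_SHIFT
        self.buf = self.banks[self.bank_no]
        self.offset = address & BANK_MASK

    def get_pc(self):
        # "Unbase": recover the logical address only when something needs it.
        return (self.bank_no << BANK_SHIFT) + self.offset

    def fetch(self):
        if self.offset > BANK_MASK:  # ran off the end of the bank
            self.set_pc((self.bank_no + 1) << BANK_SHIFT)
        byte = self.buf[self.offset]
        self.offset += 1
        return byte

banks = {
    0: bytes(i & 0xFF for i in range(BANK_SIZE)),
    1: bytes([0xAA]) * BANK_SIZE,
}
core = Core(banks)
first_three = [core.fetch() for _ in range(3)]
pc_after_fetches = core.get_pc()
lookups_before_jump = core.lookups
core.set_pc(0x1000)                  # a jump forces one new bank lookup
byte_after_jump = core.fetch()
```

Sequential fetches touch only the local buffer, and the bank table is consulted once per jump. In C the same idea is usually a host pointer incremented directly, and it could be applied to the stack pointer as well as the PC.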