FAME/C generator source available?



scarnie
June 29th, 2009, 07:24
G'Day Chui,

I was wondering if the FAME/C generator source code was available somewhere to experiment with? I'm playing around with UAE4ALL, and would like to muck with it and try to improve performance running on my iPhone - anything I do I'll gladly share back.

Cheers and thanks for the awesome work,

Stu

fox68k
July 3rd, 2009, 13:07
You can check out FAME here: http://svn.emuforge.com/fame.

scarnie
July 8th, 2009, 23:04
Thanks mate - I notice that the actual code generator for FAME/C is not there, which is what I'm most interested in.

I've taken C68k and have it generating an almost identical version - still a bit of work to do, but close. I plan on taking this as a starting point and generating a more optimized version for the ARM. Unfortunately, Cyclone is not compatible with the GPL (and I couldn't get it working with uae4all anyway). FAME/C is working fine.

Cheers,

Stu

fox68k
July 10th, 2009, 12:08
Thanks mate - I notice that the actual code generator for FAME/C is not there, which is what I'm most interested in.

You are confused. FAME/C does not rely on a code generator. FAMEx86 and sh4 do, though.

If you are trying to plug C68K into uae4all, you have a long road ahead. Some games require features not present in C68K.
From my point of view, you are better off getting started with FAME/C and optimizing the code for ARM. An alternative might be adding ARM code to either the FAME/x86 or sh4 generators, which is probably what you were thinking about at first.

scarnie
July 10th, 2009, 19:22
You are confused. FAME/C does not rely on a code generator. FAMEx86 and sh4 do, though.

If you are trying to plug C68K into uae4all, you have a long road ahead. Some games require features not present in C68K.
From my point of view, you are better off getting started with FAME/C and optimizing the code for ARM. An alternative might be adding ARM code to either the FAME/x86 or sh4 generators, which is probably what you were thinking about at first.

Hey mate,

Definitely not confused - and I do agree with you. FAME/C does not appear to rely on a code generator any more, but it was derived from C68k (as per comments I've read previously). Already, with minor changes to C68k, a good percentage of the generated opcodes match famec_opcodes.h exactly (even spacing :D).

I'm taking a very systematic and logical approach. I was never intending to just drop in C68k. My plan is to initially replace the famec_opcodes.h header file only - I've already reformatted C68k to output the opcodes in the same way FAME/C does today. Once that is working, I'll do 'black box' testing, whereby I'll generate a 68k test app that unit-tests most instructions of the CPU and records known outcomes. I'll then start experimenting with mine and ensure the same tests pass. As long as my tests cover a good percentage of the CPU functionality and my replacement CPU passes, it should be fine. My tests vary from simply executing 1 instruction and returning, to running little programs. They already do things like verify cycles, flags, memory changes, etc.
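A minimal sketch of what one such black-box test table could look like in C. Everything here is hypothetical: `cpu_state`, `run_addi_w` (a stand-in for "execute one instruction on the core under test") and the table layout are invented for illustration and are not FAME/C or C68k APIs. The 8-cycle figure assumes the usual 68000 timing for ADDI.W #imm,Dn.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical CPU state: only what this sketch needs. */
typedef struct {
    uint32_t d[8];     /* data registers D0-D7 */
    uint8_t  z, n;     /* zero and negative flags */
    uint32_t cycles;   /* cycles consumed so far */
} cpu_state;

/* Stand-in for "execute one ADDI.W #imm,Dn on the core under test". */
static void run_addi_w(cpu_state *cpu, int reg, uint16_t imm)
{
    uint16_t res = (uint16_t)(cpu->d[reg] + imm);
    /* A .w operation leaves the upper word of the register alone. */
    cpu->d[reg] = (cpu->d[reg] & 0xFFFF0000u) | res;
    cpu->z = (res == 0);
    cpu->n = (res >> 15) & 1;
    cpu->cycles += 8;  /* assumed 68000 timing for ADDI.W #imm,Dn */
}

/* One recorded outcome: inputs plus the expected register, flags and cycles. */
typedef struct {
    uint32_t d0_in;
    uint16_t imm;
    uint32_t d0_out;
    uint8_t  z, n;
    uint32_t cycles;
} test_case;

static const test_case tests[] = {
    { 0x00000000u, 0x3333, 0x00003333u, 0, 0, 8 },
    { 0x0000FFFFu, 0x0001, 0x00000000u, 1, 0, 8 },
    { 0x12347FFFu, 0x0001, 0x12348000u, 0, 1, 8 },
};

/* Returns the number of failing cases; 0 means the core matches the table. */
static int run_tests(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        cpu_state cpu = {0};
        cpu.d[0] = tests[i].d0_in;
        run_addi_w(&cpu, 0, tests[i].imm);
        if (cpu.d[0] != tests[i].d0_out || cpu.z != tests[i].z ||
            cpu.n != tests[i].n || cpu.cycles != tests[i].cycles)
            failures++;
    }
    return failures;
}
```

The same table can be replayed against a reference core and a replacement core; any difference in the failure count points at a behavioural mismatch.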

Cheers and thanks for the input,

Stu

fox68k
July 10th, 2009, 21:37
FAME/C is heavily based on C68K. But that does not mean it has to follow the same approach. We got rid of the code generator long ago, cause we found it kinda pointless.

scarnie
July 10th, 2009, 23:33
Could you be more specific as to what problems I will run into? The way I see it, if I can make C68k generate an identical file to famec_opcodes.h, I have succeeded in my initial goal.

You're right, I could just modify the C code, but if I find a useful pattern that I could apply to all the bit-ops and numeric opcodes, why force myself to cut/paste (and be susceptible to error)? Updating the generator and regenerating the file is much better. Ultimately, the reason for writing a generator for the 68k was that there is so much repetition, so I simply want to take advantage of that in my endeavours. Whether that generator is C or assembly is really irrelevant.
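To illustrate the point about repetition, here is a minimal sketch of a table-driven generator in C. The emitted handler text and the names `logic_ops`, `emit_logic_handler` and `OPCODE_..._W_DN` are invented for this example; the real C68k generator and the layout of famec_opcodes.h are more involved.

```c
#include <stdio.h>

/* One row per family member: mnemonic and the C operator it maps to. */
struct logic_op {
    const char *name;
    const char *c_op;
};

static const struct logic_op logic_ops[] = {
    { "ORI",  "|" },
    { "ANDI", "&" },
    { "EORI", "^" },
};

/* Write one handler's source text into buf; returns the character count. */
static int emit_logic_handler(char *buf, size_t size, const struct logic_op *op)
{
    return snprintf(buf, size,
        "OPCODE_%s_W_DN:\n"
        "    res = (dst %s src) & 0xFFFF;\n"
        "    flag_N = res >> 15;\n"
        "    flag_Z = (res == 0);\n"
        "    flag_V = flag_C = 0;\n",
        op->name, op->c_op);
}

/* Emit every handler in the table to a stream (e.g. the generated header). */
static void emit_all(FILE *out)
{
    char buf[256];
    for (size_t i = 0; i < sizeof logic_ops / sizeof logic_ops[0]; i++) {
        emit_logic_handler(buf, sizeof buf, &logic_ops[i]);
        fputs(buf, out);
    }
}
```

A change to the flag handling is then made once in the template and picked up by every handler on the next regeneration.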

Cheers,

Stu

fox68k
July 11th, 2009, 14:27
Could you be more specific as to what problems I will run into?

As far as I remember, you need trace mode and bus and address error exceptions.


You're right, I could just modify the C code, but if I find a useful pattern that I could apply to all the bit-ops and numeric opcodes, why force myself to cut/paste (and be susceptible to error)? Updating the generator and regenerating the file is much better. Ultimately, the reason for writing a generator for the 68k was that there is so much repetition, so I simply want to take advantage of that in my endeavours.

I would rather make use of macros and inlined functions in C instead. But that is a personal preference. Obviously, FAME/C is not a good example. It should be tidied up, but we are pretty busy and always find something more interesting to spend our time on.
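A rough sketch of what the macro and inline-function approach might look like: one macro stamps out a family of near-identical handlers. The names (`m68k_ctx`, `set_logic_flags_w`, `DEFINE_LOGIC_W`) are hypothetical and not taken from FAME/C.

```c
#include <stdint.h>

/* Hypothetical minimal context: the flags a logical op touches. */
typedef struct {
    uint8_t n, z, v, c;
} m68k_ctx;

/* Shared flag update for word-sized logical results. */
static inline uint16_t set_logic_flags_w(m68k_ctx *ctx, uint16_t res)
{
    ctx->n = (res >> 15) & 1;
    ctx->z = (res == 0);
    ctx->v = 0;   /* logical ops clear overflow */
    ctx->c = 0;   /* and carry */
    return res;
}

/* Stamp out one handler per operator instead of copy/pasting the body. */
#define DEFINE_LOGIC_W(name, op)                                           \
    static inline uint16_t name(m68k_ctx *ctx, uint16_t dst, uint16_t src) \
    {                                                                      \
        return set_logic_flags_w(ctx, (uint16_t)(dst op src));             \
    }

DEFINE_LOGIC_W(op_ori_w,  |)
DEFINE_LOGIC_W(op_andi_w, &)
DEFINE_LOGIC_W(op_eori_w, ^)
```

The compiler sees ordinary C after preprocessing, so it can inline and schedule the handlers without a separate generation step.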


Whether that generator is C or assembly is really irrelevant.

In theory, it is. In practice, it is not. Preprocessing support is quite limited in many assemblers (NASM being one exception), whereas C preprocessing is more flexible and powerful.

Regards.

scarnie
July 11th, 2009, 20:21
Okay, I'll take the approach of deriving an ARM version from the SH-4 generator - at least they are both RISC architectures.

What settings are used for the Dreamcast / SH-4 version of uae4all?

I could not get Cyclone to work, as it is not compatible with Apple's implementation of GAS.

Cheers,

Stu

fox68k
July 13th, 2009, 10:10
What settings are used for the Dreamcast / SH-4 version of uae4all?

Have a look at the Makefile. For uae4all:

--format=$(format) --inline-loop --accurate-timing --check-branches --multi-bank --undocumented --direct-mapping=16 --trace-mode --clock-counting --running-state --global-context


I could not get Cyclone to work, as it is not compatible with Apple's implementation of GAS.

Do you know what the differences are?

Cheers.

scarnie
July 14th, 2009, 19:22
Do you know what the differences are?


.rept is not supported at all. Use .space instead

.ltorg is also not supported, so a literal pool has to be created manually (boo) at regular intervals for the jumps

.global is .globl in Apple's version

After manually fixing this, I got it to compile, but the use of r9 is a no-no on iPhone, as this is used by the OS and can be used by the compiler. I globally replaced with r12, as it looked safe, but will review my changes and try again.

fox68k
July 17th, 2009, 11:25
None of those are big deals. Please keep me updated about this. I might be able to support you.

I am just curious. iPhone features a very powerful CPU. It is quite unlikely you really need anything faster than FAME/C. Have you thought about this?

scarnie
July 17th, 2009, 16:59
Thanks for the offer, mate. Very much appreciated.

The 3GS has an ARM Cortex A8 (like the Pandora) running at 600MHz and 256MB RAM. It runs at full speed with sound using FAME/C. The prior generation of devices (iPhone 3G and iPod Touch) have an ARM 1176 running at 412MHz & 500MHz respectively. This CPU has much less horsepower (although still good) and the FAME/C is just not quite there.

I have sound set at 22050Hz and a buffer size of 960 sample frames, and it is only refreshing around 15 - 18 times per sec on the older hardware (should be 22-23x). So, I do need to do some work on this.
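The "should be 22-23x" figure follows from the sample rate divided by the buffer size. A small sketch of that arithmetic (the function name is made up for illustration):

```c
/* Buffer refills needed per second to keep audio playing without gaps. */
static double refills_per_second(double sample_rate_hz, double frames_per_buffer)
{
    return sample_rate_hz / frames_per_buffer;
}
```

With 22050 Hz and 960 frames this comes to just under 23 refills per second, so 15 - 18 refills per second means audio is being produced at roughly 65% to 78% of real time.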

Sound Bug:
I am running PAL, and set the time_slice and m68k_speed both at 0. The big problem I'm having right now is that the sound is 'choppy', like portions of the buffer are skipped or there is some timing issue. I've used my iPhone sound routines several times now in other emus / code without issue, so I'm fairly certain it is not them. Any suggestions here would be great. Interestingly, if I set the mode to NTSC, the sound choppiness goes away, but IK+ crashes on start. I'm using 0.7.2a of the gp2x version of uae4all. Not sure yet if IK+ is incompatible with NTSC or if it's a bug in uae4all.

Update on my approach:
My approach has changed a little again, as I'm now back to C68k, and rewriting it to use a hybrid C / inline ARM approach. I'm taking advantage of the ARM's CPSR, so the flags generally do not have to be calculated (like Cyclone). In addition, the hybrid method is cool, as I see the gcc instruction scheduler making interesting optimizations that would otherwise take me lots of time poring over the ARM 1176 technical reference manual. All the opcode handlers are set as 'naked' functions, so there is no prolog and epilog. This means I'm using the 'GOTO' approach like Cyclone, where the end of each opcode handler jumps to the next handler conditionally. It might be interesting for some people to look at, when I'm done. I've already got a number of unit tests which validate the core is working for the instructions I have implemented (ORI/ANDI/EORI, ORICCR/ANDICCR/EORICCR, and more).
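For context, this is roughly the work a pure C core has to do after a 16-bit add to produce the 68k condition codes, which an ARM core can often take from the CPSR after a flag-setting instruction. It is a portable illustrative sketch with made-up names (`m68k_flags`, `add16_with_flags`), not code from FAME/C, C68k or Cyclone.

```c
#include <stdint.h>

/* The five 68k condition codes, one byte each for clarity. */
typedef struct {
    uint8_t n, z, v, c, x;
} m68k_flags;

/* 16-bit ADD with flags computed by hand, as a plain C core must do. */
static uint16_t add16_with_flags(uint16_t dst, uint16_t src, m68k_flags *f)
{
    uint32_t wide = (uint32_t)dst + src;   /* bit 16 holds the carry out */
    uint16_t res  = (uint16_t)wide;

    f->c = (wide >> 16) & 1;
    f->x = f->c;                           /* ADD copies carry into X */
    f->n = (res >> 15) & 1;
    f->z = (res == 0);
    /* Overflow: operands share a sign and the result's sign differs. */
    f->v = (((dst ^ res) & (src ^ res)) >> 15) & 1;
    return res;
}
```

Every one of those flag lines is an extra operation per emulated instruction in C, which is the overhead the CPSR trick is meant to avoid.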

Cheers,

Stu

scarnie
July 20th, 2009, 01:34
@fox68k,

So far things are looking great with regards to performance of the hybrid C/ARM version. I have about 82 unit tests so far, testing a wide variety of expected results of the implemented instructions, such as flags. I'm finally at the point where I could test this little program:

        ORG     $0
START:
        move.l  #$80, d2
        move.l  #$00, d0
inner:
        move.l  #$7FFF, d1
loop:
        addi.w  #$3333, d0
        dbra    d1, loop
        dbra    d2, inner
        trap    #15             ; halt simulator

        END     START           ; last line of source

Basically, it adds $3333 to d0 about 4.2 million times (129 x 32768 = 4227072 iterations, since DBRA runs each loop count + 1 times).
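As a cross-check on the iteration count, here is a small C model of the two nested loops. It assumes the standard 68000 DBRA behaviour (decrement the low word of the counter, branch unless the result is -1), which means a counter loaded with N runs the loop body N + 1 times. The names `loop_result` and `simulate_loop` are made up for this sketch.

```c
#include <stdint.h>

typedef struct {
    uint32_t iterations;   /* how many times addi.w executed */
    uint16_t d0_low;       /* low word of d0 at the end */
} loop_result;

/* Model of the two nested DBRA loops from the test program. */
static loop_result simulate_loop(void)
{
    loop_result r = { 0, 0 };
    uint16_t d2 = 0x80;                     /* move.l #$80, d2 */
    do {
        uint16_t d1 = 0x7FFF;               /* move.l #$7FFF, d1 */
        do {
            /* addi.w #$3333, d0 (word add, wraps at 16 bits) */
            r.d0_low = (uint16_t)(r.d0_low + 0x3333);
            r.iterations++;
        } while (--d1 != 0xFFFF);           /* dbra d1, loop */
    } while (--d2 != 0xFFFF);               /* dbra d2, inner */
    return r;
}
```

Under that assumption the inner body runs 32768 times per pass and the outer loop makes 129 passes.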

(loop) = the time to execute the above program
(total) = the time for all 82 tests

The results:

3G ARM 1176 @ 412MHz
FAME/ARM core:
1.041s (loop)
1.711s (total)

FAME/C core:
1.708s (loop)
2.427s (total)

Speed up:
1.64x (loop)
1.42x (total)

3GS ARM Cortex A8 @ 600MHz
FAME/ARM core:
0.567s (loop)
0.818s (total)

FAME/C core:
1.081s (loop)
1.313s (total)

Speed up:
1.90x (loop)
1.61x (total)
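The speed-up figures are the FAME/C time divided by the FAME/ARM time. A trivial sketch of that calculation (the function name is illustrative), which may be handy if more runs are added later:

```c
/* Speed-up of the new core relative to the baseline, from wall-clock times. */
static double speedup(double baseline_seconds, double new_seconds)
{
    return baseline_seconds / new_seconds;
}
```

For example, `speedup(1.708, 1.041)` comes out at about 1.64, matching the loop figure for the 3G.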