Pate has posted more WIP news concerning his DOS emulator for the Nintendo DS:
Well, the big architectural change in the next version is that I finally got the self-modifying version of my IRQ handling code to work! This is something that I have wanted to do pretty much since the start of this project. I tried to use SMC back in December last year, but I could not make it work reliably at that time. Now at the beginning of my summer vacation I decided to look into this again, and now it looks like I got it to work reliably! The rewritten IRQ handling I did at the end of May probably helped to make this work, as the IRQ code was now in one place and easier to change.
Ever since the beginning the main opcode dispatcher loop of DSx86 has looked like this:
loop:
ldr r1,[sp, #SP_IRQFLAG] @ Get the IRQFlag value (set to 0 if we need to handle an IRQ, else 0xFF)
ldrb r0,[r12],#1 @ Load opcode byte to r0, increment r12 by 1
mov r2, r3 @ Clear segment override, r2 = r3 = physical DS:0000
and r1, r0 @ AND the opcode value with the IRQFlag value, result is either just the opcode or 0
bic r9, #0xFF @ Clear segment override flags from low byte of r9
ldr pc,[sp, r1, lsl #2] @ Jump to the opcode handler (or opcode 0 if r1 == 0)
@ ------------------- 00 = ADD r/m8, r8 -------------------------------
op_00:
@-------
@ Check if we are to handle an interrupt instead of op_00
@-------
mrs r1,cpsr @ Save flags to r1
cmp r0, #0 @ Do we really need to handle opcode 00 instead of an IRQ?
bne IRQStart @ Nope, we need to handle an IRQ instead.
@-------
@ Handle opcode 00. No need to restore processor flags, as the following "adds" will change them anyways.
@-------
...
That is, I first read a mask from the stack which states whether the code should jump to the IRQ handler next or keep on handling opcodes. Then I read the opcode byte, make the effective segment register r2 point to the start of the DS segment, mask the opcode byte with the IRQ mask, clear the flags that tell whether we had a segment override prefix, and finally load the program counter from the opcode handler address table (on the stack), which causes a jump to the opcode handler. If SP_IRQFLAG is zero, the jump goes to the opcode 0x00 handler, which first checks whether the opcode actually was zero, and jumps to the IRQStart handler if it wasn't.
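As a rough illustration of the masking trick described above, here is a toy Python model. It is not DSx86 code: the function name `dispatch` and the string return values are made up for this sketch, and only the 0x00/0xFF flag values and the op_00/IRQStart labels come from the listing.

```python
# Toy model of the masked opcode dispatch (illustrative only, not DSx86 code).
IRQ_PENDING = 0x00   # SP_IRQFLAG value when an IRQ needs handling
NO_IRQ = 0xFF        # SP_IRQFLAG value during normal execution


def dispatch(opcode, irq_flag):
    """Return the name of the routine the dispatcher would end up in."""
    index = opcode & irq_flag          # corresponds to 'and r1, r0'
    if index == 0:
        # A real opcode 0x00 and a pending IRQ both land in op_00,
        # which tells them apart by re-checking the original opcode byte.
        return "op_00" if opcode == 0 else "IRQStart"
    return "op_%02X" % index


print(dispatch(0x8B, NO_IRQ))       # normal case: handler for opcode 0x8B
print(dispatch(0x8B, IRQ_PENDING))  # masked to 0, but opcode != 0: IRQStart
print(dispatch(0x00, NO_IRQ))       # genuine opcode 0x00
```

Note that in this model, as in the listing, a genuine opcode 0x00 that arrives while an IRQ is pending is still executed as op_00, since the handler only branches to IRQStart when the opcode byte is non-zero.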
The annoying thing in this code is that I need to perform a memory read for the IRQ mask and a masking operation for every single opcode, even though IRQs happen extremely rarely (from the CPU speed point of view). That is 4 extra CPU cycles that go to waste for every single opcode. Really annoying and frustrating when coding an emulator that should run as fast as possible.
However, now that I finally managed to make the SMC version robust, the main opcode loop looks like this:
loop:
ldrb r1,[r12],#1 @ Load opcode byte to r1, increment r12 by 1
mov r2, r3 @ Clear segment override, r2 = r3 = physical DS:0000
bic r9, #0xFF @ Clear segment override flags from low byte of r9
.global SM_IRQFLAG
SM_IRQFLAG: @ SELF_MODIFIED CODE!
ldr pc,[sp, r1, lsl #2] @ Jump to the opcode handler (or load r0 register if IRQ)
b IRQStart @ Jump to IRQ handler if we did not jump above.
I got rid of the IRQFlag on the stack and the mask operation, so the main dispatcher loop no longer has any code relating to IRQ handling. Instead, I replace the opcode at the SM_IRQFLAG address with a different one when the code needs to jump into IRQStart. The two opcodes that can be at SM_IRQFLAG are as follows:
#define IRQ_ON 0xE79D0101 @ ldr r0,[sp, r1, lsl #2]
#define IRQ_OFF 0xE79DF101 @ ldr pc,[sp, r1, lsl #2]
That is, the opcode loads the r0 register (which is used as a scratch register in DSx86) instead of the program counter when IRQ handling should start, so the program flow continues to the "b IRQStart" branch instruction. The IRQStart routine then restores the IRQ_OFF opcode to SM_IRQFLAG and performs the other steps needed when beginning to handle an IRQ. This change removed the 4 extra CPU cycles from the handling of every opcode, so DSx86 became 8% faster than before just by this small change!
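The two instruction words can be cross-checked against the standard ARM (A32) encoding for a word load with a shifted register offset. The sketch below is only a verification aid written for this post; the helper name `encode_ldr_reg_lsl` is hypothetical and nothing here is taken from the DSx86 sources.

```python
def encode_ldr_reg_lsl(rd, rn, rm, shift):
    """Encode 'ldr rd,[rn, rm, lsl #shift]' as a 32-bit ARM (A32) word."""
    return (
        (0xE << 28)      # condition field: always execute
        | (0x79 << 20)   # word load, register offset, add, pre-indexed, no writeback
        | (rn << 16)     # base register
        | (rd << 12)     # destination register
        | (shift << 7)   # shift amount; shift type bits stay 00 (LSL)
        | rm             # offset register
    )


SP, PC = 13, 15
IRQ_ON = encode_ldr_reg_lsl(0, SP, 1, 2)    # ldr r0,[sp, r1, lsl #2]
IRQ_OFF = encode_ldr_reg_lsl(PC, SP, 1, 2)  # ldr pc,[sp, r1, lsl #2]

print(hex(IRQ_ON), hex(IRQ_OFF), hex(IRQ_ON ^ IRQ_OFF))
```

The two words differ only in the destination register field (bits 12 to 15), which is why swapping a single 32-bit word is enough to turn the jump into a harmless load that falls through to the branch.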
As you might remember, at the end of May the Norton Sysinfo displayed the DSx86 speed as 10.6 times original PC. After this change the speed is up to 11.5 times original PC, which is even faster than the original Nov 12th, 2009 Sysinfo measurement of 11.3 times original PC on real hardware and 11.6 on No$GBA. At that time the code was still missing most of the current features, so it is no surprise it ran much faster then than it has been running recently before this SMC change.
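For reference, the quoted speedup can be checked directly from the two Sysinfo figures. This is plain arithmetic on the numbers above, with variable names chosen for this sketch:

```python
before = 10.6   # Sysinfo result at the end of May (times original PC)
after = 11.5    # Sysinfo result with the self-modifying IRQ check

speedup_percent = (after / before - 1) * 100
print("%.1f%%" % speedup_percent)
```

The result is roughly 8.5%, in line with the figure of about 8% quoted above.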
Game-specific fixes
I have been following the compatibility wiki closely. It is very interesting and motivating to see how the compatibility of DSx86 improves with each version, and how thoroughly the testers test and report the problems. So, I decided to focus on the games not working in version 0.15 as reported on the wiki, and especially on the games that have their own pages there. I believe if testers have spent time in creating game-specific pages, they would probably like to see those games actually working! :-)
However, I started by looking further into the problem in Gods, and found out that the flickering problem is caused by the game accessing the VGA VRAM with the ES segment register value of 0x9FFC, instead of something between 0xA000 and 0xAFFF which is how I detect access to VGA VRAM. I added some hacks to the opcodes that the game uses with this segment register value, which fixed the problem in this game, but the annoying thing is that these hacks will slow down these opcodes in every software that uses these opcodes. Luckily the opcodes in Gods were mostly some reasonably uncommon ones, so this should not be much of a problem.
I also looked into the graphics problem in Silpheed, and found out that it does pretty much the same thing as Gods, in accessing the VGA VRAM with the ES segment pointing to 0x9F00. This game however uses the absolutely most common opcodes, so I really hate to hack these opcodes and make them up to 10 times slower just to make this one game work properly. This actually was the thing that made me look into the IRQ handling again: I thought that if I could make every single opcode run faster than before, perhaps then I would not feel so bad about making the common ones run 10 times slower. I haven't yet made these hacks, as I am still looking into other possible options or workarounds to avoid slowing down the most common opcodes. So, Silpheed might not work properly yet in the next version.
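The aliasing behind the Gods and Silpheed problems follows from real-mode address arithmetic, where the physical address is the segment times 16 plus the offset. The Python sketch below is a minimal model under stated assumptions: the function names are invented for illustration, the offsets 0x0040 and 0x1000 are simply the smallest ones that reach 0xA0000 (the actual offsets the games use are not given above), and the 64 KB window at 0xA0000 is the usual VGA graphics window rather than anything read from DSx86.

```python
VGA_START = 0xA0000   # assumed start of the VGA graphics window
VGA_END = 0xB0000     # assumed end (exclusive), i.e. a 64 KB window


def physical(segment, offset):
    """Real-mode physical address: segment * 16 + offset."""
    return (segment << 4) + offset


def segment_looks_like_vga(segment):
    """The cheap test described above: segment value in 0xA000..0xAFFF."""
    return 0xA000 <= segment <= 0xAFFF


def address_is_vga(segment, offset):
    """The exact (slower) test on the computed physical address."""
    return VGA_START <= physical(segment, offset) < VGA_END


for seg, off in ((0xA000, 0x0000), (0x9FFC, 0x0040), (0x9F00, 0x1000)):
    print(hex(seg), hex(off), hex(physical(seg, off)),
          segment_looks_like_vga(seg), address_is_vga(seg, off))
```

All three segment:offset pairs resolve to physical address 0xA0000, but only the first passes the segment-only test. Catching the other two requires checking the computed address on every access, which is the kind of extra work that slows the affected opcodes down.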
After those games I then started looking into the games with their own pages in the compatibility wiki, the first one being A-Train. It seemed to sort of work, but the scenery graphics were strangely monochrome, even though other things on the screen seemed to use the correct palette. After quite a bit of debugging I finally noticed that the game sometimes uses "EGA Register Interface Library" calls to change the EGA registers, and sometimes it accesses the registers directly. I had just ignored all EGA Register Interface Library BIOS calls in DSx86, assuming that if a game wants to use them it would first query whether such exist. A-Train did not query the existence, but instead blindly used the calls, which did nothing in DSx86 and thus it always wrote to the same bit plane in EGA VRAM and thus the graphics got monochrome. I implemented the EGA RIL calls that A-Train needed, and the scenery began to look correct. There are still some minor graphics issues that I need to look into, but it is mostly OK now. The game uses 640x480 VGA mode, so fitting it properly into the DS 256x192 screen will be rather awkward.
The next game I looked into was Alcatraz. It had some serious palette issues and also other graphics problems. It is actually quite curious how it feels like I have fixed the palette handling in DSx86 half a dozen times already, and still I constantly run into new games that use a broken palette! Very strange.. Anyways, I haven't yet figured out what the problem in Alcatraz is, so I will continue looking into this. At first I just wanted to see how it works, as the compatibility wiki only shows information for DSx86 version 0.14.
Next I checked Buck Rogers - Countdown to Doomsday, which looped in trying to write and read port 0x2BB. I don't know what the game thinks that port should contain, but in any case DSx86 has nothing useful in that port so I just ignored the access, and after that the game seemed to work fine. Well, it has the same "Unexpected save error 3" problem as the other Buck Rogers game if the save game is not set up in the config file, but this is not a problem of DSx86.
The next game I checked was LHX Attack Chopper. It turned out to need quite a few new EGA opcodes, but it did not have any other problems (not counting the horrible PC beeper sound effects) so it was pretty easy to fix. It should work fine in the next version, though I have only flown the chopper around a little bit and gotten shot down. :-)
Perhaps the most interesting game I tested was Ugh!, as it uncovered a problem in the DSx86 keyboard handling routines. The game did not recognize cursor key presses, and after I debugged the keyboard IRQ handler of the game I realized that DSx86 does not send the extended keyboard prefix 0xE0 which the game expects. DSx86 emulates the old 83-key PC keyboard and not the currently standard 102-key extended keyboard. However, since games (like Ugh!) might expect to communicate with the extended keyboard, I decided to add the 0xE0 prefix byte to the extended keys, including the cursor keys, to DSx86 keyboard routines. So, from the next version onwards, the correct key map in the DSx86.ini file for cursor keys should look like the following:
KEY_UP=E048
KEY_DOWN=E050
KEY_LEFT=E04B
KEY_RIGHT=E04D
However, the old plain 48, 50, 4B and 4D scancodes should keep working in the games where they currently work; it just means that the keyboard does not look like an extended keyboard to those games. By the way, this change also enables some new keys to be mapped, like the Right Control key E01D, Right Alt key E038 and Keypad Enter E01C, which don't exist in the touchpad keyboard. Also, if for some reason some game stops recognizing the new cursor keys, you might try overriding the new extended keys in the DSx86.ini with the old one-byte versions for that game.
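To make the key map values concrete, here is a small Python sketch of how such a value could be split into the bytes a game's keyboard handler would see. It is an assumption about the format, not DSx86's actual parser: the function name `scancode_bytes` is hypothetical, and the only outside fact relied on is the standard PC scancode set 1 convention that a key release is the press code with bit 7 set.

```python
def scancode_bytes(ini_value, pressed):
    """Split a key map value like 'E048' or '48' into keyboard bytes."""
    code = int(ini_value, 16)
    prefix = code >> 8          # 0xE0 for extended keys, 0 for plain keys
    scancode = code & 0x7F      # the 7-bit key number
    if not pressed:
        scancode |= 0x80        # release ("break") code has bit 7 set
    return ([prefix] if prefix else []) + [scancode]


for value in ("E048", "48", "E01D"):
    press = " ".join("%02X" % b for b in scancode_bytes(value, True))
    release = " ".join("%02X" % b for b in scancode_bytes(value, False))
    print(value, "press:", press, "release:", release)
```

With this reading, KEY_UP=E048 produces the two-byte sequence E0 48 on press and E0 C8 on release, while the old one-byte mapping 48 produces only 48 and C8, which is why a game that waits for the E0 prefix never sees the cursor keys.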
Finally, I started looking into the problems in Castle Adventure. The unsupported INT call was due to a missing FCB file handling operation, but when I added that the game still did not work properly. After some debugging I noticed that the FCB structure I used in my FCB handling routines was not correct, so I fixed that, but still the game has problems. The strange thing is that the problems differ in each environment in which I test it. In iDeaS the game progresses to the first room, but does not show the player character and does not take any commands. On my real DS Lite it states "String formula too complex in line 5055" when pressing 'p'. In the bundled version of DSx86 running in No$GBA it just states that the file "CASTLE.RAN" is missing and exits. All in all, something very strange is going on with this game, so this still needs some debugging. I suspect my FCB file functions in general are not very robust yet.
Anyways, my summer vacation is starting now, so next week I can work on DSx86 quite a bit. I haven't yet decided whether I continue fixing the games on the compatibility wiki or start implementing new features already, but we shall see. It is summer and I have no obligations, so I'll do whatever feels interesting at the time. :-)
http://dsx86.patrickaalto.com/DSblog.html