ds2_firmware enhancement work for DS2x86



wraggster
November 6th, 2011, 23:30
via http://dsx86.patrickaalto.com/DSblog.html

First off, if you have downloaded my ds2_firmware sources before yesterday, those still had a problem. I found and fixed the problem yesterday, so please download them again (or, if you have already made changes to them, read on for my description of what the problem was).

I was already aware when I released my ported ds2_firmware sources that they did not produce a fully working firmware for DS2SDK 1.2. However, the firmware compiled from my ported sources misbehaved in the same way as the firmware built from the original sources released by SuperCard, so my port itself was OK and the problem seemed to be in the original sources, which is why I decided to release my port at that time. After that I started making my port work properly, by comparing the disassembly of the working ds2_firmware.dat (dated April 30th, 2010, 415 744 bytes, released in the /tools directory of the SDK 1.2 package) to a dump of the arm9.elf file built from the SuperCard sources.

The first difference I found was that the working ds2_firmware.dat still had the while (CARD_CR2&CARD_BUSY); delay loops in place, even though the sources released by SuperCard had all of those commented out. Those loops are at the beginning of practically all the routines in the iointerface.cpp source file. I uncommented them, but that still did not fix the problem. It did, however, show that the source code released by SuperCard is actually not the same source code they used when building the ds2_firmware.dat themselves!
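
In practice this means that practically every routine in iointerface.cpp now begins with the busy-wait again; for example (the routine name and the rest of the body here are only illustrative, only the busy-wait line is from the actual sources):

static void example_io_routine(void)      /* illustrative routine name */
{
    while (CARD_CR2 & CARD_BUSY);         /* the previously commented-out delay loop */
    /* ... the actual card register accesses follow ... */
}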

It took me considerably longer to find the next difference, which then turned out to be the actual problem in the sources. The original sources have a routine that waits until the MIPS side has sent a certain number of bytes to the ARM9 side using the card fifo:

int waitfifo_full_len(int len)
{
    u32 temp;
    delay_times_0 = fifo_over_time;
    while (1)
    {
        temp = cardcommand_r4_nowait(nds_fifo_cmd_read_state, 0, 0);
        if (((temp >> nds_fifo_read_full_bit) & 1) == 1)
        {
            break;
        }
        else if (((temp >> nds_fifo_len_bit) & nds_fifo_len_mask) >= len)
        {
            break;
        }

        if (delay_times_0 == 0)
        {
            return 1;
        }
    }
    return 0;
}

I used iDeaS to debug and disassemble the original ds2_firmware.dat. A disassembly of this routine begins at logical address 0x02002590, and the disassembled routine looks like the following:

[disassembly listing of this routine in the working ds2_firmware.dat, starting at 0x02002590]

A dump from the ELF file of the ARM9 code created from the SuperCard sources instead looks like this:

020019c8 <_Z17waitfifo_full_leni>:
20019c8: e92d4038 push {r3, r4, r5, lr}
20019cc: e59f4058 ldr r4, [pc, #88] ; 2001a2c <_Z17waitfifo_full_leni+0x64>
20019d0: e3a03014 mov r3, #20
20019d4: e1a05000 mov r5, r0
20019d8: e5843000 str r3, [r4]
20019dc: ea000004 b 20019f4 <_Z17waitfifo_full_leni+0x2c>
20019e0: e1530005 cmp r3, r5
20019e4: 2a00000c bcs 2001a1c <_Z17waitfifo_full_leni+0x54>
20019e8: e5943000 ldr r3, [r4]
20019ec: e3530000 cmp r3, #0
20019f0: 0a00000b beq 2001a24 <_Z17waitfifo_full_leni+0x5c>
20019f4: e3a01000 mov r1, #0
20019f8: e1a02001 mov r2, r1
20019fc: e3a000e0 mov r0, #224 ; 0xe0
2001a00: ebffff63 bl 2001794 <_Z21cardcommand_r4_nowaithjj>
2001a04: e59f3024 ldr r3, [pc, #36] ; 2001a30 <_Z17waitfifo_full_leni+0x68>
2001a08: e1a029a0 lsr r2, r0, #19
2001a0c: e2100002 ands r0, r0, #2
2001a10: e0023003 and r3, r2, r3
2001a14: 0afffff1 beq 20019e0 <_Z17waitfifo_full_leni+0x18>
2001a18: e3a00000 mov r0, #0
2001a1c: e8bd4038 pop {r3, r4, r5, lr}
2001a20: e12fff1e bx lr
2001a24: e3a00001 mov r0, #1
2001a28: eafffffb b 2001a1c <_Z17waitfifo_full_leni+0x54>
2001a2c: 02063b28 .word 0x02063b28
2001a30: 000003fe .word 0x000003fe

The sources have obviously been compiled with a different GCC version, but the most peculiar difference is in how the return value (in register r0) of the cardcommand_r4_nowait() call ("bl 0x0200242C" in the working version, "bl 2001794" in the compiled version) is tested: the working version tests bit 1 (using a tst opcode), while the compiled version tests bit 2 (using an ands opcode)!

I checked whether the cardcommand_r4_nowait works differently in the two versions, in case that would explain the different bit, but it seemed to be similar. So, I next looked at where the nds_fifo_read_full_bit comes from, and found out that it is defined in game_define.h like this:

#define cpu_write_Full_bit 1

#define nds_fifo_read_full_bit cpu_write_Full_bit

So, the check ((temp>>nds_fifo_read_full_bit)&1) == 1 becomes ((temp>>1)&1) == 1, which is the same as testing (temp & 2). The compiled version is therefore correct as far as the sources are concerned, but it is very strange that the original ds2_firmware.dat tests a different bit! I decided to check what happens if I simply change the code to test for the same bit as the original ds2_firmware.dat (in this routine and also in the waitfifo_empty() routine, which had a similar but opposite difference). And, curiously, after this change the new ds2_firmware.dat began to work correctly! So, the ds2_firmware.zip source code package on my download page now has this change in the iointerface.cpp source file, and I also decided to get rid of the game_define.h file completely and define the few needed values at the top of iointerface.cpp itself. This made the source code package somewhat smaller and clearer.
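
In practice the change boils down to something like the following (assuming, as the disassembly suggests, that the working firmware tests the lowest bit of the returned state; the exact C form below is just a sketch of the edit, not a verbatim quote from the package):

    /* Original check from the SuperCard sources, equivalent to (temp & 2): */
    /*   if (((temp >> nds_fifo_read_full_bit) & 1) == 1)                   */
    /* Changed to test the same bit as the working ds2_firmware.dat:        */
    if ((temp & 1) == 1)
    {
        break;
    }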

DS2x86 enhancement work
After I got the ds2_firmware to work correctly, I began looking into enhancing DS2x86 to take advantage of the new possibilities. The first thing I wanted to do was to have the lower screen (the virtual keyboard) updated on the ARM side, so that I would not need to send the whole screen image (256x192 pixels at 16-bit color!) every time a simple config text or the HDD "led" changes. After some experimenting I managed to have the ARM side show the virtual keyboard. I commented out all the lower screen sending routines from my DS2x86 sources, so at first I could not see any config strings any more. I then looked into how the commands (like ds2_setSwap()) work, and noticed that they are actually quite simple. All commands are sent as a 512-byte block, with the first 60 bytes containing info for up to 20 different commands that can be sent simultaneously, and the remaining bytes being a free data area for the commands to use.
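
As a rough picture of that layout (the struct and field names below are just my own sketch, not taken from the SDK sources; u8 is the SDK's unsigned byte type):

typedef struct
{
    u8 headers[60];    /* info for up to 20 commands, 3 bytes each      */
    u8 data[452];      /* free data area shared by the commands         */
} cmd_block_t;         /* 60 + 452 = 512 bytes total                    */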

I quickly implemented a new IS_SHOW_CONFIG command, with the data containing all the strings that need to be shown on the lower screen. The MIPS side builds and sends the command, and when the ARM side receives this command it parses the strings from the command data and displays them on the lower screen config areas. This seems to work fine as long as I don't attempt to send another command immediately before or after this new additional command. So, I still need to look more closely into how the commands interact with the screen and audio sending and such. Perhaps I need to change the I/O interface to always send a combined command, and then have my routines simply append their commands to this master command structure.
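
Schematically, the MIPS-side builder ends up doing something like this (reusing the cmd_block_t sketch above; the helper functions and the exact string packing shown here are illustrative only, not the actual DS2x86 code):

#include <string.h>

static void send_show_config(const char **strings, int count)
{
    cmd_block_t blk;                                 /* the 512-byte command block */
    int i, pos = 0;

    memset(&blk, 0, sizeof(blk));
    for (i = 0; i < count; i++)
    {
        int len = strlen(strings[i]) + 1;            /* include the terminating NUL */
        if (pos + len > (int)sizeof(blk.data))
            break;                                   /* don't overflow the data area */
        memcpy(&blk.data[pos], strings[i], len);
        pos += len;
    }
    add_command_header(&blk, 0, IS_SHOW_CONFIG, pos);   /* hypothetical helper */
    send_command_block(&blk);                            /* hypothetical helper */
}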

I haven't yet looked into how to send more data from the ARM side to the MIPS side. I would like to move the whole keyboard/touchpad handling to the ARM side, so that the MIPS side would just receive the x86-style key scancodes to put into the keyboard buffer. But, in any case, it looks like enhancing the I/O layer with additional commands is pretty simple, so I should not have any major problems moving the AdLib emulation to the ARM7, for example. Of course I might still run into some problems with that, but at the moment it looks quite doable.