
DS2x86 progress



wraggster
October 3rd, 2011, 00:24
via http://dsx86.patrickaalto.com/DSblog.html

My work on paging support for DS2x86 continues. I did manage to find and fix one problem that caused the internal sanity check failures I mentioned in my previous blog post. This problem made the game drop into the debugger every now and then, with an error message stating that the NT (Nested Task) flag was not correct in the IRET opcode. This turned out to be caused by a race condition between my TaskSwitch handling and the hardware IRQ handling. The TaskSwitch handler turns the NT flag on, which disables interrupts in DS2x86, but a hardware interrupt could arrive after the TaskSwitch handler had begun executing but before it had turned on the NT flag. In that case the hardware interrupt had already changed the opcode table pointer to point to the IRQ handling code, so the interrupt got executed after the TaskSwitch handler returned even though the NT flag was on. I added code to reset the opcode table pointer after the NT flag gets turned on, which fixed that problem.
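The race can be illustrated with a small single-threaded C model. This is only a sketch under stated assumptions, not the DS2x86 code: the `cpu` struct, the two-value `table` enum and every function name here are hypothetical, and the interrupt is simulated by an explicit call in the middle of the handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two opcode dispatch tables. */
enum table { NORMAL_TABLE, IRQ_TABLE };

struct cpu {
    enum table opcode_table; /* table the main loop dispatches through */
    bool nt_flag;            /* Nested Task flag; IRQs are held off while set */
};

/* A hardware IRQ redirects the next opcode fetch, unless NT is already on. */
void raise_irq(struct cpu *c)
{
    if (!c->nt_flag)
        c->opcode_table = IRQ_TABLE;
}

/* Simulated TaskSwitch handler. irq_mid_handler models an IRQ that lands
 * after the handler has started but before it sets NT. */
void task_switch(struct cpu *c, bool irq_mid_handler, bool apply_fix)
{
    if (irq_mid_handler)
        raise_irq(c);                   /* table now points at the IRQ code */
    c->nt_flag = true;
    if (apply_fix)
        c->opcode_table = NORMAL_TABLE; /* undo a redirect that slipped in */
}

bool irq_runs_next(const struct cpu *c)
{
    return c->opcode_table == IRQ_TABLE;
}

/* Usage: without the reset the IRQ still runs with NT on; with it, it does not. */
void check_race(void)
{
    struct cpu a = { NORMAL_TABLE, false };
    task_switch(&a, true, false);
    assert(a.nt_flag && irq_runs_next(&a));

    struct cpu b = { NORMAL_TABLE, false };
    task_switch(&b, true, true);
    assert(b.nt_flag && !irq_runs_next(&b));
}
```

The model simply drops the redirect when the fix is applied. How the real emulator keeps the interrupt pending until NT is cleared is not described in the post, so that part is left out.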

Another problem was much more difficult to track down, and it took me pretty much the whole of last week. After the previous fix, the Descent 2 Demo usually ran up to the menu fine (though it still crashed occasionally before getting that far). When selecting New Game or View Demo from the main menu, it loaded for a little while but then always dropped back to DOS with the error message "Error: Error reading ControlCenterTriggers in gamesave.c". Luckily, the source code for Descent 2 is freely available, so I was able to download it, look into the gamesave.c source file and see the C code that causes that error. The source code snippet in question looks like this:

//================ READ CONTROL CENTER TRIGGER INFO ===============

if (game_fileinfo.control_offset > -1)
{
    if (!cfseek( LoadFile, game_fileinfo.control_offset,SEEK_SET ))
    {
        for (i=0;i<game_fileinfo.control_howmany;i++)
#ifndef MACINTOSH
            if (cfread(&ControlCenterTriggers, game_fileinfo.control_sizeof,1,LoadFile)!=1)
                Error( "Error reading ControlCenterTriggers in gamesave.c", i);
#else
            ControlCenterTriggers.num_links = read_short(LoadFile);
            for (j=0; j<MAX_CONTROLCEN_LINKS; j++ )
                ControlCenterTriggers.seg[j] = read_short( LoadFile );
            for (j=0; j<MAX_CONTROLCEN_LINKS; j++ )
                ControlCenterTriggers.side[j] = read_short( LoadFile );
#endif
    }
}

//================ READ MATERIALOGRIFIZATIONATORS INFO ===============


The next step was to trace the ASM code within DS2x86 until I found something that allowed me to see which C routine was executing. This was the part that took most of my time. I had to break into the debugger at various points, and then trace upwards along the call stack until I found the routine that failed to return and instead exited back to DOS. When tracing that routine, I finally found a PUSH opcode that pushed a value corresponding to the address of an error string similar to the error message at the beginning of the gamesave.c routine. At that point I knew I had found the gamesave.c routine, and was able to start comparing the ASM code to the C source code and follow the progress until I got to the above code snippet. Below is the ASM code corresponding to the original C source code:

//================ READ CONTROL CENTER TRIGGER INFO ===============

if (game_fileinfo.control_offset > -1)
{

10088FA8 mov esi,[101E7CF3] ; esi = game_fileinfo.control_offset
         cmp esi,-1
         jle 1008900A

if (!cfseek( LoadFile, game_fileinfo.control_offset,SEEK_SET ))
{

10088FB3 mov eax,[esp+000000C0] ; eax = LoadFile
         mov edx,esi ; edx = game_fileinfo.control_offset
         xor ebx,ebx ; ebx = SEEK_SET = 0
         call 100EF804 ; cfseek()
         test eax,eax
         jne 1008900A

for (i=0;i<game_fileinfo.control_howmany;i++)

10088FC7 mov edi,[101E7CF7] ; edi = game_fileinfo.control_howmany
         xor esi,esi ; esi = i = 0;
         test edi,edi
         jle 1008900A

if (cfread(&ControlCenterTriggers, game_fileinfo.control_sizeof,1,LoadFile)!=1)

10088FD3 mov eax,102B355C ; eax = &ControlCenterTriggers
10088FD8 mov ebx,00000001 ; ebx = 1
         mov ecx,[esp+000000C0] ; ecx = LoadFile
         mov edx,[101E7CFB] ; edx = game_fileinfo.control_sizeof
         call 100EF7CC ; cfread()
10088FEF cmp eax,01
10088FF3 je 10088FFF

Error( "Error reading ControlCenterTriggers in gamesave.c", i);

10088FF5 push esi ; Push esi = i
         push 10126DF4 ; Push "Error reading ControlCenterTriggers" address
         jmp 100E26F4 ; Error(), routine never returns

10088FFF mov ebp,[101E7CF7] ; ebp = game_fileinfo.control_howmany
10089005 inc esi ; i++;
10089006 cmp esi,ebp ; Test for i < game_fileinfo.control_howmany
         jl 10088FD3

}
}

//================ READ MATERIALOGRIFIZATIONATORS INFO ===============

if (game_fileinfo.matcen_offset > -1)

1008900A mov eax,[101E7CFF] ; eax = game_fileinfo.matcen_offset
...


After some tracing and studying of the code I finally figured out what went wrong. After executing the CMP opcode at address 10088FEF, cseip has the value 10088FF3. As this is very near the end of the physical page, the code between 10088FF0..10089010 gets copied to the temporary area at F000:3FF0, as I described in my previous blog post. Execution continues with the conditional jump je from this copied location. However, all conditional jump opcodes recalculate the target physical address (since they can freely jump between pages). So, after executing the je opcode and jumping to 10088FFF, the code is again running from the original location, and it loads the last byte of this page, which is the first byte of the MOV opcode. At this point my special paging-enabled main opcode loop should have again copied the area around this page break to the temporary location, but there was a bug in the code. After loading the opcode byte from 10088FFF, the cseip variable was immediately incremented, and only after that did I test whether the current location was near the page end. Since at that point the cseip value was already 10089000, the code determined that it was not near the end and continued running from an invalid physical address, loading invalid opcode bytes. By chance the invalid code did not crash but instead jumped backwards, so that it eventually reached the address 10088FF5 and then exited with the error message.
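The ordering problem can be sketched in a few lines of C. This is an illustration only, not the actual main loop: the function names are hypothetical, and the 16-byte "near the end" margin is an assumption loosely based on the 10088FF0..10089010 copy range mentioned above. The 4 KB page size is the standard x86 page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE   0x1000u /* x86 4 KB page */
#define NEAR_MARGIN 0x10u   /* assumed: last 16 bytes count as "near the end" */

/* True when addr lies within the last NEAR_MARGIN bytes of its page. */
static bool near_page_end(uint32_t addr)
{
    return (addr & (PAGE_SIZE - 1u)) >= PAGE_SIZE - NEAR_MARGIN;
}

/* Buggy order: the fetch position is advanced first and tested afterwards,
 * so an opcode starting on the very last byte of a page is missed. */
bool needs_copy_buggy(uint32_t cseip)
{
    cseip++;                     /* already points into the next page */
    return near_page_end(cseip);
}

/* Fixed order: test the address of the opcode byte itself, then advance. */
bool needs_copy_fixed(uint32_t cseip)
{
    bool near = near_page_end(cseip);
    cseip++;                     /* the increment no longer affects the test */
    (void)cseip;
    return near;
}

/* Usage with the addresses from the post: the MOV opcode starts at 10088FFF. */
void check_post_addresses(void)
{
    assert(!needs_copy_buggy(0x10088FFFu)); /* bug: page break not detected */
    assert(needs_copy_fixed(0x10088FFFu));  /* fix: page break detected */
    assert(needs_copy_fixed(0x10088FF3u));  /* the je opcode is also near the end */
}
```

Only an opcode that begins on the final byte of a page exposes the difference between the two orderings, which fits how rarely this bug showed up.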

The fix was not especially difficult, but it meant that I had to reorder some code in the main loop. This is the most performance-critical part of my emulator, so I had to be careful not to add delays to the code. I believe the new code is at least as fast, and it even has the advantage that I now only need to change one opcode for the self-modifying code, where earlier I had to change two opcodes in the loop to make this work. So, in the end, fixing this bug brought some additional advantages.

Currently I am attempting to trace and fix an annoying intermittent problem when returning from a hardware IRQ. Occasionally the stack does not contain proper values when the code returns from an interrupt and then returns from the routine that was interrupted. This also seems to have something to do with the page discontinuity within the stack area, but I have not been able to determine the exact cause yet. So, I have no idea yet when I will be able to release a version of DS2x86 that supports paging properly.