Sorry, no new version today, as there are not all that many useful enhancements yet in this version, and I have not had time to make sure my new changes have not broken any major features. I have been working on zerox86 quite a lot during the past week, though. Here is some information about the changes I have been working on.

Further optimizations to IRQ emulation
I started my work on the next version by rewriting the IRQ handling code yet again. The changes I made for the previous version already made it possible to run the emulated IRQs faster than the 250Hz task switch rate of GCW0, but I still had some thread synchronization code in my IRQ handling routine. However, now that I had moved my timer IRQ away from asynchronous signals, I wanted to move all the IRQ handling into the main emulation thread as well.

I was able to get rid of all the thread synchronization code, and also the whole signal broadcasting system I used to send IRQ requests from the UI and audio threads to the main emulation thread. Now the other threads simply set flags that the main thread reads. I still need to be careful when resetting the flag from the main thread, but other than that consideration there is no need for any extensive thread locking.
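The flag scheme can be sketched in C11 with atomic operations. This is only an illustration under assumptions: `pending_irqs`, `request_irq()` and `take_next_irq()` are hypothetical names, and zerox86's main loop is written in assembly, so its actual flag handling may look quite different. The sketch focuses on the careful reset: the main thread atomically clears only the bit it is about to service.

```c
#include <stdatomic.h>

/* Hypothetical pending-IRQ bitmap: bit n set means IRQ n has been requested. */
static atomic_uint pending_irqs;

/* UI or audio thread: request an IRQ by setting its bit. No lock needed. */
static void request_irq(unsigned irq) {
    atomic_fetch_or(&pending_irqs, 1u << irq);
}

/* Main emulation thread, between opcodes: return the highest-priority pending
   IRQ (lowest number, as on the 8259 PIC) and clear only that bit, or -1 if
   nothing is pending. atomic_fetch_and clears the bit in one atomic step, so
   a bit that another thread sets at the same moment is not lost. */
static int take_next_irq(void) {
    unsigned pending = atomic_load(&pending_irqs);
    if (pending == 0)
        return -1;
    int irq = 0;
    while (!(pending & (1u << irq)))
        irq++;
    atomic_fetch_and(&pending_irqs, ~(1u << irq));
    return irq;
}
```

A plain `pending_irqs &= ~bit` written as separate load and store instructions could overwrite a bit set by the audio thread in between, which is exactly the kind of reset problem that needs care.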

While doing that optimization, I also rewrote the timer I/O port handling in ASM. It used to be done in C code (which is always slow to call from my main ASM code). When digitized sounds are played through the PC Speaker, the timer I/O ports are written from the timer IRQ several thousand times per second, so to eventually handle that I need my timer I/O port handling to be pretty fast. Sadly, this change causes Wizardry 6 to hit a division-by-zero error when using audio. This is probably caused by some timing problem, as the game seems to determine the speed of the PC using a very fast timing loop. I spent some time trying to adjust my timer handling, but have not been able to fix this yet.
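The actual cause has not been found yet. As a purely hypothetical illustration of how a very fast timing loop can end in a division by zero, suppose a game reads PIT channel 0 (clocked at 1193182 Hz) before and after a short loop and divides the iteration count by the difference. If the emulated counter only advances in coarse steps, both reads can return the same value. `pit_read_count()`, `calibrate()` and the step sizes below are assumptions for illustration, not Wizardry 6's or zerox86's code.

```c
#include <stdint.h>

#define PIT_HZ 1193182u  /* 8253/8254 input clock in Hz */

/* Hypothetical emulated PIT channel 0 in a simple countdown mode.
   reload:      programmed count (0 means 65536)
   ticks:       PIT clocks elapsed since the count was loaded
   granularity: resolution of the emulator's time source, in PIT clocks */
static uint16_t pit_read_count(uint16_t reload, uint64_t ticks, uint64_t granularity) {
    uint32_t period = reload ? reload : 65536u;
    uint64_t seen = ticks - ticks % granularity;   /* time as the emulator sees it */
    return (uint16_t)(period - (uint32_t)(seen % period));
}

/* Hypothetical calibration: loop iterations per elapsed PIT count.
   Returns -1 where an x86 DIV by zero would raise a divide error (INT 0). */
static long calibrate(uint16_t before, uint16_t after, long iterations) {
    uint16_t elapsed = (uint16_t)(before - after);  /* the counter counts down */
    if (elapsed == 0)
        return -1;
    return iterations / elapsed;
}
```

With a granularity of 1 the two reads differ and the division works; with a granularity of about 4773 PIT clocks (one 250Hz time slice, since 1193182 / 250 is roughly 4773) a short loop sees no change at all.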

Fixed Jazz Jackrabbit menu selector on pause menu
Jazz Jackrabbit was reported on the compatibility list as having a problem in the game's pause menu, where the selector bar is not visible. This was a rather curious issue, as the selection works fine in all the other menus, and it seemed strange that the game would use two different techniques to handle menus. I had not yet played Jazz far enough myself to use the pause menu, so I had not noticed this problem, but as soon as I opened the pause menu the problem was immediately visible. Finding the actual cause was considerably more difficult, though.

First I had to figure out the best way to debug this. I wanted to find the opcode where the problem occurs, so I needed to see which opcode Jazz uses to write the menu items to the screen. Since I did not know which opcode that was, it was difficult to figure out how and where to put a breakpoint. In the end it occurred to me that whatever opcode the game uses, it will always need to write something to the VGA registers before writing to the screen (since the game runs in Mode-X). So I let zerox86 run up to the pause menu, then attached the GDB debugger to it and set a breakpoint in the handler for VGA port 0x3CE. Luckily the game does not draw anything else to the screen while in the pause menu, so my breakpoint got triggered only after I pressed cursor down.

When execution stopped at the breakpoint, I copied the current CS:IP register value and used it to make zerox86 break at the corresponding game code. The result looked like this (curiously, the game uses 16-bit protected mode instead of the much more common 32-bit flat address space protected mode):

0207:0ED3  BACE03    mov dx,03CE        ; VGA Graphics Registers I/O port
0207:0ED6  B80318    mov ax,1803
0207:0ED9  EF        out dx,ax          ; 03 = VGA Function Register = 0x18 (XOR)
0207:0EDA  B8003F    mov ax,3F00
0207:0EDD  EF        out dx,ax          ; 00 = VGA Set/Reset Register = 0x3F
0207:0EDE  B8080F    mov ax,0F08
0207:0EE1  EF        out dx,ax          ; 08 = VGA Bit Mask Register = 0x0F
0207:0EE2  8E068615  mov es,[1586]      ; ES = 014F (segment base address = 0xA0000)
0207:0EE6  8B7E08    mov di,[bp+08]
0207:0EE9  268A4551  mov al,es:[di+51]  ; Load the VGA latch from VGA VRAM contents
0207:0EED  A10483    mov ax,[8304]


There are a couple of interesting things in that code. First, the game sets the VGA Function Register to XOR instead of the normal MOV, which simply writes the data as-is. The XOR function uses the VGA latch value (a sort of internal VGA register that keeps a copy of the most recent data read from the VGA VRAM) and XORs it bitwise with the data that is about to be written to the VGA VRAM, before actually writing the data. This is used for various tricks in the 16-color EGA graphics modes, but I had not seen this technique used in the 256-color modes until now. Perhaps that is because in the 256-color modes it is usually simpler to just use the XOR opcode to perform such operations, instead of programming the VGA registers and then making sure the VGA latch register contains the correct data.

The second interesting thing is that the code loads a byte from the VGA VRAM (at code address 0207:0EE9), but then immediately overwrites the AL register (the low 8 bits of AX) in the very next instruction. So the code obviously has no use for the VRAM data at that location, and the only purpose of that read is to preload the VGA latch register for later.
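The latch behavior described above can be modeled in a few lines of C. This is a simplified sketch: it follows the VGA Graphics Controller layout (Function Select in bits 3-4 of register 03h, Bit Mask in register 08h) but ignores data rotation, Set/Reset and the other write modes, and `Vga`, `vga_read()` and `vga_write_plane()` are hypothetical names, not zerox86's code. It shows why the dummy read at 0207:0EE9 matters: the read loads the latches, and a following XOR write combines the CPU data with them.

```c
#include <stdint.h>

/* Function Select values (bits 3-4 of Graphics Controller register 03h). */
enum { VGA_FN_MOV = 0, VGA_FN_AND = 1, VGA_FN_OR = 2, VGA_FN_XOR = 3 };

typedef struct {
    uint8_t plane[4][65536];  /* four 64 KB VRAM planes */
    uint8_t latch[4];         /* one latch per plane */
    uint8_t rotate_fn;        /* register 03h value */
    uint8_t bit_mask;         /* register 08h value */
} Vga;

/* A CPU read loads all four latches, whichever plane the CPU gets back. */
static uint8_t vga_read(Vga *v, uint16_t off, int read_plane) {
    for (int p = 0; p < 4; p++)
        v->latch[p] = v->plane[p][off];
    return v->plane[read_plane][off];
}

/* Simplified write-mode-0 write to one plane: combine the CPU data with the
   latch using the selected function, then let the bit mask choose between
   the combined bits and the unchanged latch bits. */
static void vga_write_plane(Vga *v, uint16_t off, int p, uint8_t data) {
    uint8_t latch = v->latch[p], out;
    switch ((v->rotate_fn >> 3) & 3) {
    case VGA_FN_AND: out = data & latch; break;
    case VGA_FN_OR:  out = data | latch; break;
    case VGA_FN_XOR: out = data ^ latch; break;
    default:         out = data;         break;
    }
    v->plane[p][off] = (uint8_t)((out & v->bit_mask) | (latch & ~v->bit_mask));
}
```

Writing 0xFF with the function set to XOR (register value 0x18, as in the listing) inverts whatever the latch holds, so the same write twice in a row without a new read produces the same result both times.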

I had not yet coded any support for the XOR function in the Mode-X graphics mode, so that was most likely the cause of the missing selector bar. Now I just needed to find out which opcode does the actual writing. Checking for the XOR function in all my Mode-X opcodes would cause a performance hit in all games, which I wanted to avoid, since no other game has used this trick so far. Annoyingly, Jazz uses a function pointer to call the actual screen writing code, so it took me a while longer to find it. It turned out that Jazz uses opcodes C6 (MOV r/m8, imm8) and C7 (MOV r/m16, imm16) to perform the write. That was very nice, as those are rather rare opcodes for screen writing (games usually write dynamic values, not hardcoded single pixels as in these C6 and C7 opcodes). So I added support for the XOR operation to only these two opcodes, and the menu selector bar appeared properly! Below are some screenshots of the problem, before and after my fix.


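The design choice of handling XOR only in the C6/C7 store path can be sketched like this. It is an illustration under assumptions: `ModeX`, `modex_store8()` and `modex_store8_imm()` are hypothetical, the Bit Mask register is ignored for brevity, and the real zerox86 handlers are written in MIPS assembly. The generic store used by most opcodes never looks at the Function Select bits, so ordinary games pay nothing for the Jazz fix.

```c
#include <stdint.h>

typedef struct {
    uint8_t plane[4][65536];
    uint8_t latch[4];
    uint8_t map_mask;   /* Sequencer register 02h: planes enabled for writing */
    uint8_t function;   /* Graphics register 03h; bits 3-4 == 3 means XOR */
} ModeX;

/* Fast path used by most opcodes: no Function Select check at all. */
static void modex_store8(ModeX *m, uint16_t off, uint8_t val) {
    for (int p = 0; p < 4; p++)
        if (m->map_mask & (1u << p))
            m->plane[p][off] = val;
}

/* Hypothetical handler for MOV r/m8, imm8 (opcode C6) writing to Mode-X VRAM:
   the only place that pays for the XOR check. */
static void modex_store8_imm(ModeX *m, uint16_t off, uint8_t imm) {
    if (((m->function >> 3) & 3) != 3) {
        modex_store8(m, off, imm);
        return;
    }
    for (int p = 0; p < 4; p++)
        if (m->map_mask & (1u << p))
            m->plane[p][off] = imm ^ m->latch[p];
}
```

The trade-off is the one described above: a game that uses the XOR function with some other opcode would still draw incorrectly, but every other game keeps the fast path.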
Enabled virtual memory support
I also began working on virtual memory support in zerox86. Since the code is based on my DS2x86, which does have (rather fragile) virtual memory support, much of the code was already in place, just disabled or commented out. I uncommented that code and began running some of my test games, starting with the Warcraft II demo. The first problem was that zerox86 crashed with a SIGBUS error. After running zerox86 in GDB I realized that my TLB (Translation Lookaside Buffer) table clearing needed work. I use the highest bit of the memory address to determine whether the address is in physical RAM or in some special area (like EGA or Mode-X graphics memory, or virtual memory). In DS2x86 all the physical addresses are in the range 0x80000000..0x80FFFFFF, while on GCW0 the RAM addresses are in the range 0x00000000..0x7FFFFFFF. Thus the highest bit has exactly the opposite meaning on GCW0 than in my DS2x86 code. The SIGBUS was caused by my forgetting to initialize my TLB to a proper "uninitialized" value, which on DS2x86 was simply zero but on GCW0 is actually 0x80000000. So I needed to fill my full 4MB TLB with 0x80000000 when zerox86 starts.
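The TLB initialization problem can be sketched as follows. With 4 KB pages a 32-bit address space has 2^20 pages, so one 32-bit entry per page gives exactly a 4 MB table. The sketch assumes each entry holds a 32-bit host page address (as on the 32-bit GCW0) and that a set high bit means "take the slow path"; `tlb_init()` and `tlb_lookup()` are hypothetical names, not zerox86's code. Filling the table with zero, as was fine on DS2x86, would make every unmapped page look like valid RAM at host address 0.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT   12
#define TLB_ENTRIES  (1u << 20)      /* 4 GB / 4 KB pages */
#define TLB_SPECIAL  0x80000000u     /* high bit set: not plain RAM */

/* Hypothetical TLB: one 32-bit host page address per 4 KB guest page (4 MB). */
static uint32_t *tlb;

static int tlb_init(void) {
    tlb = malloc(TLB_ENTRIES * sizeof *tlb);
    if (!tlb)
        return -1;
    /* Zero-filling (e.g. calloc) is not enough here: 0 has the high bit
       clear, so it would be taken as a valid RAM address. */
    for (uint32_t i = 0; i < TLB_ENTRIES; i++)
        tlb[i] = TLB_SPECIAL;
    return 0;
}

/* Returns 1 and stores the host address for plain RAM, or 0 when the high
   bit says the slow path (unmapped, graphics memory, paging) is needed. */
static int tlb_lookup(uint32_t linear, uint32_t *host) {
    uint32_t entry = tlb[linear >> PAGE_SHIFT];
    if (entry & TLB_SPECIAL)
        return 0;
    *host = entry + (linear & 0xFFFu);
    return 1;
}
```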

The next couple of problems were caused by my not noticing a few places in the code that needed more extensive checks, which were still commented out. The main problem with paging (virtual memory) support is that a single opcode may span multiple pages, either with its opcode bytes or with the memory it accesses. DOSBox handles this by loading every byte separately, performing the memory access mapping for each byte. To speed up the emulation, I use a CS:EIP pointer in the host address space and load the opcode bytes without any memory checking. Similarly, I load the memory address that the opcode uses with just a single TLB access. This works fine and fast with linear memory access, but not when paging is on.

To handle code segment paging, I changed my main opcode loop to test whether the current CS:EIP pointer is within the last 16 bytes of the 4K page, and if so, I copy these 16 bytes together with the first 16 bytes of the next 4K page to a contiguous 32-byte temporary area. Accessing the next 4K page while copying it may cause a page fault, so the fault may not happen at exactly the opcode where it should. This usually does not cause problems, though. Then, when the CS:EIP pointer advances to within the first 16 bytes of the next page, I adjust the CS:EIP pointer to point back to the correct physical memory address. Since these checks add extra code to the innermost opcode interpreter loop, I use self-modifying code to add/remove them when paging gets enabled/disabled. This handles code segment paging, but it still does not help when the opcode performs a memory access that spans multiple pages.
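The code-fetch trick can be sketched in C. Since an x86 instruction is at most 15 bytes long, copying the last 16 bytes of the current page and the first 16 bytes of the next one into a contiguous buffer is enough for the decoder to read any instruction that straddles the boundary through a plain pointer. `fetch_ptr()` is a hypothetical helper: in zerox86 the check is patched into the assembly opcode loop with self-modifying code, and the next-page pointer would come from a TLB lookup, which is where the early page fault can occur.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define EDGE      16u   /* an x86 instruction is at most 15 bytes long */

/* cur:     host pointer to the start of the current 4 KB code page
   next:    host pointer to the start of the next code page
   eip_off: offset of CS:EIP within the current page
   tmp:     32-byte scratch buffer
   Returns a pointer from which the decoder can read at least EDGE bytes. */
static const uint8_t *fetch_ptr(const uint8_t *cur, const uint8_t *next,
                                uint32_t eip_off, uint8_t tmp[2 * EDGE]) {
    if (eip_off < PAGE_SIZE - EDGE)
        return cur + eip_off;                      /* fast path: no copying */
    memcpy(tmp, cur + PAGE_SIZE - EDGE, EDGE);     /* tail of this page */
    memcpy(tmp + EDGE, next, EDGE);                /* head of the next page */
    return tmp + (eip_off - (PAGE_SIZE - EDGE));
}
```

Once EIP has moved past the boundary, the decoder would switch back to reading from the next page directly, matching the pointer adjustment described above.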

For memory accesses spanning multiple pages, again the perfect solution would be to load/store all data one byte at a time, with a separate TLB lookup for each byte. This would slow down most of the opcodes quite noticeably, so I am not willing to do that. Instead, I have added checks to the opcodes that have had this problem in the games I have tested. In practice this means that games using virtual memory that I have not tested myself and added specific support for will most likely not run in zerox86. This is what I mean by "fragile" virtual memory support.
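A per-opcode page-spanning check for a 16-bit read might look like the following sketch. The tiny page table, `phys()` and `read16()` are assumptions made to keep the example self-contained. The common case needs a single translation, and only an address ending in 0xFFF needs a second lookup for the high byte, since the next virtual page may map to an unrelated physical page.

```c
#include <stdint.h>

/* Hypothetical guest memory: four 4 KB pages behind a tiny page table. */
#define NPAGES 4
static uint8_t ram[NPAGES][4096];
static int     page_map[NPAGES];   /* virtual page -> physical page (assumed valid) */

static uint8_t *phys(uint32_t va) {
    return &ram[page_map[va >> 12]][va & 0xFFF];
}

/* 16-bit little-endian read with one translation in the common case. */
static uint16_t read16(uint32_t va) {
    if ((va & 0xFFF) != 0xFFF) {
        const uint8_t *p = phys(va);               /* both bytes on one page */
        return (uint16_t)(p[0] | (p[1] << 8));
    }
    /* Low byte on this page, high byte on the next virtual page. */
    return (uint16_t)(*phys(va) | (*phys(va + 1) << 8));
}
```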

One interesting thing happened when I was adding a page check for 16-bit memory reads. This code needs to check whether the address ends in 0xFFF, in which case the low byte is on this page but the high byte needs to be read from a different virtual page (which may be physically far away from the first byte). I had originally written the check simply like this:

.set noat
andi   AT, memory_address, 0xfff   # AT contains the last 12 bits of memory_address
sltiu  AT, AT, 0xfff               # AT = 1 if AT < 0xFFF, else AT = 0
beqz   AT, spans_two_pages         # Jump if AT = 0 (last 12 bits of address are 0xFFF)
.set at

When I was looking at the MIPS architecture reference, I realized that this might actually be a situation where I could finally use the somewhat weird nor (not or) opcode of the MIPS assembly language! So, I changed the above code to the following.

.set noat
li     AT, 0xFFFFF000              # AT = 0xFFFFF000
nor    AT, memory_address, AT      # AT = 0 if last 12 bits of memory_address are 0xFFF, else AT != 0
beqz   AT, spans_two_pages         # Jump if AT = 0 (last 12 bits of address are 0xFFF)
.set at

In this code I NOR the memory address with 0xFFFFF000, and the result is zero only if memory_address ends in 0xFFF. It is not any faster, but not any slower either, and it finally uses the nor opcode for something! :-)
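The equivalence of the two checks is easy to confirm in C, where MIPS `nor rd, rs, rt` corresponds to `~(rs | rt)`. The function names below are hypothetical; the sketch simply compares both forms for every possible value of the low 12 bits under a few different upper bit patterns.

```c
#include <stdint.h>

/* The andi/sltiu form: true when the low 12 bits are 0xFFF. */
static int spans_pages_sltiu(uint32_t addr) {
    return !((addr & 0xFFFu) < 0xFFFu);
}

/* The nor form: OR-ing in 0xFFFFF000 forces the upper 20 bits to 1, so the
   complement is zero only if the low 12 bits were already all ones. */
static int spans_pages_nor(uint32_t addr) {
    return (uint32_t)~(addr | 0xFFFFF000u) == 0;
}

/* Returns 1 if both forms agree for every tested address. */
static int forms_agree(void) {
    const uint32_t highs[] = { 0x00000000u, 0x12345000u, 0xFFFFF000u };
    for (unsigned h = 0; h < 3; h++)
        for (uint32_t low = 0; low < 0x1000u; low++) {
            uint32_t a = highs[h] | low;
            if (spans_pages_sltiu(a) != spans_pages_nor(a))
                return 0;
        }
    return 1;
}
```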

I got Warcraft II to start up pretty easily after a few such opcode fixes, and then I began to look into getting Grand Theft Auto running. This game needed a few more opcodes to be supported, but otherwise it was surprisingly easy to get running. You obviously need to run the 8-bit graphics version, as that is all that zerox86 supports. It is also not very fast, as is to be expected with all the additional code needed for virtual memory support.


After I got GTA working, I could not resist testing how far I would get with Windows 3.11. Back when I worked on DS2x86, it was partly my frustration with trying to make Windows 3.11 work on it that eventually made me stop working on it. However, now with GCW0 I have much better debugging tools on a much faster platform, so it is again interesting to work on Windows 3.11 support. If I get it working, the next step would be to try to get Windows 95 running, as that is practically my end goal with my emulator projects.

I had pretty good notes from my work on Windows 3.11 on DS2x86. I ran into many of the same issues on zerox86, and could use my notes to see what needs fixing. Many of these fixes were just uncommenting some code, but some code also needed actual changes, mostly because of the reversed meaning of the 0x80000000 bit. I still have not managed to get quite as far as I got with DS2x86, but I am already past the midpoint of my notes from DS2x86. :-)


While changing my opcodes to be paging-enabled, I occasionally run the Doom timedemo to keep track of how my changes affect the overall performance of zerox86. If the performance drops, I then spend some time figuring out new performance enhancements for the paging-enabled opcode versions. Here is a table of Doom timedemo results at various steps along the way. The Doom timedemo measures how many timer ticks it takes to run the demo, so the smaller the number, the faster the PC. The corresponding FPS can be calculated as 74690/realticks, for example 74690/6451 ≈ 11.58 fps.

realticks  Description of the change made
6451       zerox86 version 0.03 on a kernel with high-resolution timers enabled.
7020       zerox86 version 0.05 (workaround for the lack of high-resolution timers).
6762       After further optimizations to IRQ emulation.
7408       With paging-enabled opcodes for GTA and WAR2.
6909       Performance enhancements to paging-enabled MOV opcodes.
6629       Performance enhancement to paging-enabled PUSH opcode.
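The FPS formula can be applied to the whole table with a few lines of C. 74690 equals 35 × 2134, which matches Doom's 35 Hz game tic rate multiplied by 2134 game tics, so dividing it by realticks gives frames per second. `timedemo_fps()`, `speedup_percent()` and `print_timedemo_table()` are hypothetical helper names; `speedup_percent()` compares each run to the version 0.03 baseline, with positive values meaning faster.

```c
#include <stdio.h>

/* Doom timedemo frame rate: 35 tics per second * 2134 game tics / realticks. */
static double timedemo_fps(unsigned realticks) {
    return 74690.0 / (double)realticks;
}

/* Speed change relative to a baseline run, in percent (positive = faster). */
static double speedup_percent(unsigned baseline_ticks, unsigned ticks) {
    return 100.0 * ((double)baseline_ticks / (double)ticks - 1.0);
}

/* Print FPS and relative speed for each row of the table above. */
static void print_timedemo_table(void) {
    const unsigned ticks[] = { 6451, 7020, 6762, 7408, 6909, 6629 };
    for (size_t i = 0; i < sizeof ticks / sizeof ticks[0]; i++)
        printf("%5u realticks  %5.2f fps  %+5.1f%% vs. 0.03\n",
               ticks[i], timedemo_fps(ticks[i]),
               speedup_percent(ticks[0], ticks[i]));
}
```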


Future work
I probably will not be able to resist working on Windows 3.11 a little while longer, even though there are still many games on the compatibility wiki that need looking into. I do plan to make some game-specific fixes before releasing the next version, though. I also have an idea for how to make zerox86 slow down for the really old games that currently run too fast, so that is also something I want to experiment with.

http://zerox86.patrickaalto.com/zblog.html