Pate has posted more news about his DOS emulator for DS:

After my last blog post I continued working on the new macro system, and kept adding new 386 opcodes using the new macros. This progressed nicely until I ran into a "div ecx" opcode. The problem with the x86 division opcodes is that they use double-width dividends. For example, the opcode "div cx" divides a 32-bit value in the DX:AX register pair by a 16-bit divisor in the CX register. Similarly, "div ecx" divides a 64-bit value in EDX:EAX by a 32-bit divisor in ECX. However, the MIPS architecture only has 32-bit divide operations, so I was able to code a 1:1 mapping between "div cx" and the MIPS equivalent, but I could not do this easily with the "div ecx" opcode of the 386 architecture.
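As an illustration of the opcode semantics described above, here is a minimal C sketch. The helper names emu_div16 and emu_div32 and the -1 error return are assumptions made for this example, not part of the emulator; on real x86 hardware a zero divisor or a quotient that does not fit in the destination register raises a divide error exception.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling "div cx":
   DX:AX / CX -> AX = quotient, DX = remainder.
   Returns 0 on success, -1 where real hardware would raise a divide error. */
int emu_div16(uint16_t *ax, uint16_t *dx, uint16_t cx)
{
    assert(ax && dx);
    uint32_t dividend = ((uint32_t)*dx << 16) | *ax;
    if (cx == 0)
        return -1;
    uint32_t q = dividend / cx;      /* a single 32-bit divide is enough */
    if (q > 0xFFFFu)
        return -1;                   /* quotient does not fit in AX */
    *dx = (uint16_t)(dividend % cx);
    *ax = (uint16_t)q;
    return 0;
}

/* Hypothetical helper modelling "div ecx":
   EDX:EAX / ECX -> EAX = quotient, EDX = remainder.
   This one needs a 64-bit dividend, which is the hard part on a CPU
   that only has 32-bit divide instructions. */
int emu_div32(uint32_t *eax, uint32_t *edx, uint32_t ecx)
{
    assert(eax && edx);
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    if (ecx == 0)
        return -1;
    uint64_t q = dividend / ecx;     /* 64-bit divide done by the compiler */
    if (q > 0xFFFFFFFFu)
        return -1;                   /* quotient does not fit in EAX */
    *edx = (uint32_t)(dividend % ecx);
    *eax = (uint32_t)q;
    return 0;
}
```

The 16-bit case fits entirely in 32-bit arithmetic, while the 32-bit case relies on the compiler's uint64_t support, which on a 32-bit target turns into a call to a library helper.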

I spent some time googling for solutions to this problem. The first potential solution was to use the GCC 64-bit (long long) division code. I tested it by coding a simple 64-bit division into my C-language tester program, and then looking at the dump file to see what exactly happens there. The GCC compiler uses a helper function __divdi3 to handle the actual 64-bit division, and the assembler code dump of that looked very big and complex. In addition to the division result, the x86 div opcode calculates the remainder of the division. This has a similar __moddi3 helper function, which was just as complex. The register usage of my emulator differs quite a lot from the MIPS C-language calling convention, so if I called these functions I would need to save and restore many registers, which would slow the handling down even further. I decided to abandon this possibility and look for a more optimized version of this 64-bit/32-bit division.
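A test of this kind can be as small as the following sketch. The function names quot64 and rem64 are hypothetical. Compiling it for a 32-bit target and inspecting the assembler output (for example with gcc -S) should show calls to the libgcc helpers __divdi3 and __moddi3 for the signed case; the unsigned equivalents are __udivdi3 and __umoddi3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical test functions: on a 32-bit target GCC cannot do these
   with a single instruction, so it calls helper routines from libgcc. */
int64_t quot64(int64_t a, int64_t b)
{
    assert(b != 0);
    return a / b;   /* signed 64-bit division: __divdi3 on 32-bit targets */
}

int64_t rem64(int64_t a, int64_t b)
{
    assert(b != 0);
    return a % b;   /* signed 64-bit remainder: __moddi3 on 32-bit targets */
}
```

Note that the quotient and the remainder come from two separate helper calls here, so an emulated div opcode built this way would pay for the expensive division twice.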

After some more googling I found an algorithm in the source code of some software by Jan Marthedal Rasmussen. The function is called double_div and it looked to be just what I was after. It calculates both the quotient and the remainder in one go, and it has special case handling for the simple situations before falling back to the generic difficult scenario. I implemented this algorithm first in C in my tester program, and compared the results it gives to the results of the GCC __divdi3 function. After I was certain the C implementation works properly, I looked at the assembler dump and began converting that to a suitable format for my emulator. I could not directly use the assembler version that GCC had compiled, as it used many more registers than I can afford in my emulator code. I have four free registers that can be used in every opcode handler, and on rare occasions (luckily this "div" opcode is one of them) I can also use the four registers allocated for lazy flags handling. The C calling convention, however, leaves 16 registers free for every function to use without having to save and restore them. The code that GCC created looked otherwise quite efficient; it had even managed to optimize away some repeated calculations of the original C code. Too bad I could not use it.
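The general shape of such a routine can be sketched in C as follows. This is not Rasmussen's double_div and not the emulator's handler, neither of which is shown in this post; it is a plain shift-and-subtract division, written with 32-bit operations only, with one fast path for the simple case. The name div64by32 and its parameter layout are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: divide the 64-bit value hi:lo by d using only
   32-bit operations, returning the quotient and storing the remainder.
   Precondition: d != 0 and hi < d, so the quotient fits in 32 bits
   (the case where an x86 "div" does not raise a divide error). */
uint32_t div64by32(uint32_t hi, uint32_t lo, uint32_t d, uint32_t *rem)
{
    assert(rem != 0);
    assert(d != 0 && hi < d);

    if (hi == 0) {                    /* simple case: one 32-bit divide */
        *rem = lo % d;
        return lo / d;
    }

    uint32_t r = hi;                  /* running remainder, always < d */
    uint32_t q = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t carry = r >> 31;     /* bit about to be shifted out of r */
        r = (r << 1) | (lo >> 31);    /* bring down the next dividend bit */
        lo <<= 1;
        q <<= 1;
        if (carry || r >= d) {        /* the full 33-bit value is >= d */
            r -= d;                   /* wraps to the correct 32-bit result */
            q |= 1;
        }
    }
    *rem = r;
    return q;
}
```

On a compiler that supports uint64_t, the result can be checked against the built-in operators, much like the comparison against the GCC helper described above: for hi = 1, lo = 0, d = 3 the quotient should equal (((uint64_t)1 << 32) / 3) and the remainder should be 1. A loop like this uses only a handful of live values (hi, lo, d, q and a temporary), which is the kind of register budget the paragraph above is concerned with.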

After some careful register allocation I managed to create a division opcode handler that looks to be working; however, I have not yet created a proper unit test for it. I am currently more interested in continuing with the opcode macro system, so I'll come back later to make sure this division opcode works even in the more difficult situations. That is also why I won't post my algorithm here yet; I'll do that after I have created the unit tests and made sure my implementation works correctly.

But now I think I'll continue with the opcode macros. I don't have any new screen copies to show either, as the code I am currently working on is the code that calculates the ship and polygon positions before drawing them, so nothing new has yet been drawn on the screen. Perhaps next week. :-)

http://dsx86.patrickaalto.com/DSblog.html