DSx86 News - DOS Emulator for Nintendo DS - Version 0.20 released! [Archive] - DCEmu Network: The Homebrew, Hacking & Gaming Network

wraggster

July 4th, 2010, 23:00

Pate has released a new version of his DOS emulator for the Nintendo DS. Here are the full release notes:

Yes, it is version 0.20, not 0.16! I decided to jump the version number a bit again, as this version has some extensive internal architecture changes, as well as much improved audio features. I also began a new blog page as the previous one had grown quite long. See the end of this page for a link to the previous blog entries. Anyways, here is a list of the most important changes:

New IRQ handling using self-modifying code, which makes the whole emulator about 8% faster than before. More info in my previous blog post.
Keyboard handling changed to support extended key codes. This will allow the mapping of some new keys like the right control key to an NDS button. Again more info in my previous blog post.
Screen update mode "Direct" was removed. It was only used in the MCGA mode, and it only sped up certain games (Wing Commander series, mostly) by a small amount. Having that mode around meant I had to write double code for all MCGA/VGA graphics memory and BIOS operations, so now I finally got fed up with doing that.
AdLib emulation changed from effective 13-bit waveforms to 15-bit waveforms (and full 16-bit with rhythm instruments), which means that AdLib audio now has quadruple volume.
AdLib rhythm instruments (Bass Drum, Snare, HiHat, TomTom, Cymbal) implemented.
Covox/Disney Sound Source/Parallel port DAC audio device emulation added. Thanks sverx for the idea!
Sound Blaster Direct DAC audio output mode supported (SB DSP command 10).
Sound Blaster auto-init 8bit DMA audio output mode supported (SB DSP commands 1C, 48 and DA).
Fixed the BLASTER environment variable to state the SB type as T3 (SB 2.0) instead of T4 (SB Pro 2.0), as only mono audio is supported. Some games (Hocus Pocus, for example) attempted to use stereo FM sound when the type was T4, resulting in no AdLib audio.
Graphics problems in Gods are fixed.
Some new EGA opcodes added.

The new Covox and SB improvements still have some limitations. The Covox and SB Direct DAC output have distortion/warble in the sound. I believe this is mostly caused by the screen blitting code, which does not allow the PC timer interrupts to happen during the screen blitting. You can lessen the distortion by using screen update mode 15FPS, but you cannot get completely distortion-free sound. The SB auto-init DMA only works when the size of the played buffer is exactly divisible by 128, so some games might still not play all digitized sounds.
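
For the BLASTER change in the list above, this is a minimal sketch of how a DOS program might read the card type from the variable. The string "A220 I7 D1 T3" is only a typical example layout, not necessarily what DSx86 sets, and blaster_card_type is a hypothetical helper name.

```c
#include <ctype.h>
#include <stdlib.h>

/* Returns the numeric T (card type) field of a BLASTER string,
 * or -1 if the string has no T field.
 * Per the change list: T3 = SB 2.0 (mono), T4 = SB Pro 2.0 (stereo). */
static int blaster_card_type(const char *blaster)
{
    const char *p = blaster;
    while (*p) {
        while (*p == ' ')                 /* skip separators */
            p++;
        if (toupper((unsigned char)*p) == 'T')
            return atoi(p + 1);           /* digits after the 'T' */
        while (*p && *p != ' ')           /* skip this field */
            p++;
    }
    return -1;
}
```

A game that sees type 4 here may try stereo FM, which is the failure described for Hocus Pocus; with type 3 it should stay with mono AdLib output.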

AdLib emulation improvements
As you can see from the change list, my focus last week was the audio features of DSx86. The last time I worked on the AdLib emulation was September last year, so I first had to go through the code and try to remember how it worked. Then I began by increasing the audio volume, which was the most often requested audio improvement. I remembered I had tried to increase the volume once earlier, but that resulted in bad distortion. Now I found the reason for the distortion and was able to fix that, so increasing the audio volume actually made the audio much cleaner. Also, for the first time since I had started working on DSx86 I now used Hi-Fi headphones with my DS Lite, and was surprised to find that my AdLib emulation actually produces quite convincing bass frequencies! I had only tested the audio with the inbuilt speakers and el-cheapo tiny headphones, neither of which seemed to produce any bass sounds.

After I increased the audio volume I began working on the missing rhythm instruments. AdLib has two modes of operation: it can either use all 9 channels (each with 2 FM operators) for melodic instruments, or it can use the last three channels for rhythm instruments (so that Bass Drum uses both operators of channel 6, and the other four rhythm instruments each use a single operator of the remaining two channels). Back in September the rhythm instrument code in my reference fmopl.c implementation looked extremely complex and slow, so I skipped implementing these at that time. Now that I am on my summer vacation I wanted to really look into this code and understand how it works, and I was able to optimize it and invent various shortcuts to make it run pretty much as fast as the normal channels.

Actually Bass Drum behaved pretty much like a normal channel, except that in the normal melodic channel operator 1 either works as a phase modulator for operator 2, or it produces sound directly (so that a single channel can actually produce two different sounds), but with Bass Drum it either works as a phase modulator or is ignored completely. Ignoring it was of course quite an easy change, so that took care of the Bass Drum. Tom Tom was quite easy as well, it just used a single operator to drive the output, so it was actually easier than the melodic channels.
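
The rhythm-mode layout described above can be written down as a small lookup table. This is only an illustrative sketch: the struct and function names are hypothetical, and the Snare and TomTom assignments follow the usual OPL2 layout rather than anything spelled out in this post. For HiHat and Cymbal the table lists the operator that produces the output, even though their phase calculation looks at two operators.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int channel;        /* 0-based channel number, 6..8 in rhythm mode */
    int operator_mask;  /* bit 0 = operator 1, bit 1 = operator 2 */
} RhythmSlot;

static const RhythmSlot rhythm_slots[] = {
    { "Bass Drum", 6, 0x3 },  /* both operators of channel 6 */
    { "HiHat",     7, 0x1 },
    { "Snare",     7, 0x2 },  /* assumed: standard OPL2 layout */
    { "TomTom",    8, 0x1 },  /* assumed: standard OPL2 layout */
    { "Cymbal",    8, 0x2 },
};

/* Look up a rhythm instrument by name; returns NULL if not found. */
static const RhythmSlot *find_rhythm_slot(const char *name)
{
    size_t count = sizeof rhythm_slots / sizeof rhythm_slots[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(rhythm_slots[i].name, name) == 0)
            return &rhythm_slots[i];
    }
    return NULL;
}
```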

The HiHat, Snare and Cymbal sounds were more difficult. They also each use only a single operator, but they need a noise generator in addition to the phase frequency counter, and also the frequency is not used as a simple 16.16 fixed point value being an index to a waveform table, but only a few bits of the frequency counter are used to create certain fixed indices to the waveform table. Both HiHat and Cymbal also use two different operators (channel 7 operator 1 and channel 8 operator 2) to produce their output. For example, this is what the HiHat phase generation looks like in the reference implementation:

/* high hat phase generation:
phase = d0 or 234 (based on frequency only)
phase = 34 or 2d0 (based on noise)
*/

/* base frequency derived from operator 1 in channel 7 */
unsigned char bit7 = ((SLOT7_1->Cnt>>FREQ_SH)>>7)&1;
unsigned char bit3 = ((SLOT7_1->Cnt>>FREQ_SH)>>3)&1;
unsigned char bit2 = ((SLOT7_1->Cnt>>FREQ_SH)>>2)&1;

unsigned char res1 = (bit2 ^ bit7) | bit3;

/* when res1 = 0 phase = 0x000 | 0xd0; */
/* when res1 = 1 phase = 0x200 | (0xd0>>2); */
UINT32 phase = res1 ? (0x200|(0xd0>>2)) : 0xd0;

/* enable gate based on frequency of operator 2 in channel 8 */
unsigned char bit5e= ((SLOT8_2->Cnt>>FREQ_SH)>>5)&1;
unsigned char bit3e= ((SLOT8_2->Cnt>>FREQ_SH)>>3)&1;

unsigned char res2 = (bit3e ^ bit5e);

/* when res2 = 0 pass the phase from calculation above (res1); */
/* when res2 = 1 phase = 0x200 | (0xd0>>2); */
if (res2)
    phase = (0x200|(0xd0>>2));

/* when phase & 0x200 is set and noise=1 then phase = 0x200|0xd0 */
/* when phase & 0x200 is set and noise=0 then phase = 0x200|(0xd0>>2), ie no change */
if (phase&0x200)
{
    if (noise)
        phase = 0x200|0xd0;
}
else
/* when phase & 0x200 is clear and noise=1 then phase = 0xd0>>2 */
/* when phase & 0x200 is clear and noise=0 then phase = 0xd0, ie no change */
{
    if (noise)
        phase = 0xd0>>2;
}

The noise value above is calculated in a noise-generator which gives a new value for each output sample, and uses the following algorithm in the reference implementation. The noise value in the above algorithm is the lowest bit of the OPL->noise_rng variable.

OPL->noise_p += OPL->noise_f;
i = OPL->noise_p >> FREQ_SH; /* number of events (shifts of the shift register) */
OPL->noise_p &= FREQ_MASK;
while (i)
{
    if (OPL->noise_rng & 1) OPL->noise_rng ^= 0x800302;
    OPL->noise_rng >>= 1;
    i--;
}
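
For experimenting with the reference noise generator outside the emulator, the fragment above can be wrapped into a standalone function. This is a sketch under assumptions: the NoiseGen struct and noise_step name are made up here, FREQ_SH is taken as 16, and a seed of 1 and a noise_f of 1<<16 (one shift per sample) are just convenient test values.

```c
#include <stdint.h>

#define FREQ_SH   16
#define FREQ_MASK ((1u << FREQ_SH) - 1u)

typedef struct {
    uint32_t noise_rng;  /* shift register state, must be non-zero */
    uint32_t noise_p;    /* fractional position, 16.16 fixed point */
    uint32_t noise_f;    /* position increment per output sample */
} NoiseGen;

/* Advance the generator by one output sample and return the noise bit. */
static unsigned noise_step(NoiseGen *n)
{
    n->noise_p += n->noise_f;
    uint32_t events = n->noise_p >> FREQ_SH;  /* whole shifts due now */
    n->noise_p &= FREQ_MASK;                  /* keep only the fraction */
    while (events > 0) {
        if (n->noise_rng & 1u)
            n->noise_rng ^= 0x800302u;
        n->noise_rng >>= 1;
        events--;
    }
    return n->noise_rng & 1u;
}
```

With a noise_f below 1<<16 the register only shifts on some samples, so the same noise bit is returned several times in a row.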

My simplified and sped-up version of the phase generation algorithm is below. The problems in my algorithm are that it does not take into account the frequency of operator 2 in channel 8 at all, as I don't have enough free registers to handle two operators simultaneously, and that my noise generation is completely different. The result is that the HiHat does not sound quite like it should; it has more of a ringing and less noise to its sound. It will have to do for now, though, until I figure out a better 1-CPU-cycle noise generator than my tst r7, r7, ror r7 opcode, or can figure out a way to calculate another operator while calculating the current operator as well.

@-------
@ On input: r7 = SLOT7_1->Cnt (16.16 fixed point value, FREQ_SH = 16)
@ On output: r1 = phase << 9
@-------
eor r1, r7, r7, lsr #5 @ r1 = (bit2 ^ bit7); (<<16)
orr r1, r7, lsr #1 @ r1 = (bit2 ^ bit7) | bit3; (<<16)
and r1, #(1<<(16+2)) @ r1 = res1 = (bit2 ^ bit7) | bit3; (== 0x200 shifted 9 bits left)
tst r7, r7, ror r7 @ Carry flag = pseudo-random noise value
orrcc r1, #(0xD0<<(16+2-9)) @ phase = res1|0xd0;
orrcs r1, #(0xD0<<(16+2-9-2)) @ phase = res1|0xd0>>2;
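
Read opcode by opcode, the assembly above appears to compute the following. This is a C sketch, not code from DSx86: the function names are hypothetical, and the carry flag produced by the tst opcode is simply passed in as a noise parameter instead of being modelled. The second function mirrors the shifted register arithmetic so the two can be compared.

```c
#include <stdint.h>

/* Plain C view: cnt is SLOT7_1->Cnt (16.16 fixed point), noise is 0 or 1. */
static uint32_t hihat_phase_simplified(uint32_t cnt, unsigned noise)
{
    uint32_t freq = cnt >> 16;
    unsigned bit7 = (freq >> 7) & 1u;
    unsigned bit3 = (freq >> 3) & 1u;
    unsigned bit2 = (freq >> 2) & 1u;
    unsigned res1 = (bit2 ^ bit7) | bit3;

    uint32_t phase = res1 ? 0x200u : 0u;
    phase |= noise ? (0xd0u >> 2) : 0xd0u;  /* carry set selects 0xd0>>2 */
    return phase;
}

/* Register-level view: same steps as the six opcodes, result is phase << 9. */
static uint32_t hihat_phase_shifted(uint32_t r7, unsigned carry)
{
    uint32_t r1 = r7 ^ (r7 >> 5);           /* eor: bit2 ^ bit7 lands in bit 18 */
    r1 |= r7 >> 1;                          /* orr: bit3 moved to bit 18 */
    r1 &= 1u << (16 + 2);                   /* and: keep only res1 */
    if (carry)
        r1 |= 0xD0u << (16 + 2 - 9 - 2);    /* orrcs */
    else
        r1 |= 0xD0u << (16 + 2 - 9);        /* orrcc */
    return r1;
}
```

Note that when res1 is set, the noise bit selects the opposite value compared to the reference code. Since the noise bit is pseudo-random anyway, this is unlikely to be the audible difference.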

I also managed to speed up some things in my AdLib emulation in general. For example, I reordered the operator-specific values in memory so that instead of using 7 separate ldr commands to load the r4-r10 registers needed in each operator calculation loop I load them with a single ldmia opcode, and I also improved the envelope calculations somewhat. The envelope generation in AdLib has the usual four phases: Attack, Decay, Sustain and Release. Plus silence of course. The code I use for the envelope generation is the following; it is similar for both melodic and rhythm instruments:

@-------
@ Calculate envelope for SLOT 1.
@ On input:
@ r1 = scratch register
@ r4 = sustain level (or silence level if in release phase) (in low 16 bits, 0..512)
@ r5 = envelope increment/decrement value (16.16 fixed point)
@ r8 = operator volume (16.16 fixed point), 0 = max volume, 512<<16 = silence
@-------
adds r8, r5 @ Adjust the volume by the envelope increment. Carry set if we are in attack phase.
bmi from_attack_to_decay_phase @ Go to decay if we went over max volume
rsbccs r1, r8, r4, lsl #16 @ Did we go under the SUSTAIN level (and we are not in attack phase)? Carry clear if we did.
bcc from_decay_to_sustain @ Yep, go adjust the volume
env_adjust_done:

The main idea of this code is that during normal envelope operations the program flow does not need to take any jumps, it will flow directly thru these four opcodes. The algorithm above is an ASM version of the following C language code. This is not based on the reference AdLib implementation, as I have completely re-engineered the envelope generation code to be based on running 16.16 fixed point adders instead of (slow) table lookups.

op->volume += op->env_incr;
if ( op->volume < 0 )
    goto from_attack_to_decay;
if ( op->env_incr >= 0 && op->volume > op->sustain )
    goto from_decay_to_sustain;
env_adjust_done:

The interesting part of my ASM implementation is that by using the reverse subtract RSB opcode instead of the CMP opcode I was able to get rid of the extra compare to see whether op->env_incr is greater than or equal to zero. The first adds always sets the carry flag if r5 (op->env_incr) is negative, and in that case I don't want to test the sustain level. So, by swapping the resulting Carry flag of the comparison between op->volume and op->sustain I can make sure the jump to from_decay_to_sustain is never taken if either op->env_incr is negative or op->volume is <= op->sustain. This is what I especially like about the ARM architecture, the conditionally executable opcodes make all sorts of neat tricks possible! I've been using the conditionally executed compare opcodes quite a bit recently, as that is a nice way to handle "comparison AND comparison" type of tests (with short-circuit evaluation).
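
The carry flag trick can be checked on a PC by modelling the flags by hand. The following is a sketch under assumptions: the function names are hypothetical, the ADDS and RSBS carry rules are written out as plain unsigned comparisons, and the sustain level is passed in its low-16-bit form as in the register description above.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of the four opcodes: returns true when the code would take
 * the bcc branch to from_decay_to_sustain. */
static bool asm_takes_sustain_branch(uint32_t r8, uint32_t r5, uint32_t r4)
{
    uint32_t sum = r8 + r5;
    bool carry = sum < r8;              /* ADDS: carry out on unsigned wrap */
    if (sum & 0x80000000u)              /* BMI: went past max volume */
        return false;
    if (!carry) {                       /* RSBCCS runs only with carry clear */
        uint32_t level = r4 << 16;
        carry = level >= sum;           /* RSBS: carry set when no borrow */
    }
    return !carry;                      /* BCC */
}

/* The C version of the same test, as given in the text. */
static bool c_takes_sustain_branch(int32_t volume, int32_t env_incr,
                                   uint32_t sustain)
{
    volume += env_incr;
    if (volume < 0)
        return false;                   /* from_attack_to_decay instead */
    return env_incr >= 0 && volume > (int32_t)(sustain << 16);
}
```

For volumes between 0 and 512<<16 the two functions should agree for negative, zero and positive increments, which is the property the RSB trick relies on.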

The only bigger issues (in addition to the HiHat and Cymbal operators) in the AdLib emulation now are the amplitude and frequency modulation support. I actually have the amplitude modulation already coded in, but it does not seem to work properly, so I still need to debug that. Last September I did not have the iDeaS emulator to use for debugging the ARM7 code, so debugging is now much easier than it was back then. The frequency modulation I haven't implemented at all yet, as it is quite a heavy CPU burden and I'm not yet sure if I can spare the CPU cycles. I would like to test that, though. After those changes I would consider my AdLib emulation finished, so I could release the sources and/or create an ARM7 library for using it, so that it could be used in all sorts of PC game porting projects. I would like to hear the original music in the Doom and Wolfenstein ports, for example, and why not Quake too, if it used AdLib music?

Major internal refactoring starting
I have now run into several games that point the segment register outside of the graphics memory area (Gods, Silpheed) or the EMS memory area (Alone in the Dark), which breaks my speedup trick of precalculating the memory access area using only the segment register. I decided to change my memory access technique to be more compatible and robust, which will sadly also make it slower.
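
To see why the segment register alone cannot decide where an access lands, recall that real-mode x86 forms its address as segment*16 + offset. Below is a minimal sketch, assuming a 64 KB graphics window at A0000-AFFFF; the helper names are hypothetical and this is not DSx86's actual code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* True when the linear address falls inside the assumed graphics window. */
static bool in_vga_window(uint32_t linear)
{
    return linear >= 0xA0000u && linear <= 0xAFFFFu;
}

/* The shortcut: guess the target area from the segment register only. */
static bool segment_looks_like_vga(uint16_t segment)
{
    return segment >= 0xA000u && segment < 0xB000u;
}
```

For example, 9FF0:0100 resolves to linear address A0000, which is graphics memory even though the segment is below A000, while AFFF:FFFF ends up past the end of the window. A check based only on the segment gets both cases wrong.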

In this version I have started this internal reworking. Pretty much none of it should be visible yet, but a few opcodes are somewhat slower than before. I need to change every single opcode (which I have spent nearly the last year implementing) to use this new architecture, so it will take quite a long time before this work is finished. I plan to change a few opcodes (or opcode groups) in each version, and try to keep the two-week release window even while I am doing this change. I will also implement other changes and improvements, so this major rewrite is a sort of constant background process. Removing the Direct screen update method was also partly due to this internal reworking, as it would have gotten in the way of some required internal changes.

I haven't done much about the general compatibility this last week, as I have focused on the audio issues. I hope to look into a couple of new games again for the next version, and look into improving either the screen scaling methods or the touchpad mouse emulation. I'm not yet sure which.

Anyways, thanks again for your interest in DSx86, and sorry for this long blog post! Being on vacation gives me more time to work on DSx86, and also makes it possible to write longer blog posts. Oh, and happy 4th of July to all you celebrating it! :-)

http://dsx86.patrickaalto.com/DSdown.html