Okay, I have finally got a little bit forward with the Windows 3.11 support. Still no end in sight for the changes I still need to do, but at least I have managed to get some progress done. I found out the reason for the invalid VxD dynamic link call. I found a list of the device numbers from a Microsoft Knowledge Base article, and the device number 1 means the Virtual Machine Manager. The service number 0x2484 (9348) is larger than the number of services available in VMM, so this is why the blue screen occurred. That had nothing to do with VGA registers. The reason why I suspected some VGA register problem was that I got unsupported I/O port calls that did not happen in DOSBox, but it turned out that those happened while Windows was abruptly switching back to text mode to display the blue screen error message! So the VGA register problem was a symptom, not a cause.
Anyways, after quite a bit of debugging I then found where the invalid service number call happened. Windows 3.11 uses INT 20 software interrupt to perform those dynamic VxD calls. The calls are coded so that the interrupt opcode CD20 is followed by first the service number (in two bytes) and then the device number (in two bytes). Here below are two screen copies from the debugger illustrating what the problem was in DS2x86. On the left is the problem situation, where the INT 20 opcode at offset 8028DFFD is followed by the service number. Here the service number happens to be split into two 4KB pages, with the low byte being at offset 8028DFFF and the high byte at 8028E000 (which is in a different physical memory page). As Windows uses virtual memory, these two pages may not be adjacent in the physical memory, but my movzx opcode (which Windows uses to read the service number and device number within the INT 20 handler) did not handle this situation. It simply calculated the physical start address of the 16-bit value (from the offset 8028DFFF) and then read two bytes from that address. This caused the high byte to be read from whatever page happened to physically follow the current page in RAM, and this page happened to have byte 0x24 in the first offset of the page. Reading the other parameter (the device number 0x0001) in turn worked correctly, as it calculated the physical address from offset 8028E001 and correctly read the value from there, as shown on the right hand debug screen copy.
By the way, the screen copies above display another interesting (or annoying, depending on whether you are attempting to debug it!) behaviour in Windows 3.11. In many cases Windows replaces this (slow) interrupt call with an indirect function call after it has been executed once. In other words, Windows seems to use a lot of self-modifying code! For example, the opcode at offset 8028DFCE was originally a similar INT 20 call to device 0001 service 0084 (opcode bytes CD 20 84 00 01 00) but it has been replaced by a call near word [800118D0] (opcode bytes FF 15 D0 18 01 80) after it was once executed. Both of these seem to be some simulated DOS interrupt calls (as they follow a mov eax, 00000021 opcode, which loads EAX register with 0x21, which is the DOS interrupt number). You can perhaps imagine how difficult and frustrating it is to debug code that keeps changing itself while you run it!
In any case, adding a check for page split into movzx opcode handling fixed this problem, but the bigger issue remaining is that there are still a lot of other opcodes that may have the same problem. This is the reason why paging is so difficult (or more accurately, slow) to support. I would need to have a check for this split page handling in every opcode that accesses more than a single byte of RAM, but this will of course make the code much slower. Perhaps eventually I will decide to have two versions of DS2x86, one which does not support paging but is much faster, and another with full paging support but running much slower.
The next problem I ran into was that the code jumped to a real-mode address 8C80:0000, but the processor was in protected mode. So when the code there began with opcodes PUSH CS followed by POP DS, the DS register was loaded with an invalid selector. After some more debugging I realized that the address 8C80:0000 is jumped to when the WIN386.EXE loads and executes KRNL386.EXE (using the DOS LOAD AND/OR EXECUTE INT21 call). The DOS calls can not be run in actual protected mode, only real mode (or VM86 mode). So, if the processor was in actual protected mode after that call, there was certainly a problem in my implementation of it. The call should clear the condion flags when launching the new program, but I had mistakenly made it clear all flags. This meant that also the VM86 flag got cleared, and the processor went into actual protected mode. After fixing that problem the code progressed a little bit further.
The current problem is that Windows 3.11 hangs with the screen in text mode. Looking at the code where it hangs, this seems to be some sort of serious error handler, as the code drops into text mode, prints a message (which in this case is simply an empty string) and then goes into a tight loop. There is an opcode JNE that jumps into itself, so there is no chance that the code will progress further from that point. So, what I am currently trying to determine, is where and why KRNL386.EXE determines something is so badly wrong that it needs to halt the system. Again I need to compare the behaviour with DOSBox starting from the beginning of KRNL386.EXE loading and executing, so this will again probably take many days to figure out and solve.

http://dsx86.patrickaalto.com/DSblog.html