Via
http://strmnnrmn.blogspot.com/
Tracking down the SSB Dynarec Bug - Part 2
On Monday I talked about the fragment simulator and how this could be used to help track down bugs in the dynarec implementation. In this post I'm going to talk a bit about a tool I use mostly for regression testing, but also to help determine the exact point at which the fragment simulator and the interpretative core go out of sync. It's a bit of a long post, so apologies in advance
Daedalus can be compiled with a flag which enables a special 'synchronisation' mode. This build configuration creates an instance of a synchronisation class which can be initialised in one of two modes - either as a producer or as a consumer. At various points during program execution I pass information about the internal state of the emulator to the synchroniser for processing. In the case of the producer, it simply writes this data out to a file on disk. The consumer is a bit more interesting; it reads data of the required size from disk, and compares this 'baseline' value against the value provided by the emulator. If these two values are found to be different, the synchroniser knows that things have drifted out of sync and it can trigger a breakpoint and drop out into the debugger.
This technique relies on the fact that the emulator is deterministic, i.e. running the emulator twice in a row with the same inputs generates exactly the same results. By 'inputs' this means not just the same rom image, but external inputs such as data from the controller must match exactly too. Obviously pressing buttons on the controller in exactly the same order with the same timings would be impossible to duplicate, so the other function the synchroniser performs is to record input from the pad in the case of the producer, or play input back in the case of the consumer. Other external input, such as calls to timer functions (e.g. time(), QueryPerformanceCounter() or rdtsc) can be synchronised in the same way.
The synchroniser works with as few or as many sync points as you provide. For debugging very simple problems, you can get away with just checking the value of the program counter as each instruction is executed. For more tricky problems you can end up adding many more sync points - for instance you can synchronise the entire register set after every instruction to ensure that the synchroniser catches any instruction which generates a different result from the baseline.
I add sync points to Daedalus using a set of macros. When synchronisation is enabled, the macros expand out to calls to a virtual method on a global instance of the synchroniser class. An example sync point in the code might look like this:
u32 pc = gCPUState.CurrentPC;
SYNCH_POINT( DAED_SYNC_REG_PC, pc );
OpCode op;
if( CPU_FetchInstruction( pc, &op ) )
{
CPU_Execute( pc, op );
}
The interesting line here is the SYNC_POINT macro, which synchronises on the current program counter value. For producers, this just writes the value of 'pc' to disk. For consumers, it checks that the value we have for 'pc' matches the one read from disk.
The DAED_SYNC_REG_PC argument is simply a flag to describe what is being synchronised. Another global constant allows easy control of what is synchronised:
enum ESynchFlags
{
DAED_SYNC_NONE = 0x00000000,
DAED_SYNC_REG_GPR = 0x00000001,
DAED_SYNC_REG_CPU0 = 0x00000002,
DAED_SYNC_REG_CCR0 = 0x00000004,
DAED_SYNC_REG_CPU1 = 0x00000008,
DAED_SYNC_REG_CCR1 = 0x00000010,
DAED_SYNC_REG_PC = 0x00000020,
DAED_SYNC_FRAGMENT_PC = 0x00000040,
};
static const u32 DAED_SYNC_MASK(DAED_SYNC_REG_PC);
#define SYNCH_POINT( flags, x, msg ) \
if ( DAED_SYNC_MASK & (flags) ) \
CSynchroniser::SynchPoint( x, msg )
If I want to enable more thorough debugging, I can change DAED_SYNC_MASK and OR in more values:
static const u32 DAED_SYNC_MASK(DAED_SYNC_REG_PC|DAED_SYNC_REG_GPR) ;
Changing the mask value requires the emulator to be rebuilt from scratch and the baseline synch file to be recreated. This is a bit time consuming but doing it in this way means that the compiler can optimise out any synch points which we aren't interested in, keeping things running as quickly as possible.
One problem with this technique is that the synchroniser can quickly generate a massive amount of data, so much that most of the execution time is spent shifting this data to or from disk, slowing debugging to a crawl. In the example I gave on Monday, it can sometimes take over 500 million instructions before things go out of sync.
...
Catherine: Full Body’s English translation for the Vita