Page 1 of 5 12345 LastLast
Results 1 to 10 of 44

Thread: Daedalus blog update - 15 August 2007 - Interesting Dynarec Hack

                  
   
  1. #1
    DCEmu Pro
    Join Date
    Aug 2006
    Location
    Norway
    Age
    32
    Posts
    902
    Rep Power
    72

    psp Daedalus blog update - 15 August 2007 - Interesting Dynarec Hack

    New blog post from StrmnNrmn's blog tells us about how some dynarec optimizations make Daedalus faster.

    I was playing around with the code generation a couple of evenings ago, and realised that if I made a certain assumption, I could drastically speed up specific types of memory accesses.

    When I discussed load/store handling on Sunday, I presented the new code that is typically generated for handling a load such as 'lw $t0, 0x24($t1)' on the N64:


    ADDIU a0 = s1 + 0x0024 # add offset to base register
    SLT t0 = (a0 BEQ t0 != r0 --> _Trampoline_XYZ123 # branch to trampoline if invalid
    ADDU a1 = a0 + s7 # add offset to emulated ram
    LW s0 <- 0x0000(a1) # load data


    (I'll ignore all the extra code which is generated, and just concentrate on the 5 instructions above which correspond to the expected path of execution.)

    Of the 5 instructions that are generated, two - the SLT and BEQ - are just there for performing error handling in the case that the address is invalid, a hardware register (i.e. for memory-mapped I/O), or a virtual address. I'll call this error handling for short.

    If we were generating code in an idealised environment where we didn't have to perform error handling, we could drop the SLT/BEQ instructions to get this:


    ADDIU a0 = s1 + 0x0024 # add offset to base register
    ADDU a1 = a0 + s7 # add offset to emulated ram
    LW s0 <- 0x0000(a1) # load data


    We could then optimise this even further and perform the offset calculation directly as a part of the LW instruction:


    ADDU a1 = s1 + s7 # add offset to emulated ram
    LW s0 <- 0x0024(a1) # load data


    In this idealised situation we could reduce an emulated load to just two instructions, with no branches. That's a pretty good saving!

    The problem is that the environment we're generating code from is not 'ideal', and it's hard to know in advance of time which memory accesses are going to directly access physical ram, and which are going to access hardware registers or require virtual address translation. For that reason, we have to place a guard around every memory access to make sure that it behaves correctly. At least, that was the way I was thinking until earlier in the week.

    What I realised on Monday is that I can make an assumption that lets me remove the error handling code for certain types of load/stores. The assumption is that when the N64 accesses any memory through the stack pointer ($sp) register, the address is always going to be valid, physical memory.

    The assumption relies on the fact that most roms don't do anything particularly clever with their stack pointers - it gets set up for each thread to point at a valid region of memory then the game just runs along, pushing and popping values from it as the code executes. Of course, if the assumption is wrong then the emulator will just crash and grind to a halt in a unpredictable manner

    It was straightforward to add a hack to the code generation to exploit this kind of behaviour, and the results have been better than I expected - I'm seeing at least a 10% speed up, and the code expansion factor (the ratio of generated bytes of instructions to input bytes) has dropped from around 5.0x to 4.0x. Stability has been excellent too - I've run about 8 roms with the hack so far, and all of them have run perfectly.

    I think one of the reasons the hack has such an impact is that a lot of the memory accesses in a typical C program are through the stack. Here's an example snippet from the entry to a function on the N64, where the compiler emitted code to store the return address and arguments:


    SW ra -> 0x0014(sp)
    SW a0 -> 0x0058(sp)
    SW a1 -> 0x005c(sp)
    SW a2 -> 0x0060(sp)


    When I look through disassembly for the roms I'm working on, it's very common to see lots of sequential loads/stores relative to the stack pointer like this.

    Previously Daedalus generated around 20 instructions (including 5 branches) for the above snippet. With the hack, the generated code now looks like this:


    ADDU t1 = s1 + s7
    SW s4 -> 0x0014(t1)
    ADDU t1 = s1 + s7
    SW s3 -> 0x0058(t1)
    ADDU t1 = s1 + s7
    SW s2 -> 0x005c(t1)
    ADDU t1 = s1 + s7
    SW s5 -> 0x0060(t1)


    8 instructions, 0 branches. What's more, it looks like with a little work, I could eliminate 3 redundant address calculations:


    ADDU t1 = s1 + s7
    SW s4 -> 0x0014(t1)
    SW s3 -> 0x0058(t1)
    SW s2 -> 0x005c(t1)
    SW s5 -> 0x0060(t1)


    Now that would be efficient

    I still want to do lots of testing with the hack. I want to find out if there are roms that don't work with the hack enabled, and how common a problem it is. It's such a significant optimisation though that I'm certain I'll be adding it as an option in Daedalus R13. The results of my testing will probably determine whether I default it to on or off though.

    So far Daedalus R13 is shaping up to be significantly faster than R12. I'm still not sure when I'll be ready to release it, but you'll hear about it here first.

    -StrmnNrmn

    Give feedback to StrmnNrmn here or in the comments section in his post.

  2. #2
    XBOX 360 User masamune's Avatar
    Join Date
    Oct 2006
    Location
    USA
    Age
    36
    Posts
    98
    Rep Power
    0

    Default

    This sounds good... I'm glad to see that his blog updates are only days apart now, Bless his mac

  3. #3
    DCEmu Newbie
    Join Date
    Jan 2006
    Posts
    32
    Rep Power
    0

    Cool cool

    cant wait

  4. #4
    DCEmu Regular
    Join Date
    Jun 2006
    Posts
    334
    Rep Power
    66

    Default

    Adding it as an option seems like a good idea, it sounds like it could potentially break compatability for a number of games. But adding to his other changes, this should give at least a 20-25% speed increase over R12. Awesome!

  5. #5

    Default

    Quote Originally Posted by blackrave View Post
    Stability has been excellent too - I've run about 8 roms with the hack so far, and all of them have run perfectly.
    ...and i now need to change my underwear

  6. #6
    DCEmu Pro PLZKLLME0080's Avatar
    Join Date
    Sep 2006
    Location
    Midgar
    Age
    32
    Posts
    704
    Rep Power
    99

    Default

    Quote Originally Posted by FlyingMojo View Post
    ...and i now need to change my underwear
    Same here. I can't wait for the release of this!

  7. #7
    PS3 User mike03$$$'s Avatar
    Join Date
    Feb 2006
    Location
    Bronx, NY
    Age
    32
    Posts
    270
    Rep Power
    68

    Default

    cool a 25% speed up looking good strmnnrmn

  8. #8
    PSP User On The Rise's Avatar
    Join Date
    Jun 2006
    Location
    Columbus, Ohio
    Posts
    58
    Rep Power
    0

    Default

    This a great breakthrough for the emu. This is going to be the most improved release yet to date for this emu. StrmnNrmn is on a hot streak people he is in his prime. The way he is coding we could see hopefully more than a 30% speed increase. Nobody rush him with a realese date or anything let him do his thing. Thanks StrmnNrmn for updating your blog and coding this great emu. I wish I could donate.

  9. #9
    DCEmu Legend
    Join Date
    Sep 2006
    Location
    USA
    Posts
    2,152
    Rep Power
    75

    Default

    Does the excitement ever stop? I love the PSP homebrew scene and especially great emulators such as this. Makes me feel like a kid again.

    Needless to say, I, like all of you, cannot wait for the next release.

  10. #10
    DCEmu Rookie
    Join Date
    Nov 2005
    Posts
    135
    Rep Power
    67

    Default

    Quote Originally Posted by blackrave View Post
    It was straightforward to add a hack to the code generation to exploit this kind of behaviour, and the results have been better than I expected - I'm seeing at least a 10% speed up, and the code expansion factor (the ratio of generated bytes of instructions to input bytes) has dropped from around 5.0x to 4.0x. Stability has been excellent too - I've run about 8 roms with the hack so far, and all of them have run perfectly.
    Amazing. :thumbup:

Page 1 of 5 12345 LastLast

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •