Great job irondirgible, for the fast news.
Well here's what StrmnNrmn had to say...
After Wednesday's news I wanted to keep everyone up to date with what I've been working on over the past few days.
With Wednesday's changes incorporated, I reprofiled a few roms to see where most of the CPU time was going. Things have changed considerably since I initially talked about deciding what to optimise. Looking at the profiler for Mario 64 the time spent executing display lists is now a much more significant fraction of the total time spent on each frame. Back around R3/R4 only around 20% of the time was spent here. With the latest build display list processing now accounts for around 35-40% of the time. The display list processing hasn't become any slower, it's just becoming more significant as I've optimised the CPU emulation.
One of the settings I mentioned was worth disabling for a speed boost when I released R7 was the 'Tesselate Large Triangles' option. When this setting is enabled, it causes the display list processor to recursively break up large triangles into smaller pieces. This has been necessary to overcome the PSPs poor hardware clipping support; without breaking the triangles up into smaller pieces, the PSP will often fail to render large triangles as shown below:
Super Mario 64 without clipping
The large triangles that make up the floor that Mario is standing on are rejected by the PSP, leaving a large hole where the floor should be. By breaking the triangles into smaller pieces before attempting to render them, it reduces the chance that the PSP will decide to discard them.
There were a few problems with the 'Tesselate Large Triangles' setting which I've been working on overcoming this weekend. Firstly, it's not perfect - there were plenty of cases where visible triangles would still be culled even when they had been subdivided 3-4 times (which generates 27-81 triangles for each input triangle!). This was always quite noticable in games with a relatively low camera, such as racing games. The other big problem with this setting was that it was very slow - often adding over 20ms per frame.
This setting was always intended as a quick fix rather than a long term solution, so I've been looking at fixing both of these problems over the past few days. I started by ripping out all the exisiting polygon clipping and tesselation code and starting from scratch. After a couple of days of hacking I've finally got a replacement system that seems to be clipping everything I've thrown at it perfectly. Here's a shot of the same location in Mario 64:
Super Mario 64 with new clipping code
Now that I have a working version of the code in place, I'm going to look at optimising it. At the moment the new clipping code is roughly as expensive as the tesselation code, but due to the way it's implemented I think it should be much easier to make work with the PSP's VFPU, as I can process batches of vertices in parallel. Ideally I'd like to get this change into the next release, so I'm going to hold off putting the R8 build together until it's ready. I'll let you know how I get on.
-StrmnNrmn
Bookmarks