gfoot has released a nice-looking demo for the GP2X. Here's what he wrote:
I've found and fixed a bug in my copy of your implementation - I don't know whether it's unique to the way I'm using the routines, but I've found it necessary to clear the source FIFO when doing a blit, by setting the MESG_FFCLR bit in the control register. Otherwise blits of funny widths didn't work properly (e.g. 64 pixels wide was fine, 48 was not). This only affects blits; rectfill is fine because it doesn't read the source data. Actually I'm not sure why the FIFO matters anyway, as we're not reading from the CPU, we're reading from video memory (INVIDEO is set).
It's doing 8 layers of parallax scrolling, with tile sizes starting at 64x64 and dropping down to 8x8. So it's stressing a lot of small tiles as well as a few big tiles. All masked, at 8bpp. It gets 56fps (seems to be the maximum) with 8 layers, which doesn't seem bad. 9 layers drops to 28fps, 10 layers drops to 14fps - the overhead of doing so many small operations really kills it. At 16bpp it's a bit slower too - I'm getting about 2/3 of the performance I get at 8bpp.
Thanks for the heads up on the flushing routine - I am indeed still getting corruption, e.g. at the bottom right corner of the screen in the demo above. I'm also getting a white pixel in the top left, probably due to the old dummy blit routine I'm using. Hmm, maybe flushing the fifo is related to this somehow.
Something else I'm not sure of is how you figured out you have to set the busy bit to kick off the operation - it's marked as read only in the docs...
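As a rough illustration of the FIFO fix gfoot describes, here is a minimal C sketch. MESG_FFCLR and INVIDEO are the names from his message; the bit positions, the MESG_INVIDEO spelling, and the helper names blit_control_word and write_control are assumptions made up for this example, so take the real values from the hardware documentation or the library headers. The register is passed in as a pointer so the logic can be tried off-device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define MESG_INVIDEO (1u << 0) /* assumed: source is read from video memory */
#define MESG_FFCLR   (1u << 1) /* assumed: clear the source FIFO */

/* Build the control word for one operation. Only a blit reads source
   data, so only a blit asks for the source FIFO to be cleared; a
   rectangle fill leaves the bit alone. */
uint32_t blit_control_word(uint32_t base, int is_blit)
{
    uint32_t ctrl = base;
    if (is_blit)
        ctrl |= MESG_FFCLR;
    return ctrl;
}

/* Write the control word to the register. On real hardware ctrl_reg
   would point at the memory-mapped control register; here it can be
   any volatile 32-bit variable. */
void write_control(volatile uint32_t *ctrl_reg, uint32_t ctrl)
{
    assert(ctrl_reg != NULL);
    *ctrl_reg = ctrl;
}
```

A blit from video memory would then use blit_control_word(MESG_INVIDEO, 1), while a rectfill would pass 0 as the second argument and leave the FIFO bit clear.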
Download via comments: