SH4 DMAC demo



wraggster
June 20th, 2005, 23:18
Heinrich Tillack just emailed me with this rather cool news for us Dreamcast sceners:

hi !
SH4 DMAC demo

Yeah, my first working SH4 DMAC demo.
I use DMAC channel 1 in this demo to copy large amounts of data to the video screen!

This demo also shows how PVR DMA, store queues (SQ), and the CPU perform this copy operation.
(There is also code for color table DMA, which is not working.)
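
For readers wondering how the "32bit" and "256bit" modes relate to the amount of work the DMAC does: a DMA channel is normally programmed with a number of transfers, not a number of bytes. The helper below is only a sketch of that arithmetic and is not taken from the demo source. The function name and the 24-bit limit on the transfer counter are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the channel's transfer counter is 24 bits wide, so this
 * sketch conservatively rejects counts above 0xFFFFFF. */
#define DMA_MAX_TRANSFERS 0x00FFFFFFu

/* Hypothetical helper: convert a byte count into a transfer count for a
 * given transfer unit (4 bytes for 32-bit mode, 32 bytes for 256-bit mode).
 * Returns 0 on success, -1 if the size is not a whole number of units
 * or the count does not fit the assumed counter width. */
int dma_transfer_count(size_t bytes, size_t unit_bytes, uint32_t *count_out)
{
    if (bytes == 0 || unit_bytes == 0 || bytes % unit_bytes != 0)
        return -1;

    size_t count = bytes / unit_bytes;
    if (count > DMA_MAX_TRANSFERS)
        return -1;

    *count_out = (uint32_t)count;
    return 0;
}
```

For one 640*480*4 byte frame this gives 307200 transfers with a 4-byte unit and 38400 transfers with a 32-byte unit, so the 256-bit mode needs eight times fewer transfers for the same data.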

Download at http://a128.ch9.de
Some results of this test:
Results for 1000 loops, each loop transferring 640*480*4 bytes:
test memory Sh4->PVR
51344 ms dmac 32bit mode
16298 ms dmac 256bit mode
40782 ms cpu loop
16321 ms sq loop
6222 ms dma loop
test memory Sh4->Sh4
8920 ms dmac 256bit copy loop
18966 ms CPU 32bit copy loop
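
The timings above are easier to compare as throughput. The sketch below converts a result line into megabytes per second, using only the figures stated in the post (1000 loops of 640*480*4 bytes each). The function name is made up for this example, and "MB" here means 10^6 bytes, which is an assumption about the preferred unit.

```c
#include <stdint.h>

/* Bytes moved per loop, as stated in the post: 640*480*4. */
#define FRAME_BYTES (640u * 480u * 4u)

/* Hypothetical helper: throughput in MB/s (1 MB = 10^6 bytes, assumed)
 * for `loops` transfers of `bytes_per_loop` bytes taking `elapsed_ms`. */
double throughput_mb_per_s(uint64_t bytes_per_loop, uint32_t loops,
                           uint32_t elapsed_ms)
{
    if (elapsed_ms == 0)
        return 0.0; /* avoid dividing by zero on a bogus timing */

    uint64_t total_bytes = bytes_per_loop * (uint64_t)loops;

    /* bytes / 1e6 gives MB, ms / 1e3 gives seconds:
     * MB/s = total_bytes / (1000 * elapsed_ms) */
    return (double)total_bytes / (1000.0 * (double)elapsed_ms);
}
```

Calling throughput_mb_per_s(FRAME_BYTES, 1000, 6222) for the "dma loop" line gives roughly 197 MB/s, against roughly 75 MB/s for the 256-bit DMAC and SQ lines and roughly 24 MB/s for the 32-bit DMAC line.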

bye
heinrich

Thanks to Heinrich for the very interesting news ;)

GPF
June 21st, 2005, 04:24
test memory Sh4->PVR
51362 ms dmac 32bit mode
16315 ms dmac 256bit mode
40983 ms cpu loop
16368 ms sq loop
6240 ms dma loop
test memory Sh4->Sh4
8940 ms dmac 256bit copy loop
19020 ms CPU 32bit copy loop

Why are my results slightly slower?

Also, how does this differ from using KOS's pvr_txr_load_dma?

Thanks,
Troy

a128
June 21st, 2005, 06:33
The CPU loop, and maybe the other loops too, may be slower because your gcc version is not as good as the gcc 3.3.5 I used (the CPU loop in particular could suffer from a bad gcc version).

Try uploading the test.elf from the archive. It should give the same results.

Christuserloeser
June 21st, 2005, 07:45
Very good news :) Thanks for your great work, A128 :)

ron
June 21st, 2005, 07:51
Why don't you try upgrading GCC to 3.4.2? Results seem much better than with older versions. Anyway, thanks very much for your work :D

GPF
June 21st, 2005, 14:48
The results were from the test.elf, except that I ran $KOS_OBJCOPY -O binary -R .stack test.elf upload.bin first, since my version of dc-tool-ip doesn't work with ELF files.

I will try to compile it soon, test it with GCC 4.0.0, and post the results.

Is your example faster than the KOS functions? Or are these functions that are not in KOS now?

Thanks,
Troy

quzar
June 21st, 2005, 14:55
It will be slightly slower if you upload over the BBA than if you use a coder's cable or a CD.

a128
June 21st, 2005, 15:50
Does KOS use DMAC channel 1 for any operation? I guess not. Or did I miss something in the KOS source tree?

For PVR DMA I do not know if it's faster. It's basically what KOS does, except for some settings, which I think are closer to how you have to set up the PVR DMA.
Just have a look at the source code. I have put notes where it's different compared to KOS.

Does anyone know why the color table DMA (which is also able to move memory from Sh4->PVR) is not working in that demo?

GPF
June 21st, 2005, 16:19
OK, I'll take a look at the source. I'm not that familiar with the PVR code in KOS, other than that PVR DMA doesn't work with Chankast :)

Thanks,
Troy

Morph
June 21st, 2005, 18:55
It may be possible that you get different results based on the hardware revision you use it on.

Rone
June 21st, 2005, 20:20
test kos 1.2 (build gcc-3.0.4) linux

gcc-3.0.4:

test memory Sh4->PVR
51342 ms dmac 32bit mode
16296 ms dmac 256bit mode
40772 ms cpu loop
16319 ms sq loop
6241 ms dma loop
test memory Sh4->Sh4
8920 ms dmac 256bit copy loop
18966 ms CPU 32bit copy loop

gcc-3.4.4:

test memory Sh4->PVR
51341 ms dmac 32bit mode
16296 ms dmac 256bit mode
40744 ms cpu loop
16319 ms sq loop
6262 ms dma loop
test memory Sh4->Sh4
(does not work)

quzar
June 21st, 2005, 22:16
I've always pondered writing some sort of benchmark suite for the DC to determine the differences between the two SH4s used in them. According to Renesas (whom I emailed about the question), the second version of the SH chip used in the DC was a revision made for heat and energy efficiency, so benchmarks with it might be slightly different.

Morph
June 22nd, 2005, 17:39
True, and making it heat and energy efficient doesn't necessarily make it more powerful. Heck, for the 5GHz project, they used a Northwood Pentium 4, which was actually an older core by then.

Phantom
June 22nd, 2005, 19:16
Supplied .elf:

test memory Sh4->PVR
50478 ms dmac 32bit mode
15494 ms dmac 256bit mode
39055 ms cpu loop
15519 ms sq loop
6081 ms dma loop
test memory Sh4->Sh4
8920 ms dmac 256bit copy loop
18967 ms CPU 32bit copy loop

GCC 3.4.4:

test memory Sh4->PVR
50470 ms dmac 32bit mode
15488 ms dmac 256bit mode
39019 ms cpu loop
15511 ms sq loop
6100 ms dma loop
test memory Sh4->Sh4
(..HANG..)

EDIT: The Sh4->PVR times I get differ from what other people have reported so far.
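
To put a number on how far two sets of timings are apart, a relative difference is usually enough. The helper below is a small sketch for that. Its name is made up for this example, and which run to treat as the reference (here the timings from the first post) is an arbitrary choice.

```c
/* Hypothetical helper: relative difference of a measured time against a
 * reference time, in percent. Negative means the measured run was faster.
 * Returns 0.0 if the reference is zero, to avoid dividing by zero. */
double percent_diff(double measured_ms, double reference_ms)
{
    if (reference_ms == 0.0)
        return 0.0;

    return (measured_ms - reference_ms) / reference_ms * 100.0;
}
```

With the supplied .elf numbers above against the first post, percent_diff(50478, 51344) is about -1.7 for the 32-bit DMAC line and percent_diff(15494, 16298) is about -4.9 for the 256-bit DMAC line, while GPF's 51362 ms against 51344 ms is a difference of well under 0.1 percent.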

a128
June 24th, 2005, 15:45
Benchmarks are great, and it is fun to test against different compilers and so on.

But it would be great if someone used DMAC / PVR DMA in an original Dreamcast homebrew project.


Well, that gcc 3.4.x does not work with the DMAC demo is not a surprise to me, because I also had problems with GCC > 3.3.5.