Wednesday, December 31, 2008

Memory Access Optimization

I recently committed a change that adds "MemoryAccessOptimization". I expect it to give better results than StackOptimization. It is disabled by default for now.

StackOptimization is an optimization applied when dynamically recompiling N64 code into PSP code. It generates less code for memory accesses, based on the assumption that the N64 always accesses RAM rather than I/O when it uses the SP register.

Memory Access Optimization is based on a different assumption: a given instruction will access either RAM or I/O, but not both. So if we believe an instruction will only access RAM, we can generate better PSP code for it. I believe more instructions will benefit from this approach than from stack optimization.

Most games run well with this optimization in my testing. However, Super Mario crashes. I finally tracked the crash down to an LB instruction. So in the current version I don't apply this optimization to LB, only to LBU/LH/LW/SB/SH/SW.
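To make the idea concrete, here is a minimal Python sketch of the choice the recompiler makes per instruction. It is only a model, not the Daedalus code: the real dynarec emits PSP machine code, and every name here (RAM_SIZE, read_io, generic_load_byte, fast_load_byte, pick_path, the seen_only_ram flag) and the tiny address map are assumptions made up for illustration. How Daedalus actually decides that an instruction "only accesses RAM" is not shown.

```python
# Illustrative model only; names and address map are hypothetical.
RAM_SIZE = 0x1000            # tiny pretend RAM, size chosen arbitrarily
ram = bytearray(RAM_SIZE)

# Opcodes that get the optimization in the current version (LB is excluded).
OPTIMIZED_OPS = {"LBU", "LH", "LW", "SB", "SH", "SW"}


def read_io(address):
    """Stand-in for a hardware register read (hypothetical)."""
    return 0


def generic_load_byte(address):
    """Slow path: decide on every access whether the address is RAM or I/O."""
    if address < RAM_SIZE:
        return ram[address]
    return read_io(address)


def fast_load_byte(address):
    """Fast path: valid only if this instruction never touches I/O."""
    return ram[address]


def pick_path(op, seen_only_ram):
    """Choose which code shape to generate for one instruction."""
    if op in OPTIMIZED_OPS and seen_only_ram:
        return "fast"
    return "generic"
```

With this model, pick_path("LW", True) selects the fast path, while pick_path("LB", True) still falls back to the generic path, mirroring the LB exclusion described above.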

Wednesday, December 17, 2008

What Does Speed Hack Mean

In Daedalus, SpeedHack mainly refers to the technique for avoiding busy waits (http://en.wikipedia.org/wiki/Busy_wait). On a real N64, the RSP, GPU and DMA work in parallel with the CPU, and a busy wait is used to wait for another component to complete its work. In Daedalus, however, we emulate almost everything serially, so a busy wait does nothing except slow down the emulation.

The technique used in Daedalus is to skip these busy waits. The hardest part is detecting the different types of busy wait.
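As a rough illustration of why skipping helps, the following Python sketch compares spinning through a busy wait with jumping the cycle counter straight to the next scheduled event. The Scheduler class, its method names and the cycle numbers are all hypothetical and are not taken from Daedalus.

```python
import heapq


class Scheduler:
    """Toy event scheduler: events are (cycle, name) pairs (hypothetical)."""

    def __init__(self):
        self.cycle = 0
        self.events = []

    def add_event(self, cycle, name):
        heapq.heappush(self.events, (cycle, name))

    def run_busy_wait(self, cycles_per_iteration=2):
        """Naive emulation: execute the loop body until the next event is due."""
        iterations = 0
        next_cycle, _ = self.events[0]
        while self.cycle < next_cycle:
            self.cycle += cycles_per_iteration
            iterations += 1
        return iterations

    def skip_busy_wait(self):
        """Speed hack: move the cycle counter directly to the next event."""
        next_cycle, _ = self.events[0]
        self.cycle = max(self.cycle, next_cycle)
```

Both methods leave the emulated clock at the same point; the difference is that the naive version emulates every loop iteration on the way there.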

The detection is divided into two parts. When we find a jump instruction that jumps to its own address, we mark the instruction as a SpeedHack. Then, when we process the delay slot, we decide how to handle it.

So far we can only detect the "loop to self" kind of busy wait. Within that kind, we can detect the following different types.

0x80242e54 0x1000ffff B --> 0x80242e54
0x80242e58 0x00000000 NOP

0x80026054 0x08009815 J 0x80026054 ?
0x80026058 0x00000000 NOP

0x7f0d01e8 0x5443ffff BNEL v0 != v1 --> 0x7f0d01e8
0x7f0d01ec 0x24420004 ADDIU v0 = v0 + 0x0004

0x7f14a08c 0x5464ffff BNEL v1 != a0 --> 0x7f14a08c
0x7f14a090 0x24630001 ADDIU v1 = v1 + 0x0001

0x800006a4 0x1450ffff BNE v0 != s0 --> 0x800006a4
0x800006a8 0x00000000 NOP

0x8011ec14 0x0623ffff BGEZL s1 >= 0 --> 0x8011ec14
0x8011ec18 0x2631ffff ADDIU s1 = s1 + 0xffff


The first 2 types are very common. They busy-wait until an event happens. We can safely ignore the actual code and skip to the next event (most of the time, the VBlank interrupt).

The other 3 types are not so common. I will leave them as they are. If we find they are very common, we can implement specific code to handle them.
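The two-part detection can be sketched in Python as follows. This is a simplified model rather than the Daedalus implementation: the function names and the returned strings are invented for illustration, and only a subset of MIPS branch opcodes is decoded. The target computation follows the standard MIPS encoding (branch target = address of the delay slot plus the sign-extended 16-bit offset times 4).

```python
NOP = 0x00000000


def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def jump_target(pc, word):
    """Return the target of a branch/jump at pc, or None if not decoded."""
    opcode = word >> 26
    if opcode == 0x02:  # J: 26-bit word index inside the current 256 MB region
        return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)
    rt = (word >> 16) & 0x1F
    is_branch = opcode in (0x04, 0x05, 0x14, 0x15)       # BEQ, BNE, BEQL, BNEL
    is_regimm = opcode == 0x01 and rt in (0, 1, 2, 3)    # BLTZ, BGEZ, BLTZL, BGEZL
    if is_branch or is_regimm:
        return (pc + 4 + (sign_extend16(word & 0xFFFF) << 2)) & 0xFFFFFFFF
    return None


def classify(pc, word, delay_slot):
    """Part 1: find a jump to itself. Part 2: inspect the delay slot."""
    if jump_target(pc, word) != pc:
        return "not a self loop"
    opcode = word >> 26
    # J, or B (which is encoded as BEQ zero, zero)
    is_unconditional = opcode == 0x02 or (word >> 16) == 0x1000
    if is_unconditional and delay_slot == NOP:
        return "skip to next event"
    return "leave as is"
```

Fed with the pairs listed above, classify(0x80242e54, 0x1000ffff, 0x00000000) and classify(0x80026054, 0x08009815, 0x00000000) both report that the loop can be skipped, while the BNEL, BNE and BGEZL loops are recognised as self loops but left alone.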

Saturday, December 6, 2008

JPEG support in daedalus

Wally told me that the green room in Zelda:OoT is due to the lack of JPEG support. So I decided to take a quick look at whether I could port the JPEG task support from Mupen64.

Some technical background: in the N64, the secondary CPU is called the RSP, and Daedalus emulates the RSP at a high level. Several types of task run on the RSP. The two best known are DList (display list) and AList (audio list). Daedalus supports both now, which is why we can see the graphics and hear the audio. Two other types, VidTask and JpegTask, are not supported yet. JpegTask decodes JPEG data into a picture. Without this support, the room in Zelda is green.
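The high-level dispatch described here can be pictured with a small Python sketch. It is purely illustrative: the enum values, handler names and return strings are invented, and the real task type constants used by the N64 OS are not reproduced here.

```python
from enum import Enum, auto


class TaskType(Enum):
    """Task kinds named in the post; the numeric values are placeholders."""
    DLIST = auto()
    ALIST = auto()
    VID_TASK = auto()
    JPEG_TASK = auto()


def process_dlist(task):
    return "graphics rendered"   # stand-in for display list processing


def process_alist(task):
    return "audio mixed"         # stand-in for audio list processing


# Only the supported task types have a handler.
HANDLERS = {
    TaskType.DLIST: process_dlist,
    TaskType.ALIST: process_alist,
}


def run_task(task_type, task=None):
    """Dispatch one RSP task to its high-level handler, if there is one."""
    handler = HANDLERS.get(task_type)
    if handler is None:
        return "unsupported"
    return handler(task)
```

In this picture, adding JPEG support amounts to registering a handler for TaskType.JPEG_TASK so that run_task no longer reports it as unsupported.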

I ported the code, which was pretty straightforward, and tested it in the PC version of Daedalus. Everything seems to work well, and I no longer get a green room. However, when I run the PSP version, I still get the green room. I don't know enough to track down graphics issues. I suppose something is missing in our graphics support.

So, in short, I still haven't gotten rid of the green room on the PSP, even though I ported the JPEG support code.

Friday, December 5, 2008

OSHLE for new dynarec

I spent some time working on Daedalus. One big change I made is adding back support for OSHLE.

OS HLE is a technique for simulating an operating system call at a high level, which can bring a significant performance increase. However, the limitation is obvious: you have to find every OS function in the ROM, understand its functionality, and implement the patch function in C.

Originally, Daedalus had about 100 OS functions that were already understood. The approach was to replace the first instruction of the OS function with an invalid opcode. When the emulator reads this special op (OP_PATCH), the patch function is called.

I am not quite sure why OSHLE was not enabled in the original Daedalus version. My guess is that the original approach makes it hard to handle the OS function's return. (An OS function may return with either ret or eret. If you are familiar with x86 assembly, these are the same as ret and iret.)

So I decided to use a different approach instead of patching the instruction. The new method makes it easy to integrate the OSHLE patch functions into the new dynarec engine. For every OS patch function, I create a fragment, which is then used like any normal fragment. I get benefits like fragment linking for free. (If you want to learn more about the dynarec engine, please refer to this paper: Dynamo: A Transparent Dynamic Optimization System.)

Inside the fragment, I dynamically generate the call to the patch function, then emit code to check the return code. Based on the return code, either the indirect exit (RET) stub or the ERET exit stub is called accordingly.
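The shape of such a fragment can be modelled in Python like this. It is a sketch under stated assumptions, not the generated PSP code: the return code values, the cpu dictionary and the stub behaviour (return through ra for RET, through epc for ERET) are simplifications chosen for illustration.

```python
# Hypothetical return codes a patch function might report.
RETURN_NORMAL = 0
RETURN_ERET = 1


def ret_stub(cpu):
    """Indirect exit: continue at the return address register."""
    cpu["pc"] = cpu["ra"]
    return cpu["pc"]


def eret_stub(cpu):
    """ERET exit: continue at the exception return address (simplified)."""
    cpu["pc"] = cpu["epc"]
    return cpu["pc"]


def make_patch_fragment(patch_func):
    """Build a fragment: call the patch, check its code, pick an exit stub."""
    def fragment(cpu):
        code = patch_func(cpu)
        if code == RETURN_ERET:
            return eret_stub(cpu)
        return ret_stub(cpu)
    return fragment
```

Because the result is just another fragment, the rest of the engine can treat it like compiled N64 code, which is what makes features such as fragment linking come for free.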