Status of general MMX span routines. 09/08/97 Checked in MMX code. There is no way that the current code will compile and run. I haven't even tried to compile it. This is primarily to have it backed up and to let anyone that is interested see what has been done. The orginal C (or MCP) code are comments of these ASM or MAS files. The ACP directory contains a program that generates the .INC file for offsets to all the data. This program was used by Drew and seems to work better than H2INC. We should probably only have one of these that would go in the inc directory, but it's not done that way now (Plus, my code doesn't generate it based on a makefile. Three regular registers have been set aside for use to access the data. Since these are passed to every routine, I don't have to pass anything on the stack as long as I don't modify them. I have modified them a couple of times before I added this and they need to be changed to esi, edi, ebp or eax (eax is usually used for the next indirect jump). ebx is a pointer to the D3DI_SPANITER data (Also Accesses the SI stuff inside it). ecx is a pointer to the D3DI_RASTPRIM data. edx is a pointer to tge D3DI_RASTSPAN data. There are a few very useful m4 macros to acess this data in readable way (It also made converting C code easier): define(`XpCtxSI',`[ebx+D3DI_SPANITER_$1]')dnl define(`XpCtx',`[ebx+D3DI_RASTCTX_$1]')dnl define(`XpP', `[ecx+D3DI_RASTPRIM_$1]')dnl define(`XpS', `[edx+D3DI_RASTSPAN_$1]')dnl Things that need to be done. 1) New Special W divide. MMX newton's method code has already been written, but it was very specialized (I negated the OoW and OoWDX so that 2 - Oow*iW could be done with a pmadd and a few other things.) Code shouldn't have to change much. 2) Assembly equivalents to the ACMP, ZCMP macros. A version of these has also been written, but most compares were done in a reverse order (to preserve registers). The MMX Alpha and Z setup will most likely have to be different. This means that the atest.asm has not been coded. A test.mas file is written, and is missing ZCMP16 and ZCMP32. The other 4 specific code cases are done exactly like the C version except the iXorMask always seems to be inverted do to how the comparison is done. 3) BufWrite is not implemented. The code for doing this has been done in APP notes. The 16 bit cases use a pmaddw to combine the colors more quickly than shifting. There is also work beening done on a quick dithering routine. The MMX dithering routine will use a pcmpgtw to compare with the dither table and the do a psubssw since if the color value is to be incremented, then the mask will be all ones (= -1). Subtracting it will increment the color. The saturation is used to not increase too much. The only problem to this is that the color is unsigned so it has to be shifted down by one to saturate to 7fff. 4) BuffRead is not done. It uses almost identical routines as those in texread. 5) Lots of clean up and 64 bit constants that need to be in memory. I have to figure out what registers get passed to routines that are called and what is passed back. In some cases, it may be possible to pass data from one bead to the next using registers. This maybe difficult though. 6) ColorBld conversion. Mostly ROP stuff and calling of bldfuncs.asm. ROP stuff should be pretty easy. 7) Since function names are the same, if I made a header file declaring them extern "C" { }, the assembly code could concievably execute in place of the current c code. This is where the true bomb test is. 8) There's probably more, but there is always more.