Status of general MMX span routines.

09/08/97 Checked in MMX code.
  There is no way that the current code will compile and run.  I haven't
  even tried to compile it.  This is primarily to have it backed up and
  to let anyone who is interested see what has been done.
  The original C (or MCP) code is kept as comments in these ASM or MAS
  files.

  The ACP directory contains a program that generates the .INC file
  of offsets to all the data.  This program was used by Drew and
  seems to work better than H2INC.  We should probably have only one
  of these, and it would go in the inc directory, but it's not done that
  way now.  (Plus, my code doesn't generate it based on a makefile.)
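
  As a rough illustration of what such an offset generator can look
  like, here is a minimal C sketch.  It is not the ACP program.  The
  structure, its fields, and the helper names (EXAMPLE_RASTSPAN,
  format_offset, EMIT_OFFSET, write_inc) are all invented for the
  example; only the D3DI_RASTSPAN_<field> naming pattern comes from
  the m4 macros in this note.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a rasterizer structure.  The real fields
   live in the project's headers and are not reproduced here. */
typedef struct {
    int   iX;        /* invented field */
    int   iY;        /* invented field */
    void *pSurface;  /* invented field */
} EXAMPLE_RASTSPAN;

/* Format one "PREFIX_field equ offset" line into buf.
   Returns snprintf's result (characters needed, excluding the NUL). */
static int format_offset(char *buf, size_t size, const char *prefix,
                         const char *field, size_t offset)
{
    return snprintf(buf, size, "%s_%s equ %lu\n",
                    prefix, field, (unsigned long)offset);
}

/* Stringize the field name and let the compiler supply its offset. */
#define EMIT_OFFSET(out, type, prefix, field)                          \
    do {                                                               \
        char line_[128];                                               \
        format_offset(line_, sizeof line_, (prefix), #field,           \
                      offsetof(type, field));                          \
        fputs(line_, (out));                                           \
    } while (0)

/* Write the include-file lines for the example structure. */
static void write_inc(FILE *out)
{
    EMIT_OFFSET(out, EXAMPLE_RASTSPAN, "D3DI_RASTSPAN", iX);
    EMIT_OFFSET(out, EXAMPLE_RASTSPAN, "D3DI_RASTSPAN", iY);
    EMIT_OFFSET(out, EXAMPLE_RASTSPAN, "D3DI_RASTSPAN", pSurface);
}
```

  Calling write_inc(stdout) from a small main and redirecting the
  output would give an include file.  Because the compiler computes
  the offsets, they cannot drift from the C headers as long as the
  generator is rebuilt whenever the headers change.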

  Three general-purpose registers have been set aside to access the data.
  Since these are passed to every routine, I don't have to pass anything
  on the stack as long as I don't modify them.  Some code written before
  I adopted this convention does modify them; those uses need to be
  changed to esi, edi, ebp or eax (eax is usually used for the next
  indirect jump).

  ebx is a pointer to the D3DI_SPANITER data (it also accesses the SI
    stuff inside it).
  ecx is a pointer to the D3DI_RASTPRIM data.
  edx is a pointer to the D3DI_RASTSPAN data.

  There are a few very useful m4 macros to access this data in
  a readable way (they also made converting the C code easier):

define(`XpCtxSI',`[ebx+D3DI_SPANITER_$1]')dnl
define(`XpCtx',`[ebx+D3DI_RASTCTX_$1]')dnl
define(`XpP', `[ecx+D3DI_RASTPRIM_$1]')dnl
define(`XpS', `[edx+D3DI_RASTSPAN_$1]')dnl

  Things that need to be done.
    1) New Special W divide.   MMX Newton's method code has already
        been written, but it was very specialized (I negated the
        OoW and OoWDX so that 2 - OoW*iW could be done with a pmaddwd,
        and a few other things).  Code shouldn't have to change much.
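
        For reference, the Newton step for a reciprocal is
        iW' = iW * (2 - OoW*iW).  The scalar C sketch below shows
        why negating OoW helps: the bracket becomes a plain
        multiply-add, 2 + (-OoW)*iW.  The 16.16 fixed-point format
        and the names FX_SHIFT, newton_step and newton_recip are
        assumptions made for the example; the actual MMX code and
        its data format are not shown in this note.

```c
#include <stdint.h>

#define FX_SHIFT 16   /* assumed 16.16 fixed point */

/* One Newton-Raphson step toward 1/OoW, taking OoW already negated.
   Assumes the estimate is in the convergent range (0 < OoW*iW < 2),
   so the bracket below stays positive. */
static int32_t newton_step(int32_t neg_oow, int32_t iw)
{
    /* (-OoW) * iW, a 32.32 product */
    int64_t prod = (int64_t)neg_oow * iw;

    /* 2 + (-OoW)*iW in 32.32: a single multiply-add */
    int64_t t = ((int64_t)2 << (2 * FX_SHIFT)) + prod;

    /* iW * bracket, scaled back down to 16.16 */
    return (int32_t)(((int64_t)iw * (t >> FX_SHIFT)) >> FX_SHIFT);
}

/* Refine an initial estimate iw0 of 1/oow for a number of steps. */
static int32_t newton_recip(int32_t oow, int32_t iw0, int steps)
{
    int32_t neg_oow = -oow;   /* negate once, outside the loop */
    int32_t iw = iw0;
    int i;

    for (i = 0; i < steps; i++)
        iw = newton_step(neg_oow, iw);
    return iw;
}
```

        With OoW = 0.25 (16384) and a starting guess of 3.0
        (196608), one step gives 3.75 (245760), and a few more
        steps settle within a couple of LSBs of 4.0.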

    2) Assembly equivalents to the ACMP, ZCMP macros.  A version of
        these has also been written, but most compares were done in
        reverse order (to preserve registers).  The MMX Alpha and
        Z setup will most likely have to be different.  This means
        that atest.asm has not been coded.  A test.mas file is
        written, and is missing ZCMP16 and ZCMP32.  The other 4
        specific code cases are done exactly like the C version
        except the iXorMask always seems to be inverted due to how
        the comparison is done.

    3) BufWrite is not implemented.  The code for doing this has
        been done in APP notes.  The 16 bit cases use a pmaddwd
        to combine the colors more quickly than shifting.  There
        is also work being done on a quick dithering routine.
        The MMX dithering routine will use a pcmpgtw to compare
        with the dither table and then do a psubsw: if the
        color value is to be incremented, the mask will be
        all ones (= -1), so subtracting it increments the color.
        The saturation keeps the color from increasing too far.  The
        only problem with this is that the color is unsigned, so
        it has to be shifted down by one to saturate at 7fff.
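
        The compare-then-subtract trick can be modeled one 16-bit
        lane at a time in C.  This is only a sketch of the idea
        described above, not the planned routine: the split into a
        level, a fraction and a threshold, and all of the function
        names, are assumptions made for the example.

```c
#include <stdint.h>

/* One 16-bit lane of PCMPGTW: all ones (-1) if a > b (signed), else 0. */
static int16_t lane_pcmpgtw(int16_t a, int16_t b)
{
    return (int16_t)(a > b ? -1 : 0);
}

/* One 16-bit lane of PSUBSW: a - b with signed saturation. */
static int16_t lane_psubsw(int16_t a, int16_t b)
{
    int32_t r = (int32_t)a - (int32_t)b;

    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}

/* An unsigned 16-bit color does not fit the signed range, so halve it
   first; the largest value then lands exactly on 0x7fff. */
static int16_t to_signed_range(uint16_t color)
{
    return (int16_t)(color >> 1);
}

/* Bump level by one when frac exceeds the dither threshold.
   The mask is -1 in that case, and level - (-1) == level + 1;
   saturation stops a level of 0x7fff from wrapping around. */
static int16_t dither_lane(int16_t level, int16_t frac, int16_t threshold)
{
    int16_t mask = lane_pcmpgtw(frac, threshold);

    return lane_psubsw(level, mask);
}
```

        For example, dither_lane(100, 5, 3) yields 101, while
        dither_lane(0x7fff, 5, 3) stays at 0x7fff because of the
        saturation.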

    4) BufRead is not done.  It uses routines almost identical
        to those in texread.


    5) Lots of clean up, and 64 bit constants that need to be in
        memory.  I have to figure out what registers get passed
        to routines that are called and what is passed back.
        In some cases, it may be possible to pass data from one
        bead to the next using registers.  This may be difficult
        though.

    6) ColorBld conversion.  Mostly ROP stuff and calling of
        bldfuncs.asm.  ROP stuff should be pretty easy.

    7) Since function names are the same, if I made a header
        file declaring them extern "C" { }, the assembly code
        could conceivably execute in place of the current C code.
        This is where the true bomb test is.
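
        A header of that kind might look like the sketch below.
        ExampleSpanBead and its signature are invented for the
        example; the real routine names and arguments would have
        to match the existing C code.  The stand-in definition at
        the end only exists so the sketch compiles by itself; in
        the real build the symbol would come from the assembled
        object file instead.

```c
/* Hypothetical header contents. */
#ifndef EXAMPLE_MMXSPAN_H
#define EXAMPLE_MMXSPAN_H

#ifdef __cplusplus
extern "C" {          /* keep C++ callers from mangling the names */
#endif

/* Invented name and signature, for illustration only. */
int ExampleSpanBead(int x);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_MMXSPAN_H */

/* Stand-in for the routine the assembly file would provide. */
int ExampleSpanBead(int x)
{
    return x + 1;
}
```

        The __cplusplus guard lets the same header be included
        from both C and C++ sources.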

    8) There's probably more, but there is always more.