Video DMA Interface.


    This note outlines a proposed interface between the display driver,
    miniport and the video port. The objective is to provide an interface
    for the display driver that optimizes DMA throughput for devices that
    support scattergather while maintaining system throughput. The design
    also optimizes locking, performing the DMA and unlocking into one
    IOCTL if desired, a significant performance win. The idea is to provide
    a handle available to the display driver that represents locked memory
    and for that handle to be returned from each DMA IOCTL request. The
    miniport fields some of theses IOCTLs and calls into the video port for
    support.

    This interface supports only PCI busmaster devices.

    ///////////////////////////////////////////////////////////////////
    Video port to miniport interface.
    ///////////////////////////////////////////////////////////////////

    The miniport must provide 2 things at DriverEntry time:

        A) Set Master in VIDEO_PORT_CONFIG_INFO to TRUE.
        B) Provide a callback HwStartDma() of type PVIDEO_HW_START_DMA
        in the VIDEO_HW_INITIALIZATION_DATA.

    Then the following interfaces can be used:


    The NT video port support now exports to the miniport the following
    function:

    1)      BOOLEAN
            VideoPortLockPages(
                IN      PVOID                   HwDeviceExtension,
                IN OUT  PVIDEO_REQUEST_PACKET   pVrp
                IN OUT  PEVENT                  pMappedUserEvent,
                IN      PEVENT                  pDisplayEvent,
                IN      DMA_FLAGS               DmaFlags
                );

    This routine can be called by the miniport to do busmaster DMA for
    DMA devices. It returns TRUE if successful and FALSE if not successful.
    It can only be called in the context of an IOCTL. It cannot be called
    from an ISR or DPC.

    Its arguments are:
        1) A pointer to a DEVICE_EXTENSION.
        2) A pointer to a VIDEO_REQUEST_PACKET, whose OutputBuffer it may
        modify. The InputBuffer must be the virtual address of the memory
        to be locked. The InputBufferSize must be the size of that memory.
        The output buffer will receive a PDMA from which may be extracted
        a pointer to a scattergather list of physical pages which comprise
        the locked down virtual address (via GET_VIDEO_SCATTERGATHER). From
        this pointer, one can extract the physical address of any virtual
        address (see GET_VIDEO_PHYSICAL_ADDRESS).
        3) A pointer to a mapped user event, which may be set by the miniport.
        This is either a valid event returned from EngMapEvent or NULL. This
        should be received from the display driver. It will be passed into
        HwStartDma every time the resulting handle from VideoPortLockPages()
        is passed into the Video Port for a DMA operation. All events can only
        be set in the miniport, not waited on.
        4) A pointer to an event received from the display driver, intended
        to be used by the display driver to wait on DMA completion. May be
        NULL. Again, may only be set in miniport.
        This pointer to an event will also be passed into HwStartDma every
        time the resulting handle is passed into the Video Port for a DMA
        operation.
        4) An enum of type DMA_FLAGS defined in video.h. The values can be:

            a) VideoPortUnlockAfterDma
                This value should be used for a "one shot" dma action, where
                the memory is locked and a dma handle passed to HwStartDma(),
                then the memory is unlocked after the miniport signals via
                setting pDmaCompletionEvent.

            b) VideoPortKeepPagesLocked
                This value should only be used for dedicated graphics units.
                It does not guarantee that the virtual memory passed in will
                remain locked after a dma has completed, only that the system
                will try to keep it locked.

            c) VideoPortDmaInitOnly
                A typical initialization value. If used, the InputBuffer must
                contain a pointer to virtual memory. The HwStartDma will not
                be called in this case (see below). The memory will remained
                locked if possible.

    2)  PDMA
        VideoPortDoDma(
            IN      PVOID       HwDeviceExtension,
            IN      PDMA        pDma,
            IN      DMA_FLAGS   DmaFlags
            );

    Routine Description:

        This function is called by the miniport when a it has a valid DMA
        handle to cause HwStartDma to be called. It can be called outside
        the context of an IOCTL, but not from an ISR. It must execute at
        irql <= DISPATCH_LEVEL.


    Arguments:

        HwDeviceExtension - Pointer to miniport HWDeviceExtension.

        pDma              - Non - NULL DMA handle returned by this routine or
            VideoPortLockPages() in OutputBuffer.

        DmaFlags          - Flags specifying desired action.


    Return Value:

        Non NULL pDma if the corresponding memory is still locked, NULL
        otherwise.

    3)  PVOID
        VideoPortGetCommonBuffer(
            IN  PVOID                       HwDeviceExtension,
            IN  ULONG                       DesiredLength,
            IN  ULONG                       Alignment,
            OUT PVOID *                     pVirtualAddress,
            OUT PPHYSICAL_ADDRESS           pLogicalAddress,
            OUT PULONG                      pActualLength,
            IN  BOOLEAN                     CacheEnabled
            );

    Routine Description:

        Provides physical address visible to both device and system. Memory seen as contiguous
        by device. This routine can only be reliably called at driver load time. Memory allocated
        must be less than 256K.

    Arguments:
        HwDeviceExtension   - device extension available to miniport.
        DesiredLength       - size of desired memory (should be minimal).
        Alignment           - Desired liagnment of buffer, currently unused.

        pVirtualAddress     - unused.
        pLogicalAddress     - [out] parameter which will hold physical address of
                            of the buffer upon function return.
        pActualLength       - Actual length of buffer.
        CacheEnabled        - Specifies whether the allocated memory can be cached.

    Return Value:
        Virtual address of the common buffer.

    4)  PDMA
        VideoPortGetMdl(
            PVOID   HwDeviceExtension,
            PDMA    pDma
            );
    Routine Description:

        Returns a PMDL representing the page table of the locked buffer.

    Arguments:
        HwDeviceExtension   - device extension available to miniport.
        pDma                - Dma handle received from either VideoPortLockPages()
            or VideoPortDoDma().

    Return Value:
        A PMDL reprsenting the locked buffer.



    5)  BOOLEAN
        VideoPortSignalDmaComplete(
            IN  PVOID               HwDeviceExtension,
            IN  PVOID               pDmaHandle
            )
        /*++

        Routine Description:


        Arguments:

            HwDeviceExtension - a pointer to the miniport HW_DEVICE_EXTENSION.

            pDmaHandle  - the handle returned in the output buffer of the
            VIDEO_REQUEST_PACKET after VideoPortLockPages() returns.

        Return Value:

            TRUE if the DPC was scheduled, FALSE otherwise.

        --*/


    6)  PVOID
        VideoPortGetDmaContext(
            IN  PVOID   HwDeviceExtension,
            IN  PDMA    pDma
            );
        /*++

        Routine Description:


        Arguments:

            HwDeviceExtension - a pointer to the miniport HW_DEVICE_EXTENSION.

            pDma  - the handle returned in the output buffer of the
            VIDEO_REQUEST_PACKET after VideoPortLockPages() returns.

        Return Value:

            The Context previously associated with this PDMA.

        --*/


    7)  VOID
        VideoPortSetDmaContext(
            IN  PVOID   HwDeviceExtension,
            OUT PDMA    pDma,
            IN  PVOID   InstanceContext
            );

        /*++

        Routine Description:


        Arguments:

            HwDeviceExtension - a pointer to the miniport HW_DEVICE_EXTENSION.

            pDma  - the handle returned in the output buffer of the
            VIDEO_REQUEST_PACKET after VideoPortLockPages() returns.

            InstanceContext - any PVOID supplied by user.

        Return Value:

            NONE.

        --*/

    8)  ULONG
        VideoPortGetBytesUsed(
            IN  PVOID   HwDeviceExtension,
            IN  PDMA    pDma
            );

        /*++

        Routine Description:


        Arguments:

            HwDeviceExtension - a pointer to the miniport HW_DEVICE_EXTENSION.

            pDma  - the handle returned in the output buffer of the
            VIDEO_REQUEST_PACKET after VideoPortLockPages() returns.

        Return Value:

            The number of bytes used in the buffer associated with this PDMA

        --*/

    9)  VOID
        VideoPortSetBytesUsed(
            IN      PVOID   HwDeviceExtension,
            IN OUT  PDMA    pDma,
            IN      ULONG   BytesUsed
            );
        /*++

        Routine Description:


        Arguments:

            HwDeviceExtension - a pointer to the miniport HW_DEVICE_EXTENSION.

            pDma  - the handle returned in the output buffer of the
            VIDEO_REQUEST_PACKET after VideoPortLockPages() returns.

            BytesUsed - The number of bytes written to the buffer.

        Return Value:

            NONE.

        --*/




    ///////////////////////////////////////////////////////////////////
    Display driver to miniport interface (IOCTL interface).
    ///////////////////////////////////////////////////////////////////

    1) Miniport IOCTL routine.

    The design attemps to optimize DMA transfer from a fixed piece of virtual
    memory. These IOCTLs are to be defined by the miniport and the following
    descriptions of the IOCTLs are only suggestions.

    IOCTL_VIDEO_DMA_INIT - set by DispDrvr

        Causes VideoPortLockPages() to be called by the miniport where the DMA_FLAGS is
        set to VideoPortDmaInitOnly. This should only be done in graphics dedicated
        contexts, where disk and network IO is secondary to video IO. Driver writers
        must be aware that system performance (e.g. WinBench) can be damaged by leaving
        memory locked.

    IOCTL_VIDEO_DMA_TRANSFER_KEEP_LOCKED - set by DispDrvr

        The miniport may set up this IOCTL such that VideoPortDoDma() is called with the
        VideoPortKeepPagesLocked DMA_FLAGS argument. This scenario is oriented to graphics
        dedicated applications. It optimizes throughput so that the HwStartDma() routine
        in the miniport is called. In this way, DMA operations are optimized such that
        memory resources to other parts of the system are constrained.

        The scatter gather list available to HwStartDma() is valid:
            a) before the routine HwStartDma() returns.
            b) if HwStartDma() returns asynchronously, the list may be valid
            if the system memory manager is not stressed. If the PEVENT
            which is the fourth argument to HwStartDma() is set when the DMA is
            done, it will remain valid until then. If the PEVENT is not set,
            the list may become invalid at random times. This PEVENT must
            be set by :
                VideoPortSetEvent(HwDeviceExtension, PEVENT);
            c) if HwStartDma returns synchronously and the system memory
            manager is not stressed, the list will remain valid. if HwStartDma()
            return synchronously and the memory manager is stressed, the list
            will become invalid after HwStartDma returns.


    IOCTL_VIDEO_DMA_TRANSFER_ONCE - set by DispDrvr

        The miniport may set up this IOCTL so that VideoPortLockPages() is called with
        the VideoPortUnlockAfterDma DMA_FLAGS. This allows the buffer to be locked down,
        the HwStartDma miniport routine to be called back and the memory to be unlocked
        after the dma transfer has completed. This IOCTL is tuned for systems in which
        disk and network IO is very important and memory may be at a premium.

        The same remarks apply to the scatter gather list as for
        IOCTL_VIDEO_TRANSFER_KEEP_LOCKED.


    IOCTL_VIDEO_DMA_UNLOCK_PAGES - set by DispDrvr

        The miniport should simply call VideoPortUnlockPages() with the appropriate
        PDMA.


    Locking memory

    The video port must lock down the memory in order to perform dma. The
    amount of memory locked down is restricted by three things:
        1) Maximal number of physical page breaks supported by the driver.
        This is strictly a function of the dma hardware.
        2) The number of map registers the system has available at
        initialization. This is usually not bounded for busmaster devices,
        except by that indicated by HalGetAdapter().
        3) System performance contraints. In order to provide reasonable
        throughput for other parts of the system, the amount of memory the
        video port allows to be locked down is currently set as follows:
            a) small systems (12-16 meg)  256k
            b) medium systems (16-31 meg) 512k
            c) large systems (>32 meg)    1M

        These default values can be overridden by setting MaxDmaSize value under the
        Devicexxx key in the CurrentControlSet in the registry. Again, drivers which
        leave more than these amounts of memory locked down can severely impact system
        performance.

    Examples:

    from a miniport StartIo routine (note that the display driver formatting
    is unique to the display driver):

    case IOCTL_VIDEO_DMA_INIT:
       {
           //
           // This IOCTL should only be used for buffers that may remained locked
           // down for more than one dma transfer.
           //
           //
           // Map display driver representation into video port. Display
           // driver input buffer is of form
           //
           //   typedef struct _DMA_CONTROL
           //   {
           //           void          * pBitmap;        // Pointer to memory
           //                                           // to be locked.
           //           ULONG           ulSize;         // size of memory to
           //                                             // be locked.
           //           PVOID             pDma;           // Dma handle [OUT].
           //           PVOID           * pPhysAddr;      // Location to put
           //                                             // Physaddr.
           //           PEVENT            pDisplayEvent   // PEVENT.
           //           PEVENT            pMappedUserEvent// Mapped User Mode EVENT
           //                                             // handle.
           //   } DMA_CONTROL, *PDMA_CONTROL;
           //
           //
           //
           //

           PDMA_CONTROL pDmaCtrl = (PDMA_CONTROL) RequestPacket->InputBuffer;
           PUCHAR       ptmp     = pDmaCrtl->bitmap;
           ULONG        size     = pDmaCrtl->size;

          VideoDebugPrint(( 0,"\t InputBuffer:%x\n", ptmp));
          VideoDebugPrint(( 0,"\t InputBufferLength:%x\n", size));

          //
          // Save the location to put physaddr.
          //

          busAddress  = (PULONG)(pDmaCtrl->pPhysAddr);

           RequestPacket->InputBuffer       = ptmp;
           RequestPacket->InputBufferLength = size;

           if (RequestPacket->OutputBufferLength <
                                    (RequestPacket->StatusBlock->Information =
                                            sizeof(ULONG) )) {

              VideoDebugPrint((0, "IOCTL_VIDEO_DMA_INIT error1\n"));
              status = ERROR_INSUFFICIENT_BUFFER;
              break;
           }


           if (!VideoPortLockPages(HwDeviceExtension,
                                   RequestPacket,
                                   pMappedUserEvent,
                                   pDisplayEvent,
                                   VideoPortDmaInitOnly))
              {
                 RequestPacket->StatusBlock->Information = 0;
                 VideoDebugPrint((0, "IOCTL_VIDEO_DMA_INIT error2\n"));
                 status = ERROR_INSUFFICIENT_BUFFER;
              }
              else
              {
                //
                // Have to extract Physical address from scatterlist via DMA context in
                // OutputBuffer and put it back into OutputBuffer.
                //
                PVOID *  ppDmaHandle = (PVOID *)(RequestPacket->OutputBuffer);
                PVRB_SG  pSG = GET_VIDEO_SCATTERGATHER((PULONG)ppDmaHandle);
                ULONG    physaddr;

                pDmaCrtl->DmaHandle = *ppDmaHandle;

                hwDeviceExtension->IoBufferSize = size;
                hwDeviceExtension->IoBuffer     = ptmp;

                GET_VIDEO_PHYSICAL_ADDRESS(pSG, ptmp, ptmp, &size, physaddr);

                *busAddress = physaddr;

                status = NO_ERROR;

              }

          break;
      }

    case IOCTL_VIDEO_DMA_TRANSFER:
      //
      //    This IOCTL is optimized so that the display driver can get a buffer locked,
      //    dmaed and unlocked in one IOCTL.
      //
      {
           PDSP_DMA_ARGS    pDSPDmaArgs = (PDSP_DMA_ARGS)RequestPacket->InputBuffer;
           PDSP_DMA         pDSPDma     = pDSPDmaArgs->pDmaControl;
           PUCHAR           ptmp        = pDSPDma ->bitmap;
           ULONG            size        = pDSPDma ->size;

           if (RequestPacket->InputBufferLength < sizeof(DSP_DMA_ARGS)) {
              VideoDebugPrint(( 2,"\n Insufficient Buffer" ));
              status = ERROR_INSUFFICIENT_BUFFER;
              break;
           }

           RequestPacket->InputBuffer       = ptmp;
           RequestPacket->InputBufferLength = size;

           ASSERT(pDSPDma);

           if (RequestPacket->OutputBufferLength <
                                    (RequestPacket->StatusBlock->Information =
                                            sizeof(ULONG) )) {

              VideoDebugPrint((0, "IOCTL_VIDEO_DMA_INIT error1\n"));
              status = ERROR_INSUFFICIENT_BUFFER;
              break;
           }

           hwDeviceExtension->pDSPDmaArgs    = pDSPDmaArgs;

           if (!VideoPortLockPages(HwDeviceExtension,
                                   RequestPacket,
                                   pDSPDma->pMappedUserEvent,
                                   pDSPDma->pDisplayEvent,
                                   VideoPortUnlockAfterDma))
              {
                 RequestPacket->StatusBlock->Information = 0;
                 VideoDebugPrint((0, "IOCTL_VIDEO_DMA_TRANSFER error\n"));
                 status = ERROR_INSUFFICIENT_BUFFER;
              }
           else
                 status = NO_ERROR;

           break;

      }

    case IOCTL_VIDEO_DMA_UNLOCK:
      {
        //
        //  Private cleanup code. The memory has already been unlocked.
        //  The InputBuffer contains the Dma Handle.
        //

        PDMA pDma = *(PDMA*) (RequestPacket->InputBuffer);

        VideoPortUnlockPages(HwDeviceExtension, pDma);

        break;
      }



    2) Display driver code.

    DISPDBG((1, "Target, DmaControl.pBitmap:%x, DmaControl.ulSize:%x\n", DmaControl.pBitmap, DmaControl.ulSize));

    //
    // Ask the miniport to lock the pages needed for this DMA, do the dma and unlock them.
    //

    DSPDmaArgs.pDma        = (PVOID)DmaHandle;
    DSPDmaArgs.DmaBase     = ptrgBase;
    DSPDmaArgs.WidthTrg    = widthTrg;
    DSPDmaArgs.WidthSrc    = widthSrc;
    DSPDmaArgs.width__     = width * DSPSRCPIXELBYTES;
    DSPDmaArgs.Height      = height;
    DSPDmaArgs.Offset      = ulOffset;
    DSPDmaArgs.pPhysAddr   = pPCIAddress;
    DSPDmaArgs.bitdepth    = DSPSRCPIXELBITS;
    DSPDmaArgs.HS          = TRUE;
    DSPDmaArgs.pDmaControl = &DmaControl;

    if (EngDeviceIoControl(ppdev->hDriver,
                          IOCTL_VIDEO_DMA_TRANSFER,
                          &DSPDmaArgs,
                          sizeof(DSPDmaArgs),
                          &(DSPDmaArgs.pDma),
                          sizeof(DSPDmaArgs.pDma),
                          &returnedDataLength))
    {
        DISPDBG((0, "DSP.DLL!MSDMA - EngDeviceIoControl IOCTL_VIDEO_DMA_TRANSFER Error!!!\n"));
        DISPDBG((0, "DSP.DLL!vBitbltHSDMA - Exit\n"));
        return;
    }


    The requirements for these interfaces include:

    1) In the PVIDEO_HW_INITIALIZATION_DATA, the following fields need to be filled:

        PVIDEO_HW_START_DMA  HwStartDma - a pointer to a function which can be
            called when page locking is complete. This function returns
            Dma_Async_Return if it returns before the transfer is complete or
            Dma_Sync_Return if it returns after the transfer is complete.
            These return values are typedefed in video.h. This function takes
            as arguments:
                a)  pHwDeviceExtension - a pointer to a miniport device
                    extension.
                b)  a dma handle returned from a IOCTL_VIDEO_DMA_INIT  or
                    IOCTL_VIDEO_DMA_TRANSFER call.
                c)  a PEVENT which is the user Event mapped into kernel mode.
                    (NULL if pMappedUserEvent is NULL).
                d)  pDisplayEvent - a PEVENT, intended to be created and
                    waited on by the DisplayDriver and set by the miniport.
                e)  another PEVENT which must be set if the routine returns
                    Dma_Async_Return when the DMA completes. Failure to set
                    this PEVENT may invalidate the scatter gather lists at
                    any time. It must be set by:

                    VideoPortSetEvent(HwDeviceExtension, pVPEvent);

            Also, HwDeviceExtension should make a copy of the elements of the
            irp it intends to use or pass on, as the irp will be completed when
            HwStartDma returns, and hence it's fields will be invalidated.


    2) In the PORT_CONFIG_INFO, the following fields need to be
       filled:

        ULONG           DmaChannel - a value indicating if the device supports
            DMA.

        ULONG           DmaPort    - a value indicating if the device supports
            microchannel DMA.

        ULONG           NumberOfPhysicalBreaks    - a value indicating the
            maximal number of physical breaks the device supports.

        DMA_WIDTH       DmaWidth   - a value indicating the width of the dma
            device.

        DMA_SPEED       DmaSpeed   - a value indicating the specified transfer
            speed.

        BOOLEAN         DemandMode - a BOOLEAN indicating that the device can
            be programmed for demand mode rather than single cycle operations.

        BOOLEAN         bMapBuffers- a BOOLEAN indicating if an adapter
            requires that the data buffers be mapped into virtual address space.

        BOOLEAN         NeedPhysicalAddresses - a BOOLEAN indicating that the
            driver will need to translate virtual to physical addresses.

        BOOLEAN         ScatterGather - a BOOLEAN indicating that the
            driver will support scatter gather.

        BOOLEAN         Master        - a BOOLEAN indicating that the adapter
            is a bus master. Again, currently required to be TRUE.

        ULONG           MaximumScatterGatherChunkSize - the largest contiguous
            piece of memory the dma controller can handle. This value is zero
            if and only if the size is unbounded.