windows-server-2003/inetsrv/iis/lkrhash/todo.html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>GeorgeRe's ToDo List for LKRhash</title>
<link rel=Stylesheet type="text/css" media=all href="./docs/lkr.css">
</head>

<body>

<h1>GeorgeRe's ToDo List for LKRhash</h1>

<ul>
  <li>  <a href="#testing"> Testing</a></li>
  <li>  <a href="#kernel-mode"> Kernel Mode</a></li>
  <li>  <a href="#lock-implementation"> Lock Implementation</a></li>
  <li>  <a href="#iis6">    IIS 6.0</a></li>
  <li>  <a href="#misc">    Miscellaneous</a></li>
  <li>  <a href="#hashfn">  Hash Functions</a></li>
  <li>  <a href="#docs">    Documentation</a></li>
</ul>

<p>These file is primarily for George Reilly's benefit, so don't
worry if you don't understand all of the points.</p>


<a name="testing">
<h2>Testing</h2>
</a>

<ul>
  <li>  Write a C test, to directly test the C API. HashTest half does
      it, as it can use the public API and TypedHashTable.

  <li>  Build a publicly distributable test program, as a sample for
      customers. E.g., a dictionary component for ASP.

  <li>  hashtest: modify so that each thread can work on a completely
      separate hashtable, instead of sharing a global
      hashtable. This will allow us to discover the maximum possible
      scaling, which is probably less than the `ideal' scaling,
      #CPUs * 1-thread-perf.

  <li>  hashtest\kernel: split into driver and usermode test
      program. Usermode test program should take care of parsing the
      arguments and loading the test data into memory, then use an
      ioctl to pass it down to the kernel driver. The only code in
      the kernel driver,apart from LKRhash itself, should be the goo
      to crack the ioctl and DriverEntry. Usermode should print
      results.

  <li>  Check that a numeric key of 0 actually works in debug
      version.

  <li>  Better tests for ApplyIf family
</ul>


<a name="kernel-mode">
<h2>Kernel Mode</h2>
</a>

<ul>
  <li>  kernel locks: should they block at all? consider implications
      if running on some usermode program's thread, or of running at
      DISPATCH_LEVEL. Add support for using NtDelayExecution
      (implements Sleep) in SwitchOrSleep.

  <li>  Fix global objects in driver, including the global list of
      tables; i.e., put in the necessary magic to ensure that
      constructors and destructors of globals objects get
      called.

  <li>  implement kernel-mode version of !lkrhash

  <li>  Write a template wrapper for operator new/delete, keyed
      on (Non)PagedPool.

  <li>  Think about running at DISPATCH_LEVEL instead of
      PASSIVE_LEVEL. Which functions should be pageable
      vs. non-pageable? Memory allocators? Lock types?

  <li>  Use <i>zwapi.h</i>? Which set of NT headers should we rely upon?

  <li>  Memory allocation pool is a parameter for LKRhash public
      constructors (and LKR_Initialize), but it's ignored. Need
      virtual base for allocators? Note: m_paDirSegs allocates
      an array of CDirEntry*s.

  <li>  DONE: Control debug spew: provide some way of setting
      g_fDebugOutputEnabled. (See flags to LKR_Initialize)
</ul>


<a name="lock-implementation">
<h2>Lock Implementation</h2>
</a>

<ul>
  <li>  More volatiles for Lock_AtomicCompareAndSwap calls.

  <li>  Put delays into the inner loop of the spins, to see if that
      reduces cache line sloshing. See BLAM paper.

  <li>  Use SleepEx(ms, TRUE) instead of Sleep(ms) to enable
    APCs. Does this leave any danger of deadlocking a COM STA
    thread?

  <li>  Provide some way to expose the bit31 (precreate event)
      functionality of InitializeCriticalSectionAndSpinCount.
      Also, this function can fail.

  <li>  Document that the locks are not suitable for
      cross-process use.

  <li>  Enable per-class instrumentation for locks, as opposed to the
      current all-or-nothing system for per-lock instrumentation.
      Mostly present already, just needs a little factoring. Keep
      track of wall-clock time and sleep time too.

  <li>  Check asserts for Is(Read|Write)Unlocked. Disable
      Is(Read|Write)(L|Unl)ocked except in debug builds. They're
      fundamentally broken. In general, you can't tell if this
      thread owns the lock for reading (sometimes, not even for
      writing). It's even harder to tell that you DON'T own the
      lock, especially when some other thread does.

  <li>  Locks code. Move all member function implementations and enums
      into locks.cpp. Locks.h should be opaque declarations only.

  <li>  Experiment with using bitfields or a union of WORDS and
      BYTEs, as opposed to the current explicit masking and
      shifting.

  <li>  True waiter count in hiword? As opposed to waiters +
      writer, as currently.

  <li>  Add TryConvertSharedToExclusive member to MRSW locks.

  <li>  Look at the new EX_PUSH_LOCK in \nt\public\internal\base\inc\ntosp.h.
      Wrap in kLocks.h.

  <li>  Is the current kernel mode GetCurrentThreadId adequate;
      i.e., is the Cid unique across all processes, including
      the System one?

  <li>  Change Lock_AtomicIncrement to use `lock inc'. Ditto for
      Decrement. Don't need result of operation.

  <li>  Remove x86 `lock' prefixes and measure difference on a 1P machine.

  <li>  Experiment with putting the writer recursion count in
      m_lRW: 0xFFFF, 0xFFFE, ..., -n.

  <li>  Rewrite a few key functions, such as CSmallSpinLock::WriteLock
      or CReaderWriterLock3::ReadOrWriteLock, in x86 assembler. Keep
      portable implementation, of course.

  <li>  Add safe versions of the main entry points that do a lot
      more error checking: e.g., SafeWriteUnlock should check
      that the current thread actually owns the lock; check for
      over- or underflows; etc.

  <li>  Orphaned lock detection?

  <li>  Build locks as a separate statically linked library. Or dll?

  <li>  Add per-class initialize and cleanup functions, to be called
      from LKR_Initialize

  <li>  Make the locks throw exceptions (C++ or SEH?) in
      "impossible" conditions; e.g., recursively acquiring a
      CSmallSpinLock (=> instant deadlock), acquiring a
      writelock too many times, freeing a lock too many times.

  <li>  Experiment with queued spinlocks: see \\bustard\contrib\ReneS\mcslock

  <li>  Refcount Locks_Initialize and Locks_Cleanup?

  <li>  Move CKSpinLock, CEResource, and CFastMutex into <Locks.h>.

  <li>  Build another variant of the locks that waits on handles
      instead of using SwitchToThread. Perhaps a pool of handles?

  <li>  Add a timeout feature to Try(Read|Write)Lock

  <li>  Deadlock detection?

  <li>  In the Spin routines, keep track of how long it's been
      since spinning started. DebugBreak if we've spun for a
      long time (e.g., 10 minutes)

  <li>  For debugging purposes, keep track of all read owners of a
      multireader lock. Hang something out of TLS to keep track of
      all locks acquired by this thread.

  <li>  InitializeCriticalSectionAndSpinCount is broken on
      Win98. Declared with wrong prototype or something. Only do
      a GetProcAddr for it on NT.

  <li>  DONE: Put an InterlockedCompareExchange at the beginning of each
      outer loop in the spin routines, to ensure that there's an
      unconditional memory barrier.

  <li> DONE: Increase SL_OWNER_BITS from 4 to 8 to accommodate likely
      scenarios with locking iterators.

  <li>  DONE: Sprinkle KeEnterCriticalRegion (and KeLeaveCriticalRegion)
      in the various locks, to prevent APCs being delivered that might
      suspend the thread that holds the lock.

  <li>  NOT NEEDED, as I realized that the return value can be stored
      as a member variable that's written after the lock is acquired.
      WriteLock: make all return a value instead of `void' that's
      passed into WriteUnlock. Ditto for ReadLock, etc. Needed for
      CKSpinLock and OldIrql. <sigh/>
</ul>


<a name="iis6">
<h2>IIS 6.0</h2>
</a>

<ul>
  <li>  Reduce the three versions of LKRhash in IIS6 to just
      this one. Definitely get rid of the IISUTIL version, even
      if not allowed to fix the IISRTL/LISRTL version.

  <li>  Remove all uses of the deprecated iterators from the
      IIS6 code base.

  <li>  Port !lkrhash to ulkd. Update !cache or whatever it's called.

  <li>  Copy the debug instrumentation from http.sys into CKSpinLock
      and CEResource.
</ul>


<a name="misc">
<h2>Miscellaneous</h2>
</a>

<ul>
  <li>  Don't __declspec(dllexport) CLKRLinearHashTable.
        Don't __declspec(dllexport) whole classes,
        just public methods.

  <li>  Use ASSERTIONAL, PRECONDITION, etc, specialized macros.

  <li>  Add typedefs for RecordPointer and HashSignature, to ease
        porting to AWE-type environment.

  <li>  Run hashtest under the AppVerifier.

  <li>  Compile with PREFast and /W4

  <li>  Get rid of IsBad*Ptr wrappers in irtldbg.h. Clean up irtldbg.h and
  irtlmisc.h.

  <li>  Experiment with __assume().

  <li>  Get rid of all calls to Interlocked* in src\LKR-*.cpp.
      Replace with trivial methods. Have the methods use the lock type
      to decide whether to use interlocked or non-threadsafe operations.
      Goal is to have non-threadsafe version use no interlocked ops
      whatsoever.

  <li>  Intel's Hyper-Threading presentation says to use a
      "Fill line size" of 128 bytes.

  <li>  Provide BulkAdd and BulkDelete functions that work in
      terms of a collection of records. It should be cheaper to
      do _Expand (or _Contract) once for all affected buckets,
      than to have to do them after each insertion (deletion).

  <li>  Allow a NULL AddRefRecord(), so that reference counting
      can be entirely dispensed with.

  <li>  placement new for inline SubTable[0]

  <li>  Clear() should abandon immediately if m_cRecords==0.

  <li>  Replace use of NODES_PER_CLUMP in functions with a call
      to inline function _NodesPerClump(). This will allow
      experiments with NodeClumps whose size is set at runtime,
      not compile time. Also need to access m_dwKeySigs[i] and
      m_pvNode[i] through accessor functions.

  <li>  :%s/CListEntry/LIST_ENTRY/g. This allows us to get rid
      of the global ctors/dtors (which are a pain in the kernel)
      and just initialize/test empty in LKR_Initialize and
      LKR_Terminate.

  <li>  Use a custom allocator. CMediumSegment is 4KB on x86,
      but the heap overhead means that it's actually two pages,
      rather than exactly one page.

  <li>  'typedef INT_PTR NodeIndex' so that loop induction works
      better on Win64.

  <li>  Experiment with disabling linear hashing's expansion and
      contraction. This would mean that we could avoid taking
      the table lock to calculate the bucket address. However,
      this would only work if the table lock is not exposed or
      explicitly disabled. It would also require the user to set
      the table size correctly in the constructor, if they don't
      use LK_{SMALL|MEDIUM|LARGE}_TABLESIZE.

  <li>  In Win64/Debug version, InsertRecord should check for
      32-bit overflow of m_cRecords. (Though it's still
      improbable. A record takes 8 bytes for the pointer + 4
      bytes for the hash signature plus some amortized overhead,
      which means that the storage for the table itself would be
      more than 12 * 4GB. And that excludes any storage for the
      &gt; 2^32 records.)

  <li>  Add a "contention level" flag to the constructor. If the
      number of subtables is not specified explicitly (i.e.,
      LK_DFLT_NUM_SUBTBLS is passed to ctor), we key the number of
      subtables off LK_TABLESIZE multiplied by a function of the
      number of CPUs. However, we only need a lot of subtables under
      two circumstances: (a) many millions of elements (esp. on
      Win64, where total number of elements might someday exceed 2^32),
      or (b) high contention. There isn't necessarily a correlation
      between large table size and high contention. With the
      multi-reader locks, high contention only arises if there are a
      lot of insertions/deletions. If the table is not modified much
      after it's built, contention shouldn't be an issue and there's
      no real advantage to having a lot of subtables.

  <li>  [Suggestion by BAlam] Provide a way to "seal" hashtable.
      Once the records have been inserted, call Seal(). All
      valid operations thereafter (except dtor) are readonly, so no
      table or bucket locks need be taken.

  <li> Provide a way to disable all locking. Caller takes
      responsibility for guaranteeing that all operations are
      threadsafe, either because table has thread affinity or
      because a higher-level lock is being used.

  <li>  Add optional refcount parameter to template
      wrapper. Refcount all operations, hide dtor, and provide a
      Destroy method instead. Fix issues with some tables not
      shutting down gracefully.

  <li>  Benchmark CLKRHashTable with one subtable vs. CLKRLinearHashTable.

  <li>  Experiment with faster functions for finding subtable,
      esp. when number of subtables is not a power of 2.
      Probably just use middle 6 bits of ((hash &lt;&lt; 13) - hash)
      and use that to index into a lookup table of byte-sized
      remainders.

  <li>  Use .w -&gt; .h stuff (hsplit) to filter out MS-confidential stuff
      from the headers, so that LKRhash.h and Locks.h can be
      rendered fit for public consumption?

  <li>  Check that CLKRLinearHashTable can still be used as the
      base class for CTypedHashTable.

  <li>  Add a default constructor to CTypedHashTable that uses a
      <code>static const char* ClassName()</code> method.

  <li>  Should operator== and operator!= be made friend
      functions for the iterators, instead of members?

  <li>  Add version.subversion number to CLKRLinearHashTable

  <li>  Ensure all relevant fields are printed in !lkrhash

  <li>  Add -k flag to !lkrhash to enumerate all known custom
      extensions.

  <li>  The publicly exposed versions of the Lock accessors
      should use the Safe versions of the lock entrypoints. The
      internal versions (as called by the LKRhash code itself)
      can continue to use the fast versions.

  <li>  Remove the deprecated iterators from the main code before
      releasing it. Bundle them in a separate subdirectory for
      the few holdouts. Clarify that they will not be released
      again.

  <li>  Double-check <code>nCmp = pstrKey1-&gt;m_cch - pstrKey2-&gt;m_cch;</code>
      in WordHash.h

  <li>  Use C#-style XML documentation comments

  <li>  Write some documentation. Does there need to be a sanitized
      version for external consumption? Think about using XSL to
      achieve this.

  <li>  Write a tutorial. Perhaps a little phone book class
      (email =&gt; phone number, then firstname+lastname =&gt;
      phone number). Extend it with a reverse map (phone number
      =&gt; name); mention canonicalization

  <li>  See if division or multiplication is better in
      InsertRecord expansion test. Ditto for DeleteKey/Record.
      Experiment with using shifts for division. For example,
      NumRecords/7 ~= NumRecords * 9 / 64
      = (NumRecords &gt;&gt; 3) + (NumRecords &gt;&gt; 6).
      Don't need perfect mathematical accuracy.

  <li>  Don't use modulo in CLKRHashTable::_SubTable, if m_cSubTables
      is not a power of 2. Either round the m_cSubTables up or down
      to the nearest power of 2, or compute a modulo table.

  <li>  Add a context pointer to all of the *Key/Apply functions
      that will be passed down to CompareKeys and HashKey.
      E.g., do a case-sensitive or -insensitive match.

  <li>  Make the m_pfn's be const pointers and member-initialize them.

  <li>  Get rid of __LKRHASH_NO_NAMESPACE__; i.e., require the namespace.

  <li>  Refcount LKR_Initialize and LKR_Terminate?

  <li>  Add some flags to LKR_Initialize: default size, output
      tracing, etc

  <li>  Should ApplyIf(LKP_DELETE) call the Action function before
      deleting? Or add LKP_PERFORM_DELETE[_STOP]

  <li>  ApplyIf should not take the bucket lock. This would
      permit recursive calls to routines that need to take the
      bucket lock. Careful! There may be race conditions, if
      another thread holds an incompatible bucket lock but has
      been suspended.

  <li>  Implement fMultiKeys to provide support for multiple,
      identical keys. Needed for EqualRange, hash_multiset, and
      hash_multimap. See CLKRLinearHashTable::_InsertRecord for
      details on what needs to be changed. Be sure to actually
      compare keys for equality: matching hash signatures is not
      enough.

  <li>  Provide FindMulti, DeleteMulti(OUT Record*** pppRecords
      OPTIONAL), and FreeMulti.

  <li>  Provide implementations of the STL collection classes:
      hash_map, hash_set, hash_multimap, and hash_multiset. Must
      provide full implementation of STL iterator semantics
      (pair&lt;key,value&gt;). Use the SGI version of hash_map
      as an inspiration, not the Dinkumware version. See
      Austern's book.

  <li>  Provide const_iterators for STL-style iterators. But see
      what Scott Meyers has to say about this in "Effective STL."

  <li>  Consider providing some kind of implicit locking with
      STL-style iterators. Either per-subtable or all subtables.

  <li>  We need to call _ExtractKey a lot. Would it be better to
      cache the DWORD_PTR in each node? On the other hand,
      _ExtractKey is usually very cheap (typically add the
      offset of the key to the base address of the record), so
      the space-time tradeoff may not be worth it.

  <li>  Add debug print routines for the other enumerations and for
      lock state to LKRhash. Place in a separate file that can be
      shared with lkrdbg.

  <li>  Provide mapping functions, LKRetcodeToHResult,
      LKRetcodeToWin32Error, and LKRetcodeToNtStatus, for
      canonical translations.

  <li>  Build lkrhash as a separate statically linked library

  <li>  Make CBucket be a union with BYTE[BUCKET_BYTE_SIZE], to
      improve cache-line alignment. But really ought to be using
      an allocator that produces page-aligned locations for
      maximal benefit: i.e., something like RockAll instead of
      the NT heap or the CRT.

  <li>  Better Statistics: #buckets, density, by subtable

  <li>  Experiment with new hash function from Paul Larson and
      others.

  <li>  Experiment with GUID hashing functions. Supposedly the
      first DWORD is enough. Rip all the IIDs, CLSIDs, etc out
      of the registry and create random GUIDs with UuidCreate()
      and MAC-based GUIDs with UuidCreateSequential() to test
      the distributions.

  <li> Public API in LKR-hash.h: remove dependency on irtlmisc.h
      and irtldbg.h

  <li>  Coalesce CLKRLinearHashTable_Iterator members and
      CLKRHashTable_Iterator members for space; i.e., put
      `short m_iNode', `BYTE m_LockType', and `BYTE m_ist'
      together in CLKRLinearHashTable_Iterator.

  <li>  Add some assertions in terms of MAX_DIRSIZE_BITS and
      NODE_CLUMP bits.

  <li>  Use IsBadCodePtr to validate callback functions in
      constructor. Of course, this is no guarantee that they'll
      still be good functions later.

  <li>  Make LKRhash exception-safe. What happens if callback routines
      (AddRefRecord/ExtractKeys or ApplyIf) access violate (throw an
      SEH exception) or throw a C++ exception? Table and bucket
      locks won't be released, state variables may be corrupted,
      etc. LKRhash code should never throw any kind of exception

  <li>  Add throw() specifications to appropriate functions?

  <li>  Add some kind of auto object for readlocking or writelocking a
      table, so that the table automatically gets unlocked by
      auto-obj's destructor.

  <li>  Use auto_ptrs.

  <li>  Port to C# (Chris Tracy has started on this).
      Andre Rosenthal has started a port to Managed C++

  <li>  DONE: Still need NULL implementations for copy ctor and op=?
      Or is this an obsolete vestige of the long-gone templated
      inner classes?

  <li>  DONE: Factor out memory allocation stuff into LKR-mem.cpp;
      initialization stuff into LKR-init.cpp; LKR-locks.cpp, etc.

  <li>  DONE: Add TryWriteLock and TryReadLock methods to table. Also
      ConvertExclusiveToShared and vice versa.

  <li>  DONE: Couch tests in _IsBucketChainMultiKeySorted et al in
      terms of a macro, CheckAndAdd.

  <li>  DONE: Experiment with having no bucket locks whatsoever, just
      subtable locks. This should make operations a little
      faster, but presumably will hurt multiprocessor
      scalability. (Have to have table locks in multithreaded
      version as linear hashing's expansion/contraction means
      that bucket address calculation is a function of several
      variables that can change at any time.)

  <li>  DONE: Add some instrumentation: #allocs, #expands, #contracts,
      #ExtractKeys, etc.

  <li>  DONE: Reduce MIN_DIRSIZE to 4 and inline
      m_aDirSegs[4]. Only allocate if directory gets bigger than
      MIN_DIRSIZE.

  <li>  DONE: Add an index member to subtable

  <li>  DONE: Inline the array of pointers to subtables within
      CLKRHashTable, instead of dynamically allocating it.

  <li> DONE: Make AddRefRecord return the new reference
      count. Should it take a `const void*' or a `void*'?
      (TypedHashTable takes a non-const `Record*'.)

  <li>  DONE: Provide a DeleteKey variant that returns a pointer to the
      record that is being removed from the table. In this case,
      instead of releasing reference, would implicitly transfer
      ownership of the record to the caller.

  <li>  DONE: Inline _FindBucket by hand into Delete(Key|Record),
      InsertRecord, and Find(Key|Record).

  <li>  DONE: Replace <code>IRTLVERIFY(x)</code> with
      <code>if (!x) IRTLASSERT(! "message")</code>.

  <li>  DONE: InsertRecord should not LKAR_INSERT_RELEASE if record
      addresses are same. This guarantees that calls to
      _AddRefRecord with LKAR_MIN_DELETE_FROM_TABLE <= lkar <=
      LKAR_MAX_DELETE_FROM_TABLE works.

  <li>  DONE: If m_cSubTables==1 don't bother to scramble the hash
      index in CLKRHashTable::_SubTable

  <li>  DONE: break apart lkrhash.cpp: iterators, applyif, stats, etc

  <li>  DONE: Always step forward through all subtables, to avoid
      possible deadlock, when acquiring subtable locks; i.e.,
      ensure that we have a valid lock hierarchy.

  <li>  DONE: Test new contraction algorithm.

  <li>  DONE: sublocks for ApplyIf

  <li>  DONE: Provide a C API wrapper
</ul>


<a name="hashfn">
<h2>Hash Functions</h2>
</a>

<ul>
  <li>  Follow up on case-insensitive hash function for
      Unicode. Note that (wch &amp; 0xFFDF) is inadequate.
      Note: NTFS keeps a per-volume UpcaseMap for Unicode.

  <li>  Provide A/W string hash functions that take a count,
      instead of relying on zero-termination.
</ul>


<a name="docs">
<h2>Documentation</h2>
</a>

<ul>
  <li>  Write a tutorial
</ul>
</ul>


<hr>

<address></address>

<!-- hhmts start -->
Last modified:
Fri Feb 15 13:15:54 2002 
<!-- hhmts end -->

</body>
</html>