<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Microsoft Index Server Guide: Understanding Index Server</title>
<meta name="FORMATTER" content="Microsoft FrontPage 1.1">
<meta name="GENERATOR" content="Microsoft FrontPage 1.1">
</head>
<body bgcolor="#FFFFFF">
<!--Headerbegin--><p align=center><a name="TOP"><img src="onepix.gif" alt="Space" align=middle width=1 height=1></a> <a href="default.htm#Top"><img src="toc.gif" alt=" Contents" align=middle border=0 width=89 height=31></a> <a href="front.htm"><img src="previous.gif" alt="Previous" align=middle border=0 width=32 height=31></a> <a href="install.htm"><img src="next.gif" alt="Next" align=middle border=0 width=32 height=31></a> </p>
<hr>
<!--Headerend--><p><a name="UnderstandingIndexServer"><font size=6><strong>Understanding Index Server</strong></font></a></p>
<p><!--Chaptoc--></p>
<blockquote>
<p><a href="intro.htm#QueryForms">Query Forms</a> <br>
<a href="intro.htm#BasicQueryingFeatures">Basic Querying Features</a> <br>
<a href="intro.htm#BasicIndexingFeatures">Basic Indexing Features</a> <br>
<a href="intro.htm#SupportforMultipleLanguages">Support for Multiple Languages</a> <br>
</p>
</blockquote>
<hr>
<!--ChaptocEnd--><p>Microsoft Index Server is the Microsoft content-indexing and searching solution for Microsoft Internet Information Server
(IIS) and Peer Web Services (PWS). An add-on module for IIS and PWS, Microsoft Index Server is designed to index the
full text and properties of documents on an IIS-based (or PWS-based) server. Index Server can index documents both on
corporate intranets and on any drive accessible through a uniform naming convention (UNC) path on the Internet.</p>
<p>Clients can formulate queries by using any World Wide Web (WWW) browser to fill in the fields of a simple Web query form.
The Web server forwards the query to the query engine, which finds the pertinent documents and returns the results to
the client formatted as a Web page.</p>
<p>Unlike many content-indexing systems, Index Server can index the text and properties of formatted documents, such as those
created by Microsoft&#174; Word or Microsoft&#174; Excel. This feature lets you publish existing documents on your intranet Web
without converting them to HyperText Markup Language (HTML).</p>
<hr>
<h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="QueryForms">Query Forms</a></h1>
<p>Users submit their queries by filling out fields in a form. With Index Server, the administrator for a Web server can create
customized forms to help users find documents at the local site. The administrator can modify the form so that the user can
search by contents or by other document properties, such as author or subject. The administrator creates a query form with
standard HTML, so the form is little more than a Web page itself. Anyone who knows how to create Web pages
with HTML can put together a simple query form in minutes.</p>
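<p>As a minimal sketch of such a form, the markup below posts a single text field to the server. The action path /scripts/query.idq and the field name CiRestriction are assumptions made for illustration; this section does not define them, so substitute the names your own installation expects.</p>
<pre>
&lt;!-- Hypothetical query form: the action path and field name are assumptions --&gt;
&lt;form action="/scripts/query.idq" method="get"&gt;
  &lt;p&gt;Search for: &lt;input type="text" name="CiRestriction" size="40"&gt;&lt;/p&gt;
  &lt;p&gt;&lt;input type="submit" value="Search"&gt;&lt;/p&gt;
&lt;/form&gt;
</pre>
<p>Because the form is ordinary HTML, further fields (for example, one for author or subject) can be added in the same way.</p>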
<hr>
<h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="BasicQueryingFeatures">Basic Querying Features</a></h1>
<p>These are the basic features of a query:</p>
<ul>
<li>Scope</li>
<li>Restriction</li>
<li>Result set</li>
</ul>
<p>The <em>scope</em> tells the query engine where to look when searching. It describes the set of documents within the <a href="glossary.htm#Corpus">corpus</a> that will be
searched. The <em>restriction</em> is the test that determines whether a document should be returned. A restriction is a set of terms that can be combined by
various operators. The <em>result set</em> defines the information to return from a query. </p>
<p>In addition to the basic features, other features let you control how results are returned and displayed, for example, how results
are sorted. You can also:</p>
<ul>
<li>Limit queries to specific scopes</li>
<li>Search for words and phrases within document contents</li>
<li>Search for words or phrases near another word or phrase</li>
<li>Search for words and phrases within textual properties (for example, @DocAuthor Sally)</li>
<li>Compare properties against a constant with &lt;, &lt;=, =, &gt;=, or &gt; (for example, DATE &gt; 1/1/95)</li>
<li>Apply the Boolean operators <strong>AND</strong>, <strong>OR</strong>, and <strong>NOT</strong></li>
<li>Search with wildcards (for example, &#147;*&#148;, &#147;?&#148;, and <a href="glossary.htm#regex">regular expressions</a>)</li>
<li>Fully integrate searches with the Windows NT security model</li>
<li>Rank hits by quality</li>
<li>Return specified property data</li>
</ul>
<h2>Scope</h2>
<p>A query scope specifies the set of documents that must be searched. Typically, a scope is specified by a directory path on a
storage volume, such as D:\Docs. IIS and PWS Web sites correspond to virtual roots that point to a collection of documents. </p>
<p>Index Server indexes documents based on sites. An administrator can index all the sites on a server, or select a subset of sites
to index. Queries can be run against multiple sites, against a single site, or even against a single physical directory within a site.</p>
<h2>Restriction</h2>
<p>You can query against the contents of Web pages and other documents served by IIS (or PWS) and Index Server. The types
of documents you can query include HTML, Microsoft&#174; Word, Microsoft&#174; Excel, Microsoft&#174; PowerPoint&#174;, and plain text
documents. Other document types are not supported by Index Server directly, but a content filter can extend the list of
supported document types. A content filter reads a proprietary document format and emits textual words, which are indexed
by Index Server. For more information on content filters, contact Microsoft and ask about the IFilter interface.</p>
<p>With Index Server you can search for multiple words and phrases within documents as well as words and phrases near other
words and phrases. Index Server also provides free-text queries. With <em>free-text queries</em>, you can enter any set of words or
phrases, or even a complete sentence, as the query restriction. Index Server will examine this text, identify all the nouns and
noun phrases, and post a query using those terms. For example, assume you typed the following free-text query:</p>
<blockquote>
<p><em>The Fulton County Grand Jury said Friday an investigation of Atlanta&#146;s recent primary election
produced no evidence that any irregularities took place.</em></p>
</blockquote>
<p>Index Server identifies the following words and noun phrases:</p>
<blockquote>
<p><strong>Words: </strong>Fulton, county, grand, jury, Friday, investigation, Atlanta, recent, primary, election, produce,
evidence, irregularity</p>
<p><strong>Phrases: </strong>Fulton county grand jury, primary election, grand jury, Atlanta&#146;s recent primary election</p>
</blockquote>
<p>These words and phrases are combined into a restriction, weighted for proper ranking, and posted as a query against the
corpus.</p>
<p><strong>Note</strong>&#160;&#160;&#160;The free-text query is preceded by <em>$contents</em>.</p>
<h3>Property Restrictions</h3>
<p>In addition to querying contents, users can query properties stored on objects. These properties include file size, creation and
modification dates, file names, authors, and so on. Clients can query both textual properties (file name and author, for
example) and numerical properties (size and modification date, for example). Clients can also query all ActiveX&#153; properties,
including custom properties on Microsoft Office documents.</p>
<p>You can use the standard comparison operators in queries. These include =, &gt;, &lt;, &gt;=, &lt;=, and != (not equal) for numeric and
textual properties. In addition, all the content query functionality is available for textual properties. Properties can be compared
only to constants; you cannot compare one property to another in the first release of Index Server. With Boolean operators
(<strong>AND</strong>, <strong>OR</strong>, and <strong>NOT</strong>) and parentheses, you can freely mix restriction terms. </p>
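<p>As a sketch of how such terms might be mixed, the restrictions below start from the two examples given earlier in this guide and then combine them. The combined forms are assumptions built from the operators described above, not syntax confirmed by this section, and the author name Pat is purely illustrative.</p>
<pre>
@DocAuthor Sally
DATE &gt; 1/1/95
@DocAuthor Sally AND DATE &gt; 1/1/95
(@DocAuthor Sally OR @DocAuthor Pat) AND NOT DATE &lt; 1/1/95
</pre>
<p>In each case the right-hand side of a comparison is a constant, in keeping with the rule that one property cannot be compared to another.</p>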
<h3>Fuzzy Queries</h3>
<p>Index Server supports fuzzy queries, which can contain simple wildcards (such as those in MS-DOS&#174;) and <a href="glossary.htm#regex">regular
expressions</a> (from UNIX&#174;) that are matched against textual properties. Content queries support simple prefix matching (for example, &#147;dog*&#148;
will return &#147;dogmatic&#148; and &#147;doghouse&#148;). Index Server also supports linguistic stemming, which matches inflected and base
forms of query words. (For example, &#147;swim**&#148; is expanded to &#147;swimming,&#148; &#147;swam,&#148; &#147;swum,&#148; and so on.)</p>
<p>Although Index Server does not support true natural language processing, it supports free-text mode.</p>
<h2>Result Sets</h2>
<p>Index Server assembles query hits into result sets, which are returned to the client. The administrator can limit the maximum
number of hits returned to the client. For example, a result set of 200 hits can be returned to the client in 10 pages of 20 hits
each. The query form determines the number of hits returned per page, but you can configure a form to let the client specify the
number of hits to be returned.</p>
<p>In addition to sorting by rank, Index Server can sort query results according to any document property.</p>
<p>If the corpus is stored on a Windows NT File System (NTFS) volume, Index Server respects all security restrictions&#151;access
control list (ACL) checking is performed. A user never sees a document reference in a result set if the ACL on that document
denies read access to that user.</p>
<p>If allowed, the client can specify which properties to return in a result set (that is, the columns in the result set). Any
property that is valid in a query restriction is valid as a result column. However, the administrator can restrict the properties returned
by a query.</p>
<p>In addition to returning properties stored with the document, Index Server can generate document abstracts, which can also be
returned in a result set. An abstract briefly summarizes the content of a document. A document abstract can also be part of a
query restriction.</p>
<h2>Logging</h2>
<p>IIS and PWS already log all traffic moving between a client and the server. Standard IIS and PWS logging picks up query
information such as the querying IP address and the queries posted to the server.</p>
<hr>
<h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="BasicIndexingFeatures">Basic Indexing Features</a></h1>
<p>These are the basic features of an index:</p>
<ul>
<li>Full text search in Web pages</li>
<li>Full text search in formatted data such as Microsoft Word or Microsoft Excel documents</li>
<li>Incremental refreshing of indexes</li>
<li>Control of indexing for each virtual path</li>
<li>Indexing of property values</li>
<li>Indexing of text regardless of language</li>
<li>Automatic index updates</li>
<li>Performance monitoring</li>
<li>Zero-maintenance design, 24-hour reliability</li>
<li>Multithreading to take advantage of SMP computers</li>
</ul>
<p>Indexes are controlled on each virtual path. An index is built over a set of directories (and their child directories). By default,
you can incrementally refresh an index&#151;that is, refresh an index by indexing only changed files. Index Server does not need to
re-index all the documents to pick up a few changes.</p>
<p>With Index Server, a number of different performance monitors help administrators optimize their query service. These
monitors measure criteria such as the number of documents that need to be indexed and how fast queries are being processed.</p>
<p>By design, Index Server requires little if any maintenance. Once it is set up, all operations are automatic, including updates, index
creation and optimization, and even crash recovery if there is a power failure or if the index gets corrupted. Index Server was
designed from the start to work in mission-critical environments where the server must be running 24 hours a day, 7 days a
week.</p>
<hr>
<h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="SupportforMultipleLanguages">Support for Multiple Languages</a></h1>
<p>Most Web pages today are authored in English, but many documents are not. Because IIS and PWS can serve such documents,
multilingual indexing and querying are standard features of Index Server. The query system was built with localization
in mind. It is completely modular and can dynamically load and unload language-specific utilities. These utilities include word
breakers, stemmers, and normalizers. These linguistic components are available for several languages.</p>
<p>Index Server can index multilingual documents and switch between languages as required (for example, index an English
paragraph, index a French paragraph, and switch back to English). All index information is stored as Unicode characters, and
all queries are converted to Unicode before they are processed.</p>
<!--Footerbegin--><hr>
<p align=center><a href="default.htm#Top"><img src="toc.gif" alt=" Contents" align=middle border=0 width=89 height=31></a> <a href="front.htm"><img src="previous.gif" alt="Previous" align=middle border=0 width=32 height=31></a> <a href="#TOP"><img src="up_end.gif" alt="To Top" align=middle border=0 width=32 height=31></a> <a href="install.htm"><img src="next.gif" alt="Next" align=middle border=0 width=32 height=31></a> </p>
<hr>
<p align=center><em>&#169; 1996 by Microsoft Corporation. All rights reserved.<!--Footerend--></em></p>
</body>
</html>