<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head> <title>Microsoft Index Server Guide: Understanding Index Server</title> <meta name="FORMATTER" content="Microsoft FrontPage 1.1"> <meta name="GENERATOR" content="Microsoft FrontPage 1.1"> </head>
<body bgcolor="#FFFFFF"> <!--Headerbegin--><p align=center><a name="TOP"><img src="onepix.gif" alt="Space" align=middle width=1 height=1></a> <a href="default.htm#Top"><img src="toc.gif" alt=" Contents" align=middle border=0 width=89 height=31></a> <a href="front.htm"><img src="previous.gif" alt="Previous" align=middle border=0 width=32 height=31></a> <a href="install.htm"><img src="next.gif" alt="Next" align=middle border=0 width=32 height=31></a> </p> <hr> <!--Headerend--><p><a name="UnderstandingIndexServer"><font size=6><strong>Understanding Index Server</strong></font></a></p> <p><!--Chaptoc--></p> <blockquote> <p><a href="intro.htm#QueryForms">Query Forms</a> <br> <a href="intro.htm#BasicQueryingFeatures">Basic Querying Features</a> <br> <a href="intro.htm#BasicIndexingFeatures">Basic Indexing Features</a> <br> <a href="intro.htm#SupportforMultipleLanguages">Support for Multiple Languages</a> <br> </p> </blockquote> <hr> <!--ChaptocEnd--><p>Microsoft Index Server is the Microsoft content-indexing and searching solution for Microsoft Internet Information Server (IIS) and Peer Web Services (PWS). An add-on module for IIS and PWS, Microsoft Index Server is designed to index the full text and properties of documents on an IIS-based (or PWS-based) server. Index Server can index documents both on corporate intranets and on any drive accessible through a universal naming convention (UNC) path on the Internet.</p> <p>Clients can formulate queries by using any World Wide Web (WWW) browser to fill in the fields of a simple Web query form. The Web server forwards the query to the query engine, which finds the pertinent documents and returns the results to the client formatted as a Web page.</p> <p>Unlike many content-indexing systems, Index Server can index the text and properties of formatted documents, such as those created by Microsoft® Word or Microsoft® Excel. 
This feature lets you publish existing documents on your intranet Web without converting them to HyperText Markup Language (HTML).</p> <hr> <h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="QueryForms">Query Forms</a></h1> <p>Users submit their queries by filling out fields in a form. With Index Server, the administrator of a Web server can create customized forms to help users find documents at the local site. The administrator can modify the form so that users can search by contents or by other document properties, such as author or subject. Because a query form is written in standard HTML, it is little more than a Web page itself. Any user who knows how to create Web pages with HTML can put together a simple query form in minutes.</p> <hr> <h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="BasicQueryingFeatures">Basic Querying Features</a></h1> <p>These are the basic features of a query:</p> <ul> <li>Scope</li> <li>Restriction</li> <li>Result set</li> </ul> <p>The <em>scope</em> tells the query engine where to look when searching. It describes the set of documents within the <a href="glossary.htm#Corpus">corpus</a> that will be searched. The <em>restriction</em> determines whether a document should be returned. A restriction is a set of terms that can be combined with various operators. The <em>result set</em> defines the information to return from a query. </p> <p>In addition to the basic features, other features let you control how results are returned and displayed (for example, how results are sorted). You can also:</p> <ul> <li>Limit queries to specific scopes</li> <li>Search for words and phrases within document contents</li> <li>Search for words or phrases near another word or phrase</li> <li>Search for words and phrases within textual properties
(for example, @DocAuthor Sally)</li> <li>Compare properties against a constant with &lt;, &lt;=, =, &gt;=, or &gt; (for example, DATE &gt; 1/1/95)</li> <li>Apply the Boolean operators <strong>AND</strong>, <strong>OR</strong>, and <strong>NOT</strong></li> <li>Search with wildcards (for example, “*” and “?”) and <a href="glossary.htm#regex">regular expressions</a></li> <li>Fully integrate searches with the Windows NT security model</li> <li>Rank hits by quality</li> <li>Return specified property data</li> </ul> <h2>Scope</h2> <p>A query scope specifies the set of documents to be searched. Scopes are typically specified by a directory path on a storage volume, such as D:\Docs. IIS and PWS Web sites correspond to virtual roots that point to a collection of documents. </p> <p>Index Server indexes documents by site. An administrator can index all the sites on a server or select a subset of sites to index. Queries can be run against multiple sites, against a single site, or even against a single physical directory within a site.</p> <h2>Restriction</h2> <p>You can query against the contents of Web pages and other documents served by IIS (or PWS) and Index Server. The types of documents you can query include HTML, Microsoft® Word, Microsoft® Excel, Microsoft® PowerPoint®, and plain-text documents. Index Server does not support other document types directly, but a content filter can extend the list of supported document types. A content filter reads a proprietary document format and emits the text it contains, which Index Server then indexes. For more information on content filters, contact Microsoft and ask about the IFilter interface.</p> <p>With Index Server, you can search for multiple words and phrases within documents, as well as for words and phrases near other words and phrases. Index Server also provides free-text queries. With <em>free-text queries</em>, you can enter any set of words or phrases, or even a complete sentence, as the query restriction. 
Index Server examines this text, identifies all the nouns and noun phrases, and posts a query using those terms. For example, suppose you type the following free-text query:</p> <blockquote> <p><em>The Fulton County Grand Jury said Friday an investigation of Atlanta’s recent primary election produced no evidence that any irregularities took place.</em></p> </blockquote> <p>Index Server identifies the following words and noun phrases:</p> <blockquote> <p><strong>Words: </strong>Fulton, county, grand, jury, Friday, investigation, Atlanta, recent, primary, election, produce, evidence, irregularity</p> <p><strong>Phrases: </strong>Fulton county grand jury, primary election, grand jury, Atlanta’s recent primary election</p> </blockquote> <p>These words and phrases are combined into a restriction, weighted for proper ranking, and posted as a query against the corpus.</p> <p><strong>Note</strong>   A free-text query is preceded by <em>$contents</em>.</p> <h3>Property Restrictions</h3> <p>In addition to querying contents, users can query properties stored on objects. These properties include file size, creation and modification dates, file names, authors, and so on. Clients can query both textual properties (file name and author, for example) and numeric properties (size and modification date, for example). Clients can also query all ActiveX™ properties, including custom properties on Microsoft Office documents.</p> <p>You can use the standard comparison operators in queries. These include =, &gt;, &lt;, &gt;=, &lt;=, and != (not equal) for both numeric and textual properties. In addition, all content-query functionality is available for textual properties. Properties can be compared only to constants; in the first release of Index Server, you cannot compare one property to another. With the Boolean operators <strong>AND</strong>, <strong>OR</strong>, and <strong>NOT</strong> and with parentheses, you can freely mix restriction terms. 
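</p> <p>As a sketch only, the following query form pre-fills a restriction that mixes property terms with <strong>OR</strong> and <strong>AND</strong>, a comparison against a constant, and parentheses. The form action <code>/scripts/query.idq</code> is hypothetical, and the form assumes that the site's Internet Data Query (.idq) file reads the <code>CiRestriction</code>, <code>CiScope</code>, and <code>CiMaxRecordsPerPage</code> variables; check these names against the .idq file your site actually uses.</p>
<pre>
&lt;!-- Hypothetical query form: the .idq path and variable names are assumptions --&gt;
&lt;form action="/scripts/query.idq" method="GET"&gt;
  &lt;!-- Restriction: two property terms joined by OR, then ANDed with a size comparison --&gt;
  &lt;input type="text" name="CiRestriction" size="60"
         value="(@DocAuthor Sally OR @DocAuthor Bob) AND @Size &amp;lt; 100000"&gt;
  &lt;!-- Scope: the whole site; 20 hits per result page --&gt;
  &lt;input type="hidden" name="CiScope" value="/"&gt;
  &lt;input type="hidden" name="CiMaxRecordsPerPage" value="20"&gt;
  &lt;input type="submit" value="Search"&gt;
&lt;/form&gt;
</pre>
<p>Because the restriction is an ordinary text field in this sketch, users can edit the pre-filled terms before submitting the query.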
</p> <h3>Fuzzy Queries</h3> <p>Index Server supports fuzzy queries, which match simple wildcards (such as those in MS-DOS®) and <a href="glossary.htm#regex">regular expressions</a> (as in UNIX®) against textual properties. Content queries support simple prefix matching (for example, “dog*” matches “dogmatic” and “doghouse”). Index Server also supports linguistic stemming, which matches inflected and base forms of query words. (For example, “swim**” is expanded to “swimming,” “swam,” “swum,” and so on.)</p> <p>Although Index Server does not support true natural language processing, it supports free-text queries.</p> <h2>Result Sets</h2> <p>Index Server assembles query hits into result sets, which are returned to the client. The administrator can limit the maximum number of hits returned to the client. For example, a result set of 200 hits can be returned to the client in 10 pages of 20 hits each. The query form determines the number of hits returned per page, but you can configure a form to let the client specify that number.</p> <p>In addition to sorting by rank, Index Server can sort query results by any document property.</p> <p>If the corpus is stored on a Windows NT File System (NTFS) volume, Index Server respects all security restrictions by performing access control list (ACL) checking. A result set never includes a reference to a document whose ACL denies the client read access.</p> <p>If allowed, the client can specify which properties to return in a result set (that is, the columns in the result set). Any property that is valid in a query restriction is valid as a result column. However, the administrator can restrict the properties that a query returns.</p> <p>In addition to returning properties stored with the document, Index Server can generate document abstracts, which can also be returned in a result set. An abstract briefly summarizes the content of a document. 
A document abstract can also be part of a query restriction.</p> <h2>Logging</h2> <p>IIS and PWS already log all traffic between a client and the server. Standard IIS and PWS logging records query information such as the querying IP address and the queries posted to the server.</p> <hr> <h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="BasicIndexingFeatures">Basic Indexing Features</a></h1> <p>These are the basic features of an index:</p> <ul> <li>Full-text search in Web pages</li> <li>Full-text search in formatted data such as Microsoft Word or Microsoft Excel documents</li> <li>Incremental refreshing of indexes</li> <li>Control of indexing for each virtual path</li> <li>Indexing of property values</li> <li>Indexing of text regardless of language</li> <li>Automatic index updates</li> <li>Performance monitoring</li> <li>Zero-maintenance design, 24-hour reliability</li> <li>Multithreading to take advantage of SMP computers</li> </ul> <p>Indexing is controlled for each virtual path. An index is built over a set of directories (and their child directories). By default, indexes are refreshed incrementally; that is, only changed files are re-indexed, so Index Server does not need to re-index all the documents to pick up a few changes.</p> <p>Index Server provides a number of performance monitors to help administrators optimize their query service. These monitors measure criteria such as the number of documents waiting to be indexed and how fast queries are being processed.</p> <p>By design, Index Server requires little if any maintenance. Once it is set up, all operations are automatic, including index updates, index creation and optimization, and even crash recovery after a power failure or index corruption. 
Index Server was designed from the start to work in mission-critical environments where the server must be running 24 hours a day, 7 days a week.</p> <hr> <h1><a href="#TOP"><img src="up.gif" alt="To Top" align=middle border=0 width=14 height=11></a><a name="SupportforMultipleLanguages">Support for Multiple Languages</a></h1> <p>Most Web pages today are authored in English, but many documents are not. Because IIS and PWS can serve documents in any language, multilingual indexing and querying are standard features of Index Server. The query system was built with localization in mind. It is completely modular and can dynamically load and unload language-specific utilities, including word breakers, stemmers, and normalizers. These linguistic components are available for several languages.</p> <p>Index Server can index multilingual documents and switch between languages as required (for example, index an English paragraph, index a French paragraph, and switch back to English). All index information is stored as Unicode characters, and all queries are converted to Unicode before they are processed.</p> <!--Footerbegin--><hr> <p align=center><a href="default.htm#Top"><img src="toc.gif" alt=" Contents" align=middle border=0 width=89 height=31></a> <a href="front.htm"><img src="previous.gif" alt="Previous" align=middle border=0 width=32 height=31></a> <a href="#TOP"><img src="up_end.gif" alt="To Top" align=middle border=0 width=32 height=31></a> <a href="install.htm"><img src="next.gif" alt="Next" align=middle border=0 width=32 height=31></a> </p> <hr> <p align=center><em>© 1996 by Microsoft Corporation. All rights reserved.<!--Footerend--></em></p> </body>
</html>