|
|
TYPES OF RESTRICTIONS
1) Content Restriction
Input: <property>, <text>, <fuzzy level>
Matches documents which contain <text> in <property>.
<property> may be any textual Ole property or a special property. The special properties include CONTENTS (the main body of the document), ALL (search all properties), and user-defined PSEUDO-PROPERTIES (text distinguished for purposes of content search).
<fuzzy level> describes how exactly <text> has to match the document. Fuzzy level 0 is exact match. Fuzzy level 1 is prefix match (each word is treated as a prefix). Fuzzy level 2 is morphological stemming (run would match run, running, ran, etc.)
The result of a content query may be out-of-date.
2) Property Restriction
Input: <property>, <relop>, <value>
Matches documents where <property> <relop> <value>.
<property> must be a true Ole property, or a few special properties that are valid only in query results. The special properties are RANK (how well the restriction matches the object), HITCOUNT (number of content index 'hits'), and RANK VECTOR (for use with vector restriction)
<relop> is one of: <, <=, =, !=, >=, >, SOME OF, and ALL OF. The last two are bitwise operations valid only for integer types. In C++ syntax, SOME OF is (<property> & <value>) != 0, and ALL OF is (<property> & <value>) == <value>.
<value> is a STGVARIANT.
The result of a property query always reflects the last saved state of all objects.
TYPES OF INDEXES
1) Content Index
The content index is a mapping of <property>,<words> back to the documents which contain <words> in <property>.
There is no scoping within the content index.
The content index is lazily updated. It may be out-of-date.
2) Value Index
A value index is a mapping from <property>,<range of values> back to the documents which have a value within <range of values> for the <property>.
In other words, the possible range of values for a data type (VT_FILETIME, VT_I4, etc) is divided into "buckets". Every possible value falls into one of these buckets. Note that the mapping is from bucket to document, not value to document. A search for SIZE == 500 might map to a bucket from 250 to 525. So the result of index lookup would be all files from SIZE 250 to 525, not just those having SIZE == 500.
There is no scoping within a value index.
Value indices can be used in conjunction with content index. They are lazily updated with the same frequency as content index.
There is no administration necessary to set up value indices. All properties are value indexed except a few hard-coded exceptions. This may change in the future.
3) View Index
A view index is a B-Tree. It contains a complete sorted list of files for a single directory. Besides key columns, the view can contain additional unsorted columns. These improve retrieval efficiency but have less effect on query efficiency.
View indices must be created by an administrator.
4) Directory Index
Listed for completeness. This is a view index on the filename property. It is always available.
RULES FOR MATCHING QUERY WITH INDEX (in order of precedence)
1) If a query contains a content restriction, use content index, adding value indices if appropriate.
2) If one or more properties of a property restriction are used in the sort order of a view index, and the query is shallow, then use view index.
Note that properties of the view must be used in order. A view on SIZE and FILENAME could be used for queries involving SIZE, and queries involving both SIZE and FILENAME, but not for queries involving just FILENAME.
If more than one view is applicable, then the view in which the most keys of the sort appear in the restriction is used. Thus given two views: SIZE, FILENAME and SIZE, ATTRIBUTES, a query for SIZE and FILENAME would use the former.
3) If one or more properties of a property restriction is value indexed, and the value index is reasonably up-to-date, and the query is shallow, then use value indexing.
4) If 1, 2, and 3 do not apply, or if the volume is downlevel (not Ofs), then use the directory index (e.g. enumeration).
|