  1. <html>
  2. <head>
  3. <meta name="GENERATOR" content="Microsoft FrontPage 3.0">
  4. <title>Unicode 3.0 NamesList File Structure</title>
  5. </head>
  6. <body>
  7. <h3>Unicode NamesList File Format</h3>
  8. <p>Last updated: 1999-07-06</p>
  9. <h3>1.0 Introduction</h3>
  10. <p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used
  11. to drive the layout of the character code charts in the Unicode Standard. The information
  12. in this file is a combination of several fields from the UnicodeData.txt and Blocks.txt files,
  13. together with additional annotations for many characters. This document describes the
  14. syntax rules for the file format, but also gives brief information on how each construct
  15. is rendered when laid out for the book. Some of the syntax elements were used in
  16. preparation of the drafts of the book and may not be present in the final, released form
  17. of the NamesList.txt file.</p>
  18. <p>The same input file can be used to do the draft preparation for ISO/IEC 10646 (referred
  19. to below as ISO-style). This necessitates the presence of some information in the name list
  20. file that is not needed (and is in fact removed during parsing) for the Unicode book.</p>
  21. <p>With access to the layout program (unibook.exe), it is a simple matter to create
  22. name lists for the purpose of formatting working drafts containing proposed characters.</p>
  23. <h3>1.1 NamesList File Overview</h3>
  24. <p>The *.lst files are plain text files which in their simplest form look like this:</p>
  25. <p>@@&lt;tab&gt;0020&lt;tab&gt;BASIC LATIN&lt;tab&gt;007F<br>
  26. ; this is a file comment (ignored)<br>
  27. 0020&lt;tab&gt;SPACE<br>
  28. 0021&lt;tab&gt;EXCLAMATION MARK<br>
  29. 0022&lt;tab&gt;QUOTATION MARK<br>
  30. . . . <br>
  31. 007F&lt;tab&gt;DELETE</p>
  32. <p>The semicolon (as first character), @ and &lt;tab&gt; characters are used by the file
  33. syntax and must be provided as shown. Hexadecimal digits must be in UPPER CASE. A double
  34. @@ introduces a block header, with the title and the start and end codes of the block
  35. provided as shown.</p>
  36. <p>For an ISO-style, minimal name list, only the NAME_LINE and BLOCKHEADER and their
  37. constituent syntax elements are needed.</p>
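<p>As an illustration only, the minimal form described above can be read with a few lines of Python. This is a sketch under stated assumptions, not part of the NamesList tooling: the function name parse_minimal_namelist is hypothetical, only four-digit codes are handled, and any line that is not a block header, a name line, an empty line or a file comment is rejected.</p>

```python
import re

# CHAR as used in the minimal example: four upper-case hex digits.
# (Longer code values are deliberately left out of this sketch.)
CHAR = r"[0-9A-F]{4}"
BLOCKHEADER = re.compile(rf"^@@\t+({CHAR})\t+(.+?)\t+({CHAR})$")
NAME_LINE = re.compile(rf"^({CHAR})\t+(.+)$")


def parse_minimal_namelist(text):
    """Return (blocks, names) for a minimal, ISO-style name list."""
    blocks = []   # (start, title, end) tuples
    names = {}    # code point -> character name
    for line in text.splitlines():
        if not line or line.startswith(";"):
            continue  # empty lines and file comments are ignored
        m = BLOCKHEADER.match(line)
        if m:
            blocks.append((int(m.group(1), 16), m.group(2), int(m.group(3), 16)))
            continue
        m = NAME_LINE.match(line)
        if m:
            names[int(m.group(1), 16)] = m.group(2)
            continue
        raise ValueError(f"unrecognized line: {line!r}")
    return blocks, names


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
)
blocks, names = parse_minimal_namelist(sample)
```

<p>Because the patterns accept only the digits 0-9 and the letters A-F, a lower-case code such as 007f is reported as an unrecognized line, which mirrors the upper-case requirement stated above.</p>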
  38. <p>The full syntax with all the options is provided in the following sections.</p>
  39. <h3>1.2 NamesList File Structure</h3>
  40. <p>This section defines the overall file structure.</p>
  41. <pre><strong>NAMELIST: TITLE_PAGE* BLOCK*
  42. </strong>
  43. <strong>TITLE_PAGE: TITLE
  44. | TITLE_PAGE SUBTITLE
  45. | TITLE_PAGE SUBHEADER
  46. | TITLE_PAGE IGNORED_LINE
  47. | TITLE_PAGE EMPTY_LINE
  48. | TITLE_PAGE COMMENT_LINE
  49. | TITLE_PAGE NOTICE
  50. | TITLE_PAGE PAGEBREAK
  51. </strong>
  52. <strong>BLOCK: BLOCKHEADER
  53. | BLOCK CHAR_ENTRY
  54. | BLOCK SUBHEADER
  55. | BLOCK NOTICE
  56. | BLOCK EMPTY_LINE
  57. | BLOCK IGNORED_LINE
  58. | BLOCK PAGEBREAK
  59. CHAR_ENTRY: NAME_LINE | RESERVED_LINE
  60. | CHAR_ENTRY ALIAS_LINE
  61. | CHAR_ENTRY COMMENT_LINE
  62. | CHAR_ENTRY CROSS_REF
  63. | CHAR_ENTRY DECOMPOSITION
  64. | CHAR_ENTRY COMPAT_MAPPING
  65. | CHAR_ENTRY IGNORED_LINE
  66. | CHAR_ENTRY EMPTY_LINE
  67. | CHAR_ENTRY NOTICE
  68. </strong></pre>
  69. <p>In other words:<br>
  70. <br>
  71. Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.</p>
  72. <p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, and IGNORED_LINE may
  73. occur before the first BLOCKHEADER.</p>
  74. <p>Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted sequence of
  75. the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE,
  76. CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, NOTICE, EMPTY_LINE and IGNORED_LINE.</p>
  77. <p>Except for EMPTY_LINE, NOTICE and IGNORED_LINE, none of these lines may occur in any other
  78. place. </p>
  79. <p>Note: A NOTICE displays differently depending on whether it follows a header or title
  80. or is part of a CHAR_ENTRY.</p>
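<p>The ordering rules can be sketched as a small state machine over a list of element names. This is an illustration, not the logic of unibook.exe: the function name check_order is hypothetical, the sets follow the grammar in this section (so EMPTY_LINE and NOTICE are accepted before the first BLOCKHEADER, and COMMENT_LINE inside a CHAR_ENTRY), and the requirement that a title page begins with TITLE is not enforced.</p>

```python
TITLE_PAGE_KINDS = {"TITLE", "SUBTITLE", "SUBHEADER", "PAGEBREAK", "COMMENT_LINE",
                    "IGNORED_LINE", "EMPTY_LINE", "NOTICE"}
BLOCK_KINDS = {"SUBHEADER", "NOTICE", "EMPTY_LINE", "IGNORED_LINE", "PAGEBREAK"}
ENTRY_KINDS = {"ALIAS_LINE", "COMMENT_LINE", "CROSS_REF", "DECOMPOSITION",
               "COMPAT_MAPPING", "IGNORED_LINE", "EMPTY_LINE", "NOTICE"}
ENTRY_START = {"NAME_LINE", "RESERVED_LINE"}


def check_order(kinds):
    """Return the index of the first misplaced element, or None if all fit."""
    state = "title"  # before the first BLOCKHEADER
    for i, kind in enumerate(kinds):
        if kind == "BLOCKHEADER":
            state = "block"
        elif state == "title":
            if kind not in TITLE_PAGE_KINDS:
                return i
        elif kind in ENTRY_START:
            state = "entry"
        elif state == "entry" and kind in ENTRY_KINDS:
            pass  # still inside the uninterrupted annotation sequence
        elif kind in BLOCK_KINDS:
            state = "block"  # e.g. a SUBHEADER ends the current CHAR_ENTRY
        else:
            return i
    return None
```

<p>For example, the sequence BLOCKHEADER, NAME_LINE, SUBHEADER, ALIAS_LINE is rejected at the ALIAS_LINE, because the SUBHEADER interrupts the sequence that must directly follow the NAME_LINE.</p>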
  81. <h3>1.3 NamesList File Elements</h3>
  82. <p>This section provides the details of the syntax for the individual elements.</p>
  83. <pre><small><strong>ELEMENT SYNTAX</strong> // How rendered</small></pre>
  84. <pre><small><strong>NAME_LINE: CHAR &lt;tab&gt; LINE
  85. </strong> // the CHAR and the corresponding image are echoed,
  86. // followed by the name as given in LINE
  87. <strong> CHAR &lt;tab&gt; NAME COMMENT LF
  88. </strong> // Names may have a comment, which is stripped off
  89. // unless the file is parsed for an ISO style list
  90. <strong>RESERVED_LINE: CHAR &lt;tab&gt; &quot;&lt;reserved&gt;&quot;
  91. </strong> // the CHAR is echoed followed by an icon for the
  92. // reserved character and a fixed string, e.g. &lt;reserved&gt;
  93. <strong>COMMENT_LINE: &lt;tab&gt; &quot;*&quot; SP EXPAND_LINE
  94. </strong> // * is replaced by BULLET, output line as comment
  95. <strong>&lt;tab&gt; EXPAND_LINE</strong>
  96. // output line as comment
  97. <strong>ALIAS_LINE: &lt;tab&gt; &quot;=&quot; SP LINE
  98. </strong> // replace = by itself, output line as alias
  99. <strong>CROSS_REF: &lt;tab&gt; &quot;X&quot; SP EXPAND_LINE
  100. </strong> // X is replaced by a right arrow
  101. <strong> &lt;tab&gt; &quot;X&quot; SP &quot;(&quot; STRING SP &quot;-&quot; SP CHAR &quot;)&quot;
  102. </strong> // X is replaced by a right arrow
  103. // the &quot;(&quot;, &quot;-&quot;, &quot;)&quot; are removed, the
  104. // order of CHAR and STRING is reversed
  105. // i.e. both inputs result in the same output
  106. <strong>IGNORED_LINE: &lt;tab&gt; &quot;;&quot; EXPAND_LINE
  107. EMPTY_LINE: LF
  108. </strong> // empty lines and file comments are ignored
  109. <strong>DECOMPOSITION: &lt;tab&gt; &quot;:&quot; EXPAND_LINE
  110. </strong> // replace ':' by EQUIV, expand line into
  111. // decomposition
  112. <strong>COMPAT_MAPPING: &lt;tab&gt; &quot;#&quot; SP EXPAND_LINE
  113. </strong> // replace '#' by APPROX, output line as mapping
  114. <strong>NOTICE: &quot;@+&quot; &lt;tab&gt; LINE
  115. </strong> // skip '@+', output text as notice
  116. <strong> &quot;@+&quot; &lt;tab&gt; &quot;*&quot; SP LINE
  117. </strong> // skip '@+', output text as notice
  118. // &quot;*&quot; expands to a bullet character
  119. // Notices following a character code apply to the
  120. // character and are indented. Notices not following
  121. // a character code apply to the page/block/column
  122. // and are italicized, but not indented
  123. <strong>SUBTITLE: &quot;@@@+&quot; &lt;tab&gt; LINE
  124. </strong> // skip &quot;@@@+&quot;, output text as subtitle
  125. <strong>SUBHEADER: &quot;@&quot; &lt;tab&gt; LINE
  126. </strong> // skip '@', output line as column header
  127. <strong>BLOCKHEADER: &quot;@@&quot; &lt;tab&gt; BLOCKSTART &lt;tab&gt; BLOCKNAME &lt;tab&gt; BLOCKEND
  128. </strong> // skip &quot;@@&quot;, cause a page break and optional
  129. // blank page, then output one or more charts
  130. // followed by the list of character names.
  131. // use BLOCKSTART and BLOCKEND to define
  132. // which characters belong to a block
  133. // use BLOCKNAME in page and table headers
  134. <strong> &quot;@@&quot; &lt;tab&gt; BLOCKSTART &lt;tab&gt; BLOCKNAME COMMENT &lt;tab&gt; BLOCKEND
  135. </strong>// if a comment is present it replaces the blockname
  136. // when an ISO-style namelist is laid out
  137. <strong>BLOCKSTART: CHAR</strong> // first character position in block
  138. <strong>BLOCKEND: CHAR</strong> // last character position in block
  139. <strong>PAGEBREAK: &quot;@@&quot;</strong> // insert a (column) break
  140. <strong>TITLE: &quot;@@@&quot; &lt;tab&gt; LINE</strong>
  141. // skip &quot;@@@&quot;, output line as text
  142. // Title is used in page headers
  143. <strong>EXPAND_LINE: {CHAR | STRING}+ LF </strong>
  144. // all instances of CHAR (see Notes below) are replaced by
  145. // CHAR NBSP x NBSP where x is the single Unicode
  146. // character corresponding to CHAR
  147. // If character is combining, it is replaced with
  148. // CHAR NBSP &lt;circ&gt; x NBSP where &lt;circ&gt; is the
  149. // dotted circle</small>
  150. </pre>
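<p>Since each element is recognized by its leading characters, a line can be mapped to an element name by testing prefixes in order. The sketch below is illustrative only: the names RULES and classify are hypothetical, CHAR is simplified to four or more upper-case hex digits, and a semicolon in the first column is treated as an IGNORED_LINE on the basis of the example in section 1.1.</p>

```python
import re

# Ordered (pattern, kind) pairs; the first match wins, so longer
# "@" prefixes are tested before shorter ones.
RULES = [
    (r"$", "EMPTY_LINE"),
    (r"@@@\+\t", "SUBTITLE"),
    (r"@@@\t", "TITLE"),
    (r"@@\t", "BLOCKHEADER"),
    (r"@@$", "PAGEBREAK"),
    (r"@\+\t", "NOTICE"),
    (r"@\t", "SUBHEADER"),
    (r"\t*;", "IGNORED_LINE"),
    (r"\t+= ", "ALIAS_LINE"),
    (r"\t+X ", "CROSS_REF"),
    (r"\t+:", "DECOMPOSITION"),
    (r"\t+# ", "COMPAT_MAPPING"),
    (r"\t+", "COMMENT_LINE"),
    (r"[0-9A-F]{4,}\t+<reserved>", "RESERVED_LINE"),
    (r"[0-9A-F]{4,}\t+", "NAME_LINE"),
]
COMPILED = [(re.compile(pattern), kind) for pattern, kind in RULES]


def classify(line):
    """Map one line (without its line terminator) to an element name."""
    for pattern, kind in COMPILED:
        if pattern.match(line):
            return kind
    return None  # not a recognized element
```

<p>The order of the table matters: a line beginning with a tab is only reported as a COMMENT_LINE after the more specific tab-prefixed forms have been ruled out.</p>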
  151. <h3><strong>1.4 NamesList File Primitives</strong></h3>
  152. <p>The following are the primitives and terminals for the NamesList syntax.</p>
  153. <pre><small><strong>LINE: STRING LF
  154. COMMENT: &quot;(&quot; NAME &quot;)&quot;
  155. | &quot;(&quot; NAME &quot;)&quot; &quot;*&quot;
  156. </strong>
  157. <strong>NAME</strong>: &lt;sequence of ASCII characters, except &quot;(&quot; or &quot;)&quot; &gt;
  158. <strong>STRING</strong>: &lt;sequence of Latin-1 characters&gt;
  159. <strong>CHAR</strong>: <strong>X X X X</strong>
  160. <strong>| X X X X X X X X X</strong></small>
  161. <small><strong>X: &quot;0&quot;|&quot;1&quot;|&quot;2&quot;|&quot;3&quot;|&quot;4&quot;|&quot;5&quot;|&quot;6&quot;|&quot;7&quot;|&quot;8&quot;|&quot;9&quot;|&quot;A&quot;|&quot;B&quot;|&quot;C&quot;|&quot;D&quot;|&quot;E&quot;|&quot;F&quot;
  162. &lt;tab&gt;:</strong> &lt;sequence of one or more ASCII tab characters 0x09&gt;
  163. <strong>SP</strong>: &lt;ASCII 0x20&gt;
  164. <strong>LF</strong>: &lt;any sequence of ASCII 0x0A and 0x0D&gt;
  165. </small></pre>
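<p>The primitives combine into a NAME_LINE of the form CHAR &lt;tab&gt; NAME COMMENT LF. The following sketch shows one way to split such a line. It is an assumption-laden example: the function name parse_name_line and its iso_style parameter are hypothetical, CHAR is simplified to four or more upper-case hex digits, and a single optional space is allowed between NAME and COMMENT because the grammar does not specify the separator.</p>

```python
import re

NAME_LINE = re.compile(
    r"(?P<char>[0-9A-F]{4,})"                      # CHAR: upper-case hex digits
    r"\t+"                                         # <tab>: one or more tabs
    r"(?P<name>[^()\r\n]+?)"                       # NAME: no parentheses
    r"(?: ?\((?P<comment>[^()\r\n]*)\)\*?)?"       # COMMENT: "(" NAME ")" ["*"]
    r"[\r\n]*$"                                    # LF: any run of 0x0A and 0x0D
)


def parse_name_line(line, iso_style=False):
    """Return (code, name, comment), or None if the line is not a NAME_LINE.

    The comment is kept only when iso_style is true, matching the rule that
    it is stripped unless the file is parsed for an ISO-style list.
    """
    m = NAME_LINE.match(line)
    if m is None:
        return None
    code = int(m.group("char"), 16)
    comment = m.group("comment") if iso_style else None
    return code, m.group("name"), comment
```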
  166. <p><strong>Notes:</strong>
  167. <ul>
  168. <li>Special lookahead logic prevents a mention of a 4-digit standard, such as ISO 9999, from
  169. being misinterpreted as ISO followed by a CHAR.</li>
  170. <li>Use of Latin-1 is supported in unibook.exe, but not portably, unless the file is encoded as
  171. UTF-16LE.</li>
  172. <li>The final LF in the file must be present.</li>
  173. <li>A CHAR inside ' or &quot; is expanded, but only its glyph image is printed; the
  174. code value is not echoed.</li>
  175. <li>Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
  176. Apostrophes are supported, but nested quotes are not.</li>
  177. </ul>
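<p>The EXPAND_LINE rule and the first note can be illustrated together. The sketch below is not the algorithm used by unibook.exe, whose lookahead logic is not described here: it assumes that a four-digit token directly preceded by "ISO " is left alone, it handles only four-digit codes, and it uses a non-zero canonical combining class from Python's unicodedata module as a stand-in for "combining". The name expand_line is hypothetical.</p>

```python
import re
import unicodedata

NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"

# A CHAR token: four upper-case hex digits standing alone and not
# directly preceded by "ISO " (assumed stand-in for the lookahead logic).
CHAR_TOKEN = re.compile(r"(?<!ISO )\b[0-9A-F]{4}\b")


def expand_line(text):
    """Replace each CHAR by CHAR NBSP x NBSP, adding a dotted circle
    before x when x is a combining character."""
    def repl(match):
        code = match.group(0)
        ch = chr(int(code, 16))
        if unicodedata.combining(ch):
            return f"{code}{NBSP}{DOTTED_CIRCLE}{ch}{NBSP}"
        return f"{code}{NBSP}{ch}{NBSP}"

    return CHAR_TOKEN.sub(repl, text)
```

<p>A limitation of this simplification is that any stand-alone upper-case word made of four hex letters would also be treated as a CHAR, and the quote handling described in the last two notes is not attempted.</p>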
  178. </body>
  179. </html>