<html>

<head>
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<title>Unicode 3.0 NamesList File Structure</title>
</head>

<body>

<h3>Unicode NamesList File Format</h3>

<p>Last updated: 1999-07-06</p>

<h3>1.0 Introduction</h3>

<p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used
to drive the layout of the character code charts in the Unicode Standard. The information
in this file is a combination of several fields from the UnicodeData.txt and Blocks.txt files,
together with additional annotations for many characters. This document describes the
syntax rules for the file format, but also gives brief information on how each construct
is rendered when laid out for the book. Some of the syntax elements were used in
preparation of the drafts of the book and may not be present in the final, released form
of the NamesList.txt file.</p>

<p>The same input file can be used to do the draft preparation for ISO/IEC 10646 (referred
to below as ISO-style). This necessitates the presence of some information in the name list
file that is not needed (and is in fact removed during parsing) for the Unicode book.</p>

<p>With access to the layout program (unibook.exe), it is a simple matter to create
name lists for formatting working drafts that contain proposed characters.</p>

<h3>1.1 NamesList File Overview</h3>

<p>The *.lst files are plain text files which in their simplest form look like this:</p>

<p>@@&lt;tab&gt;0020&lt;tab&gt;BASIC LATIN&lt;tab&gt;007F<br>
; this is a file comment (ignored)<br>
0020&lt;tab&gt;SPACE<br>
0021&lt;tab&gt;EXCLAMATION MARK<br>
0022&lt;tab&gt;QUOTATION MARK<br>
. . . <br>
007F&lt;tab&gt;DELETE</p>

<p>The semicolon (as first character), @ and &lt;tab&gt; characters are used by the file
syntax and must be provided as shown. Hexadecimal digits must be in UPPER CASE. A double
@@ introduces a block header, with the title and the start and end codes of the block
provided as shown.</p>

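<p>As an illustration of the minimal form described above, the following Python sketch reads
a block header and its name lines. The function name <code>parse_minimal</code> and the two
regular expressions are assumptions made for this example; they are not part of unibook.exe
or of any published tool, and they cover only the four-digit code form used in the sample.</p>

```python
import re

# Hypothetical patterns that mirror the sample above:
# "@@" <tab> start <tab> title <tab> end, and code <tab> name.
BLOCK_HEADER = re.compile(r"^@@\t+([0-9A-F]{4})\t+([^\t]+)\t+([0-9A-F]{4})$")
NAME_LINE = re.compile(r"^([0-9A-F]{4})\t+(.+)$")


def parse_minimal(text):
    """Return (blocks, names) read from a minimal name list."""
    blocks = []  # list of (start, title, end) tuples
    names = {}   # code point -> character name
    for line in text.splitlines():
        if not line or line.startswith(";"):
            continue  # empty lines and file comments are ignored
        header = BLOCK_HEADER.match(line)
        if header:
            start, title, end = header.groups()
            blocks.append((int(start, 16), title, int(end, 16)))
            continue
        name = NAME_LINE.match(line)
        if name:
            names[int(name.group(1), 16)] = name.group(2)
    return blocks, names


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
    "007F\tDELETE\n"
)
blocks, names = parse_minimal(sample)
```

<p>Lower-case hexadecimal digits do not match either pattern, in line with the UPPER CASE
requirement; lines that match neither pattern are silently skipped in this sketch.</p>
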
<p>For an ISO-style, minimal name list, only the NAME_LINE and BLOCKHEADER and their
constituent syntax elements are needed.</p>

<p>The full syntax with all the options is provided in the following sections.</p>

<h3>1.2 NamesList File Structure</h3>

<p>This section defines the overall file structure.</p>

<pre><strong>NAMELIST:     TITLE_PAGE* BLOCK*
</strong>
<strong>TITLE_PAGE:   TITLE
              | TITLE_PAGE SUBTITLE
              | TITLE_PAGE SUBHEADER
              | TITLE_PAGE IGNORED_LINE
              | TITLE_PAGE EMPTY_LINE
              | TITLE_PAGE COMMENT_LINE
              | TITLE_PAGE NOTICE
              | TITLE_PAGE PAGEBREAK
</strong>
<strong>BLOCK:        BLOCKHEADER
              | BLOCK CHAR_ENTRY
              | BLOCK SUBHEADER
              | BLOCK NOTICE
              | BLOCK EMPTY_LINE
              | BLOCK IGNORED_LINE
              | BLOCK PAGEBREAK

CHAR_ENTRY:   NAME_LINE | RESERVED_LINE
              | CHAR_ENTRY ALIAS_LINE
              | CHAR_ENTRY COMMENT_LINE
              | CHAR_ENTRY CROSS_REF
              | CHAR_ENTRY DECOMPOSITION
              | CHAR_ENTRY COMPAT_MAPPING
              | CHAR_ENTRY IGNORED_LINE
              | CHAR_ENTRY EMPTY_LINE
              | CHAR_ENTRY NOTICE
</strong></pre>

<p>In other words:<br>
<br>
Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.</p>

<p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE, EMPTY_LINE and
IGNORED_LINE may occur before the first BLOCKHEADER.</p>

<p>Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted sequence of
the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE,
COMMENT_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, NOTICE, EMPTY_LINE and IGNORED_LINE.</p>

<p>Except for EMPTY_LINE, NOTICE and IGNORED_LINE, and for a COMMENT_LINE before the first
BLOCKHEADER, none of these lines may occur in any other place.</p>

<p>Note: A NOTICE displays differently depending on whether it follows a header or title
or is part of a CHAR_ENTRY.</p>

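<p>The ordering rules above can be sketched as a small checker. This Python example is only
an approximation: the names <code>line_kind</code> and <code>check_structure</code> are
hypothetical, lines are classified by their leading marker alone, all tab-indented entry lines
are lumped together as one kind, and tab-indented lines before the first BLOCKHEADER are not
checked.</p>

```python
def line_kind(line):
    """Classify a line by its leading marker (a simplification)."""
    if line == "":
        return "EMPTY_LINE"
    if line.lstrip("\t").startswith(";"):
        return "IGNORED_LINE"
    if line.startswith("@@@+"):
        return "SUBTITLE"
    if line.startswith("@@@"):
        return "TITLE"
    if line.startswith("@@\t"):
        return "BLOCKHEADER"
    if line == "@@":
        return "PAGEBREAK"
    if line.startswith("@+"):
        return "NOTICE"
    if line.startswith("@"):
        return "SUBHEADER"
    if line.startswith("\t"):
        return "ENTRY_DETAIL"  # alias, comment, cross reference, mapping
    return "NAME_OR_RESERVED"


def check_structure(lines):
    """Return a list of (line number, message) for rule violations."""
    errors = []
    seen_block = False  # has the first BLOCKHEADER been read?
    in_entry = False    # are we directly inside a CHAR_ENTRY?
    for number, line in enumerate(lines, start=1):
        kind = line_kind(line)
        if kind == "BLOCKHEADER":
            seen_block = True
            in_entry = False
        elif kind in ("TITLE", "SUBTITLE"):
            if seen_block:
                errors.append((number, kind + " after first BLOCKHEADER"))
        elif kind == "NAME_OR_RESERVED":
            if not seen_block:
                errors.append((number, "character entry before first BLOCKHEADER"))
            in_entry = True
        elif kind == "ENTRY_DETAIL":
            if seen_block and not in_entry:
                errors.append((number, "entry line outside a character entry"))
        elif kind in ("SUBHEADER", "PAGEBREAK"):
            in_entry = False
        # EMPTY_LINE, NOTICE and IGNORED_LINE do not change the state
    return errors
```

<p>Because EMPTY_LINE, NOTICE and IGNORED_LINE leave the state untouched, they can be
interleaved freely with the lines of a character entry, as the rules above allow.</p>
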
<h3>1.3 NamesList File Elements</h3>

<p>This section provides the details of the syntax for the individual elements.</p>

<pre><small><strong>ELEMENT SYNTAX</strong>  // How rendered</small></pre>

<pre><small><strong>NAME_LINE:      CHAR &lt;tab&gt; LINE
</strong>                // the CHAR and the corresponding image are echoed,
                // followed by the name as given in LINE

<strong>                CHAR &lt;tab&gt; NAME COMMENT LF
</strong>                // Names may have a comment, which is stripped off
                // unless the file is parsed for an ISO-style list

<strong>RESERVED_LINE:  CHAR &lt;tab&gt; &lt;reserved&gt;
</strong>                // the CHAR is echoed followed by an icon for the
                // reserved character and a fixed string, e.g. &lt;reserved&gt;

<strong>COMMENT_LINE:   &lt;tab&gt; "*" SP EXPAND_LINE
</strong>                // * is replaced by BULLET, output line as comment
<strong>                &lt;tab&gt; EXPAND_LINE</strong>
                // output line as comment

<strong>ALIAS_LINE:     &lt;tab&gt; "=" SP LINE
</strong>                // replace = by itself, output line as alias

<strong>CROSS_REF:      &lt;tab&gt; "X" SP EXPAND_LINE
</strong>                // X is replaced by a right arrow
<strong>                &lt;tab&gt; "X" SP "(" STRING SP "-" SP CHAR ")"
</strong>                // X is replaced by a right arrow
                // the "(", "-", ")" are removed, the
                // order of CHAR and STRING is reversed
                // i.e. both inputs result in the same output

<strong>IGNORED_LINE:   &lt;tab&gt; ";" EXPAND_LINE
EMPTY_LINE:     LF
</strong>                // empty lines and file comments are ignored

<strong>DECOMPOSITION:  &lt;tab&gt; ":" EXPAND_LINE
</strong>                // replace ':' by EQUIV, expand line into
                // decomposition

<strong>COMPAT_MAPPING: &lt;tab&gt; "#" SP EXPAND_LINE
</strong>                // replace '#' by APPROX, output line as mapping

<strong>NOTICE:         "@+" &lt;tab&gt; LINE
</strong>                // skip '@+', output text as notice
<strong>                "@+" &lt;tab&gt; "*" SP LINE
</strong>                // skip '@+', output text as notice
                // "*" expands to a bullet character
                // Notices following a character code apply to the
                // character and are indented. Notices not following
                // a character code apply to the page/block/column
                // and are italicized, but not indented

<strong>SUBTITLE:       "@@@+" &lt;tab&gt; LINE
</strong>                // skip "@@@+", output text as subtitle

<strong>SUBHEADER:      "@" &lt;tab&gt; LINE
</strong>                // skip '@', output line as column header

<strong>BLOCKHEADER:    "@@" &lt;tab&gt; BLOCKSTART &lt;tab&gt; BLOCKNAME &lt;tab&gt; BLOCKEND
</strong>                // skip "@@", cause a page break and optional
                // blank page, then output one or more charts
                // followed by the list of character names.
                // use BLOCKSTART and BLOCKEND to define
                // which characters belong to a block
                // use BLOCKNAME in page and table headers
<strong>                "@@" &lt;tab&gt; BLOCKSTART &lt;tab&gt; BLOCKNAME COMMENT &lt;tab&gt; BLOCKEND
</strong>                // if a comment is present it replaces the blockname
                // when an ISO-style namelist is laid out

<strong>BLOCKSTART:     CHAR</strong>    // first character position in block
<strong>BLOCKEND:       CHAR</strong>    // last character position in block
<strong>PAGEBREAK:      "@@"</strong>    // insert a (column) break

<strong>TITLE:          "@@@" &lt;tab&gt; LINE</strong>
                // skip "@@@", output line as text
                // Title is used in page headers

<strong>EXPAND_LINE:    {CHAR | STRING}+ LF </strong>
                // all instances of CHAR *) are replaced by
                // CHAR NBSP x NBSP where x is the single Unicode
                // character corresponding to CHAR
                // If the character is combining, it is replaced with
                // CHAR NBSP &lt;circ&gt; x NBSP where &lt;circ&gt; is the
                // dotted circle</small>
</pre>

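<p>To illustrate the statement that both CROSS_REF forms result in the same output, here is a
minimal Python sketch. The function name <code>render_cross_ref</code> and the exact spacing
of the result are assumptions for this example; the real layout program also expands the CHAR
into its glyph image, which is left out here, and only the four-digit CHAR form is handled.</p>

```python
import re

RIGHT_ARROW = "\u2192"

# <tab> "X" SP "(" STRING SP "-" SP CHAR ")"
PAREN_XREF = re.compile(r"^\t+X \((.+) - ([0-9A-F]{4})\)$")
# <tab> "X" SP EXPAND_LINE
PLAIN_XREF = re.compile(r"^\t+X (.+)$")


def render_cross_ref(line):
    """Return the rendered cross reference, or None for other lines."""
    match = PAREN_XREF.match(line)
    if match:
        string, char = match.groups()
        # Parentheses and hyphen are dropped; CHAR now precedes STRING.
        return f"{RIGHT_ARROW} {char} {string}"
    match = PLAIN_XREF.match(line)
    if match:
        return f"{RIGHT_ARROW} {match.group(1)}"
    return None
```

<p>With this sketch, a line written as X (degree sign - 00B0) and a line written as
X 00B0 degree sign produce the same string.</p>
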
<h3><strong>1.4 NamesList File Primitives</strong></h3>

<p>The following are the primitives and terminals for the NamesList syntax.</p>

<pre><small><strong>LINE:           STRING LF
COMMENT:        "(" NAME ")"
                "(" NAME ")" "*"
</strong>
<strong>NAME</strong>:           &lt;sequence of ASCII characters, except "(" or ")"&gt;
<strong>STRING</strong>:         &lt;sequence of Latin-1 characters&gt;
<strong>CHAR</strong>:           <strong>X X X X</strong>
                <strong>| X X X X X X X X X</strong></small>
<small><strong>X:              "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"
&lt;tab&gt;:</strong>          &lt;sequence of one or more ASCII tab characters 0x09&gt;
<strong>SP</strong>:             &lt;ASCII 0x20&gt;
<strong>LF</strong>:             &lt;any sequence of ASCII 0x0A and 0x0D&gt;
</small></pre>

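<p>Some of these primitives translate directly into regular expressions. The Python sketch
below is one possible reading, not a reference implementation: <code>is_char</code> and
<code>split_name</code> are hypothetical helper names, only the four-digit form of CHAR is
covered, and optional white space between NAME and COMMENT is an assumption, since the grammar
does not say how the two are separated.</p>

```python
import re

# CHAR, four-digit form only: X X X X with upper-case hex digits.
CHAR = re.compile(r"[0-9A-F]{4}")

# NAME followed by an optional COMMENT: "(" NAME ")" or "(" NAME ")" "*".
NAME_AND_COMMENT = re.compile(r"^([^()]+?)\s*(\([^()]+\)\*?)?$")


def is_char(token):
    """True if token is exactly four upper-case hexadecimal digits."""
    return CHAR.fullmatch(token) is not None


def split_name(text):
    """Split 'NAME COMMENT' into (name, comment); comment may be None."""
    match = NAME_AND_COMMENT.match(text)
    if match is None or not text.isascii():
        raise ValueError(f"not a NAME with optional COMMENT: {text!r}")
    return match.group(1), match.group(2)
```

<p>A caller preparing a Unicode-style list would keep only the first element of the returned
pair, while an ISO-style list would keep both.</p>
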
<p><strong>Notes:</strong></p>

<ul>
  <li>Special lookahead logic prevents a mention of a 4-digit standard, such as ISO 9999, from
    being misinterpreted as ISO CHAR.</li>
  <li>Use of Latin-1 is supported in unibook.exe, but not portably, unless the file is encoded as
    UTF-16LE.</li>
  <li>The final LF in the file must be present.</li>
  <li>A CHAR inside ' or " is expanded, but only its glyph image is printed; the
    code value is not echoed.</li>
  <li>Straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
    Apostrophes are supported, but nested quotes are not.</li>
</ul>

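<p>The EXPAND_LINE rule and the first note can be sketched together in Python. How unibook.exe
actually implements its lookahead is not documented here, so the negative lookbehind for
"ISO " below is an assumption, as is the function name <code>expand_line</code>. The canonical
combining class from the standard <code>unicodedata</code> module is used as an approximation
of "combining"; quote handling and the longer CHAR form are left out.</p>

```python
import re
import unicodedata

NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"

# Four upper-case hex digits standing alone, unless preceded by "ISO "
# (an assumed stand-in for the special lookahead logic).
CHAR_TOKEN = re.compile(r"(?<!ISO )\b([0-9A-F]{4})\b")


def expand_line(text):
    """Replace each CHAR by 'CHAR NBSP x NBSP' as described above."""
    def expand(match):
        code = match.group(1)
        glyph = chr(int(code, 16))
        if unicodedata.combining(glyph):
            # Combining characters are shown on a dotted circle.
            glyph = DOTTED_CIRCLE + glyph
        return f"{code}{NBSP}{glyph}{NBSP}"

    return CHAR_TOKEN.sub(expand, text)
```

<p>Note that an upper-case word made up only of the letters A to F, such as FACE, would also
match this pattern; the sketch does not attempt to tell such words apart from code values.</p>
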
</body>
</html>