mirror of https://github.com/tongzx/nt5src
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
1429 lines
34 KiB
1429 lines
34 KiB
|
|
NLSTRANS - NLS Translation Utility
|
|
|
|
|
|
|
|
Starting the Translation Utility
|
|
--------------------------------
|
|
|
|
nlstrans [-v] <inputfile>
|
|
|
|
-v turns on the verbose mode. This switch is optional.
|
|
|
|
<inputfile> is the name of the input file containing variations
|
|
of the commands listed below.
|
|
|
|
|
|
|
|
|
|
Command Legend
|
|
--------------
|
|
|
|
<cpnum> - The code page number (in decimal).
|
|
<langstr> - The language string identifying the language.
|
|
<lcid> - The locale id identifying the locale information.
|
|
<num entries> - The number of entries to follow (in decimal).
|
|
<mbchar> - The multibyte character (in hexadecimal).
|
|
<wchar> - The wide character (in hexadecimal).
|
|
<lowrange> - The low end of the DBCS range (in hexidecimal).
|
|
<highrange> - The high end of the DBCS range (in hexidecimal).
|
|
<maxcharlen> - The maximum length, in bytes, of a character (in decimal).
|
|
<defaultchar> - The default character (in hexadecimal).
|
|
<dc_unitrans> - The unicode translation of the default character (in hex).
|
|
<ctype1> - The character type 1 information (in hexidecimal).
|
|
<ctype2> - The character type 2 information (in hexidecimal).
|
|
<ctype3> - The character type 3 information (in hexidecimal).
|
|
<upper> - The upper case wide character (in hexadecimal).
|
|
<lower> - The lower case wide character (in hexadecimal).
|
|
<digit> - The digit to translate to ascii (in hexadecimal).
|
|
<ascii> - The ascii translation (in hexadecimal).
|
|
<czone> - The compatibility zone character to translate (in hex).
|
|
<katakana> - The katakana character to translate (in hex).
|
|
<hiragana> - The hiragana character to translate (in hex).
|
|
<half width> - The half width character to translate (in hex).
|
|
<full width> - The full width character to translate (in hex).
|
|
<precomp> - The precomposed character (in hexidecimal).
|
|
<base> - The base character for the given precomposed form (in hex).
|
|
<nonspace> - The nonspace character for the given precomposed form (in hex).
|
|
<code pt> - The Unicode code point (in hexidecimal).
|
|
<SM> - The script member (in hex).
|
|
<AW> - The alphanumeric weight (in hex).
|
|
<DW> - The diacritic weight (in hex).
|
|
<CW> - The case weight (in hex).
|
|
<COMP> - The compression value - 0, 1, 2, or 3 (in hex).
|
|
|
|
|
|
|
|
|
|
Commands
|
|
--------
|
|
|
|
(1) Code Page Specific Translation Tables
|
|
|
|
- A semicolon may be used to denote a comment. The comment will be
|
|
read until the end of the current line. So, once a semicolon is
|
|
used, the rest of the current line is ignored.
|
|
|
|
|
|
CODEPAGE <cpnum>
|
|
|
|
- Starts the code page specific section.
|
|
|
|
- Use the ENDCODEPAGE keyword to end the code page specific section.
|
|
|
|
- Only the following keywords may be used between this keyword and
|
|
the ENDCODEPAGE keyword:
|
|
|
|
- CPINFO
|
|
- MBTABLE
|
|
- GLYPHTABLE
|
|
- DBCSRANGE
|
|
- WCTABLE
|
|
|
|
|
|
ENDCODEPAGE
|
|
|
|
- Ends the code page specific section.
|
|
|
|
- Only used following the CODEPAGE keyword.
|
|
|
|
|
|
CPINFO <maxcharlen> <defaultchar> <dc_unitrans>
|
|
|
|
- The code page information.
|
|
|
|
- This table MUST appear FIRST in the data file.
|
|
|
|
|
|
MBTABLE <num entries>
|
|
|
|
- The multibyte translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<mbchar> <wchar>
|
|
|
|
- The maximum <num entries> should be 256.
|
|
|
|
|
|
GLYPHTABLE <num entries>
|
|
|
|
- The glyph character multibyte translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<mbchar> <wchar>
|
|
|
|
- The maximum <num entries> should be 256.
|
|
|
|
- This table MUST appear AFTER the MBTABLE in the data file.
|
|
|
|
|
|
DBCSRANGE <num entries>
|
|
|
|
- The DBCS ranges.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<lowrange> <highrange>
|
|
|
|
|
|
DBCSTABLE <num entries>
|
|
|
|
- The DBCS translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<mbchar> <wchar>
|
|
|
|
- The maximum <num entries> should be 256.
|
|
|
|
- The DBCS tables MUST immediately follow their ranges and must
|
|
include the DBCSTABLE keyword. The tables MUST also be in the
|
|
order in which they appear in the range (lowest first, highest last).
|
|
|
|
|
|
WCTABLE <num entries>
|
|
|
|
- The wide character translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<wchar> <mbchar>
|
|
|
|
|
|
|
|
(2) Language Specific Translation Tables
|
|
|
|
- A semicolon may be used to denote a comment. The comment will be
|
|
read until the end of the current line. So, once a semicolon is
|
|
used, the rest of the current line is ignored.
|
|
|
|
|
|
LANGUAGE <langstr>
|
|
|
|
- Starts the language specific section.
|
|
|
|
- Use the ENDLANGUAGE keyword to end the language specific section.
|
|
|
|
- Only the following keywords may be used between this keyword and
|
|
the ENDLANGUAGE keyword:
|
|
|
|
- UPPERCASE
|
|
- LOWERCASE
|
|
|
|
|
|
ENDLANGUAGE
|
|
|
|
- Ends the language specific section.
|
|
|
|
- Only used following the LANGUAGE keyword.
|
|
|
|
|
|
UPPERCASE <num entries>
|
|
|
|
- The upper case translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<lower> <upper>
|
|
|
|
|
|
LOWERCASE <num entries>
|
|
|
|
- The lower case translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<upper> <lower>
|
|
|
|
|
|
EXCEPTION <num entries>
|
|
|
|
- The exception table for linguistic casing.
|
|
|
|
- This table contains all exceptions to the default table on
|
|
a per locale id basis in order to get proper linguistic
|
|
casing.
|
|
|
|
- The 0x00000000 locale id is used to make changes to the default
|
|
table for *all* locales. These exceptions will become part of the
|
|
default linguistic casing table.
|
|
|
|
- All entries in the exception table must exist in some form
|
|
in the default table. If there is no translation desired in
|
|
the default table, then enter the code point as upper/lower
|
|
casing to itself.
|
|
|
|
- The table to follow should be in the format (for each lcid):
|
|
|
|
LCID <lcid> <num upcase entries> <num locase entries>
|
|
|
|
UPPERCASE
|
|
|
|
<lower> <upper>
|
|
|
|
LOWERCASE
|
|
|
|
<upper> <lower>
|
|
|
|
|
|
|
|
(3) Locale Specific Translation Tables
|
|
|
|
- NO COMMENTS will be accepted at anytime between the LOCALE and
|
|
ENDLOCALE keywords and the CALENDAR and ENDCALENDAR keywords.
|
|
A semicolon on a line will be used as part of the locale or
|
|
calendar information, as well as any characters after the
|
|
semicolon on the same line.
|
|
|
|
|
|
LOCALE <num entries>
|
|
|
|
- Starts the locale specific section.
|
|
|
|
- Use the ENDLOCALE keyword to end the entire locale specific section.
|
|
|
|
- Each set of locale information to follow should be in the format:
|
|
|
|
|
|
BEGINLOCALE <lcid>
|
|
|
|
- The locale information. The order of the information is
|
|
given below.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<keyword> <info>
|
|
|
|
or in some cases:
|
|
|
|
<keyword> <num> <info>
|
|
<info>
|
|
...
|
|
|
|
where
|
|
|
|
<keyword> is the keyword for the given information.
|
|
This string is ignored.
|
|
|
|
<num> is the number of entries for the keyword. This means
|
|
there will be 'num' number of entries, where each
|
|
entry MUST BE on a separate line. The keywords that
|
|
require the 'num' field are noted in the list of items
|
|
below.
|
|
|
|
<info> is the information to store in the data file. All
|
|
information will be stored as a Unicode string.
|
|
|
|
The escape sequence "\x" may be used to designate hex
|
|
values above 0x00ff, but ALL 4 digits of the Unicode
|
|
character MUST exist for this to work properly.
|
|
|
|
If the backslash character is to appear in the given
|
|
string (it's not part of an escape sequence), then
|
|
two backslashes must be used in succession.
|
|
|
|
White space (space and tab) is stripped from both the
|
|
front and the back of the string unless specifically
|
|
noted with the escape sequence. All other white space
|
|
is preserved.
|
|
|
|
To include TWO separate null-terminated strings for
|
|
one LCTYPE, the strings must be separated by \xffff.
|
|
This will be changed to 0x0000 in the binary file.
|
|
Currently, the second string will only be used by
|
|
the SMONTHNAME LCType information in the
|
|
GetDateFormatW api (Russian month names have different
|
|
grammar).
|
|
|
|
This section must have the following information (IN THE GIVEN
|
|
ORDER) following the BEGINLOCALE keyword.
|
|
|
|
|
|
ILANGUAGE
|
|
SENGLANGUAGE
|
|
SABBREVLANGNAME
|
|
SISO639LANGNAME
|
|
SNATIVELANGNAME
|
|
|
|
ICOUNTRY
|
|
SENGCOUNTRY
|
|
SABBREVCTRYNAME
|
|
SISO3166CTRYNAME
|
|
SNATIVECTRYNAME
|
|
|
|
IDEFAULTLANGUAGE
|
|
IDEFAULTCOUNTRY
|
|
IDEFAULTANSICODEPAGE
|
|
IDEFAULTOEMCODEPAGE
|
|
|
|
SLIST
|
|
IMEASURE
|
|
|
|
SDECIMAL
|
|
STHOUSAND
|
|
SGROUPING
|
|
IDIGITS
|
|
ILZERO
|
|
INEGNUMBER
|
|
SNATIVEDIGITS
|
|
IDIGITSUBSTITUTION
|
|
|
|
SCURRENCY
|
|
SINTLSYMBOL
|
|
SMONDECIMALSEP
|
|
SMONTHOUSANDSEP
|
|
SMONGROUPING
|
|
ICURRDIGITS
|
|
IINTLCURRDIGITS
|
|
ICURRENCY
|
|
INEGCURR
|
|
SPOSITIVESIGN
|
|
SNEGATIVESIGN
|
|
|
|
STIMEFORMAT <num>
|
|
STIME
|
|
ITIME
|
|
ITLZERO
|
|
ITIMEMARKPOSN
|
|
S1159
|
|
S2359
|
|
|
|
SSHORTDATE <num>
|
|
SDATE
|
|
IDATE
|
|
ICENTURY
|
|
IDAYLZERO
|
|
IMONLZERO
|
|
|
|
SLONGDATE <num>
|
|
ILDATE
|
|
|
|
ICALENDARTYPE
|
|
IOPTIONALCALENDAR <num> (use \xffff for localized calendar name)
|
|
|
|
IFIRSTDAYOFWEEK
|
|
IFIRSTWEEKOFYEAR
|
|
|
|
SDAYNAME1
|
|
SDAYNAME2
|
|
SDAYNAME3
|
|
SDAYNAME4
|
|
SDAYNAME5
|
|
SDAYNAME6
|
|
SDAYNAME7
|
|
|
|
SABBREVDAYNAME1
|
|
SABBREVDAYNAME2
|
|
SABBREVDAYNAME3
|
|
SABBREVDAYNAME4
|
|
SABBREVDAYNAME5
|
|
SABBREVDAYNAME6
|
|
SABBREVDAYNAME7
|
|
|
|
SMONTHNAME1
|
|
SMONTHNAME2
|
|
SMONTHNAME3
|
|
SMONTHNAME4
|
|
SMONTHNAME5
|
|
SMONTHNAME6
|
|
SMONTHNAME7
|
|
SMONTHNAME8
|
|
SMONTHNAME9
|
|
SMONTHNAME10
|
|
SMONTHNAME11
|
|
SMONTHNAME12
|
|
SMONTHNAME13
|
|
|
|
SABBREVMONTHNAME1
|
|
SABBREVMONTHNAME2
|
|
SABBREVMONTHNAME3
|
|
SABBREVMONTHNAME4
|
|
SABBREVMONTHNAME5
|
|
SABBREVMONTHNAME6
|
|
SABBREVMONTHNAME7
|
|
SABBREVMONTHNAME8
|
|
SABBREVMONTHNAME9
|
|
SABBREVMONTHNAME10
|
|
SABBREVMONTHNAME11
|
|
SABBREVMONTHNAME12
|
|
SABBREVMONTHNAME13
|
|
|
|
FONTSIGNATURE
|
|
|
|
|
|
|
|
ENDLOCALE
|
|
|
|
- Ends the locale specific section.
|
|
|
|
- Only used following the LOCALE keyword.
|
|
|
|
|
|
|
|
CALENDAR <num entries>
|
|
|
|
- Starts the calendar specific section.
|
|
|
|
- Use the ENDCALENDAR keyword to end the entire calendar specific section.
|
|
|
|
- Each set of calendar information to follow should be in the format:
|
|
|
|
|
|
BEGINCALENDAR <calendarid>
|
|
|
|
- The calendar information. The order of the information is
|
|
given below.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<keyword> <info>
|
|
|
|
or in some cases:
|
|
|
|
<keyword> <num> <info>
|
|
<info>
|
|
...
|
|
|
|
where
|
|
|
|
<keyword> is the keyword for the given information.
|
|
This string is ignored.
|
|
|
|
<num> is the number of entries for the keyword. This means
|
|
there will be 'num' number of entries, where each
|
|
entry MUST BE on a separate line. The keywords that
|
|
require the 'num' field are noted in the list of items
|
|
below.
|
|
|
|
<info> is the information to store in the data file. All
|
|
information will be stored as a Unicode string.
|
|
|
|
The escape sequence "\x" may be used to designate hex
|
|
values above 0x00ff, but ALL 4 digits of the Unicode
|
|
character MUST exist for this to work properly.
|
|
|
|
If the backslash character is to appear in the given
|
|
string (it's not part of an escape sequence), then
|
|
two backslashes must be used in succession.
|
|
|
|
White space (space and tab) is stripped from both the
|
|
front and the back of the string unless specifically
|
|
noted with the escape sequence. All other white space
|
|
is preserved.
|
|
|
|
To include TWO separate null-terminated strings for
|
|
one LCTYPE, the strings must be separated by \xffff.
|
|
This will be changed to 0x0000 in the binary file.
|
|
Currently, the second string will only be used by
|
|
the SMONTHNAME LCType information in the
|
|
GetDateFormatW api (Russian month names have different
|
|
grammar).
|
|
|
|
This section must have the following information (IN THE GIVEN
|
|
ORDER) following the BEGINCALENDAR keyword.
|
|
|
|
SCALENDAR
|
|
|
|
ITWODIGITYEARMAX
|
|
|
|
SERARANGES <num> (use \xffff for era string)
|
|
|
|
SSHORTDATE
|
|
SLONGDATE
|
|
|
|
IF_NAMES
|
|
|
|
SDAYNAME1
|
|
SDAYNAME2
|
|
SDAYNAME3
|
|
SDAYNAME4
|
|
SDAYNAME5
|
|
SDAYNAME6
|
|
SDAYNAME7
|
|
|
|
SABBREVDAYNAME1
|
|
SABBREVDAYNAME2
|
|
SABBREVDAYNAME3
|
|
SABBREVDAYNAME4
|
|
SABBREVDAYNAME5
|
|
SABBREVDAYNAME6
|
|
SABBREVDAYNAME7
|
|
|
|
SMONTHNAME1
|
|
SMONTHNAME2
|
|
SMONTHNAME3
|
|
SMONTHNAME4
|
|
SMONTHNAME5
|
|
SMONTHNAME6
|
|
SMONTHNAME7
|
|
SMONTHNAME8
|
|
SMONTHNAME9
|
|
SMONTHNAME10
|
|
SMONTHNAME11
|
|
SMONTHNAME12
|
|
SMONTHNAME13
|
|
|
|
SABBREVMONTHNAME1
|
|
SABBREVMONTHNAME2
|
|
SABBREVMONTHNAME3
|
|
SABBREVMONTHNAME4
|
|
SABBREVMONTHNAME5
|
|
SABBREVMONTHNAME6
|
|
SABBREVMONTHNAME7
|
|
SABBREVMONTHNAME8
|
|
SABBREVMONTHNAME9
|
|
SABBREVMONTHNAME10
|
|
SABBREVMONTHNAME11
|
|
SABBREVMONTHNAME12
|
|
SABBREVMONTHNAME13
|
|
|
|
|
|
|
|
ENDCALENDAR
|
|
|
|
- Ends the calendar specific section.
|
|
|
|
- Only used following the CALENDAR keyword.
|
|
|
|
|
|
|
|
(4) Locale Independent (Unicode) Translation Tables
|
|
|
|
- A semicolon may be used to denote a comment. The comment will be
|
|
read until the end of the current line. So, once a semicolon is
|
|
used, the rest of the current line is ignored.
|
|
|
|
|
|
UNICODE
|
|
|
|
- Starts the unicode section.
|
|
|
|
- Use the ENDUNICODE keyword to end the unicode section.
|
|
|
|
- Only the following keywords may be used between this keyword and
|
|
the ENDUNICODE keyword:
|
|
|
|
- ASCIIDIGITS
|
|
- FOLDCZONE
|
|
- COMP
|
|
- HIRAGANA
|
|
- KATAKANA
|
|
- HALFWIDTH
|
|
- FULLWIDTH
|
|
|
|
|
|
ENDUNICODE
|
|
|
|
- Ends the unicode section.
|
|
|
|
- Only used following the UNICODE keyword.
|
|
|
|
|
|
ASCIIDIGITS <num entries>
|
|
|
|
- The ascii digits translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<digit> <ascii>
|
|
|
|
|
|
FOLDCZONE <num entries>
|
|
|
|
- The fold compatibility zone translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<czone> <ascii>
|
|
|
|
|
|
HIRAGANA <num entries>
|
|
|
|
- The Katakana to Hiragana translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<katakana> <hiragana>
|
|
|
|
|
|
KATAKANA <num entries>
|
|
|
|
- The Hiragana to Katakana translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<hiragana> <katakana>
|
|
|
|
|
|
HALFWIDTH <num entries>
|
|
|
|
- The Full Width to Half Width translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<full width> <half width>
|
|
|
|
|
|
FULLWIDTH <num entries>
|
|
|
|
- The Half Width to Full Width translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<half width> <full width>
|
|
|
|
|
|
COMP <num entries>
|
|
|
|
- The precomposed and composite translation tables. Both versions
|
|
of the table will be built from this data.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<precomp> <base> <nonspace>
|
|
|
|
|
|
|
|
(5) Character Type Translation Tables
|
|
|
|
- A semicolon may be used to denote a comment. The comment will be
|
|
read until the end of the current line. So, once a semicolon is
|
|
used, the rest of the current line is ignored.
|
|
|
|
|
|
CTYPE <num entries>
|
|
|
|
- The character type translation table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<wchar> <ctype1> <ctype2> <ctype3>
|
|
|
|
|
|
|
|
(6) SortKey Translation Tables
|
|
|
|
- A semicolon may be used to denote a comment. The comment will be
|
|
read until the end of the current line. So, once a semicolon is
|
|
used, the rest of the current line is ignored.
|
|
|
|
|
|
SORTKEY
|
|
|
|
- Starts the sortkey section. This is the default sortkey table.
|
|
|
|
|
|
ENDSORTKEY
|
|
|
|
- Ends the sortkey section.
|
|
|
|
- Only used following the SORTKEY keyword.
|
|
|
|
|
|
DEFAULT <num entries>
|
|
|
|
- The default sortkey translation table.
|
|
|
|
- Contains the weights on a per code point basis.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<code pt> <SM> <AW> <DW> <CW> <COMP>
|
|
|
|
|
|
|
|
(7) Sort Tables Translation Tables
|
|
|
|
- A semicolon may be used to denote a comment. The comment will be
|
|
read until the end of the current line. So, once a semicolon is
|
|
used, the rest of the current line is ignored.
|
|
|
|
|
|
SORTTABLES
|
|
|
|
- Starts the sorttables section. This section contains all
|
|
sorting tables except the default sortkey table.
|
|
|
|
- Use the ENDSORTTABLES keyword to end the sort tables section.
|
|
|
|
- Only the following keywords may be used between this keyword and
|
|
the ENDSORTTABLES keyword:
|
|
|
|
- REVERSEDIACRITICS
|
|
- DOUBLECOMPRESSION
|
|
- IDEOGRAPH_LCID_EXCEPTION
|
|
- MULTIPLEWEIGHTS
|
|
- EXPANSION
|
|
- EXCEPTION
|
|
- COMPRESSION
|
|
|
|
|
|
ENDSORTTABLES
|
|
|
|
- Ends the sorttables section.
|
|
|
|
- Only used following the SORTTABLES keyword.
|
|
|
|
|
|
REVERSEDIACRITICS <num entries>
|
|
|
|
- The reverse diacritics table.
|
|
|
|
- This table contains all locale ids that require diacritics
|
|
to be sorted from right to left (instead of left to right).
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<lcid>
|
|
|
|
|
|
DOUBLECOMPRESSION <num entries>
|
|
|
|
- The double compression table.
|
|
|
|
- This table contains all locale ids that require special handling
|
|
of the compression characters (eg. Hungarian).
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<lcid>
|
|
|
|
|
|
IDEOGRAPH_LCID_EXCEPTION <num entries>
|
|
|
|
- The ideograph lcid exception table.
|
|
|
|
- This table contains all locale ids that require ideographs to be
|
|
sorted other than in their Unicode ordering. The name of the file
|
|
containing the ideograph exceptions is also given here.
|
|
|
|
- The file name may be no more than 8 characters in length. The
|
|
extension ".nls" will be added to the file name.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<lcid> <file name>
|
|
|
|
|
|
MULTIPLEWEIGHTS <num entries>
|
|
|
|
- The multiple weights table.
|
|
|
|
- This table contains a list of all scripts that need multiple
|
|
script members to represent the entire script (256 alphanumeric
|
|
weights is not enough).
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<first script member> <number of script members in range>
|
|
|
|
|
|
EXPANSION <num entries>
|
|
|
|
- The expansion (ligature) table.
|
|
|
|
- This table contains all possible expansion options for every
|
|
locale, so there is no need to distinguish between the
|
|
different locales.
|
|
|
|
- The sortkey table will contain the index into this table in
|
|
the AW field. For that reason, this table MUST be in the
|
|
correct order used by the sortkey default table and the
|
|
exception table.
|
|
|
|
- The maximum number of entries allowed in this table is 256.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<expansion code pt> <code pt 1> <code pt 2>
|
|
|
|
|
|
EXCEPTION <num entries>
|
|
|
|
- The exception table.
|
|
|
|
- This table contains all exceptions to the default table on
|
|
a per locale id basis.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
LCID <lcid> <num entries>
|
|
|
|
<code pt> <SM> <AW> <DW> <CW> <COMP>
|
|
|
|
|
|
COMPRESSION <num entries>
|
|
|
|
- The compression table.
|
|
|
|
- This table contains all compressions, both three to one and
|
|
two to one, on a per locale id basis.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
LCID <lcid>
|
|
|
|
TWO <num entries>
|
|
|
|
<code pt 1> <code pt 2> <SM> <AW> <DW <CW>
|
|
|
|
THREE <num entries>
|
|
|
|
<code pt 1> <code pt 2> <code pt 3> <SM> <AW> <DW> <CW>
|
|
|
|
|
|
|
|
(8) Ideograph Exception Tables
|
|
|
|
- A semicolon may be used to denote a comment. The comment will be
|
|
read until the end of the current line. So, once a semicolon is
|
|
used, the rest of the current line is ignored.
|
|
|
|
|
|
IDEOGRAPH_EXCEPTION <num entries> <file name>
|
|
|
|
- The ideograph exception table.
|
|
|
|
- The table to follow should be in the format:
|
|
|
|
<code pt> <SM> <AW>
|
|
|
|
|
|
|
|
|
|
|
|
Sample Files
|
|
------------
|
|
|
|
All sample files shown below are not real files. They are simply meant
|
|
to show the syntax of the different data files.
|
|
|
|
|
|
(1) Sample Code Page File
|
|
|
|
|
|
CODEPAGE 12
|
|
|
|
CPINFO 1 0x7F 0x2302
|
|
|
|
MBTABLE 11
|
|
|
|
0x00 0x0000
|
|
0x01 0x0001
|
|
0x02 0x0002
|
|
0x7F 0x2302
|
|
0xB0 0x2591
|
|
0xB1 0x2592
|
|
0xB2 0x2593
|
|
0xB3 0x2502
|
|
0xB4 0x2524
|
|
0xB5 0x2561
|
|
0xB6 0x2562
|
|
|
|
GLYPHTABLE 2
|
|
|
|
0x01 0x263A
|
|
0x02 0x263B
|
|
|
|
DBCSRANGE 2
|
|
|
|
0x51 0x51
|
|
|
|
DBCSTABLE 1
|
|
|
|
0x71 0x0025
|
|
|
|
0x80 0x81
|
|
|
|
DBCSTABLE 1
|
|
|
|
0x3e 0x003e
|
|
|
|
DBCSTABLE 2
|
|
|
|
0x3f 0x003f
|
|
0x40 0x0040
|
|
|
|
WCTABLE 11
|
|
|
|
0x0000 0x00
|
|
0x0001 0x01
|
|
0x0002 0x02
|
|
0x2302 0x7F
|
|
0x2502 0xB3
|
|
0x2524 0xB4
|
|
0x2561 0xB5
|
|
0x2562 0xB6
|
|
0x2591 0xB0
|
|
0x2592 0xB1
|
|
0x2593 0xB2
|
|
|
|
ENDCODEPAGE
|
|
|
|
|
|
|
|
(2) Sample Language File
|
|
|
|
|
|
LANGUAGE INTL
|
|
|
|
UPPERCASE 9
|
|
|
|
0x0061 0x0041
|
|
0x0062 0x0042
|
|
0x0063 0x0043
|
|
0x0064 0x0044
|
|
0x0065 0x0045
|
|
0x0066 0x0046
|
|
0x0067 0x0047
|
|
0x0068 0x0048
|
|
0x0069 0x0049
|
|
0xff41 0xff41 ; placeholder for exception
|
|
0xff42 0xff22 ; placeholder for exception
|
|
|
|
LOWERCASE 9
|
|
|
|
0x0041 0x0061
|
|
0x0042 0x0062
|
|
0x0043 0x0063
|
|
0x0044 0x0064
|
|
0x0045 0x0065
|
|
0x0046 0x0066
|
|
0x0047 0x0067
|
|
0x0048 0x0068
|
|
0x0049 0x0069
|
|
0xff21 0xff21 ; placeholder for exception
|
|
|
|
ENDLANGUAGE
|
|
|
|
|
|
EXCEPTION 2
|
|
|
|
LCID 0x00000000 2 1 ; default linguistic table
|
|
|
|
UPPERCASE
|
|
|
|
0xff41 0xff21
|
|
0xff42 0xff22
|
|
|
|
LOWERCASE
|
|
|
|
0xff21 0xff41
|
|
|
|
LCID 0x0000041f 2 2 ; Turkish
|
|
|
|
UPPERCASE
|
|
|
|
0x0069 0x0130
|
|
0x0131 0x0049
|
|
|
|
LOWERCASE
|
|
|
|
0x0049 0x0131
|
|
0x0130 0x0069
|
|
|
|
|
|
|
|
(3) Sample Locale File
|
|
|
|
|
|
LOCALE 1
|
|
|
|
BEGINLOCALE 0409 ; English - United States
|
|
|
|
ILANGUAGE 0409
|
|
SENGLANGUAGE English
|
|
SABBREVLANGNAME ENU
|
|
SISO639LANGNAME EN
|
|
SNATIVELANGNAME English
|
|
|
|
ICOUNTRY 1
|
|
SENGCOUNTRY United States
|
|
SABBREVCTRYNAME USA
|
|
SISO3166CTRYNAME US
|
|
SNATIVECTRYNAME United States
|
|
|
|
IDEFAULTLANGUAGE 0409
|
|
IDEFAULTCOUNTRY 1
|
|
IDEFAULTANSICODEPAGE 1252
|
|
IDEFAULTOEMCODEPAGE 437
|
|
|
|
SLIST ,
|
|
IMEASURE 1
|
|
|
|
SDECIMAL .
|
|
STHOUSAND ,
|
|
SGROUPING 3;0
|
|
IDIGITS 2
|
|
ILZERO 1
|
|
INEGNUMBER 1
|
|
SNATIVEDIGITS 0123456789
|
|
IDIGITSUBSTITUTION 1
|
|
|
|
SCURRENCY $
|
|
SINTLSYMBOL USD
|
|
SMONDECIMALSEP .
|
|
SMONTHOUSANDSEP ,
|
|
SMONGROUPING 3;0
|
|
ICURRDIGITS 2
|
|
IINTLCURRDIGITS 2
|
|
ICURRENCY 0
|
|
INEGCURR 0
|
|
SPOSITIVESIGN \x0000
|
|
SNEGATIVESIGN -
|
|
|
|
STIMEFORMAT 4 h:mm:ss tt
|
|
hh:mm:ss tt
|
|
H:mm:ss
|
|
HH:mm:ss
|
|
STIME :
|
|
ITIME 0
|
|
ITLZERO 0
|
|
ITIMEMARKPOSN 0
|
|
S1159 AM
|
|
S2359 PM
|
|
|
|
SSHORTDATE 6 M/d/yy
|
|
M/d/yyyy
|
|
MM/dd/yy
|
|
MM/dd/yyyy
|
|
yy/MM/dd
|
|
dd-MMM-yy
|
|
SDATE /
|
|
IDATE 0
|
|
ICENTURY 0
|
|
IDAYLZERO 0
|
|
IMONLZERO 0
|
|
|
|
SLONGDATE 4 dddd, MMMM dd, yyyy
|
|
MMMM dd, yyyy
|
|
dddd, dd MMMM, yyyy
|
|
dd MMMM, yyyy
|
|
ILDATE 0
|
|
|
|
ICALENDARTYPE 1
|
|
IOPTIONALCALENDAR 2 0\xffff
|
|
1\xffffGregorian Calendar
|
|
|
|
IFIRSTDAYOFWEEK 6
|
|
IFIRSTWEEKOFYEAR 0
|
|
|
|
SDAYNAME1 Monday
|
|
SDAYNAME2 Tuesday
|
|
SDAYNAME3 Wednesday
|
|
SDAYNAME4 Thursday
|
|
SDAYNAME5 Friday
|
|
SDAYNAME6 Saturday
|
|
SDAYNAME7 Sunday
|
|
|
|
SABBREVDAYNAME1 Mon
|
|
SABBREVDAYNAME2 Tue
|
|
SABBREVDAYNAME3 Wed
|
|
SABBREVDAYNAME4 Thu
|
|
SABBREVDAYNAME5 Fri
|
|
SABBREVDAYNAME6 Sat
|
|
SABBREVDAYNAME7 Sun
|
|
|
|
SMONTHNAME1 January
|
|
SMONTHNAME2 February
|
|
SMONTHNAME3 March
|
|
SMONTHNAME4 April
|
|
SMONTHNAME5 May
|
|
SMONTHNAME6 June
|
|
SMONTHNAME7 July
|
|
SMONTHNAME8 August
|
|
SMONTHNAME9 September
|
|
SMONTHNAME10 October
|
|
SMONTHNAME11 November
|
|
SMONTHNAME12 December
|
|
SMONTHNAME13 \x0000
|
|
|
|
SABBREVMONTHNAME1 Jan
|
|
SABBREVMONTHNAME2 Feb
|
|
SABBREVMONTHNAME3 Mar
|
|
SABBREVMONTHNAME4 Apr
|
|
SABBREVMONTHNAME5 May
|
|
SABBREVMONTHNAME6 Jun
|
|
SABBREVMONTHNAME7 Jul
|
|
SABBREVMONTHNAME8 Aug
|
|
SABBREVMONTHNAME9 Sep
|
|
SABBREVMONTHNAME10 Oct
|
|
SABBREVMONTHNAME11 Nov
|
|
SABBREVMONTHNAME12 Dec
|
|
SABBREVMONTHNAME13 \x0000
|
|
|
|
FONTSIGNATURE \x00af\x8000\x38cb\x0000\x0000\x0000\x0000\x0000\x0001\x0000\x0000\x8000\x00ff\x003f\x0000\xffff
|
|
|
|
ENDLOCALE
|
|
|
|
|
|
CALENDAR 5
|
|
|
|
|
|
BEGINCALENDAR 0
|
|
|
|
SCALENDAR 0
|
|
|
|
ITWODIGITYEARMAX 2029
|
|
|
|
SERARANGES 0
|
|
|
|
SSHORTDATE \x0000
|
|
SLONGDATE \x0000
|
|
|
|
IF_NAMES 0
|
|
|
|
|
|
BEGINCALENDAR 1
|
|
|
|
SCALENDAR 1
|
|
|
|
ITWODIGITYEARMAX 2029
|
|
|
|
SERARANGES 0
|
|
|
|
SSHORTDATE MM/dd/yy
|
|
SLONGDATE dddd, MMMM dd, yyyy
|
|
|
|
IF_NAMES 1
|
|
|
|
SDAYNAME1 Monday
|
|
SDAYNAME2 Tuesday
|
|
SDAYNAME3 Wednesday
|
|
SDAYNAME4 Thursday
|
|
SDAYNAME5 Friday
|
|
SDAYNAME6 Saturday
|
|
SDAYNAME7 Sunday
|
|
|
|
SABBREVDAYNAME1 Mon
|
|
SABBREVDAYNAME2 Tue
|
|
SABBREVDAYNAME3 Wed
|
|
SABBREVDAYNAME4 Thu
|
|
SABBREVDAYNAME5 Fri
|
|
SABBREVDAYNAME6 Sat
|
|
SABBREVDAYNAME7 Sun
|
|
|
|
SMONTHNAME1 January
|
|
SMONTHNAME2 February
|
|
SMONTHNAME3 March
|
|
SMONTHNAME4 April
|
|
SMONTHNAME5 May
|
|
SMONTHNAME6 June
|
|
SMONTHNAME7 July
|
|
SMONTHNAME8 August
|
|
SMONTHNAME9 September
|
|
SMONTHNAME10 October
|
|
SMONTHNAME11 November
|
|
SMONTHNAME12 December
|
|
SMONTHNAME13 \x0000
|
|
|
|
SABBREVMONTHNAME1 Jan
|
|
SABBREVMONTHNAME2 Feb
|
|
SABBREVMONTHNAME3 Mar
|
|
SABBREVMONTHNAME4 Apr
|
|
SABBREVMONTHNAME5 May
|
|
SABBREVMONTHNAME6 Jun
|
|
SABBREVMONTHNAME7 Jul
|
|
SABBREVMONTHNAME8 Aug
|
|
SABBREVMONTHNAME9 Sep
|
|
SABBREVMONTHNAME10 Oct
|
|
SABBREVMONTHNAME11 Nov
|
|
SABBREVMONTHNAME12 Dec
|
|
SABBREVMONTHNAME13 \x0000
|
|
|
|
|
|
BEGINCALENDAR 2
|
|
|
|
SCALENDAR 2
|
|
|
|
ITWODIGITYEARMAX 2029
|
|
|
|
SERARANGES 4 1989\xffff\x337b
|
|
1926\xffff\x337c
|
|
1912\xffff\x337d
|
|
1868\xffff\x337e
|
|
|
|
SSHORTDATE yy/MM/dd
|
|
SLONGDATE gg yyyy'\x5e74'M'\x6708'd'\x65e5'
|
|
|
|
IF_NAMES 0
|
|
|
|
|
|
BEGINCALENDAR 3
|
|
|
|
SCALENDAR 3
|
|
|
|
ITWODIGITYEARMAX 2029
|
|
|
|
SERARANGES 2 1911\xffffA.D.
|
|
0\xffffB.C.
|
|
|
|
SSHORTDATE yy/MM/dd
|
|
SLONGDATE gg yyyy'\x5e74'M'\x6708'd'\x65e5'
|
|
|
|
IF_NAMES 0
|
|
|
|
|
|
BEGINCALENDAR 4
|
|
|
|
SCALENDAR 4
|
|
|
|
ITWODIGITYEARMAX 2029
|
|
|
|
SERARANGES 2 1911\xffffA.D.
|
|
0\xffffB.C.
|
|
|
|
SSHORTDATE yy/MM/dd
|
|
SLONGDATE gg yyyy'\x5e74'M'\x6708'd'\x65e5'
|
|
|
|
IF_NAMES 0
|
|
|
|
|
|
ENDCALENDAR
|
|
|
|
|
|
|
|
(4) Sample Unicode File
|
|
|
|
|
|
UNICODE
|
|
|
|
ASCIIDIGITS 3
|
|
|
|
0x00B2 0x0032
|
|
0x00B3 0x0033
|
|
0x00B9 0x0031
|
|
|
|
FOLDCZONE 4
|
|
|
|
0xff01 0x0021
|
|
0xff02 0x0022
|
|
0xff03 0x0023
|
|
0xff04 0x0024
|
|
|
|
COMP 5
|
|
|
|
0x00C0 0x0041 0x0300
|
|
0x00C8 0x0045 0x0300
|
|
0x00CC 0x0049 0x0300
|
|
0x00D1 0x004E 0x0303
|
|
0x00D2 0x004F 0x0300
|
|
|
|
HIRAGANA 3
|
|
|
|
0x30a1 0x3041
|
|
0xff67 0x3041
|
|
0x30a2 0x3042
|
|
|
|
KATAKANA 4
|
|
|
|
0x3041 0x30a1
|
|
0x3042 0x30a2
|
|
0x3043 0x30a3
|
|
0x3044 0x30a4
|
|
|
|
HALFWIDTH 3
|
|
|
|
0x30d2 0xff8b
|
|
0x30d5 0xff8c
|
|
0x30d8 0xff8d
|
|
|
|
FULLWIDTH 4
|
|
|
|
0xff61 0x3002
|
|
0xff62 0x300c
|
|
0xff63 0x300d
|
|
0xff64 0x3001
|
|
|
|
ENDUNICODE
|
|
|
|
|
|
|
|
(5) Sample Character Type File
|
|
|
|
|
|
CTYPES 12
|
|
|
|
0x0000 0x0020 0x0000 0x0000
|
|
0x0009 0x0068 0x0009 0x0000
|
|
0x0020 0x0048 0x000A 0x0000
|
|
0x0021 0x0010 0x000B 0x0008
|
|
0x002F 0x0010 0x0003 0x0008
|
|
0x0030 0x0084 0x0003 0x0000
|
|
0x0041 0x0181 0x0001 0x0000
|
|
0x0048 0x0101 0x0001 0x0000
|
|
0x0061 0x0182 0x0001 0x0000
|
|
0x0067 0x0102 0x0001 0x0000
|
|
0x00BF 0x0010 0x000B 0x0008
|
|
0x00C0 0x0101 0x0001 0x0003
|
|
|
|
|
|
|
|
(6) Sample Sortkey File
|
|
|
|
|
|
SORTKEY
|
|
|
|
DEFAULT 4
|
|
|
|
0x0030 2 4 2 2 0
|
|
0x0031 2 5 2 2 0
|
|
0x0065 2 7 2 3 2
|
|
0x0066 2 8 2 3 3
|
|
|
|
ENDSORTKEY
|
|
|
|
|
|
|
|
(7) Sample Sort Tables File
|
|
|
|
|
|
SORTTABLES
|
|
|
|
REVERSEDIACRITICS 4
|
|
|
|
0x0000040c
|
|
0x0000080c
|
|
0x00000c0c
|
|
0x0000100c
|
|
|
|
|
|
DOUBLECOMPRESSION 1
|
|
|
|
0x0000040e
|
|
|
|
|
|
IDEOGRAPH_LCID_EXCEPTION 4
|
|
|
|
0x00010404 big5
|
|
0x00010804 big5
|
|
0x00010411 xjis
|
|
0x00010412 ksc
|
|
|
|
|
|
MULTIPLEWEIGHTS 1
|
|
|
|
36 10
|
|
|
|
|
|
EXPANSION 2
|
|
|
|
0x00c6 0x0041 0x0045
|
|
0x00e6 0x0061 0x0065
|
|
|
|
|
|
EXCEPTION 2
|
|
|
|
LCID 0x0000040a 2
|
|
|
|
0x0065 2 7 2 3 2
|
|
0x0066 2 8 2 3 3
|
|
|
|
|
|
LCID 0x0000040c 2
|
|
LCID 0x0000080c
|
|
|
|
0x0030 2 4 2 2 0
|
|
0x0031 2 5 2 2 0
|
|
|
|
|
|
COMPRESSION 2
|
|
|
|
LCID 0x0000040a
|
|
LCID 0x0000080a
|
|
|
|
TWO 2
|
|
|
|
0x0043 0x0048 2 4 2 3
|
|
0x0063 0x0068 2 4 2 2
|
|
|
|
THREE 1
|
|
|
|
0x0043 0x0048 0x0049 2 4 2 3
|
|
|
|
|
|
LCID 0x0000080c
|
|
|
|
TWO 1
|
|
|
|
0x0063 0x0068 2 4 2 2
|
|
|
|
THREE 0
|
|
|
|
|
|
ENDSORTTABLES
|
|
|
|
|
|
|
|
(8) Sample Ideograph Exceptions File
|
|
|
|
|
|
IDEOGRAPH_EXCEPTION 4 xjis
|
|
|
|
0xfa22 185 243
|
|
0xfa23 185 244
|
|
0xfa24 185 245
|
|
0xfa25 185 246
|
|
|