Definitions:
Statement delimiter: a new statement always follows a linebreak character, with these qualifications:
a) except when the linebreak is followed by the continuation character;
b) if the linebreak is immediately followed by another linebreak character, the empty statement between them is a null statement;
c) statements consisting entirely of whitespace characters, comments, or a combination of the two are considered null statements;
d) the parser ignores null statements;
e) in the event of a parsing error, the parser discards everything up to the next linebreak character, together with each subsequent line that begins with the continuation character.
The construct delimiters are statements in and of themselves; they also terminate the previous statement and delimit the start of the following statement. You may think of a <construct_delimiter> as equivalent to <linebreak><construct_delimiter><linebreak>.
Whitespace characters: space, TAB.
Linebreak characters: <CR>, <LF>. If both occur consecutively, the pair is treated as one linebreak.
White characters: either Whitespace or Linebreak characters.
Continuation character: +. NEW DEF: when used as a continuation character, the '+' must appear as the first character in a line. The parser interprets the preceding linebreak as whitespace.
Logical linebreak: a linebreak character that is NOT followed by a continuation character.
Continuation linebreak: a linebreak character followed by a continuation character.
Arbitrary whitespace: Whitespace characters, comments, or Continuation linebreaks.
Construct delimiters: curly braces { }. A linebreak character is not required to precede or follow a construct delimiter. Construct delimiters also delimit the scope of macro definitions: if a macro was defined within a nesting level created by a pair of construct delimiters, it remains defined only within that nesting level.
Nesting level: the logical set of statements enclosed by a matching pair of construct delimiters.
Statement delimiter: a Logical linebreak or a Construct delimiter.
Keyword-value separator: colon :
String delimiter: double quotes ""
Macro invocation: equals =
Comment indicator: *%. Comments may be inserted immediately preceding any logical or continuation linebreak. They may contain any characters except linebreak characters; the linebreak character terminates the comment. The comment indicator must be preceded by a White character (unless the comment indicator is the first byte in the source file).
Grouping operator: (), used in certain value constructs.
Hex substring delimiter: <>
Escape character: %, used in string and parameter constructs.
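A minimal C sketch of the linebreak and comment rules above, assuming the source is already in memory as a NUL-terminated buffer. The function names are hypothetical, and the sketch assumes that a CR/LF pair in either order counts as one linebreak and that the '+' of a continuation linebreak is consumed together with the linebreak.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the linebreak at p. Returns 0 if *p is not
   CR or LF, 2 for a CR/LF or LF/CR pair (treated as one linebreak), else 1. */
static size_t linebreak_len(const char *p)
{
    if (*p != '\r' && *p != '\n')
        return 0;
    if ((p[1] == '\r' || p[1] == '\n') && p[1] != p[0])
        return 2;
    return 1;
}

/* Hypothetical helper: copies src to dst. Each logical linebreak becomes a
   single '\n'; each continuation linebreak (linebreak followed by '+')
   becomes one space and the '+' is dropped. dst must hold at least
   strlen(src) + 1 bytes. Returns the length of the output. */
size_t join_continuations(const char *src, char *dst)
{
    size_t out = 0;

    while (*src != '\0') {
        size_t lb = linebreak_len(src);
        if (lb == 0) {
            dst[out++] = *src++;
            continue;
        }
        src += lb;
        if (*src == '+') {          /* continuation linebreak */
            dst[out++] = ' ';
            src++;                  /* consume the continuation character */
        } else {                    /* logical linebreak */
            dst[out++] = '\n';
        }
    }
    dst[out] = '\0';
    return out;
}

/* Hypothetical helper: true if p points at a "*%" that counts as a comment
   indicator, i.e. it is the first byte of the file or follows a White
   character (space, TAB, CR or LF). */
bool is_comment_start(const char *fileStart, const char *p)
{
    if (p[0] != '*' || p[1] != '%')
        return false;
    if (p == fileStart)
        return true;
    return p[-1] != '\0' && strchr(" \t\r\n", p[-1]) != NULL;
}
```

Normalizing continuation lines up front keeps the rest of the statement parser free of '+' handling; a real tokenizer could equally treat a continuation linebreak as just another whitespace token.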
The above characters have reserved meanings and may not be used in any keyword, symbol name or any user defined name.
Name spaces:
If attribute keywords are used within other constructs, say a Feature or Global keyword inside an Option, that keyword must be declared using the EXTERN_FEATURE: or EXTERN_GLOBAL: modifier. Otherwise, the expected attribute type (i.e. the dictionary used) is determined by the current parser state.
The namespaces of the attribute keywords for the various constructs may overlap one another, since we rely on the rules above to assign the namespace, but they cannot overlap non-attribute keywords.
Keywords: predefined keywords must begin with '*'. The remainder of the keyword may consist of 'A' to 'Z', 'a' to 'z', '0' to '9', and '_', and may be terminated by an optional '?'.
Symbol keywords do not begin with '*' and may be any name defined by the user; they must consist of the same characters as normal keywords. Symbol keywords are used as the names of Value macros and as font names in certain constructs.
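The two naming rules can be checked with a small C sketch like the one below. The function names are hypothetical, and it assumes that a name needs at least one character after any leading '*' and that the optional trailing '?' is also permitted on symbol keywords, since they use "the same characters as normal keywords".

```c
#include <stdbool.h>

/* Hypothetical helper: characters allowed in keyword and symbol names.
   Explicit ranges are used instead of isalnum() so the result does not
   depend on the current locale. */
static bool is_name_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '_';
}

/* One or more name characters, optionally terminated by a single '?'. */
static bool is_name_body(const char *s)
{
    const char *start = s;

    while (is_name_char(*s))
        s++;
    if (s == start)
        return false;           /* assumption: empty names are invalid */
    if (*s == '?')
        s++;
    return *s == '\0';
}

/* Predefined keyword: '*' followed by a name body. */
bool is_keyword_token(const char *tok)
{
    return tok[0] == '*' && is_name_body(tok + 1);
}

/* Symbol keyword: a name body that does not begin with '*'. */
bool is_symbol_token(const char *tok)
{
    return tok[0] != '*' && is_name_body(tok);
}
```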
Parsing rules:
Values: for some keywords the value is ignored by the parser. In these cases the value (and the : delimiter) may be omitted. For example, *Macros and *Macros: PaperNames are both valid.
Block macro definitions: if the definition (the body of a BlockMacro) contains braces, the braces must appear in matching pairs and in the correct order, i.e. each { must appear before its }. Braces may be nested within the body.
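A short C sketch of the BlockMacro brace rule: a depth counter that may never go negative and must end at zero. The function name is hypothetical; it skips braces inside double-quoted strings (where, per the parsing-level rules, they have no special meaning) and assumes the body contains no escaped double quotes and no comments.

```c
#include <stdbool.h>

/* Hypothetical check for the BlockMacro body rule: braces must come in
   matching pairs, each '{' before its '}', and may nest. Braces inside
   double-quoted strings are ignored. Assumption: no escaped quotes and
   no comments in the body. */
bool block_body_braces_ok(const char *body)
{
    long depth = 0;
    bool inString = false;

    for (; *body != '\0'; body++) {
        if (*body == '"') {
            inString = !inString;
        } else if (!inString && *body == '{') {
            depth++;
        } else if (!inString && *body == '}') {
            if (depth == 0)
                return false;       /* '}' appears before its '{' */
            depth--;
        }
    }
    return depth == 0 && !inString;
}
```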
Parsing level 0: this is the outermost level of parsing. At this level the characters { and } are interpreted as construct delimiters, *% begins a comment, and so on. This contrasts with the parsing rules applied to higher-level objects such as strings, where these characters have no special meaning.
Linebreak characters may only appear in parsing level 0. Their appearance at any time terminates parsing of the current statement.
In the event of a parsing error, the parser discards everything up to the next linebreak character.
Statements begin with either a *<keyword> or a <symbol keyword>, where <keyword> is a keyword token recognized by the parser and <symbol keyword> is a token the parser does not recognize, which may represent a ValueMacroName or a TTFontName. Such tokens must not begin with '*' and are marked as SYMBOL in the TokenMap.
In general, arbitrary whitespace may appear between any entities recognized at parsing level 0. Where whitespace is permitted within such entities, this will be noted.
For added robustness, macro strings will not be expanded to their binary equivalents. This prevents the insertion of random Linebreak characters in the stream.
Rule for GPD authors: for maximum robustness and error recovery, place level 0 braces (Construct delimiters) on a separate line. Do not place Construct delimiters on the same line where questionable keywords and constructs are used.
Heap usage:
the heap will be divided into several sections, each large enough to hold whatever may come. Growth of the heap sections is not allowed.
Strings and composite objects such as RECTs: this section holds all strings. Offsets are measured from the beginning of the string section.
Arrays of various types: each type of array is assigned its own dedicated memory buffer. A master table contains pointers to each array, its size and current entry. Once data is entered into the array, the keeper of the data need only remember the index of the array the data was written into.
After all parsing operations are complete, we will consolidate the arrays into one memory space and update the master table accordingly.
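The master table and the final consolidation step might look like the C sketch below. The struct and function names are hypothetical; it assumes each array has a fixed element size, and it does not pad offsets for alignment when consolidating, so elements are read back with memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical master-table entry describing one dedicated array buffer. */
typedef struct {
    unsigned char *pBuf;       /* dedicated buffer for this array type */
    size_t         cbElement;  /* size of one element in bytes */
    size_t         cMax;       /* capacity in elements; buffers never grow */
    size_t         cCur;       /* index of the next free entry */
} MASTER_ENTRY;

/* Copies one element into the array and returns its index, or (size_t)-1
   if the array is full. The caller only needs to remember the index. */
size_t add_element(MASTER_ENTRY *pm, const void *pElem)
{
    if (pm->cCur >= pm->cMax)
        return (size_t)-1;
    memcpy(pm->pBuf + pm->cCur * pm->cbElement, pElem, pm->cbElement);
    return pm->cCur++;
}

/* Address of element `index`; read it with memcpy, since consolidate()
   below does not pad for alignment. */
void *element_at(const MASTER_ENTRY *pm, size_t index)
{
    return pm->pBuf + index * pm->cbElement;
}

/* After parsing: packs every array into the single block dst (which must
   hold the sum of cCur * cbElement over all entries) and repoints the
   master table. Element order is preserved, so saved indices stay valid.
   Returns the number of bytes used. */
size_t consolidate(MASTER_ENTRY *table, size_t cEntries, unsigned char *dst)
{
    size_t off = 0;

    for (size_t i = 0; i < cEntries; i++) {
        size_t cb = table[i].cCur * table[i].cbElement;
        if (cb > 0)
            memcpy(dst + off, table[i].pBuf, cb);
        table[i].pBuf = dst + off;
        table[i].cMax = table[i].cCur;   /* the packed array is now full */
        off += cb;
    }
    return off;
}
```

Because callers keep only an index, the move in consolidate() is invisible to them; only the master table's pointers change.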
Some composite values (for example strings, lists, and UIConstraints) require an indefinite amount of storage or reside in dedicated structures. Such values are stored in two parts: a fixed-size link, and the part the link refers to. This second part may be variable sized and may occupy heap space, one or more dedicated structures, or some combination thereof. Since the link is always of a known size, it may be stored in a field of a structure, etc.
The following table lists the values supported by the parser and how they are structured:
value type: String
    link: ARRAYREF. The dwOffset field specifies the heap offset of the start of the string; the dwCount field specifies the string length, excluding the terminating NULL.
    body: null-terminated array of bytes stored in the heap.
value type: LIST
value type: QualifiedName
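A C sketch of the two-part String value: a fixed-size ARRAYREF link plus a null-terminated body in a fixed-size string section. The field names follow the table; using uint32_t for them, and the STRING_SECTION type and function names, are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Link part of a string value. Field names follow the notes; uint32_t
   as their type is an assumption. */
typedef struct {
    uint32_t dwOffset;  /* heap offset of the start of the string */
    uint32_t dwCount;   /* string length, excluding the terminating NULL */
} ARRAYREF;

/* Hypothetical fixed-size string section; it is never grown. */
typedef struct {
    char     *base;
    uint32_t  size;     /* capacity in bytes */
    uint32_t  used;     /* bytes already occupied */
} STRING_SECTION;

/* Copies str (including its NULL) into the section and fills *pRef.
   Returns 0 on success, -1 if the section is full. */
int store_string(STRING_SECTION *sec, const char *str, ARRAYREF *pRef)
{
    size_t len = strlen(str);

    if (len + 1 > (size_t)(sec->size - sec->used))
        return -1;
    memcpy(sec->base + sec->used, str, len + 1);
    pRef->dwOffset = sec->used;
    pRef->dwCount  = (uint32_t)len;
    sec->used     += (uint32_t)(len + 1);
    return 0;
}

/* Recovers the body of a string from its link. */
const char *string_from_ref(const STRING_SECTION *sec, ARRAYREF ref)
{
    return sec->base + ref.dwOffset;
}
```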
Shortcuts that cause headaches:
*Command: 2 forms exist.
Macros:
A macro definition cannot be self-referencing, and macro definitions cannot be forward referenced; i.e. only a previously defined and fully resolved macro can be referenced.
Scope: a macro is defined (referenceable) only after the closing brace of its definition has been parsed, and remains so until the parser encounters the closing brace that terminates the nesting level the macro was defined in.
Namespaces: since macro definitions are stored in a stack, defining a second macro with the same name does not necessarily destroy the first definition. If the first macro was defined outside the scope of the second, it becomes visible again once the parser leaves the scope of the second macro.
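The scope and shadowing rules fall out naturally from a stack searched top-down, as in this C sketch. The type and function names and the fixed capacity MAX_MACROS are hypothetical; the sketch assumes the caller pushes a macro only after the closing brace of its definition, tagged with the nesting level that encloses the definition.

```c
#include <string.h>

#define MAX_MACROS 64   /* hypothetical fixed capacity; sections never grow */

typedef struct {
    const char *name;
    const char *value;
    int         level;   /* nesting level the macro was defined in */
} MACRO_ENTRY;

typedef struct {
    MACRO_ENTRY entries[MAX_MACROS];
    int         count;
} MACRO_STACK;

/* Pushes a fully resolved definition. Call only after the closing brace of
   the definition has been parsed. Returns -1 if the stack is full. */
int push_macro(MACRO_STACK *ms, const char *name, const char *value, int level)
{
    if (ms->count >= MAX_MACROS)
        return -1;
    ms->entries[ms->count].name  = name;
    ms->entries[ms->count].value = value;
    ms->entries[ms->count].level = level;
    ms->count++;
    return 0;
}

/* Called when the closing brace of nesting level `level` is parsed:
   every macro defined at that level (or deeper) goes out of scope. */
void pop_level(MACRO_STACK *ms, int level)
{
    while (ms->count > 0 && ms->entries[ms->count - 1].level >= level)
        ms->count--;
}

/* Searches from the top of the stack, so an inner definition hides an
   outer one with the same name until the inner scope is left. */
const char *find_macro(const MACRO_STACK *ms, const char *name)
{
    for (int i = ms->count - 1; i >= 0; i--)
        if (strcmp(ms->entries[i].name, name) == 0)
            return ms->entries[i].value;
    return NULL;
}
```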
ValueMacros: only string ValueMacros may be nested. That is, if any ValueMacro definition references another ValueMacro, the parser assumes that the definition is a string macro and that the referenced macro is also a string value.
BlockMacros: a BlockMacro may contain other macro definitions, but those definitions can only be referenced inside the BlockMacro; they are not visible when the BlockMacro is actually referenced. A BlockMacroName may NOT be substituted by a ValueMacro in either a *BlockMacro or an *InsertBlock statement.
----- more parsing rules ------
The first non-null line of the root GPD source file must be: *GPDSpecVersion:
Arbitrary whitespace is allowed between the tokens comprising a command parameter. Arbitrary whitespace is allowed anywhere within a hex substring.
--------------------------------------------------------------
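As an illustration of the hex substring rule, here is a C sketch that decodes a `<...>` hex substring while skipping whitespace anywhere inside it, including between the two digits of one byte. The function name is hypothetical, and it only skips space and TAB; comments and continuation linebreaks (the other forms of arbitrary whitespace) are assumed to have been removed earlier.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical decoder for a "<...>" hex substring starting at s.
   Space and TAB are skipped anywhere inside. Returns the number of bytes
   written to out, or -1 for a bad digit, an odd digit count, a missing
   '>', or an output overflow. */
long parse_hex_substring(const char *s, unsigned char *out, size_t cbOut)
{
    size_t n = 0;
    int hi = -1;                        /* pending high nibble, -1 if none */

    if (*s != '<')
        return -1;
    for (s++; *s != '>'; s++) {
        int v;
        if (*s == '\0')
            return -1;                  /* unterminated hex substring */
        if (*s == ' ' || *s == '\t')
            continue;
        v = hex_value(*s);
        if (v < 0)
            return -1;
        if (hi < 0) {
            hi = v;
            continue;
        }
        if (n >= cbOut)
            return -1;
        out[n++] = (unsigned char)(hi * 16 + v);
        hi = -1;
    }
    return hi < 0 ? (long)n : -1;       /* odd number of digits is an error */
}
```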
Currently, these are the known types of keywords:
CONSTRUCTS: a construct keyword introduces a construct (causing a parser context change) and is usually followed by an open brace in the next statement. The construct is terminated by the matching close brace.
*UIGroup, *Switch, *Case, *Default, *Command, *FontCartridge, *TTFontSubs, *Feature, *Option, *OEM, *BlockMacro, *Macros
A construct may be thought of as a kind of structure initialization. Only certain keywords may be used inside a construct; some of these keywords may be used only within their associated construct and nowhere else.
LOCAL ATTRIBUTES: initialize a value in a construct.
GLOBAL ATTRIBUTES: initialize a value in the global structure.
Local and global attributes may be subdivided into freefloating and fixed attributes. A fixed attribute must be used at the same nesting level as the construct it is associated with. A freefloating attribute may be used within another construct, as long as that construct is contained within the construct associated with the attribute.
SPECIAL ATTRIBUTES: initialize and add another item to a dedicated or global list, or to a list in a construct, or have side effects requiring special processing. Examples:
*Installable? causes an installable feature to be synthesized. However, the parser may deal with this after all Features/Options have been parsed, so it does not really require special processing during parsing.
Adds link to special tree structure: *Constraints, *InvalidCombination, *InvalidInstallableCombination, *InstalledConstraints, *NotInstalledConstraints
The values introduced by these keywords are additive (like using a LIST): *Font
*Command:<commandName>:<invocation> a shorthand
*MemConfigKB a shorthand way of creating an entire memory option.
LIST(<QualifiedName>,<QualifiedName>,<QualifiedName>) may be written as: <FeatureName>.LIST(<OptionName>,<OptionName>,<OptionName>)
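The LIST shorthand can be expanded into the long form with a plain string rewrite, sketched in C below. The function names are hypothetical, and the sketch assumes the value has no embedded whitespace (in practice arbitrary whitespace would be skipped by the tokenizer first).

```c
#include <stddef.h>
#include <string.h>

/* Appends len bytes of s to out, keeping it NUL-terminated.
   Returns -1 if the buffer (cbOut bytes) would overflow. */
static int append(char *out, size_t cbOut, size_t *pUsed,
                  const char *s, size_t len)
{
    if (*pUsed + len + 1 > cbOut)
        return -1;
    memcpy(out + *pUsed, s, len);
    *pUsed += len;
    out[*pUsed] = '\0';
    return 0;
}

/* Hypothetical rewrite of "<FeatureName>.LIST(<Opt1>,<Opt2>,...)" into
   "LIST(<FeatureName>.<Opt1>,<FeatureName>.<Opt2>,...)".
   Returns 0 on success, -1 if the input is not in the shorthand form or
   out is too small. Assumption: no whitespace inside the value. */
int expand_list_shorthand(const char *in, char *out, size_t cbOut)
{
    static const char marker[] = ".LIST(";
    const char *dot = strstr(in, marker);
    const char *p, *close;
    size_t featLen, used = 0;
    int first = 1;

    if (dot == NULL || dot == in)
        return -1;
    featLen = (size_t)(dot - in);
    p = dot + (sizeof marker - 1);          /* first option name */
    close = strchr(p, ')');
    if (close == NULL || close[1] != '\0')
        return -1;

    if (append(out, cbOut, &used, "LIST(", 5))
        return -1;
    while (p < close) {
        const char *comma = memchr(p, ',', (size_t)(close - p));
        const char *nameEnd = comma ? comma : close;

        if ((!first && append(out, cbOut, &used, ",", 1)) ||
            append(out, cbOut, &used, in, featLen) ||
            append(out, cbOut, &used, ".", 1) ||
            append(out, cbOut, &used, p, (size_t)(nameEnd - p)))
            return -1;
        first = 0;
        p = comma ? comma + 1 : close;
    }
    return append(out, cbOut, &used, ")", 1);
}
```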
If there are other types of keywords, let me know.
Special parsing contexts, in which the user defines new keywords simply by referencing them:
*TTFontSubs: {
    <TTFontFaceName>: <DeviceFontID>
    ....
}
<TTFontFaceName> is not actually a symbol; each entry adds a (string, number) pair to a list. It may be implemented as a symbol during construction.
*Macros: {
    <ValueMacroName>: <macrovalue>
}
*FontCart: note the FontCart construct is ROOT_ONLY and is not multivalued. Each construct with a unique SymbolName corresponds to a dedicated FONTCART structure.
If we want to make FontCarts multivalued, we introduce a new keyword *AvailFontCarts: LIST(symbol1, symbol2, symbol3) which is a FreeFloating Global.
Macro processing:
=<ValueMacroName> where a value or a component of a string is expected.
=<BlockMacroName> following a symbol name that follows a construct keyword.
*InsertBlock: <BlockMacroName>
Recognized value types:
ORDER ::= <section>.<number>
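A C sketch of splitting an ORDER value into its two parts. The function name is hypothetical, the input "DOC_SETUP.5" in the usage below is only an illustrative value, and the sketch assumes surrounding whitespace has already been stripped and that the number is an unsigned decimal.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for an ORDER value of the form <section>.<number>.
   Copies the section name into section (capacity cbSection) and stores the
   number in *pNumber. Returns 0 on success, -1 on malformed input. */
int parse_order(const char *s, char *section, size_t cbSection,
                unsigned long *pNumber)
{
    const char *dot = strchr(s, '.');
    char *end;
    size_t len;
    unsigned long n;

    if (dot == NULL || dot == s)
        return -1;                      /* missing '.' or empty section */
    len = (size_t)(dot - s);
    if (len >= cbSection)
        return -1;                      /* section name too long */
    if (!isdigit((unsigned char)dot[1]))
        return -1;                      /* a number must follow the '.' */
    n = strtoul(dot + 1, &end, 10);
    if (*end != '\0')
        return -1;                      /* trailing characters */

    memcpy(section, s, len);
    section[len] = '\0';
    *pNumber = n;
    return 0;
}
```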
SYMBOLS: any user-defined token (not recognized by the parser) used to identify a statement, construct, or value. <CommandNames> are not symbols, because the parser has a list of recognized valid Unidrv commands. Non-macro symbols may be forward referenced, i.e. *DefaultOption or *Constraints may reference a symbol that is defined later.
Where defined:
Associated keyword: <symbol type>
*Macros: <ValueMacroNames> (not the group name!)
*BlockMacro: <BlockMacroNames>
*Feature: <featureSymbol>
*Option: <optionSymbol>
*OEM: <OEM group name>, saved in the symbol tree for possible future use.
*TTFontSubs: TTFontNames may be stored as symbols, but are not symbols in the strictest sense.
Constructs not using symbols:
*TTFontSubs: <ON | OFF>, predefined.
*UIGroup: <Group name>, optional; not used by the parser.
*Default: <optional tag>, optional; not used by the parser.
*Command: <Unidrv Command Name>, predefined. CmdSelect is a special name which triggers special processing.
*FontCartridge: <optional tag>, optional; not used by the parser. Implementation hint: use macros to keep all definitions in one place, or introduce *AvailFontCart: LIST(<FontCartSymbol>, <FontCartSymbol>) inside constructs.
Where referenced:
*InsertMacro: <BlockMacroNames>
*<ConstructKeyword>: <symboldef> =<BlockMacroName>
*<anykeyword>: =<ValueMacroName>, except *BlockMacro, *InsertMacro, *Include
*Switch: <FeatureName>
*Case: <OptionName>
Currently the parser saves symbols defined in *Feature and *Option keywords and remembers symbol references made in *Switch and *Case keywords.
The include keyword:
must not appear within a macro definition; must not reference a macro value; must be terminated by a linebreak, not by a { or } construct delimiter.
--- state machine ----
The parser treats construct keywords as operators which change the state of the parser (i.e. they create state transitions).
The set of allowed transitions is defined in the table AllowedTransitions. This table enforces several rules:
the construct _TTFONTSUBS can only appear at the root level.
no constructs may appear within OEM, FONTCART, TTFONTSUBS, COMMAND constructs.
The following code fragment is a comprehensive list of the allowed state transitions:
pst = astAllowedTransitions[STATE_ROOT] ;
pst[CONSTRUCT_UIGROUP] = STATE_UIGROUP; pst[CONSTRUCT_FEATURE] = STATE_FEATURE; pst[CONSTRUCT_SWITCH] = STATE_SWITCH_ROOT; pst[CONSTRUCT_COMMAND] = STATE_COMMAND; pst[CONSTRUCT_FONTCART] = STATE_FONTCART; pst[CONSTRUCT_TTFONTSUBS] = STATE_TTFONTSUBS; pst[CONSTRUCT_OEM] = STATE_OEM;
pst = astAllowedTransitions[STATE_UIGROUP] ;
pst[CONSTRUCT_UIGROUP] = STATE_UIGROUP; pst[CONSTRUCT_FEATURE] = STATE_FEATURE;
pst = astAllowedTransitions[STATE_FEATURE] ;
pst[CONSTRUCT_OPTION] = STATE_OPTIONS; pst[CONSTRUCT_SWITCH] = STATE_SWITCH_FEATURE;
pst = astAllowedTransitions[STATE_OPTIONS] ;
pst[CONSTRUCT_SWITCH] = STATE_SWITCH_OPTION; pst[CONSTRUCT_COMMAND] = STATE_COMMAND; pst[CONSTRUCT_OEM] = STATE_OEM;
pst = astAllowedTransitions[STATE_SWITCH_ROOT] ;
pst[CONSTRUCT_CASE] = STATE_CASE_ROOT; pst[CONSTRUCT_DEFAULT] = STATE_DEFAULT_ROOT;
pst = astAllowedTransitions[STATE_SWITCH_FEATURE] ;
pst[CONSTRUCT_CASE] = STATE_CASE_FEATURE; pst[CONSTRUCT_DEFAULT] = STATE_DEFAULT_FEATURE;
pst = astAllowedTransitions[STATE_SWITCH_OPTION] ;
pst[CONSTRUCT_CASE] = STATE_CASE_OPTION; pst[CONSTRUCT_DEFAULT] = STATE_DEFAULT_OPTION;
pst = astAllowedTransitions[STATE_CASE_ROOT] ;
pst[CONSTRUCT_SWITCH] = STATE_SWITCH_ROOT; pst[CONSTRUCT_COMMAND] = STATE_COMMAND; pst[CONSTRUCT_OEM] = STATE_OEM;
pst = astAllowedTransitions[STATE_DEFAULT_ROOT] ;
pst[CONSTRUCT_SWITCH] = STATE_SWITCH_ROOT; pst[CONSTRUCT_COMMAND] = STATE_COMMAND; pst[CONSTRUCT_OEM] = STATE_OEM;
pst = astAllowedTransitions[STATE_CASE_FEATURE] ;
pst[CONSTRUCT_SWITCH] = STATE_SWITCH_FEATURE; pst[CONSTRUCT_COMMAND] = STATE_COMMAND; pst[CONSTRUCT_OEM] = STATE_OEM;
pst = astAllowedTransitions[STATE_DEFAULT_FEATURE] ;
pst[CONSTRUCT_SWITCH] = STATE_SWITCH_FEATURE; pst[CONSTRUCT_COMMAND] = STATE_COMMAND; pst[CONSTRUCT_OEM] = STATE_OEM;
pst = astAllowedTransitions[STATE_CASE_OPTION] ;
pst[CONSTRUCT_SWITCH] = STATE_SWITCH_OPTION; pst[CONSTRUCT_COMMAND] = STATE_COMMAND; pst[CONSTRUCT_OEM] = STATE_OEM;
pst = astAllowedTransitions[STATE_DEFAULT_OPTION] ;
pst[CONSTRUCT_SWITCH] = STATE_SWITCH_OPTION; pst[CONSTRUCT_COMMAND] = STATE_COMMAND; pst[CONSTRUCT_OEM] = STATE_OEM;
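To show how the table is consulted, here is a self-contained C sketch that builds the same transitions and looks up the next state. The enum declarations, their ordering, STATE_INVALID, and the helpers allow_body and next_state are assumptions; only the state and construct names and the transitions themselves come from the listing. Every entry not set explicitly stays STATE_INVALID, which is how the "no constructs within OEM, FONTCART, TTFONTSUBS, COMMAND" rule falls out.

```c
/* Hypothetical enum layouts; only the names come from the listing. */
typedef enum {
    STATE_INVALID = -1,
    STATE_ROOT, STATE_UIGROUP, STATE_FEATURE, STATE_OPTIONS,
    STATE_SWITCH_ROOT, STATE_SWITCH_FEATURE, STATE_SWITCH_OPTION,
    STATE_CASE_ROOT, STATE_DEFAULT_ROOT, STATE_CASE_FEATURE,
    STATE_DEFAULT_FEATURE, STATE_CASE_OPTION, STATE_DEFAULT_OPTION,
    STATE_COMMAND, STATE_FONTCART, STATE_TTFONTSUBS, STATE_OEM,
    STATE_LAST
} STATE;

typedef enum {
    CONSTRUCT_UIGROUP, CONSTRUCT_FEATURE, CONSTRUCT_OPTION,
    CONSTRUCT_SWITCH, CONSTRUCT_CASE, CONSTRUCT_DEFAULT,
    CONSTRUCT_COMMAND, CONSTRUCT_FONTCART, CONSTRUCT_TTFONTSUBS,
    CONSTRUCT_OEM, CONSTRUCT_LAST
} CONSTRUCT;

static STATE astAllowedTransitions[STATE_LAST][CONSTRUCT_LAST];

/* Option, Case and Default bodies allow a Switch (which keeps the
   enclosing context), a Command and an OEM construct. */
static void allow_body(STATE s, STATE switchState)
{
    STATE *pst = astAllowedTransitions[s];
    pst[CONSTRUCT_SWITCH]  = switchState;
    pst[CONSTRUCT_COMMAND] = STATE_COMMAND;
    pst[CONSTRUCT_OEM]     = STATE_OEM;
}

void init_allowed_transitions(void)
{
    STATE *pst;

    /* Everything not listed below is forbidden. */
    for (int s = 0; s < STATE_LAST; s++)
        for (int c = 0; c < CONSTRUCT_LAST; c++)
            astAllowedTransitions[s][c] = STATE_INVALID;

    pst = astAllowedTransitions[STATE_ROOT];
    pst[CONSTRUCT_UIGROUP]    = STATE_UIGROUP;
    pst[CONSTRUCT_FEATURE]    = STATE_FEATURE;
    pst[CONSTRUCT_SWITCH]     = STATE_SWITCH_ROOT;
    pst[CONSTRUCT_COMMAND]    = STATE_COMMAND;
    pst[CONSTRUCT_FONTCART]   = STATE_FONTCART;
    pst[CONSTRUCT_TTFONTSUBS] = STATE_TTFONTSUBS;  /* root level only */
    pst[CONSTRUCT_OEM]        = STATE_OEM;

    pst = astAllowedTransitions[STATE_UIGROUP];
    pst[CONSTRUCT_UIGROUP] = STATE_UIGROUP;
    pst[CONSTRUCT_FEATURE] = STATE_FEATURE;

    pst = astAllowedTransitions[STATE_FEATURE];
    pst[CONSTRUCT_OPTION] = STATE_OPTIONS;
    pst[CONSTRUCT_SWITCH] = STATE_SWITCH_FEATURE;

    allow_body(STATE_OPTIONS, STATE_SWITCH_OPTION);

    astAllowedTransitions[STATE_SWITCH_ROOT][CONSTRUCT_CASE]       = STATE_CASE_ROOT;
    astAllowedTransitions[STATE_SWITCH_ROOT][CONSTRUCT_DEFAULT]    = STATE_DEFAULT_ROOT;
    astAllowedTransitions[STATE_SWITCH_FEATURE][CONSTRUCT_CASE]    = STATE_CASE_FEATURE;
    astAllowedTransitions[STATE_SWITCH_FEATURE][CONSTRUCT_DEFAULT] = STATE_DEFAULT_FEATURE;
    astAllowedTransitions[STATE_SWITCH_OPTION][CONSTRUCT_CASE]     = STATE_CASE_OPTION;
    astAllowedTransitions[STATE_SWITCH_OPTION][CONSTRUCT_DEFAULT]  = STATE_DEFAULT_OPTION;

    allow_body(STATE_CASE_ROOT,       STATE_SWITCH_ROOT);
    allow_body(STATE_DEFAULT_ROOT,    STATE_SWITCH_ROOT);
    allow_body(STATE_CASE_FEATURE,    STATE_SWITCH_FEATURE);
    allow_body(STATE_DEFAULT_FEATURE, STATE_SWITCH_FEATURE);
    allow_body(STATE_CASE_OPTION,     STATE_SWITCH_OPTION);
    allow_body(STATE_DEFAULT_OPTION,  STATE_SWITCH_OPTION);
}

/* Returns the state entered when construct c opens in state cur, or
   STATE_INVALID if that construct is not allowed there. */
STATE next_state(STATE cur, CONSTRUCT c)
{
    if ((int)cur < 0 || cur >= STATE_LAST || (int)c < 0 || c >= CONSTRUCT_LAST)
        return STATE_INVALID;
    return astAllowedTransitions[cur][c];
}
```

A parser would push the returned state on an open brace and pop it on the matching close brace, reporting an error whenever next_state returns STATE_INVALID.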
--- multiple statements and redefinitions: ------
For standard attributes, if two statements assigning different values to the same attribute appear in the same construct, the attribute takes the value that occurs later.
If the attribute is defined to be freefloating, it may appear multiple times in different *Option or *Case constructs. If the effect of the multiple occurrences is to add new branches that are compatible with the existing tree, or to reinitialize the value of a node in the existing tree, that is an accepted use of multiply occurring attributes. However, if the effect is to define a new branch that is incompatible with the existing tree, that is an error, and the later initialization of the attribute is ignored.
There is one exception to the rule of adding conflicting branches to the attribute tree. That exception allows default initializers to be created. If an attribute is assigned a value which is subsequently made multivalued, the initial value becomes the default initializer unless the GPD author explicitly specified a 'default' case when making the attribute multivalued.
Note that the order cannot be reversed: an attribute that is already defined to be multivalued cannot subsequently be defined to be fewer-valued.
---- use of switch/case constructs -----
The same feature must not be referenced in nested constructs; this would produce an attribute tree that contains the same feature at two different levels. Similarly, an attribute tree should not be constructed piecemeal: it is an error if the tree is subsequently redefined or elaborated using a different feature nesting order.
----- Severity of errors:
!!!!!: the parser is non-compilable/non-functional unless this is resolved.
!!!!: unfinished functionality. Some legal GPD files will cause corruption.
!!!: integrity check omitted. A corrupt file may be inadvertently generated if resource limitations are encountered.
!!: a syntax error in the GPD may cause widespread corruption.
!: emit a useful message for the user.
BUG_BUG: wish item, e.g. a user-friendlier error message, a parser self-consistency check or self-diagnostics, or more general, elegant, faster, or more complex code.
Note: PARANOID BUG_BUGs indicate error conditions that are the result of coding errors (mistaken assumptions, incomplete code paths etc) and are not the result of improper GPD syntax, or resource constraints (overflow of fixed length buffers etc).
All originating error messages should report the name of the function, name of variable or system call that is out of range or invalid.
Later, if a caller function sees a failure return value, it may want to tack on an extra message, such as the keyword or line number where the error occurred.
If a function returns with a failure condition, the caller may at its discretion increase the severity of the error. For example, if the caller passed a string to be parsed and parsing failed, the string parsing function may raise only a minor error condition. But if the caller was going to use the string to open a GPD or resource file, this suddenly becomes a major problem.
A function may never reduce the severity of an error unless it has just executed code that mitigates the source of the problem. Don't select ERRSEV_RESTART unless there is a handler on the next pass that solves the initial problem; otherwise an endless loop may result.
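The escalation rule reduces to taking the more severe of two values, as in this C sketch. Only ERRSEV_RESTART is named in these notes; the other enum names, their ordering from least to most severe, and the function combine_severity are assumptions for illustration.

```c
/* Hypothetical severity scale, ordered from least to most severe. Only
   ERRSEV_RESTART appears in the notes; the rest is assumed. */
typedef enum {
    ERRSEV_NONE,
    ERRSEV_CONTINUE,
    ERRSEV_RESTART,
    ERRSEV_FATAL
} ERRSEV;

/* Combines the severity reported by a callee with the caller's own view.
   The caller may raise the severity at its discretion, but may lower it
   only if it has just executed code that mitigates the problem. */
ERRSEV combine_severity(ERRSEV reported, ERRSEV callerView, int mitigated)
{
    if (mitigated)
        return callerView;
    return callerView > reported ? callerView : reported;
}
```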