COMPILER ORGANISATION
---------------------
.OVERVIEW OF THE COMPILATION PROCESS
-----------------------------------
..INPUT TO THE COMPILER
---------------------
The MIDL compiler (MIDL) takes as input interface definition
statements written in the Microsoft Interface Definition Language.
These statements are contained in a file with the ".IDL" extension
(called the IDL file). In addition, an Attribute Configuration file
with the ".ACF" extension (called the ACF file) is read as input
after the IDL file has been processed.

..THE COMPILER PASSES
-------------------
The compiler is organised into 4 distinct passes: m1, m2, m3 and m4
(meaning midl 1, midl 2 and so on). The responsibilities of the
various passes are as follows:

    m1 : Parsing the IDL file.
    m2 : Parsing the ACF file. (optional)
    m3 : Optimisation. (optional)
    m4 : Code generation.

In addition, a driver program parses the command line and, depending
upon the user-specified options, invokes the passes.
The big picture for the compiler looks like this:

    (FIGURE)
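
As a rough illustration of the driver's role described above, the
sketch below runs the passes in order and skips the optional ones.
The Options record, the RAN_* flags and the m1_parse_idl(),
m2_parse_acf(), m3_optimise() and m4_codegen() stand-ins are all
hypothetical names invented for this example. The real driver is
still TBD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option record filled in by the driver's command-line parser. */
typedef struct {
    const char *idl_file;   /* always required */
    const char *acf_file;   /* NULL when no ACF file is present */
    int         optimise;   /* non-zero to run the optional m3 pass */
} Options;

/* Bit flags recording which passes ran (illustrative only). */
enum { RAN_M1 = 1, RAN_M2 = 2, RAN_M3 = 4, RAN_M4 = 8 };

/* Stand-ins for the real pass entry points; each returns 0 on success. */
static int m1_parse_idl(const char *f) { return f != NULL ? 0 : -1; }
static int m2_parse_acf(const char *f) { return f != NULL ? 0 : -1; }
static int m3_optimise(void)           { return 0; }
static int m4_codegen(void)            { return 0; }

/* Runs the passes in order. Returns a mask of the passes that ran,
   or -1 as soon as any pass fails. */
int run_passes(const Options *opt)
{
    int ran = 0;

    assert(opt != NULL);
    if (m1_parse_idl(opt->idl_file) != 0)
        return -1;
    ran |= RAN_M1;

    if (opt->acf_file != NULL) {            /* m2 is optional */
        if (m2_parse_acf(opt->acf_file) != 0)
            return -1;
        ran |= RAN_M2;
    }
    if (opt->optimise) {                    /* m3 is optional */
        if (m3_optimise() != 0)
            return -1;
        ran |= RAN_M3;
    }
    if (m4_codegen() != 0)
        return -1;
    return ran | RAN_M4;
}
```

With only an IDL file and no optimisation requested, run_passes()
reports that m1 and m4 ran. Supplying an ACF file and the optimise
flag adds m2 and m3.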

..OUTPUT FROM THE COMPILER
------------------------
The output from the MIDL compiler is a set of ".c" and ".h" files
containing the stub routines and prototypes. These files are then
compiled and linked with the client and server modules of the
application.

    APP.IDL will produce : app_c.c and app_c.h for the client side
                         : app_s.c and app_s.h for the server side.
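
The naming pattern in the example above can be sketched as a small
helper. stub_name() is a hypothetical function, not part of the
compiler. Lower-casing the base name is an assumption read off the
APP.IDL example, and the sketch handles bare file names only (no
directory components).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Builds "<base><suffix>" from an IDL file name, lower-casing the base.
   Returns 0 on success, or -1 if the result does not fit in out. */
int stub_name(const char *idl, const char *suffix, char *out, size_t outsz)
{
    const char *dot;
    size_t base, i;

    assert(idl != NULL && suffix != NULL && out != NULL);

    dot  = strrchr(idl, '.');                       /* start of extension */
    base = dot ? (size_t)(dot - idl) : strlen(idl);
    if (base + strlen(suffix) + 1 > outsz)          /* +1 for the NUL */
        return -1;

    for (i = 0; i < base; i++)
        out[i] = (char)tolower((unsigned char)idl[i]);
    strcpy(out + base, suffix);
    return 0;
}
```

Calling stub_name("APP.IDL", "_c.c", buf, sizeof buf) fills buf with
"app_c.c". The suffixes "_c.h", "_s.c" and "_s.h" give the other
three names.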

.A LOOK AT THE COMPILER PASSES
-----------------------------
..THE COMPILER DRIVER (m0)
---------------------

TBD

..PASS1 (m1)
------------
Pass 1 (m1) of the compiler is a parsing pass whose input is a
stream of tokens produced by the lexical analyser (lexer). The two
main components of m1 are thus the parser and the lexer.

The lexer is encapsulated in a function called yylex(). The parser
calls the lexer to supply tokens as and when needed. It is a
hand-coded lexer which operates off a state-transition table.
Starting from state 0, the lexer accepts a character and makes a
transition to a new state depending upon the character. This process
repeats until a whole token has been recognised. White space, comments
and newlines are ignored by the lexer. Tokens returned by the lexer
can be keywords, identifiers, numeric and string constants, and
characters which form the syntax of the IDL. Keyword recognition is
the responsibility of the lexer. A keyword is initially recognised as
an identifier. A preallocated keyword table is used to distinguish
between keywords and identifiers. If an identifier is found in the
keyword table, the lexer returns it as a keyword.
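
The table-driven scheme described above can be sketched in a few
lines of C. This is a minimal illustration, not the actual yylex().
The token kinds, character classes, states, the next_token() name and
the four-entry keyword table are all assumptions made for the example.
Comments and string constants are left out to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { T_EOF, T_IDENT, T_KEYWORD, T_NUMBER, T_PUNCT };        /* token kinds */
enum { C_LETTER, C_DIGIT, C_SPACE, C_OTHER, C_END, NCLASS };  /* char classes */
enum { S_START, S_IDENT, S_NUMBER, NSTATE, S_DONE = NSTATE }; /* states */

/* State-transition table: next_state[current state][character class]. */
static const int next_state[NSTATE][NCLASS] = {
    /*               LETTER    DIGIT     SPACE    OTHER   END    */
    /* S_START  */ { S_IDENT,  S_NUMBER, S_START, S_DONE, S_DONE },
    /* S_IDENT  */ { S_IDENT,  S_IDENT,  S_DONE,  S_DONE, S_DONE },
    /* S_NUMBER */ { S_DONE,   S_NUMBER, S_DONE,  S_DONE, S_DONE },
};

/* A small illustrative subset of keywords. */
static const char *const keywords[] = { "interface", "import", "typedef", "struct" };

static int char_class(int c)
{
    if (c == '\0')              return C_END;
    if (isalpha(c) || c == '_') return C_LETTER;
    if (isdigit(c))             return C_DIGIT;
    if (isspace(c))             return C_SPACE;
    return C_OTHER;
}

static int is_keyword(const char *s)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Scans one token from *src, copies its text into text (silently
   truncating over-long tokens) and advances *src past the token. */
int next_token(const char **src, char *text, size_t textsz)
{
    const char *p = *src;
    size_t len = 0;
    int state = S_START, kind = T_EOF;

    assert(src != NULL && text != NULL && textsz >= 2);
    for (;;) {
        int cls = char_class((unsigned char)*p);
        int nxt = next_state[state][cls];

        if (nxt == S_DONE) {
            if (state == S_START && cls == C_OTHER) {
                text[len++] = *p++;            /* single-character punctuation */
                kind = T_PUNCT;
            } else if (state == S_IDENT) {
                kind = T_IDENT;
            } else if (state == S_NUMBER) {
                kind = T_NUMBER;
            }                                  /* otherwise end of input */
            break;
        }
        if (nxt != S_START && len + 1 < textsz)
            text[len++] = *p;                  /* white space is not stored */
        p++;
        state = nxt;
    }
    text[len] = '\0';
    *src = p;

    /* Keywords are first recognised as identifiers, then looked up. */
    if (kind == T_IDENT && is_keyword(text))
        kind = T_KEYWORD;
    return kind;
}
```

Adding a token class in this style mostly means adding a row and
column to the table, not new control flow. That is the usual
argument for a table-driven lexer.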

The parser is automatically generated by the yacc parser generator.
The input to the parser generator is a source file containing the
grammar productions for the IDL. The output is a ".c" file which
contains the parsing tables along with the driver routine for the
parser. The parser is encapsulated in a function called yyparse().

A valid syntactic construct is recognised through reductions of the
grammar rules specified in the grammar file (grammar.y).
Interspersed with the grammar rules are action routines, which are
coded by the compiler writers and are executed when the production
containing them is reduced.

Apart from parsing, m1 is responsible for building the type graph,
the type data base and the symbol table. The process of building
the type graph and the symbol table is described elsewhere (WHERE??).
Suffice it to say that the type graph and symbol table are built as
and when productions are recognised.

As types are defined, they also get entered in the type data base.
The type data base contains reference counts which indicate the
usage of each type and are useful in determining the kind of
marshalling that the type will undergo. Type sub-graphs corresponding
to procedures (loosely classified as signatures) also get entered into
the type data base.
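
This document does not say how the type data base stores its
reference counts or how later passes consume them. Purely as an
illustration of the idea, the sketch below keeps one counter per
entered type. TypeDb, TypeEntry, typedb_init(), typedb_enter() and
typedb_use() are hypothetical names, and the fixed-size array is an
assumption made for brevity.

```c
#include <assert.h>
#include <string.h>

#define TYPEDB_MAX 64

typedef struct {
    const char *name;   /* type or signature name (pointer is not copied) */
    int         refs;   /* number of recorded uses of this type */
} TypeEntry;

typedef struct {
    TypeEntry entries[TYPEDB_MAX];
    int       count;
} TypeDb;

void typedb_init(TypeDb *db)
{
    assert(db != NULL);
    db->count = 0;
}

static TypeEntry *typedb_find(TypeDb *db, const char *name)
{
    int i;
    for (i = 0; i < db->count; i++)
        if (strcmp(db->entries[i].name, name) == 0)
            return &db->entries[i];
    return NULL;
}

/* Enters a newly defined type with no uses yet. Returns the entry,
   the existing entry if already present, or NULL if the table is full. */
TypeEntry *typedb_enter(TypeDb *db, const char *name)
{
    TypeEntry *e;

    assert(db != NULL && name != NULL);
    e = typedb_find(db, name);
    if (e != NULL)
        return e;
    if (db->count == TYPEDB_MAX)
        return NULL;
    e = &db->entries[db->count++];
    e->name = name;
    e->refs = 0;
    return e;
}

/* Records one use of a type. Returns the new count, or -1 if unknown. */
int typedb_use(TypeDb *db, const char *name)
{
    TypeEntry *e;

    assert(db != NULL && name != NULL);
    e = typedb_find(db, name);
    return e != NULL ? ++e->refs : -1;
}
```

A later pass could then read refs for each entry when choosing how a
type is marshalled. The policy for doing so is not specified here.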

..PASS2 (m2)
------------

Pass 2 is responsible for parsing the ACF file. This follows the
IDL parsing. The driver (m0) checks for the presence of an ACF file
and invokes pass 2 (m2) if needed.

The parser for pass 2 is also generated by the parser generator, and
pass 2 uses the same lexical analyser as pass 1. Since the parser
generator always names the parser function yyparse(), the pass 2
parser clashes by name with the m1 parser. The proposed scheme for
preventing the clash is to redefine yyparse as (say) yy2parse, and
similarly rename any global variables belonging to the parser module,
in an include file which is included when the generated parser is
compiled.
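
The renaming scheme proposed above relies on nothing more than the
C preprocessor. The sketch below shows the effect in a single file.
The include file name "m2names.h" is hypothetical, the two parsers
are trivial stand-ins for the generated code, and only yyparse and
yylval are renamed. A real generated parser has further globals that
would need the same treatment.

```c
/* Stand-in for the generated pass 1 parser (normally its own .c file). */
int yylval = 0;
int yyparse(void) { yylval = 1; return 0; }

/* Contents of a hypothetical include file, say "m2names.h". */
#define yyparse yy2parse
#define yylval  yy2lval

/* Stand-in for the generated pass 2 parser. Because the macros above
   are in effect, these definitions really create yy2lval and yy2parse. */
int yylval = 0;
int yyparse(void) { yylval = 2; return 0; }

/* Needed only because both stand-ins share one file in this demo. */
#undef yyparse
#undef yylval
```

After this, yyparse() and yy2parse() are distinct functions with
distinct globals, so both parsers can be linked into one executable.
In a real build each parser lives in its own source file and the
#undef lines are unnecessary.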

m2 verifies that the ACF conforms to the syntax and semantics of
NIDL. m2 acts on the type graph generated by m1 and qualifies it
with the attributes collected from the ACF. No new data structures
are introduced by m2.

..PASS3 (m3)
------------

Pass 3 is the optimisation pass. Working off the type data base, m3
determines the kind of marshalling each type and procedure will
undergo. Details of the optimisation are described in chapter
(WHERE??) of this document.

..PASS4 (m4)
------------

m4 generates the stub code and include files for the application.
Details of this pass are TBD.

.CODE ORGANISATION
------------------

The above description describes the logical layout of the passes. The
physical code layout generally parallels the logical layout, with
some other considerations in mind.

The most important consideration is that the compiler is expected
to work under the DOS, OS/2 and NT environments. For DOS, the size
of the compiler code in memory is an important factor in deciding
the layout of the compiler. The more code that resides in memory, the
less memory is available for the compiler's internal storage, and the
smaller the IDL file that can be compiled.

Since all the passes are fairly independent of each other and only
share the data structures among themselves, it is proposed that
we overlay the various passes, with the driver (m0) residing in memory
at all times and fanning out control. Overlaying is explicit under
DOS, with the modules partitioned into various overlays at link time.

Under NT and OS/2 the same effect can be achieved by placing the
various passes in segments which can be loaded on demand. Again,
m0 is always loaded.

Although overlaying is not of immediate concern, it serves well to
be prepared for a quick change to it if needed. Keeping this in
mind, it is proposed to organise the compiler passes such that
each pass is (almost) self-sufficient. Each compiler pass will be
organised into modules, with one global entry point into each module.
In other words, calls from one module to another WILL NOT (hopefully)
happen. The advantage is that in the overlaid scheme, overlay
(or segment) swapping will not occur. This does not apply to routines
like the type graph driver and symbol table driver routines, which are
globally accessed. Such routines will be part of the m0 module, which
stays in memory all the time. Thus all except one routine defined in
every module will be static. Following this discipline will mean that
a later change to the overlaid scheme will be quick and painless. The
other, though relatively less significant, advantage is that static
routines within a module can be "near" routines, thus saving code.
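
The "one global entry point, everything else static" discipline
proposed above looks like this in C. The module below is a toy that
counts white-space-separated words. pass_entry(), skip_blanks() and
skip_word() are hypothetical names, and the word counting merely
stands in for real pass logic.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Private helpers: static, so no other module can call them. On a
   16-bit DOS compiler such routines could also be declared "near". */
static size_t skip_blanks(const char *s, size_t i)
{
    while (s[i] != '\0' && isspace((unsigned char)s[i]))
        i++;
    return i;
}

static size_t skip_word(const char *s, size_t i)
{
    while (s[i] != '\0' && !isspace((unsigned char)s[i]))
        i++;
    return i;
}

/* The module's single external symbol. Returns the number of words. */
int pass_entry(const char *source)
{
    size_t i;
    int words = 0;

    assert(source != NULL);
    i = skip_blanks(source, 0);
    while (source[i] != '\0') {
        words++;
        i = skip_word(source, i);
        i = skip_blanks(source, i);
    }
    return words;
}
```

Only pass_entry is visible to the linker, so the driver is the sole
caller into the module and the helpers never cause a cross-overlay
call.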