COMPILER ORGANISATION
---------------------
.OVERVIEW OF THE COMPILATION PROCESS
-----------------------------------
..INPUT TO THE COMPILER
---------------------
The MIDL compiler (MIDL) takes as input interface definition
statements written in the Microsoft Interface Definition Language.
These statements are contained in a file with the ".IDL" extension
(called the IDL file). In addition, an Attribute Configuration file
with the ".ACF" extension (called the ACF file) is read as input
after the IDL file has been processed.
..THE COMPILER PASSES
-------------------
The compiler is organised into 4 distinct passes: m1, m2, m3 and m4
(meaning midl 1, midl 2 and so on). The responsibilities
of the various passes are as follows:
m1 : Parsing the IDL file.
m2 : Parsing the ACF file. (optional)
m3 : Optimisation. (optional)
m4 : Code generation.
In addition, a driver program (m0) handles parsing of the command line
and, depending upon the user-specified options, invokes the passes.
The big picture for the compiler looks like this:
(FIGURE)
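
The driver's dispatch of the passes can be sketched roughly as follows.
This is only an illustration of the control flow described above: the
option record, the flag names and the pass entry points (m1_parse_idl
and so on) are hypothetical stand-ins, not names taken from the compiler,
and it assumes that m3 is switched on by a user option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option record filled in by command-line parsing. */
struct midl_options {
    const char *idl_file;   /* the IDL file; always required         */
    const char *acf_file;   /* NULL when no ACF file is to be read   */
    bool        optimise;   /* assumed switch for the optional m3    */
};

/* Illustrative flags recording which passes were invoked. */
enum { RAN_M1 = 1, RAN_M2 = 2, RAN_M3 = 4, RAN_M4 = 8 };

/* Stand-ins for the pass entry points; each returns 0 on success. */
static int m1_parse_idl(const char *path) { return path != NULL ? 0 : -1; }
static int m2_parse_acf(const char *path) { return path != NULL ? 0 : -1; }
static int m3_optimise(void)              { return 0; }
static int m4_generate(void)              { return 0; }

/* Invokes the passes in order, skipping the optional ones.
   Returns 0 on success, -1 as soon as any pass fails. */
int m0_run_passes(const struct midl_options *opt, unsigned *ran)
{
    assert(opt != NULL && ran != NULL);
    *ran = 0;

    if (m1_parse_idl(opt->idl_file) != 0)
        return -1;
    *ran |= RAN_M1;

    if (opt->acf_file != NULL) {            /* m2 is optional */
        if (m2_parse_acf(opt->acf_file) != 0)
            return -1;
        *ran |= RAN_M2;
    }

    if (opt->optimise) {                    /* m3 is optional */
        if (m3_optimise() != 0)
            return -1;
        *ran |= RAN_M3;
    }

    if (m4_generate() != 0)
        return -1;
    *ran |= RAN_M4;
    return 0;
}
```

With no ACF file and optimisation off, only m1 and m4 run in this sketch.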
..OUTPUT FROM THE COMPILER
------------------------
The output from the MIDL compiler is a set of ".c" and ".h" files
containing the stub routines and prototypes. These files are then
compiled and linked with the client and server modules of the app.
APP.IDL will produce : app_c.c and app_c.h for the client side
                     : app_s.c and app_s.h for the server side.
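
The naming pattern in the example above can be sketched as a small
helper. The function name stub_file_name and its parameters are
hypothetical, and the lower-casing of the base name is an assumption
read off the APP.IDL example, not a documented rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Builds a stub file name such as "app_c.c" from "APP.IDL".
   side is 'c' (client) or 's' (server); ext is "c" or "h".
   Returns 0 on success, -1 if the output buffer is too small. */
int stub_file_name(const char *idl_name, char side, const char *ext,
                   char *out, size_t outsize)
{
    const char *dot = strrchr(idl_name, '.');
    size_t base_len = dot ? (size_t)(dot - idl_name) : strlen(idl_name);
    size_t i;
    int n;

    assert(side == 'c' || side == 's');
    if (base_len + 1 > outsize)
        return -1;

    /* Copy the base name (everything before the last '.') in lower case. */
    for (i = 0; i < base_len; i++)
        out[i] = (char)tolower((unsigned char)idl_name[i]);

    /* Append "_c.c", "_s.h" and so on; snprintf always terminates. */
    n = snprintf(out + base_len, outsize - base_len, "_%c.%s", side, ext);
    return (n < 0 || (size_t)n >= outsize - base_len) ? -1 : 0;
}
```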
.A LOOK AT THE COMPILER PASSES
-----------------------------
..THE COMPILER DRIVER (m0)
---------------------
TBD
..PASS1 (m1)
------------
Pass 1 (m1) of the compiler is a parsing pass, the input to which is
a stream of tokens produced by the lexical analyser (lexer). The two
main components of m1 are thus the parser and the lexer.
The lexer is encapsulated in a function called yylex(). The lexer
is called by the parser to supply tokens as and when needed. It is
a hand-coded lexer which operates off a state-transition table.
Starting from state 0, the lexer accepts a character and makes a
transition to a new state depending upon the character. This process
repeats until a whole token has been recognised. White space, comments
and newlines are ignored by the lexer. Tokens returned by the lexer
can be keywords, identifiers, numeric and string constants, and
characters which form the syntax of the IDL. Keyword recognition is
the responsibility of the lexer. A keyword is recognised as an
identifier initially. A preallocated keyword table is used to
distinguish between keywords and identifiers. If an identifier is
found in the keyword table, it is returned as a keyword by the lexer.
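
A much reduced sketch of such a table-driven lexer is given below. It is
not the compiler's yylex(): the token codes, character classes, the
three-entry keyword table and the function name next_token are all
invented for illustration, and comments and string constants are left
out to keep the table small.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum token { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_KEYWORD, TOK_CHAR };

/* Character classes used as the column index of the table. */
enum cclass { C_LETTER, C_DIGIT, C_SPACE, C_OTHER, C_COUNT };

/* States (row index). DONE means the token ended before this character. */
enum { S_START, S_IDENT, S_NUMBER, S_COUNT, DONE = -1 };

static const int transition[S_COUNT][C_COUNT] = {
    /*             LETTER    DIGIT     SPACE    OTHER */
    /* START  */ { S_IDENT,  S_NUMBER, S_START, DONE },
    /* IDENT  */ { S_IDENT,  S_IDENT,  DONE,    DONE },
    /* NUMBER */ { DONE,     S_NUMBER, DONE,    DONE },
};

/* Illustrative keyword table; the real one is far larger. */
static const char *const keywords[] = { "interface", "import", "typedef" };

static enum cclass classify(int c)
{
    if (isalpha(c) || c == '_') return C_LETTER;
    if (isdigit(c))             return C_DIGIT;
    if (isspace(c))             return C_SPACE;
    return C_OTHER;
}

static int is_keyword(const char *s)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(keywords[i], s) == 0)
            return 1;
    return 0;
}

/* Scans one token from *pp, advances *pp past it and copies its text
   into buf (truncating if buf is too small). */
enum token next_token(const char **pp, char *buf, size_t bufsize)
{
    const char *p = *pp;
    int state = S_START;
    size_t n = 0;

    assert(bufsize >= 2);
    while (*p != '\0') {
        int next = transition[state][classify((unsigned char)*p)];
        if (next == DONE) {
            if (state == S_START) {     /* lone syntax character */
                buf[0] = *p++;
                buf[1] = '\0';
                *pp = p;
                return TOK_CHAR;
            }
            break;                      /* token ended before *p */
        }
        if (next != S_START && n + 1 < bufsize)
            buf[n++] = *p;              /* white space is not stored */
        state = next;
        p++;
    }
    buf[n] = '\0';
    *pp = p;

    /* Identifiers are looked up in the keyword table before returning. */
    if (state == S_IDENT)
        return is_keyword(buf) ? TOK_KEYWORD : TOK_IDENT;
    if (state == S_NUMBER)
        return TOK_NUMBER;
    return TOK_EOF;
}
```

Note that the keyword check happens only after a complete identifier has
been scanned, which keeps keywords out of the transition table itself.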
The parser is automatically generated by the yacc parser generator.
The input to the parser generator is a source file containing
grammar productions for the IDL. The output is a ".c" file which
contains the parsing tables along with the driver routine for the
parser. The parser is encapsulated in a function called yyparse().
The process of recognition of a valid syntactic construct consists of
reductions of grammar rules specified in the grammar file (grammar.y).
Interspersed with the grammar rules are action routines which are
coded by the compiler writers. Each action is executed when the
production containing it is reduced.
Apart from parsing, m1 is responsible for building the type graph,
the type data base and the symbol table. The process of building
the type graph and the symbol table is described elsewhere (WHERE??).
Suffice it to say that as and when productions are recognised, the
type graph and symbol table are built.
As types are defined, they also get entered in the type data base.
The type data base contains reference counts which indicate the
usage of the type and are useful in determining the kind of
marshalling that the type will undergo. Type sub-graphs corresponding
to procedures (loosely classified as signatures) also get entered into
the type data base.
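
As a rough picture of a type data base with reference counts, consider
the sketch below. The fixed-size array, the struct layout and the
type_db_* function names are assumptions made for illustration; the
actual data structures are not described in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TYPE_DB_MAX 64          /* arbitrary capacity for the sketch */

struct type_entry {
    const char *name;           /* type name; caller keeps it alive  */
    unsigned    ref_count;      /* number of uses seen so far        */
};

static struct type_entry type_db[TYPE_DB_MAX];
static size_t            type_db_used;

/* Returns the entry for name, or NULL if the type is not defined. */
struct type_entry *type_db_find(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < type_db_used; i++)
        if (strcmp(type_db[i].name, name) == 0)
            return &type_db[i];
    return NULL;
}

/* Enters a newly defined type with a reference count of zero.
   Returns NULL if the type already exists or the table is full. */
struct type_entry *type_db_define(const char *name)
{
    struct type_entry *e;

    if (type_db_find(name) != NULL || type_db_used == TYPE_DB_MAX)
        return NULL;
    e = &type_db[type_db_used++];
    e->name = name;
    e->ref_count = 0;
    return e;
}

/* Records one use of a type. Returns the new count, or 0 if unknown. */
unsigned type_db_reference(const char *name)
{
    struct type_entry *e = type_db_find(name);
    return e != NULL ? ++e->ref_count : 0;
}
```

A later pass could then inspect ref_count for each entry when deciding
how the type is to be marshalled.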
..PASS2 (m2)
------------
Pass 2 is responsible for parsing the ACF file. This process follows
the IDL parsing. The driver (m0) checks for the presence of an ACF
file and invokes pass 2 (m2) if needed.
The parser for pass 2 is also generated by the parser generator, and
the same lexical analyser is used. Since the parser
generator always names the parser function yyparse(), this results in
a name clash with the m1 parser. The proposed scheme for preventing
the name clash is to redefine yyparse as (say) yy2parse, and similarly
to rename any global variables belonging to the parser module, in an
include file which is included when the generated parser is compiled.
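
The renaming scheme can be illustrated in a single file as below. In
practice the two parsers would be separate generated ".c" files and the
#define lines would live in the include file; here they are placed
together only so that the effect is visible. The name yy2char is an
assumption that follows the yy2parse pattern, and the two parser bodies
are trivial stand-ins for generated code.

```c
/* Stand-in for the m1 parser generated from grammar.y. */
int yychar = 0;
int yyparse(void) { yychar = 1; return 1; }

/* ---- contents of the proposed include file (name not specified) ---- */
#define yyparse yy2parse
#define yychar  yy2char
/* -------------------------------------------------------------------- */

/* Stand-in for the generated ACF parser. The source text still says
   yyparse and yychar, but the preprocessor turns these definitions
   into yy2parse and yy2char, so nothing clashes at link time. */
int yychar = 0;
int yyparse(void) { yychar = 2; return 2; }

/* Undo the renaming so that code below sees the m1 names again. */
#undef yyparse
#undef yychar
```

After this, yyparse() refers to the m1 parser and yy2parse() to the ACF
parser, each with its own copy of the renamed global. Other parser
globals such as yylval and yynerrs would need the same treatment.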
m2 verifies that the ACF conforms to the syntax and semantics of
NIDL. m2 acts on the type graph generated by m1 and qualifies it
with the attributes collected from the ACF. No new data structures
are introduced by m2.
..PASS3 (m3)
------------
Pass 3 is the optimisation pass. Working off the type data base, m3
determines the kind of marshalling that each type and procedure will
undergo.
Details of the optimisation are described in chapter (WHERE??) of this
document.
..PASS4 (m4)
------------
m4 generates the stub code and include files for the application.
Details of this pass are TBD.
.CODE ORGANISATION
------------------
The above description describes the logical layout of the passes. The
physical code layout generally parallels the logical layout, with
some other considerations in mind.
The most important consideration is that the compiler is expected
to work under the DOS, OS/2 and NT environments. For DOS, the size
of the compiler code in memory is an important factor in deciding
the layout of the compiler. More code residing in memory means less
memory available for compiler internal storage, and hence a smaller
limit on the size of the IDL file that can be compiled.
Since all the passes are fairly independent of each other and share
only the data structures among themselves, it is proposed that
we overlay the various passes, with the driver (m0) residing in memory
at all times and fanning out control. Overlaying is explicit under
DOS, with the modules partitioned into various overlays at link time.
Under NT and OS/2 the same effect can be achieved by placing the
various passes in segments which can be loaded on demand. Again,
m0 is always loaded.
Although overlaying is not of immediate concern, it serves well to
be prepared for a quick change to this if needed. Keeping this in
mind, it is proposed to organise the compiler passes such that
each pass is (almost) self-sufficient. Each compiler pass will be
organised into modules, with one global entry point into the module.
In other words, calls from one module to another WILL NOT (hopefully)
happen. The advantage will be that in the overlaid scheme, such calls
will not trigger overlay (or segment) swapping. This does not apply
to routines like the type graph driver and symbol table driver
routines, which are globally accessed. Such routines will be part of
the m0 module, which stays in memory all the time. Thus all except
one routine defined in every module will be static. Following this
discipline will mean that a change later to the overlaid scheme will
be quick and painless. The other, though relatively less significant,
advantage is that static routines within a module can be "near"
routines, thus saving code space.
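
The one-entry-point discipline might look like this for a single module.
The module contents are placeholders: m3_run, its parameters and the
helper routines are hypothetical, and the work they do (counting
non-empty type names) merely stands in for real pass logic. The "near"
keyword is compiler-specific and is therefore mentioned only in a
comment.

```c
#include <assert.h>
#include <stddef.h>

/* Private helpers. 'static' gives them internal linkage, so no other
   module (and hence no other overlay) can call them. Under a 16-bit
   DOS compiler they could additionally be declared "near". */
static int is_blank(const char *s)
{
    return s == NULL || *s == '\0';
}

static size_t count_named(const char *const *names, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (!is_blank(names[i]))
            count++;
    return count;
}

/* The single global entry point of the module, to be called by m0.
   Returns the number of type names it processed. */
size_t m3_run(const char *const *type_names, size_t n)
{
    assert(type_names != NULL || n == 0);
    return count_named(type_names, n);
}
```

Only m3_run is visible to the linker, so a later move to overlays would
need to consider just that one symbol per module.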