windows-nt-4.0/private/rpc/midlnew/doc/doc1.doc


								COMPILER ORGANISATION

								---------------------

									.OVERVIEW OF THE COMPILATION PROCESS

									-----------------------------------

									..INPUT TO THE COMPILER

									---------------------

									The MIDL compiler (MIDL) takes as input interface definition

									statements written in the Microsoft Interface Definition Language.

									These statements are contained in a file with the

									".IDL" extension (called the IDL file). In addition, an Attribute

									Configuration file with the ".ACF" extension (called the ACF file)

									is read as input after the IDL file has been processed.


									..THE COMPILER PASSES

									-------------------

									The compiler is organised into 4 distinct passes: m1, m2, m3 and m4

									(meaning midl 1, midl 2 and so on). The responsibilities

									of the various passes are as follows:


										m1 : Parsing the IDL file.

										m2 : Parsing the ACF file. (optional)

										m3 : Optimisation. (optional)

										m4 : Code generation.


									In addition, a driver program parses the command line and,

									depending upon the user-specified options, invokes the passes.

									The big picture for the compiler looks like this:


											(FIGURE)
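
									As a rough illustration of the driver's role, the sketch below runs

									the passes in order and skips the optional ones. The option structure,

									the function names and the return conventions are all assumptions made

									for this example; the actual driver interface is not specified here.

```c
/* Hypothetical option flags; the real command-line switches are not given. */
struct midl_options {
    int have_acf;   /* nonzero if an ACF file accompanies the IDL file */
    int optimise;   /* nonzero if the optimisation pass was requested  */
};

/* Stand-ins for the four passes. Each returns 0 on success. */
static int m1_parse_idl(void) { return 0; }
static int m2_parse_acf(void) { return 0; }
static int m3_optimise(void)  { return 0; }
static int m4_generate(void)  { return 0; }

/* Runs the passes in order, skipping the optional ones.
   Returns the number of passes run, or -1 if any pass fails. */
int run_passes(const struct midl_options *opt)
{
    int ran = 0;

    if (m1_parse_idl() != 0) return -1;
    ran++;

    if (opt->have_acf) {
        if (m2_parse_acf() != 0) return -1;
        ran++;
    }
    if (opt->optimise) {
        if (m3_optimise() != 0) return -1;
        ran++;
    }

    if (m4_generate() != 0) return -1;
    ran++;

    return ran;
}
```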


									..OUTPUT FROM THE COMPILER

									------------------------

									The output from the MIDL compiler is a set of ".c" and ".h" files

									containing the stub routines and prototypes. These files are then

									compiled and linked with the client and server modules of the app.


									APP.IDL	will produce	: app_c.c and app_c.h for client side

												: app_s.c and app_s.h for server side.
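
									The naming pattern above can be sketched as a small helper. The

									function name and its parameters are hypothetical, and folding the

									base name to lower case is an assumption read off the APP.IDL example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Builds an output file name such as "app_c.c" from "APP.IDL".
   side is 'c' (client) or 's' (server); ext is "c" or "h".
   Returns 0 on success, -1 if the result does not fit in out. */
int stub_file_name(const char *idl_name, char side, const char *ext,
                   char *out, size_t out_size)
{
    const char *dot = strrchr(idl_name, '.');
    size_t base_len = dot ? (size_t)(dot - idl_name) : strlen(idl_name);
    /* base + '_' + side + '.' + ext + terminating NUL */
    size_t need = base_len + 3 + strlen(ext) + 1;
    size_t i;

    if (need > out_size) return -1;

    for (i = 0; i < base_len; i++)
        out[i] = (char)tolower((unsigned char)idl_name[i]);
    out[base_len]     = '_';
    out[base_len + 1] = side;
    out[base_len + 2] = '.';
    strcpy(out + base_len + 3, ext);
    return 0;
}
```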


									.A LOOK AT THE COMPILER PASSES

									-----------------------------

									..THE COMPILER DRIVER (m0)

									---------------------


										TBD


									..PASS1 (m1)

									------------

									Pass 1 (m1) of the compiler is a parsing pass whose input is

									a stream of tokens produced by the lexical analyser (lexer). The two

									main components of m1 are thus the parser and the lexer.


									The lexer is encapsulated in a function called yylex(). The lexer

									is called by the parser to supply tokens as and when needed. It is

									a hand-coded lexer which operates off a state-transition table.

									Starting from state 0, the lexer accepts a character and makes a

									transition to a new state depending upon the character. This process

									repeats until a whole token has been recognised. White space, comments

									and newlines are ignored by the lexer. Tokens returned by the lexer

									can be keywords, identifiers, numeric and string constants, and

									characters which form the syntax of the IDL. Keyword recognition is

									the responsibility of the lexer. A keyword is recognised as an

									identifier initially. A preallocated keyword table is used to

									distinguish between keywords and identifiers. If an identifier is

									found in the keyword table, it is returned as a keyword by the lexer.
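
									A minimal sketch of a table-driven lexer of this kind follows. The

									states, character classes, token codes and the three-entry keyword

									table are illustrative assumptions, not the actual MIDL tables, and

									comments and string constants are left out for brevity.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum tok { TOK_EOF, TOK_IDENT, TOK_KEYWORD, TOK_NUMBER, TOK_PUNCT };
enum cls { C_LETTER, C_DIGIT, C_SPACE, C_OTHER, C_END, NCLS };
enum st  { S_START, S_IDENT, S_NUMBER, NSTATES, S_DONE };

/* Transition table: next_state[current state][character class]. */
static const int next_state[NSTATES][NCLS] = {
    /*             LETTER   DIGIT     SPACE    OTHER   END    */
    /* START  */ { S_IDENT, S_NUMBER, S_START, S_DONE, S_DONE },
    /* IDENT  */ { S_IDENT, S_IDENT,  S_DONE,  S_DONE, S_DONE },
    /* NUMBER */ { S_DONE,  S_NUMBER, S_DONE,  S_DONE, S_DONE }
};

/* Preallocated keyword table (a small illustrative subset). */
static const char *const keywords[] = { "interface", "typedef", "struct" };

static enum cls classify(int c)
{
    if (c == '\0') return C_END;
    if (isalpha(c) || c == '_') return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    if (isspace(c)) return C_SPACE;
    return C_OTHER;
}

static int is_keyword(const char *text)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(text, keywords[i]) == 0) return 1;
    return 0;
}

/* Scans one token starting at *src, copies its text into text
   (capacity cap, which must be at least 2) and advances *src. */
enum tok next_token(const char **src, char *text, size_t cap)
{
    const char *p = *src;
    int state = S_START;
    size_t len = 0;
    enum tok kind;

    for (;;) {
        int ns = next_state[state][classify((unsigned char)*p)];
        if (ns == S_DONE) break;
        /* White space keeps the lexer in START and is not stored. */
        if (ns != S_START && len + 1 < cap) text[len++] = *p;
        state = ns;
        p++;
    }

    if (state == S_IDENT) {
        /* Recognised as an identifier first, then checked for keyword. */
        text[len] = '\0';
        kind = is_keyword(text) ? TOK_KEYWORD : TOK_IDENT;
    } else if (state == S_NUMBER) {
        kind = TOK_NUMBER;
    } else if (*p == '\0') {
        kind = TOK_EOF;
    } else {
        /* A single character that forms part of the syntax. */
        if (len + 1 < cap) text[len++] = *p;
        p++;
        kind = TOK_PUNCT;
    }
    text[len] = '\0';
    *src = p;
    return kind;
}
```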


									The parser is automatically generated by the yacc parser generator.

									The input to the parser generator is a source file containing

									grammar productions for the IDL. The output is a ".c" file which

									contains the parsing tables along with the driver routine for the

									parser. The parser is encapsulated in a function called yyparse().


									Recognition of a valid syntactic construct consists of

									reductions of grammar rules specified in the grammar file (grammar.y).

									Interspersed with the grammar rules are action routines which are

									coded by the compiler writers and which are executed when the

									production containing them is reduced.


									Apart from parsing, m1 is responsible for building the type graph,

									the type data base and the symbol table. The process of building

									the type graph and the symbol table is described elsewhere (WHERE??).

									Suffice it to say that the type graph and symbol table are built

									as and when productions are recognised.


									As types are defined, they also get entered in the type data base.

									The type data base contains reference counts which indicate the

									usage of the type and are useful in determining the kind of

									marshalling that the type will undergo. Type sub-graphs corresponding

									to procedures (loosely classified as signatures) also get entered into

									the type data base.
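
									The reference counting described above might look roughly like the

									sketch below. The structure layout, the fixed table size and the

									function names are assumptions for illustration only; the real

									type data base is not specified in this section.

```c
#include <string.h>

#define MAX_TYPES 64   /* arbitrary capacity for this sketch */

struct type_entry {
    const char *name;  /* not copied: the caller keeps the string alive */
    int ref_count;     /* number of recorded uses of the type */
};

struct type_db {
    struct type_entry entries[MAX_TYPES];
    int count;
};

/* Returns the entry for name, or NULL if it has not been entered. */
static struct type_entry *db_find(struct type_db *db, const char *name)
{
    int i;
    for (i = 0; i < db->count; i++)
        if (strcmp(db->entries[i].name, name) == 0)
            return &db->entries[i];
    return NULL;
}

/* Enters a type when it is defined. Re-entering an existing name
   returns the existing entry. Returns NULL if the table is full. */
struct type_entry *db_define(struct type_db *db, const char *name)
{
    struct type_entry *e = db_find(db, name);
    if (e != NULL) return e;
    if (db->count == MAX_TYPES) return NULL;
    e = &db->entries[db->count++];
    e->name = name;
    e->ref_count = 0;
    return e;
}

/* Records one use of a type. Returns the new reference count,
   or -1 if the type was never defined. */
int db_use(struct type_db *db, const char *name)
{
    struct type_entry *e = db_find(db, name);
    if (e == NULL) return -1;
    return ++e->ref_count;
}
```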


									..PASS2 (m2)

									------------


									Pass 2 is responsible for parsing the ACF file. This process follows

									the IDL parsing. The driver (m0) checks for the presence of an ACF file

									and invokes pass 2 (m2) if needed.


									The parser for pass 2 is also generated by the parser generator. The

									same lexical analyser is used for pass 2. Since the parser

									generator names the parser function yyparse(), this results in

									a name clash with the m1 parser. The proposed scheme for preventing

									the name clash is to redefine yyparse as (say) yy2parse, and similarly

									to rename any global variables belonging to the parser module, in an

									include file which is included when the generated parser is compiled.
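
									The renaming scheme can be demonstrated in a single translation unit,

									as sketched below. The two function bodies are stand-ins for the

									generated parsers, the return values 1 and 2 are markers chosen for

									the demonstration, and yy2char is an assumed name following the

									yy2parse pattern. In practice the two parsers would live in separate

									source files and the #define lines would sit in the include file.

```c
/* Stand-in for the generated IDL parser (pass 1); default yacc names. */
int yychar = 0;
int yyparse(void) { yychar = 1; return 1; }

/* What the rename include file would contain. It is read before the
   generated ACF parser, so every later use of these names is rewritten. */
#define yyparse yy2parse
#define yychar  yy2char

/* Stand-in for the generated ACF parser (pass 2). Because of the macros
   above, the compiler actually sees yy2char and yy2parse here. */
int yychar = 0;
int yyparse(void) { yychar = 2; return 2; }

#undef yyparse
#undef yychar

/* Code outside the ACF parser can now name both parsers without a clash. */
int parse_both(void)
{
    return yyparse() * 10 + yy2parse();
}
```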


									m2 verifies that the ACF file conforms to the syntax and semantics of

									NIDL. It acts on the type graph generated by m1 and qualifies it

									with the attributes collected from the ACF file. No new data structures

									are introduced by m2.


									..PASS3 (m3)

									------------


									Pass 3 is the optimisation pass. Working off the type data base,

									m3 determines the kind of marshalling each type or procedure will

									undergo.

									Details of the optimisation are described in chapter (WHERE??) of this

									document.


									..PASS4 (m4)

									------------


									m4 generates the stub code and include files for the application.

									Details of this pass are TBD.


									.CODE ORGANISATION

									------------------


									The above description describes the logical layout of the passes. The

									physical code layout generally parallels the logical layout, with

									some other considerations in mind.


									The most important consideration is that the compiler is expected

									to work under the DOS, OS/2 and NT environments. For DOS, the size

									of the compiler code in memory is an important factor in deciding

									the layout of the compiler. More code residing in memory means less

									memory available for compiler internal storage, and hence a smaller

									maximum size for the IDL file that can be compiled.


									Since all the passes are fairly independent of each other and only

									share the data structures among themselves, it is proposed that

									we overlay the various passes, with the driver (m0) residing in memory

									at all times and fanning out control. Overlaying is explicit under

									DOS with the modules partitioned into various overlays at link time.


									Under NT and OS/2 the same effect can be achieved by placing the

									various passes in segments which can be loaded on demand. Again,

									m0 is always loaded.


									Although overlaying is not of immediate concern, it serves well to

									be prepared for a quick change to it if needed. Keeping this in

									mind, it is proposed to organise the compiler passes such that

									each pass is (almost) self-sufficient. Each compiler pass will be

									organised into modules, with one global entry point into each module.

									In other words, calls from one module to another WILL NOT (hopefully)

									happen. The advantage is that in the overlaid scheme, overlay

									(or segment) swapping will not occur. This does not apply to routines

									such as the type graph and symbol table driver routines, which are

									globally accessed. Such routines will be part of the m0 module, which

									stays in memory all the time. Thus all except one routine defined in

									every module will be static. Following this discipline will mean that a

									later change to the overlaid scheme will be quick and painless. The

									other, though relatively less significant, advantage is that static

									routines within a module can be "near" routines, thus saving code space.
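
									The one-entry-point discipline could be laid out as in the sketch

									below. The module and routine names are hypothetical and the work

									done is a placeholder; only the use of static for every routine

									except the entry point reflects the proposal above.

```c
/* Internal helpers: static, so they are invisible outside this module
   and can never be the target of a cross-module (cross-overlay) call. */
static int pass_init(void)
{
    return 1;               /* placeholder: nonzero means ready */
}

static int visit_node(int node)
{
    (void)node;             /* placeholder for real per-node work */
    return 1;               /* one node processed */
}

/* The single global entry point of the module, called by the driver.
   Returns the number of nodes processed, or -1 on failure. */
int m3_main(int node_count)
{
    int i, done = 0;

    if (!pass_init()) return -1;
    for (i = 0; i < node_count; i++)
        done += visit_node(i);
    return done;
}
```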