COMPILER ORGANISATION
---------------------
	.OVERVIEW OF THE COMPILATION PROCESS
	-----------------------------------
	..INPUT TO THE COMPILER
	---------------------
	The MIDL compiler (MIDL) takes as input interface definition
	statements written in the Microsoft Interface Definition Language.
	These statements are held in a file with the ".IDL" extension
	(called the IDL file). In addition, an Attribute Configuration
	file with the ".ACF" extension (called the ACF file) is read as
	input after the IDL file has been processed.

	..THE COMPILER PASSES
	-------------------
	The compiler is organised into 4 distinct passes: m1, m2, m3 and m4
	(meaning midl 1, midl 2 and so on). The responsibilities
	of the various passes are as follows:

		m1 : Parsing the IDL file.
		m2 : Parsing the ACF file. (optional)
		m3 : Optimisation. (optional)
		m4 : Code generation.

	In addition, a driver program parses the command line and,
	depending upon the user-specified options, invokes the passes.
	The big picture for the compiler looks like this:

			(FIGURE)
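
	The sequencing of the passes by the driver can be sketched in C as
	follows. This is only an illustration under stated assumptions: the
	options structure, the RAN_* flags, m0_drive() and the pass entry
	point names m1_main() to m4_main() are hypothetical and are not
	taken from the compiler sources. The passes are stand-ins that do
	no real work.

```c
#include <stddef.h>

/* Options gathered from the command line (hypothetical layout). */
struct options {
    const char *idl_file;  /* always required */
    const char *acf_file;  /* NULL when there is no ACF file */
    int optimise;          /* non-zero to request pass 3 */
};

/* Flags recording which passes were invoked (illustration only). */
enum { RAN_M1 = 1, RAN_M2 = 2, RAN_M3 = 4, RAN_M4 = 8 };

/* Stand-ins for the pass entry points; each returns 0 on success. */
static int m1_main(const char *idl) { return idl == NULL; }
static int m2_main(const char *acf) { return acf == NULL; }
static int m3_main(void) { return 0; }
static int m4_main(void) { return 0; }

/*
 * Run the passes in order. m2 and m3 are optional; m1 and m4 always run.
 * Returns 0 on success, or the number of the pass that failed.
 */
int m0_drive(const struct options *opt, unsigned *ran)
{
    *ran = RAN_M1;
    if (m1_main(opt->idl_file) != 0)
        return 1;

    if (opt->acf_file != NULL) {        /* ACF parsing follows IDL parsing */
        *ran |= RAN_M2;
        if (m2_main(opt->acf_file) != 0)
            return 2;
    }

    if (opt->optimise) {
        *ran |= RAN_M3;
        if (m3_main() != 0)
            return 3;
    }

    *ran |= RAN_M4;
    return m4_main() != 0 ? 4 : 0;
}
```

	With no ACF file and no optimisation requested, only m1 and m4
	would be invoked in this sketch.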
	
	..OUTPUT FROM THE COMPILER
	------------------------
	The output from the MIDL compiler is a set of ".c" and ".h" files
	containing the stub routines and prototypes. These files are then
	compiled and linked with the client and server modules of the app.

	APP.IDL	will produce	: app_c.c and app_c.h for client side
				: app_s.c and app_s.h for server side.

	.A LOOK AT THE COMPILER PASSES
	-----------------------------
	..THE COMPILER DRIVER (m0)
	---------------------

		TBD

	..PASS1 (m1)
	------------
	Pass 1 (m1) of the compiler is a parsing pass, the input to which is
	a stream of tokens produced by the lexical analyser (lexer). The two
	main components of m1, thus, are the parser and the lexer.

	The lexer is encapsulated in a function called yylex(). The lexer
	is called by the parser to supply tokens as and when needed. It is
	a hand-coded lexer which operates off a state-transition table. 
	Starting from state 0, the lexer accepts a character and makes a
	transition to a new state depending upon the character. This process
	repeats until a whole token has been recognised. White space, comments
	and newlines are ignored by the lexer. Tokens returned by the lexer
	can be keywords, identifiers, numeric and string constants, and
	characters which form the syntax of the IDL. Keyword recognition is
	the responsibility of the lexer. A keyword is initially recognised
	as an identifier. A preallocated keyword table is used to
	distinguish between keywords and identifiers. If an identifier is
	found in the keyword table, it is returned as a keyword by the lexer.
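
	A much reduced sketch of a table-driven lexer with a keyword table
	is given below. It is an illustration only and makes several
	assumptions: the token kinds, states, character classes and the
	function next_token() are hypothetical names, the keyword table
	holds just a few entries, and comments and string constants are not
	handled. The real lexer is entered through yylex() and its table is
	not reproduced here.

```c
#include <ctype.h>
#include <string.h>

/* Token kinds (hypothetical names, for illustration only). */
enum token { TOK_EOF, TOK_IDENT, TOK_KEYWORD, TOK_NUMBER, TOK_PUNCT };

/* Character classes select the column of the transition table. */
enum cclass { C_ALPHA, C_DIGIT, C_SPACE, C_OTHER, C_COUNT };

/* Lexer states; S_DONE means the current token is complete. */
enum state { S_START, S_IDENT, S_NUMBER, S_DONE, S_COUNT };

static const enum state transition[S_COUNT][C_COUNT] = {
    /*               ALPHA    DIGIT     SPACE    OTHER  */
    /* S_START  */ { S_IDENT, S_NUMBER, S_START, S_DONE },
    /* S_IDENT  */ { S_IDENT, S_IDENT,  S_DONE,  S_DONE },
    /* S_NUMBER */ { S_DONE,  S_NUMBER, S_DONE,  S_DONE },
    /* S_DONE   */ { S_DONE,  S_DONE,   S_DONE,  S_DONE },
};

/* Preallocated keyword table (a small illustrative subset). */
static const char *const keywords[] = { "interface", "typedef", "struct", "import" };

static enum cclass classify(int c)
{
    if (isalpha(c) || c == '_') return C_ALPHA;
    if (isdigit(c)) return C_DIGIT;
    if (isspace(c)) return C_SPACE;
    return C_OTHER;
}

static int is_keyword(const char *s)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Scan one token from *src into text (capacity cap >= 1) and advance *src. */
enum token next_token(const char **src, char *text, size_t cap)
{
    const char *p = *src;
    enum state st = S_START;
    size_t n = 0;

    while (*p != '\0') {
        enum state next = transition[st][classify((unsigned char)*p)];
        if (next == S_DONE && st != S_START)
            break;                  /* delimiter: leave it for the next call */
        if (next != S_START && n + 1 < cap)
            text[n++] = *p;         /* character belongs to the token */
        p++;
        st = next;
        if (st == S_DONE)
            break;                  /* single punctuation character */
    }
    text[n] = '\0';
    *src = p;

    switch (st) {
    case S_IDENT:  return is_keyword(text) ? TOK_KEYWORD : TOK_IDENT;
    case S_NUMBER: return TOK_NUMBER;
    case S_DONE:   return TOK_PUNCT;
    default:       return TOK_EOF;
    }
}
```

	Note that the keyword check happens only after a complete identifier
	has been scanned, so the transition table itself needs no
	keyword-specific states.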

	The parser is automatically generated by the yacc parser generator.
	The input to the parser generator is a source file containing 
	grammar productions for the IDL. The output is a ".c" file which
	contains the parsing tables along with the driver routine for the
	parser. The parser is encapsulated in a function called yyparse().

	Recognition of a valid syntactic construct consists of reductions
	of the grammar rules specified in the grammar file (grammar.y).
	Interspersed with the grammar rules are action routines, which are
	coded by the compiler writers and are executed when the production
	they belong to is reduced.

	Apart from parsing, m1 is responsible for building the type graph,
	the type data base and the symbol table. The process of building
	the type graph and the symbol table is described elsewhere (WHERE??).
	Suffice it to say that the type graph and symbol table are built
	as and when productions are recognised.

	As types are defined, they also get entered in the type data base.
	The type data base contains reference counts which indicate the
	usage of the type and are useful in determining the kind of 
	marshalling that the type will undergo. Type sub-graphs corresponding
	to procedures (loosely classified as signatures) also get entered into
	the type data base. 
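
	The idea of a type data base entry carrying a reference count can
	be sketched as below. Everything here is an assumption made for
	illustration: the fixed-size table, the entry layout and the
	functions typedb_define(), typedb_use() and typedb_refs() are
	hypothetical, and the sketch says nothing about how the counts are
	later mapped to a kind of marshalling.

```c
#include <string.h>

#define TYPEDB_MAX 64           /* arbitrary capacity for the sketch */

struct typedb_entry {
    const char *name;           /* type name (not copied) */
    unsigned refs;              /* number of recorded uses */
};

static struct typedb_entry typedb[TYPEDB_MAX];
static size_t typedb_count;

static struct typedb_entry *typedb_find(const char *name)
{
    size_t i;
    for (i = 0; i < typedb_count; i++)
        if (strcmp(typedb[i].name, name) == 0)
            return &typedb[i];
    return NULL;
}

/* Enter a newly defined type. Returns 0 on success, -1 if the type
 * already exists or the table is full. */
int typedb_define(const char *name)
{
    if (typedb_find(name) != NULL || typedb_count == TYPEDB_MAX)
        return -1;
    typedb[typedb_count].name = name;
    typedb[typedb_count].refs = 0;
    typedb_count++;
    return 0;
}

/* Record one use of a defined type. Returns the new count, or 0 if the
 * type is unknown. */
unsigned typedb_use(const char *name)
{
    struct typedb_entry *e = typedb_find(name);
    return e != NULL ? ++e->refs : 0;
}

/* Query the current reference count (0 for an unknown type). */
unsigned typedb_refs(const char *name)
{
    struct typedb_entry *e = typedb_find(name);
    return e != NULL ? e->refs : 0;
}
```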


	..PASS2 (m2)
	------------

	Pass 2 is responsible for parsing the ACF file. This process follows
	the IDL parsing. The driver (m0) checks for the presence of an ACF
	file and invokes pass 2 (m2) if needed.

	The parser for pass 2 is also generated by the parser generator. The
	same lexical analyser is used for pass 2. Since the parser
	generator always names the parser function yyparse(), this results in
	a name clash with the m1 parser. The proposed scheme for preventing
	the name clash is to redefine yyparse as (say) yy2parse, and similarly
	to rename any global variables belonging to the parser module, in an
	include file which is included when the generated parser is compiled.
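
	The renaming scheme can be illustrated with the sketch below, which
	folds everything into one file so that it can be compiled on its
	own. The two yyparse() bodies are stand-ins for the generated
	parsers, and run_pass1() and run_pass2() are hypothetical helpers.
	In the proposed scheme the #define lines would live in the include
	file and the two parsers would be separate source files. The name
	yychar is a global that yacc-generated parsers typically define;
	the full list of names to rename depends on the parser generator in
	use.

```c
/* Stand-in for the parser generated for pass 1. */
int yyparse(void) { return 1; }

/* ---- contents of the (hypothetical) rename header for pass 2 ---- */
#define yyparse yy2parse
#define yychar  yy2char
/* ---- end of rename header ---- */

/* Stand-in for the generated pass 2 parser. The source text still says
 * yyparse and yychar, but the symbols actually defined are yy2parse and
 * yy2char, so there is no clash with the pass 1 parser. */
int yychar;
int yyparse(void) { return 2; }

/* Undo the renaming so the rest of this file sees the pass 1 names. */
#undef yyparse
#undef yychar

/* Hypothetical helpers showing that both parsers are reachable. */
int run_pass1(void) { return yyparse(); }
int run_pass2(void) { return yy2parse(); }
```

	The attraction of doing the renaming in an include file is that the
	generated parser source does not have to be edited after each run
	of the parser generator.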

	m2 verifies that the ACF conforms to the syntax and semantics of
	NIDL. m2 acts on the type graph generated by m1 and qualifies it
	with the attributes collected from the ACF. No new data structures
	are introduced by m2.

	..PASS3 (m3)
	------------

	Pass 3 is the optimisation pass. Working off the type data base,
	m3 determines the kind of marshalling that each type or procedure
	will undergo.
	Details of the optimisation are described in chapter (WHERE??) of this
	document.

	..PASS4 (m4)
	------------

	m4 generates the stub code and include files for the application.
	Details of this pass are TBD.

	.CODE ORGANISATION 
	------------------

	The description above covers the logical layout of the passes. The
	physical code layout generally parallels the logical layout, with
	some other considerations in mind.

	The most important consideration is that the compiler is expected
	to work under the DOS, OS/2 and NT environments. For DOS, the size
	of the compiler code in memory is an important factor in deciding
	the layout of the compiler. More code residing in memory means less
	memory available for compiler internal storage, and hence a smaller
	limit on the size of the IDL file that can be compiled.

	Since all the passes are fairly independent of each other and only 
	share the data structures among themselves, it is proposed that
	we overlay the various passes, with the driver (m0) residing in memory
	at all times and fanning out control. Overlaying is explicit under
	DOS with the modules partitioned into various overlays at link time.

	Under NT and OS/2 the same effect can be achieved by placing the
	various passes in segments which can be loaded on demand. Again,
	m0 is always loaded.
	
	Although overlaying is not of immediate concern, it is worth
	being prepared for a quick change to it if needed. Keeping this in
	mind, it is proposed to organise the compiler passes such that
	each pass is (almost) self-sufficient. Each compiler pass will be
	organised into modules, with one global entry point into each module.
	In other words, calls from one module to another WILL NOT (hopefully)
	happen. The advantage is that, in the overlaid scheme, overlay
	(or segment) swapping will not occur. This does not apply to routines
	such as the type graph and symbol table driver routines, which are
	globally accessed. Such routines will be part of the m0 module, which
	stays in memory all the time. Thus all but one of the routines defined
	in every module will be static. Following this discipline will mean
	that a later change to the overlaid scheme will be quick and painless.
	The other, though relatively less significant, advantage is that static
	routines within a module can be "near" routines, thus saving code.
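
	The discipline of one global entry point per module can be sketched
	as follows. The names are hypothetical: typegraph_node_count()
	stands in for a globally accessed routine that would live in m0,
	and m3_main() for the single external entry point of a pass module.
	The helper bodies are placeholders and do no real optimisation.

```c
/* Stand-in for a globally accessed routine that belongs to the
 * always-resident m0 module (hypothetical name and behaviour). */
int typegraph_node_count(void) { return 3; }

/* ---- module for one pass: everything but the entry point is static ---- */

/* Internal helper: invisible outside this module. On a 16-bit compiler
 * such a routine could additionally be made a "near" routine. */
static int classify_node(int index)
{
    return index % 2;           /* placeholder decision per node */
}

/* Internal helper: walks the nodes via the resident m0 routine only. */
static int walk_graph(void)
{
    int i, marked = 0;
    for (i = 0; i < typegraph_node_count(); i++)
        marked += classify_node(i);
    return marked;
}

/* The one global entry point of the module, called by the driver. */
int m3_main(void)
{
    return walk_graph();
}
```

	Because the helpers are static, the linker cannot resolve a call to
	them from another module, so an accidental cross-module dependency
	shows up at build time rather than as overlay swapping at run time.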