Initial revision

1987-03-03 10:59:52 +00:00
parent 4d4c8b45fb
commit 004f017550
30 changed files with 3903 additions and 0 deletions
--- a/doc/ego/il/il1
+++ b/doc/ego/il/il1
@@ -0,0 +1,112 @@
+.bp
+.NH 1
+Inline substitution
+.NH 2
+Introduction
+.PP
+The Inline Substitution technique (IL)
+tries to decrease the overhead associated
+with procedure calls (invocations).
+During a procedure call, several actions
+must be undertaken to set up the right
+environment for the called procedure.
+.[
+johnson calling sequence
+.]
+On return from the procedure, most of these
+effects must be undone.
+This entire process introduces significant
+costs in execution time as well as
+in object code size.
+.PP
+The inline substitution technique replaces
+some of the calls by the modified body of
+the called procedure, hence eliminating
+the overhead.
+Furthermore, as the calling and called procedure
+are now integrated, they can be optimized
+together, using other techniques of the optimizer.
+This often leads to extra opportunities for
+optimization
+.[
+ball predicting effects
+.]
+.[
+carter code generation cacm
+.]
+.[
+scheifler inline cacm
+.]
+.PP
+An inline substitution of a call to a procedure P increases
+the size of the program, unless P is very small or P is
+called only once.
+In the latter case, P can be eliminated.
+In practice, procedures that are called only once occur
+quite frequently, due to the
+introduction of structured programming.
+(Carter
+.[
+carter umi ann arbor
+.]
+states that almost 50% of the Pascal procedures
+he analyzed were called just once).
+.PP
+Scheifler
+.[
+scheifler inline cacm
+.]
+has a more general view of inline substitution.
+In his model, the program under consideration is
+allowed to grow by a certain amount,
+i.e. code size is sacrificed to speed up the program.
+The above two cases are just special cases of
+his model, obtained by setting the size-change to
+(approximately) zero.
+He formulates the substitution problem as follows:
+.IP
+"Given a program, a subset of all invocations,
+a maximum program size, and a maximum procedure size,
+find a sequence of substitutions that minimizes
+the expected execution time."
+.LP
+Scheifler shows that this problem is NP-complete
+.[~[
+aho hopcroft ullman analysis algorithms
+.], chapter 10]
+by reduction from the Knapsack Problem.
+Heuristics will have to be used to find a near-optimal
+solution.
+.PP
+In the following chapters we will extend
+Scheifler's view and adapt it to the EM Global Optimizer.
+We will first describe the transformations that have
+to be applied to the EM text when a call is substituted
+in line.
+Next we will examine in which cases inline substitution
+is not possible or desirable.
+Heuristics will be developed for
+choosing a good sequence of substitutions.
+These heuristics make no demand on the user
+(such as making profiles
+.[
+scheifler inline cacm
+.]
+or giving pragmats
+.[~[
+ichbiah ada military standard
+.], section 6.3.2]),
+although the model could easily be extended
+to use such information.
+Finally, we will discuss the implementation
+of the IL phase of the optimizer.
+.PP
+We will often use the term inline expansion
+as a synonym of inline substitution.
+.sp 0
+The inverse technique of procedure abstraction
+(automatic subroutine generation)
+.[
+shaffer subroutine generation
+.]
+will not be discussed in this report.
--- a/doc/ego/il/il2
+++ b/doc/ego/il/il2
@@ -0,0 +1,93 @@
+.NH 2
+Parameters and local variables
+.PP
+In the EM calling sequence, the calling procedure
+pushes its parameters on the stack
+before doing the CAL.
+The called routine first saves some
+status information on the stack and then
+allocates space for its own locals
+(also on the stack).
+Usually, one special purpose register,
+the Local Base (LB) register,
+is used to access both the locals and the
+parameters.
+If memory is highly segmented,
+the stack frames of the caller and the callee
+may be allocated in different fragments;
+an extra Argument Base (AB) register is used
+in this case to access the actual parameters.
+See 4.2 of
+.[
+keizer architecture
+.]
+for further details.
+.PP
+If a procedure call is expanded in line,
+there are two problems:
+.IP 1. 3
+No stack frame will be allocated for the called procedure;
+we must find another place to put its locals.
+.IP 2.
+The LB register cannot be used to access the actual
+parameters;
+as the CAL instruction is deleted, the LB will
+still point to the local base of the \fIcalling\fR procedure.
+.LP
+The local variables of the called procedure will
+be put in the stack frame of the calling procedure,
+just after its own locals.
+The size of the stack frame of the
+calling procedure will be increased
+during its entire lifetime.
+Therefore our model will allow a
+limit to be set on the number of bytes
+for locals that the called procedure may have
+(see next section).
+.PP
+There are several alternatives to access the parameters.
+An actual parameter may be any auxiliary expression,
+which we will refer to as
+the \fIactual parameter expression\fR.
+The value of this expression is stored
+in a location on the stack (see above),
+the \fIparameter location\fR.
+.sp 0
+The alternatives for accessing parameters are:
+.IP -
+save the value of the stackpointer at the point of the CAL
+in a temporary variable X;
+this variable can be used to simulate the AB register,  i.e.
+parameter locations are accessed via an offset to
+the value of X.
+.IP -
+create a new temporary local variable T for
+the parameter (in the stack frame of the caller);
+every access to the parameter location must be changed
+into an access to T.
+.IP -
+do not evaluate the actual parameter expression before the call;
+instead, substitute this expression for every use of the
+parameter location.
+.LP
+The first method may be expensive if X is not
+put in a register.
+We will not use this method.
+The time required to evaluate and access the
+parameters when the second method is used
+will not differ much from the normal
+calling sequence (i.e. a call that is not expanded in line).
+It is not expensive, but there are no
+extra savings either.
+The third method is essentially the 'by name'
+parameter mechanism of Algol60.
+If the actual parameter is just a numeric constant,
+it is advantageous to use it.
+Yet, there are several circumstances
+under which it cannot or should not be used.
+We will deal with this in the next section.
+.sp 0
+In general we will use the third method,
+if it is possible and desirable.
+Such parameters will be called \fIin line parameters\fR.
+In all other cases we will use the second method.
--- a/doc/ego/il/il3
+++ b/doc/ego/il/il3
@@ -0,0 +1,164 @@
+.NH 2
+Feasibility and desirability analysis
+.PP
+Feasibility and desirability analysis
+of in line substitution differ
+somewhat from most other techniques.
+Usually, much effort is needed to find
+a feasible opportunity for optimization
+(e.g. a redundant subexpression).
+Desirability analysis then checks
+if it is really advantageous to do
+the optimization.
+For IL, opportunities are easy to find.
+Determining whether an in line expansion is
+desirable is not hard either.
+The main problem is to find the most
+desirable ones.
+We will deal with this problem later;
+we first attend to feasibility and
+desirability analysis.
+.PP
+There are several reasons why a procedure invocation
+cannot or should not be expanded in line.
+.sp
+A call to a procedure P cannot be expanded in line
+in any of the following cases:
+.IP 1. 3
+The body of P is not available as EM text.
+Clearly, there is no way to do the substitution.
+.IP 2.
+P, or any procedure called by P (transitively),
+follows the chain of statically enclosing
+procedures (via a LXL or LXA instruction)
+or follows the chain of dynamically enclosing
+procedures (via a DCH).
+If the call were expanded in line,
+one level would be removed from the chains,
+leading to total chaos.
+This chaos could be solved by patching up
+every LXL, LXA or DCH in all procedures
+that could be part of the chains,
+but this is hard to implement.
+.IP 3.
+P, or any procedure called by P (transitively),
+calls a procedure whose body is not
+available as EM text.
+The unknown procedure may use an LXL, LXA or DCH.
+However, in several languages a separately
+compiled procedure has no access to the
+static or dynamic chain.
+In this case
+this point does not apply.
+.IP 4.
+P, or any procedure called by P (transitively),
+uses the LPB instruction, which converts a
+local base to an argument base;
+as the locals and parameters are stored
+in a non-standard way (differing from the
+normal EM calling sequence) this instruction
+would yield incorrect results.
+.IP 5.
+The total number of bytes of the parameters
+of P is not known.
+P may be a procedure with a variable number
+of parameters or may have an array of dynamic size
+as value parameter.
+.LP
+It is undesirable to expand a call to a procedure P in line
+in any of the following cases:
+.IP 1. 3
+P is large, i.e. the number of EM instructions
+of P exceeds some threshold.
+The expanded code would be large too.
+Furthermore, several programs in ACK,
+including the global optimizer itself,
+may run out of memory if they have to run
+in a small address space and are provided
+very large procedures.
+The threshold may be set to infinite,
+in which case this point does not apply.
+.IP 2.
+P has many local variables.
+All these variables would have to be allocated
+in the stack frame of the calling procedure.
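+.LP
+As an illustration, the rules above can be collected in two predicates.
+The following C sketch is hypothetical: the structure, its field names and
+the two threshold arguments are assumptions made for this example and are
+not taken from the IL sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a procedure P; all field names are assumptions. */
struct proc_info {
    bool body_available;    /* the body of P is available as EM text       */
    bool uses_chains;       /* P (transitively) uses LXL, LXA or DCH       */
    bool calls_unknown;     /* P (transitively) calls a body-less proc     */
    bool uses_lpb;          /* P (transitively) uses LPB                   */
    bool param_size_known;  /* total number of parameter bytes is known    */
    int  size;              /* number of non-pseudo EM instructions        */
    int  local_bytes;       /* number of bytes of local variables          */
};

/* Feasibility: true if none of the five "cannot" cases applies.
 * unknown_may_use_chains says whether, in the source language, a separately
 * compiled procedure could access the static or dynamic chain. */
bool can_expand(const struct proc_info *p, bool unknown_may_use_chains)
{
    if (!p->body_available) return false;                          /* case 1 */
    if (p->uses_chains) return false;                              /* case 2 */
    if (p->calls_unknown && unknown_may_use_chains) return false;  /* case 3 */
    if (p->uses_lpb) return false;                                 /* case 4 */
    if (!p->param_size_known) return false;                        /* case 5 */
    return true;
}

/* Desirability: a negative max_size stands for an infinite threshold. */
bool want_expand(const struct proc_info *p, int max_size, int max_local_bytes)
{
    assert(max_local_bytes >= 0);
    if (max_size >= 0 && p->size > max_size) return false;
    return p->local_bytes <= max_local_bytes;
}
```

+.LP
+A call would be a candidate only if both predicates hold for the called procedure.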
+.PP
+If a call may be expanded in line, we have to
+decide how to access its parameters.
+In the previous section we stated that we would
+use in line parameters whenever possible and desirable.
+There are several reasons why a parameter
+cannot or should not be expanded in line.
+.sp
+No parameter of a procedure P can be expanded in line,
+in any of the following cases:
+.IP 1. 3
+P, or any procedure called by P (transitively),
+does a store-indirect or a use-indirect (i.e. through
+a pointer).
+However, if the front-end has generated messages
+telling that certain parameters cannot be accessed
+indirectly, those parameters may be expanded in line.
+.IP 2.
+P, or any procedure called by P (transitively),
+calls a procedure whose body is not available as EM text.
+The unknown procedure may do a store-indirect
+or a use-indirect.
+However, the same remark about front-end messages
+as for 1. holds here.
+.IP 3.
+The address of a parameter location is taken (via a LAL).
+In the normal calling sequence, all parameters
+are stored sequentially. If the address of one
+parameter location is taken, the address of any
+other parameter location can be computed from it.
+Hence we must put every parameter in a temporary location;
+furthermore, all these locations must be in
+the same order as for the normal calling sequence.
+.IP 4.
+P has overlapping parameters; for example, it uses
+the parameter at offset 10 both as a 2 byte and as a 4 byte
+parameter.
+Such code may be produced by the front ends if
+the formal parameter is of some record type
+with variants.
+.PP
+Sometimes a specific parameter must not be expanded in line.
+.sp 0
+An actual parameter expression cannot be expanded in line
+in any of the following cases:
+.IP 1. 3
+P stores into the parameter location.
+Even if the actual parameter expression is a simple
+variable, it is incorrect to change the 'store into
+formal' into a 'store into actual', because of
+the parameter mechanism used.
+In Pascal, the following expansion is incorrect:
+.DS
+procedure p (x:integer);
+begin
+   x := 20;
+end;
+...
+a := 10;                a := 10;
+p(a);        --->       a := 20;
+write(a);               write(a);
+.DE
+.IP 2.
+P changes any of the operands of the
+actual parameter expression.
+If the expression is expanded and evaluated
+after the operand has been changed,
+the wrong value will be used.
+.IP 3.
+The actual parameter expression has side effects.
+It must be evaluated only once,
+at the place of the call.
+.LP
+It is undesirable to expand an actual parameter in line
+in the following case:
+.IP 1. 3
+The parameter is used more than once
+(dynamically) and the actual parameter expression
+is not just a simple variable or constant.
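+.LP
+The per-parameter rules can be sketched as a small decision function that
+chooses between a temporary (the second method of the previous section)
+and an in line parameter (the third method).
+The C code below is only an illustration; the structure and all names in it
+are assumptions, and the procedure-wide conditions are folded into a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical facts about one actual parameter; all names are assumptions. */
struct param_info {
    bool stored_into;       /* P stores into the parameter location          */
    bool operands_changed;  /* P changes an operand of the actual expression */
    bool has_side_effects;  /* the actual parameter expression has them      */
    bool simple_actual;     /* the actual is a simple variable or constant   */
    int  uses;              /* estimated number of uses of the parameter     */
};

enum access { USE_TEMPORARY, USE_IN_LINE };

/* proc_allows_inline is false if any procedure-wide condition forbids
 * in line parameters for P (indirect accesses, unknown callees, LAL of a
 * parameter location, overlapping parameters). */
enum access choose_access(bool proc_allows_inline, const struct param_info *a)
{
    assert(a->uses >= 0);
    if (!proc_allows_inline)
        return USE_TEMPORARY;
    if (a->stored_into || a->operands_changed || a->has_side_effects)
        return USE_TEMPORARY;           /* expansion would be incorrect */
    if (a->uses > 1 && !a->simple_actual)
        return USE_TEMPORARY;           /* expansion would be undesirable */
    return USE_IN_LINE;
}
```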
+.LP
--- a/doc/ego/il/il4
+++ b/doc/ego/il/il4
@@ -0,0 +1,132 @@
+.NH 2
+Heuristic rules
+.PP
+Using the information described
+in the previous section,
+we can find all calls that can
+be expanded in line, and for which
+this expansion is desirable.
+In general, we cannot expand all these calls,
+so we have to choose the 'best' ones.
+With every CAL instruction
+that may be expanded, we associate
+a \fIpay off\fR,
+which expresses how desirable it is
+to expand this specific CAL.
+.sp
+Let Tc denote the portion of EM text involved
+in a specific call, i.e. the pushing of the actual
+parameter expressions, the CAL itself,
+the popping of the parameters and the
+pushing of the result (if any, via an LFR).
+Let Te denote the EM text that would be obtained
+by expanding the call in line.
+Let Pc be the original program and Pe the program
+with Te substituted for Tc.
+The pay off of the CAL depends on two factors:
+.IP -
+T = execution_time(Pe) - execution_time(Pc)
+.IP -
+S = code_size(Pe) - code_size(Pc)
+.LP
+The change in execution time (T) depends on:
+.IP -
+T1 = execution_time(Te) - execution_time(Tc)
+.IP -
+N = number of times Te or Tc get executed.
+.LP
+We assume that T1 will be the same every
+time the code gets executed.
+This is a reasonable assumption.
+(Note that we are talking about one CAL,
+not about different calls to the same procedure).
+Hence
+.DS
+T = N * T1
+.DE
+T1 can be estimated by a careful analysis
+of the transformations that are performed.
+Below, we list everything that will be
+different when a call is expanded in line:
+.IP -
+The CAL instruction is not executed.
+This saves a subroutine jump.
+.IP -
+The instructions in the procedure prolog
+are not executed.
+These instructions, generated from the PRO pseudo,
+save some machine registers 
+(including the old LB), set the new LB and allocate space
+for the locals of the called routine.
+The savings may be less if there are no
+locals to allocate.
+.IP -
+In line parameters are not evaluated before the call
+and are not pushed on the stack.
+.IP -
+All remaining parameters are stored in local variables,
+instead of being pushed on the stack.
+.IP -
+If the number of parameters is nonzero,
+the ASP instruction after the CAL is not executed.
+.IP -
+Every reference to an in line parameter is
+substituted by the parameter expression.
+.IP -
+RET (return) instructions are replaced by
+BRA (branch) instructions.
+If the called procedure 'falls through'
+(i.e. it has only one RET, at the end of its code),
+even the BRA is not needed.
+.IP -
+The LFR (fetch function result) is not executed.
+.PP
+Besides these changes, which are caused directly by IL,
+other changes may occur as IL influences other optimization
+techniques, such as Register Allocation and Constant Propagation.
+Our heuristic rules do not take into account the quite
+unpredictable effects on Register Allocation.
+They do, however, favour calls that have numeric \fIconstants\fR
+as parameter; especially the constant "0" as an inline
+parameter gets high scores,
+as further optimizations may often be possible.
+.PP
+It cannot be determined statically how often a CAL instruction gets
+executed.
+We will use \fIloop nesting\fR information here.
+The nesting level of the loop in which
+the CAL appears (if any) will be used as an
+indication for the number of times it gets executed.
+.PP
+Based on all these facts,
+the pay off of a call will be computed.
+The following model was developed empirically.
+Assume procedure P calls procedure Q.
+The call takes place in basic block B.
+.DS
+ZP = # zero parameters
+CP = # constant parameters - ZP
+LN = Loop Nesting level (0 if outside any loop)
+F  = \fIif\fR # formal parameters of Q > 0 \fIthen\fR 1 \fIelse\fR 0
+FT = \fIif\fR Q falls through \fIthen\fR 1 \fIelse\fR 0
+S  = size(Q) - 1 - # inline_parameters - F
+L  = \fIif\fR # local variables of P > 0 \fIthen\fR 0 \fIelse\fR -1
+A  = CP + 2 * ZP
+N  = \fIif\fR LN=0 and P is never called from a loop \fIthen\fR 0 \fIelse\fR (LN+1)**2
+FM = \fIif\fR B is a firm block \fIthen\fR 2 \fIelse\fR 1
+
+pay_off = (100/S + FT + F + L + A) * N * FM
+.DE
+S stands for the size increase of the program,
+which is slightly less than the size of Q.
+The size of a procedure is taken to be its number
+of (non-pseudo) EM instructions.
+The terms "loop nesting level" and "firm" were defined
+in the chapter on the Intermediate Code (section "loop tables").
+If a call is not inside a loop and the calling procedure
+is itself never called from a loop (transitively),
+then the call will probably be executed at most once.
+Such a call is never expanded in line (its pay off is zero).
+If the calling procedure doesn't have local variables, a penalty (L)
+is introduced, as it will most likely get local variables if the
+call gets expanded.
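+.PP
+The pay off model can be written down directly in C.
+The sketch below is an illustration only: the structure and its field names
+are hypothetical, integer division is assumed for 100/S,
+and S is clamped to at least 1 to avoid a division by zero for very
+small procedures (the text does not say how that case is handled).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one call of Q from P in basic block B. */
struct call_facts {
    int  zero_params;        /* ZP: actuals that are the constant 0          */
    int  const_params;       /* constant actuals, zeros included             */
    int  loop_nesting;       /* LN: 0 if outside any loop                    */
    int  formals;            /* number of formal parameters of Q             */
    bool falls_through;      /* Q falls through                              */
    int  size_q;             /* size(Q) in non-pseudo EM instructions        */
    int  inline_params;      /* number of in line parameters                 */
    int  locals_p;           /* number of local variables of P               */
    bool p_called_from_loop; /* P is (transitively) called from a loop       */
    bool firm_block;         /* B is a firm block                            */
};

long pay_off(const struct call_facts *c)
{
    assert(c->const_params >= c->zero_params);
    int zp = c->zero_params;
    int cp = c->const_params - zp;
    int ln = c->loop_nesting;
    int f  = c->formals > 0 ? 1 : 0;
    int ft = c->falls_through ? 1 : 0;
    int s  = c->size_q - 1 - c->inline_params - f;
    int l  = c->locals_p > 0 ? 0 : -1;
    int a  = cp + 2 * zp;
    int n  = (ln == 0 && !c->p_called_from_loop) ? 0 : (ln + 1) * (ln + 1);
    int fm = c->firm_block ? 2 : 1;

    if (s < 1)
        s = 1;  /* assumption: guard against division by zero */
    return (long)(100 / s + ft + f + l + a) * n * fm;
}
```

+.LP
+For example, with ZP=1, CP=1, LN=1, F=1, FT=1, S=8, L=0 and a firm block,
+this sketch yields (12 + 1 + 1 + 0 + 3) * 4 * 2 = 136.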
--- a/doc/ego/il/il5
+++ b/doc/ego/il/il5
@@ -0,0 +1,440 @@
+.NH 2
+Implementation
+.PP
+A major factor in the implementation
+of Inline Substitution is the requirement
+not to use an excessive amount of memory.
+IL essentially analyzes the entire program;
+it makes decisions based on which procedure calls
+appear in the whole program.
+Yet, because of the memory restriction, it is
+not feasible to read the entire program
+in main memory.
+To solve this problem, the IL phase has been
+split up into three subphases that are executed sequentially:
+.IP 1.
+analyze every procedure; see how it accesses its parameters;
+simultaneously collect all calls
+appearing in the whole program and put them
+in a \fIcall-list\fR.
+.IP 2.
+use the call-list and decide which calls will be substituted
+in line.
+.IP 3.
+take the decisions of subphase 2 and modify the
+program accordingly.
+.LP
+Subphases 1 and 3 scan the input program; only
+subphase 3 modifies it.
+It is essential that the decisions can be made
+in subphase 2
+without using the input program,
+provided that subphase 1 puts enough information
+in the call-list.
+Subphase 2 keeps the entire call-list in main memory
+and repeatedly scans it, to
+find the next best candidate for expansion.
+.PP
+We will specify the
+data structures used by IL before 
+describing the subphases.
+.NH 3
+Data structures
+.NH 4
+The procedure table
+.PP
+In subphase 1 information is gathered about every procedure
+and added to the procedure table.
+This information is used by the heuristic rules.
+A proctable entry for procedure p has
+the following extra information:
+.IP -
+is it allowed to substitute an invocation of p in line?
+.IP -
+is it allowed to put any parameter of such a call in line?
+.IP -
+the size of p (number of EM instructions)
+.IP -
+does p 'fall through'?
+.IP -
+a description of the formal parameters that p accesses; this information
+is obtained by looking at the code of p. For every parameter f,
+we record:
+.RS
+.IP -
+the offset of f
+.IP -
+the type of f (word, double word, pointer)
+.IP -
+may the corresponding actual parameter be put in line?
+.IP -
+is f ever accessed indirectly?
+.IP -
+is f used: never, once or more than once?
+.RE
+.IP -
+the number of times p is called (see below)
+.IP -
+the file address of its call-count information (see below).
+.LP
+.NH 4
+Call-count information
+.PP
+As a result of Inline Substitution, some procedures may
+become useless, because all their invocations have been
+substituted in line.
+One of the tasks of IL is to keep track of which
+procedures are no longer called.
+Note that IL is especially keen on procedures that are
+called only once
+(possibly as a result of expanding all other calls to it).
+So we want to know how many times a procedure
+is called \fIduring\fR Inline Substitution.
+It is not good enough to compute this
+information afterwards.
+The task is rather complex, because
+the number of times a procedure is called
+varies during the entire process:
+.IP 1.
+If a call to p is substituted in line,
+the number of calls to p gets decremented by 1.
+.IP 2.
+If a call to p is substituted in line,
+and p contains n calls to q, then the number of calls to q
+gets incremented by n.
+.IP 3.
+If a procedure p is removed (because it is no
+longer called) and p contains n calls to q,
+then the number of calls to q gets decremented by n.
+.LP
+(Note that p may be the same as q, if p is recursive).
+.sp 0
+So we actually want to have the following information:
+.DS
+NRCALL(p,q) = number of calls to q appearing in p,
+
+for all procedures p and q that may be put in line.
+.DE
+This information, called \fIcall-count information\fR, is
+computed by the first subphase.
+It is stored in a file.
+It is represented as a number of lists, rather than as
+a (very sparse) matrix.
+Every procedure has a list of (proc,count) pairs,
+telling which procedures it calls, and how many times.
+The file address of its call-count list is stored
+in its proctable entry.
+Whenever this information is needed, it is fetched from
+the file, using direct access.
+The proctable entry also contains the number of times
+a procedure is called, at any moment.
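+.PP
+The three update rules can be sketched in C as follows.
+For brevity this illustration keeps NRCALL in a small in-memory matrix,
+whereas IL stores it as lists in a file; the names and the fixed number of
+procedures are assumptions.
+Updating the row of the caller after a substitution is also an assumption,
+made so that rule 3 stays correct if the caller is removed later.

```c
#include <assert.h>

#define NPROC 4   /* assumption: a small, fixed number of procedures */

struct counts {
    int nrcall[NPROC][NPROC];  /* nrcall[p][q] = NRCALL(p,q)            */
    int called[NPROC];         /* current number of calls to each proc  */
};

/* A call to p appearing in caller is substituted in line (rules 1 and 2). */
void substitute_call(struct counts *c, int caller, int p)
{
    int row[NPROC];  /* copy first: caller may be p itself if p is recursive */

    assert(c->nrcall[caller][p] > 0);
    for (int q = 0; q < NPROC; q++)
        row[q] = c->nrcall[p][q];

    c->called[p]--;                     /* rule 1 */
    c->nrcall[caller][p]--;
    for (int q = 0; q < NPROC; q++) {
        c->called[q] += row[q];         /* rule 2 */
        c->nrcall[caller][q] += row[q]; /* nested calls now appear in caller */
    }
}

/* Procedure p is removed because it is no longer called (rule 3). */
void remove_proc(struct counts *c, int p)
{
    assert(c->called[p] == 0);
    for (int q = 0; q < NPROC; q++) {
        c->called[q] -= c->nrcall[p][q];
        c->nrcall[p][q] = 0;
    }
}
```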
+.NH 4
+The call-list
+.PP
+The call-list is the major data structure used by IL.
+Every item of the list describes one procedure call.
+It contains the following attributes:
+.IP -
+the calling procedure (caller)
+.IP -
+the called procedure (callee)
+.IP -
+identification of the CAL instruction (sequence number)
+.IP -
+the loop nesting level; our heuristic rules appreciate
+calls inside a loop (or even inside a loop nested inside
+another loop, etc.) more than other calls
+.IP -
+the actual parameter expressions involved in the call;
+for every actual, we record:
+.RS
+.IP -
+the EM code of the expression
+.IP -
+the number of bytes of its result (size)
+.IP -
+an indication if the actual may be put in line
+.RE
+.LP
+The structure of the call-list is rather complex.
+Whenever a call is expanded in line, new calls
+will suddenly appear in the program,
+that were not contained in the original body
+of the calling subroutine.
+These calls are inherited from the called procedure.
+We will refer to these invocations as \fInested calls\fR
+(see Fig. 5.1).
+.DS
+procedure p is
+begin                           .
+     a();                       .
+     b();                       .
+end;
+
+procedure r is            procedure r is
+begin                     begin
+     x();                      x();
+     p();  -- in line          a();  -- nested call
+     y();                      b();  -- nested call
+end;                           y();
+                          end;
+
+Fig. 5.1 Example of nested procedure calls
+.DE
+Nested calls may subsequently be put in line too
+(probably resulting in a yet deeper nesting level, etc.).
+So the call-list does not always reflect the source program,
+but changes dynamically, as decisions are made.
+If a call to p is expanded, all calls appearing in p
+will be added to the call-list.
+.sp 0
+A convenient and elegant way to represent
+the call-list is to use a LISP-like list.
+.[
+poel lisp trac
+.]
+Calls that appear at the same level
+are linked in the CDR direction. If a call C
+to a procedure p is expanded,
+all calls appearing in p are put in a sub-list
+of C, i.e. in its CAR.
+In the example above, before the decision
+to expand the call to p is made, the
+call-list of procedure r looks like:
+.DS
+(call-to-x, call-to-p, call-to-y)
+.DE
+After the decision, it looks like:
+.DS
+(call-to-x, (call-to-p*, call-to-a, call-to-b), call-to-y)
+.DE
+The call to p is marked, because it has been
+substituted.
+Whenever IL wants to traverse the call-list of some procedure,
+it uses the well-known LISP technique of
+recursion in the CAR direction and
+iteration in the CDR direction
+(see page 1.19-2 of
+.[
+poel lisp trac
+.]
+).
+All list traversals look like:
+.DS
+traverse(list)
+{
+    for (c = first(list); c != 0; c = CDR(c)) {
+	if (c is marked) {
+	    traverse(CAR(c));
+	} else {
+	    do something with c
+	}
+    }
+}
+.DE
+The entire call-list consists of a number of LISP-like lists,
+one for every procedure.
+The proctable entry of a procedure contains a pointer
+to the beginning of the list.
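+.PP
+A minimal C sketch of such a list is given below.
+The structure layout and the function names are assumptions chosen for
+this illustration; they are not the declarations used in il.h.
+The traversal follows the scheme shown above: recursion in the CAR direction
+for marked calls, iteration in the CDR direction otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-list item. */
struct call {
    const char  *callee;  /* name of the called procedure                 */
    int          marked;  /* nonzero once the call has been expanded      */
    struct call *car;     /* sub-list of nested calls (only if marked)    */
    struct call *cdr;     /* next call at the same level                  */
};

/* Mark call c as expanded and attach the calls inherited from the callee. */
void expand(struct call *c, struct call *nested)
{
    assert(!c->marked);
    c->marked = 1;
    c->car = nested;
}

/* Count the calls that are still candidates for expansion. */
int count_unmarked(const struct call *list)
{
    int n = 0;

    for (const struct call *c = list; c != NULL; c = c->cdr) {
        if (c->marked)
            n += count_unmarked(c->car);  /* recursion in the CAR direction */
        else
            n++;                          /* "do something with c" */
    }
    return n;
}
```

+.LP
+For procedure r of Fig. 5.1, the list holds three candidate calls before the
+call to p is expanded and four afterwards (x, a, b and y).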
+.NH 3
+The first subphase: procedure analysis
+.PP
+The tasks of the first subphase are to determine
+several attributes of every procedure
+and to construct the basic call-list,
+i.e. without nested calls.
+The size of a procedure is determined
+by simply counting its EM instructions.
+Pseudo instructions are skipped.
+A procedure does not 'fall through' if its CFG
+contains a basic block
+that is not the last block of the CFG and
+that ends on a RET instruction.
+The formal parameters of a procedure are determined
+by inspection of
+its code.
+.PP
+The call-list is constructed by looking at all CAL instructions
+appearing in the program.
+The call-list should only contain calls to procedures
+that may be put in line.
+This fact is only known if the procedure was
+analyzed earlier.
+If a call to a procedure p appears in the program
+before the body of p,
+the call will always be put in the call-list.
+If p is later found to be unsuitable,
+the call will be removed from the list by the
+second subphase.
+.PP
+An important issue is the recognition
+of the actual parameter expressions of the call.
+The front ends produce messages telling how many
+bytes of formal parameters every procedure accesses.
+(If there is no such message for a procedure, it
+cannot be put in line).
+The actual parameters together must account for
+the same number of bytes.
+A recursive descent parser is used
+to parse side-effect free EM expressions.
+It uses a table and some
+auxiliary routines to determine
+how many bytes every EM instruction pops from the stack
+and how many bytes it pushes onto the stack.
+These numbers depend on the EM instruction, its argument,
+and the wordsize and pointersize of the target machine.
+Initially, the parser has to recognize the
+number of bytes specified in the formals-message,
+say N.
+Assume the first instruction before the CAL pops S bytes
+and pushes R bytes.
+If R > N, too many bytes are recognized
+and the parser fails.
+Else, it calls itself recursively to recognize the
+S bytes used as operand of the instruction.
+If it succeeds in doing so, it continues with the next instruction,
+i.e. the first instruction before the code recognized by
+the recursive call, to recognize N-R more bytes.
+The result is a number of EM instructions that collectively push N bytes.
+If the parser encounters an instruction that has side effects
+(e.g. a store or a procedure call) or whose R and S cannot
+be computed statically (e.g. a LOS), it fails.
+.sp 0
+Note that the parser traverses the code backwards.
+As EM code is essentially postfix code, the parser works top down.
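+.PP
+The backward recognition can be sketched in C as follows.
+This is an illustration under stated assumptions, not the parser of IL:
+the instruction record is hypothetical, and the pop and push counts are
+supplied directly instead of being derived from a table, the instruction
+argument and the word and pointer sizes.

```c
#include <assert.h>
#include <stdbool.h>

#define FAIL (-2)

/* Hypothetical, simplified EM instruction. */
struct insn {
    const char *mnem;  /* mnemonic, for readability only                   */
    int  pops;         /* S: number of bytes popped                        */
    int  pushes;       /* R: number of bytes pushed                        */
    bool unsafe;       /* side effects, or S and R not known statically    */
};

/* Scanning backwards from code[last], recognize instructions that together
 * push nbytes bytes. Returns the index of the instruction just before the
 * recognized code (-1 if it starts at code[0]), or FAIL. */
int recognize(const struct insn *code, int last, int nbytes)
{
    int i = last;

    assert(nbytes >= 0);
    while (nbytes > 0) {
        if (i < 0)
            return FAIL;                  /* ran out of instructions */
        const struct insn *in = &code[i];
        if (in->unsafe || in->pushes > nbytes)
            return FAIL;                  /* side effect, or R > N */
        i--;
        if (in->pops > 0) {
            /* recognize the S bytes used as operands of this instruction */
            i = recognize(code, i, in->pops);
            if (i == FAIL)
                return FAIL;
        }
        nbytes -= in->pushes;             /* N-R more bytes to recognize */
    }
    return i;
}
```

+.LP
+With a word size of 2 (an assumption for the example), the sequence
+LOL, LOC, ADI, LOC before a CAL to a procedure with 4 bytes of formals
+is recognized completely: the last LOC is one actual, and the ADI
+with its two operands is the other.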
+.PP
+If the parser fails to recognize the parameters, the call will not
+be substituted in line.
+If the parameters can be determined, they still have to
+match the formal parameters of the called procedure.
+This check is performed by the second subphase; it cannot be
+done here, because it is possible that the called
+procedure has not been analyzed yet.
+.PP
+The entire call-list is written to a file,
+to be processed by the second subphase.
+.NH 3
+The second subphase: making decisions
+.PP
+The task of the second subphase is quite easy
+to understand.
+It reads the call-list file,
+builds an incore call-list and deletes every
+call that may not be expanded in line (either because the called
+procedure may not be put in line, or because the actual parameters
+of the call do not match the formal parameters of the called procedure).
+It assigns a \fIpay-off\fR to every call,
+indicating how desirable it is to expand it.
+.PP
+The subphase repeatedly scans the call-list and takes
+the call with the highest pay-off (also referred to as its ratio).
+The chosen one gets marked,
+and the call-list is extended with the nested calls,
+as described above.
+These nested calls are also assigned a ratio,
+and will be considered too during the next scans.
+.sp 0
+After every decision the number of times
+every procedure is called is updated, using
+the call-count information.
+Meanwhile, the subphase keeps track of the amount of space left
+available.
+If all space is used, or if there are no more calls left to
+be expanded, it exits this loop.
+Finally, calls to procedures that are called only
+once are also chosen.
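+.PP
+The selection loop can be sketched in C as a greedy search.
+This illustration is deliberately simplified and rests on assumptions:
+the record and its fields are hypothetical, nested calls and call-count
+updates are left out, a call that does not fit in the remaining space is
+skipped rather than ending the loop, and the final step for
+procedures that are called only once is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical abstract of a call, as kept by the second subphase. */
struct cand {
    long pay_off;  /* desirability of expanding the call            */
    int  size;     /* estimated growth of the program if expanded   */
    int  chosen;   /* nonzero once the call has been selected       */
};

/* Repeatedly select the unchosen call with the highest positive pay-off
 * that still fits in the remaining space. Returns the number selected. */
int select_calls(struct cand *v, size_t n, int space)
{
    int picked = 0;

    assert(space >= 0);
    for (;;) {
        struct cand *best = NULL;

        for (size_t i = 0; i < n; i++) {
            struct cand *c = &v[i];
            if (c->chosen || c->pay_off <= 0 || c->size > space)
                continue;
            if (best == NULL || c->pay_off > best->pay_off)
                best = c;
        }
        if (best == NULL)
            break;              /* no space left or no candidates left */
        best->chosen = 1;
        space -= best->size;
        picked++;
    }
    return picked;
}
```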
+.PP
+The actual parameters of a call are only needed by
+this subphase to assign a ratio to a call.
+To save some space, these actuals are not kept in main memory.
+They are removed after the call has been read and a ratio
+has been assigned to it.
+So this subphase works with \fIabstracts\fR of calls.
+After all work has been done,
+the actual parameters of the chosen calls are retrieved
+from a file,
+as they are needed by the transformation subphase.
+.NH 3
+The third subphase: doing transformations
+.PP
+The third subphase makes the actual modifications to
+the EM text.
+It is directed by the decisions made in the previous subphase,
+as expressed via the call-list.
+The call-list read by this subphase contains
+only calls that were selected for expansion.
+The list is ordered in the same way as the EM text,
+i.e. if a call C1 appears before a call C2 in the call-list,
+C1 also appears before C2 in the EM text.
+So the EM text is traversed linearly,
+the calls that have to be substituted are determined
+and the modifications are made.
+If a procedure is encountered that is no longer needed,
+it is simply not written to the output EM file.
+The substitution of a call takes place in distinct steps:
+.IP "change the calling sequence" 7
+.sp 0
+The actual parameter expressions are changed.
+Parameters that are put in line are removed.
+All remaining ones must store their result in a
+temporary local variable, rather than
+push it on the stack.
+The CAL instruction and any ASP (to pop actual parameters)
+or LFR (to fetch the result of a function)
+are deleted.
+.IP "fetch the text of the called procedure"
+.sp 0
+Direct disk access is used to read the text of the
+called procedure.
+The file offset is obtained from the proctable entry.
+.IP "allocate bytes for locals and temporaries"
+.sp 0
+The local variables of the called procedure will be put in the
+stack frame of the calling procedure.
+The same applies to any temporary variables
+that hold the result of parameters
+that were not put in line.
+The proctable entry of the caller is updated.
+.IP "put a label after the CAL"
+.sp 0
+If the called procedure contains a RET (return) instruction
+somewhere in the middle of its text (i.e. it does
+not fall through), the RET must be changed into
+a BRA (branch), to jump over the
+remainder of the text.
+This label is not needed if the called
+procedure falls through.
+.IP "copy the text of the called procedure and modify it"
+.sp 0
+References to local variables of the called routine
+and to parameters that are not put in line
+are changed to refer to the
+new local of the caller.
+References to in line parameters are replaced
+by the actual parameter expression.
+Returns (RETs) are either deleted or
+replaced by a BRA.
+Messages containing information about local
+variables or parameters are changed.
+Global data declarations and the PRO and END pseudos
+are removed.
+Instruction labels and references to them are
+changed to make sure they do not have the
+same identifying number as
+labels in the calling procedure.
+.IP "insert the modified text"
+.sp 0
+The pseudos of the called procedure are put after the pseudos
+of the calling procedure.
+The real text of the callee is put at
+the place where the CAL was.
+.IP "take care of nested substitutions"
+.sp 0
+The expanded procedure may contain calls that
+have to be expanded too (nested calls).
+If the descriptor of this call contains actual
+parameter expressions,
+the code of the expressions has to be changed
+the same way as the code of the callee was changed.
+Next, the entire process of finding CALs and doing
+the substitutions is repeated recursively.
+.LP
--- a/doc/ego/il/il6
+++ b/doc/ego/il/il6
@@ -0,0 +1,27 @@
+.NH 2
+Source files of IL
+.PP
+The sources of IL are in the following files
+and packages (the prefixes 1_, 2_ and 3_ refer to the three subphases):
+.IP il.h: 14
+declarations of global variables and
+data structures
+.IP il.c:
+the routine main; the driving routines of the three subphases
+.IP 1_anal:
+contains a subroutine that analyzes a procedure
+.IP 1_cal:
+contains a subroutine that analyzes a call
+.IP 1_aux:
+implements auxiliary procedures used by subphase 1
+.IP 2_aux:
+implements auxiliary procedures used by subphase 2
+.IP 3_subst:
+the driving routine for doing the substitution
+.IP 3_change:
+lower level routines that do certain modifications
+.IP 3_aux:
+implements auxiliary procedures used by subphase 3
+.IP aux:
+implements auxiliary procedures used by several subphases.
+.LP