Initial revision
112
doc/ego/il/il1
Normal file
@@ -0,0 +1,112 @@
|
||||
.bp
|
||||
.NH 1
|
||||
Inline substitution
|
||||
.NH 2
|
||||
Introduction
|
||||
.PP
|
||||
The Inline Substitution technique (IL)
|
||||
tries to decrease the overhead associated
|
||||
with procedure calls (invocations).
|
||||
During a procedure call, several actions
|
||||
must be undertaken to set up the right
|
||||
environment for the called procedure.
|
||||
.[
|
||||
johnson calling sequence
|
||||
.]
|
||||
On return from the procedure, most of these
|
||||
effects must be undone.
|
||||
This entire process introduces significant
|
||||
costs in execution time as well as
|
||||
in object code size.
|
||||
.PP
|
||||
The inline substitution technique replaces
|
||||
some of the calls by the modified body of
|
||||
the called procedure, hence eliminating
|
||||
the overhead.
|
||||
Furthermore, as the calling and called procedure
|
||||
are now integrated, they can be optimized
|
||||
together, using other techniques of the optimizer.
|
||||
This often leads to extra opportunities for
|
||||
optimization
|
||||
.[
|
||||
ball predicting effects
|
||||
.]
|
||||
.[
|
||||
carter code generation cacm
|
||||
.]
|
||||
.[
|
||||
scheifler inline cacm
|
||||
.]
|
||||
.PP
|
||||
An inline substitution of a call to a procedure P increases
|
||||
the size of the program, unless P is very small or P is
|
||||
called only once.
|
||||
In the latter case, P can be eliminated.
|
||||
In practice, procedures that are called only once occur
|
||||
quite frequently, due to the
|
||||
introduction of structured programming.
|
||||
(Carter
|
||||
.[
|
||||
carter umi ann arbor
|
||||
.]
|
||||
states that almost 50% of the Pascal procedures
|
||||
he analyzed were called just once).
|
||||
.PP
|
||||
Scheifler
|
||||
.[
|
||||
scheifler inline cacm
|
||||
.]
|
||||
has a more general view of inline substitution.
|
||||
In his model, the program under consideration is
|
||||
allowed to grow by a certain amount,
|
||||
i.e. code size is sacrificed to speed up the program.
|
||||
The above two cases are just special cases of
|
||||
his model, obtained by setting the size-change to
|
||||
(approximately) zero.
|
||||
He formulates the substitution problem as follows:
|
||||
.IP
|
||||
"Given a program, a subset of all invocations,
|
||||
a maximum program size, and a maximum procedure size,
|
||||
find a sequence of substitutions that minimizes
|
||||
the expected execution time."
|
||||
.LP
|
||||
Scheifler shows that this problem is NP-complete
|
||||
.[~[
|
||||
aho hopcroft ullman analysis algorithms
|
||||
.], chapter 10]
|
||||
by reduction from the Knapsack Problem.
|
||||
Heuristics will have to be used to find a near-optimal
|
||||
solution.
|
||||
.PP
|
||||
In the following chapters we will extend
|
||||
Scheifler's view and adapt it to the EM Global Optimizer.
|
||||
We will first describe the transformations that have
|
||||
to be applied to the EM text when a call is substituted
|
||||
in line.
|
||||
Next we will examine in which cases inline substitution
|
||||
is not possible or desirable.
|
||||
Heuristics will be developed for
|
||||
choosing a good sequence of substitutions.
|
||||
These heuristics make no demand on the user
|
||||
(such as making profiles
|
||||
.[
|
||||
scheifler inline cacm
|
||||
.]
|
||||
or giving pragmats
|
||||
.[~[
|
||||
ichbiah ada military standard
|
||||
.], section 6.3.2]),
|
||||
although the model could easily be extended
|
||||
to use such information.
|
||||
Finally, we will discuss the implementation
|
||||
of the IL phase of the optimizer.
|
||||
.PP
|
||||
We will often use the term inline expansion
|
||||
as a synonym of inline substitution.
|
||||
.sp 0
|
||||
The inverse technique of procedure abstraction
|
||||
(automatic subroutine generation)
|
||||
.[
|
||||
shaffer subroutine generation
|
||||
.]
|
||||
will not be discussed in this report.
|
||||
93
doc/ego/il/il2
Normal file
@@ -0,0 +1,93 @@
|
||||
.NH 2
|
||||
Parameters and local variables.
|
||||
.PP
|
||||
In the EM calling sequence, the calling procedure
|
||||
pushes its parameters on the stack
|
||||
before doing the CAL.
|
||||
The called routine first saves some
|
||||
status information on the stack and then
|
||||
allocates space for its own locals
|
||||
(also on the stack).
|
||||
Usually, one special purpose register,
|
||||
the Local Base (LB) register,
|
||||
is used to access both the locals and the
|
||||
parameters.
|
||||
If memory is highly segmented,
|
||||
the stack frames of the caller and the callee
|
||||
may be allocated in different fragments;
|
||||
an extra Argument Base (AB) register is used
|
||||
in this case to access the actual parameters.
|
||||
See section 4.2 of
|
||||
.[
|
||||
keizer architecture
|
||||
.]
|
||||
for further details.
|
||||
.PP
|
||||
If a procedure call is expanded in line,
|
||||
there are two problems:
|
||||
.IP 1. 3
|
||||
No stack frame will be allocated for the called procedure;
|
||||
we must find another place to put its locals.
|
||||
.IP 2.
|
||||
The LB register cannot be used to access the actual
|
||||
parameters;
|
||||
as the CAL instruction is deleted, the LB will
|
||||
still point to the local base of the \fIcalling\fR procedure.
|
||||
.LP
|
||||
The local variables of the called procedure will
|
||||
be put in the stack frame of the calling procedure,
|
||||
just after its own locals.
|
||||
The size of the stack frame of the
|
||||
calling procedure will be increased
|
||||
during its entire lifetime.
|
||||
Therefore our model will allow a
|
||||
limit to be set on the number of bytes
|
||||
for locals that the called procedure may have
|
||||
(see next section).
|
||||
.PP
|
||||
There are several alternatives to access the parameters.
|
||||
An actual parameter may be any auxiliary expression,
|
||||
which we will refer to as
|
||||
the \fIactual parameter expression\fR.
|
||||
The value of this expression is stored
|
||||
in a location on the stack (see above),
|
||||
the \fIparameter location\fR.
|
||||
.sp 0
|
||||
The alternatives for accessing parameters are:
|
||||
.IP -
|
||||
save the value of the stackpointer at the point of the CAL
|
||||
in a temporary variable X;
|
||||
this variable can be used to simulate the AB register, i.e.
|
||||
parameter locations are accessed via an offset to
|
||||
the value of X.
|
||||
.IP -
|
||||
create a new temporary local variable T for
|
||||
the parameter (in the stack frame of the caller);
|
||||
every access to the parameter location must be changed
|
||||
into an access to T.
|
||||
.IP -
|
||||
do not evaluate the actual parameter expression before the call;
|
||||
instead, substitute this expression for every use of the
|
||||
parameter location.
|
||||
.LP
|
||||
The first method may be expensive if X is not
|
||||
put in a register.
|
||||
We will not use this method.
|
||||
The time required to evaluate and access the
|
||||
parameters when the second method is used
|
||||
will not differ much from the normal
|
||||
calling sequence (i.e. a call that is not expanded in line).
|
||||
It is not expensive, but there are no
|
||||
extra savings either.
|
||||
The third method is essentially the 'by name'
|
||||
parameter mechanism of Algol60.
|
||||
If the actual parameter is just a numeric constant,
|
||||
it is advantageous to use it.
|
||||
Yet, there are several circumstances
|
||||
under which it cannot or should not be used.
|
||||
We will deal with this in the next section.
|
||||
.sp 0
|
||||
In general we will use the third method,
|
||||
if it is possible and desirable.
|
||||
Such parameters will be called \fIin line parameters\fR.
|
||||
In all other cases we will use the second method.
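.PP
The difference between the second and the third method can be
illustrated at source level.
The following C fragment is only an analogy, as IL works on EM text
rather than on C; all function names are hypothetical.

```c
/* Original situation: the callee uses its parameter x twice. */
static int square(int x) { return x * x; }

int with_call(int a) { return square(a + 1); }

/* Second method: the actual parameter expression is evaluated once
 * and stored in a new temporary local T of the caller. */
int with_temporary(int a)
{
    int t = a + 1;      /* T takes the place of the parameter location */
    return t * t;
}

/* Third method: the actual parameter expression replaces every use
 * of the parameter location (an "in line parameter"). */
int with_inline_parameter(int a)
{
    return (a + 1) * (a + 1);
}
```

All three variants compute the same value, but the third one evaluates
the expression once for every use of the parameter.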
|
||||
164
doc/ego/il/il3
Normal file
@@ -0,0 +1,164 @@
|
||||
.NH 2
|
||||
Feasibility and desirability analysis
|
||||
.PP
|
||||
Feasibility and desirability analysis
|
||||
of in line substitution differ
|
||||
somewhat from most other techniques.
|
||||
Usually, much effort is needed to find
|
||||
a feasible opportunity for optimization
|
||||
(e.g. a redundant subexpression).
|
||||
Desirability analysis then checks
|
||||
if it is really advantageous to do
|
||||
the optimization.
|
||||
For IL, opportunities are easy to find.
|
||||
Deciding whether an in line expansion is
|
||||
desirable will not be hard either.
|
||||
Yet, the main problem is to find the most
|
||||
desirable ones.
|
||||
We will deal with this problem later and
|
||||
we will first attend to feasibility and
|
||||
desirability analysis.
|
||||
.PP
|
||||
There are several reasons why a procedure invocation
|
||||
cannot or should not be expanded in line.
|
||||
.sp
|
||||
A call to a procedure P cannot be expanded in line
|
||||
in any of the following cases:
|
||||
.IP 1. 3
|
||||
The body of P is not available as EM text.
|
||||
Clearly, there is no way to do the substitution.
|
||||
.IP 2.
|
||||
P, or any procedure called by P (transitively),
|
||||
follows the chain of statically enclosing
|
||||
procedures (via a LXL or LXA instruction)
|
||||
or follows the chain of dynamically enclosing
|
||||
procedures (via a DCH).
|
||||
If the call were expanded in line,
|
||||
one level would be removed from the chains,
|
||||
leading to total chaos.
|
||||
This chaos could be solved by patching up
|
||||
every LXL, LXA or DCH in all procedures
|
||||
that could be part of the chains,
|
||||
but this is hard to implement.
|
||||
.IP 3.
|
||||
P, or any procedure called by P (transitively),
|
||||
calls a procedure whose body is not
|
||||
available as EM text.
|
||||
The unknown procedure may use an LXL, LXA or DCH.
|
||||
However, in several languages a separately
|
||||
compiled procedure has no access to the
|
||||
static or dynamic chain.
|
||||
In this case
|
||||
this point does not apply.
|
||||
.IP 4.
|
||||
P, or any procedure called by P (transitively),
|
||||
uses the LPB instruction, which converts a
|
||||
local base to an argument base;
|
||||
as the locals and parameters are stored
|
||||
in a non-standard way (differing from the
|
||||
normal EM calling sequence) this instruction
|
||||
would yield incorrect results.
|
||||
.IP 5.
|
||||
The total number of bytes of the parameters
|
||||
of P is not known.
|
||||
P may be a procedure with a variable number
|
||||
of parameters or may have an array of dynamic size
|
||||
as value parameter.
|
||||
.LP
|
||||
It is undesirable to expand a call to a procedure P in line
|
||||
in any of the following cases:
|
||||
.IP 1. 3
|
||||
P is large, i.e. the number of EM instructions
|
||||
of P exceeds some threshold.
|
||||
The expanded code would be large too.
|
||||
Furthermore, several programs in ACK,
|
||||
including the global optimizer itself,
|
||||
may run out of memory if they have to run
|
||||
in a small address space and are provided
|
||||
very large procedures.
|
||||
The threshold may be set to infinite,
|
||||
in which case this point does not apply.
|
||||
.IP 2.
|
||||
P has many local variables.
|
||||
All these variables would have to be allocated
|
||||
in the stack frame of the calling procedure.
|
||||
.PP
|
||||
If a call may be expanded in line, we have to
|
||||
decide how to access its parameters.
|
||||
In the previous section we stated that we would
|
||||
use in line parameters whenever possible and desirable.
|
||||
There are several reasons why a parameter
|
||||
cannot or should not be expanded in line.
|
||||
.sp
|
||||
No parameter of a procedure P can be expanded in line,
|
||||
in any of the following cases:
|
||||
.IP 1. 3
|
||||
P, or any procedure called by P (transitively),
|
||||
does a store-indirect or a use-indirect (i.e. through
|
||||
a pointer).
|
||||
However, if the front-end has generated messages
|
||||
telling that certain parameters can not be accessed
|
||||
indirectly, those parameters may be expanded in line.
|
||||
.IP 2.
|
||||
P, or any procedure called by P (transitively),
|
||||
calls a procedure whose body is not available as EM text.
|
||||
The unknown procedure may do a store-indirect
|
||||
or a use-indirect.
|
||||
However, the same remark about front-end messages
|
||||
as for 1. holds here.
|
||||
.IP 3.
|
||||
The address of a parameter location is taken (via a LAL).
|
||||
In the normal calling sequence, all parameters
|
||||
are stored sequentially. If the address of one
|
||||
parameter location is taken, the address of any
|
||||
other parameter location can be computed from it.
|
||||
Hence we must put every parameter in a temporary location;
|
||||
furthermore, all these locations must be in
|
||||
the same order as for the normal calling sequence.
|
||||
.IP 4.
|
||||
P has overlapping parameters; for example, it uses
|
||||
the parameter at offset 10 both as a 2 byte and as a 4 byte
|
||||
parameter.
|
||||
Such code may be produced by the front ends if
|
||||
the formal parameter is of some record type
|
||||
with variants.
|
||||
.PP
|
||||
Sometimes a specific parameter must not be expanded in line.
|
||||
.sp 0
|
||||
An actual parameter expression cannot be expanded in line
|
||||
in any of the following cases:
|
||||
.IP 1. 3
|
||||
P stores into the parameter location.
|
||||
Even if the actual parameter expression is a simple
|
||||
variable, it is incorrect to change the 'store into
|
||||
formal' into a 'store into actual', because of
|
||||
the parameter mechanism used.
|
||||
In Pascal, the following expansion is incorrect:
|
||||
.DS
|
||||
procedure p (x:integer);
|
||||
begin
|
||||
x := 20;
|
||||
end;
|
||||
...
|
||||
a := 10; a := 10;
|
||||
p(a); ---> a := 20;
|
||||
write(a); write(a);
|
||||
.DE
|
||||
.IP 2.
|
||||
P changes any of the operands of the
|
||||
actual parameter expression.
|
||||
If the expression is expanded and evaluated
|
||||
after the operand has been changed,
|
||||
the wrong value will be used.
|
||||
.IP 3.
|
||||
The actual parameter expression has side effects.
|
||||
It must be evaluated only once,
|
||||
at the place of the call.
|
||||
.LP
|
||||
It is undesirable to expand an actual parameter in line
|
||||
in the following case:
|
||||
.IP 1. 3
|
||||
The parameter is used more than once
|
||||
(dynamically) and the actual parameter expression
|
||||
is not just a simple variable or constant.
|
||||
.LP
|
||||
132
doc/ego/il/il4
Normal file
@@ -0,0 +1,132 @@
|
||||
.NH 2
|
||||
Heuristic rules
|
||||
.PP
|
||||
Using the information described
|
||||
in the previous section,
|
||||
we can find all calls that can
|
||||
be expanded in line, and for which
|
||||
this expansion is desirable.
|
||||
In general, we cannot expand all these calls,
|
||||
so we have to choose the 'best' ones.
|
||||
With every CAL instruction
|
||||
that may be expanded, we associate
|
||||
a \fIpay off\fR,
|
||||
which expresses how desirable it is
|
||||
to expand this specific CAL.
|
||||
.sp
|
||||
Let Tc denote the portion of EM text involved
|
||||
in a specific call, i.e. the pushing of the actual
|
||||
parameter expressions, the CAL itself,
|
||||
the popping of the parameters and the
|
||||
pushing of the result (if any, via an LFR).
|
||||
Let Te denote the EM text that would be obtained
|
||||
by expanding the call in line.
|
||||
Let Pc be the original program and Pe the program
|
||||
with Te substituted for Tc.
|
||||
The pay off of the CAL depends on two factors:
|
||||
.IP -
|
||||
T = execution_time(Pe) - execution_time(Pc)
|
||||
.IP -
|
||||
S = code_size(Pe) - code_size(Pc)
|
||||
.LP
|
||||
The change in execution time (T) depends on:
|
||||
.IP -
|
||||
T1 = execution_time(Te) - execution_time(Tc)
|
||||
.IP -
|
||||
N = number of times Te or Tc get executed.
|
||||
.LP
|
||||
We assume that T1 will be the same every
|
||||
time the code gets executed.
|
||||
This is a reasonable assumption.
|
||||
(Note that we are talking about one CAL,
|
||||
not about different calls to the same procedure).
|
||||
Hence
|
||||
.DS
|
||||
T = N * T1
|
||||
.DE
|
||||
T1 can be estimated by a careful analysis
|
||||
of the transformations that are performed.
|
||||
Below, we list everything that will be
|
||||
different when a call is expanded in line:
|
||||
.IP -
|
||||
The CAL instruction is not executed.
|
||||
This saves a subroutine jump.
|
||||
.IP -
|
||||
The instructions in the procedure prolog
|
||||
are not executed.
|
||||
These instructions, generated from the PRO pseudo,
|
||||
save some machine registers
|
||||
(including the old LB), set the new LB and allocate space
|
||||
for the locals of the called routine.
|
||||
The savings may be less if there are no
|
||||
locals to allocate.
|
||||
.IP -
|
||||
In line parameters are not evaluated before the call
|
||||
and are not pushed on the stack.
|
||||
.IP -
|
||||
All remaining parameters are stored in local variables,
|
||||
instead of being pushed on the stack.
|
||||
.IP -
|
||||
If the number of parameters is nonzero,
|
||||
the ASP instruction after the CAL is not executed.
|
||||
.IP -
|
||||
Every reference to an in line parameter is
|
||||
substituted by the parameter expression.
|
||||
.IP -
|
||||
RET (return) instructions are replaced by
|
||||
BRA (branch) instructions.
|
||||
If the called procedure 'falls through'
|
||||
(i.e. it has only one RET, at the end of its code),
|
||||
even the BRA is not needed.
|
||||
.IP -
|
||||
The LFR (fetch function result) is not executed.
|
||||
.PP
|
||||
Besides these changes, which are caused directly by IL,
|
||||
other changes may occur as IL influences other optimization
|
||||
techniques, such as Register Allocation and Constant Propagation.
|
||||
Our heuristic rules do not take into account the quite
|
||||
unpredictable effects on Register Allocation.
|
||||
They do, however, favour calls that have numeric \fIconstants\fR
|
||||
as parameter; especially the constant "0" as an inline
|
||||
parameter gets high scores,
|
||||
as further optimizations may often be possible.
|
||||
.PP
|
||||
It cannot be determined statically how often a CAL instruction gets
|
||||
executed.
|
||||
We will use \fIloop nesting\fR information here.
|
||||
The nesting level of the loop in which
|
||||
the CAL appears (if any) will be used as an
|
||||
indication for the number of times it gets executed.
|
||||
.PP
|
||||
Based on all these facts,
|
||||
the pay off of a call will be computed.
|
||||
The following model was developed empirically.
|
||||
Assume procedure P calls procedure Q.
|
||||
The call takes place in basic block B.
|
||||
.DS
|
||||
ZP = # zero parameters
|
||||
CP = # constant parameters - ZP
|
||||
LN = Loop Nesting level (0 if outside any loop)
|
||||
F = \fIif\fR # formal parameters of Q > 0 \fIthen\fR 1 \fIelse\fR 0
|
||||
FT = \fIif\fR Q falls through \fIthen\fR 1 \fIelse\fR 0
|
||||
S = size(Q) - 1 - # inline_parameters - F
|
||||
L = \fIif\fR # local variables of P > 0 \fIthen\fR 0 \fIelse\fR -1
|
||||
A = CP + 2 * ZP
|
||||
N = \fIif\fR LN=0 and P is never called from a loop \fIthen\fR 0 \fIelse\fR (LN+1)**2
|
||||
FM = \fIif\fR B is a firm block \fIthen\fR 2 \fIelse\fR 1
|
||||
|
||||
pay_off = (100/S + FT + F + L + A) * N * FM
|
||||
.DE
|
||||
S stands for the size increase of the program,
|
||||
which is slightly less than the size of Q.
|
||||
The size of a procedure is taken to be its number
|
||||
of (non-pseudo) EM instructions.
|
||||
The terms "loop nesting level" and "firm" were defined
|
||||
in the chapter on the Intermediate Code (section "loop tables").
|
||||
If a call is not inside a loop and the calling procedure
|
||||
is itself never called from a loop (transitively),
|
||||
then the call will probably be executed at most once.
|
||||
Such a call is never expanded in line (its pay off is zero).
|
||||
If the calling procedure doesn't have local variables, a penalty (L)
|
||||
is introduced, as it will most likely get local variables if the
|
||||
call gets expanded.
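.PP
The model above can be written down as a small C function.
This is a minimal sketch, not the code of the optimizer:
the structure and field names are hypothetical,
integer division is assumed for 100/S,
and the guard against S < 1 is an assumption that the text does not state.

```c
/* Facts about one CAL from procedure P to procedure Q (names hypothetical). */
struct call_info {
    int zero_params;             /* actual parameters that are the constant 0 */
    int const_params;            /* all constant actuals, zeros included */
    int inline_params;           /* parameters that will be expanded in line */
    int loop_nesting;            /* LN: 0 if the CAL is outside any loop */
    int callee_size;             /* size(Q) in non-pseudo EM instructions */
    int callee_formals;          /* number of formal parameters of Q */
    int callee_falls_through;    /* nonzero if Q falls through */
    int caller_locals;           /* number of local variables of P */
    int caller_called_from_loop; /* nonzero if P is (transitively) called from a loop */
    int in_firm_block;           /* nonzero if basic block B is firm */
};

long pay_off(const struct call_info *c)
{
    int zp = c->zero_params;
    int cp = c->const_params - zp;
    int f  = c->callee_formals > 0 ? 1 : 0;
    int ft = c->callee_falls_through ? 1 : 0;
    int s  = c->callee_size - 1 - c->inline_params - f;
    int l  = c->caller_locals > 0 ? 0 : -1;
    int a  = cp + 2 * zp;
    int ln = c->loop_nesting;
    int n  = (ln == 0 && !c->caller_called_from_loop) ? 0 : (ln + 1) * (ln + 1);
    int fm = c->in_firm_block ? 2 : 1;

    if (s < 1)
        s = 1;  /* assumption: avoid division by zero for very small Q */
    return (long)(100 / s + ft + f + l + a) * n * fm;
}
```

For a callee of 12 instructions with one formal parameter that is passed
the constant 0 in line, called at loop nesting level 1 from a firm block
of a caller that has locals, this yields (11 + 1 + 1 + 0 + 2) * 4 * 2 = 120.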
|
||||
440
doc/ego/il/il5
Normal file
@@ -0,0 +1,440 @@
|
||||
.NH 2
|
||||
Implementation
|
||||
.PP
|
||||
A major factor in the implementation
|
||||
of Inline Substitution is the requirement
|
||||
not to use an excessive amount of memory.
|
||||
IL essentially analyzes the entire program;
|
||||
it makes decisions based on which procedure calls
|
||||
appear in the whole program.
|
||||
Yet, because of the memory restriction, it is
|
||||
not feasible to read the entire program
|
||||
in main memory.
|
||||
To solve this problem, the IL phase has been
|
||||
split up into three subphases that are executed sequentially:
|
||||
.IP 1.
|
||||
analyze every procedure; see how it accesses its parameters;
|
||||
simultaneously collect all calls
|
||||
appearing in the whole program and put them
|
||||
in a \fIcall-list\fR.
|
||||
.IP 2.
|
||||
use the call-list and decide which calls will be substituted
|
||||
in line.
|
||||
.IP 3.
|
||||
take the decisions of subphase 2 and modify the
|
||||
program accordingly.
|
||||
.LP
|
||||
Subphases 1 and 3 scan the input program; only
|
||||
subphase 3 modifies it.
|
||||
It is essential that the decisions can be made
|
||||
in subphase 2
|
||||
without using the input program,
|
||||
provided that subphase 1 puts enough information
|
||||
in the call-list.
|
||||
Subphase 2 keeps the entire call-list in main memory
|
||||
and repeatedly scans it, to
|
||||
find the next best candidate for expansion.
|
||||
.PP
|
||||
We will specify the
|
||||
data structures used by IL before
|
||||
describing the subphases.
|
||||
.NH 3
|
||||
Data structures
|
||||
.NH 4
|
||||
The procedure table
|
||||
.PP
|
||||
In subphase 1 information is gathered about every procedure
|
||||
and added to the procedure table.
|
||||
This information is used by the heuristic rules.
|
||||
A proctable entry for procedure p has
|
||||
the following extra information:
|
||||
.IP -
|
||||
is it allowed to substitute an invocation of p in line?
|
||||
.IP -
|
||||
is it allowed to put any parameter of such a call in line?
|
||||
.IP -
|
||||
the size of p (number of EM instructions)
|
||||
.IP -
|
||||
does p 'fall through'?
|
||||
.IP -
|
||||
a description of the formal parameters that p accesses; this information
|
||||
is obtained by looking at the code of p. For every parameter f,
|
||||
we record:
|
||||
.RS
|
||||
.IP -
|
||||
the offset of f
|
||||
.IP -
|
||||
the type of f (word, double word, pointer)
|
||||
.IP -
|
||||
may the corresponding actual parameter be put in line?
|
||||
.IP -
|
||||
is f ever accessed indirectly?
|
||||
.IP -
|
||||
is f used: never, once or more than once?
|
||||
.RE
|
||||
.IP -
|
||||
the number of times p is called (see below)
|
||||
.IP -
|
||||
the file address of its call-count information (see below).
|
||||
.LP
|
||||
.NH 4
|
||||
Call-count information
|
||||
.PP
|
||||
As a result of Inline Substitution, some procedures may
|
||||
become useless, because all their invocations have been
|
||||
substituted in line.
|
||||
One of the tasks of IL is to keep track of which
|
||||
procedures are no longer called.
|
||||
Note that IL is especially keen on procedures that are
|
||||
called only once
|
||||
(possibly as a result of expanding all other calls to it).
|
||||
So we want to know how many times a procedure
|
||||
is called \fIduring\fR Inline Substitution.
|
||||
It is not good enough to compute this
|
||||
information afterwards.
|
||||
The task is rather complex, because
|
||||
the number of times a procedure is called
|
||||
varies during the entire process:
|
||||
.IP 1.
|
||||
If a call to p is substituted in line,
|
||||
the number of calls to p gets decremented by 1.
|
||||
.IP 2.
|
||||
If a call to p is substituted in line,
|
||||
and p contains n calls to q, then the number of calls to q
|
||||
gets incremented by n.
|
||||
.IP 3.
|
||||
If a procedure p is removed (because it is no
|
||||
longer called) and p contains n calls to q,
|
||||
then the number of calls to q gets decremented by n.
|
||||
.LP
|
||||
(Note that p may be the same as q, if p is recursive).
|
||||
.sp 0
|
||||
So we actually want to have the following information:
|
||||
.DS
|
||||
NRCALL(p,q) = number of calls to q appearing in p,
|
||||
|
||||
for all procedures p and q that may be put in line.
|
||||
.DE
|
||||
This information, called \fIcall-count information\fR, is
|
||||
computed by the first subphase.
|
||||
It is stored in a file.
|
||||
It is represented as a number of lists, rather than as
|
||||
a (very sparse) matrix.
|
||||
Every procedure has a list of (proc,count) pairs,
|
||||
telling which procedures it calls, and how many times.
|
||||
The file address of its call-count list is stored
|
||||
in its proctable entry.
|
||||
Whenever this information is needed, it is fetched from
|
||||
the file, using direct access.
|
||||
The proctable entry also contains the number of times
|
||||
a procedure is called, at any moment.
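.PP
The three update rules listed above can be sketched in C as follows.
For brevity this sketch keeps NRCALL in a small in-core matrix,
whereas IL stores it as lists in a file;
the names and the fixed number of procedures are hypothetical.
Only the three rules are applied; nothing else is updated.

```c
#define NPROC 4                 /* hypothetical number of procedures */

int nrcall[NPROC][NPROC];       /* nrcall[p][q]: calls to q appearing in p */
int ncalls[NPROC];              /* current number of calls to each procedure */

/* A call to p is substituted in line (rules 1 and 2). */
void call_expanded(int p)
{
    int q;

    ncalls[p]--;                        /* rule 1: one call to p disappears */
    for (q = 0; q < NPROC; q++)
        ncalls[q] += nrcall[p][q];      /* rule 2: calls in p are duplicated */
}

/* Procedure p is removed because it is no longer called (rule 3). */
void proc_removed(int p)
{
    int q;

    for (q = 0; q < NPROC; q++)
        ncalls[q] -= nrcall[p][q];      /* calls in p disappear with it */
}
```

Because q ranges over all procedures, the recursive case (p equal to q)
needs no special treatment.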
|
||||
.NH 4
|
||||
The call-list
|
||||
.PP
|
||||
The call-list is the major data structure used by IL.
|
||||
Every item of the list describes one procedure call.
|
||||
It contains the following attributes:
|
||||
.IP -
|
||||
the calling procedure (caller)
|
||||
.IP -
|
||||
the called procedure (callee)
|
||||
.IP -
|
||||
identification of the CAL instruction (sequence number)
|
||||
.IP -
|
||||
the loop nesting level; our heuristic rules appreciate
|
||||
calls inside a loop (or even inside a loop nested inside
|
||||
another loop, etc.) more than other calls
|
||||
.IP -
|
||||
the actual parameter expressions involved in the call;
|
||||
for every actual, we record:
|
||||
.RS
|
||||
.IP -
|
||||
the EM code of the expression
|
||||
.IP -
|
||||
the number of bytes of its result (size)
|
||||
.IP -
|
||||
an indication if the actual may be put in line
|
||||
.RE
|
||||
.LP
|
||||
The structure of the call-list is rather complex.
|
||||
Whenever a call is expanded in line, new calls
|
||||
will suddenly appear in the program,
|
||||
that were not contained in the original body
|
||||
of the calling subroutine.
|
||||
These calls are inherited from the called procedure.
|
||||
We will refer to these invocations as \fInested calls\fR
|
||||
(see Fig. 5.1).
|
||||
.DS
|
||||
procedure p is
|
||||
begin .
|
||||
a(); .
|
||||
b(); .
|
||||
end;
|
||||
|
||||
procedure r is procedure r is
|
||||
begin begin
|
||||
x(); x();
|
||||
p(); -- in line a(); -- nested call
|
||||
y(); b(); -- nested call
|
||||
end; y();
|
||||
end;
|
||||
|
||||
Fig. 5.1 Example of nested procedure calls
|
||||
.DE
|
||||
Nested calls may subsequently be put in line too
|
||||
(probably resulting in a yet deeper nesting level, etc.).
|
||||
So the call-list does not always reflect the source program,
|
||||
but changes dynamically, as decisions are made.
|
||||
If a call to p is expanded, all calls appearing in p
|
||||
will be added to the call-list.
|
||||
.sp 0
|
||||
A convenient and elegant way to represent
|
||||
the call-list is to use a LISP-like list.
|
||||
.[
|
||||
poel lisp trac
|
||||
.]
|
||||
Calls that appear at the same level
|
||||
are linked in the CDR direction. If a call C
|
||||
to a procedure p is expanded,
|
||||
all calls appearing in p are put in a sub-list
|
||||
of C, i.e. in its CAR.
|
||||
In the example above, before the decision
|
||||
to expand the call to p is made, the
|
||||
call-list of procedure r looks like:
|
||||
.DS
|
||||
(call-to-x, call-to-p, call-to-y)
|
||||
.DE
|
||||
After the decision, it looks like:
|
||||
.DS
|
||||
(call-to-x, (call-to-p*, call-to-a, call-to-b), call-to-y)
|
||||
.DE
|
||||
The call to p is marked, because it has been
|
||||
substituted.
|
||||
Whenever IL wants to traverse the call-list of some procedure,
|
||||
it uses the well-known LISP technique of
|
||||
recursion in the CAR direction and
|
||||
iteration in the CDR direction
|
||||
(see page 1.19-2 of
|
||||
.[
|
||||
poel lisp trac
|
||||
.]
|
||||
).
|
||||
All list traversals look like:
|
||||
.DS
|
||||
traverse(list)
|
||||
{
|
||||
for (c = first(list); c != 0; c = CDR(c)) {
|
||||
if (c is marked) {
|
||||
traverse(CAR(c));
|
||||
} else {
|
||||
do something with c
|
||||
}
|
||||
}
|
||||
}
|
||||
.DE
|
||||
The entire call-list consists of a number of LISP-like lists,
|
||||
one for every procedure.
|
||||
The proctable entry of a procedure contains a pointer
|
||||
to the beginning of the list.
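.PP
As an illustration, the traversal scheme can be made concrete in C.
The structure below is a hypothetical, stripped-down call-list item
(the real items carry the attributes listed earlier);
the function counts the calls that have not been substituted.

```c
#include <stddef.h>

struct call {
    const char  *callee;    /* name of the called procedure */
    int          marked;    /* nonzero if this call has been substituted */
    struct call *car;       /* sub-list of nested calls (used if marked) */
    struct call *cdr;       /* next call at the same level */
};

/* Recursion in the CAR direction, iteration in the CDR direction. */
int count_unexpanded(const struct call *list)
{
    const struct call *c;
    int n = 0;

    for (c = list; c != NULL; c = c->cdr) {
        if (c->marked)
            n += count_unexpanded(c->car);  /* descend into nested calls */
        else
            n++;                            /* "do something with c" */
    }
    return n;
}
```

For procedure r of Fig. 5.1 this counts three calls before the decision
to expand the call to p, and four calls (x, a, b and y) afterwards.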
|
||||
.NH 3
|
||||
The first subphase: procedure analysis
|
||||
.PP
|
||||
The tasks of the first subphase are to determine
|
||||
several attributes of every procedure
|
||||
and to construct the basic call-list,
|
||||
i.e. without nested calls.
|
||||
The size of a procedure is determined
|
||||
by simply counting its EM instructions.
|
||||
Pseudo instructions are skipped.
|
||||
A procedure does not 'fall through' if its CFG
|
||||
contains a basic block
|
||||
that is not the last block of the CFG and
|
||||
that ends on a RET instruction.
|
||||
The formal parameters of a procedure are determined
|
||||
by inspection of
|
||||
its code.
|
||||
.PP
|
||||
The call-list is constructed by looking at all CAL instructions
|
||||
appearing in the program.
|
||||
The call-list should only contain calls to procedures
|
||||
that may be put in line.
|
||||
This fact is only known if the procedure was
|
||||
analyzed earlier.
|
||||
If a call to a procedure p appears in the program
|
||||
before the body of p,
|
||||
the call will always be put in the call-list.
|
||||
If p is later found to be unsuitable,
|
||||
the call will be removed from the list by the
|
||||
second subphase.
|
||||
.PP
|
||||
An important issue is the recognition
|
||||
of the actual parameter expressions of the call.
|
||||
The front ends produce messages telling how many
|
||||
bytes of formal parameters every procedure accesses.
|
||||
(If there is no such message for a procedure, it
|
||||
cannot be put in line).
|
||||
The actual parameters together must account for
|
||||
the same number of bytes.
A recursive descent parser is used
|
||||
to parse side-effect free EM expressions.
|
||||
It uses a table and some
|
||||
auxiliary routines to determine
|
||||
how many bytes every EM instruction pops from the stack
|
||||
and how many bytes it pushes onto the stack.
|
||||
These numbers depend on the EM instruction, its argument,
|
||||
and the wordsize and pointersize of the target machine.
|
||||
Initially, the parser has to recognize the
|
||||
number of bytes specified in the formals-message,
|
||||
say N.
|
||||
Assume the first instruction before the CAL pops S bytes
|
||||
and pushes R bytes.
|
||||
If R > N, too many bytes are recognized
|
||||
and the parser fails.
|
||||
Else, it calls itself recursively to recognize the
|
||||
S bytes used as operand of the instruction.
|
||||
If it succeeds in doing so, it continues with the next instruction,
|
||||
i.e. the first instruction before the code recognized by
|
||||
the recursive call, to recognize N-R more bytes.
|
||||
The result is a number of EM instructions that collectively push N bytes.
|
||||
If the parser encounters an instruction that has side-effects
|
||||
(e.g. a store or a procedure call) or of which R and S cannot
|
||||
be computed statically (e.g. a LOS), it fails.
|
||||
.sp 0
|
||||
Note that the parser traverses the code backwards.
|
||||
As EM code is essentially postfix code, the parser works top down.
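.PP
The backward recognition can be sketched in C as follows.
This is a simplified model under stated assumptions:
every instruction is reduced to a hypothetical record giving the number
of bytes it pops (S) and pushes (R), with -1 meaning "not known statically",
and the table lookup that depends on the argument, wordsize and
pointersize is left out.

```c
struct instr {
    int pops;           /* S: bytes popped, -1 if not statically known */
    int pushes;         /* R: bytes pushed, -1 if not statically known */
    int side_effect;    /* nonzero for e.g. a store or a procedure call */
};

/*
 * Recognize instructions that collectively push n bytes, scanning
 * backwards from code[last].  Returns the index of the first (lowest)
 * instruction recognized, last + 1 if n is 0, or -1 on failure.
 */
int recognize(const struct instr *code, int last, int n)
{
    while (n > 0) {
        const struct instr *i;
        int first;

        if (last < 0)
            return -1;                  /* ran out of code */
        i = &code[last];
        if (i->side_effect || i->pops < 0 || i->pushes < 0)
            return -1;                  /* not a pure, static instruction */
        if (i->pushes > n)
            return -1;                  /* R > N: too many bytes */
        first = recognize(code, last - 1, i->pops);  /* the S operand bytes */
        if (first < 0)
            return -1;
        n -= i->pushes;                 /* N-R more bytes to recognize */
        last = first - 1;               /* continue before the operands */
    }
    return last + 1;
}
```

For the four instructions "load, load, add, load" (each load pushing 2 bytes,
the add popping 4 and pushing 2) and N = 4, the parser recognizes all
four instructions; it fails for N = 6.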
.PP
If the parser fails to recognize the parameters, the call will not
be substituted in line.
If the parameters can be determined, they still have to
match the formal parameters of the called procedure.
This check is performed by the second subphase; it cannot be
done here, because it is possible that the called
procedure has not been analyzed yet.
.PP
The entire call-list is written to a file,
to be processed by the second subphase.
.NH 3
The second subphase: making decisions
.PP
The task of the second subphase is quite easy
to understand.
It reads the call-list file,
builds an incore call-list and deletes every
call that may not be expanded in line (either because the called
procedure may not be put in line, or because the actual parameters
of the call do not match the formal parameters of the called procedure).
It assigns a \fIpay-off\fR to every call,
indicating how desirable it is to expand it.
.PP
The subphase repeatedly scans the call-list and takes
the call with the highest ratio.
The chosen one gets marked,
and the call-list is extended with the nested calls,
as described above.
These nested calls are also assigned a ratio,
and will be considered too during the next scans.
.sp 0
After every decision the number of times
every procedure is called is updated, using
the call-count information.
Meanwhile, the subphase keeps track of the amount of space left
available.
If all space is used, or if there are no more calls left to
be expanded, it exits this loop.
Finally, calls to procedures that are called only
once are also chosen.
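.PP
The selection loop can be pictured with the following sketch.
It is not the code of IL.
The dictionary fields, the function names and the rule that a call
is only eligible if its size fits in the remaining space are
assumptions made for the example;
the call-count update and the final pass over procedures
called only once are left out.

```python
def select_calls(calls, space, nested=lambda call: []):
    """Greedily choose calls to expand within a space budget.

    calls  -- list of dicts with hypothetical keys 'name', 'ratio', 'size'
    space  -- number of bytes the program is allowed to grow
    nested -- function giving the nested calls exposed by expanding a call
    """
    candidates = list(calls)
    chosen = []
    while candidates:
        # Assumption: only calls that still fit are eligible.
        affordable = [c for c in candidates if c["size"] <= space]
        if not affordable:
            break                        # all usable space is gone
        best = max(affordable, key=lambda c: c["ratio"])
        candidates.remove(best)
        chosen.append(best)              # mark the chosen call
        space -= best["size"]
        # The nested calls take part in the next scans.
        candidates.extend(nested(best))
    return chosen


calls = [
    {"name": "a", "ratio": 3.0, "size": 10},
    {"name": "b", "ratio": 5.0, "size": 30},
    {"name": "c", "ratio": 1.0, "size": 5},
]
picked = [c["name"] for c in select_calls(calls, 40)]
```

With a budget of 40 bytes the sketch picks b first, then a,
after which c no longer fits.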
.PP
The actual parameters of a call are only needed by
this subphase to assign a ratio to a call.
To save some space, these actuals are not kept in main memory.
They are removed after the call has been read and a ratio
has been assigned to it.
So this subphase works with \fIabstracts\fR of calls.
After all work has been done,
the actual parameters of the chosen calls are retrieved
from a file,
as they are needed by the transformation subphase.
.NH 3
The third subphase: doing transformations
.PP
The third subphase makes the actual modifications to
the EM text.
It is directed by the decisions made in the previous subphase,
as expressed via the call-list.
The call-list read by this subphase contains
only calls that were selected for expansion.
The list is ordered in the same way as the EM text,
i.e. if a call C1 appears before a call C2 in the call-list,
C1 also appears before C2 in the EM text.
So the EM text is traversed linearly,
the calls that have to be substituted are determined
and the modifications are made.
If a procedure is encountered that is no longer needed,
it is simply not written to the output EM file.
The substitution of a call takes place in distinct steps:
.IP "change the calling sequence" 7
.sp 0
The actual parameter expressions are changed.
Parameters that are put in line are removed.
All remaining ones must store their result in a
temporary local variable, rather than
push it on the stack.
The CAL instruction and any ASP (to pop actual parameters)
or LFR (to fetch the result of a function)
are deleted.
.IP "fetch the text of the called procedure"
.sp 0
Direct disk access is used to read the text of the
called procedure.
The file offset is obtained from the proctable entry.
.IP "allocate bytes for locals and temporaries"
.sp 0
The local variables of the called procedure will be put in the
stack frame of the calling procedure.
The same applies to any temporary variables
that hold the result of parameters
that were not put in line.
The proctable entry of the caller is updated.
.IP "put a label after the CAL"
.sp 0
If the called procedure contains a RET (return) instruction
somewhere in the middle of its text (i.e. it does
not fall through), the RET must be changed into
a BRA (branch) to this label, to jump over the
remainder of the text.
The label is not needed if the called
procedure falls through.
.IP "copy the text of the called procedure and modify it"
.sp 0
References to local variables of the called routine
and to parameters that are not put in line
are changed to refer to the
new local of the caller.
References to in line parameters are replaced
by the actual parameter expression.
Returns (RETs) are either deleted or
replaced by a BRA.
Messages containing information about local
variables or parameters are changed.
Global data declarations and the PRO and END pseudos
are removed.
Instruction labels and references to them are
changed to make sure they do not have the
same identifying number as
labels in the calling procedure.
.IP "insert the modified text"
.sp 0
The pseudos of the called procedure are put after the pseudos
of the calling procedure.
The real text of the callee is put at
the place where the CAL was.
.IP "take care of nested substitutions"
.sp 0
The expanded procedure may contain calls that
have to be expanded too (nested calls).
If the descriptor of this call contains actual
parameter expressions,
the code of the expressions has to be changed
the same way as the code of the callee was changed.
Next, the entire process of finding CALs and doing
the substitutions is repeated recursively.
.LP
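.PP
Part of the step that copies and modifies the text of the callee
can be sketched as follows.
This is an illustration, not the code of IL.
The tuple representation of instructions, the LABEL pseudo-mnemonic,
the choice of branch and local-variable instructions,
and the way local offsets are shifted below the locals of the caller
are assumptions made for the example.
In line parameters, messages and function results are not handled.

```python
BRANCHES = {"BRA", "BEQ", "BNE", "ZEQ", "ZNE"}   # assumed subset
LOCAL_OPS = {"LOL", "STL"}                        # assumed subset


def inline_body(body, label_base, exit_label, caller_locals):
    """Rewrite the text of a callee so it can replace a CAL.

    body          -- list of (mnemonic, argument) tuples (hypothetical form)
    label_base    -- added to every label number to avoid clashes
    exit_label    -- label placed after the inserted text; the caller of
                     this function must ensure it is unused
    caller_locals -- bytes of locals already in the frame of the caller
    """
    # The PRO and END pseudos are removed.
    text = [ins for ins in body if ins[0] not in ("PRO", "END")]
    out = []
    needs_exit = False
    for i, (op, arg) in enumerate(text):
        if op == "RET":
            if i < len(text) - 1:
                # A RET in the middle becomes a branch over the remainder.
                out.append(("BRA", exit_label))
                needs_exit = True
            continue                     # a final RET just falls through
        if op == "LABEL" or op in BRANCHES:
            arg += label_base            # renumber labels and references
        elif op in LOCAL_OPS and arg < 0:
            arg -= caller_locals         # callee locals go below the caller's
        out.append((op, arg))
    if needs_exit:
        out.append(("LABEL", exit_label))
    return out


callee = [
    ("PRO", None), ("LOL", -2), ("ZEQ", 1), ("LOC", 1), ("RET", 2),
    ("LABEL", 1), ("LOC", 0), ("RET", 2), ("END", None),
]
expanded = inline_body(callee, label_base=100, exit_label=99, caller_locals=4)
```

The exit label is only emitted when some RET was turned into a BRA,
matching the remark that no label is needed if the callee falls through.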
27
doc/ego/il/il6
Normal file
@@ -0,0 +1,27 @@
.NH 2
Source files of IL
.PP
The sources of IL are in the following files
and packages (the prefixes 1_, 2_ and 3_ refer to the three subphases):
.IP il.h: 14
declarations of global variables and
data structures
.IP il.c:
the routine main; the driving routines of the three subphases
.IP 1_anal:
contains a subroutine that analyzes a procedure
.IP 1_cal:
contains a subroutine that analyzes a call
.IP 1_aux:
implements auxiliary procedures used by subphase 1
.IP 2_aux:
implements auxiliary procedures used by subphase 2
.IP 3_subst:
the driving routine for doing the substitution
.IP 3_change:
lower level routines that do certain modifications
.IP 3_aux:
implements auxiliary procedures used by subphase 3
.IP aux:
implements auxiliary procedures used by several subphases.
.LP