.bp
.NH 1
Overview of the global optimizer
.NH 2
The ACK compilation process
.PP
The EM Global Optimizer is one of three optimizers that are
part of the Amsterdam Compiler Kit (ACK).
The phases of ACK are:
.IP 1.
A Front End translates a source program to EM
.IP 2.
The Peephole Optimizer
.[
tanenbaum staveren peephole toplass
.]
reads EM code and produces 'better' EM code.
It performs a number of optimizations (mostly peephole
optimizations)
such as constant folding, strength reduction and unreachable code
elimination.
.IP 3.
The Global Optimizer further improves the EM code.
.IP 4.
The Code Generator transforms EM to assembly code
of the target computer.
.IP 5.
The Target Optimizer improves the assembly code.
.IP 6.
An Assembler/Loader generates an executable file.
.LP
For a more extensive overview of the ACK compilation process,
we refer to.
.[
tanenbaum toolkit rapport
.]
.[
tanenbaum toolkit cacm
.]
.PP
The input of the Global Optimizer may consist of files and
libraries.
Every file or module in the library must contain EM code in
Compact Assembly Language format.
.[~[
tanenbaum machine architecture
.], section 11.2]
The output consists of one such EM file.
The input files and libraries together need not
constitute an entire program,
although as much of the program as possible should be supplied.
The more information about the program the optimizer
gets, the better its output code will be.
.PP
The Global Optimizer is language- and machine-independent,
i.e. it can be used for all languages and machines supported by ACK.
Yet, it puts some unavoidable restrictions on the EM code
produced by the Front End (see below).
It must have some knowledge of the target machine.
This knowledge is expressed in a machine description table,
which is passed as an argument to the optimizer.
This table does not contain very detailed information about the
target (such as its instruction set and addressing modes).
.NH 2
The EM code
.PP
The definition of EM, the intermediate code of all ACK compilers,
is given in a separate document.
.[
tanenbaum machine architecture
.]
We will only discuss some features of EM that are most relevant
to the Global Optimizer.
.PP
EM is the assembly code of a virtual \fIstack machine\fR.
All operations are performed on the top of the stack.
For example, the statement "A := B + 3" may be expressed in EM as:
.DS
LOL -4    -- push local variable B
LOC 3     -- push constant 3
ADI 2     -- add two 2-byte items on top of
          -- the stack and push the result
STL -2    -- pop the result into A
.DE
So EM is essentially a \fIpostfix\fR code.
.PP
EM has a rich instruction set, containing several arithmetic
and logical operators.
It also contains special-case instructions (such as INCrement).
.PP
EM has \fIglobal\fR (\fIexternal\fR) variables, accessible
by all procedures, and \fIlocal\fR variables, accessible by a few
(nested) procedures.
The local variables of a lexically enclosing procedure may
be accessed via a \fIstatic link\fR;
EM has instructions to follow the static chain.
There are also EM instructions that allow a procedure
to access its own local variables directly (such as LOL and STL above).
Local variables are referenced via an offset in the stack frame
of the procedure, rather than by their names (e.g. -2 and -4 above).
The EM code does not contain the (source-language) types
of the variables.
.PP
All structured statements in the source program are expressed in
low-level jump instructions.
Besides conditional and unconditional branch instructions, there are
two case instructions (CSA and CSB)
that allow efficient translation of case statements.
.NH 2
Requirements on the EM input
.PP
As the optimizer should be useful for all languages,
it clearly should not put severe restrictions on the EM code
of the input.
There is, however, one immovable requirement:
it must be possible to determine the \fIflow of control\fR of the
input program.
As virtually all global optimizations are based on control flow information,
the optimizer would be powerless without it.
For this reason we restrict the usage of the case jump instructions (CSA/CSB)
of EM.
Such an instruction is always called with the address of a case descriptor
on top of the stack.
.[~[
tanenbaum machine architecture
.] section 7.4]
This descriptor contains the labels of all possible
destinations of the jump.
We demand that all case descriptors are allocated in a global
data fragment of type ROM, i.e. the case descriptors
may not be modifiable.
Furthermore, any case instruction should be immediately preceded by
a LAE (Load Address External) instruction that loads the
address of the descriptor,
so the descriptor can be uniquely identified.
.PP
The optimizer will work improperly if the program disguises its flow of control.
We give two examples of how this can happen.
.PP
In "C" the notorious library routines "setjmp" and "longjmp"
.[
unix programmer's manual
.]
may be used to jump out of a procedure,
but can also be used for a number of other dubious purposes,
for example, to create an extra entry point in a loop.
.DS
while (condition) {
	....
	setjmp(buf);
	...
}
...
longjmp(buf, 1);
.DE
A call to longjmp is in effect a jump to the location of
the last call to setjmp with the same argument (buf).
As the calls to setjmp and longjmp are indistinguishable from
normal procedure calls, the optimizer will not see the danger.
Needless to say, several loop optimizations will behave
unexpectedly when presented with such pathological input.
.PP
Another way to obscure the flow of control is
to use exception handling routines.
Ada*
.FS
* Ada is a registered trademark of the U.S. Government
(Ada Joint Program Office).
.FE
has clearly recognized the dangers of exception handling,
but other languages (such as PL/I) have not.
.[
ada rationale
.]
.PP
The optimizer will be more effective if the EM input contains
some extra information about the source program.
The \fIregister messages\fR are especially important.
These messages indicate which local variables can never be
accessed indirectly.
Most optimizations benefit significantly from this information.
.PP
The Inline Substitution technique needs to know how many bytes
of formal parameters every procedure accesses.
Only calls to procedures for which the EM code contains this information
will be substituted in line.
.NH 2
Structure of the optimizer
.PP
The Global Optimizer is organized as a number of \fIphases\fR,
each performing a specific task.
The main structure is as follows:
.IP IC 6
the Intermediate Code construction phase transforms EM into the
intermediate code (ic) of the optimizer
.IP CF
the Control Flow phase extends the ic with control flow
information and interprocedural information
.IP OPTs
zero or more optimization phases, each one performing one or
more related optimizations
.IP CA
the Compact Assembly phase generates Compact Assembly Language EM code
from the ic.
.LP
.PP
An important issue in the design of a global optimizer is the
interaction between optimization techniques.
It is often advantageous to combine several techniques in
one algorithm that takes into account all interactions between them.
Ideally, one single algorithm should be developed that does
all optimizations simultaneously and deals with all possible interactions.
In practice, such an algorithm is still far out of reach.
Instead, some rather ad hoc (albeit important) combinations are chosen,
such as Common Subexpression Elimination and Register Allocation.
.[
prabhala sethi common subexpressions
.]
.[
sethi ullman optimal code
.]
.PP
In the EM Global Optimizer there is one separate algorithm for
every technique.
Note that this does not mean that all techniques are independent
of each other.
.PP
In principle, the optimization phases can be run in any order;
a phase may even be run more than once.
However, the following rules should be obeyed:
.IP -
the Live Variable analysis phase (LV) must be run prior to
Register Allocation (RA), as RA uses information produced by LV.
.IP -
RA should be the last phase; this is a consequence of the way
the interface between RA and the Code Generator is defined.
.LP
The ordering of the phases has a significant impact on
the quality of the produced code.
In
.[
wulf overview production quality carnegie-mellon
.]
two kinds of phase ordering problems are distinguished.
If two techniques A and B both take away opportunities from each other,
there is a "negative" ordering problem.
If, on the other hand, both A and B introduce new optimization
opportunities for each other, the problem is called "positive".
In the Global Optimizer the following interactions must be
taken into account:
.IP -
Inline Substitution (IL) may create new opportunities for most
other techniques, so it should be run as early as possible.
.IP -
Use Definition analysis (UD) may introduce opportunities for LV.
.IP -
Strength Reduction may create opportunities for UD.
.LP
The optimizer has a default phase ordering, which can
be changed by the user.
.NH 2
Structure of this document
.PP
The remaining chapters of this document each describe one
phase of the optimizer.
For every phase, we describe its task, its design,
its implementation, and its source files.
The latter two sections are intended to aid
maintenance of the optimizer and
can be skipped on first reading.
.NH 2
References
.PP
There are very
few modern textbooks on optimization.
Chapters 12, 13, and 14 of
.[
aho compiler design
.]
are a good introduction to the subject.
Wulf et al.
.[
wulf optimizing compiler
.]
describe one specific optimizing (Bliss) compiler.
Anklam et al.
.[
anklam vax-11
.]
discuss code generation and optimization in
compilers for one specific machine (a VAX-11).
Kirchgaesner et al.
.[
optimizing ada compiler
.]
present a brief description of many
optimizations; the report also contains a lengthy (over 60 pages)
bibliography.
.PP
The number of articles on optimization is quite impressive.
The Lowry and Medlock paper on the Fortran H compiler
.[
object code optimization
.]
is a classic one.
Other papers on global optimization are.
.[
faiman optimizing pascal
.]
.[
perkins sites
.]
.[
harrison general purpose optimizing
.]
.[
morel partial redundancies
.]
.[
Mintz global optimizer
.]
Freudenberger
.[
freudenberger setl optimizer
.]
describes an optimizer for a Very High Level Language (SETL).
The Production-Quality Compiler-Compiler (PQCC) project uses
very sophisticated compiler techniques, as described in.
.[
wulf overview ieee
.]
.[
wulf overview carnegie-mellon
.]
.[
wulf machine-relative
.]
.PP
Several Ph.D. theses are dedicated to optimization.
Davidson
.[
davidson simplifying
.]
outlines a machine-independent peephole optimizer that
improves assembly code.
Katkus
.[
katkus
.]
describes how efficient programs can be obtained at little cost by
optimizing only a small part of a program.
Photopoulos
.[
photopoulos mixed code
.]
discusses the idea of generating interpreted intermediate code as well
as assembly code, to obtain programs that are both small and fast.
Shaffer
.[
shaffer automatic
.]
describes the theory of automatic subroutine generation.
Leverett
.[
leverett register allocation compilers
.]
deals with register allocation in the PQCC compilers.
.PP
References to articles about specific optimization techniques
will be given in later chapters.