Initial revision

This commit is contained in:
ceriel
1987-03-03 10:59:52 +00:00
parent 4d4c8b45fb
commit 004f017550
30 changed files with 3903 additions and 0 deletions

58
doc/ego/ud/ud1 Normal file

@@ -0,0 +1,58 @@
.bp
.NH 1
Use-Definition analysis
.NH 2
Introduction
.PP
The "Use-Definition analysis" phase (UD) consists of two related optimization
techniques that both depend on "Use-Definition" information.
The techniques are Copy Propagation and Constant Propagation.
They are best explained via an example (see Figs. 11.1 and 11.2).
.DS
(1)  A := B              A := B
     ...         -->     ...
(2)  use(A)              use(B)
Fig. 11.1 An example of Copy Propagation
.DE
.DS
(1)  A := 12             A := 12
     ...         -->     ...
(2)  use(A)              use(12)
Fig. 11.2 An example of Constant Propagation
.DE
Both optimizations have to check that the value of A at line (2)
can only have been assigned at line (1).
Copy Propagation also has to ensure that the value of B is
the same at line (2) as at line (1).
.PP
One purpose of both transformations is to introduce
opportunities for the Dead Code Elimination optimization.
If the variable A is used nowhere else, the assignment A := B
becomes useless and can be eliminated.
.sp 0
If B is less expensive to access than A (e.g. this is sometimes the case
if A is a local variable and B is a global variable),
Copy Propagation directly improves the code itself.
If A is cheaper to access, the transformation will not be performed.
Likewise, a constant as operand may be cheaper than a variable.
Having a constant as operand may also facilitate other optimizations.
.PP
The design of UD is based on the theory described in sections
14.1 and 14.3 of.
.[
aho compiler design
.]
As a main departure from that theory,
we do not demand the statement A := B to become redundant after
Copy Propagation.
If B is cheaper to access than A, the optimization is always performed;
if B is more expensive than A, we never do the transformation.
If A and B are equally expensive, UD applies the heuristic rule of
replacing infrequently used variables by frequently used ones.
This rule increases the chance that the assignment becomes useless.
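.PP
The decision rule above can be sketched as follows.
This is only an illustration, not the code of UD:
the function name, the numeric cost arguments and the use counts are
assumptions, and the text does not say what happens when both
variables are used equally often (the sketch then leaves the use alone).
.DS
```python
def should_propagate(cost_a, cost_b, uses_a, uses_b):
    """Decide whether a use of A may be rewritten to B after the copy A := B.

    cost_a, cost_b: hypothetical access costs of A and B.
    uses_a, uses_b: hypothetical use counts of A and B.
    """
    if cost_b < cost_a:
        return True       # B is cheaper to access: always propagate
    if cost_b > cost_a:
        return False      # B is more expensive: never propagate
    # Equal cost: replace the infrequently used variable by the
    # frequently used one.  Ties are not covered by the text; this
    # sketch declines to propagate in that case (an assumption).
    return uses_b > uses_a
```
.DE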
.PP
In the next section we will give a brief outline of the data
flow theory used
for the implementation of UD.

64
doc/ego/ud/ud2 Normal file

@@ -0,0 +1,64 @@
.NH 2
Data flow information
.PP
.NH 3
Use-Definition information
.PP
A \fIdefinition\fR of a variable A is an assignment to A.
A definition is said to \fIreach\fR a point p if there is a
path in the control flow graph from the definition to p, such that
A is not redefined on that path.
.PP
For every basic block b, we define the following sets:
.IP GEN[b] 9
the set of definitions in b that reach the end of b.
.IP KILL[b]
the set of definitions outside b that define a variable that
is changed in b.
.IP IN[b]
the set of all definitions reaching the beginning of b.
.IP OUT[b]
the set of all definitions reaching the end of b.
.LP
GEN and KILL can be determined by inspecting the code of the procedure.
IN and OUT are computed by solving the following data flow equations
('+' denotes set union, '-' denotes set difference):
.DS
(1) OUT[b] = IN[b] - KILL[b] + GEN[b]
(2) IN[b] = OUT[p1] + ... + OUT[pn],
where PRED(b) = {p1, ... , pn}
.DE
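.PP
As an illustration, equations (1) and (2) can be solved by repeating
them until no set changes.
The sketch below uses ordinary Python sets rather than bit vectors;
the function name and the small flow graph (blocks b1, b2, b3 and
definitions d1, d2, d3) are hypothetical.
.DS
```python
def reaching_definitions(blocks, pred, gen, kill):
    """Iterate OUT[b] = IN[b] - KILL[b] + GEN[b] and
    IN[b] = OUT[p1] + ... + OUT[pn] until nothing changes."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set()
            for p in pred[b]:
                new_in |= out_sets[p]
            new_out = (new_in - kill[b]) | gen[b]
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets, out_sets


# Hypothetical flow graph: b1 -> b2 -> b3, with a back edge b3 -> b2.
# d1 (in b1) and d3 (in b3) define the same variable; d2 defines another.
blocks = ["b1", "b2", "b3"]
pred = {"b1": [], "b2": ["b1", "b3"], "b3": ["b2"]}
gen = {"b1": {"d1", "d2"}, "b2": set(), "b3": {"d3"}}
kill = {"b1": {"d3"}, "b2": set(), "b3": {"d1"}}
IN, OUT = reaching_definitions(blocks, pred, gen, kill)
```
.DE
.LP
In this example all three definitions reach the beginning of b2,
because of the back edge, while d1 no longer reaches the end of b3.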
.NH 3
Copy information
.PP
A \fIcopy\fR is a definition of the form "A := B".
A copy is said to be \fIgenerated\fR in a basic block n if
it occurs in n and there is no subsequent assignment to B in n.
A copy is said to be \fIkilled\fR in n if:
.IP (i)
it occurs in n and there is a subsequent assignment to B within n, or
.IP (ii)
it occurs outside n, the definition A := B reaches the beginning of n
and B is changed in n (note that a copy also is a definition).
.LP
A copy A := B \fIreaches\fR a point p if there is no assignment to B
on any path in the control flow graph from the copy to p.
.PP
We define the following sets:
.IP C_GEN[b] 11
the set of all copies in b generated in b.
.IP C_KILL[b]
the set of all copies killed in b.
.IP C_IN[b]
the set of all copies reaching the beginning of b.
.IP C_OUT[b]
the set of all copies reaching the end of b.
.LP
C_IN and C_OUT are computed by solving the following equations:
(root is the entry node of the current procedure; '*' denotes
set intersection)
.DS
(1) C_OUT[b] = C_IN[b] - C_KILL[b] + C_GEN[b]
(2) C_IN[b] = C_OUT[p1] * ... * C_OUT[pn],
where PRED(b) = {p1, ... , pn} and b /= root
C_IN[root] = {all copies}
.DE
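.PP
The copy equations can be solved in the same iterative way, with
intersection taking the place of union.
The following is a sketch under stated assumptions:
the function name and the diamond-shaped flow graph are hypothetical,
every set starts at "all copies" so that the intersection can only shrink,
and the boundary value for root is taken as written in the equations above.
.DS
```python
def solve_copies(blocks, root, pred, c_gen, c_kill, all_copies):
    """Iterate C_OUT[b] = C_IN[b] - C_KILL[b] + C_GEN[b] and
    C_IN[b] = C_OUT[p1] * ... * C_OUT[pn] for b /= root."""
    c_in = {b: set(all_copies) for b in blocks}
    c_out = {b: (c_in[b] - c_kill[b]) | c_gen[b] for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == root:
                new_in = set(all_copies)      # boundary value from the text
            else:
                new_in = set(all_copies)
                for p in pred[b]:
                    new_in &= c_out[p]        # '*' is set intersection
            new_out = (new_in - c_kill[b]) | c_gen[b]
            if new_in != c_in[b] or new_out != c_out[b]:
                c_in[b], c_out[b] = new_in, new_out
                changed = True
    return c_in, c_out


# Hypothetical diamond: root -> left, root -> right, both -> join.
# Copies c1 and c2 are generated in root; left changes the source of c1.
blocks = ["root", "left", "right", "join"]
pred = {"root": [], "left": ["root"], "right": ["root"],
        "join": ["left", "right"]}
c_gen = {"root": {"c1", "c2"}, "left": set(), "right": set(), "join": set()}
c_kill = {"root": set(), "left": {"c1"}, "right": set(), "join": set()}
C_IN, C_OUT = solve_copies(blocks, "root", pred, c_gen, c_kill, {"c1", "c2"})
```
.DE
.LP
Because c1 is killed on one of the two paths, only c2 reaches the
beginning of join.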

26
doc/ego/ud/ud3 Normal file

@@ -0,0 +1,26 @@
.NH 2
Pointers and subroutine calls
.PP
The theory outlined above assumes that variables can
only be changed by a direct assignment.
This condition does not hold for EM.
In case of an assignment through a pointer variable,
it is in general impossible to see which variable is affected
by the assignment.
Similar problems occur in the presence of procedure calls.
Therefore we distinguish two kinds of definitions:
.IP -
an \fIexplicit\fR definition is a direct assignment to one
specific variable
.IP -
an \fIimplicit\fR definition is the potential alteration of
a variable as a result of a procedure call or an indirect assignment.
.LP
An indirect assignment causes implicit definitions to
all variables that may be accessed indirectly, i.e.
all local variables for which no register message was generated
and all global variables.
If a procedure contains an indirect assignment, a call to it may change the
same set of variables; otherwise the call may change some global variables directly.
The KILL, GEN, IN and OUT sets contain explicit as well
as implicit definitions.
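.PP
The rules for implicit definitions can be sketched as follows.
All names are hypothetical; in particular, the per-procedure summary
(whether the callee contains an indirect assignment and which globals
it changes directly) is assumed to be available, and how it is
obtained is not shown.
.DS
```python
def indirectly_accessible(local_vars, global_vars, has_register_message):
    """Variables implicitly defined by an indirect assignment: all
    globals plus every local without a register message."""
    targets = set(global_vars)
    for v in local_vars:
        if not has_register_message[v]:
            targets.add(v)
    return targets


def changed_by_call(callee_has_indirect, callee_changed_globals,
                    local_vars, global_vars, has_register_message):
    """Variables implicitly defined by a procedure call."""
    if callee_has_indirect:
        return indirectly_accessible(local_vars, global_vars,
                                     has_register_message)
    return set(callee_changed_globals)


# Hypothetical procedure with locals x and y and globals g and h;
# a register message was generated for x only.
local_vars = ["x", "y"]
global_vars = ["g", "h"]
has_register_message = {"x": True, "y": False}
by_store = indirectly_accessible(local_vars, global_vars, has_register_message)
by_call = changed_by_call(False, {"g"}, local_vars, global_vars,
                          has_register_message)
```
.DE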

78
doc/ego/ud/ud4 Normal file

@@ -0,0 +1,78 @@
.NH 2
Implementation
.PP
UD first builds a number of tables:
.IP locals: 9
contains information about the local variables of the
current procedure (offset,size,whether a register message was found
for it and, if so, the score field of that message)
.IP defs:
a table of all explicit definitions appearing in the
current procedure.
.IP copies:
a table of all copies appearing in the
current procedure.
.LP
Every variable (local as well as global), definition and copy
is identified by a unique number, which is the index
in the table.
All tables are constructed by traversing the EM code.
A fourth table, "vardefs", is indexed by a 'variable number' and
contains for every variable the set of explicit definitions of it.
Also, for each basic block b, the set CHGVARS containing all variables
changed by it is computed.
.PP
The GEN sets are obtained in one scan over the EM text,
by analyzing every EM instruction.
The KILL set of a basic block b is computed by looking at the
set of variables
changed by b (i.e. CHGVARS[b]).
For every such variable v, all explicit definitions to v
(i.e. vardefs[v]) that are not in GEN[b] are added to KILL[b].
Also, the implicit definition of v is added to KILL[b].
Next, the data flow equations for use-definition information
are solved,
using a straightforward, iterative algorithm.
All sets are represented as bitvectors, so the operations
on sets (union, difference) can be implemented efficiently.
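.PP
The construction of KILL[b] on bit vectors can be sketched as follows.
A Python integer stands in for a bit vector, with bit i representing
definition number i.
The function name and the numbering of the implicit definitions
(the "implicit_def" mapping) are assumptions made for the example.
.DS
```python
def kill_set(chgvars_b, vardefs, implicit_def, gen_b):
    """Return KILL[b] as a bit vector, one bit per definition number.

    chgvars_b:    variables changed by block b (CHGVARS[b])
    vardefs:      variable -> bit vector of its explicit definitions
    implicit_def: variable -> number of its implicit definition (assumed)
    gen_b:        GEN[b] as a bit vector
    """
    kill = 0
    for v in chgvars_b:
        kill |= vardefs[v] & ~gen_b     # explicit definitions not in GEN[b]
        kill |= 1 << implicit_def[v]    # the implicit definition of v
    return kill


# Hypothetical numbering: d0 and d1 define A, d2 defines B;
# definitions 3 and 4 are the implicit definitions of A and B.
vardefs = {"A": 0b00011, "B": 0b00100}
implicit_def = {"A": 3, "B": 4}
# Block b changes only A and generates d1.
kill_b = kill_set({"A"}, vardefs, implicit_def, 0b00010)
```
.DE
.LP
Here KILL[b] contains d0 and the implicit definition of A, but not d1,
which is generated by b itself.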
.PP
The C_GEN and C_KILL sets are computed simultaneously in one scan
over the EM text.
For every copy A := B appearing in basic block b we do
the following:
.IP 1.
for every basic block n /= b that changes B, see if the definition A := B
reaches the beginning of n (i.e. check if the index number of A := B in
the "defs" table is an element of IN[n]);
if so, add the copy to C_KILL[n]
.IP 2.
if B is redefined later on in b, add the copy to C_KILL[b], else
add it to C_GEN[b]
.LP
C_IN and C_OUT are computed from C_GEN and C_KILL via the second set of
data flow equations.
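.PP
Steps 1 and 2 can be sketched as follows.
The tuple layout of a copy, the "redefined_later" set and the example
data are assumptions; in particular, the test whether B is redefined
later in the same block is taken as given instead of being derived
from the EM text.
.DS
```python
def copy_gen_kill(copies, blocks, chgvars, in_sets, redefined_later):
    """Compute C_GEN and C_KILL for all blocks.

    copies:          list of (copy_id, def_id, home_block, source_var)
    chgvars:         block -> set of variables changed by that block
    in_sets:         block -> IN set of reaching definitions
    redefined_later: copy_ids whose source is assigned later in the
                     home block (assumed to be known)
    """
    c_gen = {b: set() for b in blocks}
    c_kill = {b: set() for b in blocks}
    for copy_id, def_id, home, source in copies:
        # Step 1: other blocks that change the source and are reached
        # by the definition A := B.
        for n in blocks:
            if n != home and source in chgvars[n] and def_id in in_sets[n]:
                c_kill[n].add(copy_id)
        # Step 2: the block containing the copy itself.
        if copy_id in redefined_later:
            c_kill[home].add(copy_id)
        else:
            c_gen[home].add(copy_id)
    return c_gen, c_kill


# Hypothetical example: copy c1 (definition d1, "A := B") occurs in b1.
# Both b2 and b3 change B, but d1 reaches the beginning of b2 only.
blocks = ["b1", "b2", "b3"]
copies = [("c1", "d1", "b1", "B")]
chgvars = {"b1": {"A"}, "b2": {"B"}, "b3": {"B"}}
in_sets = {"b1": set(), "b2": {"d1"}, "b3": set()}
C_GEN, C_KILL = copy_gen_kill(copies, blocks, chgvars, in_sets, set())
```
.DE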
.PP
Finally, in one last scan all opportunities for optimization are
detected.
For every use u of a variable A, we check if
there is a unique explicit definition d reaching u.
.sp
If the definition is a copy A := B and B has the same value at d as
at u, then the use of A at u may be changed into B.
The latter condition can be verified as follows:
.IP -
if u and d are in the same basic block, the condition is satisfied
if there is no assignment to B in between d and u
.IP -
if u and d are in different basic blocks, the condition is
satisfied if there is no assignment to B in the block of u prior to u
and d is in C_IN[b], where b is the block of u.
.LP
Before the transformation is actually done, UD first makes sure the
alteration is really desirable, as described before.
The information needed for this purpose (access costs of local and
global variables) is read from a machine descriptor file.
.sp
If the only definition reaching u has the form "A := constant", the use
of A at u is replaced by the constant.
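.PP
The final scan can be sketched for a single use as follows.
This is an illustration only: the record layouts, the position
numbers within a block and the function names are hypothetical,
"unique" is read as "exactly one definition reaches the use",
copies in C_IN are identified by their definition number,
a definition that follows the use in the same block is treated like a
definition from another block, and the cost check described above is left out.
.DS
```python
def assigned_between(assigns, block, var, lo, hi):
    """True if var is assigned in block at a position strictly
    between lo and hi."""
    return any(v == var and lo < pos < hi for pos, v in assigns[block])


def replacement_for_use(use, defs, assigns, c_in):
    """Return the operand that may replace the used variable, or None."""
    if len(use["reaching"]) != 1:
        return None                      # no unique reaching definition
    (d_id,) = use["reaching"]
    d = defs[d_id]
    if d["kind"] == "const":
        return d["value"]                # constant propagation
    if d["kind"] != "copy":
        return None                      # implicit or other definition
    src, b = d["source"], use["block"]
    if d["block"] == b and d["pos"] < use["pos"]:
        ok = not assigned_between(assigns, b, src, d["pos"], use["pos"])
    else:
        ok = (not assigned_between(assigns, b, src, -1, use["pos"])
              and d_id in c_in[b])
    return src if ok else None           # copy propagation


# Hypothetical example: d1 is "A := B" and d2 is "C := 12", both in b1.
defs = {
    "d1": {"kind": "copy", "source": "B", "block": "b1", "pos": 0},
    "d2": {"kind": "const", "value": 12, "block": "b1", "pos": 1},
}
assigns = {"b1": [(0, "A"), (1, "C")], "b2": [(0, "D")], "b3": [(0, "B")]}
c_in = {"b1": set(), "b2": {"d1"}, "b3": {"d1"}}
use_a_in_b2 = {"block": "b2", "pos": 1, "reaching": {"d1"}}
use_a_in_b3 = {"block": "b3", "pos": 1, "reaching": {"d1"}}
use_c_in_b2 = {"block": "b2", "pos": 2, "reaching": {"d2"}}
```
.DE
.LP
With this data the use of A in b2 may be changed into B and the use of
C into 12, whereas the use of A in b3 is left alone because B is
assigned in b3 before the use.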

19
doc/ego/ud/ud5 Normal file

@@ -0,0 +1,19 @@
.NH 2
Source files of UD
.PP
The sources of UD are in the following files and packages:
.IP ud.h: 14
declarations of global variables and data structures
.IP ud.c:
the routine main; initialization of target machine dependent tables
.IP defs:
routines to compute the GEN and KILL sets and routines to analyse
EM instructions
.IP const:
routines involved in constant propagation
.IP copy:
routines involved in copy propagation
.IP aux:
contains auxiliary routines
.LP