Initial revision
57
doc/ego/ic/ic1
Normal file
@@ -0,0 +1,57 @@
|
||||
.bp
|
||||
.NH
|
||||
The Intermediate Code and the IC phase
|
||||
.PP
|
||||
In this chapter the intermediate code of the EM global optimizer
|
||||
will be defined.
|
||||
The 'Intermediate Code construction' phase (IC),
|
||||
which builds the initial intermediate code from
|
||||
EM Compact Assembly Language,
|
||||
will be described.
|
||||
.NH 2
|
||||
Introduction
|
||||
.PP
|
||||
The EM global optimizer is a multi-pass program,
|
||||
hence there is a need for an intermediate code.
|
||||
Usually, programs in the Amsterdam Compiler Kit use the
|
||||
Compact Assembly Language format
|
||||
.[~[
|
||||
keizer architecture
|
||||
.], section 11.2]
|
||||
for this purpose.
|
||||
Although this code has some convenient features,
|
||||
such as being compact,
|
||||
it is quite unsuitable in our case,
|
||||
for a number of reasons.
|
||||
First, the code lacks global information
|
||||
about whole procedures or whole basic blocks.
|
||||
Second, it uses identifiers ('names') to bind
|
||||
defining and applied occurrences of
|
||||
procedures, data labels and instruction labels.
|
||||
Although this is usual in high level programming
|
||||
languages, it is awkward in an intermediate code
|
||||
that must be read many times.
|
||||
Each pass of the optimizer would have
|
||||
to incorporate an identifier look-up mechanism
|
||||
to associate a defining occurrence with each
|
||||
applied occurrence of an identifier.
|
||||
Finally, EM programs are used to declare blocks of bytes,
|
||||
rather than variables. A 'hol 6' instruction may be used to
|
||||
declare three 2-byte variables.
|
||||
Clearly, the optimizer wants to deal with variables, and
|
||||
not with rows of bytes.
|
||||
.PP
|
||||
To overcome these problems, we have developed a new
|
||||
intermediate code.
|
||||
This code does not merely consist of the EM instructions,
|
||||
but also contains global information in the
|
||||
form of tables and graphs.
|
||||
Before describing the intermediate code we will
|
||||
first digress to outline
|
||||
the problems one generally encounters
|
||||
when trying to store complex data structures such as
|
||||
graphs outside the program, i.e. in a file.
|
||||
We trust this will enhance the
|
||||
comprehensibility of the
|
||||
intermediate code definition and the design and implementation
|
||||
of the IC phase.
|
||||
146
doc/ego/ic/ic2
Normal file
@@ -0,0 +1,146 @@
|
||||
.NH 2
|
||||
Representation of complex data structures in a sequential file
|
||||
.PP
|
||||
Most programmers are quite used to dealing with
|
||||
complex data structures, such as
|
||||
arrays, graphs and trees.
|
||||
There are some particular problems that occur
|
||||
when storing such a data structure
|
||||
in a sequential file.
|
||||
We call data that is kept in
|
||||
main memory
|
||||
.UL internal
|
||||
,as opposed to
|
||||
.UL external
|
||||
data
|
||||
that is kept in a file outside the program.
|
||||
.sp
|
||||
We assume a simple data structure of a
|
||||
scalar type (integer, floating point number)
|
||||
has some known external representation.
|
||||
An
|
||||
.UL array
|
||||
having elements of a scalar type can be represented
|
||||
externally easily, by successively
|
||||
representing its elements.
|
||||
The external representation may be preceded by a
|
||||
number, giving the length of the array.
|
||||
Now, consider a linear, singly linked list,
|
||||
the elements of which look like:
|
||||
.DS
|
||||
record
|
||||
data: scalar_type;
|
||||
next: pointer_type;
|
||||
end;
|
||||
.DE
|
||||
It is significant to note that the "next"
|
||||
fields of the elements only have a meaning within
|
||||
main memory.
|
||||
The field contains the address of some location in
|
||||
main memory.
|
||||
If a list element is written to a file in
|
||||
some program,
|
||||
and read by another program,
|
||||
the element will be allocated at a different
|
||||
address in main memory.
|
||||
Hence this address value is completely
|
||||
useless outside the program.
|
||||
.sp
|
||||
One may represent the list by ignoring these "next" fields
|
||||
and storing the data items in the order they are linked.
|
||||
The "next" fields are represented \fIimplicitly\fR.
|
||||
When the file is read again,
|
||||
the same list can be reconstructed.
|
||||
In order to know where the external representation of the
|
||||
list ends,
|
||||
it may be useful to put the length of
|
||||
the list in front of it.
|
||||
.sp
|
||||
Note that arrays and linear lists have the
|
||||
same external representation.
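.PP
The scheme for linear lists can be sketched as follows.
This is only an illustration, not code taken from the optimizer:
the names Node, dump_list and load_list are hypothetical,
and the sequential file is modelled as a plain Python list of integers.

```python
class Node:
    """One element of a singly linked list (hypothetical helper type)."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def dump_list(head):
    """Externalize a list: its length, then the data items in link order.

    The "next" fields are not written; they are implied by the order.
    """
    items = []
    node = head
    while node is not None:
        items.append(node.data)
        node = node.next
    return [len(items)] + items


def load_list(stream, pos=0):
    """Rebuild a list stored at stream[pos]; return (head, next position)."""
    length = stream[pos]
    head = None
    # Build back to front so each new node can point at its successor.
    for value in reversed(stream[pos + 1:pos + 1 + length]):
        head = Node(value, head)
    return head, pos + 1 + length
```

.PP
Because the length comes first, several lists can be stored one after
another in the same file; load_list returns the position at which the
next one starts.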
|
||||
.PP
|
||||
A doubly linked, linear list,
|
||||
with elements of the type:
|
||||
.DS
|
||||
record
|
||||
data: scalar_type;
|
||||
next,
|
||||
previous: pointer_type;
|
||||
end
|
||||
.DE
|
||||
can be represented in precisely the same way.
|
||||
Both the "next" and the "previous" fields are represented
|
||||
implicitly.
|
||||
.PP
|
||||
Next, consider a binary tree,
|
||||
the nodes of which have type:
|
||||
.DS
|
||||
record
|
||||
data: scalar_type;
|
||||
left,
|
||||
right: pointer_type;
|
||||
end
|
||||
.DE
|
||||
Such a tree can be represented sequentially,
|
||||
by storing its nodes in some fixed order, e.g. prefix order.
|
||||
A special null data item may be used to
|
||||
denote a missing left or right son.
|
||||
For example, let the scalar type be integer,
|
||||
and let the null item be 0.
|
||||
Then the tree of fig. 3.1(a)
|
||||
can be represented as in fig. 3.1(b).
|
||||
.DS
|
||||
               4

       9               12

   12      3       4       6

         8   1       5   1
|
||||
|
||||
Fig. 3.1(a) A binary tree
|
||||
|
||||
|
||||
4 9 12 0 0 3 8 0 0 1 0 0 12 4 0 5 0 0 6 1 0 0 0
|
||||
|
||||
Fig. 3.1(b) Its sequential representation
|
||||
.DE
|
||||
We are still able to represent the pointer fields ("left"
|
||||
and "right") implicitly.
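.PP
The prefix-order scheme can be sketched as follows.
The names TreeNode, dump_tree and load_tree are hypothetical,
and the shape of the example tree is inferred from the sequence of fig. 3.1(b).

```python
NULL_ITEM = 0  # special data item that marks a missing son


class TreeNode:
    """A binary tree node (hypothetical helper type)."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def dump_tree(node):
    """Write the tree in prefix order: node, left subtree, right subtree."""
    if node is None:
        return [NULL_ITEM]
    return [node.data] + dump_tree(node.left) + dump_tree(node.right)


def load_tree(items):
    """Rebuild a tree from its prefix-order representation."""
    stream = iter(items)

    def build():
        value = next(stream)
        if value == NULL_ITEM:
            return None
        node = TreeNode(value)
        node.left = build()   # the left subtree is stored first
        node.right = build()  # the right subtree follows it
        return node

    return build()


# The tree of fig. 3.1(a).
FIG_3_1 = TreeNode(
    4,
    TreeNode(9, TreeNode(12), TreeNode(3, TreeNode(8), TreeNode(1))),
    TreeNode(12, TreeNode(4, None, TreeNode(5)), TreeNode(6, TreeNode(1), None)),
)
```

.PP
Note that the null item must not occur as a real data value,
otherwise the reader cannot tell a missing son from a node.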
|
||||
.PP
|
||||
Finally, consider a general
|
||||
.UL graph
|
||||
, where each node has a "data" field and
|
||||
pointer fields,
|
||||
with no restriction on where they may point to.
|
||||
Here the implicit scheme breaks down.
|
||||
There is no way to represent the pointers implicitly,
|
||||
as we did with lists and trees.
|
||||
In order to represent them explicitly,
|
||||
we use the following scheme.
|
||||
Every node gets an extra field,
|
||||
containing some unique number that identifies the node.
|
||||
We call this number its
|
||||
.UL id.
|
||||
A pointer is represented externally as the id of the node
|
||||
it points to.
|
||||
When reading the file we use a table that maps
|
||||
an id to the address of its node.
|
||||
In general this table will not be completely filled in
|
||||
until we have read the entire external representation of
|
||||
the graph and allocated internal memory locations for
|
||||
every node.
|
||||
Hence we cannot reconstruct the graph in one scan.
|
||||
That is, there may be some pointers from node A to B,
|
||||
where B is placed later in the sequential file than A.
|
||||
When we read the node of A we cannot map the id of B
|
||||
to the address of node B,
|
||||
as we have not yet allocated node B.
|
||||
We can overcome this problem if the size
|
||||
of every node is known in advance.
|
||||
In this case we can allocate memory for a node
|
||||
on first reference.
|
||||
Otherwise, the mapping from id to pointer
|
||||
cannot be done while reading nodes.
|
||||
The mapping can be done either in an extra scan
|
||||
or at every reference to the node.
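.PP
The variant that uses an extra scan can be sketched as follows.
This is a minimal illustration under assumed names (GraphNode, dump_graph,
load_graph); an external record is modelled as a tuple of the node's id,
its data and the ids of the nodes it points to.

```python
class GraphNode:
    """A graph node with an id, a data field and pointer fields."""
    def __init__(self, node_id, data):
        self.id = node_id
        self.data = data
        self.pointers = []


def dump_graph(nodes):
    """Represent every pointer externally as the id of its target node."""
    return [(n.id, n.data, [t.id for t in n.pointers]) for n in nodes]


def load_graph(records):
    """Rebuild the graph in two scans over the external records."""
    # Scan 1: allocate every node and fill in the id -> node table.
    table = {node_id: GraphNode(node_id, data) for node_id, data, _ in records}
    # Scan 2: the table is now complete, so every id can be mapped,
    # including ids of nodes that come later in the file.
    for node_id, _, target_ids in records:
        table[node_id].pointers = [table[t] for t in target_ids]
    return [table[node_id] for node_id, _, _ in records]
```

.PP
The forward reference from the first node to the second one in the test
data is exactly the case that a single scan cannot handle.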
|
||||
414
doc/ego/ic/ic3
Normal file
@@ -0,0 +1,414 @@
|
||||
.NH 2
|
||||
Definition of the intermediate code
|
||||
.PP
|
||||
The intermediate code of the optimizer consists
|
||||
of several components:
|
||||
.IP -
|
||||
the object table
|
||||
.IP -
|
||||
the procedure table
|
||||
.IP -
|
||||
the EM text
|
||||
.IP -
|
||||
the control flow graphs
|
||||
.IP -
|
||||
the loop table
|
||||
.LP
|
||||
.PP
|
||||
These components are described in
|
||||
the next sections.
|
||||
The syntactic structure of every component
|
||||
is described by a set of context free syntax rules,
|
||||
with the following conventions:
|
||||
.DS
|
||||
x a non-terminal symbol
|
||||
A a terminal symbol (in capitals)
|
||||
x: a b c; a grammar rule
|
||||
a | b a or b
|
||||
(a)+ 1 or more occurrences of a
|
||||
{a} 0 or more occurrences of a
|
||||
.DE
|
||||
.NH 3
|
||||
The object table
|
||||
.PP
|
||||
EM programs declare blocks of bytes rather than (global) variables.
|
||||
A typical program may declare 'HOL 7780'
|
||||
to allocate space for 8 I/O buffers,
|
||||
2 large arrays and 10 scalar variables.
|
||||
The optimizer wants to deal with
|
||||
.UL objects
|
||||
like variables, buffers and arrays
|
||||
and certainly not with huge numbers of bytes.
|
||||
Therefore the intermediate code contains information
|
||||
about which global objects are used.
|
||||
This information can be obtained from an EM program
|
||||
by just looking at the operands of instructions
|
||||
such as LOE, LAE, LDE, STE, SDE, INE, DEE and ZRE.
|
||||
.PP
|
||||
The object table consists of a list of
|
||||
.UL datablock
|
||||
entries.
|
||||
Each such entry represents a declaration like HOL, BSS,
|
||||
CON or ROM.
|
||||
There are five kinds of datablock entries.
|
||||
The fifth kind,
|
||||
UNKNOWN, denotes a declaration in a
|
||||
separately compiled file that is not made
|
||||
available to the optimizer.
|
||||
Each datablock entry contains the type of the block,
|
||||
its size, and a description of the objects that
|
||||
belong to it.
|
||||
If it is a rom,
|
||||
it also contains a list of values given
|
||||
as arguments to the rom instruction,
|
||||
provided that this list contains only integer numbers.
|
||||
An object has an offset (within its datablock)
|
||||
and a size.
|
||||
The size need not always be determinable.
|
||||
Both datablock and object contain a unique
|
||||
identifying number
|
||||
(see previous section for their use).
|
||||
.DS
|
||||
.UL syntax
|
||||
object_table:
|
||||
{datablock} ;
|
||||
datablock:
|
||||
D_ID -- unique identifying number
|
||||
PSEUDO -- one of ROM,CON,BSS,HOL,UNKNOWN
|
||||
SIZE -- # bytes declared
|
||||
FLAGS
|
||||
{value} -- contents of rom
|
||||
{object} ; -- objects of the datablock
|
||||
object:
|
||||
O_ID -- unique identifying number
|
||||
OFFSET -- offset within the datablock
|
||||
SIZE ; -- size of the object in bytes
|
||||
value:
|
||||
argument ;
|
||||
.DE
|
||||
A data block has only one flag: "external", indicating
|
||||
whether the data label is externally visible.
|
||||
The syntax for "argument" will be given later on
|
||||
(see em_text).
|
||||
.NH 3
|
||||
The procedure table
|
||||
.PP
|
||||
The procedure table contains global information
|
||||
about all procedures that are made available
|
||||
to the optimizer
|
||||
and that are needed by the EM program.
|
||||
(Library units may not be needed, see section 3.5).
|
||||
The table has one entry for
|
||||
every procedure.
|
||||
.DS
|
||||
.UL syntax
|
||||
procedure_table:
|
||||
{procedure}
|
||||
procedure:
|
||||
P_ID -- unique identifying number
|
||||
#LABELS -- number of instruction labels
|
||||
#LOCALS -- number of bytes for locals
|
||||
#FORMALS -- number of bytes for formals
|
||||
FLAGS -- flag bits
|
||||
calling -- procedures called by this one
|
||||
change -- info about global variables changed
|
||||
use ; -- info about global variables used
|
||||
calling:
|
||||
{P_ID} ; -- procedures called
|
||||
change:
|
||||
ext -- external variables changed
|
||||
FLAGS ;
|
||||
use:
|
||||
FLAGS ;
|
||||
ext:
|
||||
{O_ID} ; -- a set of objects
|
||||
.DE
|
||||
.PP
|
||||
The number of bytes of formal parameters accessed by
|
||||
a procedure is determined by the front ends and
|
||||
passed via a message (parameter message) to the optimizer.
|
||||
If the front end is not able to determine this number
|
||||
(e.g. the parameter may be an array of dynamic size or
|
||||
the procedure may have a variable number of arguments) the attribute
|
||||
contains the value 'UNKNOWN_SIZE'.
|
||||
.sp 0
|
||||
A procedure has the following flags:
|
||||
.IP -
|
||||
external: true if the proc. is externally visible
|
||||
.IP -
|
||||
bodyseen: true if its code is available as EM text
|
||||
.IP -
|
||||
calunknown: true if it calls a procedure that has its bodyseen
|
||||
flag not set
|
||||
.IP -
|
||||
environ: true if it uses or changes a (non-global) variable in
|
||||
a lexically enclosing procedure
|
||||
.IP -
|
||||
lpi: true if it is used as the operand of an lpi instruction, so
it may be called indirectly
|
||||
.LP
|
||||
The change and use attributes both have one flag: "indirect",
|
||||
indicating whether the procedure does a 'use indirect'
|
||||
or a 'store indirect' (indirect means through a pointer).
|
||||
.NH 3
|
||||
The EM text
|
||||
.PP
|
||||
The EM text contains the EM instructions.
|
||||
Every EM instruction has an operation code (opcode)
|
||||
and 0 or 1 operands.
|
||||
EM pseudo instructions can have more than
|
||||
1 operand.
|
||||
The opcode is just a small (8 bit) integer.
|
||||
.sp
|
||||
There are several kinds of operands, which we will
|
||||
refer to as
|
||||
.UL types.
|
||||
Many EM instructions can have more than one type of operand.
|
||||
The types and their encodings in Compact Assembly Language
|
||||
are discussed extensively in
|
||||
.[~[
|
||||
keizer architecture
|
||||
.], section 11.2]
|
||||
Of special interest is the way numeric values
|
||||
are represented.
|
||||
Of prime importance is the machine independence of
|
||||
the representation.
|
||||
Ultimately, one could store every integer
|
||||
just as a string of the characters '0' to '9'.
|
||||
As doing arithmetic on strings is awkward,
|
||||
Compact Assembly Language allows several alternatives.
|
||||
The main idea is to look at the value of the integer.
|
||||
Integers that fit in 16, 32 or 64 bits are
|
||||
represented as a row of 2, 4 and 8 bytes respectively,
|
||||
preceded by an indication of how many bytes are used.
|
||||
Longer integers are represented as strings;
|
||||
this is only allowed within pseudo instructions, however.
|
||||
This concept works very well for target machines
|
||||
with reasonable word sizes.
|
||||
At present, most ACK software cannot be used for word sizes
|
||||
higher than 32 bits,
|
||||
although the handles for using larger word sizes are
|
||||
present in the design of the EM code.
|
||||
In the intermediate code we essentially use the
|
||||
same ideas.
|
||||
We allow three representations of integers.
|
||||
.IP -
|
||||
integers that fit in a short are represented as a short
|
||||
.IP -
|
||||
integers that fit in a long but not in a short are represented
|
||||
as longs
|
||||
.IP -
|
||||
all remaining integers are represented as strings
|
||||
(only allowed in pseudos).
|
||||
.LP
|
||||
The terms short and long are defined in
|
||||
.[~[
|
||||
ritchie reference manual programming language
|
||||
.], section 4]
|
||||
and depend only on the source machine
|
||||
(i.e. the machine on which ACK runs),
|
||||
not on the target machines.
|
||||
For historical reasons a long will often be called an
|
||||
.UL offset.
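.PP
The choice between the three representations can be sketched as follows.
The sizes used here (a 16-bit short and a 32-bit long) are an assumption
about the source machine, and the function name and the tags it returns
are hypothetical.

```python
SHORT_BITS = 16  # assumed size of a short on the source machine
LONG_BITS = 32   # assumed size of a long on the source machine


def fits(value, bits):
    """True if value fits in a signed two's complement integer of that size."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def choose_representation(value, in_pseudo=False):
    """Pick the smallest of the three representations that can hold value."""
    if fits(value, SHORT_BITS):
        return ("short", value)
    if fits(value, LONG_BITS):
        return ("offset", value)  # a long is called an offset
    if not in_pseudo:
        raise ValueError("string representation is only allowed in pseudos")
    return ("string", str(value))
```

.PP
Only the range of the value matters; the word size of the target
machine plays no part in the decision.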
|
||||
.PP
|
||||
Operands can also be instruction labels,
|
||||
objects or procedures.
|
||||
Instruction labels are denoted by a
|
||||
.UL label
|
||||
.UL identifier,
|
||||
which can be distinguished from a normal identifier.
|
||||
.sp
|
||||
The operand of a pseudo instruction can be a list of
|
||||
.UL arguments.
|
||||
Arguments can have the same type as operands, except
|
||||
for the type short, which is not used for arguments.
|
||||
Furthermore, an argument can be a string or
|
||||
a string representation of a signed integer, unsigned integer
|
||||
or floating point number.
|
||||
If the number of arguments is not fully determined by
|
||||
the pseudo instruction (e.g. a ROM pseudo can have any number
|
||||
of arguments), then the list is terminated by a special
|
||||
argument of type CEND.
|
||||
.DS
|
||||
.UL syntax
|
||||
em_text:
|
||||
{line} ;
|
||||
line:
|
||||
INSTR -- opcode
|
||||
OPTYPE -- operand type
|
||||
operand ;
|
||||
operand:
|
||||
empty | -- OPTYPE = NO
|
||||
SHORT | -- OPTYPE = SHORT
|
||||
OFFSET | -- OPTYPE = OFFSET
|
||||
LAB_ID | -- OPTYPE = INSTRLAB
|
||||
O_ID | -- OPTYPE = OBJECT
|
||||
P_ID | -- OPTYPE = PROCEDURE
|
||||
{argument} ; -- OPTYPE = LIST
|
||||
argument:
|
||||
ARGTYPE
|
||||
arg ;
|
||||
arg:
|
||||
empty | -- ARGTYPE = CEND
|
||||
OFFSET |
|
||||
LAB_ID |
|
||||
O_ID |
|
||||
P_ID |
|
||||
string | -- ARGTYPE = STRING
|
||||
const ; -- ARGTYPE = ICON,UCON or FCON
|
||||
string:
|
||||
LENGTH -- number of characters
|
||||
{CHARACTER} ;
|
||||
const:
|
||||
SIZE -- number of bytes
|
||||
string ; -- string representation of (un)signed
|
||||
-- or floating point constant
|
||||
.DE
|
||||
.NH 3
|
||||
The control flow graphs
|
||||
.PP
|
||||
Each procedure can be divided
|
||||
into a number of basic blocks.
|
||||
A basic block is a piece of code with
|
||||
no jumps in, except at the beginning,
|
||||
and no jumps out, except at the end.
|
||||
.PP
|
||||
Every basic block has a set of
|
||||
.UL successors,
|
||||
which are basic blocks that can follow it immediately in
|
||||
the dynamic execution sequence.
|
||||
The
|
||||
.UL predecessors
|
||||
are the basic blocks of which this one
|
||||
is a successor.
|
||||
The successor and predecessor attributes
|
||||
of all basic blocks of a single procedure
|
||||
are said to form the
|
||||
.UL control
|
||||
.UL flow
|
||||
.UL graph
|
||||
of that procedure.
|
||||
.PP
|
||||
Another important attribute is the
|
||||
.UL immediate
|
||||
.UL dominator.
|
||||
A basic block B dominates a block C if
|
||||
every path in the graph from the procedure entry block
|
||||
to C goes through B.
|
||||
The immediate dominator of C is the closest dominator
|
||||
of C on any path from the entry block.
|
||||
(Note that the dominator relation is transitive,
|
||||
so the immediate dominator is well defined.)
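.PP
The definitions can be made concrete with a small sketch.
It uses the simple iterative set algorithm, which is not necessarily the
method used by the optimizer.
The function names are hypothetical, and the sketch assumes that every
block is a key of the successor map and is reachable from the entry block.

```python
def dominator_sets(succ, entry):
    """Map every block to the set of blocks that dominate it."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)

    # Start from "everything dominates everything" and shrink the sets.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in pred[n]]
            common = set.intersection(*incoming) if incoming else set()
            new = {n} | common
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominator(dom, block):
    """The closest strict dominator of block, or None for the entry block."""
    strict = dom[block] - {block}
    if not strict:
        return None
    # The strict dominators form a chain; the closest one is itself
    # dominated by all the others, so it has the largest dominator set.
    return max(strict, key=lambda d: len(dom[d]))
```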
|
||||
.PP
|
||||
A basic block also has an attribute containing
|
||||
the identifiers of every
|
||||
.UL loop
|
||||
that the block belongs to (see next section for loops).
|
||||
.DS
|
||||
.UL syntax
|
||||
control_flow_graph:
|
||||
{basic_block} ;
|
||||
basic_block:
|
||||
B_ID -- unique identifying number
|
||||
#INSTR -- number of EM instructions
|
||||
succ
|
||||
pred
|
||||
idom -- immediate dominator
|
||||
loops -- set of loops
|
||||
FLAGS ; -- flag bits
|
||||
succ:
|
||||
{B_ID} ;
|
||||
pred:
|
||||
{B_ID} ;
|
||||
idom:
|
||||
B_ID ;
|
||||
loops:
|
||||
{LP_ID} ;
|
||||
.DE
|
||||
The flag bits can have the values 'firm' and 'strong',
|
||||
which are explained below.
|
||||
.NH 3
|
||||
The loop tables
|
||||
.PP
|
||||
Every procedure has an associated
|
||||
.UL loop
|
||||
.UL table
|
||||
containing information about all the loops
|
||||
in the procedure.
|
||||
Loops can be detected by a close inspection of
|
||||
the control flow graph.
|
||||
The main idea is to look for two basic blocks,
|
||||
B and C, for which the following holds:
|
||||
.IP -
|
||||
B is a successor of C
|
||||
.IP -
|
||||
B is a dominator of C
|
||||
.LP
|
||||
B is called the loop
|
||||
.UL entry
|
||||
and C is called the loop
|
||||
.UL end.
|
||||
Intuitively, C contains a jump backwards to
|
||||
the beginning of the loop (B).
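.PP
The two conditions translate directly into a search for back edges.
The sketch below assumes that the successor lists and the dominator sets
are already available as dictionaries; the function name is hypothetical.

```python
def find_loops(succ, dom):
    """Return all (entry, end) pairs of the control flow graph.

    succ maps a block to the list of its successors;
    dom maps a block to the set of blocks that dominate it.
    """
    loops = []
    for c, targets in succ.items():
        for b in targets:
            # B is a successor of C and B dominates C.
            if b in dom[c]:
                loops.append((b, c))
    return sorted(loops)
```

.PP
A block that jumps to itself satisfies both conditions as well,
so it is reported as a loop whose entry and end coincide.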
|
||||
.PP
|
||||
A loop L1 is said to be
|
||||
.UL nested
|
||||
within loop L2 if all basic blocks of L1
|
||||
are also part of L2.
|
||||
It is important to note that loops could
|
||||
originally be written as a well-structured for or
while loop, or as a messy goto loop.
|
||||
Hence loops may partly overlap without one
|
||||
being nested inside the other.
|
||||
The
|
||||
.UL nesting
|
||||
.UL level
|
||||
of a loop is the number of loops in
|
||||
which it is nested (so it is 0 for
|
||||
an outermost loop).
|
||||
The details of loop detection will be discussed later.
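.PP
Given the set of basic blocks of every loop, the nesting levels follow
from the definition above.
The sketch below is an illustration only; the function name and the
representation of a loop as a set of block identifiers are assumptions.

```python
def nesting_levels(loops):
    """Map every loop id to the number of loops in which it is nested.

    loops maps a loop id to the set of ids of its basic blocks.
    """
    levels = {}
    for lp_id, blocks in loops.items():
        levels[lp_id] = sum(
            1
            for other_id, other_blocks in loops.items()
            # L1 is nested within L2 if all blocks of L1 are part of L2.
            if other_id != lp_id and blocks <= other_blocks
        )
    return levels
```

.PP
Two loops that overlap only partly do not contribute to each other's level.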
|
||||
.PP
|
||||
It is often desirable to know whether a
|
||||
basic block gets executed during every iteration
|
||||
of a loop.
|
||||
This leads to the following definitions:
|
||||
.IP -
|
||||
A basic block B of a loop L is said to be a \fIfirm\fR block
|
||||
of L if B is executed on all successive iterations of L,
|
||||
with the only possible exception of the last iteration.
|
||||
.IP -
|
||||
A basic block B of a loop L is said to be a \fIstrong\fR block
|
||||
of L if B is executed on all successive iterations of L.
|
||||
.LP
|
||||
Note that a strong block is also a firm block.
|
||||
If a block is part of a conditional statement, it is neither
|
||||
strong nor firm, as it may be skipped during some iterations
|
||||
(see Fig. 3.2).
|
||||
.DS
|
||||
loop
|
||||
if cond1 then
|
||||
... -- this code will not
|
||||
-- result in a firm or strong block
|
||||
end if;
|
||||
... -- strong (always executed)
|
||||
exit when cond2;
|
||||
... -- firm (not executed on
|
||||
-- last iteration).
|
||||
end loop;
|
||||
|
||||
Fig. 3.2 Example of firm and strong blocks
|
||||
.DE
|
||||
.DS
|
||||
.UL syntax
|
||||
looptable:
|
||||
{loop} ;
|
||||
loop:
|
||||
LP_ID -- unique identifying number
|
||||
LEVEL -- loop nesting level
|
||||
entry -- loop entry block
|
||||
end ;
|
||||
entry:
|
||||
B_ID ;
|
||||
end:
|
||||
B_ID ;
|
||||
.DE
|
||||
80
doc/ego/ic/ic4
Normal file
@@ -0,0 +1,80 @@
|
||||
.NH 2
|
||||
External representation of the intermediate code
|
||||
.PP
|
||||
The syntax of the intermediate code was given
|
||||
in the previous section.
|
||||
In this section we will make some remarks about
|
||||
the representation of the code in sequential files.
|
||||
.sp
|
||||
We use sequential files in order to avoid
|
||||
the bookkeeping of complex file indices.
|
||||
As a consequence of this decision
|
||||
we can't store all components
|
||||
of the intermediate code
|
||||
in one file.
|
||||
If a phase wishes to change some attribute
|
||||
of a procedure,
|
||||
or wants to add or delete entire procedures
|
||||
(inline substitution may do the latter),
|
||||
the procedure table will only be fully updated
|
||||
after the entire EM text has been scanned.
|
||||
Yet, the next phase undoubtedly wants
|
||||
to read the procedure table before it
|
||||
starts working on the EM text.
|
||||
Hence there is an ordering problem, which
|
||||
can be solved easily by putting the
|
||||
procedure table in a separate file.
|
||||
Similarly, the data block table is kept
|
||||
in a file of its own.
|
||||
.PP
|
||||
The control flow graphs (CFGs) could be mixed
|
||||
with the EM text.
|
||||
Rather, we have chosen to put them
|
||||
in a separate file too.
|
||||
The control flow graph file should be regarded as a
|
||||
file that imposes some structure on the EM-text file,
|
||||
just as an overhead sheet containing a picture
|
||||
of a Flow Chart may be put on an overhead sheet
|
||||
containing statements.
|
||||
The loop tables are also put in the CFG file.
|
||||
A loop imposes an extra structure on the
|
||||
CFGs and hence on the EM text.
|
||||
So there are four files:
|
||||
.IP -
|
||||
the EM-text file
|
||||
.IP -
|
||||
the procedure table file
|
||||
.IP -
|
||||
the object table file
|
||||
.IP -
|
||||
the CFG and loop tables file
|
||||
.LP
|
||||
Every table is preceded by its length, in order to
|
||||
tell where it ends.
|
||||
The CFG file also contains the number of instructions of
|
||||
every basic block,
|
||||
indicating which part of the EM text belongs
|
||||
to that block.
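.PP
The role of the instruction counts can be sketched as follows.
The sketch assumes that the basic blocks appear in the CFG file in the
same order as their instructions appear in the EM text;
the function name and the record layout are hypothetical.

```python
def split_into_blocks(em_lines, block_records):
    """Assign every EM instruction to its basic block.

    em_lines is the EM text of one procedure;
    block_records is a list of (B_ID, number of instructions) pairs.
    """
    blocks = {}
    pos = 0
    for b_id, n_instr in block_records:
        blocks[b_id] = em_lines[pos:pos + n_instr]
        pos += n_instr
    if pos != len(em_lines):
        raise ValueError("instruction counts do not cover the EM text")
    return blocks
```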
|
||||
.DS
|
||||
.UL syntax
|
||||
intermediate_code:
|
||||
object_table_file
|
||||
proctable_file
|
||||
em_text_file
|
||||
cfg_file ;
|
||||
object_table_file:
|
||||
LENGTH -- number of objects
|
||||
object_table ;
|
||||
proctable_file:
|
||||
LENGTH -- number of procedures
|
||||
procedure_table ;
|
||||
em_text_file:
|
||||
em_text ;
|
||||
cfg_file:
|
||||
{per_proc} ; -- one for every procedure
|
||||
per_proc:
|
||||
BLENGTH -- number of basic blocks
|
||||
LLENGTH -- number of loops
|
||||
control_flow_graph
|
||||
looptable ;
|
||||
.DE
|
||||
163
doc/ego/ic/ic5
Normal file
@@ -0,0 +1,163 @@
|
||||
.NH 2
|
||||
The Intermediate Code construction phase
|
||||
.PP
|
||||
The first phase of the global optimizer,
|
||||
called
|
||||
.UL IC,
|
||||
constructs a major part of the intermediate code.
|
||||
To be specific, it produces:
|
||||
.IP -
|
||||
the EM text
|
||||
.IP -
|
||||
the object table
|
||||
.IP -
|
||||
part of the procedure table
|
||||
.LP
|
||||
The calling, change and use attributes of a procedure
|
||||
and all its flags except the external and bodyseen flags
|
||||
are computed by the next phase (Control Flow phase).
|
||||
.PP
|
||||
As explained before,
|
||||
the intermediate code does not contain
|
||||
any names of variables or procedures.
|
||||
The normal identifiers are replaced by identifying
|
||||
numbers.
|
||||
Yet, the output of the global optimizer must
|
||||
contain normal identifiers, as this
|
||||
output is in Compact Assembly Language format.
|
||||
We certainly want all externally visible names
|
||||
to be the same in the input as in the output,
|
||||
because the optimized EM module may be a library unit,
|
||||
used by other modules.
|
||||
IC dumps the names of all procedures and data labels
|
||||
on two files:
|
||||
.IP -
|
||||
the procedure dump file, containing tuples (P_ID, procedure name)
|
||||
.IP -
|
||||
the data dump file, containing tuples (D_ID, data label name)
|
||||
.LP
|
||||
The names of instruction labels are not dumped,
|
||||
as they are not visible outside the procedure
|
||||
in which they are defined.
|
||||
.PP
|
||||
The input to IC consists of one or more files.
|
||||
Each file is either an EM module in Compact Assembly Language
|
||||
format, or a Unix archive file (library) containing such modules.
|
||||
IC only extracts those modules from a library that are
|
||||
needed somehow, just as a linker does.
|
||||
It is advisable to present as much code
|
||||
of the EM program as possible to the optimizer,
|
||||
although it is not required to present the whole program.
|
||||
If a procedure is called somewhere in the EM text,
|
||||
but its body (text) is not included in the input,
|
||||
its bodyseen flag in the procedure table will still
|
||||
be off.
|
||||
Whenever such a procedure is called,
|
||||
we assume the worst case for everything;
|
||||
it will change and use all variables it has access to,
|
||||
it will call every procedure etc.
|
||||
.sp
|
||||
Similarly, if a data label is used
|
||||
but not defined, the PSEUDO attribute in its data block
|
||||
will be set to UNKNOWN.
|
||||
.NH 3
|
||||
Implementation
|
||||
.PP
|
||||
Part of the code for the EM Peephole Optimizer
|
||||
.[
|
||||
staveren peephole toplass
|
||||
.]
|
||||
has been used for IC.
|
||||
Especially the routines that read and unravel
|
||||
Compact Assembly Language and the identifier
|
||||
lookup mechanism have been used.
|
||||
New code was added to recognize objects,
|
||||
build the object and procedure tables and to
|
||||
output the intermediate code.
|
||||
.PP
|
||||
IC uses singly linked linear lists for both the
|
||||
procedure and object table.
|
||||
Hence there are no limits on the size of such
|
||||
a table (except for the trivial fact that it must fit
|
||||
in main memory).
|
||||
Both tables are written out after all EM code has
|
||||
been processed.
|
||||
IC reads the EM text of one entire procedure
|
||||
at a time,
|
||||
processes it and appends the modified code to
|
||||
the EM text file.
|
||||
EM code is represented internally as a doubly linked linear
|
||||
list of EM instructions.
|
||||
.PP
|
||||
Objects are recognized by looking at the operands
|
||||
of instructions that reference global data.
|
||||
If we come across the instructions:
|
||||
.DS
|
||||
LDE X+6 -- Load Double External
|
||||
LAE X+20 -- Load Address External
|
||||
.DE
|
||||
we conclude that the data block
|
||||
preceded by the data label X contains an object
|
||||
at offset 6 of size twice the word size,
|
||||
and an object at offset 20 of unknown size.
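.PP
The recognition of objects can be sketched as follows.
The word size of 2 bytes, the table of access sizes and the rule that a
known size is kept when a later reference has unknown size are
assumptions made for this illustration; the names are hypothetical.

```python
WORD_SIZE = 2        # assumed word size of the EM program, in bytes
UNKNOWN_SIZE = None  # marks an object whose size cannot be determined

# Size of the accessed object in words; None if the instruction only
# takes the address, so that nothing is known about the size.
SIZE_IN_WORDS = {"LOE": 1, "STE": 1, "LDE": 2, "SDE": 2, "LAE": None}


def record_object(object_table, label, offset, opcode):
    """Note that the data block of label has an object at the given offset."""
    words = SIZE_IN_WORDS[opcode]
    size = UNKNOWN_SIZE if words is None else words * WORD_SIZE
    objects = object_table.setdefault(label, {})
    # Do not overwrite a size that is already known.
    if objects.get(offset) is UNKNOWN_SIZE:
        objects[offset] = size
    return object_table
```

.PP
Applied to the two instructions above, this records an object of
4 bytes at offset 6 and an object of unknown size at offset 20.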
|
||||
.sp
|
||||
A data block entry of the object table is allocated
|
||||
at the first reference to a data label.
|
||||
If this reference is a defining occurrence
|
||||
or an INA pseudo instruction,
|
||||
the label is not externally visible
|
||||
.[~[
|
||||
keizer architecture
|
||||
.], section 11.1.4.3]
|
||||
In this case, the external flag of the data block
|
||||
is turned off.
|
||||
If the first reference is an applied occurrence
|
||||
or an EXA pseudo instruction, the flag is set.
|
||||
We record this information, because the
|
||||
optimizer may change the order of defining and
|
||||
applied occurrences.
|
||||
The INA and EXA pseudos are removed from the EM text.
|
||||
They may be regenerated by the last phase
|
||||
of the optimizer.
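.PP
The rule for the external flag can be sketched as follows.
The table layout and the names of the four kinds of reference are
hypothetical; only the first reference to a label decides the flag.

```python
EXTERNAL_KINDS = ("applied", "EXA")     # first reference sets the flag
INTERNAL_KINDS = ("definition", "INA")  # first reference clears the flag


def note_reference(datablocks, label, kind):
    """Allocate a data block entry at the first reference to label."""
    if kind not in EXTERNAL_KINDS + INTERNAL_KINDS:
        raise ValueError("unknown kind of reference: " + kind)
    if label not in datablocks:
        datablocks[label] = {"external": kind in EXTERNAL_KINDS}
    # Later references leave the flag as it is.
    return datablocks[label]
```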
|
||||
.sp
|
||||
Similar rules hold for the procedure table
|
||||
and the INP and EXP pseudos.
|
||||
.NH 3
|
||||
Source files of IC
|
||||
.PP
|
||||
The source files of IC consist
|
||||
of the files ic.c, ic.h and several packages.
|
||||
.UL ic.h
|
||||
contains type definitions, macros and
|
||||
variable declarations that may be used by
|
||||
ic.c and by every package.
|
||||
.UL ic.c
|
||||
contains the definitions of these variables,
|
||||
the procedure
|
||||
.UL main
|
||||
and some high level I/O routines used by main.
|
||||
.sp
|
||||
Every package xxx consists of two files.
|
||||
ic_xxx.h contains type definitions,
|
||||
macros, variable declarations and
|
||||
procedure declarations that may be used by
|
||||
every .c file that includes this .h file.
|
||||
The file ic_xxx.c provides the
|
||||
definitions of these variables and
|
||||
the implementation of the declared procedures.
|
||||
IC uses the following packages:
|
||||
.IP lookup: 18
|
||||
procedures that look up procedure, data label
|
||||
and instruction label names; procedures to dump
|
||||
the procedure and data label names.
|
||||
.IP lib:
|
||||
one procedure that gets the next useful input module;
|
||||
while scanning archives, it skips unnecessary modules.
|
||||
.IP aux:
|
||||
several auxiliary routines.
|
||||
.IP io:
|
||||
low-level I/O routines that unravel the Compact
|
||||
Assembly Language.
|
||||
.IP put:
|
||||
routines that output the intermediate code.
|
||||
.LP
|
||||