Delete 689 undead files.
These files "magically reappeared" after the conversion from CVS to Mercurial. The old CVS repository deleted these files but did not record *when* it deleted these files. The conversion resurrected these files because they have no history of deletion. These files were probably deleted before year 1995. The CVS repository begins to record deletions around 1995. These files may still appear in older revisions of this Mercurial repository, when they should already be deleted. There is no way to fix this, because the CVS repository provides no dates of deletion. See http://sourceforge.net/mailarchive/message.php?msg_id=29823032
This commit is contained in:
@@ -1,6 +0,0 @@
|
||||
PIC=pic
|
||||
TBL=tbl
|
||||
REFER=refer
|
||||
|
||||
../ceg.doc: ceg.tr ceg.ref
|
||||
$(PIC) ceg.tr | $(REFER) -e -p ceg.ref | $(TBL) > $@
|
||||
@@ -1,284 +0,0 @@
|
||||
.TL
|
||||
|
||||
Code Expander
|
||||
.br
|
||||
(proposal)
|
||||
|
||||
.SH
|
||||
Introduction
|
||||
.LP
|
||||
The \fBcode expander\fR, \fBce\fR, is a program that translates EM-code to
|
||||
objectcode. The main goal is to translate very fast. \fBce\fR is an instance
|
||||
of the EM_CODE(3L)-interface. During execution of \fBce\fR, \fBce\fR will build
|
||||
in core a machine independent objectfile ( NEW A.OUT(5L)). With \fBcv\fR or
|
||||
with routines supplied by the user the machine independent objectcode will
|
||||
be converted to a machine dependent object code. \fBce\fR needs
|
||||
information about the targetmachine (e.g. the opcode's). We divide the
|
||||
information into two parts:
|
||||
.IP
|
||||
- The description in assembly instructions of EM-code instructions.
|
||||
.IP
|
||||
- The description in objectcode of assembly instructions.
|
||||
.LP
|
||||
With these two tables we can make a \fBcode expander generator\fR which
|
||||
generates a \fBce\fR. It is possible to put the information in one table
|
||||
but that will probably introduce (propable) more bugs in the table. So we
|
||||
divide and conquer. With this approach it is also possible to generate
|
||||
assembly code ( rather yhan objectcode), wich is useful for debugging.
|
||||
There is of course a link between the two tables, the link
|
||||
consist of a restriction on the assembly format. Every assembly
|
||||
instruction must have the following format:
|
||||
.sp
|
||||
INSTR ::= LABEL : MNEMONIC [ OPERAND ( "," OPERAND)* ]
|
||||
.sp
|
||||
.LP
|
||||
\fBCeg\fR uses the following algorithm:
|
||||
.IP \0\0a)
|
||||
The assembly table will be converted to a (C-)routine assemble().
|
||||
assemble() gets as argument a string, the assembler instruction,
|
||||
and can use the MNEMONIC to execute the corresponding action in the
|
||||
assembly table.
|
||||
.IP \0\0b)
|
||||
The routine assemble() can now be used to convert the EM-code table to
|
||||
a set of C-routines, wich together form an instance of the
|
||||
EM_CODE(3L).
|
||||
.SH
|
||||
The EM-instruction table
|
||||
.LP
|
||||
We use the following grammar:
|
||||
.sp
|
||||
.TS
|
||||
center box ;
|
||||
l.
|
||||
TABLE ::= (ROW)*
|
||||
ROW ::= C_instr ( SPECIAL | SIMPLE)
|
||||
SPECIAL ::= ( CONDITION SIMPLE)+ 'default' SIMPLE
|
||||
SIMPLE ::= '==>' ACTIONLIST | '::=' ACTIONLIST
|
||||
ACTIONLIST ::= [ ACTION ( ';' ACTION)* ] '.'
|
||||
ACTION ::= function-call | assembly-instruction
|
||||
.TE
|
||||
.LP
|
||||
An example for the 8086:
|
||||
.LP
|
||||
.DS
|
||||
C_lxl
|
||||
$arg1 == 0 ==> "push bp".
|
||||
$arg1 == 1 ==> "push EM_BSIZE(bp)".
|
||||
default ==> "mov cx, $arg1";
|
||||
"mov si, bp";
|
||||
"1: mov si, EM_BSIZE(si);
|
||||
"loop 1b"
|
||||
"push si".
|
||||
.DE
|
||||
.sp
|
||||
Some remarks:
|
||||
.sp
|
||||
* The C_instr is a function indentifier in the EM_CODE(3L)-interface.
|
||||
.LP
|
||||
* CONDITION is a "boolean" C-expression.
|
||||
.LP
|
||||
* The arguments of an EM-instruction can be used in CONDITION and in assembly
|
||||
instructions. They are referred by $arg\fIi\fR. \fBceg\fR modifies the
|
||||
arguments as follows:
|
||||
.IP \0\0-
|
||||
For local variables at positive offsets it increases this offset by EM_BSIZE
|
||||
.IP \0\0-
|
||||
It makes names en labels unique. The user must supply the formats (see mach.h).
|
||||
.LP
|
||||
* function-call is allowed to implement e.g. push/pop optimization.
|
||||
For example:
|
||||
.LP
|
||||
.DS
|
||||
C_adi
|
||||
$arg1 == 2 ==> combine( "pop ax");
|
||||
combine( "pop bx");
|
||||
"add ax, bx";
|
||||
save( "push ax").
|
||||
default ==> arg_error( "C_adi", $arg1).
|
||||
.DE
|
||||
.LP
|
||||
* The C-functions called in the EM-instructions table have to use the routine
|
||||
assemble()/gen?(). "assembler-instr" is in fact assemble( "assembler-instr").
|
||||
.LP
|
||||
* \fBceg\fR takes care not only about the conversions of arguments but also
|
||||
about
|
||||
changes between segments. There are situation when one doesn't want
|
||||
conversion of arguments. This can be done by using ::= in stead of ==>.
|
||||
This is usefull when two C_instr are equivalent. For example:
|
||||
.IP
|
||||
C_slu ::= C_sli( $arg1)
|
||||
.LP
|
||||
* There are EM-CODE instructions wich are machine independent (e.g. C_open()).
|
||||
For these EM_CODE instructions \fBceg\fR will generate \fIdefault\fR-
|
||||
instructions. There is one exception: in the case of C_pro() the tablewriter
|
||||
has to supply a function prolog().
|
||||
.LP
|
||||
* Also the EM-pseudoinstructions C_bss_\fIcstp\fR(), C_hol_\fIcstp\fR(),
|
||||
C_con_\fIcstp\fR() and C_rom_\fIcstp\fR can be translated automaticly.
|
||||
\fBceg\fR only has to know how to interpretate string-constants:
|
||||
.DS
|
||||
\&..icon $arg2 == 1 ==> gen1( (char) atoi( $arg1))
|
||||
$arg2 == 2 ==> gen2( atoi( $arg1))
|
||||
$arg2 == 4 ==> gen4( atol( $arg1))
|
||||
\&..ucon $arg2 == 1 ==> gen1( (char) atoi( $arg1))
|
||||
$arg2 == 2 ==> gen2( atoi( $arg1))
|
||||
$arg2 == 4 ==> gen4( atol( $arg1))
|
||||
\&..fcon ::= not_implemented( "..fcon")
|
||||
.DE
|
||||
.LP
|
||||
* Still, life can be made easier for the tablewriter; For the routines wich
|
||||
he/she didn't implement \fBceg\fR will generate a default instruction wich
|
||||
generates an error-message. \fBceg\fR seems to generate :
|
||||
.IP
|
||||
C_xxx ::= not_implemented( "C_xxx")
|
||||
.SH
|
||||
The assembly table
|
||||
.LP
|
||||
How to map assembly on objectcode.
|
||||
.LP
|
||||
Each row in the table consists of two fields, one field for the assembly
|
||||
instruction, the other field for the corresponding objectcode. The tablewriter
|
||||
can use the following primitives to generate code for the machine
|
||||
instructions :
|
||||
.IP "\0\0gen1( b)\0\0:" 17
|
||||
generates one byte in de machine independent objectfile.
|
||||
.IP "\0\0gen2( w)\0\0:" 17
|
||||
generates one word ( = two bytes), the table writer can change the byte
|
||||
order by setting the flag BYTES_REVERSED.
|
||||
.IP "\0\0gen4( l)\0\0:" 17
|
||||
generates two words ( = four bytes), the table writer can change the word
|
||||
order by setting the flag WORDS_REVERSED.
|
||||
.IP "\0\0reloc( n, o, r)\0\0:" 17
|
||||
generates relocation information for a label ( = name + offset +
|
||||
relocationtype).
|
||||
.LP
|
||||
Besides these primitives the table writer may use his self written
|
||||
C-functions. This allows the table writer e.g. to write functions to set
|
||||
bitfields within a byte.
|
||||
.LP
|
||||
There are more or less two methods to encode the assembly instructions:
|
||||
.IP \0\0a)
|
||||
MNEMONIC and OPERAND('s) are encoded independently of each other. This can be
|
||||
done when the target machine has an orthogonal instruction set (e.g. pdp-11).
|
||||
.IP \0\0b)
|
||||
MNEMONIC and OPERAND('s) together determine the opcode. In this case the
|
||||
assembler often uses overloading: one MNEMONIC is used for several
|
||||
different machine-instructions. For example : (8086)
|
||||
.br
|
||||
mov ax, bx
|
||||
.br
|
||||
mov ax, variable
|
||||
.br
|
||||
These instructions have different opcodes.
|
||||
.LP
|
||||
As the transformation MNEMONIC-OPCODE is not one to
|
||||
one the table writer must be allowed to put restrictions on the operands.
|
||||
This can be done with type declarations. For example:
|
||||
.LP
|
||||
.DS
|
||||
mov dst:REG, src:MEM ==>
|
||||
gen1( 0x8b);
|
||||
modRM( op2.reg, op1);
|
||||
.DE
|
||||
.DS
|
||||
mov dst:REG, src:REG ==>
|
||||
gen1( 0x89);
|
||||
modRM( op2.reg, op1);
|
||||
.DE
|
||||
.LP
|
||||
modRM() is a function written by the tablewriter and is used to encode
|
||||
the operands. This frees the table writer of endless typing.
|
||||
.LP
|
||||
The table writer has to do the "typechecking" by himself. But typechecking
|
||||
is almost the same as operand decoding. So it's more efficient to do this
|
||||
in one function. We now have all the tools to describe the function
|
||||
assemble().
|
||||
.IP
|
||||
assemble() first calls the function
|
||||
decode_operand() ( by the table writer written), with two arguments: a
|
||||
string ( the operand) and a
|
||||
pointer to a struct. The struct is declared by the table writer and must
|
||||
consist of at least a field called type. ( the other fields in the struct can
|
||||
be used to remember information about the decoded operand.) Now assemble()
|
||||
fires a row wich is selected by mapping the MNEMONIC and the type of the
|
||||
operands.
|
||||
.br
|
||||
In the second field of a row there may be references to other
|
||||
fields in the struct (e.g. op2.reg in the example above).
|
||||
.LP
|
||||
We ignored one problem. It's possible when the operands are encoded, that
|
||||
not everything is known. For example $arg\fIi\fR arguments in the
|
||||
EM-instruction table get their value at runtime. This problem is solved by
|
||||
introducing a function eval(). eval() has a string as argument and returns
|
||||
an arith. The string consists of constants and/or $arg\fIi\fR's and the value
|
||||
returned by eval() is the value of the string. To encode the $arg\fIi\fR's
|
||||
in as few bytes as possible the table writer can use the statements %if,
|
||||
%else and %endif. They can be used in the same manner as #if, #else and
|
||||
#endif in C and result in a runtime test. An example :
|
||||
.LP
|
||||
.DS
|
||||
-- Some rows of the assembly table
|
||||
|
||||
mov dst:REG, src:DATA ==>
|
||||
%if sfit( eval( src), 8) /* does the immediate-data fit in 1 byte? */
|
||||
R53( 0x16 , op1.reg);
|
||||
gen1( eval( src));
|
||||
%else
|
||||
R53( 0x17 , op1.reg);
|
||||
gen2( eval( src));
|
||||
%endif
|
||||
.LD
|
||||
|
||||
mov dst:REG, src:REG ==>
|
||||
gen1( 0x8b);
|
||||
modRM( op1.reg, op2);
|
||||
|
||||
.DE
|
||||
.DS
|
||||
-- The corresponding part in the function assemble() :
|
||||
|
||||
case MNEM_mov :
|
||||
decode_operand( arg1, &op1);
|
||||
decode_operand( arg2, &op2);
|
||||
if ( REG( op1.type) && DATA( op2.type)) {
|
||||
printf( "if ( sfit( %s, 8)) {\\\\n", eval( src));
|
||||
R53( 0x16 , op1.reg);
|
||||
printf( "gen1( %s)\\\\n", eval( arg2));
|
||||
printf( "}\\\\nelse {\\\\n");
|
||||
R53( 0x17 , op1.reg);
|
||||
printf( "gen2( %s)\\\\n", eval( arg2));
|
||||
printf( "}\\\\n");
|
||||
}
|
||||
else if ( REG( op1.type) && REG( op2.type)) {
|
||||
gen1( 0x8b);
|
||||
modRM( op1.reg, op2);
|
||||
}
|
||||
|
||||
|
||||
.DE
|
||||
.DS
|
||||
-- Some rows of the right part of the EM-instruction table are translated
|
||||
-- in the following C-functions.
|
||||
|
||||
"mov ax, $arg1" ==>
|
||||
if ( sfit( w, 8)) { /* w is the actual argument of C_xxx( w) */
|
||||
gen1( 176); /* R53() */
|
||||
gen1( w);
|
||||
}
|
||||
else {
|
||||
gen1( 184);
|
||||
gen2( w);
|
||||
}
|
||||
.LD
|
||||
|
||||
"mov ax, bx" ==>
|
||||
gen1( 138);
|
||||
gen1( 99); /* modRM() */
|
||||
.DE
|
||||
.SH
|
||||
Restrictions
|
||||
.LP
|
||||
.IP \0\01)
|
||||
The EM-instructions C_exc() is not implemented.
|
||||
.IP \0\03)
|
||||
All messages are ignored.
|
||||
@@ -1,276 +0,0 @@
|
||||
.TL
|
||||
A prototype Code expander
|
||||
.NH
|
||||
Introduction
|
||||
.PP
|
||||
A program to be compiled with ACK is first fed into the preprocessor.
|
||||
The output of the preprocessor goes into the appropiate front end,
|
||||
whose job it is to produce EM. The EM code generated is
|
||||
fed into the peephole optimizer, wich scans it with a window of few
|
||||
instructions, replacing certain inefficient code sequences by better
|
||||
ones. Following the peephole optimizer follows a backend wich produces
|
||||
good assembly code. The assembly code goes into the assembler and the objectcode
|
||||
then goes into the loader/linker, the final component in the pipeline.
|
||||
.PP
|
||||
For various applications this scheme is too slow. For example for testing
|
||||
programs; In this case the program has to be translated fast and the
|
||||
runtime of the objectcode may be slower. A solution is to build a code
|
||||
expander ( \fBce\fR) wich translates EM code to objectcode. Of course this
|
||||
has to
|
||||
be done automaticly by a code expander generator, but to get some feeling
|
||||
for the problem we started out to build prototypes.
|
||||
We built two types of ce's. One wich tranlated EM to assembly, one
|
||||
wich translated EM to objectcode.
|
||||
.NH
|
||||
EM to assembly
|
||||
.PP
|
||||
We made one for the 8086 and one for the vax4. These ce's are instances of the
|
||||
EM_CODE(3L)-interface and produce for a single EM instruction a set
|
||||
of assembly instruction wich are semantic equivalent.
|
||||
We implemented in the 8086-ce push/pop-optimalization.
|
||||
.NH
|
||||
EM to objectcode
|
||||
.PP
|
||||
Instead of producing assembly code we tried to produce vax4-objectcode.
|
||||
During execution of ce, ce builds in core a machine independent
|
||||
objectfile ( NEW A.OUT(5L)) and just before dumping the tables this
|
||||
objectfile is converted to a Berkly 4.2BSD a.out-file. We build two versions;
|
||||
One with static memory allocation and one with dynamic memory allocation.
|
||||
If the first one runs out of memory it will give an error message and stop,
|
||||
the second one will allocate more memory and proceed with producing
|
||||
objectcode.
|
||||
.PP
|
||||
The C-frontend calls the EM_CODE-interface. So after linking the frontend
|
||||
and the ce we have a pipeline in a program saving a lot of i/o.
|
||||
It is interesting to compare this C-compiler ( called fcemcom) with "cc -c".
|
||||
fcemcom1 (the dynamic variant of fcemcom) is tuned in such a way, that
|
||||
alloc() won't be called.
|
||||
.NH 2
|
||||
Compile time
|
||||
.PP
|
||||
fac.c is a small program that produces n! ( see below). foo.c is small program
|
||||
that loops a lot.
|
||||
.TS
|
||||
center, box, tab(:);
|
||||
c | c | c | c | c | c
|
||||
c | c | n | n | n | n.
|
||||
compiler : program : real : user : sys : object size
|
||||
=
|
||||
fcemcom : sort.c : 31.0 : 17.5 : 1.8 : 23824
|
||||
fcemcom1 : : 59.0 : 21.2 : 3.3 :
|
||||
cc -c : : 50.0 : 38.0 : 3.5 : 6788
|
||||
_
|
||||
fcemcom : ed.c : 37.0 : 23.6 : 2.3 : 41744
|
||||
fcemcom1 : : 1.16.0 : 28.3 : 4.6 :
|
||||
cc -c : : 1.19.0 : 54.8 : 4.3 : 11108
|
||||
_
|
||||
fcemcom : cp.c : 4.0 : 2.4 : 0.8 : 4652
|
||||
fcemcom1 : : 9.0 : 3.0 : 1.0 :
|
||||
cc -c : : 8.0 : 5.2 : 1.6 : 1048
|
||||
_
|
||||
fcemcom : uniq.c : 5.0 : 2.5 : 0.8 : 5568
|
||||
fcemcom1 : : 9.0 : 2.9 : 0.8 :
|
||||
cc -c : : 13.0 : 5.4 : 2.0 : 3008
|
||||
_
|
||||
fcemcom : btlgrep.c : 24.0 : 7.2 : 1.4 : 12968
|
||||
fcemcom1 : : 23.0 : 8.1 : 1.2 :
|
||||
cc -c : : 1.20.0 : 15.3 : 3.8 : 2392
|
||||
_
|
||||
fcemcom : fac.c : 1.0 : 0.1 : 0.5 : 216
|
||||
fecmcom1 : : 2.0 : 0.2 : 0.5 :
|
||||
cc -c : : 3.0 : 0.7 : 1.3 : 92
|
||||
_
|
||||
fcemcom : foo.c : 4.0 : 0.2 : 0.5 : 272
|
||||
fcemcom1 : : 11.0 : 0.3 : 0.5 :
|
||||
cc -c : : 7.0 : 0.8 : 1.6 : 108
|
||||
.TE
|
||||
.NH 2
|
||||
Run time
|
||||
.LP
|
||||
Is the runtime very bad?
|
||||
.TS
|
||||
tab(:), box, center;
|
||||
c | c | c | c | c
|
||||
c | c | n | n | n.
|
||||
compiler : program : real : user : system
|
||||
=
|
||||
fcem : sort.c : 22.0 : 17.5 : 1.5
|
||||
cc : : 5.0 : 2.4 : 1.1
|
||||
_
|
||||
fcem : btlgrep.c : 1.58.0 : 27.2 : 4.2
|
||||
cc : : 12.0 : 3.6 : 1.1
|
||||
_
|
||||
fcem : foo.c : 1.0 : 0.7 : 0.1
|
||||
cc : : 1.0 : 0.4 : 0.1
|
||||
_
|
||||
fcem : uniq.c : 2.0 : 0.5 : 0.3
|
||||
cc : : 1.0 : 0.1 : 0.2
|
||||
.TE
|
||||
.NH 2
|
||||
quality object code
|
||||
.LP
|
||||
The runtime is very bad so its interesting to have look at the code which is
|
||||
produced by fcemcom and by cc -c. I took a program which computes recursively
|
||||
n!.
|
||||
.DS
|
||||
long fac();
|
||||
|
||||
main()
|
||||
{
|
||||
int n;
|
||||
|
||||
scanf( "%D", &n);
|
||||
printf( "fac is %D\\\\n", fac( n));
|
||||
}
|
||||
|
||||
long fac( n)
|
||||
int n;
|
||||
{
|
||||
if ( n == 0)
|
||||
return( 1);
|
||||
else
|
||||
return( n * fac( n-1));
|
||||
}
|
||||
.DE
|
||||
.br
|
||||
.br
|
||||
.br
|
||||
.br
|
||||
.LP
|
||||
"cc -c fac.c" produces :
|
||||
.DS
|
||||
fac: tstl 4(ap)
|
||||
bnequ 7f
|
||||
movl $1, r0
|
||||
ret
|
||||
7f: subl3 $1, 4(ap), r0
|
||||
pushl r0
|
||||
call $1, fac
|
||||
movl r0, -4(fp)
|
||||
mull3 -4(fp), 4(ap), r0
|
||||
ret
|
||||
.DE
|
||||
.br
|
||||
.br
|
||||
.LP
|
||||
"fcem fac.c fac.o" produces :
|
||||
.DS
|
||||
_fac: 0
|
||||
42: jmp be
|
||||
48: pushl 4(ap)
|
||||
4e: pushl $0
|
||||
54: subl2 (sp)+,(sp)
|
||||
57: tstl (sp)+
|
||||
59: bnequ 61
|
||||
5b: jmp 67
|
||||
61: jmp 79
|
||||
67: pushl $1
|
||||
6d: jmp ba
|
||||
73: jmp b9
|
||||
79: pushl 4(ap)
|
||||
7f: pushl $1
|
||||
85: subl2 (sp)+,(sp)
|
||||
88: calls $0,_fac
|
||||
8f: addl2 $4,sp
|
||||
96: pushl r0
|
||||
98: pushl 4(ap)
|
||||
9e: pushl $4
|
||||
a4: pushl $4
|
||||
aa: jsb .cii
|
||||
b0: mull2 (sp)+,(sp)
|
||||
b3: jmp ba
|
||||
b9: ret
|
||||
ba: movl (sp)+,r0
|
||||
bd: ret
|
||||
be: jmp 48
|
||||
.DE
|
||||
.NH 1
|
||||
Conclusions
|
||||
.PP
|
||||
comparing "cc -c" with "fcemcom"
|
||||
.LP
|
||||
.TS
|
||||
center, box, tab(:);
|
||||
c | c s | c | c s
|
||||
^ | c s | ^ | c s
|
||||
^ | c | c | ^ | c | c
|
||||
l | n | n | n | n | n.
|
||||
program : compile time : object size : runtime
|
||||
:_::_
|
||||
: user : sys :: user : sys
|
||||
=
|
||||
sort.c : 0.47 : 0.5 : 3.5 : 7.3 : 1.4
|
||||
_
|
||||
ed.c : 0.46 : 0.5 : 3.8 : : :
|
||||
_
|
||||
cp.c : 0.46 : 0.5 : 4.4 : : :
|
||||
_
|
||||
uniq.c : 0.46 : 0.4 : 1.8 : : :
|
||||
_
|
||||
btlgrep.c : 0.47 : 0.3 : 5.4 : 7.5 : 3.8
|
||||
_
|
||||
fac.c : 0.14 : 0.4 : 2.3 : 1.8 : 1.0
|
||||
_
|
||||
foo.c : 0.25 : 0.3 : 2.5 : 5.0 : 1.5
|
||||
.TE
|
||||
.PP
|
||||
The results for fcemcom1 are almost identical; The only thing that changes
|
||||
is that fcemcom1 is 1.2 slower than fcemcom. ( compile time) This is due to
|
||||
to an another datastructure . In the static version we use huge array's for
|
||||
the text- and
|
||||
data-segment, the relocation information, the symboltable and stringarea.
|
||||
In the dynamic version we use linked lists, wich makes it expensive to get
|
||||
and to put a byte on a abritrary memory location. So it is probably better
|
||||
to use realloc(), because in the most cases there will be enough memory.
|
||||
.PP
|
||||
The quality of the objectcode is very bad. The reason is that the frontend
|
||||
generates bad code and expects the peephole-optimizer to improve the code.
|
||||
This is also one of the main reasons that the runtime is very bad.
|
||||
(e.g. the expensive "cii" with arguments 4 and 4 could be deleted.)
|
||||
So its seems a good
|
||||
idea to put a new peephole-optimizer between the frontend and the ce.
|
||||
.PP
|
||||
Using the peephole optimizer the ce would produce :
|
||||
.DS
|
||||
_fac: 0
|
||||
pushl 4(ap)
|
||||
tstl (sp)+
|
||||
beqlu 1f
|
||||
jmp 3f
|
||||
1 : pushl $1
|
||||
jmp 2f
|
||||
3 : pushl 4(ap)
|
||||
decl (sp)
|
||||
calls $0,_fac
|
||||
addl2 $4,sp
|
||||
pushl r0
|
||||
pushl 4(ap)
|
||||
mull2 (sp)+,(sp)
|
||||
movl (sp)+,r0
|
||||
2 : ret
|
||||
.DE
|
||||
.PP
|
||||
Bruce McKenzy already implemented it and made some improvements in the
|
||||
source code of the ce. The compile-time is two to two and a half times better
|
||||
and the
|
||||
size of the objectcode is two to three times bigger.(comparing with "cc -c")
|
||||
Still we could do better.
|
||||
.PP
|
||||
Using peephole- and push/pop-optimization ce could produce :
|
||||
.DS
|
||||
_fac: 0
|
||||
tstl 4(ap)
|
||||
beqlu 1f
|
||||
jmp 2f
|
||||
1 : pushl $1
|
||||
jmp 3f
|
||||
2 : decl 4(ap)
|
||||
calls $0,_fac
|
||||
addl2 $4,sp
|
||||
mull3 4(ap), r0, -(sp)
|
||||
movl (sp)+, r0
|
||||
3 : ret
|
||||
.DE
|
||||
.PP
|
||||
prof doesn't cooperate, so no profile information.
|
||||
.PP
|
||||
Reference in New Issue
Block a user