Added

1991-09-27 16:19:24 +00:00
parent 63c9fea5c2
commit fb51183da2
29 changed files with 2085 additions and 0 deletions
--- a/doc/sparc/A
+++ b/doc/sparc/A
@@ -0,0 +1,184 @@
+.so init
+.SH
+A. MEASUREMENTS
+.SH
+A.1. \*(OQThe bottom line\*(CQ
+.PP
+Although examples are often the most illustrative, the cruel world out there is
+usually more interested in everyday performance figures. To satisfy those
+readers too, we present a series of measurements on our code expander
+taken from (close to) real-life situations. These include the
+compile and run times of different programs,
+compiled with different compilers.
+.SH
+A.2. Compile time measurements
+.PP
+Figure A.2.1 shows compile-time measurements for typical C code:
+the dhrystone benchmark\(dg
+.[ [
+dhrystone
+.]].
+.FS
+\(dg To be certain that we only tested the compiler and not the quality of
+the code in the library, we have added our own version of
+\fIstrcmp\fR and \fIstrcpy\fR and have not used the ones present in the
+library.
+.FE
+The numbers represent the duration of each separate pass of the compiler.
+The numbers at the end of each bar represent the total duration of the
+compilation process. As with all measurements in this chapter, the
+quoted time or duration is the sum of user and system time in seconds.
+.PS
+copy "pics/compile_bars"
+.PE
+.DS
+.IP cem: 6
+C to EM frontend
+.IP opt:
+EM peep-hole optimizer
+.IP be:
+EM to assembler backend
+.IP cpp:
+Sun's C preprocessor
+.IP ccom:
+Sun's C compiler
+.IP iropt:
+Sun's optimizer
+.IP cg:
+Sun's code generator
+.IP as:
+Sun's assembler
+.IP ld:
+Sun's linker
+.ce 1
+\fIFigure A.2.1: compile-time measurements.\fR
+.DE
+.sp
+.PP
+A close examination of the first two bars in figure A.2.1 shows that the maximum
+achievable compile-time
+gain compared to \fIcc\fR is about 50% for medium-sized
+programs.\(dd
+.FS
+\(dd (cpp+ccom+as+ld)/(cem+as+ld) = 1.53
+.FE
+For small programs the gain will be less, due to the almost constant
+start-up time of each pass in the compilation process. Only a
+built-in assembler could increase this number, up to
+180% in the ideal case in which the optimizer, backend and assembler
+run in zero time. Speed-ups of 5 to 10 times as mentioned in
+.[ [
+fast portable compilers
+.]]
+are therefore not possible on the Sun-4 family. This is also due to
+Sun's implementation of saving and restoring register windows.
+The current implementation saves or restores only a single window
+on each register-window overflow or underflow, which is very time consuming
+for programs with highly dynamic stack use
+due to procedure calls (as is often the case with compilers).
+.PP
+Although we are currently a little slower than \fIcc\fR, it is hard to
+blame this on our backend. Optimizing the backend so that it would run
+twice as fast would shorten the total compilation time by
+a mere 14%.
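+.PP
+The 14% can be checked with a small Amdahl-style calculation.
+What follows is a minimal Python sketch, not part of the measurements:
+the name total_reduction is hypothetical, and the backend share of 28%
+is an assumption, namely the value implied by the 14% quoted above
+rather than a separately measured number.

```python
def total_reduction(share, speedup):
    """Fraction of total compile time saved when one pass that accounts
    for `share` of the total runs `speedup` times faster."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return share * (1.0 - 1.0 / speedup)


# Assumption: a backend share of 0.28 is the value implied by the 14%
# quoted in the text (0.14 = share * (1 - 1/2)); it is not measured here.
BACKEND_SHARE = 0.28

if __name__ == "__main__":
    # Saving obtained from a backend that runs twice as fast.
    print(f"{total_reduction(BACKEND_SHARE, 2.0):.0%}")
```

+Under the same assumption, even an infinitely fast backend could save
+no more than its own share of the total compilation time.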
+.PP
+Finally, it is nice to see that our push-pop optimization,
+initially designed to generate faster code, has also increased the
+compilation speed (see also figures A.4.1 and A.4.2).
+.SH
+A.3. Run time performance
+.PP
+Figure A.3.1 shows the run-time performance of different compilers.
+All results are normalized so that the best available compiler (Sun's
+compiler with full optimization) scores 1.0 on our scale.
+.PS
+copy "pics/run-time_bars"
+.PE
+.ce 1
+\fIFigure A.3.1: run time performance.\fR
+.sp 1
+.PP
+Our compiler behaves rather poorly compared to Sun's
+compiler because the dhrystone benchmark makes
+relatively many subroutine calls, all of which have to be 'emulated'
+by our backend.
+.SH
+A.4. Overall performance
+.LP
+In the next two figures we will show the combined run and compile time
+performance of 'our' compiler (the ACK C frontend and our backend)
+compared to Sun's C compiler. Figure A.4.1 shows the results from
+measurements on the dhrystone benchmark.
+.G1
+frame invis left solid bot solid
+label left "run time" "(in \(*msec/dhrystone)"
+label bot "compile time (in sec)"
+coord x 0,21 y 0,610
+ticks left out from 0 to 600 by 200
+ticks bot out from 0 to 20 by 5
+"\(bu" at 3.5, 1000000/1700
+"ack w/o opt" ljust at 3.5 + 1, 1000000/1700
+"\(bu" at 2.8, 1000000/8770
+"ack with opt" below at 2.8 + 0.1, 1000000/8770
+"\(bu" at 16.0, 1000000/10434
+"ack -O4" above at 16.0, 1000000/10434
+"\(bu" at 2.3, 1000000/7270
+"\fIcc\fR" above at 2.3, 1000000/7270
+"\(bu" at 9.0, 1000000/12500
+"\fIcc -O4\fR" above at 9.0, 1000000/12500
+"\(bu" at 5.9, 1000000/15250
+"\fIcc -O\fR" below at 5.9, 1000000/15250
+.G2
+.ce 1
+\fIFigure A.4.1: overall performance on dhrystones.\fR
+.sp 1
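+.LP
+The vertical positions in the plot source of figure A.4.1 are written
+as 1000000 divided by a score. The following minimal Python sketch
+spells out that conversion. It assumes that each score is a
+dhrystones-per-second figure, which is an inference from the axis label;
+the names usec_per_dhrystone and SCORES are hypothetical.

```python
def usec_per_dhrystone(dhrystones_per_sec):
    """Convert a dhrystones-per-second score to microseconds per dhrystone."""
    if dhrystones_per_sec <= 0:
        raise ValueError("score must be positive")
    return 1_000_000 / dhrystones_per_sec


# Denominators copied from the plot source of figure A.4.1.
# Assumption: each one is a dhrystones-per-second score.
SCORES = {
    "ack w/o opt": 1700,
    "ack with opt": 8770,
    "ack -O4": 10434,
    "cc": 7270,
    "cc -O4": 12500,
    "cc -O": 15250,
}

if __name__ == "__main__":
    for name, score in SCORES.items():
        print(f"{name:12s} {usec_per_dhrystone(score):6.1f} usec/dhrystone")
```

+A lower value means faster generated code, which is why the point
+for unoptimized ack code sits near the top of the plot.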
+.LP
+Fortunately for us, dhrystones are not all there is. The following
+figure shows the same measurements as the previous one, except that
+this time we took a benchmark that uses no subroutines, an implementation
+of Eratosthenes' sieve:
+.G1
+frame invis left solid bot solid
+label left "run time" "for one run" "(in sec)" left .6
+label bot "compile time (in sec)"
+coord x 0,11 y 0,21
+ticks bot out from 0 to 10 by 5
+ticks left out from 0 to 20 by 5
+"\(bu" at 2.5, 17.28
+"ack w/o opt" above at 2.5, 17.28
+"\(bu" at 1.6, 2.93
+"ack with opt" above at 1.6, 2.93
+"\(bu" at 9.4, 2.26
+"ack -O4" above at 9.4, 2.26
+"\(bu" at 1.5, 7.43
+"\fIcc\fR" above at 1.5, 7.43
+"\(bu" at 2.7, 2.02
+"\fIcc -O4\fR" ljust at 1.9, 1.2
+"\(bu" at 2.6, 2.10
+"\fIcc -O\fR" ljust at 3.1,2.5
+.G2
+.ce 1
+\fIFigure A.4.2: overall performance on Eratosthenes' sieve.\fR
+.sp 1
+.PP
+Although the above figures speak for themselves, a short comment
+may be in order. First, it is clear that our compiler neither compiles
+faster than \fIcc\fR nor produces faster code than \fIcc -O4\fR. It should
+also be noted, however, that we do produce better code than \fIcc\fR
+at only a very small additional cost in compile time.
+It is also worth noting that push-pop optimization
+increases run-time speed as well as compile speed.
+The former seems rather obvious,
+since optimized code is
+faster code, but the increase in compile speed may come as a surprise.
+The main reason is that the \fIas\fR+\fIld\fR time depends largely on the
+amount of generated code, which in general
+depends on the efficiency of that code.
+Push-pop optimization removes a lot of useless instructions which
+would otherwise
+have found their way through to the assembler and the loader.
+Useless instructions inserted at an early stage of the compilation
+process slow down every following stage, so eliminating them
+at an early stage, even when it requires a little computational
+overhead, can often benefit the overall compilation speed.
+.bp