usr.bin/gprof/PSD.doc/gathering.me

   1 .\" Copyright (c) 1982, 1993
   2 .\"     The Regents of the University of California.  All rights reserved.
   3 .\"
   4 .\" Redistribution and use in source and binary forms, with or without
   5 .\" modification, are permitted provided that the following conditions
   6 .\" are met:
   7 .\" 1. Redistributions of source code must retain the above copyright
   8 .\"    notice, this list of conditions and the following disclaimer.
   9 .\" 2. Redistributions in binary form must reproduce the above copyright
  10 .\"    notice, this list of conditions and the following disclaimer in the
  11 .\"    documentation and/or other materials provided with the distribution.
  12 .\" 3. All advertising materials mentioning features or use of this software
  13 .\"    must display the following acknowledgement:
  14 .\"     This product includes software developed by the University of
  15 .\"     California, Berkeley and its contributors.
  16 .\" 4. Neither the name of the University nor the names of its contributors
  17 .\"    may be used to endorse or promote products derived from this software
  18 .\"    without specific prior written permission.
  19 .\"
  20 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  21 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  22 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  23 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  24 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  25 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  26 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  27 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  28 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  29 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  30 .\" SUCH DAMAGE.
  31 .\"
  32 .\"     @(#)gathering.me        8.1 (Berkeley) 6/8/93
  33 .\"
  34 .sh 1 "Gathering Profile Data"
  35 .pp
  36 Routine calls or statement executions can be measured by having a
  37 compiler augment the code at strategic points.
  38 The additions can be inline increments to counters [Knuth71]
  39 [Satterthwaite72] [Joy79] or calls to
  40 monitoring routines [Unix].
  41 The counter increment overhead is low, and is suitable for
  42 profiling statements.
  43 A call of the monitoring routine has an overhead comparable with a
  44 call of a regular routine, and is therefore only suited
  45 to profiling on a routine by routine basis.
  46 However, the monitoring routine solution has certain advantages.
  47 Whatever counters are needed by the monitoring routine can be
  48 managed by the monitoring routine itself, rather than being
  49 distributed around the code.
  50 In particular, a monitoring routine can easily be called from separately
  51 compiled programs.
  52 In addition, different monitoring routines can be linked into the
  53 program
  54 being measured
  55 to assemble different profiling data without having to
  56 change the compiler or recompile the program.
  57 We have exploited this approach;
  58 our compilers for C, Fortran77, and Pascal can insert calls to a
  59 monitoring routine in the prologue for each routine.
  60 Use of the monitoring routine requires no planning on part of a
  61 programmer other than to request that augmented routine
  62 prologues be produced during compilation.
  63 .pp
  64 We are interested in gathering three pieces of information during
  65 program execution: call counts and execution times for
  66 each profiled routine, and the arcs of the dynamic call graph
  67 traversed by this execution of the program.
  68 By post-processing of this data we can build the dynamic call
  69 graph for this execution of the program and propagate times along
  70 the edges of this graph to attribute times for routines to the
  71 routines that invoke them.
  72 .pp
  73 Gathering of the profiling information should not greatly
  74 interfere with the running of the program.
  75 Thus, the monitoring routine must not produce trace output each
  76 time it is invoked.
  77 The volume of data thus produced would be unmanageably large,
  78 and the time required to record it would overwhelm the running
  79 time of most programs.
  80 Similarly, the monitoring routine can not do the analysis of
  81 the profiling data (e.g. assembling the call graph, propagating
  82 times around it, discovering cycles, etc.) during program
  83 execution.
  84 Our solution is to gather profiling data in memory during program
  85 execution and to condense it to a file as the profiled
  86 program exits.
  87 This file is then processed by a separate program to produce the
  88 listing of the profile data.
  89 An advantage of this approach is that the profile data for
  90 several executions of a program can be combined by the
  91 post-processing to provide a profile of many
  92 executions.
  93 .pp
  94 The execution time monitoring consists of three parts.
  95 The first part allocates and initializes the runtime monitoring data
  96 structures before the program begins execution.
  97 The second part is the monitoring routine invoked from the
  98 prologue of each profiled routine.
  99 The third part condenses the data structures and writes them
 100 to a file as the program terminates.
 101 The monitoring routine is discussed in detail in the following sections.
 102 .sh 2 "Execution Counts"
 103 .pp
 104 The \fBgprof\fP monitoring routine counts the number of times
 105 each profiled routine is called.
 106 The monitoring routine also records the arc in the call graph
 107 that activated the profiled routine.
 108 The count is associated with the arc in the call graph
 109 rather than with the routine.
 110 Call counts for routines can then be determined by summing the counts
 111 on arcs directed into that routine.
 112 In a machine-dependent fashion, the monitoring routine notes its
 113 own return address.
 114 This address is in the prologue of some profiled routine that is
 115 the destination of an arc in the dynamic call graph.
 116 The monitoring routine also discovers the return address for that
 117 routine, thus identifying the call site, or source of the arc.
 118 The source of the arc is in the \fIcaller\fP, and the destination is in
 119 the \fIcallee\fP.
 120 For example, if a routine A calls a routine B, A is the caller,
 121 and B is the callee.
 122 The prologue of B will include a call to the monitoring routine
 123 that will note the arc from A to B and either initialize or
 124 increment a counter for that arc.
 125 .pp
 126 One can not afford to have the monitoring routine output tracing
 127 information as each arc is identified.
 128 Therefore, the monitoring routine maintains a table of all the
 129 arcs discovered, with counts of the numbers of times each is
 130 traversed during execution.
 131 This table is accessed once per routine call.
 132 Access to it
 133 must be as fast as possible so as not to overwhelm the time
 134 required to execute the program.
 135 .pp
 136 Our solution is to access the table through a hash table.
 137 We use the call site as the primary key with the callee
 138 address being the secondary key.
 139 Since each call site typically calls only one callee, we can
 140 reduce (usually to one) the number of minor lookups based on the callee.
 141 Another alternative would use the callee as the primary key and the
 142 call site as the secondary key.
 143 Such an organization has the advantage of associating callers with
 144 callees, at the expense of longer lookups in the monitoring
 145 routine.
 146 We are fortunate to be running in a virtual memory environment,
 147 and (for the sake of speed) were able to allocate enough space
 148 for the primary hash table to allow a one-to-one mapping from
 149 call site addresses to the primary hash table.
 150 Thus our hash function is trivial to calculate and collisions
 151 occur only for call sites that call multiple
 152 destinations (e.g. functional parameters and functional variables).
 153 A one level hash function using both call site and callee would
 154 result in an unreasonably large hash table.
 155 Further, the number of dynamic call sites and callees is not known during
 156 execution of the profiled program.
 157 .pp
 158 Not all callers and callees can be identified by the monitoring
 159 routine.
 160 Routines that were compiled without the profiling augmentations
 161 will not call the monitoring routine as part of their prologue,
 162 and thus no arcs will be recorded whose destinations are in these
 163 routines.
 164 One need not profile all the routines in a program.
 165 Routines that are not profiled run at full speed.
 166 Certain routines, notably exception handlers, are invoked by
 167 non-standard calling sequences.
 168 Thus the monitoring routine may know the destination of an arc
 169 (the callee),
 170 but find it difficult or
 171 impossible to determine the source of the arc (the caller).
 172 Often in these cases the apparent source of the arc is not a call
 173 site at all.
 174 Such anomalous invocations are declared ``spontaneous''.
 175 .sh 2 "Execution Times"
 176 .pp
 177 The execution times for routines can be gathered in at least two
 178 ways.
 179 One method measures the execution time of a routine by measuring
 180 the elapsed time from routine entry to routine exit.
 181 Unfortunately, time measurement is complicated on time-sharing
 182 systems by the time-slicing of the program.
 183 A second method samples the value of the program counter at some
 184 interval, and infers execution time from the distribution of the
 185 samples within the program.
 186 This technique is particularly suited to time-sharing systems,
 187 where the time-slicing can serve as the basis for sampling
 188 the program counter.
 189 Notice that, whereas the first method could provide exact timings,
 190 the second is inherently a statistical approximation.
 191 .pp
 192 The sampling method need not require support from the operating
 193 system:  all that is needed is the ability to set and respond to
 194 ``alarm clock'' interrupts that run relative to program time.
 195 It is imperative that the intervals be uniform since the
 196 sampling of the program counter rather than the duration of the
 197 interval is the basis of the distribution.
 198 If sampling is done too often, the interruptions to sample the
 199 program counter will overwhelm the running of the profiled program.
 200 On the other hand, the program must run for enough sampled
 201 intervals that the distribution of the samples accurately
 202 represents the distribution of time for the execution of the
 203 program.
 204 As with routine call tracing, the monitoring routine can not
 205 afford to output information for each program counter
 206 sample.
 207 In our computing environment, the operating system can provide a
 208 histogram of the location of the program counter at the end of
 209 each clock tick (1/60th of a second) in which a program runs.
 210 The histogram is assembled in memory as the program runs.
 211 This facility is enabled by our monitoring routine.
 212 We have adjusted the granularity of the histogram so that
 213 program counter values map one-to-one onto the histogram.
 214 We make the simplifying assumption that all calls to a specific
 215 routine require the same amount of time to execute.
 216 This assumption may disguise that some calls
 217 (or worse, some call sites) always invoke a routine
 218 such that its execution is faster (or slower)
 219 than the average time for that routine.
 220 .pp
 221 When the profiled program terminates,
 222 the arc table and the histogram of
 223 program counter samples is written to a file.
 224 The arc table is condensed to consist of the source and destination
 225 addresses of the arc and the count of the number of times the arc
 226 was traversed by this execution of the program.
 227 The recorded histogram consists of counters of the number of
 228 times the program counter was found to be in each of the ranges covered
 229 by the histogram.
 230 The ranges themselves are summarized as a
 231 lower and upper bound and a step size.