contrib/gcc-5.0/gcc/doc/cfg.texi

   1 @c -*-texinfo-*-
   2 @c Copyright (C) 2001-2015 Free Software Foundation, Inc.
   3 @c This is part of the GCC manual.
   4 @c For copying conditions, see the file gcc.texi.
   5
   6 @c ---------------------------------------------------------------------
   7 @c Control Flow Graph
   8 @c ---------------------------------------------------------------------
   9
  10 @node Control Flow
  11 @chapter Control Flow Graph
  12 @cindex CFG, Control Flow Graph
  13 @findex basic-block.h
  14
  15 A control flow graph (CFG) is a data structure built on top of the
  16 intermediate code representation (the RTL or @code{GIMPLE} instruction
  17 stream) abstracting the control flow behavior of a function that is
  18 being compiled.  The CFG is a directed graph where the vertices
  19 represent basic blocks and edges represent possible transfer of
  20 control flow from one basic block to another.  The data structures
  21 used to represent the control flow graph are defined in
  22 @file{basic-block.h}.
  23
  24 In GCC, the representation of control flow is maintained throughout
  25 the compilation process, from constructing the CFG early in
  26 @code{pass_build_cfg} to @code{pass_free_cfg} (see @file{passes.def}).
  27 The CFG takes various different modes and may undergo extensive
  28 manipulations, but the graph is always valid between its construction
  29 and its release.  This way, transfer of information such as data flow,
  30 a measured profile, or the loop tree, can be propagated through the
  31 passes pipeline, and even from @code{GIMPLE} to @code{RTL}.
  32
  33 Often the CFG may be better viewed as integral part of instruction
  34 chain, than structure built on the top of it.  Updating the compiler's
  35 intermediate representation for instructions can not be easily done
  36 without proper maintenance of the CFG simultaneously.
  37
  38 @menu
  39 * Basic Blocks::           The definition and representation of basic blocks.
  40 * Edges::                  Types of edges and their representation.
  41 * Profile information::    Representation of frequencies and probabilities.
  42 * Maintaining the CFG::    Keeping the control flow graph and up to date.
  43 * Liveness information::   Using and maintaining liveness information.
  44 @end menu
  45
  46
  47 @node Basic Blocks
  48 @section Basic Blocks
  49
  50 @cindex basic block
  51 @findex basic_block
  52 A basic block is a straight-line sequence of code with only one entry
  53 point and only one exit.  In GCC, basic blocks are represented using
  54 the @code{basic_block} data type.
  55
  56 @findex ENTRY_BLOCK_PTR, EXIT_BLOCK_PTR
  57 Special basic blocks represent possible entry and exit points of a
  58 function.  These blocks are called @code{ENTRY_BLOCK_PTR} and
  59 @code{EXIT_BLOCK_PTR}.  These blocks do not contain any code.
  60
  61 @findex BASIC_BLOCK
  62 The @code{BASIC_BLOCK} array contains all basic blocks in an
  63 unspecified order.  Each @code{basic_block} structure has a field
  64 that holds a unique integer identifier @code{index} that is the
  65 index of the block in the @code{BASIC_BLOCK} array.
  66 The total number of basic blocks in the function is
  67 @code{n_basic_blocks}.  Both the basic block indices and
  68 the total number of basic blocks may vary during the compilation
  69 process, as passes reorder, create, duplicate, and destroy basic
  70 blocks.  The index for any block should never be greater than
  71 @code{last_basic_block}.  The indices 0 and 1 are special codes
  72 reserved for @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}, the
  73 indices of @code{ENTRY_BLOCK_PTR} and @code{EXIT_BLOCK_PTR}.
  74
  75 @findex next_bb, prev_bb, FOR_EACH_BB, FOR_ALL_BB
  76 Two pointer members of the @code{basic_block} structure are the
  77 pointers @code{next_bb} and @code{prev_bb}.  These are used to keep
  78 doubly linked chain of basic blocks in the same order as the
  79 underlying instruction stream.  The chain of basic blocks is updated
  80 transparently by the provided API for manipulating the CFG@.  The macro
  81 @code{FOR_EACH_BB} can be used to visit all the basic blocks in
  82 lexicographical order, except @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}.
  83 The macro @code{FOR_ALL_BB} also visits all basic blocks in
  84 lexicographical order, including @code{ENTRY_BLOCK} and @code{EXIT_BLOCK}.
  85
  86 @findex post_order_compute, inverted_post_order_compute, walk_dominator_tree
  87 The functions @code{post_order_compute} and @code{inverted_post_order_compute}
  88 can be used to compute topological orders of the CFG.  The orders are
  89 stored as vectors of basic block indices.  The @code{BASIC_BLOCK} array
  90 can be used to iterate each basic block by index.
  91 Dominator traversals are also possible using
  92 @code{walk_dominator_tree}.  Given two basic blocks A and B, block A
  93 dominates block B if A is @emph{always} executed before B@.
  94
  95 Each @code{basic_block} also contains pointers to the first
  96 instruction (the @dfn{head}) and the last instruction (the @dfn{tail})
  97 or @dfn{end} of the instruction stream contained in a basic block.  In
  98 fact, since the @code{basic_block} data type is used to represent
  99 blocks in both major intermediate representations of GCC (@code{GIMPLE}
 100 and RTL), there are pointers to the head and end of a basic block for
 101 both representations, stored in intermediate representation specific
 102 data in the @code{il} field of @code{struct basic_block_def}.
 103
 104 @findex CODE_LABEL
 105 @findex NOTE_INSN_BASIC_BLOCK
 106 For RTL, these pointers are @code{BB_HEAD} and @code{BB_END}.
 107
 108 @cindex insn notes, notes
 109 @findex NOTE_INSN_BASIC_BLOCK
 110 In the RTL representation of a function, the instruction stream
 111 contains not only the ``real'' instructions, but also @dfn{notes}
 112 or @dfn{insn notes} (to distinguish them from @dfn{reg notes}).
 113 Any function that moves or duplicates the basic blocks needs
 114 to take care of updating of these notes.  Many of these notes expect
 115 that the instruction stream consists of linear regions, so updating
 116 can sometimes be tedious.  All types of insn notes are defined
 117 in @file{insn-notes.def}.
 118
 119 In the RTL function representation, the instructions contained in a
 120 basic block always follow a @code{NOTE_INSN_BASIC_BLOCK}, but zero
 121 or more @code{CODE_LABEL} nodes can precede the block note.
 122 A basic block ends with a control flow instruction or with the last
 123 instruction before the next @code{CODE_LABEL} or
 124 @code{NOTE_INSN_BASIC_BLOCK}.
 125 By definition, a @code{CODE_LABEL} cannot appear in the middle of
 126 the instruction stream of a basic block.
 127
 128 @findex can_fallthru
 129 @cindex table jump
 130 In addition to notes, the jump table vectors are also represented as
 131 ``pseudo-instructions'' inside the insn stream.  These vectors never
 132 appear in the basic block and should always be placed just after the
 133 table jump instructions referencing them.  After removing the
 134 table-jump it is often difficult to eliminate the code computing the
 135 address and referencing the vector, so cleaning up these vectors is
 136 postponed until after liveness analysis.   Thus the jump table vectors
 137 may appear in the insn stream unreferenced and without any purpose.
 138 Before any edge is made @dfn{fall-thru}, the existence of such
 139 construct in the way needs to be checked by calling
 140 @code{can_fallthru} function.
 141
 142 @cindex GIMPLE statement iterators
 143 For the @code{GIMPLE} representation, the PHI nodes and statements
 144 contained in a basic block are in a @code{gimple_seq} pointed to by
 145 the basic block intermediate language specific pointers.
 146 Abstract containers and iterators are used to access the PHI nodes
 147 and statements in a basic blocks.  These iterators are called
 148 @dfn{GIMPLE statement iterators} (GSIs).  Grep for @code{^gsi}
 149 in the various @file{gimple-*} and @file{tree-*} files.
 150 There is a @code{gimple_stmt_iterator} type for iterating over
 151 all kinds of statement, and a @code{gphi_iterator} subclass for
 152 iterating over PHI nodes.
 153 The following snippet will pretty-print all PHI nodes the statements
 154 of the current function in the GIMPLE representation.
 155
 156 @smallexample
 157 basic_block bb;
 158
 159 FOR_EACH_BB (bb)
 160   @{
 161    gphi_iterator pi;
 162    gimple_stmt_iterator si;
 163
 164    for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
 165      @{
 166        gphi *phi = pi.phi ();
 167        print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
 168      @}
 169    for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
 170      @{
 171        gimple stmt = gsi_stmt (si);
 172        print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 173      @}
 174   @}
 175 @end smallexample
 176
 177
 178 @node Edges
 179 @section Edges
 180
 181 @cindex edge in the flow graph
 182 @findex edge
 183 Edges represent possible control flow transfers from the end of some
 184 basic block A to the head of another basic block B@.  We say that A is
 185 a predecessor of B, and B is a successor of A@.  Edges are represented
 186 in GCC with the @code{edge} data type.  Each @code{edge} acts as a
 187 link between two basic blocks: The @code{src} member of an edge
 188 points to the predecessor basic block of the @code{dest} basic block.
 189 The members @code{preds} and @code{succs} of the @code{basic_block} data
 190 type point to type-safe vectors of edges to the predecessors and
 191 successors of the block.
 192
 193 @cindex edge iterators
 194 When walking the edges in an edge vector, @dfn{edge iterators} should
 195 be used.  Edge iterators are constructed using the
 196 @code{edge_iterator} data structure and several methods are available
 197 to operate on them:
 198
 199 @ftable @code
 200 @item ei_start
 201 This function initializes an @code{edge_iterator} that points to the
 202 first edge in a vector of edges.
 203
 204 @item ei_last
 205 This function initializes an @code{edge_iterator} that points to the
 206 last edge in a vector of edges.
 207
 208 @item ei_end_p
 209 This predicate is @code{true} if an @code{edge_iterator} represents
 210 the last edge in an edge vector.
 211
 212 @item ei_one_before_end_p
 213 This predicate is @code{true} if an @code{edge_iterator} represents
 214 the second last edge in an edge vector.
 215
 216 @item ei_next
 217 This function takes a pointer to an @code{edge_iterator} and makes it
 218 point to the next edge in the sequence.
 219
 220 @item ei_prev
 221 This function takes a pointer to an @code{edge_iterator} and makes it
 222 point to the previous edge in the sequence.
 223
 224 @item ei_edge
 225 This function returns the @code{edge} currently pointed to by an
 226 @code{edge_iterator}.
 227
 228 @item ei_safe_safe
 229 This function returns the @code{edge} currently pointed to by an
 230 @code{edge_iterator}, but returns @code{NULL} if the iterator is
 231 pointing at the end of the sequence.  This function has been provided
 232 for existing code makes the assumption that a @code{NULL} edge
 233 indicates the end of the sequence.
 234
 235 @end ftable
 236
 237 The convenience macro @code{FOR_EACH_EDGE} can be used to visit all of
 238 the edges in a sequence of predecessor or successor edges.  It must
 239 not be used when an element might be removed during the traversal,
 240 otherwise elements will be missed.  Here is an example of how to use
 241 the macro:
 242
 243 @smallexample
 244 edge e;
 245 edge_iterator ei;
 246
 247 FOR_EACH_EDGE (e, ei, bb->succs)
 248   @{
 249      if (e->flags & EDGE_FALLTHRU)
 250        break;
 251   @}
 252 @end smallexample
 253
 254 @findex fall-thru
 255 There are various reasons why control flow may transfer from one block
 256 to another.  One possibility is that some instruction, for example a
 257 @code{CODE_LABEL}, in a linearized instruction stream just always
 258 starts a new basic block.  In this case a @dfn{fall-thru} edge links
 259 the basic block to the first following basic block.  But there are
 260 several other reasons why edges may be created.  The @code{flags}
 261 field of the @code{edge} data type is used to store information
 262 about the type of edge we are dealing with.  Each edge is of one of
 263 the following types:
 264
 265 @table @emph
 266 @item jump
 267 No type flags are set for edges corresponding to jump instructions.
 268 These edges are used for unconditional or conditional jumps and in
 269 RTL also for table jumps.  They are the easiest to manipulate as they
 270 may be freely redirected when the flow graph is not in SSA form.
 271
 272 @item fall-thru
 273 @findex EDGE_FALLTHRU, force_nonfallthru
 274 Fall-thru edges are present in case where the basic block may continue
 275 execution to the following one without branching.  These edges have
 276 the @code{EDGE_FALLTHRU} flag set.  Unlike other types of edges, these
 277 edges must come into the basic block immediately following in the
 278 instruction stream.  The function @code{force_nonfallthru} is
 279 available to insert an unconditional jump in the case that redirection
 280 is needed.  Note that this may require creation of a new basic block.
 281
 282 @item exception handling
 283 @cindex exception handling
 284 @findex EDGE_ABNORMAL, EDGE_EH
 285 Exception handling edges represent possible control transfers from a
 286 trapping instruction to an exception handler.  The definition of
 287 ``trapping'' varies.  In C++, only function calls can throw, but for
 288 Java and Ada, exceptions like division by zero or segmentation fault are
 289 defined and thus each instruction possibly throwing this kind of
 290 exception needs to be handled as control flow instruction.  Exception
 291 edges have the @code{EDGE_ABNORMAL} and @code{EDGE_EH} flags set.
 292
 293 @findex purge_dead_edges
 294 When updating the instruction stream it is easy to change possibly
 295 trapping instruction to non-trapping, by simply removing the exception
 296 edge.  The opposite conversion is difficult, but should not happen
 297 anyway.  The edges can be eliminated via @code{purge_dead_edges} call.
 298
 299 @findex REG_EH_REGION, EDGE_ABNORMAL_CALL
 300 In the RTL representation, the destination of an exception edge is
 301 specified by @code{REG_EH_REGION} note attached to the insn.
 302 In case of a trapping call the @code{EDGE_ABNORMAL_CALL} flag is set
 303 too.  In the @code{GIMPLE} representation, this extra flag is not set.
 304
 305 @findex may_trap_p, tree_could_trap_p
 306 In the RTL representation, the predicate @code{may_trap_p} may be used
 307 to check whether instruction still may trap or not.  For the tree
 308 representation, the @code{tree_could_trap_p} predicate is available,
 309 but this predicate only checks for possible memory traps, as in
 310 dereferencing an invalid pointer location.
 311
 312
 313 @item sibling calls
 314 @cindex sibling call
 315 @findex EDGE_ABNORMAL, EDGE_SIBCALL
 316 Sibling calls or tail calls terminate the function in a non-standard
 317 way and thus an edge to the exit must be present.
 318 @code{EDGE_SIBCALL} and @code{EDGE_ABNORMAL} are set in such case.
 319 These edges only exist in the RTL representation.
 320
 321 @item computed jumps
 322 @cindex computed jump
 323 @findex EDGE_ABNORMAL
 324 Computed jumps contain edges to all labels in the function referenced
 325 from the code.  All those edges have @code{EDGE_ABNORMAL} flag set.
 326 The edges used to represent computed jumps often cause compile time
 327 performance problems, since functions consisting of many taken labels
 328 and many computed jumps may have @emph{very} dense flow graphs, so
 329 these edges need to be handled with special care.  During the earlier
 330 stages of the compilation process, GCC tries to avoid such dense flow
 331 graphs by factoring computed jumps.  For example, given the following
 332 series of jumps,
 333
 334 @smallexample
 335   goto *x;
 336   [ @dots{} ]
 337
 338   goto *x;
 339   [ @dots{} ]
 340
 341   goto *x;
 342   [ @dots{} ]
 343 @end smallexample
 344
 345 @noindent
 346 factoring the computed jumps results in the following code sequence
 347 which has a much simpler flow graph:
 348
 349 @smallexample
 350   goto y;
 351   [ @dots{} ]
 352
 353   goto y;
 354   [ @dots{} ]
 355
 356   goto y;
 357   [ @dots{} ]
 358
 359 y:
 360   goto *x;
 361 @end smallexample
 362
 363 @findex pass_duplicate_computed_gotos
 364 However, the classic problem with this transformation is that it has a
 365 runtime cost in there resulting code: An extra jump.  Therefore, the
 366 computed jumps are un-factored in the later passes of the compiler
 367 (in the pass called @code{pass_duplicate_computed_gotos}).
 368 Be aware of that when you work on passes in that area.  There have
 369 been numerous examples already where the compile time for code with
 370 unfactored computed jumps caused some serious headaches.
 371
 372 @item nonlocal goto handlers
 373 @cindex nonlocal goto handler
 374 @findex EDGE_ABNORMAL, EDGE_ABNORMAL_CALL
 375 GCC allows nested functions to return into caller using a @code{goto}
 376 to a label passed to as an argument to the callee.  The labels passed
 377 to nested functions contain special code to cleanup after function
 378 call.  Such sections of code are referred to as ``nonlocal goto
 379 receivers''.  If a function contains such nonlocal goto receivers, an
 380 edge from the call to the label is created with the
 381 @code{EDGE_ABNORMAL} and @code{EDGE_ABNORMAL_CALL} flags set.
 382
 383 @item function entry points
 384 @cindex function entry point, alternate function entry point
 385 @findex LABEL_ALTERNATE_NAME
 386 By definition, execution of function starts at basic block 0, so there
 387 is always an edge from the @code{ENTRY_BLOCK_PTR} to basic block 0.
 388 There is no @code{GIMPLE} representation for alternate entry points at
 389 this moment.  In RTL, alternate entry points are specified by
 390 @code{CODE_LABEL} with @code{LABEL_ALTERNATE_NAME} defined.  This
 391 feature is currently used for multiple entry point prologues and is
 392 limited to post-reload passes only.  This can be used by back-ends to
 393 emit alternate prologues for functions called from different contexts.
 394 In future full support for multiple entry functions defined by Fortran
 395 90 needs to be implemented.
 396
 397 @item function exits
 398 In the pre-reload representation a function terminates after the last
 399 instruction in the insn chain and no explicit return instructions are
 400 used.  This corresponds to the fall-thru edge into exit block.  After
 401 reload, optimal RTL epilogues are used that use explicit (conditional)
 402 return instructions that are represented by edges with no flags set.
 403
 404 @end table
 405
 406
 407 @node Profile information
 408 @section Profile information
 409
 410 @cindex profile representation
 411 In many cases a compiler must make a choice whether to trade speed in
 412 one part of code for speed in another, or to trade code size for code
 413 speed.  In such cases it is useful to know information about how often
 414 some given block will be executed.  That is the purpose for
 415 maintaining profile within the flow graph.
 416 GCC can handle profile information obtained through @dfn{profile
 417 feedback}, but it can also estimate branch probabilities based on
 418 statics and heuristics.
 419
 420 @cindex profile feedback
 421 The feedback based profile is produced by compiling the program with
 422 instrumentation, executing it on a train run and reading the numbers
 423 of executions of basic blocks and edges back to the compiler while
 424 re-compiling the program to produce the final executable.  This method
 425 provides very accurate information about where a program spends most
 426 of its time on the train run.  Whether it matches the average run of
 427 course depends on the choice of train data set, but several studies
 428 have shown that the behavior of a program usually changes just
 429 marginally over different data sets.
 430
 431 @cindex Static profile estimation
 432 @cindex branch prediction
 433 @findex predict.def
 434 When profile feedback is not available, the compiler may be asked to
 435 attempt to predict the behavior of each branch in the program using a
 436 set of heuristics (see @file{predict.def} for details) and compute
 437 estimated frequencies of each basic block by propagating the
 438 probabilities over the graph.
 439
 440 @findex frequency, count, BB_FREQ_BASE
 441 Each @code{basic_block} contains two integer fields to represent
 442 profile information: @code{frequency} and @code{count}.  The
 443 @code{frequency} is an estimation how often is basic block executed
 444 within a function.  It is represented as an integer scaled in the
 445 range from 0 to @code{BB_FREQ_BASE}.  The most frequently executed
 446 basic block in function is initially set to @code{BB_FREQ_BASE} and
 447 the rest of frequencies are scaled accordingly.  During optimization,
 448 the frequency of the most frequent basic block can both decrease (for
 449 instance by loop unrolling) or grow (for instance by cross-jumping
 450 optimization), so scaling sometimes has to be performed multiple
 451 times.
 452
 453 @findex gcov_type
 454 The @code{count} contains hard-counted numbers of execution measured
 455 during training runs and is nonzero only when profile feedback is
 456 available.  This value is represented as the host's widest integer
 457 (typically a 64 bit integer) of the special type @code{gcov_type}.
 458
 459 Most optimization passes can use only the frequency information of a
 460 basic block, but a few passes may want to know hard execution counts.
 461 The frequencies should always match the counts after scaling, however
 462 during updating of the profile information numerical error may
 463 accumulate into quite large errors.
 464
 465 @findex REG_BR_PROB_BASE, EDGE_FREQUENCY
 466 Each edge also contains a branch probability field: an integer in the
 467 range from 0 to @code{REG_BR_PROB_BASE}.  It represents probability of
 468 passing control from the end of the @code{src} basic block to the
 469 @code{dest} basic block, i.e.@: the probability that control will flow
 470 along this edge.  The @code{EDGE_FREQUENCY} macro is available to
 471 compute how frequently a given edge is taken.  There is a @code{count}
 472 field for each edge as well, representing same information as for a
 473 basic block.
 474
 475 The basic block frequencies are not represented in the instruction
 476 stream, but in the RTL representation the edge frequencies are
 477 represented for conditional jumps (via the @code{REG_BR_PROB}
 478 macro) since they are used when instructions are output to the
 479 assembly file and the flow graph is no longer maintained.
 480
 481 @cindex reverse probability
 482 The probability that control flow arrives via a given edge to its
 483 destination basic block is called @dfn{reverse probability} and is not
 484 directly represented, but it may be easily computed from frequencies
 485 of basic blocks.
 486
 487 @findex redirect_edge_and_branch
 488 Updating profile information is a delicate task that can unfortunately
 489 not be easily integrated with the CFG manipulation API@.  Many of the
 490 functions and hooks to modify the CFG, such as
 491 @code{redirect_edge_and_branch}, do not have enough information to
 492 easily update the profile, so updating it is in the majority of cases
 493 left up to the caller.  It is difficult to uncover bugs in the profile
 494 updating code, because they manifest themselves only by producing
 495 worse code, and checking profile consistency is not possible because
 496 of numeric error accumulation.  Hence special attention needs to be
 497 given to this issue in each pass that modifies the CFG@.
 498
 499 @findex REG_BR_PROB_BASE, BB_FREQ_BASE, count
 500 It is important to point out that @code{REG_BR_PROB_BASE} and
 501 @code{BB_FREQ_BASE} are both set low enough to be possible to compute
 502 second power of any frequency or probability in the flow graph, it is
 503 not possible to even square the @code{count} field, as modern CPUs are
 504 fast enough to execute $2^32$ operations quickly.
 505
 506
 507 @node Maintaining the CFG
 508 @section Maintaining the CFG
 509 @findex cfghooks.h
 510
 511 An important task of each compiler pass is to keep both the control
 512 flow graph and all profile information up-to-date.  Reconstruction of
 513 the control flow graph after each pass is not an option, since it may be
 514 very expensive and lost profile information cannot be reconstructed at
 515 all.
 516
 517 GCC has two major intermediate representations, and both use the
 518 @code{basic_block} and @code{edge} data types to represent control
 519 flow.  Both representations share as much of the CFG maintenance code
 520 as possible.  For each representation, a set of @dfn{hooks} is defined
 521 so that each representation can provide its own implementation of CFG
 522 manipulation routines when necessary.  These hooks are defined in
 523 @file{cfghooks.h}.  There are hooks for almost all common CFG
 524 manipulations, including block splitting and merging, edge redirection
 525 and creating and deleting basic blocks.  These hooks should provide
 526 everything you need to maintain and manipulate the CFG in both the RTL
 527 and @code{GIMPLE} representation.
 528
 529 At the moment, the basic block boundaries are maintained transparently
 530 when modifying instructions, so there rarely is a need to move them
 531 manually (such as in case someone wants to output instruction outside
 532 basic block explicitly).
 533
 534 @findex BLOCK_FOR_INSN, gimple_bb
 535 In the RTL representation, each instruction has a
 536 @code{BLOCK_FOR_INSN} value that represents pointer to the basic block
 537 that contains the instruction.  In the @code{GIMPLE} representation, the
 538 function @code{gimple_bb} returns a pointer to the basic block
 539 containing the queried statement.
 540
 541 @cindex GIMPLE statement iterators
 542 When changes need to be applied to a function in its @code{GIMPLE}
 543 representation, @dfn{GIMPLE statement iterators} should be used.  These
 544 iterators provide an integrated abstraction of the flow graph and the
 545 instruction stream.  Block statement iterators are constructed using
 546 the @code{gimple_stmt_iterator} data structure and several modifier are
 547 available, including the following:
 548
 549 @ftable @code
 550 @item gsi_start
 551 This function initializes a @code{gimple_stmt_iterator} that points to
 552 the first non-empty statement in a basic block.
 553
 554 @item gsi_last
 555 This function initializes a @code{gimple_stmt_iterator} that points to
 556 the last statement in a basic block.
 557
 558 @item gsi_end_p
 559 This predicate is @code{true} if a @code{gimple_stmt_iterator}
 560 represents the end of a basic block.
 561
 562 @item gsi_next
 563 This function takes a @code{gimple_stmt_iterator} and makes it point to
 564 its successor.
 565
 566 @item gsi_prev
 567 This function takes a @code{gimple_stmt_iterator} and makes it point to
 568 its predecessor.
 569
 570 @item gsi_insert_after
 571 This function inserts a statement after the @code{gimple_stmt_iterator}
 572 passed in.  The final parameter determines whether the statement
 573 iterator is updated to point to the newly inserted statement, or left
 574 pointing to the original statement.
 575
 576 @item gsi_insert_before
 577 This function inserts a statement before the @code{gimple_stmt_iterator}
 578 passed in.  The final parameter determines whether the statement
 579 iterator is updated to point to the newly inserted statement, or left
 580 pointing to the original  statement.
 581
 582 @item gsi_remove
 583 This function removes the @code{gimple_stmt_iterator} passed in and
 584 rechains the remaining statements in a basic block, if any.
 585 @end ftable
 586
 587 @findex BB_HEAD, BB_END
 588 In the RTL representation, the macros @code{BB_HEAD} and @code{BB_END}
 589 may be used to get the head and end @code{rtx} of a basic block.  No
 590 abstract iterators are defined for traversing the insn chain, but you
 591 can just use @code{NEXT_INSN} and @code{PREV_INSN} instead.  @xref{Insns}.
 592
 593 @findex purge_dead_edges
 594 Usually a code manipulating pass simplifies the instruction stream and
 595 the flow of control, possibly eliminating some edges.  This may for
 596 example happen when a conditional jump is replaced with an
 597 unconditional jump, but also when simplifying possibly trapping
 598 instruction to non-trapping while compiling Java.  Updating of edges
 599 is not transparent and each optimization pass is required to do so
 600 manually.  However only few cases occur in practice.  The pass may
 601 call @code{purge_dead_edges} on a given basic block to remove
 602 superfluous edges, if any.
 603
 604 @findex redirect_edge_and_branch, redirect_jump
 605 Another common scenario is redirection of branch instructions, but
 606 this is best modeled as redirection of edges in the control flow graph
 607 and thus use of @code{redirect_edge_and_branch} is preferred over more
 608 low level functions, such as @code{redirect_jump} that operate on RTL
 609 chain only.  The CFG hooks defined in @file{cfghooks.h} should provide
 610 the complete API required for manipulating and maintaining the CFG@.
 611
 612 @findex split_block
 613 It is also possible that a pass has to insert control flow instruction
 614 into the middle of a basic block, thus creating an entry point in the
 615 middle of the basic block, which is impossible by definition: The
 616 block must be split to make sure it only has one entry point, i.e.@: the
 617 head of the basic block.  The CFG hook @code{split_block} may be used
 618 when an instruction in the middle of a basic block has to become the
 619 target of a jump or branch instruction.
 620
 621 @findex insert_insn_on_edge
 622 @findex commit_edge_insertions
 623 @findex gsi_insert_on_edge
 624 @findex gsi_commit_edge_inserts
 625 @cindex edge splitting
 626 For a global optimizer, a common operation is to split edges in the
 627 flow graph and insert instructions on them.  In the RTL
 628 representation, this can be easily done using the
 629 @code{insert_insn_on_edge} function that emits an instruction
 630 ``on the edge'', caching it for a later @code{commit_edge_insertions}
 631 call that will take care of moving the inserted instructions off the
 632 edge into the instruction stream contained in a basic block.  This
 633 includes the creation of new basic blocks where needed.  In the
 634 @code{GIMPLE} representation, the equivalent functions are
 635 @code{gsi_insert_on_edge} which inserts a block statement
 636 iterator on an edge, and @code{gsi_commit_edge_inserts} which flushes
 637 the instruction to actual instruction stream.
 638
 639 @findex verify_flow_info
 640 @cindex CFG verification
 641 While debugging the optimization pass, the @code{verify_flow_info}
 642 function may be useful to find bugs in the control flow graph updating
 643 code.
 644
 645
 646 @node Liveness information
 647 @section Liveness information
 648 @cindex Liveness representation
 649 Liveness information is useful to determine whether some register is
 650 ``live'' at given point of program, i.e.@: that it contains a value that
 651 may be used at a later point in the program.  This information is
 652 used, for instance, during register allocation, as the pseudo
 653 registers only need to be assigned to a unique hard register or to a
 654 stack slot if they are live.  The hard registers and stack slots may
 655 be freely reused for other values when a register is dead.
 656
 657 Liveness information is available in the back end starting with
 658 @code{pass_df_initialize} and ending with @code{pass_df_finish}.  Three
 659 flavors of live analysis are available: With @code{LR}, it is possible
 660 to determine at any point @code{P} in the function if the register may be
 661 used on some path from @code{P} to the end of the function.  With
 662 @code{UR}, it is possible to determine if there is a path from the
 663 beginning of the function to @code{P} that defines the variable.
 664 @code{LIVE} is the intersection of the @code{LR} and @code{UR} and a
 665 variable is live at @code{P} if there is both an assignment that reaches
 666 it from the beginning of the function and a use that can be reached on
 667 some path from @code{P} to the end of the function.
 668
 669 In general @code{LIVE} is the most useful of the three.  The macros
 670 @code{DF_[LR,UR,LIVE]_[IN,OUT]} can be used to access this information.
 671 The macros take a basic block number and return a bitmap that is indexed
 672 by the register number.  This information is only guaranteed to be up to
 673 date after calls are made to @code{df_analyze}.  See the file
 674 @code{df-core.c} for details on using the dataflow.
 675
 676
 677 @findex REG_DEAD, REG_UNUSED
 678 The liveness information is stored partly in the RTL instruction stream
 679 and partly in the flow graph.  Local information is stored in the
 680 instruction stream: Each instruction may contain @code{REG_DEAD} notes
 681 representing that the value of a given register is no longer needed, or
 682 @code{REG_UNUSED} notes representing that the value computed by the
 683 instruction is never used.  The second is useful for instructions
 684 computing multiple values at once.
 685