contrib/gcc-4.0/gcc/doc/cfg.texi

   1 @c -*-texinfo-*-
   2 @c Copyright (C) 2001, 2003, 2004 Free Software Foundation, Inc.
   3 @c This is part of the GCC manual.
   4 @c For copying conditions, see the file gcc.texi.
   5
   6 @c ---------------------------------------------------------------------
   7 @c Control Flow Graph
   8 @c ---------------------------------------------------------------------
   9
  10 @node Control Flow
  11 @chapter Control Flow Graph
  12 @cindex CFG, Control Flow Graph
  13 @findex basic-block.h
  14
  15 A control flow graph (CFG) is a data structure built on top of the
  16 intermediate code representation (the RTL or @code{tree} instruction
  17 stream) abstracting the control flow behavior of a function that is
  18 being compiled.  The CFG is a directed graph where the vertices
  19 represent basic blocks and edges represent possible transfer of
  20 control flow from one basic block to another.  The data structures
  21 used to represent the control flow graph are defined in
  22 @file{basic-block.h}.
  23
  24 @menu
  25 * Basic Blocks::           The definition and representation of basic blocks.
  26 * Edges::                  Types of edges and their representation.
  27 * Profile information::    Representation of frequencies and probabilities.
  28 * Maintaining the CFG::    Keeping the control flow graph and up to date.
  29 * Liveness information::   Using and maintaining liveness information.
  30 @end menu
  31
  32
  33 @node Basic Blocks
  34 @section Basic Blocks
  35
  36 @cindex basic block
  37 @findex basic_block
  38 A basic block is a straight-line sequence of code with only one entry
  39 point and only one exit.  In GCC, basic blocks are represented using
  40 the @code{basic_block} data type.
  41
  42 @findex next_bb, prev_bb, FOR_EACH_BB
  43 Two pointer members of the @code{basic_block} structure are the
  44 pointers @code{next_bb} and @code{prev_bb}.  These are used to keep
  45 doubly linked chain of basic blocks in the same order as the
  46 underlying instruction stream.  The chain of basic blocks is updated
  47 transparently by the provided API for manipulating the CFG@.  The macro
  48 @code{FOR_EACH_BB} can be used to visit all the basic blocks in
  49 lexicographical order.  Dominator traversals are also possible using
  50 @code{walk_dominator_tree}.  Given two basic blocks A and B, block A
  51 dominates block B if A is @emph{always} executed before B@.
  52
  53 @findex BASIC_BLOCK
  54 The @code{BASIC_BLOCK} array contains all basic blocks in an
  55 unspecified order.  Each @code{basic_block} structure has a field
  56 that holds a unique integer identifier @code{index} that is the
  57 index of the block in the @code{BASIC_BLOCK} array.
  58 The total number of basic blocks in the function is
  59 @code{n_basic_blocks}.  Both the basic block indices and
  60 the total number of basic blocks may vary during the compilation
  61 process, as passes reorder, create, duplicate, and destroy basic
  62 blocks.  The index for any block should never be greater than
  63 @code{last_basic_block}.
  64
  65 @findex ENTRY_BLOCK_PTR, EXIT_BLOCK_PTR
  66 Special basic blocks represent possible entry and exit points of a
  67 function.  These blocks are called @code{ENTRY_BLOCK_PTR} and
  68 @code{EXIT_BLOCK_PTR}.  These blocks do not contain any code, and are
  69 not elements of the @code{BASIC_BLOCK} array.  Therefore they have
  70 been assigned unique, negative index numbers.
  71
  72 Each @code{basic_block} also contains pointers to the first
  73 instruction (the @dfn{head}) and the last instruction (the @dfn{tail})
  74 or @dfn{end} of the instruction stream contained in a basic block.  In
  75 fact, since the @code{basic_block} data type is used to represent
  76 blocks in both major intermediate representations of GCC (@code{tree}
  77 and RTL), there are pointers to the head and end of a basic block for
  78 both representations.
  79
  80 @findex NOTE_INSN_BASIC_BLOCK, CODE_LABEL, notes
  81 For RTL, these pointers are @code{rtx head, end}.  In the RTL function
  82 representation, the head pointer always points either to a
  83 @code{NOTE_INSN_BASIC_BLOCK} or to a @code{CODE_LABEL}, if present.
  84 In the RTL representation of a function, the instruction stream
  85 contains not only the ``real'' instructions, but also @dfn{notes}.
  86 Any function that moves or duplicates the basic blocks needs
  87 to take care of updating of these notes.  Many of these notes expect
  88 that the instruction stream consists of linear regions, making such
  89 updates difficult.   The @code{NOTE_INSN_BASIC_BLOCK} note is the only
  90 kind of note that may appear in the instruction stream contained in a
  91 basic block.  The instruction stream of a basic block always follows a
  92 @code{NOTE_INSN_BASIC_BLOCK},  but zero or more @code{CODE_LABEL}
  93 nodes can precede the block note.   A basic block ends by control flow
  94 instruction or last instruction before following @code{CODE_LABEL} or
  95 @code{NOTE_INSN_BASIC_BLOCK}.  A @code{CODE_LABEL} cannot appear in
  96 the instruction stream of a basic block.
  97
  98 @findex can_fallthru
  99 @cindex table jump
 100 In addition to notes, the jump table vectors are also represented as
 101 ``pseudo-instructions'' inside the insn stream.  These vectors never
 102 appear in the basic block and should always be placed just after the
 103 table jump instructions referencing them.  After removing the
 104 table-jump it is often difficult to eliminate the code computing the
 105 address and referencing the vector, so cleaning up these vectors is
 106 postponed until after liveness analysis.   Thus the jump table vectors
 107 may appear in the insn stream unreferenced and without any purpose.
 108 Before any edge is made @dfn{fall-thru}, the existence of such
 109 construct in the way needs to be checked by calling
 110 @code{can_fallthru} function.
 111
 112 @cindex block statement iterators
 113 For the @code{tree} representation, the head and end of the basic block
 114 are being pointed to by the @code{stmt_list} field, but this special
 115 @code{tree} should never be referenced directly.  Instead, at the tree
 116 level abstract containers and iterators are used to access statements
 117 and expressions in basic blocks.  These iterators are called
 118 @dfn{block statement iterators} (BSIs).  Grep for @code{^bsi}
 119 in the various @file{tree-*} files.
 120 The following snippet will pretty-print all the statements of the
 121 program in the GIMPLE representation.
 122
 123 @smallexample
 124 FOR_EACH_BB (bb)
 125   @{
 126      block_stmt_iterator si;
 127
 128      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
 129        @{
 130           tree stmt = bsi_stmt (si);
 131           print_generic_stmt (stderr, stmt, 0);
 132        @}
 133   @}
 134 @end smallexample
 135
 136
 137 @node Edges
 138 @section Edges
 139
 140 @cindex edge in the flow graph
 141 @findex edge
 142 Edges represent possible control flow transfers from the end of some
 143 basic block A to the head of another basic block B@.  We say that A is
 144 a predecessor of B, and B is a successor of A@.  Edges are represented
 145 in GCC with the @code{edge} data type.  Each @code{edge} acts as a
 146 link between two basic blocks: the @code{src} member of an edge
 147 points to the predecessor basic block of the @code{dest} basic block.
 148 The members @code{preds} and @code{succs} of the @code{basic_block} data
 149 type point to type-safe vectors of edges to the predecessors and
 150 successors of the block.
 151
 152 @cindex edge iterators
 153 When walking the edges in an edge vector, @dfn{edge iterators} should
 154 be used.  Edge iterators are constructed using the
 155 @code{edge_iterator} data structure and several methods are available
 156 to operate on them:
 157
 158 @ftable @code
 159 @item ei_start
 160 This function initializes an @code{edge_iterator} that points to the
 161 first edge in a vector of edges.
 162
 163 @item ei_last
 164 This function initializes an @code{edge_iterator} that points to the
 165 last edge in a vector of edges.
 166
 167 @item ei_end_p
 168 This predicate is @code{true} if an @code{edge_iterator} represents
 169 the last edge in an edge vector.
 170
 171 @item ei_one_before_end_p
 172 This predicate is @code{true} if an @code{edge_iterator} represents
 173 the second last edge in an edge vector.
 174
 175 @item ei_next
 176 This function takes a pointer to an @code{edge_iterator} and makes it
 177 point to the next edge in the sequence.
 178
 179 @item ei_prev
 180 This function takes a pointer to an @code{edge_iterator} and makes it
 181 point to the previous edge in the sequence.
 182
 183 @item ei_edge
 184 This function returns the @code{edge} currently pointed to by an
 185 @code{edge_iterator}.
 186
 187 @item ei_safe_safe
 188 This function returns the @code{edge} currently pointed to by an
 189 @code{edge_iterator}, but returns @code{NULL} if the iterator is
 190 pointing at the end of the sequence.  This function has been provided
 191 for existing code makes the assumption that a @code{NULL} edge
 192 indicates the end of the sequence.
 193
 194 @end ftable
 195
 196 The convenience macro @code{FOR_EACH_EDGE} can be used to visit all of
 197 the edges in a sequence of predecessor or successor edges.  It must
 198 not be used when an element might be removed during the traversal,
 199 otherwise elements will be missed.  Here is an example of how to use
 200 the macro:
 201
 202 @smallexample
 203 edge e;
 204 edge_iterator ei;
 205
 206 FOR_EACH_EDGE (e, ei, bb->succs)
 207   @{
 208      if (e->flags & EDGE_FALLTHRU)
 209        break;
 210   @}
 211 @end smallexample
 212
 213 @findex fall-thru
 214 There are various reasons why control flow may transfer from one block
 215 to another.  One possibility is that some instruction, for example a
 216 @code{CODE_LABEL}, in a linearized instruction stream just always
 217 starts a new basic block.  In this case a @dfn{fall-thru} edge links
 218 the basic block to the first following basic block.  But there are
 219 several other reasons why edges may be created.  The @code{flags}
 220 field of the @code{edge} data type is used to store information
 221 about the type of edge we are dealing with.  Each edge is of one of
 222 the following types:
 223
 224 @table @emph
 225 @item jump
 226 No type flags are set for edges corresponding to jump instructions.
 227 These edges are used for unconditional or conditional jumps and in
 228 RTL also for table jumps.  They are the easiest to manipulate as they
 229 may be freely redirected when the flow graph is not in SSA form.
 230
 231 @item fall-thru
 232 @findex EDGE_FALLTHRU, force_nonfallthru
 233 Fall-thru edges are present in case where the basic block may continue
 234 execution to the following one without branching.  These edges have
 235 the @code{EDGE_FALLTHRU} flag set.  Unlike other types of edges, these
 236 edges must come into the basic block immediately following in the
 237 instruction stream.  The function @code{force_nonfallthru} is
 238 available to insert an unconditional jump in the case that redirection
 239 is needed.  Note that this may require creation of a new basic block.
 240
 241 @item exception handling
 242 @cindex exception handling
 243 @findex EDGE_ABNORMAL, EDGE_EH
 244 Exception handling edges represent possible control transfers from a
 245 trapping instruction to an exception handler.  The definition of
 246 ``trapping'' varies.  In C++, only function calls can throw, but for
 247 Java, exceptions like division by zero or segmentation fault are
 248 defined and thus each instruction possibly throwing this kind of
 249 exception needs to be handled as control flow instruction.  Exception
 250 edges have the @code{EDGE_ABNORMAL} and @code{EDGE_EH} flags set.
 251
 252 @findex purge_dead_edges
 253 When updating the instruction stream it is easy to change possibly
 254 trapping instruction to non-trapping, by simply removing the exception
 255 edge.  The opposite conversion is difficult, but should not happen
 256 anyway.  The edges can be eliminated via @code{purge_dead_edges} call.
 257
 258 @findex REG_EH_REGION, EDGE_ABNORMAL_CALL
 259 In the RTL representation, the destination of an exception edge is
 260 specified by @code{REG_EH_REGION} note attached to the insn.
 261 In case of a trapping call the @code{EDGE_ABNORMAL_CALL} flag is set
 262 too.  In the @code{tree} representation, this extra flag is not set.
 263
 264 @findex may_trap_p, tree_could_trap_p
 265 In the RTL representation, the predicate @code{may_trap_p} may be used
 266 to check whether instruction still may trap or not.  For the tree
 267 representation, the @code{tree_could_trap_p} predicate is available,
 268 but this predicate only checks for possible memory traps, as in
 269 dereferencing an invalid pointer location.
 270
 271
 272 @item sibling calls
 273 @cindex sibling call
 274 @findex EDGE_ABNORMAL, EDGE_SIBCALL
 275 Sibling calls or tail calls terminate the function in a non-standard
 276 way and thus an edge to the exit must be present.
 277 @code{EDGE_SIBCALL} and @code{EDGE_ABNORMAL} are set in such case.
 278 These edges only exist in the RTL representation.
 279
 280 @item computed jumps
 281 @cindex computed jump
 282 @findex EDGE_ABNORMAL
 283 Computed jumps contain edges to all labels in the function referenced
 284 from the code.  All those edges have @code{EDGE_ABNORMAL} flag set.
 285 The edges used to represent computed jumps often cause compile time
 286 performance problems, since functions consisting of many taken labels
 287 and many computed jumps may have @emph{very} dense flow graphs, so
 288 these edges need to be handled with special care.  During the earlier
 289 stages of the compilation process, GCC tries to avoid such dense flow
 290 graphs by factoring computed jumps.  For example, given the following
 291 series of jumps,
 292
 293 @smallexample
 294   goto *x;
 295   [ ... ]
 296
 297   goto *x;
 298   [ ... ]
 299
 300   goto *x;
 301   [ ... ]
 302 @end smallexample
 303
 304 @noindent
 305 factoring the computed jumps results in the following code sequence
 306 which has a much simpler flow graph:
 307
 308 @smallexample
 309   goto y;
 310   [ ... ]
 311
 312   goto y;
 313   [ ... ]
 314
 315   goto y;
 316   [ ... ]
 317
 318 y:
 319   goto *x;
 320 @end smallexample
 321
 322 However, the classic problem with this transformation is that it has a
 323 runtime cost in there resulting code: An extra jump.  Therefore, the
 324 computed jumps are un-factored in the later passes of the compiler.
 325 Be aware of that when you work on passes in that area.  There have
 326 been numerous examples already where the compile time for code with
 327 unfactored computed jumps caused some serious headaches.
 328
 329 @item nonlocal goto handlers
 330 @cindex nonlocal goto handler
 331 @findex EDGE_ABNORMAL, EDGE_ABNORMAL_CALL
 332 GCC allows nested functions to return into caller using a @code{goto}
 333 to a label passed to as an argument to the callee.  The labels passed
 334 to nested functions contain special code to cleanup after function
 335 call.  Such sections of code are referred to as ``nonlocal goto
 336 receivers''.  If a function contains such nonlocal goto receivers, an
 337 edge from the call to the label is created with the
 338 @code{EDGE_ABNORMAL} and @code{EDGE_ABNORMAL_CALL} flags set.
 339
 340 @item function entry points
 341 @cindex function entry point, alternate function entry point
 342 @findex LABEL_ALTERNATE_NAME
 343 By definition, execution of function starts at basic block 0, so there
 344 is always an edge from the @code{ENTRY_BLOCK_PTR} to basic block 0.
 345 There is no @code{tree} representation for alternate entry points at
 346 this moment.  In RTL, alternate entry points are specified by
 347 @code{CODE_LABEL} with @code{LABEL_ALTERNATE_NAME} defined.  This
 348 feature is currently used for multiple entry point prologues and is
 349 limited to post-reload passes only.  This can be used by back-ends to
 350 emit alternate prologues for functions called from different contexts.
 351 In future full support for multiple entry functions defined by Fortran
 352 90 needs to be implemented.
 353
 354 @item function exits
 355 In the pre-reload representation a function terminates after the last
 356 instruction in the insn chain and no explicit return instructions are
 357 used.  This corresponds to the fall-thru edge into exit block.  After
 358 reload, optimal RTL epilogues are used that use explicit (conditional)
 359 return instructions that are represented by edges with no flags set.
 360
 361 @end table
 362
 363
 364 @node Profile information
 365 @section Profile information
 366
 367 @cindex profile representation
 368 In many cases a compiler must make a choice whether to trade speed in
 369 one part of code for speed in another, or to trade code size for code
 370 speed.  In such cases it is useful to know information about how often
 371 some given block will be executed.  That is the purpose for
 372 maintaining profile within the flow graph.
 373 GCC can handle profile information obtained through @dfn{profile
 374 feedback}, but it can also  estimate branch probabilities based on
 375 statics and heuristics.
 376
 377 @cindex profile feedback
 378 The feedback based profile is produced by compiling the program with
 379 instrumentation, executing it on a train run and reading the numbers
 380 of executions of basic blocks and edges back to the compiler while
 381 re-compiling the program to produce the final executable.  This method
 382 provides very accurate information about where a program spends most
 383 of its time on the train run.  Whether it matches the average run of
 384 course depends on the choice of train data set, but several studies
 385 have shown that the behavior of a program usually changes just
 386 marginally over different data sets.
 387
 388 @cindex Static profile estimation
 389 @cindex branch prediction
 390 @findex predict.def
 391 When profile feedback is not available, the compiler may be asked to
 392 attempt to predict the behavior of each branch in the program using a
 393 set of heuristics (see @file{predict.def} for details) and compute
 394 estimated frequencies of each basic block by propagating the
 395 probabilities over the graph.
 396
 397 @findex frequency, count, BB_FREQ_BASE
 398 Each @code{basic_block} contains two integer fields to represent
 399 profile information: @code{frequency} and @code{count}.  The
 400 @code{frequency} is an estimation how often is basic block executed
 401 within a function.  It is represented as an integer scaled in the
 402 range from 0 to @code{BB_FREQ_BASE}.  The most frequently executed
 403 basic block in function is initially set to @code{BB_FREQ_BASE} and
 404 the rest of frequencies are scaled accordingly.  During optimization,
 405 the frequency of the most frequent basic block can both decrease (for
 406 instance by loop unrolling) or grow (for instance by cross-jumping
 407 optimization), so scaling sometimes has to be performed multiple
 408 times.
 409
 410 @findex gcov_type
 411 The @code{count} contains hard-counted numbers of execution measured
 412 during training runs and is nonzero only when profile feedback is
 413 available.  This value is represented as the host's widest integer
 414 (typically a 64 bit integer) of the special type @code{gcov_type}.
 415
 416 Most optimization passes can use only the frequency information of a
 417 basic block, but a few passes may want to know hard execution counts.
 418 The frequencies should always match the counts after scaling, however
 419 during updating of the profile information numerical error may
 420 accumulate into quite large errors.
 421
 422 @findex REG_BR_PROB_BASE, EDGE_FREQUENCY
 423 Each edge also contains a branch probability field: an integer in the
 424 range from 0 to @code{REG_BR_PROB_BASE}.  It represents probability of
 425 passing control from the end of the @code{src} basic block to the
 426 @code{dest} basic block, i.e.@: the probability that control will flow
 427 along this edge.   The @code{EDGE_FREQUENCY} macro is available to
 428 compute how frequently a given edge is taken.  There is a @code{count}
 429 field for each edge as well, representing same information as for a
 430 basic block.
 431
 432 The basic block frequencies are not represented in the instruction
 433 stream, but in the RTL representation the edge frequencies are
 434 represented for conditional jumps (via the @code{REG_BR_PROB}
 435 macro) since they are used when instructions are output to the
 436 assembly file and the flow graph is no longer maintained.
 437
 438 @cindex reverse probability
 439 The probability that control flow arrives via a given edge to its
 440 destination basic block is called @dfn{reverse probability} and is not
 441 directly represented, but it may be easily computed from frequencies
 442 of basic blocks.
 443
 444 @findex redirect_edge_and_branch
 445 Updating profile information is a delicate task that can unfortunately
 446 not be easily integrated with the CFG manipulation API@.  Many of the
 447 functions and hooks to modify the CFG, such as
 448 @code{redirect_edge_and_branch}, do not have enough information to
 449 easily update the profile, so updating it is in the majority of cases
 450 left up to the caller.  It is difficult to uncover bugs in the profile
 451 updating code, because they manifest themselves only by producing
 452 worse code, and checking profile consistency is not possible because
 453 of numeric error accumulation.  Hence special attention needs to be
 454 given to this issue in each pass that modifies the CFG@.
 455
 456 @findex REG_BR_PROB_BASE, BB_FREQ_BASE, count
 457 It is important to point out that @code{REG_BR_PROB_BASE} and
 458 @code{BB_FREQ_BASE} are both set low enough to be possible to compute
 459 second power of any frequency or probability in the flow graph, it is
 460 not possible to even square the @code{count} field, as modern CPUs are
 461 fast enough to execute $2^32$ operations quickly.
 462
 463
 464 @node Maintaining the CFG
 465 @section Maintaining the CFG
 466 @findex cfghooks.h
 467
 468 An important task of each compiler pass is to keep both the control
 469 flow graph and all profile information up-to-date.  Reconstruction of
 470 the control flow graph after each pass is not an option, since it may be
 471 very expensive and lost profile information cannot be reconstructed at
 472 all.
 473
 474 GCC has two major intermediate representations, and both use the
 475 @code{basic_block} and @code{edge} data types to represent control
 476 flow.  Both representations share as much of the CFG maintenance code
 477 as possible.  For each representation, a set of @dfn{hooks} is defined
 478 so that each representation can provide its own implementation of CFG
 479 manipulation routines when necessary.  These hooks are defined in
 480 @file{cfghooks.h}.  There are hooks for almost all common CFG
 481 manipulations, including block splitting and merging, edge redirection
 482 and creating and deleting basic blocks.  These hooks should provide
 483 everything you need to maintain and manipulate the CFG in both the RTL
 484 and @code{tree} representation.
 485
 486 At the moment, the basic block boundaries are maintained transparently
 487 when modifying instructions, so there rarely is a need to move them
 488 manually (such as in case someone wants to output instruction outside
 489 basic block explicitly).
 490 Often the CFG may be better viewed as integral part of instruction
 491 chain, than structure built on the top of it.  However, in principle
 492 the control flow graph for the @code{tree} representation is
 493 @emph{not} an integral part of the representation, in that a function
 494 tree may be expanded without first building a  flow graph for the
 495 @code{tree} representation at all.  This happens when compiling
 496 without any @code{tree} optimization enabled.  When the @code{tree}
 497 optimizations are enabled and the instruction stream is rewritten in
 498 SSA form, the CFG is very tightly coupled with the instruction stream.
 499 In particular, statement insertion and removal has to be done with
 500 care.  In fact, the whole @code{tree} representation can not be easily
 501 used or maintained without proper maintenance of the CFG
 502 simultaneously.
 503
 504 @findex BLOCK_FOR_INSN, bb_for_stmt
 505 In the RTL representation, each instruction has a
 506 @code{BLOCK_FOR_INSN} value that represents pointer to the basic block
 507 that contains the instruction.  In the @code{tree} representation, the
 508 function @code{bb_for_stmt} returns a pointer to the basic block
 509 containing the queried statement.
 510
 511 @cindex block statement iterators
 512 When changes need to be applied to a function in its @code{tree}
 513 representation, @dfn{block statement iterators} should be used.  These
 514 iterators provide an integrated abstraction of the flow graph and the
 515 instruction stream.  Block statement iterators iterators are
 516 constructed using the @code{block_stmt_iterator} data structure and
 517 several modifier are available, including the following:
 518
 519 @ftable @code
 520 @item bsi_start
 521 This function initializes a @code{block_stmt_iterator} that points to
 522 the first non-empty statement in a basic block.
 523
 524 @item bsi_last
 525 This function initializes a @code{block_stmt_iterator} that points to
 526 the last statement in a basic block.
 527
 528 @item bsi_end_p
 529 This predicate is @code{true} if a @code{block_stmt_iterator}
 530 represents the end of a basic block.
 531
 532 @item bsi_next
 533 This function takes a @code{block_stmt_iterator} and makes it point to
 534 its successor.
 535
 536 @item bsi_prev
 537 This function takes a @code{block_stmt_iterator} and makes it point to
 538 its predecessor.
 539
 540 @item bsi_insert_after
 541 This function inserts a statement after the @code{block_stmt_iterator}
 542 passed in.  The final parameter determines whether the statement
 543 iterator is updated to point to the newly inserted statement, or left
 544 pointing to the original statement.
 545
 546 @item bsi_insert_before
 547 This function inserts a statement before the @code{block_stmt_iterator}
 548 passed in.  The final parameter determines whether the statement
 549 iterator is updated to point to the newly inserted statement, or left
 550 pointing to the original  statement.
 551
 552 @item bsi_remove
 553 This function removes the @code{block_stmt_iterator} passed in and
 554 rechains the remaining statements in a basic block, if any.
 555 @end ftable
 556
 557 @findex BB_HEAD, BB_END
 558 In the RTL representation, the macros @code{BB_HEAD} and @code{BB_END}
 559 may be used to get the head and end @code{rtx} of a basic block.  No
 560 abstract iterators are defined for traversing the insn chain, but you
 561 can just use @code{NEXT_INSN} and @code{PREV_INSN} instead.  See
 562 @xref{Insns}.
 563
 564 @findex purge_dead_edges
 565 Usually a code manipulating pass simplifies the instruction stream and
 566 the flow of control, possibly eliminating some edges.  This may for
 567 example happen when a conditional jump is replaced with an
 568 unconditional jump, but also when simplifying possibly trapping
 569 instruction to non-trapping while compiling Java.  Updating of edges
 570 is not transparent and each optimization pass is required to do so
 571 manually.  However only few cases occur in practice.  The pass may
 572 call @code{purge_dead_edges} on a given basic block to remove
 573 superfluous edges, if any.
 574
 575 @findex redirect_edge_and_branch, redirect_jump
 576 Another common scenario is redirection of branch instructions, but
 577 this is best modeled as redirection of edges in the control flow graph
 578 and thus use of @code{redirect_edge_and_branch} is preferred over more
 579 low level functions, such as @code{redirect_jump} that operate on RTL
 580 chain only.  The CFG hooks defined in @file{cfghooks.h} should provide
 581 the complete API required for manipulating and maintaining the CFG@.
 582
 583 @findex find_sub_basic_blocks, split_block
 584 It is also possible that a pass has to insert control flow instruction
 585 into the middle of a basic block, thus creating an entry point in the
 586 middle of the basic block, which is impossible by definition: The
 587 block must be split to make sure it only has one entry point, i.e.@: the
 588 head of the basic block.  In the RTL representation, the
 589 @code{find_sub_basic_blocks} may be used to split existing basic block
 590 and add necessary edges.  The CFG hook @code{split_block} may be used
 591 when an instruction in the middle of a basic block has to become the
 592 target of a jump or branch instruction.
 593
 594 @findex insert_insn_on_edge
 595 @findex commit_edge_insertions
 596 @findex bsi_insert_on_edge
 597 @findex bsi_commit_edge_inserts
 598 @cindex edge splitting
 599 For a global optimizer, a common operation is to split edges in the
 600 flow graph and insert instructions on them.  In the RTL
 601 representation, this can be easily done using the
 602 @code{insert_insn_on_edge} function that emits an instruction
 603 ``on the edge'', caching it for a later @code{commit_edge_insertions}
 604 call that will take care of moving the inserted instructions off the
 605 edge into the instruction stream contained in a basic block.  This
 606 includes the creation of new basic blocks where needed.  In the
 607 @code{tree} representation, the equivalent functions are
 608 @code{bsi_insert_on_edge} which inserts a block statement
 609 iterator on an edge, and @code{bsi_commit_edge_inserts} which flushes
 610 the instruction to actual instruction stream.
 611
 612 While debugging the optimization pass, an @code{verify_flow_info}
 613 function may be useful to find bugs in the control flow graph updating
 614 code.
 615
 616 Note that at present, the representation of control flow in the
 617 @code{tree} representation is discarded before expanding to RTL@.
 618 Long term the CFG should be maintained and ``expanded'' to the
 619 RTL representation along with the function @code{tree} itself.
 620
 621
 622 @node Liveness information
 623 @section Liveness information
 624 @cindex Liveness representation
 625 Liveness information is useful to determine whether some register is
 626 ``live'' at given point of program, i.e.@: that it contains a value that
 627 may be used at a later point in the program.  This information is
 628 used, for instance, during register allocation, as the pseudo
 629 registers only need to be assigned to a unique hard register or to a
 630 stack slot if they are live.  The hard registers and stack slots may
 631 be freely reused for other values when a register is dead.
 632
 633 @findex REG_DEAD, REG_UNUSED
 634 The liveness information is stored partly in the RTL instruction
 635 stream and partly in the flow graph.  Local information is stored in
 636 the instruction stream:
 637 Each instruction may contain @code{REG_DEAD} notes representing that
 638 the value of a given register is no longer needed, or
 639 @code{REG_UNUSED} notes representing that the value computed by the
 640 instruction is never used.  The second is useful for instructions
 641 computing multiple values at once.
 642
 643 @findex global_live_at_start, global_live_at_end
 644 Global liveness information is stored in the control flow graph.
 645 Each basic block contains two bitmaps, @code{global_live_at_start} and
 646 @code{global_live_at_end} representing liveness of each register at
 647 the entry and exit of the basic block.  The file @code{flow.c}
 648 contains functions to compute liveness of each register at any given
 649 place in the instruction stream using this information.
 650
 651 @findex BB_DIRTY, clear_bb_flags, update_life_info_in_dirty_blocks
 652 Liveness is expensive to compute and thus it is desirable to keep it
 653 up to date during code modifying passes.  This can be easily
 654 accomplished using the @code{flags} field of a basic block.  Functions
 655 modifying the instruction stream automatically set the @code{BB_DIRTY}
 656 flag of a modifies basic block, so the pass may simply
 657 use@code{clear_bb_flags} before doing any modifications and then ask
 658 the data flow module to have liveness updated via the
 659 @code{update_life_info_in_dirty_blocks} function.
 660
 661 This scheme works reliably as long as no control flow graph
 662 transformations are done.  The task of updating liveness after control
 663 flow graph changes is more difficult as normal iterative data flow
 664 analysis may produce invalid results or get into an infinite cycle
 665 when the initial solution is not below the desired one.  Only simple
 666 transformations, like splitting basic blocks or inserting on edges,
 667 are safe, as functions to implement them already know how to update
 668 liveness information locally.