contrib/gcc/f/ffe.texi

   1 @c Copyright (C) 1999 Free Software Foundation, Inc.
   2 @c This is part of the G77 manual.
   3 @c For copying conditions, see the file g77.texi.
   4
   5 @node Front End
   6 @chapter Front End
   7 @cindex GNU Fortran Front End (FFE)
   8 @cindex FFE
   9 @cindex @code{g77}, front end
  10 @cindex front end, @code{g77}
  11
  12 This chapter describes some aspects of the design and implementation
  13 of the @code{g77} front end.
  14 Much of the information below applies not to current
  15 releases of @code{g77},
  16 but to the 0.6 rewrite being designed and implemented
  17 as of late May, 1999.
  18
  19 To find out about things that are ``To Be Determined'' or ``To Be Done'',
  20 search for the string TBD.
  21 If you want to help by working on one or more of these items,
  22 email me at @email{@value{email-burley}}.
  23 If you're planning to do more than just research issues and offer comments,
  24 see @uref{http://www.gnu.org/software/contribute.html} for steps you might
  25 need to take first.
  26
  27 @menu
  28 * Overview of Sources::
  29 * Overview of Translation Process::
  30 * Philosophy of Code Generation::
  31 * Two-pass Design::
  32 * Challenges Posed::
  33 * Transforming Statements::
  34 * Transforming Expressions::
  35 * Internal Naming Conventions::
  36 @end menu
  37
  38 @node Overview of Sources
  39 @section Overview of Sources
  40
  41 The current directory layout includes the following:
  42
  43 @table @file
  44 @item @value{srcdir}/gcc/
  45 Non-g77 files in gcc
  46
  47 @item @value{srcdir}/gcc/f/
  48 GNU Fortran front end sources
  49
  50 @item @value{srcdir}/libf2c/
  51 @code{libg2c} configuration and @code{g2c.h} file generation
  52
  53 @item @value{srcdir}/libf2c/libF77/
  54 General support and math portion of @code{libg2c}
  55
  56 @item @value{srcdir}/libf2c/libI77/
  57 I/O portion of @code{libg2c}
  58
  59 @item @value{srcdir}/libf2c/libU77/
  60 Additional interfaces to Unix @code{libc} for @code{libg2c}
  61 @end table
  62
  63 Components of note in @code{g77} are described below.
  64
  65 @file{f/} as a whole contains the source for @code{g77},
  66 while @file{libf2c/} contains a portion of the separate program
  67 @code{f2c}.
  68 Note that the @code{libf2c} code is not part of the program @code{g77},
  69 just distributed with it.
  70
  71 @file{f/} contains text files that document the Fortran compiler, source
  72 files for the GNU Fortran Front End (FFE), and some other stuff.
  73 The @code{g77} compiler code is placed in @file{f/} because it,
  74 along with its contents,
  75 is designed to be a subdirectory of a @code{gcc} source directory,
  76 @file{gcc/},
  77 which is structured so that language-specific front ends can be ``dropped
  78 in'' as subdirectories.
  79 The C++ front end (@code{g++}) is an example of this---it resides in
  80 the @file{cp/} subdirectory.
  81 Note that the C front end (also referred to as @code{gcc})
  82 is an exception to this, as its source files reside
  83 in the @file{gcc/} directory itself.
  84
  85 @file{libf2c/} contains the run-time libraries for the @code{f2c} program,
  86 also used by @code{g77}.
  87 These libraries are normally referred to collectively as @code{libf2c}.
  88 When built as part of @code{g77},
  89 @code{libf2c} is installed under the name @code{libg2c} to avoid
  90 conflict with any existing version of @code{libf2c},
  91 and thus is often referred to as @code{libg2c} when the
  92 @code{g77} version is specifically being referred to.
  93
  94 The @code{netlib} version of @code{libf2c/}
  95 contains two distinct libraries,
  96 @code{libF77} and @code{libI77},
  97 each in its own subdirectory.
  98 In @code{g77}, this distinction is not made,
  99 beyond maintaining the subdirectory structure in the source-code tree.
 100
 101 @file{libf2c/} is not part of the program @code{g77},
 102 just distributed with it.
 103 It contains files not present
 104 in the official (@code{netlib}) version of @code{libf2c},
 105 and also contains some minor changes made from @code{libf2c},
 106 to fix some bugs,
 107 and to facilitate automatic configuration, building, and installation of
 108 @code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
 109 See @file{libf2c/README} for more information,
 110 including licensing conditions
 111 governing distribution of programs containing code from @code{libg2c}.
 112
 113 @code{libg2c}, @code{g77}'s version of @code{libf2c},
 114 adds Dave Love's implementation of @code{libU77},
 115 in the @file{libf2c/libU77/} directory.
 116 This library is distributed under the
 117 GNU Library General Public License (LGPL)---see the
 118 file @file{libf2c/libU77/COPYING.LIB}
 119 for more information,
 120 as this license
 121 governs distribution conditions for programs containing code
 122 from this portion of the library.
 123
 124 Files of note in @file{f/} and @file{libf2c/} are described below:
 125
 126 @table @file
 127 @item f/BUGS
 128 Lists some important bugs known to be in @code{g77}.
 129 Or use Info (or GNU Emacs Info mode) to read
 130 the ``Actual Bugs'' node of the @code{g77} documentation:
 131
 132 @smallexample
 133 info -f f/g77.info -n "Actual Bugs"
 134 @end smallexample
 135
 136 @item f/ChangeLog
 137 Lists recent changes to @code{g77} internals.
 138
 139 @item libf2c/ChangeLog
 140 Lists recent changes to @code{libg2c} internals.
 141
 142 @item f/NEWS
 143 Contains the per-release changes.
 144 These include the user-visible
 145 changes described in the node ``Changes''
 146 in the @code{g77} documentation, plus internal
 147 changes of import.
 148 Or use:
 149
 150 @smallexample
 151 info -f f/g77.info -n News
 152 @end smallexample
 153
 154 @item f/g77.info*
 155 The @code{g77} documentation, in Info format,
 156 produced by building @code{g77}.
 157
 158 All users of @code{g77} (not just installers) should read this,
 159 using the @code{more} command if neither the @code{info} command,
 160 nor GNU Emacs (with its Info mode), are available, or if users
 161 aren't yet accustomed to using these tools.
 162 All of these files are readable as ``plain text'' files,
 163 though they're easier to navigate using Info readers
 164 such as @code{info} and GNU Emacs Info mode.
 165 @end table
 166
 167 If you want to explore the FFE code, which lives entirely in @file{f/},
 168 here are a few clues.
 169 The file @file{g77spec.c} contains the @code{g77}-specific source code
 170 for the @code{g77} command only---this just forms a variant of the
 171 @code{gcc} command, so,
 172 just as the @code{gcc} command itself does not contain the C front end,
 173 the @code{g77} command does not contain the Fortran front end (FFE).
 174 The FFE code ends up in an executable named @file{f771},
 175 which does the actual compiling,
 176 so it contains the FFE plus the @code{gcc} back end (GBE),
 177 the latter to do most of the optimization, and the code generation.
 178
 179 The file @file{parse.c} is the source file for @code{yyparse()},
 180 which is invoked by the GBE to start the compilation process,
 181 for @file{f771}.
 182
 183 The file @file{top.c} contains the top-level FFE function @code{ffe_file}
 184 and it (along with @file{top.h}) defines all @samp{ffe_[a-z].*}, @samp{ffe[A-Z].*},
 185 and @samp{FFE_[A-Za-z].*} symbols.
 186
 187 The file @file{fini.c} is a @code{main()} program that is used when building
 188 the FFE to generate C header and source files for recognizing keywords.
 189 The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
 190 that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
 191 @samp{MALLOC_[A-Za-z].*} symbols.
 192
 193 All other modules named @var{xyz}
 194 are comprised of all files named @samp{@var{xyz}*.@var{ext}}
 195 and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
 196 and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
 197 If you understand all this, congratulations---it's easier for me to remember
 198 how it works than to type in these regular expressions.
 199 But it does make it easy to find where a symbol is defined.
 200 For example, the symbol @samp{ffexyz_set_something} would be defined
 201 in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.
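
As a rough illustration of these naming rules (this is not part of the FFE,
which is written in C), the lookup can be sketched in Python.
The function name @code{module_for_symbol} is hypothetical,
and the sketch assumes that module names consist only of lowercase
letters and digits, which the text above does not actually promise.

```python
import re


def module_for_symbol(symbol):
    """Guess which module defines SYMBOL, following the prefix rules above.

    Returns the module base name (so "xyz" means xyz.h and xyz.c),
    or None when no rule applies.
    """
    # malloc.c / malloc.h: malloc_[a-z].*, malloc[A-Z].*, MALLOC_[A-Za-z].*
    if re.match(r"malloc(_[a-z]|[A-Z])|MALLOC_[A-Za-z]", symbol):
        return "malloc"
    # top.c / top.h: ffe_[a-z].*, ffe[A-Z].*, FFE_[A-Za-z].*
    if re.match(r"ffe(_[a-z]|[A-Z])|FFE_[A-Za-z]", symbol):
        return "top"
    # Any other module xyz: ffexyz_[a-z].*, ffexyz[A-Z].*
    # (assumption: module names are lowercase letters and digits)
    found = re.match(r"ffe([a-z0-9]+)(_[a-z]|[A-Z])", symbol)
    if found:
        return found.group(1)
    # ... and FFEXYZ_[A-Za-z].* for the same module
    found = re.match(r"FFE([A-Z0-9]+)_[A-Za-z]", symbol)
    if found:
        return found.group(1).lower()
    return None


print(module_for_symbol("ffexyz_set_something"))  # xyz
print(module_for_symbol("ffe_file"))              # top
```

The order of the checks matters: the @samp{ffe_} and @samp{ffe[A-Z]} forms
belong to @file{top.c} and must be tried before the general
@samp{ffe@var{xyz}} pattern.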
 202
 203 The ``porting'' files of note currently are:
 204
 205 @table @file
 206 @item proj.c
 207 @itemx proj.h
 208 This defines the ``language'' used by all the other source files,
 209 the language being Standard C plus some useful things
 210 like @code{ARRAY_SIZE} and such.
 211
 212 @item target.c
 213 @itemx target.h
 214 These describe the target machine
 215 in terms of what data types are supported,
 216 how they are denoted
 217 (to what C type does an @code{INTEGER*8} map, for example),
 218 how to convert between them,
 219 and so on.
 220 Over time, versions of @code{g77} rely less on this file
 221 and more on run-time configuration based on GBE info
 222 in @file{com.c}.
 223
 224 @item com.c
 225 @itemx com.h
 226 These are the primary interface to the GBE.
 227
 228 @item ste.c
 229 @itemx ste.h
 230 These contain code for implementing recognized executable statements
 231 in the GBE.
 232
 233 @item src.c
 234 @itemx src.h
 235 These contain information on the format(s) of source files
 236 (such as whether they are never to be processed as case-insensitive
 237 with regard to Fortran keywords).
 238 @end table
 239
 240 If you want to debug the @file{f771} executable,
 241 for example if it crashes,
 242 note that the global variables @code{lineno} and @code{input_filename}
 243 are usually set to reflect the current line being read by the lexer
 244 during the first-pass analysis of a program unit and to reflect
 245 the current line being processed during the second-pass compilation
 246 of a program unit.
 247
 248 If an invocation of the function @code{ffestd_exec_end} is on the stack,
 249 the compiler is in the second pass, otherwise it is in the first.
 250
 251 (This information might help you reduce a test case and/or work around
 252 a bug in @code{g77} until a fix is available.)
 253
 254 @node Overview of Translation Process
 255 @section Overview of Translation Process
 256
 257 The order of phases translating source code to the form accepted
 258 by the GBE is:
 259
 260 @enumerate
 261 @item
 262 Stripping punched-card sources (@file{g77stripcard.c})
 263
 264 @item
 265 Lexing (@file{lex.c})
 266
 267 @item
 268 Stand-alone statement identification (@file{sta.c})
 269
 270 @item
 271 Parsing (@file{stb.c} and @file{expr.c})
 272
 273 @item
 274 Constructing (@file{stc.c})
 275
 276 @item
 277 Collecting (@file{std.c})
 278
 279 @item
 280 Expanding (@file{ste.c})
 281 @end enumerate
 282
 283 To get a rough idea of how a particularly twisted Fortran statement
 284 gets treated by the passes, consider:
 285
 286 @smallexample
 287       FORMAT(I2 4H)=(J/
 288      &   I3)
 289 @end smallexample
 290
 291 The job of @file{lex.c} is to know enough about Fortran syntax rules
 292 to break the statement up into distinct lexemes without requiring
 293 any feedback from subsequent phases:
 294
 295 @smallexample
 296 `FORMAT'
 297 `('
 298 `I24H'
 299 `)'
 300 `='
 301 `('
 302 `J'
 303 `/'
 304 `I3'
 305 `)'
 306 @end smallexample
 307
 308 The job of @file{sta.c} is to figure out the kind of statement,
 309 or, at least, statement form, that sequence of lexemes represents.
 310
 311 The sooner it can do this (in terms of using the smallest number of
 312 lexemes, starting with the first for each statement), the better,
 313 because that leaves diagnostics for problems beyond the recognition
 314 of the statement form to subsequent phases,
 315 which can usually better describe the nature of the problem.
 316
 317 In this case, the @samp{=} at ``level zero''
 318 (not nested within parentheses)
 319 tells @file{sta.c} that this is an @emph{assignment-form},
 320 not @code{FORMAT}, statement.
 321
 322 An assignment-form statement might be a statement-function
 323 definition or an executable assignment statement.
 324
 325 To make that determination,
 326 @file{sta.c} looks at the first two lexemes.
 327
 328 Since the second lexeme is @samp{(},
 329 the first must represent an array for this to be an assignment statement,
 330 else it's a statement function.
 331
 332 Either way, @file{sta.c} hands off the statement to @file{stb.c}
 333 (either its statement-function parser or its assignment-statement parser).
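
The decision just described can be sketched in Python, purely as an
illustration and not as the actual logic of @file{sta.c}.
The names @code{has_level_zero_equals}, @code{classify},
and @code{array_names} are hypothetical,
and the set of known array names is assumed to be supplied by the caller.
A real classifier has more cases to consider:
for instance, a @code{DO} statement also has an @samp{=}
outside any parentheses, which this sketch does not distinguish.

```python
def has_level_zero_equals(lexemes):
    """True if an '=' lexeme appears outside all parentheses."""
    depth = 0
    for lexeme in lexemes:
        if lexeme == "(":
            depth += 1
        elif lexeme == ")":
            depth -= 1
        elif lexeme == "=" and depth == 0:
            return True
    return False


def classify(lexemes, array_names=()):
    """Pick a statement form using only the rules described in the text."""
    if not has_level_zero_equals(lexemes):
        return "not assignment-form"
    # With '(' as the second lexeme, the first lexeme must name an array
    # for this to be an assignment; otherwise it is a statement function.
    if len(lexemes) > 1 and lexemes[1] == "(":
        if lexemes[0] in array_names:
            return "assignment"
        return "statement-function"
    return "assignment"


# The lexemes produced for the twisted statement shown earlier.
twisted = ["FORMAT", "(", "I24H", ")", "=", "(", "J", "/", "I3", ")"]
print(classify(twisted))                          # statement-function
print(classify(twisted, array_names=["FORMAT"]))  # assignment
```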
 334
 335 @file{stb.c} forms a
 336 statement-specific record containing the pertinent information.
 337 That information includes a source expression and,
 338 for an assignment statement, a destination expression.
 339 Expressions are parsed by @file{expr.c}.
 340
 341 This record is passed to @file{stc.c},
 342 which copes with the implications of the statement
 343 within the context established by previous statements.
 344
 345 For example, if it's the first statement in the file
 346 or after an @code{END} statement,
 347 @file{stc.c} recognizes that, first of all,
 348 a main program unit is now being lexed
 349 (and tells that to @file{std.c}
 350 before telling it about the current statement).
 351
 352 @file{stc.c} attaches whatever information it can,
 353 usually derived from the context established by the preceding statements,
 354 and passes the information to @file{std.c}.
 355
 356 @file{std.c} saves this information away,
 357 since the GBE cannot cope with information
 358 that might be incomplete at this stage.
 359
 360 For example, @samp{I3} might later be determined
 361 to be an argument to an alternate @code{ENTRY} point.
 362
 363 When @file{std.c} is told about the end of an external (top-level)
 364 program unit,
 365 it passes all the information it has saved away
 366 on statements in that program unit
 367 to @file{ste.c}.
 368
 369 @file{ste.c} ``expands'' each statement, in sequence, by
 370 constructing the appropriate GBE information and calling
 371 the appropriate GBE routines.
 372
 373 Details on the transformational phases follow.
 374 Keep in mind that Fortran numbering is used,
 375 so the first character on a line is column 1,
 376 decimal numbering is used, and so on.
 377
 378 @menu
 379 * g77stripcard::
 380 * lex.c::
 381 * sta.c::
 382 * stb.c::
 383 * expr.c::
 384 * stc.c::
 385 * std.c::
 386 * ste.c::
 387
 388 * Gotchas (Transforming)::
 389 * TBD (Transforming)::
 390 @end menu
 391
 392 @node g77stripcard
 393 @subsection g77stripcard
 394
 395 The @code{g77stripcard} program handles removing content beyond
 396 column 72 (adjustable via a command-line option),
 397 optionally warning about that content being something other
 398 than trailing whitespace or Fortran commentary.
 399
 400 This program is needed because @file{lex.c} doesn't pay attention
 401 to maximum line lengths at all, to make it easier to maintain,
 402 as well as faster (for sources that don't depend on the maximum
 403 column length vis-a-vis trailing non-blank non-commentary content).
 404
 405 Just how this program will be run---whether automatically for
 406 old source (perhaps as the default for @file{.f} files?)---is not
 407 yet determined.
 408
 409 In the meantime, it might as well be implemented as a typical UNIX pipe.
 410
 411 It should accept a @samp{-fline-length-@var{n}} option,
 412 with the default line length set to 72.
 413
 414 When the text it strips off the end of a line is not blank
 415 (not spaces and tabs),
 416 it should insert an additional comment line
 417 (beginning with @samp{!},
 418 so it works for both fixed-form and free-form files)
 419 containing the text,
 420 following the stripped line.
 421 The inserted comment should have a prefix of some kind,
 422 TBD, that distinguishes the comment as representing stripped text.
 423 Users could use that to @code{sed} out such lines, if they wished---it
 424 seems silly to provide a command-line option to delete information
 425 when it can be so easily filtered out by another program.
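
A minimal Python sketch of the behavior proposed so far follows.
It is only an illustration under stated assumptions:
the function name @code{strip_card} is hypothetical,
the comment prefix @samp{!g77stripcard: } is a made-up stand-in
for the prefix that is still TBD,
and the optional warning is left out.

```python
def strip_card(lines, line_length=72, prefix="!g77stripcard: "):
    """Drop text past LINE_LENGTH columns from each line.

    LINES holds source lines without their newlines.  Whenever the
    removed text is not blank (something other than spaces and tabs),
    a comment line carrying that text follows the stripped line.
    PREFIX is a placeholder; the real prefix is still to be determined.
    """
    result = []
    for line in lines:
        kept = line[:line_length]
        tail = line[line_length:]
        result.append(kept)
        if tail.strip(" \t"):
            result.append(prefix + tail)
    return result


# A card image with a sequence number in columns 73-80.
card = "      X = 1".ljust(72) + "00000010"
for line in strip_card([card, "      END"]):
    print(line)
```

Because the inserted lines share a fixed prefix,
filtering them out again with a separate tool stays trivial.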
 426
 427 (This inserted comment should be designed to ``fit in'' well
 428 with whatever the Fortran community is using these days for
 429 preprocessor, translator, and other such products, like OpenMP.
 430 What that's all about, and how @code{g77} can elegantly fit its
 431 special comment conventions into it all, is TBD as well.
 432 We don't want to reinvent the wheel here, but if there turn out
 433 to be too many conflicting conventions, we might have to invent
 434 one that looks nothing like the others, but which offers their
 435 host products a better infrastructure in which to fit and coexist
 436 peacefully.)
 437
 438 @code{g77stripcard} probably shouldn't do any tab expansion or other
 439 fancy stuff.
 440 People can use @code{expand} or other pre-filtering if they like.
 441 The idea here is to keep each stage quite simple, while providing
 442 excellent performance for ``normal'' code.
 443
 444 (Code with junk beyond column 72 is not really ``normal'',
 445 as it comes from a card-punch heritage,
 446 and will be increasingly hard for tomorrow's Fortran programmers to read.)
 447
 448 @node lex.c
 449 @subsection lex.c
 450
 451 To help make the lexer simple, fast, and easy to maintain,
 452 while also having @code{g77} generally encourage Fortran programmers
 453 to write simple, maintainable, portable code by maximizing the
 454 performance of compiling that kind of code:
 455
 456 @itemize @bullet
 457 @item
 458 There'll be just one lexer, for both fixed-form and free-form source.
 459
 460 @item
 461 It'll care about the form only when handling the first 7 columns of
 462 text, stuff like spaces between strings of alphanumerics, and
 463 how lines are continued.
 464
 465 Some other distinctions will be handled by subsequent phases,
 466 so at least one of them will have to know which form is involved.
 467
 468 For example, @samp{I = 2 . 4} is acceptable in fixed form,
 469 and works in free form as well given the implementation @code{g77}
 470 presently uses.
 471 But the standard requires a diagnostic for it in free form,
 472 so the parser has to be able to recognize that
 473 the lexemes aren't contiguous
 474 (information the lexer @emph{does} have to provide)
 475 and that free-form source is being parsed,
 476 so it can provide the diagnostic.
 477
 478 The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
 479 Otherwise, it'd have to know a whole lot more about how to parse Fortran,
 480 or subsequent phases (mainly parsing) would have two paths through
 481 lots of critical code---one to handle the lexeme @samp{2}, @samp{.},
 482 and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.
 483
 484 @item
 485 It won't worry about line lengths
 486 (beyond the first 7 columns for fixed-form source).
 487
 488 That is, once it starts parsing the ``statement'' part of a line
 489 (column 7 for fixed-form, column 1 for free-form),
 490 it'll keep going until it finds a newline,
 491 rather than ignoring everything past a particular column
 492 (72 or 132).
 493
 494 The implication here is that there shouldn't @emph{be}
 495 anything past that last column, other than whitespace or
 496 commentary, because users using typical editors
 497 (or viewing output as typically printed)
 498 won't necessarily know just where the last column is.
 499
 500 Code that has ``garbage'' beyond the last column
 501 (almost certainly only fixed-form code with a punched-card legacy,
 502 such as code using columns 73-80 for ``sequence numbers'')
 503 will have to be run through @code{g77stripcard} first.
 504
 505 Also, keeping track of the maximum column position while also watching out
 506 for the end of a line @emph{and} while reading from a file
 507 just makes things slower.
 508 Since a file must be read, and watching for the end of the line
 509 is necessary (unless the typical input file was preprocessed to
 510 include the necessary number of trailing spaces),
 511 dropping the tracking of the maximum column position
 512 is the only way to reduce the complexity of the pertinent code
 513 while maintaining high performance.
 514
 515 @item
 516 ASCII encoding is assumed for the input file.
 517
 518 Code written in other character sets will have to be converted first.
 519
 520 @item
 521 Tabs (ASCII code 9)
 522 will be converted to spaces via the straightforward
 523 approach.
 524
 525 Specifically, a tab is converted to between one and eight spaces
 526 as necessary to reach column @var{n},
 527 where dividing @samp{(@var{n} - 1)} by eight
 528 results in a remainder of zero.
 529
 530 @item
 531 Linefeeds (ASCII code 10)
 532 mark the ends of lines.
 533
 534 @item
 535 A carriage return (ASCII code 13)
 536 is accepted if it immediately precedes a linefeed,
 537 in which case it is ignored.
 538
 539 Otherwise, it is rejected (with a diagnostic).
 540
 541 @item
 542 Any characters other than the above
 543 that are not part of the GNU Fortran Character Set
 544 (@pxref{Character Set})
 545 are rejected with a diagnostic.
 546
 547 This includes backspaces, form feeds, and the like.
 548
 549 (It might make sense to allow a form feed in column 1
 550 as long as that's the only character on a line.
 551 It certainly wouldn't seem to cost much in terms of performance.)
 552
 553 @item
 554 The end of the input stream (EOF)
 555 ends the current line.
 556
 557 @item
 558 The distinction between uppercase and lowercase letters
 559 will be preserved.
 560
 561 It will be up to subsequent phases to decide to fold case.
 562
 563 Current plans are to permit any casing for Fortran (reserved) keywords
 564 while preserving casing for user-defined names.
 565 (This might not be made the default for @file{.f} files, though.)
 566
 567 Preserving case seems necessary to provide more direct access
 568 to facilities outside of @code{g77}, such as to C or Pascal code.
 569
 570 Names of intrinsics will probably be matchable in any case.
 571 However, there probably won't be any option to require
 572 a particular mixed-case appearance of intrinsics
 573 (as there was for @code{g77} prior to version 0.6),
 574 because that's painful to maintain,
 575 and probably nobody uses it.
 576
 577 (How @samp{external SiN; r = sin(x)} would be handled is TBD.
 578 I think old @code{g77} might already handle that pretty elegantly,
 579 but whether we can cope with allowing the same fragment to reference
 580 a @emph{different} procedure, even with the same interface,
 581 via @samp{s = SiN(r)}, needs to be determined.
 582 If it can't, we need to make sure that when code introduces
 583 a user-defined name, any intrinsic matching that name
 584 using a case-insensitive comparison
 585 is ``turned off''.)
 586
 587 @item
 588 Backslashes in @code{CHARACTER} and Hollerith constants
 589 are not allowed.
 590
 591 This avoids the confusion introduced by some Fortran compiler vendors
 592 providing C-like interpretation of backslashes,
 593 while others provide straight-through interpretation.
 594
 595 Some kind of lexical construct (TBD) will be provided to allow
 596 flagging of a @code{CHARACTER}
 597 (but probably not a Hollerith)
 598 constant that permits backslashes.
 599 It'll necessarily be a prefix, such as:
 600
 601 @smallexample
 602 PRINT *, C'This line has a backspace \b here.'
 603 PRINT *, F'This line has a straight backslash \ here.'
 604 @end smallexample
 605
 606 Further, command-line options might be provided to specify that
 607 one prefix or the other is to be assumed as the default
 608 for @code{CHARACTER} constants.
 609
 610 However, it seems more helpful for @code{g77} to provide a program
 611 that prefixes all constants
 612 (or just those containing backslashes)
 613 with the desired designation,
 614 so printouts of code can be read
 615 without knowing the compile-time options used when compiling it.
 616
 617 If such a program is provided
 618 (let's name it @code{g77slash} for now),
 619 then a command-line option to @code{g77} should not be provided.
 620 (Though, given that it'll be easy to implement, it might be hard
 621 to resist user requests for it ``to compile faster than if we
 622 have to invoke another filter''.)
 623
 624 This program would take a command-line option to specify the
 625 default interpretation of backslashes,
 626 affecting which prefix it uses for constants.
 627
 628 @code{g77slash} probably should automatically convert Hollerith
 629 constants that contain backslashes
 630 to the appropriate @code{CHARACTER} constants.
 631 Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
 632 constants specifying whether they want C-style or straight-through
 633 backslashes.
 634 @end itemize
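
Two of the rules in the list above, tab conversion and the treatment of
carriage returns, linefeeds, and EOF, are mechanical enough to sketch.
The Python below is an illustration only, not the lexer's code;
the names @code{expand_tabs} and @code{split_lines} are hypothetical,
and raising an exception stands in for issuing a diagnostic.

```python
def expand_tabs(line):
    """Replace each tab with one to eight spaces.

    After a tab, the next character lands in a column n (counting
    from 1) for which (n - 1) divided by eight leaves no remainder.
    """
    out = []
    for ch in line:
        if ch == "\t":
            out.append(" ")              # a tab is always at least one space
            while len(out) % 8 != 0:     # len(out) is (next column - 1)
                out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


def split_lines(text):
    """Split TEXT at linefeeds, ignoring a CR right before an LF.

    Any other carriage return is rejected.  Text after the last
    linefeed forms a final line, since EOF ends the current line.
    """
    pieces = text.split("\n")
    last = pieces.pop()                  # whatever follows the final linefeed
    lines = []
    for piece in pieces:
        if piece.endswith("\r"):
            piece = piece[:-1]           # CR immediately before LF: ignored
        lines.append(piece)
    if last:
        lines.append(last)
    for line in lines:
        if "\r" in line:
            raise ValueError("carriage return not followed by linefeed")
    return lines


print(split_lines("      I = 1\r\n      END"))
print(expand_tabs("\tI = 1"))
```

For input without line breaks, @code{expand_tabs} should agree with
Python's built-in @code{str.expandtabs(8)};
it is spelled out here only to make the column arithmetic visible.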
 635
 636 The above implements nearly exactly what is specified by
 637 @ref{Character Set},
 638 and
 639 @ref{Lines},
 640 except it also provides automatic conversion of tabs
 641 and ignoring of newline-related carriage returns.
 642
 643 It also effects the ``pure visual'' model,
 644 by which is meant that a user viewing his code
 645 in a typical text editor
 646 (assuming it's not preprocessed via @code{g77stripcard} or similar)
 647 doesn't need any special knowledge
 648 of whether spaces on the screen are really tabs,
 649 whether lines end immediately after the last visible non-space character
 650 or after a number of spaces and tabs that follow it,
 651 or whether the last line in the file is ended by a newline.
 652
 653 Most editors don't make these distinctions,
 654 the ANSI FORTRAN 77 standard doesn't require them to,
 655 and it permits a standard-conforming compiler
 656 to define a method for transforming source code to
 657 ``standard form'' however it wants.
 658
 659 So, GNU Fortran defines it such that users have the best chance
 660 of having the code be interpreted the way it looks on the screen
 661 of the typical editor.
 662
 663 (Fancy editors should @emph{never} be required to correctly read code
 664 written in classic two-dimensional-plaintext form.
 665 By correct reading I mean ability to read it, book-like, without
 666 mistaking text ignored by the compiler for program code and vice versa,
 667 and without having to count beyond the first several columns.
 668 The vague meaning of ASCII TAB, among other things, complicates
 669 this somewhat, but as long as ``everyone'', including the editor,
 670 other tools, and printer, agrees about the every-eighth-column convention,
 671 the GNU Fortran ``pure visual'' model meets these requirements.
 672 Any language or user-visible source form
 673 requiring special tagging of tabs,
 674 the ends of lines after spaces/tabs,
 675 and so on, is broken by this definition.
 676 Fortunately, Fortran @emph{itself} is not broken,
 677 even if most vendor-supplied defaults for their Fortran compilers @emph{are}
 678 in this regard.)
 679
 680 Further, this model provides a clean interface
 681 to whatever preprocessors or code-generators are used
 682 to produce input to this phase of @code{g77}.
 683 Mainly, they need not worry about long lines.
 684
 685 @node sta.c
 686 @subsection sta.c
 687
 688 @node stb.c
 689 @subsection stb.c
 690
 691 @node expr.c
 692 @subsection expr.c
 693
 694 @node stc.c
 695 @subsection stc.c
 696
 697 @node std.c
 698 @subsection std.c
 699
 700 @node ste.c
 701 @subsection ste.c
 702
 703 @node Gotchas (Transforming)
 704 @subsection Gotchas (Transforming)
 705
 706 This section is not about transforming ``gotchas'' into something else.
 707 It is about the weirder aspects of transforming Fortran,
 708 however that's defined,
 709 into a more modern, canonical form.
 710
 711 @subsubsection Multi-character Lexemes
 712
 713 Each lexeme carries with it a pointer to where it appears in the source.
 714
 715 To provide the ability for diagnostics to point to column numbers,
 716 in addition to line numbers and names,
 717 lexemes that represent more than one (significant) character
 718 in the source code need, generally,
 719 to provide pointers to where each @emph{character} appears in the source.
 720
 721 This provides the ability to properly identify the precise location
 722 of the problem in code like
 723
 724 @smallexample
 725 SUBROUTINE X
 726 END
 727 BLOCK DATA X
 728 END
 729 @end smallexample
 730
 731 which, in fixed-form source, would result in single lexemes
 732 consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
 733 (The problem is that @samp{X} is defined twice,
 734 so a pointer to the @samp{X} in the second definition,
 735 as well as a follow-up pointer to the corresponding pointer in the first,
 736 would be preferable to pointing to the beginnings of the statements.)
 737
 738 This need also arises when parsing (and diagnosing) @code{FORMAT}
 739 statements.
 740
 741 Further, it arises when diagnosing
 742 @code{FMT=} specifiers that contain constants
 743 (or partial constants, or even propagated constants!)
 744 in I/O statements, as in:
 745
 746 @smallexample
 747 PRINT '(I2, 3HAB)', J
 748 @end smallexample
 749
 750 (A pointer to the beginning of the prematurely-terminated Hollerith
 751 constant, and/or to the close parenthesis, is preferable to a pointer
 752 to the open parenthesis or the apostrophe that precedes it.)
 753
 754 Multi-character lexemes, which would seem to naturally include
 755 at least digit strings, alphanumeric strings, @code{CHARACTER}
 756 constants, and Hollerith constants, therefore need to provide
 757 location information on each character.
 758 (Maybe Hollerith constants don't, but it's unnecessary to except them.)
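
One way to picture such a lexeme is as a string paired with one source
position per significant character.
The Python sketch below is an assumption-laden illustration,
not the planned data structure:
the name @code{gather_alnum} is hypothetical,
and it handles only a single fixed-form line whose statement field
starts in column 7.

```python
def gather_alnum(line_number, line, start_column=7):
    """Gather one alphanumeric lexeme from a fixed-form line.

    Returns the lexeme text plus a list of (line, column) pairs,
    one for each character kept.  Blanks are skipped because they
    are not significant in fixed form.  Columns count from 1.
    """
    text = []
    positions = []
    for column in range(start_column, len(line) + 1):
        ch = line[column - 1]
        if ch == " ":
            continue
        if not ch.isalnum():
            break
        text.append(ch)
        positions.append((line_number, column))
    return "".join(text), positions


# The third line of the example above, indented for fixed form.
lexeme, where = gather_alnum(3, "      BLOCK DATA X")
print(lexeme)      # BLOCKDATAX
print(where[-1])   # (3, 18)
```

With the position of the final @samp{X} available,
a diagnostic can point at column 18 rather than at the start of the statement.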
 759
 760 The question then arises, what about @emph{other} multi-character lexemes,
 761 such as @samp{**} and @samp{//},
 762 and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?
 763
 764 Turns out there's a need to identify the location of the second character
 765 of these two-character lexemes.
 766 For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
 767 as the problem, not the open parenthesis.
 768 Similarly, it is preferable to diagnose the second slash in
 769 @samp{I = J // K} rather than the first, given the implicit typing
 770 rules, which would result in the compiler disallowing the attempted
 771 concatenation of two integers.
 772 (Though, since that's more of a semantic issue,
 773 it's not @emph{that} much preferable.)
 774
 775 Even sequences that could be parsed as digit strings could use location info,
 776 for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
 777 (This probably will be parsed as a character string,
 778 to be consistent with the parsing of @samp{Z'129A'}.)
 779
 780 To avoid the hassle of recording the location of the second character,
 781 while also preserving the general rule that each significant character
 782 is distinctly pointed to by the lexeme that contains it,
 783 it's best to simply not have any fixed-size lexemes
 784 larger than one character.
 785
 786 This new design is expected to make checking for two
 787 @samp{*} lexemes in a row much easier than the old design,
 788 so this is not much of a sacrifice.
 789 It probably makes the lexer much easier to implement
 790 than it makes the parser harder.
 791
 792 @subsubsection Space-padding Lexemes
 793
 794 Certain lexemes need to be padded with virtual spaces when the
 795 end of the line (or file) is encountered.
 796
 797 This is necessary in fixed form, to handle lines that don't
 798 extend to column 72, assuming that's the line length in effect.
 799
 800 @subsubsection Bizarre Free-form Hollerith Constants
 801
 802 Last I checked, the Fortran 90 standard actually required the compiler
 803 to silently accept something like
 804
 805 @smallexample
 806 FORMAT ( 1 2   Htwelve chars )
 807 @end smallexample
 808
 809 as a valid @code{FORMAT} statement specifying a twelve-character
 810 Hollerith constant.
 811
 812 The implication here is that, since the new lexer is a zero-feedback one,
 813 it won't know that the special case of a @code{FORMAT} statement being parsed
 814 requires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
 815 a single lexeme.
 816
 817 (This is a horrible misfeature of the Fortran 90 language.
 818 It's one of many such misfeatures that almost make me want
 819 to not support them, and forge ahead with designing a new
 820 ``GNU Fortran'' language that has the features,
 821 but not the misfeatures, of Fortran 90,
 822 and provide utility programs to do the conversion automatically.)
 823
 824 So, the lexer must gather distinct chunks of decimal strings into
 825 a single lexeme in contexts where a single decimal lexeme might
 826 start a Hollerith constant.
 827
 828 (Which probably means it might as well do that all the time
 829 for all multi-character lexemes, even in free-form mode,
 830 leaving it to subsequent phases to pull them apart as they see fit.)
 831
 832 Compare the treatment of this to how
 833
 834 @smallexample
 835 CHARACTER * 4 5 HEY
 836 @end smallexample
 837
 838 and
 839
 840 @smallexample
 841 CHARACTER * 12 HEY
 842 @end smallexample
 843
 844 must be treated---the former must be diagnosed, due to the separation
 845 between lexemes, the latter must be accepted as a proper declaration.
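
One way to picture this gathering is the following sketch,
which collects digits across embedded blanks and remembers
whether any blanks were seen, so that a later phase can accept
the @code{FORMAT} case and diagnose the @code{CHARACTER} case.
The function @code{gather_decimal} and its interface are hypothetical.

@smallexample
#include <ctype.h>
#include <stdbool.h>

/* Gather the decimal digits at the start of S into *VALUE,
   skipping embedded blanks.  Set *HAD_BLANKS if blanks separated
   any of the digits.  Return the offset just past the last digit
   (0 if S does not start with a digit).  */
int
gather_decimal (const char *s, long *value, bool *had_blanks)
@{
  int i = 0;
  int end = 0;                  /* one past the last digit seen */

  *value = 0;
  *had_blanks = false;
  while (isdigit ((unsigned char) s[i]) || (end > 0 && s[i] == ' '))
    @{
      if (s[i] != ' ')
        @{
          if (end > 0 && i > end)
            *had_blanks = true;
          *value = *value * 10 + (s[i] - '0');
          end = i + 1;
        @}
      i++;
    @}
  return end;
@}
@end smallexample

Given @samp{1 2   Htwelve chars}, this yields the value 12 with
the blanks flag set; given @samp{12 HEY}, it yields 12 with the flag clear.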
 846
 847 @subsubsection Hollerith Constants
 848
 849 Recognizing a Hollerith constant---specifically,
 850 that an @samp{H} or @samp{h} after a digit string begins
 851 such a constant---requires some knowledge of context.
 852
 853 Hollerith constants (such as @samp{2HAB}) can appear after:
 854
 855 @itemize @bullet
 856 @item
 857 @samp{(}
 858
 859 @item
 860 @samp{,}
 861
 862 @item
 863 @samp{=}
 864
 865 @item
 866 @samp{+}, @samp{-}, @samp{/}
 867
 868 @item
 869 @samp{*}, except as noted below
 870 @end itemize
 871
 872 Hollerith constants don't appear after:
 873
 874 @itemize @bullet
 875 @item
 876 @samp{CHARACTER*},
 877 which can be treated generally as
 878 any @samp{*} that is the second lexeme of a statement
 879 @end itemize
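
The two lists above could be encoded roughly as follows.
This is only a sketch; the function name and the idea of passing
the preceding lexeme's position in the statement are assumptions.

@smallexample
#include <stdbool.h>
#include <string.h>

/* PREV is the single-character lexeme preceding a digit string;
   PREV_INDEX is that lexeme's 1-based position in the statement.
   Return true if a Hollerith constant may start at the digit string.  */
bool
hollerith_may_follow (char prev, int prev_index)
@{
  if (prev == '*' && prev_index == 2)
    return false;               /* as in CHARACTER*  */
  return prev != '\0' && strchr ("(,=+-/*", prev) != NULL;
@}
@end smallexample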
 880
 881 @subsubsection Confusing Function Keyword
 882
 883 While
 884
 885 @smallexample
 886 REAL FUNCTION FOO ()
 887 @end smallexample
 888
 889 must be a @code{FUNCTION} statement and
 890
 891 @smallexample
 892 REAL FUNCTION FOO (5)
 893 @end smallexample
 894
 895 must be a type-declaration statement,
 896
 897 @smallexample
 898 REAL FUNCTION FOO (@var{names})
 899 @end smallexample
 900
 901 where @var{names} is a comma-separated list of names,
 902 can be one or the other.
 903
 904 The only way to disambiguate that statement
 905 (short of mandating free-form source or a short maximum
 906 length for names of external procedures)
 907 is based on the context of the statement.
 908
 909 In particular, if the statement is known to be within an
 910 already-started program unit
 911 (but not at the outer level of the @code{CONTAINS} block),
 912 it is a type-declaration statement.
 913
 914 Otherwise, the statement is a @code{FUNCTION} statement,
 915 in that it begins a function program unit
 916 (external, or, within @code{CONTAINS}, nested).
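
The context rule for the ambiguous form can be sketched as below.
The enumeration and the two flags are hypothetical;
a real implementation would consult its own program-unit state.

@smallexample
enum stmt_kind @{ STMT_FUNCTION, STMT_TYPE_DECL @};

/* Classify `REAL FUNCTION FOO (names)'.
   IN_PROGRAM_UNIT is nonzero if a program unit has already started.
   AT_CONTAINS_LEVEL is nonzero if the statement is at the outer
   level of a CONTAINS block.  */
enum stmt_kind
classify_real_function (int in_program_unit, int at_contains_level)
@{
  if (in_program_unit && !at_contains_level)
    return STMT_TYPE_DECL;
  return STMT_FUNCTION;
@}
@end smallexample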
 917
 918 @subsubsection Weird READ
 919
 920 The statement
 921
 922 @smallexample
 923 READ (N)
 924 @end smallexample
 925
 926 is equivalent to either
 927
 928 @smallexample
 929 READ (UNIT=(N))
 930 @end smallexample
 931
 932 or
 933
 934 @smallexample
 935 READ (FMT=(N))
 936 @end smallexample
 937
 938 depending on which would be valid in context.
 939
 940 Specifically, if @samp{N} is type @code{INTEGER},
 941 @samp{READ (FMT=(N))} would not be valid,
 942 because parentheses may not be used around @samp{N},
 943 whereas they may around it in @samp{READ (UNIT=(N))}.
 944
 945 Further, if @samp{N} is type @code{CHARACTER},
 946 the opposite is true---@samp{READ (UNIT=(N))} is not valid,
 947 but @samp{READ (FMT=(N))} is.
 948
 949 Strictly speaking, if anything follows
 950
 951 @smallexample
 952 READ (N)
 953 @end smallexample
 954
 955 in the statement, whether the first lexeme after the close
 956 parenthesis is a comma could be used to disambiguate the two cases,
 957 without looking at the type of @samp{N},
 958 because the comma is required for the @samp{READ (FMT=(N))}
 959 interpretation and disallowed for the @samp{READ (UNIT=(N))}
 960 interpretation.
 961
 962 However, in practice, many Fortran compilers allow
 963 the comma for the @samp{READ (UNIT=(N))}
 964 interpretation anyway
 965 (in that they generally allow a leading comma before
 966 an I/O list in an I/O statement),
 967 and much code takes advantage of this allowance.
 968
 969 (This is quite a reasonable allowance, since the
 970 juxtaposition of a comma-separated list immediately
 971 after an I/O control-specification list, which is also comma-separated,
 972 without an intervening comma,
 973 looks sufficiently ``wrong'' to programmers
 974 that they can't resist the itch to insert the comma.
 975 @samp{READ (I, J), K, L} simply looks cleaner than
 976 @samp{READ (I, J) K, L}.)
 977
 978 So, type-based disambiguation is needed unless strict adherence
 979 to the standard is always assumed, and we're not going to assume that.
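
A sketch of the type-based choice follows.
The enumerations are hypothetical, and treating every other type
as invalid is an assumption made only to keep the example small.

@smallexample
enum basic_type @{ TYPE_INTEGER, TYPE_CHARACTER, TYPE_OTHER @};
enum read_spec @{ READ_UNIT, READ_FMT, READ_INVALID @};

/* Decide how `READ (N)' is to be interpreted, given the type of N.  */
enum read_spec
classify_read_paren (enum basic_type type_of_n)
@{
  switch (type_of_n)
    @{
    case TYPE_INTEGER:
      return READ_UNIT;         /* READ (UNIT=(N)) */
    case TYPE_CHARACTER:
      return READ_FMT;          /* READ (FMT=(N)) */
    default:
      return READ_INVALID;      /* assumption: diagnose anything else */
    @}
@}
@end smallexample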
 980
 981 @node TBD (Transforming)
 982 @subsection TBD (Transforming)
 983
 984 Continue researching gotchas, designing the transformational process,
 985 and implementing it.
 986
 987 Specific issues to resolve:
 988
 989 @itemize @bullet
 990 @item
 991 Just where should @code{INCLUDE} processing take place?
 992
 993 Clearly before (or part of) statement identification (@file{sta.c}),
 994 since determining whether @samp{I(J)=K} is a statement-function
 995 definition or an assignment statement requires knowing the context,
 996 which in turn requires having processed @code{INCLUDE} files.
 997
 998 @item
 999 Just where should (if it were implemented) @code{USE} processing take place?
1000
1001 This gets into the whole issue of how @code{g77} should handle the concept
1002 of modules.
1003 I think GNAT already takes on this issue, but don't know more than that.
1004 Jim Giles has written extensively on @code{comp.lang.fortran}
1005 about his opinions on module handling, as have others.
1006 Jim's views should be taken into account.
1007
1008 Actually, Richard M. Stallman (RMS) also has written up
1009 some guidelines for implementing such things,
1010 but I'm not sure where I read them.
1011 Perhaps the old @email{gcc2@@cygnus.com} list.
1012
1013 If someone could dig references to these up and get them to me,
1014 that would be much appreciated!
1015 Even though modules are not on the short-term list for implementation,
1016 it'd be helpful to know @emph{now} how to avoid making them harder to
1017 implement @emph{later}.
1018
1019 @item
1020 Should the @code{g77} command become just a script that invokes
1021 all the various preprocessing that might be needed,
1022 thus making it seem slower than necessary for legacy code
1023 that people are unwilling to convert,
1024 or should we provide a separate script for that,
1025 thus encouraging people to convert their code once and for all?
1026
1027 At least, a separate script to behave as old @code{g77} did,
1028 perhaps named @code{g77old}, might ease the transition,
1029 as might a corresponding one that converts source codes
1030 named @code{g77oldnew}.
1031
1032 These scripts would take all the pertinent options @code{g77} used
1033 to take and run the appropriate filters,
1034 passing the results to @code{g77} or just making new sources out of them
1035 (in a subdirectory, leaving the user to do the dirty deed of
1036 moving or copying them over the old sources).
1037
1038 @item
1039 Do other Fortran compilers provide a prefix syntax
1040 to govern the treatment of backslashes in @code{CHARACTER}
1041 (or Hollerith) constants?
1042
1043 Knowing what other compilers provide would help.
1044
1045 @item
1046 Is it okay to drop support for the @samp{-fintrin-case-initcap},
1047 @samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap},
1048 and @samp{-fcase-initcap} options?
1049
1050 I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
1051 Not having to support these makes it easier to write the new front end,
1052 and might also avoid complicating its design.
1053 @end itemize
1054
1055 @node Philosophy of Code Generation
1056 @section Philosophy of Code Generation
1057
1058 Don't poke the bear.
1059
1060 The @code{g77} front end generates code
1061 via the @code{gcc} back end.
1062
1063 @cindex GNU Back End (GBE)
1064 @cindex GBE
1065 @cindex @code{gcc}, back end
1066 @cindex back end, @code{gcc}
1067 @cindex code generator
1068 The @code{gcc} back end (GBE) is a large, complex
1069 labyrinth of intricate code
1070 written in a combination of the C language
1071 and specialized languages internal to @code{gcc}.
1072
1073 While the @emph{code} that implements the GBE
1074 is written in a combination of languages,
1075 the GBE itself is,
1076 to the front end for a language like Fortran,
1077 best viewed as a @emph{compiler}
1078 that compiles its own, unique, language.
1079
1080 The GBE's ``source'', then, is written in this language,
1081 which consists primarily of
1082 a combination of calls to GBE functions
1083 and @dfn{tree} nodes
1084 (which are, themselves, created
1085 by calling GBE functions).
1086
1087 So, @code{g77} generates code by, in effect,
1088 translating the Fortran code it reads
1089 into a form ``written'' in the ``language''
1090 of the @code{gcc} back end.
1091
1092 @cindex GBEL
1093 @cindex GNU Back End Language (GBEL)
1094 This language will henceforth be referred to as @dfn{GBEL},
1095 for GNU Back End Language.
1096
1097 GBEL is an evolving language,
1098 not fully specified in any published form
1099 as of this writing.
1100 It offers many facilities,
1101 but its ``core'' facilities
1102 are those that correspond most directly
1103 to those needed to support @code{gcc}
1104 (compiling code written in GNU C).
1105
1106 The @code{g77} Fortran Front End (FFE)
1107 is designed and implemented
1108 to navigate the currents and eddies
1109 of ongoing GBEL and @code{gcc} development
1110 while also delivering on the potential
1111 of an integrated FFE
1112 (as compared to using a converter like @code{f2c}
1113 and feeding the output into @code{gcc}).
1114
1115 Goals of the FFE's code-generation strategy include:
1116
1117 @itemize @bullet
1118 @item
1119 High likelihood of generation of correct code,
1120 or, failing that, producing a fatal diagnostic or crashing.
1121
1122 @item
1123 Generation of highly optimized code,
1124 as directed by the user
1125 via GBE-specific (versus @code{g77}-specific) constructs,
1126 such as command-line options.
1127
1128 @item
1129 Fast overall (FFE plus GBE) compilation.
1130
1131 @item
1132 Preservation of source-level debugging information.
1133 @end itemize
1134
1135 The strategies historically, and currently, used by the FFE
1136 to achieve these goals include:
1137
1138 @itemize @bullet
1139 @item
1140 Use of GBEL constructs that most faithfully encapsulate
1141 the semantics of Fortran.
1142
1143 @item
1144 Avoidance of GBEL constructs that are so rarely used,
1145 or limited to use in specialized situations not related to Fortran,
1146 that their reliability and performance have not yet been established
1147 as sufficient for use by the FFE.
1148
1149 @item
1150 Flexible design, to readily accommodate changes to specific
1151 code-generation strategies, perhaps governed by command-line options.
1152 @end itemize
1153
1154 @cindex Bear-poking
1155 @cindex Poking the bear
1156 ``Don't poke the bear'' somewhat summarizes the above strategies.
1157 The GBE is the bear.
1158 The FFE is designed and implemented to avoid poking it
1159 in ways that are likely to just annoy it.
1160 The FFE usually either tackles it head-on,
1161 or avoids treating it in ways dissimilar to how
1162 the @code{gcc} front end treats it.
1163
1164 For example, the FFE uses the native array facility in the back end
1165 instead of the lower-level pointer-arithmetic facility
1166 used by @code{gcc} when compiling @code{f2c} output.
1167 Theoretically, this presents more opportunities for optimization,
1168 faster compile times,
1169 and the production of more faithful debugging information.
1170 These benefits were not, however, immediately realized,
1171 mainly because @code{gcc} itself makes little or no use
1172 of the native array facility.
1173
1174 Complex arithmetic is a case study of the evolution of this strategy.
1175 When originally implemented,
1176 the GBEL had just evolved its own native complex-arithmetic facility,
1177 so the FFE took advantage of that.
1178
1179 When porting @code{g77} to 64-bit systems,
1180 it was discovered that the GBE didn't really
1181 implement its native complex-arithmetic facility properly.
1182
1183 The short-term solution was to rewrite the FFE
1184 to instead use the lower-level facilities
1185 that'd be used by @code{gcc}-compiled code
1186 (assuming that code, itself, didn't use the native complex type
1187 provided, as an extension, by @code{gcc}),
1188 since these were known to work,
1189 and, in any case, if shown to not work,
1190 would likely be rapidly fixed
1191 (since they'd likely not work for vanilla C code in similar circumstances).
1192
1193 However, the rewrite accommodated the original, native approach as well
1194 by offering a command-line option to select it over the emulated approach.
1195 This allowed users, and especially GBE maintainers, to try out
1196 fixes to complex-arithmetic support in the GBE
1197 while @code{g77} continued to default to compiling more code correctly,
1198 albeit producing (typically) slower executables.
1199
1200 As of April 1999, it appeared that the last few bugs
1201 in the GBE's support of its native complex-arithmetic facility
1202 were worked out.
1203 The FFE was changed back to default to using that native facility,
1204 leaving emulation as an option.
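
The difference between the two approaches can be pictured in C.
In this sketch, C99's @code{double complex} stands in for
the native complex facility, and a pair of @code{double}s for
the lower-level emulation; the type and function names are illustrative
and are not what the FFE actually emits.

@smallexample
#include <complex.h>

/* Emulated approach: a complex value is just two reals,
   and the arithmetic is spelled out explicitly.  */
struct emulated_complex @{ double re, im; @};

struct emulated_complex
emulated_mul (struct emulated_complex a, struct emulated_complex b)
@{
  struct emulated_complex r;

  r.re = a.re * b.re - a.im * b.im;
  r.im = a.re * b.im + a.im * b.re;
  return r;
@}

/* Native approach: the back end is handed a complex multiplication
   and decides for itself how to carry it out.  */
double complex
native_mul (double complex a, double complex b)
@{
  return a * b;
@}
@end smallexample

The emulated form relies only on operations that ordinary C code
exercises constantly, which is why it served as the safer default
while the native facility was being debugged.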
1205
1206 Other Fortran constructs---arrays, character strings,
1207 complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
1208 and so on---involve issues similar to those pertaining to complex arithmetic.
1209
1210 So, it is possible that the history
1211 of how the FFE handled complex arithmetic
1212 will be repeated, probably in modified form
1213 (and hopefully over shorter timeframes),
1214 for some of these other facilities.
1215
1216 @node Two-pass Design
1217 @section Two-pass Design
1218
1219 The FFE does not tell the GBE anything about a program unit
1220 until after the last statement in that unit has been parsed.
1221 (A program unit is a Fortran concept that corresponds, in the C world,
1222 most closely to function definitions in ISO C.
1223 That is, a program unit in Fortran is like a top-level function in C.
1224 Nested functions, found among the extensions offered by GNU C,
1225 correspond roughly to Fortran's statement functions.)
1226
1227 So, while parsing the code in a program unit,
1228 the FFE saves up all the information
1229 on statements, expressions, names, and so on,
1230 until it has seen the last statement.
1231
1232 At that point, the FFE revisits the saved information
1233 (in what amounts to a second @dfn{pass} over the program unit)
1234 to perform the actual translation of the program unit into GBEL,
1235 culminating in the generation of assembly code for it.
1236
1237 Some lookahead is performed during this second pass,
1238 so the FFE could be viewed as a ``two-plus-pass'' design.
1239
1240 @menu
1241 * Two-pass Code::
1242 * Why Two Passes::
1243 @end menu
1244
1245 @node Two-pass Code
1246 @subsection Two-pass Code
1247
1248 Most of the code that turns the first pass (parsing)
1249 into a second pass for code generation
1250 is in @file{@value{path-g77}/std.c}.
1251
1252 It has external functions,
1253 called mainly by siblings in @file{@value{path-g77}/stc.c},
1254 that record the information on statements and expressions
1255 in the order they are seen in the source code.
1256 These functions save that information.
1257
1258 It also has an external function that revisits that information,
1259 calling the siblings in @file{@value{path-g77}/ste.c},
1260 which handles the actual code generation
1261 (by generating GBEL code,
1262 that is, by calling GBE routines
1263 to represent and specify expressions, statements, and so on).
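
The save-then-revisit pattern can be illustrated with a much-simplified
sketch.
None of these names come from @file{std.c};
the fixed-size table and the @code{struct saved_stmt} layout
are assumptions made for brevity.

@smallexample
#include <stddef.h>

#define MAX_SAVED 64

/* Hypothetical record of one parsed statement.  */
struct saved_stmt @{ int kind; int label; @};

static struct saved_stmt saved[MAX_SAVED];
static size_t n_saved;

/* First pass: record a statement in source order.
   Return 1 on success, 0 if the table is full.  */
int
save_stmt (int kind, int label)
@{
  if (n_saved == MAX_SAVED)
    return 0;
  saved[n_saved].kind = kind;
  saved[n_saved].label = label;
  n_saved++;
  return 1;
@}

/* Second pass: fetch the I-th saved statement (counting from 0),
   or NULL once all statements have been revisited.  */
const struct saved_stmt *
revisit_stmt (size_t i)
@{
  return i < n_saved ? &saved[i] : NULL;
@}
@end smallexample

A code generator would loop over @code{revisit_stmt} only after
the last statement of the program unit had been saved.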
1264
1265 @node Why Two Passes
1266 @subsection Why Two Passes
1267
1268 The need for two passes was not immediately evident
1269 during the design and implementation of the code in the FFE
1270 that was to produce GBEL.
1271 Only after a few kludges,
1272 to handle things like incorrectly-guessed @code{ASSIGN} label nature,
1273 had been implemented,
1274 did enough evidence pile up to make it clear
1275 that @file{std.c} had to be introduced to intercept,
1276 save, then revisit as part of a second pass,
1277 the digested contents of a program unit.
1278
1279 Other such missteps have occurred during the evolution of the FFE,
1280 because of the different goals of the FFE and the GBE.
1281
1282 Because the GBE's original, and still primary, goal
1283 was to directly support the GNU C language,
1284 the GBEL, and the GBE itself,
1285 require more complexity
1286 on the part of most front ends
1287 than they require of @code{gcc}'s.
1288
1289 For example,
1290 the GBEL offers an interface that permits the @code{gcc} front end
1291 to implement most, or all, of the language features it supports,
1292 without the front end having to
1293 make use of non-user-defined variables.
1294 (It's almost certainly the case that all of K&R C,
1295 and probably ANSI C as well,
1296 is handled by the @code{gcc} front end
1297 without declaring such variables.)
1298
1299 The FFE, on the other hand, must resort to a variety of ``tricks''
1300 to achieve its goals.
1301
1302 Consider the following C code:
1303
1304 @smallexample
1305 int
1306 foo (int a, int b)
1307 @{
1308   int c = 0;
1309
1310   if ((c = bar (c)) == 0)
1311     goto done;
1312
1313   quux (c << 1);
1314
1315 done:
1316   return c;
1317 @}
1318 @end smallexample
1319
1320 Note what kinds of objects are declared, or defined, before their use,
1321 and before any actual code generation involving them
1322 would normally take place:
1323
1324 @itemize @bullet
1325 @item
1326 Return type of function
1327
1328 @item
1329 Entry point(s) of function
1330
1331 @item
1332 Dummy arguments
1333
1334 @item
1335 Variables
1336
1337 @item
1338 Initial values for variables
1339 @end itemize
1340
1341 Whereas, the following items can, and do,
1342 suddenly appear ``out of the blue'' in C:
1343
1344 @itemize @bullet
1345 @item
1346 Label references
1347
1348 @item
1349 Function references
1350 @end itemize
1351
1352 Not surprisingly, the GBE faithfully permits the latter set of items
1353 to be ``discovered'' partway through GBEL ``programs'',
1354 just as they are permitted to in C.
1355
1356 Yet, the GBE has tended, at least in the past,
1357 to be reluctant to fully support similar ``late'' discovery
1358 of items in the former set.
1359
1360 This makes Fortran a poor fit for the ``safe'' subset of GBEL.
1361 Consider:
1362
1363 @smallexample
1364       FUNCTION X (A, ARRAY, ID1)
1365       CHARACTER*(*) A
1366       DOUBLE PRECISION X, Y, Z, TMP, EE, PI
1367       REAL ARRAY(ID1*ID2)
1368       COMMON ID2
1369       EXTERNAL FRED
1370
1371       ASSIGN 100 TO J
1372       CALL FOO (I)
1373       IF (I .EQ. 0) PRINT *, A(0)
1374       GOTO 200
1375
1376       ENTRY Y (Z)
1377       ASSIGN 101 TO J
1378 200   PRINT *, A(1)
1379       READ *, TMP
1380       GOTO J
1381 100   X = TMP * EE
1382       RETURN
1383 101   Y = TMP * PI
1384       CALL FRED
1385       DATA EE, PI /2.71D0, 3.14D0/
1386       END
1387 @end smallexample
1388
1389 Here are some observations about the above code,
1390 which, while somewhat contrived,
1391 conforms to the FORTRAN 77 and Fortran 90 standards:
1392
1393 @itemize @bullet
1394 @item
1395 The return type of function @samp{X} is not known
1396 until the @samp{DOUBLE PRECISION} line has been parsed.
1397
1398 @item
1399 Whether @samp{A} is a function or a variable
1400 is not known until the @samp{PRINT *, A(0)} statement
1401 has been parsed.
1402
1403 @item
1404 The bounds of the array of argument @samp{ARRAY}
1405 depend on a computation involving
1406 the subsequent argument @samp{ID1}
1407 and the blank-common member @samp{ID2}.
1408
1409 @item
1410 Whether @samp{Y} and @samp{Z} are local variables,
1411 additional function entry points,
1412 or dummy arguments to additional entry points
1413 is not known
1414 until the @code{ENTRY} statement is parsed.
1415
1416 @item
1417 Similarly, whether @samp{TMP} is a local variable is not known
1418 until the @samp{READ *, TMP} statement is parsed.
1419
1420 @item
1421 The initial values for @samp{EE} and @samp{PI}
1422 are not known until after the @code{DATA} statement is parsed.
1423
1424 @item
1425 Whether @samp{FRED} is a function returning type @code{REAL}
1426 or a subroutine
1427 (which can be thought of as returning type @code{void}
1428 @emph{or}, to support alternate returns in a simple way,
1429 type @code{int})
1430 is not known
1431 until the @samp{CALL FRED} statement is parsed.
1432
1433 @item
1434 Whether @samp{100} is a @code{FORMAT} label
1435 or the label of an executable statement
1436 is not known
1437 until the @samp{X =} statement is parsed.
1438 (These two types of labels get @emph{very} different treatment,
1439 especially when @code{ASSIGN}'ed.)
1440
1441 @item
1442 That @samp{J} is a local variable is not known
1443 until the first @code{ASSIGN} statement is parsed.
1444 (This happens @emph{after} executable code has been seen.)
1445 @end itemize
1446
1447 Very few of these ``discoveries''
1448 can be accommodated by the GBE as it has evolved over the years.
1449 The GBEL doesn't support several of them,
1450 and those it might appear to support
1451 don't always work properly,
1452 especially in combination with other GBEL and GBE features,
1453 as implemented in the GBE.
1454
1455 (Had the GBE and its GBEL originally evolved to support @code{g77},
1456 the shoe would be on the other foot, so to speak---most, if not all,
1457 of the above would be directly supported by the GBEL,
1458 and a few C constructs would probably not, as they are in reality,
1459 be supported.
1460 Both this mythical, and today's real, GBE caters to its GBEL
1461 by, sometimes, scrambling around, cleaning up after itself---after
1462 discovering that assumptions it made earlier during code generation
1463 are incorrect.)
1464
1465 So, the FFE handles these discrepancies---between the order in which
1466 it discovers facts about the code it is compiling,
1467 and the order in which the GBEL and GBE support such discoveries---by
1468 performing what amounts to two
1469 passes over each program unit.
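
The label case from the list above shows the idea in miniature.
In this sketch, a label referenced by @code{ASSIGN} is entered with an
unknown kind during the first pass, and its kind is filled in whenever
the labeled statement is finally parsed;
the second pass then sees only settled facts.
All names here are hypothetical.

@smallexample
#include <stddef.h>

enum label_kind @{ LABEL_UNKNOWN, LABEL_FORMAT, LABEL_EXECUTABLE @};

struct label @{ int number; enum label_kind kind; @};

#define MAX_LABELS 32
static struct label labels[MAX_LABELS];
static int n_labels;

/* Find the entry for NUMBER, creating it as LABEL_UNKNOWN if needed.
   Return NULL if the table is full.  */
static struct label *
find_label (int number)
@{
  int i;

  for (i = 0; i < n_labels; i++)
    if (labels[i].number == number)
      return &labels[i];
  if (n_labels == MAX_LABELS)
    return NULL;
  labels[n_labels].number = number;
  labels[n_labels].kind = LABEL_UNKNOWN;
  return &labels[n_labels++];
@}

/* First pass: a reference such as `ASSIGN 100 TO J'.  */
void
note_label_reference (int number)
@{
  find_label (number);
@}

/* First pass: the labeled statement itself has been parsed.  */
void
note_label_definition (int number, enum label_kind kind)
@{
  struct label *l = find_label (number);

  if (l != NULL)
    l->kind = kind;
@}

/* Second pass: by now every definition has been seen.  */
enum label_kind
label_kind_of (int number)
@{
  struct label *l = find_label (number);

  return l != NULL ? l->kind : LABEL_UNKNOWN;
@}
@end smallexample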
1470
1471 (A few ambiguities can remain at that point,
1472 such as whether, given @samp{EXTERNAL BAZ}
1473 and no other reference to @samp{BAZ} in the program unit,
1474 it is a subroutine, a function, or a block-data---which, in C-speak,
1475 governs its declared return type.
1476 Fortunately, these distinctions are easily finessed
1477 for the procedure, library, and object-file interfaces
1478 supported by @code{g77}.)
1479
1480 @node Challenges Posed
1481 @section Challenges Posed
1482
1483 Consider the following Fortran code, which uses various extensions
1484 (including some to Fortran 90):
1485
1486 @smallexample
1487 SUBROUTINE X(A)
1488 CHARACTER*(*) A
1489 COMPLEX CFUNC
1490 INTEGER*2 CLOCKS(200)
1491 INTEGER IFUNC
1492
1493 CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
1494 @end smallexample
1495
1496 The above poses the following challenges to any Fortran compiler
1497 that uses run-time interfaces, and a run-time library, roughly similar
1498 to those used by @code{g77}:
1499
1500 @itemize @bullet
1501 @item
1502 Assuming the library routine that supports @code{SYSTEM_CLOCK}
1503 expects to set an @code{INTEGER*4} variable via its @code{COUNT} argument,
1504 the compiler must make available to it a temporary variable of that type.
1505
1506 @item
1507 Further, after the @code{SYSTEM_CLOCK} library routine returns,
1508 the compiler must ensure that the temporary variable it wrote
1509 is copied into the appropriate element of the @samp{CLOCKS} array.
1510 (This assumes the compiler doesn't just reject the code,
1511 which it should if it is compiling under some kind of a ``strict'' option.)
1512
1513 @item
1514 To determine the correct index into the @samp{CLOCKS} array,
1515 (putting aside the fact that the index, in this particular case,
1516 need not be computed until after
1517 the @code{SYSTEM_CLOCK} library routine returns),
1518 the compiler must ensure that the @code{IFUNC} function is called.
1519
1520 That requires evaluating its argument,
1521 which requires, for @code{g77}
1522 (assuming @code{-ff2c} is in force),
1523 reserving a temporary variable of type @code{COMPLEX}
1524 for use as a repository for the return value
1525 being computed by @samp{CFUNC}.
1526
1527 @item
1528 Before invoking @samp{CFUNC},
1529 its argument must be evaluated,
1530 which requires allocating, at run time,
1531 a temporary large enough to hold the result of the concatenation,
1532 as well as actually performing the concatenation.
1533
1534 @item
1535 The large temporary needed during invocation of @code{CFUNC}
1536 should, ideally, be deallocated
1537 (or, at least, left to the GBE to dispose of, as it sees fit)
1538 as soon as @code{CFUNC} returns,
1539 which means before @code{IFUNC} is called
1540 (as it might need a lot of dynamically allocated memory).
1541 @end itemize
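
The first two items, the temporary and the copy back,
look roughly like this when written out in C.
The routine @code{fake_system_clock} is a made-up stand-in
for whatever the run-time library provides,
and the value it stores is arbitrary.

@smallexample
#include <stdint.h>

/* Stand-in for a library routine that insists on
   writing an INTEGER*4 result.  */
static void
fake_system_clock (int32_t *count)
@{
  *count = 1234;
@}

/* CALL SYSTEM_CLOCK (CLOCKS (INDEX)), with CLOCKS being INTEGER*2.  */
void
call_system_clock_i2 (int16_t *clocks, int index)
@{
  int32_t temp;                 /* temporary of the expected type */

  fake_system_clock (&temp);
  clocks[index - 1] = (int16_t) temp;   /* Fortran arrays start at 1 */
@}
@end smallexample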
1542
1543 @code{g77} currently doesn't support all of the above,
1544 but, so that it might someday, it has evolved to handle
1545 at least some of the above requirements.
1546
1547 Meeting the above requirements is made more challenging
1548 by conforming to the requirements of the GBEL/GBE combination.
1549
1550 @node Transforming Statements
1551 @section Transforming Statements
1552
1553 Most Fortran statements are given their own block,
1554 and, for temporary variables they might need, their own scope.
1555 (A block is what distinguishes @samp{@{ foo (); @}}
1556 from just @samp{foo ();} in C.
1557 A scope is included with every such block,
1558 providing a distinct name space for local variables.)
1559
1560 Label definitions for the statement precede this block,
1561 so @samp{10 PRINT *, I} is handled more like
1562 @samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}}
1563 (where @samp{fl10} is just a notation meaning ``Fortran Label 10''
1564 for the purposes of this document).
1565
1566 @menu
1567 * Statements Needing Temporaries::
1568 * Transforming DO WHILE::
1569 * Transforming Iterative DO::
1570 * Transforming Block IF::
1571 * Transforming SELECT CASE::
1572 @end menu
1573
1574 @node Statements Needing Temporaries
1575 @subsection Statements Needing Temporaries
1576
1577 Any temporaries needed during, but not beyond,
1578 execution of a Fortran statement,
1579 are made local to the scope of that statement's block.
1580
1581 This allows the GBE to share storage for these temporaries
1582 among the various statements without the FFE
1583 having to manage that itself.
1584
1585 (The GBE could, of course, decide to optimize
1586 management of these temporaries.
1587 For example, it could, theoretically,
1588 schedule some of the computations involving these temporaries
1589 to occur in parallel.
1590 More practically, it might leave the storage for some temporaries
1591 ``live'' beyond their scopes, to reduce the number of
1592 manipulations of the stack pointer at run time.)
1593
1594 Temporaries needed across distinct statement boundaries usually
1595 are associated with Fortran blocks (such as @code{DO}/@code{END DO}).
1596 (Also, there might be temporaries not associated with blocks at all---these
1597 would be in the scope of the entire program unit.)
1598
1599 Each Fortran block @emph{should} get its own block/scope in the GBE.
1600 This is best, because it allows temporaries to be more naturally handled.
1601 However, it might pose problems when handling labels
1602 (in particular, when they're the targets of @code{GOTO}s outside the Fortran
1603 block), and generally entails the hassle of replicating
1604 parts of the @code{gcc} front end
1605 (because the FFE needs to support
1606 an arbitrary number of nested back-end blocks
1607 if each Fortran block gets one).
1608
1609 So, there might still be a need for top-level temporaries, whose
1610 ``owning'' scope is that of the containing procedure.
1611
1612 Also, there seem to be problems declaring new variables after
1613 generating code (within a block) in the back end, leading to, e.g.,
1614 @samp{label not defined before binding contour} or similar messages,
1615 when compiling with @samp{-fstack-check} or
1616 when compiling for certain targets.
1617
1618 Because of that, and because sometimes these temporaries are not
1619 discovered until the middle of generating code for an expression
1620 statement (as in the case of the optimization for @samp{X**I}),
1621 it seems best to always
1622 pre-scan all the expressions that'll be expanded for a block
1623 before generating any of the code for that block.
1624
1625 This pre-scan then handles discovering and declaring, to the back end,
1626 the temporaries needed for that block.
1627
1628 It's also important to treat distinct items in an I/O list as distinct
1629 statements deserving their own blocks.
1630 That's because there's a requirement
1631 that each I/O item be fully processed before the next one,
1632 which matters in cases like @samp{READ (*,*), I, A(I)}---the
1633 element of @samp{A} read in the second item
1634 @emph{must} be determined from the value
1635 of @samp{I} read in the first item.
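
In C terms, the ordering requirement amounts to something like
the following sketch, in which each I/O item gets its own block.
The function and its @code{input} array, standing in for
the values read, are hypothetical.

@smallexample
/* READ (*,*), I, A(I): INPUT holds the two values read.  */
void
read_i_and_element (const int *input, int *i, int *a)
@{
  @{
    *i = input[0];              /* first item: I */
  @}
  @{
    a[*i - 1] = input[1];       /* second item: A(I), using the new I */
  @}
@}
@end smallexample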
1636
1637 @node Transforming DO WHILE
1638 @subsection Transforming DO WHILE
1639
1640 @samp{DO WHILE(expr)} @emph{must} be implemented
1641 so that temporaries needed to evaluate @samp{expr}
1642 are generated just for the test, each time.
1643
1644 Consider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed:
1645
1646 @smallexample
1647 for (;;)
1648   @{
1649     int temp0;
1650
1651     @{
1652       char temp1[large];
1653
1654       libg77_catenate (temp1, a, b);
1655       temp0 = libg77_ne (temp1, 'END');
1656     @}
1657
1658     if (! temp0)
1659       break;
1660
1661     @dots{}
1662   @}
1663 @end smallexample
1664
1665 In this case, it seems like a time/space tradeoff
1666 between allocating and deallocating @samp{temp1} for each iteration
1667 and allocating it just once for the entire loop.
1668
1669 However, if @samp{temp1} is allocated just once for the entire loop,
1670 it could be the wrong size for subsequent iterations of that loop
1671 in cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')},
1672 because the body of the loop might modify @samp{I} or @samp{J}.
1673
1674 So, the above implementation is used,
1675 though a more optimal one can be used
1676 in specific circumstances.
1677
@node Transforming Iterative DO
@subsection Transforming Iterative DO

An iterative @code{DO} loop
(one that specifies an iteration variable)
is required by the Fortran standards
to be implemented as though an iteration count
is computed before entering the loop body,
and that iteration count used to determine
the number of times the loop body is to be performed
(assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}).

The FFE handles this by allocating a temporary variable
to contain the computed number of iterations.
Since this variable must be in a scope that includes the entire loop,
a GBEL block is created for that loop,
and the variable declared as belonging to the scope of that block.

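
A minimal sketch of this scheme in C follows.
The trip-count formula is the one the Fortran 77 standard gives
for @code{DO} loops, @samp{MAX (INT ((M2 - M1 + M3) / M3), 0)}.
The names @code{trip_count}, @code{run_iterative_do}, and @code{temp_count}
are made up for the example, only integer operands are handled,
and integer overflow is ignored.

```c
#include <assert.h>

/* Trip count for `DO V = M1, M2, M3' with integer operands.
   C integer division truncates toward zero, as INT does.  */
int
trip_count (int m1, int m2, int m3)
{
  int n;

  assert (m3 != 0);
  n = (m2 - m1 + m3) / m3;
  return n > 0 ? n : 0;
}

/* Mimics
     DO I = 1, N
        N = N + 1
     END DO
   and returns how many times the body ran.  */
int
run_iterative_do (int n)
{
  int body_runs = 0;

  {
    /* Hypothetical compiler temporary, declared in a block
       that encloses the entire loop.  */
    int temp_count = trip_count (1, n, 1);
    int i = 1;

    for (; temp_count > 0; temp_count--)
      {
        n = n + 1;        /* changing N does not change the count */
        body_runs++;
        i = i + 1;        /* the DO variable is stepped by M3 */
      }
    (void) i;
  }

  return body_runs;
}
```

Because the count is computed once, before the body runs,
@code{run_iterative_do (3)} performs the body three times
even though the body keeps raising the upper limit.
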
@node Transforming Block IF
@subsection Transforming Block IF

Consider:

@smallexample
SUBROUTINE X(A,B,C)
CHARACTER*(*) A, B, C
LOGICAL LFUNC

IF (LFUNC (A//B)) THEN
  CALL SUBR1
ELSE IF (LFUNC (A//C)) THEN
  CALL SUBR2
ELSE
  CALL SUBR3
END IF
END
@end smallexample

The arguments to the two calls to @samp{LFUNC}
require dynamic allocation (at run time),
but are not required during execution of the @code{CALL} statements.

So, the scopes of those temporaries must be within blocks inside
the block corresponding to the Fortran @code{IF} block.

This cannot be represented ``naturally''
in vanilla C, nor in GBEL.
The @code{if}, @code{elseif}, @code{else},
and @code{endif} constructs
provided by both languages must,
for a given @code{if} block,
share the same C/GBE block.

Therefore, any temporaries needed during evaluation of @samp{expr}
while executing @samp{ELSE IF(expr)}
must either have been predeclared
at the top of the corresponding @code{IF} block,
or declared within a new block for that @code{ELSE IF}---a block that,
since it cannot contain the @code{else} or @code{else if} itself
(due to the above requirement),
actually implements the rest of the @code{IF} block's
@code{ELSE IF} and @code{ELSE} statements
within an inner block.

The FFE takes the latter approach.

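
The nesting described above might look roughly like the following C sketch.
It is not output of the FFE.
The function @code{block_if}, the stand-in @code{lfunc}
(which merely tests for a leading @samp{Y}),
and the integer return value that replaces the three @code{CALL} statements
are assumptions made so that the example is self-contained.
It also assumes a C99 compiler with variable-length arrays.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for LFUNC: true when the string
   starts with 'Y'.  */
int
lfunc (const char *s)
{
  return s[0] == 'Y';
}

/* Returns 1, 2, or 3 according to whether the Fortran example
   would call SUBR1, SUBR2, or SUBR3.  */
int
block_if (const char *a, const char *b, const char *c)
{
  int which;
  int temp0;

  assert (a != NULL && b != NULL && c != NULL);

  {
    /* Temporary for A//B, live only while the first test runs.  */
    char temp1[strlen (a) + strlen (b) + 1];

    strcpy (temp1, a);
    strcat (temp1, b);
    temp0 = lfunc (temp1);
  }

  if (temp0)
    which = 1;                  /* CALL SUBR1 */
  else
    {
      /* New block standing in for ELSE IF: it holds the second
         test and the remainder of the IF construct.  */
      int temp2;

      {
        /* Temporary for A//C, live only while the second test runs.  */
        char temp3[strlen (a) + strlen (c) + 1];

        strcpy (temp3, a);
        strcat (temp3, c);
        temp2 = lfunc (temp3);
      }

      if (temp2)
        which = 2;              /* CALL SUBR2 */
      else
        which = 3;              /* CALL SUBR3 */
    }

  return which;
}
```

Neither concatenation temporary is in scope at the points
marked with the @code{CALL} comments.
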
@node Transforming SELECT CASE
@subsection Transforming SELECT CASE

@code{SELECT CASE} poses a few interesting problems for code generation,
if efficiency and frugal stack management are important.

Consider @samp{SELECT CASE (I('PREFIX'//A))},
where @samp{A} is @code{CHARACTER*(*)}.
In a case like this---basically,
in any case where largish temporaries are needed
to evaluate the expression---those temporaries should
not be ``live'' during execution of any of the @code{CASE} blocks.

So, evaluation of the expression is best done within its own block,
which in turn is within the @code{SELECT CASE} block itself
(which contains the code for the @code{CASE} blocks as well,
though each within its own block).

Otherwise, we'd have the rough equivalent of this pseudo-code:

@smallexample
@{
  char temp[large];

  libg77_catenate (temp, 'prefix', a);

  switch (i (temp))
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

And that would leave @samp{temp[large]} in scope during the @code{CASE} blocks
(although a clever back end @emph{could} see that it isn't referenced
in them, and thus free that temporary before executing the blocks).

So this approach is used instead:

@smallexample
@{
  int temp0;

  @{
    char temp1[large];

    libg77_catenate (temp1, 'prefix', a);
    temp0 = i (temp1);
  @}

  switch (temp0)
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample

Note how @samp{temp1} goes out of scope before starting the switch,
thus making it easy for a back end to free it.

The problem @emph{that} solution has, however,
is with @samp{SELECT CASE('prefix'//A)}
(which is currently not supported).

Unless the GBEL is extended to support arbitrarily long character strings
in its @code{case} facility,
the FFE has to implement @code{SELECT CASE} on @code{CHARACTER}
(probably excepting @code{CHARACTER*1})
using a cascade of
@code{if}, @code{elseif}, @code{else}, and @code{endif} constructs
in GBEL.

To prevent the (potentially large) temporary,
needed to hold the selected expression itself (@samp{'prefix'//A}),
from being in scope during execution of the @code{CASE} blocks,
two approaches are available:

@itemize @bullet
@item
Pre-evaluate all the @code{CASE} tests,
producing an integer ordinal that is used,
a la @samp{temp0} in the earlier example,
as if @samp{SELECT CASE(temp0)} had been written.

Each corresponding @code{CASE} is replaced with @samp{CASE(@var{i})},
where @var{i} is the ordinal for that case,
determined while, or before,
generating the cascade of @code{if}-related constructs
to cope with @code{CHARACTER} selection.

@item
Make @samp{temp0} above just
large enough to hold the longest @code{CASE} string
that'll actually be compared against the expression
(in this case, @samp{'prefix'//A}).

Since that length must be constant
(because @code{CASE} expressions are all constant),
it won't be so large,
and, further, @samp{temp1} need not be dynamically allocated,
since normal @code{CHARACTER} assignment
into the fixed-length @samp{temp0} can be used instead.
@end itemize

Both of these solutions require the @code{SELECT CASE} implementation
to be changed so that all the corresponding @code{CASE} statements
are seen during the actual code generation for @code{SELECT CASE}.

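
As a rough illustration of the first approach,
the C sketch below reduces a @code{CHARACTER} selection
to an integer ordinal before entering the @code{switch}.
Everything in it is hypothetical:
the function @code{select_case_char}, the two @code{CASE} strings,
and the return values 100, 200, and 0 that merely identify
which @code{CASE} block ran.
It uses @code{strcmp}, which does not reproduce
Fortran's blank-padded comparison,
and it assumes a C99 compiler with variable-length arrays.

```c
#include <assert.h>
#include <string.h>

/* Mimics
     SELECT CASE ('prefix'//A)
     CASE ('prefixA')
     CASE ('prefixB')
     CASE DEFAULT
     END SELECT
   and returns 100, 200, or 0 to show which CASE block ran.  */
int
select_case_char (const char *a)
{
  int temp0;    /* ordinal of the matching CASE */

  assert (a != NULL);

  {
    /* Large temporary; sizeof counts the terminating NUL.  */
    char temp1[sizeof "prefix" + strlen (a)];

    strcpy (temp1, "prefix");
    strcat (temp1, a);

    /* Cascade of comparisons, reduced to an integer ordinal.  */
    if (strcmp (temp1, "prefixA") == 0)
      temp0 = 1;
    else if (strcmp (temp1, "prefixB") == 0)
      temp0 = 2;
    else
      temp0 = 0;
  }

  /* temp1 is out of scope here, as if SELECT CASE (temp0)
     had been written.  */
  switch (temp0)
    {
    case 1:
      return 100;
    case 2:
      return 200;
    default:
      return 0;
    }
}
```

Note that the cascade needs to know every @code{CASE} string
before the @code{switch} is emitted,
which is why all the @code{CASE} statements must be seen first.
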
@node Transforming Expressions
@section Transforming Expressions

The interactions between statements, expressions, and subexpressions
at program run time can be viewed as:

@smallexample
@var{action}(@var{expr})
@end smallexample

Here, @var{action} is the series of steps
performed to effect the statement,
and @var{expr} is the expression
whose value is used by @var{action}.

Expanding the above shows a typical order of events at run time:

@smallexample
Evaluate @var{expr}
Perform @var{action}, using result of evaluation of @var{expr}
Clean up after evaluating @var{expr}
@end smallexample

So, if evaluating @var{expr} requires allocating memory,
that memory can be freed before performing @var{action}
only if it is not needed to hold the result of evaluating @var{expr}.
Otherwise, it must be freed no sooner than
after @var{action} has been performed.

The above are recursive definitions,
in the sense that they apply to subexpressions of @var{expr}.

That is, evaluating @var{expr} involves
evaluating all of its subexpressions,
performing the @var{action} that computes the
result value of @var{expr},
then cleaning up after evaluating those subexpressions.

The recursive nature of this evaluation is implemented
via recursive-descent transformation of the top-level statements,
their expressions, @emph{their} subexpressions, and so on.

However, that recursive-descent transformation is,
due to the nature of the GBEL,
focused primarily on generating a @emph{single} stream of code
to be executed at run time.

Yet, from the above, it's clear that multiple streams of code
must effectively be simultaneously generated
during the recursive-descent analysis of statements.

The primary stream implements the primary @var{action} items,
while at least two other streams implement
the evaluation and clean-up items.

Requirements imposed by expressions include:

@itemize @bullet
@item
Whether the caller needs to have a temporary ready
to hold the value of the expression.

@item
Other requirements are TBD.
@end itemize

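
The evaluate, perform, and clean-up order described in this section
can be pictured with a short C sketch.
It only shows the run-time ordering for one statement,
not how the FFE generates code.
The names @code{evaluate_concat}, @code{perform_action},
and @code{run_statement} are invented for the example,
and @code{malloc} stands in for whatever allocation
the generated code would really use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* "Evaluate expr": builds X//Y in newly allocated memory.
   The memory holds the result, so it cannot be freed until
   the action is done.  Returns NULL if allocation fails.  */
char *
evaluate_concat (const char *x, const char *y)
{
  size_t xlen = strlen (x);
  size_t ylen = strlen (y);
  char *result = malloc (xlen + ylen + 1);

  if (result != NULL)
    {
      memcpy (result, x, xlen);
      memcpy (result + xlen, y, ylen + 1);
    }
  return result;
}

/* "Perform action": here, just measure the value.  */
size_t
perform_action (const char *value)
{
  return strlen (value);
}

/* One statement, with its three streams in run-time order.  */
size_t
run_statement (const char *x, const char *y)
{
  char *temp;
  size_t outcome;

  assert (x != NULL && y != NULL);

  temp = evaluate_concat (x, y);        /* evaluation stream */
  if (temp == NULL)
    return 0;

  outcome = perform_action (temp);      /* primary stream */
  free (temp);                          /* clean-up stream */
  return outcome;
}
```

Here the allocated memory holds the result itself,
so the clean-up step has to wait until the action has used it.
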
@node Internal Naming Conventions
@section Internal Naming Conventions

Names exported by FFE modules have the following (regular-expression) forms.
Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
where @var{mod} is lowercase or uppercase alphanumerics, respectively,
are exported by the module @code{ffe@var{mod}},
with the source code doing the exporting in @file{@var{mod}.h}.
(Usually, the source code for the implementation is in @file{@var{mod}.c}.)

Identifiers that don't fit the following forms
are not considered exported,
even if they are according to the C language.
(For example, they might be made available to other modules
solely for use within expansions of exported macros,
not for use within any source code in those other modules.)

@table @code
@item ffe@var{mod}
The single typedef exported by the module.

@item FFE@var{umod}_[A-Z][A-Z0-9_]*
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.

@item ffe@var{mod}[A-Z][A-Z][a-z0-9]*
A typedef exported by the module.

The portion of the identifier after @code{ffe@var{mod}} is
referred to as @code{ctype}, a capitalized (mixed-case) form
of @code{type}.

@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
(Where @var{umod} is the uppercase form of @var{mod}.)

A @code{#define} or @code{enum} constant of the type
@code{ffe@var{mod}@var{type}},
where @var{type} is the lowercase form of @var{ctype}
in an exported typedef.

@item ffe@var{mod}_@var{value}
A function that does or returns something,
as described by @var{value} (see below).

@item ffe@var{mod}_@var{value}_@var{input}
A function that does or returns something based
primarily on the thing described by @var{input} (see below).
@end table

Below are names used for @var{value} and @var{input},
along with their definitions.

@table @code
@item col
A column number within a line (first column is number 1).

@item file
An encapsulation of a file's name.

@item find
Looks up an instance of some type that matches specified criteria,
and returns that, even if it has to create a new instance or
crash trying to find it (as appropriate).

@item initialize
Initializes, usually a module.  No type.

@item int
A generic integer of type @code{int}.

@item is
A generic integer that contains a true (non-zero) or false (zero) value.

@item len
A generic integer that contains the length of something.

@item line
A line number within a source file,
or a global line number.

@item lookup
Looks up an instance of some type that matches specified criteria,
and returns that, or returns nil.

@item name
A @code{text} that points to a name of something.

@item new
Makes a new instance of the indicated type.
Might return an existing one if appropriate---if so,
similar to @code{find} without crashing.

@item pt
Pointer to a particular character (a line, column pair)
in the input file (source code being compiled).

@item run
Performs some herculean task.  No type.

@item terminate
Terminates, usually a module.  No type.

@item text
A @code{char *} that points to generic text.
@end table
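
To show how these forms combine, here is a sketch
of what a module following the conventions might export.
The module @code{ffedemo} does not exist in the FFE.
It, its constants, and its three functions are invented
purely to illustrate the @code{ffe@var{mod}},
@code{FFE@var{umod}_}, @code{ffe@var{mod}_@var{value}},
and @code{ffe@var{mod}_@var{value}_@var{input}} forms.

```c
#include <assert.h>
#include <string.h>

/* ffe<mod>: the single typedef exported by the hypothetical
   module `ffedemo'.  */
typedef enum
{
  FFEDEMO_NONE,         /* FFE<umod>_...: constants of type ffedemo */
  FFEDEMO_RED,
  FFEDEMO_GREEN
} ffedemo;

/* ffe<mod>_<value>: returns a `name' (text naming something).  */
const char *
ffedemo_name (ffedemo d)
{
  switch (d)
    {
    case FFEDEMO_RED:
      return "red";
    case FFEDEMO_GREEN:
      return "green";
    default:
      return "none";
    }
}

/* ffe<mod>_<value>_<input>: a `lookup' driven by a `name'.
   Returns FFEDEMO_NONE, playing the role of nil, when
   nothing matches.  */
ffedemo
ffedemo_lookup_name (const char *name)
{
  assert (name != NULL);

  if (strcmp (name, "red") == 0)
    return FFEDEMO_RED;
  if (strcmp (name, "green") == 0)
    return FFEDEMO_GREEN;
  return FFEDEMO_NONE;
}

/* ffe<mod>_<value>_<input>: the `len' of a `name'.  */
int
ffedemo_len_name (ffedemo d)
{
  return (int) strlen (ffedemo_name (d));
}
```

In a real module, the declarations would live in @file{demo.h}
and the function bodies in @file{demo.c},
following the file naming described at the start of this section.
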