contrib/gdb-7/gdb/doc/agentexpr.texi

   1 @c \input texinfo
   2 @c %**start of header
   3 @c @setfilename agentexpr.info
   4 @c @settitle GDB Agent Expressions
   5 @c @setchapternewpage off
   6 @c %**end of header
   7
   8 @c This file is part of the GDB manual.
   9 @c
  10 @c Copyright (C) 2003, 2004, 2005, 2006, 2009
  11 @c               Free Software Foundation, Inc.
  12 @c
  13 @c See the file gdb.texinfo for copying conditions.
  14
  15 @node Agent Expressions
  16 @appendix The GDB Agent Expression Mechanism
  17
  18 In some applications, it is not feasible for the debugger to interrupt
  19 the program's execution long enough for the developer to learn anything
  20 helpful about its behavior.  If the program's correctness depends on its
  21 real-time behavior, delays introduced by a debugger might cause the
  22 program to fail, even when the code itself is correct.  It is useful to
  23 be able to observe the program's behavior without interrupting it.
  24
  25 Using GDB's @code{trace} and @code{collect} commands, the user can
  26 specify locations in the program, and arbitrary expressions to evaluate
  27 when those locations are reached.  Later, using the @code{tfind}
  28 command, she can examine the values those expressions had when the
  29 program hit the trace points.  The expressions may also denote objects
  30 in memory --- structures or arrays, for example --- whose values GDB
  31 should record; while visiting a particular tracepoint, the user may
  32 inspect those objects as if they were in memory at that moment.
  33 However, because GDB records these values without interacting with the
  34 user, it can do so quickly and unobtrusively, hopefully not disturbing
  35 the program's behavior.
  36
  37 When GDB is debugging a remote target, the GDB @dfn{agent} code running
  38 on the target computes the values of the expressions itself.  To avoid
  39 having a full symbolic expression evaluator on the agent, GDB translates
  40 expressions in the source language into a simpler bytecode language, and
  41 then sends the bytecode to the agent; the agent then executes the
  42 bytecode, and records the values for GDB to retrieve later.
  43
  44 The bytecode language is simple; there are forty-odd opcodes, the bulk
  45 of which are the usual vocabulary of C operators (addition, subtraction,
  46 shifts, and so on) and various sizes of literals and memory reference
  47 operations.  The bytecode interpreter operates strictly on machine-level
  48 values --- various sizes of integers and floating point numbers --- and
  49 requires no information about types or symbols; thus, the interpreter's
  50 internal data structures are simple, and each bytecode requires only a
  51 few native machine instructions to implement it.  The interpreter is
  52 small, and strict limits on the memory and time required to evaluate an
  53 expression are easy to determine, making it suitable for use by the
  54 debugging agent in real-time applications.
  55
  56 @menu
  57 * General Bytecode Design::     Overview of the interpreter.
  58 * Bytecode Descriptions::       What each one does.
  59 * Using Agent Expressions::     How agent expressions fit into the big picture.
  60 * Varying Target Capabilities:: How to discover what the target can do.
  61 * Tracing on Symmetrix::        Special info for implementation on EMC's
  62                                 boxes.
  63 * Rationale::                   Why we did it this way.
  64 @end menu
  65
  66
  67 @c @node Rationale
  68 @c @section Rationale
  69
  70
  71 @node General Bytecode Design
  72 @section General Bytecode Design
  73
  74 The agent represents bytecode expressions as an array of bytes.  Each
  75 instruction is one byte long (thus the term @dfn{bytecode}).  Some
  76 instructions are followed by operand bytes; for example, the @code{goto}
  77 instruction is followed by a destination for the jump.
  78
  79 The bytecode interpreter is a stack-based machine; most instructions pop
  80 their operands off the stack, perform some operation, and push the
  81 result back on the stack for the next instruction to consume.  Each
  82 element of the stack may contain either an integer or a floating point
  83 value; these values are as many bits wide as the largest integer that
  84 can be directly manipulated in the source language.  Stack elements
  85 carry no record of their type; bytecode could push a value as an
  86 integer, then pop it as a floating point value.  However, GDB will not
  87 generate code which does this.  In C, one might define the type of a
  88 stack element as follows:
  89 @example
  90 union agent_val @{
  91   LONGEST l;
  92   DOUBLEST d;
  93 @};
  94 @end example
  95 @noindent
  96 where @code{LONGEST} and @code{DOUBLEST} are @code{typedef} names for
  97 the largest integer and floating point types on the machine.
  98
  99 By the time the bytecode interpreter reaches the end of the expression,
 100 the value of the expression should be the only value left on the stack.
 101 For tracing applications, @code{trace} bytecodes in the expression will
 102 have recorded the necessary data, and the value on the stack may be
 103 discarded.  For other applications, like conditional breakpoints, the
 104 value may be useful.
 105
 106 Separate from the stack, the interpreter has two registers:
 107 @table @code
 108 @item pc
 109 The address of the next bytecode to execute.
 110
 111 @item start
 112 The address of the start of the bytecode expression, necessary for
 113 interpreting the @code{goto} and @code{if_goto} instructions.
 114
 115 @end table
 116 @noindent
 117 Neither of these registers is directly visible to the bytecode language
 118 itself, but they are useful for defining the meanings of the bytecode
 119 operations.
 120
 121 There are no instructions to perform side effects on the running
 122 program, or call the program's functions; we assume that these
 123 expressions are only used for unobtrusive debugging, not for patching
 124 the running code.
 125
 126 Most bytecode instructions do not distinguish between the various sizes
 127 of values, and operate on full-width values; the upper bits of the
 128 values are simply ignored, since they do not usually make a difference
 129 to the value computed.  The exceptions to this rule are:
 130 @table @asis
 131
 132 @item memory reference instructions (@code{ref}@var{n})
 133 There are distinct instructions to fetch different word sizes from
 134 memory.  Once on the stack, however, the values are treated as full-size
 135 integers.  They may need to be sign-extended; the @code{ext} instruction
 136 exists for this purpose.
 137
 138 @item the sign-extension instruction (@code{ext} @var{n})
 139 This instruction needs to know which portion of its operand is to be
 140 extended to occupy the full length of the word.
 141
 142 @end table
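
The effect of the @code{ext} and @code{zero_ext} instructions on a
full-width stack element can be sketched in C as follows.  This is an
illustration only, not the agent's code: the function names are
hypothetical, a 64-bit integer stands in for @code{LONGEST}, and treating
a width of zero as a no-op for sign extension is an assumption, since the
text does not define that case.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low N bits of V to 64 bits.
   If N is at least the element width, V is returned unchanged.
   N == 0 is treated as a no-op (an assumption of this sketch).  */
static int64_t
agent_sign_extend (int64_t v, unsigned n)
{
  if (n == 0 || n >= 64)
    return v;
  uint64_t sign = (uint64_t) 1 << (n - 1);     /* Bit n-1.  */
  uint64_t low = (uint64_t) v & ((sign << 1) - 1);
  /* Flipping bit n-1 and then subtracting it copies that bit into
     every higher position.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* Hypothetical helper: keep the low N bits of V and clear the rest.  */
static uint64_t
agent_zero_extend (uint64_t v, unsigned n)
{
  if (n >= 64)
    return v;
  return v & (((uint64_t) 1 << n) - 1);
}
```

For instance, a byte fetched by @code{ref8} arrives on the stack as an
unsigned value; passing it through the first helper with a width of 8
recovers the signed quantity a C @code{signed char} would hold.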
 143
 144 If the interpreter is unable to evaluate an expression completely for
 145 some reason (a memory location is inaccessible, or a divisor is zero,
 146 for example), we say that interpretation ``terminates with an error''.
 147 This means that the problem is reported back to the interpreter's caller
 148 in some helpful way.  In general, code using agent expressions should
 149 assume that they may attempt to divide by zero, fetch arbitrary memory
 150 locations, and misbehave in other ways.
 151
 152 Even complicated C expressions compile to a few bytecode instructions;
 153 for example, the expression @code{x + y * z} would typically produce
 154 code like the following, assuming that @code{x} and @code{y} live in
 155 registers, and @code{z} is a global variable holding a 32-bit
 156 @code{int}:
 157 @example
 158 reg 1
 159 reg 2
 160 const32 @i{address of z}
 161 ref32
 162 ext 32
 163 mul
 164 add
 165 end
 166 @end example
 167
 168 In detail, these mean:
 169 @table @code
 170
 171 @item reg 1
 172 Push the value of register 1 (presumably holding @code{x}) onto the
 173 stack.
 174
 175 @item reg 2
 176 Push the value of register 2 (holding @code{y}).
 177
 178 @item const32 @i{address of z}
 179 Push the address of @code{z} onto the stack.
 180
 181 @item ref32
 182 Fetch a 32-bit word from the address at the top of the stack; replace
 183 the address on the stack with the value.  Thus, we replace the address
 184 of @code{z} with @code{z}'s value.
 185
 186 @item ext 32
 187 Sign-extend the value on the top of the stack from 32 bits to full
 188 length.  This is necessary because @code{z} is a signed integer.
 189
 190 @item mul
 191 Pop the top two numbers on the stack, multiply them, and push their
 192 product.  Now the top of the stack contains the value of the expression
 193 @code{y * z}.
 194
 195 @item add
 196 Pop the top two numbers, add them, and push the sum.  Now the top of the
 197 stack contains the value of @code{x + y * z}.
 198
 199 @item end
 200 Stop executing; the value left on the stack top is the value to be
 201 recorded.
 202
 203 @end table
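
To make the walk-through above concrete, here is a minimal sketch of an
interpreter that handles just the seven bytecodes used in the example.
The opcode values come from the bytecode descriptions in this appendix;
everything else is an assumption made for illustration: the names
@code{toy_target} and @code{toy_eval}, the four-entry register file, the
sixteen-element stack, the 64-bit stack elements, and the flat
little-endian memory image starting at address zero.

```c
#include <stddef.h>
#include <stdint.h>

enum
{
  OP_ADD = 0x02, OP_MUL = 0x04, OP_EXT = 0x16, OP_REF32 = 0x19,
  OP_CONST32 = 0x24, OP_REG = 0x26, OP_END = 0x27,
  STACK_MAX = 16, NUM_REGS = 4          /* Sizes assumed by this sketch.  */
};

/* Hypothetical target model: a few registers and a flat memory image.  */
struct toy_target
{
  int64_t regs[NUM_REGS];
  const uint8_t *mem;
  size_t mem_size;
};

/* Evaluate LEN bytes of bytecode at START.  On success store the top of
   the stack in *RESULT and return 0; return -1 on any error.  */
static int
toy_eval (const uint8_t *start, size_t len, const struct toy_target *t,
          int64_t *result)
{
  uint64_t stack[STACK_MAX];
  size_t sp = 0;                        /* Number of stack elements.  */
  size_t pc = 0;                        /* Offset from START.  */

  while (pc < len)
    {
      uint8_t op = start[pc++];
      switch (op)
        {
        case OP_REG:
          {
            if (len - pc < 2 || sp >= STACK_MAX)
              return -1;
            unsigned n = ((unsigned) start[pc] << 8) | start[pc + 1];
            pc += 2;
            if (n >= NUM_REGS)
              return -1;
            stack[sp++] = (uint64_t) t->regs[n];
            break;
          }
        case OP_CONST32:
          {
            if (len - pc < 4 || sp >= STACK_MAX)
              return -1;
            uint64_t v = 0;
            for (int i = 0; i < 4; i++)   /* Operand is big-endian.  */
              v = (v << 8) | start[pc + i];
            pc += 4;
            stack[sp++] = v;
            break;
          }
        case OP_REF32:
          {
            if (sp < 1)
              return -1;
            uint64_t addr = stack[sp - 1];
            if (t->mem_size < 4 || addr > t->mem_size - 4)
              return -1;                  /* Inaccessible memory.  */
            uint64_t v = 0;
            for (int i = 3; i >= 0; i--)  /* Little-endian target.  */
              v = (v << 8) | t->mem[addr + i];
            stack[sp - 1] = v;
            break;
          }
        case OP_EXT:
          {
            if (len - pc < 1 || sp < 1)
              return -1;
            unsigned n = start[pc++];
            if (n > 0 && n < 64)
              {
                uint64_t sign = (uint64_t) 1 << (n - 1);
                uint64_t low = stack[sp - 1] & ((sign << 1) - 1);
                stack[sp - 1] = (low ^ sign) - sign;
              }
            break;
          }
        case OP_ADD:
        case OP_MUL:
          {
            if (sp < 2)
              return -1;
            uint64_t b = stack[--sp];
            uint64_t a = stack[sp - 1];
            stack[sp - 1] = (op == OP_ADD) ? a + b : a * b;
            break;
          }
        case OP_END:
          if (sp < 1)
            return -1;
          *result = (int64_t) stack[sp - 1];
          return 0;
        default:
          return -1;                      /* Not in this subset.  */
        }
    }
  return -1;                              /* No end bytecode reached.  */
}
```

The arithmetic is done on unsigned 64-bit values so that overflow wraps
instead of being undefined; as the description of @code{mul} notes, the
low bits of the result are the same for signed and unsigned operands.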
 204
 205
 206 @node Bytecode Descriptions
 207 @section Bytecode Descriptions
 208
 209 Each bytecode description has the following form:
 210
 211 @table @asis
 212
 213 @item @code{add} (0x02): @var{a} @var{b} @result{} @var{a+b}
 214
 215 Pop the top two stack items, @var{a} and @var{b}, as integers; push
 216 their sum, as an integer.
 217
 218 @end table
 219
 220 In this example, @code{add} is the name of the bytecode, and
 221 @code{(0x02)} is the one-byte value used to encode the bytecode, in
 222 hexadecimal.  The phrase ``@var{a} @var{b} @result{} @var{a+b}'' shows
 223 the stack before and after the bytecode executes.  Beforehand, the stack
 224 must contain at least two values, @var{a} and @var{b}; since the top of
 225 the stack is to the right, @var{b} is on the top of the stack, and
 226 @var{a} is underneath it.  After execution, the bytecode will have
 227 popped @var{a} and @var{b} from the stack, and replaced them with a
 228 single value, @var{a+b}.  There may be other values on the stack below
 229 those shown, but the bytecode affects only those shown.
 230
 231 Here is another example:
 232
 233 @table @asis
 234
 235 @item @code{const8} (0x22) @var{n}: @result{} @var{n}
 236 Push the 8-bit integer constant @var{n} on the stack, without sign
 237 extension.
 238
 239 @end table
 240
 241 In this example, the bytecode @code{const8} takes an operand @var{n}
 242 directly from the bytecode stream; the operand follows the @code{const8}
 243 bytecode itself.  We write any such operands immediately after the name
 244 of the bytecode, before the colon, and describe the exact encoding of
 245 the operand in the bytecode stream in the body of the bytecode
 246 description.
 247
 248 For the @code{const8} bytecode, there are no stack items given before
 249 the @result{}; this simply means that the bytecode consumes no values
 250 from the stack.  If a bytecode consumes no values, or produces no
 251 values, the list on either side of the @result{} may be empty.
 252
 253 If a value is written as @var{a}, @var{b}, or @var{n}, then the bytecode
 254 treats it as an integer.  If a value is written as @var{addr}, then the
 255 bytecode treats it as an address.
 256
 257 We do not fully describe the floating point operations here; although
 258 this design can be extended in a clean way to handle floating point
 259 values, they are not of immediate interest to the customer, so we avoid
 260 describing them, to save time.
 261
 262
 263 @table @asis
 264
 265 @item @code{float} (0x01): @result{}
 266
 267 Prefix for floating-point bytecodes.  Not implemented yet.
 268
 269 @item @code{add} (0x02): @var{a} @var{b} @result{} @var{a+b}
 270 Pop two integers from the stack, and push their sum, as an integer.
 271
 272 @item @code{sub} (0x03): @var{a} @var{b} @result{} @var{a-b}
 273 Pop two integers from the stack, subtract the top value from the
 274 next-to-top value, and push the difference.
 275
 276 @item @code{mul} (0x04): @var{a} @var{b} @result{} @var{a*b}
 277 Pop two integers from the stack, multiply them, and push the product on
 278 the stack.  Note that, when one multiplies two @var{n}-bit numbers
 279 yielding another @var{n}-bit number, it is irrelevant whether the
 280 numbers are signed or not; the results are the same.
 281
 282 @item @code{div_signed} (0x05): @var{a} @var{b} @result{} @var{a/b}
 283 Pop two signed integers from the stack; divide the next-to-top value by
 284 the top value, and push the quotient.  If the divisor is zero, terminate
 285 with an error.
 286
 287 @item @code{div_unsigned} (0x06): @var{a} @var{b} @result{} @var{a/b}
 288 Pop two unsigned integers from the stack; divide the next-to-top value
 289 by the top value, and push the quotient.  If the divisor is zero,
 290 terminate with an error.
 291
 292 @item @code{rem_signed} (0x07): @var{a} @var{b} @result{} @var{a modulo b}
 293 Pop two signed integers from the stack; divide the next-to-top value by
 294 the top value, and push the remainder.  If the divisor is zero,
 295 terminate with an error.
 296
 297 @item @code{rem_unsigned} (0x08): @var{a} @var{b} @result{} @var{a modulo b}
 298 Pop two unsigned integers from the stack; divide the next-to-top value
 299 by the top value, and push the remainder.  If the divisor is zero,
 300 terminate with an error.
 301
 302 @item @code{lsh} (0x09): @var{a} @var{b} @result{} @var{a<<b}
 303 Pop two integers from the stack; let @var{a} be the next-to-top value,
 304 and @var{b} be the top value.  Shift @var{a} left by @var{b} bits, and
 305 push the result.
 306
 307 @item @code{rsh_signed} (0x0a): @var{a} @var{b} @result{} @code{(signed)}@var{a>>b}
 308 Pop two integers from the stack; let @var{a} be the next-to-top value,
 309 and @var{b} be the top value.  Shift @var{a} right by @var{b} bits,
 310 inserting copies of the top bit at the high end, and push the result.
 311
 312 @item @code{rsh_unsigned} (0x0b): @var{a} @var{b} @result{} @var{a>>b}
 313 Pop two integers from the stack; let @var{a} be the next-to-top value,
 314 and @var{b} be the top value.  Shift @var{a} right by @var{b} bits,
 315 inserting zero bits at the high end, and push the result.
 316
 317 @item @code{log_not} (0x0e): @var{a} @result{} @var{!a}
 318 Pop an integer from the stack; if it is zero, push the value one;
 319 otherwise, push the value zero.
 320
 321 @item @code{bit_and} (0x0f): @var{a} @var{b} @result{} @var{a&b}
 322 Pop two integers from the stack, and push their bitwise @code{and}.
 323
 324 @item @code{bit_or} (0x10): @var{a} @var{b} @result{} @var{a|b}
 325 Pop two integers from the stack, and push their bitwise @code{or}.
 326
 327 @item @code{bit_xor} (0x11): @var{a} @var{b} @result{} @var{a^b}
 328 Pop two integers from the stack, and push their bitwise
 329 exclusive-@code{or}.
 330
 331 @item @code{bit_not} (0x12): @var{a} @result{} @var{~a}
 332 Pop an integer from the stack, and push its bitwise complement.
 333
 334 @item @code{equal} (0x13): @var{a} @var{b} @result{} @var{a=b}
 335 Pop two integers from the stack; if they are equal, push the value one;
 336 otherwise, push the value zero.
 337
 338 @item @code{less_signed} (0x14): @var{a} @var{b} @result{} @var{a<b}
 339 Pop two signed integers from the stack; if the next-to-top value is less
 340 than the top value, push the value one; otherwise, push the value zero.
 341
 342 @item @code{less_unsigned} (0x15): @var{a} @var{b} @result{} @var{a<b}
 343 Pop two unsigned integers from the stack; if the next-to-top value is less
 344 than the top value, push the value one; otherwise, push the value zero.
 345
 346 @item @code{ext} (0x16) @var{n}: @var{a} @result{} @var{a}, sign-extended from @var{n} bits
 347 Pop an unsigned value from the stack; treating it as an @var{n}-bit
 348 twos-complement value, extend it to full length.  This means that all
 349 bits to the left of bit @var{n-1} (where the least significant bit is bit
 350 0) are set to the value of bit @var{n-1}.  Note that @var{n} may be
 351 larger than or equal to the width of the stack elements of the bytecode
 352 engine; in this case, the bytecode should have no effect.
 353
 354 The number of source bits to preserve, @var{n}, is encoded as a single
 355 byte unsigned integer following the @code{ext} bytecode.
 356
 357 @item @code{zero_ext} (0x2a) @var{n}: @var{a} @result{} @var{a}, zero-extended from @var{n} bits
 358 Pop an unsigned value from the stack; zero all but the bottom @var{n}
 359 bits.  This means that all bits to the left of bit @var{n-1} (where the
 360 least significant bit is bit 0) are set to zero.
 361
 362 The number of source bits to preserve, @var{n}, is encoded as a single
 363 byte unsigned integer following the @code{zero_ext} bytecode.
 364
 365 @item @code{ref8} (0x17): @var{addr} @result{} @var{a}
 366 @itemx @code{ref16} (0x18): @var{addr} @result{} @var{a}
 367 @itemx @code{ref32} (0x19): @var{addr} @result{} @var{a}
 368 @itemx @code{ref64} (0x1a): @var{addr} @result{} @var{a}
 369 Pop an address @var{addr} from the stack.  For bytecode
 370 @code{ref}@var{n}, fetch an @var{n}-bit value from @var{addr}, using the
 371 natural target endianness.  Push the fetched value as an unsigned
 372 integer.
 373
 374 Note that @var{addr} may not be aligned in any particular way; the
 375 @code{ref@var{n}} bytecodes should operate correctly for any address.
 376
 377 If attempting to access memory at @var{addr} would cause a processor
 378 exception of some sort, terminate with an error.
 379
 380 @item @code{ref_float} (0x1b): @var{addr} @result{} @var{d}
 381 @itemx @code{ref_double} (0x1c): @var{addr} @result{} @var{d}
 382 @itemx @code{ref_long_double} (0x1d): @var{addr} @result{} @var{d}
 383 @itemx @code{l_to_d} (0x1e): @var{a} @result{} @var{d}
 384 @itemx @code{d_to_l} (0x1f): @var{d} @result{} @var{a}
 385 Not implemented yet.
 386
 387 @item @code{dup} (0x28): @var{a} @result{} @var{a} @var{a}
 388 Push another copy of the stack's top element.
 389
 390 @item @code{swap} (0x2b): @var{a} @var{b} @result{} @var{b} @var{a}
 391 Exchange the top two items on the stack.
 392
 393 @item @code{pop} (0x29): @var{a} @result{}
 394 Discard the top value on the stack.
 395
 396 @item @code{if_goto} (0x20) @var{offset}: @var{a} @result{}
 397 Pop an integer off the stack; if it is non-zero, branch to the given
 398 offset in the bytecode string.  Otherwise, continue to the next
 399 instruction in the bytecode stream.  In other words, if @var{a} is
 400 non-zero, set the @code{pc} register to @code{start} + @var{offset}.
 401 Thus, an offset of zero denotes the beginning of the expression.
 402
 403 The @var{offset} is a sixteen-bit unsigned value stored immediately
 404 following the @code{if_goto} bytecode.  It is always stored most
 405 significant byte first, regardless of the target's normal endianness.
 406 The offset is not guaranteed to fall at any particular alignment within
 407 the bytecode stream; thus, on machines where fetching a 16-bit value
 408 from an unaligned address raises an exception, you should fetch the
 409 offset one byte at a time.
 410
 411 @item @code{goto} (0x21) @var{offset}: @result{}
 412 Branch unconditionally to @var{offset}; in other words, set the
 413 @code{pc} register to @code{start} + @var{offset}.
 414
 415 The offset is stored in the same way as for the @code{if_goto} bytecode.
 416
 417 @item @code{const8} (0x22) @var{n}: @result{} @var{n}
 418 @itemx @code{const16} (0x23) @var{n}: @result{} @var{n}
 419 @itemx @code{const32} (0x24) @var{n}: @result{} @var{n}
 420 @itemx @code{const64} (0x25) @var{n}: @result{} @var{n}
 421 Push the integer constant @var{n} on the stack, without sign extension.
 422 To produce a small negative value, push a small twos-complement value,
 423 and then sign-extend it using the @code{ext} bytecode.
 424
 425 The constant @var{n} is stored in the appropriate number of bytes
 426 following the @code{const}@var{b} bytecode.  The constant @var{n} is
 427 always stored most significant byte first, regardless of the target's
 428 normal endianness.  The constant is not guaranteed to fall at any
 429 particular alignment within the bytecode stream; thus, on machines where
 430 fetching a multi-byte value from an unaligned address raises an exception, you
 431 should fetch @var{n} one byte at a time.
 432
 433 @item @code{reg} (0x26) @var{n}: @result{} @var{a}
 434 Push the value of register number @var{n}, without sign extension.  The
 435 registers are numbered following GDB's conventions.
 436
 437 The register number @var{n} is encoded as a 16-bit unsigned integer
 438 immediately following the @code{reg} bytecode.  It is always stored most
 439 significant byte first, regardless of the target's normal endianness.
 440 The register number is not guaranteed to fall at any particular
 441 alignment within the bytecode stream; thus, on machines where fetching a
 442 16-bit value from an unaligned address raises an exception, you should fetch the
 443 register number one byte at a time.
 444
 445 @item @code{trace} (0x0c): @var{addr} @var{size} @result{}
 446 Record the contents of the @var{size} bytes at @var{addr} in a trace
 447 buffer, for later retrieval by GDB.
 448
 449 @item @code{trace_quick} (0x0d) @var{size}: @var{addr} @result{} @var{addr}
 450 Record the contents of the @var{size} bytes at @var{addr} in a trace
 451 buffer, for later retrieval by GDB.  @var{size} is a single byte
 452 unsigned integer following the @code{trace_quick} opcode.
 453
 454 This bytecode is equivalent to the sequence @code{dup const8 @var{size}
 455 trace}, but we provide it anyway to save space in bytecode strings.
 456
 457 @item @code{trace16} (0x30) @var{size}: @var{addr} @result{} @var{addr}
 458 Identical to @code{trace_quick}, except that @var{size} is a 16-bit big-endian
 459 unsigned integer, not a single byte.  This should probably have been
 460 named @code{trace_quick16}, for consistency.
 461
 462 @item @code{end} (0x27): @result{}
 463 Stop executing bytecode; the result should be the top element of the
 464 stack.  If the purpose of the expression was to compute an lvalue or a
 465 range of memory, then the next-to-top of the stack is the lvalue's
 466 address, and the top of the stack is the lvalue's size, in bytes.
 467
 468 @end table
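
Several bytecodes above (@code{if_goto}, @code{goto}, @code{const}@var{n},
@code{reg}, @code{trace16}) carry operands that are stored most
significant byte first and may be unaligned.  The following sketch shows
one way to read such operands a byte at a time, and how the branch target
of @code{if_goto} follows from the @code{start} register.  The function
names are hypothetical, and @code{pc} is represented here as an offset
from @code{start}, not as an address.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an NBYTES-wide big-endian operand (1 to 8 bytes) one byte at a
   time, so that alignment and host endianness do not matter.  */
static uint64_t
fetch_be (const uint8_t *p, size_t nbytes)
{
  uint64_t v = 0;
  for (size_t i = 0; i < nbytes; i++)
    v = (v << 8) | p[i];
  return v;
}

/* Given an if_goto opcode at offset PC from START and the popped value A,
   return the offset of the next bytecode to execute: the 16-bit operand
   if A is non-zero, otherwise the instruction after the operand.  */
static size_t
if_goto_next (const uint8_t *start, size_t pc, uint64_t a)
{
  size_t offset = (size_t) fetch_be (start + pc + 1, 2);
  return a != 0 ? offset : pc + 3;
}
```

An unconditional @code{goto} is the same computation with the condition
always taken.  A real agent would also check that the resulting offset
lies within the expression before continuing.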
 469
 470
 471 @node Using Agent Expressions
 472 @section Using Agent Expressions
 473
 474 Agent expressions can be used in several different ways by @value{GDBN},
 475 and the debugger can generate different bytecode sequences as appropriate.
 476
 477 One possibility is to do expression evaluation on the target rather
 478 than the host, such as for the conditional of a conditional
 479 tracepoint.  In such a case, @value{GDBN} compiles the source
 480 expression into a bytecode sequence that simply gets values from
 481 registers or memory, does arithmetic, and returns a result.
 482
 483 Another way to use agent expressions is for tracepoint data
 484 collection.  @value{GDBN} generates a different bytecode sequence for
 485 collection; in addition to bytecodes that do the calculation,
 486 @value{GDBN} adds @code{trace} bytecodes to save the pieces of
 487 memory that were used.
 488
 489 @itemize @bullet
 490
 491 @item
 492 The user selects trace points in the program's code at which GDB should
 493 collect data.
 494
 495 @item
 496 The user specifies expressions to evaluate at each trace point.  These
 497 expressions may denote objects in memory, in which case those objects'
 498 contents are recorded as the program runs, or computed values, in which
 499 case the values themselves are recorded.
 500
 501 @item
 502 GDB transmits the tracepoints and their associated expressions to the
 503 GDB agent, running on the debugging target.
 504
 505 @item
 506 The agent arranges to be notified when a trace point is hit.  Note that,
 507 on some systems, the target operating system is completely responsible
 508 for collecting the data; see @ref{Tracing on Symmetrix}.
 509
 510 @item
 511 When execution on the target reaches a trace point, the agent evaluates
 512 the expressions associated with that trace point, and records the
 513 resulting values and memory ranges.
 514
 515 @item
 516 Later, when the user selects a given trace event and inspects the
 517 objects and expression values recorded, GDB talks to the agent to
 518 retrieve recorded data as necessary to meet the user's requests.  If the
 519 user asks to see an object whose contents have not been recorded, GDB
 520 reports an error.
 521
 522 @end itemize
 523
 524
 525 @node Varying Target Capabilities
 526 @section Varying Target Capabilities
 527
 528 Some targets don't support floating-point, and some would rather not
 529 have to deal with @code{long long} operations.  Also, different targets
 530 will have different stack sizes, and different bytecode buffer lengths.
 531
 532 Thus, GDB needs a way to ask the target about itself.  We haven't worked
 533 out the details yet, but in general, GDB should be able to send the
 534 target a packet asking it to describe itself.  The reply should be a
 535 packet whose length is explicit, so we can add new information to the
 536 packet in future revisions of the agent, without confusing old versions
 537 of GDB, and it should contain a version number.  It should contain at
 538 least the following information:
 539
 540 @itemize @bullet
 541
 542 @item
 543 whether floating point is supported
 544
 545 @item
 546 whether @code{long long} is supported
 547
 548 @item
 549 maximum acceptable size of bytecode stack
 550
 551 @item
 552 maximum acceptable length of bytecode expressions
 553
 554 @item
 555 which registers are actually available for collection
 556
 557 @item
 558 whether the target supports disabled tracepoints
 559
 560 @end itemize
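
Since the packet format is not yet defined, the following is only a
sketch of how GDB might hold such a self-description once decoded, and
how it could reject an expression at translation time.  Every name here
(@code{agent_caps}, @code{expr_needs}, @code{agent_can_run}) is
hypothetical; the fields simply mirror the list above, and the list of
collectable registers is omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded form of the target's self-description.  */
struct agent_caps
{
  unsigned version;            /* Version of the description format.  */
  bool has_float;              /* Floating point supported?  */
  bool has_long_long;          /* long long supported?  */
  size_t max_stack_depth;      /* Maximum bytecode stack size.  */
  size_t max_expr_len;         /* Maximum bytecode expression length.  */
  bool disabled_tracepoints;   /* Disabled tracepoints supported?  */
};

/* Hypothetical summary of what one compiled expression requires.  */
struct expr_needs
{
  bool uses_float;
  bool uses_long_long;
  size_t stack_depth;
  size_t expr_len;
};

/* Return true if the target described by CAPS can run expression E.  */
static bool
agent_can_run (const struct agent_caps *caps, const struct expr_needs *e)
{
  if (e->uses_float && !caps->has_float)
    return false;
  if (e->uses_long_long && !caps->has_long_long)
    return false;
  return e->stack_depth <= caps->max_stack_depth
         && e->expr_len <= caps->max_expr_len;
}
```

Checking these limits on the host lets the user see an error when the
tracepoint is defined, before anything is sent to the target.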
 561
 562
 563
 564 @node Tracing on Symmetrix
 565 @section Tracing on Symmetrix
 566
 567 This section documents the API used by the GDB agent to collect data on
 568 Symmetrix systems.
 569
 570 Cygnus originally implemented these tracing features to help EMC
 571 Corporation debug their Symmetrix high-availability disk drives.  The
 572 Symmetrix application code already includes substantial tracing
 573 facilities; the GDB agent for the Symmetrix system uses those facilities
 574 for its own data collection, via the API described here.
 575
 576 @deftypefn Function DTC_RESPONSE adbg_find_memory_in_frame (FRAME_DEF *@var{frame}, char *@var{address}, char **@var{buffer}, unsigned int *@var{size})
 577 Search the trace frame @var{frame} for memory saved from @var{address}.
 578 If the memory is available, provide the address of the buffer holding
 579 it; otherwise, provide the address of the next saved area.
 580
 581 @itemize @bullet
 582
 583 @item
 584 If the memory at @var{address} was saved in @var{frame}, set
 585 @code{*@var{buffer}} to point to the buffer in which that memory was
 586 saved, set @code{*@var{size}} to the number of bytes from @var{address}
 587 that are saved at @code{*@var{buffer}}, and return
 588 @code{OK_TARGET_RESPONSE}.  (Clearly, in this case, the function will
 589 always set @code{*@var{size}} to a value greater than zero.)
 590
 591 @item
 592 If @var{frame} does not record any memory at @var{address}, set
 593 @code{*@var{size}} to the distance from @var{address} to the start of
 594 the saved region with the lowest address higher than @var{address}.  If
 595 there is no memory saved from any higher address, set @code{*@var{size}}
 596 to zero.  Return @code{NOT_FOUND_TARGET_RESPONSE}.
 597 @end itemize
 598
 599 These two possibilities allow the caller to either retrieve the data, or
 600 walk the address space to the next saved area.
 601 @end deftypefn
 602
 603 This function allows the GDB agent to map the regions of memory saved in
 604 a particular frame, and retrieve their contents efficiently.
 605
 606 This function also provides a clean interface between the GDB agent and
 607 the Symmetrix tracing structures, making it easier to adapt the GDB
 608 agent to future versions of the Symmetrix system, and vice versa.  This
 609 function searches all data saved in @var{frame}, whether the data is
 610 there at the request of a bytecode expression, or because it falls in
 611 one of the format's memory ranges, or because it was saved from the top
 612 of the stack.  EMC can arbitrarily change and enhance the tracing
 613 mechanism, but as long as this function works properly, all collected
 614 memory is visible to GDB.
 615
 616 The function itself is straightforward to implement.  A single pass over
 617 the trace frame's stack area, memory ranges, and expression blocks can
 618 yield the address of the buffer (if the requested address was saved),
 619 and also note the address of the next higher range of memory, to be
 620 returned when the search fails.
 621
 622 As an example, suppose the trace frame @code{f} has saved sixteen bytes
 623 from address @code{0x8000} in a buffer at @code{0x1000}, and thirty-two
 624 bytes from address @code{0xc000} in a buffer at @code{0x1010}.  Here are
 625 some sample calls, and the effect each would have:
 626
 627 @table @code
 628
 629 @item adbg_find_memory_in_frame (f, (char *) 0x8000, &buffer, &size)
 630 This would set @code{buffer} to @code{0x1000}, set @code{size} to
 631 sixteen, and return @code{OK_TARGET_RESPONSE}, since @code{f} saves
 632 sixteen bytes from @code{0x8000} at @code{0x1000}.
 633
 634 @item adbg_find_memory_in_frame (f, (char *) 0x8004, &buffer, &size)
 635 This would set @code{buffer} to @code{0x1004}, set @code{size} to
 636 twelve, and return @code{OK_TARGET_RESPONSE}, since @code{f} saves the
 637 twelve bytes from @code{0x8004} starting four bytes into the buffer at
 638 @code{0x1000}.  This shows that request addresses may fall in the middle
 639 of saved areas; the function should return the address and size of the
 640 remainder of the buffer.
 641
 642 @item adbg_find_memory_in_frame (f, (char *) 0x8100, &buffer, &size)
 643 This would set @code{size} to @code{0x3f00} and return
 644 @code{NOT_FOUND_TARGET_RESPONSE}, since there is no memory saved in
 645 @code{f} from the address @code{0x8100}, and the next memory available
 646 is at @code{0x8100 + 0x3f00}, or @code{0xc000}.  This shows that request
 647 addresses may fall outside of all saved memory ranges; the function
 648 should indicate the next saved area, if any.
 649
 650 @item adbg_find_memory_in_frame (f, (char *) 0x7000, &buffer, &size)
 651 This would set @code{size} to @code{0x1000} and return
 652 @code{NOT_FOUND_TARGET_RESPONSE}, since the next saved memory is at
 653 @code{0x7000 + 0x1000}, or @code{0x8000}.
 654
 655 @item adbg_find_memory_in_frame (f, (char *) 0xf000, &buffer, &size)
 656 This would set @code{size} to zero, and return
 657 @code{NOT_FOUND_TARGET_RESPONSE}.  This shows how the function tells the
 658 caller that no further memory ranges have been saved.
 659
 660 @end table
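
The behavior shown in these sample calls can be reproduced by a single
pass over a list of saved ranges.  The sketch below is a stand-in, not
the Symmetrix implementation: the real @code{FRAME_DEF} and
@code{DTC_RESPONSE} definitions are not given in this document, so the
types @code{toy_frame}, @code{toy_range} and @code{toy_response}, and the
function @code{toy_find_memory_in_frame}, are hypothetical substitutes
that model a frame as a plain array of ranges.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { TOY_OK, TOY_NOT_FOUND } toy_response;

/* One saved region: LEN bytes from target address ADDR, held at BUF.  */
struct toy_range
{
  uintptr_t addr;
  unsigned int len;
  char *buf;
};

/* A frame is modeled as an unordered array of saved regions.  */
struct toy_frame
{
  const struct toy_range *ranges;
  size_t count;
};

/* Search FRAME for memory saved from ADDRESS, following the contract
   described above for adbg_find_memory_in_frame.  */
static toy_response
toy_find_memory_in_frame (const struct toy_frame *frame, char *address,
                          char **buffer, unsigned int *size)
{
  uintptr_t want = (uintptr_t) address;
  uintptr_t next = 0;          /* Lowest saved address above WANT.  */
  int have_next = 0;

  for (size_t i = 0; i < frame->count; i++)
    {
      const struct toy_range *r = &frame->ranges[i];
      if (want >= r->addr && want - r->addr < r->len)
        {
          /* ADDRESS falls inside this region: report the remainder.  */
          unsigned int skip = (unsigned int) (want - r->addr);
          *buffer = r->buf + skip;
          *size = r->len - skip;
          return TOY_OK;
        }
      if (r->addr > want && (!have_next || r->addr < next))
        {
          next = r->addr;
          have_next = 1;
        }
    }

  /* Not saved: report the distance to the next region, or zero.  */
  *size = have_next ? (unsigned int) (next - want) : 0;
  return TOY_NOT_FOUND;
}
```

Because the search remembers the closest higher region as it goes, the
order in which ranges are stored in the frame does not affect the
result.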
 661
 662 As another example, here is a function which will print out the
 663 addresses of all memory saved in the trace frame @code{frame} on the
 664 Symmetrix INLINES console:
 665 @example
 666 void
 667 print_frame_addresses (FRAME_DEF *frame)
 668 @{
 669   char *addr;
 670   char *buffer;
 671   unsigned int size;
 672
 673   addr = 0;
 674   for (;;)
 675     @{
 676       /* Either find out how much memory we have here, or discover
 677          where the next saved region is.  */
 678       if (adbg_find_memory_in_frame (frame, addr, &buffer, &size)
 679           == OK_TARGET_RESPONSE)
 680         printp ("saved %x to %x\n", addr, addr + size);
 681       if (size == 0)
 682         break;
 683       addr += size;
 684     @}
 685 @}
 686 @end example
 687
 688 Note that there is not necessarily any connection between the order in
 689 which the data is saved in the trace frame, and the order in which
 690 @code{adbg_find_memory_in_frame} will return those memory ranges.  The
 691 code above will always print the saved memory regions in order of
 692 increasing address, while the underlying frame structure might store the
 693 data in an arbitrary order.
 694
 695 [[This section should cover the rest of the Symmetrix functions the stub
 696 relies upon, too.]]
 697
 698 @node Rationale
 699 @section Rationale
 700
 701 Some of the design decisions apparent above are arguable.
 702
 703 @table @b
 704
 705 @item What about stack overflow/underflow?
 706 GDB should be able to query the target to discover its stack size.
 707 Given that information, GDB can determine at translation time whether a
 708 given expression will overflow the stack.  But this spec isn't about
 709 what kinds of error-checking GDB ought to do.
 710
 711 @item Why are you doing everything in LONGEST?
 712
 713 Speed isn't important, but agent code size is; using LONGEST brings in
 714 support code for operations such as division.  So this is a
 715 serious concern.
 716
 717 First, note that you don't need different bytecodes for different
 718 operand sizes.  You can generate code without @emph{knowing} how big the
 719 stack elements actually are on the target.  If the target only supports
 720 32-bit ints, and you don't send any 64-bit bytecodes, everything just
 721 works.  The observation here is that the MIPS and the Alpha have only
 722 fixed-size registers, and you can still get C's semantics even though
 723 most instructions only operate on full-sized words.  You just need to
 724 make sure everything is properly sign-extended at the right times.  So
 725 there is no need for 32- and 64-bit variants of the bytecodes.  Just
 726 implement everything using the largest size you support.
 727
 728 GDB should certainly check to see what sizes the target supports, so the
 729 user can get an error earlier, rather than later.  But this information
 730 is not necessary for correctness.
 731
 732
 733 @item Why don't you have @code{>} or @code{<=} operators?
 734 I want to keep the interpreter small, and we don't need them.  We can
 735 combine the @code{less_} opcodes with @code{log_not}, and swap the order
 736 of the operands, yielding all four asymmetrical comparison operators.
 737 For example, @code{(x <= y)} is @code{! (x > y)}, which is @code{! (y <
 738 x)}.
 739
 740 @item Why do you have @code{log_not}?
 741 @itemx Why do you have @code{ext}?
 742 @itemx Why do you have @code{zero_ext}?
 743 These are all easily synthesized from other instructions, but I expect
 744 them to be used frequently, and they're simple, so I include them to
 745 keep bytecode strings short.
 746
 747 @code{log_not} is equivalent to @code{const8 0 equal}; it's used in half
 748 the relational operators.
 749
 750 @code{ext @var{n}} is equivalent to @code{const8 @var{s-n} lsh const8
 751 @var{s-n} rsh_signed}, where @var{s} is the size of the stack elements
 752 in bits; it follows @code{ref@var{m}} and @code{reg} bytecodes when the
 753 value should be signed.  See the next item.
 754
 755 @code{zero_ext @var{n}} is equivalent to @code{const@var{m} @var{mask}
 756 bit_and}; it's used whenever we push the value of a register, because we
 757 can't assume the upper bits of the register aren't garbage.
 758
 759 @item Why not have sign-extending variants of the @code{ref} operators?
 760 Because that would double the number of @code{ref} operators, and we
 761 need the @code{ext} bytecode anyway for accessing bitfields.
 762
 763 @item Why not have constant-address variants of the @code{ref} operators?
 764 Because that would double the number of @code{ref} operators again, and
 765 @code{const32 @var{address} ref32} is only one byte longer.
 766
 767 @item Why do the @code{ref@var{n}} operators have to support unaligned fetches?
 768 GDB will generate bytecode that fetches multi-byte values at unaligned
 769 addresses whenever the executable's debugging information tells it to.
 770 Furthermore, GDB does not know the value the pointer will have when GDB
 771 generates the bytecode, so it cannot determine whether a particular
 772 fetch will be aligned or not.
 773
 774 In particular, structure bitfields may be several bytes long, but follow
 775 no alignment rules; members of packed structures are not necessarily
 776 aligned either.
 777
 778 In general, there are many cases where unaligned references occur in
 779 correct C code, either at the programmer's explicit request, or at the
 780 compiler's discretion.  Thus, it is simpler to make the GDB agent
 781 bytecodes work correctly in all circumstances than to make GDB guess in
 782 each case whether the compiler did the usual thing.
 783
 784 @item Why are there no side-effecting operators?
 785 Because our current client doesn't want them?  That's a cheap answer.  I
 786 think the real answer is that I'm afraid of implementing function
 787 calls.  We should re-visit this issue after the present contract is
 788 delivered.
 789
 790 @item Why aren't the @code{goto} ops PC-relative?
 791 The interpreter has the base address around anyway for PC bounds
 792 checking, and it seemed simpler.
 793
 794 @item Why is there only one offset size for the @code{goto} ops?
 795 Offsets are currently sixteen bits.  I'm not happy with this situation
 796 either:
 797
 798 Suppose we have multiple branch ops with different offset sizes.  As I
 799 generate code left-to-right, all my jumps are forward jumps (there are
 800 no loops in expressions), so I never know the target when I emit the
 801 jump opcode.  Thus, I have to either always assume the largest offset
 802 size, or do jump relaxation on the code after I generate it, which seems
 803 like a big waste of time.
 804
 805 I can imagine a reasonable expression being longer than 256 bytes.  I
 806 can't imagine one being longer than 64k.  Thus, we need 16-bit offsets.
 807 This kind of reasoning is so bogus, but relaxation is pathetic.
 808
 809 The other approach would be to generate code right-to-left.  Then I'd
 810 always know my offset size.  That might be fun.
 811
 812 @item Where is the function call bytecode?
 813
 814 When we add side-effects, we should add this.
 815
 816 @item Why does the @code{reg} bytecode take a 16-bit register number?
 817
 818 Intel's IA-64 architecture has 128 general-purpose registers,
 819 and 128 floating-point registers, and I'm sure it has some random
 820 control registers, so an 8-bit register number would not be enough.
 821
 822 @item Why do we need @code{trace} and @code{trace_quick}?
 823 Because GDB needs to record all the memory contents and registers an
 824 expression touches.  If the user wants to evaluate an expression
 825 @code{x->y->z}, the agent must record the values of @code{x} and
 826 @code{x->y} as well as the value of @code{x->y->z}.
 827
 828 @item Don't the @code{trace} bytecodes make the interpreter less general?
 829 They do mean that the interpreter contains special-purpose code, but
 830 that doesn't mean the interpreter can only be used for that purpose.  If
 831 an expression doesn't use the @code{trace} bytecodes, they don't get in
 832 its way.
 833
 834 @item Why doesn't @code{trace_quick} consume its arguments the way everything else does?
 835 In general, you do want your operators to consume their arguments; it's
 836 consistent, and generally reduces the amount of stack rearrangement
 837 necessary.  However, @code{trace_quick} is a kludge to save space; it
 838 only exists so we needn't write @code{dup const8 @var{size} trace}
 839 before every memory reference.  Therefore, it's okay for it not to
 840 consume its arguments; it's meant for a specific context in which we
 841 know exactly what it should do with the stack.  If we're going to have a
 842 kludge, it should be an effective kludge.
 843
 844 @item Why does @code{trace16} exist?
 845 That opcode was added by the customer who contracted Cygnus for the
 846 data tracing work.  I personally think it is unnecessary; objects that
 847 large will be quite rare, so it is okay to use @code{dup const16
 848 @var{size} trace} in those cases.
 849
 850 Whatever we decide to do with @code{trace16}, we should at least leave
 851 opcode 0x30 reserved, to remain compatible with the customer who added
 852 it.
 853
 854 @end table
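
The equivalences claimed above for @code{ext}, @code{zero_ext},
@code{log_not} and the omitted comparison operators can be checked with
a small C model.  This is only an illustrative sketch: it assumes 64-bit
stack elements, and every function name in it is invented here rather
than taken from any agent source.  @code{ext} is written as a left
shift followed by a sign-propagating right shift, @code{zero_ext} as a
bitwise AND with a mask, and @code{x <= y} as the negation of
@code{y < x}.
@example
#include <stdint.h>

#define STACK_BITS 64                 /* assumed element size s */
#define SIGN_BIT (UINT64_C (1) << 63)

/* rsh_signed: shift right by K (0..63), copying the sign bit in.  */
static uint64_t
rsh_signed (uint64_t v, unsigned k)
@{
  uint64_t r = v >> k;

  if ((v & SIGN_BIT) && k > 0)
    r |= ~UINT64_C (0) << (STACK_BITS - k);
  return r;
@}

/* ext n: const8 s-n lsh const8 s-n rsh_signed, for 1 <= N <= 64.  */
static uint64_t
op_ext (uint64_t v, unsigned n)
@{
  unsigned shift = STACK_BITS - n;

  return rsh_signed (v << shift, shift);
@}

/* zero_ext n: keep the low N bits by masking, for 1 <= N <= 64.  */
static uint64_t
op_zero_ext (uint64_t v, unsigned n)
@{
  uint64_t mask = n >= STACK_BITS ? ~UINT64_C (0)
                                  : (UINT64_C (1) << n) - 1;

  return v & mask;
@}

/* log_not: const8 0 equal.  */
static uint64_t
op_log_not (uint64_t v)
@{
  return v == 0;
@}

/* less_signed: flipping the sign bit maps signed order onto
   unsigned order.  */
static uint64_t
op_less_signed (uint64_t a, uint64_t b)
@{
  return (a ^ SIGN_BIT) < (b ^ SIGN_BIT);
@}

/* x <= y, synthesized as ! (y < x).  */
static uint64_t
le_signed (uint64_t x, uint64_t y)
@{
  return op_log_not (op_less_signed (y, x));
@}
@end example

The remaining operators follow the same pattern: @code{x > y} is
@code{op_less_signed (y, x)}, and @code{x >= y} is the negation of
@code{op_less_signed (x, y)}.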