1 .\" Copyright (c) 1979 The Regents of the University of California.
2 .\" All rights reserved.
4 .\" Redistribution and use in source and binary forms, with or without
5 .\" modification, are permitted provided that the following conditions
7 .\" 1. Redistributions of source code must retain the above copyright
8 .\" notice, this list of conditions and the following disclaimer.
9 .\" 2. Redistributions in binary form must reproduce the above copyright
10 .\" notice, this list of conditions and the following disclaimer in the
11 .\" documentation and/or other materials provided with the distribution.
12 .\" 3. All advertising materials mentioning features or use of this software
13 .\" must display the following acknowledgement:
14 .\" This product includes software developed by the University of
15 .\" California, Berkeley and its contributors.
16 .\" 4. Neither the name of the University nor the names of its contributors
17 .\" may be used to endorse or promote products derived from this software
18 .\" without specific prior written permission.
20 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
21 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
23 .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
24 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
26 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
27 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
28 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
29 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
32 .\" @(#)pxin1.n 5.2 (Berkeley) 4/17/91
33 .\" $FreeBSD: src/share/doc/papers/px/pxin1.n,v 1.1.1.1.14.1 2000/11/30 17:13:59 ru Exp $
34 .\" $DragonFly: src/share/doc/papers/px/pxin1.n,v 1.2 2003/06/17 04:36:56 dillon Exp $
45 assembly language, using the
51 are also written in the
53 systems programming language C.
55 consists of a main procedure that reads in the interpreter code,
56 a main interpreter loop that transfers successively to various
57 code segments implementing the abstract machine operations,
58 built-in procedures and functions,
59 and several routines that support the implementation of the
60 Pascal input-output environment.
62 The interpreter runs at a fraction of the speed of equivalent
63 compiled C code, with this fraction varying from 1/5 to 1/15.
64 The interpreter occupies 18.5K bytes of instruction space, shared among
65 all processes executing Pascal, and has 4.6K bytes of data space (constants,
66 error messages, etc.) a copy of which is allocated to each executing process.
68 Format of the object file
71 normally interprets the code left in an object file by a run of the
74 The file where the translator puts the object originally, and the most
75 commonly interpreted file, is called
77 In order that all persons using
79 share a common text image, this executable file is
80 a small process that coordinates with the interpreter to start
82 The interpreter code is placed
83 at the end of a special ``header'' file and the size of the initialized
84 data area of this header file is expanded to include this code,
85 so that during execution it is located at an
86 easily determined address in its data space.
87 When executed, the object process creates a
89 creates another process by doing a
91 and arranges that the resulting parent process becomes an instance of
93 The child process then writes the interpreter code through
94 the pipe that it has to the
95 interpreter process parent.
96 When this process is complete, the child exits.
98 The real advantage of this approach is that it does not require modifications
99 to the shell, and that the resultant objects are ``true objects'' not
100 requiring special treatment.
101 A simpler mechanism would be to determine the name of the file that was
102 executed and pass this to the interpreter.
103 However it is not possible to determine this name
106 \*(dd\ For instance, if the
108 program is placed in the directory
110 then when the user types
112 the first argument to the program, nominally the programs name, is
114 While it would be possible to search in the standard place,
115 i.e. the current directory, and the system directories
119 for a corresponding object file,
120 this would be expensive and not guaranteed to succeed.
121 Several shells exist that allow other directories to be searched
122 for commands, and there is,
124 no way to determine what these directories are.
127 General features of object code
129 Pascal object code is relocatable as all addressing references for
130 control transfers within the code are relative.
131 The code consists of instructions interspersed with inline data.
132 All instructions have a length that is an even number of bytes.
133 No variables are kept in the object code area.
135 The first byte of a Pascal interpreter instruction contains an operation
137 This allows a total of 256 major operation codes, and 232 of these are
138 in use in the current
140 The second byte of each interpreter instruction is called the
141 ``sub-operation code'',
144 It contains a small integer that may, for example, be used as a
145 block-structure level for the associated operation.
146 If the instruction can take a longword constant,
147 this constant is often packed into the sub-opcode
148 if it fits into 8 bits and is not zero.
149 A sub-opcode value of zero specifies that the constant would not
150 fit and therefore follows in the next word.
151 This is a space optimization, the value of zero for flagging
152 the longer case being convenient because it is easy to test.
154 Other instruction formats are used.
156 instructions take an offset in the following word,
157 operators that load constants onto the stack
158 take arbitrarily long inline constant values,
159 and many operations deal exclusively with data on the
160 interpreter stack, requiring no inline data.
162 Stack structure of the interpreter
164 The interpreter emulates a stack-structured Pascal machine.
165 The ``load'' instructions put values onto the stack, where all
166 arithmetic operations take place.
167 The ``store'' instructions take values off the stack
168 and place them in an address that is also contained on the stack.
169 The only way to move data or to compute in the machine is with the stack.
171 To make the interpreter operations more powerful
172 and to thereby increase the interpreter speed,
173 the arithmetic operations in the interpreter are ``typed''.
174 That is, length conversion of arithmetic values occurs when they are
175 used in an operation.
176 This eliminates interpreter cycles for length conversion
177 and the associated overhead.
178 For example, when adding an integer that fits in one byte to one that
179 requires four bytes to store, no ``conversion'' operators are required.
180 The one byte integer is loaded onto the stack, followed by the four
181 byte integer, and then an adding operator is used that has, implicit
182 in its definition, the sizes of the arguments.
184 Data types in the interpreter
186 The interpreter deals with several different fundamental data types.
187 In the memory of the machine, 1, 2, and 4 byte integers are supported,
188 with only 2 and 4 byte integers being present on the stack.
189 The interpreter always converts to 4 byte integers when there is a possibility
190 of overflowing the shorter formats.
191 This corresponds to the Pascal language definition of overflow in
192 arithmetic operations that requires that the result be correct
193 if all partial values lie within the bounds of the base integer type:
194 4 byte integer values.
196 Character constants are treated similarly to 1 byte integers for
197 most purposes, as are Boolean values.
198 All enumerated types are treated as integer values of
199 an appropriate length, usually 1 byte.
200 The interpreter also has real numbers, occupying 8 bytes of storage,
201 and sets and strings of varying length.
202 The appropriate operations are included for each data type, such as
203 set union and intersection and an operation to write a string.
207 data formats are supported by the interpreter.
208 The smallest unit of storage occupied by any variable is one byte.
213 thus degenerate to simple memory to memory transfers with
214 no special processing.
218 The interpreter runtime environment uses a stack data area and a heap
219 data area, that are kept at opposite ends of memory
220 and grow towards each other.
221 All global variables and variables local to procedures and functions
222 are kept in the stack area.
223 Dynamically allocated variables and buffers for input/output are
224 allocated in the heap.
226 The addressing of block structured variables is done by using
228 that contains the address of its stack frame
229 for each statically active block.\*(Dg
231 \*(dg\ Here ``block'' is being used to mean any
236 This display is referenced by instructions that load and store
237 variables and maintained by the operations for
238 block entry and exit, and for non-local
244 Three ``global'' variables in the interpreter, in addition to the
252 is a pointer to the display entry for the current block;
255 is the abstract machine location counter;
258 is a register that holds the address of the main interpreter
259 loop so that returning to the loop to fetch the next instruction is
262 The stack frame structure
265 has a stack frame consisting of three parts:
266 a block mark, local variables, and temporary storage for partially
267 evaluated expressions.
268 The stack in the interpreter grows from the high addresses in memory
269 to the low addresses,
270 so that those parts of the stack frame that are ``on the top''
271 of the stack have the most negative offsets from the display
273 The major parts of the stack frame are represented in Figure 1.1.
275 Note that the local variables of each block
276 have negative offsets from the corresponding display entry,
277 the ``first'' local variable having offset `\-2'.
281 The block mark contains the saved information necessary
282 to restore the environment when the current block exits.
283 It consists of two parts.
284 The first and top-most part is saved by the
286 instruction in the interpreter.
287 This information is not present for the main program
288 as it is never ``called''.
289 The second part of the block mark is created by the
291 begin block operator that also allocates and clears the
292 local variable storage.
293 The format of these blocks is represented in Figure 1.2.
297 The data saved by the
299 operator includes the line number
301 of the point of call,
302 that is printed if the program execution ends abnormally;
305 giving the return address;
306 and the current display entry address
312 begin operator saves the previous display contents at the level
313 of this block, so that the display can be restored on block exit.
314 A pointer to the beginning line number and the
315 name of this block is also saved.
316 This information is stored in the interpreter object code in-line after the
319 It is used in printing a post-mortem backtrace.
320 The saved file name and buffer reference are necessary because of
321 the input/output structure
322 (this is discussed in detail in
323 sections 3.3 and 3.4).
324 The top of stack reference gives the value the stack pointer should
325 have when there are no expression temporaries on the stack.
326 It is used for a consistency check in the
328 line number operators in the interpreter, that occurs before
329 each statement executed.
330 This helps to catch bugs in the interpreter, that often manifest
331 themselves by leaving the stack non-empty between statements.
333 Note that there is no explicit static link here.
334 Thus to set up the display correctly after a non-local
336 statement one must ``unwind''
337 through all the block marks on the stack to rebuild the display.
339 Arguments and return values
341 A function returns its value into a space reserved by the calling
345 are placed on top of this return area.
350 calls, arguments are placed at the end of the expression evaluation area
354 completes, expression evaluation can continue
355 after popping the arguments to the
358 exactly as if the function value had been ``loaded''.
361 are also popped off the stack by the caller
362 after its execution ends.
365 As a simple example consider the following stack structure
366 for a call to a function
368 of the form ``f(a)''.
379 the calling sequence for this function would be:
390 Here we use the operator
392 to clear space for the return value,
395 on the stack with a ``right value'' operator,
399 and can then complete evaluation of the containing expression.
400 The operations used here will be explained in section 2.
406 10 \*bfunction\fR f(i: integer): real;
413 would have code sequence:
429 operator takes 9 bytes of inline data.
430 The first byte specifies the
431 length of the function name.
432 The second longword specifies the
433 amount of local variable storage, here none.
434 The succeeding two lines give the line number of the
436 and the name of the block
440 operator places a name pointer in the block mark.
443 first takes an address of the
447 using the address of operator
450 The next operation in the interpretation of this function is the loading
454 is at the level of the
459 and the first variable in the local variable area.
462 completes by assigning the 4 byte integer on the stack to the 8 byte
463 return location, hence the
465 assignment operator, and then uses the
467 operator to exit the current block.
469 The main interpreter loop
471 The main interpreter loop is simply:
475 \fBcaseb\fR (lc)+,$0,$255
476 <table of opcode interpreter addresses>
479 The main opcode is extracted from the first byte of the instruction
480 and used to index into the table of opcode interpreter addresses.
481 Control is then transferred to the specified location.
482 The sub-opcode may be used to index the display,
484 or to specify one of several relational operators.
485 In the cases where a constant is needed, but it
486 is not small enough to fit in the byte sub-operator,
487 a zero is placed there and the constant follows in the next word.
488 Zero is easily tested for,
489 as the instruction that fetches the
490 sub-opcode sets the condition code flags.
500 is all that is needed to effect this packing of data.
501 This technique saves space in the Pascal
505 The address of the instruction at
507 is always contained in the register variable
509 Thus a return to the main interpreter is simply:
513 that is both quick and occupies little space.
517 Errors during interpretation fall into three classes:
519 1) Interpreter detected errors.
520 2) Hardware detected errors.
524 Interpreter detected errors include I/O errors and
525 built-in function errors.
526 These errors cause a subroutine call to an error routine
527 with a single parameter indicating the cause of the error.
528 Hardware errors such as range errors and overflows are
529 fielded by a special routine that determines the opcode
530 that caused the error.
531 It then calls the error routine with an appropriate error
533 External events include interrupts and system limits such
535 They generate a call to the error routine with an
536 appropriate error code.
537 The error routine processes the error condition,
538 printing an appropriate error message and usually
539 a backtrace from the point of the error.