1 .\" Copyright (c) 1979 The Regents of the University of California.
2 .\" All rights reserved.
4 .\" Redistribution and use in source and binary forms, with or without
5 .\" modification, are permitted provided that the following conditions
7 .\" 1. Redistributions of source code must retain the above copyright
8 .\" notice, this list of conditions and the following disclaimer.
9 .\" 2. Redistributions in binary form must reproduce the above copyright
10 .\" notice, this list of conditions and the following disclaimer in the
11 .\" documentation and/or other materials provided with the distribution.
12 .\" 3. All advertising materials mentioning features or use of this software
13 .\" must display the following acknowledgement:
14 .\" This product includes software developed by the University of
15 .\" California, Berkeley and its contributors.
16 .\" 4. Neither the name of the University nor the names of its contributors
17 .\" may be used to endorse or promote products derived from this software
18 .\" without specific prior written permission.
20 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
21 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
23 .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
24 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
26 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
27 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
28 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
29 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
32 .\" @(#)pxin1.n 5.2 (Berkeley) 4/17/91
33 .\" $FreeBSD: src/share/doc/papers/px/pxin1.n,v 1.1.1.1.14.1 2000/11/30 17:13:59 ru Exp $
44 assembly language, using the
50 are also written in the
52 systems programming language C.
54 consists of a main procedure that reads in the interpreter code,
55 a main interpreter loop that transfers successively to various
56 code segments implementing the abstract machine operations,
57 built-in procedures and functions,
58 and several routines that support the implementation of the
59 Pascal input-output environment.
61 The interpreter runs at a fraction of the speed of equivalent
62 compiled C code, with this fraction varying from 1/5 to 1/15.
63 The interpreter occupies 18.5K bytes of instruction space, shared among
64 all processes executing Pascal, and has 4.6K bytes of data space (constants,
65 error messages, etc.) a copy of which is allocated to each executing process.
67 Format of the object file
70 normally interprets the code left in an object file by a run of the
73 The file where the translator puts the object originally, and the most
74 commonly interpreted file, is called
76 In order that all persons using
78 share a common text image, this executable file is
79 a small process that coordinates with the interpreter to start
81 The interpreter code is placed
82 at the end of a special ``header'' file and the size of the initialized
83 data area of this header file is expanded to include this code,
84 so that during execution it is located at an
85 easily determined address in its data space.
86 When executed, the object process creates a
88 creates another process by doing a
90 and arranges that the resulting parent process becomes an instance of
92 The child process then writes the interpreter code through
93 the pipe that it has to the
94 interpreter process parent.
95 When this process is complete, the child exits.
97 The real advantage of this approach is that it does not require modifications
98 to the shell, and that the resultant objects are ``true objects'' not
99 requiring special treatment.
100 A simpler mechanism would be to determine the name of the file that was
101 executed and pass this to the interpreter.
102 However it is not possible to determine this name
105 \*(dd\ For instance, if the
107 program is placed in the directory
109 then when the user types
111 the first argument to the program, nominally the programs name, is
113 While it would be possible to search in the standard place,
114 i.e. the current directory, and the system directories
118 for a corresponding object file,
119 this would be expensive and not guaranteed to succeed.
120 Several shells exist that allow other directories to be searched
121 for commands, and there is,
123 no way to determine what these directories are.
126 General features of object code
128 Pascal object code is relocatable as all addressing references for
129 control transfers within the code are relative.
130 The code consists of instructions interspersed with inline data.
131 All instructions have a length that is an even number of bytes.
132 No variables are kept in the object code area.
134 The first byte of a Pascal interpreter instruction contains an operation
136 This allows a total of 256 major operation codes, and 232 of these are
137 in use in the current
139 The second byte of each interpreter instruction is called the
140 ``sub-operation code'',
143 It contains a small integer that may, for example, be used as a
144 block-structure level for the associated operation.
145 If the instruction can take a longword constant,
146 this constant is often packed into the sub-opcode
147 if it fits into 8 bits and is not zero.
148 A sub-opcode value of zero specifies that the constant would not
149 fit and therefore follows in the next word.
150 This is a space optimization, the value of zero for flagging
151 the longer case being convenient because it is easy to test.
153 Other instruction formats are used.
155 instructions take an offset in the following word,
156 operators that load constants onto the stack
157 take arbitrarily long inline constant values,
158 and many operations deal exclusively with data on the
159 interpreter stack, requiring no inline data.
161 Stack structure of the interpreter
163 The interpreter emulates a stack-structured Pascal machine.
164 The ``load'' instructions put values onto the stack, where all
165 arithmetic operations take place.
166 The ``store'' instructions take values off the stack
167 and place them in an address that is also contained on the stack.
168 The only way to move data or to compute in the machine is with the stack.
170 To make the interpreter operations more powerful
171 and to thereby increase the interpreter speed,
172 the arithmetic operations in the interpreter are ``typed''.
173 That is, length conversion of arithmetic values occurs when they are
174 used in an operation.
175 This eliminates interpreter cycles for length conversion
176 and the associated overhead.
177 For example, when adding an integer that fits in one byte to one that
178 requires four bytes to store, no ``conversion'' operators are required.
179 The one byte integer is loaded onto the stack, followed by the four
180 byte integer, and then an adding operator is used that has, implicit
181 in its definition, the sizes of the arguments.
183 Data types in the interpreter
185 The interpreter deals with several different fundamental data types.
186 In the memory of the machine, 1, 2, and 4 byte integers are supported,
187 with only 2 and 4 byte integers being present on the stack.
188 The interpreter always converts to 4 byte integers when there is a possibility
189 of overflowing the shorter formats.
190 This corresponds to the Pascal language definition of overflow in
191 arithmetic operations that requires that the result be correct
192 if all partial values lie within the bounds of the base integer type:
193 4 byte integer values.
195 Character constants are treated similarly to 1 byte integers for
196 most purposes, as are Boolean values.
197 All enumerated types are treated as integer values of
198 an appropriate length, usually 1 byte.
199 The interpreter also has real numbers, occupying 8 bytes of storage,
200 and sets and strings of varying length.
201 The appropriate operations are included for each data type, such as
202 set union and intersection and an operation to write a string.
206 data formats are supported by the interpreter.
207 The smallest unit of storage occupied by any variable is one byte.
212 thus degenerate to simple memory to memory transfers with
213 no special processing.
217 The interpreter runtime environment uses a stack data area and a heap
218 data area, that are kept at opposite ends of memory
219 and grow towards each other.
220 All global variables and variables local to procedures and functions
221 are kept in the stack area.
222 Dynamically allocated variables and buffers for input/output are
223 allocated in the heap.
225 The addressing of block structured variables is done by using
227 that contains the address of its stack frame
228 for each statically active block.\*(Dg
230 \*(dg\ Here ``block'' is being used to mean any
235 This display is referenced by instructions that load and store
236 variables and maintained by the operations for
237 block entry and exit, and for non-local
243 Three ``global'' variables in the interpreter, in addition to the
251 is a pointer to the display entry for the current block;
254 is the abstract machine location counter;
257 is a register that holds the address of the main interpreter
258 loop so that returning to the loop to fetch the next instruction is
261 The stack frame structure
264 has a stack frame consisting of three parts:
265 a block mark, local variables, and temporary storage for partially
266 evaluated expressions.
267 The stack in the interpreter grows from the high addresses in memory
268 to the low addresses,
269 so that those parts of the stack frame that are ``on the top''
270 of the stack have the most negative offsets from the display
272 The major parts of the stack frame are represented in Figure 1.1.
274 Note that the local variables of each block
275 have negative offsets from the corresponding display entry,
276 the ``first'' local variable having offset `\-2'.
280 The block mark contains the saved information necessary
281 to restore the environment when the current block exits.
282 It consists of two parts.
283 The first and top-most part is saved by the
285 instruction in the interpreter.
286 This information is not present for the main program
287 as it is never ``called''.
288 The second part of the block mark is created by the
290 begin block operator that also allocates and clears the
291 local variable storage.
292 The format of these blocks is represented in Figure 1.2.
296 The data saved by the
298 operator includes the line number
300 of the point of call,
301 that is printed if the program execution ends abnormally;
304 giving the return address;
305 and the current display entry address
311 begin operator saves the previous display contents at the level
312 of this block, so that the display can be restored on block exit.
313 A pointer to the beginning line number and the
314 name of this block is also saved.
315 This information is stored in the interpreter object code in-line after the
318 It is used in printing a post-mortem backtrace.
319 The saved file name and buffer reference are necessary because of
320 the input/output structure
321 (this is discussed in detail in
322 sections 3.3 and 3.4).
323 The top of stack reference gives the value the stack pointer should
324 have when there are no expression temporaries on the stack.
325 It is used for a consistency check in the
327 line number operators in the interpreter, that occurs before
328 each statement executed.
329 This helps to catch bugs in the interpreter, that often manifest
330 themselves by leaving the stack non-empty between statements.
332 Note that there is no explicit static link here.
333 Thus to set up the display correctly after a non-local
335 statement one must ``unwind''
336 through all the block marks on the stack to rebuild the display.
338 Arguments and return values
340 A function returns its value into a space reserved by the calling
344 are placed on top of this return area.
349 calls, arguments are placed at the end of the expression evaluation area
353 completes, expression evaluation can continue
354 after popping the arguments to the
357 exactly as if the function value had been ``loaded''.
360 are also popped off the stack by the caller
361 after its execution ends.
364 As a simple example consider the following stack structure
365 for a call to a function
367 of the form ``f(a)''.
378 the calling sequence for this function would be:
389 Here we use the operator
391 to clear space for the return value,
394 on the stack with a ``right value'' operator,
398 and can then complete evaluation of the containing expression.
399 The operations used here will be explained in section 2.
405 10 \*bfunction\fR f(i: integer): real;
412 would have code sequence:
428 operator takes 9 bytes of inline data.
429 The first byte specifies the
430 length of the function name.
431 The second longword specifies the
432 amount of local variable storage, here none.
433 The succeeding two lines give the line number of the
435 and the name of the block
439 operator places a name pointer in the block mark.
442 first takes an address of the
446 using the address of operator
449 The next operation in the interpretation of this function is the loading
453 is at the level of the
458 and the first variable in the local variable area.
461 completes by assigning the 4 byte integer on the stack to the 8 byte
462 return location, hence the
464 assignment operator, and then uses the
466 operator to exit the current block.
468 The main interpreter loop
470 The main interpreter loop is simply:
474 \fBcaseb\fR (lc)+,$0,$255
475 <table of opcode interpreter addresses>
478 The main opcode is extracted from the first byte of the instruction
479 and used to index into the table of opcode interpreter addresses.
480 Control is then transferred to the specified location.
481 The sub-opcode may be used to index the display,
483 or to specify one of several relational operators.
484 In the cases where a constant is needed, but it
485 is not small enough to fit in the byte sub-operator,
486 a zero is placed there and the constant follows in the next word.
487 Zero is easily tested for,
488 as the instruction that fetches the
489 sub-opcode sets the condition code flags.
499 is all that is needed to effect this packing of data.
500 This technique saves space in the Pascal
504 The address of the instruction at
506 is always contained in the register variable
508 Thus a return to the main interpreter is simply:
512 that is both quick and occupies little space.
516 Errors during interpretation fall into three classes:
518 1) Interpreter detected errors.
519 2) Hardware detected errors.
523 Interpreter detected errors include I/O errors and
524 built-in function errors.
525 These errors cause a subroutine call to an error routine
526 with a single parameter indicating the cause of the error.
527 Hardware errors such as range errors and overflows are
528 fielded by a special routine that determines the opcode
529 that caused the error.
530 It then calls the error routine with an appropriate error
532 External events include interrupts and system limits such
534 They generate a call to the error routine with an
535 appropriate error code.
536 The error routine processes the error condition,
537 printing an appropriate error message and usually
538 a backtrace from the point of the error.