Berkeley Pascal User's Manual
Version 3.1 - April 1986

William N. Joy‡, Susan L. Graham, Charles B. Haley‡,
Marshall Kirk McKusick, and Peter B. Kessler‡

Computer Science Division
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, California 94720

ABSTRACT

Berkeley Pascal is designed for interactive instructional use and runs on the PDP/11 and VAX/11 computers. Interpretive code is produced, providing fast translation at the expense of slower execution speed. There is also a fully compatible compiler for the VAX/11. An execution profiler and Wirth's cross reference program are also available with the system.

The system supports full Pascal. The language accepted is `standard' Pascal with a small number of extensions. There is an option to suppress the extensions. The extensions include a separate compilation facility and the ability to link to object modules produced from other source languages.

The User's Manual gives a list of sources relating to the UNIX* system, the Pascal language, and the Berkeley Pascal system. Basic usage examples are provided for the Pascal components pi, px, pix, pc, and pxp. Errors commonly encountered in these programs are discussed. Details are given of special considerations due to the interactive implementation. A number of examples are provided including many dealing with input/output. An appendix supplements Wirth's Pascal Report to form the full definition of the Berkeley implementation of the language.

__________________________
Copyright 1977, 1979, 1980, 1983 W. N. Joy, S. L. Graham, C. B. Haley, M. K. McKusick, P. B. Kessler
‡ Authors' current addresses: William Joy: Sun Microsystems, 2550 Garcia Ave., Mountain View, CA 94043; Charles Haley: S & B Associates, 1110 Centennial Ave., Piscataway, NJ 08854; Peter Kessler: Xerox Research Park, Palo Alto, CA
* UNIX is a Trademark of Bell Laboratories.

September 27, 1987

Introduction

The Berkeley Pascal User's Manual consists of five major sections and an appendix. In section 1 we give sources of information about UNIX, about the programming language Pascal, and about the Berkeley implementation of the language. Section 2 introduces the Berkeley implementation and provides a number of tutorial examples. Section 3 discusses the error diagnostics produced by the translators pc and pi, and the runtime interpreter px. Section 4 describes input/output with special attention given to features of the interactive implementation and to features unique to UNIX. Section 5 gives details on the components of the system and explanation of all relevant options. The User's Manual concludes with an appendix to Wirth's Pascal Report with which it forms a precise definition of the implementation.

History of the implementation

The first Berkeley system was written by Ken Thompson in early 1976. The main features of the present system were implemented by Charles Haley and William Joy during the latter half of 1976. Earlier versions of this system have been in use since January, 1977.

The system was moved to the VAX-11 by Peter Kessler and Kirk McKusick with the porting of the interpreter in the spring of 1979, and the implementation of the compiler in the summer of 1980.

1. Sources of information

This section lists the resources available for information about general features of UNIX, text editing, the Pascal language, and the Berkeley Pascal implementation, concluding with a list of references. The available documents include both so-called standard documents - those distributed with all UNIX systems - and documents (such as this one) written at Berkeley.

1.1. Where to get documentation

Current documentation for most of the UNIX system is available ``on line'' at your terminal.
Details on getting such documentation interactively are given in section 1.2.

1.2. Documentation describing UNIX

The following documents are those recommended as tutorial and reference material about the UNIX system. We give the documents with the introductory and tutorial materials first, the reference materials last.

UNIX For Beginners - Second Edition

This document is the basic tutorial for UNIX available with the standard system.

Communicating with UNIX

This is also a basic tutorial on the system and assumes no previous familiarity with computers; it was written at Berkeley.

An introduction to the C shell

This document introduces csh, the shell in common use at Berkeley, and provides a good deal of general description about the way in which the system functions. It provides a useful glossary of terms used in discussing the system.

UNIX Programmer's Manual

This manual is the major source of details on the components of the UNIX system. It consists of an Introduction, a permuted index, and eight command sections. Section 1 consists of descriptions of most of the ``commands'' of UNIX. Most of the other sections have limited relevance to the user of Berkeley Pascal, being of interest mainly to system programmers.

UNIX documentation often refers the reader to sections of the manual. Such a reference consists of a command name and a section number or name. An example of such a reference would be: ed (1). Here ed is a command name - the standard UNIX text editor, and `(1)' indicates that its documentation is in section 1 of the manual.

The pieces of the Berkeley Pascal system are pi (1), px (1), the combined Pascal translator and interpretive executor pix (1), the Pascal compiler pc (1), the Pascal execution profiler pxp (1), and the Pascal cross-reference generator pxref (1).

It is possible to obtain a copy of a manual section by using the man (1) command. To get the Pascal documentation just described one could issue the command:

    % man pi

to the shell. The `% ' here was printed by the shell as a prompt; the rest of the line is the user's input. Similarly the command:

    % man man

asks the man command to describe itself.

1.3. Text editing documents

The following documents introduce the various UNIX text editors. Most Berkeley users use a version of the text editor ex: either edit, which is a version of ex for new and casual users, ex itself, or vi (visual), which focuses on the display editing portion of ex.

A Tutorial Introduction to the UNIX Text Editor

This document, written by Brian Kernighan of Bell Laboratories, is a tutorial for the standard UNIX text editor ed. It introduces you to the basics of text editing, and provides enough information to meet the day-to-day editing needs of ed users.

Edit: A tutorial

This introduces the use of edit, an editor similar to ed which provides a more hospitable environment for beginning users.

Ex/edit Command Summary

This summarizes the features of the editors ex and edit in a concise form. If you have used a line oriented editor before, this summary alone may be enough to get you started.

Ex Reference Manual - Version 3.7

A complete reference on the features of ex and edit.

An Introduction to Display Editing with Vi

Vi is a display oriented text editor. It can be used on almost any CRT terminal, and uses the screen as a window into the file you are editing. Changes you make to the file are reflected in what you see. This manual serves both as an introduction to editing with vi and a reference manual.

Vi Quick Reference

This reference card is a handy quick guide to vi; you should get one when you get the introduction to vi.

1.4. Pascal documents - The language

This section describes the documents on the Pascal language which are likely to be most useful to the Berkeley Pascal user. Complete references for these documents are given in section 1.6.

Pascal User Manual

By Kathleen Jensen and Niklaus Wirth, the User Manual provides a tutorial introduction to the features of the language Pascal, and serves as an excellent quick-reference to the language. The reader with no familiarity with Algol-like languages may prefer one of the Pascal text books listed below, as they provide more examples and explanation. Particularly important here are pages 116-118 which define the syntax of the language. Sections 13 and 14 and Appendix F pertain only to the 6000-3.4 implementation of Pascal.

Pascal Report

By Niklaus Wirth, this document is bound with the User Manual. It is the guiding reference for implementors and the fundamental definition of the language. Some programmers find this report too concise to be of practical use, preferring the User Manual as a reference.

Books on Pascal

Several good books which teach Pascal or use it as a medium are available. The books by Wirth, Systematic Programming and Algorithms + Data Structures = Programs, use Pascal as a vehicle for teaching programming and data structure concepts respectively. They are both recommended. Other books on Pascal are listed in the references below.

1.5. Pascal documents - The Berkeley Implementation

This section describes the documentation available for the Berkeley implementation of Pascal.

User's Manual

The document you are reading is the User's Manual for Berkeley Pascal. We often refer the reader to the Jensen-Wirth User Manual mentioned above, a different document with a similar name.

Manual sections

The sections relating to Pascal in the UNIX Programmer's Manual are pix (1), pi (1), pc (1), px (1), pxp (1), and pxref (1). These sections give a description of each program, summarize the available options, indicate files used by the program, give basic information on the diagnostics produced and include a list of known bugs.

Implementation notes

For those interested in the internal organization of the Berkeley Pascal system there is a series of Implementation Notes describing these details. The Berkeley Pascal PX Implementation Notes describe the Pascal interpreter px, and the Berkeley Pascal PXP Implementation Notes describe the structure of the execution profiler pxp.

1.6. References

UNIX Documents

Communicating With UNIX
Computer Center
University of California, Berkeley
January, 1978.

Ricki Blau and James Joyce
Edit: a tutorial
UNIX User's Supplementary Documents (USD), 14
University of California, Berkeley, CA. 94720
April, 1986.

Ex/edit Command Summary
Computer Center
University of California, Berkeley
August, 1978.

William Joy
Ex Reference Manual - Version 3.7
UNIX User's Supplementary Documents (USD), 16
University of California, Berkeley, CA. 94720
April, 1986.

William Joy
An Introduction to Display Editing with Vi
UNIX User's Supplementary Documents (USD), 15
University of California, Berkeley, CA. 94720
April, 1986.

William Joy
An Introduction to the C shell (Revised)
UNIX User's Supplementary Documents (USD), 4
University of California, Berkeley, CA. 94720
April, 1986.

Brian W. Kernighan
UNIX for Beginners - Second Edition
UNIX User's Supplementary Documents (USD), 1
University of California, Berkeley, CA. 94720
April, 1986.

Brian W. Kernighan
A Tutorial Introduction to the UNIX Text Editor
UNIX User's Supplementary Documents (USD), 12
University of California, Berkeley, CA. 94720
April, 1986.

Dennis M. Ritchie and Ken Thompson
The UNIX Time Sharing System
Reprinted from Communications of the ACM July 1974 in
UNIX Programmer's Supplementary Documents, Volume 2 (PS2), 1
University of California, Berkeley, CA. 94720
April, 1986.

Pascal Language Documents

Cooper and Clancy
Oh! Pascal!, 2nd Edition
W. W. Norton & Company, Inc.
500 Fifth Ave., NY, NY. 10110
1985, 475 pp.

Cooper
Standard Pascal User Reference Manual
W. W. Norton & Company, Inc.
500 Fifth Ave., NY, NY. 10110
1983, 176 pp.

Kathleen Jensen and Niklaus Wirth
Pascal - User Manual and Report
Springer-Verlag, New York.
1975, 167 pp.

Niklaus Wirth
Algorithms + Data structures = Programs
Prentice-Hall, New York.
1976, 366 pp.

Berkeley Pascal documents

The following documents are available from the Computer Center Library at the University of California, Berkeley.

William N. Joy
Berkeley Pascal PX Implementation Notes
Version 1.1, April 1979.
(Vax-11 Version 2.0 By Kirk McKusick, December, 1979)

William N. Joy
Berkeley Pascal PXP Implementation Notes
Version 1.1, April 1979.

2.
Basic UNIX Pascal

The following sections explain the basics of using Berkeley Pascal. In examples here we use the text editor ex (1). Users of the text editor ed should have little trouble following these examples, as ex is similar to ed. We use ex because it allows us to make clearer examples.† The new UNIX user will find it helpful to read one of the text editor documents described in section 1.3 before continuing with this section.

__________________________
† Users with CRT terminals should find the editor vi more pleasant to use; we do not show its use here because its display oriented nature makes it difficult to illustrate.

2.1. A first program

To prepare a program for Berkeley Pascal we first need to have an account on UNIX and to `login' to the system on this account. These procedures are described in the documents Communicating with UNIX and UNIX for Beginners.

Once we are logged in we need to choose a name for our program; let us call it `first' as this is the first example. We must also choose a name for the file in which the program will be stored. The Berkeley Pascal system requires that programs reside in files which have names ending with the sequence `.p' so we will call our file `first.p'.

A sample editing session to create this file would begin:

    % ex first.p
    "first.p" [New file]
    :

We didn't expect the file to exist, so the error diagnostic doesn't bother us. The editor now knows the name of the file we are creating. The `:' prompt indicates that it is ready for command input. We can add the text for our program using the `append' command as follows.

    :append
    program first(output)
    begin
            writeln('Hello, world!')
    end.
    .
    :

The line containing the single `.' character here indicated the end of the appended text. The `:' prompt indicates that ex is ready for another command.
As the editor operates in a temporary work space we must now store the contents of this work space in the file `first.p' so we can use the Pascal translator and executor pix on it.

    :write
    "first.p" [New file] 4 lines, 59 characters
    :quit
    %

We wrote out the file from the edit buffer here with the `write' command, and ex indicated the number of lines and characters written. We then quit the editor, and now have a prompt from the shell.‡

__________________________
‡ Our examples here assume you are using csh.

We are ready to try to translate and execute our program.

    % pix first.p
    Mon Apr 14 19:00 1986  first.p:
         2  begin
    e ------^--- Inserted ';'
    Execution begins...
    Hello, world!
    Execution terminated.

    1 statements executed in 0.02 seconds cpu time.
    %

The translator first printed a syntax error diagnostic. The number 2 here indicates that the rest of the line is an image of the second line of our program. The translator is saying that it expected to find a `;' before the keyword begin on this line. If we look at the Pascal syntax charts in the Jensen-Wirth User Manual, or at some of the sample programs therein, we will see that we have omitted the terminating `;' of the program statement on the first line of our program.

One other thing to notice about the error diagnostic is the letter `e' at the beginning. It stands for `error', indicating that our input was not legal Pascal. The fact that it is an `e' rather than an `E' indicates that the translator managed to recover from this error well enough that generation of code and execution could take place. Execution is possible whenever no fatal `E' errors occur during translation.
The other classes of diagnostics are `w' warnings, which do not necessarily indicate errors in the program, but point out inconsistencies which are likely to be due to program bugs, and `s' standard-Pascal violations.+

After completing the translation of the program to interpretive code, the Pascal system indicates that execution of the translated program began. The output from the execution of the program then appeared. At program termination, the Pascal runtime system indicated the number of statements executed, and the amount of cpu time used, with the resolution of the latter being 1/60th of a second.

Let us now fix the error in the program and translate it to a permanent object code file obj using pi. The program pi translates Pascal programs but stores the object code instead of executing it*.

    % ex first.p
    "first.p" 4 lines, 59 characters
    :1 print
    program first(output)
    :s/$/;
    program first(output);
    :write
    "first.p" 4 lines, 60 characters
    :quit
    % pi first.p
    %

__________________________
+ The standard Pascal warnings occur only when the associated s translator option is enabled. The s option is discussed in sections 5.1 and A.6 below. Warning diagnostics are discussed at the end of section 3.2; the associated w option is described in section 5.2.
* This script indicates some other useful approaches to debugging Pascal programs. As in ed we can shorten commands in ex to an initial prefix of the command name as we did with the substitute command here. The `!' shell escape command of ex, which is not used in this script, executes other commands with a shell without leaving the editor.

If we now use the UNIX ls list files command we can see what files we have:

    % ls
    first.p
    obj
    %

The file `obj' here contains the Pascal interpreter code. We can execute this by typing:

    % px obj
    Hello, world!

    1 statements executed in 0.01 seconds cpu time.
    %

Alternatively, the command:

    % obj

will have the same effect. Some examples of different ways to execute the program follow.

    % px
    Hello, world!

    1 statements executed in 0.01 seconds cpu time.
    % pi -p first.p
    % px obj
    Hello, world!
    % pix -p first.p
    Hello, world!
    %

Note that px will assume that `obj' is the file we wish to execute if we don't tell it otherwise. The last two translations use the -p no-post-mortem option to eliminate execution statistics and the `Execution begins' and `Execution terminated' messages. See section 5.2 for more details. If we now look at the files in our directory we will see:

    % ls
    first.p
    obj
    %

We can give our object program a name other than `obj' by using the move command mv (1). Thus to name our program `hello':

    % mv obj hello
    % hello
    Hello, world!
    % ls
    first.p
    hello
    %

Finally we can get rid of the Pascal object code by using the rm (1) remove file command, e.g.:

    % rm hello
    % ls
    first.p
    %

For small programs which are being developed pix tends to be more convenient to use than pi and px. Except for absence of the obj file after a pix run, a pix command is equivalent to a pi command followed by a px command. For larger programs, where a number of runs testing different parts of the program are to be made, pi is useful as the obj file can be executed any desired number of times.

2.2. A larger program

Suppose that we have used the editor to put a larger program in the file `bigger.p'.
We can list this program with line numbers by using the command cat -n, i.e.:

    % cat -n bigger.p
         1  (*
         2   * Graphic representation of a function
         3   *    f(x) = exp(-x) * sin(2 * pi * x)
         4   *)
         5  program graph1(output);
         6  const
         7          d = 0.0625;     (* 1/16, 16 lines for interval [x, x+1] *)
         8          s = 32;         (* 32 character width for interval [x, x+1]
         9          h = 34;         (* Character position of x-axis *)
        10          c = 6.28318;    (* 2 * pi *)
        11          lim = 32;
        12  var
        13          x, y: real;
        14          i, n: integer;
        15  begin
        16          for i := 0 to lim begin
        17                  x := d / i;
        18                  y := exp(-x9 * sin(i * x);
        19                  n := Round(s * y) + h;
        20                  repeat
        21                          write(' ');
        22                          n := n - 1
        23                  writeln('*')
        24  end.
    %

This program is similar to program 4.9 on page 30 of the Jensen-Wirth User Manual. A number of problems have been introduced into this example for pedagogical reasons.

If we attempt to translate and execute the program using pix we get the following response:

    % pix bigger.p
    Mon Apr 14 19:00 1986  bigger.p:
         9          h = 34;         (* Character position of x-axis *)
    w ------------------------------^--- (* in a (* ... *) comment
        16          for i := 0 to lim begin
    e --------------------------------^--- Inserted keyword do
        18                  y := exp(-x9 * sin(i * x);
    E --------------------------------^--- Undefined variable
    e -----------------------------------------------^--- Inserted ')'
        19                  n := Round(s * y) + h;
    E ---------------------------^--- Undefined function
    E ------------------------------------------^--- Undefined variable
        23                  writeln('*')
    e ----------------------^--- Inserted ';'
        24  end.
    E ------^--- Expected keyword until
    E ----------^--- Malformed declaration
    E ----------^--- Unexpected end-of-file - QUIT
    Execution suppressed due to compilation errors
    %

Since there were fatal `E' errors in our program, no code was generated and execution was necessarily suppressed. One thing which would be useful at this point is a listing of the program with the error messages.
We can get this by using the command:

    % pi -l bigger.p

There is no point in using pix here, since we know there are fatal errors in the program. This command will produce the output at our terminal. If we are at a terminal which does not produce a hard copy we may wish to print this listing off-line on a line printer. We can do this with the command:

    % pi -l bigger.p | lpr

In the next few sections we will illustrate various aspects of the Berkeley Pascal system by correcting this program.

2.3. Correcting the first errors

Most of the errors which occurred in this program were syntactic errors, those in the format and structure of the program rather than its content. Syntax errors are flagged by printing the offending line, and then a line which flags the location at which an error was detected. The flag line also gives an explanation stating either a possible cause of the error, a simple action which can be taken to recover from the error so as to be able to continue the analysis, a symbol which was expected at the point of error, or an indication that the input was `malformed'. In the last case, the recovery may skip ahead in the input to a point where analysis of the program can continue.

In this example, the first error diagnostic indicates that the translator detected a comment within a comment. While this is not considered an error in `standard' Pascal, it usually corresponds to an error in the program which is being translated. In this case, we have accidentally omitted the trailing `*)' of the comment on line 8. We can begin an editor session to correct this problem by doing:

    % ex bigger.p
    "bigger.p" 24 lines, 512 characters
    :8s/$/ *)
            s = 32;         (* 32 character width for interval [x, x+1] *)
    :

The second diagnostic, given after line 16, indicates that the keyword do was expected before the keyword begin in the for statement.
If we examine the statement syntax chart on page 118 of the Jensen-Wirth User Manual we will discover that do is a necessary part of the for statement. Similarly, we could have referred to section C.3 of the Jensen-Wirth User Manual to learn about the for statement and gotten the same information there. It is often useful to refer to these syntax charts and to the relevant sections of this book.

We can correct this problem by first scanning for the keyword for in the file and then substituting the keyword do to appear in front of the keyword begin there. Thus:

    :/for
            for i := 0 to lim begin
    :s/begin/do &
            for i := 0 to lim do begin
    :

The next error in the program is easy to pinpoint. On line 18, we didn't hit the shift key and got a `9' instead of a `)'. The translator diagnosed that `x9' was an undefined variable and, later, that a `)' was missing in the statement. It should be stressed that pi is not suggesting that you should insert a `)' before the `;'. It is only indicating that making this change will help it to be able to continue analyzing the program so as to be able to diagnose further errors. You must then determine the true cause of the error and make the appropriate correction to the source text.

This error also illustrates the fact that one error in the input may lead to multiple error diagnostics. Pi attempts to give only one diagnostic for each error, but single errors in the input sometimes appear to be more than one error. It is also the case that pi may not detect an error when it occurs, but may detect it later in the input. This would have happened in this example if we had typed `x' instead of `x9'.

The translator next detected, on line 19, that the function Round and the variable h were undefined.
It does not know about Round because Berkeley Pascal normally distinguishes between upper and lower case.+ On UNIX lower-case is preferred*, and all keywords and built-in procedure and function names are composed of lower-case letters, just as they are in the Jensen-Wirth Pascal Report. Thus we need to use the function round here. As far as h is concerned, we can see why it is undefined if we look back to line 9 and note that its definition was lost in the non-terminated comment. This diagnostic need not, therefore, concern us.

__________________________
+ In ``standard'' Pascal no distinction is made based on case.
* One good reason for using lower-case is that it is easier to type.

The next error which occurred in the program caused the translator to insert a `;' before the statement calling writeln on line 23. If we examine the program around the point of error we will see that the actual error is that the keyword until and an associated expression have been omitted here. Note that the diagnostic from the translator does not indicate the actual error, and is somewhat misleading. The translator made the correction which seemed to be most plausible. As the omission of a `;' character is a common mistake, the translator chose to indicate this as a possible fix here. It later detected that the keyword until was missing, but not until it saw the keyword end on line 24. The combination of these diagnostics indicates to us the true problem.

The final syntactic error message indicates that the translator needed an end keyword to match the begin at line 15. Since the end at line 24 is supposed to match this begin, we can infer that another begin must have been mismatched, and have matched this end. Thus we see that we need an end to match the begin at line 16, and to appear before the final end.

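The pairing rules behind these last two diagnostics can be seen in isolation in a small sketch. The program below is a hypothetical illustration, not part of the manual's graph1 example; the name `nesting' and the loop bounds are assumptions chosen only to keep it short. Each begin is closed by its own end, while repeat is closed by until and needs no begin/end pair of its own.

```pascal
program nesting(output);
(* Hypothetical illustration of begin/end and repeat/until pairing. *)
var
        i, n: integer;
begin                                   (* opens the program body *)
        for i := 1 to 3 do begin        (* `do' is required before the loop body *)
                n := i;
                repeat                  (* repeat is closed by until, not by end *)
                        write('*');
                        n := n - 1
                until n = 0;
                writeln
        end                             (* closes the begin of the for statement *)
end.                                    (* closes the program body *)
```

Run with pix, this sketch should print three lines holding one, two, and three asterisks respectively. Removing the `until n = 0;' line or the inner `end' reproduces the kind of mismatch discussed above.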
We can make these corrections:

    :/x9/s//x)
                    y := exp(-x) * sin(i * x);
    :+s/Round/round
                    n := round(s * y) + h;
    :/write
                            write(' ');
    :/
                    writeln('*')
    :insert
                    until n = 0;
    .
    :$
    end.
    :insert
            end
    .
    :

At the end of each procedure or function and the end of the program the translator summarizes references to undefined variables and improper usages of variables. It also gives warnings about potential errors. In our program, the summary errors do not indicate any further problems but the warning that c is unused is somewhat suspicious. Examining the program we see that the constant was intended to be used in the expression which is an argument to sin, so we can correct this expression, and translate the program. We have now made a correction for each diagnosed error in our program.

    :?i ?s//c /
                    y := exp(-x) * sin(c * x);
    :write
    "bigger.p" 26 lines, 538 characters
    :quit
    % pi bigger.p
    %

It should be noted that the translator suppresses warning diagnostics for a particular procedure, function or the main program when it finds severe syntax errors in that part of the source text. This is to prevent possibly confusing and incorrect warning diagnostics from being produced. Thus these warning diagnostics may not appear in a program with bad syntax errors until these errors are corrected.

We are now ready to execute our program for the first time. We will do so in the next section after giving a listing of the corrected program for reference purposes.
    % cat -n bigger.p
         1  (*
         2   * Graphic representation of a function
         3   *    f(x) = exp(-x) * sin(2 * pi * x)
         4   *)
         5  program graph1(output);
         6  const
         7          d = 0.0625;     (* 1/16, 16 lines for interval [x, x+1] *)
         8          s = 32;         (* 32 character width for interval [x, x+1] *)
         9          h = 34;         (* Character position of x-axis *)
        10          c = 6.28318;    (* 2 * pi *)
        11          lim = 32;
        12  var
        13          x, y: real;
        14          i, n: integer;
        15  begin
        16          for i := 0 to lim do begin
        17                  x := d / i;
        18                  y := exp(-x) * sin(c * x);
        19                  n := round(s * y) + h;
        20                  repeat
        21                          write(' ');
        22                          n := n - 1
        23                  until n = 0;
        24                  writeln('*')
        25          end
        26  end.
    %

2.4. Executing the second example

We are now ready to execute the second example. The following output was produced by our first run.

    % px
    Execution begins...
    Real division by zero

            Error in "graph1"+2 near line 17.
    Execution terminated abnormally.

    2 statements executed in 0.01 seconds cpu time.
    %

Here the interpreter is presenting us with a runtime error diagnostic. It detected a `division by zero' at line 17. Examining line 17, we see that we have written the statement `x := d / i' instead of `x := d * i'. We can correct this and rerun the program:

    % ex bigger.p
    "bigger.p" 26 lines, 538 characters
    :17
                    x := d / i;
    :s'/'*
                    x := d * i;
    :write
    "bigger.p" 26 lines, 538 characters
    :q
    % pix bigger.p
    Execution begins...
    (33 lines of output, each containing a single `*', tracing the damped sine curve)
    Execution terminated.

    2550 statements executed in 0.67 seconds cpu time.
    %

This appears to be the output we wanted. We could now save the output in a file if we wished by using the shell to redirect the output:

    % px > graph

We can use cat (1) to see the contents of the file graph. We can also make a listing of the graph on the line printer without putting it into a file, e.g.

    % px | lpr
    Execution begins...
    Execution terminated.

    2550 statements executed in 0.68 seconds cpu time.
% Note here that the statistics lines came out on our termi- nal. The statistics line comes out on the diagnostic output (unit 2.) There are two ways to get rid of the statistics line. We can redirect the statistics message to the printer using the syntax `|&' to the shell rather than `|', i.e.: % _p_x |& _l_p_r % or we can translate the program with the _p option disabled on the command line as we did above. This will disable all post-mortem dumping including the statistics line, thus: 9 9 - 20 - % _p_i -_p _b_i_g_g_e_r._p % _p_x | _l_p_r % This option also disables the statement limit which normally guards against infinite looping. You should not use it until your program is debugged. Also if _p is specified and an error occurs, you will not get run time diagnostic infor- mation to help you determine what the problem is. _2._5. _F_o_r_m_a_t_t_i_n_g _t_h_e _p_r_o_g_r_a_m _l_i_s_t_i_n_g It is possible to use special lines within the source text of a program to format the program listing. An empty line (one with no characters on it) corresponds to a `space' macro in an assembler, leaving a completely blank line without a line number. A line containing only a control-l (form-feed) character will cause a page eject in the listing with the corresponding line number suppressed. This corresponds to an `eject' pseudo-instruction. See also sec- tion 5.2 for details on the _n and _i options of _p_i. _2._6. _E_x_e_c_u_t_i_o_n _p_r_o_f_i_l_i_n_g An execution profile consists of a structured listing of (all or part of) a program with information about the number of times each statement in the program was executed for a particular run of the program. These profiles can be used for several purposes. In a program which was abnor- mally terminated due to excessive looping or recursion or by a program fault, the counts can facilitate location of the error. 
Zero counts mark portions of the program which were not executed; during the early debugging stages they should prompt new test data or a re-examination of the program logic. The profile is perhaps most valuable, however, in drawing attention to the (typically small) portions of the program that dominate execution time. This information can be used for source level optimization. _A_n _e_x_a_m_p_l_e A prime number is a number which is divisible only by itself and the number one. The program _p_r_i_m_e_s, written by Niklaus Wirth, determines the first few prime numbers. In translating the program we have specified the _z option to _p_i_x. This option causes the translator to generate counters and count instructions sufficient in number to determine the number of times each statement in the program was executed.+ __________________________ +The counts are completely accurate only in the absence of runtime errors and nonlocal _g_o_t_o statements. This is not generally a problem, however, as in structured programs nonlocal _g_o_t_o statements occur infrequently, and counts are incorrect after abnormal termination - 21 - When execution of the program completes, either normally or abnormally, this count data is written to the file _p_m_o_n._o_u_t in the current directory.* It is then possible to prepare an execution profile by giving _p_x_p the name of the file associ- ated with this data, as was done in the following example. 
% _p_i_x -_l -_z _p_r_i_m_e_s._p Berkeley Pascal PI -- Version 3.1 (9/7/85) Mon Apr 14 19:00 1986 primes.p 1 program primes(output); 2 const n = 50; n1 = 7; (*n1 = sqrt(n)*) 3 var i,k,x,inc,lim,square,l: integer; 4 prim: boolean; 5 p,v: array[1..n1] of integer; 6 begin 7 write(2:6, 3:6); l := 2; 8 x := 1; inc := 4; lim := 1; square := 9; 9 for i := 3 to n do 10 begin (*find next prime*) 11 repeat x := x + inc; inc := 6-inc; 12 if square <= x then 13 begin lim := lim+1; 14 v[lim] := square; square := sqr(p[lim+1]) 15 end ; 16 k := 2; prim := true; 17 while prim and (k < lim) do 18 begin k := k+1; 19 if v[k] < x then v[k] := v[k] + 2*p[k]; 20 prim := x <> v[k] 21 end 22 until prim; 23 if i <= n1 then p[i] := x; 24 write(x:6); l := l+1; 25 if l = 10 then 26 begin writeln; l := 0 27 end 28 end ; 29 writeln; 30 end . Execution begins... 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 Execution terminated. 1404 statements executed in 0.37 seconds cpu time. % __________________________ only when the _u_p_w_a_r_d _l_o_o_k described below to get a count passes a suspended call point. *_P_m_o_n._o_u_t has a name similar to _m_o_n._o_u_t, the monitor file produced by the profiling facility of the C compiler _c_c (1). See _p_r_o_f (1) for a discussion of the C compiler profiling facilities. _D_i_s_c_u_s_s_i_o_n The header lines of the outputs of _p_i_x and _p_x_p in this example indicate the version of the translator and execution profiler in use at the time this example was prepared. The time given with the file name (also on the header line) indicates the time of last modification of the program source file. This time serves to _v_e_r_s_i_o_n _s_t_a_m_p the input program. _P_x_p also indicates the time at which the profile data was gathered. % _p_x_p -_z _p_r_i_m_e_s._p Berkeley Pascal PXP -- Version 2.13 (4/2/84) Mon Apr 14 19:00 1986 primes.p Profiled Sun Sep 27 17:22 1987 1 1.
=>|program primes(output); 2 |const 2 | n = 50; 2 | n1 = 7; (*n1 = sqrt(n)*) 3 |var 3 | i, k, x, inc, lim, square, l: integer; 4 | prim: boolean; 5 | p, v: array [1..n1] of integer; 6 |begin 7 | write(2: 6, 3: 6); 7 | l := 2; 8 | x := 1; 8 | inc := 4; 8 | lim := 1; 8 | square := 9; 9 | for i := 3 to n do begin (*find next prime*) 9 48. =>| repeat 11 76. =>| x := x + inc; 11 | inc := 6 - inc; 12 | if square <= x then begin 13 5. =>| lim := lim + 1; 14 | v[lim] := square; 14 | square := sqr(p[lim + 1]) 14 | end; 16 | k := 2; 16 | prim := true; 17 | while prim and (k < lim) do begin 18 157. =>| k := k + 1; 19 | if v[k] < x then 19 42. =>| v[k] := v[k] + 2 * p[k]; 20 | prim := x <> v[k] 20 | end 20 |until prim; 23 | if i <= n1 then 23 5. =>| p[i] := x; 24 | write(x: 6); 24 | l := l + 1; 25 | if l = 10 then begin 26 5. =>| writeln; 26 | l := 0 26 | end 26 | end; 29 | writeln 29 |end. % To determine the number of times a statement was executed, one looks to the left of the statement and finds the corresponding vertical bar `|'. If this vertical bar is labelled with a count then that count gives the number of times the statement was executed. If the bar is not labelled, we look up in the listing to find the first `|' which is directly above the original one and which has a count; that count is the answer. Thus, in our example, _k was incremented 157 times on line 18, while the _w_r_i_t_e procedure call on line 24 was executed 48 times as given by the count on the _r_e_p_e_a_t. More information on _p_x_p can be found in its manual section _p_x_p (1) and in sections 5.4, 5.5 and 5.10. _3. _E_r_r_o_r _d_i_a_g_n_o_s_t_i_c_s This section of the _U_s_e_r'_s _M_a_n_u_a_l discusses the error diagnostics of the programs _p_i, _p_c and _p_x. _P_i_x is a simple but useful program which invokes _p_i and _p_x to do all the real processing. See its manual section _p_i_x (1) and section 5.2 below for more details. All the diagnostics given by _p_i will also be given by _p_c. _3._1.
_T_r_a_n_s_l_a_t_o_r _s_y_n_t_a_x _e_r_r_o_r_s A few comments on the general nature of the syntax errors usually made by Pascal programmers and the recovery mechanisms of the current translator may help in using the system. 9 9 - 24 - _I_l_l_e_g_a_l _c_h_a_r_a_c_t_e_r_s Characters such as `$', `!', and `@' are not part of the language Pascal. If they are found in the source pro- gram, and are not part of a constant string, a constant character, or a comment, they are considered to be `illegal characters'. This can happen if you leave off an opening string quote `''. Note that the character `"', although used in English to quote strings, is not used to quote strings in Pascal. Most non-printing characters in your input are also illegal except in character constants and character strings. Except for the tab and form feed charac- ters, which are used to ease formatting of the program, non-printing characters in the input file print as the char- acter `?' so that they will show in your listing. _S_t_r_i_n_g _e_r_r_o_r_s There is no character string of length 0 in Pascal. Consequently the input `''' is not acceptable. Similarly, encountering an end-of-line after an opening string quote `'' without encountering the matching closing quote yields the diagnostic ``Unmatched ' for string''. It is permissi- ble to use the character `#' instead of `'' to delimit char- acter and constant strings for portability reasons. For this reason, a spuriously placed `#' sometimes causes the diagnostic about unbalanced quotes. Similarly, a `#' in column one is used when preparing programs which are to be kept in multiple files. See section 5.11 for details. _C_o_m_m_e_n_t_s _i_n _a _c_o_m_m_e_n_t, _n_o_n-_t_e_r_m_i_n_a_t_e_d _c_o_m_m_e_n_t_s As we saw above, these errors are usually caused by leaving off a comment delimiter. 
You can convert parts of your program to comments without generating this diagnostic since there are two different kinds of comments - those del- imited by `{' and `}', and those delimited by `(*' and `*)'. Thus consider: { This is a comment enclosing a piece of program a := functioncall; (* comment within comment *) procedurecall; lhs := rhs; (* another comment *) } By using one kind of comment exclusively in your pro- gram you can use the other delimiters when you need to ``comment out'' parts of your program+. In this way you __________________________ +If you wish to transport your program, especially to the 6000-3.4 implementation, you should use the charac- ter sequence `(*' to delimit comments. For transporta- - 25 - will also allow the translator to help by detecting state- ments accidentally placed within comments. If a comment does not terminate before the end of the input file, the translator will point to the beginning of the comment, indicating that the comment is not terminated. In this case processing will terminate immediately. See the discussion of ``QUIT'' below. _D_i_g_i_t_s _i_n _n_u_m_b_e_r_s This part of the language is a minor nuisance. Pascal requires digits in real numbers both before and after the decimal point. Thus the following statements, which look quite reasonable to FORTRAN users, generate diagnostics in Pascal: Mon Apr 14 19:00 1986 digits.p: 4 r := 0.; e =>=>=>=>=>=>|^ => Digits required after decimal point 5 r := .0; e =>=>=>=>=>|^ => Digits required before decimal point 6 r := 1.e10; e =>=>=>=>=>=>|^ => Digits required after decimal point 7 r := .05e-10; e =>=>=>=>=>|^ => Digits required before decimal point These same constructs are also illegal as input to the Pas- cal interpreter _p_x. _R_e_p_l_a_c_e_m_e_n_t_s, _i_n_s_e_r_t_i_o_n_s, _a_n_d _d_e_l_e_t_i_o_n_s When a syntax error is encountered in the input text, the parser invokes an error recovery procedure. 
This procedure examines the input text immediately after the point of error and considers a set of simple corrections to see whether they will allow the analysis to continue. These corrections involve replacing an input token with a different token, inserting a token, or deleting an input token. Most of these changes will not cause fatal syntax errors. The exception is the insertion of or replacement with a symbol such as an identifier or a number; in this case the recovery makes no attempt to determine _w_h_i_c_h identifier or _w_h_a_t number should be inserted, hence these are considered fatal syntax errors. Consider the following example. __________________________ tion over the _r_c_s_l_i_n_k to Pascal 6000-3.4, the character `#' should be used to delimit characters and constant strings. % _p_i_x -_l _s_y_n_e_r_r._p Berkeley Pascal PI -- Version 3.1 (9/7/85) Mon Apr 14 19:00 1986 synerr.p 1 program syn(output); 2 var i, j are integer; e =>=>=>=>=>=>=>|^ => Replaced identifier with a ':' 3 begin 4 for j :* 1 to 20 begin e =>=>=>=>=>=>=>=>=>=>|^ => Replaced '*' with a '=' e =>=>=>=>=>=>=>=>=>=>=>=>=>=>=>|^ => Inserted keyword do 5 write(j); 6 i = 2 ** j; e =>=>=>=>=>=>=>=>=>=>=>=>|^ => Inserted ':' E =>=>=>=>=>=>=>=>=>=>=>=>=>=>|^ => Inserted identifier 7 writeln(i)) E =>=>=>=>=>=>=>=>=>=>=>=>=>=>=>=>|^ => Deleted ')' 8 end 9 end. % The only surprise here may be that Pascal does not have an exponentiation operator, hence the complaint about `**'. This error illustrates that, if you assume that the language has a feature which it does not, the translator diagnostic may not indicate this, as the translator is unlikely to recognize the construct you supply. _U_n_d_e_f_i_n_e_d _o_r _i_m_p_r_o_p_e_r _i_d_e_n_t_i_f_i_e_r_s If an identifier is encountered in the input but is undefined, the error recovery will replace it with an identifier of the appropriate class.
Further references to this identifier will be summarized at the end of the containing _p_r_o_c_e_d_u_r_e or _f_u_n_c_t_i_o_n or at the end of the _p_r_o_g_r_a_m if the reference occurred in the main program. Similarly, if an identifier is used in an inappropriate way, e.g. if a _t_y_p_e identifier is used in an assignment statement, or if a sim- ple variable is used where a _r_e_c_o_r_d variable is required, a diagnostic will be produced and an identifier of the appropriate type inserted. Further incorrect references to this identifier will be flagged only if they involve incorrect use in a different way, with all incorrect uses being summarized in the same way as undefined variable uses are. _E_x_p_e_c_t_e_d _s_y_m_b_o_l_s, _m_a_l_f_o_r_m_e_d _c_o_n_s_t_r_u_c_t_s If none of the above mentioned corrections appear rea- sonable, the error recovery will examine the input to the left of the point of error to see if there is only one sym- bol which can follow this input. If this is the case, the recovery will print a diagnostic which indicates that the - 27 - given symbol was `Expected'. In cases where none of these corrections resolve the problems in the input, the recovery may issue a diagnostic that indicates that the input is ``malformed''. If neces- sary, the translator may then skip forward in the input to a place where analysis can continue. This process may cause some errors in the text to be missed. Consider the following example: % _p_i_x -_l _s_y_n_e_r_r_2._p Berkeley Pascal PI -- Version 3.1 (9/7/85) Mon Apr 14 19:00 1986 synerr2.p 1 program synerr2(input,outpu); 2 integer a(10) E =>=>=>|^ => Malformed declaration 3 begin 4 read(b); E =>=>=>=>=>=>=>=>=>|^ => Undefined variable 5 for c := 1 to 10 do E =>=>=>=>=>=>=>=>=>|^ => Undefined variable 6 a(c) := b * c; E =>=>=>=>=>=>=>=>=>=>=>|^ => Undefined procedure E =>=>=>=>=>=>=>=>=>=>=>=>=>|^ => Malformed statement 7 end. 
E 1 - File outpu listed in program statement but not declared In program synerr2: E - a undefined on lines 6 E - b undefined on line 4 E - c undefined on line 5 6 Execution suppressed due to compilation errors % Here we misspelled _o_u_t_p_u_t and gave a FORTRAN style variable declaration which the translator diagnosed as a `Malformed declaration'. When, on line 6, we used `(' and `)' for sub- scripting (as in FORTRAN) rather than the `[' and `]' which are used in Pascal, the translator noted that _a was not defined as a _p_r_o_c_e_d_u_r_e. This occurred because _p_r_o_c_e_d_u_r_e and _f_u_n_c_t_i_o_n argument lists are delimited by parentheses in Pas- cal. As it is not permissible to assign to procedure calls the translator diagnosed a malformed statement at the point of assignment. _E_x_p_e_c_t_e_d _a_n_d _u_n_e_x_p_e_c_t_e_d _e_n_d-_o_f-_f_i_l_e, ``_Q_U_I_T'' If the translator finds a complete program, but there is more non-comment text in the input file, then it will indicate that an end-of-file was expected. This situation may occur after a bracketing error, or if too many _e_n_ds are - 28 - present in the input. The message may appear after the recovery says that it ``Expected `.''' since `.' is the sym- bol that terminates a program. If severe errors in the input prohibit further process- ing the translator may produce a diagnostic followed by ``QUIT''. One example of this was given above - a non- terminated comment; another example is a line which is longer than 160 characters. Consider also the following example. % _p_i_x -_l _m_i_s_m._p Berkeley Pascal PI -- Version 3.1 (9/7/85) Mon Apr 14 19:00 1986 mism.p 1 program mismatch(output) 2 begin e =>=>=>|^ => Inserted ';' 3 writeln('***'); 4 { The next line is the last line in the file } 5 writeln E =>=>=>=>=>=>=>=>=>=>|^ => Malformed declaration E =>=>=>=>=>=>=>=>=>=>|^ => Unexpected end-of-file - QUIT % _3._2. 
_T_r_a_n_s_l_a_t_o_r _s_e_m_a_n_t_i_c _e_r_r_o_r_s The extremely large number of semantic diagnostic messages which the translator produces makes it unreasonable to discuss each message or group of messages in detail. The messages are, however, very informative. We will here explain the typical formats and the terminology used in the error messages so that you will be able to make sense out of them. In any case in which a diagnostic is not completely comprehensible you can refer to the _U_s_e_r _M_a_n_u_a_l by Jensen and Wirth for examples. _F_o_r_m_a_t _o_f _t_h_e _e_r_r_o_r _d_i_a_g_n_o_s_t_i_c_s As we saw in the example program above, the error diagnostics from the Pascal translator include the number of a line in the text of the program as well as the text of the error message. While this number is most often the line where the error occurred, it is occasionally the number of a line containing a bracketing keyword like _e_n_d or _u_n_t_i_l. In this case, the diagnostic may refer to the previous statement. This occurs because of the method the translator uses for sampling line numbers. The absence of a trailing `;' in the previous statement causes the line number corresponding to the _e_n_d or _u_n_t_i_l to become associated with the statement. As Pascal is a free-format language, the line number associations can only be approximate and may seem arbitrary to some users. This is the only notable exception, however, to reasonable associations. _I_n_c_o_m_p_a_t_i_b_l_e _t_y_p_e_s Since Pascal is a strongly typed language, many semantic errors manifest themselves as type errors. These are called `type clashes' by the translator. The types allowed for various operators in the language are summarized on page 108 of the Jensen-Wirth _U_s_e_r _M_a_n_u_a_l.
It is important to know that the Pascal translator, in its diagnostics, distin- guishes between the following type `classes': array Boolean char file integer pointer real record scalar string These words are plugged into a great number of error mes- sages. Thus, if you tried to assign an _i_n_t_e_g_e_r value to a _c_h_a_r variable you would receive a diagnostic like the fol- lowing: Mon Apr 14 19:00 1986 clash.p: E 7 - Type clash: integer is incompatible with char ... Type of expression clashed with type of variable in assignment In this case, one error produced a two line error message. If the same error occurs more than once, the same explana- tory diagnostic will be given each time. _S_c_a_l_a_r The only class whose meaning is not self-explanatory is `scalar'. Scalar has a precise meaning in the Jensen-Wirth _U_s_e_r _M_a_n_u_a_l where, in fact, it refers to _c_h_a_r, _i_n_t_e_g_e_r, _r_e_a_l, and _B_o_o_l_e_a_n types as well as the enumerated types. For the purposes of the Pascal translator, scalar in an error message refers to a user-defined, enumerated type, such as _o_p_s in the example above or _c_o_l_o_r in _t_y_p_e color = (red, green, blue) For integers, the more explicit denotation _i_n_t_e_g_e_r is used. Although it would be correct, in the context of the _U_s_e_r _M_a_n_u_a_l to refer to an integer variable as a _s_c_a_l_a_r variable _p_i prefers the more specific identification. _F_u_n_c_t_i_o_n _a_n_d _p_r_o_c_e_d_u_r_e _t_y_p_e _e_r_r_o_r_s For built-in procedures and functions, two kinds of errors occur. If the routines are called with the wrong number of arguments a message similar to: - 30 - Mon Apr 14 19:00 1986 sin1.p: E 12 - sin takes exactly one argument is given. If the type of the argument is wrong, a message like Mon Apr 14 19:00 1986 sin2.p: E 12 - sin's argument must be integer or real, not char is produced. 
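As a rough illustration of the two kinds of error just described, the following sketch calls _s_i_n correctly and notes, in comments, calls of the kind that would be expected to draw each diagnostic. The program name and variable are hypothetical; this is not the source of sin1.p or sin2.p, which is not reproduced here.

```pascal
program sincalls(output);
{ Hypothetical example; not the sin1.p or sin2.p source. }
var
    x: real;
begin
    x := 0.5;
    writeln(sin(x));    { one real argument: accepted }
    writeln(sin(1));    { one integer argument: accepted }
    { sin(x, x) passes two arguments; expect the
      "takes exactly one argument" diagnostic }
    { sin('a') passes a char; expect the
      "must be integer or real, not char" diagnostic }
end.
```

Moving either commented call into the program body should reproduce the corresponding message, with the line number of the offending call.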
A few functions and procedures implemented in Pascal 6000-3.4 are diagnosed as unimplemented in Berkeley Pascal, notably those related to _s_e_g_m_e_n_t_e_d files. _C_a_n'_t _r_e_a_d _a_n_d _w_r_i_t_e _s_c_a_l_a_r_s, _e_t_c. The messages which state that scalar (user-defined) types cannot be written to and from files are often mysteri- ous. It is in fact the case that if you define _t_y_p_e color = (red, green, blue) ``standard'' Pascal does not associate these constants with the strings `red', `green', and `blue' in any way. An extension has been added which allows enumerated types to be read and written, however if the program is to be portable, you will have to write your own routines to perform these functions. Standard Pascal only allows the reading of char- acters, integers and real numbers from text files. You can- not read strings or Booleans. It is possible to make a _f_i_l_e _o_f color but the representation is binary rather than string. _E_x_p_r_e_s_s_i_o_n _d_i_a_g_n_o_s_t_i_c_s The diagnostics for semantically ill-formed expressions are very explicit. Consider this sample translation: % _p_i -_l _e_x_p_r._p Berkeley Pascal PI -- Version 3.1 (9/7/85) Mon Apr 14 19:00 1986 expr.p 1 program x(output); 2 var 3 a: set of char; 4 b: Boolean; 5 c: (red, green, blue); 6 p: ^ integer; - 31 - 7 A: alfa; 8 B: packed array [1..5] of char; 9 begin 10 b := true; 11 c := red; 12 new(p); 13 a := []; 14 A := 'Hello, yellow'; 15 b := a and b; 16 a := a * 3; 17 if input < 2 then writeln('boo'); 18 if p <= 2 then writeln('sure nuff'); 19 if A = B then writeln('same'); 20 if c = true then writeln('hue''s and color''s') 21 end. 
E 14 - Constant string too long E 15 - Left operand of and must be Boolean, not set E 16 - Cannot mix sets with integers and reals as operands of * E 17 - files may not participate in comparisons E 18 - pointers and integers cannot be compared - operator was <= E 19 - Strings not same length in = comparison E 20 - scalars and Booleans cannot be compared - operator was = e 21 - Input is used but not defined in the program statement In program x: w - constant green is never used w - constant blue is never used w - variable B is used but never set % This example is admittedly far-fetched, but illustrates that the error messages are sufficiently clear to allow easy determination of the problem in the expressions. _T_y_p_e _e_q_u_i_v_a_l_e_n_c_e Several diagnostics produced by the Pascal translator complain about `non-equivalent types'. In general, Berkeley Pascal considers variables to have the same type only if they were declared with the same constructed type or with the same type identifier. Thus, the variables _x and _y declared as _v_a_r x: ^ integer; y: ^ integer; do not have the same type. The assignment x := y thus produces the diagnostics: 9 9 - 32 - Mon Apr 14 19:00 1986 typequ.p: E 7 - Type clash: non-identical pointer types ... Type of expression clashed with type of variable in assignment Thus it is always necessary to declare a type such as _t_y_p_e intptr = ^ integer; and use it to declare _v_a_r x: intptr; y: intptr; Note that if we had initially declared _v_a_r x, y: ^ integer; then the assignment statement would have worked. The state- ment x^ := y^ is allowed in either case. Since the parameter to a _p_r_o_- _c_e_d_u_r_e or _f_u_n_c_t_i_o_n must be declared with a type identifier rather than a constructed type, it is always necessary, in practice, to declare any type which will be used in this way. _U_n_r_e_a_c_h_a_b_l_e _s_t_a_t_e_m_e_n_t_s Berkeley Pascal flags unreachable statements. 
Such statements usually correspond to errors in the program logic. Note that a statement is considered to be reachable if there is a potential path of control, even if it can never be taken. Thus, no diagnostic is produced for the statement: _i_f false _t_h_e_n writeln('impossible!') _G_o_t_o'_s _i_n_t_o _s_t_r_u_c_t_u_r_e_d _s_t_a_t_e_m_e_n_t_s The translator detects and complains about _g_o_t_o state- ments which transfer control into structured statements (_f_o_r, _w_h_i_l_e, etc.) It does not allow such jumps, nor does it allow branching from the _t_h_e_n part of an _i_f statement into the _e_l_s_e part. Such checks are made only within the body of a single procedure or function. 9 9 - 33 - _U_n_u_s_e_d _v_a_r_i_a_b_l_e_s, _n_e_v_e_r _s_e_t _v_a_r_i_a_b_l_e_s Although _p_i always clears variables to 0 at _p_r_o_c_e_d_u_r_e and _f_u_n_c_t_i_o_n entry, _p_c does not unless runtime checking is enabled using the _C option. It is _n_o_t good programming practice to rely on this initialization. To discourage this practice, and to help detect errors in program logic, _p_i flags as a `w' warning error: 1) Use of a variable which is never assigned a value. 2) A variable which is declared but never used, dis- tinguishing between those variables for which values are computed but which are never used, and those completely unused. In fact, these diagnostics are applied to all declared items. Thus a _c_o_n_s_t or a _p_r_o_c_e_d_u_r_e which is declared but never used is flagged. The _w option of _p_i may be used to suppress these warnings; see sections 5.1 and 5.2. _3._3. _T_r_a_n_s_l_a_t_o_r _p_a_n_i_c_s, _i/_o _e_r_r_o_r_s _P_a_n_i_c_s One class of error which rarely occurs, but which causes termination of all processing when it does is a panic. A panic indicates a translator-detected internal inconsistency. 
A typical panic message is: snark (rvalue) line=110 yyline=109 Snark in pi If you receive such a message, the translation will be quickly and perhaps ungracefully terminated. You should contact a teaching assistant or a member of the system staff, after saving a copy of your program for later inspec- tion. If you were making changes to an existing program when the problem occurred, you may be able to work around the problem by ascertaining which change caused the _s_n_a_r_k and making a different change or correcting an error in the program. A small number of panics are possible in _p_x. All panics should be reported to a teaching assistant or systems staff so that they can be fixed. _O_u_t _o_f _m_e_m_o_r_y The only other error which will abort translation when no errors are detected is running out of memory. All tables in the translator, with the exception of the parse stack, are dynamically allocated, and can grow to take up the full available process space of 64000 bytes on the PDP-11. On the VAX-11, table sizes are extremely generous and very - 34 - large (25000) line programs have been easily accommodated. For the PDP-11, it is generally true that the size of the largest translatable program is directly related to _p_r_o_- _c_e_d_u_r_e and _f_u_n_c_t_i_o_n size. A number of non-trivial Pascal programs, including some with more than 2000 lines and 2500 statements have been translated and interpreted using Berke- ley Pascal on PDP-11's. Notable among these are the Pascal-S interpreter, a large set of programs for automated generation of code generators, and a general context-free parsing program which has been used to parse sentences with a grammar for a superset of English. In general, very large programs should be translated using _p_c and the separate com- pilation facility. 
If you receive an out of space message from the trans- lator during translation of a large _p_r_o_c_e_d_u_r_e or _f_u_n_c_t_i_o_n or one containing a large number of string constants you may yet be able to translate your program if you break this one _p_r_o_c_e_d_u_r_e or _f_u_n_c_t_i_o_n into several routines. _I/_O _e_r_r_o_r_s Other errors which you may encounter when running _p_i relate to input-output. If _p_i cannot open the file you specify, or if the file is empty, you will be so informed. _3._4. _R_u_n-_t_i_m_e _e_r_r_o_r_s We saw, in our second example, a run-time error. We here give the general description of run-time errors. The more unusual interpreter error messages are explained briefly in the manual section for _p_x (1). _S_t_a_r_t-_u_p _e_r_r_o_r_s These errors occur when the object file to be executed is not available or appropriate. Typical errors here are caused by the specified object file not existing, not being a Pascal object, or being inaccessible to the user. _P_r_o_g_r_a_m _e_x_e_c_u_t_i_o_n _e_r_r_o_r_s These errors occur when the program interacts with the Pascal runtime environment in an inappropriate way. Typical errors are values or subscripts out of range, bad arguments to built-in functions, exceeding the statement limit because of an infinite loop, or running out of memory*. The inter- preter will produce a backtrace after the error occurs, __________________________ *The checks for running out of memory are not foolproof and there is a chance that the interpreter will fault, producing a core image when it runs out of memory. This situation occurs very rarely. 9 9 - 35 - showing all the active routine calls, unless the _p option was disabled when the program was translated. 
Unfor- tunately, no variable values are given and no way of extracting them is available.* As an example of such an error, assume that we have accidentally declared the constant _n_1 to be 6, instead of 7 on line 2 of the program primes as given in section 2.6 above. If we run this program we get the following response. % _p_i_x _p_r_i_m_e_s._p Execution begins... 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 Subscript value of 7 is out of range Error in "primes"+8 near line 14. Execution terminated abnormally. 941 statements executed in 0.25 seconds cpu time. % Here the interpreter indicates that the program ter- minated abnormally due to a subscript out of range near line 14, which is eight lines into the body of the program primes. _I_n_t_e_r_r_u_p_t_s If the program is interrupted while executing and the _p option was not specified, then a backtrace will be printed.+ The file _p_m_o_n._o_u_t of profile information will be written if the program was translated with the _z option enabled to _p_i or _p_i_x. 9__________________________ * On the VAX-11, each variable is restricted to allo- cate at most 65000 bytes of storage (this is a PDP- 11ism that has survived to the VAX.) +Occasionally, the Pascal system will be in an incon- sistent state when this occurs, e.g. when an interrupt terminates a _p_r_o_c_e_d_u_r_e or _f_u_n_c_t_i_o_n entry or exit. In this case, the backtrace will only contain the current line. A reverse call order list of procedures will not be given. 9 - 36 - _I/_O _i_n_t_e_r_a_c_t_i_o_n _e_r_r_o_r_s The final class of interpreter errors results from inappropriate interactions with files, including the user's terminal. Included here are bad formats for integer and real numbers (such as no digits after the decimal point) when reading. _4. 
_I_n_p_u_t/_o_u_t_p_u_t This section describes features of the Pascal input/output environment, with special consideration of the features peculiar to an interactive implementation. _4._1. _I_n_t_r_o_d_u_c_t_i_o_n Our first sample programs, in section 2, used the file _o_u_t_p_u_t. We gave examples there of redirecting the output to a file and to the line printer using the shell. Similarly, we can read the input from a file or another program. Con- sider the following Pascal program which is similar to the program _c_a_t (1). % _p_i_x -_l _k_a_t._p <_p_r_i_m_e_s Berkeley Pascal PI -- Version 3.1 (9/7/85) Mon Apr 14 19:00 1986 kat.p 1 program kat(input, output); 2 var 3 ch: char; 4 begin 5 while not eof do begin 6 while not eoln do begin 7 read(ch); 8 write(ch) 9 end; 10 readln; 11 writeln 12 end 13 end { kat }. Execution begins... 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 Execution terminated. 925 statements executed in 0.25 seconds cpu time. % 9 9 - 37 - Here we have used the shell's syntax to redirect the program input from a file in _p_r_i_m_e_s in which we had placed the out- put of our prime number program of section 2.6. It is also possible to `pipe' input to this program much as we piped input to the line printer daemon _l_p_r (1) before. Thus, the same output as above would be produced by % _c_a_t _p_r_i_m_e_s | _p_i_x -_l _k_a_t._p All of these examples use the shell to control the input and output from files. One very simple way to associ- ate Pascal files with named UNIX files is to place the file name in the _p_r_o_g_r_a_m statement. For example, suppose we have previously created the file _d_a_t_a. We then use it as input to another version of a listing program. % _c_a_t _d_a_t_a line one. line two. line three is the end. 
    % pix -l copydata.p
    Berkeley Pascal PI -- Version 3.1 (9/7/85)
    Mon Apr 14 19:00 1986  copydata.p
         1  program copydata(data, output);
         2  var
         3      ch: char;
         4      data: text;
         5  begin
         6      reset(data);
         7      while not eof(data) do begin
         8          while not eoln(data) do begin
         9              read(data, ch);
        10              write(ch)
        11          end;
        12          readln(data);
        13          writeln
        14      end
        15  end { copydata }.
    Execution begins...
    line one.
    line two.
    line three is the end.
    Execution terminated.
    134 statements executed in 0.04 seconds cpu time.
    %

By mentioning the file data in the program statement, we have indicated that we wish it to correspond to the UNIX file data. Then, when we `reset(data)', the Pascal system opens our file `data' for reading. More sophisticated, but less portable, examples of using UNIX files will be given in sections 4.5 and 4.6. There is a portability problem even with this simple example. Some Pascal systems attach meaning to the ordering of the files in the program statement file list. Berkeley Pascal does not do so.

4.2. Eof and eoln

An extremely common problem encountered by new users of Pascal, especially in the interactive environment offered by UNIX, relates to the definitions of eof and eoln. These functions are supposed to be defined at the beginning of execution of a Pascal program, indicating whether the input device is at the end of a line or the end of a file. Setting eof or eoln actually corresponds to an implicit read in which the input is inspected, but no input is ``used up''. In fact, there is no way the system can know whether the input is at the end-of-file or the end-of-line unless it attempts to read a line from it. If the input is from a previously created file, then this reading can take place without run-time action by the user.
However, if the input is from a terminal, then the input is what the user types.+ If the system were to do an initial read automatically at the beginning of program execution, and if the input were a terminal, the user would have to type some input before execution could begin. This would make it impossible for the program to begin by prompting for input or printing a herald.

+ It is not possible to determine whether the input is a terminal, as the input may appear to be a file but actually be a pipe, the output of a program which is reading from the terminal.

Berkeley Pascal has been designed so that an initial read is not necessary. At any given time, the Pascal system may or may not know whether the end-of-file or end-of-line conditions are true. Thus, internally, these functions can have three values - true, false, and ``I don't know yet; if you ask me I'll have to find out''. All files remain in this last, indeterminate state until the Pascal program requires a value for eof or eoln either explicitly or implicitly, e.g. in a call to read. The important point to note here is that if you force the Pascal system to determine whether the input is at the end-of-file or the end-of-line, it will be necessary for it to attempt to read from the input.

Thus consider the following example code

    while not eof do begin
        write('number, please? ');
        read(i);
        writeln('that was a ', i: 2)
    end

At first glance, this may appear to be a correct program for requesting, reading and echoing numbers. Notice, however, that the while loop asks whether eof is true before the request is printed. This will force the Pascal system to decide whether the input is at the end-of-file. The Pascal system will give no messages; it will simply wait for the user to type a line.
By producing the desired prompting before testing eof, the following code avoids this problem:

    write('number, please ?');
    while not eof do begin
        read(i);
        writeln('that was a ', i:2);
        write('number, please ?')
    end

The user must still type a line before the while test is completed, but the prompt will ask for it. This example, however, is still not correct. To understand why, it is first necessary to know, as we will discuss below, that there is a blank character at the end of each line in a Pascal text file. The read procedure, when reading integers or real numbers, is defined so that, if there are only blanks left in the file, it will return a zero value and set the end-of-file condition. If, however, there is a number remaining in the file, the end-of-file condition will not be set even if it is the last number, as read never reads the blanks after the number, and there is always at least one blank. Thus the modified code will still put out a spurious

    that was a 0

at the end of a session with it when the end-of-file is reached. The simplest way to correct the problem in this example is to use the procedure readln instead of read here. In general, unless we test the end-of-file condition both before and after calls to read or readln, there will be inputs for which our program will attempt to read past end-of-file.

4.3. More about eoln

To have a good understanding of when eoln will be true it is necessary to know that in any file there is a special character indicating end-of-line, and that, in effect, the Pascal system always reads one character ahead of the Pascal read commands.+ For instance, in response to `read(ch)', the system sets ch to the current input character and gets the next input character. If the current input character is the last character of the line, then the next input character from the file is the new-line character, the normal UNIX line separator.
When the read routine gets the new-line character, it replaces that character by a blank (causing every line to end with a blank) and sets eoln to true. Eoln will be true as soon as we read the last character of the line and before we read the blank character corresponding to the end of line. Thus it is almost always a mistake to write a program which deals with input in the following way:

    read(ch);
    if eoln then
        Done with line
    else
        Normal processing

as this will almost surely have the effect of ignoring the last character in the line. The `read(ch)' belongs as part of the normal processing.

Given this framework, it is not hard to explain the function of a readln call, which is defined as:

    while not eoln do
        get(input);
    get(input);

This advances the file until the blank corresponding to the end-of-line is the current input symbol and then discards this blank. The next character available from read will therefore be the first character of the next line, if one exists.

+ In Pascal terms, `read(ch)' corresponds to `ch := input^; get(input)'

4.4. Output buffering

A final point about Pascal input-output must be noted here. This concerns the buffering of the file output. It is extremely inefficient for the Pascal system to send each character to the user's terminal as the program generates it for output; even less efficient if the output is the input of another program such as the line printer daemon lpr(1). To gain efficiency, the Pascal system ``buffers'' the output characters (i.e. it saves them in memory until the buffer is full and then emits the entire buffer in one system interaction.) However, to allow interactive prompting to work as in the example given above, this prompt must be printed before the Pascal system waits for a response.
For this reason, Pascal normally prints all the output which has been generated for the file output whenever

1)  A writeln occurs, or

2)  The program reads from the terminal, or

3)  The procedure message or flush is called.

Thus, in the code sequence

    for i := 1 to 5 do begin
        write(i: 2);
        Compute a lot with no output
    end;
    writeln

the output integers will not print until the writeln occurs. The delay can be somewhat disconcerting, and you should be aware that it will occur. By setting the b option to 0 with a comment of the form

    (*$b0*)

placed before the program statement, we can cause output to be completely unbuffered, with a corresponding horrendous degradation in program efficiency. Option control in comments is discussed in section 5.

4.5. Files, reset, and rewrite

It is possible to use extended forms of the built-in procedures reset and rewrite to get more general associations of UNIX file names with Pascal file variables. When a file other than input or output is to be read or written, then the reading or writing must be preceded by a reset or rewrite call. In general, if the Pascal file variable has never been used before, there will be no UNIX filename associated with it. As we saw in section 2.9, by mentioning the file in the program statement, we could cause a UNIX file with the same name as the Pascal variable to be associated with it. If we do not mention a file in the program statement and use it for the first time with the statement

    reset(f)

or

    rewrite(f)

then the Pascal system will generate a temporary name of the form `tmp.x' for some character `x', and associate this UNIX file name with the Pascal file. The first such generated name will be `tmp.1' and the names continue by incrementing their last character through the ASCII set.
The advantage of using such temporary files is that they are automatically removed by the Pascal system as soon as they become inaccessible. They are not removed, however, if a runtime error causes termination while they are in scope.

To cause a particular UNIX pathname to be associated with a Pascal file variable we can give that name in the reset or rewrite call, e.g. we could have associated the Pascal file data with the file `primes' in our example in section 4.1 by doing:

    reset(data, 'primes')

instead of a simple

    reset(data)

In this case it is not essential to mention `data' in the program statement, but it is still a good idea because it serves as an aid to program documentation. The second parameter to reset and rewrite may be any string value, including a variable. Thus the names of UNIX files to be associated with Pascal file variables can be read in at run time. Full details on file name/file variable associations are given in section A.3.

4.6. Argc and argv

Each UNIX process receives a variable length sequence of arguments each of which is a variable length character string. The built-in function argc and the built-in procedure argv can be used to access and process these arguments. The value of the function argc is the number of arguments to the process. By convention, the arguments are treated as an array, and indexed from 0 to argc-1, with the zeroth argument being the name of the program being executed. The rest of the arguments are those passed to the command on the command line. Thus, the command

    % obj /etc/motd /usr/dict/words hello

will invoke the program in the file obj with argc having a value of 4. The zeroth element accessed by argv will be `obj', the first `/etc/motd', etc.

Pascal does not provide variable size arrays, nor does it allow character strings of varying length.
For this reason, argv is a procedure and has the syntax

    argv(i, a)

where i is an integer and a is a string variable. This procedure call assigns the (possibly truncated or blank padded) i'th argument of the current process to the string variable a. The file manipulation routines reset and rewrite will strip trailing blanks from their optional second arguments so that this blank padding is not a problem in the usual case where the arguments are file names.

We are now ready to give a Berkeley Pascal program `kat', based on that given in section 4.1 above, which can be used with the same syntax as the UNIX system program cat(1).

    % cat kat.p
    program kat(input, output);
    var
        ch: char;
        i: integer;
        name: packed array [1..100] of char;
    begin
        i := 1;
        repeat
            if i < argc then begin
                argv(i, name);
                reset(input, name);
                i := i + 1
            end;
            while not eof do begin
                while not eoln do begin
                    read(ch);
                    write(ch)
                end;
                readln;
                writeln
            end
        until i >= argc
    end { kat }.
    %

Note that the reset call to the file input here, which is necessary for a clear program, may be disallowed on other systems. As this program deals mostly with argc and argv and UNIX system dependent considerations, portability is of little concern.

If this program is in the file `kat.p', then we can do

    % pi kat.p
    % mv obj kat
    % kat primes
         2     3     5     7    11    13    17    19    23    29
        31    37    41    43    47    53    59    61    67    71
        73    79    83    89    97   101   103   107   109   113
       127   131   137   139   149   151   157   163   167   173
       179   181   191   193   197   199   211   223   227   229
    930 statements executed in 0.29 seconds cpu time.
    % kat
    This is a line of text.
    This is a line of text.
    The next line contains only an end-of-file (an invisible control-d!)
    The next line contains only an end-of-file (an invisible control-d!)
    287 statements executed in 0.08 seconds cpu time.
    %

Thus we see that, if it is given arguments, `kat' will, like cat, copy each one in turn. If no arguments are given, it copies from the standard input. Thus it will work as it did before, with

    % kat < primes

now equivalent to

    % kat primes

although the mechanisms are quite different in the two cases. Note that if `kat' is given a bad file name, for example:

    % kat xxxxqqq
    Could not open xxxxqqq: No such file or directory
    Error in "kat"+5 near line 11.
    4 statements executed in 0.02 seconds cpu time.
    %

it will give a diagnostic and a post-mortem control flow backtrace for debugging. If we were going to use `kat', we might want to translate it differently, e.g.:

    % pi -pb kat.p
    % mv obj kat

Here we have disabled the post-mortem statistics printing, so as not to get the statistics or the full traceback on error. The b option will cause the system to block buffer the input/output so that the program will run more efficiently on large files. We could have also specified the t option to turn off runtime tests if that was felt to be a speed hindrance to the program. Thus we can try the last examples again:

    % kat xxxxqqq
    Could not open xxxxqqq: No such file or directory
    Error in "kat"
    % kat primes
         2     3     5     7    11    13    17    19    23    29
        31    37    41    43    47    53    59    61    67    71
        73    79    83    89    97   101   103   107   109   113
       127   131   137   139   149   151   157   163   167   173
       179   181   191   193   197   199   211   223   227   229
    %

The interested reader may wish to try writing a program which accepts command line arguments like pi does, using argc and argv to process them.

5. Details on the components of the system

5.1. Options

The programs pi, pc, and pxp take a number of options.+ There is a standard UNIX convention for passing options to programs on the command line, and this convention is followed by the Berkeley Pascal system programs.
As we saw in the examples above, option related arguments consist of the character `-' followed by a single character option name. Except for the b option which takes a single digit value, each option may be set on (enabled) or off (disabled.) When an on/off valued option appears on the command line of pi or pix it inverts the default setting of that option. Thus

    % pi -l foo.p

enables the listing option l, since it defaults off, while

    % pi -t foo.p

disables the run time tests option t, since it defaults on.

+ As pix uses pi to translate Pascal programs, it takes the options of pi also. We refer to them here, however, as pi options.

In addition to inverting the default settings of pi options on the command line, it is also possible to control the pi options within the body of the program by using comments of a special form illustrated by

    {$l-}

Here we see that the opening comment delimiter (which could also be a `(*') is immediately followed by the character `$'. After this `$', which signals the start of the option list, we can place a sequence of letters and option controls, separated by `,' characters*. The most basic actions for options are to set them, thus

    {$l+ Enable listing}

or to clear them

    {$t-,p- No run-time tests, no post mortem analysis}

Notice that `+' always enables an option and `-' always disables it, no matter what the default is. Thus `-' has a different meaning in an option comment than it has on the command line. As shown in the examples, normal comment text may follow the option list.

5.2. Options common to Pi, Pc, and Pix

The following options are common to both the compiler and the interpreter. With each option we give its default setting, the setting it would have if it appeared on the command line, and a sample command using the option.
Most options are on/off valued, with the b option taking a single digit value.

* This format was chosen because it is used by Pascal 6000-3.4. In general the options common to both implementations are controlled in the same way so that comment control in options is mostly portable. It is recommended, however, that only one control be put per comment for maximum portability, as the Pascal 6000-3.4 implementation will ignore controls after the first one which it does not recognize.

Buffering of the file output - b

The b option controls the buffering of the file output. The default is line buffering, with flushing at each reference to the file input and under certain other circumstances detailed in section 4.4 above. Mentioning b on the command line, e.g.

    % pi -b assembler.p

causes standard output to be block buffered, where a block is some system-defined number of characters. The b option may also be controlled in comments. It, unique among the Berkeley Pascal options, takes a single digit value rather than an on or off setting. A value of 0, e.g.

    {$b0}

causes the file output to be unbuffered. Any value 2 or greater causes block buffering and is equivalent to the flag on the command line. The option control comment setting b must precede the program statement.

Include file listing - i

The i option takes the name of an include file, procedure or function name and causes it to be listed while translating+. Typical uses would be

    % pix -i scanner.i compiler.p

to make a listing of the routines in the file scanner.i, and

    % pix -i scanner compiler.p

to make a listing of only the routine scanner. This option is especially useful for conservation-minded programmers making partial program listings.

Make a listing - l

The l option enables a listing of the program.
The l option defaults off. When specified on the command line, it causes a header line identifying the version of the translator in use and a line giving the modification time of the file being translated to appear before the actual program listing. The l option is pushed and popped by the i option at appropriate points in the program.

+ Include files are discussed in section 5.9.

Standard Pascal only - s

The s option causes many of the features of the UNIX implementation which are not found in standard Pascal to be diagnosed as `s' warning errors. This option defaults off and is enabled when mentioned on the command line. Some of the features which are diagnosed are: non-standard procedures and functions, extensions to the procedure write, and the padding of constant strings with blanks. In addition, all letters are mapped to lower case except in strings and characters so that the case of keywords and identifiers is effectively ignored. The s option is most useful when a program is to be transported, thus

    % pi -s isitstd.p

will produce warnings unless the program meets the standard.

Runtime tests - t and C

These options control the generation of tests that subrange variable values are within bounds at run time. pi defaults to generating tests and uses the option t to disable them. pc defaults to not generating tests, and uses the option C to enable them. Disabling runtime tests also causes assert statements to be treated as comments.*

Suppress warning diagnostics - w

The w option, which defaults on, allows the translator to print a number of warnings about inconsistencies it finds in the input program. Turning this option off with a comment of the form

    {$w-}

or on the command line

    % pi -w tryme.p

suppresses these usually useful diagnostics.
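
As a small illustration of the option comments described in sections 5.1 and 5.2, the following sketch places a b control before the program statement and an l control after it. The program name squares and the comment wording are hypothetical, and the listing is meant only to show where such comments may sit, not to recommend these settings.

```pascal
{$b0 Unbuffered output; a b control must precede the program statement}
program squares(output);
{$l+ Enable listing from this point on}
var
    i: integer;
begin
    { each value is written separately; with b0 none of them is held back }
    for i := 1 to 5 do
        write(i * i: 4);
    writeln
end { squares }.
```

Because `+' and `-' in a comment always mean enable and disable, the l+ here turns listing on regardless of whether -l was also given on the command line.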
* See section A.1 for a description of assert statements.

Generate counters for a pxp execution profile - z

The z option, which defaults off, enables the production of execution profiles. Specifying z on the command line, i.e.

    % pi -z foo.p

or enabling it in a comment before the program statement causes pi and pc to insert operations in the interpreter code to count the number of times each statement was executed. An example of using pxp was given in section 2.6; its options are described in section 5.6. Note that the z option cannot be used on separately compiled programs.

5.3. Options available in Pi

Post-mortem dump - p

The p option defaults on, and causes the runtime system to initiate a post-mortem backtrace when an error occurs. It also causes px to count statements in the executing program, enforcing a statement limit to prevent infinite loops. Specifying p on the command line disables these checks and the ability to give this post-mortem analysis. It does make smaller and faster programs, however. It is also possible to control the p option in comments. To prevent the post-mortem backtrace on error, p must be off at the end of the program statement. Thus, the Pascal cross-reference program was translated with

    % pi -pbt pxref.p

5.4. Options available in Px

The first argument to px is the name of the file containing the program to be interpreted. If no arguments are given, then the file obj is executed. If more arguments are given, they are available to the Pascal program by using the built-ins argc and argv as described in section 4.6.

Px may also be invoked automatically.
In this case, whenever a Pascal object file name is given as a command, the command will be executed with px prepended to it; that is

    % obj primes

will be converted to read

    % px obj primes

5.5. Options available in Pc

Generate assembly language - S

The program is compiled and the assembly language output is left in a file whose name has the suffix `.s'. Thus

    % pc -S foo.p

creates a file foo.s. No executable file is created.

Symbolic Debugger Information - g

The g option causes the compiler to generate information needed by sdb(1), the symbolic debugger. For a complete description of sdb see Volume 2c of the UNIX Reference Manual.

Redirect the output file - o

The name argument after the -o is used as the name of the output file instead of a.out. Its typical use is to name the compiled program using the root of the file name. Thus:

    % pc -o myprog myprog.p

causes the compiled program to be called myprog.

Generate counters for a prof execution profile - p

The compiler produces code which counts the number of times each routine is called. The profiling is based on a periodic sample taken by the system rather than by inline counters used by pxp. This results in less degradation in execution, at somewhat of a loss in accuracy. See prof(1) for a more complete description.

Run the object code optimizer - O

The output of the compiler is run through the object code optimizer. This increases compile time in exchange for a decrease in compiled code size and execution time.

5.6. Options available in Pxp

Pxp takes, on its command line, a list of options followed by the program file name, which must end in `.p' as it must for pi, pc, and pix.
Pxp will produce an execution profile if any of the z, t or c options is specified on the command line. If none of these options is specified, then pxp functions as a program reformatter.

It is important to note that only the z and w options of pxp, which are common to pi, pc, and pxp, can be controlled in comments. All other options must be specified on the command line to have any effect.

The following options are relevant to profiling with pxp:

Include the bodies of all routines in the profile - a

Pxp normally suppresses printing the bodies of routines which were never executed, to make the profile more compact. This option forces all routine bodies to be printed.

Suppress declaration parts from a profile - d

Normally a profile includes declaration parts. Specifying d on the command line suppresses declaration parts.

Eliminate include directives - e

Normally, pxp preserves include directives in the output when reformatting a program, as though they were comments. Specifying -e causes the contents of the specified files to be reformatted into the output stream instead. This is an easy way to eliminate include directives, e.g. before transporting a program.

Fully parenthesize expressions - f

Normally pxp prints expressions with the minimal parenthesization necessary to preserve the structure of the input. This option causes pxp to fully parenthesize expressions. Thus the statement which prints as

    d := a + b mod c / e

with minimal parenthesization, the default, will print as

    d := a + ((b mod c) / e)

with the f option specified on the command line.

Left justify all procedures and functions - j

Normally, each procedure and function body is indented to reflect its static nesting depth.
This option prevents this nesting and can be used if the indented output would be too wide.

Print a table summarizing procedure and function calls - t

The t option causes pxp to print a table summarizing the number of calls to each procedure and function in the program. It may be specified in combination with the z option, or separately.

Enable and control the profile - z

The z profile option is very similar to the i listing control option of pi. If z is specified on the command line, then all arguments up to the source file argument which ends in `.p' are taken to be the names of procedures and functions or include files which are to be profiled. If this list is null, then the whole file is to be profiled. A typical command for extracting a profile of part of a large program would be

    % pxp -z test parser.i compiler.p

This specifies that profiles of the routines in the file parser.i and the routine test are to be made.

5.7. Formatting programs using pxp

The program pxp can be used to reformat programs, by using a command of the form

    % pxp dirty.p > clean.p

Note that since the shell creates the output file `clean.p' before pxp executes, `clean.p' and `dirty.p' must not be the same file.

Pxp automatically paragraphs the program, performing housekeeping chores such as comment alignment, and treating blank lines, lines containing exactly one blank and lines containing only a form-feed character as though they were comments, preserving their vertical spacing effect in the output. Pxp distinguishes between four kinds of comments:

1)  Left marginal comments, which begin in the first column of the input line and are placed in the first column of an output line.

2)  Aligned comments, which are preceded by no input tokens on the input line.
    These are aligned in the output with the running program text.

3)  Trailing comments, which are preceded in the input line by a token with no more than two spaces separating the token from the comment.

4)  Right marginal comments, which are preceded in the input line by a token from which they are separated by at least three spaces or a tab. These are aligned down the right margin of the output, currently to the first tab stop after the 40th column from the current ``left margin''.

Consider the following program.

    % cat comments.p
    { This is a left marginal comment. }
    program hello(output);
    var i : integer; {This is a trailing comment}
    j : integer;    {This is a right marginal comment}
    k : array [ 1..10] of array [1..10] of integer;    {Marginal, but past the margin}
     {
       An aligned, multi-line comment
       which explains what this program is
       all about
     }
    begin
    i := 1; {Trailing i comment}
    {A left marginal comment}
     {An aligned comment}
    j := 1;         {Right marginal comment}
    k[1] := 1;
    writeln(i, j, k[1])
    end.

When formatted by pxp the following output is produced.

    % pxp comments.p
    { This is a left marginal comment. }
    program hello(output);
    var
        i: integer; {This is a trailing comment}
        j: integer;                             {This is a right marginal comment}
        k: array [1..10] of array [1..10] of integer;{Marginal, but past the margin}
        {
          An aligned, multi-line comment
          which explains what this program is
          all about
        }
    begin
        i := 1; {Trailing i comment}
    {A left marginal comment}
        {An aligned comment}
        j := 1;                                 {Right marginal comment}
        k[1] := 1;
        writeln(i, j, k[1])
    end.
    %

The following formatting related options are currently available in pxp. The options f and j described in the previous section may also be of interest.

Strip comments - s

The s option causes pxp to remove all comments from the input text.
Underline keywords - _

A command line argument of the form -_ as in

    % pxp -_ dirty.p

can be used to cause pxp to underline all keywords in the output for enhanced readability.

Specify indenting unit - [23456789]

The normal unit which pxp uses to indent a structured statement level is 4 spaces. By giving an argument of the form -d with d a digit, 2 <= d <= 9, you can specify that d spaces are to be used per level instead.

5.8. Pxref

The cross-reference program pxref may be used to make cross-referenced listings of Pascal programs. To produce a cross-reference of the program in the file `foo.p' one can execute the command:

    % pxref foo.p

The cross-reference is, unfortunately, not block structured. Full details on pxref are given in its manual section pxref(1).

5.9. Multi-file programs

A text inclusion facility is available with Berkeley Pascal. This facility allows the interpolation of source text from other files into the source stream of the translator. It can be used to divide large programs into more manageable pieces for ease in editing, listing, and maintenance.

The include facility is based on that of the UNIX C compiler. To trigger it you can place the character `#' in the first portion of a line and then, after an arbitrary number of blanks or tabs, the word `include' followed by a filename enclosed in single `'' or double `"' quotation marks. The file name may be followed by a semicolon `;' if you wish to treat this as a pseudo-Pascal statement. The filenames of included files must end in `.i'. An example of the use of included files in a main program would be:

    program compiler(input, output, obj);

    #include "globals.i"
    #include "scanner.i"
    #include "parser.i"
    #include "semantics.i"

    begin
        { main program }
    end.
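
To make the division of labour concrete, here is a minimal sketch of a two-file program. The file names hello.p and greet.i and the procedure greet are hypothetical, chosen only for illustration; the two files are shown in one listing, separated by comments.

```pascal
{ ---- file hello.p (hypothetical) ---- }
program hello(output);

#include "greet.i"

begin
    greet
end.

{ ---- file greet.i (hypothetical) ---- }
{ Holds a complete procedure declaration, so it can be included }
{ in the declaration part of the main program.                  }
procedure greet;
begin
    writeln('hello from an included file')
end;
```

With both files in the same directory, translating hello.p alone (for example with `pix hello.p') should be enough, since the translator pulls in greet.i when it meets the include line.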
At the point the include pseudo-statement is encountered in the input, the lines from the included file are interpolated into the input stream. For the purposes of translation and runtime diagnostics and statement numbers in the listings and post-mortem backtraces, the lines in the included file are numbered from 1. Nested includes are possible up to 10 deep.

See the descriptions of the i option of pi in section 5.2 above; this can be used to control listing when include files are present.

When a non-trivial line is encountered in the source text after an include finishes, the `popped' filename is printed, in the same manner as above.

For the purposes of error diagnostics when not making a listing, the filename will be printed before each diagnostic if the current filename has changed since the last filename was printed.

5.10. Separate Compilation with Pc

A separate compilation facility is provided with the Berkeley Pascal compiler, pc. This facility allows programs to be divided into a number of files and the pieces to be compiled individually, to be linked together at some later time. This is especially useful for large programs, where small changes would otherwise require time-consuming recompilation of the entire program.

Normally, pc expects to be given entire Pascal programs. However, if given the -c option on the command line, it will accept a sequence of definitions and declarations, and compile them into a .o file, to be linked with a Pascal program at a later time. In order that procedures and functions be available across separately compiled files, they must be declared with the directive external.
This directive is similar to the directive forward in that it must precede the resolution of the function or procedure, and formal parameters and function result types must be specified at the external declaration and may not be specified at the resolution.

Type checking is performed across separately compiled files. Since Pascal type definitions define unique types, any types which are shared between separately compiled files must be the same definition. This seemingly impossible problem is solved using a facility similar to the include facility discussed above. Definitions may be placed in files with the extension .h and the files included by separately compiled files. Each definition from a .h file defines a unique type, and all uses of a definition from the same .h file define the same type. Similarly, the facility is extended to allow the definition of consts and the declaration of labels, vars, and external functions and procedures. Thus procedures and functions which are used between separately compiled files must be declared external, and must be so declared in a .h file included by any file which calls or resolves the function or procedure. Conversely, functions and procedures declared external may only be so declared in .h files. These files may be included only at the outermost level, and thus define or declare global objects. Note that since only external function and procedure declarations (and not resolutions) are allowed in .h files, statically nested functions and procedures cannot be declared external.

An example of the use of included .h files in a program would be:

    program compiler(input, output, obj);

    #include "globals.h"
    #include "scanner.h"
    #include "parser.h"
    #include "semantics.h"

    begin
        { main program }
    end.

This might include in the main program the definitions and declarations of all the global labels, consts, types, and vars from the file globals.h, and the external function and procedure declarations for each of the separately compiled files for the scanner, parser and semantics.

The header file scanner.h would contain declarations of the form:

    type
        token = record
            { token fields }
        end;

    function scan(var inputfile: text): token;
        external;

Then the scanner might be in a separately compiled file containing:

    #include "globals.h"
    #include "scanner.h"

    function scan;
    begin
        { scanner code }
    end;

which includes the same global definitions and declarations and resolves the scanner functions and procedures declared external in the file scanner.h.

A. Appendix to Wirth's Pascal Report

This section is an appendix to the definition of the Pascal language in Niklaus Wirth's Pascal Report and, with that Report, precisely defines the Berkeley implementation. This appendix includes a summary of extensions to the language, gives the ways in which the undefined specifications were resolved, gives limitations and restrictions of the current implementation, and lists the added functions and procedures available. It concludes with a list of differences from the commonly available Pascal 6000-3.4 implementation, and some comments on standard and portable Pascal.

A.1. Extensions to the language Pascal

This section defines non-standard language constructs available in Berkeley Pascal.
The s standard Pascal option of the translators pi and pc can be used to detect these extensions in programs which are to be transported.

String padding

Berkeley Pascal will pad constant strings with blanks in expressions and as value parameters to make them as long as is required. The following is a legal Berkeley Pascal program:

    program x(output);
    var z : packed array [ 1 .. 13 ] of char;
    begin
        z := 'red';
        writeln(z)
    end.

The padded blanks are added on the right. Thus the assignment above is equivalent to:

    z := 'red          '

which is standard Pascal.

Octal constants, octal and hexadecimal write

Octal constants may be given as a sequence of octal digits followed by the character `b' or `B'. The forms

    write(a:n oct)

and

    write(a:n hex)

cause the internal representation of expression a, which must be Boolean, character, integer, pointer, or a user-defined enumerated type, to be written in octal or hexadecimal respectively.

Assert statement

An assert statement causes a Boolean expression to be evaluated each time the statement is executed. A runtime error results if the expression evaluates to false. The assert statement is treated as a comment if run-time tests are disabled. The syntax for assert is:

    assert <expr>

Enumerated type input-output

Enumerated types may be read and written. On output the string name associated with the enumerated value is output. If the value is out of range, a runtime error occurs. On input an identifier is read and looked up in a table of names associated with the type of the variable, and the appropriate internal value is assigned to the variable being read. If the name is not found in the table a runtime error occurs.
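
The assert statement and enumerated type input-output described above can be combined in a short program. What follows is only a sketch: the names paint and color are invented for illustration, and the assert expression is parenthesized so that it reads as an ordinary Boolean expression.

```pascal
program paint(input, output);
type
    color = (red, green, blue);     { hypothetical enumerated type }
var
    c : color;
begin
    { An identifier such as `green' is read and looked up in the
      table of names for color; an unknown name is a runtime error. }
    read(c);
    { Guard the call of succ below; a runtime error results if the
      expression is false and run-time tests are enabled. }
    assert (c <> blue);
    { On output the name of the enumerated value is written. }
    writeln('next color: ', succ(c))
end.
```

Because the enumerated value is written by name, the same text can be read back by another program that declares the same type.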

Structure returning functions

An extension has been added which allows functions to return arbitrary sized structures rather than just scalars as in the standard.

Separate compilation

The compiler pc has been extended to allow separate compilation of programs. Procedures and functions declared at the global level may be compiled separately. Type checking of calls to separately compiled routines is performed at load time to ensure that the program as a whole is consistent. See section 5.10 for details.

A.2. Resolution of the undefined specifications

File name - file variable associations

Each Pascal file variable is associated with a named UNIX file. Except for input and output, which are exceptions to some of the rules, a name can become associated with a file in any of three ways:

1)   If a global Pascal file variable appears in the program statement then it is associated with a UNIX file of the same name.

2)   If a file was reset or rewritten using the extended two-argument form of reset or rewrite then the given name is associated.

3)   If a file which has never had a UNIX name associated is reset or rewritten without specifying a name via the second argument, then a temporary name of the form `tmp.x' is associated with the file. Temporary names start with `tmp.1' and continue by incrementing the last character in the USASCII ordering. Temporary files are removed automatically when their scope is exited.

The program statement

The syntax of the program statement is:

    program <id> ( <file id> { , <file id> } ) ;

The file identifiers (other than input and output) must be declared as variables of file type in the global declaration part.
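
The three ways in which a name becomes associated with a file can be seen side by side in a small program. This is a minimal sketch under stated assumptions: the program name assoc, the file variables data, log and scratch, and the file name `assoc.log' are all hypothetical, and a readable UNIX file named `data' is assumed to exist in the current directory.

```pascal
program assoc(output, data);
var
    data    : text;     { way 1: named in the program statement }
    log     : text;     { way 2: named by the second argument of rewrite }
    scratch : text;     { way 3: receives a temporary name `tmp.x' }
    ch      : char;
begin
    reset(data);                    { opens the UNIX file `data' }
    rewrite(log, 'assoc.log');      { extended two-argument form }
    rewrite(scratch);               { no name given, so a temporary one }
    { Copy data to the temporary file line by line. }
    while not eof(data) do begin
        while not eoln(data) do begin
            read(data, ch);
            write(scratch, ch)
        end;
        readln(data);
        writeln(scratch)
    end;
    writeln(log, 'copied data to a temporary file');
    writeln('done')
end.
```

The temporary file behind scratch is removed when the program exits, while `assoc.log' remains.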

The files input and output

The formal parameters input and output are associated with the UNIX standard input and output and have a somewhat special status. The following rules must be noted:

1)   The program heading must contain the formal parameter output. If input is used, explicitly or implicitly, then it must also be declared here.

2)   Unlike all other files, the Pascal files input and output must not be defined in a declaration, as their declaration is automatically:

         var input, output: text

3)   The procedure reset may be used on input. If no UNIX file name has ever been associated with input, and no file name is given, then an attempt will be made to `rewind' input. If this fails, a run time error will occur. Rewrite calls to output act as for any other file, except that output initially has no associated file. This means that a simple

         rewrite(output)

     associates a temporary name with output.

Details for files

If a file other than input is to be read, then reading must be initiated by a call to the procedure reset which causes the Pascal system to attempt to open the associated UNIX file for reading. If this fails, then a runtime error occurs. Writing of a file other than output must be initiated by a rewrite call, which causes the Pascal system to create the associated UNIX file and to then open the file for writing only.

Buffering

The buffering for output is determined by the value of the b option at the end of the program statement. If it has its default value 1, then output is buffered in blocks of up to 512 characters, flushed whenever a writeln occurs and at each reference to the file input. If it has the value 0, output is unbuffered. Any value of 2 or more gives block buffering without line or input reference flushing.
All other output files are always buffered in blocks of 512 characters. All output buffers are flushed when the files are closed at scope exit, whenever the procedure message is called, and can be flushed using the built-in procedure flush.

An important point for an interactive implementation is the definition of `input↑'. If input is a teletype, and the Pascal system reads a character at the beginning of execution to define `input↑', then no prompt could be printed by the program before the user is required to type some input. For this reason, `input↑' is not defined by the system until its definition is needed, reading from a file occurring only when necessary.

The character set

Seven bit USASCII is the character set used on UNIX. The standard Pascal symbols `and', `or', `not', `<=', `>=', `<>', and the up arrow `↑' (for pointer qualification) are recognized.+ Less portable are the synonyms tilde `~' for not, `&' for and, and `|' for or.

Upper and lower case are considered to be distinct. Keywords and built-in procedure and function names are composed of all lower case letters. Thus the identifiers GOTO and GOto are distinct both from each other and from the keyword goto. The standard type `boolean' is also available as `Boolean'.

Character strings and constants may be delimited by the character `'' or by the character `#'; the latter is sometimes convenient when programs are to be transported. Note that the `#' character has special meaning when it is the first character on a line - see Multi-file programs in section 5.9.

__________________________
+On many terminals and printers, the up arrow is represented as a circumflex `^'. These are not distinct characters, but rather different graphic representations of the same internal codes. The proposed standard for Pascal considers them to be the same.
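
The operator synonyms mentioned in the character set discussion can be compared directly with the standard spellings. The fragment below is only an illustrative sketch; the program name synonyms and the variables a and b are invented here.

```pascal
program synonyms(output);
var
    a, b : Boolean;     { `Boolean' is accepted as well as `boolean' }
begin
    a := true;
    b := false;
    { Berkeley Pascal synonyms: ~ for not, & for and, | for or. }
    if (a & ~b) | b then
        writeln('synonym spelling accepted');
    { The same test in the portable, standard spelling. }
    if (a and not b) or b then
        writeln('standard spelling accepted')
end.
```

Programs which are to be transported should use the second form, since the synonyms are an extension.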

The standard types

The standard type integer is conceptually defined as

    type integer = minint .. maxint;

Integer is implemented with 32 bit twos complement arithmetic. Predefined constants of type integer are:

    const maxint = 2147483647; minint = -2147483648;

The standard type char is conceptually defined as

    type char = minchar .. maxchar;

Built-in character constants are `minchar' and `maxchar', `bell' and `tab'; ord(minchar) = 0, ord(maxchar) = 127.

The type real is implemented using 64 bit floating point arithmetic. The floating point arithmetic is done in `rounded' mode, and provides approximately 17 digits of precision with numbers as small as 10 to the negative 38th power and as large as 10 to the 38th power.

Comments

Comments can be delimited by either `{' and `}' or by `(*' and `*)'. If the character `{' appears in a comment delimited by `{' and `}', a warning diagnostic is printed. A similar warning will be printed if the sequence `(*' appears in a comment delimited by `(*' and `*)'. The restriction implied by this warning is not part of standard Pascal, but detects many otherwise subtle errors.

Option control

Options of the translators may be controlled in two distinct ways. A number of options may appear on the command line invoking the translator. These options are given as one or more strings of letters preceded by the character `-' and cause the default setting of each given option to be changed. This method of communication of options is expected to predominate for UNIX. Thus the command

    % pi -l -s foo.p

translates the file foo.p with the listing option enabled (it is normally off), and with only standard Pascal features available.

If more control over the portions of the program where options are enabled is required, then option control in comments can and should be used.
The format for option control in comments is identical to that used in Pascal 6000-3.4. One places the character `$' as the first character of the comment and follows it by a comma separated list of directives. Thus an equivalent to the command line example given above would be:

    {$l+,s+ listing on, standard Pascal}

as the first line of the program. The `l' option is more appropriately specified on the command line, since it is extremely unlikely in an interactive environment that one wants a listing of the program each time it is translated.

Directives consist of a letter designating the option, followed either by a `+' to turn the option on, or by a `-' to turn the option off. The b option takes a single digit instead of a `+' or `-'.

Notes on the listings

The first page of a listing includes a banner line indicating the version and date of generation of pi or pc. It also includes the UNIX path name supplied for the source file and the date of last modification of that file.

Within the body of the listing, lines are numbered consecutively and correspond to the line numbers for the editor. Currently, two special kinds of lines may be used to format the listing: a line consisting of a form-feed character, control-l, which causes a page eject in the listing, and a line with no characters which causes the line number to be suppressed in the listing, creating a truly blank line. These lines thus correspond to `eject' and `space' macros found in many assemblers. Non-printing characters are printed as the character `?' in the listing.+

__________________________
+The character generated by a control-i indents to the next `tab stop'. Tab stops are set every 8 columns in UNIX. Tabs thus provide a quick way of indenting in the program.

The standard procedure write

If no minimum field length parameter is specified for a write, the following default values are assumed:

    integer    10
    real       22
    Boolean    length of `true' or `false'
    char       1
    string     length of the string
    oct        11
    hex        8

The end of each line in a text file should be explicitly indicated by `writeln(f)', where `writeln(output)' may be written simply as `writeln'. For UNIX, the built-in procedure `page(f)' puts a single ASCII form-feed character on the output file. For programs which are to be transported the filter pcc can be used to interpret carriage control, as UNIX does not normally do so.

A.3. Restrictions and limitations

Files

Files cannot be members of files or members of dynamically allocated structures.

Arrays, sets and strings

The calculations involving array subscripts and set elements are done with 16 bit arithmetic. This restricts the types over which arrays and sets may be defined. The lower bound of such a range must be greater than or equal to -32768, and the upper bound less than 32768. In particular, strings may have any length from 1 to 65535 characters, and sets may contain no more than 65535 elements.

Line and symbol length

There is no intrinsic limit on the length of identifiers. Identifiers are considered to be distinct if they differ in any single position over their entire length. There is a limit, however, on the maximum input line length. This limit is quite generous, currently exceeding 160 characters.

Procedure and function nesting and program size

At most 20 levels of procedure and function nesting are allowed. There is no fundamental, translator defined limit on the size of the program which can be translated. The ultimate limit is supplied by the hardware and thus, on the PDP-11, by the 16 bit address space.
If one runs up against the `ran out of memory' diagnostic the program may yet translate if smaller procedures are used, as a lot of space is freed by the translator at the completion of each procedure or function in the current implementation.

On the VAX-11, there is an implementation defined limit of 65536 bytes per variable. There is no limit on the number of variables.

Overflow

There is currently no checking for overflow on arithmetic operations at run-time on the PDP-11. Overflow checking is performed on the VAX-11 by the hardware.

A.4. Added types, operators, procedures and functions

Additional predefined types

The type alfa is predefined as:

    type alfa = packed array [ 1..10 ] of char

The type intset is predefined as:

    type intset = set of 0..127

In most cases the context of an expression involving a constant set allows the translator to determine the type of the set, even though the constant set itself may not uniquely determine this type. In the cases where it is not possible to determine the type of the set from local context, the expression type defaults to a set over the entire base type unless the base type is integer+. In the latter case the type defaults to the current binding of intset, which must be ``type set of (a subrange of) integer'' at that point.

Note that if intset is redefined via:

    type intset = set of 0..58;

then the default integer set is the implicit intset of Pascal 6000-3.4.

__________________________
+The current translator makes a special case of the construct `if ... in [ ... ]' and enforces only the more lax restriction on 16 bit arithmetic given above in this case.

Additional predefined operators

The relationals `<' and `>' of proper set inclusion are available.
With a and b sets, note that

    (not (a < b)) <> (a >= b)

As an example consider the sets a = [0,2] and b = [1]. The only relation true between these sets is `<>'.

Non-standard procedures

argv(i,a)
     where i is an integer and a is a string variable assigns the (possibly truncated or blank padded) i'th argument of the invocation of the current UNIX process to the variable a. The range of valid i is 0 to argc-1.

date(a)
     assigns the current date to the alfa variable a in the format `dd mmm yy ', where `mmm' is the first three characters of the month, e.g. `Apr'.

flush(f)
     writes the output buffered for Pascal file f into the associated UNIX file.

halt
     terminates the execution of the program with a control flow backtrace.

linelimit(f,x)*
     with f a textfile and x an integer expression causes the program to be abnormally terminated if more than x lines are written on file f. If x is less than 0 then no limit is imposed.

message(x,...)
     causes the parameters, which have the format of those to the built-in procedure write, to be written unbuffered on the diagnostic unit 2, almost always the user's terminal.

null
     a procedure of no arguments which does absolutely nothing. It is useful as a place holder, and is generated by pxp in place of the invisible empty statement.

remove(a)
     where a is a string causes the UNIX file whose name is a, with trailing blanks eliminated, to be removed.

reset(f,a)
     where a is a string causes the file whose name is a (with blanks trimmed) to be associated with f in addition to the normal function of reset.

rewrite(f,a)
     is analogous to `reset' above.

stlimit(i)
     where i is an integer sets the statement limit to be i statements. Specifying the p option to pc disables statement limit counting.

time(a)
     causes the current time in the form ` hh:mm:ss ' to be assigned to the alfa variable a.

__________________________
*Currently ignored by pdp-11 px.
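
Several of these procedures are typically used together at program start-up. The following is a minimal sketch rather than a definitive usage: the program name args and the 20 character argument buffer are assumptions made for illustration, and argc is the non-standard function that returns the argument count.

```pascal
program args(output);
var
    i     : integer;
    arg   : packed array [1..20] of char;   { arguments are truncated
                                              or blank padded to fit }
    today : alfa;
    now   : alfa;
begin
    date(today);                { format `dd mmm yy ' }
    time(now);                  { format ` hh:mm:ss ' }
    writeln('run on ', today, ' at ', now);
    { Valid argument indices run from 0 to argc-1. }
    for i := 0 to argc - 1 do begin
        argv(i, arg);
        writeln(i:2, ': ', arg)
    end;
    flush(output);              { force buffered output into the file }
    { message writes unbuffered on the diagnostic unit. }
    message('listed ', argc:1, ' arguments')
end.
```

Since message writes on the diagnostic unit, its line still reaches the terminal when the standard output is redirected to a file.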

Non-standard functions

argc
     returns the count of arguments when the Pascal program was invoked. Argc is always at least 1.

card(x)
     returns the cardinality of the set x, i.e. the number of elements contained in the set.

clock
     returns an integer which is the number of central processor milliseconds of user time used by this process.

expo(x)
     yields the integer valued exponent of the floating-point representation of x; expo(x) = entier(log2(abs(x))).

random(x)
     where x is a real parameter, evaluated but otherwise ignored, invokes a linear congruential random number generator. Successive seeds are generated as (seed*a + c) mod m and the new random number is a normalization of the seed to the range 0.0 to 1.0; a is 62605, c is 113218009, and m is 536870912. The initial seed is 7774755.

seed(i)
     where i is an integer sets the random number generator seed to i and returns the previous seed. Thus seed(seed(i)) has no effect except to yield value i.

sysclock
     an integer function of no arguments returns the number of central processor milliseconds of system time used by this process.

undefined(x)
     a Boolean function. Its argument is a real number and it always returns false.

wallclock
     an integer function of no arguments returns the time in seconds since 00:00:00 GMT January 1, 1970.

A.5. Remarks on standard and portable Pascal

It is occasionally desirable to prepare Pascal programs which will be acceptable at other Pascal installations. While certain system dependencies are bound to creep in, judicious design and programming practice can usually eliminate most of the non-portable usages. Wirth's Pascal Report concludes with a standard for implementation and program exchange.

In particular, the following differences may cause trouble when attempting to transport programs between this implementation and Pascal 6000-3.4.
Using the s translator option may serve to indicate many problem areas.+

__________________________
+The s option does not, however, check that identifiers differ in the first 8 characters. Pi and pc also do not check the semantics of packed.

Features not available in Berkeley Pascal

     Segmented files and associated functions and procedures.

     The function trunc with two arguments.

     Arrays whose indices exceed the capacity of 16 bit arithmetic.

Features available in Berkeley Pascal but not in Pascal 6000-3.4

     The procedures reset and rewrite with file names.

     The functions argc, seed, sysclock, and wallclock.

     The procedures argv, flush, and remove.

     Message with arguments other than character strings.

     Write with keyword hex.

     The assert statement.

     Reading and writing of enumerated types.

     Allowing functions to return structures.

     Separate compilation of programs.

     Comparison of records.

Other problem areas

Sets and strings are more general in Berkeley Pascal; see the restrictions given in the Jensen-Wirth User Manual for details on the 6000-3.4 restrictions.

The character set differences may cause problems, especially the use of the function chr, characters as arguments to ord, and comparisons of characters, since the character set ordering differs between the two machines.

The Pascal 6000-3.4 compiler uses a less strict notion of type equivalence. In Berkeley Pascal, types are considered identical only if they are represented by the same type identifier. Thus, in particular, unnamed types are unique to the variables/fields declared with them.

Pascal 6000-3.4 doesn't recognize our option flags, so it is wise to put the control of Berkeley Pascal options to the end of option lists or, better yet, restrict the option list length to one.

For Pascal 6000-3.4 the ordering of files in the program statement has significance. It is desirable to place input and output as the first two files in the program statement.

Acknowledgments

The financial support of William Joy and Susan Graham by the National Science Foundation under grants MCS74-07644-A04, MCS78-07291, and MCS80-05144, and of William Joy by an IBM Graduate Fellowship, is gratefully acknowledged.