RATFOR - A Preprocessor for a Rational Fortran

Brian W. Kernighan
Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

Although Fortran is not a pleasant language to use, it does have the
advantages of universality and (usually) relative efficiency. The
Ratfor language attempts to conceal the main deficiencies of Fortran
while retaining its desirable qualities, by providing decent control
flow statements:

   o  statement grouping
   o  if-else and switch for decision-making
   o  while, for, do, and repeat-until for looping
   o  break and next for controlling loop exits

and some ``syntactic sugar'':

   o  free form input (multiple statements/line, automatic
      continuation)
   o  unobtrusive comment convention
   o  translation of >, >=, etc., into .GT., .GE., etc.
   o  return(expression) statement for functions
   o  define statement for symbolic parameters
   o  include statement for including source files

Ratfor is implemented as a preprocessor which translates this
language into Fortran.

Once the control flow and cosmetic deficiencies of Fortran are
hidden, the resulting language is remarkably pleasant to use. Ratfor
programs are markedly easier to write, and to read, and thus easier
to debug, maintain and modify than their Fortran equivalents.

It is readily possible to write Ratfor programs which are portable to
other environments. Ratfor is written in itself in this way, so it is
also portable; versions of Ratfor are now running on at least two
dozen different types of computers at over five hundred locations.

This paper discusses design criteria for a Fortran preprocessor, the
Ratfor language and its implementation, and user experience.

--------------------------
This paper is a revised and expanded version of one published in
Software-Practice and Experience, October 1975. The Ratfor described
here is the one in use on UNIX and GCOS at Bell Laboratories, Murray
Hill, N. J.

1. INTRODUCTION

Most programmers will agree that Fortran is an unpleasant language to
program in, yet there are many occasions when they are forced to use
it. For example, Fortran is often the only language thoroughly
supported on the local computer. Indeed, it is the closest thing to a
universal programming language currently available: with care it is
possible to write large, truly portable Fortran programs[1]. Finally,
Fortran is often the most ``efficient'' language available,
particularly for programs requiring much computation.

But Fortran is unpleasant. Perhaps the worst deficiency is in the
control flow statements - conditional branches and loops - which
express the logic of the program. The conditional statements in
Fortran are primitive. The Arithmetic IF forces the user into at
least two statement numbers and two (implied) GOTO's; it leads to
unintelligible code, and is eschewed by good programmers. The Logical
IF is better, in that the test part can be stated clearly, but
hopelessly restrictive because the statement that follows the IF can
only be one Fortran statement (with some further restrictions!). And
of course there can be no ELSE part to a Fortran IF: there is no way
to specify an alternative action if the IF is not satisfied.

The Fortran DO restricts the user to going forward in an arithmetic
progression. It is fine for ``1 to N in steps of 1 (or 2 or ...)'',
but there is no direct way to go backwards, or even (in ANSI
Fortran[2]) to go from 1 to N-1. And of course the DO is useless if
one's problem doesn't map into an arithmetic progression.

The result of these failings is that Fortran programs must be written
with numerous labels and branches. The resulting code is particularly
difficult to read and understand, and thus hard to debug and modify.

When one is faced with an unpleasant language, a useful technique is
to define a new language that overcomes the deficiencies, and to
translate it into the unpleasant one with a preprocessor. This is the
approach taken with Ratfor. (The preprocessor idea is of course not
new, and preprocessors for Fortran are especially popular today. A
recent listing [3] of preprocessors shows more than 50, of which at
least half a dozen are widely available.)

2. LANGUAGE DESCRIPTION

Design

Ratfor attempts to retain the merits of Fortran (universality,
portability, efficiency) while hiding the worst Fortran inadequacies.
The language is Fortran except for two aspects. First, since control
flow is central to any program, regardless of the specific
application, the primary task of Ratfor is to conceal this part of
Fortran from the user, by providing decent control flow structures.
These structures are sufficient and comfortable for structured
programming in the narrow sense of programming without GOTO's.
Second, since the preprocessor must examine an entire program to
translate the control structure, it is possible at the same time to
clean up many of the ``cosmetic'' deficiencies of Fortran, and thus
provide a language which is easier and more pleasant to read and
write.

Beyond these two aspects - control flow and cosmetics - Ratfor does
nothing about the host of other weaknesses of Fortran. Although it
would be straightforward to extend it to provide character strings,
for example, they are not needed by everyone, and of course the
preprocessor would be harder to implement. Throughout, the design
principle which has determined what should be in Ratfor and what
should not has been: Ratfor doesn't know any Fortran. Any language
feature which would require that Ratfor really understand Fortran has
been omitted. We will return to this point in the section on
implementation.

Even within the confines of control flow and cosmetics, we have
attempted to be selective in what features to provide. The intent has
been to provide a small set of the most useful constructs, rather
than to throw in everything that has ever been thought useful by
someone.

The rest of this section contains an informal description of the
Ratfor language. The control flow aspects will be quite familiar to
readers used to languages like Algol, PL/I, Pascal, etc., and the
cosmetic changes are equally straightforward. We shall concentrate on
showing what the language looks like.

Statement Grouping

Fortran provides no way to group statements together, short of making
them into a subroutine. The standard construction ``if a condition is
true, do this group of things,'' for example,

   if (x > 100)
      { call error("x>100"); err = 1; return }

cannot be written directly in Fortran. Instead a programmer is forced
to translate this relatively clear thought into murky Fortran, by
stating the negative condition and branching around the group of
statements:

         if (x .le. 100) goto 10
            call error(5hx>100)
            err = 1
            return
   10    ...

When the program doesn't work, or when it must be modified, this must
be translated back into a clearer form before one can be sure what it
does.

Ratfor eliminates this error-prone and confusing back-and-forth
translation; the first form is the way the computation is written in
Ratfor. A group of statements can be treated as a unit by enclosing
them in the braces { and }. This is true throughout the language:
wherever a single Ratfor statement can be used, there can be several
enclosed in braces. (Braces seem clearer and less obtrusive than
begin and end or do and end, and of course do and end already have
Fortran meanings.)

Cosmetics contribute to the readability of code, and thus to its
understandability.
The character ``>'' is clearer than ``.GT.'', so Ratfor translates it
appropriately, along with several other similar shorthands. Although
many Fortran compilers permit character strings in quotes (like
"x>100"), quotes are not allowed in ANSI Fortran, so Ratfor converts
it into the right number of H's: computers count better than people
do.

Ratfor is a free-form language: statements may appear anywhere on a
line, and several may appear on one line if they are separated by
semicolons. The example above could also be written as

   if (x > 100) {
      call error("x>100")
      err = 1
      return
   }

In this case, no semicolon is needed at the end of each line because
Ratfor assumes there is one statement per line unless told otherwise.

Of course, if the statement that follows the if is a single statement
(Ratfor or otherwise), no braces are needed:

   if (y <= 0.0 & z <= 0.0)
      write(6, 20) y, z

No continuation need be indicated because the statement is clearly
not finished on the first line. In general Ratfor continues lines
when it seems obvious that they are not yet done. (The continuation
convention is discussed in detail later.)

Although a free-form language permits wide latitude in formatting
styles, it is wise to pick one that is readable, then stick to it. In
particular, proper indentation is vital, to make the logical
structure of the program obvious to the reader.

The ``else'' Clause

Ratfor provides an else statement to handle the construction ``if a
condition is true, do this thing, otherwise do that thing.''

   if (a <= b)
      { sw = 0; write(6, 1) a, b }
   else
      { sw = 1; write(6, 1) b, a }

This writes out the smaller of a and b, then the larger, and sets sw
appropriately.

The Fortran equivalent of this code is circuitous indeed:

         if (a .gt. b) goto 10
            sw = 0
            write(6, 1) a, b
            goto 20
   10    sw = 1
         write(6, 1) b, a
   20    ...

This is a mechanical translation; shorter forms exist, as they do for
many similar situations.
But all translations suffer from the same problem: since they are
translations, they are less clear and understandable than code that
is not a translation. To understand the Fortran version, one must
scan the entire program to make sure that no other statement branches
to statements 10 or 20 before one knows that indeed this is an
if-else construction. With the Ratfor version, there is no question
about how one gets to the parts of the statement. The if-else is a
single unit, which can be read, understood, and ignored if not
relevant. The program says what it means.

As before, if the statement following an if or an else is a single
statement, no braces are needed:

   if (a <= b)
      sw = 0
   else
      sw = 1

The syntax of the if statement is

   if (legal Fortran condition)
      Ratfor statement
   else
      Ratfor statement

where the else part is optional. The legal Fortran condition is
anything that can legally go into a Fortran Logical IF. Ratfor does
not check this clause, since it does not know enough Fortran to know
what is permitted. The Ratfor statement is any Ratfor or Fortran
statement, or any collection of them in braces.

Nested if's

Since the statement that follows an if or an else can be any Ratfor
statement, this leads immediately to the possibility of another if or
else. As a useful example, consider this problem: the variable f is
to be set to -1 if x is less than zero, to +1 if x is greater than
100, and to 0 otherwise. Then in Ratfor, we write

   if (x < 0)
      f = -1
   else if (x > 100)
      f = +1
   else
      f = 0

Here the statement after the first else is another if-else. Logically
it is just a single statement, although it is rather complicated.

This code says what it means. Any version written in straight Fortran
will necessarily be indirect because Fortran does not let you say
what you mean.
And as always, clever shortcuts may turn out to be too clever to
understand a year from now.

Following an else with an if is one way to write a multi-way branch
in Ratfor. In general the structure

   if (...)
      - - -
   else if (...)
      - - -
   else if (...)
      - - -
   ...
   else
      - - -

provides a way to specify the choice of exactly one of several
alternatives. (Ratfor also provides a switch statement which does the
same job in certain special cases; in more general situations, we
have to make do with spare parts.) The tests are laid out in
sequence, and each one is followed by the code associated with it.
Read down the list of decisions until one is found that is satisfied.
The code associated with this condition is executed, and then the
entire structure is finished. The trailing else part handles the
``default'' case, where none of the other conditions apply. If there
is no default action, this final else part is omitted:

   if (x < 0)
      x = 0
   else if (x > 100)
      x = 100

if-else ambiguity

There is one thing to notice about complicated structures involving
nested if's and else's. Consider

   if (x > 0)
      if (y > 0)
         write(6, 1) x, y
      else
         write(6, 2) y

There are two if's and only one else. Which if does the else go with?

This is a genuine ambiguity in Ratfor, as it is in many other
programming languages. The ambiguity is resolved in Ratfor (as
elsewhere) by saying that in such cases the else goes with the
closest previous un-else'ed if. Thus in this case, the else goes with
the inner if, as we have indicated by the indentation.

It is a wise practice to resolve such cases by explicit braces, just
to make your intent clear. In the case above, we would write

   if (x > 0) {
      if (y > 0)
         write(6, 1) x, y
      else
         write(6, 2) y
   }

which does not change the meaning, but leaves no doubt in the
reader's mind.
If we want the other association, we must write

   if (x > 0) {
      if (y > 0)
         write(6, 1) x, y
   }
   else
      write(6, 2) y

The ``switch'' Statement

The switch statement provides a clean way to express multi-way
branches which branch on the value of some integer-valued expression.
The syntax is

   switch (expression) {

      case expr1 :
         statements
      case expr2, expr3 :
         statements
      ...
      default:
         statements
   }

Each case is followed by a list of comma-separated integer
expressions. The expression inside switch is compared against the
case expressions expr1, expr2, and so on in turn until one matches,
at which time the statements following that case are executed. If no
cases match expression, and there is a default section, the
statements with it are done; if there is no default, nothing is done.
In all situations, as soon as some block of statements is executed,
the entire switch is exited immediately. (Readers familiar with C[4]
should beware that this behavior is not the same as the C switch.)

The ``do'' Statement

The do statement in Ratfor is quite similar to the DO statement in
Fortran, except that it uses no statement number. The statement
number, after all, serves only to mark the end of the DO, and this
can be done just as easily with braces. Thus

   do i = 1, n {
      x(i) = 0.0
      y(i) = 0.0
      z(i) = 0.0
   }

is the same as

         do 10 i = 1, n
            x(i) = 0.0
            y(i) = 0.0
            z(i) = 0.0
   10    continue

The syntax is:

   do legal-Fortran-DO-text
      Ratfor statement

The part that follows the keyword do has to be something that can
legally go into a Fortran DO statement. Thus if a local version of
Fortran allows DO limits to be expressions (which is not currently
permitted in ANSI Fortran), they can be used in a Ratfor do.

The Ratfor statement part will often be enclosed in braces, but as
with the if, a single statement need not have braces around it. This
code sets an array to zero:

   do i = 1, n
      x(i) = 0.0

Slightly more complicated,

   do i = 1, n
      do j = 1, n
         m(i, j) = 0

sets the entire array m to zero, and

   do i = 1, n
      do j = 1, n
         if (i < j)
            m(i, j) = -1
         else if (i == j)
            m(i, j) = 0
         else
            m(i, j) = +1

sets the upper triangle of m to -1, the diagonal to zero, and the
lower triangle to +1. (The operator == is ``equals'', that is,
``.EQ.''.) In each case, the statement that follows the do is
logically a single statement, even though complicated, and thus needs
no braces.

``break'' and ``next''

Ratfor provides a statement for leaving a loop early, and one for
beginning the next iteration. break causes an immediate exit from the
do; in effect it is a branch to the statement after the do. next is a
branch to the bottom of the loop, so it causes the next iteration to
be done. For example, this code skips over negative values in an
array:

   do i = 1, n {
      if (x(i) < 0.0)
         next
      process positive element
   }

break and next also work in the other Ratfor looping constructions
that we will talk about in the next few sections.

break and next can be followed by an integer to indicate breaking or
iterating that level of enclosing loop; thus

   break 2

exits from two levels of enclosing loops, and break 1 is equivalent
to break. next 2 iterates the second enclosing loop. (Realistically,
multi-level break's and next's are not likely to be much used because
they lead to code that is hard to understand and somewhat risky to
change.)

The ``while'' Statement

One of the problems with the Fortran DO statement is that it
generally insists upon being done once, regardless of its limits.
If a loop begins

   DO I = 2, 1

this will typically be done once with I set to 2, even though common
sense would suggest that perhaps it shouldn't be. Of course a Ratfor
do can easily be preceded by a test

   if (j <= k)
      do i = j, k {
         - - -
      }

but this has to be a conscious act, and is often overlooked by
programmers.

A more serious problem with the DO statement is that it encourages a
program to be written in terms of an arithmetic progression with
small positive steps, even though that may not be the best way to
write it. If code has to be contorted to fit the requirements imposed
by the Fortran DO, it is that much harder to write and understand.

To overcome these difficulties, Ratfor provides a while statement,
which is simply a loop: ``while some condition is true, repeat this
group of statements''. It has no preconceptions about why one is
looping. For example, this routine to compute sin(x) by the Maclaurin
series combines two termination criteria.

      real function sin(x, e)
      # returns sin(x) to accuracy e, by
      # sin(x) = x - x**3/3! + x**5/5! - ...

      sin = x
      term = x

      i = 3
      while (abs(term)>e & i<100) {
         term = -term * x**2 / float(i*(i-1))
         sin = sin + term
         i = i + 2
      }

      return
      end

Notice that if the routine is entered with term already smaller than
e, the loop will be done zero times, that is, no attempt will be made
to compute x**3 and thus a potential underflow is avoided. Since the
test is made at the top of a while loop instead of the bottom, a
special case disappears - the code works at one of its boundaries.
(The test i<100 is the other boundary - making sure the routine stops
after some maximum number of iterations.)

As an aside, a sharp character ``#'' in a line marks the beginning of
a comment; the rest of the line is comment. Comments and code can
co-exist on the same line - one can make marginal remarks, which is
not possible with Fortran's ``C in column 1'' convention.
Blank lines are also permitted anywhere (they are not in Fortran);
they should be used to emphasize the natural divisions of a program.

The syntax of the while statement is

   while (legal Fortran condition)
      Ratfor statement

As with the if, legal Fortran condition is something that can go into
a Fortran Logical IF, and Ratfor statement is a single statement,
which may be multiple statements in braces.

The while encourages a style of coding not normally practiced by
Fortran programmers. For example, suppose nextch is a function which
returns the next input character both as a function value and in its
argument. Then a loop to find the first non-blank character is just

   while (nextch(ich) == iblank)
      ;

A semicolon by itself is a null statement, which is necessary here to
mark the end of the while; if it were not present, the while would
control the next statement. When the loop is broken, ich contains the
first non-blank. Of course the same code can be written in Fortran as

   100   if (nextch(ich) .eq. iblank) goto 100

but many Fortran programmers (and a few compilers) believe this line
is illegal. The language at one's disposal strongly influences how
one thinks about a problem.

The ``for'' Statement

The for statement is another Ratfor loop, which attempts to carry the
separation of loop-body from reason-for-looping a step further than
the while. A for statement allows explicit initialization and
increment steps as part of the statement. For example, a DO loop is
just

   for (i = 1; i <= n; i = i + 1) ...

This is equivalent to

   i = 1
   while (i <= n) {
      ...
      i = i + 1
   }

The initialization and increment of i have been moved into the for
statement, making it easier to see at a glance what controls the
loop.

The for and while versions have the advantage that they will be done
zero times if n is less than 1; this is not true of the do.

The loop of the sine routine in the previous section can be
re-written with a for as

   for (i=3; abs(term) > e & i < 100; i=i+2) {
      term = -term * x**2 / float(i*(i-1))
      sin = sin + term
   }

The syntax of the for statement is

   for ( init ; condition ; increment )
      Ratfor statement

init is any single Fortran statement, which gets done once before the
loop begins. increment is any single Fortran statement, which gets
done at the end of each pass through the loop, before the test.
condition is again anything that is legal in a logical IF. Any of
init, condition, and increment may be omitted, although the
semicolons must always be present. A non-existent condition is
treated as always true, so for(;;) is an indefinite repeat. (But see
the repeat-until in the next section.)

The for statement is particularly useful for backward loops, chaining
along lists, loops that might be done zero times, and similar things
which are hard to express with a DO statement, and obscure to write
out with IF's and GOTO's. For example, here is a backwards DO loop to
find the last non-blank character on a card:

   for (i = 80; i > 0; i = i - 1)
      if (card(i) != blank)
         break

(``!='' is the same as ``.NE.''.) The code scans the columns from 80
through to 1. If a non-blank is found, the loop is immediately
broken. (break and next work in for's and while's just as in do's.)
If i reaches zero, the card is all blank.

This code is rather nasty to write with a regular Fortran DO, since
the loop must go forward, and we must explicitly set up proper
conditions when we fall out of the loop. (Forgetting this is a common
error.) Thus:

         DO 10 J = 1, 80
            I = 81 - J
            IF (CARD(I) .NE. BLANK) GO TO 11
   10    CONTINUE
         I = 0
   11    ...

The version that uses the for handles the termination condition
properly for free; i is zero when we fall out of the for loop.

The increment in a for need not be an arithmetic progression; the
following program walks along a list (stored in an integer array ptr)
until a zero pointer is found, adding up elements from a parallel
array of values:

   sum = 0.0
   for (i = first; i > 0; i = ptr(i))
      sum = sum + value(i)

Notice that the code works correctly if the list is empty. Again,
placing the test at the top of a loop instead of the bottom
eliminates a potential boundary error.

The ``repeat-until'' statement

In spite of the dire warnings, there are times when one really needs
a loop that tests at the bottom after one pass through. This service
is provided by the repeat-until:

   repeat
      Ratfor statement
   until (legal Fortran condition)

The Ratfor statement part is done once, then the condition is
evaluated. If it is true, the loop is exited; if it is false, another
pass is made.

The until part is optional, so a bare repeat is the cleanest way to
specify an infinite loop. Of course such a loop must ultimately be
broken by some transfer of control such as stop, return, or break, or
an implicit stop such as running out of input with a READ statement.

As a matter of observed fact[8], the repeat-until statement is much
less used than the other looping constructions; in particular, it is
typically outnumbered ten to one by for and while. Be cautious about
using it, for loops that test only at the bottom often don't handle
null cases well.

More on break and next

break exits immediately from do, while, for, and repeat-until. next
goes to the test part of do, while and repeat-until, and to the
increment step of a for.
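
The loop rules above can be modelled outside Ratfor. The following is
a minimal sketch in Python, not part of Ratfor or of its
implementation; the function names last_nonblank, list_sum and
sum_nonnegative are hypothetical and chosen only for illustration.
Each function writes a Ratfor for loop in its equivalent while form
(initialization, test at the top, increment at the bottom), and the
last one shows that next in a for still reaches the increment step.

```python
def last_nonblank(card, blank=" "):
    """Model of: for (i = 80; i > 0; i = i - 1) if (card(i) != blank) break"""
    i = len(card)                  # init (80 for a full card)
    while i > 0:                   # test at the top: may run zero times
        if card[i - 1] != blank:   # Ratfor subscripts start at 1
            break                  # break leaves the loop at once
        i = i - 1                  # increment step
    return i                       # 0 means the card is all blank


def list_sum(first, ptr, value):
    """Model of: for (i = first; i > 0; i = ptr(i)) sum = sum + value(i)"""
    total = 0.0
    i = first                      # init
    while i > 0:                   # an empty list (first == 0) is handled
        total = total + value[i]   # index 0 of the lists is unused padding
        i = ptr[i]                 # the "increment" follows the pointer
    return total


def sum_nonnegative(x):
    """Model of a for loop whose body begins: if (x(i) < 0.0) next"""
    total = 0.0
    i = 1
    while i <= len(x):
        skip = x[i - 1] < 0.0      # next: skip the rest of the body ...
        if not skip:
            total = total + x[i - 1]
        i = i + 1                  # ... but still do the increment step
    return total
```

In this sketch the skip is written as a flag rather than with
Python's continue, because continue inside a Python while loop would
bypass the increment; Ratfor's next in a for does not.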

``return'' Statement

The standard Fortran mechanism for returning a value from a function
uses the name of the function as a variable which can be assigned to;
the last value stored in it is the function value upon return. For
example, here is a routine equal which returns 1 if two arrays are
identical, and zero if they differ. The array ends are marked by the
special value -1.

   # equal - compare str1 to str2;
   #    return 1 if equal, 0 if not
      integer function equal(str1, str2)
      integer str1(100), str2(100)
      integer i

      for (i = 1; str1(i) == str2(i); i = i + 1)
         if (str1(i) == -1) {
            equal = 1
            return
         }
      equal = 0
      return
      end

In many languages (e.g., PL/I) one instead says

   return (expression)

to return a value from a function. Since this is often clearer,
Ratfor provides such a return statement - in a function F,
return(expression) is equivalent to

   { F = expression; return }

For example, here is equal again:

   # equal - compare str1 to str2;
   #    return 1 if equal, 0 if not
      integer function equal(str1, str2)
      integer str1(100), str2(100)
      integer i

      for (i = 1; str1(i) == str2(i); i = i + 1)
         if (str1(i) == -1)
            return(1)
      return(0)
      end

If there is no parenthesized expression after return, a normal RETURN
is made. (Another version of equal is presented shortly.)

Cosmetics

As we said above, the visual appearance of a language has a
substantial effect on how easy it is to read and understand programs.
Accordingly, Ratfor provides a number of cosmetic facilities which
may be used to make programs more readable.

Free-form Input

Statements can be placed anywhere on a line; long statements are
continued automatically, as are long conditions in if, while, for,
and until. Blank lines are ignored. Multiple statements may appear on
one line, if they are separated by semicolons.
No semicolon is needed at the end of a line, if Ratfor can make some
reasonable guess about whether the statement ends there. Lines ending
with any of the characters

   =   +   -   *   ,   |   &   (   _

are assumed to be continued on the next line. Underscores are
discarded wherever they occur; all others remain as part of the
statement.

Any statement that begins with an all-numeric field is assumed to be
a Fortran label, and placed in columns 1-5 upon output. Thus

   write(6, 100); 100 format("hello")

is converted into

         write(6, 100)
   100   format(5hhello)

Translation Services

Text enclosed in matching single or double quotes is converted to
nH... but is otherwise unaltered (except for formatting - it may get
split across card boundaries during the reformatting process). Within
quoted strings, the backslash `\' serves as an escape character: the
next character is taken literally. This provides a way to get quotes
(and of course the backslash itself) into quoted strings:

   "\\\'"

is a string containing a backslash and an apostrophe. (This is not
the standard convention of doubled quotes, but it is easier to use
and more general.)

Any line that begins with the character `%' is left absolutely
unaltered except for stripping off the `%' and moving the line one
position to the left. This is useful for inserting control cards, and
other things that should not be transmogrified (like an existing
Fortran program). Use `%' only for ordinary statements, not for the
condition parts of if, while, etc., or the output may come out in an
unexpected place.

The following character translations are made, except within single
or double quotes or on a line beginning with a `%'.

   ==   .eq.      !=   .ne.
   >    .gt.      >=   .ge.
   <    .lt.      <=   .le.
   &    .and.     |    .or.
   !    .not.     ^    .not.

In addition, the following translations are provided for input
devices with restricted character sets.

   [    {         ]    }
   $(   {         $)   }

``define'' Statement

Any string of alphanumeric characters can be defined as a name;
thereafter, whenever that name occurs in the input (delimited by
non-alphanumerics) it is replaced by the rest of the definition line.
(Comments and trailing white spaces are stripped off.) A defined name
can be arbitrarily long, and must begin with a letter.

define is typically used to create symbolic parameters:

   define  ROWS  100
   define  COLS  50

   dimension a(ROWS), b(ROWS, COLS)

      if (i > ROWS | j > COLS) ...

Alternately, definitions may be written as

   define(ROWS, 100)

In this case, the defining text is everything after the comma up to
the balancing right parenthesis; this allows multi-line definitions.

It is generally a wise practice to use symbolic parameters for most
constants, to help make clear the function of what would otherwise be
mysterious numbers. As an example, here is the routine equal again,
this time with symbolic constants.

   define  YES  1
   define  NO   0
   define  EOS  -1
   define  ARB  100

   # equal - compare str1 to str2;
   #    return YES if equal, NO if not
      integer function equal(str1, str2)
      integer str1(ARB), str2(ARB)
      integer i

      for (i = 1; str1(i) == str2(i); i = i + 1)
         if (str1(i) == EOS)
            return(YES)
      return(NO)
      end

``include'' Statement

The statement

   include file

inserts the file found on input stream file into the Ratfor input in
place of the include statement. The standard usage is to place COMMON
blocks on a file, and include that file whenever a copy is needed:

   subroutine x
      include commonblocks
      ...
      end

   subroutine y
      include commonblocks
      ...
      end

This ensures that all copies of the COMMON blocks are identical.

Pitfalls, Botches, Blemishes and other Failings

Ratfor catches certain syntax errors, such as missing braces, else
clauses without an if, and most errors involving missing parentheses
in statements.
Beyond that, since Ratfor knows no Fortran, any errors you make will
be reported by the Fortran compiler, so you will from time to time
have to relate a Fortran diagnostic back to the Ratfor source.

Keywords are reserved - using if, else, etc., as variable names will
typically wreak havoc. Don't leave spaces in keywords. Don't use the
Arithmetic IF.

The Fortran nH convention is not recognized anywhere by Ratfor; use
quotes instead.

3. IMPLEMENTATION

Ratfor was originally written in C[4] on the UNIX operating
system[5]. The language is specified by a context free grammar and
the compiler constructed using the YACC compiler-compiler[6].

The Ratfor grammar is simple and straightforward, being essentially

   prog  : stat
         | prog stat

   stat  : if (...) stat
         | if (...) stat else stat
         | while (...) stat
         | for (...; ...; ...) stat
         | do ... stat
         | repeat stat
         | repeat stat until (...)
         | switch (...) { case ...: prog ...
                        default: prog }
         | return
         | break
         | next
         | digits stat
         | { prog }
         | anything unrecognizable

The observation that Ratfor knows no Fortran follows directly from
the rule that says a statement is ``anything unrecognizable''. In
fact most of Fortran falls into this category, since any statement
that does not begin with one of the keywords is by definition
``unrecognizable.''

Code generation is also simple. If the first thing on a source line
is not a keyword (like if, else, etc.) the entire statement is simply
copied to the output with appropriate character translation and
formatting. (Leading digits are treated as a label.) Keywords cause
only slightly more complicated actions. For example, when if is
recognized, two consecutive labels L and L+1 are generated and the
value of L is stacked. The condition is then isolated, and the code

         if (.not. (condition)) goto L

is output. The statement part of the if is then translated.
When the end of the statement is encountered (which may be some distance away and include nested if's, of course), the code

        L       continue

is generated, unless there is an _else_ clause, in which case the code is

                goto L+1
        L       continue

In this latter case, the code

        L+1     continue

is produced after the _statement_ part of the _else_. Code generation for the various loops is equally simple.

One might argue that more care should be taken in code generation. For example, if there is no trailing _else_,

        if (i > 0) x = a

should be left alone, not converted into

                if (.not. (i .gt. 0)) goto 100
                x = a
        100     continue

But what are optimizing compilers for, if not to improve code? It is a rare program indeed where this kind of ``inefficiency'' will make even a measurable difference. In the few cases where it is important, the offending lines can be protected by `%'.

The use of a compiler-compiler is definitely the preferred method of software development. The language is well-defined, with few syntactic irregularities. Implementation is quite simple; the original construction took under a week. The language is sufficiently simple, however, that an _ad hoc_ recognizer can be readily constructed to do the same job if no compiler-compiler is available.

The C version of Ratfor is used on UNIX and on the Honeywell GCOS systems. C compilers are not as widely available as Fortran, however, so there is also a Ratfor written in itself and originally bootstrapped with the C version. The Ratfor version was written so as to translate into the portable subset of Fortran described in [1], so it is portable, having been run essentially without change on at least twelve distinct machines. (The main restrictions of the portable subset are: only one character per machine word; subscripts in the form _c_*_v_±_c_; avoiding expressions in places like DO loops; consistency in subroutine argument usage, and in COMMON declarations.
Ratfor itself will not gratuitously generate non-standard Fortran.)

The Ratfor version is about 1500 lines of Ratfor (compared to about 1000 lines of C); this compiles into 2500 lines of Fortran. This expansion ratio is somewhat higher than average, since the compiled code contains unnecessary occurrences of COMMON declarations. The execution time of the Ratfor version is dominated by two routines that read and write cards. Clearly these routines could be replaced by machine coded local versions; unless this is done, the efficiency of other parts of the translation process is largely irrelevant.

4. EXPERIENCE

Good Things

``It's so much better than Fortran'' is the most common response of users when asked how well Ratfor meets their needs. Although cynics might consider this to be vacuous, it does seem to be true that decent control flow and cosmetics convert Fortran from a bad language into quite a reasonable one, assuming that Fortran data structures are adequate for the task at hand.

Although there are no quantitative results, users feel that coding in Ratfor is at least twice as fast as in Fortran. More important, debugging and subsequent revision are much faster than in Fortran. Partly this is simply because the code can be _read_. The looping statements which test at the top instead of the bottom seem to eliminate or at least reduce the occurrence of a wide class of boundary errors. And of course it is easy to do structured programming in Ratfor; this self-discipline also contributes markedly to reliability.

One interesting and encouraging fact is that programs written in Ratfor tend to be as readable as programs written in more modern languages like Pascal. Once one is freed from the shackles of Fortran's clerical detail and rigid input format, it is easy to write code that is readable, even esthetically pleasing.
For example, here is a Ratfor implementation of the linear table search discussed by Knuth [7]:

        A(m+1) = x
        for (i = 1; A(i) != x; i = i + 1)
                ;
        if (i > m) {
                m = i
                B(i) = 1
        }
        else
                B(i) = B(i) + 1

A large corpus (5400 lines) of Ratfor, including a subset of the Ratfor preprocessor itself, can be found in [8].

Bad Things

The biggest single problem is that many Fortran syntax errors are not detected by Ratfor but by the local Fortran compiler. The compiler then prints a message in terms of the generated Fortran, and in a few cases this may be difficult to relate back to the offending Ratfor line, especially if the implementation conceals the generated Fortran. This problem could be dealt with by tagging each generated line with some indication of the source line that created it, but this is inherently implementation-dependent, so no action has yet been taken. Error message interpretation is actually not so arduous as might be thought. Since Ratfor generates no variables, only a simple pattern of IF's and GOTO's, data-related errors like missing DIMENSION statements are easy to find in the Fortran. Furthermore, there has been a steady improvement in Ratfor's ability to catch trivial syntactic errors like unbalanced parentheses and quotes.

There are a number of implementation weaknesses that are a nuisance, especially to new users. For example, keywords are reserved. This rarely makes any difference, except for those hardy souls who want to use an Arithmetic IF. A few standard Fortran constructions are not accepted by Ratfor, and this is perceived as a problem by users with a large corpus of existing Fortran programs. Protecting every line with a `%' is not really a complete solution, although it serves as a stop-gap. The best long-term solution is provided by the program Struct [9], which converts arbitrary Fortran programs into Ratfor.
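
The line-tagging idea mentioned above can be illustrated with a minimal sketch in Python. This is not something Ratfor does (no action has been taken), and every detail is an assumption made for illustration: the output is taken to be fixed-form Fortran with statement text in columns 1-72, the tag is written into columns 73-80, and the names tag_line and source_line_of and the tag format are hypothetical.

```python
STATEMENT_WIDTH = 72   # fixed-form Fortran statement text occupies columns 1-72
TAG_WIDTH = 8          # columns 73-80, assumed free for an identification tag


def tag_line(fortran_line, source_line):
    """Pad fortran_line to column 72 and append a source-line tag.

    The tag format ('R' followed by a 7-digit line number) is an
    assumption of this sketch, not anything Ratfor emits.
    """
    if len(fortran_line) > STATEMENT_WIDTH:
        raise ValueError("statement text does not fit in columns 1-72")
    if not 1 <= source_line <= 9999999:
        raise ValueError("source line number does not fit in the tag field")
    return fortran_line.ljust(STATEMENT_WIDTH) + "R%07d" % source_line


def source_line_of(tagged_line):
    """Recover the Ratfor source line number from a line made by tag_line."""
    tag = tagged_line[STATEMENT_WIDTH:STATEMENT_WIDTH + TAG_WIDTH]
    if len(tag) != TAG_WIDTH or tag[0] != "R" or not tag[1:].isdigit():
        raise ValueError("line carries no source tag")
    return int(tag[1:])


# Generated Fortran for the Ratfor statement `if (i > 0) x = a`, assumed
# here to sit on line 12 of the Ratfor source.
generated = [
    "      if(.not.(i.gt.0)) goto 100",
    "      x = a",
    "  100 continue",
]
tagged = [tag_line(text, 12) for text in generated]
```

Given a diagnostic that quotes one of the tagged lines, source_line_of maps it back to the Ratfor line. Whether a particular compiler echoes columns 73-80 in its messages is exactly the kind of implementation dependence that makes a general solution hard.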
Users who export programs often complain that the generated Fortran is ``unreadable'' because it is not tastefully formatted and contains extraneous CONTINUE statements. To some extent this can be ameliorated (Ratfor now has an option to copy Ratfor comments into the generated Fortran), but it has always seemed that effort is better spent on the input language than on the output esthetics.

One final problem is partly attributable to success - since Ratfor is relatively easy to modify, there are now several dialects of Ratfor. Fortunately, so far most of the differences are in character set, or in invisible aspects like code generation.

5. CONCLUSIONS

Ratfor demonstrates that with modest effort it is possible to convert Fortran from a bad language into quite a good one. A preprocessor is clearly a useful way to extend or ameliorate the facilities of a base language.

When designing a language, it is important to concentrate on the essential requirement of providing the user with the best language possible for a given effort. One must avoid throwing in ``features'' - things which the user may trivially construct within the existing framework.

One must also avoid getting sidetracked on irrelevancies. For instance it seems pointless for Ratfor to prepare a neatly formatted listing of either its input or its output. The user is presumably capable of the self-discipline required to prepare neat input that reflects his thoughts. It is much more important that the language provide free-form input so he _can_ format it neatly. No one should read the output anyway except in the most dire circumstances.

Acknowledgements

C. A. R. Hoare once said that ``One thing [the language designer] should not do is to include untried ideas of his own.'' Ratfor follows this precept very closely - everything in it has been stolen from someone else.
Most of the control flow structures are taken directly from the language C[4] developed by Dennis Ritchie; the comment and continuation conventions are adapted from Altran[10].

I am grateful to Stuart Feldman, whose patient simulation of an innocent user during the early days of Ratfor led to several design improvements and the eradication of bugs. He also translated the C parse-tables and YACC parser into Fortran for the first Ratfor version of Ratfor.

References

[1] B. G. Ryder, ``The PFORT Verifier,'' _Software-Practice & Experience_, October 1974.

[2] American National Standard Fortran. American National Standards Institute, New York, 1966.

[3] _For-word: Fortran Development Newsletter_, August 1975.

[4] B. W. Kernighan and D. M. Ritchie, _The C Programming Language_, Prentice-Hall, Inc., 1978.

[5] D. M. Ritchie and K. L. Thompson, ``The UNIX Time-sharing System.'' _CACM_, July 1974.

[6] S. C. Johnson, ``YACC - Yet Another Compiler-Compiler.'' Bell Laboratories Computing Science Technical Report #32, 1978.

[7] D. E. Knuth, ``Structured Programming with goto Statements.'' _Computing Surveys_, December 1974.

[8] B. W. Kernighan and P. J. Plauger, _Software Tools_, Addison-Wesley, 1976.

[9] B. S. Baker, ``Struct - A Program which Structures Fortran'', Bell Laboratories internal memorandum, December 1975.

[10] A. D. Hall, ``The Altran System for Rational Function Manipulation - A Survey.'' _CACM_, August 1971.

Appendix: Usage on UNIX and GCOS.

Beware - local customs vary. Check with a native before going into the jungle.

UNIX

The program _ratfor_ is the basic translator; it takes either a list of file names or the standard input and writes Fortran on the standard output.
Options include _-6x_, which uses _x_ as a continuation character in column 6 (UNIX uses & in column 1), and _-C_, which causes Ratfor comments to be copied into the generated Fortran.

The program _rc_ provides an interface to the _ratfor_ command which is much the same as _cc_. Thus

        rc [options] files

compiles the files specified by _files_. Files with names ending in _.r_ are Ratfor source; other files are assumed to be for the loader. The flags _-C_ and _-6x_ described above are recognized, as are

        -c      compile only; don't load
        -f      save intermediate Fortran .f files
        -r      Ratfor only; implies -c and -f
        -2      use big Fortran compiler (for large programs)
        -U      flag undeclared variables (not universally available)

Other flags are passed on to the loader.

GCOS

The program _./ratfor_ is the bare translator, and is identical to the UNIX version, except that the continuation convention is & in column 6. Thus

        ./ratfor files >output

translates the Ratfor source on _files_ and collects the generated Fortran on file `output' for subsequent processing.

_./rc_ provides much the same services as _rc_ (within the limitations of GCOS), regrettably with a somewhat different syntax. Options recognized by _./rc_ include

        name      Ratfor source or library, depending on type
        h=/name   make TSS H* file (runnable version); run as /name
        r=/name   update and use random library
        a=        compile as ascii (default is bcd)
        C=        copy comments into Fortran
        f=name    Fortran source file
        g=name    gmap source file

Other options are as specified for the _./cc_ command described in [4].

TSO, TSS, and other systems

Ratfor exists on various other systems; check with the author for specifics.