RATFOR - A Preprocessor for a Rational Fortran


                     Brian W. Kernighan


                     Bell Laboratories


               Murray Hill, New Jersey 07974


                          _A_B_S_T_R_A_C_T


     Although Fortran is not a pleasant language to use,  it

does have the advantages of universality and (usually) rela-

tive efficiency.  The Ratfor language  attempts  to  conceal

the  main deficiencies of Fortran while retaining its desir-

able qualities, by providing decent control flow statements:


   o+ statement grouping


   o+ _i_f-_e_l_s_e and _s_w_i_t_c_h for decision-making


   o+ _w_h_i_l_e, _f_o_r, _d_o, and _r_e_p_e_a_t-_u_n_t_i_l for looping


   o+ _b_r_e_a_k and _n_e_x_t for controlling loop exits


and some ``syntactic sugar'':


   o free form input  (multiple  statements/line,  automatic

     continuation)


   o unobtrusive comment convention


   o translation of >, >=, etc., into .GT., .GE., etc.


   o return(expression) statement for functions


   o define statement for symbolic parameters


   o include statement for including source files


Ratfor is implemented as  a  preprocessor  which  translates

this language into Fortran.

--------------------------
This paper is a revised and expanded version of one published
in Software-Practice and Experience, October 1975.  The Ratfor
described here is the one in use on UNIX and GCOS at Bell
Laboratories, Murray Hill, N. J.


     Once the control flow and cosmetic deficiencies of For-

tran  are  hidden,  the  resulting  language  is  remarkably

pleasant to use.  Ratfor programs  are  markedly  easier  to

write,  and  to read, and thus easier to debug, maintain and

modify than their Fortran equivalents.


     It is readily possible to write Ratfor  programs  which

are  portable  to  other environments.  Ratfor is written in

itself in this way, so it is also portable; versions of Rat-

for are now running on at least two dozen different types of

computers at over five hundred locations.


     This paper discusses  design  criteria  for  a  Fortran

preprocessor,  the  Ratfor  language and its implementation,

and user experience.


_1.  _I_N_T_R_O_D_U_C_T_I_O_N


     Most  programmers  will  agree  that  Fortran   is   an

unpleasant  language to program in, yet there are many occa-

sions when they are forced to use it.  For example,  Fortran

is often the only language thoroughly supported on the local

computer.  Indeed, it is the closest thing  to  a  universal

programming  language  currently  available: with care it is

possible to write large, truly portable Fortran programs[1].

Finally,  Fortran  is  often the most ``efficient'' language

available, particularly for programs requiring much computa-

tion.


     But Fortran _i_s unpleasant.   Perhaps  the  worst  defi-

ciency  is  in  the  control  flow  statements - conditional

branches and loops - which express the logic of the program.

The  conditional  statements  in Fortran are primitive.  The

Arithmetic IF forces the user into at  least  two  statement

numbers and two (implied) GOTO's; it leads to unintelligible

code, and is eschewed by good programmers.  The  Logical  IF

is  better, in that the test part can be stated clearly, but

hopelessly restrictive because the  statement  that  follows

the  IF can only be one Fortran statement (with some further

restrictions!).  And of course there can be no ELSE part  to

a  Fortran  IF:  there  is  no way to specify an alternative

action if the IF is not satisfied.


     The Fortran DO restricts the user to going  forward  in

an arithmetic progression.  It is fine for ``1 to N in steps

of 1 (or 2 or ...)'', but there is no direct way to go back-

wards,  or  even  (in  ANSI Fortran[2]) to go from 1 to N-1.

And of course the DO is useless if one's problem doesn't map

into an arithmetic progression.


     The result of these failings is that  Fortran  programs

must  be  written  with  numerous  labels and branches.  The

resulting code is particularly difficult to read and  under-

stand, and thus hard to debug and modify.


     When one is faced with an unpleasant language, a useful

technique  is  to  define  a new language that overcomes the

deficiencies, and to translate it into  the  unpleasant  one

with  a  preprocessor.  This is the approach taken with Rat-

for.  (The preprocessor idea  is  of  course  not  new,  and

preprocessors  for  Fortran are especially popular today.  A

recent listing [3] of preprocessors shows more than  50,  of

which at least half a dozen are widely available.)


_2.  _L_A_N_G_U_A_G_E _D_E_S_C_R_I_P_T_I_O_N


_D_e_s_i_g_n


     Ratfor  attempts  to  retain  the  merits  of   Fortran

(universality,  portability,  efficiency)  while  hiding the

worst Fortran inadequacies.  The language is Fortran  except

for  two  aspects.   First, since control flow is central to

any program, regardless of  the  specific  application,  the

primary  task  of  Ratfor is to conceal this part of Fortran

from the user, by providing decent control flow  structures.

These  structures  are sufficient and comfortable for struc-

tured programming in the narrow sense of programming without

GOTO's.  Second,  since  the  preprocessor  must  examine an

entire program to translate the  control  structure,  it  is

possible   at  the  same  time  to  clean  up  many  of  the

``cosmetic'' deficiencies of Fortran,  and  thus  provide  a

language  which  is  easier  and  more  pleasant to read and

write.


     Beyond these two aspects - control flow and cosmetics -

Ratfor  does  nothing  about the host of other weaknesses of

Fortran.  Although it would be straightforward to extend  it

to  provide  character  strings,  for  example, they are not

needed by everyone, and of course the preprocessor would  be

harder to implement.  Throughout, the design principle which

has determined what should be in Ratfor and what should  not

has  been  _R_a_t_f_o_r  _d_o_e_s_n'_t  _k_n_o_w  _a_n_y _F_o_r_t_r_a_n.  Any language

feature which would require that  Ratfor  really  understand

Fortran  has  been omitted.  We will return to this point in

the section on implementation.


     Even within the confines of control flow and cosmetics,

we  have  attempted to be selective in what features to pro-

vide.  The intent has been to provide a  small  set  of  the

most  useful  constructs, rather than to throw in everything

that has ever been thought useful by someone.


     The rest of this section contains an informal  description

of the Ratfor language.  The control flow aspects will

be quite familiar to readers used to languages  like  Algol,

PL/I,  Pascal,  etc.,  and  the cosmetic changes are equally

straightforward.  We shall concentrate on showing  what  the

language looks like.


_S_t_a_t_e_m_e_n_t _G_r_o_u_p_i_n_g


     Fortran provides no way to group  statements  together,

short  of  making them into a subroutine.  The standard con-

struction ``if  a  condition  is  true,  do  this  group  of

things,'' for example,


     if (x > 100)
          { call error("x>100"); err = 1;
               return }

cannot be written directly in Fortran.  Instead a programmer

is  forced  to  translate this relatively clear thought into

murky Fortran, by stating the negative condition and branch-

ing around the group of statements:


          if (x .le. 100) goto 10
               call error(5hx>100)
               err = 1
               return
     10   ...

When the program doesn't work, or when it must be  modified,

this  must be translated back into a clearer form before one

can be sure what it does.
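
     The mechanical step described above, negating the condition and
branching around the group, can be sketched in a few lines of Python.
This is only an illustration: the function name translate_if_group,
the label 10, and the choice to wrap the condition in .not.(...)
rather than rewrite it are assumptions, not a description of what the
preprocessor actually emits.

```python
def translate_if_group(condition, body, label):
    """Emit fixed-form Fortran for a Ratfor-style `if (condition) { body }`.

    The condition is wrapped in .not.(...) instead of being rewritten,
    so the translator never has to understand the expression itself.
    """
    lines = [f"      if (.not.({condition})) goto {label}"]
    lines += [f"      {stmt}" for stmt in body]
    # Label in columns 1-5, column 6 blank, statement from column 7.
    lines.append(f"{label:<5d} continue")
    return lines


fortran = translate_if_group(
    "x .gt. 100",
    ["call error(5hx>100)", "err = 1", "return"],
    label=10,
)
print("\n".join(fortran))
```

The group of statements passes through untouched; only the test and
the closing label are generated.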


     Ratfor eliminates this error-prone and confusing  back-and-forth

translation; the first form is the way the computation

is written in Ratfor.  A group of statements  can  be

treated  as  a unit by enclosing them in the braces { and }.

This is true throughout the language: wherever a single Rat-

for  statement can be used, there can be several enclosed in

braces.  (Braces seem clearer and less obtrusive than  begin

and end or do and end, and of course do and end already have

Fortran meanings.)


     Cosmetics contribute to the readability  of  code,  and

thus  to  its  understandability.   The  character  ``>'' is

clearer than ``.GT.'', so  Ratfor  translates  it  appropri-

ately,   along   with   several  other  similar  shorthands.

Although many Fortran compilers permit character strings  in

quotes  (like  "x>100"), quotes are not allowed in ANSI Fortran,

so Ratfor converts each quoted string into the right  number  of  H's:

computers count better than people do.
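
     The counting that the preprocessor takes over can be sketched as
follows.  This is a minimal Python illustration, not the Ratfor
implementation: the names hollerith and convert_quotes are
hypothetical, and the sketch ignores the backslash escapes that Ratfor
allows inside quoted strings.

```python
import re

# A double-quoted or single-quoted run with no embedded quote of the same kind.
_QUOTED = re.compile(r'"([^"]*)"|\'([^\']*)\'')


def hollerith(text):
    """Return the nH form of text, where n is the number of characters."""
    return f"{len(text)}h{text}"


def convert_quotes(statement):
    """Replace every quoted string in a statement by its nH equivalent."""
    def repl(match):
        text = match.group(1) if match.group(1) is not None else match.group(2)
        return hollerith(text)
    return _QUOTED.sub(repl, statement)


print(convert_quotes('call error("x>100")'))
print(convert_quotes("100 format('hello')"))
```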


     Ratfor is a free-form language: statements  may  appear

anywhere  on  a  line, and several may appear on one line if

they are separated by semicolons.  The example  above  could

also be written as


     if (x > 100) {
          call error("x>100")
          err = 1
          return
     }

In this case, no semicolon is needed at the end of each line

because  Ratfor  assumes  there  is  one  statement per line

unless told otherwise.


     Of course, if the statement that follows the  if  is  a

single  statement  (Ratfor  or  otherwise),  no  braces  are

needed:


     if (y <= 0.0 & z <= 0.0)
          write(6, 20) y, z

No continuation need be indicated because the  statement  is

clearly  not  finished on the first line.  In general Ratfor

continues lines when it seems obvious that they are not  yet

done.   (The  continuation convention is discussed in detail

later.)


     Although a free-form language permits wide latitude  in

formatting  styles, it is wise to pick one that is readable,

then stick to it.   In  particular,  proper  indentation  is

vital,  to make the logical structure of the program obvious

to the reader.


_T_h_e ``_e_l_s_e'' _C_l_a_u_s_e


     Ratfor provides an _e_l_s_e statement to  handle  the  con-

struction ``if a condition is true, do this thing, _o_t_h_e_r_w_i_s_e

do that thing.''


     if (a <= b)
          { sw = 0; write(6, 1) a, b }
     else
          { sw = 1; write(6, 1) b, a }

This writes out the smaller of a and b, then the larger, and

sets sw appropriately.


     The Fortran  equivalent  of  this  code  is  circuitous

indeed:


          if (a .gt. b) goto 10
               sw = 0
               write(6, 1) a, b
               goto 20
     10   sw = 1
          write(6, 1) b, a
     20   ...

This is a mechanical translation; shorter  forms  exist,  as

they  do  for many similar situations.  But all translations

suffer from the same problem: since they  are  translations,

they are less clear and understandable than code that is not

a translation.  To understand the Fortran version, one  must

scan the entire program to make sure that no other statement

branches to statements 10 or 20 before one knows that indeed

this  is  an _i_f-_e_l_s_e construction.  With the Ratfor version,

there is no question about how one gets to the parts of  the

statement.  The _i_f-_e_l_s_e is a single unit, which can be read,

understood, and ignored if not relevant.  The  program  says

what it means.


     As before, if the statement following an if or an  else

is a single statement, no braces are needed:


     if (a <= b)
          sw = 0
     else
          sw = 1


     The syntax of the _i_f statement is


     if (_l_e_g_a_l _F_o_r_t_r_a_n _c_o_n_d_i_t_i_o_n)
          _R_a_t_f_o_r _s_t_a_t_e_m_e_n_t
     else
          _R_a_t_f_o_r _s_t_a_t_e_m_e_n_t

where  the  _e_l_s_e  part  is  optional.   The  _l_e_g_a_l   _F_o_r_t_r_a_n


                           - 10 -


_c_o_n_d_i_t_i_o_n  is  anything  that  can legally go into a Fortran

Logical IF. Ratfor does not check this clause, since it does

not know enough Fortran to know what is permitted.  The _R_a_t_-

_f_o_r _s_t_a_t_e_m_e_n_t is any Ratfor or  Fortran  statement,  or  any

collection of them in braces.


_N_e_s_t_e_d _i_f'_s


     Since the statement that follows an if or an  else  can

be  any Ratfor statement, this leads immediately to the possibility

of another if or else.  As a  useful  example,  consider

this  problem: the variable f is to be set to -1 if x

is less than zero, to +1 if x is greater than 100, and to  0

otherwise.  Then in Ratfor, we write


     if (x < 0)
          f = -1
     else if (x > 100)
          f = +1
     else
          f = 0

Here the statement after the first else is another  if-else.

Logically  it  is  just  a  single statement, although it is

rather complicated.


     This code says what it means.  Any version  written  in

straight  Fortran  will necessarily be indirect because For-

tran does not let you say what you  mean.   And  as  always,

clever shortcuts may turn out to be too clever to understand

a year from now.


     Following an _e_l_s_e with an _i_f is  one  way  to  write  a


                           - 11 -


multi-way branch in Ratfor.  In general the structure


     if (...)
          - - -
     else if (...)
          - - -
     else if (...)
          - - -
      ...
     else
          - - -

provides a way to specify  the  choice  of  exactly  one  of

several alternatives.  (Ratfor also provides a switch statement

which does the same job in certain  special  cases;  in

more  general  situations,  we  have  to  make do with spare

parts.) The tests are laid out in sequence, and each one  is

followed by the code associated with it.  Read down the list

of decisions until one is found that is satisfied.  The code

associated  with  this  condition  is executed, and then the

entire structure is finished.  The trailing else  part  handles

the  ``default''  case, where none of the other conditions

apply.  If there is no default action, this final else

part is omitted:


     if (x < 0)
          x = 0
     else if (x > 100)
          x = 100


_i_f-_e_l_s_e _a_m_b_i_g_u_i_t_y


     There is one thing to notice about  complicated  structures

involving nested if's and else's.  Consider


     if (x > 0)
          if (y > 0)
               write(6, 1) x, y
          else
               write(6, 2) y

There are two _i_f's and only one _e_l_s_e. Which _i_f does the _e_l_s_e

go with?


     This is a genuine ambiguity in Ratfor, as it is in many

other  programming  languages.  The ambiguity is resolved in

Ratfor (as elsewhere) by saying that in such cases the  else

goes  with  the closest previous un-else'ed if.  Thus in this

case, the else goes with the inner if, as we have  indicated

by the indentation.


     It is a wise practice to resolve such cases by explicit

braces,  just to make your intent clear.  In the case above,

we would write


     if (x > 0) {
          if (y > 0)
               write(6, 1) x, y
          else
               write(6, 2) y
     }

which does not change the meaning, but leaves  no  doubt  in

the  reader's  mind.   If  we want the other association, we

_m_u_s_t write


     if (x > 0) {
          if (y > 0)
               write(6, 1) x, y
     }
     else
          write(6, 2) y


_T_h_e ``_s_w_i_t_c_h'' _S_t_a_t_e_m_e_n_t


     The _s_w_i_t_c_h statement provides a clean  way  to  express

multi-way  branches  which  branch  on  the  value  of  some

integer-valued expression.  The syntax is


     switch (_e_x_p_r_e_s_s_i_o_n) {

          case _e_x_p_r_1 :
               _s_t_a_t_e_m_e_n_t_s
          case _e_x_p_r_2, _e_x_p_r_3 :
               _s_t_a_t_e_m_e_n_t_s
          ...
          default:
               _s_t_a_t_e_m_e_n_t_s
     }


     Each _c_a_s_e is followed  by  a  list  of  comma-separated

integer  expressions.   The _e_x_p_r_e_s_s_i_o_n inside _s_w_i_t_c_h is com-

pared against the case expressions _e_x_p_r_1, _e_x_p_r_2, and  so  on

in turn until one matches, at which time the statements fol-

lowing that _c_a_s_e are executed.  If no  cases  match  _e_x_p_r_e_s_-

_s_i_o_n, and there is a _d_e_f_a_u_l_t section, the statements with it

are done; if there is no _d_e_f_a_u_l_t, nothing is done.   In  all

situations, as soon as some block of statements is executed,

the entire _s_w_i_t_c_h is exited immediately.  (Readers  familiar

with  C[4]  should beware that this behavior is not the same

as the C _s_w_i_t_c_h.)


_T_h_e ``_d_o'' _S_t_a_t_e_m_e_n_t


     The _d_o statement in Ratfor is quite similar to  the  DO

statement  in  Fortran,  except  that  it  uses no statement

number.  The statement number, after  all,  serves  only  to

mark  the end of the DO, and this can be done just as easily

with braces.  Thus


          do i = 1, n {
               x(i) = 0.0
               y(i) = 0.0
               z(i) = 0.0
          }

is the same as


          do 10 i = 1, n
               x(i) = 0.0
               y(i) = 0.0
               z(i) = 0.0
     10   continue

The syntax is:


     do _l_e_g_a_l-_F_o_r_t_r_a_n-_D_O-_t_e_x_t
          _R_a_t_f_o_r _s_t_a_t_e_m_e_n_t

The part that follows the keyword _d_o  has  to  be  something

that  can legally go into a Fortran DO statement.  Thus if a

local version of Fortran allows DO limits to be  expressions

(which is not currently permitted in ANSI Fortran), they can

be used in a Ratfor do.


     The _R_a_t_f_o_r _s_t_a_t_e_m_e_n_t part will  often  be  enclosed  in

braces, but as with the _i_f, a single statement need not have

braces around it.  This code sets an array to zero:


     do i = 1, n
          x(i) = 0.0

Slightly more complicated,


     do i = 1, n
          do j = 1, n
               m(i, j) = 0

sets the entire array m to zero, and


     do i = 1, n
          do j = 1, n
               if (i < j)
                    m(i, j) = -1
               else if (i == j)
                    m(i, j) = 0
               else
                    m(i, j) = +1

sets the upper triangle of m to -1, the  diagonal  to  zero,

and   the  lower  triangle  to  +1.   (The  operator  ==  is

``equals'', that is, ``.EQ.''.) In each case, the  statement

that  follows  the  _d_o is logically a _s_i_n_g_l_e statement, even

though complicated, and thus needs no braces.


``_b_r_e_a_k'' _a_n_d ``_n_e_x_t''


     Ratfor provides a statement for leaving a  loop  early,

and  one  for beginning the next iteration.  break causes an

immediate exit from the do; in effect it is a branch to  the

statement  after  the  do.  next is a branch to the bottom of

the loop, so it causes the next iteration to be  done.   For

example, this code skips over negative values in an array:


     do i = 1, n {
          if (x(i) < 0.0)
               next
          _p_r_o_c_e_s_s _p_o_s_i_t_i_v_e _e_l_e_m_e_n_t
     }

_b_r_e_a_k and _n_e_x_t also work in the other  Ratfor  looping  con-

structions that we will talk about in the next few sections.


     _b_r_e_a_k and _n_e_x_t can be followed by an integer  to  indi-

cate  breaking  or  iterating  that level of enclosing loop;


                           - 16 -


thus


     break 2

exits from two levels of enclosing  loops,  and  break 1  is

equivalent  to  break.  next 2 iterates the second enclosing

loop.  (Realistically, multi-level break's  and  next's  are

not likely to be much used because they lead to code that is

hard to understand and somewhat risky to change.)


_T_h_e ``_w_h_i_l_e'' _S_t_a_t_e_m_e_n_t


     One of the problems with the Fortran  DO  statement  is

that  it  generally insists upon being done once, regardless

of its limits.  If a loop begins


     DO I = 2, 1

this will typically be done once  with  I  set  to  2,  even

though  common sense would suggest that perhaps it shouldn't

be.  Of course a Ratfor do can easily be preceded by a test


     if (j <= k)
          do i = j, k  {
               - - -
          }

but this has to be a conscious act, and is often  overlooked

by programmers.


     A more serious problem with the DO statement is that it

encourages a program to be written in terms of an arithmetic

progression with small  positive  steps,  even  though

that may not be the best way to write it.  If code has to be

contorted to fit the requirements imposed by the Fortran DO,

it is that much harder to write and understand.


     To overcome these difficulties, Ratfor provides a while

statement, which is simply a loop: ``while some condition is

true, repeat this group of statements''.  It has no  precon-

ceptions  about  why one is looping.  For example, this rou-

tine to compute sin(x) by the Maclaurin series combines  two

termination criteria.


 real function sin(x, e)
    # returns sin(x) to accuracy e, by
    # sin(x) = x - x**3/3! + x**5/5! - ...

    sin = x
    term = x

    i = 3
    while (abs(term)>e & i<100) {
       term = -term * x**2 / float(i*(i-1))
       sin = sin + term
       i = i + 2
    }

    return
    end


     Notice that if the routine is entered with term already

smaller  than  e, the loop will be done zero times, that is,

no attempt will be made to compute x**3 and thus a potential

underflow  is avoided.  Since the test is made at the top of

a while loop instead of the bottom, a  special  case  disappears  -

the code works at one of its boundaries.  (The test

i<100 is the other boundary - making sure the routine  stops

after some maximum number of iterations.)
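
     The boundary behaviour can be checked with a direct Python
transcription of the routine.  The function name sine and the extra
pass counter are additions made for illustration; the loop itself
follows the Ratfor code above line for line.

```python
import math


def sine(x, e):
    """Approximate sin(x) by its Maclaurin series to accuracy e.

    Returns the approximation and the number of passes through the loop.
    """
    total = x
    term = x
    i = 3
    passes = 0
    # Both termination criteria are tested at the top of the loop.
    while abs(term) > e and i < 100:
        term = -term * x**2 / float(i * (i - 1))
        total += term
        i += 2
        passes += 1
    return total, passes


approx, passes = sine(0.5, 1e-10)
print(approx, math.sin(0.5), passes)

# The first term is already smaller than e, so the loop body never runs.
print(sine(1e-8, 1e-6))
```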


     As an aside, a sharp character ``#'' in  a  line  marks

the beginning of a comment; the rest of the line is comment.

Comments and code can co-exist on the same line  -  one  can

make  marginal remarks, which is not possible with Fortran's

``C in column 1'' convention.  Blank lines are also  permit-

ted  anywhere (they are not in Fortran); they should be used

to emphasize the natural divisions of a program.


     The syntax of the _w_h_i_l_e statement is


     while (_l_e_g_a_l _F_o_r_t_r_a_n _c_o_n_d_i_t_i_o_n)
          _R_a_t_f_o_r _s_t_a_t_e_m_e_n_t

As with the _i_f, _l_e_g_a_l _F_o_r_t_r_a_n _c_o_n_d_i_t_i_o_n  is  something  that

can  go into a Fortran Logical IF, and _R_a_t_f_o_r _s_t_a_t_e_m_e_n_t is a

single  statement,  which  may  be  multiple  statements  in

braces.


     The _w_h_i_l_e encourages a style  of  coding  not  normally

practiced  by  Fortran  programmers.   For  example, suppose

_n_e_x_t_c_h is a function which returns the next input  character

both  as  a function value and in its argument.  Then a loop

to find the first non-blank character is just


     while (nextch(ich) == iblank)
          ;

A semicolon by itself is a null statement, which  is  necessary

here  to  mark  the  end  of the while; if it were not

present, the while would control the next  statement.   When

the  loop  is  broken, ich contains the first non-blank.  Of

course the same code can be written in Fortran as


 100  if (nextch(ich) .eq. iblank) goto 100

but many Fortran programmers (and a few  compilers)  believe

this  line  is  illegal.   The  language  at  one's disposal

strongly influences how one thinks about a problem.


_T_h_e ``_f_o_r'' _S_t_a_t_e_m_e_n_t


     The  _f_o_r  statement  is  another  Ratfor  loop,   which

attempts  to  carry the separation of loop-body from reason-

for-looping a step further than the _w_h_i_l_e. A  _f_o_r  statement

allows  explicit  initialization and increment steps as part

of the statement.  For example, a DO loop is just


     for (i = 1; i <= n; i = i + 1) ...

This is equivalent to


     i = 1
     while (i <= n) {
          ...
          i = i + 1
     }

The initialization and increment of i have been  moved  into

the  for statement, making it easier to see at a glance what

controls the loop.


     The _f_o_r and _w_h_i_l_e versions have the advantage that they

will  be  done  zero  times if _n is less than 1; this is not

true of the _d_o.


     The loop of the sine routine in  the  previous  section

can be re-written with a for as


   for (i=3; abs(term) > e & i < 100;
       i=i+2) {
        term = -term * x**2 / float(i*(i-1))
        sin = sin + term
   }


     The syntax of the _f_o_r statement is


     for ( _i_n_i_t ; _c_o_n_d_i_t_i_o_n ; _i_n_c_r_e_m_e_n_t )
          _R_a_t_f_o_r _s_t_a_t_e_m_e_n_t

_i_n_i_t is any single Fortran statement, which gets  done  once

before  the  loop  begins.   _i_n_c_r_e_m_e_n_t is any single Fortran

statement, which gets done at the end of each  pass  through

the loop, before the test.  _c_o_n_d_i_t_i_o_n is again anything that

is legal in a logical IF. Any of _i_n_i_t, _c_o_n_d_i_t_i_o_n, and _i_n_c_r_e_-

_m_e_n_t  may be omitted, although the semicolons _m_u_s_t always be

present.  A non-existent  _c_o_n_d_i_t_i_o_n  is  treated  as  always

true,  so  _f_o_r(;;)  is  an  indefinite repeat.  (But see the

_r_e_p_e_a_t-_u_n_t_i_l in the next section.)


     The _f_o_r statement is particularly useful  for  backward

loops,  chaining  along lists, loops that might be done zero

times, and similar things which are hard to express  with  a

DO statement, and obscure to write out with IF's and GOTO's.

For example, here is a backwards DO loop to  find  the  last

non-blank character on a card:


     for (i = 80; i > 0; i = i - 1)
          if (card(i) != blank)
               break

(``!='' is the same as ``.NE.''). The code scans the columns

from  80 through to 1.  If a non-blank is found, the loop is

immediately broken.  (break  and  next  work  in  for's  and

while's just as in do's). If i reaches zero, the card is all

blank.


     This code is rather nasty to write with a regular  For-

tran  DO, since the loop must go forward, and we must expli-

citly set up proper conditions when we fall out of the loop.

(Forgetting this is a common error.) Thus:


    DO 10 J = 1, 80
       I = 81 - J
       IF (CARD(I) .NE. BLANK) GO TO 11
 10 CONTINUE
    I = 0
 11 ...

The version that uses the for handles the termination condition

properly  for  free; i is zero when we fall out of the

for loop.


     The increment in a _f_o_r need not be an  arithmetic  pro-

gression;  the  following program walks along a list (stored

in an integer array _p_t_r) until  a  zero  pointer  is  found,

adding up elements from a parallel array of values:


     sum = 0.0
     for (i = first; i > 0; i = ptr(i))
          sum = sum + value(i)

Notice that the code works correctly if the list  is  empty.

Again,  placing the test at the top of a loop instead of the

bottom eliminates a potential boundary error.
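
     The empty-list case is easy to confirm with a Python
transcription of the loop.  The function name list_sum and the sample
arrays are made up for the illustration; index 0 is left unused so
that the indices match Fortran's 1-based arrays.

```python
def list_sum(first, ptr, value):
    """Add up value[i] along the chain first -> ptr[first] -> ... until 0."""
    total = 0.0
    i = first
    while i > 0:              # test at the top: an empty list does no work
        total += value[i]
        i = ptr[i]            # the "increment" is a pointer step, not i + 1
    return total


ptr = [0, 3, 0, 2]            # element 1 links to 3, 3 links to 2, 2 ends the list
value = [0.0, 1.5, 2.5, 4.0]
print(list_sum(1, ptr, value))   # walks 1 -> 3 -> 2
print(list_sum(0, ptr, value))   # empty list
```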


_T_h_e ``_r_e_p_e_a_t-_u_n_t_i_l'' _s_t_a_t_e_m_e_n_t


     In spite of the dire warnings, there are times when one

really  needs a loop that tests at the bottom after one pass

through.  This service is provided by the repeat-until:


     repeat
          Ratfor statement
     until (legal Fortran condition)

The Ratfor statement part is done once, then  the  condition

is  evaluated.   If it is true, the loop is exited; if it is

false, another pass is made.


     The _u_n_t_i_l part is optional, so a  bare  _r_e_p_e_a_t  is  the

cleanest  way to specify an infinite loop.  Of course such a

loop must ultimately be broken by some transfer  of  control

such  as _s_t_o_p, _r_e_t_u_r_n, or _b_r_e_a_k, or an implicit stop such as

running out of input with a READ statement.


     As a  matter  of  observed  fact[8],  the  repeat-until

statement is much less used than the other looping constructions;

in particular, it is typically outnumbered ten to one

by for and while.  Be cautious about using it, because loops that

test only at the bottom often don't handle null cases well.


_M_o_r_e _o_n _b_r_e_a_k _a_n_d _n_e_x_t


     _b_r_e_a_k  exits  immediately  from  _d_o,  _w_h_i_l_e,  _f_o_r,  and

_r_e_p_e_a_t-_u_n_t_i_l.  _n_e_x_t  goes  to the test part of _d_o, _w_h_i_l_e and

_r_e_p_e_a_t-_u_n_t_i_l, and to the increment step of a _f_o_r.


``_r_e_t_u_r_n'' _S_t_a_t_e_m_e_n_t


     The standard Fortran mechanism for  returning  a  value

from  a function uses the name of the function as a variable

which can be assigned to; the last value stored in it is the

function  value upon return.  For example, here is a routine

_e_q_u_a_l which returns 1 if two arrays are identical, and  zero

if  they  differ.   The array ends are marked by the special

value -1.


 # equal - compare str1 to str2;
 #  return 1 if equal, 0 if not
    integer function equal(str1, str2)
    integer str1(100), str2(100)
    integer i

    for (i = 1; str1(i) == str2(i); i = i + 1)
       if (str1(i) == -1) {
          equal = 1
          return
       }
    equal = 0
    return
    end


     In many languages (e.g., PL/I) one instead says


     return (_e_x_p_r_e_s_s_i_o_n)

to return a value from a  function.   Since  this  is  often

clearer,  Ratfor  provides  such  a  return statement - in a

function F, return(expression) is equivalent to


     { F = expression; return }

For example, here is _e_q_u_a_l again:


 # equal - compare str1 to str2;
 #  return 1 if equal, 0 if not
    integer function equal(str1, str2)
    integer str1(100), str2(100)
    integer i

    for (i = 1; str1(i) == str2(i); i = i + 1)
       if (str1(i) == -1)
          return(1)
    return(0)
    end

If there is no parenthesized expression after return, a normal

RETURN is made.  (Another version of equal is presented

shortly.)


_C_o_s_m_e_t_i_c_s


     As we said above, the visual appearance of  a  language

has  a  substantial  effect  on  how  easy it is to read and

understand programs.  Accordingly, Ratfor provides a  number

of  cosmetic  facilities  which may be used to make programs

more readable.


_F_r_e_e-_f_o_r_m _I_n_p_u_t


     Statements can be  placed  anywhere  on  a  line;  long

statements  are  continued automatically, as are long condi-

tions in _i_f, _w_h_i_l_e, _f_o_r, and _u_n_t_i_l. Blank lines are ignored.

Multiple  statements  may  appear  on  one line, if they are

separated by semicolons.  No semicolon is needed at the  end

of  a  line,  if Ratfor can make some reasonable guess about

whether the statement ends there.  Lines ending with any  of

the characters


     =    +    -    *    ,    |    &    (    _

are assumed to be continued on the next  line.   Underscores

are discarded wherever they occur; all the other characters

remain as part of the statement.
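

     The continuation rule can be sketched in Python, used here

only for illustration.  The helper join_continued is hypothetical;

it ignores semicolons, comments and quoted strings, which a real

translator must also handle.


```python
CONTINUATION_CHARS = set("=+-*,|&(_")


def join_continued(lines):
    """Join physical lines into statements using the trailing-character rule."""
    statements, current = [], ""
    for line in lines:
        text = line.strip()
        if not text:
            continue                      # blank lines are ignored
        current = f"{current} {text}".strip()
        if text[-1] in CONTINUATION_CHARS:
            continue                      # statement carries on to the next line
        statements.append(current.replace("_", ""))  # underscores are dropped
        current = ""
    if current:                           # input ended in mid-statement
        statements.append(current.replace("_", ""))
    return statements
```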


     Any statement that begins with an all-numeric field  is

assumed  to  be  a  Fortran label, and placed in columns 1-5

upon output.  Thus


     write(6, 100); 100 format("hello")

is converted into


          write(6, 100)
     100  format(5hhello)
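

     As a rough illustration in Python, the label placement might

be written as follows.  The helper place_label is hypothetical, and

left-justifying the label in columns 1-5 is an assumption.


```python
import re


def place_label(stmt):
    """Put a leading all-numeric field in columns 1-5; code starts in column 7."""
    m = re.match(r"\s*(\d+)\s+(.*)", stmt)
    if m:
        label, rest = m.group(1), m.group(2)
    else:
        label, rest = "", stmt.strip()
    # Five columns for the label, a blank in column 6, then the statement.
    return f"{label:<5} {rest}"
```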


_T_r_a_n_s_l_a_t_i_o_n _S_e_r_v_i_c_e_s


     Text enclosed in matching single or  double  quotes  is

converted  to  _n_H...  but is otherwise unaltered (except for

formatting - it may get split across card boundaries  during

the  reformatting  process).   Within  quoted  strings,  the

backslash `\' serves as an escape character: the next  char-

acter is taken literally.  This provides a way to get quotes

(and of course the backslash itself) into quoted strings:


     "\\\'"

is a string containing a backslash and an apostrophe.  (This

is  _n_o_t the standard convention of doubled quotes, but it is

easier to use and more general.)
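

     A minimal Python sketch of this conversion follows.  The

helper to_hollerith is hypothetical; it handles one quoted string

and ignores the splitting across card boundaries mentioned above.


```python
def to_hollerith(quoted):
    """Convert a quoted Ratfor string to an nH... constant."""
    if len(quoted) < 2 or quoted[0] not in "'\"" or quoted[-1] != quoted[0]:
        raise ValueError("expected text in matching quotes")
    body = quoted[1:-1]
    chars, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            i += 1                   # backslash: take the next character literally
        chars.append(body[i])
        i += 1
    return f"{len(chars)}h{''.join(chars)}"
```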


     Any line that begins with the  character  `%'  is  left

absolutely  unaltered  except  for stripping off the `%' and

moving the line one position to the left.   This  is  useful

for  inserting  control  cards, and other things that should

not be transmogrified (like an  existing  Fortran  program).

Use  `%' only for ordinary statements, not for the condition

parts of _i_f, _w_h_i_l_e, etc., or the output may come out  in  an

unexpected place.


     The following character translations are  made,  except

within single or double quotes or on a line beginning with a

`%'.


     ==   .eq.      !=   .ne.
     >    .gt.      >=   .ge.
     <    .lt.      <=   .le.
     &    .and.     |    .or.
     !    .not.     ^    .not.

In addition, the following  translations  are  provided  for

input devices with restricted character sets.


     [    {         ]    }
     $(   {         $)   }
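

     A Python sketch of the first table follows, for illustration

only.  The names OPERATORS and translate_operators are hypothetical,

and the bracket translations of the second table are omitted.


```python
import re

OPERATORS = {
    "==": ".eq.", "!=": ".ne.", ">=": ".ge.", "<=": ".le.",
    ">": ".gt.", "<": ".lt.", "&": ".and.", "|": ".or.",
    "!": ".not.", "^": ".not.",
}

# A quoted string (with backslash escapes) is matched first and kept as is.
QUOTED = r'"(?:\\.|[^"\\])*"' + "|" + r"'(?:\\.|[^'\\])*'"
# Two-character operators come first so ">=" is not read as ">" then "=".
SYMBOL = r"==|!=|>=|<=|[><&|!^]"
TOKEN = re.compile(f"({QUOTED})|({SYMBOL})")


def translate_operators(line):
    """Translate Ratfor operators outside quotes; leave % lines alone."""
    if line.startswith("%"):
        return line

    def swap(m):
        return m.group(1) if m.group(1) else OPERATORS[m.group(2)]

    return TOKEN.sub(swap, line)
```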


``_d_e_f_i_n_e'' _S_t_a_t_e_m_e_n_t


     Any string of alphanumeric characters can be defined as

a  name;  thereafter, whenever that name occurs in the input

(delimited by non-alphanumerics) it is replaced by the  rest

of the definition line.  (Comments and trailing white space

are stripped off.)  A defined name can be arbitrarily  long,

and must begin with a letter.


     _d_e_f_i_n_e is typically used to create symbolic parameters:


     define    ROWS 100
     define    COLS 50
     dimension a(ROWS), b(ROWS, COLS)
          if (i > ROWS  |  j > COLS) ...

Alternatively, definitions may be written as


     define(ROWS, 100)

In this case, the defining  text  is  everything  after  the

comma  up  to  the  balancing right parenthesis; this allows

multi-line definitions.
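

     The substitution can be sketched in Python, for illustration

only.  The helpers parse_define and apply_defines are hypothetical;

they cover only the one-line form, and they do not rescan the

replacement text for further defined names.


```python
import re


def parse_define(line):
    """Split 'define NAME text' into (NAME, text), dropping a trailing comment."""
    m = re.fullmatch(
        r"\s*define\s+([A-Za-z][A-Za-z0-9]*)\s+(.*?)\s*(?:#.*)?", line)
    if m is None:
        raise ValueError("not a one-line define")
    return m.group(1), m.group(2)


def apply_defines(text, defines):
    """Replace each defined name that is delimited by non-alphanumerics."""
    def swap(m):
        word = m.group(0)
        return defines.get(word, word)

    # The lookbehind stops a match from starting in the middle of a word.
    return re.sub(r"(?<![A-Za-z0-9])[A-Za-z][A-Za-z0-9]*", swap, text)
```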


     It is generally a wise practice to use symbolic parame-

ters  for most constants, to help make clear the function of

what would otherwise be mysterious numbers.  As an  example,

here  is  the  routine  _e_q_u_a_l again, this time with symbolic

constants.


   define   YES      1
   define   NO       0
   define   EOS      -1
   define   ARB      100

   # equal - compare str1 to str2;
   #  return YES if equal, NO if not
      integer function equal(str1, str2)
      integer str1(ARB), str2(ARB)
      integer i

      for (i = 1; str1(i) == str2(i); i = i + 1)
         if (str1(i) == EOS)
            return(YES)
      return(NO)
      end


``_i_n_c_l_u_d_e'' _S_t_a_t_e_m_e_n_t


     The statement


          include file

inserts the file found on input stream _f_i_l_e into the  Ratfor

input in place of the _i_n_c_l_u_d_e statement.  The standard usage

is to place COMMON blocks on a file, and _i_n_c_l_u_d_e  that  file

whenever a copy is needed:


     subroutine x
          include commonblocks
          ...
          end

     subroutine y
          include commonblocks
          ...
          end

This ensures that all copies of the COMMON blocks are

identical.
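

     A Python sketch of the mechanism follows, for illustration

only.  The helper expand_includes is hypothetical; the dictionary

files stands in for the input streams, and expanding nested

includes is an assumption.


```python
def expand_includes(lines, files):
    """Replace each 'include name' line by the lines of the named file.

    files maps a file name to its list of lines; it stands in for the
    input streams a real translator would open.
    """
    out = []
    for line in lines:
        words = line.split()
        if len(words) == 2 and words[0] == "include":
            # Expand the named file in place, including anything it includes.
            out.extend(expand_includes(files[words[1]], files))
        else:
            out.append(line)
    return out
```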


_P_i_t_f_a_l_l_s, _B_o_t_c_h_e_s, _B_l_e_m_i_s_h_e_s _a_n_d _o_t_h_e_r _F_a_i_l_i_n_g_s


     Ratfor catches certain syntax errors, such  as  missing

braces,  _e_l_s_e clauses without an _i_f, and most errors involv-

ing missing parentheses in statements.  Beyond  that,  since

Ratfor  knows  no  Fortran,  any  errors  you  make  will be

reported by the Fortran compiler, so you will from  time  to

time  have to relate a Fortran diagnostic back to the Ratfor

source.


     Keywords are reserved - using _i_f, _e_l_s_e, etc., as  vari-

able  names  will typically wreak havoc.  Don't leave spaces

in keywords.  Don't use the Arithmetic IF.


     The Fortran _n_H convention is not recognized anywhere by

Ratfor; use quotes instead.


_3.  _I_M_P_L_E_M_E_N_T_A_T_I_O_N


     Ratfor was originally  written  in  C[4]  on  the  UNIX

operating system[5].  The language is specified by a

context-free grammar and the compiler constructed using the YACC

compiler-compiler[6].


     The Ratfor grammar is simple and straightforward, being

essentially


     prog : stat
          | prog   stat
     stat : if (...) stat
          | if (...) stat else stat
          | while (...) stat
          | for (...; ...; ...) stat
          | do ... stat
          | repeat stat
          | repeat stat until (...)
          | switch (...) { case ...: prog ...
                    default: prog }
          | return
          | break
          | next
          | digits   stat
          | { prog }
          | anything unrecognizable

The  observation  that  Ratfor  knows  no  Fortran   follows

directly  from  the rule that says a statement is ``anything

unrecognizable''.  In fact most of Fortran falls  into  this

category,  since  any statement that does not begin with one

of the keywords is by definition ``unrecognizable.''
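

     The dispatch on the first token can be sketched in Python,

for illustration only.  The names KEYWORDS and classify are

hypothetical, and the keyword list is simply read off the grammar

above.


```python
import re

KEYWORDS = {"if", "else", "while", "for", "do", "repeat", "until",
            "switch", "case", "default", "return", "break", "next"}


def classify(stmt):
    """Name the grammar alternative suggested by the first token of stmt."""
    text = stmt.lstrip()
    if text.startswith("{"):
        return "group"
    m = re.match(r"[A-Za-z][A-Za-z0-9]*|\d+", text)
    if m is None:
        return "other"
    token = m.group(0)
    if token.isdigit():
        return "label"               # digits followed by a statement
    if token in KEYWORDS:
        return token
    return "other"                   # "anything unrecognizable": copied through
```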


     Code generation is also simple.  If the first thing  on

a  source  line  is  not a keyword (like _i_f, _e_l_s_e, etc.) the

entire  statement  is  simply  copied  to  the  output  with

appropriate  character translation and formatting.  (Leading

digits are treated as a label.) Keywords cause only slightly

more  complicated  actions.   For example, when _i_f is recog-

nized, two consecutive labels L and L+1  are  generated  and

the  value of L is stacked.  The condition is then isolated,

and the code


     if (.not. (condition)) goto L

is output.  The _s_t_a_t_e_m_e_n_t part of the _i_f is then translated.

When  the  end of the statement is encountered (which may be

some distance away and include nested if's, of course),  the

code


     L    continue

is generated, unless there is an _e_l_s_e clause, in which  case

the code is


          goto L+1
     L    continue

In this latter case, the code


     L+1  continue

is produced after the _s_t_a_t_e_m_e_n_t part of the _e_l_s_e. Code  gen-

eration for the various loops is equally simple.
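

     The scheme for if and else can be sketched in Python, for

illustration only.  The class IfGenerator and its method names are

hypothetical, as is the starting label of 100; the real translator

drives equivalent actions from its parser.


```python
import itertools


class IfGenerator:
    """Emit Fortran-like lines for if/else using a stack of labels."""

    def __init__(self, first_label=100):
        self.labels = itertools.count(first_label, 2)  # each if takes L and L+1
        self.stack = []
        self.out = []

    def statement(self, text):
        self.out.append(f"      {text}")

    def begin_if(self, condition):
        label = next(self.labels)
        self.stack.append(label)                       # the value of L is stacked
        self.out.append(f"      if (.not. ({condition})) goto {label}")

    def begin_else(self):
        label = self.stack[-1]
        self.out.append(f"      goto {label + 1}")     # skip over the else part
        self.out.append(f"{label:<5d} continue")

    def end_if(self, had_else=False):
        label = self.stack.pop()
        target = label + 1 if had_else else label
        self.out.append(f"{target:<5d} continue")
```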


     One might argue that more care should be taken in  code

generation.  For example, if there is no trailing _e_l_s_e,


          if (i > 0) x = a

should be left alone, not converted into


          if (.not. (i .gt. 0)) goto 100
          x = a
     100  continue

But what are optimizing compilers for,  if  not  to  improve

code?   It  is  a  rare  program  indeed  where this kind of

``inefficiency'' will make even a measurable difference.  In

the few cases where it is important, the offending lines can

be protected by `%'.


     The use of a compiler-compiler is definitely  the  pre-

ferred  method  of  software  development.   The language is

well-defined, with few syntactic irregularities.   Implemen-

tation is quite simple; the original construction took under

a week.  The language is sufficiently simple, however,  that

an  _a_d  _h_o_c  recognizer can be readily constructed to do the

same job if no compiler-compiler is available.


     The C version of Ratfor is used  on  UNIX  and  on  the

Honeywell  GCOS  systems.   C  compilers  are  not as widely

available as Fortran, however, so there  is  also  a  Ratfor

written  in  itself  and  originally bootstrapped with the C

version.  The Ratfor version was written so as to  translate

into  the portable subset of Fortran described in [1], so it

is portable, having been run essentially without  change  on

at  least  twelve distinct machines.  (The main restrictions

of the portable subset are: only one character  per  machine

word;  subscripts in the form _c*_v±_c; avoiding expressions in

places like DO loops;  consistency  in  subroutine  argument

usage,  and  in COMMON declarations.  Ratfor itself will not

gratuitously generate non-standard Fortran.)


     The Ratfor version is about 1500 lines of Ratfor  (com-

pared  to  about  1000  lines of C); this compiles into 2500

lines of Fortran.  This expansion ratio is  somewhat  higher

than  average,  since the compiled code contains unnecessary

occurrences of COMMON declarations.  The execution  time  of

the  Ratfor  version  is dominated by two routines that read

and write cards.  Clearly these routines could  be  replaced

by  machine  coded  local versions; unless this is done, the

efficiency of other parts  of  the  translation  process  is

largely irrelevant.


_4.  _E_X_P_E_R_I_E_N_C_E


_G_o_o_d _T_h_i_n_g_s


     ``It's so much better than Fortran'' is the most common

response  of  users  when  asked how well Ratfor meets their

needs.  Although cynics might consider this to  be  vacuous,

it does seem to be true that decent control flow and

cosmetics convert Fortran from a bad language into quite a

reasonable one, assuming that Fortran data structures are

adequate for the task at hand.


     Although there are no quantitative results, users  feel

that  coding  in Ratfor is at least twice as fast as in For-

tran.  More important, debugging and subsequent revision are

much  faster than in Fortran.  Partly this is simply because

the code can be _r_e_a_d.  The looping statements which test  at

the  top instead of the bottom seem to eliminate or at least

reduce the occurrence of a wide class  of  boundary  errors.

And  of  course  it  is easy to do structured programming in

Ratfor; this self-discipline also  contributes  markedly  to

reliability.


     One interesting and encouraging fact is  that  programs

written in Ratfor tend to be as readable as programs written

in more modern languages like Pascal.   Once  one  is  freed

from  the  shackles  of  Fortran's clerical detail and rigid

input format, it is easy to write  code  that  is  readable,

even  esthetically  pleasing.  For example, here is a Ratfor

implementation of the linear table search discussed by Knuth

[7]:


     A(m+1) = x
     for (i = 1; A(i) != x; i = i + 1)
          ;
     if (i > m) {
          m = i
          B(i) = 1
     }
     else
          B(i) = B(i) + 1

A large corpus (5400 lines) of Ratfor, including a subset of

the Ratfor preprocessor itself, can be found in [8].


_B_a_d _T_h_i_n_g_s


     The biggest single problem is that many Fortran  syntax

errors  are  not detected by Ratfor but by the local Fortran

compiler.  The compiler then prints a message  in  terms  of

the generated Fortran, and in a few cases this may be difficult

to relate back to the offending Ratfor line, especially

if  the implementation conceals the generated Fortran.  This

problem could be dealt with by tagging each  generated  line

with some indication of the source line that created it, but

this is inherently implementation-dependent,  so  no  action

has  yet  been taken.  Error message interpretation is actu-

ally not so arduous as might be thought.  Since Ratfor  gen-

erates  no  variables,  only  a  simple  pattern of IF's and

GOTO's, data-related errors like  missing  DIMENSION  state-

ments  are  easy to find in the Fortran.  Furthermore, there

has been a steady improvement in Ratfor's ability  to  catch

trivial  syntactic  errors  like  unbalanced parentheses and

quotes.
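

     One possible tagging scheme, sketched in Python for

illustration only, uses columns 73-80 of each generated line, which

fixed-form Fortran compilers do not interpret.  The helper tag_line

is hypothetical, and nothing here describes an existing Ratfor

option.


```python
def tag_line(fortran_line, source_line_number):
    """Pad a generated line to column 72 and put the Ratfor line number in 73-80."""
    if len(fortran_line) > 72:
        raise ValueError("statement text must fit in columns 1-72")
    return f"{fortran_line:<72}{source_line_number:>8d}"
```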


     There are a number of  implementation  weaknesses  that

are  a nuisance, especially to new users.  For example, key-

words are  reserved.   This  rarely  makes  any  difference,

except  for  those hardy souls who want to use an Arithmetic

IF. A few standard Fortran constructions are not accepted by

Ratfor,  and  this is perceived as a problem by users with a

large corpus of existing Fortran programs.  Protecting every

line  with a `%' is not really a complete solution, although

it serves as a stop-gap.  The  best  long-term  solution  is

provided by the program Struct [9], which converts arbitrary

Fortran programs into Ratfor.


     Users who export programs often complain that the generated

Fortran is ``unreadable'' because it is not tastefully

formatted and contains extraneous CONTINUE statements.

To  some  extent  this can be ameliorated (Ratfor now has an

option to copy Ratfor comments into the generated  Fortran),

but  it has always seemed that effort is better spent on the

input language than on the output esthetics.


     One final problem is partly attributable to  success  -

since  Ratfor  is  relatively  easy to modify, there are now

several dialects of Ratfor.  Fortunately, so far most of the

differences  are  in  character set, or in invisible aspects

like code generation.


_5.  _C_O_N_C_L_U_S_I_O_N_S


     Ratfor demonstrates that with modest effort it is  pos-

sible  to  convert  Fortran from a bad language into quite a

good one.  A preprocessor is clearly a useful way to  extend

or ameliorate the facilities of a base language.


     When designing a language, it is important  to  concen-

trate  on  the  essential  requirement of providing the user

with the best language possible for  a  given  effort.   One

must  avoid throwing in ``features'' - things which the user

may trivially construct within the existing framework.


     One must also avoid getting sidetracked  on  irrelevan-

cies.  For instance it seems pointless for Ratfor to prepare

a neatly formatted listing of either its input or  its  out-

put.   The user is presumably capable of the self-discipline

required to prepare neat input that reflects  his  thoughts.

It is much more important that the language provide free-form

input so he _c_a_n format it neatly.  No one  should  read

the output anyway except in the most dire circumstances.


_A_c_k_n_o_w_l_e_d_g_e_m_e_n_t_s


     C. A. R. Hoare once said that ``One thing [the language

designer]  should  not do is to include untried ideas of his

own.'' Ratfor follows this precept very closely - everything

in  it  has been stolen from someone else.  Most of the con-

trol flow structures are taken directly  from  the  language

C[4]  developed by Dennis Ritchie; the comment and continua-

tion conventions are adapted from Altran[10].


     I am grateful to Stuart Feldman, whose patient  simula-

tion of an innocent user during the early days of Ratfor led

to several design improvements and the eradication of  bugs.

He  also  translated the C parse-tables and YACC parser into

Fortran for the first Ratfor version of Ratfor.


_R_e_f_e_r_e_n_c_e_s


[1]  B. G. Ryder, ``The PFORT Verifier,''  _S_o_f_t_w_a_r_e-_P_r_a_c_t_i_c_e

     & _E_x_p_e_r_i_e_n_c_e, October 1974.


[2]  American National Standard Fortran.  American  National

     Standards Institute, New York, 1966.


[3]  _F_o_r-_w_o_r_d: _F_o_r_t_r_a_n _D_e_v_e_l_o_p_m_e_n_t _N_e_w_s_l_e_t_t_e_r, August 1975.


[4]  B. W. Kernighan and D. M. Ritchie,  _T_h_e  _C  _P_r_o_g_r_a_m_m_i_n_g

     _L_a_n_g_u_a_g_e, Prentice-Hall, Inc., 1978.


[5]  D. M. Ritchie and K.  L.  Thompson,  ``The  UNIX  Time-

     sharing System.'' _C_A_C_M, July 1974.


[6]  S.  C.  Johnson,  ``YACC  -   Yet   Another   Compiler-

     Compiler.'' Bell Laboratories Computing Science Techni-

     cal Report #32, 1978.


[7]  D. E. Knuth, ``Structured Programming with goto  State-

     ments.'' _C_o_m_p_u_t_i_n_g _S_u_r_v_e_y_s, December 1974.


[8]  B. W. Kernighan and  P.  J.  Plauger,  _S_o_f_t_w_a_r_e  _T_o_o_l_s,

     Addison-Wesley, 1976.


[9]  B. S. Baker, ``Struct - A Program which Structures

     Fortran.'' Bell Laboratories internal memorandum, December

     1975.


[10] A. D. Hall, ``The Altran System for  Rational  Function

     Manipulation - A Survey.'' _C_A_C_M, August 1971.


_A_p_p_e_n_d_i_x: _U_s_a_g_e _o_n _U_N_I_X _a_n_d _G_C_O_S.


     Beware - local  customs  vary.   Check  with  a  native

before going into the jungle.


_U_N_I_X


     The program _r_a_t_f_o_r is the basic  translator;  it  takes

either a list of file names or the standard input and writes

Fortran on the standard output.  Options include -_6_x,  which

uses  _x as a continuation character in column 6 (UNIX uses &

in column 1), and -_C, which causes  Ratfor  comments  to  be

copied into the generated Fortran.


     The program _r_c provides an interface to the _r_a_t_f_o_r com-

mand which is much the same as _c_c. Thus


     rc [options] files

compiles the files specified by _f_i_l_e_s. Files with names end-

ing  in  ._r are Ratfor source; other files are assumed to be

for the loader.  The flags -_C and -_6_x  described  above  are

recognized, as are


     -c   compile only; don't load
     -f   save intermediate Fortran .f files
     -r   Ratfor only; implies -c and -f
     -2   use big Fortran compiler (for large programs)
     -U   flag undeclared variables (not universally available)

Other flags are passed on to the loader.


_G_C_O_S


     The program ./_r_a_t_f_o_r is the  bare  translator,  and  is

identical  to the UNIX version, except that the continuation

convention is & in column 6.  Thus


     ./ratfor  files  >output

translates the Ratfor source on _f_i_l_e_s and collects the  gen-

erated Fortran on file `output' for subsequent processing.


     ./_r_c provides much the same services as _r_c (within  the

limitations  of GCOS), regrettably with a somewhat different

syntax.  Options recognized by ./_r_c include


     name        Ratfor source or library, depending on type
     h=/name     make TSS H* file (runnable version); run as /name
     r=/name     update and use random library
     a=          compile as ascii (default is bcd)
     C=          copy comments into Fortran
     f=name      Fortran source file
     g=name      gmap source file

Other  options  are  as  specified  for  the  ./_c_c   command

described in [4].


_T_S_O, _T_S_S, _a_n_d _o_t_h_e_r _s_y_s_t_e_m_s


     Ratfor exists on various other systems; check with  the

author for specifics.