A Tour Through the Portable C Compiler

S. C. Johnson
AT&T Bell Laboratories

Donn Seeley
Department of Computer Science
University of Utah

_A_B_S_T_R_A_C_T

Since its introduction, the Portable C Compiler has become the standard UNIX C compiler for many machines. Three quarters or more of the code in the compiler is machine independent, and much of the rest can be generated easily using knowledge of the target architecture. This paper describes the structure and organization of the compiler and tries to further simplify the job of the compiler porter.

This document originally appeared with the Seventh Edition of UNIX, and has been revised and extended for publication with the Fourth Berkeley Software Distribution. The new material covers changes which have been made in the compiler since the Seventh Edition, and includes some discussion of secondary topics which were thought to be of interest in future ports of the compiler.

Revised April, 1986

_I_n_t_r_o_d_u_c_t_i_o_n

A C compiler has been implemented that has proved to be quite portable, serving as the basis for C compilers on roughly a dozen machines, including the DEC VAX, Honeywell 6000, IBM 370, and Interdata 8/32. The compiler is highly compatible with the C language standard. [ Kernighan Ritchie Prentice 1978 ]

September 27, 1987

Among the goals of this compiler are portability, high reliability, and the use of state-of-the-art techniques and tools wherever practical. Although the efficiency of the compiling process is not a primary goal, the compiler is efficient enough, and produces good enough code, to serve as a production compiler. The language implemented is highly compatible with the current PDP-11 version of C. Moreover, roughly 75% of the compiler, including nearly all the syntactic and semantic routines, is machine independent. The compiler also serves as the major portion of the program _l_i_n_t, described elsewhere.
[ Johnson Lint Program Checker ]

A number of earlier attempts to make portable compilers are worth noting. While on CO-OP assignment to Bell Labs in 1973, Alan Snyder wrote a portable C compiler which was the basis of his Master's Thesis at M.I.T. [ Snyder Portable C Compiler ] This compiler was very slow and complicated, and contained a number of rather serious implementation difficulties; nevertheless, a number of Snyder's ideas appear in this work.

Most earlier portable compilers, including Snyder's, have proceeded by defining an intermediate language, perhaps based on three-address code or code for a stack machine, and writing a machine independent program to translate from the source code to this intermediate code. The intermediate code is then read by a second pass, and interpreted or compiled. This approach is elegant, and has a number of advantages, especially if the target machine is far removed from the host. It suffers from some disadvantages as well. Some constructions, like initialization and subroutine prologs, are difficult or expensive to express in a machine independent way that still allows them to be easily adapted to the target assemblers. Most of these approaches require a symbol table to be constructed in the second (machine dependent) pass, and/or require powerful target assemblers. Also, many conversion operators may be generated that have no effect on a given machine, but may be needed on others (for example, pointer to pointer conversions usually do nothing in C, but must be generated because there are some machines where they are significant).

For these reasons, the first pass of the portable compiler is not entirely machine independent. It contains some machine dependent features, such as initialization, subroutine prolog and epilog, certain storage allocation functions, code for the _s_w_i_t_c_h statement, and code to throw out unneeded conversion operators.
As a crude measure of the degree of portability actually achieved, the Interdata 8/32 C compiler has roughly 600 machine dependent lines of source out of 4600 in Pass 1, and 1000 out of 3400 in Pass 2. In total, 1600 out of 8000, or 20%, of the total source is machine dependent (12% in Pass 1, 30% in Pass 2). These percentages can be expected to rise slightly as the compiler is tuned. The percentage of machine dependent code for the IBM is 22%, for the Honeywell 25%. If the assembler format and structure were the same for all these machines, perhaps another 5-10% of the code would become machine independent.

These figures are sufficiently misleading as to be almost meaningless. A large fraction of the machine dependent code can be converted in a straightforward, almost mechanical way. On the other hand, a certain amount of the code requires hard intellectual effort to convert, since the algorithms embodied in this part of the code are typically complicated and machine dependent. To summarize, however, if you need a C compiler written for a machine with a reasonable architecture, the compiler is already three quarters finished!

_O_v_e_r_v_i_e_w

This paper discusses the structure and organization of the portable compiler. The intent is to give the big picture, rather than discussing the details of a particular machine implementation. After a brief overview and a discussion of the source file structure, the paper describes the major data structures, and then delves more closely into the two passes. Some of the theoretical work on which the compiler is based, and its application to the compiler, is discussed elsewhere. [ johnson portable theory practice ] One of the major design issues in any C compiler, the design of the calling sequence and stack frame, is the subject of a separate memorandum.
[ johnson lesk ritchie calling sequence ]

The compiler consists of two passes, _p_a_s_s_1 and _p_a_s_s_2, that together turn C source code into assembler code for the target machine. The two passes are preceded by a preprocessor, that handles the #_d_e_f_i_n_e and #_i_n_c_l_u_d_e statements, and related features (e.g., #_i_f_d_e_f, etc.). The two passes may optionally be followed by a machine dependent code improver.

The output of the preprocessor is a text file that is read as the standard input of the first pass. This produces as standard output another text file that becomes the standard input of the second pass. The second pass produces, as standard output, the desired assembler language source code. The code improver, if used, converts the assembler code to more effective code, and the result is passed to the assembler. The preprocessor and the two passes all write error messages on the standard error file. Thus the compiler itself makes few demands on the I/O library support, aiding in the bootstrapping process.

The division of the compiler into two passes is somewhat artificial. The compiler can optionally be loaded so that both passes operate in the same program. This ``one pass'' operation eliminates the overhead of reading and writing the intermediate file, so the compiler operates about 30% faster in this mode. It also occupies about 30% more space than the larger of the two component passes. This ``one pass'' compiler is the standard version on machines with large address spaces, such as the VAX. Because the compiler is fundamentally structured as two passes, even when loaded as one, this document primarily describes the two pass version.

The first pass does the lexical analysis, parsing, and symbol table maintenance. It also constructs parse trees for expressions, and keeps track of the types of the nodes in these trees. Additional code is devoted to initialization.
Machine dependent portions of the first pass serve to generate subroutine prologs and epilogs, code for switches, and code for branches, label definitions, alignment operations, changes of location counter, etc.

The intermediate file is a text file organized into lines. Lines beginning with a right parenthesis are copied by the second pass directly to its output file, with the parenthesis stripped off. Thus, when the first pass produces assembly code, such as subroutine prologs, etc., each line is prefaced with a right parenthesis; the second pass passes these lines through to the assembler.

The major job done by the second pass is generation of code for expressions. The expression parse trees produced in the first pass are written onto the intermediate file in Polish prefix form: first, there is a line beginning with a period, followed by the source file line number and name on which the expression appeared (for debugging purposes). The successive lines represent the nodes of the parse tree, one node per line. Each line contains the node number, type, and any values (e.g., values of constants) that may appear in the node. Lines representing nodes with descendants are immediately followed by the left subtree of descendants, then the right. Since the number of descendants of any node is completely determined by the node number, there is no need to mark the end of the tree.

There are only two other line types in the intermediate file. Lines beginning with a left square bracket (`[') represent the beginning of blocks (delimited by { ... } in the C source); lines beginning with right square brackets (`]') represent the end of blocks. The remainder of these lines tell how much stack space, and how many register variables, are currently in use.

Thus, the second pass reads the intermediate files, copies the `)' lines, makes note of the information in the `[' and `]' lines, and devotes most of its effort to the `.'
lines and their associated expression trees, turning them into assembly code to evaluate the expressions.

In the one pass version of the compiler, the expression trees contain information useful to both logical passes. Instead of writing the trees onto an intermediate file, each tree is transformed in place into an acceptable form for the code generator. The code generator then writes the result of compiling this tree onto the standard output. Instead of `[' and `]' lines in the intermediate file, the information is passed directly to the second pass routines. Assembly code produced by the first pass is simply written out, without the need for `)' at the head of each line.

_T_h_e _S_o_u_r_c_e _F_i_l_e_s

The compiler source consists of 25 source files. Several header files contain information which is needed across various source modules. _M_a_n_i_f_e_s_t._h has declarations for node types, type manipulation macros and other macros, and some global data definitions. _M_a_c_d_e_f_s._h has machine dependent definitions, such as the size and alignment of the various data representations. _C_o_n_f_i_g._h defines symbols which control the configuration of the compiler, including such things as the sizes of various tables and whether the compiler is ``one pass''. The compiler conditionally includes another file, _o_n_e_p_a_s_s._h, which contains definitions which are particular to a ``one pass'' compiler. _N_d_u._h defines the basic tree building structure which is used throughout the compiler to construct expression trees. _M_a_n_i_f_e_s_t._h includes a file of opcode and type definitions named _p_c_c_l_o_c_a_l._h; this file is automatically generated from a header file specific to the C compiler named _l_o_c_a_l_d_e_f_s._h and a public header file /_u_s_r/_i_n_c_l_u_d_e/_p_c_c._h. Another file, _p_c_c_t_o_k_e_n_s, is generated in a similar way and contains token definitions for the compiler's Yacc [ Johnson Yacc Compiler cstr ] grammar.
Two machine independent header files, _p_a_s_s_1._h and _p_a_s_s_2._h, contain the data structure and manifest definitions for the first and second passes, respectively. In the second pass, a machine dependent header file, _m_a_c_2_d_e_f_s._h, contains declarations of register names, etc.

_C_o_m_m_o_n._c contains machine independent routines used in both passes. These include routines for allocating and freeing trees, walking over trees, printing debugging information, and printing error messages. This file can be compiled in two flavors, one for pass 1 and one for pass 2, depending on what conditional compilation symbol is used.

Entire sections of this document are devoted to the detailed structure of the passes. For the moment, we just give a brief description of the files.

The first pass is obtained by compiling and loading _c_g_r_a_m._y, _c_o_d_e._c, _c_o_m_m_o_n._c, _l_o_c_a_l._c, _o_p_t_i_m._c, _p_f_t_n._c, _s_c_a_n._c, _s_t_a_b._c, _t_r_e_e_s._c and _x_d_e_f_s._c. _S_c_a_n._c is the lexical analyzer, which provides tokens to the bottom-up parser which is defined by the Yacc grammar _c_g_r_a_m._y. _X_d_e_f_s._c is a short file of external definitions. _P_f_t_n._c maintains the symbol table, and does initialization. _T_r_e_e_s._c builds the expression trees, and computes the node types. _O_p_t_i_m._c does some machine independent optimizations on the expression trees. _C_o_m_m_o_n._c contains service routines common to the two passes of the compiler. All the above files are machine independent. The files _l_o_c_a_l._c and _c_o_d_e._c contain machine dependent code for generating subroutine prologs, switch code, and the like. _S_t_a_b._c contains machine dependent code for producing external symbol table information which can drive a symbolic debugger.

The second pass is produced by compiling and loading _a_l_l_o._c, _c_o_m_m_o_n._c, _l_o_c_a_l_2._c, _m_a_t_c_h._c, _o_r_d_e_r._c, _r_e_a_d_e_r._c and _t_a_b_l_e._c.
_R_e_a_d_e_r._c reads the intermediate file, and controls the major logic of the code generation. _A_l_l_o._c keeps track of busy and free registers. _M_a_t_c_h._c controls the matching of code templates to subtrees of the expression tree to be compiled. _C_o_m_m_o_n._c defines certain service routines, as in the first pass. The above files are machine independent. _O_r_d_e_r._c controls the machine dependent details of the code generation strategy. _L_o_c_a_l_2._c has many small machine dependent routines, and tables of opcodes, register types, etc. _T_a_b_l_e._c has the code template tables, which are also clearly machine dependent.

_D_a_t_a _S_t_r_u_c_t_u_r_e _C_o_n_s_i_d_e_r_a_t_i_o_n_s

This section discusses the node numbers, type words, and expression trees, used throughout both passes of the compiler.

The file _m_a_n_i_f_e_s_t._h defines those symbols used throughout both passes. The intent is to use the same symbol name (e.g., MINUS) for the given operator throughout the lexical analysis, parsing, tree building, and code generation phases. _M_a_n_i_f_e_s_t._h obtains some of its definitions from two other header files, _l_o_c_a_l_d_e_f_s._h and _p_c_c._h. _L_o_c_a_l_d_e_f_s._h contains definitions for operator symbols which are specific to the C compiler. _P_c_c._h contains definitions for operators and types which may be used by other compilers to communicate with a portable code generator based on pass 2; this code generator will be described later.

A token like MINUS may be seen in the lexical analyzer before it is known whether it is a unary or binary operator; clearly, it is necessary to know this by the time the parse tree is constructed. Thus, an operator (really a macro) called UNARY is provided, so that MINUS and UNARY MINUS are both distinct node numbers. Similarly, many binary operators exist in an assignment form (for example, -=), and the operator ASG may be applied to such node names to generate new ones, e.g.
ASG MINUS.

It is frequently desirable to know if a node represents a leaf (no descendants), a unary operator (one descendant) or a binary operator (two descendants). The macro _o_p_t_y_p_e(_o) returns one of the manifest constants LTYPE, UTYPE, or BITYPE, respectively, depending on the node number _o. Similarly, _a_s_g_o_p(_o) returns true if _o is an assignment operator number (=, +=, etc.), and _l_o_g_o_p(_o) returns true if _o is a relational or logical (&&, ||, or !) operator.

C has a rich typing structure, with a potentially infinite number of types. To begin with, there are the basic types: CHAR, SHORT, INT, LONG, the unsigned versions known as UCHAR, USHORT, UNSIGNED, ULONG, and FLOAT, DOUBLE, and finally STRTY (a structure), UNIONTY, and ENUMTY. Then, there are three operators that can be applied to types to make others: if _t is a type, we may potentially have types _p_o_i_n_t_e_r _t_o _t, _f_u_n_c_t_i_o_n _r_e_t_u_r_n_i_n_g _t, and _a_r_r_a_y _o_f _t'_s generated from _t. Thus, an arbitrary type in C consists of a basic type, and zero or more of these operators.

In the compiler, a type is represented by an unsigned integer; the rightmost four bits hold the basic type, and the remaining bits are divided into two-bit fields, containing 0 (no operator), or one of the three operators described above. The modifiers are read right to left in the word, starting with the two-bit field adjacent to the basic type, until a field with 0 in it is reached. The macros PTR, FTN, and ARY represent the _p_o_i_n_t_e_r _t_o, _f_u_n_c_t_i_o_n _r_e_t_u_r_n_i_n_g, and _a_r_r_a_y _o_f operators. The macro values are shifted so that they align with the first two-bit field; thus PTR+INT represents the type for an integer pointer, and

	ARY + (PTR<<2) + (FTN<<4) + DOUBLE

represents the type of an array of pointers to functions returning doubles.

The type words are ordinarily manipulated by macros. If _t is a type word, _B_T_Y_P_E(_t) gives the basic type.
_I_S_P_T_R(_t), _I_S_A_R_Y(_t), and _I_S_F_T_N(_t) ask if an object of this type is a pointer, array, or a function, respectively. _M_O_D_T_Y_P_E(_t,_b) sets the basic type of _t to _b. _D_E_C_R_E_F(_t) gives the type resulting from removing the first operator from _t. Thus, if _t is a pointer to _t', a function returning _t', or an array of _t', then _D_E_C_R_E_F(_t) would equal _t'. _I_N_C_R_E_F(_t) gives the type representing a pointer to _t. Finally, there are operators for dealing with the unsigned types. _I_S_U_N_S_I_G_N_E_D(_t) returns true if _t is one of the four basic unsigned types; in this case, _D_E_U_N_S_I_G_N(_t) gives the associated `signed' type. Similarly, _U_N_S_I_G_N_A_B_L_E(_t) returns true if _t is one of the four basic types that could become unsigned, and _E_N_U_N_S_I_G_N(_t) returns the unsigned analogue of _t in this case.

The other important global data structure is that of expression trees. The actual shapes of the nodes are given in _n_d_u._h. The information stored for each pass is not quite the same; in the first pass, nodes contain dimension and size information, while in the second pass nodes contain register allocation information. Nevertheless, all nodes contain fields called _o_p, containing the node number, and _t_y_p_e, containing the type word. A function called _t_a_l_l_o_c() returns a pointer to a new tree node. To free a node, its _o_p field need merely be set to FREE. The other fields in the node will remain intact at least until the next allocation.

Nodes representing binary operators contain fields, _l_e_f_t and _r_i_g_h_t, that contain pointers to the left and right descendants. Unary operator nodes have the _l_e_f_t field, and a value field called _r_v_a_l. Leaf nodes, with no descendants, have two value fields: _l_v_a_l and _r_v_a_l.

At appropriate times, the function _t_c_h_e_c_k() can be called, to check that there are no busy nodes remaining.
This is used as a compiler consistency check. The function _t_c_o_p_y(_p) takes a pointer _p that points to an expression tree, and returns a pointer to a disjoint copy of the tree. The function _w_a_l_k_f(_p,_f) performs a postorder walk of the tree pointed to by _p, and applies the function _f to each node. The function _f_w_a_l_k(_p,_f,_d) does a preorder walk of the tree pointed to by _p. At each node, it calls a function _f, passing to it the node pointer, a value passed down from its ancestor, and two pointers to values to be passed down to the left and right descendants (if any). The value _d is the value passed down to the root. _F_w_a_l_k is used for a number of tree labeling and debugging activities.

The other major data structure, the symbol table, exists only in pass one, and will be discussed later.

_P_a_s_s _O_n_e

The first pass does lexical analysis, parsing, symbol table maintenance, tree building, optimization, and a number of machine dependent things. This pass is largely machine independent, and the machine independent sections can be pretty successfully ignored. Thus, they will be only sketched here.

_L_e_x_i_c_a_l _A_n_a_l_y_s_i_s

The lexical analyzer is a conceptually simple routine that reads the input and returns the tokens of the C language as it encounters them: names, constants, operators, and keywords. The conceptual simplicity of this job is confounded a bit by several other simple jobs that unfortunately must go on simultaneously. These include

-	Keeping track of the current filename and line number, and occasionally setting this information as the result of preprocessor control lines.

-	Skipping comments.

-	Properly dealing with octal, decimal, hex, floating point, and character constants, as well as character strings.

To achieve speed, the program maintains several tables that are indexed into by character value, to tell the lexical analyzer what to do next.
To achieve portability, these tables must be initialized each time the compiler is run, in order that the table entries reflect the local character set values.

_P_a_r_s_i_n_g

As mentioned above, the parser is generated by Yacc from the grammar _c_g_r_a_m._y. The grammar is relatively readable, but contains some unusual features that are worth comment.

Perhaps the strangest feature of the grammar is the treatment of declarations. The problem is to keep track of the basic type and the storage class while interpreting the various stars, brackets, and parentheses that may surround a given name. The entire declaration mechanism must be recursive, since declarations may appear within declarations of structures and unions, or even within a _s_i_z_e_o_f construction inside a dimension in another declaration!

There are some difficulties in using a bottom-up parser, such as produced by Yacc, to handle constructions where a lot of left context information must be kept around. The problem is that the original PDP-11 compiler is top-down in implementation, and some of the semantics of C reflect this. In a top-down parser, the input rules are restricted somewhat, but one can naturally associate temporary storage with a rule at a very early stage in the recognition of that rule. In a bottom-up parser, there is more freedom in the specification of rules, but it is more difficult to know what rule is being matched until the entire rule is seen. The parser described by _c_g_r_a_m._y makes effective use of the bottom-up parsing mechanism in some places (notably the treatment of expressions), but struggles against the restrictions in others. The usual result is that it is necessary to run a stack of values ``on the side'', independent of the Yacc value stack, in order to be able to store and access information deep within inner constructions, where the relationship of the rules being recognized to the total picture is not yet clear.
In the case of declarations, the attribute information (type, etc.) for a declaration is carefully kept immediately to the left of the declarator (that part of the declaration involving the name). In this way, when it is time to declare the name, the name and the type information can be quickly brought together. The ``$0'' mechanism of Yacc is used to accomplish this. The result is not pretty, but it works. The storage class information changes more slowly, so it is kept in an external variable, and stacked if necessary. Some of the grammar could be considerably cleaned up by using some more recent features of Yacc, notably actions within rules and the ability to return multiple values for actions.

A stack is also used to keep track of the current location to be branched to when a _b_r_e_a_k or _c_o_n_t_i_n_u_e statement is processed.

This use of external stacks dates from the time when Yacc did not permit values to be structures. Some, or most, of this use of external stacks could be eliminated by redoing the grammar to use the mechanisms now provided. There are some areas, however, particularly the processing of structure, union, and enumeration declarations, function prologs, and switch statement processing, where having all the affected data together in an array speeds later processing; in this case, use of external storage seems essential.

The _c_g_r_a_m._y file also contains some small functions used as utility functions in the parser. These include routines for saving case values and labels in processing switches, and stacking and popping values on the external stack described above.

_S_t_o_r_a_g_e _C_l_a_s_s_e_s

C has a finite, but fairly extensive, number of storage classes available. One of the compiler design decisions was to process the storage class information totally in the first pass; by the second pass, this information must have been totally dealt with.
This means that all of the storage allocation must take place in the first pass, so that references to automatics and parameters can be turned into references to cells lying a certain number of bytes offset from certain machine registers. Much of this transformation is machine dependent, and strongly depends on the storage class.

The classes include EXTERN (for externally declared, but not defined variables), EXTDEF (for external definitions), and similar distinctions for USTATIC and STATIC, UFORTRAN and FORTRAN (for fortran functions) and ULABEL and LABEL. The storage classes REGISTER and AUTO are obvious, as are STNAME, UNAME, and ENAME (for structure, union, and enumeration tags), and the associated MOS, MOU, and MOE (for the members). TYPEDEF is treated as a storage class as well. There are two special storage classes: PARAM and SNULL. SNULL is used to distinguish the case where no explicit storage class has been given; before an entry is made in the symbol table the true storage class is discovered. Similarly, PARAM is used for the temporary entry in the symbol table made before the declaration of function parameters is completed.

The most complexity in the storage class process comes from bit fields. A separate storage class is kept for each width of bit field; a _k-bit field has storage class _k plus FIELD. This enables the size to be quickly recovered from the storage class.

_S_y_m_b_o_l _T_a_b_l_e _M_a_i_n_t_e_n_a_n_c_e

The symbol table routines do far more than simply enter names into the symbol table; considerable semantic processing and checking is done as well. For example, if a new declaration comes in, it must be checked to see if there is a previous declaration of the same symbol. If there is, there are many cases. The declarations may agree and be compatible (for example, an extern declaration can appear twice) in which case the new declaration is ignored.
The new declaration may add information (such as an explicit array dimension) to an already present declaration. The new declaration may be different, but still correct (for example, an extern declaration of something may be entered, and then later the definition may be seen). The new declaration may be incompatible, but appear in an inner block; in this case, the old declaration is carefully hidden away, and the new one comes into force until the block is left. Finally, the declarations may be incompatible, and an error message must be produced.

A number of other factors make for additional complexity. The type declared by the user is not always the type entered into the symbol table (for example, if a formal parameter to a function is declared to be an array, C requires that this be changed into a pointer before entry in the symbol table). Moreover, there are various kinds of illegal types that may be declared which are difficult to check for syntactically (for example, a function returning an array). Finally, there is a strange feature in C that requires structure tag names and member names for structures and unions to be taken from a different logical symbol table than ordinary identifiers. Keeping track of which kind of name is involved is a bit of a struggle (consider typedef names used within structure declarations, for example).

The symbol table handling routines have been rewritten a number of times to extend features, improve performance, and fix bugs. They address the above problems with reasonable effectiveness but a singular lack of grace.

When a name is read in the input, it is hashed, and the routine _l_o_o_k_u_p is called, together with a flag which tells which symbol table should be searched (actually, both symbol tables are stored in one, and a flag is used to distinguish individual entries).
If the name is found, _l_o_o_k_u_p returns the index to the entry found; otherwise, it makes a new entry, marks it UNDEF (undefined), and returns the index of the new entry. This index is stored in the _r_v_a_l field of a NAME node.

When a declaration is being parsed, this NAME node is made part of a tree with UNARY MUL nodes for each *, LB nodes for each array descriptor (the right descendant has the dimension), and UNARY CALL nodes for each function descriptor. This tree is passed to the routine _t_y_m_e_r_g_e, along with the attribute type of the whole declaration; this routine collapses the tree to a single node, by calling _t_y_r_e_d_u_c_e, and then modifies the type to reflect the overall type of the declaration.

Dimension and size information is stored in a table called _d_i_m_t_a_b. To properly describe a type in C, one needs not just the type information but also size information (for structures and enumerations) and dimension information (for arrays). Sizes and offsets are dealt with in the compiler by giving the associated indices into _d_i_m_t_a_b. _T_y_m_e_r_g_e and _t_y_r_e_d_u_c_e call _d_s_t_a_s_h to put the discovered dimensions away into the _d_i_m_t_a_b array. _T_y_m_e_r_g_e returns a pointer to a single node that contains the symbol table index in its _r_v_a_l field, and the size and dimension indices in fields _c_s_i_z and _c_d_i_m, respectively. This information is properly considered part of the type in the first pass, and is carried around at all times.

To enter an element into the symbol table, the routine _d_e_f_i_d is called; it is handed a storage class, and a pointer to the node produced by _t_y_m_e_r_g_e. _D_e_f_i_d calls _f_i_x_t_y_p_e, which adjusts and checks the given type depending on the storage class, and converts null types appropriately. It then calls _f_i_x_c_l_a_s_s, which does a similar job for the storage class; it is here, for example, that register declarations are either allowed or changed to auto.
The new declaration is now compared against an older one, if present, and several pages of validity checks performed. If the definitions are compatible, with possibly some added information, the processing is straightforward. If the definitions differ, the block levels of the current and the old declaration are compared. The current block level is kept in _b_l_e_v_e_l, an external variable; the old declaration level is kept in the symbol table. Block level 0 is for external declarations, 1 is for arguments to functions, and 2 and above are blocks within a function. If the current block level is the same as the old declaration, an error results. If the current block level is higher, the new declaration overrides the old. This is done by marking the old symbol table entry ``hidden'', and making a new entry, marked ``hiding''. _L_o_o_k_u_p will skip over hidden entries. When a block is left, the symbol table is searched, and any entries defined in that block are destroyed; if they hid other entries, the old entries are ``unhidden''.

This nice block structure is warped a bit because labels do not follow the block structure rules (one can do a _g_o_t_o into a block, for example); default definitions of functions in inner blocks also persist clear out to the outermost scope. This implies that cleaning up the symbol table after block exit is more subtle than it might first seem.

For successful new definitions, _d_e_f_i_d also initializes a ``general purpose'' field, _o_f_f_s_e_t, in the symbol table. It contains the stack offset for automatics and parameters, the register number for register variables, the bit offset into the structure for structure members, and the internal label number for static variables and labels. The offset field is set by _f_a_l_l_o_c for bit fields, and _d_c_l_s_t_r_u_c_t for structures and unions.
The symbol table entry itself thus contains the name, type word, size and dimension offsets, offset value, and declaration block level. It also has a field of flags, describing what symbol table the name is in, and whether the entry is hidden, or hides another. Finally, a field gives the line number of the last use, or of the definition, of the name. This is used mainly for diagnostics, but is useful to _l_i_n_t as well. In some special cases, there is more than the above amount of information kept for the use of the compiler. This is especially true with structures; for use in initialization, structure declarations must have access to a list of the members of the structure. This list is also kept in _d_i_m_t_a_b. Because a structure can be mentioned long before the members are known, it is necessary to have another level of indirection in the table. The two words following the _c_s_i_z entry in _d_i_m_t_a_b are used to hold the alignment of the structure, and the index in _d_i_m_t_a_b of the list of members. This list contains the symbol table indices for the structure members, terminated by a -1.

_T_r_e_e _B_u_i_l_d_i_n_g

The portable compiler transforms expressions into expression trees. As the parser recognizes each rule making up an expression, it calls _b_u_i_l_d_t_r_e_e which is given an operator number, and pointers to the left and right descendants. _B_u_i_l_d_t_r_e_e first examines the left and right descendants, and, if they are both constants, and the operator is appropriate, simply does the constant computation at compile time, and returns the result as a constant. Otherwise, _b_u_i_l_d_t_r_e_e allocates a node for the head of the tree, attaches the descendants to it, and ensures that conversion operators are generated if needed, and that the type of the new node is consistent with the types of the operands.
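The constant-folding shortcut in _b_u_i_l_d_t_r_e_e can be sketched as follows; the node layout and the restriction to a single operator are simplifications for illustration, not the compiler's actual code.

```c
#include <stdlib.h>

enum op { ICON, PLUS };               /* a tiny subset of the operators */

struct node {
    enum op op;
    long lval;                        /* constant value when op == ICON */
    struct node *left, *right;
};

struct node *icon(long v)             /* make an integer constant node */
{
    struct node *p = malloc(sizeof *p);
    p->op = ICON; p->lval = v; p->left = p->right = NULL;
    return p;
}

/* Sketch of buildtree: if both descendants are constants and the
 * operator permits, do the arithmetic now and return a constant. */
struct node *buildtree(enum op op, struct node *l, struct node *r)
{
    if (op == PLUS && l->op == ICON && r->op == ICON) {
        l->lval += r->lval;           /* fold at compile time */
        free(r);
        return l;
    }
    struct node *p = malloc(sizeof *p);
    p->op = op; p->lval = 0; p->left = l; p->right = r;
    return p;
}
```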
There is also a considerable amount of semantic complexity here; many combinations of types are illegal, and the portable compiler makes a strong effort to check the legality of expression types completely. This is done both for _l_i_n_t purposes, and to prevent such semantic errors from being passed through to the code generator. The heart of _b_u_i_l_d_t_r_e_e is a large table, accessed by the routine _o_p_a_c_t. This routine maps the types of the left and right operands into a rather smaller set of descriptors, and then accesses a table (actually encoded in a switch statement) which for each operator and pair of types causes an action to be returned. The actions are logical or's of a number of separate actions, which may be carried out by _b_u_i_l_d_t_r_e_e. These component actions may include checking the left side to ensure that it is an lvalue (can be stored into), applying a type conversion to the left or right operand, setting the type of the new node to the type of the left or right operand, calling various routines to balance the types of the left and right operands, and suppressing the ordinary conversion of arrays and function operands to pointers. An important operation is OTHER, which causes some special code to be invoked in _b_u_i_l_d_t_r_e_e, to handle issues which are unique to a particular operator. Examples of this are structure and union reference (actually handled by the routine _s_t_r_e_f), the building of NAME, ICON, STRING and FCON (floating point constant) nodes, unary * and &, structure assignment, and calls. In the case of unary * and &, _b_u_i_l_d_t_r_e_e will cancel a * applied to a tree, the top node of which is &, and conversely. Another special operation is PUN; this causes the compiler to check for type mismatches, such as intermixing pointers and integers. The treatment of conversion operators is a rather strange area of the compiler (and of C!).
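The flavor of _o_p_a_c_t's action encoding can be suggested by a sketch. The bit names and the mapping below are invented for illustration; the real table covers every operator and a full set of type descriptors.

```c
/* Illustrative action bits, or'ed together as in opact's encoding. */
#define PUN   01      /* check for type punning                  */
#define TYPL  02      /* result takes the type of the left side  */
#define TYPR  04      /* result takes the type of the right side */
#define NCVTR 010     /* do not convert the right operand        */

enum ty { TINT, TPTR };   /* a two-descriptor world */

/* A stand-in for the opact table, here only for a + operator. */
int opact_plus(enum ty l, enum ty r)
{
    if (l == TPTR && r == TINT) return TYPL | NCVTR;
    if (l == TINT && r == TPTR) return TYPR;
    if (l == TPTR && r == TPTR) return PUN;   /* pointer + pointer */
    return TYPL;
}
```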
The introduction of type casts only confounded this situation. Most of the conversion operators are generated by calls to _t_y_m_a_t_c_h and _p_t_m_a_t_c_h, both of which are given a tree, and asked to make the operands agree in type. _P_t_m_a_t_c_h treats the case where one of the operands is a pointer; _t_y_m_a_t_c_h treats all other cases. Where these routines have decided on the proper type for an operand, they call _m_a_k_e_t_y, which is handed a tree, and a type word, dimension offset, and size offset. If necessary, it inserts a conversion operation to make the types correct. Conversion operations are never inserted on the left side of assignment operators, however. There are two conversion operators used: PCONV, if the conversion is to a non-basic type (usually a pointer), and SCONV, if the conversion is to a basic type (scalar). To allow for maximum flexibility, every node produced by _b_u_i_l_d_t_r_e_e is given to a machine dependent routine, _c_l_o_c_a_l, immediately after it is produced. This is to allow more or less immediate rewriting of those nodes which must be adapted for the local machine. The conversion operations are given to _c_l_o_c_a_l as well; on most machines, many of these conversions do nothing, and should be thrown away (being careful to retain the type). If this operation is done too early, however, later calls to _b_u_i_l_d_t_r_e_e may get confused about the correct type of the subtrees; thus _c_l_o_c_a_l is given the conversion operations only after the entire tree is built. This topic will be dealt with in more detail later.

_I_n_i_t_i_a_l_i_z_a_t_i_o_n

Initialization is one of the messier areas in the portable compiler. The only consolation is that most of the mess takes place in the machine independent part, where it may be safely ignored by the implementor of the compiler for a particular machine.
The basic problem is that the semantics of initialization really calls for a co-routine structure; one collection of programs reading constants from the input stream, while another, independent set of programs places these constants into the appropriate spots in memory. The dramatic differences in the local assemblers also come to the fore here. The parsing problems are dealt with by keeping a rather extensive stack containing the current state of the initialization; the assembler problems are dealt with by having a fair number of machine dependent routines. The stack contains the symbol table number, type, dimension index, and size index for the current identifier being initialized. Another entry has the offset, in bits, of the beginning of the current identifier. Another entry keeps track of how many elements have been seen, if the current identifier is an array. Still another entry keeps track of the current member of a structure being initialized. Finally, there is an entry containing flags which keep track of the current state of the initialization process (e.g., tell if a `}' has been seen for the current identifier). When an initialization begins, the routine _b_e_g_i_n_i_t is called; it handles the alignment restrictions, if any, and calls _i_n_s_t_k to create the stack entry. This is done by first making an entry on the top of the stack for the item being initialized. If the top entry is an array, another entry is made on the stack for the first element. If the top entry is a structure, another entry is made on the stack for the first member of the structure. This continues until the top element of the stack is a scalar. _I_n_s_t_k then returns, and the parser begins collecting initializers. When a constant is obtained, the routine _d_o_i_n_i_t is called; it examines the stack, and does whatever is necessary to assign the current constant to the scalar on the top of the stack.
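The descent performed by _i_n_s_t_k can be sketched with a toy type representation; the structure here (a `kind' plus a pointer to the first element or member) is an invention for illustration, not the compiler's own.

```c
enum kind { SCALAR, ARRAY, STRUCT };

struct ty {
    enum kind k;
    struct ty *inner;     /* element type, or first member */
};

#define NSTACK 16
struct ty *initstack[NSTACK];
int depth;

/* Sketch of instk: stack an entry for the item being initialized, then
 * for its first element or member, until the top entry is a scalar. */
void instk(struct ty *t)
{
    for (;;) {
        initstack[depth++] = t;
        if (t->k == SCALAR)
            return;                   /* ready to accept a constant */
        t = t->inner;
    }
}
```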
_G_o_t_s_c_a_l is then called, which rearranges the stack so that the next scalar to be initialized gets placed on top of the stack. This process continues until the end of the initializers; _e_n_d_i_n_i_t cleans up. If a `{' or `}' is encountered in the string of initializers, it is handled by calling _i_l_b_r_a_c_e or _i_r_b_r_a_c_e, respectively. A central issue is the treatment of the ``holes'' that arise as a result of alignment restrictions or explicit requests for holes in bit fields. There is a global variable, _i_n_o_f_f, which contains the current offset in the initialization (all offsets in the first pass of the compiler are in bits). _D_o_i_n_i_t figures out from the top entry on the stack the expected bit offset of the next identifier; it calls the machine dependent routine _i_n_f_o_r_c_e which, in a machine dependent way, forces the assembler to set aside space if need be so that the next scalar seen will go into the appropriate bit offset position. The scalar itself is passed to one of the machine dependent routines _f_i_n_c_o_d_e (for floating point initialization), _i_n_c_o_d_e (for fields, and other initializations less than an int in size), and _c_i_n_i_t (for all other initializations). The size is passed to all these routines, and it is up to the machine dependent routines to ensure that the initializer occupies exactly the right size. Character strings represent a bit of an exception. If a character string is seen as the initializer for a pointer, the characters making up the string must be put out under a different location counter. When the lexical analyzer sees the quote at the head of a character string, it returns the token STRING, but does not do anything with the contents. The parser calls _g_e_t_s_t_r, which sets up the appropriate location counters and flags, and calls _l_x_s_t_r to read and process the contents of the string.
If the string is being used to initialize a character array, _l_x_s_t_r calls _p_u_t_b_y_t_e, which in effect simulates _d_o_i_n_i_t for each character read. If the string is used to initialize a character pointer, _l_x_s_t_r calls a machine dependent routine, _b_y_c_o_d_e, which stashes away each character. The pointer to this string is then returned, and processed normally by _d_o_i_n_i_t. The null at the end of the string is treated as if it were read explicitly by _l_x_s_t_r.

_S_t_a_t_e_m_e_n_t_s

The first pass addresses four main areas: declarations, expressions, initialization, and statements. The statement processing is relatively simple; most of it is carried out in the parser directly. Most of the logic is concerned with allocating label numbers, defining the labels, and branching appropriately. An external symbol, _r_e_a_c_h_e_d, is 1 if a statement can be reached, 0 otherwise; this is used to do a bit of simple flow analysis as the program is being parsed, and also to avoid generating the subroutine return sequence if the subroutine cannot ``fall through'' the last statement. Conditional branches are handled by generating an expression node, CBRANCH, whose left descendant is the conditional expression and the right descendant is an ICON node containing the internal label number to be branched to. For efficiency, the semantics are that the label is gone to if the condition is _f_a_l_s_e. The switch statement is compiled by collecting the case entries, and an indication as to whether there is a default case; an internal label number is generated for each of these, and remembered in a big array. The expression comprising the value to be switched on is compiled when the switch keyword is encountered, but the expression tree is headed by a special node, FORCE, which tells the code generator to put the expression value into a special distinguished register (this same mechanism is used for processing the return statement).
When the end of the switch block is reached, the array containing the case values is sorted, and checked for duplicate entries (an error); if all is correct, the machine dependent routine _g_e_n_s_w_i_t_c_h is called, with this array of labels and values in increasing order. _G_e_n_s_w_i_t_c_h can assume that the value to be tested is already in the register which is the usual integer return value register.

_O_p_t_i_m_i_z_a_t_i_o_n

There is a machine independent file, _o_p_t_i_m._c, which contains a relatively short optimization routine, _o_p_t_i_m. Actually the word optimization is something of a misnomer; the results are not optimum, only improved, and the routine is in fact not optional; it must be called for proper operation of the compiler. _O_p_t_i_m is called after an expression tree is built, but before the code generator is called. The essential part of its job is to call _c_l_o_c_a_l on the conversion operators. On most machines, the treatment of & is also essential: by this time in the processing, the only node which is a legal descendant of & is NAME. (Possible descendants of * have been eliminated by _b_u_i_l_d_t_r_e_e.) The address of a static name is, almost by definition, a constant, and can be represented by an ICON node on most machines (provided that the loader has enough power). Unfortunately, this is not universally true; on some machines, such as the IBM 370, the issue of addressability rears its ugly head; thus, before turning a NAME node into an ICON node, the machine dependent function _a_n_d_a_b_l_e is called. The optimization attempts of _o_p_t_i_m are quite limited. It is primarily concerned with improving the behavior of the compiler with operations one of whose arguments is a constant. In the simplest case, the constant is placed on the right if the operation is commutative.
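That simplest rearrangement might be sketched as follows, using an invented node layout:

```c
enum op { ICON, NAME, PLUS, MUL };

struct node { enum op op; struct node *left, *right; };

/* Sketch of optim's operand swap: for a commutative operator with a
 * constant left operand, exchange the operands so that the constant
 * ends up on the right. */
void commute(struct node *p)
{
    if ((p->op == PLUS || p->op == MUL) && p->left->op == ICON) {
        struct node *t = p->left;
        p->left = p->right;
        p->right = t;
    }
}
```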
The compiler also makes a limited search for expressions such as ( _x + _a ) + _b where _a and _b are constants, and attempts to combine _a and _b at compile time. A number of special cases are also examined; additions of 0 and multiplications by 1 are removed, although the correct processing of these cases to get the type of the resulting tree correct is decidedly nontrivial. In some cases, the addition or multiplication must be replaced by a conversion operator to keep the types from becoming fouled up. In cases where a relational operation is being done and one operand is a constant, the operands are permuted and the operator altered, if necessary, to put the constant on the right. Finally, multiplications by a power of 2 are changed to shifts.

_M_a_c_h_i_n_e _D_e_p_e_n_d_e_n_t _S_t_u_f_f

A number of the first pass machine dependent routines have been discussed above. In general, the routines are short, and easy to adapt from machine to machine. The two exceptions to this general rule are _c_l_o_c_a_l and the function prolog and epilog generation routines, _b_f_c_o_d_e and _e_f_c_o_d_e. _C_l_o_c_a_l has the job of rewriting, if appropriate and desirable, the nodes constructed by _b_u_i_l_d_t_r_e_e. There are two major areas where this is important: NAME nodes and conversion operations. In the case of NAME nodes, _c_l_o_c_a_l must rewrite the NAME node to reflect the actual physical location of the name in the machine. In effect, the NAME node must be examined, the symbol table entry found (through the _r_v_a_l field of the node), and, based on the storage class of the node, the tree must be rewritten. Automatic variables and parameters are typically rewritten by treating the reference to the variable as a structure reference, off the register which holds the stack or argument pointer; the _s_t_r_e_f routine is set up to be called in this way, and to build the appropriate tree.
In the most general case, the tree consists of a unary * node, whose descendant is a + node, with the stack or argument register as left operand, and a constant offset as right operand. In the case of LABEL and internal static nodes, the _r_v_a_l field is rewritten to be the negative of the internal label number; a negative _r_v_a_l field is taken to be an internal label number. Finally, a name of class REGISTER must be converted into a REG node, and the _r_v_a_l field replaced by the register number. In fact, this part of the _c_l_o_c_a_l routine is nearly machine independent; only for machines with addressability problems (IBM 370 again!) does it have to be noticeably different. The conversion operator treatment is rather tricky. It is necessary to handle the application of conversion operators to constants in _c_l_o_c_a_l, in order that all constant expressions can have their values known at compile time. In extreme cases, this may mean that some simulation of the arithmetic of the target machine might have to be done in a cross-compiler. In the most common case, conversions from pointer to pointer do nothing. For some machines, however, conversion from byte pointer to short or long pointer might require a shift or rotate operation, which would have to be generated here. The extension of the portable compiler to machines where the size of a pointer depends on its type would be straightforward, but has not yet been done. Another machine dependent issue in the first pass is the generation of external ``symbol table'' information. This sort of symbol table is used by programs such as symbolic debuggers to relate object code back to source code. Symbol table routines are provided in the file _s_t_a_b._c, which is included in the machine dependent sources for the first pass. The symbol table routines insert assembly code containing assembly pseudo-ops directly into the instruction stream generated by the compiler.
There are two basic kinds of symbol table operations. The simplest operation is the generation of a source line number; this serves to map an address in an executable image into a line in a source file so that a debugger can find the source code corresponding to the instructions being executed. The routine _p_s_l_i_n_e is called by the scanner to emit source line numbers when a nonempty source line is seen. The other variety of symbol table operation is the generation of type and address information about C symbols. This is done through the _o_u_t_s_t_a_b routine, which is normally called using the FIXDEF macro in the monster _d_e_f_i_d routine in _p_f_t_n._c that enters symbols into the compiler's internal symbol table. Yet another major machine dependent issue involves function prolog and epilog generation. The hard part here is the design of the stack frame and calling sequence; this design issue is discussed elsewhere. [ Johnson Lesk Ritchie calling sequence ] The routine _b_f_c_o_d_e is called with the number of arguments the function is defined with, and an array containing the symbol table indices of the declared parameters. _B_f_c_o_d_e must generate the code to establish the new stack frame, save the return address and previous stack pointer value on the stack, and save whatever registers are to be used for register variables. The stack size and the number of register variables are not known when _b_f_c_o_d_e is called, so these numbers must be referred to by assembler constants, which are defined when they are known (usually in the second pass, after all register variables, automatics, and temporaries have been seen). The final job is to find those parameters which may have been declared register, and generate the code to initialize the register with the value passed on the stack.
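The shape of a _b_f_c_o_d_e implementation might look like the sketch below, written for an imaginary machine with registers fp and sp; the opcodes, register names, and the assembler constant .F1 are all invented here. The real routine printf's directly to the output stream; this version writes into a buffer so that the result can be inspected.

```c
#include <stdio.h>

/* Sketch of a function prolog on a hypothetical machine. The frame size
 * is not known yet, so it is referred to by the assembler constant .F1,
 * which is defined later, as the text explains. */
int bfcode(char *buf, size_t n)
{
    return snprintf(buf, n,
        "\tpush\tfp\n"          /* save the previous frame pointer */
        "\tmov\tsp,fp\n"        /* establish the new stack frame   */
        "\tsub\t$.F1,sp\n");    /* .F1: frame size, defined later  */
}
```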
Once again, for most machines, the general logic of _b_f_c_o_d_e remains the same, but the contents of the _p_r_i_n_t_f calls in it will change from machine to machine. _E_f_c_o_d_e is rather simpler, having just to generate the default return at the end of a function. This may be nontrivial in the case of a function returning a structure or union, however. There seems to be no really good place to discuss structures and unions, but this is as good a place as any. The C language now supports structure assignment, and the passing of structures as arguments to functions, and the receiving of structures back from functions. This was added rather late to C, and thus to the portable compiler. Consequently, it fits in less well than the older features. Moreover, most of the burden of making these features work is placed on the machine dependent code. There are both conceptual and practical problems. Conceptually, the compiler is structured around the idea that to compute something, you put it into a register and work on it. This notion causes a bit of trouble on some machines (e.g., machines with 3-address opcodes), but matches many machines quite well. Unfortunately, this notion breaks down with structures. The closest that one can come is to keep the addresses of the structures in registers. The actual code sequences used to move structures vary from the trivial (a multiple byte move) to the horrible (a function call), and are very machine dependent. The practical problem is more painful. When a function returning a structure is called, this function has to have some place to put the structure value. If it places it on the stack, it has difficulty popping its stack frame. If it places the value in a static temporary, the routine fails to be reentrant. The most logically consistent way of implementing this is for the caller to pass in a pointer to a spot where the called function should put the value before returning.
This is relatively straightforward, although a bit tedious, to implement, but means that the caller must have properly declared the function type, even if the value is never used. On some machines, such as the Interdata 8/32, the return value simply overlays the argument region (which on the 8/32 is part of the caller's stack frame). The caller takes care of leaving enough room if the returned value is larger than the arguments. This also assumes that the caller declares the function properly. The PDP-11 and the VAX have stack hardware which is used in function calls and returns; this makes it very inconvenient to use either of the above mechanisms. In these machines, a static area within the called function is allocated, and the function return value is copied into it on return; the function returns the address of that region. This is simple to implement, but is non-reentrant. However, the function can now be called as a subroutine without being properly declared, without the disaster which would otherwise ensue. No matter what choice is taken, the convention is that the function actually returns the address of the return structure value. In building expression trees, the portable compiler takes a bit for granted about structures. It assumes that functions returning structures actually return a pointer to the structure, and it assumes that a reference to a structure is actually a reference to its address. The structure assignment operator is rebuilt so that the left operand is the structure being assigned to, but the right operand is the address of the structure being assigned; this makes it easier to deal with _a = _b = _c and similar constructions. There are four special tree nodes associated with these operations: STASG (structure assignment), STARG (structure argument to a function call), and STCALL and UNARY STCALL (calls of a function with nonzero and zero arguments, respectively).
These four nodes are unique in that the size and alignment information, which can be determined by the type for all other objects in C, must be known to carry out these operations; special fields are set aside in these nodes to contain this information, and special intermediate code is used to transmit this information.

_F_i_r_s_t _P_a_s_s _S_u_m_m_a_r_y

There are many other issues which have been ignored here, partly to justify the title ``tour'', and partially because they have seemed to cause little trouble. There are some debugging flags which may be turned on, by giving the compiler's first pass the argument -_X[flags] Some of the more interesting flags are -Xd for the defining and freeing of symbols, -Xi for initialization comments, and -Xb for various comments about the building of trees. In many cases, repeating the flag more than once gives more information; thus, -Xddd gives more information than -Xd. In the two pass version of the compiler, the flags should not be set when the output is sent to the second pass, since the debugging output and the intermediate code both go onto the standard output. We turn now to consideration of the second pass.

_P_a_s_s _T_w_o

Code generation is far less well understood than parsing or lexical analysis, and for this reason the second pass is far harder to discuss in a file by file manner. A great deal of the difficulty is in understanding the issues and the strategies employed to meet them. Any particular function is likely to be reasonably straightforward. Thus, this part of the paper will concentrate a good deal on the broader aspects of strategy in the code generator, and will not get too intimate with the details.

_O_v_e_r_v_i_e_w

It is difficult to organize a code generator to be flexible enough to generate code for a large number of machines, and still be efficient for any one of them. Flexibility is also important when it comes time to tune the code generator to improve the output code quality.
On the other hand, too much flexibility can lead to semantically incorrect code, and potentially a combinatorial explosion in the number of cases to be considered in the compiler. One goal of the code generator is to have a high degree of correctness. It is very desirable to have the compiler detect its own inability to generate correct code, rather than to produce incorrect code. This goal is achieved by having a simple model of the job to be done (e.g., an expression tree) and a simple model of the machine state (e.g., which registers are free). The act of generating an instruction performs a transformation on the tree and the machine state; hopefully, the tree eventually gets reduced to a single node. If each of these instruction/transformation pairs is correct, and if the machine state model really represents the actual machine, and if the transformations reduce the input tree to the desired single node, then the output code will be correct. For most real machines, there is no definitive theory of code generation that encompasses all the C operators. Thus the selection of which instruction/transformations to generate, and in what order, will have a heuristic flavor. If, for some expression tree, no transformation applies, or, more seriously, if the heuristics select a sequence of instruction/transformations that do not in fact reduce the tree, the compiler will report its inability to generate code, and abort. A major part of the code generator is concerned with the model and the transformations. Most of this is machine independent, or depends only on simple tables. The flexibility comes from the heuristics that guide the transformations of the trees, the selection of subgoals, and the ordering of the computation.

_T_h_e _M_a_c_h_i_n_e _M_o_d_e_l

The machine is assumed to have a number of registers, of at most two different types: _A and _B.
Within each register class, there may be scratch (temporary) registers and dedicated registers (e.g., register variables, the stack pointer, etc.). Requests to allocate and free registers involve only the temporary registers. Each of the registers in the machine is given a name and a number in the _m_a_c_2_d_e_f_s._h file; the numbers are used as indices into various tables that describe the registers, so they should be kept small. One such table is the _r_s_t_a_t_u_s table on file _l_o_c_a_l_2._c. This table is indexed by register number, and contains expressions made up from manifest constants describing the register types: SAREG for dedicated AREG's, SAREG|STAREG for scratch AREG's, and SBREG and SBREG|STBREG similarly for BREG's. There are macros that access this information: _i_s_b_r_e_g(_r) returns true if register number _r is a BREG, and _i_s_t_r_e_g(_r) returns true if register number _r is a temporary AREG or BREG. Another table, _r_n_a_m_e_s, contains the register names; this is used when putting out assembler code and diagnostics. The usage of registers is kept track of by an array called _b_u_s_y. _B_u_s_y[_r] is the number of uses of register _r in the current tree being processed. The allocation and freeing of registers will be discussed later as part of the code generation algorithm.

_G_e_n_e_r_a_l _O_r_g_a_n_i_z_a_t_i_o_n

As mentioned above, the second pass reads lines from the intermediate file, copying through to the output unchanged any lines that begin with a `)', and making note of the information about stack usage and register allocation contained on lines beginning with `]' and `['. The expression trees, whose beginning is indicated by a line beginning with `.', are read and rebuilt into trees. If the compiler is loaded as one pass, the expression trees are immediately available to the code generator. The actual code generation is done by a hierarchy of routines.
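The register description tables introduced under the machine model might, for an imaginary eight-register machine, look like this; the bit values are invented, and only the names follow the text.

```c
/* Register status bits in the style of mac2defs.h; values invented. */
#define SAREG  01     /* A register                */
#define STAREG 02     /* scratch (temporary) A reg */
#define SBREG  04     /* B register                */
#define STBREG 010    /* scratch (temporary) B reg */

/* A hypothetical machine: registers 0-3 are scratch AREGs, 4 and 5 are
 * dedicated AREGs (say, register variables), 6 and 7 are scratch BREGs. */
int rstatus[] = {
    SAREG|STAREG, SAREG|STAREG, SAREG|STAREG, SAREG|STAREG,
    SAREG,        SAREG,
    SBREG|STBREG, SBREG|STBREG,
};

#define isbreg(r) (rstatus[r] & SBREG)
#define istreg(r) (rstatus[r] & (STAREG|STBREG))

int busy[8];          /* busy[r]: uses of register r in the current tree */
```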
The routine _d_e_l_a_y is first given the tree; it attempts to delay some postfix ++ and -- computations that might reasonably be done after the smoke clears. It also attempts to handle comma (`,') operators by computing the left side expression first, and then rewriting the tree to eliminate the operator. _D_e_l_a_y calls _c_o_d_g_e_n to control the actual code generation process. _C_o_d_g_e_n takes as arguments a pointer to the expression tree, and a second argument that, for socio-historical reasons, is called a _c_o_o_k_i_e. The cookie describes a set of goals that would be acceptable for the code generation: these are assigned to individual bits, so they may be logically or'ed together to form a large number of possible goals. Among the possible goals are FOREFF (compute for side effects only; don't worry about the value), INTEMP (compute and store value into a temporary location in memory), INAREG (compute into an A register), INTAREG (compute into a scratch A register), INBREG and INTBREG similarly, FORCC (compute for condition codes), and FORARG (compute it as a function argument; e.g., stack it if appropriate). _C_o_d_g_e_n first canonicalizes the tree by calling _c_a_n_o_n. This routine looks for certain transformations that might now be applicable to the tree. One, which is very common and very powerful, is to fold together an indirection operator (UNARY MUL) and a register (REG); in most machines, this combination is addressable directly, and so is similar to a NAME in its behavior. The UNARY MUL and REG are folded together to make another node type called OREG. In fact, in many machines it is possible to directly address not just the cell pointed to by a register, but also cells differing by a constant offset from the cell pointed to by the register.
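The OREG fold performed by _c_a_n_o_n can be sketched as below. The node layout is invented, and the machine dependent _n_o_t_o_f_f check is taken to accept every offset here.

```c
enum op { REG, OREG, ICON, UMUL, PLUS };   /* UMUL: unary * */

struct node {
    enum op op;
    int rval;             /* register number for REG and OREG         */
    long lval;            /* constant offset for OREG, value for ICON */
    struct node *left, *right;
};

/* Sketch of the fold: *(reg) becomes OREG(reg, 0), and *(reg + const)
 * becomes OREG(reg, const). A real canon would first ask the machine
 * dependent notoff() whether the offset is acceptable. */
void oregfold(struct node *p)
{
    if (p->op != UMUL)
        return;
    struct node *q = p->left;
    if (q->op == REG) {
        p->op = OREG; p->rval = q->rval; p->lval = 0;
    } else if (q->op == PLUS && q->left->op == REG && q->right->op == ICON) {
        p->op = OREG; p->rval = q->left->rval; p->lval = q->right->lval;
    }
}
```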
_C_a_n_o_n also looks for such cases, calling the machine dependent routine _n_o_t_o_f_f to decide if the offset is acceptable (for example, in the IBM 370 the offset must be between 0 and 4095 bytes). Another optimization is to replace bit field operations by shifts and masks if the operation involves extracting the field. Finally, a machine dependent routine, _s_u_c_o_m_p, is called that computes the Sethi-Ullman numbers for the tree (see below). After the tree is canonicalized, _c_o_d_g_e_n calls the routine _s_t_o_r_e whose job is to select a subtree of the tree to be computed and (usually) stored before beginning the computation of the full tree. _S_t_o_r_e must return a tree that can be computed without need for any temporary storage locations. In effect, the only store operations generated while processing the subtree must be as a response to explicit assignment operators in the tree. This division of the job marks one of the more significant, and successful, departures from most other compilers. It means that the code generator can operate under the assumption that there are enough registers to do its job, without worrying about temporary storage. If a store into a temporary appears in the output, it is always as a direct result of logic in the _s_t_o_r_e routine; this makes debugging easier. One consequence of this organization is that code is not generated by a treewalk. There are theoretical results that support this decision. [ Aho Johnson Optimal Code Trees jacm ] It may be desirable to compute several subtrees and store them before tackling the whole tree; if a subtree is to be stored, this is known before the code generation for the subtree is begun, and the subtree is computed when all scratch registers are available.
The _store_ routine decides what subtrees, if any, should be stored by making use of numbers, called _Sethi-Ullman numbers_, that give, for each subtree of an expression tree, the minimum number of scratch registers required to compile the subtree, without any stores into temporaries. [ Sethi Ullman jacm 1970 ] These numbers are computed by the machine-dependent routine _sucomp_, called by _canon_. The basic notion is that, knowing the Sethi-Ullman numbers for the descendants of a node, and knowing the operator of the node and some information about the machine, the Sethi-Ullman number of the node itself can be computed. If the Sethi-Ullman number for a tree exceeds the number of scratch registers available, some subtree must be stored. Unfortunately, the theory behind the Sethi-Ullman numbers applies only to uselessly simple machines and operators. For the rich set of C operators, and for machines with asymmetric registers, register pairs, different kinds of registers, and exceptional forms of addressing, the theory cannot be applied directly. The basic idea of estimation is a good one, however, and well worth applying; the application, especially when the compiler comes to be tuned for high code quality, goes beyond the park of theory into the swamp of heuristics. This topic will be taken up again later, when more of the compiler structure has been described.

After examining the Sethi-Ullman numbers, _store_ selects a subtree, if any, to be stored, and returns the subtree and the associated cookie in the external variables _stotree_ and _stocook_. If a subtree has been selected, or if the whole tree is ready to be processed, the routine _order_ is called, with a tree and cookie. _Order_ generates code for trees that do not require temporary locations.
_Order_ may make recursive calls on itself, and, in some cases, on _codgen_; for example, when processing the operators &&, ||, and comma (`,'), that have a left to right evaluation, it is incorrect for _store_ to examine the right operand for subtrees to be stored. In these cases, _order_ will call _codgen_ recursively when it is permissible to work on the right operand. A similar issue arises with the ? : operator.

The _order_ routine works by matching the current tree with a set of code templates. If a template is discovered that will match the current tree and cookie, the associated assembly language statement or statements are generated. The tree is then rewritten, as specified by the template, to represent the effect of the output instruction(s). If no template match is found, first an attempt is made to find a match with a different cookie; for example, in order to compute an expression with cookie INTEMP (store into a temporary storage location), it is usually necessary to compute the expression into a scratch register first. If all attempts to match the tree fail, the heuristic part of the algorithm becomes dominant. Control is typically given to one of a number of machine-dependent routines that may in turn recursively call _order_ to achieve a subgoal of the computation (for example, one of the arguments may be computed into a temporary register). After this subgoal has been achieved, the process begins again with the modified tree. If the machine-dependent heuristics are unable to reduce the tree further, a number of default rewriting rules may be considered appropriate. For example, if the left operand of a + is a scratch register, the + can be replaced by a += operator; the tree may then match a template.

To close this introduction, we will discuss the steps in compiling code for the expression

	a += b

where _a_ and _b_ are static variables.
To begin with, the whole expression tree is examined with cookie FOREFF, and no match is found. Search with other cookies is equally fruitless, so an attempt at rewriting is made. Suppose we are dealing with the Interdata 8/32 for the moment. It is recognized that the left hand and right hand sides of the += operator are addressable, and in particular the left hand side has no side effects, so it is permissible to rewrite this as

	a = a + b

and this is done. No match is found on this tree either, so a machine dependent rewrite is done; it is recognized that the left hand side of the assignment is addressable, but the right hand side is not in a register, so _order_ is called recursively, being asked to put the right hand side of the assignment into a register. This invocation of _order_ searches the tree for a match, and fails. The machine dependent rule for + notices that the right hand operand is addressable; it decides to put the left operand into a scratch register. Another recursive call to _order_ is made, with the tree consisting solely of the leaf _a_, and the cookie asking that the value be placed into a scratch register. This now matches a template, and a load instruction is emitted. The node consisting of _a_ is rewritten in place to represent the register into which _a_ is loaded, and this third call to _order_ returns. The second call to _order_ now finds that it has the tree

	reg + b

to consider. Once again, there is no match, but the default rewriting rule rewrites the + as a += operator, since the left operand is a scratch register. When this is done, there is a match: in fact,

	reg += b

simply describes the effect of the add instruction on a typical machine. After the add is emitted, the tree is rewritten to consist merely of the register node, since the result of the add is now in the register.
This agrees with the cookie passed to the second invocation of _order_, so this invocation terminates, returning to the first level. The original tree has now become

	a = reg

which matches a template for the store instruction. The store is output, and the tree rewritten to become just a single register node. At this point, since the top level call to _order_ was interested only in side effects, the call to _order_ returns, and the code generation is completed; we have generated a load, add, and store, as might have been expected.

The effect of machine architecture on this is considerable. For example, on the Honeywell 6000, the machine dependent heuristics recognize that there is an ``add to storage'' instruction, so the strategy is quite different; _b_ is loaded into a register, and then an add to storage instruction generated to add this register into _a_. The transformations, involving as they do the semantics of C, are largely machine independent. The decisions as to when to use them, however, are almost totally machine dependent.

Having given a broad outline of the code generation process, we shall next consider the heart of it: the templates. This leads naturally into discussions of template matching and register allocation, and finally a discussion of the machine dependent interfaces and strategies.

The Templates

The templates describe the effect of the target machine instructions on the model of computation around which the compiler is organized. In effect, each template has five logical sections, and represents an assertion of the form: _if_ we have a subtree of a given shape (1), and we have a goal (cookie) or goals to achieve (2), and we have sufficient free resources (3), _then_ we may emit an instruction or instructions (4), and rewrite the subtree in a particular manner (5), and the rewritten tree will achieve the desired goals. These five sections will be discussed in more detail later.
First, we give an example of a template:

	ASG PLUS,	INAREG,
		SAREG,	TINT,
		SNAME,	TINT,
			0,	RLEFT,
			"	add	AL,AR\n",

The top line specifies the operator (+=) and the cookie (compute the value of the subtree into an AREG). The second and third lines specify the left and right descendants, respectively, of the += operator. The left descendant must be a REG node, representing an A register, and have integer type, while the right side must be a NAME node, and also have integer type. The fourth line contains the resource requirements (no scratch registers or temporaries needed), and the rewriting rule (replace the subtree by the left descendant). Finally, the quoted string on the last line represents the output to the assembler: lower case letters, tabs, spaces, etc. are copied _verbatim_ to the output; upper case letters trigger various macro-like expansions. Thus, _AL_ would expand into the Address form of the Left operand - presumably the register number. Similarly, _AR_ would expand into the name of the right operand. The _add_ instruction of the last section might well be emitted by this template.

In principle, it would be possible to make separate templates for all legal combinations of operators, cookies, types, and shapes. In practice, the number of combinations is very large. Thus, a considerable amount of mechanism is present to permit a large number of subtrees to be matched by a single template. Most of the shape and type specifiers are individual bits, and can be logically or'ed together. There are a number of special descriptors for matching classes of operators. The cookies can also be combined.
As an example of the kind of template that really arises in practice, the actual template for the Interdata 8/32 that subsumes the above example is:

	ASG OPSIMP,	INAREG|FORCC,
		SAREG,	TINT|TUNSIGNED|TPOINT,
		SAREG|SNAME|SOREG|SCON,	TINT|TUNSIGNED|TPOINT,
			0,	RLEFT|RESCC,
			"	OI	AL,AR\n",

Here, OPSIMP represents the operators +, -, |, &, and ^. The _OI_ macro in the output string expands into the appropriate Integer Opcode for the operator. The left and right sides can be integers, unsigned, or pointer types. The right side can be, in addition to a name, a register, a memory location whose address is given by a register and displacement (OREG), or a constant. Finally, these instructions set the condition codes, and so can be used in condition contexts: the cookie and rewriting rules reflect this.

The Template Matching Algorithm

The heart of the second pass is the template matching algorithm, in the routine _match_. _Match_ is called with a tree and a cookie; it attempts to match the given tree against some template that will transform it according to one of the goals given in the cookie. If a match is successful, the transformation is applied; _expand_ is called to generate the assembly code, and then _reclaim_ rewrites the tree, and reclaims the resources, such as registers, that might have become free as a result of the generated code.

This part of the compiler is among the most time critical. There is a spectrum of implementation techniques available for doing this matching. The most naive algorithm simply looks at the templates one by one. This can be considerably improved upon by restricting the search for an acceptable template. It would be possible to do better than this if the templates were given to a separate program that ate them and generated a template matching subroutine. This would make maintenance of the compiler much more complicated, however, so this has not been done.
The matching algorithm is actually carried out by restricting the range in the table that must be searched for each opcode. This introduces a number of complications, however, and needs a bit of sympathetic help from the person constructing the compiler in order to obtain best results. The exact tuning of this algorithm continues; it is best to consult the code and comments in _match_ for the latest version.

In order to match a template to a tree, it is necessary to match not only the cookie and the operator of the root, but also the types and shapes of the left and right descendants (if any) of the tree. A convention is established here that is carried out throughout the second pass of the compiler. If a node represents a unary operator, the single descendant is always the ``left'' descendant. If a node represents a unary operator or a leaf node (no descendants), the ``right'' descendant is taken by convention to be the node itself. This enables templates to easily match leaves and conversion operators, for example, without any additional mechanism in the matching program.

The type matching is straightforward; it is possible to specify any combination of basic types, general pointers, and pointers to one or more of the basic types. The shape matching is somewhat more complicated, but still pretty simple. Templates have a collection of possible operand shapes on which the opcode might match. In the simplest case, an _add_ operation might be able to add to either a register variable or a scratch register, and might be able (with appropriate help from the assembler) to add an integer constant (ICON), a static memory cell (NAME), or a stack location (OREG).

It is usually attractive to specify a number of such shapes, and distinguish between them when the assembler output is produced. It is possible to describe the union of many elementary shapes such as ICON, NAME, OREG, AREG or BREG (both scratch and register forms), etc.
To handle at least the simple forms of indirection, one can also match some more complicated forms of trees: STARNM and STARREG can match more complicated trees headed by an indirection operator, and SFLD can match certain trees headed by a FLD operator. These patterns call machine dependent routines that match the patterns of interest on a given machine. The shape SWADD may be used to recognize NAME or OREG nodes that lie on word boundaries: this may be of some importance on word addressed machines. Finally, there are some special shapes: these may not be used in conjunction with the other shapes, but may be defined and extended in machine dependent ways. The special shapes SZERO, SONE, and SMONE are predefined and match constants 0, 1, and -1, respectively; others are easy to add and match by using the machine dependent routine _special_.

When a template has been found that matches the root of the tree, the cookie, and the shapes and types of the descendants, there is still one bar to a total match: the template may call for some resources (for example, a scratch register). The routine _allo_ is called, and it attempts to allocate the resources. If it cannot, the match fails; no resources are allocated. If successful, the allocated resources are given numbers 1, 2, etc. for later reference when the assembly code is generated. The routines _expand_ and _reclaim_ are then called. The _match_ routine then returns a special value, MDONE. If no match was found, the value MNOPE is returned; this is a signal to the caller to try more cookie values, or attempt a rewriting rule.

_Match_ is also used to select rewriting rules, although the way of doing this is pretty straightforward. A special cookie, FORREW, is used to ask _match_ to search for a rewriting rule. The rewriting rules are keyed to various opcodes; most are carried out in _order_.
Since the question of when to rewrite is one of the key issues in code generation, it will be taken up again later.

Register Allocation

The register allocation routines, and the allocation strategy, play a central role in the correctness of the code generation algorithm. If there are bugs in the Sethi-Ullman computation that cause the number of needed registers to be underestimated, the compiler may run out of scratch registers; it is essential that the allocator keep track of those registers that are free and busy, in order to detect such conditions.

Allocation of registers takes place as the result of a template match; the routine _allo_ is called with a word describing the number of A registers, B registers, and temporary locations needed. The allocation of temporary locations on the stack is relatively straightforward, and will not be further covered; the bookkeeping is a bit tricky, but conceptually trivial, and requests for temporary space on the stack will never fail.

Register allocation is less straightforward. The two major complications are _pairing_ and _sharing_. In many machines, some operations (such as multiplication and division), and/or some types (such as longs or double precision) require even/odd pairs of registers. Operations of the first type are exceptionally difficult to deal with in the compiler; in fact, their theoretical properties are rather bad as well. [ Aho Johnson Ullman Multiregister ] The second issue is dealt with rather more successfully; a machine dependent function called _szty(t)_ is called that returns 1 or 2, depending on the number of A registers required to hold an object of type _t_. If _szty_ returns 2, an even/odd pair of A registers is allocated for each request. As part of its duties, the routine _usable_ finds usable register pairs for various operations.
This task is not as easy as it sounds; it does not suffice to merely use _szty_ on the expression tree, since there are situations in which a register pair temporary is needed even though the result of the expression requires only one register. This can occur with assignment operator expressions which have int type but a double right hand side, or with relational expressions where one operand is float and the other double.

The other issue, sharing, is more subtle, but important for good code quality. When registers are allocated, it is possible to reuse registers that hold address information, and use them to contain the values computed or accessed. For example, on the IBM 360, if register 2 has a pointer to an integer in it, we may load the integer into register 2 itself by saying:

	L	2,0(2)

If register 2 had a byte pointer, however, the sequence for loading a character involves clearing the target register first, and then inserting the desired character:

	SR	3,3
	IC	3,0(2)

In the first case, if register 3 were used as the target, it would lead to a larger number of registers used for the expression than were required; the compiler would generate inefficient code. On the other hand, if register 2 were used as the target in the second case, the code would simply be wrong. In the first case, register 2 can be _shared_ while in the second, it cannot.

In the specification of the register needs in the templates, it is possible to indicate whether required scratch registers may be shared with possible registers on the left or the right of the input tree. In order that a register be shared, it must be scratch, and it must be used only once, on the appropriate side of the tree being compiled.

The _allo_ routine thus has a bit more to do than meets the eye; it calls _freereg_ to obtain a free register for each A and B register request.
_Freereg_ makes multiple calls on the routine _usable_ to decide if a given register can be used to satisfy a given need. _Usable_ calls _shareit_ if the register is busy, but might be shared. Finally, _shareit_ calls _ushare_ to decide if the desired register is actually in the appropriate subtree, and can be shared.

Just to add additional complexity, on some machines (such as the IBM 370) it is possible to have ``double indexing'' forms of addressing; these are represented by OREG's with the base and index registers encoded into the register field. While the register allocation and deallocation _per se_ is not made more difficult by this phenomenon, the code itself is somewhat more complex.

Having allocated the registers and expanded the assembly language, it is time to reclaim the resources; the routine _reclaim_ does this. Many operations produce more than one result. For example, many arithmetic operations may produce a value in a register, and also set the condition codes. Assignment operations may leave results both in a register and in memory. _Reclaim_ is passed three parameters; the tree and cookie that were matched, and the rewriting field of the template. The rewriting field allows the specification of possible results; the tree is rewritten to reflect the results of the operation. If the tree was computed for side effects only (FOREFF), the tree is freed, and all resources in it reclaimed. If the tree was computed for condition codes, the resources are also freed, and the tree replaced by a special node type, FORCC. Otherwise, the value may be found in the left argument of the root, the right argument of the root, or one of the temporary resources allocated. In these cases, first the resources of the tree, and the newly allocated resources, are freed; then the resources needed by the result are made busy again.
The final result must always match the shape of the input cookie; otherwise, the compiler error ``cannot reclaim'' is generated. There are some machine dependent ways of preferring results in registers or memory when there are multiple results matching multiple goals in the cookie.

_Reclaim_ also implements, in a curious way, C's ``usual arithmetic conversions''. When a value is generated into a temporary register, _reclaim_ decides what the type and size of the result will be. Unless automatic conversion is specifically suppressed in the code template with the T macro, _reclaim_ converts char and short results to int, unsigned char and unsigned short results to unsigned int, and float into double (for double-only floating point arithmetic). This conversion is a simple type pun; no instructions for converting the value are actually emitted. This implies that registers must always contain a value that is at least as wide as a register, which greatly restricts the range of possible templates.

The Machine Dependent Interface

The files _order.c_, _local2.c_, and _table.c_, as well as the header file _mac2defs_, represent the machine dependent portion of the second pass. The machine dependent portion can be roughly divided into two: the easy portion and the hard portion. The easy portion tells the compiler the names of the registers, and arranges that the compiler generate the proper assembler formats, opcode names, location counters, etc. The hard portion involves the Sethi-Ullman computation, the rewriting rules, and, to some extent, the templates. It is hard because there are no real algorithms that apply; most of this portion is based on heuristics. This section discusses the easy portion; the next several sections will discuss the hard portion.

If the compiler is adapted from a compiler for a machine of similar architecture, the easy part is indeed easy.
In _mac2defs_, the register numbers are defined, as well as various parameters for the stack frame, and various macros that describe the machine architecture. If double indexing is to be permitted, for example, the symbol R2REGS is defined. Also, a number of macros that are involved in function call processing, especially for unusual function call mechanisms, are defined here.

In _local2.c_, a large number of simple functions are defined. These do things such as write out opcodes, register names, and address forms for the assembler. Part of the function call code is defined here; that is nontrivial to design, but typically rather straightforward to implement. Among the easy routines in _order.c_ are routines for generating a created label, defining a label, and generating the arguments of a function call. These routines tend to have a local effect, and depend in a fairly straightforward way on the target assembler and the design decisions already made about the compiler. Thus they will not be further treated here.

The Rewriting Rules

When a tree fails to match any template, it becomes a candidate for rewriting. Before the tree is rewritten, the machine dependent routine _nextcook_ is called with the tree and the cookie; it suggests another cookie that might be a better candidate for the matching of the tree. If all else fails, the templates are searched with the cookie FORREW, to look for a rewriting rule. The rewriting rules are of two kinds; for most of the common operators, there are machine dependent rewriting rules that may be applied; these are handled by machine dependent functions that are called and given the tree to be computed. These routines may recursively call _order_ or _codgen_ to cause certain subgoals to be achieved; if they actually call for some alteration of the tree, they return 1, and the code generation algorithm recanonicalizes and tries again.
If these routines choose not to deal with the tree, the default rewriting rules are applied.

The assignment operators, when rewritten, call the routine _setasg_. This is assumed to rewrite the tree at least to the point where there are no side effects in the left hand side. If there is still no template match, a default rewriting is done that causes an expression such as

	a += b

to be rewritten as

	a = a + b

This is a useful default for certain mixtures of strange types (for example, when _a_ is a bit field and _b_ a character) that otherwise might need separate table entries.

Simple assignment, structure assignment, and all forms of calls are handled completely by the machine dependent routines. For historical reasons, the routines generating the calls return 1 on failure, 0 on success, unlike the other routines.

The machine dependent routine _setbin_ handles binary operators; it too must do most of the job. In particular, when it returns 0, it must do so with the left hand side in a temporary register. The default rewriting rule in this case is to convert the binary operator into the associated assignment operator; since the left hand side is assumed to be a temporary register, this preserves the semantics and often allows a considerable saving in the template table.

The increment and decrement operators may be dealt with by the machine dependent routine _setincr_. If this routine chooses not to deal with the tree, the rewriting rule replaces

	x ++

by

	( (x += 1) - 1)

which preserves the semantics. Once again, this is not too attractive for the most common cases, but can generate close to optimal code when the type of x is unusual.

Finally, the indirection (UNARY MUL) operator is also handled in a special way. The machine dependent routine _offstar_ is extremely important for the efficient generation of code.
_Offstar_ is called with a tree that is the direct descendant of a UNARY MUL node; its job is to transform this tree so that the combination of UNARY MUL with the transformed tree becomes addressable. On most machines, _offstar_ can simply compute the tree into an A or B register, depending on the architecture, and then _canon_ will make the resulting tree into an OREG. On many machines, _offstar_ can profitably choose to do less work than computing its entire argument into a register. For example, if the target machine supports OREG's with a constant offset from a register, and _offstar_ is called with a tree of the form

	expr + const

where _const_ is a constant, then _offstar_ need only compute _expr_ into the appropriate form of register. On machines that support double indexing, _offstar_ may have even more choice as to how to proceed. The proper tuning of _offstar_, which is not typically too difficult, should be one of the first tries at optimization attempted by the compiler writer.

The Sethi-Ullman Computation

The heart of the heuristics is the computation of the Sethi-Ullman numbers. This computation is closely linked with the rewriting rules and the templates. As mentioned before, the Sethi-Ullman numbers are expected to estimate the number of scratch registers needed to compute the subtrees without using any stores. However, the original theory does not apply to real machines. For one thing, the theory assumes that all registers are interchangeable. Real machines have general purpose, floating point, and index registers, register pairs, etc. The theory also does not account for side effects; this rules out various forms of pathology that arise from assignment and assignment operators. Condition codes are also undreamed of.
Finally, the influence of types, conversions, and the various addressability restrictions and extensions of real machines are also ignored.

Nevertheless, for a ``useless'' theory, the basic insight of Sethi and Ullman is amazingly useful in a real compiler. The notion that one should attempt to estimate the resource needs of trees before starting the code generation provides a natural means of splitting the code generation problem, and provides a bit of redundancy and self checking in the compiler. Moreover, if writing the Sethi-Ullman routines is hard, describing, writing, and debugging the alternative (routines that attempt to free up registers by stores into temporaries ``on the fly'') is even worse. Nevertheless, it should be clearly understood that these routines exist in a realm where there is no ``right'' way to write them; it is an art, the realm of heuristics, and, consequently, a major source of bugs in the compiler. Often, the early, crude versions of these routines give little trouble; only after the compiler is actually working and the code quality is being improved do serious problems have to be faced. Having a simple, regular machine architecture is worth quite a lot at this time.

The major problems arise from asymmetries in the registers: register pairs, having different kinds of registers, and the related problem of needing more than one register (frequently a pair) to store certain data types (such as longs or doubles). There appears to be no general way of treating this problem; solutions have to be fudged for each machine where the problem arises. On the Honeywell 66, for example, there are only two general purpose registers, so a need for a pair is the same as the need for two registers. On the IBM 370, the register pair (0,1) is used to do multiplications and divisions; registers 0 and 1 are not generally considered part of the scratch registers, and so do not require allocation explicitly.
On the Interdata 8/32, after much consideration, the decision was made not to try to deal with the register pair issue; operations such as multiplication and division that required pairs were simply assumed to take all of the scratch registers. Several weeks of effort had failed to produce an algorithm that seemed to have much chance of running successfully without inordinate debugging effort. The difficulty of this issue should not be minimized; it represents one of the main intellectual efforts in porting the compiler. Nevertheless, this problem has been fudged with a degree of success on nearly a dozen machines, so the compiler writer should not abandon hope.

The Sethi-Ullman computations interact with the rest of the compiler in a number of rather subtle ways. As already discussed, the _store_ routine uses the Sethi-Ullman numbers to decide which subtrees are too difficult to compute in registers, and must be stored. There are also subtle interactions between the rewriting routines and the Sethi-Ullman numbers. Suppose we have a tree such as

	A - B

where _A_ and _B_ are expressions; suppose further that _B_ takes two registers, and _A_ one. It is possible to compute the full expression in two registers by first computing _B_, and then, using the scratch register used by _B_, but not containing the answer, compute _A_. The subtraction can then be done, computing the expression. (Note that this assumes a number of things, not the least of which are register-to-register subtraction operators and symmetric registers.) If the machine dependent routine _setbin_, however, is not prepared to recognize this case and compute the more difficult side of the expression first, the Sethi-Ullman number must be set to three. Thus, the Sethi-Ullman number for a tree should represent the code that the machine dependent routines are actually willing to generate. The interaction can go the other way.
If we take an expression such as *(_p + _i), where _p is a pointer and _i an integer, this can probably be done in one register on most machines. Thus, its Sethi-Ullman number would probably be set to one. If double indexing is possible in the machine, a possible way of computing the expression is to load both _p and _i into registers, and then use double indexing. This would use two scratch registers; in such a case, it is possible that the scratch registers might be unobtainable, or might make some other part of the computation run out of registers. The usual solution is to cause _o_f_f_s_t_a_r to ignore opportunities for double indexing that would tie up more scratch registers than the Sethi-Ullman number had reserved.

In summary, the Sethi-Ullman computation represents much of the craftsmanship and artistry in any application of the portable compiler. It is also a frequent source of bugs. Algorithms are available that will produce nearly optimal code for specialized machines, but unfortunately most existing machines are far removed from these ideals. The best way of proceeding in practice is to start with a compiler for a machine similar to the target, and proceed very carefully.

_R_e_g_i_s_t_e_r _A_l_l_o_c_a_t_i_o_n

After the Sethi-Ullman numbers are computed, _o_r_d_e_r calls a routine, _r_a_l_l_o, that does register allocation, if appropriate. This routine does relatively little, in general; this is especially true if the target machine is fairly regular. There are a few cases where it is assumed that the result of a computation takes place in a particular register; switch and function return are the two major places. The expression tree has a field, _r_a_l_l, that may be filled with a register number; this is taken to be a preferred register, and the first temporary register allocated by a template match will be this preferred one, if it is free. If not, no particular action is taken; this is just a heuristic.
If no register preference is present, the field contains NOPREF. In some cases, the result must be placed in a given register, no matter what. The register number is placed in _r_a_l_l, and the mask MUSTDO is logically or'ed in with it. In this case, if the subtree is requested in a register, and comes back in a register other than the demanded one, it is moved by calling the routine _r_m_o_v_e. If the target register for this move is busy, it is a compiler error. Note that this mechanism is the only one that will ever cause a register-to-register move between scratch registers (unless such a move is buried in the depths of some template). This simplifies debugging. In some cases, there is a rather strange interaction between the register allocation and the Sethi-Ullman number; if there is an operator or situation requiring a particular register, the allocator and the Sethi-Ullman computation must conspire to ensure that the target register is not being used by some intermediate result of some far-removed computation. This is most easily done by making the special operation take all of the free registers, preventing any other partially-computed results from cluttering up the works.

_T_e_m_p_l_a_t_e _S_h_o_r_t_c_u_t_s

Some operations are just too hard or too clumsy to be implemented in code templates on a particular architecture. One way to handle such operations is to replace them with function calls. The intermediate file reading code in _r_e_a_d_e_r._c contains a call to an implementation dependent macro MYREADER; this can be defined to call various routines which walk the code tree and perform transformations. On the VAX, for example, unsigned division and remainder operations are far too complex to encode in a template. The routine _h_a_r_d_o_p_s is called from a tree walk in _m_y_r_e_a_d_e_r to detect these operations and replace them with calls to the C runtime functions _u_d_i_v and _u_r_e_m.
(There are complementary functions _a_u_d_i_v and _a_u_r_e_m which are provided as support for unsigned assignment operator expressions; they are different from _u_d_i_v and _u_r_e_m because the left hand side of an assignment operator expression must be evaluated only once.) Note that arithmetic support routines are always expensive; the compiler makes an effort to notice common operations such as unsigned division by a constant power of two and generates optimal code for these inline.

Another escape involves the routine _z_z_z_c_o_d_e. This function is called from _e_x_p_a_n_d to process template macros which start with the character _Z. On the VAX, many complex code generation problems are swept under the rug into _z_z_z_c_o_d_e. Scalar type conversions are a particularly annoying issue; they are primarily handled using the macro ZA. Rather than creating a template for each possible conversion and result, which would be tedious and complex given C's many scalar types, this macro allows the compiler to take shortcuts. Tough conversions such as unsigned into double are easily handled using special code under ZA. One convention which makes scalar conversions somewhat more difficult than they might otherwise be is the strict requirement that values in registers must have a type that is as wide or wider than a single register. This convention is used primarily to implement the ``usual arithmetic conversions'' of C, but it can get in the way when converting between (say) a char value and an unsigned short. A routine named _c_o_l_l_a_p_s_i_b_l_e is used to determine whether one operation or two is needed to produce a register-width result.

Another convenient macro is ZP. This macro is used to generate an appropriate conditional test after a comparison.
This makes it possible to avoid a profusion of template entries which essentially duplicate each other, one entry for each type of test multiplied by the number of different comparison conditions. A related macro, ZN, is used to normalize the result of a relational test by producing an integer 0 or 1.

The macro ZS does the unlovely job of generating code for structure assignments. It tests the size of the structure to see what VAX instruction can be used to move it, and is capable of emitting a block move instruction for large structures. On other architectures this macro could be used to generate a function call to a block copy routine.

The macro ZG was recently introduced to handle the thorny issue of assignment operator expressions which have an integral left hand side and a floating point right hand side. These expressions are passed to the code generator without the usual type balancing so that good code can be generated for them. Older versions of the portable compiler computed these expressions with integer arithmetic; with the ZG operator, the current compiler can convert the left hand side to the appropriate floating type, compute the expression with floating point arithmetic, convert the result back to integral type and store it in the left hand side. These operations are performed by recursive calls to _z_z_z_c_o_d_e and other routines related to _e_x_p_a_n_d.

An assortment of other macros finish the job of interpreting code templates. Among the more interesting ones: ZC produces the number of words pushed on the argument stack, which is useful for function calls; ZD and ZE produce constant increment and decrement operations; ZL and ZR produce the assembler letter code (l, w or b) corresponding to the size and type of the left and right operand respectively.

_S_h_a_r_e_d _C_o_d_e

The _l_i_n_t utility shares sources with the portable compiler.
_L_i_n_t uses all of the machine independent pass 1 sources, and adds its own set of ``machine dependent'' routines, contained mostly in _l_i_n_t._c. _L_i_n_t uses a private intermediate file format and a private pass 2 whose source is _l_p_a_s_s_2._c. Several modifications were made to the C scanner in _s_c_a_n._c, conditionally compiled with the symbol LINT, in order to support _l_i_n_t's convention of passing ``pragma'' information inside special comments. A few other minor modifications were also made, _e._g. to skip over _a_s_m statements.

The _f_7_7 and _p_c compilers use a code generator which shares sources with pass 2 of the portable compiler. This code generator is very similar to pass 2 but uses a different intermediate file format. Three source files are needed in addition to the pass 2 sources. _f_o_r_t._c is a machine independent source file which contains a pass 2 main routine that replaces the equivalent routine in _r_e_a_d_e_r._c, together with several routines for reading the binary intermediate file. _f_o_r_t._c includes the machine dependent file _f_o_r_t._h, which defines two trivial label generation routines. A header file /_u_s_r/_i_n_c_l_u_d_e/_p_c_c._h defines opcode and type symbols which are needed to provide a standard intermediate file format; this file is also included by the Fortran and Pascal compilers. The creation of this header file made it necessary to make some changes in the way the portable C compiler is built. These changes were made with the aim of minimizing the number of lines changed in the original sources. Macro symbols in _p_c_c._h are flagged with a unique prefix to avoid symbol name collisions in the Fortran and Pascal compilers, which have their own internal opcode and type symbols.
A _s_e_d(1) script is used to strip these prefixes, producing an include file named _p_c_c_l_o_c_a_l._h which is specific to the portable C compiler and contains opcode symbols which are compatible with the original opcode symbols. A similar _s_e_d script is used to produce a file of Yacc tokens for the C grammar.

A number of changes to existing source files were made to accommodate the Fortran-style pass 2. These changes are conditionally compiled using the symbol FORT. Many changes were needed to implement single-precision arithmetic; other changes concern such things as the avoidance of floating point move instructions, which on the VAX can cause floating point faults when a datum is not a normalized floating point value. In earlier implementations of the Fortran-style pass 2 there were a number of stub files which served only to define the symbol FORT in a particular source file; these files have been removed for 4.3BSD in favor of a new compilation strategy which yields up to three different objects from a single source file, depending on what compilation control symbols are defined for that file.

The Fortran-style pass 2 uses a Polish Postfix intermediate file. The file is in binary format, and is logically divided into a stream of 32-bit records. Each record consists of an (_o_p_c_o_d_e, _v_a_l_u_e, _t_y_p_e) triple, possibly followed inline by more descriptive information. The _o_p_c_o_d_e and _t_y_p_e are selected from the list in _p_c_c._h; the type encodes a basic type, around which may be wrapped type modifiers such as ``pointer to'' or ``array of'' to produce more complex types. The function of the _v_a_l_u_e parameter depends on the opcode; it may be used for a flag, a register number or the value of a constant, or it may be unused. The optional inline data is often a null-terminated string, but it may also be a binary offset from a register or from a symbolic constant; sometimes both a string and an offset appear.
Here are a few samples of intermediate file records and their interpretation:

ICON  (int, flag=0; binary=5)
      the integer constant 5
NAME  (char, flag=1; binary=1, string="foo")
      a character*1 element in a Fortran common block _f_o_o at offset 1
OREG  (char, reg=11; offset=1, string="v.2-v.1")
      the second element of a Fortran character*1 array, expressed as an offset from a static base register
PLUS  (float)
      a single precision add
FTEXT (size=2; string=".text 0")
      an inline assembler directive of length 2 (32-bit records)

_C_o_m_p_i_l_e_r _B_u_g_s

The portable compiler has an excellent record of generating correct code. The requirement for reasonable cooperation between the register allocation, Sethi-Ullman computation, rewriting rules, and templates builds quite a bit of redundancy into the compiling process. The effect of this is that, in a surprisingly short time, the compiler will start generating correct code for those programs that it can compile. The hard part of the job then becomes finding and eliminating those situations where the compiler refuses to compile a program because it knows it cannot do it right. For example, a template may simply be missing; this may either give a compiler error of the form ``no match for op ...'', or cause the compiler to go into an infinite loop applying various rewriting rules. The compiler has a variable, _n_r_e_c_u_r, that is set to 0 at the beginning of an expression, and incremented at key spots in the compilation process; if this parameter gets too large, the compiler decides that it is in a loop, and aborts. Loops are also characteristic of botches in the machine-dependent rewriting rules.
Bad Sethi-Ullman computations usually cause the scratch registers to run out; this often means that the Sethi-Ullman number was underestimated, so _s_t_o_r_e did not store something it should have; alternatively, it can mean that the rewriting rules were not smart enough to find the sequence that _s_u_c_o_m_p assumed would be used.

The best approach when a compiler error is detected involves several stages. First, try to get a small example program that steps on the bug. Second, turn on various debugging flags in the code generator, and follow the tree through the process of being matched and rewritten. Some flags of interest are -e, which prints the expression tree, -r, which gives information about the allocation of registers, -a, which gives information about the performance of _r_a_l_l_o, and -o, which gives information about the behavior of _o_r_d_e_r. This technique should allow most bugs to be found relatively quickly.

Unfortunately, finding the bug is usually not enough; it must also be fixed! The difficulty arises because a fix to the particular bug of interest tends to break other code that already works. Regression tests, tests that compare the performance of a new compiler against the performance of an older one, are very valuable in preventing major catastrophes.

_C_o_m_p_i_l_e_r _E_x_t_e_n_s_i_o_n_s

The portable C compiler makes a few extensions to the language described by Ritchie.

_S_i_n_g_l_e _p_r_e_c_i_s_i_o_n _a_r_i_t_h_m_e_t_i_c. ``All floating arithmetic in C is carried out in double-precision; whenever a _f_l_o_a_t appears in an expression it is lengthened to _d_o_u_b_l_e by zero-padding its fraction.'' Dennis Ritchie. [ kernighan ritchie prentice 1978 ] Programmers who would like to use C to write numerical applications often shy away from it because C programs cannot perform single precision arithmetic.
On machines such as the VAX which can cleanly support arithmetic on two (or more) sizes of floating point values, programs which can take advantage of single precision arithmetic will run faster. A very popular proposal for the ANSI C standard states that implementations may perform single precision computations with single precision arithmetic; some actual C implementations already do this, and now the Berkeley compiler joins them.

The changes are implemented in the compiler with a set of conditional compilation directives based on the symbol SPRECC; thus two compilers are generated, one with only double precision arithmetic and one with both double and single precision arithmetic. The _c_c program uses a flag -_f to select the single/double version of the compiler (/_l_i_b/_s_c_c_o_m) instead of the default double only version (/_l_i_b/_c_c_o_m). It is expected that at some time in the future the double only compiler will be retired and the single/double compiler will become the default.

There are a few implementation details of the single/double compiler which will be of interest to users and compiler porters. To maintain compatibility with functions compiled by the double only compiler, single precision actual arguments are still coerced to double precision, and formal arguments which are declared single precision are still ``really'' double precision. This may change if function prototypes of the sort proposed for the ANSI C standard are eventually adopted. Floating point constants are now classified into single precision and double precision types. The precision of a constant is determined from context; if a floating constant appears in an arithmetic expression with a single precision value, the constant is treated as having single precision type and the arithmetic expression is computed using single precision arithmetic.
Remarkably little code in the compiler needed to be changed to implement the single/double compiler. In many cases the changes overlapped with special cases which are used for the Fortran-style pass 2 (/_l_i_b/_f_1). Most of the single precision changes were implemented by Sam Leffler.

_P_r_e_p_r_o_c_e_s_s_o_r _e_x_t_e_n_s_i_o_n_s. The portable C compiler is normally distributed with a macro preprocessor written by J. F. Reiser. This preprocessor implements the features described in Ritchie's reference manual; it removes comments, expands macro definitions and removes or inserts code based on conditional compilation directives. Two interesting extensions are provided by this version of the preprocessor:

o  When comments are removed, no white space is necessarily substituted; this has the effect of _r_e_t_o_k_e_n_i_z_i_n_g code, since the PCC will reanalyze the input. Macros can thus create new tokens by clever use of comments. For example, the macro definition ``#define foo(a,b) a/**/b'' creates a macro _f_o_o which concatenates its two arguments, forming a new token.

o  Macro bodies are analyzed for macro arguments without regard to the boundaries of string or character constants. The definition ``#define bar(a) "a\n"'' creates a macro which returns the literal form of its argument embedded in a string with a newline appended.

These extensions are not portable to a number of other C preprocessors. They may be replaced in the future by corresponding ANSI C features, when the ANSI C standard has been formalized.

_S_u_m_m_a_r_y _a_n_d _C_o_n_c_l_u_s_i_o_n

The portable compiler has been a useful tool for providing C capability on a large number of diverse machines, and for testing a number of theoretical constructs in a practical setting. It has many blemishes, both in style and functionality. It has been applied to many more machines than first anticipated, of a much wider range than originally dreamed of.
Its use has also spread much faster than expected, leaving parts of the compiler still somewhat raw in shape.

On the theoretical side, there is some hope that the skeleton of the _s_u_c_o_m_p routine could be generated for many machines directly from the templates; this would give a considerable boost to the portability and correctness of the compiler, but might affect tunability and code quality. There is also room for more optimization, both within _o_p_t_i_m and in the form of a portable ``peephole'' optimizer.

On the practical, development side, the compiler could probably be sped up and made smaller without doing too much violence to its basic structure. Parts of the compiler deserve to be rewritten; the initialization code, register allocation, and parser are prime candidates. Doing some or all of the parsing with a recursive descent parser might save enough space and time to be worthwhile; it would certainly ease the problem of moving the compiler to an environment where _Y_a_c_c is not already present.

_A_c_k_n_o_w_l_e_d_g_e_m_e_n_t_s

I would like to thank the many people who have sympathetically, and even enthusiastically, helped me grapple with what has been a frustrating program to write, test, and install. D. M. Ritchie and E. N. Pinson provided needed early encouragement and philosophical guidance; M. E. Lesk, R. Muha, T. G. Peterson, G. Riddle, L. Rosler, R. W. Mitze, B. R. Rowland, S. I. Feldman, and T. B. London have all contributed ideas, gripes, and all, at one time or another, climbed ``into the pits'' with me to help debug. Without their help this effort would not have been possible; with it, it was often kind of fun.

S. C. Johnson

Many people have contributed fixes and improvements to the current Berkeley version of the compiler.
A number of really valuable fixes were contributed by Ralph Campbell, Sam Leffler, Kirk McKusick, Arthur Olsen, Donn Seeley, Don Speck and Chris Torek, but most of the bugs were spotted by the legions of virtuous C programmers who were kind enough to let us know that the compiler was broken and when the heck were we going to get it fixed? Thank you all.

Donn Seeley