A System for Typesetting Mathematics

Brian W. Kernighan and Lorinda L. Cherry

ABSTRACT

This paper describes the design and implementation of a system for typesetting mathematics. The language has been designed to be easy to learn and to use by people (for example, secretaries and mathematical typists) who know neither mathematics nor typesetting. Experience indicates that the language can be learned in an hour or so, for it has few rules and fewer exceptions. For typical expressions, the size and font changes, positioning, line drawing, and the like necessary to print according to mathematical conventions are all done automatically. For example, the input

    sum from i=0 to infinity x sub i = pi over 2

produces

    $$\sum_{i=0}^{\infty} x_i = \frac{\pi}{2}$$

The syntax of the language is specified by a small context-free grammar; a compiler-compiler is used to make a compiler that translates this language into typesetting commands. Output may be produced on either a phototypesetter or on a terminal with forward and reverse half-line motions. The system interfaces directly with text formatting programs, so mixtures of text and mathematics may be handled simply.

This paper is a revision of a paper originally published in CACM, March, 1975.

August 3, 1987 (USD:26)

1. Introduction

``Mathematics is known in the trade as difficult, or penalty, copy because it is slower, more difficult, and more expensive to set in type than any other kind of copy normally occurring in books and journals.'' [1]

One difficulty with mathematical text is the multiplicity of characters, sizes, and fonts. An expression such as

    $$\lim_{x \to \pi/2} (\tan x)^{\sin 2x} = 1$$

requires an intimate mixture of roman, italic and greek letters, in three sizes, and a special character or two.
(``Requires'' is perhaps the wrong word, but mathematics has its own typographical conventions which are quite different from those of ordinary text.) Typesetting such an expression by traditional methods is still an essentially manual operation.

A second difficulty is the two dimensional character of mathematics, which the superscript and limits in the preceding example showed in its simplest form. This is carried further by

    $$a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{a_3 + \cdots}}}$$

and still further by

    $$\int \frac{dx}{a e^{mx} - b e^{-mx}} = \left\{ \begin{array}{l} \dfrac{1}{2m\sqrt{ab}} \log \dfrac{\sqrt{a}\, e^{mx} - \sqrt{b}}{\sqrt{a}\, e^{mx} + \sqrt{b}} \\[2ex] \dfrac{1}{m\sqrt{ab}} \tanh^{-1}\left( \dfrac{\sqrt{a}}{\sqrt{b}}\, e^{mx} \right) \\[2ex] \dfrac{-1}{m\sqrt{ab}} \coth^{-1}\left( \dfrac{\sqrt{a}}{\sqrt{b}}\, e^{mx} \right) \end{array} \right.$$

These examples also show line-drawing, built-up characters like braces and radicals, and a spectrum of positioning problems. (Section 6 shows what a user has to type to produce these on our system.)

2. Photocomposition

Photocomposition techniques can be used to solve some of the problems of typesetting mathematics. A phototypesetter is a device which exposes a piece of photographic paper or film, placing characters wherever they are wanted. The Graphic Systems phototypesetter[2] on the UNIX operating system[3] works by shining light through a character stencil. The character is made the right size by lenses, and the light beam directed by fiber optics to the desired place on a piece of photographic paper. The exposed paper is developed and typically used in some form of photo-offset reproduction.

On UNIX, the phototypesetter is driven by a formatting program called TROFF [4].
TROFF was designed for setting running text. It also provides all of the facilities that one needs for doing mathematics, such as arbitrary horizontal and vertical motions, line-drawing, and size changing, but the syntax for describing these special operations is difficult to learn, and difficult even for experienced users to type correctly.

For this reason we decided to use TROFF as an ``assembly language,'' by designing a language for describing mathematical expressions, and compiling it into TROFF.

3. Language Design

The fundamental principle upon which we based our language design is that the language should be easy to use by people (for example, secretaries) who know neither mathematics nor typesetting.

This principle implies several things. First, ``normal'' mathematical conventions about operator precedence, parentheses, and the like cannot be used, for to give special meaning to such characters means that the user has to understand what he or she is typing. Thus the language should not assume, for instance, that parentheses are always balanced, for they are not in the half-open interval $(a,b]$. Nor should it assume that $\sqrt{a+b}$ can be replaced by $(a+b)^{1/2}$, or that $1/(1-x)$ is better written as $\frac{1}{1-x}$ (or vice versa).

Second, there should be relatively few rules, keywords, special symbols and operators, and the like. This keeps the language easy to learn and remember. Furthermore, there should be few exceptions to the rules that do exist: if something works in one situation, it should work everywhere. If a variable can have a subscript, then a subscript can have a subscript, and so on without limit.

Third, ``standard'' things should happen automatically.
Someone who types ``x=y+z+1'' should get ``$x=y+z+1$''. Subscripts and superscripts should automatically be printed in an appropriately smaller size, with no special intervention. Fraction bars have to be made the right length and positioned at the right height. And so on. Indeed a mechanism for overriding default actions has to exist, but its application is the exception, not the rule.

We assume that the typist has a reasonable picture (a two-dimensional representation) of the desired final form, as might be handwritten by the author of a paper. We also assume that the input is typed on a computer terminal much like an ordinary typewriter. This implies an input alphabet of perhaps 100 characters, none of them special.

A secondary, but still important, goal in our design was that the system should be easy to implement, since neither of the authors had any desire to make a long-term project of it. Since our design was not firm, it was also necessary that the program be easy to change at any time.

To make the program easy to build and to change, and to guarantee regularity (``it should work everywhere''), the language is defined by a context-free grammar, described in Section 5. The compiler for the language was built using a compiler-compiler.

A priori, the grammar/compiler-compiler approach seemed the right thing to do. Our subsequent experience leads us to believe that any other course would have been folly.

The original language was designed in a few days. Construction of a working system sufficient to try significant examples required perhaps a person-month.
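
The regularity goal stated above (if a variable can have a subscript, then a subscript can have a subscript) follows almost for free once the language is defined by a recursive grammar rule. The following is a minimal sketch in Python, not the system's actual grammar or code: the one-rule grammar, the function names, and the LaTeX-like output notation are all assumptions made for illustration, whereas the real system is generated by a compiler-compiler and emits TROFF commands.

```python
# Illustrative only: a one-rule grammar   box := WORD [ "sub" box ]
# Because the rule refers to itself, subscripts nest without limit.

def parse(tokens, pos=0):
    """Parse one box starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens) or tokens[pos] == "sub":
        raise SyntaxError("expected an operand at token %d" % pos)
    base = tokens[pos]
    pos += 1
    if pos < len(tokens) and tokens[pos] == "sub":
        # The subscript is itself a box, so it may carry its own subscript.
        subscript, pos = parse(tokens, pos + 1)
        return ("sub", base, subscript), pos
    return base, pos

def render(tree):
    """Turn a parse tree into a LaTeX-like string (hypothetical output form)."""
    if isinstance(tree, str):
        return tree
    _, base, subscript = tree
    return "%s_{%s}" % (render(base), render(subscript))

def translate(text):
    """Translate blank-separated input such as 'x sub i sub j'."""
    tokens = text.split()
    tree, pos = parse(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected token %r" % tokens[pos])
    return render(tree)

if __name__ == "__main__":
    print(translate("x sub i sub j"))
```

No special case is needed for a subscript on a subscript: the same rule that handles ``x sub i'' handles ``x sub i sub j'', which is the sense in which a grammar guarantees that something working in one situation works everywhere.
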
Since then, we have spent a modest amount of additional time over several years tuning, adding facilities, and occasionally changing the language as users make criticisms and suggestions.

We also decided quite early that we would let TROFF do our work for us whenever possible. TROFF is quite a powerful program, with a macro facility, text and arithmetic variables, numerical computation and testing, and conditional branching. Thus we have been able to avoid writing a lot of mundane but tricky software. For example, we store no text strings, but simply pass them on to TROFF. Thus we avoid having to write a storage management package. Furthermore, we have been able to isolate ourselves from most details of the particular device and character set currently in use. For example, we let TROFF compute the widths of all strings of characters; we need know nothing about them.

A third design goal is special to our environment. Since our program is only useful for typesetting mathematics, it is necessary that it interface cleanly with the underlying typesetting language for the benefit of users who want to set intermingled mathematics and text (the usual case). The standard mode of operation is that when a document is typed, mathematical expressions are input as part of the text, but marked by user-settable delimiters. The program reads this input and treats as comments those things which are not mathematics, simply passing them through untouched. At the same time it converts the mathematical input into the necessary TROFF commands. The resulting output is passed directly to TROFF where the comments and the mathematical parts both become text and/or TROFF commands.

4. The Language

We will not try to describe the language precisely h