From bloom-picayune.mit.edu!mintaka.lcs.mit.edu!olivea!uunet!mcsun!uknet!ieunet!ccvax.ucd.ie!pflynn Tue Sep  8 00:53:20 EDT 1992
Article: 1391 of comp.text.sgml
Xref: bloom-picayune.mit.edu comp.text.sgml:1391 comp.text.tex:21027
Path: bloom-picayune.mit.edu!mintaka.lcs.mit.edu!olivea!uunet!mcsun!uknet!ieunet!ccvax.ucd.ie!pflynn
From: pflynn@ccvax.ucd.ie (Peter Flynn, Official Muffin-sampler)
Newsgroups: comp.text.sgml,comp.text.tex
Subject: SGML and TeX
Message-ID: <1992Sep7.101921.49631@ccvax.ucd.ie>
Date: 7 Sep 92 10:19:21 GMT
Organization: University College Dublin
Lines: 187

TUGboat 13(2) carries an abstract by Reinhard Wonneberger (pp. 226--227)
called "Approaching SGML from TeX", in which he summarises some of the
possible ways to use TeX to print from an SGML instance.

The following file is an attempt I cooked up over the weekend to demonstrate 
the feasibility of this approach. It still fails on a lot of things, but they 
don't look insuperable. The instance referenced at the end of the file can 
be retrieved by anonymous ftp from curia.ucc.ie (143.239.1.8), in pub/curia.

--------------------------
% SGML.TEX --- a pilot set of macros to provide rudimentary 
%              typesetting of SGML-encoded documents with NO
%              pre- or postprocessing (you better believe it)
%              (c) 1992 Peter Flynn 
% 
% Warning: this file uses the EPLAIN macros of Karl Berry, obtainable
% from any of the TeX archives such as tex.ac.uk or ymir.claremont.edu
%
% WARNING: this is a pilot. No guarantees, but it seems to 
% work on the tags I mention below. It should form the basis
% for much more work, as with proper persuasion, TeX should be
% able to process an unaltered SGML instance (and DTD) and 
% produce a piece of acceptable typesetting (IMHO :-).
%
% If you are going to do some work on this, please ask me first:
% I am unlikely to object, but I would like to know about it.
% 
% Version history:
% 
%   0.1 (Sep 92) reads and acts on a minimal tagset of HTML
%                used in network-browseable documents by WWW.
%                This comprises (work so far):
% 
%                     <title>...</title>        Document title
%                     <h1>...</h1>              Header level 1
%                     <h2>...</h2>              Header level 2
%                     <h3>...</h3>              Header level 3
%                     <dl>...                   Simple list
%                         <dt>...<dd>...        Item name, text
%                         </dl>                 End of list
%                     <p>                       Paragraph
%                     some entities like &aacute; (see below)
%
%                I haven't figured out how to handle multi-word
%                tags (eg with attributes) like <a name=0 h=test.doc>
%                yet, because in the parsing, TeX turns the space
%                into another category of character. Gimme time!
%                Another source of confusion is the presence of a
%                slash in a quoted filename within an attribute to
%                such tags when TeX is looking for the slash which
%                indicates the endtag. However...:-)
% 
% All comments to pflynn@curia.ucc.ie (Fax: +353 21 277194)
 
\input eplain                                   % get it from the archives!
\font\stt=cmtt8                                 % used for the tags
\font\sbf=cmssbx10 scaled \magstep1             % used for the title
\font\sc=cmcsc10                                % used for some headers

% Make a slash an ordinary letter.
\catcode`\/=11

% Define \pos, a running count of the characters scanned in a tag
% (used to spot a slash in first position), and \slash, a flag:
% 0=no leading slash (starttag), 1=leading slash (endtag).
\newcount\pos\newcount\slash

% The \parse and \getchar are adapted from the \length macro
% at the end of Chapter 20 (p.219) of the TeXbook. A call to
% \parse returns \slash=0 or \slash=1 depending on whether
% the argument was a starttag or endtag.
\def\parse#1{\global\pos=0\global\slash=0 \getchar#1/}
\def\getchar#1{\ifx#1/\ifnum\pos=0\global\slash=1\global\advance\pos 
by1\let\next=\getchar\else\let\next=\relax\fi%
\else\global\advance\pos by1\let\next=\getchar\fi\next}
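
% A small debugging aid (a sketch; \showparse is a hypothetical name, not
% part of the original file). It runs \parse on a tag and reports the
% result on the terminal, so \showparse{/title} should report slash=1
% and \showparse{title} should report slash=0.
\def\showparse#1{\parse{#1}\message{[#1: slash=\the\slash]}}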

% Use \raggedcenter from the answer to exercise 14.34 (Appendix A, p.317) of the TeXbook
\def\raggedcenter{\leftskip=0pt plus12em \rightskip=\leftskip
\parfillskip=0pt \spaceskip=.3333em \xspaceskip=.5em \parindent=0pt
\pretolerance=9999 \tolerance=9999
\hyphenpenalty=9999 \exhyphenpenalty=9999 }

% Define the visual meanings to be attached to the tags
\def\title{\par\begingroup\raggedcenter\sbf}
\def\/title{\bigskip\endgroup}
\def\p{\par}
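% Header level tags have to go in a group so that digits can
% be treated as letters for purposes of definition. The digit 1
% is changed last, because the "11" in each assignment must still
% be read as a number.
\begingroup\catcode`\2=11\catcode`\3=11\catcode`\1=11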
\global\def\h1{\bigbreak\noindent\begingroup\bf}
\global\def\/h1{\endgroup\medskip\noindent\ignorespaces}
\global\def\h2{\medbreak\noindent\begingroup\sc}
\global\def\/h2{\endgroup\smallskip\noindent\ignorespaces}
\global\def\h3{\smallbreak\noindent\begingroup\sl}
\global\def\/h3{\endgroup\par\noindent\ignorespaces}
\endgroup
\def\dl{\unorderedlist}
\def\/dl{\endunorderedlist}
\def\dt{\li\it}
\def\dd{\item{}\rm}
\def\a #1{\footnote{#1}}
\def\/a{}
\def\entr{\item{$\bullet$}}
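
% The same pattern extends to other tags. A sketch (not in the original
% tagset): a bulleted list <ul>...<li>...</ul> can reuse Eplain's list
% macros. <li> needs no definition of its own, since \csname li\endcsname
% already names Eplain's \li.
\def\ul{\unorderedlist}
\def\/ul{\endunorderedlist}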

% Make the less-than (opentag) character active, and establish
% two controls to let the user turn on tag presence and formatting
% in the output. Default is no tags and no formatting: this will
% output pages of plain typewriter text. Saying \showtagstrue
% will include the tags in the output; saying \formattrue will
% perform the formatting defined above. Either or both can be 
% used, but must be inserted where shown below, before the \input.
\catcode`\<=\active
\newif\ifshowtags\newif\ifformat

% Define the main routine to handle a tag
\def<#1>{\parse{#1}\ifnum\slash=1\ifshowtags\endtag{#1}\fi
                                 \ifformat\csname#1\endcsname\fi
                   \else\ifformat\csname#1\endcsname\fi
                        \ifshowtags\starttag{#1}\fi\fi}
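
% One possible way round the multi-word tag problem noted in the header
% (a sketch only, not wired into the routine above): split the tag text
% at its first space. \gettagname, \dogettagname, \endtagname, \tagname
% and \tagattrs are hypothetical names, not part of the original file.
% After \gettagname{a name=0 h=test.doc}, \tagname holds "a" and
% \tagattrs holds the rest (plus one trailing space from the sentinel);
% for a tag with no attributes \tagattrs is empty. The main routine could
% then say \csname\tagname\endcsname instead of \csname#1\endcsname.
% A slash inside an attribute value would still confuse \parse.
\def\gettagname#1{\dogettagname#1 \endtagname}
\def\dogettagname#1 #2\endtagname{\def\tagname{#1}\def\tagattrs{#2}}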

% Set up some registers to handle the boxing of tags for output
\newbox\tagbox\newdimen\tagwidth\newdimen\boxwidth
\def\hlinefill{\leaders\hrule height.2pt\hfill}

% Define what a starttag looks like
\def\starttag#1{\setbox\tagbox=\hbox{{\stt#1}}%
\tagwidth=\wd\tagbox\advance\tagwidth by2pt%
\boxwidth=\tagwidth\advance\boxwidth by4pt%
\leavevmode\lower2.5pt\hbox{\vrule width.2pt\vbox{\hsize=\boxwidth\parindent=0pt\offinterlineskip%
\line{\hbox to\tagwidth{\hlinefill}\hfil}%
\line{\hskip2pt\box\tagbox\kern-.5pt$\rangle$\hfil}%
\line{\hbox to\tagwidth{\hlinefill}\hfil}}}}

% Define what an endtag looks like
\def\endtag#1{\setbox\tagbox=\hbox{{\stt#1}}%
\tagwidth=\wd\tagbox\advance\tagwidth by2pt%
\boxwidth=\tagwidth\advance\boxwidth by4pt%
\leavevmode\lower2.5pt\hbox{\vbox{\hsize=\boxwidth\parindent=0pt\offinterlineskip%
\line{\hfil\hbox to\tagwidth{\hlinefill}}%
\line{\hfil$\langle$\kern-1pt\box\tagbox\hskip2pt}%
\line{\hfil\hbox to\tagwidth{\hlinefill}}}\vrule width.2pt}}

% Define some of the simpler entities
\def\aacute{\'a}
\def\eacute{\'e}
\def\iacute{\'{\i}}
\def\oacute{\'o}
\def\uacute{\'u}
\def\ocus{\&}
\def\amp{\&}
\def\nodoti{\i}
\def\aelig{\ae}
\def\mdash{---}

% Turn on the recognition of the ampersand so entities become active
\catcode`\&=\active
\def&#1;{\csname#1\endcsname}
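
% A sketch of a more defensive entity handler (\entity is a hypothetical
% name, not part of the original file). \csname quietly turns an unknown
% name into \relax, so an undefined entity simply vanishes from the
% output. This version notes it on the terminal and prints the name in
% brackets instead. To try it, the definition above could call
% \entity{#1} in place of \csname#1\endcsname.
\def\entity#1{\expandafter\ifx\csname#1\endcsname\relax
  \message{[unknown entity #1]}{\stt[#1]}%
  \else\csname#1\endcsname\fi}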

% Slip in recognition of a few of TeX's special characters
% The % sign itself is done only later, immediately before
% inputting the SGML instance, so that we can continue using
% comments until then.
\catcode`\$=\active\def${\$}
\catcode`\#=\active\def#{\#}

% Uncomment your choice of options here
\showtagstrue
\formattrue

% Make some assumptions about the style of output, based on the above:
\ifshowtags\raggedright\else\fi
\ifformat\else\ttraggedright\fi
\tolerance=7500
% And define the double-quote (") as active so typewriter-style
% quotes come out as open-and-closed in flip-flop manner. Bad style
% to use them in SGML anyway, <quote>...</quote> is better :-)
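% \newcount is \outer in plain TeX, so it stays outside the conditional.
% The space after "by1" ends the number before \ifodd is expanded, so
% the test sees the updated count and the first quote comes out open.
\newcount\qcount
\ifformat\catcode`\"=\active
\def"{\global\advance\qcount by1 \ifodd\qcount``\else''\fi}\fi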

% Input your SGML instance here, after the comment character 
% is redefined (no more comments from here on...)
\catcode`\%=\active\def%{\%}

\input /info/curia/Chron_Scot.html

\bye

----------------------------------------------------------


