This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: HTML/QuickCheck,  Next: HTML/RefMunger,  Prev: HTML/PrettyPrinter,  Up: Module List

a simple and fast HTML syntax checking package for perl 4 and perl 5
********************************************************************

NAME
====

   HTMLQuickCheck.pm - a simple and fast HTML syntax checking package for
perl 4 and perl 5

SYNOPSIS          require 'HTMLQuickCheck.pm';
==============================================

     &HTML'QuickCheck'OK($html_text) || die "Bad HTML: $HTML'QuickCheck'Error";

     or for perl 5:
     HTML::QuickCheck::OK($html_text) ||
             die "Bad HTML: $HTML::QuickCheck::Error";

DESCRIPTION
===========

   The objective of the package is to provide a fast and essential HTML
check (esp. for CGI scripts where response time is important) to prevent a
piece of user input HTML code from messing up the rest of a file, i.e., to
minimize and localize any possible damage created by including a piece of
user input HTML text in a dynamic document.

   HTMLQuickCheck checks for unmatched < and >, unmatched tags and improper
nesting, which could ruin the rest of the document.  Attributes and
elements with optional end tags are not checked, as they should not cause
disasters with any decent browsers (they should ignore any unrecognized
tags and attributes according to the standard).  A piece of HTML that
passes HTMLQuickCheck may not necessarily be valid HTML, but it would be
very unlikely to screw others but itself. A valid piece of HTML that
doesn't pass the HTMLQuickCheck is however very likely to screw many
browsers(which are obviously broken in terms of strict conformance).

   HTMLQuickCheck currently supports HTML 1.0, 2.x (draft), 3.0 (draft) and
netscape extensions (1.1).

EXAMPLE
=======

     htmlchk, a simple html checker:

     #!/usr/local/bin/perl
     require 'HTMLQuickCheck.pm';
     undef $/;
     print &HTML'QuickCheck'OK(<>) ? "HTML OK\n" :
             "Bad HTML:\n", $HTML'QuickCheck'Error;
     __END__

     Usage:
     htmlchk [html_file]

AUTHOR
======

   Luke Y. Lu <ylu@mail.utexas.edu>

SEE ALSO
========

   HTML docs at <URL:http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html>;
HTML validation service at <URL:http://www.halsoft.com/html/>; perlSGML
package at <URL:http://www.oac.uci.edu/indiv/ehood/perlSGML.html>; weblint
at <URL:http://www.khoros.unm.edu/staff/neilb/weblint.html>

BUGS
====

   Please report them to the author.


File: pm.info,  Node: HTML/RefMunger,  Next: HTML/SimpleLinkExtor,  Prev: HTML/QuickCheck,  Up: Module List

module to mangle HREF links within HTML files
*********************************************

NAME
====

   HTML::RefMunger - module to mangle HREF links within HTML files

SYNOPSIS
========

     use HTML::RefMunger;
     refmunger( [options] );

DESCRIPTION
===========

ARGUMENTS
=========

   HTML::RefMunger takes the following arguments:

help
          --help

     Displays the usage message.

infile
          --infile=name

     Specify the pod file to convert.  Input is taken from STDIN if no
     infile is specified.

outdir
          --outdir=name

     Specify the output directory to stash the munged HTML files in.
     Defaults to "." ( the current working directory )

outfile
          --convention=[UNIX,MSDOS,MacOS]

     Specify the filename remapping convention to use. Current supported
     formats are UNIX ( 14-character, such as Xenix ), MSDOS ( 8.3 format
     ) and MacOS ( 32 character ).

verbose
          --verbose

     Display progress messages.

EXAMPLE
=======

     refmunger( "refmunger", "--infile=foo.html",
                "--convention=MacOS" );

AUTHOR
======

   Alligator Descartes <descarte@arcana.co.uk>

BUGS
====

LIMITATIONS
===========

   * Passing directories as the -infile current doesn't work. It should
     recursively translate everything in that directory downwards, but it
     doesn't.

   * There should be some sort of funky file renaming thing that happens to
     referenced links, I guess. At the moment, you also need to translate
     those links *via* `HTML::RefMunger' to generate the correct name.

SEE ALSO
========

   `refmunger' in this node

COPYRIGHT
=========

   This program is distributed under the Artistic License.


File: pm.info,  Node: HTML/SimpleLinkExtor,  Next: HTML/SimpleParse,  Prev: HTML/RefMunger,  Up: Module List

SYNOPSIS
========

     use HTML::SimpleLinkExtor;
     
     my $extor = HTML::SimpleLinkExtor->new();
     $extor->parse_file($filename);
     #--or--
     $extor->parse($html);
     
     #extract all of the links
     @all_links   = $extor->links;

     #extract the img links
     @img_srcs    = $extor->img;

     #extract the frame links
     @frame_srcs  = $extor->frame;

     #extract the hrefs
     @area_hrefs  = $extor->area;
     @a_hrefs     = $extor->a;
     @base_hrefs  = $extor->base;
     @hrefs       = $extor->href;

     #extract the body background link
     @body_bg     = $extor->body;
     @background  = $extor->background;

DESCRIPTION
===========

   This is a simple HTML link extractor designed for the person who does
not want to deal with the intricacies of HTML::Parser or the
de-referencing needed to get links out of HTML::LinkExtor.

   You can extract all the links or some of the links (based on the HTML
tag name or attribute name). If a <BASE HREF> tag is found, all of the
relative URLs will be resolved according to that reference.

   This module is simply a subclass around HTML::LinkExtor.

$extor = HTML::SimpleLinkExtor->new()
     Create the link extractor object.

$extor = HTML::SimpleLinkExtor->new($base)
     Create the link extractor object and resolve the relative URLs
     accoridng to the supplied base URL. The supplied base URL overrides
     any other base URL found in the HTML.

$extor = HTML::SimpleLinkExtor->new(")
     Create the link extractor object and do not resolve relative links.

$extor->parse_file( $filename )
     Parse the file for links.

$extor->parse( $data )
     Parse the HTML in $data.

$extor->links
     Return a list of the links.

$extor->img
     Return a list of the links from all the SRC attributes of the IMG.

$extor->frame
     Return a list of all the links from all the SRC attributes of the
     FRAME.

$extor->src
     Return a list of the links from all the SRC attributes of any tag.

$extor->a
     Return a list of the links from all the HREF attributes of the A tags.

$extor->area
     Return a list of the links from all the HREF attributes of the AREA
     tags.

$extor->base
     Return a list of the links from all the HREF attributes of the BASE
     tags.  There should only be one.

$extor->href
     Return a list of the links from all the HREF attributes of any tag.

$extor->body, $extor->background
     Return the link from the BODY tag's BACKGROUND attribute.

TO DO
=====

   This module doesn't handle all of the HTML tags that might have links.
If someone wants those, I'll add them.

AUTHOR
======

   brian d foy <comdog@panix.com>


File: pm.info,  Node: HTML/SimpleParse,  Next: HTML/StickyForms,  Prev: HTML/SimpleLinkExtor,  Up: Module List

a bare-bones HTML parser
************************

NAME
====

   HTML::SimpleParse - a bare-bones HTML parser

SYNOPSIS
========

     use HTML::SimpleParse;

     # Parse the text into a simple tree
     my $p = new HTML::SimpleParse( $html_text );
     $p->output;                 # Output the HTML verbatim
     
     $p->text( $new_text );      # Give it some new HTML to chew on
     $p->parse                   # Parse the new HTML
     $p->output;

     my %attrs = HTML::SimpleParse->parse_args('A="xx" B=3');
     # %attrs is now ('A' => 'xx', 'B' => '3')

DESCRIPTION
===========

   This module is a simple HTML parser.  It is similar in concept to
HTML::Parser, but it differs in a couple of important ways.

   First, HTML::Parser knows which tags can contain other tags, which
start tags have corresponding end tags, which tags can exist only in the
<HEAD> portion of the document, and so forth.  HTML::SimpleParse does not
know any of these things.  It just finds tags and text in the HTML you
give it, it does not care about the specific content of these tags (though
it does distiguish between different _types_ of tags, such as comments,
starting tags like <b>, ending tags like </b>, and so on).

   Second, HTML::SimpleParse does not create a hierarchical tree of HTML
content, but rather a simple linear list.  It does not pay any attention
to balancing start tags with corresponding end tags, or which pairs of
tags are inside other pairs of tags.

   Because of these characteristics, you can make a very effective HTML
filter by sub-classing HTML::SimpleParse.  For example, to remove all
comments from HTML:

     package NoComment;
     use HTML::SimpleParse;
     @ISA = qw(HTML::SimpleParse);
     sub output_comment {}
     
     package main;
     NoComment->new($some_html)->output;

Methods
-------

   * new
          $p = new HTML::SimpleParse( $some_html );

     Creates a new HTML::SimpleParse object.  Optionally takes one
     argument, a string containing some HTML with which to initialize the
     object.  If you give it a non-empty string, the HTML will be parsed
     into a tree and ready for outputting.

     Can also take a list of attributes, such as

          $p = new HTML::SimpleParse( $some_html, 'fix_case' => -1);

     See the `parse_args()' method below for an explanation of this
     attribute.

   * text
          $text = $p->text;
          $p->text( $new_text );

     Get or set the contents of the HTML to be parsed.

   * tree
          foreach ($p->tree) { ... }

     Returns a list of all the nodes in the tree, in case you want to step
     through them manually or something.  Each node in the tree is an
     anonymous hash with (at least) three data members, $node->{type} (is
     this a comment, a start tag, an end tag, etc.), $node->{content} (all
     the text between the angle brackets, verbatim), and $node->{offset}
     (number of bytes from the beginning of the string).

     The possible values of $node->{type} are text, `starttag', endtag,
     `ssi', and `markup'.

   * parse
          $p->parse;

     Once an object has been initialized with some text, call $p->parse and
     a tree will be created.  After the tree is created, you can call
     $p->output.  If you feed some text to the new() method, parse will be
     called automatically during your object's construction.

   * parse_args
          %hash = $p->parse_args( $arg_string );

     This routine is handy for parsing the contents of an HTML tag into
     key=value pairs.  For instance:

          $text = 'type=checkbox checked name=flavor value="chocolate or strawberry"';
          %hash = $p->parse_args( $text );
          # %hash is ( TYPE=>'checkbox', CHECKED=>undef, NAME=>'flavor',
          #            VALUE=>'chocolate or strawberry' )

     Note that the position of the last m//g search on the string (the
     value returned by Perl's pos() function) will be altered by the
     parse_args function, so make sure you take that into account if (in
     the above example) you do `$text =~ m/something/g'.

     The parse_args() method can be run as either an object method or as a
     class method, i.e. as either $p->parse_args(...) or
     HTML::SimpleParse->parse_args(...).

     HTML attribute lists are supposed to be case-insensitive with respect
     to attribute names.  To achieve this behavior, parse_args() respects
     the 'fix_case' flag, which can be set either as a package global
     $FIX_CASE, or as a class member datum 'fix_case'.  If set to 0, no
     case conversion is done.  If set to 1, all keys are converted to upper
     case.  If set to -1, all keys are converted to lower case.  The
     default is 1, i.e. all keys are uppercased.

     If an attribute takes no value (like "checked" in the above example)
     then it will still have an entry in the returned hash, but its value
     will be undef.  For example:

          %hash = $p->parse_args('type=checkbox checked name=banana value=""');
          # $hash{CHECKED} is undef, but $hash{VALUE} is ""

     This method actually returns a list (not a hash), so duplicate
     attributes and order will be preserved if you want them to be:

          @hash = $p->parse_args("name=family value=gwen value=mom value=pop");
          # @hash is qw(NAME family VALUE gwen VALUE mom VALUE pop)

   * output
          $p->output;

     This will output the contents of the HTML, passing the real work off
     to the output_text, output_comment, etc. functions.  If you do not
     override any of these methods, this module will output the exact text
     that it parsed into a tree in the first place.

   * get_output
          print $p->get_output

     Similar to $p->output(), but returns its result instead of printing
     it.

   * execute
          foreach ($p->tree) {
             print $p->execute($_);
          }

     Executes a single node in the HTML parse tree.  Useful if you want to
     loop through the nodes and output them individually.

   The following methods do the actual outputting of the various parts of
the HTML.  Override some of them if you want to change the way the HTML is
output.  For instance, to strip comments from the HTML, override the
output_comment method like so:

     # In subclass:
     sub output_comment { }  # Does nothing

   * output_text

   * output_comment

   * output_endtag

   * output_starttag

   * output_markup

   * output_ssi

CAVEATS
=======

   Please do not assume that the interface here is stable.  This is a
first pass, and I'm still trying to incorporate suggestions from the
community.  If you employ this module somewhere, make doubly sure before
upgrading that none of your code breaks when you use the newer version.

BUGS
====

   * Embedded >s are broken

     Won't handle tags with embedded >s in them, like <input name=expr
     value="x > y">.  This will be fixed in a future version, probably by
     using the parse_args method.  Suggestions are welcome.

TO DO
=====

   * extensibility

     Based on a suggestion from Randy Harmon (thanks), I'd like to make it
     easier for subclasses of SimpleParse to pick out other kinds of HTML
     blocks, i.e.  extend the set {text, comment, endtag, starttag,
     markup, ssi} to include more members.  Currently the only easy way to
     do that is by overriding the parse method:

          sub parse {  # In subclass
             my $self = $_[0];
             $self->SUPER::parse(@_);
             foreach ($self->tree) {
                if ($_->{content} =~ m#^a\s+#i) {
                   $_->{type} = 'anchor_start';
                }
             }
          }

          sub output_anchor_start {
             # Whatever you want...
          }

     Alternatively, this feature might be implemented by hanging
     attatchments onto the parsing loop, like this:

          my $parser = new SimpleParse( $html_text );
          $regex = '<(a\s+.*?)>';
          $parser->watch_for( 'anchor_start', $regex );
          
          sub SimpleParse::output_anchor_start {
             # Whatever you want...
          }

     I think I like that idea better.  If you wanted to, you could make a
     subclass with output_anchor_start as one of its methods, and put the
     ->watch_for stuff in the constructor.

   * reading from filehandles

     It would be nice if you could initialize an object by giving it a
     filehandle or filename instead of the text itself.

   * tests

     I need to write a few tests that run under "make test".

AUTHOR
======

   Ken Williams <ken@forum.swarthmore.edu>

COPYRIGHT
=========

   Copyright 1998 Swarthmore College.  All rights reserved.

   This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.


File: pm.info,  Node: HTML/StickyForms,  Next: HTML/Stream,  Prev: HTML/SimpleParse,  Up: Module List

HTML form generation for CGI or mod_perl
****************************************

NAME
====

   HTML::StickyForms - HTML form generation for CGI or mod_perl

SYNOPSIS
========

     # mod_perl example

     use HTML::StickyForms;
     use Apache::Request;

     sub handler{
       my($r)=@_;
       $r=Apache::Request->new($r);
       my $f=HTML::StickyForms->new($r);

     $r->send_http_header;
     print
       "<HTML><BODY><FORM>",
       "Text field:",
       $f->text(name => 'field1', size => 40, value => 'default value'),
       "<BR>Text area:",
       $r->textarea(name => 'field2', cols => 60, rows => 5, value => 'stuff'),
       "<BR>Radio buttons:",
       $r->radio_group(name => 'field3', values => [1,2,3],
         labels => { 1=>'one', 2=>'two', 3=>'three' }, default => 2),
       "<BR>Single checkbox:",
       $r->checkbox(name => 'field4', value => 'xyz', checked => 1),
       "<BR>Checkbox group:",
       $r->checkbox_group(name => 'field5', values => [4,5,6],
         labels => { 4=>'four', 5=>'five', 6=>'six' }, defaults => [5,6]),
       "<BR>Password field:",
       $r->password(name => 'field6', size => 20),
       '<BR><INPUT type="submit" value=" Hit me! ">',
       '</FORM></BODY></HTML>',
      ;
      return OK;
       }

DESCRIPTION
===========

   This module provides a simple interface for generating HTML <FORM>
fields, with default values chosen from the previous form submission. This
module was written with mod_perl in mind, but works equally well with
CGI.pm, including the new 3.x version.

   The module does not provide methods for generating all possible form
fields, only those which benefit from having default values overridden by
the previous form submission. This means that, unlike CGI.pm, there are no
routines for generating <FORM> tags, hidden fields or submit fields. Also
this module's interface is much less flexible than CGI.pm's. This was done
mainly to keep the size and complexity down.

METHODS
-------

HTML::StickyForms->new($req)
     Creates a new form generation object. The single argument can be an
     Apache::Request object, a CGI object (v2.x), a CGI::State object
     (v3.x), or an object of a subclass of any of the above. As a special
     case, if the argument is undef or ", the object created will behave
     as if a request object with no submitted fields was given.

$f->trim_params()
     Removes leading and trailing whitespace from all submitted values.

$f->text(%args)
     Generates an <INPUT> tag, with a type of "text". All arguments are
     used directly to generate attributes for the tag, with the following
     exceptions:

    type
          Defaults to "text"

    name
          The value passed will have all HTML-special characters escaped.

    default
          Specifies the default value of the field if no fields were
          submitted in the request object passed to new(). The value used
          will have all HTML-special characters escaped.

$f->password(%args)
     As text(), but generates a `"password"' type field.

$f->textarea(%args)
     Generates a <TEXTAREA> container. All arguments are used directly to
     generate attributes for the start tag, except for:

    name
          This value will be HTML-escaped.

    default
          Specifies the default contents of the container if no fields
          were submitted.  The value used will be HTML-escaped.

$f->checkbox(%args)
     Generates a single `"checkbox"' type <INPUT> tag. All arguments are
     used directly to generate attributes for the tag, except for:

    name, value
          The values passed will be HTML-escaped.

    checked
          Specifies the default state of the field if no fields were
          submitted.

$f->checkbox_group(%args)
     Generates a group of `"checkbox"' type <INPUT> tags. If called in
     list context, returns a list of tags, otherwise a single string
     containing all tags. All arguments are used directly to generate
     attributes in each tag, except for the following:

    type
          Defaults to `"checkbox"'.

    name
          This value will be HTML-escaped.

    values or value
          An arrayref of values. One tag will be generated for each
          element. The values will be HTML-escaped. Defaults to label keys.

    labels or label
          A hashref of labels. Each tag generated will be followed by the
          label keyed by the value. If no label is present for a given
          value, no label will be generated. Defaults to an empty hashref.

    escape
          If this value is true, the labels will be HTML-escaped.

    defaults or default
          A single value or arrayref of values to be checked if no fields
          were submitted.  Defaults to an empty arrayref.

    linebreak
          If true, each tag/label will be followed by a <BR> tag.

$f->radio_group(%args)
     As `checkbox_group()', but generates `"radio"' type tags.

$f->select(%args)
     Generates a <SELECT> tags. All arguments are used directly to generate
     attributes in the <SELECT> tag, except for the following:

    name
          This value will be HTML-escaped.

    values or value
          An arrayref of values. One <OPTION> tag will be created inside
          the <SELECT> tag for each element. The values will be
          HTML-escaped.  Defaults to label keys.

    labels or label
          A hashref of labels. Each <OPTION> tag generated will contain the
          label keyed by its value. If no label is present for a given
          value, no label will be generated. Defaults to an empty hashref.

    defaults or default
          A single value or arrayref of values to be selected if no fields
          were submitted. Defaults to an empty arrayref.

    multiple
          If a true value is passed, the MULTIPLE attribute is set.

AUTHOR
======

   Peter Haworth <pmh@edison.ioppublishing.com>


File: pm.info,  Node: HTML/Stream,  Next: HTML/Subtext,  Prev: HTML/StickyForms,  Up: Module List

HTML output stream class, and some markup utilities
***************************************************

NAME
====

   HTML::Stream - HTML output stream class, and some markup utilities

SYNOPSIS
========

   Here's small sample of some of the non-OO ways you can use this module:

     use HTML::Stream qw(:funcs);
     
     print html_tag('A', HREF=>$link);
     print html_escape("<<Hello & welcome!>>");

   And some of the OO ways as well:

     use HTML::Stream;
     $HTML = new HTML::Stream \*STDOUT;
     
     # The vanilla interface...
     $HTML->tag('A', HREF=>"$href");
     $HTML->tag('IMG', SRC=>"logo.gif", ALT=>"LOGO");
     $HTML->text($copyright);
     $HTML->tag('_A');
     
     # The chocolate interface...
     $HTML -> A(HREF=>"$href");
     $HTML -> IMG(SRC=>"logo.gif", ALT=>"LOGO");
     $HTML -> t($caption);
     $HTML -> _A;
     
     # The chocolate interface, with whipped cream...
     $HTML -> A(HREF=>"$href")
           -> IMG(SRC=>"logo.gif", ALT=>"LOGO")
           -> t($caption)
           -> _A;

     # The strawberry interface...
     output $HTML [A, HREF=>"$href"],
                  [IMG, SRC=>"logo.gif", ALT=>"LOGO"],
                  $caption,
                  [_A];

DESCRIPTION
===========

   The *HTML::Stream* module provides you with an object-oriented (and
subclassable) way of outputting HTML.  Basically, you open up an "HTML
stream" on an existing filehandle, and then do all of your output to the
HTML stream.  You can intermix HTML-stream-output and
ordinary-print-output, if you like.

   There's even a small built-in subclass, HTML::Stream::Latin1, which can
handle Latin-1 input right out of the box.   But all in good time...

INTRODUCTION (the Neapolitan dessert special)
=============================================

Function interface
------------------

   Let's start out with the simple stuff.  This module provides a
collection of non-OO utility functions for escaping HTML text and
producing HTML tags, like this:

     use HTML::Stream qw(:funcs);        # imports functions from @EXPORT_OK
     
     print html_tag(A, HREF=>$url);
     print '&copy; 1996 by', html_escape($myname), '!';
     print html_tag('/A');

   By the way: that last line could be rewritten as:

     print html_tag(_A);

   And if you need to get a parameter in your tag that doesn't have an
associated value, supply the *undefined* value (not the empty string!):

     print html_tag(TD, NOWRAP=>undef, ALIGN=>'LEFT');
     
          <TD NOWRAP ALIGN=LEFT>
     
     print html_tag(IMG, SRC=>'logo.gif', ALT=>'');
     
          <IMG SRC="logo.gif" ALT="">

   There are also some routines for reversing the process, like:

     $text = "This <i>isn't</i> &quot;fun&quot;...";
     print html_unmarkup($text);
     
          This isn't &quot;fun&quot;...
     
     print html_unescape($text);
     
          This isn't "fun"...

   *Yeah, yeah, yeah*, I hear you cry.  *We've seen this stuff before.*
But wait!  There's more...

OO interface, vanilla
---------------------

   Using the function interface can be tedious... so we also provide an
*"HTML output stream"* class.  Messages to an instance of that class
generally tell that stream to output some HTML.  Here's the above example,
rewritten using HTML streams:

     use HTML::Stream;
     $HTML = new HTML::Stream \*STDOUT;
     
     $HTML->tag(A, HREF=>$url);
     $HTML->ent('copy');
     $HTML->text(" 1996 by $myname!");
     $HTML->tag(_A);

   As you've probably guessed:

     text()   Outputs some text, which will be HTML-escaped.
     
     tag()    Outputs an ordinary tag, like <A>, possibly with parameters.
              The parameters will all be HTML-escaped automatically.
     
     ent()    Outputs an HTML entity, like the &copy; or &lt; .
              You mostly don't need to use it; you can often just put the
              Latin-1 representation of the character in the text().

   You might prefer to use `t()' and `e()' instead of text() and `ent()':
they're absolutely identical, and easier to type:

     $HTML -> tag(A, HREF=>$url);
     $HTML -> e('copy');
     $HTML -> t(" 1996 by $myname!");
     $HTML -> tag(_A);

   Now, it wouldn't be nice to give you those text() and `ent()' shortcuts
without giving you one for tag(), would it?  Of course not...

OO interface, chocolate
-----------------------

   The known HTML tags are even given their own *tag-methods,* compiled on
demand.  The above code could be written even more compactly as:

     $HTML -> A(HREF=>$url);
     $HTML -> e('copy');
     $HTML -> t(" 1996 by $myname!");
     $HTML -> _A;

   As you've probably guessed:

     A(HREF=>$url)   ==   tag(A, HREF=>$url)   ==   <A HREF="/the/url">
     _A              ==   tag(_A)              ==   </A>

   All of the autoloaded "tag-methods" use the tagname in *all-uppercase*.
A `"_"' prefix on any tag-method means that an end-tag is desired.  The
`"_"' was chosen for several reasons: (1) it's short and easy to type, (2)
it doesn't produce much visual clutter to look at, (3) `_TAG' looks a
little like `/TAG' because of the straight line.

   * *I know, I know... it looks like a private method.  You get used to
     it.  Really.*

   I should stress that this module will only auto-create tag methods for
*known* HTML tags.  So you're protected from typos like this (which will
cause a fatal exception at run-time):

     $HTML -> IMGG(SRC=>$src);

   (You're not yet protected from illegal tag parameters, but it's a start,
ain't it?)

   If you need to make a tag known (sorry, but this is currently a global
operation, and not stream-specific), do this:

     accept_tag HTML::Stream 'MARQUEE';       # for you MSIE fans...

   *Note: there is no corresponding "reject_tag".*  I thought and thought
about it, and could not convince myself that such a method would do
anything more useful than cause other people's modules to suddenly stop
working because some bozo function decided to reject the `FONT' tag.

OO interface, with whipped cream
--------------------------------

   In the grand tradition of C++, output method chaining is supported in
both the Vanilla Interface and the Chocolate Interface.  So you can (and
probably should) write the above code as:

     $HTML -> A(HREF=>$url)
           -> e('copy') -> t(" 1996 by $myname!")
           -> _A;

   *But wait!  Neapolitan ice cream has one more flavor...*

OO interface, strawberry
------------------------

   I was jealous of the compact syntax of HTML::AsSubs, but I didn't want
to worry about clogging the namespace with a lot of functions like p(),
a(), etc. (especially when markup-functions like tr() conflict with
existing Perl functions).  So I came up with this:

     output $HTML [A, HREF=>$url], "Here's my $caption", [_A];

   Conceptually, arrayrefs are sent to `html_tag()', and strings to
`html_escape()'.

ADVANCED TOPICS
===============

Auto-formatting and inserting newlines
--------------------------------------

   *Auto-formatting* is the name I give to the Chocolate Interface feature
whereby newlines (and maybe, in the future, other things) are inserted
before or after the tags you output in order to make your HTML more
readable.  So, by default, this:

     $HTML -> HTML
           -> HEAD
           -> TITLE -> t("Hello!") -> _TITLE
           -> _HEAD
           -> BODY(BGCOLOR=>'#808080');

   Actually produces this:

     <HTML><HTML>
     <HEAD>
     <TITLE>Hello!</TITLE>
     </HEAD>
     <BODY BGCOLOR="#808080">

   *To turn off autoformatting altogether* on a given HTML::Stream object,
use the `auto_format()' method:

     $HTML->auto_format(0);        # stop autoformatting!

   *To change whether a newline is automatically output* before/after the
begin/end form of a tag at a global level, use `set_tag()':

     HTML::Stream->set_tag('B', Newlines=>15);   # 15 means "\n<B>\n \n</B>\n"
     HTML::Stream->set_tag('I', Newlines=>7);    # 7 means  "\n<I>\n \n</I>  "

   *To change whether a newline is automatically output* before/after the
begin/end form of a tag *for a given stream* level, give the stream its
own private "tag info" table, and then use `set_tag()':

     $HTML->private_tags;
     $HTML->set_tag('B', Newlines=>0);     # won't affect anyone else!

   *To output newlines explicitly*, just use the special nl method in the
Chocolate Interface:

     $HTML->nl;     # one newline
     $HTML->nl(6);  # six newlines

   I am sometimes asked, "why don't you put more newlines in
automatically?"  Well, mostly because...

   * Sometimes you'll be outputting stuff inside a `PRE' environment.

   * Sometimes you really do want to jam things (like images, or table
     cell delimiters and the things they contain) right up against each
     other.

   So I've stuck to outputting newlines in places where it's most likely
to be harmless.

Entities
--------

   As shown above, You can use the `ent()' (or `e()') method to output an
entity:

     $HTML->t('Copyright ')->e('copy')->t(' 1996 by Me!');

   But this can be a pain, particularly for generating output with
non-ASCII characters:

     $HTML -> t('Copyright ')
           -> e('copy')
           -> t(' 1996 by Fran') -> e('ccedil') -> t('ois, Inc.!');

   Granted, Europeans can always type the 8-bit characters directly in
their Perl code, and just have this:

     $HTML -> t("Copyright \251 1996 by Fran\347ois, Inc.!');

   But folks without 8-bit text editors can find this kind of output
cumbersome to generate.  Sooooooooo...

Auto-escaping: changing the way text is escaped
-----------------------------------------------

   *Auto-escaping* is the name I give to the act of taking an "unsafe"
string (one with ">", "&", etc.), and magically outputting "safe" HTML.

   The default "auto-escape" behavior of an HTML stream can be a drag if
you've got a lot character entities that you want to output, or if you're
using the Latin-1 character set, or some other input encoding.
Fortunately, you can use the `auto_escape()' method to change the way a
particular HTML::Stream works at any time.

   First, here's a couple of special invocations:

     $HTML->auto_escape('ALL');      # Default; escapes [<>"&] and 8-bit chars.
     $HTML->auto_escape('LATIN_1');  # Like ALL, but uses Latin-1 entities
                                     #   instead of decimal equivalents.
     $HTML->auto_escape('NON_ENT');  # Like ALL, but leaves "&" alone.

   You can also install your own auto-escape function (note that you might
very well want to install it for just a little bit only, and then
de-install it):

     sub my_auto_escape {
         my $text = shift;
     	HTML::Entities::encode($text);     # start with default
         $text =~ s/\(c\)/&copy;/ig;        # (C) becomes copyright
         $text =~ s/\\,(c)/\&$1cedil;/ig;   # \,c becomes a cedilla
      	$text;
     }
     
     # Start using my auto-escape:
     my $old_esc = $HTML->auto_escape(\&my_auto_escape);
     
     # Output some stuff:
     $HTML-> IMG(SRC=>'logo.gif', ALT=>'Fran\,cois, Inc');
     output $HTML 'Copyright (C) 1996 by Fran\,cois, Inc.!';
     
     # Stop using my auto-escape:
     $HTML->auto_escape($old_esc);

   If you find yourself in a situation where you're doing this a lot, a
better way is to create a *subclass* of HTML::Stream which installs your
custom function when constructed.  For an example, see the
HTML::Stream::Latin1 subclass in this module.

Outputting HTML to things besides filehandles
---------------------------------------------

   As of Revision 1.21, you no longer need to supply new() with a
filehandle: *any object that responds to a print() method will do*.  Of
course, this includes blessed FileHandles, and IO::Handles.

   If you supply a GLOB reference (like `\*STDOUT') or a string (like
`"Module::FH"'), HTML::Stream will automatically create an invisible
object for talking to that filehandle (I don't dare bless it into a
FileHandle, since the underlying descriptor would get closed when the
HTML::Stream is destroyed, and you might not want that).

   You say you want to print to a string?  For kicks and giggles, try this:

     package StringHandle;
     sub new {
     	my $self = '';
     	bless \$self, shift;
     }
     sub print {
         my $self = shift;
         $$self .= join('', @_);
     }
     
     
     package main;
     use HTML::Stream;
     
     my $SH = new StringHandle;
     my $HTML = new HTML::Stream $SH;
     $HTML -> H1 -> t("Hello & <<welcome>>!") -> _H1;
     print "PRINTED STRING: ", $$SH, "\n";

Subclassing
-----------

   This is where you can make your application-specific HTML-generating
code *much* easier to look at.  Consider this:

     package MY::HTML;
     @ISA = qw(HTML::Stream);
     
     sub Aside {
     	$_[0] -> FONT(SIZE=>-1) -> I;
     }
     sub _Aside {
     	$_[0] -> _I -> _FONT;
     }

   Now, you can do this:

     my $HTML = new MY::HTML \*STDOUT;
     
     $HTML -> Aside
           -> t("Don't drink the milk, it's spoiled... pass it on...")
           -> _Aside;

   If you're defining these markup-like, chocolate-interface-style
functions, I recommend using mixed case with a leading capital.  You
probably shouldn't use all-uppercase, since that's what this module uses
for real HTML tags.

PUBLIC INTERFACE
================

Functions
---------

html_escape TEXT
     Given a TEXT string, turn the text into valid HTML by escaping
     "unsafe" characters.  Currently, the "unsafe" characters are 8-bit
     characters plus:

          <  >  =  &

     Note: provided for convenience and backwards-compatibility only.  You
     may want to use the more-powerful *HTML::Entities::encode* function
     instead.

html_tag TAG [, PARAM=>VALUE, ...]
     Return the text for a given TAG, possibly with parameters.  As an
     efficiency hack, only the values are HTML-escaped currently: it is
     assumed that the tag and parameters will already be safe.

     For convenience and readability, you can say `_A' instead of `"/A"'
     for the first tag, if you're into barewords.

html_unescape TEXT
     Remove angle-tag markup, and convert the standard ampersand-escapes
     (lt, gt, `amp', quot, and `#ddd') into ASCII characters.

     Note: provided for convenience and backwards-compatibility only.  You
     may want to use the more-powerful *HTML::Entities::decode* function
     instead: unlike this function, it can collapse entities like copy and
     `ccedil' into their Latin-1 byte values.

html_unmarkup TEXT
     Remove angle-tag markup from TEXT, but do not convert
     ampersand-escapes.  Cheesy, but theoretically useful if you want to,
     say, incorporate externally-provided HTML into a page you're
     generating, and are worried that the HTML might contain undesirable
     markup.

Vanilla
-------

new [PRINTABLE]
     *Class method.* Create a new HTML output stream.

     The PRINTABLE may be a FileHandle, a glob reference, or any object
     that responds to a print() message.  If no PRINTABLE is given, does a
     select() and uses that.

auto_escape [NAME|SUBREF]
     *Instance method.* Set the auto-escape function for this HTML stream.

     If the argument is a subroutine reference SUBREF, then that subroutine
     will be used.  Declare such subroutines like this:

          sub my_escape {
          	my $text = shift;     # it's passed in the first argument
              ...
              $text;
          }

     If a textual NAME is given, then one of the appropriate built-in
     functions is used.  Possible values are:

    ALL
          Default for HTML::Stream objects.  This escapes angle brackets,
          ampersands, double-quotes, and 8-bit characters.  8-bit
          characters are escaped using decimal entity codes (like `#123').

    LATIN_1
          Like `"ALL"', but uses Latin-1 entity names (like `ccedil')
          instead of decimal entity codes to escape characters.  This
          makes the HTML more readable but it is currently not advised, as
          "older" browsers (like Netscape 2.0) do not recognize many of
          the ISO-8859-1 entity names (like `deg').

          Warning: If you specify this option, you'll find that it attempts
          to "require" *HTML::Entities* at run time.  That's because I
          didn't want to force you to have that module just to use the
          rest of HTML::Stream.  To pick up problems at compile time, you
          are advised to say:

               use HTML::Stream;
               use HTML::Entities;

          in your source code.

    NON_ENT
          Like `"ALL"', except that ampersands (&) are not escaped.  This
          allows you to use &-entities in your text strings, while having
          everything else safely escaped:

               output $HTML "If A is an acute angle, then A > 90&deg;";

     Returns the previously-installed function, in the manner of select().
     No arguments just returns the currently-installed function.

auto_format ONOFF
     *Instance method.* Set the auto-formatting characteristics for this
     HTML stream.  Currently, all you can do is supply a single defined
     boolean argument, which turns auto-formatting ON (1) or OFF (0).  The
     self object is returned.

     Please use no other values; they are reserved for future use.

comment COMMENT
     *Instance method.* Output an HTML comment.  As of 1.29, a newline is
     automatically appended.

ent ENTITY
     *Instance method.* Output an HTML entity.  For example, here's how
     you'd output a non-breaking space:

          $html->ent('nbsp');

     You may abbreviate this method name as e:

          $html->e('nbsp');

     Warning: this function assumes that the entity argument is legal.

io
     Return the underlying output handle for this HTML stream.  All you
     can depend upon is that it is some kind of object which responds to a
     print() message:

          $HTML->io->print("This is not auto-escaped or nuthin!");

nl [COUNT]
     *Instance method.* Output COUNT newlines.  If undefined, COUNT
     defaults to 1.

tag TAGNAME [, PARAM=>VALUE, ...]
     *Instance method.* Output a tag.  Returns the self object, to allow
     method chaining.  You can say `_A' instead of `"/A"', if you're into
     barewords.

text TEXT...
     *Instance method.* Output some text.  You may abbreviate this method
     name as t:

          $html->t('Hi there, ', $yournamehere, '!');

     Returns the self object, to allow method chaining.

text_nbsp TEXT...
     *Instance method.* Output some text, but with all spaces output as
     non-breaking-space characters:

          $html->t("To list your home directory, type: ")
               ->text_nbsp("ls -l ~yourname.")

     Returns the self object, to allow method chaining.

Strawberry
----------

output ITEM,...,ITEM
     *Instance method.* Go through the items.  If an item is an arrayref,
     treat it like the array argument to html_tag() and output the result.
     If an item is a text string, escape the text and output the result.
     Like this:

          output $HTML [A, HREF=>$url], "Here's my $caption!", [_A];

Chocolate
---------

accept_tag TAG
     *Class method.* Declares that the tag is to be accepted as valid HTML
     (if it isn't already).  For example, this...

          # Make sure methods MARQUEE and _MARQUEE are compiled on demand:
          HTML::Stream->accept_tag('MARQUEE');

     ...gives the Chocolate Interface permission to create (via AUTOLOAD)
     definitions for the MARQUEE and _MARQUEE methods, so you can then say:

          $HTML -> MARQUEE -> t("Hi!") -> _MARQUEE;

     If you want to set the default attribute of the tag as well, you can
     do so via the set_tag() method instead; it will effectively do an
     accept_tag() as well.

          # Make sure methods MARQUEE and _MARQUEE are compiled on demand,
          #   *and*, set the characteristics of that tag.
          HTML::Stream->set_tag('MARQUEE', Newlines=>9);

private_tags
     *Instance method.* Normally, HTML streams use a reference to a global
     table of tag information to determine how to do such things as
     auto-formatting, and modifications made to that table by set_tag will
     affect everyone.

     However, if you want an HTML stream to have a private copy of that
     table to munge with, just send it this message after creating it.
     Like this:

          my $HTML = new HTML::Stream \*STDOUT;
          $HTML->private_tags;

     Then, you can say stuff like:

          $HTML->set_tag('PRE',   Newlines=>0);
          $HTML->set_tag('BLINK', Newlines=>9);

     And it won't affect anyone else's *auto-formatting* (although they
     will possibly be able to use the BLINK tag method without a fatal
     exception `:-(' ).

     Returns the self object.

set_tag TAG, [TAGINFO...]
     *Class/instance method.* Accept the given TAG in the Chocolate
     Interface, and (if TAGINFO is given) alter its characteristics when
     being output.

        * *If invoked as a class method,* this alters the "master tag
          table", and allows a new tag to be supported via an autoloaded
          method:

               HTML::Stream->set_tag('MARQUEE', Newlines=>9);

          Once you do this, all HTML streams you open from then on will
          allow that tag to be output in the chocolate interface.

        * *If invoked as an instance method,* this alters the "tag table"
          referenced by that HTML stream, usually for the purpose of
          affecting things like the auto-formatting on that HTML stream.

          Warning: by default, an HTML stream just references the "master
          tag table" (this makes new() more efficient), so *by default, the
          instance method will behave exactly like the class method.*

               my $HTML = new HTML::Stream \*STDOUT;
               $HTML->set_tag('BLINK', Newlines=>0);  # changes it for others!

          If you want to diddle with one stream's auto-formatting *only,*
          you'll need to give that stream its own private tag table.  Like
          this:

               my $HTML = new HTML::Stream \*STDOUT;
               $HTML->private_tags;
               $HTML->set_tag('BLINK', Newlines=>0);  # doesn't affect other streams

          Note: this will still force an default entry for BLINK in the
          master tag table: otherwise, we'd never know that it was legal
          to AUTOLOAD a BLINK method.   However, it will only alter the
          *characteristics* of the BLINK tag (like auto-formatting) in the
          *object's* tag table.

     The TAGINFO, if given, is a set of key=>value pairs with the following
     possible keys:

    Newlines
          Assumed to be a number which encodes how newlines are to be
          output before/after a tag.   The value is the logical OR (or
          sum) of a set of flags:

               0x01    newline before <TAG>         .<TAG>.     .</TAG>.
               0x02    newline after <TAG>          |     |     |      |
               0x04    newline before </TAG>        1     2     4      8
               0x08    newline after </TAG>

          Hence, to output BLINK environments which are preceded/followed
          by newlines:

               set_tag HTML::Stream 'BLINK', Newlines=>9;

     Returns the self object on success.

tags
     *Class/instance method.* Returns an unsorted list of all tags in the
     class/instance tag table (see set_tag for class/instance method
     differences).

SUBCLASSES
==========

HTML::Stream::Latin1
--------------------

   A small, public package for outputting Latin-1 markup.  Its default
auto-escape function is LATIN_1, which tries to output the mnemonic entity
markup (e.g., `&ccedil;') for ISO-8859-1 characters.

   So using HTML::Stream::Latin1 like this:

     use HTML::Stream;
     
     $HTML = new HTML::Stream::Latin1 \*STDOUT;
     output $HTML "\253A right angle is 90\260, \277No?\273\n";

   Prints this:

     &laquo;A right angle is 90&deg;, &iquest;No?&raquo;

   Instead of what HTML::Stream would print, which is this:

     &#171;A right angle is 90&#176;, &#191;No?&#187;

   Warning: a lot of Latin-1 HTML markup is not recognized by older
browsers (e.g., Netscape 2.0).  Consider using HTML::Stream; it will output
the decimal entities which currently seem to be more "portable".

   Note: using this class "requires" that you have HTML::Entities.

PERFORMANCE
===========

   Slower than I'd like.  Both the output() method and the various "tag"
methods seem to run about 5 times slower than the old
just-hardcode-the-darn stuff approach.  That is, in general, this:

     ### Approach #1...
     tag  $HTML 'A', HREF=>"$href";
     tag  $HTML 'IMG', SRC=>"logo.gif", ALT=>"LOGO";
     text $HTML $caption;
     tag  $HTML '_A';
     text $HTML $a_lot_of_text;

   And this:

     ### Approach #2...
     output $HTML [A, HREF=>"$href"],
     	         [IMG, SRC=>"logo.gif", ALT=>"LOGO"],
     		 $caption,
     		 [_A];
     output $HTML $a_lot_of_text;

   And this:

     ### Approach #3...
     $HTML -> A(HREF=>"$href")
     	  -> IMG(SRC=>"logo.gif", ALT=>"LOGO")
     	  -> t($caption)
     	  -> _A
           -> t($a_lot_of_text);

   Each run about 5x slower than this:

     ### Approach #4...
     print '<A HREF="', html_escape($href), '>',
           '<IMG SRC="logo.gif" ALT="LOGO">',
       	  html_escape($caption),
           '</A>';
     print html_escape($a_lot_of_text);

   Of course, I'd much rather use any of first three *(especially #3)* if
I had to get something done right in a hurry.  Or did you not notice the
typo in approach #4?  `;-)'

   (BTW, thanks to Benchmark:: for allowing me to... er... benchmark
stuff.)

VERSION
=======

   $Id: Stream.pm,v 1.49 2000/08/16 05:15:43 eryq Exp $

CHANGE LOG
==========

Version 1.47   (2000/06/10)
     No real changes to code; just improved documentation.

Version 1.45   (1999/02/09)
     Cleanup for Perl 5.005: removed duplicate typeglob assignments.

Version 1.44   (1998/01/14)
     Win95 install (5.004) now works.  Added SYNOPSIS to POD.

Version 1.41   (1998/01/02)
     Removed $& for efficiency.  *Thanks, Andreas!*

     Added support for OPTION, and default now puts newlines after SELECT
     and /SELECT.  Also altered "TELEM" syntax to put newline after
     end-tags of list element tags (like /OPTION, /LI, etc.).  In theory,
     this change could produce undesireable results for folks who embed
     lists inside of PRE environments... however, that kind of stuff was
     done in the days before TABLEs; also, you can always turn it off if
     you really need to.  *Thanks to John D Groenveld for these patches.*

     Added text_nbsp().  *Thanks to John D Groenveld for the patch.* This
     method may also be invoked as nbsp_text() as in the original patch,
     but that's sort of a private tip-of-the-hat to the patch author, and
     the synonym may go away in the future.

Version 1.37   (1997/02/09)
     No real change; just trying to make CPAN.pm happier.

Version 1.32   (1997/01/12)
     *NEW TOOL for generating Perl code which uses HTML::Stream!* Check
     your toolkit for *html2perlstream*.

     Added built-in support for escaping 8-bit characters.

     Added LATIN_1 auto-escape, which uses HTML::Entities to generate
     mnemonic entities.  This is now the default method for
     HTML::Stream::Latin1.

     Added `auto_format(),' so you can now turn auto-formatting off/on.

     Added `private_tags()', so it is now possible for HTML streams to
     each have their own "private" copy of the %Tags table, for use by
     `set_tag()'.

     Added `set_tag()'.  The tags tables may now be modified dynamically so
     as to change how formatting is done on-the-fly.  This will hopefully
     not compromise the efficiency of the chocolate interface (until now,
     the formatting was compiled into the method itself), and *will* add
     greater flexibility for more-complex programs.

     Added POD documentation for all subroutines in the public interface.

Version 1.29   (1996/12/10)
     Added terminating newline to comment().  *Thanks to John D Groenveld
     for the suggestion and the patch.*

Version 1.27   (1996/12/10)
     Added built-in HTML::Stream::Latin1, which does a very simple encoding
     of all characters above ASCII 127.

     Fixed bug in accept_tag(), where 'my' variable was shadowing argument.
     *Thanks to John D Groenveld for the bug report and the patch.*

Version 1.26   (1996/09/27)
     Start of history.

ACKNOWLEDGEMENTS
================

   Warmest thanks to...

     John Buckman           For suggesting that I write an "html2perlstream",
                            and inspiring me to look at supporting Latin-1.
     Tony Cebzanov          For suggesting that I write an "html2perlstream"
     John D Groenveld       Bug reports, patches, and suggestions
     B. K. Oxley (binkley)  For suggesting the support of "writing to strings"
                            which became the "printable" interface.

AUTHOR
======

   Eryq, `eryq@zeegee.com'.  President, Zero G Inc.
(`http://www.zeegee.com').

   Enjoy.


