This is Info file /home/pdm/tmp/Python-1.5.2p1/Doc/lib/python-lib.info,
produced by Makeinfo version 1.68 from the input file lib.texi.

   July 6, 1999			1.5.2


File: python-lib.info,  Node: sgmllib,  Next: htmllib,  Prev: Internet Data Handling,  Up: Internet Data Handling

Simple SGML parser
==================

   Only as much of an SGML parser as needed to parse HTML.

   This module defines a class `SGMLParser' which serves as the basis
for parsing text files formatted in SGML (Standard Generalized Mark-up
Language).  In fact, it does not provide a full SGML parser -- it only
parses SGML insofar as it is used by HTML, and the module only exists
as a base for the `htmllib' module.

`SGMLParser()'
     The `SGMLParser' class is instantiated without arguments.  The
     parser is hardcoded to recognize the following constructs:

        * Opening and closing tags of the form `<TAG ATTR="VALUE" ...>'
          and `</TAG>', respectively.

        * Numeric character references of the form `&#NAME;'.

        * Entity references of the form `&NAME;'.

        * SGML comments of the form `<!--TEXT-->'.  Note that spaces,
          tabs, and newlines are allowed between the trailing `>' and
          the immediately preceding `--'.

   `SGMLParser' instances have the following interface methods:

`reset()'
     Reset the instance.  Loses all unprocessed data.  This is called
     implicitly at instantiation time.

`setnomoretags()'
     Stop processing tags.  Treat all following input as literal input
     (CDATA).  (This is only provided so the HTML tag `<PLAINTEXT>' can
     be implemented.)

`setliteral()'
     Enter literal mode (CDATA mode).

`feed(data)'
     Feed some text to the parser.  It is processed insofar as it
     consists of complete elements; incomplete data is buffered until
     more data is fed or `close()' is called.

`close()'
     Force processing of all buffered data as if it were followed by an
     end-of-file mark.  This method may be redefined by a derived class
     to define additional processing at the end of the input, but the
     redefined version should always call `SGMLParser.close()'.

`handle_starttag(tag, method, attributes)'
     This method is called to handle start tags for which either a
     `start_TAG()' or `do_TAG()' method has been defined.  The TAG
     argument is the name of the tag converted to lower case, and the
     METHOD argument is the bound method which should be used to
     support semantic interpretation of the start tag.  The ATTRIBUTES
     argument is a list of `(NAME, VALUE)' pairs containing the
     attributes found inside the tag's `<>' brackets.  The NAME has
     been translated to lower case and double quotes and backslashes in
     the VALUE have been interpreted.  For instance, for the tag `<A
     HREF="http://www.cwi.nl/">', this method would be called as
     `handle_starttag('a', METHOD, [('href', 'http://www.cwi.nl/')])',
     where METHOD is the bound `start_a()' or `do_a()' method.  The
     base implementation simply calls METHOD with ATTRIBUTES as the
     only argument.

`handle_endtag(tag, method)'
     This method is called to handle end tags for which an `end_TAG()'
     method has been defined.  The TAG argument is the name of the tag
     converted to lower case, and the METHOD argument is the bound
     method which should be used to support semantic interpretation of
     the end tag.  If no `end_TAG()' method is defined for the closing
     element, this handler is not called.  The base implementation
     simply calls METHOD.

`handle_data(data)'
     This method is called to process arbitrary data.  It is intended
     to be overridden by a derived class; the base class implementation
     does nothing.

`handle_charref(ref)'
     This method is called to process a character reference of the form
     `&#REF;'.  In the base implementation, REF must be a decimal
     number in the range 0-255.  It translates the character to ASCII
     and calls the method `handle_data()' with the character as
     argument.  If REF is invalid or out of range, the method
     `unknown_charref(REF)' is called to handle the error.  A subclass
     must override this method to provide support for named character
     entities.

`handle_entityref(ref)'
     This method is called to process a general entity reference of the
     form `&REF;' where REF is a general entity reference.  It looks
     for REF in the instance (or class) variable `entitydefs' which
     should be a mapping from entity names to corresponding
     translations.  If a translation is found, it calls the method
     `handle_data()' with the translation; otherwise, it calls the
     method `unknown_entityref(REF)'.  The default `entitydefs' defines
     translations for `&amp;', `&apos;', `&gt;', `&lt;', and `&quot;'.
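
     The lookup order described for `handle_charref()' and
     `handle_entityref()' can be sketched as follows.  This is a
     standalone illustration written for Python 3, not the module's
     own code; the class name `RefResolver' and its `data' and
     `unknown' lists are hypothetical and exist only to record what
     each handler would receive.

```python
class RefResolver:
    """Standalone sketch of the reference handling described above."""

    # Mirrors the five default translations listed in the text.
    entitydefs = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}

    def __init__(self):
        self.data = []      # text passed to handle_data()
        self.unknown = []   # references that could not be resolved

    def handle_data(self, data):
        self.data.append(data)

    def handle_charref(self, ref):
        # Decimal only, and only 0-255, as stated for the base class.
        try:
            number = int(ref)
        except ValueError:
            self.unknown_charref(ref)
            return
        if not 0 <= number <= 255:
            self.unknown_charref(ref)
            return
        self.handle_data(chr(number))

    def handle_entityref(self, ref):
        if ref in self.entitydefs:
            self.handle_data(self.entitydefs[ref])
        else:
            self.unknown_entityref(ref)

    def unknown_charref(self, ref):
        # Hypothetical policy for this sketch: remember the raw reference.
        self.unknown.append('&#' + ref + ';')

    def unknown_entityref(self, ref):
        self.unknown.append('&' + ref + ';')


resolver = RefResolver()
resolver.handle_charref('65')       # resolved to a character
resolver.handle_entityref('amp')    # found in entitydefs
resolver.handle_charref('999')      # out of range
resolver.handle_entityref('nbsp')   # not in the default table
```

     A subclass that wants more entities would replace `entitydefs'
     or override the two `unknown_...' hooks.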

`handle_comment(comment)'
     This method is called when a comment is encountered.  The COMMENT
     argument is a string containing the text between the `<!--' and
     `-->' delimiters, but not the delimiters themselves.  For example,
     the comment `<!--text-->' will cause this method to be called with
     the argument `'text''.  The default method does nothing.

`report_unbalanced(tag)'
     This method is called when an end tag is found which does not
     correspond to any open element.

`unknown_starttag(tag, attributes)'
     This method is called to process an unknown start tag.  It is
     intended to be overridden by a derived class; the base class
     implementation does nothing.

`unknown_endtag(tag)'
     This method is called to process an unknown end tag.  It is
     intended to be overridden by a derived class; the base class
     implementation does nothing.

`unknown_charref(ref)'
     This method is called to process unresolvable numeric character
     references.  Refer to `handle_charref()' to determine what is
     handled by default.  It is intended to be overridden by a derived
     class; the base class implementation does nothing.

`unknown_entityref(ref)'
     This method is called to process an unknown entity reference.  It
     is intended to be overridden by a derived class; the base class
     implementation does nothing.

   Apart from overriding or extending the methods listed above, derived
classes may also define methods of the following form to define
processing of specific tags.  Tag names in the input stream are case
independent; the TAG occurring in method names must be in lower case:

`start_TAG(attributes)'
     This method is called to process an opening tag TAG.  It has
     preference over `do_TAG()'.  The ATTRIBUTES argument has the same
     meaning as described for `handle_starttag()' above.

`do_TAG(attributes)'
     This method is called to process an opening tag TAG that does not
     come with a matching closing tag.  The ATTRIBUTES argument has the
     same meaning as described for `handle_starttag()' above.

`end_TAG()'
     This method is called to process a closing tag TAG.

   Note that the parser maintains a stack of open elements for which no
end tag has been found yet.  Only tags processed by `start_TAG()' are
pushed on this stack.  Definition of an `end_TAG()' method is optional
for these tags.  For tags processed by `do_TAG()' or by
`unknown_starttag()', no `end_TAG()' method needs to be defined; if
defined, it will not be used.  If both `start_TAG()' and `do_TAG()'
methods exist for a tag, the `start_TAG()' method takes precedence.
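
   The dispatch and stack rules above can be sketched in a few lines.
This is a simplified, standalone Python 3 illustration and not the
`sgmllib' implementation; the names `TagDispatcher', `open_tag()',
`close_tag()' and `log' are hypothetical.  It assumes that closing an
element also closes anything still open inside it.

```python
class TagDispatcher:
    """Standalone sketch of start_TAG()/do_TAG()/end_TAG() dispatch."""

    def __init__(self):
        self.stack = []   # open elements still waiting for an end tag
        self.log = []     # records what was called, for demonstration

    def open_tag(self, tag, attributes):
        tag = tag.lower()                   # tag names are case independent
        method = getattr(self, 'start_' + tag, None)
        if method is not None:
            self.stack.append(tag)          # only start_TAG() tags are stacked
            method(attributes)
            return
        method = getattr(self, 'do_' + tag, None)
        if method is not None:
            method(attributes)
        else:
            self.unknown_starttag(tag, attributes)

    def close_tag(self, tag):
        tag = tag.lower()
        if tag not in self.stack:
            self.report_unbalanced(tag)
            return
        # Assumption of this sketch: close inner elements first.
        while self.stack:
            current = self.stack.pop()
            method = getattr(self, 'end_' + current, None)
            if method is not None:          # end_TAG() is optional
                method()
            if current == tag:
                break

    def unknown_starttag(self, tag, attributes):
        self.log.append(('unknown', tag, attributes))

    def report_unbalanced(self, tag):
        self.log.append(('unbalanced', tag))


class Demo(TagDispatcher):
    def start_h1(self, attributes):
        self.log.append(('start', 'h1', attributes))

    def end_h1(self):
        self.log.append(('end', 'h1'))

    def do_p(self, attributes):
        self.log.append(('do', 'p', attributes))


demo = Demo()
demo.open_tag('H1', [])
demo.open_tag('P', [('align', 'center')])
demo.close_tag('H1')
demo.close_tag('P')        # never stacked, so reported as unbalanced
demo.open_tag('BLINK', [])
```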


File: python-lib.info,  Node: htmllib,  Next: htmlentitydefs,  Prev: sgmllib,  Up: Internet Data Handling

A parser for HTML documents
===========================

   A parser for HTML documents.

   This module defines a class which can serve as a base for parsing
text files formatted in the HyperText Mark-up Language (HTML).  The
class is not directly concerned with I/O -- it must be provided with
input in string form via a method, and makes calls to methods of a
"formatter" object in order to produce output.  The `HTMLParser' class
is designed to be used as a base class for other classes in order to
add functionality, and allows most of its methods to be extended or
overridden.  In turn, this class is derived from and extends the
`SGMLParser' class defined in module `sgmllib'.  The `HTMLParser'
implementation supports the HTML 2.0 language as described in RFC 1866.
Two implementations of formatter objects are provided in the
`formatter' module; refer to the documentation for that module for
information on the formatter interface.

   The following is a summary of the interface defined by
`sgmllib.SGMLParser':

   * The interface to feed data to an instance is through the `feed()'
     method, which takes a string argument.  This can be called with as
     little or as much text at a time as desired; `p.feed(a);
     p.feed(b)' has the same effect as `p.feed(a+b)'.  When the data
     contains complete HTML tags, these are processed immediately;
     incomplete elements are saved in a buffer.  To force processing of
     all unprocessed data, call the `close()' method.

     For example, to parse the entire contents of a file, use:
          parser.feed(open('myfile.html').read())
          parser.close()

   * The interface to define semantics for HTML tags is very simple:
     derive a class and define methods called `start_TAG()',
     `end_TAG()', or `do_TAG()'.  The parser will call these at
     appropriate moments: `start_TAG()' or `do_TAG()' is called when an
     opening tag of the form `<TAG ...>' is encountered; `end_TAG()' is
     called when a closing tag of the form `</TAG>' is encountered.  If
     an opening tag requires a corresponding closing tag, like `<H1>'
     ... `</H1>', the class should define the `start_TAG()' method; if
     a tag requires no closing tag, like `<P>', the class should define
     the `do_TAG()' method.
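
   The claim that `p.feed(a); p.feed(b)' behaves like `p.feed(a+b)' can
be checked with a short experiment.  The sketch below does not use
`htmllib'; it assumes Python 3 and uses `html.parser.HTMLParser' from
that standard library, a different class with a similar `feed()' and
`close()' interface.  The names `EventCollector' and
`parse_in_chunks()' are hypothetical.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record the tags and character data seen by the parser."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(('start', tag, attrs))

    def handle_endtag(self, tag):
        self.tags.append(('end', tag))

    def handle_data(self, data):
        self.text.append(data)


def parse_in_chunks(chunks):
    parser = EventCollector()
    for chunk in chunks:
        parser.feed(chunk)      # incomplete tags are buffered
    parser.close()              # force processing of what is left
    # Character data may arrive in several pieces, so join it.
    return parser.tags, ''.join(parser.text)


a = '<a hre'
b = 'f="http://www.cwi.nl/">CWI</a>'
split_result = parse_in_chunks([a, b])
whole_result = parse_in_chunks([a + b])
```

   The first chunk ends in the middle of a tag, so nothing is reported
until the second chunk completes it.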

   The module defines a single class:

`HTMLParser(formatter)'
     This is the basic HTML parser class.  It supports all entity names
     required by the HTML 2.0 specification (RFC 1866).  It also defines
     handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.

   See also:

   *Note htmlentitydefs:: Definition of replacement text for HTML 2.0
entities.  *Note sgmllib:: Base class for `HTMLParser'.

* Menu:

* HTMLParser Objects::


File: python-lib.info,  Node: HTMLParser Objects,  Prev: htmllib,  Up: htmllib

HTMLParser Objects
------------------

   In addition to tag methods, the `HTMLParser' class provides some
additional methods and instance variables for use within tag methods.

`formatter'
     This is the formatter instance associated with the parser.

`nofill'
     Boolean flag which should be true when whitespace should not be
     collapsed, or false when it should be.  In general, this should
     only be true when character data is to be treated as
     "preformatted" text, as within a `<PRE>' element.  The default
     value is false.  This affects the operation of `handle_data()' and
     `save_end()'.

`anchor_bgn(href, name, type)'
     This method is called at the start of an anchor region.  The
     arguments correspond to the attributes of the `<A>' tag with the
     same names.  The default implementation maintains a list of
     hyperlinks (defined by the `HREF' attribute for `<A>' tags) within
     the document.  The list of hyperlinks is available as the data
     attribute `anchorlist'.

`anchor_end()'
     This method is called at the end of an anchor region.  The default
     implementation adds a textual footnote marker using an index into
     the list of hyperlinks created by `anchor_bgn()'.

`handle_image(source, alt[, ismap[, align[, width[, height]]]])'
     This method is called to handle images.  The default implementation
     simply passes the ALT value to the `handle_data()' method.

`save_bgn()'
     Begins saving character data in a buffer instead of sending it to
     the formatter object.  Retrieve the stored data via `save_end()'.
     Use of the `save_bgn()' / `save_end()' pair may not be nested.

`save_end()'
     Ends buffering character data and returns all data saved since the
     preceding call to `save_bgn()'.  If the `nofill' flag is false,
     whitespace is collapsed to single spaces.  A call to this method
     without a preceding call to `save_bgn()' will raise a `TypeError'
     exception.
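
     The interplay of `save_bgn()', `save_end()' and `nofill' can be
     sketched as below.  This is a standalone Python 3 illustration,
     not the `htmllib' source; the class name `SaveBuffer' and the
     `output' list standing in for the formatter are hypothetical.
     The sketch also drops leading and trailing whitespace when
     collapsing, which is an assumption.

```python
class SaveBuffer:
    """Standalone sketch of the save_bgn()/save_end() behaviour."""

    def __init__(self):
        self.nofill = False
        self.savedata = None    # None means "not currently saving"
        self.output = []        # stands in for the formatter

    def handle_data(self, data):
        if self.savedata is not None:
            self.savedata += data       # divert into the buffer
        else:
            self.output.append(data)    # normal path

    def save_bgn(self):
        self.savedata = ''

    def save_end(self):
        data = self.savedata
        if data is None:
            raise TypeError('save_end() called without save_bgn()')
        self.savedata = None
        if not self.nofill:
            # Collapse every run of whitespace to a single space.
            data = ' '.join(data.split())
        return data


buf = SaveBuffer()
buf.save_bgn()
buf.handle_data('Simple  SGML\n')
buf.handle_data('   parser')
title = buf.save_end()

buf.nofill = True
buf.save_bgn()
buf.handle_data('keep   this\n')
literal = buf.save_end()

try:
    buf.save_end()              # no matching save_bgn()
    unmatched_error = None
except TypeError as exc:
    unmatched_error = exc
```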


File: python-lib.info,  Node: htmlentitydefs,  Next: xmllib,  Prev: htmllib,  Up: Internet Data Handling

Definitions of HTML general entities
====================================

   Definitions of HTML general entities.

   This section was written by Fred L. Drake, Jr. <fdrake@acm.org>.
This module defines a single dictionary, `entitydefs', which is used by
the `htmllib' module to provide the `entitydefs' member of the
`HTMLParser' class.  The definition provided here contains all the
entities defined by HTML 2.0 that can be handled using simple textual
substitution in the Latin-1 character set (ISO-8859-1).

`entitydefs'
     A dictionary mapping HTML 2.0 entity definitions to their
     replacement text in ISO Latin-1.
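
     A dictionary of this kind is typically used for simple textual
     substitution.  The sketch below assumes Python 3, where a table
     with the same name is importable from `html.entities'; that
     table covers more entities than HTML 2.0.  The function name
     `expand_entities()' is hypothetical, and leaving unknown
     references untouched is a choice made for this example.

```python
import re
from html.entities import entitydefs

# Matches references such as &amp; or &eacute; (name, then semicolon).
_ENTITY = re.compile(r'&([A-Za-z][A-Za-z0-9]*);')


def expand_entities(text, table=entitydefs):
    """Replace each known &NAME; in TEXT by its replacement text."""

    def replace(match):
        name = match.group(1)
        # Unknown names are left as they were written.
        return table.get(name, match.group(0))

    return _ENTITY.sub(replace, text)


expanded = expand_entities('caf&eacute; &amp; cr&egrave;me &bogus;')
```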


File: python-lib.info,  Node: xmllib,  Next: formatter,  Prev: htmlentitydefs,  Up: Internet Data Handling

A parser for XML documents
==========================

   A parser for XML documents.  This module was documented by Sjoerd
Mullender <Sjoerd.Mullender@cwi.nl>.
This section was written by Sjoerd Mullender <Sjoerd.Mullender@cwi.nl>.
*Changed in Python version 1.5.2*

   This module defines a class `XMLParser' which serves as the basis
for parsing text files formatted in XML (Extensible Markup Language).

`XMLParser()'
     The `XMLParser' class must be instantiated without arguments.

   This class provides the following interface methods and instance
variables:

`attributes'
     A mapping of element names to mappings.  The latter mapping maps
     attribute names that are valid for the element to the default
     value of the attribute, or if there is no default to `None'.  The
     default value is the empty dictionary.  This variable is meant to
     be overridden, not extended since the default is shared by all
     instances of `XMLParser'.

`elements'
     A mapping of element names to tuples.  The tuples contain a
     function for handling the start and end tag respectively of the
     element, or `None' if the method `unknown_starttag()' or
     `unknown_endtag()' is to be called.  The default value is the
     empty dictionary.  This variable is meant to be overridden, not
     extended since the default is shared by all instances of
     `XMLParser'.
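
     The warning about overriding rather than extending follows from
     how class attributes are shared in Python.  The sketch below is
     a standalone Python 3 illustration that does not import
     `xmllib'; `BaseParser', `TitleParser', `CarelessParser' and the
     two handler functions are hypothetical names.

```python
class BaseParser:
    elements = {}       # class-level default, shared by every instance


def start_title(attributes):
    pass                # placeholder start tag handler


def end_title():
    pass                # placeholder end tag handler


class TitleParser(BaseParser):
    # Overriding: a fresh dictionary that belongs to this subclass only.
    elements = {'title': (start_title, end_title)}


class CarelessParser(BaseParser):
    def __init__(self):
        # Extending in place: this mutates the dictionary on BaseParser.
        self.elements['title'] = (start_title, end_title)


TitleParser()
after_override = dict(BaseParser.elements)    # still empty

CarelessParser()
after_mutation = dict(BaseParser.elements)    # now polluted
leaked = 'title' in BaseParser().elements
```

     After `CarelessParser()' has run once, every later `BaseParser'
     instance sees the `title' entry, which is the sharing problem
     the text warns about.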

`entitydefs'
     A mapping of entity names to their values.  The default value
     contains definitions for `'lt'', `'gt'', `'amp'', `'quot'', and
     `'apos''.

`reset()'
     Reset the instance.  Loses all unprocessed data.  This is called
     implicitly at instantiation time.

`setnomoretags()'
     Stop processing tags.  Treat all following input as literal input
     (CDATA).

`setliteral()'
     Enter literal mode (CDATA mode).  This mode is automatically exited
     when the close tag matching the last unclosed open tag is
     encountered.

`feed(data)'
     Feed some text to the parser.  It is processed insofar as it
     consists of complete tags; incomplete data is buffered until more
     data is fed or `close()' is called.

`close()'
     Force processing of all buffered data as if it were followed by an
     end-of-file mark.  This method may be redefined by a derived class
     to define additional processing at the end of the input, but the
     redefined version should always call `XMLParser.close()'.

`translate_references(data)'
     Translate all entity and character references in DATA and return
     the translated string.

`handle_xml(encoding, standalone)'
     This method is called when the `<?xml ...?>' tag is processed.
     The arguments are the values of the encoding and standalone
     attributes in the tag.  Both encoding and standalone are optional.
     The values passed to `handle_xml()' default to `None' and the
     string `'no'' respectively.

`handle_doctype(tag, data)'
     This method is called when the `<!DOCTYPE...>' tag is processed.
     The arguments are the name of the root element and the
     uninterpreted contents of the tag, starting after the white space
     after the name of the root element.

`handle_starttag(tag, method, attributes)'
     This method is called to handle start tags for which a start tag
     handler is defined in the instance variable `elements'.  The TAG
     argument is the name of the tag, and the METHOD argument is the
     function (method) which should be used to support semantic
     interpretation of the start tag.  The ATTRIBUTES argument is a
     dictionary of attributes, the key being the NAME and the value
     being the VALUE of the attribute found inside the tag's `<>'
     brackets.  Character and entity references in the VALUE have been
     interpreted.  For instance, for the start tag `<A
     HREF="http://www.cwi.nl/">', this method would be called as
     `handle_starttag('A', self.elements['A'][0], {'HREF':
     'http://www.cwi.nl/'})'.  The base implementation simply calls
     METHOD with ATTRIBUTES as the only argument.

`handle_endtag(tag, method)'
     This method is called to handle end tags for which an end tag
     handler is defined in the instance variable `elements'.  The TAG
     argument is the name of the tag, and the METHOD argument is the
     function (method) which should be used to support semantic
     interpretation of the end tag.  For instance, for the end tag
     `</A>', this method would be called as `handle_endtag('A',
     self.elements['A'][1])'.  The base implementation simply calls
     METHOD.

`handle_data(data)'
     This method is called to process arbitrary data.  It is intended
     to be overridden by a derived class; the base class implementation
     does nothing.

`handle_charref(ref)'
     This method is called to process a character reference of the form
     `&#REF;'.  REF can either be a decimal number, or a hexadecimal
     number when preceded by an `x'.  In the base implementation, REF
     must be a number in the range 0-255.  It translates the character
     to ASCII and calls the method `handle_data()' with the character
     as argument.  If REF is invalid or out of range, the method
     `unknown_charref(REF)' is called to handle the error.  A subclass
     must override this method to provide support for character
     references outside of the ASCII range.
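
     The decimal and hexadecimal forms, and the 0-255 limit, can be
     sketched as a small helper.  This is a standalone Python 3
     illustration, not the `xmllib' code; the function name
     `parse_charref()' is hypothetical, and returning `None' stands
     in for the call to `unknown_charref()'.

```python
def parse_charref(ref):
    """Return the character for REF, or None if REF cannot be handled.

    REF is the text between '&#' and ';'.
    """
    try:
        if ref[:1] == 'x':
            number = int(ref[1:], 16)   # hexadecimal form: &#x41;
        else:
            number = int(ref, 10)       # decimal form: &#65;
    except ValueError:
        return None                     # not a number at all
    if not 0 <= number <= 255:
        return None                     # outside the supported range
    return chr(number)


decimal_a = parse_charref('65')
hex_a = parse_charref('x41')
too_large = parse_charref('256')
not_a_number = parse_charref('abc')
```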

`handle_entityref(ref)'
     This method is called to process a general entity reference of the
     form `&REF;' where REF is a general entity reference.  It looks
     for REF in the instance (or class) variable `entitydefs' which
     should be a mapping from entity names to corresponding
     translations.  If a translation is found, it calls the method
     `handle_data()' with the translation; otherwise, it calls the
     method `unknown_entityref(REF)'.  The default `entitydefs' defines
     translations for `&amp;', `&apos;', `&gt;', `&lt;', and `&quot;'.

`handle_comment(comment)'
     This method is called when a comment is encountered.  The COMMENT
     argument is a string containing the text between the `<!--' and
     `-->' delimiters, but not the delimiters themselves.  For example,
     the comment `<!--text-->' will cause this method to be called with
     the argument `'text''.  The default method does nothing.

`handle_cdata(data)'
     This method is called when a CDATA section is encountered.  The
     DATA argument is a string containing the text between the
     `<![CDATA[' and `]]>' delimiters, but not the delimiters
     themselves.  For example, the section `<![CDATA[text]]>' will cause
     this method to be called with the argument `'text''.  The default
     method does nothing, and is intended to be overridden.

`handle_proc(name, data)'
     This method is called when a processing instruction (PI) is
     encountered.  The NAME is the PI target, and the DATA argument is
     a string containing the text between the PI target and the closing
     delimiter, but not the delimiter itself.  For example, the
     instruction `<?XML text?>' will cause this method to be called
     with the arguments `'XML'' and `'text''.  The default method does
     nothing.  Note that if a document starts with `<?xml ..?>',
     `handle_xml()' is called to handle it.

`handle_special(data)'
     This method is called when a declaration is encountered.  The DATA
     argument is a string containing the text between the `<!' and `>'
     delimiters, but not the delimiters themselves.  For example, the
     declaration `<!ENTITY text>' will cause this method to be called with
     the argument `'ENTITY text''.  The default method does nothing.
     Note that `<!DOCTYPE ...>' is handled separately if it is located
     at the start of the document.

`syntax_error(message)'
     This method is called when a syntax error is encountered.  The
     MESSAGE is a description of what was wrong.  The default method
     raises a `RuntimeError' exception.  If this method is overridden,
     it is permissible for it to return.  This method is only called
     when the error can be recovered from.  Unrecoverable errors raise
     a `RuntimeError' without first calling `syntax_error()'.

`unknown_starttag(tag, attributes)'
     This method is called to process an unknown start tag.  It is
     intended to be overridden by a derived class; the base class
     implementation does nothing.

`unknown_endtag(tag)'
     This method is called to process an unknown end tag.  It is
     intended to be overridden by a derived class; the base class
     implementation does nothing.

`unknown_charref(ref)'
     This method is called to process unresolvable numeric character
     references.  It is intended to be overridden by a derived class;
     the base class implementation does nothing.

`unknown_entityref(ref)'
     This method is called to process an unknown entity reference.  It
     is intended to be overridden by a derived class; the base class
     implementation does nothing.

   See also:

   The Python XML Topic Guide provides a great deal of information on
using XML from Python and links to other sources of information on XML.
It's located on the Web at `http://www.python.org/topics/xml/'.

   The Python XML Special Interest Group is developing substantial
support for processing XML from Python.  See
`http://www.python.org/sigs/xml-sig/' for more information.

* Menu:

* XML Namespaces::


File: python-lib.info,  Node: XML Namespaces,  Prev: xmllib,  Up: xmllib

XML Namespaces
--------------

   This module has support for XML namespaces as defined in the XML
Namespaces proposed recommendation.

   Tag and attribute names that are defined in an XML namespace are
handled as if the name of the tag or attribute consisted of the
namespace (i.e. the URL that defines the namespace) followed by a space
and the name of the tag or attribute.  For instance, the tag `<html
xmlns='http://www.w3.org/TR/REC-html40'>' is treated as if the tag name
were `'http://www.w3.org/TR/REC-html40 html'', and the tag `<html:a
href='http://frob.com'>' inside the above mentioned element is treated
as if the tag name were `'http://www.w3.org/TR/REC-html40 a'' and the
attribute name as if it were `'http://www.w3.org/TR/REC-html40 href''.
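
   The naming rule can be sketched as a small function.  This is a
standalone Python 3 illustration, not the `xmllib' implementation; the
function name `qualify()' and the `namespaces' dictionary are
hypothetical.  The sketch assumes that the `html' prefix is bound to
the same URL as the default namespace, and that an attribute without a
prefix takes the prefix of its element.

```python
HTML40 = 'http://www.w3.org/TR/REC-html40'


def qualify(name, namespaces, default_prefix=''):
    """Return 'URL local-name' for NAME, or NAME itself if unbound.

    NAMESPACES maps prefixes to namespace URLs; the empty prefix
    stands for the default namespace.
    """
    prefix, sep, local = name.partition(':')
    if not sep:                     # no prefix written in the document
        prefix, local = default_prefix, name
    url = namespaces.get(prefix)
    if url is None:
        return name                 # prefix not bound: leave untouched
    return url + ' ' + local


# Assumed bindings for the example in the text.
namespaces = {'': HTML40, 'html': HTML40}

root_name = qualify('html', namespaces)
anchor_name = qualify('html:a', namespaces)
# The anchor's href attribute inherits the element's prefix here.
attribute_name = qualify('href', namespaces, default_prefix='html')
unbound_name = qualify('svg:rect', namespaces)
```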

   An older draft of the XML Namespaces proposal is also recognized, but
triggers a warning.


File: python-lib.info,  Node: formatter,  Next: rfc822,  Prev: xmllib,  Up: Internet Data Handling

Generic output formatting
=========================

   Generic output formatter and device interface.

   This module supports two interface definitions, each with multiple
implementations.  The *formatter* interface is used by the `HTMLParser'
class of the `htmllib' module, and the *writer* interface is required
by the formatter interface.

   Formatter objects transform an abstract flow of formatting events
into specific output events on writer objects.  Formatters manage
several stack structures to allow various properties of a writer object
to be changed and restored; writers need not be able to handle relative
changes nor any sort of "change back" operation.  Specific writer
properties which may be controlled via formatter objects are horizontal
alignment, font, and left margin indentations.  A mechanism is provided
which supports providing arbitrary, non-exclusive style settings to a
writer as well.  Additional interfaces facilitate formatting events
which are not reversible, such as paragraph separation.

   Writer objects encapsulate device interfaces.  Abstract devices, such
as file formats, are supported as well as physical devices.  The
provided implementations all work with abstract devices.  The interface
makes available mechanisms for setting the properties which formatter
objects manage and inserting data into the output.

* Menu:

* Formatter Interface::
* Formatter Implementations::
* Writer Interface::
* Writer Implementations::


File: python-lib.info,  Node: Formatter Interface,  Next: Formatter Implementations,  Prev: formatter,  Up: formatter

The Formatter Interface
-----------------------

   Interfaces to create formatters are dependent on the specific
formatter class being instantiated.  The interfaces described below are
the required interfaces which all formatters must support once
initialized.

   One data element is defined at the module level:

`AS_IS'
     Value which can be used in the font specification passed to the
     `push_font()' method described below, or as the new value to any
     other `push_PROPERTY()' method.  Pushing the `AS_IS' value allows
     the corresponding `pop_PROPERTY()' method to be called without
     having to track whether the property was changed.

   The following attributes are defined for formatter instance objects:

`writer'
     The writer instance with which the formatter interacts.

`end_paragraph(blanklines)'
     Close any open paragraphs and insert at least BLANKLINES before
     the next paragraph.

`add_line_break()'
     Add a hard line break if one does not already exist.  This does not
     break the logical paragraph.

`add_hor_rule(*args, **kw)'
     Insert a horizontal rule in the output.  A hard break is inserted
     if there is data in the current paragraph, but the logical
     paragraph is not broken.  The arguments and keywords are passed on
     to the writer's `send_hor_rule()' method.

`add_flowing_data(data)'
     Provide data which should be formatted with collapsed whitespace.
     Whitespace from preceding and successive calls to
     `add_flowing_data()' is considered as well when the whitespace
     collapse is performed.  The data which is passed to this method is
     expected to be word-wrapped by the output device.  Note that any
     word-wrapping still must be performed by the writer object due to
     the need to rely on device and font information.
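
     Collapsing whitespace across calls requires remembering whether
     the previous call ended in whitespace.  The sketch below is a
     simplified, standalone Python 3 illustration and not the
     module's formatter; the class name `FlowCollapser', the `sent'
     list standing in for the writer, and the `at_start' flag are
     hypothetical.

```python
class FlowCollapser:
    """Standalone sketch of whitespace collapsing across calls."""

    def __init__(self):
        self.sent = []            # stands in for the writer
        self.softspace = False    # a space is owed before the next word
        self.at_start = True      # nothing has been sent yet

    def add_flowing_data(self, data):
        if not data:
            return
        leading = data[0].isspace()
        trailing = data[-1].isspace()
        words = ' '.join(data.split())
        if not words:
            # Whitespace only: remember it unless nothing precedes it.
            if not self.at_start:
                self.softspace = True
            return
        if (leading or self.softspace) and not self.at_start:
            words = ' ' + words
        self.sent.append(words)
        self.softspace = trailing
        self.at_start = False


flow = FlowCollapser()
for piece in ['Generic  output', '   ', '\n formatting\t', 'and   device']:
    flow.add_flowing_data(piece)
collapsed = ''.join(flow.sent)

# No whitespace between calls means no space is invented.
glued = FlowCollapser()
glued.add_flowing_data('form')
glued.add_flowing_data('atter')
joined = ''.join(glued.sent)
```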

`add_literal_data(data)'
     Provide data which should be passed to the writer unchanged.
     Whitespace, including newline and tab characters, is considered
     legal in the value of DATA.

`add_label_data(format, counter)'
     Insert a label which should be placed to the left of the current
     left margin.  This should be used for constructing bulleted or
     numbered lists.  If the FORMAT value is a string, it is
     interpreted as a format specification for COUNTER, which should be
     an integer.  The result of this formatting becomes the value of
     the label; if FORMAT is not a string it is used as the label value
     directly.  The label value is passed as the only argument to the
     writer's `send_label_data()' method.  Interpretation of non-string
     label values is dependent on the associated writer.

     Format specifications are strings which, in combination with a
     counter value, are used to compute label values.  Each character
     in the format string is copied to the label value, with some
     characters recognized to indicate a transform on the counter
     value.  Specifically, the character `1' represents the counter
     value formatted as an Arabic number, the characters `A' and `a'
     represent alphabetic representations of the counter value in upper
     and lower case, respectively, and `I' and `i' represent the
     counter value in Roman numerals, in upper and lower case.  Note
     that the alphabetic and Roman transforms require that the counter
     value be greater than zero.
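
     The format specification rules can be sketched as follows.  This
     is a standalone Python 3 illustration, not the module's own
     code; `format_label()' and its helpers are hypothetical names.
     Two points are assumptions: counters above 26 continue as `aa',
     `ab', and so on, and a non-positive counter raises `ValueError'
     for the alphabetic and Roman forms.

```python
_ROMAN = [(1000, 'm'), (900, 'cm'), (500, 'd'), (400, 'cd'),
          (100, 'c'), (90, 'xc'), (50, 'l'), (40, 'xl'),
          (10, 'x'), (9, 'ix'), (5, 'v'), (4, 'iv'), (1, 'i')]


def _require_positive(counter):
    if counter <= 0:
        raise ValueError('counter must be greater than zero')


def _roman(counter):
    _require_positive(counter)
    out = ''
    for value, numeral in _ROMAN:
        while counter >= value:
            out += numeral
            counter -= value
    return out


def _letters(counter):
    _require_positive(counter)
    out = ''
    while counter > 0:
        counter, digit = divmod(counter - 1, 26)
        out = chr(ord('a') + digit) + out
    return out


def format_label(format, counter):
    """Compute a label value from FORMAT and COUNTER."""
    if not isinstance(format, str):
        return format               # non-strings are used directly
    label = ''
    for char in format:
        if char == '1':
            label += str(counter)
        elif char == 'a':
            label += _letters(counter)
        elif char == 'A':
            label += _letters(counter).upper()
        elif char == 'i':
            label += _roman(counter)
        elif char == 'I':
            label += _roman(counter).upper()
        else:
            label += char           # copied to the label unchanged
    return label


numbered = format_label('1.', 3)
lettered = format_label('(a)', 2)
roman = format_label('I.', 4)
bullet = format_label('*', 7)
```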

`flush_softspace()'
     Send any pending whitespace buffered from a previous call to
     `add_flowing_data()' to the associated writer object.  This should
     be called before any direct manipulation of the writer object.

`push_alignment(align)'
     Push a new alignment setting onto the alignment stack.  This may be
     `AS_IS' if no change is desired.  If the alignment value is
     changed from the previous setting, the writer's `new_alignment()'
     method is called with the ALIGN value.

`pop_alignment()'
     Restore the previous alignment.
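
     The value of pushing `AS_IS' is that every push can be paired
     with a pop, whether or not the setting changed.  The sketch
     below is a standalone Python 3 illustration, not the module's
     formatter; `AlignmentStack' and `writer_calls' are hypothetical
     names, and `AS_IS' here is a stand-in sentinel.  Notifying the
     writer on a pop only when the value changes is an assumption.

```python
AS_IS = object()    # stand-in sentinel for this sketch only


class AlignmentStack:
    """Standalone sketch of push_alignment()/pop_alignment()."""

    def __init__(self):
        self.align = None
        self.stack = []
        self.writer_calls = []    # arguments a writer would receive

    def push_alignment(self, align):
        if align is not AS_IS and align != self.align:
            self.align = align
            self.writer_calls.append(align)     # new_alignment(align)
        # Always push, so that a later pop is always balanced.
        self.stack.append(self.align)

    def pop_alignment(self):
        if self.stack:
            self.stack.pop()
        previous = self.stack[-1] if self.stack else None
        if previous != self.align:
            self.align = previous
            self.writer_calls.append(previous)


aligner = AlignmentStack()
aligner.push_alignment('center')    # changes the setting
aligner.push_alignment(AS_IS)       # no change, but still pushed
aligner.pop_alignment()             # back to 'center', nothing to report
aligner.pop_alignment()             # back to the initial setting
```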

`push_font((size, italic, bold, teletype))'
     Change some or all font properties of the writer object.
     Properties which are not set to `AS_IS' are set to the values
     passed in while others are maintained at their current settings.
     The writer's `new_font()' method is called with the fully resolved
     font specification.

`pop_font()'
     Restore the previous font.

`push_margin(margin)'
     Increase the number of left margin indentations by one, associating
     the logical tag MARGIN with the new indentation.  The initial
     margin level is `0'.  Changed values of the logical tag must be
     true values; false values other than `AS_IS' are not sufficient to
     change the margin.

`pop_margin()'
     Restore the previous margin.

`push_style(*styles)'
     Push any number of arbitrary style specifications.  All styles are
     pushed onto the styles stack in order.  A tuple representing the
     entire stack, including `AS_IS' values, is passed to the writer's
     `new_styles()' method.

`pop_style([n = 1])'
     Pop the last N style specifications passed to `push_style()'.  A
     tuple representing the revised stack, including `AS_IS' values, is
     passed to the writer's `new_styles()' method.

`set_spacing(spacing)'
     Set the spacing style for the writer.

`assert_line_data([flag = 1])'
     Inform the formatter that data has been added to the current
     paragraph out-of-band.  This should be used when the writer has
     been manipulated directly.  The optional FLAG argument can be set
     to false if the writer manipulations produced a hard line break at
     the end of the output.


File: python-lib.info,  Node: Formatter Implementations,  Next: Writer Interface,  Prev: Formatter Interface,  Up: formatter

Formatter Implementations
-------------------------

   Two implementations of formatter objects are provided by this module.
Most applications may use one of these classes without modification or
subclassing.

`NullFormatter([writer])'
     A formatter which does nothing.  If WRITER is omitted, a
     `NullWriter' instance is created.  No methods of the writer are
     called by `NullFormatter' instances.  Implementations should
     inherit from this class if they implement a writer interface but
     do not need to inherit any implementation.

`AbstractFormatter(writer)'
     The standard formatter.  This implementation has demonstrated wide
     applicability to many writers, and may be used directly in most
     circumstances.  It has been used to implement a full-featured
     world-wide web browser.


File: python-lib.info,  Node: Writer Interface,  Next: Writer Implementations,  Prev: Formatter Implementations,  Up: formatter

The Writer Interface
--------------------

   Interfaces to create writers are dependent on the specific writer
class being instantiated.  The interfaces described below are the
required interfaces which all writers must support once initialized.
Note that while most applications can use the `AbstractFormatter' class
as a formatter, the writer must typically be provided by the
application.

`flush()'
     Flush any buffered output or device control events.

`new_alignment(align)'
     Set the alignment style.  The ALIGN value can be any object, but
     by convention is a string or `None', where `None' indicates that
     the writer's "preferred" alignment should be used.  Conventional
     ALIGN values are `'left'', `'center'', `'right'', and `'justify''.

`new_font(font)'
     Set the font style.  The value of FONT will be `None', indicating
     that the device's default font should be used, or a tuple of the
     form `(SIZE, ITALIC, BOLD, TELETYPE)'.  SIZE will be a string
     indicating the size of font that should be used; specific strings
     and their interpretation must be defined by the application.  The
     ITALIC, BOLD, and TELETYPE values are boolean indicators
     specifying which of those font attributes should be used.

`new_margin(margin, level)'
     Set the margin level to the integer LEVEL and the logical tag to
     MARGIN.  Interpretation of the logical tag is at the writer's
     discretion; the only restriction on the value of the logical tag
     is that it not be a false value for non-zero values of LEVEL.

`new_spacing(spacing)'
     Set the spacing style to SPACING.

`new_styles(styles)'
     Set additional styles.  The STYLES value is a tuple of arbitrary
     values; the value `AS_IS' should be ignored.  The STYLES tuple may
     be interpreted either as a set or as a stack depending on the
     requirements of the application and writer implementation.

`send_line_break()'
     Break the current line.

`send_paragraph(blankline)'
     Produce a paragraph separation of at least BLANKLINE blank lines,
     or the equivalent.  The BLANKLINE value will be an integer.  Note
     that the implementation will receive a call to `send_line_break()'
     before this call if a line break is needed; this method should not
     include ending the last line of the paragraph.  It is only
     responsible for vertical spacing between paragraphs.

`send_hor_rule(*args, **kw)'
     Display a horizontal rule on the output device.  The arguments to
     this method are entirely application- and writer-specific, and
     should be interpreted with care.  The method implementation may
     assume that a line break has already been issued via
     `send_line_break()'.

`send_flowing_data(data)'
     Output character data which may be word-wrapped and re-flowed as
     needed.  Within any sequence of calls to this method, the writer
     may assume that spans of multiple whitespace characters have been
     collapsed to single space characters.

`send_literal_data(data)'
     Output character data which has already been formatted for
     display.  Generally, this should be interpreted to mean that line
     breaks indicated by newline characters should be preserved and no
     new line breaks should be introduced.  The data may contain
     embedded newline and tab characters, unlike data provided to the
     `send_flowing_data()' interface.

`send_label_data(data)'
     Set DATA to the left of the current left margin, if possible.  The
     value of DATA is not restricted; treatment of non-string values is
     entirely application- and writer-dependent.  This method will only
     be called at the beginning of a line.
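
   As a rough illustration of the interface above, the following sketch
defines a writer that does nothing but record the calls it receives.
`LoggingWriter' is a hypothetical name; the class does not import or
subclass anything from this module, and the calls at the end are made
by hand only to show the expected argument shapes.

```python
class LoggingWriter:
    """Hypothetical writer: remembers each call as (method name, args)."""

    def __init__(self):
        self.calls = []

    def _log(self, name, *args):
        self.calls.append((name, args))

    def flush(self):
        self._log("flush")

    def new_alignment(self, align):
        self._log("new_alignment", align)

    def new_font(self, font):
        self._log("new_font", font)

    def new_margin(self, margin, level):
        self._log("new_margin", margin, level)

    def new_spacing(self, spacing):
        self._log("new_spacing", spacing)

    def new_styles(self, styles):
        self._log("new_styles", styles)

    def send_line_break(self):
        self._log("send_line_break")

    def send_paragraph(self, blankline):
        self._log("send_paragraph", blankline)

    def send_hor_rule(self, *args, **kw):
        self._log("send_hor_rule", args, kw)

    def send_flowing_data(self, data):
        self._log("send_flowing_data", data)

    def send_literal_data(self, data):
        self._log("send_literal_data", data)

    def send_label_data(self, data):
        self._log("send_label_data", data)


# Calls a formatter might plausibly make for a bold heading line.
writer = LoggingWriter()
writer.new_font(("h1", 0, 1, 0))  # (SIZE, ITALIC, BOLD, TELETYPE)
writer.send_flowing_data("Simple SGML parser")
writer.send_line_break()
writer.send_paragraph(1)
writer.flush()
```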


File: python-lib.info,  Node: Writer Implementations,  Prev: Writer Interface,  Up: formatter

Writer Implementations
----------------------

   Three implementations of the writer object interface are provided as
examples by this module.  Most applications will need to derive new
writer classes from the `NullWriter' class.

`NullWriter()'
     A writer which only provides the interface definition; no actions
     are taken on any methods.  This should be the base class for all
     writers which do not need to inherit any implementation methods.

`AbstractWriter()'
     A writer which can be used in debugging formatters, but not much
     else.  Each method simply announces itself by printing its name and
     arguments on standard output.

`DumbWriter([file[, maxcol = 72]])'
     Simple writer class which writes output on the file object passed
     in as FILE or, if FILE is omitted, on standard output.  The output
     is simply word-wrapped to the number of columns specified by
     MAXCOL.  This class is suitable for reflowing a sequence of
     paragraphs.
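
     What "word-wrapped to MAXCOL columns" means can be shown with a
     small greedy wrapper.  This is only a sketch of the general
     technique; `wrap_words' is a hypothetical helper, and the exact
     column rule used by `DumbWriter' may differ.

```python
def wrap_words(text, maxcol=72):
    """Greedy word wrap: start a new line when the next word would not fit."""
    lines = []
    current = ""
    for word in text.split():
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= maxcol:
            current = current + " " + word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


wrapped = wrap_words("the quick brown fox jumps over the lazy dog", maxcol=16)
```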


File: python-lib.info,  Node: rfc822,  Next: mimetools,  Prev: formatter,  Up: Internet Data Handling

Parse RFC 822 mail headers
==========================

   Parse RFC 822 style mail headers.

   This module defines a class, `Message', which represents a
collection of "email headers" as defined by the Internet standard RFC
822.  It is used in various contexts, usually to read such headers from
a file.  This module also defines a helper class `AddressList' for
parsing RFC 822 addresses.

   Note that there's a separate module to read UNIX, MH, and MMDF style
mailbox files: `mailbox'.

`Message(file[, seekable])'
     A `Message' instance is instantiated with an input object as
     parameter.  Message relies only on the input object having a
     `readline()' method; in particular, ordinary file objects qualify.
     Instantiation reads headers from the input object up to a
     delimiter line (normally a blank line) and stores them in the
     instance.

     This class can work with any input object that supports a
     `readline()' method.  If the input object has seek and tell
     capability, the `rewindbody()' method will work; also, illegal
     lines will be pushed back onto the input stream.  If the input
     object lacks seek but has an `unread()' method that can push back a
     line of input, `Message' will use that to push back illegal lines.
     Thus this class can be used to parse messages coming from a
     buffered stream.

     The optional SEEKABLE argument is provided as a workaround for
     certain stdio libraries in which `tell()' discards buffered data
     before discovering that the `lseek()' system call doesn't work.
     For maximum portability, you should set the seekable argument to
     zero to prevent that initial `tell()' when passing in an
     unseekable object such as a file object created from a socket
     object.

     Input lines as read from the file may either be terminated by
     CR-LF or by a single linefeed; a terminating CR-LF is replaced by
     a single linefeed before the line is stored.

     All header matching is done independent of upper or lower case;
     e.g. `M['From']', `M['from']' and `M['FROM']' all yield the same
     result.
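
     The reading rules above (only `readline()' is required, CR-LF is
     normalized, a blank line ends the headers, and names match without
     regard to case) can be sketched without the module itself.
     `LineSource', `read_headers' and `lookup' are hypothetical names;
     the sketch ignores continuation lines and illegal lines.

```python
class LineSource:
    """Hypothetical input object: the only method it offers is readline()."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        if self._lines:
            return self._lines.pop(0)
        return ""  # an empty string signals end of input


def read_headers(source):
    """Collect header lines up to the blank delimiter line."""
    headers = []
    while True:
        line = source.readline()
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"  # a terminating CR-LF becomes a linefeed
        if line in ("", "\n"):
            break  # end of input, or the blank delimiter line
        headers.append(line)
    return headers


def lookup(headers, name):
    """Return the stripped value of the first header NAME, ignoring case."""
    prefix = name.lower() + ":"
    for line in headers:
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return None


source = LineSource(["From: jack@cwi.nl\r\n", "Subject: Hello\n",
                     "\r\n", "Body text\n"])
headers = read_headers(source)
```

     After `read_headers()' returns, the next `readline()' call on the
     source yields the first line of the message body.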

`AddressList(field)'
     You may instantiate the `AddressList' helper class using a single
     string parameter, a comma-separated list of RFC 822 addresses to be
     parsed.  (The parameter `None' yields an empty list.)

`parsedate(date)'
     Attempts to parse a date according to the rules in RFC 822.
     However, some mailers don't follow that format as specified, so
     `parsedate()' tries to guess correctly in such cases.  DATE is a
     string containing an RFC 822 date, such as `'Mon, 20 Nov 1995
     19:12:08 -0500''.  If it succeeds in parsing the date,
     `parsedate()' returns a 9-tuple that can be passed directly to
     `time.mktime()'; otherwise `None' will be returned.

`parsedate_tz(date)'
     Performs the same function as `parsedate()', but returns either
     `None' or a 10-tuple; the first 9 elements make up a tuple that
     can be passed directly to `time.mktime()', and the tenth is the
     offset of the date's timezone from UTC (which is the official term
     for Greenwich Mean Time).  (Note that the sign of the timezone
     offset is the opposite of the sign of the `time.timezone' variable
     for the same timezone; the latter variable follows the POSIX
     standard while this module follows RFC 822.)  If the input string
     has no timezone, the last element of the tuple returned is `None'.

`mktime_tz(tuple)'
     Turn a 10-tuple as returned by `parsedate_tz()' into a UTC
     timestamp.  If the timezone item in the tuple is `None', assume
     local time.  Minor deficiency: this first interprets the first 8
     elements as a local time and then compensates for the timezone
     difference; this may yield a slight error around daylight savings
     time switch dates.  Not enough to worry about for common use.
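
     The timezone arithmetic can be sketched as below.  Two assumptions
     are made: the offset in the tenth element is taken to be in
     seconds, and the sketch uses `calendar.timegm()' instead of the
     local-time route described above, so it is a different technique
     from the one `mktime_tz()' is documented to use.  `utc_timestamp'
     is a hypothetical name, and the 10-tuple is written out by hand
     rather than produced by `parsedate_tz()'.

```python
import calendar


def utc_timestamp(date10):
    """Convert a 10-tuple whose last item is a UTC offset in seconds."""
    offset = date10[9]
    if offset is None:
        raise ValueError("no timezone in tuple; this sketch does not guess")
    # timegm() reads the date fields as if they were already UTC, so
    # subtracting the offset moves the wall-clock time back to real UTC.
    return calendar.timegm(date10[:9]) - offset


# 'Mon, 20 Nov 1995 19:12:08 -0500' as a hand-written 10-tuple.  The
# seventh to ninth fields are placeholders; -0500 is five hours west of
# UTC, i.e. an offset of -18000 seconds.
parsed = (1995, 11, 20, 19, 12, 8, 0, 1, 0, -5 * 3600)
stamp = utc_timestamp(parsed)
```

     The resulting timestamp corresponds to 00:12:08 UTC on 21 Nov 1995,
     five hours later than the wall-clock time in the header.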

* Menu:

* Message Objects::
* AddressList Objects::


File: python-lib.info,  Node: Message Objects,  Next: AddressList Objects,  Prev: rfc822,  Up: rfc822

Message Objects
---------------

   A `Message' instance has the following methods:

`rewindbody()'
     Seek to the start of the message body.  This only works if the file
     object is seekable.

`isheader(line)'
     Returns a line's canonicalized fieldname (the dictionary key that
     will be used to index it) if the line is a legal RFC 822 header;
     otherwise returns `None' (implying that parsing should stop here
     and the line be pushed back on the input stream).  It is sometimes
     useful to override this method in a subclass.

`islast(line)'
     Return true if the given line is a delimiter on which Message
     should stop.  The delimiter line is consumed, and the file
     object's read location positioned immediately after it.  By
     default this method just checks that the line is blank, but you
     can override it in a subclass.

`iscomment(line)'
     Return true if the given line should be ignored entirely, just
     skipped.  By default this is a stub that always returns false, but
     you can override it in a subclass.

`getallmatchingheaders(name)'
     Return a list of lines consisting of all headers matching NAME, if
     any.  Each physical line, whether it is a continuation line or
     not, is a separate list item.  Return the empty list if no header
     matches NAME.

`getfirstmatchingheader(name)'
     Return a list of lines comprising the first header matching NAME,
     and its continuation line(s), if any.  Return `None' if there is
     no header matching NAME.

`getrawheader(name)'
     Return a single string consisting of the text after the colon in
     the first header matching NAME.  This includes leading whitespace,
     the trailing linefeed, and internal linefeeds and whitespace if
     any continuation line(s) were present.  Return `None' if
     there is no header matching NAME.

`getheader(name[, default])'
     Like `getrawheader(NAME)', but strip leading and trailing
     whitespace.  Internal whitespace is not stripped.  The optional
     DEFAULT argument can be used to specify a different default to be
     returned when there is no header matching NAME.
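
     The difference between the raw and the stripped form is easiest to
     see on a header with a continuation line.  The functions below are
     hypothetical stand-ins that work on a plain list of header lines;
     they only approximate the behaviour described above and are not
     the methods of `Message' themselves.

```python
def first_matching_header(lines, name):
    """Return the first header named NAME plus its continuation lines."""
    prefix = name.lower() + ":"
    found = []
    for line in lines:
        if found:
            if line[:1] in (" ", "\t"):
                found.append(line)  # continuation of the header we found
                continue
            break  # the next header starts here; stop collecting
        if line.lower().startswith(prefix):
            found.append(line)
    return found or None


def raw_header(lines, name):
    """Text after the colon, with whitespace and linefeeds left in place."""
    found = first_matching_header(lines, name)
    if found is None:
        return None
    found = list(found)
    found[0] = found[0][len(name) + 1:]  # drop 'NAME:' from the first line
    return "".join(found)


def header(lines, name, default=None):
    """Like raw_header(), but with the outer whitespace stripped."""
    raw = raw_header(lines, name)
    if raw is None:
        return default
    return raw.strip()


lines = ["Subject: a long\n", "\tsubject line\n", "To: jack@cwi.nl\n"]
raw = raw_header(lines, "Subject")
stripped = header(lines, "Subject")
```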

`get(name[, default])'
     An alias for `getheader()', to make the interface more compatible
     with regular dictionaries.

`getaddr(name)'
     Return a pair `(FULL NAME, EMAIL ADDRESS)' parsed from the string
     returned by `getheader(NAME)'.  If no header matching NAME exists,
     return `(None, None)'; otherwise both the full name and the
     address are (possibly empty) strings.

     Example: If M's first `From' header contains the string
     `'jack@cwi.nl (Jack Jansen)'', then `m.getaddr('From')' will yield
     the pair `('Jack Jansen', 'jack@cwi.nl')'.  If the header contained
     `'Jack Jansen <jack@cwi.nl>'' instead, it would yield the exact
     same result.
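
     The two address forms in this example can be told apart with a
     deliberately simplified parser.  `split_address' is a hypothetical
     helper that handles only these two shapes plus a bare address; the
     real parser accepts far more of the RFC 822 grammar.

```python
import re


def split_address(value):
    """Return (full name, email address) for two common header forms."""
    value = value.strip()
    # Form 1: 'Full Name <user@host>'
    match = re.match(r"^(.*?)\s*<([^<>]+)>$", value)
    if match:
        return match.group(1).strip('" '), match.group(2)
    # Form 2: 'user@host (Full Name)'
    match = re.match(r"^(\S+)\s*\((.*)\)$", value)
    if match:
        return match.group(2), match.group(1)
    # Bare address with no name part.
    return "", value


comment_form = split_address("jack@cwi.nl (Jack Jansen)")
angle_form = split_address("Jack Jansen <jack@cwi.nl>")
```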

`getaddrlist(name)'
     This is similar to `getaddr(NAME)', but parses a header containing
     a list of email addresses (e.g. a `To' header) and returns a list
     of `(FULL NAME, EMAIL ADDRESS)' pairs (even if there was only one
     address in the header).  If there is no header matching NAME,
     return an empty list.

     If multiple headers exist that match the named header (e.g. if
     there are several `Cc' headers), all are parsed for addresses.  Any
     continuation lines the named headers contain are also parsed.

`getdate(name)'
     Retrieve a header using `getheader()' and parse it into a 9-tuple
     compatible with `time.mktime()'.  If there is no header matching
     NAME, or it is unparsable, return `None'.

     Date parsing appears to be a black art, and not all mailers adhere
     to the standard.  While it has been tested and found correct on a
     large collection of email from many sources, it is still possible
     that this function may occasionally yield an incorrect result.

`getdate_tz(name)'
     Retrieve a header using `getheader()' and parse it into a
     10-tuple; the first 9 elements will make a tuple compatible with
     `time.mktime()', and the 10th is a number giving the offset of the
     date's timezone from UTC.  Similarly to `getdate()', if there is
     no header matching NAME, or it is unparsable, return `None'.

   `Message' instances also support a read-only mapping interface.  In
particular: `M[name]' is like `M.getheader(name)' but raises `KeyError'
if there is no matching header; and `len(M)', `M.has_key(name)',
`M.keys()', `M.values()' and `M.items()' act as expected (and
consistently).

   Finally, `Message' instances have two public instance variables:

`headers'
     A list containing the entire set of header lines, in the order in
     which they were read (except that setitem calls may disturb this
     order).  Each line contains a trailing newline.  The blank line
     terminating the headers is not contained in the list.

`fp'
     The file or file-like object passed at instantiation time.  This
     can be used to read the message content.


File: python-lib.info,  Node: AddressList Objects,  Prev: Message Objects,  Up: rfc822

AddressList Objects
-------------------

   An `AddressList' instance has the following methods:

`__len__()'
     Return the number of addresses in the address list.

`__str__()'
     Return a canonicalized string representation of the address list.
     Addresses are rendered in "name" <host@domain> form,
     comma-separated.

`__add__(alist)'
     Return an `AddressList' instance that contains all addresses in
     both `AddressList' operands, with duplicates removed (set union).

`__sub__(alist)'
     Return an `AddressList' instance that contains every address in the
     left-hand `AddressList' operand that is not present in the
     right-hand address operand (set difference).
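
     The set-like behaviour of the two operators can be sketched on
     plain lists of `(name, address)' pairs.  `union' and `difference'
     are hypothetical helper names, the sketch is written for a current
     Python interpreter, and it assumes that two addresses count as
     equal only when both members of the pair match.

```python
def union(left, right):
    """Addresses from both lists, in order, skipping those already present."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result


def difference(left, right):
    """Addresses from LEFT that do not appear in RIGHT."""
    return [pair for pair in left if pair not in right]


a = [("Jack Jansen", "jack@cwi.nl"), ("", "ann@example.org")]
b = [("", "ann@example.org"), ("Bob", "bob@example.org")]
both = union(a, b)
only_a = difference(a, b)
```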

   Finally, `AddressList' instances have one public instance variable:

`addresslist'
     A list of 2-tuples of strings, one per address.  In each tuple,
     the first item is the canonicalized name part of the address and
     the second is the route-address (`@'-separated user-domain pair).


File: python-lib.info,  Node: mimetools,  Next: MimeWriter,  Prev: rfc822,  Up: Internet Data Handling

Tools for parsing MIME messages
===============================

   Tools for parsing MIME style message bodies.

   This module defines a subclass of the `rfc822.Message' class and a
number of utility functions that are useful for the manipulation of
MIME multipart or encoded messages.

   It defines the following items:

`Message(fp[, seekable])'
     Return a new instance of the `Message' class.  This is a subclass
     of the `rfc822.Message' class, with some additional methods (see
     below).  The SEEKABLE argument has the same meaning as for
     `rfc822.Message'.

`choose_boundary()'
     Return a unique string that has a high likelihood of being usable
     as a part boundary.  The string has the form
     `'HOSTIPADDR.UID.PID.TIMESTAMP.RANDOM''.

`decode(input, output, encoding)'
     Read data encoded using the allowed MIME ENCODING from open file
     object INPUT and write the decoded data to open file object
     OUTPUT.  Valid values for ENCODING include `'base64'',
     `'quoted-printable'' and `'uuencode''.

`encode(input, output, encoding)'
     Read data from open file object INPUT and write it encoded using
     the allowed MIME ENCODING to open file object OUTPUT.  Valid
     values for ENCODING are the same as for `decode()'.
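
     One plausible way to dispatch on ENCODING is sketched below.  It
     is an assumption, not a description of this module's internals:
     the sketch is written for a current Python interpreter, relies on
     the standard `base64' and `quopri' modules, leaves out
     `'uuencode'', and uses the hypothetical names `encode_stream',
     `decode_stream' and `round_trip'.

```python
import base64
import io
import quopri


def encode_stream(infile, outfile, encoding):
    """Encode everything readable from INFILE onto OUTFILE."""
    if encoding == "base64":
        base64.encode(infile, outfile)
    elif encoding == "quoted-printable":
        # False: embedded tabs and spaces are left unquoted.
        quopri.encode(infile, outfile, False)
    else:
        raise ValueError("encoding not handled in this sketch: %r" % (encoding,))


def decode_stream(infile, outfile, encoding):
    """Decode everything readable from INFILE onto OUTFILE."""
    if encoding == "base64":
        base64.decode(infile, outfile)
    elif encoding == "quoted-printable":
        quopri.decode(infile, outfile)
    else:
        raise ValueError("encoding not handled in this sketch: %r" % (encoding,))


def round_trip(data, encoding):
    """Encode DATA, decode the result again, and return the decoded bytes."""
    encoded = io.BytesIO()
    encode_stream(io.BytesIO(data), encoded, encoding)
    decoded = io.BytesIO()
    decode_stream(io.BytesIO(encoded.getvalue()), decoded, encoding)
    return decoded.getvalue()


original = b"caf\xe9 = coffee\n"
```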

`copyliteral(input, output)'
     Read lines until `EOF' from open file INPUT and write them to open
     file OUTPUT.

`copybinary(input, output)'
     Read blocks until `EOF' from open file INPUT and write them to
     open file OUTPUT.  The block size is currently fixed at 8192.
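
     Both copy functions amount to a read-until-empty loop; only the
     unit of reading differs.  The sketch below is written for a
     current Python interpreter and uses the hypothetical names
     `copy_lines' and `copy_blocks'; the block size is the one quoted
     above.

```python
import io

BLOCKSIZE = 8192  # block size quoted in the description above


def copy_lines(infile, outfile):
    """Copy INFILE to OUTFILE one line at a time until end of file."""
    while True:
        line = infile.readline()
        if not line:
            break  # readline() returns an empty result at end of file
        outfile.write(line)


def copy_blocks(infile, outfile):
    """Copy INFILE to OUTFILE one block at a time until end of file."""
    while True:
        block = infile.read(BLOCKSIZE)
        if not block:
            break  # read() returns an empty result at end of file
        outfile.write(block)


binary_target = io.BytesIO()
copy_blocks(io.BytesIO(b"x" * 20000), binary_target)

text_target = io.StringIO()
copy_lines(io.StringIO("first\nsecond\n"), text_target)
```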

* Menu:

* mimetools.Message Methods::