', the class should define the `do_TAG()' method.
The module defines a single class:
`HTMLParser(formatter)'
This is the basic HTML parser class. It supports all entity names
required by the HTML 2.0 specification (RFC 1866). It also defines
handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
See also:
*Note htmlentitydefs:: Definition of replacement text for HTML 2.0
entities.
*Note sgmllib:: Base class for `HTMLParser'.
* Menu:
* HTMLParser Objects::
File: python-lib.info, Node: HTMLParser Objects, Prev: htmllib, Up: htmllib
HTMLParser Objects
------------------
In addition to tag methods, the `HTMLParser' class provides some
additional methods and instance variables for use within tag methods.
`formatter'
This is the formatter instance associated with the parser.
`nofill'
Boolean flag which should be true when whitespace should not be
collapsed, or false when it should be. In general, this should
only be true when character data is to be treated as
"preformatted" text, as within a `<PRE>' element. The default
value is false. This affects the operation of `handle_data()' and
`save_end()'.
`anchor_bgn(href, name, type)'
This method is called at the start of an anchor region. The
arguments correspond to the attributes of the `<A>' tag with the
same names. The default implementation maintains a list of
hyperlinks (defined by the `HREF' attribute for `<A>' tags) within
the document. The list of hyperlinks is available as the data
attribute `anchorlist'.
`anchor_end()'
This method is called at the end of an anchor region. The default
implementation adds a textual footnote marker using an index into
the list of hyperlinks created by `anchor_bgn()'.
`handle_image(source, alt[, ismap[, align[, width[, height]]]])'
This method is called to handle images. The default implementation
simply passes the ALT value to the `handle_data()' method.
`save_bgn()'
Begins saving character data in a buffer instead of sending it to
the formatter object. Retrieve the stored data via `save_end()'.
Use of the `save_bgn()' / `save_end()' pair may not be nested.
`save_end()'
Ends buffering character data and returns all data saved since the
preceding call to `save_bgn()'. If the `nofill' flag is false,
whitespace is collapsed to single spaces. A call to this method
without a preceding call to `save_bgn()' will raise a `TypeError'
exception.
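The buffering behaviour described for `save_bgn()' and `save_end()' can
be sketched as follows. This is a minimal illustration, not the
`htmllib' source: the class name `SaveBuffer' is hypothetical, and the
explicit `TypeError' check is an assumption about how the documented
error might be produced.

```python
class SaveBuffer:
    """Hypothetical stand-in for the parser's character-data buffer."""

    def __init__(self):
        self.nofill = False      # true while inside preformatted text
        self.savedata = None     # None means "not currently saving"

    def save_bgn(self):
        # Start buffering. The pair may not be nested, so any earlier
        # buffer is simply discarded.
        self.savedata = ""

    def handle_data(self, data):
        # While saving, data goes to the buffer instead of the formatter.
        if self.savedata is not None:
            self.savedata += data

    def save_end(self):
        data = self.savedata
        self.savedata = None
        if data is None:
            raise TypeError("save_end() called without save_bgn()")
        if not self.nofill:
            # Collapse each run of whitespace to one space; this also
            # drops leading and trailing whitespace.
            data = " ".join(data.split())
        return data
```

A tag method for an element such as a title would call `save_bgn()' at
the start tag and use the string returned by `save_end()' at the end
tag.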
File: python-lib.info, Node: htmlentitydefs, Next: xmllib, Prev: htmllib, Up: Internet Data Handling
Definitions of HTML general entities
====================================
Definitions of HTML general entities.
This section was written by Fred L. Drake, Jr.
This module defines a single dictionary, `entitydefs', which is used by
the `htmllib' module to provide the `entitydefs' member of the
`HTMLParser' class. The definition provided here contains all the
entities defined by HTML 2.0 that can be handled using simple textual
substitution in the Latin-1 character set (ISO-8859-1).
`entitydefs'
A dictionary mapping HTML 2.0 entity definitions to their
replacement text in ISO Latin-1.
File: python-lib.info, Node: xmllib, Next: formatter, Prev: htmlentitydefs, Up: Internet Data Handling
A parser for XML documents
==========================
A parser for XML documents. This module was documented by Sjoerd
Mullender.
This section was written by Sjoerd Mullender.
*Changed in Python version 1.5.2*
This module defines a class `XMLParser' which serves as the basis
for parsing text files formatted in XML (Extensible Markup Language).
`XMLParser()'
The `XMLParser' class must be instantiated without arguments.
This class provides the following interface methods and instance
variables:
`attributes'
A mapping of element names to mappings. The latter mapping maps
attribute names that are valid for the element to the default
value of the attribute, or if there is no default to `None'. The
default value is the empty dictionary. This variable is meant to
be overridden, not extended since the default is shared by all
instances of `XMLParser'.
`elements'
A mapping of element names to tuples. The tuples contain a
function for handling the start and end tag respectively of the
element, or `None' if the method `unknown_starttag()' or
`unknown_endtag()' is to be called. The default value is the
empty dictionary. This variable is meant to be overridden, not
extended since the default is shared by all instances of
`XMLParser'.
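The advice to override rather than extend these mappings follows from
how class-level defaults behave in Python. The sketch below uses
hypothetical classes (`BaseParser', `MemoParser') rather than
`XMLParser' itself; it only illustrates why assigning a fresh mapping
is safer than mutating the shared default.

```python
class BaseParser:
    # Class-level default: one dictionary object shared by every instance.
    elements = {}


class MemoParser(BaseParser):
    def __init__(self):
        self.events = []
        # Override: bind a new mapping on the instance. The shared
        # default on BaseParser is left untouched.
        self.elements = {"memo": (self.start_memo, self.end_memo)}

    def start_memo(self, attributes):
        self.events.append(("start", attributes))

    def end_memo(self):
        self.events.append(("end",))


parser = MemoParser()
other = BaseParser()
```

Had `MemoParser' instead executed `self.elements["memo"] = ...' without
first assigning a new dictionary, the entry would have appeared in
every other parser instance as well.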
`entitydefs'
A mapping of entity names to their values. The default value
contains definitions for `'lt'', `'gt'', `'amp'', `'quot'', and
`'apos''.
`reset()'
Reset the instance. Loses all unprocessed data. This is called
implicitly at instantiation time.
`setnomoretags()'
Stop processing tags. Treat all following input as literal input
(CDATA).
`setliteral()'
Enter literal mode (CDATA mode). This mode is automatically exited
when the close tag matching the last unclosed open tag is
encountered.
`feed(data)'
Feed some text to the parser. It is processed insofar as it
consists of complete tags; incomplete data is buffered until more
data is fed or `close()' is called.
`close()'
Force processing of all buffered data as if it were followed by an
end-of-file mark. This method may be redefined by a derived class
to define additional processing at the end of the input, but the
redefined version should always call the base class `close()' method.
`translate_references(data)'
Translate all entity and character references in DATA and return
the translated string.
`handle_xml(encoding, standalone)'
This method is called when the `<?xml ...?>' tag is processed.
The arguments are the values of the encoding and standalone
attributes in the tag. Both encoding and standalone are optional.
The values passed to `handle_xml()' default to `None' and the
string `'no'' respectively.
`handle_doctype(tag, data)'
This method is called when the `<!DOCTYPE...>' tag is processed.
The arguments are the name of the root element and the
uninterpreted contents of the tag, starting after the white space
after the name of the root element.
`handle_starttag(tag, method, attributes)'
This method is called to handle start tags for which a start tag
handler is defined in the instance variable `elements'. The TAG
argument is the name of the tag, and the METHOD argument is the
function (method) which should be used to support semantic
interpretation of the start tag. The ATTRIBUTES argument is a
dictionary of attributes, the key being the NAME and the value
being the VALUE of the attribute found inside the tag's `<>'
brackets. Character and entity references in the VALUE have been
interpreted. For instance, for the start tag
`<A HREF="http://www.cwi.nl/">', this method would be called as
`handle_starttag('A', self.elements['A'][0], {'HREF':
'http://www.cwi.nl/'})'. The base implementation simply calls
METHOD with ATTRIBUTES as the only argument.
`handle_endtag(tag, method)'
This method is called to handle endtags for which an end tag
handler is defined in the instance variable `elements'. The TAG
argument is the name of the tag, and the METHOD argument is the
function (method) which should be used to support semantic
interpretation of the end tag. For instance, for the endtag
`</A>', this method would be called as `handle_endtag('A',
self.elements['A'][1])'. The base implementation simply calls
METHOD.
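The dispatch through the `elements' mapping described for
`handle_starttag()' and `handle_endtag()' might look like the sketch
below. The class `DispatchSketch' and its driver methods `start_tag()'
and `end_tag()' are hypothetical; only the handler signatures follow
the text above.

```python
class DispatchSketch:
    def __init__(self):
        self.log = []
        # Element name -> (start handler, end handler).
        self.elements = {"A": (self.start_a, self.end_a)}

    def start_a(self, attributes):
        self.log.append(("start_a", attributes))

    def end_a(self):
        self.log.append(("end_a",))

    def handle_starttag(self, tag, method, attributes):
        # Base behaviour: call METHOD with ATTRIBUTES as the only argument.
        method(attributes)

    def handle_endtag(self, tag, method):
        # Base behaviour: call METHOD without arguments.
        method()

    def unknown_starttag(self, tag, attributes):
        self.log.append(("unknown_start", tag))

    def unknown_endtag(self, tag):
        self.log.append(("unknown_end", tag))

    # Hypothetical drivers standing in for the parser's tag scanner.
    def start_tag(self, tag, attributes):
        handlers = self.elements.get(tag)
        if handlers is not None and handlers[0] is not None:
            self.handle_starttag(tag, handlers[0], attributes)
        else:
            self.unknown_starttag(tag, attributes)

    def end_tag(self, tag):
        handlers = self.elements.get(tag)
        if handlers is not None and handlers[1] is not None:
            self.handle_endtag(tag, handlers[1])
        else:
            self.unknown_endtag(tag)


sketch = DispatchSketch()
sketch.start_tag("A", {"HREF": "http://www.cwi.nl/"})
sketch.end_tag("A")
sketch.start_tag("B", {})
```

Overriding `handle_starttag()' in a subclass gives a single place to
wrap every known start tag, for instance to log or time the handlers.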
`handle_data(data)'
This method is called to process arbitrary data. It is intended
to be overridden by a derived class; the base class implementation
does nothing.
`handle_charref(ref)'
This method is called to process a character reference of the form
`&#REF;'. REF can either be a decimal number, or a hexadecimal
number when preceded by an `x'. In the base implementation, REF
must be a number in the range 0-255. It translates the character
to ASCII and calls the method `handle_data()' with the character
as argument. If REF is invalid or out of range, the method
`unknown_charref(REF)' is called to handle the error. A subclass
must override this method to provide support for character
references outside of the ASCII range.
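The rules for interpreting REF can be written down in a few lines. The
function `resolve_charref()' below is a hypothetical helper, not part
of the module; it returns `None' where the parser would instead call
`unknown_charref()'.

```python
def resolve_charref(ref):
    """Return the character named by REF, or None if REF is not usable."""
    try:
        if ref[:1] == "x":
            number = int(ref[1:], 16)   # hexadecimal form, e.g. "x41"
        else:
            number = int(ref, 10)       # decimal form, e.g. "65"
    except ValueError:
        return None                     # not a number at all
    if not 0 <= number <= 255:
        return None                     # outside the range the base class accepts
    return chr(number)
```

A subclass that needs references above 255 would replace the range
check with whatever mapping suits its target character set.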
`handle_entityref(ref)'
This method is called to process a general entity reference of the
form `&REF;' where REF is a general entity reference. It looks
for REF in the instance (or class) variable `entitydefs' which
should be a mapping from entity names to corresponding
translations. If a translation is found, it calls the method
`handle_data()' with the translation; otherwise, it calls the
method `unknown_entityref(REF)'. The default `entitydefs' defines
translations for `&amp;', `&apos;', `&gt;', `&lt;', and `&quot;'.
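The lookup performed by `handle_entityref()' amounts to a dictionary
access with a fallback. The sketch below assumes a hypothetical class
`EntitySketch' whose `entitydefs' holds the five default names listed
above; it records calls instead of forwarding them to a real parser.

```python
class EntitySketch:
    # Class-level mapping from entity names to replacement text.
    entitydefs = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

    def __init__(self):
        self.data = []       # text passed to handle_data()
        self.unknown = []    # names passed to unknown_entityref()

    def handle_data(self, data):
        self.data.append(data)

    def unknown_entityref(self, ref):
        self.unknown.append(ref)

    def handle_entityref(self, ref):
        if ref in self.entitydefs:
            self.handle_data(self.entitydefs[ref])
        else:
            self.unknown_entityref(ref)


entities = EntitySketch()
for name in ("lt", "amp", "nbsp"):
    entities.handle_entityref(name)
```

Because the lookup goes through the instance, a subclass (or a single
instance) can supply a larger `entitydefs' without touching the method.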
`handle_comment(comment)'
This method is called when a comment is encountered. The COMMENT
argument is a string containing the text between the `<!--' and
`-->' delimiters, but not the delimiters themselves. For example,
the comment `<!--text-->' will cause this method to be called with
the argument `'text''. The default method does nothing.
`handle_cdata(data)'
This method is called when a CDATA element is encountered. The
DATA argument is a string containing the text between the
`<![CDATA[' and `]]>' delimiters, but not the delimiters
themselves. For example, the entity `<![CDATA[text]]>' will cause
this method to be called with the argument `'text''. The default
method does nothing, and is intended to be overridden.
`handle_proc(name, data)'
This method is called when a processing instruction (PI) is
encountered. The NAME is the PI target, and the DATA argument is
a string containing the text between the PI target and the closing
delimiter, but not the delimiter itself. For example, the
instruction `<?XML text?>' will cause this method to be called
with the arguments `'XML'' and `'text''. The default method does
nothing. Note that if a document starts with `<?xml ...?>',
`handle_xml()' is called to handle it.
`handle_special(data)'
This method is called when a declaration is encountered. The DATA
argument is a string containing the text between the `<!' and `>'
delimiters, but not the delimiters themselves. For example, the
entity `<!ENTITY text>' will cause this method to be called with
the argument `'ENTITY text''. The default method does nothing.
Note that `<!DOCTYPE ...>' is handled separately if it is located
at the start of the document.
`syntax_error(message)'
This method is called when a syntax error is encountered. The
MESSAGE is a description of what was wrong. The default method
raises a `RuntimeError' exception. If this method is overridden,
it is permissible for it to return. This method is only called
when the error can be recovered from. Unrecoverable errors raise
a `RuntimeError' without first calling `syntax_error()'.
`unknown_starttag(tag, attributes)'
This method is called to process an unknown start tag. It is
intended to be overridden by a derived class; the base class
implementation does nothing.
`unknown_endtag(tag)'
This method is called to process an unknown end tag. It is
intended to be overridden by a derived class; the base class
implementation does nothing.
`unknown_charref(ref)'
This method is called to process unresolvable numeric character
references. It is intended to be overridden by a derived class;
the base class implementation does nothing.
`unknown_entityref(ref)'
This method is called to process an unknown entity reference. It
is intended to be overridden by a derived class; the base class
implementation does nothing.
See also:
The Python XML Topic Guide provides a great deal of information on
using XML from Python and links to other sources of information on XML.
It's located on the Web at `http://www.python.org/topics/xml/'.
The Python XML Special Interest Group is developing substantial
support for processing XML from Python. See
`http://www.python.org/sigs/xml-sig/' for more information.
* Menu:
* XML Namespaces::
File: python-lib.info, Node: XML Namespaces, Prev: xmllib, Up: xmllib
XML Namespaces
--------------
This module has support for XML namespaces as defined in the XML
Namespaces proposed recommendation.
Tag and attribute names that are defined in an XML namespace are
handled as if the name of the tag or element consisted of the namespace
(i.e. the URL that defines the namespace) followed by a space and the
name of the tag or attribute. For instance, an `html' tag that
belongs to the namespace `http://www.w3.org/TR/REC-html40' is treated
as if the tag name was `'http://www.w3.org/TR/REC-html40 html'', and
an `a' tag with a `src' attribute from the same namespace, inside the
above mentioned element, is treated as if the tag name were
`'http://www.w3.org/TR/REC-html40 a'' and the attribute name as if it
were `'http://www.w3.org/TR/REC-html40 src''.
An older draft of the XML Namespaces proposal is also recognized, but
triggers a warning.
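The naming rule (namespace URL, a space, then the local name) can be
sketched as a small function. Both the helper name `qualify()' and the
use of a prefix-to-URL dictionary are assumptions made for the
illustration; the module's own bookkeeping of namespace declarations is
not shown here.

```python
def qualify(name, namespaces, default=None):
    """Return 'URL local' for a name in a known namespace, else NAME unchanged.

    NAMESPACES maps prefixes to namespace URLs; DEFAULT is the URL that
    applies to names written without a prefix, if any.
    """
    if ":" in name:
        prefix, local = name.split(":", 1)
        url = namespaces.get(prefix)
    else:
        local = name
        url = default
    if url is None:
        return name
    return url + " " + local


HTML40 = "http://www.w3.org/TR/REC-html40"
tag_name = qualify("html:a", {"html": HTML40})
```

A handler for such an element is then registered in `elements' under
the combined name rather than under the bare tag name.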
File: python-lib.info, Node: formatter, Next: rfc822, Prev: xmllib, Up: Internet Data Handling
Generic output formatting
=========================
Generic output formatter and device interface.
This module supports two interface definitions, each with multiple
implementations. The *formatter* interface is used by the `HTMLParser'
class of the `htmllib' module, and the *writer* interface is required
by the formatter interface.
Formatter objects transform an abstract flow of formatting events
into specific output events on writer objects. Formatters manage
several stack structures to allow various properties of a writer object
to be changed and restored; writers need not be able to handle relative
changes nor any sort of "change back" operation. Specific writer
properties which may be controlled via formatter objects are horizontal
alignment, font, and left margin indentations. A mechanism is provided
which supports providing arbitrary, non-exclusive style settings to a
writer as well. Additional interfaces facilitate formatting events
which are not reversible, such as paragraph separation.
Writer objects encapsulate device interfaces. Abstract devices, such
as file formats, are supported as well as physical devices. The
provided implementations all work with abstract devices. The interface
makes available mechanisms for setting the properties which formatter
objects manage and inserting data into the output.
* Menu:
* Formatter Interface::
* Formatter Implementations::
* Writer Interface::
* Writer Implementations::
File: python-lib.info, Node: Formatter Interface, Next: Formatter Implementations, Prev: formatter, Up: formatter
The Formatter Interface
-----------------------
Interfaces to create formatters are dependent on the specific
formatter class being instantiated. The interfaces described below are
the required interfaces which all formatters must support once
initialized.
One data element is defined at the module level:
`AS_IS'
Value which can be used in the font specification passed to the
`push_font()' method described below, or as the new value to any
other `push_PROPERTY()' method. Pushing the `AS_IS' value allows
the corresponding `pop_PROPERTY()' method to be called without
having to track whether the property was changed.
The following attributes are defined for formatter instance objects:
`writer'
The writer instance with which the formatter interacts.
`end_paragraph(blanklines)'
Close any open paragraphs and insert at least BLANKLINES before
the next paragraph.
`add_line_break()'
Add a hard line break if one does not already exist. This does not
break the logical paragraph.
`add_hor_rule(*args, **kw)'
Insert a horizontal rule in the output. A hard break is inserted
if there is data in the current paragraph, but the logical
paragraph is not broken. The arguments and keywords are passed on
to the writer's `send_hor_rule()' method.
`add_flowing_data(data)'
Provide data which should be formatted with collapsed whitespace.
Whitespace from preceding and successive calls to
`add_flowing_data()' is considered as well when the whitespace
collapse is performed. The data which is passed to this method is
expected to be word-wrapped by the output device. Note that any
word-wrapping still must be performed by the writer object due to
the need to rely on device and font information.
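How whitespace can be collapsed across successive calls is easiest to
see in code. The class `FlowCollapser' below is a simplified,
hypothetical model: it keeps one "soft space" flag between calls and
appends the resulting chunks to a list where a real formatter would
call the writer's `send_flowing_data()'.

```python
class FlowCollapser:
    def __init__(self):
        self.out = []            # chunks that would be sent to the writer
        self.softspace = False   # a space is pending from an earlier call
        self.at_start = True     # nothing has been written yet

    def add_flowing_data(self, data):
        if not data:
            return
        leading = data[:1].isspace()
        trailing = data[-1:].isspace()
        text = " ".join(data.split())      # collapse internal whitespace
        if not text:
            # Whitespace only: remember at most one pending space.
            if not self.at_start:
                self.softspace = True
            return
        if (leading or self.softspace) and not self.at_start:
            text = " " + text              # emit the single separating space
        self.out.append(text)
        self.at_start = False
        self.softspace = trailing


flow = FlowCollapser()
for chunk in ("Hello  ", "   world\n", "\t", "again", "!"):
    flow.add_flowing_data(chunk)
```

Trailing whitespace is held back rather than written immediately, which
is why a method such as `flush_softspace()' is needed before touching
the writer directly.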
`add_literal_data(data)'
Provide data which should be passed to the writer unchanged.
Whitespace, including newline and tab characters, is considered
legal in the value of DATA.
`add_label_data(format, counter)'
Insert a label which should be placed to the left of the current
left margin. This should be used for constructing bulleted or
numbered lists. If the FORMAT value is a string, it is
interpreted as a format specification for COUNTER, which should be
an integer. The result of this formatting becomes the value of
the label; if FORMAT is not a string it is used as the label value
directly. The label value is passed as the only argument to the
writer's `send_label_data()' method. Interpretation of non-string
label values is dependent on the associated writer.
Format specifications are strings which, in combination with a
counter value, are used to compute label values. Each character
in the format string is copied to the label value, with some
characters recognized to indicate a transform on the counter
value. Specifically, the character `1' represents the counter
value formatted as an Arabic number, the characters `A' and `a'
represent alphabetic representations of the counter value in upper
and lower case, respectively, and `I' and `i' represent the
counter value in Roman numerals, in upper and lower case. Note
that the alphabetic and Roman transforms require that the counter
value be greater than zero.
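The format specification just described can be sketched as a
character-by-character translation. The function names below
(`format_counter()', `to_letters()', `to_roman()') are hypothetical,
and the choice to emit nothing for a non-positive counter in the
alphabetic and Roman cases is an assumption of this sketch.

```python
def to_letters(n, upper):
    """1 -> a, 26 -> z, 27 -> aa, and so on."""
    base = ord("A") if upper else ord("a")
    label = ""
    while n > 0:
        n, digit = divmod(n - 1, 26)
        label = chr(base + digit) + label
    return label


def to_roman(n, upper):
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    label = ""
    for value, numeral in pairs:
        while n >= value:
            label += numeral
            n -= value
    return label.upper() if upper else label


def format_counter(format, counter):
    """Build a label from FORMAT, transforming COUNTER where requested."""
    label = ""
    for c in format:
        if c == "1":
            label += "%d" % counter
        elif c in "aA":
            if counter > 0:
                label += to_letters(counter, c == "A")
        elif c in "iI":
            if counter > 0:
                label += to_roman(counter, c == "I")
        else:
            label += c          # every other character is copied as is
    return label
```

Note that every `a', `A', `i', `I' and `1' in the format is
transformed, so literal text in a format string has to avoid those
characters.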
`flush_softspace()'
Send any pending whitespace buffered from a previous call to
`add_flowing_data()' to the associated writer object. This should
be called before any direct manipulation of the writer object.
`push_alignment(align)'
Push a new alignment setting onto the alignment stack. This may be
`AS_IS' if no change is desired. If the alignment value is
changed from the previous setting, the writer's `new_alignment()'
method is called with the ALIGN value.
`pop_alignment()'
Restore the previous alignment.
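The push/pop discipline with `AS_IS' can be modelled with an ordinary
list. In this sketch the class `AlignmentStack' is hypothetical, the
sentinel is a private object created for the example, and the `calls'
list records what would be passed to the writer's `new_alignment()'
method; notifying the writer only on an actual change is a
simplification.

```python
AS_IS = object()    # sentinel used by this sketch only


class AlignmentStack:
    def __init__(self):
        self.stack = []
        self.current = None    # None: the writer's preferred alignment
        self.calls = []        # values "sent" to writer.new_alignment()

    def push_alignment(self, align):
        if align is not AS_IS and align != self.current:
            self.current = align
            self.calls.append(align)
        # Push even when nothing changed, so every push has a matching pop.
        self.stack.append(self.current)

    def pop_alignment(self):
        if self.stack:
            self.stack.pop()
        previous = self.stack[-1] if self.stack else None
        if previous != self.current:
            self.current = previous
            self.calls.append(previous)


aligns = AlignmentStack()
aligns.push_alignment("center")
aligns.push_alignment(AS_IS)      # no change, but still balanced by a pop
aligns.pop_alignment()
after_first_pop = aligns.current
aligns.pop_alignment()
```

Because the unchanged value is pushed too, a tag handler can always
call the pop method at its end tag without remembering whether its
start tag altered anything.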
`push_font((size, italic, bold, teletype))'
Change some or all font properties of the writer object.
Properties which are not set to `AS_IS' are set to the values
passed in while others are maintained at their current settings.
The writer's `new_font()' method is called with the fully resolved
font specification.
`pop_font()'
Restore the previous font.
`push_margin(margin)'
Increase the number of left margin indentations by one, associating
the logical tag MARGIN with the new indentation. The initial
margin level is `0'. Changed values of the logical tag must be
true values; false values other than `AS_IS' are not sufficient to
change the margin.
`pop_margin()'
Restore the previous margin.
`push_style(*styles)'
Push any number of arbitrary style specifications. All styles are
pushed onto the styles stack in order. A tuple representing the
entire stack, including `AS_IS' values, is passed to the writer's
`new_styles()' method.
`pop_style([n = 1])'
Pop the last N style specifications passed to `push_style()'. A
tuple representing the revised stack, including `AS_IS' values, is
passed to the writer's `new_styles()' method.
`set_spacing(spacing)'
Set the spacing style for the writer.
`assert_line_data([flag = 1])'
Inform the formatter that data has been added to the current
paragraph out-of-band. This should be used when the writer has
been manipulated directly. The optional FLAG argument can be set
to false if the writer manipulations produced a hard line break at
the end of the output.
File: python-lib.info, Node: Formatter Implementations, Next: Writer Interface, Prev: Formatter Interface, Up: formatter
Formatter Implementations
-------------------------
Two implementations of formatter objects are provided by this module.
Most applications may use one of these classes without modification or
subclassing.
`NullFormatter([writer])'
A formatter which does nothing. If WRITER is omitted, a
`NullWriter' instance is created. No methods of the writer are
called by `NullFormatter' instances. Implementations should
inherit from this class if they implement a writer interface but
do not need to inherit any implementation.
`AbstractFormatter(writer)'
The standard formatter. This implementation has demonstrated wide
applicability to many writers, and may be used directly in most
circumstances. It has been used to implement a full-featured
world-wide web browser.
File: python-lib.info, Node: Writer Interface, Next: Writer Implementations, Prev: Formatter Implementations, Up: formatter
The Writer Interface
--------------------
Interfaces to create writers are dependent on the specific writer
class being instantiated. The interfaces described below are the
required interfaces which all writers must support once initialized.
Note that while most applications can use the `AbstractFormatter' class
as a formatter, the writer must typically be provided by the
application.
`flush()'
Flush any buffered output or device control events.
`new_alignment(align)'
Set the alignment style. The ALIGN value can be any object, but
by convention is a string or `None', where `None' indicates that
the writer's "preferred" alignment should be used. Conventional
ALIGN values are `'left'', `'center'', `'right'', and `'justify''.
`new_font(font)'
Set the font style. The value of FONT will be `None', indicating
that the device's default font should be used, or a tuple of the
form `(SIZE, ITALIC, BOLD, TELETYPE)'. SIZE will be a string
indicating the size of font that should be used; specific strings
and their interpretation must be defined by the application. The
ITALIC, BOLD, and TELETYPE values are boolean indicators
specifying which of those font attributes should be used.
`new_margin(margin, level)'
Set the margin level to the integer LEVEL and the logical tag to
MARGIN. Interpretation of the logical tag is at the writer's
discretion; the only restriction on the value of the logical tag
is that it not be a false value for non-zero values of LEVEL.
`new_spacing(spacing)'
Set the spacing style to SPACING.
`new_styles(styles)'
Set additional styles. The STYLES value is a tuple of arbitrary
values; the value `AS_IS' should be ignored. The STYLES tuple may
be interpreted either as a set or as a stack depending on the
requirements of the application and writer implementation.
`send_line_break()'
Break the current line.
`send_paragraph(blankline)'
Produce a paragraph separation of at least BLANKLINE blank lines,
or the equivalent. The BLANKLINE value will be an integer. Note
that the implementation will receive a call to `send_line_break()'
before this call if a line break is needed; this method should not
include ending the last line of the paragraph. It is only
responsible for vertical spacing between paragraphs.
`send_hor_rule(*args, **kw)'
Display a horizontal rule on the output device. The arguments to
this method are entirely application- and writer-specific, and
should be interpreted with care. The method implementation may
assume that a line break has already been issued via
`send_line_break()'.
`send_flowing_data(data)'
Output character data which may be word-wrapped and re-flowed as
needed. Within any sequence of calls to this method, the writer
may assume that spans of multiple whitespace characters have been
collapsed to single space characters.
`send_literal_data(data)'
Output character data which has already been formatted for
display. Generally, this should be interpreted to mean that line
breaks indicated by newline characters should be preserved and no
new line breaks should be introduced. The data may contain
embedded newline and tab characters, unlike data provided to the
`send_flowing_data()' interface.
`send_label_data(data)'
Set DATA to the left of the current left margin, if possible. The
value of DATA is not restricted; treatment of non-string values is
entirely application- and writer-dependent. This method will only
be called at the beginning of a line.
File: python-lib.info, Node: Writer Implementations, Prev: Writer Interface, Up: formatter
Writer Implementations
----------------------
Three implementations of the writer object interface are provided as
examples by this module. Most applications will need to derive new
writer classes from the `NullWriter' class.
`NullWriter()'
A writer which only provides the interface definition; no actions
are taken on any methods. This should be the base class for all
writers which do not need to inherit any implementation methods.
`AbstractWriter()'
A writer which can be used in debugging formatters, but not much
else. Each method simply announces itself by printing its name and
arguments on standard output.
`DumbWriter([file[, maxcol = 72]])'
Simple writer class which writes output on the file object passed
in as FILE or, if FILE is omitted, on standard output. The output
is simply word-wrapped to the number of columns specified by
MAXCOL. This class is suitable for reflowing a sequence of
paragraphs.
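A writer of the simple word-wrapping kind can be sketched in a few
methods. The class `WrapWriter' below is hypothetical and implements
only three methods of the writer interface; it is not the source of
`DumbWriter', and it always separates words with one space, which
ignores the finer whitespace rules a formatter would apply.

```python
import io


class WrapWriter:
    """Hypothetical writer that wraps flowing data at MAXCOL columns."""

    def __init__(self, file, maxcol=72):
        self.file = file
        self.maxcol = maxcol
        self.col = 0             # current output column

    def send_line_break(self):
        self.file.write("\n")
        self.col = 0

    def send_paragraph(self, blankline):
        # Only the vertical space; the last line was already ended.
        self.file.write("\n" * blankline)
        self.col = 0

    def send_flowing_data(self, data):
        for word in data.split():
            if self.col > 0:
                if self.col + 1 + len(word) > self.maxcol:
                    self.file.write("\n")      # word does not fit: wrap
                    self.col = 0
                else:
                    self.file.write(" ")
                    self.col += 1
            self.file.write(word)
            self.col += len(word)


buffer = io.StringIO()
writer = WrapWriter(buffer, maxcol=10)
writer.send_flowing_data("aaa bbb ccc ddd")
writer.send_line_break()
writer.send_paragraph(1)
writer.send_flowing_data("eee")
```

A real application writer would add the remaining interface methods,
typically by deriving from a do-nothing base class.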
File: python-lib.info, Node: rfc822, Next: mimetools, Prev: formatter, Up: Internet Data Handling
Parse RFC 822 mail headers
==========================
Parse RFC 822 style mail headers.
This module defines a class, `Message', which represents a
collection of "email headers" as defined by the Internet standard RFC
822. It is used in various contexts, usually to read such headers from
a file. This module also defines a helper class `AddressList' for
parsing RFC 822 addresses.
Note that there's a separate module to read UNIX, MH, and MMDF style
mailbox files: `mailbox'.
`Message(file[, seekable])'
A `Message' instance is instantiated with an input object as
parameter. Message relies only on the input object having a
`readline()' method; in particular, ordinary file objects qualify.
Instantiation reads headers from the input object up to a
delimiter line (normally a blank line) and stores them in the
instance.
This class can work with any input object that supports a
`readline()' method. If the input object has seek and tell
capability, the `rewindbody()' method will work; also, illegal
lines will be pushed back onto the input stream. If the input
object lacks seek but has an `unread()' method that can push back a
line of input, `Message' will use that to push back illegal lines.
Thus this class can be used to parse messages coming from a
buffered stream.
The optional SEEKABLE argument is provided as a workaround for
certain stdio libraries in which `tell()' discards buffered data
before discovering that the `lseek()' system call doesn't work.
For maximum portability, you should set the seekable argument to
zero to prevent that initial `tell()' when passing in an
unseekable object such as a file object created from a socket
object.
Input lines as read from the file may either be terminated by
CR-LF or by a single linefeed; a terminating CR-LF is replaced by
a single linefeed before the line is stored.
All header matching is done independent of upper or lower case;
e.g. `M['From']', `M['from']' and `M['FROM']' all yield the same
result.
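The reading rules above (stop at a blank line, normalize CR-LF, match
names without regard to case) can be illustrated without the class
itself. The function `read_headers()' is a hypothetical, much reduced
sketch: it keeps only the last value per name and, unlike `Message',
does not push an illegal line back onto the input.

```python
import io


def read_headers(fp):
    """Read headers from FP up to the blank line; return a dict keyed by
    lower-cased field name."""
    headers = {}
    last = None
    while True:
        line = fp.readline()
        if not line or line in ("\n", "\r\n"):
            break                          # end of input or delimiter line
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"        # CR-LF becomes a single linefeed
        if line[0] in " \t" and last is not None:
            # Continuation line: belongs to the previous header.
            headers[last] = headers[last] + " " + line.strip()
            continue
        name, colon, value = line.partition(":")
        if not colon:
            break                          # not a header line: stop parsing
        last = name.lower()
        headers[last] = value.strip()
    return headers


source = io.StringIO(
    "From: jack@cwi.nl\r\nSubject: hello\n world\n\nbody\n", newline="")
parsed_headers = read_headers(source)
body = source.read()
```

After the headers are consumed the input object is positioned at the
start of the body, which is what makes reading the message content
through the same object possible.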
`AddressList(field)'
You may instantiate the `AddressList' helper class using a single
string parameter, a comma-separated list of RFC 822 addresses to be
parsed. (The parameter `None' yields an empty list.)
`parsedate(date)'
Attempts to parse a date according to the rules in RFC 822.
However, some mailers don't follow that format as specified, so
`parsedate()' tries to guess correctly in such cases. DATE is a
string containing an RFC 822 date, such as `'Mon, 20 Nov 1995
19:12:08 -0500''. If it succeeds in parsing the date,
`parsedate()' returns a 9-tuple that can be passed directly to
`time.mktime()'; otherwise `None' will be returned.
`parsedate_tz(date)'
Performs the same function as `parsedate()', but returns either
`None' or a 10-tuple; the first 9 elements make up a tuple that
can be passed directly to `time.mktime()', and the tenth is the
offset of the date's timezone from UTC (which is the official term
for Greenwich Mean Time). (Note that the sign of the timezone
offset is the opposite of the sign of the `time.timezone' variable
for the same timezone; the latter variable follows the POSIX
standard while this module follows RFC 822.) If the input string
has no timezone, the last element of the tuple returned is `None'.
`mktime_tz(tuple)'
Turn a 10-tuple as returned by `parsedate_tz()' into a UTC
timestamp. If the timezone item in the tuple is `None', assume
local time. Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight savings
time switch dates. Not enough to worry about for common use.
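The relationship between the 10-tuple and a UTC timestamp is shown
below. Note the substitution: the example uses `parsedate_tz()' and
`mktime_tz()' from the `email.utils' module of later Python versions,
which carry the same names, instead of importing `rfc822'. Treat it as
an illustration of the arithmetic rather than of this module's code.

```python
import calendar
from email.utils import mktime_tz, parsedate_tz

parsed = parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")

# The first six fields are the date and time as written in the header.
fields = parsed[:6]

# The tenth field is the offset from UTC in seconds; it is negative
# west of Greenwich, the opposite sign convention to time.timezone.
offset = parsed[9]

# Reading the fields as if they were UTC and then subtracting the
# offset yields the UTC timestamp.
utc_timestamp = calendar.timegm(fields + (0, 0, 0)) - offset
```

The manual subtraction makes the sign convention visible: a header
written five hours behind UTC has a negative offset, so the timestamp
ends up five hours later than the naive reading.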
* Menu:
* Message Objects::
* AddressList Objects::
File: python-lib.info, Node: Message Objects, Next: AddressList Objects, Prev: rfc822, Up: rfc822
Message Objects
---------------
A `Message' instance has the following methods:
`rewindbody()'
Seek to the start of the message body. This only works if the file
object is seekable.
`isheader(line)'
Returns a line's canonicalized fieldname (the dictionary key that
will be used to index it) if the line is a legal RFC 822 header;
otherwise returns None (implying that parsing should stop here and
the line be pushed back on the input stream). It is sometimes
useful to override this method in a subclass.
`islast(line)'
Return true if the given line is a delimiter on which Message
should stop. The delimiter line is consumed, and the file
object's read location positioned immediately after it. By
default this method just checks that the line is blank, but you
can override it in a subclass.
`iscomment(line)'
Return true if the given line should be ignored entirely, just
skipped. By default this is a stub that always returns false, but
you can override it in a subclass.
`getallmatchingheaders(name)'
Return a list of lines consisting of all headers matching NAME, if
any. Each physical line, whether it is a continuation line or
not, is a separate list item. Return the empty list if no header
matches NAME.
`getfirstmatchingheader(name)'
Return a list of lines comprising the first header matching NAME,
and its continuation line(s), if any. Return `None' if there is
no header matching NAME.
`getrawheader(name)'
Return a single string consisting of the text after the colon in
the first header matching NAME. This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present. Return `None' if
there is no header matching NAME.
`getheader(name[, default])'
Like `getrawheader(NAME)', but strip leading and trailing
whitespace. Internal whitespace is not stripped. The optional
DEFAULT argument can be used to specify a different default to be
returned when there is no header matching NAME.
`get(name[, default])'
An alias for `getheader()', to make the interface more compatible
with regular dictionaries.
`getaddr(name)'
Return a pair `(FULL NAME, EMAIL ADDRESS)' parsed from the string
returned by `getheader(NAME)'. If no header matching NAME exists,
return `(None, None)'; otherwise both the full name and the
address are (possibly empty) strings.
Example: If M's first `From' header contains the string
`'jack@cwi.nl (Jack Jansen)'', then `m.getaddr('From')' will yield
the pair `('Jack Jansen', 'jack@cwi.nl')'. If the header contained
`'Jack Jansen <jack@cwi.nl>'' instead, it would yield the exact
same result.
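The two address spellings from the example can be checked with
`parseaddr()' from the `email.utils' module of later Python versions.
This is a substitution made for the illustration; it is not the
`getaddr()' method itself, which additionally fetches the header by
name.

```python
from email.utils import parseaddr

# Address followed by the full name in a parenthesized comment.
comment_form = parseaddr("jack@cwi.nl (Jack Jansen)")

# Full name followed by the address in angle brackets.
bracket_form = parseaddr("Jack Jansen <jack@cwi.nl>")
```

Both calls return a `(FULL NAME, EMAIL ADDRESS)' pair, so code that
consumes the result does not need to know which spelling the sender
used.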
`getaddrlist(name)'
This is similar to `getaddr(NAME)', but parses a header containing
a list of email addresses (e.g. a `To' header) and returns a list
of `(FULL NAME, EMAIL ADDRESS)' pairs (even if there was only one
address in the header). If there is no header matching NAME,
return an empty list.
If multiple headers exist that match the named header (e.g. if
there are several `Cc' headers), all are parsed for addresses. Any
continuation lines the named headers contain are also parsed.
`getdate(name)'
Retrieve a header using `getheader()' and parse it into a 9-tuple
compatible with `time.mktime()'. If there is no header matching
NAME, or it is unparsable, return `None'.
Date parsing appears to be a black art, and not all mailers adhere
to the standard. While it has been tested and found correct on a
large collection of email from many sources, it is still possible
that this function may occasionally yield an incorrect result.
`getdate_tz(name)'
Retrieve a header using `getheader()' and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
`time.mktime()', and the 10th is a number giving the offset of the
date's timezone from UTC. Similarly to `getdate()', if there is
no header matching NAME, or it is unparsable, return `None'.
`Message' instances also support a read-only mapping interface. In
particular: `M[name]' is like `M.getheader(name)' but raises `KeyError'
if there is no matching header; and `len(M)', `M.has_key(name)',
`M.keys()', `M.values()' and `M.items()' act as expected (and
consistently).
Finally, `Message' instances have two public instance variables:
`headers'
A list containing the entire set of header lines, in the order in
which they were read (except that setitem calls may disturb this
order). Each line contains a trailing newline. The blank line
terminating the headers is not contained in the list.
`fp'
The file or file-like object passed at instantiation time. This
can be used to read the message content.
File: python-lib.info, Node: AddressList Objects, Prev: Message Objects, Up: rfc822
AddressList Objects
-------------------
An `AddressList' instance has the following methods:
`__len__()'
Return the number of addresses in the address list.
`__str__()'
Return a canonicalized string representation of the address list.
Addresses are rendered in `"name" <host@domain>' form,
comma-separated.
`__add__(alist)'
Return an `AddressList' instance that contains all addresses in
both `AddressList' operands, with duplicates removed (set union).
`__sub__(alist)'
Return an `AddressList' instance that contains every address in the
left-hand `AddressList' operand that is not present in the
right-hand address operand (set difference).
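The set-like operators can be modelled on a plain list of pairs. The
class `AddressSketch' is hypothetical and takes already parsed
`(name, address)' pairs rather than a header string; it shows only the
union and difference semantics described above.

```python
class AddressSketch:
    def __init__(self, pairs):
        self.addresslist = list(pairs)

    def __len__(self):
        return len(self.addresslist)

    def __add__(self, other):
        # Set union: keep the left operand's order, skip duplicates.
        merged = self.addresslist[:]
        for pair in other.addresslist:
            if pair not in merged:
                merged.append(pair)
        return AddressSketch(merged)

    def __sub__(self, other):
        # Set difference: drop every pair found in the right operand.
        kept = [pair for pair in self.addresslist
                if pair not in other.addresslist]
        return AddressSketch(kept)


to_list = AddressSketch([("Jack Jansen", "jack@cwi.nl"),
                         ("", "guido@cwi.nl")])
cc_list = AddressSketch([("", "guido@cwi.nl"),
                         ("", "sjoerd@cwi.nl")])
everyone = to_list + cc_list
only_to = to_list - cc_list
```

Two entries count as the same address here only if both the name and
the address parts match, which is an assumption of the sketch.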
Finally, `AddressList' instances have one public instance variable:
`addresslist'
A list of tuple string pairs, one per address. In each member, the
first is the canonicalized name part of the address, the second is
the route-address (@-separated host-domain pair).
File: python-lib.info, Node: mimetools, Next: MimeWriter, Prev: rfc822, Up: Internet Data Handling
Tools for parsing MIME messages
===============================
Tools for parsing MIME style message bodies.
This module defines a subclass of the `rfc822.Message' class and a
number of utility functions that are useful for the manipulation of
MIME multipart or encoded messages.
It defines the following items:
`Message(fp[, seekable])'
Return a new instance of the `Message' class. This is a subclass
of the `rfc822.Message' class, with some additional methods (see
below). The SEEKABLE argument has the same meaning as for
`rfc822.Message'.
`choose_boundary()'
Return a unique string that has a high likelihood of being usable
as a part boundary. The string has the form
`'HOSTIPADDR.UID.PID.TIMESTAMP.RANDOM''.
`decode(input, output, encoding)'
Read data encoded using the allowed MIME ENCODING from open file
object INPUT and write the decoded data to open file object
OUTPUT. Valid values for ENCODING include `'base64'',
`'quoted-printable'' and `'uuencode''.
`encode(input, output, encoding)'
Read data from open file object INPUT and write it encoded using
the allowed MIME ENCODING to open file object OUTPUT. Valid
values for ENCODING are the same as for `decode()'.
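Functions with the `decode()' and `encode()' calling convention can be
built on the standard `base64' and `quopri' modules, which also work on
open file objects. The sketch below is an assumption about how such a
dispatch might look, not the module's source; it handles only two of
the encodings and uses in-memory binary files for the demonstration.

```python
import base64
import io
import quopri


def encode(input, output, encoding):
    """Read raw bytes from INPUT and write them encoded to OUTPUT."""
    if encoding == "base64":
        base64.encode(input, output)
    elif encoding == "quoted-printable":
        quopri.encode(input, output, False)   # False: do not quote tabs
    else:
        raise ValueError("unsupported encoding: %r" % (encoding,))


def decode(input, output, encoding):
    """Read encoded bytes from INPUT and write the raw data to OUTPUT."""
    if encoding == "base64":
        base64.decode(input, output)
    elif encoding == "quoted-printable":
        quopri.decode(input, output)
    else:
        raise ValueError("unsupported encoding: %r" % (encoding,))


def roundtrip(raw, encoding):
    """Encode RAW, decode the result again, and return the bytes."""
    encoded = io.BytesIO()
    encode(io.BytesIO(raw), encoded, encoding)
    decoded = io.BytesIO()
    decode(io.BytesIO(encoded.getvalue()), decoded, encoding)
    return decoded.getvalue()
```

Working on file objects rather than strings lets a large message part
be converted without holding all of it in memory at once.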
`copyliteral(input, output)'
Read lines until `EOF' from open file INPUT and write them to open
file OUTPUT.
`copybinary(input, output)'
Read blocks until `EOF' from open file INPUT and write them to
open file OUTPUT. The block size is currently fixed at 8192.
* Menu:
* mimetools.Message Methods::