This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: XML/PPMConfig,  Next: XML/PYX,  Prev: XML/PPD,  Up: Module List

PPMConfig file format and XML parsing elements
**********************************************

NAME
====

   XML::PPMConfig - PPMConfig file format and XML parsing elements

SYNOPSIS
========

     use XML::Parser;
     use XML::PPMConfig;

     $p = new XML::Parser( Style => 'Objects', Pkg => 'XML::PPMConfig' );
     ...

DESCRIPTION
===========

   This module provides a set of classes for parsing PPM configuration
files using the XML::Parser module.  All of the elements unique to a PPM
configuration file are derived from `XML::ValidatingElement'. There are
also several classes rebuilt here which are derived from elements in
`XML::PPD' as we can include a PPD file within our own INSTPPD element.

MAJOR ELEMENTS
==============

PPMCONFIG
---------

   Defines a PPM configuration file.  The root of a PPMConfig document is
always a PPMCONFIG element.

PACKAGE
-------

   Child of PPMCONFIG, used to describe a Perl Package which has already
been installed.  Multiple instances are valid.  The PACKAGE element allows
for the following attributes:

NAME
     Name of the package as given in its PPD.

MINOR ELEMENTS
==============

PPMVER
------

   Child of PPMCONFIG, used to state the version of PPM for which this
configuration file is valid.  A single instance should be present.

PLATFORM
--------

   Child of PPMCONFIG, used to specify the platform of the target machine.
A single instance should be present.  The PLATFORM element allows for the
following attributes:

OSVALUE
     Description of the local operating system as defined in the Config.pm
     file under 'osname'.

OSVERSION
     Version of the local operating system.

CPU
     Description of the CPU in the local system.  The following list of
     possible values was taken from the OSD Specification:

          x86 mips alpha ppc sparc 680x0

LANGUAGE
     Description of the language used on the local system as specified by
     the language codes in ISO 639.

REPOSITORY
----------

   Child of PPMCONFIG, used to specify a repository where Perl Packages
can be found.  Multiple instances are valid.  The REPOSITORY element
allows for the following attributes:

NAME
     Name by which the repository will be known (e.g.  "ActiveState").

LOCATION
     A URL or directory where the repository can be found.

USERNAME
     Optional username for a repository requiring an authenticated
     connection.

PASSWORD
     Optional password for a repository requiring an authenticated
     connection.

SUMMARYFILE
     Optional package summary filename.

     If this file exists on the repository, its contents can be retrieved
     using PPM::RepositorySummary().  The contents are not strictly
     enforced by PPM.pm, however ppm.pl expects this to be a file with the
     following format (for display with the 'summary' command):

     Agent [2.91]:   supplies agentspace methods for perl5.
     Apache-OutputChain [0.06]:      chain stacked Perl handlers [etc.]

OPTIONS
-------

   Child of PPMCONFIG, used to specify the current configuration options
for PPM.  A single instance should be present.  The OPTIONS element allows
for the following attributes:

IGNORECASE
     Sets case-insensitive searching.  Can be either '1' or '0'.

CLEAN
     Sets removal of temporary files.  Can be either '1' or '0'.

CONFIRM
     Sets confirmation of all installs/removals/upgrades.  Can be either
     '1' or '0'.

BUILDDIR
     Directory in which packages will be unpacked before their
     installation.

ROOT
     Directory under which packages should be installed on the local
     system.

TRACE
     Level of tracing (0 is no tracing, 4 is max tracing).

TRACEFILE
     File to which trace information will be written.

VERBOSE
     Controls whether query and search results are verbose (1 == verbose,
     0 == no).

PPMPRECIOUS
-----------

   Child of PPMCONFIG, used to specify the modules which PPM itself is
dependent upon.  A single instance should be present.

LOCATION
--------

   Child of PACKAGE, used to specify locations at which to search for
updated versions of the PPD file for this package.  Its value can be
either a directory or an Internet address.  A single instance should be
present.

INSTDATE
--------

   Child of PACKAGE, used to specify the date on which the Perl Package was
installed.  A single instance should be present.

INSTROOT
--------

   Child of PACKAGE, used to specify the root directory that the Perl
Package was installed into.  A single instance should be present.

INSTPACKLIST
------------

   Child of PACKAGE, used to specify a reference to the packlist for this
Perl Package; a file containing a list of all of the files which were
installed.  A single instance should be present.

INSTPPD
-------

   Child of PACKAGE, used to hold a copy of the PPD from which Perl
Packages were installed.  Multiple instances are valid.

DOCUMENT TYPE DEFINITION
========================

   The DTD for PPMConfig documents is available from the ActiveState
website and the latest version can be found at:
http://www.ActiveState.com/PPM/DTD/ppmconfig.dtd

   This revision of the `XML::PPMConfig' module implements the following
DTD:

     <!ELEMENT PPMCONFIG (PPMVER | PLATFORM | REPOSITORY | OPTIONS |
                          PPMPRECIOUS | PACKAGE)*>

     <!ELEMENT PPMVER   (#PCDATA)>

     <!ELEMENT PLATFORM  EMPTY>
     <!ATTLIST PLATFORM  OSVALUE     CDATA   #REQUIRED
                         OSVERSION   CDATA   #REQUIRED
                         CPU         CDATA   #REQUIRED
                         LANGUAGE    CDATA   #IMPLIED>

     <!ELEMENT REPOSITORY    EMPTY>
     <!ATTLIST REPOSITORY    NAME     CDATA  #REQUIRED
                             LOCATION CDATA  #REQUIRED
                             USERNAME CDATA  #IMPLIED
                             PASSWORD CDATA  #IMPLIED
                             SUMMARYFILE CDATA #IMPLIED>

     <!ELEMENT OPTIONS   EMPTY>
     <!ATTLIST OPTIONS   IGNORECASE      CDATA   #REQUIRED
                         CLEAN           CDATA   #REQUIRED
                         CONFIRM         CDATA   #REQUIRED
                         FORCEINSTALL    CDATA   #REQUIRED
                         ROOT            CDATA   #REQUIRED
                         BUILDDIR        CDATA   #REQUIRED
                         MORE            CDATA   #REQUIRED
                         TRACE           CDATA   #IMPLIED
                         TRACEFILE       CDATA   #IMPLIED>

     <!ELEMENT PPMPRECIOUS (#PCDATA)>

     <!ELEMENT PACKAGE   (LOCATION | INSTDATE | INSTROOT | INSTPACKLIST |
                          INSTPPD)*>
     <!ATTLIST PACKAGE   NAME    CDATA   #REQUIRED>

     <!ELEMENT LOCATION  (#PCDATA)>

     <!ELEMENT INSTDATE  (#PCDATA)>

     <!ELEMENT INSTROOT  (#PCDATA)>

     <!ELEMENT INSTPACKLIST (#PCDATA)>

     <!ELEMENT INSTPPD   (#PCDATA)>

SAMPLE PPMConfig FILE
=====================

   The following is a sample PPMConfig file.  Note that this may not be a
current description of this module and is for sample purposes only.

     <PPMCONFIG>
         <PPMVER>1,0,0,0</PPMVER>
         <PLATFORM CPU="x86" OSVALUE="MSWin32" OSVERSION="4,0,0,0" />
         <OPTIONS BUILDDIR="/tmp" CLEAN="1" CONFIRM="1" FORCEINSTALL="1"
                  IGNORECASE="0" MORE="0" ROOT="/usr/local" TRACE="0" TRACEFILE="" />
         <REPOSITORY LOCATION="http://www.ActiveState.com/packages"
                     NAME="ActiveState Package Repository" SUMMARYFILE="package.lst" />
         <PPMPRECIOUS>PPM;libnet;Archive-Tar;Compress-Zlib;libwww-perl</PPMPRECIOUS>
         <PACKAGE NAME="AtExit">
             <LOCATION>g:/packages</LOCATION>
             <INSTPACKLIST>c:/perllib/lib/site/MSWin32-x86/auto/AtExit/.packlist</INSTPACKLIST>
             <INSTROOT>c:/perllib</INSTROOT>
             <INSTDATE>Sun Mar  8 02:56:31 1998</INSTDATE>
             <INSTPPD>
                 <SOFTPKG NAME="AtExit" VERSION="1,02,0,0">
                     <TITLE>AtExit</TITLE>
                     <ABSTRACT>Register a subroutine to be invoked at program-exit time.</ABSTRACT>
                     <AUTHOR>Brad Appleton (Brad_Appleton-GBDA001@email.mot.com)</AUTHOR>
                     <IMPLEMENTATION>
                         <CODEBASE HREF="x86/AtExit.tar.gz" />
                     </IMPLEMENTATION>
                 </SOFTPKG>
             </INSTPPD>
         </PACKAGE>
     </PPMCONFIG>
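
   As a rough illustration of the structure above, the following sketch
reads the REPOSITORY and PACKAGE entries from a PPMConfig fragment using
plain XML::Parser handlers rather than the XML::PPMConfig classes.  The
inline document and the variable names are assumptions made for this
example only.

```perl
use strict;
use warnings;
use XML::Parser;

# A cut-down PPMConfig document, modelled on the sample above.
my $config = <<'XML';
<PPMCONFIG>
    <PPMVER>1,0,0,0</PPMVER>
    <REPOSITORY LOCATION="http://www.ActiveState.com/packages"
                NAME="ActiveState Package Repository" />
    <PACKAGE NAME="AtExit">
        <INSTROOT>c:/perllib</INSTROOT>
    </PACKAGE>
</PPMCONFIG>
XML

my (%repository, @packages);

my $parser = XML::Parser->new(
    Handlers => {
        # Start handlers receive the Expat object, the element name,
        # and then the attributes as name/value pairs.
        Start => sub {
            my ($expat, $element, %attr) = @_;
            $repository{ $attr{NAME} } = $attr{LOCATION}
                if $element eq 'REPOSITORY';
            push @packages, $attr{NAME} if $element eq 'PACKAGE';
        },
    },
);
$parser->parse($config);

print "$_ => $repository{$_}\n" for sort keys %repository;
print "installed: @packages\n";
```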

KNOWN BUGS/ISSUES
=================

   Elements which are required to be empty (e.g. REPOSITORY) are not
enforced as such.

   The notes above stating that "a single instance" or "multiple
instances" of an element are valid are not enforced; they are primarily a
guideline for generating your own PPD files.

   Currently, this module creates new classes within its own namespace
for all of the PPD elements which can be contained within the INSTPPD
element.  A suitable method for importing the entire XML::PPD:: namespace
should be found in order to make this cleaner.

AUTHORS
=======

   Graham TerMarsch <grahamt@ActiveState.com>

   Murray Nesbitt <murrayn@ActiveState.com>

   Dick Hardt <dick_hardt@ActiveState.com>

HISTORY
=======

   v0.1 - Initial release

SEE ALSO
========

   `XML::ValidatingElement' in this node, *Note XML/Parser: XML/Parser,,
*Note XML/PPD: XML/PPD, .


File: pm.info,  Node: XML/PYX,  Next: XML/Parser,  Prev: XML/PPMConfig,  Up: Module List

XML to PYX generator
********************

NAME
====

   XML::PYX - XML to PYX generator

SYNOPSIS
========

     use XML::PYX;
     my $parser = XML::PYX::Parser->new;
     my $string = $parser->parsefile($filename);

DESCRIPTION
===========

   After reading about PYX on XML.com, I thought it was a pretty cool idea,
so I built this, to generate PYX from XML using perl. See
http://www.xml.com/pub/2000/03/15/feature/index.html for an excellent
introduction.

   The package contains 2 usable packages, and 4 utilities that are
probably currently more useful than the module:

     pyx - an XML to PYX converter using XML::Parser
     pyxv - a validating XML to PYX converter using XML::Checker::Parser
     pyxw - a PYX to XML converter
     pyxhtml - an HTML to PYX converter using HTML::TreeBuilder

   All these utilities can be pipelined together, so you can have:

     pyx test.xml | grep -v "^-" | pyxw > new.xml

   Which should remove all text from an XML file (leaving only tags).
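
   The same filtering step can be sketched in Perl itself.  The PYX text
below is written out by hand for illustration (it is not produced by
running pyx), and the variable names are assumptions of this example.

```perl
use strict;
use warnings;

# Hand-written PYX for <doc id="1">Hello <b>there</b></doc>.
# "(" opens an element, "A" is an attribute, "-" is text, ")" closes.
my $pyx = join "\n",
    '(doc', 'Aid 1', '-Hello ', '(b', '-there', ')b', ')doc';

# Keep every line except character data, as grep -v "^-" would.
my @tags_only = grep { !/^-/ } split /\n/, $pyx;

print "$_\n" for @tags_only;
```

   What remains is the element and attribute lines only, which is the
input pyxw would turn back into markup.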

   The 2 packages are XML::PYX::Parser and XML::PYX::Parser::ToCSF. The
former is a direct subclass of XML::Parser that simply returns a PYX
string on a call to parse or parsefile. The latter stands for *To
Currently Selected Filehandle*. Instead of returning a string, it sends
output directly to the currently selected filehandle. This is much better
for pipelined utilities for obvious reasons.

   There's a special variable: $XML::PYX::Lame. Set it to 1 to use a "Lame"
parser that simply uses regexps. This is useful, for example, if you are
changing the input to invalid XML for some reason. You can then use
$XML::PYX::Lame = 1 to enable the non-xml parser. It does check for some
things, like balanced tags, but otherwise it's pretty lame :)

   Lame mode is enabled for pyx and pyxw with the -l option.

AUTHOR
======

   Matt Sergeant, matt@sergeant.org


File: pm.info,  Node: XML/Parser,  Next: XML/Parser/EasyTree,  Prev: XML/PYX,  Up: Module List

A perl module for parsing XML documents
***************************************

NAME
====

   XML::Parser - A perl module for parsing XML documents

SYNOPSIS
========

     use XML::Parser;
     
     $p1 = new XML::Parser(Style => 'Debug');
     $p1->parsefile('REC-xml-19980210.xml');
     $p1->parse('<foo id="me">Hello World</foo>');

     # Alternative
     $p2 = new XML::Parser(Handlers => {Start => \&handle_start,
     				     End   => \&handle_end,
     				     Char  => \&handle_char});
     $p2->parse($socket);

     # Another alternative
     $p3 = new XML::Parser(ErrorContext => 2);

     $p3->setHandlers(Char    => \&text,
     		   Default => \&other);

     open(FOO, 'xmlgenerator |');
     $p3->parse(*FOO, ProtocolEncoding => 'ISO-8859-1');
     close(FOO);

     $p3->parsefile('junk.xml', ErrorContext => 3);

DESCRIPTION
===========

   This module provides ways to parse XML documents. It is built on top of
*Note XML/Parser/Expat: XML/Parser/Expat,, which is a lower level
interface to James Clark's expat library. Each call to one of the parsing
methods creates a new instance of XML::Parser::Expat which is then used to
parse the document.  Expat options may be provided when the XML::Parser
object is created.  These options are then passed on to the Expat object
on each parse call.  They can also be given as extra arguments to the
parse methods, in which case they override options given at XML::Parser
creation time.

   The behavior of the parser is controlled either by the Style and/or
Handlers options, or by the setHandlers method. These all provide
mechanisms for XML::Parser to set the handlers needed by
XML::Parser::Expat.  If neither Style nor Handlers are specified, then
parsing just checks the document for being well-formed.

   When underlying handlers get called, they receive as their first
parameter the *Expat* object, not the Parser object.
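
   A minimal sketch of the Handlers approach follows.  The document
string and the variable names are made up for the example; the point is
that each handler is passed the Expat object first, followed by the
event-specific arguments.

```perl
use strict;
use warnings;
use XML::Parser;

my (@elements, $text, $got_expat);

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element, %attr) = @_;
            # The first argument is the Expat object, not $parser.
            $got_expat = $expat->isa('XML::Parser::Expat');
            push @elements, $element;
        },
        Char => sub {
            my ($expat, $string) = @_;
            # Char may be called several times for one run of text,
            # so accumulate rather than assign.
            $text .= $string;
        },
    },
);

$parser->parse('<foo id="me">Hello <b>World</b></foo>');

print "elements: @elements\n";
print "text: $text\n";
```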

METHODS
=======

new
     This is a class method, the constructor for XML::Parser. Options are
     passed as keyword value pairs. Recognized options are:

        * Style

          This option provides an easy way to create a given style of
          parser. The built in styles are: `"Debug"' in this node,
          `"Subs"' in this node, `"Tree"' in this node, `"Objects"' in
          this node, and `"Stream"' in this node.  Custom styles can be
          provided by giving a full package name containing at least one
          '::'. This package should then have subs defined for each
          handler it wishes to have installed. See `"STYLES"' in this node
          below for a discussion of each built in style.

        * Handlers

          When provided, this option should be an anonymous hash
          containing as keys the type of handler and as values a sub
          reference to handle that type of event. All the handlers get
          passed as their 1st parameter the instance of expat that is
          parsing the document. Further details on handlers can be found
          in `"HANDLERS"' in this node. Any handler set here overrides the
          corresponding handler set with the Style option.

        * Pkg

          Some styles will refer to subs defined in this package. If not
          provided, it defaults to the package which called the
          constructor.

        * ErrorContext

          This is an Expat option. When this option is defined, errors are
          reported in context. The value should be the number of lines to
          show on either side of the line in which the error occurred.

        * ProtocolEncoding

          This is an Expat option. This sets the protocol encoding name.
          It defaults to none. The built-in encodings are: `UTF-8',
          `ISO-8859-1', `UTF-16', and `US-ASCII'. Other encodings may be
          used if they have encoding maps in one of the directories in the
          @Encoding_Path list. Check `"ENCODINGS"' in this node for more
          information on encoding maps. Setting the protocol encoding
          overrides any encoding in the XML declaration.

        * Namespaces

          This is an Expat option. If this is set to a true value, then
          namespace processing is done during the parse. See
          `"Namespaces"', *Note XML/Parser/Expat: XML/Parser/Expat, for
          further discussion of namespace processing.

        * NoExpand

          This is an Expat option. Normally, the parser will try to expand
          references to entities defined in the internal subset. If this
          option is set to a true value, and a default handler is also
          set, then the default handler will be called when an entity
          reference is seen in text. This has no effect if a default
          handler has not been registered, and it has no effect on the
          expansion of entity references inside attribute values.

        * Stream_Delimiter

          This is an Expat option. It takes a string value. When this
          string is found alone on a line while parsing from a stream,
          then the parse is ended as if it saw an end of file. The
          intended use is with a stream of xml documents in a MIME
          multipart format. The string should not contain a trailing
          newline.

        * ParseParamEnt

          This is an Expat option. Unless standalone is set to "yes" in
          the XML declaration, setting this to a true value allows the
          external DTD to be read, and parameter entities to be parsed and
          expanded.

        * NoLWP

          This option has no effect if the ExternEnt or ExternEntFin
          handlers are directly set. Otherwise, if true, it forces the use
          of a file based external entity handler.

        * Non-Expat-Options

          If provided, this should be an anonymous hash whose keys are
          options that shouldn't be passed to Expat. This should only be
          of concern to those subclassing XML::Parser.

setHandlers(TYPE, HANDLER [, TYPE, HANDLER [...]])
     This method registers handlers for various parser events. It
     overrides any previous handlers registered through the Style or
     Handlers options or through earlier calls to setHandlers. By providing
     a false or undefined value as the handler, the existing handler can
     be unset.

     This method returns a list of type, handler pairs corresponding to the
     input. The handlers returned are the ones that were in effect prior to
     the call.

     See a description of the handler types in `"HANDLERS"' in this node.

parse(SOURCE [, OPT => OPT_VALUE [...]])
     The SOURCE parameter should either be a string containing the whole
     XML document, or it should be an open IO::Handle. Constructor options
     to XML::Parser::Expat given as keyword-value pairs may follow the
     SOURCE parameter. These override, for this call, any options or
     attributes passed through from the XML::Parser instance.

     The method dies if a parse error occurs. Otherwise it will
     return 1 or whatever is returned from the Final handler, if one is
     installed.  In other words, what parse may return depends on the
     style.

parsestring
     This is just an alias for parse for backwards compatibility.

parsefile(FILE [, OPT => OPT_VALUE [...]])
     Open FILE for reading, then call parse with the open handle. The file
     is closed no matter how parse returns. Returns what parse returns.

parse_start([ OPT => OPT_VALUE [...]])
     Create and return a new instance of XML::Parser::ExpatNB. Constructor
     options may be provided. If an init handler has been provided, it is
     called before returning the ExpatNB object. Documents are parsed by
     making incremental calls to the parse_more method of this object,
     which takes a string. A single call to the parse_done method of this
     object, which takes no arguments, indicates that the document is
     finished.

     If there is a final handler installed, it is executed by the
     parse_done method before returning and the parse_done method returns
     whatever is returned by the final handler.
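
   The sketch below exercises three of the methods described above:
the return value of setHandlers, trapping a parse error with eval, and
incremental parsing through parse_start.  The handler names and the
sample documents are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

my @log;
sub first_start  { push @log, "first:$_[1]" }
sub second_start { push @log, "second:$_[1]" }

my $parser = XML::Parser->new(Handlers => { Start => \&first_start });

# setHandlers returns (type, handler) pairs for the handlers that were
# in effect before the call, so they can be put back later.
my %previous = $parser->setHandlers(Start => \&second_start);
$parser->parse('<a/>');
$parser->setHandlers(%previous);

# parse dies on a parse error; eval traps the message in $@.
my $ok    = eval { $parser->parse('<b><c></b>'); 1 };
my $error = $@;
print "parse failed: $error\n" unless $ok;

# parse_start returns an XML::Parser::ExpatNB object that accepts the
# document in pieces, for example as they arrive from a socket.
my $nb = $parser->parse_start;
$nb->parse_more($_) for '<lis', 't><item/>', '</list>';
$nb->parse_done;

print "@log\n";
```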

HANDLERS
========

   Expat is an event based parser. As the parser recognizes parts of the
document (say the start or end tag for an XML element), then any handlers
registered for that type of an event are called with suitable parameters.
All handlers receive an instance of XML::Parser::Expat as their first
argument. See `"METHODS"', *Note XML/Parser/Expat: XML/Parser/Expat, for a
discussion of the methods that can be called on this object.

Init		(Expat)
-------------

   This is called just before the parsing of the document starts.

Final		(Expat)
--------------

   This is called just after parsing has finished, but only if no errors
occurred during the parse. Parse returns what this returns.

Start		(Expat, Element [, Attr, Val [,...]])
--------------------------------------------

   This event is generated when an XML start tag is recognized. Element is
the name of the XML element type that is opened with the start tag. The
Attr & Val pairs are generated for each attribute in the start tag.

End		(Expat, Element)
---------------------

   This event is generated when an XML end tag is recognized. Note that an
XML empty tag (<foo/>) generates both a start and an end event.

Char		(Expat, String)
---------------------

   This event is generated when non-markup is recognized. The non-markup
sequence of characters is in String. A single non-markup sequence of
characters may generate multiple calls to this handler. Whatever the
encoding of the string in the original document, this is given to the
handler in UTF-8.

Proc		(Expat, Target, Data)
---------------------------

   This event is generated when a processing instruction is recognized.

Comment		(Expat, Data)
----------------------

   This event is generated when a comment is recognized.

CdataStart	(Expat)
------------------

   This is called at the start of a CDATA section.

CdataEnd		(Expat)
-----------------

   This is called at the end of a CDATA section.

Default		(Expat, String)
------------------------

   This is called for any characters that don't have a registered handler.
This includes both characters that are part of markup for which no events
are generated (markup declarations) and characters that could generate
events, but for which no handler has been registered.

   Whatever the encoding in the original document, the string is returned
to the handler in UTF-8.

Unparsed		(Expat, Entity, Base, Sysid, Pubid, Notation)
-------------------------------------------------------

   This is called for a declaration of an unparsed entity. Entity is the
name of the entity. Base is the base to be used for resolving a relative
URI.  Sysid is the system id. Pubid is the public id. Notation is the
notation name. Base and Pubid may be undefined.

Notation		(Expat, Notation, Base, Sysid, Pubid)
-----------------------------------------------

   This is called for a declaration of notation. Notation is the notation
name.  Base is the base to be used for resolving a relative URI. Sysid is
the system id. Pubid is the public id. Base, Sysid, and Pubid may all be
undefined.

ExternEnt	(Expat, Base, Sysid, Pubid)
-------------------------------------

   This is called when an external entity is referenced. Base is the base
to be used for resolving a relative URI. Sysid is the system id. Pubid is
the public id. Base and Pubid may be undefined.

   This handler should either return a string, which represents the
contents of the external entity, or return an open filehandle that can be
read to obtain the contents of the external entity, or return undef, which
indicates the external entity couldn't be found and will generate a parse
error.

   If an open filehandle is returned, it must be returned as either a glob
(*FOO) or as a reference to a glob (e.g. an instance of IO::Handle).

   A default handler is installed for this event. The default handler is
XML::Parser::lwp_ext_ent_handler unless the NoLWP option was provided with
a true value, otherwise XML::Parser::file_ext_ent_handler is the default
handler for external entities. Even without the NoLWP option, if the URI
or LWP modules are missing, the file based handler ends up being used
after giving a warning on the first external entity reference.

   The LWP external entity handler will use proxies defined in the
environment (http_proxy, ftp_proxy, etc.).

   Please note that the LWP external entity handler reads the entire
entity into a string and returns it, whereas the file handler opens a
filehandle.

   Also note that the file external entity handler will likely choke on
absolute URIs or file names that don't fit the conventions of the local
operating system.

   The expat base method can be used to set a basename for relative
pathnames. If no basename is given, or if the basename is itself a
relative name, then it is relative to the current working directory.

ExternEntFin	(Expat)
--------------------

   This is called after parsing an external entity. It's not called unless
an ExternEnt handler is also set. There is a default handler installed
that pairs with the default ExternEnt handler.

   If you're going to install your own ExternEnt handler, then you should
set (or unset) this handler too.

Entity		(Expat, Name, Val, Sysid, Pubid, Ndata, IsParam)
--------------------------------------------------------

   This is called when an entity is declared. For internal entities, the
Val parameter will contain the value and the remaining three parameters
will be undefined. For external entities, the Val parameter will be
undefined, the Sysid parameter will have the system id, the Pubid
parameter will have the public id if it was provided (it will be undefined
otherwise), the Ndata parameter will contain the notation for unparsed
entities. If this is a parameter entity declaration, then the IsParam
parameter is true.

   Note that this handler and the Unparsed handler above overlap. If both
are set, then this handler will not be called for unparsed entities.

Element		(Expat, Name, Model)
-----------------------------

   The element handler is called when an element declaration is found. Name
is the element name, and Model is the content model as an
XML::Parser::ContentModel object. See
`"XML::Parser::ContentModel Methods"',
*Note XML/Parser/Expat: XML/Parser/Expat, for methods available for this
class.

Attlist		(Expat, Elname, Attname, Type, Default, Fixed)
-------------------------------------------------------

   This handler is called for each attribute in an ATTLIST declaration.
So an ATTLIST declaration that has multiple attributes will generate
multiple calls to this handler. The Elname parameter is the name of the
element with which the attribute is being associated. The Attname
parameter is the name of the attribute. Type is the attribute type, given
as a string. Default is the default value, which will either be
"#REQUIRED", "#IMPLIED" or a quoted string (i.e. the returned string will
begin and end with a quote character).  If Fixed is true, then this is a
fixed attribute.
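
   To make the declaration handlers more concrete, here is a small
sketch that records Entity and Attlist events from an internal subset.
The document and the variable names are invented for the example.

```perl
use strict;
use warnings;
use XML::Parser;

my (@declarations, $text);

my $parser = XML::Parser->new(
    Handlers => {
        Entity => sub {
            my ($expat, $name, $val, $sysid, $pubid, $ndata) = @_;
            # Val is only defined for internal entities.
            push @declarations, "entity $name = $val" if defined $val;
        },
        Attlist => sub {
            my ($expat, $elname, $attname, $type, $default, $fixed) = @_;
            push @declarations,
                "attribute $elname/$attname: $type $default";
        },
        Char => sub { $text .= $_[1] },
    },
);

$parser->parse(<<'XML');
<!DOCTYPE doc [
  <!ENTITY who "World">
  <!ATTLIST doc lang CDATA #IMPLIED>
]>
<doc>Hello &who;</doc>
XML

print "$_\n" for @declarations;
print "text: $text\n";
```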

Doctype		(Expat, Name, Sysid, Pubid, Internal)
----------------------------------------------

   This handler is called for DOCTYPE declarations. Name is the document
type name. Sysid is the system id of the document type, if it was provided,
otherwise it's undefined. Pubid is the public id of the document type,
which will be undefined if no public id was given. Internal is the internal
subset, given as a string. If there was no internal subset, it will be
undefined. Internal will contain all whitespace, comments, processing
instructions, and declarations seen in the internal subset. The
declarations will be there whether or not they have been processed by
another handler (except for unparsed entities processed by the Unparsed
handler). However, comments and processing instructions will not appear if
they've been processed by their respective handlers.

DoctypeFin		(Expat)
-------------------

   This handler is called after parsing of the DOCTYPE declaration has
finished, including any internal or external DTD declarations.

XMLDecl		(Expat, Version, Encoding, Standalone)
-----------------------------------------------

   This handler is called for XML declarations. Version is a string
containing the version. Encoding is either undefined or contains an
encoding string.  Standalone will be true, false, or undefined if the
standalone attribute is yes, no, or not given, respectively.

STYLES
======

Debug
-----

   This just prints out the document in outline form. Nothing special is
returned by parse.

Subs
----

   Each time an element starts, a sub by that name in the package specified
by the Pkg option is called with the same parameters that the Start
handler gets called with.

   Each time an element ends, a sub with that name appended with an
underscore ("_"), is called with the same parameters that the End handler
gets called with.

   Nothing special is returned by parse.

Tree
----

   Parse will return a parse tree for the document. Each node in the tree
takes the form of a tag, content pair. Text nodes are represented with a
pseudo-tag of "0" and the string that is their content. For elements, the
content is an array reference. The first item in the array is a (possibly
empty) hash reference containing attributes. The remainder of the array is
a sequence of tag-content pairs representing the content of the element.

   So for example the result of parsing:

     <foo><head id="a">Hello <em>there</em></head><bar>Howdy<ref/></bar>do</foo>

   would be:

                  Tag   Content
       ==================================================================
       [foo, [{}, head, [{id => "a"}, 0, "Hello ",  em, [{}, 0, "there"]],
                  bar,  [{}, 0, "Howdy",  ref, [{}]],
                  0,    "do"
             ]
       ]

   The root element "foo" has 3 children: a "head" element, a "bar"
element and the text "do". After the empty attribute hash, these are
represented in its contents by 3 tag-content pairs.
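
   One way to consume such a structure is a small recursive walk over
the tag-content pairs.  The sketch below writes the example tree out by
hand instead of calling the parser, and the walk routine is a
hypothetical helper, not part of XML::Parser.

```perl
use strict;
use warnings;

# The Tree-style result for the example document, written out by hand.
my $tree = [
    'foo' => [
        {},
        'head' => [ { id => 'a' }, 0 => 'Hello ', 'em' => [ {}, 0 => 'there' ] ],
        'bar'  => [ {}, 0 => 'Howdy', 'ref' => [ {} ] ],
        0      => 'do',
    ],
];

my @outline;

# Walk a list of tag/content pairs, recording one line per node.
sub walk {
    my ($depth, @pairs) = @_;
    while (my ($tag, $content) = splice @pairs, 0, 2) {
        my $indent = '  ' x $depth;
        if ($tag eq '0') {        # pseudo-tag "0" marks a text node
            push @outline, qq{$indent"$content"};
        }
        else {                    # element content: [ \%attr, pairs... ]
            my ($attr, @kids) = @$content;
            my $attrs = join '',
                map { qq{ $_="$attr->{$_}"} } sort keys %$attr;
            push @outline, "$indent<$tag$attrs>";
            walk($depth + 1, @kids);
        }
    }
}

walk(0, @$tree);
print "$_\n" for @outline;
```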

Objects
-------

   This is similar to the Tree style, except that a hash object is created
for each element. The corresponding object will be in the class whose name
is created by appending "::" and the element name to the package set with
the Pkg option. Non-markup text will be in the ::Characters class. The
contents of the corresponding object will be in an anonymous array that is
the value of the Kids property for that object.
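
   A short sketch of the Objects style follows.  "MyDoc" is an
arbitrary package name chosen for the example.  Two details it relies
on are not stated above and should be read as assumptions about the
implementation: that attributes are stored as keys of the element's
hash, and that text objects keep their string under a Text key.

```perl
use strict;
use warnings;
use XML::Parser;

# "MyDoc" is an arbitrary package name chosen for this example.
my $parser = XML::Parser->new(Style => 'Objects', Pkg => 'MyDoc');
my $result = $parser->parse('<foo id="a">Hello <em>there</em></foo>');

# parse returns a reference to an array holding the root element object.
my $root = $result->[0];

my @summary = (ref($root), $root->{id});
for my $kid (@{ $root->{Kids} }) {
    # Text lives in the ::Characters class; elements get their own class.
    push @summary, ref($kid) eq 'MyDoc::Characters'
        ? "text:$kid->{Text}"
        : ref($kid);
}

print join(' | ', @summary), "\n";
```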

Stream
------

   This style also uses the Pkg package. If none of the subs that this
style looks for is there, then the effect of parsing with this style is to
print a canonical copy of the document without comments or declarations.
All the subs receive as their 1st parameter the Expat instance for the
document they're parsing.

   It looks for the following routines:

   * StartDocument

     Called at the start of the parse.

   * StartTag

     Called for every start tag with a second parameter of the element
     type. The $_ variable will contain a copy of the tag and the %_
     variable will contain attribute values supplied for that element.

   * EndTag

     Called for every end tag with a second parameter of the element type.
     The $_ variable will contain a copy of the end tag.

   * Text

     Called just before start or end tags with accumulated non-markup text
     in the $_ variable.

   * PI

     Called for processing instructions. The $_ variable will contain a
     copy of the PI and the target and data are sent as 2nd and 3rd
     parameters respectively.

   * EndDocument

     Called at conclusion of the parse.
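
   The routines listed above might be used along these lines.  The
package name MyStream and the @events array are assumptions of this
sketch; only StartTag, EndTag and Text are defined here, and the other
routines are simply left out.

```perl
use strict;
use warnings;
use XML::Parser;

package MyStream;

our @events;

# Each sub receives the Expat object first; tag subs also get the
# element type as the second parameter.
sub StartTag { my ($expat, $type) = @_; push @events, "start:$type" }
sub EndTag   { my ($expat, $type) = @_; push @events, "end:$type" }

# Accumulated non-markup text arrives in $_.
sub Text     { push @events, "text:$_" }

package main;

my $parser = XML::Parser->new(Style => 'Stream', Pkg => 'MyStream');
$parser->parse('<a>hi<b/></a>');

print "@MyStream::events\n";
```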

ENCODINGS
=========

   XML documents may be encoded in character sets other than Unicode as
long as they may be mapped into the Unicode character set. Expat has
further restrictions on encodings. Read the xmlparse.h header file in the
expat distribution to see details on these restrictions.

   Expat has built-in encodings for: `UTF-8', `ISO-8859-1', `UTF-16', and
`US-ASCII'. Encodings are set either through the XML declaration encoding
attribute or through the ProtocolEncoding option to XML::Parser or
XML::Parser::Expat.

   For encodings other than the built-ins, expat calls the function
load_encoding in the Expat package with the encoding name. This function
looks for a file in the path list @XML::Parser::Expat::Encoding_Path, that
matches the lower-cased name with a '.enc' extension. The first one it
finds, it loads.

   If you wish to build your own encoding maps, check out the XML::Encoding
module from CPAN.

AUTHORS
=======

   Larry Wall <`larry@wall.org'> wrote version 1.0.

   Clark Cooper <`coopercc@netheaven.com'> picked up support, changed the
API for this version (2.x), provided documentation, and added some
standard package features.


File: pm.info,  Node: XML/Parser/EasyTree,  Next: XML/Parser/Expat,  Prev: XML/Parser,  Up: Module List

Easier tree style for XML::Parser
*********************************

NAME
====

   XML::Parser::EasyTree - Easier tree style for XML::Parser

SYNOPSIS
========

     use XML::Parser;
     use XML::Parser::EasyTree;
     $XML::Parser::EasyTree::Noempty=1;
     my $p=new XML::Parser(Style=>'EasyTree');
     my $tree=$p->parsefile('something.xml');

DESCRIPTION
===========

   XML::Parser::EasyTree adds a new "built-in" style called "EasyTree" to
XML::Parser.  Like XML::Parser's "Tree" style, setting this style causes
the parser to build a lightweight tree structure representing the XML
document.  This structure is, at least in this author's opinion, easier to
work with than the one created by the built-in style.

   When the parser is invoked with the EasyTree style, it returns a
reference to an array of tree nodes, each of which is a hash reference.
All nodes have a 'type' key whose value is the type of the node: 'e' for
element nodes, 't' for text nodes, and 'p' for processing instruction
nodes.  All nodes also have a 'content' key whose value is a reference to
an array holding the element's child nodes for element nodes, the string
value for text nodes, and the data value for processing instruction nodes.
Element nodes also have a 'name' key whose value is the element name and
an 'attrib' key whose value is a reference to a hash of attribute names
and values.  Processing instructions also have a 'target' key whose value
is the PI's target.

   EasyTree nodes are ordinary Perl hashes and are not objects.  Contiguous
runs of text are always returned in a single node.

   The reason the parser returns an array reference rather than the root
element's node is that an XML document can legally contain processing
instructions outside the root element (the xml-stylesheet PI is commonly
used this way).

   If the parser's Namespaces option is set, element and attribute names
will be prefixed with their (possibly empty) namespace URI enclosed in
curly brackets.
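
   Since the nodes are plain hashes, walking the tree needs no special
API. The following is a sketch under that assumption; the helper name
text_of is invented for this example and is not part of the module.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::Parser::EasyTree;

# text_of is a hypothetical helper: it concatenates all text below a
# list of EasyTree nodes, descending into element nodes.
sub text_of {
    my ($nodes) = @_;
    my $out = '';
    for my $node (@$nodes) {
        if ($node->{type} eq 't') {
            $out .= $node->{content};
        }
        elsif ($node->{type} eq 'e') {
            $out .= text_of($node->{content});
        }
    }
    return $out;
}

my $p    = XML::Parser->new(Style => 'EasyTree');
my $tree = $p->parse('<foo><head id="a">Hello <em>there</em></head></foo>');

my $head = $tree->[0]{content}[0];      # first child of the root element
print "id attribute: $head->{attrib}{id}\n";
print "text: ", text_of($tree), "\n";
```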

SPECIAL VARIABLES
=================

   Two package global variables control special behaviors:

XML::Parser::EasyTree::Latin
     If this is set to a nonzero value, all text, names, and values will be
     returned in ISO-8859-1 (Latin-1) encoding rather than UTF-8.

XML::Parser::EasyTree::Noempty
     If this is set to a nonzero value, text nodes containing nothing but
     whitespace (such as those generated by line breaks and indentation
     between tags) will be omitted from the parse tree.

EXAMPLE
=======

   Parse a pretty-printed version of the XML shown in the example for the
built-in "Tree" style:

     #!perl -w
     use strict;
     use XML::Parser;
     use XML::Parser::EasyTree;
     use Data::Dumper;
     
     $XML::Parser::EasyTree::Noempty=1;
     my $xml=<<'EOF';
     <foo>
       <head id="a">Hello <em>there</em>
       </head>
       <bar>Howdy<ref/>
       </bar>
       do
     </foo>
     EOF
     my $p=new XML::Parser(Style=>'EasyTree');
     my $tree=$p->parse($xml);
     print Dumper($tree);

   Returns:

     $VAR1 = [
             { 'name' => 'foo',
               'type' => 'e',
               'content' => [
                              { 'name' => 'head',
                                'type' => 'e',
                                'content' => [
                                               { 'type' => 't',
                                                 'content' => 'Hello '
                                               },
                                               { 'name' => 'em',
                                                 'type' => 'e',
                                                 'content' => [
                                                                { 'type' => 't',
                                                                  'content' => 'there'
                                                                }
                                                              ],
                                                 'attrib' => {}
                                               }
                                             ],
                                'attrib' => { 'id' => 'a'
                                            }
                              },
                              { 'name' => 'bar',
                                'type' => 'e',
                                'content' => [
                                               { 'type' => 't',
                                                 'content' => 'Howdy'
                                               },
                                               { 'name' => 'ref',
                                                 'type' => 'e',
                                                 'content' => [],
                                                 'attrib' => {}
                                               }
                                             ],
                                'attrib' => {}
                              },
                              { 'type' => 't',
                                'content' => '
     do
      '
                              }
                            ],
               'attrib' => {}
             }
           ];

AUTHOR
======

   Eric Bohlman (ebohlman@omsdev.com)

   Copyright (c) 2001 Eric Bohlman. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself.

SEE ALSO
========

     XML::Parser


File: pm.info,  Node: XML/Parser/Expat,  Next: XML/Parser/PerlSAX,  Prev: XML/Parser/EasyTree,  Up: Module List

Lowlevel access to James Clark's expat XML parser
*************************************************

NAME
====

   XML::Parser::Expat - Lowlevel access to James Clark's expat XML parser

SYNOPSIS
========

     use XML::Parser::Expat;

     $parser = new XML::Parser::Expat;
     $parser->setHandlers('Start' => \&sh,
                          'End'   => \&eh,
                          'Char'  => \&ch);
     open(FOO, 'info.xml') or die "Couldn't open";
     $parser->parse(*FOO);
     close(FOO);
     # $parser->parse('<foo id="me"> here <em>we</em> go </foo>');

     sub sh
     {
       my ($p, $el, %atts) = @_;
       $p->setHandlers('Char' => \&spec)
         if ($el eq 'special');
       ...
     }

     sub eh
     {
       my ($p, $el) = @_;
       $p->setHandlers('Char' => \&ch)  # Special elements won't contain
         if ($el eq 'special');         # other special elements
       ...
     }

DESCRIPTION
===========

   This module provides an interface to James Clark's XML parser, expat.
As in expat, a single instance of the parser can only parse one document.
Calls to parse, parsestring, or parsefile after the first for a given
instance will die.

   Expat (and XML::Parser::Expat) are event based. As the parser recognizes
parts of the document (say the start or end of an XML element), then any
handlers registered for that type of an event are called with suitable
parameters.
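
   A small self-contained sketch of the event model: one handler counts
start tags and another collects character data. The variable names and the
sample document are illustrative only.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my %count;       # element name => number of start tags seen
my $text = '';   # all character data, concatenated

my $parser = XML::Parser::Expat->new;
$parser->setHandlers(
    Start => sub { my ($p, $el) = @_; $count{$el}++ },
    Char  => sub { my ($p, $str) = @_; $text .= $str },
);
$parser->parse('<foo id="me"> here <em>we</em> go <em>again</em></foo>');
$parser->release;    # break circular references when using Expat directly

print "$_: $count{$_}\n" for sort keys %count;
```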

METHODS
=======

new
     This is a class method, the constructor for XML::Parser::Expat.
     Options are passed as keyword value pairs. The recognized options are:

        * ProtocolEncoding

          The protocol encoding name. The default is none. The expat
          built-in encodings are: `UTF-8', `ISO-8859-1', `UTF-16', and
          `US-ASCII'.  Other encodings may be used if they have encoding
          maps in one of the directories in the @Encoding_Path list.
          Setting the protocol encoding overrides any encoding in the XML
          declaration.

        * Namespaces

          When this option is given with a true value, then the parser
          does namespace processing. By default, namespace processing is
          turned off. When it is turned on, the parser consumes *xmlns*
          attributes and strips off prefixes from element and attributes
          names where those prefixes have a defined namespace. A name's
          namespace can be found using the namespace method, and two names
          can be checked for absolute equality with the eq_name method.

        * NoExpand

          Normally, the parser will try to expand references to entities
          defined in the internal subset. If this option is set to a true
          value, and a default handler is also set, then the default
          handler will be called when an entity reference is seen in text.
          This has no effect if a default handler has not been registered,
          and it has no effect on the expansion of entity references
          inside attribute values.

        * Stream_Delimiter

          This option takes a string value. When this string is found
          alone on a line while parsing from a stream, then the parse is
          ended as if it saw an end of file. The intended use is with a
          stream of xml documents in a MIME multipart format. The string
          should not contain a trailing newline.

        * ErrorContext

          When this option is defined, errors are reported in context. The
          value of ErrorContext should be the number of lines to show on
          either side of the line in which the error occurred.

        * ParseParamEnt

          Unless standalone is set to "yes" in the XML declaration,
          setting this to a true value allows the external DTD to be read,
          and parameter entities to be parsed and expanded.

        * Base

          The base to use for relative pathnames or URLs. This can also be
          done by using the base method.
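
     As an illustration of passing options to the constructor, this sketch
     sets ErrorContext and parses a deliberately malformed document. The
     sample input is invented for the example.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

# Ask for one line of context on either side of the error line.
my $parser = XML::Parser::Expat->new(ErrorContext => 1);

my $error = '';
# The end tag </a> does not match the open <b>, so the parse dies.
eval { $parser->parse("<a>\n<b>\n</a>\n"); 1 } or $error = $@;
$parser->release;

print "Parse failed:\n$error" if $error;
```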

setHandlers(TYPE, HANDLER [, TYPE, HANDLER [...]])
     This method registers handlers for the various events. If no handlers
     are registered, then a call to parsestring or parsefile will only
     determine if the corresponding XML document is well formed (by
     returning without error.)  This may be called from within a handler,
     after the parse has started.

     Setting a handler to something that evaluates to false unsets that
     handler.

     This method returns a list of type, handler pairs corresponding to the
     input. The handlers returned are the ones that were in effect before
     the call to setHandlers.
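
     The returned list can be fed straight back to setHandlers to restore
     the previous state. A sketch of that idea, where the element name
     "special" and the handler names are assumptions for this example:

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my (@normal, @special, @saved);

sub normal_text  { push @normal,  $_[1] }
sub special_text { push @special, $_[1] }

my $parser = XML::Parser::Expat->new;
$parser->setHandlers(
    Char  => \&normal_text,
    Start => sub {
        my ($p, $el) = @_;
        # Swap in a different Char handler and remember the old one.
        @saved = $p->setHandlers(Char => \&special_text) if $el eq 'special';
    },
    End => sub {
        my ($p, $el) = @_;
        # Restore whatever was in effect before the swap.
        $p->setHandlers(@saved) if $el eq 'special';
    },
);
$parser->parse('<doc>plain <special>marked</special> plain again</doc>');
$parser->release;

print "special: ", join('', @special), "\n";
print "normal:  ", join('', @normal),  "\n";
```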

     The recognized events and the parameters passed to the corresponding
     handlers are:

        * Start		(Parser, Element [, Attr, Val [,...]])

          This event is generated when an XML start tag is recognized.
          Parser is an XML::Parser::Expat instance. Element is the name of
          the XML element that is opened with the start tag. The Attr &
          Val pairs are generated for each attribute in the start tag.

        * End		(Parser, Element)

          This event is generated when an XML end tag is recognized. Note
          that an XML empty tag (<foo/>) generates both a start and an end
          event.

          There is always a lower level start and end handler installed
          that wrap the corresponding callbacks. This is to handle the
          context mechanism.  A consequence of this is that the default
          handler (see below) will not see a start tag or end tag unless
          the default_current method is called.

        * Char		(Parser, String)

          This event is generated when non-markup is recognized. The
          non-markup sequence of characters is in String. A single
          non-markup sequence of characters may generate multiple calls to
          this handler. Whatever the encoding of the string in the
          original document, this is given to the handler in UTF-8.

        * Proc		(Parser, Target, Data)

          This event is generated when a processing instruction is
          recognized.

        * Comment		(Parser, String)

          This event is generated when a comment is recognized.

        * CdataStart	(Parser)

          This is called at the start of a CDATA section.

        * CdataEnd	(Parser)

          This is called at the end of a CDATA section.

        * Default		(Parser, String)

          This is called for any characters that don't have a registered
          handler.  This includes both characters that are part of markup
          for which no events are generated (markup declarations) and
          characters that could generate events, but for which no handler
          has been registered.

          Whatever the encoding in the original document, the string is
          returned to the handler in UTF-8.

        * Unparsed		(Parser, Entity, Base, Sysid, Pubid, Notation)

          This is called for a declaration of an unparsed entity. Entity
          is the name of the entity. Base is the base to be used for
          resolving a relative URI.  Sysid is the system id. Pubid is the
          public id. Notation is the notation name. Base and Pubid may be
          undefined.

        * Notation		(Parser, Notation, Base, Sysid, Pubid)

          This is called for a declaration of notation. Notation is the
          notation name.  Base is the base to be used for resolving a
          relative URI. Sysid is the system id. Pubid is the public id.
          Base, Sysid, and Pubid may all be undefined.

        * ExternEnt		(Parser, Base, Sysid, Pubid)

          This is called when an external entity is referenced. Base is
          the base to be used for resolving a relative URI. Sysid is the
          system id. Pubid is the public id. Base and Pubid may be
          undefined.

          This handler should either return a string, which represents the
          contents of the external entity, or return an open filehandle
          that can be read to obtain the contents of the external entity,
          or return undef, which indicates the external entity couldn't be
          found and will generate a parse error.

          If an open filehandle is returned, it must be returned as either
          a glob (*FOO) or as a reference to a glob (e.g. an instance of
          IO::Handle).

        * ExternEntFin		(Parser)

          This is called after an external entity has been parsed. It
          allows applications to perform cleanup on actions performed in
          the above ExternEnt handler.

        * Entity			(Parser, Name, Val, Sysid, Pubid, Ndata, IsParam)

          This is called when an entity is declared. For internal
          entities, the Val parameter will contain the value and the
          remaining three parameters will be undefined. For external
          entities, the Val parameter will be undefined, the Sysid
          parameter will have the system id, the Pubid parameter will have
          the public id if it was provided (it will be undefined
          otherwise), the Ndata parameter will contain the notation for
          unparsed entities. If this is a parameter entity declaration,
          then the IsParam parameter is true.

          Note that this handler and the Unparsed handler above overlap.
          If both are set, then this handler will not be called for
          unparsed entities.

        * Element			(Parser, Name, Model)

          The element handler is called when an element declaration is
          found. Name is the element name, and Model is the content model
          as an XML::Parser::ContentModel object. See the
          "XML::Parser::ContentModel Methods" section below for the
          methods available for this class.

        * Attlist			(Parser, Elname, Attname, Type, Default, Fixed)

          This handler is called for each attribute in an ATTLIST
          declaration.  So an ATTLIST declaration that has multiple
          attributes will generate multiple calls to this handler. The
          Elname parameter is the name of the element with which the
          attribute is being associated. The Attname parameter is the name
          of the attribute. Type is the attribute type, given as a string.
          Default is the default value, which will either be "#REQUIRED",
          "#IMPLIED" or a quoted string (i.e. the returned string will
          begin and end with a quote character). If Fixed is true, then
          this is a fixed attribute.

        * Doctype			(Parser, Name, Sysid, Pubid, Internal)

          This handler is called for DOCTYPE declarations. Name is the
          document type name. Sysid is the system id of the document type,
          if it was provided, otherwise it's undefined. Pubid is the
          public id of the document type, which will be undefined if no
          public id was given. Internal will be true or false, indicating
          whether or not the doctype declaration contains an internal
          subset.

        * DoctypeFin		(Parser)

          This handler is called after parsing of the DOCTYPE declaration
          has finished, including any internal or external DTD
          declarations.

        * XMLDecl			(Parser, Version, Encoding, Standalone)

          This handler is called for XML declarations. Version is a string
          containing the version. Encoding is either undefined or contains
          an encoding string.  Standalone is either undefined, or true or
          false. Undefined indicates that no standalone parameter was
          given in the XML declaration. True or false indicates "yes" or
          "no" respectively.

namespace(name)
     Return the URI of the namespace that the name belongs to. If the name
     doesn't belong to any namespace, an undef is returned. This is only
     valid on names received through the Start or End handlers from a
     single document, or through a call to the generate_ns_name method. In
     other words, don't use names generated from one instance of
     XML::Parser::Expat with other instances.

eq_name(name1, name2)
     Return true if name1 and name2 are identical (i.e. same name and from
     the same namespace.) This is only meaningful if both names were
     obtained through the Start or End handlers from a single document, or
     through a call to the generate_ns_name method.

generate_ns_name(name, namespace)
     Return a name, associated with a given namespace, good for using with
     the above 2 methods. The namespace argument should be the namespace
     URI, not a prefix.

new_ns_prefixes
     When called from a start tag handler, returns namespace prefixes
     declared with this start tag. If called elsewhere (or if there were no
     namespace prefixes declared), it returns an empty list. Setting of
     the default namespace is indicated with '#default' as a prefix.

expand_ns_prefix(prefix)
     Return the uri to which the given prefix is currently bound. Returns
     undef if the prefix isn't currently bound. Use '#default' to find the
     current binding of the default namespace (if any).

current_ns_prefixes
     Return a list of currently bound namespace prefixes. The order of the
     prefixes in the list has no meaning. If the default namespace is
     currently bound, '#default' appears in the list.
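
     The namespace methods above might be combined as in the following
     sketch. The namespace URI and the element names are hypothetical.
     Names are stringified before being stored, since two names from
     different namespaces can have the same string value.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my $uri = 'http://example.com/x';   # hypothetical namespace URI
my @seen;

my $parser = XML::Parser::Expat->new(Namespaces => 1);
$parser->setHandlers(Start => sub {
    my ($p, $el) = @_;
    my $ns = $p->namespace($el);
    # Build a comparable name for "item" in the example namespace.
    my $wanted = $p->generate_ns_name('item', $uri);
    push @seen, {
        name   => "$el",
        ns     => defined $ns ? $ns : '(none)',
        wanted => $p->eq_name($el, $wanted) ? 1 : 0,
    };
});
$parser->parse(qq{<doc xmlns:x="$uri"><x:item/><item/></doc>});
$parser->release;

printf "%-5s %-22s %d\n", @{$_}{qw(name ns wanted)} for @seen;
```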

recognized_string
     Returns the string from the document that was recognized in order to
     call the current handler. For instance, when called from a start
     handler, it will give us the start-tag string. The string is
     encoded in UTF-8.  This method doesn't return a meaningful string
     inside declaration handlers.

original_string
     Returns the verbatim string from the document that was recognized in
     order to call the current handler. The string is in the original
     document encoding. This method doesn't return a meaningful string
     inside declaration handlers.

default_current
     When called from a handler, causes the sequence of characters that
     generated the corresponding event to be sent to the default handler
     (if one is registered). Use of this method is deprecated in favor of the
     recognized_string method, which you can use without installing a
     default handler. This method doesn't deliver a meaningful string to
     the default handler when called from inside declaration handlers.
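
     A short sketch of recognized_string, called from a start handler to
     capture each start tag as it appeared in the document. The sample
     document is made up for the example.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my @tags;

my $parser = XML::Parser::Expat->new;
$parser->setHandlers(Start => sub {
    my ($p) = @_;
    # The markup that triggered this event, encoded in UTF-8.
    push @tags, $p->recognized_string;
});
$parser->parse('<doc><em class="x">hi</em></doc>');
$parser->release;

print "$_\n" for @tags;
```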

xpcroak(message)
     Concatenate onto the given message the current line number within the
     XML document plus the message implied by ErrorContext. Then croak with
     the formed message.

xpcarp(message)
     Concatenate onto the given message the current line number within the
     XML document plus the message implied by ErrorContext. Then carp with
     the formed message.

current_line
     Returns the line number of the current position of the parse.

current_column
     Returns the column number of the current position of the parse.

current_byte
     Returns the current byte position of the parse.

base([NEWBASE])
     Returns the current value of the base for resolving relative URIs. If
     NEWBASE is supplied, changes the base to that value.

context
     Returns a list of element names that represent open elements, with the
     last one being the innermost. Inside start and end tag handlers, this
     will be the tag of the parent element.

current_element
     Returns the name of the innermost currently opened element. Inside
     start or end handlers, returns the parent of the element associated
     with those tags.

in_element(NAME)
     Returns true if NAME is equal to the name of the innermost currently
     opened element. If namespace processing is being used and you want to
     check against a name that may be in a namespace, then use the
     generate_ns_name method to create the NAME argument.

within_element(NAME)
     Returns the number of times the given name appears in the context
     list.  If namespace processing is being used and you want to check
     against a name that may be in a namespace, then use the
     generate_ns_name method to create the NAME argument.

depth
     Returns the size of the context list.

element_index
     Returns an integer that is the depth-first visit order of the current
     element. This will be zero outside of the root element. For example,
     this will return 1 when called from the start handler for the root
     element start tag.
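
     The context-related methods can be illustrated with a sketch that
     records, for every start tag, its element index, the depth, and a
     path built from the context list. The report format is an assumption
     of this example.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my @report;

my $parser = XML::Parser::Expat->new;
$parser->setHandlers(Start => sub {
    my ($p, $el) = @_;
    # Inside a start handler, context() lists the open ancestors only,
    # so the element itself is appended by hand.
    my $path = join '/', $p->context, $el;
    push @report, sprintf('%d %d %s', $p->element_index, $p->depth, $path);
});
$parser->parse('<a><b><c/></b><d/></a>');
$parser->release;

print "$_\n" for @report;
```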

skip_until(INDEX)
     INDEX is an integer that represents an element index. When this method
     is called, all handlers are suspended until the start tag for an
     element that has an index number equal to INDEX is seen. If a start
     handler has been set, then this is the first tag that the start
     handler will see after skip_until has been called.

position_in_context(LINES)
     Returns a string that shows the current parse position. LINES should
     be an integer >= 0 that represents the number of lines on either side
     of the current parse line to place into the returned string.

xml_escape(TEXT [, CHAR [, CHAR ...]])
     Returns TEXT with markup characters turned into character entities.
     Any additional characters provided as arguments are also turned into
     character references where found in TEXT.
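
     For instance, the following sketch escapes the default markup
     characters and additionally asks for '>' to be escaped. The input
     string is invented for illustration.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my $parser = XML::Parser::Expat->new;

# '&' and '<' are always escaped; '>' only because it is listed.
my $escaped = $parser->xml_escape('1 < 2 & 3 > 2', '>');
$parser->release;

print "$escaped\n";
```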

parse (SOURCE)
     The SOURCE parameter should either be a string containing the whole
     XML document, or it should be an open IO::Handle. Only a single
     document may be parsed for a given instance of XML::Parser::Expat, so
     this will croak if it's been called previously for this instance.

parsestring(XML_DOC_STRING)
     Parses the given string as an XML document. Only a single document
     may be parsed for a given instance of XML::Parser::Expat, so this
     will die if either parsestring or parsefile has been called for this
     instance previously.

     This method is deprecated in favor of the parse method.

parsefile(FILENAME)
     Parses the XML document in the given file. Will die if parsestring or
     parsefile has been called previously for this instance.
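
     The one-document-per-instance rule can be seen in this sketch, which
     traps the failure of a second parse with eval. The sample documents
     are arbitrary.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my $parser = XML::Parser::Expat->new;
$parser->parse('<first/>');

# A second parse on the same instance is refused.
my $refused = 0;
eval { $parser->parse('<second/>'); 1 } or $refused = 1;
$parser->release;

print "second parse refused\n" if $refused;
```

     To parse another document, create a new XML::Parser::Expat instance.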

is_defaulted(ATTNAME)
     NO LONGER WORKS. To find out if an attribute is defaulted please use
     the specified_attr method.

specified_attr
     When the start handler receives lists of attributes and values, the
     non-defaulted (i.e. explicitly specified) attributes occur in the list
     first. This method returns the number of specified items in the list.
     So if this number is equal to the length of the list, there were no
     defaulted values. Otherwise the number points to the index of the
     first defaulted attribute name.
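
     A sketch of telling specified attributes from defaulted ones. The
     document below is made up for the example: it declares a default
     value for attribute "a" in the internal subset and specifies only
     "b" in the start tag.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my ($specified, @attrs);

my $parser = XML::Parser::Expat->new;
$parser->setHandlers(Start => sub {
    my ($p, $el, @list) = @_;
    @attrs     = @list;               # names and values, interleaved
    $specified = $p->specified_attr;  # count of explicitly given items
});
$parser->parse('<!DOCTYPE d [<!ATTLIST d a CDATA "x">]><d b="1"/>');
$parser->release;

# Items from index $specified onward were supplied by the DTD.
my @defaulted = @attrs[$specified .. $#attrs];
print "specified items: $specified of ", scalar(@attrs), "\n";
print "defaulted: @defaulted\n";
```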

finish
     Unsets all handlers (including internal ones that set context), but
     expat continues parsing to the end of the document or until it finds
     an error.  It should finish up a lot faster than with the handlers
     set.

release
     There are data structures used by XML::Parser::Expat that have
     circular references. This means that these structures will never be
     garbage collected unless these references are explicitly broken.
     Calling this method breaks those references (and makes the instance
     unusable.)

     Normally, higher level calls handle this for you, but if you are using
     XML::Parser::Expat directly, then it's your responsibility to call it.
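
     A sketch of using finish to stop handling events early, followed by
     release. The element name "stop" is an assumption of this example.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my $count = 0;

my $parser = XML::Parser::Expat->new;
$parser->setHandlers(Start => sub {
    my ($p, $el) = @_;
    $count++;
    # After finish, no further handlers run, but expat still checks
    # the rest of the document.
    $p->finish if $el eq 'stop';
});
$parser->parse('<a><b/><stop/><c/><d/></a>');
$parser->release;

print "start tags seen before finish: $count\n";
```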

XML::Parser::ContentModel Methods
---------------------------------

   The element declaration handlers are passed objects of this class as the
content model of the element declaration. They also represent content
particles, components of a content model.

   When referred to as a string, these objects are automatically converted
to a string representation of the model (or content particle).

isempty
     This method returns true if the object is "EMPTY", false otherwise.

isany
     This method returns true if the object is "ANY", false otherwise.

ismixed
     This method returns true if the object is "(#PCDATA)" or
     "(#PCDATA|...)*", false otherwise.

isname
     This method returns true if the object is an element name.

ischoice
     This method returns true if the object is a choice of content
     particles.

isseq
     This method returns true if the object is a sequence of content
     particles.

quant
     This method returns undef or a string representing the quantifier
     ('?', '*', '+') associated with the model or particle.

children
     This method returns undef or (for mixed, choice, and sequence types)
     an array of component content particles. There will always be at least
     one component for choices and sequences, but undef is returned for a
     mixed content model of pure PCDATA, "(#PCDATA)".
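
   As a sketch of how these methods might be used, the following Element
handler classifies each declared content model. The DTD, the %kind hash,
and the labels stored in it are assumptions made for this example.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my (%kind, %parts);

my $parser = XML::Parser::Expat->new;
$parser->setHandlers(Element => sub {
    my ($p, $name, $model) = @_;
    $kind{$name} = $model->isempty ? 'empty'
                 : $model->isany   ? 'any'
                 : $model->ismixed ? 'mixed'
                 : $model->isseq   ? 'seq'
                 : $model->ischoice ? 'choice'
                 :                    'other';
    # Stringify each child particle; children may return nothing.
    $parts{$name} = [ map { "$_" } grep { defined } $model->children ];
});
$parser->parse(<<'EOF');
<!DOCTYPE doc [
  <!ELEMENT doc (head, body?)>
  <!ELEMENT head (#PCDATA)>
  <!ELEMENT body EMPTY>
]>
<doc><head>x</head></doc>
EOF
$parser->release;

print "$_: $kind{$_} (@{ $parts{$_} })\n" for sort keys %kind;
```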

XML::Parser::ExpatNB Methods
----------------------------

   The class XML::Parser::ExpatNB is a subclass of XML::Parser::Expat used
for non-blocking access to the expat library. It does not support the
parse, parsestring, or parsefile methods, but it does have these
additional methods:

parse_more(DATA)
     Feed expat more text to munch on.

parse_done
     Tell expat that it's gotten the whole document.
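
   A sketch of feeding a document in arbitrary chunks. It assumes that
XML::Parser::ExpatNB inherits its constructor and setHandlers from
XML::Parser::Expat and is loaded along with it; the chunk boundaries are
chosen arbitrarily.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

my @names;

# Assumption: the subclass can be constructed directly with new.
my $nb = XML::Parser::ExpatNB->new;
$nb->setHandlers(Start => sub { push @names, $_[1] });

# Chunks may split the document anywhere, even inside a tag.
$nb->parse_more($_) for ('<doc><it', 'em/><item', '/></doc>');
$nb->parse_done;

print "saw: @names\n";
```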

FUNCTIONS
=========

XML::Parser::Expat::load_encoding(ENCODING)
     Load an external encoding. ENCODING is either the name of an encoding
     or the name of a file. The basename is converted to lowercase and a
     '.enc' extension is appended unless there's one already there. Then,
     unless it's an absolute pathname (i.e. begins with '/'), the first
     file by that name discovered in the @Encoding_Path path list is used.

     The encoding in the file is loaded and kept in the %Encoding_Table
     table. Earlier encodings of the same name are replaced.

     This function is automatically called by expat when it encounters an
     encoding it doesn't know about. Expat shouldn't call this twice for
     the same encoding name. The only reason users should use this
     function is to explicitly load an encoding not contained in the
     @Encoding_Path list.

AUTHORS
=======

   Larry Wall <`larry@wall.org'> wrote version 1.0.

   Clark Cooper <`coopercc@netheaven.com'> picked up support, changed the
API for this version (2.x), provided documentation, and added some standard
package features.