This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: XML/SimpleObject,  Next: XML/Stream,  Prev: XML/Simple,  Up: Module List

Perl extension allowing a simple object representation of a parsed XML::Parser tree.
************************************************************************************

NAME
====

   XML::SimpleObject - Perl extension allowing a simple object
representation of a parsed XML::Parser tree.

SYNOPSIS
========

     use XML::Parser;
     use XML::SimpleObject;

     my $parser = new XML::Parser (ErrorContext => 2, Style => "Tree");
     my $xmlobj = new XML::SimpleObject ($parser->parse($XML));

     my $filesobj = $xmlobj->child("files")->child("file");

     $filesobj->name;
     $filesobj->value;
     $filesobj->attribute("type");
     
     %attributes     = $filesobj->attributes;
     @children       = $filesobj->children;
     @some_children  = $filesobj->children("some");
     @children_names = $filesobj->children_names;

DESCRIPTION
===========

   This is a short and simple class allowing simple object access to a
parsed XML::Parser tree, with methods for fetching children and attributes
in as clean a manner as possible. My apologies for further polluting the
XML:: space; this is a small and quick module, with easy and compact usage.

USAGE
=====

$xmlobj = new XML::SimpleObject($parser->parse($XML))
     $parser is an XML::Parser object created with Style "Tree":

          my $parser = new XML::Parser (ErrorContext => 2, Style => "Tree");

     After creating $xmlobj, this object can now be used to browse the XML
     tree with the following methods.

$xmlobj->child('NAME')
     This will return a new XML::SimpleObject object using the child
     element NAME.

$xmlobj->children('NAME')
     Called with an argument NAME, children() will return an array of
     XML::SimpleObject objects of element NAME. Thus, if $xmlobj
     represents the top-level XML element, 'children' will return an array
     of all elements directly below the top-level that have the element
     name NAME.

$xmlobj->children
     Called without arguments, 'children()' will return an array of
     XML::SimpleObject objects for all child elements of $xmlobj.  These
     are not returned in the order in which they occur in the XML document.

$xmlobj->children_names
     This will return an array of all the names of child elements for
     $xmlobj. You can use this to step through all the children of a given
     element (see EXAMPLES). Each name will occur only once, even if
     multiple children exist with that name.

$xmlobj->value
     If the element represented by $xmlobj contains any PCDATA, this
     method will return that text data.

$xmlobj->attribute('NAME')
     This returns the text for an attribute NAME of the XML element
     represented by $xmlobj.

$xmlobj->attributes
     This returns a hash of key/value pairs for all attributes of element
     $xmlobj.
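   XML::SimpleObject wraps the nested arrays that XML::Parser returns with
Style "Tree": an element is a name followed by a content array whose first
item is a hash of attributes and whose remaining items are (name, content)
pairs, with text stored under the pseudo-name 0.  The following sketch
builds such a structure by hand and walks it with hypothetical helpers
(tree_child, tree_child_names, tree_value, tree_attributes) that roughly
mirror child(), children_names(), value() and attributes().  It
illustrates the data layout only; it is not XML::SimpleObject's own code.

```perl
     use strict;
     use warnings;

     # Hand-built Tree-style structure for:
     #   <file type="symlink"><name>/etc/dosemu.conf</name><bytes>20</bytes></file>
     my $tree = [
         file => [
             { type => 'symlink' },                    # attributes of <file>
             name  => [ {}, 0 => '/etc/dosemu.conf' ], # child element and its text
             bytes => [ {}, 0 => '20' ],
         ],
     ];

     # The (name, content) pairs that follow the attribute hash.
     sub tree_pairs {
         my ($content) = @_;
         return @$content[ 1 .. $#$content ];
     }

     # Hypothetical: content array of the first child element called $want.
     sub tree_child {
         my ($content, $want) = @_;
         my @items = tree_pairs($content);
         while (my ($name, $body) = splice @items, 0, 2) {
             return $body if $name eq $want;
         }
         return;
     }

     # Hypothetical: each child element name once, sorted ('0' marks text).
     sub tree_child_names {
         my ($content) = @_;
         my @items = tree_pairs($content);
         my %seen;
         while (my ($name) = splice @items, 0, 2) {
             $seen{$name} = 1 unless $name eq '0';
         }
         return sort keys %seen;
     }

     # Hypothetical: concatenated text directly inside an element.
     sub tree_value {
         my ($content) = @_;
         my @items = tree_pairs($content);
         my $text = '';
         while (my ($name, $body) = splice @items, 0, 2) {
             $text .= $body if $name eq '0';
         }
         return $text;
     }

     # Hypothetical: copy of the attribute hash.
     sub tree_attributes { my ($content) = @_; return %{ $content->[0] } }

     my $file  = $tree->[1];
     my @names = tree_child_names($file);                 # ('bytes', 'name')
     my $name  = tree_value(tree_child($file, 'name'));   # '/etc/dosemu.conf'
     my %attr  = tree_attributes($file);                  # (type => 'symlink')
     print "@names | $name | $attr{type}\n";
```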

EXAMPLES
========

   Given this XML document:

     <files>
       <file type="symlink">
         <name>/etc/dosemu.conf</name>
         <dest>dosemu.conf-drdos703.eval</dest>
         <bytes>20</bytes>
       </file>
       <file>
         <name>/etc/passwd</name>
         <bytes>948</bytes>
       </file>
     </files>

   You can then interpret the tree as follows:

     use XML::Parser;
     use XML::SimpleObject;

     # $XML holds the document shown above
     my $parser = new XML::Parser (ErrorContext => 2, Style => "Tree");
     my $xmlobj = new XML::SimpleObject ($parser->parse($XML));

     print "Files: \n";
     foreach my $element ($xmlobj->child("files")->children("file"))
     {
       print "  filename: " . $element->child("name")->value . "\n";
       if ($element->attribute("type"))
       {
         print "    type: " . $element->attribute("type") . "\n";
       }
       print "    bytes: " . $element->child("bytes")->value . "\n";
     }

   This will output:

     Files:
       filename: /etc/dosemu.conf
         type: symlink
         bytes: 20
       filename: /etc/passwd
         bytes: 948

   You can use 'children()' without arguments to step through all children
of a given element:

     my $filesobj = $xmlobj->child("files")->child("file");
     foreach my $child ($filesobj->children) {
       print "child: ", $child->name, ": ", $child->value, "\n";
     }

   For the tree above, this will output:

     child: bytes: 20
     child: dest: dosemu.conf-drdos703.eval
     child: name: /etc/dosemu.conf

   Using 'children_names()', you can step through all children for a given
element:

     my $filesobj = $xmlobj->child("files");
     foreach my $childname ($filesobj->children_names) {
         print "$childname has children: ";
         print join (", ", $filesobj->child($childname)->children_names), "\n";
     }

   This will print:

     file has children: bytes, dest, name

   By combining 'children_names()' with 'child()', you can step through
each child object by name.

AUTHOR
======

   Dan Brian <dbrian@brians.org>

SEE ALSO
========

   perl(1), XML::Parser.


File: pm.info,  Node: XML/Stream,  Next: XML/Stream/Namespace,  Prev: XML/SimpleObject,  Up: Module List

Creates an XML Stream connection and parses return data
*******************************************************

NAME
====

   XML::Stream - Creates an XML Stream connection and parses return data

SYNOPSIS
========

     XML::Stream is an attempt at solidifying the use of XML via streaming.

DESCRIPTION
===========

     This module provides the user with methods to connect to a remote server,
     send a stream of XML to the server, and receive/parse an XML stream from
     the server.  It is primarily based on work for the Etherx XML router
     developed by the Jabber Development Team.  For more information about
     this project visit http://etherx.jabber.org/stream/.

     XML::Stream gives the user the ability to define a central callback
     that will be used to handle the tags received from the server.  These
     tags are passed in the format of an XML::Parser::Tree object.  After
     the closing tag of an object is seen, the tree is finished and passed
     to the callback function.  What the user does with it from there is up
     to them.
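   As a rough sketch of what a node callback might do, the code below
dispatches on the name of the top-level tag it receives.  It assumes the
node arrives in the XML::Parser Tree layout (a tag name followed by a
content array whose first item is the attribute hash), as the description
above suggests; check the notes at the end of Stream.pm for the exact
format.  The handler table, the tag names and the sample nodes are
hypothetical, and noder() is called directly here instead of by Process().

```perl
     use strict;
     use warnings;

     my @log;    # what the handlers did, for inspection

     # Hypothetical handlers keyed by top-level tag name.
     my %handlers = (
         message  => sub { my ($attrs) = @_; push @log, "message from $attrs->{from}" },
         presence => sub { my ($attrs) = @_; push @log, "presence from $attrs->{from}" },
     );

     # Central callback: receives one finished Tree-style node [tag, [\%attrs, ...]].
     sub noder {
         my ($node) = @_;
         my ($tag, $content) = @$node;
         my $handler = $handlers{$tag};
         if (!$handler) {
             push @log, "ignored <$tag>";
             return;
         }
         $handler->($content->[0]);
         return;
     }

     # Simulate two top-level nodes arriving from the stream.
     noder([ message => [ { from => 'alice@example.com' }, 0 => 'hi' ] ]);
     noder([ iq      => [ { type => 'get' } ] ]);
     print "$_\n" for @log;
```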

     For a detailed description of how this module works, and about the data
     structure that it returns, please view the source of Stream.pm and
     look at the detailed description at the end of the file.

METHODS
=======

     new(debug=>string,       - creates the XML::Stream object.  debug should
         debugfh=>FileHandle,   be set to the path for the debug log to be
         debuglevel=>0|1|2,     written.  If set to "stdout" then the debug
         debugtime=>0|1)        output will go there.  You can also specify a
                                filehandle that already exists by using
                                debugfh.  debuglevel determines the amount of
                                debug output to generate: 0 is the least, 2
                                is the most.  debugtime determines whether a
                                timestamp should be prepended to each entry.

     Connect(hostname=>string,       - opens a tcp connection to the
             port=>integer,            specified server and sends the proper
             to=>string,               opening XML Stream tag.  hostname,
             from=>string,             port, and namespace are required.
             myhostname=>string,       namespaces allows you to use
             namespace=>string,        XML::Stream::Namespace objects.
             namespaces=>array,        to is needed if you want the stream
             connectiontype=>string)   "to" attribute to be something other
                                       than the hostname you are connecting
                                       to.  from is needed if you want the
                                       stream "from" attribute to be
                                       something other than the hostname you
                                       are connecting from.  myhostname
                                       should not be needed, but if the
                                       module cannot determine your hostname
                                       properly (check the debug log), set it
                                       to the correct value.  You can also
                                       set it if you want the other side of
                                       the stream to think that you are
                                       someone else.  connectiontype
                                       determines the kind of connection
                                       that is made:
                                         "tcpip"    - TCP/IP (default)
                                         "stdinout" - STDIN/STDOUT

     Disconnect() - sends the proper closing XML tag and closes the socket
                    down.

     Process(integer) - waits for data to be available on the socket.  If
                        a timeout is specified then the Process function
                        waits that period of time before returning nothing.
                        If a timeout period is not specified then the
                        function blocks until data is received.

     OnNode(function pointer) - DEPRECATED.  This function will be removed
                                in a future version.  Instead, use
                                SetCallBacks(node=>function) to do the
                                same thing.

     SetCallBacks(node=>function,   - sets the callback that should be
                  update=>function)   called in various situations.  node
                                      is used to handle the XML::Parser::Tree
                                      trees that are built for each top
                                      level tag.  Update is used for when
                                      Process is blocking waiting for data,
                                      but you want your original code to be
                                      updated.

     GetRoot() - returns, as a hash, the attributes of the stream:stream
                 tag sent by the other end.

     GetSock() - returns a reference to the IO::Socket object.

     Send(string) - sends the string over the connection as is.  This
                    does no checking of whether valid XML was sent, so
                    make sure that what you send is well-formed.

     GetErrorCode() - returns a string that will hopefully contain some
                      useful information about why Process or Connect
                      returned an undef to you.

EXAMPLES
========

     ##########################
     # simple example

     use XML::Stream;

     $stream = new XML::Stream;

     my $status = $stream->Connect(hostname => "jabber.org",
                                   port => 5222,
                                   namespace => "jabber:client");

     if (!defined($status)) {
       print "ERROR: Could not connect to server\n";
       print "       (",$stream->GetErrorCode(),")\n";
       exit(1);
     }

     while($node = $stream->Process()) {
       # do something with $node
     }

     $stream->Disconnect();

     ###########################
     # example using a handler

     use XML::Stream;

     $stream = new XML::Stream;
     $stream->SetCallBacks(node=>\&noder);
     $stream->Connect(hostname => "jabber.org",
                      port => 5222,
                      namespace => "jabber:client",
                      timeout => undef)
       || die "ERROR: Could not connect (", $stream->GetErrorCode(), ")\n";

     # Blocks here forever, noder is called for incoming
     # packets when they arrive.
     while(defined($stream->Process())) { }

     print "ERROR: Stream died (",$stream->GetErrorCode(),")\n";
     
     sub noder
     {
       my $node = shift;
       # do something with $node
     }

AUTHOR
======

   Tweaked, tuned, and brightness changes by Ryan Eatmon, reatmon@ti.com
in May of 2000.  Colorized and Dolby Surround sound added by Thomas
Charron, tcharron@jabber.org.  By Jeremie in October of 1999 for
http://etherx.jabber.org/streams/

COPYRIGHT
=========

   This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.


File: pm.info,  Node: XML/Stream/Namespace,  Next: XML/Stream/Parser,  Prev: XML/Stream,  Up: Module List

Object to make defining Namespaces easier in XML::Stream.
*********************************************************

NAME
====

   XML::Stream::Namespace - Object to make defining Namespaces easier in
XML::Stream.

SYNOPSIS
========

   XML::Stream::Namespace is a helper package to XML::Stream.  It provides
a clean way of defining Namespaces for XML::Stream to use when connecting.

DESCRIPTION
===========

     This module allows you to set and read elements from an XML::Stream
     Namespace.

METHODS
=======

     SetNamespace("mynamespace");
     SetXMLNS("http://www.mynamespace.com/xmlns");
     SetAttributes(attrib1=>"value1",
                   attrib2=>"value2");

     GetNamespace() returns "mynamespace"
     GetXMLNS() returns "http://www.mynamespace.com/xmlns"
     GetAttributes() returns a hash ( attrib1=>"value1",attrib2=>"value2")
     GetStream() returns the following string:
       "xmlns:mynamespace='http://www.mynamespace.com/xmlns'
        mynamespace:attrib1='value1'
        mynamespace:attrib2='value2'"
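   To make the shape of GetStream()'s result concrete, here is a
hypothetical helper, namespace_stream(), that assembles the same kind of
string from a namespace name, an xmlns URL and a set of attributes.  It is
a sketch of the documented output, not the module's implementation;
attribute order, separators and quoting in the real module may differ, and
values are not escaped.

```perl
     use strict;
     use warnings;

     # Hypothetical helper: builds a string shaped like GetStream()'s result.
     # Attributes are sorted for a stable order; values are not escaped.
     sub namespace_stream {
         my ($ns, $xmlns, %attrs) = @_;
         my @parts = ("xmlns:${ns}='$xmlns'");
         push @parts, "${ns}:$_='$attrs{$_}'" for sort keys %attrs;
         return join ' ', @parts;
     }

     my $stream_attrs = namespace_stream(
         'mynamespace', 'http://www.mynamespace.com/xmlns',
         attrib1 => 'value1', attrib2 => 'value2',
     );
     print "$stream_attrs\n";
```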

EXAMPLES
========

     $myNamespace = new XML::Stream::Namespace("mynamespace");
     $myNamespace->SetXMLNS("http://www.mynamespace.org/xmlns");
     $myNamespace->SetAttributes(foo=>"bar",
                                 bob=>"vila");

     $stream = new XML::Stream;
     $stream->Connect(hostname=>"foo.bar.org",
                      port=>1234,
                      namespace=>"foo:bar",
                      namespaces=>[ $myNamespace ]);

     #
     # The above Connect will send the following as the opening string
     # of the stream to foo.bar.org:1234...
     #
     #   <stream:stream
     #    xmlns:stream="http://etherx.jabber.org/streams"
     #    to="foo.bar.org"
     #    xmlns="foo:bar"
     #    xmlns:mynamespace="http://www.mynamespace.org/xmlns"
     #    mynamespace:foo="bar"
     #    mynamespace:bob="vila">
     #

AUTHOR
======

   Written by Ryan Eatmon in February 2000.  Idea by Thomas Charron in
January of 2000 for http://etherx.jabber.org/streams/

COPYRIGHT
=========

   This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.


File: pm.info,  Node: XML/Stream/Parser,  Next: XML/Stream/Parser/DTD,  Prev: XML/Stream/Namespace,  Up: Module List

SAX XML Parser for XML Streams
******************************

NAME
====

     XML::Stream::Parser - SAX XML Parser for XML Streams

SYNOPSIS
========

     Lightweight XML parser that builds XML::Parser::Tree objects from the
     incoming stream and passes them to a function to tell whoever is using
     it that there are new packets.

DESCRIPTION
===========

     This module provides a very lightweight parser.

METHODS
=======

EXAMPLES
========

AUTHOR
======

   By Ryan Eatmon in January of 2001 for http://jabber.org/

COPYRIGHT
=========

   This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.


File: pm.info,  Node: XML/Stream/Parser/DTD,  Next: XML/TiePYX,  Prev: XML/Stream/Parser,  Up: Module List

XML DTD Parser and Verifier
***************************

NAME
====

     XML::Stream::Parser::DTD - XML DTD Parser and Verifier

SYNOPSIS
========

     This is a work in progress.  I had need for a DTD parser and verifier
     and so am working on it here.  If you are reading this then you are
     snooping.  =)

DESCRIPTION
===========

     This module provides the initial code for a DTD parser and verifier.

METHODS
=======

EXAMPLES
========

AUTHOR
======

   By Ryan Eatmon in February of 2001 for http://jabber.org/

COPYRIGHT
=========

   This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.


File: pm.info,  Node: XML/TiePYX,  Next: XML/TokeParser,  Prev: XML/Stream/Parser/DTD,  Up: Module List

Read or write XML data in PYX format via tied filehandle
********************************************************

NAME
====

   XML::TiePYX - Read or write XML data in PYX format via tied filehandle

SYNOPSIS
========

     use XML::TiePYX;

     tie *XML,'XML::TiePYX','file.xml';

     open IN,'file.xml' or die $!;
     tie *XML,'XML::TiePYX',\*IN,Condense=>0;

     my $text='<tag xmlns="http://www.omsdev.com">text</tag>';
     tie *XML,'XML::TiePYX',\$text,Namespaces=>1;

     tie *XML,'XML::TiePYX',\*STDOUT;
     print XML "(start\n","-Hello, world!\n",")start\n";

DESCRIPTION
===========

   XML::TiePYX lets you use a tied filehandle to read from or write to an
XML file or string.  PYX is a line-oriented, parsed representation of XML
developed by Sean McGrath (http://www.pyxie.org).  Each line corresponds to
one "event" in the XML, with the first character indicating the type of
event:

(
     The start of an element; the rest of the line is its name.

A
     An attribute; the rest of the line is the attribute's name, a space,
     and its value.

)
     The end of an element; the rest of the line is its name.

-
     Literal text (characters).  The rest of the line is the text.

?
     A processing instruction.  The rest of the line is the instruction's
     target, a space, and the instruction's value.

   Newlines in attribute values, text, and processing instruction values
are represented as the literal sequence '\n' (that is, a backslash
followed by an 'n').  By default, consecutive runs of characters are
gathered into a single text event when reading, but this behavior can be
disabled (see the Condense option).  Comments are *not* available through
PYX.
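   A PYX line can be decoded with a few lines of plain Perl.  The
hypothetical parse_pyx_line() below maps the first character to an event
type, splits attribute and processing-instruction lines on the first
space, and turns the literal '\n' sequence in values back into a newline.
It illustrates the format described above and is not part of XML::TiePYX.

```perl
     use strict;
     use warnings;

     my %EVENT = ('(' => 'start', ')' => 'end', 'A' => 'attr',
                  '-' => 'text',  '?' => 'pi');

     # Hypothetical helper: one PYX line -> [event type, fields...].
     sub parse_pyx_line {
         my ($line) = @_;
         chomp $line;
         my $code = substr($line, 0, 1);
         die "unknown PYX line: $line\n" unless exists $EVENT{$code};
         my $rest = substr($line, 1);

         # Attributes and PIs carry "name value": split on the first space only.
         my @fields = ($code eq 'A' || $code eq '?') ? split(/ /, $rest, 2) : ($rest);

         # Values (not element names) may contain the two-character sequence \n.
         $fields[-1] =~ s/\\n/\n/g if @fields && $code ne '(' && $code ne ')';
         return [ $EVENT{$code}, @fields ];
     }

     my @events = map { parse_pyx_line($_) }
         ("(file\n", "Atype symlink\n", "-line one\\nline two\n", ")file\n");
     print join('|', @$_), "\n" for @events;
```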

   Just as SAX is an API well suited to "push"-mode XML parsing, PYX is
well-suited to "pull"-mode parsing, where you want to capture the state of
the parse through your program's flow of code rather than through a bunch
of state variables.  This module uses incremental parsing to avoid the
need to buffer up large numbers of events.

   This module implements an (unofficial) extension to the PYX format to
allow namespace processing.  If namespaces are enabled, an element or
attribute name will be prefixed by its namespace URI (*NOT* any namespace
prefix used in the document) enclosed in curly braces.  A name with no
namespace will be prefixed with {}.  At the present time, this module does
not implement namespace processing in output mode; attempting to write
'(', ')', or 'A' lines that contain a namespace URI in curly braces will
merely result in generating ill-formed element or attribute names.
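   With namespaces enabled, a name can be split back into its URI and
local part with a single regular expression.  split_ns_name() is a
hypothetical helper; it returns an empty URI for the {} prefix and an
undefined URI for a name with no braces at all (that is, when namespace
processing was not enabled).

```perl
     use strict;
     use warnings;

     # Hypothetical helper: "{uri}local" -> (uri, local).  "{}local" gives an
     # empty URI; a name without braces gives an undefined URI.
     sub split_ns_name {
         my ($name) = @_;
         return ($1, $2) if $name =~ /^\{([^}]*)\}(.*)\z/s;
         return (undef, $name);
     }

     my ($uri, $local) = split_ns_name('{http://www.omsdev.com}tag');
     print "uri=$uri local=$local\n";
```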

INTERFACE
=========

     tie *tied_handle, 'XML::TiePYX', source, [Option=>value,...]

   *tied_handle* is the filehandle which the PYX events will be read from
or written to.

   source is either a reference to a string containing the XML, the name of
a file containing the XML, or an open IO::Handle or filehandle glob
reference from which the XML can be read or to which it can be written.

   The Options can be any options allowed by XML::Parser and
XML::Parser::Expat, as well as four module-specific options:

Validating
     This will provide a validating parse by using XML::Checker::Parser in
     place of XML::Parser if set to a true value.

Condense
     Causes all consecutive runs of character data to be gathered up into a
     single PYX event if set to a true value (the default).  If set false,
     multiple consecutive character data events may occur in the stream
     (which may be desirable when dealing with large chunks of text).
     This option has no effect when writing.

Latin
     If set to a true value, causes Unicode characters in the range
     128-255 to be returned as ISO-Latin-1 characters rather than UTF-8
     characters when reading, and an XML declaration specifying an
     encoding of "ISO-8859-1" to be output when writing.

Catalog
     Specifies the URL of a catalog to use for resolving public
     identifiers and remapping system identifiers used in document type
     declarations or external entity references.  This option requires
     XML::Catalog to be installed.

   The tied filehandle may be read from with the diamond operator
(<HANDLE>), getc(), or read().  The diamond operator always returns a line
at a time regardless of the setting of $/.  It may be written to with
print() or printf(); it is necessary to print one or more complete PYX
lines at a time.  This module does not support read/write mode.
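   For readers new to tied filehandles: tie() binds a glob to a class
whose methods (TIEHANDLE, READLINE, PRINT, CLOSE and so on) run in place
of real I/O.  The toy class ArrayPYX below is hypothetical and simply
serves canned PYX lines, which shows why <XML> yields one event per line;
XML::TiePYX plays the same role but produces its lines by parsing XML.

```perl
     use strict;
     use warnings;

     package ArrayPYX;    # hypothetical toy class, not part of XML::TiePYX

     # tie *FH, 'ArrayPYX', @lines  calls  ArrayPYX->TIEHANDLE(@lines)
     sub TIEHANDLE {
         my ($class, @lines) = @_;
         return bless { queue => [@lines], out => [] }, $class;
     }

     # <FH>: one line in scalar context, all remaining lines in list context.
     sub READLINE {
         my ($self) = @_;
         return wantarray ? splice(@{ $self->{queue} }) : shift @{ $self->{queue} };
     }

     # print FH ...: collect what was written.
     sub PRINT {
         my ($self, @args) = @_;
         push @{ $self->{out} }, join('', @args);
         return 1;
     }

     sub CLOSE { return 1 }

     package main;

     tie *PYX, 'ArrayPYX', "(outline\n", "(sect\n", "Atitle Intro\n", ")sect\n", ")outline\n";
     my $first = <PYX>;               # "(outline\n"
     my @rest  = <PYX>;               # the remaining four lines
     print PYX "-hello\n";
     my $written = (tied *PYX)->{out}[0];
     close PYX;
```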

EXAMPLE
=======

   This program (*psectp.plx* in the distribution) prints a numbered
outline from an XML file in which an <outline> can contain zero or more
<sect>s, each with a title attribute, and each <sect> can contain zero or
more nested <sect>s or <para>s containing text, as in the *sects.otl* file
included with the distribution.  The -c option makes it print just a table
of contents.

   This is actually a traditional recursive-descent parser using PYX
events as tokens.

     #!/usr/bin/perl -w

     use strict;
     use XML::TiePYX;
     use Text::Wrap;
     use Getopt::Std;

     my (@sectnums,%opts);

     getopts('c',\%opts);

     die "usage: psect [-c] file\n" unless @ARGV==1;

     tie *XML,'XML::TiePYX',$ARGV[0];
     die "illegal structure" unless get_event() =~ /^\(outline/;
     push @sectnums,0;
     print_sect() while get_event() =~ /^\(sect/;
     die "illegal structure" unless /^\)outline/;
     close XML;

     sub print_sect {
       <XML>=~/^Atitle (.*)/ or die "missing title";
       ++$sectnums[-1];
       print ' ' x (4*$#sectnums),join('.',@sectnums)," $1\n";
       print "\n" unless $opts{c};
       push @sectnums,0;
       while (get_event() !~ /^\)sect/) {
         /^\(sect/ and print_sect(),next;
         /^\(para/ and print_para(),next;
         die "illegal structure";
       }
       pop @sectnums;
     }
     
     sub print_para {
       die "illegal structure" unless <XML> =~ /^-(.*)/;
       $_=$1;
       s/\\n/ /g;
       s/^\s+//;
       s/\s+$//;
       print wrap((' ' x (4*($#sectnums-1))) x 2,$_),"\n\n" unless $opts{c};
       die "illegal structure" unless <XML> =~ /^\)para/;
     }

     sub get_event {
       $_=<XML>;
       $_=<XML> if /^-(\s|\\n)*$/;
       $_;
     }

RATIONALE
=========

   There's already an XML::PYX module (written by Matt Sergeant)
available, so why another PYX implementation?  Mainly because XML::PYX is
intended to be used in a standalone PYX-outputting program which you open
as a pipe.  That works very well under Unix, aside from the overhead of
forking a separate process, but is problematic on Win32 systems for a
variety of niggling reasons: the standalone script is supplied as a batch
file, whose output can't be properly redirected into a pipe unless you
invoke it as 'perl /perl/bin/pyx|' instead of just 'pyx|'.  Both Win95 and
Win98, as well as possibly other Win32 systems, implement pipes using
temporary files and the reading process can't start reading until the
writing process is done writing, which means that if you're parsing a huge
file you may have to wait a long time before getting *any* output.  The
ability to guarantee a single character data event for any run of
characters can often simplify processing.  And finally, when I wrote this
the only supported namespace-aware way to parse XML was the raw handlers
interface of XML::Parser, which is needlessly complicated for simple
applications (there are, of course, those who would argue that "simple
applications" and "namespace-aware" are mutually-exclusive categories).

BUGS
====

   The Validating option does not work correctly, as XML::Checker::Parser
does not implement the parse_start() method.

   Error handling leaves much to be desired.

AUTHOR
======

   Eric Bohlman (ebohlman@netcom.com, ebohlman@omsdev.com)

COPYRIGHT
=========

   Copyright 2000 Eric Bohlman.  All rights reserved.

   This program is free software; you can use/modify/redistribute it under
the same terms as Perl itself.

SEE ALSO
========

     XML::PYX
     XML::Parser
     XML::Parser::Expat
     XML::Checker
     XML::Catalog
     perl(1).


File: pm.info,  Node: XML/TokeParser,  Next: XML/TreeBuilder,  Prev: XML/TiePYX,  Up: Module List

Simplified interface to XML::Parser
***********************************

NAME
====

   XML::TokeParser - Simplified interface to XML::Parser

SYNOPSIS
========

     use XML::TokeParser;

     #parse from file
     my $p=XML::TokeParser->new('file.xml');

     #parse from open handle
     open IN,'file.xml' or die $!;
     my $p=XML::TokeParser->new(\*IN,Noempty=>1);

     #parse literal text
     my $text='<tag xmlns="http://www.omsdev.com">text</tag>';
     my $p=XML::TokeParser->new(\$text,Namespaces=>1);

     #read next token
     my $token=$p->get_token();

     #skip to <title> and read text
     $p->get_tag('title');
     $p->get_text();

     #read text of next <para>, ignoring any internal markup
     $p->get_tag('para');
     $p->get_trimmed_text('/para');

DESCRIPTION
===========

   XML::TokeParser provides a procedural ("pull mode") interface to
XML::Parser in much the same way that Gisle Aas' HTML::TokeParser provides
a procedural interface to HTML::Parser.  XML::TokeParser splits its XML
input up into "tokens," each corresponding to an XML::Parser event.

   A token is a reference to an array whose first element is an event-type
string and whose last element is the literal text of the XML input that
generated the event, with intermediate elements varying according to the
event type:

Start tag
     The token has five elements: 'S', the element's name, a reference to
     a hash of attribute values keyed by attribute names, a reference to
     an array of attribute names in the order in which they appeared in
     the tag, and the literal text.

End tag
     The token has three elements: 'E', the element's name, and the
     literal text.

Character data (text)
     The token has three elements: 'T', the parsed text, and the literal
     text.  All contiguous runs of text are gathered into single tokens;
     there will never be two 'T' tokens in a row.

Comment
     The token has three elements: 'C', the parsed text of the comment,
     and the literal text.

Processing instruction
     The token has four elements: 'PI', the target, the data, and the
     literal text.

   The literal text includes any markup delimiters (pointy brackets,
<![CDATA[, etc.), entity references, and numeric character references and
is in the XML document's original character encoding.  All other text is in
UTF-8 (unless the Latin option is set, in which case it's in ISO-8859-1)
regardless of the original encoding, and all entity and character
references are expanded.

   If the Namespaces option is set, element and attribute names are
prefixed by their (possibly empty) namespace URIs enclosed in curly
brackets and xmlns:* attributes do not appear in 'S' tokens.
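   Because every token is an array reference whose first element is the
type code, code that consumes tokens usually switches on $token->[0].  The
tokens below are built by hand for a tiny document rather than produced by
XML::TokeParser, and describe_token() is a hypothetical helper.

```perl
     use strict;
     use warnings;

     # Tokens shaped as described above, built by hand for <p class="x">Hi</p>.
     # The literal text is always the last element of each token.
     my @tokens = (
         [ 'S', 'p', { class => 'x' }, ['class'], '<p class="x">' ],
         [ 'T', 'Hi', 'Hi' ],
         [ 'E', 'p', '</p>' ],
     );

     # Summarize a token by switching on its type code in element 0.
     sub describe_token {
         my ($t) = @_;
         my $type = $t->[0];
         return "start $t->[1] (" . join(',', @{ $t->[3] }) . ")" if $type eq 'S';
         return "end $t->[1]"    if $type eq 'E';
         return "text '$t->[1]'" if $type eq 'T';
         return "comment"        if $type eq 'C';
         return "pi $t->[1]"     if $type eq 'PI';
         return "unknown";
     }

     print describe_token($_), "\n" for @tokens;

     # For this sample, joining the literal texts reproduces the input markup.
     my $literal = join '', map { $_->[-1] } @tokens;
```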

METHODS
=======

$p = XML::TokeParser->new($input, [options])
     Creates a new parser, specifying the input source and any options.  If
     $input is a string, it is the name of the file to parse.  If $input
     is a reference to a string, that string is the actual text to parse.
     If $input is a reference to a typeglob or an IO::Handle object
     corresponding to an open file or socket, the text read from the
     handle will be parsed.

     Options are name=>value pairs and can be any of the following:

    Namespaces
          If set to a true value, namespace processing is enabled.

    ParseParamEnt
          This option is passed on to the underlying XML::Parser object;
          see that module's documentation for details.

    Noempty
          If set to a true value, text tokens consisting of only
          whitespace (such as those created by indentation and line breaks
          in between tags) will be ignored.

    Latin
          If set to a true value, all text other than the literal text
          elements of tokens will be translated into the ISO 8859-1
          (Latin-1) character encoding rather than the normal UTF-8
          encoding.

    Catalog
          The value is the URI of a catalog file used to resolve PUBLIC
          and SYSTEM identifiers.  See XML::Catalog for details.

$token = $p->get_token()
     Returns the next token, as an array reference, from the input.
     Returns undef if there are no remaining tokens.

$p->unget_token($token,...)
     Pushes tokens back so they will be re-read.  Useful if you've read
     one or more tokens too far.

$token = $p->get_tag( [$token] )
     If no argument is given, skips tokens until the next start tag or end
     tag token.  If an argument is given, skips tokens until the start tag or
     end tag (if the argument begins with '/') for the named element.  The
     returned token does not include an event type code; its first element
     is the element name, prefixed by a '/' if the token is for an end tag.
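   The skipping behavior of get_tag() can be modelled over a plain array of
tokens.  next_tag() below is a hypothetical re-creation for illustration
only (XML::TokeParser reads its tokens from the parser as it goes); it
returns tags in the shape described above, with the type code dropped and
'/' prepended to end-tag names.

```perl
     use strict;
     use warnings;

     # Hypothetical model of get_tag(): skip tokens until the next start or end
     # tag, or until the one named by $want (e.g. 'title' or '/title').
     sub next_tag {
         my ($tokens, $want) = @_;
         while (my $t = shift @$tokens) {
             next unless $t->[0] eq 'S' || $t->[0] eq 'E';
             my $name = $t->[0] eq 'E' ? "/$t->[1]" : $t->[1];
             next if defined $want && $name ne $want;
             # Drop the type code; keep the remaining fields after the name.
             return [ $name, @$t[ 2 .. $#$t ] ];
         }
         return;    # no matching tag before the end of input
     }

     my @tokens = (
         [ 'T', "\n", "\n" ],
         [ 'S', 'title', {}, [], '<title>' ],
         [ 'T', 'METHODS', 'METHODS' ],
         [ 'E', 'title', '</title>' ],
     );
     my $start = next_tag(\@tokens, 'title');   # [ 'title', {}, [], '<title>' ]
     my $end   = next_tag(\@tokens);            # [ '/title', '</title>' ]
```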

$text = $p->get_text( [$token] )
     If no argument is given, returns the text at the current position, or an
     empty string if the next token is not a 'T' token.  If an argument is
     given, gathers up all text between the current position and the
     specified start or end tag, stripping out any intervening tags (much
     like the way a typical Web browser deals with unknown tags).

$text = $p->get_trimmed_text( [$token])
     Like get_text(), but deletes any leading or trailing whitespace and
     collapses runs of whitespace (including newlines) into single spaces.
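   The whitespace normalization that get_trimmed_text() describes amounts
to three substitutions.  trim_text() is a hypothetical stand-alone sketch
of that step, applied to a string instead of to gathered tokens.

```perl
     use strict;
     use warnings;

     # Sketch of get_trimmed_text()'s clean-up step: drop leading and trailing
     # whitespace, then collapse every internal run (newlines included) to one space.
     sub trim_text {
         my ($text) = @_;
         $text =~ s/^\s+//;
         $text =~ s/\s+$//;
         $text =~ s/\s+/ /g;
         return $text;
     }

     my $clean = trim_text("  Creates a new\n     parser,   specifying  ");
     print "[$clean]\n";
```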

DIFFERENCES FROM HTML::TokeParser
=================================

   Uses a true XML parser rather than a modified HTML parser.

   Text and comment tokens include extracted text as well as literal text.

   PI tokens include target and data as well as literal text.

   No tokens for declarations.

   No "textify" hash.

EXAMPLES
========

Print method signatures from the XML version of this POD page
-------------------------------------------------------------

     #!/usr/bin/perl -w
     use strict;
     use XML::TokeParser;
     my $t;
     my $p=XML::TokeParser->new('tokeparser.xml',Noempty=>1) or die $!;
     while ($p->get_tag('title') && $p->get_text('/title') ne 'METHODS') {
       ;
     }
     $p->get_tag('list');
     while (($t=$p->get_tag()->[0]) ne '/list') {
       if ($t eq 'item') {
         $p->get_tag('itemtext');
         print $p->get_text('/itemtext'),"\n";
         $p->get_tag('/item');
       }
       else {
         $p->get_tag('/list');  # assumes no nesting here!
       }
     }

AUTHOR
======

   Eric Bohlman (ebohlman@omsdev.com)

   Copyright (c) 2001 Eric Bohlman. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself.

SEE ALSO
========

     XML::Parser
     XML::Catalog
     HTML::TokeParser


File: pm.info,  Node: XML/TreeBuilder,  Next: XML/Twig,  Prev: XML/TokeParser,  Up: Module List

Parser that builds a tree of XML::Element objects
*************************************************

NAME
====

   XML::TreeBuilder - Parser that builds a tree of XML::Element objects

SYNOPSIS
========

     use XML::TreeBuilder;

     foreach my $file_name (@ARGV) {
       my $tree = XML::TreeBuilder->new; # empty tree
       $tree->parse_file($file_name);
       print "Hey, here's a dump of the parse tree of $file_name:\n";
       $tree->dump; # a method we inherit from XML::Element
       print "And here it is, bizarrely rerendered as XML:\n",
         $tree->as_XML, "\n";
     
       # Now that we're done with it, we must destroy it.
       $tree = $tree->delete;
     }

DESCRIPTION
===========

   This module uses XML::Parser to make XML document trees constructed of
XML::Element objects (and XML::Element is a subclass of HTML::Element
adapted for XML).  XML::TreeBuilder is meant particularly for people who
are used to the HTML::TreeBuilder / HTML::Element interface to document
trees, and who don't want to learn some other document interface like
XML::Twig or XML::DOM.

   The way to use this class is to:

   1. start a new (empty) XML::TreeBuilder object.

   2. set any of the "store" options you want.

   3. then parse the document from a source by calling `$x->parsefile(...)'
or `$x->parse(...)' (see *Note XML/Parser: XML/Parser, for the options
that these two methods take).

   4. do whatever you need to do with the syntax tree, presumably
involving traversing it looking for some bit of information in it,

   5. and finally, when you're done with the tree, call $tree->delete to
erase the contents of the tree from memory.  This kind of thing usually
isn't necessary with most Perl objects, but it's necessary for TreeBuilder
objects.  See *Note HTML/Element: HTML/Element, for a more verbose
explanation of why this is the case.

METHODS AND ATTRIBUTES
======================

   XML::TreeBuilder is a subclass of XML::Element, which in turn is a
subclass of HTML::Element.  You should read and understand the
documentation for those two modules.

   An XML::TreeBuilder object is just a special XML::Element object that
allows you to call these additional methods:

$root = XML::TreeBuilder->new()
     Construct a new XML::TreeBuilder object.

$root->parse(...options...)
     Uses XML::Parser's parse method to parse XML from the source(s)
     specified by the options.  See *Note XML/Parser: XML/Parser,.

$root->parsefile(...options...)
     Uses XML::Parser's parsefile method to parse XML from the source(s)
     specified by the options.  See *Note XML/Parser: XML/Parser,.

$root->parse_file(...options...)
     Simply an alias for parsefile.

$root->store_comments(value)
     This determines whether TreeBuilder will normally store comments found
     while parsing content into $root.  Currently, this is off by default.

$root->store_declarations(value)
     This determines whether TreeBuilder will normally store markup
     declarations found while parsing content into $root.  Currently, this
     is off by default.

$root->store_pis(value)
     This determines whether TreeBuilder will normally store processing
     instructions found while parsing content into $root.  Currently, this
     is off (false) by default.

SEE ALSO
========

   *Note XML/Parser: XML/Parser,, *Note XML/Element: XML/Element,, *Note
HTML/TreeBuilder: HTML/TreeBuilder,, *Note HTML/DOMbo: HTML/DOMbo,.

   And for alternate XML document interfaces, *Note XML/DOM: XML/DOM, and
*Note XML/Twig: XML/Twig,.

COPYRIGHT
=========

   Copyright 2000 Sean M. Burke.

AUTHOR
======

   Sean M. Burke, <sburke@cpan.org>


File: pm.info,  Node: XML/Twig,  Next: XML/UM,  Prev: XML/TreeBuilder,  Up: Module List

A perl module for processing huge XML documents in tree mode.
*************************************************************

NAME
====

   XML::Twig - A perl module for processing huge XML documents in tree
mode.

SYNOPSIS
========

     single-tree mode
         my $t= new XML::Twig();
         $t->parse( '<doc><para>para1</para></doc>');
         $t->print;

     chunk mode
         my $t= new XML::Twig( TwigHandlers => { section => \&flush});
         $t->parsefile( 'doc.xml');
         $t->flush;
         sub flush { $_[0]->flush; }

     my $t= new XML::Twig( TwigHandlers =>
                             { 'section/title' => \&print_elt_text} );
     $t->parsefile( 'doc.xml');
     sub print_elt_text
       { my( $t, $elt)= @_;
         print $elt->text;
       }

     my $t= new XML::Twig( TwigHandlers =>
                             { 'section[@level="1"]' => \&print_elt_text }
                         );
     $t->parsefile( 'doc.xml');

     roots mode (builds only the required sub-trees)
         my $t= new XML::Twig(
                  TwigRoots    => { 'section/title' => \&print_elt_text}
                             );
         $t->parsefile( 'doc.xml');
         sub print_elt_text
           { my( $t, $elt)= @_;
             print $elt->text;
           }

DESCRIPTION
===========

   This module provides a way to process XML documents. It is built on top
of XML::Parser.

   The module offers a tree interface to the document, while allowing you
to output the parts of it that have been completely processed.

   It allows minimal resource (CPU and memory) usage by building the tree
only for the parts of the document that need actual processing, through
the use of the TwigRoots and TwigPrintOutsideRoots options. The finish and
finish_print methods also help to improve performance.

   XML::Twig tries to make simple things easy, so it tries its best to
take care of a lot of the (usually) annoying (but sometimes necessary)
features that come with XML and XML::Parser.

Whitespace
     Whitespace that looks non-significant is discarded; this behaviour
     can be controlled using the KeepSpaces, KeepSpacesIn and
     DiscardSpacesIn options.

Encoding
     You can specify that you want the output in the same encoding as the
     input (provided you have valid XML, which means you have to specify
     the encoding either in the document or when you create the Twig
     object) using the KeepEncoding option.

METHODS
=======

Twig
----

   XML::Twig is a subclass of XML::Parser, so all XML::Parser methods can
be called on a twig object, including parse and parsefile.  setHandlers,
on the other hand, cannot be used.

new
     This is a class method, the constructor for XML::Twig. Options are
     passed as keyword value pairs. Recognized options are the same as
     XML::Parser, plus some XML::Twig specifics:

    TwigHandlers
          This argument replaces the corresponding XML::Parser argument.
          It consists of a hash { expression => \&handler} where
          expression is a *generic_attribute_condition*, a
          *string_condition*, a *regexp_condition*, an
          *attribute_condition*, a *full_path*, a *partial_path*, a gi,
          *_default_* or *_all_*.

          The idea is to support a useful but efficient (thus limited)
          subset of XPath. A fuller expression set will be supported in
          the future, as users ask for more and as I manage to implement
          it efficiently. This will never encompass all of XPath due to
          the streaming nature of parsing (no lookahead after the element
          end tag).

          A *generic_attribute_condition* is a condition on an attribute,
          in the form `*[@att="val"]' or `*[@att]'; single quotes can be
          used instead of double quotes, and the leading '*' is actually
          optional. Whatever the gi of the element, the handler will be
          triggered if the attribute has the specified value or, in the
          `*[@att]' form, if the attribute simply exists.

          A *string_condition* is a condition on the content of an
          element, in the form *gi[string()="foo"]*; single quotes can be
          used instead of double quotes. At the moment you cannot escape
          the quotes (this will be added as soon as I dig out my copy of
          Mastering Regular Expressions from its storage box).  The text
          returned is, as per what I (and Matt Sergeant!) understood from
          the XPath spec, the concatenation of all the text in the element,
          excluding all markup. Thus to call a handler on the element
          <p>text <b>bold</b></p> the appropriate condition is
          p[string()="text bold"]. Note that this is not exactly
          conformant to the XPath spec; it just tries to mimic it while
          still being quite concise.
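          To make the string() value concrete, here is a small pure-Perl
          sketch. It only illustrates the rule described above and is not
          how XML::Twig computes it: the hypothetical helper string_value
          strips markup with a naive regular expression (it ignores
          comments, CDATA and entities), and the result is then compared
          with the literal of the condition.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: approximate the string()
# value of an element by removing every tag and keeping the text between.
# A naive regex like this is no substitute for a real XML parser.
sub string_value {
    my ($xml) = @_;
    (my $text = $xml) =~ s/<[^>]*>//g;
    return $text;
}

my $elt   = '<p>text <b>bold</b></p>';
my $value = string_value($elt);    # "text bold"
print qq{p[string()="text bold"] matches\n} if $value eq 'text bold';
```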

          An extension of that notation is *gi[string(*child_gi*)="foo"]*,
          where the handler will be called if a child of a gi element has
          a text value of foo.  At the moment only direct children of the
          gi element are checked. If you need to test on descendants of
          the element let me know. The fix is trivial but would slow down
          the checks, so I'd like to keep it the way it is.

          A *regexp_condition* is a condition on the content of an
          element, in the form *gi[string()=~ /foo/]*. This is the same
          as a string condition except that the text of the element is
          matched against the regexp. The i, m, s and o modifiers can be
          used on the regexp.

          The *gi[string(*child_gi*)=~ /foo/]* extension is also
          supported.

          An *attribute_condition* is a simple condition on an attribute
          of the current element, in the form *gi[@att="val"]* (single
          quotes can be used instead of double quotes; you cannot escape
          quotes either).  If several attribute_conditions are true for
          the same element, all the handlers can be called in turn (in the
          order in which they were first defined).  If the ="val" part is
          omitted (the condition is then gi[@att]), then the handler is
          triggered if the attribute actually exists for the element, no
          matter what its value is.

          A *full_path* looks like *'/doc/section/chapter/title'*: it
          starts with a / and then lists all the gis leading to the
          element. The handler will be called if the path to the current
          element (in the input document) is exactly as defined by the
          full_path.

          A *partial_path* is like a full_path except it does not start
          with a /: *'chapter/title'* for example. The handler will be
          called if the path to the element (in the input document) ends
          as defined in the partial_path.
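          The difference between the two path forms can be sketched in
          pure Perl. The helper path_matches is hypothetical and not part
          of XML::Twig; the sketch assumes the path of the current element
          is available as a list of gis, outermost first.

```perl
use strict;
use warnings;

# Hypothetical helper: does the path of the current element (a list of
# gis from the root down) match a full_path or partial_path expression?
sub path_matches {
    my ($expr, @path) = @_;
    my $current = '/' . join('/', @path);    # e.g. "/doc/section/title"
    # full_path: starts with '/', the whole path must be identical
    return $current eq $expr if $expr =~ m{^/};
    # partial_path (or a plain gi): the path must end with "/$expr"
    return $current =~ m{/\Q$expr\E\z} ? 1 : 0;
}

my @path = qw(doc section chapter title);
print "full path matches\n"    if path_matches('/doc/section/chapter/title', @path);
print "partial path matches\n" if path_matches('chapter/title', @path);
```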

          WARNING: (hopefully temporary) at the moment *string_condition*,
          *regexp_condition* and *attribute_condition* are only supported
          on a simple gi, not on a path.

          A gi (generic identifier) is just a tag name.

          A special gi *_all_* is used to call a function for each element.
          The special gi *_default_* is used to call a handler for each
          element that does NOT have a specific handler.

          The order of precedence to trigger a handler is:
          *generic_attribute_condition*, *string_condition*,
          *regexp_condition*, *attribute_condition*, *full_path*, longer
          *partial_path*, shorter *partial_path*, gi, *_default_* .
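          The precedence list can be pictured as a sort. The following
          pure-Perl sketch is an illustration under stated assumptions,
          not XML::Twig's dispatch code: the classify helper guesses an
          expression's kind from its syntax (the allowed name characters
          are an assumption), and by_precedence only orders expressions
          already known to apply to the same element; *_all_* is left out
          because it is always called.

```perl
use strict;
use warnings;

# Documented precedence, highest first. Longer partial paths come first.
my %rank = (
    generic_attribute_condition => 0,
    string_condition            => 1,
    regexp_condition            => 2,
    attribute_condition         => 3,
    full_path                   => 4,
    partial_path                => 5,
    gi                          => 6,
    _default_                   => 7,
);

my $name = qr/[\w:.-]+/;    # assumed character set for gis and attributes

# Hypothetical classifier: guess the kind of a trigger expression.
sub classify {
    my ($expr) = @_;
    return 'generic_attribute_condition'
        if $expr =~ /^\*?\[\@$name(?:=(["']).*\1)?\]$/;
    return 'string_condition'
        if $expr =~ /^$name\[string\((?:$name)?\)\s*=\s*(["']).*\1\]$/;
    return 'regexp_condition'
        if $expr =~ m{^$name\[string\((?:$name)?\)\s*=~\s*/.*/[imso]*\]$};
    return 'attribute_condition'
        if $expr =~ /^$name\[\@$name(?:=(["']).*\1)?\]$/;
    return 'full_path'    if $expr =~ m{^/};
    return 'partial_path' if $expr =~ m{/};
    return '_default_'    if $expr eq '_default_';
    return 'gi';
}

# Number of steps in a path, used to put longer partial paths first.
sub steps { my @steps = split m{/}, $_[0]; return scalar @steps }

# Order the expressions that apply to one element by precedence.
sub by_precedence {
    return sort {
        $rank{ classify($a) } <=> $rank{ classify($b) }
            || steps($b) <=> steps($a)
    } grep { $_ ne '_all_' } @_;
}

print "$_\n" for by_precedence('title', 'chapter/title', 'title[@id]');
```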

          *Important*: once a handler has been triggered, if it returns 0
          then no other handler is called, except an *_all_* handler, which
          will be called anyway.

          If a handler returns a true value and other handlers apply, then
          the next applicable handler will be called. Repeat, rinse,
          lather...
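          The calling rule can be modelled in a few lines of pure Perl.
          This is a hypothetical sketch, not XML::Twig's implementation:
          call_handlers receives the applicable handlers already in
          precedence order, stops at the first false return value, and
          then calls the *_all_* handler; calling *_all_* last is an
          assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical model of the handler chain: call handlers in order until
# one returns a false value; an _all_ handler, if any, is called anyway
# (last, in this sketch).
sub call_handlers {
    my ($elt, $all, @ordered) = @_;    # @ordered: [ name, code ] pairs
    my @called;
    for my $handler (@ordered) {
        my ($name, $code) = @$handler;
        push @called, $name;
        last unless $code->($elt);     # false return: stop the chain
    }
    if ($all) {
        $all->($elt);
        push @called, '_all_';
    }
    return @called;
}

my @called = call_handlers(
    'some element', sub { 1 },
    [ 'title[@id]'    => sub { 1 } ],    # true: go on to the next handler
    [ 'chapter/title' => sub { 0 } ],    # 0: stop here
    [ 'title'         => sub { 1 } ],    # never reached
);
print "@called\n";    # title[@id] chapter/title _all_
```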

          When an element is CLOSED the corresponding handler is called,
          with 2 arguments: the twig and the element (an XML::Twig::Elt
          object, see `Elt' below). The twig includes the document tree
          that has been built so far; the element is the complete sub-tree
          for the element.

          Text is stored in elements whose gi is #PCDATA (because of mixed
          content, i.e. text and sub-elements within the same element,
          there is no way to store the text as just an attribute of the
          enclosing element).

          Warning: if you have used purge or flush on the twig the element
          might not be complete, some of its children might have been
          entirely flushed or purged, and the start tag might even have
          been printed (by flush) already, so changing its gi might not
          give the expected result.

          More generally, the *full_path*, *partial_path* and gi
          expressions are evaluated against the input document. This
          means that even if you have changed the gi of an element
          (changing the gi of a parent element from a handler for example)
          the change will not impact the expression evaluation. Attributes
          in *attribute_condition* are different though. As the initial
          value of an attribute is not stored, the handler will be
          triggered if the current attribute/value pair is found when the
          element end tag is found. Although this can be quite confusing,
          it should not affect most users, and it allows others to play
          clever tricks with temporary attributes. Let me know if this is
          a problem for you.

    TwigRoots
          This argument lets you build the tree only for those elements
          you are interested in.

               Example: my $t= new XML::Twig( TwigRoots => { title => 1, subtitle => 1});
                        $t->parsefile( file);
                        my $t= new XML::Twig( TwigRoots => { 'section/title' => 1});
                        $t->parsefile( file);

          returns a twig containing a document including only title and
          subtitle elements, as children of the root element.

          You can use *generic_attribute_condition*,
          *attribute_condition*, *full_path*, *partial_path*, gi,
          *_default_* and *_all_* to trigger the building of the twig.
          *string_condition* and *regexp_condition* cannot be used, as the
          content of the element, and hence its string value, has not yet
          been parsed when the condition is checked.

          WARNING: paths are checked against the document. Even if the
          TwigRoots option is used they will be checked against the full
          document tree, not the virtual tree created by XML::Twig.

          WARNING: TwigRoots elements should NOT be nested, that would
          hopelessly confuse XML::Twig ;-(

          Note: you can set handlers (TwigHandlers) using TwigRoots:

               Example: my $t= new XML::Twig( TwigRoots =>
                                   { title    => sub { $_[1]->print; },
                                     subtitle => \&process_subtitle,
                                   });
                        $t->parsefile( $file);

    TwigPrintOutsideRoots
          To be used in conjunction with the TwigRoots argument. When set
          to a true value this will print the document outside of the
          TwigRoots elements.

               Example: my $t= new XML::Twig( TwigRoots => { title => \&number_title },
                                              TwigPrintOutsideRoots => 1,
                                             );
                         $t->parsefile( $file);
                         { my $nb;
                           sub number_title
                             { my( $twig, $title)= @_;
                               $nb++;
                               $title->prefix( "$nb ");
                               $title->print;
                             }
                         }

          This example prints the document outside of the title element,
          calls number_title for each title element, prints it, and then
          resumes printing the document. The twig is built only for the
          title elements.

    StartTagHandlers
          A hash { expression => \&handler}. Sets element handlers that
          are called when the element is opened (at the end of the
          XML::Parser Start handler). The handlers are called with 2
          params: the twig and the element. The element is empty at that
          point, but its attributes are already created.

          You can use *generic_attribute_condition*,
          *attribute_condition*, *full_path*, *partial_path*, gi,
          *_default_*  and *_all_* to trigger the handler.
          *string_condition* and *regexp_condition* cannot be used, as the
          content of the element, and hence its string value, has not yet
          been parsed when the condition is checked.

          The main use for those handlers is probably to create temporary
          attributes that will be used when processing sub-elements with
          TwigHandlers.

          You should also use it to change tags if you use flush. If you
          change the tag in a regular TwigHandler then the start tag might
          already have been flushed.

          Note: StartTag handlers can be called outside of TwigRoots if
          that argument is used; in this case handlers are called with the
          following arguments: $t (the twig), $gi (the gi of the element)
          and %att (a hash of the attributes of the element).

          If the TwigPrintOutsideRoots argument is also used then the
          start tag will be printed if the last handler called returns a
          `true' value; if it does not, then the start tag will not be
          printed (so you can print a modified string yourself, for
          example).

    EndTagHandlers
          A hash { expression => \&handler}. Sets element handlers that
          are called when the element is closed (at the end of the
          XML::Parser End handler). The handlers are called with 2 params:
          the twig and the gi of the element.

          TwigHandlers are called when an element is completely parsed, so
          why have this redundant option? There is only one use for
          EndTagHandlers: when using the TwigRoots option, to trigger a
          handler for an element *outside* the roots. It is for example
          very useful to number titles in a document using nested sections:

               my @no= (0);
               my $no;
               my $t= new XML::Twig(
                       StartTagHandlers => { section => sub { $no[$#no]++; $no= join '.', @no; push @no, 0; } },
                       TwigRoots        => { title   => sub { $_[1]->prefix( $no); $_[1]->print; } },
                       EndTagHandlers   => { section => sub { pop @no;  } },
                       TwigPrintOutsideRoots => 1
                                   );
                $t->parsefile( $file);

          Using the EndTagHandlers argument without TwigRoots will result
          in an error.
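          The numbering logic of the example above can be checked without
          a parser. This pure-Perl sketch replays a hypothetical stream of
          start/end events and runs the same @no/$no bookkeeping as the
          section StartTagHandler, the title handler and the section
          EndTagHandler; the event list stands in for XML::Twig and is an
          assumption of the sketch.

```perl
use strict;
use warnings;

# Simulated parser events (hypothetical input): nested sections with titles.
my @events = (
    [ start => 'section' ], [ start => 'title' ], [ end => 'title' ],
      [ start => 'section' ], [ start => 'title' ], [ end => 'title' ],
      [ end => 'section' ],
      [ start => 'section' ], [ start => 'title' ], [ end => 'title' ],
      [ end => 'section' ],
    [ end => 'section' ],
    [ start => 'section' ], [ start => 'title' ], [ end => 'title' ],
    [ end => 'section' ],
);

my @no = (0);
my $no;
my @numbers;    # the number prefixed to each title, in document order
for my $event (@events) {
    my ($type, $gi) = @$event;
    if ($type eq 'start' && $gi eq 'section') {        # StartTagHandler
        $no[$#no]++; $no = join '.', @no; push @no, 0;
    }
    elsif ($type eq 'end' && $gi eq 'title') {         # title handler
        push @numbers, $no;
    }
    elsif ($type eq 'end' && $gi eq 'section') {       # EndTagHandler
        pop @no;
    }
}
print "$_\n" for @numbers;    # prints 1, 1.1, 1.2 and 2, one per line
```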

    CharHandler
          A reference to a subroutine that will be called every time
          PCDATA is found.

    KeepEncoding
          This is a (slightly?) evil option: if the XML document is not
          UTF-8 encoded and you want to keep it that way, then setting
          KeepEncoding will use the Expat original_string method for
          character data, thus keeping the original encoding, as well as
          the original entities in the strings.

          WARNING: if the original encoding is multi-byte then attribute
          parsing will be EXTREMELY unsafe under any Perl before 5.6, as
          it uses regular expressions which do not deal properly with
          multi-byte characters.

          WARNING: this option is NOT used when parsing with the
          non-blocking parser (parse_start, parse_more, parse_done
          methods).

    LoadDTD
          If this argument is set to a true value, parse or parsefile on
          the twig will load the DTD information. This information can
          then be accessed through the twig, in a DTDHandler for example.
          This will load even an external DTD.

          See `DTD Handling' in this node for more information

    DTDHandler
          Sets a handler that will be called once the doctype (and the
          DTD) have been loaded, with 2 arguments, the twig and the DTD.

    Id
          This optional argument gives the name of an attribute that can
          be used as an ID in the document. Elements whose ID is known can
          be accessed through the elt_id method. Id defaults to 'id'.

    DiscardSpaces
          If this optional argument is set to a true value then spaces are
          discarded when they look non-significant: strings containing
          only spaces are discarded.  This argument is set to true by
          default.

    KeepSpaces
          If this optional argument is set to a true value then all spaces
          in the document are kept, and stored as PCDATA.  KeepSpaces and
          DiscardSpaces cannot both be set.

    DiscardSpacesIn
          This argument sets KeepSpaces to true but will cause the twig
          builder to discard spaces in the elements listed.  The syntax
          for using this argument is:

               new XML::Twig( DiscardSpacesIn => [ 'elt1', 'elt2']);

    KeepSpacesIn
          This argument sets DiscardSpaces to true but will cause the twig
          builder to keep spaces in the elements listed.  The syntax for
          using this argument is:

               new XML::Twig( KeepSpacesIn => [ 'elt1', 'elt2']);
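          The combination of DiscardSpaces (the default) and KeepSpacesIn
          can be modelled as a simple filter. This pure-Perl sketch is a
          hypothetical illustration, not the twig builder's code; it
          assumes "only spaces" means any whitespace, including newlines.

```perl
use strict;
use warnings;

# Hypothetical model: a text node made only of whitespace is dropped,
# unless its parent element is listed in KeepSpacesIn. Real content is
# always kept.
sub keep_text_node {
    my ($text, $parent_gi, @keep_spaces_in) = @_;
    return 1 if $text =~ /\S/;
    return scalar grep { $_ eq $parent_gi } @keep_spaces_in;
}

my @keep_in = ('pre');    # as in KeepSpacesIn => [ 'pre' ]
for my $node ([ "\n  ", 'doc' ], [ "  ", 'pre' ], [ 'text', 'p' ]) {
    my ($text, $parent) = @$node;
    printf "%-4s %s\n", $parent,
        keep_text_node($text, $parent, @keep_in) ? 'kept' : 'dropped';
}
```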

    PrettyPrint
          Sets the pretty print method, amongst 'none' (default),
          'nsgmls', 'nice', 'indented', 'record' and 'record_c'.

    EmptyTags
          Sets the empty tag display style (normal, html or expand).

    Comments
          Sets the way comments are processed: drop (default), keep or
          process

         drop
               drops the comments: they are neither read nor printed to
               the output

         keep
               comments are loaded and will appear on the output; they are
               not accessible within the twig, though, and will not
               interfere with processing

               Bug: comments in the middle of a text element such as

                    <p>text <!-- comment --> more text</p>

               are output at the end of the text:

                    <p>text  more text <!-- comment --></p>

         process
               comments are loaded in the twig and will be treated as
               regular elements (their gi is `#COMMENT'); this can
               interfere with processing if you expect
               `$elt->{first_child}' to be an element but find a comment
               there.  Validation will not protect you from this as
               comments can happen anywhere.  You can use
               `$elt->first_child( 'gi')' (which is a good habit anyway)
               to get where you want. Consider using

    Pi
          Sets the way processing instructions are processed: drop, keep
          (default) or process

          Note that you can also set PI handlers in the TwigHandlers
          option:

               '?'       => \&handler
               '?target' => \&handler2

          The handlers will be called with 2 parameters, the twig and the
          PI element, if Pi is set to process, and with 3, the twig, the
          target and the data, if Pi is set to keep. Of course they will
          not be called if Pi is set to drop.

          If Pi is set to keep, the handler should return a string that
          will be used as-is as the PI text (it should look like
          "<?target data?>", or "" if you want to remove the PI).

          Only one handler will be called: `?target', or `?' if no
          specific handler for that target is available.
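          A handler for Pi set to keep can be sketched as a plain Perl
          sub, following the 3-argument convention described above. The
          name rewrite_pi, the 'obsolete' target and the whitespace
          normalisation are hypothetical choices for illustration; the sub
          is called directly here, with undef standing in for the twig.

```perl
use strict;
use warnings;

# Hypothetical PI handler for Pi => 'keep': it receives the twig, the
# target and the data, and returns the text to output in place of the PI.
# It could be registered as TwigHandlers => { '?' => \&rewrite_pi }.
sub rewrite_pi {
    my ($twig, $target, $data) = @_;
    return '' if $target eq 'obsolete';    # an empty string removes the PI
    $data =~ s/\s+/ /g;                     # normalise the data a little
    return "<?$target $data?>";
}

print rewrite_pi(undef, 'xml-stylesheet', 'href="a.css"   type="text/css"'), "\n";
```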

     Note: I _HATE_ the Java-like name of arguments used by most XML
     modules. As XML::Twig is based on XML::Parser I kept the style, but
     you can also use a more perlish syntax, using
     `twig_print_outside_roots' instead of TwigPrintOutsideRoots or
     pretty_print instead of PrettyPrint; XML::Twig then normalizes all
     the argument names.
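     One way such normalization could work is sketched below in pure
     Perl. This is a hypothetical illustration, not XML::Twig's code:
     normalize_options lower-cases each name, drops the underscores and
     looks the result up in a table of the options documented above, so
     that names with acronyms such as DTDHandler come out right. Unknown
     names (XML::Parser options, for example) pass through unchanged.

```perl
use strict;
use warnings;

# Canonical names, taken from the options documented in this node.
my @known = qw(TwigHandlers TwigRoots TwigPrintOutsideRoots StartTagHandlers
               EndTagHandlers CharHandler KeepEncoding LoadDTD DTDHandler Id
               DiscardSpaces KeepSpaces DiscardSpacesIn KeepSpacesIn
               PrettyPrint EmptyTags Comments Pi);
my %canonical = map { (lc($_) => $_) } @known;

# Hypothetical helper: twig_print_outside_roots => TwigPrintOutsideRoots.
sub normalize_options {
    my (%args) = @_;
    my %normalized;
    while (my ($name, $value) = each %args) {
        (my $key = lc $name) =~ tr/_//d;    # twig_roots -> twigroots
        $normalized{ $canonical{$key} // $name } = $value;
    }
    return %normalized;
}

my %opt = normalize_options(twig_print_outside_roots => 1, pretty_print => 'indented');
print "$_ => $opt{$_}\n" for sort keys %opt;
```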

parse(SOURCE [, OPT => OPT_VALUE [...]])
     This method is inherited from XML::Parser.  The SOURCE parameter
     should either be a string containing the whole XML document, or it
     should be an open IO::Handle. Constructor options to
     XML::Parser::Expat given as keyword-value pairs may follow the SOURCE
     parameter. These override, for this call, any options or attributes
     passed through from the XML::Parser instance.

     The method dies if a parse error occurs. Otherwise it returns the
     twig built by the parse. Use *safe_parse* if you want the parsing to
     return even when an error occurs.

parsestring
     This is just an alias for parse for backwards compatibility.

parsefile(FILE [, OPT => OPT_VALUE [...]])
     This method is inherited from XML::Parser.

     Open FILE for reading, then call parse with the open handle. The file
     is closed no matter how parse returns.

     The method dies if a parse error occurs. Otherwise it returns the
     twig built by the parse. Use *safe_parsefile* if you want the parsing
     to return even when an error occurs.

safe_parse( SOURCE [, OPT => OPT_VALUE [...]])
     This method is similar to parse except that it wraps the parsing in an
     eval block. It returns the twig on success and 0 on failure (the twig
     object also contains the parsed twig). $@ contains the error message
     on failure.

     Note that the parsing still stops as soon as an error is detected,
     there is no way to keep going after an error.
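     The safe_ variants follow the usual Perl eval idiom. The sketch
     below shows that pattern in pure Perl with a hypothetical stand-in,
     strict_parse, that dies on bad input the way parse does; it is not
     XML::Twig's code. With a real twig the equivalent call would be
     along the lines of `$t->safe_parse( $xml) or warn $@'.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser that dies on bad input, as parse
# does. It only checks for one element with a matching end tag.
sub strict_parse {
    my ($xml) = @_;
    die "not well-formed\n" unless $xml =~ m{^<(\w+)>.*</\1>$}s;
    return { root => $1 };
}

# The safe_ pattern: the same call wrapped in eval. It returns the result
# on success and 0 on failure, leaving the error message in $@.
sub safe_strict_parse {
    my @args   = @_;
    my $result = eval { strict_parse(@args) };
    return $@ ? 0 : $result;
}

for my $xml ('<doc>ok</doc>', '<doc>oops') {
    if (my $r = safe_strict_parse($xml)) { print "parsed, root is $r->{root}\n" }
    else                                  { print "failed: $@" }
}
```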

safe_parsefile(FILE [, OPT => OPT_VALUE [...]])
     This method is similar to parsefile except that it wraps the parsing
     in an eval block. It returns the twig on success and 0 on failure
     (the twig object also contains the parsed twig). $@ contains the
     error message on failure.

     Note that the parsing still stops as soon as an error is detected,
     there is no way to keep going after an error.

setTwigHandlers ($handlers)
     Set the Twig handlers. $handlers is a reference to a hash similar to
     the one in the TwigHandlers option of new. All previous handlers are
     unset.  The method returns the reference to the previous handlers.

setTwigHandler ($gi, $handler)
     Set a single Twig handler for the $gi element. $handler is a
     reference to a subroutine. If the handler was previously set then the
     reference to the previous handler is returned.

setStartTagHandlers ($handlers)
     Set the StartTag handlers. $handlers is a reference to a hash similar
     to the one in the StartTagHandlers option of new. All previous
     handlers are unset.  The method returns the reference to the previous
     handlers.

setStartTagHandler ($gi, $handler)
     Set a single StartTag handler for the $gi element. $handler is a
     reference to a subroutine. If the handler was previously set then the
     reference to the previous handler is returned.

setEndTagHandlers ($handlers)
     Set the EndTag handlers. $handlers is a reference to a hash similar
     to the one in the EndTagHandlers option of new. All previous handlers
     are unset.  The method returns the reference to the previous handlers.

setEndTagHandler ($gi, $handler)
     Set a single EndTag handler for the $gi element. $handler is a
     reference to a subroutine. If the handler was previously set then the
     reference to the previous handler is returned.

dtd
     Returns the dtd (an XML::Twig::DTD object) of a twig

root
     Returns the root element of a twig

first_elt ($optional_gi)
     Returns the first element of a twig whose gi is $optional_gi; if no
     $optional_gi is given then the root is returned.

elt_id        ($id)
     Returns the element whose id attribute is $id

entity_list
     Returns the entity list of a twig

change_gi      ($old_gi, $new_gi)
     Performs a (very fast) global change: all elements old_gi are now
     new_gi.

flush            ($optional_filehandle, $options)
     Flushes a twig up to (and including) the current element, then deletes
     all unnecessary elements from the tree that's kept in memory.  flush
     keeps track of which elements need to be open/closed, so if you flush
     from handlers you don't have to worry about anything. Just keep
     flushing the twig every time you're done with a sub-tree and it will
     come out well-formed. After the whole parsing don't forget to flush
     one more time to print the end of the document.  The doctype and
     entity declarations are also printed.

     flush takes an optional filehandle as an argument.

     options: use the Update_DTD option if you have updated the (internal)
     DTD and/or the entity list and you want the updated DTD to be output.

     The PrettyPrint option sets the pretty printing of the document.

          Example: $t->flush( Update_DTD => 1);
                   $t->flush( \*FILE, Update_DTD => 1);
                   $t->flush( \*FILE);

flush_up_to ($elt, $optional_filehandle, %options)
     Flushes up to the $elt element. This allows you to keep part of the
     tree in memory when you flush.

     options: see flush.

purge
     Does the same as a flush except it does not print the twig. It just
     deletes all elements that have been completely parsed so far.

purge_up_to ($elt)
     Purges up to the $elt element. This allows you to keep part of the
     tree in memory when you purge.

print            ($optional_filehandle, %options)
     Prints the whole document associated with the twig. To be used only
     AFTER the parse.

     options: see flush.

sprint
     Returns the text of the whole document associated with the twig. To
     be used only AFTER the parse.

     options: see flush.

set_pretty_print  ($style)
     Sets the pretty print method, amongst 'none' (default), 'nsgmls',
     'nice', 'indented', 'record' and 'record_c'.

     WARNING: the pretty print style is a GLOBAL variable, so once set it's
     applied to ALL calls to print (and sprint). The same goes if you use
     XML::Twig with mod_perl. This should not be a problem as the XML
     that's generated is valid anyway, and XML processors (as well as HTML
     processors, including browsers) should not care. Let me know if this
     is a big problem, but at the moment the performance/cleanliness
     trade-off clearly favors the global approach.

set_empty_tag_style  ($style)
     Sets the empty tag display style (normal, html or expand). As with
     set_pretty_print this sets a global flag.

     normal outputs an empty tag as '<tag/>', html adds a space,
     '<tag />', and expand outputs '<tag></tag>'.
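     The three styles can be shown side by side with a tiny pure-Perl
     renderer. The empty_tag helper is hypothetical and only reproduces
     the output forms listed above; it does not touch XML::Twig's global
     flag.

```perl
use strict;
use warnings;

# Hypothetical renderer for the three documented empty-tag styles.
sub empty_tag {
    my ($gi, $style) = @_;
    return "<$gi />"     if $style eq 'html';
    return "<$gi></$gi>" if $style eq 'expand';
    return "<$gi/>";                           # 'normal' (and the fallback)
}

printf "%-6s %s\n", $_, empty_tag('br', $_) for qw(normal html expand);
```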

print_prolog     ($optional_filehandle, %options)
     Prints the prolog (XML declaration + DTD + entity declarations) of a
     document.

     options: see flush.

prolog     ($optional_filehandle, %options)
     Returns the prolog (XML declaration + DTD + entity declarations) of a
     document.

     options: see flush.

finish
     Call Expat finish method.  Unsets all handlers (including internal
     ones that set context), but expat continues parsing to the end of the
     document or until it finds an error.  It should finish up a lot
     faster than with the handlers set.

finish_print
     Stop twig processing, flush the twig and proceed to finish printing
     the document as fast as possible. Use this method when modifying a
     document and the modification is done.

depth
     Calls Expat's depth method, which returns the depth in the tree
     during the parsing.  This is useful when using the TwigRoots option
     to still get info on the actual document.

in_element ($gi)
     Call Expat in_element method.  Returns true if $gi is equal to the
     name of the innermost currently opened element. If namespace
     processing is being used and you want to check against a name that
     may be in a namespace, then use the generate_ns_name method to create
     the $gi argument. Useful when using the TwigRoots option.

within_element($gi)
     Call Expat within_element method.  Returns the number of times the
     given name appears in the context list.  If namespace processing is
     being used and you want to check against a name that may be in a
     namespace, then use the generate_ns_name method to create the $gi
     argument. Useful when using the TwigRoots option.

context
     Returns a list of element names that represent open elements, with
     the last one being the innermost. Inside start and end tag handlers,
     this will be the tag of the parent element.

path($gi)
     Returns the element context in a form similar to XPath's short form:
     '/root/gi1/../gi'
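     The relation between a context list and the short form returned by
     path can be sketched in pure Perl. The path_from_context helper is
     hypothetical; it assumes the context lists the open elements
     outermost first, as context does, and appends the current gi.

```perl
use strict;
use warnings;

# Hypothetical helper: build the XPath-like short form '/root/gi1/.../gi'
# from a context list (outermost open element first) and the current gi.
sub path_from_context {
    my ($gi, @context) = @_;
    return join '/', '', @context, $gi;
}

print path_from_context('title', qw(doc section chapter)), "\n";
# prints /doc/section/chapter/title
```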

get_xpath  ($xpath, $optional_offset)
     Performs a get_xpath on the document root (see Elt)

Elt
---

new          ($optional_gi, $optional_atts, @optional_content)
     The gi is optional (but then you cannot have content); the optional
     atts is the ref of a hash of attributes; the content can be just a
     string or a list of strings and elements. A content of '#EMPTY'
     creates an empty element.

          Examples: my $elt= new XML::Twig::Elt();
                    my $elt= new XML::Twig::Elt( 'para', { align => 'center' });
          	   my $elt= new XML::Twig::Elt( 'br', '#EMPTY');
          	   my $elt= new XML::Twig::Elt( 'para');
                    my $elt= new XML::Twig::Elt( 'para', 'this is a para');
                    my $elt= new XML::Twig::Elt( 'para', $elt3, 'another para');

     The strings are not parsed, the element is not attached to any twig.

     WARNING: if you rely on IDs then you will have to set the id
     yourself. At this point the element does not belong to a twig yet, so
     the ID attribute is not known and won't be stored in the ID list.
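     A minimal sketch of the constructor, assuming a current XML::Twig
     release from CPAN (which still accepts the gi-based names used in
     this document). The element names and text are made up, and sprint
     (described further down) is only used to show the result.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

# Attributes are passed as a hash ref, content as a string
my $para = XML::Twig::Elt->new( 'para', { align => 'center' }, 'this is a para');

# '#EMPTY' flags the element as empty, so it is output as a single tag
my $br = XML::Twig::Elt->new( 'br', '#EMPTY');

# Content can mix strings and (unattached) elements
my $mixed = XML::Twig::Elt->new( 'para', 'line one', $br, 'line two');

print $para->sprint, "\n";   # <para align="center">this is a para</para>
print $mixed->sprint, "\n";  # <para>line one<br/>line two</para>
```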

parse         ($string, %args)
     Creates an element from an XML string. The string is actually parsed
     as a new twig, then the root of that twig is returned.  The arguments
     in %args are passed to the twig.  As always, if the parse fails the
     parser will die, so use an eval if you want to trap syntax errors.

     As the element obviously does not exist beforehand, this method has
     to be called on the class:

          my $elt= parse XML::Twig::Elt( "<a> string to parse, with <sub/>
                                          <elements>, actually tons of </elements>
                                          h</a>");
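     A short sketch of parse used as a class method, including the eval
     wrapper mentioned above, assuming a current XML::Twig from CPAN. The
     XML strings are illustrative; the second one is deliberately
     malformed.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; strings are illustrative

# The string is parsed as a new twig and the root of that twig is returned
my $elt = XML::Twig::Elt->parse( '<a>hello <b>world</b></a>');
print $elt->gi, "\n";    # a
print $elt->text, "\n";  # hello world

# A malformed string makes the parser die, so trap it with eval
my $bad = eval { XML::Twig::Elt->parse( '<a><b></a>') };
print "syntax error trapped: $@" unless defined $bad;
```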

set_gi         ($gi)
     Sets the gi of an element

gi
     Returns the gi of the element

closed
     Returns true if the element has been closed. Might be useful if you
     are somewhere in the tree, during the parse, and have no idea whether
     a parent element is completely loaded or not.

is_pcdata
     Returns 1 if the element is a #PCDATA element, returns 0 otherwise.

pcdata
     Returns the text of a PCDATA element or undef if the element is not
     PCDATA.

set_pcdata     ($text)
     Sets the text of a PCDATA element.

append_pcdata  ($text)
     Adds the text at the end of a #PCDATA element.

is_cdata
     Returns 1 if the element is a #CDATA element, returns 0 otherwise.

is_text
     Returns 1 if the element is a #CDATA or #PCDATA element, returns 0
     otherwise.

cdata
     Returns the text of a CDATA element or undef if the element is not
     CDATA.

set_cdata     ($text)
     Sets the text of a CDATA element.

append_cdata  ($text)
     Adds the text at the end of a #CDATA element.
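     A small sketch of the text-element methods, assuming a current
     XML::Twig from CPAN. It shows that text lives in #PCDATA children of
     regular elements; the sample document is made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<p>abc</p>');

# The text of <p> is held by its #PCDATA child
my $text = $t->root->first_child;
print $text->is_pcdata ? "#PCDATA\n" : "not #PCDATA\n";   # #PCDATA

$text->append_pcdata( 'def');
print $text->pcdata, "\n";                                # abcdef

# On a regular element, pcdata returns undef
print defined $t->root->pcdata ? "defined\n" : "undef\n"; # undef
```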

is_empty
     Returns 1 if the element is empty, 0 otherwise

set_empty
     Flags the element as empty. No further check is made, so if the
     element is actually not empty the output will be broken. The only
     effect of this method is that the output will be <gi att="value"/>.

set_not_empty
     Flags the element as not empty. If it is actually empty then the
     element will be output as <gi att="value"></gi>.

root
     Returns the root of the twig in which the element is contained.

twig
     Returns the twig containing the element.

parent        ($optional_gi)
     Returns the parent of the element, or the first ancestor whose gi is
     $optional_gi.

first_child   ($optional_gi)
     Returns the first child of the element, or the first child whose gi is
     $optional_gi (i.e. the first of the element's children whose gi matches).

child ($offset, $optional_gi)
     Returns the $offset-th child of the element, optionally the
     $offset-th child with a gi of $optional_gi. The children are treated
     as a list, so $elt->child( 0) is the first child, while $elt->child(
     -1) is the last child.
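     A sketch of the child navigation methods on a tiny made-up document,
     assuming a current XML::Twig from CPAN. No whitespace is used between
     tags so every child is a real element.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc><a>1</a><b>2</b><a>3</a></doc>');
my $root = $t->root;

print $root->first_child->text, "\n";        # 1: first child, any gi
print $root->first_child( 'b')->text, "\n";  # 2: first child whose gi is b
print $root->last_child( 'a')->text, "\n";   # 3: last child whose gi is a
print $root->child( 1)->gi, "\n";            # b: offsets start at 0
print $root->child( 1, 'a')->text, "\n";     # 3: second <a> child
print $root->child( -1)->gi, "\n";           # a: -1 is the last child
```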

child_text ($offset, $optional_gi)
     Returns the text of a child or undef if the child does not exist.
     Arguments are the same as child.

first_child_text   ($optional_gi)
     Returns the text of the first child of the element, or of the first
     child whose gi is $optional_gi (i.e. the first of the element's
     children whose gi matches).  If there is no such child then it
     returns '' (the empty string). This avoids getting the child,
     checking for its existence, then getting the text for trivial cases.

field         ($optional_gi)
     Same method as first_child_text with a different name

last_child    ($optional_gi)
     Returns the last child of the element, or the last child whose gi is
     $optional_gi (i.e. the last of the element's children whose gi matches).

last_child_text   ($optional_gi)
     Same as first_child_text but for the last child.
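     A sketch of the text shortcuts on a record-like element, assuming a
     current XML::Twig from CPAN. The record and its field names are made
     up; the point is that a missing child yields '' rather than undef.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; field names are illustrative

my $t = XML::Twig->new;
$t->parse( '<record><name>foo</name><id>42</id></record>');
my $rec = $t->root;

print $rec->field( 'name'), "\n";    # foo
print $rec->last_child_text, "\n";   # 42

# No <email> child: an empty string comes back instead of undef,
# so there is no need to test for the child before getting its text
my $email = $rec->first_child_text( 'email');
print "no email\n" if $email eq '';
```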

prev_sibling  ($optional_gi)
     Returns the previous sibling of the element, or the first previous
     sibling whose gi is $optional_gi.

sibling  ($offset, $optional_gi)
     Returns the next or previous $offset-th sibling of the element, or
     the $offset-th one whose gi is $optional_gi. If $offset is negative
     then a previous sibling is returned; if $offset is positive then a
     next sibling is returned. $offset=0 returns the element if there is
     no $optional_gi or if the element gi matches $optional_gi, undef
     otherwise.
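     A sketch of sibling with positive and negative offsets, assuming a
     current XML::Twig from CPAN; the list document is made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<list><i>1</i><i>2</i><i>3</i><j>4</j></list>');
my $second = $t->root->child( 1);              # <i>2</i>

print $second->sibling( 1)->text, "\n";        # 3: next sibling
print $second->sibling( -1)->text, "\n";       # 1: previous sibling
print $second->sibling( 2)->gi, "\n";          # j: two steps forward
print defined $second->sibling( 3) ? "found\n" : "undef\n";  # undef: past the end
print $second->sibling_text( 1, 'j'), "\n";    # 4: first next sibling whose gi is j
```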

sibling_text ($offset, $optional_gi)
     Returns the text of a sibling or undef if the sibling does not exist.
     Arguments are the same as sibling.

next_sibling  ($optional_gi)
     Returns the next sibling of the element, or the first one whose gi is
     $optional_gi.

next_elt     ($optional_elt, $optional_gi)
     Returns the next elt (optionally whose gi is $optional_gi) of the
     element.  This is defined as the next element which opens after the
     current element opens, which usually means the first child of the
     element.  Counter-intuitive as it might look, this allows you to loop
     through the whole document by starting from the root.

     The $optional_elt is the root of a subtree. When the next_elt is out
     of the subtree then the method returns undef. You can then walk a sub
     tree with:

          my $elt= $subtree_root;
          while( $elt= $elt->next_elt( $subtree_root))
            { # insert processing code here
            }
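     A runnable version of the subtree walk, assuming a current XML::Twig
     from CPAN. Text is stored in #PCDATA elements, which next_elt also
     visits, so the sketch skips them with is_text. The document is made
     up; the walk stops before the second <sec>.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc><sec><title>t</title><p>x</p></sec><sec><p>y</p></sec></doc>');
my $subtree_root = $t->root->first_child( 'sec');

my @gis;
my $elt = $subtree_root;
while( $elt = $elt->next_elt( $subtree_root))
  { # #PCDATA elements are visited too; keep only real elements
    push @gis, $elt->gi unless $elt->is_text;
  }
print "@gis\n";  # title p
```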

prev_elt     ($optional_gi)
     Returns the previous elt (optionally whose gi is $optional_gi) of the
     element.  This is the first element which opens before the current
     one.  It is usually either the last descendant of the previous sibling
     or simply the parent.

children     ($optional_gi)
     Returns the list of children (optionally whose gi is $optional_gi) of
     the element.  The list is in document order.

descendants     ($optional_gi)
     Returns the list of all descendants (optionally whose gi is
     $optional_gi) of the element. This is the equivalent of the DOM's
     getElementsByTagName.

ancestors    ($optional_gi)
     Returns the list of ancestors (optionally whose gi is $optional_gi)
     of the element.  The list is ordered from the innermost ancestor to
     the outermost one.

     NOTE: the element itself is not part of the list; to include it you
     will have to write:

          my @array= ($elt, $elt->ancestors);

prev_siblings ($optional_gi)
     Returns the list of previous siblings (optionally whose gi is
     $optional_gi) for the element. The elements are ordered in document
     order.

next_siblings ($optional_gi)
     Returns the list of siblings (optionally whose gi is $optional_gi)
     following the element. The elements are ordered in document order.
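     A sketch of the list-returning navigation methods, assuming a current
     XML::Twig from CPAN; the document is made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc><sec><p>1</p><p>2</p><p>3</p></sec></doc>');
my $middle = $t->root->first_child->child( 1);                # <p>2</p>

my @anc    = map { $_->gi } $middle->ancestors;               # innermost first
my @chain  = map { $_->gi } ( $middle, $middle->ancestors);   # element included
my @before = map { $_->text } $middle->prev_siblings;
my @after  = map { $_->text } $middle->next_siblings;

print "@anc\n";    # sec doc
print "@chain\n";  # p sec doc
print "@before | @after\n";  # 1 | 3
```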

get_xpath  ($xpath, $optional_offset)
     Returns a list of elements satisfying the $xpath. $xpath is an
     XPATH-like expression.  A subset of the XPATH abbreviated syntax is
     covered:

          gi
          gi[1] (or any other positive number)
          gi[last()]
          gi[@att] (the attribute exists for the element)
          gi[@att="val"]
          gi[att1="val1" and att2="val2"]
          gi[att1="val1" or att2="val2"]
          gi[string()="toto"] (returns gi elements whose text, as per the text method, is toto)
          gi[string()=~/regexp/] (returns gi elements whose text, as per the text method, matches regexp)

     Expressions can start with / (the search starts at the document root)
     or with . (the search starts at the current element), // can be used
     to get all descendants instead of just direct children, and * matches
     any gi.

     So the following examples from the XPath recommendation
     (http://www.w3.org/TR/xpath.html#path-abbrev) work:

          para selects the para element children of the context node
          * selects all element children of the context node
          para[1] selects the first para child of the context node
          para[last()] selects the last para child of the context node
          */para selects all para grandchildren of the context node
          /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc
          chapter//para selects the para element descendants of the chapter element children of the context node
          //para selects all the para descendants of the document root and thus selects all para elements in the same document as the
                     context node
          //olist/item selects all the item elements in the same document as the context node that have an olist parent
          .//para selects the para element descendants of the context node
          .. selects the parent of the context node
          para[@type="warning"] selects all para children of the context node that have a type attribute with value warning
          employee[@secretary and @assistant] selects all the employee children of the context node that have both a secretary attribute and
                     an assistant attribute

     The elements will be returned in the document order.

     If $optional_offset is used then only one element will be returned,
     the one with the appropriate offset in the list, starting at 0.

     Quoting and interpolating variables can be a pain when the Perl
     syntax and the XPATH syntax collide, so here are some more examples
     to get you started:

          my $p1= "p1";
          my $p2= "p2";
          my @res= $t->get_xpath( "p[string( '$p1') or string( '$p2')]");

          my $a= "a1";
          my @res= $t->get_xpath( "//*[\@att=\"$a\"]");

          my $val= "a1";
          my $exp= "//p[ \@att='$val']"; # note that you need to use \@ or you will get a warning
          my @res= $t->get_xpath( $exp);

     XML::Twig does not provide full XPATH support. If that's what you
     want then look no further than the XML::XPath module on CPAN.
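     A sketch exercising a few of the supported forms (relative path, //,
     an attribute test built from a variable, and the offset argument),
     assuming a current XML::Twig from CPAN. The document is made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc><para type="warning">careful</para><para>plain</para>'
         . '<sec><para type="warning">nested</para></sec></doc>');

# Relative path: para children of the root element only
my @top = $t->root->get_xpath( 'para');
# Leading //: all para descendants, in document order
my @all = $t->get_xpath( '//para');
# Attribute test built from a variable; \@ stops Perl interpolating @type
my $type = 'warning';
my @warn = $t->get_xpath( "//para[\@type=\"$type\"]");
# With an offset a single element is returned (offsets start at 0)
my $second = $t->get_xpath( '//para', 1);

print scalar( @top), " ", scalar( @all), "\n";   # 2 3
print join( ',', map { $_->text } @warn), "\n";  # careful,nested
print $second->text, "\n";                       # plain
```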

level       ($optional_gi)
     Returns the depth of the element in the twig (root is 0).  If the
     optional gi is given then only ancestors of the given type are
     counted.

     WARNING: in a tree created using the TwigRoots option this will not
     return the level in the document tree, level 0 will be the document
     root, level 1 will be the TwigRoots elements. During the parsing (in
     a TwigHandler) you can use the depth method on the twig object to get
     the real parsing depth.
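     A sketch of level with and without the gi filter on a fully built
     twig (no TwigRoots), assuming a current XML::Twig from CPAN; the
     nesting is made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc><sec><sec><p>x</p></sec></sec></doc>');
my ($p) = $t->get_xpath( '//p');

print $t->root->level, "\n";     # 0: the root
print $p->level, "\n";           # 3: doc, sec, sec above it
print $p->level( 'sec'), "\n";   # 2: only <sec> ancestors are counted
```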

in           ($potential_parent)
     Returns true if the element is in the potential_parent
     ($potential_parent is an element)

in_context   ($gi, $optional_level)
     Returns true if the element is included in an element whose gi is $gi,
     optionally within $optional_level levels. The returned value is the
     including element.

atts
     Returns a hash ref containing the element attributes

set_atts      ({att1=>$att1_val, att2=> $att2_val... })
     Sets the element attributes with the hash ref supplied as the argument

del_atts
     Deletes all the element attributes.

set_att      ($att, $att_value)
     Sets the attribute of the element to the given value

att          ($att)
     Returns the attribute value

del_att      ($att)
     Deletes the attribute for the element.

inherit_att  ($att, @optional_gi_list)
     Returns the value of an attribute inherited from parent tags. The
     value returned is found by looking for the attribute in the element
     then in turn in each of its ancestors. If the @optional_gi_list is
     supplied only those ancestors whose gi is in the list will be checked.
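     A sketch of the attribute methods, including inherit_att walking up
     the ancestors, assuming a current XML::Twig from CPAN. The attribute
     names and values are made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc lang="en"><p><span>x</span></p></doc>');
my ($span) = $t->get_xpath( '//span');
my $p      = $span->parent;

$p->set_att( class => 'note');
print $p->att( 'class'), "\n";   # note
$p->del_att( 'class');
print defined $p->att( 'class') ? "still set\n" : "deleted\n";  # deleted

# Neither <span> nor <p> has a lang attribute, so the value comes from <doc>
print $span->inherit_att( 'lang'), "\n";  # en
```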

set_id       ($id)
     Sets the id attribute of the element to the value.  See the Id option
     of the twig constructor to change the id attribute name.

id
     Gets the id attribute value

del_id       ($id)
     Deletes the id attribute of the element and removes it from the id
     list for the document.

cut
     Cuts the element from the tree.

copy        ($elt)
     Returns a copy of the element. The copy is a "deep" copy: all sub
     elements of the element are duplicated.

paste       ($optional_position, $ref)
     Pastes a (previously cut) element.  The element the method is called
     on is the one being pasted; $ref is the element it is pasted relative
     to.  The optional position can be:

    first_child (default)
          The element is pasted as the first child of $ref.

    last_child
          The element is pasted as the last child of $ref.

    before
          The element is pasted before $ref, as its previous sibling.

    after
          The element is pasted after $ref, as its next sibling.
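     A sketch of paste with two of the positions, assuming a current
     XML::Twig from CPAN, where the calling element is pasted relative to
     $ref. The list items are made up; freshly created elements are used
     because they do not belong to a tree yet.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<list><item>b</item></list>');
my $list   = $t->root;
my $b_item = $list->first_child;

# The element the method is called on is pasted; the second argument is the anchor
XML::Twig::Elt->new( 'item', 'c')->paste( 'last_child', $list);
XML::Twig::Elt->new( 'item', 'a')->paste( 'before', $b_item);

print $list->sprint, "\n";
# <list><item>a</item><item>b</item><item>c</item></list>
```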

move       ($optional_position, $ref)
     Moves an element in the tree.  This is just a cut then a paste.  The
     syntax is the same as paste.

replace       ($ref)
     Replaces an element in the tree. Sometimes it is just not possible to
     cut an element then paste another in its place, so replace comes in
     handy.

prefix       ($text)
     Adds a prefix to an element. If the element is a PCDATA element the
     text is added to the pcdata; if the element's first_child is a PCDATA
     then the text is added to its pcdata; otherwise a new PCDATA element
     is created and pasted as the first child of the element.

suffix       ($text)
     Adds a suffix to an element. If the element is a PCDATA element the
     text is added to the pcdata; if the element's last_child is a PCDATA
     then the text is added to its pcdata; otherwise a new PCDATA element
     is created and pasted as the last child of the element.
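     A sketch covering both branches of prefix (existing #PCDATA child vs.
     a new one), assuming a current XML::Twig from CPAN. The document and
     strings are made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc><p>world</p><q><b>bold</b></q></doc>');
my ( $p, $q) = $t->root->children;

$p->prefix( 'hello ');   # first child is #PCDATA: the text is added to it
$p->suffix( '!');        # last child is #PCDATA too
$q->prefix( 'note: ');   # first child is <b>: a new #PCDATA child is created

print $t->root->sprint, "\n";
# <doc><p>hello world!</p><q>note: <b>bold</b></q></doc>
```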

erase
     Erases the element: the element is deleted and all of its children are
     pasted in its place.
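     A sketch of erase on inline markup, assuming a current XML::Twig from
     CPAN; the paragraph is made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<p>a <b>bold</b> word</p>');

# <b> is removed, its #PCDATA child takes its place
$t->root->first_child( 'b')->erase;
print $t->root->sprint, "\n";   # <p>a bold word</p>
```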

delete
     Cuts the element and frees the memory.

DESTROY
     Frees the element from memory.

start_tag
     Returns the string for the start tag for the element, including the
     /> at the end of an empty element tag

end_tag
     Returns the string for the end tag of an element.  For an empty
     element, this returns the empty string ('').

print         ($optional_filehandle, $pretty_print_style)
     Prints an entire element, including the tags, optionally to a
     $optional_filehandle, optionally with a $pretty_print_style.

sprint       ($elt, $optional_no_enclosing_tag)
     Returns the string for an entire element, including the tags. To be
     used with caution!  If the optional second argument is true then only
     the string inside the element is returned (the start and end tag for
     $elt are not).
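     A sketch of sprint with and without the optional second argument,
     assuming a current XML::Twig from CPAN; the paragraph is made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<p>a <b>bold</b> word</p>');
my $p = $t->root;

print $p->sprint, "\n";      # <p>a <b>bold</b> word</p>
print $p->sprint( 1), "\n";  # a <b>bold</b> word  (enclosing tags dropped)
```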

set_pretty_print ($style)
     Sets the pretty print method, amongst 'none' (default), 'nsgmls',
     'nice', 'indented', 'record' and 'record_c'

    none
          the default, no \n is used

    nsgmls
          nsgmls style, with \n added within tags

    nice
          adds \n wherever possible (NOT SAFE, can lead to invalid XML)

    indented
          same as nice plus indents elements (NOT SAFE, can lead to
          invalid XML)

    record
          table-oriented pretty print, one field per line

    record_c
          table-oriented pretty print, more compact than record, one
          record per line

set_empty_tag_style ($style)
     Sets the method to output empty tags, amongst 'normal' (default),
     'html', and 'expand'.

set_indent ($string)
     Sets the indentation for the indented pretty print style (default is
     2 spaces)

set_quote ($quote)
     Sets the quotes used for attributes. Can be 'double' (default) or
     'single'.

text
     Returns a string consisting of all the PCDATA and CDATA in an element,
     without any tags.

set_text        ($string)
     Sets the text for the element: if the element is a PCDATA, just set
     its text, otherwise cut all the children of the element and create a
     single PCDATA child for it, which holds the text.

set_content    ( $optional_atts, @list_of_elt_and_strings)
set_content    ( $optional_atts, '#EMPTY')
     Sets the content for the element, from a list of strings and
     elements.  Cuts all the element children, then pastes the list
     elements as the children.  This method will create a PCDATA element
     for any strings in the list.

     The optional_atts argument is the ref of a hash of attributes. If this
     argument is used then the previous attributes are deleted, otherwise
     they are left untouched.

     WARNING: if you rely on IDs then you will have to set the id
     yourself. At this point the element does not belong to a twig yet, so
     the ID attribute is not known and won't be stored in the ID list.

     A content of '#EMPTY' creates an empty element.
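     A sketch contrasting set_text and set_content, assuming a current
     XML::Twig from CPAN. The document, attribute and strings are made up.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc><p class="old">a <b>b</b></p><q>x <i>y</i></q></doc>');
my ( $p, $q) = $t->root->children;

# set_text: all children are replaced by a single #PCDATA child
$q->set_text( 'replaced');

# set_content with a hash ref: the old attributes are dropped, strings
# become #PCDATA children and elements are pasted as they are
$p->set_content( { class => 'note' }, 'see ', XML::Twig::Elt->new( 'b', 'this'));

print $t->root->sprint, "\n";
# <doc><p class="note">see <b>this</b></p><q>replaced</q></doc>
```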

insert         (@gi)
     For each gi in the list inserts an element $gi as the only child of
     the element.  All children of the element are set as children of the
     new element.  The upper level element is returned.

     $p->insert( 'table', 'tr', 'td') puts the content of $p in a table
     with a single tr and a single td and returns the table element.

wrap_in        (@gi)
     Wraps elements $gi as the successive ancestors of the element,
     returns the new element.  $elt->wrap_in( 'td', 'tr', 'table') wraps
     the element as a single cell in a table for example.
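     A sketch contrasting insert (new element goes inside) with wrap_in
     (new elements go around), assuming a current XML::Twig from CPAN. The
     document is made up, and only single-level insert is shown.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc><p>x</p><q>y</q></doc>');
my ( $p, $q) = $t->root->children;

# insert: a new <b> becomes the only child of <p> and adopts its old children
$p->insert( 'b');

# wrap_in: <td> becomes the parent of <q>, <tr> the parent of <td>, and so on
$q->wrap_in( 'td', 'tr', 'table');

print $t->root->sprint, "\n";
# <doc><p><b>x</b></p><table><tr><td><q>y</q></td></tr></table></doc>
```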

cmp       ($elt)
     Compares the order of the 2 elements in a twig.

          $a is the <A>..</A> element, $b is the <B>...</B> element

          document                        $a->cmp( $b)
          <A> ... </A> ... <B>  ... </B>     -1
          <A> ... <B>  ... </B> ... </A>     -1
          <B> ... </B> ... <A>  ... </A>      1
          <B> ... <A>  ... </A> ... </B>      1
           $a == $b                           0
           $a and $b not in the same tree   undef

before       ($elt)
     Returns 1 if the element starts before $elt, 0 otherwise. If the 2
     elements are not in the same twig then returns undef. $a->before( $b)
     is equivalent to:

          if( $a->cmp( $b) == -1) { return 1; } else { return 0; }

after       ($elt)
     Returns 1 if the element starts after $elt, 0 otherwise. If the 2
     elements are not in the same twig then returns undef. $a->after( $b)
     is equivalent to:

          if( $a->cmp( $b) == 1) { return 1; } else { return 0; }
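     A sketch checking cmp, before and after against the table above,
     assuming a current XML::Twig from CPAN. The document is made up, and
     $x, $y, $z are used instead of $a and $b, which Perl reserves for
     sort.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; sample data is illustrative

my $t = XML::Twig->new;
$t->parse( '<doc><x><y/></x><z/></doc>');
my ( $x, $y, $z) = map { ( $t->get_xpath( "//$_"))[0] } qw( x y z);

print $x->cmp( $z), "\n";     # -1: <x> opens before <z>
print $x->cmp( $y), "\n";     # -1: <y> is inside <x>
print $z->cmp( $y), "\n";     # 1
print $x->cmp( $x), "\n";     # 0
print $x->before( $z), "\n";  # 1
print $z->after( $x), "\n";   # 1
```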

path
     Returns the element context in a form similar to XPath's short form:
     '/root/gi1/../gi'

private methods

    set_parent        ($parent)
    set_first_child   ($first_child)
    set_last_child    ($last_child)
    set_prev_sibling  ($prev_sibling)
    set_next_sibling  ($next_sibling)
    set_twig_current
    del_twig_current
    twig_current
    flushed
          This method should NOT be used, always flush the twig, not an
          element.

    set_flushed
    del_flushed
    flush
    contains_text
     These methods should not be used, unless of course you find some
     creative and interesting, not to mention useful, ways to do it.

Entity_list
-----------

new
     Creates an entity list.

add         ($ent)
     Adds an entity to an entity list.

delete     ($ent or $name)
     Deletes an entity (defined by its name or by the Entity object) from
     the list.

print      ($optional_filehandle)
     Prints the entity list.

Entity
------

new        ($name, $val, $sysid, $pubid, $ndata)
     Same arguments as the Entity handler for XML::Parser.

print       ($optional_filehandle)
     Prints an entity declaration.

text
     Returns the entity declaration text.

EXAMPLES
========

   See the test files in t/test[1-n].t. Additional examples (and a complete
tutorial) can be found at http://www.xmltwig.cx/

   To figure out what flush does, call the following script with an XML
file and an element name as arguments:

     use XML::Twig;

     my ($file, $elt)= @ARGV;
     my $t= new XML::Twig( TwigHandlers =>
         { $elt => sub {$_[0]->flush; print "\n[flushed here]\n";} });
     $t->parsefile( $file, ErrorContext => 2);
     $t->flush;
     print "\n";

NOTES
=====

DTD Handling
------------

   There are 3 possibilities here.  They are:

No DTD
     No doctype, no DTD information, no entity information, the world is
     simple...

Internal DTD
     The XML document includes an internal DTD, and maybe entity
     declarations.

     If you use the LoadDTD option when creating the twig the DTD
     information and the entity declarations can be accessed.

     The DTD and the entity declarations will be flush'ed (or print'ed)
     either as is (if they have not been modified) or as reconstructed
     (poorly: comments are lost and order is not kept; due to its content
     this DTD should not be viewed by anyone) if they have been modified.
     You can also modify them directly by changing the
     $twig->{twig_doctype}->{internal} field (straight from XML::Parser,
     see the Doctype handler doc)

External DTD
     The XML document includes a reference to an external DTD, and maybe
     entity declarations.

     If you use the LoadDTD when creating the twig the DTD information and
     the entity declarations can be accessed. The entity declarations will
     be flush'ed (or print'ed) either as is (if they have not been
     modified) or as reconstructed (badly, comments are lost, order is not
     kept).

     You can change the doctype through the $twig->set_doctype method and
     print the dtd through the $twig->dtd_text or $twig->dtd_print methods.

     If you need to modify the entity list this is probably the easiest
     way to do it.

Flush
-----

   If you set handlers and use flush, do not forget to flush the twig one
last time AFTER the parsing, or you might be missing the end of the
document.

   Remember that element handlers are called when the element is CLOSED, so
if you have handlers for nested elements the inner handlers will be called
first. This makes it trickier than it would seem, for example, to number
nested clauses.
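   A sketch showing the order in which handlers fire for nested
elements, assuming a current XML::Twig from CPAN. The element names are
made up; each handler just records its element name.

```perl
use strict;
use warnings;
use XML::Twig;   # assumes a current CPAN XML::Twig; element names are illustrative

my @order;
my $t = XML::Twig->new(
  TwigHandlers => { outer => sub { push @order, 'outer'; 1 },
                    inner => sub { push @order, 'inner'; 1 },
                  });
$t->parse( '<doc><outer><inner/><inner/></outer></doc>');

# Handlers fire when each element is closed, so the inner ones come first
print "@order\n";  # inner inner outer
```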

BUGS
====

ID list
     The ID list is NOT updated when IDs are modified or elements cut or
     deleted.

change_gi
     This method will not function properly if you do:

          $twig->change_gi( $old1, $new);
          $twig->change_gi( $old2, $new);
          $twig->change_gi( $new, $even_newer);

sanity check on XML::Parser method calls
     XML::Twig should really prevent calls to some XML::Parser methods,
     especially the setHandlers method.

Globals
=======

   These are the things that can mess up calling code, especially if
threaded.  They might also cause problems under mod_perl.

Exported constants
     Whether you want them or not you get them! These are subroutines to
     use as constants when creating or testing elements.

    PCDATA
          returns '#PCDATA'

    CDATA
          returns '#CDATA'

    PI
          returns '#PI', I had the choice between PROC and PI :-(

Module scoped values: constants
     These should cause no trouble:

          %base_ent= ( '>' => '&gt;',
                       '<' => '&lt;',
                       '&' => '&amp;',
                       "'" => '&apos;',
                       '"' => '&quot;',
                     );
          CDATA_START   = "<![CDATA[";
          CDATA_END     = "]]>";
          PI_START      = "<?";
          PI_END        = "?>";
          COMMENT_START = "<!--";
          COMMENT_END   = "-->";

     pretty print styles

          ( $NSGMLS, $NICE, $INDENTED, $RECORD1, $RECORD2)= (1..5);

     empty tag output style

          ( $HTML, $EXPAND)= (1..2);

Module scoped values: might be changed
     Most of these deal with pretty printing, so the worst that can happen
     is probably that XML output does not look right, but is still valid
     and processed identically by XML processors.

     $empty_tag_style can mess up HTML browsers though, and changing $ID
     would most likely create problems.

          $pretty=0;           # pretty print style
          $quote='"';          # quote for attributes
          $INDENT= '  ';       # indent for indented pretty print
          $empty_tag_style= 0; # how to display empty tags
          $ID                  # attribute used as an ID ('id' by default)

Module scoped values: definitely changed
     These 2 variables are used to replace gi's by an index, thus saving
     some space when creating a twig. If they really cause you too much
     trouble, let me know, it is probably possible to create either a
     switch or at least a version of XML::Twig that does not perform this
     optimisation.

          %gi2index;     # gi => index
          @index2gi;     # list of gi's

TODO
====

multiple twigs are not well supported
     A number of twig features are just global at the moment. These include
     the ID list and the "gi pool" (if you use change_gi then you change
     the gi for ALL twigs).

     The next version will try to support this while trying not to be too
     hard on performance (at least when a single twig is used!).

XML::Parser-like handlers
     Sometimes it would be nice to be able to use both XML::Twig handlers
     and XML::Parser handlers, for example to perform generic tasks on all
     open tags, like adding an ID, or taking care of the autonumbering.

     Next version...

BENCHMARKS
==========

   You can use the `benchmark_twig' file to do additional benchmarks.
Please send me benchmark information for additional systems.

AUTHOR
======

   Michel Rodriguez <m.v.rodriguez@ieee.org>

   This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

   Bug reports and comments to m.v.rodriguez@ieee.org.  The XML::Twig page
is at http://www.xmltwig.cx/

SEE ALSO
========

   XML::Parser