This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.

File: pm.info, Node: XML/Doctype, Next: XML/Doctype/AttDef, Prev: XML/DT, Up: Module List

A DTD object class
******************

NAME
====

   XML::Doctype - A DTD object class

SYNOPSIS
========

     # To parse an external DTD at compile time, useful when
     # using XML::ValidWriter
     use XML::Doctype NAME => 'FooML', SYSTEM_ID => 'FooML.dtd' ;
     use XML::Doctype NAME => 'FooML', DTD_TEXT => $dtd ;

     # Parsing at run-time
     $doctype = XML::Doctype->new( 'FooML', SYSTEM_ID => 'FooML.dtd' ) ;

     # or
     $doctype = XML::Doctype->new() ;
     $doctype->parse( 'FooML', 'FooML.dtd' ) ;

     # Saving the parsed object
     open( PM, ">FooML/DTD/v1_000.pm" ) or die $! ;
     print PM $doctype->as_pm( 'FooML::DTD::v1_000' ) ;

     # Using a saved parsed DTD
     use FooML::DTD::v1_000 ;

     $doctype = FooML::DTD::v1_000->new() ;

DESCRIPTION
===========

   This module parses DTDs and allows them to be saved as .pm files and
reloaded. The ability to save and reload is intended to aid in packaging
parsed DTDs with XML tools so that XML::Parser need not be installed.

STATUS
======

   This module is alpha code. It's developed enough to support
XML::ValidWriter, but needs a lot of work. Some big things that are
lacking are:

   * methods or objects to build / traverse the DTD

   * XML::Doctype::ELEMENT

   * XML::Doctype::ATTLIST

   * XML::Doctype::ENTITY

METHODS
=======

new
          $doctype = XML::Doctype->new() ;
          $doctype = XML::Doctype->new( 'FooML', DTD_TEXT => $doctype_text ) ;
          $doctype = XML::Doctype->new( 'FooML', SYSTEM_ID => 'FooML.dtd' ) ;

name
          $name = $doctype->name() ;

     Sets/gets the name.

parse_dtd
          $doctype->parse_dtd( $name, $doctype_text ) ;
          $doctype->parse_dtd( $name, $doctype_text, 'internal' ) ;

     Parses the text of a DTD from a scalar. $name is used to indicate
     the name of the DOCTYPE, and thus the root node.

     The DTD is considered to be external unless the third parameter is
     TRUE.
parse_dtd_file
          $doctype->parse_dtd_file( $name, $system_id [, $public_id] ) ;
          $doctype->parse_dtd_file( $name, $system_id [, $public_id], 'internal' ) ;

     Parses a DTD from a file. Eventually will support full URL syntax.
     $public_id is ignored for now, and $system_id is used to locate the
     DTD.

     This routine requires XML::Parser. XML::Parser is not loaded at any
     other time and is not needed to use the resulting DTD object.

     The DTD is considered to be external unless the fourth parameter is
     TRUE.

          $doctype->parse_dtd_file( $name, $system_id, $p_id, 'internal' ) ;
          $doctype->parse_dtd_file( $name, $system_id, undef, 'internal' ) ;

system_id
          $system_id = $doctype->system_id() ;

     Sets/gets the system ID.

public_id
          $public_id = $doctype->public_id() ;

     Sets/gets the public ID.

element_decl
          $elt_decl = $doctype->element_decl( $name ) ;

     Returns the XML::Doctype::ElementDecl object associated with $name.
     These can be defined by <!ELEMENT> tags or undefined, which can
     happen if they were only referred to by <!ELEMENT> or <!ATTLIST>
     tags.

element_names
     Returns an unsorted list of element names. This list includes names
     that are declared and undeclared (but referred to in element
     declarations or attribute definitions).

as_pm
          open( PM, ">FooML/DTD/v1_001.pm" ) or die $! ;
          print PM $doctype->as_pm( 'FooML::DTD::v1_001' ) or die $! ;
          close PM or die $! ;

     Then, later:

          use FooML::DTD::v1_001 ; # Do *not* use () as a parameter list!

     Returns a string containing the DTD as an independent module,
     allowing the DTD to be parsed in the development environment and
     shipped as Perl code, so that the target environment need not have
     XML::Parser installed.

     This is useful for XML creation-only tools and as an efficiency
     tuning measure if you will be rereading the same set of DTDs over
     and over again.

import
use
          use XML::Doctype NAME => 'FooML', SYSTEM_ID => 'dtds/FooML.dtd' ;

     import() constructs a default DTD object for the calling package so
     that XML::ValidWriter's functional interface can use it.
     If XML::Doctype is subclassed, the subclass's constructor is called
     with all parameters.

SUBCLASSING
===========

   This object uses the fields pragma, so you should use base and fields
for any subclasses.

AUTHOR
======

   Barrie Slaymaker

COPYRIGHT
=========

   This module is Copyright 2000, Barrie Slaymaker. All rights reserved.

   This module is licensed under the GPL, version 2. Please contact me if
this does not suit your needs.

File: pm.info, Node: XML/Doctype/AttDef, Next: XML/Doctype/ElementDecl, Prev: XML/Doctype, Up: Module List

A class representing a definition in an <!ATTLIST> tag
******************************************************

NAME
====

   XML::Doctype::AttDef - A class representing a definition in an
<!ATTLIST> tag

SYNOPSIS
========

     $attr = $elt->attribute( $name ) ;
     $attr->name ;

DESCRIPTION
===========

   This module is used to represent tags in an XML::Doctype object. It
contains tags as well.

STATUS
======

   This module is alpha code. It's developed enough to support
XML::ValidWriter, but needs a lot of work. Some big things that are
lacking are:

METHODS
=======

new
          $dtd = XML::Doctype::AttDef->new( $name, $type, $default ) ;

default
          ( $spec, $value ) = $attr->default ;
          $attr->default( '#REQUIRED' ) ;
          $attr->default( '#IMPLIED' ) ;
          $attr->default( '', 'foo' ) ;
          $attr->default( '#FIXED', 'foo' ) ;

     Sets/gets the default value. This is a

quant
          $attdef->quant( $q ) ;
          $q = $attdef->quant ;

     Sets/gets the attribute quantifier: '#REQUIRED', '#FIXED',
     '#IMPLIED', or ''.

name
          $attdef->name( $name ) ;
          $name = $attdef->name ;

     Sets/gets this attribute's name. Don't change the name while an
     attribute is in an element's attlist, since it will then be filed
     under the wrong name.

default_on_write
          $attdef->default_on_write( $value ) ;
          $value = $attdef->default_on_write ;

          $attdef->default_on_write( $attdef->default ) ;

     Sets/gets the value which is automatically output for this attribute
     if none is supplied to $writer->startTag.
     This is typically used to set a document-wide default for #REQUIRED
     attributes (and perhaps plain attributes) so that the attribute is
     treated like a #FIXED tag and emitted with a fixed value.

     The default_on_write does not need to be the same as the default
     unless the quantifier is #FIXED.

SUBCLASSING
===========

   This object uses the fields pragma, so you should use base and fields
for any subclasses.

AUTHOR
======

   Barrie Slaymaker

COPYRIGHT
=========

   This module is Copyright 2000, Barrie Slaymaker. All rights reserved.

   This module is licensed under the GPL, version 2. Please contact me if
this does not suit your needs.

File: pm.info, Node: XML/Doctype/ElementDecl, Next: XML/Driver/HTML, Prev: XML/Doctype/AttDef, Up: Module List

A class representing an <!ELEMENT> tag
**************************************

NAME
====

   XML::Doctype::ElementDecl - A class representing an <!ELEMENT> tag

SYNOPSIS
========

     $elt = $dtd->element( 'foo' ) ;
     $elt->name() ;
     $elt->attr( 'foo' ) ;

DESCRIPTION
===========

   This module is used to represent <!ELEMENT> tags in an XML::Doctype
object. It contains <!ATTLIST> tags as well.

STATUS
======

   This module is alpha code. It's developed enough to support
XML::ValidWriter, but needs a lot of work. Some big things that are
lacking are:

METHODS
=======

new
          # Undefined element constructors:
          $dtd = XML::Doctype::ElementDecl->new( $name ) ;
          $dtd = XML::Doctype::ElementDecl->new( $name, undef, \@attdefs ) ;

          # Defined element constructors
          $dtd = XML::Doctype::ElementDecl->new( $name, \@kids, \@attdef ) ;
          $dtd = XML::Doctype::ElementDecl->new( $name, [], \@attdefs ) ;

add_attdef
          $elt_decl->add_attdef( $att_def ) ;

attdef
          $attr = $elt->attdef( $name ) ;

     Returns the XML::Doctype::AttDef named by $name or undef if there is
     no such attribute.

attdefs
          $attdefs = $elt->attdefs( $name ) ;

     Returns the list of XML::Doctype::AttDef instances associated with
     this element.

attribute_names
     Returns a list of the attdefs' names.
child_names
          @names = $elt->child_names ;

     Returns a list of names of elements in this element decl's content
     model.

is_declared
          if ( $elt_decl->is_declared ) ...
          $elt_decl->is_declared( 1 ) ;

     Returns TRUE if there is any data defined in the element other than
     name and attributes or if is_declared has been set by calling
     is_declared( 1 ) or passing DECLARED => 1 to new().

is_empty

is_any

is_mixed

name
          $n = $elt_decl->name ;

     Gets the name of the element.

validate_content
          $v = $elt_decl->validate_content( \@seq ) ;

     Takes an ARRAY ref of tag names (or '#PCDATA') and checks to see if
     it would be valid content for elements of this type.

     Right now, this must be called only when an element's end tag is
     emitted. It can be broadened to be incremental if need be.

SUBCLASSING
===========

   This object uses the fields pragma, so you should use base and fields
for any subclasses.

AUTHOR
======

   Barrie Slaymaker

COPYRIGHT
=========

   This module is Copyright 2000, Barrie Slaymaker. All rights reserved.

   This module is licensed under the GPL, version 2. Please contact me if
this does not suit your needs.

File: pm.info, Node: XML/Driver/HTML, Next: XML/Dumper, Prev: XML/Doctype/ElementDecl, Up: Module List

SAX Driver for non wellformed HTML.
***********************************

NAME
====

   XML::Driver::HTML - SAX Driver for non wellformed HTML.

SYNOPSIS
========

     use XML::Driver::HTML;

     $driver = new XML::Driver::HTML(
         'Handler' => $some_sax_filter_or_handler,
         'Source'  => $some_PerlSAX_like_hash
         );
     $driver->parse();

   or

     use XML::Driver::HTML;

     $driver = new XML::Driver::HTML();
     $driver->parse(
         'Handler' => $some_sax_filter_or_handler,
         'Source'  => $some_PerlSAX_like_hash
         );
     $driver->parse(
         'Handler' => $some_other_sax_filter_or_handler,
         'Source'  => $some_other_source
         );

DESCRIPTION
===========

   XML::Driver::HTML is a SAX Driver for HTML. The HTML input does not
need to be well formed, because XML::Driver::HTML generates its SAX
events by walking an HTML::TreeBuilder object.
The simplest kind of use, is a filter from HTML to XHTML using XML::Handler::YAWriter as a SAX Handler. my $ya = new XML::Handler::YAWriter( 'Output' => new IO::File ( ">-" ), 'Pretty' => { 'NoWhiteSpace'=>1, 'NoComments'=>1, 'AddHiddenNewline'=>1, 'AddHiddenAttrTab'=>1, } ); my $html = new XML::Driver::HTML( 'Handler' => $ya, 'Source' => { 'ByteStream' => new IO::File ( "<-" ) } ); $html->parse(); METHODS ------- new Creates a new XML::Driver::HTML object. Default options for parsing, described below, are passed as key-value pairs or as a single hash. Options may be changed directly in the object. parse Parses a document. Options, described below, are passed as key-value pairs or as a single hash. Options passed to parse() override the default options in the parser object for the duration of the parse. OPTIONS ------- The following options are supported by XML::Driver::HTML : Handler Default SAX Handler to receive events Source Hash containing the input source for parsing. The `Source' hash may contain the following parameters: ByteStream The raw byte stream (file handle) containing the document. String A string containing the document. SystemId The system identifier (URI) of the document. Encoding A string describing the character encoding. If more than one of `ByteStream', `String', or `SystemId', then preference is given first to `ByteStream', then `String', then `SystemId'. NOTES ===== XML::Driver::HTML requires Perl 5.6 to convert from ISO-8859-1 to UTF-8. BUGS ==== not yet implemented: Interpretation of SystemId as being an URI XHTML document type other bugs: HTML::Parser and HTML::TreeBuilder bugs concerning DOCTYPE and CSS. The NotSoFree License is incompatible to the GNU General Public License. 
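The preference order among `ByteStream', `String', and `SystemId'
described under OPTIONS can be sketched in plain Perl. This is only an
illustration under stated assumptions: pick_source is a hypothetical
helper, not a function of XML::Driver::HTML, and the module's own
selection code may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Driver::HTML): report which entry
# of a PerlSAX-style Source hash would win, following the documented
# preference ByteStream, then String, then SystemId.
sub pick_source {
    my ($source) = @_;
    for my $key (qw(ByteStream String SystemId)) {
        return ( $key, $source->{$key} ) if defined $source->{$key};
    }
    return;    # nothing usable was supplied
}

my ( $kind, $value ) = pick_source(
    {
        String   => '<html><body>hi</body></html>',
        SystemId => 'page.html',
    }
);
print "would parse from $kind\n";
```

Here both `String' and `SystemId' are present, so the sketch reports
`String'; adding a `ByteStream' entry would take precedence over both.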
AUTHOR ====== Michael Koehne, Kraehe@Copyleft.De (c) 2000 NotSoFree License SEE ALSO ======== *Note XML/Parser/PerlSAX: XML/Parser/PerlSAX, and *Note HTML/TreeBuilder: HTML/TreeBuilder,  File: pm.info, Node: XML/Dumper, Next: XML/EP, Prev: XML/Driver/HTML, Up: Module List Perl module for dumping Perl objects from/to XML ************************************************ NAME ==== XML::Dumper - Perl module for dumping Perl objects from/to XML SYNOPSIS ======== # Convert Perl code to XML use XML::Dumper; my $dump = new XML::Dumper; $data = [ { first => 'Jonathan', last => 'Eisenzopf', email => 'eisen@pobox.com' }, { first => 'Larry', last => 'Wall', email => 'larry@wall.org' } ]; $xml = $dump->pl2xml($perl); # Convert XML to Perl code use XML::Dumper; my $dump = new XML::Dumper; # some XML my $xml = < foo XML # load Perl data structure from dumped XML $data = $dump->xml2pl($Tree); DESCRIPTION =========== XML::Dumper dumps Perl data to a structured XML format. XML::Dumper can also read XML data that was previously dumped by the module and convert it back to Perl. This is done via the following 2 methods: XML::Dumper::pl2xml XML::Dumper::xml2pl AUTHOR ====== Jonathan Eisenzopf CREDITS ======= Chris Thorman L.M.Orchard DeWitt Clinton SEE ALSO ======== perl(1), XML::Parser(3).  File: pm.info, Node: XML/EP, Next: XML/ESISParser, Prev: XML/Dumper, Up: Module List A framework for embedding XML into a web server *********************************************** NAME ==== XML::EP - A framework for embedding XML into a web server SYNOPSIS ======== # Generate a new XML::EP instance use XML::EP(); my $ep = XML::EP->new(); # Let the instance process an HTTP request $ep->handle($request); DESCRIPTION =========== XML::EP is an administrative framework for embedding XML into a web server. 
   That means that the system allows you to retrieve XML documents from
external storage (files, a Tamino database engine, or whatever), parse
them, and pipe the parsed XML tree into processors. Processors are
modules that change the tree; for example, the DBI processor will issue
SQL queries and insert the result as XML elements. Finally the XML tree
will be piped into a so-called formatter, which converts XML to HTML and
prints the result.

   The architecture is as follows:

                       +---------------------+
                       |   Control element   |
                       +---------------------+
                      /           |           \
                     /            |            \
          +----------+  XML  +------------+  XML  +-----------+
          | Producer | --->  | Processors | --->  | Formatter |
          +----------+       +------------+       +-----------+

   The control element, an instance of XML::EP::Control, will be created
first. Its purpose is the creation of the other elements: the producer
(an instance of XML::EP::Producer), one or more processors (instances of
XML::EP::Processor) and finally a formatter (an instance of
XML::EP::Formatter).

   The producer, processors, and formatter are selected based on virtual
host, location (file part of the URL being requested) and in particular
on the client. For example, an HTML formatter will be selected if the
client seems to request HTML, a WML formatter will be created if the
client appears to be a WAP handset, and so on.

METHOD INTERFACE
================

   Publicly available methods are:

Creating a control element
--------------------------

     my $control = $ep->control();

   (Instance method) This method will create an instance of
XML::EP::Control. The main task of this instance is its *CreatePipe*
method, which will then be called for creating an XML tree, a list of
processors and a formatter.
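The producer, processors, formatter flow from the architecture
description can be sketched with plain closures. Everything in this
sketch is a stand-in chosen for illustration: it does not use the
XML::EP classes, run_pipe is a hypothetical name, and the "tree" is an
ordinary hash rather than a parsed XML document.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: plain closures play the roles of the producer,
# the processors and the formatter; no XML::EP class is used.
my $producer = sub { return { name => 'doc', children => [] } };

my @processors = (
    # Each processor receives the tree and returns the (possibly changed) tree.
    sub { my ($tree) = @_; push @{ $tree->{children} }, 'row-1'; return $tree },
    sub { my ($tree) = @_; push @{ $tree->{children} }, 'row-2'; return $tree },
);

my $formatter = sub {
    my ($tree) = @_;
    my $body = join '', map { "<p>$_</p>" } @{ $tree->{children} };
    return "<$tree->{name}>$body</$tree->{name}>";
};

# Produce a tree, pass it through every processor in order, then format it.
sub run_pipe {
    my ( $prod, $procs, $fmt ) = @_;
    my $tree = $prod->();
    $tree = $_->($tree) for @$procs;
    return $fmt->($tree);
}

print run_pipe( $producer, \@processors, $formatter ), "\n";
```

Keeping the processors in an array ref mirrors the description of the
processor list as something a control object builds and later stages may
inspect or change.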
Getting or setting the processors, formatters
---------------------------------------------

     my $processors = $self->Processors();
     $self->Processors($processors);
     my $formatter = $self->Formatter();
     $self->Formatter($formatter);
     my $request = $self->Request();
     $self->Request($request);
     my $response = $self->Response();
     $self->Response($response);

   (Instance methods) These methods are used for querying or modifying
the list of processors (an array ref), the formatter, the request object,
or the response object. Processors are explicitly permitted to use these
methods.

   The response object is designed for receiving HTTP headers, cookies,
etc. that are being sent to the client. Response objects are instances
of XML::EP::Response.

Handling an HTTP request
------------------------

     $self->Handle($request);

   (Instance method) This method is called with a request object (an
instance of XML::EP::Request) as argument. The request object contains
all information about the client and its request, in particular HTTP
headers, etc.

   The method implements the HTTP request's full life cycle: a control
object is created (an instance of XML::EP::Control); the control object's
*CreatePipe* method is called to create an XML tree and initialize the
processor list and the formatter; the processors are called; and finally
the formatter is called, which has to send data to the client.

File: pm.info, Node: XML/ESISParser, Next: XML/Edifact, Prev: XML/EP, Up: Module List

Perl SAX parser using nsgmls
****************************

NAME
====

   XML::ESISParser - Perl SAX parser using nsgmls

SYNOPSIS
========

     use XML::ESISParser;

     $parser = XML::ESISParser->new( [OPTIONS] );
     $result = $parser->parse( [OPTIONS] );

     $result = $parser->parse($string);

DESCRIPTION
===========

   `XML::ESISParser' is a Perl SAX parser using the `nsgmls' command of
James Clark's SGML Parser (SP), a validating XML and SGML parser.
This man page summarizes the specific options, handlers, and properties supported by `XML::ESISParser'; please refer to the Perl SAX standard in ``SAX.pod'' for general usage information. `XML::ESISParser' defaults to parsing XML and has an option for parsing SGML. `nsgmls' source, and binaries for some platforms, is available from . `nsgmls' is included in both the SP and Jade packages. METHODS ======= new Creates a new parser object. Default options for parsing, described below, are passed as key-value pairs or as a single hash. Options may be changed directly in the parser object unless stated otherwise. Options passed to `parse()' override the default options in the parser object for the duration of the parse. OPTIONS ======= The following options are supported by `XML::ESISParser': Handler default handler to receive events DocumentHandler handler to receive document events DTDHandler handler to receive DTD events ErrorHandler handler to receive error events Source hash containing the input source for parsing IsSGML the document to be parsed is in SGML If no handlers are provided then all events will be silently ignored. If a single string argument is passed to the `parse()' method, it is treated as if a `Source' option was given with a `String' parameter. The `Source' hash may contain the following parameters: ByteStream The raw byte stream (file handle) containing the document. String A string containing the document. SystemId The system identifier (URI) of the document. If more than one of `ByteStream', `String', or `SystemId', then preference is given first to `ByteStream', then `String', then `SystemId'. HANDLERS ======== The following handlers and properties are supported by `XML::ESISParser': DocumentHandler methods ----------------------- start_document Receive notification of the beginning of a document. No properties defined. end_document Receive notification of the end of a document. No properties defined. 
start_element Receive notification of the beginning of an element. Name The element type name. Attributes A hash containing the attributes attached to the element, if any. IncludedSubelement This element is an included subelement. Empty This element is declared empty. The `Attributes' hash contains only string values. The `Empty' flag is not set for an element that merely has no content, it is set only if the DTD declares it empty. BETA: Attribute values currently do not expand SData entities into entity objects, they are still in the system data notation used by nsgmls (inside `|'). A future version of XML::ESISParser will also convert other types of attributes into their respective objects, currently just their notation or entity names are given. end_element Receive notification of the end of an element. Name The element type name. characters Receive notification of character data. Data The characters from the document. record_end Receive notification of a record end sequence. XML applications should convert this to a new-line. processing_instruction Receive notification of a processing instruction. Target The processing instruction target in XML. Data The processing instruction data, if any. internal_entity_ref Receive notification of a system data (SData) internal entity reference. Name The name of the internal entity reference. external_entity_ref Receive notification of a external entity reference. Name The name of the external entity reference. start_subdoc Receive notification of the start of a sub document. Name The name of the external entity reference. end_subdoc Receive notification of the end of a sub document. Name The name of the external entity reference. conforming Receive notification that the document just parsed conforms to it's document type declaration (DTD). No properties defined. DTDHandler methods ------------------ external_entity_decl Receive notification of an external entity declaration. Name The entity's entity name. 
Type The entity's type (CDATA, NDATA, etc.) SystemId The entity's system identifier. PublicId The entity's public identifier, if any. GeneratedId Generated system identifiers, if any. internal_entity_decl Receive notification of an internal entity declaration. Name The entity's entity name. Type The entity's type (CDATA, NDATA, etc.) Value The entity's character value. notation_decl Receive notification of a notation declaration. Name The notation's name. SystemId The notation's system identifier. PublicId The notation's public identifier, if any. GeneratedId Generated system identifiers, if any. subdoc_entity_decl Receive notification of a subdocument entity declaration. Name The entity's entity name. SystemId The entity's system identifier. PublicId The entity's public identifier, if any. GeneratedId Generated system identifiers, if any. external_sgml_entity_decl Receive notification of an external SGML-entity declaration. Name The entity's entity name. SystemId The entity's system identifier. PublicId The entity's public identifier, if any. GeneratedId Generated system identifiers, if any. AUTHOR ====== Ken MacLeod, ken@bitsko.slc.ut.us SEE ALSO ======== perl(1), PerlSAX.pod(3) Extensible Markup Language (XML) SAX 1.0: The Simple API for XML SGML Parser (SP)  File: pm.info, Node: XML/Edifact, Next: XML/Element, Prev: XML/ESISParser, Up: Module List Perl module to handle XML::Edifact messages. ******************************************** NAME ==== XML::Edifact - Perl module to handle XML::Edifact messages. 
SYNOPSIS ======== use XML::Edifact; &XML::Edifact::open_dbm(); &XML::Edifact::read_edi_message($ARGV[0]); print &XML::Edifact::make_xml_message(); &XML::Edifact::close_dbm(); 0; -------------------------------------------------------------- use XML::Edifact; &XML::Edifact::open_dbm(); &XML::Edifact::read_xml_message($ARGV[0]); print &XML::Edifact::make_edi_message(); &XML::Edifact::close_dbm(); 0; DESCRIPTION =========== XML-Edifact started as Onyx-EDI which was a gawk script. XML::Edifact-0.3x still shows its bad ancestry (a2p) in some places. The current module is able to generate some SDBM files for the directory pointed to by open_dbm, by parsing the original United Nations EDIFACT documents during Bootstrap.PL. Those files will be stored during make install. The first typical usage will read an EDIFACT message into a buffer global to the package, and will print this message as XML on STDOUT. The second usage will do the opposite. Those two files will be installed as edi2xml and xml2edi in your local bin directory. New to XML::Edifact 0.34 are namespace migration and intend handling - take a look at the test.pl for how to use them. BUT WAIT - An object-oriented syntax is planned for the next release! And I'm calling this release an interim, because I'm just saving a stable state (I hope) before I start to muddle all things around while going on an object(ive) raid. If you have other EDIFACT files, I would like to include them in the next version. I'm also open to any comments; as they say, "everything is still in flux" ! AUTHOR ====== Michael Koehne, Kraehe@Copyleft.de SEE ALSO ======== perl(1), XML::Parser(3), UN/EDIFACT Draft.  
File: pm.info, Node: XML/Element, Next: XML/Encoding, Prev: XML/Edifact, Up: Module List XML elements with the same interface as HTML::Element ***************************************************** NAME ==== XML::Element - XML elements with the same interface as HTML::Element SYNOPSIS ======== [See HTML::Element] DESCRIPTION =========== This is just a subclass of HTML::Element. It works basically the same as HTML::Element, except that tagnames and attribute names aren't forced to lowercase, as they are in HTML::Element. *Note HTML/Element: HTML/Element, describes everything you can do with this class. CAVEATS ======= Has currently no handling of namespaces. SEE ALSO ======== *Note XML/TreeBuilder: XML/TreeBuilder, for a class that actually builds XML::Element structures. *Note HTML/Element: HTML/Element, for all documentation. *Note XML/DOM: XML/DOM, and *Note XML/Twig: XML/Twig, for other XML document tree interfaces. *Note XML/Generator: XML/Generator, for more fun. COPYRIGHT ========= Copyright 2000 Sean M. Burke. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. AUTHOR ====== Sean M. Burke,  File: pm.info, Node: XML/Encoding, Next: XML/Filter/DetectWS, Prev: XML/Element, Up: Module List A perl module for parsing XML encoding maps. ******************************************** NAME ==== XML::Encoding - A perl module for parsing XML encoding maps. SYNOPSIS ======== use XML::Encoding; my $em_parser = new XML::Encoding(ErrorContext => 2, ExpatRequired => 1, PushPrefixFcn => \&push_prefix, PopPrefixFcn => \&pop_prefix, RangeSetFcn => \&range_set); my $encmap_name = $em_parser->parsefile($ARGV[0]); DESCRIPTION =========== This module, which is built as a subclass of XML::Parser, provides a parser for encoding map files, which are XML files. The file maps/encmap.dtd in the distribution describes the structure of these files. 
Calling a parse method returns the name of the encoding map (obtained from the name attribute of the root element). The contents of the map are processed through the callback functions push_prefix, pop_prefix, and range_set. METHODS ======= This module provides no additional methods to those provided by XML::Parser, but it does take the following additional options. * ExpatRequired When this has a true value, then an error occurs unless the encmap "expat" attribute is set to "yes". Whether or not the ExpatRequired option is given, the parser enters expat mode if this attribute is set. In expat mode, the parser checks if the encoding violates expat restrictions. * PushPrefixFcn The corresponding value should be a code reference to be called when a prefix element starts. The single argument to the callback is an integer which is the byte value of the prefix. An undef value should be returned if successful. If in expat mode, a defined value causes an error and is used as the message string. * PopPrefixFcn The corresponding value should be a code reference to be called when a prefix element ends. No arguments are passed to this function. An undef value should be returned if successful. If in expat mode, a defined value causes an error and is used as the message string. * RangeSetFcn The corresponding value should be a code reference to be called when a "range" or "ch" element is seen. The 3 arguments passed to this function are: (byte, unicode_scalar, length) The byte is the starting byte of a range or the byte being mapped by a "ch" element. The unicode_scalar is the Unicode value that this byte (with the current prefix) maps to. The length of the range is the last argument. This will be 1 for the "ch" element. An undef value should be returned if successful. If in expat mode, a defined value causes an error and is used as the message string. 
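The three callbacks described above can be sketched as ordinary subs
that build a lookup table. This is a minimal illustration, not the
module's own code: the subs are driven by hand here instead of by
XML::Encoding, and the sketch assumes that consecutive bytes in a range
map to consecutive Unicode scalars. The table layout (hex byte sequence
as key) is an arbitrary choice.

```perl
use strict;
use warnings;

my @prefix;    # stack of prefix bytes that are currently open
my %map;       # hex byte sequence => Unicode scalar value

# PushPrefixFcn: receives the byte value of the prefix.
sub push_prefix { my ($byte) = @_; push @prefix, $byte; return undef }

# PopPrefixFcn: receives no arguments.
sub pop_prefix { pop @prefix; return undef }

# RangeSetFcn: receives (byte, unicode_scalar, length).
sub range_set {
    my ( $byte, $uni, $len ) = @_;
    for my $i ( 0 .. $len - 1 ) {
        my $key = join '', map { sprintf '%02x', $_ } @prefix, $byte + $i;
        $map{$key} = $uni + $i;    # assumption: ranges map consecutively
    }
    return undef;                  # undef signals success
}

# Simulated call sequence for a map with one prefix and one "ch" element.
push_prefix(0x81);
range_set( 0x40, 0x3000, 3 );
pop_prefix();
range_set( 0x41, 0x0041, 1 );

printf "%s => U+%04X\n", $_, $map{$_} for sort keys %map;
```

In real use the same subs would be passed as code references through
the PushPrefixFcn, PopPrefixFcn, and RangeSetFcn options shown in the
SYNOPSIS.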
AUTHOR
======

   Clark Cooper <`coopercc@netheaven.com'>

SEE ALSO
========

   XML::Parser

File: pm.info, Node: XML/Filter/DetectWS, Next: XML/Filter/Digest, Prev: XML/Encoding, Up: Module List

A PerlSAX filter that detects ignorable whitespace
**************************************************

NAME
====

   XML::Filter::DetectWS - A PerlSAX filter that detects ignorable
whitespace

SYNOPSIS
========

     use XML::Filter::DetectWS;

     my $detect = new XML::Filter::DetectWS (Handler => $handler,
                                             SkipIgnorableWS => 1);

DESCRIPTION
===========

   This is a PerlSAX filter that detects which character data contains
ignorable whitespace and optionally filters it.

   Note that this is just a first stab at the implementation and it may
change completely in the near future. Please provide feedback whether you
like it or not, so I know whether I should change it.

   The XML spec defines ignorable whitespace as the character data found
in elements that were defined in an <!ELEMENT> declaration with a model
of 'EMPTY' or 'Children' (Children is the rule that does not contain
'#PCDATA'.)

   In addition, XML::Filter::DetectWS allows the user to define other
whitespace to be *ignorable*. The ignorable whitespace is passed to the
PerlSAX Handler with the ignorable_whitespace handler, provided that the
Handler implements this method. (Otherwise it is passed to the characters
handler.) If the *SkipIgnorableWS* option is set, the ignorable
whitespace is simply discarded.

   XML::Filter::DetectWS also takes xml:space attributes into account.
See below for details.

   CDATA sections are passed in the standard PerlSAX way (i.e. with
surrounding start_cdata and end_cdata events), unless the Handler does
not implement these methods. In that case, the CDATA section is simply
passed to the characters method.

Constructor Options
===================

   * SkipIgnorableWS (Default: 0)

     When set, detected ignorable whitespace is discarded.

   * Handler

     The PerlSAX handler (or filter) that will receive the PerlSAX events
     from this filter.
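A downstream handler that wants ignorable whitespace kept apart from
ordinary text only needs to implement both methods. The sketch below is
an assumption-laden illustration: WSCounter is a made-up package name,
and the events are simulated by hand rather than delivered by
XML::Filter::DetectWS. It relies only on the PerlSAX convention that
both events carry their text in the Data property.

```perl
use strict;
use warnings;

# Hypothetical downstream handler; the package name is made up.
package WSCounter;

sub new { return bless { text => '', ignorable => 0 }, shift }

# Ordinary character data is collected.
sub characters {
    my ( $self, $event ) = @_;
    $self->{text} .= $event->{Data};
}

# Because this method exists, whitespace judged ignorable would be
# delivered here instead of to characters().
sub ignorable_whitespace {
    my ( $self, $event ) = @_;
    $self->{ignorable} += length $event->{Data};
}

package main;

my $handler = WSCounter->new;

# Simulated events; in real use the filter would deliver these.
$handler->ignorable_whitespace( { Data => "\n  " } );
$handler->characters( { Data => 'hello' } );

print "text='$handler->{text}' ignorable=$handler->{ignorable}\n";
```

Leaving out ignorable_whitespace would, per the description above, cause
the same whitespace to arrive through characters instead.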
Current Implementation ====================== When determining which whitespace is ignorable, it first looks at the xml:space attribute of the parent element node (and its ancestors.) If the attribute value is "preserve", then it is *NOT* ignorable. (If someone took the trouble of adding xml:space="preserve", then that is the final answer...) If xml:space="default", then we look at the definition of the parent element. If the model is 'EMPTY' or follows the 'Children' rule (i.e. does not contain '#PCDATA') then we know that the whitespace is ignorable. Otherwise we need input from the user somehow. The idea is that the API of DetectWS will be extended, so that you can specify/override e.g. which elements should behave as if xml:space="preserve" were set, and/or which elements should behave as if the model was defined a certain way, etc. Please send feedback! The current implementation also detects whitespace after an element-start tag, whitespace before an element-end tag. It also detects whitespace before an element-start and after an element-end tag and before or after comments, processing instruction, cdata sections etc., but this needs to be reimplemented. In either case, the detected whitespace is split off into its own PerlSAX characters event and an extra property 'Loc' is added. It can have 4 possible values: * 1 (WS_START) - whitespace immediately after element-start tag * 2 (WS_END) - whitespace just before element-end tag * 3 (WS_ONLY) - both WS_START and WS_END, i.e. it's the only text found between the start and end tag and it's all whitespace * 0 (WS_INTER) - none of the above, probably before an element-start tag, after an element-end tag, or before or after a comment, PI, cdata section etc. Note that WS_INTER may not be that useful, so this may change. 
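The decision procedure described at the start of this section can be
sketched as a small function. This is not the module's API:
ws_is_ignorable is a hypothetical helper, the content model is passed as
a plain string, and treating an absent xml:space like "default" and an
'ANY' model as undecided are assumptions made for the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper. $spaces lists the xml:space values of the parent
# element's ancestors and the parent itself, outermost first (undef where
# the attribute is absent); $model is the parent's content model.
sub ws_is_ignorable {
    my ( $spaces, $model ) = @_;

    # The nearest element that sets xml:space decides.
    my ($effective) = grep { defined $_ } reverse @$spaces;
    return 0 if defined $effective && $effective eq 'preserve';

    # Otherwise fall back to the declared content model.
    return 1 if $model eq 'EMPTY';
    return 1 if $model ne 'ANY' && $model !~ /#PCDATA/;    # 'Children' rule
    return undef;    # undecided: input from the user would be needed
}

for my $case (
    [ [ 'preserve', undef ], '(title,para+)' ],
    [ [ undef, 'default' ],  '(title,para+)' ],
    [ [],                    '(#PCDATA|em)*' ],
) {
    my $verdict = ws_is_ignorable(@$case);
    printf "%-16s %s\n", $case->[1],
        !defined $verdict ? 'undecided' : $verdict ? 'ignorable' : 'preserved';
}
```

The three cases cover an inherited "preserve", an explicit "default"
with a 'Children' model, and mixed content, for which the text above
says further input is required.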
xml:space attribute
===================

   The XML spec states that: A special attribute named xml:space may be
attached to an element to signal an intention that in that element, white
space should be preserved by applications. In valid documents, this
attribute, like any other, must be declared if it is used. When declared,
it must be given as an enumerated type whose only possible values are
"default" and "preserve". For example:

     <!ATTLIST poem   xml:space (default|preserve) 'preserve'>

   The value "default" signals that applications' default white-space
processing modes are acceptable for this element; the value "preserve"
indicates the intent that applications preserve all the white space. This
declared intent is considered to apply to all elements within the content
of the element where it is specified, unless overridden with another
instance of the xml:space attribute.

   The root element of any document is considered to have signaled no
intentions as regards application space handling, unless it provides a
value for this attribute or the attribute is declared with a default
value.

   [... end of excerpt ...]

CAVEATS
=======

   This code is highly experimental! It has not been tested well and the
API may change.

   The code that detects blocks of whitespace at potential indent
positions may need some work. See

AUTHOR
======

   Send bug reports, hints, tips, suggestions to Enno Derksen at
<`enno@att.com'>.
File: pm.info,  Node: XML/Filter/Digest,  Next: XML/Filter/Hekeln,  Prev: XML/Filter/DetectWS,  Up: Module List

XML::Filter::Digest
*******************

NAME
====

   XML::Filter::Digest

SYNOPSIS
========

     use strict;
     use XML::Filter::Digest;
     use XML::Handler::YAWriter;
     use IO::File;

     # $ARGV[0] is the digest script, $ARGV[1] the document to query
     my $digest = new XML::Filter::Digest(
         'Handler' => new XML::Handler::YAWriter(
             'Output' => new IO::File( ">-" ),
             'Pretty' => { 'AddHiddenNewLine' => 1 }
             ),
         'Script' => new XML::Script::Digest(
             'Source' => { 'SystemId' => $ARGV[0] }
             )->parse(),
         'Source' => { 'SystemId' => $ARGV[1] }
         )->parse();

     0;

DESCRIPTION
===========

   Most XML tools aim to parse some simple XML and to produce some
formatted output. *XML::Filter::Digest* aims to do the opposite.

   Many formats can now be parsed by a SAX Driver. XPath offers a smart
way to write queries against XML. XML::Filter::Digest is a PerlSAX
Filter to query XML and to provide a simpler digest as a result.

   XML::Filter::Digest uses its own script language that can be parsed
by *XML::Script::Digest* to formulate these digest queries. In fact, a
digest script is well-formed XML.

   The following script defines that the result XML should have a root
element called extract, containing several elements called section
starting from the 4th HTML header. Those section elements contain id,
title and *intro* elements, which in turn contain the XPath
*string-value* of their nodes as character data.

   The digest script parser silently ignores anything other than digest
elements and collect elements. The digest element needs a name attribute
defining the name of the root element, while the collect element needs
an additional node attribute defining XPath queries for nested elements.

   Only a single digest element should exist within a script document,
but the digest element need not be the root element of the document.

   Nested within the digest element should be collect elements. They may
contain several other collect elements recursively.

METHODS
-------

   The XML::Filter::Digest object may act as a Filter to receive SAX
events, or directly as a Driver if you provide a Source option to the
parse method. The filter is reusable if the chain of Handlers is also
reusable, so that multiple documents can be handled in batches. The
filter requires a Handler and a Script option before the start_document
method is called.

   The XML::Script::Digest object may act as a Handler to receive SAX
events, or directly as a Driver if you provide a Source option to the
parse method. The script object is reusable and a single script object
can be used for several filter objects.

new
     Creates a new XML::Filter::Digest object. Default options for
     parsing, described below, are passed as key-value pairs or as a
     single hash. Options may be changed directly in the object.

parse
     Parses a document by embedding XML::Parser::PerlSAX. This allows
     you to use XML::Filter::Digest directly as a Driver and simplifies
     generating a ready-to-use XML::Script::Digest object.

     Options, described below, are passed as key-value pairs or as a
     single hash. Options passed to parse() override the default
     options in the object for the duration of the parse.

start_document
     Notifies the object about the start of a new document. The object
     will do its cleanup if it is reused.

end_document
     Notifies the object about the end of the document.

     The return value of XML::Script::Digest is *$self*, to be used as
     the return value of the parse method.

     XML::Filter::Digest will walk through the script object to
     generate a stream of SAX events for its Handler. The return value
     of XML::Filter::Digest is the return value of the end_document
     method of the Handler object.

OPTIONS
-------

Script
     XML::Script::Digest objects can be used for several
     XML::Filter::Digest objects.

Handler
     Default SAX Handler to receive events from XML::Filter::Digest
     objects.

Source
     XML::Filter::Digest and XML::Script::Digest can be used on raw XML
     directly, by calling the parse() method.
     To do this, the Source option is required for embedding the
     PerlSAX parser.

   The `Source' hash may contain the following parameters:

ByteStream
     The raw byte stream (file handle) containing the document.

String
     A string containing the document.

SystemId
     The system identifier (URI) of the document.

Encoding
     A string describing the character encoding.

   If more than one of `ByteStream', `String', or `SystemId' is present,
preference is given first to `ByteStream', then `String', then
`SystemId'.

NOTES
=====

   XML::Filter::Digest is not a streaming filter but a buffering filter,
as all processing is done by the end_document method. This could cause
the Perl interpreter to run out of memory on large XML files.

   Ideally, define a *ulimit* to prevent the system from going offline
for several minutes until it detects that there is really no memory left
to seize somewhere in the network. Adding network swapspace ad infinitum
only makes things worse, so I have the following line in my *.bashrc*.
Other operating systems offer similar constraints.

     ulimit -v 98304 -d 98304 -m 98304

   This line is ok on a single user machine with 32MB RAM and 128MB
swap. I can raise this value, if I know that I wanna walk the dog.

BUGS
====

not yet implemented:
     reuse of XML::Filter::Digest objects.

XML::XPath::Builder bug:
     XML::Filter::Digest 0.02 has been tested with XML::XPath version
     0.51, but XML::XPath needs the patch included within this
     distribution. Version 0.52 is expected to work out of the box.

other bugs:
     The NotSoFree License is incompatible with the GNU General Public
     License.

AUTHOR
======

   Michael Koehne, Kraehe@Copyleft.De

   (c) 2000 NotSoFree License

SEE ALSO
========

   *Note XML/Parser/PerlSAX: XML/Parser/PerlSAX, and *Note XML/XPath:
XML/XPath,


File: pm.info,  Node: XML/Filter/Hekeln,  Next: XML/Filter/Reindent,  Prev: XML/Filter/Digest,  Up: Module List

a SAX stream editor
*******************

NAME
====

   XML::Filter::Hekeln - a SAX stream editor

SYNOPSIS
========

     use XML::Filter::Hekeln;

     my $handler = new SAXHandler( ... );
     my $hekeln = new XML::Filter::Hekeln(
         'Handler' => $handler,
         'Script' => $script );
     my $driver = new SAXDriver( ..., 'Handler' => $hekeln );

DESCRIPTION
===========

   XML::Filter::Hekeln is a sophisticated SAX stream editor.

   Hekeln is a SAX filter. This means that you can use a Hekeln object
as a Handler to act on events, and to produce SAX events as a driver for
the next handler in the chain. The name Hekeln sounds like the German
word for crocheting, which best describes what Hekeln does in markup
language translation.

   The main design goal was to make it as easy for Perl as possible,
while preserving a human readable form for the translation script.

   Hekeln scripts are event based. Hekeln objects stream events to the
next handler in the chain. They are therefore usable for XML documents
larger than physical memory, as they do not need to store the entire
document in a DOM or Grove structure. They will also be faster than any
XSL in most circumstances.

   To show plainly how Hekeln works, I'll start with an example.

   I want to translate XML::Edifact repositories into HTML. Those
repositories start with something like this:

   Here is a snippet from test.pl:

     start_element:repository
     ! $self->handle('start_document',{});
     < html >
     < body >
     < h1 > XML-Edifact Repository
     < h2 > ~name~
     < p > Agency: ~agency~
     < br > Code: ~code~
     < br > Version: ~version~
     < br > Description: ~desc~
     < hr >

     end_element:repository
     ! $self->handle('end_document',{});

   This part handles start_element and end_element events that have a
target called repository. Hekeln translates the script into subroutines
that are stored in a hash. So anything is possible, if you understand
the trick.

   To understand the trick, uncomment the "'Debug' => 1" parameter of
the Hekeln invocation in the test.pl script and redirect STDERR to some
file. This will produce a file starting like:

     $hash->{start_element:repository}=eval "sub {
     my ($self,$param) = @_;
     my ($hash) = {};
     $self->handle('start_document',{});
     $hash->{Name}="html";
     $self->handle("start_element", $hash);
     $hash->{Name}="body";
     $self->handle("start_element", $hash);
     $hash->{Name}="h1";
     $self->handle("start_element", $hash);
     $hash->{Data}="XML-Edifact Repository";
     $self->handle("characters", $hash);
     $hash->{Name}="h1";
     $self->handle("end_element", $hash);
     $hash->{Name}="h2";
     $self->handle("start_element", $hash);
     $hash->{Data}="$param->{name}";
     $self->handle("characters", $hash);
     $hash->{Name}="h2";
     $self->handle("end_element", $hash);
     $hash->{Name}="p";
     $self->handle("start_element", $hash);
     $hash->{Data}="Agency: $param->{agency}";
     $self->handle("characters", $hash);
     $hash->{Name}="br";
     $self->handle("start_element", $hash);
     $hash->{Data}="Code: $param->{code}";
     $self->handle("characters", $hash);
     $hash->{Name}="br";
     $self->handle("start_element", $hash);
     $hash->{Data}="Version: $param->{version}";
     $self->handle("characters", $hash);
     $hash->{Name}="br";
     $self->handle("start_element", $hash);
     $hash->{Data}="Description: $param->{desc}";
     $self->handle("characters", $hash);
     $hash->{Name}="p";
     $self->handle("end_element", $hash);
     $hash->{Name}="hr";
     $self->handle("start_element", $hash);
     }";
     $hash->{end_element:repository}=eval "sub {
     my ($self,$param) = @_;
     my ($hash) = {};
     $hash->{Name}="body";
     $self->handle("end_element", $hash);
     $hash->{Name}="html";
     $self->handle("end_element", $hash);
     $self->handle('end_document',{});
     }";

   As you can imagine,
~foobaa~ parts within a script are expanded with the attributes given
in the XML start_element event.

   The syntax itself is a bit tricky, as the translation of the script
into a sub is stupid and fast. Any event that has to be handled by
Hekeln starts with an event_name:event_target pair and ends with a blank
line.

     event_name:event_target
     left_indicator text right_indicator
     left_indicator text right_indicator
     left_indicator text right_indicator

   Valid as left_indicator are "<", [...]

     ! $self->{Flag}{FooBaa}=1;
     ! unshift @{$self->{Stack}}, "FooBaa";

   and

     ! $self->{Flag}{FooBaa}=undef;
     ! shift @{$self->{Stack}} if $self->{Stack}[0] eq "FooBaa";

   and

     ! if ($self->{Flag}{FooBaa}) {
     < h1 > flag FooBaa raised
     ! }

   It won't be necessary to code exactly this, as this is done by "++",
"--", "?{" and "?}". "+" and "-" will raise or lower some flag, while
"++" and "--" not only manage the flags, but also a stack that is needed
to process character events.

   The default behavior is to throw away any event that does not have a
subroutine matching the event, target pair. Events that do not have a
target will use the top flag on the stack as a target. So if you want to
process character events, use "++" and "--" when handling the
surrounding start_element and end_element events.

   As a last word: Hekeln is not yet well tested, and badly needs some
better documentation. I would applaud anybody for reporting bugs or
improving the POD.

AUTHOR
======

   Michael Koehne, Kraehe@Copyleft.de

SEE ALSO
========

   perl(1), XML::Parser, XML::Parser::PerlSAX