This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: XBase/Base,  Next: XBase/FAQ,  Prev: XBase,  Up: Module List

Base input output module for XBase suite
****************************************

NAME
====

   XBase::Base - Base input output module for XBase suite

SYNOPSIS
========

   Used indirectly, via XBase or XBase::Memo.

DESCRIPTION
===========

   This module provides catch-all I/O methods for other XBase classes and
should be used by people creating additional XBase classes/methods.  There
is nothing interesting in here for users of the XBase(3) module.  Methods
in XBase::Base return nothing (undef) on error and the error message can
be retrieved using the *errstr* method.

   Methods are:

new
     Constructor. Creates the object and if the file name is specified,
     opens the file.

open
     Opens the file and using method read_header reads the header and sets
     the object's data structure. The read_header should be defined in the
     derived class, there is no default.

close
     Closes the file, doesn't destroy the object.

drop
     Unlinks the file.

create_file
     Creates a file of the given name. The second (optional) parameter is
     the permission specification for the file.

   The reading/writing methods assume that the file has a header of
length header_len bytes (possibly 0) and then records of length
record_len. These two values should be set by the read_header method.

seek_to, seek_to_record
     Seeks to absolute position or to the start of the record.

read_record, read_from
     Reads data from specified position (offset) or from the given record.
     The second parameter (optional for *read_record*) is the length to
     read. It can be negative, and in that case the read will not complain
     if the file is shorter than requested.

write_to, write_record
     Writes data to the absolute position or to specified record position.
     The data is not padded to record_len, just written out.
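
   The layout described above (a header of header_len bytes followed by
fixed-length records) implies simple offset arithmetic. The following is
only a sketch of that arithmetic: record_offset is a hypothetical helper,
not a method of XBase::Base, and it assumes that records are counted from
zero.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XBase::Base: byte offset at which
# record number $recno starts, assuming records are counted from 0.
sub record_offset {
    my ($header_len, $record_len, $recno) = @_;
    return $header_len + $recno * $record_len;
}

# Example values only: a 1313-byte header and 1117-byte records.
printf "record 0 starts at byte %d\n", record_offset(1313, 1117, 0);
printf "record 2 starts at byte %d\n", record_offset(1313, 1117, 2);
```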

   General locking methods are *locksh*, *lockex* and *unlock*; they call
*_locksh*, *_lockex* and *_unlock*, which can be redefined to allow any
way of locking (not only the default flock). The user is responsible for
calling the lock if they need it.

   No more description - check the source code if you need to know more.

VERSION
=======

   0.129

AUTHOR
======

   (c) 1997-1999 Jan Pazdziora, adelton@fi.muni.cz

SEE ALSO
========

   perl(1), XBase(3)


File: pm.info,  Node: XBase/FAQ,  Next: XBase/Index,  Prev: XBase/Base,  Up: Module List

Frequently asked questions about the XBase.pm/DBD::XBase modules
****************************************************************

NAME
====

   XBase::FAQ - Frequently asked questions about the XBase.pm/DBD::XBase
modules

DESCRIPTION
===========

   This is a list of questions people asked since the module has been
announced in fall 1997, and my answers to them.

AUTHOR
======

   *Jan Pazdziora*, adelton@fi.muni.cz

Questions and answers
=====================

What Perl version do I need? What other modules?
     You need perl at least 5.004. I test each new distribution against
     the 5.005* and 5.004_04 versions of perl. You need DBI module version
     1.00 or higher if you want to use the DBD driver (which you should).

Can I use *XBase.pm* under Windows 95/NT?
     Yes. It's a standard Perl module so there is no reason it shouldn't.
     Or, actually, there are a lot of reasons why standard things do not
     work on systems that are broken, but I'm trying hard to work around
     these bugs. If you find a problem on these platforms, send me a
     description and I'll try to find yet another workaround.

Is there a choice of the format of the date?
     The only possible format in which you can get the date, and which the
     module expects for inserts and updates, is an 8 char string
     'YYYYMMDD'. It is not possible to change this format. I prefer to
     leave the formatting to you since you then have more control over it.
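
     Doing the conversion yourself takes only a few lines. The sketch
     below shows one way to go between 'YYYYMMDD' and an ISO style
     'YYYY-MM-DD' string; both helper names are made up for this example
     and are not part of XBase.pm or DBD::XBase.

```perl
use strict;
use warnings;

# Hypothetical helper: 'YYYYMMDD' (as returned by the module) to 'YYYY-MM-DD'.
# Returns undef when the input is not an 8-digit string (e.g. a blank date).
sub dbf_date_to_iso {
    my ($date) = @_;
    my ($y, $m, $d) = $date =~ /^(\d{4})(\d{2})(\d{2})$/
        or return;
    return "$y-$m-$d";
}

# Hypothetical helper: 'YYYY-MM-DD' back to the 'YYYYMMDD' form that the
# module expects for inserts and updates.
sub iso_to_dbf_date {
    my ($iso) = @_;
    my ($y, $m, $d) = $iso =~ /^(\d{4})-(\d{2})-(\d{2})$/
        or return;
    return "$y$m$d";
}

print dbf_date_to_iso('19981218'), "\n";
print iso_to_dbf_date('1998-12-18'), "\n";
```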

With *XBase.pm*, the get_record also returns records marked as deleted. Why?
     Because. You get the _DELETED flag as the first value of the array.
     This gives you a possibility to decide what to do - undelete,
     ignore... It's a feature - you say you want a record of given number,
     you get it and get additional information, if the record is or isn't
     deleted.

However, when reading the same file using *DBD::XBase*, I do not see the deleted records.
     That's correct: *DBD::XBase* only gives you records that are
     positively in the file and not deleted. Which shows that *XBase.pm*
     is a lower level tool because you can touch records that are deleted,
     while *DBD::XBase* is higher level - it gives you an SQL interface and
     lets you work with the file more naturally (what is deleted should
     stay deleted).
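
     The difference between the two views can be illustrated without any
     dbf file. In this sketch the rows are made-up data shaped like the
     get_record return value (the _DELETED flag first, then the field
     values), and live_rows is a hypothetical helper that produces
     roughly the view *DBD::XBase* gives you.

```perl
use strict;
use warnings;

# Made-up rows shaped like the return value of get_record:
# the _DELETED flag comes first, followed by the field values.
my @rows = (
    [ 0, 1, 'first'  ],
    [ 1, 2, 'second' ],    # marked as deleted
    [ 0, 3, 'third'  ],
);

# Hypothetical helper: keep only rows that are not deleted and strip
# the leading flag from each of them.
sub live_rows {
    return map  { [ @{$_}[ 1 .. $#{$_} ] ] }
           grep { !$_->[0] } @_;
}

print join(', ', @$_), "\n" for live_rows(@rows);
```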

I have this dbf/something file created with [your favorite] tool and *XBase.pm* cannot read it.
     Describe exactly what you expect and what you get. Send me the file
     (I understand attachments, uuencode, tar, gzip and zip) so that I can
     check what is going on and make *XBase.pm* understand your file.  A
     small sample (three rows or so) is generally enough but you can
     send the whole file if it doesn't have megabytes.

I want to install the module but I do not have make on my [your damaged] system.
     On Win* platform and with ActiveState port, use ppm to install
     *DBD::XBase* from ActiveState's site. You can also just copy the files
     from the lib directory of the distribution to where perl can find
     them. See README.

I have make but I cannot install into default directory and my sysadmin doesn't want to do it for me.
     Fire the sysadmin. See README for how to install into and use
     nonstandard place for the module.

Can I access one dbf file both from Perl and (say) Clipper?
     For reading - yes. For writing - *XBase.pm* has locksh and lockex
     methods to lock the file. The question is to what extent Clipper (or
     Fox* or whatever) uses the same system calls. So the answer is that
     for multiple updates you should probably consider a real RDBMS
     (PostgreSQL, MySQL, Oracle, to name a few).

XBase.pm/DBD::XBase breaks my accented characters.
     No, it doesn't. The character data is returned exactly as it appears
     in the dbf/dbt file. You probably brought the file from a different
     system that uses a different character encoding. So some bytes in the
     strings have different meaning on that system. You also probably have
     fonts in a different encoding on that system. In the Czech language,
     we have about 6 different encodings that affect the position at which
     accented characters appear.

     So what you really want to do is to use some external utility to
     convert the strings to the encoding you need - for example, when I
     bring the dbf from Win*, it often is in the Windows-1250 or PC-Latin-2
     encoding, while the standard is ISO-8859-2. I use my utility
     *Cz::Cstocs* to do the conversion; you might also try the GNU program
     *recode*.

How do I access the fields in the memo file?
     Just read the memo field, it will fetch the data from the memo file
     for you transparently.

I try to use `select * from table where field = '%str%'' but it doesn't work. Why?
     If you want to match wildcards with *DBD::XBase*, you have to use
     like:

          select * from table where field like '%str%'

Can I sue you if I use XBase.pm/DBD::XBase and it corrupts my dbf's?
     No. At least, I hope not. The software is provided without any
     warranty, in the hope that you might find it useful. Which is by the
     way the same as with most other software, even if you pay for it. What
     is different with XBase.pm/DBD::XBase is the fact that if you find out
     that the results are different from those expected, you are welcome to
     contact me, describe the problem and send me the files that give the
     module troubles, and I'll try to find a reason and fix the module.

What dbf/other files standard does the module support?
     I try to support any file that looks reasonably like
     dbf/dbt/fpt/smt/ndx/ntx/mdx/idx/cdx. There are many clones of
     XBase-like software, each adding its own extension. The module tries
     to accept all the different variations. To do that, I need your
     cooperation however - usually a good description of the problem, a
     file sample and the expected results lead to a rather fast patch.

What SQL standard does the *DBD::XBase* support?
     It supports a reasonable subset of the SQL syntax, IMHO. So you can do
     select, delete, insert and update, create and drop table. If there is
     something that should be added, let me know and I will consider it.
     Having said that, I do not expect to ever support joins, for example.
     This module is more a parser to read files from your legacy
     applications than an RDBMS - you can find plenty of them around - use
     them.

I downloaded your module to [fill your] system and I do not know how to install it.
     Did you follow the steps in the README? Where did it fail? This module
     uses the standard way modules in Perl are installed. If you've never
     installed a module on your system and your system is so nonstandard
     that the general instructions do not help, you should contact your
     system administrator or the support for your system.

Does the module allow for any aggregate functions in the select, like select max(field) from table?
     No, aggregate functions are not supported. It would probably be very
     slow, since the DBD doesn't make use of indexes at the moment. I do
     not have plans to add this support in some near future.

I try to `DBI->connect' on my [fill your favorite] system and it says that the directory doesn't exist. But it's there. Is *DBD::XBase* mad or what?
     The third part of the first parameter to the connect is the directory
     where *DBD::XBase* will look for the dbf files. During connect, the
     module checks `if -d $directory'. So if it says it's not there, it's
     not there and the only thing *DBD::XBase* can do about it is to
     report it to you. It might be that the directory is not mounted, you
     do not have permissions to it, the script is running under different
     UID than when you try it from command line... Anyway, add

          die "Error reading $dir: $!\n" unless -d $dir;

     to your script and you will see that it's not *DBD::XBase* problem.

The *XBase.pm/dbfdump* stops after reading n records. Why doesn't it read all *10 x n* records?
     Check if the file isn't truncated. `dbfdump -i file.dbf' will tell
     you the expected number of records and length of one record, like

          Filename:       file.dbf
          Version:        0x03 (ver. 3)
          Num of records: 65
          Header length:  1313
          Record length:  1117
          Last change:    1998/12/18
          Num fields:     40

     So the expected length of the file is at least *1313 + 65 * 1117*. If
     it's shorter, you've got a damaged file and *XBase.pm/dbfdump* only
     reads as many rows as it can find in the dbf.
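
     The check can be scripted. This is a minimal sketch using the
     numbers from the dbfdump output above; expected_min_size is a
     hypothetical helper, and the optional comparison against a real file
     only runs if you pass a file name on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: the smallest size an undamaged dbf can have,
# given the values reported by dbfdump -i.
sub expected_min_size {
    my ($header_len, $num_records, $record_len) = @_;
    return $header_len + $num_records * $record_len;
}

my $min = expected_min_size(1313, 65, 1117);
print "expected at least $min bytes\n";    # 1313 + 65 * 1117 = 73918

# Optionally compare with the real size of a file named on the command line.
if (@ARGV) {
    my $size = -s $ARGV[0];
    defined $size or die "Cannot stat $ARGV[0]: $!\n";
    print $size < $min ? "file looks truncated\n" : "length is plausible\n";
}
```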

How is this *DBD::XBase* related to *DBD::ODBC*?
     *DBD::XBase* reads the dbf files directly, using the (included)
     *XBase.pm* module. So it will run on any platform with a reasonably
     new perl. With *DBD::ODBC*, you need an ODBC server, or some program
     that *DBD::ODBC* could talk to. Much proprietary software can serve as
     an ODBC source for dbf files, it just doesn't seem to run on Un*x
     systems. It is also much more resource intensive if you just need to
     read the file record by record and convert it to an HTML page or do a
     similarly simple operation with it.

How do I pack the dbf file, after the records were deleted?
     *XBase.pm* doesn't support this directly. You'd probably want to
     create new table, copy the data and rename back. Patches are always
     welcome.

Foxpro doesn't see all fields in dbf created with *XBase.pm*.
     Put the 'version' => 3 option into the create call - that way we say
     that the dbf file is dBaseIII style.


File: pm.info,  Node: XBase/Index,  Next: XBase/Memo,  Prev: XBase/FAQ,  Up: Module List

base class for the index files for dbf
**************************************

NAME
====

   XBase::Index - base class for the index files for dbf

SYNOPSIS
========

     use XBase;
     my $table = new XBase "data.dbf";
     my $cur = $table->prepare_select_with_index("id.ndx",
     	"ID", "NAME");
     $cur->find_eq(1097);

     while (my @data = $cur->fetch()) {
     	last if $data[0] != 1097;
     	print "@data\n";
     }

   This is a snippet of code to print the ID and NAME fields from the dbf
data.dbf where ID equals 1097, provided you have an index on ID in file
id.ndx. You can use the same code for ntx and idx index files.  For the
cdx and mdx, the prepare_select call would be

     prepare_select_with_index(['rooms.cdx', 'ROOMNAME'])

   so instead of plain filename you specify an arrayref with filename and
an index tag in that file. The reason is that cdx and mdx can contain
multiple indexes in one file and you have to distinguish, which you want
to use.

DESCRIPTION
===========

   The module XBase::Index is a collection of packages to provide index
support for XBase-like dbf database files.

   An index file is generally a file that holds values of certain database
field or expression in sorted order, together with the record number that
the record occupies in the dbf file. So when you search for a record with
some value, you first search in this sorted list and once you have the
record number in the dbf, you directly fetch the record from dbf.

What indexes do
---------------

   To make the searching in this ordered list fast, it's generally
organized as a tree - it starts with a root page with records that point to
pages at lower level, etc., until leaf pages where the pointer is no
longer a pointer to the index but to the dbf. When you search for a record
in the index file, you fetch the root page and scan it (linearly) until you
find a key value that is equal or greater than the one you are looking for.
That way you've avoided reading all pages describing the values that are
lower. Here you descend one level, fetch the page and again search the list
of keys in that page. And you repeat this process until you get to the leaf
(lowest) level and here you finally find a pointer to the dbf. XBase::Index
does this for you.
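
   The descent described above can be sketched on a toy in-memory tree.
This is not the XBase::Index implementation and does not read any real
index format; the page layout, the %pages data and the find_ge function are
all assumptions made for the illustration. In particular it assumes that
the key stored in an inner page is the largest key of the subtree below it.

```perl
use strict;
use warnings;

# Toy "index file": every page holds [key, pointer] pairs sorted by key.
# In inner pages the pointer names another page; in leaf pages it is a
# dbf record number. An inner key is assumed to be the largest key in
# the subtree it points to.
my %pages = (
    root => { leaf => 0, entries => [ [ 20, 'p1' ], [ 40, 'p2' ] ] },
    p1   => { leaf => 1, entries => [ [ 5, 7 ], [ 12, 3 ], [ 20, 9 ] ] },
    p2   => { leaf => 1, entries => [ [ 25, 1 ], [ 40, 4 ] ] },
);

# Returns (key, record_number) of the first entry whose key is equal or
# greater than $target, or an empty list if every key is lower.
sub find_ge {
    my ($target) = @_;
    my $page = $pages{root};
    while (1) {
        # Linear scan of the page for the first key >= $target.
        my ($hit) = grep { $_->[0] >= $target } @{ $page->{entries} };
        return unless $hit;
        return @$hit if $page->{leaf};    # pointer into the dbf
        $page = $pages{ $hit->[1] };      # descend one level
    }
}

my ($key, $recno) = find_ge(12);
print "key $key is in dbf record $recno\n";
```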

   Some of the formats also support multiple indexes in one file - usually
there is one top level index that for different field values points to
different root pages in the index file (so called tags).

   XBase::Index supports (or aims to support) the following index formats:
ndx, ntx, mdx, cdx and idx. They differ in a way they store the keys and
pointers but the idea is always the same: make a tree of pages, where the
page contains keys and pointer either to pages at lower levels, or to dbf
(or both). XBase::Index only supports read only access to the index files
at the moment (and if you need writing them as well, follow reading
because we need to have the reading support stable before I get to work on
updating the indexes).

Testing your index file (and XBase::Index)
------------------------------------------

   You can test your index using the indexdump script in the main
directory of the DBD::XBase distribution (I mean test XBase::Index on
correct index data, not testing corrupted index file, of course ;-) Just
run

     ./indexdump ~/path/index.ndx
     ./indexdump ~/path/index.cdx tag_name

   or

     perl -Ilib ./indexdump ~/path/index.cdx tag_name

   if you haven't installed this version of XBase.pm/DBD::XBase yet. You
should get the content of the index file. On each row, there is the key
value and a record number of the record in the dbf file. Let me know if
you get results different from those you expect. I'd probably ask you to
send me the index file (and possibly the dbf file as well), so that I can
debug the problem.

   The index file is (as already noted) a complement to a dbf file. Index
file without a dbf doesn't make much sense because the only thing that you
can get from it is the record number in the dbf file, not the actual data.
But it makes sense to test - dump the content of the index to see if the
sequence is OK.

   The index formats usually distinguish between numeric and character
data. Some of the file formats include the information about the type in
the index file, other depend on the dbf file. Since with indexdump we only
look at the index file, you may need to specify the -type option to
indexdump if it complains that it doesn't know the data type of the values
(this is the case with cdx at least). The possible values are num, char
and date and the call would be like

     ./indexdump -type=num ~/path/index.cdx tag_name

   (this -type option may not work with all index formats at the moment -
will be fixed and patches always welcome).

   You can use `-ddebug' option to indexdump to see how pages are fetched
and decoded, or run debugger to see the calls and parsing.

Using the index files to speed up searches in dbf
-------------------------------------------------

   The syntax for using the index files to access data in the dbf file is
generally

     my $table = new XBase "tablename";
     	# or any other arguments to get the XBase object
     	# see XBase(3)
     my $cur = $table->prepare_select_with_index("indexfile",
     	"list", "of", "fields", "to", "return");

   or

     my $cur = $table->prepare_select_with_index(
     	[ "indexfile_with_tags", "tag_name" ],
     	"list", "of", "fields", "to", "return");

   where we specify the tag in the index file (this is necessary with cdx
and mdx). After we have the cursor, we can search to given record and
start fetching the data:

     $cur->find_eq('jezek');
     while (my @data = $cur->fetch) {
     	# do something
     }

Supported index formats
-----------------------

   The following table summarizes which formats are supported by
XBase::Index. If the field says something other than Yes, I welcome
testers and offers of example index files.

     Reading of index files -- types supported by XBase::Index

     type	string		numeric		date
     ----------------------------------------------------------
     ndx	Yes		Yes		Yes (you need to
     					convert to Julian)

     ntx	Yes		Yes		Untested

     idx	Untested	Untested	Untested
     	(but should be pretty usable)

     mdx	Untested	Untested	Untested

     cdx	Yes		Yes		Untested

     Writing of index files -- not supported until the reading
     is stable enough.

   So if you have access to an index file that is untested or unsupported
and you care about support of these formats, contact me. If you are able
to actually generate those files on request, the better because I may need
specific file size or type to check something. If the file format you work
with is supported, I still appreciate a report that it really works for
you.

   *Please note* that there is very little documentation about the file
formats and the work on XBase::Index is heavily based on making
assumptions from real life data. Also, the documentation is often wrong
or only describes some format variations but not the others.  I
personally do not need the index support but am more than happy to make it
a reality for you. So I need your help - contact me if it doesn't work for
you and offer me your files for testing. Mentioning the word XBase
somewhere in the Subject line will get you a (hopefully ;-) fast response.
Mentioning the word Help or similar stupidity will probably make my
filters consider your email spam. Help yourself by making my life easier
in helping you.

Programmer's notes
------------------

   Programmers might find the following information useful when trying to
debug XBase::Index from their files:

   The XBase::Index module contains the basic XBase::Index package and
also packages XBase::ndx, XBase::ntx, XBase::idx, XBase::mdx and
XBase::cdx, and for each of these also a package XBase::index_type::Page.
Reading the file goes like this: you create an object by calling either new
XBase::Index or new XBase::ndx (or whatever the index type is). This can
also be done behind the scenes, for example
XBase::prepare_select_with_index calls new XBase::Index.  The index file
is opened using the XBase::Base::new/open and then the
XBase::index_type::read_header is called. This function fills the basic
data fields of the object from the header of the file. The new method
returns the object corresponding to the index type.

   Then you probably want to do $index->prepare_select or
$index->prepare_select_eq, that would position you just before the record
equal or greater than the parameter (record in the index file, that is).
Then you do a series of fetch'es that return next pair of (key,
pointer_to_dbf). Behind the scenes, prepare_select_eq or fetch call
XBase::Index::get_record which in turn calls XBase::index_type::Page::new.
From the index file perspective, the atomic item in the file is one index
page (or block, or whatever you call it). The XBase::index_type::Page::new
reads the block of data from the file and parses the information in the
page - pages have more or less complex structures. Page::new fills the
structure, so that the fetch calls can easily check what values are in the
page.

   For some examples, please see eg/use_index in the distribution
directory.

VERSION
=======

   0.170

AUTHOR
======

   (c) 1998-2001 Jan Pazdziora, adelton@fi.muni.cz

SEE ALSO
========

   XBase(3), XBase::FAQ(3)


File: pm.info,  Node: XBase/Memo,  Next: XBase/SDBM,  Prev: XBase/Index,  Up: Module List

Generic support for various memo formats
****************************************

NAME
====

   XBase::Memo - Generic support for various memo formats

SYNOPSIS
========

   Used indirectly, via XBase. Users should check its man page.

DESCRIPTION
===========

   Objects of this class are created to deal with memo files, currently
.dbt, .fpt and .smt (code for this provided by Dirk Tostmann).  Package
XBase::Memo defines methods read_header to parse the header of the file
and set the object's structures, *write_record* and last_record to write
the records properly formatted and find the end of file.

   There are four separate subpackages in XBase::Memo, dBaseIII, dBaseIV,
Fox and Apollo. Memo objects are effectively of one of these types and
they override their specific record handling routines where appropriate.

VERSION
=======

   0.172

AUTHOR
======

   (c) 1997-2001 Jan Pazdziora, adelton@fi.muni.cz

SEE ALSO
========

   perl(1), XBase(3)


File: pm.info,  Node: XBase/SDBM,  Next: XML/AutoWriter,  Prev: XBase/Memo,  Up: Module List

SDBM nonportable index support for dbf
**************************************

NAME
====

   XBase::SDBM - SDBM nonportable index support for dbf

VERSION
=======

   0.162

AUTHOR
======

   (c) 2001 Jan Pazdziora, adelton@fi.muni.cz,
http://www.fi.muni.cz/~adelton/ at Faculty of Informatics, Masaryk
University in Brno, Czech Republic

   All rights reserved. This package is free software; you can
redistribute it and/or modify it under the same terms as Perl itself.


File: pm.info,  Node: XML/AutoWriter,  Next: XML/Catalog,  Prev: XBase/SDBM,  Up: Module List

DOCTYPE based XML output
************************

NAME
====

   XML::AutoWriter - DOCTYPE based XML output

SYNOPSIS
========

     use XML::Doctype         NAME => a, SYSTEM_ID => 'a.dtd' ;
     use XML::AutoWriter qw( :all :dtd_tags ) ;
     #
     # a.dtd contains:
     #
     #   <!ELEMENT a ( b1, b2?, b3* ) >
     #	  <!ATTLIST   a aa1 CDATA       #REQUIRED >
     #   <!ELEMENT b1 ( c1 ) >
     #   <!ELEMENT b2 ( c2 ) >
     #
     b1 ;                # Emits <a><b1>
     c2( attr=>"val" ) ; # Emits </b1><b2><c2 attr="val">
     endAllTags ;        # Emits </c2></b2></a>

     ## If you've got an XML::Doctype object handy:
     use XML::AutoWriter qw( :dtd_tags ), DOCTYPE => $doctype ;

     ## If you've saved a preparsed DTD as a perl module
     use FooML::Doctype::v1_0001 ;
     use XML::AutoWriter qw( :dtd_tags ) ;

     ## Or as a normal perl object:
     $writer = XML::AutoWriter->new( ... ) ;
     $writer->startTag( 'b1' ) ;
     $writer->startTag( 'c2' ) ;
     $writer->end ;

STATUS
======

   Alpha.  Use and patch, don't depend on things not changing drastically.

   Many methods supplied by XML::Writer are not yet supplied here.

DESCRIPTION
===========

   This module subclasses *Note XML/ValidWriter: XML/ValidWriter, and
provides automatic start and end tag generation, allowing you to emit only
the 'important' tags.

   See XML::ValidWriter for the details on all functions not documented
here.

XML::Writer API compatibility
-----------------------------

   Much of the interface is patterned after XML::Writer so that it can
possibly be used as a drop-in replacement.  It will take a while before
this module emulates enough of XML::Writer to be a drop-in replacement in
situations where the more advanced XML::Writer methods are used.

Automatic start tags
--------------------

   Automatic start tag creation is done when emitting a start tag that is
not allowed to be a child of the currently open tag but is allowed to be
contained in the currently open tag's subset.  In this case, the minimal
number of start tags necessary to allow the desired tag is used: all start
tags between the current tag and the desired tag are automatically emitted
with no attributes.

Automatic end tags
------------------

   If start tag autogeneration fails, then end tag autogeneration is
attempted.  startTag() scans the stack of currently open tags trying to
close as few as possible before start tag autogeneration succeeds.

   Explicit end tags may be emitted to prevent unwanted automatic start
tags, and, in the future, warnings or errors will be available in place of
automatic start and end tag creation.
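
   The two mechanisms can be sketched with a toy content model. This is
not the XML::AutoWriter implementation: the %children table, the '#doc'
pseudo-element and the path_to, start_tag and end_all_tags functions are
all made up for the illustration. The sketch also ignores attributes,
required content and child ordering, and only looks at which element may
contain which.

```perl
use strict;
use warnings;

# Toy content model (which children each element may contain), loosely
# following the a.dtd from the SYNOPSIS. '#doc' is a made-up
# pseudo-element that stands for the document itself.
my %children = (
    '#doc' => ['a'],
    a      => [qw(b1 b2 b3)],
    b1     => ['c1'],
    b2     => ['c2'],
);

my @open = ('#doc');    # stack of currently open tags
my $out  = '';          # markup emitted so far

# Breadth-first search for the shortest chain of tags leading from
# $from down to $want. Returns the chain (ending in $want) or ().
sub path_to {
    my ($from, $want) = @_;
    my @queue = ([$from]);
    my %seen  = ($from => 1);
    while (my $path = shift @queue) {
        for my $child (@{ $children{ $path->[-1] } || [] }) {
            next if $seen{$child}++;
            my @next = (@$path, $child);
            if ($child eq $want) {
                shift @next;    # drop $from itself
                return @next;
            }
            push @queue, \@next;
        }
    }
    return;
}

# Try the innermost open tag first (automatic start tags only), then
# close as few open tags as possible until the wanted tag fits.
sub start_tag {
    my ($want) = @_;
    for my $depth (reverse 0 .. $#open) {
        my @chain = path_to($open[$depth], $want) or next;
        $out .= "</$_>" for reverse @open[ $depth + 1 .. $#open ];
        $#open = $depth;
        for my $tag (@chain) {
            $out .= "<$tag>";
            push @open, $tag;
        }
        return;
    }
    die "<$want> is not allowed here\n";
}

sub end_all_tags {
    $out .= "</$_>" for reverse @open[ 1 .. $#open ];
    $#open = 0;
}

start_tag('b1');    # also opens <a>
start_tag('c2');    # closes <b1>, opens <b2>, then <c2>
end_all_tags();
print "$out\n";
```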

METHODS AND FUNCTIONS
=====================

   All of the routines in this module can be called as either functions or
methods unless otherwise noted.

   To call these routines as functions use either the DOCTYPE or :dtd_tags
options in the parameters to the use statement:

     use XML::AutoWriter DOCTYPE => XML::Doctype->new( ... ) ;
     use XML::AutoWriter qw( :dtd_tags ) ;

   This associates an XML::AutoWriter and an XML::Doctype with the
package.  These are used by the routines when called as functions.

new
          $writer = XML::AutoWriter->new( DTD => $dtd, OUTPUT => \*FH ) ;

     Creates an XML::AutoWriter.

     All other parameters are passed to the XML::ValidWriter base class
     constructor.

characters
          characters( 'yabba dabba dooo' ) ;
          $writer->characters( 'yabba dabba dooo' ) ;

     If the currently open tag cannot contain #PCDATA, then start tag
     autogeneration will be attempted, followed by end tag autogeneration.

     Start tag autogeneration takes place even if you pass in only '', or
     even (), the empty list.

endTag
          endTag ;
          endTag( 'a' ) ;
          $writer->endTag ;
          $writer->endTag( 'a' ) ;

     Prints one or more end tags.  The tag name is optional and defaults
     to the most recently emitted start tag if not present.

     This will emit as many close tags as necessary to close the supplied
     tag name, or will emit an error if the tag name specified is not open
     in the output document.

startTag
          startTag( 'a', attr => val ) ;  # use default XML::AutoWriter for
                                          # current package.
          $writer->startTag( 'a', attr => val ) ;

     Emits a named start tag with optional attributes.  If the named tag
     cannot be a child of the most recently started tag, then any tags
     that need to be opened between that one and the named tag are opened.

     If the named tag cannot be enclosed within the most recently opened
     tag, no matter how deep, then startTag() tries to end as few started
     tags as necessary to allow the named tag to be emitted within a tag
     already on the stack.

     This warns (once) if no <?xml?> declaration has been emitted.  It
     does not check to see if a <!DOCTYPE...> has been emitted.  It dies
     if an attempt is made to emit a second root element.

AUTHOR
======

   Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
=========

   This module is Copyright 2000, Barrie Slaymaker.  All rights reserved.

   This module is licensed under the GPL, version 2.  Please contact me if
this does not suit your needs.


File: pm.info,  Node: XML/Catalog,  Next: XML/Checker,  Prev: XML/AutoWriter,  Up: Module List

Resolve public identifiers and remap system identifiers
*******************************************************

NAME
====

   XML::Catalog - Resolve public identifiers and remap system identifiers

SYNOPSIS
========

     use XML::Catalog;
     my $catalog=XML::Catalog->new('/xml/catalog.cat');
     $catalog->add('http://www.w3.org/xcatalog/mastercat.xml');
     my $sysid=$catalog->resolve_public('-//John Cowan//LOC Diacritics');
     my $newsysid=$catalog->remap_system('http://www.w3.org');
     $parser->setHandlers(ExternEnt=>$catalog->get_handler($parser));

DESCRIPTION
===========

   This module implements draft 0.4 of John Cowan's XML Catalog (formerly
known as XCatalog) proposal
(<http://www.ccil.org/~cowan/XML/XCatalog.html>).  Catalogs may be written
in either SOCAT or XML syntax (see the proposal for syntax details);
XML::Catalog will assume SOCAT syntax if the catalog is not in well-formed
XML syntax.

CONSTRUCTOR
===========

new(URL [,URL]*)
     Read the catalog identified by URL and return a catalog object
     implementing it.  If more than one URL is given, chain the additional
     catalogs as extensions to the catalog (they will be searched before
     catalogs specified by EXTEND entries).

     All URLs must be absolute.  A URL with no protocol is treated as a
     filename.

METHODS
=======

add(URL [,URL]*)
     Chain the catalogs identified by the URL(s) to the current catalog.

resolve_public(PUBID)
     Translate the public identifier PUBID to a system identifier.  Returns
     undef if the identifier could not be translated.

remap_system(SYSID)
     Remap the system identifier SYSID as specified by the catalog.
     Returns SYSID unchanged if no remapping was found.

get_handler(PARSER)
     Returns a coderef to a resolver suitable for use as the ExternEnt
     handler for an XML::Parser object.  The resolver will first attempt
     to resolve a public identifier if supplied, and then attempt to remap
     the resulting system identifier (or the original system identifier if
     no public identifier was supplied).  It will then call the original
     ExternEnt handler associated with the parser object.  PARSER is the
     parser object; it is needed as an argument in order to obtain the
     original handler.
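
   The lookup order used by the resolver (public identifier first, then
remapping of the resulting system identifier) can be sketched with two
plain hashes. This is not the XML::Catalog implementation: the catalog
entries are made up, the functions below are stand-ins defined in the
example itself, and falling back to the original system identifier when
the public identifier cannot be resolved is an assumption.

```perl
use strict;
use warnings;

# Made-up catalog data; a real catalog would be read from a file or URL.
my %public = ( '-//Example//DTD Demo//EN' => 'http://example.org/demo.dtd' );
my %system = ( 'http://example.org/demo.dtd' => '/xml/dtd/demo.dtd' );

# Stand-ins with the documented behaviour: undef when a public
# identifier is unknown, the unchanged SYSID when nothing is remapped.
sub resolve_public {
    my ($pubid) = @_;
    return $public{$pubid};
}

sub remap_system {
    my ($sysid) = @_;
    return exists $system{$sysid} ? $system{$sysid} : $sysid;
}

# Hypothetical helper mirroring the order described for the resolver.
sub locate_entity {
    my ($sysid, $pubid) = @_;
    if (defined $pubid) {
        my $resolved = resolve_public($pubid);
        $sysid = $resolved if defined $resolved;    # assumption: else keep $sysid
    }
    return defined $sysid ? remap_system($sysid) : undef;
}

print locate_entity('http://other.example/x.dtd', '-//Example//DTD Demo//EN'), "\n";
print locate_entity('http://other.example/x.dtd'), "\n";
```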

BUGS / TODO
===========

   Searching of chained catalogs is not purely depth-first (EXTEND items
in a chained catalog will be searched before EXTEND items in the original
catalog).

   Error checking leaves much to be desired.

AUTHOR
======

   Eric Bohlman (ebohlman@netcom.com)

COPYRIGHT
=========

   Copyright 1999-2000 Eric Bohlman.  All rights reserved.

   This program is free software; you can use/modify/redistribute it under
the same terms as Perl itself.


File: pm.info,  Node: XML/Checker,  Next: XML/Checker/Parser,  Prev: XML/Catalog,  Up: Module List

A perl module for validating XML
********************************

NAME
====

   XML::Checker - A perl module for validating XML

SYNOPSIS
========

   *Note XML/Checker/Parser: XML/Checker/Parser, - an *Note XML/Parser:
XML/Parser, that validates at parse time

   *Note XML/DOM/ValParser: XML/DOM/ValParser, - an *Note XML/DOM/Parser:
XML/DOM/Parser, that validates at parse time

   (Some of the package names may change! This is only an alpha release...)

DESCRIPTION
===========

   XML::Checker can be used in different ways to validate XML. See the
manual pages of *Note XML/Checker/Parser: XML/Checker/Parser, and *Note
XML/DOM/ValParser: XML/DOM/ValParser, for more information.

   This document only describes common topics like error handling and the
XML::Checker class itself.

   WARNING: Not all errors are currently checked. Almost everything is
subject to change. Some reported errors may not be real errors.

ERROR HANDLING
==============

   Whenever XML::Checker (or one of the packages that uses XML::Checker)
detects a potential error, the 'fail handler' is called. It is currently
also called to report information, like how many times an Entity was
referenced.  (The whole error handling mechanism is subject to change, I'm
afraid...)

   The default fail handler is XML::Checker::print_error(), which prints
an error message to STDERR. It does not stop the XML::Checker, so it will
continue looking for other errors.  The error message is created with
XML::Checker::error_string().

   You can define your own fail handler in two ways, locally and globally.
Use a local variable to temporarily override the fail handler. This way
the previous fail handler is restored when the local variable goes out of
scope, even when an exception is thrown, e.g.

     # Using a local variable to temporarily override the fail handler (preferred)
     { # new block - start of local scope
       local $XML::Checker::FAIL = \&my_fail;
       ... your code here ...
     } # end of block - the previous fail handler is restored

   You can also set the error handler globally, risking that your code may
not be reusable or may clash with other modules that use XML::Checker.

     # Globally setting the fail handler (not recommended)
     $XML::Checker::FAIL = \&my_fail;
     ... rest of your code ...

   The fail handler is called with the following parameters ($code, $msg,
@context), where $code is the error code, $msg is the error description and
@context contains information on where the error occurred. The @context is
an ordered list of (key,value) pairs and can easily be turned into a hash.
It contains the following information:

     Element - tag name of Element node (if applicable)
     Attr - attribute name (if applicable)
     ChildElementIndex - if applicable (see error 157)
     line - only when parsing
     column - only when parsing
     byte - only when parsing (-1 means: end of file)
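
   As a minimal sketch of turning @context into a hash, the handler
below formats a one-line report.  The helper name format_failure and the
sample values are made up for illustration; only the parameter order and
the context keys come from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a one-line report from the fail handler
# arguments ($code, $msg, @context).
sub format_failure {
    my ($code, $msg, @context) = @_;
    my %ctx = @context;          # ordered (key,value) pairs become a hash
    my $where = defined $ctx{line}
        ? "line $ctx{line}, column $ctx{column}"
        : "position unknown";    # line and column are only set when parsing
    my $elem = defined $ctx{Element} ? " in <$ctx{Element}>" : "";
    return "[$code] $msg$elem ($where)";
}

# A fail handler built on the helper.
sub my_fail { print STDERR format_failure(@_), "\n" }

# Simulated call with made-up values, shaped as described above.
my $report = format_failure(157, 'unexpected Element [b]',
    Element => 'a', ChildElementIndex => 2, line => 12, column => 4);
print "$report\n";
```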

   Some examples of fail handlers:

     # Don't print info messages
     sub my_fail
     {
         my $code = shift;
         print STDERR XML::Checker::error_string ($code, @_)
             if $code < 300;
     }

     # Die when the first error is encountered - this will stop
     # the parsing process. Ignore information messages.
     sub my_fail
     {
         my $code = shift;
         die XML::Checker::error_string ($code, @_) if $code < 300;
     }

     # Count the number of undefined NOTATION references
     # and print the error as usual
     sub my_fail
     {
         my $code = shift;
         $count_undef_notations++ if $code == 100;
         XML::Checker::print_error ($code, @_);
     }

     # Die when an error is encountered.
     # Don't die if a warning or info message is encountered, just print a message.
     sub my_fail {
         my $code = shift;
         die XML::Checker::error_string ($code, @_) if $code < 200;
         XML::Checker::print_error ($code, @_);
     }
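
   A handler can also collect failures for later inspection instead of
printing them.  The following is a sketch only: run_checks is a
hypothetical stand-in for code that uses XML::Checker (it simply calls
the installed handler with made-up arguments), so the example runs
without the module being loaded.

```perl
use strict;
use warnings;

my @failures;   # each entry is [$code, $msg, \%context]

# Fail handler that records every call instead of printing it.
sub collect_fail {
    my ($code, $msg, @context) = @_;
    my %ctx = @context;
    push @failures, [$code, $msg, \%ctx];
}

# Hypothetical stand-in for the checking code: it invokes whatever
# handler is currently installed, with made-up arguments.
sub run_checks {
    $XML::Checker::FAIL->(159,
        'unspecified value for #REQUIRED attribute [id]',
        Element => 'item', Attr => 'id');
    $XML::Checker::FAIL->(300, '[2] references to ID [x1]');
}

{   # new block - start of local scope
    local $XML::Checker::FAIL = \&collect_fail;
    run_checks();
}   # end of block - the previous value of $XML::Checker::FAIL is back

# Codes below 200 are errors.
my @errors = grep { $_->[0] < 200 } @failures;
printf "%d failure(s), %d error(s)\n", scalar @failures, scalar @errors;
```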

INSIGNIFICANT WHITESPACE
========================

   XML::Checker keeps track of whether whitespace found in character data
is significant or not. It is considered insignificant if it is found inside
an element whose ELEMENT rule is neither of type Mixed nor of type ANY.
(A Mixed ELEMENT rule contains the #PCDATA keyword.  An ANY rule contains
the ANY keyword. See the XML spec for more info.)

   XML::Checker cannot determine whether whitespace is insignificant in
those two cases, because they both allow regular character data to appear
within XML elements and XML::Checker can therefore not deduce whether
whitespace is part of the actual data or was just added for readability of
the XML file.
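
   The rule can be sketched as a small predicate on the content model
text of an ELEMENT rule.  The helper ws_is_insignificant is hypothetical
and is not how XML::Checker stores or evaluates its rules; it only
restates the two exceptions described above.

```perl
use strict;
use warnings;

# Hypothetical helper: given the content model text of an ELEMENT rule,
# report whether whitespace inside such an element can be treated as
# insignificant.
sub ws_is_insignificant {
    my ($model) = @_;
    return 0 if $model =~ /^\s*ANY\s*$/;   # ANY rule: cannot tell
    return 0 if $model =~ /#PCDATA/;       # Mixed rule: cannot tell
    return 1;                              # no character data allowed
}

for my $model ('(title, para+)', '(#PCDATA | em)*', 'ANY') {
    printf "%-18s %s\n", $model,
        ws_is_insignificant($model) ? 'insignificant' : 'undetermined';
}
```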

   XML::Checker::Parser and XML::DOM::ValParser can both skip
insignificant whitespace if SkipInsignifWS is set to 1 in their
constructor.  If set, they will not call the Char handler when
insignificant whitespace is encountered. This means that in
XML::DOM::ValParser no Text nodes are created for insignificant whitespace.

   Regardless of whether the SkipInsignifWS option is set, XML::Checker
always keeps track of whether whitespace is insignificant. After making a
call to XML::Checker's Char handler, you can find out if it was
insignificant whitespace by calling the isInsignifWS method.

   When using multiple (nested) XML::Checker instances or when using
XML::Checker without using XML::Checker::Parser or XML::DOM::ValParser
(which hardly anybody probably will), make sure to set a local variable in
the scope of your checking code, e.g.

     { # new block - start of local scope
       local $XML::Checker::INSIGNIF_WS = 0;
       ... insert your code here ...
     } # end of scope

ERROR CODES
===========

   There are three categories: errors, warnings and info messages.  (The
codes are still subject to change, as well as the error descriptions.)

   Most errors have a link to the appropriate Validity Constraint (*VC*)
or other section in the XML specification.

ERROR Messages
--------------

100 - 109
---------

   * 100 - undefined NOTATION [$notation] in ATTLIST

     The ATTLIST contained a Notation reference that was not defined in a
     NOTATION definition.  *VC:* `Notation Attributes|http:' in this node

   * 101 - undefined ELEMENT [$tagName]

     The specified Element was never defined in an ELEMENT definition.
     This is not an error according to the XML spec.  See `Element Type
     Declarations|http:' in this node

   * 102 - undefined unparsed ENTITY [$entity]

     The attribute value referenced an undefined unparsed entity.  *VC:*
     `Entity Name|http:' in this node

   * 103 - undefined attribute [$attrName]

     The specified attribute was not defined in an ATTLIST for that
     Element.  *VC:* `Attribute Value Type|http:' in this node

110 - 119
---------

   * 110 - attribute [$attrName] of element [$tagName] already defined

     The specified attribute was already defined in this ATTLIST
     definition or in a previous one.  This is not an error according to
     the XML spec.  See `Attribute-List Declarations|http:' in this node

   * 111 - ID [$value] already defined

     An ID with the specified value was already defined in an attribute
     within the same document.  *VC:* `ID|http:' in this node

   * 112 - unparsed ENTITY [$entity] already defined

     This is not an error according to the XML spec.  See `Entity
     Declarations|http:' in this node

   * 113 - NOTATION [$notation] already defined

   * 114 - ENTITY [$entity] already defined

     This is not an error according to the XML spec.  See `Entity
     Declarations|http:' in this node

   * 115 - ELEMENT [$name] already defined

     *VC:* `Unique Element Type Declaration|http:' in this node

120 - 129
---------

   * 120 - invalid default ENTITY [$default]

     (Or IDREF or NMTOKEN instead of ENTITY.)  The ENTITY, IDREF or
     NMTOKEN reference in the default attribute value for an attribute
     with types ENTITY, IDREF or NMTOKEN was not valid.  *VC:* `Attribute
     Default Legal|http:' in this node

   * 121 - invalid default [$token] in ENTITIES [$default]

     (Or IDREFS or NMTOKENS instead of ENTITIES) One of the ENTITY, IDREF
     or NMTOKEN references in the default attribute value for an attribute
     with types ENTITIES, IDREFS or NMTOKENS was not valid.  *VC:*
     `Attribute Default Legal|http:' in this node

   * 122 - invalid default attribute value [$default]

     The specified default attribute value is not a valid attribute value.
     *VC:* `Attribute Default Legal|http:' in this node

   * 123 - invalid default ID [$default], must be #REQUIRED or #IMPLIED

     The default attribute value for an attribute of type ID has to be
     #REQUIRED or #IMPLIED.  *VC:* `ID Attribute Default|http:' in this
     node

   * 124 - bad model [$model] for ELEMENT [$name]

     The model in the ELEMENT definition did not conform to the XML syntax
     for Mixed models.  See `Mixed Content|http:' in this node

130 - 139
---------

   * 130 - invalid NMTOKEN [$attrValue]

     The attribute value is not a valid NmToken token.  *VC:*
     `Enumeration|http:' in this node

   * 131 - invalid ID [$attrValue]

     The specified attribute value is not a valid Name token.  *VC:*
     `ID|http:' in this node

   * 132 - invalid IDREF [$value]

     The specified attribute value is not a valid Name token.  *VC:*
     `IDREF|http:' in this node

   * 133 - invalid ENTITY name [$name]

     The specified attribute value is not a valid Name token.  *VC:*
     `Entity Name|http:' in this node

   * 134 - invalid Enumeration value [$value] in ATTLIST

     The specified value is not a valid NmToken (see XML spec for def.)
     See definition of `NmToken|http:' in this node

   * 135 - empty NOTATION list in ATTLIST

     The NOTATION list of the ATTLIST definition did not contain any
     NOTATION references.  See definition of `NotationType|http:' in this
     node

   * 136 - empty Enumeration list in ATTLIST

     The ATTLIST definition of the attribute of type Enumeration did not
     contain any values.  See definition of `Enumeration|http:' in this
     node

   * 137 - invalid ATTLIST type [$type]

     The attribute type has to be one of: ID, IDREF, IDREFS, ENTITY,
     ENTITIES, NMTOKEN, NMTOKENS, CDATA, NOTATION or an Enumeration.  See
     definition of `AttType|http:' in this node

150 - 159
---------

   * *150* - bad #FIXED attribute value [$value], it should be [$default]

     The specified attribute was defined as #FIXED in the ATTLIST
     definition and the found attribute $value differs from the specified
     $default value.  *VC:* `Fixed Attribute Default|http:' in this node

   * *151* - only one ID allowed in ATTLIST per element first=[$attrName]

     The ATTLIST definitions for an Element may contain only one attribute
     with the type ID. The specified $attrName is the one that was found
     first.  *VC:* `One ID per Element Type|http:' in this node

   * *152* - Element should be EMPTY, found Element [$tagName]

     The ELEMENT definition for the specified Element said it should be
     EMPTY, but a child Element was found.  *VC:* `Element Valid
     (sub1)|http:' in this node

   * *153* - Element should be EMPTY, found text [$text]

     The ELEMENT definition for the specified Element said it should be
     EMPTY, but text was found. Currently, whitespace is not allowed
     between the open and close tag. (This may be wrong, please give
     feedback.)  To allow whitespace (subject to change), set:

          $XML::Checker::Context::EMPTY::ALLOW_WHITE_SPACE = 1;

     *VC:* `Element Valid (sub1)|http:' in this node

   * *154* - bad order of Elements Found=[$found] RE=[$re]

     The child elements of the specified Element did not match the regular
     expression found in the ELEMENT definition. $found contains a comma
     separated list of all the child element tag names that were found.
     $re contains the (decoded) regular expression that was used
     internally.  *VC:* `Element Valid|http:' in this node

   * *155* - more than one root Element [$tags]

     An XML Document may only contain one root Element.  $tags is a comma
     separated list of element tag names encountered so far.  *Note
     XML/Parser: XML/Parser, (expat) throws 'no element found' exception.
     See two_roots.xml for an example.  See definition of `document|http:'
     in this node

   * *156* - unexpected root Element [$tagName], expected [$rootTagName]

     The tag name of the root Element of the XML Document differs from the
     name specified in the DOCTYPE section.  *Note XML/Parser: XML/Parser,
     (expat) throws 'not well-formed' exception.  See bad_root.xml for an
     example.  *VC:* `Root Element Type|http:' in this node

   * *157* - unexpected Element [$tagName]

     The ELEMENT definition for the specified Element does not allow child
     Elements with the specified $tagName.  *VC:* `Element Valid|http:' in
     this node

     The error context contains ChildElementIndex which is the index within
     its parent Element (counting only Element nodes.)

   * *158* - unspecified value for #IMPLIED attribute [$attrName]

     The ATTLIST for the specified attribute said the attribute was
     #IMPLIED, which means the user application should supply a value, but
     the attribute value was not specified. (User applications should pass
     a value and set $specified to 1 in the Attr handler.)

   * *159* - unspecified value for #REQUIRED attribute [$attrName]

     The ATTLIST for the specified attribute said the attribute was
     #REQUIRED, which means that a value should have been specified.
     *VC:* `Required Attribute|http:' in this node

160 - 169
---------

   * *160* - invalid Enumeration value [$attrValue]

     The specified attribute value does not match one of the Enumeration
     values in the ATTLIST.  *VC:* `Enumeration|http:' in this node

   * *161* - invalid NOTATION value [$attrValue]

     The specified attribute value was not found in the list of possible
     NOTATION references as found in the ATTLIST definition.  *VC:*
     `Notation Attributes|http:' in this node

   * *162* - undefined NOTATION [$attrValue]

     The NOTATION referenced by the specified attribute value was not
     defined.  *VC:* `Notation Attributes|http:' in this node

WARNING Messages (200 and up)
-----------------------------

   * *200* - undefined ID [$id] was referenced [$n] times

     The specified ID was referenced $n times, but never defined in an
     attribute value with type ID.  *VC:* `IDREF|http:' in this node

INFO Messages (300 and up)
--------------------------

   * *300* - [$n] references to ID [$id]

     The specified ID was referenced $n times.
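
   The three ranges can be captured in a small helper for use inside a
fail handler.  The function name category is hypothetical, and since the
codes are still subject to change the boundaries below simply follow the
headings in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: map a code to its category, using the ranges
# given by the headings in this section.
sub category {
    my ($code) = @_;
    return 'info'    if $code >= 300;
    return 'warning' if $code >= 200;
    return 'error';
}

print join(', ', map { "$_ => " . category($_) } 111, 200, 300), "\n";
```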

Not checked
-----------

   The following errors are already checked by *Note XML/Parser:
XML/Parser, (expat) and are currently not checked by XML::Checker:

   (?? TODO - add more info)

root element is missing
     *Note XML/Parser: XML/Parser, (expat) throws 'no element found'
     exception.  See no_root.xml for an example.

XML::Checker
============

   XML::Checker can be easily plugged into your application.  It uses
mostly the same style of event handlers (or callbacks) as *Note
XML/Parser: XML/Parser,.  See *Note XML/Parser: XML/Parser, manual page
for descriptions of most handlers.

   It also implements PerlSAX style event handlers. See `PerlSAX
interface' in this node.

   Currently, the XML::Checker object is a blessed hash with the following
(potentially useful) entries:

     $checker->{RootElement} - root element name as found in the DOCTYPE
     $checker->{NOTATION}->{$notation} - is 1 if the NOTATION was defined
     $checker->{ENTITY}->{$name} - contains the (first) ENTITY value if defined
     $checker->{Unparsed}->{$entity} - is 1 if the unparsed ENTITY was defined
     $checker->{ID}->{$id} - is 1 if the ID was defined
     $checker->{IDREF}->{$id} - number of times the ID was referenced

     # Less useful:
     $checker->{ERule}->{$tag} - the ELEMENT rules by Element tag name
     $checker->{ARule}->{$tag} - the ATTLIST rules by Element tag name
     $checker->{Context} - context stack used internally
     $checker->{CurrARule} - current ATTLIST rule for the current Element
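
   As an illustration of how the ID and IDREF entries might be used, the
sketch below compares the two tables.  A plain hash with made-up values
stands in for a checker object after a run, so this shows only the data
layout listed above, not behaviour of the module itself.

```perl
use strict;
use warnings;

# Plain hash standing in for a checker object after a run; the values
# are made up. Only the ID and IDREF entries are used.
my $checker = {
    ID    => { intro => 1, summary => 1 },
    IDREF => { intro => 3, appendix => 1 },
};

# IDs that were referenced but never defined.
my @dangling = sort { $a cmp $b }
               grep { !$checker->{ID}{$_} } keys %{ $checker->{IDREF} };

# IDs that were defined but never referenced.
my @unused = sort { $a cmp $b }
             grep { !$checker->{IDREF}{$_} } keys %{ $checker->{ID} };

print "dangling: @dangling\n";
print "unused: @unused\n";
```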

XML::Checker methods
--------------------

   This section is only interesting when using XML::Checker directly.
XML::Checker supports most event handlers that *Note XML/Parser:
XML/Parser, supports, with minor differences. Note that the XML::Checker
event handler methods are instance methods and not static, so call them
on the checker object, without passing $expat (as you would in the *Note
XML/Parser: XML/Parser, handlers):

     $checker->Start($tagName);

Constructor
          $checker = new XML::Checker;
          $checker = new XML::Checker (%user_args);

     User data may be stored by client applications. Only $checker->{User}
     is guaranteed not to clash with internal hash keys.

getRootElement ()
          $tagName = $checker->getRootElement;

     Returns the root element name as found in the DOCTYPE.

Expat interface
---------------

   XML::Checker supports what I call the *Expat* interface, which is the
collection of methods you normally specify as the callback handlers when
using XML::Parser.

   Only the following *Note XML/Parser: XML/Parser, handlers are currently
supported: Init, Final, Char, Start, End, Element, Attlist, Doctype,
Unparsed, Entity, Notation.

   I don't know how to correctly support the Default handler for all *Note
XML/Parser: XML/Parser, releases. The Start handler works a little
differently (see below) and I added Attr, InitDomElem, FinalDomElem, CDATA
and EntityRef handlers.  See *Note XML/Parser: XML/Parser, for a
description of the handlers that are not listed below.

   Note that this interface may disappear, when the PerlSAX interface
stabilizes.

Start ($tag)
          $checker->Start($tag);

     Call this when an Element with the specified $tag name is encountered.
     Different from the Start handler in *Note XML/Parser: XML/Parser,, in
     that no attributes are passed in (use the Attr handler for those.)

Attr ($tag, $attrName, $attrValue, $isSpecified)
          $checker->Attr($tag,$attrName,$attrValue,$spec);

     Checks an attribute with the specified $attrName and $attrValue
     against the ATTLIST definition of the element with the specified $tag
     name.  $isSpecified indicates whether the attribute was specified (1)
     or defaulted (0).

EndAttr ()
          $checker->EndAttr;

     This should be called after all attributes are passed with Attr().
     It will check which of the #REQUIRED attributes were not specified
     and generate the appropriate error (159) for each one that is missing.

CDATA ($text)
          $checker->CDATA($text);

     This should be called whenever CDATASections are encountered.
     Similar to Char handler (but might perform different checks later...)

EntityRef ($entity, $isParameterEntity)
          $checker->EntityRef($entity,$isParameterEntity);

     Checks the ENTITY reference. Set $isParameterEntity to 1 for entity
     references that start with '%'.

InitDomElem () and FinalDomElem ()
     Used by XML::DOM::Element::check() to initialize (and cleanup) the
     context stack when checking a single element.
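
   The Start, Attr and EndAttr calls described above are meant to be
made in that order for each start tag.  The sketch below shows the
sequence; feed_start_tag is a hypothetical helper, and Recorder is a
stand-in class that merely logs the calls so that the example runs
without XML::Checker installed.  A real checker object would be passed
in its place.

```perl
use strict;
use warnings;

# Recorder is a hypothetical stand-in that logs method calls.
package Recorder;
sub new     { my $class = shift; return bless { calls => [] }, $class }
sub Start   { my $s = shift; push @{ $s->{calls} }, "Start(@_)" }
sub Attr    { my $s = shift; push @{ $s->{calls} }, "Attr(@_)" }
sub EndAttr { my $s = shift; push @{ $s->{calls} }, "EndAttr()" }

package main;

# Hypothetical helper: report one start tag and its attributes.
# $attrs is a list of [$attrName, $attrValue, $isSpecified] entries.
sub feed_start_tag {
    my ($checker, $tag, $attrs) = @_;
    $checker->Start($tag);                    # no attributes passed here
    $checker->Attr($tag, @$_) for @$attrs;    # one call per attribute
    $checker->EndAttr;                        # after the last attribute
}

my $rec = Recorder->new;
feed_start_tag($rec, 'img', [ ['src', 'a.png', 1], ['align', 'left', 0] ]);
print "$_\n" for @{ $rec->{calls} };
```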

PerlSAX interface
-----------------

   XML::Checker now also supports the PerlSAX interface, so you can use
XML::Checker wherever you use PerlSAX handlers.

   XML::Checker implements the following methods: start_document,
end_document, start_element, end_element, characters,
processing_instruction, comment, start_cdata, end_cdata, entity_reference,
notation_decl, unparsed_entity_decl, entity_decl, element_decl,
attlist_decl, doctype_decl, xml_decl

   Not implemented: set_document_locator, ignorable_whitespace

   See PerlSAX.pod for details. (It is called lib/PerlSAX.pod in the
libxml-perl distribution which can be found at CPAN.)

CAVEATS
=======

   This is an alpha release. Almost everything is subject to change.

AUTHOR
======

   Send bug reports, hints, tips, suggestions to Enno Derksen at
<`enno@att.com'>.

SEE ALSO
========

   The home page of XML::Checker at `http:' in this node

   The XML spec (Extensible Markup Language 1.0) at `http:' in this node

   The *Note XML/Parser: XML/Parser, and *Note XML/Parser/Expat:
XML/Parser/Expat, manual pages.

   The other packages that come with XML::Checker: *Note
XML/Checker/Parser: XML/Checker/Parser,, *Note XML/DOM/ValParser:
XML/DOM/ValParser,

   The DOM Level 1 specification at `http:' in this node

   The PerlSAX specification. It is currently in lib/PerlSAX.pod in the
libxml-perl distribution by Ken MacLeod.

   The original SAX specification (Simple API for XML) can be found at
`http:' in this node and `http:' in this node