This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: Text/PDF/Objind,  Next: Text/PDF/Page,  Prev: Text/PDF/Number,  Up: Module List

PDF indirect object reference. Also acts as an abstract superclass for all elements in a PDF file.
**************************************************************************************************

NAME
====

   Text::PDF::Objind - PDF indirect object reference. Also acts as an
abstract superclass for all elements in a PDF file.

INSTANCE VARIABLES
==================

   Instance variables differ from content variables in that they all start
with a space.

parent
     For an object which is a reference to an object in some source, this
     holds the reference to the source object, so that should the
     reference have to be de-referenced, then we know where to go and get
     the info.

objnum (R)
     The object number in the source (only for object references)

objgen (R)
     The object generation in the source

   There are other instance variables which are used by the parent for
file control.

isfree
     This marks whether the object is in the free list and available for
     re-use as another object elsewhere in the file.

nextfree
     Holds a direct reference to the next free object in the free list.

METHODS
=======

Text::PDF::Objind->new()
------------------------

   Creates a new indirect object

uid
---

   Returns a unique id for this object, creating one if it did not have one
before.

$r->val
-------

   Returns the value of this object, reading the object first if
necessary.

   Note that all direct subclasses *must* make their own versions of this
subroutine otherwise we could be in for a very deep loop!

$r->realise
-----------

   Makes sure that the object is fully read in, etc.

$r->outobjdeep($fh, $pdf)
-------------------------

   If you really want to output this object, then you need to read it
first.  This also means that all direct subclasses must override this
method or loop forever!

$r->outobj($fh)
---------------

   If this is a full object then outputs a reference to the object,
otherwise calls outobjdeep to output the contents of the object at this
point.

$r->elementsof
--------------

   Abstract superclass function filler. Returns self here but should return
something more useful if an array.

$r->empty
---------

   Empties all content from this object to free up memory or to be ready to
pass the object into the free list. Simplistically undefs all instance
variables other than object number and generation.

$r->merge($objind)
------------------

   This merges content information into an object reference place-holder.
This occurs when an object reference is read before the object definition
and the information in the read data needs to be merged into the object
place-holder

$r->is_obj($pdf)
----------------

   Returns whether this object is a full object with its own object number
or whether it is purely a sub-object. $pdf indicates which output file we
are concerned that the object is an object in.

$r->copy($pdf, $res)
--------------------

   Returns a new copy of this object. The object is assumed to be some kind
of associative array and the copy is a deep copy for elements which are
not PDF objects, according to $pdf, and shallow copy for those that are.
Notice that calling copy on an object forces at least a one level copy
even if it is a PDF object. The returned object loses its PDF object
status though.

   If $res is defined then the copy goes into that object rather than
creating a new one. It is up to the caller to bless $res, etc. Notice that
elements from $self are not copied into $res if there is already an entry
for them existing in $res.


File: pm.info,  Node: Text/PDF/Page,  Next: Text/PDF/Pages,  Prev: Text/PDF/Objind,  Up: Module List

Represents a PDF page, inherits from *Note Text/PDF/Pages: Text/PDF/Pages,
**************************************************************************

NAME
====

   Text::PDF::Page - Represents a PDF page, inherits from *Note
Text/PDF/Pages: Text/PDF/Pages,

DESCRIPTION
===========

   Represents a page of output in PDF. It also keeps track of the content
stream, any resources (such as fonts) being switched, etc.

   Page inherits from Pages due to a number of shared methods. They are
really structurally quite different.

INSTANCE VARIABLES
==================

   A page has various working variables:

curstrm
     The currently open stream

METHODS
=======

Text::PDF::Page->new($pdf, $parent, $index)
-------------------------------------------

   Creates a new page based on a pages object (perhaps the root object).

   The page is also added to the parent at this point, so pages are
ordered in a PDF document in the order in which they are created rather
than in the order they are closed.

   Only the essential elements in the page dictionary are created here,
all others are either optional or can be inherited.

   The optional index value indicates the position in the parent list at
which this page should be inserted (so that new pages need not be
appended).

$p->add($str)
-------------

   Adds the string to the currently active stream for this page. If no
stream exists, then one is created and added to the list of streams for
this page.

   The slightly cryptic name is an attempt to keep it short given the
number of times people are likely to have to type it.

$p->ship_out($pdf)
------------------

   Ships the page out to the given output file context


File: pm.info,  Node: Text/PDF/Pages,  Next: Text/PDF/SFont,  Prev: Text/PDF/Page,  Up: Module List

a PDF pages hierarchical element. Inherits from *Note Text/PDF/Dict: Text/PDF/Dict,
***********************************************************************************

NAME
====

   Text::PDF::Pages - a PDF pages hierarchical element. Inherits from
*Note Text/PDF/Dict: Text/PDF/Dict,

DESCRIPTION
===========

   A Pages object is the parent to other pages objects or to page objects
themselves.

METHODS
=======

Text::PDF::Pages->new($parent)
------------------------------

   This creates a new Pages object. Notice that $parent here is not the
file context for the object but the parent pages object for this pages. If
we are using this class to create a root node, then $parent should point
to the file context, which is identified by not having a Type of Pages.

$p->out_obj($isnew)
-------------------

   Tells all the files that this thing is destined for that they should
output this object come time to output. If this object has no parent, then
it must be the root. So set as the root for the files in question and tell
it to be output too.  If $isnew is set, then call new_obj rather than
out_obj to create as a new object in the file.

$p->find_prop($key)
-------------------

   Searches up through the inheritance tree to find a property.

$p->add_font($pdf, $font)
-------------------------

   Creates or edits the resource dictionary at this level in the
hierarchy. If the font is already supported even through the hierarchy,
then it is not added.

$p->bbox($xmin, $ymin, $xmax, $ymax, [$param])
----------------------------------------------

   Specifies the bounding box for this and all child pages. If the values
are identical to those inherited then no change is made. $param specifies
the attribute name so that other 'bounding box'es can be set with this
method.

$p->proc_set(@entries)
----------------------

   Ensures that the current resource contains all the entries in the
proc_sets listed. If necessary it creates a local resource dictionary to
achieve this.


File: pm.info,  Node: Text/PDF/SFont,  Next: Text/PDF/String,  Prev: Text/PDF/Pages,  Up: Module List

PDF Standard inbuilt font resource object. Inherits from *Note Text/PDF/Dict: Text/PDF/Dict,
********************************************************************************************

NAME
====

   Text::PDF::SFont - PDF Standard inbuilt font resource object. Inherits
from *Note Text/PDF/Dict: Text/PDF/Dict,

METHODS
=======

Text::PDF::SFont->new($parent, $name, $pdfname)
-----------------------------------------------

   Creates a new font object with given parent and name. The name must be
from one of the core 14 base fonts included with PDF. These are:

     Courier,     Courier-Bold,   Courier-Oblique,   Courier-BoldOblique
     Times-Roman, Times-Bold,     Times-Italic,      Times-BoldItalic
     Helvetica,   Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique
     Symbol,      ZapfDingbats

   The $pdfname is the name that this particular font object will be
referenced by throughout the PDF file. If you want to play silly games
with naming, then you can write the code to do it!

   All fonts in this system are full PDF objects.

BUGS
====

   Currently no width support for Symbol or ZapfDingbats, I haven't got my
head around the AFMs yet.

   MacExpertEncoding not supported yet (I don't have the width info for any
of the fonts)

$f->width($text)
----------------

   Returns the width of the text in em.

$f->trim($text, $len)
---------------------

   Trims the given text to the given length (in per mille em) returning
the trimmed text

$f->out_text($text)
-------------------

   Acknowledges the text to be output for subsetting purposes, etc.


File: pm.info,  Node: Text/PDF/String,  Next: Text/PDF/TTFont,  Prev: Text/PDF/SFont,  Up: Module List

PDF String type objects and superclass for simple objects that are basically stringlike (Number, Name, etc.)
************************************************************************************************************

NAME
====

   Text::PDF::String - PDF String type objects and superclass for simple
objects that are basically stringlike (Number, Name, etc.)

METHODS
=======

Text::PDF::String->from_pdf($string)
------------------------------------

   Creates a new string object (not a full object yet) from a given string.
The string is parsed according to input criteria with escaping working.

Text::PDF::String->new($string)
-------------------------------

   Creates a new string object (not a full object yet) from a given string.
The string is parsed according to input criteria with escaping working.

$s->convert($str)
-----------------

   Returns $str converted as per criteria for input from PDF file

$s->val
-------

   Returns the value of this string (the string itself).

$s->as_pdf
----------

   Returns the string formatted for output as PDF

$s->outobjdeep
--------------

   Outputs the string in PDF format, complete with necessary conversions


File: pm.info,  Node: Text/PDF/TTFont,  Next: Text/PDF/TTFont0,  Prev: Text/PDF/String,  Up: Module List

Inherits from *Note Text/PDF/Dict: Text/PDF/Dict, and represents a TrueType font within a PDF file.
***************************************************************************************************

NAME
====

   Text::PDF::TTFont - Inherits from *Note Text/PDF/Dict: Text/PDF/Dict,
and represents a TrueType font within a PDF file.

DESCRIPTION
===========

   A font consists of two primary parts in a PDF file: the header and the
font descriptor. Whilst two fonts may share font descriptors, they will
have their own header dictionaries including encoding and width
information.

INSTANCE VARIABLES
==================

   There are no instance variables beyond the variables which directly
correspond to entries in the appropriate PDF dictionaries.

METHODS
=======

Text::PDF::TTFont->new($parent, $fontfname, $pdfname, %opts)
------------------------------------------------------------

   Creates a new font resource for the given fontfile. This includes the
font descriptor and the font stream. The $pdfname is the name by which
this font resource will be known throughout a particular PDF file.

   All font resources are full PDF objects.

$t->width($text)
----------------

   Measures the width of the given text according to the widths in the font

$t->trim($text, $len)
---------------------

   Trims the given text to the given length (in per mille em) returning
the trimmed text

$t->out_text($text)
-------------------

   Indicates to the font that the text is to be output and returns the
text to be output.

$f->copy
--------

   Copies the font object excluding the name, widths and encoding, etc.

TITLE
=====

   Text::PDF::TTIOString - internal IO type handle for string output for
font embedding. This code is ripped out of IO::Scalar, to save the direct
dependence for so little. See IO::Scalar for details


File: pm.info,  Node: Text/PDF/TTFont0,  Next: Text/PDF/Utils,  Prev: Text/PDF/TTFont,  Up: Module List

Inherits from `PDF::Dict' in this node and represents a TrueType Type 0 font within a PDF file.
***********************************************************************************************

NAME
====

   Text::PDF::TTFont0 - Inherits from `PDF::Dict' in this node and
represents a TrueType Type 0 font within a PDF file.

DESCRIPTION
===========

   A font consists of two primary parts in a PDF file: the header and the
font descriptor. Whilst two fonts may share font descriptors, they will
have their own header dictionaries including encoding and width
information.

INSTANCE VARIABLES
==================

   There are no instance variables beyond the variables which directly
correspond to entries in the appropriate PDF dictionaries.

METHODS
=======

Text::PDF::TTFont0->new($parent, $fontfname, $pdfname)
------------------------------------------------------

   Creates a new font resource for the given fontfile. This includes the
font descriptor and the font stream. The $pdfname is the name by which
this font resource will be known throughout a particular PDF file.

   All font resources are full PDF objects.

out_text($text)
---------------

   Returns the string to be put into a content stream for text to be
output in this font.  The text is assumed to be UTF8 encoded and the
return string is a glyph sequence for the text. If subsetting is enabled,
then all the glyphs returned are also marked for output.

width($text)
------------

   Returns the width of the string, assuming it to be UTF8 encoded.

outobjdeep($fh, $pdf)
---------------------

   Handles the creation of the font stream including subsetting at this
point. So if you get this far, that's it for subsetting.

ship_out($pdf)
--------------

   Ship this font out to the given $pdf file context

empty
-----

   Empty the font of as much as possible in order to save memory


File: pm.info,  Node: Text/PDF/Utils,  Next: Text/ParseWords,  Prev: Text/PDF/TTFont0,  Up: Module List

Utility functions for PDF library
*********************************

NAME
====

   Text::PDF::Utils - Utility functions for PDF library

DESCRIPTION
===========

   A set of utility functions to save the fingers of the PDF library users!

FUNCTIONS
=========

PDFBool
-------

   Creates a Bool via Text::PDF::Bool->new

PDFArray
--------

   Creates an array via Text::PDF::Array->new

PDFDict
-------

   Creates a dict via Text::PDF::Dict->new

PDFName
-------

   Creates a name via Text::PDF::Name->new

PDFNum
------

   Creates a number via Text::PDF::Number->new

PDFStr
------

   Creates a string via Text::PDF::String->new

asPDFBool
---------

   Returns a boolean value in PDF output form

asPDFStr
--------

   Returns a string in PDF output form (including () or <>)

asPDFName
---------

   Returns a Name in PDF Output form (including /)

asPDFNum
--------

   Returns a number in PDF output form

unpacku($str)
-------------

   Returns a list of unicode values for the given UTF8 string
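
   As a rough illustration of what "a list of unicode values for the given
UTF8 string" means, the sketch below decodes a UTF-8 byte string with the
core Encode module and returns one code point per character.  The helper
name `codepoints' is hypothetical, and this is not necessarily how unpacku
itself is implemented.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for illustration only; not the library's unpacku.
sub codepoints {
    my ($bytes) = @_;
    # Decode the UTF-8 octets into characters, then take each ordinal.
    return map { ord } split //, decode('UTF-8', $bytes);
}

# "\xC3\xA9" is the UTF-8 encoding of U+00E9; "A" is U+0041.
my @values = codepoints("\xC3\xA9A");
print join(' ', map { sprintf 'U+%04X', $_ } @values), "\n";
```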


File: pm.info,  Node: Text/ParseWords,  Next: Text/Query,  Prev: Text/PDF/Utils,  Up: Module List

parse text into an array of tokens or array of arrays
*****************************************************

NAME
====

   Text::ParseWords - parse text into an array of tokens or array of arrays

SYNOPSIS
========

     use Text::ParseWords;
     @lists = &nested_quotewords($delim, $keep, @lines);
     @words = &quotewords($delim, $keep, @lines);
     @words = &shellwords(@lines);
     @words = &parse_line($delim, $keep, $line);
     @words = &old_shellwords(@lines); # DEPRECATED!

DESCRIPTION
===========

   The &nested_quotewords() and &quotewords() functions accept a delimiter
(which can be a regular expression) and a list of lines and then break
those lines up into a list of words ignoring delimiters that appear inside
quotes.  &quotewords() returns all of the tokens in a single long list,
while &nested_quotewords() returns a list of token lists corresponding to
the elements of @lines.  &parse_line() does tokenizing on a single string.
The &*quotewords() functions simply call &parse_line(), so if you're
only splitting one line you can call &parse_line() directly and save a
function call.

   The $keep argument is a boolean flag.  If true, then the tokens are
split on the specified delimiter, but all other characters (quotes,
backslashes, etc.) are kept in the tokens.  If $keep is false then the
&*quotewords() functions remove all quotes and backslashes that are not
themselves backslash-escaped or inside of single quotes (i.e.,
&quotewords() tries to interpret these characters just like the Bourne
shell).  NB: these semantics are significantly different from the original
version of this module shipped with Perl 5.000 through 5.004.  As an
additional feature, $keep may be the keyword "delimiters" which causes the
functions to preserve the delimiters in each string as tokens in the token
lists, in addition to preserving quote and backslash characters.

   &shellwords() is written as a special case of &quotewords(), and it
does token parsing with whitespace as a delimiter, similar to most Unix
shells.
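
   The effect of $keep and the difference between &quotewords() and
&nested_quotewords() may be easier to see side by side.  The following is
a small sketch; the input strings are made up for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords nested_quotewords);

my $line = q{alpha,"beta,gamma",delta};

# $keep false: quotes are removed; the quoted comma is not a delimiter.
my @plain = quotewords(',', 0, $line);

# $keep true: same split points, but the quote characters stay in place.
my @kept = quotewords(',', 1, $line);

# $keep eq 'delimiters': the delimiters themselves are returned as tokens.
my @with_delims = quotewords(',', 'delimiters', 'a,b');

# nested_quotewords returns one array reference per input line.
my @lists = nested_quotewords(',', 0, 'a,b', 'c,"d,e"');

print join(' | ', @plain), "\n";
print join(' | ', @kept), "\n";
print join(' | ', @with_delims), "\n";
print scalar(@lists), " lists; the second has ",
    scalar(@{ $lists[1] }), " tokens\n";
```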

EXAMPLES
========

   The sample program:

     use Text::ParseWords;
     @words = &quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});
     $i = 0;
     foreach (@words) {
         print "$i: <$_>\n";
         $i++;
     }

   produces:

     0: <this>
     1: <is>
     2: <a test>
     3: <of quotewords>
     4: <"for>
     5: <you>

   demonstrating:

  1. a simple word

  2. multiple spaces are skipped because of our $delim

  3. use of quotes to include a space in a word

  4. use of a backslash to include a space in a word

  5. use of a backslash to remove the special meaning of a double-quote

  6. another simple word (note the lack of effect of the backslashed
     double-quote)

   Replacing `&quotewords('\s+', 0, q{this   is...})' with
`&shellwords(q{this   is...})' is a simpler way to accomplish the same
thing.
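
   A short check of that equivalence, using the same input string as the
sample program (the variable names are only for illustration):

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords shellwords);

my $input = q{this   is "a test" of\ quotewords \"for you};

# Whitespace-delimited parsing, spelled out and via the shortcut.
my @via_quotewords = quotewords('\s+', 0, $input);
my @via_shellwords = shellwords($input);

# Join with a separator that cannot occur in the tokens before comparing.
my $same = join("\0", @via_quotewords) eq join("\0", @via_shellwords);
print $same ? "same tokens\n" : "different tokens\n";
```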

AUTHORS
=======

   Maintainer is Hal Pomeranz <pomeranz@netcom.com>, 1994-1997 (Original
author unknown).  Much of the code for &parse_line() (including the
primary regexp) from Joerk Behrends <jbehrends@multimediaproduzenten.de>.

   Examples section and other documentation provided by John Heidemann
<johnh@ISI.EDU>

   Bug reports, patches, and nagging provided by lots of folks- thanks
everybody!  Special thanks to Michael Schwern <schwern@envirolink.org> for
assuring me that a &nested_quotewords() would be useful, and to Jeff
Friedl <jfriedl@yahoo-inc.com> for telling me not to worry about
error-checking (sort of- you had to be there).


File: pm.info,  Node: Text/Query,  Next: Text/Query/Advanced,  Prev: Text/ParseWords,  Up: Module List

Query processing framework
**************************

NAME
====

   Text::Query - Query processing framework

SYNOPSIS
========

     use Text::Query;
     
     # Constructor
     $query = Text::Query->new([QSTRING] [OPTIONS]);

     # Methods
     $query->prepare(QSTRING [OPTIONS]);
     $query->match([TARGET]);
     $query->matchscalar([TARGET]);

DESCRIPTION
===========

   This module provides an object that matches a data source against a
query expression.

   Query expressions are compiled into an internal form when a new object
is created or the prepare method is called; they are not recompiled on
each match.

   The class provided by this module uses four packages to process the
query.  The query parser parses the question and calls a query expression
builder (internal form of the question). The optimizer is then called to
reduce the complexity of the expression. The solver applies the expression
on a data source.

   The following parsers are provided:

Text::Query::ParseAdvanced
Text::Query::ParseSimple

   The following builders are provided:

Text::Query::BuildAdvancedString
Text::Query::BuildSimpleString

   The following solvers are provided:

Text::Query::SolveSimpleString
Text::Query::SolveAdvancedString

EXAMPLES
========

     use Text::Query;
     my $q=new Text::Query('hello and world',
                           -parse => 'Text::Query::ParseAdvanced',
                           -solve => 'Text::Query::SolveAdvancedString',
                           -build => 'Text::Query::BuildAdvancedString');
     die "bad query expression" if not defined $q;
     print if $q->match;
     ...
     $q->prepare('goodbye or adios or ta ta',
                 -litspace => 1,
                 -case => 1);
     #requires single space between the two ta's
     if($q->match($line)) {
     #doesn't match "Goodbye"
     ...
     $q->prepare('"and" or "or"');
     #quoting operators for literal match
     ...
     $q->prepare('\\bintegrate\\b', -regexp => 1);
     #won't match "disintegrated"

CONSTRUCTOR
===========

new ([QSTRING] [OPTIONS])
     This is the constructor for a new Text::Query object.  If a `QSTRING'
     is given it will be compiled to internal form.

     OPTIONS are passed in a hash-like fashion, using key and value pairs.
     Possible options are:

     *-parse* - Package name of the parser. Default is
     Text::Query::ParseSimple.

     *-build* - Package name of the builder. Default is Text::Query::Build.

     *-optimize* - Package name of the optimizer. Default is
     Text::Query::Optimize.

     *-solve* - Package name of the solver. Default is Text::Query::Solve.

     *-mode* - Name of predefined group of packages to use.  Options are
             currently `simple_text' and `advanced_text'.

     These options are handled by the configure method.

     All other options are passed to the parser prepare function.  See the
     corresponding manual pages for a description.

     If `QSTRING' is undefined, the prepare function is not called.

     The constructor will croak if a `QSTRING' was supplied and had
     illegal syntax.

METHODS
=======

configure ([OPTIONS])
     Set the parse, build, optimize or solve packages. See the CONSTRUCTOR
     description for explanations.

prepare (QSTRING [OPTIONS])
     Compiles the query expression in `QSTRING' to internal form and sets
     any options (same as in the constructor).  prepare may be used to
     change the query expression and options for an existing query object.
     If OPTIONS are omitted, any options set by a previous call to
     prepare are persistent.

     The optimizer (-optimize) is called with the result of the parser
     (-parse).  The parser uses the builder (-build) to construct the
     internal form.

     This method returns a reference to the query object if the syntax of
     the expression was legal, or croaks if not.

match ([TARGET])
     Calls the match method of the solver (-solve).

matchscalar ([TARGET])
     Calls the matchscalar method of the solver (-solve).

SEE ALSO
========

   Text::Query::ParseAdvanced(3), Text::Query::ParseSimple(3),
Text::Query::BuildSimpleString(3), Text::Query::BuildAdvancedString(3),
Text::Query::SolveSimpleString(3), Text::Query::SolveAdvancedString(3)

   Text::Query::Build(3), Text::Query::Parse(3), Text::Query::Solve(3),
Text::Query::Optimize(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Query/Advanced,  Next: Text/Query/Build,  Prev: Text/Query,  Up: Module List

Match text against Boolean expression
*************************************

NAME
====

   Text::Query::Advanced - Match text against Boolean expression

SYNOPSIS
========

     use Text::Query::Advanced;
     
     # Constructor
     $query = Text::Query::Advanced->new([QSTRING] [OPTIONS]);

     # Methods
     $query->prepare(QSTRING [OPTIONS]);
     $query->match([TARGET]);
     $query->matchscalar([TARGET]);

     # Methods that can be overridden to produce custom query trees, etc.

     $query->build_final_expression(Q1);
     $query->build_expression(Q1,Q2);
     $query->build_expression_finish(Q1);
     $query->build_conj(Q1,Q2,F);
     $query->build_near(Q1,Q2);
     $query->build_concat(Q1,Q2);
     $query->build_negation(Q1);
     $query->build_literal(Q1);

DESCRIPTION
===========

   This module provides an object that matches a string or list of strings
against a Boolean query expression similar to an AltaVista "advanced
query".  Elements of the query expression may be regular expressions or
literal text.

   Query expressions are compiled into an internal form (currently, a
regular expression making use of most of the tricks listed in Recipe 6.17
of _The Perl Cookbook_) when a new object is created or the prepare method
is called; they are not recompiled on each match.
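
   To give a feel for how a Boolean query can be folded into a single
regular expression, here is a minimal sketch using lookahead assertions.
It does not use this module and is not the regexp the module actually
generates; the helper `and_regex' is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one regexp that requires every literal term to
# appear somewhere in the text, in any order.
sub and_regex {
    my @terms = @_;
    # Each (?=.*term) looks ahead from the start without consuming text.
    my $re = '^' . join('', map { '(?=.*' . quotemeta($_) . ')' } @terms);
    return qr/$re/si;
}

my $both            = and_regex('hello', 'world');
# "hello AND NOT world": a negative lookahead rejects the second term.
my $hello_not_world = qr/^(?=.*hello)(?!.*world)/si;

for my $text ('Hello, big world', 'hello there', 'world only') {
    printf "%-18s AND:%d  AND-NOT:%d\n", $text,
        ($text =~ $both            ? 1 : 0),
        ($text =~ $hello_not_world ? 1 : 0);
}
```

   Because the pattern is compiled once with `qr//', repeated matches
reuse it rather than rebuilding it each time.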

   The class provided by this module may be subclassed to produce query
processors that match against input other than literal strings, e.g.
indices.

   Query expressions consist of literal strings (or regexps) joined by the
following operators, in order of precedence from lowest to highest:

OR, |
AND, &
NEAR
NOT, !

   Operator names are not case-sensitive.  Note that if you want to use a |
in a regexp, you need to backwhack it to keep it from being seen as a query
operator.  Sub-expressions may be quoted in single or double quotes to
match "and," "or," or "not" literally and may be grouped in parentheses
(`(, )') to alter the precedence of evaluation.

   A parenthesized sub-expression may also be concatenated with other
sub-expressions to match sequences: `(Perl or Python) interpreter' would
match either "Perl interpreter" or "Python interpreter".  Concatenation
has a precedence higher than NOT but lower than AND.  Juxtaposition of
simple words has the highest precedence of all.

EXAMPLES
========

     use Text::Query::Advanced;
     my $q=new Text::Query::Advanced('hello and world');
     die "bad query expression" if not defined $q;
     print if $q->match;
     ...
     $q->prepare('goodbye or adios or ta ta',-litspace=>1,-case=>1);
     #requires single space between the two ta's
     if ($q->match($line)) {
     #doesn't match "Goodbye"
     ...
     $q->prepare('"and" or "or"');
     #quoting operators for literal match
     ...
     $q->prepare('\\bintegrate\\b',-regexp=>1);
     #won't match "disintegrated"

CONSTRUCTOR
===========

new ([QSTRING] [OPTIONS])
     This is the constructor for a new Text::Query object.  If a `QSTRING'
     is given it will be compiled to internal form.

     OPTIONS are passed in a hash-like fashion, using key and value pairs.
     Possible options are:

     *-case* - If true, do case-sensitive match.

     *-litspace* - If true, match spaces (except between operators) in
     `QSTRING' literally.  If false, match spaces as `\s+'.

     *-near* - Sets the number of words that can occur between two
     expressions and still satisfy the NEAR operator.  The default is 10.

     *-regexp* - If true, treat patterns in `QSTRING' as regular
     expressions rather than literal text.

     *-whole* - If true, match whole words only, not substrings of words.

     The constructor will return undef if a `QSTRING' was supplied and had
     illegal syntax.
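
   The interaction of -case, -litspace and -whole can be pictured with a
small stand-alone sketch that turns one literal phrase into a regexp.  It
does not load this module; `literal_pattern' and its option names without
the leading dash are hypothetical, chosen only to mirror the options
described above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented options for one literal.
sub literal_pattern {
    my ($text, %opt) = @_;

    # litspace: spaces match literally; otherwise any run of whitespace.
    my $pat = $opt{litspace}
        ? quotemeta($text)
        : join('\s+', map { quotemeta($_) } split ' ', $text);

    # whole: anchor on word boundaries so substrings of words do not match.
    $pat = "\\b$pat\\b" if $opt{whole};

    # case: case-sensitive when true, case-insensitive otherwise.
    return $opt{case} ? qr/$pat/ : qr/$pat/i;
}

my $loose  = literal_pattern('ta ta');
my $strict = literal_pattern('ta ta', litspace => 1, case => 1);
my $word   = literal_pattern('integrate', whole => 1);

print "loose matches\n"   if "Ta    ta for now" =~ $loose;
print "strict rejects\n"  if "ta  ta" !~ $strict;
print "whole rejects\n"   if "disintegrated" !~ $word;
```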

METHODS
=======

prepare (QSTRING [OPTIONS])
     Compiles the query expression in `QSTRING' to internal form and sets
     any options (same as in the constructor).  prepare may be used to
     change the query expression and options for an existing query object.
     If OPTIONS are omitted, any options set by a previous call to the
     constructor or prepare remain in effect.

     This method returns a reference to the query object if the syntax of
     the expression was legal, or undef if not.

match ([TARGET])
     If `TARGET' is a scalar, match returns a true value if the string
     specified by `TARGET' matches the query object's query expression.  If
     `TARGET' is not given, the match is made against $_.

     If `TARGET' is an array, match returns a (possibly empty) list of all
     matching elements.  If the elements of the array are references to
     sub-arrays, the match is done against the first element of each
     sub-array.  This allows arbitrary information (e.g. filenames) to be
     associated with each string to match.

     If `TARGET' is a reference to an array, match returns a reference to
     a (possibly empty) list of all matching elements.

matchscalar ([TARGET])
     Behaves just like match when `TARGET' is a scalar or is not given.
     Slightly faster than match under these circumstances.

CODE-GENERATION METHODS
=======================

   The following methods are used to generate regexps based on query
elements.  They may be overridden to generate other forms of matching
code, such as trees to be used by a custom version of match that evaluates
index lists or the like.

   All these methods return a scalar corresponding to the code that
performs the specified operation.  As supplied, they return regexp
strings, but overridden methods could return objects, array references,
etc.

   Parameters Q1 and Q2 are the same type of scalar as the return values.

build_final_expression(Q1)
     Does any final processing to generate code to match a top-level
     expression.  As supplied, optionally adds case-insensitivity code and
     then uses `qr//' to compile the regexp.  The return value will be
     stored in the object's `matchexp' field.  It is NOT necessarily of a
     type that can be passed to the other code-generation methods.

build_expression(Q1,Q2)
     Generate code to match `Q1' OR `Q2'

build_expression_finish(Q1)
     Generate any code needed to enclose an expression.  As supplied,
     encloses the generated regexp in non-capturing parentheses.

build_conj(Q1,Q2,F)
     Generate code needed to match `Q1' AND `Q2'.  F will be true if this
     is the first time this method is called in a sequence of several
     conjunctions (the supplied method uses this to factor a common ^ out
     of the generated sub-expressions, which greatly speeds up matching).

build_near(Q1,Q2)
     Generate code needed to match `Q1' NEAR `Q2'.

build_concat(Q1,Q2)
     Generate code needed to match `Q1' immediately followed by `Q2'.

build_negation(Q1)
     Generate code needed to match NOT `Q1'.

build_literal(Q1)
     Generate code to match `Q1' as a literal.

AUTHOR
======

   Eric Bohlman (ebohlman@netcom.com)

CREDITS
=======

   The parse_tokens routine was adapted from the parse_line routine in
Text::ParseWords.

COPYRIGHT
=========

   Copyright (c) 1998-1999 Eric Bohlman. All rights reserved.  This
program is free software; you can redistribute and/or modify it under the
same terms as Perl itself.


File: pm.info,  Node: Text/Query/Build,  Next: Text/Query/BuildAdvancedString,  Prev: Text/Query/Advanced,  Up: Module List

Base class for query builders
*****************************

NAME
====

   Text::Query::Build - Base class for query builders

SYNOPSIS
========

     package Text::Query::BuildMy;

     use Text::Query::Build;
     
     use vars qw(@ISA);

     @ISA = qw(Text::Query::Build);

DESCRIPTION
===========

   This module provides a virtual base class for query builders.

   Query builders are called by the parser logic. A given set of functions
is provided by the builder to match a Boolean logic.  All the methods
return a scalar corresponding to the code that performs the specified
operation.

   Parameters Q1 and Q2 are the same type of scalar as the return values.

METHODS
=======

matchstring()
     Return a string that represents the last built expression. Two
     identical expressions should generate the same string. This is for
     testing purposes.

CODE-GENERATION METHODS
=======================

build_init()
     Called before building the expression. A chance to initialize object
     data.

build_final_expression(Q1)
     Does any final processing to generate code to match a top-level
     expression.  The return value is NOT necessarily of a type that can
     be passed to the other code-generation methods.

build_expression(Q1,Q2)
     Generate code to match `Q1' OR `Q2'

build_expression_finish(Q1)
     Generate any code needed to enclose an expression.

build_conj(Q1,Q2,F)
     Generate code needed to match `Q1' AND `Q2'.  F will be true if this
     is the first time this method is called in a sequence of several
     conjunctions.

build_near(Q1,Q2)
     Generate code needed to match `Q1' NEAR `Q2'.

build_concat(Q1,Q2)
     Generate code needed to match `Q1' immediately followed by `Q2'.

build_negation(Q1)
     Generate code needed to match NOT `Q1'.

build_literal(Q1)
     Generate code to match `Q1' as a literal.

build_scope_start($scope)
     Generate code to enter the `$scope' query context.

build_scope_end($scope,Q1)
     Generate code needed to match `Q1' in the `$scope' context.

build_mandatory(Q1)
     Generate code to match `Q1' (think + in AltaVista syntax).

build_forbiden(Q1)
     Generate code to match NOT `Q1' (think - in AltaVista syntax).
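
   As a rough illustration of the interface above, the following sketch
implements the code-generation methods as a stand-alone class that emits a
prefix-notation string instead of real matching code.  The package name
`My::BuildPrefix' and its output format are hypothetical; a real builder
would inherit from Text::Query::Build and would be driven by a parser
rather than called by hand.

```perl
use strict;
use warnings;

# Hypothetical stand-alone builder. It uses the method names listed above
# but does not inherit from Text::Query::Build, so it runs on its own.
package My::BuildPrefix;

sub new        { my $class = shift; return bless { matchexp => undef }, $class }
sub build_init { my $self = shift; $self->{matchexp} = undef; return }

sub build_final_expression {
    my ($self, $q1) = @_;
    $self->{matchexp} = $q1;    # remember the last built expression
    return $q1;
}

sub build_expression        { my ($self, $q1, $q2) = @_; return "(or $q1 $q2)" }
sub build_expression_finish { my ($self, $q1) = @_; return $q1 }
sub build_conj              { my ($self, $q1, $q2, $first) = @_; return "(and $q1 $q2)" }
sub build_near              { my ($self, $q1, $q2) = @_; return "(near $q1 $q2)" }
sub build_concat            { my ($self, $q1, $q2) = @_; return "(concat $q1 $q2)" }
sub build_negation          { my ($self, $q1) = @_; return "(not $q1)" }
sub build_literal           { my ($self, $q1) = @_; return "'$q1'" }
sub build_mandatory         { my ($self, $q1) = @_; return "(mandatory $q1)" }
sub build_forbiden          { my ($self, $q1) = @_; return "(forbidden $q1)" }
sub matchstring             { my $self = shift; return $self->{matchexp} }

package main;

# Build "hello AND (world OR NOT bye)" by hand, as a parser would.
my $builder = My::BuildPrefix->new;
$builder->build_init;
my $hello  = $builder->build_literal('hello');
my $world  = $builder->build_literal('world');
my $no_bye = $builder->build_negation($builder->build_literal('bye'));
my $either = $builder->build_expression_finish(
    $builder->build_expression($world, $no_bye));
$builder->build_final_expression($builder->build_conj($hello, $either, 1));

print $builder->matchstring, "\n";   # (and 'hello' (or 'world' (not 'bye')))
```

   Because every method takes and returns the same kind of scalar, the
calls nest freely; only the value returned by `build_final_expression' is
treated as finished.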

SEE ALSO
========

   Text::Query(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Query/BuildAdvancedString,  Next: Text/Query/BuildSimpleString,  Prev: Text/Query/Build,  Up: Module List

Builder for Text::Query::ParseAdvanced to build regexps
*******************************************************

NAME
====

   Text::Query::BuildAdvancedString - Builder for
Text::Query::ParseAdvanced to build regexps

SYNOPSIS
========

     use Text::Query;
     my $q=new Text::Query('hello and world',
                           -parse => 'Text::Query::ParseAdvanced',
                           -solve => 'Text::Query::SolveAdvancedString',
                           -build => 'Text::Query::BuildAdvancedString');

DESCRIPTION
===========

   Build a regexp to match the advanced query parsed by
Text::Query::ParseAdvanced.  The words of the query can be regular
expressions and will provide the expected result if the `-regexp' option
is set.

SEE ALSO
========

   Text::Query(3) Text::Query::Build(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Query/BuildSimpleString,  Next: Text/Query/Optimize,  Prev: Text/Query/BuildAdvancedString,  Up: Module List

Builder for Text::Query::ParseSimple to build regexps
*****************************************************

NAME
====

   Text::Query::BuildSimpleString - Builder for Text::Query::ParseSimple
to build regexps

SYNOPSIS
========

     use Text::Query;
     my $q=new Text::Query('+hello +world',
                           -parse => 'Text::Query::ParseSimple',
                           -solve => 'Text::Query::SolveSimpleString',
                           -build => 'Text::Query::BuildSimpleString');

DESCRIPTION
===========

   Build a regexp to match the simple query parsed by
Text::Query::ParseSimple.  The words of the query can be regular
expressions and will provide the expected result if the `-regexp' option
is set.

SEE ALSO
========

   Text::Query(3) Text::Query::Build(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Query/Optimize,  Next: Text/Query/Parse,  Prev: Text/Query/BuildSimpleString,  Up: Module List

Base class for query optimizers
*******************************

NAME
====

   Text::Query::Optimize - Base class for query optimizers

SYNOPSIS
========

     package Text::Query::OptimizeSmart;

     use Text::Query::Optimize;
     
     use vars qw(@ISA);

     @ISA = qw(Text::Query::Optimize);

DESCRIPTION
===========

   This module provides a virtual base class for query optimizers.

   It defines the optimize method that is called by the `Text::Query'
object to optimize the internal query.

METHODS
=======

optimize (INTERNAL)
     Returns the INTERNAL argument after optimization. The default
     implementation returns the argument untouched.

SEE ALSO
========

   Text::Query(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Query/Parse,  Next: Text/Query/ParseAdvanced,  Prev: Text/Query/Optimize,  Up: Module List

Base class for query parsers
****************************

NAME
====

   Text::Query::Parse - Base class for query parsers

SYNOPSIS
========

     package Text::Query::ParseThisSyntax;

     use Text::Query::Parse;
     
     use vars qw(@ISA);

     @ISA = qw(Text::Query::Parse);

DESCRIPTION
===========

   This module provides a virtual base class for query parsers.

   It defines the prepare method that is called by the `Text::Query'
object to compile the query string.

MEMBERS
=======

-build
     Pointer to a Text::Query::Build object.

scope
     Scope stack. Defines the context in which the query must be solved.

token
     The current token. Destroyed by prepare.

tokens
     A reference to the list of all the tokens. Filled by parse_tokens.
     Destroyed by prepare.

parseopts
     A reference to a hash table containing all the parameters given to
     the prepare function.

-verbose
     Integer indicating the desired verbose level.

METHODS
=======

prepare (QSTRING [OPTIONS])
     Compiles the query expression in `QSTRING' to internal form and sets
     any options. First calls `build_init' to reset the builder and
     destroys the token and tokens members. Then calls parse_tokens to
     fill the tokens member. Then calls expression, which consumes the
     tokens and is expected to call the build_* functions to build the
     compiled expression. Finally calls `build_final_expression' with the
     result of expression.

     A derived parser must redefine this function to define default values
     for specific options.

expression ()
     Must be redefined by derived package. Returns the internal form of the
     query built from build_* functions using the tokens.

parse_tokens (QSTRING)
     Must be redefined by derived package. Parses the `QSTRING' scalar and
     fills the tokens member with lexical units.


build_* ()
     Shortcuts to the corresponding function of the Text::Query::Build
     object found in the `-build' member.
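
   The sequence of calls made by prepare can be sketched with a toy
parser and builder.  Both packages below (`My::ToyBuild' and
`My::ToyParse') are hypothetical and stand alone; they only mirror the
member and method names described above and are not derived from the real
base classes.

```perl
use strict;
use warnings;

# Hypothetical toy builder: literals are joined with "&" and it logs
# when the parser resets and finishes it.
package My::ToyBuild;
sub new           { my $class = shift; return bless { log => [] }, $class }
sub build_init    { my $self = shift; push @{ $self->{log} }, 'build_init'; return }
sub build_literal { my ($self, $q) = @_; return "lit($q)" }
sub build_conj    { my ($self, $q1, $q2, $first) = @_; return $q1 . '&' . $q2 }
sub build_final_expression {
    my ($self, $q) = @_;
    push @{ $self->{log} }, 'build_final_expression';
    return '[' . $q . ']';
}

# Hypothetical toy parser: every whitespace-separated word must match.
package My::ToyParse;
sub new { my ($class, $build) = @_; return bless { '-build' => $build }, $class }

sub prepare {
    my ($self, $qstring, %opts) = @_;
    $self->{parseopts} = \%opts;
    $self->{'-build'}->build_init;            # 1. reset the builder
    delete @{$self}{qw(token tokens)};        #    and drop old token state
    $self->parse_tokens($qstring);            # 2. fill the tokens member
    my $q = $self->expression;                # 3. consume the tokens
    return $self->{'-build'}->build_final_expression($q);   # 4. finish
}

sub parse_tokens {
    my ($self, $qstring) = @_;
    $self->{tokens} = [ split ' ', $qstring ];
    return;
}

sub expression {
    my $self  = shift;
    my $build = $self->{'-build'};
    my ($q, $first) = (undef, 1);
    while (defined($self->{token} = shift @{ $self->{tokens} })) {
        my $lit = $build->build_literal($self->{token});
        if (defined $q) {
            $q = $build->build_conj($q, $lit, $first);
            $first = 0;
        }
        else {
            $q = $lit;
        }
    }
    return $q;
}

package main;

my $build    = My::ToyBuild->new;
my $parser   = My::ToyParse->new($build);
my $compiled = $parser->prepare('hello big world', -case => 1);
print "$compiled\n";    # [lit(hello)&lit(big)&lit(world)]
```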

OPTIONS
=======

   These are the options of the prepare method and the constructor.

-quotes defaults to \'\"
     Defines the quote characters.

-case defaults to 0
     If true, do case-sensitive match.

-litspace defaults to 0
     If true, match spaces (except between operators) in `QSTRING'
     literally.  If false, match spaces as `\s+'.

-regexp defaults to 0
     If true, treat patterns in `QSTRING' as regular expressions rather
     than literal text.

-whole defaults to 0
     If true, match whole words only, not substrings of words.
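
   To make the effect of these options concrete, here is a minimal sketch
of how a single query word might be turned into a regexp under each
setting.  The helper `word_to_regexp' is hypothetical and is not part of
Text::Query; the real builders may generate different patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: compile one query word according to the
# -case, -litspace, -regexp and -whole options described above.
sub word_to_regexp {
    my ($word, %opt) = @_;
    my $pat = '';
    for my $part (split /(\s+)/, $word) {      # keep the whitespace runs
        if ($part =~ /^\s+$/) {
            # -litspace: match the spaces exactly; otherwise any whitespace run
            $pat .= $opt{'-litspace'} ? quotemeta($part) : '\s+';
        }
        else {
            # -regexp: use the word as a pattern; otherwise escape it
            $pat .= $opt{'-regexp'} ? $part : quotemeta($part);
        }
    }
    $pat = "\\b(?:$pat)\\b" if $opt{'-whole'};  # whole words only
    return $opt{'-case'} ? qr/$pat/ : qr/$pat/i;
}

my $loose  = word_to_regexp('ta ta');
my $strict = word_to_regexp('ta ta', -litspace => 1);
print "loose matches\n"         if 'ta    ta' =~ $loose;
print "strict does not match\n" if 'ta    ta' !~ $strict;
```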

SEE ALSO
========

   Text::Query(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Query/ParseAdvanced,  Next: Text/Query/ParseSimple,  Prev: Text/Query/Parse,  Up: Module List

Parse AltaVista advanced query syntax
*************************************

NAME
====

   Text::Query::ParseAdvanced - Parse AltaVista advanced query syntax

SYNOPSIS
========

     use Text::Query;
     my $q=new Text::Query('hello and world',
                           -parse => 'Text::Query::ParseAdvanced',
                           -solve => 'Text::Query::SolveAdvancedString',
                           -build => 'Text::Query::BuildAdvancedString');

DESCRIPTION
===========

   This module provides an object that parses a string containing a
Boolean query expression similar to an AltaVista "advanced query".

   Its base class is Text::Query::Parse.

   Query expressions consist of literal strings (or regexps) joined by the
following operators, in order of precedence from lowest to highest:

     OR, |
     AND, &
     NEAR, ~
     NOT, !

   Operator names are not case-sensitive.  Note that if you want to use a |
in a regexp, you need to escape it with a backslash to keep it from being
seen as a query operator.  Sub-expressions may be quoted in single or
double quotes to match "and," "or," or "not" literally and may be grouped
in parentheses (`(' and `)') to alter the precedence of evaluation.

   A parenthesized sub-expression may also be concatenated with other
sub-expressions to match sequences: `(Perl or Python) interpreter' would
match either "Perl interpreter" or "Python interpreter".  Concatenation
has a precedence higher than NOT but lower than AND.  Juxtaposition of
simple words has the highest precedence of all.
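
   The operator ordering can be illustrated with a small recursive-descent
parser that prints the grouping it derives.  This is a hypothetical
sketch, not the module's parser: it handles only the four named operators
and parentheses, and it omits quoting, concatenation and scopes.

```perl
use strict;
use warnings;

# Hypothetical mini-parser: OR binds loosest, then AND, NEAR, NOT.
my @tokens;
my %OP = (
    or   => qr/^(?:or|\|)\z/i,
    and  => qr/^(?:and|&)\z/i,
    near => qr/^(?:near|~)\z/i,
    not  => qr/^(?:not|!)\z/i,
);

sub tokenize { my ($q) = @_; return $q =~ /[()|&~!]|[^\s()|&~!]+/g }

sub peek_is {
    my ($re) = @_;
    return (@tokens && $tokens[0] =~ $re) ? 1 : 0;
}

# Parse a left-associative chain of one binary operator.
sub parse_binary {
    my ($name, $next) = @_;
    my $left = $next->();
    while (peek_is($OP{$name})) {
        shift @tokens;
        my $right = $next->();
        $left = "($name $left $right)";
    }
    return $left;
}

sub parse_or   { return parse_binary('or',   \&parse_and) }
sub parse_and  { return parse_binary('and',  \&parse_near) }
sub parse_near { return parse_binary('near', \&parse_not) }

sub parse_not {
    if (peek_is($OP{not})) {
        shift @tokens;
        return '(not ' . parse_not() . ')';
    }
    return parse_primary();
}

sub parse_primary {
    my $tok = shift @tokens;
    die "unexpected end of query\n" unless defined $tok;
    return $tok unless $tok eq '(';
    my $inner = parse_or();
    my $close = shift @tokens;
    die "missing closing parenthesis\n" unless defined $close && $close eq ')';
    return $inner;
}

sub parse_query {
    my ($q) = @_;
    @tokens = tokenize($q);
    my $tree = parse_or();
    die "unexpected token '$tokens[0]'\n" if @tokens;
    return $tree;
}

print parse_query('a or b and not c'), "\n";   # (or a (and b (not c)))
print parse_query('(a | b) & c'), "\n";        # (and (or a b) c)
```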

OPTIONS
=======

   These are the additional options of the prepare method and the
constructor.

-near defaults to 10
     Sets the number of words that can occur between two expressions and
     still satisfy the NEAR operator.

-operators defaults to and, or, not, near
     Sets the operator names. The argument of the option is a pointer to a
     hash table mapping the default names to desired names. For instance:

          {
          	'or' => 'ou',
          	'and' => 'et',
          	'near' => 'proche',
          	'not' => 'non',
          }

-scope_map defaults to {}
     Maps the scope names to other names. If a scope is specified as
     `scope:', search the map for an entry whose key is scope and replace
     scope with the scalar found. For instance:

          {
          	 'scope' => 'otherscope'
          }

SEE ALSO
========

   Text::Query(3) Text::Query::Parse(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Query/ParseSimple,  Next: Text/Query/Simple,  Prev: Text/Query/ParseAdvanced,  Up: Module List

Parse AltaVista simple query syntax
***********************************

NAME
====

   Text::Query::ParseSimple - Parse AltaVista simple query syntax

SYNOPSIS
========

     use Text::Query;
     my $q=new Text::Query('hello and world',
                           -parse => 'Text::Query::ParseSimple',
                           -solve => 'Text::Query::SolveSimpleString',
                           -build => 'Text::Query::BuildSimpleString');

DESCRIPTION
===========

   This module provides an object that parses a string containing a
Boolean query expression similar to an AltaVista "simple query". Elements
of the query expression may be assigned weights.

   Its base class is Text::Query::Parse.

   Query expressions are compiled into an internal form when a new object
is created or the prepare method is called; they are not recompiled on each
match.

   Query expressions consist of words (sequences of non-whitespace) or
phrases (quoted strings) separated by whitespace.  Words or phrases
prefixed with a + must be present for the expression to match; words or
phrases prefixed with a - must be absent for the expression to match.

   Words or phrases may optionally be followed by a number in parentheses
(no whitespace is allowed between the word or phrase and the parenthesized
number).  This number specifies the weight given to the word or phrase.
If a weight is not given, a weight of 1 is assumed.

EXAMPLES
========

     use Text::Query;
     my $q=new Text::Query('+hello world',
                           -solve => 'Text::Query::SolveSimpleString',
                           -build => 'Text::Query::BuildSimpleString');
     die "bad query expression" if not defined $q;
     $count=$q->match;
     ...
     $q->prepare('goodbye adios -"ta ta"', -litspace=>1);
     #requires single space between the two ta's
     if ($q->match($line, -case=>1)) {
         #doesn't match "Goodbye"
         ...
     }
     $q->prepare('\\bintegrate\\b', -regexp=>1);
     #won't match "disintegrated"
     ...
     $q->prepare('information(2) retrieval');
     #information has twice the weight of retrieval

SEE ALSO
========

   Text::Query(3) Text::Query::Parse(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Query/Simple,  Next: Text/Query/Solve,  Prev: Text/Query/ParseSimple,  Up: Module List

Match text against simple query expression and return relevance value for ranking
*********************************************************************************

NAME
====

   Text::Query::Simple - Match text against simple query expression and
return relevance value for ranking

SYNOPSIS
========

     use Text::Query::Simple;
     
     # Constructor
     $query = Text::Query::Simple->new([QSTRING] [OPTIONS]);

     # Methods
     $query->prepare(QSTRING [OPTIONS]);
     $query->match([TARGET]);
     $query->matchscalar([TARGET]);

DESCRIPTION
===========

   This module provides an object that tests a string or list of strings
against a query expression similar to an AltaVista "simple query" and
returns a "relevance value."  Elements of the query expression may be
regular expressions or literal text, and may be assigned weights.

   Query expressions are compiled into an internal form when a new object
is created or the prepare method is called; they are not recompiled on each
match.

   Query expressions consist of words (sequences of non-whitespace),
regexps or phrases (quoted strings) separated by whitespace.  Words or
phrases prefixed with a + must be present for the expression to match;
words or phrases prefixed with a - must be absent for the expression to
match.

   A successful match returns a count of the number of times any of the
words (except ones prefixed with -) appeared in the text.  This type of
result is useful for ranking documents according to relevance.

   Words or phrases may optionally be followed by a number in parentheses
(no whitespace is allowed between the word or phrase and the parenthesized
number).  This number specifies the weight given to the word or phrase; it
will be added to the count each time the word or phrase appears in the
text.  If a weight is not given, a weight of 1 is assumed.
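
   The scoring rules above can be sketched as a small stand-alone
function.  The name `relevance' and its parsing regexp are hypothetical;
the sketch does literal, case-insensitive substring matching only and is
not how Text::Query::Simple is implemented.

```perl
use strict;
use warnings;

# Hypothetical scorer: returns 0 when the query does not match,
# otherwise the weighted count of term occurrences.
sub relevance {
    my ($query, $text) = @_;
    my $score = 0;
    # Each term: optional +/- prefix, a "quoted phrase" or a bare word,
    # and an optional (weight) with no whitespace before it.
    while ($query =~ /([+-]?)(?:"([^"]*)"|([^\s()]+))(?:\((\d+)\))?/g) {
        my $sign   = $1;
        my $term   = defined $2 ? $2 : $3;
        my $weight = defined $4 ? $4 : 1;      # default weight is 1
        next unless length $term;
        my $hits = () = $text =~ /\Q$term\E/gi;   # count occurrences
        return 0 if $sign eq '-' && $hits;     # forbidden term is present
        return 0 if $sign eq '+' && !$hits;    # mandatory term is absent
        $score += $hits * $weight unless $sign eq '-';
    }
    return $score;
}

my $doc = 'Information retrieval and information theory';
print relevance('information(2) retrieval', $doc), "\n";   # 5
print relevance('+hello world', 'world without greeting'), "\n";   # 0
```

   In the first call "information" appears twice with weight 2 and
"retrieval" once with weight 1, giving 2*2 + 1 = 5.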

EXAMPLES
========

     use Text::Query::Simple;
     my $q=new Text::Query::Simple('+hello world');
     die "bad query expression" if not defined $q;
     $count=$q->match;
     ...
     $q->prepare('goodbye adios -"ta ta"',-litspace=>1);
     #requires single space between the two ta's
     if ($q->match($line,-case=>1)) {
         #doesn't match "Goodbye"
         ...
     }
     $q->prepare('\\bintegrate\\b',-regexp=>1);
     #won't match "disintegrated"
     ...
     $q->prepare('information(2) retrieval');
     #information has twice the weight of retrieval

CONSTRUCTOR
===========

new ([QSTRING] [OPTIONS])
     This is the constructor for a new Text::Query::Simple object.  If a
     `QSTRING' is given it will be compiled to internal form.

     OPTIONS are passed in a hash like fashion, using key and value pairs.
     Possible options are:

     *-case* - If true, do case-sensitive match.

     *-litspace* - If true, match spaces (except between operators) in
     `QSTRING' literally.  If false, match spaces as `\s+'.

     *-regexp* - If true, treat patterns in `QSTRING' as regular
     expressions rather than literal text.

     *-whole* - If true, match whole words only, not substrings of words.

     The constructor will return undef if a `QSTRING' was supplied and had
     illegal syntax.

METHODS
=======

prepare (QSTRING [OPTIONS])
     Compiles the query expression in `QSTRING' to internal form and sets
     any options (same as in the constructor).  prepare may be used to
     change the query expression and options for an existing query object.
     If OPTIONS are omitted, any options set by a previous call to the
     constructor or prepare remain in effect.

     This method returns a reference to the query object if the syntax of
     the expression was legal, or undef if not.

match ([TARGET])
     If `TARGET' is a scalar, match returns the number of words in the
     string specified by `TARGET' that match the query object's query
     expression.  If `TARGET' is not given, the match is made against $_.

     If `TARGET' is an array, match returns a list of references to
     anonymous arrays consisting of each element followed by its match
     count.  The list is sorted in descending order by match count.  If
     the elements of `TARGET' were anonymous arrays, the match count is
     appended to each element.  This allows arbitrary information (such as
     a filename) to be associated with each element.

     If `TARGET' is a reference to an array, match returns a reference to
     a sorted list of matching items, with counts, for all elements.

matchscalar ([TARGET])
     Behaves just like match when `TARGET' is a scalar or is not given.
     Slightly faster than match under these circumstances.
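
   The list form of match lends itself to ranking documents.  The sketch
below imitates that behaviour with hypothetical helpers (`count_word' and
`rank_targets'); it assumes that sub-arrays are matched on their first
element and that only items with a non-zero count are returned, which the
text above does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a compiled query: counts one word.
sub count_word {
    my ($word, $text) = @_;
    my $n = () = $text =~ /\Q$word\E/gi;
    return $n;
}

# Return [element..., count] array references, highest count first.
# Scalars become [text, count]; sub-arrays get the count appended
# to a copy of their contents.
sub rank_targets {
    my ($word, @targets) = @_;
    my @scored;
    for my $t (@targets) {
        my @fields = ref($t) eq 'ARRAY' ? @$t : ($t);
        my $n = count_word($word, $fields[0]);   # assumption: match on first element
        push @scored, [ @fields, $n ] if $n > 0;
    }
    return sort { $b->[-1] <=> $a->[-1] } @scored;
}

my @ranked = rank_targets('perl',
    [ 'perl once',          'c.txt' ],
    [ 'python only',        'b.txt' ],
    [ 'Perl and more perl', 'a.txt' ],
);
printf "%s: %d\n", $_->[1], $_->[2] for @ranked;
# a.txt: 2
# c.txt: 1
```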

RESTRICTIONS
============

   This module requires Perl 5.005 or higher due to the use of evaluated
expressions in regexes.

AUTHOR
======

   Eric Bohlman (ebohlman@netcom.com)

CREDITS
=======

   The parse_tokens routine was adapted from the parse_line routine in
Text::ParseWords.

COPYRIGHT
=========

   Copyright (c) 1998 Eric Bohlman. All rights reserved.  This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself.


File: pm.info,  Node: Text/Query/Solve,  Next: Text/Query/SolveAdvancedString,  Prev: Text/Query/Simple,  Up: Module List

Base class for query resolution
*******************************

NAME
====

   Text::Query::Solve - Base class for query resolution

SYNOPSIS
========

     package Text::Query::SolveSource;

     use Text::Query::Solve;
     
     use vars qw(@ISA);

     @ISA = qw(Text::Query::Solve);

DESCRIPTION
===========

   This module provides a virtual base class for query resolution.

   It defines the match and matchscalar methods that are called by the
`Text::Query' object to apply a query to a data source.

METHODS
=======

match (EXPR [TARGET])
     If `TARGET' is a scalar, match returns a true value if the data source
     specified by `TARGET' matches the EXPR query expression.  If `TARGET'
     is not given, the match is made against $_.

     If `TARGET' is an array, match returns a (possibly empty) list of all
     matching elements.  If the elements of the array are references to
     sub-arrays, the match is done against the first element of each
     sub-array.  This allows arbitrary information (e.g. filenames) to be
     associated with each data source to match.

     If `TARGET' is a reference to an array, match returns a reference to
     a (possibly empty) list of all matching elements.

matchscalar (EXPR [TARGET])
     Behaves just like match when `TARGET' is a scalar or is not given.
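
   The three calling conventions of match can be sketched as plain
functions.  This is a hypothetical illustration in which EXPR is assumed
to be a compiled regexp; a real solver receives whatever the builder
produced, and a one-element array cannot be told apart from a scalar in
this simplified form.

```perl
use strict;
use warnings;

# Hypothetical solver functions; EXPR is assumed to be a qr// object.
sub matchscalar {
    my ($expr, $target) = @_;
    $target = $_ unless defined $target;       # default to $_
    return $target =~ $expr ? 1 : 0;
}

# Filter a list; sub-arrays are matched on their first element.
sub match_list {
    my ($expr, @items) = @_;
    my @hits;
    for my $item (@items) {
        my $text = ref($item) eq 'ARRAY' ? $item->[0] : $item;
        push @hits, $item if matchscalar($expr, $text);
    }
    return @hits;
}

sub match {
    my ($expr, @targets) = @_;
    return matchscalar($expr) unless @targets;               # no TARGET given
    if (@targets == 1) {
        my $t = $targets[0];
        return matchscalar($expr, $t) unless ref($t) eq 'ARRAY';   # scalar
        return [ match_list($expr, @$t) ];     # array reference in, reference out
    }
    return match_list($expr, @targets);        # array in, list out
}

my $expr = qr/perl/i;
my @hits = match($expr, [ 'Perl rocks', 'a.txt' ], [ 'no luck', 'b.txt' ]);
print "$_->[1]\n" for @hits;    # a.txt
```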

SEE ALSO
========

   Text::Query(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)