This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: Text/Query/SolveAdvancedString,  Next: Text/Query/SolveSimpleString,  Prev: Text/Query/Solve,  Up: Module List

Apply query expression on strings
*********************************

NAME
====

   Text::Query::SolveAdvancedString - Apply query expression on strings

SYNOPSIS
========

     use Text::Query;
     my $q=new Text::Query('hello and world',
                           -parse => 'Text::Query::ParseAdvanced',
                           -solve => 'Text::Query::SolveAdvancedString',
                           -build => 'Text::Query::BuildAdvancedString');

     $q->match('this hello is a world');

DESCRIPTION
===========

   Applies an expression built by Text::Query::BuildAdvancedString to a
list of strings.

METHODS
=======

match ([TARGET])
     If `TARGET' is a scalar, match returns a true value if the string
     specified by `TARGET' matches the query object's query expression.  If
     `TARGET' is not given, the match is made against $_.

     If `TARGET' is an array, match returns a (possibly empty) list of all
     matching elements.  If the elements of the array are references to
     sub-arrays, the match is done against the first element of each
     sub-array.  This allows arbitrary information (e.g. filenames) to be
     associated with each string to match.

     If `TARGET' is a reference to an array, match returns a reference to
     a (possibly empty) list of all matching elements.

matchscalar ([TARGET])
     Behaves just like MATCH when `TARGET' is a scalar or is not given.
     Slightly faster than MATCH under these circumstances.

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Query/SolveSimpleString,  Next: Text/Refer,  Prev: Text/Query/SolveAdvancedString,  Up: Module List

Apply query expression on strings
*********************************

NAME
====

   Text::Query::SolveSimpleString - Apply query expression on strings

SYNOPSIS
========

     use Text::Query;
     my $q=new Text::Query('+hello +world',
                           -parse => 'Text::Query::ParseSimple',
                           -solve => 'Text::Query::SolveSimpleString',
                           -build => 'Text::Query::BuildSimpleString');

     $q->match('this hello is a world');

DESCRIPTION
===========

   Applies an expression built by Text::Query::BuildSimpleString to a list
of strings.

METHODS
=======

match ([TARGET])
     If `TARGET' is a scalar, match returns a true value if the string
     specified by `TARGET' matches the query object's query expression.  If
     `TARGET' is not given, the match is made against $_.

     If `TARGET' is an array, match returns a (possibly empty) list of all
     matching elements.  If the elements of the array are references to
     sub-arrays, the match is done against the first element of each
     sub-array.  This allows arbitrary information (e.g. filenames) to be
     associated with each string to match.

     If `TARGET' is a reference to an array, match returns a reference to
     a (possibly empty) list of all matching elements.

matchscalar ([TARGET])
     Behaves just like MATCH when `TARGET' is a scalar or is not given.
     Slightly faster than MATCH under these circumstances.

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info,  Node: Text/Refer,  Next: Text/Reflow,  Prev: Text/Query/SolveSimpleString,  Up: Module List

parse Unix "refer" files
************************

NAME
====

   Text::Refer - parse Unix "refer" files

   *This is Alpha code, and may be subject to changes in its public
interface.  It will stabilize by June 1997, at which point this notice
will be removed.  Until then, if you have any feedback, please let me
know!*

SYNOPSIS
========

   Pull in the module:

     use Text::Refer;

   Parse a refer stream from a filehandle:

     while ($ref = input Text::Refer \*FH)  {
     	# ...do stuff with $ref...
     }
     defined($ref) or die "error parsing input";

   Same, but using a parser object for more control:

     # Create a new parser:
     $parser = new Text::Refer::Parser LeadWhite=>'KEEP';

     # Parse:
     while ($ref = $parser->input(\*FH))  {
     	# ...do stuff with $ref...
     }
     defined($ref) or die "error parsing input";

   Manipulating reference objects, using high-level methods:

     # Get the title, author, etc.:
     $title      = $ref->title;
     @authors    = $ref->author;      # list context
     $lastAuthor = $ref->author;      # scalar context
     
     # Set the title and authors:
     $ref->title("Cyberiad");
     $ref->author(["S. Trurl", "C. Klapaucius"]);   # arrayref for >1 value!
     
     # Delete the abstract:
     $ref->abstract(undef);

   Same, using low-level methods:

     # Get the title, author, etc.:
     $title      = $ref->get('T');
     @authors    = $ref->get('A');      # list context
     $lastAuthor = $ref->get('A');      # scalar context
     
     # Set the title and authors:
     $ref->set('T', "Cyberiad");
     $ref->set('A', "S. Trurl", "C. Klapaucius");
     
     # Delete the abstract:
     $ref->set('X');                    # sets to empty array of values

   Output:

     print $ref->as_string;

DESCRIPTION
===========

   *This module supersedes the old Text::Bib.*

   This module provides routines for parsing in the contents of
"refer"-format bibliographic databases: these are simple text files which
contain one or more bibliography records.  They are usually found lurking
on Unix-like operating systems, with the extension `.bib'.

   Each record in a "refer" file describes a single paper, book, or
article.  Users of nroff/troff often employ such databases when
typesetting papers.

   Even if you don't use *roff, this simple, easily-parsed parameter-value
format is still useful for recording/exchanging bibliographic information.
With this module, you can easily post-process "refer" files: search them,
convert them into LaTeX, whatever.

Example
-------

   Here's a possible "refer" file with three entries:

     %T Cyberiad
     %A Stanislaw Lem
     %K robot fable
     %I Harcourt/Brace/Jovanovich
     
     %T Invisible Cities
     %A Italo Calvino
     %K city fable philosophy
     %X In this surreal series of fables, Marco Polo tells an
        aged Kublai Khan of the many cities he has visited in
        his lifetime.
     
     %T Angels and Visitations
     %A Neil Gaiman
     %D 1993

   The lines separating the records must be *completely blank*; that is,
they cannot contain anything but a single newline.

   See refer(1) or grefer(1) for more information on "refer" files.

Syntax
------

   *From the GNU manpage, `grefer(1)':*

   The bibliographic database is a text file consisting of records
separated by one or more blank lines.  Within each record fields start
with a % at the beginning of a line.  Each field has a one character name
that immediately follows the %.  It is best to use only upper and lower
case letters for the names of fields.  The name of the field should be
followed by exactly one space, and then by the contents of the field.
Empty fields are ignored.  The conventional meaning of each field is as
follows:

A
     The name of an author. If the name contains a title such as Jr. at
     the end, it should be separated from the last name by a comma.  There
     can be multiple occurrences of the A field.  The order is significant.
     It is a good idea always to supply an A field or a Q field.

B
     For an article that is part of a book, the title of the book

C
     The place (city) of publication.

D
     The date of publication.  The year should be specified in full.  If
     the month is specified, the name rather than the number of the month
     should be used, but only the first three letters are required.  It is
     a good idea always to supply a D field; if the date is unknown, a
     value such as "in press" or "unknown" can be used.

E
     For an article that is part of a book, the name of an editor of the
     book.  Where the work has editors and no authors, the names of the
     editors should be given as A fields and , (ed) or , (eds) should be
     appended to the last author.

G
     US Government ordering number.

I
     The publisher (issuer).

J
     For an article in a journal, the name of the journal.

K
     Keywords to be used for searching.

L
     Label.

     NOTE: Uniquely identifies the entry.  For example, "Able94".

N
     Journal issue number.

O
     Other information.  This is usually printed at the end of the
     reference.

P
     Page number.  A range of pages can be specified as m-n.

Q
     The name of the author, if the author is not a person.  This will
     only be used if there are no A fields.  There can only be one Q field.

     NOTE: Thanks to Mike Zimmerman for clarifying this for me: it means a
     "corporate" author: when the "author" is listed as an organization
     such as the UN, or RAND Corporation, or whatever.

R
     Technical report number.

S
     Series name.

T
     Title.  For an article in a book or journal, this should be the title
     of the article.

V
     Volume number of the journal or book.

X
     Annotation.

     NOTE: Basically, a brief abstract or description.

   For all fields except A and E, if there is more than one occurrence of
a particular field in a record, only the last such field will be used.

   If accent strings are used, they should follow the character to be
accented.  This means that the AM macro must be used with the -ms
macros.  Accent strings should not be quoted: use one \ rather than two.
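
   As a rough illustration of the syntax quoted above, the following
minimal sketch splits a single record into its fields using only core
Perl.  It is not the parser used by this module: the name
`parse_record_sketch' is hypothetical, continuation lines are kept
verbatim, and field names are not validated.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Refer): split one record, given as
# a string without blank lines, into { NAME => [ values in input order ] }.
sub parse_record_sketch {
    my ($text) = @_;
    my (%rec, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^%(\S)(?: (.*))?$/) {
            # "%N contents": a one-character name, one space, the contents.
            my ($name, $value) = ($1, $2);
            if (defined $value && length $value) {
                push @{ $rec{$name} }, $value;
                $current = \$rec{$name}[-1];
            }
            else {
                undef $current;            # empty fields are ignored
            }
        }
        elsif (defined $current) {
            $$current .= "\n$line";        # continuation line, kept verbatim
        }
    }
    return \%rec;
}

my $rec = parse_record_sketch(<<'END');
%T Invisible Cities
%A Italo Calvino
%K city fable philosophy
%X In this surreal series of fables, Marco Polo tells an
   aged Kublai Khan of the many cities he has visited in
   his lifetime.
END

print "Title:  $rec->{T}[0]\n";
print "Author: $rec->{A}[-1]\n";
```

   Storing every field as an array of values keeps repeated fields such
as A in input order; taking the last element gives the "only the last
such field will be used" behaviour for the other fields.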

Parsing records from "refer" files
----------------------------------

   You will nearly always use the `input()' constructor to create new
instances, and nearly always as shown in the `"SYNOPSIS"' in this node.

   Internally, the records are parsed by a parser object; if you invoke
the class method `Text::Refer::input()', a special default parser is used,
and this will be good enough for most tasks.  However, for more complex
tasks, feel free to use `"class Text::Refer::Parser"' in this node to
build (and use) your own fine-tuned parser, and `input()' from that
instead.

CLASS Text::Refer
=================

   Each instance of this class represents a single record in a "refer"
file.

Construction and input
----------------------

new
     *Class method, constructor.* Build an empty "refer" record.

input FILEHANDLE
     *Class method.* Input a new "refer" record from a filehandle.  The
     default parser is used:

          while ($ref = input Text::Refer \*STDIN) {
          	# ...do stuff with $ref...
          }

     Do not use this as an instance method; it will not re-init the object
     you give it.

Getting/setting attributes
--------------------------

attr ATTR, [VALUE]
     *Instance method.* Get/set the attribute by its one-character name,
     ATTR.  The VALUE is optional, and may be given in a number of ways:

        * *If the VALUE is given as undefined*, the attribute will be
          deleted:

               $ref->attr('X', undef);        # delete the abstract

        * *If a defined, non-reference scalar VALUE is given,* it is used
          to replace the existing values for the attribute with that
          single value:

               $ref->attr('T', "The Police State Rears Its Ugly Head");
               $ref->attr('D', 1997);

        * *If an arrayref VALUE is given,* it is used to replace the
          existing values for the attribute with *all elements of that
          array:*

               $ref->attr('A', ["S. Trurl", "C. Klapaucius"]);

          We use an arrayref since an empty array would be impossible to
          distinguish from the next two cases, where the goal is to "get"
          instead of "set"...

     This method returns the current (or new) value of the given attribute,
     just as get() does:

        * *If invoked in a scalar context,* the method will return the
          last value (this is to mimic the behavior of *groff*).  Hence,
          given the above, the code:

               $author = $ref->attr('A');

          will set `$author' to `"C. Klapaucius"'.

        * *If invoked in an array context,* the method will return the list
          of all values, in order.  Hence, given the above, the code:

               @authors = $ref->attr('A');

          will set `@authors' to `("S. Trurl", "C. Klapaucius")'.

     Note: this method is used as the basis of all "named" access methods;
     hence, the following are equivalent in every way:

          $ref->attr(T => $title)    <=>   $ref->title($title);
          $ref->attr(A => \@authors) <=>   $ref->author(\@authors);
          $ref->attr(D => undef)     <=>   $ref->date(undef);
          $auth  = $ref->attr('A')   <=>   $auth  = $ref->author;
          @auths = $ref->attr('A')   <=>   @auths = $ref->author;

author, book, city, ... [VALUE]
     *Instance methods.* For every one of the standard fields in a "refer"
     record, this module has designated a high-level attribute name:

          A  author     G  govt_no      N  number        S  series
          B  book       I  publisher    O  other_info    T  title
          C  city       J  journal      P  page          V  volume
          D  date       K  keywords     Q  corp_author   X  abstract
          E  editor     L  label        R  report_no

     Then, for each field F with high-level attribute name *FIELDNAME*,
     the method `FIELDNAME()' works as follows:

          $ref->attr('F', @args)     <=>   $ref->FIELDNAME(@args)

     Which means:

          $ref->attr(T => $title)    <=>   $ref->title($title);
          $ref->attr(A => \@authors) <=>   $ref->author(\@authors);
          $ref->attr(D => undef)     <=>   $ref->date(undef);
          $auth  = $ref->attr('A')   <=>   $auth  = $ref->author;
          @auths = $ref->attr('A')   <=>   @auths = $ref->author;

     See the documentation of attr() for the argument list.

get ATTR
     *Instance method.* Get an attribute, by its one-character name.  In
     an array context, it returns all values (empty if none):

          @authors = $ref->get('A');      # returns list of all authors

     In a scalar context, it returns the last value (undefined if none):

          $author = $ref->get('A');       # returns the last author

set ATTR, VALUES...
     *Instance method.* Set an attribute, by its one-character name.

          $ref->set('A', "S. Trurl", "C. Klapaucius");

     An empty array of VALUES deletes the attribute:

          $ref->set('A');       # deletes all authors

     No useful return value is currently defined.
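
   The context-dependent return value described for get() and attr() can
be sketched with Perl's `wantarray'.  This is only an illustration under
the assumption that a record is a plain hash of array references (as
described under "Under the hood"); `get_sketch' is a hypothetical name,
not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get(): the record is a plain hash that maps a
# field name to an array reference of values.
sub get_sketch {
    my ($rec, $attr) = @_;
    my @values = @{ $rec->{$attr} || [] };
    return wantarray ? @values : $values[-1];   # all values, or the last one
}

my %rec = (A => ["S. Trurl", "C. Klapaucius"], T => ["Cyberiad"]);

my @authors = get_sketch(\%rec, 'A');    # list context: both authors
my $author  = get_sketch(\%rec, 'A');    # scalar context: the last author
my $missing = get_sketch(\%rec, 'X');    # scalar context, no values: undef

print join(", ", @authors), "\n";
print "$author\n";
print defined($missing) ? "defined\n" : "undefined\n";
```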

Output
------

as_string [OPTSHASH]
     *Instance method.* Return the "refer" record as a string, usually for
     printing:

          print $ref->as_string;

     The options are:

    Quick
          If true, do it quickly, but unsafely.  *This does no fixup on
          the values at all:* they are output as-is.  That means if you
          used parser-options which destroyed any of the formatting
          whitespace (e.g., `Newline=TOSPACE' with `LeadWhite=KILLALL'),
          there is a risk that the output object will be an invalid
          "refer" record.

     The fields are output with %L first (if it exists), and then the
     remaining fields in alphabetical order.  The following "safety
     measures" are normally taken:

        * Lines longer than 76 characters are wrapped (if possible, at a
          non-word character a reasonable length in, but there is a chance
          that they will simply be "split" if no such character is
          available).

        * Any occurrences of '%' immediately after a newline are preceded
          by a single space.

     These safety measures are slightly time-consuming, and are silly if
     you are merely outputting a "refer" object which you have read in
     verbatim (i.e., using the default parser-options) from a valid
     "refer" file.  In these cases, you may want to use the Quick option.

     The implementation is as follows:

          sub as_string {
              my ($self, %opts) = @_;
              my ($key, $val);

              # Figure out the keys to use, and put them in order:
              my @keys = sort grep {(length == 1) && ($_ ne 'L')} (keys %$self);
              defined($self->{'L'}) && unshift(@keys, 'L');

              # Output:
              my @lines;
              foreach $key (@keys) {
                  foreach $val (@{$self->{$key}}) {
                      unless ($opts{Quick}) {
                          _wrap($val);             # make sure no line exceeds 80 chars
                          $val =~ s/\n%/\n %/g;    # newlines must NOT be followed by %
                          $val =~ s/\n+\Z//;       # strip trailing newlines
                      }
                      push @lines, join('', '%', $key, ' ', $val, "\n");
                  }
              }
              join '', @lines;
          }

CLASS Text::Refer::Parser
=========================

   Instances of this class do the actual parsing.

Parser options
--------------

   The options you may give to new() are as follows:

ForgiveEOF
     Normally, the last record in a file must end with a blank line, or
     else this module will suspect it of being incomplete and return an
     error.  However, if you give this option as true, it will allow the
     last record to be terminated by an EOF.

GoodFields
     By default, the parser accepts any (one-character) field name that is
     a printable ASCII character (no whitespace).  Formally, this is:

          [\041-\176]

     However, when compiling parser options, you can supply your own
     regular expression for validating (one-character) field names.
     (*note:* you must supply the square brackets; they are there to remind
     you that you should give a well-formed single-character expression).
     One standard expression is provided for you:

          $Text::Refer::GroffFields  = '[A-EGI-LN-TVX]';  # legal groff fields

     Illegal fields which are encountered during parsing result in a
     syntax error.

     NOTE: You really shouldn't use this unless you absolutely need to.
     The added regular expression test slows down the parser.

LeadWhite
     In many "refer" files, continuation lines (the 2nd, 3rd, etc. lines
     of a field) are written with leading whitespace, like this:

          %T Incontrovertible Proof that Pi Equals Three
             (for Large Values of Three)
          %A S. Trurl
          %X The author shows how anyone can use various common household
             objects to obtain successively less-accurate estimations of
             pi, until finally arriving at a desired integer approximation,
             which nearly always is three.

     This leading whitespace serves two purposes: (1) it makes it
     impossible to mistake a continuation line for a field, since % can no
     longer be the first character, and (2) it makes the entries easier to
     read.  The LeadWhite option controls what is done with this
     whitespace:

          KEEP	- default; the whitespace is untouched
          KILLONE	- exactly one character of leading whitespace is removed
          KILLALL	- all leading whitespace is removed

     See "Notes on the parser options" below for hints and warnings.

Newline
     The Newline option controls what is done with the newlines that
     separate adjacent lines in the same field:

          KEEP	- default; the newlines are kept in the field value
          TOSPACE	- convert each newline to a single space
          KILL	- the newlines are removed

     See "Notes on the parser options" below for hints and warnings.

   Default values will be used for any options which are left unspecified.

Notes on the parser options
---------------------------

   The default values for Newline and LeadWhite will preserve the input
text exactly.

   The `Newline=TOSPACE' option, when used in conjunction with the
`LeadWhite=KILLALL' option, effectively "word-wraps" the text of each
field into a single line.

   *Be careful!* If you use the `Newline=KILL' option with either the
`LeadWhite=KILLONE' or the `LeadWhite=KILLALL' option, you could end up
eliminating all whitespace that separates the word at the end of one line
from the word at the beginning of the next line.

Public interface
----------------

new PARAMHASH
     *Class method, constructor.* Create and return a new parser.  See
     above for the `"parser options"' in this node which you may give in
     the PARAMHASH.

create [CLASS]
     *Instance method.* What class of objects to create.  The default is
     `Text::Refer'.

input FH
     *Instance method.* Create a new object from the next record in a
     "refer" stream.  The actual class of the object is given by the
     `create()' method.

     Returns the object on success, '0' on *expected* end-of-file, and
     undefined on error.

     Having two false values makes parsing very simple: just `input()'
     records until the result is false, then check to see if that last
     result was 0 (end of file) or undef (failure).
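
   The two-false-values convention can be tried out without any input
file.  The sketch below uses a hypothetical stand-in for `input()' (a
closure built by `make_reader_sketch'); only the shape of the loop and
the final `defined' check are meant to carry over to real code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser's input(): returns one record per call,
# then 0 at a clean end of input, or undef when it meets the marker 'ERROR'.
sub make_reader_sketch {
    my @queue = @_;
    return sub {
        return 0 unless @queue;             # expected end of input
        my $next = shift @queue;
        return $next eq 'ERROR' ? undef : { title => $next };
    };
}

# Read until a false value comes back, then tell the two false values apart.
sub drain_sketch {
    my ($reader) = @_;
    my ($count, $rec) = (0);
    $count++ while $rec = $reader->();
    return defined($rec) ? "ok: $count records" : "error after $count records";
}

print drain_sketch(make_reader_sketch('Cyberiad', 'Invisible Cities')), "\n";
print drain_sketch(make_reader_sketch('Cyberiad', 'ERROR', 'Lost')), "\n";
```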

NOTES
=====

Under the hood
--------------

   Each "refer" object has instance variables corresponding to the actual
field names ('T', 'A', etc.).  Each of these is a reference to an array of
the actual values.

   Notice that, for maximum flexibility and consistency (but at the cost of
some space and access-efficiency), the semantics of "refer" records do not
come into play at this time: since everything resides in an array, you can
have as many %K, %D, etc. records as you like, and give them entirely
different semantics.

   For example, the Library Of Boring Stuff That Everyone Reads (LOBSTER)
uses the unused %Y as a "year" field.  The parser accommodates this case by
politely not choking on LOBSTER .bibs (although why you would want to eat
a lobster bib instead of the lobster is beyond me...).

Performance
-----------

   Tolerable.  On my 90MHz/32 MB RAM/I586 box running Linux 1.2.13 and
Perl5.002, it parses a typical 500 KB "refer" file (of 1600 records) as
follows:

      8 seconds of user time for input and no output
     10 seconds of user time for input and "quick" output
     16 seconds of user time for input and "safe" output

   So, figure the individual speeds are:

     input:            200 records ( 60 KB) per second.
     "quick" output:   800 records (240 KB) per second.
     "safe" output:    200 records ( 60 KB) per second.

   By contrast, a C program which does the same work is about 8 times as
fast.  But of course, the C code is 8 times as large, and 8 times as
ugly...  `:-)'

Note to serious bib-file users
------------------------------

   I actually do not use "refer" files for *roffing... I used them as a
quick-and-dirty database for WebLib, and that's where this code comes
from.  If you're a serious user of "refer" files, and this module doesn't
do what you need it to, please contact me: I'll add the functionality in.

BUGS
====

   Some combinations of parser-options are silly.

CHANGE LOG
==========

   $Id: Refer.pm,v 1.106 1997/04/22 18:41:41 eryq Exp $

Version 1.101
     Initial release.  Adapted from Text::Bib.

AUTHOR
======

   Copyright (C) 1997 by Eryq, `eryq@enteract.com',
`http://www.enteract.com/~eryq'.

NO WARRANTY
===========

   This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.

   This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

   For a copy of the GNU General Public License, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.


File: pm.info,  Node: Text/Reflow,  Next: Text/Roman,  Prev: Text/Refer,  Up: Module List

Perl module for reflowing text files using Knuth's paragraphing algorithm.
**************************************************************************

NAME
====

   Text::Reflow - Perl module for reflowing text files using Knuth's
paragraphing algorithm.

SYNOPSIS
========

     use Text::Reflow qw(reflow_file reflow_string reflow_array);

     reflow_file($infile, $outfile, key => value, ...);

     $output = reflow_string($input, key => value, ...);

     $output = reflow_array(\@input, key => value, ...);

DESCRIPTION
===========

   These routines will reflow the paragraphs in the given file,
filehandle, string or array using Knuth's paragraphing algorithm (as used
in TeX) to pick "good" places to break the lines.

   Each routine takes ASCII text data with paragraphs separated by blank
lines and reflows the paragraphs.  If two or more lines in a row are
"indented" then they are assumed to be a quoted poem and are passed
through unchanged (but see the `skipindented' option below).

   The reflow algorithm tries to keep the lines the same length but also
tries to break at punctuation, and avoid breaking within a proper name or
after certain *connectives* ("a", "the", etc.). The result is a file with
a more "ragged" right margin than is produced by fmt or Text::Wrap but it
is easier to read since fewer phrases are broken across line breaks.

   For `reflow_file', if $infile is the empty string, then the input is
taken from STDIN and if $outfile is the empty string, the output is
written to STDOUT.  Otherwise, $infile and $outfile may be a string, a
FileHandle reference or a FileHandle glob.

   A typical invocation is:

     reflow_file("myfile", "");

   which reflows the whole of `myfile' and prints the result to STDOUT.

KEYWORD OPTIONS
---------------

   The behaviour of Reflow can be adjusted by setting various keyword
options.  These can be set globally by referencing the appropriate
variable in the Text::Reflow package, for example:

     $Text::Reflow::maximum = 80;
     $Text::Reflow::optimum = 75;

   will set the maximum line length to 80 characters and the optimum line
length to 75 characters for all subsequent reflow operations.  Or they can
be passed to a reflow_ function as a keyword parameter, for example:

     $out = reflow_string($in, maximum => 80, optimum => 75);

   in which case the new options only apply to this call.

   The following options are currently implemented, with their default
values:

optimum => [65]
     The optimum line length in characters.  This can be either a number
     or a reference to an array of numbers:  in the latter case, each
     optimal line length is tried in turn for each paragraph, and the one
     which leads to the best overall paragraph is chosen.  This results in
     less ragged paragraphs, but some paragraphs will be wider or narrower
     overall than others.

maximum => 75
     The maximum allowed line length.

indent => ""
     Each line of output has this string prepended. `indent => string' is
     equivalent to `indent1 => string, indent2 => string'.

indent1 => ""
     A string which is used to indent the first line in any paragraph.

indent2 => ""
     A string which is used to indent the second and subsequent lines in
     any paragraph.

quote => ""
     Characters to strip from the beginning of a line before processing.
     To reflow a quoted email message and then restore the quotes you
     might want to use

          quote => "> ", indent => "> "

skipto => ""
     Skip to the first line starting with the given pattern before starting
     to reflow. This is useful for skipping Project Gutenberg headers or
     contents tables.

skipindented => 2
     If skipindented = 0 then all indented lines are flowed in with the
     surrounding paragraph.  If skipindented = 1 then any indented line
     will not be reflowed.  If skipindented = 2 then any two or more
     adjacent indented lines will not be reflowed.  The purpose of the
     default value is to allow poetry to pass through unchanged, while
     keeping a paragraph indentation from preventing the first line of
     the paragraph from being reflowed.

noreflow => ""
     A pattern to indicate that certain lines should not be reflowed.  For
     example, a table of contents might have a line of dots.  The option:

          noreflow => '(\.\s*){4}\.'

     will not reflow any lines containing five or more consecutive dots.

frenchspacing => 'n'
     Normally two spaces are put at the end of a sentence or a clause.
     The frenchspacing option (taken from the TeX macro of the same name)
     disables this feature.

oneparagraph => 'n'
     Set this to 'y' if you want the whole input to be flowed into a single
     paragraph, ignoring blank lines in the input.

semantic => 30
     This parameter indicates the extent to which semantic factors matter
     (breaking on punctuation, avoiding a break within a clause etc.).
     Set this to zero to minimise the raggedness of the right margin, at
     the expense of readability.

namebreak => 10
     Penalty for splitting up a name.

sentence => 20
     Penalty for sentence widows and orphans (i.e. splitting a line
     immediately after the first word in a sentence, or before the last
     word in a sentence).

independent => 10
     Penalty for independent clause widows and orphans.

dependent => 6
     Penalty for dependent clause widows and orphans.

shortlast => 5
     Penalty for a short last line in a paragraph (one or two words).

connpenalty => 1
     Multiplier for the "negative penalty" for breaking at a connective.
     In other words, increasing this value makes connectives an even more
     attractive place to break a line.
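
   The pattern given for the `noreflow' option above can be checked on
its own, without loading this module.  The sketch below only applies the
regular expression to a few sample lines; `is_protected_sketch' is a
hypothetical helper, and how the module itself applies the pattern is
not shown here.  Note that the dots may be separated by whitespace.

```perl
use strict;
use warnings;

# The pattern from the noreflow example: five dots, optionally separated
# by whitespace.
my $noreflow = qr/(\.\s*){4}\./;

# Hypothetical helper: true if the pattern says the line must be left alone.
sub is_protected_sketch {
    my ($line) = @_;
    return $line =~ $noreflow ? 1 : 0;
}

for my $line ('Chapter 1 . . . . . 12', 'Introduction.....3', 'Wait... what?') {
    printf "%-24s %s\n", $line,
        is_protected_sketch($line) ? 'left alone' : 'reflowed';
}
```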

EXPORT
------

   None by default.

AUTHOR
======

   Original `reflow' perl script written by Michael Larsen,
larsen@edu.upenn.math.

   Modified, enhanced and converted to a perl module with XSUB by Martin
Ward, Martin.Ward@durham.ac.uk

SEE ALSO
========

   perl(1).

   See "TeX the Program" by Donald Knuth for a description of the
algorithm used.


File: pm.info,  Node: Text/Roman,  Next: Text/ScriptTemplate,  Prev: Text/Reflow,  Up: Module List

Converts between Roman numerals and integers, and recognizes Roman numerals.
****************************************************************************

NAME
====

   Text::Roman - Converts between Roman numerals and integers, and
recognizes Roman numerals.

SYNOPSIS
========

     use Text::Roman;

     print roman(123);

DESCRIPTION
===========

   Text::Roman::roman() is a very simple numeral converter.  It converts a
single integer (in Arabic numerals) at a time to its Roman equivalent.
Conventional Roman numerals range from 1 to 3999.  The range of MROMANS
(milhar Romans, i.e. "thousands" Romans) is 1 to 3999*1000+3999=4002999.

   Beyond this range other symbols come into use, but they do not concern
this package.  Case is not an issue: mixed-case strings such as 'Xv' or
'XiiI' count as legal Roman numerals.

roman($int)
     Returns a string containing the Roman numeral corresponding to the
     given integer, or '' if the integer is out of range.

roman2int($str)
     Returns '' if $str is not a Roman numeral; otherwise returns the
     corresponding integer.

isroman($str)
     Checks whether the given string is a conventional Roman numeral;
     returns 1 if it is and 0 if it is not.

   Much the same holds for *mroman2int($str)* and *ismroman($str)*, except
that these functions handle milhar Romans.

SPECIFICATION
=============

   Roman numerals are derived from the following BNF-like grammar:

   a =	I{1,3}

   b =	V\a?|IV|\a

   e =	X{1,3}\b?|X{0,3}IX|\b

   ee =	IX|\b

   f =	L\e?|XL\ee?|\e

   g =	C{1,3}\f?|C{0,3}XC\ee?|\f

   gg =	XC\ee?|\f

   h =	D\g?|CD\gg?|\g

   j =	M{1,3}\h?|M{0,3}CM\gg?|\h
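
   The grammar above can be transcribed almost literally into Perl
regular expressions, reading each `\x' as "the pattern of rule x".  The
following is a sketch of that transcription, not the code used inside
this module; the variable names and `is_roman_sketch' are hypothetical,
and only upper-case input is accepted here.

```perl
use strict;
use warnings;

# One compiled pattern per grammar rule (a, b, e, ee, f, g, gg, h, j).
my $ra  = qr/I{1,3}/;
my $rb  = qr/V$ra?|IV|$ra/;
my $re  = qr/X{1,3}$rb?|X{0,3}IX|$rb/;
my $ree = qr/IX|$rb/;
my $rf  = qr/L$re?|XL$ree?|$re/;
my $rg  = qr/C{1,3}$rf?|C{0,3}XC$ree?|$rf/;
my $rgg = qr/XC$ree?|$rf/;
my $rh  = qr/D$rg?|CD$rgg?|$rg/;
my $rj  = qr/M{1,3}$rh?|M{0,3}CM$rgg?|$rh/;

# True (1) if the whole string is derivable from rule j, else 0.
sub is_roman_sketch {
    my ($str) = @_;
    return $str =~ /\A(?:$rj)\z/ ? 1 : 0;
}

for my $str (qw(XXXV MCMXCIV IIII VX)) {
    printf "%-8s %s\n", $str, is_roman_sketch($str) ? 'roman' : 'not roman';
}
```

   The auxiliary rules ee and gg are what keep forms such as XCX or CMCM
out: after a subtractive pair, only the smaller orders may follow.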

REFERENCES
==========

   Specification taken from the style manual of the newspaper "O Estado de
São Paulo".  URL: http://www.estado.com.br/redac/norn-nro.html

EXAMPLE
=======

     use Text::Roman;
     
     $roman	= "XXXV";
     $mroman	= 'L_X_XXIII';
     print roman(123), "\n";
     print roman2int($roman), "\n"	if isroman($roman);
     print mroman2int($mroman), "\n"	if ismroman($mroman);

BUGS
====

   None known.

AUTHOR
======

   Peter de Padua Krauss, krauss@ifqsc.sc.usp.br.

COPYRIGHT
=========

   1.2-krauss/set/97; 1.0-krauss/3/ago/97


File: pm.info,  Node: Text/ScriptTemplate,  Next: Text/Search,  Prev: Text/Roman,  Up: Module List

Lightweight processor for full-featured template
************************************************

NAME
====

     Text::ScriptTemplate - Lightweight processor for full-featured template

SYNOPSIS
========

     use Text::ScriptTemplate;

     $tmpl = new Text::ScriptTemplate;    # create processor object
     $tmpl->setq(TEXT => "hello, world"); # export data to template
     $tmpl->load($file);                  # loads template from named file
     $tmpl->pack(q{TEXT: <%= $TEXT; %>}); # loads template from in-memory data

     print $tmpl->fill;                   # prints "TEXT: hello, world"

     # load intermixed Perl script and text as a template
     $tmpl->pack(q{<% for (1..3) { %>i = <%= "$_\n"; %><% } %>});

     print $tmpl->fill;                   # prints "i = 1\ni = 2\ni = 3\n"

DESCRIPTION
===========

   This is a variant of Text::SimpleTemplate, a module for template-based
text generation.

   Template-based text generation is a way to separate program code and
data, so that a non-programmer can control the final result (such as HTML)
as desired without tweaking the program code itself. This makes jobs like
website maintenance much easier, because the program code can be left
unchanged even when a page redesign is needed.

   The idea of this module is simple. Whenever a block of text surrounded
by '<%' and '%>' (or any pair of delimiters you specify) is found, it is
taken as Perl code and handled specially by the template processing
engine. With this module, Perl script and text can be intermixed closely.

   The major goal of this library is to support powerful PHP-style
templates while using fewer resources. This is useful when PHP,
HTML::Embperl, or Apache::ASP is overkill, but their template style is
still desired.

INSTALLATION / REQUIREMENTS
===========================

   This module requires Carp.pm and FileHandle.pm.  Since these are
standard modules, all you need is perl itself.

   For installation, standard procedure of

     perl Makefile.PL
     make
     make test
     make install

   should work just fine.

TEMPLATE SYNTAX AND USAGE
=========================

   Any block of text surrounded by '<%' and '%>' will be handled specially
by the template processor.

   A block surrounded by '<%=' and '%>' is taken as a simple Perl
expression and is replaced by its evaluated result.

   A block surrounded by '<%' and '%>' is taken as part of a control
structure. After all parts are merged into one big block, that block is
evaluated and the result is handled as output.

   Suppose you have the following template named "sample.tmpl":

     === Module Information ===
     <% if ($HAS->{Text::ScriptTemplate}) { %>
     Name: <%= $INFO->{Name}; %>
     Description: <%= $INFO->{Description}; %>
     Author: <%= $INFO->{Author}; %> <<%= $INFO->{Email}; %>>
     <% } else { %>
     Text::ScriptTemplate is not installed.
     <% } %>

   With the following code...

     use Safe;
     use Text::ScriptTemplate;

     $tmpl = new Text::ScriptTemplate;
     $tmpl->setq(INFO => {
         Name        => "Text::ScriptTemplate",
         Description => "Lightweight processor for full-featured template",
         Author      => "Taisuke Yamada",
         Email       => "tai\@imasy.or.jp",
     });
     $tmpl->setq(HAS => { Text::ScriptTemplate => 1 }); # installed
     $tmpl->load("sample.tmpl");

     print $tmpl->fill(PACKAGE => new Safe);

   ...you will get the following result:

     === Module Information ===

     Name: Text::ScriptTemplate
     Description: Lightweight processor for full-featured template
     Author: Taisuke Yamada <tai@imasy.or.jp>

   If you change

     $tmpl->setq(HAS => { Text::ScriptTemplate => 1 }); # installed

   to

     $tmpl->setq(HAS => { Text::ScriptTemplate => 0 }); # not installed

   then you will get

     === Module Information ===

     Text::ScriptTemplate is not installed.

   You can embed any control structure as long as each intermixed text
block is surrounded by a set of braces. This means

     hello world<% if ($firsttime); %>

   must be written as

     <% do { %>hello world<% } if ($firsttime); %>

   For more detail on this, please read the TEMPLATE INTERNAL section.

   Also, as you might have noticed, any scalar data can be exported to
the template namespace, even a hash reference or a code reference.

   Finally, although I used the "Safe" module in the example above, it is
not a requirement. However, if you want to limit the power the template
editor has over program logic, its use is strongly recommended (see the
Safe manpage for more).

RESERVED NAMES
==============

   Since a template can be evaluated in a separate namespace, this module
in theory places few restrictions on the variable or function names you
define.

   However, due to the internal structure of this module, please consider
all names starting with "_" (underscore) as reserved for internal use.

METHODS
=======

   The following methods are currently available.

$tmpl = new Text::ScriptTemplate;
     Constructor. Returns newly created object.

     If this method is called through an existing object, a cloned object
     is returned. The clone inherits all properties except for the
     internal buffer which holds template data. Cloning is useful for
     chained template processing.

$tmpl->setq($name => $data, $name => $data, ...);
     Exports scalar data ($data) to template namespace, with $name as a
     scalar variable name to be used in template.

     You can repeat the pair to export multiple sets in one operation.

$tmpl->load($file, %opts);
     Loads template file ($file) for later evaluation.  The file can be
     specified either as a pathname or as a fileglob.

     This method accepts the DELIM option, used to specify the delimiters
     for parsing the template. It is specified by passing a reference to an
     array containing the delimiter pair, as shown below:

          $tmpl->load($file, DELIM => [qw(<? ?>)]);

     Returns object reference to itself.

$tmpl->pack($data, %opts);
     Loads in-memory data ($data) for later evaluation.  Except for this
     difference, works just like $tmpl->load.

$text = $tmpl->fill(%opts);
     Returns evaluated result of template, which was preloaded by either
     $tmpl->pack or $tmpl->load method.

     This method accepts two options: PACKAGE and OHANDLE.

     PACKAGE option specifies the namespace where template evaluation
     takes place. You can either pass the name of the package, or the
     package object itself. So either of

          $tmpl->fill(PACKAGE => new Safe);
          $tmpl->fill(PACKAGE => new Some::Module);
          $tmpl->fill(PACKAGE => 'Some::Package');

     works. If a Safe object (or an object of a subclass) is passed, its
     "reval" method will be used instead of the built-in eval.

     OHANDLE option is for output selection. By default, this method
     returns the result of evaluation, but with OHANDLE option set, you
     can instead make it print to given handle.  Either style of

          $tmpl->fill(OHANDLE => \*STDOUT);
          $tmpl->fill(OHANDLE => new FileHandle(...));

     is supported.

TEMPLATE INTERNAL
=================

   Internally, the template processor converts the template into one big
Perl script, and then simply executes it. The conversion rule is fairly
simple. If you have the following template,

     <% if ($bool) { %>
     hello, <%= $name; %>
     <% } %>

   it will be converted into

     if ($bool) {
         $_handle->(q{
     hello, });
         $_handle->(do{ $name; });
         $_handle->(q{
     });
     }

   Note that line breaks are preserved. After all conversion is done, the
script is executed. Depending on the existence of the OHANDLE option,
$_handle (a code reference to a predefined function) will either print or
buffer its argument.
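
   The conversion rule above can be imitated in a few lines of plain Perl.
The sketch below is not the module's code: template_to_perl and
fill_template are hypothetical names, literal text is emitted as a
single-quoted string rather than q{}, and variables are passed as a plain
hash reference instead of through setq.

```perl
use strict;
use warnings;

# Turn a template into Perl source following the rule described above.
# $open and $close default to '<%' and '%>'.
sub template_to_perl {
    my ($text, $open, $close) = @_;
    ($open, $close) = ('<%', '%>') unless defined $open;
    my ($o, $c) = (quotemeta($open), quotemeta($close));
    my $src = '';
    # Split into alternating literal text and delimited blocks.
    for my $part (split /($o.*?$c)/s, $text) {
        next if $part eq '';
        if ($part =~ /\A$o=(.*)$c\z/s) {        # expression block: emit value
            $src .= '$_handle->(do{' . $1 . "});\n";
        }
        elsif ($part =~ /\A$o(.*)$c\z/s) {      # control block: raw Perl
            $src .= $1 . "\n";
        }
        else {                                  # literal text
            (my $lit = $part) =~ s/([\\'])/\\$1/g;
            $src .= "\$_handle->('" . $lit . "');\n";
        }
    }
    return $src;
}

# Run the generated script. Output is buffered and returned, or printed
# to $out when a filehandle is supplied.
sub fill_template {
    my ($text, $vars, $out) = @_;
    my $buffer  = '';
    my $_handle = $out
        ? sub { print {$out} @_ }
        : sub { $buffer .= join '', @_ };
    # Declare one lexical per exported name, visible to the eval below.
    my $decl = join '',
        map { sprintf "my \$%s = \$vars->{'%s'};\n", $_, $_ } sort keys %$vars;
    my $code = $decl . template_to_perl($text);
    eval $code;
    die $@ if $@;
    return $buffer;
}

print fill_template(q{TEXT: <%= $TEXT; %>}, { TEXT => 'hello, world' }), "\n";
```

   Because the generated script is run with a plain string eval, this sketch
offers none of the protection that passing a Safe object to fill provides.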

NOTES / BUGS
============

   Nested template delimiters will cause this module to fail.

SEE ALSO
========

   *Note Safe: Safe, and *Note Text/SimpleTemplate: Text/SimpleTemplate,

CONTACT ADDRESS
===============

   Please send any bug reports/comments to <tai@imasy.or.jp>.

AUTHORS / CONTRIBUTORS
======================

     - Taisuke Yamada <tai@imasy.or.jp>

COPYRIGHT
=========

   Copyright 2001. All rights reserved.

   This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.


File: pm.info,  Node: Text/Search,  Next: Text/Sentence,  Prev: Text/ScriptTemplate,  Up: Module List

Perl module to allow quick searching of directories for given text.
*******************************************************************

NAME
====

   *Text::Search* - Perl module to allow quick searching of directories
for given text.

   Version 0.91

SYNOPSIS
========

     use Text::Search;

     # Simple search
     my $term = 'foo AND bar';

     my $search  = Text::Search->new();
     my @results = $search->Find($term);

     foreach (@results) {
         print "Found $term in $_->{'FILENAME'} $_->{'OCCURENCES'} times.\n";
     }

     # RegEx search
     my @terms = ('(foo.*)', '(bar)');

     my $search  = Text::Search->new('RegExSearch', '1');
     my @results = $search->Find(@terms);

     foreach (@results) {
         print "Found $_->{'FILENAME'} with $_->{'OCCURENCES'}.\n";
     }

DESCRIPTION
===========

   *Text::Search* takes a directory and a search term, and recursively
searches for all occurrences of the term. Features include extension
filtering, a binary filter (binary files are not searched), and both simple
and regex search expressions.

   Information is returned as an array of hashes sorted in descending order
by number of occurrences.
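
   The shape of the result list can be illustrated without touching the
file system. The sketch below ranks in-memory texts by match count; the
name rank_by_occurrences is hypothetical and only the FILENAME and
OCCURENCES keys are filled in. It is not how Text::Search is implemented.

```perl
use strict;
use warnings;

# %$docs maps a file name to its text; $pattern is a regex given as a string.
sub rank_by_occurrences {
    my ($docs, $pattern) = @_;
    my @results;
    for my $name (sort keys %$docs) {
        my $count = 0;
        $count++ while $docs->{$name} =~ /$pattern/gi;   # count every match
        next unless $count;                              # skip non-matching files
        push @results, { FILENAME => $name, OCCURENCES => $count };
    }
    # Most matches first; ties are broken by file name.
    return sort {
        $b->{OCCURENCES} <=> $a->{OCCURENCES}
            or $a->{FILENAME} cmp $b->{FILENAME}
    } @results;
}

my @results = rank_by_occurrences(
    { 'a.txt' => 'foo bar foo', 'b.txt' => 'Foo', 'c.txt' => 'nothing' },
    'foo',
);
print "$_->{FILENAME}: $_->{OCCURENCES}\n" for @results;
```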

CONSTRUCTORS
============

`Text::Search->new()'
     `$search = Text::Search->new('RegExSearch', '1', 'DocumentRoot',
     '/usr/home/mike', 'FileFilter', '(^.*\.htm*$)' );'

     Prepares a search to be performed. The search is executed by calling
     $search->Find().

     RegExSearch = Set to 1 if this is to be a regular expression search.
     0 if this is a simple search. (Default)

     DocumentRoot = Where to begin the search from, search is recursive.
     Default is (/usr).

     WebRoot = For use with a website. Only needs to be set if your
     DocumentRoot is set to something other than your WebRoot.

     FileFilter = Regular expression to filter out unwanted files. Default
     is all files.

     Recursive = Set to 0 to turn off recursive searching. Default is 1.

     Highlight = Set to 1 to turn on Highlighting of matched words. Useful
     for bolding matched text in websites. Default is 0.

     HighlightBegin = Customize the code to appear before a match. Default
     is '<b>'.

     HighlightEnd = Customize the code to appear after a match. Default is
     '</b>'.

`$search->Find()'
     `@results = $search->Find('blowfish OR foo AND bar');'

     `@results = $search->Find('(blowfish)|(foo)','(bar)');'

     Executes a search for the given terms.

     Data is returned in an array of hashes which can be accessed like so:

     `foreach (@results) { print "$_->{'FILENAME'} : $_->{'FILEKSIZE'} :
     $_->{'LAST_MODIFIED'} : $_->{'OCCURENCES'}\n" };'

     OR

     `print "Most likely target is $results[0]{'FILENAME'}\n";'

     The following keys are available in the returned hash:

     FULLNAME = Full path and filename of file. (ie. /home/doug/readme.txt)

     FILENAME = Name of the file (ie. readme.txt)

     FILEPATH = Path to the file (ie. /home/doug)

     FILESIZE = Size of file in bytes.

     FILEKSIZE = Size of file in kilobytes.

     LAST_MODIFIED_EPOCH = Time since file was last modified in seconds.

     LAST_MODIFIED = Date and Time of last modified in long format.

     OCCURENCES = Number of times the search pattern was matched.

     URL = For Website use, the path and filename of file from the given
     DocumentRoot.

     SNIPPET = Snippet of text containing text of the first matched term.

     TITLE = Pulled from the <TITLE> tag of an HTML page.

     META_TITLE = Pulled from the <meta name="title"> tag.

     META_DESCRIPTION = Pulled from the <meta name="description"> tag.

     META_KEYWORDS = Pulled from the <meta name="keywords"> tag.

EXAMPLES
========

Simple script usage.
--------------------

     use Text::Search;

     my $search = Text::Search->new('DocumentRoot', '/usr/home/bill/ebooks');

     print "Searching:\n\n";

     my @results = $search->Find('romeo AND juliette');

     foreach (@results) {
         print "Found it $_->{'OCCURENCES'} times in $_->{'FILENAME'}\n";
     }

Web based application.
----------------------

   This is an excellent way to add a search engine to any html site.

     require ("cgi-lib.pl");
     use Text::Search;

     &ReadParse (*input);

     my $search = Text::Search->new('DocumentRoot', $ENV{'DOCUMENT_ROOT'},
                                    'FileFilter', '(^.*\.html$)');

     my @results = $search->Find($input{'User_Search'});

     print "Content-type: text/html \n\n";

     print "Sorry, I couldn't find your request!" if (scalar @results == 0);

     foreach (@results) {
         print "<a href=\"$_->{'URL'}\">$_->{'URL'}</a><br>Relevancy: "
             . "$_->{'OCCURENCES'}<br>Last Updated: $_->{'LAST_MODIFIED'}<br><hr>\n";
     }

BUGS
====

   Be careful when using simple search. It will attempt to quote any
illegal characters, etc. But for security's sake, do your own checks
before passing user input into the simple search.
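
   One possible form such a check could take is a strict whitelist applied
before the term reaches Find. The helper below is purely illustrative: the
name clean_term, the 200-character limit, and the allowed character set
are assumptions, not requirements of Text::Search.

```perl
use strict;
use warnings;

# Returns the trimmed term, or undef if it looks unsafe or is empty.
sub clean_term {
    my ($input) = @_;
    return unless defined $input;
    $input =~ s/\A\s+|\s+\z//g;               # trim surrounding whitespace
    return if $input eq '' || length($input) > 200;
    return if $input =~ /[^\w\s]/;            # allow word characters and spaces only
    return $input;
}

my $term = clean_term('  foo AND bar ');
print defined $term ? "ok: $term\n" : "rejected\n";
```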

   If you have trouble with the HTML grabbing (ie. Title tags, Meta tags),
take a look at the syntax of the HTML document. Text::Search tries to be
forgiving, but it expects something like this:

   <title>This is my title</title>

   <meta name="description" content="This is my wonderful website">

DISCLAIMER
==========

   This package is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.

COPYRIGHT
=========

   Copyright (c) 2001 Mike Miller.  All rights reserved.

LICENSE
=======

   This program is free software: you can redistribute it and/or modify it
under the same terms as Perl itself.

AUTHOR
======

   Mike Miller <mrmike@2bit.net>


File: pm.info,  Node: Text/Sentence,  Next: Text/Shoebox,  Prev: Text/Search,  Up: Module List

module for splitting text into sentences
****************************************

NAME
====

   Text::Sentence - module for splitting text into sentences

SYNOPSIS
========

     use Text::Sentence qw( split_sentences );
     use locale;
     use POSIX qw( locale_h );

     setlocale( LC_CTYPE, 'iso_8859_1' );
     @sentences = split_sentences( $text );

DESCRIPTION
===========

   The `Text::Sentence' module contains the function split_sentences, which
splits text into its constituent sentences, based on a fairly approximate
regex. If you set the locale before calling it, it will deal correctly with
locale-dependent capitalization to identify sentence boundaries. Certain
well-known exceptions, such as abbreviations, may cause incorrect
segmentations.

FUNCTIONS
=========

split_sentences( $text )
------------------------

   The split_sentences function takes a scalar containing ASCII text as an
argument and returns an array of sentences that the text has been split
into.

     @sentences = split_sentences( $text );
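
   The module's actual regex is not shown here, so the following is only a
guess at the general approach: break after sentence-ending punctuation
when whitespace and an upper-case letter follow. The name
naive_split_sentences is hypothetical.

```perl
use strict;
use warnings;

sub naive_split_sentences {
    my ($text) = @_;
    # Split after '.', '!' or '?' when whitespace and an upper-case
    # letter follow; the punctuation stays with its sentence.
    my @sentences = split /(?<=[.!?])\s+(?=[[:upper:]])/, $text;
    return grep { length } @sentences;
}

print "[$_]\n" for naive_split_sentences('It works. Does it? Yes! ok then.');

# Abbreviations show the weakness mentioned in the description:
print "[$_]\n" for naive_split_sentences('Dr. Smith arrived.');
```

   The second call splits after "Dr.", which is the kind of incorrect
segmentation that abbreviations can cause.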

SEE ALSO
========

     locale
     POSIX

AUTHOR
======

   Ave Wrigley <wrigley@cre.canon.co.uk>

COPYRIGHT
=========

   Copyright (c) 1997 Canon Research Centre Europe (CRE). All rights
reserved.  This script and any associated documentation or files cannot be
distributed outside of CRE without express prior permission from CRE.


File: pm.info,  Node: Text/Shoebox,  Next: Text/SimpleTemplate,  Prev: Text/Sentence,  Up: Module List

read and write SIL Shoebox Standard Format (.sf) files
******************************************************

NAME
====

   Text::Shoebox - read and write SIL Shoebox Standard Format (.sf) files

SYNOPSIS
========

     use Text::Shoebox;
     my $lex = [];
     foreach my $file (@ARGV) {
       read_sf(
         from_file => $file, into => $lex,
       ) or warn "read from $file failed\n";
     }
     print scalar(@$lex), " entries read.\n";
     
     die "hw field-names differ\n"
      unless are_hw_keys_uniform($lex);
     warn "hw field-values aren't unique\n"
      unless are_hw_values_unique($lex);
     
     write_sf(from => $lex, to_file => "merged.sf")
      or die "Couldn't write to merged.sf: $!";

DESCRIPTION
===========

   The Summer Institute of Linguistics (`http://www.sil.org/') makes a
piece of free software called "the Linguist's Shoebox", or just "Shoebox"
for short.  It's a simple database program generally used for making
lexicon databases (although it can also be used for databases of field
notes, etc.).

   Shoebox can export its databases to SF (Standard Format) files, a
simple text format.  Reading and writing those SF files is what this Perl
module, Text::Shoebox, is for.  (I have heard that Standard Format
predates Shoebox quite a bit, and is used by other programs.  If you use
SF files with something other than Shoebox, I'd be interested in hearing
about it, particularly about whether such files and Text::Shoebox are
happily compatible.)

FUNCTIONS
=========

$lex_lol = read_sf(...options...)
     Reads entries in Standard Format from the source specified.  If no
     entries were read, returns false.  Otherwise, returns a reference to
     the array that the entries were added to (which will be a new array,
     unless the "into" option is set).

     The options are:

    from_file => STRING
          This specifies that the source of the SF data is a file, whose
          filespec is given.

    from_fh => FILEHANDLE
          This specifies that the source of the SF data is a given
          filehandle.  (Examples of filehandle values: a global
          filehandle passed either like `*MYFH{IO}' or `*MYFH'; or object
          values from an IO class like IO::Socket or IO::Handle.)

          The filehandle isn't closed when all its data is read.

    rs => STRING
          This specifies that the given string should be used as the record
          separator (newline string) for the data source being read from.

          If the SF source is specified by a "from_file" option, and you
          don't specify an "rs" option, then Text::Shoebox will try
          guessing the line format of the file by reading the first 2K of
          the file and looking for a CRLF ("\cm\cj"), an LF ("\cj"), or a
          CR ("\cm").  If you need to stop it from trying to guess, just
          stipulate an "rs" value of $/.

          If the SF source is specified by a "from_fh" option, and you
          don't specify an "rs" option, then Text::Shoebox will just use
          the value in the Perl variable $/ (the global RS value).

    into => ARRAYREF
          If this option is stipulated, then entries read will be pushed
          to the end of the array specified.  Otherwise the entries will
          be put into a new array.

     Example use:

          use Text::Shoebox;
          my $lexicon = read_sf(from_file => 'foo.sf')
           || die "No entries?";
          print scalar(@$lexicon), " entries read.\n";
          print "First entry has ",
           @{ $lexicon->[0] } / 2 , " fields.\n";
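
     The line-format guessing described under the "rs" option could look
     roughly like this. It is a sketch, not the module's code: guess_rs
     and guess_rs_from_fh are hypothetical names, and returning undef
     when no line ending is found is an assumption.

```perl
use strict;
use warnings;

# Look for CRLF first, then LF, then CR, as described above.
sub guess_rs {
    my ($chunk) = @_;
    return "\cm\cj" if $chunk =~ /\cm\cj/;   # CRLF
    return "\cj"    if $chunk =~ /\cj/;      # LF
    return "\cm"    if $chunk =~ /\cm/;      # CR
    return;                                  # nothing recognizable
}

# Read at most the first 2K from a filehandle and guess from that.
sub guess_rs_from_fh {
    my ($fh) = @_;
    binmode $fh;                 # keep line endings untranslated
    my $chunk = '';
    read($fh, $chunk, 2048);
    return guess_rs($chunk);
}

my $data = "\\hw cat\cm\cj\\ps n\cm\cj";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my $rs = guess_rs_from_fh($fh);
close $fh;
print $rs eq "\cm\cj" ? "CRLF\n" : "other\n";
```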

write_sf(...options...)
     This writes the given lexicon, in Standard Format, to the destination
     specified.  If all entries were written, returns true; otherwise (in
     case of an IO error), returns false, in which case you should check
     $!.

     The options are:

    from => ARRAYREF
          This option must be present, to specify the lexicon that you
          want to write out.

    to_file => STRING
          This specifies that the SF data is to be written to the file
          specified.  (Note that the file is opened in overwrite mode, not
          append mode.)

    to_handle => FILEHANDLE
          This specifies that the destination for the SF data is the given
          filehandle.

          The filehandle isn't closed when all the data is written to it.

    rs => STRING
          This specifies that the given string should be used as the record
          separator (newline string) for the SF data written.

          If not specified, defaults to "\n".

are_hw_keys_uniform($lol)
     This function returns true iff all the entries in the lexicon have the
     same key for their headword fields (i.e., the first field per record).
     This will always be true if you read the lexicon from one file; but if
     you read it from several, it's possible that the different files have
     different keys marking headword fields.

are_hw_values_unique($lex_lol)
     This function returns true iff all the headword values in all non-null
     entries in the lexicon $lex_lol are unique - i.e., if no two (or more)
     entries have the same values for their headword fields.  I don't know
     if uniqueness is a requirement for SF lexicons that you'd want to
     import into Shoebox, but some tasks you put lexicons to might require
     it.
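
     The two checks can be sketched as follows. The layout assumed here is
     inferred from the example above, not confirmed by the module: a
     lexicon is an array of entries, and each entry is a flat array of
     key, value, key, value, ... with the headword field first. The
     function names are hypothetical.

```perl
use strict;
use warnings;

# True if every non-empty entry uses the same key for its first field.
sub hw_keys_uniform {
    my ($lex) = @_;
    my %keys;
    for my $entry (@$lex) {
        next unless @$entry;              # skip empty entries
        $keys{ $entry->[0] } = 1;
    }
    return keys(%keys) <= 1 ? 1 : 0;
}

# True if no two non-empty entries share the same headword value.
sub hw_values_unique {
    my ($lex) = @_;
    my %seen;
    for my $entry (@$lex) {
        next unless @$entry;
        return 0 if $seen{ $entry->[1] }++;
    }
    return 1;
}

my $lex = [
    [ hw => 'cat', ps => 'n' ],
    [ hw => 'dog', ps => 'n' ],
];
print hw_keys_uniform($lex), hw_values_unique($lex), "\n";
```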

A NOTE ABOUT VALIDITY
=====================

   I make very few assumptions about what characters can be in a field key
in SF files.  Just now, I happen to assume they can't start with an
underscore (lest they be considered comments), and can't contain any
whitespace characters.

   I make essentially no assumptions about what can be in a field value,
except that there can be no newline followed immediately by a backslash.
(Any newline-backslash sequence is turned into newline-space-backslash.)

   You should be aware that Shoebox, or whatever other programs use SF
files, may have a *much* more restricted view of what can be in a field
key or value.
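
   The two assumptions stated above translate into very little code. The
sketch below uses hypothetical helper names (valid_sf_key and
escape_sf_value) and is not taken from Text::Shoebox itself.

```perl
use strict;
use warnings;

# A key may not start with an underscore and may not contain whitespace.
sub valid_sf_key {
    my ($key) = @_;
    return 0 unless defined $key;
    return $key =~ /\A[^_\s]\S*\z/ ? 1 : 0;
}

# Turn every newline-backslash into newline-space-backslash.
sub escape_sf_value {
    my ($value) = @_;
    $value =~ s/\n\\/\n \\/g;
    return $value;
}

print valid_sf_key('hw'), valid_sf_key('_note'), "\n";
print escape_sf_value("first line\n\\second line"), "\n";
```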

COPYRIGHT
=========

   Copyright 2000, Sean M. Burke `sburke@cpan.org', all rights reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

AUTHOR
======

   Sean M. Burke, `sburke@cpan.org'

   Please contact me if you find that this module is not behaving
correctly.  I've tested it only on Shoebox files I generate on my own.

   I hasten to point out, incidentally, that I am not in any way
affiliated with the Summer Institute of Linguistics.


