This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: Stone/Cursor,  Next: Stone/GB_Sequence,  Prev: Stone,  Up: Module List

Traverse tags and values of a Stone
***********************************

NAME
====

   Stone::Cursor - Traverse tags and values of a Stone

SYNOPSIS
========

     use Boulder::Store;
     $store = Boulder::Store->new('./soccer_teams');

     my $stone = $store->get(28);
     $cursor = $stone->cursor;
     while (my ($key,$value) = $cursor->each) {
       print "$value: Go Bluejays!\n" if $key eq 'State' and $value eq 'Katonah';
     }

DESCRIPTION
===========

   Stone::Cursor is a utility class that allows you to create one or
more iterators across a *Note Stone: Stone, object.  This is used for
traversing large Stone objects in order to identify or modify portions of
the record.

CLASS METHODS
-------------

Stone::Cursor->new($stone)
     Return a new Stone::Cursor over the specified *Note Stone: Stone,
     object.  This will return an error if the object is not a *Note
     Stone: Stone, or a descendant. This method is usually not called
     directly, but rather indirectly via the *Note Stone: Stone, cursor()
     method:

          my $cursor = $stone->cursor;

OBJECT METHODS
--------------

$cursor->each()
     Iterate over the attached *Stone*.  Each iteration will return a
     two-valued list consisting of a tag path and a value.  The tag path is
     of a form that can be used with *Stone::index()* (in fact, a cursor
     is used internally to implement the Stone::dump() method).  When the
     end of the *Stone* is reached, `each()' will return an empty list,
     after which it will start over again from the beginning.  If you
     attempt to insert or delete from the stone while iterating over it,
     all attached cursors will reset to the beginning.

     For example:

          $cursor = $s->cursor;
          while (($key,$value) = $cursor->each) {
                     print "$value: BOW WOW!\n" if $key=~/pet/;
          }

$cursor->reset()
     This resets the cursor back to the beginning of the associated
     *Stone*.
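
   The iteration contract described above (a path/value pair per call,
an empty list at the end, then a restart) can be sketched without the
Stone module.  The `SketchCursor' package, its `_flatten' helper and the
dotted/bracketed path syntax are assumptions made for illustration; this
is not the Stone::Cursor source, and real tag paths may look different.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration; NOT the Stone::Cursor source.
package SketchCursor;

sub new {
    my ($class, $data) = @_;
    my @pairs;
    _flatten($data, '', \@pairs);
    return bless { pairs => \@pairs, pos => 0 }, $class;
}

# Collect [tag path, value] pairs from nested hashes and arrays.
sub _flatten {
    my ($node, $path, $out) = @_;
    if (ref $node eq 'HASH') {
        for my $tag (sort keys %$node) {
            _flatten($node->{$tag}, length($path) ? "$path.$tag" : $tag, $out);
        }
    }
    elsif (ref $node eq 'ARRAY') {
        _flatten($node->[$_], $path . "[$_]", $out) for 0 .. $#$node;
    }
    else {
        push @$out, [$path, $node];
    }
}

# Return the next (path, value) pair, or an empty list at the end,
# after which iteration starts over from the beginning.
sub each {
    my $self = shift;
    if ($self->{pos} >= @{ $self->{pairs} }) {
        $self->{pos} = 0;
        return;
    }
    return @{ $self->{pairs}[ $self->{pos}++ ] };
}

# Go back to the first pair.
sub reset { $_[0]{pos} = 0; return }

package main;

my $cursor = SketchCursor->new({ name => 'Fido', pet => ['dog', 'cat'] });
my @seen;
while (my ($key, $value) = $cursor->each) {
    push @seen, "$key=$value" if $key =~ /pet/;
}
print "$_\n" for @seen;
```

   The `while' loop ends because a list assignment of an empty list is
false in scalar context, which is the same idiom the examples above rely
on.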

AUTHOR
======

   Lincoln D. Stein <lstein@cshl.org>.

COPYRIGHT
=========

   Copyright 1997-1999, Cold Spring Harbor Laboratory, Cold Spring Harbor
NY.  This module can be used and distributed on the same terms as Perl
itself.

SEE ALSO
========

   *Note Boulder: Boulder,, *Note Stone: Stone,


File: pm.info,  Node: Stone/GB_Sequence,  Next: Storable,  Prev: Stone/Cursor,  Up: Module List

Specialized Access to GenBank Records
*************************************

NAME
====

   Stone::GB_Sequence - Specialized Access to GenBank Records

SYNOPSIS
========

     use Boulder::Genbank;  # No need to use Stone::GB_Sequence directly
     $gb = Boulder::Genbank->newFh(qw(M57939 M28274 L36028));

     while ($entry = <$gb>) {
       print "Entry's length is ",$entry->length,"\n";
       @cds   = $entry->match_features(-type=>'CDS');
       @exons = $entry->match_features(-type=>'Exon',-start=>100,-end=>300);
     }

DESCRIPTION
===========

   Stone::GB_Sequence provides several specialized access methods to the
various fields in a GenBank flat file record.  You can return the sequence
as a Bio::Seq object, or query the sequence for features that match
positional or descriptive criteria that you provide.

CONSTRUCTORS
============

   This class is not intended to be created directly, but via a *Note
Boulder/Genbank: Boulder/Genbank, stream.

METHODS
=======

   In addition to the standard *Note Stone: Stone, methods and accessors,
the following methods are provided.  In the synopses, the variable $entry
refers to a previously-created Stone::GB_Sequence object.

$length = $entry->length
------------------------

   Get the length of the sequence.

$start = $entry->start
----------------------

   Get the start position of the sequence, currently always "1".

$end = $entry->end
------------------

   Get the end position of the sequence, currently always the same as the
length.

@feature_list = $entry->features(-pos=>[50,450],-type=>['CDS','Exon'])
----------------------------------------------------------------------

   features() will search the entry feature list for those features that
meet certain criteria.  The criteria are specified using the *-pos* and/or
*-type* argument names, as shown below.

-pos
     Provide a position or range of positions which the feature must
     overlap.  A single position is specified in this way:

          -pos => 1500;         # feature must overlap position 1500

     or a range of positions in this way:

          -pos => [1000,1500];  # 1000 to 1500 inclusive

     If no criteria are provided, then features() returns all the features,
     and is equivalent to calling the Features() accessor.

-type, -types
     Filter the list of features by type or a set of types.  Matches are
     case-insensitive, so "exon", "Exon" and "EXON" are all equivalent.
     You may call with a single type as in:

          -type => 'Exon'

     or with a list of types, as in

          -types => ['Exon','CDS']

     The names "-type" and "-types" can be used interchangeably.
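
   The two criteria can be pictured as an overlap test combined with a
case-insensitive type test.  The following is only a sketch over plain
hashes: the `filter_features' helper and the `type', `start' and `end'
keys are hypothetical and do not describe how Stone::GB_Sequence stores
or searches its feature list.

```perl
use strict;
use warnings;

# Hypothetical feature records; real entries come from a Boulder::Genbank
# stream and are Stone objects, not plain hashes.
my @features = (
    { type => 'CDS',  start => 100,  end => 900  },
    { type => 'exon', start => 1200, end => 1600 },
    { type => 'Exon', start => 2000, end => 2400 },
);

# Keep the features that overlap $pos (a single position or a [lo, hi]
# pair) and whose type is listed in @$types, ignoring case.  Passing
# undef for either criterion skips that test.
sub filter_features {
    my ($feats, $pos, $types) = @_;
    my ($lo, $hi) = ref $pos ? @$pos : ($pos, $pos);
    my %want = map { (lc($_) => 1) } @{ $types || [] };
    return grep {
        (!defined $lo || ($_->{start} <= $hi && $_->{end} >= $lo))
            && (!%want || $want{ lc $_->{type} })
    } @$feats;
}

my @hits = filter_features(\@features, [1000, 1500], ['Exon', 'CDS']);
print scalar(@hits), " feature(s) overlap 1000..1500\n";
```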

$seqObj = $entry->bioSeq;
-------------------------

   Returns a *Note Bio/Seq: Bio/Seq, object from the Bioperl project.
Dies with an error message unless the Bio::Seq module is installed.

AUTHOR
======

   Lincoln D. Stein <lstein@cshl.org>.

COPYRIGHT
=========

   Copyright 1997-1999, Cold Spring Harbor Laboratory, Cold Spring Harbor
NY.  This module can be used and distributed on the same terms as Perl
itself.

SEE ALSO
========

   *Note Boulder: Boulder,, *Note Boulder/Genbank: Boulder/Genbank,,
*Note Stone: Stone,


File: pm.info,  Node: Storable,  Next: String/Approx,  Prev: Stone/GB_Sequence,  Up: Module List

persistency for perl data structures
************************************

NAME
====

   Storable - persistency for perl data structures

SYNOPSIS
========

     use Storable;
     store \%table, 'file';
     $hashref = retrieve('file');

     use Storable qw(nstore store_fd nstore_fd fd_retrieve freeze thaw dclone);

     # Network order
     nstore \%table, 'file';
     $hashref = retrieve('file');	# There is NO nretrieve()

     # Storing to and retrieving from an already opened file
     store_fd \@array, \*STDOUT;
     nstore_fd \%table, \*STDOUT;
     $aryref = fd_retrieve(\*SOCKET);
     $hashref = fd_retrieve(\*SOCKET);

     # Serializing to memory
     $serialized = freeze \%table;
     %table_clone = %{ thaw($serialized) };

     # Deep (recursive) cloning
     $cloneref = dclone($ref);

     # Advisory locking
     use Storable qw(lock_store lock_nstore lock_retrieve);
     lock_store \%table, 'file';
     lock_nstore \%table, 'file';
     $hashref = lock_retrieve('file');

DESCRIPTION
===========

   The Storable package brings persistency to your perl data structures
containing SCALAR, ARRAY, HASH or REF objects, i.e. anything that can be
conveniently stored to disk and retrieved at a later time.

   It can be used in the regular procedural way by calling store with a
reference to the object to be stored, along with the file name where the
image should be written.  The routine returns undef for I/O problems or
other internal error, a true value otherwise. Serious errors are
propagated as a die exception.

   To retrieve data stored to disk, use retrieve with a file name, and the
objects stored into that file are recreated into memory for you, a
reference to the root object being returned. In case an I/O error occurs
while reading, undef is returned instead. Other serious errors are
propagated via die.

   Since storage is performed recursively, you might want to stuff
references to objects that share a lot of common data into a single array
or hash table, and then store that object. That way, when you retrieve
back the whole thing, the objects will continue to share what they
originally shared.
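
   A minimal sketch of that advice, assuming a scratch file created with
File::Temp (the file name and the sample data are made up for the
example):

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $file = tempdir(CLEANUP => 1) . '/people';

# Two records that point at the SAME address hash.
my %address = (city => 'Katonah');
my @people  = (
    { name => 'Ann', home => \%address },
    { name => 'Bob', home => \%address },
);

# Store both records through one root object so sharing is recorded.
store(\@people, $file) or die "can't store to $file\n";
my $copy = retrieve($file);

# A change made through Ann's record is visible through Bob's record,
# because the retrieved records still share one hash.
$copy->[0]{home}{city} = 'Bedford';
print "Bob now lives in $copy->[1]{home}{city}\n";
```

   Had the two records been stored in separate calls, each retrieved
record would carry its own private copy of the address.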

   At the cost of a slight header overhead, you may store to an already
opened file descriptor using the `store_fd' routine, and retrieve from a
file via `fd_retrieve'. Those names aren't imported by default, so you
will have to do that explicitly if you need those routines.  The file
descriptor you supply must be already opened, for read if you're going to
retrieve and for write if you wish to store.

     store_fd(\%table, *STDOUT) || die "can't store to stdout\n";
     $hashref = fd_retrieve(*STDIN);

   You can also store data in network order to allow easy sharing across
multiple platforms, or when storing on a socket known to be remotely
connected. The routines to call have an initial n prefix for *network*, as
in `nstore' and `nstore_fd'. At retrieval time, your data will be
correctly restored so you don't have to know whether you're restoring from
native or network ordered data.  Double values are stored stringified to
ensure portability as well, at the slight risk of losing some precision
in the last decimals.

   When using `fd_retrieve', objects are retrieved in sequence, one object
(i.e. one recursive tree) per associated `store_fd'.
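
   For instance, two `store_fd' calls on one handle can be read back with
two `fd_retrieve' calls in the same order.  This sketch assumes a scratch
file from File::Temp rather than a socket or pipe:

```perl
use strict;
use warnings;
use Storable qw(store_fd fd_retrieve);
use File::Temp qw(tempdir);

my $file = tempdir(CLEANUP => 1) . '/stream';

# Write two independent objects, one after the other.
open(my $out, '>', $file) or die "can't write $file: $!";
binmode $out;
store_fd([1, 2, 3], $out)            or die "can't store the array\n";
store_fd({ colour => 'blue' }, $out) or die "can't store the hash\n";
close $out or die "close failed: $!";

# Read them back in the order they were written.
open(my $in, '<', $file) or die "can't read $file: $!";
binmode $in;
my $aryref  = fd_retrieve($in);
my $hashref = fd_retrieve($in);
close $in;

print "first: @$aryref, second: $hashref->{colour}\n";
```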

   If you're more from the object-oriented camp, you can inherit from
Storable and directly store your objects by invoking store as a method.
The fact that the root of the to-be-stored tree is a blessed reference
(i.e. an object) is special-cased so that the retrieve does not provide a
reference to that object but rather the blessed object reference itself.
(Otherwise, you'd get a reference to that blessed object).

MEMORY STORE
============

   The Storable engine can also store data into a Perl scalar instead, to
later retrieve them. This is mainly used to freeze a complex structure in
some safe compact memory place (where it can possibly be sent to another
process via some IPC, since freezing the structure also serializes it in
effect). Later on, and maybe somewhere else, you can thaw the Perl scalar
out and recreate the original complex structure in memory.

   Surprisingly, the routines to be called are named `freeze' and `thaw'.
If you wish to send out the frozen scalar to another machine, use
`nfreeze' instead to get a portable image.

   Note that freezing an object structure and immediately thawing it
actually achieves a deep cloning of that structure:

     dclone(.) = thaw(freeze(.))

   Storable provides you with a `dclone' interface which does not create
that intermediary scalar but instead freezes the structure in some
internal memory space and then immediately thaws it out.
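
   A short sketch of both routes to a deep copy (the sample structure is
made up for the example):

```perl
use strict;
use warnings;
use Storable qw(freeze thaw dclone);

my $orig = { list => [1, 2, 3], nested => { key => 'value' } };

my $via_scalar = thaw(freeze($orig));   # goes through an intermediate scalar
my $clone      = dclone($orig);         # same effect, no intermediate scalar

# The copies are independent of the original all the way down.
push @{ $clone->{list} }, 4;
$via_scalar->{nested}{key} = 'changed';

print scalar(@{ $orig->{list} }), " vs ", scalar(@{ $clone->{list} }), "\n";
```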

ADVISORY LOCKING
================

   The `lock_store' and `lock_nstore' routines are equivalent to store and
`nstore', only they get an exclusive lock on the file before writing.
Likewise, `lock_retrieve' performs as retrieve, but also gets a shared
lock on the file before reading.

   Like with any advisory locking scheme, the protection only works if you
systematically use `lock_store' and `lock_retrieve'.  If one side of your
application uses store whilst the other uses `lock_retrieve', you will get
no protection at all.

   The internal advisory locking is implemented using Perl's flock()
routine.  If your system does not support any form of flock(), or if you
share your files across NFS, you might wish to use other forms of locking
by using modules like LockFile::Simple which lock a file using a filesystem
entry, instead of locking the file descriptor.
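
   A minimal sketch of a writer and a reader that both go through the
locking routines, assuming a local scratch file (not NFS) and a platform
where flock() works:

```perl
use strict;
use warnings;
use Storable qw(lock_store lock_retrieve);
use File::Temp qw(tempdir);

my $file  = tempdir(CLEANUP => 1) . '/table';
my %table = (wins => 12, losses => 3);

# Writer side: an exclusive lock is held while the image is written.
lock_store(\%table, $file) or die "can't store to $file\n";

# Reader side: a shared lock is held while the image is read.
my $hashref = lock_retrieve($file);
print "wins: $hashref->{wins}\n";
```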

SPEED
=====

   The heart of Storable is written in C for decent speed. Extra low-level
optimizations have been made when manipulating perl internals, to sacrifice
encapsulation for the benefit of a greater speed.

CANONICAL REPRESENTATION
========================

   Normally Storable stores elements of hashes in the order they are
stored internally by Perl, i.e. pseudo-randomly.  If you set
`$Storable::canonical' to some TRUE value, Storable will store hashes with
the elements sorted by their key.  This allows you to compare data
structures by comparing their frozen representations (or even the
compressed frozen representations), which can be useful for creating
lookup tables for complicated queries.

   Canonical order does not imply network order, those are two orthogonal
settings.
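
   As a sketch, two hashes with the same content but filled in a
different order can be compared through their frozen images once
`$Storable::canonical' is set.  The values here are plain integers that
are never used as strings, which sidesteps the stringification caveat
listed under BUGS.

```perl
use strict;
use warnings;
use Storable qw(freeze);

my @keys = ('a' .. 'j');
my (%x, %y);
$x{$_} = 1 for @keys;            # filled in ascending key order
$y{$_} = 1 for reverse @keys;    # same content, filled in reverse

# With canonical order on, hash elements are emitted sorted by key.
$Storable::canonical = 1;
my $same = freeze(\%x) eq freeze(\%y);
print $same ? "identical images\n" : "images differ\n";
```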

ERROR REPORTING
===============

   Storable uses the "exception" paradigm, in that it does not try to
work around failures: if something bad happens, an exception is generated
from the caller's perspective (see *Note Carp: Carp, and `croak()').  Use
eval {} to trap those exceptions.

   When Storable croaks, it tries to report the error via the `logcroak()'
routine from the `Log::Agent' package, if it is available.

   Normal errors are reported by having store() or retrieve() return undef.
Such errors are usually I/O errors (or truncated stream errors at
retrieval).
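
   A defensive caller can cover both reporting styles by wrapping the
call in eval {} and also checking for undef.  The sketch below tries to
retrieve from a file that does not exist; it deliberately does not assume
which of the two styles is used for that particular failure.

```perl
use strict;
use warnings;
use Storable qw(retrieve);
use File::Temp qw(tempdir);

# A path inside a fresh scratch directory, so the file cannot exist.
my $missing = tempdir(CLEANUP => 1) . '/no-such-file';

my $data = eval { retrieve($missing) };
if (!defined $data) {
    # $@ is set if Storable croaked; otherwise retrieve() returned undef.
    my $why = $@ ? "exception: $@" : "retrieve() returned undef\n";
    print "could not load data, $why";
}
```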

WIZARDS ONLY
============

Hooks
-----

   Any class may define hooks that will be called during the serialization
and deserialization process on objects that are instances of that class.
Those hooks can redefine the way serialization is performed (and therefore,
how the symmetrical deserialization should be conducted).

   Since we said earlier:

     dclone(.) = thaw(freeze(.))

   everything we say about hooks should also hold for deep cloning.
However, hooks get to know whether the operation is a mere serialization,
or a cloning.

   Therefore, when serializing hooks are involved,

     dclone(.) <> thaw(freeze(.))

   Well, you could keep them in sync, but there's no guarantee it will
always hold on classes somebody else wrote.  Besides, there is little to
gain in doing so: a serializing hook could only keep one attribute of an
object, which is probably not what should happen during a deep cloning of
that same object.

   Here is the hooking interface:

`STORABLE_freeze' *obj*, cloning
     The serializing hook, called on the object during serialization.  It
     can be inherited, or defined in the class itself, like any other
     method.

     Arguments: *obj* is the object to serialize, cloning is a flag
     indicating whether we're in a dclone() or a regular serialization via
     store() or freeze().

     Returned value: A LIST `($serialized, $ref1, $ref2, ...)' where
     $serialized is the serialized form to be used, and the optional
     $ref1, $ref2, etc... are extra references that you wish to let the
     Storable engine serialize.

     At deserialization time, you will be given back the same LIST, but
     all the extra references will be pointing into the deserialized
     structure.

     The *first time* the hook is hit in a serialization flow, you may
     have it return an empty list.  That will signal the Storable engine
     to further discard that hook for this class and to therefore revert
     to the default serialization of the underlying Perl data.  The hook
     will again be normally processed in the next serialization.

     Unless you know better, a serializing hook should always say:

          sub STORABLE_freeze {
              my ($self, $cloning) = @_;
              return if $cloning;         # Regular default serialization
              ....
          }

     in order to keep reasonable dclone() semantics.

`STORABLE_thaw' *obj*, cloning, *serialized*, ...
     The deserializing hook called on the object during deserialization.
     But wait. If we're deserializing, there's no object yet... right?

     Wrong: the Storable engine creates an empty one for you.  If you know
     Eiffel, you can view `STORABLE_thaw' as an alternate creation routine.

     This means the hook can be inherited like any other method, and that
     *obj* is your blessed reference for this particular instance.

     The other arguments should look familiar if you know
     `STORABLE_freeze': cloning is true when we're part of a deep clone
     operation, *serialized* is the serialized string you returned to the
     engine in `STORABLE_freeze', and there may be an optional list of
     references, in the same order you gave them at serialization time,
     pointing to the deserialized objects (which have been processed
     courtesy of the Storable engine).

     When the Storable engine does not find any `STORABLE_thaw' hook
     routine, it tries to load the class by requiring the package
     dynamically (using the blessed package name), and then re-attempts
     the lookup.  If at that time the hook cannot be located, the engine
     croaks.  Note that this mechanism will fail if you define several
     classes in the same file, but perlmod(1) warned you.

     It is up to you to use this information to populate *obj* the way
     you want.

     Returned value: none.
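
   Putting the two hooks together, here is a sketch of a class that keeps
only one attribute when serialized but is copied in full by dclone().
The `Session' class and its `user' and `cache' attributes are
hypothetical names chosen for the example.

```perl
use strict;
use warnings;

package Session;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Serialize only the user name; the cache is not worth saving.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return if $cloning;          # dclone(): use default serialization
    return $self->{user};        # the serialized form is a plain string
}

# Rebuild the object from the string returned by STORABLE_freeze.
sub STORABLE_thaw {
    my ($self, $cloning, $serialized) = @_;
    $self->{user}  = $serialized;
    $self->{cache} = {};         # start again with an empty cache
    return;
}

package main;
use Storable qw(freeze thaw dclone);

my $session = Session->new(user => 'lstein', cache => { big => 'data' });
my $thawed  = thaw(freeze($session));   # hooks run, cache is dropped
my $cloned  = dclone($session);         # hook declines, cache is kept

print ref($thawed), ": user=$thawed->{user}, cache entries=",
      scalar(keys %{ $thawed->{cache} }), "\n";
```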

Predicates
----------

   Predicates are not exportable.  They must be called by explicitly
prefixing them with the Storable package name.

`Storable::last_op_in_netorder'
     The `Storable::last_op_in_netorder()' predicate will tell you whether
     network order was used in the last store or retrieve operation.  If
     you don't know how to use this, just forget about it.

`Storable::is_storing'
     Returns true if within a store operation (via STORABLE_freeze hook).

`Storable::is_retrieving'
     Returns true if within a retrieve operation (via STORABLE_thaw hook).

Recursion
---------

   With hooks comes the ability to recurse back to the Storable engine.
Indeed, hooks are regular Perl code, and Storable is convenient when it
comes to serialize and deserialize things, so why not use it to handle the
serialization string?

   There are a few things you need to know however:

   * You can create endless loops if the things you serialize via freeze()
     (for instance) point back to the object we're trying to serialize in
     the hook.

   * Shared references among objects will not stay shared: if we're
     serializing the list of objects [A, C] where both objects A and C
     refer to the SAME object B, and if there is a serializing hook in A
     that says freeze(B), then when deserializing, we'll get [A', C'] where
     A' refers to B', but C' refers to D, a deep clone of B'.  The topology
     was not preserved.

   That's why `STORABLE_freeze' lets you provide a list of references to
serialize.  The engine guarantees that those will be serialized in the
same context as the other objects, and therefore that shared objects will
stay shared.

   In the above [A, C] example, the `STORABLE_freeze' hook could return:

     ("something", $self->{B})

   and the B part would be serialized by the engine.  In `STORABLE_thaw',
you would get back the reference to the B' object, deserialized for you.

   Therefore, recursion should normally be avoided, but is nonetheless
supported.
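
   The [A, C] example can be sketched as follows.  The `Node' class and
its `name' and `peer' attributes are hypothetical; the point is only that
the shared object is handed to the engine as an extra reference instead
of being frozen inside the hook.

```perl
use strict;
use warnings;

package Node;

sub new {
    my ($class, $name, $peer) = @_;
    return bless { name => $name, peer => $peer }, $class;
}

# Return a string plus one extra reference for the engine to serialize.
sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return if $cloning;
    return ($self->{name}, $self->{peer});
}

# The extra reference comes back already deserialized by the engine.
sub STORABLE_thaw {
    my ($self, $cloning, $name, $peer) = @_;
    $self->{name} = $name;
    $self->{peer} = $peer;
    return;
}

package main;
use Storable qw(freeze thaw);

my $shared = { label => 'B' };
my $list   = [ Node->new('A', $shared), Node->new('C', $shared) ];

my $copy = thaw(freeze($list));

# Comparing references numerically compares their addresses.
my $still_shared = $copy->[0]{peer} == $copy->[1]{peer};
print $still_shared ? "A' and C' share one B'\n" : "B was duplicated\n";
```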

Deep Cloning
------------

   There is a new Clone module available on CPAN which implements deep
cloning natively, i.e. without freezing to memory and thawing the result.
It is aimed to replace Storable's dclone() some day.  However, it does not
currently support Storable hooks to redefine the way deep cloning is
performed.

EXAMPLES
========

   Here are some code samples showing a possible usage of Storable:

     use Storable qw(store retrieve freeze thaw dclone);

     %color = ('Blue' => 0.1, 'Red' => 0.8, 'Black' => 0, 'White' => 1);

     store(\%color, '/tmp/colors') or die "Can't store %color in /tmp/colors!\n";

     $colref = retrieve('/tmp/colors');
     die "Unable to retrieve from /tmp/colors!\n" unless defined $colref;
     printf "Blue is still %lf\n", $colref->{'Blue'};

     $colref2 = dclone(\%color);

     $str = freeze(\%color);
     printf "Serialization of %%color is %d bytes long.\n", length($str);
     $colref3 = thaw($str);

   which prints (on my machine):

     Blue is still 0.100000
     Serialization of %color is 102 bytes long.

WARNING
=======

   If you're using references as keys within your hash tables, you're bound
for disappointment when retrieving your data. Indeed, Perl stringifies
references used as hash table keys. If you later wish to access the items
via another reference stringification (i.e. using the same reference that
was used for the key originally to record the value into the hash table),
it will work because both references stringify to the same string.

   It won't work across store and retrieve operations, however, because
the addresses in the retrieved objects, which are part of the stringified
references, will probably differ from the original addresses. The topology
of your structure is preserved, but not hidden semantics like those.
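
   The effect can be reproduced in memory with freeze() and thaw(), which
recreate the objects at new addresses just as a store and retrieve would.
A sketch, with made-up variable names:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my $obj  = [1, 2, 3];
my %note = ("$obj" => 'visited');   # the key is a string like "ARRAY(0x...)"

# Copy the object and the hash together, as one structure.
my ($obj2, $note2) = @{ thaw(freeze([$obj, \%note])) };

# The copied hash still holds the OLD address string as its key, while
# the copied object lives at a different address.
print exists $note{"$obj"}     ? "original: found\n" : "original: lost\n";
print exists $note2->{"$obj2"} ? "copy: found\n"     : "copy: lost\n";
```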

   On platforms where it matters, be sure to call binmode() on the
descriptors that you pass to Storable functions.

   Storing data canonically that contains large hashes can be
significantly slower than storing the same data normally, as temporary
arrays to hold the keys for each hash have to be allocated, populated,
sorted and freed.  Some tests have shown a halving of the speed of storing;
the exact penalty will depend on the complexity of your data.  There is
no slowdown on retrieval.

BUGS
====

   You can't store GLOB, CODE, FORMLINE, etc... If you can define
semantics for those operations, feel free to enhance Storable so that it
can deal with them.

   The store functions will croak if they run into such references unless
you set `$Storable::forgive_me' to some TRUE value. In that case, the
fatal message is turned into a warning and some meaningless string is stored
instead.

   Setting `$Storable::canonical' may not yield frozen strings that
compare equal due to possible stringification of numbers. When the string
version of a scalar exists, it is the form stored, therefore if you happen
to use your numbers as strings between two freezing operations on the same
data structures, you will get different results.

   When storing doubles in network order, their value is stored as text.
However, you should also not expect non-numeric floating-point values such
as infinity and "not a number" to pass successfully through a
nstore()/retrieve() pair.

   As Storable neither knows nor cares about character sets (although it
does know that characters may be more than eight bits wide), any difference
in the interpretation of character codes between a host and a target
system is your problem.  In particular, if host and target use different
code points to represent the characters used in the text representation of
floating-point numbers, you will not be able to exchange
floating-point data, even with nstore().

CREDITS
=======

   Thank you to (in chronological order):

     Jarkko Hietaniemi <jhi@iki.fi>
     Ulrich Pfeifer <pfeifer@charly.informatik.uni-dortmund.de>
     Benjamin A. Holzman <bah@ecnvantage.com>
     Andrew Ford <A.Ford@ford-mason.co.uk>
     Gisle Aas <gisle@aas.no>
     Jeff Gresham <gresham_jeffrey@jpmorgan.com>
     Murray Nesbitt <murray@activestate.com>
     Marc Lehmann <pcg@opengroup.org>
     Justin Banks <justinb@wamnet.com>
     Jarkko Hietaniemi <jhi@iki.fi> (AGAIN, as perl 5.7.0 Pumpkin!)
     Salvador Ortiz Garcia <sog@msg.com.mx>
     Dominic Dunlop <domo@computer.org>
     Erik Haugan <erik@solbors.no>

   for their bug reports, suggestions and contributions.

   Benjamin Holzman contributed the tied variable support, Andrew Ford
contributed the canonical order for hashes, and Gisle Aas fixed a few
misunderstandings of mine regarding the Perl internals, and optimized the
emission of "tags" in the output streams by simply counting the objects
instead of tagging them (leading to a binary incompatibility for the
Storable image starting at version 0.6; older images are of course still
properly understood).  Murray Nesbitt made Storable thread-safe.  Marc
Lehmann added overloading and reference to tied items support.

TRANSLATIONS
============

   There is a Japanese translation of this man page available at
http://member.nifty.ne.jp/hippo2000/perltips/storable.htm , courtesy of
Kawai, Takanori <kawai@nippon-rad.co.jp>.

AUTHOR
======

   Raphael Manfredi `<Raphael_Manfredi@pobox.com>'

SEE ALSO
========

   Clone(3).


File: pm.info,  Node: String/Approx,  Next: String/BitCount,  Prev: Storable,  Up: Module List

Perl extension for approximate matching (fuzzy matching)
********************************************************

NAME
====

   String::Approx - Perl extension for approximate matching (fuzzy
matching)

SYNOPSIS
========

     use String::Approx 'amatch';

     print if amatch("foobar");

     my @matches = amatch("xyzzy", @inputs);

     my @catches = amatch("plugh", ['2'], @inputs);

DESCRIPTION
===========

   String::Approx lets you match and substitute strings approximately.
With this you can emulate errors: typing errorrs, speling errors, closely
related vocabularies (colour color), genetic mutations (GAG ACT),
abbreviations (McScot, MacScot).

   The measure of *approximateness* is the *Levenshtein edit distance*.
It is the total number of "edits": insertions,

     word world

   deletions,

     monkey money

   and substitutions

     sun fun

   required to transform a string to another string.  For example, to
transform *"lead"* into *"gold"*, you need three edits:

     lead gead goad gold

   The edit distance of "lead" and "gold" is therefore three.
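
   The definition above can be checked with a short dynamic-programming
routine.  This is a textbook whole-string Levenshtein distance written for
illustration; the `edit_distance' name is made up, and it is not the
algorithm String::Approx uses internally (which matches substrings).

```perl
use strict;
use warnings;
use List::Util qw(min);

# Whole-string Levenshtein distance between $s and $t, computed row by
# row: $prev[$j] is the distance between the first $i-1 characters of $s
# and the first $j characters of $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

for my $pair (['word', 'world'], ['monkey', 'money'],
              ['sun', 'fun'],    ['lead', 'gold']) {
    printf "%-6s -> %-6s : %d\n", @$pair, edit_distance(@$pair);
}
```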

MATCH
=====

     use String::Approx 'amatch';

     $matched     = amatch("pattern")
     $matched     = amatch("pattern", [ modifiers ])

     $any_matched = amatch("pattern", @inputs)
     $any_matched = amatch("pattern", [ modifiers ], @inputs)

     @match       = amatch("pattern")
     @match       = amatch("pattern", [ modifiers ])

     @matches     = amatch("pattern", @inputs)
     @matches     = amatch("pattern", [ modifiers ], @inputs)

   Match pattern approximately.  In list context return the matched
*@inputs*.  If no inputs are given, match against the $_.  In scalar
context return true if any of the inputs match, false if none match.

   Notice that the pattern is a string.  Not a regular expression.  None
of the regular expression notations (^, ., *, and so on) work.  They are
characters just like the others.  Note-on-note: some limited form of
*"regular expressionism"* is planned in future: for example character
classes ([abc]) and *any-chars* (.).  But that feature will be turned on
by a special *modifier* (just a guess: "r"), so there should be no
backward compatibility problem.

   Notice also that matching is not symmetric.  The inputs are matched
against the pattern, not the other way round.  In other words: the pattern
can be a substring, a submatch, of an input element.  An input element is
always a superstring of the pattern.

MODIFIERS
---------

   With the modifiers you can control the amount of approximateness and
certain other control variables.  The modifiers are one or more strings,
for example `"i"', within a string optionally separated by whitespace.
The modifiers are inside an anonymous array: the `[ ]' in the syntax are
not notational, they really do mean `[ ]', for example `[ "i", "2" ]'.
`["2 i"]' would be identical.

   The implicit default approximateness is 10%, rounded up.  In other
words: every tenth character in the pattern may be an error, an edit.  You
can explicitly set the maximum approximateness by supplying a modifier like

     number
     number%

   Examples: `"3"', `"15%"'.

   Using a similar syntax you can separately control the maximum number of
insertions, deletions, and substitutions by prefixing the numbers with I,
D, or S, like this:

     Inumber
     Inumber%
     Dnumber
     Dnumber%
     Snumber
     Snumber%

   Examples: `"I2"', `"D20%"', `"S0"'.

   You can ignore case (`"A"' becomes equal to `"a"' and vice versa) by
adding the `"i"' modifier.

   For example

     [ "i 25%", "S0" ]

   means *ignore case*, *allow every fourth character to be "an edit"*,
but allow *no substitutions*.  (See `NOTES' in this node about disallowing
substitutions or insertions.)
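
   To see how a percentage turns into a number of edits, here is a small
arithmetic sketch.  The `max_edits' helper is hypothetical and not part
of String::Approx; it also assumes that explicit percentages are rounded
up in the same way as the implicit 10% default, which the text above
states only for the default.

```perl
use strict;
use warnings;

# Hypothetical helper: largest number of edits allowed for $pattern under
# a "number" or "number%" modifier (default "10%"), rounding up.
sub max_edits {
    my ($pattern, $modifier) = @_;
    $modifier = '10%' unless defined $modifier;
    if ($modifier =~ /^(\d+)%$/) {
        # Integer ceiling of length * percent / 100.
        return int((length($pattern) * $1 + 99) / 100);
    }
    return $modifier;    # an absolute number of edits
}

print max_edits('abcd'), "\n";                  # default 10% of 4 characters
print max_edits('abcdefghijkl', '25%'), "\n";   # every fourth of 12 characters
print max_edits('abcd', '3'), "\n";             # explicit absolute limit
```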

SUBSTITUTE
==========

     use String::Approx 'asubstitute';

     @substituted = asubstitute("pattern", "replacement")
     @substituted = asubstitute("pattern", "replacement", @inputs)
     @substituted = asubstitute("pattern", "replacement", [ modifiers ])
     @substituted = asubstitute("pattern", "replacement",
     			   [ modifiers ], @inputs)

   Substitute approximate pattern with replacement and return as a list
copies of *@inputs*, the substitutions having been made on the elements
that did match the pattern.  If no inputs are given, substitute in the $_.
The replacement can contain magic strings $&, $`, $' that stand for the
matched string, the string before it, and the string after it,
respectively.  All the other arguments are as in `amatch()', plus one
additional modifier, `"g"' which means substitute globally (all the
matches in an element and not just the first one, as is the default).

   See `BAD NEWS' in this node about the unfortunate stinginess of
`asubstitute()'.

INDEX
=====

     use String::Approx 'aindex';

     $index   = aindex("pattern")
     @indices = aindex("pattern", @inputs)
     $index   = aindex("pattern", [ modifiers ])
     @indices = aindex("pattern", [ modifiers ], @inputs)

   Like `amatch()' but returns the index/indices at which the pattern
matches approximately.  In list context and if `@inputs' are used, returns
a list of indices, one index for each input element.  If there's no
approximate match, `-1' is returned as the index.

   There's also backwards-scanning `arindex()'.

SLICE
=====

     use String::Approx 'aslice';

     ($index, $size)   = aslice("pattern")
     ([$i0, $s0], ...) = aslice("pattern", @inputs)
     ($index, $size)   = aslice("pattern", [ modifiers ])
     ([$i0, $s0], ...) = aslice("pattern", [ modifiers ], @inputs)

   Like `aindex()' but returns also the size of the match.  If the match
fails, returns an empty list (when matching against $_) or an empty
anonymous list corresponding to the particular input.

   Note that the size of the match will very probably be something you did
not expect (such as longer than the pattern).  This may or may not be
fixed in future releases.

   If the modifier

     "minimal_distance"

   is used, the minimal possible edit distance is returned as the third
element:

     ($index, $size, $distance) = aslice("pattern", [ modifiers ])
     ([$i0, $s0, $d0], ...)     = aslice("pattern", [ modifiers ], @inputs)

DISTANCE
========

     use String::Approx 'adist';

     $dist = adist("pattern", $input);
     @dist = adist("pattern", @input);

   Return the *edit distance* or distances between the pattern and the
input or inputs.  Zero edit distance means exact match.  (Remember that
the match can 'float' in the inputs, the match is a substring match.)  If
the pattern is longer than the input or inputs, the returned distance or
distances is or are negative.

     use String::Approx 'adistr';

     $dist = adistr("pattern", $input);
     @dist = adistr("pattern", @inputs);

   Return the relative *edit distance* or distances between the pattern
and the input or inputs.  Zero relative edit distance means exact match,
one means completely different.  (Remember that the match can 'float' in
the inputs, the match is a substring match.)  If the pattern is longer
than the input or inputs, the returned distance or distances is or are
negative.

CONTROLLING THE CACHE
=====================

   `String::Approx' maintains a LU (least-used) cache that holds the
'matching engines' for each instance of a *pattern+modifiers*.  The cache
is intended to help the case where you match a small set of patterns
against a large set of strings.  However, the more engines you cache the
more you eat memory.  If you have a lot of different patterns or if you
have a lot of memory to burn, you may want to control the cache yourself.
For example, allowing a larger cache consumes more memory but probably
runs a little bit faster since the cache fills (and needs flushing) less
often.

   The cache has two parameters: max and purge.  The first one is the
maximum size of the cache and the second one is the cache flushing ratio:
when the number of cache entries exceeds max, max times purge cache
entries are flushed.  The default values are 1000 and 0.75, respectively,
which means that when the 1001st entry would be cached, 750 least used
entries will be removed from the cache.  To access the parameters you can
use the calls

     $now_max = String::Approx::cache_max();
     String::Approx::cache_max($new_max);

     $now_purge = String::Approx::cache_purge();
     String::Approx::cache_purge($new_purge);

     $limit = String::Approx::cache_n_purge();

   To be honest, there are actually *two* caches: the first one is used
for the patterns with no modifiers, the second one for the patterns with
pattern modifiers.  Using the standard parameters you will therefore
actually cache up to 2000 entries.  The above calls control both caches
for the same price.
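
   The flushing arithmetic described above can be sketched in plain Perl.
This only illustrates the numbers given in the text, not the module's
internals: `entries_to_flush' is a hypothetical helper, and truncating
the product to an integer is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of String::Approx): how many entries
# one cache would flush once it grows past $max, given ratio $purge.
# Truncating to an integer is an assumption of this sketch.
sub entries_to_flush {
    my ($max, $purge) = @_;
    return int($max * $purge);
}

my $flushed  = entries_to_flush(1000, 0.75);  # default max and purge
my $capacity = 2 * 1000;                      # two caches share one max
print "flushed per cache: $flushed, combined capacity: $capacity\n";
```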

   To disable caching completely use

     String::Approx::cache_disable();

   Note that this doesn't flush any existing cache entries; to do that,
use

     String::Approx::cache_flush_all();

NOTES
=====

   Because matching is by *substrings*, not by whole strings, insertions
and substitutions often produce very similar results: "abcde" matches
"axbcde" either by insertion or substitution of "x".

   The maximum edit distance is also the maximum number of edits.  That
is, the `"I2"' in

     amatch("abcd", ["I2"])

   is useless because the maximum edit distance is (implicitly) 1.  You
may have meant to say

     amatch("abcd", ["2D1S1"])

   or something like that.

   If you want to simulate transposes

     feet fete

   you need to allow an edit distance of at least two because, in terms of
our edit primitives, a transpose is one deletion followed by one insertion.
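
   The count of two edits can be checked with a small, self-contained
edit-distance routine.  This is a sketch of the textbook whole-string
dynamic-programming distance, not the substring-matching algorithm that
`String::Approx' uses; `edit_distance' is a hypothetical name.

```perl
use strict;
use warnings;

# Whole-string edit distance: insert, delete, substitute, cost 1 each.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));          # distances from the empty prefix
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $same = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1);
            my $best = $prev[$j - 1] + ($same ? 0 : 1);   # keep or substitute
            $best = $prev[$j] + 1
                if $prev[$j] + 1 < $best;                 # delete from $s
            $best = $cur[$j - 1] + 1
                if $cur[$j - 1] + 1 < $best;              # insert into $s
            push @cur, $best;
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print edit_distance("feet", "fete"), "\n";   # prints 2
```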

TEXT POSITION
-------------

   The starting and ending positions of matching, substituting, indexing,
or slicing can be changed from the beginning and end of the input(s) to
some other positions by using either or both of the modifiers

     "initial_position=24"
     "final_position=42"

   or both of the modifiers

     "initial_position=24"
     "position_range=10"

   By setting the `"position_range"' to be zero you can limit (anchor) the
operation to happen only once (if a match is possible) at the position.

VERSION
=======

   Major release 3.

CHANGES FROM VERSION 2
======================

GOOD NEWS
---------

Version 3 is 2-3 times faster than version 2
No pattern length limitation
     The algorithm is independent of the pattern length: its time
     complexity is *O(kn)*, where k is the number of edits and n the
     length of the text (input).  The preprocessing of the pattern will of
     course take some *O(m)* (m being the pattern length) time, but
     `amatch()' and `asubstitute()' cache the result of this preprocessing
     so that it is done only once per pattern.

BAD NEWS
--------

You do need a C compiler to install the module
     Perl's regular expressions are no longer used; instead a faster and
     more scalable algorithm written in C is used.

`asubstitute()' is now always stingy
     The string matched and substituted is now always stingy, as short as
     possible.  It used to be as long as possible.  This is an unfortunate
     change stemming from switching the matching algorithm.  Example: with
     an edit distance of two, substituting for `"word"' in `"cork"' and
     `"wool"' previously matched `"cork"' and `"wool"'.  Now it matches
     `"or"' and `"wo"': as little as possible or, in other words, with as
     much approximateness (as many edits) as possible.  Because there is
     no *need* to match the `"c"' of `"cork"', it is not matched.

no more `aregex()' because regular expressions are no longer used
no more `compat1' for String::Approx version 1 compatibility

ACKNOWLEDGEMENTS
================

   The following people have provided valuable test cases, documentation
clarifications, and other feedback:

   Jared August, Anirvan Chatterjee, Steve A. Chervitz, Aldo Calpini,
David Curiel, Teun van den Dool, Alberto Fontaneda, Rob Fugina, Dmitrij
Frishman, Lars Gregersen, Kevin Greiner, B. Elijah Griffin, Mike Hanafey,
Mitch Helle, Ricky Houghton, Helmut Jarausch, Damian Keefe, Ben Kennedy,
Craig Kelley, Franz Kirsch, Dag Kristian, Mark Land, J. D. Laub, Sergey
Novoselov, Andy Oram, Eric Promislow, Nikolaus Rath, Stefan Ram, Dag
Kristian Rognlien, Stewart Russell, Slaven Rezic, Chris Rosin, Ilya
Sandler, Bob J.A. Schijvenaars, Ross Smith, Frank Tobin, Greg Ward, Rick
Wise.

   The matching algorithm was developed by Udi Manber, Sun Wu, and Burra
Gopal in the Department of Computer Science, University of Arizona.

AUTHOR
======

   Jarkko Hietaniemi <jhi@iki.fi>


File: pm.info,  Node: String/BitCount,  Next: String/CRC,  Prev: String/Approx,  Up: Module List

count number of "1" bits in string
**********************************

NAME
====

   String::BitCount, BitCount, showBitCount - count number of "1" bits in
string

SYNOPSIS
========

     use String::BitCount;

DESCRIPTION
===========

BitCount LIST
     Joins the elements of LIST into a single string and returns the
     number of set bits in this string.

showBitCount LIST
     Copies the elements of LIST to a new list and converts the new
     elements to strings of digits showing the number of set bits in the
     original byte.  In array context returns the new list.  In scalar
     context joins the elements of the new list into a single string and
     returns the string.
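
   The counting itself can be sketched with Perl's built-in `unpack'
checksum template.  This is not the module's code: `count_set_bits' and
`per_byte_counts' are hypothetical names, and the one-digit-per-byte
output of the second function is an assumption about what showBitCount
produces.

```perl
use strict;
use warnings;

# Join the arguments and count the "1" bits, using unpack's
# checksum form ("%32") of the bit-string template "b*".
sub count_set_bits {
    return unpack('%32b*', join('', @_));
}

# One digit (0-8) per byte of the input string.
sub per_byte_counts {
    my ($string) = @_;
    return join '', map { unpack('%32b*', $_) } split //, $string;
}

print count_set_bits("A", "B"), "\n";   # "A" is 0x41, "B" is 0x42
print per_byte_counts("AB"), "\n";
```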

AUTHOR
======

   Winfried Koenig <win@in.rhein-main.de>

SEE ALSO
========

   perl(1)


File: pm.info,  Node: String/CRC,  Next: String/CRC32,  Prev: String/BitCount,  Up: Module List

Perl interface cyclic redundancy check generation
*************************************************

NAME
====

   CRC - Perl interface cyclic redundancy check generation

SYNOPSIS
========

     use String::CRC;
     
     ($crc_low, $crc_high) = crc("some string", 64);
     $crc_binary = crc("some string", 64);
     ($crc_low, $crc_high) = unpack("LL", $crc_binary);
     ($crc_small) = crc("some string", 32);

DESCRIPTION
===========

   The *CRC* module calculates CRCs of various lengths.  The default CRC
length is 32 bits.

   CRCs of 32 bits and smaller will be returned as an integer.

   CRCs that are larger than 32 bits will be returned as two integers if
called in list context and as a packed binary string if called in scalar
context.
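
   The context-dependent return described above follows a common Perl
pattern based on `wantarray'.  The sketch below shows only that pattern
with fixed sample numbers; it computes no CRC, and `two_words_sketch' is
a hypothetical name.

```perl
use strict;
use warnings;

# Return two 32-bit integers in list context, or the same two
# integers packed into an 8-byte binary string in scalar context.
sub two_words_sketch {
    my ($low, $high) = @_;
    return wantarray ? ($low, $high) : pack('LL', $low, $high);
}

my ($low, $high) = two_words_sketch(0x12345678, 0x9ABCDEF0);  # list context
my $binary       = two_words_sketch(0x12345678, 0x9ABCDEF0);  # scalar context
my @unpacked     = unpack('LL', $binary);                     # same two values

printf "%08X %08X (%d bytes packed)\n", @unpacked, length $binary;
```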

COPYRIGHT
=========

   Taken from Matt Dillon's Diablo distribution with permission.

   The authors of this package (David Sharnoff & Matthew Dillon) disclaim
all copyrights and release it into the public domain.


File: pm.info,  Node: String/CRC32,  Next: String/Checker,  Prev: String/CRC,  Up: Module List

Perl interface for cyclic redundancy check generation
*****************************************************

NAME
====

   CRC32 - Perl interface for cyclic redundancy check generation

SYNOPSIS
========

     use String::CRC32;
     
     $crc = crc32("some string");
     $crc = crc32("some string", initvalue);

     $somestring = "some string";
     $crc = crc32($somestring);

     open(SOMEFILE, "location/of/some.file");
     $crc = crc32(*SOMEFILE);
     close(SOMEFILE);

DESCRIPTION
===========

   The *CRC32* module calculates CRC sums of 32-bit length.  It generates
the same CRC values as ZMODEM, PKZIP, PICCHECK and many others.

   Despite its name, this module is able to compute the checksum of
strings as well as of files.

EXAMPLES
========

     $crc = crc32("some string");

   results in the same as

     $crc = crc32(" string", crc32("some"));

   This is useful for subsequent CRC checking of substrings.
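
   Why chaining works can be seen in a bit-by-bit pure-Perl sketch of the
standard reflected CRC-32 (polynomial 0xEDB88320).  It is an
illustration only, not the module's C code; `crc32_sketch' is a
hypothetical name, and treating the init value as the CRC of the
preceding data is an assumption based on the example above.

```perl
use strict;
use warnings;

# Bitwise CRC-32.  $init is the CRC of the data that came before
# (0 if none); undoing the final XOR restores the running state.
sub crc32_sketch {
    my ($data, $init) = @_;
    my $crc = (defined $init ? $init : 0) ^ 0xFFFFFFFF;
    for my $byte (unpack 'C*', $data) {
        $crc ^= $byte;
        for (1 .. 8) {
            $crc = ($crc & 1) ? (($crc >> 1) ^ 0xEDB88320) : ($crc >> 1);
        }
    }
    return ($crc ^ 0xFFFFFFFF) & 0xFFFFFFFF;
}

my $whole   = crc32_sketch("some string");
my $chained = crc32_sketch(" string", crc32_sketch("some"));
print $whole == $chained ? "chained CRC matches\n" : "mismatch\n";
```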

   You may even check files:

     open(SOMEFILE, "location/of/some.file");
     $crc = crc32(*SOMEFILE);
     close(SOMEFILE);

   An init value may also be supplied in the above example.

AUTHOR
======

   Soenke J. Peters <peters@simprovement.com>

   Please be so kind as to report any bugs/suggestions to the above
address.

COPYRIGHT
=========

   CRC algorithm code taken from CRC-32 by Craig Bruce.  The module stuff
is inspired by a similar perl module called String::CRC by David Sharnoff
& Matthew Dillon.  Horst Fickenscher told me that it could be useful to
supply an init value to the crc checking function and so I included this
possibility.

   The author of this package disclaims all copyrights and releases it
into the public domain.


File: pm.info,  Node: String/Checker,  Next: String/DiffLine,  Prev: String/CRC32,  Up: Module List

An extensible string validation module (allowing commonly used checks on strings to be called more concisely and consistently).
*******************************************************************************************************************************

NAME
====

   String::Checker - An extensible string validation module (allowing
commonly used checks on strings to be called more concisely and
consistently).

SYNOPSIS
========

     use String::Checker;

     String::Checker::register_check($checkname, \&sub);
     $return = String::Checker::checkstring($string, [ expectation, ... ]);

DESCRIPTION
===========

   This is a very simple library for checking a string against a given set
of expectations.  It contains a number of pre-defined expectations which
can be used, and can also be extended to perform any arbitrary match or
modification on a string.

   Why is this useful?  If you're only checking one string, it probably
isn't.  However, if you're checking a bunch of strings (say, for example,
CGI input parameters) against a set of expectations, this comes in pretty
handy.  As a matter of fact, the CGI::ArgChecker module is a simple,
CGI.pm aware wrapper for this library.

Checking a string
-----------------

   The checkstring function takes a string scalar and a reference to a
list of 'expectations' as arguments, and outputs a reference to a list,
containing the names of the expectations which failed.

   Each expectation, in turn, can either be a string scalar (the name of
the expectation) or a two-element array reference (the first element being
the name of the expectation, and second element being the argument to that
expectation.)  For example:

     $string = "foo";
     String::Checker::checkstring($string, [ 'allow_empty',
                                             [ 'max' => 20 ] ] );

   Note that the expectations are run in order.  In the above case, for
example, the 'allow_empty' expectation would be checked first, followed by
the 'max' expectation with an argument of 20.

Defined checks
--------------

   The module predefines a number of checks.  They are:

allow_empty
     Never fails - will convert an undef scalar to an empty string, though.

disallow_empty
     Fails if the input string is either undef or empty.

min
     Fails if the length of the input string is less than the numeric
     value of its single argument.

max
     Fails if the length of the input string is more than the numeric
     value of its single argument.

want_int
     Fails if the input string does not solely consist of numeric
     characters.

want_float
     Fails if the input string does not solely consist of numeric
     characters, plus an optional single '.'.

allow_chars
     Fails if the input string contains characters other than those in its
     argument.

disallow_chars
     Fails if the input string contains any of the characters in its
     argument.

upcase
     Never fails - converts the string to upper case.

downcase
     Never fails - converts the string to lower case.

stripxws
     Never fails - strips leading and trailing whitespace from the string.

enum
     Fails if the input string does not precisely match at least one of the
     elements of the array reference it takes as an argument.

match
     Fails if the input string does not match the regular expression it
     takes as an argument.

want_email
     Fails if the input string does not match the regular expression:
     ^\S+\@[\w-]+\.[\w\.-]+$

want_phone
     Fails if the input string does not match the regular expression
     ^[0-9+.()-]*$

want_date
     Interprets the input string as a date, if possible.  This will fail
     if it can't figure out a date from the input.  In addition, it is
     possible to use this to standardize date input.  Pass a formatting
     string (see the strftime(3) man page) as an argument to this check,
     and the string will be formatted appropriately if possible.  This is
     based on the Date::Manip(1) module, so that documentation might prove
     valuable if you're using this check.

Extension checks
----------------

   Use register_check to register a new expectation checking routine.  This
function should be passed a new expectation name and a code reference.

   This code reference will be called every time the expectation name is
seen, with either one or two arguments.  The first argument will always be
a reference to the input string (the function is free to modify the value
of the string).  The second argument, if any, is the second element of a
two-part expectation, whatever that might be.

   The function should return undef unless there's a problem, in which case
it should return 1.  It's also best (if possible) to return undef if the
string is undef, so that the user can decide whether to allow_empty or
disallow_empty independent of your check.

   For example, registering a check to verify that the input word is "poot"
would look like:

     String::Checker::register_check("ispoot", sub {
         my($s) = shift;
         if ((defined($$s)) && ($$s ne 'poot')) {
             return 1;
         }
         return undef;
     });
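
   How registered checks and the list of failed expectations fit together
can be sketched without the module.  This is a simplified illustration,
not String::Checker's implementation: the `_sketch' names are
hypothetical, and dying on an unknown expectation name is an assumption.

```perl
use strict;
use warnings;

my %checks;   # expectation name => code reference

sub register_check_sketch {
    my ($name, $code) = @_;
    $checks{$name} = $code;
}

# Run the expectations in order; return a reference to the list
# of names whose check reported a problem (returned a true value).
sub checkstring_sketch {
    my ($string, $expectations) = @_;
    my @failed;
    for my $exp (@$expectations) {
        my ($name, @arg) = ref($exp) eq 'ARRAY' ? @$exp : ($exp);
        my $code = $checks{$name}
            or die "unknown expectation '$name'";   # assumption
        push @failed, $name if $code->(\$string, @arg);
    }
    return \@failed;
}

register_check_sketch('max', sub {
    my ($s, $limit) = @_;
    return (defined $$s && length($$s) > $limit) ? 1 : undef;
});
register_check_sketch('ispoot', sub {
    my ($s) = @_;
    return (defined $$s && $$s ne 'poot') ? 1 : undef;
});

my $failed = checkstring_sketch('foo', [ 'ispoot', [ max => 20 ] ]);
print "failed: @$failed\n";
```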

BUGS
====

   Hopefully none.

AUTHOR
======

   J. David Lowe, dlowe@webjuice.com

SEE ALSO
========

   perl(1), CGI::ArgChecker(1)


File: pm.info,  Node: String/DiffLine,  Next: String/Escape,  Prev: String/Checker,  Up: Module List

find the character, line, and line position of the first difference
*******************************************************************

NAME
====

   String::DiffLine - find the character, line, and line position of the
first difference

SYNOPSIS
========

     use String::DiffLine qw(diffline);
     ($char,$line,$lpos)=diffline("abc","abx");

DESCRIPTION
===========

diffline($str1,$str2)
     Returns a three-item list identifying the location of the first
     difference between the two strings: the character position (indexed
     from 0), the line number (indexed from 1), and the position in the
     line (indexed from 0). $/ is used as the line separator.

     If the strings are identical, the first element of the returned list
     is zero, the second element is the number of line separators plus one,
     and the last element is the number of characters following the last
     line separator.
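
   The three return values can be illustrated with a pure-Perl sketch
that follows the description above.  It is not the module's
implementation; `first_difference' is a hypothetical name, and falling
back to a newline when $/ is undefined or empty is an assumption.

```perl
use strict;
use warnings;

sub first_difference {
    my ($s1, $s2) = @_;
    my $sep   = (defined($/) && length($/)) ? $/ : "\n";   # assumption
    my $limit = length($s1) < length($s2) ? length($s1) : length($s2);

    # Walk forward while the two strings agree.
    my $pos = 0;
    $pos++ while $pos < $limit
        && substr($s1, $pos, 1) eq substr($s2, $pos, 1);
    my $identical = ($pos == length($s1) && $pos == length($s2));

    # Count line separators in the common prefix.
    my ($line, $line_start, $from) = (1, 0, 0);
    my $prefix = substr($s1, 0, $pos);
    while ((my $i = index($prefix, $sep, $from)) >= 0) {
        $line++;
        $from = $line_start = $i + length($sep);
    }
    return ($identical ? 0 : $pos, $line, $pos - $line_start);
}

my @where = first_difference("abc", "abx");
print "char $where[0], line $where[1], position $where[2]\n";
```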

AUTHOR
======

   Andrew Allen <andrew_d_allen@hotmail.com>

SEE ALSO
========

   perl(1).


File: pm.info,  Node: String/Escape,  Next: String/Parity,  Prev: String/DiffLine,  Up: Module List

Registry of string functions, including backslash escapes
*********************************************************

NAME
====

   String::Escape - Registry of string functions, including backslash
escapes

SYNOPSIS
========

     use String::Escape qw( printable unprintable );
     # Convert control, high-bit chars to \n or \xxx escapes
     $output = printable($value);
     # Convert escape sequences back to original chars
     $value = unprintable($input);
     
     use String::Escape qw( elide );
     # Shorten strings to fit, if necessary
     foreach (@_) { print elide( $_, 79 ) . "\n"; }
     
     use String::Escape qw( escape );
     # Defer selection of escaping routines until runtime
     $escape_name = $use_quotes ? 'qprintable' : 'printable';
     @escaped = escape($escape_name, @values);

DESCRIPTION
===========

   This module provides a flexible calling interface to some
frequently-performed string conversion functions, including applying and
removing C/Unix-style backslash escapes like \n and \t, wrapping and
removing double-quotes, and truncating to fit within a desired length.

   The escape() function provides for dynamic selection of operations by
using a package hash variable to map escape specification strings to the
functions which implement them. The lookup imposes a bit of a performance
penalty, but allows for some useful late-binding behaviour. Compound
specifications (ex. 'quoted uppercase') are expanded to a list of
functions to be applied in order. Other modules may also register their
functions here for later general use.

REFERENCE
=========

Escaping And Unescaping Functions
---------------------------------

   Each of these functions takes a single simple scalar argument and
returns its escaped (or unescaped) equivalent.

quote($value) : $escaped
     Add double quote characters to each end of the string.

quote_non_words($value) : $escaped
     As above, but only quotes empty, punctuated, and multiword values.

unquote($value) : $escaped
     If the string both begins and ends with double quote characters, they
     are removed, otherwise the string is returned unchanged.

printable($value) : $escaped
unprintable($value) : $escaped
     These functions convert return, newline, tab, backslash and
     unprintable characters to their backslash-escaped equivalents and
     back again.

qprintable($value) : $escaped
unqprintable($value) : $escaped
     The qprintable function applies printable escaping and then wraps the
     results with quote_non_words, while unqprintable applies  unquote and
     then unprintable.  (Note that this is not MIME quoted-printable
     encoding.)
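
   The round trip between control characters and backslash escapes can be
sketched as follows.  This simplified illustration covers only return,
newline, tab and backslash; it omits the other unprintable characters
that the real functions handle, and the `_sketch' names are hypothetical.

```perl
use strict;
use warnings;

# Character <=> letter used after the backslash.
my %to_escape   = ("\n" => 'n', "\t" => 't', "\r" => 'r', "\\" => "\\");
my %from_escape = reverse %to_escape;

sub printable_sketch {
    my ($value) = @_;
    $value =~ s/([\n\t\r\\])/\\$to_escape{$1}/g;
    return $value;
}

sub unprintable_sketch {
    my ($value) = @_;
    $value =~ s/\\([ntr\\])/$from_escape{$1}/g;
    return $value;
}

my $original = "\tNow is the time\n";
my $escaped  = printable_sketch($original);
print "$escaped\n";
print "round trip ok\n" if unprintable_sketch($escaped) eq $original;
```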

String Elision Function
-----------------------

   This function extracts the leading portion of a provided string and
appends ellipsis if it's longer than the desired maximum excerpt length.

elide($string) : $elided_string
elide($string, $length) : $elided_string
elide($string, $length, $word_boundary_strictness) : $elided_string
     If the original string is shorter than $length, it is returned
     unchanged. At most $length characters are returned; if called with a
     single argument, $length defaults to $DefaultLength.

     Up to $word_boundary_strictness additional characters may be omitted
     in order to make the elided portion end on a word boundary; you can
     pass 0 to ignore word boundaries. If not provided,
     $word_boundary_strictness defaults to $DefaultStrictness.

$Elipses
     The string of characters used to indicate the end of the excerpt.
     Initialized to '...'.

$DefaultLength
     The default target excerpt length, used when the elide function is
     called with a single argument. Initialized to 60.

$DefaultStrictness
     The default word-boundary flexibility, used when the elide function
     is called without the third argument. Initialized to 10.

Escape By-Name
--------------

   These functions provide for the registration of string-escape
specification names and corresponding functions, and then allow the
invocation of one or several of these functions on one or several source
string values.

escape($escapes, $value) : $escaped_value
escape($escapes, @values) : @escaped_values
     Returns an altered copy of the provided values by looking up the
     escapes string in a registry of string-modification functions.

     If called in a scalar context, operates on the single value passed
     in; if called in a list context, operates identically on each of the
     provided values.

     Valid escape specifications are:

    one of the keys defined in %Escapes
          The corresponding specification will be looked up and used.

    a sequence of names separated by whitespace,
          Each name will be looked up, and each of the associated
          functions will be applied successively, from left to right.

    a reference to a function
          The provided function will be called with each value in turn.

    a reference to an array
          Each item in the array will be expanded as provided above.

     A fatal error will be generated if you pass an unsupported escape
     specification, or if the function is called with multiple values in a
     scalar context.
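
   The by-name lookup and left-to-right application of compound
specifications can be sketched with an ordinary hash of code references.
This is a simplified illustration, not the module's code: `%registry'
and `escape_sketch' are hypothetical names, and only string
specifications are handled (not code or array references).

```perl
use strict;
use warnings;

my %registry = (
    uppercase => sub { uc $_[0] },
    lowercase => sub { lc $_[0] },
    quote     => sub { '"' . $_[0] . '"' },
    none      => sub { $_[0] },
);

sub escape_sketch {
    my ($spec, @values) = @_;

    # Resolve each whitespace-separated name to its function.
    my @functions;
    for my $name (split ' ', $spec) {
        my $code = $registry{$name}
            or die "unsupported escape specification '$name'";
        push @functions, $code;
    }
    die "multiple values in scalar context"
        if @values > 1 && !wantarray;

    # Apply the functions successively, left to right, to each value.
    my @results = map {
        my $value = $_;
        $value = $_->($value) for @functions;
        $value;
    } @values;
    return wantarray ? @results : $results[0];
}

print escape_sketch('uppercase quote', 'abc'), "\n";
print join('--', escape_sketch('uppercase', 'a', 'b')), "\n";
```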

String::Escape::names() : @defined_escapes
     Returns a list of defined escape specification strings.

String::Escape::add( $escape_name, \&escape_function );
     Add a new escape specification and corresponding function.

%Escapes : $name, $operation, ...
     By default, the %Escapes hash is initialized to contain the following
     mappings:

    quote, unquote, or quote_non_words
    printable, unprintable, qprintable, or unqprintable,
    elide
          Run the above-described functions of the same names.

    uppercase, lowercase, or initialcase
          Alters the case of letters in the string to upper or lower case,
          or for initialcase, sets the first letter to upper case and all
          others to lower.

    none
          Return an unchanged copy of the original value.

EXAMPLES
========

   `print printable( "\tNow is the time\nfor all good folks\n" );'

   `*\tNow is the time\nfor all good folks\n*'

   `print escape('qprintable', "\tNow is the time\nfor all good folks\n"
);'

   `*"\tNow is the time\nfor all good folks\n"*'

   `print escape('uppercase qprintable', "\tNow is the time\nfor all good
folks\n" );'

   `*"\tNOW IS THE TIME\nFOR ALL GOOD FOLKS\n"*'

   `print join '--', escape('printable', "\tNow is the time\n", "for all
good folks\n" );'

   `*\tNow is the time\n--for all good folks\n*'

   `$string = 'foo bar baz this that the other';'

   `print elide( $string, 100 );'

   `*foo bar baz this that the other*'

   `print elide( $string, 12 );'

   `*foo bar...*'

   `print elide( $string, 12, 0 );'

   `*foo bar b...*'

PREREQUISITES AND INSTALLATION
==============================

   This package should run on any standard Perl 5 installation.

   To install this package, download and unpack the distribution archive
from http://www.evoscript.com/dist/ or your favorite CPAN mirror, and
execute the standard "perl Makefile.PL", "make test", "make install"
sequence.

STATUS AND SUPPORT
==================

   This release of String::Escape is intended for public review and
feedback.  It has been tested in several environments and no major
problems have been discovered, but it should be considered "beta" pending
that feedback.

     Name            DSLI  Description
     --------------  ----  ---------------------------------------------
     String::
     ::Escape        bdpf  Escape by-name registry and useful functions

   Further information and support for this module is available at
<www.evoscript.com>.

   Please report bugs or other problems to <bugs@evoscript.com>.

   The following changes are in progress or under consideration:

Use word-boundary test in elide's regular expression rather than \s|\Z.
Compare with TOMC's String::Edit package.

AUTHORS AND COPYRIGHT
=====================

   Copyright 1997, 1998 Evolution Online Systems, Inc. <www.evolution.com>

   You may use this software for free under the terms of the Artistic
License.

   Contributors: M. Simon Cavalletto <simonm@evolution.com>, Jeremy G.
Bishop <jeremy@evolution.com>