This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.

File: pm.info, Node: Stone/Cursor, Next: Stone/GB_Sequence, Prev: Stone, Up: Module List

Traverse tags and values of a Stone
***********************************

NAME
====

Stone::Cursor - Traverse tags and values of a Stone

SYNOPSIS
========

      use Boulder::Store;
      $store = Boulder::Store->new('./soccer_teams');

      my $stone = $store->get(28);
      $cursor = $stone->cursor;
      while (my ($key,$value) = $cursor->each) {
        print "$value: Go Bluejays!\n" if $key eq 'State' and $value eq 'Katonah';
      }

DESCRIPTION
===========

Stone::Cursor is a utility class that allows you to create one or more
iterators across a *Note Stone: Stone, object.  This is used for
traversing large Stone objects in order to identify or modify portions
of the record.

CLASS METHODS
-------------

Stone::Cursor->new($stone)
     Return a new Stone::Cursor over the specified
     *Note Stone: Stone, object.  This will return an error if the
     object is not a *Note Stone: Stone, or a descendant.  This method
     is usually not called directly, but rather indirectly via the
     *Note Stone: Stone, cursor() method:

          my $cursor = $stone->cursor;

OBJECT METHODS
--------------

$cursor->each()
     Iterate over the attached *Stone*.  Each iteration will return a
     two-valued list consisting of a tag path and a value.  The tag
     path is of a form that can be used with *Stone::index()* (in fact,
     a cursor is used internally to implement the Stone::dump()
     method).  When the end of the *Stone* is reached, `each()' will
     return an empty list, after which it will start over again from
     the beginning.  If you attempt to insert or delete from the stone
     while iterating over it, all attached cursors will reset to the
     beginning.

     For example:

          $cursor = $s->cursor;
          while (($key,$value) = $cursor->each) {
             print "$value: BOW WOW!\n" if $key=~/pet/;
          }

$cursor->reset()
     This resets the cursor back to the beginning of the associated
     *Stone*.

AUTHOR
======

Lincoln D. Stein.
COPYRIGHT
=========

Copyright 1997-1999, Cold Spring Harbor Laboratory, Cold Spring Harbor
NY.  This module can be used and distributed on the same terms as Perl
itself.

SEE ALSO
========

*Note Boulder: Boulder,, *Note Stone: Stone,

File: pm.info, Node: Stone/GB_Sequence, Next: Storable, Prev: Stone/Cursor, Up: Module List

Specialized Access to GenBank Records
*************************************

NAME
====

Stone::GB_Sequence - Specialized Access to GenBank Records

SYNOPSIS
========

      use Boulder::Genbank;  # No need to use Stone::GB_Sequence directly
      $gb = Boulder::Genbank->newFh(qw(M57939 M28274 L36028));

      while ($entry = <$gb>) {
          print "Entry's length is ",$entry->length,"\n";
          @cds   = $entry->match_features(-type=>'CDS');
          @exons = $entry->match_features(-type=>'Exon',-start=>100,-end=>300);
      }

DESCRIPTION
===========

Stone::GB_Sequence provides several specialized access methods to the
various fields in a GenBank flat file record.  You can return the
sequence as a Bio::Seq object, or query the sequence for features that
match positional or descriptive criteria that you provide.

CONSTRUCTORS
============

This class is not intended to be created directly, but via a
*Note Boulder/Genbank: Boulder/Genbank, stream.

METHODS
=======

In addition to the standard *Note Stone: Stone, methods and accessors,
the following methods are provided.  In the synopses, the variable
$entry refers to a previously-created Stone::GB_Sequence object.

$length = $entry->length
------------------------

Get the length of the sequence.

$start = $entry->start
----------------------

Get the start position of the sequence, currently always "1".

$end = $entry->end
------------------

Get the end position of the sequence, currently always the same as the
length.

@feature_list = $entry->features(-pos=>[50,450],-type=>['CDS','Exon'])
----------------------------------------------------------------------

features() will search the entry feature list for those features that
meet certain criteria.
The criteria are specified using the *-pos* and/or *-type* argument
names, as shown below.

-pos
     Provide a position or range of positions which the feature must
     overlap.  A single position is specified in this way:

          -pos => 1500;         # feature must overlap position 1500

     or a range of positions in this way:

          -pos => [1000,1500];  # 1000 to 1500 inclusive

     If no criteria are provided, then features() returns all the
     features, and is equivalent to calling the Features() accessor.

-type, -types
     Filter the list of features by type or a set of types.  Matches
     are case-insensitive, so "exon", "Exon" and "EXON" are all
     equivalent.  You may call with a single type as in:

          -type => 'Exon'

     or with a list of types, as in

          -types => ['Exon','CDS']

     The names "-type" and "-types" can be used interchangeably.

$seqObj = $entry->bioSeq;
-------------------------

Returns a *Note Bio/Seq: Bio/Seq, object from the Bioperl project.
Dies with an error message unless the Bio::Seq module is installed.

AUTHOR
======

Lincoln D. Stein.

COPYRIGHT
=========

Copyright 1997-1999, Cold Spring Harbor Laboratory, Cold Spring Harbor
NY.  This module can be used and distributed on the same terms as Perl
itself.
SEE ALSO
========

*Note Boulder: Boulder,, *Note Boulder/Genbank: Boulder/Genbank,, *Note Stone: Stone,

File: pm.info, Node: Storable, Next: String/Approx, Prev: Stone/GB_Sequence, Up: Module List

persistency for perl data structures
************************************

NAME
====

Storable - persistency for perl data structures

SYNOPSIS
========

      use Storable;
      store \%table, 'file';
      $hashref = retrieve('file');

      use Storable qw(nstore store_fd nstore_fd fd_retrieve freeze thaw dclone);

      # Network order
      nstore \%table, 'file';
      $hashref = retrieve('file');   # There is NO nretrieve()

      # Storing to and retrieving from an already opened file
      store_fd \@array, \*STDOUT;
      nstore_fd \%table, \*STDOUT;
      $aryref = fd_retrieve(\*SOCKET);
      $hashref = fd_retrieve(\*SOCKET);

      # Serializing to memory
      $serialized = freeze \%table;
      %table_clone = %{ thaw($serialized) };

      # Deep (recursive) cloning
      $cloneref = dclone($ref);

      # Advisory locking
      use Storable qw(lock_store lock_nstore lock_retrieve);
      lock_store \%table, 'file';
      lock_nstore \%table, 'file';
      $hashref = lock_retrieve('file');

DESCRIPTION
===========

The Storable package brings persistency to your perl data structures
containing SCALAR, ARRAY, HASH or REF objects, i.e. anything that can
be conveniently stored to disk and retrieved at a later time.

It can be used in the regular procedural way by calling `store' with a
reference to the object to be stored, along with the file name where
the image should be written.  The routine returns undef for I/O
problems or other internal errors, and a true value otherwise.  Serious
errors are propagated as a die exception.

To retrieve data stored to disk, use `retrieve' with a file name: the
objects stored into that file are recreated in memory for you, and a
reference to the root object is returned.  If an I/O error occurs while
reading, undef is returned instead.  Other serious errors are
propagated via die.
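A minimal sketch of the store()/retrieve() return conventions just described. It assumes a throwaway directory from the core File::Temp module (the file names table.sto and missing.sto are arbitrary). It also shows a hash that is referenced twice coming back as one shared copy, and it wraps the retrieve() of a missing file in eval, because depending on the Storable version that case may be reported by an exception rather than by undef.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Storable qw(store retrieve);

my $dir  = tempdir(CLEANUP => 1);                  # removed at program exit
my $file = File::Spec->catfile($dir, 'table.sto'); # arbitrary file name

# Two keys point at the SAME anonymous hash.
my $shared = { owner => 'alice' };
my %table  = (first => $shared, second => $shared, count => 2);

# store() returns true on success and undef on an I/O error;
# serious internal errors are thrown with die.
store(\%table, $file) or die "store to $file failed\n";

# retrieve() returns a reference to the root of the stored structure.
my $copy = retrieve($file);
die "retrieve from $file failed\n" unless defined $copy;

print "count = $copy->{count}, owner = $copy->{first}{owner}\n";
print "still shared\n" if $copy->{first} == $copy->{second};

# A missing file may be reported by die or by undef: handle both.
my $missing = eval { retrieve(File::Spec->catfile($dir, 'missing.sto')) };
print "missing file: ", (defined $missing ? "unexpected data" : "no data"), "\n";
```

Comparing the two references with `==' compares their addresses, which is how the sketch checks that the shared hash was restored only once.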
Since storage is performed recursively, you might want to stuff
references to objects that share a lot of common data into a single
array or hash table, and then store that object.  That way, when you
retrieve back the whole thing, the objects will continue to share what
they originally shared.

At the cost of a slight header overhead, you may store to an already
opened file descriptor using the `store_fd' routine, and retrieve from
a file via `fd_retrieve'.  Those names aren't imported by default, so
you will have to import them explicitly if you need those routines.
The file descriptor you supply must already be opened, for read if
you're going to retrieve and for write if you wish to store.

      store_fd(\%table, *STDOUT) || die "can't store to stdout\n";
      $hashref = fd_retrieve(*STDIN);

You can also store data in network order to allow easy sharing across
multiple platforms, or when storing on a socket known to be remotely
connected.  The routines to call have an initial `n' prefix for
*network*, as in `nstore' and `nstore_fd'.  At retrieval time, your
data will be correctly restored so you don't have to know whether
you're restoring from native or network ordered data.  Double values
are stored stringified to ensure portability as well, at the slight
risk of losing some precision in the last decimals.

When using `fd_retrieve', objects are retrieved in sequence, one object
(i.e. one recursive tree) per associated `store_fd'.

If you're more from the object-oriented camp, you can inherit from
Storable and directly store your objects by invoking `store' as a
method.  The fact that the root of the to-be-stored tree is a blessed
reference (i.e. an object) is special-cased so that the retrieve does
not provide a reference to that object but rather the blessed object
reference itself.  (Otherwise, you'd get a reference to that blessed
object.)

MEMORY STORE
============

The Storable engine can also store data into a Perl scalar instead, to
later retrieve them.
This is mainly used to freeze a complex structure in some safe compact
memory place (where it can possibly be sent to another process via
some IPC, since freezing the structure also serializes it in effect).
Later on, and maybe somewhere else, you can thaw the Perl scalar out
and recreate the original complex structure in memory.

Surprisingly, the routines to be called are named `freeze' and `thaw'.
If you wish to send out the frozen scalar to another machine, use
`nfreeze' instead to get a portable image.

Note that freezing an object structure and immediately thawing it
actually achieves a deep cloning of that structure:

      dclone(.) = thaw(freeze(.))

Storable provides you with a `dclone' interface which does not create
that intermediary scalar but instead freezes the structure in some
internal memory space and then immediately thaws it out.

ADVISORY LOCKING
================

The `lock_store' and `lock_nstore' routines are equivalent to `store'
and `nstore', except that they get an exclusive lock on the file before
writing.  Likewise, `lock_retrieve' performs as `retrieve', but also
gets a shared lock on the file before reading.

As with any advisory locking scheme, the protection only works if you
systematically use `lock_store' and `lock_retrieve'.  If one side of
your application uses `store' whilst the other uses `lock_retrieve',
you will get no protection at all.

The internal advisory locking is implemented using Perl's flock()
routine.  If your system does not support any form of flock(), or if
you share your files across NFS, you might wish to use other forms of
locking by using modules like LockFile::Simple which lock a file using
a filesystem entry, instead of locking the file descriptor.

SPEED
=====

The heart of Storable is written in C for decent speed.  Extra
low-level optimizations have been made when manipulating perl
internals, sacrificing encapsulation for the benefit of greater speed.
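The memory-store routines from MEMORY STORE can be exercised with a short script. This is a sketch only; the %config structure is made up for illustration. It freezes the same data in native and network order, thaws both images, and checks that a dclone() copy shares nothing with its source.

```perl
use strict;
use warnings;
use Storable qw(freeze nfreeze thaw dclone);

my %config = (name => 'demo', ports => [80, 443], limits => { max => 10 });

my $native   = freeze(\%config);   # native byte order: same kind of machine
my $portable = nfreeze(\%config);  # network order: safe to ship elsewhere

my $from_native   = thaw($native);    # thaw() accepts either kind of image
my $from_portable = thaw($portable);

# dclone() yields the same result as thaw(freeze(...)) without the
# intermediate scalar; the copy shares nothing with the original.
my $clone = dclone(\%config);
$clone->{ports}[0] = 8080;

print "original port: $config{ports}[0]\n";
print "cloned port:   $clone->{ports}[0]\n";
```

Changing the clone's port list leaves %config untouched, which is the practical meaning of "deep" cloning here.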
CANONICAL REPRESENTATION
========================

Normally Storable stores elements of hashes in the order they are
stored internally by Perl, i.e. pseudo-randomly.  If you set
`$Storable::canonical' to some TRUE value, Storable will store hashes
with the elements sorted by their key.  This allows you to compare data
structures by comparing their frozen representations (or even the
compressed frozen representations), which can be useful for creating
lookup tables for complicated queries.

Canonical order does not imply network order; those are two orthogonal
settings.

ERROR REPORTING
===============

Storable uses the "exception" paradigm, in that it does not try to work
around failures: if something bad happens, an exception is generated
from the caller's perspective (see *Note Carp: Carp, and `croak()').
Use eval {} to trap those exceptions.

When Storable croaks, it tries to report the error via the
`logcroak()' routine from the `Log::Agent' package, if it is available.

Normal errors are reported by having store() or retrieve() return
undef.  Such errors are usually I/O errors (or truncated stream errors
at retrieval).

WIZARDS ONLY
============

Hooks
-----

Any class may define hooks that will be called during the serialization
and deserialization process on objects that are instances of that
class.  Those hooks can redefine the way serialization is performed
(and therefore, how the symmetrical deserialization should be
conducted).

Since we said earlier:

      dclone(.) = thaw(freeze(.))

everything we say about hooks should also hold for deep cloning.
However, hooks get to know whether the operation is a mere
serialization, or a cloning.

Therefore, when serializing hooks are involved,

      dclone(.) <> thaw(freeze(.))

Well, you could keep them in sync, but there's no guarantee it will
always hold on classes somebody else wrote.
Besides, there is little to gain in doing so: a serializing hook could
only keep one attribute of an object, which is probably not what should
happen during a deep cloning of that same object.

Here is the hooking interface:

`STORABLE_freeze' *obj*, cloning
     The serializing hook, called on the object during serialization.
     It can be inherited, or defined in the class itself, like any
     other method.

     Arguments: *obj* is the object to serialize, cloning is a flag
     indicating whether we're in a dclone() or a regular serialization
     via store() or freeze().

     Returned value: A LIST `($serialized, $ref1, $ref2, ...)' where
     $serialized is the serialized form to be used, and the optional
     $ref1, $ref2, etc. are extra references that you wish to let the
     Storable engine serialize.

     At deserialization time, you will be given back the same LIST, but
     all the extra references will be pointing into the deserialized
     structure.

     The *first time* the hook is hit in a serialization flow, you may
     have it return an empty list.  That will signal the Storable
     engine to further discard that hook for this class and to
     therefore revert to the default serialization of the underlying
     Perl data.  The hook will again be normally processed in the next
     serialization.

     Unless you know better, a serializing hook should always say:

          sub STORABLE_freeze {
              my ($self, $cloning) = @_;
              return if $cloning;         # Regular default serialization
              ....
          }

     in order to keep reasonable dclone() semantics.

`STORABLE_thaw' *obj*, cloning, *serialized*, ...
     The deserializing hook called on the object during
     deserialization.  But wait.  If we're deserializing, there's no
     object yet... right?

     Wrong: the Storable engine creates an empty one for you.  If you
     know Eiffel, you can view `STORABLE_thaw' as an alternate creation
     routine.

     This means the hook can be inherited like any other method, and
     that *obj* is your blessed reference for this particular instance.
     The other arguments should look familiar if you know
     `STORABLE_freeze': cloning is true when we're part of a deep clone
     operation, *serialized* is the serialized string you returned to
     the engine in `STORABLE_freeze', and there may be an optional list
     of references, in the same order you gave them at serialization
     time, pointing to the deserialized objects (which have been
     processed courtesy of the Storable engine).

     When the Storable engine does not find any `STORABLE_thaw' hook
     routine, it tries to load the class by requiring the package
     dynamically (using the blessed package name), and then
     re-attempts the lookup.  If at that time the hook cannot be
     located, the engine croaks.  Note that this mechanism will fail if
     you define several classes in the same file, but perlmod(1) warned
     you.

     It is up to you to use this information to populate *obj* the way
     you want.

     Returned value: none.

Predicates
----------

Predicates are not exportable.  They must be called by explicitly
prefixing them with the Storable package name.

`Storable::last_op_in_netorder'
     The `Storable::last_op_in_netorder()' predicate will tell you
     whether network order was used in the last store or retrieve
     operation.  If you don't know how to use this, just forget about
     it.

`Storable::is_storing'
     Returns true if within a store operation (via the STORABLE_freeze
     hook).

`Storable::is_retrieving'
     Returns true if within a retrieve operation (via the STORABLE_thaw
     hook).

Recursion
---------

With hooks comes the ability to recurse back to the Storable engine.
Indeed, hooks are regular Perl code, and Storable is convenient when it
comes to serializing and deserializing things, so why not use it to
handle the serialization string?

There are a few things you need to know, however:

   * You can create endless loops if the things you serialize via
     freeze() (for instance) point back to the object we're trying to
     serialize in the hook.
   * Shared references among objects will not stay shared: if we're
     serializing the list of objects [A, C] where both objects A and C
     refer to the SAME object B, and if there is a serializing hook in
     A that says freeze(B), then when deserializing, we'll get [A', C']
     where A' refers to B', but C' refers to D, a deep clone of B'.
     The topology was not preserved.

That's why `STORABLE_freeze' lets you provide a list of references to
serialize.  The engine guarantees that those will be serialized in the
same context as the other objects, and therefore that shared objects
will stay shared.

In the above [A, C] example, the `STORABLE_freeze' hook could return:

      ("something", $self->{B})

and the B part would be serialized by the engine.  In `STORABLE_thaw',
you would get back the reference to the B' object, deserialized for
you.

Therefore, recursion should normally be avoided, but is nonetheless
supported.

Deep Cloning
------------

There is a new Clone module available on CPAN which implements deep
cloning natively, i.e. without freezing to memory and thawing the
result.  It is intended to replace Storable's dclone() some day.
However, it does not currently support Storable hooks to redefine the
way deep cloning is performed.

EXAMPLES
========

Here are some code samples showing a possible usage of Storable:

      use Storable qw(store retrieve freeze thaw dclone);

      %color = ('Blue' => 0.1, 'Red' => 0.8, 'Black' => 0, 'White' => 1);

      store(\%color, '/tmp/colors') or die "Can't store %color in /tmp/colors!\n";

      $colref = retrieve('/tmp/colors');
      die "Unable to retrieve from /tmp/colors!\n" unless defined $colref;
      printf "Blue is still %f\n", $colref->{'Blue'};

      $colref2 = dclone(\%color);

      $str = freeze(\%color);
      printf "Serialization of %%color is %d bytes long.\n", length($str);
      $colref3 = thaw($str);

which prints (on my machine):

      Blue is still 0.100000
      Serialization of %color is 102 bytes long.
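The hook interface from WIZARDS ONLY can be sketched on the [A, C] example from Recursion. The class name Node and its name, shared and cache fields are hypothetical. STORABLE_freeze returns the name as the serialized string plus the shared hash as an extra reference, deliberately drops the cache, and returns an empty list when cloning, as recommended above. The sketch then checks that B stays shared after thaw(freeze(...)) and that dclone() keeps the default behaviour.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw dclone);

package Node;   # hypothetical class standing in for objects A and C

sub new {
    my ($class, $name, $shared) = @_;
    return bless { name => $name, shared => $shared, cache => { warm => 1 } }, $class;
}

sub STORABLE_freeze {
    my ($self, $cloning) = @_;
    return if $cloning;                  # dclone(): fall back to the default deep copy
    # Serialized string plus one extra reference for the engine to handle;
    # the cache field is deliberately not saved.
    return ($self->{name}, $self->{shared});
}

sub STORABLE_thaw {
    my ($self, $cloning, $name, $shared) = @_;
    $self->{name}   = $name;
    $self->{shared} = $shared;           # points into the thawed structure
    $self->{cache}  = {};                # rebuilt empty after thawing
    return;
}

package main;

my $b_obj = { payload => 42 };           # object B, shared by A and C
my @list  = (Node->new('A', $b_obj), Node->new('C', $b_obj));

my $thawed = thaw(freeze(\@list));
my $cloned = dclone(\@list);

print "B shared after thaw: ",
    ($thawed->[0]{shared} == $thawed->[1]{shared} ? "yes" : "no"), "\n";
```

Because the hooks treat freezing and cloning differently, this class is an instance of dclone(.) <> thaw(freeze(.)): the thawed copy loses its cache, the cloned one keeps it.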
WARNING
=======

If you're using references as keys within your hash tables, you're
bound to be disappointed when retrieving your data.  Indeed, Perl
stringifies references used as hash table keys.  If you later wish to
access the items via another reference stringification (i.e. using the
same reference that was used for the key originally to record the
value into the hash table), it will work because both references
stringify to the same string.

It won't work across a store and retrieve operation, however, because
the addresses in the retrieved objects, which are part of the
stringified references, will probably differ from the original
addresses.  The topology of your structure is preserved, but not hidden
semantics like those.

On platforms where it matters, be sure to call binmode() on the
descriptors that you pass to Storable functions.

Storing data canonically that contains large hashes can be
significantly slower than storing the same data normally, as temporary
arrays to hold the keys for each hash have to be allocated, populated,
sorted and freed.  Some tests have shown a halving of the speed of
storing; the exact penalty will depend on the complexity of your data.
There is no slowdown on retrieval.

BUGS
====

You can't store GLOB, CODE, FORMLINE, etc.  If you can define semantics
for those operations, feel free to enhance Storable so that it can deal
with them.

The store functions will croak if they run into such references unless
you set `$Storable::forgive_me' to some TRUE value.  In that case, the
fatal message is turned into a warning and some meaningless string is
stored instead.

Setting `$Storable::canonical' may not yield frozen strings that
compare equal due to possible stringification of numbers.  When the
string version of a scalar exists, it is the form stored; therefore, if
you happen to use your numbers as strings between two freezing
operations on the same data structures, you will get different results.
When storing doubles in network order, their value is stored as text.
You should also not expect non-numeric floating-point values such as
infinity and "not a number" to pass successfully through an
nstore()/retrieve() pair.

As Storable neither knows nor cares about character sets (although it
does know that characters may be more than eight bits wide), any
difference in the interpretation of character codes between a host and
a target system is your problem.  In particular, if host and target use
different code points to represent the characters used in the text
representation of floating-point numbers, you will not be able to
exchange floating-point data, even with nstore().

CREDITS
=======

Thank you to (in chronological order):

      Jarkko Hietaniemi
      Ulrich Pfeifer
      Benjamin A. Holzman
      Andrew Ford
      Gisle Aas
      Jeff Gresham
      Murray Nesbitt
      Marc Lehmann
      Justin Banks
      Jarkko Hietaniemi (AGAIN, as perl 5.7.0 Pumpkin!)
      Salvador Ortiz Garcia
      Dominic Dunlop
      Erik Haugan

for their bug reports, suggestions and contributions.

Benjamin Holzman contributed the tied variable support, Andrew Ford
contributed the canonical order for hashes, and Gisle Aas fixed a few
misunderstandings of mine regarding the Perl internals, and optimized
the emission of "tags" in the output streams by simply counting the
objects instead of tagging them (leading to a binary incompatibility
for the Storable image starting at version 0.6; older images are of
course still properly understood).  Murray Nesbitt made Storable
thread-safe.  Marc Lehmann added overloading and reference to tied
items support.

TRANSLATIONS
============

There is a Japanese translation of this man page available at
http://member.nifty.ne.jp/hippo2000/perltips/storable.htm, courtesy of
Kawai, Takanori.

AUTHOR
======

Raphael Manfredi

SEE ALSO
========

Clone(3).
File: pm.info, Node: String/Approx, Next: String/BitCount, Prev: Storable, Up: Module List

Perl extension for approximate matching (fuzzy matching)
********************************************************

NAME
====

String::Approx - Perl extension for approximate matching (fuzzy
matching)

SYNOPSIS
========

      use String::Approx 'amatch';

      print if amatch("foobar");

      my @matches = amatch("xyzzy", @inputs);

      my @catches = amatch("plugh", ['2'], @inputs);

DESCRIPTION
===========

String::Approx lets you match and substitute strings approximately.
With this you can emulate errors: typing errorrs, speling errors,
closely related vocabularies (colour vs. color), genetic mutations (GAG
vs. ACT), abbreviations (McScot, MacScot).

The measure of *approximateness* is the *Levenshtein edit distance*.
It is the total number of "edits": insertions (word -> world),
deletions (monkey -> money) and substitutions (sun -> fun) required to
transform one string into another.  For example, to transform *"lead"*
into *"gold"*, you need three edits:

      lead -> gead -> goad -> gold

The edit distance of "lead" and "gold" is therefore three.

MATCH
=====

      use String::Approx 'amatch';

      $matched     = amatch("pattern")
      $matched     = amatch("pattern", [ modifiers ])

      $any_matched = amatch("pattern", @inputs)
      $any_matched = amatch("pattern", [ modifiers ], @inputs)

      @match       = amatch("pattern")
      @match       = amatch("pattern", [ modifiers ])

      @matches     = amatch("pattern", @inputs)
      @matches     = amatch("pattern", [ modifiers ], @inputs)

Match the pattern approximately.  In list context return the matched
*@inputs*.  If no inputs are given, match against $_.  In scalar
context return true if any of the inputs match, false if none match.

Notice that the pattern is a string, not a regular expression.  None of
the regular expression notations (^, ., *, and so on) work.  They are
characters just like the others.  Note-on-note: some limited form of
*"regular expressionism"* is planned in future: for example character
classes ([abc]) and *any-chars* (.).
But that feature will be turned on by a special *modifier* (just a
guess: "r"), so there should be no backward compatibility problem.

Notice also that matching is not symmetric.  The inputs are matched
against the pattern, not the other way round.  In other words: the
pattern can be a substring, a submatch, of an input element.  An input
element is always a superstring of the pattern.

MODIFIERS
---------

With the modifiers you can control the amount of approximateness and
certain other control variables.  Each modifier is a short string, for
example `"i"'; several modifiers may also be combined in one string,
separated by whitespace.  The modifiers are inside an anonymous array:
the `[ ]' in the syntax are not notational, they really do mean
`[ ]', for example `[ "i", "2" ]'.  `["2 i"]' would be identical.

The implicit default approximateness is 10%, rounded up.  In other
words: every tenth character in the pattern may be an error, an edit.
You can explicitly set the maximum approximateness by supplying a
modifier like

      number
      number%

Examples: `"3"', `"15%"'.

Using a similar syntax you can separately control the maximum number of
insertions, deletions, and substitutions by prefixing the numbers with
I, D, or S, like this:

      Inumber
      Inumber%
      Dnumber
      Dnumber%
      Snumber
      Snumber%

Examples: `"I2"', `"D20%"', `"S0"'.

You can ignore case (`"A"' becomes equal to `"a"' and vice versa) by
adding the `"i"' modifier.

For example

      [ "i 25%", "S0" ]

means *ignore case*, *allow every fourth character to be "an edit"*,
but allow *no substitutions*.  (See `NOTES' in this node about
disallowing substitutions or insertions.)
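To make the substring semantics and the default "10%, rounded up" limit concrete, here is a minimal pure-Perl sketch. It is not how String::Approx works: the module uses the Manber, Wu and Gopal algorithm implemented in C, while this sketch uses plain O(mn) dynamic programming in which a match may start anywhere in the input (Sellers' approximate substring matching). The helper names max_edits, substring_distance and approx_match are hypothetical, and only the bare `number' and `number%' modifiers are understood (no I/D/S prefixes, no `"i"').

```perl
use strict;
use warnings;

# Hypothetical helper: turn a modifier such as "3" or "15%" (or none, meaning
# the default 10%) into a maximum edit count for a pattern of length $m.
sub max_edits {
    my ($modifier, $m) = @_;
    $modifier = '10%' unless defined $modifier;
    # Integer arithmetic for "rounded up", avoiding floating-point surprises.
    return int(($m * $1 + 99) / 100) if $modifier =~ /^(\d+)%$/;
    return $1                        if $modifier =~ /^(\d+)$/;
    die "unsupported modifier '$modifier'\n";
}

# Smallest edit distance between $pattern and ANY substring of $text.
sub substring_distance {
    my ($pattern, $text) = @_;
    my @p = split //, $pattern;
    my @t = split //, $text;
    my @prev = (0) x (@t + 1);          # row 0: a match may start anywhere for free
    for my $i (1 .. @p) {
        my @cur = ($i);                 # column 0: pattern prefix vs. empty text
        for my $j (1 .. @t) {
            my $best = $prev[$j-1] + ($p[$i-1] eq $t[$j-1] ? 0 : 1); # match/substitute
            $best = $prev[$j]  + 1 if $prev[$j]  + 1 < $best;         # pattern char missing
            $best = $cur[$j-1] + 1 if $cur[$j-1] + 1 < $best;         # extra char in text
            push @cur, $best;
        }
        @prev = @cur;
    }
    my $min = $prev[0];                 # the match may also end anywhere
    for (@prev) { $min = $_ if $_ < $min }
    return $min;
}

sub approx_match {
    my ($pattern, $text, $modifier) = @_;
    return substring_distance($pattern, $text) <= max_edits($modifier, length $pattern);
}

for my $input ('xxfoo-barxx', 'xxfoo--barxx') {
    printf "%-13s %s\n", $input, approx_match('foobar', $input) ? 'match' : 'no match';
}
```

With the six-character pattern "foobar" the default limit works out to one edit, so one stray character inside the input is tolerated but two are not unless a modifier such as "2" raises the limit.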
SUBSTITUTE
==========

      use String::Approx 'asubstitute';

      @substituted = asubstitute("pattern", "replacement")
      @substituted = asubstitute("pattern", "replacement", @inputs)
      @substituted = asubstitute("pattern", "replacement", [ modifiers ])
      @substituted = asubstitute("pattern", "replacement",
                                 [ modifiers ], @inputs)

Substitute the approximate pattern with the replacement and return the
*@inputs* as a list, the substitutions having been made on the
elements that did match the pattern.  If no inputs are given,
substitute in $_.  The replacement can contain the magic strings $&,
$`, and $' that stand for the matched string, the string before it,
and the string after it, respectively.  All the other arguments are as
in `amatch()', plus one additional modifier, `"g"', which means
substitute globally (all the matches in an element and not just the
first one, as is the default).

See `BAD NEWS' in this node about the unfortunate stinginess of
`asubstitute()'.

INDEX
=====

      use String::Approx 'aindex';

      $index   = aindex("pattern")
      @indices = aindex("pattern", @inputs)
      $index   = aindex("pattern", [ modifiers ])
      @indices = aindex("pattern", [ modifiers ], @inputs)

Like `amatch()' but returns the index or indices at which the pattern
matches approximately.  In list context and if `@inputs' are used,
returns a list of indices, one index for each input element.  If there
is no approximate match, `-1' is returned as the index.

There is also a backwards-scanning `arindex()'.

SLICE
=====

      use String::Approx 'aslice';

      ($index, $size)   = aslice("pattern")
      ([$i0, $s0], ...) = aslice("pattern", @inputs)
      ($index, $size)   = aslice("pattern", [ modifiers ])
      ([$i0, $s0], ...) = aslice("pattern", [ modifiers ], @inputs)

Like `aindex()' but also returns the size of the match.  If the match
fails, returns an empty list (when matching against $_) or an empty
anonymous list corresponding to the particular input.

Note that the size of the match will very probably be something you did
not expect (such as longer than the pattern).
This may or may not be fixed in future releases.

If the modifier "minimal_distance" is used, the minimal possible edit
distance is returned as the third element:

      ($index, $size, $distance) = aslice("pattern", [ modifiers ])
      ([$i0, $s0, $d0], ...)     = aslice("pattern", [ modifiers ], @inputs)

DISTANCE
========

      use String::Approx 'adist';

      $dist = adist("pattern", $input);
      @dist = adist("pattern", @input);

Return the *edit distance* or distances between the pattern and the
input or inputs.  Zero edit distance means exact match.  (Remember that
the match can 'float' in the inputs; the match is a substring match.)
If the pattern is longer than the input or inputs, the returned
distance or distances are negative.

      use String::Approx 'adistr';

      $dist = adistr("pattern", $input);
      @dist = adistr("pattern", @inputs);

Return the relative *edit distance* or distances between the pattern
and the input or inputs.  Zero relative edit distance means exact
match; one means completely different.  (Remember that the match can
'float' in the inputs; the match is a substring match.)  If the pattern
is longer than the input or inputs, the returned distance or distances
are negative.

CONTROLLING THE CACHE
=====================

`String::Approx' maintains a LU (least-used) cache that holds the
'matching engines' for each instance of a *pattern+modifiers*.  The
cache is intended to help the case where you match a small set of
patterns against a large set of strings.  However, the more engines you
cache, the more memory you use.  If you have a lot of different
patterns or if you have a lot of memory to burn, you may want to
control the cache yourself.  For example, allowing a larger cache
consumes more memory but probably runs a little bit faster since the
cache fills (and needs flushing) less often.

The cache has two parameters: max and purge.
The first one is the maximum size of the cache and the second one is
the cache flushing ratio: when the number of cache entries exceeds max,
max times purge cache entries are flushed.  The default values are 1000
and 0.75, respectively, which means that when the 1001st entry would be
cached, the 750 least used entries will be removed from the cache.  To
access the parameters you can use the calls

      $now_max = String::Approx::cache_max();
      String::Approx::cache_max($new_max);

      $now_purge = String::Approx::cache_purge();
      String::Approx::cache_purge($new_purge);

      $limit = String::Approx::cache_n_purge();

To be honest, there are actually *two* caches: the first one is used
for the patterns with no modifiers, the second one for the patterns
with pattern modifiers.  Using the standard parameters you will
therefore actually cache up to 2000 entries.  The above calls control
both caches for the same price.

To disable caching completely use

      String::Approx::cache_disable();

Note that this doesn't flush any possibly existing cache entries; to do
that use

      String::Approx::cache_flush_all();

NOTES
=====

Because matching is by *substrings*, not by whole strings, insertions
and substitutions often produce very similar results: "abcde" matches
"axbcde" either by insertion or substitution of "x".

The maximum edit distance is also the maximum number of edits.  That
is, the `"I2"' in

      amatch("abcd", ["I2"])

is useless because the maximum edit distance is (implicitly) 1.  You
may have meant to say

      amatch("abcd", ["2D1S1"])

or something like that.

If you want to simulate transposes (feet -> fete) you need to allow at
least an edit distance of two because in terms of our edit primitives a
transpose is first one deletion and then one insertion.
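The edit counts quoted in this node (three edits for lead -> gold under DESCRIPTION, two for the transpose feet -> fete) can be checked with the textbook whole-string Levenshtein dynamic program. This is a sketch for illustration only, and the function name levenshtein is hypothetical. Note that it compares whole strings, whereas amatch() and adist() let the match float inside the input as a substring.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Whole-string Levenshtein distance: the minimum number of single-character
# insertions, deletions and substitutions turning $s into $t.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);        # distances from the empty prefix of $s
    for my $i (1 .. length $s) {
        my @cur = ($i);
        my $sc  = substr($s, $i - 1, 1);
        for my $j (1 .. length $t) {
            my $cost = $sc eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1,            # delete from $s
                           $cur[$j - 1] + 1,         # insert into $s
                           $prev[$j - 1] + $cost);   # match or substitute
        }
        @prev = @cur;
    }
    return $prev[-1];
}

print levenshtein('lead', 'gold'), "\n";   # prints 3
print levenshtein('feet', 'fete'), "\n";   # prints 2
```

Since a transpose has no dedicated edit primitive here, it costs two edits, which is why the limit must be at least two to simulate one.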
TEXT POSITION
-------------

The starting and ending positions of matching, substituting, indexing,
or slicing can be changed from the beginning and end of the input(s)
to some other positions by using either or both of the modifiers

      "initial_position=24"
      "final_position=42"

or both the modifiers

      "initial_position=24"
      "position_range=10"

By setting the `"position_range"' to be zero you can limit (anchor) the
operation to happen only once (if a match is possible) at the position.

VERSION
=======

Major release 3.

CHANGES FROM VERSION 2
======================

GOOD NEWS
---------

Version 3 is 2-3 times faster than version 2

No pattern length limitation
     The algorithm is independent of the pattern length: its time
     complexity is *O(kn)*, where k is the number of edits and n the
     length of the text (input).  The preprocessing of the pattern will
     of course take some *O(m)* (m being the pattern length) time, but
     `amatch()' and `asubstitute()' cache the result of this
     preprocessing so that it is done only once per pattern.

BAD NEWS
--------

You do need a C compiler to install the module
     Perl's regular expressions are no longer used; instead a faster
     and more scalable algorithm written in C is used.

`asubstitute()' is now always stingy
     The string matched and substituted is now always stingy, as short
     as possible.  It used to be as long as possible.  This is an
     unfortunate change stemming from switching the matching algorithm.
     Example: with an edit distance of two, substituting for `"word"'
     in `"cork"' and `"wool"' previously matched `"cork"' and `"wool"'.
     Now it matches `"or"' and `"wo"': as little as possible, or, in
     other words, with as much approximateness, as many edits, as
     possible.  Because there is no *need* to match the `"c"' of
     `"cork"', it is not matched.
No more `aregex()'
     Regular expressions are no longer used.

No more `compat1' for String::Approx version 1 compatibility

ACKNOWLEDGEMENTS
================

The following people have provided valuable test cases, documentation clarifications, and other feedback:

Jared August, Anirvan Chatterjee, Steve A. Chervitz, Aldo Calpini, David Curiel, Teun van den Dool, Alberto Fontaneda, Rob Fugina, Dmitrij Frishman, Lars Gregersen, Kevin Greiner, B. Elijah Griffin, Mike Hanafey, Mitch Helle, Ricky Houghton, Helmut Jarausch, Damian Keefe, Ben Kennedy, Craig Kelley, Franz Kirsch, Dag Kristian, Mark Land, J. D. Laub, Sergey Novoselov, Andy Oram, Eric Promislow, Nikolaus Rath, Stefan Ram, Dag Kristian Rognlien, Stewart Russell, Slaven Rezic, Chris Rosin, Ilya Sandler, Bob J.A. Schijvenaars, Ross Smith, Frank Tobin, Greg Ward, Rick Wise.

The matching algorithm was developed by Udi Manber, Sun Wu, and Burra Gopal in the Department of Computer Science, University of Arizona.

AUTHOR
======

Jarkko Hietaniemi

File: pm.info, Node: String/BitCount, Next: String/CRC, Prev: String/Approx, Up: Module List

count number of "1" bits in string
**********************************

NAME
====

String::BitCount, BitCount, showBitCount - count number of "1" bits in string

SYNOPSIS
========

     use String::BitCount;

DESCRIPTION
===========

BitCount LIST
     Joins the elements of LIST into a single string and returns the number of set bits in this string.

showBitCount LIST
     Copies the elements of LIST to a new list. It converts each new element to a string of digits showing the number of set bits in each byte of the original. In list context, returns the new list. In scalar context, joins the elements of the new list into a single string and returns that string.
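Perl's built-in `unpack` can count set bits directly: a `%32` checksum prefix on a `b*` template sums the bits of a string. The behaviour described above can therefore be approximated without the module. The following is a sketch under that assumption. `bit_count` and `show_bit_count` are hypothetical stand-ins for BitCount and showBitCount, not the module's own implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for BitCount: join LIST into one string and
# count its set bits; unpack '%32b*' sums the bits of the whole string.
sub bit_count {
    my $s = join '', @_;
    return unpack('%32b*', $s);
}

# Hypothetical stand-in for showBitCount: for each element, build a
# string with one digit per byte giving that byte's set-bit count.
sub show_bit_count {
    my @counts = map {
        my $elem = $_;
        join '', map { unpack('%32b*', $_) } split //, $elem;
    } @_;
    return wantarray ? @counts : join '', @counts;
}

print bit_count("AB"), "\n";                         # 'A'=0x41, 'B'=0x42: prints 4
print scalar(show_bit_count("AB", "\xff")), "\n";    # prints 228
```

Each byte contributes at most 8 set bits, so every per-byte count fits in a single digit. That is what makes the joined scalar-context string unambiguous.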
AUTHOR
======

Winfried Koenig

SEE ALSO
========

perl(1)

File: pm.info, Node: String/CRC, Next: String/CRC32, Prev: String/BitCount, Up: Module List

Perl interface for cyclic redundancy check generation
*****************************************************

NAME
====

CRC - Perl interface for cyclic redundancy check generation

SYNOPSIS
========

     use String::CRC;

     ($crc_low, $crc_high) = crc("some string", 64);
     $crc_binary = crc("some string", 64);
     ($crc_low, $crc_high) = unpack("LL", $crc_binary);
     ($crc_small) = crc("some string", 32);

DESCRIPTION
===========

The *CRC* module calculates CRCs of various lengths. The default CRC length is 32 bits. CRCs of 32 bits or fewer are returned as a single integer. CRCs longer than 32 bits are returned as two integers in list context, and as a packed binary string in scalar context.

COPYRIGHT
=========

Taken from Matt Dillon's Diablo distribution with permission. The authors of this package (David Sharnoff & Matthew Dillon) disclaim all copyrights and release it into the public domain.

File: pm.info, Node: String/CRC32, Next: String/Checker, Prev: String/CRC, Up: Module List

Perl interface for cyclic redundancy check generation
*****************************************************

NAME
====

CRC32 - Perl interface for cyclic redundancy check generation

SYNOPSIS
========

     use String::CRC32;

     $crc = crc32("some string");
     $crc = crc32("some string", $initvalue);

     $somestring = "some string";
     $crc = crc32($somestring);

     open(SOMEFILE, "location/of/some.file") or die "Cannot open: $!";
     $crc = crc32(*SOMEFILE);
     close(SOMEFILE);

DESCRIPTION
===========

The *CRC32* module calculates 32-bit CRC sums. It generates the same CRC values as ZMODEM, PKZIP, PICCHECK and many others. Despite its name, this module can compute the checksum of files as well as of strings.

EXAMPLES
========

     $crc = crc32("some string");

gives the same result as

     $crc = crc32(" string", crc32("some"));

This is useful for computing a CRC incrementally over consecutive substrings.
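The sketch below is a minimal pure-Perl version of the standard reflected CRC-32 (polynomial 0xEDB88320, initial register 0xFFFFFFFF, final XOR 0xFFFFFFFF), the variant used by PKZIP. It is an assumption that String::CRC32 computes exactly this variant and treats its second argument the same way. `crc32_sketch` is a hypothetical name. Because the optional init value is XORed back into the register, passing a previous result continues the computation. That is why the chained call above equals the one-shot call.

```perl
use strict;
use warnings;

# Hypothetical sketch of the reflected CRC-32 used by PKZIP/ZMODEM.
# $init plays the role of a previous CRC, so calls can be chained.
sub crc32_sketch {
    my ($data, $init) = @_;
    $init = 0 unless defined $init;
    # Undo the final XOR of the previous result to recover the register.
    my $crc = ($init ^ 0xFFFFFFFF) & 0xFFFFFFFF;
    for my $byte (unpack 'C*', $data) {
        $crc ^= $byte;
        for (1 .. 8) {
            # Shift right; if a 1 bit falls off, XOR in the reflected polynomial.
            $crc = ($crc & 1) ? (($crc >> 1) ^ 0xEDB88320) : ($crc >> 1);
        }
    }
    return ($crc ^ 0xFFFFFFFF) & 0xFFFFFFFF;
}

printf "%08X\n", crc32_sketch("123456789");    # standard check value: CBF43926
```

This bit-at-a-time loop keeps the sketch short. Production implementations usually precompute a 256-entry table and process one byte per lookup.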
You may even check files:

     open(SOMEFILE, "location/of/some.file") or die "Cannot open: $!";
     $crc = crc32(*SOMEFILE);
     close(SOMEFILE);

An init value may also be supplied in the above example.

AUTHOR
======

Soenke J. Peters

Please report any bugs or suggestions to the above address.

COPYRIGHT
=========

CRC algorithm code taken from CRC-32 by Craig Bruce. The module interface is inspired by a similar Perl module called String::CRC by David Sharnoff & Matthew Dillon. Horst Fickenscher told me that it could be useful to supply an init value to the CRC checking function, so I included this possibility.

The author of this package disclaims all copyrights and releases it into the public domain.

File: pm.info, Node: String/Checker, Next: String/DiffLine, Prev: String/CRC32, Up: Module List

An extensible string validation module (allowing commonly used checks on strings to be called more concisely and consistently).
*******************************************************************************************************************************

NAME
====

String::Checker - An extensible string validation module (allowing commonly used checks on strings to be called more concisely and consistently).

SYNOPSIS
========

     use String::Checker;

     String::Checker::register_check($checkname, \&sub);
     $return = String::Checker::checkstring($string, [ expectation, ... ]);

DESCRIPTION
===========

This is a very simple library for checking a string against a given set of expectations. It contains a number of predefined expectations, and it can be extended to perform any arbitrary match or modification on a string.

Why is this useful? If you're only checking one string, it probably isn't. However, it comes in handy if you're checking a bunch of strings (for example, CGI input parameters) against a set of expectations. As a matter of fact, the CGI::ArgChecker module is a simple, CGI.pm-aware wrapper for this library.
Checking a string
-----------------

The checkstring function takes two arguments: a string scalar and a reference to a list of 'expectations'. It returns a reference to a list containing the names of the expectations that failed.

Each expectation can be either of two forms:

   * a string scalar, giving the name of the expectation;

   * a two-element array reference, whose first element is the name of the expectation and whose second element is the argument to that expectation.

For example:

     $string = "foo";
     String::Checker::checkstring($string, [ 'allow_empty',
                                            [ 'max' => 20 ] ] );

Note that the expectations are run in order. In the above case, the 'allow_empty' expectation would be checked first, followed by the 'max' expectation with an argument of 20.

Defined checks
--------------

The module predefines a number of checks. They are:

allow_empty
     Never fails, though it will convert an undef scalar to an empty string.

disallow_empty
     Fails if the input string is either undef or empty.

min
     Fails if the length of the input string is less than the numeric value of its single argument.

max
     Fails if the length of the input string is more than the numeric value of its single argument.

want_int
     Fails if the input string does not consist solely of numeric characters.

want_float
     Fails if the input string does not consist solely of numeric characters, plus an optional single '.'.

allow_chars
     Fails if the input string contains characters other than those in its argument.

disallow_chars
     Fails if the input string contains any of the characters in its argument.

upcase
     Never fails; converts the string to upper case.

downcase
     Never fails; converts the string to lower case.

stripxws
     Never fails; strips leading and trailing whitespace from the string.

enum
     Fails if the input string does not exactly match at least one of the elements of the array reference it takes as an argument.

match
     Fails if the input string does not match the regular expression it takes as an argument.
want_email
     Fails if the input string does not match the regular expression

          ^\S+\@[\w-]+\.[\w\.-]+$

want_phone
     Fails if the input string does not match the regular expression

          ^[0-9+.()-]*$

want_date
     Interprets the input string as a date, if possible. This fails if no date can be figured out from the input.

     You can also use this check to standardize date input. Pass a formatting string (see the strftime(3) man page) as its argument, and the string will be reformatted accordingly where possible.

     This check is based on the Date::Manip(1) module, so that module's documentation might prove valuable if you're using it.

Extension checks
----------------

Use register_check to register a new expectation checking routine. This function should be passed a new expectation name and a code reference.

This code reference will be called every time the expectation name is seen, with either one or two arguments:

   * The first argument is always a reference to the input string. The function is free to modify the value of the string.

   * The second argument, if any, is the second element of a two-part expectation, whatever that might be.

The function should return undef unless there's a problem, in which case it should return 1. If possible, it should also return undef when the string is undef. That way the user can decide between allow_empty and disallow_empty independently of your check.

For example, registering a check to verify that the input word is "poot" would look like:

     String::Checker::register_check("ispoot", sub {
         my($s) = shift;
         if ((defined($$s)) && ($$s ne 'poot')) {
             return 1;
         }
         return undef;
     });

BUGS
====

Hopefully none.

AUTHOR
======

J. David Lowe, dlowe@webjuice.com

SEE ALSO
========

perl(1), CGI::ArgChecker(1)

File: pm.info, Node: String/DiffLine, Next: String/Escape, Prev: String/Checker, Up: Module List

find the character, line, and line position of the first difference
*******************************************************************

NAME
====

String::DiffLine - find the character, line, and line position of the first difference

SYNOPSIS
========

     use String::DiffLine qw(diffline);

     ($char,$line,$lpos)=diffline("abc","abx");

DESCRIPTION
===========

diffline($str1,$str2)
     Returns a three-item list identifying the location of the first difference between the two strings:

        * the character position (indexed from 0);

        * the line number (indexed from 1);

        * the position within the line (indexed from 0).

     $/ is used as the line separator. If the strings are identical, the returned list is as follows:

        * the first element is zero;

        * the second element is the number of line separators plus one;

        * the last element is the number of characters following the last line separator.

AUTHOR
======

Andrew Allen

SEE ALSO
========

perl(1).

File: pm.info, Node: String/Escape, Next: String/Parity, Prev: String/DiffLine, Up: Module List

Registry of string functions, including backslash escapes
*********************************************************

NAME
====

String::Escape - Registry of string functions, including backslash escapes

SYNOPSIS
========

     use String::Escape qw( printable unprintable );
     # Convert control, high-bit chars to \n or \xxx escapes
     $output = printable($value);
     # Convert escape sequences back to original chars
     $value = unprintable($input);

     use String::Escape qw( elide );
     # Shorten strings to fit, if necessary
     foreach (@_) { print elide( $_, 79 ) . "\n"; }

     use String::Escape qw( escape );
     # Defer selection of escaping routines until runtime
     $escape_name = $use_quotes ?
                    'qprintable' : 'printable';
     @escaped = escape($escape_name, @values);

DESCRIPTION
===========

This module provides a flexible calling interface to some frequently performed string conversion functions. These include:

   * applying and removing C/Unix-style backslash escapes like \n and \t;

   * wrapping and removing double quotes;

   * truncating to fit within a desired length.

The escape() function allows operations to be selected dynamically. A package hash variable maps escape specification strings to the functions that implement them. The lookup imposes a small performance penalty, but allows for some useful late-binding behaviour. Compound specifications (e.g. 'quote uppercase') are expanded to a list of functions to be applied in order. Other modules may also register their functions here for later general use.

REFERENCE
=========

Escaping And Unescaping Functions
---------------------------------

Each of these functions takes a single simple scalar argument and returns its escaped (or unescaped) equivalent.

quote($value) : $escaped
     Adds a double quote character to each end of the string.

quote_non_words($value) : $escaped
     As above, but only quotes empty, punctuated, and multiword values.

unquote($value) : $escaped
     If the string both begins and ends with a double quote character, both are removed. Otherwise the string is returned unchanged.

printable($value) : $escaped
unprintable($value) : $escaped
     These functions convert return, newline, tab, backslash and unprintable characters to their backslash-escaped equivalents, and back again.

qprintable($value) : $escaped
unqprintable($value) : $escaped
     The qprintable function applies printable escaping and then wraps the result with quote_non_words. unqprintable applies unquote and then unprintable. (Note that this is not MIME quoted-printable encoding.)
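The printable/unprintable round trip described above can be sketched in pure Perl as follows. The helper names are hypothetical. The `\xNN` hexadecimal form used for other non-printing bytes is an assumption, since the module's own escape format may differ. Only byte strings are considered.

```perl
use strict;
use warnings;

# Escape table for the named characters; everything else non-printing
# becomes \xNN (an assumed format, not necessarily the module's).
my %ESCAPE   = ("\\" => "\\\\", "\n" => "\\n", "\r" => "\\r", "\t" => "\\t");
my %UNESCAPE = ("\\" => "\\", "n" => "\n", "r" => "\r", "t" => "\t");

# Hypothetical printable(): backslash-escape \\, \n, \r, \t, and write any
# other byte outside printable ASCII (0x20-0x7E) as \xNN.
sub printable_sketch {
    my $s = shift;
    $s =~ s/([\\\n\r\t]|[^\x20-\x7E])/
        exists $ESCAPE{$1} ? $ESCAPE{$1} : sprintf('\\x%02x', ord $1)/ge;
    return $s;
}

# Hypothetical unprintable(): reverse the escapes produced above,
# scanning left to right so an escaped backslash is consumed as a unit.
sub unprintable_sketch {
    my $s = shift;
    $s =~ s/\\(?:x([0-9A-Fa-f]{2})|([\\nrt]))/
        defined $1 ? chr(hex $1) : $UNESCAPE{$2}/ge;
    return $s;
}

my $raw = "\tNow is the time\nfor all good folks\n";
print printable_sketch($raw), "\n";    # prints \tNow is the time\nfor all good folks\n
```

Escaping the backslash itself is what keeps the round trip lossless. Without it, an input that already contains a literal `\n` would be indistinguishable from an escaped newline.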
String Elision Function
-----------------------

This function extracts the leading portion of a provided string. If the string is longer than the desired maximum excerpt length, it also appends an ellipsis.

elide($string) : $elided_string
elide($string, $length) : $elided_string
elide($string, $length, $word_boundary_strictness) : $elided_string
     If the original string is shorter than $length, it is returned unchanged. At most $length characters are returned. If called with a single argument, $length defaults to $DefaultLength.

     Up to $word_boundary_strictness additional characters may be omitted so that the elided portion ends on a word boundary. You can pass 0 to ignore word boundaries. If not provided, $word_boundary_strictness defaults to $DefaultStrictness.

$Elipses
     The string of characters used to indicate the end of the excerpt. Initialized to '...'.

$DefaultLength
     The default target excerpt length, used when the elide function is called with a single argument. Initialized to 60.

$DefaultStrictness
     The default word-boundary flexibility, used when the elide function is called without the third argument. Initialized to 10.

Escape By-Name
--------------

These functions register string-escape specification names and their corresponding functions. They then allow one or several of these functions to be invoked on one or several source string values.

escape($escapes, $value) : $escaped_value
escape($escapes, @values) : @escaped_values
     Returns an altered copy of the provided values. The escapes string is looked up in a registry of string-modification functions.

     If called in scalar context, operates on the single value passed in. If called in list context, operates identically on each of the provided values.

     Valid escape specifications are:

     one of the keys defined in %Escapes
          The corresponding specification will be looked up and used.
     a sequence of names separated by whitespace
          Each name will be looked up, and the associated functions will be applied successively, from left to right.

     a reference to a function
          The provided function will be called with each value in turn.

     a reference to an array
          Each item in the array will be expanded as described above.

     A fatal error is generated in two cases: if you pass an unsupported escape specification, or if the function is called with multiple values in scalar context.

String::Escape::names() : @defined_escapes
     Returns a list of defined escape specification strings.

String::Escape::add( $escape_name, \&escape_function );
     Adds a new escape specification and its corresponding function.

%Escapes : $name, $operation, ...
     By default, the %Escapes hash is initialized to contain the following mappings:

     quote, unquote, or quote_non_words
     printable, unprintable, qprintable, or unqprintable
     elide
          Run the above-described functions of the same names.

     uppercase, lowercase, or initialcase
          Alters the case of letters in the string. uppercase and lowercase convert all letters; initialcase sets the first letter to upper case and all others to lower case.

     none
          Returns an unchanged copy of the original value.
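The late-binding registry described above can be sketched in a few lines of plain Perl. A hash maps names to code references. A specification (a whitespace-separated string of names, a code reference, or an array reference) is expanded into an ordered list of functions. This is an illustrative sketch only. `%ESCAPES`, `expand_spec` and `escape_sketch` are hypothetical names, the registry holds just a few of the mappings listed above, and the module's actual internals may differ.

```perl
use strict;
use warnings;

# Hypothetical registry: name => code reference (illustrative subset only).
my %ESCAPES = (
    none        => sub { $_[0] },
    uppercase   => sub { uc $_[0] },
    lowercase   => sub { lc $_[0] },
    initialcase => sub { ucfirst lc $_[0] },
    quote       => sub { '"' . $_[0] . '"' },
);

# Expand a specification into an ordered list of code references.
sub expand_spec {
    my $spec = shift;
    return $spec if ref $spec eq 'CODE';
    return map { expand_spec($_) } @$spec if ref $spec eq 'ARRAY';
    return map {
        my $code = $ESCAPES{$_};
        die "unsupported escape specification '$_'\n" unless $code;
        $code;
    } split ' ', $spec;
}

# Apply every function of the specification, left to right, to each value.
sub escape_sketch {
    my ($spec, @values) = @_;
    my @subs = expand_spec($spec);
    die "multiple values in scalar context\n" if !wantarray && @values > 1;
    for my $v (@values) {
        $v = $_->($v) for @subs;
    }
    return wantarray ? @values : $values[0];
}

print scalar escape_sketch('uppercase quote', 'now is'), "\n";    # prints "NOW IS" (quotes included)
```

Resolving names at call time, rather than binding functions when the caller is compiled, is what lets other code add entries later. The cost is one hash lookup per name on each call.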
EXAMPLES
========

`print printable( "\tNow is the time\nfor all good folks\n" );'
     `*\tNow is the time\nfor all good folks\n*'

`print escape('qprintable', "\tNow is the time\nfor all good folks\n" );'
     `*"\tNow is the time\nfor all good folks\n"*'

`print escape('uppercase qprintable', "\tNow is the time\nfor all good folks\n" );'
     `*"\tNOW IS THE TIME\nFOR ALL GOOD FOLKS\n"*'

`print join '--', escape('printable', "\tNow is the time\n", "for all good folks\n" );'
     `*\tNow is the time\n--for all good folks\n*'

`$string = 'foo bar baz this that the other';'

`print elide( $string, 100 );'
     `*foo bar baz this that the other*'

`print elide( $string, 12 );'
     `*foo bar...*'

`print elide( $string, 12, 0 );'
     `*foo bar b...*'

PREREQUISITES AND INSTALLATION
==============================

This package should run on any standard Perl 5 installation.

To install this package, download and unpack the distribution archive from http://www.evoscript.com/dist/ or your favorite CPAN mirror. Then execute the standard "perl Makefile.PL", "make test", "make install" sequence.

STATUS AND SUPPORT
==================

This release of String::Escape is intended for public review and feedback. It has been tested in several environments and no major problems have been discovered. However, it should be considered "beta" pending that feedback.

       Name            DSLI  Description
       --------------  ----  ---------------------------------------------
       String::
       ::Escape        bdpf  Escape by-name registry and useful functions

Further information and support for this module is available at .

Please report bugs or other problems to .

The following changes are in progress or under consideration:

   * Use a word-boundary test in elide's regular expression rather than \s|\Z.

   * Compare with TOMC's String::Edit package.

AUTHORS AND COPYRIGHT
=====================

Copyright 1997, 1998 Evolution Online Systems, Inc.

You may use this software for free under the terms of the Artistic License.

Contributors: M. Simon Cavalletto, Jeremy G. Bishop