This is Info file pm.info, produced by Makeinfo version 1.68 from the input file bigpm.texi.

File: pm.info, Node: Bio/Tools/SeqPattern, Next: Bio/Tools/SeqStats, Prev: Bio/Tools/SeqAnal, Up: Module List

Bioperl object for a sequence pattern or motif
**********************************************

NAME
====

   Bio::Tools::SeqPattern.pm - Bioperl object for a sequence pattern or motif

SYNOPSIS
========

Object Creation
---------------

     use Bio::Tools::SeqPattern ();

     $pat1     = 'T[GA]AA...TAAT';
     $pattern1 = new Bio::Tools::SeqPattern(-SEQ => $pat1, -TYPE => 'Dna');

     $pat2     = '[VILM]R(GXX){3,2}...[^PG]';
     $pattern2 = new Bio::Tools::SeqPattern(-SEQ => $pat2, -TYPE => 'Amino');

INSTALLATION
============

   This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

   Follow the installation instructions included in the README file.

DESCRIPTION
===========

   The Bio::Tools::SeqPattern.pm module encapsulates generic data and methods for manipulating regular expressions describing nucleic or amino acid sequence patterns (a.k.a. "motifs").

   Bio::Tools::SeqPattern.pm is a concrete class that inherits from *Bio::Seq.pm*.

   This class grew out of a need to have a standard module for doing routine tasks with sequence patterns such as:

     -- Forming a reverse-complement version of a nucleotide sequence pattern
     -- Expanding patterns containing ambiguity codes
     -- Checking for invalid regexp characters
     -- Untainting yet preserving special characters in the pattern

   Other features to look for in the future:

     - Full pattern syntax checking
     - Conversion between expanded and condensed forms of the pattern

MOTIVATIONS
===========

   A key motivation for Bio::Tools::SeqPattern.pm is to have a way to generate a reverse complement of a nucleotide sequence pattern. This makes possible simultaneous pattern matching on both sense and anti-sense strands of a query sequence.
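   The idea can be illustrated with a small, self-contained Perl sketch. It is not the module's implementation: it assumes patterns built only from upper-case DNA bases, the nucleotide ambiguity codes listed in this node, `.' and `[...]' classes (no quantifiers or groups, which are what the module's _fixpat_* methods handle), and its helpers tokenize(), expand_nuc() and revcom_pattern() are hypothetical names. It expands the pattern, reverse-complements it, and ors both strands into one regexp that is run once over a single copy of the sequence.

```perl
use strict;
use warnings;

# Nucleotide ambiguity codes (from the Extended Alphabet Support table).
my %IUPAC = (
    M => 'AC',  R => 'AG',  W => 'AT',  S => 'CG',  Y => 'CT',  K => 'GT',
    V => 'ACG', H => 'ACT', D => 'AGT', B => 'CGT', N => 'ACGT', X => 'ACGT',
);

# Split a pattern into tokens: a bracketed class such as [GA] or [^T],
# or any single character.
sub tokenize { my ($pat) = @_; return $pat =~ /(\[[^\]]+\]|.)/g }

# Hypothetical helper: replace each ambiguity code by the bases it
# stands for (inside a class, the bases are inserted without brackets).
sub expand_nuc {
    my ($pat) = @_;
    my @tokens = tokenize($pat);
    for my $tok (@tokens) {                  # $tok aliases the array element
        if ($tok =~ /^\[(\^?)(.*)\]$/) {
            my ($neg, $body) = ($1, $2);
            $body = join '', map { $IUPAC{$_} // $_ } split //, $body;
            $tok  = "[$neg$body]";
        }
        elsif (exists $IUPAC{$tok}) {
            $tok = "[$IUPAC{$tok}]";
        }
    }
    return join '', @tokens;
}

# Hypothetical helper: expand, reverse the token order, then complement
# every base (classes are complemented in place, so they stay intact).
sub revcom_pattern {
    my ($pat) = @_;
    my @rc;
    for my $tok (reverse tokenize(expand_nuc($pat))) {
        (my $c = $tok) =~ tr/ACGT/TGCA/;
        push @rc, $c;
    }
    return join '', @rc;
}

# One regexp that finds the motif on either strand of one sequence copy.
my $sense = 'T[GA]AA...TAAT';
my $anti  = revcom_pattern($sense);          # 'ATTA...TT[CT]A'
my $both  = qr/$sense|$anti/;

my $seq = 'GGTGAACCCTAATGGATTAGGGTTCAGG';
my @hits;
while ($seq =~ /($both)/g) {
    push @hits, [ $-[0], $1 ];               # 0-based start, matched text
}
printf "%-12s at %d\n", $_->[1], $_->[0] for @hits;
```

   Reversing the list of tokens, rather than the raw string, is what keeps a class such as `[GA]' from being turned inside out; complementing the letters inside a class in place is enough because their order does not matter. A negated class works the same way: `[^T]' on one strand becomes `[^A]' on the other.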
   In principle, one could do such a search less efficiently by testing against both sense and anti-sense versions of a sequence. It is entirely equivalent to test a regexp containing both sense and anti-sense versions of the *pattern* against one copy of the sequence. The latter approach is much more efficient since:

     1) You need only one copy of the sequence.
     2) Only one regexp is executed.
     3) Regexp patterns are typically much smaller than sequences.

   Patterns can be quite complex and it is often difficult to generate the reverse complement pattern. The Bioperl SeqPattern.pm module addresses this problem, providing a convenient set of tools for working with biological sequence regular expressions.

   Not all patterns have been tested. If you discover a pattern that is not handled properly by Bio::Tools::SeqPattern.pm, please send me some email (sac@genome.stanford.edu). Thanks.

OTHER FEATURES
==============

Extended Alphabet Support
-------------------------

   This module supports the same set of ambiguity codes for nucleotide sequences as supported by *Bio::Seq.pm*. These ambiguity codes define the behavior of the expand() method. Amino acid alphabet support is different from that of Seq.pm (see below).

     ------------------------------------------
     Symbol    Meaning              Nucleic Acid
     ------------------------------------------
       A       A                    Adenine
       C       C                    Cytosine
       G       G                    Guanine
       T       T                    Thymine
       U       U                    Uracil
       M       A or C
       R       A or G               Any purine
       W       A or T
       S       C or G
       Y       C or T               Any pyrimidine
       K       G or T
       V       A or C or G
       H       A or C or T
       D       A or G or T
       B       C or G or T
       X       G or A or T or C
       N       G or A or T or C
       .       G or A or T or C

     ------------------------------------------
     Symbol    Meaning
     ------------------------------------------
       A       Alanine
       C       Cysteine
       D       Aspartic Acid
       E       Glutamic Acid
       F       Phenylalanine
       G       Glycine
       H       Histidine
       I       Isoleucine
       K       Lysine
       L       Leucine
       M       Methionine
       N       Asparagine
       P       Proline
       Q       Glutamine
       R       Arginine
       S       Serine
       T       Threonine
       V       Valine
       W       Tryptophan
       Y       Tyrosine
       B       Any hydrophobic: IFVLWMAGCY
       Z       Any hydrophilic: TSHEDQNKR
       X       Any amino acid
       .       Any amino acid

Multiple Format Support
-----------------------

   Ultimately, this module should be able to build SeqPattern.pm objects using a variety of pattern formats such as ProSite, Blocks, Prints, GCG, etc. Currently, this module only supports patterns using a grep-like syntax.

USAGE
=====

   A simple demo script is included with the central Bioperl distribution (`INSTALLATION' in this node) and is also available from:

     http://bio.perl.org/Core/Examples/seq_pattern.pl

SEE ALSO
========

     Bio::Root::Object.pm    - Base class.
     Bio::Seq.pm             - Lightweight sequence object.

     http://bio.perl.org/Projects/modules.html  - Online module documentation
     http://bio.perl.org/                       - Bioperl Project Homepage

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org              - General discussion
     http://bio.perl.org/MailList.html  - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

   Steve A. Chervitz, sac@genome.stanford.edu

   See `FEEDBACK' in this node for where to send bug reports and comments.

VERSION
=======

   Bio::Tools::SeqPattern.pm, 0.011

COPYRIGHT
=========

   Copyright (c) 1997-8 Steve A. Chervitz. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

new
===

     Title     : new
     Usage     : my $seqpat = new Bio::Tools::SeqPattern(-SEQ => $pat, -TYPE => 'Dna');
     Purpose   : Verifies that the type is correct for superclass (Bio::Seq.pm)
               : and calls superclass constructor last.
     Returns   : n/a
     Argument  : Parameters passed to new()
     Throws    : Exception if the pattern string (seq) is empty.
     Comments  : The process of creating a new SeqPattern.pm object
               : ensures that the pattern string is untainted.
     See Also  : `_untaint_pat' in this node(), *Bio::Root::RootI::new()*,
               : *Bio::Seq::_initialize()*

alphabet_ok
===========

     Title     : alphabet_ok
     Usage     : $mypat->alphabet_ok;
     Purpose   : Checks for invalid regexp characters.
               : Overrides Bio::Seq::alphabet_ok() to allow
               : additional regexp characters ,.*()[]<>{}^$
               : in addition to the standard genetic alphabet.
               : Also untaints the pattern and sets the sequence
               : object's sequence to the untainted string.
     Returns   : Boolean (1 | 0)
     Argument  : n/a
     Throws    : Exception if the pattern contains invalid characters.
     Comments  : Does not call the superclass method.
               : Actually permits any alphanumeric, not just the
               : standard genetic alphabet.
     See Also  : *Bio::Seq::alphabet_ok()*, `_initialize' in this node()

expand
======

     Title     : expand
     Usage     : $seqpat_object->expand();
     Purpose   : Expands the sequence pattern using special ambiguity codes.
     Example   : $pat = $seq_pat->expand();
     Returns   : String containing fully expanded sequence pattern
     Argument  : n/a
     Throws    : Exception if sequence type is not recognized
               : (i.e., is not one of [DR]NA, Amino)
     See Also  : Extended Alphabet Support, `_expand_pep' in this node(),
               : `_expand_nuc' in this node()

_expand_pep
===========

     Title     : _expand_pep
     Usage     : n/a; automatically called by expand()
     Purpose   : Expands peptide patterns
     Returns   : String (the expanded pattern)
     Argument  : String (the unexpanded pattern)
     Throws    : n/a
     See Also  : `expand' in this node(), `_expand_nuc' in this node()

_expand_nuc
===========

     Title     : _expand_nuc
     Purpose   : Expands nucleotide patterns
     Returns   : String (the expanded pattern)
     Argument  : String (the unexpanded pattern)
     Throws    : n/a
     See Also  : `expand' in this node(), `_expand_pep' in this node()

revcom
======

     Title     : revcom
     Usage     : revcom([1]);
     Purpose   : Forms a pattern capable of recognizing the reverse complement
               : version of a nucleotide sequence pattern.
     Example   : $pattern_object->revcom();
               : $pattern_object->revcom(1); ## returns expanded rev complement pattern.
     Returns   : Object reference for a new Bio::Tools::SeqPattern containing
               : the revcom of the current pattern as its sequence.
     Argument  : (1) boolean (optional) (default= false)
               : true : expand the pattern before rev-complementing.
               : false: don't expand pattern before or after rev-complementing.
     Throws    : Exception if called for amino acid sequence pattern.
     Comments  : This method permits the simultaneous searching of both
               : sense and anti-sense versions of a nucleotide pattern
               : by means of a grep-type of functionality in which any
               : number of patterns may be or-ed into the recognition
               : pattern.
               : Overrides Bio::Seq::revcom() and calls it first thing.
               : The order of _fixpat() calls is critical.
     See Also  : *Bio::Seq::revcom()*, `_fixpat_1' in this node(),
               : `_fixpat_2' in this node(), `_fixpat_3' in this node(),
               : `_fixpat_4' in this node(), `_fixpat_5' in this node(),
               : `_fixpat_6' in this node()

_fixpat_1
=========

     Title     : _fixpat_1
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all {7,5} --> {5,7}   (Part I)
               :          and [T^]  --> [^T]    (Part II)
               :          and *N    --> N*      (Part III)
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded pattern)
     Throws    : n/a
     See Also  : `revcom' in this node()

_fixpat_2
=========

     Title     : _fixpat_2
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all {5,7}Y  ---> Y{5,7}
               :          and {10,}.  ---> .{10,}
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed pattern)
     Throws    : n/a
     See Also  : `revcom' in this node()

_fixpat_3
=========

     Title     : _fixpat_3
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all {5,7}(XXX) ---> (XXX){5,7}
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed pattern)
     Throws    : n/a
     See Also  : `revcom' in this node()

_fixpat_4
=========

     Title     : _fixpat_4
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all {5,7}[XXX] ---> [XXX]{5,7}
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed pattern)
     Throws    : n/a
     See Also  : `revcom' in this node()

_fixpat_5
=========

     Title     : _fixpat_5
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all *[XXX]  ---> [XXX]*
               :          and *(XXX)  ---> (XXX)*
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed pattern)
     Throws    : n/a
     See Also  : `revcom' in this node()

_fixpat_6
=========

     Title     : _fixpat_6
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all ?Y{5,7}      ---> Y{5,7}?
               :          and ?(XXX){5,7}  ---> (XXX){5,7}?
               :          and ?[XYZ]{5,7}  ---> [XYZ]{5,7}?
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed pattern)
     Throws    : n/a
     See Also  : `revcom' in this node()

str
---

     Title   : str
     Usage   : $obj->str($newval)
     Function:
     Returns : value of str
     Args    : newvalue (optional)

type
----

     Title   : type
     Usage   : $obj->type($newval)
     Function:
     Returns : value of type
     Args    : newvalue (optional)

FOR DEVELOPERS ONLY
===================

Data Members
------------

   Information about the various data members of this module is provided for those wishing to modify or understand the code.
   Two things to bear in mind:

     1. Do NOT rely on these in any code outside of this module. All data members are prefixed with an underscore to signify that they are private. Always use accessor methods. If the accessor doesn't exist or is inadequate, create or modify an accessor (and let me know, too!).

     2. This documentation may be incomplete and out of date. It is easy for this documentation to become obsolete as this module is still evolving. Always double check this info and search for members not described here.

   An instance of Bio::Tools::SeqPattern.pm is a blessed reference to a hash containing all or some of the following fields:

     FIELD        VALUE
     ------------------------------------------------------------------------
     _rev     :   The corrected reverse complement of the fully expanded pattern.

     INHERITED DATA MEMBERS:

     _seq     :   (From Bio::Seq.pm) The original, unexpanded input sequence after untainting.
     _type    :   (From Bio::Seq.pm) 'Dna' or 'Amino'

File: pm.info, Node: Bio/Tools/SeqStats, Next: Bio/Tools/SeqWords, Prev: Bio/Tools/SeqPattern, Up: Module List

Object holding statistics for one particular sequence
*****************************************************

NAME
====

   Bio::Tools::SeqStats - Object holding statistics for one particular sequence

SYNOPSIS
========

     use Bio::PrimarySeq;
     use Bio::Tools::SeqStats;

     # build a primary nucleic acid or protein sequence object somehow
     # then build a statistics object from the sequence object

     $seqobj = Bio::PrimarySeq->new(-seq     => 'ACTGTGGCGTCAACTG',
                                    -moltype => 'dna',
                                    -id      => 'test');
     $seq_stats = Bio::Tools::SeqStats->new(-seq => $seqobj);

     # obtain a hash of counts of each type of monomer
     # (ie amino or nucleic acid)
     $hash_ref = $seq_stats->count_monomers();  # eg for DNA sequence
     foreach $base (sort keys %$hash_ref) {
         print "Number of bases of type ", $base, " = ", $hash_ref->{$base}, "\n";
     }

     # or obtain the count directly without creating a new statistics object
     $hash_ref = Bio::Tools::SeqStats->count_monomers($seqobj);
     foreach $base (sort keys %$hash_ref) {
         print "Number of bases of type ", $base, " = ", $hash_ref->{$base}, "\n";
     }

     # obtain hash of counts of each type of codon in a nucleic acid sequence
     $hash_ref = $seq_stats->count_codons();  # for nucleic acid sequence
     # or
     $hash_ref = Bio::Tools::SeqStats->count_codons($seqobj);

     # Obtain the molecular weight of a sequence. Since the sequence may contain
     # ambiguous monomers, the molecular weight is returned as a (reference to) a
     # two element array containing greatest lower bound (GLB) and least upper
     # bound (LUB) of the molecular weight
     $weight = $seq_stats->get_mol_wt();
     # or
     $weight = Bio::Tools::SeqStats->get_mol_wt($seqobj);
     print "Molecular weight of sequence ", $seqobj->id(),
           " is greater than ", $$weight[0],
           " and less than ", $$weight[1], "\n";

DESCRIPTION
===========

   Bio::Tools::SeqStats is a lightweight object for the calculation of simple statistical and numerical properties of a sequence. By "lightweight" I mean that only "primary" sequences are handled by the object. The calling script needs to create the appropriate primary sequence to be passed to SeqStats if statistics on a sequence feature are required. Similarly, if a codon count is desired for a frame-shifted sequence and/or a negative strand sequence, the calling script needs to create that sequence and pass it to the SeqStats object.

   SeqStats can be called in two distinct manners.
   If only a single computation is required on a given sequence object, the method can be called directly on the SeqStats class:

     $weight = Bio::Tools::SeqStats->get_mol_wt($seqobj);

   Alternately, if several computations will be required on a given sequence object, an "instance" statistics object can be constructed and used for the method calls:

     $seq_stats = Bio::Tools::SeqStats->new($seqobj);
     $monomers  = $seq_stats->count_monomers();
     $codons    = $seq_stats->count_codons();
     $weight    = $seq_stats->get_mol_wt();

   As currently implemented the object can return the following values from a sequence:

     * The molecular weight of the sequence: get_mol_wt()

     * The number of each type of monomer present: count_monomers()

     * The number of each codon present in a nucleic acid sequence: count_codons()

   For DNA (and RNA) sequences, single-stranded weights are returned. The molecular weights are calculated for neutral (i.e., not ionized) nucleic acids. The returned weight is the sum of the base-sugar-phosphate residues of the chain plus one weight of water to account for the additional OH on the phosphate of the 5' residue and the additional H on the sugar ring of the 3' residue. Note that this leads to a difference of 18 in calculated molecular weights compared to some other available programs (e.g., Informax VectorNTI).

   Note that since sequences may contain ambiguous monomers (e.g., "M" meaning "A" or "C" in a nucleic acid sequence), the method get_mol_wt returns a reference to a two-element array containing the greatest lower bound and least upper bound of the molecular weight. (For a sequence with no ambiguous monomers, the two elements of the returned array will be equal.) The method count_codons() handles ambiguous bases by simply counting all ambiguous codons together and issuing a warning to that effect.
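   The three calculations can be sketched in plain Perl, without Bioperl, to show where the two weight bounds come from. This is an illustration under stated assumptions, not SeqStats' code: DNA only, approximate average residue masses chosen for the example (the module uses its own table), and a hypothetical `ambiguous' key for the lumped codons. Each ambiguous base contributes its lightest possible residue to the lower bound and its heaviest to the upper bound.

```perl
use strict;
use warnings;

# Approximate average masses (Da) of DNA nucleotide residues in a chain.
# These are assumed, illustrative values, not the SeqStats table.
my %RESIDUE_WT = (A => 313.21, C => 289.18, G => 329.21, T => 304.20);
my $WATER      = 18.02;    # one water per chain, as described above

# Nucleotide ambiguity codes (same meanings as the SeqPattern table).
my %IUPAC = (M => 'AC', R => 'AG', W => 'AT', S => 'CG', Y => 'CT',
             K => 'GT', V => 'ACG', H => 'ACT', D => 'AGT', B => 'CGT',
             N => 'ACGT');

# Count each monomer (letter) in the sequence.
sub count_monomers {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, uc $seq;
    return \%count;
}

# Return [lower, upper] bounds of the single-stranded molecular weight.
sub mol_wt_bounds {
    my ($seq) = @_;
    my ($low, $high) = ($WATER, $WATER);
    my $counts = count_monomers($seq);
    for my $base (keys %$counts) {
        my $bases = exists $RESIDUE_WT{$base} ? $base : $IUPAC{$base};
        die "unknown monomer '$base'\n" unless defined $bases;
        my @wts = sort { $a <=> $b } map { $RESIDUE_WT{$_} } split //, $bases;
        $low  += $counts->{$base} * $wts[0];     # lightest possibility
        $high += $counts->{$base} * $wts[-1];    # heaviest possibility
    }
    return [$low, $high];
}

# Count codons in the first frame; every codon containing a letter other
# than A, C, G or T is lumped under one hypothetical 'ambiguous' key.
sub count_codons {
    my ($seq) = @_;
    my %codons;
    for (my $i = 0; $i + 3 <= length $seq; $i += 3) {
        my $codon = uc substr($seq, $i, 3);
        my $key   = $codon =~ /^[ACGT]{3}$/ ? $codon : 'ambiguous';
        $codons{$key}++;
    }
    return \%codons;
}

my $wt = mol_wt_bounds('ACGTM');
printf "MW between %.2f and %.2f\n", @$wt;
```

   For a sequence without ambiguity codes, every base has a single weight, so the two bounds are computed from identical terms and come out equal, matching the behaviour described for get_mol_wt.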
DEVELOPERS NOTES
================

   Ewan moved it from Bio::SeqStats to Bio::Tools::SeqStats.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org              - General discussion
     http://bio.perl.org/MailList.html  - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Peter Schattner
=========================

   Email schattner@alum.mit.edu

APPENDIX
========

   The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

count_monomers
--------------

     Title   : count_monomers
     Usage   : $rcount = $seq_stats->count_monomers();
             : or $rcount = Bio::Tools::SeqStats->count_monomers($seqobj);
     Function: Counts the number of each type of monomer (amino acid or
             : base) in the sequence.
     Example :
     Returns : Reference to a hash in which keys are letters of the
             : genetic alphabet used and values are number of occurrences
             : of the letter in the sequence.
     Args    : None or reference to sequence object
     Throws  : Throws an exception if type of sequence is unknown (ie not
             : amino or nucleic) or if unknown letter in alphabet.
             : Ambiguous elements are allowed.

get_mol_wt
----------

     Title   : get_mol_wt
     Usage   : $wt = $seq_stats->get_mol_wt();
             : or $wt = Bio::Tools::SeqStats->get_mol_wt($seqobj);
     Function: Calculate molecular weight of sequence
     Example :
     Returns : Reference to two element array containing lower and upper
             : bounds of the molecular weight. (For dna (and rna)
             : sequences, single-stranded weights are returned.) If
             : sequence contains no ambiguous elements, both entries in
             : array are equal to molecular weight of molecule.
     Args    : None or reference to sequence object
     Throws  : Exception if type of sequence is unknown (ie not amino or
             : nucleic) or if unknown letter in alphabet. Ambiguous
             : elements are allowed.

count_codons
------------

     Title   : count_codons
     Usage   : $rcount = $seq_stats->count_codons();
             : or $rcount = Bio::Tools::SeqStats->count_codons($seqobj);
     Function: Counts the number of each type of codon in a given frame
             : for a dna or rna sequence.
     Example :
     Returns : Reference to a hash in which keys are codons of the genetic
             : alphabet used and values are number of occurrences of the
             : codons in the sequence. All codons with "ambiguous" bases
             : are counted together.
     Args    : None or reference to sequence object
     Throws  : an exception if type of sequence is unknown or protein.

_is_alphabet_strict
-------------------

     Title   : _is_alphabet_strict
     Usage   :
     Function: internal function to determine whether there are
             : any ambiguous elements in the current sequence
     Example :
     Returns : 1 if strict alphabet is being used,
             : 0 if ambiguous elements are present
     Args    :
     Throws  : an exception if type of sequence is unknown (ie not amino
             : or nucleic) or if unknown letter in alphabet. Ambiguous
             : monomers are allowed.

_print_data
-----------

     Title   : _print_data
     Usage   : $seq_stats->_print_data() or Bio::Tools::SeqStats->_print_data();
     Function: Displays dna / rna parameters (used for debugging)
     Returns : 1
     Args    : None

   Used for debugging.

File: pm.info, Node: Bio/Tools/SeqWords, Next: Bio/Tools/Sigcleave, Prev: Bio/Tools/SeqStats, Up: Module List

Object holding n-mer statistics for one sequence
************************************************

NAME
====

   Bio::Tools::SeqWords - Object holding n-mer statistics for one sequence

SYNOPSIS
========

   Takes a sequence object from, e.g., an input stream, and creates an object for the purposes of holding n-mer word statistics about that sequence. The sequence can be nucleic acid or protein, but the module is probably most relevant for DNA.
   The words are counted in a non-overlapping manner, i.e., in the style of a codon table, but with any word length. For overlapping word counts, a sequence can be 'shifted' to remove the first character and then the count repeated. For counts on the opposite strand (DNA/RNA), a reverse complement method should be performed, and then the count repeated.

   Creating the SeqWords object, eg:

     use Bio::SeqIO;
     use Bio::Tools::SeqWords;

     my $inputstream = Bio::SeqIO->new(-file   => "seqfile",
                                       -format => 'Fasta');
     my $seqobj = $inputstream->next_seq();
     my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);

   or:

     use Bio::PrimarySeq;

     my $seqobj = Bio::PrimarySeq->new(-seq     => '[cut and paste a sequence here]',
                                       -moltype => 'dna',
                                       -id      => 'test');
     my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);

   obtain a hash of word counts, eg:

     my $hash_ref = $seq_word->count_words($word_length);

   display hash table, eg:

     my %hash = %$hash_ref;
     foreach my $key (sort keys %hash) {
         print "\n$key\t$hash{$key}";
     }

   or

     my $hash_ref = Bio::Tools::SeqWords->count_words($seqobj, $word_length);

DESCRIPTION
===========

   Bio::Tools::SeqWords is a featherweight object for the calculation of n-mer word occurrences in a single sequence. It is envisaged that the object will be useful for construction of scripts which use n-mer word tables as the raw material for statistical calculations; for instance, hexamer frequency for the calculation of coding potential, or the calculation of periodicity in repetitive DNA. Triplet frequency is already handled by Bio::Tools::SeqStats (author: Peter Schattner). There are a few possible applications for protein, e.g., hypothesised amino acid 7-mers in heat shock proteins, or proteins with multiple simple motifs. Sometimes these protein periodicities are best seen when the amino acid alphabet is truncated, e.g., the Shulman alphabet. Since there are quite a few of these shortened alphabets, this module does not specify any particular alphabet.

   See Synopsis above for object creation code.
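   The counting rule, together with the 'shift and repeat' approach from the SYNOPSIS for overlapping counts, can be sketched as two plain Perl functions on a string. The function names are hypothetical and this is not the module's own code; the two error conditions simply mirror the ones listed under count_words.

```perl
use strict;
use warnings;

# Non-overlapping n-mer counts, read like a codon table from the first
# character; a trailing fragment shorter than $k is ignored.
sub count_words {
    my ($seq, $k) = @_;
    die "word length must be a positive integer\n"
        unless defined $k && $k =~ /^\d+$/ && $k > 0;
    die "word length is longer than the sequence\n" if $k > length $seq;
    my %count;
    for (my $i = 0; $i + $k <= length $seq; $i += $k) {
        $count{ substr($seq, $i, $k) }++;
    }
    return \%count;
}

# Overlapping counts: drop the first character, count again, and sum the
# results over the $k possible frames.
sub count_words_overlapping {
    my ($seq, $k) = @_;
    die "word length must be a positive integer\n"
        unless defined $k && $k =~ /^\d+$/ && $k > 0;
    my %total;
    for my $shift (0 .. $k - 1) {
        last if $k > length($seq) - $shift;      # frame too short to count
        my $frame = count_words(substr($seq, $shift), $k);
        $total{$_} += $frame->{$_} for keys %$frame;
    }
    return \%total;
}

my $words = count_words('ACCGTCCGT', 4);
print "$_\t$words->{$_}\n" for sort keys %$words;    # ACCG 1, TCCG 1
```

   Summing over the $k shifted frames visits every window of length $k exactly once, so for ACCGTCCGT at word length 4 the overlapping version reports the six windows ACCG, CCGT (twice), CGTC, GTCC and TCCG.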
FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org              - General discussion
     http://bio.perl.org/MailList.html  - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

   Derek Gatherer, in the loosest sense of the word 'author'. The general shape of the module is lifted directly from Peter Schattner's SeqStats.pm module. The central subroutine to count the words is adapted from original code provided by Dave Shivak, in response to a query on the bioperl mailing list. At least 2 other people provided alternative means (equally good but not used in the end) of performing the same calculation. Thanks to all for your assistance.

APPENDIX
========

   The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

count_words
-----------

     Title   : count_words
     Usage   : $word_count = $seq_word->count_words($word_length);
             : or $word_count = Bio::Tools::SeqWords->count_words($seqobj, $word_length);
     Function: Counts non-overlapping words within a string;
             : any alphabet can be used
     Example : a sequence ACCGTCCGT, counted at word length 4,
             : will give the hash
             : ACCG 1, TCCG 1
     Returns : Reference to a hash in which keys are words (any length) of
             : the alphabet used and values are number of occurrences of
             : the word in the sequence.
     Args    : Word length as scalar and, reference to sequence object if
             : required
     Throws  : an exception if word length is not a positive integer or if
             : word length is longer than the sequence.
File: pm.info, Node: Bio/Tools/Sigcleave, Next: Bio/Tools/Sim4/Exon, Prev: Bio/Tools/SeqWords, Up: Module List

Bioperl object for sigcleave analysis
*************************************

NAME
====

   Bio::Tools::Sigcleave.pm - Bioperl object for sigcleave analysis

SYNOPSIS
========

Object Creation
---------------

     use Bio::Tools::Sigcleave ();

     $sigcleave_object = new Bio::Tools::Sigcleave(-file      => 'sigtest.aa',
                                                   -desc      => 'test sigcleave protein seq',
                                                   -type      => 'AMINO',
                                                   -threshold => '3.5',
                                                  );

   Sigcleave objects can be created via the same methods as Bio::Seq objects. The one additional parameter is "-threshold", which sets the score reporting limit for the algorithm. The above example shows a sigcleave object being created from a protein sequence file. See the Bio::Seq documentation for the other ways that objects can be created.

Object Methods & Accessors
--------------------------

     %raw_results      = $sigcleave_object->signals;
     $formatted_output = $sigcleave_object->pretty_print;

INSTALLATION
============

   This module is included with the central Bioperl distribution:

     http://bioperl.org/Core/Latest
     ftp://bioperl.org/pub/DIST

   Follow the installation instructions included in the README file.

DESCRIPTION
===========

   "Sigcleave" was a program distributed as part of the free EGCG add-on to earlier versions of the GCG Sequence Analysis package. From the EGCG documentation:

     SigCleave uses the von Heijne method to locate signal sequences, and to identify the cleavage site. The method is 95% accurate in resolving signal sequences from non-signal sequences with a cutoff score of 3.5, and 75-80% accurate in identifying the cleavage site. The program reports all hits above a minimum value.

   The EGCG Sigcleave program was written by Peter Rice (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs, CB10 1SA, UK).
   Since EGCG is no longer distributed for the latest versions of GCG, this code was developed to emulate the output of the original program as much as possible for those who lost access to sigcleave when upgrading to newer versions of GCG.

   There are 2 accessor methods for this object. "signals" will return a perl associative array (hash) containing the sigcleave scores keyed by amino acid position. "pretty_print" returns a formatted string similar to the output of the original sigcleave utility.

   In both cases, the "threshold" setting controls the score reporting level. If no value for threshold is passed in by the user, the code defaults to a reporting value of 3.5.

   In this implementation the accessor will never return any score/position pair which does not meet the threshold limit. This is slightly different from the behaviour of the 8.1 EGCG sigcleave program, which will report the highest of the under-threshold results if nothing else is found.

   Example of pretty_print output:

     SIGCLEAVE of sigtest from: 1 to 146

     Report scores over 3.5
     Maximum score 4.9 at residue 131

      Sequence:  FVILAAMSIQGSA-NLQTQWKSTASLALET
                 | (signal)    | (mature peptide)
                 118           131

      Other entries above 3.5

     Maximum score 3.7 at residue 112

      Sequence:  CSRQLFGWLFCKV-HPGAIVFVILAAMSIQGSANLQTQWKSTASLALET
                 | (signal)    | (mature peptide)
                 99            112

USAGE
=====

   No warranty implied or expressed. Use at your own risk :)

   Users unfamiliar with the original Sigcleave application should read the von Heijne papers. The emphasis here is on correctly replicating the calls that 8.1 EGCG sigcleave would make. This code has been tested against a non-redundant curated set of 405 Swissprot proteins representing secreted, non-secreted, membrane and transit proteins. Except for the EGCG sigcleave habit of reporting an under-threshold score if nothing better is found, the output was identical.

   The weight matrix in this code is for eukaryote signal sequences.
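   To make the scoring and threshold rule concrete, here is a toy position-weight-matrix scan in plain Perl. Almost everything about the matrix is an assumption: the offsets, residues and weights are invented, the window is much smaller than the one the real method uses, and keying each result by the 1-based position of the first mature residue is a guess at one reasonable convention. Only the reporting rule (keep scores greater than or equal to the threshold, 3.5 by default) follows the text above; the signals() function here is a hypothetical stand-in, not the module's accessor.

```perl
use strict;
use warnings;

# Toy weights indexed by offset from the cleavage site: -1 is the last
# residue of the signal peptide, 0 the first residue of the mature peptide.
# The numbers are INVENTED for illustration; they are neither von Heijne's
# matrix nor the one used by Bio::Tools::Sigcleave.
my %WEIGHTS = (
    -3 => { A => 1.5, V => 1.0, S => 0.5 },
    -1 => { A => 2.0, G => 1.5, S => 1.0 },
     0 => { Q => 0.5, E => 0.5 },
);
my $OTHER = -0.5;    # hypothetical weight for residues absent from a column

# Score every possible cleavage site; keep only scores >= threshold.
sub signals {
    my ($protein, $threshold) = @_;
    $threshold //= 3.5;                          # documented default
    my @offsets = sort { $a <=> $b } map { $_ + 0 } keys %WEIGHTS;
    my %hits;
    for my $site (-$offsets[0] .. length($protein) - 1 - $offsets[-1]) {
        my $score = 0;
        for my $off (@offsets) {
            my $aa = substr($protein, $site + $off, 1);
            $score += $WEIGHTS{$off}{$aa} // $OTHER;
        }
        # key: 1-based position of the first mature residue (assumption)
        $hits{ $site + 1 } = $score if $score >= $threshold;
    }
    return %hits;
}

my %sig = signals('MLAVAQK');
printf "position %d: score %.1f\n", $_, $sig{$_} for sort { $a <=> $b } keys %sig;
```

   Lowering the threshold only adds entries to the result; it never changes the score of a site, which is why the accessor can simply drop every under-threshold pair instead of falling back to the best of them as EGCG 8.1 did.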
   Please see the example script located in the bioperl distribution to see how this code can be used.

FEEDBACK
========

   When updating and maintaining a module, it helps to know that people are actually using it. Let us know if you find a bug, think this code is useful or have any improvements/features to suggest.

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

     bioperl-bugs@bio.perl.org
     http://bioperl.org/bioperl-bugs/

AUTHOR
======

   Chris Dagdigian, dag@sonsorol.org & others

VERSION
=======

   Bio::Tools::Sigcleave.pm, $Id: Sigcleave.pm,v 1.12 2000/12/29 07:43:27 lapp Exp $

COPYRIGHT
=========

   Copyright (c) 1999 Chris Dagdigian & others. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

REFERENCES / SEE ALSO
=====================

   von Heijne G. (1986) "A new method for predicting signal sequence cleavage sites." Nucleic Acids Res. 14, 4683-4690.

   von Heijne G. (1987) in "Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit" (Acad. Press, 1987), 113-117.

APPENDIX
========

   The following documentation describes the various functions contained in this module. Some functions are for internal use and are not meant to be called by the user; they are preceded by an underscore ("_").

_Analyze
========

     Title     : _Analyze
     Usage     : N/A. This is an internal method, not meant to be called
               : from outside the package.
     Purpose   : calculates sigcleave score and amino acid position for the
               : given protein sequence. The score reporting threshold can
               : be adjusted by passing in the "threshold" parameter during
               : object construction. If no threshold is passed in, the code
               : defaults to reporting any scores equal to or above 3.5.
     Returns   : nothing. results are added to the object
     Argument  : none.
     Throws    : nothing.
     Comments  : nothing.
     See Also  : n/a

threshold
=========

     Title     : threshold
     Usage     : $value = $sigcleave_object->threshold;
     Purpose   : Accessor method for the sigcleave score reporting threshold.
     Returns   : float.
     Argument  : none.
     Throws    : none.
     Comments  : none.
     See Also  : n/a

signals
=======

     Title     : signals
     Usage     : %sigcleave_results = $sigcleave_object->signals;
     Purpose   : Accessor method for sigcleave results
     Returns   : Associative array. The key value represents the amino acid
               : position and the value represents the score. Only scores
               : that are greater than or equal to the THRESHOLD value are
               : reported.
     Argument  : none.
     Throws    : none.
     Comments  : none.
     See Also  : threshold

pretty_print
============

     Title     : pretty_print
     Usage     : $output = $sigcleave_object->pretty_print;
               : print $sigcleave_object->pretty_print;
     Purpose   : Emulates the output of the EGCG Sigcleave utility.
     Returns   : A formatted string.
     Argument  : none.
     Throws    : none.
     Comments  : none.
     See Also  : n/a

File: pm.info, Node: Bio/Tools/Sim4/Exon, Next: Bio/Tools/Sim4/Results, Prev: Bio/Tools/Sigcleave, Up: Module List

A single exon determined by an alignment
****************************************

NAME
====

   Bio::Tools::Sim4::Exon - A single exon determined by an alignment

SYNOPSIS
========

     # See Bio::Tools::Sim4::Results for a description of the context.
     # an instance of this class is-a Bio::SeqFeature::SimilarityPair

     # coordinates of the exon (recommended way):
     print "exon from ", $exon->start(), " to ", $exon->end(), "\n";

     # the same (feature1() inherited from Bio::SeqFeature::FeaturePair)
     print "exon from ", $exon->feature1()->start(),
           " to ", $exon->feature1()->end(), "\n";
     # also the same (query() inherited from Bio::SeqFeature::SimilarityPair):
     print "exon from ", $exon->query()->start(),
           " to ", $exon->query()->end(), "\n";

     # coordinates on the matching EST (recommended way):
     print "matches on EST from ", $exon->est_hit()->start(),
           " to ", $exon->est_hit()->end(), "\n";

     # the same (feature2() inherited from Bio::SeqFeature::FeaturePair)
     print "matches on EST from ", $exon->feature2()->start(),
           " to ", $exon->feature2()->end(), "\n";
     # also the same (subject() inherited from Bio::SeqFeature::SimilarityPair):
     print "matches on EST from ", $exon->subject()->start(),
           " to ", $exon->subject()->end(), "\n";

DESCRIPTION
===========

   This class inherits from Bio::SeqFeature::SimilarityPair and represents an exon on a genomic sequence determined by similarity, that is, by aligning an EST sequence (using Sim4 in this case). Consequently, the notion of query and subject is always from the perspective of the genomic sequence: query refers to the genomic seq, subject to the aligned EST hit. Because of this, $exon->start(), $exon->end() etc. will always return what you expect.

   To get the coordinates on the matching EST, refer to the properties of the feature returned by `est_hit' in this node().

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
email or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Ewan Birney, Hilmar Lapp
=================================

   Email birney@sanger.ac.uk

   Hilmar Lapp or .

   Describe contact details here

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with an underscore (_).

percentage_id
-------------

     Title   : percentage_id
     Usage   : $obj->percentage_id($newval)
     Function: This is a synonym for 100 * $obj->est_hit()->frac_identical().
     Returns : value of percentage_id
     Args    : newvalue (optional)

est_hit
-------

     Title   : est_hit
     Usage   : $est_feature = $obj->est_hit();
     Function: Returns the EST hit pointing to (i.e., aligned to by Sim4)
               this exon (i.e., genomic region). At present, merely a
               synonym for $obj->feature2().
     Returns : A Bio::SeqFeatureI implementing object.
     Args    :


File: pm.info, Node: Bio/Tools/Sim4/Results, Next: Bio/Tools/WWW, Prev: Bio/Tools/Sim4/Exon, Up: Module List

Results of one Sim4 run
***********************

NAME
====

   Bio::Tools::Sim4::Results - Results of one Sim4 run

SYNOPSIS
========

     # to preset the order of EST and genomic file as given on the sim4
     # command line:
     $sim4 = Bio::Tools::Sim4::Results->new(-file => 'result.sim4',
                                            -estisfirst => 1);
     # to let the order be determined automatically (by length comparison):
     $sim4 = Bio::Tools::Sim4::Results->new( -file => 'sim4.results' );
     # filehandle:
     $sim4 = Bio::Tools::Sim4::Results->new( -fh   => \*INPUT );

     # parse the results
     while($exonset = $sim4->next_exonset()) {
         # $exonset is-a Bio::SeqFeature::Generic with Bio::Tools::Sim4::Exons
         # as sub features
         print "Delimited on sequence ", $exonset->seqname(),
               " from ", $exonset->start(), " to ", $exonset->end(), "\n";
         foreach $exon ( $exonset->sub_SeqFeature() ) {
             # $exon is-a Bio::SeqFeature::FeaturePair
             print "Exon from ", $exon->start, " to ", $exon->end,
                   " on strand ", $exon->strand(), "\n";
             # you can get out what it matched using the est_hit attribute
             $homol = $exon->est_hit();
             print "Matched to sequence ", $homol->seqname,
                   " at ", $homol->start, " to ", $homol->end, "\n";
         }
     }

     # essential if you gave a filename at initialization (otherwise the file
     # stays open)
     $sim4->close();

DESCRIPTION
===========

   The sim4 module provides a parser and results object for sim4 output.
The sim4 results are specialised types of SeqFeatures, meaning you can
add them to AnnSeq objects and manipulate them in the usual SeqFeature
manner.

   The sim4 Exon objects are Bio::SeqFeature::FeaturePair inherited
objects. The feature returned by $exon->est_hit() represents the
alignment on the matching sequence (normally an EST); its start and end
give the position of the hit on that sequence.
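
   As an illustration of the pairing just described, here is a minimal
plain-Perl sketch that does not use Bioperl. Hashes stand in for
Bio::Tools::Sim4::Exon objects; the field names and sample coordinates
are made up for illustration. Each exon pairs its genomic coordinates
with the coordinates of its EST hit, and the extent of an exon set is
taken as the smallest start to the largest end, which is the total
region that next_exonset() documents for its container.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Bio::Tools::Sim4::Exon objects. Each exon holds
# its coordinates on the genomic sequence plus the matching region on the
# EST (what $exon->est_hit() would return). Sample values are made up.
my @exons = (
    { start => 1200, end => 1350, strand => 1,
      est_hit => { seqname => 'EST_1', start => 1,   end => 151 } },
    { start => 2100, end => 2290, strand => 1,
      est_hit => { seqname => 'EST_1', start => 152, end => 342 } },
);

# Total region covered by a set of exons: smallest start to largest end.
sub exonset_extent {
    my @set = @_;
    my ($min, $max);
    for my $exon (@set) {
        $min = $exon->{start} if !defined $min || $exon->{start} < $min;
        $max = $exon->{end}   if !defined $max || $exon->{end}   > $max;
    }
    return ($min, $max);
}

my ($from, $to) = exonset_extent(@exons);
print "Exon set from $from to $to\n";
for my $exon (@exons) {
    my $hit = $exon->{est_hit};
    print "Exon from $exon->{start} to $exon->{end}",
          " matched to $hit->{seqname} at $hit->{start} to $hit->{end}\n";
}
```

   Real code would read the same values through $exon->start() and
$exon->est_hit()->start() instead of hash keys.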
   To make this module work sensibly you need to run

     sim4 genomic.fasta est.database.fasta

   or

     sim4 est.fasta genomic.database.fasta

   To get the sequence identifiers recorded for the first sequence, too,
use A=4 as the output option for sim4.

   One complication is that there are only two real possibilities for
the matching criteria: either one sequence needs reversing or not.
Because of this, it is impossible to tell whether the match is on the
forward or reverse strand of the genomic DNA. We solve this by assuming
that the genomic DNA is always forward. As a consequence, the strand
attribute of the matching EST is unknown, and the strand attribute of
the genomic DNA (i.e., the Exon object) reflects the direction of the
hit.

   See the documentation of parse_next_alignment() for the parser's
ability to deal with the different output format options of sim4.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
email or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Ewan Birney, Hilmar Lapp
=================================

   Email birney@sanger.ac.uk

   hlapp@gmx.net (or hilmar.lapp@pharma.novartis.com)

   Describe contact details here

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with an underscore (_).

analysis_method
---------------

     Usage   : $sim4->analysis_method();
     Purpose : Inherited method. Overridden to ensure that the name
               matches /sim4/i.
     Returns : String
     Argument: n/a

parse_next_alignment
--------------------

     Title   : parse_next_alignment
     Usage   : @exons = $sim4_result->parse_next_alignment;
               foreach $exon (@exons) {
                   # do something
               }
     Function: Parses the next alignment of the Sim4 result file and returns
               the found exons as an array of Bio::Tools::Sim4::Exon objects.
               Call this method repeatedly until an empty array is returned to
               get the results for all alignments.

               The $exon->seqname() attribute will be set to the identifier of
               the respective sequence for both sequences if A=4 was used in
               the sim4 run, and otherwise for the second sequence only. If the
               output does not contain the identifier, the filename stripped of
               path and extension is used instead. In addition, the full
               filename will be recorded for both features ($exon inherits from
               Bio::SeqFeature::SimilarityPair) as tag 'filename'. The length is
               accessible via the seqlength() attribute of $exon->query() and
               $exon->est_hit().

               Note that this method is capable of dealing with outputs
               generated with formats 0, 1, 3, and 4 (via the A=n option to
               sim4). It automatically determines which of the two sequences
               has been reversed, and adjusts the coordinates for that
               sequence. It will also detect whether the EST sequence(s) were
               given as first or as second file to sim4, unless this has been
               specified at creation time of the object.
     Example :
     Returns : An array of Bio::Tools::Sim4::Exon objects
     Args    :

next_exonset
------------

     Title   : next_exonset
     Usage   : $exonset = $sim4_result->next_exonset;
               print "Exons start at ", $exonset->start(),
                     " and end at ", $exonset->end(), "\n";
               foreach $exon ($exonset->sub_SeqFeature()) {
                   # do something
               }
     Function: Parses the next alignment of the Sim4 result file and returns
               the set of exons as a container of features. The container is
               itself a Bio::SeqFeature::Generic object, with the
               Bio::Tools::Sim4::Exon objects as sub features.
               Start, end, and strand of the container will represent the
               total region covered by the exons of this set.

               See the documentation of parse_next_alignment() for further
               reference about parsing and how the information is stored.
     Example :
     Returns : A Bio::SeqFeature::Generic object holding
               Bio::Tools::Sim4::Exon objects as sub features.
     Args    :

next_feature
------------

     Title   : next_feature
     Usage   : while($exonset = $sim4->next_feature()) {
                   # do something
               }
     Function: Does the same as next_exonset(). See there for documentation
               of the functionality. Call this method repeatedly until FALSE
               is returned.

               The returned object is actually a SeqFeatureI implementing
               object. This method is required for classes implementing the
               SeqAnalysisParserI interface, and is merely an alias for
               next_exonset() at present.
     Example :
     Returns : A Bio::SeqFeature::Generic object.
     Args    :


File: pm.info, Node: Bio/Tools/WWW, Next: Bio/Tools/pSW, Prev: Bio/Tools/Sim4/Results, Up: Module List

Bioperl manager for web resources related to biology.
*****************************************************

NAME
====

   Bio::Tools::WWW.pm - Bioperl manager for web resources related to
biology.

SYNOPSIS
========

Object Creation
---------------

     use Bio::Tools::WWW qw(:obj);

     $pdb = $BioWWW->home_url('pdb');

   There is no need to create a new Bio::Tools::WWW.pm object when the
`:obj' tag is used. This tag will import the static $BioWWW object
created by Bio::Tools::WWW.pm into your name space. This saves you from
having to call `new Bio::Tools::WWW'.

   You are free not to use the :obj tag and to create the object
yourself, but a Bio::Tools::WWW object is not configurable, so any
given script only needs a single copy.

INSTALLATION
============

   This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

   You also need to define URLs for the following variables in this
package:

     $Not_found_url : Generic page to show in place of a 404 error.
     $Tmp_url       : Web-accessible site used by scripts that need to
                      generate temporary, web-accessible files. The files
                      need not be HTML files, but keeping them on the same
                      disk as the server permits faster I/O from server
                      scripts.

DESCRIPTION
===========

   Bio::Tools::WWW is primarily a URL broker for a select set of sites
related to bioinformatics and genome analysis. It represents a biased,
non-exhaustive set. It might be more accurate to call this module
"Bio::Tools::URL.pm", but this module does handle some non-URL tasks
and may do more of this in the future. Having one module cover all
biologically relevant web utilities is more convenient, especially at
this early stage of development.

   Maintaining accurate URLs over time can be challenging as new web
sites spring up and old sites are reorganized. Because of this, the
URLs in this module are not guaranteed to be correct or exhaustive and
will require periodic updating.

URL Management
--------------

   By keeping URL management within Bio::Tools::WWW.pm, other generic
modules can easily access a variety of different web sites without
having to know about a potential multitude of specific modules
specialized for one database or another. A specific example is
Bio::Tools::Blast.pm, where the function blast_to_html() needs access
to different URLs in order to add database links to the Blast report.
An alternative approach would be to define multiple blast_to_html()
functions within modules specialized for Blast analyses of different
datasets. This, however, may create maintenance headaches when
updating the different versions of the function.

Complex Websites
----------------

   Websites with complex datasets may require special treatment within
this module.
As an example, URLs for the Saccharomyces Genome Database are clustered
separately in this module, due to (1) the different ways to access
information at this database and (2) the familiarity of the developer
with this database. Bio::SGD::WWW.pm inherits from Bio::Tools::WWW.pm
to permit access to the URLs provided by Bio::Tools::WWW.pm and to
SGD-specific HTML and images.

   The organization of Bio::Tools::WWW.pm is expected to evolve as
websites are born, die, and mutate their APIs.

SEE ALSO
========

     http://bio.perl.org/Projects/modules.html - Online module documentation
     http://bio.perl.org/                      - Bioperl Project Homepage

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

     vsns-bcd-perl@lists.uni-bielefeld.de      - General discussion
     vsns-bcd-perl-guts@lists.uni-bielefeld.de - Technically-oriented discussion
     http://bio.perl.org/MailList.html         - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
email or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

   Steve A. Chervitz, sac@genome.stanford.edu

VERSION
=======

   Bio::Tools::WWW.pm, 0.014

COPYRIGHT
=========

   Copyright (c) 1996-98 Steve A. Chervitz. All Rights Reserved. This
module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

APPENDIX
========

   Methods beginning with a leading underscore are considered private
and are intended for internal use by this module. They are not
considered part of the public interface and are described here for
documentation purposes only.

home_url
--------

     Usage     : $BioWWW->home_url()
     Purpose   : To obtain the homepage URL for a biological database or
               : resource.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments are:
               :   bioperl bioperl-schema biomoo bsm ebi emotif entrez
               :   expasy mips mmdb ncbi pir pfam pdb geneQuiz
               :   molMov pubmed sacch3d sgd scop swissProt webmol ypd
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : The URLs listed here do not represent a complete list.
               : Expect this to evolve and grow with time.
     See Also  : `search_url' in this node

search_url
----------

     Usage     : $BioWWW->search_url()
     Purpose   : To provide a URL stem for a search engine at a biological
               : database or resource.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments are:
               :   3db embl cath ec1 ec2 ec3 emotif_id entrez gb1 gb2
               :   gb3 gb4 gb5 medline mmdb pdb pdb_coord pfam pir_acc
               :   pdbSum molMov swpr swModel swprSearch scop scop_pdb scop_data
               :   ypd
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : Unlike the homepage URLs, this method does not return a complete
               : URL but a stem that must be further modified, typically by
               : appending data to it, before it can be used. The data appended
               : depends on the specific URL; typically, it is a database ID or
               : other unique identifier.
               : The requirements for each URL will be described here eventually.
               :
               : The URLs listed here do not represent a complete list.
               : Expect this to evolve and grow with time.
               :
               : Given this complexity, it may be useful to provide special methods
               : for these different URLs. This would, however, result in an
               : explosion of methods that might make this module less
               : maintainable and harder to use.
     See Also  : `home_url' in this node

stem_url
--------

     Usage     : $BioWWW->stem_url()
     Purpose   : To obtain the minimal stem URL for searching a biological
               : database or resource.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments are:
               :   emotif entrez pdb
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : The URL stems returned by this method are much more minimal
               : than those provided by search_url(). Use of these stems
               : requires knowledge of the CGI scripts that they invoke.
     See Also  : `search_url' in this node

viewer_url
----------

     Usage     : $BioWWW->viewer_url()
     Purpose   : To obtain the stem URL for a 3D viewer (RasMol, WebMol, Cn3D)
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments are:
               :   rasmol webmol cn3d java (java is an alias for webmol)
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : The 4-letter Brookhaven PDB identifier must be appended to the
               : URL provided by this method.
               : The URLs listed here do not represent a complete list.
               : Expect this to evolve and grow with time.

not_found_url
-------------

     Usage     : $BioWWW->not_found_url()
     Purpose   : To obtain the URL for a web page to be shown in place of a
               : 404 error.
     Returns   : String containing the URL (including "http://")
     Argument  : n/a
     Throws    : n/a
     Comments  : This URL should be customized as desired.

tmp_url
-------

     Usage     : $BioWWW->tmp_url()
     Purpose   : To obtain the URL for a temporary, web-accessible directory.
     Returns   : String containing the URL (including "http://")
     Argument  : n/a
     Throws    : n/a
     Comments  : This URL should be customized as desired.

search_link
-----------

     Usage     : $BioWWW->search_link(, , )
     Purpose   : Wrapper for search_url() that returns the URL within an HTML
               : anchor.
     Returns   : String containing the HTML anchor ( qq||)
     Argument  : = string to be used as argument for search_url()
               : = string to be appended to the search URL stem.
               : = string to be shown as the link text (default = ).
     Throws    : n/a
     Status    : Experimental
     See Also  : `search_url' in this node

viewer_link
-----------

     Usage     : $BioWWW->viewer_link(, , )
     Purpose   : Wrapper for viewer_url() that returns the complete URL within
               : an HTML anchor.
     Returns   : String containing the HTML anchor ( qq||)
     Argument  : = string to be used as argument for viewer_url()
               : = string to be appended to the viewer URL stem.
               : = string to be shown as the link text (default = ).
     Throws    : n/a
     Status    : Experimental
     See Also  : `viewer_url' in this node

html
----

     Usage     : $BioWWW->html()
     Purpose   : To obtain HTML-formatted text for frequently needed web-page
               : messages.
     Returns   : String containing the HTML anchor ( qq||)
     Argument  : String.
               : Currently acceptable arguments are:
               :   authority (mailto: link for webmaster; shows e-mail address as link)
               :   notify (wraps mailto:authority link with text for link "please notify us")
               :   ourFault ("this problem is our fault. If it persists ")
               :   trouble (same as ourFault but doesn't blame us for the problem)
               :   techDiff ("we are experiencing technical difficulties. Please stand by.")
     Throws    : n/a
     Comments  : The authority (webmaster) is imported from the Bio::Root::Global.pm
               : module. The value for $AUTHORITY should be set there, or
               : customize this module so that it doesn't use Bio::Root::Global.pm.

sgd_url
-------

     Usage     : $BioWWW->sgd_url()
     Purpose   : To obtain the webpage URL or search stem for SGD.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments (TODO).
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.
     See Also  : `search_url' in this node

s3d_url
-------

     Usage     : $BioWWW->s3d_url()
     Purpose   : To obtain the webpage URL or search stem for Sacch3D.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments (TODO).
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.
     See Also  : `search_url' in this node

sgd_stem_url
------------

     Usage     : $BioWWW->sgd_stem_url()
     Purpose   : To obtain the minimal stem URL for an SGD/Sacch3D CGI script.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments (TODO).
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.
     See Also  : `search_url' in this node

s3d_link
--------

     Usage     : $BioWWW->s3d_link(, , )
     Purpose   : Wrapper for s3d_url() that returns the complete URL within an
               : HTML anchor.
     Returns   : String containing the HTML anchor
     Argument  : = string to be used as argument for s3d_url()
               : = string to be appended to the s3d URL stem.
               : = string to be shown as the link text (default = ).
     Throws    : n/a
     Status    : Experimental
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.
     See Also  : `s3d_url' in this node, `sgd_link' in this node

sgd_link
--------

     Usage     : $BioWWW->sgd_link(, , )
     Purpose   : Wrapper for sgd_url() that returns the complete URL within an
               : HTML anchor.
     Returns   : String containing the HTML anchor
     Argument  : = string to be used as argument for sgd_url()
               : = string to be appended to the sgd URL stem.
               : = string to be shown as the link text (default = ).
     Throws    : n/a
     Status    : Experimental
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.
     See Also  : `sgd_url' in this node, `s3d_link' in this node

start_html
----------

     Usage     : $BioWWW->start_html()
     Purpose   : Prints the "Content-type: text/html\n\n\n" header.
     Returns   : n/a; This method prints the Content-type string shown above.
     Argument  : n/a
     Throws    : n/a
     Status    : Experimental
     Comments  : This method prevents redundant invocations, thus avoiding the
               : accidental printing of the "content-type..." on the page.
               : If using L. Stein's CGI.pm, this is similar to $query->header()
               : (Does CGI.pm prevent redundant invocation?)

redirect
--------

     Usage     : $BioWWW->redirect()
     Purpose   : Prints the header needed to redirect a web browser to a
               : supplied URL.
     Returns   : n/a; Prints the redirection header.
     Argument  : String containing the URL to be redirected to.
     Throws    : n/a
     Status    : Experimental

pre
---

     Usage     : $BioWWW->pre("text to be pre-formatted");
     Purpose   : To produce HTML for text that is not to be formatted by the
               : browser.
     Returns   : String containing the pre-formatted HTML.
     Argument  : String containing the text to be pre-formatted.
     Throws    : n/a
     Status    : Experimental
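
   A minimal sketch of what pre() is described as doing, written as a
standalone Perl function rather than the module's method. It assumes
the result is the text wrapped in the HTML <pre> element; escaping
&, < and > is an addition of this sketch so the text is displayed
literally, and the original method may not do it. The name pre_html is
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical standalone version of pre(): escape the characters HTML
# would interpret, then wrap the text in <pre> so the browser keeps its
# spacing and line breaks. (Escaping is an assumption of this sketch.)
sub pre_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    (my $escaped = $text) =~ s/([&<>])/$entity{$1}/g;
    return "<pre>$escaped</pre>";
}

print pre_html("score >= 3.5\n  A  C  G  T"), "\n";
```

   Because the substitution is a single pass, the "&" characters it
inserts are not themselves escaped a second time.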

strip_html
----------

     Usage     : $boolean = &strip_html( string_ref, [fast] );
     Purpose   : Removes HTML formatting from a supplied string.
     Returns   : Boolean: true if string was stripped, false if not.
     Argument  : string_ref = reference to a string containing the whole
               :              web page to be stripped.
               : fast = a non-zero value. Optional. If set, a faster
               :        but perhaps less thorough procedure is used for
               :        stripping. Default = not fast.
     Throws    : Exception if the argument is not a scalar reference.
     Comments  : Based on code originally written by Alex Dong Li
               : (ali@genet.sickkids.on.ca).
               : This is a more generic version of the function that appears
               : in Bio::Tools::Blast::HTML.pm
               : This version does not perform any Blast-specific stripping.
               :
               : This employs a simple method for removing tags that
               : will fail under the following conditions:
               :  1) if quoted > appears in a tag  (does this ever happen?)
               :  2) if a tag is split over multiple lines and this method is
               :     used to process one line at a time.
               :
               : Without fast mode, large HTML files can take exceedingly long to
               : strip (e.g., a 1 MB file with many tags can take 10 minutes,
               : versus 5 seconds in fast mode; try the swissprot yeast table).
               : If you know the HTML to be well-behaved (i.e., tags are not split
               : across multiple lines), use fast mode for large, dense files.
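
   The simple tag-removal method described above can be sketched in
plain Perl as follows. This illustrates the general technique under
stated assumptions and is not the module's code: strip_html_fast is a
hypothetical name, it implements only a single-regex approach, and like
the method described above it fails when a quoted > appears inside a
tag. Because it works on the whole string at once, a tag split across
lines is still removed as long as the entire page is passed in.

```perl
use strict;
use warnings;

# Hypothetical sketch: remove every '<...>' tag from the string referenced
# by $string_ref, in place. Returns 1 if anything was stripped, 0 if not.
# [^>]* also matches newlines, so a tag split across lines is removed when
# the whole page is passed as one string.
sub strip_html_fast {
    my ($string_ref) = @_;
    die "strip_html_fast: argument must be a scalar reference\n"
        unless ref($string_ref) eq 'SCALAR';
    # In scalar context s///g returns the number of substitutions made.
    my $count = ($$string_ref =~ s/<[^>]*>//g);
    return $count ? 1 : 0;
}

my $page = "<html><body>\n<b>YAL001C</b> TFC3\n</body></html>";
my $changed = strip_html_fast(\$page);
print "changed=$changed\n$page\n";
```

   Like strip_html(), the sketch takes a reference so that a large page
is modified in place rather than copied.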

Data Members
------------

   An instance of Bio::Tools::WWW.pm is a blessed reference to a hash
containing all or some of the following fields:

     FIELD           VALUE
     --------------------------------------------------------------
     _started_html   Defined on the initial invocation of start_html()
                     to avoid printing the "Content-type..." header more than once.
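
   The _started_html field amounts to a once-only guard on
start_html(). A minimal sketch of that pattern, assuming a hash-based
object as described in the table above; the package name WWWSketch is
hypothetical, and the header string printed here is the standard CGI
form rather than a copy of the module's exact output.

```perl
use strict;
use warnings;

package WWWSketch;    # hypothetical stand-in for Bio::Tools::WWW

sub new {
    my ($class) = @_;
    return bless {}, $class;    # blessed hash reference, as in the table
}

# Print the Content-type header only on the first call; the _started_html
# field records that the header has already been printed.
sub start_html {
    my ($self) = @_;
    return if $self->{_started_html};
    print "Content-type: text/html\n\n";
    $self->{_started_html} = 1;
    return;
}

package main;

my $www = WWWSketch->new;
$www->start_html;    # prints the header
$www->start_html;    # prints nothing: _started_html is already set
```

   Keeping the flag in the object rather than in a global means each
object tracks its own header state.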