This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: Bio/Tools/SeqPattern,  Next: Bio/Tools/SeqStats,  Prev: Bio/Tools/SeqAnal,  Up: Module List

Bioperl object for a sequence pattern or motif
**********************************************

NAME
====

   Bio::Tools::SeqPattern.pm - Bioperl object for a sequence pattern or
motif

SYNOPSIS
========

Object Creation
---------------

     use Bio::Tools::SeqPattern ();

     $pat1     = 'T[GA]AA...TAAT';
     $pattern1 = new Bio::Tools::SeqPattern(-SEQ =>$pat1, -TYPE =>'Dna');

     $pat2     = '[VILM]R(GXX){3,2}...[^PG]';
     $pattern2 = new Bio::Tools::SeqPattern(-SEQ =>$pat2, -TYPE =>'Amino');

INSTALLATION
============

   This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

   Follow the installation instructions included in the README file.

DESCRIPTION
===========

   The Bio::Tools::SeqPattern.pm module encapsulates generic data and
methods for manipulating regular expressions describing nucleic or amino
acid sequence patterns (a.k.a. "motifs").

   Bio::Tools::SeqPattern.pm is a concrete class that inherits from
*Bio::Seq.pm*.

   This class grew out of a need to have a standard module for doing
routine tasks with sequence patterns such as:

     -- Forming a reverse-complement version of a nucleotide sequence pattern
     -- Expanding patterns containing ambiguity codes
     -- Checking for invalid regexp characters
     -- Untainting yet preserving special characters in the pattern

   Other features to look for in the future:

     -- Full pattern syntax checking
     -- Conversion between expanded and condensed forms of the pattern

MOTIVATIONS
===========

   A key motivation for Bio::Tools::SeqPattern.pm is to have a way to
generate a reverse complement of a nucleotide sequence pattern.  This
makes possible simultaneous pattern matching on both sense and anti-sense
strands of a query sequence.

   In principle, one could do such a search, less efficiently, by testing
the pattern against both sense and anti-sense versions of a sequence.  It
is entirely equivalent to test a regexp containing both sense and
anti-sense versions of the *pattern* against one copy of the sequence.
The latter approach is much more efficient since:

     1) You need only one copy of the sequence.
     2) Only one regexp is executed.
     3) Regexp patterns are typically much smaller than sequences.
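   The equivalence of the two approaches can be checked with a small
sketch in plain Perl.  It does not use Bio::Tools::SeqPattern: the
anti-sense pattern is written out by hand, and the helper names
(revcom_seq, matches_both_strands_slow, matches_one_regexp) are invented
for illustration only.

```perl
use strict;
use warnings;

# Reverse-complement a plain DNA string (no ambiguity codes handled).
sub revcom_seq {
    my ($seq) = @_;
    my $rc = reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $sense     = 'T[GA]AA...TAAT';   # nucleotide pattern from the SYNOPSIS
my $antisense = 'ATTA...TT[CT]A';   # its reverse complement, derived by hand
my $both      = qr/(?:$sense|$antisense)/;

# Approach 1: test the sense pattern against both versions of the sequence.
sub matches_both_strands_slow {
    my ($seq) = @_;
    return ($seq =~ /$sense/ || revcom_seq($seq) =~ /$sense/) ? 1 : 0;
}

# Approach 2: test one combined regexp against one copy of the sequence.
sub matches_one_regexp {
    my ($seq) = @_;
    return $seq =~ $both ? 1 : 0;
}

for my $seq ('GGGTGAACCCTAATGGG', 'CCCATTAGGGTTCACCC', 'ACGTACGTACGT') {
    printf "%-20s slow=%d combined=%d\n", $seq,
        matches_both_strands_slow($seq), matches_one_regexp($seq);
}
```

   Both functions should agree on every input; the second never builds a
reversed copy of the sequence.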

   Patterns can be quite complex and it is often difficult to generate the
reverse complement pattern. The Bioperl SeqPattern.pm addresses this
problem, providing a convenient set of tools for working with biological
sequence regular expressions.

   Not all patterns have been tested. If you discover a pattern that is
not handled properly by Bio::Tools::SeqPattern.pm, please send me some
email (sac@genome.stanford.edu). Thanks.

OTHER FEATURES
==============

Extended Alphabet Support
-------------------------

   This module supports the same set of ambiguity codes for nucleotide
sequences as supported by *Bio::Seq.pm*. These ambiguity codes define the
behavior of the expand() method.  Amino acid alphabet support is different
from that of Seq.pm (see below).

     ------------------------------------------
     Symbol       Meaning      Nucleic Acid
     ------------------------------------------
      A            A           Adenine
      C            C           Cytosine
      G            G           Guanine
      T            T           Thymine
      U            U           Uracil
      M          A or C
      R          A or G        Any purine
      W          A or T
      S          C or G
      Y          C or T        Any pyrimidine
      K          G or T
      V        A or C or G
      H        A or C or T
      D        A or G or T
      B        C or G or T
      X      G or A or T or C
      N      G or A or T or C
      .      G or A or T or C

     ------------------------------------------
     Symbol           Meaning
     ------------------------------------------
     A        Alanine
     C        Cysteine
     D        Aspartic Acid
     E        Glutamic Acid
     F        Phenylalanine
     G        Glycine
     H        Histidine
     I        Isoleucine
     K        Lysine
     L        Leucine
     M        Methionine
     N        Asparagine
     P        Proline
     Q        Glutamine
     R        Arginine
     S        Serine
     T        Threonine
     V        Valine
     W        Tryptophan
     Y        Tyrosine

     B        Any hydrophobic: IFVLWMAGCY
     Z        Any hydrophilic: TSHEDQNKR
     X        Any amino acid
     .        Any amino acid
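   As an illustration of how the nucleotide table can drive pattern
expansion, here is a minimal sketch in plain Perl.  It is not the code
behind expand(), and the exact output format of expand() may differ; the
function name expand_nuc_sketch and the %IUPAC hash are assumptions made
for this example.  The '.' symbol is left untouched because it already
matches any base in a regexp.

```perl
use strict;
use warnings;

# Ambiguity codes from the nucleotide table, mapped to the bases they cover.
my %IUPAC = (
    M => 'AC',  R => 'AG',  W => 'AT',  S => 'CG',  Y => 'CT',  K => 'GT',
    V => 'ACG', H => 'ACT', D => 'AGT', B => 'CGT', X => 'ACGT', N => 'ACGT',
);

sub expand_nuc_sketch {
    my ($pat) = @_;
    my $out = '';
    # Walk the pattern token by token: an existing [...] class,
    # a {...} quantifier, or any single character.
    while ($pat =~ /\G(\[[^\]]*\]|\{[^}]*\}|.)/gs) {
        my $tok = $1;
        if ($tok =~ /^\[(\^?)(.*)\]$/s) {
            # Expand codes inside an existing class without nesting brackets.
            my ($neg, $body) = ($1, $2);
            $body =~ s/([MRWSYKVHDBXN])/$IUPAC{$1}/g;
            $out .= '[' . $neg . $body . ']';
        }
        elsif (length($tok) == 1 && exists $IUPAC{$tok}) {
            # A bare ambiguity code becomes a character class.
            $out .= '[' . $IUPAC{$tok} . ']';
        }
        else {
            # Plain bases, '.', quantifiers and other regexp characters.
            $out .= $tok;
        }
    }
    return $out;
}

print expand_nuc_sketch('GN{2,3}Y'), "\n";
```

   The sketch handles upper-case nucleotide codes only; amino acid codes
such as B and Z have different meanings and would need their own table.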

Multiple Format Support
-----------------------

   Ultimately, this module should be able to build SeqPattern.pm objects
using a variety of pattern formats such as ProSite, Blocks,
Prints, GCG, etc.  Currently, this module only supports patterns using a
grep-like syntax.

USAGE
=====

   A simple demo script is included with the central Bioperl distribution
(`INSTALLATION' in this node) and is also available from:

     http://bio.perl.org/Core/Examples/seq_pattern.pl

SEE ALSO
========

     Bio::Root::Object.pm    - Base class.
     Bio::Seq.pm             - Lightweight sequence object.

     http://bio.perl.org/Projects/modules.html  - Online module documentation
     http://bio.perl.org/                       - Bioperl Project Homepage

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules.  Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org              - General discussion
     http://bio.perl.org/MailList.html  - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution. Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

   Steve A. Chervitz, sac@genome.stanford.edu

   See the `FEEDBACK' section in this node for where to send bug reports
and comments.

VERSION
=======

   Bio::Tools::SeqPattern.pm, 0.011

COPYRIGHT
=========

   Copyright (c) 1997-8 Steve A. Chervitz. All Rights Reserved.  This
module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

new
===

     Title     : new
     Usage     : my $seqpat = new Bio::Tools::SeqPattern();
     Purpose   : Verifies that the type is correct for superclass (Bio::Seq.pm)
               : and calls superclass constructor last.
     Returns   : n/a
     Argument  : Parameters passed to new()
     Throws    : Exception if the pattern string (seq) is empty.
     Comments  : The process of creating a new SeqPattern.pm object
               : ensures that the pattern string is untainted.

   See Also   : `_untaint_pat' in this node(), *Bio::Root::RootI::new()*,
           *Bio::Seq::_initialize()*

alphabet_ok
===========

     Title     : alphabet_ok
     Usage     : $mypat->alphabet_ok;
     Purpose   : Checks for invalid regexp characters.
               : Overrides Bio::Seq::alphabet_ok() to allow
               : additional regexp characters ,.*()[]<>{}^$
               : in addition to the standard genetic alphabet.
               : Also untaints the pattern and sets the sequence
               : object's sequence to the untainted string.
     Returns   : Boolean (1 | 0)
     Argument  : n/a
     Throws    : Exception if the pattern contains invalid characters.
     Comments  : Does not call the superclass method.
               : Actually permits any alphanumeric, not just the
               : standard genetic alphabet.

   See Also   : *Bio::Seq::alphabet_ok()*, `_initialize' in this node()

expand
======

     Title     : expand
     Usage     : $seqpat_object->expand();
     Purpose   : Expands the sequence pattern using special ambiguity codes.
     Example   : $pat = $seq_pat->expand();
     Returns   : String containing fully expanded sequence pattern
     Argument  : n/a
     Throws    : Exception if sequence type is not recognized
               : (i.e., is not one of [DR]NA, Amino)

   See Also   : Extended Alphabet Support, `_expand_pep' in this node(),
`_expand_nuc' in this node()

_expand_pep
===========

     Title     : _expand_pep
     Usage     : n/a; automatically called by expand()
     Purpose   : Expands peptide patterns
     Returns   : String (the expanded pattern)
     Argument  : String (the unexpanded pattern)
     Throws    : n/a

   See Also   : `expand' in this node(), `_expand_nuc' in this node()

_expand_nuc
===========

     Title     : _expand_nuc
     Purpose   : Expands nucleotide patterns
     Returns   : String (the expanded pattern)
     Argument  : String (the unexpanded pattern)
     Throws    : n/a

   See Also   : `expand' in this node(), `_expand_pep' in this node()

revcom
======

     Title     : revcom
     Usage     : revcom([1]);
     Purpose   : Forms a pattern capable of recognizing the reverse complement
               : version of a nucleotide sequence pattern.
     Example   : $pattern_object->revcom();
               : $pattern_object->revcom(1); ## returns expanded rev complement pattern.
     Returns   : Object reference for a new Bio::Tools::SeqPattern containing
               : the revcom of the current pattern as its sequence.
     Argument  : (1) boolean (optional) (default= false)
               :     true : expand the pattern before rev-complementing.
               :     false: don't expand pattern before or after rev-complementing.
     Throws    : Exception if called for amino acid sequence pattern.
     Comments  : This method permits the simultaneous searching of both
               : sense and anti-sense versions of a nucleotide pattern
               : by means of a grep-type of functionality in which any
               : number of patterns may be or-ed into the recognition
               : pattern.
               : Overrides Bio::Seq::revcom() and calls it first thing.
               : The order of _fixpat() calls is critical.

   See Also   : *Bio::Seq::revcom()*, `_fixpat_1' in this node(),
`_fixpat_2' in this node(), `_fixpat_3' in this node(), `_fixpat_4' in
this node(), `_fixpat_5' in this node()
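   To see why reverse-complementing a pattern needs more than a string
reversal, note that reversing 'GA{2,3}C' character by character gives
'C}3,2{AG', which is no longer a usable regexp.  The sketch below avoids
the problem with a different technique from the one documented here: it
splits the pattern into units (a base or class plus its quantifier)
before reversing, instead of reversing first and repairing afterwards
with the _fixpat methods.  The function name revcom_pattern_sketch is
invented, and only the limited syntax listed in the comments is
supported (no groups, no ambiguity codes).

```perl
use strict;
use warnings;

# Reverse-complement a simple nucleotide pattern unit by unit.
# Supported syntax (an assumption of this sketch): single bases A, C, G, T,
# '.', and [...] classes of those bases, each optionally followed by
# {n}, {n,}, {n,m}, *, + or ?.
sub revcom_pattern_sketch {
    my ($pat) = @_;
    my @units;
    pos($pat) = 0;
    while ($pat =~ /\G((?:\[\^?[ACGT]+\]|[ACGT.])(?:\{\d+(?:,\d*)?\}|[*+?])?)/gc) {
        my $unit = $1;
        $unit =~ tr/ACGT/TGCA/;   # complement bases, keep brackets and counts
        push @units, $unit;
    }
    # With /c a failed match keeps pos(), so this detects leftover input.
    die "Unsupported pattern syntax in '$pat'\n"
        unless pos($pat) == length $pat;
    # Reversing whole units keeps each quantifier behind its own atom.
    return join '', reverse @units;
}

print revcom_pattern_sketch('T[GA]AA...TAAT'), "\n";
print revcom_pattern_sketch('GA{2,3}C'), "\n";
```

   Anything outside the supported syntax raises an exception rather than
producing a silently wrong pattern.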

_fixpat_1
=========

     Title     : _fixpat_1
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all {7,5} --> {5,7}     (Part I)
               :           and [T^] --> [^T]      (Part II)
               :           and *N   --> N*        (Part III)
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded pattern)
     Throws    : n/a

   See Also   : `revcom' in this node()

_fixpat_2
=========

     Title     : _fixpat_2
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all {5,7}Y ---> Y{5,7}
               :          and {10,}. ---> .{10,}
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed pattern)
     Throws    : n/a

   See Also   : `revcom' in this node()

_fixpat_3
=========

     Title     : _fixpat_3
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all {5,7}(XXX) ---> (XXX){5,7}
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed pattern)
     Throws    : n/a

   See Also   : `revcom' in this node()

_fixpat_4
=========

     Title     : _fixpat_4
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all {5,7}[XXX] ---> [XXX]{5,7}
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed  pattern)
     Throws    : n/a

   See Also   : `revcom' in this node()

_fixpat_5
=========

     Title     : _fixpat_5
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all *[XXX]  ---> [XXX]*
               :          and *(XXX)  ---> (XXX)*
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed pattern)
     Throws    : n/a

   See Also   : `revcom' in this node()

_fixpat_6
=========

     Title     : _fixpat_6
     Usage     : n/a; called automatically by revcom()
     Purpose   : Utility method for revcom()
               : Converts all ?Y{5,7}  ---> Y{5,7}?
               :          and ?(XXX){5,7}  ---> (XXX){5,7}?
               :          and ?[XYZ]{5,7}  ---> [XYZ]{5,7}?
     Returns   : String (the new, partially reversed pattern)
     Argument  : String (the expanded, partially reversed pattern)
     Throws    : n/a

   See Also   : `revcom' in this node()

str
---

     Title   : str
     Usage   : $obj->str($newval)
     Function:
     Returns : value of str
     Args    : newvalue (optional)

type
----

     Title   : type
     Usage   : $obj->type($newval)
     Function:
     Returns : value of type
     Args    : newvalue (optional)

FOR DEVELOPERS ONLY
===================

Data Members
------------

   Information about the various data members of this module is provided
for those wishing to modify or understand the code. Two things to bear in
mind:

  1. Do NOT rely on these in any code outside of this module.  All data
     members are prefixed with an underscore to signify that they are
     private.  Always use accessor methods. If the accessor doesn't exist
     or is inadequate, create or modify an accessor (and let me know,
     too!).

  2. This documentation may be incomplete and out of date.  It is easy for
     this documentation to become obsolete as this module is still
     evolving.  Always double check this info and search for members not
     described here.

   An instance of Bio::Tools::SeqPattern.pm is a blessed reference to a
hash containing all or some of the following fields:

     FIELD          VALUE
     ------------------------------------------------------------------------
     _rev     : The corrected reverse complement of the fully expanded pattern.

     INHERITED DATA MEMBERS:

     _seq     : (From Bio::Seq.pm) The original, unexpanded input sequence after untainting.
     _type    : (From Bio::Seq.pm) 'Dna' or 'Amino'


File: pm.info,  Node: Bio/Tools/SeqStats,  Next: Bio/Tools/SeqWords,  Prev: Bio/Tools/SeqPattern,  Up: Module List

Object holding statistics for one particular sequence
*****************************************************

NAME
====

   Bio::Tools::SeqStats - Object holding statistics for one particular
sequence

SYNOPSIS
========

     # build a primary nucleic acid or protein sequence object somehow
     # then build a statistics object from the sequence object

     $seqobj = Bio::PrimarySeq->new(-seq=>'ACTGTGGCGTCAACTG',
                                    -moltype => 'dna', -id => 'test');
     $seq_stats  =  Bio::Tools::SeqStats->new(-seq=>$seqobj);

     # obtain a hash of counts of each type of monomer
     # (ie amino or nucleic acid)

     $hash_ref = $seq_stats->count_monomers();  # eg for DNA sequence
     foreach $base ( sort keys %$hash_ref) {
         print "Number of bases of type ", $base, " = ", $$hash_ref{$base}, "\n";
     }
     # or obtain the count directly without creating a new statistics object
     $hash_ref = Bio::Tools::SeqStats->count_monomers($seqobj);
     foreach $base ( sort keys %$hash_ref) {
         print "Number of bases of type ", $base, " = ", $$hash_ref{$base}, "\n";
     }

     # obtain hash of counts of each type of codon in a nucleic acid sequence
     $hash_ref = $seq_stats->count_codons();  # for nucleic acid sequence
     #  or
     $hash_ref = Bio::Tools::SeqStats->count_codons($seqobj);

     # Obtain the molecular weight of a sequence. Since the sequence may
     # contain ambiguous monomers, the molecular weight is returned as
     # (a reference to) a two element array containing greatest lower
     # bound (GLB) and least upper bound (LUB) of the molecular weight

     $weight = $seq_stats->get_mol_wt();
     #  or
     $weight = Bio::Tools::SeqStats->get_mol_wt($seqobj);
     print "Molecular weight of sequence ", $seqobj->id(),
           " is greater than ", $$weight[0], " and less than ",
           $$weight[1], "\n";

DESCRIPTION
===========

   Bio::Tools::SeqStats is a lightweight object for the calculation of
simple statistical and numerical properties of a sequence. By
"lightweight" I mean that only "primary" sequences are handled by the
object.  The calling script needs to create the appropriate primary
sequence to be passed to SeqStats if statistics on a sequence feature are
required.  Similarly if a codon count is desired for a frame-shifted
sequence and/or a negative strand sequence, the calling script needs to
create that sequence and pass it to the SeqStats object.

   SeqStats can be called in two distinct manners.  If only a single
computation is required on a given sequence object, the method can be
called easily using the SeqStats object directly:

     $weight = Bio::Tools::SeqStats->get_mol_wt($seqobj);

   Alternately, if several computations will be required on a given
sequence object, an "instance" statistics object can be constructed and
used for the method calls:

     $seq_stats  =  Bio::Tools::SeqStats->new($seqobj);
     $monomers = $seq_stats->count_monomers();
     $codons = $seq_stats->count_codons();
     $weight = $seq_stats->get_mol_wt();

   As currently implemented the object can return the following values
from a sequence:

     * The molecular weight of the sequence: get_mol_wt()
     * The number of each type of monomer present: count_monomers()
     * The number of each codon present in a nucleic acid sequence:
       count_codons()

   For dna (and rna) sequences, single-stranded weights are returned. The
molecular weights are calculated for neutral - ie not ionized - nucleic
acids. The returned weight is the sum of the base-sugar-phosphate residues
of the chain plus one weight of water to account for the additional OH
on the phosphate of the 5' residue and the additional H on the sugar ring
of the 3' residue.  Note that this leads to a difference of 18 in
calculated molecular weights compared to some other available programs (eg
Informax VectorNTI).

   Note that since sequences may contain ambiguous monomers (eg "M"
meaning "A" or "C" in a nucleic acid sequence), the method get_mol_wt
returns a two-element array containing the greatest lower bound and least
upper bound of the molecule. (For a sequence with no ambiguous monomers,
the two elements of the returned array will be equal.) The method
count_codons() handles ambiguous bases by simply counting all ambiguous
codons together and issuing a warning to that effect.
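   The treatment of ambiguous codons can be pictured with a short sketch
in plain Perl.  It is not the SeqStats implementation: the function name
count_codons_sketch and the hash key 'ambiguous' under which such codons
are pooled are assumptions made for illustration.

```perl
use strict;
use warnings;

# Count codons in frame 1; any codon containing a symbol other than
# A, C, G, T or U is pooled under the (hypothetical) key 'ambiguous'.
sub count_codons_sketch {
    my ($seq) = @_;
    $seq = uc $seq;
    my %count;
    for (my $i = 0; $i + 3 <= length $seq; $i += 3) {
        my $codon = substr($seq, $i, 3);
        my $key   = $codon =~ /^[ACGTU]{3}$/ ? $codon : 'ambiguous';
        $count{$key}++;
    }
    warn "Ambiguous codons were counted together\n" if $count{ambiguous};
    return \%count;
}

my $codons = count_codons_sketch('ACTGTGGCGTCAACTG');   # SYNOPSIS sequence
print "$_ $codons->{$_}\n" for sort keys %$codons;
```

   A trailing partial codon (the final G in the SYNOPSIS sequence) is
ignored by this sketch.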

DEVELOPERS NOTES
================

   Ewan moved it from Bio::SeqStats to Bio::Tools::SeqStats

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org               - General discussion
     http://bio.perl.org/MailList.html   - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution.   Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR -  Peter Schattner
=========================

   Email schattner@alum.mit.edu

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

count_monomers
--------------

     Title   : count_monomers
     Usage   : $rcount = $seq_stats->count_monomers();
            or $rcount = Bio::Tools::SeqStats->count_monomers($seqobj);
     Function: Counts the number of each type of monomer (amino acid or
               base) in the sequence.
     Example :
     Returns : Reference to a hash in which keys are letters of the
               genetic alphabet used and values are number of occurrences
               of the letter in the sequence.
     Args    : None or reference to sequence object
     Throws  : Exception if type of sequence is unknown (ie not amino
               or nucleic) or if unknown letter in alphabet. Ambiguous
               elements are allowed.

get_mol_wt
----------

     Title   : get_mol_wt
     Usage   : $wt = $seq_stats->get_mol_wt() or
               $wt = Bio::Tools::SeqStats->get_mol_wt($seqobj);
     Function: Calculate molecular weight of sequence
     Example :

     Returns : Reference to two element array containing lower and upper
               bounds of molecule molecular weight. (For dna (and rna)
               sequences, single-stranded weights are returned.)  If
               sequence contains no ambiguous elements, both entries in
               array are equal to molecular weight of molecule.
     Args    : None or reference to sequence object
     Throws  : Exception if type of sequence is unknown (ie not amino or
               nucleic) or if unknown letter in alphabet. Ambiguous
               elements are allowed.

count_codons
------------

     Title   : count_codons
     Usage   : $rcount = $seq_stats->count_codons(); or
               $rcount = Bio::Tools::SeqStats->count_codons($seqobj);

     Function: Counts the number of each type of codons in a given frame
               for a dna or rna sequence.
     Example :
     Returns : Reference to a hash in which keys are codons of the genetic
               alphabet used and values are number of occurrences of the
               codons in the sequence. All codons with "ambiguous" bases
               are counted together.
     Args    : None or reference to sequence object

     Throws  : an exception if type of sequence is unknown or protein.

_is_alphabet_strict
-------------------

     Title   :   _is_alphabet_strict
     Usage   :
     Function: internal function to determine whether there are
               any ambiguous elements in the current sequence
     Example :
     Returns : 1 if strict alphabet is being used,
               0 if ambiguous elements are present
     Args    :

     Throws  : an exception if type of sequence is unknown (ie not amino
               or nucleic) or if unknown letter in alphabet. Ambiguous
               monomers are allowed.

_print_data
-----------

     Title   : _print_data
     Usage   : $seq_stats->_print_data() or Bio::Tools::SeqStats->_print_data();
     Function: Displays dna / rna parameters (used for debugging)
     Returns : 1
     Args    : None

   Used for debugging.


File: pm.info,  Node: Bio/Tools/SeqWords,  Next: Bio/Tools/Sigcleave,  Prev: Bio/Tools/SeqStats,  Up: Module List

Object holding n-mer statistics for one sequence
************************************************

NAME
====

   Bio::Tools::SeqWords - Object holding n-mer statistics for one sequence

SYNOPSIS
========

   Takes a sequence object from, eg, an inputstream, and creates an object
for the purposes of holding n-mer word statistics about that sequence.
The sequence can be nucleic acid or protein, but the module is probably
most relevant for DNA.  The words are counted in a non-overlapping manner,
ie. in the style of a codon table, but with any word length.  For
overlapping word counts, a sequence can be 'shifted' to remove the first
character and then the count repeated.  For counts on opposite strand
(DNA/RNA), a reverse complement method should be performed, and then the
count repeated.

   Creating the SeqWords object, eg:

     my $inputstream = Bio::SeqIO->new( -file   => "seqfile",
                                        -format => 'Fasta');
     my $seqobj = $inputstream->next_seq();
     my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);

   or:

     my $seqobj = Bio::PrimarySeq->new(-seq=>'[cut and paste a sequence here]',
                                       -moltype => 'dna', -id => 'test');
     my $seq_word  =  Bio::Tools::SeqWords->new(-seq => $seqobj);

   obtain a hash of word counts, eg:

     my $hash_ref = $seq_word->count_words($word_length);

   display hash table, eg:

     my %hash = %$hash_ref;
     foreach my $key(sort keys %hash)
     {
     	print "\n$key\t$hash{$key}";
     }

   or

     my $hash_ref = Bio::Tools::SeqWords->count_words($seqobj,$word_length);

DESCRIPTION
===========

   Bio::Tools::SeqWords is a featherweight object for the calculation of
n-mer word occurrences in a single sequence.  It is envisaged that the
object will be useful for construction of scripts which use n-mer word
tables as the raw material for statistical calculations; for instance,
hexamer frequency for the calculation of coding potential, or the
calculation of periodicity in repetitive DNA.  Triplet frequency is
already handled by Bio::Tools::SeqStats.pm (author: Peter Schattner).
There are a few possible
applications for protein, eg: hypothesised amino acid 7-mers in heat shock
proteins, or proteins with multiple simple motifs.  Sometimes these
protein periodicities are best seen when the amino acid alphabet is
truncated, eg Shulman alphabet.  Since there are quite a few of these
shortened alphabets, this module does not specify any particular alphabet.

   See Synopsis above for object creation code.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org                 - General discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution.  Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

   Derek Gatherer, in the loosest sense of the word 'author'.  The general
shape of the module is lifted directly from Peter Schattner's SeqStats.pm
module.  The central subroutine to count the words is adapted from
original code provided by Dave Shivak, in response to a query on the
bioperl mailing list.  At least 2 other people provided alternative means
(equally good but not used in the end) of performing the same calculation.
Thanks to all for your assistance.

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

count_words
-----------

     Title   : count_words
     Usage   : $word_count = $seq_word->count_words($word_length);
          or : $word_count = Bio::Tools::SeqWords->count_words($seqobj,$word_length);
     Function: Counts non-overlapping words within a string;
             : any alphabet is used
     Example : a sequence ACCGTCCGT, counted at word length 4,
             : will give the hash
             : ACCG 1, TCCG 1
     Returns : Reference to a hash in which keys are words (any length)
             : of the alphabet used and values are number of occurrences
             : of the word in the sequence.
     Args    : Word length as scalar and, if required, reference to
             : sequence object
     Throws  : Exception if word length is not a positive integer
             : or if word length is longer than the sequence.
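   The counting scheme, and the "shifting" trick for overlapping counts
mentioned in the SYNOPSIS, can be sketched in plain Perl as follows.
This is an illustration working on a bare string and not the module's
own code; the function names and the optional $offset argument are
invented for the example.

```perl
use strict;
use warnings;

# Count non-overlapping words of $word_length, starting at $offset.
sub count_words_sketch {
    my ($seq, $word_length, $offset) = @_;
    $offset //= 0;
    die "Word length must be a positive integer\n"
        unless defined $word_length && $word_length =~ /^[1-9][0-9]*$/;
    die "Word length is longer than the sequence\n"
        if $word_length > length $seq;
    my %count;
    for (my $i = $offset; $i + $word_length <= length $seq; $i += $word_length) {
        $count{ substr($seq, $i, $word_length) }++;
    }
    return \%count;
}

# Overlapping counts: repeat the count for every shift and merge the results.
sub count_overlapping_words_sketch {
    my ($seq, $word_length) = @_;
    my %total;
    for my $offset (0 .. $word_length - 1) {
        my $counts = count_words_sketch($seq, $word_length, $offset);
        $total{$_} += $counts->{$_} for keys %$counts;
    }
    return \%total;
}

my $words = count_words_sketch('ACCGTCCGT', 4);
print "$_ $words->{$_}\n" for sort keys %$words;
```

   For the sequence in the Example field this reproduces ACCG 1, TCCG 1;
a trailing fragment shorter than the word length is dropped.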


File: pm.info,  Node: Bio/Tools/Sigcleave,  Next: Bio/Tools/Sim4/Exon,  Prev: Bio/Tools/SeqWords,  Up: Module List

Bioperl object for sigcleave analysis
*************************************

NAME
====

   Bio::Tools::Sigcleave.pm - Bioperl object for sigcleave analysis

SYNOPSIS
========

Object Creation
---------------

     use Bio::Tools::Sigcleave ();

     $sigcleave_object = new Bio::Tools::Sigcleave(-file=>'sigtest.aa',
                                                   -desc=>'test sigcleave protein seq',
                                                   -type=>'AMINO',
                                                   -threshold=>'3.5',
                                                  );

   Sigcleave objects can be created via the same methods as Bio::Seq
objects. The one additional parameter is "-threshold" which sets the score
reporting limit for the algorithm. The above example shows a sigcleave
object being created from a protein sequence file. See the Bio::Seq
documentation to see the other ways that objects can be created.

Object Methods & Accessors
--------------------------

     %raw_results      = $sigcleave_object->signals;

     $formatted_output = $sigcleave_object->pretty_print;

INSTALLATION
============

   This module is included with the central Bioperl distribution:

     http://bioperl.org/Core/Latest
     ftp://bioperl.org/pub/DIST

   Follow the installation instructions included in the README file.

DESCRIPTION
===========

   "Sigcleave" was a program distributed as part of the free EGCG add-on to
earlier versions of the GCG Sequence Analysis package.

   From the EGCG documentation:

     SigCleave uses the von Heijne method to locate signal sequences, and
     to identify the cleavage site. The method is 95% accurate in
     resolving signal sequences from non-signal sequences with a cutoff
     score of 3.5, and 75-80% accurate in identifying the cleavage site.
     The program reports all hits above a minimum value.

   The EGCG Sigcleave program was written by Peter Rice (E-mail:
pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre, Wellcome
Trust Genome Campus, Hinxton, Cambs, CB10 1SA, UK).

   Since EGCG is no longer distributed for the latest versions of GCG,
this code was developed to emulate the output of the original program as
much as possible for those who lost access to sigcleave when upgrading to
newer versions of GCG.

   There are 2 accessor methods for this object. "signals" will return a
perl associative array containing the sigcleave scores keyed by amino acid
position.  "pretty_print" returns a formatted string similar to the output
of the original sigcleave utility.

   In both cases, the "threshold" setting controls the score reporting
level. If no value for threshold is passed in by the user, the code
defaults to a reporting value of 3.5.

   In this implementation the accessor will never return any score/position
pair which does not meet the threshold limit. This is slightly different
from the behaviour of the 8.1 EGCG sigcleave program, which will report the
highest of the under-threshold results if nothing else is found.
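   The reporting rule can be pictured with a small sketch in plain Perl
operating on a position/score hash of the kind "signals" returns.  It is
not the module's code: the function name filter_signals_sketch is
invented, the comparison is assumed to be "greater than or equal to",
and the entry for residue 45 is made-up data added only to show
something being filtered out.

```perl
use strict;
use warnings;

# Keep only the position => score pairs that reach the threshold.
sub filter_signals_sketch {
    my ($scores, $threshold) = @_;
    $threshold = 3.5 unless defined $threshold;   # documented default
    my %kept;
    for my $pos (keys %$scores) {
        $kept{$pos} = $scores->{$pos} if $scores->{$pos} >= $threshold;
    }
    return %kept;
}

# 131 and 112 are taken from the pretty_print example in this node;
# 45 => 2.1 is invented for the illustration.
my %all_scores = (131 => 4.9, 112 => 3.7, 45 => 2.1);
my %signals    = filter_signals_sketch(\%all_scores);

for my $pos (sort { $signals{$b} <=> $signals{$a} } keys %signals) {
    print "Score $signals{$pos} at residue $pos\n";
}
```

   Unlike the 8.1 EGCG program, the sketch returns an empty hash when no
score reaches the threshold.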

   Example of pretty_print output:

     SIGCLEAVE of sigtest from: 1 to 146

     Report scores over 3.5
     Maximum score 4.9 at residue 131

     Sequence:  FVILAAMSIQGSA-NLQTQWKSTASLALET
                | (signal)    | (mature peptide)
               118            131

     Other entries above 3.5

     Maximum score 3.7 at residue 112

     Sequence:  CSRQLFGWLFCKV-HPGAIVFVILAAMSIQGSANLQTQWKSTASLALET
                | (signal)    | (mature peptide)
                99            112

USAGE
=====

   No warranty implied or expressed. Use at your own risk :) Users
unfamiliar with the original Sigcleave application should read the von
Heijne papers.

   The emphasis here is on correctly replicating the calls that 8.1 EGCG
sigcleave would make. This code has been tested against a non-redundant
curated set of 405 Swissprot proteins representing secreted, non-secreted,
membrane and transit proteins. Except for the EGCG sigcleave habit of
reporting an under-threshold score if nothing better is found, the output
was identical.

   The weight matrix in this code is for eukaryote signal sequences.

   Please see the example script located in the bioperl distribution to
see how this code can be used.

FEEDBACK
========

   When updating and maintaining a module, it helps to know that people
are actually using it. Let us know if you find a bug, think this code is
useful or have any improvements/features to suggest.

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution. Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bioperl.org/bioperl-bugs/

AUTHOR
======

   Chris Dagdigian, dag@sonsorol.org  & others

VERSION
=======

   Bio::Tools::Sigcleave.pm, $Id: Sigcleave.pm,v 1.12 2000/12/29 07:43:27
lapp Exp $

COPYRIGHT
=========

   Copyright (c) 1999 Chris Dagdigian & others. All Rights Reserved.  This
module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

REFERENCES / SEE ALSO
=====================

   von Heijne G. (1986) "A new method for predicting signal sequence
cleavage sites."  Nucleic Acids Res. 14, 4683-4690.

   von Heijne G. (1987) in "Sequence Analysis in Molecular Biology:
Treasure Trove or Trivial Pursuit" (Acad. Press, (1987), 113-117).

APPENDIX
========

   The following documentation describes the various functions contained
in this module. Some functions are for internal use and are not meant to
be called by the user; they are preceded by an underscore ("_").

_Analyze
========

     Title     : _Analyze
     Usage     : N/A This is an internal method. Not meant to be called from outside
               : the package
               :
     Purpose   : calculates sigcleave score and amino acid position for the
               : given protein sequence. The score reporting threshold can
               : be adjusted by passing in the "threshold" parameter during
               : object construction. If no threshold is passed in, the code
               : defaults to reporting any scores equal to or above 3.5
               :
     Returns   : nothing. results are added to the object
     Argument  : none.
     Throws    : nothing.
     Comments  : nothing.
     See Also   : n/a

threshold
=========

     Title     : threshold
     Usage     : $value = $self->threshold
               :
     Purpose   : Accessor method for the sigcleave score reporting threshold.
               :
     Returns   : float.
               :
     Argument  : none.
     Throws    : none.
     Comments  : none.
     See Also   : n/a

signals
=======

     Title     : signals
     Usage     : %sigcleave_results = $sigcleave_object->signals;
               :
     Purpose   : Accessor method for sigcleave results
               :
     Returns   : Associative array. The key value represents the amino acid position
               : and the value represents the score. Only scores that
               : are greater than or equal to the THRESHOLD value are reported.
               :
     Argument  : none.
     Throws    : none.
     Comments  : none.
     See Also   : THRESHOLD

pretty_print
============

     Title     : pretty_print
     Usage     : $output = $sigcleave_object->pretty_print;
               : print $sigcleave_object->pretty_print;
               :
     Purpose   : Emulates the output of the EGCG Sigcleave
               : utility.
               :
     Returns   : A formatted string.
     Argument  : none.
     Throws    : none.
     Comments  : none.
     See Also   : n/a


File: pm.info,  Node: Bio/Tools/Sim4/Exon,  Next: Bio/Tools/Sim4/Results,  Prev: Bio/Tools/Sigcleave,  Up: Module List

A single exon determined by an alignment
****************************************

NAME
====

   Bio::Tools::Sim4::Exon - A single exon determined by an alignment

SYNOPSIS
========

     # See Bio::Tools::Sim4::Results for a description of the context.

     # an instance of this class is-a Bio::SeqFeature::SimilarityPair

     # coordinates of the exon (recommended way):
     print "exon from ", $exon->start(),
     	" to ", $exon->end(), "\n";

     # the same (feature1() inherited from Bio::SeqFeature::FeaturePair)
     print "exon from ", $exon->feature1()->start(),
     	" to ", $exon->feature1()->end(), "\n";
     # also the same (query() inherited from Bio::SeqFeature::SimilarityPair):
     print "exon from ", $exon->query()->start(),
     	" to ", $exon->query()->end(), "\n";

     # coordinates on the matching EST (recommended way):
     print "matches on EST from ", $exon->est_hit()->start(),
     	" to ", $exon->est_hit()->end(), "\n";

     # the same (feature2() inherited from Bio::SeqFeature::FeaturePair)
     print "matches on EST from ", $exon->feature2()->start(),
     	" to ", $exon->feature2()->end(), "\n";
     # also the same (subject() inherited from Bio::SeqFeature::SimilarityPair):
     print "matches on EST from ", $exon->subject()->start(),
     	" to ", $exon->subject()->end(), "\n";

DESCRIPTION
===========

   This class inherits from Bio::SeqFeature::SimilarityPair and represents
an exon on a genomic sequence determined by similarity, that is, by
aligning an EST sequence (using Sim4 in this case). Consequently, the
notion of query and subject is always from the perspective of the genomic
sequence: query refers to the genomic seq, subject to the aligned EST hit.
Because of this, $exon->start(), $exon->end() etc will always return what
you expect.

   To get the coordinates on the matching EST, refer to the properties of
the feature returned by `est_hit' in this node().

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org          - General discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution.  Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Ewan Birney, Hilmar Lapp
=================================

   Email birney@sanger.ac.uk Hilmar Lapp <hlapp@gmx.net> or
<hilmar.lapp@pharma.novartis.com>.

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

percentage_id
-------------

     Title   : percentage_id
     Usage   : $obj->percentage_id($newval)
     Function: This is a synonym for 100 * $obj->est_hit()->frac_identical().
     Returns : value of percentage_id
     Args    : newvalue (optional)

est_hit
-------

     Title   : est_hit
     Usage   : $est_feature = $obj->est_hit();
     Function: Returns the EST hit pointing to (i.e., aligned to by Sim4) this
               exon (i.e., genomic region). At present, merely a synonym for
               $obj->feature2().
     Returns : A Bio::SeqFeatureI-implementing object.
     Args    :


File: pm.info,  Node: Bio/Tools/Sim4/Results,  Next: Bio/Tools/WWW,  Prev: Bio/Tools/Sim4/Exon,  Up: Module List

Results of one Sim4 run
***********************

NAME
====

   Bio::Tools::Sim4::Results - Results of one Sim4 run

SYNOPSIS
========

     # to preset the order of EST and genomic file as given on the sim4
     # command line:
     $sim4 = Bio::Tools::Sim4::Results->new(-file => 'result.sim4',
                                            -estisfirst => 1);
     # to let the order be determined automatically (by length comparison):
     $sim4 = Bio::Tools::Sim4::Results->new( -file => 'sim4.results' );
     # filehandle:
     $sim4 = Bio::Tools::Sim4::Results->new( -fh   => \*INPUT );

     # parse the results
     while($exonset = $sim4->next_exonset()) {
         # $exonset is-a Bio::SeqFeature::Generic with Bio::Tools::Sim4::Exons
         # as sub features
         print "Delimited on sequence ", $exonset->seqname(),
               " from ", $exonset->start(), " to ", $exonset->end(), "\n";
         foreach $exon ( $exonset->sub_SeqFeature() ) {
             # $exon is-a Bio::SeqFeature::FeaturePair
             print "Exon from ", $exon->start, " to ", $exon->end,
                   " on strand ", $exon->strand(), "\n";
             # you can get out what it matched using the est_hit attribute
             $homol = $exon->est_hit();
             print "Matched to sequence ", $homol->seqname,
                   " at ", $homol->start, " to ", $homol->end, "\n";
         }
     }

     # essential if you gave a filename at initialization (otherwise the file
     # stays open)
     $sim4->close();

DESCRIPTION
===========

   The sim4 module provides a parser and results object for sim4 output.
The sim4 results are specialised types of SeqFeatures, meaning you can add
them to AnnSeq objects and manipulate them in the "normal" seqfeature
manner.

   The sim4 Exon objects are Bio::SeqFeature::FeaturePair inherited
objects. The $esthit = $exon->est_hit() is the alignment as a feature on
the matching object (normally, an EST), in which the start/end points are
where the hit lies.

   To make this module work sensibly you need to run

     sim4 genomic.fasta est.database.fasta
     or
     sim4 est.fasta genomic.database.fasta

   To get the sequence identifiers recorded for the first sequence, too,
use A=4 as output option for sim4.

   One fiddle here is that there are only two real possibilities to the
matching criteria: either one sequence needs reversing or not. Because of
this, it is impossible to tell whether the match is in the forward or
reverse strand of the genomic DNA. We solve this here by assuming that the
genomic DNA is always forward. As a consequence, the strand attribute of
the matching EST is unknown, and the strand attribute of the genomic DNA
(i.e., the Exon object) will reflect the direction of the hit.

   See the documentation of parse_next_alignment() for abilities of the
parser to deal with the different output format options of sim4.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org          - General discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution.  Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Ewan Birney, Hilmar Lapp
=================================

   Email birney@sanger.ac.uk       hlapp@gmx.net (or
hilmar.lapp@pharma.novartis.com)

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

analysis_method
---------------

     Usage     : $sim4->analysis_method();
     Purpose   : Inherited method. Overridden to ensure that the name matches
                 /sim4/i.
     Returns   : String
     Argument  : n/a

parse_next_alignment
--------------------

     Title   : parse_next_alignment
     Usage   : @exons = $sim4_result->parse_next_alignment;
               foreach $exon (@exons) {
                   # do something
               }
     Function: Parses the next alignment of the Sim4 result file and returns the
               found exons as an array of Bio::Tools::Sim4::Exon objects. Call
               this method repeatedly until an empty array is returned to get the
               results for all alignments.

     The $exon->seqname() attribute will be set to the identifier of the
     respective sequence for both sequences if A=4 was used in the sim4
     run, and otherwise for the second sequence only. If the output does
     not contain the identifier, the filename stripped of path and
     extension is used instead. In addition, the full filename
     will be recorded for both features ($exon inherits off
     Bio::SeqFeature::SimilarityPair) as tag 'filename'. The length
     is accessible via the seqlength() attribute of $exon->query() and
     $exon->est_hit().

     Note that this method is capable of dealing with outputs generated
     with format 0,1,3, and 4 (via the A=n option to sim4). It
     automatically determines which of the two sequences has been
     reversed, and adjusts the coordinates for that sequence. It will
     also detect whether the EST sequence(s) were given as first or as
     second file to sim4, unless this has been specified at creation
     time of the object.

     Example :
     Returns : An array of Bio::Tools::Sim4::Exon objects
     Args    :
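
   The fallback naming rule (filename stripped of path and extension) can
be sketched with the core File::Basename module. This is only an
illustration: name_from_file() is a hypothetical helper, the path is made
up, and stripping only the last extension is an assumption about what the
parser does.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: derive a fallback sequence name from a file name by
# dropping the directory part and the last extension.
sub name_from_file {
    my ($path) = @_;
    my ($name) = fileparse($path, qr/\.[^.]*/);
    return $name;
}

print name_from_file('/data/ests/est.database.fasta'), "\n";
print name_from_file('result.sim4'), "\n";
```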

next_exonset
------------

     Title   : next_exonset
     Usage   : $exonset = $sim4_result->next_exonset;
               print "Exons start at ", $exonset->start(),
                     " and end at ", $exonset->end(), "\n";
               foreach $exon ($exonset->sub_SeqFeature()) {
                   # do something
               }
     Function: Parses the next alignment of the Sim4 result file and returns the
               set of exons as a container of features. The container is itself
               a Bio::SeqFeature::Generic object, with the Bio::Tools::Sim4::Exon
               objects as sub features. Start, end, and strand of the container
               will represent the total region covered by the exons of this set.

     See the documentation of parse_next_alignment() for further
     reference about parsing and how the information is stored.

     Example :
     Returns : A Bio::SeqFeature::Generic object holding Bio::Tools::Sim4::Exon
               objects as sub features.
     Args    :
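
   How a container can represent the total region covered by its exons is
sketched below with plain hashes instead of feature objects.
exonset_extent() is a hypothetical helper, the coordinates are invented,
and taking the strand from the first exon assumes that all exons of a set
share one strand.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical stand-in for an exon set: each exon is a plain hash with
# start, end and strand. The container covers the total region of its exons.
sub exonset_extent {
    my @exons = @_;
    return {
        start  => min(map { $_->{start} } @exons),
        end    => max(map { $_->{end} } @exons),
        strand => $exons[0]{strand},   # assumes all exons share one strand
    };
}

# Invented coordinates, for illustration only.
my $set = exonset_extent(
    { start => 120, end => 310,  strand => 1 },
    { start => 480, end => 655,  strand => 1 },
    { start => 900, end => 1042, strand => 1 },
);
print "Exons start at $set->{start} and end at $set->{end}\n";
```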

next_feature
------------

     Title   : next_feature
     Usage   : while($exonset = $sim4->next_feature()) {
                      # do something
               }
     Function: Does the same as next_exonset(). See there for documentation of
               the functionality. Call this method repeatedly until FALSE is
               returned.

     The returned object is actually a SeqFeatureI implementing object.
     This method is required for classes implementing the
     SeqAnalysisParserI interface, and is merely an alias for
     next_exonset() at present.

     Example :
     Returns : A Bio::SeqFeature::Generic object.
     Args    :


File: pm.info,  Node: Bio/Tools/WWW,  Next: Bio/Tools/pSW,  Prev: Bio/Tools/Sim4/Results,  Up: Module List

Bioperl manager for web resources related to biology.
*****************************************************

NAME
====

   Bio::Tools::WWW.pm - Bioperl manager for web resources related to
biology.

SYNOPSIS
========

Object Creation
---------------

     use Bio::Tools::WWW qw(:obj);

     $pdb = $BioWWW->home_url('pdb');

   There is no need to create a new Bio::Tools::WWW.pm object when the
`:obj' tag is used. This tag will import the static $BioWWW object created
by Bio::Tools::WWW.pm into your name space. This saves you from having to
call `new Bio::Tools::WWW'.

   You are free to not use the :obj tag and create the object as you like,
but a Bio::Tools::WWW object is not configurable; any given script only
needs a single copy.

INSTALLATION
============

   This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

   You also need to define URLs for the following variables in this
package:

     $Not_found_url : Generic page to show in place of a 404 error.
     $Tmp_url       : Web-accessible site that is used for scripts that
                      need to generate temporary, web-accessible files.
                      The files need not necessarily be HTML files, but
                      being on the same disk as the server will permit
                      faster IO from server scripts.

DESCRIPTION
===========

   Bio::Tools::WWW is primarily a URL broker for a select set of sites
related to bioinformatics/genome analysis. It definitely represents a
biased, non-exhaustive set.  It might be more accurate to call this module
"Bio::Tools::URL.pm". But this module does handle some non-URL things and
it may do more of this in the future. Having one module to cover all
biologically relevant web utilities makes it more convenient, especially
at this early stage of development.

   Maintaining accurate URLs over time can be challenging as new web sites
spring up and old sites are re-organized. Because of this fact, the URLs
in this module are not guaranteed to be correct or exhaustive and will
require periodic updating.

URL Management
--------------

   By keeping URL management within Bio::Tools::WWW.pm, other generic
modules can easily access a variety of different web sites without having
to know about a potential multitude of specific modules specialized for
one database or another. A specific example of this is in
Bio::Tools::Blast.pm where the function blast_to_html() needs access to
different URLs in order to add database links to the Blast report. An
alternative approach would be to have multiple blast_to_html() functions
defined within modules specialized for Blast analyses of different
datasets. This, however, may create maintenance headaches when updating
the different versions of the function.

Complex Websites
----------------

   Websites with complex datasets may require special treatment within
this module. As an example, URLs for the Saccharomyces Genome Database are
clustered separately in this module, due to (1) the different ways to
access information at this database and (2) the familiarity of the
developer with this database. The Bio::SGD::WWW.pm inherits from
Bio::Tools::WWW.pm to permit access to the URLs provided by
Bio::Tools::WWW.pm and to SGD-specific HTML and images.

   The organization of Bio::Tools::WWW.pm is expected to evolve as
websites get born, die, and mutate their APIs.

SEE ALSO
========

     http://bio.perl.org/Projects/modules.html  - Online module documentation
     http://bio.perl.org/                       - Bioperl Project Homepage

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules.  Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     vsns-bcd-perl@lists.uni-bielefeld.de          - General discussion
     vsns-bcd-perl-guts@lists.uni-bielefeld.de     - Technically-oriented discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution. Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

   Steve A. Chervitz, sac@genome.stanford.edu

VERSION
=======

   Bio::Tools::WWW.pm, 0.014

COPYRIGHT
=========

   Copyright (c) 1996-98 Steve A. Chervitz. All Rights Reserved.  This
module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

APPENDIX
========

   Methods beginning with a leading underscore are considered private and
are intended for internal use by this module. They are not considered part
of the public interface and are described here for documentation purposes
only.

home_url
--------

     Usage     : $BioWWW->home_url(<string>)
     Purpose   : To obtain the homepage URL for a biological database or resource.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments are:
               :    bioperl  bioperl-schema  biomoo  bsm  ebi  emotif  entrez
               :    expasy  mips  mmdb  ncbi  pir  pfam  pdb  geneQuiz
               :    molMov  pubmed  sacch3d  sgd  scop  swissProt  webmol  ypd
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : The URLs listed here do not represent a complete list.
               : Expect this to evolve and grow with time.

   See Also   : `search_url' in this node()

search_url
----------

     Usage     : $BioWWW->search_url(<string>)
     Purpose   : To provide a URL stem for a search engine at a biological database
               : or resource.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments are:
               :   3db  embl  cath  ec1  ec2  ec3  emotif_id  entrez  gb1  gb2
               :   gb3  gb4  gb5  pdb  medline  mmdb  pdb  pdb_coord  pfam  pir_acc
               :   pdbSum  molMov  swpr  swModel  swprSearch  scop  scop_pdb  scop_data
               :   ypd
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : Unlike the homepage URLs, this method does not return a complete
               : URL but a stem which must be further modified, typically by
               : appending data to it, before it can be used. The data appended
               : depends on the specific URL; typically, it is a database ID or
               : other unique identifier.
               : The requirements for each URL will be described here eventually.
               :
               : The URLs listed here do not represent a complete list.
               : Expect this to evolve and grow with time.
               :
               : Given this complexity, it may be useful to provide special methods
               : for these different URLs. This would however result in an
               : explosion of methods that might make this module less
               : maintainable and harder to use.

   See Also   : `home_url' in this node()
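
   The idea of a stem that becomes usable only after data is appended can
be sketched as follows. Nothing here is taken from the module: the
example.org stems, the %SEARCH_STEM table, the full_search_url() helper
and the identifier are all placeholders.

```perl
use strict;
use warnings;

# Illustrative stems only: these example.org URLs are placeholders, not the
# values stored in Bio::Tools::WWW.
my %SEARCH_STEM = (
    pdb  => 'http://www.example.org/pdb/lookup?id=',
    swpr => 'http://www.example.org/swissprot/get?',
);

# Hypothetical helper: turn a stem plus an identifier into a usable URL.
# Warns and returns nothing if the site key is unknown.
sub full_search_url {
    my ($site, $id) = @_;
    my $stem = $SEARCH_STEM{$site};
    unless (defined $stem) {
        warn "Cannot resolve '$site' to a URL\n";
        return;
    }
    return $stem . $id;
}

print full_search_url('pdb', '1ABC'), "\n";
```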

stem_url
--------

     Usage     : $BioWWW->stem_url(<string>)
     Purpose   : To obtain the minimal stem URL for searching a biological database or resource.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments are:
               :    emotif  entrez  pdb
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : The URL stems returned by this method are much more minimal than
               : those provided by search_url(). Use of these stems requires knowledge
               : of the CGI scripts which they invoke.

   See Also   : `search_url' in this node()

viewer_url
----------

     Usage     : $BioWWW->viewer_url(<string>)
     Purpose   : To obtain the stem URL for a 3D viewer (RasMol, WebMol, Cn3D)
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments are:
               :    rasmol webmol cn3d java  (java is an alias for webmol)
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : The 4-letter Brookhaven PDB identifier must be appended to the
               : URL provided by this method.
               : The URLs listed here do not represent a complete list.
               : Expect this to evolve and grow with time.

not_found_url
-------------

     Usage     : $BioWWW->not_found_url()
     Purpose   : To obtain the URL for a web page to be shown in place of a 404 error.
     Returns   : String containing the URL (including "http://")
     Argument  : n/a
     Throws    : n/a
     Comments  : This URL should be customized as desired.

tmp_url
-------

     Usage     : $BioWWW->tmp_url()
     Purpose   : To obtain the URL for a temporary, web-accessible directory.
     Returns   : String containing the URL (including "http://")
     Argument  : n/a
     Throws    : n/a
     Comments  : This URL should be customized as desired.

search_link
-----------

     Usage     : $BioWWW->search_link(<site>, <value>, <text>)
     Purpose   : Wrapper for search_url() that returns the URL within an HTML anchor.
     Returns   : String containing the HTML anchor ( qq|<A HREF="http://...">...</A>| )
     Argument  : <site>  = string to be used as argument for search_url()
               : <value> = string to be appended to the search URL stem.
               : <text>  = string to be shown as the link text (default = <value>).
     Throws    : n/a
     Status    : Experimental

   See Also   : `search_url' in this node()
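
   A minimal sketch of the anchor-building step, assuming the URL stem has
already been looked up. make_link() is a hypothetical helper and the
example.org stem is a placeholder; only the defaulting of the link text to
the value follows the argument description above.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a URL stem plus value in an HTML anchor. The link
# text defaults to the value when no text is supplied.
sub make_link {
    my ($stem, $value, $text) = @_;
    $text = $value unless defined $text;
    return qq|<A HREF="$stem$value">$text</A>|;
}

my $stem = 'http://www.example.org/pdb/lookup?id=';    # placeholder stem
print make_link($stem, '1ABC'), "\n";
print make_link($stem, '1ABC', 'structure'), "\n";
```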

viewer_link
-----------

     Usage     : $BioWWW->viewer_link(<site>, <value>, <text>)
     Purpose   : Wrapper for viewer_url() that returns the complete URL within an HTML anchor.
     Returns   : String containing the HTML anchor ( qq|<A HREF="http://...">...</A>| )
     Argument  : <site>  = string to be used as argument for viewer_url()
               : <value> = string to be appended to the viewer URL stem.
               : <text>  = string to be shown as the link text (default = <value>).
     Throws    : n/a
     Status    : Experimental

   See Also   : `viewer_url' in this node()

html
----

     Usage     : $BioWWW->html(<string>)
     Purpose   : To obtain HTML-formatted text for frequently needed web-page messages.
     Returns   : String containing the HTML anchor ( qq|<A HREF="http://...">...</A>| )
     Argument  : String.
               : Currently acceptable arguments are:
               :   authority  (mailto: link for webmaster; shows e-mail address as link)
               :   notify     (wraps mailto:authority link with text for link "please notify us")
               :   ourFault   ("this problem is our fault. If it persists <notify-link>")
               :   trouble    (same as ourFault but doesn't blame us for the problem)
               :   techDiff   ("we are experiencing technical difficulties. Please stand by.")
     Throws    : n/a
     Comments  : The authority (webmaster) is imported from the Bio::Root::Global.pm
               : module. The value for $AUTHORITY should be set there, or
               : customize this module so that it doesn't use Bio::Root::Global.pm.

sgd_url
-------

     Usage     : $BioWWW->sgd_url(<string>)
     Purpose   : To obtain the webpage URL or search stem for SGD.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments (TODO).
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.

   See Also   : `search_url' in this node()

s3d_url
-------

     Usage     : $BioWWW->s3d_url(<string>)
     Purpose   : To obtain the webpage URL or search stem for Sacch3D.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments (TODO).
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.

   See Also   : `search_url' in this node()

sgd_stem_url
------------

     Usage     : $BioWWW->sgd_stem_url(<string>)
     Purpose   : To obtain the minimal stem URL for a SGD/Sacch3D CGI script.
     Returns   : String containing the URL (including "http://")
     Argument  : String
               : Currently acceptable arguments (TODO).
     Throws    : Warns if argument cannot be resolved to a URL.
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.

   See Also   : `search_url' in this node()

s3d_link
--------

     Usage     : $BioWWW->s3d_link(<site>, <value>, <text>)
     Purpose   : Wrapper for s3d_url() that returns the complete URL within an HTML anchor.
     Returns   : String containing the URL (including "http://")
     Argument  : <site>  = string to be used as argument for s3d_url()
               : <value> = string to be appended to the s3d URL stem.
               : <text>  = string to be shown as the link text (default = <value>).
     Throws    : n/a
     Status    : Experimental
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.

   See Also   : `s3d_url' in this node(), `sgd_link' in this node()

sgd_link
--------

     Usage     : $BioWWW->sgd_link(<site>, <value>, <text>)
     Purpose   : Wrapper for sgd_url() that returns the complete URL within an HTML anchor.
     Returns   : String containing the URL (including "http://")
     Argument  : <site>  = string to be used as argument for sgd_url()
               : <value> = string to be appended to the sgd URL stem.
               : <text>  = string to be shown as the link text (default = <value>).
     Throws    : n/a
     Status    : Experimental
     Comments  : This accessor is specialized for the Saccharomyces Genome Database.
               : It is possible that it will be moved to SGD::WWW.pm in the future.

   See Also   : `sgd_url' in this node(), `s3d_link' in this node()

start_html
----------

     Usage     : $BioWWW->start_html()
     Purpose   : Prints the "Content-type: text/html\n\n<HTML>\n" header.
     Returns   : n/a; This method prints the Content-type string shown above.
     Argument  : n/a
     Throws    : n/a
     Status    : Experimental
     Comments  : This method prevents redundant invocations thus avoiding the
               : accidental printing of the "content-type..." on the page.
               : If using L. Stein's CGI.pm, this is similar to $query->header()
               : (Does CGI.pm prevent redundant invocation?)
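
   One way to guard against redundant invocation is a flag stored on the
object, as sketched below. My::Page is a hypothetical class, not
Bio::Tools::WWW itself; the optional filehandle argument and the 0/1
return value are choices made for this example so the behaviour can be
checked.

```perl
use strict;
use warnings;

package My::Page;    # hypothetical class, not Bio::Tools::WWW itself

sub new { return bless {}, shift }

# Print the header to the given filehandle (STDOUT by default) only on the
# first call; later calls print nothing.
sub start_html {
    my ($self, $fh) = @_;
    $fh ||= \*STDOUT;
    return 0 if $self->{_started_html};
    print {$fh} "Content-type: text/html\n\n<HTML>\n";
    $self->{_started_html} = 1;
    return 1;
}

package main;

my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
my $page = My::Page->new;
$page->start_html($fh);
$page->start_html($fh);    # ignored: header already printed
close $fh;
print $out;
```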

redirect
--------

     Usage     : $BioWWW->redirect(<string>)
     Purpose   : Prints the header needed to redirect a web browser to a supplied URL.
     Returns   : n/a; Prints the redirection header.
     Argument  : String containing the URL to be redirected to.
     Throws    : n/a
     Status    : Experimental

pre
---

     Usage     : $BioWWW->pre("text to be pre-formatted");
     Purpose   : To produce HTML for text that is not to be formatted by the browser.
     Returns   : String containing the "<pre>" formatted html.
     Argument  : String containing the text to be pre-formatted.
     Throws    : n/a
     Status    : Experimental

strip_html
----------

     Usage     : $boolean = &strip_html( string_ref, [fast] );
     Purpose   : Removes HTML formatting from a supplied string.
     Returns   : Boolean: true if string was stripped, false if not.
     Argument  : string_ref = reference to a string containing the whole
               :              web page to be stripped.
               : fast = a non-zero value. Optional. If set, a faster
               :        but perhaps less thorough procedure is used for
               :        stripping. Default = not fast.
     Throws    : Exception if the argument is not a scalar reference.
     Comments  : Based on code originally written by Alex Dong Li
               : (ali@genet.sickkids.on.ca).
               : This is a more generic version of the function that appears
               : in Bio::Tools::Blast::HTML.pm
               : This version does not perform any Blast-specific stripping.
               :
               : This employs a simple method for removing tags that
               : will fail under the following conditions:
               :  1) if quoted > appears in a tag  (does this ever happen?)
               :  2) if a tag is split over multiple lines and this method is
               :     used to process one line at a time.
               :
               : Without fast mode, large HTML files can take exceedingly long times to
               : strip (e.g., a 1Meg file with many tags can take 10 minutes versus 5
               : seconds in fast mode; try the swissprot yeast table). If you know the
               : HTML to be well-behaved (i.e., tags are not split across multiple
               : lines), use fast mode for large, dense files.
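
   The simple tag-removal idea and its first failure condition can be
sketched as below. strip_tags() is a hypothetical re-implementation for
illustration; it is not the code used by this module, it ignores the
"fast" option, and the sample strings are made up.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the simple tag-removal idea. Takes a
# reference to a string, edits it in place, and returns true if any tag
# was removed.
sub strip_tags {
    my ($string_ref) = @_;
    die "Argument must be a scalar reference\n"
        unless ref($string_ref) eq 'SCALAR';
    my $removed = ($$string_ref =~ s/<[^>]*>//g);
    return $removed ? 1 : 0;
}

my $page = '<HTML><B>YAL001C</B> TFC3</HTML>';
my $ok   = strip_tags(\$page);
print "$page\n";

# Failure condition 1: a quoted ">" ends the tag early and leaves debris.
my $tricky = '<IMG ALT="a > b" SRC="x.gif">text';
strip_tags(\$tricky);
print "$tricky\n";
```

   The second string shows why a quoted ">" inside a tag defeats this
approach: everything after the first ">" is left in place.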

Data Members
------------

   An instance of Bio::Tools::WWW.pm is a blessed reference to a hash
containing all or some of the following fields:

     FIELD           VALUE
     --------------------------------------------------------------
     _started_html   Defined on the initial invocation of start_html()
                     to avoid printing the "Content-type..." header twice.