This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: Bio/Tools/CodonTable,  Next: Bio/Tools/ESTScan,  Prev: Bio/Tools/Blast/Sbjct,  Up: Module List

Bioperl codon table object
**************************

NAME
====

   Bio::Tools::CodonTable - Bioperl codon table object

SYNOPSIS
========

     This is a read-only class for all known codon tables.  The IDs are
     the ones used by nucleotide sequence databases.  All common IUPAC
     ambiguity codes for DNA, RNA and amino acids are recognized.

     # to use
     use Bio::Tools::CodonTable;

     # defaults to ID 1 "Standard"
     $myCodonTable   = Bio::Tools::CodonTable->new();
     $myCodonTable2  = Bio::Tools::CodonTable -> new ( -id => 3 );

     # change codon table
     $myCodonTable->id(5);

     # examine codon table
     print  join (' ', "The name of the codon table no.", $myCodonTable->id(4),
     	       "is:", $myCodonTable->name(), "\n");

     # translate a codon
     $aa = $myCodonTable->translate('ACU');
     $aa = $myCodonTable->translate('act');
     $aa = $myCodonTable->translate('ytr');

     # reverse translate an amino acid
     @codons = $myCodonTable->revtranslate('A');
     @codons = $myCodonTable->revtranslate('Ser');
     @codons = $myCodonTable->revtranslate('Glx');
     @codons = $myCodonTable->revtranslate('cYS', 'rna');

     # boolean tests
      print "Is a start\n"       if $myCodonTable->is_start_codon('ATG');
      print "Is a terminator\n"  if $myCodonTable->is_ter_codon('tar');
      print "Is an unknown\n"    if $myCodonTable->is_unknown_codon('JTG');

DESCRIPTION
===========

   Codon tables are also called translation tables or genetic codes since
that is what they try to represent. A more complete picture of the
full complexity of codon usage in various taxonomic groups is presented at
the NCBI Genetic Codes home page.

   CodonTable is a BioPerl class that knows all current translation tables
that are used by primary nucleotide sequence databases (GenBank, EMBL and
DDBJ). It provides methods to output information about tables and
relationships between codons and amino acids.

   This class and its methods recognize all common IUPAC ambiguity codes
for DNA, RNA and amino acids. The translation method follows the
conventions in EMBL and TREMBL databases.

   It is a nuisance to separate RNA and cDNA representations of nucleic
acid transcripts. The CodonTable object accepts codons of both types as
input and allows the user to set the mode for output when reverse
translating. Its default for output is DNA.

   Note: This class deals with individual codons and amino acids, only.
   Call it from your own objects to translate and reverse translate
longer sequences.
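
   As an illustration of that note, here is a minimal sketch of a
caller that walks a longer sequence three bases at a time. The names
translate_seq, %standard and $lookup are hypothetical and not part of
this module; the small lookup table only stands in for a codon table
object so that the example runs on its own.

```perl
use strict;
use warnings;

# Stand-in for a codon table object: the standard genetic code (table 1),
# written in the NCBI base order T, C, A, G.
my @bases = qw(t c a g);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %standard;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $standard{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Translate a nucleotide string codon by codon.  $codon_to_aa is any code
# reference that maps one codon to one amino acid letter.
sub translate_seq {
    my ($seq, $codon_to_aa) = @_;
    my $protein = '';
    # Step through the sequence in threes; a trailing partial codon is ignored.
    for (my $pos = 0; $pos + 3 <= length $seq; $pos += 3) {
        $protein .= $codon_to_aa->(substr($seq, $pos, 3));
    }
    return $protein;
}

# Accept DNA or RNA in either case; codons missing from the table become 'X'.
my $lookup = sub {
    my $codon = lc shift;
    $codon =~ tr/u/t/;
    return exists $standard{$codon} ? $standard{$codon} : 'X';
};

print translate_seq('ATGGCUTAA', $lookup), "\n";
```

   With a real table object, the same loop could be given
sub { $myCodonTable->translate($_[0]) } as its second argument.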

   The amino acid codes are IUPAC recommendations for common amino acids:

     A           Ala            Alanine
     R           Arg            Arginine
     N           Asn            Asparagine
     D           Asp            Aspartic acid
     C           Cys            Cysteine
     Q           Gln            Glutamine
     E           Glu            Glutamic acid
     G           Gly            Glycine
     H           His            Histidine
     I           Ile            Isoleucine
     L           Leu            Leucine
     K           Lys            Lysine
     M           Met            Methionine
     F           Phe            Phenylalanine
     P           Pro            Proline
     S           Ser            Serine
     T           Thr            Threonine
     W           Trp            Tryptophan
     Y           Tyr            Tyrosine
     V           Val            Valine
     B           Asx            Aspartic acid or Asparagine
     Z           Glx            Glutamine or Glutamic acid
     X           Xaa            Any or unknown amino acid

   It is worth noting that the "Bacterial" codon table no. 11 produces a
polypeptide that is, confusingly, identical to the standard one. The only
differences are in the available initiator codons.

   NCBI Genetic Codes home page:
http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c

   EBI Translation Table Viewer:
http://www.ebi.ac.uk/cgi-bin/mutations/trtables.cgi

   Amended ASN.1 version with ids 16 and 21 is at:
ftp://ftp.ebi.ac.uk/pub/databases/geneticcode/

   Thanks to Matteo diTomasso for the original Perl implementation of
these tables.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to the
Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org                         - General discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Heikki Lehvaslaiho
===========================

   Email:  heikki@ebi.ac.uk

   Address:

     EMBL Outstation, European Bioinformatics Institute
     Wellcome Trust Genome Campus, Hinxton
     Cambs. CB10 1SD, United Kingdom

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

id
--

     Title   : id
     Usage   : $obj->id(3); $id_integer = $obj->id();
     Function:

     Sets or returns the id of the translation table.  IDs are
     integers from 1 to 15, excluding 7 and 8 which have been
     removed as redundant. If an invalid ID is given the method
     returns 0, false.

     Example :
     Returns : value of id, a scalar, 0 if not a valid ID
     Args    : newvalue (optional)

name
----

     Title   : name
     Usage   : $obj->name()
     Function: returns the descriptive name of the translation table
     Example :
     Returns : A string
     Args    : None

translate
---------

     Title   : translate
     Usage   : $obj->translate('YTR')
     Function: Returns one letter amino acid code for a codon input.

     Returns 'X' for unknown codons and codons that code for
     more than one amino acid. Returns an empty string if input
     is not three characters long. Exceptions for these are:

     - IUPAC amino acid code B is used when the codon codes for
       Aspartic Acid or Asparagine.
     - IUPAC amino acid code Z is used when the codon codes for
       Glutamic Acid or Glutamine.
     - if the codon is two nucleotides long and if, by adding
       a third character 'N', it codes for a single amino
       acid (with exceptions above), return that, otherwise
       return empty string.

     Returns empty string for other input strings that are not
     three characters long.

     Example :
     Returns : One letter ambiguous IUPAC amino acid code
     Args    : a codon = a three character, ambiguous IUPAC nucleotide string
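
   The rules above can be pictured as a reduction step: once an
ambiguous codon has been expanded and each expansion translated, the
resulting amino acids are collapsed to one IUPAC letter. The sketch
below shows only that reduction. The name collapse_amino_acids is
hypothetical, and this is not the module's own code.

```perl
use strict;
use warnings;

# Reduce the amino acids encoded by the expansions of one ambiguous codon
# to a single IUPAC letter.
sub collapse_amino_acids {
    my @aa = @_;
    return '' unless @aa;                  # nothing to report
    my %seen = map { $_ => 1 } @aa;
    my $set  = join '', sort keys %seen;   # distinct residues, sorted
    return $set if length($set) == 1;      # all expansions agree
    return 'B'  if $set eq 'DN';           # Aspartic acid or Asparagine
    return 'Z'  if $set eq 'EQ';           # Glutamic acid or Glutamine
    return 'X';                            # any other mixture is unknown
}

# 'ytr' expands to cta, ctg, tta, ttg, which all code for Leucine.
print collapse_amino_acids(qw(L L L L)), "\n";
# 'ray' expands to aac, aat, gac, gat: Asparagine and Aspartic acid.
print collapse_amino_acids(qw(N N D D)), "\n";
```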

translate_strict
----------------

     Title   : translate_strict
     Usage   : $obj->translate_strict('ACT')
     Function: returns one letter amino acid code for a codon input

     Fast and simple translation. The user is responsible for
     resolving ambiguous nucleotide codes before calling this
     method. Returns 'X' for unknown codons and an empty string
     for input strings that are not three characters long.

     It is not recommended to use this method in a production
     environment. Use method translate, instead.

     Example :
     Returns : A string
     Args    : a codon = a three nucleotide character string

revtranslate
------------

     Title   : revtranslate
     Usage   : $obj->revtranslate('G')
     Function: returns codons for an amino acid

     Returns an empty string for unknown amino acid
     codes. Ambiguous IUPAC codes Asx,B, (Asp,D; Asn,N) and
     Glx,Z (Glu,E; Gln,Q) are resolved. Both single and three
     letter amino acid codes are accepted. '*' and 'Ter' are
     used for terminator.

     By default, the output codons are shown in DNA.  If the
     output is needed in RNA (tr/t/u/), add a second argument
     'RNA'.

     Example : $obj->revtranslate('Gly', 'RNA')
     Returns : An array of three lower case letter strings i.e. codons
     Args    : amino acid, 'RNA'
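
   A minimal sketch of the reverse lookup described above, assuming
the standard table (ID 1) and one-letter codes only. The names
revtranslate_sketch, %codons_for and %expand are hypothetical; three
letter codes and the other tables are left out to keep it short.

```perl
use strict;
use warnings;

# Build amino acid -> codons from the standard genetic code (table 1),
# written in the NCBI base order T, C, A, G.
my @bases = qw(t c a g);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codons_for;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            push @{ $codons_for{ substr($aas, $i++, 1) } }, "$b1$b2$b3";
        }
    }
}

# Ambiguous amino acid codes and the residues they stand for.
my %expand = (B => [qw(D N)], Z => [qw(E Q)]);

sub revtranslate_sketch {
    my ($aa, $mode) = @_;
    $aa = uc $aa;
    my @residues = exists $expand{$aa} ? @{ $expand{$aa} } : ($aa);
    # Unknown codes contribute no codons, so the result is an empty list.
    my @codons = map { @{ $codons_for{$_} || [] } } @residues;
    if (defined $mode && lc($mode) eq 'rna') {
        tr/t/u/ for @codons;               # DNA -> RNA on the copies
    }
    return @codons;
}

print join(' ', revtranslate_sketch('c', 'rna')), "\n";
```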

is_start_codon
--------------

     Title   : is_start_codon
     Usage   : $obj->is_start_codon('ATG')
     Function: returns true (1) for all codons that can be used as a
               translation start, false (0) for others.
     Example : $myCodonTable->is_start_codon('ATG')
     Returns : boolean
     Args    : codon

is_ter_codon
------------

     Title   : is_ter_codon
     Usage   : $obj->is_ter_codon('GAA')
     Function: returns true (1) for all codons that can be used as a
               translation terminator, false (0) for others.
     Example : $myCodonTable->is_ter_codon('ATG')
     Returns : boolean
     Args    : codon

is_unknown_codon
----------------

     Title   : is_unknown_codon
     Usage   : $obj->is_unknown_codon('GAJ')
     Function: returns false (0) for all codons that are valid,
     	    true (1) for others.
     Example : $myCodonTable->is_unknown_codon('NTG')
     Returns : boolean
     Args    : codon

_unambiquous_codons
-------------------

     Title   : _unambiquous_codons
     Usage   : @codons = _unambiquous_codons('ACN')
     Function:
     Example :
     Returns : array of strings (one letter unambiguous amino acid codes)
     Args    : a codon = a three IUPAC nucleotide character string
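
   The Function field above is empty. One way such an expansion could
work is sketched here, using the standard IUPAC nucleotide ambiguity
codes. The name expand_codon and the choice of upper case DNA output
are assumptions for illustration, not the behaviour of this module.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes, expanded to DNA bases.
my %iupac = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',   U => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Return every unambiguous codon that an ambiguous codon can stand for.
sub expand_codon {
    my ($codon) = @_;
    my @expanded = ('');
    for my $symbol (split //, uc $codon) {
        return () unless exists $iupac{$symbol};   # not an IUPAC code
        my @choices = split //, $iupac{$symbol};
        my @next;
        for my $prefix (@expanded) {
            push @next, map { $prefix . $_ } @choices;
        }
        @expanded = @next;
    }
    return @expanded;
}

print join(' ', expand_codon('ACN')), "\n";
```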


File: pm.info,  Node: Bio/Tools/ESTScan,  Next: Bio/Tools/Fasta,  Prev: Bio/Tools/CodonTable,  Up: Module List

Results of one ESTScan run
**************************

NAME
====

   Bio::Tools::ESTScan - Results of one ESTScan run

SYNOPSIS
========

     $estscan = Bio::Tools::ESTScan->new(-file => 'result.estscan');
     # filehandle:
     $estscan = Bio::Tools::ESTScan->new( -fh  => \*INPUT );

     # parse the results
     # note: this class is-a Bio::Tools::AnalysisResult which implements
     # Bio::SeqAnalysisParserI, i.e., $estscan->next_feature() is the same
     while($gene = $estscan->next_prediction()) {
         # $gene is an instance of Bio::Tools::Prediction::Gene
         foreach my $orf ($gene->exons()) {
     	   # $orf is an instance of Bio::Tools::Prediction::Exon
     	   $cds_str = $orf->predicted_cds();
         }
     }

     # essential if you gave a filename at initialization (otherwise the file
     # will stay open)
     $estscan->close();

DESCRIPTION
===========

   The ESTScan module provides a parser for ESTScan coding region
prediction output.

   This module inherits from *Note Bio/Tools/AnalysisResult:
Bio/Tools/AnalysisResult, and therefore implements the *Note
Bio/SeqAnalysisParserI: Bio/SeqAnalysisParserI, interface.  See *Note
Bio/SeqAnalysisParserI: Bio/SeqAnalysisParserI,.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org          - General discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Hilmar Lapp
====================

   Email hlapp@gmx.net (or hilmar.lapp@pharma.novartis.com)

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

analysis_method
---------------

     Usage     : $estscan->analysis_method();
     Purpose   : Inherited method. Overridden to ensure that the name matches
                 /estscan/i.
     Returns   : String
     Argument  : n/a

next_feature
------------

     Title   : next_feature
     Usage   : while($orf = $estscan->next_feature()) {
                      # do something
               }
     Function: Returns the next gene structure prediction of the ESTScan result
               file. Call this method repeatedly until FALSE is returned.

     The returned object is actually a SeqFeatureI implementing object.
     This method is required for classes implementing the
     SeqAnalysisParserI interface, and is merely an alias for
     next_prediction() at present.

     Example :
     Returns : A Bio::Tools::Prediction::Gene object.
     Args    :

next_prediction
---------------

     Title   : next_prediction
     Usage   : while($gene = $estscan->next_prediction()) {
                      # do something
               }
     Function: Returns the next gene structure prediction of the ESTScan result
               file. Call this method repeatedly until FALSE is returned.

     So far, this method DOES NOT work for reverse strand predictions,
     even though the code looks as if it does.

     Example :
     Returns : A Bio::Tools::Prediction::Gene object.
     Args    :

close
-----

     Title   : close
     Usage   : $result->close()
     Function: Closes the file handle associated with this result file.
               Inherited method, overridden.
     Example :
     Returns :
     Args    :

_fasta_stream
-------------

     Title   : _fasta_stream
     Usage   : $result->_fasta_stream()
     Function: Gets/Sets the FASTA sequence IO stream for reading the contents of
               the file associated with this ESTScan result object.

     If called for the first time, creates the stream from the filehandle
     if necessary.

     Example :
     Returns :
     Args    :


File: pm.info,  Node: Bio/Tools/Fasta,  Next: Bio/Tools/GFF,  Prev: Bio/Tools/ESTScan,  Up: Module List

Bioperl Fasta utility object
****************************

NAME
====

   Bio::Tools::Fasta.pm - Bioperl Fasta utility object

INSTALLATION
============

   This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

   Follow the installation instructions included in the README file.

SYNOPSIS
========

Object Creation
---------------

   Bio::Tools::Fasta.pm cannot yet build sequence analysis objects given
output from the FASTA program. This module can only be used for parsing
Fasta multiple sequence files. This situation may change.

Parse a Fasta multiple-sequence file.
-------------------------------------

   If $file is not a valid filename, data will be read from STDIN.  See
the parse() method in this node for a complete description of parameters.

     use Bio::Tools::Fasta qw(:obj);

     $seqCount = $Fasta->parse(-file        => $file,
                               -seqs        => \@seqs,
                               -ids         => \@ids,
                               -edit_id     => 1,
                               -edit_seq    => 1,
                               -descs       => \@descs,
                               -filt_func   => \&filter_seq,  # filter input sequences.
                               -exec_func   => \&process_seq, # process each seq as it is parsed.
                               );

DESCRIPTION
===========

   The Bio::Tools::Fasta.pm module, in its present incarnation,
encapsulates data and methods for managing Fasta multiple sequence files
(reading, parsing).  It does not yet work with output from the Fasta
sequence analysis program (`References & Information about the FASTA
program' in this node).

   The documentation of this module is incomplete. For some examples of
usage, see the DEMO SCRIPTS section.

   Unlike "Blast", the term "Fasta" is ambiguous since it refers to both a
sequence file format and a sequence analysis utility (I use "FASTA" to
refer to the program; "Fasta" for the file format).  Ultimately, this
module will be able to work with both Fasta sequence files as well as
result files generated by FASTA sequence analysis, analogous to the way the
Bio::Tools::Blast.pm object is used for working with Blast output.

References & Information about the FASTA program
------------------------------------------------

   *WEBSITES:*

     ftp://ftp.virginia.edu/pub/fasta/    - FASTA software
     http://www2.ebi.ac.uk/fasta3/        - FASTA server at EBI

   *PUBLICATIONS:* (with PubMed links)

     Pearson W.R. and Lipman, D.J. (1988). Improved tools for biological
     sequence comparison. PNAS 85:2444-2448


http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=3162770&form=6&db=m&Dopt=b

     Pearson, W.R. (1990). Rapid and sensitive sequence comparison with FASTP and FASTA.
     Methods in Enzymology 183:63-98.


http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=2156132&form=6&db=m&Dopt=b

USAGE
=====

   A simple demo script is included with the central Bioperl distribution
(`INSTALLATION' in this node) and is also available from:

     http://bio.perl.org/Core/Examples/seq/

DEPENDENCIES
============

   Bio::Tools::Fasta.pm is a concrete class that inherits from
*Bio::Tools::SeqAnal.pm*.  This module also relies on *Bio::Seq.pm* for
producing sequence objects.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules.  Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     vsns-bcd-perl@lists.uni-bielefeld.de          - General discussion
     vsns-bcd-perl-guts@lists.uni-bielefeld.de     - Technically-oriented discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

   Steve A. Chervitz, sac@genome.stanford.edu

VERSION
=======

   Bio::Tools::Fasta.pm, 0.014

COPYRIGHT
=========

   Copyright (c) 1998 Steve A. Chervitz. All Rights Reserved.  This module
is free software; you can redistribute it and/or modify it under the same
terms as Perl itself.

SEE ALSO
========

     Bio::Tools::SeqAnal.pm   - Sequence analysis object base class.
     Bio::Seq.pm              - Biosequence object
     Bio::Root::Object.pm     - Proposed base class for all Bioperl objects.

     http://bio.perl.org/Projects/modules.html  - Online module documentation
     http://bio.perl.org/                       - Bioperl Project Homepage

   `References & Information about the FASTA program' in this node.

TODO
====

   * Incorporate code for parsing Fasta sequence analysis reports.

   * Improve documentation.

APPENDIX
========

   Methods beginning with a leading underscore are considered private and
are intended for internal use by this module. They are not considered part
of the public interface and are described here for documentation purposes
only.

_initialize
-----------

     Usage     : n/a; automatically called by Bio::Root::Object::new()
     Purpose   : Calls superclass constructor.
     Returns   : n/a
     Argument  : Named parameters passed to new() are processed by this method.
               : At present, none are processed.

   See Also   : *Bio::Tools::SeqAnal::_initialize()*

parse
-----

     Usage     : $fasta_obj->parse( %named_parameters )
     Purpose   : Parse a set of Fasta sequences or Fasta reports from a file or STDIN.
               : (Currently only Fasta sequence parsing is supported).
     Returns   : Integer (number of sequences or Fasta reports parsed).
     Argument  : Named parameters: (TAGS CAN BE UPPER OR LOWER CASE)
     	   :   -FILE       => string (name of file containing Fasta-formatted sequences.
               :                          Optional. If a valid file is not supplied,
     	   :			      STDIN will be used).
               :   -SEQS       => boolean (true = parse a Fasta multi-sequence file
               :                           false = parse a Fasta sequence analysis report).
               :   -IDS        => array_ref (optional).
               :   -DESCS      => array_ref (optional).
               :   -EDIT_ID    => boolean  (true = edit sequence identifiers).
               :   -EDIT_SEQ   => boolean  (true = edit sequence data).
               :   -TYPE       => string   (type of sequences to be processed:
               :                            'dna', 'rna', 'amino'),
               :   -FILT_FUNC  => func_ref (reference to a function for filtering out
               :                            sequences as they are being parsed.
               :                            This function should return a boolean
               :                            (true if the sequence should be filtered out)
               :                            and accept three arguments as shown
               :                            in this sample filter function:
               :                            sub filt {
               :                                my($len, $id, $desc) = @_;
               :                                # $len is the sequence length
               :                                return ($len < 25 and $id =~ /^123/);
               :                            }
               :                            This function will screen out any sequence
               :                            less than 25 in length and having an id
               :                            starting with '123').
               :   -SAVE_ARRAY => array_ref (reference to an array for storing all
               :                             sequence objects as they are created.)
               :   -EXEC_FUNC  => func_ref (reference to a function for processing each
               :                            sequence object as it is parsed.
               :                            When working with sequences, this function
               :                            should accept a Bio::Seq.pm object as its
               :                            sole argument. Return value will be ignored).
               :   -STRICT     => boolean (increases sensitivity to errors).
               :
               :  ----------------------------------------------------------------
               :   NOTE: Parameters such as seqs, ids, desc, edit_id, edit_seq, type
               :         are used only when parsing Fasta sequence files.
               :         Additional parameters will be added as necessary for
               :         parsing Fasta sequence analysis reports.
               :
               :   NOTE: When retrieving sequence data instead of objects,
               :         the -SEQS, -IDS, and -DESCS parameters should all be array refs.
               :         This constitutes a signal that sequence objects are not
               :         to be constructed.
               :
     Throws    : Propagates any exceptions thrown by _parse_seq_stream()
     Comments  :

     WORKING WITH SEQUENCE DATA:
     ---------------------------
     The parse method can return sequence data bundled into Bio::Seq.pm objects
     or in raw format (separate arrays for seq, id, and desc data). The reason for
     this is that in some cases, you don't particularly need to work with sequence
     objects and it is inefficient to build objects just to have them broken apart.
     However, there is something to be said for choosing one approach --
     always return seq objects. In this way, the object
     becomes the basic unit of exchange. For now, both options are allowed.

     The story will be different for Fasta sequence analysis report objects
     since these are a much more complex data type and it would be unwieldy
     and dangerous to return parsed data unencapsulated from an object.

   See Also   : `_parse_seq_stream' in this node(), `_set_id_desc' in this
node(), `_get_parse_seq_func' in this node()
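
   To make the -FILT_FUNC contract concrete, here is a small sketch
that applies a filter of the documented shape (length, id,
description; true means "filter out") to a few hand-made records. The
@records data and the grep loop are hypothetical stand-ins for what
the parser would do internally.

```perl
use strict;
use warnings;

# A filter in the style described for -FILT_FUNC: it receives the sequence
# length, the id and the description, and returns true when the sequence
# should be filtered OUT.
sub filter_seq {
    my ($len, $id, $desc) = @_;
    return ($len < 25 and $id =~ /^123/) ? 1 : 0;
}

# Hypothetical records standing in for parsed sequences: [id, desc, seq].
my @records = (
    [ '123A', 'short, matching id', 'ACGT' x 5  ],   # length 20: dropped
    [ '123B', 'long, matching id',  'ACGT' x 10 ],   # length 40: kept
    [ '999C', 'short, other id',    'ACGT' x 5  ],   # length 20: kept
);

# Keep only the records for which the filter returns false.
my @kept = grep { !filter_seq(length($_->[2]), $_->[0], $_->[1]) } @records;
print join(' ', map { $_->[0] } @kept), "\n";
```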

_parse_seq_stream
-----------------

     Usage     : n/a. Internal method called by parse()
     Purpose   : Obtains the function to be used during parsing and calls read().
     Returns   : Integer (the number of sequences read)
     Argument  : Named parameters  (forwarded from parse())
     Throws    : Propagates any exception thrown by _get_parse_seq_func() and read().
     Comments  :

     This method permits the sequence data to be parsed as it is being read in.
     The motivation here is that when working with a potentially huge set of
     sequences, there is no need to read them all into memory before you start
     processing them. In fact, you may only be interested in a few of them.
     
     This method constructs and returns a closure for parsing a single Fasta sequence.
     It is called automatically by the read() method inherited from
     Bio::Root::Object.pm.
     
     Another issue concerns what to do with the parsed data: save it or
     use it? Sometimes you need to process all sequence data as a group
     (e.g., sorting). Other times, you can safely process each sequence
     as it gets parsed and then move on to the next. By delivering each
     sequence as it gets parsed, the client is free to decide what to
     do with it.

   See Also   : `_get_parse_seq_func' in this node(),
*Bio::Root::Object::read()*
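
   The "deliver each sequence as it gets parsed" idea can be sketched
without any Bioperl classes. In the example below, stream_fasta is a
hypothetical helper that reads from a filehandle and passes every
complete record to a callback, so only one record is held in memory
at a time. It illustrates the design, not this module's code.

```perl
use strict;
use warnings;

# Read Fasta records from a filehandle and hand each one to a callback as
# soon as it is complete.  Returns the number of records delivered.
sub stream_fasta {
    my ($fh, $callback) = @_;
    my ($header, $seq, $count) = (undef, '', 0);
    my $flush = sub {
        return unless defined $header;
        my ($id, $desc) = split ' ', $header, 2;
        $id   = '' unless defined $id;
        $desc = '' unless defined $desc;
        $callback->($id, $desc, $seq);
        $count++;
    };
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(.*)/) {
            my $new_header = $1;
            $flush->();                    # deliver the previous record
            ($header, $seq) = ($new_header, '');
        }
        elsif (defined $header) {
            $line =~ s/\s+//g;
            $seq .= $line;                 # sequence lines are concatenated
        }
    }
    $flush->();                            # deliver the last record
    return $count;
}

# An in-memory file keeps the example self-contained.
my $data = ">seq1 first example\nACGT\nACGT\n>seq2\nTTTT\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @lengths;
my $n = stream_fasta($fh, sub {
    my ($id, $desc, $seq) = @_;
    push @lengths, "$id=" . length($seq);
});
close $fh;
print "$n records: @lengths\n";
```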

_get_parse_seq_func
-------------------

     Usage     : n/a. Internal method called by _parse_seq_stream()
     Purpose   : Generates a function reference to be used for parsing raw sequence data
               : as it is being loaded by read().
               : Used when parsing Fasta sequence files.
     Returns   : Function reference (actually a closure)
     Argument  : Named parameters forwarded from _parse_seq_stream()
     Throws    : Exceptions due to improper argument types.
               :   (to be elaborated...)
     Comments  : The function generated performs sequence editing if
               : the -EDIT_SEQ parse() parameter is non-zero.
               : This consists of removing any ambiguous residues at the
               : beginning or end of the sequence.
               : Regardless of -EDIT_SEQ, all sequences will be edited to remove
               : whitespace and non-alphabetic chars.
               : Gap characters are permitted ('.' and '-').
               : (Need a more universal way to identify gap characters.)
               : If sequence objects are generated and an -EXEC_FUNC is supplied,
               : each object will be destroyed after calling this function.
               : This prevents memory usage problems for large runs.

   See Also   : `parse' in this node(), `_parse_seq_stream' in this
node(), *Bio::Root::Object::_rearrange*()
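
   A rough sketch of the two editing steps described in the Comments
above. The name clean_seq is hypothetical, and treating 'N' and 'X' as
the ambiguous residues is an assumption; the documentation does not
say which symbols the module trims.

```perl
use strict;
use warnings;

sub clean_seq {
    my ($seq, $edit_seq, $ambiguous) = @_;
    $ambiguous = 'NX' unless defined $ambiguous;   # assumed ambiguity symbols
    # Always applied: drop whitespace and anything that is neither a letter
    # nor one of the permitted gap characters '.' and '-'.
    $seq =~ s/[^A-Za-z.\-]//g;
    if ($edit_seq && length $ambiguous) {
        # Optional: trim runs of ambiguous residues from both ends.
        my $class = quotemeta $ambiguous;
        $seq =~ s/^[$class]+//i;
        $seq =~ s/[$class]+$//i;
    }
    return $seq;
}

print clean_seq("NNAC GT-1.A\nNN", 1), "\n";
```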

edit_id
-------

     Usage     : $fasta_obj->edit_id()
     Purpose   : Set/Get a boolean indicator as to whether sequence IDs should be edited.
               : Used when parsing Fasta sequence files.
     Returns   : Boolean (true if the IDs are to be edited).
     Argument  : Boolean (optional)
     Throws    : n/a

   See Also   : `_set_id_desc' in this node(), `_get_parse_seq_func' in
this node()

edit_seqs
---------

     Usage     : $fasta_obj->edit_seqs()
     Purpose   : Set/Get a boolean indicator as to whether sequences should be edited.
               : Used when parsing Fasta sequence files.
     Returns   : Boolean (true if the sequences are to be edited).
     Argument  : Boolean (optional)
     Throws    : n/a

   See Also   : `_get_parse_seq_func' in this node()

_set_id_desc
------------

     Usage     : n/a. Internal method called by _get_parse_seq_func()
     Purpose   : Sets the _id and _desc data members, optionally editing the id.
               : Used when parsing Fasta sequence files.
     Returns   : 2-element list containing: ($id, $description)
     Argument  : String containing raw ID + description (leading '>' will be stripped)
     Throws    : n/a
     Comments  : Optionally edits the ID if the '_edit_id' field is true.
               : Descriptions are not altered.
               : ID Edits:
               :   1) Uppercases the ID.
               :   2) If the ID has any | characters the following is performed:
               :        a) Replace | characters with _ characters.
               :           (prevent regexp and shell trouble).
               :        b) Cleans up complex identifiers.
               :           Some GenBank specifiers have multiple parts:
               :           >gi|2980872|gnl|PID|e1283615 homeobox protein SHOTb
               :           Only the first ID is saved as the official ID.
               :           Extra ids will be included at the end of the
               :           description between brackets:
               :           GI_2980872 homeobox protein SHOTb [ GNL PID e1283615 ]
               :
               : ID editing is somewhat experimental.

   See Also   : `_get_parse_seq_func' in this node(), `edit_id' in this
node()
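
   The ID edits listed above can be approximated as follows. The name
edit_id_desc is hypothetical, and so is the rule that the first two
'|'-separated fields form the official ID. Because the whole ID is
upper-cased first, the extra ids come out in upper case, which differs
slightly from the documented example (it shows e1283615 in lower
case).

```perl
use strict;
use warnings;

sub edit_id_desc {
    my ($raw) = @_;
    $raw =~ s/^>//;                        # strip the leading '>'
    my ($id, $desc) = split ' ', $raw, 2;
    $id   = '' unless defined $id;
    $desc = '' unless defined $desc;
    $id = uc $id;                          # 1) uppercase the ID
    if ($id =~ /\|/) {                     # 2) compound identifiers
        my @parts = grep { $_ ne '' } split /\|/, $id;
        # Assumption: the first two fields (database tag + accession)
        # make up the official ID; everything else is an extra id.
        my @first = splice @parts, 0, 2;
        $id = join '_', @first;            # 2a) '|' becomes '_'
        if (@parts) {                      # 2b) extras go into the description
            $desc .= ' ' if length $desc;
            $desc .= '[ ' . join(' ', @parts) . ' ]';
        }
    }
    return ($id, $desc);
}

my ($id, $desc) = edit_id_desc('>gi|2980872|gnl|PID|e1283615 homeobox protein SHOTb');
print "$id $desc\n";
```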

num_seqs
--------

     Usage     : $fasta_obj->num_seqs()
     Purpose   : Get the number of sequences read by the Fasta object.
     Returns   : Integer
     Argument  : n/a
     Throws    : n/a

FOR DEVELOPERS ONLY
===================

Data Members
------------

   Information about the various data members of this module is provided
for those wishing to modify or understand the code. Two things to bear in
mind:

  1. Do NOT rely on these in any code outside of this module.  All data
     members are prefixed with an underscore to signify that they are
     private.  Always use accessor methods. If the accessor doesn't exist
     or is inadequate, create or modify an accessor (and let me know,
     too!).

  2. This documentation may be incomplete and out of date.  It is easy for
     these data member descriptions to become obsolete as this module is
     still evolving. Always double check this info and search for members
     not described here.

        An instance of Bio::Tools::Fasta.pm is a blessed reference to a
hash containing all or some of the following fields:

     FIELD           VALUE
     --------------------------------------------------------------
     _seqCount       Number of sequences parsed.

     _edit_seq       Boolean. Should sequences be edited during parsing?

     _edit_id        Boolean. Should ids be edited during parsing?

     More data members will be added when code for Fasta report
     processing is incorporated.

     INHERITED DATA MEMBERS

   (See Bio::Tools::SeqAnal.pm for inherited data members.)


File: pm.info,  Node: Bio/Tools/GFF,  Next: Bio/Tools/Genscan,  Prev: Bio/Tools/Fasta,  Up: Module List

A Bio::SeqAnalysisParserI compliant GFF format parser
*****************************************************

NAME
====

   Bio::Tools::GFF - A Bio::SeqAnalysisParserI compliant GFF format parser

SYNOPSIS
========

     use Bio::Tools::GFF;

     # specify input via -fh or -file
     my $gffio = Bio::Tools::GFF->new(-fh => \*STDIN, -gff_version => 2);
     my $feature;
     # loop over the input stream
     while($feature = $gffio->next_feature()) {
         # do something with feature
     }
     $gffio->close();

     # you can also obtain a GFF parser as a SeqAnalysisParserI in
     # HT analysis pipelines (see Bio::SeqAnalysisParserI and
     # Bio::Factory::SeqAnalysisParserFactory)
     my $factory = Bio::Factory::SeqAnalysisParserFactory->new();
     my $parser = $factory->get_parser(-input => \*STDIN, -method => "gff");
     while($feature = $parser->next_feature()) {
         # do something with feature
     }

DESCRIPTION
===========

   This class provides a simple GFF parser and writer. In the sense of a
SeqAnalysisParser, it parses an input file or stream into SeqFeatureI
objects, but is not in any way specific to a particular analysis program
and the output that program produces.

   That is, if you can get your analysis program to produce GFF, here is
your result parser.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org          - General discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email
or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Matthew Pocock
=======================

   Email mrp@sanger.ac.uk

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

new
---

     Title   : new
     Usage   :
     Function: Creates a new instance. Recognized named parameters are -file, -fh,
               and -gff_version.

     Returns : a new object
     Args    : named parameters

next_feature
------------

     Title   : next_feature
     Usage   : $seqfeature = $gffio->next_feature();
     Function: Returns the next feature available in the input file or stream, or
               undef if there are no more features.
     Example :
     Returns : A Bio::SeqFeatureI implementing object, or undef if there are no
               more features.
     Args    : none

from_gff_string
---------------

     Title   : from_gff_string
     Usage   : $gff->from_gff_string($feature, $gff_string);
     Function: Sets properties of a SeqFeatureI object from a GFF-formatted
               string. Interpretation of the string depends on the version
               that has been specified at initialization.

               This method is used by next_feature(). It actually
               dispatches to one of the version-specific (private) methods.
     Example :
     Returns : void
     Args    : A Bio::SeqFeatureI implementing object to be initialized
               The GFF-formatted string to initialize it from
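
   The version dispatch described above can be sketched in plain Perl
with a table of code references. This is only an illustration of the
idea, not the module's implementation: the names %parser_for and
parse_last_column are hypothetical, and the sketch handles only the
last column, treating it as a plain group string for version 1 and as
semicolon-separated tag/value pairs for version 2.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the version-specific private methods.
my %parser_for = (
    1 => sub { my ($group) = @_; return { group => $group } },
    2 => sub {
        my ($attrs) = @_;
        my %tags;
        for my $pair (split /\s*;\s*/, $attrs) {
            my ($tag, $value) = split ' ', $pair, 2;
            next unless defined $tag;
            $value = '' unless defined $value;
            $value =~ s/^"|"$//g;            # drop surrounding quotes
            $tags{$tag} = $value;
        }
        return \%tags;
    },
);

# Pick the parser for the requested version, or fail loudly.
sub parse_last_column {
    my ($version, $text) = @_;
    my $parser = $parser_for{$version}
        or die "unsupported GFF version: $version\n";
    return $parser->($text);
}

my $tags = parse_last_column(2, 'gene_id "g1" ; transcript_id "t1"');
print join(',', map { "$_=$tags->{$_}" } sort keys %$tags), "\n";
# prints "gene_id=g1,transcript_id=t1"
```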

_from_gff1_string
-----------------

     Title   : _from_gff1_string
     Usage   :
     Function:
     Example :
     Returns : void
     Args    : A Bio::SeqFeatureI implementing object to be initialized
               The GFF-formatted string to initialize it from

_from_gff2_string
-----------------

     Title   : _from_gff2_string
     Usage   :
     Function:
     Example :
     Returns : void
     Args    : A Bio::SeqFeatureI implementing object to be initialized
               The GFF2-formatted string to initialize it from

write_feature
-------------

     Title   : write_feature
     Usage   : $gffio->write_feature($feature);
     Function: Writes the specified SeqFeatureI object in GFF format to the stream
               associated with this instance.
     Example :
     Returns :
     Args    : A Bio::SeqFeatureI implementing object to be serialized

gff_string
----------

     Title   : gff_string
     Usage   : $gffstr = $gffio->gff_string($feature);
     Function: Obtain the GFF-formatted representation of a SeqFeatureI object.
               The formatting depends on the version specified at initialization.

               This method is used by write_feature(). It actually
               dispatches to one of the version-specific (private) methods.
     Example :
     Returns : A GFF-formatted string representation of the SeqFeature
     Args    : A Bio::SeqFeatureI implementing object to be GFF-stringified
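
   For orientation, serializing a feature to GFF amounts to joining its
columns with tabs. The sketch below works on a plain hash rather than a
Bio::SeqFeatureI object; format_gff_line and the hash keys are
hypothetical names, and writing "." for a missing value is the usual
GFF convention rather than something this documentation specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: format a plain hash as one GFF line (no newline).
sub format_gff_line {
    my ($feat) = @_;
    # The eight mandatory columns, with '.' standing in for missing values.
    my @cols = map { defined $feat->{$_} ? $feat->{$_} : '.' }
               qw(seqname source feature start end score strand frame);
    # The group/attributes column is only written when present.
    push @cols, $feat->{group} if defined $feat->{group};
    return join("\t", @cols);
}

print format_gff_line({
    seqname => 'chr1', source => 'genscan', feature => 'exon',
    start   => 100,    end    => 250,       strand  => '+',
}), "\n";
```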

_gff1_string
------------

     Title   : _gff1_string
     Usage   : $gffstr = $gffio->_gff1_string($feature);
     Function:
     Example :
     Returns : A GFF1-formatted string representation of the SeqFeature
     Args    : A Bio::SeqFeatureI implementing object to be GFF-stringified

_gff2_string
------------

     Title   : _gff2_string
     Usage   : $gffstr = $gffio->_gff2_string($feature);
     Function:
     Example :
     Returns : A GFF2-formatted string representation of the SeqFeature
     Args    : A Bio::SeqFeatureI implementing object to be GFF2-stringified

gff_version
-----------

     Title   : gff_version
     Usage   : $gffversion = $gffio->gff_version
     Function:
     Example :
     Returns : The GFF version this parser will accept and emit.
     Args    : none


File: pm.info,  Node: Bio/Tools/Genscan,  Next: Bio/Tools/HMMER/Domain,  Prev: Bio/Tools/GFF,  Up: Module List

Results of one Genscan run
**************************

NAME
====

   Bio::Tools::Genscan - Results of one Genscan run

SYNOPSIS
========

     $genscan = Bio::Tools::Genscan->new(-file => 'result.genscan');
     # filehandle:
     $genscan = Bio::Tools::Genscan->new( -fh  => \*INPUT );

     # parse the results
     # note: this class is-a Bio::Tools::AnalysisResult which implements
     # Bio::SeqAnalysisParserI, i.e., $genscan->next_feature() is the same
     while($gene = $genscan->next_prediction()) {
         # $gene is an instance of Bio::Tools::Prediction::Gene, which inherits
         # off Bio::SeqFeature::Gene::Transcript.
         #
         # $gene->exons() returns an array of
         # Bio::Tools::Prediction::Exon objects
         # all exons:
         @exon_arr = $gene->exons();

         # initial exons only
         @init_exons = $gene->exons('Initial');
         # internal exons only
         @intrl_exons = $gene->exons('Internal');
         # terminal exons only
         @term_exons = $gene->exons('Terminal');
         # singleton exons:
         ($single_exon) = $gene->exons();
     }

     # essential if you gave a filename at initialization (otherwise the file
     # will stay open)
     $genscan->close();

DESCRIPTION
===========

   The Genscan module provides a parser for Genscan gene structure
prediction output. It parses one gene prediction into a
Bio::SeqFeature::Gene::Transcript-derived object.

   This module also implements the Bio::SeqAnalysisParserI interface, and
thus can be used wherever such an object fits. See *Note
Bio/SeqAnalysisParserI: Bio/SeqAnalysisParserI,.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org          - General discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email
or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Hilmar Lapp
====================

   Email hlapp@gmx.net

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

analysis_method
---------------

     Usage     : $genscan->analysis_method();
     Purpose   : Inherited method. Overridden to ensure that the name matches
                 /genscan/i.
     Returns   : String
     Argument  : n/a

next_feature
------------

     Title   : next_feature
     Usage   : while($gene = $genscan->next_feature()) {
                      # do something
               }
     Function: Returns the next gene structure prediction of the Genscan result
               file. Call this method repeatedly until FALSE is returned.

               The returned object is actually a SeqFeatureI implementing
               object. This method is required for classes implementing
               the SeqAnalysisParserI interface, and is merely an alias
               for next_prediction() at present.

     Example :
     Returns : A Bio::Tools::Prediction::Gene object.
     Args    :

next_prediction
---------------

     Title   : next_prediction
     Usage   : while($gene = $genscan->next_prediction()) {
                      # do something
               }
     Function: Returns the next gene structure prediction of the Genscan result
               file. Call this method repeatedly until FALSE is returned.

     Example :
     Returns : A Bio::Tools::Prediction::Gene object.
     Args    :

_parse_predictions
------------------

     Title   : _parse_predictions()
     Usage   : $obj->_parse_predictions()
     Function: Parses the prediction section. Automatically called by
               next_prediction() if not yet done.
     Example :
     Returns :
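
   The "parse on first use, then hand out one result per call" pattern
described for next_prediction() and _parse_predictions() can be
sketched as follows. The class LazyResult is hypothetical and does no
real Genscan parsing: every non-empty input line simply counts as one
"prediction".

```perl
use strict;
use warnings;

package LazyResult;    # hypothetical class, for illustration only

sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines], parsed => 0, preds => [] }, $class;
}

# Stand-in for _parse_predictions(): fills the internal queue once.
sub _parse_predictions {
    my ($self) = @_;
    push @{ $self->{preds} }, grep { /\S/ } @{ $self->{lines} };
    $self->{parsed} = 1;
}

# Returns the next prediction, or undef (FALSE) when none are left.
sub next_prediction {
    my ($self) = @_;
    $self->_parse_predictions() unless $self->{parsed};
    return shift @{ $self->{preds} };
}

package main;

my $res = LazyResult->new("gene1", "", "gene2");
while (my $gene = $res->next_prediction()) {
    print "got $gene\n";
}
```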

_prediction
-----------

     Title   : _prediction()
     Usage   : $gene = $obj->_prediction()
     Function: internal
     Example :
     Returns :

_add_prediction
---------------

     Title   : _add_prediction()
     Usage   : $obj->_add_prediction($gene)
     Function: internal
     Example :
     Returns :

_predictions_parsed
-------------------

     Title   : _predictions_parsed
     Usage   : $obj->_predictions_parsed
     Function: internal
     Example :
     Returns : TRUE or FALSE

_has_cds
--------

     Title   : _has_cds()
     Usage   : $obj->_has_cds()
     Function: Whether or not the result contains the predicted CDSs, too.
     Example :
     Returns : TRUE or FALSE

_read_fasta_seq
---------------

     Title   : _read_fasta_seq()
     Usage   : ($id,$seqstr) = $obj->_read_fasta_seq();
     Function: Simple but specialised FASTA format sequence reader. Uses
               $self->_readline() to retrieve input, and is able to strip off
               the trailing description lines.
     Example :
     Returns : An array of two elements.
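
   A minimal FASTA reader returning an (id, sequence) pair might look
like the sketch below. It is not the module's code: read_fasta_seq
here is a hypothetical stand-alone function that reads from an ordinary
filehandle instead of $self->_readline(), and it takes "stripping the
description" to mean keeping only the first word of the header line,
which is an assumption.

```perl
use strict;
use warnings;

# Hypothetical reader: returns ($id, $seqstr) for the next FASTA record
# on the given filehandle, or an empty list at end of input.
sub read_fasta_seq {
    my ($fh) = @_;
    local $/ = "\n>";                  # read one FASTA record at a time
    my $record = <$fh>;
    return unless defined $record;
    chomp $record;                     # removes a trailing "\n>" if present
    $record =~ s/^>//;                 # only the first record still has it
    my ($header, @lines) = split /\n/, $record;
    return unless defined $header;
    my ($id) = $header =~ /^(\S+)/;    # drop the description after the ID
    my $seqstr = join '', @lines;
    $seqstr =~ s/\s+//g;
    return ($id, $seqstr);
}

my $data = ">seq1 some description\nACGT\nTTGA\n>seq2\nGG\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while (my ($id, $seqstr) = read_fasta_seq($fh)) {
    print "$id\t$seqstr\n";
}
close $fh;
```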


File: pm.info,  Node: Bio/Tools/HMMER/Domain,  Next: Bio/Tools/HMMER/Results,  Prev: Bio/Tools/Genscan,  Up: Module List

One particular domain hit from HMMER
************************************

NAME
====

   Bio::Tools::HMMER::Domain - One particular domain hit from HMMER

SYNOPSIS
========

   Read the Bio::Tools::HMMER::Results docs

DESCRIPTION
===========

   A particular domain score. We reuse the Homol SeqFeature system here,
so this inherits off Homol SeqFeature. As this code originally came from a
separate project, some backward-compatibility code is provided to keep
this working with old code.

   Don't forget this inherits off Bio::SeqFeature, so all your usual nice
start/end/score stuff is ready for use.

CONTACT
=======

   Describe contact details here

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

add_alignment_line
------------------

     Title   : add_alignment_line
     Usage   : $domain->add_alignment_line($line_from_hmmer_output);
     Function: add an alignment line to this Domain object
     Returns : Nothing
     Args    : scalar

     Adds an alignment line, mainly for storing the HMMER alignments
     as flat text which can be regurgitated. You're right. This is *not
     nice* and not the right way to do it.  C'est la vie.

each_alignment_line
-------------------

     Title   : each_alignment_line
     Usage   : foreach $line ( $domain->each_alignment_line )
     Function: regurgitates the alignment lines as they were fed in.
               only useful realistically for printing.
     Example :
     Returns :
     Args    : None

get_nse
-------

     Title   : get_nse
     Usage   : $domain->get_nse()
     Function: Provides a seqname/start-end format, useful
               for unique keys. nse stands for name-start-end.
               It is used a lot in Pfam.
     Example :
     Returns : A string
     Args    : Optional separator 1 and separator 2 (default / and -)
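
   The name/start-end key is simple string assembly. The following
stand-alone sketch shows the format with the default and with custom
separators; nse_key is a hypothetical function name, and the sequence
name and coordinates are made-up example values.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "name/start-end" key. The two separators
# are optional and default to '/' and '-'.
sub nse_key {
    my ($name, $start, $end, $sep1, $sep2) = @_;
    $sep1 = '/' unless defined $sep1;
    $sep2 = '-' unless defined $sep2;
    return join '', $name, $sep1, $start, $sep2, $end;
}

print nse_key('ROA1_HUMAN', 15, 97), "\n";              # ROA1_HUMAN/15-97
print nse_key('ROA1_HUMAN', 15, 97, ':', '..'), "\n";   # ROA1_HUMAN:15..97
```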

hmmacc
------

     Title   : hmmacc
     Usage   : $domain->hmmacc($newacc)
     Function: set get for HMM accession number. This is placed in the homol
               feature of the HMM
     Example :
     Returns :
     Args    :

hmmname
-------

     Title   : hmmname
     Usage   : $domain->hmmname($newname)
     Function: set get for HMM name. This is placed in the homol
               feature of the HMM
     Example :
     Returns :
     Args    :

bits
----

     Title   : bits
     Usage   :
     Function: backward compatibility. Same as score
     Example :
     Returns :
     Args    :

evalue
------

     Title   : evalue
     Usage   : $domain->evalue($value);
     Function:
     Example :
     Returns :
     Args    :

seqbits
-------

     Title   : seqbits
     Usage   : $domain->seqbits($value);
     Function:
     Example :
     Returns :
     Args    :

seq_range
---------

     Title   : seq_range
     Usage   :
     Function: Throws an exception to catch scripts which need to upgrade
     Example :
     Returns :
     Args    :

hmm_range
---------

     Title   : hmm_range
     Usage   :
     Function: Throws an exception to catch scripts which need to upgrade
     Example :
     Returns :
     Args    :


File: pm.info,  Node: Bio/Tools/HMMER/Results,  Next: Bio/Tools/HMMER/Set,  Prev: Bio/Tools/HMMER/Domain,  Up: Module List

Object representing HMMER output results
****************************************

NAME
====

   Bio::Tools::HMMER::Results - Object representing HMMER output results

SYNOPSIS
========

     # parse a hmmsearch file (can also parse a hmmpfam file)
     $res = Bio::Tools::HMMER::Results->new( -file => 'output.hmm',
                                             -type => 'hmmsearch' );

     # print out the results for each sequence
     foreach $seq ( $res->each_Set ) {
         print "Sequence bit score is",$seq->bits,"\n";
         foreach $domain ( $seq->each_Domain ) {
             print " Domain start ",$domain->start," end ",$domain->end,
                   " score ",$domain->bits,"\n";
         }
     }

     # new result object on a sequence/domain cutoff of 25 bits sequence, 15 bits domain
     $newresult = $res->filter_on_cutoff(25,15);

     # alternative way of getting out all domains directly
     foreach $domain ( $res->each_Domain ) {
         print "Domain on ",$domain->seqname," with score ",
         $domain->bits," evalue ",$domain->evalue,"\n";
     }

DESCRIPTION
===========

   This object represents HMMER output, either from hmmsearch or hmmpfam.
For hmmsearch, a series of HMMER::Set objects are made, one for each
sequence, which have the bits score for the object. For hmmpfam
searches, only one Set object is made.

   These objects come from the original HMMResults modules used internally
in Pfam, written by Ewan. Ewan then converted them to bioperl objects in
1999. That conversion is meant to be backwardly compatible, but may not be
(caveat emptor).

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org                - General discussion
     http://www.bioperl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email
or the web:

     bioperl-bugs@bio.perl.org
     http://www.bioperl.org/bioperl-bugs/

AUTHOR - Ewan Birney
====================

   Email birney@sanger.ac.uk

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

next_feature
------------

     Title   : next_feature
     Usage   : while( my $feat = $res->next_feature ) { # do something }
     Function: SeqAnalysisParserI implementing function
     Example :
     Returns : A Bio::SeqFeatureI compliant object, in this case,
               each DomainUnit object, ie, flattening the Sequence
               aspect of this.
     Args    : None

number
------

     Title   : number
     Usage   : print "There are ",$res->number," domains hit\n";
     Function: provides the number of domains in the HMMER report

add_Domain
----------

     Title   : add_Domain
     Usage   : $res->add_Domain($unit)
     Function: adds a domain to the results array. Mainly used internally.
     Args    : A Bio::Tools::HMMER::Domain

each_Domain
-----------

     Title   : each_Domain
     Usage   : foreach $domain ( $res->each_Domain() )
     Function: array of Domain units which are held in this report
     Returns : array
     Args    : none

domain_bits_cutoff_from_evalue
------------------------------

     Title   : domain_bits_cutoff_from_evalue
     Usage   : $cutoff = $res->domain_bits_cutoff_from_evalue(0.01);
     Function: return a bits cutoff from an evalue using the
               scores here. Somewhat interesting logic:
                Find the two bit scores which straddle the evalue
                if( 25 is between these two points) return 25
                else return the midpoint.

               This logic tries to ensure that with large signal to
               noise separation one still has a sensible 25 bit cutoff
     Returns :
     Args    :
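
   One possible reading of this logic, sketched on plain hashes rather
than on a Results object: take the lowest bit score among domains at or
below the evalue and the highest bit score among domains above it, then
apply the 25-bit rule. The function name bits_cutoff_from_evalue, the
hash keys, and the choice to return undef when no scores lie on one
side of the evalue are all assumptions made for this example.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical sketch: each domain is a hash with 'bits' and 'evalue'.
sub bits_cutoff_from_evalue {
    my ($evalue, @domains) = @_;
    my @passing = map { $_->{bits} } grep { $_->{evalue} <= $evalue } @domains;
    my @failing = map { $_->{bits} } grep { $_->{evalue} >  $evalue } @domains;
    return undef unless @passing && @failing;    # nothing to straddle

    # The two bit scores that straddle the evalue threshold.
    my ($upper, $lower) = (min(@passing), max(@failing));
    return 25 if $lower <= 25 && 25 <= $upper;
    return ($upper + $lower) / 2;
}

my @domains = (
    { bits => 40, evalue => 1e-5  },
    { bits => 30, evalue => 0.001 },
    { bits => 10, evalue => 0.5   },
);
print bits_cutoff_from_evalue(0.01, @domains), "\n";    # prints 25
```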

write_FT_output
---------------

     Title   : write_FT_output
     Usage   : $res->write_FT_output(\*STDOUT,'DOMAIN')
     Function: writes feature table output in the style of SWISS-PROT
     Returns :
     Args    :

filter_on_cutoff
----------------

     Title   : filter_on_cutoff
     Usage   : $newresults = $results->filter_on_cutoff(25,15);
     Function: Produces a new HMMER::Results object which has
               been trimmed at the cutoff.
     Returns : a Bio::Tools::HMMER::Results object
     Args    : sequence cutoff and domain cutoff, in bits score.
               If you want one cutoff, simply use the same number in both places.
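
   The two-level cutoff can be pictured with plain data structures.
In the sketch below, which does not use the HMMER classes, a "set" is
a hash with a sequence-level bits score and a list of domains. The
function and key names are hypothetical, and keeping a sequence whose
domains were all filtered out is an assumption of this example.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns a new list of sets, leaving the input
# untouched. Sets below the sequence cutoff are dropped; inside the
# remaining sets, domains below the domain cutoff are dropped.
sub filter_on_cutoff {
    my ($sets, $seq_cutoff, $dom_cutoff) = @_;
    my @kept;
    for my $set (@$sets) {
        next if $set->{bits} < $seq_cutoff;
        my @domains = grep { $_->{bits} >= $dom_cutoff } @{ $set->{domains} };
        push @kept, { %$set, domains => \@domains };
    }
    return \@kept;
}

my $sets = [
    { name => 'seqA', bits => 60, domains => [ { bits => 40 }, { bits => 12 } ] },
    { name => 'seqB', bits => 20, domains => [ { bits => 20 } ] },
];
my $filtered = filter_on_cutoff($sets, 25, 15);
printf "%s keeps %d domain(s)\n", $_->{name}, scalar @{ $_->{domains} }
    for @$filtered;
# prints "seqA keeps 1 domain(s)"
```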

write_ascii_out
---------------

     Title   : write_ascii_out
     Usage   : $res->write_ascii_out(\*STDOUT)
     Function: writes as
               seq seq_start seq_end model-acc model_start model_end model_name
     Returns :
     Args    :

     FIXME: Now that we have no modelacc, this is probably a bad thing.

write_GDF_bits
--------------

     Title   : write_GDF_bits
     Usage   : $res->write_GDF_bits(25,15,\*STDOUT)
     Function: writes GDF format with a sequence,domain threshold
     Returns :
     Args    :

add_Set
-------

     Title   : add_Set
     Usage   : Mainly internal function
     Function:
     Returns :
     Args    :

each_Set
--------

     Title   : each_Set
     Usage   :
     Function:
     Returns :
     Args    :

get_Set
-------

     Title   : get_Set
     Usage   : $set = $res->get_Set('sequence-name');
     Function: returns the Set for a particular sequence
     Returns : a HMMER::Set object
     Args    : name of the sequence

_parse_hmmpfam
--------------

     Title   : _parse_hmmpfam
     Usage   : $res->_parse_hmmpfam($filehandle)
     Function:
     Returns :
     Args    :

_parse_hmmsearch
----------------

     Title   : _parse_hmmsearch
     Usage   : $res->_parse_hmmsearch($filehandle)
     Function:
     Returns :
     Args    :