This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: Bio/Tools/AlignFactory,  Next: Bio/Tools/AnalysisResult,  Prev: Bio/Tk/SeqCanvas,  Up: Module List

Base object for alignment factories
***********************************

NAME
====

   Bio::Tools::AlignFactory - Base object for alignment factories

SYNOPSIS
========

   You wont be using this as an object, but using a dervied class like
Bio::Tools::pSW

DESCRIPTION
===========

   Holds common Alignment Factory attributes in place

CONTACT
=======

   http://bio.perl.org/ or birney@sanger.ac.uk

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

kbyte
-----

     Title     : kbyte()
     Usage     : set/gets the amount of memory able to be used
     Function  :
               : $factory->kbyte(200);
               :
     Returns   :
     Argument  : memory in kilobytes

report
------

     Title     : report()
     Usage     : set/gets the report boolean to issue reports or not
     Function  :
               : $factory->report(1); # reporting goes on
               :
     Returns   : n/a
     Argument  : 1 or 0

set_memory_and_report
---------------------

     Title   : set_memory_and_report
     Usage   : Only used by subclasses.
     Function:
     Example :
     Returns :
     Args    :


File: pm.info,  Node: Bio/Tools/AnalysisResult,  Next: Bio/Tools/BPbl2seq,  Prev: Bio/Tools/AlignFactory,  Up: Module List

Base class for analysis result objects and parsers
**************************************************

NAME
====

   Bio::Tools::AnalysisResult - Base class for analysis result objects and
parsers

SYNOPSIS
========

     # obtain a AnalysisResult derived object somehow
     print "Method ", $result->analysis_method(),
           ", version " $result->analysis_method_version(),
           ", performed on ", $result->analysis_date(), "\n";
     # annotate a sequence utilizing SeqAnalysisParserI methods
     while($feat = $result->next_feature()) {
     	$seq->add_SeqFeature($feat);
     }
     $result->close();
     # query object, e.g. a Bio::SeqI implementing object
     $queryseq = $result->analysis_query();
     # Subject of the analysis -- may be undefined. Refer to derived module
     # to find out what is returned.
     $subject = $result->analysis_subject();

DESCRIPTION
===========

   The AnalysisResult module is supposed to be the base class for modules
encapsulating parsers and interpreters for the result of a analysis that
was carried out with a query sequence.

   The notion of an analysis represented by this base class is that of a
unary or binary operator, taking either one query or a query and a subject
and producing a result. The query is e.g. a sequence, and a subject is
either a sequence, too, or a database of sequences.

   This module also implements the Bio::SeqAnalysisParserI interface, and
thus can be used wherever such an object fits. See *Note
Bio/SeqAnalysisParserI: Bio/SeqAnalysisParserI,.  Developers will find a
ready-to-use `parse' in this node() method, but need to implement
next_feature() in an inheriting class. Support for initialization with
input file names and reading from streams is also ready to use.

   Note that this module does not provide support for running an analysis.
Rather, it is positioned in the subsequent parsing step (concerned with
turning raw results into BioPerl objects).

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org          - General discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
the bugs and their resolution.  Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Hilmar Lapp
====================

   Email hlapp@gmx.net

   Describe contact details here

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

_initialize_state
-----------------

     Title   : _initialize_state
     Usage   : n/a; usually called by _initialize()
     Function: This method is for BioPerl B<developers> only, as indicated by the
               leading underscore in its name.

     Performs initialization or reset of the state of this object. The
     difference to L<_initialize>() is that it may be called at any time,
     and repeatedly within the lifetime of this object. B<Note, however,
     that this is potentially dangerous in a multi-threading
     environment. In general, calling this method twice is discouraged
     for this reason.

     This method is supposed to reset the state such that any 'history'
     is lost. State information that does not change during object
     lifetime is not considered as history, e.g. parent, name, etc shall
     not be reset. An inheriting object should only be concerned with
     state information it introduces itself, and for everything else
     call SUPER::_initialize_state(@args).

     An example is parsing an input file: a state reset implies
     discarding any unread input, and the actual input itself, followed
     by setting the new input.

     The argument syntax is the same as for L<new>() and L<_initialize>(),
     i.e., named parameters following the -name=>$value convention.
     The following parameters are dealt with by the implementation
     provided here:
        -INPUT, -FH, -FILE
     (tags are case-insensitive).
      Example :
      Returns :
      Args    :

analysis_query
--------------

     Usage     : $query_obj = $result->analysis_query();
     Purpose   : Set/Get the name of the query used to generate the result, that
                 is, the entity on which the analysis was performed. Will mostly
                 be a sequence object (Bio::PrimarySeq compatible).
     Argument  :
     Returns   : The object set before. Mostly a Bio::PrimarySeq compatible object.

analysis_subject
----------------

     Usage     : $result->analyis_subject();
     Purpose   : Set/Get the subject of the analysis against which it was
                 performed. For similarity searches it will probably be a database,
                 and for sequence feature predictions (exons, promoters, etc) it
                 may be a collection of models or homologous sequences that were
                 used, or undefined.
     Returns   : The object that was set before, or undef.
     Argument  :

analysis_date
-------------

     Usage     : $result->analysis_date();
     Purpose   : Set/Get the date on which the analysis was performed.
     Returns   : String
     Argument  :
     Comments  :

analysis_method
---------------

     Usage     : $result->analysis_method();
     Purpose   : Set/Get the name of the sequence analysis method that was used
                 to produce this result (BLASTP, FASTA, etc.). May also be the
                 actual name of a program.
     Returns   : String
     Argument  : n/a

analysis_method_version
-----------------------

     Usage     : $result->analysis_method_version();
     Purpose   : Set/Get the version string of the analysis program.
               : (e.g., 1.4.9MP, 2.0a19MP-WashU).
     Returns   : String
     Argument  : n/a


File: pm.info,  Node: Bio/Tools/BPbl2seq,  Next: Bio/Tools/BPlite,  Prev: Bio/Tools/AnalysisResult,  Up: Module List

Lightweight BLAST parser for pair-wise sequence  alignment using the BLAST algorithm.
*************************************************************************************

NAME
====

   Bio::Tools::BPbl2seq - Lightweight BLAST parser for pair-wise sequence
alignment using the BLAST algorithm.

SYNOPSIS
========

     use Bio::Tools::BPbl2seq;
     my $report = Bio::Tools::BPbl2seq->new(-file => 't/bl2seq.out');
     $report->query;
     $report->score;
     $report->bits;
     $report->percent;
     $report->P;
     $report->match;
     $report->positive;
     $report->length;
     $report->querySeq;
     $report->sbjctSeq;
     $report->homologySeq;
     $report->subject->start;
     $report->subject->end;
     $report->subject->seqname;

DESCRIPTION
===========

   BPbl2seq is a package for parsing BLAST bl2seq reports. BLAST bl2seq is
a program for comparing and aligning two sequences using BLAST.  Although
the report format is similar to that of a conventional BLAST, there are a
few differences so that the standard bioperl BLAST parsers Blast.pm and
BPlite are unable to read bl2seq reports directly.

   From the user's perspective, the main difference between bl2seq and
other blast reports is that the bl2seq report does not print out the name
of the first of the two aligned sequences.  (The second sequence name is
given in the report as the name of the "hit").  Consequently, BPbl2seq has
no way of identifying the name of the initial sequence unless it is passed
to constructor as a second argument as in:

     my $report = Bio::Tools::BPbl2seq->new(\*FH, "ALEU_HORVU");

   If the name of the first sequence is not passed to BPbl2seq.pm in this
manner, the name of the first sequence will be left as "unknown".  (Note
that to preserve a common interface with the other BLAST programs the two
sequences being compared are referred to in bl2seq as "query" and
"subject" although this is perhaps a bit misleading when simply comparing
2 sequences as opposed to querying a database.)

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org          - General discussion
     http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
the bugs and their resolution.  Bug reports can be submitted via  email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Peter Schattner
========================

   Email: schattner@alum.mit.edu

ACKNOWLEDGEMENTS
================

   Based on work of: Ian Korf (ikorf@sapiens.wustl.edu,
http://sapiens.wustl.edu/~ikorf), Lorenz Pollak (lorenz@ist.org, bioperl
port)

COPYRIGHT
=========

   BPlite.pm is copyright (C) 1999 by Ian Korf.

DISCLAIMER
==========

   This software is provided "as is" without warranty of any kind.

new
---

     Title   : new
     Function: Create a new Bio::Tools::BPbl2seq object
     Returns : Bio::Tools::BPbl2seq
     Args    : -file     input file (alternative to -fh)
               -fh       input stream (alternative to -file)
               -queryname    name of query sequence

P
-

     Title    : P
     Usage    : $hsp->P();
     Function : returns the P (significance) value for a HSP
     Example  :
     Returns  : (double) significance value
     Args     :

percent
-------

     Title    : percent
     Usage    : $hsp->percent();
     Function : returns the percent matching
     Example  :
     Returns  : (double) percent matching
     Args     :

match
-----

     Title    : match
     Usage    : $hsp->match();
     Function : returns the match
     Example  :
     Returns  : (double) frac_identical
     Args     :

positive
--------

     Title    : positive
     Usage    : $hsp->positive();
     Function : returns the number of positive hits
     Example  :
     Returns  : (int) number of positive residue hits
     Args     :

querySeq
--------

     Title    : querySeq
     Usage    : $hsp->querySeq();
     Function : returns the query sequence
     Example  :
     Returns  : (string) the Query Sequence
     Args     :

sbjctSeq
--------

     Title    : sbjctSeq
     Usage    : $hsp->sbjctSeq();
     Function : returns the Sbjct sequence
     Example  :
     Returns  : (string) the Sbjct Sequence
     Args     :

homologySeq
-----------

     Title    : homologySeq
     Usage    : $hsp->homologySeq();
     Function : returns the homologous sequence
     Example  :
     Returns  : (string) homologous sequence
     Args     :

qs
--

     Title    : qs
     Usage    : $hsp->qs();
     Function : returns the Query Sequence (same as querySeq)
     Example  :
     Returns  : (string) query Sequence
     Args     :

ss
--

     Title    : ss
     Usage    : $hsp->ss();
     Function : returns the subject sequence ( same as sbjctSeq)
     Example  :
     Returns  : (string) Sbjct Sequence
     Args     :

hs
--

     Title    : hs
     Usage    : $hsp->hs();
     Function : returns the Homologous Sequence (same as homologySeq )
     Example  :
     Returns  : (string) Homologous Sequence
     Args     :


File: pm.info,  Node: Bio/Tools/BPlite,  Next: Bio/Tools/BPlite/HSP,  Prev: Bio/Tools/BPbl2seq,  Up: Module List

Lightweight BLAST parser
************************

NAME
====

   Bio::Tools::BPlite - Lightweight BLAST parser

SYNOPSIS
========

     use Bio::Tools::BPlite;
     my $report = new BPlite(-fh=>\*STDIN);
     $report->query;
     $report->database;
     while(my $sbjct = $report->nextSbjct) {
         $sbjct->name;
         while (my $hsp = $sbjct->nextHSP) {
             $hsp->score;
             $hsp->bits;
             $hsp->percent;
             $hsp->P;
             $hsp->match;
             $hsp->positive;
             $hsp->length;
     	 $hsp->querySeq;
     	 $hsp->sbjctSeq;
     	 $hsp->homologySeq;
     	 $hsp->query->start;
     	 $hsp->query->end;
     	 $hsp->sbjct->start;
     	 $hsp->sbjct->end;
     	 $hsp->sbjct->seqname;
     	 $hsp->sbjct->overlaps($exon);
         }
     }

DESCRIPTION
===========

   BPlite is a package for parsing BLAST reports. The BLAST programs are a
family of widely used algorithms for sequence database searches. The
reports are non-trivial to parse, and there are differences in the formats
of the various flavors of BLAST. BPlite parses BLASTN, BLASTP, BLASTX,
TBLASTN, and TBLASTX reports from both the high performance WU-BLAST, and
the more generic NCBI-BLAST.

   Many people have developed BLAST parsers (I myself have made at least
three).  BPlite is for those people who would rather not have a giant
object specification, but rather a simple handle to a BLAST report that
works well in pipes.

Object
------

   BPlite has three kinds of objects, the report, the subject, and the
HSP. To create a new report, you pass a filehandle reference to the BPlite
constructor.

     my $report = new BPlite(-fh=>\*STDIN); # or any other filehandle

   The report has two attributes (query and database), and one method
(nextSbjct).

     $report->query;     # access to the query name
     $report->database;  # access to the database name
     $report->nextSbjct; # gets the next subject
     while(my $sbjct = $report->nextSbjct) {
         # canonical form of use is in a while loop
     }

   A subject is a BLAST hit, which should not be confused with an HSP
(below). A BLAST hit may have several alignments associated with it. A
useful way of thinking about it is that a subject is a gene and HSPs are
the exons. Subjects have one attribute (name) and one method (nextHSP).

     $sbjct->name;    # access to the subject name
     "$sbjct";        # overloaded to return name
     $sbjct->nextHSP; # gets the next HSP from the sbjct
     while(my $hsp = $sbjct->nextHSP) {
         # canonical form is again a while loop
     }

   An HSP is a high scoring pair, or simply an alignment.  HSP objects
inherit all the useful methods from RangeI/SeqFeatureI/FeaturePair, but
provide an additional set of attributes (score, bits, percent, P, match,
positive, length, querySeq, sbjctSeq, homologySeq) that should be familiar
to anyone who has seen a blast report.

   For lazy/efficient coders, two-letter abbreviations are available for
the attributes with long names (qs, ss, hs). Ranges of the aligned
sequences in query/subject and other information (like seqname) are stored
in SeqFeature objects (i.e.: $hsp->query, $hsp->sbjct which is equal to
$hsp->feature1, $hsp->feature2). querySeq, sbjctSeq and homologySeq do only
contain the alignment sequences from the blast report.

     $hsp->score;
     $hsp->bits;
     $hsp->percent;
     $hsp->P;
     $hsp->match;
     $hsp->positive;
     $hsp->length;
     $hsp->querySeq;      $hsp->qs;
     $hsp->sbjctSeq;      $hsp->ss;
     $hsp->homologySeq;   $hsp->hs;
     $hsp->query->start;
     $hsp->query->end;
     $hsp->query->seqname;
     $hsp->sbjct->primary_tag; # "similarity"
     $hsp->sbjct->source_tag;  # "BLAST"
     $hsp->sbjct->start;
     $hsp->sbjct->end;
     ...
     "$hsp"; # overloaded for query->start..query->end bits

   I've included a little bit of overloading for double quote variable
interpolation convenience. A subject will return its name and an HSP will
return its query->start, query->end, and bits in the alignment. Feel free
to modify this to whatever is most frequently used by you.

   So a very simple look into a BLAST report might look like this.

     my $report = new BPlite(-fh=>\*STDIN);
     while(my $sbjct = $report->nextSbjct) {
         print "$sbjct\n";
         while(my $hsp = $sbjct->nextHSP) {
     	 	print "\t$hsp\n";
         }
     }

   The output of such code might look like this:

     >foo
         100..155 29.5
         268..300 20.1
     >bar
         100..153 28.5
         265..290 22.1

AUTHORS
=======

   Ian Korf (ikorf@sapiens.wustl.edu, http://sapiens.wustl.edu/~ikorf),
Lorenz Pollak (lorenz@ist.org, bioperl port)

ACKNOWLEDGEMENTS
================

   This software was developed at the Genome Sequencing Center at
Washington Univeristy, St. Louis, MO.

COPYRIGHT
=========

   Copyright (C) 1999 Ian Korf. All Rights Reserved.

DISCLAIMER
==========

   This software is provided "as is" without warranty of any kind.

new
---

     Title   : new
     Function: Create a new Bio::Tools::BPlite object
     Returns : Bio::Tools::BPlite
     Args    : -file     input file (alternative to -fh)
               -fh       input stream (alternative to -file)

next_feature
------------

     Title   : next_feature
     Usage   : while( my $feat = $res->next_feature ) { # do something }
     Function: SeqAnalysisParserI implementing function
     Example :
     Returns : A Bio::SeqFeatureI compliant object, in this case,
               each DomainUnit object, ie, flattening the Sequence
               aspect of this.
     Args    : None

query
-----

     Title    : query
     Usage    : $query = $obj->query();
     Function : returns the query object
     Example  :
     Returns  : query object
     Args     :

qlength
-------

     Title    : qlength
     Usage    : $len = $obj->qlength();
     Function : returns the length of the query
     Example  :
     Returns  : length of query
     Args     :

pattern
-------

     Title    : database
     Usage    : $pattern = $obj->pattern();
     Function : returns the pattern used in a PHIBLAST search

query_pattern_location
----------------------

     Title    : query_pattern_location
     Usage    : $qpl = $obj->query_pattern_location();
     Function : returns reference to array of locations in the query sequence
                of pattern used in a PHIBLAST search

database
--------

     Title    : database
     Usage    : $db = $obj->database();
     Function : returns the database used in this search
     Example  :
     Returns  : database used for search
     Args     :

nextSbjct
---------

     Title    : nextSbjct
     Usage    : $sbjct = $obj->nextSbjct();
     Function : Method of iterating through all the Sbjct retrieved
                from parsing the report
     Example  : while ( my $sbjct = $obj->nextSbjct ) {}
     Returns  : next Sbjct object or null if finished
     Args     :


File: pm.info,  Node: Bio/Tools/BPlite/HSP,  Next: Bio/Tools/BPlite/Iteration,  Prev: Bio/Tools/BPlite,  Up: Module List

P
-

     Title    : P
     Usage    : $hsp->P();
     Function : returns the P (significance) value for a HSP
     Example  :
     Returns  : (double) significance value
     Args     :

percent
-------

     Title    : percent
     Usage    : $hsp->percent();
     Function : returns the percent matching
     Returns  : (double) percent matching
     Args     : none

match
-----

     Title    : match
     Usage    : $hsp->match();
     Function : returns the match
     Example  :
     Returns  : (double) frac_identical
     Args     :

positive
--------

     Title    : positive
     Usage    : $hsp->positive();
     Function : returns the number of positive hits
     Returns  : (int) number of positive residue hits
     Args     : none

querySeq
--------

     Title    : querySeq
     Usage    : $hsp->querySeq();
     Function : returns the query sequence
     Returns  : (string) the Query Sequence
     Args     : none

sbjctSeq
--------

     Title    : sbjctSeq
     Usage    : $hsp->sbjctSeq();
     Function : returns the Sbjct sequence
     Returns  : (string) the Sbjct Sequence
     Args     : none

homologySeq
-----------

     Title    : homologySeq
     Usage    : $hsp->homologySeq();
     Function : returns the homologous sequence
     Returns  : (string) homologous sequence
     Args     : none

qs
--

     Title    : qs
     Usage    : $hsp->qs();
     Function : returns the Query Sequence (same as querySeq)
     Returns  : (string) query Sequence
     Args     : none

ss
--

     Title    : ss
     Usage    : $hsp->ss();
     Function : returns the subject sequence ( same as sbjctSeq)
     Returns  : (string) Sbjct Sequence
     Args     : none

hs
--

     Title    : hs
     Usage    : $hsp->hs();
     Function : returns the Homologous Sequence (same as homologySeq )
     Returns  : (string) Homologous Sequence
     Args     : none


File: pm.info,  Node: Bio/Tools/BPlite/Iteration,  Next: Bio/Tools/BPlite/Sbjct,  Prev: Bio/Tools/BPlite/HSP,  Up: Module List

object for parsing single iteration of a PSIBLAST report
********************************************************

NAME
====

   Bio::Tools:: BPlite::Iteration - object for parsing single iteration of
a PSIBLAST report

SYNOPSIS
========

     use Bio::Tools:: BPpsilite;

     open FH, "t/psiblastreport.out";
     $report = Bio::Tools::BPpsilite->new(-fh=>\*FH);

     # determine number of iterations executed by psiblast
     $total_iterations = $report->number_of_iterations;
     $last_iteration = $report->round($total_iterations);

     # Process only hits found in last iteration ...
     $oldhitarray_ref = $last_iteration->oldhits;
     HIT: while($sbjct = $last_iteration->nextSbjct) {
         $id = $sbjct->name;
         $is_old =  grep  /\Q$id\E/, @$oldhitarray_ref;
         if ($is_old ){next HIT;}
     #  do something with new hit...
     }

DESCRIPTION
===========

   See the documentation for BPpsilite.pm for a description of the
Iteration.pm module.

AUTHORS - Peter Schattner
=========================

   Email: schattner@alum.mit.edu

ACKNOWLEDGEMENTS
================

   Based on work of: Ian Korf (ikorf@sapiens.wustl.edu,
http://sapiens.wustl.edu/~ikorf), Lorenz Pollak (lorenz@ist.org, bioperl
port)

COPYRIGHT
=========

   BPlite.pm is copyright (C) 1999 by Ian Korf.

DISCLAIMER
==========

   This software is provided "as is" without warranty of any kind.

query
-----

     Title    : query
     Usage    : $query = $obj->query();
     Function : returns the query object
     Example  :
     Returns  : query object
     Args     :

qlength
-------

     Title    : qlength
     Usage    : $len = $obj->qlength();
     Returns  : length of query
     Args     : none

newhits
-------

     Title    :  newhits
     Usage    : $newhits = $obj->newhits();
     Returns  : reference to an array listing all the hits
                from the current iteration which were not identified
                in the previous iteration
     Args     : none

oldhits
-------

     Title    :  oldhits
     Usage    : $oldhits = $obj->oldhits();
     Returns  : reference to an array listing all the hits from
                the current iteration which were identified and
                above threshold in the previous iteration
     Args     : none

nextSbjct
---------

     Title    : nextSbjct
     Usage    : $sbjct = $obj->nextSbjct();
     Function : Method of iterating through all the Sbjct retrieved
                from parsing the report
     Example  : while ( my $sbjct = $obj->nextSbjct ) {}
     Returns  : next Sbjct object or undef if finished
     Args     :


File: pm.info,  Node: Bio/Tools/BPlite/Sbjct,  Next: Bio/Tools/BPpsilite,  Prev: Bio/Tools/BPlite/Iteration,  Up: Module List

name
----

     Title    : name
     Usage    : $name = $obj->name();
     Function : returns the name of the Sbjct
     Example  :
     Returns  : name of the Sbjct
     Args     :

nextFeaturePair
---------------

     Title    : nextFeaturePair
     Usage    : $name = $obj->nextFeaturePair();
     Function : same as the nextHSP function
     Example  :
     Returns  : next FeaturePair
     Args     :

nextHSP
-------

     Title    : nextHSP
     Usage    : $hsp = $obj->nextHSP();
     Function : returns the next available High Scoring Pair
     Example  :
     Returns  : Bio::Tools::HSP  or null if finished
     Args     :


File: pm.info,  Node: Bio/Tools/BPpsilite,  Next: Bio/Tools/Blast,  Prev: Bio/Tools/BPlite/Sbjct,  Up: Module List

Lightweight BLAST parser for (iterated) psiblast reports
********************************************************

NAME
====

   Bio::Tools:: BPpsilite - Lightweight BLAST parser for (iterated)
psiblast reports

SYNOPSIS
========

     use Bio::Tools::BPpsilite;
     open FH, "t/psiblastreport.out";
     $report = Bio::Tools::BPpsilite->new(-fh=>\*FH);

     # determine number of iterations executed by psiblast
     $total_iterations = $report->number_of_iterations;
     $last_iteration = $report->round($total_iterations);

     # Process only hits found in last iteration ...
      $oldhitarray_ref = $last_iteration->oldhits;
      HIT: while($sbjct = $last_iteration->nextSbjct) {
     	  $id = $sbjct->name;
     	  $is_old =  grep  /\Q$id\E/, @$oldhitarray_ref;
     	  if ($is_old ){next HIT;}
     #  do something with new hit...
     }

DESCRIPTION
===========

   BPpsilite is a package for parsing multiple iteration PSIBLAST reports.
It is based closely on Ian Korf's BPlite.pm module for parsing single
iteration BLAST reports (as modified by Lorenz Pollak).

   Two of the four basic objects of BPpsilite.pm are identical to the
corresponding objects in BPlite - the "HSP.pm" and "Sbjct.pm" objects.
This DESCRIPTION documents only the one new object, the "iteration", as
well as the additional methods that are implemented in BPpsilite that are
not in BPlite.  See the BPlite documentation for information on the
BPlite, SBJCT and HSP objects.

   The essential difference between PSIBLAST and the other BLAST programs
(in terms of report parsing) is that PSIBLAST performs multiple iterations
of the BLASTing of the database and the results of all of these iterations
are stored in a single PSIBLAST report.  (For general information on
PSIBLAST see the README.bla file in the standalone BLAST distribution and
references therein). PSIBLAST's use of multiple iterations imposes
additional demands on the report parser: * There are several iterations of
hits.  Many of those hits will be repeated in more than one iteration.
Often only the last iteration will be of interest.  * Each iteration will
list two different kinds of hits - repeated hits that were used in the
model and newly identified hits - which may need to be processed in
different manners * The total number of iterations performed is not
displayed in the report until (almost) the very end of the report.  (The
user can specify a maximum number of iterations for the PSIBLAST search,
but the program may perform fewer iterations if convergence is reached)

   BPpsilite addresses these issues by offering the following methods:

   * The total number of iteration used is given by the method
number_of_iterations as in:

     $total_iterations = $report->number_of_iterations;

   * Results from an arbitrary iteration round can be accessed by using
the 'round' method:

     $iteration3_report = $report->round(3);

   * The ids of the sequences which passed the significance threshold for
the first time in the "nth" iteration can be identified by using the
newhits method.  Previously identified hits are identified by using   the
oldhits method, as in:

     $oldhitarray_ref = $iteration3_report->oldhits;
     $newhitarray_ref = $iteration3_report->newhits;

   BPpsilite.pm should work equally well on reports generated by the
StandAloneBlast.pm local BLAST module as with reports generated by remote
psiblast searches. For examples of usage of BPpsilite.pm, the user is
referred to the BPpsilite.t script in the "t" directory.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org              - General discussion
     http://bio.perl.org/MailList.html  - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
the bugs and their resolution.  Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Peter Schattner
========================

   Email: schattner@alum.mit.edu

ACKNOWLEDGEMENTS
================

   Based on work of: Ian Korf (ikorf@sapiens.wustl.edu,
http://sapiens.wustl.edu/~ikorf), Lorenz Pollak (lorenz@ist.org, bioperl
port)

COPYRIGHT
=========

   BPlite.pm is copyright (C) 1999 by Ian Korf.

DISCLAIMER
==========

   This software is provided "as is" without warranty of any kind.

query
-----

     Title    : query
     Usage    : $query = $obj->query();
     Function : returns the query object
     Returns  : query object
     Args     :

qlength
-------

     Title    : qlength
     Usage    : $len = $obj->qlength();
     Function : returns the length of the query
     Returns  : length of query
     Args     :

database
--------

     Title    : database
     Usage    : $db = $obj->database();
     Function : returns the database used in this search
     Returns  : database used for search
     Args     :

number_of_iterations
--------------------

     Title    : number_of_iterations
     Usage    : $total_iterations = $obj-> number_of_iterations();
     Function : returns the total number of iterations used in this search
     Returns  : total number of iterations used for search
     Args     : none

pattern
-------

     Title    : database
     Usage    : $pattern = $obj->pattern();
     Function : returns the pattern used in a PHIBLAST search

query_pattern_location
----------------------

     Title    : query_pattern_location
     Usage    : $qpl = $obj->query_pattern_location();
     Function : returns reference to array of locations in the query sequence
                of pattern used in a PHIBLAST search

round
-----

     Title    : round
     Usage    : $Iteration3 = $report->round(3);
     Function : Method of retrieving data from a specific iteration
     Example  :
     Returns  : reference to requested Iteration object or null if argument
     		is greater than total number of iterations
     Args     : number of the requested iteration

_preprocess
-----------

     Title    : _preprocess
     Usage    : internal routine, not called directly
     Function :  determines number of iterations in report and prepares
     	data so individual iterations canbe parsed in non-sequential order
     Example  :
     Returns  :  nothing. Sets TOTAL_ITERATION_NUMBER in object's hash
     Args     :  reference to calling object


File: pm.info,  Node: Bio/Tools/Blast,  Next: Bio/Tools/Blast/HSP,  Prev: Bio/Tools/BPpsilite,  Up: Module List

Bioperl BLAST sequence analysis object
**************************************

NAME
====

   Bio::Tools::Blast.pm - Bioperl BLAST sequence analysis object

SYNOPSIS
========

Parsing Blast reports
---------------------

   Parse an existing Blast report from file:

     use Bio::Tools::Blast;

     $blastObj = Bio::Tools::Blast->new( -file   => '/tmp/blast.out',
     					-parse  => 1,
     					-signif => '1e-10',
     					);

   Parse an existing Blast report from STDIN:

     $blastObj = Bio::Tools::Blast->new( -parse  => 1,
     					-signif => '1e-10',
     					);

   Then send a Blast report to your script via STDIN.

   Full parameters for parsing Blast reports.

     %blastParam = (
     		-run             => \%runParam,
     		-file            => '',
     		-parse           => 1,
     		-signif          => 1e-5,
     		-filt_func       => \&my_filter,
     		-min_len         => 15,
     		-check_all_hits  => 0,
     		-strict          => 0,
     		-stats           => 1,
     		-best            => 0,
     		-share           => 0,
     		-exec_func       => \&process_blast,
     		-save_array      => \@blast_objs,  # not used if -exce_func defined.
     	       );

   See `parse' in this node() for a description of parameters and see
`USAGE' in this node for more examples including how to parse streams
containing multiple Blast reports `Using the Static $Blast Object' in this
node.

   See `Memory Usage Issues' in this node for information about how to
make Blast parsing be more memory efficient.

Running Blast reports
---------------------

   Run a new Blast2 at NCBI and then parse it:

     %runParam = (
     		  -method   => 'remote',
     		  -prog     => 'blastp',
     		  -database => 'swissprot',
     		  -seqs     => [ $seq ],  # Bio::Seq.pm objects.
     		  );

     $blastObj = Bio::Tools::Blast->new( -run     => \%runParam,
     					-parse   => 1,
     					-signif  => '1e-10',
     					-strict  => 1,
     					);

   Full parameters for running Blasts at NCBI using Webblast.pm:

     %runParam = (
     	      -method   => 'remote',
     	      -prog     => 'blastp',
     	      -version  => 2,      # BLAST2
     	      -database =>'swissprot',
     	      -html     => 0,
     	      -seqs     => [ $seqObject ],  # Bio::Seq.pm object(s)
     	      -descr    => 250,
     	      -align    => 250,
     	      -expect   => 10,
     	      -gap      => 'on',
     	      -matrix   => 'PAM250',
     	      -email    => undef,  # don't send report via e-mail if parsing.
     	      -filter   => undef,  # use default
     	      -gap_c    => undef,  # use default
     	      -gap_e    => undef,  # use default
     	      -word     => undef,  # use default
     	      -min_len  => undef,  # use default
     	      );

   See `run' in this node() and `USAGE' in this node for more information
about running Blasts.

HTML-formatting Blast reports
-----------------------------

   Print an HTML-formatted version of a Blast report:

     use Bio::Tools::Blast qw(:obj);

     $Blast->to_html($filename);
     $Blast->to_html(-file   => $filename,
     		    -header => "<H1>Blast Results</H1>");
     $Blast->to_html(-file   => $filename,
     		    -out    => \@array);  # store output
     $Blast->to_html();  # use STDIN

   Results are sent directly to STDOUT unless an `-out => array_ref'
parameter is supplied. See `to_html' in this node() for details.

INSTALLATION
============

   This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

   Follow the installation instructions included in the README file.

DESCRIPTION
===========

   The Bio::Tools::Blast.pm module encapsulates data and methods for
running, parsing, and analyzing pre-existing BLAST reports. This module
defines an application programming interface (API) for working with Blast
reports. A Blast object is constructed from raw Blast output and
encapsulates the Blast results which can then be accessed via the
interface defined by the Blast object.

   The ways in which researchers use Blast data are many and varied. This
module attempts to be general and flexible enough to accommodate different
uses. The Blast module API is still at an early stage of evolution and I
expect it to continue to evolve as new uses for Blast data are developed.
Your `FEEDBACK' in this node is welcome.

   *FEATURES:*

   * Supports NCBI Blast1.x, Blast2.x, and WashU-Blast2.x, gapped and
     ungapped.

     Can parse HTML-formatted as well as non-HTML-formatted reports.

   * Launch new Blast analyses remotely or locally.

     Blast objects can be constructed directly from the results of the
     run. See `run' in this node().

   * Construct Blast objects from pre-existing files or from a new run.

     Build a Blast object from a single file or build multiple Blast
     objects from an input stream containing multiple reports. See `parse'
     in this node().

   * Add hypertext links from a BLAST report.

     See `to_html' in this node().

   * Generate sequence and sequence alignment objects from HSP sequences.

     If you have Bio::Seq.pm and Bio::UnivAln.pm installed on your system,
     they can be used for working with high-scoring segment pair (HSP)
     sequences in the Blast alignment.  (A new version of Bio::Seq.pm is
     included in the distribution, see `INSTALLATION' in this node).  For
     more information about them, see:

          http://bio.perl.org/Projects/Sequence/
          http://bio.perl.org/Projects/SeqAlign/

   A variety of different data can be extracted from the Blast report by
querying the Blast.pm object. Some basic examples are given in the `USAGE'
in this node section. For some working scripts, see the links provided in
the `DEMO SCRIPTS' in this node section.

   As a part of the incipient Bioperl framework, the Bio::Tools::Blast.pm
module inherits from *Bio::Tools::SeqAnal.pm*, which provides some generic
functionality for biological sequence analysis. See the documentation for
that module for details (`Links to related modules' in this node).

The BLAST Program
-----------------

   BLAST (Basic Local Alignment Search Tool) is a widely used algorithm
for performing rapid sequence similarity searches between a single DNA or
protein sequence and a large dataset of sequences.  BLAST analyses are
typically performed by dedicated remote servers, such as the ones at the
NCBI. Individual groups may also run the program on local machines.

   The Blast family includes 5 different programs:

     Query Seq        Database
                  ------------     ----------
      blastp  --  protein          protein
      blastn  --  nucleotide       nucleotide
      blastx  --  nucleotide*      protein
      tblastn --  protein          nucleotide*
      tblastx --  nucleotide*      nucleotide*

     * = dynamically translated in all reading frames, both strands

   See `References & Information about the BLAST program' in this node.

Versions Supported
------------------

   BLAST reports generated by different application front ends are similar
but not exactly the same. Blast reports are not intended to be exchange
formats, making parsing software susceptible to obsolescence. This module
aims to support BLAST reports generated by different implementations:

     Implementation    Latest version tested
     --------------    --------------------
     NCBI Blast1       1.4.11   [24-Nov-97]
     NCBI Blast2       2.0.8    [Jan-5-1999]
     WashU-BLAST2      2.0a19MP [05-Feb-1998]
     GCG               1.4.8    [1-Feb-95]

   Support for both gapped and ungapped versions is included. Currently,
there is only rudimentary support for PSI-BLAST in that these reports can
be parsed but there is no special treatment of separate iteration rounds
(they are all merged together).

References & Information about the BLAST program
------------------------------------------------

   *WEBSITES:*

     http://www.ncbi.nlm.nih.gov/BLAST/                 - Homepage at NCBI
     http://www.ncbi.nlm.nih.gov/BLAST/blast_help.html  - Help manual
     http://blast.wustl.edu/                            - WashU-Blast2

   *PUBLICATIONS:* (with PubMed links)

     Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. (1990).
     "Basic local alignment search tool", J Mol Biol 215: 403-410.


http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=2231712&form=6&db=m&Dopt=r

     Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
     Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997).
     "Gapped BLAST and PSI-BLAST: a new generation of protein database
     search programs", Nucleic Acids Res. 25:3389-3402.


http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=9254694&form=6&db=m&Dopt=r

     Karlin, Samuel and Stephen F. Altschul (1990).  Methods  for
     assessing the statistical significance of molecular sequence
     features by using general scoring schemes. Proc. Natl. Acad.
     Sci. USA 87:2264-68.


http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=2315319&form=6&db=m&Dopt=b

     Karlin, Samuel and Stephen F. Altschul (1993).  Applications
     and statistics for multiple high-scoring segments in molecu-
     lar sequences. Proc. Natl. Acad. Sci. USA 90:5873-7.


http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=8390686&form=6&db=m&Dopt=b

USAGE
=====

Creating Blast objects
----------------------

   A Blast object can be constructed from the contents of a Blast report
using a set of named parameters that specify significance criteria for
parsing.  The report data can be read in from an existing file specified
with the `-file => 'filename'' parameter or from a STDIN stream containing
potentially multiple Blast reports. If the `-file' parameter does not
contain a valid filename, STDIN will be used. Separate Blast objects will
be created for each report in the stream.

   To parse the report, you must include a `-parse => 1' parameter in
addition to any other parsing parameters See `parse' in this node() for a
full description of parsing parameters.  To run a new report and then
parse it, include a `-run => \%runParams' parameter containing a reference
to a hash that hold the parameters required by the `run' in this node()
method.

   The constructor for Blast objects is inherited from
Bio::Tools::SeqAnal.pm.  See the _initialize() method of that package for
general information relevant to creating Blast objects. (The new() method,
inherited from *Bio::Root::Object.pm*, calls _initialize(). See `Links to
related modules' in this node).

   The Blast object can read compressed (gzipped) Blast report files.
Compression/decompression uses the gzip or compress programs that are
standard on Unix systems and should not require special configuration. If
you can't or don't want to use gzip as the file compression tool, either
pre-uncompress your files before parsing with this module or modify
*Bio::Root::Utilities.pm* to your liking.

   Blast objects can be generated either by direct instantiation as in:

     use Bio::Tools::Blast;
     $blast = new Bio::Tools::Blast (%parameters);

Using the Static $Blast Object
------------------------------

     use Bio::Tools::Blast qw(:obj);

   This exports the static $Blast object into your namespace. "Static"
refers to the fact that it has class scope and there is one of these
created when you use this module. The static $Blast object is basically an
empty object that is provided for convenience and is also used for various
internal chores.

   It is exported by this module and can be used for parsing and running
reports as well as HTML-formatting without having to first create an empty
Blast object.

   Using the static $Blast object for parsing a STDIN stream of Blast
reports:

     use Bio::Tools::Blast qw(:obj);

     sub process_blast {
     	my $blastObj = shift;
     	print $blastObj->table();
     	$blastObj->destroy;
     }

     $Blast->parse( -parse     => 1,
     		   -signif    => '1e-10',
     		   -exec_func => \&process_blast,
     		   );

   Then pipe a stream of Blast reports into your script via STDIN.  For
each Blast report extracted from the input stream, the parser will
generate a new Blast object and pass it to the function specified by
`-exec_func'.  The `destroy' in this node() call tells Perl to free the
memory associated with the object, important if you are crunching through
many reports. This method is inherited from *Bio::Root::Object.pm* (see
`Links to related modules' in this node). See `parse' in this node() for a
full description of parameters and `DEMO SCRIPTS' in this node for
additional examples.

Running Blasts
--------------

   To run a Blast, create a new Blast object with a `-run => \%runParams'
parameter.  Remote Blasts are performed by including a `-method =>
'remote'' parameter; local Blasts are performed by including a `-method =>
'local'' parameter.  See `Running Blast reports' in this node as well as
the `DEMO SCRIPTS' in this node for examples.  Note that running local
Blasts is not yet supported, see below.

   Note that the `-seqs => [ $seqs ]' run parameter must contain a
reference to an array of *Bio::Seq.pm* objects (`Links to related modules'
in this node). Encapsulating the sequence in an object makes sequence
information much easier to handle as it can be supplied in a variety of
formats. Bio::Seq.pm is included with this distribution (`INSTALLATION' in
this node).

   Remote Blasts are implemented using the
*Bio::Tools::Blast::Run::Webblast.pm* module.  Local Blasts require that
you customize the *Bio::Tools::Blast::Run::LocalBlast.pm* module.  The
version of LocalBlast.pm included with this distribution provides the
basic framework for running local Blasts.  See `Links to related modules'
in this node.

Significance screening
----------------------

   A `-signif' parameter can be used to screen out all hits with P-values
(or Expect values) above a certain cutoff. For example, to exclude all
hits with Expect values above 1.0e-10: `-signif => 1e-10'. Providing a
`-signif' cutoff can speed up processing tremendously, since only a small
fraction of the report need be parsed. This is because the `-signif' value
is used to screen hits based on the data in the "Description" section of
the Blast report:

   For NCBI BLAST2 reports:

     Score     E
       Sequences producing significant alignments:                        (bits)  Value

     sp|P31376|YAB1_YEAST  HYPOTHETICAL 74.1 KD PROTEIN IN CYS3-MDM10...   957  0.0

   For BLAST1 or WashU-BLAST2 reports:

     Smallest
       Sum
                                                                     High  Probability
       Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

     PDB:3PRK_E Proteinase K complexed with inhibitor ...........   504  1.8e-50   1

   Thus, the `-signif' parameter will screen based on Expect values for
BLAST2 reports and based on P-values for BLAST1/WashU-BLAST2 reports.

   To screen based on other criteria, you can supply a `-filt_func'
parameter containing a function reference that takes a
*Bio::Tools::Sbjct.pm* object as an argument and returns a boolean, true
if the hit is to be screened out. See example below for `Screening hits
using arbitrary criteria' in this node.

Get the best hit.
-----------------

     $hit = $blastObj->hit;

   A "hit" is contained by a *Bio::Tools::Blast::Sbjct.pm* object.

Get the P-value or Expect value of the most significant hit.
------------------------------------------------------------

     $p = $blastObj->lowest_p;
     $e = $blastObj->lowest_expect;

   Alternatively:

     $p = $blastObj->hit->p;
     $e = $blastObj->hit->expect;

   Note that P-values are not reported in NCBI Blast2 reports.

Iterate through all the hits
----------------------------

     foreach $hit ($blastObj->hits) {
     	 printf "%s\t %.1e\t %d\t %.2f\t %d\n",
     	                  $hit->name, $hit->expect, $hit->num_hsps,
                          $hit->frac_identical, $hit->gaps;
     }

   Refer to the documentation for *Bio::Tools::Blast::Sbjct.pm* for other
ways to work with hit objects (`Links to related modules' in this node).

Screening hits using arbitrary criteria
---------------------------------------

     sub filter { $hit=shift;
     		 return ($hit->gaps == 0 and
     			 $hit->frac_conserved > 0.5); }

     $blastObj = Bio::Tools::Blast->new( -file      => '/tmp/blast.out',
     					 -parse     => 1,
     					 -filt_func => \&filter );

   While the Blast object is parsing the report, each hit checked by
calling &filter($hit). All hits that generate false return values from
&filter are screened out and will not be added to the Blast object.  Note
that the Blast object will normally stop parsing the report after the
first non-significant hit or the first hit that does not pass the filter
function. To force the Blast object to check all hits, include a
`-check_all_hits => 1'  parameter.  Refer to the documentation for
*Bio::Tools::Blast::Sbjct.pm* for other ways to work with hit objects.

Hit start, end coordinates.
          print $sbjct->start('query');
          print $sbjct->end('sbjct');

     In array context, you can get information for both query and sbjct
     with one call:

          ($qstart, $sstart) = $sbjct->start();
          ($qend, $send)     = $sbjct->end();

     For important information regarding coordinate information, see the
     `HSP start, end, and strand' in this node section below.  Also check
     out documentation for the start and end methods in
     *Bio::Tools::Blast::Sbjct.pm*, which explains what happens if there
     is more than one HSP.

Working with HSPs
-----------------

Iterate through all the HSPs of every hit
          foreach $hit ($blastObj->hits) {
          	 foreach $hsp ($hit->hsps) {
          	 printf "%.1e\t %d\t %.1f\t %.2f\t %.2f\t %d\t %d\n",
          	                  $hsp->expect, $hsp->score, $hsp->bits,
                               $hsp->frac_identical, $hsp->frac_conserved,
          	                  $hsp->gaps('query'), $hsp->gaps('sbjct');
          }

     Refer to the documentation for Bio::Tools::Blast::HSP.pm for other
     ways to work with hit objects (`Links to related modules' in this
     node).

Extract HSP sequence data as strings or sequence objects
     Get the first HSP of the first hit and the sequences of the query and
     sbjct as strings.

          $hsp = $blast_obj->hit->hsp;
          $query_seq = $hsp->seq_str('query');
          $hsp_seq = $hsp->seq_str('sbjct');

     Get the indices of identical and conserved positions in the HSP query
     seq.

          @query_iden_indices = $hsp->seq_inds('query', 'identical');
          @query_cons_indices = $hsp->seq_inds('query', 'conserved');

     Similarly for the sbjct sequence.

          @sbjct_iden_indices = $hsp->seq_inds('sbjct', 'identical');
          @sbjct_cons_indices = $hsp->seq_inds('sbjct', 'conserved');

          print "Query in Fasta format:\n", $hsp->seq('query')->layout('fasta');
          print "Sbjct in Fasta format:\n", $hsp->seq('sbjct')->layout('fasta');

     See the *Bio::Seq.pm* package for more information about using these
     sequence objects (`Links to related modules' in this node).

Create sequence alignment objects using HSP sequences
          $aln = $hsp->get_aln;
          print " consensus:\n", $aln->consensus();
          print $hsp->get_aln->layout('fasta');

          $ENV{READSEQ_DIR} = '/home/users/sac/bin/solaris';
          $ENV{READSEQ} = 'readseq';
          print $hsp->get_aln->layout('msf');

     MSF formated layout requires Don Gilbert's ReadSeq program (not
     included).  See the *Bio::UnivAln.pm* for more information about
     using these alignment objects (`Links to related modules' in this
     node)'.

HSP start, end, and strand
     To facilitate HSP processing, endpoint data for each HSP sequence are
     normalized so that *start is always less than end*. This affects
     TBLASTN and TBLASTX HSPs on the reverse complement or "Minus" strand.

     Some examples of obtaining start, end coordinates for HSP objects:

          print $hsp->start('query');
          print $hsp->end('sbjct');
          ($qstart, $sstart) = $hsp->start();
          ($qend, $send) = $hsp->end();

     Strandedness of the HSP can be assessed using the strand() method on
     the HSP object:

          print $hsp->strand('query');
          print $hsp->strand('sbjct');

     These will return 'Minus' or 'Plus'.  Or, to get strand information
     for both query and sbjct with a single call:

          ($qstrand, $sstrand) = $hsp->strand();

Report Generation
-----------------

Generate a tab-delimited table of all results.
          print $blastObj->table;
          print $blastObj->table(0);   # don't include hit descriptions.
          print $blastObj->table_tiled;

     The `table' in this node() method returns data for each HSP of each
     hit listed one per line. The `table_tiled' in this node() method
     returns data for each *hit, i.e., Sbjct* listed one per line; data
     from multiple HSPs are combined after tiling to reduce overlaps. See
     *Bio::Tools::Blast::Sbjct.pm* for more information about HSP tiling.
     These methods generate stereotypical, tab-delimited data for each hit
     of the Blast report. The output is suitable for importation into
     spreadsheets or database tables. Feel free to roll your own table
     function if you need a custom table.

     For either table method, descriptions of each hit can be included if a
     single, true argument is supplied (e.g., $blastObj->table(1)). The
     description will be added as the last field. This will significantly
     increase the size of the table. Labels for the table columns can be
     obtained with `table_labels' in this node() and `table_labels_tiled'
     in this node().

Print a summary of the Blast report
          $blastObj->display();
          $blastObj->display(-show=>'hits');

     `display' in this node() prints various statistics extracted from the
     Blast report such as database name, database size, matrix used, etc.
     The `display(-show=>'hits')' call prints a non-tab-delimited table
     attempting to line the data up into more readable columns. The output
     generated is similar to `table_tiled' in this node().

HTML-format an existing report
          use Bio::Tools::Blast qw(:obj);

          # Going straight from a non HTML report file to HTML output using
          # the static $Blast object exported by Bio::Tools::Blast.pm
          $Blast->to_html(-file   => '/usr/people/me/blast.output.txt',
          		     -header => qq|<H1>BLASTP Results</H1><A HREF="home.html">Home</A>|
          		     );

          # You can also use a specific Blast object created previously.
          $blastObj->to_html;

     `to_html' in this node() will send HTML output, line-by-line,
     directly to STDOUT unless an `-out => array_ref' parameter is
     supplied (e.g., `-out => \@array'), in which case the HTML will be
     stored in @array, one line per array element.  The direct outputting
     permits faster response time since Blast reports can be huge. The
     -header tag can contain a string containing any HTML that you want to
     appear at the top of the Blast report.

DEMO SCRIPTS
============

   Sample Scripts are included in the central bioperl distribution in the
'examples/blast/' directory (see `INSTALLATION' in this node). These are
also available at the following URLs (but it would be safer to use the
scripts included with the distribution).

Handy library for working with Bio::Tools::Blast.pm
---------------------------------------------------

     http://bio.perl.org/Core/Examples/blast/blast_config.pl

Parsing Blast reports one at a time.
------------------------------------

     http://bio.perl.org/Core/Examples/blast/parse_blast.pl
     http://bio.perl.org/Core/Examples/blast/parse_blast2.pl
     http://bio.perl.org/Core/Examples/blast/parse_positions.pl

Parsing sets of Blast reports.
------------------------------

     http://bio.perl.org/Core/Examples/blast/parse_blast.pl
     http://bio.perl.org/Core/Examples/blast/parse_multi.pl

     B<Warning:> See note about L<Memory Usage Issues>.

Running Blast analyses one at a time.
-------------------------------------

     http://bio.perl.org/Core/Examples/blast/run_blast_remote.pl

Running Blast analyses given a set of sequences.
------------------------------------------------

     http://bio.perl.org/Core/Examples/blast/blast_seq.pl

HTML-formatting Blast reports.
------------------------------

     http://bio.perl.org/Core/Examples/blast/html.pl

TECHNICAL DETAILS
=================

Blast Modes
-----------

   A BLAST object may be created using one of three different modes as
defined by the *Bio::Tools::SeqAnal.pm* package (See `Links to related
modules' in this node):

     -- parse - Load a BLAST report and parse it, storing parsed data in
        Blast.pm object.
     -- run      - Run the BLAST program to generate a new report.
     -- read     - Load a BLAST report into the Blast object without parsing.

   *Run mode support has recently been added*.  The module
*Bio::Tools::Blast::Run::Webblast.pm* is an modularized adaptation of the
webblast script by Alex Dong Li:

     http://www.genet.sickkids.on.ca/bioinfo_resources/software.html#webblast

   for running remote Blast analyses and saving the results locally.  Run
mode can be combined with a parse mode to generate a Blast report and then
build the Blast object from the parsed results of this report (see `run'
in this node() and `SYNOPSIS' in this node).

   In read mode, the BLAST report is read in by the Blast object but is
not parsed.  This could be used to internalize a Blast report but not
parse it for results (e.g., generating HTML formatted output).

Significant Hits
----------------

   This module permits the screening of hits on the basis of
user-specified criteria for significance. Currently, Blast reports can be
screened based on:

     CRITERIA                            PARAMETER       VALUE
     ----------------------------------  ---------      ----------------
       1) the best Expect (or P) value      -signif        float or sci-notation
       2) the length of the query sequence  -min_length    integer
       3) arbitrary criteria                -filt_func     function reference

   The parameters are used for construction of the BLAST object or when
running the `parse' in this node() method on the static $Blast object.  The
-SIGNIF value represents the number listed in the description section at
the top of the Blast report. For Blast2, this is an Expect value, for
Blast1 and WashU-Blast2, this is a P-value.  The idea behind the
`-filt_func' parameter is that the hit has to pass through a filter to be
considered significant. Refer to the documentation for
*Bio::Tools::Blast::Sbjct.pm* for ways to work with hit objects.

   Using a `-signif' parameter allows for the following:

Faster parsing.
     Each hit can be screened by examination of the description line alone
     without fully parsing the HSP alignment section.

Flexibility.
     The `-signif' tag provides a more semantic-free way to specify the
     value to be used as a basis for screening hits. Thus, `-signif' can
     be used for screening Blast1 or Blast2 reports. It is up to the user
     to understand whether `-signif' represents a P-value or an Expect
     value.

   Any hit not meeting the significance criteria will not be added to the
"hit list" of the BLAST object. Also, a BLAST object without any hits
meeting the significance criteria will throw an exception during object
construction (a fatal event).

Statistical Parameters
----------------------

   There are numerous parameters which define the behavior of the BLAST
program and which are useful for interpreting the search results. These
parameters are extracted from the Blast report:

     filter  --  for masking out low-complexity sequences or short repeats
     matrix  --  name of the substitution scoring matrix (e.g., BLOSUM62)
     E       --  Expect filter (screens out frequent scores)
     S       --  Cutoff score for segment pairs
     W       --  Word length
     T       --  Threshold score for word pairs
     Lambda, --  Karlin-Altschul "sum" statistical parameters dependent on
      K, H        sequence composition.
     G       --  Gap creation penalty.
     E       --  Gap extension penalty.

   These parameters are not always needed. Extraction may be turned off
explicitly by including a `-stats => 0' parameter during object
construction.  Support for all statistical parameters is not complete.

   For more about the meaning of parameters, check out the NCBI URLs given
above.

Module Organization
-------------------

   The modules that comprise this Bioperl Blast distribution are location
in the Bio:: hierarchy as shown in the diagram below.

     Bio/
      |
                    +--------------------------+
                    |                          |
               Bio::Tools                  Bio::Root
                    |                          |
         +----------------------+           Object.pm
         |          |           |
      SeqAnal.pm  Blast.pm    Blast/
                                |
                 +---------+---------+------------+
                 |         |         |            |
               Sbjct.pm   HSP.pm   HTML.pm       Run/
                      |
                +------------+
                |            |
           Webblast.pm   LocalBlast.pm

   Bio::Tools::Blast.pm is a concrete class that inherits from
*Bio::Tools::SeqAnal.pm* and relies on other modules for parsing and
managing BLAST data.  Worth mentioning about this hierarchy is the lack of
a "Parse.pm" module.  Since parsing is considered central to the purpose
of the Bioperl Blast module (and Bioperl in general), it seems somewhat
unnatural to segregate out all parsing code. This segregation could also
lead to inefficiencies and harder to maintain code. I consider this issue
still open for debate.

   Bio::Tools::Blast.pm, *Bio::Tools::Blast::Sbjct.pm*, and
Bio::Tools::Blast::HSP.pm are mostly dedicated to parsing and all can be
used to instantiate objects.  Blast.pm is the main "command and control"
module, inheriting some basic behaviors from SeqAnal.pm (things that are
not specific to Blast *per se*).

   *Bio::Tools::Blast::HTML.pm* contains functions dedicated to generating
HTML-formatted Blast reports and does not generate objects.

Running Blasts: Details
-----------------------

   *Bio::Tools::Blast::Run::Webblast.pm* contains a set of functions for
running Blast analyses at a remote server and also does not instantiate
objects.  It uses a helper script called postclient.pl, located in the Run
directory.  The proposed LocalBlast.pm module would be used for running
Blast reports on local machines and thus would be customizable for
different sites. It would operate in a parallel fashion to Webblast.pm
(i.e., being a collection of functions, taking in sequence objects or
files, returning result files).

   The Run modules are considered experimental. In particular, Webblast.pm
catures an HTML-formatted version of the Blast report from the NCBI server
and strips out the HTML in preparation for parsing. A more direct approach
would be to capture the Blast results directly from the server using an
interface to the NCBI toolkit.  This approach was recently proposed on the
Bioperl mailing list:
http://www.uni-bielefeld.de/mailinglists/BCD/vsns-bcd-perl/9805/0000.html

Memory Usage Issues
-------------------

   Parsing large numbers of Blast reports (a few thousand or so) with
Bio::Tools::Blast.pm may lead to unacceptable memory usage situations.
This is somewhat dependent of the size and complexity of the reports.

   While this problem is under investigation, here are some workarounds
that fix the memory usage problem:

  1. Don't specify a -signif criterion when calling `parse' in this node().
     The `-signif' value is used for imposing a upper limit to the expect-
     or P-value for Blast hits to be parsed. For reasons that are still
     under investigation, specifying a value for `-signif' in the `parse'
     in this node() method prevents Blast objects from being fully garbage
     collected. When using the *parse_blast.pl* or *parse_multi.pl*
     scripts in `examples/blast/' of the bioperl distribution), don't
     supply a `-signif' command-line parameter.

  2. If you want to impose a -signif criterion, put it inside a -filt_func.
     For the `parse' in this node() method, a -signif => 1e-5 parameter is
     equivalent to using a filter function parameter of

          -filt_func => sub { my $hit = shift; return $hit->signif <= 1e-5; }

     Using the *examples/blast/parse_multi.pl* script, you can supply a
     command-line argument of

          -filt_func '$hit->signif <= 1e-5'

     For more information, see `parse' in this node() and the section
     `Screening hits using arbitrary criteria' in this node.


TODO
====

   * Develop a functional, prototype Bio::Tools::Blast::Run::LocalBlast.pm
     module.

   * Add support for PSI-BLAST and PHI-BLAST

   * Parse histogram of expectations and retrieve gif image in Blast
     report (if present).

   * Further investigate memory leak that occurs when parsing Blast
     streams whe supplying a -signif parameter to `parse' in this node().

   * Access Blast results directly from the NCBI server using a Perl
     interface to the NCBI toolkit or XML formated Blast reports (when
     available).

   * Further exploit Bio::UnivAln.pm and multiple-sequence alignment
     programs using HSP sequence data. Some of this may best go into a
     separate, dedicated module or script as opposed to burdening
     Blast.pm, Sbjct.pm, and HSP.pm with additional functionality that is
     not always required.

   * Add an example script for parsing Blast reports containing HTML
     formatting.

VERSION
=======

   Bio::Tools::Blast.pm, 0.09

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules.  Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
the bugs and their resolution. Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

   Steve A. Chervitz, sac@genome.stanford.edu

   See the `FEEDBACK' in this node section for where to send bug reports
and comments.

ACKNOWLEDGEMENTS
================

   This module was developed under the auspices of the Saccharomyces Genome
Database:     http://genome-www.stanford.edu/Saccharomyces

   Other contributors include: Alex Dong Li (webblast), Chris Dagdigian
(Seq.pm), Steve Brenner (Seq.pm), Georg Fuellen (Seq.pm, UnivAln.pm), and
untold others who have offered comments (noted in the
Bio/Tools/Blast/CHANGES file of the distribution).

COPYRIGHT
=========

   Copyright (c) 1996-98 Steve A. Chervitz. All Rights Reserved.  This
module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

SEE ALSO
========

     Bio::Tools::SeqAnal.pm                  - Sequence analysis object base class.
     Bio::Tools::Blast::Sbjct.pm             - Blast hit object.
     Bio::Tools::Blast::HSP.pm               - Blast HSP object.
     Bio::Tools::Blast::HTML.pm              - Blast HTML-formating utility class.
     Bio::Tools::Blast::Run::Webblast.pm     - Utility module for running Blasts remotely.
     Bio::Tools::Blast::Run::LocalBlast.pm   - Utility module for running Blasts locally.
     Bio::Seq.pm                             - Biosequence object
     Bio::UnivAln.pm                         - Biosequence alignment object.
     Bio::Root::Object.pm                    - Proposed base class for all Bioperl objects.

Links to related modules
------------------------

     Bio::Tools::SeqAnal.pm
          http://bio.perl.org/Core/POD/Bio/Tools/SeqAnal.html

     Bio::Tools::Blast::Sbjct.pm
          http://bio.perl.org/Core/POD/Bio/Tools/Blast/Sbjct.html

     Bio::Tools::Blast::HSP.pm
          http://bio.perl.org/Core/POD/Bio/Tools/Blast/HSP.html

     Bio::Tools::Blast::HTML.pm
          http://bio.perl.org/Core/POD/Bio/Tools/Blast/HTML.html

     Bio::Tools::Blast::Run::Webblast.pm
          http://bio.perl.org/Core/POD/Bio/Tools/Blast/Run/Webblast.html

     Bio::Tools::Blast::Run::LocalBlast.pm
          http://bio.perl.org/Core/POD/Bio/Tools/Blast/Run/LocalBlast.html

     Bio::Seq.pm
          http://bio.perl.org/Core/POD/Seq.html

     Bio::UnivAln.pm
          http://bio.perl.org/Projects/SeqAlign/
          Europe:  http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/#univaln

     Bio::Root::Object.pm
          http://bio.perl.org/Core/POD/Root/Object.html

     http://bio.perl.org/Projects/modules.html  - Online module documentation
     http://bio.perl.org/Projects/Blast/        - Bioperl Blast Project
     http://bio.perl.org/                       - Bioperl Project Homepage

   `References & Information about the BLAST program' in this node.

KNOWN BUGS
==========

   There is a memory leak that occurs when parsing parsing streams
containing large numbers of Blast reports (a few thousand or so) and
specifying a -signif parameter to the `parse' in this node() method. For a
workaround, see `Memory Usage Issues' in this node.

   Not sharing statistical parameters between different Blast objects when
parsing a multi-report stream has not been completely tested and may be a
little buggy.

   Documentation inconsistencies or inaccuracies may exist since this
module underwend a fair bit of re-working going from 0.75 to 0.80
(corresponds to versions 0.04.4 to 0.05 of the bioperl distribution).

APPENDIX
========

   Methods beginning with a leading underscore are considered private and
are intended for internal use by this module. They are not considered part
of the public interface and are described here for documentation purposes
only.

run
---

     Usage     : $object->run( %named_parameters )
     Purpose   : Run a local or remote Blast analysis on one or more sequences.
     Returns   : String containing name of Blast output file if a single Blast
               : is run.
               :  -- OR --
               : List of Blast objects if multiple Blasts are being run as a group.
     Argument  : Named parameters:  (PARAMETER TAGS CAN BE UPPER OR LOWER CASE).
               :    -METHOD  => 'local' or 'remote' (default = remote),
               :    -PARSE   => boolean, (true if the results are to be parsed after the run)
               :    -STRICT  => boolean, the strict mode to use for the resulting Blast objects.
               :  ADDITIONAL PARAMETERS:
               :      See methods _run_remote() and _run_local() for required
               :      parameters necessary for running the blast report.
     Throws    : Exception if no Blast output file was obtained.
     Comments  : This method is called automatically during construction of a
               : Blast.pm object when run parameters are sent to the constructor:
               :  $blastObj = new Bio::Tools::Blast (-RUN =>\%runParam,
     	   :					 %parseParam );
               :
               : The specific run methods (local or remote) called by run()
               : must return a list containing  the file name(s) with the Blast output.
               :
               : The run() method can perform single or multiple Blast runs
               : (analogous to the way parse() works) depending on how many
               : sequences are submitted. However, the running of multiple
               : Blasts is probably better handled at the script level. See notes in
               : the "TODO" section below.
               :
               : As for what to do with the Blast result file, that decision is
               : left for the user who can direct the Blast object to delete, compress,
               : or leave it alone.
               :
               : This method does not worry about load balancing, which
               : is probably best handled at the server level.
               :
     TODO:     : Support for running+parsing multiple Blast analyses with a
               : single run() call is incomplete. One can generate multiple
               : reports by placing more than one sequence object in the -seqs
               : reference parameter. This saves some overhead in the code
               : that executes the Blasts since all options are configured once.
               : (This is analogous to parsing using the static $Blast object
               : see parse() and _parse_stream()).
               :
               : The trouble is that Blast objects for all runs are constructed,
               : parsed (if necessary), and then returned as a group
               : This can require lots of memory when run+parsing many Blasts
               : but should be fine if you just want to run a bunch Blasts.
               :
               : For now, when running+parsing Blasts, stick to running one
               : Blast at a time, building the Blast object with the results
               : of that report, and processing as necessary.
               :
               : Support for running PSI-Blast is not complete.

   See Also:  `_run_remote' in this node(), `_run_local' in this node(),
`parse' in this node()

_run_remote
-----------

     Usage     : n/a; internal method called by run()
               : $object->_run_remote( %named_parameters )
     Purpose   : Run Blast on a remote server.
     Argument  : Named parameters:
               :   See documentation for function &blast_remote in
               :   Bio::Tools::Blast::Run::Webblast.pm for description
               :   of parameters.
     Comments  : This method requires the Bio::Tools::Blast::Run::Webblast.pm
               : which conforms to this minimal API:
               :    * export a method called &blast_remote that accepts a
               :      Bio::Tools::Blast.pm object + named parameters
               :      (specified in the Webblast.pm module).
               :    * return a list of names of files containing the raw Blast reports.
               :      (When building a Blast object, this list would contain a
               :       single file from which the Blast object is to be constructed).

   See Also   : `run' in this node(), `_run_local' in this node(),
*Bio::Tools::Blast::Run::Webblast.pm::blast_remote()*, `Links to related
modules' in this node

_run_local
----------

     Usage     : n/a; internal method called by run()
               : $object->_run_local(%named_parameters)
     Purpose   : Run Blast on a local machine.
     Argument  : Named parameters:
               :   See documentation for function &blast_local in
               :   Bio::Tools::Blast::Run::LocalBlast.pm for description
               :   of parameters.
     Comments  : This method requires the Bio::Tools::Blast::Run::LocalBlast.pm
               : module which should be customized for your site. This module would
               : contain all the commands, paths, environment variables, and other
               : data necessary to run Blast commands on a local machine, but should
               : not contain any semantics for specific query sequences.
               :
               : LocalBlast.pm should also conform to this minimal API:
               :    * export a method called &blast_local that accepts a
               :       Bio::Tools::Blast.pm object + named parameters
               :      (specified in the LocalBlast.pm module).
               :    * return a list of names of files containing the raw Blast reports.
               :      (When building a Blast object, this list would contain a
               :       single file from which the Blast object is to be constructed).

   See Also   : `run' in this node(), `_run_remote' in this node(),
*Bio::Tools::Blast::Run::LocalBlast::blast_local()*, `Links to related
modules' in this node

db_remote
---------

     Usage     : @dbs = $Blast->db_remote( [seq_type] );
     Purpose   : Get a list of available sequence databases for remote Blast analysis.
     Returns   : Array of strings
     Argument  : seq_type = 'p' or 'n'
               :  'p' = Gets databases for peptide searches  (default)
               :  'n' = Gets databases for nucleotide searches
     Throws    : n/a
     Comments  : Peptide databases are a subset of the nucleotide databases.
               : It is convenient to call this method on the static $Blast object
               : as shown in Usage.

   See Also   : `db_local' in this node()

db_local
--------

     Usage     : @dbs = $Blast->db_local( [seq_type] );
     Purpose   : Get a list of available sequence databases for local Blast analysis.
     Returns   : Array of strings
     Argument  : seq_type = 'p' or 'n'
               :  'p' = Gets databases for peptide searches  (default)
               :  'n' = Gets databases for nucleotide searches
     Throws    : n/a
     Comments  : Peptide databases are a subset of the nucleotide databases.
               : It is convenient to call this method on the static $Blast object.
                 as shown in Usage.

   See Also   : `db_remote' in this node()

parse
-----

     Usage     : $blast_object->parse( %named_parameters )
     Purpose   : Parse a Blast report from a file or STDIN.
               :   * Parses a raw BLAST data, populating Blast object with report data.
               :   * Sets the significance cutoff.
               :   * Extracts statistical parameters about the BLAST run.
               :   * Handles both single files and streams containing multiple reports.
     Returns   : integer (number of Blast reports parsed)
     Argument  : <named parameters>:  (PARAMETER TAGS CAN BE UPPER OR LOWER CASE).
     	   : -FILE       => string (name of file containing raw Blast output.
               :                         Optional. If a valid file is not supplied,
     	   :		             STDIN will be used).
     	   : -SIGNIF     => number (float or scientific notation number to be used
     	   :                         as a P- or Expect value cutoff;
     	   :			     default =  $DEFAULT_SIGNIF (999)).
     	   : -FILT_FUNC  => func_ref (reference to a function to be used for
               :                          filtering out hits based on arbitrary criteria.
               :                          This function should take a
               :                          Bio::Tools::Blast::Sbjct.pm object as its first
               :                          argument and return a boolean value,
     	   :                          true if the hit should be filtered out).
               :                          Sample filter function:
               :                          -FILT_FUNC => sub { $hit = shift;
     	   :				                  $hit->gaps == 0; },
               : -CHECK_ALL_HITS => boolean (check all hits for significance against
               :                             significance criteria.  Default = false.
     	   :			         If false, stops processing hits after the first
               :                             non-significant hit or the first hit that fails
               :                             the filt_func call. This speeds parsing,
               :                             taking advantage of the fact that the hits
               :                             are processed in the order they are ranked.)
               : -MIN_LEN     => integer (to be used as a minimum query sequence length
               :                          sequences below this length will not be processed).
               :                          default = no minimum length).
     	   : -STATS       => boolean (collect stats for report: matrix, filters, etc.
               :                          default = false).
     	   : -BEST        => boolean (only process the best hit of each report;
               :                          default = false).
               : -OVERLAP     => integer (the amount of overlap to permit between
               :                          adjacent HSPs when tiling HSPs,
               :                          Default = $MAX_HSP_OVERLAP (2))
               :
               : PARAMETERS USED WHEN PARSING MULTI-REPORT STREAMS:
               : --------------------------------------------------
     	   : -SHARE       => boolean (set this to true if all reports in stream
     	   :			      share the same stats. Default = true)
               :                          Must be set to false when parsing both Blast1 and
               :                          Blast2 reports in the same run or if you need
               :                          statistical params for each report, Lambda, K, H).
     	   : -STRICT      => boolean (use strict mode for all Blast objects created.
               :                          Increases sensitivity to errors. For single
               :                          Blasts, this is parameter is sent to new().)
               : -EXEC_FUNC   => func_ref (reference to a function for processing each
               :                           Blast object after it is parsed. Should accept a
               :                           Blast object as its sole argument. Return value
               :                           is ignored. If an -EXEC_FUNC parameter is supplied,
               :                           the -SAVE_ARRAY parameter will be ignored.)
               : -SAVE_ARRAY  =>array_ref, (reference to an array for storing all
               :                            Blast objects as they are created.
               :                            Experimental. Not recommended.)
               : -SIGNIF_FMT  => boolean   String of 'exp' or 'parts'. Sets the format
               :                           for reporting P/Expect values. 'exp' reports
               :                           only the exponent portion. 'parts' reports
               :                           them as a 2 element list. See signif_fmt()..
               :
     Throws    : Exception if BLAST report contains a FATAL: error.
               : Propagates any exception thrown by read().
               : Propagates any exception thrown by called parsing methods.
     Comments  : This method can be called either directly using the static $Blast object
               : or indirectly (by Bio::Tools::SeqAnal.pm) during constuction of an
               : individual Blast object.
               :
               : HTML-formatted reports can be parsed as well. No special flag is required
               : since it is detected automatically. The presence of HTML-formatting
               : will result in slower performace, however, since it must be removed
               : prior to parsing. Parsing HTML-formatted reports is highly
               : error prone and is generally not recommended.
               :
               : If one has an HTML report, do NOT remove the HTML from it by using the
               : "Save As" option of a web browser to save it as text. This renders the
               : report unparsable.
               : HTML-formatted reports can be parsed after running through the strip_html
               : function of Blast::HTML.pm as in:
               :    require Bio::Tools::Blast::HTML;
               :    Bio::Tools::Blast::HTML->import(&strip_html);
               :    &strip_html(\$data);
               :    # where data contains full contents of an HTML-formatted report.
               : TODO: write a demo script that does this.

   See Also   : `_init_parse_params' in this node(), `_parse_blast_stream'
in this node(), `overlap' in this node(), `signif_fmt' in this node(),
*Bio::Root::Object::read()*, *Bio::Tools::Blast::HTML.pm::strip_html()*,
`Links to related modules' in this node

_init_parse_params
------------------

     Title   : _init_parse_params
     Usage   : n/a; called automatically by parse()
     Purpose : Initializes parameters used during parsing of Blast reports.
             : This is a static method used by the $Blast object.
             : Calls _set_signif().
     Example :
     Returns : n/a
     Args    : Args extracted by parse().

   See Also: `parse' in this node(), `_set_signif' in this node()

_set_signif
-----------

     Usage     : n/a; called automatically by _init_parse_params()
               : This is now a "static" method used only by $Blast.
               : _set_signif($signif, $min_len, $filt_func);
     Purpose   : Sets significance criteria for the BLAST object.
     Argument  : Obligatory three arguments:
               :   $signif = float or sci-notation number or undef
               :   $min_len = integer or undef
               :   $filt_func = function reference or undef
               :
               :   If $signif is undefined, a default value is set
               :   (see $DEFAULT_SIGNIF; min_length = not set).
     Throws    : Exception if significance value is defined but appears
               :   out of range or invalid.
               : Exception if $filt_func if defined and is not a func ref.
     Comments  : The significance of a BLAST report can be based on
               : the P (or Expect) value and/or the length of the query sequence.
               : P (or Expect) values GREATER than '_significance' are not significant.
               : Query sequence lengths LESS than '_min_length' are not significant.
               :
               : Hits can also be screened using arbitrary significance criteria
               : as discussed in the parse() method.
               :
               : If no $signif is defined, the '_significance' level is set to
               : $Bio::Tools::Blast::DEFAULT_SIGNIF (999).

   See Also   : `signif' in this node(), `min_length' in this node(),
`_init_parse_params' in this node(), `parse' in this node()

_parse_blast_stream
-------------------

     Usage     : n/a. Internal method called by parse()
     Purpose   : Obtains the function to be used during parsing and calls read().
     Returns   : Integer (the number of blast reports read)
     Argument  : Named parameters  (forwarded from parse())
     Throws    : Propagates any exception thrown by _get_parse_blast_func() and read().

   See Also   : `_get_parse_blast_func' in this node(),
*Bio::Root::Object::read()*

_get_parse_blast_func
---------------------

     Usage     : n/a; internal method used by _parse_blast_stream()
               : $func_ref = $blast_object->_get_parse_blast_func()
     Purpose   : Generates a function ref to be used as a closure for parsing
               : raw data as it is being loaded by Bio::Root::IOManager::read().
     Returns   : Function reference (closure).
     Comments  : The the function reference contains a fair bit of logic
               : at present. It could perhaps be split up into separate
               : functions to make it more 'digestible'.

   See Also   : `_parse_blast_stream' in this node()

_report_errors
--------------

     Title   : _report_errors
     Usage   : n/a; Internal method called by _get_parse_blast_func().
     Purpose : Throw or warn about any errors encountered.
     Returns : n/a
     Args    : n/a
     Throws  : If all hits generated exceptions, raise exception
             :   (a fatal event for the Blast object.)
             : If some hits were okay but some were bad, generate a warning
             :   (a few bad applies should not spoil the bunch).
             :   This usually indicates a limiting B-value.
             : When the parsing code fails, it is either all or nothing.

_parse_header
-------------

     Usage     : n/a; called automatically by the _get_parse_blast_func().
     Purpose   : Parses the header section of a BLAST report.
     Argument  : String containing the header+description section of a BLAST report.
     Throws    : Exception if description data cannot be parsed properly.
               : Exception if there is a 'FATAL' error in the Blast report.
               : Warning if there is a 'WARNING' in the Blast report.
               : Warning if there are no significant hits.
     Comments  : Description section contains a single line for each hit listing
               : the seq id, description, score, Expect or P-value, etc.

   See Also   : `_get_parse_blast_func' in this node()

_parse_alignment
----------------

     Usage     : n/a; called automatically by the _get_parse_blast_func().
     Purpose   : Parses a single alignment section of a BLAST report.
     Argument  : String containing the alignment section.
     Throws    : n/a; All errors are trapped while parsing the hit data
               : and are processed as a group when the report is
               : completely processed (See _report_errors()).
               :
     Comments  : Alignment section contains all HSPs for a hit.
               : Requires Bio::Tools::Blast::Sbjct.pm.
               : Optionally calls a filter function to screen the hit on arbitrary
               : criteria. If the filter function returns true for a given hit,
               : that hit will be skipped.
               :
               : If the Blast object was created with -check_all_hits set to true,
               : all hits will be checked for significance and processed if necessary.
               : If this field is false, the parsing will stop after the first
               : non-significant hit.
               : See parse() for description of parsing parameters.

   See Also   : `parse' in this node(), `_get_parse_blast_func' in this
node(), `_report_errors' in this node(), *Bio::Tools::Blast::Sbjct()*,
`Links to related modules' in this node

_parse_footer
-------------

     Usage     : n/a; internal function. called by _parse_alignment()
     Purpose   : Extracts statistical and other parameters from the BLAST report.
               : Sets various key elements such as the program and version,
               : gapping, and the layout for the report (blast1 or blast2).
     Argument  : Data to be parsed.
     Returns   : String containing an alignment section for processing by
               : _parse_alignment().
     Throws    : Exception if cannot find the parameters section of report.
               : Warning if cannot determine if gapping was used.
               : Warning if cannot determine the scoring matrix used.
     Comments  : This method must always get called, even if the -STATS
               : parse() parameter is false. The reason is that the layout
               : of the report  and the presence of gapping must always be set.
               : The determination whether to set additional stats is made
               : by methods called by _parse_footer().

   See Also   : `parse' in this node(), `_parse_alignment' in this node(),
`_set_database' in this node()

_set_blast2_stats
-----------------

     Usage     : n/a; internal function called by _parse_footer()
     Purpose   : Extracts statistical and other parameters from BLAST2 report footer.
               : Stats collected: database release, gapping,
               : posted date, matrix used, filter used, Karlin-Altschul parameters,
               : E, S, T, X, W.
     Throws    : Exception if cannot get "Parameters" section of Blast report.

   See Also   : `parse' in this node(), `_parse_footer' in this node(),
`_set_database' in this node(), *Bio::Tools::SeqAnal::set_date()*,`Links
to related modules' in this node

_set_blast1_stats
-----------------

     Usage     : n/a; internal function called by _parse_footer()
     Purpose   : Extracts statistical and other parameters from BLAST 1.x style eports.
               : Handles NCBI Blast1 and WashU-Blast2 formats.
               : Stats collected: database release, gapping,
               : posted date, matrix used, filter used, Karlin-Altschul parameters,
               : E, S, T, X, W.

   See Also   : `parse' in this node(), `_parse_footer' in this node(),
`_set_database' in this node(), *Bio::Tools::SeqAnal::set_date()*,`Links
to related modules' in this node

_set_gapping_wu
---------------

     Usage     : n/a; internal function called by _set_blast1_stats()
     Purpose   : Determine if gapping_wu was on for WashU Blast reports.
     Comments  : In earlier versions, gapping was always specified
               : but in the current version (2.0a19MP), gapping is on by default
               : and there is no positive "gapping" indicator in the Parameters
               : section.

   See Also   : `_set_blast1_stats' in this node()

_set_date
---------

     Usage     : n/a; internal function called by _parse_footer()
     Purpose   : Determine the date on which the Blast analysis was performed.
     Comments  : Date information is not consistently added to Blast output.
               : Uses superclass method set_date() to set date from the file,
               : (if any).

   See Also   : `_parse_footer' in this node(),
*Bio::Tools::SeqAnal::set_date()*,`Links to related modules' in this node

_set_length
-----------

     Usage     : n/a; called automatically during Blast report parsing.
     Purpose   : Sets the length of the query sequence (extracted from report).
     Returns   : integer (length of the query sequence)
     Throws    : Exception if cannot determine the query sequence length from
               :           the BLAST report.
               : Exception if the length is below the min_length cutoff (if any).
     Comments  : The logic here is a bit different from the other _set_XXX()
               : methods since the significance of the BLAST report is assessed
               : if MIN_LENGTH is set.

   See Also   : *Bio::Tools::SeqAnal::length()*, `Links to related
modules' in this node

_set_database
-------------

     Usage     : n/a; called automatically during Blast report parsing.
     Purpose   : Sets the name of the database used by the BLAST analysis.
               : Extracted from raw BLAST report.
     Throws    : Exception if the name of the database cannot be determined.
     Comments  : The database name is used by methods or related objects
               : for database-specific parsing.

   See Also   : `parse' in this node(),
*Bio::Tools::SeqAnal::database()*,*Bio::Tools::SeqAnal::_set_db_stats()*,`Links
to related modules' in this node

_set_query
----------

     Usage     : n/a; called automatically during Blast report parsing.
     Purpose   : Set the name of the query and the query description.
               : Extracted from the raw BLAST report.
     Returns   : String containing name of query extracted from report.
     Throws    : Warning if the name of the query cannont be obtained.

   See Also   : *Bio::Tools::SeqAnal::query_desc()*,`Links to related
modules' in this node

_parse_signif
-------------

     Usage     : &_parse_signif(string, layout, gapped);
               : This is a class function.
     Purpose   : Extracts the P- or Expect value from a single line of a BLAST description section.
     Example   : &_parse_signif("PDB_UNIQUEP:3HSC_  heat-shock cognate ...   799  4.0e-206  2", 1);
               : &_parse_signif("gi|758803  (U23828) peritrophin-95 precurs   38  0.19", 2);
     Argument  : string = line from BLAST description section
               : layout = integer (1 or 2)
               : gapped = boolean (true if gapped Blast).
     Returns   : Float (0.001 or 1e-03)
     Status    : Static

signif
------

     Usage     : $blast->signif();
     Purpose   : Gets the P or Expect value used as significance screening cutoff.
     Returns   : Scientific notation number with this format: 1.0e-05.
     Argument  : n/a
     Comments  : Screening of significant hits uses the data provided on the
               : description line. For Blast1 and WU-Blast2, this data is P-value.
               : for Blast2 it is an Expect value.
               :
               : Obtains info from the static $Blast object if it has not been set
               : for the current object.

   See Also   : `_set_signif' in this node()

is_signif
---------

     Usage     : $blast->is_signif();
     Purpose   : Determine if the BLAST report contains significant hits.
     Returns   : Boolean
     Argument  : n/a
     Comments  : BLAST reports without significant hits but with defined
               : significance criteria will throw exceptions during construction.
               : This obviates the need to check significant() for
               : such objects.

   See Also   : `_set_signif' in this node()

signif_fmt
----------

     Usage     : $blast->signif_fmt( [FMT] );
     Purpose   : Allows retrieval of the P/Expect exponent values only
               : or as a two-element list (mantissa, exponent).
     Usage     : $blast_obj->signif_fmt('exp');
               : $blast_obj->signif_fmt('parts');
     Returns   : String or '' if not set.
     Argument  : String, FMT = 'exp' (return the exponent only)
               :             = 'parts'(return exponent + mantissa in 2-elem list)
               :              = undefined (return the raw value)
     Comments  : P/Expect values are still stored internally as the full,
               : scientific notation value.
               : This method uses the static $Blast object since this issue
               : will pertain to all Blast reports within a given set.
               : This setting is propagated to Bio::Tools::Blast::Sbjct.pm.

min_length
----------

     Usage     : $blast->min_length();
     Purpose   : Gets the query sequence length used as significance screening criteria.
     Returns   : Integer
     Argument  : n/a
     Comments  : Obtains info from the static $Blast object if it has not been set
               : for the current object.

   See Also   : `_set_signif' in this node(), `signif' in this node()

gapped
------

     Usage     : $blast->gapped();
     Purpose   : Set/Get boolean indicator for gapped BLAST.
     Returns   : Boolean
     Argument  : n/a
     Comments  : Obtains info from the static $Blast object if it has not been set
               : for the current object.

_get_stats
----------

     Usage     : n/a; internal method.
     Purpose   : Set/Get indicator for collecting full statistics from report.
     Returns   : Boolean (0 | 1)
     Comments  : Obtains info from the static $Blast object which gets set
               : by _init_parse_params().

_layout
-------

     Usage     : n/a; internal method.
     Purpose   : Set/Get indicator for the layout of the report.
     Returns   : Integer (1 | 2)
               : Defaults to 2 if not set.
     Comments  : Blast1 and WashU-Blast2 have a layout = 1.
               : This is intended for internal use by this and closely
               : allied modules like Sbjct.pm and HSP.pm.
               :
               : Obtains info from the static $Blast object if it has not been set
               : for the current object.

hits
----

     Usage     : $blast->hits();
     Purpose   : Get a list containing all BLAST hit (Sbjct) objects.
               : Get the numbers of significant hits.
     Examples  : @hits       = $blast->hits();
               : $num_signif = $blast->hits();
     Returns   : List context : list of Bio::Tools::Blast::Sbjct.pm objects
               :                or an empty list if there are no hits.
               : Scalar context: integer (number of significant hits)
               :                 or zero if there are no hits.
               :                 (Equivalent to num_hits()).
     Argument  : n/a. Relies on wantarray.
     Throws    : n/a.
               : Not throwing exception because the absence of hits may have
               : resulted from stringent significance criteria, not a failure
               : set the hits.

   See Also   : `hit' in this node(), `num_hits' in this node(),
`is_signif' in this node(), `_set_signif' in this node()

hit
---

     Example   : $blast_obj->hit( [class] )
     Purpose   : Get a specific hit object.
               : Provides some syntactic sugar for the hits() method.
     Usage     : $hitObj = $blast->hit();
               : $hitObj = $blast->hit('best');
               : $hitObj = $blast->hit('worst');
               : $hitObj = $blast->hit( $name );
     Returns   : Object reference for a Bio::Tools::Blast::Sbjct.pm object.
               : undef if there are no hit (Sbjct) objects defined.
     Argument  : Class (or no argument).
               :   No argument (default) = highest scoring hit (same as 'best').
               :   'best' or 'first' = highest scoring hit.
               :   'worst' or 'last' = lowest scoring hit.
               :   $name = retrieve a hit by seq id (case-insensitive).
     Throws    : Exception if the Blast object has no significant hits.
               : Exception if a hit cannot be found when supplying a specific
               : hit sequence identifier as an argument.
     Comments  : 'best'  = lowest significance value (P or Expect) among significant hits.
               : 'worst' = highest sigificance value (P or Expect) among significant hits.

   See Also   : `hits' in this node(), `num_hits' in this node(),
`is_signif' in this node()

num_hits
--------

     Usage     : $blast->num_hits( ['total'] );
     Purpose   : Get number of significant hits or number of total hits.
     Examples  : $num_signif = $blast-num_hits;
               : $num_total  = $blast->num_hits('total');
     Returns   : Integer
     Argument  : String = 'total' (or no argument).
               :   No argument (Default) = return number of significant hits.
               :   'total' = number of total hits.
     Throws    : n/a.
               : Not throwing exception because the absence of hits may have
               : resulted from stringent significance criteria, not a failure
               : set the hits.
     Comments  : A significant hit is defined as a hit with an expect value
               : (or P value for WU-Blast) at or below the -signif parameter
               : used when parsing the report. Additionally, if a filter function
               : was supplied, the significant hit must also pass that
               : criteria.
     See Also   : L<hits>(), L<hit>(), L<is_signif>(), L<_set_signif>(), L<parse>()

lowest_p
--------

     Usage     : $blast->lowest_p()
     Purpose   : Get the lowest P-value among all hits in a BLAST report.
               : Syntactic sugar for $blast->hit('best')->p().
     Returns   : Float or scientific notation number.
               : Returns -1.0 if lowest_p has not been set.
     Argument  : n/a.
     Throws    : Exception if the Blast report does not report P-values
               : (as is the case for NCBI Blast2).
     Comments  : A value is returned regardless of whether or not there were
               : significant hits ($DEFAULT_SIGNIF, currently  999).

   See Also   : `lowest_expect' in this node(), `lowest_signif' in this
node(), `highest_p' in this node(), `signif_fmt' in this node()

lowest_expect
-------------

     Usage     : $blast->lowest_expect()
     Purpose   : Get the lowest Expect value among all hits in a BLAST report.
               : Syntactic sugar for $blast->hit('best')->expect()
     Returns   : Float or scientific notation number.
               : Returns -1.0 if lowest_expect has not been set.
     Argument  : n/a.
     Throws    : Exception if there were no significant hits and the report
               : does not have Expect values on the description lines
               : (i.e., Blast1, WashU-Blast2).

   See Also   : `lowest_p' in this node(), `lowest_signif' in this node(),
`highest_expect' in this node(), `signif_fmt' in this node()

highest_p
---------

     Example   : $blast->highest_p( ['overall'])
     Purpose   : Get the highest P-value among all hits in a BLAST report.
               : Syntactic sugar for $blast->hit('worst')->p()
               : Can also get the highest P-value overall (not just among signif hits).
     Usage     : $p_signif = $blast->highest_p();
               : $p_all    = $blast->highest_p('overall');
     Returns   : Float or scientific notation number.
               : Returns -1.0 if highest_p has not been set.
     Argument  : String 'overall' or no argument.
               : No argument = get highest P-value among significant hits.
     Throws    : Exception if object is created from a Blast2 report
               : (which does not report P-values).

   See Also   : `highest_signif' in this node(), `lowest_p' in this
node(), `_set_signif' in this node(), `signif_fmt' in this node()

highest_expect
--------------

     Usage     : $blast_object->highest_expect( ['overall'])
     Purpose   : Get the highest Expect value among all significant hits in a BLAST report.
               : Syntactic sugar for $blast->hit('worst')->expect()
     Examples  : $e_sig = $blast->highest_expect();
               : $e_all = $blast->highest_expect('overall');
     Returns   : Float or scientific notation number.
               : Returns -1.0 if highest_exoect has not been set.
     Argument  : String 'overall' or no argument.
               : No argument = get highest Expect-value among significant hits.
     Throws    : Exception if there were no significant hits and the report
               : does not have Expect values on the description lines
               : (i.e., Blast1, WashU-Blast2).

   See Also   : `lowest_expect' in this node(), `highest_signif' in this
node(), `signif_fmt' in this node()

lowest_signif
-------------

     Usage     : $blast_obj->lowest_signif();
               : Syntactic sugar for $blast->hit('best')->signif()
     Purpose   : Get the lowest P or Expect value among all hits
               : in a BLAST report.
               : This method is syntactic sugar for $blast->hit('best')->signif()
               : The value returned is the one which is reported in the decription
               : section of the Blast report.
               : For Blast1 and WU-Blast2, this is a P-value,
               : for NCBI Blast2, it is an Expect value.
     Example   : $blast->lowest_signif();
     Returns   : Float or scientific notation number.
               : Returns -1.0 if lowest_signif has not been set.
     Argument  : n/a.
     Throws    : n/a.
     Status    : Deprecated. Use lowest_expect() or lowest_p().
     Comments  : The signif() method provides a way to deal with the fact that
               : Blast1 and Blast2 formats differ in what is reported in the
               : description lines of each hit in the Blast report. The signif()
               : method frees any client code from having to know if this is a P-value
               : or an Expect value, making it easier to write code that can process
               : both Blast1 and Blast2 reports. This is not necessarily a good thing, since
               : one should always know when one is working with P-values or
               : Expect values (hence the deprecated status).
               : Use of lowest_expect() is recommended since all hits will have an Expect value.

   See Also   : `lowest_p' in this node(), `lowest_expect' in this node(),
`signif' in this node(), `signif_fmt' in this node(), `_set_signif' in
this node()

highest_signif
--------------

     Usage     : $blast_obj->highest_signif('overall');
               : Syntactic sugar for $blast->hit('worst')->signif()
     Purpose   : Get the highest P or Expect value among all hits
               : in a BLAST report.
               : The value returned is the one which is reported in the decription
               : section of the Blast report.
               : For Blast1 and WU-Blast2, this is a P-value,
               : for NCBI Blast2, it is an Expect value.
     Example   : $blast->highest_signif();
     Returns   : Float or scientific notation number.
               : Returns -1.0 if highest_signif has not been set.
     Argument  : Optional  string 'overall' to get the highest overall significance value.
     Throws    : n/a.
     Status    : Deprecated. Use highest_expect() or highest_p().
     Comments  : Analogous to lowest_signif(), q.v.

   See Also   : `lowest_signif' in this node(), `lowest_p' in this node(),
`lowest_expect' in this node(), `signif' in this node(), `signif_fmt' in
this node(), `_set_signif' in this node()

matrix
------

     Usage     : $blast_object->matrix();
     Purpose   : Get the name of the scoring matrix used.
               : This is extracted from the report.
     Argument  : n/a
     Returns   : string or undef if not defined

filter
------

     Usage     : $blast_object->filter();
     Purpose   : Get the name of the low-complexity sequence filter used.
               : (SEG, SEG+XNU, DUST, NONE).
               : This is extracted from the report.
     Argument  : n/a
     Returns   : string or undef if not defined

expect
------

     Usage     : $blast_object->expect();
     Purpose   : Get the expect parameter (E) used for the Blast analysis.
               : This is extracted from the report.
     Argument  : n/a
     Returns   : string or undef if not defined.

karlin_altschul
---------------

     Usage     : $blast_object->karlin_altschul();
     Purpose   : Get the Karlin_Altschul sum statistics (Lambda, K, H)
               : These are extracted from the report.
     Argument  : n/a
     Returns   : list of three floats (Lambda, K, H)
               : If not defined, returns list of three zeros)

word_size
---------

     Usage     : $blast_object->word_size();
     Purpose   : Get the word_size used during the Blast analysis.
               : This is extracted from the report.
     Argument  : n/a
     Returns   : integer or undef if not defined.

s
-

     Usage     : $blast_object->s();
     Purpose   : Get the s statistic for the Blast analysis.
               : This is extracted from the report.
     Argument  : n/a
     Returns   : integer or undef if not defined.

gap_creation
------------

     Usage     : $blast_object->gap_creation();
     Purpose   : Get the gap creation penalty used for a gapped Blast analysis.
               : This is extracted from the report.
     Argument  : n/a
     Returns   : integer or undef if not defined.

   See Also   : `gap_extension' in this node()

gap_extension
-------------

     Usage     : $blast_object->gap_extension();
     Purpose   : Get the gap extension penalty used for a gapped Blast analysis.
               : This is extracted from the report.
     Argument  : n/a
     Returns   : integer or undef if not defined.

   See Also   : `gap_extension' in this node()

ambiguous_aln
-------------

     Usage     : $blast_object->ambiguous_aln();
     Purpose   : Test all hits and determine if any have an ambiguous alignment.
     Example   : print "ambiguous" if $blast->ambiguous_aln();
     Returns   : Boolean (true if ANY significant hit has an ambiguous alignment)
     Argument  : n/a
     Throws    : n/a
     Status    : Experimental
     Comments  : An ambiguous BLAST alignment is defined as one where two or more
               : different HSPs have significantly overlapping sequences such
               : that it is not possible to create a unique alignment
               : by simply concatenating HSPs. This may indicate the presence
               : of multiple domains in one sequence relative to another.
               : This method only indicates the presence of ambiguity in at
               : least one significant hit. To determine the nature of the
               : ambiguity, each hit must be examined.

   See Also   : *Bio::Tools::Blast::Sbjct::ambiguous_aln()*,`Links to
related modules' in this node

overlap
-------

     Usage     : $blast_object->overlap([integer]);
     Purpose   : Set/Get the number of overlapping residues allowed when tiling multiple HSPs.
               : Delegates to Bio::Tools::Blast::Sbjct::overlap().
     Throws    : Exception if there are no significant hits.
     Status    : Experimental

   See Also   : *Bio::Tools::Blast::Sbjct::overlap()*,`Links to related
modules' in this node

homol_data
----------

     Usage     : @data = $blast_object->homo_data( %named_params );
     Purpose   : Gets specific similarity data about each significant hit.
     Returns   : Array of strings:
               : "Homology data" for each HSP is in the format:
               :  "<integer> <start> <stop>"
               : Data for different HSPs are tab-delimited.
     Argument  : named parameters passed along to the hit objects.
     Throws    : n/a
     Status    : Experimental
     Comments  : This is a very experimental method used for obtaining an
               : indication of:
               :   1) how many HSPs are in a Blast alignment
               :   2) how strong the similarity is between sequences in the HSP
               :   3) the endpoints of the alignment (sequence monomer numbers)

   See Also   : *Bio::Tools::Blast::Sbjct::homol_data()*,`Links to related
modules' in this node

REPORT GENERATING METHODS
=========================

table
-----

     Usage     : $blast_obj->table( [get_desc]);
     Purpose   : Output data for each HSP of each hit in tab-delimited format.
     Example   : print $blast->table;
               : print $blast->table(0);
               : # Call table_labels() to print labels.
     Argument  : get_desc = boolean, if false the description of each hit is not included.
               :            Default: true (if not defined, include description column).
     Returns   : String containing tab-delimited set of data for each HSP
               : of each significant hit. Different HSPs are separated by newlines.
               : Left-to-Right order of fields:
               : 1 QUERY_NAME             # Sequence identifier of the query.
               : 2 QUERY_LENGTH           # Full length of the query sequence.
               : 3 SBJCT_NAME             # Sequence identifier of the sbjct ("hit".
               : 4 SBJCT_LENGTH           # Full length of the sbjct sequence.
               : 5 EXPECT                 # Expect value for the alignment.
               : 6 SCORE                  # Blast score for the alignment.
               : 7 BITS                   # Bit score for the alignment.
               : 8 NUM_HSPS               # Number of HSPs (not the "N" value).
               : 9 HSP_FRAC_IDENTICAL     # fraction of identical substitutions.
               : 10 HSP_FRAC_CONSERVED    # fraction of conserved ("positive") substitutions.
               : 11 HSP_QUERY_ALN_LENGTH  # Length of the aligned portion of the query sequence.
               : 12 HSP_SBJCT_ALN_LENGTH  # Length of the aligned portion of the sbjct sequence.
               : 13 HSP_QUERY_GAPS        # Number of gaps in the aligned query sequence.
               : 14 HSP_SBJCT_GAPS        # Number of gaps in the aligned sbjct sequence.
               : 15 HSP_QUERY_START       # Starting coordinate of the query sequence.
               : 16 HSP_QUERY_END         # Ending coordinate of the query sequence.
               : 17 HSP_SBJCT_START       # Starting coordinate of the sbjct sequence.
               : 18 HSP_SBJCT_END         # Ending coordinate of the sbjct sequence.
               : 19 HSP_QUERY_STRAND      # Strand of the query sequence (TBLASTN/X only)
               : 20 HSP_SBJCT_STRAND      # Strand of the sbjct sequence (TBLASTN/X only)
               : 21 HSP_FRAME             # Frame for the sbjct translation (TBLASTN/X only)
               : 22 SBJCT_DESCRIPTION  (optional)  # Full description of the sbjct sequence from
               :                                  # the alignment section.
     Throws    : n/a
     Comments  : This method does not collect data based on tiling of the HSPs.
               : The table will contains redundant information since the hit name,
               : id, and other info for the hit are listed for each HSP.
               : If you need more flexibility in the output format than this
               : method provides, design a custom function.

   See Also  : `table_tiled' in this node(), `table_labels' in this
node(), `_display_hits' in this node()

table_labels
------------

     Usage     : print $blast_obj->table_labels( [get_desc] );
     Purpose   : Get column labels for table().
     Returns   : String containing column labels. Tab-delimited.
     Argument  : get_desc = boolean, if false the description column is not included.
               : Default: true (if not defined, include description column).
     Throws    : n/a

   See Also   : `table' in this node()

table_tiled
-----------

     Purpose   : Get data from tiled HSPs in tab-delimited format.
               : Allows only minimal flexibility in the output format.
               : If you need more flexibility, design a custom function.
     Usage     : $blast_obj->table_tiled( [get_desc]);
     Example   : print $blast->table_tiled;
               : print $blast->table_tiled(0);
               : # Call table_labels_tiled() if you want labels.
     Argument  : get_desc = boolean, if false the description of each hit is not included.
               :            Default: true (include description).
     Returns   : String containing tab-delimited set of data for each HSP
               : of each significant hit. Multiple hits are separated by newlines.
               : Left-to-Right order of fields:
               : 1 QUERY_NAME           # Sequence identifier of the query.
               : 2 QUERY_LENGTH         # Full length of the query sequence.
               : 3 SBJCT_NAME           # Sequence identifier of the sbjct ("hit".
               : 4 SBJCT_LENGTH         # Full length of the sbjct sequence.
               : 5 EXPECT               # Expect value for the alignment.
               : 6 SCORE                # Blast score for the alignment.
               : 7 BITS                 # Bit score for the alignment.
               : 8 NUM_HSPS             # Number of HSPs (not the "N" value).
               : 9 FRAC_IDENTICAL*      # fraction of identical substitutions.
               : 10 FRAC_CONSERVED*     # fraction of conserved ("positive") substitutions .
               : 11 FRAC_ALN_QUERY*     # fraction of the query sequence that is aligned.
               : 12 FRAC_ALN_SBJCT*     # fraction of the sbjct sequence that is aligned.
               : 13 QUERY_ALN_LENGTH*   # Length of the aligned portion of the query sequence.
               : 14 SBJCT_ALN_LENGTH*   # Length of the aligned portion of the sbjct sequence.
               : 15 QUERY_GAPS*         # Number of gaps in the aligned query sequence.
               : 16 SBJCT_GAPS*         # Number of gaps in the aligned sbjct sequence.
               : 17 QUERY_START*        # Starting coordinate of the query sequence.
               : 18 QUERY_END*          # Ending coordinate of the query sequence.
               : 19 SBJCT_START*        # Starting coordinate of the sbjct sequence.
               : 20 SBJCT_END*          # Ending coordinate of the sbjct sequence.
               : 21 AMBIGUOUS_ALN       # Ambiguous alignment indicator ('qs', 'q', 's').
               : 22 SBJCT_DESCRIPTION  (optional)  # Full description of the sbjct sequence from
               :                                  # the alignment section.
               :
               : * Items marked with a "*" report data summed across all HSPs
               :   after tiling them to avoid counting data from overlapping regions
               :   multiple times.
     Throws    : n/a
     Comments  : This function relies on tiling of the HSPs since it calls
               : frac_identical() etc. on the hit as opposed to each HSP individually.

   See Also   : `table' in this node(), `table_labels_tiled' in this
node(), *Bio::Tools::Blast::Sbjct::"HSP Tiling and Ambiguous Alignments"*,
`Links to related modules' in this node

table_labels_tiled
------------------

     Usage     : print $blast_obj->table_labels_tiled( [get_desc] );
     Purpose   : Get column labels for table_tiled().
     Returns   : String containing column labels. Tab-delimited.
     Argument  : get_desc = boolean, if false the description column is not included.
               : Default: true (include description column).
     Throws    : n/a

   See Also   : `table_tiled' in this node()

display
-------

     Usage     : $blast_object->display( %named_parameters );
     Purpose   : Display information about Bio::Tools::Blast.pm data members,
               : E.g., parameters of the report, data for each hit., etc.
               : Overrides Bio::Root::Object::display().
     Example   : $object->display(-SHOW=>'stats');
               : $object->display(-SHOW=>'stats,hits');
     Argument  : Named parameters: (TAGS CAN BE UPPER OR LOWER CASE)
               :     -SHOW  => 'file' | 'hits' | 'homol'
               :     -WHERE => filehandle (default = STDOUT)
     Returns   : n/a (print/printf is called)
     Status    : Experimental
     Comments  : For tab-delimited output, see table().

   See Also   : `_display_homol' in this node(), `_display_hits' in this
node(), `_display_stats' in this node(), `table' in this node(),
*Bio::Root::Tools::SeqAnal::display()*,`Links to related modules' in this
node,

_display_homol
--------------

     Usage     : n/a; called automatically by display()
     Purpose   : Print homology data for hits in the BLAST report.
     Example   : n/a
     Argument  : one argument = filehandle object.
     Returns   : printf call.
     Status    : Experimental

   See Also   : `homol_data' in this node(), `display' in this node()

_display_stats
--------------

     Usage     : n/a; called automatically by display()
     Purpose   : Display information about the Blast report "meta" data.
               : Overrides Bio::Tools::SeqAnal::_display_stats() calling it first.
     Example   : n/a
     Argument  : one argument = filehandle object.
     Returns   : printf call.
     Status    : Experimental

   See Also   : `display' in this node(),
*Bio::Tools::SeqAnal::_display_stats()*,`Links to related modules' in this
node

_display_hits
-------------

     Usage     : n/a; called automatically by display()
     Purpose   : Display data for each hit. Not tab-delimited.
     Example   : n/a
     Argument  : one argument = filehandle object.
     Returns   : printf call.
     Status    : Experimental
     Comments  : For tab-delimited output, see table().

   See Also   : `display' in this node(),
*Bio::Tools::Blast::Sbjct::display()*, `table' in this node(),  `Links to
related modules' in this node

to_html
-------

     Usage     : $blast_object->to_html( [%named_parameters] )
     Purpose   : To produce an HTML-formatted version of a BLAST report
               : for efficient navigation of the report using a web browser.
     Example   : # Using the static Blast object:
               : # Can read from STDIN or from a designated file:
               :   $Blast->to_html($file);
               :   $Blast->to_html(-FILE=>$file, -HEADER=>$header);
               :   (if no file is supplied, STDIN will be used).
               : # saving HTML to an array:
               :   $Blast->to_html(-FILE=>$file, -OUT =>\@out);
               : # Using a pre-existing blast object (must have been built from
               : # a file, not STDIN:
               :   $blastObj->to_html();
     Returns   : n/a, either prints report to STDOUT or saves to a supplied array
               : if an '-OUT' parameter is defined (see below).
     Argument  : %named_parameters: (TAGS ARE AND CASE INSENSITIVE).
               :    -FILE   => string containing name of a file to be processed.
               :               If not a valid file or undefined, STDIN will be used.
               :               Can skip the -FILE tag if supplying a filename
               :               as a single argument.
               :    -HEADER => string
               :               This should be an HTML-formatted string to be used
               :               as a header for the page, typically describing query sequence,
               :               database searched, the date of the analysis, and any
               :               additional links.
               :               If not supplied, no special header is used.
               :               Regardless of whether a header is supplied, the
               :               standard info at the top of the report is highlighted.
               :               This should include the <HEADER></HEADER> section
               :               of the page as well.
               :
               :    -IN    => array reference containing a raw Blast report.
               :              each line in a separate element in the array.
               :              If -IN is not supplied, read() is called
               :              and data is then read either from STDIN or a file.
               :
               :    -OUT   => array reference to hold the HTML output.
               :              If not supplied, output is sent to STDOUT.
     Throws    : Exception is propagated from $HTML::get_html_func()
               : and Bio::Root::Object::read().
     Comments  : The code that does the actual work is located in
               :  Bio::Tools::Blast::HTML::get_html_func().
     Bugs      : Some hypertext links to external databases may not be
               : correct. This due in part to the dynamic nature of
               : the web.
               : Hypertext links are not added to hits without database ids.
     TODO      : Possibly create a function to produce fancy default header
               : using data extracted from the report (requires some parsing).
               : For example, it would be nice to always include a date

   See Also   : *Bio::Tools::Blast::HTML::get_html_func()*,
*Bio::Root::Object::read()*, `Links to related modules' in this node

FOR DEVELOPERS ONLY
===================

Data Members
------------

   Information about the various data members of this module is provided
for those wishing to modify or understand the code. Two things to bear in
mind:

  1. Do NOT rely on these in any code outside of this module.  All data
     members are prefixed with an underscore to signify that they are
     private.  Always use accessor methods. If the accessor doesn't exist
     or is inadequate, create or modify an accessor (and let me know,
     too!). (An exception to this might be for Sbjct.pm or HSP.pm which
     are more tightly coupled to Blast.pm and may access Blast data
     members directly for efficiency purposes, but probably should not).

  2. This documentation may be incomplete and out of date.  It is easy for
     these data member descriptions to become obsolete as this module is
     still evolving. Always double check this info and search for members
     not described here.

        An instance of Bio::Tools::Blast.pm is a blessed reference to a
hash containing all or some of the following fields:

     FIELD           VALUE
     --------------------------------------------------------------
     _significance    P-value or Expect value cutoff (depends on Blast version:
     	          Blast1/WU-Blast2 = P-value; Blast2 = Expect value).
       	          Values GREATER than this are deemed not significant.

     _significant     Boolean. True if the query has one or more significant hit.

     _min_length      Integer. Query sequences less than this will be skipped.

     _confirm_significance  Boolean. True if client has supplied significance criteria.

     _gapped          Boolean. True if BLAST analysis has gapping turned on.

     _hits            List of Sbjct.pm objects.

     _num_hits        Number of hits obtained from the BLAST report.

     _num_hits_significant Number of significant based on Significant data members.

     _highestSignif   Highest P or Expect value overall (not just what is stored in _hits).

     _lowestSignif    Lowest P or Expect value overall (not just what is stored in _hits).

   The static $Blast object has a special set of members:

     _errs
     _share
     _stream
     _get_stats
     _gapped
     _filt_func

     Miscellaneous statistical parameters:
     -------------------------------------
      _filter, _matrix, _word_size, _expect, _gapCreation, _gapExtension, _s,
      _lambda, _k, _h

     INHERITED DATA MEMBERS
     -----------------------
     (See Bio::Tools::SeqAnal.pm for inherited data members.)


