This is Info file pm.info, produced by Makeinfo version 1.68 from the input file bigpm.texi.

File: pm.info, Node: Bio/Tools/Blast/HSP, Next: Bio/Tools/Blast/HTML, Prev: Bio/Tools/Blast, Up: Module List

Bioperl BLAST High-Scoring Segment Pair object
**********************************************

NAME
====

Bio::Tools::Blast::HSP.pm - Bioperl BLAST High-Scoring Segment Pair object

SYNOPSIS
========

Object Creation
---------------

The construction of HSP objects is handled by Bio::Tools::Blast::Sbjct.pm. You should not need to use this package directly. See `_initialize' in this node() for a description of constructor parameters.

     require Bio::Tools::Blast::HSP;

     $hspObj = eval{ new Bio::Tools::Blast::HSP(-DATA    =>\@hspData,
                                                -PARENT  =>$sbjct_object,
                                                -NAME    =>$hspCount,
                                                -PROGRAM =>'TBLASTN',
                                                );
                   };

@hspData includes the raw BLAST report data for a specific HSP, and is prepared by Bio::Tools::Blast::Sbjct.pm.

INSTALLATION
============

This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

Follow the installation instructions included in the README file.

DESCRIPTION
===========

The Bio::Tools::Blast::HSP.pm module encapsulates data and methods for manipulating, parsing, and analyzing HSPs ("High-scoring Segment Pairs") derived from BLAST sequence analysis.

This module is a utility module used by *Bio::Tools::Blast::Sbjct.pm* and is not intended for separate use. Please see the documentation for Bio::Tools::Blast.pm for some basic information about using HSP objects (`Links:' in this node).

   * Supports BLAST versions 1.x and 2.x, gapped and ungapped.

Bio::Tools::Blast::HSP.pm can extract a list of all residue indices for identical and conservative matches along both the query and sbjct sequences. Since this degree of detail is not always needed, these indices are not computed during construction of the HSP object; they are collected automatically, as needed, while the HSP.pm object is used.
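Extracting residue indices amounts to walking an aligned sequence string alongside the match line and counting residues. The following Perl sketch illustrates the idea; residue_indices() is a hypothetical helper, not part of this module's API. It assumes '-' as the gap symbol (the actual value of $GAP_SYMBOL is not given here), that coordinates increase from left to right (start < end, as after the normalization done in _set_seq()), and that each alignment column consumes exactly one residue, which does not hold for the nucleotide coordinates of TBLASTN/X sbjct sequences. It records letter columns as identical and '+' columns as conserved, keeping the two lists disjoint; whether the module's own conserved lists also include identical positions is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::Blast::HSP): walk one aligned
# sequence alongside the match line and record the residue number of every
# identical (letter) and conserved ('+') position.
# Assumptions: '-' is the gap symbol, coordinates increase left to right,
# and one alignment column corresponds to one residue.
sub residue_indices {
    my ($start, $seq, $match) = @_;
    my (@identical, @conserved);
    my $pos = $start;
    for my $i (0 .. length($seq) - 1) {
        my $res = substr($seq, $i, 1);
        next if $res eq '-';                  # gaps do not consume a residue number
        my $sym = $i < length($match) ? substr($match, $i, 1) : ' ';
        if    ($sym eq '+')        { push @conserved, $pos }
        elsif ($sym =~ /[A-Za-z]/) { push @identical, $pos }
        $pos++;                               # mismatches (' ') still consume a residue
    }
    return (\@identical, \@conserved);
}

my ($id, $cons) = residue_indices(10, 'AC-DEF', 'A+ D F');
print "identical: @$id\n";   # prints "identical: 10 12 14"
print "conserved: @$cons\n"; # prints "conserved: 11"
```

Computing these lists only when a method such as seq_inds() asks for them mirrors the on-demand behavior described above.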
DEPENDENCIES
============

Bio::Tools::Blast::HSP.pm is a concrete class that inherits from *Bio::Root::Object.pm* and relies on *Bio::Tools::Blast::Sbjct.pm* as a container for HSP.pm objects. *Bio::Seq.pm* and *Bio::UnivAln.pm* are employed for creating sequence and alignment objects, respectively.

Relationship to UnivAln.pm & Seq.pm
-----------------------------------

HSP.pm can provide the query or sbjct sequence as a *Bio::Seq.pm* object via the `seq' in this node() method. The HSP.pm object can also create a two-sequence *Bio::UnivAln.pm* alignment object from the query and sbjct sequences via the `get_aln' in this node() method. Alignment objects are not created automatically when the HSP.pm object is constructed, since this level of functionality is not always required and would add considerable overhead when processing many reports.

FEEDBACK
========

Mailing Lists
-------------

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

Steve A. Chervitz, sac@genome.stanford.edu

SEE ALSO
========

     Bio::Tools::Blast::Sbjct.pm - Blast hit object.
     Bio::Tools::Blast.pm        - Blast object.
     Bio::Seq.pm                 - Biosequence object.
     Bio::UnivAln.pm             - Biosequence alignment object.
     Bio::Root::Object.pm        - Proposed base class for all Bioperl objects.
Links:
------

     http://bio.perl.org/Core/POD/Tools/Blast/Sbjct.pm.html

     http://bio.perl.org/Projects/modules.html  - Online module documentation
     http://bio.perl.org/Projects/Blast/        - Bioperl Blast Project
     http://bio.perl.org/                       - Bioperl Project Homepage

COPYRIGHT
=========

Copyright (c) 1996-98 Steve A. Chervitz. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

APPENDIX
========

Methods beginning with a leading underscore are considered private and are intended for internal use by this module. They are not considered part of the public interface and are described here for documentation purposes only.

_initialize
-----------

Usage    : n/a; automatically called by Bio::Root::Object::new()
         : Bio::Tools::Blast::HSP.pm objects are constructed
         : automatically by Bio::Tools::Blast::Sbjct.pm, so there is no need
         : to call this method directly.
Purpose  : Initializes HSP data and calls private methods to extract
         : the data for a given HSP.
         : Calls superclass constructor first (Bio::Root::Object.pm).
Returns  : n/a
Argument : Named parameters passed from new():
         : All tags must be uppercase (does not call _rearrange()).
         :     -DATA    => array ref containing raw data for one HSP.
         :     -PARENT  => Sbjct.pm object ref.
         :     -NAME    => integer (1..n).
         :     -PROGRAM => string ('TBLASTN', 'BLASTP', etc.).
See Also : `_set_data' in this node(), *Bio::Root::Object::new()*, *Bio::Tools::Blast::Sbjct::_set_hsps()*

_set_data
---------

Usage    : n/a; called automatically during object construction.
Purpose  : Sets the query sequence, sbjct sequence, and the "match" data
         : which consists of the symbols between the query and sbjct lines
         : in the alignment.
Argument : Array (all lines from a single, complete HSP, one line per element)
Throws   : Propagates any exceptions from the methods called ("See Also")
See Also : `_set_seq' in this node(), `_set_residues' in this node(), `_set_score_stats' in this node(), `_set_match_stats' in this node(), `_initialize' in this node()

_set_score_stats
----------------

Usage    : n/a; called automatically by _set_data()
Purpose  : Sets various score statistics obtained from the HSP listing.
Argument : String with any of the following formats:
         : blast2:  Score = 30.1 bits (66), Expect = 9.2
         : blast2:  Score = 158.2 bits (544), Expect(2) = e-110
         : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40
         : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99
Throws   : Exception if the stats cannot be parsed, probably due to a change
         : in the Blast report format.
See Also : `_set_data' in this node()

_set_match_stats
----------------

Usage    : n/a; called automatically by _set_data()
Purpose  : Sets various matching statistics obtained from the HSP listing.
Argument : blast2:   Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)
         : blast2:   Identities = 57/98 (58%), Positives = 74/98 (75%)
         : blast1:   Identities = 87/204 (42%), Positives = 126/204 (61%)
         : blast1:   Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3
         : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus
Throws   : Exception if the stats cannot be parsed, probably due to a change
         : in the Blast report format.
Comments : The "Gaps = " data in the HSP header has a different meaning depending
         : on the type of Blast: for BLASTP, this number is the total number of
         : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the
         : query sequence only. Thus, it is safer to collect the data
         : separately by examining the actual sequence strings as is done
         : in _set_seq().
See Also : `_set_data' in this node(), `_set_seq' in this node()

_set_seq_data
-------------

Usage    : n/a; called automatically when sequence data is requested.
Purpose  : Sets the HSP sequence data for both query and sbjct sequences.
         : Includes: start, stop, length, gaps, and raw sequence.
Argument : n/a
Throws   : Propagates any exception thrown by _set_match_seq()
Comments : Uses raw data stored by _set_data() during object construction.
         : These data are not always needed, so this method is executed
         : only on demand by methods such as gaps(), _set_residues(),
         : etc. _set_seq() does the dirty work.
See Also : `_set_seq' in this node()

_set_seq
--------

Usage    : n/a; called automatically by _set_seq_data()
         : $hsp_obj->_set_seq($seq_type, @data);
Purpose  : Sets sequence information for both the query and sbjct sequences.
         : Directly counts the number of gaps in each sequence (if gapped Blast).
Argument : $seq_type = 'query' or 'sbjct'
         : @data = all seq lines with the form:
         : Query: 61 SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120
Throws   : Exception if data strings cannot be parsed, probably due to a change
         : in the Blast report format.
Comments : Uses the first argument to determine which data members to set,
         : making this method sensitive to data member name changes.
         : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
Warning  : Sequence endpoints are normalized so that start < end. This affects HSPs
         : for TBLASTN/X hits on the minus strand. Normalization facilitates use
         : of range information by methods such as matches().
See Also : `_set_seq_data' in this node(), `matches' in this node(), `range' in this node(), `start' in this node(), `end' in this node()

_set_residues
-------------

Usage    : n/a; called automatically when residue data is requested.
Purpose  : Sets the residue numbers representing the identical and
         : conserved positions. These data are obtained by analyzing the
         : symbols between query and sbjct lines of the alignments.
Argument : n/a
Throws   : Propagates any exception thrown by _set_seq_data() and _set_match_seq().
Comments : These data are not always needed, so this method is executed
         : only on demand by methods such as seq_inds().
         : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
See Also : `_set_seq_data' in this node(), `_set_match_seq' in this node(), seq_inds()

_set_match_seq
--------------

Usage    : n/a. Internal method.
         : $hsp_obj->_set_match_seq()
Purpose  : Set the 'match' sequence for the current HSP (symbols in between
         : the query and sbjct lines.)
Returns  : Array reference holding the match sequence lines.
Argument : n/a
Throws   : Exception if the _matchList field is not set.
Comments : The match information is not always necessary. This method
         : allows it to be conditionally prepared.
         : Called by _set_residues() and seq_str().
See Also : `_set_residues' in this node(), `seq_str' in this node()

score
-----

Usage    : $hsp_obj->score()
Purpose  : Get the Blast score for the HSP.
Returns  : Integer
Argument : n/a
Throws   : n/a
See Also : `bits' in this node()

bits
----

Usage    : $hsp_obj->bits()
Purpose  : Get the Blast score in bits for the HSP.
Returns  : Float
Argument : n/a
Throws   : n/a
See Also : `score' in this node()

n
-

Usage    : $hsp_obj->n()
Purpose  : Get the N value (num HSPs on which P/Expect is based).
         : This value is not defined with NCBI Blast2 with gapping.
Returns  : Integer or null string if not defined.
Argument : n/a
Throws   : n/a
Comments : The 'N' value is listed in parentheses with the P/Expect value:
         : e.g., P(3) = 1.2e-30 ---> (N = 3).
         : Not defined in NCBI Blast2 with gaps.
         : This typically is equal to the number of HSPs but not always.
         : To obtain the number of HSPs, use Bio::Tools::Blast::Sbjct::num_hsps().
See Also : `score' in this node()

frame
-----

Usage    : $hsp_obj->frame()
Purpose  : Get the reading frame number (-/+ 1, 2, 3) (TBLASTN/X only).
Returns  : Integer or null string if not defined.
Argument : n/a
Throws   : n/a

signif()
--------

Usage    : $hsp_obj->signif()
Purpose  : Get the P-value or Expect value for the HSP.
Returns  : Float (0.001 or 1.3e-43)
         : Returns P-value if it is defined, otherwise, Expect value.
Argument : n/a
Throws   : n/a
Comments : Provided for consistency with Sbjct::signif()
         : Support for returning the significance data in different
         : formats (e.g., exponent only) is not provided for HSP objects.
         : This is only available for the Sbjct or Blast object.
See Also : `p' in this node(), `expect' in this node(), *Bio::Tools::Blast::Sbjct::signif()*

expect
------

Usage    : $hsp_obj->expect()
Purpose  : Get the Expect value for the HSP.
Returns  : Float (0.001 or 1.3e-43)
Argument : n/a
Throws   : n/a
Comments : Support for returning the expectation data in different
         : formats (e.g., exponent only) is not provided for HSP objects.
         : This is only available for the Sbjct or Blast object.
See Also : `p' in this node()

p
-

Usage    : $hsp_obj->p()
Purpose  : Get the P-value for the HSP.
Returns  : Float (0.001 or 1.3e-43) or undef if not defined.
Argument : n/a
Throws   : n/a
Comments : P-value is not defined with NCBI Blast2 reports.
         : Support for returning the P-value data in different
         : formats (e.g., exponent only) is not provided for HSP objects.
         : This is only available for the Sbjct or Blast object.
See Also : `expect' in this node()

length
------

Usage    : $hsp->length( [seq_type] )
Purpose  : Get the length of the aligned portion of the query or sbjct.
Example  : $hsp->length('query')
Returns  : integer
Argument : seq_type: 'query' | 'sbjct' | 'total'  (default = 'total')
Throws   : n/a
Comments : 'total' length is the full length of the alignment
         : as reported in the denominators in the alignment section:
         : "Identical = 34/120 Positives = 67/120".
         : Developer note: when using the built-in length function within
         : this module, call it as CORE::length().
See Also : `gaps' in this node()

gaps
----

Usage    : $hsp->gaps( [seq_type] )
Purpose  : Get the number of gaps in the query, sbjct, or total alignment.
         : Also can return query gaps and sbjct gaps as a two-element list
         : when in array context.
Example  : $total_gaps      = $hsp->gaps();
         : ($qgaps, $sgaps) = $hsp->gaps();
         : $qgaps           = $hsp->gaps('query');
Returns  : scalar context: integer
         : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')
Argument : seq_type: 'query' | 'sbjct' | 'total'
         : (default = 'total', scalar context)
         : Array context can be "induced" by providing an argument of 'list' or 'array'.
Throws   : n/a
See Also : `length' in this node(), `matches' in this node()

matches
-------

Usage    : $hsp->matches([seq_type], [start], [stop]);
Purpose  : Get the total number of identical and conservative matches
         : in the query or sbjct sequence for the given HSP. Optionally can
         : report data within a defined interval along the seq.
         : (Note: 'conservative' matches are called 'positives' in the
         : Blast report.)
Example  : ($id,$cons) = $hsp_object->matches('sbjct');
         : ($id,$cons) = $hsp_object->matches('query',300,400);
Returns  : 2-element array of integers
Argument : (1) seq_type = 'query' | 'sbjct' (default = query)
         : (2) start = Starting coordinate (optional)
         : (3) stop  = Ending coordinate (optional)
Throws   : Exception if the supplied coordinates are out of range.
Comments : Relies on seq_str('match') to get the string of alignment symbols
         : between the query and sbjct lines which are used for determining
         : the number of identical and conservative matches.
See Also : `length' in this node(), `gaps' in this node(), `seq_str' in this node(), *Bio::Tools::Blast::Sbjct::_adjust_contigs()*

frac_identical
--------------

Usage    : $hsp_object->frac_identical( [seq_type] );
Purpose  : Get the fraction of identical positions within the given HSP.
Example  : $frac_iden = $hsp_object->frac_identical('query');
Returns  : Float (2-decimal precision, e.g., 0.75).
Argument : seq_type: 'query' | 'sbjct' | 'total'
         : default = 'total' (but see comments below).
Throws   : n/a
Comments : Different versions of Blast report different values for the total
         : length of the alignment. This is the number reported in the
         : denominators in the stats section:
         : "Identical = 34/120 Positives = 67/120".
         : BLAST-GP uses the total length of the alignment (with gaps).
         : WU-BLAST uses the length of the query sequence (without gaps).
         : Therefore, when called without an argument or an argument of 'total',
         : this method will report different values depending on the
         : version of BLAST used.
         :
         : To get the fraction identical among only the aligned residues,
         : ignoring the gaps, call this method with an argument of 'query'
         : or 'sbjct'.
See Also : `frac_conserved' in this node(), `num_identical' in this node(), `matches' in this node()

frac_conserved
--------------

Usage    : $hsp_object->frac_conserved( [seq_type] );
Purpose  : Get the fraction of conserved positions within the given HSP.
         : (Note: 'conservative' positions are called 'positives' in the
         : Blast report.)
Example  : $frac_cons = $hsp_object->frac_conserved('query');
Returns  : Float (2-decimal precision, e.g., 0.75).
Argument : seq_type: 'query' | 'sbjct' | 'total'
         : default = 'total' (but see comments below).
Throws   : n/a
Comments : Different versions of Blast report different values for the total
         : length of the alignment. This is the number reported in the
         : denominators in the stats section:
         : "Identical = 34/120 Positives = 67/120".
         : BLAST-GP uses the total length of the alignment (with gaps).
         : WU-BLAST uses the length of the query sequence (without gaps).
         : Therefore, when called without an argument or an argument of 'total',
         : this method will report different values depending on the
         : version of BLAST used.
         :
         : To get the fraction conserved among only the aligned residues,
         : ignoring the gaps, call this method with an argument of 'query'
         : or 'sbjct'.
See Also : `frac_identical' in this node(), `num_conserved' in this node(), `matches' in this node()

num_identical
-------------

Usage    : $hsp_object->num_identical();
Purpose  : Get the number of identical positions within the given HSP.
Example  : $num_iden = $hsp_object->num_identical();
Returns  : integer
Argument : n/a
Throws   : n/a
See Also : `num_conserved' in this node(), `frac_identical' in this node()

num_conserved
-------------

Usage    : $hsp_object->num_conserved();
Purpose  : Get the number of conserved positions within the given HSP.
Example  : $num_cons = $hsp_object->num_conserved();
Returns  : integer
Argument : n/a
Throws   : n/a
See Also : `num_identical' in this node(), `frac_conserved' in this node()

range
-----

Usage    : $hsp->range( [seq_type] );
Purpose  : Gets the (start, end) coordinates for the query or sbjct sequence
         : in the HSP alignment.
Example  : ($qbeg, $qend) = $hsp->range('query');
         : ($sbeg, $send) = $hsp->range('sbjct');
Returns  : Two-element array of integers
Argument : seq_type = string, 'query' or 'sbjct'  (default = 'query')
         : (case insensitive).
Throws   : n/a
See Also : `start' in this node(), `end' in this node()

start
-----

Usage    : $hsp->start( [seq_type] );
Purpose  : Gets the start coordinate for the query, sbjct, or both sequences
         : in the HSP alignment.
Example  : $qbeg = $hsp->start('query');
         : $sbeg = $hsp->start('sbjct');
         : ($qbeg, $sbeg) = $hsp->start();
Returns  : scalar context: integer
         : array context without args: list of two integers
Argument : In scalar context: seq_type = 'query' or 'sbjct'
         : (case insensitive). If not supplied, 'query' is used.
         : Array context can be "induced" by providing an argument of 'list' or 'array'.
Throws   : n/a
See Also : `end' in this node(), `range' in this node()

end
---

Usage    : $hsp->end( [seq_type] );
Purpose  : Gets the end coordinate for the query, sbjct, or both sequences
         : in the HSP alignment.
Example  : $qend = $hsp->end('query');
         : $send = $hsp->end('sbjct');
         : ($qend, $send) = $hsp->end();
Returns  : scalar context: integer
         : array context without args: list of two integers
Argument : In scalar context: seq_type = 'query' or 'sbjct'
         : (case insensitive). If not supplied, 'query' is used.
         : Array context can be "induced" by providing an argument of 'list' or 'array'.
Throws   : n/a
See Also : `start' in this node(), `range' in this node()

strand
------

Usage    : $hsp_object->strand( [seq_type] )
Purpose  : Get the strand of the query or sbjct sequence.
Example  : print $hsp->strand('query');
         : ($qstrand, $sstrand) = $hsp->strand();
Returns  : -1, 0, or 1
         : -1 = Minus strand, +1 = Plus strand
         : Returns 0 if strand is not defined, which occurs
         : for non-TBLASTN/X reports.
         : In scalar context without arguments, returns queryStrand value.
         : In array context without arguments, returns a two-element list
         : (queryStrand, sbjctStrand).
         : Array context can be "induced" by providing an argument of 'list' or 'array'.
Argument : seq_type: 'query' | 'sbjct' or undef
Throws   : n/a
See Also : `_set_seq' in this node(), `_set_match_stats' in this node()

seq
---

Usage    : $hsp->seq( [seq_type] );
Purpose  : Get the query or sbjct sequence as a Bio::Seq.pm object.
Example  : $seqObj = $hsp->seq('query');
Returns  : Object reference for a Bio::Seq.pm object.
Argument : seq_type = 'query' or 'sbjct' (default = 'query').
Throws   : Propagates any exception that occurs during construction
         : of the Bio::Seq.pm object.
Comments : The sequence is returned in an array of strings corresponding
         : to the strings in the original format of the Blast alignment
         : (i.e., same spacing).
See Also : `seq_str' in this node(), `seq_inds' in this node(), *Bio::Seq.pm*

seq_str
-------

Usage    : $hsp->seq_str( seq_type );
Purpose  : Get the full query, sbjct, or 'match' sequence as a string.
         : The 'match' sequence is the string of symbols in between the
         : query and sbjct sequences.
Example  : $str = $hsp->seq_str('query');
Returns  : String
Argument : seq_type = 'query' or 'sbjct' or 'match'
Throws   : Exception if the argument does not match an accepted seq_type.
Comments : Calls _set_residues() to set the 'match' sequence if it has
         : not been set already.
See Also : `seq' in this node(), `seq_inds' in this node(), `_set_match_seq' in this node()

seq_inds
--------

Usage    : $hsp->seq_inds( seq_type, class, collapse );
Purpose  : Get a list of residue positions (indices) for all identical
         : or conserved residues in the query or sbjct sequence.
Example  : @ind = $hsp->seq_inds('query', 'identical');
         : @ind = $hsp->seq_inds('sbjct', 'conserved');
         : @ind = $hsp->seq_inds('sbjct', 'conserved', 1);
Returns  : List of integers
         : May include ranges if collapse is true.
Argument : seq_type = 'query' or 'sbjct'  (default = query)
         : class    = 'identical' or 'conserved' (default = identical)
         :            (can be shortened to 'id' or 'cons')
         :            (actually, anything not 'id' will evaluate to 'conserved').
         : collapse = boolean, if true, consecutive positions are merged
         :            using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
         :            collapses to "1-5 7 9-11". This is useful for
         :            consolidating long lists. Default = no collapse.
Throws   : n/a.
Comments : Calls _set_residues() to set the 'match' sequence if it has
         : not been set already.
See Also : `seq' in this node(), `_set_residues' in this node(), `collapse_nums' in this node(), *Bio::Tools::Blast::Sbjct::seq_inds()*

get_aln
-------

Usage    : $hsp->get_aln()
Purpose  : Get a Bio::UnivAln.pm object constructed from the query + sbjct
         : sequences of the present HSP object.
Example  : $aln_obj = $hsp->get_aln();
Returns  : Object reference for a Bio::UnivAln.pm object.
Argument : n/a.
Throws   : Propagates any exception occurring during the construction of
         : the Bio::UnivAln object.
Comments : Requires Bio::UnivAln.pm.
         : The Bio::UnivAln.pm object is constructed from the query + sbjct
         : sequence objects obtained by calling seq().
         : Gap residues are included (see $GAP_SYMBOL). It is important that
         : Bio::UnivAln.pm recognizes the gaps correctly. A strategy for doing
         : this is being considered. Currently it is hard-wired.
See Also : `seq' in this node(), *Bio::UnivAln.pm*

display
-------

Usage    : $hsp_object->display( %named_parameters );
Purpose  : Display information about Bio::Tools::Blast::HSP.pm data members
         : including: length, gaps, score, significance value,
         : sequences and sequence indices.
Example  : $object->display(-SHOW=>'stats');
Argument : Named parameters: (TAGS CAN BE UPPER OR LOWER CASE)
         :     -SHOW  => 'hsp',
         :     -WHERE => filehandle (default = STDOUT)
Returns  : n/a
Status   : Experimental
Comments : For more control over the display of sequence data,
         : use seq(), seq_str(), seq_inds().
See Also : `_display_seq' in this node(), `seq' in this node(), `seq_str' in this node(), `seq_inds' in this node(), `_display_matches' in this node(), *Bio::Root::Object::display()*

_display_seq
------------

Usage    : n/a; called automatically by display()
Purpose  : Display information about query and sbjct HSP sequences.
         : Prints the start, stop coordinates and the actual sequence.
Example  : n/a
Argument :
Returns  : printf call.
Status   : Experimental
Comments : For more control, use seq(), seq_str(), or seq_inds().
See Also : `display' in this node(), `seq' in this node(), `seq_str' in this node(), `seq_inds' in this node(), `_display_matches' in this node()

_display_matches
----------------

Usage    : n/a; called automatically by display()
Purpose  : Display information about identical and conserved positions
         : within both the query and sbjct sequences.
Example  : n/a
Argument :
Returns  : printf call.
Status   : Experimental
Comments : For more control, use seq_inds().
See Also : `display' in this node(), `seq_inds' in this node(), `_display_seq' in this node()

homol_data
----------

Usage    : $data = $hsp_object->homol_data( %named_params );
Purpose  : Gets similarity data for a single HSP.
Returns  : String:
         : "Homology data" for each HSP is in the format:
         :  " "
         : where integer is the value returned by homol_score().
Argument : Named params: (UPPER OR LOWERCASE TAGS)
         : currently just one param is used:
         : -SEQ =>'query' or 'sbjct'
Throws   : n/a
Status   : Experimental
Comments : This is a very experimental method used for obtaining a
         : coarse indication of:
         :   1) how strong the similarity is between the sequences in the HSP,
         :   3) the endpoints of the alignment (sequence monomer numbers)
See Also : `homol_score' in this node(), *Bio::Tools::Blast::homol_data()*, *Bio::Tools::Blast::Sbjct::homol_data()*

homol_score
-----------

Usage    : $self->homol_score();
Purpose  : Get a homology score (integer 1 - 3) as a coarse representation of
         : the strength of the similarity independent of sequence composition.
         : Based on the Blast bit score.
Example  : $hscore = $hsp->homol_score();
Returns  : Integer
Argument : n/a
Throws   : n/a
Status   : Experimental
Comments : See @Bio::Tools::Blast::HSP::SCORE_CUTOFFS for the specific values.
         : Currently,  BIT_SCORE   HOMOL_SCORE
         :             ---------   -----------
         :             >=100   -->     3
         :             30-100  -->     2
         :             < 30    -->     1
See Also : `homol_data' in this node()

CLASS METHODS
=============

collapse_nums
-------------

Usage    : @cnums = collapse_nums( @numbers );
Purpose  : Collapses a list of numbers into a set of ranges of consecutive terms.
         : Useful for condensing long lists of consecutive numbers.
         :  EXPANDED:
         :     1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32
         :  COLLAPSED:
         :     1-6 10 12-15 17 18 20-22 24 26 30-32
Argument : List of numbers, sorted numerically.
Returns  : List of numbers mixed with ranges of numbers (see above).
Throws   : n/a
Comments : Probably belongs in a more general utility class.
See Also : `seq_inds' in this node()

FOR DEVELOPERS ONLY
===================

Data Members
------------

Information about the various data members of this module is provided for those wishing to modify or understand the code. Two things to bear in mind:

  1. Do NOT rely on these in any code outside of this module. All data members are prefixed with an underscore to signify that they are private. Always use accessor methods. If the accessor doesn't exist or is inadequate, create or modify an accessor (and let me know, too!).

  2. This documentation may be incomplete and out of date. It is easy for these data member descriptions to become obsolete as this module is still evolving. Always double check this info and search for members not described here.

An instance of Bio::Tools::Blast::HSP.pm is a blessed reference to a hash containing all or some of the following fields:

     FIELD               VALUE
     --------------------------------------------------------------
     (member names are mostly self-explanatory)

     _score              :
     _bits               :
     _p                  :
     _n                  : Integer. The 'N' value listed in parentheses with the P/Expect value:
                         : e.g., P(3) = 1.2e-30 ---> (N = 3).
                         : Not defined in NCBI Blast2 with gaps.
                         : To obtain the number of HSPs, use Bio::Tools::Blast::Sbjct::num_hsps().
     _expect             :
     _queryLength        :
     _queryGaps          :
     _queryStart         :
     _queryStop          :
     _querySeq           :
     _sbjctLength        :
     _sbjctGaps          :
     _sbjctStart         :
     _sbjctStop          :
     _sbjctSeq           :
     _matchSeq           : String. Contains the symbols between the query and sbjct lines
                         : which indicate identical (letter) and conserved ('+') matches
                         : or a mismatch (' ').
     _numIdentical       :
     _numConserved       :
     _identicalRes_query :
     _identicalRes_sbjct :
     _conservedRes_query :
     _conservedRes_sbjct :
     _match_indent       : The number of leading space characters on each line containing
                         : the match symbols. _match_indent is 13 in this example:
                         :   Query:   285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
                         :                Q +APWGLARIS G+ + Y YD+ AG
                         :   ^^^^^^^^^^^^^

     INHERITED DATA MEMBERS

     _name               : From Bio::Root::Object.pm.
                         :
     _parent             : From Bio::Root::Object.pm. This member contains a reference to the
                         : Bio::Tools::Blast::Sbjct.pm object to which this HSP belongs.
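For developers, the collapsing rule of the collapse_nums() class method can be checked against its own documented examples. The Perl sketch below is an illustrative re-implementation, not the module's code, and collapse_nums_sketch() is a hypothetical name. Judging from the documentation ("17 18" stays uncollapsed while "9 10 11" becomes "9-11"), it assumes that only runs of three or more consecutive numbers are merged into a range; like the original, it expects numerically sorted input.

```perl
use strict;
use warnings;

# Sketch of the collapsing rule shown in the collapse_nums() documentation:
# runs of three or more consecutive integers become "first-last"; shorter
# runs (e.g. "17 18") are kept as individual numbers.
# Input must already be sorted numerically.
sub collapse_nums_sketch {
    my @nums = @_;
    my @out;
    my $i = 0;
    while ($i < @nums) {
        my $j = $i;
        # Extend the current run while the next number is consecutive.
        $j++ while $j + 1 < @nums && $nums[$j + 1] == $nums[$j] + 1;
        if ($j - $i >= 2) {
            push @out, "$nums[$i]-$nums[$j]";   # run of 3+ becomes a range
        } else {
            push @out, @nums[$i .. $j];         # run of 1 or 2 is kept as-is
        }
        $i = $j + 1;
    }
    return @out;
}

my @expanded = (1..6, 10, 12..15, 17, 18, 20..22, 24, 26, 30..32);
print join(' ', collapse_nums_sketch(@expanded)), "\n";
# prints: 1-6 10 12-15 17 18 20-22 24 26 30-32
```

The output reproduces the EXPANDED/COLLAPSED example given for collapse_nums(), and the same rule yields the "1-5 7 9-11" example quoted under seq_inds().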
File: pm.info, Node: Bio/Tools/Blast/HTML, Next: Bio/Tools/Blast/Run/LocalBlast, Prev: Bio/Tools/Blast/HSP, Up: Module List

Bioperl Utility module for HTML formatting Blast reports
********************************************************

NAME
====

Bio::Tools::Blast::HTML.pm - Bioperl Utility module for HTML formatting Blast reports

SYNOPSIS
========

Adding HTML-formatting
----------------------

     use Bio::Tools::Blast::HTML qw(&get_html_func);

     $func = &get_html_func();

     # Now as each line of the report is read, pass it to &$func($line).

See `get_html_func' in this node() for details. Also see *Bio::Tools::Blast::to_html* for an example of usage.

Removing HTML-formatting
------------------------

     use Bio::Tools::Blast::HTML qw(&strip_html);

     &strip_html(\$blast_report_string)

See `strip_html' in this node() for details.

INSTALLATION
============

This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

Follow the installation instructions included in the README file.

DESCRIPTION
===========

This module can be used to add HTML formatting to, or remove HTML formatting from, a raw Blast sequence analysis report. Hypertext links to the appropriate database are added for each hit sequence (GenBank, Swiss-Prot, PIR, PDB, SGD).

This module is intended for use by Bio::Tools::Blast.pm and related modules, which provide a front-end to the methods in Bio::Tools::Blast::HTML.pm.

DEPENDENCIES
============

Bio::Tools::Blast::HTML.pm does not inherit from any other class besides Exporter. It is used by Bio::Tools::Blast.pm only. This class relies on *Bio::Tools::WWW.pm* to provide key URLs for adding links in the Blast report to specific databases.

The greatest dependency comes from the dynamic state of the web. URLs are likely to change in the future, so not all links can be guaranteed to work indefinitely. Feel free to report broken or incorrect database links (`FEEDBACK' in this node). Thanks!
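The Removing HTML-formatting synopsis passes a reference to the report string (\$blast_report_string), which suggests that the report is edited in place. A minimal Perl sketch of that calling convention follows. strip_html_sketch() is a hypothetical name, and both the simple tag-stripping regex and the three entities it decodes are assumptions rather than the module's actual rules. A regex like this copes with the simple markup found in Blast reports, but it is not a general HTML parser.

```perl
use strict;
use warnings;

# Hypothetical sketch: strip HTML tags from a report held in a scalar,
# modifying the caller's string in place through the reference.
sub strip_html_sketch {
    my ($report_ref) = @_;
    die "strip_html_sketch() expects a scalar reference\n"
        unless ref($report_ref) eq 'SCALAR';
    $$report_ref =~ s/<[^>]*>//g;   # remove tags such as <PRE>, <a href="...">, </a>
    # Decode a few common entities (assumed set). &amp; is decoded last so
    # that "&amp;lt;" becomes the literal text "&lt;" rather than "<".
    $$report_ref =~ s/&lt;/</g;
    $$report_ref =~ s/&gt;/>/g;
    $$report_ref =~ s/&amp;/&/g;
    return;
}

my $report = qq{<PRE><a href="#hit1">gb|U00001|</a> hypothetical protein &gt; 100 aa</PRE>\n};
strip_html_sketch(\$report);
print $report;   # prints: gb|U00001| hypothetical protein > 100 aa
```

Passing a reference avoids copying what may be a very large report string.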
SEE ALSO
========

     Bio::Tools::Blast.pm - Blast object.
     Bio::Tools::WWW.pm   - URL repository.

     http://bio.perl.org/Projects/modules.html  - Online module documentation
     http://bio.perl.org/Projects/Blast/        - Bioperl Blast Project
     http://bio.perl.org/                       - Bioperl Project Homepage

FEEDBACK
========

Mailing Lists
-------------

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

Steve A. Chervitz, sac@genome.stanford.edu

COPYRIGHT
=========

Copyright (c) 1998-2000 Steve A. Chervitz. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

APPENDIX
========

Methods beginning with a leading underscore are considered private and are intended for internal use by this module. They are not considered part of the public interface and are described here for documentation purposes only.

get_html_func
-------------

Usage    : $func_ref = &get_html_func( [array_ref] );
         : This method is exported.
Purpose  : Provides a function that adds HTML formatting to a
         : raw Blast report line-by-line.
         : Utility method used by to_html() in Bio::Tools::Blast.pm.
Returns  : Reference to an anonymous function to be used while reading in
         : the raw report.
         : The function itself operates on the Blast report line-by-line,
         : HTML-ifying it and printing it to STDOUT (or saving it in the supplied
         : array ref) as it goes:
         :    foreach( @raw_report ) { &$func_ref($_); }
Argument : array ref (optional) for storing the HTML-formatted report.
         : If no argument is supplied, HTML output is sent to STDOUT.
Throws   : Croaks if an argument is supplied and is not an array ref.
         : The anonymous function returned by this method croaks if
         : the Blast output appears to be HTML-formatted already.
Comments : Adapted from a script by Keith Robison, November 1993
         : krobison@nucleus.harvard.edu; http://golgi.harvard.edu/gilbert.html
         : Modified extensively by Steve Chervitz and Mike Cherry.
         : Some modifications are customizations for BLAST reports served up
         : by the Saccharomyces Genome Database.
         : Feel free to modify or replace portions of this code as necessary
         : to accommodate new BLAST datasets or changes to the Blast format.
See Also : *Bio::Tools::Blast::to_html()*

_set_markup_data
----------------

Usage    : n/a; utility method used by get_html_func()
Purpose  : Sets various hashes and regexps used for adding HTML
         : to raw Blast output.
Returns  : n/a
Comments : These items need to be set only once.
See Also : `get_html_func' in this node()

_markup_database
----------------

Usage    : n/a; utility method used by get_html_func()
Purpose  : Converts a cryptic database ID into a readable name.
Returns  : n/a
Comments : This is used for converting local database IDs into
         : understandable terms. At present, it only recognizes
         : databases used locally at SGD.
See Also : `get_html_func' in this node()

_markup_report
--------------

Usage    : n/a; utility function used by get_html_func()
Purpose  : Adds HTML links to aid navigation of raw Blast output.
Returns  : n/a
Comments : HTML-formatting is dependent on the Blast server that
         : provided the Blast report. Currently, this function can handle
         : reports produced by NCBI and SGD. Feel free to modify this
         : function to accommodate reports produced by other servers/sites.
         :
         : This function is simply a collection of substitution regexps
         : that recognize and modify the relevant lines of the Blast report.
         : All non-header lines of the report are passed through this
         : function; only the ones that match are modified.
         :
         : The general scheme for adding links is as follows
         : (some of the SGD markups do not follow this scheme precisely,
         : but this is the general trend):
         :
         : For description lines in the summary table at the top of the report:
         :
         :   DB:SEQUENCE_ID DESCRIPTION SIGNIF_VAL
         :   DB          = links to the indicated database (if not Gen/Embl/Ddbj).
         :   SEQUENCE_ID = links to GenBank entry for the sequence.
         :   SIGNIF_VAL  = internal link to relevant alignment section.
         :
         : For the alignment sections in the body of the report:
         :
         :   DB:SEQUENCE_ID (Back | Top) DESCRIPTION
         :   DB          = links to the indicated database (if not Gen/Embl/Ddbj).
         :   SEQUENCE_ID = links to GenBank entry for the sequence.
         :   SIGNIF_VAL  = internal link to alignment section.
         :   Back        = internal link to description line in summary section.
         :   Top         = internal link to top of page.
         :
         : 'DB' links are created for PDB, PIR, and SwissProt sequences.
         :
         : RE-PARSING HTML-FORMATTED REPORTS:
         : ----------------------------------
         : HTML-formatted reports generated by this module, as well as reports
         : obtained from the NCBI servers, should be parsable
         : by Bio::Tools::Blast.pm. Parsing HTML-formatted reports is
         : slow, however, since the HTML must be removed prior to parsing.
         : Parsing HTML-formatted reports is dependent on the specific structure
         : of the HTML and is generally not recommended.
         :
         : Note that since URLs can change without notice, links will need
         : updating. The links are obtained from Bio::Tools::WWW.pm, so
         : updating that module will update them here as well.
         :
Bugs     : Some links to external databases are incorrect
         : (in particular, for 'bbs' and 'prf' databases on NCBI Blast reports).
         : Some links may fail as a result of the dynamic nature of the web.
         : Hypertext links are not added to hits without database ids.
See Also : `get_html_func' in this node(), *Bio::Tools::WWW.pm*,
         : `strip_html' in this node()

_prog_ref_html
--------------

Usage    : n/a; utility method used by get_html_func().
Purpose  : Get a special alert for BLAST reports against all of GenBank/EMBL.
Returns  : string with HTML
See Also : `get_html_func' in this node()

_genbank_alert
--------------

Usage    : n/a; utility method used by get_html_func().
Purpose  : Get a special alert for BLAST reports against all of GenBank/EMBL.
Returns  : string with HTML
See Also : `get_html_func' in this node()

strip_html
----------

Usage    : $boolean = &strip_html( string_ref );
         : This method is exported.
Purpose  : Removes HTML formatting from a supplied string.
         : Attempts to restore the Blast report to enable
         : parsing by Bio::Tools::Blast.pm.
Returns  : Boolean: true if string was stripped, false if not.
Argument : string_ref = reference to a string containing the whole Blast
         : report.
Throws   : Croaks if the argument is not a scalar reference.
Comments : Based on code originally written by Alex Dong Li
         : (ali@genet.sickkids.on.ca).
         : This method does some Blast-specific stripping
         : (adds back a '>' character in front of each HSP
         : alignment listing).
         :
         : THIS METHOD IS HIGHLY ERROR-PRONE!
         :
         : Removal of the HTML tags and accurate reconstitution of the
         : non-HTML-formatted report is highly dependent on the structure of
         : the HTML-formatted version. For example, it assumes that the first
         : line of each alignment section (HSP listing) starts with an
         : anchor tag. This permits the reconstruction of the
         : original report, in which these lines begin with a ">".
         : This is required for parsing.
         :
         : If the structure of the Blast report itself is not intended to
         : be a standard, the structure of the HTML-formatted version
         : is even less so. Therefore, the use of this method to
         : reconstitute parsable Blast reports from HTML-formatted versions
         : should be considered a temporary solution.
See Also : *Bio::Tools::Blast::parse()*

File: pm.info, Node: Bio/Tools/Blast/Run/LocalBlast, Next: Bio/Tools/Blast/Run/Webblast, Prev: Bio/Tools/Blast/HTML, Up: Module List

Bioperl module for running Blast analyses locally.
**************************************************

NAME
====

Bio::Tools::Blast::Run::LocalBlast.pm - Bioperl module for running Blast analyses locally.

SYNOPSIS
========

     use Bio::Tools::Blast::Run::LocalBlast qw(&blast_local);

     @files = &blast_local($blast_object, %named_parameters);

See `blast_local' in this node() for a description of available parameters.

INSTALLATION
============

This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

Follow the installation instructions included in the README file.

DESCRIPTION
===========

Bio::Tools::Blast::Run::LocalBlast.pm contains methods and data necessary for running Blast sequence analyses on a local machine. This module must be customized for a specific site. The basic requirements are that it conform to this minimal API:

  1. Export a method called `blast_local' in this node() that accepts a Bio::Tools::Blast.pm object plus named parameters as specified by `blast_local' in this node().

  2. The `blast_local' in this node() method should return a list of names of files containing the raw Blast reports.

  3. Export the arrays `@Blast_dbn_local' and `@Blast_dbp_local', which list the available databases.

The generic version of this module provides some rudimentary logic, but feel free to customize it as necessary.

Script Files
------------

Sometimes it is convenient to write an executable shell script for running a set of Blasts on a local machine. This script can be saved and re-executed as necessary, or saved for documentation purposes. This module could provide a convenient way to consolidate the logic necessary for producing such script files, or perhaps stubs of script files that could be further modified for Blast-ing specific datasets.
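As a sketch of the "Script Files" idea, the Perl function below builds the text of a shell script with one Blast command per query file and returns it together with the report file names the script would create, which is the same list-of-files shape that `blast_local' is required to return. The names `make_blast_script' and `shell_quote' are hypothetical, and the command line assumes the legacy NCBI `blastall' program with its `-p' (program), `-d' (database), `-e' (expect), `-i' (query file) and `-o' (output file) options; adjust the command for the Blast installation at your site.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Quote a string for /bin/sh by wrapping it in single quotes
# (an embedded single quote becomes '\'').
sub shell_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;
    return "'$s'";
}

# Hypothetical helper: returns (script_text, report_file_names).
# Assumes the legacy NCBI blastall command-line interface.
sub make_blast_script {
    my (%args) = @_;
    for my $key (qw(prog database expect out_dir)) {
        die "Missing required argument '$key'\n" unless defined $args{$key};
    }
    my @files = @{ $args{seq_files} || [] };
    die "No sequence files supplied\n" unless @files;

    my @lines = ('#!/bin/sh', "# Blast script: $args{prog} against $args{database}");
    my @reports;
    for my $file (@files) {
        # One report per query file, named after the query and the program.
        my $report = File::Spec->catfile($args{out_dir}, basename($file) . ".$args{prog}");
        push @reports, $report;
        push @lines, join ' ', 'blastall',
            '-p', shell_quote($args{prog}),
            '-d', shell_quote($args{database}),
            '-e', shell_quote($args{expect}),
            '-i', shell_quote($file),
            '-o', shell_quote($report);
    }
    return (join("\n", @lines) . "\n", @reports);
}
```

The returned script text can be written to a file, made executable, and kept alongside the reports as a record of how they were produced; a site-specific `blast_local' could instead run each command directly and return the same report list.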
DEPENDENCIES
============

Bio::Tools::Blast::Run::LocalBlast.pm is used by Bio::Tools::Blast.pm. Development of this module is linked to the Blast.pm module, and it should be updated along with that module.

SEE ALSO
========

     Bio::Tools::Blast.pm - Blast object.
     Bio::Tools::Blast::Run::postclient.pl - Script for accessing remote server.
     Bio::Tools::Blast::Run::Webblast.pm - Utility module for running Blasts remotely.
     Bio::Tools::Blast::HTML.pm - Blast HTML-formatting utility class.
     Bio::Seq.pm - Biosequence object
     http://bio.perl.org/Projects/modules.html - Online module documentation
     http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project
     http://bio.perl.org/ - Bioperl Project Homepage

FEEDBACK
========

Mailing Lists
-------------

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

Steve A. Chervitz, sac@genome.stanford.edu

VERSION
=======

Bio::Tools::Blast::Run::LocalBlast.pm, 0.01

COPYRIGHT
=========

Copyright (c) 1998 Steve A. Chervitz. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

APPENDIX
========

Methods beginning with a leading underscore are considered private and are intended for internal use by this module. They are not considered part of the public interface and are described here for documentation purposes only.

blast_local
-----------

Usage    : @files = blast_local($blast_object, %namedParameters);
         : This method is exported.
Purpose  : Run a local Blast analysis on one or more sequences.
         : This method defines the API for your LocalBlast.pm module.
Returns  : Array containing a list of filenames of the Blast reports.
Argument : $blast_object = object ref for a Bio::Tools::Blast.pm object.
         : %named parameters: (PARAMETER TAGS CAN BE UPPER OR LOWER CASE)
         : These are some basic parameters. Supply more as desired.
         :
         :   -SEQS      => ref to an array of Bio::Seq.pm objects.
         :   -SEQ_FILES => ref to an array of strings containing full-path file names.
         :   -PROG      => name of blast program (blastp, blastx, etc.)
         :   -DATABASE  => name of database (see below.)
         :   -EXPECT    => expect value cutoff
         :   -FILTER    => sequence complexity filter ('default' or 'none')
         :   -MATRIX    => substitution scoring matrix (blast1 only for NCBI server)
         :   -DESCR     => integer, number of one-line descriptions (V, 100)
         :   -ALIGN     => integer, number of alignments (B, 100)
         :   -GAP       => 'on' or 'off'
         :   -OUT_DIR   => output directory to store blast result files
         :
Throws   : Exception if:
         : - Cannot obtain parameters by calling _rearrange() on the
         :   first argument, which should be a Bio::Tools::Blast.pm object ref.
         : - No sequences are provided (objects or files).
         : - Sequence type is incompatible with Blast program type.
         : - Database name is not one of the valid names.
Comments :
            -------------------------------------------------------------
            Available programs: blastn, blastx, blastp, tblastn, tblastx
            -------------------------------------------------------------
            Available local databases are:

            LIST YOUR LOCAL DATABASES HERE.

            These are exported by this module in the @Blast_dbp_local
            and @Blast_dbn_local arrays.
            -------------------------------------------------------------
            Available substitution scoring matrices:
            (Here are the standard ones)

              BLOSUM: 100,90,85,80,75,70,65,62,60,55,50,45,40,35,30
              PAM:    500,490,480,470,460,450,440,430,420,410,400,390,380,370,360,350,
                      340,330,320,310,300,290,280,270,260,250,240,230,220,210,200,190,
                      180,170,160,150,140,130,120,110,100,90,80,70,60,50,40,30,20,10
              OTHER:  DAYHOFF, GONNET, IDENTITY, MATCH

            These are exported by this module in the @Blast_matrix_local array.
            -------------------------------------------------------------
            Available sequence complexity filters:
              SEG, SEG+XNU, XNU, dust, none.

See Also : _set_options(), _validate_options(), _blast_seqs(),
         : _blast_files(), Bio::Tools::Blast.pm
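The parameter handling described for `blast_local' (case-insensitive tags, a required sequence source, and a database that must be valid for the chosen program) can be sketched in plain Perl as follows. This is not the module's `_rearrange()' or `_validate_options()'; the function name `normalize_blast_params' and the database names placed in `@Blast_dbp_local' and `@Blast_dbn_local' are placeholders. The sketch also assumes that `@Blast_dbp_local' lists protein databases and `@Blast_dbn_local' nucleotide databases, and uses the standard pairing in which blastp and blastx search protein databases while blastn, tblastn and tblastx search nucleotide databases.

```perl
use strict;
use warnings;

# Placeholder database lists (assumption): replace with your site's databases.
our @Blast_dbp_local = qw(swissprot nr);   # assumed: protein databases
our @Blast_dbn_local = qw(est nt);         # assumed: nucleotide databases

# Database type searched by each program: 'p' = protein, 'n' = nucleotide.
my %DB_TYPE = (blastp => 'p', blastx => 'p', blastn => 'n', tblastn => 'n', tblastx => 'n');

# Hypothetical helper: normalizes tags and applies the checks listed
# under "Throws" for blast_local(). Returns the normalized hash.
sub normalize_blast_params {
    my (%raw) = @_;
    my %p;
    while (my ($key, $val) = each %raw) {
        # '-Prog', 'prog' and '-PROG' all become 'PROG'.
        (my $tag = uc $key) =~ s/^-//;
        $p{$tag} = $val;
    }
    die "No sequences supplied (-SEQS or -SEQ_FILES)\n"
        unless @{ $p{SEQS} || [] } || @{ $p{SEQ_FILES} || [] };

    my $prog = lc($p{PROG} || '');
    die "Unknown Blast program '$prog'\n" unless exists $DB_TYPE{$prog};
    $p{PROG} = $prog;

    # The database must come from the list matching the program's target type.
    my @valid_dbs = $DB_TYPE{$prog} eq 'p' ? @Blast_dbp_local : @Blast_dbn_local;
    my $db = $p{DATABASE} || '';
    die "Database '$db' is not valid for $prog\n" unless grep { $_ eq $db } @valid_dbs;

    if (defined $p{GAP}) {
        $p{GAP} = lc $p{GAP};
        die "-GAP must be 'on' or 'off'\n" unless $p{GAP} =~ /^(?:on|off)$/;
    }
    return %p;
}
```

A site-specific `blast_local' could call such a helper first and then hand the normalized hash to whatever routine actually launches Blast; checking query sequence type against the program is left out of this sketch.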