This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.

File: pm.info, Node: Bio/Tools/CodonTable, Next: Bio/Tools/ESTScan, Prev: Bio/Tools/Blast/Sbjct, Up: Module List

Bioperl codon table object
**************************

NAME
====

Bio::Tools::CodonTable - Bioperl codon table object

SYNOPSIS
========

This is a read-only class for all known codon tables. The IDs are the
ones used by nucleotide sequence databases. All common IUPAC ambiguity
codes for DNA, RNA and amino acids are recognized.

     # to use
     use Bio::Tools::CodonTable;

     # defaults to ID 1 "Standard"
     $myCodonTable  = Bio::Tools::CodonTable->new();
     $myCodonTable2 = Bio::Tools::CodonTable->new( -id => 3 );

     # change codon table
     $myCodonTable->id(5);

     # examine codon table
     print join (' ', "The name of the codon table no.", $myCodonTable->id(4),
                 "is:", $myCodonTable->name(), "\n");

     # translate a codon
     $aa = $myCodonTable->translate('ACU');
     $aa = $myCodonTable->translate('act');
     $aa = $myCodonTable->translate('ytr');

     # reverse translate an amino acid
     @codons = $myCodonTable->revtranslate('A');
     @codons = $myCodonTable->revtranslate('Ser');
     @codons = $myCodonTable->revtranslate('Glx');
     @codons = $myCodonTable->revtranslate('cYS', 'rna');

     # boolean tests
     print "Is a start\n"      if $myCodonTable->is_start_codon('ATG');
     print "Is a terminator\n" if $myCodonTable->is_ter_codon('tar');
     print "Is an unknown\n"   if $myCodonTable->is_unknown_codon('JTG');

DESCRIPTION
===========

Codon tables are also called translation tables or genetic codes, since
that is what they try to represent. A more complete picture of the full
complexity of codon usage in various taxonomic groups is presented at
the NCBI Genetic Codes home page.

CodonTable is a BioPerl class that knows all current translation tables
that are used by primary nucleotide sequence databases (GenBank, EMBL
and DDBJ). It provides methods to output information about tables and
relationships between codons and amino acids.
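To make the handling of ambiguity codes concrete, here is a minimal, self-contained sketch in plain Perl. It is not the module's implementation: the function name `sketch_translate` and the `%code` and `%iupac` tables are hypothetical and exist only for this illustration, and only the standard table (ID 1) is covered. The idea is to expand an ambiguous codon into every unambiguous codon it can stand for, translate each, and report a single letter only when all of them agree (with B and Z for the Asp/Asn and Glu/Gln pairs).

```perl
use strict;
use warnings;

# Standard genetic code (table 1), codons enumerated in T, C, A, G order.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $code{ $b1 . $b2 . $b3 } = substr($aas, $i++, 1);
        }
    }
}

# IUPAC nucleotide symbols and the bases each one stands for.
my %iupac = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: translate one (possibly ambiguous) codon.
sub sketch_translate {
    my ($codon) = @_;
    $codon = uc $codon;
    $codon =~ tr/U/T/;                        # accept RNA as well as DNA

    # Two-nucleotide input: pad with N and accept only a definite answer.
    if (length($codon) == 2) {
        my $aa = sketch_translate($codon . 'N');
        return $aa eq 'X' ? '' : $aa;
    }
    return '' unless length($codon) == 3;

    # Expand the codon into every unambiguous codon it can represent.
    my @expanded = ('');
    for my $sym (split //, $codon) {
        my $set = $iupac{$sym};
        return 'X' unless defined $set;       # not an IUPAC nucleotide symbol
        @expanded = map { my $p = $_; map { $p . $_ } split //, $set } @expanded;
    }

    # Collect the distinct amino acids the expanded codons code for.
    my %seen = map { $code{$_} => 1 } @expanded;
    my @aa   = sort keys %seen;
    return $aa[0] if @aa == 1;
    return 'B'    if "@aa" eq 'D N';          # Asp or Asn
    return 'Z'    if "@aa" eq 'E Q';          # Glu or Gln
    return 'X';
}

print join(' ', map { sketch_translate($_) } qw(ACU act ytr ray JTG)), "\n";
```

Expanding first and comparing afterwards keeps the lookup table limited to the 64 unambiguous codons, at the cost of up to 64 lookups for a codon such as 'NNN'.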
This class and its methods recognize all common IUPAC ambiguity codes
for DNA, RNA and amino acids. The translation method follows the
conventions in EMBL and TREMBL databases.

It is a nuisance to separate RNA and cDNA representations of nucleic
acid transcripts. The CodonTable object accepts codons of both types as
input and allows the user to set the mode for output when reverse
translating. Its default for output is DNA.

Note: This class deals with individual codons and amino acids only.
Call it from your own objects to translate and reverse translate longer
sequences.

The amino acid codes are IUPAC recommendations for common amino acids:

     A   Ala   Alanine
     R   Arg   Arginine
     N   Asn   Asparagine
     D   Asp   Aspartic acid
     C   Cys   Cysteine
     Q   Gln   Glutamine
     E   Glu   Glutamic acid
     G   Gly   Glycine
     H   His   Histidine
     I   Ile   Isoleucine
     L   Leu   Leucine
     K   Lys   Lysine
     M   Met   Methionine
     F   Phe   Phenylalanine
     P   Pro   Proline
     S   Ser   Serine
     T   Thr   Threonine
     W   Trp   Tryptophan
     Y   Tyr   Tyrosine
     V   Val   Valine
     B   Asx   Aspartic acid or Asparagine
     Z   Glx   Glutamine or Glutamic acid
     X   Xaa   Any or unknown amino acid

It is worth noting that the "Bacterial" codon table no. 11 produces a
polypeptide that is, confusingly, identical to the standard one. The
only differences are in the available initiator codons.

NCBI Genetic Codes home page:
http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c

EBI Translation Table Viewer:
http://www.ebi.ac.uk/cgi-bin/mutations/trtables.cgi

Amended ASN.1 version with ids 16 and 21 is at:
ftp://ftp.ebi.ac.uk/pub/databases/geneticcode/

Thanks to Matteo diTomasso for the original Perl implementation of
these tables.

FEEDBACK
========

Mailing Lists
-------------

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to the
Bioperl mailing lists. Your participation is much appreciated.
     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution. Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Heikki Lehvaslaiho
===========================

Email: heikki@ebi.ac.uk

Address:

     EMBL Outstation, European Bioinformatics Institute
     Wellcome Trust Genome Campus, Hinxton
     Cambs. CB10 1SD, United Kingdom

APPENDIX
========

The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

id
--

      Title   : id
      Usage   : $obj->id(3); $id_integer = $obj->id();
      Function: Sets or returns the id of the translation table. IDs are
                integers from 1 to 15, excluding 7 and 8 which have been
                removed as redundant. If an invalid ID is given the
                method returns 0, false.
      Example :
      Returns : value of id, a scalar, 0 if not a valid ID
      Args    : newvalue (optional)

name
----

      Title   : name
      Usage   : $obj->name()
      Function: returns the descriptive name of the translation table
      Example :
      Returns : A string
      Args    : None

translate
---------

      Title   : translate
      Usage   : $obj->translate('YTR')
      Function: Returns one letter amino acid code for a codon input.

                Returns 'X' for unknown codons and codons that code for
                more than one amino acid. Returns an empty string if
                input is not three characters long. Exceptions to these
                rules are:

                  - IUPAC amino acid code B is used for Aspartic Acid
                    or Asparagine.

                  - IUPAC amino acid code Z is used for Glutamic Acid
                    or Glutamine.

                  - If the codon is two nucleotides long and, after
                    adding a third character 'N', it codes for a single
                    amino acid (with the exceptions above), that amino
                    acid is returned; otherwise an empty string is
                    returned.

                Returns an empty string for other input strings that are
                not three characters long.
      Example :
      Returns : One letter ambiguous IUPAC amino acid code
      Args    : a codon = a three character, ambiguous IUPAC nucleotide string

translate_strict
----------------

      Title   : translate_strict
      Usage   : $obj->translate_strict('ACT')
      Function: returns one letter amino acid code for a codon input

                Fast and simple translation. The user is responsible for
                resolving ambiguous nucleotide codes before calling this
                method. Returns 'X' for unknown codons and an empty
                string for input strings that are not three characters
                long.

                It is not recommended to use this method in a production
                environment. Use method translate, instead.
      Example :
      Returns : A string
      Args    : a codon = a three nucleotide character string

revtranslate
------------

      Title   : revtranslate
      Usage   : $obj->revtranslate('G')
      Function: returns codons for an amino acid

                Returns an empty string for unknown amino acid codes.
                Ambiguous IUPAC codes Asx,B (Asp,D; Asn,N) and Glx,Z
                (Glu,E; Gln,Q) are resolved. Both single and three
                letter amino acid codes are accepted. '*' and 'Ter' are
                used for terminator.

                By default, the output codons are shown in DNA. If the
                output is needed in RNA (tr/t/u/), add a second argument
                'RNA'.
      Example : $obj->revtranslate('Gly', 'RNA')
      Returns : An array of three lower case letter strings i.e. codons
      Args    : amino acid, 'RNA'

is_start_codon
--------------

      Title   : is_start_codon
      Usage   : $obj->is_start_codon('ATG')
      Function: returns true (1) for all codons that can be used as a
                translation start, false (0) for others.
      Example : $myCodonTable->is_start_codon('ATG')
      Returns : boolean
      Args    : codon

is_ter_codon
------------

      Title   : is_ter_codon
      Usage   : $obj->is_ter_codon('GAA')
      Function: returns true (1) for all codons that can be used as a
                translation terminator, false (0) for others.
      Example : $myCodonTable->is_ter_codon('ATG')
      Returns : boolean
      Args    : codon

is_unknown_codon
----------------

      Title   : is_unknown_codon
      Usage   : $obj->is_unknown_codon('GAJ')
      Function: returns false (0) for all codons that are valid,
                true (1) for others.
      Example : $myCodonTable->is_unknown_codon('NTG')
      Returns : boolean
      Args    : codon

_unambiquous_codons
-------------------

      Title   : _unambiquous_codons
      Usage   : @codons = _unambiquous_codons('ACN')
      Function:
      Example :
      Returns : array of strings (one letter unambiguous amino acid codes)
      Args    : a codon = a three IUPAC nucleotide character string

File: pm.info, Node: Bio/Tools/ESTScan, Next: Bio/Tools/Fasta, Prev: Bio/Tools/CodonTable, Up: Module List

Results of one ESTScan run
**************************

NAME
====

Bio::Tools::ESTScan - Results of one ESTScan run

SYNOPSIS
========

     $estscan = Bio::Tools::ESTScan->new(-file => 'result.estscan');
     # filehandle:
     $estscan = Bio::Tools::ESTScan->new( -fh => \*INPUT );

     # parse the results
     # note: this class is-a Bio::Tools::AnalysisResult which implements
     # Bio::SeqAnalysisParserI, i.e., $estscan->next_feature() is the same
     while($gene = $estscan->next_prediction()) {
         # $gene is an instance of Bio::Tools::Prediction::Gene
         foreach my $orf ($gene->exons()) {
             # $orf is an instance of Bio::Tools::Prediction::Exon
             $cds_str = $orf->predicted_cds();
         }
     }

     # essential if you gave a filename at initialization (otherwise the file
     # will stay open)
     $estscan->close();

DESCRIPTION
===========

The ESTScan module provides a parser for ESTScan coding region
prediction output.

This module inherits from *Note Bio/Tools/AnalysisResult:
Bio/Tools/AnalysisResult, and therefore implements the *Note
Bio/SeqAnalysisParserI: Bio/SeqAnalysisParserI, interface. See *Note
Bio/SeqAnalysisParserI: Bio/SeqAnalysisParserI,.

FEEDBACK
========

Mailing Lists
-------------

User feedback is an integral part of the evolution of this and other
Bioperl modules.
Send your comments and suggestions preferably to one of the Bioperl
mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution. Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Hilmar Lapp
====================

Email hlapp@gmx.net (or hilmar.lapp@pharma.novartis.com)

APPENDIX
========

The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

analysis_method
---------------

      Usage     : $estscan->analysis_method();
      Purpose   : Inherited method. Overridden to ensure that the name
                  matches /estscan/i.
      Returns   : String
      Argument  : n/a

next_feature
------------

      Title   : next_feature
      Usage   : while($orf = $estscan->next_feature()) {
                       # do something
                }
      Function: Returns the next gene structure prediction of the
                ESTScan result file. Call this method repeatedly until
                FALSE is returned.

                The returned object is actually a SeqFeatureI
                implementing object. This method is required for classes
                implementing the SeqAnalysisParserI interface, and is
                merely an alias for next_prediction() at present.
      Example :
      Returns : A Bio::Tools::Prediction::Gene object.
      Args    :

next_prediction
---------------

      Title   : next_prediction
      Usage   : while($gene = $estscan->next_prediction()) {
                       # do something
                }
      Function: Returns the next gene structure prediction of the
                ESTScan result file. Call this method repeatedly until
                FALSE is returned.

                So far, this method DOES NOT work for reverse strand
                predictions, even though the code looks as if it does.
      Example :
      Returns : A Bio::Tools::Prediction::Gene object.
      Args    :

close
-----

      Title   : close
      Usage   : $result->close()
      Function: Closes the file handle associated with this result file.
                Inherited method, overridden.
      Example :
      Returns :
      Args    :

_fasta_stream
-------------

      Title   : _fasta_stream
      Usage   : $result->_fasta_stream()
      Function: Gets/Sets the FASTA sequence IO stream for reading the
                contents of the file associated with this ESTScan result
                object.

                If called for the first time, creates the stream from
                the filehandle if necessary.
      Example :
      Returns :
      Args    :

File: pm.info, Node: Bio/Tools/Fasta, Next: Bio/Tools/GFF, Prev: Bio/Tools/ESTScan, Up: Module List

Bioperl Fasta utility object
****************************

NAME
====

Bio::Tools::Fasta.pm - Bioperl Fasta utility object

INSTALLATION
============

This module is included with the central Bioperl distribution:

     http://bio.perl.org/Core/Latest
     ftp://bio.perl.org/pub/DIST

Follow the installation instructions included in the README file.

SYNOPSIS
========

Object Creation
---------------

Bio::Tools::Fasta.pm cannot yet build sequence analysis objects given
output from the FASTA program. This module can only be used for parsing
Fasta multiple sequence files. This situation may change.

Parse a Fasta multiple-sequence file.
-------------------------------------

If $file is not a valid filename, data will be read from STDIN. See the
parse() method (`parse' in this node) for a complete description of
parameters.

     use Bio::Tools::Fasta qw(:obj);

     $seqCount = $Fasta->parse(-file      => $file,
                               -seqs      => \@seqs,
                               -ids       => \@ids,
                               -edit_id   => 1,
                               -edit_seq  => 1,
                               -descs     => \@descs,
                               -filt_func => \&filter_seq,  # filter input sequences.
                               -exec_func => \&process_seq  # process each seq as it is parsed.
                               );

DESCRIPTION
===========

The Bio::Tools::Fasta.pm module, in its present incarnation,
encapsulates data and methods for managing Fasta multiple sequence files
(reading, parsing). It does not yet work with output from the Fasta
sequence analysis program (`References & Information about the FASTA
program' in this node).

The documentation of this module is incomplete. For some examples of
usage, see the DEMO SCRIPTS section.
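As an illustration of the filter-then-process flow that the -filt_func and -exec_func parameters in the SYNOPSIS describe, here is a small plain-Perl sketch. It does not use Bio::Tools::Fasta: the helper `each_fasta_record` is hypothetical, and, as a simplifying assumption, the processing callback receives the raw id, description and sequence strings rather than a Bio::Seq.pm object.

```perl
use strict;
use warnings;

# Hypothetical helper: read Fasta records one at a time from a filehandle.
# $filter->($len, $id, $desc) returns true if the record should be skipped;
# $exec->($id, $desc, $seq) is called for every record that is kept.
# Returns the number of records kept.
sub each_fasta_record {
    my ($fh, $filter, $exec) = @_;
    my $count = 0;
    local $/ = "\n>";                         # read one record per iteration
    while (my $rec = <$fh>) {
        chomp $rec;                           # drop the trailing "\n>"
        $rec =~ s/^>//;                       # the first record keeps its '>'
        my ($header, @lines) = split /\n/, $rec;
        next unless defined $header && length $header;
        my ($id, $desc) = split ' ', $header, 2;
        $desc = '' unless defined $desc;
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;                     # remove any stray whitespace
        next if $filter && $filter->(length($seq), $id, $desc);
        $exec->($id, $desc, $seq) if $exec;
        $count++;
    }
    return $count;
}

# Usage: skip sequences shorter than 25 whose id starts with '123'.
my $data = ">123a short one\nACGT\n"
         . ">seq2 a longer sequence\nACGTACGTACGTACGTACGTACGTACGT\nACGT\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @kept;
my $n = each_fasta_record(
    $fh,
    sub { my ($len, $id, $desc) = @_; return ($len < 25 and $id =~ /^123/) },
    sub { my ($id, $desc, $seq) = @_; push @kept, $id },
);
close $fh;
print "$n kept: @kept\n";
```

Handing each record to a callback as soon as it is read means the whole file never has to be held in memory.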
Unlike "Blast", the term "Fasta" is ambiguous since it refers to both a
sequence file format and a sequence analysis utility (I use "FASTA" to
refer to the program; "Fasta" for the file format). Ultimately, this
module will be able to work with both Fasta sequence files as well as
result files generated by FASTA sequence analysis, analogous to the way
the Bio::Tools::Blast.pm object is used for working with Blast output.

References & Information about the FASTA program
------------------------------------------------

*WEBSITES:*

     ftp://ftp.virginia.edu/pub/fasta/  - FASTA software
     http://www2.ebi.ac.uk/fasta3/      - FASTA server at EBI

*PUBLICATIONS:* (with PubMed links)

     Pearson W.R. and Lipman, D.J. (1988). Improved tools for biological
     sequence comparison. PNAS 85:2444-2448
     http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=3162770&form=6&db=m&Dopt=b

     Pearson, W.R. (1990). Rapid and sensitive sequence comparison with
     FASTP and FASTA. Methods in Enzymology 183:63-98.
     http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=2156132&form=6&db=m&Dopt=b

USAGE
=====

A simple demo script is included with the central Bioperl distribution
(`INSTALLATION' in this node) and is also available from:

     http://bio.perl.org/Core/Examples/seq/

DEPENDENCIES
============

Bio::Tools::Fasta.pm is a concrete class that inherits from
*Bio::Tools::SeqAnal.pm*. This module also relies on *Bio::Seq.pm* for
producing sequence objects.

FEEDBACK
========

Mailing Lists
-------------

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists. Your participation is much appreciated.
     vsns-bcd-perl@lists.uni-bielefeld.de       - General discussion
     vsns-bcd-perl-guts@lists.uni-bielefeld.de  - Technically-oriented discussion
     http://bio.perl.org/MailList.html          - About the mailing lists

Reporting Bugs
--------------

Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution. Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR
======

Steve A. Chervitz, sac@genome.stanford.edu

VERSION
=======

Bio::Tools::Fasta.pm, 0.014

COPYRIGHT
=========

Copyright (c) 1998 Steve A. Chervitz. All Rights Reserved. This module
is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.

SEE ALSO
========

     Bio::Tools::SeqAnal.pm   - Sequence analysis object base class.
     Bio::Seq.pm              - Biosequence object
     Bio::Root::Object.pm     - Proposed base class for all Bioperl objects.

     http://bio.perl.org/Projects/modules.html  - Online module documentation
     http://bio.perl.org/                       - Bioperl Project Homepage

`References & Information about the FASTA program' in this node.

TODO
====

   * Incorporate code for parsing Fasta sequence analysis reports.

   * Improve documentation.

APPENDIX
========

Methods beginning with a leading underscore are considered private and
are intended for internal use by this module. They are not considered
part of the public interface and are described here for documentation
purposes only.

_initialize
-----------

      Usage     : n/a; automatically called by Bio::Root::Object::new()
      Purpose   : Calls superclass constructor.
      Returns   : n/a
      Argument  : Named parameters passed to new() are processed by this method.
                : At present, none are processed.

See Also : *Bio::Tools::SeqAnal::_initialize()*

parse
-----

      Usage     : $fasta_obj->parse( %named_parameters)
      Purpose   : Parse a set of Fasta sequences or Fasta reports from a file or STDIN.
                : (Currently only Fasta sequence parsing is supported).
      Returns   : Integer (number of sequences or Fasta reports parsed).
      Argument  : Named parameters: (TAGS CAN BE UPPER OR LOWER CASE)
                :   -FILE       => string (name of file containing Fasta-formatted sequences.
                :                  Optional. If a valid file is not supplied,
                :                  STDIN will be used).
                :   -SEQS       => boolean (true  = parse a Fasta multi-sequence file
                :                  false = parse a Fasta sequence analysis report).
                :   -IDS        => array_ref (optional).
                :   -DESCS      => array_ref (optional).
                :   -EDIT_ID    => boolean (true = edit sequence identifiers).
                :   -EDIT_SEQ   => boolean (true = edit sequence data).
                :   -TYPE       => string (type of sequences to be processed:
                :                  'dna', 'rna', 'amino').
                :   -FILT_FUNC  => func_ref (reference to a function for filtering out
                :                  sequences as they are being parsed.
                :                  This function should return a boolean
                :                  (true if the sequence should be filtered out)
                :                  and accept three arguments as shown
                :                  in this sample filter function:
                :                  sub filt {
                :                      my($len, $id, $desc) = @_;
                :                      # $len is the sequence length
                :                      return ($len < 25 and $id =~ /^123/);
                :                  }
                :                  This function will screen out any sequence
                :                  less than 25 in length and having an id
                :                  starting with '123'.)
                :   -SAVE_ARRAY => array_ref (reference to an array for storing all
                :                  sequence objects as they are created.)
                :   -EXEC_FUNC  => func_ref (reference to a function for processing each
                :                  sequence object as it is parsed.
                :                  When working with sequences, this function
                :                  should accept a Bio::Seq.pm object as its
                :                  sole argument. Return value will be ignored).
                :   -STRICT     => boolean (increases sensitivity to errors).
                :
                : ----------------------------------------------------------------
                : NOTE: Parameters such as seqs, ids, desc, edit_id, edit_seq, type
                :       are used only when parsing Fasta sequence files.
                :       Additional parameters will be added as necessary for
                :       parsing Fasta sequence analysis reports.
                :
                : NOTE: When retrieving sequence data instead of objects,
                :       the -SEQS, -IDS, and -DESCS parameters should all be array refs.
                :       This constitutes a signal that sequence objects are not
                :       to be constructed.
                :
      Throws    : Propagates any exceptions thrown by _parse_seq_stream()
      Comments  : WORKING WITH SEQUENCE DATA:
                  ---------------------------
                  The parse method can return sequence data bundled into
                  Bio::Seq.pm objects or in raw format (separate arrays
                  for seq, id, and desc data). The reason for this is
                  that in some cases, you don't particularly need to
                  work with sequence objects and it is inefficient to
                  build objects just to have them broken apart. However,
                  there is something to be said for choosing one
                  approach -- always return seq objects. In this way,
                  the object becomes the basic unit of exchange. For
                  now, both options are allowed.

                  The story will be different for Fasta sequence
                  analysis report objects since these are a much more
                  complex data type and it would be unwieldy and
                  dangerous to return parsed data unencapsulated from an
                  object.

See Also : `_parse_seq_stream' in this node(), `_set_id_desc' in this
node(), `_get_parse_seq_func' in this node()

_parse_seq_stream
-----------------

      Usage     : n/a. Internal method called by parse()
      Purpose   : Obtains the function to be used during parsing and calls read().
      Returns   : Integer (the number of sequences read)
      Argument  : Named parameters (forwarded from parse())
      Throws    : Propagates any exception thrown by _get_parse_seq_func() and read().
      Comments  : This method permits the sequence data to be parsed as
                  it is being read in. The motivation here is that when
                  working with a potentially huge set of sequences,
                  there is no need to read them all into memory before
                  you start processing them. In fact, you may only be
                  interested in a few of them.

                  This method constructs and returns a closure for
                  parsing a single Fasta sequence. It is called
                  automatically by the read() method inherited from
                  Bio::Root::Object.pm.

                  Another issue concerns what to do with the parsed
                  data: save it or use it? Sometimes you need to process
                  all sequence data as a group (eg., sorting).
                  Other times, you can safely process each sequence as
                  it gets parsed and then move on to the next. By
                  delivering each sequence as it gets parsed, the client
                  is free to decide what to do with it.

See Also : `_get_parse_seq_func' in this node(),
*Bio::Root::Object::read()*

_get_parse_seq_func
-------------------

      Usage     : n/a. Internal method called by _parse_seq_stream()
      Purpose   : Generates a function reference to be used for parsing raw sequence data
                : as it is being loaded by read().
                : Used when parsing Fasta sequence files.
      Returns   : Function reference (actually a closure)
      Argument  : Named parameters forwarded from _parse_seq_stream()
      Throws    : Exceptions due to improper argument types.
                : (to be elaborated...)
      Comments  : The function generated performs sequence editing if
                : the -EDIT_SEQ parse() parameter is non-zero.
                : This consists of removing any ambiguous residues at the
                : beginning or end of the sequence.
                : Regardless of -EDIT_SEQ, all sequences will be edited to
                : remove whitespace and non-alphabetic chars.
                : Gap characters are permitted ('.' and '-').
                : (Need a more universal way to identify gap characters.)
                : If sequence objects are generated and an -EXEC_FUNC is supplied,
                : each object will be destroyed after calling this function.
                : This prevents memory usage problems for large runs.

See Also : `parse' in this node(), `_parse_seq_stream' in this node(),
*Bio::Root::Object::_rearrange()*

edit_id
-------

      Usage     : $fasta_obj->edit_id()
      Purpose   : Set/Get a boolean indicator as to whether sequence IDs should be edited.
                : Used when parsing Fasta sequence files.
      Returns   : Boolean (true if the IDs are to be edited).
      Argument  : Boolean (optional)
      Throws    : n/a

See Also : `_set_id_desc' in this node(), `_get_parse_seq_func' in this
node()

edit_seqs
---------

      Usage     : $fasta_obj->edit_seqs()
      Purpose   : Set/Get a boolean indicator as to whether sequences should be edited.
                : Used when parsing Fasta sequence files.
      Returns   : Boolean (true if the sequences are to be edited).
      Argument  : Boolean (optional)
      Throws    : n/a

See Also : `_get_parse_seq_func' in this node()

_set_id_desc
------------

      Usage     : n/a. Internal method called by _get_parse_seq_func()
      Purpose   : Sets the _id and _desc data members, optionally editing the id.
                : Used when parsing Fasta sequence files.
      Returns   : 2-element list containing: ($id, $description)
      Argument  : String containing raw ID + description (leading '>' will be stripped)
      Throws    : n/a
      Comments  : Optionally edits the ID if the '_edit_id' field is true.
                : Descriptions are not altered.
                : ID Edits:
                :   1) Uppercases the ID.
                :   2) If the ID has any | characters the following is performed:
                :       a) Replace | characters with _ characters.
                :          (prevent regexp and shell trouble).
                :       b) Cleans up complex identifiers.
                :          Some GenBank specifiers have multiple parts:
                :            >gi|2980872|gnl|PID|e1283615 homeobox protein SHOTb
                :          Only the first ID is saved as the official ID.
                :          Extra ids will be included at the end of the
                :          description between brackets:
                :            GI_2980872 homeobox protein SHOTb [ GNL PID e1283615 ]
                :
                : ID editing is somewhat experimental.

See Also : `_get_parse_seq_func' in this node(), `edit_id' in this
node()

num_seqs
--------

      Usage     : $fasta_obj->num_seqs()
      Purpose   : Get the number of sequences read by the Fasta object.
      Returns   : Integer
      Argument  : n/a
      Throws    : n/a

FOR DEVELOPERS ONLY
===================

Data Members
------------

Information about the various data members of this module is provided
for those wishing to modify or understand the code. Two things to bear
in mind:

  1. Do NOT rely on these in any code outside of this module. All data
     members are prefixed with an underscore to signify that they are
     private. Always use accessor methods. If the accessor doesn't
     exist or is inadequate, create or modify an accessor (and let me
     know, too!).

  2. This documentation may be incomplete and out of date. It is easy
     for these data member descriptions to become obsolete as this
     module is still evolving.
     Always double check this info and search for members not described
     here.

An instance of Bio::Tools::Fasta.pm is a blessed reference to a hash
containing all or some of the following fields:

      FIELD          VALUE
      --------------------------------------------------------------
      _seqCount      Number of sequences parsed.
      _edit_seq      Boolean. Should sequences be edited during parsing?
      _edit_id       Boolean. Should ids be edited during parsing?

More data members will be added when code for Fasta report processing
is incorporated.

INHERITED DATA MEMBERS

(See Bio::Tools::SeqAnal.pm for inherited data members.)

File: pm.info, Node: Bio/Tools/GFF, Next: Bio/Tools/Genscan, Prev: Bio/Tools/Fasta, Up: Module List

A Bio::SeqAnalysisParserI compliant GFF format parser
*****************************************************

NAME
====

Bio::Tools::GFF - A Bio::SeqAnalysisParserI compliant GFF format parser

SYNOPSIS
========

     use Bio::Tools::GFF;

     # specify input via -fh or -file
     my $gffio = Bio::Tools::GFF->new(-fh => \*STDIN, -gff_version => 2);
     my $feature;
     # loop over the input stream
     while($feature = $gffio->next_feature()) {
         # do something with feature
     }
     $gffio->close();

     # you can also obtain a GFF parser as a SeqAnalysisParserI in
     # HT analysis pipelines (see Bio::SeqAnalysisParserI and
     # Bio::Factory::SeqAnalysisParserFactory)
     my $factory = Bio::Factory::SeqAnalysisParserFactory->new();
     my $parser = $factory->get_parser(-input => \*STDIN, -method => "gff");
     while($feature = $parser->next_feature()) {
         # do something with feature
     }

DESCRIPTION
===========

This class provides a simple GFF parser and writer. In the sense of a
SeqAnalysisParser, it parses an input file or stream into SeqFeatureI
objects, but is not in any way specific to a particular analysis program
and the output that program produces.

That is, if you can get your analysis program to spit out GFF, here is
your result parser.
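For readers unfamiliar with the format, the following plain-Perl sketch shows what a GFF feature line looks like once split into its columns. It is not the code of Bio::Tools::GFF: the helper `parse_gff_line` and the hash keys it returns are hypothetical, chosen after the conventional GFF column names (eight mandatory tab-separated columns and an optional ninth).

```perl
use strict;
use warnings;

# Hypothetical helper: split one GFF line into named columns.
# Returns a hash reference, or undef for comments, blank lines and
# lines with fewer than eight tab-separated columns.
sub parse_gff_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^\s*(?:#|$)/;   # comment or blank line
    my @f = split /\t/, $line, 9;             # keep tabs inside column 9 intact
    return undef unless @f >= 8;              # not a feature line
    my %feat;
    @feat{qw(seqname source feature start end score strand frame)} = @f[0 .. 7];
    $feat{attributes} = defined $f[8] ? $f[8] : '';
    return \%feat;
}

my $f = parse_gff_line(
    "chr1\tgenscan\texon\t100\t250\t0.95\t+\t0\tgene_id \"g1\"\n");
printf "%s %s %d-%d (%s)\n", @{$f}{qw(seqname feature start end strand)};
```

A parser such as the one documented here would go one step further and copy these values into a SeqFeatureI object, interpreting the last column according to the GFF version in use.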
FEEDBACK
========

Mailing Lists
-------------

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists. Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution. Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Matthew Pocock
=======================

Email mrp@sanger.ac.uk

APPENDIX
========

The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

new
---

      Title   : new
      Usage   :
      Function: Creates a new instance. Recognized named parameters are
                -file, -fh, and -gff_version.
      Returns : a new object
      Args    : named parameters

next_feature
------------

      Title   : next_feature
      Usage   : $seqfeature = $gffio->next_feature();
      Function: Returns the next feature available in the input file or
                stream, or undef if there are no more features.
      Example :
      Returns : A Bio::SeqFeatureI implementing object, or undef if
                there are no more features.
      Args    : none

from_gff_string
---------------

      Title   : from_gff_string
      Usage   : $gff->from_gff_string($feature, $gff_string);
      Function: Sets properties of a SeqFeatureI object from a
                GFF-formatted string. Interpretation of the string
                depends on the version that has been specified at
                initialization.

                This method is used by next_feature(). It actually
                dispatches to one of the version-specific (private)
                methods.
      Example :
      Returns : void
      Args    : A Bio::SeqFeatureI implementing object to be initialized
                The GFF-formatted string to initialize it from

_from_gff1_string
-----------------

      Title   : _from_gff1_string
      Usage   :
      Function:
      Example :
      Returns : void
      Args    : A Bio::SeqFeatureI implementing object to be initialized
                The GFF-formatted string to initialize it from

_from_gff2_string
-----------------

      Title   : _from_gff2_string
      Usage   :
      Function:
      Example :
      Returns : void
      Args    : A Bio::SeqFeatureI implementing object to be initialized
                The GFF2-formatted string to initialize it from

write_feature
-------------

      Title   : write_feature
      Usage   : $gffio->write_feature($feature);
      Function: Writes the specified SeqFeatureI object in GFF format to
                the stream associated with this instance.
      Example :
      Returns :
      Args    : A Bio::SeqFeatureI implementing object to be serialized

gff_string
----------

      Title   : gff_string
      Usage   : $gffstr = $gffio->gff_string($feature);
      Function: Obtain the GFF-formatted representation of a SeqFeatureI
                object. The formatting depends on the version specified
                at initialization.

                This method is used by write_feature(). It actually
                dispatches to one of the version-specific (private)
                methods.
      Example :
      Returns : A GFF-formatted string representation of the SeqFeature
      Args    : A Bio::SeqFeatureI implementing object to be GFF-stringified

_gff1_string
------------

      Title   : _gff1_string
      Usage   : $gffstr = $gffio->_gff1_string
      Function:
      Example :
      Returns : A GFF1-formatted string representation of the SeqFeature
      Args    : A Bio::SeqFeatureI implementing object to be GFF-stringified

_gff2_string
------------

      Title   : _gff2_string
      Usage   : $gffstr = $gffio->_gff2_string
      Function:
      Example :
      Returns : A GFF2-formatted string representation of the SeqFeature
      Args    : A Bio::SeqFeatureI implementing object to be GFF2-stringified

gff_version
-----------

      Title   : gff_version
      Usage   : $gffversion = $gffio->gff_version
      Function:
      Example :
      Returns : The GFF version this parser will accept and emit.
      Args    : none

File: pm.info, Node: Bio/Tools/Genscan, Next: Bio/Tools/HMMER/Domain, Prev: Bio/Tools/GFF, Up: Module List

Results of one Genscan run
**************************

NAME
====

Bio::Tools::Genscan - Results of one Genscan run

SYNOPSIS
========

     $genscan = Bio::Tools::Genscan->new(-file => 'result.genscan');
     # filehandle:
     $genscan = Bio::Tools::Genscan->new( -fh => \*INPUT );

     # parse the results
     # note: this class is-a Bio::Tools::AnalysisResult which implements
     # Bio::SeqAnalysisParserI, i.e., $genscan->next_feature() is the same
     while($gene = $genscan->next_prediction()) {
         # $gene is an instance of Bio::Tools::Prediction::Gene, which inherits
         # from Bio::SeqFeature::Gene::Transcript.
         #
         # $gene->exons() returns an array of
         # Bio::Tools::Prediction::Exon objects
         # all exons:
         @exon_arr = $gene->exons();

         # initial exons only
         @init_exons = $gene->exons('Initial');
         # internal exons only
         @intrl_exons = $gene->exons('Internal');
         # terminal exons only
         @term_exons = $gene->exons('Terminal');
         # singleton exons:
         ($single_exon) = $gene->exons();
     }

     # essential if you gave a filename at initialization (otherwise the file
     # will stay open)
     $genscan->close();

DESCRIPTION
===========

The Genscan module provides a parser for Genscan gene structure
prediction output. It parses one gene prediction into a
Bio::SeqFeature::Gene::Transcript-derived object.

This module also implements the Bio::SeqAnalysisParserI interface, and
thus can be used wherever such an object fits. See *Note
Bio/SeqAnalysisParserI: Bio/SeqAnalysisParserI,.

FEEDBACK
========

Mailing Lists
-------------

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists. Your participation is much appreciated.
     bioperl-l@bioperl.org              - General discussion
     http://bio.perl.org/MailList.html  - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email
or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Hilmar Lapp
====================

   Email hlapp@gmx.net

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

analysis_method
---------------

    Usage     : $genscan->analysis_method();
    Purpose   : Inherited method. Overridden to ensure that the name matches
                /genscan/i.
    Returns   : String
    Argument  : n/a

next_feature
------------

    Title   : next_feature
    Usage   : while($gene = $genscan->next_feature()) {
                     # do something
              }
    Function: Returns the next gene structure prediction of the Genscan result
              file. Call this method repeatedly until FALSE is returned.

              The returned object is actually a SeqFeatureI implementing object.
              This method is required for classes implementing the
              SeqAnalysisParserI interface, and is merely an alias for
              next_prediction() at present.
    Example :
    Returns : A Bio::Tools::Prediction::Gene object.
    Args    :

next_prediction
---------------

    Title   : next_prediction
    Usage   : while($gene = $genscan->next_prediction()) {
                     # do something
              }
    Function: Returns the next gene structure prediction of the Genscan result
              file. Call this method repeatedly until FALSE is returned.
    Example :
    Returns : A Bio::Tools::Prediction::Gene object.
    Args    :

_parse_predictions
------------------

    Title   : _parse_predictions()
    Usage   : $obj->_parse_predictions()
    Function: Parses the prediction section. Automatically called by
              next_prediction() if not yet done.
    Example :
    Returns :

_prediction
-----------

    Title   : _prediction()
    Usage   : $gene = $obj->_prediction()
    Function: internal
    Example :
    Returns :

_add_prediction
---------------

    Title   : _add_prediction()
    Usage   : $obj->_add_prediction($gene)
    Function: internal
    Example :
    Returns :

_predictions_parsed
-------------------

    Title   : _predictions_parsed
    Usage   : $obj->_predictions_parsed
    Function: internal
    Example :
    Returns : TRUE or FALSE

_has_cds
--------

    Title   : _has_cds()
    Usage   : $obj->_has_cds()
    Function: Whether or not the result contains the predicted CDSs, too.
    Example :
    Returns : TRUE or FALSE

_read_fasta_seq
---------------

    Title   : _read_fasta_seq()
    Usage   : ($id,$seqstr) = $obj->_read_fasta_seq();
    Function: Simple but specialised FASTA format sequence reader. Uses
              $self->_readline() to retrieve input, and is able to strip off
              the trailing description lines.
    Example :
    Returns : An array of two elements.


File: pm.info,  Node: Bio/Tools/HMMER/Domain,  Next: Bio/Tools/HMMER/Results,  Prev: Bio/Tools/Genscan,  Up: Module List

One particular domain hit from HMMER
************************************

NAME
====

   Bio::Tools::HMMER::Domain - One particular domain hit from HMMER

SYNOPSIS
========

   Read the Bio::Tools::HMMER::Results docs

DESCRIPTION
===========

   A particular domain score. We reuse the Homol SeqFeature system here,
so this inherits off Homol SeqFeature. As this code originally came from
a separate project, some backward-compatibility code is provided to keep
this working with old code.

   Don't forget this inherits off Bio::SeqFeature, so all your usual nice
start/end/score stuff is ready for use.

CONTACT
=======

   Describe contact details here

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

add_alignment_line
------------------

    Title   : add_alignment_line
    Usage   : $domain->add_alignment_line($line_from_hmmer_output);
    Function: add an alignment line to this Domain object
    Returns : Nothing
    Args    : scalar

   Adds an alignment line, mainly for storing the HMMER alignments as
flat text which can be regurgitated. You're right. This is *not nice* and
not the right way to do it.  C'est la vie.

each_alignment_line
-------------------

    Title   : each_alignment_line
    Usage   : foreach $line ( $domain->each_alignment_line )
    Function: regurgitates the alignment lines as they were fed in.
              Realistically only useful for printing.
    Example :
    Returns :
    Args    : None

get_nse
-------

    Title   : get_nse
    Usage   : $domain->get_nse()
    Function: Provides a seqname/start-end format, useful
              for unique keys. nse stands for name-start-end.
              It is used a lot in Pfam.
    Example :
    Returns : A string
    Args    : Optional separator 1 and separator 2 (default / and -)

hmmacc
------

    Title   : hmmacc
    Usage   : $domain->hmmacc($newacc)
    Function: get/set for the HMM accession number. This is placed in the
              homol feature of the HMM
    Example :
    Returns :
    Args    :

hmmname
-------

    Title   : hmmname
    Usage   : $domain->hmmname($newname)
    Function: get/set for the HMM name. This is placed in the
              homol feature of the HMM
    Example :
    Returns :
    Args    :

bits
----

    Title   : bits
    Usage   :
    Function: backward compatibility.
              Same as score
    Example :
    Returns :
    Args    :

evalue
------

    Title   : evalue
    Usage   : $domain->evalue($value);
    Function:
    Example :
    Returns :
    Args    :

seqbits
-------

    Title   : seqbits
    Usage   : $domain->seqbits($value);
    Function:
    Example :
    Returns :
    Args    :

seq_range
---------

    Title   : seq_range
    Usage   :
    Function: Throws an exception to catch scripts which need to upgrade
    Example :
    Returns :
    Args    :

hmm_range
---------

    Title   : hmm_range
    Usage   :
    Function: Throws an exception to catch scripts which need to upgrade
    Example :
    Returns :
    Args    :


File: pm.info,  Node: Bio/Tools/HMMER/Results,  Next: Bio/Tools/HMMER/Set,  Prev: Bio/Tools/HMMER/Domain,  Up: Module List

Object representing HMMER output results
****************************************

NAME
====

   Bio::Tools::HMMER::Results - Object representing HMMER output results

SYNOPSIS
========

     # parse a hmmsearch file (can also parse a hmmpfam file)
     $res = new Bio::Tools::HMMER::Results( -file => 'output.hmm' , -type => 'hmmsearch');

     # print out the results for each sequence
     foreach $seq ( $res->each_Set ) {
         print "Sequence bit score is",$seq->bits,"\n";
         foreach $domain ( $seq->each_Domain ) {
             print " Domain start ",$domain->start," end ",$domain->end,
                   " score ",$domain->bits,"\n";
         }
     }

     # new result object on a sequence/domain cutoff of 25 bits sequence, 15 bits domain
     $newresult = $res->filter_on_cutoff(25,15);

     # alternative way of getting out all domains directly
     foreach $domain ( $res->each_Domain ) {
         print "Domain on ",$domain->seqname," with score ",
               $domain->bits," evalue ",$domain->evalue,"\n";
     }

DESCRIPTION
===========

   This object represents HMMER output, either from hmmsearch or hmmpfam.
For hmmsearch, a series of HMMER::Set objects are made, one for each
sequence, which have the bits score for the object. For hmmpfam searches,
only one Set object is made.

   These objects come from the original HMMResults modules used
internally in Pfam, written by Ewan. Ewan then converted them to bioperl
objects in 1999.
That conversion is meant to be backward compatible, but may not be
(caveat emptor).

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org                 - General discussion
     http://www.bioperl.org/MailList.html  - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email
or the web:

     bioperl-bugs@bio.perl.org
     http://www.bioperl.org/bioperl-bugs/

AUTHOR - Ewan Birney
====================

   Email birney@sanger.ac.uk

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

next_feature
------------

    Title   : next_feature
    Usage   : while( my $feat = $res->next_feature ) {
                     # do something
              }
    Function: SeqAnalysisParserI implementing function
    Example :
    Returns : A Bio::SeqFeatureI compliant object, in this case,
              each DomainUnit object, i.e., flattening the Sequence
              aspect of this.
    Args    : None

number
------

    Title   : number
    Usage   : print "There are ",$res->number," domains hit\n";
    Function: provides the number of domains in the HMMER report

add_Domain
----------

    Title   : add_Domain
    Usage   : $res->add_Domain($unit)
    Function: adds a domain to the results array. Mainly used internally.
    Args    : A Bio::Tools::HMMER::Domain

each_Domain
-----------

    Title   : each_Domain
    Usage   : foreach $domain ( $res->each_Domain() )
    Function: array of Domain units which are held in this report
    Returns : array
    Args    : none

domain_bits_cutoff_from_evalue
------------------------------

    Title   : domain_bits_cutoff_from_evalue
    Usage   : $cutoff = domain_bits_cutoff_from_evalue(0.01);
    Function: return a bits cutoff from an evalue using the scores here.
              Somewhat interesting logic:
               Find the two bit scores which straddle the evalue
               if( 25 is between these two points) return 25
               else return the midpoint.

              This logic tries to ensure that with large signal to
              noise separation one still has a sensible 25 bit cutoff
    Returns :
    Args    :

write_FT_output
---------------

    Title   : write_FT_output
    Usage   : $res->write_FT_output(\*STDOUT,'DOMAIN')
    Function: writes feature table output a la Swiss-Prot
    Returns :
    Args    :

filter_on_cutoff
----------------

    Title   : filter_on_cutoff
    Usage   : $newresults = $results->filter_on_cutoff(25,15);
    Function: Produces a new HMMER::Results object which has
              been trimmed at the cutoff.
    Returns : a Bio::Tools::HMMER::Results object
    Args    : sequence cutoff and domain cutoff, in bits.
              If you want a single cutoff, simply use the same number
              in both places.

write_ascii_out
---------------

    Title   : write_ascii_out
    Usage   : $res->write_ascii_out(\*STDOUT)
    Function: writes as
              seq seq_start seq_end model-acc model_start model_end model_name
    Returns :
    Args    :

    FIXME: Now that we have no modelacc, this is probably a bad thing.

write_GDF_bits
--------------

    Title   : write_GDF_bits
    Usage   : $res->write_GDF_bits(25,15,\*STDOUT)
    Function: writes GDF format with a sequence,domain threshold
    Returns :
    Args    :

add_Set
-------

    Title   : add_Set
    Usage   : Mainly internal function
    Function:
    Returns :
    Args    :

each_Set
--------

    Title   : each_Set
    Usage   :
    Function:
    Returns :
    Args    :

get_Set
-------

    Title   : get_Set
    Usage   : $set = $res->get_Set('sequence-name');
    Function: returns the Set for a particular sequence
    Returns : a HMMER::Set object
    Args    : name of the sequence

_parse_hmmpfam
--------------

    Title   : _parse_hmmpfam
    Usage   : $res->_parse_hmmpfam($filehandle)
    Function:
    Returns :
    Args    :

_parse_hmmsearch
----------------

    Title   : _parse_hmmsearch
    Usage   : $res->_parse_hmmsearch($filehandle)
    Function:
    Returns :
    Args    :
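
   The cutoff rule described under domain_bits_cutoff_from_evalue can be
sketched in plain Perl. This is a minimal illustration, not the module's
actual implementation. The function name pick_bits_cutoff, the input
format (a list of [bits, evalue] pairs), and the fallback to 25 when no
two scores straddle the threshold are all assumptions made for this
example.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Tools::HMMER::Results.
# $hits is a reference to a list of [ $bits, $evalue ] pairs.
sub pick_bits_cutoff {
    my ($evalue_threshold, $hits) = @_;

    my ($bits_pass, $bits_fail);
    for my $hit (@$hits) {
        my ($bits, $evalue) = @$hit;
        if ($evalue <= $evalue_threshold) {
            # lowest bit score among hits that still pass the evalue
            $bits_pass = $bits if !defined $bits_pass || $bits < $bits_pass;
        }
        else {
            # highest bit score among hits that fail the evalue
            $bits_fail = $bits if !defined $bits_fail || $bits > $bits_fail;
        }
    }

    # Assumption: with nothing on one side of the threshold, use 25.
    return 25 unless defined $bits_pass && defined $bits_fail;

    # The two scores straddling the evalue threshold, in ascending order.
    my ($lo, $hi) = sort { $a <=> $b } ($bits_pass, $bits_fail);

    return 25 if $lo <= 25 && 25 <= $hi;    # 25 lies between them
    return ($lo + $hi) / 2;                 # otherwise the midpoint
}

# Straddling scores are 41.0 and 12.5, so 25 lies between them.
my @wide = ( [ 80.2, 1e-20 ], [ 41.0, 1e-6 ], [ 12.5, 0.3 ] );
print pick_bits_cutoff(0.01, \@wide), "\n";

# Straddling scores are 40 and 30, so the midpoint is returned.
my @narrow = ( [ 60, 1e-10 ], [ 40, 1e-4 ], [ 30, 0.5 ] );
print pick_bits_cutoff(0.01, \@narrow), "\n";
```

   With a wide gap between passing and failing scores the sketch keeps
the conventional 25 bit cutoff. Only when both straddling scores fall on
the same side of 25 does it move the cutoff to their midpoint.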