This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: Bio/LocationI,  Next: Bio/Parse,  Prev: Bio/Location/WidestCoordPolicy,  Up: Module List

Abstract interface of a Location on a Sequence
**********************************************

NAME
====

   Bio::LocationI - Abstract interface of a Location on a Sequence

SYNOPSIS
========

     # get a LocationI somehow
     printf( "start = %d, end = %d, strand = %s, seq_id = %s\n",
     	    $location->start, $location->end, $location->strand,
     	    $location->seq_id);
     print "location str is ", $location->to_FTstring(), "\n";

DESCRIPTION
===========

   This Interface defines the methods for a Bio::LocationI, an object
which encapsulates a location on a biological sequence.  Locations need
not be attached to actual sequences as they are stand alone objects.
LocationI objects are used by Bio::SeqFeatureI objects to manage and
represent locations for a Sequence Feature.

FEEDBACK
========

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email
or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Jason Stajich
======================

   Email jason@chg.mc.duke.edu

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

start
-----

     Title   : start
     Usage   : $start = $location->start();
     Function: Get the start coordinate of this location as defined by the
               currently active coordinate computation policy. In simple cases,
               this will return the same number as min_start() and max_start();
               in more ambiguous cases, such as fuzzy locations, the number may
               equal only one of them, or neither.

     We override this here from RangeI in order to delegate 'get' to
     a Bio::Location::CoordinatePolicy implementing object. Implementing
     classes may also wish to provide 'set' functionality, in which
     case they *must* override this method. The implementation
     provided here will throw an exception if called with arguments.

     Returns : A positive integer value.
     Args    : none

end
---

     Title   : end
     Usage   : $end = $location->end();
     Function: Get the end coordinate of this location as defined by the
               currently active coordinate computation policy. In simple cases,
               this will return the same number as min_end() and max_end();
               in more ambiguous cases, such as fuzzy locations, the number may
               equal only one of them, or neither.

     We override this here from RangeI in order to delegate 'get' to
     a Bio::Location::CoordinatePolicy implementing object. Implementing
     classes may also wish to provide 'set' functionality, in which
     case they *must* override this method. The implementation
     provided here will throw an exception if called with arguments.

     Returns : A positive integer value.
     Args    : none

min_start
---------

     Title   : min_start
     Usage   : my $minstart = $location->min_start();
     Function: Get minimum starting point of feature.

     Note that an implementation must not call start() in this method.

     Returns : integer or undef if no minimum starting point.
     Args    : none

max_start
---------

     Title   : max_start
     Usage   : my $maxstart = $location->max_start();
     Function: Get maximum starting point of feature.

     Note that an implementation must not call start() in this method
     unless start() is overridden so as not to delegate to the
     coordinate computation policy object.

     Returns : integer or undef if no maximum starting point.
     Args    : none

start_pos_type
--------------

     Title   : start_pos_type
     Usage   : my $start_pos_type = $location->start_pos_type();
     Function: Get start position type encoded as text

     Known valid values are 'BEFORE' (<5..100), 'AFTER' (>5..100),
     'EXACT' (5..100), 'WITHIN' ((5.10)..100), and 'BETWEEN' (5^6), with
     their meaning best explained by their GenBank/EMBL location string
     encoding in parentheses.

     Returns : string ('BEFORE', 'AFTER', 'EXACT', 'WITHIN', 'BETWEEN')
     Args    : none
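
   The mapping from location strings to position types can be illustrated
with a small sketch. The helper start_pos_type_of() below is hypothetical
and is not part of Bio::LocationI; it only inspects the five example
encodings listed above and treats anything else as 'EXACT'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::LocationI): classify the start
# position of a GenBank/EMBL feature-table location string.
sub start_pos_type_of {
    my ($ft) = @_;
    return 'BETWEEN' if $ft =~ /^\d+\^\d+$/;          # 5^6
    return 'BEFORE'  if $ft =~ /^</;                  # <5..100
    return 'AFTER'   if $ft =~ /^>/;                  # >5..100
    return 'WITHIN'  if $ft =~ /^\(\d+\.\d+\)\.\./;   # (5.10)..100
    return 'EXACT';                                   # 5..100 and anything else
}

for my $ft ('<5..100', '>5..100', '5..100', '(5.10)..100', '5^6') {
    printf "%-12s => %s\n", $ft, start_pos_type_of($ft);
}
```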

min_end
-------

     Title   : min_end
     Usage   : my $minend = $location->min_end();
     Function: Get minimum ending point of feature.

     Note that an implementation must not call end() in this method
     unless end() is overridden so as not to delegate to the
     coordinate computation policy object.

     Returns : integer or undef if no minimum ending point.
     Args    : none

max_end
-------

     Title   : max_end
     Usage   : my $maxend = $location->max_end();
     Function: Get maximum ending point of feature.

     Note that an implementation must not call end() in this method
     unless end() is overridden so as not to delegate to the
     coordinate computation policy object.

     Returns : integer or undef if no maximum ending point.
     Args    : none

end_pos_type
------------

     Title   : end_pos_type
     Usage   : my $end_pos_type = $location->end_pos_type();
     Function: Get end position type encoded as text.

     Known valid values are 'BEFORE' (5..<100), 'AFTER' (5..>100),
     'EXACT' (5..100), 'WITHIN' (5..(90.100)), and 'BETWEEN' (5^6), with
     their meaning best explained by their GenBank/EMBL location string
     encoding in parentheses.

     Returns : string ('BEFORE', 'AFTER', 'EXACT', 'WITHIN', 'BETWEEN')
     Args    : none

seq_id
------

     Title   : seq_id
     Usage   : my $seqid = $location->seq_id();
     Function: Get/Set seq_id that location refers to
     Returns : seq_id (a string)
     Args    : [optional] seq_id value to set

coordinate_policy
-----------------

     Title   : coordinate_policy
     Usage   : $policy = $location->coordinate_policy();
               $location->coordinate_policy($mypolicy); # set may not be possible
     Function: Get the coordinate computing policy employed by this object.

     See Bio::Location::CoordinatePolicyI for documentation about
     the policy object and its use.

     The interface *does not* require implementing classes to accept
     setting of a different policy. The implementation provided here
     does, however, allow it.

     Implementors of this interface are expected to initialize every
     new instance with a CoordinatePolicyI object. The implementation
     provided here will return a default policy object if none has
     been set yet. To change this default policy object call this
     method as a class method with an appropriate argument. Note that
     in this case only subsequently created Location objects will be
     affected.

     Returns : A Bio::Location::CoordinatePolicyI implementing object.
     Args    : On set, a Bio::Location::CoordinatePolicyI implementing object.
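
   The delegation described above can be sketched in plain Perl. All
package names below (My::WidestPolicy, My::NarrowestPolicy,
My::FuzzyLocation) are hypothetical stand-ins, not Bioperl classes, and
the policy methods are an assumption made for illustration. The sketch
shows why min_start() and its siblings must not call start() or end():
the policy calls them, so calling back would recurse.

```perl
use strict;
use warnings;

# Hypothetical policy classes; each answers start()/end() for a location.
package My::WidestPolicy;
sub new   { return bless {}, shift }
sub start { my ($self, $loc) = @_; return $loc->min_start }
sub end   { my ($self, $loc) = @_; return $loc->max_end }

package My::NarrowestPolicy;
sub new   { return bless {}, shift }
sub start { my ($self, $loc) = @_; return $loc->max_start }
sub end   { my ($self, $loc) = @_; return $loc->min_end }

# Hypothetical fuzzy location that delegates 'get' to its policy.
package My::FuzzyLocation;
sub new {
    my ($class, %args) = @_;
    my $self = bless { %args }, $class;
    $self->{policy} ||= My::WidestPolicy->new;   # default policy
    return $self;
}
sub min_start { $_[0]{min_start} }
sub max_start { $_[0]{max_start} }
sub min_end   { $_[0]{min_end} }
sub max_end   { $_[0]{max_end} }
sub coordinate_policy {
    my ($self, $policy) = @_;
    $self->{policy} = $policy if defined $policy;
    return $self->{policy};
}
# min_*/max_* never call start()/end(), so delegation cannot recurse.
sub start { my $self = shift; return $self->coordinate_policy->start($self) }
sub end   { my $self = shift; return $self->coordinate_policy->end($self) }

package main;

my $loc = My::FuzzyLocation->new(min_start => 5,  max_start => 10,
                                 min_end   => 90, max_end   => 100);
printf "widest:    %d..%d\n", $loc->start, $loc->end;
$loc->coordinate_policy(My::NarrowestPolicy->new);
printf "narrowest: %d..%d\n", $loc->start, $loc->end;
```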

to_FTstring
-----------

     Title   : to_FTstring
     Usage   : my $locstr = $location->to_FTstring()
     Function: returns the FeatureTable string of this location
     Returns : string
     Args    : none


File: pm.info,  Node: Bio/Parse,  Next: Bio/PrimarySeq,  Prev: Bio/LocationI,  Up: Module List

The Bioperl ReadSeq interface
*****************************

NAME
====

   Seq::Parse - The Bioperl ReadSeq interface

SYNOPSIS
========

   Simple perl interface/wrapper to D.G. Gilbert's ReadSeq program.  Used
by Seq.pm when internal parsing/formatting code fails.

   **NOTE** Not currently used by any of the core bioperl modules.  It can
be used as a standalone interface to the readseq package, but manual
editing of Parse.pm is required. See the first few lines of the .pm file
for details.

DESCRIPTION
===========

   This package was called upon by Seq.pm when internal attempts to format
or parse a sequence failed. It is currently not used by any bioperl core
module. Basically we decided to deal with sequence formatting in a
different way.

   Parse.pm is a simple interface to D.G. Gilbert's ReadSeq program; it is
not meant to be particularly elegant or efficient. The interface should be
abstract enough to allow future versions to seamlessly access other
sequence conversion programs besides ReadSeq.

   At this time the interface methods have not been fully thought out or
implemented. Suggestions are welcome.

   If ReadSeq is not on the local system, or this package is not properly
configured, Seq.pm will (hopefully) realize this and not attempt to use
this code.

USAGE
=====

   The ReadSeq executable needs to be installed on your system.

   Readseq is freely distributed and is available in shell archive (.shar)
form via FTP from ftp.bio.indiana.edu (129.79.224.25) in the
molbio/readseq directory.  (URL) ftp://ftp.bio.indiana.edu/molbio/readseq/

Standalone
----------

     use Parse;

With Seq.pm
-----------

   If properly configured, Seq.pm will automatically use this module when
internal attempts at parsing or formatting fail.

   The correct path to the readseq executable is configured into this
module during the 'perl Makefile.PL' phase of installation.

   Manual edits needed in Parse.pm if auto-configuration does not happen:

   - Change the value of *$READSEQ_PATH* so that it defines a path to the
ReadSeq executable on your system.

   - Uncomment the line(s) containing $OK = "Y"

As a standalone module
----------------------

   Parse.pm should be usable as a standalone module. See the usage
instructions.

Sequence Conversion/Formatting
------------------------------

   ReadSeq has trouble with raw sequences so an explicit
convert_from_raw() method has been written.  The following code will
return the sequence "GAATTCGTT" as a GCG formatted string.

     $reply  = &Parse::convert_from_raw(-sequence=>'GAATTCGTT',
                                        -fmt=>'gcg');

   The "fmt" named-parameter field can be set for the following formats:

     IG        (or 'Stanford')
     GenBank   (or 'GB')
     NBRF
     EMBL
     GCG
     Strider
     Fitch
     Fasta
     Zuker
     Phylip3.2 (use 'Phylip3')
     Phylip
     Plain     (or 'Raw')
     PIR       (or 'CODATA')
     MSF
     ASN.1     (use 'ASN1')
     PAUP
     Pretty

   The "options" named-parameter field can be used to pass switches
directly to the ReadSeq executable. This option should only be used by
people familiar with operating ReadSeq on the command-line. Use at your own
risk as this has not been fully tested.

   As an example, the ReadSeq switch -c will cause all of the characters
in the formatted sequence to be returned in lowercase.

     $reply  = &Parse::convert_from_raw(-sequence=>"$seq_string",
                                        -options=>'-c',
                                        -fmt=>'gcg');

Appendix
========

   The following documentation describes the various functions contained
in this package. Some functions are for internal use and are not meant to
be called by the user; they are preceded by an underscore ("_").

_rearrange()
------------

     Title     : _rearrange
      Usage     : n/a (internal function)
      Function  : Rearranges named parameters to requested order.
      Example   : &_rearrange([SEQUENCE,ID,DESC],@p);
      Returns   : @params - an array of parameters in the requested order.
      Argument  : $order : a reference to an array which describes the desired
                           order of the named parameters.
                  @param : an array of parameters, either as a list (in
                           which case the function simply returns the list),
                           or as an associative array (in which case the
                           function sorts the values according to @{$order}
                           and returns that new array).
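
   The behaviour described for _rearrange() can be approximated as follows.
This is a hypothetical re-implementation written for illustration, not
the code in Parse.pm; the name rearrange_sketch() and the rule that named
parameters are recognised by a leading dash are assumptions.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration; not the code in Parse.pm.
sub rearrange_sketch {
    my ($order, @param) = @_;
    # A plain positional list is returned unchanged.
    return @param unless @param && defined $param[0] && $param[0] =~ /^-/;

    # Otherwise treat @param as -name => value pairs; names are
    # compared case-insensitively with the leading dash removed.
    my %named;
    while (@param) {
        my ($key, $value) = splice(@param, 0, 2);
        (my $name = uc $key) =~ s/^-//;
        $named{$name} = $value;
    }
    return map { $named{ uc $_ } } @$order;
}

my ($seq, $id, $desc) = rearrange_sketch(
    [qw(SEQUENCE ID DESC)],
    -id => 'frag1', -sequence => 'GAATTCGTT',
);
print "seq=$seq id=$id desc=", (defined $desc ? $desc : 'undef'), "\n";
```

   Parameters that were not supplied (DESC in this call) come back as
undef, so callers should test for definedness before use.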

_write_tmp_file()
-----------------

     Title     : _write_tmp_file
      Usage     : n/a (internal function)
      Function  : Writes a temporary file to disk. Uses
                : the POSIX tmpnam() call to get path &
                : filename. Should be more portable than
                : just writing to /tmp.
                :
      Example   : &_write_tmp_file("$formatted_sequence");
      Returns   : string containing the temp file path
      Argument  : string that is to be written to disk
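
   A comparable helper can be sketched with the core File::Temp module,
which is swapped in here for the POSIX tmpnam() call named above
(tmpnam() is no longer provided by the POSIX module in recent Perl
releases). The function name write_tmp_file_sketch() and the 'parseXXXXXX'
file name template are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for _write_tmp_file(): write a string to a fresh
# temporary file and return its path. The caller removes the file.
sub write_tmp_file_sketch {
    my ($content) = @_;
    my ($fh, $path) = tempfile('parseXXXXXX', TMPDIR => 1, UNLINK => 0);
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $path = write_tmp_file_sketch(">seq1\nGAATTCGTT\n");
print "wrote ", (-s $path), " bytes to $path\n";
unlink $path;
```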

version()
---------

     Title     : version
      Usage     : &Parse::version;
      Function  : Prints current package version
      Example   : &Parse::version;
      Returns   : none
      Argument  : none
                :

convert_from_raw()
------------------

     Title     : convert_from_raw()
      Usage     : print &Parse::convert_from_raw(-sequence=>$raw_seq,
                :                                -fmt=>'asn1');
                :
                : $reply  = &Parse::convert_from_raw(-sequence=>'GAATTCGTT',
                :                                    -options=>'-c',
                :                                    -fmt=>'gcg');
                :
      Function  : ReadSeq does not function well when called upon
                : to read or convert "raw" or unformatted sequence
                : strings or files. This code will take a given
                : raw sequence and manipulate it into FASTA
                : format before invoking ReadSeq.
                :
                : The following named parameters may be used as
                : arguments:
                :
                :  -sequence=>     Sequence string.
                :  -fmt=>          Format sequence will be converted to.
                :  -options=>      String containing command-line
                :                  switches for ReadSeq. Passed
                :                  directly.
                :
      Example   : see usage
      Returns   : Formatted sequence string
      Argument  : named parameters, see function
                :
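
   The raw-to-FASTA step mentioned under Function can be pictured as
follows. This is only a sketch of the idea, not the code used by
convert_from_raw(); the helper name raw_to_fasta_sketch(), the default
identifier and the 60-column line width are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a raw sequence string as a FASTA record.
# The identifier and the 60-column line width are illustrative choices.
sub raw_to_fasta_sketch {
    my ($raw, $id, $width) = @_;
    $id    = 'raw_sequence' unless defined $id;
    $width = 60             unless defined $width;
    (my $seq = $raw) =~ s/\s+//g;           # drop whitespace and newlines
    my $fasta = ">$id\n";
    # Emit the sequence in chunks of at most $width characters.
    $fasta .= "$1\n" while $seq =~ /(.{1,$width})/g;
    return $fasta;
}

print raw_to_fasta_sketch("GAATTC GTT\n", 'frag1');
```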

convert()
---------

     Title     : convert
                :
      Usage     : print &Parse::convert(-sequence=>$raw_seq,
                :                       -fmt=>'asn1');
                :
                : $reply  = &Parse::convert(-sequence=>'GAATTCGTT',
                :                           -options=>'-c',
                :                           -fmt=>'gcg');
                :
                : $reply  = &Parse::convert(-location=>'/tmp/a.seq',
                :                           -fmt=>'raw');
                :
      Note      : ReadSeq does not function well when called upon
                : to read or convert "raw" or unformatted sequence
                : strings or files. User beware.
                :
      Function  : Will read/parse a given sequence string *OR* a given
                : sequence file.
                :
                : If a sequence string AND a sequence file path are
                : both passed in, the file path will be used with no
                : complaint.
                :
                : The following named parameters may be used as
                : arguments:
                :
                :  -sequence=>     Sequence string.
                :  -location=>     Sequence file path.
                :  -fmt=>          Format sequence will be converted to.
                :  -options=>      String containing command-line
                :                  switches for ReadSeq. Passed
                :                  directly.
                :
      Example   : see usage
      Returns   : Formatted sequence string
      Argument  : named parameters, see function
                :

ACKNOWLEDGEMENTS
================

SEE ALSO
========

     Core bioperl modules

REFERENCES
==========

   Bioperl Project http://bio.perl.org

COPYRIGHT
=========

   Copyright (c) 1997-1998 Chris Dagdigian, Georg Fuellen, Steven E.
Brenner and others. All Rights Reserved.  This module is free software;
you can redistribute it and/or modify it under the same terms as Perl
itself.


File: pm.info,  Node: Bio/PrimarySeq,  Next: Bio/PrimarySeqI,  Prev: Bio/Parse,  Up: Module List

Bioperl lightweight Sequence Object
***********************************

NAME
====

   Bio::PrimarySeq - Bioperl lightweight Sequence Object

SYNOPSIS
========

     # The Bio::SeqIO for file reading, Bio::DB::GenBank for
     # database reading
     use Bio::Seq;
     use Bio::SeqIO;
     use Bio::DB::GenBank;

     #make from memory
     $seqobj = Bio::PrimarySeq->new ( -seq => 'ATGGGGTGGGCGGTGGGTGGTTTG',
     			    -id  => 'GeneFragment-12',
     			    -accession_number => 'X78121',
     			    -moltype => 'dna'
     			    );

     # read from file
     $inputstream = Bio::SeqIO->new(-file => "myseq.fa",-format => 'Fasta');
     $seqobj = $inputstream->next_seq();

     # get from database
     $db = Bio::DB::GenBank->new();
     $seqobj = $db->get_Seq_by_acc('X78121');

     # to get out parts of the sequence.

     print "Sequence ", $seqobj->id(), " with accession ", $seqobj->accession, " and desc ", $seqobj->desc, "\n";

     $string  = $seqobj->seq();
     $string2 = $seqobj->subseq(1,40);

DESCRIPTION
===========

   PrimarySeq is a lightweight Sequence object, storing little more than
the sequence, its name, and a computer-friendly unique name. It does not
contain sequence features or other information.  To have a sequence with
sequence features you should use the Seq object, which uses this object.

   Sequence objects are defined by the Bio::PrimarySeqI interface, and this
object is a pure Perl implementation of the interface (if that's gibberish
to you, don't worry. The take home message is that this object is the
bioperl default sequence object, but other people can use their own
objects as sequences if they so wish). If you are interested in wrapping
your own objects as compliant Bioperl sequence objects, then you should
read the Bio::PrimarySeqI documentation.

   The documentation of this object is a merge of the Bio::PrimarySeq and
Bio::PrimarySeqI documentation, so that all the methods you can call on
sequence objects are described here.

Reimplementation
================

   The Sequence object was completely rewritten for the 0.6 series. This
was because the old Sequence object was becoming heavily bloated and
difficult to maintain. There are some key changes from the old object to
the new object, but basically, everything should work with the new object
with a minimal number of changes.

   The key change is that the format IO has been removed from this object
and moved to the Bio::SeqIO system, which provides a much better way to
encapsulate the sequence format reading. Please read the SeqIO
documentation, but the take home message is that lines like

     # old style reading from files
     $seq = Bio::Seq->new( -file => "myfile");

   become

     # new style reading from files.
     $inputstream = Bio::SeqIO->new( -file => "myfile", -format => 'Fasta');
     $seqobj = $inputstream->next_seq();

   For writing files, a similar system is used:

     # old style writing to files
     print OUTPUT $seq->layout_fasta;

     # new style writing to files
     $outputstream = Bio::SeqIO->new( -fh => \*OUTPUT, -format => 'Fasta');
     $outputstream->write_seq($seqobj);

Deprecated methods
------------------

   A number of methods which were present in the old 0.04/0.05 series have
been deprecated.  Most of these methods work as before, but provide a
warning that someone has called a deprecated method.

     getseq - use seq/subseq instead
     str    - use seq/subseq instead
     ary    - use seq/subseq with your own split afterwards
     type   - use moltype, but notice that moltype returns different
              values (lowercase)

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email
or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Ewan Birney
====================

   Email birney@sanger.ac.uk

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

new
---

     Title   : new
     Usage   : $seq    = Bio::PrimarySeq->new( -seq => 'ATGGGGGTGGTGGTACCCT',
                                               -id  => 'human_id',
     					   -accession_number => 'AL000012',
     					   );

     Function: Returns a new primary seq object from
               basic constructors, being a string for the sequence
               and strings for id and accession_number.

     Note that you can provide an empty sequence string. However, in
     this case you MUST specify the type of sequence you wish to
     initialize by the parameter -moltype. See moltype() for possible
     values.
      Returns : a new Bio::PrimarySeq object

seq
---

     Title   : seq
     Usage   : $string    = $obj->seq()
     Function: Returns the sequence as a string of letters. The
               case of the letters is left up to the implementer.
               Suggested cases are upper case for proteins and lower case for
               DNA sequence (IUPAC standard), but you should not rely on this.
     Returns : A scalar

subseq
------

     Title   : subseq
     Usage   : $substring = $obj->subseq(10,40);
     Function: Returns the subsequence from start to end, where the first
               base is 1 and the range is inclusive, i.e. 1-2 are the first
               two bases of the sequence.
     Returns : a string
     Args    : start and end positions (integers)
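
   The 1-based, inclusive coordinates map onto Perl's 0-based substr() as
shown in this sketch. The helper subseq_sketch() is hypothetical and only
demonstrates the arithmetic; the bounds checks are assumptions about
sensible behaviour, not a description of what Bio::PrimarySeq does.

```perl
use strict;
use warnings;

# Hypothetical helper showing the coordinate arithmetic only.
sub subseq_sketch {
    my ($seq, $start, $end) = @_;
    die "start must be >= 1\n"           if $start < 1;
    die "end must be >= start\n"         if $end < $start;
    die "end is past the sequence end\n" if $end > length $seq;
    # 1-based inclusive (start, end) -> 0-based offset and length
    return substr($seq, $start - 1, $end - $start + 1);
}

my $seq = 'ATGGGGTGGGCGGTGGGTGGTTTG';
print subseq_sketch($seq, 1, 2), "\n";   # first two bases
print subseq_sketch($seq, 4, 4), "\n";   # a single base
```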

length
------

     Title   : length
     Usage   : $len = $seq->length()
     Function:
     Example :
     Returns : integer representing the length of the sequence.
     Args    :

display_id
----------

     Title   : display_id
     Usage   : $id_string = $obj->display_id();
     Function: returns the display id, aka the common name of the Sequence object.

     The semantics of this is that it is the most likely string to be
     used as an identifier of the sequence, and likely to have "human" readability.
     The id is equivalent to the ID field of the GenBank/EMBL databanks and
     the id field of the Swissprot/sptrembl database. In fasta format, the >(\S+)
     is presumed to be the id, though some people overload the id to embed other
     information. Bioperl does not use any embedded information in the ID field,
     and people are encouraged to use other mechanisms (accession field for example,
     or extending the sequence object) to solve this.

     Returns : A string
     Args    : None

accession_number
----------------

     Title   : accession_number
     Usage   : $unique_key = $obj->accession_number;
     Function: Returns the unique biological id for a sequence, commonly
               called the accession_number. For sequences from established
               databases, the implementors should try to use the correct
               accession number. Notice that primary_id() provides the
               unique id for the implementation, allowing multiple objects
               to have the same accession number in a particular implementation.

     For sequences with no accession number, this method should return
     "unknown".
      Returns : A string
      Args    : A string (optional) for setting

primary_id
----------

     Title   : primary_id
     Usage   : $unique_key = $obj->primary_id;
     Function: Returns the unique id for this object in this
               implementation. This allows implementations to manage
               their own object ids in a way the implementation can control;
               clients can expect one id to map to one object.

     For sequences with no natural primary id, this method should return
     a stringified memory location.
      Returns : A string
      Args    : A string (optional, for setting)

moltype
-------

     Title   : moltype
     Usage   : if( $obj->moltype eq 'dna' ) { /Do Something/ }
     Function: Returns the type of sequence being one of
               'dna', 'rna' or 'protein'. This is case sensitive.

     This is not called <type> because this would cause
     upgrade problems from the 0.5 and earlier Seq objects.

     Returns : a string either 'dna','rna','protein'. NB - the object must
               make a call of the type - if there is no type specified it
               has to guess.
     Args    : none

desc
----

     Title   : desc
     Usage   : $obj->desc($newval)
     Function: Get/set description of the sequence.
     Example :
     Returns : value of desc
     Args    : newvalue (optional)

can_call_new
------------

     Title   : can_call_new
     Usage   :
     Function:
     Example :
     Returns :
     Args    :

id
--

     Title   : id
     Usage   : $id = $seq->id()
     Function: This is mapped on display_id
     Example :
     Returns :
     Args    :

Methods Inherited from Bio::PrimarySeqI
=======================================

   These methods are available on Bio::PrimarySeq, although they are
actually implemented on Bio::PrimarySeqI.

revcom
------

     Title   : revcom
     Usage   : $rev = $seq->revcom()
     Function: Produces a new Bio::SeqI implementing object which
               is the reversed complement of the sequence. For protein
               sequences this throws an exception of
               "Sequence is a protein. Cannot revcom"

     The id is the same id as the original sequence, and the
     accession number is also identical. If someone wants to
     track that this sequence has been reversed, it needs to
     define its own extensions.

     To do an in-place edit of an object you can write:

     $seqobj = $seqobj->revcom();

     This, of course, causes Perl to handle the garbage
     collection of the old object, but it is roughly as
     efficient as an in-place edit.

     Returns : A new (fresh) Bio::SeqI object
     Args    : none
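
   At the string level, reverse complementing amounts to the following
sketch. The helper revcom_sketch() is hypothetical and works on plain
strings rather than sequence objects; it is not the implementation in
Bio::PrimarySeqI and it handles only the unambiguous bases.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement a DNA string, preserving case.
# Only A, C, G and T are translated; any other symbol is left unchanged.
sub revcom_sketch {
    my ($dna) = @_;
    my $rev = reverse $dna;            # scalar context reverses the string
    $rev =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rev;
}

my $seq = 'ATGGGGTGGGCG';
print revcom_sketch($seq), "\n";
```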

trunc
-----

     Title   : trunc
     Usage   : $subseq = $myseq->trunc(10,100);
     Function: Provides a truncation of a sequence.

     Example :
     Returns : a fresh Bio::SeqI implementing object
     Args    :

Internal methods
================

   These are internal methods to PrimarySeq

_guess_type
-----------

     Title   : _guess_type
     Usage   :
     Function:
     Example :
     Returns :
     Args    :


File: pm.info,  Node: Bio/PrimarySeqI,  Next: Bio/Range,  Prev: Bio/PrimarySeq,  Up: Module List

Interface definition for a Bio::PrimarySeq
******************************************

NAME
====

   Bio::PrimarySeqI - Interface definition for a Bio::PrimarySeq

SYNOPSIS
========

     # get a Bio::PrimarySeqI compliant object somehow

     # to test this is a seq object

     $obj->isa("Bio::PrimarySeqI") || $obj->throw("$obj does not implement the Bio::PrimarySeqI interface");

     # accessors

     $string    = $obj->seq();
     $substring = $obj->subseq(12,50);
     $display   = $obj->display_id(); # for human display
     $id        = $obj->primary_id(); # unique id for this object, implementation defined
     $unique_key= $obj->accession_number();
                        # unique biological id

     # object manipulation

     eval {
     	$rev    = $obj->revcom();
     };
     if( $@ ) {
     	$obj->throw("Could not reverse complement. Probably not DNA. Actual exception\n$@\n");
     }

     $trunc = $obj->trunc(12,50);

     # $rev and $trunc are Bio::PrimarySeqI compliant objects

DESCRIPTION
===========

   This object defines an abstract interface to basic sequence
information. PrimarySeq is an object just for the sequence and its
name(s), nothing more. Seq is the larger object complete with features.
There is a pure perl implementation of this in Bio::PrimarySeq. If you
just want to use Bio::PrimarySeq objects, then please read that module
first. This module defines the interface, and is of more interest to
people who want to wrap their own Perl objects, RDBs, file systems, etc. in
such a way that they "are" bioperl sequence objects, even though they do
not use Perl to store the sequence.

   This interface defines what bioperl considers necessary to "be" a
sequence, without providing an implementation of this. (An implementation
is provided in Bio::PrimarySeq.) If you want to provide a Bio::PrimarySeq
'compliant' object which in fact wraps another object/database/out-of-perl
experience, then this is the correct thing to wrap, generally by providing
a wrapper class which would inherit from your object and this
Bio::PrimarySeqI interface. The wrapper class would then have the methods
listed in the "Implementation Specific Functions" section, which would
provide these methods for your object.

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email
or the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Ewan Birney
====================

   Email birney@sanger.ac.uk

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

Implementation Specific Functions
=================================

   These functions are the ones that a specific implementation must define.

seq
---

     Title   : seq
     Usage   : $string    = $obj->seq()
     Function: Returns the sequence as a string of letters. The
               case of the letters is left up to the implementer.
               Suggested cases are upper case for proteins and lower case for
               DNA sequence (IUPAC standard),
               but implementations are suggested to keep an open mind about
               case (some users... want mixed case!)
     Returns : A scalar
     Status  : Virtual

subseq
------

     Title   : subseq
     Usage   : $substring = $obj->subseq(10,40);
     Function: Returns the subsequence from start to end, where the first
               base is 1 and the range is inclusive, i.e. 1-2 are the first
               two bases of the sequence.

     Start cannot be larger than end, but the two can be equal.

     Returns : a string
     Args    :
     Status  : Virtual

display_id
----------

     Title   : display_id
     Usage   : $id_string = $obj->display_id();
     Function: returns the display id, aka the common name of the Sequence object.

     The semantics of this is that it is the most likely string
     to be used as an identifier of the sequence, and likely to
     have "human" readability.  The id is equivalent to the ID
     field of the GenBank/EMBL databanks and the id field of the
     Swissprot/sptrembl database. In fasta format, the >(\S+) is
     presumed to be the id, though some people overload the id
     to embed other information. Bioperl does not use any
     embedded information in the ID field, and people are
     encouraged to use other mechanisms (accession field for
     example, or extending the sequence object) to solve this.

     Notice that $seq->id() maps to this function, mainly for
     legacy/convenience reasons.
      Returns : A string
      Args    : None
      Status  : Virtual

accession_number
----------------

     Title   : accession_number
     Usage   : $unique_biological_key = $obj->accession_number;
     Function: Returns the unique biological id for a sequence, commonly
               called the accession_number. For sequences from established
               databases, the implementors should try to use the correct
               accession number. Notice that primary_id() provides the
               unique id for the implementation, allowing multiple objects
               to have the same accession number in a particular implementation.

     For sequences with no accession number, this method should return
     "unknown".
      Returns : A string
      Args    : None
      Status  : Virtual

primary_id
----------

     Title   : primary_id
     Usage   : $unique_implementation_key = $obj->primary_id;
     Function: Returns the unique id for this object in this
               implementation. This allows implementations to manage
               their own object ids in a way the implementation can
               control; clients can expect one id to map to one object.

     For sequences with no accession number, this method should return
     a stringified memory location.
      Returns : A string
      Args    : None
      Status  : Virtual

can_call_new
------------

     Title   : can_call_new
     Usage   : if( $obj->can_call_new ) {
                 $newobj = $obj->new( %param );
               }
     Function: can_call_new returns 1 or 0 depending on whether an
               implementation allows the new constructor to be called.
               If a new constructor is allowed, then it should take the
               following hash-style constructor list.

     $myobject->new( -seq              => $sequence_as_string,
                     -display_id       => $id,
                     -accession_number => $accession,
                     -moltype          => 'dna',
                     );
      Example :
      Returns : 1 or 0
      Args    :

moltype
-------

     Title   : moltype
     Usage   : if( $obj->moltype eq 'dna' ) { /Do Something/ }
     Function: Returns the type of sequence being one of
               'dna', 'rna' or 'protein'. This is case sensitive.

     This is not called <type> because this would cause
     upgrade problems from the 0.5 and earlier Seq objects.

     Returns : a string either 'dna','rna','protein'. NB - the object must
               make a call of the type - if there is no type specified it
               has to guess.
     Args    : none
     Status  : Virtual

Optional Implementation Functions
=================================

   The following functions rely on the above functions. An implementing
class does not need to provide these functions, as they will be provided
by this class, but it is free to override them.

   All of revcom(), trunc(), and translate() create new sequence objects.
They will call new() on the class of the sequence object instance passed
as argument, unless can_call_new() returns FALSE. In the latter case a
Bio::PrimarySeq object will be created. Implementors who really want to
control how objects are created (e.g., for object persistence over a
database, or objects in a CORBA framework) are encouraged to override
these methods.

revcom
------

     Title   : revcom
     Usage   : $rev = $seq->revcom()
     Function: Produces a new Bio::PrimarySeqI implementing object which
               is the reverse complement of the sequence. For protein
               sequences this throws the exception
               "Sequence is a protein. Cannot revcom".

     The id is the same id as the original sequence, and the accession number
     is also identical. If someone wants to track that this sequence has been
     reversed, they need to define their own extensions.

     To do an in-place edit of an object you can write:

     $seq = $seq->revcom();

     This, of course, causes Perl to handle the garbage collection of the old
     object, but it is roughly as efficient as an in-place edit.

     Returns : A new (fresh) Bio::PrimarySeqI object
     Args    : none
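
   To make the term "reverse complement" concrete, here is a minimal
sketch on a plain string rather than on a sequence object. The helper
name revcom_sketch is hypothetical, and the sketch handles only the four
unambiguous bases; the IUPAC ambiguity codes a real implementation would
need to deal with are left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's method): reverse complement of a
# plain DNA string, preserving case. Only a, c, g, t are handled here.
sub revcom_sketch {
    my ($seq) = @_;
    my $rev = reverse $seq;          # scalar context reverses the string
    $rev =~ tr/acgtACGT/tgcaTGCA/;   # complement each base
    return $rev;
}

my $seq = 'aaccgt';
$seq = revcom_sketch($seq);          # same "replace the old value" idiom
print "$seq\n";                      # prints "acggtt"
```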

trunc
-----

     Title   : trunc
     Usage   : $subseq = $myseq->trunc(10,100);
     Function: Provides a truncation of a sequence.

     Example :
     Returns : a fresh Bio::PrimarySeqI implementing object
     Args    : Two integers denoting first and last base of the sub-sequence.

translate
---------

     Title   : translate
     Usage   : $protein_seq_obj = $dna_seq_obj->translate
               #if full CDS expected:
               $protein_seq_obj = $cds_seq_obj->translate(undef,undef,undef,undef,1);
     Function:

     Provides the translation of the DNA sequence using full
     IUPAC ambiguities in DNA/RNA and amino acid codes.

     The full CDS translation is identical to EMBL/TREMBL
     database translation. Note that the trailing terminator
     character is removed before returning the translation
     object.

     Note: if you set $dna_seq_obj->verbose(1) you will get a
     warning if the first codon is not a valid initiator.

     Returns : A Bio::PrimarySeqI implementing object
     Args    : character for terminator (optional) defaults to '*'
               character for unknown amino acid (optional) defaults to 'X'
               frame (optional) valid values 0, 1, 2, defaults to 0
               codon table id (optional) defaults to 1
               complete coding sequence expected, defaults to 0 (false)
               boolean, throw exception if not complete CDS (true) or defaults to warning (false)

id
--

     Title   : id
     Usage   : $id = $seq->id()
     Function: ID of the sequence. This should normally be (and actually is in
               the implementation provided here) just a synonym for display_id().
     Example :
     Returns : A string.
     Args    :

length
------

     Title   : length
     Usage   : $len = $seq->length()
     Function:
     Example :
     Returns : integer representing the length of the sequence.
     Args    :

desc
----

     Title   : desc
     Usage   : $seq->desc($newval);
               $description = $seq->desc();
     Function: Get/set description text for a seq object
     Example :
     Returns : value of desc
     Args    : newvalue (optional)

Private functions
=================

   These are some private functions for the PrimarySeqI interface. You do
not need to implement these functions

_attempt_to_load_Seq
--------------------

     Title   : _attempt_to_load_Seq
     Usage   :
     Function:
     Example :
     Returns :
     Args    :


File: pm.info,  Node: Bio/Range,  Next: Bio/RangeI,  Prev: Bio/PrimarySeqI,  Up: Module List

Pure perl RangeI implementation
*******************************

NAME
====

   Bio::Range - Pure perl RangeI implementation

DESCRIPTION
===========

   This provides a pure perl implementation of the BioPerl range interface.

   Ranges are modeled as having (start, end, length, strand). They use
Bio-coordinates - all points >= start and <= end are within the range. End
is always greater than or equal to start, and length is greater than or
equal to 1. The behaviour of a range is undefined if ranges with negative
numbers or zero are used.

   So, in summary:

     length = end - start + 1
     end >= start
     strand = (-1 | 0 | +1)

SYNOPSIS
========

     use Bio::Range;

     $range = Bio::Range->new(-start=>10, -end=>30, -strand=>+1);
     $r2    = Bio::Range->new(-start=>15, -end=>200, -strand=>+1);

     print join(', ', $range->union($r2)), "\n";
     print join(', ', $range->intersection($r2)), "\n";
     print $range->overlaps($r2), "\n";
     print $range->contains($r2), "\n";

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Heikki Lehvaslaiho
===========================

   Email heikki@ebi.ac.uk

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

Constructors
============

new
---

     Title   : new
     Usage   : $range = Bio::Range->new(-start => 100, -end => 200, -strand => +1);
     Function: generates a new Bio::Range
     Returns : a new range
     Args    : two of (-start, -end, '-length') - the third is calculated
             : -strand (defaults to 0)
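
   The rule that any two of start, end and length determine the third
follows from length = end - start + 1. The sketch below works through
that arithmetic on plain numbers. The function name complete_range and
its argument names are hypothetical and chosen for this example only; it
is not how Bio::Range itself is written.

```perl
use strict;
use warnings;

# Hypothetical helper: given any two of start, end and length, compute
# the third from length = end - start + 1.
sub complete_range {
    my (%arg) = @_;
    my ($start, $end, $length) = @arg{qw(start end length)};
    my $given = grep { defined } $start, $end, $length;
    die "need at least two of start, end, length\n" if $given < 2;

    $length = $end - $start + 1    unless defined $length;
    $end    = $start + $length - 1 unless defined $end;
    $start  = $end - $length + 1   unless defined $start;

    die "end must be >= start\n" if $end < $start;
    return ($start, $end, $length);
}

print join(', ', complete_range(start => 100, end    => 200)), "\n";
print join(', ', complete_range(start => 100, length => 101)), "\n";
print join(', ', complete_range(end   => 200, length => 101)), "\n";
```

   All three calls describe the same range, so each prints 100, 200, 101.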

Member variable access
======================

   These methods let you get at and set the member variables

start
-----

     Title    : start
     Function : return or set the start co-ordinate
     Example  : $s = $range->start(); $range->start(7);
     Returns  : the value of the start co-ordinate
     Args     : optionally, the new start co-ordinate
     Overrides: Bio::RangeI::start

end
---

     Title    : end
     Function : return or set the end co-ordinate
     Example  : $e = $range->end(); $range->end(2000);
     Returns  : the value of the end co-ordinate
     Args     : optionally, the new end co-ordinate
     Overrides: Bio::RangeI::end

strand
------

     Title    : strand
     Function : return or set the strandedness
     Example  : $st = $range->strand(); $range->strand(-1);
     Returns  : the value of the strandedness (-1, 0 or 1)
     Args     : optionally, the new strand - (-1, 0, 1) or (-, ., +).
     Overrides: Bio::RangeI::strand

length
------

     Title    : length
     Function : returns the length of this range
     Example  : $length = $range->length();
     Returns  : the length of this range, equal to end - start + 1
     Args     : if you attempt to set the length, an exception will be thrown
     Overrides: Bio::RangeI::length

toString
--------

     Title   : toString
     Function: stringifies this range
     Example : print $range->toString(), "\n";
     Returns : a string representation of this range

Boolean Methods
===============

   These methods return true or false.

     $range->overlaps($otherRange) && print "Ranges overlap\n";

overlaps
--------

     Title    : overlaps
     Usage    : if($r1->overlaps($r2)) { do stuff }
     Function : tests if $r2 overlaps $r1
     Args     : a range to test for overlap with
     Returns  : true if the ranges overlap, false otherwise
     Inherited: Bio::RangeI
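
   Because both ends of a range are inclusive, two ranges overlap exactly
when each one starts no later than the other one ends. The sketch below
shows that test on plain [start, end] pairs. The function name
overlaps_sketch is hypothetical, and strand is ignored here.

```perl
use strict;
use warnings;

# Hypothetical helper: overlap test for two closed intervals, each given
# as an array reference [start, end]. Strand is not considered.
sub overlaps_sketch {
    my ($r1, $r2) = @_;
    return ($r1->[0] <= $r2->[1] && $r2->[0] <= $r1->[1]) ? 1 : 0;
}

print overlaps_sketch([10, 30], [15, 200]), "\n";   # 1
print overlaps_sketch([10, 30], [30, 40]),  "\n";   # 1, they share point 30
print overlaps_sketch([10, 30], [31, 40]),  "\n";   # 0
```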

contains
--------

     Title    : contains
     Usage    : if($r1->contains($r2)) { do stuff }
     Function : tests whether $r1 totally contains $r2
     Args     : a range to test for being contained
     Returns  : true if the argument is totally contained within this range
     Inherited: Bio::RangeI
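
   Containment on inclusive coordinates reduces to two comparisons: the
inner range must start no earlier and end no later than the outer one. A
minimal sketch on plain [start, end] pairs follows. The name
contains_sketch is hypothetical, and strand is ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: true if closed interval $outer fully contains
# closed interval $inner. Both are array references [start, end].
sub contains_sketch {
    my ($outer, $inner) = @_;
    return ($outer->[0] <= $inner->[0] && $inner->[1] <= $outer->[1]) ? 1 : 0;
}

print contains_sketch([10, 200], [15, 30]),  "\n";   # 1
print contains_sketch([10, 30],  [15, 200]), "\n";   # 0
print contains_sketch([10, 30],  [10, 30]),  "\n";   # 1, a range contains itself
```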

equals
------

     Title    : equals
     Usage    : if($r1->equals($r2))
     Function : test whether $r1 has the same start, end, length as $r2
     Args     : a range to test for equality
     Returns  : true if they are describing the same range
     Inherited: Bio::RangeI

Geometrical methods
===================

   These methods do things to the geometry of ranges, and return triplets
(start, end, strand) from which new ranges could be built.

intersection
------------

     Title    : intersection
     Usage    : ($start, $stop, $strand) = $r1->intersection($r2)
     Function : gives the range that is contained by both ranges
     Args     : a range to compare this one to
     Returns  : nothing if they do not overlap, or the range that they do overlap
     Inherited: Bio::RangeI::intersection

union
-----

     Title    : union
     Usage    : ($start, $stop, $strand) = $r1->union($r2);
              : ($start, $stop, $strand) = Bio::Range->union(@ranges);
     Function : finds the minimal range that contains all of the ranges
     Args     : a range or list of ranges to find the union of
     Returns  : the range containing all of the ranges
     Inherited: Bio::RangeI::union
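
   The two geometrical operations can be sketched on plain [start, end]
pairs with min and max from the core List::Util module. The function
names below are hypothetical, and the strand element of the returned
triplet is deliberately left out, since how the strand of the result is
chosen is not described here.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: the part shared by two closed intervals, or an
# empty list if they do not overlap.
sub intersection_sketch {
    my ($r1, $r2) = @_;
    my $start = max($r1->[0], $r2->[0]);
    my $end   = min($r1->[1], $r2->[1]);
    return $start <= $end ? ($start, $end) : ();
}

# Hypothetical helper: the minimal interval that contains every interval
# passed in.
sub union_sketch {
    my @ranges = @_;
    return (min(map { $_->[0] } @ranges), max(map { $_->[1] } @ranges));
}

# The two ranges from the SYNOPSIS, as [start, end] pairs.
my ($r1, $r2) = ([10, 30], [15, 200]);
print join(', ', intersection_sketch($r1, $r2)), "\n";   # 15, 30
print join(', ', union_sketch($r1, $r2)), "\n";          # 10, 200
```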


File: pm.info,  Node: Bio/RangeI,  Next: Bio/Root/Err,  Prev: Bio/Range,  Up: Module List

Range interface
***************

NAME
====

   Bio::RangeI - Range interface

SYNOPSIS
========

   None.

DESCRIPTION
===========

   This provides a standard BioPerl range interface that should be
implemented by any object that wants to be treated as a range. This serves
purely as an abstract base class for implementers and cannot be
instantiated.

   Ranges are modeled as having (start, end, length, strand). They use
Bio-coordinates - all points >= start and <= end are within the range. End
is always greater than or equal to start, and length is greater than or
equal to 1. The behaviour of a range is undefined if ranges with negative
numbers or zero are used.

   So, in summary:

     length = end - start + 1
     end >= start
     strand = (-1 | 0 | +1)

FEEDBACK
========

Mailing Lists
-------------

   User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one of
the Bioperl mailing lists.  Your participation is much appreciated.

     bioperl-l@bioperl.org             - General discussion
     http://bio.perl.org/MailList.html - About the mailing lists

Reporting Bugs
--------------

   Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.  Bug reports can be submitted via email or
the web:

     bioperl-bugs@bio.perl.org
     http://bio.perl.org/bioperl-bugs/

AUTHOR - Heikki Lehvaslaiho
===========================

   Email:  heikki@ebi.ac.uk

APPENDIX
========

   The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

Abstract methods
================

   These methods must be implemented in all subclasses.

new
---

     Title   : new
     Function: confesses if you try to instantiate a RangeI
             : RangeI is an interface, so RangeI->new should never be called
             : To make a range, instantiate one of the implementing classes, e.g.
             : $range = Bio::Range->new(-start=>20, -end=>2000, -strand=>1)

start
-----

     Title   : start
     Usage   : $start = $range->start();
     Function: get/set the start of this range
     Returns : the start of this range
     Args    : optionally allows the start to be set
             : using $range->start($start)

end
---

     Title   : end
     Usage   : $end = $range->end();
     Function: get/set the end of this range
     Returns : the end of this range
     Args    : optionally allows the end to be set
             : using $range->end($end)

length
------

     Title   : length
     Usage   : $length = $range->length();
     Function: get/set the length of this range
     Returns : the length of this range
     Args    : optionally allows the length to be set
             : using $range->length($length)

strand
------

     Title   : strand
     Usage   : $strand = $range->strand();
     Function: get/set the strand of this range
     Returns : the strandedness (-1, 0, +1)
     Args    : optionally allows the strand to be set
             : using $range->strand($strand)

Boolean Methods
===============

   These methods return true or false. They throw an error if start and
end are not defined.

     $range->overlaps($otherRange) && print "Ranges overlap\n";

overlaps
--------

     Title   : overlaps
     Usage   : if($r1->overlaps($r2)) { do stuff }
     Function: tests if $r2 overlaps $r1
     Args    : arg #1 = a range to compare this one to (mandatory)
               arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
     Returns : true if the ranges overlap, false otherwise

contains
--------

     Title   : contains
     Usage   : if($r1->contains($r2)) { do stuff }
     Function: tests whether $r1 totally contains $r2
     Args    : arg #1 = a range to compare this one to (mandatory)
                        alternatively, integer scalar to test
               arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
     Returns : true if the argument is totally contained within this range

equals
------

     Title   : equals
     Usage   : if($r1->equals($r2))
     Function: test whether $r1 has the same start, end, length as $r2
     Args    : a range to test for equality
     Returns : true if they are describing the same range

Geometrical methods
===================

   These methods do things to the geometry of ranges, and return
Bio::RangeI compliant objects or triplets (start, stop, strand) from which
new ranges could be built.

intersection
------------

     Title   : intersection
     Usage   : ($start, $stop, $strand) = $r1->intersection($r2)
     Function: gives the range that is contained by both ranges
     Args    : arg #1 = a range to compare this one to (mandatory)
               arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
     Returns : undef if they do not overlap,
               or the range that they do overlap (in an object like the calling one)

union
-----

     Title   : union
     Usage   : ($start, $stop, $strand) = $r1->union($r2);
             : ($start, $stop, $strand) = Bio::RangeI->union(@ranges);
     Function: finds the minimal range that contains all of the ranges
     Args    : a range or list of ranges to find the union of
     Returns : the range object containing all of the ranges

overlap_extent
--------------

     Title   : overlap_extent
     Usage   : ($a_unique,$common,$b_unique) = $a->overlap_extent($b)
     Function: Provides actual amount of overlap between two different
               ranges.
     Example :
     Returns : array of values for
               - the amount unique to a
               - the amount common to both
               - the amount unique to b
     Args    :
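
   The three returned amounts can be worked out from the lengths of the
two ranges and the length of their common part. The sketch below does
this on plain [start, end] pairs. The function name overlap_extent_sketch
is hypothetical, strand is ignored, and this is an illustration of the
arithmetic rather than the module's own code.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: returns (unique to a, common to both, unique to b)
# for two closed intervals given as array references [start, end].
sub overlap_extent_sketch {
    my ($ra, $rb) = @_;
    my $len_a = $ra->[1] - $ra->[0] + 1;
    my $len_b = $rb->[1] - $rb->[0] + 1;
    # The common part has length 0 when the ranges do not overlap.
    my $common = max(0, min($ra->[1], $rb->[1]) - max($ra->[0], $rb->[0]) + 1);
    return ($len_a - $common, $common, $len_b - $common);
}

print join(', ', overlap_extent_sketch([10, 30], [15, 200])), "\n";   # 5, 16, 170
print join(', ', overlap_extent_sketch([1, 5],   [10, 20])),  "\n";   # 5, 0, 11
```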


