\documentclass{slides}

\usepackage{graphicx}
\usepackage{verbatim}

\begin{document}

\begin{slide}

{\large\bf Using perl for Bioinformatics}

{\bf Overview}

\begin{itemize}

\item
Starting perl and creating perl programs

\item
Variables

\item
Subroutines

\end{itemize}

\end{slide}

\begin{slide}

{\large\bf Basic {\tt emacs} usage}

{\bf Starting {\tt emacs}}

Start emacs with the commands

\begin{verbatim}
athena% add seven
athena% bemacs &
\end{verbatim}

Normally, it's called {\tt emacs}, but I've made a wrapper script for it
so that you don't have to worry about setting environment variables, and
so on.

{\bf Opening and editing files in emacs}

Type {\tt C-x C-f \~{}/.environment<RET>}.  That is: Hold down the
control key, and press and release {\tt x}, then {\tt f}.  Release the
control key, then type {\tt \~{}/.environment}, and press return.

\end{slide}

\begin{slide}

{\bf Files in emacs, continued}

Add these lines to the text in the resulting window:

\begin{verbatim}
add seven
PERL5LIB=/mit/seven/lib/site_perl/5.6.0
\end{verbatim}

{\bf Saving files}

To save the file, type {\tt C-x C-s}.  That is: Hold down the control
key, and press and release {\tt x}, then {\tt s}.

{\bf The {\tt \~{}/.environment} file}

Check that you modified the file correctly by opening up another {\tt
  xterm}, and typing 

\begin{verbatim}
athena% source ~/.environment && which pw
\end{verbatim}

If this doesn't result in any obvious error messages, it shouldn't be
necessary to type ``{\tt add seven}'' next time you log in.

\end{slide}

\begin{slide}

{\large\bf Customization for non-Athena machines}

If you have trouble working with perl on Athena machines, let me know,
and I'll straighten things out.  If you want to set up your personal
machine for 7.91, I'm afraid you're on your own.  I understand how
desirable such an arrangement can be, so here are a couple of
pointers, but expect a fair amount of work.

It's probably not worth it to try to set things up on Windows machines.
You could try installing {\tt ActivePerl} from \\ 
%
{\tt http://www.activestate.com/Products/ActivePerl/} and installing
Bioperl by hand from there, but I have no idea whether that'll work or
not.

\end{slide}

\begin{slide}

For Unix machines, you need perl version 5.6 or later.  Install
the packages in {\tt /mit/seven/src/bioperl/}

E.g.
\begin{verbatim}
tar zxf bioperl-1.2.tar.gz
cd bioperl-1.2
perl Makefile.PL
make install
\end{verbatim}

There are some more Bioperl installation notes in\\
{\tt /mit/seven/src/bioperl/README}

To get the emacs enhancements, put the code in {\tt
  /mit/seven/7.91/dotfiles/emacs} in your\\ 
{\tt \~{}/.emacs} file.

\end{slide}

\begin{slide}

{\large\bf More information about {\tt emacs}}

You're going to be using {\tt emacs} a lot.  It's best you get
comfortable with it as quickly as possible.  

{\bf Documentation commands}

\begin{tabular}{ll}
{\tt C-h t}             & {\tt emacs} tutorial (highly \\
                        & recommended.) \\
{\tt C-h i Info<RET>}   & Manual for {\tt emacs} \\
                        & documentation system. \\
{\tt C-h i Emacs<RET>}  & Manual for emacs. \\
{\tt C-h ?}             & All help commands.\\
\end{tabular}

{\bf Editor commands}

\begin{tabular}{ll}
{\tt C-g}     & Abort.\\
{\tt C-x u}   & Undo.\\
\end{tabular}

\end{slide}

\begin{slide}

{\large\bf First steps in perl programming}

In your terminal, make a directory for your perl programs like so:
\begin{verbatim}
athena% mkdir ~/7.91
\end{verbatim}

In {\tt emacs}, open up the file {\tt \~{}/7.91/hello.pl}, and put
this in it:
\begin{verbatim}
use strict;
print "hello, world!\n";
\end{verbatim}

Now, at the terminal, type this:
\begin{verbatim}
athena% cd ~/7.91
athena% pw hello.pl
\end{verbatim}

The resulting output will be ``{\tt hello, world!}''.

\end{slide}

\begin{slide}

{\large\bf Things to note.}

\begin{itemize}

\item
{\em Always} begin your programs with ``{\tt use strict;}''.  It will
save you a lot of grief, later.

\item
The {\tt pw} command is shorthand for 
\begin{verbatim}
/mit/perl5/bin/perl -w
\end{verbatim}
Always use the {\tt perl} in the {\tt perl5} locker.  It has much more
functionality than the local one.  If you develop on some other
platform, {\em always} pass the {\tt -w} switch to {\tt perl}.

\item
All commands end with semicolons.

\end{itemize}

\end{slide}
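
\begin{slide}

{\large\bf What {\tt use strict;} catches}

A minimal sketch of what {\tt use strict;} buys you.  The variable
names are made up, and the {\tt eval} is only there so that the script
survives the error and can print it.

\begin{verbatim}
use strict;

# Compile a snippet with a mistyped name.
my $ok = eval q{
    my $count = 1;
    $cuont = 2;    # typo: never declared
    1;
};
print "strict says: $@" unless $ok;
\end{verbatim}

Without {\tt use strict;}, the mistyped name would quietly become a
new variable.

\end{slide}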

\begin{slide}

{\large\bf Documentation}

Put the cursor on the word ``{\tt print}'', and type {\tt C-c C-h f}.
You will get the documentation for the {\tt print} command.  Try it on
the word ``{\tt use},'' too.

Put the cursor on the word ``{\tt strict}'', and type {\tt C-c C-h m}.
You will get the documentation for the {\tt strict} module.

The latter keybinding is the more reliable of the two, but the former produces
documentation in info format, which can be helpful.

Don't forget Google: searching for \\ ``{\tt site:bioperl.org BLAST}''
returns pointers to documentation of bioperl's {\tt BLAST}
functionality.  Searching for ``{\tt perl list scalar context}'' returns
pointers to explanations of how functions in perl can return different
values depending on the context in which they're called.

\end{slide}
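
\begin{slide}

{\large\bf List and scalar context}

As a small sketch of what ``context'' means, the same array gives
different values depending on what it is assigned to.  The array
contents are invented for illustration.

\begin{verbatim}
use strict;
my @bases = ('A', 'C', 'G', 'T');

my $count   = @bases;   # scalar context
my ($first) = @bases;   # list context

print "$count $first\n";   # prints "4 A"
\end{verbatim}

In scalar context an array yields its length; in list context it
yields its elements.

\end{slide}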

\begin{slide}

{\large\bf Variables}

You can use the debugger to play with perl expressions like so:

\begin{verbatim}
athena% perl -d -e 0
main::(-e:1):   0
  DB<1> $a = 1
  DB<2> print $a
1
  DB<3> $a = "foo"
  DB<4> print $a
foo
  DB<5> print "interpolation of $a"
interpolation of foo
\end{verbatim}

Variables starting with ``\$''  are called {\em scalars}.

\end{slide}
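
\begin{slide}

{\large\bf Scalars in a script}

The same experiment can be written as a script.  This is only a
sketch; the variable names are arbitrary.  It also shows that
interpolation happens in double quotes but not in single quotes.

\begin{verbatim}
use strict;

my $name = "foo";
my $msg  = "interpolation of $name";
my $raw  = 'no interpolation of $name';

print "$msg\n";   # interpolation of foo
print "$raw\n";   # no interpolation of $name
\end{verbatim}

\end{slide}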

\begin{slide}

{\large\bf Some bioinformatics}

Create a file {\tt bptranslate.pl} containing the following:

\verbatiminput{bptranslate.pl}

This takes a nucleotide sequence file, tries to guess the file format
from its extension, and prints out the standard translation of the
first sequence in the file.  Use it like so:

\begin{verbatim}
athena% cd /mit/seven/7.91
athena% pw bptranslate.pl control.fa
ALRLPIKSLISCVFVCRLRYI*DSCSPWWPKTPTPPG...
\end{verbatim}

\end{slide}

\begin{slide}

{\large\bf Things to note}

\begin{itemize}

\item
You need to declare the variables you use with ``{\tt my}''.  This is
due to the ``{\tt use strict;}'' command.  Without the {\tt strict}
module, a variable that has not been seen before is silently created
with an undefined value, which gets very confusing if you mistype a
variable name.

\item
The script gets the filename passed on the command line with the command
``{\tt shift @ARGV}''.  The variable {\tt @ARGV} is an {\em array}, and
{\tt shift} returns the first element and removes it from the array.

\item
After printing the translation, we ask it to print ``$\backslash
\mbox{\tt n}$'', which is the symbol for a newline.  Otherwise, the
subsequent {\tt athena} prompt shows up on the same line.

\end{itemize}

\end{slide}
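
\begin{slide}

{\large\bf Arguments and newlines}

The points above can be sketched in a few lines.  The script name
{\tt args.pl} and the second file name are hypothetical; the script
fills in {\tt @ARGV} itself if you run it without arguments.

\begin{verbatim}
use strict;

# Pretend the script was run as:
#   pw args.pl control.fa extra.fa
@ARGV = ('control.fa', 'extra.fa')
    unless @ARGV;

my $file = shift @ARGV;  # removes 1st element
print "file: $file";     # no newline yet
print "\n";              # now end the line
print "left: @ARGV\n";
\end{verbatim}

\end{slide}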

\begin{slide}

{\large\bf bioperl objects}

\begin{itemize}

\item
You access variables within modules using ``::''.  

\item
The {\tt read\_sequence} function returns a {\tt Bio::Seq} {\em object},
which we assign to the variable {\tt \$seq}.  This object has a {\em
  method}, {\tt translate}, a function which returns another {\tt
  Bio::Seq} object containing the translation to protein.  This object
is converted into an actual {\em string} (sequence of characters) using
the {\tt seq} method.

\item
You can read about the {\tt Bio::Seq} module by putting your cursor on
it, and typing {\tt C-c C-h m}, or by typing {\tt perldoc Bio::Seq} in
your terminal window.

\end{itemize}

\end{slide}
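
\begin{slide}

{\large\bf A toy object}

To make the object vocabulary concrete, here is a sketch using a
made-up {\tt ToySeq} class.  It is not part of Bioperl; it only mimics
the {\tt \$seq->translate->seq} pattern, with a method that returns a
new object.

\begin{verbatim}
use strict;

package ToySeq;  # made up; not Bioperl
sub new {
    my ($class, $str) = @_;
    return bless { str => $str }, $class;
}
sub seq { return $_[0]{str} }
sub reversed {
    my $self = shift;
    my $rev = reverse $self->{str};
    return ToySeq->new($rev);
}

package main;
my $seq = ToySeq->new("ACGT");
print $seq->reversed->seq, "\n";  # TGCA
\end{verbatim}

\end{slide}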

\begin{slide}

{\large\bf Our own translator}

In {\tt /mit/seven/7.91/perl\_module/translate.pl}, there is a translation
program that does not depend on Bioperl.  I'll go through it because it
introduces some important perl concepts.
\begin{verbatim}
sub translate{
    my $sequence = shift;
    $sequence = uc($sequence);
    my $seqidx; my $codon; my @codon_list;
    for ($seqidx = 0; 
         $seqidx < length $sequence ; 
         $seqidx += 3) {
      $codon = substr($sequence, $seqidx, 3);
      if (length $codon == 3) {
        push(@codon_list, $codons{$codon}||'X');
      }
    }
    return ( join ('' , @codon_list));
}
\end{verbatim}
\end{slide}

\begin{slide}

{\large\bf Running the debugger}

Open up {\tt translate.pl} and press {\tt C-c C-c} to start the
debugger.  Enter ``{\tt s}'' twice, then keep entering ``{\tt n}'' to
get a feel for how the {\tt translate} subroutine works.

The ``{\tt s}'' command ``steps into'' the context of the function that
is about to be called.  That's how we get the debugger into {\tt
  translate}.  The ``{\tt n}'' command ``steps over'' the command that
is about to be executed.

You can evaluate expressions in the current context using the ``{\tt
  x}'' command:

\begin{verbatim}
  DB<1> x @codon_list
  empty array
  DB<2> x $seqidx
0  0
\end{verbatim}

\end{slide}

\begin{slide}

{\large\bf Exercise}

If you have time, it'll be very instructive to use the debugger to step
through the calls to {\tt read\_sequence} and {\tt \$seq->translate} in\\
{\tt bptranslate.pl}.  (If you see anything interesting, try looking it
up using {\tt C-c C-h m}.)

\end{slide}

\begin{slide}

{\large\bf Things to note about {\tt translate.pl}}

\begin{itemize}

\item
The {\tt \%codons} variable is a {\em hash}: a mapping from keys to
values.  In this case, it maps the codons to their
respective residue symbols.

\item
The block under {\tt translate} is a {\em subroutine}: a piece of code
that you can call repeatedly, with different {\em arguments}.  If you
{\em call} it like ``{\tt translate("acgactagcaattcaca");}'', it gets {\em
  passed} one argument: the string\\ {\tt "acgactagcaattcaca"}.  Within
{\tt translate}, the argument is assigned to {\tt \$sequence} using the
{\tt shift} command.

\end{itemize}

\end{slide}
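
\begin{slide}

{\large\bf A small hash and subroutine}

As a sketch of both ideas together, here is a three-entry hash and a
subroutine that looks codons up in it.  The helper name {\tt residue}
is made up; the real {\tt \%codons} table has an entry for every
codon.

\begin{verbatim}
use strict;

# Three entries of the standard code.
my %codons = (
    ATG => 'M',
    GCC => 'A',
    TAA => '*',
);

sub residue {           # made-up helper
    my $codon = shift;  # first argument
    return $codons{$codon} || 'X';
}

print residue("ATG"), residue("NNN"), "\n";
# prints "MX"
\end{verbatim}

\end{slide}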

\begin{slide}

{\large\bf More on {\tt translate.pl}}

\begin{itemize}

\item
The {\tt uc} command converts {\tt \$sequence} to uppercase.

\item
The {\tt for} block causes {\tt \$seqidx} to iterate over the values
{\tt 0, 3, 6, 9, \ldots}.  The function {\tt substr(\$sequence,
  \$seqidx, 3)} returns the substring of length three starting at each
of these positions.

\item
Triplets of nucleotides are translated by looking them up in the {\tt
  \%codons} hash.  Triplets that are not found become ``{\tt X}''.

\item
The {\tt @codon\_list} array stores the translated residues.  They get
added to the end of the array with the {\tt push} command.

\item
The residues in {\tt @codon\_list} are joined together into a
string using the {\tt join} command.

\end{itemize}

\end{slide}
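
\begin{slide}

{\large\bf Watching the loop}

To see the loop at work without the debugger, this sketch records each
index and substring instead of translating.  The input sequence is
made up and deliberately not a multiple of three long.

\begin{verbatim}
use strict;

my $sequence = "ATGGCCTA";  # made-up input
my @pieces;
for (my $i = 0;
     $i < length $sequence;
     $i += 3) {
    my $codon = substr($sequence, $i, 3);
    push(@pieces, "$i:$codon");
}
print join(' ', @pieces), "\n";
# prints "0:ATG 3:GCC 6:TA"
\end{verbatim}

The last piece is only two letters long, which is why {\tt translate}
checks that {\tt length \$codon == 3} before looking a codon up.

\end{slide}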

\end{document}