This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: Bone/Easy,  Next: Bone/Easy/Rules,  Prev: BnP,  Up: Module List

Perl module for generating pickup lines.
****************************************

NAME
====

   Bone::Easy - Perl module for generating pickup lines.

SYNOPSIS
========

     use Bone::Easy;

     # I know you get this a lot, but what's a unholy fairy like you
     # doing in a mosque like this?
     print pickup, "\n";

DESCRIPTION
===========

   Generates pickup-lines GUARANTEED to get something thrown in your
face.

AUTHOR
======

   Idea and original ruleset by TheSpark.com and Chris Coyne

   Perl Code by Michael G Schwern

LICENSE
=======

   This program may be distributed under the same license as Perl
itself, except for Bone::Easy::Rules.  *Note Bone/Easy/Rules:
Bone/Easy/Rules, for details.

SEE ALSO
========

   *Note Safe: Safe,, *Note Sex: Sex,, `pickup' in this node, *Note
Bone/Easy/Rules: Bone/Easy/Rules,


File: pm.info,  Node: Bone/Easy/Rules,  Next: Boulder,  Prev: Bone/Easy,  Up: Module List

Default ruleset for Bone::Easy
******************************

NAME
====

   Bone::Easy::Rules - Default ruleset for Bone::Easy

SYNOPSIS
========

     use Bone::Easy::Rules;

     @rules = ;

DESCRIPTION
===========

   Due to licensing issues, the default ruleset must reside separately
from the code.

AUTHOR
======

   Original ruleset by TheSpark.com and Chris Coyne

   Perl code by Michael G Schwern

LICENSE
=======

   This library (Bone::Easy::Rules) is free for noncommercial use and
modification.
Anyone wishing to use it for commercial purposes must get the permission of Chris Coyne and TheSpark.com SEE ALSO ======== *Note Bone/Easy: Bone/Easy, and `pickup' in this node  File: pm.info, Node: Boulder, Next: Boulder/Blast, Prev: Bone/Easy/Rules, Up: Module List An API for hierarchical tag/value structures ******************************************** NAME ==== Boulder - An API for hierarchical tag/value structures SYNOPSIS ======== # Read a series of People records from STDIN. # Add an "Eligibility" attribute to all those whose # Age >= 35 and Friends list includes "Fred" use Boulder::Stream; my $stream = Boulder::Stream->newFh; while ( my $record = <$stream> ) { next unless $record->Age >= 35; my @friends = $record->Friends; next unless grep {$_ eq 'Fred'} @friends; $record->insert(Eligibility => 'yes'); print $stream $record; } Related manual pages: basics ------ Stone hierarchical tag/value records Stone::Cursor Traverse a hierarchy Boulder::Stream stream-oriented storage for Stones Boulder::Store record-oriented storage for Stones Boulder::XML XML conversion for Stones Boulder::String conversion to strings genome-related --------------- Boulder::Genbank parse Genbank (DNA sequence) records Boulder::Blast parse BLAST (basic local alignment search tool) reports Boulder::Medline parse Medline (pubmed) records Boulder::Omim parse OMIM (online Mendelian inheritance in man) records Boulder::Swissprot parse Swissprot records Boulder::Unigene parse Unigene records DESCRIPTION =========== Boulder IO ---------- Boulder IO is a simple TAG=VALUE data format designed for sharing data between programs connected via a pipe. It is also simple enough to use as a common data exchange format between databases, Web pages, and other data representations. The basic data format is very simple. It consists of a series of TAG=VALUE pairs separated by newlines. It is record-oriented. The end of a record is indicated by an empty delimiter alone on a line. 
The delimiter is "=" by default, but can be adjusted by the user.

   An example boulder stream looks like this:

     Name=Lincoln Stein
     Home=/u/bush202/lds32
     Organization=Cold Spring Harbor Laboratory
     Login=lds32
     Password_age=20
     Password_expires=60
     Alias=lstein
     Alias=steinl
     =
     Name=Leigh Deacon
     Home=/u/bush202/tanager
     Organization=Cold Spring Harbor Laboratory
     Login=tanager
     Password_age=2
     Password_expires=60
     =

   Notes:

(1)
     There is no need for all tags to appear in all records, or indeed
     for all the records to be homogeneous.

(2)
     Multiple values are allowed, as with the Alias tag in the first
     record.

(3)
     Lines can be any length, as in a potential 40 Kbp DNA sequence
     entry.

(4)
     Tags may consist of any alphanumeric characters (upper or lower
     case) and may contain embedded spaces.  Conventionally we use the
     characters A-Z0-9_, because they can be used without single
     quoting as keys in Perl associative arrays, but this is merely
     stylistic.  Values can be any character at all except for the
     reserved characters {}=% and newline.  You can incorporate binary
     data into the data stream by escaping these characters in the URL
     manner, using a % sign followed by the (capitalized) hexadecimal
     code for the character.  The module makes this automatic.

Hierarchical Records
--------------------

   The simple boulder format can be extended to accommodate nested
relations and other interesting structures.  Nested records can be
created in this way:

     Name=Lincoln Stein
     Home=/u/bush202/lds32
     Organization=Cold Spring Harbor Laboratory
     Login=lds32
     Password_age=20
     Password_expires=60
     Privileges={
       ChangePasswd=yes
       CronJobs=yes
       Reboot=yes
       Shutdown=no
     }
     =
     Name=Leigh Deacon
     Home=/u/bush202/tanager
     Organization=Cold Spring Harbor Laboratory
     Login=tanager
     Password_age=2
     Password_expires=60
     Privileges={
       ChangePasswd=yes
       CronJobs=no
       Reboot=no
       Shutdown=no
     }
     =

   As in the original format, tags may be multivalued.  For example,
there might be several Privileges records assigned to a login account.
Each subrecord may contain further subrecords.

   Within the program, a hierarchical record is encapsulated within a
"Stone", an opaque structure that implements methods for fetching and
setting its various tags.

Using Boulder for I/O
---------------------

   The Boulder API was designed to make reading and writing of complex
hierarchical records almost as easy as reading and writing single lines
of text.

Boulder::Stream
     The main component of the Boulder modules is Boulder::Stream, which
     provides a stream-oriented view of the data.  You can read and
     write to Boulder::Streams via tied filehandles, or via method
     calls.  Data records are flattened into a simple format called
     "boulderio" format.

Boulder::XML
     Boulder::XML acts like Boulder::Stream, but the serialization
     format is XML.  You need XML::Parser installed to use this module.

Boulder::Store
     This is a simple persistent storage class which allows you to
     store several (thousand) Stones in a DB_File database.  You must
     have libdb and the Perl DB_File extensions installed in order to
     take advantage of this class.

Boulder::Genbank
Boulder::Unigene
Boulder::OMIM
Boulder::Blast
Boulder::Medline
Boulder::SwissProt
     These are parsers and accessors for various biological data
     sources.  They act like Boulder::Stream, but return a set of Stone
     objects that have certain prescribed tags and values.  Many of
     these modules were written by Luca I.G. Toldo.

Stone Objects
-------------

   The Stone object encapsulates a set of tags and values.  Any tag can
be single- or multivalued, and tags are allowed to contain subtags to
any depth.  A simple set of methods named tags(), get(), put(),
insert(), replace() and so forth, allows you to examine the tags that
are available, get and set their values, and search for particular
tags.
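
   To make the tag/value model concrete, the following is a minimal
sketch in plain Perl of reading flat boulderio records into a hash of
value lists, so that a multivalued tag such as Alias keeps all of its
values.  It is not the Boulder implementation: parse_boulder_record is a
hypothetical helper written for this illustration, it assumes the
default "=" delimiter unless told otherwise, and it does not handle
nested {} subrecords.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Boulder distribution): consume ONE
# flat record from the front of @$lines and return a hash reference that
# maps each tag to an array reference holding all of its values.
sub parse_boulder_record {
    my ($lines, $delim) = @_;
    $delim = '=' unless defined $delim;      # assumed default delimiter
    my %record;
    while (defined(my $line = shift @$lines)) {
        chomp $line;
        last if $line eq $delim;             # delimiter alone on a line ends the record
        next unless length $line;            # ignore blank lines
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;          # skip lines without TAG=VALUE shape
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # undo %XX escapes
        push @{ $record{$tag} }, $value;     # every tag may be multivalued
    }
    return \%record;
}

my @lines = (
    "Name=Lincoln Stein\n",
    "Alias=lstein\n",
    "Alias=steinl\n",
    "Identity=100%25\n",
    "=\n",
    "Name=Leigh Deacon\n",
    "=\n",
);

my $first  = parse_boulder_record(\@lines);
my $second = parse_boulder_record(\@lines);

print join(',', @{ $first->{Alias} }), "\n";   # both Alias values
print $first->{Identity}[0], "\n";             # escaped percent sign restored
print $second->{Name}[0], "\n";
```

   Storing every value in a list, even for single-valued tags, keeps
the access pattern uniform; the real Stone class hides this detail
behind its accessor methods.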
In addition, an autoload mechanism allows you to use method calls to access tags, for example: my @friends = $record->Friends; is equivalent to: my @friends = $record->get('Friends'); A Stone::Cursor class allows you to traverse Stones systematically. A full explanation of the Stone class can be found in its manual page. AUTHOR ====== Lincoln D. Stein , Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. This module can be used and distributed on the same terms as Perl itself. SEE ALSO ======== *Note Boulder/Blast: Boulder/Blast,, *Note Boulder/Genbank: Boulder/Genbank,, *Note Boulder/Medline: Boulder/Medline,, *Note Boulder/Unigene: Boulder/Unigene,, *Note Boulder/Omim: Boulder/Omim,, `Boulder::SwissProt' in this node  File: pm.info, Node: Boulder/Blast, Next: Boulder/Blast/NCBI, Prev: Boulder, Up: Module List Parse and read BLAST files ************************** NAME ==== Boulder::Blast - Parse and read BLAST files SYNOPSIS ======== use Boulder::Blast; # parse from a single file $blast = Boulder::Blast->parse('run3.blast'); # parse and read a set of blast output files $stream = Boulder::Blast->new('run3.blast','run4.blast'); while ($blast = $stream->get) { # do something with $blast object } # parse and read a whole directory of blast runs $stream = Boulder::Blast->new(<*.blast>); while ($blast = $stream->get) { # do something with $blast object } # parse and read from STDIN $stream = Boulder::Blast->new; while ($blast = $stream->get) { # do something with $blast object } # parse and read as a filehandle $stream = Boulder::Blast->newFh(<*.blast>); while ($blast = <$stream>) { # do something with $blast object } # once you have a $blast object, you can get info about it: $query = $blast->Blast_query; @hits = $blast->Blast_hits; foreach $hit (@hits) { $hit_sequence = $hit->Name; # get the ID $significance = $hit->Signif; # get the significance @hsps = $hit->Hsps; # list of HSPs foreach $hsp (@hsps) { $query = $hsp->Query; # query sequence $subject = 
$hsp->Subject;   # subject sequence
             $signif = $hsp->Signif;     # significance of HSP
         }
     }

DESCRIPTION
===========

   The Boulder::Blast class parses the output of the *Washington
University (WU)* or National Center for Biotechnology Information (NCBI)
series of BLAST programs and turns them into *Stone* records.  You may
then use the standard Stone access methods to retrieve information
about the BLAST run, or add the information to a Boulder stream.

   The parser works equally well on the contents of a static file, or
on information read dynamically from a filehandle or pipe.

METHODS
=======

parse() Method
--------------

     $stone = Boulder::Blast->parse($file_path);
     $stone = Boulder::Blast->parse($filehandle);

   The parse() method accepts a path to a file or a filehandle, parses
its contents, and returns a Boulder Stone object.  The file path may be
absolute or relative to the current directory.  The filehandle may be
specified as an IO::File object, a FileHandle object, or a reference to
a glob (`\*FILEHANDLE' notation).  If you call parse() without any
arguments, it will try to parse the contents of standard input.

new() Method
------------

     $stream = Boulder::Blast->new;
     $stream = Boulder::Blast->new($file [,@more_files]);
     $stream = Boulder::Blast->new(\*FILEHANDLE);

   If you wish, you may create the parser first with Boulder::Blast
new(), and then invoke the parser object's parse() method as many times
as you wish to, producing a Stone object each time.

TAGS
====

   The following tags are defined in the parsed Blast Stone object:

Information about the program
-----------------------------

   These top-level tags provide information about the version of the
BLAST program itself.

Blast_program
     The name of the algorithm used to run the analysis.  Possible
     values include:

          blastn
          blastp
          blastx
          tblastn
          tblastx
          fasta3
          fastx3
          fasty3
          tfasta3
          tfastx3
          tfasty3

Blast_version
     This gives the version of the program in whatever form appears on
     the banner page, e.g. "2.0a19-WashU".

Blast_program_date
     This gives the date at which the program was compiled, if and only
     if it appears on the banner page.

Information about the run
-------------------------

   These top-level tags give information about the particular run, such
as the parameters that were used for the algorithm.

Blast_run_date
     This gives the date and time at which the similarity analysis was
     run, in the format "Fri Jul 6 09:32:36 1998"

Blast_parms
     This points to a subrecord containing information about the
     algorithm's runtime parameters.  The following subtags are used.
     Others may be added in the future:

          Hspmax        the value of the -hspmax argument
          Expectation   the value of E
          Matrix        the matrix in use, e.g. BLOSUM62
          Ctxfactor     the value of the -ctxfactor argument
          Gapall        the value of the -gapall argument

Information about the query sequence and subject database
---------------------------------------------------------

   These top-level tags give information about the query sequence and
the database that was searched.

Blast_query
     The identifier for the search sequence, as defined by the FASTA
     format.  This will be the first set of non-whitespace characters
     following the ">" character.  In other words, the search sequence
     "name".

Blast_query_length
     The length of the query sequence, in base pairs.

Blast_db
     The Unix filesystem path to the subject database.

Blast_db_title
     The title of the subject database.

The search results: the *Blast_hits* tag.
-----------------------------------------

   Each BLAST hit is represented by the tag *Blast_hits*.  There may be
zero, one, or many such tags.  They will be presented in reverse sorted
order of significance, i.e. most significant hit first.

   Each *Blast_hits* tag is a Stone subrecord containing the following
subtags:

Name
     The name/identifier of the sequence that was hit.

Length
     The total length of the sequence that was hit.

Signif
     The significance of the hit.  If there are multiple HSPs in the
     hit, this will be the most significant (smallest) value.

Identity
     The percent identity of the hit.  If there are multiple HSPs, this
     will be the one with the highest percent identity.

Expect
     The expectation value for the hit.  If there are multiple HSPs,
     this will be the lowest expectation value in the set.

Hsps
     One or more sub-sub-tags, pointing to a nested record containing
     information about each high-scoring segment pair (HSP).  See the
     next section for details.

The Hsp records: the Hsps tag
-----------------------------

   Each *Blast_hits* tag will have at least one, and possibly several
Hsps tags, each one corresponding to a high-scoring segment pair (HSP).
These records contain detailed information about the hit, including the
alignments.  Tags are as follows:

Signif
     The significance (P value) of this HSP.

Bits
     The number of bits of significance.

Expect
     Expectation value for this HSP.

Identity
     Percent identity.

Positives
     Percent positive matches.

Score
     The Smith-Waterman alignment score.

Orientation
     The word "plus" or "minus".  This tag is only present for
     nucleotide searches, when the reverse complement match may be
     present.

Strand
     Depending on algorithm used, indicates complementarity of match
     and possibly the reading frame.  This is copied out of the blast
     report.  Possibilities include:

          "Plus / Minus"  "Plus / Plus"   -- blastn algorithm
          "+1 / -2"  "+2 / -2"            -- blastx, tblastx

Query_start
     Position at which the HSP starts in the query sequence (1-based
     indexing).

Query_end
     Position at which the HSP stops in the query sequence.

Subject_start
     Position at which the HSP starts in the subject (target) sequence.

Subject_end
     Position at which the HSP stops in the subject (target) sequence.

Query, Subject, Alignment
     These three tags contain strings which, together, create the
     gapped alignment of the query sequence with the subject sequence.
For example, to print the alignment of the first HSP of the first match, you might say: $hsp = $blast->Blast_hits->Hsps; print join("\n",$hsp->Query,$hsp->Alignment,$hsp->Subject),"\n"; See the bottom of this manual page for an example BLAST run. CAVEATS ======= This module has been extensively tested with WUBLAST, but very little with NCBI BLAST. It probably will not work with PSI Blast or other variants. The author plans to adapt this module to parse other formats, as well as non-BLAST formats such as the output of Fastn. SEE ALSO ======== *Note Boulder: Boulder,, `Boulder::GenBank' in this node AUTHOR ====== Lincoln Stein . Copyright (c) 1998-1999 Cold Spring Harbor Laboratory This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of warranty. EXAMPLE BLASTN RUN ================== This output was generated by the *quickblast.pl* program, which is located in the `eg/' subdirectory of the *Boulder* distribution directory. It is a typical *blastn* (nucleotide->nucleotide) run; however long lines (usually DNA sequences) have been truncated. Also note that per the Boulder protocol, the percent sign (%) is escaped in the usual way. It will be unescaped when reading the stream back in. Blast_run_date=Fri Nov 6 14:40:41 1998 Blast_db_date=2:40 PM EST Nov 6, 1998 Blast_parms={ Hspmax=10 Expectation=10 Matrix=+5,-4 Ctxfactor=2.00 } Blast_program_date=05-Feb-1998 Blast_db= /usr/tmp/quickblast18202aaaa Blast_version=2.0a19-WashU Blast_query=BCD207R Blast_db_title= test.fasta Blast_query_length=332 Blast_program=blastn Blast_hits={ Signif=3.5e-74 Expect=3.5e-74, Name=BCD207R Identity=100%25 Length=332 Hsps={ Subject=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA... Signif=3.5e-74 Length=332 Bits=249.1 Query_start=1 Subject_end=332 Query=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA... 
Positives=100%25 Expect=3.5e-74, Identity=100%25 Query_end=332 Orientation=plus Score=1660 Strand=Plus / Plus Subject_start=1 Alignment=||||||||||||||||||||||||||||||||||||||||||||||||||||||||||... } } = Example BLASTP run ================== Here is the output from a typical *blastp* (protein->protein) run. Long lines have again been truncated. Blast_run_date=Fri Nov 6 14:37:23 1998 Blast_db_date=2:36 PM EST Nov 6, 1998 Blast_parms={ Hspmax=10 Expectation=10 Matrix=BLOSUM62 Ctxfactor=1.00 } Blast_program_date=05-Feb-1998 Blast_db= /usr/tmp/quickblast18141aaaa Blast_version=2.0a19-WashU Blast_query=YAL004W Blast_db_title= elegans.fasta Blast_query_length=216 Blast_program=blastp Blast_hits={ Signif=0.95 Expect=3.0, Name=C28H8.2 Identity=30%25 Length=51 Hsps={ Subject=HMTVEFHVTSQSW---FGFEDHFHMIIR-AVNDENVGWGVRYLSMAF Signif=0.95 Length=46 Bits=15.8 Query_start=100 Subject_end=49 Query=HLTQD-HGGDLFWGKVLGFTLKFNLNLRLTVNIDQLEWEVLHVSLHF Positives=52%25 Expect=3.0, Identity=30%25 Query_end=145 Orientation=plus Score=45 Subject_start=7 Alignment=H+T + H W GF F++ +R VN + + W V ++S+ F } } Blast_hits={ Signif=0.99 Expect=4.7, Name=ZK896.2 Identity=24%25 Length=340 Hsps={ Subject=FSGKFTTFVLNKDQATLRMSSAEKTAEWNTAFDSRRGFF----TSGNYGL... Signif=0.99 Length=101 Bits=22.9 Query_start=110 Subject_end=243 Query=FWGKVLGFTL-KFNLNLRLTVNIDQLEWEVLHVSLHFWVVEVSTDQTLSVE... Positives=41%25 Expect=4.7, Identity=24%25 Query_end=210 Orientation=plus Score=65 Subject_start=146 Alignment=F GK F L K LR++ EW S + T +... } } =  File: pm.info, Node: Boulder/Blast/NCBI, Next: Boulder/Blast/WU, Prev: Boulder/Blast, Up: Module List Parse and read NCBI BLAST files ******************************* NAME ==== Boulder::Blast::NCBI - Parse and read NCBI BLAST files SYNOPSIS ======== Not for direct use. Use Boulder::Blast instead. DESCRIPTION =========== Specialized parser for NCBI format BLAST output. Loaded automatically by Boulder::Blast. 
SEE ALSO ======== *Note Boulder: Boulder,, `Boulder::GenBank' in this node, *Note Boulder/Blast: Boulder/Blast, AUTHOR ====== Lincoln Stein . Copyright (c) 1998 Cold Spring Harbor Laboratory This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of warranty.  File: pm.info, Node: Boulder/Blast/WU, Next: Boulder/Genbank, Prev: Boulder/Blast/NCBI, Up: Module List Parse and read WU-BLAST files ***************************** NAME ==== Boulder::Blast::WU - Parse and read WU-BLAST files SYNOPSIS ======== Not for direct use. Use Boulder::Blast instead. DESCRIPTION =========== Specialized parser for WUBLAST format BLAST output. Loaded automatically by Boulder::Blast. SEE ALSO ======== *Note Boulder: Boulder,, `Boulder::GenBank' in this node, *Note Boulder/Blast: Boulder/Blast, AUTHOR ====== Lincoln Stein . Copyright (c) 1998 Cold Spring Harbor Laboratory This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of warranty.  
File: pm.info,  Node: Boulder/Genbank,  Next: Boulder/Medline,  Prev: Boulder/Blast/WU,  Up: Module List

Fetch Genbank data records as parsed Boulder Stones
***************************************************

NAME
====

   Boulder::Genbank - Fetch Genbank data records as parsed Boulder
Stones

SYNOPSIS
========

     use Boulder::Genbank;

     # network access via Entrez
     $gb = Boulder::Genbank->newFh( qw(M57939 M28274 L36028) );

     while ($data = <$gb>) {
         print $data->Accession;

         @introns = $data->features->Intron;
         print "There are ",scalar(@introns)," introns.\n";
         $dna = $data->Sequence;
         print "The dna is ",length($dna)," bp long.\n";

         my @features = $data->features(-type=>[ qw(Exon Source Satellite) ],
                                        -pos=>[90,310] );
         foreach (@features) {
            print $_->Type,"\n";
            print $_->Position,"\n";
            print $_->Gene,"\n";
         }
     }

     # another syntax
     $gb = new Boulder::Genbank(-accessor=>'Entrez',
                                -fetch => [qw/M57939 M28274 L36028/]);

     # local access via Yank
     $gb = new Boulder::Genbank(-accessor=>'Yank',
                                -fetch=>[qw/M57939 M28274 L36028/]);
     while (my $s = $gb->get) {
        # etc.
     }

     # parse a file of Genbank records
     $gb = new Boulder::Genbank(-accessor=>'File',
                                -fetch => '/usr/local/db/gbpri3.seq');
     while (my $s = $gb->get) {
        # etc.
     }

     # parse flatfile records yourself
     open (GB,"/usr/local/db/gbpri3.seq");
     local $/ = "//\n";
     while (<GB>) {
        my $s = Boulder::Genbank->parse($_);
        # etc.
     }

DESCRIPTION
===========

   Boulder::Genbank provides retrieval and parsing services for NCBI
Genbank-format records.  It returns Genbank entries in *Note Stone:
Stone, format, allowing easy access to the various fields and values.
Boulder::Genbank is a descendant of Boulder::Stream, and provides a
stream-like interface to a series of Stone objects.

   Access to Genbank is provided by three different *accessors*, which
together give access to remote and local Genbank databases.  When you
create a new Boulder::Genbank stream, you provide one of the three
accessors, along with accessor-specific parameters that control what
entries to fetch.
The three accessors are:

Entrez
     This provides access to NetEntrez, accessing the most recent
     Genbank information directly from NCBI's Web site.  The parameters
     passed to this accessor are either a series of Genbank accession
     numbers, or an Entrez query (see
     http://www.ncbi.nlm.nih.gov/Entrez/linking.html).  If you provide
     a list of accession numbers, the stream will return a series of
     stones corresponding to the numbers.  Otherwise, if you provide
     an Entrez query, the entries will be returned in the order
     returned by Entrez.

File
     This provides access to local Genbank entries by reading from a
     flat file (typically one of the .seq files downloadable from
     NCBI's Web site).  The stream will return a Stone corresponding to
     each of the entries in the file, starting from the top of the file
     and working downward.  The parameter in this case is the path to
     the local file.

Yank
     This provides access to local Genbank entries using Will
     Fitzhugh's Yank program.  Yank provides fast indexed access to a
     Genbank flat file using the accession number as the key.  The
     parameter passed to the Yank accessor is a list of accession
     numbers.  Stones will be returned in the requested order.  By
     default the yank binary lives in /usr/local/bin/yank.  To support
     other locations, you may define the environment variable YANK to
     contain the full path.

   It is also possible to parse a single Genbank entry from a text
string stored in a scalar variable, returning a Stone object.

Boulder::Genbank methods
------------------------

   This section lists the public methods that the Boulder::Genbank
class makes available.

new()
          # Network fetch via Entrez, with accession numbers
          $gb=new Boulder::Genbank(-accessor  =>  'Entrez',
                                   -fetch     =>  [qw/M57939 M28274 L36028/]);

          # Same, but shorter and uses -> operator
          $gb = Boulder::Genbank->new(qw(M57939 M28274 L36028));

          # Network fetch via Entrez, with a query
          $query = 'Homo sapiens[Organism] AND EST[Keyword]';
          $gb=new Boulder::Genbank(-accessor  =>  'Entrez',
                                   -fetch     =>  $query);

          # Local fetch via Yank, with accession numbers
          $gb=new Boulder::Genbank(-accessor  =>  'Yank',
                                   -fetch     =>  [qw/M57939 M28274 L36028/]);

          # Local fetch via File
          $gb=new Boulder::Genbank(-accessor  =>  'File',
                                   -fetch     =>  '/usr/local/genbank/gbpri3.seq');

     The new() method creates a new Boulder::Genbank stream on the
     accessor provided.  The three possible accessors are Entrez, Yank
     and File.  If successful, the method returns the stream object.
     Otherwise it returns undef.

     new() takes the following arguments:

          -accessor       Name of the accessor to use
          -fetch          Parameters to pass to the accessor

     Specify the accessor to use with the *-accessor* argument.  If not
     specified, it defaults to Entrez.

     *-fetch* is an accessor-specific argument.  The possibilities are:

     For Entrez, the *-fetch* argument may point to a scalar, in which
     case it is interpreted as an Entrez query string.  See
     http://www.ncbi.nlm.nih.gov/Entrez/linking.html for a description
     of the query syntax.  Alternatively, *-fetch* may point to an
     array reference, in which case it is interpreted as a list of
     accession numbers to retrieve.  If *-fetch* points to a hash, it
     is interpreted as extended information.  See `"Extended Entrez
     Parameters"' in this node below.

     For Yank, the *-fetch* argument must point to an array reference
     containing the accession numbers to retrieve.

     For File, the *-fetch* argument must point to a string-valued
     scalar, which will be interpreted as the path to the file to read
     Genbank entries from.
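
     The three shapes that *-fetch* can take for the Entrez accessor
     can be told apart with Perl's built-in ref() function.  The sketch
     below only illustrates that rule; classify_entrez_fetch is a
     hypothetical helper invented for this example and says nothing
     about how Boulder::Genbank is actually implemented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Boulder::Genbank): report how a -fetch
# value would be interpreted under the rules described in the text.
sub classify_entrez_fetch {
    my ($fetch) = @_;
    return 'undefined' unless defined $fetch;
    my $type = ref $fetch;                      # '' for a plain scalar
    return 'query'      if $type eq '';         # Entrez query string
    return 'accessions' if $type eq 'ARRAY';    # list of accession numbers
    return 'extended'   if $type eq 'HASH';     # extended Entrez parameters
    return 'unsupported';
}

my $query      = 'Homo sapiens[Organism] AND EST[Keyword]';
my $accessions = [qw/M57939 M28274 L36028/];
my $extended   = { -query => 'EST[Keyword]', -db => 'n' };

print classify_entrez_fetch($query),      "\n";
print classify_entrez_fetch($accessions), "\n";
print classify_entrez_fetch($extended),   "\n";
```

     For the Yank and File accessors only one shape is accepted (an
     array reference and a string, respectively), so the same check
     could be used to reject anything else early.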

     For Entrez (and Entrez only) Boulder::Genbank allows you to use a
     shortcut syntax in which you provide new() with a list of
     accession numbers:

          $gb = new Boulder::Genbank('M57939','M28274','L36028');

newFh()
     This works like new(), but returns a filehandle.  To recover each
     Genbank record, read from the filehandle with the <> operator:

          $fh = Boulder::Genbank->newFh('M57939','M28274','L36028');
          while ($record = <$fh>) {
             print $record->asString;
          }

get()
     The get() method is inherited from Boulder::Stream, and simply
     returns the next parsed Genbank Stone, or undef if there is
     nothing more to fetch.  It has the same semantics as the parent
     class, including the ability to restrict access to certain
     top-level tags.

     The object returned is a *Note Stone/GB_Sequence:
     Stone/GB_Sequence, object, which is a descendant of *Note Stone:
     Stone,.

put()
     The put() method is inherited from the parent Boulder::Stream
     class, and will write the passed Stone to standard output in
     Boulder format.  This means that it is currently not possible to
     write a Boulder::Genbank object back into Genbank flatfile form.

Extended Entrez Parameters
--------------------------

   The Entrez accessor recognizes extended parameters that allow you
to customize the search.  Instead of passing a query string scalar or a
list of accession numbers as the *-fetch* argument, pass a hash
reference.  The hashref should contain one or more of the following
keys:

*-query*
     The Entrez query to process.

*-accession*
     The list of accession numbers to fetch, as an array ref.

*-db*
     The database to search.  This is a single-letter database code
     selected from the following list:

          m  MEDLINE
          p  Protein
          n  Nucleotide
          t  3-D structure
          c  Genome

   As an example, here's how to search for ESTs from Oryza sativa that
have been entered or modified since January 12, 1999.

     my $gb = new Boulder::Genbank(
                 -accessor => 'Entrez',
                 -fetch    => {
                     -query => 'Oryza sativa[Organism] AND EST[Keyword] AND 1999/01/12[Modification date]',
                     -db    => 'n'
                 });

METHODS DEFINED BY THE GENBANK STONE OBJECT
===========================================

   Each record returned from the Boulder::Genbank stream defines a set
of methods that correspond to features and other fields in the Genbank
flat file record.  *Note Stone/GB_Sequence: Stone/GB_Sequence, gives
the full details, but they are listed for reference here:

$length = $entry->length
------------------------

   Get the length of the sequence.

$start = $entry->start
----------------------

   Get the start position of the sequence, currently always "1".

$end = $entry->end
------------------

   Get the end position of the sequence, currently always the same as
the length.

@feature_list = $entry->features(-pos=>[50,450],-type=>['CDS','Exon'])
----------------------------------------------------------------------

   features() will search the entry feature list for those features
that meet certain criteria.  The criteria are specified using the
*-pos* and/or *-type* argument names, as shown below.

-pos
     Provide a position or range of positions which the feature must
     overlap.  A single position is specified in this way:

          -pos => 1500;         # feature must overlap position 1500

     or a range of positions in this way:

          -pos => [1000,1500];  # 1000 to 1500 inclusive

     If no criteria are provided, then features() returns all the
     features, and is equivalent to calling the Features() accessor.

-type, -types
     Filter the list of features by type or a set of types.  Matches
     are case-insensitive, so "exon", "Exon" and "EXON" are all
     equivalent.  You may call with a single type as in:

          -type => 'Exon'

     or with a list of types, as in

          -types => ['Exon','CDS']

     The names "-type" and "-types" can be used interchangeably.

$seqObj = $entry->bioSeq;
-------------------------

   Returns a *Note Bio/Seq: Bio/Seq, object from the Bioperl project.
Dies with an error message unless the Bio::Seq module is installed. OUTPUT TAGS =========== The tags returned by the parsing operation are taken from the NCBI ASN.1 schema. For consistency, they are normalized so that the initial letter is capitalized, and all subsequent letters are lowercase. This section contains an abbreviated list of the most useful/common tags. See "The NCBI Data Model", by James Ostell and Jonathan Kans in "Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins" (Eds. A. Baxevanis and F. Ouellette), pp 121-144 for the full listing. Top-Level Tags -------------- These are tags that appear at the top level of the parsed Genbank entry. Accession The accession number of this entry. Because of the vagaries of the Genbank data model, an entry may have multiple accession numbers (e.g. after a merging operation). Accession may therefore be a multi-valued tag. Example: my $accessionNo = $s->Accession; Authors The list of authors, as they appear on the AUTHORS line of the Genbank record. No attempt is made to parse them into individual authors. Basecount The nucleotide basecount for the entry. It is presented as a Boulder Stone with keys "a", "c", "t" and "g". Example: my $A = $s->Basecount->A; my $C = $s->Basecount->C; my $G = $s->Basecount->G; my $T = $s->Basecount->T; print "GC content is ",($G+$C)/($A+$C+$G+$T),"\n"; Blob The entire flatfile record as an unparsed chunk of text (a "blob"). This is a handy way of reassembling the record for human inspection. Comment The COMMENT line from the Genbank record. Definition The DEFINITION line from the Genbank record, unmodified. Features The FEATURES table. This is a complex stone object with multiple subtags. See the `"The Features Tag"' in this node for details. Journal The JOURNAL line from the Genbank record, unmodified. Keywords The KEYWORDS line from the Genbank record, unmodified. No attempt is made to parse the keywords into separate values. 
     Example:

          my $keywords = $s->Keywords;

Locus
     The LOCUS line from the Genbank record.  It is not further parsed.

Medline, Nid
     References to other database accession numbers.

Organism
     The taxonomic name of the organism from which this entry was
     derived.  This line is taken from the Genbank entry unmodified.
     See the NCBI data model documentation for an explanation of their
     taxonomic syntax.

Reference
     The REFERENCE line from the Genbank entry.  There are often
     multiple Reference lines.  Example:

          my @references = $s->Reference;

Sequence
     The DNA or RNA sequence of the entry.  This is presented as a
     single lower-case string, with all base numbers and formatting
     characters removed.

Source
     The entry's SOURCE field, which often gives clues about how the
     sequencing was performed.

Title
     The TITLE field from the paper describing this entry, if any.

The Features Tag
----------------

   The Features tag points to a Stone record that contains multiple
subtags.  Each subtag is the name of a feature which points, in turn,
to a Stone that describes the feature's location and other attributes.
The full list of features is beyond the scope of this document, but the
following are the features that are most often seen:

     Cds           a CDS
     Intron        an intron
     Exon          an exon
     Gene          a gene
     Mrna          an mRNA
     Polya_site    a putative polyadenylation signal
     Repeat_unit   a repetitive region
     Source        More information about the organism and cell type
                   the sequence was derived from
     Satellite     a microsatellite (dinucleotide repeat)

   Each feature will contain one or more of the following subtags:

Db_xref
     A cross-reference to another database in the form
     DB_NAME:accession_number.  See the NCBI Web site for a description
     of these cross references.

Evidence
     The evidence for this feature, either "experimental" or
     "predicted".

Gene
     If the feature involves a gene, this will be the gene's name (or
     one of its names).  This subtag is often seen in "Gene" and Cds
     features.
     Example:

          foreach ($s->Features->Cds) {
             my $gene = $_->Gene;
             my $position = $_->Position;
             print "Gene $gene ($position)\n";
          }

Map
     If the feature is mapped, this provides a map position, usually as
     a cytogenetic band.

Note
     A grab-bag for various text notes.

Number
     When multiple features of this type occur, this field is used to
     number them.  Ordinarily this field is not needed because
     Boulder::Genbank preserves the order of features.

Organism
     If the feature is Source, this provides the source organism.

Position
     The position of this feature, usually expressed as a range
     (1970..1975).

Product
     The protein product of the feature, if applicable, as a text
     string.

Translation
     The protein translation of the feature, if applicable.

SEE ALSO
========

   *Note Boulder: Boulder,, *Note Boulder/Blast: Boulder/Blast,

AUTHOR
======

   Lincoln Stein.

   Copyright (c) 1997-2000 Lincoln D. Stein

   This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.  See DISCLAIMER.txt for
disclaimers of warranty.

EXAMPLE GENBANK OBJECT
======================

   The following is an excerpt from a moderately complex Genbank Stone.
The Sequence line and several other long lines have been truncated for
readability.

  Authors=Spritz,R.A., Strunk,K., Surowy,C.S.O., Hoch,S., Barton,D.E. and Francke,U.
  Authors=Spritz,R.A., Strunk,K., Surowy,C.S. and Mohrenweiser,H.W.
  Locus=HUMRNP7011 2155 bp DNA PRI 03-JUL-1991
  Accession=M57939
  Accession=J04772
  Accession=M57733
  Keywords=ribonucleoprotein antigen.
  Sequence=aagcttttccaggcagtgcgagatagaggagcgcttgagaaggcaggttttgcagcagacggcagtgacagcccag...
  Definition=Human small nuclear ribonucleoprotein (U1-70K) gene, exon 10 and 11.
  Journal=Nucleic Acids Res.
  15, 10373-10391 (1987)
  Journal=Genomics 8, 371-379 (1990)
  Nid=g337441
  Medline=88096573
  Medline=91065657
  Features={
    Polya_site={
      Evidence=experimental
      Position=1989
      Gene=U1-70K
    }
    Polya_site={
      Position=1990
      Gene=U1-70K
    }
    Polya_site={
      Evidence=experimental
      Position=1992
      Gene=U1-70K
    }
    Polya_site={
      Evidence=experimental
      Position=1998
      Gene=U1-70K
    }
    Source={
      Organism=Homo sapiens
      Db_xref=taxon:9606
      Position=1..2155
      Map=19q13.3
    }
    Cds={
      Codon_start=1
      Product=ribonucleoprotein antigen
      Db_xref=PID:g337445
      Position=join(M57929:329..475,M57930:183..245,M57930:358..412, ...
      Gene=U1-70K
      Translation=MTQFLPPNLLALFAPRDPIPYLPPLEKLPHEKHHNQPYCGIAPYIREFEDPRDAPPPTR...
    }
    Cds={
      Codon_start=1
      Product=ribonucleoprotein antigen
      Db_xref=PID:g337444
      Evidence=experimental
      Position=join(M57929:329..475,M57930:183..245,M57930:358..412, ...
      Gene=U1-70K
      Translation=MTQFLPPNLLALFAPRDPIPYLPPLEKLPHEKHHNQPYCGIAPYIREFEDPR...
    }
    Polya_signal={
      Position=1970..1975
      Note=putative
      Gene=U1-70K
    }
    Intron={
      Evidence=experimental
      Position=1100..1208
      Gene=U1-70K
    }
    Intron={
      Number=10
      Evidence=experimental
      Position=1100..1181
      Gene=U1-70K
    }
    Intron={
      Number=9
      Evidence=experimental
      Position=order(M57937:702..921,1..1011)
      Note=2.1 kb gap
      Gene=U1-70K
    }
    Intron={
      Position=order(M57935:272..406,M57936:1..284,M57937:1..599, <1..>1208)
      Gene=U1-70K
    }
    Intron={
      Evidence=experimental
      Position=order(M57935:284..406,M57936:1..284,M57937:1..599, <1..>1208)
      Note=first gap-0.14 kb, second gap-0.62 kb
      Gene=U1-70K
    }
    Intron={
      Number=8
      Evidence=experimental
      Position=order(M57935:272..406,M57936:1..284,M57937:1..599, <1..>1181)
      Note=first gap-0.14 kb, second gap-0.62 kb
      Gene=U1-70K
    }
    Exon={
      Number=10
      Evidence=experimental
      Position=1012..1099
      Gene=U1-70K
    }
    Exon={
      Number=11
      Evidence=experimental
      Position=1182..(1989.1998)
      Gene=U1-70K
    }
    Exon={
      Evidence=experimental
      Position=1209..(1989.1998)
      Gene=U1-70K
    }
    Mrna={
      Product=ribonucleoprotein antigen
      Position=join(M57928:358..668,M57929:319..475,M57930:183..245, ...
      Gene=U1-70K
    }
    Mrna={
      Product=ribonucleoprotein antigen
      Citation=[2]
      Evidence=experimental
      Position=join(M57928:358..668,M57929:319..475,M57930:183..245, ...
      Gene=U1-70K
    }
    Gene={
      Position=join(M57928:207..719,M57929:1..562,M57930:1..577, ...
      Gene=U1-70K
    }
  }
  Reference=1 (sites)
  Reference=2 (bases 1 to 2155)
  =

File: pm.info, Node: Boulder/Medline, Next: Boulder/Omim, Prev: Boulder/Genbank, Up: Module List

Fetch Medline data records as parsed Boulder Stones
***************************************************

NAME
====

   Boulder::Medline - Fetch Medline data records as parsed Boulder
Stones

SYNOPSIS
========

     # parse a file of Medline records
     $ml = new Boulder::Medline(-accessor=>'File',
                                -param => '/data/medline/medline.txt');
     while (my $s = $ml->get) {
       print $s->Identifier;
       print $s->Abstract;
     }

     # parse flatfile yourself
     open (ML,"/data/medline/medline.txt");
     local $/ = "*RECORD*";
     while (<ML>) {
        my $s = Boulder::Medline->parse($_);
        # etc.
     }

DESCRIPTION
===========

   Boulder::Medline provides retrieval and parsing services for NCBI
Medline records.  It returns Medline entries in *Note Stone: Stone,
format, allowing easy access to the various fields and values.
Boulder::Medline is a descendant of Boulder::Stream, and provides a
stream-like interface to a series of Stone objects.

   Access to Medline is provided by a single *accessor*, which gives
access to a local Medline database.  When you create a new
Boulder::Medline stream, you provide the accessor, along with
accessor-specific parameters that control what entries to fetch.  The
accessor is:

File
     This provides access to local Medline entries by reading from a
     flat file.  The stream will return a Stone corresponding to each of
     the entries in the file, starting from the top of the file and
     working downward.  The parameter is the path to the local file.
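   As a rough illustration of the flat-file reading described above, the
following sketch splits a text into entries using the "*RECORD*"
separator from the SYNOPSIS.  It reads from an in-memory string instead
of a real Medline file.  The sample entries and the helper name
split_records are made up for illustration, and the sketch does not call
Boulder::Medline itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a flat file; the real Medline field layout
# is not shown here, so the entry bodies are placeholders.
my $flatfile = "*RECORD*\nfirst entry\n*RECORD*\nsecond entry\n";

# Split a text into entries on the "*RECORD*" separator.
sub split_records {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    local $/ = "*RECORD*";          # read one separator-terminated chunk at a time
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;               # strips a trailing "*RECORD*", since $/ is set to it
        $chunk =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
        next unless length $chunk;  # skip the empty chunk before the first separator
        push @records, $chunk;      # a real program might hand each chunk to a parser here
    }
    close $fh;
    return @records;
}

my @records = split_records($flatfile);
print scalar(@records), " entries found\n";
```

   Whether a parser expects the separator to be left in each chunk is
not stated in this node, so the chomp step is an assumption to check
against the module before reusing this pattern.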
   It is also possible to parse a single Medline entry from a text
string stored in a scalar variable, returning a Stone object.

Boulder::Medline methods
------------------------

   This section lists the public methods that the Boulder::Medline
class makes available.

new()
          # Local fetch via File
          $ml=new Boulder::Medline(-accessor  =>  'File',
                                   -param     =>  '/data/medline/medline.txt');

     The new() method creates a new Boulder::Medline stream on the
     accessor provided.  The only possible accessor is File.  If
     successful, the method returns the stream object.  Otherwise it
     returns undef.

     new() takes the following arguments:

          -accessor       Name of the accessor to use
          -param          Parameters to pass to the accessor

     Specify the accessor to use with the *-accessor* argument.  If not
     specified, it defaults to File.

     *-param* is an accessor-specific argument.  The only possibility
     is:

     For File, the *-param* argument must point to a string-valued
     scalar, which will be interpreted as the path to the file to read
     Medline entries from.

get()
     The get() method is inherited from Boulder::Stream, and simply
     returns the next parsed Medline Stone, or undef if there is
     nothing more to fetch.  It has the same semantics as the parent
     class, including the ability to restrict access to certain
     top-level tags.

put()
     The put() method is inherited from the parent Boulder::Stream
     class, and will write the passed Stone to standard output in
     Boulder format.  This means that it is currently not possible to
     write a Boulder::Medline object back into Medline flatfile form.

OUTPUT TAGS
===========

   The tags returned by the parsing operation are taken from the
MEDLARS definition file MEDDOC.DOC.

Top-Level Tags
--------------

   These are tags that appear at the top level of the parsed Medline
entry.
     ABSTRACT
     ABSTRACT AUTHOR
     ADDRESS
     AUTHOR
     CALL NUMBER
     CAS REGISTRY/EC NUMBER
     CLASS UPDATE DATE
     COMMENTS
     COUNTRY
     DATE OF ENTRY
     DATE OF PUBLICATION
     ENGLISH ABSTRACT INDICATOR
     ENTRY MONTH
     GENE SYMBOL
     ID NUMBER
     INDEXING PRIORITY
     ISSN
     ISSUE/PART/SUPPLEMENT
     JOURNAL SUBSET
     JOURNAL TITLE CODE
     LANGUAGE
     LAST REVISION DATE
     MACHINE-READABLE IDENTIFIER
     MeSH HEADING
     NO-AUTHOR INDICATOR
     NOT FOR PUBLICATION
     NUMBER OF REFERENCES
     PAGINATION
     PERSONAL NAME AS SUBJECT
     PUBLICATION TYPE
     RECORD ORIGINATOR
     SECONDARY SOURCE ID
     SPECIAL LIST INDICATOR
     TITLE
     TITLE ABBREVIATION
     TRANSLITERATED/VERNACULAR TITLE
     UNIQUE IDENTIFIER
     VOLUME ISSUE

Identifier
     The Medline identifier of this entry.  Identifier is a
     single-value tag.

     Example:

          my $identifierNo = $s->Identifier;

Title
     The Medline title for this entry.

     Example:

          my $titledef=$s->Title;

SEE ALSO
========

   *Note Boulder: Boulder,, *Note Boulder/Blast: Boulder/Blast,, *Note
Boulder/Genbank: Boulder/Genbank,

AUTHOR
======

   Lincoln Stein, Luca I.G. Toldo.

   Copyright (c) 1997 Lincoln D. Stein
   Copyright (c) 1999 Luca I.G. Toldo

   This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.  See DISCLAIMER.txt for
disclaimers of warranty.

File: pm.info, Node: Boulder/Omim, Next: Boulder/Store, Prev: Boulder/Medline, Up: Module List

Fetch Omim data records as parsed Boulder Stones
************************************************

NAME
====

   Boulder::Omim - Fetch Omim data records as parsed Boulder Stones

SYNOPSIS
========

     # parse a file of Omim records
     $om = new Boulder::Omim(-accessor=>'File',
                             -param => '/data/omim/omim.txt');
     while (my $s = $om->get) {
       print $s->Identifier;
       print $s->Text;
     }

     # parse flatfile records yourself
     open (OM,"/data/omim/omim.txt");
     local $/ = "*RECORD*";
     while (<OM>) {
        my $s = Boulder::Omim->parse($_);
        # etc.
     }

DESCRIPTION
===========

   Boulder::Omim provides retrieval and parsing services for NCBI Omim
records.
It returns Omim entries in *Note Stone: Stone, format, allowing easy
access to the various fields and values.  Boulder::Omim is a descendant
of Boulder::Stream, and provides a stream-like interface to a series of
Stone objects.

   Access to Omim is provided by a single *accessor*, which gives
access to a local Omim database.  When you create a new Boulder::Omim
stream, you provide the accessor, along with accessor-specific
parameters that control what entries to fetch.  The accessor is:

File
     This provides access to local Omim entries by reading from a flat
     file (typically the omim.txt file downloadable from NCBI's FTP
     site).  The stream will return a Stone corresponding to each of the
     entries in the file, starting from the top of the file and working
     downward.  The parameter is the path to the local file.

   It is also possible to parse a single Omim entry from a text string
stored in a scalar variable, returning a Stone object.

Boulder::Omim methods
---------------------

   This section lists the public methods that the *Boulder::Omim* class
makes available.

new()
          # Local fetch via File
          $om=new Boulder::Omim(-accessor  =>  'File',
                                -param     =>  '/data/omim/omim.txt');

     The new() method creates a new *Boulder::Omim* stream on the
     accessor provided.  The only possible accessor is File.  If
     successful, the method returns the stream object.  Otherwise it
     returns undef.

     new() takes the following arguments:

          -accessor       Name of the accessor to use
          -param          Parameters to pass to the accessor

     Specify the accessor to use with the *-accessor* argument.  If not
     specified, it defaults to File.

     *-param* is an accessor-specific argument.  The only possibility
     is:

     For File, the *-param* argument must point to a string-valued
     scalar, which will be interpreted as the path to the file to read
     Omim entries from.

get()
     The get() method is inherited from Boulder::Stream, and simply
     returns the next parsed Omim Stone, or undef if there is nothing
     more to fetch.
     It has the same semantics as the parent class, including the
     ability to restrict access to certain top-level tags.

put()
     The put() method is inherited from the parent Boulder::Stream
     class, and will write the passed Stone to standard output in
     Boulder format.  This means that it is currently not possible to
     write a Boulder::Omim object back into Omim flatfile form.

OUTPUT TAGS
===========

   The tags returned by the parsing operation are taken from the names
shown in the network Entrez interface to Omim.

Top-Level Tags
--------------

   These are tags that appear at the top level of the parsed Omim
entry.

Identifier
     The Omim identifier of this entry.  Identifier is a single-value
     tag.

     Example:

          my $identifierNo = $s->Identifier;

Title
     The Omim title for this entry.

     Example:

          my $titledef=$s->Title;

Text
     The text of this Omim entry.

     Example:

          my $thetext=$s->Text;

Mini
     The condensed version of the text, also called "Mini" in the
     Entrez interface.

     Example:

          my $themini=$s->Mini;

SeeAlso
     References to other relevant work.

     Example:

          my $theseealso=$s->SeeAlso;

CreationDate
     This field contains the name of the person who originated the
     initial entry in OMIM and the date it appeared in the database.
     The entry may have been subsequently added to, edited, or totally
     rewritten by others, and their attribution is listed in the
     CONTRIBUTORS field.

     Example:

          my $theCreation=$s->CreationDate;

Contributors
     This field contains a list, in chronological order, of the persons
     who have contributed significantly to the content of the MIM
     entry.  The name is followed by "updated", "edited" or
     "re-created".

     Example:

          my @theContributors=$s->Contributors;

History
     This field contains the edit history of this record, with an
     identifier and the date on which minor changes were made to the
     record.

     Example:

          my @theHistory=$s->History;

References
     The references cited in the entry.

     Example:

          my @theReferences=$s->References;

ClinicalSynopsis
     The content of the Clinical Synopsis data field.
     Example:

          my @theClinicalSynopsis=$s->ClinicalSynopsis;

AllelicVariants
     The allelic variants described in the entry.

     Example:

          my @theAllelicVariants=$s->AllelicVariants;

SEE ALSO
========

   *Note Boulder: Boulder,, *Note Boulder/Blast: Boulder/Blast,, *Note
Boulder/Genbank: Boulder/Genbank,

AUTHOR
======

   Lincoln Stein, Luca I.G. Toldo.

   Copyright (c) 1997 Lincoln D. Stein
   Copyright (c) 1999 Luca I.G. Toldo

   This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.  See DISCLAIMER.txt for
disclaimers of warranty.
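   The get() loop from the SYNOPSIS, combined with the single-valued
and multi-valued tags listed under OUTPUT TAGS, can be sketched without
the module installed.  The code below is only an illustration: it uses
plain hashes in place of Stone objects and a hypothetical make_stream
helper in place of a Boulder::Omim stream, and every identifier, title
and contributor value in it is invented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Boulder::Omim stream. Each record is a
# plain hash keyed by documented tag names, not a real Stone object.
sub make_stream {
    my @queue = @_;
    # Returns a get-like closure: the next record, or undef when the
    # queue is exhausted.
    return sub { return shift @queue };
}

# Made-up sample records for illustration only.
my $get = make_stream(
    { Identifier   => '000001',
      Title        => 'EXAMPLE ENTRY ONE',
      Contributors => ['A. Person - updated', 'B. Person - edited'] },
    { Identifier   => '000002',
      Title        => 'EXAMPLE ENTRY TWO',
      Contributors => [] },
);

my @summary;
while (my $rec = $get->()) {
    # Contributors is multi-valued, so count its elements.
    my $n = @{ $rec->{Contributors} };
    push @summary, "$rec->{Identifier}: $rec->{Title} ($n contributors)";
}
print "$_\n" for @summary;
```

   With a real stream, the same loop shape would call $om->get and read
tags through accessor methods such as $s->Identifier, as shown in the
SYNOPSIS.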