This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.

File: pm.info, Node: Text/PDF/Objind, Next: Text/PDF/Page, Prev: Text/PDF/Number, Up: Module List

PDF indirect object reference. Also acts as an abstract superclass for all elements in a PDF file.
**************************************************************************************************

NAME
====

   Text::PDF::Objind - PDF indirect object reference. Also acts as an
abstract superclass for all elements in a PDF file.

INSTANCE VARIABLES
==================

   Instance variables differ from content variables in that they all
start with a space.

parent
     For an object which is a reference to an object in some source,
     this holds the reference to the source object, so that if the
     reference has to be de-referenced, we know where to go to get the
     information.

objnum (R)
     The object number in the source (only for object references).

objgen (R)
     The object generation in the source.

   There are other instance variables which are used by the parent for
file control.

isfree
     Marks whether the object is in the free list and available for
     re-use as another object elsewhere in the file.

nextfree
     Holds a direct reference to the next free object in the free list.

METHODS
=======

Text::PDF::Objind->new()
------------------------

   Creates a new indirect object.

uid
---

   Returns a unique id for this object, creating one if it didn't have
one before.

$r->val
-------

   Returns the value of this object, reading the object first if it has
not yet been read. Note that all direct subclasses *must* provide their
own version of this method, otherwise we could be in for a very deep
loop!

$r->realise
-----------

   Makes sure that the object is fully read in, etc.

$r->outobjdeep($fh, $pdf)
-------------------------

   If you really want to output this object, then it needs to be read
first. This also means that all direct subclasses must override this
method or loop forever!

$r->outobj($fh)
---------------

   If this is a full object, outputs a reference to the object;
otherwise calls outobjdeep to output the contents of the object at this
point.

$r->elementsof
--------------

   Abstract superclass function filler. Returns self here, but should
return something more useful for an array.

$r->empty
---------

   Empties all content from this object to free up memory or to make it
ready to pass into the free list. Simplistically undefs all instance
variables other than object number and generation.

$r->merge($objind)
------------------

   Merges content information into an object reference place-holder.
This occurs when an object reference is read before the object
definition, and the information in the read data needs to be merged
into the object place-holder.

$r->is_obj($pdf)
----------------

   Returns whether this object is a full object with its own object
number or whether it is purely a sub-object. $pdf indicates the output
file in which we are asking whether the object is a full object.

$r->copy($pdf, $res)
--------------------

   Returns a new copy of this object. The object is assumed to be some
kind of associative array. The copy is a deep copy for elements which
are not PDF objects, according to $pdf, and a shallow copy for those
that are. Notice that calling copy on an object forces at least a
one-level copy even if it is a PDF object. The returned object loses
its PDF object status, though.

   If $res is defined then the copy goes into that object rather than
creating a new one. It is up to the caller to bless $res, etc. Notice
that elements from $self are not copied into $res if $res already has
an entry for them.

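   The rule that entries already present in $res are left alone can be
illustrated with plain hashes. This is only a sketch under stated
assumptions: `copy_missing' is a hypothetical helper, not part of
Text::PDF, and it copies one level only, so it does not reproduce the
deep-copy behaviour described for copy.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::PDF): copy the entries of %$src
# into %$res, skipping every key that %$res already contains.
sub copy_missing {
    my ($src, $res) = @_;
    $res = {} unless defined $res;          # no target given: start a new hash
    for my $key (keys %$src) {
        next if exists $res->{$key};        # existing entries win
        $res->{$key} = $src->{$key};        # shallow copy of the value
    }
    return $res;
}

my %src = (Type => 'Font', Name => 'F1');
my %res = (Name => 'Custom');
copy_missing(\%src, \%res);

print join(', ', map { "$_=$res{$_}" } sort keys %res), "\n";
```

   Here `Name' keeps the value the caller put into the target, while
`Type' is filled in from the source.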

File: pm.info, Node: Text/PDF/Page, Next: Text/PDF/Pages, Prev: Text/PDF/Objind, Up: Module List

Represents a PDF page, inherits from *Note Text/PDF/Pages: Text/PDF/Pages,
**************************************************************************

NAME
====

   Text::PDF::Page - Represents a PDF page, inherits from
*Note Text/PDF/Pages: Text/PDF/Pages,

DESCRIPTION
===========

   Represents a page of output in PDF. It also keeps track of the
content stream, any resources (such as fonts) being switched, etc.

   Page inherits from Pages because the two share a number of methods,
although they are structurally quite different.

INSTANCE VARIABLES
==================

   A page has various working variables:

curstrm
     The currently open stream.

METHODS
=======

Text::PDF::Page->new($pdf, $parent, $index)
-------------------------------------------

   Creates a new page based on a pages object (perhaps the root
object).

   The page is also added to the parent at this point, so pages are
ordered in a PDF document in the order in which they are created rather
than in the order they are closed.

   Only the essential elements in the page dictionary are created here;
all others are either optional or can be inherited.

   The optional index value indicates the index in the parent list at
which this page should be inserted (so that new pages need not be
appended).

$p->add($str)
-------------

   Adds the string to the currently active stream for this page. If no
stream exists, then one is created and added to the list of streams for
this page.

   The slightly cryptic name is an attempt to keep it short, given the
number of times people are likely to have to type it.

$p->ship_out($pdf)
------------------

   Ships the page out to the given output file context.


File: pm.info, Node: Text/PDF/Pages, Next: Text/PDF/SFont, Prev: Text/PDF/Page, Up: Module List

a PDF pages hierarchical element.
Inherits from *Note Text/PDF/Dict: Text/PDF/Dict,
***********************************************************************************

NAME
====

   Text::PDF::Pages - a PDF pages hierarchical element. Inherits from
*Note Text/PDF/Dict: Text/PDF/Dict,

DESCRIPTION
===========

   A Pages object is the parent to other pages objects or to page
objects themselves.

METHODS
=======

Text::PDF::Pages->new($parent)
------------------------------

   This creates a new Pages object. Notice that $parent here is not the
file context for the object but the parent pages object for this pages.
If we are using this class to create a root node, then $parent should
point to the file context, which is identified by not having a Type of
Pages.

$p->out_obj($isnew)
-------------------

   Tells all the files that this thing is destined for that they should
output this object come time to output. If this object has no parent,
then it must be the root. So set as the root for the files in question
and tell it to be output too. If $isnew is set, then call new_obj rather
than out_obj to create as a new object in the file.

$p->find_prop($key)
-------------------

   Searches up through the inheritance tree to find a property.

$p->add_font($pdf, $font)
-------------------------

   Creates or edits the resource dictionary at this level in the
hierarchy. If the font is already supported even through the hierarchy,
then it is not added.

$p->bbox($xmin, $ymin, $xmax, $ymax, [$param])
----------------------------------------------

   Specifies the bounding box for this and all child pages. If the
values are identical to those inherited then no change is made. $param
specifies the attribute name so that other 'bounding box'es can be set
with this method.

$p->proc_set(@entries)
----------------------

   Ensures that the current resource contains all the entries in the
proc_sets listed. If necessary it creates a local resource dictionary to
achieve this.

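   The upward search that find_prop is described as performing can be
sketched with plain hashes. The assumptions here are not taken from the
module: each node is an ordinary hash with a `Parent' link, and
`find_inherited' is a hypothetical stand-in, not the Text::PDF::Pages
method itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an inherited-property lookup: walk from the
# given node towards the root and return the first value found for $key.
sub find_inherited {
    my ($node, $key) = @_;
    while (defined $node) {
        return $node->{$key} if exists $node->{$key};
        $node = $node->{Parent};            # undef at the root ends the loop
    }
    return;                                 # property is not set anywhere
}

# A three-level tree: root pages node, intermediate pages node, one page.
my $root  = { MediaBox => [0, 0, 595, 842] };
my $pages = { Parent => $root };
my $page  = { Parent => $pages, Rotate => 90 };

my $box = find_inherited($page, 'MediaBox');
print "MediaBox: @$box\n";
print "Rotate: ", find_inherited($page, 'Rotate'), "\n";
```

   The page defines `Rotate' itself, but has to climb two levels to find
`MediaBox'; this is the situation in which bbox and add_font can avoid
writing a value that is already inherited.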

File: pm.info, Node: Text/PDF/SFont, Next: Text/PDF/String, Prev: Text/PDF/Pages, Up: Module List

PDF Standard inbuilt font resource object. Inherits from *Note Text/PDF/Dict: Text/PDF/Dict,
********************************************************************************************

NAME
====

   Text::PDF::SFont - PDF Standard inbuilt font resource object.
Inherits from *Note Text/PDF/Dict: Text/PDF/Dict,

METHODS
=======

Text::PDF::SFont->new($parent, $name, $pdfname)
-----------------------------------------------

   Creates a new font object with the given parent and name. The name
must be one of the 14 core base fonts included with PDF. These are:

     Courier,     Courier-Bold,   Courier-Oblique,   Courier-BoldOblique
     Times-Roman, Times-Bold,     Times-Italic,      Times-BoldItalic
     Helvetica,   Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique
     Symbol,      ZapfDingbats

   The $pdfname is the name by which this particular font object will
be referenced throughout the PDF file. If you want to play silly games
with naming, then you can write the code to do it!

   All fonts in this system are full PDF objects.

BUGS
====

   Currently there is no width support for Symbol or ZapfDingbats; I
haven't got my head around the AFMs yet.

   MacExpertEncoding is not supported yet (I don't have the width info
for any of the fonts).

$f->width($text)
----------------

   Returns the width of the text in em.

$f->trim($text, $len)
---------------------

   Trims the given text to the given length (in per mille em),
returning the trimmed text.

$f->out_text($text)
-------------------

   Acknowledges the text to be output for subsetting purposes, etc.


File: pm.info, Node: Text/PDF/String, Next: Text/PDF/TTFont, Prev: Text/PDF/SFont, Up: Module List

PDF String type objects and superclass for simple objects that are basically stringlike (Number, Name, etc.)
************************************************************************************************************

NAME
====

   Text::PDF::String - PDF String type objects and superclass for
simple objects that are basically stringlike (Number, Name, etc.)

METHODS
=======

Text::PDF::String->from_pdf($string)
------------------------------------

   Creates a new string object (not a full object yet) from a given
string. The string is parsed according to input criteria with escaping
working.

Text::PDF::String->new($string)
-------------------------------

   Creates a new string object (not a full object yet) from a given
string. The string is parsed according to input criteria with escaping
working.

$s->convert($str)
-----------------

   Returns $str converted as per the criteria for input from a PDF
file.

$s->val
-------

   Returns the value of this string (the string itself).

$s->as_pdf
----------

   Returns the string formatted for output as PDF.

$s->outobjdeep
--------------

   Outputs the string in PDF format, complete with necessary
conversions.


File: pm.info, Node: Text/PDF/TTFont, Next: Text/PDF/TTFont0, Prev: Text/PDF/String, Up: Module List

Inherits from *Note Text/PDF/Dict: Text/PDF/Dict, and represents a TrueType font within a PDF file.
***************************************************************************************************

NAME
====

   Text::PDF::TTFont - Inherits from *Note Text/PDF/Dict: Text/PDF/Dict,
and represents a TrueType font within a PDF file.

DESCRIPTION
===========

   A font consists of two primary parts in a PDF file: the header and
the font descriptor. Whilst two fonts may share font descriptors, they
will have their own header dictionaries including encoding and width
information.

INSTANCE VARIABLES
==================

   There are no instance variables beyond the variables which directly
correspond to entries in the appropriate PDF dictionaries.

METHODS
=======

Text::PDF::TTFont->new($parent, $fontfname, $pdfname, %opts)
------------------------------------------------------------

   Creates a new font resource for the given fontfile. This includes
the font descriptor and the font stream. The $pdfname is the name by
which this font resource will be known throughout a particular PDF
file.

   All font resources are full PDF objects.

$t->width($text)
----------------

   Measures the width of the given text according to the widths in the
font.

$t->trim($text, $len)
---------------------

   Trims the given text to the given length (in per mille em),
returning the trimmed text.

$t->out_text($text)
-------------------

   Indicates to the font that the text is to be output and returns the
text to be output.

$f->copy
--------

   Copies the font object, excluding the name, widths, encoding, etc.

TITLE
=====

   Text::PDF::TTIOString - internal IO type handle for string output
for font embedding. This code is taken from IO::Scalar, to avoid a
direct dependency for so little functionality. See IO::Scalar for
details.


File: pm.info, Node: Text/PDF/TTFont0, Next: Text/PDF/Utils, Prev: Text/PDF/TTFont, Up: Module List

Inherits from `PDF::Dict' in this node and represents a TrueType Type 0 font within a PDF file.
***********************************************************************************************

NAME
====

   Text::PDF::TTFont0 - Inherits from `PDF::Dict' in this node and
represents a TrueType Type 0 font within a PDF file.

DESCRIPTION
===========

   A font consists of two primary parts in a PDF file: the header and
the font descriptor. Whilst two fonts may share font descriptors, they
will have their own header dictionaries including encoding and width
information.

INSTANCE VARIABLES
==================

   There are no instance variables beyond the variables which directly
correspond to entries in the appropriate PDF dictionaries.

METHODS
=======

Text::PDF::TTFont0->new($parent, $fontfname, $pdfname)
------------------------------------------------------

   Creates a new font resource for the given fontfile. This includes
the font descriptor and the font stream. The $pdfname is the name by
which this font resource will be known throughout a particular PDF
file.

   All font resources are full PDF objects.

out_text($text)
---------------

   Returns the string to be put into a content stream for text to be
output in this font. The text is assumed to be UTF8 encoded and the
returned string is a glyph sequence for the text. If subsetting is
enabled, then all the glyphs returned are also marked for output.

width($text)
------------

   Returns the width of the string, assuming it to be UTF8 encoded.

outobjdeep($fh, $pdf)
---------------------

   Handles the creation of the font stream, including subsetting, at
this point. So once you get this far, no further glyphs can be added
to the subset.

ship_out($pdf)
--------------

   Ships this font out to the given $pdf file context.

empty
-----

   Empties the font of as much as possible in order to save memory.


File: pm.info, Node: Text/PDF/Utils, Next: Text/ParseWords, Prev: Text/PDF/TTFont0, Up: Module List

Utility functions for PDF library
*********************************

NAME
====

   Text::PDF::Utils - Utility functions for PDF library

DESCRIPTION
===========

   A set of utility functions to save the fingers of the PDF library
users!

FUNCTIONS
=========

PDFBool
-------

   Creates a Bool via Text::PDF::Bool->new

PDFArray
--------

   Creates an array via Text::PDF::Array->new

PDFDict
-------

   Creates a dict via Text::PDF::Dict->new

PDFName
-------

   Creates a name via Text::PDF::Name->new

PDFNum
------

   Creates a number via Text::PDF::Number->new

PDFStr
------

   Creates a string via Text::PDF::String->new

asPDFBool
---------

   Returns a boolean value in PDF output form

asPDFStr
--------

   Returns a string in PDF output form (including () or <>)

asPDFName
---------

   Returns a Name in PDF output form (including /)

asPDFNum
--------

   Returns a number in PDF output form

unpacku($str)
-------------

   Returns a list of Unicode values for the given UTF8 string


File: pm.info, Node: Text/ParseWords, Next: Text/Query, Prev: Text/PDF/Utils, Up: Module List

parse text into an array of tokens or array of arrays
*****************************************************

NAME
====

   Text::ParseWords - parse text into an array of tokens or array of
arrays

SYNOPSIS
========

       use Text::ParseWords;
       @lists = &nested_quotewords($delim, $keep, @lines);
       @words = &quotewords($delim, $keep, @lines);
       @words = &shellwords(@lines);
       @words = &parse_line($delim, $keep, $line);
       @words = &old_shellwords(@lines); # DEPRECATED!

DESCRIPTION
===========

   The &nested_quotewords() and &quotewords() functions accept a
delimiter (which can be a regular expression) and a list of lines and
then break those lines up into a list of words, ignoring delimiters
that appear inside quotes. &quotewords() returns all of the tokens in a
single long list, while &nested_quotewords() returns a list of token
lists corresponding to the elements of @lines. &parse_line() does
tokenizing on a single string. The &*quotewords() functions simply call
&parse_line(), so if you're only splitting one line you can call
&parse_line() directly and save a function call.

   The $keep argument is a boolean flag.
If true, then the tokens are split on the specified delimiter, but all
other characters (quotes, backslashes, etc.) are kept in the tokens. If
$keep is false then the &*quotewords() functions remove all quotes and
backslashes that are not themselves backslash-escaped or inside of
single quotes (i.e., &quotewords() tries to interpret these characters
just like the Bourne shell). NB: these semantics are significantly
different from the original version of this module shipped with Perl
5.000 through 5.004. As an additional feature, $keep may be the keyword
"delimiters", which causes the functions to preserve the delimiters in
each string as tokens in the token lists, in addition to preserving
quote and backslash characters.

   &shellwords() is written as a special case of &quotewords(), and it
does token parsing with whitespace as a delimiter, similar to most Unix
shells.

EXAMPLES
========

   The sample program:

       use Text::ParseWords;
       @words = &quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});
       $i = 0;
       foreach (@words) {
           print "$i: <$_>\n";
           $i++;
       }

   produces:

       0: <this>
       1: <is>
       2: <a test>
       3: <of quotewords>
       4: <"for>
       5: <you>

   demonstrating:

  1. a simple word

  2. multiple spaces are skipped because of our $delim

  3. use of quotes to include a space in a word

  4. use of a backslash to include a space in a word

  5. use of a backslash to remove the special meaning of a double-quote

  6. another simple word (note the lack of effect of the backslashed
     double-quote)

   Replacing `&quotewords('\s+', 0, q{this is...})' with
`&shellwords(q{this is...})' is a simpler way to accomplish the same
thing.

AUTHORS
=======

   Maintainer is Hal Pomeranz, 1994-1997 (original author unknown).
Much of the code for &parse_line() (including the primary regexp) is
from Joerk Behrends.

   Examples section and other documentation provided by John Heidemann.

   Bug reports, patches, and nagging provided by lots of folks; thanks
everybody!
Special thanks to Michael Schwern for assuring me that a
&nested_quotewords() would be useful, and to Jeff Friedl for telling me
not to worry about error-checking (sort of; you had to be there).


File: pm.info, Node: Text/Query, Next: Text/Query/Advanced, Prev: Text/ParseWords, Up: Module List

Query processing framework
**************************

NAME
====

   Text::Query - Query processing framework

SYNOPSIS
========

       use Text::Query;

       # Constructor
       $query = Text::Query->new([QSTRING] [OPTIONS]);

       # Methods
       $query->prepare(QSTRING [OPTIONS]);
       $query->match([TARGET]);
       $query->matchscalar([TARGET]);

DESCRIPTION
===========

   This module provides an object that matches a data source against a
query expression.

   Query expressions are compiled into an internal form when a new
object is created or the prepare method is called; they are not
recompiled on each match.

   The class provided by this module uses four packages to process the
query. The query parser parses the question and calls a query
expression builder, which produces the internal form of the question.
The optimizer is then called to reduce the complexity of the
expression. The solver applies the expression to a data source.

   The following parsers are provided:

     Text::Query::ParseAdvanced
     Text::Query::ParseSimple

   The following builders are provided:

     Text::Query::BuildAdvancedString
     Text::Query::BuildSimpleString

   The following solvers are provided:

     Text::Query::SolveSimpleString
     Text::Query::SolveAdvancedString

EXAMPLES
========

       use Text::Query;
       my $q=new Text::Query('hello and world',
                             -parse => 'Text::Query::ParseAdvanced',
                             -solve => 'Text::Query::SolveAdvancedString',
                             -build => 'Text::Query::BuildAdvancedString');
       die "bad query expression" if not defined $q;
       print if $q->match;
       ...
       $q->prepare('goodbye or adios or ta ta',
                   -litspace => 1,
                   -case => 1);
       # requires single space between the two ta's
       if ($q->match($line)) {
           # doesn't match "Goodbye"
           ...
       }
       $q->prepare('"and" or "or"');
       # quoting operators for literal match
       ...
       $q->prepare('\\bintegrate\\b', -regexp => 1);
       # won't match "disintegrated"

CONSTRUCTOR
===========

new ([QSTRING] [OPTIONS])
     This is the constructor for a new Text::Query object. If a
     `QSTRING' is given it will be compiled to internal form.

     OPTIONS are passed in a hash-like fashion, using key and value
     pairs. Possible options are:

     *-parse* - Package name of the parser. Default is
     Text::Query::ParseSimple.

     *-build* - Package name of the builder. Default is
     Text::Query::Build.

     *-optimize* - Package name of the optimizer. Default is
     Text::Query::Optimize.

     *-solve* - Package name of the solver. Default is
     Text::Query::Solve.

     *-mode* - Name of a predefined group of packages to use. Options
     are currently `simple_text' and `advanced_text'.

     These options are handled by the configure method.

     All other options are passed to the parser's prepare function. See
     the corresponding manual pages for a description.

     If `QSTRING' is undefined, the prepare function is not called.

     The constructor will croak if a `QSTRING' was supplied and had
     illegal syntax.

METHODS
=======

configure ([OPTIONS])
     Sets the parse, build, optimize or solve packages. See the
     CONSTRUCTOR description for explanations.

prepare (QSTRING [OPTIONS])
     Compiles the query expression in `QSTRING' to internal form and
     sets any options (same as in the constructor). prepare may be used
     to change the query expression and options for an existing query
     object. If OPTIONS are omitted, any options set by a previous call
     to prepare remain in effect.

     The optimizer (-optimize) is called with the result of the parser
     (-parse). The parser uses the builder (-build) to construct the
     internal form.

     This method returns a reference to the query object if the syntax
     of the expression was legal, or croaks if not.

match ([TARGET])
     Calls the match method of the solver (-solve).

matchscalar ([TARGET])
     Calls the matchscalar method of the solver (-solve).

SEE ALSO
========

   Text::Query::ParseAdvanced(3), Text::Query::ParseSimple(3),
Text::Query::BuildSimpleString(3), Text::Query::BuildAdvancedString(3),
Text::Query::SolveSimpleString(3), Text::Query::SolveAdvancedString(3),

   Text::Query::Build(3), Text::Query::Parse(3), Text::Query::Solve(3),
Text::Query::Optimize(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info, Node: Text/Query/Advanced, Next: Text/Query/Build, Prev: Text/Query, Up: Module List

Match text against Boolean expression
*************************************

NAME
====

   Text::Query::Advanced - Match text against Boolean expression

SYNOPSIS
========

       use Text::Query::Advanced;

       # Constructor
       $query = Text::Query::Advanced->new([QSTRING] [OPTIONS]);

       # Methods
       $query->prepare(QSTRING [OPTIONS]);
       $query->match([TARGET]);
       $query->matchscalar([TARGET]);

       # Methods that can be overridden to produce custom query trees, etc.
       $query->build_final_expression(Q1);
       $query->build_expression(Q1,Q2);
       $query->build_expression_finish(Q1);
       $query->build_conj(Q1,Q2,F);
       $query->build_near(Q1,Q2);
       $query->build_concat(Q1,Q2);
       $query->build_negation(Q1);
       $query->build_literal(Q1);

DESCRIPTION
===========

   This module provides an object that matches a string or list of
strings against a Boolean query expression similar to an AltaVista
"advanced query". Elements of the query expression may be regular
expressions or literal text.

   Query expressions are compiled into an internal form (currently, a
regular expression making use of most of the tricks listed in Recipe
6.17 of _The Perl Cookbook_) when a new object is created or the
prepare method is called; they are not recompiled on each match.

   The class provided by this module may be subclassed to produce query
processors that match against input other than literal strings, e.g.
indices.

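   The idea of expressing Boolean operators as a single regular
expression can be sketched by hand. The patterns below are written for
illustration only and are assumptions, not the regexps that the module
generates: AND is expressed with positive lookaheads, NOT with a
negative lookahead, OR with alternation, and NEAR as a bounded run of
intervening words (at most two here, in one word order only).

```perl
use strict;
use warnings;

# Hand-written patterns for illustration; not the module's generated output.
my %query = (
    'hello AND world'     => qr/^(?=.*hello)(?=.*world)/is,
    'hello OR world'      => qr/hello|world/i,
    'hello AND NOT world' => qr/^(?=.*hello)(?!.*world)/is,
    'hello NEAR world'    => qr/\bhello\W+(?:\w+\W+){0,2}world\b/i,
);

for my $text ('Hello, big wide world', 'hello there') {
    for my $name (sort keys %query) {
        my $verdict = $text =~ $query{$name} ? 'match' : 'no match';
        printf "%-20s against %-25s %s\n", $name, "'$text'", $verdict;
    }
}
```

   Anchoring the lookaheads at `^' means each one scans the whole
string from the start, so the order in which the words appear does not
matter for AND.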
   Query expressions consist of literal strings (or regexps) joined by
the following operators, in order of precedence from lowest to highest:

     OR, |
     AND, &
     NEAR
     NOT, !

   Operator names are not case-sensitive. Note that if you want to use
a | in a regexp, you need to backwhack it (escape it with a backslash)
to keep it from being seen as a query operator. Sub-expressions may be
quoted in single or double quotes to match "and," "or," or "not"
literally and may be grouped in parentheses (`(' and `)') to alter the
precedence of evaluation.

   A parenthesized sub-expression may also be concatenated with other
sub-expressions to match sequences: `(Perl or Python) interpreter'
would match either "Perl interpreter" or "Python interpreter".
Concatenation has a precedence higher than NOT but lower than AND.
Juxtaposition of simple words has the highest precedence of all.

EXAMPLES
========

       use Text::Query::Advanced;
       my $q=new Text::Query::Advanced('hello and world');
       die "bad query expression" if not defined $q;
       print if $q->match;
       ...
       $q->prepare('goodbye or adios or ta ta',-litspace=>1,-case=>1);
       # requires single space between the two ta's
       if ($q->match($line)) {
           # doesn't match "Goodbye"
           ...
       }
       $q->prepare('"and" or "or"');
       # quoting operators for literal match
       ...
       $q->prepare('\\bintegrate\\b',-regexp=>1);
       # won't match "disintegrated"

CONSTRUCTOR
===========

new ([QSTRING] [OPTIONS])
     This is the constructor for a new Text::Query object. If a
     `QSTRING' is given it will be compiled to internal form.

     OPTIONS are passed in a hash-like fashion, using key and value
     pairs. Possible options are:

     *-case* - If true, do case-sensitive match.

     *-litspace* - If true, match spaces (except between operators) in
     `QSTRING' literally. If false, match spaces as `\s+'.

     *-near* - Sets the number of words that can occur between two
     expressions and still satisfy the NEAR operator. The default is
     10.

     *-regexp* - If true, treat patterns in `QSTRING' as regular
     expressions rather than literal text.

     *-whole* - If true, match whole words only, not substrings of
     words.

     The constructor will return undef if a `QSTRING' was supplied and
     had illegal syntax.

METHODS
=======

prepare (QSTRING [OPTIONS])
     Compiles the query expression in `QSTRING' to internal form and
     sets any options (same as in the constructor). prepare may be used
     to change the query expression and options for an existing query
     object. If OPTIONS are omitted, any options set by a previous call
     to the constructor or prepare remain in effect.

     This method returns a reference to the query object if the syntax
     of the expression was legal, or undef if not.

match ([TARGET])
     If `TARGET' is a scalar, match returns a true value if the string
     specified by `TARGET' matches the query object's query expression.
     If `TARGET' is not given, the match is made against $_.

     If `TARGET' is an array, match returns a (possibly empty) list of
     all matching elements. If the elements of the array are references
     to sub-arrays, the match is done against the first element of each
     sub-array. This allows arbitrary information (e.g. filenames) to
     be associated with each string to match.

     If `TARGET' is a reference to an array, match returns a reference
     to a (possibly empty) list of all matching elements.

matchscalar ([TARGET])
     Behaves just like match when `TARGET' is a scalar or is not given.
     Slightly faster than match under these circumstances.

CODE-GENERATION METHODS
=======================

   The following methods are used to generate regexps based on query
elements. They may be overridden to generate other forms of matching
code, such as trees to be used by a custom version of match that
evaluates index lists or the like.

   All these methods return a scalar corresponding to the code that
performs the specified operation. As supplied, they return regexp
strings, but overridden methods could return objects, array references,
etc.

   Parameters Q1 and Q2 are the same type of scalar as the return
values.

build_final_expression(Q1)
     Does any final processing to generate code to match a top-level
     expression. As supplied, optionally adds case-insensitivity code
     and then uses `qr//' to compile the regexp. The return value will
     be stored in the object's `matchexp' field. It is NOT necessarily
     of a type that can be passed to the other code-generation methods.

build_expression(Q1,Q2)
     Generate code to match `Q1' OR `Q2'.

build_expression_finish(Q1)
     Generate any code needed to enclose an expression. As supplied,
     encloses the generated regexp in non-capturing parentheses.

build_conj(Q1,Q2,F)
     Generate code needed to match `Q1' AND `Q2'. F will be true if
     this is the first time this method is called in a sequence of
     several conjunctions (the supplied method uses this to factor a
     common ^ out of the generated sub-expressions, which greatly
     speeds up matching).

build_near(Q1,Q2)
     Generate code needed to match `Q1' NEAR `Q2'.

build_concat(Q1,Q2)
     Generate code needed to match `Q1' immediately followed by `Q2'.

build_negation(Q1)
     Generate code needed to match NOT `Q1'.

build_literal(Q1)
     Generate code to match `Q1' as a literal.

AUTHOR
======

   Eric Bohlman (ebohlman@netcom.com)

CREDITS
=======

   The parse_tokens routine was adapted from the parse_line routine in
Text::ParseWords.

COPYRIGHT
=========

   Copyright (c) 1998-1999 Eric Bohlman. All rights reserved. This
program is free software; you can redistribute and/or modify it under
the same terms as Perl itself.


File: pm.info, Node: Text/Query/Build, Next: Text/Query/BuildAdvancedString, Prev: Text/Query/Advanced, Up: Module List

Base class for query builders
*****************************

NAME
====

   Text::Query::Build - Base class for query builders

SYNOPSIS
========

       package Text::Query::BuildMy;

       use Text::Query::Build;

       use vars qw(@ISA);

       @ISA = qw(Text::Query::Build);

DESCRIPTION
===========

   This module provides a virtual base class for query builders.

   Query builders are called by the parser logic.
The builder provides a fixed set of functions that correspond to the
operations of Boolean logic. All the methods return a scalar
corresponding to the code that performs the specified operation.

   Parameters Q1 and Q2 are the same type of scalar as the return
values.

METHODS
=======

matchstring()
     Returns a string that represents the last built expression. Two
     identical expressions should generate the same string. This is
     intended for testing purposes.

CODE-GENERATION METHODS
=======================

build_init()
     Called before building the expression. A chance to initialize
     object data.

build_final_expression(Q1)
     Does any final processing to generate code to match a top-level
     expression. The return value is NOT necessarily of a type that can
     be passed to the other code-generation methods.

build_expression(Q1,Q2)
     Generate code to match `Q1' OR `Q2'.

build_expression_finish(Q1)
     Generate any code needed to enclose an expression.

build_conj(Q1,Q2,F)
     Generate code needed to match `Q1' AND `Q2'. F will be true if
     this is the first time this method is called in a sequence of
     several conjunctions.

build_near(Q1,Q2)
     Generate code needed to match `Q1' NEAR `Q2'.

build_concat(Q1,Q2)
     Generate code needed to match `Q1' immediately followed by `Q2'.

build_negation(Q1)
     Generate code needed to match NOT `Q1'.

build_literal(Q1)
     Generate code to match `Q1' as a literal.

build_scope_start($scope)
     Generate code to enter the `$scope' query context.

build_scope_end($scope,Q1)
     Generate code needed to match `Q1' in the `$scope' context.

build_mandatory(Q1)
     Generate code to match `Q1' (think + in AltaVista syntax).

build_forbiden(Q1)
     Generate code to match NOT `Q1' (think - in AltaVista syntax).

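   Because the build_* methods only have to return some scalar, a
builder does not need to produce regexps at all. The following is a
minimal sketch of a builder that returns array references forming a
tree, plus a small evaluator for that tree. Everything in it is an
assumption made for illustration: `My::TreeBuilder' and `evaluate' are
hypothetical names, only five of the methods are shown, and the class
deliberately does not inherit from Text::Query::Build so that the
sketch runs without the module installed. A real builder would set
@ISA as shown in the SYNOPSIS.

```perl
use strict;
use warnings;

package My::TreeBuilder;    # hypothetical class name

# Each method returns an array reference: [ node type, operands... ].
sub new                    { return bless {}, shift }
sub build_literal          { my ($self, $q1) = @_;      return [ 'lit', $q1 ] }
sub build_negation         { my ($self, $q1) = @_;      return [ 'not', $q1 ] }
sub build_expression       { my ($self, $q1, $q2) = @_; return [ 'or',  $q1, $q2 ] }
sub build_conj             { my ($self, $q1, $q2, $first) = @_; return [ 'and', $q1, $q2 ] }
sub build_final_expression { my ($self, $q1) = @_;      return $q1 }

package main;

# Hypothetical solver: evaluate a tree against a string, using a plain
# substring test for literals.
sub evaluate {
    my ($node, $text) = @_;
    my ($op, @args) = @$node;
    return index($text, $args[0]) >= 0 if $op eq 'lit';
    return !evaluate($args[0], $text)  if $op eq 'not';
    return evaluate($args[0], $text) || evaluate($args[1], $text) if $op eq 'or';
    return evaluate($args[0], $text) && evaluate($args[1], $text) if $op eq 'and';
    die "unknown node type '$op'";
}

# Build the tree for: hello AND NOT world
my $builder = My::TreeBuilder->new;
my $tree = $builder->build_final_expression(
    $builder->build_conj(
        $builder->build_literal('hello'),
        $builder->build_negation($builder->build_literal('world')),
        1,    # F: first conjunction in the sequence
    )
);

for my $text ('hello there', 'hello world') {
    my $verdict = evaluate($tree, $text) ? 'match' : 'no match';
    print "$text: $verdict\n";
}
```

   In this sketch the calls that a parser would normally make are
written out by hand, which shows the order in which the methods are
expected to be combined.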
SEE ALSO
========

   Text::Query(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info, Node: Text/Query/BuildAdvancedString, Next: Text/Query/BuildSimpleString, Prev: Text/Query/Build, Up: Module List

Builder for Text::Query::ParseAdvanced to build regexps
*******************************************************

NAME
====

   Text::Query::BuildAdvancedString - Builder for
Text::Query::ParseAdvanced to build regexps

SYNOPSIS
========

       use Text::Query;
       my $q=new Text::Query('hello and world',
                             -parse => 'Text::Query::ParseAdvanced',
                             -solve => 'Text::Query::SolveAdvancedString',
                             -build => 'Text::Query::BuildAdvancedString');

DESCRIPTION
===========

   Build a regexp to match the advanced query parsed by
Text::Query::ParseAdvanced. The words of the query can be regular
expressions and will provide the expected result if the `-regexp'
option is set.

SEE ALSO
========

   Text::Query(3) Text::Query::Build(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info, Node: Text/Query/BuildSimpleString, Next: Text/Query/Optimize, Prev: Text/Query/BuildAdvancedString, Up: Module List

Builder for Text::Query::ParseSimple to build regexps
*****************************************************

NAME
====

   Text::Query::BuildSimpleString - Builder for Text::Query::ParseSimple
to build regexps

SYNOPSIS
========

       use Text::Query;
       my $q=new Text::Query('+hello +world',
                             -parse => 'Text::Query::ParseSimple',
                             -solve => 'Text::Query::SolveSimpleString',
                             -build => 'Text::Query::BuildSimpleString');

DESCRIPTION
===========

   Build a regexp to match the simple query parsed by
Text::Query::ParseSimple. The words of the query can be regular
expressions and will provide the expected result if the `-regexp'
option is set.

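   As a rough illustration of how a simple query such as
`+hello +world' could be turned into a regexp, the sketch below maps
each `+' term to a positive lookahead and each `-' term to a negative
lookahead. `simple_query_regex' is a hypothetical helper written for
this example; it is not the builder's actual algorithm, it always
matches case-insensitively, and it treats unprefixed terms as optional
by simply ignoring them.

```perl
use strict;
use warnings;

# Hypothetical helper: compile "+must -mustnot" style terms into a regexp.
sub simple_query_regex {
    my ($query) = @_;
    my $pattern = '^';
    for my $term (split ' ', $query) {
        if (my ($must) = $term =~ /^\+(.+)/) {
            $pattern .= '(?=.*' . quotemeta($must) . ')';       # mandatory term
        }
        elsif (my ($must_not) = $term =~ /^-(.+)/) {
            $pattern .= '(?!.*' . quotemeta($must_not) . ')';   # forbidden term
        }
        # Unprefixed terms add no constraint in this sketch.
    }
    return qr/$pattern/is;
}

my $re = simple_query_regex('+hello -world');
for my $text ('hello there', 'hello world') {
    my $verdict = $text =~ $re ? 'match' : 'no match';
    print "$text: $verdict\n";
}
```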
SEE ALSO
========

   Text::Query(3) Text::Query::Build(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info, Node: Text/Query/Optimize, Next: Text/Query/Parse, Prev: Text/Query/BuildSimpleString, Up: Module List

Base class for query optimizers
*******************************

NAME
====

   Text::Query::Optimize - Base class for query optimizers

SYNOPSIS
========

     package Text::Query::OptimizeSmart;

     use Text::Query::Optimize;

     use vars qw(@ISA);

     @ISA = qw(Text::Query::Optimize);

DESCRIPTION
===========

   This module provides a virtual base class for query optimizers.  It
defines the optimize method that is called by the `Text::Query' object to
optimize the internal query.

METHODS
=======

optimize (INTERNAL)
     Returns the INTERNAL argument after optimization.  The default
     implementation returns the argument untouched.

SEE ALSO
========

   Text::Query(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info, Node: Text/Query/Parse, Next: Text/Query/ParseAdvanced, Prev: Text/Query/Optimize, Up: Module List

Base class for query parsers
****************************

NAME
====

   Text::Query::Parse - Base class for query parsers

SYNOPSIS
========

     package Text::Query::ParseThisSyntax;

     use Text::Query::Parse;

     use vars qw(@ISA);

     @ISA = qw(Text::Query::Parse);

DESCRIPTION
===========

   This module provides a virtual base class for query parsers.  It
defines the prepare method that is called by the `Text::Query' object to
compile the query string.

MEMBERS
=======

*-build*
     Pointer to a Text::Query::Build object.

scope
     Scope stack.  Defines the context in which the query must be solved.

token
     The current token.  Destroyed by prepare.

tokens
     A reference to the list of all the tokens.  Filled by parse_tokens.
     Destroyed by prepare.

parseopts
     A reference to a hash table containing all the parameters given to
     the prepare function.

*-verbose*
     Integer indicating the desired verbose level.
METHODS
=======

prepare (QSTRING [OPTIONS])
     Compiles the query expression in `QSTRING' to internal form and sets
     any options.  First calls `build_init' to reset the builder and
     destroy the token and tokens members.  Then calls parse_tokens to
     fill the tokens member.  Then calls expression to use the tokens
     from tokens.  The expression is expected to call the build_*
     functions to build the compiled expression.  Finally calls
     `build_final_expression' with the result of expression.

     A derived parser must redefine this function to define default
     values for specific options.

expression ()
     Must be redefined by derived package.  Returns the internal form of
     the question built from build_* functions using the tokens.

parse_tokens (QSTRING)
     Must be redefined by derived package.  Parses the `QSTRING' scalar
     and fills the tokens member with lexical units.

   Shortcuts to the corresponding function of the Text::Query::Build
object found in the `-build' member.

OPTIONS
=======

   These are the options of the prepare method and the constructor.

-quotes defaults to \'\"
     Defines the quote characters.

-case defaults to 0
     If true, do case-sensitive match.

-litspace defaults to 0
     If true, match spaces (except between operators) in `QSTRING'
     literally.  If false, match spaces as `\s+'.

-regexp defaults to 0
     If true, treat patterns in `QSTRING' as regular expressions rather
     than literal text.

-whole defaults to 0
     If true, match whole words only, not substrings of words.
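   One way to read the option descriptions above is as a recipe for
turning a single query term into a regexp.  The sketch below follows that
reading.  The function name `term_to_regex' is hypothetical and this is
not how the module itself is implemented; `-quotes' is left out because
it concerns tokenizing rather than matching.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Query): compile one query term
# into a regexp according to the -regexp, -litspace, -whole and -case
# options as described in the text.
sub term_to_regex {
    my ($term, %opt) = @_;

    # -regexp: keep the term as a pattern, otherwise escape it.
    my $pat = $opt{-regexp} ? $term : quotemeta $term;

    # -litspace: unless set, a run of whitespace in the term matches \s+.
    # quotemeta turns a space into "\ ", hence the optional backslash.
    $pat =~ s/(?:\\?\s)+/\\s+/g unless $opt{-litspace};

    # -whole: require word boundaries so substrings of words do not match.
    $pat = "\\b(?:$pat)\\b" if $opt{-whole};

    # -case: case-sensitive when true, case-insensitive by default.
    return $opt{-case} ? qr/$pat/ : qr/$pat/i;
}

my $loose  = term_to_regex('ta ta');
my $strict = term_to_regex('ta ta', -litspace => 1);

print "default accepts several spaces\n"   if "ta   ta" =~ $loose;
print "-litspace rejects several spaces\n" if "ta   ta" !~ $strict;
```

   All four options default to a false value here, matching the
"defaults to 0" entries in the list.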
SEE ALSO
========

   Text::Query(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info, Node: Text/Query/ParseAdvanced, Next: Text/Query/ParseSimple, Prev: Text/Query/Parse, Up: Module List

Parse AltaVista advanced query syntax
*************************************

NAME
====

   Text::Query::ParseAdvanced - Parse AltaVista advanced query syntax

SYNOPSIS
========

     use Text::Query;
     my $q=new Text::Query('hello and world',
                           -parse => 'Text::Query::ParseAdvanced',
                           -solve => 'Text::Query::SolveAdvancedString',
                           -build => 'Text::Query::BuildAdvancedString');

DESCRIPTION
===========

   This module provides an object that parses a string containing a
Boolean query expression similar to an AltaVista "advanced query".  Its
base class is Text::Query::Parse.

   Query expressions consist of literal strings (or regexps) joined by
the following operators, in order of precedence from lowest to highest:

     OR, |
     AND, &
     NEAR, ~
     NOT, !

   Operator names are not case-sensitive.  Note that if you want to use a
| in a regexp, you need to escape it with a backslash to keep it from
being seen as a query operator.  Sub-expressions may be quoted in single
or double quotes to match "and," "or," or "not" literally and may be
grouped in parentheses (`(, )') to alter the precedence of evaluation.

   A parenthesized sub-expression may also be concatenated with other
sub-expressions to match sequences: `(Perl or Python) interpreter' would
match either "Perl interpreter" or "Python interpreter".  Concatenation
has a precedence higher than NOT but lower than AND.  Juxtaposition of
simple words has the highest precedence of all.

OPTIONS
=======

   These are the additional options of the prepare method and the
constructor.

-near defaults to 10
     Sets the number of words that can occur between two expressions and
     still satisfy the NEAR operator.

-operators defaults to and, or, not, near
     Sets the operator names.
     The argument of the option is a pointer to a hash table mapping the
     default names to desired names.  For instance:

          {
            'or' => 'ou',
            'and' => 'et',
            'near' => 'proche',
            'not' => 'non',
          }

-scope_map defaults to {}
     Map the scope names to other names.  If a scope is specified as
     `scope:' search the map for an entry whose key is scope and replace
     scope with the scalar found.  For instance:

          {
            'scope' => 'otherscope'
          }

SEE ALSO
========

   Text::Query(3) Text::Query::Parse(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info, Node: Text/Query/ParseSimple, Next: Text/Query/Simple, Prev: Text/Query/ParseAdvanced, Up: Module List

Parse AltaVista simple query syntax
***********************************

NAME
====

   Text::Query::ParseSimple - Parse AltaVista simple query syntax

SYNOPSIS
========

     use Text::Query;
     my $q=new Text::Query('hello and world',
                           -parse => 'Text::Query::ParseSimple',
                           -solve => 'Text::Query::SolveSimpleString',
                           -build => 'Text::Query::BuildSimpleString');

DESCRIPTION
===========

   This module provides an object that parses a string containing a
Boolean query expression similar to an AltaVista "simple query".
Elements of the query expression may be assigned weights.  Its base class
is Text::Query::Parse.

   Query expressions are compiled into an internal form when a new object
is created or the prepare method is called; they are not recompiled on
each match.

   Query expressions consist of words (sequences of non-whitespace) or
phrases (quoted strings) separated by whitespace.  Words or phrases
prefixed with a + must be present for the expression to match; words or
phrases prefixed with a - must be absent for the expression to match.

   Words or phrases may optionally be followed by a number in parentheses
(no whitespace is allowed between the word or phrase and the
parenthesized number).  This number specifies the weight given to the
word or phrase.  If a weight is not given, a weight of 1 is assumed.
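   The simple query syntax described above (words or quoted phrases, an
optional + or - prefix, an optional weight in parentheses) can be
sketched with a small tokenizer.  This is an illustration only: the
function name `parse_simple_terms' and the hash keys are hypothetical,
only double quotes are handled, and the module's own parse_tokens routine
may work quite differently.

```perl
use strict;
use warnings;

# Hypothetical tokenizer (not the module's parse_tokens): split a simple
# query into terms, each with a sign ('+', '-' or ''), its text and a
# weight that defaults to 1.  Scanning stops at the first piece of input
# that does not fit the pattern.
sub parse_simple_terms {
    my ($query) = @_;
    my @terms;
    while ($query =~ /\G\s*([+-]?)(?:"([^"]*)"|([^\s()]+))(?:\((\d+)\))?/gc) {
        my ($sign, $phrase, $word, $weight) = ($1, $2, $3, $4);
        push @terms, {
            sign   => $sign,
            text   => defined $phrase ? $phrase : $word,
            weight => defined $weight ? $weight : 1,
        };
    }
    return @terms;
}

for my $term (parse_simple_terms('+hello information(2) -"ta ta"')) {
    printf "sign=[%s] text=[%s] weight=%d\n",
        $term->{sign}, $term->{text}, $term->{weight};
}
```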
EXAMPLES
========

     use Text::Query;
     my $q=new Text::Query('+hello world',
                           -solve => 'Text::Query::SolveSimpleString',
                           -build => 'Text::Query::BuildSimpleString');
     die "bad query expression" if not defined $q;
     $count=$q->match;
     ...
     $q->prepare('goodbye adios -"ta ta"', -litspace=>1);
     #requires single space between the two ta's
     if ($q->match($line, -case=>1)) {
     #doesn't match "Goodbye"
     ...
     $q->prepare('\\bintegrate\\b', -regexp=>1);
     #won't match "disintegrated"
     ...
     $q->prepare('information(2) retrieval');
     #information has twice the weight of retrieval

SEE ALSO
========

   Text::Query(3) Text::Query::Parse(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)


File: pm.info, Node: Text/Query/Simple, Next: Text/Query/Solve, Prev: Text/Query/ParseSimple, Up: Module List

Match text against simple query expression and return relevance value for ranking
*********************************************************************************

NAME
====

   Text::Query::Simple - Match text against simple query expression and
return relevance value for ranking

SYNOPSIS
========

     use Text::Query::Simple;

     # Constructor
     $query = Text::Query::Simple->new([QSTRING] [OPTIONS]);

     # Methods
     $query->prepare(QSTRING [OPTIONS]);
     $query->match([TARGET]);
     $query->matchscalar([TARGET]);

DESCRIPTION
===========

   This module provides an object that tests a string or list of strings
against a query expression similar to an AltaVista "simple query" and
returns a "relevance value."  Elements of the query expression may be
regular expressions or literal text, and may be assigned weights.

   Query expressions are compiled into an internal form when a new object
is created or the prepare method is called; they are not recompiled on
each match.

   Query expressions consist of words (sequences of non-whitespace),
regexps or phrases (quoted strings) separated by whitespace.
Words or phrases prefixed with a + must be present for the expression to
match; words or phrases prefixed with a - must be absent for the
expression to match.

   A successful match returns a count of the number of times any of the
words (except ones prefixed with -) appeared in the text.  This type of
result is useful for ranking documents according to relevance.

   Words or phrases may optionally be followed by a number in parentheses
(no whitespace is allowed between the word or phrase and the
parenthesized number).  This number specifies the weight given to the
word or phrase; it will be added to the count each time the word or
phrase appears in the text.  If a weight is not given, a weight of 1 is
assumed.

EXAMPLES
========

     use Text::Query::Simple;
     my $q=new Text::Query::Simple('+hello world');
     die "bad query expression" if not defined $q;
     $count=$q->match;
     ...
     $q->prepare('goodbye adios -"ta ta"',-litspace=>1);
     #requires single space between the two ta's
     if ($q->match($line,-case=>1)) {
     #doesn't match "Goodbye"
     ...
     $q->prepare('\\bintegrate\\b',-regexp=>1);
     #won't match "disintegrated"
     ...
     $q->prepare('information(2) retrieval');
     #information has twice the weight of retrieval

CONSTRUCTOR
===========

new ([QSTRING] [OPTIONS])
     This is the constructor for a new Text::Query::Simple object.  If a
     `QSTRING' is given it will be compiled to internal form.

     OPTIONS are passed in a hash-like fashion, using key and value
     pairs.  Possible options are:

     *-case* - If true, do case-sensitive match.

     *-litspace* - If true, match spaces (except between operators) in
     `QSTRING' literally.  If false, match spaces as `\s+'.

     *-regexp* - If true, treat patterns in `QSTRING' as regular
     expressions rather than literal text.

     *-whole* - If true, match whole words only, not substrings of words.

     The constructor will return undef if a `QSTRING' was supplied and
     had illegal syntax.
METHODS
=======

prepare (QSTRING [OPTIONS])
     Compiles the query expression in `QSTRING' to internal form and sets
     any options (same as in the constructor).  prepare may be used to
     change the query expression and options for an existing query
     object.  If OPTIONS are omitted, any options set by a previous call
     to the constructor or prepare remain in effect.

     This method returns a reference to the query object if the syntax of
     the expression was legal, or undef if not.

match ([TARGET])
     If `TARGET' is a scalar, match returns the number of words in the
     string specified by `TARGET' that match the query object's query
     expression.  If `TARGET' is not given, the match is made against $_.
     If `TARGET' is an array, match returns a list of references to
     anonymous arrays consisting of each element followed by its match
     count.  The list is sorted in descending order by match count.  If
     the elements of `TARGET' were anonymous arrays, the match count is
     appended to each element.  This allows arbitrary information (such
     as a filename) to be associated with each element.  If `TARGET' is a
     reference to an array, match returns a reference to a sorted list of
     matching items, with counts, for all elements.

matchscalar ([TARGET])
     Behaves just like MATCH when `TARGET' is a scalar or is not given.
     Slightly faster than MATCH under these circumstances.

RESTRICTIONS
============

   This module requires Perl 5.005 or higher due to the use of evaluated
expressions in regexes.

AUTHOR
======

   Eric Bohlman (ebohlman@netcom.com)

CREDITS
=======

   The parse_tokens routine was adapted from the parse_line routine in
Text::ParseWords.

COPYRIGHT
=========

   Copyright (c) 1998 Eric Bohlman.  All rights reserved.  This program
is free software; you can redistribute it and/or modify it under the same
terms as Perl itself.

File: pm.info, Node: Text/Query/Solve, Next: Text/Query/SolveAdvancedString, Prev: Text/Query/Simple, Up: Module List

Base class for query resolution
*******************************

NAME
====

   Text::Query::Solve - Base class for query resolution

SYNOPSIS
========

     package Text::Query::SolveSource;

     use Text::Query::Solve;

     use vars qw(@ISA);

     @ISA = qw(Text::Query::Solve);

DESCRIPTION
===========

   This module provides a virtual base class for query resolution.  It
defines the match and matchscalar methods that are called by the
`Text::Query' object to apply a query on a data source.

METHODS
=======

match (EXPR [TARGET])
     If `TARGET' is a scalar, match returns a true value if the data
     source specified by `TARGET' matches the EXPR query expression.  If
     `TARGET' is not given, the match is made against $_.  If `TARGET' is
     an array, match returns a (possibly empty) list of all matching
     elements.  If the elements of the array are references to
     sub-arrays, the match is done against the first element of each
     sub-array.  This allows arbitrary information (e.g. filenames) to be
     associated with each data source to match.  If `TARGET' is a
     reference to an array, match returns a reference to a (possibly
     empty) list of all matching elements.

matchscalar (EXPR [TARGET])
     Behaves just like MATCH when `TARGET' is a scalar or is not given.

SEE ALSO
========

   Text::Query(3)

AUTHORS
=======

   Eric Bohlman (ebohlman@netcom.com)

   Loic Dachary (loic@senga.org)
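   The different `TARGET' cases of match described above can be sketched
as follows.  This is a rough illustration, not the module's code: the
function name `toy_match' is hypothetical, and EXPR is assumed to be a
compiled regexp, which a real solver need not use.

```perl
use strict;
use warnings;

# Hypothetical sketch of the TARGET handling described in the text.
# Caveat: a one-element list holding a sub-array is indistinguishable
# here from a reference to an array and is treated as the latter.
sub toy_match {
    my ($expr, @target) = @_;

    # Text to test: first element of a sub-array, else the element itself.
    my $text_of = sub { ref($_[0]) eq 'ARRAY' ? $_[0][0] : $_[0] };
    my $hits    = sub { grep { $text_of->($_) =~ $expr } @_ };

    unless (@target) {                      # no TARGET: match against $_
        return $_ =~ $expr ? 1 : 0;
    }
    if (@target == 1) {
        my $t = $target[0];
        return [ $hits->(@$t) ] if ref($t) eq 'ARRAY';   # array reference
        return $t =~ $expr ? 1 : 0;                      # plain scalar
    }
    return $hits->(@target);                # array: list of matches
}

# Sub-arrays carry extra information (here a file name) with each text.
my @docs  = ( [ 'hello world', 'a.txt' ], [ 'goodbye', 'b.txt' ] );
my @found = toy_match(qr/hello/, @docs);
print "$_->[1]\n" for @found;               # prints a.txt
```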