This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.

File: pm.info, Node: Parse/RecDescent/Consumer, Next: Parse/Template, Prev: Parse/RecDescent, Up: Module List

Perl extension for blah blah blah
*********************************

NAME
====

   Parse::RecDescent::Consumer - Perl extension for blah blah blah

SYNOPSIS
========

     use Parse::RecDescent::Consumer;

     # then in a Parse::RecDescent grammar...

     url:
     url: { $C = Consumer($text) } httpurl
          { REBOL::url->new(value => $C->($text)) }
        | { $C = Consumer($text) } ftpurl
          { REBOL::url->new(value => $C->($text)) }

DESCRIPTION
===========

   A common need when writing grammars is to know how much text was
consumed at different points in a parse. Usually this involves a lot of
brain-twisting unwinding of highly nested list-of-lists (of lists...).

   Instead, this module lets you take the low-road approach. You create
a `Consumer', which records the text that is about to be parsed. After
you have successfully transitioned through the desired tokens, you call
your `Consumer' again and it gives you the text that was consumed during
those token transitions, with no nested lists to unravel.

IMPLEMENTATION
==============

   When you first call Consumer(), it returns a closure that holds the
text currently remaining to be parsed. When you later evaluate the
closure, passing it the (more or less consumed) new text, the closure
calculates the difference in length between the two texts and returns
the leading substring of the first text with that length, that is, the
text consumed between the two calls:

     sub Parse::RecDescent::Consumer {
         my $text = shift;
         my $closure = sub {
             my $new_length      = length($_[0]);
             my $original_text   = $text;
             my $original_length = length($text);
             return substr($original_text, 0, ($original_length - $new_length));
         }
     }

EXPORT
------

   None by default.

AUTHOR
======

   A. U. Thor, a.u.thor@a.galaxy.far.far.away

SEE ALSO
========

   perl(1).
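
   The closure technique described under IMPLEMENTATION can be tried
outside a grammar. The following is a minimal sketch, not the module's
own code: `make_consumer' is a hypothetical stand-in for Consumer(), and
the substitution on `$text' merely simulates a parser eating a token
from the front of the input.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Consumer(): remembers the text that is
# still unparsed at the moment it is called.
sub make_consumer {
    my ($before) = @_;
    return sub {
        my ($after) = @_;
        # The consumed text is the prefix of $before that $after lacks.
        return substr($before, 0, length($before) - length($after));
    };
}

my $text     = "http://example.org/ rest of input";
my $consumer = make_consumer($text);

# Simulate the parser consuming one token from the front of $text.
$text =~ s{^\S+}{};

my $consumed = $consumer->($text);
print "$consumed\n";
```

   The sketch relies on the same assumption as the module: the parser
only ever removes text from the front of the input, so the difference in
length identifies the consumed prefix.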
File: pm.info, Node: Parse/Template, Next: Parse/Text, Prev: Parse/RecDescent/Consumer, Up: Module List

Processor for templates containing Perl expressions
***************************************************

NAME
====

   Parse::Template - Processor for templates containing Perl expressions

SYNOPSIS
========

     use Parse::Template;

     my %template =
       (
        'TOP' =>  q!Text before %%$self->eval('DATA')%% text after!,
        'DATA' => q!Insert data: ! .
                  q!1. List: %%"@list$N"%%! .
                  q!2. Hash: %%"$hash{'key_value'}$N"%%! .
                  q!3. File content: %%print <FH>%%! .
                  q!4. Sub: %%&SUB()$N%%!
       );

     my $tmplt = new Parse::Template (%template);
     open FH, "< foo";

     $tmplt->env('var' => '(value!)');
     $tmplt->env('list' => [1, 2, 10],
                 'N' => "\n",
                 'FH' => \*FH,
                 'SUB' => sub { "->content generated by a sub<-" },
                 'hash' => { 'key_value' => q!It\'s an hash value! });
     print $tmplt->eval('TOP'), "\n";

DESCRIPTION
===========

   The `Parse::Template' class evaluates Perl expressions placed within
a text. This class can be used as a code generator, or a generator of
documents in various document formats (HTML, XML, RTF, etc.).

   The principle of template-based text generation is simple. A template
consists of a text which includes expressions to be evaluated.
Interpretation of these expressions generates text fragments which are
substituted in place of the expressions. In the case of
`Parse::Template' the expressions to be evaluated are placed between two
`%%' delimiters.

   Evaluation takes place within an environment in which, for example,
you can place data structures which will serve to generate the parts to
be completed.

                       TEMPLATE
                  Text + Perl Expression
                         |
                         +-----> Evaluation ----> Text(document or program)
                         |
                  Subs + Data structures
                      ENVIRONMENT

   The `Parse::Template' class permits decomposing a template into
parts. These parts are defined by a hash passed as an argument to the
class constructor: `Parse::Template->new('someKey', '... text with
expressions to evaluate ...')'.
   Within a part, a sub-part can be included by means of an expression
of the form:

     $self->eval('SUB_PART_NAME')

   `$self' designates the instance of the `Parse::Template' class. In an
expression you can also use the `$part' variable, which contains the
name of the template part in which the expression is found.

   Within an expression it is possible to specify only the name of a
part to be inserted. In this case a subroutine with the name of this
part is generated dynamically. In the example given in the synopsis, the
`TOP' part, which inserts `DATA', can thus be rewritten as follows:

     'TOP' => q!Text before %%DATA()%% text after!

   `DATA()' is placed within `%%' and is in effect treated as an
expression to be evaluated.

   The subroutines take arguments. In the following example, the
argument is used to control the depth of recursive calls of a template:

     print Parse::Template->new(
       'TOP' => q!%%$_[0] < 10 ? '[' . TOP($_[0] + 1) . ']' : ''%%!
      )->eval('TOP', 0);

   `$_[0]' initially contains 0. `TOP' is included as long as the
argument is less than 10. For each inclusion, 1 is added to the
argument.

   The `env()' method permits constructing the environment required for
evaluation of a template. Each entry to be defined within this
environment must be specified using a key consisting of the name of the
symbol to be created, associated with a reference whose type is that of
the entry to be created within this environment (for example, a
reference to an array to create an array). A scalar variable is defined
by associating the name of the variable with its value. A scalar
variable containing a reference is defined by writing
`'var' => \$variable', where `$variable' is a lexical variable that
contains the reference.

   Each instance of `Parse::Template' is defined within a specific
class, a subclass of `Parse::Template'. The subclass contains the
environment specific to the template and inherits methods from the
`Parse::Template' class.
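
   The principle of evaluating `%%' expressions inside an environment
can be illustrated without the module. The following is a simplified
sketch and not the way `Parse::Template' is implemented: the `expand'
function and the `Sandbox' package are hypothetical names, and sub-parts,
`$self' and `$part' are left out. Each environment entry is installed as
a package variable, then every `%%...%%' span is replaced by the value
of its expression.

```perl
use strict;
use warnings;

# Hypothetical helper: expand %%...%% spans of a template, evaluating
# each expression in package Sandbox, which plays the role of the
# environment.
sub expand {
    my ($template, %env) = @_;
    {
        no strict 'refs';
        for my $name (keys %env) {
            my $value = $env{$name};
            # A reference creates a variable of its own type (array,
            # hash, code); a plain value creates a scalar.
            *{"Sandbox::$name"} = ref $value ? $value : \$value;
        }
    }
    $template =~ s{%%(.*?)%%}{
        my $expr   = $1;
        my $result = eval "package Sandbox; no strict; $expr";
        die "error in template expression '$expr': $@" if $@;
        defined $result ? $result : '';
    }gse;
    return $template;
}

my $out = expand('List: %%"@list"%% sum: %%$list[0] + $list[1]%%',
                 list => [1, 2, 10]);
print "$out\n";    # prints "List: 1 2 10 sum: 3"
```

   Using one package per environment mirrors the idea of the
per-instance subclass described above, although here a single package is
shared by all calls.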
   In case of a syntax error in the evaluation of an expression,
`Parse::Template' tries to indicate the template part and the expression
at fault. If the variable `$Parse::Template::CONFESS' contains the value
TRUE, the stack of evaluations is printed.

METHODS
=======

new HASH
     Constructor for the class. HASH is a hash which defines the
     template text.

     Example:

          use Parse::Template;
          $t = new Parse::Template('key' => 'associated text');

env HASH
env SYMBOL
     Permits defining the environment that is specific to a template.

     `env(SYMBOL)' returns the reference associated with the symbol, or
     undef if the symbol is not defined. The reference that is returned
     is of the type indicated by the character (`&, $, %, @, *') that
     prefixes the symbol.

     Examples:

          $tmplt->env('LIST' => [1, 2, 3])    Defines a list
          @{$tmplt->env('*LIST')}             Returns the list
          @{$tmplt->env('@LIST')}             Ditto

eval PART_NAME
     Evaluates the template part designated by `PART_NAME'. Returns the
     string resulting from this evaluation.

getPart PART_NAME
     Returns the designated part of the template.

ppregexp REGEXP
     Preprocesses a regular expression so that it can be inserted into a
     template where the regular expression delimiter is either a "/" or
     a "!".

setPart PART_NAME => TEXT
     `setPart()' permits defining a new entry in the hash that defines
     the contents of the template.

EXAMPLES
========

   The `Parse::Template' class can be used in all sorts of amusing ways.
Here are a few illustrations.

   The first example shows how to generate an HTML document by using a
data structure placed within the evaluation environment. The template
consists of two parts, DOC and SECTION. The SECTION part is called
within the DOC part to generate as many sections as there are elements
in the array `section_content'.
     my %template = ('DOC' => <<'END_OF_DOC;', 'SECTION' => <<'END_OF_SECTION;');
     %%
     my $content;
     for (my $i = 0; $i <= $#section_content; $i++) {
       $content .= SECTION($i);
     }
     $content;
     %%
     END_OF_DOC;
     %%
     $section_content[$_[0]]->{Content} =~ s/^/<p>/mg;
     join '', '<H1>', $section_content[$_[0]]->{Title}, '</H1>',
              $section_content[$_[0]]->{Content};
     %%
     END_OF_SECTION;

     my $tmplt = new Parse::Template (%template);

     $tmplt->env('section_content' => [
         {
           Title   => 'First Section',
           Content => 'Nothing to declare'
         },
         {
           Title   => 'Second section',
           Content => 'Nothing else to declare'
         }
       ]
     );

     print $tmplt->eval('DOC'), "\n";

   The second example shows how to generate an HTML document using a
functional notation, in other words, obtaining the text:

     <P><B>text in bold</B><I>text in italic</I></P>

   from

     P(B("text in bold"), I("text in italic"))

   The Perl expression that permits producing the content of an element
is very simple, and reduces to:

     join '', @_

   The content to be evaluated is the same regardless of the tag and can
therefore be placed within a variable. We therefore obtain the following
template:

     my $ELT_CONTENT = q!%%join '', @_%%!;
     my $HTML_T1 = new Parse::Template(
         'DOC' => '%%P(B("text in bold"), I("text in italic"))%%',
         'P'   => qq!<P>$ELT_CONTENT</P>!,
         'B'   => qq!<B>$ELT_CONTENT</B>!,
         'I'   => qq!<I>$ELT_CONTENT</I>!,
     );
     print $HTML_T1->eval('DOC'), "\n";

   We can go further by making use of the `$part' variable, which is
defined by default in the environment of evaluation of the template:

     $ELT_CONTENT = q!%%"<$part>" . join('', @_) . "</$part>"%%!;
     $HTML_T2 = new Parse::Template(
         'DOC' => '%%P(B("text in bold"), I("text in italic"))%%',
         'P'   => qq!$ELT_CONTENT!,
         'B'   => qq!$ELT_CONTENT!,
         'I'   => qq!$ELT_CONTENT!,
     );
     print $HTML_T2->eval('DOC'), "\n";

   Let's look at another step which automates the production of
expressions from the list of HTML tags which are of interest to us:

     $DOC = q!P(B("text in bold"), I("text in italic"))!;

     $ELT_CONTENT = q!%%"<$part>" . join('', @_) . "</$part>"%%!;
     $HTML_T3 = new Parse::Template(
         'DOC' => qq!%%$DOC%%!,
         map { $_ => $ELT_CONTENT } qw(P B I)
     );
     print $HTML_T3->eval('DOC'), "\n";

   With a slight transformation it is possible to use a
method-invocation notation:

     $ELT_CONTENT = q!%%shift(@_); "<$part>" . join('', @_) . "</$part>"%%!;
     $HTML_T4 = new Parse::Template(
         map { $_ => $ELT_CONTENT } qw(P B I)
     );
     print $HTML_T4->P(
         $HTML_T4->B("text in bold"),
         $HTML_T4->I("text in italic")
     ), "\n";

   The `shift(@_)' permits getting rid of the template object, which we
don't need within the expression.

   `Parse::Template' was initially created to serve as a code generator
for the `Parse::Lex' class. You will find other examples of its use in
the classes `Parse::Lex', `Parse::CLex' and `Parse::Token'.

NOTES CONCERNING THE CURRENT VERSION
====================================

   I would be very interested to receive your comments and suggestions.
English documentation isn't up to date.

BUG
===

   Instances are not destroyed. Therefore, do not use this class to
create a large number of instances.

AUTHOR
======

   Philippe Verdret (with translation of documentation into English by
Ocrat).

COPYRIGHT
=========

   Copyright (c) 1995-2000 Philippe Verdret. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. File: pm.info, Node: Parse/Text, Next: Parse/Token, Prev: Parse/Template, Up: Module List Perl module for parsing plain text files **************************************** NAME ==== Parse::Text - Perl module for parsing plain text files SYNOPSIS ======== use Parse::Text; DESCRIPTION =========== This is currently a place holder for a very powerful, full feature plain text parser. EXPORT ------ None by default. AUTHOR ====== Casey Tweten, SEE ALSO ======== *Note Perl: (perl.info)perl,. File: pm.info, Node: Parse/Token, Next: Parse/Tokens, Prev: Parse/Text, Up: Module List Definition of tokens used by `Parse::Lex' ***************************************** NAME ==== `Parse::Token' - Definition of tokens used by `Parse::Lex' SYNOPSIS ======== require 5.005; use Parse::Lex; @token = qw( ADDOP [-+] INTEGER [1-9][0-9]* ); $lexer = Parse::Lex->new(@token); $lexer->from(\*DATA); $content = $INTEGER->next; if ($INTEGER->status) { print "$content\n"; } $content = $ADDOP->next; if ($ADDOP->status) { print "$content\n"; } if ($INTEGER->isnext(\$content)) { print "$content\n"; } __END__ 1+2 DESCRIPTION =========== The `Parse::Token' class and its derived classes permit defining the tokens used by `Parse::Lex' or `Parse::LexEvent'. The creation of tokens can be done by means of the new() or `factory()' methods. The `Lex::new()' method of the `Parse::Lex' package indirectly creates instances of the tokens to be recognized. The next() or `isnext()' methods of the `Parse::Token' package permit interfacing the lexical analyzer with a syntactic analyzer of recursive descent type. For interfacing with `byacc', see the `Parse::YYLex' package. `Parse::Token' is included indirectly by means of `use Parse::Lex' or `use Parse::LexEvent'. Methods ======= action Returns the anonymous subroutine defined within the `Parse::Token' object. 
factory LIST factory ARRAY_REF The `factory(LIST)' method creates a list of tokens from a list of specifications, which include for each token: a name, a regular expression, and possibly an anonymous subroutine. The list can also include objects of class `Parse::Token' or of a class derived from it. The `factory(ARRAY_REF)' method permits creating tokens from specifications of type attribute-value: Parse::Token->factory([Type => 'Simple', Name => 'EXAMPLE', Regex => '.+']); Type indicates the type of each token to be created (the package prefix is not indicated). `factory()' creates a series of tokens but does not import these tokens into the calling package. You could for example write: %keywords = qw ( PROC undef FUNC undef RETURN undef IF undef ELSE undef WHILE undef PRINT undef READ undef ); @tokens = Parse::Token->factory(%keywords); and install these tokens in a symbol table in the following manner: foreach $name (keys %keywords) { ${$name} = pop @tokens; $symbol{"\L$name"} = [${$name}, '']; } `${$name}' is the token instance. During the lexical analysis phase, you can use the tokens in the following manner: qw(IDENT [a-zA-Z][a-zA-Z0-9_]*), sub { $symbol{$_[1]} = [] unless defined $symbol{$_[1]}; my $type = $symbol{$_[1]}[0]; $lexer->setToken((not defined $type) ? $VAR : $type); $_[1]; # THE TOKEN TEXT } This permits indicating that any symbol of unknown type is a variable. In this example we have used `$_[1]' which corresponds to the text recognized by the regular expression. This text associated with the token must be returned by the anonymous subroutine. get EXPR get obtains the value of the attribute named by the result of evaluating EXPR. You can also use the name of the attribute as a method name. getText Returns the character string that was recognized by means of this `Parse::Token' object. Same as the text() method. isnext EXPR isnext Returns the status of the token. The consumed string is put into EXPR if it is a reference to a scalar. 
name Returns the name of the token. next Activate searching for the lexeme defined by the regular expression contained in the object. If this lexeme is recognized on the character stream to analyze, next returns the string found and sets the status of the object to true. new SYMBOL_NAME, REGEXP, SUB new SYMBOL_NAME, REGEXP Creates an object of type Parse::Token::Simple or Parse::Token::Segmented. The arguments of the new() method are, respectively: a symbolic name, a regular expression, and possibly an anonymous subroutine. The subclasses of `Parse::Token' permit specifying tokens by means of a list of attribute-values. REGEXP is either a simple regular expression, or a reference to an array containing from one to three regular expressions. In the first case, the instance belongs to the Parse::Token::Simple class. In the second case, the instance belongs to the Parse::Token::Segmented class. The tokens of this type permit recognizing structures of type character string delimited by quotation marks, comments in a C program, etc. The regular expressions are used to recognize: 1. The beginning of the lexeme, 2. The "body" of the lexeme; if this second expression is missing, `Parse::Lex' uses "(?:.*?)", 3. the end of the lexeme; if this last expression is missing then the first one is used. (Note! The end of the lexeme cannot span several lines). Example: qw(STRING), [qw(" (?:[^"\\\\]+|\\\\(?:.|\n))* ")], These regular expressions can recognize multi-line strings delimited by quotation marks, where the backslash is used to quote the quotation marks appearing within the string. Notice the quadrupling of the backslash. Here is a variation of the previous example which uses the s option to include newline in the characters recognized by ".": qw(STRING), [qw(" (?s:[^"\\\\]+|\\\\.)* ")], (Note: it is possible to write regular expressions which are more efficient in terms of execution time, but this is not our objective with this example. 
See *Mastering Regular Expressions*.) The anonymous subroutine is called when the lexeme is recognized by the lexical analyzer. This subroutine takes two arguments: `$_[0]' contains the token instance, and `$_[1]' contains the string recognized by the regular expression. The scalar returned by the anonymous subroutine defines the character string memorized in the token instance. In the anonymous subroutine you can use the positional variables $1, $2, etc. which correspond to the groups of parentheses in the regular expression. regexp Returns the regular expression of the `Token' object. set LIST Allows marking a token with a list of attribute-value pairs. An attribute name can be used as a method name. setText EXPR The value of EXPR defines the character string associated with the lexeme. Same as the `text(EXPR)' method. status EXPR status Indicates if the last search of the lexeme succeeded or failed. `status EXPR' overrides the existing value and sets it to the value of EXPR. text EXPR text text() returns the character string recognized by means of the token. The value of EXPR sets the character string associated with the lexeme. trace OUTPUT trace Class method which activates/deactivates a trace of the lexical analysis. OUTPUT can be a file name or a reference to a filehandle to which the trace will be directed. Subclasses of Parse::Token ========================== Subclasses of the `Parse::Token' class are being defined. They permit recognizing specific structures such as, for example, strings within double-quotes, C comments, etc. Here are the subclasses which I am working on: Parse::Token::Simple : tokens of this class are defined by means of a single regular expression. Parse::Token::Segmented : tokens of this class are defined by means of three regular expressions. Reading of new data is done automatically. Parse::Token::Delimited : permits recognizing, for example, C language comments. 
Parse::Token::Quoted : permits recognizing, for example, character strings within quotation marks. `Parse::Token::Nested' : permits recognizing nested structures such as parenthesized expressions. NOT DEFINED. These classes are recently created and no doubt contain some bugs. Parse::Token::Action -------------------- Tokens of the Parse::Token::Action class permit inserting arbitrary Perl expressions within a lexical analyzer. An expression can be used for instance to print out internal variables of the analyzer: * `$LEX_BUFFER' : contents of the buffer to be analyzed * `$LEX_LENGTH' : length of the character string being analyzed * `$LEX_RECORD' : number of the record being analyzed * `$LEX_OFFSET' : number of characters already consumed since the start of the analysis. * `$LEX_POS' : position reached by the analysis as a number of characters since the start of the buffer. The class constructor accepts the following attributes: * Name : the name of the token * Expr : a Perl expression Example : $ACTION = new Parse::Token::Action( Name => 'ACTION', Expr => q!print "LEX_POS: $LEX_POS\n" . "LEX_BUFFER: $LEX_BUFFER\n" . "LEX_LENGTH: $LEX_LENGTH\n" . "LEX_RECORD: $LEX_RECORD\n" . "LEX_OFFSET: $LEX_OFFSET\n" ;!, ); Parse::Token::Simple -------------------- The class constructor accepts the following attributes: * Handler : the value indicates the name of a function to call during an analysis performed by an analyzer of class `Parse::LexEvent'. * Name : the associated value is the name of the token. * `Regex' : the associated value is a regular expression corresponding to the pattern to be recognized. * `ReadMore' : if the associated value is 1, the recognition of the token continues after reading a new record. The strings recognized are concatenated. This attribute only has effect during analysis of a character stream. * `Sub' : the associated value must be an anonymous subroutine to be executed after the token is recognized. 
This function is only used with analyzers of class `Parse::Lex' or `Parse::CLex'. Example. new Parse::Token::Simple(Name => 'remainder', Regex => '[^/\'\"]+', ReadMore => 1); Parse::Token::Segmented ----------------------- The definition of these tokens includes three regular expressions. During analysis of a data stream, new data is read as long as the end of the token has not been reached. The class constructor accepts the following attributes: * Handler : the value indicates the name of a function to call during analysis performed by an analyzer of class `Parse::LexEvent'. * Name : the associated value is the name of the token. * `Regex' : the associated value must be a reference to an array that contains three regular expressions. * `Sub' : the associated value must be an anonymous subroutine to be executed after the token is recognized. This function is only used with analyzers of class `Parse::Lex' or `Parse::CLex'. Parse::Token::Quoted -------------------- Parse::Token::Quoted is a subclass of Parse::Token::Segmented. It permits recognizing character strings within double quotes or single quotes. Examples. --------------------------------------------------------- Start End Escaping --------------------------------------------------------- ' ' '' " " "" " " \ --------------------------------------------------------- The class constructor accepts the following attributes: * End : The associated value is a regular expression permitting recognizing the end of the token. * Escape : The associated value indicates the character used to escape the delimiter. By default, a double occurrence of the terminating character escapes that character. * Handler : the value indicates the name of a function to be called during an analysis performed by an analyzer of class `Parse::LexEvent'. * Name : the associated value is the name of the token. * Start : the associated value is a regular expression permitting recognizing the start of the token. 
   * `Sub' : the associated value must be an anonymous subroutine to be
     executed after the token is recognized. This function is only used
     with analyzers of class `Parse::Lex' or `Parse::CLex'.

   Example.

     new Parse::Token::Quoted(Name => 'squotes',
                              Handler => 'string',
                              Escape => '\\',
                              Quote => qq!\'!,
                             );

Parse::Token::Delimited
-----------------------

   Parse::Token::Delimited is a subclass of Parse::Token::Segmented. It
permits, for example, recognizing C language comments.

   Examples.

     ---------------------------------------------------------
      Start    End    Constraint on
                      the contents
     ---------------------------------------------------------
      /*       */                      C Comment
      <!--     -->    No '--'          XML Comment
      <!--     -->                     SGML Comment
      <?       ?>                      Processing instruction
                                       in SGML/XML
     ---------------------------------------------------------

   The class constructor accepts the following attributes:

   * End : The associated value is a regular expression permitting
     recognizing the end of the token.

   * Handler : the value indicates the name of a function to be called
     during an analysis performed by an analyzer of class
     `Parse::LexEvent'.

   * Name : the associated value is the name of the token.

   * Start : the associated value is a regular expression permitting
     recognizing the start of the token.

   * `Sub' : the associated value must be an anonymous subroutine to be
     executed after the token is recognized. This function is only used
     with analyzers of class `Parse::Lex' or `Parse::CLex'.

   Example.

     new Parse::Token::Delimited(Name => 'comment',
                                 Start => '/[*]',
                                 End => '[*]/'
                                );

Parse::Token::Nested - Not defined
----------------------------------

   Examples.

     ----------------------------------------------------------
      Start    End
     ----------------------------------------------------------
      (        )       Symbolic Expressions
      {        }       Rich Text Format Groups
     ----------------------------------------------------------

BUGS
====

   The implementation of subclasses of tokens is not complete for
analyzers of the `Parse::CLex' class.
I am not too keen to do it, since an implementation for classes
`Parse::Lex' and `Parse::LexEvent' seems quite sufficient.

AUTHOR
======

   Philippe Verdret. Documentation translated to English by Vladimir
Alexiev and Ocrat.

ACKNOWLEDGMENTS
===============

   Version 2.0 owes much to suggestions made by Vladimir Alexiev. Ocrat
has significantly contributed to improving this documentation. Thanks
also to the numerous persons who have made comments or sometimes sent
bug fixes.

REFERENCES
==========

   Friedl, J.E.F. Mastering Regular Expressions. O'Reilly & Associates
1996.

   Mason, T. & Brown, D. Lex & Yacc. O'Reilly & Associates, Inc. 1990.

COPYRIGHT
=========

   Copyright (c) 1995-1999 Philippe Verdret. All rights reserved. This
module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

File: pm.info, Node: Parse/Tokens, Next: Parse/Vipar, Prev: Parse/Token, Up: Module List

class for parsing text with embedded tokens
*******************************************

NAME
====

   Parse::Tokens - class for parsing text with embedded tokens

SYNOPSIS
========

     use Parse::Tokens;
     @ISA = ('Parse::Tokens');

     # override SUPER::token
     sub token
     {
         my( $self, $token ) = @_;
         # $token->[0] - left bracket
         # $token->[1] - contents
         # $token->[2] - right bracket
         # do something with the token...
     }

     # override SUPER::ether
     sub ether
     {
         my( $self, $text ) = @_;
         # do something with the text...
     }

DESCRIPTION
===========

   `Parse::Tokens' provides a base class for parsing delimited strings
from text blocks. Use `Parse::Tokens' as a base class for your own
module or script. It is very similar in style to HTML::Parser.

Functions
=========

autoflush()
     Turn on autoflushing, causing the template cache (not the text) to
     be purged before each parse().

delimiters()
     Specify delimiters as an array reference pointing to the left and
     right delimiters. Returns an array reference containing two array
     references, of delimiters and escaped delimiters.

flush()
     Flush the template cache.
parse() Run the parser. new() Pass parameter as a hash reference. Options are: TEXT - a block of text; DELIMITERS - a array reference consisting of the left and right token delimiters (eg ['']); AUTOFLUSH - 0 or 1 (default). While these are all optional at initialization, both TEXT and DELIMITERS must be set prior to calling parse() or as parameters to parse(). text() Load text. AUTHOR ====== Steve McKay, steve@colgreen.com COPYRIGHT ========= Copyright 2000 by Steve McKay. All rights reserved. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl(1). File: pm.info, Node: Parse/Vipar, Next: Parse/YALALR/Run, Prev: Parse/Tokens, Up: Module List Visual LALR parser debugger *************************** NAME ==== Parse::Vipar - Visual LALR parser debugger SYNOPSIS ======== % vipar expr.y [-data=DATAFILE] DATAFILE would contain a list of tokens, one per line, with optional values after them separated by whitespace. Example: number '+' number '*' number DESCRIPTION =========== Presents a visual display of a LALR parser in action. AUTHOR ====== Steve Fink SEE ALSO ======== Parse::YALALR File: pm.info, Node: Parse/YALALR/Run, Next: Parse/YYLex, Prev: Parse/Vipar, Up: Module List Yet Another LALR parser *********************** NAME ==== Parse::YALALR - Yet Another LALR parser SYNOPSIS ======== From the command line: % yalalr [--lang=c] [--lang=perl] grammar.y In a program: use Parse::YALALR::Build; use Parse::YALALR::Run; open(GRAMMAR, "new("perl", \*GRAMMAR); $builder->build_table(); $parser = $builder->{parser}; @inputstream = ([number=>10], ["'+'"=> undef ], [number=>20]); $_->[0] = $parser->{symmap}->get_index($_->[0]) foreach (@inputstream); Parse::YALALR::Run::run_parser($parser, \@inputstream); DESCRIPTION =========== Generates an LALR parser from an input grammar. Really just intended as a companion to Parse::Vipar, but (sorta) works standalone. Does not yet generate a standalone parser. 
   run_parser will also accept a CODE ref to use as a lexer. Every
invocation should return a pair (token, value). The above example is
equivalent to

     $lexer = do {
         my $i = 0;
         sub {
             my ($t, $v) = @{$inputstream[$i++]};
             ($parser->{symmap}->get_index($t), $v);
         }
     };
     Parse::YALALR::Run::run_parser($parser, $lexer);

AUTHOR
======

   Steve Fink

SEE ALSO
========

   Parse::YALALR

File: pm.info, Node: Parse/YYLex, Next: Parse/Yapp, Prev: Parse/YALALR/Run, Up: Module List

version of Parse::Lex to be used by a byacc parser.
***************************************************

NAME
====

   Parse::YYLex - version of Parse::Lex to be used by a byacc parser.

SYNOPSIS
========

   *Parse::Lex* requires this perl version:

     require 5.004;
     use Parse::YYLex;

   If using a procedural parser:

     Parse::YYLex->create ...; # exports &yylex and $yylval
       # see Parse::Lex for the token table args <...>
     $Parse::YYLex::lex->from(\*FH);
     require 'MyParser.pl'; # generated by byacc
     yyparse();

   If using an object-oriented parser:

     $lexer = new Parse::YYLex ...;
       # see Parse::Lex for the token table args <...>
     use MyParser; # generated by byacc5
     $parser = new MyParser($lexer->getyylex, \&yyerror, $debug);
       # you must write &yyerror
     $lexer->from(\*STREAM);
     $parser->yyparse(*STREAM);

   To get the token definitions from `MyParser.ph' instead of
`y.tab.ph', or to change the skip regexp (default whitespace), do this
before calling new or create:

     Parse::YYLex->ytab('MyParser.ph');
     Parse::YYLex->skip('');

DESCRIPTION
===========

   You will often use a lexer in conjunction with a parser, and most of
the time you will want to generate that parser with a yacc parser
generator. *Parse::YYLex* is a derivative of *Parse::Lex* that is
compatible with yacc parsers, because it adapts to the byacc calling
conventions:

   * The parser wants to receive integer token types as defined in
     `y.tab.ph' instead of the symbolic types that *Parse::Lex* returns.

   * The parser wants its tokens as two components (type and value),
     whereas *Parse::Lex* returns one object with these two components.
Furthermore, a procedural parser wants the value stored in a variable `$yylval'. * The parser wants to receive the tokens by calling a yylex function, not an object method. Thus we have to give the parser a curried form of the lexer function, where the self argument is fixed. Procedural Parsers ------------------ Yacc (and Bison) traditionally generate C or C++ parsers. Fortunately, Berkeley yacc has been modified to generate Perl, see ftp://ftp.sterling.com/local/perl-byacc.tar.Z Byacc with the -P option generates procedural perl code that is compatible with both perl4 and perl5. (However you cannot use *Parse::YYLex* with perl4.) Use this variant for quick hacks, as it is more convenient than the one below. In this case `Parse::YYLex-create'> instantiates a lexer and exports a `&yylex' function (the lexer) and a `$yylval' variable (the token value) to its caller's namespace (which should be the namespace of the parser). If you need to call any object methods of the created lexer (see *Parse::Lex* for documentation), use the `$Parse::YYLex::lex' variable. Object-Oriented Parsers ----------------------- Another byacc modification (I call it byacc5) generates object-oriented Perl5 code: CPAN/authors/id/JAKE/perl5-byacc-patches-0.5.tar.gz Use this variant if you need more than one parser, you need flexibility, or you simply like OO. In this case you need to use new, and pass the return value of *getyylex* (a reference to the curried lexing function) to the parser. The lexing function returns a two-element array, the token type and value. Numeric Token Table ------------------- Yacc parsers insist on using numeric token types, and define these in a file customarily named `y.tab.ph'. That is where *Parse::YYLex* will look by default, and the file has to be in the @INC path (which includes the current directory). 
You can specify a different token table before calling new or create:

     Parse::YYLex->ytab('MyParser.ph');

LIMITATIONS
===========

`Parse::YYLex' is based on *Parse::Lex*, which requires perl 5.004 and will not work with earlier versions. A slightly different version, *Parse::CLex*, works with earlier perl versions. It would be easy to allow a choice between *Parse::Lex* and *Parse::CLex*, but the latter has some limitations and presently seems to have some bugs.

AUTHOR
======

Vladimir Alexiev

SEE ALSO
========

byacc(1), *Note Parse/Lex: Parse/Lex,.

File: pm.info, Node: Parse/Yapp, Next: Parse/iPerl, Prev: Parse/YYLex, Up: Module List

Perl extension for generating and using LALR parsers.
*****************************************************

NAME
====

Parse::Yapp - Perl extension for generating and using LALR parsers.

SYNOPSIS
========

     yapp -m MyParser grammar_file.yp

     ...

     use MyParser;

     $parser=new MyParser();
     $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);

     $nberr=$parser->YYNberr();

     $parser->YYData->{DATA}= [ 'Anything', 'You Want' ];

     $data=$parser->YYData->{DATA}[0];

DESCRIPTION
===========

Parse::Yapp (Yet Another Perl Parser compiler) is a collection of modules that lets you generate and use yacc-like, thread-safe (reentrant) parsers with a Perl object-oriented interface.

The script yapp is a front-end to the Parse::Yapp module and lets you easily create a Perl OO parser from an input grammar file.

The Grammar file
----------------

`Comments'
Throughout your grammar files, comments are either Perl style, introduced by `#' and running to the end of the line, or C style, enclosed between `/*' and `*/'.

`Tokens and string literals'
Throughout the grammar files, two kinds of symbols may appear: *Non-terminal* symbols, also called *left-hand-side* symbols, which are the names of your rules, and *Terminal* symbols, also called *Tokens*.

Tokens are the symbols your lexer function will feed your parser with (see below).
They are of two flavours: symbolic tokens and string literals.

Non-terminals and symbolic tokens share the same identifier syntax:

     [A-Za-z][A-Za-z0-9_]*

String literals are enclosed in single quotes and can contain almost anything. They will be output to your parser file double-quoted, so any special character is interpreted as such. '"', '$' and '@' will be automatically quoted with '\', making them more natural to write. On the other hand, if you need a single quote inside your literal, just quote it with '\'.

You cannot have a literal *'error'* in your grammar, as it would confuse the driver with the error token. Use a symbolic token instead. If you use it inadvertently, Parse::Yapp will produce a warning telling you that you should have written it error, and will treat it as if it were the error token, which is certainly NOT what you meant.

`Grammar file syntax'
It is very close to yacc syntax (in fact, *Parse::Yapp* should compile a clean *yacc* grammar without any modification, whereas the opposite is not true).

This file is divided into three sections, separated by `%%':

     header section
     %%
     rules section
     %%
     footer section

*The Header Section* may optionally contain:

   * One or more code blocks enclosed inside `%{' and `%}', just like in yacc. They may contain any valid Perl code and will be copied verbatim at the very beginning of the parser module. They are not as useful as they are in yacc, but you can use them, for example, for global variable declarations, though you will notice later that such global variables can be avoided to make a reentrant parser module.

   * Precedence declarations, introduced by `%left', `%right' and `%nonassoc' specifying associativity, followed by the list of tokens or literals having the same precedence and associativity. The later a precedence is declared, the higher its level.
(See the yacc or bison manuals for a full explanation of how they work, as they are implemented exactly the same way in Parse::Yapp.)

   * `%start' followed by a rule's left-hand side, declaring this rule to be the starting rule of your grammar. The default, when `%start' is not used, is the first rule in your grammar section.

   * `%token' followed by a list of symbols, forcing them to be recognized as tokens and generating a syntax error if they are used in the left-hand side of a rule declaration. Note that in Parse::Yapp, you *don't* need to declare tokens as in yacc: any symbol not appearing as a left-hand side of a rule is considered to be a token. Other yacc declarations or constructs such as `%type' and `%union' are parsed but (almost) ignored.

   * `%expect' followed by a number, which suppresses warnings about the number of Shift/Reduce conflicts when both numbers match, a la bison.

*The Rule Section* contains your grammar rules:

A rule is made of a left-hand-side symbol, followed by a `:' and one or more right-hand sides separated by `|' and terminated by a `;':

     exp:    exp '+' exp
         |   exp '-' exp
         ;

A right-hand side may be empty:

     input:  #empty
         |   input line
         ;

(If you have more than one empty rhs, Parse::Yapp will issue a warning, as this is usually a mistake, and you will certainly have a reduce/reduce conflict.)

A rhs may be followed by an optional `%prec' directive, followed by a token, giving the rule an explicit precedence (see the yacc manuals for its precise meaning), and by an optional semantic action code block (see below).

     exp:   '-' exp %prec NEG { -$_[2] }
         |  exp '+' exp       { $_[1] + $_[3] }
         |  NUM
         ;

Note that in Parse::Yapp, a lhs *cannot* appear more than once as a rule name (this differs from yacc).

*The Footer Section* may contain any valid Perl code and will be appended at the very end of your parser module. Here you can write your lexer, error report subs and anything relevant to your parser.
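As an illustration of how the three sections fit together, here is a minimal sketch of a calculator grammar. It is not taken from the distribution: the token name NUM, the pseudo-token NEG and the choice of operators are assumptions made for this example, and the lexer that would live in the footer is left out.

```yacc
# Header section: precedence declarations only.
# Later declarations bind tighter, so NEG (unary minus) is highest.
%left   '+' '-'
%left   '*' '/'
%left   NEG

%%

# Rules section. NUM is assumed to be a token supplied by the lexer.
# A rule without an action returns $_[1] by default.
exp:    NUM
    |   exp '+' exp         { $_[1] + $_[3] }
    |   exp '-' exp         { $_[1] - $_[3] }
    |   exp '*' exp         { $_[1] * $_[3] }
    |   exp '/' exp         { $_[1] / $_[3] }
    |   '-' exp %prec NEG   { -$_[2] }
    |   '(' exp ')'         { $_[2] }
;

%%

# Footer section: plain Perl, appended verbatim to the generated module.
# A lexer sub and an error reporting sub would normally go here.
```

In each action `$_[0]' is the parser object, so the operands of a binary rule are `$_[1]' and `$_[3]', with the operator token in `$_[2]'.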
`Semantic actions'
Semantic actions are run every time a *reduction* occurs in the parsing flow, and they must return a semantic value.

They are (usually, but see below `In rule actions') written at the very end of the rhs, enclosed with `{ }', and are copied verbatim to your parser file, inside of the rules table.

Be aware that matching braces in Perl is much more difficult than in C: inside strings they don't need to match. While in C it is very easy to detect the beginning of a string construct, or a single character, it is much more difficult in Perl, as there are so many ways of writing such literals. So there is no check for that today. If you need a brace in a double-quoted string, just quote it (`\{' or `\}'). For single-quoted strings, you will need to add a comment that matches it *in the right order*. Sorry for the inconvenience.

     {
         "{ My string block }".
         "\{ My other string block \}".
         qq/ My unmatched brace \} /.
         # Force the match: {
         q/ for my closing brace } /.
         q/ My opening brace { /
         # must be closed: }
     }

All of these constructs should work.

In Parse::Yapp, semantic actions are called like normal Perl sub calls, with their arguments passed in `@_', and their semantic values are their return values.

$_[1] to $_[n] are the parameters, just as $1 to $n are in yacc, while $_[0] is the parser object itself.

Having $_[0] be the parser object itself allows you to call parser methods. That's how the yacc macros are implemented:

     yyerrok is done by calling $_[0]->YYErrok
     YYERROR is done by calling $_[0]->YYError
     YYACCEPT is done by calling $_[0]->YYAccept
     YYABORT is done by calling $_[0]->YYAbort

All those methods explicitly return undef, for convenience.
     YYRECOVERING is done by calling $_[0]->YYRecovering

Four useful methods in an error recovery sub

     $_[0]->YYCurtok
     $_[0]->YYCurval
     $_[0]->YYExpect
     $_[0]->YYLexer

return respectively the current input token that made the parse fail, its semantic value (both can be used to modify their values too, but *know what you are doing*! See the *Error reporting routine* section for an example), a list which contains the tokens the parser expected when the failure occurred, and a reference to the lexer routine.

Note that if `$_[0]->YYCurtok' is declared as a `%nonassoc' token, it can be included in the `$_[0]->YYExpect' list whenever the input tries to use it in an associative way. This is not a bug: the token IS expected to report an error if encountered.

To detect such a thing in your error reporting sub, the following example should do the trick:

     grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
     and do {
         #Non-associative token used in an associative expression
     };

Accessing semantic values on the left of your reducing rule is done through the method

     $_[0]->YYSemval( index )

where index is an integer. Values *1 .. n* return the same values as *$_[1] .. $_[n]*, but *-n .. 0* return values on the left of the rule being reduced. (This is related to *$-n .. $0 .. $n* in yacc, but you cannot use *$_[0]* or *$_[-n]* constructs in Parse::Yapp, because $_[0] is the parser object and negative indices count from the end of `@_'.)

There is also a provision for a user data area in the parser object, accessed by the method:

     $_[0]->YYData

which returns a reference to an anonymous hash, which lets you have all of your parsing data held inside the object (see the Calc.yp or ParseYapp.yp files in the distribution for some examples). That's how you can make your parser module reentrant: all of your module states and variables are held inside the parser object.

Note: unfortunately, method calls in Perl have a lot of overhead, and when YYData is used, it may be called a huge number of times.
If you are not a *real* purist and efficiency is your concern, you may directly access the user space in the object:

     $parser->{USER}

which is a reference to an anonymous hash, and then benchmark.

If no action is specified for a rule, the equivalent of a default action is run, which returns the first parameter:

     { $_[1] }

`In rule actions'
It is also possible to embed semantic actions inside of a rule:

     typedef:    TYPE { $type = $_[1] } identlist { ... } ;

When Parse::Yapp's parser encounters such an embedded action, it modifies the grammar as if you wrote (although @x-1 is not a legal lhs value):

     @x-1:   /* empty */ { $type = $_[1] };
     typedef:    TYPE @x-1 identlist { ... } ;

where x is a sequential number incremented for each "in rule" action, and *-1* represents the "dot position" in the rule where the action arises.

In such actions, you can use *$_[1]..$_[n]* variables, which are the semantic values on the left of your action.

Be aware that the way Parse::Yapp modifies your grammar because of *in rule actions* can produce, in some cases, spurious conflicts that wouldn't happen otherwise.

`Generating the Parser Module'
Now that your grammar file is written, you can use yapp on it to generate your parser module:

     yapp -v Calc.yp

will create two files: `Calc.pm', your parser module, and `Calc.output', a verbose output of your parser rules, conflicts, warnings, states and summary.

What you are missing now is a lexer routine.

`The Lexer sub'
It is called each time the parser needs to read the next token.

It is called with only one argument, the parser object itself, so you can access its methods, especially the

     $_[0]->YYData

data area.

It is its duty to return the next token and value to the parser.
They must be returned as a list of two values: the first is the token known by the parser (symbolic or literal), and the second can be anything you want (usually the content of the token, or the literal value), from a simple scalar value to any complex reference, as the parsing driver never uses it except when calling semantic actions:

     ( 'NUMBER', $num )

or

     ( '>=', '>=' )

or

     ( 'ARRAY', [ @values ] )

When the lexer reaches the end of input, it must return the `''' empty token with an undef value:

     ( '', undef )

Note that your lexer should never return `'error'' as token value: for the driver, this is the error token used for error recovery, and returning it would lead to odd reactions.

Now that you have your lexer written, you may need to output meaningful error messages instead of the default, which is to print 'Parse error.' on STDERR.

So you will need an Error reporting sub.

`Error reporting routine'
If you want one, write it knowing that it is passed the parser object as its parameter. So you can share information with the lexer routine quite easily.

You can also use the `$_[0]->YYErrok' method in it, which will resume parsing as if no error occurred. Of course, since the invalid token is still invalid, you're supposed to fix the problem by yourself.

The method `$_[0]->YYLexer' may help you, as it returns a reference to the lexer routine, and can be called as

     ($tok,$val)=&{$_[0]->YYLexer}

to get the next token and semantic value from the input stream. To make them current for the parser, use:

     ($_[0]->YYCurtok, $_[0]->YYCurval) = ($tok, $val)

and know what you're doing...

`Parsing'
Now you've got everything to do the parsing.
First, use the parser module:

     use Calc;

Then create the parser object:

     $parser=new Calc;

Now, call the YYParse method, telling it where to find the lexer and error report subs:

     $result=$parser->YYParse(yylex => \&Lexer,
                              yyerror => \&ErrorReport);

(assuming Lexer and ErrorReport subs have been written in your current package)

The order in which parameters appear is unimportant.

Et voila.

The YYParse method will do the parse, then return the last semantic value returned, or undef if error recovery cannot recover.

If you need to be sure the parse has been successful (in case your last returned semantic value *is* undef), make a call to:

     $parser->YYNberr()

which returns the total number of times the error reporting sub has been called.

`Error Recovery'
Error recovery in Parse::Yapp is implemented the same way it is in yacc.

`Debugging Parser'
To debug your parser, you can call the YYParse method with a debug parameter:

     $parser->YYParse( ... , yydebug => value, ... )

where value is a bitfield, each bit representing a specific debug output:

     Bit Value    Outputs
     0x01         Token reading (useful for Lexer debugging)
     0x02         States information
     0x04         Driver actions (shifts, reduces, accept...)
     0x08         Parse Stack dump
     0x10         Error Recovery tracing

To have full debugging output, use

     yydebug => 0x1F

Debugging output is sent to STDERR, and be aware that it can produce `huge' outputs.

`Standalone Parsers'
By default, the generated parser modules need the Parse::Yapp module installed on the system to run. They use the Parse::Yapp::Driver, which can be safely shared between parsers in the same script.

If you'd prefer to have a standalone module generated, use the -s switch with yapp: this will automagically copy the driver code into your module so you can use or distribute it without the need of the Parse::Yapp module, making it really a `Standalone Parser'.

If you do so, please remember to include Parse::Yapp's copyright notice in your main module copyright, so others can know about the Parse::Yapp module.
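To tie the lexer and error-reporting conventions described above together, here is a minimal sketch of the two subs a calculator-style parser might be given. It is only an illustration under stated assumptions: the sub names `Lexer' and `ErrorReport', the token name `NUM', and the use of `$parser->YYData->{INPUT}' to hold the remaining input are choices made for this example, not something Parse::Yapp prescribes.

```perl
use strict;
use warnings;

# Hypothetical lexer: reads from $parser->YYData->{INPUT}
# (the INPUT key is an assumption of this sketch).
sub Lexer {
    my ($parser) = @_;
    my $data = $parser->YYData;

    # Skip leading blanks; at end of input return the empty token.
    $data->{INPUT} =~ s/^[ \t]+//;
    return ('', undef) if $data->{INPUT} eq '';

    # A number becomes the symbolic token NUM, with its text as value.
    return ('NUM', $1) if $data->{INPUT} =~ s/^(\d+(?:\.\d+)?)//;

    # Any other single character is returned as a literal token.
    $data->{INPUT} =~ s/^(.)//s;
    return ($1, $1);
}

# Hypothetical error reporting sub: it receives the parser object
# and uses YYCurtok and YYExpect to build a message on STDERR.
sub ErrorReport {
    my ($parser) = @_;
    my $tok      = $parser->YYCurtok;
    my @expected = $parser->YYExpect;
    my $where    = (defined $tok && $tok ne '') ? "'$tok'" : 'end of input';
    warn "Syntax error near $where; expected one of: @expected\n";
}
```

With a parser module generated by yapp (for instance the `Calc' module used above), one would store the text to parse in `$parser->YYData->{INPUT}' and then call `$parser->YYParse(yylex => \&Lexer, yyerror => \&ErrorReport)'. Keeping the input in the YYData area rather than in a global variable is what keeps the parser reentrant.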
`Source file line numbers'
Source file line numbers will be included in the generated parser module by default, which will help to find the guilty line in your source file in case of a syntax error. You can disable this feature by compiling your grammar with yapp using the -n switch.

BUGS AND SUGGESTIONS
====================

If you find bugs, think of anything that could improve Parse::Yapp or have any questions related to it, feel free to contact the author.

AUTHOR
======

Francois Desarmenien

SEE ALSO
========

yapp(1) perl(1) yacc(1) bison(1).

COPYRIGHT
=========

The Parse::Yapp module and its related modules and shell scripts are copyright (c) 1998-2001 Francois Desarmenien, France. All rights reserved.

You may use and distribute them under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.

If you use the "standalone parser" option so people don't need to install Parse::Yapp on their systems in order to run your software, this copyright notice should be included in your software copyright too, and the copyright notice in the embedded driver should be left untouched.