This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: Parse/PerlConfig,  Next: Parse/RecDescent,  Prev: Parse/LexEvent,  Up: Module List

parse a configuration file written in Perl
******************************************

NAME
====

   Parse::PerlConfig - parse a configuration file written in Perl

SYNOPSIS
========

     use Parse::PerlConfig;
     my $parsed = Parse::PerlConfig::parse(
         File            =>      "/etc/perlapp/conf",
         Handlers        =>      [\%config, \&config],
     );

DESCRIPTION
===========

   This module is useful for parsing a configuration file written in Perl
and obtaining the values defined therein.  This is achieved through the
parse() function, which creates a namespace, reads in Perl code, evals it,
and then examines the namespace's symbol table.  Symbols are then
processed into a hash and returned.

Export
------

   The parse() function is exportable upon request.

Parsing
-------

   Parsing is not a simple do("filename").  Instead the filenames
specified are opened, read, eval'd, and closed.  The justification for
this being twofold:

no @INC search
     I did not want surprises in what file was found; do("file") searches
     @INC.

lexicals
     I wanted to be able to insert lexicals for the code in the file to
     see.  Being able to define variables without having them parsed back
     out (remember, the namespace is searched) is a nice feature.

   Parsing (in this manner) requires a namespace.  By default, the
namespace is constructed by appending Namespace_Base to a unique
identifier (currently, an encoded version of the filename, but don't rely
on this).  You can override this behaviour by specifying an explicit
Namespace argument.

   Prior to eval'ing the contents of a configuration file the lexical hash
%parse_perl_config is initialized with several keys (documented below); if
a Lexicals argument was given each of the lexicals specified are
initialized.

   There are a few caveats; lexicals specified in the Lexicals argument
cannot override %parse_perl_config; keys specified in Lexicals cannot be
code references, because code references cannot currently be reliably
reconstructed; modifications to %parse_perl_config keys (other than Error,
documented below) are discouraged, as the results are not defined.

   The %parse_perl_config hash contains the following keys:

Error
     Making this key a true value will cause the error handler Error_eval
     to be called with the value.

Namespace
     The namespace the file is being evaluated in.

Filename
     The name of the file being parsed.

Parse_Args
     A hash of the arguments passed to parse().

   Once the namespace has been setup, and the code eval'd, it is then
parse()'s job to go through the namespace's symbol table and look for
"things".  What it looks for depends on the Thing_Order and Symbols
arguments.

   After that, handlers are updated, and a hash reference of what was
parsed out is returned.

The parse subroutine
====================

Arguments
---------

   parse() takes a list of key-value pairs as its arguments and adds them
to an argument hash.  If the first argument to parse() is a hash or array
reference, it is dereferenced and used as if it were specified as a list.
All elements following this argument are added to the arguments hash, and
they override any settings specified by the reference.

   This means the call:

     parse(
         { Files => "/home/me/config.conf", Error_default => 'fwarn' },
         Files => "/home/you/config.conf"
     );

   causes parse()'s argument hash to consist of the following (ignoring
default settings):

     Files           =>  "/home/you/config.conf",
     Error_default   =>  "fwarn",

   Simply replace the braces, {}, with brackets, [], and you get the same
result.  This makes it convenient to store commonly-used arguments to
parse() in a hash or array, and efficiently pass these arguments to
parse(), while still allowing a seperate Files argument for each call.

   The below itemization of parse()'s arguments describes key-value pairs.
Each item consists of a key name and a description of the expected value
for that key.

   The value description requires some explanation.

   A single pipe, "|", indicates alternative values; only one of the values
must be specified.

   Values bracketed with "<" and ">" indicate that value is not literal,
but figurative.  So, in the case of <coderef>, you must specify a code
reference (a closure or reference to a named subroutine), not the literal
string "<coderef>".  Values without such bracketing are literal.

   Braces, {}, indicate a hash reference is required; brackets, [],
indicate an array reference is required.

   Below each key-value description is a description of the default
setting, followed by a description of what the argument means.

Files <filename> | [<filename> ...]
          default: none, this argument is required

     This is the file or files you wish to parse.  If a file cannot be
     parsed for any reason the entire parse is not abandoned, the file is
     simply skipped (after calling an appropriate error handling function).

File <filename>
     Equivalent to Files <filename>.

Handlers <hashref>|<coderef> | [<hashref>|<coderef> ...]
          default: none

     By default, parse() simply returns a hash reference of symbol names
     and their values.  Given a Handlers argument, parse() will add
     key-value pairs to each hash reference specified, and call each code
     reference specified with a single argument, the hash reference it
     returns.

Handler <hashref>|<coderef>
     Equivalent to Handlers <hashref>|<coderef>.

Lexicals <hashref>
          default: none

     The key-value pairs in the specified hashref are made into lexical
     variables in the configuration eval.  See the section on Parsing for
     further information.

Thing_Order <string>|<arrayref>
          default: '$@%&i*'

     Specifies the default thing order for symbols parsed from each
     configuration file.  See the section Things for further information.

Taint_Clean <boolean>
          default: false

     If set to any true value the filehandle opened on the configuration
     file is untainted before evaling the code contained therein.  Because
     this involves loading IO::Handle, which involves quite a bit of code,
     the option is turned off by default.  You *will* get taint exceptions
     if you don't specify this option while running in -T mode.

     Also, as the namespace is currently constructed, having a tainted
     filename will cause the namespace name to be tainted, so it is also
     untainted.  In the case of an explicitly specified Namespace value,
     it will also be untainted.

     No other values are untainted.  This includes any key-value pairs
     specified by the Lexicals argument; you must untaint those yourself,
     since there is no reasonable way for parse() to determine how best to
     untaint them.

Symbols <hashref>
          default: empty hashref

     This is an override for the Thing_Order argument above.  The keys in
     the specified hashref are symbols you want parsed specially, the
     values the thing order (either a string or array reference).  See the
     section Things for further information regarding thing order.

Namespace <string>
          default: generated from Namespace_Base and a unique identifier

     This option explicitly specifies the namespace the files are parsed
     in.  See the section Parsing for further information.

Namespace_Base <string>
          default: Parse::PerlConfig::ConfigFile

     Unless the Namespace argument is specified, the namespace a file is
     parsed in is generated by appending a cleaned up version of the
     filename to this setting.  See the section Parsing for further
     information.


     See the section Error Handling for further information.

Things
------

   "Things" (as taken from the Perl documentation, regarding the
*foo{THING} syntax) are the Perl datatypes.  These include scalars,
arrays, hashes, subroutines, IO handles, and globs.

   Anywhere a "things" argument is required you can specify one of two
things; a string containing the special "thing" characters, or an array
reference of each thing's actual name.  The thing characters are as
follows: `$' for scalar, % for a hash, `@' for an array, & for a
subroutine, i for an IO handle, and * for a glob.  The full name for each
coincides with the full name for each datatype in their respective glob
slots: SCALAR for a scalar, HASH for a hash, ARRAY for an array, CODE for
a subroutine, IO for an IO handle, and GLOB for a glob.

Exception Handling
------------------

   parse() takes various Error_* and Warn_* arguments that determine how it
handles any problems it encounters.  Each argument can take one of several
values.

default
     The error handling specifed by Error_default in the case of an Error_*
     argument, or Warn_default in the case of a Warn_* argument, is used.

noop
     The error is ignored.

warn
     Results in a call to CORE::warn() with a trailing newline, but only if
     $^W is set to a true value.

fwarn
     Like warn, but the warning is raised regardless of $^W's value.

die
     Results in a call to CORE::die() with a trailing newline.

<code reference>
     The code reference will be called with a single argument, that of the
     error message.  The error message is guaranteed to contain no
     trailing newlines (in case the code reference decides to die() or
     warn()).

   There are various handler arguments.  Unless otherwise specified, the
default handler is used (Error_default's or Warn_default's value).

Warn_default
          default: noop

     The default warning handler.

Warn_preparse
     Called just before a file is parsed to indicate parsing is about to
     begin.

Warn_eval
     Called with any warnings issued by the eval'd file.

Error_default
          default: warn

     The default error handler.

Error_argument
     Called if there is a problem with one of the arguments specified.

Error_file_is_dir
     Called if a configuration file specified was discovered to be a
     directory.

Error_failed_open
     Called if the open attempt on a configuration file fails.

Error_eval
     Called if the variable $parse_perl_config{Error} is set in the
     configuration file, or if there was an eval error.

Error_unknown_thing
     Called if there is a problem with a thing character or thing name in
     a thing argument (thing thing thing).

Error_unknown_handler
     Called if an unknown reference is encountered in the Handlers
     argument.

Error_invalid_lexical
     Called if an invalid lexical name or a CODE reference value is
     encountered in the Lexicals argument.

Error_invalid_namespace
     Called if either the constructed namespace (using Namespace_Base) or a
     specified Namespace value is invalid.  This may indicate an error in
     the construction of a namespace name (the generation of a unique
     identifier), but it's most likely you specified Namespace_Base or
     Namespace with invalid characters.

BUGS
====

   Due to the fact that the scalar slot in a glob is always filled it is
not possible to distinguish from a scalar that was never defined (e.g.
`@foo' was, but `$foo' was never mentioned) from one that is simply undef.
Because of this, for example, if you have a thing order of `$@' and code
along the lines of `$foo = undef; @foo = ();' the 'foo' key of the hash
will be an array reference, despite there being a scalar and `$' coming
first in the thing order.

TODO
====

   t/parse/symbols.t, t/parse/multi-file.t, t/parse/namespace.t

AUTHOR
======

   Michael Fowler <michael@shoebox.net>


File: pm.info,  Node: Parse/RecDescent,  Next: Parse/RecDescent/Consumer,  Prev: Parse/PerlConfig,  Up: Module List

Generate Recursive-Descent Parsers
**********************************

NAME
====

   Parse::RecDescent - Generate Recursive-Descent Parsers

VERSION
=======

   This document describes version 1.80 of Parse::RecDescent, released
January 20, 2001.

SYNOPSIS
========

     use Parse::RecDescent;

     # Generate a parser from the specification in $grammar:

     $parser = new Parse::RecDescent ($grammar);

     # Generate a parser from the specification in $othergrammar

     $anotherparser = new Parse::RecDescent ($othergrammar);

     # Parse $text using rule 'startrule' (which must be
     # defined in $grammar):

     $parser->startrule($text);

     # Parse $text using rule 'otherrule' (which must also
     # be defined in $grammar):

     $parser->otherrule($text);

     # Change the universal token prefix pattern
     # (the default is: '\s*'):

     $Parse::RecDescent::skip = '[ \t]+';

     # Replace productions of existing rules (or create new ones)
     # with the productions defined in $newgrammar:

     $parser->Replace($newgrammar);

     # Extend existing rules (or create new ones)
     # by adding extra productions defined in $moregrammar:

     $parser->Extend($moregrammar);

     # Global flags (useful as command line arguments under -s):

     $::RD_ERRORS	   # unless undefined, report fatal errors
     $::RD_WARN	   # unless undefined, also report non-fatal problems
     $::RD_HINT	   # if defined, also suggestion remedies
     $::RD_TRACE	   # if defined, also trace parsers' behaviour
     $::RD_AUTOSTUB	   # if defined, generates "stubs" for undefined rules
     $::RD_AUTOACTION   # if defined, appends specified action to productions

DESCRIPTION
===========

Overview
--------

   Parse::RecDescent incrementally generates top-down recursive-descent
text parsers from simple *yacc*-like grammar specifications. It provides:

   * Regular expressions or literal strings as terminals (tokens),

   * Multiple (non-contiguous) productions for any rule,

   * Repeated and optional subrules within productions,

   * Full access to Perl within actions specified as part of the grammar,

   * Simple automated error reporting during parser generation and parsing,

   * The ability to commit to, uncommit to, or reject particular
     productions during a parse,

   * The ability to pass data up and down the parse tree ("down" via
     subrule argument lists, "up" via subrule return values)

   * Incremental extension of the parsing grammar (even during a parse),

   * Precompilation of parser objects,

   * User-definable reduce-reduce conflict resolution via "scoring" of
     matching productions.

Using Parse::RecDescent
-----------------------

   Parser objects are created by calling `Parse::RecDescent::new', passing
in a grammar specification (see the following subsections). If the grammar
is correct, new returns a blessed reference which can then be used to
initiate parsing through any rule specified in the original grammar. A
typical sequence looks like this:

     $grammar = q {
     		# GRAMMAR SPECIFICATION HERE
     	     };

     $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";

     # acquire $text

     defined $parser->startrule($text) or print "Bad text!\n";

   The rule through which parsing is initiated must be explicitly defined
in the grammar (i.e. for the above example, the grammar must include a
rule of the form: "startrule: <subrules>".

   If the starting rule succeeds, its value (see below) is returned.
Failure to generate the original parser or failure to match a text is
indicated by returning undef. Note that it's easy to set up grammars that
can succeed, but which return a value of 0, "0", or "".  So don't be
tempted to write:

     $parser->startrule($text) or print "Bad text!\n";

   Normally, the parser has no effect on the original text. So in the
previous example the value of $text would be unchanged after having been
parsed.

   If, however, the text to be matched is passed by reference:

     $parser->startrule(\$text)

   then any text which was consumed during the match will be removed from
the start of $text.

Rules
-----

   In the grammar from which the parser is built, rules are specified by
giving an identifier (which must satisfy /[A-Za-z]\w*/), followed by a
colon *on the same line*, followed by one or more productions, separated
by single vertical bars. The layout of the productions is entirely
free-format:

     rule1:	production1
          |  production2 |
     	production3 | production4

   At any point in the grammar previously defined rules may be extended
with additional productions. This is achieved by redeclaring the rule with
the new productions. Thus:

     rule1: a | b | c
     rule2: d | e | f
     rule1: g | h

   is exactly equivalent to:

     rule1: a | b | c | g | h
     rule2: d | e | f

   Each production in a rule consists of zero or more items, each of which
may be either: the name of another rule to be matched (a "subrule"), a
pattern or string literal to be matched directly (a "token"), a block of
Perl code to be executed (an "action"), a special instruction to the
parser (a "directive"), or a standard Perl comment (which is ignored).

   A rule matches a text if one of its productions matches. A production
matches if each of its items match consecutive substrings of the text. The
productions of a rule being matched are tried in the same order that they
appear in the original grammar, and the first matching production
terminates the match attempt (successfully). If all productions are tried
and none matches, the match attempt fails.

   Note that this behaviour is quite different from the "prefer the longer
match" behaviour of *yacc*. For example, if *yacc* were parsing the rule:

     seq : 'A' 'B'
         | 'A' 'B' 'C'

   upon matching "AB" it would look ahead to see if a 'C' is next and, if
so, will match the second production in preference to the first. In other
words, *yacc* effectively tries all the productions of a rule
breadth-first in parallel, and selects the "best" match, where "best"
means longest (note that this is a gross simplification of the true
behaviour of *yacc* but it will do for our purposes).

   In contrast, Parse::RecDescent tries each production depth-first in
sequence, and selects the "best" match, where "best" means first. This is
the fundamental difference between "bottom-up" and "recursive descent"
parsing.

   Each successfully matched item in a production is assigned a value,
which can be accessed in subsequent actions within the same production
(or, in some cases, as the return value of a successful subrule call).
Unsuccessful items don't have an associated value, since the failure of an
item causes the entire surrounding production to immediately fail. The
following sections describe the various types of items and their success
values.

Subrules
--------

   A subrule which appears in a production is an instruction to the parser
to attempt to match the named rule at that point in the text being parsed.
If the named subrule is not defined when requested the production
containing it immediately fails (unless it was "autostubbed" - see
`Autostubbing' in this node).

   A rule may (recursively) call itself as a subrule, but not as the
left-most item in any of its productions (since such recursions are usually
non-terminating).

   The value associated with a subrule is the value associated with its
`$return' variable (see `"Actions"' in this node below), or with the last
successfully matched item in the subrule match.

   Subrules may also be specified with a trailing repetition specifier,
indicating that they are to be (greedily) matched the specified number of
times. The available specifiers are:

     subrule(?)	# Match one-or-zero times
     subrule(s)	# Match one-or-more times
     subrule(s?)	# Match zero-or-more times
     subrule(N)	# Match exactly N times for integer N > 0
     subrule(N..M)	# Match between N and M times
     subrule(..M)	# Match between 1 and M times
     subrule(N..)	# Match at least N times

   Repeated subrules keep matching until either the subrule fails to
match, or it has matched the minimal number of times but fails to consume
any of the parsed text (this second condition prevents the subrule
matching forever in some cases).

   Since a repeated subrule may match many instances of the subrule
itself, the value associated with it is not a simple scalar, but rather a
reference to a list of scalars, each of which is the value associated with
one of the individual subrule matches. In other words in the rule:

     program: statement(s)

   the value associated with the repeated subrule "statement(s)" is a
reference to an array containing the values matched by each call to the
individual subrule "statement".

   Repetition modifieres may include a separator pattern:

     program: statement(s /;/)

   specifying some sequence of characters to be skipped between each
repetition.  This is really just a shorthand for the <leftop:...> directive
(see below).

Tokens
------

   If a quote-delimited string or a Perl regex appears in a production,
the parser attempts to match that string or pattern at that point in the
text. For example:

     typedef: "typedef" typename identifier ';'

     identifier: /[A-Za-z_][A-Za-z0-9_]*/

   As in regular Perl, a single quoted string is uninterpolated, whilst a
double-quoted string or a pattern is interpolated (at the time of
matching, not when the parser is constructed). Hence, it is possible to
define rules in which tokens can be set at run-time:

     typedef: "$::typedefkeyword" typename identifier ';'

     identifier: /$::identpat/

   Note that, since each rule is implemented inside a special namespace
belonging to its parser, it is necessary to explicitly quantify variables
from the main package.

   Regex tokens can be specified using just slashes as delimiters or with
the explicit `m<delimiter>......<delimiter>' syntax:

     typedef: "typedef" typename identifier ';'

     typename: /[A-Za-z_][A-Za-z0-9_]*/

     identifier: m{[A-Za-z_][A-Za-z0-9_]*}

   A regex of either type can also have any valid trailing parameter(s)
(that is, any of [cgimsox]):

     typedef: "typedef" typename identifier ';'

     identifier: / [a-z_] 		# LEADING ALPHA OR UNDERSCORE
     	      [a-z0-9_]*	# THEN DIGITS ALSO ALLOWED
     	    /ix			# CASE/SPACE/COMMENT INSENSITIVE

   The value associated with any successfully matched token is a string
containing the actual text which was matched by the token.

   It is important to remember that, since each grammar is specified in a
Perl string, all instances of the universal escape character '\' within a
grammar must be "doubled", so that they interpolate to single '\'s when
the string is compiled. For example, to use the grammar:

     word:	    /\S+/ | backslash
     line:	    prefix word(s) "\n"
     backslash:  '\\'

   the following code is required:

     $parser = new Parse::RecDescent (q{

     word:	    /\\S+/ | backslash
     line:	    prefix word(s) "\\n"
     backslash:  '\\\\'

     });

Terminal Separators
-------------------

   For the purpose of matching, each terminal in a production is considered
to be preceded by a "prefix" - a pattern which must be matched before a
token match is attempted. By default, the prefix is optional whitespace
(which always matches, at least trivially), but this default may be reset
in any production.

   The variable `$Parse::RecDescent::skip' stores the universal prefix,
which is the default for all terminal matches in all parsers built with
Parse::RecDescent.

   The prefix for an individual production can be altered by using the
`<skip:...>' directive (see below).

Actions
-------

   An action is a block of Perl code which is to be executed (as the block
of a do statement) when the parser reaches that point in a production. The
action executes within a special namespace belonging to the active parser,
so care must be taken in correctly qualifying variable names (see also
`Start-up Actions' in this node below).

   The action is considered to succeed if the final value of the block is
defined (that is, if the implied do statement evaluates to a defined value
- *even one which would be treated as "false"*). Note that the value
associated with a successful action is also the final value in the block.

   An action will fail if its last evaluated value is undef. This is
surprisingly easy to accomplish by accident. For instance, here's an
infuriating case of an action that makes its production fail, but only
when debugging *isn't* activated:

     description: name rank serial_number
     		{ print "Got $item[2] $item[1] ($item[3])\n"
     			if $::debugging
     		}

   If `$debugging' is false, no statement in the block is executed, so the
final value is undef, and the entire production fails. The solution is:

     description: name rank serial_number
     		{ print "Got $item[2] $item[1] ($item[3])\n"
     			if $::debugging;
     		  1;
     		}

   Within an action, a number of useful parse-time variables are available
in the special parser namespace (there are other variables also
accessible, but meddling with them will probably just break your parser.
As a general rule, if you avoid referring to unqualified variables -
especially those starting with an underscore - inside an action, things
should be okay):

`@item' and `%item'
     The array slice `@item[1..$#item]' stores the value associated with
     each item (that is, each subrule, token, or action) in the current
     production. The analogy is to $1, $2, etc. in a *yacc* grammar.  Note
     that, for obvious reasons, `@item' only contains the values of items
     before the current point in the production.

     The first element (`$item[0]') stores the name of the current rule
     being matched.

     `@item' is a standard Perl array, so it can also be indexed with
     negative numbers, representing the number of items back from the
     current position in the parse:

          stuff: /various/ bits 'and' pieces "then" data 'end'
          		{ print $item[-2] }  # PRINTS data
          				     # (EASIER THAN: $item[6])

     The `%item' hash complements the <@item> array, providing named
     access to the same item values:

          stuff: /various/ bits 'and' pieces "then" data 'end'
          		{ print $item{data}  # PRINTS data
          				     # (EVEN EASIER THAN USING @item)

     The results of named subrules are stored in the hash under each
     subrule's name, whilst all other items are stored under a "named
     positional" key that indictates their ordinal position within their
     item type: __STRINGn__, __PATTERNn__, __DIRECTIVEn__, __ACTIONn__:

          stuff: /various/ bits 'and' pieces "then" data 'end' { save }
                          { print $item{__PATTERN1__}, # PRINTS 'various'
                                  $item{__STRING2__},  # PRINTS 'then'
                                  $item{__ACTION1__},  # PRINTS RETURN
          						     # VALUE OF save
                          }

     If you want proper *named* access to patterns or literals, you need
     to turn them into separate rules:

          stuff: various bits 'and' pieces "then" data 'end'
                          { print $item{various}  # PRINTS various
                          }

          various: /various/

     The special entry `$item{__RULE__}' stores the name of the current
     rule (i.e. the same value as `$item[0]'.

     The advantage of using `%item', instead of `@items' is that it
     removes the need to track items positions that may change as a grammar
     evolves. For example, adding an interim `<skip>' directive of action
     can silently ruin a trailing action, by moving an `@item' element
     "down" the array one place. In contrast, the named entry of `%item'
     is unaffected by such an insertion.

     A limitation of the `%item' hash is that it only records the last
     value of a particular subrule. For example:

          range: '(' number '..' number )'
                          { $return = $item{number} }

     will return only the value corresponding to the *second* match of the
     number subrule. In other words, successive calls to a subrule
     overwrite the corresponding entry in `%item'. Once again, the
     solution is to rename each subrule in its own rule:

          range: '(' from_num '..' to_num )'
                          { $return = $item{from_num} }

          from_num: number
          to_num:   number

`@arg' and `%arg'
     The array `@arg' and the hash `%arg' store any arguments passed to
     the rule from some other rule (see `"Subrule argument lists' in this
     node). Changes to the elements of either variable do not propagate
     back to the calling rule (data can be passed back from a subrule via
     the `$return' variable - see next item).

`$return'
     If a value is assigned to `$return' within an action, that value is
     returned if the production containing the action eventually matches
     successfully. Note that setting `$return' *doesn't* cause the current
     production to succeed. It merely tells it what to return if it *does*
     succeed.  Hence `$return' is analogous to $$ in a *yacc* grammar.

     If `$return' is not assigned within a production, the value of the
     last component of the production (namely: `$item[$#item]') is
     returned if the production succeeds.

`$commit'
     The current state of commitment to the current production (see
     `"Directives"' in this node below).

`$skip'
     The current terminal prefix (see `"Directives"' in this node below).

$text
     The remaining (unparsed) text. Changes to $text *do not propagate*
     out of unsuccessful productions, but do survive successful
     productions. Hence it is possible to dynamically alter the text being
     parsed - for example, to provide a `#include'-like facility:

          hash_include: '#include' filename
                                  { $text = ::loadfile($item[2]) . $text }

          filename: '<' /[a-z0-9._-]+/i '>'  { $return = $item[2] }
                  | '"' /[a-z0-9._-]+/i '"'  { $return = $item[2] }

`$thisline' and `$prevline'
     `$thisline' stores the current line number within the current parse
     (starting from 1). `$prevline' stores the line number for the last
     character which was already successfully parsed (this will be
     different from `$thisline' at the end of each line).

     For efficiency, `$thisline' and `$prevline' are actually tied hashes,
     and only recompute the required line number when the variable's value
     is used.

     Assignment to `$thisline' adjusts the line number calculator, so that
     it believes that the current line number is the value being assigned.
     Note that this adjustment will be reflected in all subsequent line
     numbers calculations.

     Modifying the value of the variable $text (as in the previous
     `hash_include' example, for instance) will confuse the line counting
     mechanism. To prevent this, you should call
     `Parse::RecDescent::LineCounter::resync($thisline)' *immediately*
     after any assignment to the variable $text (or, at least, before the
     next attempt to use `$thisline').

     Note that if a production fails after assigning to or resync'ing
     `$thisline', the parser's line counter mechanism will usually be
     corrupted.

     Also see the entry for `@itempos'.

     The line number can be set to values other than 1, by calling the
     start rule with a second argument. For example:

          $parser = new Parse::RecDescent ($grammar);

          $parser->input($text, 10);      # START LINE NUMBERS AT 10

`$thiscolumn' and `$prevcolumn'
     `$thiscolumn' stores the current column number within the current line
     being parsed (starting from 1). `$prevcolumn' stores the column number
     of the last character which was actually successfully parsed. Usually
     `$prevcolumn == $thiscolumn-1', but not at the end of lines.

     For efficiency, `$thiscolumn' and `$prevcolumn' are actually tied
     hashes, and only recompute the required column number when the
     variable's value is used.

     Assignment to `$thiscolumn' or `$prevcolumn' is a fatal error.

     Modifying the value of the variable $text (as in the previous
     `hash_include' example, for instance) may confuse the column counting
     mechanism.

     Note that `$thiscolumn' reports the column number before any
     whitespace that might be skipped before reading a token. Hence if you
     wish to know where a token started (and ended) use something like
     this:

          rule: token1 token2 startcol token3 endcol token4
                          { print "token3: columns $item[3] to $item[5]"; }

          startcol: // { $thiscolumn }    # NEED THE // TO STEP PAST TOKEN SEP
          endcol:      { $prevcolumn }

     Also see the entry for `@itempos'.

`$thisoffset' and `$prevoffset'
     `$thisoffset' stores the offset of the current parsing position
     within the complete text being parsed (starting from 0).
     `$prevoffset' stores the offset of the last character which was
     actually successfully parsed. In all cases `$prevoffset ==
     $thisoffset-1'.

     For efficiency, `$thisoffset' and `$prevoffset' are actually tied
     hashes, and only recompute the required offset when the variable's
     value is used.

     Assignment to `$thisoffset' or <$prevoffset> is a fatal error.

     Modifying the value of the variable $text will not affect the offset
     counting mechanism.

     Also see the entry for `@itempos'.

`@itempos'
     The array `@itempos' stores a hash reference corresponding to each
     element of `@item'. The elements of the hash provide the following:

          $itempos[$n]{offset}{from}      # VALUE OF $thisoffset BEFORE $item[$n]
          $itempos[$n]{offset}{to}        # VALUE OF $prevoffset AFTER $item[$n]
          $itempos[$n]{line}{from}        # VALUE OF $thisline BEFORE $item[$n]
          $itempos[$n]{line}{to}          # VALUE OF $prevline AFTER $item[$n]
          $itempos[$n]{column}{from}      # VALUE OF $thiscolumn BEFORE $item[$n]
          $itempos[$n]{column}{to}        # VALUE OF $prevcolumn AFTER $item[$n]

     Note that the various `$itempos[$n]...{from}' values record the
     appropriate value after any token prefix has been skipped.

     Hence, instead of the somewhat tedious and error-prone:

          rule: startcol token1 endcol
                startcol token2 endcol
                startcol token3 endcol
                          { print "token1: columns $item[1]
                                                to $item[3]
                                   token2: columns $item[4]
                                                to $item[6]
                                   token3: columns $item[7]
                                                to $item[9]" }

          startcol: // { $thiscolumn }    # NEED THE // TO STEP PAST TOKEN SEP
          endcol:      { $prevcolumn }

     it is possible to write:

          rule: token1 token2 token3
                          { print "token1: columns $itempos[1]{column}{from}
                                                to $itempos[1]{column}{to}
                                   token2: columns $itempos[2]{column}{from}
                                                to $itempos[2]{column}{to}
                                   token3: columns $itempos[3]{column}{from}
                                                to $itempos[3]{column}{to}" }

     Note however that (in the current implementation) the use of
     `@itempos' anywhere in a grammar implies that item positioning
     information is collected *everywhere* during the parse. Depending on
     the grammar and the size of the text to be parsed, this may be
     prohibitively expensive and the explicit use of `$thisline',
     `$thiscolumn', etc. may be a better choice.

`$thisparser'
     A reference to the Parse::RecDescent object through which parsing was
     initiated.

     The value of `$thisparser' propagates down the subrules of a parse
     but not back up. Hence, you can invoke subrules from another parser
     for the scope of the current rule as follows:

          rule: subrule1 subrule2
              | { $thisparser = $::otherparser } <reject>
              | subrule3 subrule4
              | subrule5

     The result is that the production calls "subrule1" and "subrule2" of
     the current parser, and the remaining productions call the named
     subrules from `$::otherparser'. Note, however that "Bad Things" will
     happen if `::otherparser' isn't a blessed reference and/or doesn't
     have methods with the same names as the required subrules!

`$thisrule'
     A reference to the `Parse::RecDescent::Rule' object corresponding to
     the rule currently being matched.

`$thisprod'
     A reference to the `Parse::RecDescent::Production' object
     corresponding to the production currently being matched.

`$score' and `$score_return'
     $score stores the best production score to date, as specified by an
     earlier `<score:...>' directive. $score_return stores the
     corresponding return value for the successful production.

     See `Scored productions' in this node.

   Warning: the parser relies on the information in the various `this...'
objects in some non-obvious ways. Tinkering with the other members of
these objects will probably cause Bad Things to happen, unless you
*really* know what you're doing. The only exception to this advice is that
the use of `$this...->{local}' is always safe.

Start-up Actions
----------------

   Any actions which appear before the first rule definition in a grammar
are treated as "start-up" actions. Each such action is stripped of its
outermost brackets and then evaluated (in the parser's special namespace)
just before the rules of the grammar are first compiled.

   The main use of start-up actions is to declare local variables within
the parser's special namespace:

     { my $lastitem = '???'; }

     list: item(s)   { $return = $lastitem }

     item: book      { $lastitem = 'book'; }
           bell      { $lastitem = 'bell'; }
           candle    { $lastitem = 'candle'; }

   but start-up actions can be used to execute any valid Perl code within
a parser's special namespace.

   Start-up actions can appear within a grammar extension or replacement
(that is, a partial grammar installed via `Parse::RecDescent::Extend()' or
`Parse::RecDescent::Replace()' - see `Incremental Parsing' in this node),
and will be executed before the new grammar is installed. Note, however,
that a particular start-up action is only ever executed once.

Autoactions
-----------

   It is sometimes desirable to be able to specify a default action to be
taken at the end of every production (for example, in order to easily
build a parse tree). If the variable `$::RD_AUTOACTION' is defined when
`Parse::RecDescent::new()' is called, the contents of that variable are
treated as a specification of an action which is to appended to each
production in the corresponding grammar. So, for example, to construct a
simple parse tree:

     $::RD_AUTOACTION = q { [@item] };

     parser = new Parse::RecDescent (q{
         expression: and_expr '||' expression | and_expr
         and_expr:   not_expr '&&' and_expr   | not_expr
         not_expr:   '!' brack_expr           | brack_expr
         brack_expr: '(' expression ')'       | identifier
         identifier: /[a-z]+/i
         });

   which is equivalent to:

     parser = new Parse::RecDescent (q{
         expression: and_expr '&&' expression
                         { [@item] }
                   | and_expr
                         { [@item] }

     and_expr:   not_expr '&&' and_expr
                     { [@item] }
             |   not_expr
                     { [@item] }

     not_expr:   '!' brack_expr
                     { [@item] }
             |   brack_expr
                     { [@item] }

     brack_expr: '(' expression ')'
                     { [@item] }
               | identifier
                     { [@item] }

     identifier: /[a-z]+/i
                     { [@item] }
     });

   Alternatively, we could take an object-oriented approach, use different
classes for each node (and also eliminating redundant intermediate nodes):

     $::RD_AUTOACTION = q
       { $#item==1 ? $item[1] : new ${"$item[0]_node"} (@item[1..$#item]) };

     parser = new Parse::RecDescent (q{
         expression: and_expr '||' expression | and_expr
         and_expr:   not_expr '&&' and_expr   | not_expr
         not_expr:   '!' brack_expr           | brack_expr
         brack_expr: '(' expression ')'       | identifier
         identifier: /[a-z]+/i
         });

   which is equivalent to:

     parser = new Parse::RecDescent (q{
         expression: and_expr '&&' expression
                         { new expression_node (@item[1..3]) }
                   | and_expr

     and_expr:   not_expr '&&' and_expr
                     { new and_expr_node (@item[1..3]) }
             |   not_expr

     not_expr:   '!' brack_expr
                     { new not_expr_node (@item[1..2]) }
             |   brack_expr

     brack_expr: '(' expression ')'
                     { new brack_expr_node (@item[1..3]) }
               | identifier

     identifier: /[a-z]+/i
                     { new identifer_node (@item[1]) }
     });

   Note that, if a production already ends in an action, no autoaction is
appended to it. For example, in this version:

     $::RD_AUTOACTION = q
       { $#item==1 ? $item[1] : new ${"$item[0]_node"} (@item[1..$#item]) };

     parser = new Parse::RecDescent (q{
         expression: and_expr '&&' expression | and_expr
         and_expr:   not_expr '&&' and_expr   | not_expr
         not_expr:   '!' brack_expr           | brack_expr
         brack_expr: '(' expression ')'       | identifier
         identifier: /[a-z]+/i
                         { new terminal_node($item[1]) }
         });

   each identifier match produces a `terminal_node' object, not an
`identifier_node' object.

   A level 1 warning is issued each time an "autoaction" is added to some
production.

Autotrees
---------

   A commonly needed autoaction is one that builds a parse-tree. It is
moderately tricky to set up such an action (which must treat terminals
differently from non-terminals), so Parse::RecDescent simplifies the
process by providing the `<autotree>' directive.

   If this directive appears at the start of grammar, it causes
Parse::RecDescent to insert autoactions at the end of any rule except
those which already end in an action. The action inserted depends on
whether the production is an intermediate rule (two or more items), or a
terminal of the grammar (i.e. a single pattern or string item).

   So, for example, the following grammar:

     <autotree>

     file    : command(s)
     command : get | set | vet
     get     : 'get' ident ';'
     set     : 'set' ident 'to' value ';'
     vet     : 'check' ident 'is' value ';'
     ident   : /\w+/
     value   : /\d+/

   is equivalent to:

     file    : command(s)                    { bless \%item, $item[0] }
     command : get                           { bless \%item, $item[0] }
             | set                           { bless \%item, $item[0] }
             | vet                           { bless \%item, $item[0] }
     get     : 'get' ident ';'               { bless \%item, $item[0] }
     set     : 'set' ident 'to' value ';'    { bless \%item, $item[0] }
     vet     : 'check' ident 'is' value ';'  { bless \%item, $item[0] }

     ident   : /\w+/          { bless {__VALUE__=>$item[1]}, $item[0] }
     value   : /\d+/          { bless {__VALUE__=>$item[1]}, $item[0] }

   Note that each node in the tree is blessed into a class of the same name
as the rule itself. This makes it easy to build object-oriented processors
for the parse-trees that the grammar produces. Note too that the last two
rules produce special objects with the single attribute '__VALUE__'. This
is because they consist solely of a single terminal.

   This autoaction-ed grammar would then produce a parse tree in a data
structure like this:

     {
       file => {
                 command => {
                              [ get => {
                                         identifier => { __VALUE__ => 'a' },
                                       },
                                set => {
                                         identifier => { __VALUE__ => 'b' },
                                         value      => { __VALUE__ => '7' },
                                       },
                                vet => {
                                         identifier => { __VALUE__ => 'b' },
                                         value      => { __VALUE__ => '7' },
                                       },
                               ],
                            },
               }
     }

   (except, of course, that each nested hash would also be blessed into
the appropriate class).

Autostubbing
------------

   Normally, if a subrule appears in some production, but no rule of that
name is ever defined in the grammar, the production which refers to the
non-existent subrule fails immediately. This typically occurs as a result
of misspellings, and is a sufficiently common occurance that a warning is
generated for such situations.

   However, when prototyping a grammar it is sometimes useful to be able
to use subrules before a proper specification of them is really possible.
For example, a grammar might include a section like:

     function_call: identifier '(' arg(s?) ')'

     identifier: /[a-z]\w*/i

   where the possible format of an argument is sufficiently complex that
it is not worth specifying in full until the general function call syntax
has been debugged. In this situation it is convenient to leave the real
rule `arg' undefined and just slip in a placeholder (or "stub"):

     arg: 'arg'

   so that the function call syntax can be tested with dummy input such as:

     f0()
     f1(arg)
     f2(arg arg)
     f3(arg arg arg)

   et cetera.

   Early in prototyping, many such "stubs" may be required, so
Parse::RecDescent provides a means of automating their definition.  If the
variable `$::RD_AUTOSTUB' is defined when a parser is built, a subrule
reference to any non-existent rule (say, `sr'), causes a "stub" rule of
the form:

     sr: 'sr'

   to be automatically defined in the generated parser.  A level 1 warning
is issued for each such "autostubbed" rule.

   Hence, with `$::AUTOSTUB' defined, it is possible to only partially
specify a grammar, and then "fake" matches of the unspecified (sub)rules
by just typing in their name.

Look-ahead
----------

   If a subrule, token, or action is prefixed by "...", then it is treated
as a "look-ahead" request. That means that the current production can (as
usual) only succeed if the specified item is matched, but that the matching
*does not consume any of the text being parsed*. This is very similar to
the `/(?=...)/' look-ahead construct in Perl patterns. Thus, the rule:

     inner_word: word ...word

   will match whatever the subrule "word" matches, provided that match is
followed by some more text which subrule "word" would also match (although
this second substring is not actually consumed by "inner_word")

   Likewise, a "...!" prefix, causes the following item to succeed (without
consuming any text) if and only if it would normally fail. Hence, a rule
such as:

     identifier: ...!keyword ...!'_' /[A-Za-z_]\w*/

   matches a string of characters which satisfies the pattern
`/[A-Za-z_]\w*/', but only if the same sequence of characters would not
match either subrule "keyword" or the literal token '_'.

   Sequences of look-ahead prefixes accumulate, multiplying their positive
and/or negative senses. Hence:

     inner_word: word ...!......!word

   is exactly equivalent the the original example above (a warning is
issued in cases like these, since they often indicate something left out,
or misunderstood).

   Note that actions can also be treated as look-aheads. In such cases,
the state of the parser text (in the local variable $text) after the
look-ahead action is guaranteed to be identical to its state before the
action, regardless of how it's changed *within* the action (unless you
actually undefine $text, in which case you get the disaster you deserve
:-).

Directives
----------

   Directives are special pre-defined actions which may be used to alter
the behaviour of the parser. There are currently eighteen directives:
`<commit>', `<uncommit>', `<reject>', `<score>', `<autoscore>', `<skip>',
`<resync>', `<error>', `<rulevar>', `<matchrule>', `<leftop>', `<rightop>',
`<defer>', `<nocheck>', `<perl_quotelike>', `<perl_codeblock>',
`<perl_variable>', and `<token>'.

Committing and uncommitting
     The `<commit>' and `<uncommit>' directives permit the recursive
     descent of the parse tree to be pruned (or "cut") for efficiency.
     Within a rule, a `<commit>' directive instructs the rule to ignore
     subsequent productions if the current production fails. For example:

          command: 'find' <commit> filename
                 | 'open' <commit> filename
                 | 'move' filename filename

     Clearly, if the leading token 'find' is matched in the first
     production but that production fails for some other reason, then the
     remaining productions cannot possibly match. The presence of the
     `<commit>' causes the "command" rule to fail immediately if an
     invalid "find" command is found, and likewise if an invalid "open"
     command is encountered.

     It is also possible to revoke a previous commitment. For example:

          if_statement: 'if' <commit> condition
                                  'then' block <uncommit>
                                  'else' block
                      | 'if' <commit> condition
                                  'then' block

     In this case, a failure to find an "else" block in the first
     production shouldn't preclude trying the second production, but a
     failure to find a "condition" certainly should.

     As a special case, any production in which the first item is an
     `<uncommit>' immediately revokes a preceding `<commit>' (even though
     the production would not otherwise have been tried). For example, in
     the rule:

          request: 'explain' expression
                 | 'explain' <commit> keyword
                 | 'save'
                 | 'quit'
                 | <uncommit> term '?'

     if the text being matched was "explain?", and the first two
     productions failed, then the `<commit>' in production two would cause
     productions three and four to be skipped, but the leading
     `<uncommit>' in the production five would allow that production to
     attempt a match.

     Note in the preceding example, that the `<commit>' was only placed in
     production two. If production one had been:

          request: 'explain' <commit> expression

     then production two would be (inappropriately) skipped if a leading
     "explain..." was encountered.

     Both `<commit>' and `<uncommit>' directives always succeed, and their
     value is always 1.

Rejecting a production
     The `<reject>' directive immediately causes the current production to
     fail (it is exactly equivalent to, but more obvious than, the action
     `{undef}'). A `<reject>' is useful when it is desirable to get the
     side effects of the actions in one production, without prejudicing a
     match by some other production later in the rule. For example, to
     insert tracing code into the parse:

          complex_rule: { print "In complex rule...\n"; } <reject>
          
          complex_rule: simple_rule '+' 'i' '*' simple_rule
                      | 'i' '*' simple_rule
                      | simple_rule

     It is also possible to specify a conditional rejection, using the
     form `<reject:*condition*>', which only rejects if the specified
     condition is true. This form of rejection is exactly equivalent to
     the action `{(*condition*)?undef:1}>'.  For example:

          command: save_command
                 | restore_command
                 | <reject: defined $::tolerant> { exit }
                 | <error: Unknown command. Ignored.>

     A `<reject>' directive never succeeds (and hence has no associated
     value). A conditional rejection may succeed (if its condition is not
     satisfied), in which case its value is 1.

     As an extra optimization, Parse::RecDescent ignores any production
     which *begins* with an unconditional `<reject>' directive, since any
     such production can never successfully match or have any useful
     side-effects. A level 1 warning is issued in all such cases.

     Note that productions beginning with conditional `<reject:...>'
     directives are never "optimized away" in this manner, even if they
     are always guaranteed to fail (for example: `<reject:1>')

     Due to the way grammars are parsed, there is a minor restriction on
     the condition of a conditional `<reject:...>': it cannot contain any
     raw '<' or '>' characters. For example:

          line: cmd <reject: $thiscolumn > max> data

     results in an error when a parser is built from this grammar (since
     the grammar parser has no way of knowing whether the first > is a
     "less than" or the end of the `<reject:...>'.

     To overcome this problem, put the condition inside a do{} block:

          line: cmd <reject: do{$thiscolumn > max}> data

     Note that the same problem may occur in other directives that take
     arguments. The same solution will work in all cases.

Skipping between terminals
     The `<skip>' directive enables the terminal prefix used in a
     production to be changed. For example:

          OneLiner: Command <skip:'[ \t]*'> Arg(s) /;/

     causes only blanks and tabs to be skipped before terminals in the Arg
     subrule (and any of *its* subrules>, and also before the final `/;/'
     terminal.  Once the production is complete, the previous terminal
     prefix is reinstated. Note that this implies that distinct
     productions of a rule must reset their terminal prefixes individually.

     The `<skip>' directive evaluates to the previous terminal prefix, so
     it's easy to reinstate a prefix later in a production:

          Command: <skip:","> CSV(s) <skip:$item[1]> Modifier

     The value specified after the colon is interpolated into a pattern,
     so all of the following are equivalent (though their efficiency
     increases down the list):

          <skip: "$colon|$comma">   # ASSUMING THE VARS HOLD THE OBVIOUS VALUES

          <skip: ':|,'>

          <skip: q{[:,]}>

          <skip: qr/[:,]/>

     There is no way of directly setting the prefix for an entire rule,
     except as follows:

          Rule: <skip: '[ \t]*'> Prod1
              | <skip: '[ \t]*'> Prod2a Prod2b
              | <skip: '[ \t]*'> Prod3

     or, better:

          Rule: <skip: '[ \t]*'>
              (
                  Prod1
                | Prod2a Prod2b
                | Prod3
              )

     *Note: Up to release 1.51 of Parse::RecDescent, an entirely different
     mechanism was used for specifying terminal prefixes. The current
     method is not backwards-compatible with that early approach. The
     current approach is stable and will not to change again.*

Resynchronization
     The `<resync>' directive provides a visually distinctive means of
     consuming some of the text being parsed, usually to skip an erroneous
     input. In its simplest form `<resync>' simply consumes text up to and
     including the next newline (`"\n"') character, succeeding only if the
     newline is found, in which case it causes its surrounding rule to
     return zero on success.

     In other words, a `<resync>' is exactly equivalent to the token
     `/[^\n]*\n/' followed by the action `{ $return = 0 }' (except that
     productions beginning with a `<resync>' are ignored when generating
     error messages). A typical use might be:

          script : command(s)

          command: save_command
                 | restore_command
                 | <resync> # TRY NEXT LINE, IF POSSIBLE

     It is also possible to explicitly specify a resynchronization
     pattern, using the `<resync:*pattern*>' variant. This version
     succeeds only if the specified pattern matches (and consumes) the
     parsed text. In other words, `<resync:*pattern*>' is exactly
     equivalent to the token `/*pattern*/' (followed by a `{ $return = 0 }'
     action). For example, if commands were terminated by newlines or
     semi-colons:

          command: save_command
                 | restore_command
                 | <resync:[^;\n]*[;\n]>

     The value of a successfully matched `<resync>' directive (of either
     type) is the text that it consumed. Note, however, that since the
     directive also sets `$return', a production consisting of a lone
     `<resync>' succeeds but returns the value zero (which a calling rule
     may find useful to distinguish between "true" matches and "tolerant"
     matches).  Remember that returning a zero value indicates that the
     rule *succeeded* (since only an undef denotes failure within
     Parse::RecDescent parsers.

Error handling
     The `<error>' directive provides automatic or user-defined generation
     of error messages during a parse. In its simplest form `<error>'
     prepares an error message based on the mismatch between the last item
     expected and the text which cause it to fail. For example, given the
     rule:

          McCoy: curse ',' name ', I'm a doctor, not a' a_profession '!'
               | pronoun 'dead,' name '!'
               | <error>

     the following strings would produce the following messages:

    "Amen, Jim!"
               ERROR (line 1): Invalid McCoy: Expected curse or pronoun
                               not found

    "Dammit, Jim, I'm a doctor!"
               ERROR (line 1): Invalid McCoy: Expected ", I'm a doctor, not a"
                               but found ", I'm a doctor!" instead

    "He's dead,\n"
               ERROR (line 2): Invalid McCoy: Expected name not found

    "He's alive!"
               ERROR (line 1): Invalid McCoy: Expected 'dead,' but found
                               "alive!" instead

    "Dammit, Jim, I'm a doctor, not a pointy-eared Vulcan!"
               ERROR (line 1): Invalid McCoy: Expected a profession but found
                               "pointy-eared Vulcan!" instead

     Note that, when autogenerating error messages, all underscores in any
     rule name used in a message are replaced by single spaces (for example
     "a_production" becomes "a production"). Judicious choice of rule
     names can therefore considerably improve the readability of automatic
     error messages (as well as the maintainability of the original
     grammar).

     If the automatically generated error is not sufficient, it is
     possible to provide an explicit message as part of the error
     directive. For example:

          Spock: "Fascinating ',' (name | 'Captain') '.'
               | "Highly illogical, doctor."
               | <error: He never said that!>

     which would result in all failures to parse a "Spock" subrule
     printing the following message:

          ERROR (line <N>): Invalid Spock:  He never said that!

     The error message is treated as a "qq{...}" string and interpolated
     when the error is generated (not when the directive is specified!).
     Hence:

          <error: Mystical error near "$text">

     would correctly insert the ambient text string which caused the error.

     There are two other forms of error directive: `<error?>' and
     `<error?: msg>'. These behave just like `<error>' and `<error: msg>'
     respectively, except that they are only triggered if the rule is
     "committed" at the time they are encountered. For example:

          Scotty: "Ya kenna change the Laws of Phusics," <commit> name
                | name <commit> ',' 'she's goanta blaw!'
                | <error?>

     will only generate an error for a string beginning with "Ya kenna
     change the Laws o' Phusics," or a valid name, but which still fails
     to match the corresponding production. That is,
     `$parser->Scotty("Aye, Cap'ain")' will fail silently (since neither
     production will "commit" the rule on that input), whereas
     `$parser->Scotty("Mr Spock, ah jest kenna do'ut!")'  will fail with
     the error message:

          ERROR (line 1): Invalid Scotty: expected 'she's goanta blaw!'
                          but found 'I jest kenna do'ut!' instead.

     since in that case the second production would commit after matching
     the leading name.

     Note that to allow this behaviour, all `<error>' directives which are
     the first item in a production automatically uncommit the rule just
     long enough to allow their production to be attempted (that is, when
     their production fails, the commitment is reinstated so that
     subsequent productions are skipped).

     In order to *permanently* uncommit the rule before an error message,
     it is necessary to put an explicit `<uncommit>' before the `<error>'.
     For example:

          line: 'Kirk:'  <commit> Kirk
              | 'Spock:' <commit> Spock
              | 'McCoy:' <commit> McCoy
              | <uncommit> <error?> <reject>
              | <resync>

     Error messages generated by the various `<error...>' directives are
     not displayed immediately. Instead, they are "queued" in a buffer and
     are only displayed once parsing ultimately fails. Moreover,
     `<error...>' directives that cause one production of a rule to fail
     are automatically removed from the message queue if another
     production subsequently causes the entire rule to succeed.  This
     means that you can put `<error...>' directives wherever useful
     diagnosis can be done, and only those associated with actual parser
     failure will ever be displayed. Also see `"Gotchas"' in this node.

     As a general rule, the most useful diagnostics are usually generated
     either at the very lowest level within the grammar, or at the very
     highest. A good rule of thumb is to identify those subrules which
     consist mainly (or entirely) of terminals, and then put an
     `<error...>' directive at the end of any other rule which calls one
     or more of those subrules.

     There is one other situation in which the output of the various types
     of error directive is suppressed; namely, when the rule containing
     them is being parsed as part of a "look-ahead" (see `"Look-ahead"' in
     this node). In this case, the error directive will still cause the
     rule to fail, but will do so silently.

     An unconditional `<error>' directive always fails (and hence has no
     associated value). This means that encountering such a directive
     always causes the production containing it to fail. Hence an
     `<error>' directive will inevitably be the last (useful) item of a
     rule (a level 3 warning is issued if a production contains items
     after an unconditional `<error>' directive).

     An `<error?>' directive will *succeed* (that is: fail to fail :-), if
     the current rule is uncommitted when the directive is encountered. In
     that case the directive's associated value is zero. Hence, this type
     of error directive can be used before the end of a production. For
     example:

          command: 'do' <commit> something
                 | 'report' <commit> something
                 | <error?: Syntax error> <error: Unknown command>

     Warning: The `<error?>' directive does not mean "always fail (but do
     so silently unless committed)". It actually means "only fail (and
     report) if committed, otherwise *succeed*". To achieve the "fail
     silently if uncommitted" semantics, it is necessary to use:

          rule: item <commit> item(s)
              | <error?> <reject>      # FAIL SILENTLY UNLESS COMMITTED

     However, because people seem to expect a lone `<error?>' directive to
     work like this:

          rule: item <commit> item(s)
              | <error?: Error message if committed>
              | <error:  Error message if uncommitted>

     Parse::RecDescent automatically appends a `<reject>' directive if the
     `<error?>' directive is the only item in a production. A level 2
     warning (see below) is issued when this happens.

     The level of error reporting during both parser construction and
     parsing is controlled by the presence or absence of four global
     variables: `$::RD_ERRORS', `$::RD_WARN', `$::RD_HINT', and
     <$::RD_TRACE>. If `$::RD_ERRORS' is defined (and, by default, it is)
     then fatal errors are reported.

     Whenever `$::RD_WARN' is defined, certain non-fatal problems are also
     reported.  Warnings have an associated "level": 1, 2, or 3. The
     higher the level, the more serious the warning. The value of the
     corresponding global variable (`$::RD_WARN') determines the *lowest*
     level of warning to be displayed. Hence, to see all warnings, set
     `$::RD_WARN' to 1.  To see only the most serious warnings set
     `$::RD_WARN' to 3.  By default `$::RD_WARN' is initialized to 3,
     ensuring that serious but non-fatal errors are automatically reported.

     See `"DIAGNOSTICS"' for a list of the varous error and warning
     messages that Parse::RecDescent generates when these two variables
     are defined.

     Defining any of the remaining variables (which are not defined by
     default) further increases the amount of information reported.
     Defining `$::RD_HINT' causes the parser generator to offer more
     detailed analyses and hints on both errors and warnings.  Note that
     setting `$::RD_HINT' at any point automagically sets `$::RD_WARN' to
     1.

     Defining `$::RD_TRACE' causes the parser generator and the parser to
     report their progress to STDERR in excruciating detail (although,
     without hints unless $::RD_HINT is separately defined). This detail
     can be moderated in only one respect: if `$::RD_TRACE' has an integer
     value (N) greater than 1, only the N characters of the "current
     parsing context" (that is, where in the input string we are at any
     point in the parse) is reported at any time.

     `$::RD_TRACE' is mainly useful for debugging a grammar that isn't
     behaving as you expected it to. To this end, if `$::RD_TRACE' is
     defined when a parser is built, any actual parser code which is
     generated is also written to a file named "RD_TRACE" in the local
     directory.

     Note that the four variables belong to the "main" package, which
     makes them easier to refer to in the code controlling the parser, and
     also makes it easy to turn them into command line flags ("-RD_ERRORS",
     "-RD_WARN", "-RD_HINT", "-RD_TRACE") under *perl -s*.

Specifying local variables
     It is occasionally convenient to specify variables which are local to
     a single rule. This may be achieved by including a `<rulevar:...>'
     directive anywhere in the rule. For example:

          markup: <rulevar: $tag>

          markup: tag {($tag=$item[1]) =~ s/^<|>$//g} body[$tag]

     The example `<rulevar: $tag>' directive causes a "my" variable named
     $tag to be declared at the start of the subroutine implementing the
     `markup' rule (that is, before the first production, regardless of
     where in the rule it is specified).

     Specifically, any directive of the form: `<rulevar:*text*>' causes a
     line of the form `my *text*;' to be added at the beginning of the
     rule subroutine, immediately after the definitions of the following
     local variables:

          $thisparser     $commit
          $thisrule       @item
          $thisline       @arg
          $text           %arg

     This means that the following `<rulevar>' directives work as expected:

          <rulevar: $count = 0 >

          <rulevar: $firstarg = $arg[0] || '' >

          <rulevar: $myItems = \@item >

          <rulevar: @context = ( $thisline, $text, @arg ) >

          <rulevar: ($name,$age) = $arg{"name","age"} >

     If a variable that is also visible to subrules is required, it needs
     to be local'd, not my'd. `rulevar' defaults to my, but if local is
     explicitly specified:

          <rulevar: local $count = 0 >

     then a local-ized variable is declared instead, and will be available
     within subrules.

     Note however that, because all such variables are "my" variables,
     their values *do not persist* between match attempts on a given rule.
     To preserve values between match attempts, values can be stored
     within the "local" member of the `$thisrule' object:

          countedrule: { $thisrule->{"local"}{"count"}++ }
                       <reject>
                     | subrule1
                     | subrule2
                     | <reject: $thisrule->{"local"}{"count"} == 1>
                       subrule3

     When matching a rule, each `<rulevar>' directive is matched as if it
     were an unconditional `<reject>' directive (that is, it causes any
     production in which it appears to immediately fail to match).  For
     this reason (and to improve readability) it is usual to specify any
     `<rulevar>' directive in a separate production at the start of the
     rule (this has the added advantage that it enables Parse::RecDescent
     to optimize away such productions, just as it does for the `<reject>'
     directive).

Dynamically matched rules
     Because regexes and double-quoted strings are interpolated, it is
     relatively easy to specify productions with "context sensitive"
     tokens. For example:

          command:  keyword  body  "end $item[1]"

     which ensures that a command block is bounded by a "*<keyword>*...end
     *<same keyword>*" pair.

     Building productions in which subrules are context sensitive is also
     possible, via the `<matchrule:...>' directive. This directive behaves
     identically to a subrule item, except that the rule which is invoked
     to match it is determined by the string specified after the colon.
     For example, we could rewrite the command rule like this:

          command:  keyword  <matchrule:body>  "end $item[1]"

     Whatever appears after the colon in the directive is treated as an
     interpolated string (that is, as if it appeared in `qq{...}'
     operator) and the value of that interpolated string is the name of
     the subrule to be matched.

     Of course, just putting a constant string like body in a
     `<matchrule:...>' directive is of little interest or benefit.  The
     power of directive is seen when we use a string that interpolates to
     something interesting. For example:

          command:        keyword <matchrule:$item[1]_body> "end $item[1]"

          keyword:        'while' | 'if' | 'function'

          while_body:     condition block

          if_body:        condition block ('else' block)(?)

          function_body:  arglist block

     Now the command rule selects how to proceed on the basis of the
     keyword that is found. It is as if command were declared:

          command:        'while'    while_body    "end while"
                 |        'if'       if_body       "end if"
                 |        'function' function_body "end function"

     When a `<matchrule:...>' directive is used as a repeated subrule, the
     rule name expression is "late-bound". That is, the name of the rule
     to be called is re-evaluated *each time* a match attempt is made.
     Hence, the following grammar:

          { $::species = 'dogs' }

          pair:   'two' <matchrule:$::species>(s)

          dogs:   /dogs/ { $::species = 'cats' }

          cats:   /cats/

     will match the string "two dogs cats cats" completely, whereas it will
     only match the string "two dogs dogs dogs" up to the eighth letter. If
     the rule name were "early bound" (that is, evaluated only the first
     time the directive is encountered in a production), the reverse
     behaviour would be expected.

Deferred actions
     The `<defer:...>' directive is used to specify an action to be
     performed when (and only if!) the current production ultimately
     succeeds.

     Whenever a `<defer:...>' directive appears, the code it specifies is
     converted to a closure (an anonymous subroutine reference) which is
     queued within the active parser object. Note that, because the
     deferred code is converted to a closure, the values of any "local"
     variable (such as $text, <@item>, etc.) are preserved until the
     deferred code is actually executed.

     If the parse ultimately succeeds and the production in which the
     `<defer:...>' directive was evaluated formed part of the successful
     parse, then the deferred code is executed immediately before the
     parse returns. If however the production which queued a deferred
     action fails, or one of the higher-level rules which called that
     production fails, then the deferred action is removed from the queue,
     and hence is never executed.

     For example, given the grammar:

          sentence: noun trans noun
                  | noun intrans

          noun:     'the dog'
                          { print "$item[1]\t(noun)\n" }
              |     'the meat'
                          { print "$item[1]\t(noun)\n" }

          trans:    'ate'
                          { print "$item[1]\t(transitive)\n" }

          intrans:  'ate'
                          { print "$item[1]\t(intransitive)\n" }
                 |  'barked'
                          { print "$item[1]\t(intransitive)\n" }

     then parsing the sentence `"the dog ate"' would produce the output:

          the dog  (noun)
          ate      (transitive)
          the dog  (noun)
          ate      (intransitive)

     This is because, even though the first production of sentence
     ultimately fails, its initial subrules `noun' and trans do match, and
     hence they execute their associated actions.  Then the second
     production of sentence succeeds, causing the actions of the subrules
     `noun' and `intrans' to be executed as well.

     On the other hand, if the actions were replaced by `<defer:...>'
     directives:

          sentence: noun trans noun
                  | noun intrans

          noun:     'the dog'
                          <defer: print "$item[1]\t(noun)\n" >
              |     'the meat'
                          <defer: print "$item[1]\t(noun)\n" >

          trans:    'ate'
                          <defer: print "$item[1]\t(transitive)\n" >

          intrans:  'ate'
                          <defer: print "$item[1]\t(intransitive)\n" >
                 |  'barked'
                          <defer: print "$item[1]\t(intransitive)\n" >

     the output would be:

          the dog  (noun)
          ate      (intransitive)

     since deferred actions are only executed if they were evaluated in a
     production which ultimately contributes to the successful parse.

     In this case, even though the first production of sentence caused the
     subrules `noun' and trans to match, that production ultimately failed
     and so the deferred actions queued by those subrules were subsequently
     disgarded. The second production then succeeded, causing the entire
     parse to succeed, and so the deferred actions queued by the (second)
     match of the `noun' subrule and the subsequent match of `intrans'
     *are* preserved and eventually executed.

     Deferred actions provide a means of improving the performance of a
     parser, by only executing those actions which are part of the final
     parse-tree for the input data.

     Alternatively, deferred actions can be viewed as a mechanism for
     building (and executing) a customized subroutine corresponding to the
     given input data, much in the same way that autoactions (see
     `"Autoactions"' in this node) can be used to build a customized data
     structure for specific input.

     Whether or not the action it specifies is ever executed, a
     `<defer:...>' directive always succeeds, returning the number of
     deferred actions currently queued at that point.

Parsing Perl
     Parse::RecDescent provides limited support for parsing subsets of
     Perl, namely: quote-like operators, Perl variables, and complete code
     blocks.

     The `<perl_quotelike>' directive can be used to parse any Perl
     quote-like operator: `'a string'', `m/a pattern/', `tr{ans}{lation}',
     etc.  It does this by calling Text::Balanced::quotelike().

     If a quote-like operator is found, a reference to an array of eight
     elements is returned. Those elements are identical to the last eight
     elements returned by Text::Balanced::extract_quotelike() in an array
     context, namely:

    [0]
          the name of the quotelike operator - 'q', 'qq', 'm', 's', 'tr' -
          if the operator was named; otherwise undef,

    [1]
          the left delimiter of the first block of the operation,

    [2]
          the text of the first block of the operation (that is, the
          contents of a quote, the regex of a match, or substitution or
          the target list of a translation),

    [3]
          the right delimiter of the first block of the operation,

    [4]
          the left delimiter of the second block of the operation if there
          is one (that is, if it is a s, tr, or y); otherwise undef,

    [5]
          the text of the second block of the operation if there is one
          (that is, the replacement of a substitution or the translation
          list of a translation); otherwise undef,

    [6]
          the right delimiter of the second block of the operation (if
          any); otherwise undef,

    [7]
          the trailing modifiers on the operation (if any); otherwise
          undef.

     If a quote-like expression is not found, the directive fails with the
     usual undef value.

     The `<perl_variable>' directive can be used to parse any Perl
     variable: $scalar, @array, %hash, $ref->{field}[$index], etc.  It
     does this by calling Text::Balanced::extract_variable().

     If the directive matches text representing a valid Perl variable
     specification, it returns that text. Otherwise it fails with the usual
     undef value.

     The `<perl_codeblock>' directive can be used to parse
     curly-brace-delimited block of Perl code, such as: { $a = 1; f() =~
     m/pat/; }.  It does this by calling
     Text::Balanced::extract_codeblock().

     If the directive matches text representing a valid Perl code block,
     it returns that text. Otherwise it fails with the usual undef value.

Constructing tokens
     Eventually, Parse::RecDescent will be able to parse tokenized input,
     as well as ordinary strings. In preparation for this joyous day, the
     `<token:...>' directive has been provided.  This directive creates a
     token which will be suitable for input to a Parse::RecDescent parser
     (when it eventually supports tokenized input).

     The text of the token is the value of the immediately preceding item
     in the production. A `<token:...>' directive always succeeds with a
     return value which is the hash reference that is the new token. It
     also sets the return value for the production to that hash ref.

     The `<token:...>' directive makes it easy to build a
     Parse::RecDescent-compatible lexer in Parse::RecDescent:

          my $lexer = new Parse::RecDescent q
          {
                  lex:    token(s)

          token:  /a\b/                      <token:INDEF>
               |  /the\b/                    <token:DEF>
               |  /fly\b/                    <token:NOUN,VERB>
               |  /[a-z]+/i { lc $item[1] }  <token:ALPHA>
               |  <error: Unknown token>

          };

     which will eventually be able to be used with a regular
     Parse::RecDescent grammar:

          my $parser = new Parse::RecDescent q
          {
                  startrule: subrule1 subrule 2
          
                  # ETC...
          };

     either with a pre-lexing phase:

          $parser->startrule( $lexer->lex($data) );

     or with a lex-on-demand approach:

          $parser->startrule( sub{$lexer->token(\$data)} );

     But at present, only the `<token:...>' directive is actually
     implemented. The rest is vapourware.

Specifying operations
     One of the commonest requirements when building a parser is to specify
     binary operators. Unfortunately, in a normal grammar, the rules for
     such things are awkward:

          disjunction:    conjunction ('or' conjunction)(s?)
                                  { $return = [ $item[1], @{$item[2]} ] }

          conjunction:    atom ('and' atom)(s?)
                                  { $return = [ $item[1], @{$item[2]} ] }

     or inefficient:

          disjunction:    conjunction 'or' disjunction
                                  { $return = [ $item[1], @{$item[2]} ] }
                     |    conjunction
                                  { $return = [ $item[1] ] }

          conjunction:    atom 'and' conjunction
                                  { $return = [ $item[1], @{$item[2]} ] }
                     |    atom
                                  { $return = [ $item[1] ] }

     and either way is ugly and hard to get right.

     The `<leftop:...>' and `<rightop:...>' directives provide an easier
     way of specifying such operations. Using `<leftop:...>' the above
     examples become:

          disjunction:    <leftop: conjunction 'or' conjunction>
          conjunction:    <leftop: atom 'and' atom>

     The `<leftop:...>' directive specifies a left-associative binary
     operator.  It is specified around three other grammar elements
     (typically subrules or terminals), which match the left operand, the
     operator itself, and the right operand respectively.

     A `<leftop:...>' directive such as:

          disjunction:    <leftop: conjunction 'or' conjunction>

     is converted to the following:

          disjunction:    ( conjunction ('or' conjunction)(s?)
                                  { $return = [ $item[1], @{$item[2]} ] } )

     In other words, a `<leftop:...>' directive matches the left operand
     followed by zero or more repetitions of both the operator and the
     right operand. It then flattens the matched items into an anonymous
     array which becomes the (single) value of the entire `<leftop:...>'
     directive.

     For example, an `<leftop:...>' directive such as:

          output:  <leftop: ident '<<' expr >

     when given a string such as:

          cout << var << "str" << 3

     would match, and `$item[1]' would be set to:

          [ 'cout', 'var', '"str"', '3' ]

     In other words:

          output:  <leftop: ident '<<' expr >

     is equivalent to a left-associative operator:

          output:  ident                                  { $return = [$item[1]]       }
                |  ident '<<' expr                        { $return = [@item[1,3]]     }
                |  ident '<<' expr '<<' expr              { $return = [@item[1,3,5]]   }
                |  ident '<<' expr '<<' expr '<<' expr    { $return = [@item[1,3,5,7]] }
                #  ...etc...

     Similarly, the `<rightop:...>' directive takes a left operand, an
     operator, and a right operand:

          assign:  <rightop: var '=' expr >

     and converts them to:

          assign:  ( (var '=' {$return=$item[1]})(s?) expr
                                  { $return = [ @{$item[1]}, $item[2] ] } )

     which is equivalent to a right-associative operator:

          assign:  var                            { $return = [$item[1]]       }
                |  var '=' expr                   { $return = [@item[1,3]]     }
                |  var '=' var '=' expr           { $return = [@item[1,3,5]]   }
                |  var '=' var '=' var '=' expr   { $return = [@item[1,3,5,7]] }
                #  ...etc...

     Note that for both the `<leftop:...>' and `<rightop:...>' directives,
     the directive does not normally return the operator itself, just a
     list of the operands involved. This is particularly handy for
     specifying lists:

          list: '(' <leftop: list_item ',' list_item> ')'
                          { $return = $item[2] }

     There is, however, a problem: sometimes the operator is itself
     significant.  For example, in a Perl list a comma and a `=>' are both
     valid separators, but the `=>' has additional stringification
     semantics.  Hence it's important to know which was used in each case.

     To solve this problem the `<leftop:...>' and `<rightop:...>'
     directives do return the operator(s) as well, under two circumstances.
     The first case is where the operator is specified as a subrule. In
     that instance, whatever the operator matches is returned (on the
     assumption that if the operator is important enough to have its own
     subrule, then it's important enough to return).

     The second case is where the operator is specified as a regular
     expression. In that case, if the first bracketed subpattern of the
     regular expression matches, that matching value is returned (this is
     analogous to the behaviour of the Perl split function, except that
     only the first subpattern is returned).

     In other words, given the input:

          ( a=>1, b=>2 )

     the specifications:

          list:      '('  <leftop: list_item separator list_item>  ')'

          separator: ',' | '=>'

     or:

          list:      '('  <leftop: list_item /(,|=>)/ list_item>  ')'

     cause the list separators to be interleaved with the operands in the
     anonymous array in `$item[2]':

          [ 'a', '=>', '1', ',', 'b', '=>', '2' ]

     But the following version:

          list:      '('  <leftop: list_item /,|=>/ list_item>  ')'

     returns only the operators:

          [ 'a', '1', 'b', '2' ]
          
          Of course, none of the above specifications handle the case of an empty
          list, since the C<Less_Than_Special_Sequenceleftop:...Greater_Than_Special_Sequence> and C<Less_Than_Special_Sequencerightop:...Greater_Than_Special_Sequence> directives
          require at least a single right or left operand to match. To specify
          that the operator can match "trivially",
          it's necessary to add a C<(?)> qualifier to the directive:

          list:      '('  <leftop: list_item /(,|=>)/ list_item>(?)  ')'

     Note that in almost all the above examples, the first and third
     arguments of the `<leftop:...>' directive were the same subrule. That
     is because `<leftop:...>''s are frequently used to specify
     "separated" lists of the same type of item. To make such lists easier
     to specify, the following syntax:

          list:   element(s /,/)

     is exactly equivalent to:

          list:   <leftop: element /,/ element>

     Note that the separator must be specified as a raw pattern (i.e.  not
     a string or subrule).

Scored productions
     By default, Parse::RecDescent grammar rules always accept the first
     production that matches the input. But if two or more productions may
     potentially match the same input, choosing the first that does so may
     not be optimal.

     For example, if you were parsing the sentence "time flies like an
     arrow", you might use a rule like this:

          sentence: verb noun preposition article noun { [@item] }
                  | adjective noun verb article noun   { [@item] }
                  | noun verb preposition article noun { [@item] }

     Each of these productions matches the sentence, but the third one is
     the most likely interpretation. However, if the sentence had been
     "fruit flies like a banana", then the second production is probably
     the right match.

     To cater for such situtations, the `<score:...>' can be used.  The
     directive is equivalent to an unconditional `<reject>', except that
     it allows you to specify a "score" for the current production. If
     that score is numerically greater than the best score of any
     preceding production, the current production is cached for later
     consideration. If no later production matches, then the cached
     production is treated as having matched, and the value of the item
     immediately before its `<score:...>' directive is returned as the
     result.

     In other words, by putting a `<score:...>' directive at the end of
     each production, you can select which production matches using
     criteria other than specification order. For example:

          sentence: verb noun preposition article noun { [@item] } <score: sensible(@item)>
                  | adjective noun verb article noun   { [@item] } <score: sensible(@item)>
                  | noun verb preposition article noun { [@item] } <score: sensible(@item)>

     Now, when each production reaches its respective `<score:...>'
     directive, the subroutine `sensible' will be called to evaluate the
     matched items (somehow). Once all productions have been tried, the
     one which `sensible' scored most highly will be the one that is
     accepted as a match for the rule.

     The variable $score always holds the current best score of any
     production, and the variable $score_return holds the corresponding
     return value.

     As another example, the following grammar matches lines that may be
     separated by commas, colons, or semi-colons. This can be tricky if a
     colon-separated line also contains commas, or vice versa. The grammar
     resolves the ambiguity by selecting the rule that results in the
     fewest fields:

          line: seplist[sep=>',']  <score: -@{$item[1]}>
              | seplist[sep=>':']  <score: -@{$item[1]}>
              | seplist[sep=>" "]  <score: -@{$item[1]}>

          seplist: <skip:""> <leftop: /[^$arg{sep}]*/ "$arg{sep}" /[^$arg{sep}]*/>

     Note the use of negation within the `<score:...>' directive to ensure
     that the seplist with the most items gets the lowest score.

     As the above examples indicate, it is often the case that all
     productions in a rule use exactly the same `<score:...>' directive.
     It is tedious to have to repeat this identical directive in every
     production, so Parse::RecDescent also provides the `<autoscore:...>'
     directive.

     If an `<autoscore:...>' directive appears in any production of a
     rule, the code it specifies is used as the scoring code for every
     production of that rule, except productions that already end with an
     explicit `<score:...>' directive. Thus the rules above could be
     rewritten:

          line: <autoscore: -@{$item[1]}>
          line: seplist[sep=>',']
              | seplist[sep=>':']
              | seplist[sep=>" "]

          sentence: <autoscore: sensible(@item)>
                  | verb noun preposition article noun { [@item] }
                  | adjective noun verb article noun   { [@item] }
                  | noun verb preposition article noun { [@item] }

     Note that the `<autoscore:...>' directive itself acts as an
     unconditional `<reject>', and (like the `<rulevar:...>' directive) is
     pruned at compile-time wherever possible.

Dispensing with grammar checks
     During the compilation phase of parser construction,
     Parse::RecDescent performs a small number of checks on the grammar
     it's given. Specifically it checks that the grammar is not
     left-recursive, that there are no "insatiable" constructs of the form:

          rule: subrule(s) subrule

     and that there are no rules missing (i.e. referred to, but never
     defined).

     These checks are important during development, but can slow down
     parser construction in stable code. So Parse::RecDescent provides the
     <nocheck> directive to turn them off. The directive can only appear
     before the first rule definition, and switches off checking
     throughout the rest of the current grammar.

     Typically, this directive would be added when a parser has been
     thoroughly tested and is ready for release.

Subrule argument lists
----------------------

   It is occasionally useful to pass data to a subrule which is being
invoked. For example, consider the following grammar fragment:

     classdecl: keyword decl

     keyword:   'struct' | 'class';

     decl:      # WHATEVER

   The `decl' rule might wish to know which of the two keywords was used
(since it may affect some aspect of the way the subsequent declaration is
interpreted). Parse::RecDescent allows the grammar designer to pass data
into a rule, by placing that data in an *argument list* (that is, in
square brackets) immediately after any subrule item in a production.
Hence, we could pass the keyword to `decl' as follows:

     classdecl: keyword decl[ $item[1] ]

     keyword:   'struct' | 'class';

     decl:      # WHATEVER

   The argument list can consist of any number (including zero!) of
comma-separated Perl expressions. In other words, it looks exactly like a
Perl anonymous array reference. For example, we could pass the keyword,
the name of the surrounding rule, and the literal 'keyword' to `decl' like
so:

     classdecl: keyword decl[$item[1],$item[0],'keyword']

     keyword:   'struct' | 'class';

     decl:      # WHATEVER

   Within the rule to which the data is passed (`decl' in the above
examples) that data is available as the elements of a local variable
`@arg'. Hence `decl' might report its intentions as follows:

     classdecl: keyword decl[$item[1],$item[0],'keyword']

     keyword:   'struct' | 'class';

     decl:      { print "Declaring $arg[0] (a $arg[2])\n";
                  print "(this rule called by $arg[1])" }

   Subrule argument lists can also be interpreted as hashes, simply by
using the local variable `%arg' instead of `@arg'. Hence we could rewrite
the previous example:

     classdecl: keyword decl[keyword => $item[1],
                             caller  => $item[0],
                             type    => 'keyword']

     keyword:   'struct' | 'class';

     decl:      { print "Declaring $arg{keyword} (a $arg{type})\n";
                  print "(this rule called by $arg{caller})" }

   Both `@arg' and `%arg' are always available, so the grammar designer may
choose whichever convention (or combination of conventions) suits best.

   Subrule argument lists are also useful for creating "rule templates"
(especially when used in conjunction with the `<matchrule:...>'
directive). For example, the subrule:

     list:     <matchrule:$arg{rule}> /$arg{sep}/ list[%arg]
                     { $return = [ $item[1], @{$item[3]} ] }
         |     <matchrule:$arg{rule}>
                     { $return = [ $item[1]] }

   is a handy template for the common problem of matching a separated list.
For example:

     function: 'func' name '(' list[rule=>'param',sep=>';'] ')'

     param:    list[rule=>'name',sep=>','] ':' typename

     name:     /\w+/

     typename: name

   When a subrule argument list is used with a repeated subrule, the
argument list goes before the repetition specifier:

     list:   /some|many/ thing[ $item[1] ](s)

   The argument list is "late bound". That is, it is re-evaluated for every
repetition of the repeated subrule.  This means that each repeated attempt
to match the subrule may be passed a completely different set of arguments
if the value of the expression in the argument list changes between
attempts. So, for example, the grammar:

     { $::species = 'dogs' }

     pair:   'two' animal[$::species](s)

     animal: /$arg[0]/ { $::species = 'cats' }

   will match the string "two dogs cats cats" completely, whereas it will
only match the string "two dogs dogs dogs" up to the eighth letter. If the
value of the argument list were "early bound" (that is, evaluated only the
first time a repeated subrule match is attempted), one would expect the
matching behaviours to be reversed.

   Of course, it is possible to effectively "early bind" such argument
lists by passing them a value which does not change on each repetition.
For example:

     { $::species = 'dogs' }

     pair:   'two' { $::species } animal[$item[2]](s)

     animal: /$arg[0]/ { $::species = 'cats' }

   Arguments can also be passed to the start rule, simply by appending them
to the argument list with which the start rule is called (after the "line
number" parameter). For example, given:

     $parser = new Parse::RecDescent ( $grammar );

     $parser->data($text, 1, "str", 2, \@arr);

     #             ^^^^^  ^  ^^^^^^^^^^^^^^^
     #               |    |         |
     # TEXT TO BE PARSED  |         |
     # STARTING LINE NUMBER         |
     # ELEMENTS OF @arg WHICH IS PASSED TO RULE data

   then within the productions of the rule data, the array `@arg' will
contain `("str", 2, \@arr)'.

Alternations
------------

   Alternations are implicit (unnamed) rules defined as part of a
production. An alternation is defined as a series of '|'-separated
productions inside a pair of round brackets. For example:

     character: 'the' ( good | bad | ugly ) /dude/

   Every alternation implicitly defines a new subrule, whose
automatically-generated name indicates its origin:
"_alternation_<I>_of_production_<P>_of_rule<R>" for the appropriate values
of <I>, <P>, and <R>. A call to this implicit subrule is then inserted in
place of the brackets. Hence the above example is merely a convenient
short-hand for:

     character: 'the'
                _alternation_1_of_production_1_of_rule_character
                /dude/

     _alternation_1_of_production_1_of_rule_character:
                good | bad | ugly

   Since alternations are parsed by recursively calling the parser
generator, any type(s) of item can appear in an alternation. For example:

     character: 'the' ( 'high' "plains"      # Silent, with poncho
                      | /no[- ]name/         # Silent, no poncho
                      | vengeance_seeking    # Poncho-optional
                      | <error>
                      ) drifter

   In this case, if an error occurred, the automatically generated message
would be:

     ERROR (line <N>): Invalid implicit subrule: Expected
                       'high' or /no[- ]name/ or generic,
                       but found "pacifist" instead

   Since every alternation actually has a name, it's even possible to
extend or replace them:

     parser->Replace(
             "_alternation_1_of_production_1_of_rule_character:
                     'generic Eastwood'"
                     );

   More importantly, since alternations are a form of subrule, they can be
given repetition specifiers:

     character: 'the' ( good | bad | ugly )(?) /dude/

Incremental Parsing
-------------------

   Parse::RecDescent provides two methods - `Extend' and Replace - which
can be used to alter the grammar matched by a parser. Both methods take
the same argument as `Parse::RecDescent::new', namely a grammar
specification string

   `Parse::RecDescent::Extend' interprets the grammar specification and
adds any productions it finds to the end of the rules for which they are
specified. For example:

     $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
     parser->Extend($add);

   adds two productions to the rule "name" (creating it if necessary) and
one production to the rule "desc".

   `Parse::RecDescent::Replace' is identical, except that it first resets
are rule specified in the additional grammar, removing any existing
productions.  Hence after:

     $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";
     parser->Replace($add);

   are are *only* valid "name"s and the one possible description.

   A more interesting use of the `Extend' and Replace methods is to call
them inside the action of an executing parser. For example:

     typedef: 'typedef' type_name identifier ';'
                    { $thisparser->Extend("type_name: '$item[3]'") }
            | <error>

     identifier: ...!type_name /[A-Za-z_]w*/

   which automatically prevents type names from being typedef'd, or:

     command: 'map' key_name 'to' abort_key
                    { $thisparser->Replace("abort_key: '$item[2]'") }
            | 'map' key_name 'to' key_name
                    { map_key($item[2],$item[4]) }
            | abort_key
                    { exit if confirm("abort?") }

     abort_key: 'q'

     key_name: ...!abort_key /[A-Za-z]/

   which allows the user to change the abort key binding, but not to
unbind it.

   The careful use of such constructs makes it possible to reconfigure a a
running parser, eliminating the need for semantic feedback by providing
syntactic feedback instead. However, as currently implemented, `Replace()'
and `Extend()' have to regenerate and re-eval the entire parser whenever
they are called. This makes them quite slow for large grammars.

   In such cases, the judicious use of an interpolated regex is likely to
be far more efficient:

     typedef: 'typedef' type_name/ identifier ';'
                    { $thisparser->{local}{type_name} .= "|$item[3]" }
            | <error>

     identifier: ...!type_name /[A-Za-z_]w*/

     type_name: /$thisparser->{local}{type_name}/

Precompiling parsers
--------------------

   Normally Parse::RecDescent builds a parser from a grammar at run-time.
That approach simplifies the design and implementation of parsing code,
but has the disadvantage that it slows the parsing process down - you have
to wait for Parse::RecDescent to build the parser every time the program
runs. Long or complex grammars can be particularly slow to build, leading
to unacceptable delays at start-up.

   To overcome this, the module provides a way of "pre-building" a parser
object and saving it in a separate module. That module can then be used to
create clones of the original parser.

   A grammar may be precompiled using the `Precompile' class method.  For
example, to precompile a grammar stored in the scalar $grammar, and
produce a class named PreGrammar in a module file named PreGrammar.pm, you
could use:

     use Parse::RecDescent;

     Parse::RecDescent->Precompile($grammar, "PreGrammar");

   The first argument is the grammar string, the second is the name of the
class to be built. The name of the module file is generated automatically
by appending ".pm" to the last element of the class name. Thus

     Parse::RecDescent->Precompile($grammar, "My::New::Parser");

   would produce a module file named Parser.pm.

   It is somewhat tedious to have to write a small Perl program just to
generate a precompiled grammar class, so Parse::RecDescent has some special
magic that allows you to do the job directly from the command-line.

   If your grammar is specified in a file named grammar, you can generate
a class named Yet::Another::Grammar like so:

     > perl -MParse::RecDescent - grammar Yet::Another::Grammar

   This would produce a file named `Grammar.pm' containing the full
definition of a class called Yet::Another::Grammar. Of course, to use that
class, you would need to put the `Grammar.pm' file in a directory named
`Yet/Another', somewhere in your Perl include path.

   Having created the new class, it's very easy to use it to build a
parser. You simply use the new module, and then call its new method to
create a parser object. For example:

     use Yet::Another::Grammar;
     my $parser = Yet::Another::Grammar->new();

   The effect of these two lines is exactly the same as:

     use Parse::RecDescent;

     open GRAMMAR_FILE, "grammar" or die;
     local $/;
     my $grammar = <GRAMMAR_FILE>;

     my $parser = Parse::RecDescent->new($grammar);

   only considerably faster.

   Note however that the parsers produced by either approach are exactly
the same, so whilst precompilation has an effect on *set-up* speed, it has
no effect on *parsing* speed. RecDescent 2.0 will address that problem.

A Metagrammar for Parse::RecDescent
-----------------------------------

   The following is a specification of grammar format accepted by
`Parse::RecDescent::new' (specified in the Parse::RecDescent grammar
format!):

     grammar    : components(s)

     component  : rule | comment

     rule       : "\n" identifier ":" production(s?)

     production : items(s)

     item       : lookahead(?) simpleitem
                | directive
                | comment

     lookahead  : '...' | '...!'                   # +'ve or -'ve lookahead
     
     simpleitem : subrule args(?)                  # match another rule
                | repetition                       # match repeated subrules
                | terminal                         # match the next input
                | bracket args(?)                  # match alternative items
                | action                           # do something

     subrule    : identifier                       # the name of the rule

     args       : {extract_codeblock($text,'[]')}  # just like a [...] array ref

     repetition : subrule args(?) howoften

     howoften   : '(?)'                            # 0 or 1 times
                | '(s?)'                           # 0 or more times
                | '(s)'                            # 1 or more times
                | /(\d+)[.][.](/\d+)/              # $1 to $2 times
                | /[.][.](/\d*)/                   # at most $1 times
                | /(\d*)[.][.])/                   # at least $1 times

     terminal   : /[/]([\][/]|[^/])*[/]/           # interpolated pattern
                | /"([\]"|[^"])*"/                 # interpolated literal
                | /'([\]'|[^'])*'/                 # uninterpolated literal

     action     : { extract_codeblock($text) }     # embedded Perl code

     bracket    : '(' Item(s) production(s?) ')'   # alternative subrules

     directive  : '<commit>'                       # commit to production
                | '<uncommit>'                     # cancel commitment
                | '<resync>'                       # skip to newline
                | '<resync:' pattern '>'           # skip <pattern>
                | '<reject>'                       # fail this production
                | '<reject:' condition '>'         # fail if <condition>
                | '<error>'                        # report an error
                | '<error:' string '>'             # report error as "<string>"
                | '<error?>'                       # error only if committed
                | '<error?:' string '>'            #   "    "    "    "
                | '<rulevar:' /[^>]+/ '>'          # define rule-local variable
                | '<matchrule:' string '>'         # invoke rule named in string

     identifier : /[a-z]\w*/i                      # must start with alpha

     comment    : /#[^\n]*/                        # same as Perl

     pattern    : {extract_bracketed($text,'<')}   # allow embedded "<..>"

     condition  : {extract_codeblock($text,'{<')}  # full Perl expression

     string     : {extract_variable($text)}        # any Perl variable
                | {extract_quotelike($text)}       #   or quotelike string
                | {extract_bracketed($text,'<')}   #   or balanced brackets

GOTCHAS
=======

   This section describes common mistakes that grammar writers seem to
make on a regular basis.

1. Expecting an error to always invalidate a parse
--------------------------------------------------

   A common mistake when using error messages is to write the grammar like
this:

     file: line(s)

     line: line_type_1
         | line_type_2
         | line_type_3
         | <error>

   The expectation seems to be that any line that is not of type 1, 2 or 3
will invoke the `<error>' directive and thereby cause the parse to fail.

   Unfortunately, that only happens if the error occurs in the very first
line.  The first rule states that a file is matched by one or more lines,
so if even a single line succeeds, the first rule is completely satisfied
and the parse as a whole succeeds. That means that any error messages
generated by subsequent failures in the line rule are quietly ignored.

   Typically what's really needed is this:

     file: line(s) eofile    { $return = $item[1] }

     line: line_type_1
         | line_type_2
         | line_type_3
         | <error>

     eofile: /^\Z/

   The addition of the `eofile' subrule  to the first production means that
a file only matches a series of successful line matches *that consume the
complete input text*. If any input text remains after the lines are
matched, there must have been an error in the last line. In that case the
`eofile' rule will fail, causing the entire file rule to fail too.

   Note too that `eofile' must match `/^\Z/' (end-of-text), not `/^\cZ/'
or `/^\cD/' (end-of-file).

   And don't forget the action at the end of the production. If you just
write:

     file: line(s) eofile

   then the value returned by the file rule will be the value of its last
item: `eofile'. Since `eofile' always returns an empty string on success,
that will cause the file rule to return that empty string. Apart from
returning the wrong value, returning an empty string will trip up code
such as:

     $parser->file($filetext) || die;

   (since "" is false).

   Remember that Parse::RecDescent returns undef on failure, so the only
safe test for failure is:

     defined($parser->file($filetext)) || die;

DIAGNOSTICS
===========

   Diagnostics are intended to be self-explanatory (particularly if you
use *-RD_HINT* (under *perl -s*) or define `$::RD_HINT' inside the
program).

   Parse::RecDescent currently diagnoses the following:

   * Invalid regular expressions used as pattern terminals (fatal error).

   * Invalid Perl code in code blocks (fatal error).

   * Lookahead used in the wrong place or in a nonsensical way (fatal
     error).

   * "Obvious" cases of left-recursion (fatal error).

   * Missing or extra components in a `<leftop>' or `<rightop>' directive.

   * Unrecognisable components in the grammar specification (fatal error).

   * "Orphaned" rule components specified before the first rule (fatal
     error) or after an `<error>' directive (level 3 warning).

   * Missing rule definitions (this only generates a level 3 warning,
     since you may be providing them later via
     `Parse::RecDescent::Extend()').

   * Instances where greedy repetition behaviour will almost certainly
     cause the failure of a production (a level 3 warning - see `"ON-GOING
     ISSUES AND FUTURE DIRECTIONS"' in this node below).

   * Attempts to define rules named 'Replace' or 'Extend', which cannot be
     called directly through the parser object because of the predefined
     meaning of `Parse::RecDescent::Replace' and
     `Parse::RecDescent::Extend'. (Only a level 2 warning is generated,
     since such rules can still be used as subrules).

   * Productions which consist of a single `<error?>' directive, and which
     therefore may succeed unexpectedly (a level 2 warning, since this
     might conceivably be the desired effect).

   * Multiple consecutive lookahead specifiers (a level 1 warning only,
     since their effects simply accumulate).

   * Productions which start with a `<reject>' or `<rulevar:...>'
     directive. Such productions are optimized away (a level 1 warning).

   * Rules which are autogenerated under `$::AUTOSTUB' (a level 1 warning).

AUTHOR
======

   Damian Conway (damian@conway.org)

BUGS AND IRRITATIONS
====================

   There are undoubtedly serious bugs lurking somewhere in this much code
:-) Bug reports and other feedback are most welcome.

   Ongoing annoyances include:

   * There's no support for parsing directly from an input stream.  If and
     when the Perl Gods give us regular expressions on streams, this
     should be trivial (ahem!) to implement.

   * The parser generator can get confused if actions aren't properly
     closed or if they contain particularly nasty Perl syntax errors
     (especially unmatched curly brackets).

   * The generator only detects the most obvious form of left recursion
     (potential recursion on the first subrule in a rule). More subtle
     forms of left recursion (for example, through the second item in a
     rule after a "zero" match of a preceding "zero-or-more" repetition,
     or after a match of a subrule with an empty production) are not found.

   * Instead of complaining about left-recursion, the generator should
     silently transform the grammar to remove it. Don't expect this
     feature any time soon as it would require a more sophisticated
     approach to parser generation than is currently used.

   * The generated parsers don't always run as fast as might be wished.

   * The meta-parser should be bootstrapped using Parse::RecDescent :-)

ON-GOING ISSUES AND FUTURE DIRECTIONS
=====================================

  1. Repetitions are "incorrigibly greedy" in that they will eat
     everything they can and won't backtrack if that behaviour causes a
     production to fail needlessly.  So, for example:

          rule: subrule(s) subrule

     will never succeed, because the repetition will eat all the subrules
     it finds, leaving none to match the second item. Such constructions
     are relatively rare (and `Parse::RecDescent::new' generates a warning
     whenever they occur) so this may not be a problem, especially since
     the insatiable behaviour can be overcome "manually" by writing:

          rule: penultimate_subrule(s) subrule

          penultimate_subrule: subrule ...subrule

     The issue is that this construction is exactly twice as expensive as
     the original, whereas backtracking would add only 1/N to the cost (for
     matching N repetitions of `subrule'). I would welcome feedback on the
     need for backtracking; particularly on cases where the lack of it
     makes parsing performance problematical.

  2. Having opened that can of worms, it's also necessary to consider
     whether there is a need for non-greedy repetition specifiers. Again,
     it's possible (at some cost) to manually provide the required
     functionality:

          rule: nongreedy_subrule(s) othersubrule

          nongreedy_subrule: subrule ...!othersubrule

     Overall, the issue is whether the benefit of this extra functionality
     outweighs the drawbacks of further complicating the (currently
     minimalist) grammar specification syntax, and (worse) introducing
     more overhead into the generated parsers.

  3. An `<autocommit>' directive would be nice. That is, it would be
     useful to be able to say:

          command: <autocommit>
          command: 'find' name
                 | 'find' address
                 | 'do' command 'at' time 'if' condition
                 | 'do' command 'at' time
                 | 'do' command
                 | unusual_command

     and have the generator work out that this should be "pruned" thus:

          command: 'find' name
                 | 'find' <commit> address
                 | 'do' <commit> command <uncommit>
                          'at' time
                          'if' <commit> condition
                 | 'do' <commit> command <uncommit>
                          'at' <commit> time
                 | 'do' <commit> command
                 | unusual_command

     There are several issues here. Firstly, should the `<autocommit>'
     automatically install an `<uncommit>' at the start of the last
     production (on the grounds that the "command" rule doesn't know
     whether an "unusual_command" might start with "find" or "do") or
     should the "unusual_command" subgraph be analysed (to see if it
     *might* be viable after a "find" or "do")?

     The second issue is how regular expressions should be treated. The
     simplest approach would be simply to uncommit before them (on the
     grounds that they *might* match). Better efficiency would be obtained
     by analyzing all preceding literal tokens to determine whether the
     pattern would match them.

     Overall, the issues are: can such automated "pruning" approach a
     hand-tuned version sufficiently closely to warrant the extra set-up
     expense, and (more importantly) is the problem important enough to
     even warrant the non-trivial effort of building an automated solution?


COPYRIGHT
=========

   Copyright (c) 1997-2000, Damian Conway. All Rights Reserved.  This
module is free software. It may be used, redistributed and/or modified
under the terms of the Perl Artistic License   (see
http://www.perl.com/perl/misc/Artistic.html)


