This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.

File: pm.info, Node: Palm/PunchClock, Next: Palm/Raw, Prev: Palm/PDB, Up: Module List

Perl extension for parsing PunchClock pdb files
***********************************************

NAME
====

Palm::PunchClock - Perl extension for parsing PunchClock pdb files

SYNOPSIS
========

     use Palm::PDB;
     use Palm::PunchClock;

     $pdb = new Palm::PDB;
     $pdb->Load("PC_Div-PClk.PDB");

DESCRIPTION
===========

The Palm::PunchClock module attempts to parse PunchClock pdb files.
PunchClock is a time-management program for PalmOS written by Psync,
Inc.

BUGS
====

Since this module was written in a few hours with no knowledge of
PunchClock's internal format, I have only guessed at the format; thus it
only parses the most vital data.  Categories and such are ignored :-)

AUTHOR
======

Peder Stray

PunchClock is written by Psync, Inc.  http://www.psync.com/

SEE ALSO
========

perl(1), Palm::PDB(3).

File: pm.info, Node: Palm/Raw, Next: Palm/StdAppInfo, Prev: Palm/PunchClock, Up: Module List

Handler for "raw" Palm databases.
*********************************

NAME
====

Palm::Raw - Handler for "raw" Palm databases.

SYNOPSIS
========

     use Palm::Raw;

For standalone programs.

     use Palm::Raw();
     @ISA = qw( Palm::Raw );

For Palm::PDB helper modules.

DESCRIPTION
===========

The Raw PDB handler is a helper class for the Palm::PDB package.  It is
intended as a generic handler for any database, or as a fallback default
handler.

If you have a standalone program and want it to be able to parse any
type of database, use

     use Palm::Raw;

If you are using Palm::Raw as a parent class for your own database
handler, use

     use Palm::Raw();

If you omit the parentheses, Palm::Raw will register itself as the
default handler for all databases, which is probably not what you want.

The Raw handler does no processing on the database whatsoever.
The AppInfo block, sort block, records and resources are simply strings, raw data from the database. By default, the Raw handler only handles record databases (.pdb files). If you want it to handle resource databases (.prc files) as well, you need to call &Palm::PDB::RegisterPRCHandlers("Palm::Raw", ""); in your script. AppInfo block ------------- $pdb->{appinfo} This is a scalar, the raw data of the AppInfo block. Sort block ---------- $pdb->{sort} This is a scalar, the raw data of the sort block. Records ------- @{$pdb->{records}}; Each element in the "records" array is a scalar, the raw data of that record. Resources --------- @{$pdb->{resources}}; Each element in the "resources" array is a scalar, the raw data of that resource. AUTHOR ====== Andrew Arensburger SEE ALSO ======== Palm::PDB(3)  File: pm.info, Node: Palm/StdAppInfo, Next: Palm/ToDo, Prev: Palm/Raw, Up: Module List Handles standard AppInfo block ****************************** NAME ==== Palm::StdAppInfo - Handles standard AppInfo block SYNOPSIS ======== package MyPDBHandler; use Palm::StdAppInfo(); @ISA = qw( Palm::StdAppInfo ); DESCRIPTION =========== Many Palm applications use a common format for keeping track of categories. The `Palm::StdAppInfo' class deals with this common format. A standard AppInfo block begins with: short renamed; // Bitmap of renamed category names char labels[16][16]; // Array of category names char uniqueIDs[16]; // Category IDs char lastUniqueID; char padding; // For word alignment FUNCTIONS ========= seed_StdAppInfo --------------- &Palm::StdAppInfo::seed_StdAppInfo(\%appinfo); Creates the standard fields in an existing AppInfo hash. newStdAppInfo ------------- $appinfo = Palm::StdAppInfo->newStdAppInfo; Like seed_StdAppInfo, but creates the AppInfo hash and returns it. new --- $pdb = new Palm::StdAppInfo; Create a new PDB, initialized with nothing but a standard AppInfo block. There are very few reasons to use this, and even fewer good ones. 
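The fixed C layout above maps directly onto Perl's pack/unpack templates.  The following is a minimal core-Perl sketch of decoding such a block, roughly what `parse_StdAppInfo()' (described next) does internally; the sample category names and the `%appinfo' hash layout here are illustrative assumptions, not the module's exact structure:

```perl
use strict;
use warnings;

# Build a sample standard AppInfo block: a 2-byte "renamed" bitmap,
# 16 category labels of 16 bytes each, 16 one-byte category IDs,
# a lastUniqueID byte, and a padding byte (276 bytes total).
my @labels = ('Unfiled', 'Business', 'Personal', ('') x 13);
my $data = pack 'n (a16)16 C16 C C',
    0,          # renamed bitmap
    @labels,    # category names, NUL-padded to 16 bytes each
    0 .. 15,    # unique category IDs
    15,         # lastUniqueID
    0;          # padding

# Decode it again; Z16 strips the trailing NULs from each label.
# (The hash layout below is an illustrative assumption.)
my ($renamed, @rest) = unpack 'n (Z16)16 C16 C C', $data;
my %appinfo = (
    renamed      => $renamed,
    categories   => [ @rest[0 .. 15] ],
    uniqueIDs    => [ @rest[16 .. 31] ],
    lastUniqueID => $rest[32],
);

print "first category: $appinfo{categories}[0]\n";
print "block length: ", length($data), "\n";   # 2 + 256 + 16 + 1 + 1 = 276
```

The `(a16)16' group syntax requires Perl 5.8 or later.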
parse_StdAppInfo ---------------- $len = &Palm::StdAppInfo::parse_StdAppInfo(\%appinfo, $data); This function is intended to be called from within a PDB helper class's ParseAppInfoBlock method. `parse_StdAppInfo()' parses a standard AppInfo block from the raw data $data and fills in the fields in `%appinfo'. It returns the number of bytes parsed. ParseAppInfoBlock ----------------- $pdb = new Palm::StdAppInfo; $pdb->ParseAppInfoBlock($data); If your application's AppInfo block contains standard category support and nothing else, you may choose to just inherit this method instead of writing your own ParseAppInfoBlock method. pack_StdAppInfo --------------- $data = &Palm::StdAppInfo::pack_StdAppInfo(\%appinfo); This function is intended to be called from within a PDB helper class's PackAppInfoBlock method. pack_StdAppInfo takes an AppInfo hash and packs it as a string of raw data that can be written to a PDB. PackAppInfoBlock ---------------- $pdb = new Palm::StdAppInfo; $data = $pdb->PackAppInfoBlock(); If your application's AppInfo block contains standard category support and nothing else, you may choose to just inherit this method instead of writing your own PackAppInfoBlock method. AUTHOR ====== Andrew Arensburger SEE ALSO ======== Palm::PDB(3)  File: pm.info, Node: Palm/ToDo, Next: Parallel/ForkManager, Prev: Palm/StdAppInfo, Up: Module List Handler for Palm ToDo databases. ******************************** NAME ==== Palm::ToDo - Handler for Palm ToDo databases. SYNOPSIS ======== use Palm::ToDo; DESCRIPTION =========== The ToDo PDB handler is a helper class for the Palm::PDB package. It parses ToDo databases. AppInfo block ------------- The AppInfo block begins with standard category support. See *Note Palm/StdAppInfo: Palm/StdAppInfo, for details. Other fields include: $pdb->{appinfo}{dirty_appinfo} $pdb->{appinfo}{sortOrder} I don't know what these are. Sort block ---------- $pdb->{sort} This is a scalar, the raw data of the sort block. 
Records ------- $record = $pdb->{records}[N] $record->{due_day} $record->{due_month} $record->{due_year} The due date of the ToDo item. If the item has no due date, these are undefined. $record->{completed} This is defined and true iff the item has been completed. $record->{priority} An integer. The priority of the item. $record->{description} A text string. The description of the item. $record->{note} A text string. The note attached to the item. Undefined if the item has no note. new --- $pdb = new Palm::ToDo; Create a new PDB, initialized with the various Palm::ToDo fields and an empty record list. Use this method if you're creating a ToDo PDB from scratch. new_Record ---------- $record = $pdb->new_Record; Creates a new ToDo record, with blank values for all of the fields. AUTHOR ====== Andrew Arensburger SEE ALSO ======== Palm::PDB(3) Palm::StdAppInfo(3)  File: pm.info, Node: Parallel/ForkManager, Next: Parallel/MPI, Prev: Palm/ToDo, Up: Module List A simple parallel processing fork manager ***************************************** NAME ==== Parallel::ForkManager - A simple parallel processing fork manager SYNOPSIS ======== use Parallel::ForkManager; $pm = new Parallel::ForkManager($MAX_PROCESSES); foreach $data (@all_data) { # Forks and returns the pid for the child: my $pid = $pm->start and next; ... do some work with $data in the child process ... $pm->finish; # Terminates the child process } DESCRIPTION =========== This module is intended for use in operations that can be done in parallel where the number of processes to be forked off should be limited. Typical use is a downloader which will be retrieving hundreds/thousands of files. The code for a downloader would look something like this: use LWP::Simple; use Parallel::ForkManager; ... @links=( ["http://www.foo.bar/rulez.data","rulez_data.txt"], ["http://new.host/more_data.doc","more_data.doc"], ... ); ... 
     # Max 30 processes for parallel download
     my $pm = new Parallel::ForkManager(30);

     foreach my $linkarray (@links) {
       $pm->start and next;      # do the fork

       my ($link, $fn) = @$linkarray;
       warn "Cannot get $fn from $link"
         if getstore($link, $fn) != RC_OK;

       $pm->finish;              # do the exit in the child process
     }
     $pm->wait_all_childs;

First you need to instantiate the ForkManager with the "new"
constructor.  You must specify the maximum number of processes to be
created.  If you specify 0, then NO fork will be done; this is good for
debugging purposes.

Next, use $pm->start to do the fork.  $pm returns 0 for the child
process, and the child pid for the parent process (see also
`perlfunc(1p)' in this node).  The "and next" skips the internal loop in
the parent process.  NOTE: $pm->start dies if the fork fails.

$pm->finish terminates the child process (assuming a fork was done in
the "start").

NOTE: You cannot use $pm->start if you are already in the child process.
If you want to manage another set of subprocesses in the child process,
you must instantiate another Parallel::ForkManager object!

METHODS
=======

new $processes

     Instantiate a new Parallel::ForkManager object.  You must specify
     the maximum number of children to fork off.  If you specify 0
     (zero), then no children will be forked.  This is intended for
     debugging purposes.

start

     This method does the fork.  It returns the pid of the child process
     for the parent, and 0 for the child process.  If the $processes
     parameter for the constructor is 0 then, assuming you're in the
     child process, $pm->start simply returns 0.

finish

     Closes the child process by exiting.  If you use the program in
     debug mode ($processes == 0), this method doesn't do anything.

wait_all_childs

     You can call this method to wait for all the processes which have
     been forked.  This is a blocking wait.

EXPERIMENTAL FEATURES
=====================

There are callbacks in the code, which can be called on events like
starting a process or on finish.
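The start/finish/wait_all_childs cycle described above boils down to capping the number of live children and reaping them with waitpid.  A minimal core-Perl sketch of that pattern (the cap of 3 and the fake task list are made-up illustration values, and this is not the module's actual code):

```perl
use strict;
use warnings;
use POSIX ();

# Cap the number of live children, reaping one before each new fork --
# the pattern Parallel::ForkManager wraps for you.
my $max     = 3;
my $running = 0;

for my $task (1 .. 6) {
    if ($running >= $max) {          # at the cap: wait for one child
        waitpid(-1, 0);
        $running--;
    }
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid == 0) {                 # child: "do the work", then exit
        POSIX::_exit(0);             # _exit avoids flushing shared buffers
    }
    $running++;                      # parent: one more child in flight
}

# The equivalent of wait_all_childs: block until every child is reaped.
while ($running > 0) {
    waitpid(-1, 0);
    $running--;
}
print "all children reaped\n";
```

At most three children exist at any moment, which is exactly the throttling the constructor argument provides.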
This code is not tested at all, hence the lack of documentation.  If you
want to try these features, please look at the code and test them.  Feel
free to send me patches if you find something wrong.

COPYRIGHT
=========

Copyright (c) 2000 Szabó, Balázs (dLux)

All rights reserved.  This program is free software; you can
redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR
======

dLux (Szabó, Balázs)

Noah Robin (documentation tweaks)

File: pm.info, Node: Parallel/MPI, Next: Params/Validate, Prev: Parallel/ForkManager, Up: Module List

Perl interface to the MPI message passing system
************************************************

NAME
====

Parallel::MPI - Perl interface to the MPI message passing system

SYNOPSIS
========

     use Parallel::MPI;

     MPI_Init();
     .
     .
     .
     MPI_Finalize();

DESCRIPTION
===========

The following is a summary of the available constants and functions:

Error Handling
==============

If an MPI error occurs, the following are set:

     $Parallel::MPI::errno
     $Parallel::MPI::errstr

$Parallel::MPI::exceptions: if set, toss an exception when an error
occurs.

Exported constants
==================

Datatypes (not all are supported!)
     MPI_2COMPLEX          MPI_2DOUBLE_COMPLEX   MPI_2DOUBLE_PRECISION
     MPI_2INT              MPI_2INTEGER          MPI_2REAL
     MPI_COMPLEX           MPI_DATATYPE_NULL     MPI_DOUBLE
     MPI_DOUBLE_COMPLEX    MPI_DOUBLE_INT        MPI_DOUBLE_PRECISION
     MPI_FLOAT             MPI_FLOAT_INT         MPI_INT
     MPI_INTEGER           MPI_BYTE              MPI_CHAR
     MPI_CHARACTER         MPI_LOGICAL           MPI_LONG
     MPI_LONG_DOUBLE       MPI_LONG_DOUBLE_INT   MPI_LONG_INT
     MPI_LONG_LONG_INT     MPI_REAL              MPI_SHORT
     MPI_SHORT_INT         MPI_UNSIGNED          MPI_UNSIGNED_CHAR
     MPI_UNSIGNED_LONG     MPI_UNSIGNED_SHORT

New Datatypes

     MPI_STRING

Status

     MPI_ANY_SOURCE
     MPI_ANY_TAG

Operations

     MPI_BAND      MPI_BOR       MPI_BXOR
     MPI_LAND      MPI_LOR       MPI_LXOR
     MPI_MAX       MPI_MAXLOC    MPI_MIN
     MPI_MINLOC    MPI_OP_NULL   MPI_PROD
     MPI_SUM

Communicators

     MPI_COMM_NULL
     MPI_COMM_SELF
     MPI_COMM_WORLD

Communicator and Group Comparisons

     MPI_CONGRUENT
     MPI_IDENT
     MPI_SIMILAR
     MPI_UNEQUAL

     MPI_VERSION

Exported functions
==================

     MPI_Init()
     MPI_Finalize()
     MPI_Initialized()
     MPI_Comm_rank(communicator)
     MPI_Comm_size(communicator)
     MPI_Send(\$message, length, datatype, destination, tag, communicator)
     MPI_Recv(\$message, length, datatype, source, tag, communicator)
     MPI_Sendrecv(\$message, length, datatype, destination, tag, communicator)
     MPI_Barrier(comm)
     MPI_Bcast(\$from, count, datatype, root, communicator)
     MPI_Wtime()
     MPI_Wtick()
     MPI_Abort(communicator, errorcode)
     MPI_Reduce(\$from, \$to, count, datatype, operation, root, communicator)
     MPI_Allreduce(\$from, \$to, count, datatype, operation, communicator)
     MPI_Scatter(\$from, count, type, \$to, count, type, root, communicator)
     MPI_Gather(\$from, count, type, \$to, count, type, root, communicator)

AUTHORS
=======

Josh Wilmes and Chris Stevens

SEE ALSO
========

MPI man pages.
The paper, "Parallel::MPI - An MPI Binding for Perl", included in the
Parallel::MPI distribution

File: pm.info, Node: Params/Validate, Next: Parse/CLex, Prev: Parallel/MPI, Up: Module List

Validate method/function parameters
***********************************

NAME
====

Params::Validate - Validate method/function parameters

SYNOPSIS
========

     use Params::Validate qw(:all);

     # takes named params (hash or hashref)
     sub foo {
         validate( @_, {
             foo => 1,    # mandatory
             bar => 0,    # optional
         } );
     }

     # takes positional params
     sub bar {
         # first two are mandatory, third is optional
         validate_pos( @_, 1, 1, 0 );
     }

     sub foo2 {
         validate( @_, {
             foo =>  # specify a type
                 { type => ARRAYREF },
             bar =>  # specify an interface
                 { can => [ 'print', 'flush', 'frobnicate' ] },
             baz =>
                 { type => SCALAR,    # a scalar ...
                   callbacks =>       # ... that is a plain integer ...
                     { 'numbers only' => sub { shift() =~ /^\d+$/ },
                       # ... and smaller than 90
                       'less than 90' => sub { shift() < 90 },
                     },
                 }
         } );
     }

DESCRIPTION
===========

The Params::Validate module allows you to validate method or function
call parameters to an arbitrary level of specificity.  At the simplest
level, it is capable of validating that the required parameters were
given and that no unspecified additional parameters were passed in.

It is also capable of determining that a parameter is of a specific
type, that it is an object of a certain class hierarchy, that it
possesses certain methods, or of applying validation callbacks to
arguments.

EXPORT
------

The module always exports the validate and `validate_pos' methods.  In
addition, it can export the following constants, which are used as part
of the type checking.  These are `SCALAR', `ARRAYREF', `HASHREF',
`CODEREF', `GLOB', `GLOBREF', `SCALARREF', `UNDEF', `OBJECT', and
`HANDLE'.  These are explained in the section on `Type
Validation|Params::Validate' in this node.  These constants are
available via the tag `:types'.  There is also a :all tag, which for now
is equivalent to the `:types' tag.
Finally, it is possible to import the `set_options' function (see
`"GLOBAL" OPTIONS' in this node), but only by requesting it explicitly,
as it is not included in :all.  The reason for this is that this
function only needs to be called once per module and its name is
potentially common enough that exporting it without an explicit request
to do so seems bound to cause trouble.

PARAMETER VALIDATION
====================

The validation mechanisms provided by this module can handle both named
and positional parameters.  For the most part, the same features are
available for each.  The biggest difference is the way that the
validation specification is given to the relevant subroutine.  The other
difference is in the error messages produced when validation checks
fail.

When handling named parameters, the module is capable of handling either
a hash or a hash reference transparently.

Subroutines expecting named parameters should call the validate
subroutine like this:

     validate( @_, { parameter1 => validation spec,
                     parameter2 => validation spec,
                     ...
                   } );

Subroutines expecting positional parameters should call the
`validate_pos' subroutine like this:

     validate_pos( @_, { validation spec }, { validation spec } );

Mandatory/Optional Parameters
-----------------------------

If you just want to specify that some parameters are mandatory and
others are optional, this can be done very simply.

For a subroutine expecting named parameters, you would do this:

     validate( @_, { foo => 1, bar => 1, baz => 0 } );

This says that the foo and bar parameters are mandatory and that the
`baz' parameter is optional.  The presence of any other parameters will
cause an error.

For a subroutine expecting positional parameters, you would do this:

     validate_pos( @_, 1, 1, 0, 0 );

This says that you expect at least 2 and no more than 4 parameters.
If you have a subroutine that has a minimum number of parameters but can
take any maximum number, you can do this:

     validate_pos( @_, 1, 1, (0) x (@_ - 2) );

This will always be valid as long as at least two parameters are given.
A similar construct could be used for the more complex validation
parameters described further on.

Please note that this:

     validate_pos( @_, 1, 1, 0, 1, 1 );

makes absolutely no sense, so don't do it.  Any zeros must come at the
end of the validation specification.

Type Validation
---------------

This module supports the following simple types, which can be `exported
as constants|EXPORT' in this node:

   * SCALAR

     A scalar which is not a reference, such as 10 or `'hello''.  A
     parameter that is undefined is not treated as a scalar.  If you
     want to allow undefined values, you will have to specify `SCALAR |
     UNDEF'.

   * ARRAYREF

     An array reference such as `[1, 2, 3]' or `\@foo'.

   * HASHREF

     A hash reference such as `{ a => 1, b => 2 }' or `\%bar'.

   * CODEREF

     A subroutine reference such as `\&foo_sub' or `sub { print "hello"
     }'.

   * GLOB

     This one is a bit tricky.  A glob would be something like `*FOO',
     but not `\*FOO', which is a glob reference.  It should be noted
     that this trick:

          my $fh = do { local *FH; };

     makes $fh a glob, not a glob reference.  On the other hand, the
     return value from `Symbol::gensym' is a glob reference.  Either can
     be used as a file or directory handle.

   * GLOBREF

     A glob reference such as `\*FOO'.  See the `GLOB|GLOB' in this node
     entry above for more details.

   * SCALARREF

     A reference to a scalar such as `\$x'.

   * UNDEF

     An undefined value

   * OBJECT

     A blessed reference.

   * HANDLE

     This option is special, in that it is just a shortcut for `GLOB |
     GLOBREF'.  However, it seems likely that most people interested in
     either globs or glob references are really interested in whether
     what is being passed in is a potentially valid file or directory
     handle.
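The GLOB versus GLOBREF distinction above is easy to get wrong, and can be checked with Perl's built-in ref().  A small core-Perl sketch:

```perl
use strict;
use warnings;
use Symbol ();

# A glob versus a glob reference, as distinguished above.
my $glob    = do { local *FH; };   # $glob holds a glob, not a reference
my $globref = \*STDOUT;            # a reference to a glob
my $gensym  = Symbol::gensym();    # Symbol::gensym returns a glob ref

print ref(\$glob),   "\n";   # GLOB -- a reference *to* the glob we hold
print ref($globref), "\n";   # GLOB
print ref($gensym),  "\n";   # GLOB
print ref($glob) eq '' ? "plain glob\n" : "reference\n";
```

So a parameter holding $glob would satisfy the GLOB type, while $globref and $gensym would satisfy GLOBREF.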
To specify that a parameter must be of a given type when using named
parameters, do this:

     validate( @_, { foo => { type => SCALAR },
                     bar => { type => HASHREF } } );

If a parameter can be of more than one type, just use the bitwise or
(|) operator to combine them.

     validate( @_, { foo => { type => GLOB | GLOBREF } } );

For positional parameters, this can be specified as follows:

     validate_pos( @_, { type => SCALAR | ARRAYREF },
                       { type => CODEREF } );

Interface Validation
--------------------

To specify that a parameter is expected to have a certain set of
methods, we can do the following:

     validate( @_,
               { foo =>
                 # just has to be able to ->bar
                 { can => 'bar' } } );

     ... or ...

     validate( @_,
               { foo =>
                 # must be able to ->bar and ->print
                 { can => [ qw( bar print ) ] } } );

Class Validation
----------------

A word of warning.  When constructing your external interfaces, it is
probably better to specify what methods you expect an object to have
rather than what class it should be of (or a child of).  This will make
your API much more flexible.

With that said, if you want to verify that an incoming parameter belongs
to a class (or child class) or classes, do:

     validate( @_,
               { foo =>
                 { isa => 'My::Frobnicator' } } );

     ... or ...

     validate( @_,
               { foo =>
                 { isa => [ qw( My::Frobnicator IO::Handle ) ] } } );
     # must be both, not either!

Callback Validation
-------------------

If none of the above are enough, it is possible to pass in one or more
callbacks to validate the parameter.  The callback will be given the
value of the parameter as its sole argument.  Callbacks are specified as
a hash reference.
The key is an id for the callback (used in error messages) and the value
is a subroutine reference, such as:

     validate( @_,
               { foo =>
                 { callbacks =>
                   { 'smaller than a breadbox' => sub { shift() < $breadbox },
                     'green or blue' =>
                       sub { my $val = shift;
                             $val eq 'green' || $val eq 'blue' } } } } );

On a side note, I would highly recommend taking a look at Damian
Conway's Regexp::Common module, which could greatly simplify the
callbacks you use, as it provides patterns useful for validating all
sorts of data.

Mandatory/Optional Revisited
----------------------------

If you want to specify something such as type or interface, plus the
fact that a parameter can be optional, do this:

     validate( @_, { foo =>
                     { type => SCALAR },
                     bar =>
                     { type => ARRAYREF, optional => 1 } } );

or this for positional parameters:

     validate_pos( @_, { type => SCALAR },
                       { type => ARRAYREF, optional => 1 } );

By default, parameters are assumed to be mandatory unless specified as
optional.

USAGE NOTES
===========

Method calls
------------

When using this module to validate the parameters passed to a method
call, you will probably want to remove the class/object from the
parameter list before calling validate or `validate_pos'.  If your
method expects named parameters, then this is necessary for the validate
function to actually work; otherwise `@_' will not contain a hash, but
rather your object (or class) *followed* by a hash.

Thus the idiomatic usage of validate in a method call will look
something like this:

     sub method {
         my $self = shift;

         validate( @_, { foo => 1, bar => { type => ARRAYREF } } );

         my %params = @_;
     }

"GLOBAL" OPTIONS
================

Because the calling syntax for the validate and `validate_pos' functions
does not make it possible to specify any options other than the
validation spec, it is possible to set some options as pseudo-'globals'.
These allow you to specify such things as whether or not the validation
of named parameters should be case sensitive, for one example.
These options are called pseudo-'globals' because these settings are
*only applied to calls originating from the package that set the
options*.  In other words, if I am in package `Foo' and I call
`Params::Validate::set_options', those options are only in effect when I
call validate from package `Foo'.

While this is quite different from how most other modules operate, I
feel that this is necessary in order to make it possible for one
module/application to use Params::Validate while still using other
modules that also use Params::Validate, perhaps with different options
set.

The downside to this is that if you are writing an app with a standard
calling style for all functions, and your app has ten modules, *each
module must include a call to `Params::Validate::set_options'*.

Options
-------

   * ignore_case => $boolean

     This is only relevant when dealing with named parameters.  If it is
     true, then the validation code will ignore the case of parameters.
     Defaults to false.

   * strip_leading => $characters

     This too is only relevant when dealing with named parameters.  If
     this is given, then any parameters starting with these characters
     will be considered equivalent to the same parameters without them.
     For example, if this is specified as '-', then `-foo' and foo would
     be considered identical.

   * allow_extra => $boolean

     If true, then the validation routine will allow extra parameters
     not named in the validation specification.  In the case of
     positional parameters, this allows an unlimited maximum number of
     parameters (though a minimum may still be set).  Defaults to false.

   * on_fail => $callback

     If given, this callback will be called whenever a validation check
     fails.  It will be called with a single parameter, which will be a
     string describing the failure.  This is useful if you wish to have
     this module throw exceptions as objects rather than as strings, for
     example.

     This callback is expected to die internally.
If it does not, the validation will proceed onwards, with unpredictable results. The default is to simply use Perl's builtin die function. DISABLING VALIDATION ==================== ** This functionality may change in the future ** If the environment variable `NO_VALIDATION' is set to something true, then all calls to the validation functions are turned into no-ops. This may be useful if you only want to use this module during development but don't want the speed hit during production. I am not terribly happy with the current mechanism for doing this so this may change in the future. LIMITATIONS =========== Right now there is no way (short of a callback) to specify that something must be of one of a list of classes, or that it must possess one of a list of methods. If this is desired, it can be added in the future. Ideally, there would be only one validation function. If someone figures out how to do this, please let me know. SEE ALSO ======== Carp::Assert and Class::Contract. AUTHOR ====== Dave Rolsky,  File: pm.info, Node: Parse/CLex, Next: Parse/FixedLength, Prev: Params/Validate, Up: Module List Generator of lexical analyzers ****************************** NAME ==== `Parse::CLex' - Generator of lexical analyzers SYNOPSIS ======== See the `Parse::Lex' documentation. DESCRIPTION =========== See the `Parse::Lex' documentation. AUTHOR ====== Philippe Verdret. COPYRIGHT ========= Copyright (c) 1999 Philippe Verdret. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.  
File: pm.info, Node: Parse/FixedLength, Next: Parse/Lex, Prev: Parse/CLex, Up: Module List

SYNOPSIS
========

     use Parse::FixedLength;

     $phone_number = 8037814191;

     parse($phone_number,
           \%moms_phone,
           [ {'area_code' => 3},
             {'exchange'  => 3},
             {'number'    => 4} ]
          );

     for (keys %moms_phone) {
         print $_, " ", $moms_phone{$_}, $/;
     }

     # yields $moms_phone{area_code} == 803
     #        $moms_phone{exchange}  == 781
     #        $moms_phone{number}    == 4191

DESCRIPTION
===========

The `Parse::FixedLength' module facilitates the process of breaking a
string into its fixed-length components.

PARSING ROUTINES
================

parse()

     parse($string_to_parse, $href_storing_parse, $LOH_parse_instructions)

     This function takes a string, a reference to a hash and a reference
     to a list of hashes, and stores the results of fixed-length parsing
     into the hash reference passed in.

quick_parse()

     To facilitate the parsing of certain common fixed-length strings,
     the quick_parse() function takes the name of an LOH (list of
     hashes) containing formatting information, a string, and a
     reference to a hash in which to store parsing results.

     The currently available formatting routines are:

     `@us_phone'

          $phone_number = 8882221234;
          Parse::FixedLength::quick_parse("us_phone", $phone_number,
                                          \%lncs_phone);

     `@us_ssan'

     `@MM_DD_YYYY'

     `@MM_DD_YY'

     `@YY_MM_DD'

     `@YYYY_MM_DD'

print_parsed()

     This routine can be called after parsing to print a record of parse
     results.

EXAMPLES
========

see SYNOPSIS

AUTHOR
======

Terrence Brannon

COPYRIGHT
=========

This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
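What parse() does can be approximated in core Perl with unpack, by turning the ordered list of {name => length} pairs into an unpack template.  This is a hand-rolled sketch (the helper name `parse_fixed' is made up), not the module's implementation:

```perl
use strict;
use warnings;

# Approximation of Parse::FixedLength's parse(): build an unpack
# template ("a3 a3 a4") from the list-of-hashes instructions, then
# slice the results into the caller's hash.
sub parse_fixed {
    my ($string, $href, $loh) = @_;
    my @names    = map { (keys %$_)[0] } @$loh;
    my $template = join ' ', map { 'a' . (values %$_)[0] } @$loh;
    @{$href}{@names} = unpack $template, $string;
}

my %moms_phone;
parse_fixed('8037814191', \%moms_phone,
            [ { area_code => 3 },
              { exchange  => 3 },
              { number    => 4 } ]);

print "$moms_phone{area_code} $moms_phone{exchange} $moms_phone{number}\n";
# prints: 803 781 4191
```

Note that unpack returns the fields as strings, which preserves leading zeros in fields like area codes.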
File: pm.info, Node: Parse/Lex, Next: Parse/LexEvent, Prev: Parse/FixedLength, Up: Module List Generator of lexical analyzers ****************************** NAME ==== `Parse::Lex' - Generator of lexical analyzers SYNOPSIS ======== require 5.005; use Parse::Lex; @token = ( qw( ADDOP [-+] LEFTP [\(] RIGHTP [\)] INTEGER [1-9][0-9]* NEWLINE \n ), qw(STRING), [qw(" (?:[^"]+|"")* ")], qw(ERROR .*), sub { die qq!can\'t analyze: "$_[1]"!; } ); Parse::Lex->trace; # Class method $lexer = Parse::Lex->new(@token); $lexer->from(\*DATA); print "Tokenization of DATA:\n"; TOKEN:while (1) { $token = $lexer->next; if (not $lexer->eoi) { print "Line $.\t"; print "Type: ", $token->name, "\t"; print "Content:->", $token->text, "<-\n"; } else { last TOKEN; } } __END__ 1+2-5 "a multiline string with an embedded "" in it" an invalid string with a "" in it" DESCRIPTION =========== The classes `Parse::Lex' and `Parse::CLex' create lexical analyzers. They use different analysis techniques: 1. `Parse::Lex' steps through the analysis by moving a pointer within the character strings to be analyzed (use of `pos()' together with `\G'), 2. `Parse::CLex' steps through the analysis by consuming the data recognized (use of s///). Analyzers of the `Parse::CLex' class do not allow the use of anchoring in regular expressions. In addition, the subclasses of `Parse::Token' are not implemented for this type of analyzer. A lexical analyzer is specified by means of a list of tokens passed as arguments to the new() method. Tokens are instances of the `Parse::Token' class, which comes with `Parse::Lex'. The definition of a token usually comprises two arguments: a symbolic name (like INTEGER), followed by a regular expression. If a sub ref (anonymous subroutine) is given as third argument, it is called when the token is recognized. Its arguments are the `Parse::Token' instance and the string recognized by the regular expression. 
The anonymous subroutine's return value is used as the new string contents of the `Parse::Token' instance. The order in which the lexical analyzer examines the regular expressions is determined by the order in which these expressions are passed as arguments to the new() method. The token returned by the lexical analyzer corresponds to the first regular expression which matches (this strategy is different from that used by Lex, which returns the longest match possible out of all that can be recognized). The lexical analyzer can recognize tokens which span multiple records. If the definition of the token comprises more than one regular expression (placed within a reference to an anonymous array), the analyzer reads as many records as required to recognize the token (see the documentation for the `Parse::Token' class). When the start pattern is found, the analyzer looks for the end, and if necessary, reads more records. No backtracking is done in case of failure. The analyzer can be used to analyze an isolated character string or a stream of data coming from a file handle. At the end of the input data the analyzer returns a `Parse::Token' instance named `EOI' (End Of Input). Start Conditions ---------------- You can associate start conditions with the token-recognition rules that comprise your lexical analyzer (this is similar to what Flex provides). When start conditions are used, the rule which succeeds is no longer necessarily the first rule that matches. A token symbol may be preceded by a start condition specifier for the associated recognition rule. For example: qw(C1:TERMINAL_1 REGEXP), sub { # associated action }, qw(TERMINAL_2 REGEXP), sub { # associated action }, Symbol `TERMINAL_1' will be recognized only if start condition `C1' is active. Start conditions are activated/deactivated using the `start(CONDITION_NAME)' and `end(CONDITION_NAME)' methods. `start('INITIAL')' resets the analysis automaton. 
Start conditions can be combined using AND/OR operators as follows: C1:SYMBOL condition C1 C1:C2:SYMBOL condition C1 AND condition C2 C1,C2:SYMBOL condition C1 OR condition C2 There are two types of start conditions: inclusive and exclusive, which are declared by class methods `inclusive()' and `exclusive()' respectively. With an inclusive start condition, all rules are active regardless of whether or not they are qualified with the start condition. With an exclusive start condition, only the rules qualified with the start condition are active; all other rules are deactivated. Example (borrowed from the documentation of Flex): use Parse::Lex; @token = ( 'EXPECT', 'expect-floats', sub { $lexer->start('expect'); $_[1] }, 'expect:FLOAT', '\d+\.\d+', sub { print "found a float: $_[1]\n"; $_[1] }, 'expect:NEWLINE', '\n', sub { $lexer->end('expect') ; $_[1] }, 'NEWLINE2', '\n', 'INT', '\d+', sub { print "found an integer: $_[1] \n"; $_[1] }, 'DOT', '\.', sub { print "found a dot\n"; $_[1] }, ); Parse::Lex->exclusive('expect'); $lexer = Parse::Lex->new(@token); The special start condition ALL is always verified. Methods ------- analyze EXPR Analyzes EXPR and returns a list of pairs consisting of a token name followed by recognized text. EXPR can be a character string or a reference to a filehandle. Examples: @tokens = Parse::Lex->new(qw(PLUS [+] NUMBER \d+))->analyze("3+3+3"); @tokens = Parse::Lex->new(qw(PLUS [+] NUMBER \d+))->analyze(\*STREAM); buffer EXPR buffer Returns the contents of the internal buffer of the lexical analyzer. With an expression as argument, places the result of the expression in the buffer. It is not advisable to directly change the contents of the buffer without changing the position of the analysis pointer (`pos()') and the value length of the buffer (length()). configure(HASH) Instance method which permits specifying a lexical analyzer. 
     This method accepts a list of the following attribute values:

     From => EXPR
          This attribute plays the same role as the `from(EXPR)'
          method. EXPR can be a filehandle or a character string.

     Tokens => ARRAY_REF
          `ARRAY_REF' must contain the list of attribute values
          specifying the tokens to be recognized (see the
          documentation for `Parse::Token').

     Skip => REGEX
          This attribute plays the same role as the `skip(REGEX)'
          method. `REGEX' describes the patterns to skip over during
          the analysis.

end EXPR
     Deactivates condition EXPR.

eoi
     Returns TRUE when there is no more data to analyze.

every SUB
     Avoids having to write a reading loop in order to analyze a
     stream of data. SUB is an anonymous subroutine executed after the
     recognition of each token. For example, to lex the string "1+2"
     you can write:

          use Parse::Lex;

          $lexer = Parse::Lex->new(
            qw(
               ADDOP   [-+]
               INTEGER \d+
              ));

          $lexer->from("1+2");
          $lexer->every(sub {
            print $_[0]->name, "\t";
            print $_[0]->text, "\n";
          });

     The first argument of the anonymous subroutine is the
     `Parse::Token' instance recognized.

exclusive LIST
     Class method declaring the conditions present in LIST to be
     exclusive.

flush
     If saving of the consumed strings is activated, flush() returns
     and clears the buffer containing the character strings recognized
     up to now. This is only useful if `hold()' has been called to
     activate saving of consumed strings.

from EXPR
from
     `from(EXPR)' allows specifying the source of the data to be
     analyzed. The argument of this method can be a string (or list of
     strings), or a reference to a filehandle. If no argument is
     given, `from()' returns the filehandle if defined, or undef if
     input is a string. When an argument EXPR is used, the return
     value is the calling lexer object itself. By default it is
     assumed that data are read from STDIN.
     Examples:

          $handle = new IO::File;
          $handle->open("< filename");
          $lexer->from($handle);

          $lexer->from(\*DATA);
          $lexer->from('the data to be analyzed');

getSub
     getSub returns the anonymous subroutine that performs the lexical
     analysis. Example:

          my $token = '';
          my $sub = $lexer->getSub;
          while (($token = &$sub()) ne $Token::EOI) {
            print $token->name, "\t";
            print $token->text, "\n";
          }

          # or

          my $token = '';
          local *tokenizer = $lexer->getSub;
          while (($token = tokenizer()) ne $Token::EOI) {
            print $token->name, "\t";
            print $token->text, "\n";
          }

getToken
     Same as the `token()' method.

hold EXPR
hold
     Activates/deactivates saving of the consumed strings. The return
     value is the current setting (TRUE or FALSE). Can be used as a
     class method. You can obtain the contents of the buffer using the
     flush method, which also empties the buffer.

inclusive LIST
     Class method declaring the conditions present in LIST to be
     inclusive.

length EXPR
length
     Returns the length of the current record. `length EXPR' sets the
     length of the current record.

line EXPR
line
     Returns the line number of the current record. `line EXPR' sets
     the value of the line number. Always returns 1 if a character
     string is being analyzed. The readline() method increments the
     line number.

name EXPR
name
     `name EXPR' lets you give a name to the lexical analyzer.
     `name()' returns the value of this name.

next
     Searches for the next token. Returns the recognized
     `Parse::Token' instance. Returns the `Token::EOI' instance at the
     end of the data. Examples:

          $lexer = Parse::Lex->new(@token);
          print $lexer->next->name;   # print the token type
          print $lexer->next->text;   # print the token content

nextis SCALAR_REF
     Variant of the next() method. Tokens are placed in `SCALAR_REF'.
     The method returns 1 as long as the token is not `EOI'. Example:

          while ($lexer->nextis(\$token)) {
             print $token->text();
          }

new LIST
     Creates and returns a new lexical analyzer.
     The argument of the method is a list of `Parse::Token' instances,
     or a list of triplets permitting their creation. The triplets
     consist of: the symbolic name of the token, the regular
     expression necessary for its recognition, and possibly an
     anonymous subroutine that is called when the token is recognized.
     For each triplet, an instance of type `Parse::Token' is created
     in the calling package.

offset
     Returns the number of characters already consumed since the
     beginning of the analyzed data stream.

pos EXPR
pos
     `pos EXPR' sets the position of the beginning of the next token
     to be recognized in the current line (this doesn't work with
     analyzers of the `Parse::CLex' class). `pos()' returns the number
     of characters already consumed in the current line.

readline
     Reads data from the input specified by the `from()' method.
     Returns the result of the reading. Example:

          use Parse::Lex;

          $lexer = Parse::Lex->new();
          while (not $lexer->eoi) {
            print $lexer->readline();  # read and print one line
          }

reset
     Clears the internal buffer of the lexical analyzer and erases all
     tokens already recognized.

restart
     Reinitializes the analysis automaton. The only active condition
     becomes the condition `INITIAL'.

setToken TOKEN
     Sets the token to `TOKEN'. Useful for requalifying a token inside
     the anonymous subroutine associated with this token.

skip EXPR
skip
     EXPR is a regular expression defining the token separator pattern
     (by default `[ \t]+'). `skip('')' sets this to no pattern. With
     no argument, `skip()' returns the value of the pattern. `skip()'
     can be used as a class method. Changing the skip pattern causes
     recompilation of the lexical analyzer. Example:

          Parse::Lex->skip('\s*#(?s:.*)|\s+');
          @tokens = Parse::Lex->new('INTEGER' => '\d+')->analyze(\*DATA);
          print "@tokens\n"; # prints INTEGER 1 INTEGER 2 INTEGER 3 INTEGER 4 EOI
          __END__
          1 # first string to skip
          2 3# second string to skip
          4

start EXPR
     Activates condition EXPR.

state EXPR
     Returns the state of the condition represented by EXPR.
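One plausible use of `setToken()' is requalifying reserved words from
inside the identifier rule's action. The sketch below assumes only
what the documentation states: that new() creates a `Parse::Token'
instance for each triplet in the calling package. The KEYWORD/IDENT
names, the `(?!)' never-matching placeholder pattern, and the
%keywords hash are all hypothetical:

```perl
use Parse::Lex;

my %keywords = map { $_ => 1 } qw(if else while);  # hypothetical list

my $lexer;
$lexer = Parse::Lex->new(
    # KEYWORD exists only to be assigned via setToken(); the
    # (?!) pattern never matches on its own.
    'KEYWORD', '(?!)',
    'IDENT',   '[A-Za-z_]\w*', sub {
        # new() created $KEYWORD as a Parse::Token instance in
        # this package; requalify reserved words as keywords.
        $lexer->setToken($KEYWORD) if $keywords{ $_[1] };
        $_[1];
    },
);
```

With this setup, scanning "while" would report a KEYWORD token even
though the IDENT rule did the actual matching.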
token
     Returns the instance corresponding to the last recognized token.
     If no token has been recognized, returns the special token named
     `DEFAULT'.

tokenClass EXPR
tokenClass
     Indicates the class of the tokens to be created from the list
     passed as argument to the new() method. If no argument is given,
     returns the name of the class. By default the class is
     `Parse::Token'.

trace OUTPUT
trace
     Class method which activates trace mode. The activation of trace
     mode must take place before the creation of the lexical analyzer.
     The mode can then be deactivated by another call of this method.
     OUTPUT can be a file name or a reference to a filehandle to which
     the trace will be redirected.

ERROR HANDLING
==============

To handle cases of token non-recognition, you can define a specific
token at the end of the list of tokens that comprise your lexical
analyzer. If searching for this token succeeds, it is then possible to
call an error handling function:

     qw(ERROR  (?s:.*)), sub {
       print STDERR "ERROR: buffer content->", $_[0]->lexer->buffer, "<-\n";
       die qq!can\'t analyze: "$_[1]"!;
     }

EXAMPLES
========

ctokenizer.pl - Scan a stream of data using the `Parse::CLex' class.

tokenizer.pl - Scan a stream of data using the `Parse::Lex' class.

every.pl - Use of the every method.

sexp.pl - Interpreter for prefix arithmetic expressions.

sexpcond.pl - Interpreter for prefix arithmetic expressions, using
conditions.

BUGS
====

Analyzers of the `Parse::CLex' class do not allow the use of regular
expressions with anchoring.

SEE ALSO
========

`Parse::Token', `Parse::LexEvent', `Parse::YYLex'.

AUTHOR
======

Philippe Verdret. Documentation translated to English by Vladimir
Alexiev and Ocrat.

ACKNOWLEDGMENTS
===============

Version 2.0 owes much to suggestions made by Vladimir Alexiev. Ocrat
has significantly contributed to improving this documentation. Thanks
also to the numerous people who have sent me bug reports and
occasionally fixes.

REFERENCES
==========

Friedl, J.E.F. -
Mastering Regular Expressions. O'Reilly & Associates, 1996.

Mason, T. & Brown, D. - Lex & Yacc. O'Reilly & Associates, Inc., 1990.

FLEX - A scanner generator (available at ftp://ftp.ee.lbl.gov/ and
elsewhere).

COPYRIGHT
=========

Copyright (c) 1995-1999 Philippe Verdret. All rights reserved. This
module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

File: pm.info, Node: Parse/LexEvent, Next: Parse/PerlConfig, Prev: Parse/Lex, Up: Module List

Generator of event-oriented lexical analyzers (1.00 ALPHA)
**********************************************************

NAME
====

`Parse::LexEvent' - Generator of event-oriented lexical analyzers
(1.00 ALPHA)

SYNOPSIS
========

     use Parse::LexEvent;

     sub string    { print $_[0]->name, ": $_[1]\n" }
     sub comment   { print $_[0]->name, ": $_[1]\n" }
     sub remainder { print $_[0]->name, ": $_[1]\n" }

     $lexer = Parse::LexEvent->new()->configure(
       From => \*DATA,
       Tokens => [
         Type => 'Simple',    Name => 'ccomment',  Handler => 'comment',
           Regex => '//.*\n',
         Type => 'Delimited', Name => 'comment',   Handler => 'comment',
           Start => '/[*]', End => '[*]/',
         Type => 'Quoted',    Name => 'squotes',   Handler => 'string',
           Quote => qq!\'!,
         Type => 'Quoted',    Name => 'dquotes',   Handler => 'string',
           Quote => qq!\"!,
         Type => 'Simple',    Name => 'remainder',
           Regex => '(?s:[^/\'\"]+)', ReadMore => 1,
       ]
     )->parse();

     __END__
     /* C comment */
     // C++ comment
     var d = "string in double quotes";
     var s = 'string in single quotes';
     var i = 10;
     var y = 100;

DESCRIPTION
===========

`Parse::LexEvent' generates lexical analyzers in the fashion of
`Parse::Lex', but the generated analyzers emit an event at the end of
recognition of each token. This event corresponds to the call of a
procedure whose name is that of the token. A different name can be
given to this procedure by means of the Handler parameter when
defining a token. An application using `Parse::LexEvent' must define
the required procedures.
These procedures take the token object as first argument and the
recognized character string as second.

`Parse::LexEvent' inherits from `Parse::ALex' and possesses all the
methods described in the documentation of the `Parse::Lex' class,
except for the methods `analyze()', `every()', `next()' and
`nextis()'.

Methods
-------

parse()
     This method runs the analysis of the data specified by `from()'.

EXAMPLES
========

cparser.pl - This analyzer recognizes three types of structures: C or
C++ comments, strings within quotation marks, and the rest. It emits
an event specific to each. You can use it, for example, to analyze C,
C++ or Javascript programs.

SEE ALSO
========

`Parse::Lex', `Parse::Token'.

AUTHOR
======

Philippe Verdret.

COPYRIGHT
=========

Copyright (c) 1999 Philippe Verdret. All rights reserved. This module
is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.