This is Info file pm.info, produced by Makeinfo version 1.68 from the
input file bigpm.texi.


File: pm.info,  Node: SQL/Schema/Trigger,  Next: SQL/Schema/View,  Prev: SQL/Schema/Table/Column,  Up: Module List

A database trigger
******************

NAME
====

   SQL::Schema::Trigger - A database trigger

SYNOPSIS
========

     my $trigger = SQL::Schema::Trigger->new(%attr);

     my $sql = $trigger->create_statement;
     print $sql;

     print "$trigger";

DESCRIPTION
===========

   `SQL::Schema::Trigger' is a class for objects representing a database
trigger. The methods of an instanciated object do allow to access the
information within a database's data dictionary and to represent them as
SQL create statements and the like using the proper SQL dialect.

Constructors
------------

     $trigger = SQL::Schema::Trigger->new(%attr);

   The new method instanciates a trigger object. The object is an in
memory representation of a (possible) database trigger.  The attributes
are given as key value pairs by the hash %attr.  Possible keys are:

     key               required?   value description

     trigger_name      yes         the name of the trigger (without a
                                   preceeding schema name)

     description       yes         the "head" of the trigger's create statement
                                   excluding `create trigger ' resp.
                                   `create or replace trigger ' but including
                                   everything from the trigger's name on up to
                                   (but not including) the when_clause

     when_clause       no          the when clause of the trigger everything
                                   from `when' until the start of the
                                   PL/SQL block (remember: the PL/SQL block
                                   starts with `declare' or `begin')

     trigger_body      yes         the PL/SQL block of the trigger including
                                   `declare', `begin' and `end;'

   All the keys correspond to the columns with the same name from Oracle's
data dictionary view `user_triggers'.

     $trigger = SQL::Schema::Trigger->select($dbh,$name);

   The select method fetches the attributes required by new from the
database and returns the trigger object. (It calls new internally.)

   If the trigger with the name $name could not be found within the
database, the method returns undef.

   The method's arguments are as follows:

$dbh
     A database handle as defined by `DBI(3)' in this node.

$name
     The name of the trigger without preceeding schema name.

Methods
-------

   The following attribute methods do return the current value of the
attributes (as handed over to the new method):

     $name = $trigger->name;
     $description = $trigger->description;
     $when_clause = $trigger->when_clause;
     $trigger_body = $trigger->trigger_body;

     $sql = $trigger->create_statement;
     $sql = "$trigger";

   Returns a string containing an SQL statements for creation of this
trigger. This method is overloaded with the string operator.  So the two
examples above are equivalent.

     $sql = $trigger->drop_statement;

   Returns a string containing an SQL statement that would drop this
trigger.

AUTHOR AND COPYRIGHT
====================

     SQL::Schema::Trigger is Copyright (C) 2000,
       Torsten Hentschel
       Windmuehlenweg 47
       44141 Dortmund
       Germany

     Email: todd@bayleys.ping.de

     All rights reserved.

     You may distribute this package under the terms of either the GNU
     General Public License or the Artistic License, as specified in the
     Perl README file.

SEE ALSO
========

   `DBI(3)' in this node, `SQL::Schema(3)' in this node,


File: pm.info,  Node: SQL/Schema/View,  Next: SQL/Statement,  Prev: SQL/Schema/Trigger,  Up: Module List

A database view
***************

NAME
====

   SQL::Schema::View - A database view

SYNOPSIS
========

     my $view = SQL::Schema::View->new(%attr);

     my $sql = $view->create_statement;
     print $sql;

     print "$view";

DESCRIPTION
===========

   `SQL::Schema::View' is a class for objects representing a database
view. The methods of an instanciated object do allow to access the
information within a database's data dictionary and to represent them as
SQL create statements and the like using the proper SQL dialect.

Constructors
------------

     $view = SQL::Schema::View->new(%attr);

   The new method instanciates a view object. The object is an in memory
representation of a (possible) database view.  The attributes are given as
key value pairs by the hash %attr.  Possible keys are:

     key              required?   value description

     view_name        yes         the name of the view (without a
                                  preceeding schema name)

     aliases          no          a reference to a list of names
                                  for the columns returned by the view

     subquery         yes         the SQL text of the subquery

     constraint_name  no          the name of the constraint for the
                                  "with check option"; WARNING: The
                                  subquery has to end with the text
                                  'with check option ' to make this work
                                  correctly

     $view = SQL::Schema::View->select($dbh,$name);

   The select method fetches the attributes required by new from the
database and returns the view object. (It calls new internally.)

   If the view with the name $name could not be found within the database,
the method returns undef.

   The method's arguments are as follows:

$dbh
     A database handle as defined by `DBI(3)' in this node.

$name
     The name of the view without preceeding schema name.

Methods
-------

   The following attribute methods do return the current value of the
attributes (as handed over to the new method):

     $name = $view->name;
     @aliases = $view->alisaes;
     $subquery = $view->subquery;
     $constraint_name = $view->constraint_name;

     $sql = $view->create_statement;
     $sql = "$view";

   Returns a string containing an SQL statements for creation of this
view. This method is overloaded with the string operator.  So the two
examples above are equivalent.

     $sql = $view->drop_statement;

   Returns a string containing an SQL statement that would drop this view.

AUTHOR AND COPYRIGHT
====================

     SQL::Schema::View is Copyright (C) 2000,
       Torsten Hentschel
       Windmuehlenweg 47
       44141 Dortmund
       Germany

     Email: todd@bayleys.ping.de

     All rights reserved.

     You may distribute this package under the terms of either the GNU
     General Public License or the Artistic License, as specified in the
     Perl README file.

SEE ALSO
========

   `DBI(3)' in this node, `SQL::Schema(3)' in this node,


File: pm.info,  Node: SQL/Statement,  Next: SWISH,  Prev: SQL/Schema/View,  Up: Module List

SQL parsing and processing engine
*********************************

NAME
====

   SQL::Statement - SQL parsing and processing engine

SYNOPSIS
========

     require SQL::Statement;

     # Create a parser
     my($parser) = SQL::Statement->new('Ansi');

     # Parse an SQL statement
     $@ = '';
     my ($stmt) = eval {
         SQL::Statement->new("SELECT id, name FROM foo WHERE id > 1",
     			    $parser);
     };
     if ($@) {
     	die "Cannot parse statement: $@";
     }

     # Query the list of result columns;
     my $numColums = $stmt->columns();  # Scalar context
     my @columns = $stmt->columns();    # Array context
     # @columns now contains SQL::Statement::Column instances

     # Likewise, query the tables being used in the statement:
     my $numTables = $stmt->tables();   # Scalar context
     my @tables = $stmt->tables();      # Array context
     # @tables now contains SQL::Statement::Table instances

     # Query the WHERE clause; this will retrieve an
     # SQL::Statement::Op instance
     my $where = $stmt->where();

     # Evaluate the WHERE clause with concrete data, represented
     # by an SQL::Eval object
     my $result = $stmt->eval_where($eval);

     # Execute a statement:
     $stmt->execute($data, $params);

DESCRIPTION
===========

   For installing the module, see `"INSTALLATION"' in this node below.

   The SQL::Statement module implements a small, abstract SQL engine. This
module is not usefull itself, but as a base class for deriving concrete
SQL engines. The implementation is designed to work fine with the DBI
driver DBD::CSV, thus probably not so well suited for a larger
environment, but I'd hope it is extendable without too much problems.

   By parsing an SQL query you create an SQL::Statement instance. This
instance offers methods for retrieving syntax, for WHERE clause and
statement evaluation.

Creating a parser object
------------------------

   What's accepted as valid SQL, depends on the parser object. There is a
set of so-called features that the parsers may have or not. Usually you
start with a builtin parser:

     my $parser = SQL::Parser->new($name, [ \%attr ]);

   Currently two parsers are builtin: The *Ansi* parser implements a proper
subset of ANSI SQL. (At least I hope so. :-) The SQL::Statement parser is
used by the DBD:CSV driver.

   You can query or set individual features. Currently available are:

create.type_blob
create.type_real
create.type_text
     These enable the respective column types in a *CREATE TABLE* clause.
     They are all disabled in the *Ansi* parser, but enabled in the
     SQL::Statement parser. Example:

select.join
     This enables the use of multiple tables in a SELECT statement, for
     example

          SELECT a.id, b.name FROM a, b WHERE a.id = b.id AND a.id = 2

   To enable or disable a feature, for example select.join, use the
following:

     # Enable feature
     $parser->feature("select", "join", 1);
     # Disable feature
     $parser->feature("select", "join", 0);

   Of course you can query features:

     # Query feature
     my $haveSelectJoin = $parser->feature("select", "join");

   The new method allows a shorthand for setting features. For example,
the following is equivalent to the SQL::Statement parser:

     $parser = SQL::Statement->new('Ansi',
     				{ 'create' => { 'type_text' => 1,
     						'type_real' => 1,
     						'type_blob' => 1 },
     				  'select' => { 'join' => 0 }});

Parsing a query
---------------

   A statement can be parsed with

     my $stmt = SQL::Statement->new($query, $parser);

   In case of syntax errors or other problems, the method throws a Perl
exception. Thus, if you want to catch exceptions, the above becomes

     $@ = '';
     my $stmt = eval { SQL::Statement->new($query, $parser) };
     if ($@) { print "An error occurred: $@"; }

   The accepted SQL syntax is restricted, though easily extendable. See
`SQL syntax' in this node below. See `Creating a parser object' in this
node above.

Retrieving query information
----------------------------

   The following methods can be used to obtain information about a query:

command
     Returns the SQL command, currently one of SELECT, INSERT, UPDATE,
     DELETE, CREATE or DROP, the last two referring to *CREATE TABLE* and
     *DROP TABLE*. See `SQL syntax' in this node below. Example:

          my $command = $stmt->command();

columns
          my $numColumns = $stmt->columns();  # Scalar context
          my @columnList = $stmt->columns();  # Array context
          my($col1, $col2) = ($stmt->columns(0), $stmt->columns(1));

     This method is used to retrieve column lists. The meaning depends on
     the query command:

          SELECT $col1, $col2, ... $colN FROM $table WHERE ...
          UPDATE $table SET $col1 = $val1, $col2 = $val2, ...
              $colN = $valN WHERE ...
          INSERT INTO $table ($col1, $col2, ..., $colN) VALUES (...)

     When used without arguments, the method returns a list of the columns
     $col1, $col2, ..., $colN, you may alternatively use a column number
     as argument. Note that the column list may be empty, like in

          INSERT INTO $table VALUES (...)

     and in CREATE or DROP statements.

     But what does "returning a column" mean? It is returning an
     SQL::Statement::Column instance, a class that implements the methods
     table and name, both returning the respective scalar. For example,
     consider the following statements:

          INSERT INTO foo (bar) VALUES (1)
          SELECT bar FROM foo WHERE ...
          SELECT foo.bar FROM foo WHERE ...

     In all these cases exactly one column instance would be returned with

          $col->name() eq 'bar'
          $col->table() eq 'foo'

tables
          my $tableNum = $stmt->tables();  # Scalar context
          my @tables = $stmt->tables();    # Array context
          my($table1, $table2) = ($stmt->tables(0), $stmt->tables(1));

     Similar to columns, this method returns instances of
     `SQL::Statement::Table'.  For UPDATE, DELETE, INSERT, CREATE and
     DROP, a single table will always be returned.  SELECT statements can
     return more than one table, in case of joins. Table objects offer a
     single method, name which returns the table name.

params
          my $paramNum = $stmt->params();  # Scalar context
          my @params = $stmt->params();    # Array context
          my($p1, $p2) = ($stmt->params(0), $stmt->params(1));

     The params method returns information about the input parameters used
     in a statement. For example, consider the following:

          INSERT INTO foo VALUES (?, ?)

     This would return two instances of SQL::Statement::Param. Param
     objects implement a single method, `$param-'num()>, which retrieves
     the parameter number. (0 and 1, in the above example). As of now, not
     very usefull ... :-)

row_values
          my $rowValueNum = $stmt->row_values(); # Scalar context
          my @rowValues = $stmt->row_values();   # Array context
          my($rval1, $rval2) = ($stmt->row_values(0),
          			  $stmt->row_values(1));

     This method is used for statements like

          UPDATE $table SET $col1 = $val1, $col2 = $val2, ...
              $colN = $valN WHERE ...
          INSERT INTO $table (...) VALUES ($val1, $val2, ..., $valN)

     to read the values $val1, $val2, ... $valN. It returns scalar values
     or SQL::Statement::Param instances.

Order
          my $orderNum = $stmt->order();   # Scalar context
          my @order = $stmt->order();      # Array context
          my($o1, $o2) = ($stmt->order(0), $stmt->order(1));

     In SELECT statements you can use this for looking at the ORDER
     clause. Example:

          SELECT * FROM FOO ORDER BY id DESC, name

     In this case, order could return 2 instances of SQL::Statement::Order.
     You can use the methods `$o->table()', `$o->column()' and
     `$o->desc()' to examine the order object.

where
          my $where = $stmt->where();

     This method is used to examine the syntax tree of the WHERE clause.
     It returns undef (if no WHERE clause was used) or an instance of
     SQL::Statement::Op. The Op instance offers 4 methods:

    op
          returns the operator, one of AND, `OR', =, `<>', `>=', `>',
          `<=', `<', `LIKE', `CLIKE' or `IS'.

    arg1

    arg2
          returns the left-hand and right-hand sides of the operator. This
          can be a scalar value, an SQL::Statement::Param object or yet
          another SQL::Statement::Op instance.

    neg
          returns a TRUE value, if the operation result must be negated
          after evalution.

     To evaluate the WHERE clause, fetch the topmost Op instance with the
     where method. Then evaluate the left-hand and right-hand side of the
     operation, perhaps recursively. Once that is done, apply the operator
     and finally negate the result, if required.

   To illustrate the above, consider the following WHERE clause:

     WHERE NOT (id > 2 AND name = 'joe') OR name IS NULL

   We can represent this clause by the following tree:

     (id > 2)   (name = 'joe')
            \   /
               NOT         AND
                \      (name IS NULL)
                 \    /
                   OR

   Thus the WHERE clause would return an SQL::Statement::Op instance with
the op() field set to 'OR'. The arg2() field would return another
SQL::Statement::Op instance with arg1() being the SQL::Statement::Column
instance representing id, the arg2() field containing the value undef
(NULL) and the op() field being 'IS'.

   The arg1() field of the topmost Op instance would return an Op instance
with op() eq 'AND' and neg() returning TRUE. The arg1() and arg2() fields
would be Op's representing "id > 2" and "name = 'joe'".

   Of course there's a ready-for-use method for WHERE clause evaluation:

Evaluating a WHERE clause
-------------------------

   The WHERE clause evaluation depends on an object being used for
fetching parameter and column values. Usually this can be an SQL::Eval
object, but in fact it can be any object that supplies the methods

     $val = $eval->param($paramNum);
     $val = $eval->column($table, $column);

   See *Note SQL/Eval: SQL/Eval, for a detailed description of these
methods.  Once you have such an object, you can call a

     $match = $stmt->eval_where($eval);

Evaluating queries
------------------

   So far all methods have been concrete. However, the interface for
executing and evaluating queries is abstract. That means, for using them
you have to derive a subclass from SQL::Statement that implements at least
certain missing methods and/or overwrites others. See the test.pl script
for an example subclass.

   Something that all methods have in common is that they simply throw a
Perl exception in case of errors.

execute
     After creating a statement, you must execute it by calling the execute
     method. Usually you put an eval statement around this call:

          $@ = '';
          my $rows = eval { $self->execute($data); };
          if ($@) { die "An error occurred!"; }

     In case of success the method returns the number of affected rows or
     -1, if unknown. Additionally it sets the attributes

          $self->{'NUM_OF_FIELDS'}
          $self->{'NUM_OF_ROWS'}
          $self->{'data'}

     the latter being an array ref of result rows. The argument $data is
     for private use by concrete subclasses and will be passed through to
     all methods. (It is intentionally not implemented as attribute:
     Otherwise we might well become self referencing data structures which
     could prevent garbage collection.)

CREATE
DROP
INSERT
UPDATE
DELETE
SELECT
     Called by execute for doing the real work. Usually they create an
     SQL::Eval object by calling `$self->open_tables()', call
     `$self->verify_columns()' and then do their job. Finally they return
     the triple

          ($self->{'NUM_OF_ROWS'}, $self->{'NUM_OF_FIELDS'},
           $self->{'data'})

     so that execute can setup these attributes. Example:

          ($self->{'NUM_OF_ROWS'}, $self->{'NUM_OF_FIELDS'},
           $self->{'data'}) = $self->SELECT($data);

verify_columns
     Called for verifying the row names that are used in the statement.
     Example:

          $self->verify_columns($eval, $data);

open_tables
     Called for creating an SQL::Eval object. In fact what it returns
     doesn't need to be derived from SQL::Eval, it's completely sufficient
     to implement the same interface of methods. See *Note SQL/Eval:
     SQL/Eval, for details. The arguments $data, `$createMode' and
     `$lockMode' are corresponding to those of
     SQL::Eval::Table::open_table and usually passed through. Example:

          my $eval = $self->open_tables($data, $createMode, $lockMode);

     The eval object can be used for calling `$self-'verify_columns> or
     `$self-'eval_where>.

open_table
     This method is completely abstract and *must* be implemented by
     subclasses.  The default implementation of `$self-'open_tables> calls
     this method for any table used by the statement. See the test.pl
     script for an example of imlplementing a subclass.

SQL syntax
==========

   The SQL::Statement module is far away from ANSI SQL or something
similar, it is designed for implementing the DBD::CSV module. See
`DBD::CSV(3)' in this node.

   I do not want to give a formal grammar here, more an informal
description: Read the statement definition in sql_yacc.y, if you need
something precise.

   The main lexical elements of the grammar are:

Integers
Reals
     Syntax obvious

Strings
     Surrounded by either single or double quotes; some characters need to
     be escaped with a backslash, in particular the backslash itself (\\),
     the NUL byte (\0), Line feeds (\n), Carriage return (\r), and the
     quotes (\' or \").

Parameters
     Parameters represent scalar values, like Integers, Reals and Strings
     do. However, their values are read inside Execute() and not inside
     Prepare(). Parameters are represented by question marks (?).

Identifiers
     Identifiers are table or column names. Syntactically they consist of
     alphabetic characters, followed by an arbitrary number of alphanumeric
     characters. Identifiers like SELECT, INSERT, INTO, ORDER, BY, WHERE,
     ... are forbidden and reserved for other tokens.

   What it offers is the following:

CREATE
------

   This is the CREATE TABLE command:

     CREATE TABLE $table ( $col1 $type1, ..., $colN $typeN,
                           [ PRIMARY KEY ($col1, ... $colM) ] )

   The column names are $col1, ... $colN. The column types can be INTEGER,
`CHAR(n)', `VARCHAR(n)', REAL or `BLOB'. These types are currently
completely ignored. So is the (optional) `PRIMARY KEY' clause.

DROP
----

   Very simple:

     DROP TABLE $table

INSERT
------

   This can be

     INSERT INTO $table [ ( $col1, ..., $colN ) ]
         VALUES ( $val1, ... $valN )

DELETE
------

     DELETE FROM $table [ WHERE $where_clause ]

   See `SELECT' in this node below for a decsription of $where_clause

UPDATE
------

     UPDATE $table SET $col1 = $val1, ... $colN = $valN
         [ WHERE $where_clause ]

   See `SELECT' in this node below for a decsription of $where_clause

SELECT
------

     SELECT [DISTINCT] $col1, ... $colN FROM $table
         [ WHERE $where_clause ] [ ORDER BY $ocol1, ... $ocolM ]

   The $where_clause is based on boolean expressions of the form $val1 $op
$val2, with $op being one of '=', '<>', '>', '<', '>=', '<=', 'LIKE',
'CLIKE' or IS. You may use OR, AND and brackets to combine such boolean
expressions or NOT to negate them.

INSTALLATION
============

   Like most other Perl modules, you simply do a

     perl Makefile.PL
     make		(nmake or dmake, if you are using Win32)
     make test		(Let me know, if any tests fail)
     make install

   Known problems are:

   * Some flavours of SCO Unix don't seem to have alloca() or something
     similar.  I recommend using gcc or egcs for compiling Perl and the
     SQL::Statement module: Both compilers have a builtin alloca().

     Another option could be to use external alloca.c, for example

          http://www.pu.informatik.th-darmstadt.de/FTP/pub/pu/alloca.c
          http://www.cs.purdue.edu/homes/young/src2www-example/alloca.c.html

     I did test neither of them and cannot give detailed instructions for
     including them into the SQL::Statement module. However, it should be
     sufficient to compile alloca.c with the same instructions than, for
     example, sql_yacc.c and finally repeat the linker command by
     inserting alloca.o after sql_yacc.o.

     Note that I cannot modify the sources to work without alloca(), as it
     is the bison parser that's using alloca() and I don't have the bison
     generated code in my hands.

     My thanks to Theo Petersen, <theo@acsp.com>, for pointing out this
     problem and the possible workarounds.

INTERNALS
=========

   Internally the module is splitted into three parts:

Perl-independent C part
-----------------------

   This part, contained in the files `sql_yacc.y', `sql_data.h',
`sql_data.c' and `sql_op.c', is completely independent from Perl.  It
might well be used from within another script language, Tcl say, or from a
true C application.

   You probably ask, why Perl independence? Well, first of all, I think
this is a valuable target in itself. But the main reason was the
impossibility to use the Perl headers inside bison generated code. The
Perl headers export almost the complete Yacc interface to XS, for whatever
reason, thus redefining constants and structures created by your own bison
code. :-(

Perl-dependent C part
---------------------

   This is contained in `Statement.xs'. The both C parts communicate via a
C structure sql_stmt_t. In fact, an SQL::Statement object is nothing else
than a pointer to such a structure. The XS calls columns(), Table(),
where(), ... do nothing more than fetching data from this structure and
converting it to Perl objects. See `The sql_stmt_t structure' in this node
below for details on the structure.

Perl part
---------

   Besides some stub functions for retrieving statement data, this is
mainly the query processing with the exception of WHERE clause evaluation.

The sql_stmt_t structure
------------------------

   This structure is designed for optimal performance. A typical query
will be parsed with only 4 or 5 malloc() calls; in particular no memory
will be aquired for storing strings; only pointers into the query string
are used.

   The statement stores its tokens in the values array. The array elements
are of type sql_val_t, a union, that can represent the most interesting
tokens; for example integers and reals are stored in the data.i and data.d
parts of the union, strings are stored in the data.str part, columns in
the data.col part and so on. Arrays are allocated in chunks of 64
elements, thus a single malloc() will be usually sufficient for allocating
the complete array. Some types use pointers into the values array: For
example, operations are stored in an sql_op_t structure that containes
elements arg1 and arg2 which are pointers into the value table, pointing
to other operations or scalars. These pointers are stored as indices, so
that the array can be extended using realloc().

   The sql_stmt_t structure contains other arrays: columns, tables,
rowvals, order, ... representing the data returned by the columns(),
tables(), row_values() and order() methods. All of these contain pointers
into the values array, again stored as integers.

   Arrays are initialized with the _InitArray call in SQL_Statement_Prepare
and deallocated with _DestroyArray in SQL_Statement_Destroy. Array
elements are obtained by calling _AllocData, which returns an index.  The
number -1 is used for errors or as a NULL value.

The WHERE clause evaluation
---------------------------

   A WHERE clause is evaluated by calling SQL_Statement_EvalWhere(). This
function is in the Perl independent part, but it needs the possibility to
retrieve data from the Perl part, for example column or parameter values.
These values are retrieved via callbacks, stored in the sql_eval_t
structure. The field stmt->evalData points to such a structure. Of course
the calling method can extend the sql_eval_t structure (like eval_where in
Statement.xs does) to include private data not used by
SQL_Statement_EvalWhere.

Features
--------

   Different parsers are implemented via the sql_parser_t structure. This
is mainly a set of yes/no flags. If you'd like to add features, do the
following:

   First of all, extend the sql_parser_t structure. If your feature is
part of a certain statement, place it into the statements section, for
example "select.join". Otherwise choose a section like "misc" or
"general". (There's no particular for the section design, but structure
never hurts.)

   Second, add your feature to sql_yacc.y. If your feature needs to extend
the lexer, do it like this:

     if (FEATURE(misc, myfeature) {
     	/*  Scan your new symbols  */
     	...
     }

   See the *BOOL* symbol as an example.

   If you need to extend the parser, do it like this:

     my_new_rule:
     	/*  NULL, old behaviour, doesn't use my feature  */
         | my_feature
             { YFEATURE(misc, myfeature); }
     ;

   Thus all parsers not having FEATURE(misc, myfeature) set will produce a
parse error here. Again, see the BOOL symbol for an example.

   Third thing is to extend the builtin parsers. If they support your
feature, add a 1, otherwise a 0. Currently there are two builtin parsers:
The *ansiParser* in sql_yacc.y and the sqlEvalParser in Statement.xs.

   Finally add support for your feature to the `feature' method in
Statement.xs. That's it!

MULTITHREADING
==============

   The complete module code is reentrant. In particular the parser is
created with `%pure_parser'. See `bison(1)' in this node for details on
reentrant parsers. That means, the module is ready for multithreading, as
long as you don't share handles between threads. Read-only handles, for
example parsers, can even be shared.

   Statement handles cannot be shared among threads, at least not, if you
don't grant serialized access. Per-thread handles are always safe.

AUTHOR AND COPYRIGHT
====================

   This module is Copyright (C) 1998 by

     Jochen Wiedmann
     Am Eisteich 9
     72555 Metzingen
     Germany

     Email: joe@ispsoft.de
     Phone: +49 7123 14887

   All rights reserved.

   You may distribute this module under the terms of either the GNU
General Public License or the Artistic License, as specified in the Perl
README file.

SEE ALSO
========

   `DBI(3)' in this node, `DBD::CSV(3)' in this node


File: pm.info,  Node: SWISH,  Next: SWISH/Fork,  Prev: SQL/Statement,  Up: Module List

Perl interface to the SWISH-E search engine.
********************************************

NAME
====

   SWISH - Perl interface to the SWISH-E search engine.

SYNOPSIS
========

     use SWISH;

     $sh = SWISH->connect('Fork',
         prog     => '/usr/local/bin/swish-e',
         indexes  => 'index.swish-e',
         results  => sub { print $_[1]->as_string,"\n" },
     );

     die $SWISH::errstr unless $sh;

     $hits = $sh->query('metaname=(foo or bar)');

     print $hits ? "Returned $hits documents\n" : 'failed query:' . $sh->errstr . "\n";

     # Variations

     $sh = SWISH->connect('Fork',
         prog     => '/usr/local/bin/swish-e',
         indexes  => \@indexes,
         results  => \&results,      # callback
         headers  => \&headers,
         maxhits  => 200,
         timeout  => 20,
         -e       => undef,      # add just a switch
     );

     $sh = SWISH->connect('Library', %parameters );
     $sh = SWISH->connect('Library', \%parameters );

     $sh = SWISH->connect('Server',
         port     => $port_number,
         host     => $host_name,
         %parameters,
     );

     $hits = $sh->query( $query_string );
     $hits = $sh->query( query => $query_string );

     $hits = $sh->query(
         query       => $query_string,
         results     => \&results,
         headers     => \&headers,
         properties  => [qw/title subject/],
         sortorder   => 'subject',
         startnum    => 100,
         maxhits     => 1000,
     );

     $error_msg = $sh->error unless $hits;

     # Unusual, but might want to use in your headers() callback.
     $sh->abort_query;

     @raw_results = $sh->raw_query( \%query_settings );

     $r = $sh->index( '/path/to/config' );
     $r = $sh->index( \%indexing_settings );

     # If all config settings were stored in the index header
     $r = $sh->reindex;

     $header_array_ref = $sh->indexheaders;

     # returns words as swish sees them for indexing
     $search_words = $sh->swish_words( \$doc );

     $stemmed = $sh->stem_word( $word );

     $sh->disconnect;
     # or an alias:
     $sh->close;


DESCRIPTION
===========

   NOTE: This is alpha code and is not to be used in a production
environment.  Testing and feedback on using this module is *gratefully
appreciated*.

   This module provides a standard interface to the SWISH-E search engine.
With this interface your program can use SWISH-E in the standard
forking/exec method, or with the SWISH-E C library routines, and, if ever
developed, the SWISH-E server with only a small change.

   The idea is that you can change the way your program accesses a SWISH-E
index without having to change your code.  Much, that is.

METHODS
=======

   Most methods will take either a hash, or a reference to a hash as a
named parameter list.  Parameters set in the connect() method will be
defaults, with parameters in other methods overriding the defaults.

connect
     `$sh = SWISH->connect( $access_method, \%params );'

     The connect method uses the `$access_method' to initiate a connection
     with SWISH-E.  What that means depends on the access method.  The
     return value is an object used to access methods below, or undefined
     if failed.  Errors may be retrieved with the package variable
     $SWISH::errstr.

     The SWISH module will load the driver for the type of access
     specified in the access method, if available, by loading the
     `SWISH::$access_method module'.

     Parameters are described below in PARAMETERS, but must include the
     path to the swish binary program if using the File access_method and
     index file(s).  (index files?)

query
     `$hits = $sh->query( query => $query, \%parameters );'

     The query method executes a query and returns the number of hits
     found.  `$hits' is undefined if there is an error.  The last error
     may be retrieved with `$sh->error'.

     query can be passed a single scalar as the search string, a hash, or
     a reference to a hash.  Parameters passed override the defaults
     specified in the connect method.

          Examples:
              $hits = $sh->query( 'foo or bar' );
              $hits = $sh->query( 'subject=(foo or bar)' );
              $hits = $sh->query( query => 'foo or bar' );
              $hits = $sh->query( %parameters );
              $hits = $sh->query( \%parameters );

     It is recommended to use a callback function to receive the search
     results.  See headers below.

raw_query
     A raw_query returns a list containing every output line from the
     query, including index header lines.  This can generate a large list,
     so using query with a callback function is recommended.

          Example:
              @results = $sh->raw_query('foo');

indexheaders
     The indexheaders method accesses the headers from the last query
     call.  Since SWISH may return more than one set of headers (e.g. when
     searching multiple indexes), this method returns a reference to an
     array of hashes.

          Example:
              foreach my $header_set ( @{$sh->indexheaders} ) {
                  print "\nHeaders:\n";
                  print "$_:$header_set->{$_}\n" for sort keys %$header_set;
              }

abort_query
     Calling $sh->abort_query within your callback handlers (results and
     headers) will terminate the current request.  You could also probably
     just die() and get the same results.

index
     ** To Be Implemented **

     The index method creates a swish index file.  You may pass index
     either a path to a SWISH-E configuration file, or reference to a hash
     with the index parameters stored in name => value pairs.

     The parameters in the hash will be written to a temporary file before
     indexing in with the Fork method.  If passing a reference to a hash,
     you may include a key tempfile that specifies the location of the
     temporary file.  Otherwise, /tmp will be assumed.

     If a parameter is not passed it will look in the object for an
     attribute named *indexparam*

reindex
     ** To Be Implemented? **

     This is a wish list method.  The idea is all the indexing parameters
     would be stored in the header on an index so to all one would need to
     do to reindex is call swish with the name of the index file.

stem_word
     ** To Be Implemented **

     stem_word returns the stem of the word passed.  This may be left to a
     separate module, but could be require()d on the fly.  The swish
     stemming routine is needed to highlight search terms when the index
     contains stemmed words.

swish_words
     ** To Be Implemented? **

     swish_words takes a scalar or a reference to a scalar and tokenizes
     the words as swish would do during indexing. The return value is a
     reference to an array where each element is a token.  Each token is
     also a reference to an array where the first element is the word, and
     the second element is a flag indicating if this is an indexable word.
     Confused?

     This requires HTML::Parser (HTML::TokeParser?) to be installed.

     The point of this is for enable phrase highlighting.  You can read
     your source and, if lucky, highlight phrase found in searches.

          Example:
              $words = $sh->swish_words( 'This is a phrase of words' );
                                            0 1 2345   6  7 89  10

          $words->[0][0] is 'This'
          $words->[0][1] is 1 indicating that swish would have this indexed
          $words->[0][2] is 0 this is swish word zero
          $words->[0][3] is the stemmed version of 'This', if using stemming.

          $words->[1][0] is ' '
          $words->[1][1] is 0 indicating that swish would not index
          $words->[1][2] is undef (not a word)
          
          $words->[2][0] is 'is'
          $words->[2][1] is 0 indicating that swish would not index (stop word)
          $words->[2][2] is undef (not a word)

          $words->[6][0] is 'phrase'
          $words->[6][1] is 1 indicating that it is a swish word
          $words->[6][2] is 2 this is the second swish word
                            ('is' and 'a' are stop words)

ACCESS METHODS
==============

   Two access methods are available:  `Fork' and `Library'.

   The Fork method requires a prog parameter passed to the connect class
method.  This parameter specifies the location of the swish-e executable
program

   The *Library* method does not require any special parameters, but does
require that the SWISH::Library module is installed and can be found
within @INC.

   The Server method is a proposed method to access a SWISH-E server.
Required parameters may include port, host, user, and password to gain
access to the SWISH-E server.

PARAMETERS
==========

   Parameters can be specified when starting a swish connection.  The
parameters are stored as defaults within the object and will be used on
each query, unless other overriding parameters are specified in an
individual method call.

   Most parameters have been given longer names (below).  But, any valid
parameter may be specified by using the standard dash followed by a
letter.  That is:

     maxhits => 100,

   is the same as

     -m      => 100,

   And to add just a switch without a parameter:     -e      => undef,

   Keep in mind that not all switches may work with all access methods.
The swish binary may have different options than the swish library.

prog
     prog defines the path to the swish executable.  This is only used in
     the Fork access method.

          Example:
              $parameters{ path } = '/usr/local/bin/swish-e';

indexes
     indexes defines the index files used in the next query or raw_query
     operation.

          Examples:
              $parameters{ indexes } = '/path/to/index.swish-e';
              $parameters{ indexes } = ['/path/to/index.swish-e', '/another/index'];

query
     query defines the search words (-w switch for SWISH-E)

          Example:
              $parameters{ query } = 'keywords=(apples or oranges) and subject=(trees)';

tags or context
     tags (or the alias context) is a string that defines where to look in
     a HTML document (-t switch)

properties
     properties defines which properties to return in the search results.
     Properties must be defined during indexing.  You must pass an array
     reference if using more than one property.

          Examples:
              $sh = query( query => 'foo', properties => 'title' );
              $sh = query( query => 'foo', properties => [qw/title subject/] );

maxhits
     Define the maximum number of results to return.  Currently, If you
     specify more than one index file maxhits is *per index file*.

startnum
     Defines the starting number in the results.  This is used for
     generating paged results.  Should there be pagesize and pagenum
     parameters?

sortorder
     Sorts the results based on properties listed.  Properties must be
     defined during indexing.  You may specify ascending or descending
     sorts in future version of swish.

          Example:
              $parameters{ sortorder } = 'subject';

          # under developement
          $parameters{ sortorder } = [qw/subject category/];
          $parameters{ sortorder } = [qw/subject asc category desc/];

start_date
     ** Not implemented **

     Specify a starting dates in unix seconds.  Only results after this
     date will be returned.

end_date
     ** Not implemented **

     Ending date in unix seconds.

results
     results defines a callback subroutine.  This routine is called for
     each result returned by a query.

          Example:
              $parameters{ results } = \&display_results
              $parameters{ results } = sub { print $_[1]->file, "\n" };

     Two paramaters are passed: the current search object (created by
     connect) and an object blessed into the SWISH::Results class.

          Example:

          sub display_results {
              my ($sh, $hit) = @_;

          # SWISH::Results attributes
          my @show = qw/score file title size position total_hits/;
          
          my %results = map { ($_, $hit->$_) } @show;
          my @properties = @{$hit->{properties}} if $hit->{properties};
          print join( ':', @results{ @show }, @properties ), "\n";
                  }

     The callback routines (results and headers) are called while inside
     an eval block.  If you die within your handlers the program will NOT
     exit, but any message you pass to die() will be available in
     $sh->errstr.  In general, do as little as possible with your callback
     routines.

     The SWISH::Results class is currently within the SWISH module.  This
     may change.

headers
     headers defines a callback subroutine.  This routine is called for
     each result returned by a query.

          Example:
              $parameters{ headers } = \&headers;

     Your subroutine is called with three parameters: the current object,
     and the header and value.

          sub headers {
              my ( $sh, $header, $value ) = @_;
              print "$header: $value\n";
          }

     In general, it will be better to call the headers method.

timeout
     timeout is the number of seconds to wait before aborting a query
     request.  Don't spend too much time in your results callback routine
     if you are using a timeout.  Timeout is emplemented as a $SIG{ALRM}
     handler and funny things happen with perl's signal handlers.

TO DO
=====

   How to detect a new index if library holds the file open?

   Is it ok to change index files on the same object?  Does the library
keep the index file open between requests?

   Interface for Windows platform?

SEE ALSO
========

   http://sunsite.berkeley.edu/SWISH-E/

AUTHOR
======

   Bill Moseley <moseley@hank.org>


File: pm.info,  Node: SWISH/Fork,  Next: SWISH/Stemmer,  Prev: SWISH,  Up: Module List

Perl extension for accessing the SWISH-E search engine via a fork/exec.
***********************************************************************

NAME
====

   SWISH::Fork - Perl extension for accessing the SWISH-E search engine
via a fork/exec.

SYNOPSIS
========

     use SWISH;

     $sh = SWISH->connect('Fork',
        prog     => '/usr/local/bin/swish-e',
        indexes  => 'index.swish-e',
        results  => sub { print $_[1]->as_string,"\n" },
     );

DESCRIPTION
===========

   This module is a driver for the SWISH search engine using the forked
access method.  Please see *Note SWISH: SWISH, for usage instructions.

REQUIRED MODULES
----------------

   SWISH - the front-end for module for accessing the SWISH search engine.

   Sys::Signal - Use instead of `local $SIG{ALRM}' to restore signal
handlers.  Should be fixed in Perl 5.6.1

   Symbol - localized file handles (standard module)

AUTHOR
======

   Bill Moseley - moseley@hank.org

SEE ALSO
========

   *Note SWISH: SWISH,


File: pm.info,  Node: SWISH/Stemmer,  Next: Safe,  Prev: SWISH/Fork,  Up: Module List

Perl extension for stemming words using a modified version of the  Stem routine from the SWISH search engine.  (http://sunsite.berkeley.edu/SWISH-E/)
*****************************************************************************************************************************************************

NAME
====

   Stemmer - Perl extension for stemming words using a modified version of
the Stem routine from the SWISH search engine.
(http://sunsite.berkeley.edu/SWISH-E/)

SYNOPSIS
========

     use SWISH::Stemmer;
     $stemmed_word = SwishStem( $word );

DESCRIPTION
===========

   This module provides access to the Stem() function used in SWISH-E to
stem words.  This module is needed if you wish to highlight words in
source documents.

   Not that this module includes the stemmer.c function from the SWISH-E
distribution.  You should make sure that the stemmer.c file contained here
is closely matched to the stemmer.c module in the SWISH-E distribution.

   Also, there is a SWISH-E library that will contain the Stem() and
SwishStem() functions.  So you may be able to access that library instead
of using this module.  Check the SWISH-E discussion list for more info.

     Purpose:    Implementation of the Porter stemming algorithm documented
                 in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                 Program 14 (3), July 1980, pp. 130-137.

     Provenance: Written by B. Frakes and C. Cox, 1986.
                 Changed by C. Fox, 1990.
                    - made measure function a DFA
                    - restructured structs
                    - renamed functions and variables
                    - restricted function and variable scopes
                 Changed by C. Fox, July, 1991.
                    - added ANSI C declarations
                    - branch tested to 90% coverage

     Notes:      This code will make little sense without the the Porter
                 article.  The stemming function converts its input to
                 lower case.

AUTHOR
======

   Bill Moseley used the stemmer.c from the Swish-e distribution for this
module.

SEE ALSO
========

   perl(1)


File: pm.info,  Node: Safe,  Next: Safe/Hole,  Prev: SWISH/Stemmer,  Up: Module List

Compile and execute code in restricted compartments
***************************************************

NAME
====

   Safe - Compile and execute code in restricted compartments

SYNOPSIS
========

     use Safe;

     $compartment = new Safe;

     $compartment->permit(qw(time sort :browse));

     $result = $compartment->reval($unsafe_code);

DESCRIPTION
===========

   The Safe extension module allows the creation of compartments in which
perl code can be evaluated. Each compartment has

a new namespace
     The "root" of the namespace (i.e. "main::") is changed to a different
     package and code evaluated in the compartment cannot refer to
     variables outside this namespace, even with run-time glob lookups and
     other tricks.

     Code which is compiled outside the compartment can choose to place
     variables into (or share variables with) the compartment's namespace
     and only that data will be visible to code evaluated in the
     compartment.

     By default, the only variables shared with compartments are the
     "underscore" variables $_ and @_ (and, technically, the less
     frequently used %_, the _ filehandle and so on). This is because
     otherwise perl operators which default to $_ will not work and
     neither will the assignment of arguments to @_ on subroutine entry.

an operator mask
     Each compartment has an associated "operator mask". Recall that perl
     code is compiled into an internal format before execution.
     Evaluating perl code (e.g. via "eval" or "do 'file'") causes the code
     to be compiled into an internal format and then, provided there was
     no error in the compilation, executed.  Code evaluated in a
     compartment compiles subject to the compartment's operator mask.
     Attempting to evaluate code in a compartment which contains a masked
     operator will cause the compilation to fail with an error. The code
     will not be executed.

     The default operator mask for a newly created compartment is the
     ':default' optag.

     It is important that you read the Opcode(3) module documentation for
     more information, especially for detailed definitions of opnames,
     optags and opsets.

     Since it is only at the compilation stage that the operator mask
     applies, controlled access to potentially unsafe operations can be
     achieved by having a handle to a wrapper subroutine (written outside
     the compartment) placed into the compartment. For example,

          $cpt = new Safe;
          sub wrapper {
              # vet arguments and perform potentially unsafe operations
          }
          $cpt->share('&wrapper');

WARNING
=======

   The authors make *no warranty*, implied or otherwise, about the
suitability of this software for safety or security purposes.

   The authors shall not in any case be liable for special, incidental,
consequential, indirect or other similar damages arising from the use of
this software.

   Your mileage will vary. If in any doubt *do not use it*.

RECENT CHANGES
--------------

   The interface to the Safe module has changed quite dramatically since
version 1 (as supplied with Perl5.002). Study these pages carefully if you
have code written to use Safe version 1 because you will need to makes
changes.

Methods in class Safe
---------------------

   To create a new compartment, use

     $cpt = new Safe;

   Optional argument is (NAMESPACE), where NAMESPACE is the root namespace
to use for the compartment (defaults to "Safe::Root0", incremented for
each new compartment).

   Note that version 1.00 of the Safe module supported a second optional
parameter, MASK.  That functionality has been withdrawn pending deeper
consideration. Use the permit and deny methods described below.

   The following methods can then be used on the compartment object
returned by the above constructor. The object argument is implicit in each
case.

permit (OP, ...)
     Permit the listed operators to be used when compiling code in the
     compartment (in *addition* to any operators already permitted).

permit_only (OP, ...)
     Permit *only* the listed operators to be used when compiling code in
     the compartment (no other operators are permitted).

deny (OP, ...)
     Deny the listed operators from being used when compiling code in the
     compartment (other operators may still be permitted).

deny_only (OP, ...)
     Deny *only* the listed operators from being used when compiling code
     in the compartment (all other operators will be permitted).

trap (OP, ...)
untrap (OP, ...)
     The trap and untrap methods are synonyms for deny and permit
     respectfully.

share (NAME, ...)
     This shares the variable(s) in the argument list with the compartment.
     This is almost identical to exporting variables using the
     `Exporter(3)' in this node module.

     Each NAME must be the name of a variable, typically with the leading
     type identifier included. A bareword is treated as a function name.

     Examples of legal names are '$foo' for a scalar, '@foo' for an array,
     '%foo' for a hash, '&foo' or 'foo' for a subroutine and '*foo' for a
     glob (i.e.  all symbol table entries associated with "foo", including
     scalar, array, hash, sub and filehandle).

     Each NAME is assumed to be in the calling package. See share_from for
     an alternative method (which share uses).

share_from (PACKAGE, ARRAYREF)
     This method is similar to share() but allows you to explicitly name
     the package that symbols should be shared from. The symbol names
     (including type characters) are supplied as an array reference.

          $safe->share_from('main', [ '$foo', '%bar', 'func' ]);

varglob (VARNAME)
     This returns a glob reference for the symbol table entry of VARNAME in
     the package of the compartment. VARNAME must be the name of a
     variable without any leading type marker. For example,

          $cpt = new Safe 'Root';
          $Root::foo = "Hello world";
          # Equivalent version which doesn't need to know $cpt's package name:
          ${$cpt->varglob('foo')} = "Hello world";

reval (STRING)
     This evaluates STRING as perl code inside the compartment.

     The code can only see the compartment's namespace (as returned by the
     root method). The compartment's root package appears to be the
     `main::' package to the code inside the compartment.

     Any attempt by the code in STRING to use an operator which is not
     permitted by the compartment will cause an error (at run-time of the
     main program but at compile-time for the code in STRING).  The error
     is of the form "%s trapped by operation mask operation...".

     If an operation is trapped in this way, then the code in STRING will
     not be executed. If such a trapped operation occurs or any other
     compile-time or return error, then $@ is set to the error message,
     just as with an eval().

     If there is no error, then the method returns the value of the last
     expression evaluated, or a return statement may be used, just as with
     subroutines and *eval()*. The context (list or scalar) is determined
     by the caller as usual.

     This behaviour differs from the beta distribution of the Safe
     extension where earlier versions of perl made it hard to mimic the
     return behaviour of the eval() command and the context was always
     scalar.

     Some points to note:

     If the entereval op is permitted then the code can use eval "..." to
     'hide' code which might use denied ops. This is not a major problem
     since when the code tries to execute the eval it will fail because the
     opmask is still in effect. However this technique would allow clever,
     and possibly harmful, code to 'probe' the boundaries of what is
     possible.

     Any string eval which is executed by code executing in a compartment,
     or by code called from code executing in a compartment, will be eval'd
     in the namespace of the compartment. This is potentially a serious
     problem.

     Consider a function foo() in package pkg compiled outside a
     compartment but shared with it. Assume the compartment has a root
     package called 'Root'. If foo() contains an eval statement like eval
     '$foo = 1' then, normally, $pkg::foo will be set to 1.  If foo() is
     called from the compartment (by whatever means) then instead of
     setting $pkg::foo, the eval will actually set $Root::pkg::foo.

     This can easily be demonstrated by using a module, such as the Socket
     module, which uses eval "..." as part of an AUTOLOAD function. You can
     'use' the module outside the compartment and share an (autoloaded)
     function with the compartment. If an autoload is triggered by code in
     the compartment, or by any code anywhere that is called by any means
     from the compartment, then the eval in the Socket module's AUTOLOAD
     function happens in the namespace of the compartment. Any variables
     created or used by the eval'd code are now under the control of the
     code in the compartment.

     A similar effect applies to all runtime symbol lookups in code called
     from a compartment but not compiled within it.

rdo (FILENAME)
     This evaluates the contents of file FILENAME inside the compartment.
     See above documentation on the reval method for further details.

root (NAMESPACE)
     This method returns the name of the package that is the root of the
     compartment's namespace.

     Note that this behaviour differs from version 1.00 of the Safe module
     where the root module could be used to change the namespace. That
     functionality has been withdrawn pending deeper consideration.

mask (MASK)
     This is a get-or-set method for the compartment's operator mask.

     With no MASK argument present, it returns the current operator mask of
     the compartment.

     With the MASK argument present, it sets the operator mask for the
     compartment (equivalent to calling the deny_only method).

Some Safety Issues
------------------

   This section is currently just an outline of some of the things code in
a compartment might do (intentionally or unintentionally) which can have
an effect outside the compartment.

Memory
     Consuming all (or nearly all) available memory.

CPU
     Causing infinite loops etc.

Snooping
     Copying private information out of your system. Even something as
     simple as your user name is of value to others. Much useful
     information could be gleaned from your environment variables for
     example.

Signals
     Causing signals (especially SIGFPE and SIGALARM) to affect your
     process.

     Setting up a signal handler will need to be carefully considered and
     controlled.  What mask is in effect when a signal handler gets
     called?  If a user can get an imported function to get an exception
     and call the user's signal handler, does that user's restricted mask
     get re-instated before the handler is called?  Does an imported
     handler get called with its original mask or the user's one?

State Changes
     Ops such as chdir obviously effect the process as a whole and not just
     the code in the compartment. Ops such as rand and srand have a similar
     but more subtle effect.

AUTHOR
------

   Originally designed and implemented by Malcolm Beattie,
mbeattie@sable.ox.ac.uk.

   Reworked to use the Opcode module and other changes added by Tim Bunce
<`Tim.Bunce@ig.co.uk'>.