This is /home/pdm/install/Python-2.1/Doc/ext/python-ext.info, produced
by makeinfo version 4.0 from ext.texi.

   April 15, 2001		2.1


File: python-ext.info,  Node: Building Arbitrary Values,  Next: Reference Counts,  Prev: Keyword Parameters for Extension Functions,  Up: Extending Python with C or C++

Building Arbitrary Values
=========================

   This function is the counterpart to `PyArg_ParseTuple()'.  It is
declared as follows:

     PyObject *Py_BuildValue(char *format, ...);

   It recognizes a set of format units similar to the ones recognized by
`PyArg_ParseTuple()', but the arguments (which are input to the
function, not output) must not be pointers, just values.  It returns a
new Python object, suitable for returning from a C function called from
Python.

   One difference with `PyArg_ParseTuple()': while the latter requires
its first argument to be a tuple (since Python argument lists are
always represented as tuples internally), `Py_BuildValue()' does not
always build a tuple.  It builds a tuple only if its format string
contains two or more format units.  If the format string is empty, it
returns `None'; if it contains exactly one format unit, it returns
whatever object is described by that format unit.  To force it to
return a tuple of size zero or one, parenthesize the format string.
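
   The tuple rule above can be tried out without the interpreter.  The
following is only a sketch in plain C: `count_top_level_units()' and
`predicted_result()' are hypothetical helpers, not part of the Python
API, and they do no validation of the format string.  They count the
top-level format units and report which of the three cases applies.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not part of the Python API: count the top-level format
   units in a Py_BuildValue()-style format string.  A parenthesized,
   bracketed or braced group counts as one unit; '#' and '&' modify the
   preceding unit; space, tab, colon and comma are separators. */
static int
count_top_level_units(const char *format)
{
    int depth = 0;   /* nesting level inside (), [] or {} */
    int units = 0;
    const char *p;

    assert(format != NULL);
    for (p = format; *p != '\0'; p++) {
        switch (*p) {
        case '(': case '[': case '{':
            if (depth == 0)
                units++;        /* the whole group is one unit */
            depth++;
            break;
        case ')': case ']': case '}':
            depth--;
            break;
        case ' ': case '\t': case ':': case ',':
        case '#': case '&':
            break;              /* separators and modifiers */
        default:
            if (depth == 0)
                units++;
            break;
        }
    }
    return units;
}

/* The three cases described in the text. */
enum result_kind { RESULT_NONE, RESULT_SINGLE, RESULT_TUPLE };

static enum result_kind
predicted_result(const char *format)
{
    int n = count_top_level_units(format);

    if (n == 0)
        return RESULT_NONE;
    return n == 1 ? RESULT_SINGLE : RESULT_TUPLE;
}
```

   Note that `"(i)"' is reported as `RESULT_SINGLE': it is one format
unit, and that unit happens to describe a tuple.  This is why
parenthesizing the format string forces a tuple of size zero or one.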

   When memory buffers are passed as parameters to supply data to build
objects, as for the `s' and `s#' formats, the required data is copied.
Buffers provided by the caller are never referenced by the objects
created by `Py_BuildValue()'.  In other words, if your code invokes
`malloc()' and passes the allocated memory to `Py_BuildValue()', your
code is responsible for calling `free()' for that memory once
`Py_BuildValue()' returns.

   In the following description, the quoted form is the format unit; the
entry in (round) parentheses is the Python object type that the format
unit will return; and the entry in [square] brackets is the type of the
C value(s) to be passed.

   The characters space, tab, colon and comma are ignored in format
strings (but not within format units such as `s#').  This can be used
to make long format strings a tad more readable.

`s' (string) [char *]
     Convert a null-terminated C string to a Python object.  If the C
     string pointer is `NULL', `None' is used.

`s#' (string) [char *, int]
     Convert a C string and its length to a Python object.  If the C
     string pointer is `NULL', the length is ignored and `None' is
     returned.

`z' (string or `None') [char *]
     Same as `s'.

`z#' (string or `None') [char *, int]
     Same as `s#'.

`u' (Unicode string) [Py_UNICODE *]
     Convert a null-terminated buffer of Unicode (UCS-2) data to a
     Python Unicode object.  If the Unicode buffer pointer is `NULL',
     `None' is returned.

`u#' (Unicode string) [Py_UNICODE *, int]
     Convert a Unicode (UCS-2) data buffer and its length to a Python
     Unicode object.   If the Unicode buffer pointer is `NULL', the
     length is ignored and `None' is returned.

`i' (integer) [int]
     Convert a plain C `int' to a Python integer object.

`b' (integer) [char]
     Same as `i'.

`h' (integer) [short int]
     Same as `i'.

`l' (integer) [long int]
     Convert a C `long int' to a Python integer object.

`c' (string of length 1) [char]
     Convert a C `int' representing a character to a Python string of
     length 1.

`d' (float) [double]
     Convert a C `double' to a Python floating point number.

`f' (float) [float]
     Same as `d'.

`D' (complex) [Py_complex *]
     Convert a C `Py_complex' structure to a Python complex number.

`O' (object) [PyObject *]
     Pass a Python object untouched (except for its reference count,
     which is incremented by one).  If the object passed in is a `NULL'
     pointer, it is assumed that this was caused because the call
     producing the argument found an error and set an exception.
     Therefore, `Py_BuildValue()' will return `NULL' but won't raise an
     exception.  If no exception has been raised yet,
     `PyExc_SystemError' is set.

`S' (object) [PyObject *]
     Same as `O'.

`U' (object) [PyObject *]
     Same as `O'.

`N' (object) [PyObject *]
     Same as `O', except it doesn't increment the reference count on
     the object.  Useful when the object is created by a call to an
     object constructor in the argument list.

`O&' (object) [CONVERTER, ANYTHING]
     Convert ANYTHING to a Python object through a CONVERTER function.
     The function is called with ANYTHING (which should be compatible
     with `void *') as its argument and should return a "new" Python
     object, or `NULL' if an error occurred.

`(ITEMS)' (tuple) [MATCHING-ITEMS]
     Convert a sequence of C values to a Python tuple with the same
     number of items.

`[ITEMS]' (list) [MATCHING-ITEMS]
     Convert a sequence of C values to a Python list with the same
     number of items.

`{ITEMS}' (dictionary) [MATCHING-ITEMS]
     Convert a sequence of C values to a Python dictionary.  Each pair
     of consecutive C values adds one item to the dictionary, serving
     as key and value, respectively.

   If there is an error in the format string, the `PyExc_SystemError'
exception is raised and `NULL' returned.

   Examples (to the left the call, to the right the resulting Python
value):

         Py_BuildValue("")                        None
         Py_BuildValue("i", 123)                  123
         Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
         Py_BuildValue("s", "hello")              'hello'
         Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
         Py_BuildValue("s#", "hello", 4)          'hell'
         Py_BuildValue("()")                      ()
         Py_BuildValue("(i)", 123)                (123,)
         Py_BuildValue("(ii)", 123, 456)          (123, 456)
         Py_BuildValue("(i,i)", 123, 456)         (123, 456)
         Py_BuildValue("[i,i]", 123, 456)         [123, 456]
         Py_BuildValue("{s:i,s:i}",
                       "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
         Py_BuildValue("((ii)(ii)) (ii)",
                       1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))


File: python-ext.info,  Node: Reference Counts,  Next: Writing Extensions in C++,  Prev: Building Arbitrary Values,  Up: Extending Python with C or C++

Reference Counts
================

   In languages like C or C++, the programmer is responsible for
dynamic allocation and deallocation of memory on the heap.  In C, this
is done using the functions `malloc()' and `free()'.  In C++, the
operators `new' and `delete' are used with essentially the same
meaning; they are typically implemented using `malloc()' and `free()',
so we'll restrict the following discussion to the latter.

   Every block of memory allocated with `malloc()' should eventually be
returned to the pool of available memory by exactly one call to
`free()'.  It is important to call `free()' at the right time.  If a
block's address is forgotten but `free()' is not called for it, the
memory it occupies cannot be reused until the program terminates.  This
is called a "memory leak".  On the other hand, if a program calls
`free()' for a block and then continues to use the block, it creates a
conflict with re-use of the block through another `malloc()' call.
This is called "using freed memory".  It has the same bad consequences
as referencing uninitialized data -- core dumps, wrong results,
mysterious crashes.

   Common causes of memory leaks are unusual paths through the code.
For instance, a function may allocate a block of memory, do some
calculation, and then free the block again.  Now a change in the
requirements for the function may add a test to the calculation that
detects an error condition and can return prematurely from the
function.  It's easy to forget to free the allocated memory block when
taking this premature exit, especially when it is added later to the
code.  Such leaks, once introduced, often go undetected for a long
time: the error exit is taken only in a small fraction of all calls,
and most modern machines have plenty of virtual memory, so the leak
only becomes apparent in a long-running process that uses the leaking
function frequently.  Therefore, it's important to prevent leaks from
happening by having a coding convention or strategy that minimizes this
kind of error.
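
   One such convention is to route every exit through a single cleanup
point.  The sketch below shows the idea in plain C; `tracked_malloc()',
`tracked_free()', `live_blocks' and `count_digits()' are names invented
for this illustration, and the counter exists only so that the absence
of a leak can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping so the example can show that nothing leaks:
   live_blocks counts allocations that have not been freed yet. */
static int live_blocks = 0;

static void *
tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void
tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Copy `text' into a scratch buffer and return the number of digits in
   it, or -1 on error.  The empty-string test plays the role of the
   error check that was "added later"; because every exit after the
   allocation goes through the `done' label, the buffer is freed on all
   paths. */
static int
count_digits(const char *text)
{
    int result = -1;
    size_t i, len;
    char *scratch;

    assert(text != NULL);
    len = strlen(text);
    scratch = tracked_malloc(len + 1);
    if (scratch == NULL)
        return -1;              /* nothing allocated, nothing to free */
    memcpy(scratch, text, len + 1);

    if (len == 0)
        goto done;              /* premature exit: still frees scratch */

    result = 0;
    for (i = 0; i < len; i++) {
        if (scratch[i] >= '0' && scratch[i] <= '9')
            result++;
    }

done:
    tracked_free(scratch);
    return result;
}
```

   After any call to `count_digits()', successful or not, `live_blocks'
is back at zero.  Had the premature exit been written as a plain
`return -1;', the scratch buffer would have leaked on that path only.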

   Since Python makes heavy use of `malloc()' and `free()', it needs a
strategy to avoid memory leaks as well as the use of freed memory.  The
chosen method is called "reference counting".  The principle is simple:
every object contains a counter, which is incremented when a reference
to the object is stored somewhere, and which is decremented when a
reference to it is deleted.  When the counter reaches zero, the last
reference to the object has been deleted and the object is freed.
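
   The principle fits in a few lines of C.  This is a toy model only:
`toy_object', `toy_new()', `toy_incref()' and `toy_decref()' are
made-up names, and the layout is not that of a real Python object.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy object, not Python's real layout: a counter plus a payload. */
typedef struct {
    int refcount;
    int value;
} toy_object;

static int toy_freed = 0;   /* how many objects have been freed so far */

/* Create an object; the caller receives the first (and only) reference. */
static toy_object *
toy_new(int value)
{
    toy_object *op = malloc(sizeof(toy_object));
    if (op == NULL)
        return NULL;
    op->refcount = 1;
    op->value = value;
    return op;
}

/* A reference is being stored somewhere: increment the counter. */
static void
toy_incref(toy_object *op)
{
    assert(op != NULL && op->refcount > 0);
    op->refcount++;
}

/* A reference is being deleted: decrement, and free at zero. */
static void
toy_decref(toy_object *op)
{
    assert(op != NULL && op->refcount > 0);
    if (--op->refcount == 0) {
        toy_freed++;
        free(op);
    }
}
```

   An object created with `toy_new()' and then passed once to
`toy_incref()' survives the first `toy_decref()' and is freed by the
second.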

   An alternative strategy is called "automatic garbage collection".
(Sometimes, reference counting is also referred to as a garbage
collection strategy, hence my use of "automatic" to distinguish the
two.)  The big advantage of automatic garbage collection is that the
user doesn't need to call `free()' explicitly.  (Another claimed
advantage is an improvement in speed or memory usage -- this is no hard
fact however.)  The disadvantage is that for C, there is no truly
portable automatic garbage collector, while reference counting can be
implemented portably (as long as the functions `malloc()' and `free()'
are available -- which the C Standard guarantees).  Maybe some day a
sufficiently portable automatic garbage collector will be available for
C.  Until then, we'll have to live with reference counts.

* Menu:

* Reference Counting in Python::
* Ownership Rules::
* Thin Ice::
* NULL Pointers::


File: python-ext.info,  Node: Reference Counting in Python,  Next: Ownership Rules,  Prev: Reference Counts,  Up: Reference Counts

Reference Counting in Python
----------------------------

   There are two macros, `Py_INCREF(x)' and `Py_DECREF(x)', which
handle the incrementing and decrementing of the reference count.
`Py_DECREF()' also frees the object when the count reaches zero.  For
flexibility, it doesn't call `free()' directly -- rather, it makes a
call through a function pointer in the object's "type object".  For
this purpose (and others), every object also contains a pointer to its
type object.

   The big question now remains: when to use `Py_INCREF(x)' and
`Py_DECREF(x)'?  Let's first introduce some terms.  Nobody "owns" an
object; however, you can "own a reference" to an object.  An object's
reference count is now defined as the number of owned references to it.
The owner of a reference is responsible for calling `Py_DECREF()' when
the reference is no longer needed.  Ownership of a reference can be
transferred.  There are three ways to dispose of an owned reference:
pass it on, store it, or call `Py_DECREF()'.  Forgetting to dispose of
an owned reference creates a memory leak.

   It is also possible to "borrow"(1) a reference to an object.  The
borrower of a reference should not call `Py_DECREF()'.  The borrower
must not hold on to the object longer than the owner from which it was
borrowed.  Using a borrowed reference after the owner has disposed of
it risks using freed memory and should be avoided completely.(2)

   The advantage of borrowing over owning a reference is that you don't
need to take care of disposing of the reference on all possible paths
through the code -- in other words, with a borrowed reference you don't
run the risk of leaking when a premature exit is taken.  The
disadvantage of borrowing over owning is that there are some subtle
situations where in seemingly correct code a borrowed reference can be
used after the owner from which it was borrowed has in fact disposed of
it.

   A borrowed reference can be changed into an owned reference by
calling `Py_INCREF()'.  This does not affect the status of the owner
from which the reference was borrowed -- it creates a new owned
reference, and gives full owner responsibilities (i.e., the new owner
must dispose of the reference properly, as well as the previous owner).

   ---------- Footnotes ----------

   (1) The metaphor of "borrowing" a reference is not completely
correct: the owner still has a copy of the reference.

   (2) Checking that the reference count is at least 1 *does not work*
-- the reference count itself could be in freed memory and may thus be
reused for another object!


File: python-ext.info,  Node: Ownership Rules,  Next: Thin Ice,  Prev: Reference Counting in Python,  Up: Reference Counts

Ownership Rules
---------------

   Whenever an object reference is passed into or out of a function, it
is part of the function's interface specification whether ownership is
transferred with the reference or not.

   Most functions that return a reference to an object pass on ownership
with the reference.  In particular, all functions whose purpose is to
create a new object, e.g. `PyInt_FromLong()' and `Py_BuildValue()',
pass ownership to the receiver.  Even when the object you receive is
not brand new, you still receive ownership of the reference.  For
instance, `PyInt_FromLong()' maintains a cache of popular values and
can return a reference to a cached item.

   Many functions that extract objects from other objects also transfer
ownership with the reference, for instance `PyObject_GetAttrString()'.
Here, however, the picture is less clear, since a few common routines
are exceptions: `PyTuple_GetItem()', `PyList_GetItem()',
`PyDict_GetItem()', and `PyDict_GetItemString()' all return references
that you borrow from the tuple, list or dictionary.

   The function `PyImport_AddModule()' also returns a borrowed
reference, even though it may actually create the object it returns:
this is possible because an owned reference to the object is stored in
`sys.modules'.

   When you pass an object reference into another function, in general,
the function borrows the reference from you -- if it needs to store it,
it will use `Py_INCREF()' to become an independent owner.  There are
exactly two important exceptions to this rule: `PyTuple_SetItem()' and
`PyList_SetItem()'.  These functions take over ownership of the item
passed to them -- even if they fail!  (Note that `PyDict_SetItem()' and
friends don't take over ownership -- they are "normal.")

   When a C function is called from Python, it borrows references to its
arguments from the caller.  The caller owns a reference to the object,
so the borrowed reference's lifetime is guaranteed until the function
returns.  Only when such a borrowed reference must be stored or passed
on must it be turned into an owned reference by calling `Py_INCREF()'.

   The object reference returned from a C function that is called from
Python must be an owned reference -- ownership is transferred from the
function to its caller.


File: python-ext.info,  Node: Thin Ice,  Next: NULL Pointers,  Prev: Ownership Rules,  Up: Reference Counts

Thin Ice
--------

   There are a few situations where seemingly harmless use of a borrowed
reference can lead to problems.  These all have to do with implicit
invocations of the interpreter, which can cause the owner of a
reference to dispose of it.

   The first and most important case to know about is using
`Py_DECREF()' on an unrelated object while borrowing a reference to a
list item.  For instance:

     void bug(PyObject *list) {
         PyObject *item = PyList_GetItem(list, 0);
     
         PyList_SetItem(list, 1, PyInt_FromLong(0L));
         PyObject_Print(item, stdout, 0); /* BUG! */
     }

   This function first borrows a reference to `list[0]', then replaces
`list[1]' with the value `0', and finally prints the borrowed
reference.  Looks harmless, right?  But it's not!

   Let's follow the control flow into `PyList_SetItem()'.  The list
owns references to all its items, so when item 1 is replaced, it has to
dispose of the original item 1.  Now let's suppose the original item 1
was an instance of a user-defined class, and let's further suppose that
the class defined a `__del__()' method.  If this class instance has a
reference count of 1, disposing of it will call its `__del__()' method.

   Since it is written in Python, the `__del__()' method can execute
arbitrary Python code.  Could it perhaps do something to invalidate the
reference to `item' in `bug()'?  You bet!  Assuming that the list
passed into `bug()' is accessible to the `__del__()' method, it could
execute a statement to the effect of `del list[0]', and assuming this
was the last reference to that object, it would free the memory
associated with it, thereby invalidating `item'.

   The solution, once you know the source of the problem, is easy:
temporarily increment the reference count.  The correct version of the
function reads:

     void no_bug(PyObject *list) {
         PyObject *item = PyList_GetItem(list, 0);
     
         Py_INCREF(item);
         PyList_SetItem(list, 1, PyInt_FromLong(0L));
         PyObject_Print(item, stdout, 0);
         Py_DECREF(item);
     }

   This is a true story.  An older version of Python contained variants
of this bug and someone spent a considerable amount of time in a C
debugger to figure out why his `__del__()' methods would fail...
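
   The hazard can be reproduced without the interpreter.  The
following plain C sketch is a toy model under stated assumptions: a
two-slot global array stands in for the list, a function pointer stands
in for `__del__()', and all `toy_' names are invented for the
illustration.  It mirrors `no_bug()' by taking its own reference before
replacing item 1.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_object toy_object;
struct toy_object {
    int refcount;
    int value;
    void (*del)(toy_object *);  /* stands in for a __del__() method */
};

static toy_object *toy_list[2];  /* stands in for a two-item list */
static int toy_freed = 0;

static toy_object *
toy_new(int value, void (*del)(toy_object *))
{
    toy_object *op = malloc(sizeof(toy_object));
    if (op == NULL)
        abort();                /* keep the sketch short */
    op->refcount = 1;
    op->value = value;
    op->del = del;
    return op;
}

static void
toy_decref(toy_object *op)
{
    assert(op != NULL && op->refcount > 0);
    if (--op->refcount == 0) {
        if (op->del != NULL)
            op->del(op);        /* may run arbitrary code */
        toy_freed++;
        free(op);
    }
}

/* Replace list[i]; like PyList_SetItem(), takes over the reference to
   `item' and disposes of the item previously stored there. */
static void
toy_list_set(int i, toy_object *item)
{
    toy_object *old = toy_list[i];
    toy_list[i] = item;
    if (old != NULL)
        toy_decref(old);
}

/* The "__del__" of item 1: does the equivalent of `del list[0]'. */
static void
delete_item_zero(toy_object *self)
{
    (void)self;
    toy_list_set(0, NULL);
}

/* Mirrors no_bug(): protect the borrowed reference before calling
   anything that might dispose of the list's own reference. */
static int
read_item_zero_safely(void)
{
    toy_object *item = toy_list[0];   /* borrowed from the list */
    int value;

    item->refcount++;                 /* now we own a reference */
    toy_list_set(1, toy_new(0, NULL));
    value = item->value;              /* safe: object is still alive */
    toy_decref(item);
    return value;
}
```

   If the `item->refcount++' line is removed (together with the
matching `toy_decref(item)'), the destructor of the old item 1 frees
item 0 while `item' still points at it, and the read of `item->value'
touches freed memory.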

   The second case of problems with a borrowed reference is a variant
involving threads.  Normally, multiple threads in the Python
interpreter can't get in each other's way, because there is a global
lock protecting Python's entire object space.  However, it is possible
to temporarily release this lock using the macro
`Py_BEGIN_ALLOW_THREADS', and to re-acquire it using
`Py_END_ALLOW_THREADS'.  This is common around blocking I/O calls, to
let other threads use the CPU while waiting for the I/O to complete.
Obviously, the following function has the same problem as the previous
one:

     void bug(PyObject *list) {
         PyObject *item = PyList_GetItem(list, 0);
         Py_BEGIN_ALLOW_THREADS
         ...some blocking I/O call...
         Py_END_ALLOW_THREADS
         PyObject_Print(item, stdout, 0); /* BUG! */
     }


File: python-ext.info,  Node: NULL Pointers,  Prev: Thin Ice,  Up: Reference Counts

NULL Pointers
-------------

   In general, functions that take object references as arguments do not
expect you to pass them `NULL' pointers, and will dump core (or cause
later core dumps) if you do so.  Functions that return object
references generally return `NULL' only to indicate that an exception
occurred.  The reason for not testing for `NULL' arguments is that
functions often pass the objects they receive on to other functions --
if each function were to test for `NULL', there would be a lot of
redundant tests and the code would run more slowly.

   It is better to test for `NULL' only at the "source", i.e. when a
pointer that may be `NULL' is received, e.g. from `malloc()' or from a
function that may raise an exception.

   The macros `Py_INCREF()' and `Py_DECREF()' do not check for `NULL'
pointers -- however, their variants `Py_XINCREF()' and `Py_XDECREF()'
do.
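
   As a rough illustration of testing only at the source, here is a
plain C sketch.  All names in it are hypothetical; `make_buffer()'
plays the role of a call that may fail, and `release_if_set()' only
mimics the shape of `Py_XDECREF()' (the C library's `free()' already
accepts `NULL').

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The "source": may return NULL, so its result must be checked. */
static char *
make_buffer(size_t size)
{
    if (size == 0)
        return NULL;            /* simulated failure */
    return calloc(size, 1);
}

/* Inner helpers assume a valid pointer and do not test at run time;
   the assert() only documents the assumption and is compiled out
   when NDEBUG is defined. */
static void
fill(char *buf, size_t size, char c)
{
    assert(buf != NULL);
    memset(buf, c, size);
}

static unsigned long
checksum(const char *buf, size_t size)
{
    unsigned long sum = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < size; i++)
        sum += (unsigned char)buf[i];
    return sum;
}

/* In the spirit of Py_XDECREF(): a release that tolerates NULL. */
static void
release_if_set(char *buf)
{
    if (buf != NULL)
        free(buf);
}

/* Returns 0 on success, -1 if the buffer could not be obtained. */
static int
filled_checksum(size_t size, char c, unsigned long *out)
{
    char *buf = make_buffer(size);

    if (buf == NULL)            /* the one and only NULL test */
        return -1;
    fill(buf, size, c);
    *out = checksum(buf, size);
    release_if_set(buf);
    return 0;
}
```

   Only `filled_checksum()' tests the pointer, because it is the only
place where a `NULL' can enter; the helpers it calls stay free of
redundant checks.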

   The macros for checking for a particular object type
(`PyTYPE_Check()') don't check for `NULL' pointers -- again, there is
much code that calls several of these in a row to test an object
against various different expected types, and this would generate
redundant tests.  There are no variants with `NULL' checking.

   The C function calling mechanism guarantees that the argument list
passed to C functions (`args' in the examples) is never `NULL' -- in
fact it guarantees that it is always a tuple.(1)

   It is a severe error to ever let a `NULL' pointer "escape" to the
Python user.

   ---------- Footnotes ----------

   (1)  These guarantees don't hold when you use the "old" style
calling convention -- this is still found in much existing code.


File: python-ext.info,  Node: Writing Extensions in C++,  Next: Providing a C API for an Extension Module,  Prev: Reference Counts,  Up: Extending Python with C or C++

Writing Extensions in C++
=========================

   It is possible to write extension modules in C++.  Some restrictions
apply.  If the main program (the Python interpreter) is compiled and
linked by the C compiler, global or static objects with constructors
cannot be used.  This is not a problem if the main program is linked by
the C++ compiler.  Functions that will be called by the Python
interpreter (in particular, module initialization functions) have to be
declared using `extern "C"'.  It is unnecessary to enclose the Python
header files in `extern "C" {...}' -- they use this form already if the
symbol `__cplusplus' is defined (all recent C++ compilers define this
symbol).


File: python-ext.info,  Node: Providing a C API for an Extension Module,  Prev: Writing Extensions in C++,  Up: Extending Python with C or C++

Providing a C API for an Extension Module
=========================================

   This section was written by Konrad Hinsen <hinsen@cnrs-orleans.fr>.
Many extension modules just provide new functions and types to be used
from Python, but sometimes the code in an extension module can be
useful for other extension modules. For example, an extension module
could implement a type "collection" which works like lists without
order. Just like the standard Python list type has a C API which
permits extension modules to create and manipulate lists, this new
collection type should have a set of C functions for direct
manipulation from other extension modules.

   At first sight this seems easy: just write the functions (without
declaring them `static', of course), provide an appropriate header
file, and document the C API. And in fact this would work if all
extension modules were always linked statically with the Python
interpreter. When modules are used as shared libraries, however, the
symbols defined in one module may not be visible to another module.
The details of visibility depend on the operating system; some systems
use one global namespace for the Python interpreter and all extension
modules (e.g. Windows), whereas others require an explicit list of
imported symbols at module link time (e.g. AIX), or offer a choice of
different strategies (most Unices). And even if symbols are globally
visible, the module whose functions one wishes to call might not have
been loaded yet!

   Portability therefore requires making no assumptions about
symbol visibility. This means that all symbols in extension modules
should be declared `static', except for the module's initialization
function, in order to avoid name clashes with other extension modules
(as discussed in section *Note Module's Method Table and Initialization
Function::). And it means that symbols that _should_ be accessible from
other extension modules must be exported in a different way.

   Python provides a special mechanism to pass C-level information (i.e.
pointers) from one extension module to another one: CObjects.  A
CObject is a Python data type which stores a pointer (`void *').
CObjects can only be created and accessed via their C API, but they can
be passed around like any other Python object. In particular, they can
be assigned to a name in an extension module's namespace.  Other
extension modules can then import this module, retrieve the value of
this name, and then retrieve the pointer from the CObject.

   There are many ways in which CObjects can be used to export the C API
of an extension module. Each name could get its own CObject, or all C
API pointers could be stored in an array whose address is published in
a CObject. And the various tasks of storing and retrieving the pointers
can be distributed in different ways between the module providing the
code and the client modules.

   The following example demonstrates an approach that puts most of the
burden on the writer of the exporting module, which is appropriate for
commonly used library modules. It stores all C API pointers (just one
in the example!) in an array of `void' pointers which becomes the value
of a CObject. The header file corresponding to the module provides a
macro that takes care of importing the module and retrieving its C API
pointers; client modules only have to call this macro before accessing
the C API.

   The exporting module is a modification of the `spam' module from
section *Note A Simple Example::. The function `spam.system()' does not
call the C library function `system()' directly, but a function
`PySpam_System()', which would of course do something more complicated
in reality (such as adding "spam" to every command). This function
`PySpam_System()' is also exported to other extension modules.

   The function `PySpam_System()' is a plain C function, declared
`static' like everything else:

     static int
     PySpam_System(char *command)
     {
         return system(command);
     }

   The function `spam_system()' is modified in a trivial way:

     static PyObject *
     spam_system(PyObject *self, PyObject *args)
     {
         char *command;
         int sts;
     
         if (!PyArg_ParseTuple(args, "s", &command))
             return NULL;
         sts = PySpam_System(command);
         return Py_BuildValue("i", sts);
     }

   In the beginning of the module, right after the line

     #include "Python.h"

   two more lines must be added:

     #define SPAM_MODULE
     #include "spammodule.h"

   The `#define' is used to tell the header file that it is being
included in the exporting module, not a client module. Finally, the
module's initialization function must take care of initializing the C
API pointer array:

     void
     initspam(void)
     {
         PyObject *m;
         static void *PySpam_API[PySpam_API_pointers];
         PyObject *c_api_object;
     
         m = Py_InitModule("spam", SpamMethods);
     
         /* Initialize the C API pointer array */
         PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;
     
         /* Create a CObject containing the API pointer array's address */
         c_api_object = PyCObject_FromVoidPtr((void *)PySpam_API, NULL);
     
         if (c_api_object != NULL) {
             /* Create a name for this object in the module's namespace */
             PyObject *d = PyModule_GetDict(m);
     
             PyDict_SetItemString(d, "_C_API", c_api_object);
             Py_DECREF(c_api_object);
         }
     }

   Note that `PySpam_API' is declared `static'; otherwise the pointer
array would disappear when `initspam' terminates!

   The bulk of the work is in the header file `spammodule.h', which
looks like this:

     #ifndef Py_SPAMMODULE_H
     #define Py_SPAMMODULE_H
     #ifdef __cplusplus
     extern "C" {
     #endif
     
     /* Header file for spammodule */
     
     /* C API functions */
     #define PySpam_System_NUM 0
     #define PySpam_System_RETURN int
     #define PySpam_System_PROTO (char *command)
     
     /* Total number of C API pointers */
     #define PySpam_API_pointers 1
     
     #ifdef SPAM_MODULE
     /* This section is used when compiling spammodule.c */
     
     static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;
     
     #else
     /* This section is used in modules that use spammodule's API */
     
     static void **PySpam_API;
     
     #define PySpam_System \
      (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])
     
     #define import_spam() \
     { \
       PyObject *module = PyImport_ImportModule("spam"); \
       if (module != NULL) { \
         PyObject *module_dict = PyModule_GetDict(module); \
         PyObject *c_api_object = PyDict_GetItemString(module_dict, "_C_API"); \
         /* c_api_object is borrowed and is NULL if "_C_API" is missing */ \
         if (c_api_object != NULL && PyCObject_Check(c_api_object)) { \
           PySpam_API = (void **)PyCObject_AsVoidPtr(c_api_object); \
         } \
         Py_DECREF(module);  /* PyImport_ImportModule() gave us a new reference */ \
       } \
     }
     
     #endif
     
     #ifdef __cplusplus
     }
     #endif
     
     #endif /* !defined(Py_SPAMMODULE_H) */

   All that a client module must do in order to have access to the
function `PySpam_System()' is to call the function (or rather macro)
`import_spam()' in its initialization function:

     void
     initclient(void)
     {
         Py_InitModule("client", ClientMethods);
         import_spam();
     }

   The main disadvantage of this approach is that the file
`spammodule.h' is rather complicated. However, the basic structure is
the same for each function that is exported, so it has to be learned
only once.
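
   The core of that structure, a table of pointers plus a macro that
casts an entry back to its real type, can be studied in isolation.  The
sketch below is plain C without Python; every `toy_' and `ToySpam_'
name is invented for the illustration.  As a deliberate deviation from
`spammodule.h', the table holds generic function pointers rather than
`void *', so that the sketch stays within ISO C.

```c
#include <assert.h>
#include <string.h>

/* Generic function pointer type used for the table entries.  (An
   assumption of this sketch; spammodule.h stores `void *' instead.) */
typedef void (*toy_fn)(void);

/* --- what the exporting side would publish ---------------------- */
#define ToySpam_Length_NUM 0
#define ToySpam_Length_RETURN int
#define ToySpam_Length_PROTO (const char *command)
#define ToySpam_API_pointers 1

static int
toy_length(const char *command)
{
    return (int)strlen(command);
}

/* Static storage: the table must outlive the function that fills it. */
static toy_fn toy_api_table[ToySpam_API_pointers];

static toy_fn *
toy_export_api(void)
{
    toy_api_table[ToySpam_Length_NUM] = (toy_fn)toy_length;
    return toy_api_table;
}

/* --- what a client would see ------------------------------------ */
static toy_fn *ToySpam_API = NULL;

/* Cast the table entry back to its real type before calling it. */
#define ToySpam_Length \
    (*(ToySpam_Length_RETURN (*)ToySpam_Length_PROTO) \
        ToySpam_API[ToySpam_Length_NUM])

/* Stands in for import_spam(): fetch the table once, at start-up. */
static void
toy_import_api(void)
{
    ToySpam_API = toy_export_api();
    assert(ToySpam_API != NULL);
}
```

   Once `toy_import_api()' has run, client code writes
`ToySpam_Length("spam")' as if it were an ordinary function call; the
macro hides the table lookup and the cast.  Adding a second exported
function means adding one more `_NUM', `_RETURN' and `_PROTO' triple
and one more table slot.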

   Finally it should be mentioned that CObjects offer additional
functionality, which is especially useful for memory allocation and
deallocation of the pointer stored in a CObject. The details are
described in the Python/C API Reference Manual in the section
"CObjects" and in the implementation of CObjects (files
`Include/cobject.h' and `Objects/cobject.c' in the Python source code
distribution).


File: python-ext.info,  Node: Defining New Types,  Next: Building C and C++ Extensions on UNIX,  Prev: Extending Python with C or C++,  Up: Top

Defining New Types
******************

   This section was written by Michael Hudson <mwh21@cam.ac.uk>.
As mentioned in the last chapter, Python allows the writer of an
extension module to define new types that can be manipulated from
Python code, much like strings and lists in core Python.

   This is not hard; the code for all extension types follows a pattern,
but there are some details that you need to understand before you can
get started.

* Menu:

* Basics::
* Type Methods::


File: python-ext.info,  Node: Basics,  Next: Type Methods,  Prev: Defining New Types,  Up: Defining New Types

The Basics
==========

   The Python runtime sees all Python objects as variables of type
`PyObject*'.  A `PyObject' is not a very magnificent object -- it just
contains the refcount and a pointer to the object's "type object".
This is where the action is; the type object determines which (C)
functions get called when, for instance, an attribute gets looked up on
an object or it is multiplied by another object.  I call these C
functions "type methods" to distinguish them from things like
`[].append' (which I will call "object methods" when I get around to
them).

   So, if you want to define a new object type, you need to create a new
type object.

   This sort of thing can only be explained by example, so here's a
minimal, but complete, module that defines a new type:

     #include <Python.h>
     
     staticforward PyTypeObject noddy_NoddyType;
     
     typedef struct {
         PyObject_HEAD
     } noddy_NoddyObject;
     
     static PyObject*
     noddy_new_noddy(PyObject* self, PyObject* args)
     {
         noddy_NoddyObject* noddy;
     
         if (!PyArg_ParseTuple(args,":new_noddy"))
             return NULL;
     
         noddy = PyObject_New(noddy_NoddyObject, &noddy_NoddyType);
     
         return (PyObject*)noddy;
     }
     
     static void
     noddy_noddy_dealloc(PyObject* self)
     {
         PyObject_Del(self);
     }
     
     static PyTypeObject noddy_NoddyType = {
         PyObject_HEAD_INIT(NULL)
         0,                          /*ob_size*/
         "Noddy",                    /*tp_name*/
         sizeof(noddy_NoddyObject),  /*tp_basicsize*/
         0,                          /*tp_itemsize*/
         noddy_noddy_dealloc, /*tp_dealloc*/
         0,          /*tp_print*/
         0,          /*tp_getattr*/
         0,          /*tp_setattr*/
         0,          /*tp_compare*/
         0,          /*tp_repr*/
         0,          /*tp_as_number*/
         0,          /*tp_as_sequence*/
         0,          /*tp_as_mapping*/
         0,          /*tp_hash */
     };
     
     static PyMethodDef noddy_methods[] = {
         { "new_noddy", noddy_new_noddy, METH_VARARGS },
         {NULL, NULL}
     };
     
     DL_EXPORT(void)
     initnoddy(void)
     {
         noddy_NoddyType.ob_type = &PyType_Type;
     
         Py_InitModule("noddy", noddy_methods);
     }

   Now that's quite a bit to take in at once, but hopefully bits will
seem familiar from the last chapter.

   The first bit that will be new is:

     staticforward PyTypeObject noddy_NoddyType;

   This names the type object that will be defined further down in the
file.  It can't be defined here because its definition has to refer to
functions that have not yet been defined, but we need to be able to
refer to it, hence the declaration.

   The `staticforward' is required to placate various brain dead
compilers.

     typedef struct {
         PyObject_HEAD
     } noddy_NoddyObject;

   This is what a Noddy object will contain.  In this case nothing more
than every Python object contains - a refcount and a pointer to a type
object.  These are the fields the `PyObject_HEAD' macro brings in.  The
reason for the macro is to standardize the layout and to enable special
debugging fields in debug builds.

   For contrast

     typedef struct {
         PyObject_HEAD
         long ob_ival;
     } PyIntObject;

   is the corresponding definition for standard Python integers.

   Next up is:

     static PyObject*
     noddy_new_noddy(PyObject* self, PyObject* args)
     {
         noddy_NoddyObject* noddy;
     
         if (!PyArg_ParseTuple(args,":new_noddy"))
             return NULL;
     
         noddy = PyObject_New(noddy_NoddyObject, &noddy_NoddyType);
     
         return (PyObject*)noddy;
     }

   This is in fact just a regular module function, as described in the
last chapter.  The reason it gets special mention is that this is where
we create our Noddy object.  Defining PyTypeObject structures is all
very well, but if there's no way to actually _create_ one of the
wretched things it is not going to do anyone much good.

   Almost always, you create objects with a call of the form:

     PyObject_New(<type>, &<type object>);

   This allocates the memory and then initializes the object (i.e. sets
the reference count to one, makes the `ob_type' pointer point at the
right place and maybe some other stuff, depending on build options).
You _can_ do these steps separately if you have some reason to -- but
at this level we don't bother.
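
   To make the two steps concrete, here is a small stand-alone sketch.
It does not use the Python headers at all: `mock_type', `mock_object',
`mock_alloc', `mock_init' and `mock_new' are hypothetical names invented
for this illustration, and the real allocator may do more than this
depending on build options.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a type object and an object header. */
typedef struct mock_type {
    const char *name;     /* plays the role of tp_name */
    size_t basicsize;     /* plays the role of tp_basicsize */
} mock_type;

typedef struct mock_object {
    int refcnt;           /* plays the role of the reference count */
    mock_type *type;      /* plays the role of ob_type */
} mock_object;

/* Step 1: get raw memory of the size recorded in the type object. */
static mock_object *
mock_alloc(mock_type *type)
{
    assert(type->basicsize >= sizeof(mock_object));
    return (mock_object *)malloc(type->basicsize);
}

/* Step 2: fill in the header so the object is ready for use. */
static mock_object *
mock_init(mock_object *op, mock_type *type)
{
    if (op == NULL)
        return NULL;      /* allocation failed; pass the failure on */
    op->refcnt = 1;
    op->type = type;
    return op;
}

/* Both steps together, in the spirit of a PyObject_New-style call. */
static mock_object *
mock_new(mock_type *type)
{
    return mock_init(mock_alloc(type), type);
}
```

   A caller that needs to do something between allocation and
initialization would call `mock_alloc' and `mock_init' itself; everyone
else just calls `mock_new'.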

   We cast the return value to a `PyObject*' because that's what the
Python runtime expects.  This is safe because of guarantees about the
layout of structures in the C standard, and is a fairly common C
programming trick.  One could declare `noddy_new_noddy' to return a
`noddy_NoddyObject*' and then put a cast in the definition of
`noddy_methods' further down the file -- it doesn't make much
difference.
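
   The layout guarantee behind the cast can be seen in a stand-alone
sketch.  The names below (`base_object', `counter_object' and the two
helper functions) are made up for the illustration, and the common
header is written as an embedded struct rather than a macro, which is an
assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common header, standing in for what PyObject_HEAD adds. */
typedef struct base_object {
    int refcnt;
    const char *type_name;
} base_object;

/* A concrete object: the header comes first, then its own data. */
typedef struct counter_object {
    base_object head;     /* must be the first member */
    long count;
} counter_object;

/* Generic code only ever looks at the header. */
static const char *
type_name_of(base_object *op)
{
    return op->type_name;
}

/* Type-specific code casts back to reach the extra fields. */
static long
counter_value(base_object *op)
{
    /* The first member of a struct sits at offset zero, so a pointer
       to the struct and a pointer to its header are interchangeable. */
    assert(offsetof(counter_object, head) == 0);
    return ((counter_object *)op)->count;
}
```

   Generic code can be handed a `counter_object' through a
`base_object*' without knowing anything about the `count' field, which
is the same arrangement the interpreter relies on.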

   Now a Noddy object doesn't do very much and so doesn't need to
implement many type methods.  One you can't avoid is handling
deallocation, so we find

     static void
     noddy_noddy_dealloc(PyObject* self)
     {
         PyObject_Del(self);
     }

   This is so short as to be self explanatory.  This function will be
called when the reference count on a Noddy object reaches `0' (or it is
found as part of an unreachable cycle by the cyclic garbage collector).
`PyObject_Del()' is what you call when you want an object to go away.
If a Noddy object held references to other Python objects, one would
decref them here.
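
   As a rough illustration of that last point, here is a stand-alone
mock-up of a deallocator that releases a reference it owns.  Nothing
here is the Python API: `obj', `obj_new', `obj_decref', `obj_dealloc'
and the `live_objects' counter are hypothetical, and the function
pointer is stored in the object itself rather than in a separate type
object, purely to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct obj obj;
typedef void (*dealloc_func)(obj *);

struct obj {
    int refcnt;
    dealloc_func dealloc; /* stands in for the type's tp_dealloc */
    obj *held;            /* an owned reference, or NULL */
};

static int live_objects = 0;  /* bookkeeping for the demonstration */

/* Drop one reference; run the deallocator when none are left. */
static void
obj_decref(obj *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0)
        op->dealloc(op);
}

/* Release whatever the object owns, then the object itself. */
static void
obj_dealloc(obj *self)
{
    if (self->held != NULL)
        obj_decref(self->held);
    free(self);
    live_objects--;
}

static obj *
obj_new(obj *held)
{
    obj *op = (obj *)malloc(sizeof(obj));
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->dealloc = obj_dealloc;
    op->held = held;      /* takes over the caller's reference */
    live_objects++;
    return op;
}
```

   Dropping the last reference to an outer object created with
`obj_new(inner)' frees both objects, because the deallocator passes the
decref on to the object it holds.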

   Moving on, we come to the crunch -- the type object.

     static PyTypeObject noddy_NoddyType = {
         PyObject_HEAD_INIT(NULL)
         0,
         "Noddy",
         sizeof(noddy_NoddyObject),
         0,
         noddy_noddy_dealloc, /*tp_dealloc*/
         0,                   /*tp_print*/
         0,                   /*tp_getattr*/
         0,                   /*tp_setattr*/
         0,                   /*tp_compare*/
         0,                   /*tp_repr*/
         0,                   /*tp_as_number*/
         0,                   /*tp_as_sequence*/
         0,                   /*tp_as_mapping*/
         0,                   /*tp_hash */
     };

   Now if you go and look up the definition of `PyTypeObject' in
`object.h' you'll see that it has many, many more fields than the
definition above.  The remaining fields will be filled with zeros by
the C compiler, and it's common practice to not specify them explicitly
unless you need them.
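
   The zero-filling rule is plain C and easy to check in isolation.  In
this sketch `mini_type', `twice', `example_type' and
`count_filled_slots' are hypothetical names; the struct merely imitates
the shape of a type object with a few function-pointer slots.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*slot_func)(int);

/* A cut-down imitation of a type object. */
typedef struct mini_type {
    const char *name;
    size_t basicsize;
    slot_func first_slot;
    slot_func second_slot;
    slot_func third_slot;
    long flags;
} mini_type;

static int
twice(int x)
{
    return 2 * x;
}

/* Only the leading fields are listed; the compiler zeroes the rest. */
static mini_type example_type = {
    "example",
    sizeof(int),
    twice,                /* first_slot */
};

/* Count how many of the three slots were actually filled in. */
static int
count_filled_slots(const mini_type *t)
{
    assert(t != NULL);
    return (t->first_slot != NULL) + (t->second_slot != NULL)
         + (t->third_slot != NULL);
}
```

   Here `second_slot' and `third_slot' compare equal to `NULL' and
`flags' is zero, even though none of them appears in the initializer.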

   This is so important that I'm going to pick the top of it apart still
further:

         PyObject_HEAD_INIT(NULL)

   This line is a bit of a wart; what we'd like to write is:

         PyObject_HEAD_INIT(&PyType_Type)

   as the type of a type object is "type", but this isn't strictly
conforming C and some compilers complain.  So instead we fill in the
`ob_type' field of `noddy_NoddyType' at the earliest opportunity -- in
`initnoddy()'.

         0,

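   This fills in the `ob_size' field.  Type objects begin with
`PyObject_VAR_HEAD' rather than `PyObject_HEAD', so the initializer has
to supply a size as well; for a type object like ours it is simply zero.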

         "Noddy",

   The name of our type.  This will appear in the default textual
representation of our objects and in some error messages, for example:

     >>> "" + noddy.new_noddy()
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
     TypeError: cannot add type "Noddy" to string

         sizeof(noddy_NoddyObject),

   This is so that Python knows how much memory to allocate when you
call `PyObject_New'.

         0,

   This has to do with variable length objects like lists and strings.
Ignore for now...

   Now we get into the type methods, the things that make your objects
different from the others.  Of course, the Noddy object doesn't
implement many of these, but as mentioned above you have to implement
the deallocation function.

         noddy_noddy_dealloc, /*tp_dealloc*/

   From here, all the type methods are nil so I won't go over them yet -
that's for the next section!

   Everything else in the file should be familiar, except for this line
in `initnoddy':

         noddy_NoddyType.ob_type = &PyType_Type;

   This was alluded to above -- the `noddy_NoddyType' object should
have type "type", but `&PyType_Type' is not constant and so can't be
used in its initializer.  To work around this, we patch it up in the
module initialization.
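
   The same patch-it-up-later pattern can be shown without Python.  In
this sketch the value that cannot go in the static initializer comes
from a function call, which is an assumption chosen because a function
call is never a constant expression in C; `meta', `get_meta_type',
`my_type', `noddy_like_type' and `init_module' are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct meta {
    const char *name;
} meta;

/* Pretend this lives in another library, so that its address is only
   available at run time. */
static meta *
get_meta_type(void)
{
    static meta the_meta = { "type" };
    return &the_meta;
}

typedef struct my_type {
    meta *ob_type;
    const char *name;
} my_type;

/* Writing get_meta_type() here would not compile, so leave a NULL. */
static my_type noddy_like_type = {
    NULL,                 /* ob_type: patched in init_module() */
    "Noddy",
};

/* Run once, before anything uses noddy_like_type. */
static void
init_module(void)
{
    noddy_like_type.ob_type = get_meta_type();
    assert(noddy_like_type.ob_type != NULL);
}
```

   The only requirement is that `init_module' runs before any code looks
at the `ob_type' field, which is why the real assignment sits at the top
of the module initialization function.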

   That's it!  All that remains is to build it; put the above code in a
file called `noddymodule.c' and

     from distutils.core import setup, Extension
     setup(name = "noddy", version = "1.0",
         ext_modules = [Extension("noddy", ["noddymodule.c"])])

   in a file called `setup.py'; then typing

     $ python setup.py build

   at a shell should produce a file `noddy.so' in a subdirectory; move
to that directory and fire up Python -- you should be able to `import
noddy' and play around with Noddy objects.

   That wasn't so hard, was it?


File: python-ext.info,  Node: Type Methods,  Prev: Basics,  Up: Defining New Types

Type Methods
============

   This section aims to give a quick fly-by on the various type methods
you can implement and what they do.

   Here is the definition of `PyTypeObject', with some fields only used
in debug builds omitted:

     typedef struct _typeobject {
         PyObject_VAR_HEAD
         char *tp_name; /* For printing */
         int tp_basicsize, tp_itemsize; /* For allocation */
     
         /* Methods to implement standard operations */
     
         destructor tp_dealloc;
         printfunc tp_print;
         getattrfunc tp_getattr;
         setattrfunc tp_setattr;
         cmpfunc tp_compare;
         reprfunc tp_repr;
     
         /* Method suites for standard classes */
     
         PyNumberMethods *tp_as_number;
         PySequenceMethods *tp_as_sequence;
         PyMappingMethods *tp_as_mapping;
     
         /* More standard operations (here for binary compatibility) */
     
         hashfunc tp_hash;
         ternaryfunc tp_call;
         reprfunc tp_str;
         getattrofunc tp_getattro;
         setattrofunc tp_setattro;
     
         /* Functions to access object as input/output buffer */
         PyBufferProcs *tp_as_buffer;
     
         /* Flags to define presence of optional/expanded features */
         long tp_flags;
     
         char *tp_doc; /* Documentation string */
     
         /* call function for all accessible objects */
         traverseproc tp_traverse;
     
         /* delete references to contained objects */
         inquiry tp_clear;
     
         /* rich comparisons */
         richcmpfunc tp_richcompare;
     
         /* weak reference enabler */
         long tp_weaklistoffset;
     
     } PyTypeObject;

   Now that's a _lot_ of methods.  Don't worry too much though - if you
have a type you want to define, the chances are very good that you will
only implement a handful of these.
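
   Leaving a slot empty is harmless because generic code checks the slot
before calling through it and falls back to a default.  The following
stand-alone sketch imitates that arrangement; `thing', `thing_type',
`generic_repr' and `number_repr' are hypothetical names, and the default
text is only loosely modelled on what the interpreter prints.

```c
#include <assert.h>
#include <stdio.h>

typedef struct thing thing;
typedef int (*repr_func)(thing *, char *, size_t);

/* A type object with a single optional slot. */
typedef struct thing_type {
    const char *name;
    repr_func repr;       /* NULL means "use the default" */
} thing_type;

struct thing {
    thing_type *type;
    long value;
};

/* Generic entry point: use the slot if present, else a default. */
static int
generic_repr(thing *t, char *buf, size_t size)
{
    assert(t != NULL && t->type != NULL);
    if (t->type->repr != NULL)
        return t->type->repr(t, buf, size);
    return snprintf(buf, size, "<%s object>", t->type->name);
}

/* A type-specific implementation for one of the two types below. */
static int
number_repr(thing *t, char *buf, size_t size)
{
    return snprintf(buf, size, "%ld", t->value);
}

static thing_type plain_type = { "plain", NULL };
static thing_type number_type = { "number", number_repr };
```

   An object of `plain_type' gets the generic text, while an object of
`number_type' is formatted by `number_repr'; the caller uses
`generic_repr' in both cases and never needs to know which happened.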

   As you probably expect by now, I'm going to go over this
line-by-line, saying a word about each field as we get to it.

         char *tp_name; /* For printing */

   The name of the type - as mentioned in the last section, this will
appear in various places, almost entirely for diagnostic purposes.  Try
to choose something that will be helpful in such a situation!

         int tp_basicsize, tp_itemsize; /* For allocation */

   These fields tell the runtime how much memory to allocate when new
objects of this type are created.  Python has some builtin support for
variable length structures (think: strings, lists) which is where the
`tp_itemsize' field comes in.  This will be dealt with later.
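
   In the meantime, a rough idea of how the two sizes combine: a fixed
part plus a per-item part.  The sketch below assumes the simplest
possible formula and ignores any padding or bookkeeping a real allocator
might add; `size_info' and `object_size' are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Just the two size fields of a type object. */
typedef struct size_info {
    size_t basicsize;     /* like tp_basicsize: the fixed part */
    size_t itemsize;      /* like tp_itemsize: size of one item */
} size_info;

/* Bytes needed for an object holding nitems items. */
static size_t
object_size(const size_info *t, size_t nitems)
{
    /* A type with no per-item size cannot hold any items. */
    assert(t->itemsize != 0 || nitems == 0);
    return t->basicsize + nitems * t->itemsize;
}
```

   For a fixed-size type such as Noddy the item size is zero and the
result is just the basic size; for a variable-length type the result
grows by one item size per element.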

   Now we come to the basic type methods - the ones most extension types
will implement.

         destructor tp_dealloc;
         printfunc tp_print;
         getattrfunc tp_getattr;
         setattrfunc tp_setattr;
         cmpfunc tp_compare;
         reprfunc tp_repr;


File: python-ext.info,  Node: Building C and C++ Extensions on UNIX,  Next: Building C and C++ Extensions on Windows,  Prev: Defining New Types,  Up: Top

Building C and C++ Extensions on UNIX
*************************************

   This section was written by Jim Fulton <jim@Digicool.com>.
Starting in Python 1.4, Python provides a special make file whose job
is to build other make files, which in turn build dynamically-linked
extensions and custom interpreters.  This "make file make file" builds
a make file that reflects various system variables determined by
configure when the Python interpreter was built, so people building
modules don't have to resupply these settings.  This vastly simplifies
the process of building extensions and custom interpreters on Unix
systems.

   The make file make file is distributed as the file
`Misc/Makefile.pre.in' in the Python source distribution.  The first
step in building extensions or custom interpreters is to copy this make
file to a development directory containing extension module source.

   The make file make file, `Makefile.pre.in', uses metadata provided
in a file named `Setup'.  The format of the `Setup' file is the same as
the `Setup' (or `Setup.dist') file provided in the `Modules/' directory
of the Python source distribution.  The `Setup' file contains variable
definitions:

     EC=/projects/ExtensionClass

   and module description lines.  It can also contain blank lines and
comment lines that start with `#'.

   A module description line includes a module name, source files,
options, variable references, and other input files, such as libraries
or object files.  Consider a simple example:

     ExtensionClass ExtensionClass.c

   This is the simplest form of a module definition line.  It defines a
module, `ExtensionClass', which has a single source file,
`ExtensionClass.c'.

   This slightly more complex example uses an *-I* option to specify an
include directory:

     EC=/projects/ExtensionClass
     cPersistence cPersistence.c -I$(EC)

   This example also illustrates the format for variable references.

   For systems that support dynamic linking, the `Setup' file should
begin:

     *shared*

   to indicate that the modules defined in `Setup' are to be built as
dynamically linked modules.  A line containing only `*static*' can be
used to indicate the subsequently listed modules should be statically
linked.

   Here is a complete `Setup' file for building a `cPersistence' module:

     # Set-up file to build the cPersistence module.
     # Note that the text should begin in the first column.
     *shared*
     
     # We need the path to the directory containing the ExtensionClass
     # include file.
     EC=/projects/ExtensionClass
     cPersistence cPersistence.c -I$(EC)

   After the `Setup' file has been created, `Makefile.pre.in' is run
with the `boot' target to create a make file:

     make -f Makefile.pre.in boot

   This creates the file `Makefile'.  To build the extensions, simply
run the created make file:

     make

   It's not necessary to re-run `Makefile.pre.in' if the `Setup' file
is changed.  The make file automatically rebuilds itself if the `Setup'
file changes.

* Menu:

* Building Custom Interpreters::
* Module Definition Options::
* Example::
* Distributing your extension modules::


File: python-ext.info,  Node: Building Custom Interpreters,  Next: Module Definition Options,  Prev: Building C and C++ Extensions on UNIX,  Up: Building C and C++ Extensions on UNIX

Building Custom Interpreters
============================

   The make file built by `Makefile.pre.in' can be run with the
`static' target to build an interpreter:

     make static

   Any modules defined in the `Setup' file before the `*shared*' line
will be statically linked into the interpreter.  Typically, a
`*shared*' line is omitted from the `Setup' file when a custom
interpreter is desired.


File: python-ext.info,  Node: Module Definition Options,  Next: Example,  Prev: Building Custom Interpreters,  Up: Building C and C++ Extensions on UNIX

Module Definition Options
=========================

   Several compiler options are supported:

Option                               Meaning
------                               -------
-C                                   Tell the C pre-processor not to
                                     discard comments
-DNAME=VALUE                         Define a macro
-IDIR                                Specify an include directory, DIR
-LDIR                                Specify a link-time library
                                     directory, DIR
-RDIR                                Specify a run-time library
                                     directory, DIR
-lLIB                                Link a library, LIB
-UNAME                               Undefine a macro

   Other compiler options can be included (snuck in) by putting them in
variables.

   Source files can include files with `.c', `.C', `.cc', `.cpp',
`.cxx', and `.c++' extensions.

   Other input files include files with `.a', `.o', `.sl', and `.so'
extensions.


File: python-ext.info,  Node: Example,  Next: Distributing your extension modules,  Prev: Module Definition Options,  Up: Building C and C++ Extensions on UNIX

Example
=======

   Here is a more complicated example from `Modules/Setup.dist':

     GMP=/ufs/guido/src/gmp
     mpz mpzmodule.c -I$(GMP) $(GMP)/libgmp.a

   which could also be written as:

     mpz mpzmodule.c -I$(GMP) -L$(GMP) -lgmp


File: python-ext.info,  Node: Distributing your extension modules,  Prev: Example,  Up: Building C and C++ Extensions on UNIX

Distributing your extension modules
===================================

   There are two ways to distribute extension modules for others to use.
The way that allows the easiest cross-platform support is to use the
`distutils' package.  Its manual contains information on this
approach.  It is recommended that all new extensions be distributed
using this approach to allow easy building and installation across
platforms.  Older extensions should migrate to this approach as well.

   What follows describes the older approach; there are still many
extensions which use this.

   When distributing your extension modules in source form, make sure to
include a `Setup' file.  The `Setup' file should be named `Setup.in' in
the distribution.  The make file make file, `Makefile.pre.in', will
copy `Setup.in' to `Setup' if the person installing the extension
doesn't do so manually.  Distributing a `Setup.in' file makes it easy
for people to customize the `Setup' file while keeping the original in
`Setup.in'.

   It is a good idea to include a copy of `Makefile.pre.in' for people
who do not have a source distribution of Python.

   Do not distribute a make file.  People building your modules should
use `Makefile.pre.in' to build their own make file.  A `README' file
included in the package should provide simple instructions to perform
the build.


File: python-ext.info,  Node: Building C and C++ Extensions on Windows,  Next: Embedding Python in Another Application,  Prev: Building C and C++ Extensions on UNIX,  Up: Top

Building C and C++ Extensions on Windows
****************************************

   This chapter briefly explains how to create a Windows extension
module for Python using Microsoft Visual C++, and follows with more
detailed background information on how it works.  The explanatory
material is useful for both the Windows programmer learning to build
Python extensions and the UNIX programmer interested in producing
software which can be successfully built on both UNIX and Windows.

* Menu:

* A Cookbook Approach::
* Differences Between UNIX and Windows::
* Using DLLs in Practice::