This is /home/pdm/install/Python-2.1/Doc/lib/python-lib.info, produced
by makeinfo version 4.0 from lib.texi.

   April 15, 2001		2.1


File: python-lib.info,  Node: Calibration,  Next: Profiler Extensions,  Prev: Limitations,  Up: Python Profiler

Calibration
===========

   The profiler class has a hard-coded constant that is subtracted from
each event handling time to compensate for the overhead of calling the
time function and storing away the results.  The following procedure
can be used to obtain this constant for a given platform (see the
discussion in section Limitations above).

     import profile
     pr = profile.Profile()
     print pr.calibrate(100)
     print pr.calibrate(100)
     print pr.calibrate(100)

   The argument to `calibrate()' is the number of sample calls to make
when measuring the CPU times.  If your computer is _very_ fast, you
might have to do:

     pr.calibrate(1000)

   or even:

     pr.calibrate(10000)

   The object of this exercise is to get a fairly consistent result.
When you have a consistent answer, you are ready to use that number in
the source code.  For a Sun Sparcstation 1000 running Solaris 2.3, the
magical number is about .00053.  If you have a choice, you are better
off with a smaller constant, since your results will then show up as
negative in profile statistics less often.
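
   As a rough illustration of what the constant stands for, the sketch
below estimates the average CPU time consumed by one call of
`os.times()', which is part of the per-event overhead described above.
It is written for a current Python interpreter, and
`estimate_timer_overhead()' is a hypothetical helper, not the algorithm
used by `calibrate()'.

```python
import os

def estimate_timer_overhead(n=10000):
    """Rough average CPU seconds spent per call of os.times().

    os.times() often has coarse resolution (e.g. 1/100 s), so a large n
    is needed before the total is even measurable.
    """
    t = os.times()
    start = t[0] + t[1]          # user + system CPU time so far
    for i in range(n):
        os.times()               # the call whose cost is being estimated
    t = os.times()
    return (t[0] + t[1] - start) / float(n)

# Repeat the measurement, as with the repeated calibrate() calls above,
# and look for a fairly consistent result.
estimates = [estimate_timer_overhead(10000) for i in range(3)]
```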

   The following shows how the trace_dispatch() method in the Profile
class should be modified to install the calibration constant on a Sun
Sparcstation 1000:

     def trace_dispatch(self, frame, event, arg):
         t = self.timer()
         t = t[0] + t[1] - self.t - .00053 # Calibration constant
     
         if self.dispatch[event](frame,t):
             t = self.timer()
             self.t = t[0] + t[1]
         else:
             r = self.timer()
             self.t = r[0] + r[1] - t # put back unrecorded delta
         return

   Note that if there is no calibration constant, then the line
containing the calibration constant should simply say:

     t = t[0] + t[1] - self.t  # no calibration constant

   You can also achieve the same results using a derived class (and the
profiler will actually run equally fast!), but the above method is the
simplest to use.  I could have made the profiler "self-calibrating",
but it would have made the initialization of the profiler class slower,
and would have required some _very_ fancy coding, or else the use of a
variable where the constant `.00053' was placed in the code shown.
This is a *VERY* critical performance section, and there is no reason
to use a variable lookup at this point, when a constant can be used.
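
   The cost difference being avoided here can be observed with the
`timeit' module (not available in Python 2.1; this sketch assumes a
current interpreter).  It times subtracting a literal constant against
subtracting the same value read from an attribute of a hypothetical
`Holder' object.  The attribute version is usually the slower of the
two, though the exact figures depend on the machine.

```python
import timeit

# Both statements run after the same setup: a local float t, and an
# instance of a hypothetical Holder class carrying the constant as an
# attribute (standing in for a "variable" such as self.bias).
SETUP = "class Holder: pass\nh = Holder()\nh.bias = .00053\nt = 1.0"

def compare_lookup_costs(number=200000):
    """Return (seconds using a literal, seconds using an attribute)."""
    literal = timeit.timeit("t - .00053", setup=SETUP, number=number)
    attribute = timeit.timeit("t - h.bias", setup=SETUP, number=number)
    return literal, attribute

literal_time, attribute_time = compare_lookup_costs()
```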


File: python-lib.info,  Node: Profiler Extensions,  Prev: Calibration,  Up: Python Profiler

Deriving Better Profilers
=========================

   The `Profile' class of module `profile' was written so that derived
classes could be developed to extend the profiler.  Rather than
describing all the details of such an effort, I'll just present the
following two examples of derived classes that can be used to do
profiling.  If the reader is an avid Python programmer, then it should
be possible to use these as a model and create similar (and perchance
better) profile classes.

   If all you want to do is change how the timer is called, or which
timer function is used, then the basic class has an option for that in
its constructor: pass the timer function itself (not its name) as the
argument:

     pr = profile.Profile(your_time_func)

   The resulting profiler will call `your_time_func()' instead of
`os.times()'.  The function should return either a single number or a
list of numbers (like what `os.times()' returns).  If the function
returns a single time number, or the list of returned numbers has
length 2, then you will get an especially fast version of the dispatch
routine.
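
   For instance, the following sketch hands the constructor a timer
that returns a single number.  It assumes a current Python interpreter,
where `profile.Profile' still accepts a timer function; `cpu_seconds()'
and `work()' are hypothetical names used only for illustration, and
`runcall()' and `create_stats()' drive the profiler and collect its
statistics.

```python
import os
import profile

def cpu_seconds():
    # Hypothetical timer: fold the user and system CPU times from
    # os.times() into a single number, so the profiler can use its
    # faster single-number dispatch path.
    t = os.times()
    return t[0] + t[1]

def work(n):
    # Stand-in workload to profile.
    total = 0
    for i in range(n):
        total = total + i * i
    return total

pr = profile.Profile(cpu_seconds)
result = pr.runcall(work, 10000)   # profile one call of work(10000)
pr.create_stats()                  # fills pr.stats, keyed by (file, line, name)
```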

   Be warned that you _should_ calibrate the profiler class for the
timer function that you choose.  For most machines, a timer that
returns a lone integer value will provide the best results in terms of
low overhead during profiling.  (`os.times()' is _pretty_ bad, because
it returns a tuple of floating point values, so all arithmetic is
floating point in the profiler!)  If you want to substitute a better
timer in the cleanest fashion, you should derive a class, and simply
put in the replacement dispatch method that better handles your timer
call, along with the appropriate calibration constant :-).

* Menu:

* OldProfile Class::
* HotProfile Class::


File: python-lib.info,  Node: OldProfile Class,  Next: HotProfile Class,  Prev: Profiler Extensions,  Up: Profiler Extensions

OldProfile Class
----------------

   The following derived profiler simulates the old-style profiler,
providing errant results on recursive functions.  This profiler is
useful because it runs faster (i.e., with less overhead) than the old
profiler.  It still creates all the caller stats, and is quite useful
when there is _no_ recursion in the user's code.  It is also a lot more
accurate than the old profiler, as it does not charge all its overhead
time to the user's code.

     class OldProfile(Profile):
     
         def trace_dispatch_exception(self, frame, t):
             rt, rtt, rct, rfn, rframe, rcur = self.cur
             if rcur and not rframe is frame:
                 return self.trace_dispatch_return(rframe, t)
             return 0
     
         def trace_dispatch_call(self, frame, t):
             fn = `frame.f_code`
     
             self.cur = (t, 0, 0, fn, frame, self.cur)
             if self.timings.has_key(fn):
                 tt, ct, callers = self.timings[fn]
                 self.timings[fn] = tt, ct, callers
             else:
                 self.timings[fn] = 0, 0, {}
             return 1
     
         def trace_dispatch_return(self, frame, t):
             rt, rtt, rct, rfn, frame, rcur = self.cur
             rtt = rtt + t
             sft = rtt + rct
     
             pt, ptt, pct, pfn, pframe, pcur = rcur
             self.cur = pt, ptt+rt, pct+sft, pfn, pframe, pcur
     
             tt, ct, callers = self.timings[rfn]
             if callers.has_key(pfn):
                 callers[pfn] = callers[pfn] + 1
             else:
                 callers[pfn] = 1
             self.timings[rfn] = tt+rtt, ct + sft, callers
     
             return 1
     
         def snapshot_stats(self):
             self.stats = {}
             for func in self.timings.keys():
                 tt, ct, callers = self.timings[func]
                 nor_func = self.func_normalize(func)
                 nor_callers = {}
                 nc = 0
                 for func_caller in callers.keys():
                     nor_callers[self.func_normalize(func_caller)] = \
                         callers[func_caller]
                     nc = nc + callers[func_caller]
                 self.stats[nor_func] = nc, nc, tt, ct, nor_callers


File: python-lib.info,  Node: HotProfile Class,  Prev: OldProfile Class,  Up: Profiler Extensions

HotProfile Class
----------------

   This profiler is the fastest derived profile example.  It does not
calculate caller-callee relationships, and does not calculate
cumulative time under a function.  It only calculates time spent in a
function, so it runs very quickly (i.e., with very low overhead).  In
truth, the basic profiler is so fast that it is probably not worth
giving up the data for the savings, but this class still provides a
nice example.

     class HotProfile(Profile):
     
         def trace_dispatch_exception(self, frame, t):
             rt, rtt, rframe, rcur = self.cur
             if rcur and not rframe is frame:
                 return self.trace_dispatch_return(rframe, t)
             return 0
     
         def trace_dispatch_call(self, frame, t):
             self.cur = (t, 0, frame, self.cur)
             return 1
     
         def trace_dispatch_return(self, frame, t):
             rt, rtt, frame, rcur = self.cur
     
             rfn = `frame.f_code`
     
             pt, ptt, pframe, pcur = rcur
             self.cur = pt, ptt+rt, pframe, pcur
     
             if self.timings.has_key(rfn):
                 nc, tt = self.timings[rfn]
                 self.timings[rfn] = nc + 1, rt + rtt + tt
             else:
                 self.timings[rfn] =      1, rt + rtt
     
             return 1
     
         def snapshot_stats(self):
             self.stats = {}
             for func in self.timings.keys():
                 nc, tt = self.timings[func]
                 nor_func = self.func_normalize(func)
                 self.stats[nor_func] = nc, nc, tt, 0, {}
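
   The same idea, keeping only a call count and the time spent inside
each function, can also be sketched without the `Profile' internals.
The hypothetical `TinyHotProfiler' class below is a stand-alone
illustration for a current Python interpreter, built directly on
`sys.setprofile()'.  It is not the `HotProfile' class above, and
because it keys results by bare function name, functions with the same
name in different modules are merged.

```python
import sys
import time

class TinyHotProfiler:
    """Per function, keep only the call count and the time spent in the
    function itself (no callers, no cumulative time)."""

    def __init__(self, timer=time.perf_counter):
        self.timer = timer
        self.timings = {}   # function name -> [call count, own time]
        self.stack = []     # one [function name, own time so far] per active call
        self.last = 0.0

    def _dispatch(self, frame, event, arg):
        now = self.timer()
        if self.stack:
            # Charge the time since the last event to the innermost call.
            self.stack[-1][1] += now - self.last
        if event == "call":
            self.stack.append([frame.f_code.co_name, 0.0])
        elif event == "return" and self.stack:
            name, own_time = self.stack.pop()
            entry = self.timings.setdefault(name, [0, 0.0])
            entry[0] += 1
            entry[1] += own_time
        self.last = self.timer()   # exclude this handler's own work

    def runcall(self, func, *args):
        self.last = self.timer()
        sys.setprofile(self._dispatch)
        try:
            return func(*args)
        finally:
            sys.setprofile(None)

def leaf(n):
    return sum(range(n))

def outer():
    return [leaf(100) for i in range(5)]

hp = TinyHotProfiler()
hp.runcall(outer)
```

   Time spent in built-in functions such as `sum()' is charged to the
Python function that called them, since the sketch only pushes an
entry for `call' events.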


File: python-lib.info,  Node: Internet Protocols and Support,  Next: Internet Data Handling,  Prev: Python Profiler,  Up: Top

Internet Protocols and Support
******************************

   The modules described in this chapter implement Internet protocols
and support for related technology.  They are all implemented in Python.
Most of these modules require the presence of the system-dependent
module `socket', which is currently supported on most popular
platforms.  Here is an overview:

* Menu:

* webbrowser::
* cgi::
* urllib::
* urllib2::
* httplib::
* ftplib::
* gopherlib::
* poplib::
* imaplib::
* nntplib::
* smtplib::
* telnetlib::
* urlparse::
* SocketServer::
* BaseHTTPServer::
* SimpleHTTPServer::
* CGIHTTPServer::
* Cookie::
* asyncore::


File: python-lib.info,  Node: webbrowser,  Next: cgi,  Prev: Internet Protocols and Support,  Up: Internet Protocols and Support

Convenient Web-browser controller
=================================

   Easy-to-use controller for Web browsers.  This module was documented
by Fred L. Drake, Jr. <fdrake@acm.org>.

   The `webbrowser' module provides a very high-level interface to allow
displaying Web-based documents to users.  The controller objects are
easy to use and are platform-independent.  Under most circumstances,
simply calling the `open()' function from this module will do the right
thing.

   Under UNIX, graphical browsers are preferred under X11, but text-mode
browsers will be used if graphical browsers are not available or an X11
display isn't available.  If text-mode browsers are used, the calling
process will block until the user exits the browser.

   Under UNIX, if the environment variable `BROWSER' exists, it is
interpreted to override the platform default list of browsers, as a
colon-separated list of browsers to try in order.  When the value of a
list part contains the string `%s', then it is interpreted as a literal
browser command line to be used with the argument URL substituted for
the `%s'; if the part does not contain `%s', it is simply interpreted
as the name of the browser to launch.
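
   The interpretation of `BROWSER' described above can be sketched as
follows.  `browser_entries()' is a hypothetical helper, not part of the
`webbrowser' module; it only classifies the entries and does not launch
anything.  Skipping empty entries is an assumption of this sketch.

```python
def browser_entries(browser_var, url):
    """Classify the colon-separated entries of a BROWSER value, in order.

    Returns a list of (kind, text) pairs: ("command", command line) for an
    entry containing %s, with the URL substituted for %s, or
    ("name", browser name) for any other entry.
    """
    entries = []
    for part in browser_var.split(":"):
        if not part:
            continue  # this sketch ignores empty entries, e.g. "lynx %s::w3m"
        if "%s" in part:
            entries.append(("command", part.replace("%s", url)))
        else:
            entries.append(("name", part))
    return entries
```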

   For non-UNIX platforms, or when X11 browsers are available on UNIX,
the controlling process will not wait for the user to finish with the
browser, but allow the browser to maintain its own window on the
display.

   The following exception is defined:

`Error'
     Exception raised when a browser control error occurs.

   The following functions are defined:

`open(url[, new=0][, autoraise=1])'
     Display URL using the default browser.  If NEW is true, a new
     browser window is opened if possible.  If AUTORAISE is true, the
     window is raised if possible (note that under many window managers
     this will occur regardless of the setting of this variable).

`open_new(url)'
     Open URL in a new window of the default browser, if possible,
     otherwise, open URL in the only browser window.  (This entry point
     is deprecated and may be removed in 2.1.)

`get([name])'
     Return a controller object for the browser type NAME.  If NAME is
     empty, return a controller for a default browser appropriate to
     the caller's environment.

`register(name, constructor[, instance])'
     Register the browser type NAME.  Once a browser type is
     registered, the `get()' function can return a controller for that
     browser type.  If INSTANCE is not provided, or is `None',
     CONSTRUCTOR will be called without parameters to create an
     instance when needed.  If INSTANCE is provided, CONSTRUCTOR will
     never be called, and may be `None'.

     This entry point is only useful if you plan to either set the
     `BROWSER' variable or call `get' with a nonempty argument matching
     the name of a handler you declare.
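
   As a sketch of `register()' and `get()' working together, the
following registers a hypothetical `RecordingBrowser' controller that
merely records the URLs it is asked to open, so it is safe to run
anywhere.  Because a ready-made INSTANCE is supplied, the CONSTRUCTOR
argument can be `None'.

```python
import webbrowser

class RecordingBrowser:
    """Hypothetical controller: records URLs instead of launching a
    browser.  It provides the open() and open_new() methods."""

    def __init__(self):
        self.opened = []

    def open(self, url, new=0, autoraise=1):
        self.opened.append(url)
        return True

    def open_new(self, url):
        return self.open(url, 1)

recorder = RecordingBrowser()
# An INSTANCE is given, so the CONSTRUCTOR (None here) is never called.
webbrowser.register("recorder", None, recorder)
controller = webbrowser.get("recorder")
controller.open("http://www.python.org/")
```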

   A number of browser types are predefined.  This table gives the type
names that may be passed to the `get()' function and the corresponding
instantiations for the controller classes, all defined in this module.

Type Name            Class Name                         Notes
---------            ----------                         -----
'mozilla'            `Netscape('mozilla')'
'netscape'           `Netscape('netscape')'
'mosaic'             `GenericBrowser('mosaic %s &')'
'kfm'                `Konqueror()'                      (1)
'grail'              `Grail()'
'links'              `GenericBrowser('links %s')'
'lynx'               `GenericBrowser('lynx %s')'
'w3m'                `GenericBrowser('w3m %s')'
'windows-default'    `WindowsDefault'                   (2)
'internet-config'    `InternetConfig'                   (3)

Notes:

`(1)'
     "Konqueror" is the file manager for the KDE desktop environment for
     UNIX, and only makes sense to use if KDE is running.  Some way of
     reliably detecting KDE would be nice; the `KDEDIR' variable is not
     sufficient.  Note also that the name "kfm" is used even when using
     the `konqueror' command with KDE 2 -- the implementation selects
     the best strategy for running Konqueror.

`(2)'
     Only on Windows platforms; requires the common extension modules
     `win32api' and `win32con'.

`(3)'
     Only on MacOS platforms; requires the standard MacPython `ic'
     module, described in the  manual.

* Menu:

* Browser Controller Objects::


File: python-lib.info,  Node: Browser Controller Objects,  Prev: webbrowser,  Up: webbrowser

Browser Controller Objects
--------------------------

   Browser controllers provide two methods which parallel two of the
module-level convenience functions:

`open(url[, new])'
     Display URL using the browser handled by this controller.  If NEW
     is true, a new browser window is opened if possible.

`open_new(url)'
     Open URL in a new window of the browser handled by this
     controller, if possible, otherwise, open URL in the only browser
     window.  (This method is deprecated and may be removed in 2.1.)


File: python-lib.info,  Node: cgi,  Next: urllib,  Prev: webbrowser,  Up: Internet Protocols and Support

Common Gateway Interface support.
=================================

   Common Gateway Interface support, used to interpret forms in
server-side scripts.

   Support module for CGI (Common Gateway Interface) scripts.

   This module defines a number of utilities for use by CGI scripts
written in Python.

* Menu:

* cgi-intro::
* Using the cgi module::
* Old classes::
* Functions in cgi module::
* Caring about security::
* Installing your CGI script on a Unix system::
* Testing your CGI script::
* Debugging CGI scripts::
* Common problems and solutions::


File: python-lib.info,  Node: cgi-intro,  Next: Using the cgi module,  Prev: cgi,  Up: cgi

Introduction
------------

   A CGI script is invoked by an HTTP server, usually to process user
input submitted through an HTML `<FORM>' or `<ISINDEX>' element.

   Most often, CGI scripts live in the server's special `cgi-bin'
directory.  The HTTP server places all sorts of information about the
request (such as the client's hostname, the requested URL, the query
string, and lots of other goodies) in the script's shell environment,
executes the script, and sends the script's output back to the client.

   The script's input is connected to the client too, and sometimes the
form data is read this way; at other times the form data is passed via
the "query string" part of the URL.  This module is intended to take
care of the different cases and provide a simpler interface to the
Python script.  It also provides a number of utilities that help in
debugging scripts, and the latest addition is support for file uploads
from a form (if your browser supports it -- Grail 0.3 and Netscape 2.0
do).

   The output of a CGI script should consist of two sections, separated
by a blank line.  The first section contains a number of headers,
telling the client what kind of data is following.  Python code to
generate a minimal header section looks like this:

     print "Content-Type: text/html"     # HTML is following
     print                               # blank line, end of headers

   The second section is usually HTML, which allows the client software
to display nicely formatted text with header, in-line images, etc.
Here's Python code that prints a simple piece of HTML:

     print "<TITLE>CGI script output</TITLE>"
     print "<H1>This is my first CGI script</H1>"
     print "Hello, world!"
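
   Put together, a complete response is just the header section, a
blank line, and the document.  The hypothetical `make_cgi_response()'
helper below builds that text as a string, which makes the structure
easy to check; a script would then write it to standard output (for
example with `sys.stdout.write()').

```python
def make_cgi_response(body, content_type="text/html"):
    """Return CGI output: a header section, a blank line, then the body."""
    headers = "Content-Type: %s\n" % content_type  # what kind of data follows
    return headers + "\n" + body                    # blank line ends the headers

page = make_cgi_response("<TITLE>CGI script output</TITLE>\n"
                         "<H1>This is my first CGI script</H1>\n"
                         "Hello, world!\n")
```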


File: python-lib.info,  Node: Using the cgi module,  Next: Old classes,  Prev: cgi-intro,  Up: cgi

Using the cgi module
--------------------

   Begin by writing `import cgi'.  Do not use `from cgi import *' --
the module defines all sorts of names for its own use or for backward
compatibility that you don't want in your namespace.

   It's best to use the `FieldStorage' class.  The other classes
defined in this module are provided mostly for backward compatibility.
Instantiate it exactly once, without arguments.  This reads the form
contents from standard input or the environment (depending on the value
of various environment variables set according to the CGI standard).
Since it may consume standard input, it should be instantiated only
once.

   The `FieldStorage' instance can be indexed like a Python dictionary,
and also supports the standard dictionary methods `has_key()' and
`keys()'.  Form fields containing empty strings are ignored and do not
appear in the dictionary; to keep such values, provide the optional
`keep_blank_values' argument when creating the `FieldStorage' instance.

   For instance, the following code (which assumes that the
`Content-Type' header and blank line have already been printed) checks
that the fields `name' and `addr' are both set to a non-empty string:

     form = cgi.FieldStorage()
     if not (form.has_key("name") and form.has_key("addr")):
         print "<H1>Error</H1>"
         print "Please fill in the name and addr fields."
     else:
         print "<p>name:", form["name"].value
         print "<p>addr:", form["addr"].value
         ...further form processing here...

   Here the fields, accessed through `form[KEY]', are themselves
instances of `FieldStorage' (or `MiniFieldStorage', depending on the
form encoding).  The `value' attribute of the instance yields the
string value of the field.  The `getvalue()' method returns this string
value directly; it also accepts an optional second argument as a
default to return if the requested key is not present.

   If the submitted form data contains more than one field with the same
name, the object retrieved by `form[KEY]' is not a `FieldStorage' or
`MiniFieldStorage' instance but a list of such instances.  Similarly,
in this situation, `form.getvalue(KEY)' would return a list of strings.
If you expect this possibility (i.e., when your HTML form contains
multiple fields with the same name), use the `type()' function to
determine whether you have a single instance or a list of instances.
For example, here's code that concatenates any number of username
fields, separated by commas:

     value = form.getvalue("username", "")
     if type(value) is type([]):
         # Multiple username fields specified
         usernames = ",".join(value)
     else:
         # Single or no username field specified
         usernames = value

   If a field represents an uploaded file, accessing the value via the
`value' attribute or the `getvalue()' method reads the entire file into
memory as a string.  This may not be what you want.  You can test for
an uploaded file by testing either the `filename' attribute or the
`file' attribute.  You can then read the data at leisure from the
`file' attribute:

     fileitem = form["userfile"]
     if fileitem.file:
         # It's an uploaded file; count lines
         linecount = 0
         while 1:
             line = fileitem.file.readline()
             if not line: break
             linecount = linecount + 1

   The file upload draft standard entertains the possibility of
uploading multiple files from one field (using a recursive
`multipart/*' encoding).  When this occurs, the item will be a
dictionary-like `FieldStorage' item.  This can be determined by testing
its `type' attribute, which should be `multipart/form-data' (or perhaps
another MIME type matching `multipart/*').  In this case, it can be
iterated over recursively just like the top-level form object.

   When a form is submitted in the "old" format (as the query string or
as a single data part of type `application/x-www-form-urlencoded'), the
items will actually be instances of the class `MiniFieldStorage'.  In
this case, the `list', `file', and `filename' attributes are always
`None'.


File: python-lib.info,  Node: Old classes,  Next: Functions in cgi module,  Prev: Using the cgi module,  Up: cgi

Old classes
-----------

   These classes, present in earlier versions of the `cgi' module, are
still supported for backward compatibility.  New applications should
use the `FieldStorage' class.

   `SvFormContentDict' stores single value form content as a
dictionary; it assumes each field name occurs in the form only once.

   `FormContentDict' stores multiple value form content as a dictionary
(the form items are lists of values).  This is useful if your form
contains multiple fields with the same name.

   Other classes (`FormContent', `InterpFormContentDict') are present
for backwards compatibility with really old applications only.  If you
still use these and would be inconvenienced if they disappeared from a
future version of this module, drop me a note.


File: python-lib.info,  Node: Functions in cgi module,  Next: Caring about security,  Prev: Old classes,  Up: cgi

Functions
---------

   These are useful if you want more control, or if you want to employ
some of the algorithms implemented in this module in other
circumstances.

`parse(fp)'
     Parse a query in the environment or from a file (default
     `sys.stdin').

`parse_qs(qs[, keep_blank_values, strict_parsing])'
     Parse a query string given as a string argument (data of type
     `application/x-www-form-urlencoded').  Data are returned as a
     dictionary.  The dictionary keys are the unique query variable
     names and the values are lists of values for each name.

     The optional argument KEEP_BLANK_VALUES is a flag indicating
     whether blank values in URL encoded queries should be treated as
     blank strings.  A true value indicates that blanks should be
     retained as blank strings.  The default false value indicates that
     blank values are to be ignored and treated as if they were not
     included.

     The optional argument STRICT_PARSING is a flag indicating what to
     do with parsing errors.  If false (the default), errors are
     silently ignored.  If true, errors raise a ValueError exception.

`parse_qsl(qs[, keep_blank_values, strict_parsing])'
     Parse a query string given as a string argument (data of type
     `application/x-www-form-urlencoded').  Data are returned as a list
     of name, value pairs.

     The optional argument KEEP_BLANK_VALUES is a flag indicating
     whether blank values in URL encoded queries should be treated as
     blank strings.  A true value indicates that blanks should be
     retained as blank strings.  The default false value indicates that
     blank values are to be ignored and treated as if they were not
     included.

     The optional argument STRICT_PARSING is a flag indicating what to
     do with parsing errors.  If false (the default), errors are
     silently ignored.  If true, errors raise a ValueError exception.

`parse_multipart(fp, pdict)'
     Parse input of type `multipart/form-data' (for file uploads).
     Arguments are FP for the input file and PDICT for a dictionary
     containing other parameters in the `Content-Type' header.

     Returns a dictionary just like `parse_qs()': keys are the field
     names, each value is a list of values for that field.  This is
     easy to use but not much good if you are expecting megabytes to be
     uploaded; in that case, use the `FieldStorage' class instead,
     which is much more flexible.

     Note that this does not parse nested multipart parts -- use
     `FieldStorage' for that.

`parse_header(string)'
     Parse a MIME header (such as `Content-Type') into a main value and
     a dictionary of parameters.

`test()'
     Robust test CGI script, usable as main program.  Writes minimal
     HTTP headers and formats all information provided to the script in
     HTML form.

`print_environ()'
     Format the shell environment in HTML.

`print_form(form)'
     Format a form in HTML.

`print_directory()'
     Format the current directory in HTML.

`print_environ_usage()'
     Print a list of useful (used by CGI) environment variables in HTML.

`escape(s[, quote])'
     Convert the characters `&', `<' and `>' in string S to HTML-safe
     sequences.  Use this if you need to display text that might
     contain such characters in HTML.  If the optional flag QUOTE is
     true, the double quote character (`"') is also translated; this
     helps for inclusion in an HTML attribute value, e.g. in `<A
     HREF="...">'.
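
   In current Python releases the query-string parsers are also
available as `urllib.parse.parse_qs()' and `urllib.parse.parse_qsl()',
with the same KEEP_BLANK_VALUES and STRICT_PARSING arguments.  The
sketch below (which assumes Python 3) applies them to the query string
from the debugging example later in this section, plus an empty
`comment' field.  `escape_sketch()' is a hypothetical re-implementation
of the `escape()' rules described above, not the module's own code.

```python
from urllib.parse import parse_qs, parse_qsl

query = "name=Joe+Blow&addr=At+Home&comment="

fields = parse_qs(query)                             # blank "comment" dropped
fields_blank = parse_qs(query, keep_blank_values=1)  # blank value kept as ""
pairs = parse_qsl(query)                             # ordered (name, value) pairs

# With strict_parsing true, a field without "=" raises ValueError.
try:
    parse_qs("name", strict_parsing=1)
    strict_raised = False
except ValueError:
    strict_raised = True

def escape_sketch(s, quote=0):
    """Replace &, < and > (and " if quote is true) with HTML entities."""
    # "&" goes first so the entities added below are not escaped again.
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s
```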


File: python-lib.info,  Node: Caring about security,  Next: Installing your CGI script on a Unix system,  Prev: Functions in cgi module,  Up: cgi

Caring about security
---------------------

   There's one important rule: if you invoke an external program (e.g.
via the `os.system()' or `os.popen()' functions), make very sure you
don't pass arbitrary strings received from the client to the shell.
This is a well-known security hole whereby clever hackers anywhere on
the web can exploit a gullible CGI script to invoke arbitrary shell
commands.  Even parts of the URL or field names cannot be trusted,
since the request doesn't have to come from your form!

   To be on the safe side, if you must pass a string obtained from a
form to a shell command, you should make sure the string contains only
alphanumeric characters, dashes, underscores, and periods.
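
   A whitelist check along those lines might look like the following
sketch, where `is_shell_safe()' is a hypothetical helper.  It uses
`\Z' rather than `$' in the pattern, so that a trailing newline cannot
slip through.

```python
import re

# Letters, digits, underscores, periods and dashes only; \Z anchors at
# the true end of the string ($ would also accept a trailing newline).
_SAFE = re.compile(r"[A-Za-z0-9_.-]+\Z")

def is_shell_safe(s):
    """Return True if s is non-empty and uses only the allowed characters."""
    return _SAFE.match(s) is not None
```

   Even a string that passes this check can start with `-' and be taken
as an option by the command it is passed to, so rejecting a leading
dash may also be worthwhile.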


File: python-lib.info,  Node: Installing your CGI script on a Unix system,  Next: Testing your CGI script,  Prev: Caring about security,  Up: cgi

Installing your CGI script on a Unix system
-------------------------------------------

   Read the documentation for your HTTP server and check with your local
system administrator to find the directory where CGI scripts should be
installed; usually this is in a directory `cgi-bin' in the server tree.

   Make sure that your script is readable and executable by "others";
the UNIX file mode should be `0755' octal (use `chmod 0755 FILENAME').
Make sure that the first line of the script contains `#!' starting in
column 1 followed by the pathname of the Python interpreter, for
instance:

     #!/usr/local/bin/python

   Make sure the Python interpreter exists and is executable by
"others".

   Make sure that any files your script needs to read or write are
readable or writable, respectively, by "others" -- their mode should be
`0644' for readable and `0666' for writable.  This is because, for
security reasons, the HTTP server executes your script as user
"nobody", without any special privileges.  It can only read (write,
execute) files that everybody can read (write, execute).  The current
directory at execution time is also different (it is usually the
server's cgi-bin directory) and the set of environment variables is
also different from what you get at login.  In particular, don't count
on the shell's search path for executables (`PATH') or the Python
module search path (`PYTHONPATH') to be set to anything interesting.
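
   The permission bits for "others" can be checked from Python with the
`os' and `stat' modules.  The sketch below is written for a current
Python interpreter; `others_permissions()' is a hypothetical helper,
and the scratch file exists only for the demonstration.

```python
import os
import stat
import tempfile

def others_permissions(path):
    """Report whether "others" (e.g. the user "nobody") may read, write
    and execute the file at path, judging by its mode bits."""
    mode = os.stat(path)[stat.ST_MODE]
    return {
        "read": bool(mode & stat.S_IROTH),
        "write": bool(mode & stat.S_IWOTH),
        "execute": bool(mode & stat.S_IXOTH),
    }

# Usage: a scratch file set to mode 0644 (rw-r--r--) is readable,
# but not writable or executable, by "others".
fd, scratch = tempfile.mkstemp()
os.close(fd)
os.chmod(scratch, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
scratch_perms = others_permissions(scratch)
os.remove(scratch)
```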

   If you need to load modules from a directory which is not on Python's
default module search path, you can change the path in your script,
before importing other modules, e.g.:

     import sys
     sys.path.insert(0, "/usr/home/joe/lib/python")
     sys.path.insert(0, "/usr/local/lib/python")

   (This way, the directory inserted last will be searched first!)

   Instructions for non-UNIX systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


File: python-lib.info,  Node: Testing your CGI script,  Next: Debugging CGI scripts,  Prev: Installing your CGI script on a Unix system,  Up: cgi

Testing your CGI script
-----------------------

   Unfortunately, a CGI script will generally not run when you try it
from the command line, and a script that works perfectly from the
command line may fail mysteriously when run from the server.  There's
one reason why you should still test your script from the command line:
if it contains a syntax error, the Python interpreter won't execute it
at all, and the HTTP server will most likely send a cryptic error to
the client.

   Assuming your script has no syntax errors, yet it does not work, you
have no choice but to read the next section.
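
   A related check that never runs the script at all is to compile its
source with the built-in `compile()' function, which raises
`SyntaxError' for the kind of error described above.  In this sketch
(for a current Python interpreter) `find_syntax_error()' is a
hypothetical helper name.

```python
def find_syntax_error(source, filename="<cgi script>"):
    """Return the SyntaxError raised when compiling source, or None.

    compile() only parses and compiles the code; nothing is executed,
    so no CGI environment or HTTP server is involved.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err
    return None
```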


File: python-lib.info,  Node: Debugging CGI scripts,  Next: Common problems and solutions,  Prev: Testing your CGI script,  Up: cgi

Debugging CGI scripts
---------------------

   First of all, check for trivial installation errors -- reading the
section above on installing your CGI script carefully can save you a
lot of time.  If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (`cgi.py') as a CGI script.  When invoked as a script, the file
will dump its environment and the contents of the form in HTML form.
Give it the right mode etc., and send it a request.  If it's installed
in the standard `cgi-bin' directory, it should be possible to send it a
request by entering a URL into your browser of the form:

     http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

   If this gives an error of type 404, the server cannot find the
script; perhaps you need to install it in a different directory.  If it
gives another error (e.g. 500), there's an installation problem that
you should fix before trying to go any further.  If you get a nicely
formatted listing of the environment and form content (in this example,
the fields should be listed as "addr" with value "At Home" and "name"
with value "Joe Blow"), the `cgi.py' script has been installed
correctly.  If you follow the same procedure for your own script, you
should now be able to debug it.

   The next step could be to call the `cgi' module's `test()' function
from your script: replace its main code with the single statement

     cgi.test()

   This should produce the same results as those gotten from installing
the `cgi.py' file itself.

   When an ordinary Python script raises an unhandled exception (e.g.
because of a typo in a module name, a file that can't be opened, etc.),
the Python interpreter prints a nice traceback and exits.  While the
Python interpreter will still do this when your CGI script raises an
exception, most likely the traceback will end up in one of the HTTP
server's log files, or be discarded altogether.

   Fortunately, once you have managed to get your script to execute
_some_ code, it is easy to catch exceptions and cause a traceback to be
printed.  The `test()' function in the `cgi' module is an example.
Here are the rules:

  1. Import the traceback module before entering the `try' ... `except'
     statement

  2. Assign `sys.stderr' to be `sys.stdout'

  3. Make sure you finish printing the headers and the blank line early

  4. Wrap all remaining code in a `try' ... `except' statement

  5. In the except clause, call `traceback.print_exc()'

   For example:

     import sys
     import traceback
     print "Content-Type: text/html"
     print
     sys.stderr = sys.stdout
     try:
         # ...your code here...
         pass
     except:
         print "\n\n<PRE>"
         traceback.print_exc()

   Notes: The assignment to `sys.stderr' is needed because
`traceback.print_exc()' prints to `sys.stderr'.  The `print "\n\n<PRE>"'
statement is necessary to disable the word wrapping in HTML.

   If you suspect that there may be a problem in importing the traceback
module, you can use an even more robust approach (which only uses
built-in modules):

     import sys
     sys.stderr = sys.stdout
     print "Content-Type: text/plain"
     print
     # ...your code here...

   This relies on the Python interpreter to print the traceback.  The
content type of the output is set to plain text, which disables all
HTML processing.  If your script works, the raw HTML will be displayed
by your client.  If it raises an exception, most likely after the first
two lines have been printed, a traceback will be displayed.  Because no
HTML interpretation is going on, the traceback will be readable.


File: python-lib.info,  Node: Common problems and solutions,  Prev: Debugging CGI scripts,  Up: cgi

Common problems and solutions
-----------------------------

   * Most HTTP servers buffer the output from CGI scripts until the
     script is completed.  This means that it is not possible to
     display a progress report on the client's display while the script
     is running.

   * Check the installation instructions above.

   * Check the HTTP server's log files.  (`tail -f logfile' in a
     separate window may be useful!)

   * Always check a script for syntax errors first, by doing something
     like `python script.py'.

   * When using any of the debugging techniques, don't forget to add
     `import sys' to the top of the script.

   * When invoking external programs, make sure they can be found.
     Usually, this means using absolute path names -- `PATH' is usually
     not set to a very useful value in a CGI script.

   * When reading or writing external files, make sure they can be read
     or written by every user on the system.

   * Don't try to give a CGI script a set-uid mode.  This doesn't work
     on most systems, and is a security liability as well.
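   The advice to check for syntax errors first can also be followed
without executing the script at all, so no headers are printed and no
form data is needed.  This is a minimal sketch built on the built-in
`compile()' function; the helper name `syntax_error()' and the file name
in the usage comment are hypothetical.

```python
import sys


def syntax_error(source, filename="<script>"):
    """Return the SyntaxError raised while compiling SOURCE, or None.

    compile() only parses and byte-compiles the text; none of it runs.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        # sys.exc_info() avoids depending on a particular except-clause syntax.
        return sys.exc_info()[1]
    return None

# Hypothetical usage from a shell on the server:
#   err = syntax_error(open("script.py").read(), "script.py")
#   if err is not None:
#       print("line %s: %s" % (err.lineno, err.msg))
```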


File: python-lib.info,  Node: urllib,  Next: urllib2,  Prev: cgi,  Up: Internet Protocols and Support

Open arbitrary resources by URL
===============================

   Open an arbitrary network resource by URL (requires sockets).

   This module provides a high-level interface for fetching data across
the World-Wide Web.  In particular, the `urlopen()' function is similar
to the built-in function `open()', but accepts Uniform Resource
Locators (URLs) instead of filenames.  Some restrictions apply -- it
can only open URLs for reading, and no seek operations are available.

   It defines the following public functions:

`urlopen(url[, data])'
     Open a network object denoted by a URL for reading.  If the URL
     does not have a scheme identifier, or if it has `file:' as its
     scheme identifier, this opens a local file; otherwise it opens a
     socket to a server somewhere on the network.  If the connection
     cannot be made, or if the server returns an error code, the
     `IOError' exception is raised.  If all went well, a file-like
     object is returned.  This supports the following methods:
     `read()', `readline()', `readlines()', `fileno()', `close()',
     `info()' and `geturl()'.

     Except for the `info()' and `geturl()' methods, these methods have
     the same interface as for file objects -- see *Note File Objects::
     in this manual.  (It is not a built-in file object,
     however, so it can't be used at those few places where a true
     built-in file object is required.)

     The `info()' method returns an instance of the class
     `mimetools.Message' containing meta-information associated with
     the URL.  When the scheme is HTTP, these headers are those
     returned by the server at the head of the retrieved HTML page
     (including Content-Length and Content-Type).  When the scheme is
     FTP, a Content-Length header will be present if (as is now usual)
     the server passed back a file length in response to the FTP
     retrieval request.  For a local file, the returned
     headers will include a Date representing the file's last-modified
     time, a Content-Length giving file size, and a Content-Type
     containing a guess at the file's type. See also the description of
     the `mimetools' module.

     The `geturl()' method returns the real URL of the page.  In some
     cases, the HTTP server redirects a client to another URL.  The
     `urlopen()' function handles this transparently, but in some cases
     the caller needs to know which URL the client was redirected to.
     The `geturl()' method can be used to get at this redirected URL.

     If the URL uses the `http:' scheme identifier, the optional DATA
     argument may be given to specify a `POST' request (normally the
     request type is `GET').  The DATA argument must be in standard
     `application/x-www-form-urlencoded' format; see the `urlencode()'
     function below.

     The `urlopen()' function works transparently with proxies which do
     not require authentication.  In a UNIX or Windows environment, set
     the `http_proxy', `ftp_proxy' or `gopher_proxy' environment
     variables to a URL that identifies the proxy server before
     starting the Python interpreter.  For example (the `%' is the
     command prompt):

          % http_proxy="http://www.someproxy.com:3128"
          % export http_proxy
          % python
          ...

     In a Macintosh environment, `urlopen()' will retrieve proxy
     information from Internet Config.

     Proxies which require authentication for use are not currently
     supported; this is considered an implementation limitation.
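   Because `info()' also works for local files, the header behavior
described for `urlopen()' can be checked without any network access.  A
minimal sketch, assuming a POSIX file system path; the helper name
`local_headers()' is hypothetical, and the import fallback for Python
3's `urllib.request' is added only so the same code runs there too.

```python
import os
import tempfile

try:
    # Python 3 keeps these functions in urllib.request.
    from urllib.request import urlopen, pathname2url
except ImportError:
    # The urllib module described in this section.
    from urllib import urlopen, pathname2url


def local_headers(path):
    """Open a local POSIX PATH through a file: URL; return (headers, data)."""
    f = urlopen("file://" + pathname2url(os.path.abspath(path)))
    try:
        return f.info(), f.read()
    finally:
        f.close()


# Demonstration on a throwaway file with a known size and extension.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"hello")
os.close(fd)
headers, data = local_headers(demo_path)
os.remove(demo_path)
print(headers["Content-Type"])    # guessed from the .txt suffix
print(headers["Content-Length"])  # the file size in bytes
```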

`urlretrieve(url[, filename[, reporthook[, data]]])'
     Copy a network object denoted by a URL to a local file, if
     necessary.  If the URL points to a local file, or a valid cached
     copy of the object exists, the object is not copied.  Return a
     tuple `(FILENAME, HEADERS)' where FILENAME is the local file name
     under which the object can be found, and HEADERS is either `None'
     (for a local object) or whatever the `info()' method of the object
     returned by `urlopen()' returned (for a remote object, possibly
     cached).  Exceptions are the same as for `urlopen()'.

     The second argument, if present, specifies the file location to
     copy to (if absent, the location will be a tempfile with a
     generated name).  The third argument, if present, is a hook
     function that will be called once on establishment of the network
     connection and once after each block read thereafter.  The hook
     will be passed three arguments; a count of blocks transferred so
     far, a block size in bytes, and the total size of the file.  The
     third argument may be `-1' on older FTP servers which do not
     return a file size in response to a retrieval request.

     If the URL uses the `http:' scheme identifier, the optional DATA
     argument may be given to specify a `POST' request (normally the
     request type is `GET').  The DATA argument must be in standard
     `application/x-www-form-urlencoded' format; see the `urlencode()'
     function below.
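   The three hook arguments described for `urlretrieve()' are enough to
drive a simple progress display.  A minimal sketch; both function names
are hypothetical.  Since the last block is usually only partly filled,
`count * block_size' is an upper bound on the bytes received, so the
percentage is capped at 100.

```python
import sys


def progress_percent(count, block_size, total_size):
    """Percent of the download completed, or None if the size is unknown.

    A total_size of -1 (no size reported) or 0 gives None, which also
    avoids dividing by zero.
    """
    if total_size <= 0:
        return None
    return min(100, int(count * block_size * 100.0 / total_size))


def report(count, block_size, total_size):
    """A reporthook that writes a one-line progress indicator to stderr."""
    percent = progress_percent(count, block_size, total_size)
    if percent is None:
        sys.stderr.write("\r%d bytes" % (count * block_size))
    else:
        sys.stderr.write("\r%3d%%" % percent)

# Hypothetical usage:
#   urllib.urlretrieve("http://www.example.com/big.tar.gz", "big.tar.gz", report)
```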

`urlcleanup()'
     Clear the cache that may have been built up by previous calls to
     `urlretrieve()'.

`quote(string[, safe])'
     Replace special characters in STRING using the `%xx' escape.
     Letters, digits, and the characters `_,.-' are never quoted.  The
     optional SAFE parameter specifies additional characters that
     should not be quoted -- its default value is `'/''.

     Example: `quote('/~connolly/')' yields `'/%7econnolly/''.

`quote_plus(string[, safe])'
     Like `quote()', but also replaces spaces by plus signs, as
     required for quoting HTML form values.  Plus signs in the original
     string are escaped unless they are included in SAFE.

`unquote(string)'
     Replace `%xx' escapes by their single-character equivalent.

     Example: `unquote('/%7Econnolly/')' yields `'/~connolly/''.

`unquote_plus(string)'
     Like `unquote()', but also replaces plus signs by spaces, as
     required for unquoting HTML form values.
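   The following sketch contrasts the four quoting functions on inputs
chosen so that the results do not depend on details such as the case of
the hex digits.  The import fallback for Python 3's `urllib.parse' is an
addition for portability, not part of the interface described here.

```python
try:
    from urllib.parse import quote, quote_plus, unquote, unquote_plus  # Python 3
except ImportError:
    from urllib import quote, quote_plus, unquote, unquote_plus  # this module

# quote(): '/' is safe by default, so only the spaces are escaped.
path = quote("/my docs/a b.html")
print(path)                 # /my%20docs/a%20b.html

# quote_plus(): spaces become '+', and '&' is escaped because SAFE is empty.
field = quote_plus("Joe Blow & co")
print(field)                # Joe+Blow+%26+co

print(unquote(path))        # /my docs/a b.html
print(unquote_plus(field))  # Joe Blow & co
print(unquote(field))       # Joe+Blow+&+co  (plain unquote() keeps the '+')
```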

`urlencode(query[, doseq])'
     Convert a mapping object or a sequence of two-element tuples to a
     "url-encoded" string, suitable to pass to `urlopen()' above as the
     optional DATA argument.  This is useful to pass a dictionary of
     form fields to a `POST' request.  The resulting string is a series
     of `KEY=VALUE' pairs separated by `&' characters, where both KEY
     and VALUE are quoted using `quote_plus()' above.  If the optional
     parameter DOSEQ is present and evaluates to true, individual
     `KEY=VALUE' pairs are generated for each element of the sequence.
     When a sequence of two-element tuples is used as the QUERY
     argument, the first element of each tuple is a key and the second
     is a value.  The order of parameters in the encoded string will
     match the order of parameter tuples in the sequence.
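   A short sketch of both forms of the QUERY argument.  The field values
reuse the sample request shown earlier for testing `cgi.py'; the import
fallback for Python 3's `urllib.parse' is an assumption added for
portability.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # this module

# A sequence of (KEY, VALUE) tuples keeps its order in the result.
query = urlencode([("name", "Joe Blow"), ("addr", "At Home")])
print(query)   # name=Joe+Blow&addr=At+Home

# The same string works as a GET query string...
url = "http://yourhostname/cgi-bin/cgi.py?" + query
# ...or as the DATA argument of urlopen() for a POST request.

# With DOSEQ true, a sequence value produces one KEY=VALUE pair per element.
tags = urlencode([("tag", ["spam", "eggs"])], 1)
print(tags)    # tag=spam&tag=eggs
```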

   The public functions `urlopen()' and `urlretrieve()' create an
instance of the `FancyURLopener' class and use it to perform their
requested actions.  To override this functionality, programmers can
create a subclass of `URLopener' or `FancyURLopener', then assign an
instance of that class to the `urllib._urlopener' variable before
calling the desired function.  For example, applications may want to
specify a different `user-agent' header than `URLopener' defines.  This
can be accomplished with the following code:

     import urllib
     
     class AppURLopener(urllib.FancyURLopener):
         def __init__(self, *args, **kwargs):
             # Set the user agent before calling the base constructor.
             self.version = "App/1.7"
             apply(urllib.FancyURLopener.__init__, (self,) + args, kwargs)
     
     urllib._urlopener = AppURLopener()

`URLopener([proxies[, **x509]])'
     Base class for opening and reading URLs.  Unless you need to
     support opening objects using schemes other than `http:', `ftp:',
     `gopher:' or `file:', you probably want to use `FancyURLopener'.

     By default, the `URLopener' class sends a `user-agent' header of
     `urllib/VVV', where VVV is the `urllib' version number.
     Applications can define their own `user-agent' header by
     subclassing `URLopener' or `FancyURLopener' and setting the
     instance attribute `version' to an appropriate string value before
     the `open()' method is called.

     Additional keyword parameters, collected in X509, are used for
     authentication with the `https:' scheme.  The keywords KEY_FILE
     and CERT_FILE are supported; both are needed to actually retrieve
     a resource at an `https:' URL.

`FancyURLopener(...)'
     `FancyURLopener' subclasses `URLopener' providing default handling
     for the following HTTP response codes: 301, 302 or 401.  For 301
     and 302 response codes, the `location' header is used to fetch the
     actual URL.  For 401 response codes (authentication required),
     basic HTTP authentication is performed.  For 301 and 302 response
     codes, recursion is bounded by the value of the MAXTRIES attribute,
     which defaults to 10.

     The parameters to the constructor are the same as those for
     `URLopener'.

     *Note:*  When performing basic authentication, a `FancyURLopener'
     instance calls its `prompt_user_passwd()' method.  The default
     implementation asks the user for the required information on the
     controlling terminal.  A subclass may override this method to
     support more appropriate behavior if needed.

   Restrictions:

   * Currently, only the following protocols are supported: HTTP
     (versions 0.9 and 1.0), Gopher (but not Gopher-+), FTP, and local
     files.

   * The caching feature of `urlretrieve()' has been disabled until I
     find the time to hack proper processing of Expiration time headers.

   * There should be a function to query whether a particular URL is in
     the cache.

   * For backward compatibility, if a URL appears to point to a local
     file but the file can't be opened, the URL is re-interpreted using
     the FTP protocol.  This can sometimes cause confusing error
     messages.

   * The `urlopen()' and `urlretrieve()' functions can cause
     arbitrarily long delays while waiting for a network connection to
     be set up.  This means that it is difficult to build an interactive
     web client using these functions without using threads.

   * The data returned by `urlopen()' or `urlretrieve()' is the raw
     data returned by the server.  This may be binary data (e.g. an
     image), plain text or (for example) HTML.  The HTTP protocol
     provides type information in the reply header, which can be
     inspected by looking at the `content-type' header.  For the
     Gopher protocol, type information is encoded in the URL; there is
     currently no easy way to extract it.  If the returned data is
     HTML, you can use the module `htmllib' to parse it.

   * This module does not support the use of proxies which require
     authentication.  This may be implemented in the future.

   * Although the `urllib' module contains (undocumented) routines to
     parse and unparse URL strings, the recommended interface for URL
     manipulation is in module `urlparse'.


* Menu:

* URLopener Objects::
* Urllib Examples::


File: python-lib.info,  Node: URLopener Objects,  Next: Urllib Examples,  Prev: urllib,  Up: urllib

URLopener Objects
-----------------

   This section was written by Skip Montanaro <skip@mojam.com>.
`URLopener' and `FancyURLopener' objects have the following attributes.

`open(fullurl[, data])'
     Open FULLURL using the appropriate protocol.  This method sets up
     cache and proxy information, then calls the appropriate open
     method with its input arguments.  If the scheme is not recognized,
     `open_unknown()' is called.  The DATA argument has the same
     meaning as the DATA argument of `urlopen()'.

`open_unknown(fullurl[, data])'
     Overridable interface to open unknown URL types.

`retrieve(url[, filename[, reporthook[, data]]])'
     Retrieves the contents of URL and places it in FILENAME.  The
     return value is a tuple consisting of a local filename and either a
     `mimetools.Message' object containing the response headers (for
     remote URLs) or None (for local URLs).  The caller must then open
     and read the contents of FILENAME.  If FILENAME is not given and
     the URL refers to a local file, the input filename is returned.
     If the URL is non-local and FILENAME is not given, the filename is
     the output of `tempfile.mktemp()' with a suffix that matches the
     suffix of the last path component of the input URL.  If REPORTHOOK
     is given, it must be a function accepting three numeric
     parameters.  It will be called after each chunk of data is read
     from the network.  REPORTHOOK is ignored for local URLs.

     If the URL uses the `http:' scheme identifier, the optional DATA
     argument may be given to specify a `POST' request (normally the
     request type is `GET').  The DATA argument must be in standard
     `application/x-www-form-urlencoded' format; see the `urlencode()'
     function above.

`version'
     Variable that specifies the user agent of the opener object.  To
     get `urllib' to tell servers that it is a particular user agent,
     set this in a subclass as a class variable or in the constructor
     before calling the base constructor.

   The `FancyURLopener' class offers one additional method that should
be overridden to provide the appropriate behavior:

`prompt_user_passwd(host, realm)'
     Return information needed to authenticate the user at the given
     host in the specified security realm.  The return value should be
     a tuple, `(USER, PASSWORD)', which can be used for basic
     authentication.

     The default implementation prompts for this information on the
     terminal; an application should override this method to use an
     appropriate interaction model in the local environment.


File: python-lib.info,  Node: Urllib Examples,  Prev: URLopener Objects,  Up: urllib

Examples
--------

   Here is an example session that uses the `GET' method to retrieve a
URL containing parameters:

     >>> import urllib
     >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
     >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
     >>> print f.read()

   The following example uses the `POST' method instead:

     >>> import urllib
     >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
     >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
     >>> print f.read()