This chapter describes the procedures that are used for input and output (I/O). The chapter first describes ports and how they are manipulated, then describes the I/O operations. Finally, some low-level procedures are described that permit the implementation of custom ports and high-performance I/O.
Scheme uses ports for I/O. A port, which can be
treated like any other Scheme object, serves as a source or sink for
data. A port must be open before it can be read from or written to.
The standard I/O port, console-i/o-port, is opened
automatically when you start Scheme. When you use a file for input or
output, you need to explicitly open and close a port to the file (with
procedures described in this chapter). Additional procedures let you
open ports to strings.
Many input procedures, such as read-char and read, read
data from the current input port by default, or from a port that you
specify. The current input port is initially console-i/o-port,
but Scheme provides procedures that let you change the current input
port to be a file or string.
Similarly, many output procedures, such as write-char and
display, write data to the current output port by default, or to
a port that you specify. The current output port is initially
console-i/o-port, but Scheme provides procedures that let you
change the current output port to be a file or string.
All ports read or write only ISO-8859-1 characters.
Every port is either an input port, an output port, or both. The following predicates distinguish all of the possible cases.
(port? object) returns #t if object is a port, otherwise returns #f.
(input-port? object) returns #t if object is an input port, otherwise returns #f. Any object satisfying this predicate also satisfies port?.
(output-port? object) returns #t if object is an output port, otherwise returns #f. Any object satisfying this predicate also satisfies port?.
(i/o-port? object) returns #t if object is both an input port and an output port, otherwise returns #f. Any object satisfying this predicate also satisfies port?, input-port?, and output-port?.
Four matching procedures check the type of object, signalling an error of type condition-type:wrong-type-argument if it is not a port, input port, output port, or I/O port, respectively. Otherwise they return object.
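The predicates above can be combined to classify an arbitrary object. The following is a minimal sketch; the procedure name port-kind and the result symbols are illustrative assumptions, not part of the runtime system.

```scheme
;; Hypothetical helper: classify OBJECT using only port?, input-port?,
;; and output-port?.
(define (port-kind object)
  (cond ((not (port? object)) 'not-a-port)
        ((and (input-port? object) (output-port? object)) 'i/o-port)
        ((input-port? object) 'input-port)
        (else 'output-port)))

(port-kind "a string")            ; => not-a-port
(port-kind (current-output-port)) ; output-port or i/o-port, depending on the port
```

Testing for the combined case first matters, because an I/O port also satisfies both of the narrower predicates.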
The next five procedures return the runtime system's standard ports. All of the standard ports are dynamically bound by the REP loop; this means that when a new REP loop is started, for example by an error, each of these ports is dynamically bound to the I/O port of the REP loop. When the REP loop exits, the ports revert to their original values.
Initially, current-input-port returns the value of console-i/o-port.
Initially, current-output-port returns the value of console-i/o-port.
notification-output-port returns the port used for notification messages; for example, the load procedure writes messages to this port informing the user that a file is being loaded. Initially, notification-output-port returns the value of console-i/o-port.
Output produced by the trace procedure is sent to the trace output port. Initially, trace-output-port returns the value of console-i/o-port.
Initially, interaction-i/o-port returns the value of console-i/o-port.
with-input-from-port binds the current input port,
with-output-to-port binds the current output port,
with-notification-output-port binds the "notification" output
port, with-trace-output-port binds the "trace" output port,
and with-interaction-i/o-port binds the "interaction"
I/O port.
console-i/o-port is an I/O port that communicates
with the "console". Under unix, the console is the controlling
terminal of the Scheme process. Under Windows and OS/2, the console
is the window that is created when Scheme starts up.
This variable is rarely used; instead programs should use one of the standard ports defined above. This variable should not be modified.
Before Scheme can access a file for reading or writing, it is necessary
to open a port to the file. This section describes procedures used to
open ports to files. Such ports are closed (like any other port) by
close-port. File ports are automatically closed if and when they
are reclaimed by the garbage collector.
Before opening a file for input or output, by whatever method, the
filename argument is converted to canonical form by calling the
procedure merge-pathnames with filename as its sole
argument. Thus, filename can be either a string or a pathname,
and it is merged with the current pathname defaults to produce the
pathname that is then opened.
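The canonicalization step can be observed directly. This is a small sketch; the filename "notes.txt" is a hypothetical example, and the resulting namestring depends on the current pathname defaults.

```scheme
;; "notes.txt" is a hypothetical relative filename.
(define merged (merge-pathnames "notes.txt"))

(pathname? merged)     ; => #t
(->namestring merged)  ; a filename string completed from the current defaults
```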
Any file can be opened in one of two modes, normal or
binary. Normal mode is for accessing text files, and binary mode
is for accessing other files. Unix does not distinguish these modes,
but Windows and OS/2 do: in normal mode, their file ports perform
newline translation, mapping between the carriage-return/linefeed
sequence that terminates text lines in files, and the #\newline
that terminates lines in Scheme. In binary mode, such ports do not
perform newline translation. Unless otherwise mentioned, the procedures
in this section open files in normal mode.
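open-input-file takes a filename referring to an existing file and returns an input port capable of delivering characters from the file. If the file cannot be opened, an error of type condition-type:file-operation-error is signalled.
open-output-file takes a filename referring to an output file to be created and returns an output port capable of writing characters to a new file by that name. If the file cannot be opened, an error of type condition-type:file-operation-error is signalled.
The optional argument append? is an MIT Scheme extension. If
append? is given and not #f, the file is opened in
append mode. In this mode, the contents of the file are not
overwritten; instead any characters written to the file are appended to
the end of the existing contents. If the file does not exist, append
mode creates the file and writes to it in the normal way.
open-i/o-file takes a filename referring to an existing file and returns an I/O port capable of both reading and writing the file. If the file cannot be opened, an error of type condition-type:file-operation-error is signalled.
This procedure is often used to open special files. For example, under unix this procedure can be used to open terminal device files, PTY device files, and named pipes.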
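The binary variants of these procedures open files in binary mode. In all other respects they are identical to open-input-file, open-output-file, and open-i/o-file, respectively.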
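call-with-input-file and call-with-output-file call procedure with one argument: the port obtained by opening the named file for input or output, respectively. If the file cannot be opened, an error of type condition-type:file-operation-error is signalled. If
procedure returns, then the port is closed automatically and the
value yielded by procedure is returned. If procedure does
not return, then the port will not be closed automatically unless it is
reclaimed by the garbage collector.(15)
The binary variants of these procedures open files in binary mode. In all other respects they are identical to call-with-input-file and call-with-output-file, respectively.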
with-input-from-file and with-output-to-file open the file for input or output, make a port connected to it the default value returned by
current-input-port or current-output-port, and call
thunk with no arguments. When the thunk returns,
the port is closed and the previous default is restored.
with-input-from-file and with-output-to-file return the
value yielded by thunk. If an escape procedure is used to escape
from the continuation of these procedures, their behavior is
implementation-dependent; in that situation MIT Scheme leaves the files
open.
The binary variants of these procedures open files in binary mode. In all other respects they are identical to with-input-from-file and with-output-to-file, respectively.
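The following sketch ties the file procedures together: it writes a file, appends to it using the append? extension, and reads the result back. The filename "io-demo.txt" is a hypothetical scratch file in the current directory, and the example assumes that read-line and write-string accept an explicit port argument, as described elsewhere in this chapter.

```scheme
(define demo-file "io-demo.txt")   ; hypothetical scratch filename

;; Create (or overwrite) the file; the port is closed when the
;; procedure returns.
(call-with-output-file demo-file
  (lambda (port)
    (write-string "hello" port)
    (newline port)))

;; Reopen in append mode; this port must be closed explicitly.
(let ((port (open-output-file demo-file #t)))
  (write-string "world" port)
  (newline port)
  (close-port port))

;; Read the two lines back in order.
(define lines
  (call-with-input-file demo-file
    (lambda (port)
      (let* ((line-1 (read-line port))
             (line-2 (read-line port)))
        (list line-1 line-2)))))
;; lines => ("hello" "world")
```

The call-with-... forms are usually preferable for short-lived access because they close the port on normal return; a port opened directly must be closed by the program.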
This section describes the simplest kinds of ports: input ports that read their input from given strings, and output ports that accumulate their output and return it as a string. It also describes "truncating" output ports, which can limit the length of the resulting string to a given value.
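The optional arguments start and end select the substring of string that is read; if not given, start defaults to 0 and end defaults to (string-length string).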
with-input-from-string creates a new input port that reads from
string, makes that port the current input port, and calls
thunk. When thunk returns, with-input-from-string
restores the previous current input port and returns the result yielded
by thunk.
(with-input-from-string "(a b c) (d e f)" read) => (a b c)
Note: this procedure is equivalent to:
(with-input-from-port (string->input-port string) thunk)
with-string-output-port calls procedure with one argument, an output port that accumulates its output. When procedure returns, with-string-output-port returns the port's accumulated output as
a newly allocated string.
with-output-to-string creates a new output port that accumulates
output, makes that port the default value returned by
current-output-port, and calls thunk with no arguments.
When thunk returns, with-output-to-string restores the
previous default and returns the accumulated output as a newly allocated
string.
(with-output-to-string (lambda () (write 'abc))) => "abc"
Note: this procedure is equivalent to:
(with-string-output-port
  (lambda (port)
    (with-output-to-port port thunk)))
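As a further illustration, with-output-to-string can collect the output of several output calls into one string. The helper name describe-binding below is an assumption made for this sketch.

```scheme
;; Hypothetical helper: build "NAME = VALUE" where VALUE is shown in
;; its written (re-readable) form.
(define (describe-binding name value)
  (with-output-to-string
    (lambda ()
      (display name)
      (display " = ")
      (write value))))

(describe-binding "greeting" "hi")   ; => "greeting = \"hi\""
```

The name is printed with display, so it appears without quotes, while the value is printed with write, so the string keeps its doublequotes.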
with-output-to-truncated-string is similar to with-output-to-string, except that the output is
limited to k characters. If thunk attempts to write more
than k characters, it will be aborted by invoking an escape
procedure that returns from with-output-to-truncated-string.
The value of this procedure is a pair; the car of the pair is #t
if thunk attempted to write more than k characters, and
#f otherwise. The cdr of the pair is a newly allocated string
containing the accumulated output.
This procedure is helpful for displaying circular lists, as shown in this example:
(define inf (list 'inf))
(with-output-to-truncated-string 40
  (lambda ()
    (write inf)))
  => (#f . "(inf)")
(set-cdr! inf inf)
(with-output-to-truncated-string 40
  (lambda ()
    (write inf)))
  => (#t . "(inf inf inf inf inf inf inf inf inf inf")
write-to-string writes object to a string and returns the result. If the optional argument k is supplied and not #f, this
procedure is equivalent to
(with-output-to-truncated-string k (lambda () (write object)))
otherwise it is equivalent to
(with-output-to-string (lambda () (write object)))
This section describes the procedures that read input. Input procedures can read either from the current input port or from a given port. Remember that to read from a file, you must first open a port to the file.
Input ports can be divided into two types, called interactive and non-interactive. Interactive input ports are ports that read input from a source that is time-dependent; for example, a port that reads input from a terminal or from another program. Non-interactive input ports read input from a time-independent source, such as an ordinary file or a character string.
All optional arguments called input-port, if not supplied, default to the current input port.
In MIT Scheme, if input-port is an interactive input port and no
characters are immediately available, read-char will hang waiting
for input, even if the port is in non-blocking mode.
In MIT Scheme, if input-port is an interactive input port and no
characters are immediately available, peek-char will hang waiting
for input, even if the port is in non-blocking mode.
char-ready? returns #t if a character is ready on input-port and
returns #f otherwise. If char-ready? returns #t
then the next read-char operation on input-port is
guaranteed not to hang. If input-port is a file port at end of
file then char-ready? returns #t.(17)
read returns the next object parsable from
input-port, updating input-port to point to the first
character past the end of the written representation of the object. If
an end of file is encountered in the input before any characters are
found that can begin an object, read returns an end-of-file
object. The input-port remains open, and further attempts to read
will also return an end-of-file object. If an end of file is
encountered after the beginning of an object's written representation,
but the written representation is incomplete and therefore not parsable,
an error is signalled.
eof-object? returns #t if object is an end-of-file object; otherwise
returns #f.
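A common pattern combines read with eof-object? to consume every object from a port. The sketch below reads from a string port so that it is self-contained; the name read-all-from-string is an assumption made for illustration.

```scheme
;; Hypothetical helper: read every object from STRING, in order.
(define (read-all-from-string string)
  (with-input-from-string string
    (lambda ()
      (let loop ((objects '()))
        (let ((object (read)))          ; reads from the current input port
          (if (eof-object? object)
              (reverse objects)
              (loop (cons object objects))))))))

(read-all-from-string "(a b c) (d e f)")   ; => ((a b c) (d e f))
```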
If input-port can deliver a character without blocking, read-char-no-hang acts exactly like
read-char, immediately returning that
character. Otherwise, #f is returned, unless input-port is
a file port at end of file, in which case an end-of-file object is
returned. In no case will this procedure block waiting for input.
read-string reads characters from input-port until it finds a terminating character that is a member of char-set or encounters end of file.
read-string returns the characters, up to but excluding the
terminating character, as a newly allocated string.
This procedure ignores the blocking mode of the port, blocking unconditionally until it sees either a delimiter or end of file. If end of file is encountered before any characters are read, an end-of-file object is returned.
On many input ports, this operation is significantly faster than the
following equivalent code using peek-char and read-char:
(define (read-string char-set input-port)
  (let ((char (peek-char input-port)))
    (if (eof-object? char)
        char
        (list->string
         (let loop ((char char))
           (if (or (eof-object? char)
                   (char-set-member? char-set char))
               '()
               (begin
                 (read-char input-port)
                 (cons char
                       (loop (peek-char input-port))))))))))
read-line reads a single line of text from input-port, and
returns that line as a newly allocated string. The #\newline
terminating the line, if any, is discarded and does not appear in the
returned string.
This procedure ignores the blocking mode of the port, blocking unconditionally until it has read an entire line. If end of file is encountered before any characters are read, an end-of-file object is returned.
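The end-of-file behavior makes read-line convenient for line-by-line loops. This sketch splits a string into lines using a string port; the name string->lines is an assumption made for illustration.

```scheme
;; Hypothetical helper: return the lines of STRING as a list of strings.
(define (string->lines string)
  (with-input-from-string string
    (lambda ()
      (let loop ((lines '()))
        (let ((line (read-line)))       ; reads from the current input port
          (if (eof-object? line)
              (reverse lines)
              (loop (cons line lines))))))))

(string->lines "one\ntwo\nthree")   ; => ("one" "two" "three")
```

The final line needs no terminating newline: read-line returns it as-is, and only the following call yields the end-of-file object.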
read-string! and read-substring! fill the specified region
of string with characters read from input-port until the
region is full or else there are no more characters available from the
port. For read-string!, the region is all of string, and
for read-substring!, the region is that part of string
specified by start and end.
The returned value is the number of characters filled into the region. However, there are several interesting cases to consider:
If read-string! (read-substring!) is called when
input-port is at "end-of-file", then the returned value is
0. Note that "end-of-file" can mean a file port that is at the
file's end, a string port that is at the string's end, or any other port
that will never produce more characters.
If input-port is an interactive port in non-blocking mode and no characters are immediately available,
read-string! (read-substring!) immediately returns the
value #f. Otherwise, the operation blocks until a character is
available. As soon as at least one character is available, the region
is filled using the available characters. The procedure then returns
immediately, without waiting for further characters, even if the number
of available characters is less than the size of the region. The
returned value is the number of characters actually filled in.
read-string! and read-substring! are important because
they are both flexible and extremely fast, especially for large
amounts of data.
The following variables may be dynamically bound to change the behavior
of the read procedure.
*parser-radix* defines the radix used by the reader when it parses numbers. This is similar to passing a radix argument to
string->number. The value of this variable must be one of
2, 8, 10, or 16; any other value is ignored,
and the reader uses radix 10.
Note that much of the number syntax is invalid for radixes other than
10. The reader detects cases where such invalid syntax is used
and signals an error. However, problems can still occur when
*parser-radix* is set to 16, because syntax that normally
denotes symbols can now denote numbers (e.g. abc). Because of
this, it is usually undesirable to set this variable to anything other
than the default.
The default value of this variable is 10.
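The hazard with radix 16 can be illustrated without rebinding any reader variable, by using the radix argument of string->number that the text compares it to. This is a sketch of the ambiguity only, not of the reader itself.

```scheme
;; In radix 10 the text "abc" is not a number...
(string->number "abc" 10)   ; => #f

;; ...but in radix 16 the same text denotes a number.
(string->number "abc" 16)   ; => 2748

;; The radix also changes the value of ordinary digit strings.
(string->number "10" 2)     ; => 2
```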
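*parser-canonicalize-symbols?* controls how the parser handles the case of symbols. If it is bound to #t, symbols read
by the parser are converted to lower case before being interned.
Otherwise, symbols are interned without case conversion.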
In general, it is a bad idea to use this feature, as it doesn't really make Scheme case-sensitive, and therefore can break features of the Scheme runtime that depend on case-insensitive symbols.
Output ports may or may not support buffering of output, in which output characters are collected together in a buffer and then sent to the output device all at once. (Most of the output ports implemented by the runtime system support buffering.) Sending all of the characters in the buffer to the output device is called flushing the buffer. In general, output procedures do not flush the buffer of an output port unless the buffer is full.
However, the standard output procedures described in this section
perform what is called discretionary flushing of the buffer.
Discretionary output flushing works as follows. After a procedure
performs its output (writing characters to the output buffer), it checks
to see if the port implements an operation called
discretionary-flush-output. If so, then that operation is
invoked to flush the buffer. At present, only the console port defines
discretionary-flush-output; this is used to guarantee that output
to the console appears immediately after it is written, without
requiring calls to flush-output.
All optional arguments called output-port, if not supplied, default to the current output port.
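Because ports other than the console do not flush at the program's discretion, code that needs output to reach a file promptly can call flush-output itself. The following is a minimal sketch; the filename "flush-demo.txt" and the helper log-line are assumptions made for illustration.

```scheme
(define log-file "flush-demo.txt")   ; hypothetical scratch filename

;; Hypothetical helper: write MESSAGE as one line and flush at once,
;; so the line does not wait in the port's buffer.
(define (log-line port message)
  (write-string message port)
  (newline port)
  (flush-output port))

(call-with-output-file log-file
  (lambda (port)
    (log-line port "step 1 done")
    (log-line port "step 2 done")))
```

Flushing after every line trades some throughput for promptness, which is usually the right choice for logs and progress messages but not for bulk output.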
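write-string is equivalent to writing the contents of string, one character at a time, using
write-char, except that it is usually much faster.
write-substring is equivalent to writing the contents of the specified substring, one character at a time, using
write-char, except that it is usually much faster.
If object has a standard external representation, then the written representation generated by
write shall be parsable by read into an equivalent object.
Thus strings that appear in the written representation are enclosed in
doublequotes, and within those strings backslash and doublequote are
escaped by backslashes. write performs discretionary output
flushing and returns an unspecified value.
display works like write, except that strings that appear in the written representation are written as if by
write-string instead of by write. Character objects
appear in the representation as if written by write-char instead
of by write. display performs discretionary output
flushing and returns an unspecified value.(18)
newline writes an end-of-line to output-port; it is equivalent to
(write-char #\newline output-port).
fresh-line writes an end-of-line to output-port only if the port can tell that it is not already at the beginning of a line. If output-port cannot tell whether it is at the beginning of a line, this procedure is identical to
newline. In either case,
fresh-line performs discretionary output flushing and returns an
unspecified value.
write-line is like
write, except that it writes an end-of-line to
output-port after writing object's representation. This
procedure performs discretionary output flushing and returns an
unspecified value.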
pp prints object in a visually appealing and structurally
revealing manner on output-port. If object is a procedure,
pp attempts to print the source text. If the optional argument
as-code? is true, pp prints lists as Scheme code, providing
appropriate indentation; by default this argument is false. pp
performs discretionary output flushing and returns an unspecified value.
The following variables may be dynamically bound to change the behavior
of the write and display procedures.
*unparser-radix* specifies the default radix used to print numbers. Its value must be one of 2, 8, 10,
or 16; the default is 10. If *unparser-radix* is
not 10, numbers are prefixed to indicate their radix.
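When only a single number needs to be shown in another radix, number->string with an explicit radix argument is an alternative to rebinding the printer variable. Note that this is a different mechanism from *unparser-radix*: the sketch below assumes nothing about the radix prefix that the printer adds.

```scheme
;; Convert one number to a binary digit string, without a radix prefix.
(number->string 255 2)    ; => "11111111"

;; A radix-16 string reads back to the same number when the same
;; radix is supplied to string->number.
(string->number (number->string 255 16) 16)   ; => 255
```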
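*unparser-list-breadth-limit* specifies a limit on the length of the printed representation of a list; for example, if the limit is
4, only the first four elements of any list are printed, followed
by ellipses to indicate any additional elements. The value of this
variable must be an exact non-negative integer, or #f meaning no
limit; the default is #f.
(fluid-let ((*unparser-list-breadth-limit* 4))
  (write-to-string '(a b c d)))
  => "(a b c d)"
(fluid-let ((*unparser-list-breadth-limit* 4))
  (write-to-string '(a b c d e)))
  => "(a b c d ...)"
*unparser-list-depth-limit* specifies a limit on the nesting of lists in the printed representation; the part of the representation nested more deeply than the limit is replaced by ellipses. The value of this variable must be an exact non-negative integer, or
#f meaning no limit; the default
is #f.
(fluid-let ((*unparser-list-depth-limit* 4))
  (write-to-string '((((a))) b c d)))
  => "((((a))) b c d)"
(fluid-let ((*unparser-list-depth-limit* 4))
  (write-to-string '(((((a)))) b c d)))
  => "((((...))) b c d)"
*unparser-string-length-limit* specifies a limit on the length of the printed representation of strings; characters beyond the limit are replaced by ellipses. The value of this variable must be an exact non-negative integer, or
#f meaning no limit; the default
is #f.
(fluid-let ((*unparser-string-length-limit* 4))
  (write-to-string "abcd"))
  => "\"abcd\""
(fluid-let ((*unparser-string-length-limit* 4))
  (write-to-string "abcde"))
  => "\"abcd...\""
A further variable, which takes a boolean value, tells the printer to use a special printed representation for objects that normally print in a form that cannot be recognized by
read. These objects are printed
using the representation #@n, where n is the result
of calling hash on the object to be printed. The reader
recognizes this syntax, calling unhash on n to get back the
original object. Note that this printed representation can only be
recognized by the Scheme program in which it was generated, because
these hash numbers are different for each invocation of Scheme.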
The procedure format is very useful for producing nicely
formatted text, producing good-looking messages, and so on. MIT
Scheme's implementation of format is similar to that of Common
Lisp, except that Common Lisp defines many more
directives.(19)
format is a run-time-loadable option. To use it, execute
(load-option 'format)
once before calling it.
format writes the characters of control-string to destination, except that a tilde (~) introduces a format directive. The
character after the tilde, possibly preceded by prefix parameters and
modifiers, specifies what kind of formatting is desired. Most
directives use one or more arguments to create their output; the
typical directive puts the next argument into the output,
formatted in some special way. It is an error if no argument remains
for a directive requiring an argument, but it is not an error if one or
more arguments remain unprocessed by a directive.
The output is sent to destination. If destination is
#f, a string is created that contains the output; this string is
returned as the value of the call to format. In all other cases
format returns an unspecified value. If destination is
#t, the output is sent to the current output port. Otherwise,
destination must be an output port, and the output is sent there.
This procedure performs discretionary output flushing (see section 14.5 Output Procedures).
A format directive consists of a tilde (~), optional
prefix parameters separated by commas, optional colon (:) and
at-sign (@) modifiers, and a single character indicating what
kind of directive this is. The alphabetic case of the directive
character is ignored. The prefix parameters are generally integers,
notated as optionally signed decimal numbers. If both the colon and
at-sign modifiers are given, they may appear in either order.
In place of a prefix parameter to a directive, you can put the letter `V' (or `v'), which takes an argument for use as a parameter to the directive. Normally this should be an exact integer. This feature allows variable-width fields and the like. You can also use the character `#' in place of a parameter; it represents the number of arguments remaining to be processed.
It is an error to give a format directive more parameters than it is described here as accepting. It is also an error to give colon or at-sign modifiers to a directive in a combination not specifically described here as being meaningful.
~A
The next argument, which may be any object, is printed as if by
display. ~mincolA inserts spaces on the right, if
necessary, to make the width at least mincol columns. The
@ modifier causes the spaces to be inserted on the left rather
than the right.
~S
The next argument, which may be any object, is printed as if by
write. ~mincolS inserts spaces on the right, if
necessary, to make the width at least mincol columns. The
@ modifier causes the spaces to be inserted on the left rather
than the right.
~%
This outputs a #\newline character. ~n% outputs
n newlines. No argument is used. Simply putting a newline
in control-string would work, but ~% is often used because
it makes the control string look nicer in the middle of a program.
~~
This outputs a tilde. ~n~ outputs n tildes.
~newline
Tilde immediately followed by a newline ignores the newline and any following non-newline whitespace characters. With an @, the
newline is left in place, but any following whitespace is ignored. This
directive is typically used when control-string is too long to fit
nicely into one line of the program:
(define (type-clash-error procedure arg spec actual)
  (format #t
          "~%Procedure ~S~%requires its ~A argument ~
           to be of type ~S,~%but it was called with ~
           an argument of type ~S.~%"
          procedure arg spec actual))

(type-clash-error 'vector-ref "first" 'integer 'vector)

prints

Procedure vector-ref
requires its first argument to be of type integer,
but it was called with an argument of type vector.
Note that in this example newlines appear in the output only as
specified by the ~% directives; the actual newline characters in
the control string are suppressed because each is preceded by a tilde.
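A shorter sketch shows the difference between ~A and ~S and the effect of a mincol prefix parameter. Passing #f as the destination returns the output as a string, which makes the result easy to inspect. The argument strings are arbitrary examples, and the padding results are described in the comments according to the directive descriptions above rather than asserted.

```scheme
(load-option 'format)

;; ~A prints as if by display, ~S as if by write.
(define message (format #f "~A and ~S" "text" "text"))
;; message => "text and \"text\""

;; With mincol 8, a four-character argument should be padded with
;; spaces on the right; the @ modifier moves the padding to the left.
(define padded-right (format #f "~8A|" "name"))
(define padded-left (format #f "~8@A|" "name"))
```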
MIT Scheme provides hooks for specifying that certain kinds of objects have special written representations. There are no restrictions on the written representations, but only a few kinds of objects may have custom representation specified for them, specifically: records (see section 10.4 Records), vectors that have special tags in their zero-th elements (see section 8. Vectors), and pairs that have special tags in their car fields (see section 7. Lists). There is a different procedure for specifying the written representation of each of these types.
An unparser method is a procedure that is invoked with two arguments: an unparser state and an object. An unparser method generates a written representation for the object, writing it to the output port specified by the unparser state. The value yielded by an unparser method is ignored. Note that an unparser state is not an output port, rather it is an object that contains an output port as one of its components. Application programs generally do not construct or examine unparser state objects, but just pass them along.
There are two ways to create an unparser method (which is then
registered by one of the above procedures). The first, and easiest, is
to use standard-unparser-method. The second is to define your
own method using the procedure with-current-unparser-state. We
encourage the use of the first method, as it results in a more uniform
appearance for objects. Many predefined datatypes, for example
procedures and environments, already have this appearance.
standard-unparser-method returns a standard unparser method for the type named by name. Procedure must be #f or a procedure of two arguments.
If procedure is #f, the returned method generates an
external representation of this form:
#[name hash]
Here name is the external representation of the argument
name, as generated by write,(20) and hash is the external
representation of an exact non-negative integer unique to the object
being printed (specifically, it is the result of calling hash on
the object). Subsequently, the expression
#@hash
is notation for the object.
If procedure is supplied, the returned method generates a slightly different external representation:
#[name hash output]
Here name and hash are as above, and output is the output generated by procedure. The representation is constructed in three stages:
The first part of the representation (up to output) is written: this is "#[",
name, " ", and hash.
The following procedure is useful for writing more general kinds of unparser methods.
with-current-unparser-state calls procedure with one argument, the output port from state.
The port passed to procedure should only be used within the dynamic extent of procedure.
This section describes procedures that prompt the user for input. Why should the programmer use these procedures when it is possible to do prompting using ordinary input and output procedures? One reason is that the prompting procedures are more succinct. However, a second and better reason is that the prompting procedures can be separately customized for each user interface, providing more natural interaction. The interfaces for Edwin and for GNU Emacs have already been customized in this fashion; because Edwin and Emacs are very similar editors, their customizations provide very similar behavior.
Each of these procedures accepts an optional argument called
port, which if given must be an I/O port. If not
given, this port defaults to the value of
(interaction-i/o-port); this is initially the console
I/O port.
If prompt is a string, it is used verbatim as the prompt string.
Otherwise, it must be a pair whose car is standard and whose cdr
is a string; in this case the prompt string is formed by prepending to
the string the current REP loop "level number" and a space.
Also, a space is appended to the string, unless it already ends in a
space or is an empty string.
The default behavior of this procedure is to print a fresh line, a newline, and the prompt string; flush the output buffer; then read an object and return it.
Under Edwin and Emacs, before the object is read, the interaction buffer is put into a mode that allows expressions to be edited and submitted for input using specific editor commands. The first expression that is submitted is returned as the value of this procedure.
prompt-for-command-char prompts the user for a single character that is to be executed as a command; the returned character is guaranteed to satisfy
char-graphic?. If at all possible, the character is read from
the user interface using a mode that reads the character as a single
keystroke; in other words, it should not be necessary for the user to
follow the character with a carriage return or something similar.
This is the procedure called by debug and where to read
the user's commands.
If prompt is a string, it is used verbatim as the prompt string.
Otherwise, it must be a pair whose car is standard and whose cdr
is a string; in this case the prompt string is formed by prepending to
the string the current REP loop "level number" and a space.
Also, a space is appended to the string, unless it already ends in a
space or is an empty string.
The default behavior of this procedure is to print a fresh line, a newline, and the prompt string; flush the output buffer; read a character in raw mode, echo that character, and return it.
Under Edwin and Emacs, instead of reading a character, the interaction buffer is put into a mode in which graphic characters submit themselves as input. After this mode change, the first such character submitted is returned as the value of this procedure.
The prompt string is formed by appending a colon and a space to prompt, unless prompt already ends in a space or is the null string.
The default behavior of this procedure is to print a fresh line, a newline, and the prompt string; flush the output buffer; then read an object and return it.
Under Edwin and Emacs, the expression is read in the minibuffer.
prompt-for-evaluated-expression calls
prompt-for-expression to read an expression, then evaluates the
expression using environment; if environment is not given,
the REP loop environment is used.
The prompt string is formed by appending the string " (y or n)? "
to prompt, unless prompt already ends in a space or is the
null string.
The default behavior of this procedure is to print a fresh line, a
newline, and the prompt string; flush the output buffer; then read a
character in raw mode. If the character is #\y, #\Y, or
#\space, the procedure returns #t; if the character is
#\n, #\N, or #\rubout, the procedure returns
#f. Otherwise the prompt is repeated.
Under Edwin or Emacs, the confirmation is read in the minibuffer.
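As an illustration of how a prompting procedure is typically used, the sketch below guards an action behind a yes-or-no question. It assumes that the confirmation procedure described above is named prompt-for-confirmation; the helper confirm-and-run and the result symbol cancelled are hypothetical.

```scheme
;; Hypothetical helper: ask PROMPT and call THUNK only on a "yes" answer.
;; Assumes prompt-for-confirmation returns #t or #f as described above.
(define (confirm-and-run prompt thunk)
  (if (prompt-for-confirmation prompt)
      (thunk)
      'cancelled))

;; Example use (interactive, so not evaluated here):
;; (confirm-and-run "Overwrite existing file"
;;                  (lambda () (display "overwriting...")))
```

Because the prompt string does not end in a space, the text " (y or n)? " is appended automatically, so callers pass only the question itself.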
This section describes the low-level operations that can be used to build and manipulate I/O ports.
The purpose of these operations is twofold: to allow programmers to construct new kinds of I/O ports, and to provide faster I/O operations than those supplied by the standard high level procedures. The latter is useful because the standard I/O operations provide defaulting and error checking, and sometimes other features, which are often unnecessary. This interface provides the means to bypass such features, thus improving performance.
The abstract model of an I/O port, as implemented here, is a combination of a set of named operations and a state. The state is an arbitrary object, the meaning of which is determined by the operations. The operations are defined by a mapping from names to procedures.
The set of named operations is represented by an object called a port type. A port type is constructed from a set of named operations, and is subsequently used to construct a port. The port type completely specifies the behavior of the port. Port types also support a simple form of inheritance, allowing you to create new ports that are similar to existing ports.
The port operations are divided into two classes:
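Standard operations: there is a specific set of standard operations for input ports, and a different set for output ports.
Custom operations: some ports support additional operations. For example, ports that implement output to terminals may define an operation named
y-size that returns the height of the terminal in characters.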
Because only some ports will implement these operations, programs that
use custom operations must test each port for their existence, and be
prepared to deal with ports that do not implement them.
14.9.1 Port Types
14.9.2 Constructors and Accessors for Ports
14.9.3 Input Port Operations
14.9.4 Output Port Operations
14.9.5 Blocking Mode
14.9.6 Terminal Mode
The procedures in this section provide means for constructing port types with standard and custom operations, and accessing their operations.
make-port-type creates and returns a new port type from operations. Port-type is either
#f or a port type; if it is a port
type, any operations implemented by port-type but not specified in
operations will be implemented by the resulting port type.
Operations need not contain definitions for all of the standard
operations; the procedure will provide defaults for any standard
operations that are not defined. At a minimum, the following operations
must be defined: for input ports, read-char and peek-char;
for output ports, either write-char or write-substring.
I/O ports must supply the minimum operations for both input and
output.
If an operation in operations is defined to be #f, then the
corresponding operation in port-type is not inherited.
If read-char is defined in operations, then any standard
input operations defined in port-type are ignored. Likewise, if
write-char or write-substring is defined in
operations, then any standard output operations defined in
port-type are ignored. This feature allows overriding the
standard operations without having to enumerate them.
#t if object is a port type, input-port type, output-port type, or I/O-port type, respectively. Otherwise, they return #f.
#f.
The procedures in this section provide means for constructing ports, accessing the type of a port, and manipulating the state of a port.
    (port-type/operation (port/type port) symbol)
    (port-type/operation-names (port/type port))
eof-object?. This is sometimes useful when building input ports.
This section describes the standard operations on input ports. Following that, some useful custom operations are described.
#f is returned; otherwise the operation hangs until input is available.
read-char.
read-char.
char-ready? returns #t if at least one character is available to be read from input-port. If no characters are available, the operation waits up to k milliseconds before returning #f, returning immediately if any characters become available while it is waiting.
read-char and discard-char, except that they read or discard multiple characters at once. This can markedly improve performance on buffered input ports. All characters up to, but excluding, the first character in char-set (or end of file) are read from input-port. read-string returns these characters as a newly allocated string, while discard-chars discards them and returns an unspecified value.
These operations hang until sufficient input is available, even if input-port is in non-blocking mode. If end of file is encountered before any input characters, read-string returns an end-of-file object.
If input-port is an interactive port, and at least one character is immediately available, the available characters are written to the substring and this operation returns immediately. If no characters are available, and input-port is in blocking mode, the operation blocks until at least one character is available. Otherwise, the operation returns #f immediately.
This is an extremely fast way to read characters from a port.
    (input-port/read-char input-port)
    ((input-port/operation input-port 'read-char) input-port)
The following custom operations are implemented for input ports to files, and will also work with some other kinds of input ports:
#t if input-port is known to be at end of file, otherwise it returns #f.
#f.
This section describes the standard operations on output ports. Following that, some useful custom operations are described.
flush-output.
    (output-port/write-char output-port char)
    ((output-port/operation output-port 'write-char) output-port char)
    (output-port/write-substring output-port string 0 (string-length string))
The following custom operations are generally useful.
#f is returned.
#f is returned.
x-size, if it exists. If the x-size operation is both defined and returns a value other than #f, that value is returned as the result of this procedure. Otherwise, output-port/x-size returns a default value (currently 80).
output-port/x-size is useful for programs that tailor their output to the width of the display (a fairly common practice). If the output device is not a display, such programs normally want some reasonable default width to work with, and this procedure provides exactly that.
y-size, if it exists. If the y-size operation is defined, the value it returns is returned as the result of this procedure; otherwise, #f is returned.
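As an illustration of tailoring output to the display width, here is a minimal sketch built on output-port/x-size as described above. The helper name write-rule is hypothetical.

```scheme
;; Hypothetical helper: write a rule of dashes as wide as PORT's display.
;; output-port/x-size falls back to a default width when the port has no
;; usable x-size operation, so this also works for non-display ports.
(define (write-rule port)
  (write-string (make-string (output-port/x-size port) #\-) port)
  (newline port))
```

For example, (write-rule (current-output-port)) draws a rule across the console.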
An interactive port is always in one of two modes: blocking or non-blocking. This mode is independent of the terminal mode: each can be changed independently of the other. Furthermore, an interactive I/O port has separate blocking modes for input and for output.
If an input port is in blocking mode, attempting to read from it when no input is available will cause Scheme to "block", i.e. suspend itself, until input is available. If an input port is in non-blocking mode, attempting to read from it when no input is available will cause the reading procedure to return immediately, indicating the lack of input in some way (exactly how this situation is indicated is separately specified for each procedure or operation).
An output port in blocking mode will block if the output device is not ready to accept output. In non-blocking mode it will return immediately after performing as much output as the device will allow (again, each procedure or operation reports this situation in its own way).
Interactive ports are initially in blocking mode; this can be changed at any time with the procedures defined in this section.
These procedures represent blocking mode by the symbol blocking, and non-blocking mode by the symbol nonblocking. An argument called mode must be one of these symbols. A port argument to any of these procedures may be any port, even if that port does not support blocking mode; in that case, the port is not modified in any way.
port/with-input-blocking-mode binds the input blocking mode of port to be mode, executes thunk, restores the input blocking mode of port to what it was when port/with-input-blocking-mode was called, and returns the value that was yielded by thunk. This binding is performed by dynamic-wind, which guarantees that the input blocking mode is restored if thunk escapes from its continuation.
port/with-output-blocking-mode binds the output blocking mode of port to be mode, executes thunk, restores the output blocking mode of port to what it was when port/with-output-blocking-mode was called, and returns the value that was yielded by thunk. This binding is performed by dynamic-wind, which guarantees that the output blocking mode is restored if thunk escapes from its continuation.
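The following is a minimal sketch of polling a port for input without waiting. It assumes the argument order port, mode, thunk for port/with-input-blocking-mode, and uses input-port/read-char because the read-char operation is the one documented to return #f in non-blocking mode. The helper name poll-char is hypothetical.

```scheme
;; Hypothetical helper: return the next character from PORT if one is
;; already available; otherwise return #f instead of waiting.
(define (poll-char port)
  (port/with-input-blocking-mode port 'nonblocking
    (lambda ()
      ;; Within the thunk the port is non-blocking; the previous mode is
      ;; restored on exit, even if the thunk escapes.
      (input-port/read-char port))))
```

A typical use would be (poll-char console-i/o-port) inside a loop that has other work to do between keystrokes.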
A port that reads from or writes to a terminal has a terminal mode; this is either cooked or raw. This mode is independent of the blocking mode: each can be changed independently of the other. Furthermore, a terminal I/O port has independent terminal modes both for input and for output.
A terminal port in cooked mode provides some standard processing to make the terminal easy to communicate with. For example, under Unix, cooked mode on input reads from the terminal a line at a time and provides rubout processing within the line, while cooked mode on output might translate linefeeds to carriage-return/linefeed pairs. In general, the precise meaning of cooked mode is operating-system dependent, and furthermore might be customizable by means of operating system utilities. The basic idea is that cooked mode does whatever is necessary to make the terminal handle all of the usual user-interface conventions for the operating system, while keeping the program's interaction with the port as normal as possible.
A terminal port in raw mode disables all of that processing. In raw mode, characters are directly read from and written to the device without any translation or interpretation by the operating system. On input, characters are available as soon as they are typed, and are not echoed on the terminal by the operating system. In general, programs that put ports in raw mode have to know the details of interacting with the terminal. In particular, raw mode is used for writing programs such as text editors.
Terminal ports are initially in cooked mode; this can be changed at any time with the procedures defined in this section.
These procedures represent cooked mode by the symbol cooked, and raw mode by the symbol raw. Additionally, the value #f represents "no mode"; it is the terminal mode of a port that is not a terminal. An argument called mode must be one of these three values. A port argument to any of these procedures may be any port, even if that port does not support terminal mode; in that case, the port is not modified in any way.
port/with-input-terminal-mode binds the input terminal mode of port to be mode, executes thunk, restores the input terminal mode of port to what it was when port/with-input-terminal-mode was called, and returns the value that was yielded by thunk. This binding is performed by dynamic-wind, which guarantees that the input terminal mode is restored if thunk escapes from its continuation.
port/with-output-terminal-mode binds the output terminal mode of port to be mode, executes thunk, restores the output terminal mode of port to what it was when port/with-output-terminal-mode was called, and returns the value that was yielded by thunk. This binding is performed by dynamic-wind, which guarantees that the output terminal mode is restored if thunk escapes from its continuation.
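A short sketch of reading a single keystroke in raw mode, as an editor-like program might. It assumes the argument order port, mode, thunk for port/with-input-terminal-mode; the helper name read-key is hypothetical.

```scheme
;; Hypothetical helper: read one character from PORT in raw mode, so the
;; character is available as soon as it is typed and is not echoed by the
;; operating system.  The previous terminal mode is restored afterwards.
(define (read-key port)
  (port/with-input-terminal-mode port 'raw
    (lambda ()
      (read-char port))))
```

On a port that is not a terminal the mode change has no effect, so the helper simply reads the next character.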
The parser buffer mechanism facilitates construction of parsers for complex grammars. It does this by providing an input stream with unbounded buffering and backtracking. The amount of buffering is under program control. The stream can backtrack to any position in the buffer.
The mechanism defines two data types: the parser buffer and the parser-buffer pointer. A parser buffer is like an input port with buffering and backtracking. A parser-buffer pointer is a pointer into the stream of characters provided by a parser buffer.
Note that all of the procedures defined here consider a parser buffer to contain a stream of 8-bit characters in the ISO-8859-1 character set, except for match-utf8-char-in-alphabet, which treats it as a stream of Unicode characters encoded as 8-bit bytes in the UTF-8 encoding.
There are several constructors for parser buffers:
input-port->parser-buffer, but it runs faster and uses less memory.
substring->parser-buffer but buffers the entire string.
Parser buffers and parser-buffer pointers may be distinguished from other objects:
#t if object is a parser buffer, otherwise returns #f.
#t if object is a parser-buffer pointer, otherwise returns #f.
Characters can be read from a parser buffer much as they can be read from an input port. The parser buffer maintains an internal pointer indicating its current position in the input stream. Additionally, the buffer remembers all characters that were previously read, and can look at characters arbitrarily far ahead in the stream. It is this buffering capability that facilitates complex matching and backtracking.
#f and leaves the internal pointer unchanged.
#f if no characters are available. Leaves the internal pointer unchanged.
#f. Leaves the internal pointer unchanged.
The internal pointer of a parser buffer can be read or written:
get-parser-buffer-pointer on buffer. Additionally, if some of buffer's characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.
get-parser-buffer-pointer on buffer. Additionally, if some of buffer's characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.
set-parser-buffer-pointer!.
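Saving and restoring the internal pointer is what makes backtracking possible. The sketch below assumes the constructor string->parser-buffer (the whole-string constructor) and the reader read-parser-buffer-char under their MIT Scheme names; the variable names are illustrative only.

```scheme
(define buffer (string->parser-buffer "abc"))

;; Remember the current position, then consume two characters.
(define start (get-parser-buffer-pointer buffer))
(read-parser-buffer-char buffer)            ; consumes #\a
(read-parser-buffer-char buffer)            ; consumes #\b

;; Backtrack: the buffer still remembers the characters already read,
;; so resetting the pointer makes them available again.
(set-parser-buffer-pointer! buffer start)
(peek-parser-buffer-char buffer)            ; looks at #\a without consuming it
```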
The next rather large set of procedures does conditional matching against the contents of a parser buffer. All matching is performed relative to the buffer's internal pointer, so the first character to be matched against is the next character that would be returned by peek-parser-buffer-char. The returned value is always #t for a successful match, and #f otherwise. For procedures whose names do not end in `-no-advance', a successful match also moves the internal pointer of the buffer forward to the end of the matched text; otherwise the internal pointer is unchanged.
match-parser-buffer-char compares the character to char using char=?. The procedures whose names contain the `-ci' modifier do case-insensitive comparison (i.e. they use char-ci=?). The procedures whose names contain the `not-' modifier are successful if the character doesn't match char.
char-set-member?.
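The naming conventions above can be seen in a short session. This is a sketch that assumes the procedure names match-parser-buffer-char-no-advance and match-parser-buffer-char-ci, formed from the modifiers just described, and the constructor string->parser-buffer; the variable name greeting is illustrative.

```scheme
(define greeting (string->parser-buffer "Hello"))

;; Succeeds, but the `-no-advance' variant leaves the pointer at #\H.
(define peeked (match-parser-buffer-char-no-advance greeting #\H))

;; Fails: the plain variant compares with char=?, and #\h is not #\H.
(define exact (match-parser-buffer-char greeting #\h))

;; Succeeds: the `-ci' variant ignores case, and the pointer moves past #\H.
(define folded (match-parser-buffer-char-ci greeting #\h))
```

After these three calls the next character in the buffer is #\e.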
The remaining procedures provide information that can be used to identify locations in a parser buffer's stream.
Pointer may alternatively be a parser buffer, in which case it is equivalent to having specified the buffer's internal pointer.
Although it is possible to write parsers using the parser-buffer abstraction (see section 14.10 Parser Buffers), it is tedious. The problem is that the abstraction isn't closely matched to the way that people think about syntactic structures. In this section, we introduce a higher-level mechanism that greatly simplifies the implementation of a parser.
The parser language described here allows the programmer to write BNF-like specifications that are translated into efficient Scheme code at compile time. The language is declarative, but it can be freely mixed with Scheme code; this allows the parsing of grammars that aren't conveniently described in the language.
The language also provides backtracking. For example, this expression matches any sequence of alphanumeric characters followed by a single alphabetic character:
    (*matcher
     (seq (* (char-set char-set:alphanumeric))
          (char-set char-set:alphabetic)))
The way that this works is that the matcher matches alphanumeric characters in the input stream until it finds a non-alphanumeric character. It then tries to match an alphabetic character, which of course fails. At this point, if it matched at least one alphanumeric character, it backtracks: the last matched alphanumeric is "unmatched", and it again attempts to match an alphabetic character. The backtracking can be arbitrarily deep; the matcher will continue to back up until it finds a way to match the remainder of the expression.
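To watch the backtracking happen, the matcher can be applied to a parser buffer. This is a sketch, assuming the option has been loaded as described later in this section and that string->parser-buffer is available; the name match-word is hypothetical.

```scheme
(load-option '*parser)

(define match-word
  (*matcher
   (seq (* (char-set char-set:alphanumeric))
        (char-set char-set:alphabetic))))

;; The * first consumes all of "ab12".  No alphabetic character follows,
;; so it gives characters back one at a time until the final
;; (char-set char-set:alphabetic) can match #\b.
(define with-letters (match-word (string->parser-buffer "ab12")))

;; Here backtracking runs out of occurrences without ever finding an
;; alphabetic character, so the match fails.
(define digits-only (match-word (string->parser-buffer "1234")))
```

Under the rules described above, with-letters is #t and digits-only is #f.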
So far, this sounds a lot like regular-expression matching (see section 6.8 Regular Expressions). However, there are some important differences.
Here is an example that shows off several of the features of the parser language. The example is a parser for XML start tags:
    (*parser
     (with-pointer p
       (seq "<"
            parse-name
            parse-attribute-list
            (alt (match ">")
                 (match "/>")
                 (sexp
                  (lambda (b)
                    (error
                     (string-append
                      "Unterminated start tag at "
                      (parser-buffer-position-string p)))))))))
This shows that the basic description of a start tag is very similar to its BNF. Non-terminal symbols parse-name and parse-attribute-list do most of the work, and the noise strings "<" and ">" are the syntactic markers delimiting the form. There are two alternate endings for start tags, and if the parser doesn't find either of the endings, the Scheme code (wrapped in sexp) is run to signal an error. The error code passes the pointer p to parser-buffer-position-string to indicate the position in the input stream at which the error occurred. In this case, that is the beginning of the start tag, i.e. the position of the leading "<" marker.
This example still looks pretty complicated, mostly due to the error-signalling code. In practice, this is abstracted into a macro, after which the expression is quite succinct:
    (*parser
     (bracket "start tag"
         (seq (noise (string "<")) parse-name)
         (match (alt (string ">") (string "/>")))
       parse-attribute-list))
The bracket macro captures the pattern of a bracketed item, and hides much of the detail.
The parser language actually consists of two languages: one for defining matchers, and one for defining parsers. The languages are intentionally very similar, and are meant to be used together. Each sub-language is described below in its own section.
The parser language is a run-time-loadable option; to use it, execute
    (load-option '*parser)
once before compiling any code that uses the language.
14.11.1 *Matcher
14.11.2 *Parser
14.11.3 Parser-language Macros
The matcher language is a declarative language for specifying a matcher procedure. A matcher procedure is a procedure that accepts a single parser-buffer argument and returns a boolean value indicating whether the match it performs was successful. If the match succeeds, the internal pointer of the parser buffer is moved forward over the matched text. If the match fails, the internal pointer is unchanged.
For example, here is a matcher procedure that matches the character `a':
    (lambda (b)
      (match-parser-buffer-char b #\a))
Here is another example that matches two given characters, c1 and c2, in sequence:
    (lambda (b)
      (let ((p (get-parser-buffer-pointer b)))
        (if (match-parser-buffer-char b c1)
            (if (match-parser-buffer-char b c2)
                #t
                (begin
                  (set-parser-buffer-pointer! b p)
                  #f))
            #f)))
This code is clear, but has lots of details that get in the way of understanding what it is doing. Here is the same example in the matcher language:
    (*matcher (seq (char c1) (char c2)))
This is much simpler and more intuitive. And it generates virtually the same code:
    (pp (*matcher (seq (char c1) (char c2))))
    -| (lambda (#[b1])
    -|   (let ((#[p1] (get-parser-buffer-pointer #[b1])))
    -|     (and (match-parser-buffer-char #[b1] c1)
    -|          (if (match-parser-buffer-char #[b1] c2)
    -|              #t
    -|              (begin
    -|                (set-parser-buffer-pointer! #[b1] #[p1])
    -|                #f)))))
Now that we have seen an example of the language, it's time to look at the detail. The *matcher special form is the interface between the matcher language and Scheme.
*matcher expression expands into Scheme code that implements a matcher procedure.
Here are the predefined matcher expressions. New matcher expressions can be defined using the macro facility (see section 14.11.3 Parser-language Macros). We will start with the primitive expressions.
string-ci expression does case-insensitive matching.
end-of-input expression is successful only when there are no more characters available to be matched.
discard-matched expression always successfully matches the null string. However, it isn't meant to be used as a matching expression; it is used for its effect. discard-matched causes all of the buffered text prior to this point to be discarded (i.e. it calls discard-parser-buffer-head! on the parser buffer).
Note that discard-matched may not be used in certain places in a matcher expression. The reason for this is that it deliberately discards information needed for backtracking, so it may not be used in a place where subsequent backtracking will need to back over it. As a rule of thumb, use discard-matched only in the last operand of a seq or alt expression (including any seq or alt expressions in which it is indirectly contained).
In addition to the above primitive expressions, there are two convenient abbreviations. A character literal (e.g. `#\A') is a legal primitive expression, and is equivalent to a char expression with that literal as its operand (e.g. `(char #\A)'). Likewise, a string literal is equivalent to a string expression (e.g. `(string "abc")').
Next there are several combinator expressions. These closely correspond to similar combinators in regular expressions. Parameters named mexp are arbitrary expressions in the matcher language.
    (seq (char-set char-set:alphabetic)
         (char-set char-set:numeric))
matches an alphabetic character followed by a numeric character, such as `H4'.
Note that if there are no mexp operands, the seq expression successfully matches the null string.
alt expression.
The alt expression participates in backtracking. If one of the mexp operands matches, but the overall match in which this expression is embedded fails, the backtracking mechanism will cause the alt expression to try the remaining mexp operands. For example, if the expression
    (seq (alt "ab" "a") "b")
is matched against the text `abc', the alt expression will initially match its first operand. But it will then fail to match the second operand of the seq expression. This will cause the alt to be restarted, at which time it will match `a', and the overall match will succeed.
Note that if there are no mexp operands, the alt match will always fail.
The * expression participates in backtracking; if it matches N occurrences of mexp, but the overall match fails, it will backtrack to N-1 occurrences and continue. If the overall match continues to fail, the * expression will continue to backtrack until there are no occurrences left.
    (seq mexp (* mexp))
    (alt mexp (seq))
sexp expression allows arbitrary Scheme code to be embedded inside a matcher. The expression operand must evaluate to a matcher procedure at run time; the procedure is called to match the parser buffer. For example,
    (*matcher (seq "a" (sexp parse-foo) "b"))
expands to
    (lambda (#[b1])
      (let ((#[p1] (get-parser-buffer-pointer #[b1])))
        (and (match-parser-buffer-char #[b1] #\a)
             (if (parse-foo #[b1])
                 (if (match-parser-buffer-char #[b1] #\b)
                     #t
                     (begin
                       (set-parser-buffer-pointer! #[b1] #[p1])
                       #f))
                 (begin
                   (set-parser-buffer-pointer! #[b1] #[p1])
                   #f)))))
The case in which expression is a symbol is so common that it has an abbreviation: `(sexp symbol)' may be abbreviated as just symbol.
with-pointer expression fetches the parser buffer's internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then matches the pattern specified by mexp. Identifier must be a symbol.
This is meant to be used in conjunction with sexp, as a way to capture a pointer to a part of the input stream that is outside the sexp expression. An example of the use of with-pointer appears above (see with-pointer example).
The parser language is a declarative language for specifying a parser procedure. A parser procedure is a procedure that accepts a single parser-buffer argument and parses some of the input from the buffer. If the parse is successful, the procedure returns a vector of objects that are the result of the parse, and the internal pointer of the parser buffer is advanced past the input that was parsed. If the parse fails, the procedure returns #f and the internal pointer is unchanged. This interface is much like that of a matcher procedure, except that on success the parser procedure returns a vector of values rather than #t.
The *parser special form is the interface between the parser language and Scheme.
*parser expression expands into Scheme code that implements a parser procedure.
There are several primitive expressions in the parser language. The first two provide a bridge to the matcher language (see section 14.11.1 *Matcher):
match expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the match expression is a vector of one element: a string containing that text.
noise expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the noise expression is a vector of zero elements. (In other words, the text is matched and then thrown away.)
The mexp operand is often a known character or string, so in the case that mexp is a character or string literal, the noise expression can be abbreviated as the literal. In other words, `(noise "foo")' can be abbreviated just `"foo"'.
values expression supports this. The expression arguments are arbitrary Scheme expressions that are evaluated at run time and returned in a vector. The values expression always succeeds and never modifies the internal pointer of the parser buffer.
discard-matched expression always succeeds, returning a vector of zero elements. In all other respects it is identical to the discard-matched expression in the matcher language.
Next there are several combinator expressions. Parameters named pexp are arbitrary expressions in the parser language. The first few combinators are direct equivalents of those in the matcher language.
seq expression parses each of the pexp operands in order. If all of the pexp operands successfully match, the result is the concatenation of their values (by vector-append).
alt expression attempts to parse each pexp operand in order from left to right. The first one that successfully parses produces the result for the entire alt expression.
Like the alt expression in the matcher language, this expression participates in backtracking.
* expression parses zero or more occurrences of pexp. The results of the parsed occurrences are concatenated together (by vector-append) to produce the expression's result.
Like the * expression in the matcher language, this expression participates in backtracking.
+ expression parses one or more occurrences of pexp. It is equivalent to
    (seq pexp (* pexp))
? expression parses zero or one occurrences of pexp. It is equivalent to
    (alt pexp (seq))
The next three expressions do not have equivalents in the matcher language. Each accepts a single pexp argument, which is parsed in the usual way. These expressions perform transformations on the returned values of a successful match.
transform expression performs an arbitrary transformation of the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and must return a vector or #f. If it returns a vector, the parse is successful, and those are the resulting values. If it returns #f, the parse fails and the internal pointer of the parser buffer is returned to what it was before pexp was parsed.
For example:
    (transform (lambda (v) (if (= 0 (vector-length v)) #f v)) ...)
encapsulate expression transforms the values returned by parsing pexp into a single value. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and may return any Scheme object. The result of the encapsulate expression is a vector of length one containing that object. (And consequently encapsulate doesn't change the success or failure of pexp, only its value.)
For example:
    (encapsulate vector->list ...)
map expression performs a per-element transform on the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is mapped (by vector-map) over the values returned from the parse. The mapped values are returned as the result of the map expression. (And consequently map doesn't change the success or failure of pexp, nor the number of values returned.)
For example:
    (map string->symbol ...)
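The pieces described so far can be combined into a complete parser procedure. The following is a sketch under stated assumptions: the option has been loaded, string->parser-buffer is available, and the name parse-setting and the "name=digits" input format are invented for illustration.

```scheme
(load-option '*parser)

;; Hypothetical parser for text such as "width=80".
(define parse-setting
  (*parser
   (encapsulate (lambda (v)
                  ;; V holds the two matched strings; "=" was noise.
                  (cons (string->symbol (vector-ref v 0))
                        (vector-ref v 1)))
     (seq (match (+ (char-set char-set:alphabetic)))
          (noise (char #\=))
          (match (+ (char-set char-set:numeric)))))))

;; Success yields a vector of length one holding the encapsulated pair.
(define parsed (parse-setting (string->parser-buffer "width=80")))

;; The first match fails on "=", so the whole parse returns #f.
(define rejected (parse-setting (string->parser-buffer "=80")))
```

Here seq concatenates the one-element vectors from the two match expressions with the empty vector from noise, and encapsulate wraps the procedure's result in a new vector of length one.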
Finally, as in the matcher language, we have sexp and with-pointer to support embedding Scheme code in the parser.
sexp expression allows arbitrary Scheme code to be embedded inside a parser. The expression operand must evaluate to a parser procedure at run time; the procedure is called to parse the parser buffer. This is the parser-language equivalent of the sexp expression in the matcher language.
The case in which expression is a symbol is so common that it has an abbreviation: `(sexp symbol)' may be abbreviated as just symbol.
with-pointer expression fetches the parser buffer's internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then parses the pattern specified by pexp. Identifier must be a symbol. This is the parser-language equivalent of the with-pointer expression in the matcher language.
The parser and matcher languages provide a macro facility so that common patterns can be abstracted. The macro facility allows new expression types to be independently defined in the two languages. The macros are defined in hierarchically organized tables, so that different applications can have private macro bindings.
define special form, and expression is a Scheme expression.
If formals is a list (or improper list) of symbols, the first symbol in the list is the name of the macro, and the remaining symbols are interpreted as the formals of a lambda expression. A lambda expression is formed by combining the latter formals with the expression, and this lambda expression, when evaluated, becomes the expander. The defined macro accepts the same number of operands as the expander. A macro instance is expanded by applying the expander to the list of operands; the result of the application is interpreted as a replacement expression for the macro instance.
If formals is a symbol, it is the name of the macro. In this case, the expander is a procedure of no arguments whose body is expression. When the formals symbol appears by itself as an expression in the language, the expander is called with no arguments, and the result is interpreted as a replacement expression for the symbol.
define-*matcher-macro and define-*parser-macro special forms expand into calls to these procedures.
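As a sketch of the list form of formals, the following defines a matcher macro and then uses it. The macro name one-or-more and the procedure name match-digits are hypothetical, and the sketch assumes the forms are evaluated in order (for example at the REPL), so that the macro is defined in the current table before the *matcher form that uses it is expanded.

```scheme
(load-option '*parser)

;; Hypothetical macro: (one-or-more cs) matches one or more characters
;; from the character set named by the operand CS.  The expander receives
;; the operand unevaluated and returns the replacement expression.
(define-*matcher-macro (one-or-more cs)
  `(seq (char-set ,cs) (* (char-set ,cs))))

(define match-digits
  (*matcher (one-or-more char-set:numeric)))

(define accepted (match-digits (string->parser-buffer "2024")))
(define refused (match-digits (string->parser-buffer "x")))
```

The macro instance (one-or-more char-set:numeric) is replaced by (seq (char-set char-set:numeric) (* (char-set char-set:numeric))) before the matcher is compiled.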
The remaining procedures define the interface to the parser-macros table abstraction. Each parser-macro table has a separate binding space for macros in the matcher and parser languages. However, the table inherits bindings from one specified table; it's not possible to inherit matcher-language bindings from one table and parser-language bindings from another.
#f; usually it is specified as the value of global-parser-macros.
There is a "current" table at all times, and macro definitions are always placed in this table. By default, the current table is the global macro table, but the following procedures allow this to be changed.
parser-macros?.
parser-macros?, and thunk must be a procedure of no arguments.
MIT Scheme provides a simple non-validating XML parser. This parser is mostly conformant, with the exception that it doesn't support UTF-16. The parser also does not support external document type declarations (DTDs). The output of the parser is a record tree that closely reflects the structure of the XML document.
There is also an output mechanism that writes an XML record tree to a port. There is no guarantee that parsing an XML document and writing it back out will make a verbatim copy of the document. The output will be semantically identical but may have small syntactic differences. For example, comments are discarded by the parser, and entities are substituted during the parsing process.
The purpose of the XML support is to provide a mechanism for reading and writing simple XML documents. In the future this support may be further developed to support a standard interface such as DOM or SAX.
The XML support is a run-time-loadable option; to use it, execute
    (load-option 'xml)
once before compiling any code that uses it.
The XML interface consists of an input procedure, an output procedure, and a set of record types.
xml-document, which is the root record of an XML record tree. The output is encoded in UTF-8.
XML names are represented in memory as symbols. All symbols appearing within XML records are XML names. Because XML names are case sensitive, there is a procedure to intern these symbols:
    (let ((name1 (xml-intern string1))
          (name2 (xml-intern string2)))
      (if (string=? string1 string2)
          (eq? name1 name2)
          (not (eq? name1 name2))))
The output from the XML parser and the input to the XML output procedure is a complex data structure composed of a hierarchy of typed components. Each component is a record whose fields correspond to parts of the XML structure that the record represents. There are no special operations on these records; each is a tuple with named subparts. The root record type is xml-document, which represents a complete XML document.
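Each record type, written type below, has the following associated bindings:
type-rtd
    The record-type descriptor for type.
type?
    A predicate that returns #t if the object is a record of this type, or #f otherwise.
make-type
    A constructor for records of this type.
type-field
    An accessor for the field named field in records of this type.
set-type-field!
    A modifier for the field named field in records of this type.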
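The naming pattern above can be made concrete with a small sketch. The helper binding-names below is hypothetical and is not part of the XML option; it only builds the names that the pattern would produce for a given record type and field, here the xml-element type and its name field.

```scheme
;; Hypothetical helper, not part of the XML option: lists the names of
;; the bindings associated with record type TYPE and one field FIELD,
;; following the naming pattern described above.
(define (binding-names type field)
  (let ((t (symbol->string type))
        (f (symbol->string field)))
    (map string->symbol
         (list (string-append t "-rtd")              ; record-type descriptor
               (string-append t "?")                 ; predicate
               (string-append "make-" t)             ; constructor
               (string-append t "-" f)               ; field accessor
               (string-append "set-" t "-" f "!")))));  field modifier

(binding-names 'xml-element 'name)
;; => (xml-element-rtd xml-element? make-xml-element
;;     xml-element-name set-xml-element-name!)
```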
The xml-document record is the top-level record representing a complete XML document. Declaration is either an xml-declaration object or #f. Dtd is either an xml-dtd object or #f. Root is an xml-element object. Misc-1, misc-2, and misc-3 are lists of miscellaneous items; a miscellaneous item is either an xml-processing-instructions object or a string of whitespace.
The xml-declaration record represents the `<?xml ... ?>' declaration that optionally appears at the beginning of an XML document. Version is a version string, typically "1.0". Encoding is either an encoding string or #f. Standalone is either "yes", "no", or #f.
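As an illustration of how the three fields map onto the declaration text, here is a minimal sketch. The procedure declaration->string is hypothetical and not part of the XML option; it takes the field values directly rather than an xml-declaration record, and it assumes the values are already valid for use inside double quotes.

```scheme
;; Hypothetical helper, not part of the XML option: builds the text of
;; an XML declaration from the three field values described above.
;; VERSION is a string; ENCODING and STANDALONE are strings or #f.
(define (declaration->string version encoding standalone)
  (string-append
   "<?xml version=\"" version "\""
   ;; Optional parts are emitted only when the field is not #f.
   (if encoding (string-append " encoding=\"" encoding "\"") "")
   (if standalone (string-append " standalone=\"" standalone "\"") "")
   "?>"))

(declaration->string "1.0" "UTF-8" #f)
;; => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
```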
The xml-element record represents general XML elements; the bulk of a typical XML document consists of these elements. Name is the element name (a symbol). Attributes is a list of attributes; each attribute is a pair whose CAR is the attribute name (a symbol), and whose CDR is the attribute value (a string). Contents is a list of the contents of the element. Each element of this list is either a string, an xml-element record, an xml-processing-instructions record, or an xml-uninterpreted record.
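Because the attribute list is an ordinary association list keyed by symbols, standard list operations apply to it. The sketch below uses a hypothetical helper, attribute-value, and a hand-written sample list in place of the attributes of a parsed element; it relies only on assq, which compares keys with eq? and so works for interned symbols.

```scheme
;; Hypothetical helper, not part of the XML option: looks up attribute
;; NAME (a symbol) in ATTRIBUTES, a list of (name . value) pairs as
;; described above.  Returns the value string, or #f if absent.
(define (attribute-value attributes name)
  (let ((entry (assq name attributes)))
    (and entry (cdr entry))))

;; Sample data standing in for the attributes field of an xml-element.
(define sample-attributes
  '((href . "http://example.com/") (title . "Example")))

(attribute-value sample-attributes 'href)   ; => "http://example.com/"
(attribute-value sample-attributes 'class)  ; => #f
```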
The xml-processing-instructions record represents processing instructions, which have the form `<?name ... ?>'. These instructions are intended to contain non-XML data that will be processed by another interpreter; for example they might contain PHP programs. The name field is the processor name (a symbol), and the text field is the body of the instructions (a string).
Entity references that the parser cannot expand are left uninterpreted and wrapped in xml-uninterpreted records. In some situations, for example when they are embedded in attribute values, the surrounding text is also included in the xml-uninterpreted record. The text field contains the uninterpreted XML text (a string).
The xml-dtd record represents a document type declaration. The root field is an XML name for the root element of the document. External is either an xml-external-id record or #f. Internal is a list of DTD element records (e.g. xml-!element, xml-!attlist, etc.).
The remaining record types are valid only within a DTD.
The xml-!element record represents an element-type declaration. Name is the XML name of the type being declared (a symbol). Content-type describes the type and can have several different values, as follows:
The xml-!attlist record represents an attribute-list declaration. Name is the XML name of the type for which attributes are being declared (a symbol). Definitions is a list of attribute definitions, each of which is a list of three elements (name type default). Name is an XML name for the name of the attribute (a symbol). Type describes the attribute type, and can have one of the following values:
Default describes the default value for the attribute, and can have one of the following values:
xml-uninterpreted record.
xml-uninterpreted record.
The xml-!entity record represents a general entity declaration. Name is an XML name for the entity. Value is the entity's value, either a string, an xml-uninterpreted record, or an xml-external-id record.
The xml-parameter-!entity record represents a parameter entity declaration. Name is an XML name for the entity. Value is the entity's value, either a string, an xml-uninterpreted record, or an xml-external-id record.
The xml-unparsed-!entity record represents an unparsed entity declaration. Name is an XML name for the entity. Id is an xml-external-id record. Notation is an XML name for the notation.
The xml-!notation record represents a notation declaration. Name is an XML name for the notation. Id is an xml-external-id record.
The xml-external-id record is a reference to an external DTD. This reference consists of two parts: id is a public ID literal, corresponding to the `PUBLIC' keyword, while uri is a system literal, corresponding to the `SYSTEM' keyword. Either or both may be present, depending on the context. Each is represented as a string.
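To show how the two parts correspond to the XML syntax, the following sketch renders them as external-ID text. The procedure external-id->string is hypothetical and not part of the XML option; it takes the two parts directly rather than an xml-external-id record, and it assumes that an absent part is represented as #f, which the description above does not state. It also assumes the literals contain no double quotes.

```scheme
;; Hypothetical helper, not part of the XML option: renders the two
;; parts of an external ID in XML syntax.  ID is the public ID literal
;; and URI the system literal; an absent part is assumed to be #f.
(define (external-id->string id uri)
  (cond ((and id uri)
         ;; In XML syntax the system literal follows the public ID
         ;; literal directly, without a separate SYSTEM keyword.
         (string-append "PUBLIC \"" id "\" \"" uri "\""))
        (id (string-append "PUBLIC \"" id "\""))
        (uri (string-append "SYSTEM \"" uri "\""))
        (else "")))

(external-id->string #f "book.dtd")
;; => "SYSTEM \"book.dtd\""
```

The public-ID-only case is included because either part may be present on its own; in XML syntax that form appears in notation declarations.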