14. Input/Output

This chapter describes the procedures that are used for input and output (I/O). The chapter first describes ports and how they are manipulated, then describes the I/O operations. Finally, some low-level procedures are described that permit the implementation of custom ports and high-performance I/O.

14.1 Ports

14.2 File Ports

14.3 String Ports

14.4 Input Procedures

14.5 Output Procedures

14.6 Format

14.7 Custom Output

14.8 Prompting

14.9 Port Primitives

14.10 Parser Buffers

14.11 Parser Language

14.12 XML Parser

14.1 Ports

Scheme uses ports for I/O. A port, which can be treated like any other Scheme object, serves as a source or sink for data. A port must be open before it can be read from or written to. The standard I/O port, console-i/o-port, is opened automatically when you start Scheme. When you use a file for input or output, you need to explicitly open and close a port to the file (with procedures described in this chapter). Additional procedures let you open ports to strings.

Many input procedures, such as read-char and read, read data from the current input port by default, or from a port that you specify. The current input port is initially console-i/o-port, but Scheme provides procedures that let you change the current input port to be a file or string.

Similarly, many output procedures, such as write-char and display, write data to the current output port by default, or to a port that you specify. The current output port is initially console-i/o-port, but Scheme provides procedures that let you change the current output port to be a file or string.

All ports read or write only ISO-8859-1 characters.

Every port is either an input port, an output port, or both. The following predicates distinguish all of the possible cases.

procedure: port? object: Returns #t if object is a port, otherwise returns #f.

procedure: input-port? object: Returns #t if object is an input port, otherwise returns #f. Any object satisfying this predicate also satisfies port?.

procedure: output-port? object: Returns #t if object is an output port, otherwise returns #f. Any object satisfying this predicate also satisfies port?.

procedure: i/o-port? object: Returns #t if object is both an input port and an output port, otherwise returns #f. Any object satisfying this predicate also satisfies port?, input-port?, and output-port?.
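Because these predicates overlap, code that classifies a port should test the most specific case first. The following is a minimal sketch, assuming the predicates behave as described above; describe-port is a hypothetical helper, not part of the runtime, and the string port is obtained with with-input-from-string (see section 14.3 String Ports).

```scheme
;; Hypothetical helper, not part of the runtime: classify OBJECT.
;; i/o-port? is tested first because an I/O port also satisfies
;; input-port? and output-port?.
(define (describe-port object)
  (cond ((i/o-port? object) 'i/o-port)
        ((input-port? object) 'input-port)
        ((output-port? object) 'output-port)
        (else 'not-a-port)))

(describe-port 42)                                    ; => not-a-port
(with-input-from-string "abc"
  (lambda () (describe-port (current-input-port))))   ; => input-port
```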

procedure: guarantee-port object
procedure: guarantee-input-port object
procedure: guarantee-output-port object
procedure: guarantee-i/o-port object

These procedures check the type of object, signalling an error of type condition-type:wrong-type-argument if it is not a port, input port, output port, or I/O port, respectively. Otherwise they return object.

The next five procedures return the runtime system's standard ports. All of the standard ports are dynamically bound by the REP loop; this means that when a new REP loop is started, for example by an error, each of these ports is dynamically bound to the I/O port of the REP loop. When the REP loop exits, the ports revert to their original values.

procedure: current-input-port: Returns the current input port. This is the default port used by many input procedures. Initially, current-input-port returns the value of console-i/o-port.

procedure: current-output-port: Returns the current output port. This is the default port used by many output procedures. Initially, current-output-port returns the value of console-i/o-port.

procedure: notification-output-port: Returns an output port suitable for generating "notifications", that is, messages to the user that supply interesting information about the execution of a program. For example, the load procedure writes messages to this port informing the user that a file is being loaded. Initially, notification-output-port returns the value of console-i/o-port.

procedure: trace-output-port: Returns an output port suitable for generating "tracing" information about a program's execution. The output generated by the trace procedure is sent to this port. Initially, trace-output-port returns the value of console-i/o-port.

procedure: interaction-i/o-port: Returns an I/O port suitable for querying or prompting the user. The standard prompting procedures use this port by default (see section 14.8 Prompting). Initially, interaction-i/o-port returns the value of console-i/o-port.

procedure: with-input-from-port input-port thunk

procedure: with-output-to-port output-port thunk

procedure: with-notification-output-port output-port thunk

procedure: with-trace-output-port output-port thunk

procedure: with-interaction-i/o-port i/o-port thunk

Thunk must be a procedure of no arguments. Each of these procedures binds one of the standard ports to its first argument, calls thunk with no arguments, restores the port to its original value, and returns the result that was yielded by thunk. This temporary binding is performed the same way as dynamic binding of a variable, including the behavior in the presence of continuations (see section 2.3 Dynamic Binding).

with-input-from-port binds the current input port, with-output-to-port binds the current output port, with-notification-output-port binds the "notification" output port, with-trace-output-port binds the "trace" output port, and with-interaction-i/o-port binds the "interaction" I/O port.
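As a sketch of this temporary binding, the following captures trace output in a string instead of sending it to the console, and then checks that the original binding has been restored. It assumes with-output-to-string from section 14.3 String Ports; the variable names are illustrative only.

```scheme
(define original-trace-port (trace-output-port))

(define captured
  (with-output-to-string
    (lambda ()
      ;; Inside this thunk the current output port is a string port.
      (let ((string-port (current-output-port)))
        ;; Rebind the trace port to it, but only for the inner thunk.
        (with-trace-output-port string-port
          (lambda ()
            (write-string "step 1" (trace-output-port))))))))

captured                                        ; => "step 1"
(eq? original-trace-port (trace-output-port))   ; => #t, binding restored
```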

procedure: set-current-input-port! input-port
procedure: set-current-output-port! output-port
procedure: set-notification-output-port! output-port
procedure: set-trace-output-port! output-port
procedure: set-interaction-i/o-port! i/o-port

Each of these procedures alters the binding of one of the standard ports and returns an unspecified value. The binding that is modified corresponds to the name of the procedure.

variable: console-i/o-port

console-i/o-port is an I/O port that communicates with the "console". Under Unix, the console is the controlling terminal of the Scheme process. Under Windows and OS/2, the console is the window that is created when Scheme starts up.

This variable is rarely used; instead programs should use one of the standard ports defined above. This variable should not be modified.

procedure: close-port port: Closes port and returns an unspecified value. If port is a file port, the file is closed.

procedure: close-input-port port: Closes port and returns an unspecified value. Port must be an input port or an I/O port; if it is an I/O port, then only the input side of the port is closed.

procedure: close-output-port port: Closes port and returns an unspecified value. Port must be an output port or an I/O port; if it is an I/O port, then only the output side of the port is closed.

14.2 File Ports

Before Scheme can access a file for reading or writing, it is necessary to open a port to the file. This section describes procedures used to open ports to files. Such ports are closed (like any other port) by close-port. File ports are automatically closed if and when they are reclaimed by the garbage collector.

Before opening a file for input or output, by whatever method, the filename argument is converted to canonical form by calling the procedure merge-pathnames with filename as its sole argument. Thus, filename can be either a string or a pathname, and it is merged with the current pathname defaults to produce the pathname that is then opened.

Any file can be opened in one of two modes, normal or binary. Normal mode is for accessing text files, and binary mode is for accessing other files. Unix does not distinguish these modes, but Windows and OS/2 do: in normal mode, their file ports perform newline translation, mapping between the carriage-return/linefeed sequence that terminates text lines in files, and the #\newline that terminates lines in Scheme. In binary mode, such ports do not perform newline translation. Unless otherwise mentioned, the procedures in this section open files in normal mode.

procedure: open-input-file filename: Takes a filename referring to an existing file and returns an input port capable of delivering characters from the file. If the file cannot be opened, an error of type condition-type:file-operation-error is signalled.

procedure: open-output-file filename [append?]

Takes a filename referring to an output file to be created and returns an output port capable of writing characters to a new file by that name. If the file cannot be opened, an error of type condition-type:file-operation-error is signalled.

The optional argument append? is an MIT Scheme extension. If append? is given and not #f, the file is opened in append mode. In this mode, the contents of the file are not overwritten; instead any characters written to the file are appended to the end of the existing contents. If the file does not exist, append mode creates the file and writes to it in the normal way.
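A minimal sketch of append mode, assuming open-output-file accepts append? as described above. The helper append-line and the file name are hypothetical; the file is created in the directory given by the current pathname defaults and removed at the end.

```scheme
;; Hypothetical helper: append one line of text to FILENAME,
;; creating the file if it does not exist yet.
(define (append-line filename line)
  (let ((port (open-output-file filename #t)))   ; #t selects append mode
    (write-string line port)
    (newline port)
    (close-port port)))

(define log-file "append-example.log")           ; hypothetical file name

;; Opening without append? truncates the file, giving a clean start.
(close-port (open-output-file log-file))
(append-line log-file "first")
(append-line log-file "second")

;; Read both lines back, then clean up.
(define lines
  (let ((port (open-input-file log-file)))
    (let* ((a (read-line port))
           (b (read-line port)))
      (close-port port)
      (list a b))))
(delete-file log-file)

lines   ; => ("first" "second")
```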

procedure: open-i/o-file filename

Takes a filename referring to an existing file and returns an I/O port capable of both reading and writing the file. If the file cannot be opened, an error of type condition-type:file-operation-error is signalled.

This procedure is often used to open special files. For example, under Unix this procedure can be used to open terminal device files, PTY device files, and named pipes.

procedure: open-binary-input-file filename
procedure: open-binary-output-file filename [append?]
procedure: open-binary-i/o-file filename

These procedures open files in binary mode. In all other respects they are identical to open-input-file, open-output-file, and open-i/o-file, respectively.

procedure: close-all-open-files: This procedure closes all file ports that are open at the time that it is called, and returns an unspecified value.

procedure: call-with-input-file filename procedure
procedure: call-with-output-file filename procedure

These procedures call procedure with one argument: the port obtained by opening the named file for input or output, respectively. If the file cannot be opened, an error of type condition-type:file-operation-error is signalled. If procedure returns, then the port is closed automatically and the value yielded by procedure is returned. If procedure does not return, then the port will not be closed automatically unless it is reclaimed by the garbage collector.(15)
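The following sketch writes a small file with call-with-output-file and counts its lines with call-with-input-file; in both cases the port is closed when the procedure returns. count-lines and the file name are hypothetical, chosen for illustration.

```scheme
;; Hypothetical helper: count the lines in a text file.
(define (count-lines filename)
  (call-with-input-file filename
    (lambda (port)
      (let loop ((count 0))
        (if (eof-object? (read-line port))
            count
            (loop (+ count 1)))))))

(define data-file "call-with-example.txt")   ; hypothetical file name

;; The output port is closed automatically when the lambda returns.
(call-with-output-file data-file
  (lambda (port)
    (write-string "one\ntwo\nthree\n" port)))

(define line-count (count-lines data-file))
(delete-file data-file)

line-count   ; => 3
```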

procedure: call-with-binary-input-file filename procedure
procedure: call-with-binary-output-file filename procedure

These procedures open files in binary mode. In all other respects they are identical to call-with-input-file and call-with-output-file, respectively.

procedure: with-input-from-file filename thunk
procedure: with-output-to-file filename thunk

Thunk must be a procedure of no arguments. The file is opened for input or output, an input or output port connected to it is made the default value returned by current-input-port or current-output-port, and the thunk is called with no arguments. When the thunk returns, the port is closed and the previous default is restored. with-input-from-file and with-output-to-file return the value yielded by thunk. If an escape procedure is used to escape from the continuation of these procedures, their behavior is implementation-dependent; in that situation MIT Scheme leaves the files open.
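Because these procedures rebind the default ports, code inside the thunk can call write and read without naming a port. A minimal round-trip sketch; the file name is hypothetical.

```scheme
(define datum-file "with-file-example.scm")   ; hypothetical file name

;; write and newline use the current output port, which is now the file.
(with-output-to-file datum-file
  (lambda ()
    (write '(1 "two" #\3))
    (newline)))

;; read, called with no arguments, uses the current input port.
(define restored (with-input-from-file datum-file read))
(delete-file datum-file)

restored   ; => (1 "two" #\3)
```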

procedure: with-input-from-binary-file filename thunk
procedure: with-output-to-binary-file filename thunk

These procedures open files in binary mode. In all other respects they are identical to with-input-from-file and with-output-to-file, respectively.

14.3 String Ports

This section describes the simplest kinds of ports: input ports that read their input from given strings, and output ports that accumulate their output and return it as a string. It also describes "truncating" output ports, which can limit the length of the resulting string to a given value.

procedure: string->input-port string [start [end]]: Returns a new string port that delivers characters from string. The optional arguments start and end may be used to specify that the string port delivers characters from a substring of string; if not given, start defaults to 0 and end defaults to (string-length string).

procedure: with-input-from-string string thunk

Thunk must be a procedure of no arguments. with-input-from-string creates a new input port that reads from string, makes that port the current input port, and calls thunk. When thunk returns, with-input-from-string restores the previous current input port and returns the result yielded by thunk.

(with-input-from-string "(a b c) (d e f)" read) => (a b c)

Note: this procedure is equivalent to:

(with-input-from-port (string->input-port string) thunk)
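Since read returns an end-of-file object once the string is exhausted, with-input-from-string combines naturally with a read loop. A minimal sketch; read-all-from-string is a hypothetical helper.

```scheme
;; Hypothetical helper: read every datum in STRING, in order.
(define (read-all-from-string string)
  (with-input-from-string string
    (lambda ()
      (let loop ((data '()))
        (let ((datum (read)))           ; reads from the string port
          (if (eof-object? datum)
              (reverse data)
              (loop (cons datum data))))))))

(read-all-from-string "(a b c) (d e f) 42")   ; => ((a b c) (d e f) 42)
```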

procedure: with-string-output-port procedure

Procedure is called with one argument, an output port. The value yielded by procedure is ignored. When procedure returns, with-string-output-port returns the port's accumulated output as a newly allocated string.

procedure: with-output-to-string thunk

Thunk must be a procedure of no arguments. with-output-to-string creates a new output port that accumulates output, makes that port the default value returned by current-output-port, and calls thunk with no arguments. When thunk returns, with-output-to-string restores the previous default and returns the accumulated output as a newly allocated string.

(with-output-to-string (lambda () (write 'abc))) => "abc"

Note: this procedure is equivalent to:

(with-string-output-port (lambda (port) (with-output-to-port port thunk)))
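with-output-to-string is a convenient way to build a string piecewise from ordinary output calls. A minimal sketch; join-with-commas is a hypothetical helper.

```scheme
;; Hypothetical helper: display each item, separated by ", ",
;; and return everything written as a single string.
(define (join-with-commas items)
  (with-output-to-string
    (lambda ()
      (let loop ((items items) (first? #t))
        (if (pair? items)
            (begin
              (if (not first?) (write-string ", "))
              (display (car items))
              (loop (cdr items) #f)))))))

(join-with-commas '(1 "two" three))   ; => "1, two, three"
(join-with-commas '())                ; => ""
```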

procedure: with-output-to-truncated-string k thunk

Similar to with-output-to-string, except that the output is limited to k characters. If thunk attempts to write more than k characters, it will be aborted by invoking an escape procedure that returns from with-output-to-truncated-string.

The value of this procedure is a pair; the car of the pair is #t if thunk attempted to write more than k characters, and #f otherwise. The cdr of the pair is a newly allocated string containing the accumulated output.

This procedure is helpful for displaying circular lists, as shown in this example:

    (define inf (list 'inf))
    (with-output-to-truncated-string 40 (lambda () (write inf)))
        => (#f . "(inf)")
    (set-cdr! inf inf)
    (with-output-to-truncated-string 40 (lambda () (write inf)))
        => (#t . "(inf inf inf inf inf inf inf inf inf inf")
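The pair returned by with-output-to-truncated-string makes it easy to mark truncated output. A minimal sketch; preview is a hypothetical helper, and iota (from SRFI 1, built into MIT Scheme) is used only to build a long list.

```scheme
;; Hypothetical helper: the written representation of OBJECT, limited
;; to LIMIT characters, with "..." appended if it was truncated.
(define (preview object limit)
  (let ((result (with-output-to-truncated-string limit
                  (lambda () (write object)))))
    (if (car result)                      ; #t means output was cut off
        (string-append (cdr result) "...")
        (cdr result))))

(preview '(a b c) 40)     ; => "(a b c)"
(preview (iota 100) 10)   ; => "(0 1 2 3 4..."
```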

procedure: write-to-string object [k]

Writes object to a string output port, and returns the resulting newly allocated string. If k is supplied and not #f, this procedure is equivalent to

(with-output-to-truncated-string k (lambda () (write object)))

otherwise it is equivalent to

(with-output-to-string (lambda () (write object)))

14.4 Input Procedures

This section describes the procedures that read input. Input procedures can read either from the current input port or from a given port. Remember that to read from a file, you must first open a port to the file.

Input ports can be divided into two types, called interactive and non-interactive. Interactive input ports are ports that read input from a source that is time-dependent; for example, a port that reads input from a terminal or from another program. Non-interactive input ports read input from a time-independent source, such as an ordinary file or a character string.

All optional arguments called input-port, if not supplied, default to the current input port.

procedure: read-char [input-port]

Returns the next character available from input-port, updating input-port to point to the following character. If no more characters are available, an end-of-file object is returned.

In MIT Scheme, if input-port is an interactive input port and no characters are immediately available, read-char will hang waiting for input, even if the port is in non-blocking mode.

procedure: peek-char [input-port]

Returns the next character available from input-port, without updating input-port to point to the following character. If no more characters are available, an end-of-file object is returned.(16)

In MIT Scheme, if input-port is an interactive input port and no characters are immediately available, peek-char will hang waiting for input, even if the port is in non-blocking mode.
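peek-char and read-char together support one-character lookahead: a reader can inspect the next character and consume it only if it belongs to the current token. A minimal sketch; read-digits is a hypothetical helper.

```scheme
;; Hypothetical helper: read a run of decimal digits from PORT,
;; leaving the first non-digit character unread.
(define (read-digits port)
  (let loop ((chars '()))
    (let ((char (peek-char port)))      ; look, but do not consume
      (if (and (char? char) (char-numeric? char))
          (begin
            (read-char port)            ; consume the digit just peeked at
            (loop (cons char chars)))
          (list->string (reverse chars))))))

(define result
  (with-input-from-string "123abc"
    (lambda ()
      (let* ((digits (read-digits (current-input-port)))
             (next (read-char)))        ; the #\a is still available
        (list digits next)))))

result   ; => ("123" #\a)
```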

procedure: char-ready? [input-port]: Returns #t if a character is ready on input-port and returns #f otherwise. If char-ready? returns #t then the next read-char operation on input-port is guaranteed not to hang. If input-port is a file port at end of file then char-ready? returns #t.(17)

procedure: read [input-port]: Converts external representations of Scheme objects into the objects themselves. read returns the next object parsable from input-port, updating input-port to point to the first character past the end of the written representation of the object. If an end of file is encountered in the input before any characters are found that can begin an object, read returns an end-of-file object. The input-port remains open, and further attempts to read will also return an end-of-file object. If an end of file is encountered after the beginning of an object's written representation, but the written representation is incomplete and therefore not parsable, an error is signalled.

procedure: eof-object? object: Returns #t if object is an end-of-file object; otherwise returns #f.

procedure: read-char-no-hang [input-port]: If input-port can deliver a character without blocking, this procedure acts exactly like read-char, immediately returning that character. Otherwise, #f is returned, unless input-port is a file port at end of file, in which case an end-of-file object is returned. In no case will this procedure block waiting for input.

procedure: read-string char-set [input-port]

Reads characters from input-port until it finds a terminating character that is a member of char-set (see section 5.6 Character Sets) or encounters end of file. The port is updated to point to the terminating character, or to end of file if no terminating character was found. read-string returns the characters, up to but excluding the terminating character, as a newly allocated string.

This procedure ignores the blocking mode of the port, blocking unconditionally until it sees either a delimiter or end of file. If end of file is encountered before any characters are read, an end-of-file object is returned.

On many input ports, this operation is significantly faster than the following equivalent code using peek-char and read-char:

    (define (read-string char-set input-port)
      (let ((char (peek-char input-port)))
        (if (eof-object? char)
            char
            (list->string
             (let loop ((char char))
               (if (or (eof-object? char)
                       (char-set-member? char-set char))
                   '()
                   (begin
                     (read-char input-port)
                     (cons char (loop (peek-char input-port))))))))))

procedure: read-line [input-port]

read-line reads a single line of text from input-port, and returns that line as a newly allocated string. The #\newline terminating the line, if any, is discarded and does not appear in the returned string.

This procedure ignores the blocking mode of the port, blocking unconditionally until it has read an entire line. If end of file is encountered before any characters are read, an end-of-file object is returned.
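Because read-line returns an end-of-file object only when no characters remain, a final line without a terminating #\newline is still returned. A minimal sketch; read-lines is a hypothetical helper.

```scheme
;; Hypothetical helper: collect all remaining lines of PORT in a list.
(define (read-lines port)
  (let loop ((lines '()))
    (let ((line (read-line port)))
      (if (eof-object? line)
          (reverse lines)
          (loop (cons line lines))))))

(define lines
  (with-input-from-string "alpha\nbeta\ngamma"
    (lambda () (read-lines (current-input-port)))))

lines   ; => ("alpha" "beta" "gamma")
```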

procedure: read-string! string [input-port]

procedure: read-substring! string start end [input-port]

read-string! and read-substring! fill the specified region of string with characters read from input-port until the region is full or else there are no more characters available from the port. For read-string!, the region is all of string, and for read-substring!, the region is that part of string specified by start and end.

The returned value is the number of characters filled into the region. However, there are several interesting cases to consider:

If read-string! (read-substring!) is called when input-port is at "end-of-file", then the returned value is 0. Note that "end-of-file" can mean a file port that is at the file's end, a string port that is at the string's end, or any other port that will never produce more characters.
If input-port is an interactive port (e.g. a terminal), and one or more characters are immediately available, the region is filled using the available characters. The procedure then returns immediately, without waiting for further characters, even if the number of available characters is less than the size of the region. The returned value is the number of characters actually filled in.
If input-port is an interactive port and no characters are immediately available, the result of the operation depends on the blocking mode of the port. If the port is in non-blocking mode, read-string! (read-substring!) immediately returns the value #f. Otherwise, the operation blocks until a character is available. As soon as at least one character is available, the region is filled using the available characters. The procedure then returns immediately, without waiting for further characters, even if the number of available characters is less than the size of the region. The returned value is the number of characters actually filled in.
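A chunked loop built on read-string! processes large inputs without handling one character at a time. This is a sketch assuming the read-string! signature given above; count-chars is hypothetical, and here a #f result (a non-blocking port with nothing available) simply ends the loop, which a real program might handle differently.

```scheme
;; Hypothetical helper: count the characters remaining in PORT by
;; reading them in fixed-size chunks.
(define (count-chars port)
  (let ((buffer (make-string 4096)))
    (let loop ((total 0))
      (let ((n (read-string! buffer port)))
        ;; n is the number of characters filled in: 0 means end of file,
        ;; #f means a non-blocking port had nothing available.
        (if (and (number? n) (> n 0))
            (loop (+ total n))
            total)))))

(with-input-from-string (make-string 10000 #\x)
  (lambda () (count-chars (current-input-port))))   ; => 10000
```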

read-string! and read-substring! are important because they are both flexible and extremely fast, especially for large amounts of data.

The following variables may be dynamically bound to change the behavior of the read procedure.

variable: *parser-radix*

This variable defines the radix used by the reader when it parses numbers. This is similar to passing a radix argument to string->number. The value of this variable must be one of 2, 8, 10, or 16; any other value is ignored, and the reader uses radix 10.

Note that much of the number syntax is invalid for radixes other than 10. The reader detects cases where such invalid syntax is used and signals an error. However, problems can still occur when *parser-radix* is set to 16, because syntax that normally denotes symbols can now denote numbers (e.g. abc). Because of this, it is usually undesirable to set this variable to anything other than the default.

The default value of this variable is 10.

variable: *parser-canonicalize-symbols?*

This variable controls how the parser handles case-sensitivity of symbols. If it is bound to its default value of #t, symbols read by the parser are converted to lower case before being interned. Otherwise, symbols are interned without case conversion.

In general, it is a bad idea to use this feature, as it doesn't really make Scheme case-sensitive, and therefore can break features of the Scheme runtime that depend on case-insensitive symbols.

14.5 Output Procedures

Output ports may or may not support buffering of output, in which output characters are collected together in a buffer and then sent to the output device all at once. (Most of the output ports implemented by the runtime system support buffering.) Sending all of the characters in the buffer to the output device is called flushing the buffer. In general, output procedures do not flush the buffer of an output port unless the buffer is full.

However, the standard output procedures described in this section perform what is called discretionary flushing of the buffer. Discretionary output flushing works as follows. After a procedure performs its output (writing characters to the output buffer), it checks to see if the port implements an operation called discretionary-flush-output. If so, then that operation is invoked to flush the buffer. At present, only the console port defines discretionary-flush-output; this is used to guarantee that output to the console appears immediately after it is written, without requiring calls to flush-output.

All optional arguments called output-port, if not supplied, default to the current output port.

procedure: write-char char [output-port]: Writes char (the character itself, not a written representation of the character) to output-port, performs discretionary output flushing, and returns an unspecified value.

procedure: write-string string [output-port]: Writes string to output-port, performs discretionary output flushing, and returns an unspecified value. This is equivalent to writing the contents of string, one character at a time using write-char, except that it is usually much faster.

procedure: write-substring string start end [output-port]: Writes the substring defined by string, start, and end to output-port, performs discretionary output flushing, and returns an unspecified value. This is equivalent to writing the contents of the substring, one character at a time using write-char, except that it is usually much faster.

procedure: write object [output-port]: Writes a written representation of object to output-port, and returns an unspecified value. If object has a standard external representation, then the written representation generated by write shall be parsable by read into an equivalent object. Thus strings that appear in the written representation are enclosed in doublequotes, and within those strings backslash and doublequote are escaped by backslashes. write performs discretionary output flushing and returns an unspecified value.

procedure: display object [output-port]: Writes a representation of object to output-port. Strings appear in the written representation as if written by write-string instead of by write. Character objects appear in the representation as if written by write-char instead of by write. display performs discretionary output flushing and returns an unspecified value.(18)
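The difference between write and display is easiest to see by capturing their output as strings. A minimal sketch; output-of is a hypothetical helper, and each expected result is shown as a Scheme string literal.

```scheme
;; Hypothetical helper: capture what PRINTER writes for OBJECT.
(define (output-of printer object)
  (with-output-to-string (lambda () (printer object))))

(output-of write "hi")             ; => "\"hi\""
(output-of display "hi")           ; => "hi"
(output-of write #\a)              ; => "#\\a"
(output-of display #\a)            ; => "a"
(output-of write '("x" #\y z))     ; => "(\"x\" #\\y z)"
(output-of display '("x" #\y z))   ; => "(x y z)"
```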

procedure: newline [output-port]: Writes an end-of-line to output-port, performs discretionary output flushing, and returns an unspecified value. Equivalent to (write-char #\newline output-port).

procedure: fresh-line [output-port]: Most output ports are able to tell whether or not they are at the beginning of a line of output. If output-port is such a port, this procedure writes an end-of-line to the port only if the port is not already at the beginning of a line. If output-port is not such a port, this procedure is identical to newline. In either case, fresh-line performs discretionary output flushing and returns an unspecified value.

procedure: write-line object [output-port]: Like write, except that it writes an end-of-line to output-port after writing object's representation. This procedure performs discretionary output flushing and returns an unspecified value.

procedure: flush-output [output-port]: If output-port is buffered, this causes the contents of its buffer to be written to the output device. Otherwise it has no effect. Returns an unspecified value.
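Since discretionary flushing currently happens only on the console port, a program that reports progress to an arbitrary, possibly buffered, port can call flush-output itself. A minimal sketch; report-progress is a hypothetical helper that writes to the current output port.

```scheme
;; Hypothetical helper: overwrite the current line with a progress report.
(define (report-progress done total)
  (write-string "\rprogress: ")
  (write done)
  (write-string "/")
  (write total)
  ;; Deliver the text now, even if the port buffers its output.
  (flush-output))

(with-output-to-string (lambda () (report-progress 3 10)))
;; => "\rprogress: 3/10"
```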

procedure: beep [output-port]: Performs a "beep" operation on output-port, performs discretionary output flushing, and returns an unspecified value. On the console port, this usually causes the console bell to beep, but more sophisticated interactive ports may take other actions, such as flashing the screen. On most output ports, e.g. file and string output ports, this does nothing.

procedure: clear [output-port]: "Clears the screen" of output-port, performs discretionary output flushing, and returns an unspecified value. On a terminal or window, this has a well-defined effect. On other output ports, e.g. file and string output ports, this does nothing.

procedure: pp object [output-port [as-code?]]: pp prints object in a visually appealing and structurally revealing manner on output-port. If object is a procedure, pp attempts to print the source text. If the optional argument as-code? is true, pp prints lists as Scheme code, providing appropriate indentation; by default this argument is false. pp performs discretionary output flushing and returns an unspecified value.

The following variables may be dynamically bound to change the behavior of the write and display procedures.

variable: *unparser-radix*: This variable specifies the default radix used to print numbers. Its value must be one of the exact integers 2, 8, 10, or 16; the default is 10. If *unparser-radix* is not 10, numbers are prefixed to indicate their radix.

variable: *unparser-list-breadth-limit*

This variable specifies a limit on the length of the printed representation of a list or vector; for example, if the limit is 4, only the first four elements of any list are printed, followed by ellipses to indicate any additional elements. The value of this variable must be an exact non-negative integer, or #f meaning no limit; the default is #f.

    (fluid-let ((*unparser-list-breadth-limit* 4))
      (write-to-string '(a b c d)))        => "(a b c d)"
    (fluid-let ((*unparser-list-breadth-limit* 4))
      (write-to-string '(a b c d e)))      => "(a b c d ...)"

variable: *unparser-list-depth-limit*

This variable specifies a limit on the nesting of lists and vectors in the printed representation. If lists (or vectors) are more deeply nested than the limit, the part of the representation that exceeds the limit is replaced by ellipses. The value of this variable must be an exact non-negative integer, or #f meaning no limit; the default is #f.

    (fluid-let ((*unparser-list-depth-limit* 4))
      (write-to-string '((((a))) b c d)))     => "((((a))) b c d)"
    (fluid-let ((*unparser-list-depth-limit* 4))
      (write-to-string '(((((a)))) b c d)))   => "((((...))) b c d)"

variable: *unparser-string-length-limit*

This variable specifies a limit on the length of the printed representation of strings. If a string's length exceeds this limit, the part of the printed representation for the characters exceeding the limit is replaced by ellipses. The value of this variable must be an exact non-negative integer, or #f meaning no limit; the default is #f.

    (fluid-let ((*unparser-string-length-limit* 4))
      (write-to-string "abcd"))     => "\"abcd\""
    (fluid-let ((*unparser-string-length-limit* 4))
      (write-to-string "abcde"))    => "\"abcd...\""

variable: *unparse-with-maximum-readability?*: This variable, which takes a boolean value, tells the printer to use a special printed representation for objects that normally print in a form that cannot be recognized by read. These objects are printed using the representation #@n, where n is the result of calling hash on the object to be printed. The reader recognizes this syntax, calling unhash on n to get back the original object. Note that this printed representation can only be recognized by the Scheme program in which it was generated, because these hash numbers are different for each invocation of Scheme.

14.6 Format

The procedure format is very useful for producing nicely formatted text, producing good-looking messages, and so on. MIT Scheme's implementation of format is similar to that of Common Lisp, except that Common Lisp defines many more directives.(19)

format is a run-time-loadable option. To use it, execute

(load-option 'format)

once before calling it.

procedure: format destination control-string argument ...

Writes the characters of control-string to destination, except that a tilde (~) introduces a format directive. The character after the tilde, possibly preceded by prefix parameters and modifiers, specifies what kind of formatting is desired. Most directives use one or more arguments to create their output; the typical directive puts the next argument into the output, formatted in some special way. It is an error if no argument remains for a directive requiring an argument, but it is not an error if one or more arguments remain unprocessed by a directive.

The output is sent to destination. If destination is #f, a string is created that contains the output; this string is returned as the value of the call to format. In all other cases format returns an unspecified value. If destination is #t, the output is sent to the current output port. Otherwise, destination must be an output port, and the output is sent there.

This procedure performs discretionary output flushing (see section 14.5 Output Procedures).

A format directive consists of a tilde (~), optional prefix parameters separated by commas, optional colon (:) and at-sign (@) modifiers, and a single character indicating what kind of directive this is. The alphabetic case of the directive character is ignored. The prefix parameters are generally integers, notated as optionally signed decimal numbers. If both the colon and at-sign modifiers are given, they may appear in either order.

In place of a prefix parameter to a directive, you can put the letter `V' (or `v'), which takes an argument for use as a parameter to the directive. Normally this should be an exact integer. This feature allows variable-width fields and the like. You can also use the character `#' in place of a parameter; it represents the number of arguments remaining to be processed.

It is an error to give a format directive more parameters than it is described here as accepting. It is also an error to give colon or at-sign modifiers to a directive in a combination not specifically described here as being meaningful.

~A

The next argument, which may be any object, is printed as if by display. ~mincolA inserts spaces on the right, if necessary, to make the width at least mincol columns. The @ modifier causes the spaces to be inserted on the left rather than the right.

~S

The next argument, which may be any object, is printed as if by write. ~mincolS inserts spaces on the right, if necessary, to make the width at least mincol columns. The @ modifier causes the spaces to be inserted on the left rather than the right.

~%

This outputs a #\newline character. ~n% outputs n newlines. No argument is used. Simply putting a newline in control-string would work, but ~% is often used because it makes the control string look nicer in the middle of a program.

~~

This outputs a tilde. ~n~ outputs n tildes.
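To illustrate the directives above, the following calls pass #f as the destination, so each result comes back as a string. This is a sketch that assumes the format option loads as shown earlier; the variable names are only for illustration.

```scheme
(load-option 'format)

;; ~A prints its argument as if by display, ~S as if by write.
(define as-display (format #f "~A" "hi"))    ; "hi"
(define as-write (format #f "~S" "hi"))      ; "\"hi\""

;; ~mincolA pads on the right; the @ modifier pads on the left.
(define right-padded (format #f "[~5A]" 'ab))   ; "[ab   ]"
(define left-padded (format #f "[~5@A]" 'ab))   ; "[   ab]"

;; V takes the prefix parameter (here the width 4) from the arguments.
(define v-padded (format #f "[~VA]" 4 'x))

;; ~n% and ~n~ repeat a newline or a tilde n times; no argument is used.
(define two-newlines (format #f "~2%"))
(define three-tildes (format #f "~3~"))
```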

~newline

Tilde immediately followed by a newline ignores the newline and any following non-newline whitespace characters. With an @, the newline is left in place, but any following whitespace is ignored. This directive is typically used when control-string is too long to fit nicely into one line of the program:

(define (type-clash-error procedure arg spec actual)
  (format
   #t
   "~%Procedure ~S~%requires its ~A argument ~
    to be of type ~S,~%but it was called with ~
    an argument of type ~S.~%"
   procedure arg spec actual))

(type-clash-error 'vector-ref
                  "first"
                  'integer
                  'vector)

prints

Procedure vector-ref
requires its first argument to be of type integer,
but it was called with an argument of type vector.

Note that in this example newlines appear in the output only as specified by the ~% directives; the actual newline characters in the control string are suppressed because each is preceded by a tilde.

14.7 Custom Output

MIT Scheme provides hooks for specifying that certain kinds of objects have special written representations. There are no restrictions on the written representations, but only a few kinds of objects may have custom representation specified for them, specifically: records (see section 10.4 Records), vectors that have special tags in their zero-th elements (see section 8. Vectors), and pairs that have special tags in their car fields (see section 7. Lists). There is a different procedure for specifying the written representation of each of these types.

procedure: set-record-type-unparser-method! record-type unparser-method: Changes the unparser method of the type represented by record-type to be unparser-method, and returns an unspecified value. Subsequently, when the unparser encounters a record of this type, it will invoke unparser-method to generate the written representation.

procedure: unparser/set-tagged-vector-method! tag unparser-method: Changes the unparser method of the vector type represented by tag to be unparser-method, and returns an unspecified value. Subsequently, when the unparser encounters a vector with tag as its zero-th element, it will invoke unparser-method to generate the written representation.

procedure: unparser/set-tagged-pair-method! tag unparser-method: Changes the unparser method of the pair type represented by tag to be unparser-method, and returns an unspecified value. Subsequently, when the unparser encounters a pair with tag in its car field, it will invoke unparser-method to generate the written representation.

An unparser method is a procedure that is invoked with two arguments: an unparser state and an object. An unparser method generates a written representation for the object, writing it to the output port specified by the unparser state. The value yielded by an unparser method is ignored. Note that an unparser state is not an output port, rather it is an object that contains an output port as one of its components. Application programs generally do not construct or examine unparser state objects, but just pass them along.

There are two ways to create an unparser method (which is then registered by one of the above procedures). The first, and easiest, is to use standard-unparser-method. The second is to define your own method using the procedure with-current-unparser-state. We encourage the use of the first method, as it results in a more uniform appearance for objects. Many predefined datatypes, for example procedures and environments, already have this appearance.

procedure: standard-unparser-method name procedure

Returns a standard unparser method. Name may be any object, and is used as the name of the type with which the unparser method is associated; name is usually a symbol. Procedure must be #f or a procedure of two arguments.

If procedure is #f, the returned method generates an external representation of this form:

#[name hash]

Here name is the external representation of the argument name, as generated by write,(20) and hash is the external representation of an exact non-negative integer unique to the object being printed (specifically, it is the result of calling hash on the object). Subsequently, the expression

#@hash

is notation for the object.

If procedure is supplied, the returned method generates a slightly different external representation:

#[name hash output]

Here name and hash are as above, and output is the output generated by procedure. The representation is constructed in three stages:

1. The first part of the representation (up to output) is written to the output port specified by the unparser state. This is "#[", name, " ", and hash.
2. Procedure is invoked on two arguments: the object and an output port.
3. The closing bracket is written to the output port.
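For example, a record type can be given a standard-style representation that also shows its field values. This is a sketch assuming MIT Scheme's define-record-type, which binds the type name to the record type itself; the point type and its accessors are hypothetical. The description above lists the prefix as ending with the hash, so the procedure writes its own separating space; on a release that already inserts one, the output simply gains an extra space.

```scheme
;; A hypothetical two-field record type; point names the record type.
(define-record-type point
  (make-point x y)
  point?
  (x point-x)
  (y point-y))

;; Print points as #[point <hash> <x> <y>].
(set-record-type-unparser-method! point
  (standard-unparser-method 'point
    (lambda (p port)
      ;; Called after "#[point <hash>" has been written and before "]".
      (write-char #\space port)
      (write (point-x p) port)
      (write-char #\space port)
      (write (point-y p) port))))

(define printed-point
  (with-output-to-string
    (lambda ()
      (write (make-point 1 2)))))
```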

The following procedure is useful for writing more general kinds of unparser methods.

procedure: with-current-unparser-state unparser-state procedure

This procedure calls procedure with one argument, the output port from unparser-state. Additionally, it arranges for the remaining components of unparser-state to be given to the printer when they are needed. The procedure generates some output by writing to the output port using the usual output operations, and the value yielded by procedure is returned from with-current-unparser-state.

The port passed to procedure should only be used within the dynamic extent of procedure.
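When the bracketed form does not fit, with-current-unparser-state lets a method write any representation at all. The sketch below registers a method for pairs tagged with a private object, so that a hypothetical money value prints as a dollar amount. The names money-tag and make-money are illustrative, not part of MIT Scheme, and the sketch assumes a non-negative number of cents.

```scheme
;; A freshly allocated pair is eq?-unique, so no ordinary list is
;; mistaken for a money value (hypothetical tag).
(define money-tag (list 'money))

(define (make-money cents)
  (cons money-tag cents))

(unparser/set-tagged-pair-method! money-tag
  (lambda (state pair)
    (with-current-unparser-state state
      (lambda (port)
        ;; Print (money-tag . 1234) as $12.34.
        (let ((cents (cdr pair)))
          (write-string "$" port)
          (write (quotient cents 100) port)
          (write-string "." port)
          (if (< (remainder cents 100) 10)
              (write-string "0" port))
          (write (remainder cents 100) port))))))

(define printed-money
  (with-output-to-string
    (lambda ()
      (write (make-money 1234)))))
```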

14.8 Prompting

This section describes procedures that prompt the user for input. Why should the programmer use these procedures when it is possible to do prompting using ordinary input and output procedures? One reason is that the prompting procedures are more succinct. However, a second and better reason is that the prompting procedures can be separately customized for each user interface, providing more natural interaction. The interfaces for Edwin and for GNU Emacs have already been customized in this fashion; because Edwin and Emacs are very similar editors, their customizations provide very similar behavior.

Each of these procedures accepts an optional argument called port, which, if given, must be an I/O port. If it is not given, port defaults to the value of (interaction-i/o-port), which is initially the console I/O port.

procedure: prompt-for-command-expression prompt [port]

Prompts the user for an expression that is to be executed as a command. This is the procedure called by the REP loop to read the user's expressions.

If prompt is a string, it is used verbatim as the prompt string. Otherwise, it must be a pair whose car is standard and whose cdr is a string; in this case the prompt string is formed by prepending to the string the current REP loop "level number" and a space. Also, a space is appended to the string, unless it already ends in a space or is an empty string.

The default behavior of this procedure is to print a fresh line, a newline, and the prompt string; flush the output buffer; then read an object and return it.

Under Edwin and Emacs, before the object is read, the interaction buffer is put into a mode that allows expressions to be edited and submitted for input using specific editor commands. The first expression that is submitted is returned as the value of this procedure.

procedure: prompt-for-command-char prompt [port]

Prompts the user for a single character that is to be executed as a command; the returned character is guaranteed to satisfy char-graphic?. If at all possible, the character is read from the user interface using a mode that reads the character as a single keystroke; in other words, it should not be necessary for the user to follow the character with a carriage return or something similar.

This is the procedure called by debug and where to read the user's commands.

The default behavior of this procedure is to print a fresh line, a newline, and the prompt string; flush the output buffer; read a character in raw mode, echo that character, and return it.

Under Edwin and Emacs, instead of reading a character, the interaction buffer is put into a mode in which graphic characters submit themselves as input. After this mode change, the first such character submitted is returned as the value of this procedure.

procedure: prompt-for-expression prompt [port]

Prompts the user for an expression.

The prompt string is formed by appending a colon and a space to prompt, unless prompt already ends in a space or is the null string.

The default behavior of this procedure is to print a fresh line, a newline, and the prompt string; flush the output buffer; then read an object and return it.

Under Edwin and Emacs, the expression is read in the minibuffer.

procedure: prompt-for-evaluated-expression prompt [environment [port]]: Prompts the user for an evaluated expression. Calls prompt-for-expression to read an expression, then evaluates the expression using environment; if environment is not given, the REP loop environment is used.

procedure: prompt-for-confirmation prompt [port]

Prompts the user for confirmation. The result yielded by this procedure is a boolean.

The prompt string is formed by appending the string " (y or n)? " to prompt, unless prompt already ends in a space or is the null string.

The default behavior of this procedure is to print a fresh line, a newline, and the prompt string; flush the output buffer; then read a character in raw mode. If the character is #\y, #\Y, or #\space, the procedure returns #t; if the character is #\n, #\N, or #\rubout, the procedure returns #f. Otherwise, the prompt is repeated.
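The prompt-string rule for prompt-for-confirmation can be restated as a small pure procedure, which makes its edge cases explicit. confirmation-prompt-string is a hypothetical helper that only models the documented rule; it is not part of MIT Scheme. maybe-delete-file is another hypothetical name showing a typical call of the real procedure; because it is interactive, it is defined but not invoked here.

```scheme
;; Models the documented rule: append " (y or n)? " unless the prompt
;; already ends in a space or is the null string.
(define (confirmation-prompt-string prompt)
  (let ((n (string-length prompt)))
    (if (or (= n 0)
            (char=? (string-ref prompt (- n 1)) #\space))
        prompt
        (string-append prompt " (y or n)? "))))

;; Typical use of the real procedure: ask before deleting a file.
(define (maybe-delete-file filename)
  (if (prompt-for-confirmation (string-append "Delete " filename))
      (begin
        (delete-file filename)
        #t)
      #f))
```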

Under Edwin or Emacs, the confirmation is read in the minibuffer.

14.9 Port Primitives

This section describes the low-level operations that can be used to build and manipulate I/O ports.

The purpose of these operations is twofold: to allow programmers to construct new kinds of I/O ports, and to provide faster I/O operations than those supplied by the standard high level procedures. The latter is useful because the standard I/O operations provide defaulting and error checking, and sometimes other features, which are often unnecessary. This interface provides the means to bypass such features, thus improving performance.

The abstract model of an I/O port, as implemented here, is a combination of a set of named operations and a state. The state is an arbitrary object, the meaning of which is determined by the operations. The operations are defined by a mapping from names to procedures.

The set of named operations is represented by an object called a port type. A port type is constructed from a set of named operations, and is subsequently used to construct a port. The port type completely specifies the behavior of the port. Port types also support a simple form of inheritance, allowing you to create new ports that are similar to existing ports.

The port operations are divided into two classes:

Standard operations: There is a specific set of standard operations for input ports, and a different set for output ports. Applications can assume that the standard input operations are implemented for all input ports, and likewise the standard output operations are implemented for all output ports.
Custom operations: Some ports support additional operations. For example, ports that implement output to terminals (or windows) may define an operation named y-size that returns the height of the terminal in characters. Because only some ports will implement these operations, programs that use custom operations must test each port for their existence, and be prepared to deal with ports that do not implement them.

14.9.1 Port Types

14.9.2 Constructors and Accessors for Ports

14.9.3 Input Port Operations

14.9.4 Output Port Operations

14.9.5 Blocking Mode

14.9.6 Terminal Mode

14.9.1 Port Types

The procedures in this section provide means for constructing port types with standard and custom operations, and accessing their operations.

procedure: make-port-type operations port-type

Creates and returns a new port type. Operations must be a list; each element is a list of two elements, the name of the operation (a symbol) and the procedure that implements it. Port-type is either #f or a port type; if it is a port type, any operations implemented by port-type but not specified in operations will be implemented by the resulting port type.

Operations need not contain definitions for all of the standard operations; the procedure will provide defaults for any standard operations that are not defined. At a minimum, the following operations must be defined: for input ports, read-char and peek-char; for output ports, either write-char or write-substring. I/O ports must supply the minimum operations for both input and output.

If an operation in operations is defined to be #f, then the corresponding operation in port-type is not inherited.

If read-char is defined in operations, then any standard input operations defined in port-type are ignored. Likewise, if write-char or write-substring is defined in operations, then any standard output operations defined in port-type are ignored. This feature allows overriding the standard operations without having to enumerate them.
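As a minimal sketch of these rules, the following port type implements only the two required input operations, read-char and peek-char, over a list of characters kept in the port state, and relies on make-port-type to supply the other standard operations. It uses make-port, port/state, set-port/state!, and make-eof-object from 14.9.2 below; list-input-port-type and open-list-input-port are hypothetical names, and #f is passed as port-type because nothing is inherited.

```scheme
(define list-input-port-type
  (make-port-type
   `((read-char
      ,(lambda (port)
         ;; The state is the list of characters not yet read.
         (let ((chars (port/state port)))
           (if (null? chars)
               (make-eof-object port)
               (begin
                 (set-port/state! port (cdr chars))
                 (car chars))))))
     (peek-char
      ,(lambda (port)
         ;; Like read-char, but leaves the state unchanged.
         (let ((chars (port/state port)))
           (if (null? chars)
               (make-eof-object port)
               (car chars))))))
   #f))

(define (open-list-input-port chars)
  (make-port list-input-port-type chars))
```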

procedure: port-type? object
procedure: input-port-type? object
procedure: output-port-type? object
procedure: i/o-port-type? object: These predicates return #t if object is a port type, input-port type, output-port type, or I/O-port type, respectively. Otherwise, they return #f.

procedure: port-type/operations port-type: Returns a newly allocated list containing all of the operations implemented by port-type. Each element of the list is a list of two elements: the name and its associated operation.

procedure: port-type/operation-names port-type: Returns a newly allocated list whose elements are the names of the operations implemented by port-type.

procedure: port-type/operation port-type symbol: Returns the operation named symbol in port-type. If port-type has no such operation, returns #f.

14.9.2 Constructors and Accessors for Ports

The procedures in this section provide means for constructing ports, accessing the type of a port, and manipulating the state of a port.

procedure: make-port port-type state: Returns a new port with type port-type and the given state. The port will be an input, output, or I/O port according to port-type.

procedure: port/type port: Returns the port type of port.

procedure: port/state port: Returns the state component of port.

procedure: set-port/state! port object: Changes the state component of port to be object. Returns an unspecified value.

procedure: port/operation port symbol

Equivalent to

(port-type/operation (port/type port) symbol)

procedure: port/operation-names port

Equivalent to

(port-type/operation-names (port/type port))

procedure: make-eof-object input-port: Returns an object that satisfies the predicate eof-object?. This is sometimes useful when building input ports.
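The constructor and state accessors are enough to build a simple output port whose state accumulates everything written to it. In this sketch, collecting-port-type, open-collecting-port, and collected-string are hypothetical names. The write-char operation returns 1 rather than an arbitrary value; this is an assumption that does no harm, since the value is unspecified here but some releases treat it as a count of characters written.

```scheme
(define collecting-port-type
  (make-port-type
   `((write-char
      ,(lambda (port char)
         ;; The state is the list of characters written so far, newest first.
         (set-port/state! port (cons char (port/state port)))
         1)))
   #f))

(define (open-collecting-port)
  (make-port collecting-port-type '()))

;; Returns everything written to PORT, in order.
(define (collected-string port)
  (list->string (reverse (port/state port))))
```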

14.9.3 Input Port Operations

This section describes the standard operations on input ports. Following that, some useful custom operations are described.

operation: input port read-char input-port: Removes the next character available from input-port and returns it. If input-port has no more characters and will never have any (e.g. at the end of an input file), this operation returns an end-of-file object. If input-port has no more characters but will eventually have some more (e.g. a terminal where nothing has been typed recently), and it is in non-blocking mode, #f is returned; otherwise the operation hangs until input is available.

operation: input port peek-char input-port: Reads the next character available from input-port and returns it. The character is not removed from input-port, and a subsequent attempt to read from the port will get that character again. In other respects this operation behaves like read-char.

operation: input port discard-char input-port: Discards the next character available from input-port and returns an unspecified value. In other respects this operation behaves like read-char.

operation: input port char-ready? input-port k: char-ready? returns #t if at least one character is available to be read from input-port. If no characters are available, the operation waits up to k milliseconds before returning #f, returning immediately if any characters become available while it is waiting.

operation: input port read-string input-port char-set
operation: input port discard-chars input-port char-set: These operations are like read-char and discard-char, except that they read or discard multiple characters at once, which can markedly improve performance on buffered input ports. All characters up to, but excluding, the first character in char-set (or end of file) are read from input-port. read-string returns these characters as a newly allocated string, while discard-chars discards them and returns an unspecified value. These operations hang until sufficient input is available, even if input-port is in non-blocking mode. If end of file is encountered before any input characters, read-string returns an end-of-file object.

operation: input port read-substring input-port string start end

Reads characters from input-port into the substring defined by string, start, and end until either the substring has been filled or there are no more characters available. Returns the number of characters written to the substring.

If input-port is an interactive port, and at least one character is immediately available, the available characters are written to the substring and this operation returns immediately. If no characters are available, and input-port is in blocking mode, the operation blocks until at least one character is available. Otherwise, the operation returns #f immediately.

This is an extremely fast way to read characters from a port.

procedure: input-port/read-char input-port

procedure: input-port/peek-char input-port

procedure: input-port/discard-char input-port

procedure: input-port/char-ready? input-port k

procedure: input-port/read-string input-port char-set

procedure: input-port/discard-chars input-port char-set

procedure: input-port/read-substring input-port string start end

Each of these procedures invokes the respective operation on input-port. For example, the following are equivalent:

(input-port/read-char input-port)
((input-port/operation input-port 'read-char) input-port)
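These procedures are convenient for simple tokenizers. The sketch below splits space-separated words with input-port/read-string and counts characters in 16-character chunks with input-port/read-substring. read-words and count-chars are hypothetical helpers, and read-words assumes that words are separated by exactly one space.

```scheme
(define delimiters (char-set #\space))

;; Collects space-separated words until end of file.
(define (read-words port)
  (let loop ((words '()))
    (let ((word (input-port/read-string port delimiters)))
      (if (eof-object? word)
          (reverse words)
          (begin
            ;; Skip the single delimiter that stopped read-string, if any.
            (if (not (eof-object? (input-port/peek-char port)))
                (input-port/discard-char port))
            (loop (cons word words)))))))

;; Counts the characters in PORT, reading up to 16 at a time.
(define (count-chars port)
  (let ((buffer (make-string 16)))
    (let loop ((total 0))
      (let ((n (input-port/read-substring port buffer 0 16)))
        ;; 0 means end of file; #f means a non-blocking port has
        ;; nothing available yet, and this sketch simply stops.
        (if (and n (> n 0))
            (loop (+ total n))
            total)))))
```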

The following custom operations are implemented for input ports to files, and will also work with some other kinds of input ports:

operation: input port eof? input-port: Returns #t if input-port is known to be at end of file, otherwise it returns #f.

operation: input port chars-remaining input-port: Returns an estimate of the number of characters remaining to be read from input-port. This is useful only when input-port is a file port in binary mode; in other cases, it returns #f.

operation: input port buffered-input-chars input-port: Returns the number of unread characters that are stored in input-port's buffer. This will always be less than or equal to the buffer's size.

operation: input port input-buffer-size input-port: Returns the maximum number of characters that input-port's buffer can hold.

operation: input port set-input-buffer-size input-port size: Resizes input-port's buffer so that it can hold at most size characters. Characters in the buffer are discarded. Size must be an exact non-negative integer.

14.9.4 Output Port Operations

This section describes the standard operations on output ports. Following that, some useful custom operations are described.

operation: output port write-char output-port char: Writes char to output-port and returns an unspecified value.

operation: output port write-substring output-port string start end: Writes the substring specified by string, start, and end to output-port and returns an unspecified value. Equivalent to writing the characters of the substring, one by one, to output-port, but is implemented very efficiently.

operation: output port fresh-line output-port: Most output ports are able to tell whether or not they are at the beginning of a line of output. If output-port is such a port, end-of-line is written to the port only if the port is not already at the beginning of a line. If output-port is not such a port, end-of-line is written unconditionally. Returns an unspecified value.

operation: output port flush-output output-port: If output-port is buffered, this causes its buffer to be written out. Otherwise it has no effect. Returns an unspecified value.

operation: output port discretionary-flush-output output-port: Normally, this operation does nothing. However, ports that support discretionary output flushing implement this operation identically to flush-output.

procedure: output-port/write-char output-port char

procedure: output-port/write-substring output-port string start end

procedure: output-port/fresh-line output-port

procedure: output-port/flush-output output-port

procedure: output-port/discretionary-flush-output output-port

Each of these procedures invokes the respective operation on output-port. For example, the following are equivalent:

(output-port/write-char output-port char)
((output-port/operation output-port 'write-char) output-port char)

procedure: output-port/write-string output-port string

Writes string to output-port. Equivalent to

(output-port/write-substring output-port string 0 (string-length string))
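For example, the following writes a whole string, a line break supplied by fresh-line, and then part of a second string to a string port. This is a sketch; it calls fresh-line only once after text has been written, because on a port that cannot track line starts every call writes an end-of-line.

```scheme
(define text
  (call-with-output-string
    (lambda (port)
      (output-port/write-string port "first")
      ;; The port is not at the beginning of a line, so this writes
      ;; end-of-line on either kind of port.
      (output-port/fresh-line port)
      ;; Write only the first 6 characters, "second".
      (output-port/write-substring port "second!!" 0 6)
      (output-port/flush-output port))))
```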

The following custom operations are generally useful.

operation: output port buffered-output-chars output-port: Returns the number of unwritten characters that are stored in output-port's buffer. This will always be less than or equal to the buffer's size.

operation: output port output-buffer-size output-port: Returns the maximum number of characters that output-port's buffer can hold.

operation: output port set-output-buffer-size output-port size: Resizes output-port's buffer so that it can hold at most size characters. Characters in the buffer are discarded. Size must be an exact non-negative integer.

operation: output port x-size output-port: Returns an exact positive integer that is the width of output-port in characters. If output-port has no natural width, e.g. if it is a file port, #f is returned.

operation: output port y-size output-port: Returns an exact positive integer that is the height of output-port in characters. If output-port has no natural height, e.g. if it is a file port, #f is returned.

procedure: output-port/x-size output-port

This procedure invokes the custom operation whose name is the symbol x-size, if it exists. If the x-size operation is both defined and returns a value other than #f, that value is returned as the result of this procedure. Otherwise, output-port/x-size returns a default value (currently 80).

output-port/x-size is useful for programs that tailor their output to the width of the display (a fairly common practice). If the output device is not a display, such programs normally want some reasonable default width to work with, and this procedure provides exactly that.

procedure: output-port/y-size output-port: This procedure invokes the custom operation whose name is the symbol y-size, if it exists. If the y-size operation is defined, the value it returns is returned as the result of this procedure; otherwise, #f is returned.
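Because custom operations are looked up by name, a port type can advertise its own dimensions simply by defining x-size and y-size. This sketch defines a hypothetical 40-by-12 port type that collects its output in the port state, together with a hypothetical write-centered helper that tailors its output to output-port/x-size.

```scheme
(define screen-port-type
  (make-port-type
   `((write-char
      ,(lambda (port char)
         ;; Collect output, newest character first.
         (set-port/state! port (cons char (port/state port)))
         1))
     ;; Custom operations reporting a fixed size in characters.
     (x-size ,(lambda (port) 40))
     (y-size ,(lambda (port) 12)))
   #f))

;; Writes STRING centered within the width reported for PORT.
(define (write-centered string port)
  (let* ((width (output-port/x-size port))
         (pad (max 0 (quotient (- width (string-length string)) 2))))
    (write-string (make-string pad #\space) port)
    (write-string string port)
    (newline port)))
```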

14.9.5 Blocking Mode

An interactive port is always in one of two modes: blocking or non-blocking. This mode is independent of the terminal mode: each can be changed independently of the other. Furthermore, an interactive I/O port has separate blocking modes for input and for output.

If an input port is in blocking mode, attempting to read from it when no input is available will cause Scheme to "block", i.e. suspend itself, until input is available. If an input port is in non-blocking mode, attempting to read from it when no input is available will cause the reading procedure to return immediately, indicating the lack of input in some way (exactly how this situation is indicated is separately specified for each procedure or operation).

An output port in blocking mode will block if the output device is not ready to accept output. In non-blocking mode it will return immediately after performing as much output as the device will allow (again, each procedure or operation reports this situation in its own way).

Interactive ports are initially in blocking mode; this can be changed at any time with the procedures defined in this section.

These procedures represent blocking mode by the symbol blocking, and non-blocking mode by the symbol nonblocking. An argument called mode must be one of these symbols. A port argument to any of these procedures may be any port, even if that port does not support blocking mode; in that case, the port is not modified in any way.

procedure: port/input-blocking-mode port: Returns the input blocking mode of port.

procedure: port/set-input-blocking-mode port mode: Changes the input blocking mode of port to be mode. Returns an unspecified value.

procedure: port/with-input-blocking-mode port mode thunk: Thunk must be a procedure of no arguments. port/with-input-blocking-mode binds the input blocking mode of port to be mode, executes thunk, restores the input blocking mode of port to what it was when port/with-input-blocking-mode was called, and returns the value that was yielded by thunk. This binding is performed by dynamic-wind, which guarantees that the input blocking mode is restored if thunk escapes from its continuation.

procedure: port/output-blocking-mode port: Returns the output blocking mode of port.

procedure: port/set-output-blocking-mode port mode: Changes the output blocking mode of port to be mode. Returns an unspecified value.

procedure: port/with-output-blocking-mode port mode thunk: Thunk must be a procedure of no arguments. port/with-output-blocking-mode binds the output blocking mode of port to be mode, executes thunk, restores the output blocking mode of port to what it was when port/with-output-blocking-mode was called, and returns the value that was yielded by thunk. This binding is performed by dynamic-wind, which guarantees that the output blocking mode is restored if thunk escapes from its continuation.
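A common use of non-blocking mode is draining whatever input has already arrived without waiting for more. In this sketch, read-available-chars (a hypothetical helper) puts port in non-blocking mode only for the duration of the loop, and reads until input-port/read-char returns #f (nothing available yet) or an end-of-file object. On a port that does not support blocking mode, such as a string port, the mode is left alone and the loop simply runs to end of file.

```scheme
(define (read-available-chars port)
  (port/with-input-blocking-mode port 'nonblocking
    (lambda ()
      (let loop ((chars '()))
        (let ((char (input-port/read-char port)))
          ;; #f: no input available right now; eof: no more input ever.
          (if (or (not char) (eof-object? char))
              (list->string (reverse chars))
              (loop (cons char chars))))))))
```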

14.9.6 Terminal Mode

A port that reads from or writes to a terminal has a terminal mode; this is either cooked or raw. This mode is independent of the blocking mode: each can be changed independently of the other. Furthermore, a terminal I/O port has independent terminal modes both for input and for output.

A terminal port in cooked mode provides some standard processing to make the terminal easy to communicate with. For example, under Unix, cooked mode on input reads from the terminal a line at a time and provides rubout processing within the line, while cooked mode on output might translate linefeeds to carriage-return/linefeed pairs. In general, the precise meaning of cooked mode is operating-system dependent, and furthermore might be customizable by means of operating system utilities. The basic idea is that cooked mode does whatever is necessary to make the terminal handle all of the usual user-interface conventions for the operating system, while keeping the program's interaction with the port as normal as possible.

A terminal port in raw mode disables all of that processing. In raw mode, characters are directly read from and written to the device without any translation or interpretation by the operating system. On input, characters are available as soon as they are typed, and are not echoed on the terminal by the operating system. In general, programs that put ports in raw mode have to know the details of interacting with the terminal. In particular, raw mode is used for writing programs such as text editors.

Terminal ports are initially in cooked mode; this can be changed at any time with the procedures defined in this section.

These procedures represent cooked mode by the symbol cooked, and raw mode by the symbol raw. Additionally, the value #f represents "no mode"; it is the terminal mode of a port that is not a terminal. An argument called mode must be one of these three values. A port argument to any of these procedures may be any port, even if that port does not support terminal mode; in that case, the port is not modified in any way.

procedure: port/input-terminal-mode port: Returns the input terminal mode of port.

procedure: port/set-input-terminal-mode port mode: Changes the input terminal mode of port to be mode. Returns an unspecified value.

procedure: port/with-input-terminal-mode port mode thunk: Thunk must be a procedure of no arguments. port/with-input-terminal-mode binds the input terminal mode of port to be mode, executes thunk, restores the input terminal mode of port to what it was when port/with-input-terminal-mode was called, and returns the value that was yielded by thunk. This binding is performed by dynamic-wind, which guarantees that the input terminal mode is restored if thunk escapes from its continuation.

procedure: port/output-terminal-mode port: Returns the output terminal mode of port.

procedure: port/set-output-terminal-mode port mode: Changes the output terminal mode of port to be mode. Returns an unspecified value.

procedure: port/with-output-terminal-mode port mode thunk: Thunk must be a procedure of no arguments. port/with-output-terminal-mode binds the output terminal mode of port to be mode, executes thunk, restores the output terminal mode of port to what it was when port/with-output-terminal-mode was called, and returns the value that was yielded by thunk. This binding is performed by dynamic-wind, which guarantees that the output terminal mode is restored if thunk escapes from its continuation.
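For instance, a program that wants a single keystroke, without waiting for a carriage return, can switch input to raw mode for just one read; the dynamic-wind binding restores cooked mode afterward even if the read escapes. read-keystroke is a hypothetical helper. Interactively it would be called as (read-keystroke console-i/o-port); on a port without terminal modes, such as a string port, the mode is left alone and the character is simply read.

```scheme
(define (read-keystroke port)
  (port/with-input-terminal-mode port 'raw
    (lambda ()
      ;; In raw mode the character is available as soon as it is typed
      ;; and is not echoed by the operating system.
      (read-char port))))
```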


14.10 Parser Buffers

The parser buffer mechanism facilitates construction of parsers for complex grammars. It does this by providing an input stream with unbounded buffering and backtracking. The amount of buffering is under program control. The stream can backtrack to any position in the buffer.

The mechanism defines two data types: the parser buffer and the parser-buffer pointer. A parser buffer is like an input port with buffering and backtracking. A parser-buffer pointer is a pointer into the stream of characters provided by a parser buffer.

Note that all of the procedures defined here consider a parser buffer to contain a stream of 8-bit characters in the ISO-8859-1 character set. The one exception is match-utf8-char-in-alphabet, which treats it as a stream of Unicode characters encoded as 8-bit bytes in the UTF-8 encoding.

There are several constructors for parser buffers:

procedure: input-port->parser-buffer port: Returns a parser buffer that buffers characters read from port.

procedure: substring->parser-buffer string start end: Returns a parser buffer that buffers the characters in the argument substring. This is equivalent to creating a string input port and calling input-port->parser-buffer, but it runs faster and uses less memory.

procedure: string->parser-buffer string: Like substring->parser-buffer but buffers the entire string.

procedure: source->parser-buffer source: Returns a parser buffer that buffers the characters returned by calling source. Source is a procedure of three arguments: a string, a start index, and an end index (in other words, a substring specifier). Each time source is called, it writes some characters in the substring, and returns the number of characters written. When there are no more characters available, it returns zero. It must not return zero in any other circumstance.
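The source protocol is illustrated below with a source that delivers a string a few characters at a time. This is a minimal sketch assuming the behaviour described above. make-chunked-source is a hypothetical helper, and the sketch assumes the parser buffer always offers a non-empty substring to fill.

```scheme
;; Hypothetical helper: return a SOURCE procedure that copies the
;; characters of TEXT into the supplied substring, at most CHUNK
;; characters per call, and returns how many it wrote.
(define (make-chunked-source text chunk)
  (let ((next 0))                              ; next undelivered index
    (lambda (target start end)
      (let ((n (min chunk
                    (- end start)
                    (- (string-length text) next))))
        ;; Copy N characters into TARGET[start, start+N).
        (do ((i 0 (+ i 1)))
            ((= i n))
          (string-set! target (+ start i) (string-ref text (+ next i))))
        (set! next (+ next n))
        n))))                                  ; zero only at end of input

(define pb (source->parser-buffer (make-chunked-source "hello" 2)))
(read-parser-buffer-char pb)   ; => #\h
```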

Parser buffers and parser-buffer pointers may be distinguished from other objects:

procedure: parser-buffer? object: Returns #t if object is a parser buffer, otherwise returns #f.

procedure: parser-buffer-pointer? object: Returns #t if object is a parser-buffer pointer, otherwise returns #f.

Characters can be read from a parser buffer much as they can be read from an input port. The parser buffer maintains an internal pointer indicating its current position in the input stream. Additionally, the buffer remembers all characters that were previously read, and can look at characters arbitrarily far ahead in the stream. It is this buffering capability that facilitates complex matching and backtracking.

procedure: read-parser-buffer-char buffer: Returns the next character in buffer, advancing the internal pointer past that character. If there are no more characters available, returns #f and leaves the internal pointer unchanged.

procedure: peek-parser-buffer-char buffer: Returns the next character in buffer, or #f if no characters are available. Leaves the internal pointer unchanged.

procedure: parser-buffer-ref buffer index: Returns a character in buffer. Index is a non-negative integer specifying the character to be returned. If index is zero, returns the next available character; if it is one, returns the character after that, and so on. If index specifies a position after the last character in buffer, returns #f. Leaves the internal pointer unchanged.
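The three reading procedures are contrasted on a small string buffer below. Only read-parser-buffer-char moves the internal pointer. This is a short sketch assuming the procedures behave as described above.

```scheme
(define pb (string->parser-buffer "abc"))

(peek-parser-buffer-char pb)   ; => #\a  (pointer unchanged)
(parser-buffer-ref pb 2)       ; => #\c  (two characters ahead)
(read-parser-buffer-char pb)   ; => #\a  (pointer now before #\b)
(parser-buffer-ref pb 0)       ; => #\b
(parser-buffer-ref pb 5)       ; => #f   (past the last character)
```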

The internal pointer of a parser buffer can be read or written:

procedure: get-parser-buffer-pointer buffer: Returns a parser-buffer pointer object corresponding to the internal pointer of buffer.

procedure: set-parser-buffer-pointer! buffer pointer: Sets the internal pointer of buffer to the position specified by pointer. Pointer must have been returned from a previous call of get-parser-buffer-pointer on buffer. Additionally, if some of buffer's characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.

procedure: get-parser-buffer-tail buffer pointer: Returns a newly-allocated string consisting of all of the characters in buffer that fall between pointer and buffer's internal pointer. Pointer must have been returned from a previous call of get-parser-buffer-pointer on buffer. Additionally, if some of buffer's characters have been discarded by discard-parser-buffer-head!, pointer must be outside the range that was discarded.

procedure: discard-parser-buffer-head! buffer: Discards all characters in buffer that have already been read; in other words, all characters prior to the internal pointer. After this operation has completed, it is no longer possible to move the internal pointer backwards past the current position by calling set-parser-buffer-pointer!.
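The following sketch combines a saved pointer with get-parser-buffer-tail to extract a token, then backtracks by restoring the pointer. It assumes the procedures above, and read-while is a hypothetical helper. Calling discard-parser-buffer-head! after the first token would make the later backtrack to mark invalid.

```scheme
;; Hypothetical helper: consume characters while PRED holds and return
;; them as a newly-allocated string.
(define (read-while pred pb)
  (let ((start (get-parser-buffer-pointer pb)))
    (let loop ()
      (let ((c (peek-parser-buffer-char pb)))
        (if (and c (pred c))
            (begin
              (read-parser-buffer-char pb)
              (loop)))))
    (get-parser-buffer-tail pb start)))

(define pb (string->parser-buffer "abc123"))
(define mark (get-parser-buffer-pointer pb))
(read-while char-alphabetic? pb)      ; => "abc"
(read-while char-numeric? pb)         ; => "123"
(set-parser-buffer-pointer! pb mark)  ; backtrack to the beginning
(read-parser-buffer-char pb)          ; => #\a
```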

The next rather large set of procedures does conditional matching against the contents of a parser buffer. All matching is performed relative to the buffer's internal pointer, so the first character to be matched against is the next character that would be returned by peek-parser-buffer-char. The returned value is always #t for a successful match, and #f otherwise. For procedures whose names do not end in `-no-advance', a successful match also moves the internal pointer of the buffer forward to the end of the matched text; otherwise the internal pointer is unchanged.

procedure: match-parser-buffer-char buffer char
procedure: match-parser-buffer-char-ci buffer char
procedure: match-parser-buffer-not-char buffer char
procedure: match-parser-buffer-not-char-ci buffer char
procedure: match-parser-buffer-char-no-advance buffer char
procedure: match-parser-buffer-char-ci-no-advance buffer char
procedure: match-parser-buffer-not-char-no-advance buffer char
procedure: match-parser-buffer-not-char-ci-no-advance buffer char: Each of these procedures compares a single character in buffer to char. The basic comparison match-parser-buffer-char compares the character to char using char=?. The procedures whose names contain the `-ci' modifier do case-insensitive comparison (i.e. they use char-ci=?). The procedures whose names contain the `not-' modifier are successful if the character doesn't match char.

procedure: match-parser-buffer-char-in-set buffer char-set
procedure: match-parser-buffer-char-in-set-no-advance buffer char-set: These procedures compare the next character in buffer against char-set using char-set-member?.
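The sketch below uses match-parser-buffer-char-in-set to match a run of decimal digits. It relies on the rule that a failed match leaves the pointer alone. match-digits is a hypothetical helper, and char-set:numeric is the standard character set of decimal digits.

```scheme
;; Hypothetical helper: match one or more digits and return them as a
;; string, or return #f (pointer unchanged) if no digit is present.
(define (match-digits pb)
  (let ((start (get-parser-buffer-pointer pb)))
    (let loop ((count 0))
      (if (match-parser-buffer-char-in-set pb char-set:numeric)
          (loop (+ count 1))
          (and (> count 0)
               (get-parser-buffer-tail pb start))))))

(match-digits (string->parser-buffer "2024-01"))   ; => "2024"
(match-digits (string->parser-buffer "x1"))        ; => #f
```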

procedure: match-parser-buffer-string buffer string
procedure: match-parser-buffer-string-ci buffer string
procedure: match-parser-buffer-string-no-advance buffer string
procedure: match-parser-buffer-string-ci-no-advance buffer string: These procedures match string against buffer's contents. The `-ci' procedures do case-insensitive matching.
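The next sketch contrasts advancing and non-advancing, case-sensitive and case-insensitive string matches on a buffer that begins with an upper-case keyword. It assumes only the matching rules stated above.

```scheme
(define pb (string->parser-buffer "SELECT name"))

;; Look ahead without consuming anything.
(match-parser-buffer-string-ci-no-advance pb "select")   ; => #t
(peek-parser-buffer-char pb)                             ; => #\S

;; A failed case-sensitive match leaves the pointer where it was ...
(match-parser-buffer-string pb "select")                 ; => #f
;; ... and the case-insensitive match then consumes the keyword.
(match-parser-buffer-string-ci pb "select")              ; => #t
(match-parser-buffer-char pb #\space)                    ; => #t
(peek-parser-buffer-char pb)                             ; => #\n
```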

procedure: match-parser-buffer-substring buffer string start end
procedure: match-parser-buffer-substring-ci buffer string start end
procedure: match-parser-buffer-substring-no-advance buffer string start end
procedure: match-parser-buffer-substring-ci-no-advance buffer string start end: These procedures match the specified substring against buffer's contents. The `-ci' procedures do case-insensitive matching.

procedure: match-utf8-char-in-alphabet buffer alphabet: This procedure treats buffer's contents as UTF-8 encoded Unicode characters and matches the next such character against alphabet, which must be a Unicode alphabet (see section 5.7 Unicode). UTF-8 represents characters with 1 to 6 bytes, so a successful match can move the internal pointer forward by as many as 6 bytes.

The remaining procedures provide information that can be used to identify locations in a parser buffer's stream.

procedure: parser-buffer-position-string pointer

Returns a string describing the location of pointer in terms of its character and line indexes. The resulting string is meant to be presented to an end user in order to direct their attention to a feature in the input stream. In this string, the indexes are presented as one-based numbers.

Pointer may alternatively be a parser buffer, in which case it is equivalent to having specified the buffer's internal pointer.

procedure: parser-buffer-pointer-index pointer
procedure: parser-buffer-pointer-line pointer: Returns the character or line index, respectively, of pointer. Both indexes are zero-based.
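The sketch below reads past a newline and then queries the resulting pointer. It assumes the zero-based indexes described above. The exact wording of parser-buffer-position-string is not specified here, so no output is shown for it.

```scheme
(define pb (string->parser-buffer "ab\ncd"))
;; Consume #\a, #\b, the newline, and #\c.
(do ((i 0 (+ i 1))) ((= i 4)) (read-parser-buffer-char pb))

(define p (get-parser-buffer-pointer pb))
(parser-buffer-pointer-index p)      ; => 4  (zero-based character index)
(parser-buffer-pointer-line p)       ; => 1  (zero-based: the second line)
(parser-buffer-position-string p)    ; one-based text meant for end users
```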


14.11 Parser Language

Although it is possible to write parsers using the parser-buffer abstraction (see section 14.10 Parser Buffers), it is tedious. The problem is that the abstraction isn't closely matched to the way that people think about syntactic structures. In this section, we introduce a higher-level mechanism that greatly simplifies the implementation of a parser.

The parser language described here allows the programmer to write BNF-like specifications that are translated into efficient Scheme code at compile time. The language is declarative, but it can be freely mixed with Scheme code; this allows the parsing of grammars that aren't conveniently described in the language.

The language also provides backtracking. For example, this expression matches any sequence of alphanumeric characters followed by a single alphabetic character:

(*matcher (seq (* (char-set char-set:alphanumeric)) (char-set char-set:alphabetic)))

The way that this works is that the matcher matches alphanumeric characters in the input stream until it finds a non-alphanumeric character. It then tries to match an alphabetic character, which of course fails. At this point, if it matched at least one alphanumeric character, it backtracks: the last matched alphanumeric is "unmatched", and it again attempts to match an alphabetic character. The backtracking can be arbitrarily deep; the matcher will continue to back up until it finds a way to match the remainder of the expression.
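The backtracking example above can be applied to an actual buffer as shown below. This is a sketch assuming the *parser option has been loaded as described later in this section. match-word is a hypothetical name.

```scheme
(load-option '*parser)

(define match-word
  (*matcher
   (seq (* (char-set char-set:alphanumeric))
        (char-set char-set:alphabetic))))

(define pb (string->parser-buffer "ab12c!"))
(define start (get-parser-buffer-pointer pb))
(match-word pb)                     ; => #t  (* backs off the final #\c)
(get-parser-buffer-tail pb start)   ; => "ab12c"
```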

So far, this sounds a lot like regular-expression matching (see section 6.8 Regular Expressions). However, there are some important differences.

The parser language uses a Scheme-like syntax that is easier to read and write than regular-expression notation.
The language provides macros so that common syntactic constructs can be abstracted.
The language mixes easily with Scheme code, allowing the full power of Scheme to be applied to program around limitations in the parser language.
The language provides expressive facilities for converting syntax into parsed structure. It also makes it easy to convert parsed strings into meaningful objects (e.g. numbers).
The language is compiled into machine language; regular expressions are usually interpreted.

Here is an example that shows off several of the features of the parser language. The example is a parser for XML start tags:

(*parser
 (with-pointer p
   (seq "<"
        parse-name
        parse-attribute-list
        (alt (match ">")
             (match "/>")
             (sexp
              (lambda (b)
                (error
                 (string-append
                  "Unterminated start tag at "
                  (parser-buffer-position-string p)))))))))

This shows that the basic description of a start tag is very similar to its BNF. The non-terminals parse-name and parse-attribute-list do most of the work, and the strings "<" and ">" are the syntactic markers delimiting the form. There are two alternate endings for start tags. If the parser finds neither of them, the Scheme code wrapped in sexp is run to signal an error. That code uses the pointer p, bound by with-pointer, to indicate the position in the input stream at which the error occurred. In this case, that is the beginning of the start tag, i.e. the position of the leading "<" marker.

This example still looks pretty complicated, mostly due to the error-signalling code. In practice, this is abstracted into a macro, after which the expression is quite succinct:

(*parser
 (bracket "start tag"
          (seq (noise (string "<")) parse-name)
          (match (alt (string ">") (string "/>")))
          parse-attribute-list))

The bracket macro captures the pattern of a bracketed item, and hides much of the detail.

The parser language actually consists of two languages: one for defining matchers, and one for defining parsers. The languages are intentionally very similar, and are meant to be used together. Each sub-language is described below in its own section.

The parser language is a run-time-loadable option; to use it, execute

(load-option '*parser)

once before compiling any code that uses the language.

14.11.1 *Matcher

14.11.2 *Parser

14.11.3 Parser-language Macros


14.11.1 *Matcher

The matcher language is a declarative language for specifying a matcher procedure. A matcher procedure is a procedure that accepts a single parser-buffer argument and returns a boolean value indicating whether the match it performs was successful. If the match succeeds, the internal pointer of the parser buffer is moved forward over the matched text. If the match fails, the internal pointer is unchanged.

For example, here is a matcher procedure that matches the character `a':

(lambda (b) (match-parser-buffer-char b #\a))

Here is another example that matches two given characters, c1 and c2, in sequence:

(lambda (b)
  (let ((p (get-parser-buffer-pointer b)))
    (if (match-parser-buffer-char b c1)
        (if (match-parser-buffer-char b c2)
            #t
            (begin
              (set-parser-buffer-pointer! b p)
              #f))
        #f)))

This code is clear, but it has lots of details that get in the way of understanding what it is doing. Here is the same example in the matcher language:

(*matcher (seq (char c1) (char c2)))

This is much simpler and more intuitive, and it expands into virtually the same code as the hand-written version above.

Now that we have seen an example of the language, it's time to look at the details. The *matcher special form is the interface between the matcher language and Scheme.

special form: *matcher mexp: The operand mexp is an expression in the matcher language. The *matcher expression expands into Scheme code that implements a matcher procedure.

Here are the predefined matcher expressions. New matcher expressions can be defined using the macro facility (see section 14.11.3 Parser-language Macros). We will start with the primitive expressions.

matcher expression: char expression
matcher expression: char-ci expression
matcher expression: not-char expression
matcher expression: not-char-ci expression: These expressions match a given character. In each case, the expression operand is a Scheme expression that must evaluate to a character at run time. The `-ci' expressions do case-insensitive matching. The `not-' expressions match any character other than the given one.

matcher expression: string expression
matcher expression: string-ci expression: These expressions match a given string. The expression operand is a Scheme expression that must evaluate to a string at run time. The string-ci expression does case-insensitive matching.

matcher expression: char-set expression: This expression matches a single character that is a member of a given character set. The expression operand is a Scheme expression that must evaluate to a character set at run time.

matcher expression: alphabet expression: This expression matches a single character that is a member of a given Unicode alphabet (see section 5.7 Unicode). The expression operand is a Scheme expression that must evaluate to an alphabet at run time.

matcher expression: end-of-input: The end-of-input expression is successful only when there are no more characters available to be matched.

matcher expression: discard-matched

The discard-matched expression always successfully matches the null string. However, it isn't meant to be used as a matching expression; it is used for its effect. discard-matched causes all of the buffered text prior to this point to be discarded (i.e. it calls discard-parser-buffer-head! on the parser buffer).

Note that discard-matched may not be used in certain places in a matcher expression. The reason for this is that it deliberately discards information needed for backtracking, so it may not be used in a place where subsequent backtracking will need to back over it. As a rule of thumb, use discard-matched only in the last operand of a seq or alt expression (including any seq or alt expressions in which it is indirectly contained).

In addition to the above primitive expressions, there are two convenient abbreviations. A character literal (e.g. `#\A') is a legal primitive expression, and is equivalent to a char expression with that literal as its operand (e.g. `(char #\A)'). Likewise, a string literal is equivalent to a string expression (e.g. `(string "abc")').

Next there are several combinator expressions. These closely correspond to similar combinators in regular expressions. Parameters named mexp are arbitrary expressions in the matcher language.

matcher expression: seq mexp ...

This matches each mexp operand in sequence. For example,

(seq (char-set char-set:alphabetic) (char-set char-set:numeric))

matches an alphabetic character followed by a numeric character, such as `H4'.

Note that if there are no mexp operands, the seq expression successfully matches the null string.

matcher expression: alt mexp ...

This attempts to match each mexp operand in order from left to right. The first one that successfully matches becomes the match for the entire alt expression.

The alt expression participates in backtracking. If one of the mexp operands matches, but the overall match in which this expression is embedded fails, the backtracking mechanism will cause the alt expression to try the remaining mexp operands. For example, if the expression

(seq (alt "ab" "a") "b")

is matched against the text `abc', the alt expression will initially match its first operand. But it will then fail to match the second operand of the seq expression. This will cause the alt to be restarted, at which time it will match `a', and the overall match will succeed.

Note that if there are no mexp operands, the alt match will always fail.
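The alt example above can be checked directly on a buffer. This is a sketch assuming the *parser option is loaded. The saved pointer shows how much text was consumed once the alt backtracks to its second operand.

```scheme
(load-option '*parser)

(define m (*matcher (seq (alt "ab" "a") "b")))

(define pb (string->parser-buffer "abc"))
(define start (get-parser-buffer-pointer pb))
(m pb)                              ; => #t
(get-parser-buffer-tail pb start)   ; => "ab"  ("a" from alt, then "b")
```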

matcher expression: * mexp

This matches zero or more occurrences of the mexp operand. (Consequently this match always succeeds.)

The * expression participates in backtracking; if it matches N occurrences of mexp, but the overall match fails, it will backtrack to N-1 occurrences and continue. If the overall match continues to fail, the * expression will continue to backtrack until there are no occurrences left.

matcher expression: + mexp

This matches one or more occurrences of the mexp operand. It is equivalent to

(seq mexp (* mexp))

matcher expression: ? mexp

This matches zero or one occurrences of the mexp operand. It is equivalent to

(alt mexp (seq))
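The combinators ? and + appear together in the sketch below, a matcher for an optionally signed integer. It assumes the *parser option is loaded. match-integer and matched-text are hypothetical names.

```scheme
(load-option '*parser)

;; An optional sign followed by one or more decimal digits.
(define match-integer
  (*matcher
   (seq (? (alt #\+ #\-))
        (+ (char-set char-set:numeric)))))

;; Hypothetical helper: return the text matched by matcher M at the
;; start of STRING, or #f if the match fails.
(define (matched-text m string)
  (let* ((pb (string->parser-buffer string))
         (start (get-parser-buffer-pointer pb)))
    (and (m pb) (get-parser-buffer-tail pb start))))

(matched-text match-integer "-42x")   ; => "-42"
(matched-text match-integer "7")      ; => "7"
(matched-text match-integer "+")      ; => #f
```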

matcher expression: sexp expression

The sexp expression allows arbitrary Scheme code to be embedded inside a matcher. The expression operand must evaluate to a matcher procedure at run time; the procedure is called to match the parser buffer. For example,

(*matcher (seq "a" (sexp parse-foo) "b"))

expands to

(lambda (#[b1])
  (let ((#[p1] (get-parser-buffer-pointer #[b1])))
    (and (match-parser-buffer-char #[b1] #\a)
         (if (parse-foo #[b1])
             (if (match-parser-buffer-char #[b1] #\b)
                 #t
                 (begin
                   (set-parser-buffer-pointer! #[b1] #[p1])
                   #f))
             (begin
               (set-parser-buffer-pointer! #[b1] #[p1])
               #f)))))

The case in which expression is a symbol is so common that it has an abbreviation: `(sexp symbol)' may be abbreviated as just symbol.
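The sketch below embeds a hand-written matcher procedure through the symbol abbreviation for sexp. It assumes the *parser option is loaded. hex-digits, match-hex-digit, and match-hex-byte are hypothetical names.

```scheme
(load-option '*parser)

;; A hand-written matcher procedure: one hexadecimal digit.
(define hex-digits (string->char-set "0123456789abcdefABCDEF"))
(define (match-hex-digit b)
  (match-parser-buffer-char-in-set b hex-digits))

;; "#" followed by two hex digits; the bare symbol match-hex-digit
;; abbreviates (sexp match-hex-digit).
(define match-hex-byte
  (*matcher (seq #\# match-hex-digit match-hex-digit)))

(match-hex-byte (string->parser-buffer "#3F"))   ; => #t
(match-hex-byte (string->parser-buffer "#3G"))   ; => #f
```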

matcher expression: with-pointer identifier mexp

The with-pointer expression fetches the parser buffer's internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then matches the pattern specified by mexp. Identifier must be a symbol.

This is meant to be used in conjunction with sexp, as a way to capture a pointer to a part of the input stream that is outside the sexp expression. An example of the use of with-pointer appears in the XML start-tag parser at the beginning of section 14.11 Parser Language.


14.11.2 *Parser

The parser language is a declarative language for specifying a parser procedure. A parser procedure is a procedure that accepts a single parser-buffer argument and parses some of the input from the buffer. If the parse is successful, the procedure returns a vector of objects that are the result of the parse, and the internal pointer of the parser buffer is advanced past the input that was parsed. If the parse fails, the procedure returns #f and the internal pointer is unchanged. This interface is much like that of a matcher procedure, except that on success the parser procedure returns a vector of values rather than #t.

The *parser special form is the interface between the parser language and Scheme.

special form: *parser pexp: The operand pexp is an expression in the parser language. The *parser expression expands into Scheme code that implements a parser procedure.

There are several primitive expressions in the parser language. The first two provide a bridge to the matcher language (see section 14.11.1 *Matcher):

parser expression: match mexp: The match expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the match expression is a vector of one element: a string containing the matched text.

parser expression: noise mexp

The noise expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the noise expression is a vector of zero elements. (In other words, the text is matched and then thrown away.)

The mexp operand is often a known character or string, so in the case that mexp is a character or string literal, the noise expression can be abbreviated as the literal. In other words, `(noise "foo")' can be abbreviated just `"foo"'.

parser expression: values expression ...: Sometimes it is useful to be able to insert arbitrary values into the parser result. The values expression supports this. The expression arguments are arbitrary Scheme expressions that are evaluated at run time and returned in a vector. The values expression always succeeds and never modifies the internal pointer of the parser buffer.

parser expression: discard-matched: The discard-matched expression always succeeds, returning a vector of zero elements. In all other respects it is identical to the discard-matched expression in the matcher language.
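The following sketch combines the primitive parser expressions. A values expression contributes a constant, noise strings are dropped, and match contributes the matched text. It assumes the *parser option is loaded, and parse-parenthesized is a hypothetical name.

```scheme
(load-option '*parser)

;; "(" digits ")"  =>  #(number "<digits>")
(define parse-parenthesized
  (*parser
   (seq (values 'number)                          ; constant result
        "("                                       ; noise: discarded
        (match (+ (char-set char-set:numeric)))   ; kept as a string
        ")")))

(parse-parenthesized (string->parser-buffer "(123)"))   ; => #(number "123")
(parse-parenthesized (string->parser-buffer "(123"))    ; => #f
```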

Next there are several combinator expressions. Parameters named pexp are arbitrary expressions in the parser language. The first few combinators are direct equivalents of those in the matcher language.

parser expression: seq pexp ...: The seq expression parses each of the pexp operands in order. If all of the pexp operands successfully match, the result is the concatenation of their values (by vector-append).

parser expression: alt pexp ...

The alt expression attempts to parse each pexp operand in order from left to right. The first one that successfully parses produces the result for the entire alt expression.

Like the alt expression in the matcher language, this expression participates in backtracking.

parser expression: * pexp

The * expression parses zero or more occurrences of pexp. The results of the parsed occurrences are concatenated together (by vector-append) to produce the expression's result.

Like the * expression in the matcher language, this expression participates in backtracking.

parser expression: + pexp

The + expression parses one or more occurrences of pexp. It is equivalent to

(seq pexp (* pexp))

parser expression: ? pexp

The ? expression parses zero or one occurrences of pexp. It is equivalent to

(alt pexp (seq))
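The way seq and * concatenate their results is shown below with a comma-separated word list: every matched word becomes one element of the result vector. This is a sketch assuming the *parser option is loaded, and parse-word-list is a hypothetical name.

```scheme
(load-option '*parser)

;; word ("," word)*  =>  a vector of the words; the commas are noise.
(define parse-word-list
  (*parser
   (seq (match (+ (char-set char-set:alphabetic)))
        (* (seq "," (match (+ (char-set char-set:alphabetic))))))))

(parse-word-list (string->parser-buffer "red,green,blue"))
;; => #("red" "green" "blue")
```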

The next three expressions do not have equivalents in the matcher language. Each accepts a single pexp argument, which is parsed in the usual way. These expressions transform the values returned by a successful parse.

parser expression: transform expression pexp

The transform expression performs an arbitrary transformation of the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and must return a vector or #f. If it returns a vector, the parse is successful, and those are the resulting values. If it returns #f, the parse fails and the internal pointer of the parser buffer is returned to what it was before pexp was parsed.

For example:

(transform (lambda (v) (if (= 0 (vector-length v)) #f v)) ...)

parser expression: encapsulate expression pexp

The encapsulate expression transforms the values returned by parsing pexp into a single value. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and may return any Scheme object. The result of the encapsulate expression is a vector of length one containing that object. (And consequently encapsulate doesn't change the success or failure of pexp, only its value.)

For example:

(encapsulate vector->list ...)

parser expression: map expression pexp

The map expression performs a per-element transform on the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is mapped (by vector-map) over the values returned from the parse. The mapped values are returned as the result of the map expression. (And consequently map doesn't change the success or failure of pexp, nor the number of values returned.)

For example:

(map string->symbol ...)
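In the sketch below, map and encapsulate are combined to turn "1,22,333" into one list of numbers. map converts each matched string, and encapsulate then wraps the whole vector into a single value. It assumes the *parser option is loaded, and parse-numbers is a hypothetical name.

```scheme
(load-option '*parser)

(define parse-numbers
  (*parser
   (encapsulate vector->list
     (map string->number
       (seq (match (+ (char-set char-set:numeric)))
            (* (seq "," (match (+ (char-set char-set:numeric))))))))))

(parse-numbers (string->parser-buffer "1,22,333"))   ; => #((1 22 333))
```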

Finally, as in the matcher language, we have sexp and with-pointer to support embedding Scheme code in the parser.

parser expression: sexp expression

The sexp expression allows arbitrary Scheme code to be embedded inside a parser. The expression operand must evaluate to a parser procedure at run time; the procedure is called to parse the parser buffer. This is the parser-language equivalent of the sexp expression in the matcher language.

The case in which expression is a symbol is so common that it has an abbreviation: `(sexp symbol)' may be abbreviated as just symbol.

parser expression: with-pointer identifier pexp: The with-pointer expression fetches the parser buffer's internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then parses the pattern specified by pexp. Identifier must be a symbol. This is the parser-language equivalent of the with-pointer expression in the matcher language.
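The sketch below uses with-pointer in a parser to report where a word starts after leading whitespace has been skipped. It assumes the *parser option is loaded. parse-word-at is a hypothetical name.

```scheme
(load-option '*parser)

;; Skip whitespace, then return the word and its zero-based start index.
(define parse-word-at
  (*parser
   (seq (noise (* (char-set char-set:whitespace)))
        (with-pointer p
          (seq (match (+ (char-set char-set:alphabetic)))
               (values (parser-buffer-pointer-index p)))))))

(parse-word-at (string->parser-buffer "  hello"))   ; => #("hello" 2)
```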


14.11.3 Parser-language Macros

The parser and matcher languages provide a macro facility so that common patterns can be abstracted. The macro facility allows new expression types to be independently defined in the two languages. The macros are defined in hierarchically organized tables, so that different applications can have private macro bindings.

special form: define-*matcher-macro formals expression

special form: define-*parser-macro formals expression

These special forms are used to define macros in the matcher and parser language, respectively. Formals is like the formals list of a define special form, and expression is a Scheme expression.

If formals is a list (or improper list) of symbols, the first symbol in the list is the name of the macro, and the remaining symbols are interpreted as the formals of a lambda expression. A lambda expression is formed by combining the latter formals with the expression, and this lambda expression, when evaluated, becomes the expander. The defined macro accepts the same number of operands as the expander. A macro instance is expanded by applying the expander to the list of operands; the result of the application is interpreted as a replacement expression for the macro instance.

If formals is a symbol, it is the name of the macro. In this case, the expander is a procedure of no arguments whose body is expression. When the formals symbol appears by itself as an expression in the language, the expander is called with no arguments, and the result is interpreted as a replacement expression for the symbol.

procedure: define-*matcher-expander identifier expander
procedure: define-*parser-expander identifier expander: These procedures provide a procedural interface to the macro-definition mechanism. Identifier must be a symbol, and expander must be an expander procedure, as defined above. Instances of the define-*matcher-macro and define-*parser-macro special forms expand into calls to these procedures.
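The sketch below defines a matcher macro in the list form. Its expander receives the unevaluated operand and returns a replacement matcher expression. It assumes the *parser option is loaded and that the forms are evaluated in order, at the REPL or by loading the source file. The macro must already exist when the *matcher form is syntaxed. token and match-assignment are hypothetical names.

```scheme
(load-option '*parser)

;; (token mexp) matches optional whitespace followed by MEXP.
(define-*matcher-macro (token mexp)
  `(seq (* (char-set char-set:whitespace)) ,mexp))

(define match-assignment
  (*matcher
   (seq (token "x")
        (token #\=)
        (token (+ (char-set char-set:numeric))))))

(match-assignment (string->parser-buffer "  x = 42"))   ; => #t
(match-assignment (string->parser-buffer "x 42"))       ; => #f
```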

The remaining procedures define the interface to the parser-macros table abstraction. Each parser-macro table has a separate binding space for macros in the matcher and parser languages. However, the table inherits bindings from one specified table; it's not possible to inherit matcher-language bindings from one table and parser-language bindings from another.

procedure: make-parser-macros parent-table: Create and return a new parser-macro table that inherits from parent-table. Parent-table must be either a parser-macro table, or #f; usually it is specified as the value of global-parser-macros.

procedure: parser-macros? object: This is a predicate for parser-macro tables.

procedure: global-parser-macros: Return the global parser-macro table. This table is predefined and contains all of the bindings documented here.

There is a "current" table at all times, and macro definitions are always placed in this table. By default, the current table is the global macro table, but the following procedures allow this to be changed.

procedure: current-parser-macros: Return the current parser-macro table.

procedure: set-current-parser-macros! table: Change the current parser-macro table to table, which must satisfy parser-macros?.

procedure: with-current-parser-macros table thunk: Bind the current parser-macro table to table, call thunk with no arguments, then restore the original table binding. The value returned by thunk is returned as the value of this procedure. Table must satisfy parser-macros?, and thunk must be a procedure of no arguments.


14.12 XML Parser

MIT Scheme provides a simple non-validating XML parser. This parser is mostly conformant, with the exception that it doesn't support UTF-16. The parser also does not support external document type declarations (DTDs). The output of the parser is a record tree that closely reflects the structure of the XML document.

There is also an output mechanism that writes an XML record tree to a port. There is no guarantee that parsing an XML document and writing it back out will make a verbatim copy of the document. The output will be semantically identical but may have small syntactic differences. For example, comments are discarded by the parser, and entities are substituted during the parsing process.

The purpose of the XML support is to provide a mechanism for reading and writing simple XML documents. In the future this support may be further developed to support a standard interface such as DOM or SAX.

The XML support is a run-time-loadable option; to use it, execute

(load-option 'xml)

once before compiling any code that uses it.

The XML interface consists of an input procedure, an output procedure, and a set of record types.

procedure: parse-xml-document buffer: This procedure parses an XML input stream and returns a newly-allocated XML record tree. The buffer argument must be a parser buffer (see section 14.10 Parser Buffers). Most errors in the input stream are detected and signalled, with information identifying the location of the error where possible. Note that the input stream is assumed to be UTF-8.

procedure: write-xml xml-document port: This procedure writes an XML record tree to port. The xml-document argument must be a record of type xml-document, which is the root record of an XML record tree. The output is encoded in UTF-8.
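The sketch below parses a small document from a string and writes it back out. It assumes the xml option is loaded and that parse-xml-document is called with only a parser buffer. It also assumes that xml-document?, the predicate implied by the record conventions described below, is available. The exact serialized text may differ syntactically from the input, so only its content is checked.

```scheme
(load-option 'xml)

(define doc
  (parse-xml-document
   (string->parser-buffer "<greeting lang='en'>hello</greeting>")))

(xml-document? doc)                    ; => #t
(write-xml doc (current-output-port))  ; writes the tree back out (UTF-8)

;; Capturing the output in a string instead of printing it:
(define text
  (call-with-output-string
    (lambda (port) (write-xml doc port))))
```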

XML names are represented in memory as symbols. All symbols appearing within XML records are XML names. Because XML names are case sensitive, there is a procedure to intern these symbols:

procedure: xml-intern string

Returns the XML name called string. XML names are represented as symbols, but unlike ordinary Scheme symbols, they are case sensitive. The following is true for any two strings string1 and string2:

(let ((name1 (xml-intern string1))
      (name2 (xml-intern string2)))
  (if (string=? string1 string2)
      (eq? name1 name2)
      (not (eq? name1 name2))))

The output from the XML parser and the input to the XML output procedure is a complex data structure composed of a hierarchy of typed components. Each component is a record whose fields correspond to parts of the XML structure that the record represents. There are no special operations on these records; each is a tuple with named subparts. The root record type is xml-document, which represents a complete XML document.

Each record type has the following associated bindings, where type stands for the name of the record type:

type-rtd: is a variable bound to the record-type descriptor for type. The record-type descriptor may be used as a specializer in SOS method definitions, which greatly simplifies code to dispatch on these types.
type?: is a predicate for records of type type. It accepts one argument, which can be any object, and returns #t if the object is a record of this type, or #f otherwise.
make-type: is a constructor for records of type type. It accepts one argument for each field of type, in the same order that they are written in the type description, and returns a newly-allocated record of that type.
type-field: is an accessor procedure for the field field in records of type type. It accepts one argument, which must be a record of that type, and returns the contents of the corresponding field in the record.
set-type-field!: is a modifier procedure for the field field in records of type type. It accepts two arguments: the first must be a record of that type, and the second is a new value for the corresponding field. The record's field is modified to have the new value.
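Applying these conventions to the xml-element type (described below, with fields name, attributes, and contents) yields bindings like the ones used in this sketch. The names are derived from the conventions above rather than taken from a separate listing, and the record is built by hand instead of by parsing.

```scheme
(load-option 'xml)

;; make-xml-element: one argument per field, in declaration order
;; (name attributes contents).
(define item
  (make-xml-element (xml-intern "item")
                    (list (cons (xml-intern "id") "42"))
                    (list "draft")))

;; xml-element?: the type predicate.
(xml-element? item)

;; xml-element-name: the accessor for the name field.
(xml-element-name item)

;; set-xml-element-contents!: the modifier for the contents field.
(set-xml-element-contents! item (list "final"))
(xml-element-contents item)
```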

record type: xml-document declaration misc-1 dtd misc-2 root misc-3: The xml-document record is the top-level record representing a complete XML document. Declaration is either an xml-declaration object or #f. Dtd is either an xml-dtd object or #f. Root is an xml-element object. Misc-1, misc-2, and misc-3 are lists of miscellaneous items; a miscellaneous item is either an xml-processing-instructions object or a string of whitespace.

record type: xml-declaration version encoding standalone: The xml-declaration record represents the `<?xml ... ?>' declaration that optionally appears at the beginning of an XML document. Version is a version string, typically "1.0". Encoding is either an encoding string or #f. Standalone is either "yes", "no", or #f.
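Because the record trees are plain data, a document can also be assembled directly and handed to write-xml. The following sketch builds the equivalent of `<?xml version="1.0" encoding="UTF-8"?><note>Hi</note>'. The constructor names make-xml-document, make-xml-declaration, and make-xml-element are derived from the naming conventions above; the exact text that write-xml produces is not shown because it may differ syntactically from this form.

```scheme
(load-option 'xml)

;; Root element <note>Hi</note>, with no attributes.
(define root
  (make-xml-element (xml-intern "note") '() (list "Hi")))

;; xml-declaration fields: version encoding standalone.
(define declaration
  (make-xml-declaration "1.0" "UTF-8" #f))

;; xml-document fields: declaration misc-1 dtd misc-2 root misc-3.
(define doc
  (make-xml-document declaration
                     '()    ; misc-1: nothing before the DTD
                     #f     ; no document type declaration
                     '()    ; misc-2
                     root
                     '()))  ; misc-3: nothing after the root element

(write-xml doc (current-output-port))
```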

record type: xml-element name attributes contents: The xml-element record represents general XML elements; the bulk of a typical XML document consists of these elements. Name is the element name (a symbol). Attributes is a list of attributes; each attribute is a pair whose CAR is the attribute name (a symbol), and whose CDR is the attribute value (a string). Contents is a list of the contents of the element. Each item in this list is either a string, an xml-element record, an xml-processing-instructions record, or an xml-uninterpreted record.
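Since contents is an ordinary list and attributes is an association list keyed by XML names, common queries reduce to list processing. The two helpers below, element-text and element-attribute, are hypothetical names chosen for this sketch; they are not part of the XML option. Attribute lookup uses assq, which is appropriate because xml-intern returns the same symbol for equal strings.

```scheme
(load-option 'xml)

;; Hypothetical helper: concatenate all character data beneath ELEMENT,
;; depth first.  Processing instructions and uninterpreted records
;; contribute nothing.
(define (element-text element)
  (apply string-append
         (map (lambda (item)
                (cond ((string? item) item)
                      ((xml-element? item) (element-text item))
                      (else "")))
              (xml-element-contents element))))

;; Hypothetical helper: look up an attribute value by name, returning
;; #f if the attribute is absent.  Names are interned, so ASSQ works.
(define (element-attribute element name)
  (let ((entry (assq (xml-intern name) (xml-element-attributes element))))
    (and entry (cdr entry))))

;; <p lang="en">Hello, <b>world</b>!</p>
(define p
  (make-xml-element (xml-intern "p")
                    (list (cons (xml-intern "lang") "en"))
                    (list "Hello, "
                          (make-xml-element (xml-intern "b") '() (list "world"))
                          "!")))

(element-text p)               ; => "Hello, world!"
(element-attribute p "lang")   ; => "en"
```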

record type: xml-processing-instructions name text: The xml-processing-instructions record represents processing instructions, which have the form `<?name ... ?>'. These instructions are intended to contain non-XML data that will be processed by another interpreter; for example, they might contain PHP programs. The name field is the processor name (a symbol), and the text field is the body of the instructions (a string).

record type: xml-uninterpreted text: Some documents contain entity references that can't be expanded by the parser, perhaps because the document requires an external DTD. Such references are left uninterpreted in the output by wrapping them in xml-uninterpreted records. In some situations, for example when they are embedded in attribute values, the surrounding text is also included in the xml-uninterpreted record. The text field contains the uninterpreted XML text (a string).

record type: xml-dtd root external internal: The xml-dtd record represents a document type declaration. The root field is an XML name for the root element of the document. External is either an xml-external-id record or #f. Internal is a list of DTD element records, such as xml-!element and xml-!attlist records.

The remaining record types are valid only within a DTD.

record type: xml-!element name content-type

The xml-!element record represents an element-type declaration. Name is the XML name of the type being declared (a symbol). Content-type describes the type and can have several different values, as follows:

The XML names `EMPTY' and `ANY' correspond to the XML keywords of the same name.
A list `(MIX type ...)' corresponds to the `(#PCDATA | type | ...)' syntax.

record type: xml-!attlist name definitions

The xml-!attlist record represents an attribute-list declaration. Name is the XML name of the type for which attributes are being declared (a symbol). Definitions is a list of attribute definitions, each of which is a list of three elements (name type default). Name is an XML name for the name of the attribute (a symbol). Type describes the attribute type, and can have one of the following values:

The XML names `CDATA', `IDREFS', `IDREF', `ID', `ENTITY', `ENTITIES', `NMTOKENS', and `NMTOKEN' correspond to the XML keywords of the same names.
A list `(NOTATION name1 name2 ...)' corresponds to the `NOTATION (name1 | name2 ...)' syntax.
A list `(ENUMERATED name1 name2 ...)' corresponds to the `(name1 | name2 ...)' syntax.

Default describes the default value for the attribute, and can have one of the following values:

The XML names `#REQUIRED' and `#IMPLIED' correspond to the XML keywords of the same names.
A list `(#FIXED value)' corresponds to the `#FIXED "value"' syntax. Value is represented as a string, but might also be an xml-uninterpreted record.
A list `(DEFAULT value)' corresponds to the `"value"' syntax. Value is represented as a string, but might also be an xml-uninterpreted record.
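As an illustration of the definitions format, the sketch below extracts the names of all required attributes from an xml-!attlist record. It assumes that keywords such as `#REQUIRED' and `CDATA' are XML names obtained with xml-intern, in line with the statement that all symbols appearing within XML records are XML names. The helper required-attributes is hypothetical, and the accessor xml-!attlist-definitions and constructor make-xml-!attlist are derived from the naming conventions above.

```scheme
(load-option 'xml)

;; Hypothetical helper: the names of all #REQUIRED attributes declared by
;; an xml-!attlist record.  Each definition is a list (name type default).
(define (required-attributes attlist)
  (let loop ((definitions (xml-!attlist-definitions attlist))
             (names '()))
    (if (null? definitions)
        (reverse names)
        (let ((definition (car definitions)))
          (loop (cdr definitions)
                ;; A #FIXED or DEFAULT default is a list, so EQ? fails for it.
                (if (eq? (caddr definition) (xml-intern "#REQUIRED"))
                    (cons (car definition) names)
                    names))))))

;; A hand-built declaration corresponding to
;;   <!ATTLIST img src CDATA #REQUIRED alt CDATA #IMPLIED>
(define img-attlist
  (make-xml-!attlist
   (xml-intern "img")
   (list (list (xml-intern "src") (xml-intern "CDATA") (xml-intern "#REQUIRED"))
         (list (xml-intern "alt") (xml-intern "CDATA") (xml-intern "#IMPLIED")))))

;; Returns a one-element list holding the XML name src.
(required-attributes img-attlist)
```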

record type: xml-!entity name value: The xml-!entity record represents a general entity declaration. Name is an XML name for the entity. Value is the entity's value, either a string, an xml-uninterpreted record, or an xml-external-id record.

record type: xml-parameter-!entity name value: The xml-parameter-!entity record represents a parameter entity declaration. Name is an XML name for the entity. Value is the entity's value, either a string, an xml-uninterpreted record, or an xml-external-id record.

record type: xml-unparsed-!entity name id notation: The xml-unparsed-!entity record represents an unparsed entity declaration. Name is an XML name for the entity. Id is an xml-external-id record. Notation is an XML name for the notation.

record type: xml-!notation name id: The xml-!notation record represents a notation declaration. Name is an XML name for the notation. Id is an xml-external-id record.

record type: xml-external-id id uri: The xml-external-id record is a reference to an external DTD. This reference consists of two parts: id is a public ID literal, corresponding to the `PUBLIC' keyword, while uri is a system literal, corresponding to the `SYSTEM' keyword. Either or both may be present, depending on the context. Each is represented as a string.

This document was generated by Chris Hanson on June 17, 2002 using texi2html