6. Strings

A string is a mutable sequence of characters. In the current implementation of MIT Scheme, the elements of a string must all satisfy the predicate char-ascii?; if someone ports MIT Scheme to a non-ASCII operating system this requirement will change.

A string is written as a sequence of characters enclosed within double quotes " ". To include a double quote inside a string, precede the double quote with a backslash \ (escape it), as in

"The word \"recursion\" has many meanings."

The printed representation of this string is

The word "recursion" has many meanings.

To include a backslash inside a string, precede it with another backslash; for example,

"Use #\\Control-q to quit."

The printed representation of this string is

Use #\Control-q to quit.

The effect of a backslash that doesn't precede a double quote or backslash is unspecified in standard Scheme, but MIT Scheme specifies the effect for three other characters: \t, \n, and \f. These escape sequences are respectively translated into the following characters: #\tab, #\newline, and #\page. Finally, a backslash followed by exactly three octal digits is translated into the character whose ISO-8859-1 code is those digits.

If a string literal is continued from one line to another, the string will contain the newline character (#\newline) at the line break. Standard Scheme does not specify what appears in a string literal at a line break.
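
As a small illustration of the escape sequences and the line-break behavior described above, consider the following sketch. It assumes a release that reads string literals as this chapter documents; the variable name two-lines is purely illustrative.

```scheme
;; Each escape sequence denotes a single character in the string.
(string-length "a\tb")                        ; 3: a, tab, b
(char=? (string-ref "a\tb" 1) #\tab)          ; #t
(char=? (string-ref "a\nb" 1) #\newline)      ; #t
(string-length "say \"hi\"")                  ; 8: the backslashes are not stored

;; A literal continued across a line break contains a newline there.
(define two-lines "first
second")
(string-length two-lines)                     ; 12 = 5 + 1 + 6
(char=? (string-ref two-lines 5) #\newline)   ; #t
```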

The length of a string is the number of characters that it contains. This number is an exact non-negative integer that is established when the string is created (but see section 6.10 Variable-Length Strings). Each character in a string has an index, which is a number that indicates the character's position in the string. The index of the first (leftmost) character in a string is 0, and the index of the last character is one less than the length of the string. The valid indexes of a string are the exact non-negative integers less than the length of the string.

A number of the string procedures operate on substrings. A substring is a segment of a string, which is specified by two integers start and end satisfying these relationships:

0 <= start <= end <= (string-length string)

Start is the index of the first character in the substring, and end is one greater than the index of the last character in the substring. Thus if start and end are equal, they refer to an empty substring, and if start is zero and end is the length of string, they refer to all of string.
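
The relationship between start, end, and the string length can be checked with a short predicate. This is only a sketch: valid-substring-range? is a hypothetical helper written for illustration, not a procedure provided by MIT Scheme.

```scheme
;; valid-substring-range? is a hypothetical helper, not part of MIT Scheme.
;; It tests 0 <= start <= end <= (string-length string).
(define (valid-substring-range? string start end)
  (and (exact-nonnegative-integer? start)
       (exact-nonnegative-integer? end)
       (<= start end (string-length string))))

(valid-substring-range? "arduous" 2 5)   ; #t, denotes "duo"
(valid-substring-range? "arduous" 3 3)   ; #t, an empty substring
(valid-substring-range? "arduous" 0 7)   ; #t, all of the string
(valid-substring-range? "arduous" 2 8)   ; #f, end exceeds the length
(valid-substring-range? "arduous" 5 2)   ; #f, start exceeds end
```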

Some of the procedures that operate on strings ignore the difference between uppercase and lowercase. The versions that ignore case include `-ci' (for "case insensitive") in their names.

6.1 Construction of Strings

6.2 Selecting String Components

6.3 Comparison of Strings

6.4 Alphabetic Case in Strings

6.5 Cutting and Pasting Strings

6.6 Searching Strings

6.7 Matching Strings

6.8 Regular Expressions

6.9 Modification of Strings

6.10 Variable-Length Strings

6.11 Byte Vectors

6.1 Construction of Strings

procedure: make-string k [char]

Returns a newly allocated string of length k. If you specify char, all elements of the string are initialized to char, otherwise the contents of the string are unspecified. Char must satisfy the predicate char-ascii?.

(make-string 10 #\x) => "xxxxxxxxxx"

procedure: string char ...

Returns a newly allocated string consisting of the specified characters. The arguments must all satisfy char-ascii?.

(string #\a) => "a"
(string #\a #\b #\c) => "abc"
(string #\a #\space #\b #\space #\c) => "a b c"
(string) => ""

procedure: list->string char-list

Char-list must be a list of ISO-8859-1 characters. list->string returns a newly allocated string formed from the elements of char-list. This is equivalent to (apply string char-list). The inverse of this operation is string->list.

(list->string '(#\a #\b)) => "ab"
(string->list "Hello") => (#\H #\e #\l #\l #\o)

procedure: string-copy string

Returns a newly allocated copy of string.

Note regarding variable-length strings: the maximum length of the result depends only on the length of string, not its maximum length. If you wish to copy a string and preserve its maximum length, do the following:

(define (string-copy-preserving-max-length string)
  (let ((length))
    (dynamic-wind
     (lambda ()
       (set! length (string-length string))
       (set-string-length! string (string-maximum-length string)))
     (lambda ()
       (string-copy string))
     (lambda ()
       (set-string-length! string length)))))

6.2 Selecting String Components

procedure: string? object

Returns #t if object is a string; otherwise returns #f.

(string? "Hi") => #t
(string? 'Hi) => #f

procedure: string-length string

Returns the length of string as an exact non-negative integer.

(string-length "") => 0
(string-length "The length") => 10

procedure: string-null? string

Returns #t if string has zero length; otherwise returns #f.

(string-null? "") => #t
(string-null? "Hi") => #f

procedure: string-ref string k

Returns character k of string. K must be a valid index of string.

(string-ref "Hello" 1) => #\e
(string-ref "Hello" 5) error--> 5 not in correct range

procedure: string-set! string k char

Stores char in element k of string and returns an unspecified value. K must be a valid index of string, and char must satisfy the predicate char-ascii?.

(define str "Dog") => unspecified
(string-set! str 0 #\L) => unspecified
str => "Log"
(string-set! str 3 #\t) error--> 3 not in correct range

6.3 Comparison of Strings

procedure: string=? string1 string2

procedure: substring=? string1 start1 end1 string2 start2 end2

procedure: string-ci=? string1 string2

procedure: substring-ci=? string1 start1 end1 string2 start2 end2

Returns #t if the two strings (substrings) are the same length and contain the same characters in the same (relative) positions; otherwise returns #f. string-ci=? and substring-ci=? don't distinguish uppercase and lowercase letters, but string=? and substring=? do.

(string=? "PIE" "PIE") => #t
(string=? "PIE" "pie") => #f
(string-ci=? "PIE" "pie") => #t
(substring=? "Alamo" 1 3 "cola" 2 4) => #t ; compares "la"

procedure: string<? string1 string2

procedure: substring<? string1 start1 end1 string2 start2 end2

procedure: string>? string1 string2

procedure: string<=? string1 string2

procedure: string>=? string1 string2

procedure: string-ci<? string1 string2

procedure: substring-ci<? string1 start1 end1 string2 start2 end2

procedure: string-ci>? string1 string2

procedure: string-ci<=? string1 string2

procedure: string-ci>=? string1 string2

These procedures compare strings (substrings) according to the order of the characters they contain (also see section 5.2 Comparison of Characters). The arguments are compared using a lexicographic (or dictionary) order. If two strings differ in length but are the same up to the length of the shorter string, the shorter string is considered to be less than the longer string.

(string<? "cat" "dog") => #t
(string<? "cat" "DOG") => #f
(string-ci<? "cat" "DOG") => #t
(string>? "catkin" "cat") => #t ; shorter is lesser

procedure: string-compare string1 string2 if-eq if-lt if-gt

procedure: string-compare-ci string1 string2 if-eq if-lt if-gt

If-eq, if-lt, and if-gt are procedures of no arguments (thunks). The two strings are compared: if they are equal, if-eq is applied; if string1 is less than string2, if-lt is applied; otherwise (string1 is greater than string2) if-gt is applied. The value of the procedure is the value of the thunk that is applied.

string-compare distinguishes uppercase and lowercase letters; string-compare-ci does not.

(define (cheer) (display "Hooray!"))
(define (boo) (display "Boo-hiss!"))
(string-compare "a" "a" cheer (lambda () 'ignore) boo) -| Hooray! => unspecified
(string-compare "a" "b" cheer (lambda () 'ignore) boo) => ignore
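
The three-way dispatch can be sketched using only string=? and string<?. The name three-way-compare is hypothetical and chosen for illustration; this is not how MIT Scheme necessarily implements string-compare.

```scheme
;; three-way-compare is a hypothetical name. It mirrors the dispatch
;; described for string-compare: exactly one thunk is applied, and its
;; value is returned.
(define (three-way-compare string1 string2 if-eq if-lt if-gt)
  (cond ((string=? string1 string2) (if-eq))
        ((string<? string1 string2) (if-lt))
        (else (if-gt))))

(define (classify s1 s2)
  (three-way-compare s1 s2
                     (lambda () 'equal)
                     (lambda () 'less)
                     (lambda () 'greater)))

(classify "a" "a")   ; equal
(classify "a" "b")   ; less
(classify "b" "a")   ; greater
```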

procedure: string-hash string

procedure: string-hash-mod string k

string-hash returns an exact non-negative integer that can be used for storing the specified string in a hash table. Equal strings (in the sense of string=?) return equal (=) hash codes, and non-equal but similar strings are usually mapped to distinct hash codes.

string-hash-mod is like string-hash, except that it limits the result to a particular range based on the exact non-negative integer k. The following are equivalent:

(string-hash-mod string k)
(modulo (string-hash string) k)
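
As a sketch of how such a hash code might be used, the following picks a bucket for a key in a table with k buckets. The helper bucket-index is hypothetical; it relies only on string-hash and the equivalence stated above.

```scheme
;; bucket-index is a hypothetical helper for a table with k buckets.
(define (bucket-index key k)
  (modulo (string-hash key) k))

(define idx (bucket-index "apple" 16))

;; The index always falls in the range 0 <= idx < k.
(and (<= 0 idx) (< idx 16))                             ; #t

;; Strings that are string=? land in the same bucket.
(= idx (bucket-index (string-append "app" "le") 16))    ; #t
```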

6.4 Alphabetic Case in Strings

procedure: string-capitalized? string

procedure: substring-capitalized? string start end

These procedures return #t if the first word in the string (substring) is capitalized, and any subsequent words are either lower case or capitalized. Otherwise, they return #f. A word is defined as a non-null contiguous sequence of alphabetic characters, delimited by non-alphabetic characters or the limits of the string (substring). A word is capitalized if its first letter is upper case and all its remaining letters are lower case.

(map string-capitalized? '("" "A" "art" "Art" "ART")) => (#f #t #f #t #f)

procedure: string-upper-case? string

procedure: substring-upper-case? string start end

procedure: string-lower-case? string

procedure: substring-lower-case? string start end

These procedures return #t if all the letters in the string (substring) are of the correct case; otherwise they return #f. The string (substring) must contain at least one letter or the procedures return #f.

(map string-upper-case? '("" "A" "art" "Art" "ART")) => (#f #t #f #f #t)

procedure: string-capitalize string

procedure: string-capitalize! string

procedure: substring-capitalize! string start end

string-capitalize returns a newly allocated copy of string in which the first alphabetic character is uppercase and the remaining alphabetic characters are lowercase. For example, "abcDEF" becomes "Abcdef". string-capitalize! is the destructive version of string-capitalize: it alters string and returns an unspecified value. substring-capitalize! destructively capitalizes the specified part of string.

procedure: string-downcase string

procedure: string-downcase! string

procedure: substring-downcase! string start end

string-downcase returns a newly allocated copy of string in which all uppercase letters are changed to lowercase. string-downcase! is the destructive version of string-downcase: it alters string and returns an unspecified value. substring-downcase! destructively changes the case of the specified part of string.

(define str "ABCDEFG") => unspecified
(substring-downcase! str 3 5) => unspecified
str => "ABCdeFG"

procedure: string-upcase string

procedure: string-upcase! string

procedure: substring-upcase! string start end

string-upcase returns a newly allocated copy of string in which all lowercase letters are changed to uppercase. string-upcase! is the destructive version of string-upcase: it alters string and returns an unspecified value. substring-upcase! destructively changes the case of the specified part of string.
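
A brief sketch of the non-destructive case conversions follows. The variable names original and shouted are illustrative only; the point is that the argument string is left untouched.

```scheme
(define original "abcDEF")
(define shouted (string-upcase original))

shouted                      ; "ABCDEF"
(string-downcase original)   ; "abcdef"
original                     ; "abcDEF", the argument is not modified
(eq? original shouted)       ; #f, the result is a different object
```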

6.5 Cutting and Pasting Strings

procedure: string-append string ...

Returns a newly allocated string made from the concatenation of the given strings. With no arguments, string-append returns the empty string ("").

(string-append) => ""
(string-append "*" "ace" "*") => "*ace*"
(string-append "" "" "") => ""
(eq? str (string-append str)) => #f ; newly allocated

procedure: substring string start end

Returns a newly allocated string formed from the characters of string beginning with index start (inclusive) and ending with end (exclusive).

(substring "" 0 0) => ""
(substring "arduous" 2 5) => "duo"
(substring "arduous" 2 8) error--> 8 not in correct range

(define (string-copy s)
  (substring s 0 (string-length s)))

procedure: string-head string end

Returns a newly allocated copy of the initial substring of string, up to but excluding end. It could have been defined by:

(define (string-head string end) (substring string 0 end))

procedure: string-tail string start

Returns a newly allocated copy of the final substring of string, starting at index start and going to the end of string. It could have been defined by:

(define (string-tail string start)
  (substring string start (string-length string)))

(string-tail "uncommon" 2) => "common"

procedure: string-pad-left string k [char]

procedure: string-pad-right string k [char]

These procedures return a newly allocated string created by padding string out to length k, using char. If char is not given, it defaults to #\space. If k is less than the length of string, the resulting string is a truncated form of string. string-pad-left adds padding characters or truncates from the beginning of the string (lowest indices), while string-pad-right does so at the end of the string (highest indices).

(string-pad-left "hello" 4) => "ello"
(string-pad-left "hello" 8) => "   hello"
(string-pad-left "hello" 8 #\*) => "***hello"
(string-pad-right "hello" 4) => "hell"
(string-pad-right "hello" 8) => "hello   "
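
One common use of padding is aligning numbers in fixed-width columns. The sketch below uses a hypothetical helper, right-align, and also shows that string-pad-left truncates from the left when the field is too narrow.

```scheme
;; right-align is a hypothetical helper that formats n in a field
;; exactly width characters wide.
(define (right-align n width)
  (string-pad-left (number->string n) width))

(right-align 42 5)              ; "   42"
(right-align 1234567 5)         ; "34567", leading digits are dropped
(string-pad-right "id" 5 #\.)   ; "id..."
```

Because of the truncation, a caller that must not lose digits would need to check the length of the number's printed form before padding.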

procedure: string-trim string [char-set]

procedure: string-trim-left string [char-set]

procedure: string-trim-right string [char-set]

Returns a newly allocated string created by removing all characters that are not in char-set from: (string-trim) both ends of string; (string-trim-left) the beginning of string; or (string-trim-right) the end of string. Char-set defaults to char-set:not-whitespace.

(string-trim " in the end ") => "in the end"
(string-trim " ") => ""
(string-trim "100th" char-set:numeric) => "100"
(string-trim-left "-.-+-=-" (char-set #\+)) => "+-=-"
(string-trim "but (+ x y) is" (char-set #\( #\))) => "(+ x y)"

6.6 Searching Strings

The first few procedures in this section perform string search, in which a given string (the text) is searched to see if it contains another given string (the pattern) as a substring. At present these procedures are implemented using a hybrid strategy. For patterns shorter than 4 characters, the naive string-search algorithm is used. For longer patterns, the Boyer-Moore string-search algorithm is used.
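
For reference, the naive strategy can be sketched as follows: try each starting position in the text from left to right and compare the pattern character by character. The procedure name naive-search-forward is hypothetical, and this is an illustration of the technique, not MIT Scheme's actual implementation.

```scheme
;; naive-search-forward is a hypothetical name. It returns the index of
;; the leftmost occurrence of pattern in text, or #f if there is none.
(define (naive-search-forward pattern text)
  (let ((plen (string-length pattern))
        (tlen (string-length text)))
    (let try ((start 0))
      (cond ((> (+ start plen) tlen) #f)       ; pattern no longer fits
            ((let loop ((i 0))                 ; compare at this position
               (or (= i plen)
                   (and (char=? (string-ref pattern i)
                                (string-ref text (+ start i)))
                        (loop (+ i 1)))))
             start)
            (else (try (+ start 1)))))))

(naive-search-forward "rat" "pirate")    ; 2
(naive-search-forward "rat" "outrage")   ; #f
(naive-search-forward "" "pirate")       ; 0
```

The worst-case cost grows with the product of the two lengths, which is why it is reasonable to reserve this approach for very short patterns.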

procedure: string-search-forward pattern string

procedure: substring-search-forward pattern string start end

Pattern must be a string. Searches string for the leftmost occurrence of the substring pattern. If successful, the index of the first character of the matched substring is returned; otherwise, #f is returned.

substring-search-forward limits its search to the specified substring of string; string-search-forward searches all of string.

(string-search-forward "rat" "pirate") => 2
(string-search-forward "rat" "pirate rating") => 2
(substring-search-forward "rat" "pirate rating" 4 13) => 7
(substring-search-forward "rat" "pirate rating" 9 13) => #f

procedure: string-search-backward pattern string

procedure: substring-search-backward pattern string start end

Pattern must be a string. Searches string for the rightmost occurrence of the substring pattern. If successful, the index to the right of the last character of the matched substring is returned; otherwise, #f is returned.

substring-search-backward limits its search to the specified substring of string; string-search-backward searches all of string.

(string-search-backward "rat" "pirate") => 5
(string-search-backward "rat" "pirate rating") => 10
(substring-search-backward "rat" "pirate rating" 1 8) => 5
(substring-search-backward "rat" "pirate rating" 9 13) => #f

procedure: string-search-all pattern string

procedure: substring-search-all pattern string start end

Pattern must be a string. Searches string to find all occurrences of the substring pattern. Returns a list of the occurrences; each element of the list is an index pointing to the first character of an occurrence.

substring-search-all limits its search to the specified substring of string; string-search-all searches all of string.

(string-search-all "rat" "pirate") => (2)
(string-search-all "rat" "pirate rating") => (2 7)
(substring-search-all "rat" "pirate rating" 4 13) => (7)
(substring-search-all "rat" "pirate rating" 9 13) => ()

procedure: substring? pattern string

Pattern must be a string. Searches string to see if it contains the substring pattern. Returns #t if pattern is a substring of string, otherwise returns #f.

(substring? "rat" "pirate") => #t
(substring? "rat" "outrage") => #f
(substring? "" any-string) => #t
(if (substring? "moon" text)
    (process-lunar text)
    'no-moon)

procedure: string-find-next-char string char

procedure: substring-find-next-char string start end char

procedure: string-find-next-char-ci string char

procedure: substring-find-next-char-ci string start end char

Returns the index of the first occurrence of char in the string (substring); returns #f if char does not appear in the string. For the substring procedures, the index returned is relative to the entire string, not just the substring. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-find-next-char "Adam" #\A) => 0
(substring-find-next-char "Adam" 1 4 #\A) => #f
(substring-find-next-char-ci "Adam" 1 4 #\A) => 2

procedure: string-find-next-char-in-set string char-set

procedure: substring-find-next-char-in-set string start end char-set

Returns the index of the first character in the string (or substring) that is also in char-set, or returns #f if none of the characters in char-set occur in string. For the substring procedure, only the substring is searched, but the index returned is relative to the entire string, not just the substring.

(string-find-next-char-in-set my-string char-set:alphabetic)
     => start position of the first word in my-string
; Can be used as a predicate:
(if (string-find-next-char-in-set my-string (char-set #\( #\)))
    'contains-parentheses
    'no-parentheses)

procedure: string-find-previous-char string char

procedure: substring-find-previous-char string start end char

procedure: string-find-previous-char-ci string char

procedure: substring-find-previous-char-ci string start end char

Returns the index of the last occurrence of char in the string (substring); returns #f if char doesn't appear in the string. For the substring procedures, the index returned is relative to the entire string, not just the substring. The -ci procedures don't distinguish uppercase and lowercase letters.

procedure: string-find-previous-char-in-set string char-set

procedure: substring-find-previous-char-in-set string start end char-set

Returns the index of the last character in the string (substring) that is also in char-set. For the substring procedure, the index returned is relative to the entire string, not just the substring.
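
The backward-searching procedures can be exercised as in the following sketch, which assumes they behave as documented above. The helper file-extension is hypothetical and is included only to show a typical use of searching from the right.

```scheme
(string-find-previous-char "banana" #\a)                        ; 5
(substring-find-previous-char "banana" 0 5 #\a)                 ; 3
(string-find-previous-char "banana" #\z)                        ; #f
(string-find-previous-char-in-set "x = 10;" char-set:numeric)   ; 5

;; file-extension is a hypothetical helper: it returns the text after
;; the last period in name, or #f if name contains no period.
(define (file-extension name)
  (let ((dot (string-find-previous-char name #\.)))
    (and dot
         (string-tail name (+ dot 1)))))

(file-extension "report.tar.gz")   ; "gz"
(file-extension "README")          ; #f
```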

6.7 Matching Strings

procedure: string-match-forward string1 string2

procedure: substring-match-forward string1 start1 end1 string2 start2 end2

procedure: string-match-forward-ci string1 string2

procedure: substring-match-forward-ci string1 start1 end1 string2 start2 end2

Compares the two strings (substrings), starting from the beginning, and returns the number of characters that are the same. If the two strings (substrings) start differently, returns 0. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-match-forward "mirror" "micro") => 2 ; matches "mi"
(string-match-forward "a" "b") => 0 ; no match
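
The forward match amounts to measuring the length of the common prefix of the two strings. A minimal sketch of that computation, using a hypothetical procedure name and only string-ref and char=?, might look like this:

```scheme
;; common-prefix-length is a hypothetical name. It counts how many
;; leading characters s1 and s2 have in common (case-sensitively).
(define (common-prefix-length s1 s2)
  (let ((limit (min (string-length s1) (string-length s2))))
    (let loop ((i 0))
      (if (and (< i limit)
               (char=? (string-ref s1 i) (string-ref s2 i)))
          (loop (+ i 1))
          i))))

(common-prefix-length "mirror" "micro")   ; 2
(common-prefix-length "a" "b")            ; 0
(common-prefix-length "abc" "abcdef")     ; 3
```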

procedure: string-match-backward string1 string2

procedure: substring-match-backward string1 start1 end1 string2 start2 end2

procedure: string-match-backward-ci string1 string2

procedure: substring-match-backward-ci string1 start1 end1 string2 start2 end2

Compares the two strings (substrings), starting from the end and matching toward the front, returning the number of characters that are the same. If the two strings (substrings) end differently, returns 0. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-match-backward-ci "BULBOUS" "fractious") => 3 ; matches "ous"

procedure: string-prefix? string1 string2

procedure: substring-prefix? string1 start1 end1 string2 start2 end2

procedure: string-prefix-ci? string1 string2

procedure: substring-prefix-ci? string1 start1 end1 string2 start2 end2

These procedures return #t if the first string (substring) forms the prefix of the second; otherwise they return #f. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-prefix? "abc" "abcdef") => #t
(string-prefix? "" any-string) => #t

procedure: string-suffix? string1 string2

procedure: substring-suffix? string1 start1 end1 string2 start2 end2

procedure: string-suffix-ci? string1 string2

procedure: substring-suffix-ci? string1 start1 end1 string2 start2 end2

These procedures return #t if the first string (substring) forms the suffix of the second; otherwise they return #f. The -ci procedures don't distinguish uppercase and lowercase letters.

(string-suffix? "ous" "bulbous") => #t
(string-suffix? "" any-string) => #t

6.8 Regular Expressions

MIT Scheme provides support for using regular expressions to search and match strings. This manual does not define regular expressions; instead see section `Syntax of Regular Expressions' in The Emacs Editor.

In addition to providing standard regular-expression support, MIT Scheme also provides the REXP abstraction. This is an alternative way to write regular expressions that is easier to read and understand than the standard notation. Regular expressions written in this notation can be translated into the standard notation.

The regular-expression support is a run-time-loadable option. To use it, execute

(load-option 'regular-expression)

once before calling any of the procedures defined here.

6.8.1 Regular-expression procedures

6.8.2 REXP abstraction

6.8.1 Regular-expression procedures

Procedures that perform regular-expression match and search accept standardized arguments. Regexp is the regular expression; it is a string. String is the string being matched or searched. Procedures that operate on substrings also accept start and end index arguments with the usual meaning. The optional argument case-fold? says whether the match/search is case-sensitive; if case-fold? is #f, it is case-sensitive, otherwise it is case-insensitive. The optional argument syntax-table is a character syntax table that defines the character syntax, such as which characters are legal word constituents. This feature is primarily for Edwin, so character syntax tables will not be documented here. Supplying #f for (or omitting) syntax-table will select the default character syntax, equivalent to Edwin's fundamental mode.

procedure: re-string-match regexp string [case-fold? [syntax-table]]

procedure: re-substring-match regexp string start end [case-fold? [syntax-table]]

These procedures match regexp against the respective string or substring, returning #f for no match, or a set of match registers (see below) if the match succeeds. Here is an example showing how to extract the matched substring:

(let ((r (re-substring-match regexp string start end)))
  (and r
       (substring string start (re-match-end-index 0 r))))

procedure: re-string-search-forward regexp string [case-fold? [syntax-table]]

procedure: re-substring-search-forward regexp string start end [case-fold? [syntax-table]]

Searches string for the leftmost substring matching regexp. Returns a set of match registers (see below) if the search is successful, or #f if it is unsuccessful.

re-substring-search-forward limits its search to the specified substring of string; re-string-search-forward searches all of string.

procedure: re-string-search-backward regexp string [case-fold? [syntax-table]]

procedure: re-substring-search-backward regexp string start end [case-fold? [syntax-table]]

Searches string for the rightmost substring matching regexp. Returns a set of match registers (see below) if the search is successful, or #f if it is unsuccessful.

re-substring-search-backward limits its search to the specified substring of string; re-string-search-backward searches all of string.

When a successful match or search occurs, the above procedures return a set of match registers. The match registers are a set of index registers that record indexes into the matched string. Each index register corresponds to an instance of the regular-expression grouping operator `\(', and records the start index (inclusive) and end index (exclusive) of the matched group. These registers are numbered from 1 to 9, corresponding left-to-right to the grouping operators in the expression. Additionally, register 0 corresponds to the entire substring matching the regular expression.

procedure: re-match-start-index n registers

procedure: re-match-end-index n registers

N must be an exact integer between 0 and 9 inclusive. Registers must be a match-registers object as returned by one of the regular-expression match or search procedures above. re-match-start-index returns the start index of the corresponding regular-expression register, and re-match-end-index returns the corresponding end index.

procedure: re-match-extract string registers n

Registers must be a match-registers object as returned by one of the regular-expression match or search procedures above. String must be the string that was passed as an argument to the procedure that returned registers. N must be an exact integer between 0 and 9 inclusive. If the matched regular expression contained m grouping operators, then the value of this procedure is undefined for n strictly greater than m.

This procedure extracts the substring corresponding to the match register specified by registers and n. This is equivalent to the following expression:

(substring string (re-match-start-index n registers) (re-match-end-index n registers))
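
The following sketch puts the search procedures and match registers together. It assumes the procedures behave as documented in this section and that the regular-expression option loads successfully; the variable names text, regs, and groups are illustrative.

```scheme
(load-option 'regular-expression)

(define text "order 66 shipped")

;; Register 0 records the whole match.
(define regs (re-string-search-forward "[0-9]+" text))
(re-match-start-index 0 regs)    ; 6
(re-match-end-index 0 regs)      ; 8
(re-match-extract text regs 0)   ; "66"

;; Registers 1 and 2 record the two \( ... \) groups, left to right.
(define groups
  (re-string-match "\\([a-z]+\\) \\([0-9]+\\)" text))
(re-match-extract text groups 1)   ; "order"
(re-match-extract text groups 2)   ; "66"
```

Note that each backslash of the regular expression is doubled, because the pattern is written as a Scheme string literal.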

procedure: regexp-group alternative ...

Each alternative must be a regular expression. The returned value is a new regular expression that consists of the alternatives combined by a grouping operator. For example:

(regexp-group "foo" "bar" "baz") => "\\(foo\\|bar\\|baz\\)"

6.8.2 REXP abstraction

The REXP abstraction is a set of combinators that are composed into a complete regular expression. Each combinator directly corresponds to a particular piece of regular-expression notation. For example, the expression (rexp-any-char) corresponds to the . character in standard regular-expression notation, while (rexp* rexp) corresponds to the * character.

The primary advantages of REXP are that it makes the nesting structure of regular expressions explicit, and that it simplifies the description of complex regular expressions by allowing them to be built up using straightforward combinators.

procedure: rexp? object

Returns #t if object is a REXP expression, or #f otherwise. A REXP is one of: a string, which represents the pattern matching that string; a character set, which represents the pattern matching a character in that set; or an object returned by calling one of the procedures defined here.

procedure: rexp->regexp rexp

Converts rexp to standard regular-expression notation, returning a newly-allocated string.

procedure: rexp-compile rexp

Converts rexp to standard regular-expression notation, then compiles it and returns the compiled result. Equivalent to

(re-compile-pattern (rexp->regexp rexp) #f)

procedure: rexp-any-char

Returns a REXP that matches any single character except a newline. This is equivalent to the . construct.

procedure: rexp-line-start

Returns a REXP that matches the start of a line. This is equivalent to the ^ construct.

procedure: rexp-line-end

Returns a REXP that matches the end of a line. This is equivalent to the $ construct.

procedure: rexp-string-start

Returns a REXP that matches the start of the text being matched. This is equivalent to the \` construct.

procedure: rexp-string-end

Returns a REXP that matches the end of the text being matched. This is equivalent to the \' construct.

procedure: rexp-word-edge

Returns a REXP that matches the start or end of a word. This is equivalent to the \b construct.

procedure: rexp-not-word-edge

Returns a REXP that matches anywhere that is not the start or end of a word. This is equivalent to the \B construct.

procedure: rexp-word-start

Returns a REXP that matches the start of a word. This is equivalent to the \< construct.

procedure: rexp-word-end

Returns a REXP that matches the end of a word. This is equivalent to the \> construct.

procedure: rexp-word-char

Returns a REXP that matches any word-constituent character. This is equivalent to the \w construct.

procedure: rexp-not-word-char

Returns a REXP that matches any character that isn't a word constituent. This is equivalent to the \W construct.

The next two procedures accept a syntax-type argument specifying the syntax class to be matched against. This argument is a symbol selected from the following list. Each symbol is followed by the equivalent character used in standard regular-expression notation. whitespace (space character), punctuation (.), word (w), symbol (_), open ((), close ()), quote ('), string-delimiter ("), math-delimiter ($), escape (\), char-quote (/), comment-start (<), comment-end (>).

procedure: rexp-syntax-char syntax-type

Returns a REXP that matches any character of type syntax-type. This is equivalent to the \s construct.

procedure: rexp-not-syntax-char syntax-type

Returns a REXP that matches any character not of type syntax-type. This is equivalent to the \S construct.

procedure: rexp-sequence rexp ...

Returns a REXP that matches each rexp argument in sequence. If no rexp argument is supplied, the result matches the null string. This is equivalent to concatenating the regular expressions corresponding to each rexp argument.

procedure: rexp-alternatives rexp ...

Returns a REXP that matches any of the rexp arguments. This is equivalent to concatenating the regular expressions corresponding to each rexp argument, separating them by the \| construct.

procedure: rexp-group rexp ...

rexp-group is like rexp-sequence, except that the result is marked as a match group. This is equivalent to the \( ... \) construct.

The next three procedures in principle accept a single REXP argument. For convenience, they accept multiple arguments, which are converted into a single argument by rexp-group. Note, however, that if only one REXP argument is supplied, and it's very simple, no grouping occurs.

procedure: rexp* rexp ...

Returns a REXP that matches zero or more instances of the pattern matched by the rexp arguments. This is equivalent to the * construct.

procedure: rexp+ rexp ...

Returns a REXP that matches one or more instances of the pattern matched by the rexp arguments. This is equivalent to the + construct.

procedure: rexp-optional rexp ...

Returns a REXP that matches zero or one instances of the pattern matched by the rexp arguments. This is equivalent to the ? construct.

procedure: rexp-case-fold rexp

Returns a REXP that matches the same pattern as rexp, but is insensitive to character case. This has no equivalent in standard regular-expression notation.
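
As a sketch of how the combinators compose, the following builds a pattern for a line that consists of an optional minus sign followed by one or more digits. It assumes the procedures behave as documented in this section; the names integer-line and pattern are illustrative, and the exact string produced by rexp->regexp is deliberately not shown because it depends on the implementation.

```scheme
(load-option 'regular-expression)

;; Strings and character sets are themselves REXPs, so they can be
;; passed directly to the combinators.
(define integer-line
  (rexp-sequence (rexp-line-start)
                 (rexp-optional "-")
                 (rexp+ char-set:numeric)
                 (rexp-line-end)))

(rexp? integer-line)              ; #t

;; Translate to standard notation, then use it like any other regexp.
(define pattern (rexp->regexp integer-line))
(string? pattern)                 ; #t
(re-string-match pattern "-42")   ; match registers if the match succeeds
```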

6.9 Modification of Strings

procedure: string-replace string char1 char2

procedure: substring-replace string start end char1 char2

procedure: string-replace! string char1 char2

procedure: substring-replace! string start end char1 char2

These procedures replace all occurrences of char1 with char2 in the original string (substring). string-replace and substring-replace return a newly allocated string containing the result. string-replace! and substring-replace! destructively modify string and return an unspecified value.

(define str "a few words")              =>  unspecified
(string-replace str #\space #\-)        =>  "a-few-words"
(substring-replace str 2 9 #\space #\-) =>  "a few-words"
str                                     =>  "a few words"
(string-replace! str #\space #\-)       =>  unspecified
str                                     =>  "a-few-words"

procedure: string-fill! string char: Stores char in every element of string and returns an unspecified value.

procedure: substring-fill! string start end char

Stores char in elements start (inclusive) to end (exclusive) of string and returns an unspecified value.

(define s (make-string 10 #\space))     =>  unspecified
(substring-fill! s 2 8 #\*)             =>  unspecified
s                                       =>  "  ******  "

procedure: substring-move-left! string1 start1 end1 string2 start2

procedure: substring-move-right! string1 start1 end1 string2 start2

Copies the characters of string1 from start1 (inclusive) to end1 (exclusive) into string2, starting at index start2. The characters are copied as follows (note that the order matters only when string1 and string2 are eqv? and the source and destination ranges overlap):

substring-move-left!: The copy starts at the left end and moves toward the right (from smaller indices to larger). Thus if string1 and string2 are the same, this procedure moves the characters toward the left inside the string.
substring-move-right!: The copy starts at the right end and moves toward the left (from larger indices to smaller). Thus if string1 and string2 are the same, this procedure moves the characters toward the right inside the string.

The following example shows how these procedures can be used to build up a string (it would have been easier to use string-append):

(define answer (make-string 9 #\*))         =>  unspecified
answer                                      =>  "*********"
(substring-move-left! "start" 0 5 answer 0) =>  unspecified
answer                                      =>  "start****"
(substring-move-left! "-end" 0 4 answer 5)  =>  unspecified
answer                                      =>  "start-end"
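The copy direction matters only when the source and destination ranges overlap within a single string. The sketch below, using made-up variable names, shifts five characters three positions to the right in one string and three positions to the left in another. string-copy is used so that no string literal is modified.

```scheme
;; Shift "abcde" three places to the right within the same string.
;; Copying from the right end keeps unread source characters intact.
(define shifted-right (string-copy "abcdefgh"))
(substring-move-right! shifted-right 0 5 shifted-right 3)
shifted-right                           ; => "abcabcde"

;; Shift "defgh" three places to the left within the same string.
;; Copying from the left end is the safe order in this direction.
(define shifted-left (string-copy "abcdefgh"))
(substring-move-left! shifted-left 3 8 shifted-left 0)
shifted-left                            ; => "defghfgh"
```

Using the opposite procedure in either case would overwrite source characters before they had been read.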

procedure: reverse-string string

procedure: reverse-substring string start end

procedure: reverse-string! string

procedure: reverse-substring! string start end

Reverses the order of the characters in the given string or substring. reverse-string and reverse-substring return newly allocated strings; reverse-string! and reverse-substring! modify their argument strings and return an unspecified value.

(reverse-string "foo bar baz")          =>  "zab rab oof"
(reverse-substring "foo bar baz" 4 7)   =>  "rab"
(let ((foo "foo bar baz"))
  (reverse-string! foo)
  foo)                                  =>  "zab rab oof"
(let ((foo "foo bar baz"))
  (reverse-substring! foo 4 7)
  foo)                                  =>  "foo rab baz"

6.10 Variable-Length Strings

MIT Scheme allows the length of a string to be dynamically adjusted in a limited way. When a new string is allocated, by whatever method, it has a specific length. At the time of allocation, it is also given a maximum length, which is guaranteed to be at least as large as the string's length. (Sometimes the maximum length will be slightly larger than the length, but it is a bad idea to count on this. Programs should assume that the maximum length is the same as the length at the time of the string's allocation.) After the string is allocated, the operation set-string-length! can be used to alter the string's length to any value between 0 and the string's maximum length, inclusive.

procedure: string-maximum-length string

Returns the maximum length of string. The following is guaranteed:

(<= (string-length string) (string-maximum-length string)) => #t

The maximum length of a string never changes.

procedure: set-string-length! string k: Alters the length of string to be k, and returns an unspecified value. K must be an exact non-negative integer less than or equal to the maximum length of string. set-string-length! does not change the maximum length of string.
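A minimal sketch of shrinking a string and restoring it to its maximum length follows. The names buffer and limit are made up for the example, and no assumption is made about the contents of the characters that reappear when the string grows again.

```scheme
(define buffer (make-string 10 #\a))
(define limit (string-maximum-length buffer))   ; at least 10

;; Shrink the string; the maximum length is unaffected.
(set-string-length! buffer 3)
(string-length buffer)                          ; => 3
buffer                                          ; => "aaa"
(string-maximum-length buffer)                  ; same value as limit

;; Growing back up to the maximum length is allowed.
(set-string-length! buffer limit)
(string-length buffer)                          ; same value as limit
```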

6.11 Byte Vectors

MIT Scheme implements strings as packed vectors of 8-bit ISO-8859-1 bytes. Most of the string operations, such as string-ref, coerce these 8-bit codes into character objects. However, some lower-level operations are made available for use.

procedure: vector-8b-ref string k

Returns character k of string as an ISO-8859-1 code. K must be a valid index of string.

(vector-8b-ref "abcde" 2) => 99 ;c

procedure: vector-8b-set! string k code: Stores code in element k of string and returns an unspecified value. K must be a valid index of string, and code must be a valid ISO-8859-1 code.

procedure: vector-8b-fill! string start end code: Stores code in elements start (inclusive) to end (exclusive) of string and returns an unspecified value. Code must be a valid ISO-8859-1 code.

procedure: vector-8b-find-next-char string start end code

procedure: vector-8b-find-next-char-ci string start end code

Returns the index of the first occurrence of code in the given substring; returns #f if code does not appear. The index returned is relative to the entire string, not just the substring. Code must be a valid ISO-8859-1 code.

vector-8b-find-next-char-ci doesn't distinguish uppercase and lowercase letters.

procedure: vector-8b-find-previous-char string start end code

procedure: vector-8b-find-previous-char-ci string start end code

Returns the index of the last occurrence of code in the given substring; returns #f if code does not appear. The index returned is relative to the entire string, not just the substring. Code must be a valid ISO-8859-1 code.

vector-8b-find-previous-char-ci doesn't distinguish uppercase and lowercase letters.
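The following sketch exercises the byte-level procedures on a small string. The variable name bytes is made up, and the numeric codes are the ASCII values 97 to 99 for a to c, 65 and 67 for A and C, and 45 for the hyphen. The results shown are those implied by the descriptions above.

```scheme
(define bytes (string-copy "abcabc"))

;; Searches return indices relative to the whole string.
(vector-8b-find-next-char bytes 0 6 99)         ; => 2
(vector-8b-find-next-char bytes 3 6 99)         ; => 5
(vector-8b-find-previous-char bytes 0 6 99)     ; => 5
(vector-8b-find-next-char bytes 0 6 120)        ; => #f, no "x" present

;; The -ci variant ignores case: 67 is "C", which matches "c".
(vector-8b-find-next-char-ci bytes 0 6 67)      ; => 2

;; Store single codes or fill a range of elements.
(vector-8b-set! bytes 0 65)
bytes                                           ; => "Abcabc"
(vector-8b-fill! bytes 3 6 45)
bytes                                           ; => "Abc---"
```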

This document was generated by Chris Hanson on June 17, 2002 using texi2html