A string is a mutable sequence of characters. In the current implementation of MIT Scheme, the elements of a string must all satisfy the predicate char-ascii?; if someone ports MIT Scheme to a non-ASCII operating system, this requirement will change.
A string is written as a sequence of characters enclosed within double quotes (" "). To include a double quote inside a string, escape it by preceding it with a backslash (\), as in

    "The word \"recursion\" has many meanings."

When this string is displayed, it appears as

    The word "recursion" has many meanings.

To include a backslash inside a string, precede it with another backslash; for example,

    "Use #\\Control-q to quit."

When this string is displayed, it appears as

    Use #\Control-q to quit.
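The escaping rules above can be checked by counting characters: each escape sequence contributes exactly one character to the string. The following is a small sketch that uses only standard procedures; the variable name slashed is introduced here purely for illustration.

```scheme
;; An escaped double quote is a single character in the string.
(define one-quote "\"")

;; An escaped backslash is likewise a single character.
(define slashed "Use #\\Control-q to quit.")

(string-length one-quote)   ; 1
(string-ref slashed 5)      ; the backslash character, written #\\
```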
The effect of a backslash that doesn't precede a double quote or backslash is unspecified in standard Scheme, but MIT Scheme specifies the effect for three other characters: \t, \n, and \f. These escape sequences are respectively translated into the following characters: #\tab, #\newline, and #\page. Finally, a backslash followed by exactly three octal digits is translated into the character whose ISO-8859-1 code is those digits.
If a string literal is continued from one line to another, the string will contain the newline character (#\newline) at the line break. Standard Scheme does not specify what appears in a string literal at a line break.
The length of a string is the number of characters that it contains. This number is an exact non-negative integer that is established when the string is created (but see section 6.10 Variable-Length Strings). Each character in a string has an index, which is a number that indicates the character's position in the string. The index of the first (leftmost) character in a string is 0, and the index of the last character is one less than the length of the string. The valid indexes of a string are the exact non-negative integers less than the length of the string.
A number of the string procedures operate on substrings. A substring is a segment of a string, which is specified by two integers start and end satisfying these relationships:
    0 <= start <= end <= (string-length string)
Start is the index of the first character in the substring, and end is one greater than the index of the last character in the substring. Thus if start and end are equal, they refer to an empty substring, and if start is zero and end is the length of string, they refer to all of string.
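The substring convention can be expressed as a small predicate. This is a sketch only: valid-substring-range? is a hypothetical helper name, not a procedure provided by MIT Scheme.

```scheme
;; Returns #t when START and END designate a valid substring of S,
;; i.e. both are exact integers and 0 <= start <= end <= length.
(define (valid-substring-range? s start end)
  (and (integer? start) (exact? start)
       (integer? end) (exact? end)
       (<= 0 start end (string-length s))))

(valid-substring-range? "arduous" 2 5)   ; #t, designates "duo"
(valid-substring-range? "arduous" 3 3)   ; #t, the empty substring
(valid-substring-range? "arduous" 2 8)   ; #f, end exceeds the length
```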
Some of the procedures that operate on strings ignore the difference between uppercase and lowercase. The versions that ignore case include `-ci' (for "case insensitive") in their names.
The fill character passed to make-string must satisfy the predicate char-ascii?.

    (make-string 10 #\x)              =>  "xxxxxxxxxx"
The arguments to string must be characters that satisfy the predicate char-ascii?.

    (string #\a)                              =>  "a"
    (string #\a #\b #\c)                      =>  "abc"
    (string #\a #\space #\b #\space #\c)      =>  "a b c"
    (string)                                  =>  ""
list->string returns a newly allocated string formed from the elements of char-list. This is equivalent to (apply string char-list). The inverse of this operation is string->list.

    (list->string '(#\a #\b))         =>  "ab"
    (string->list "Hello")            =>  (#\H #\e #\l #\l #\o)
Note regarding variable-length strings: the maximum length of the string returned by string-copy depends only on the length of string, not its maximum length. If you wish to copy a string and preserve its maximum length, do the following:

    (define (string-copy-preserving-max-length string)
      (let ((length))
        (dynamic-wind
         (lambda ()
           (set! length (string-length string))
           (set-string-length! string (string-maximum-length string)))
         (lambda ()
           (string-copy string))
         (lambda ()
           (set-string-length! string length)))))
Returns #t if object is a string; otherwise returns #f.

    (string? "Hi")                    =>  #t
    (string? 'Hi)                     =>  #f

    (string-length "")                =>  0
    (string-length "The length")      =>  10

Returns #t if string has zero length; otherwise returns #f.

    (string-null? "")                 =>  #t
    (string-null? "Hi")               =>  #f

    (string-ref "Hello" 1)            =>  #\e
    (string-ref "Hello" 5)            error--> 5 not in correct range
The character stored by string-set! must satisfy the predicate char-ascii?.

    (define str "Dog")                =>  unspecified
    (string-set! str 0 #\L)           =>  unspecified
    str                               =>  "Log"
    (string-set! str 3 #\t)           error--> 3 not in correct range
Returns #t if the two strings (substrings) are the same length and contain the same characters in the same (relative) positions; otherwise returns #f. string-ci=? and substring-ci=? don't distinguish uppercase and lowercase letters, but string=? and substring=? do.

    (string=? "PIE" "PIE")                    =>  #t
    (string=? "PIE" "pie")                    =>  #f
    (string-ci=? "PIE" "pie")                 =>  #t
    (substring=? "Alamo" 1 3 "cola" 2 4)      =>  #t ; compares "la"

    (string<? "cat" "dog")                    =>  #t
    (string<? "cat" "DOG")                    =>  #f
    (string-ci<? "cat" "DOG")                 =>  #t
    (string>? "catkin" "cat")                 =>  #t ; shorter is lesser
string-compare distinguishes uppercase and lowercase letters; string-compare-ci does not.

    (define (cheer) (display "Hooray!"))
    (define (boo) (display "Boo-hiss!"))
    (string-compare "a" "b" cheer (lambda() 'ignore) boo)
            -|  Hooray!
            =>  unspecified
string-hash returns an exact non-negative integer that can be used for storing the specified string in a hash table. Equal strings (in the sense of string=?) return equal (=) hash codes, and non-equal but similar strings are usually mapped to distinct hash codes.

string-hash-mod is like string-hash, except that it limits the result to a particular range based on the exact non-negative integer k. The following are equivalent:

    (string-hash-mod string k)
    (modulo (string-hash string) k)
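To illustrate how a hash code is reduced to a bucket index, here is a minimal sketch. It does not use MIT Scheme's string-hash; toy-string-hash is a hypothetical stand-in (a simple polynomial hash chosen only for the example), and toy-string-hash-mod shows the modulo relationship stated above.

```scheme
;; Hypothetical stand-in for string-hash: a simple polynomial hash.
;; This is NOT the algorithm MIT Scheme uses.
(define (toy-string-hash s)
  (let loop ((i 0) (h 0))
    (if (= i (string-length s))
        h
        (loop (+ i 1)
              (modulo (+ (* h 31) (char->integer (string-ref s i)))
                      1000003)))))

;; Limits the hash code to the range 0 <= result < k.
(define (toy-string-hash-mod s k)
  (modulo (toy-string-hash s) k))
```

Because equal strings produce equal hash codes, they always land in the same bucket, whatever value of k is chosen.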
These procedures return #t if the first word in the string (substring) is capitalized, and any subsequent words are either lower case or capitalized. Otherwise, they return #f. A word is defined as a non-null contiguous sequence of alphabetic characters, delimited by non-alphabetic characters or the limits of the string (substring). A word is capitalized if its first letter is upper case and all its remaining letters are lower case.

    (map string-capitalized? '("" "A" "art" "Art" "ART"))
         =>  (#f #t #f #t #f)
These procedures return #t if all the letters in the string (substring) are of the correct case; otherwise they return #f. The string (substring) must contain at least one letter or the procedures return #f.

    (map string-upper-case? '("" "A" "art" "Art" "ART"))
         =>  (#f #t #f #f #t)
string-capitalize returns a newly allocated copy of string in which the first alphabetic character is uppercase and the remaining alphabetic characters are lowercase. For example, "abcDEF" becomes "Abcdef". string-capitalize! is the destructive version of string-capitalize: it alters string and returns an unspecified value. substring-capitalize! destructively capitalizes the specified part of string.
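The capitalization rule just described (first alphabetic character uppercase, all later alphabetic characters lowercase) can be sketched with standard procedures. capitalize-sketch is a hypothetical name used only for illustration; it is not MIT Scheme's implementation.

```scheme
;; Returns a new string in which the first alphabetic character is
;; uppercase and every later alphabetic character is lowercase.
(define (capitalize-sketch s)
  (let ((result (string-copy s)))
    (let loop ((i 0) (seen-letter? #f))
      (if (< i (string-length result))
          (let ((c (string-ref result i)))
            (cond ((not (char-alphabetic? c))
                   (loop (+ i 1) seen-letter?))
                  (seen-letter?
                   (string-set! result i (char-downcase c))
                   (loop (+ i 1) #t))
                  (else
                   (string-set! result i (char-upcase c))
                   (loop (+ i 1) #t))))
          result))))

(capitalize-sketch "abcDEF")   ; "Abcdef"
```

Working on a copy keeps the argument unchanged, mirroring the non-destructive variant described above.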
string-downcase returns a newly allocated copy of string in which all uppercase letters are changed to lowercase. string-downcase! is the destructive version of string-downcase: it alters string and returns an unspecified value. substring-downcase! destructively changes the case of the specified part of string.

    (define str "ABCDEFG")            =>  unspecified
    (substring-downcase! str 3 5)     =>  unspecified
    str                               =>  "ABCdeFG"
string-upcase returns a newly allocated copy of string in which all lowercase letters are changed to uppercase. string-upcase! is the destructive version of string-upcase: it alters string and returns an unspecified value. substring-upcase! destructively changes the case of the specified part of string.
With no arguments, string-append returns the empty string ("").

    (string-append)                   =>  ""
    (string-append "*" "ace" "*")     =>  "*ace*"
    (string-append "" "" "")          =>  ""
    (eq? str (string-append str))     =>  #f ; newly allocated
    (substring "" 0 0)                =>  ""
    (substring "arduous" 2 5)         =>  "duo"
    (substring "arduous" 2 8)         error--> 8 not in correct range

    (define (string-copy s)
      (substring s 0 (string-length s)))

    (define (string-head string end)
      (substring string 0 end))

    (define (string-tail string start)
      (substring string start (string-length string)))

    (string-tail "uncommon" 2)        =>  "common"
The padding character defaults to #\space. If k is less than the length of string, the resulting string is a truncated form of string. string-pad-left adds padding characters or truncates from the beginning of the string (lowest indices), while string-pad-right does so at the end of the string (highest indices).

    (string-pad-left "hello" 4)       =>  "ello"
    (string-pad-left "hello" 8)       =>  "   hello"
    (string-pad-left "hello" 8 #\*)   =>  "***hello"
    (string-pad-right "hello" 4)      =>  "hell"
    (string-pad-right "hello" 8)      =>  "hello   "
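The padding and truncation behavior shown in these examples can be sketched with substring and string-append. The names pad-left-sketch and pad-right-sketch are hypothetical, and unlike the real procedures they take the padding character as a required argument to keep the sketch short.

```scheme
;; Pads or truncates at the beginning (lowest indices) of S.
(define (pad-left-sketch s k ch)
  (let ((n (string-length s)))
    (if (<= k n)
        (substring s (- n k) n)
        (string-append (make-string (- k n) ch) s))))

;; Pads or truncates at the end (highest indices) of S.
(define (pad-right-sketch s k ch)
  (let ((n (string-length s)))
    (if (<= k n)
        (substring s 0 k)
        (string-append s (make-string (- k n) ch)))))

(pad-left-sketch "hello" 4 #\space)   ; "ello"
(pad-left-sketch "hello" 8 #\*)       ; "***hello"
(pad-right-sketch "hello" 4 #\space)  ; "hell"
```

In both cases the result has exactly k characters, which is what makes these procedures convenient for aligning columns of output.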
These procedures remove characters that are not in char-set from: (string-trim) both ends of string; (string-trim-left) the beginning of string; or (string-trim-right) the end of string. Char-set defaults to char-set:not-whitespace.

    (string-trim "  in the end  ")                        =>  "in the end"
    (string-trim "              ")                        =>  ""
    (string-trim "100th" char-set:numeric)                =>  "100"
    (string-trim-left "-.-+-=-" (char-set #\+))           =>  "+-=-"
    (string-trim "but (+ x y) is" (char-set #\( #\)))     =>  "(+ x y)"
The first few procedures in this section perform string search, in which a given string (the text) is searched to see if it contains another given string (the pattern) as a proper substring. At present these procedures are implemented using a hybrid strategy. For patterns shorter than 4 characters, the naive string-search algorithm is used. For longer patterns, the Boyer-Moore string-search algorithm is used.
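For readers unfamiliar with the naive algorithm mentioned above, the following is a minimal sketch of it: every alignment of the pattern against the text is tried in turn, from left to right. naive-search-forward is a hypothetical name; this is not MIT Scheme's source code.

```scheme
;; Returns the index of the leftmost occurrence of PATTERN in TEXT,
;; or #f if there is none.
(define (naive-search-forward pattern text)
  (let ((plen (string-length pattern))
        (tlen (string-length text)))
    (let outer ((start 0))
      (if (> (+ start plen) tlen)
          #f                              ; pattern no longer fits
          (let inner ((i 0))
            (cond ((= i plen) start)      ; every character matched
                  ((char=? (string-ref pattern i)
                           (string-ref text (+ start i)))
                   (inner (+ i 1)))
                  (else (outer (+ start 1)))))))))

(naive-search-forward "rat" "pirate")    ; 2
(naive-search-forward "rat" "outrage")   ; #f
```

The worst-case cost is proportional to the product of the two lengths, which is acceptable for very short patterns and is the reason a different algorithm is preferred for longer ones.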
If the pattern is not found, #f is returned. substring-search-forward limits its search to the specified substring of string; string-search-forward searches all of string.

    (string-search-forward "rat" "pirate")                    =>  2
    (string-search-forward "rat" "pirate rating")             =>  2
    (substring-search-forward "rat" "pirate rating" 4 13)     =>  7
    (substring-search-forward "rat" "pirate rating" 9 13)     =>  #f
If the pattern is not found, #f is returned. substring-search-backward limits its search to the specified substring of string; string-search-backward searches all of string.

    (string-search-backward "rat" "pirate")                   =>  5
    (string-search-backward "rat" "pirate rating")            =>  10
    (substring-search-backward "rat" "pirate rating" 1 8)     =>  5
    (substring-search-backward "rat" "pirate rating" 9 13)    =>  #f
substring-search-all limits its search to the specified substring of string; string-search-all searches all of string.

    (string-search-all "rat" "pirate")                        =>  (2)
    (string-search-all "rat" "pirate rating")                 =>  (2 7)
    (substring-search-all "rat" "pirate rating" 4 13)         =>  (7)
    (substring-search-all "rat" "pirate rating" 9 13)         =>  ()
Returns #t if pattern is a substring of string; otherwise returns #f.

    (substring? "rat" "pirate")       =>  #t
    (substring? "rat" "outrage")      =>  #f
    (substring? "" any-string)        =>  #t

    (if (substring? "moon" text)
        (process-lunar text)
        'no-moon)
Returns #f if char does not appear in the string. For the substring procedures, the index returned is relative to the entire string, not just the substring. The -ci procedures don't distinguish uppercase and lowercase letters.

    (string-find-next-char "Adam" #\A)                =>  0
    (substring-find-next-char "Adam" 1 4 #\A)         =>  #f
    (substring-find-next-char-ci "Adam" 1 4 #\A)      =>  2
Returns #f if none of the characters in char-set occur in string. For the substring procedure, only the substring is searched, but the index returned is relative to the entire string, not just the substring.

    (string-find-next-char-in-set my-string char-set:alphabetic)
         =>  start position of the first word in my-string

    ; Can be used as a predicate:
    (if (string-find-next-char-in-set my-string (char-set #\( #\) ))
        'contains-parentheses
        'no-parentheses)
Returns #f if char doesn't appear in the string. For the substring procedures, the index returned is relative to the entire string, not just the substring. The -ci procedures don't distinguish uppercase and lowercase letters.
The -ci procedures don't distinguish uppercase and lowercase letters.

    (string-match-forward "mirror" "micro")   =>  2  ; matches "mi"
    (string-match-forward "a" "b")            =>  0  ; no match
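Judging from the two examples above, the result of a forward match is the number of leading characters the two strings have in common. Under that assumption, the computation can be sketched as follows; match-forward-sketch is a hypothetical name, not the MIT Scheme procedure itself.

```scheme
;; Counts how many characters S1 and S2 share, starting from index 0.
(define (match-forward-sketch s1 s2)
  (let ((limit (min (string-length s1) (string-length s2))))
    (let loop ((i 0))
      (if (and (< i limit)
               (char=? (string-ref s1 i) (string-ref s2 i)))
          (loop (+ i 1))
          i))))

(match-forward-sketch "mirror" "micro")   ; 2, the common prefix is "mi"
(match-forward-sketch "a" "b")            ; 0, nothing in common
```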
The -ci procedures don't distinguish uppercase and lowercase letters.

    (string-match-backward-ci "BULBOUS" "fractious")  =>  3  ; matches "ous"
Returns #t if the first string (substring) forms the prefix of the second; otherwise returns #f. The -ci procedures don't distinguish uppercase and lowercase letters.

    (string-prefix? "abc" "abcdef")   =>  #t
    (string-prefix? "" any-string)    =>  #t
Returns #t if the first string (substring) forms the suffix of the second; otherwise returns #f. The -ci procedures don't distinguish uppercase and lowercase letters.

    (string-suffix? "ous" "bulbous")  =>  #t
    (string-suffix? "" any-string)    =>  #t
MIT Scheme provides support for using regular expressions to search and match strings. This manual does not define regular expressions; instead see section `Syntax of Regular Expressions' in The Emacs Editor.
In addition to providing standard regular-expression support, MIT Scheme also provides the REXP abstraction. This is an alternative way to write regular expressions that is easier to read and understand than the standard notation. Regular expressions written in this notation can be translated into the standard notation.
The regular-expression support is a run-time-loadable option. To use it, execute
    (load-option 'regular-expression)
once before calling any of the procedures defined here.
6.8.1 Regular-expression procedures
6.8.2 REXP abstraction
Procedures that perform regular-expression match and search accept standardized arguments. Regexp is the regular expression; it is a string. String is the string being matched or searched. Procedures that operate on substrings also accept start and end index arguments with the usual meaning. The optional argument case-fold? says whether the match/search is case-sensitive; if case-fold? is #f, it is case-sensitive, otherwise it is case-insensitive. The optional argument syntax-table is a character syntax table that defines the character syntax, such as which characters are legal word constituents. This feature is primarily for Edwin, so character syntax tables will not be documented here. Supplying #f for (or omitting) syntax-table will select the default character syntax, equivalent to Edwin's fundamental mode.
Returns #f for no match, or a set of match registers (see below) if the match succeeds. Here is an example showing how to extract the matched substring:

    (let ((r (re-substring-match regexp string start end)))
      (and r
           (substring string start (re-match-end-index 0 r))))
Returns a set of match registers (see below) if the search is successful, or #f if it is unsuccessful. re-substring-search-forward limits its search to the specified substring of string; re-string-search-forward searches all of string.
Returns a set of match registers (see below) if the search is successful, or #f if it is unsuccessful. re-substring-search-backward limits its search to the specified substring of string; re-string-search-backward searches all of string.
When a successful match or search occurs, the above procedures return a
set of match registers. The match registers are a set of index
registers that record indexes into the matched string. Each index
register corresponds to an instance of the regular-expression grouping
operator \(, and records the start index (inclusive) and end index (exclusive) of the matched group. These registers are numbered from 1 to 9, corresponding left-to-right to the grouping operators in the expression. Additionally, register 0 corresponds to the entire substring matching the regular expression.
N must be an integer between 0 and 9 inclusive. Registers must be a match-registers object as returned by one of the regular-expression match or search procedures above. re-match-start-index returns the start index of the corresponding regular-expression register, and re-match-end-index returns the corresponding end index.
N must be an integer between 0 and 9 inclusive. If the matched regular expression contained m grouping operators, then the value of this procedure is undefined for n strictly greater than m.

This procedure extracts the substring corresponding to the match register specified by registers and n. This is equivalent to the following expression:

    (substring string
               (re-match-start-index n registers)
               (re-match-end-index n registers))

    (regexp-group "foo" "bar" "baz")
         =>  "\\(foo\\|bar\\|baz\\)"
The REXP abstraction is a set of combinators that are composed into a complete regular expression. Each combinator directly corresponds to a particular piece of regular-expression notation. For example, the expression (rexp-any-char) corresponds to the . character in standard regular-expression notation, while (rexp* rexp) corresponds to the * character.
The primary advantages of REXP are that it makes the nesting structure of regular expressions explicit, and that it simplifies the description of complex regular expressions by allowing them to be built up using straightforward combinators.
Returns #t if object is a REXP expression, or #f otherwise. A REXP is one of: a string, which represents the pattern matching that string; a character set, which represents the pattern matching a character in that set; or an object returned by calling one of the procedures defined here.

    (re-compile-pattern (rexp->regexp rexp) #f)
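This is equivalent to the . construct.
This is equivalent to the ^ construct.
This is equivalent to the $ construct.
This is equivalent to the \` construct.
This is equivalent to the \' construct.
This is equivalent to the \b construct.
This is equivalent to the \B construct.
This is equivalent to the \< construct.
This is equivalent to the \> construct.
This is equivalent to the \w construct.
This is equivalent to the \W construct.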
The next two procedures accept a syntax-type argument specifying
the syntax class to be matched against. This argument is a symbol
selected from the following list. Each symbol is followed by the
equivalent character used in standard regular-expression notation.
    whitespace         (space character)
    punctuation        .
    word               w
    symbol             _
    open               (
    close              )
    quote              '
    string-delimiter   "
    math-delimiter     $
    escape             \
    char-quote         /
    comment-start      <
    comment-end        >
This is equivalent to the \s construct.
This is equivalent to the \S construct.
This is equivalent to the \| construct.
rexp-group is like rexp-sequence, except that the result is marked as a match group. This is equivalent to the \( ... \) construct.
The next three procedures in principle accept a single REXP argument. For convenience, they accept multiple arguments, which are converted into a single argument by rexp-group. Note, however, that if only one REXP argument is supplied, and it's very simple, no grouping occurs.
This is equivalent to the * construct.
This is equivalent to the + construct.
This is equivalent to the ? construct.
string-replace and substring-replace return a newly allocated string containing the result. string-replace! and substring-replace! destructively modify string and return an unspecified value.

    (define str "a few words")                =>  unspecified
    (string-replace str #\space #\-)          =>  "a-few-words"
    (substring-replace str 2 9 #\space #\-)   =>  "a few-words"
    str                                       =>  "a few words"
    (string-replace! str #\space #\-)         =>  unspecified
    str                                       =>  "a-few-words"
    (define s (make-string 10 #\space))       =>  unspecified
    (substring-fill! s 2 8 #\*)               =>  unspecified
    s                                         =>  "  ******  "
eqv?
):
substring-move-left!
substring-move-right!
The following example shows how these procedures can be used to build up a string (it would have been easier to use string-append):

    (define answer (make-string 9 #\*))               =>  unspecified
    answer                                            =>  "*********"
    (substring-move-left! "start" 0 5 answer 0)       =>  unspecified
    answer                                            =>  "start****"
    (substring-move-left! "-end" 0 4 answer 5)        =>  unspecified
    answer                                            =>  "start-end"
reverse-string and reverse-substring return newly allocated strings; reverse-string! and reverse-substring! modify their argument strings and return an unspecified value.

    (reverse-string "foo bar baz")            =>  "zab rab oof"
    (reverse-substring "foo bar baz" 4 7)     =>  "rab"
    (let ((foo "foo bar baz"))
      (reverse-string! foo)
      foo)                                    =>  "zab rab oof"
    (let ((foo "foo bar baz"))
      (reverse-substring! foo 4 7)
      foo)                                    =>  "foo rab baz"
MIT Scheme allows the length of a string to be dynamically adjusted in a
limited way. When a new string is allocated, by whatever method, it has
a specific length. At the time of allocation, it is also given a
maximum length, which is guaranteed to be at least as large as the
string's length. (Sometimes the maximum length will be slightly larger
than the length, but it is a bad idea to count on this. Programs should
assume that the maximum length is the same as the length at the time of
the string's allocation.) After the string is allocated, the operation set-string-length! can be used to alter the string's length to any value between 0 and the string's maximum length, inclusive.

    (<= (string-length string)
        (string-maximum-length string))       =>  #t
The maximum length of a string never changes.
set-string-length! does not change the maximum length of string.
MIT Scheme implements strings as packed vectors of 8-bit ISO-8859-1 bytes. Most of the string operations, such as string-ref, coerce these 8-bit codes into character objects. However, some lower-level operations are made available for use.

    (vector-8b-ref "abcde" 2)         =>  99 ;c
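The relationship between the byte-level view and the character-level view can be sketched with standard procedures: reading a character with string-ref and converting it with char->integer yields the same code as the example above, assuming an ASCII-compatible character encoding. byte-at is a hypothetical helper name.

```scheme
;; Returns the character code stored at index K of S.
;; Assumes an ASCII-compatible encoding, as in ISO-8859-1.
(define (byte-at s k)
  (char->integer (string-ref s k)))

(byte-at "abcde" 2)   ; 99, the code for #\c
```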
Returns #f if code does not appear. The index returned is relative to the entire string, not just the substring. Code must be a valid ISO-8859-1 code. vector-8b-find-next-char-ci doesn't distinguish uppercase and lowercase letters.
Returns #f if code does not appear. The index returned is relative to the entire string, not just the substring. Code must be a valid ISO-8859-1 code. vector-8b-find-previous-char-ci doesn't distinguish uppercase and lowercase letters.