grep {base} | R Documentation |
grep searches for matches to pattern (its first argument) within the character vector x (second argument). regexpr does too, but returns more detail in a different format.
sub and gsub perform replacement of matches determined by regular expression matching.
grep(pattern, x, ignore.case = FALSE, extended = TRUE, perl = FALSE, value = FALSE, fixed = FALSE)
sub(pattern, replacement, x, ignore.case = FALSE, extended = TRUE, perl = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, extended = TRUE, perl = FALSE)
regexpr(pattern, text, extended = TRUE, perl = FALSE, fixed = FALSE)
pattern | character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. |
x, text | a character vector where matches are sought. |
ignore.case | if FALSE, the pattern matching is case sensitive and if TRUE, case is ignored during matching. |
extended | if TRUE, extended regular expression matching is used, and if FALSE basic regular expressions are used. |
perl | logical. Should perl-compatible regexps be used? Has priority over extended. |
value | if FALSE, a vector containing the (integer) indices of the matches determined by grep is returned, and if TRUE, a vector containing the matching elements themselves is returned. |
fixed | logical. If TRUE, pattern is a string to be matched as is. Overrides all other arguments. |
replacement | a replacement for matched pattern in sub and gsub. |
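As a rough illustration of how value, ignore.case and fixed change what grep returns, here is a minimal sketch. The vector fruits and the result names are hypothetical sample data, not part of the original page, and the extended argument is left at its default.

```r
# Hypothetical sample data, used only for illustration
fruits <- c("Apple", "banana", "a.b", "cherry")

# Default (value = FALSE): integer indices of the matching elements
idx <- grep("an", fruits)

# value = TRUE: the matching elements themselves
vals <- grep("an", fruits, value = TRUE)

# ignore.case = TRUE: "^a" now also matches the capitalised "Apple"
ci <- grep("^a", fruits, ignore.case = TRUE, value = TRUE)

# fixed = TRUE: "." is matched literally, not as "any character"
lit <- grep(".", fruits, fixed = TRUE, value = TRUE)

print(idx)
print(vals)
print(ci)
print(lit)
```

Without fixed = TRUE the pattern "." would match every non-empty element, which is why literal searches for punctuation are usually easier with fixed matching.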
Arguments which should be character strings or character vectors are coerced to character if possible.
The two *sub functions differ only in that sub replaces only the first occurrence of a pattern whereas gsub replaces all occurrences.
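The difference between the two can be seen on a single string. This is a small sketch; the string s and the result names are chosen only for illustration.

```r
# Illustrative input string (not from the original page)
s <- "banana"

# sub replaces only the first match in each element
first_only <- sub("a", "A", s)

# gsub replaces every match in each element
every_one <- gsub("a", "A", s)

# Backreferences such as \\1 work in both; here each "a" is wrapped in brackets
wrapped <- gsub("(a)", "<\\1>", s)

print(first_only)
print(every_one)
print(wrapped)
```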
For regexpr it is an error for pattern to be NA; otherwise NA is permitted and matches only itself.
The regular expressions used are those specified by POSIX 1003.2, either extended or basic, depending on the value of the extended argument, unless perl = TRUE when they are those of PCRE, ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/. (The exact set of patterns supported may depend on the version of PCRE installed on the system in use.)
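To make the choice of regular expression flavour concrete, the sketch below writes the same "contains a digit" test once with a POSIX character class and once with a Perl-style escape, then uses a lookahead, which is a PCRE construct and so assumes perl = TRUE. The vector ids is hypothetical sample data.

```r
# Hypothetical sample data, used only for illustration
ids <- c("abc", "a1", "42", "x 9")

# POSIX character class, usable with the default regular expressions
posix_hits <- grep("[[:digit:]]+", ids)

# Perl-style shorthand for the same class, requested via perl = TRUE
pcre_hits <- grep("\\d+", ids, perl = TRUE)

# Lookahead (PCRE syntax): an "a" that is immediately followed by "1"
lookahead_hits <- grep("a(?=1)", ids, perl = TRUE)

print(posix_hits)
print(pcre_hits)
print(lookahead_hits)
```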
grep returns a vector giving either the indices of the elements of x that yielded a match or, if value is TRUE, the matched elements.
sub and gsub return a character vector of the same length as the original.
regexpr returns an integer vector of the same length as text giving the starting position of the first match, or -1 if there is none, with attribute "match.length" giving the length of the matched text (or -1 for no match).
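The result of regexpr can be combined with substring to pull out the matched text. The following is one possible sketch of that idiom, not the only way to do it; words and the other variable names are illustrative.

```r
# Illustrative input (not from the original page)
words <- c("foot", "arm", "bafoobar")

m <- regexpr("oo", words)

# Starting position of the first match in each element, -1 where none
starts <- as.integer(m)

# Length of each match, stored in the "match.length" attribute
lens <- attr(m, "match.length")

# Keep only the elements that matched and cut out the matched text
hit <- starts > 0
matched <- substring(words[hit], starts[hit], starts[hit] + lens[hit] - 1)

print(starts)
print(lens)
print(matched)
```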
The standard regular-expression code has been reported to be very slow or give errors when applied to extremely long character strings (tens of thousands of characters or more): the code used when perl = TRUE seems faster and more reliable for such usages.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole (grep)
regular expression (aka regexp) for the details of the pattern specification.
agrep for approximate matching.
tolower, toupper and chartr for character translations.
charmatch, pmatch, match.
apropos uses regexps and has nice examples.
grep("[a-z]", letters)

txt <- c("arm","foot","lefroo", "bafoobar")
if(any(i <- grep("foo",txt)))
   cat("'foo' appears at least once in\n\t",txt,"\n")
i # 2 and 4
txt[i]

## Double all 'a' or 'b's; "\" must be escaped, i.e., 'doubled'
gsub("([ab])", "\\1_\\1_", "abc and ABC")

txt <- c("The", "licenses", "for", "most", "software", "are",
  "designed", "to", "take", "away", "your", "freedom",
  "to", "share", "and", "change", "it.",
  "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
  "is", "intended", "to", "guarantee", "your", "freedom", "to",
  "share", "and", "change", "free", "software", "--",
  "to", "make", "sure", "the", "software", "is",
  "free", "for", "all", "its", "users")
( i <- grep("[gu]", txt) ) # indices
stopifnot( txt[i] == grep("[gu]", txt, value = TRUE) )
(ot <- sub("[b-e]",".", txt))
txt[ot != gsub("[b-e]",".", txt)]#- gsub does "global" substitution

txt[gsub("g","#", txt) !=
    gsub("g","#", txt, ignore.case = TRUE)] # the "G" words

regexpr("en", txt)

## trim trailing white space
str = 'Now is the time '
sub(' +$', '', str) ## spaces only
sub('[[:space:]]+$', '', str) ## white space, POSIX-style
sub('\\s+$', '', str, perl = TRUE) ## Perl-style white space