stri_extract_all {stringi} | R Documentation |
These functions extract all substrings matching a given pattern.
stri_extract_all_* extracts all the matches. On the other hand, stri_extract_first_* and stri_extract_last_* provide the first or the last matches, respectively.
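As a quick illustration of the difference between the three families, the sketch below runs the regex variants on a made-up string. The input and the pattern are illustrative assumptions, not taken from the package documentation.

```r
library(stringi)

# Hypothetical input: three numbers embedded in text
x <- "apples: 3, pears: 12, plums: 7"

# All matches: a list with one character vector per input string
all_m <- stri_extract_all_regex(x, "[0-9]+")

# Only the first or only the last match: plain character vectors
first_m <- stri_extract_first_regex(x, "[0-9]+")
last_m <- stri_extract_last_regex(x, "[0-9]+")

print(all_m)    # the single list element holds "3" "12" "7"
print(first_m)  # "3"
print(last_m)   # "7"
```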
stri_extract_all(str, ..., regex, fixed, coll, charclass)
stri_extract_first(str, ..., regex, fixed, coll, charclass)
stri_extract_last(str, ..., regex, fixed, coll, charclass)
stri_extract(str, ..., regex, fixed, coll, charclass,
  mode = c("first", "all", "last"))

stri_extract_all_charclass(str, pattern, merge = TRUE, simplify = FALSE,
  omit_no_match = FALSE)
stri_extract_first_charclass(str, pattern)
stri_extract_last_charclass(str, pattern)

stri_extract_all_coll(str, pattern, simplify = FALSE, omit_no_match = FALSE,
  ..., opts_collator = NULL)
stri_extract_first_coll(str, pattern, ..., opts_collator = NULL)
stri_extract_last_coll(str, pattern, ..., opts_collator = NULL)

stri_extract_all_regex(str, pattern, simplify = FALSE, omit_no_match = FALSE,
  ..., opts_regex = NULL)
stri_extract_first_regex(str, pattern, ..., opts_regex = NULL)
stri_extract_last_regex(str, pattern, ..., opts_regex = NULL)

stri_extract_all_fixed(str, pattern, simplify = FALSE, omit_no_match = FALSE,
  ..., opts_fixed = NULL)
stri_extract_first_fixed(str, pattern, ..., opts_fixed = NULL)
stri_extract_last_fixed(str, pattern, ..., opts_fixed = NULL)
str |
character vector with strings to search in |
... |
supplementary arguments passed to the underlying functions,
including additional settings for opts_collator, opts_regex, opts_fixed, and so on |
mode |
single string;
one of: "first" (the default), "all", "last" |
pattern, regex, fixed, coll, charclass |
character vector defining search patterns; for more details refer to stringi-search |
merge |
single logical value;
should consecutive matches be merged into one string; |
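simplify |
single logical value;
if TRUE or NA, then a character matrix is returned; otherwise (the default), a list of character vectors is given; stri_extract_all_* only |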
omit_no_match |
single logical value; if FALSE, then a missing value will indicate that there was no match; stri_extract_all_* only |
opts_collator, opts_fixed, opts_regex |
a named list used to tune up
a search engine's settings; see stri_opts_collator, stri_opts_fixed, and stri_opts_regex, respectively; NULL for default settings |
Vectorized over str and pattern.
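The vectorization can be sketched as follows. The inputs are made up for illustration: a single string is matched against several patterns, several strings against a single pattern, and two equal-length vectors are paired element by element.

```r
library(stringi)

# One string, several patterns: one result per pattern
one_str <- stri_extract_first_regex("abc123", c("[a-z]+", "[0-9]+"))

# Several strings, one pattern: one result per string
one_pat <- stri_extract_first_regex(c("abc123", "x9"), "[0-9]+")

# Equal-length vectors: the i-th pattern is applied to the i-th string
paired <- stri_extract_first_regex(c("abc123", "x9"), c("[a-z]+", "[0-9]+"))

print(one_str)  # "abc" "123"
print(one_pat)  # "123" "9"
print(paired)   # "abc" "9"
```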
If you would like to extract regex capture groups individually, check out stri_match.
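To make the contrast with stri_match concrete, here is a minimal sketch on a hypothetical "key=value" string. stri_extract_first_regex returns only the whole match, whereas stri_match_first_regex returns a matrix whose first column is the whole match and whose remaining columns are the capture groups.

```r
library(stringi)

x <- "key=value"            # hypothetical input
pat <- "(\\w+)=(\\w+)"      # two capture groups

# Whole match only
whole <- stri_extract_first_regex(x, pat)

# Whole match plus one column per capture group
groups <- stri_match_first_regex(x, pat)

print(whole)   # "key=value"
print(groups)  # one row: "key=value" "key" "value"
```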
stri_extract, stri_extract_all, stri_extract_first, and stri_extract_last are convenience functions. They just call stri_extract_*_*, depending on the arguments used. Calling one of those underlying functions directly will make your code run slightly faster.
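The equivalence between the convenience wrappers and the underlying functions can be checked directly. This is only a sketch on made-up input; it does not attempt to measure the speed difference.

```r
library(stringi)

x <- c("a1b22", "no digits")   # hypothetical input

# The wrapper picks the search engine from the name of the pattern argument
via_wrapper <- stri_extract_all(x, regex = "[0-9]+")
direct <- stri_extract_all_regex(x, "[0-9]+")

# stri_extract additionally picks first/all/last via `mode`
via_mode <- stri_extract(x, regex = "[0-9]+", mode = "last")
direct_last <- stri_extract_last_regex(x, "[0-9]+")

print(identical(via_wrapper, direct))    # TRUE
print(identical(via_mode, direct_last))  # TRUE
```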
For stri_extract_all*, if simplify=FALSE (the default), then a list of character vectors is returned. Each list element represents the results of a separate search scenario. If a pattern is not found and omit_no_match=FALSE, then a character vector of length 1 with a single NA value will be generated.
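The effect of omit_no_match on the list output can be seen in a small sketch. The input strings are illustrative; the second one contains no match.

```r
library(stringi)

x <- c("a1b22", "none")   # hypothetical input; "none" has no digits

# Default: a no-match element is a length-1 vector holding NA
with_na <- stri_extract_all_regex(x, "[0-9]+")

# omit_no_match=TRUE: a no-match element is an empty character vector
without_na <- stri_extract_all_regex(x, "[0-9]+", omit_no_match = TRUE)

print(with_na)     # list("1" "22", NA)
print(without_na)  # list("1" "22", character(0))
```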
Otherwise, i.e. if simplify is not FALSE, then stri_list2matrix with the byrow=TRUE argument is called on the resulting object. In such a case, a character matrix with an appropriate number of rows (according to the length of str, pattern, etc.) is returned. Note that stri_list2matrix's fill argument is set to an empty string for simplify=TRUE and to NA for simplify=NA.
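The two simplify settings differ only in how short rows are padded. A minimal sketch on made-up input with an unequal number of matches per string:

```r
library(stringi)

x <- c("a1b22c333", "z9")   # hypothetical input: 3 matches vs. 1 match

# simplify=TRUE pads short rows with empty strings
m_empty <- stri_extract_all_regex(x, "[0-9]+", simplify = TRUE)

# simplify=NA pads short rows with NA
m_na <- stri_extract_all_regex(x, "[0-9]+", simplify = NA)

print(m_empty)  # row 1: "1" "22" "333"; row 2: "9" "" ""
print(m_na)     # row 1: "1" "22" "333"; row 2: "9" NA NA
```

One row is produced per input string, and the number of columns equals the largest number of matches found in any of them.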
stri_extract_first* and stri_extract_last*, on the other hand, return a character vector. An NA element indicates no match.
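Because the result has the same length as the input, the NA entries can be used to flag strings without a match. The following sketch uses illustrative input:

```r
library(stringi)

x <- c("id 42", "no id", "id 7 and 8")   # hypothetical input

first_num <- stri_extract_first_regex(x, "[0-9]+")
last_num <- stri_extract_last_regex(x, "[0-9]+")

print(first_num)         # "42" NA "7"
print(last_num)          # "42" NA "8"
print(is.na(first_num))  # FALSE TRUE FALSE: flags the string with no match
```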
Other search_extract: stri_extract_all_boundaries, stri_match_all, stringi-search
stri_extract_all('XaaaaX', regex=c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_extract_all('Bartolini', coll='i')
stri_extract_all('stringi is so good!', charclass='\\p{Zs}') # all whitespaces

stri_extract_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}')
stri_extract_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}', merge=FALSE)
stri_extract_first_charclass('AaBbCc', '\\p{Ll}')
stri_extract_last_charclass('AaBbCc', '\\p{Ll}')

## Not run: 
# emoji support available since ICU 57
stri_extract_all_charclass(stri_enc_fromutf32(32:55200), "\\p{EMOJI}")

## End(Not run)

stri_extract_all_coll(c('AaaaaaaA', 'AAAA'), 'a')
stri_extract_first_coll(c('Yy\u00FD', 'AAA'), 'y', strength=2, locale="sk_SK")
stri_extract_last_coll(c('Yy\u00FD', 'AAA'), 'y', strength=1, locale="sk_SK")

stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_extract_first_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_extract_last_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))

stri_list2matrix(stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+')))
stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+'), simplify=TRUE)
stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+'), simplify=NA)

stri_extract_all_fixed("abaBAba", "Aba", case_insensitive=TRUE)
stri_extract_all_fixed("abaBAba", "Aba", case_insensitive=TRUE, overlap=TRUE)