stri_locate_all {stringi} | R Documentation |
These functions find the indices (positions) at which a given pattern is matched. stri_locate_all_* locate all the matches, whereas stri_locate_first_* and stri_locate_last_* give the first or the last match, respectively.
stri_locate_all(str, ..., regex, fixed, coll, charclass)
stri_locate_first(str, ..., regex, fixed, coll, charclass)
stri_locate_last(str, ..., regex, fixed, coll, charclass)
stri_locate(str, ..., regex, fixed, coll, charclass, mode = c("first", "all", "last"))
stri_locate_all_charclass(str, pattern, merge = TRUE, omit_no_match = FALSE)
stri_locate_first_charclass(str, pattern)
stri_locate_last_charclass(str, pattern)
stri_locate_all_coll(str, pattern, omit_no_match = FALSE, ..., opts_collator = NULL)
stri_locate_first_coll(str, pattern, ..., opts_collator = NULL)
stri_locate_last_coll(str, pattern, ..., opts_collator = NULL)
stri_locate_all_regex(str, pattern, omit_no_match = FALSE, ..., opts_regex = NULL)
stri_locate_first_regex(str, pattern, ..., opts_regex = NULL)
stri_locate_last_regex(str, pattern, ..., opts_regex = NULL)
stri_locate_all_fixed(str, pattern, omit_no_match = FALSE, ..., opts_fixed = NULL)
stri_locate_first_fixed(str, pattern, ..., opts_fixed = NULL)
stri_locate_last_fixed(str, pattern, ..., opts_fixed = NULL)
str |
character vector with strings to search in |
... |
supplementary arguments passed to the underlying functions,
including additional settings for |
mode |
single string;
one of: |
pattern, regex, fixed, coll, charclass |
character vector defining search patterns; for more details refer to stringi-search |
merge |
single logical value;
indicates whether consecutive sequences of indices in the resulting
matrix shall be merged; |
omit_no_match |
single logical value; if |
opts_collator, opts_fixed, opts_regex |
a named list used to tune up
a search engine's settings; see
|
Vectorized over str and pattern.
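As a minimal sketch of what vectorization means here (the input strings are made up for illustration), each element of str is paired with the corresponding element of pattern, and a single pattern is recycled across all strings:

```r
library(stringi)

# Each string is paired with the pattern at the same position.
res <- stri_locate_first_fixed(c("abc", "bcd"), c("b", "d"))
print(res)

# A single pattern is recycled across all the strings.
res_recycled <- stri_locate_first_fixed(c("abc", "bcd"), "b")
print(res_recycled)
```

Each row of the resulting matrix corresponds to one (string, pattern) pair.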
The matched string(s) may be extracted by calling the stri_sub function. Alternatively, you may call stri_extract directly.
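The two routes mentioned above might look as follows. This is only a sketch on an illustrative input; the variable names are arbitrary:

```r
library(stringi)

x <- "XaaaaX"

# Locate the first run of lowercase letters, then cut it out with stri_sub.
pos <- stri_locate_first_regex(x, "\\p{Ll}+")
via_sub <- stri_sub(x, pos[, 1], pos[, 2])

# Or extract the match in a single step.
via_extract <- stri_extract_first_regex(x, "\\p{Ll}+")

print(via_sub)
print(via_extract)
```

Locating first is mainly useful when the positions themselves are needed, for example to replace or annotate the matched region.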
stri_locate, stri_locate_all, stri_locate_first, and stri_locate_last are convenience functions. They just call stri_locate_*_*, depending on the arguments used. Unless you are a very lazy person, please call the underlying functions directly for better performance.
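To illustrate the dispatch described above, the following sketch compares a convenience call with the underlying function it is expected to reach (the input string is taken from the examples on this page):

```r
library(stringi)

# Convenience wrapper: the search engine is chosen by the argument name.
via_wrapper <- stri_locate_all("Bartolini", fixed = "i")

# Direct call to the underlying function.
via_direct <- stri_locate_all_fixed("Bartolini", "i")

print(identical(via_wrapper, via_direct))
```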
For stri_locate_all_*, a list of integer matrices is returned. Each list element represents the results of a separate search scenario. The first column gives the start positions of matches, and the second column gives the end positions. Moreover, you may get two NAs in one row for no match (if omit_no_match is FALSE) or for NA arguments.
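A short sketch of the return structure, using made-up inputs: one matrix per search scenario, with the no-match case shown both with the default omit_no_match = FALSE and with omit_no_match = TRUE.

```r
library(stringi)

strs <- c("abab", "xyz")

# Default: a string with no match yields a single row of two NAs.
res <- stri_locate_all_fixed(strs, "ab")
print(res)

# With omit_no_match = TRUE, a string with no match yields a matrix with 0 rows.
res_omit <- stri_locate_all_fixed(strs, "ab", omit_no_match = TRUE)
print(res_omit)
```

The 0-row form can be convenient when counting matches with nrow, since a row of NAs would otherwise be counted as one match.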
stri_locate_first_* and stri_locate_last_*, on the other hand, return an integer matrix with two columns, giving the start and end positions of the first or the last matches, respectively, and two NAs if and only if they are not found.
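As a sketch of this return shape (inputs chosen for illustration), the result has one row per input string, and a string without a match gives a row of NAs:

```r
library(stringi)

strs <- c("bbbbb", "ccc")

# One row per input string: start and end of the first match, or NA NA.
first <- stri_locate_first_fixed(strs, "bb")
print(first)

# Same shape, but for the last match in each string.
last <- stri_locate_last_fixed(strs, "bb")
print(last)
```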
For stri_locate_*_regex, if the match is of length 0, end will be one less than start.
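The zero-length case can be seen with a lookahead pattern, as in the examples on this page. The sketch below also shows one possible way to recover the overlapping occurrences from the start positions; the fixed length of 3 is an assumption tied to this particular pattern:

```r
library(stringi)

x <- "ACAGAGACTTTAGATAGAGAAGA"

# A lookahead matches at a position without consuming any characters.
m <- stri_locate_all_regex(x, "(?=AGA)")[[1]]
print(m)

# Every match is empty, so each end is one less than its start.
print(all(m[, 2] == m[, 1] - 1))

# Recover the overlapping occurrences from the start positions.
hits <- stri_sub(x, m[, 1], length = 3)
print(hits)
```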
Other search_locate: stri_locate_all_boundaries, stringi-search
Other indexing: stri_locate_all_boundaries, stri_sub
stri_locate_all('XaaaaX', regex=c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_locate_all('Bartolini', fixed='i')
stri_locate_all('a b c', charclass='\\p{Zs}') # all white spaces

stri_locate_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}')
stri_locate_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}', merge=FALSE)
stri_locate_first_charclass('AaBbCc', '\\p{Ll}')
stri_locate_last_charclass('AaBbCc', '\\p{Ll}')

stri_locate_all_coll(c('AaaaaaaA', 'AAAA'), 'a')
stri_locate_first_coll(c('Yy\u00FD', 'AAA'), 'y', strength=2, locale="sk_SK")
stri_locate_last_coll(c('Yy\u00FD', 'AAA'), 'y', strength=1, locale="sk_SK")

pat <- stri_paste("\u0635\u0644\u0649 \u0627\u0644\u0644\u0647 ",
                  "\u0639\u0644\u064a\u0647 \u0648\u0633\u0644\u0645XYZ")
stri_locate_last_coll("\ufdfa\ufdfa\ufdfaXYZ", pat, strength = 1)

stri_locate_all_fixed(c('AaaaaaaA', 'AAAA'), 'a')
stri_locate_all_fixed(c('AaaaaaaA', 'AAAA'), 'a', case_insensitive=TRUE, overlap=TRUE)
stri_locate_first_fixed(c('AaaaaaaA', 'aaa', 'AAA'), 'a')
stri_locate_last_fixed(c('AaaaaaaA', 'aaa', 'AAA'), 'a')

# first row is 1-2 like in locate_first
stri_locate_all_fixed('bbbbb', 'bb')
stri_locate_first_fixed('bbbbb', 'bb')

# but last row is 3-4, unlike in locate_last,
# keep this in mind [overlapping pattern match OK]!
stri_locate_last_fixed('bbbbb', 'bb')

stri_locate_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_locate_first_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_locate_last_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))

# Use regex positive-lookahead to locate overlapping pattern matches:
stri_locate_all_regex("ACAGAGACTTTAGATAGAGAAGA", "(?=AGA)")
# note that start > end here (match of 0 length)