stri_locate_all_boundaries {stringi} | R Documentation
These functions locate specific text boundaries (like character, word, line, or sentence boundaries). stri_locate_all_* locates all the matches, whereas stri_locate_first_* and stri_locate_last_* give only the first or the last match, respectively.
stri_locate_all_boundaries(str, omit_no_match = FALSE, ..., opts_brkiter = NULL)
stri_locate_last_boundaries(str, ..., opts_brkiter = NULL)
stri_locate_first_boundaries(str, ..., opts_brkiter = NULL)
stri_locate_all_words(str, omit_no_match = FALSE, locale = NULL)
stri_locate_last_words(str, locale = NULL)
stri_locate_first_words(str, locale = NULL)
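A minimal sketch contrasting the all/first/last variants on a single string, assuming the stringi package is installed; the sample sentence and the variable names (x, all_sent, first_sent, last_sent) are illustrative only.

```r
library(stringi)

x <- "Hello there. How are you?"

# All sentence spans: a list holding one (start, end) matrix per input string
all_sent <- stri_locate_all_boundaries(x, type = "sentence")
all_sent[[1]]

# First and last spans only: a matrix with one row per input string
first_sent <- stri_locate_first_boundaries(x, type = "sentence")
last_sent  <- stri_locate_last_boundaries(x, type = "sentence")

# Extract the located text using the start (column 1) and end (column 2) positions
stri_sub(x, first_sent[, 1], first_sent[, 2])
stri_sub(x, last_sent[, 1], last_sent[, 2])
```

ICU's sentence rules keep the whitespace after a sentence terminator with the preceding sentence, so the first extracted sentence ends with a blank.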
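str | character vector or an object coercible to a character vector
omit_no_match | single logical value; if FALSE, then a string with no match yields a row of two missing values; used only by stri_locate_all_*
... | additional settings for opts_brkiter
opts_brkiter | a named list with ICU BreakIterator's settings as generated with stri_opts_brkiter; NULL for the default settings
locale | NULL or "" for text boundary analysis following the conventions of the default locale, or a single string with a locale identifier; see stringi-locale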
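Settings given through ... are meant to end up in the same BreakIterator settings list that opts_brkiter holds. A minimal sketch checking that both spellings agree, assuming stringi is installed; the sample text and variable names are illustrative.

```r
library(stringi)

x <- "Warm thanks to their developers."

# Break iterator settings passed directly through `...`
via_dots <- stri_locate_all_boundaries(x, type = "word")

# The same settings built explicitly with stri_opts_brkiter()
via_opts <- stri_locate_all_boundaries(
  x, opts_brkiter = stri_opts_brkiter(type = "word")
)

identical(via_dots, via_opts)
```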
Vectorized over str. For more information on the text boundary analysis performed by ICU's BreakIterator, see stringi-search-boundaries.
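Because the functions are vectorized over str, a character vector yields one result element per string. A short sketch, assuming stringi is installed; the input vector is illustrative.

```r
library(stringi)

x <- c("one two three", "four", "five six")

# One list element (a start/end matrix) per input string
res <- stri_locate_all_words(x)
length(res) == length(x)

# Number of located words in each string
vapply(res, nrow, integer(1))
```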
In the case of stri_locate_*_words, just like in stri_extract_all_words and stri_count_words, ICU's word BreakIterator is used to locate word boundaries, and all non-word characters (UBRK_WORD_NONE rule status) are ignored. This function is equivalent to a call to stri_locate_*_boundaries(str, type="word", skip_word_none=TRUE, locale=locale).
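A sketch of the stated equivalence, assuming stringi is installed and the default locale; the sample string is illustrative. It also shows what skip_word_none = TRUE removes.

```r
library(stringi)

x <- "Hi, you!"

words   <- stri_locate_all_words(x)
# The same call spelled out with the generic boundary locator
generic <- stri_locate_all_boundaries(x, type = "word", skip_word_none = TRUE)
identical(words, generic)

# Without skip_word_none, punctuation and spaces get their own spans as well
stri_locate_all_boundaries(x, type = "word")[[1]]
```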
For stri_locate_all_*, a list of length(str) integer matrices is returned. The first column gives the start positions of the substrings between located boundaries, and the second column gives the end positions. The indices are code point-based, so they may be passed, e.g., to the stri_sub function.
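A minimal sketch showing that the positions count code points rather than bytes and can be passed to stri_sub, assuming stringi is installed; the accented sample text is illustrative.

```r
library(stringi)

x <- "caf\u00e9 cr\u00e8me"  # "café crème", with precomposed accented letters

loc <- stri_locate_all_words(x)[[1]]
loc  # code point positions: 1-4 and 6-10

# Hand the start and end columns to stri_sub() to extract the words
stri_sub(x, loc[, 1], loc[, 2])

# In UTF-8 each accented letter takes 2 bytes, so byte offsets would differ
nchar(x, type = "bytes")
```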
A row of two NAs indicates either no match (if omit_no_match is FALSE) or an NA input string.
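A sketch of the two ways a missing result can appear, assuming stringi is installed; the input vector (a string of blanks that contains no words, and an NA) is illustrative.

```r
library(stringi)

x <- c("two words", "   ", NA)

# Default: both the no-match string and the NA string give a single row of NAs
def <- stri_locate_all_words(x)

# With omit_no_match = TRUE the no-match case becomes a 0-row matrix,
# while the NA input still yields a row of NAs
omit <- stri_locate_all_words(x, omit_no_match = TRUE)

def
omit
```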
stri_locate_first_* and stri_locate_last_*, on the other hand, return an integer matrix with two columns and one row per element of str, giving the start and end positions of the first or the last match, respectively, and a row of two NAs when no match is found (or the input is NA).
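A short sketch of the matrix returned by the first/last variants, assuming stringi is installed; the sample strings are illustrative, and the blank-only string shows the no-match row.

```r
library(stringi)

x <- c("alpha beta gamma", "delta", "   ")

# One row per element of x; the last row is NA because "   " contains no words
first <- stri_locate_first_words(x)
last  <- stri_locate_last_words(x)
first
last

# The matrices can be used to extract the first and last word of each string
stri_sub(x, first[, 1], first[, 2])
stri_sub(x, last[, 1], last[, 2])
```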
Other search_locate: stri_locate_all, stringi-search
Other indexing: stri_locate_all, stri_sub
Other locale_sensitive: %s<%, stri_compare, stri_count_boundaries, stri_duplicated, stri_enc_detect2, stri_extract_all_boundaries, stri_opts_collator, stri_order, stri_split_boundaries, stri_trans_tolower, stri_unique, stri_wrap, stringi-locale, stringi-search-boundaries, stringi-search-coll
Other text_boundaries: stri_count_boundaries, stri_extract_all_boundaries, stri_opts_brkiter, stri_split_boundaries, stri_split_lines, stri_trans_tolower, stri_wrap, stringi-search-boundaries, stringi-search
test <- "The\u00a0above-mentioned features are very useful. Warm thanks to their developers."
stri_locate_all_boundaries(test, type="line")
stri_locate_all_boundaries(test, type="word")
stri_locate_all_boundaries(test, type="sentence")
stri_locate_all_boundaries(test, type="character")
stri_locate_all_words(test)

stri_extract_all_boundaries("Mr. Jones and Mrs. Brown are very happy. So am I, Prof. Smith.",
   type="sentence", locale="en_US@ss=standard") # ICU >= 56 only