stri_split {stringi} | R Documentation
These functions split each element of str into substrings. pattern indicates delimiters that separate the input into tokens. The input data between the matches become the fields themselves.
stri_split(str, ..., regex, fixed, coll, charclass)

stri_split_fixed(str, pattern, n = -1L, omit_empty = FALSE,
  tokens_only = FALSE, simplify = FALSE, ..., opts_fixed = NULL)

stri_split_regex(str, pattern, n = -1L, omit_empty = FALSE,
  tokens_only = FALSE, simplify = FALSE, ..., opts_regex = NULL)

stri_split_coll(str, pattern, n = -1L, omit_empty = FALSE,
  tokens_only = FALSE, simplify = FALSE, ..., opts_collator = NULL)

stri_split_charclass(str, pattern, n = -1L, omit_empty = FALSE,
  tokens_only = FALSE, simplify = FALSE)
str | character vector with strings to search in
... | supplementary arguments passed to the underlying functions, including additional settings for opts_collator, opts_fixed, and opts_regex
pattern, regex, fixed, coll, charclass | character vector defining search patterns; for more details refer to stringi-search
n | integer vector, maximal number of strings to return, and, at the same time, maximal number of text boundaries to look for
omit_empty | logical vector; determines whether empty tokens should be removed from the result (TRUE or FALSE) or replaced with missing strings (NA)
tokens_only | single logical value; may affect the result if n is positive
simplify | single logical value; if FALSE (the default), a list of character vectors is returned; if TRUE or NA, a character matrix is returned
opts_collator, opts_fixed, opts_regex | a named list used to tune up a search engine's settings
Vectorized over str, pattern, n, and omit_empty.
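The element-wise pairing and recycling implied by this vectorization can be sketched as follows. This is a minimal illustration that assumes the stringi package is installed; the input strings and the variable names by_pattern and recycled are made up for the example.

```r
library(stringi)

# One pattern per input string: element i of str is split on element i of pattern.
by_pattern <- stri_split_fixed(c("a_b", "c-d"), c("_", "-"))
print(by_pattern)   # list(c("a", "b"), c("c", "d"))

# A single string is recycled against two patterns, so two results come back.
recycled <- stri_split_fixed("a_b-c", c("_", "-"))
print(recycled)     # list(c("a", "b-c"), c("a_b", "c"))
```

The result always has one list element per recycled combination, so its length follows the longest of the vectorized arguments.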
If n is negative, then all pieces are extracted. Otherwise, if tokens_only is FALSE (this is the default, for compatibility with the stringr package), then n-1 tokens are extracted (if possible) and the n-th string gives the remainder (see Examples). On the other hand, if tokens_only is TRUE, then only full tokens (up to n pieces) are extracted.
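A short sketch of how n and tokens_only interact, assuming the stringi package is installed. The input string matches the one used in the Examples; the variable names are illustrative only.

```r
library(stringi)

x <- "a_b_c__d"

# Negative n (the default, -1L): every piece is returned.
all_pieces <- stri_split_fixed(x, "_")[[1]]
print(all_pieces)   # "a" "b" "c" ""  "d"

# n = 2, tokens_only = FALSE: one token, then the unsplit remainder.
with_rest <- stri_split_fixed(x, "_", n = 2)[[1]]
print(with_rest)    # "a"      "b_c__d"

# n = 2, tokens_only = TRUE: at most two full tokens, the rest is dropped.
tokens <- stri_split_fixed(x, "_", n = 2, tokens_only = TRUE)[[1]]
print(tokens)       # "a" "b"
```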
omit_empty is applied during the split process: if it is set to TRUE, then tokens of zero length are ignored. Thus, empty strings will never appear in the resulting vector. On the other hand, if omit_empty is NA, then empty tokens are substituted with missing strings.
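The three settings of omit_empty can be compared side by side. This is a minimal sketch that assumes the stringi package is installed; the variable names kept, dropped, and as_na are illustrative.

```r
library(stringi)

x <- "a_b_c__d"   # the doubled "_" produces one zero-length token

kept    <- stri_split_fixed(x, "_", omit_empty = FALSE)[[1]]
dropped <- stri_split_fixed(x, "_", omit_empty = TRUE)[[1]]
as_na   <- stri_split_fixed(x, "_", omit_empty = NA)[[1]]

print(kept)     # "a" "b" "c" ""  "d"
print(dropped)  # "a" "b" "c" "d"
print(as_na)    # "a" "b" "c" NA  "d"
```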
Empty search patterns are not supported. If you would like to split a string into individual characters, use e.g. stri_split_boundaries(str, type="character") for THE Unicode way.
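A minimal sketch of the suggested alternative, assuming the stringi package is installed. The second input, which contains a combining accent, is an illustrative assumption chosen to show that the split follows user-perceived characters rather than individual code points.

```r
library(stringi)

# Plain ASCII input: one element per character.
chars <- stri_split_boundaries("abc", type = "character")[[1]]
print(chars)   # "a" "b" "c"

# "e" followed by U+0301 (combining acute accent) stays together as one piece.
clusters <- stri_split_boundaries("ae\u0301b", type = "character")[[1]]
print(length(clusters))   # 3
```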
stri_split is a convenience function. It calls either stri_split_regex, stri_split_fixed, stri_split_coll, or stri_split_charclass, depending on the argument used. Relying on one of those underlying functions will make your code run slightly faster.
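The dispatch can be illustrated with a small sketch, assuming the stringi package is installed. The inputs and variable names are made up for the example.

```r
library(stringi)

# The named argument (fixed=, regex=, coll=, charclass=) selects the engine.
via_wrapper <- stri_split("a_b_c", fixed = "_")
direct      <- stri_split_fixed("a_b_c", "_")
print(identical(via_wrapper, direct))   # TRUE

# The same wrapper with a regular expression pattern.
by_regex <- stri_split("a1b22c", regex = "[0-9]+")[[1]]
print(by_regex)   # "a" "b" "c"
```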
If simplify=FALSE (the default), then the functions return a list of character vectors.
Otherwise, stri_list2matrix with byrow=TRUE and n_min=n arguments is called on the resulting object. In such a case, a character matrix with an appropriate number of rows (according to the length of str, pattern, etc.) is returned. Note that stri_list2matrix's fill argument is set to an empty string and NA, for simplify equal to TRUE and NA, respectively.
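The three return shapes can be compared in a short sketch, assuming the stringi package is installed. The input vector and variable names are illustrative; the first string yields two tokens and the second yields three, so the first matrix row needs one padding cell.

```r
library(stringi)

x <- c("ab,c", "d,ef,g")

# simplify = FALSE (default): a list with one character vector per input string.
as_list <- stri_split_fixed(x, ",")
print(as_list)

# simplify = TRUE: a character matrix, short rows padded with "".
m_empty <- stri_split_fixed(x, ",", simplify = TRUE)
print(m_empty)

# simplify = NA: the same matrix shape, short rows padded with NA.
m_na <- stri_split_fixed(x, ",", simplify = NA)
print(m_na)
```

Both matrices here have 2 rows and 3 columns; only the padding value in row 1, column 3 differs.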
Other search_split: stri_split_boundaries, stri_split_lines, stringi-search
stri_split_fixed("a_b_c_d", "_")
stri_split_fixed("a_b_c__d", "_")
stri_split_fixed("a_b_c__d", "_", omit_empty=TRUE)
stri_split_fixed("a_b_c__d", "_", n=2, tokens_only=FALSE) # "a" & remainder
stri_split_fixed("a_b_c__d", "_", n=2, tokens_only=TRUE) # "a" & "b" only
stri_split_fixed("a_b_c__d", "_", n=4, omit_empty=TRUE, tokens_only=TRUE)
stri_split_fixed("a_b_c__d", "_", n=4, omit_empty=FALSE, tokens_only=TRUE)
stri_split_fixed("a_b_c__d", "_", omit_empty=NA)
stri_split_fixed(c("ab_c", "d_ef_g", "h", ""), "_", n=1, tokens_only=TRUE, omit_empty=TRUE)
stri_split_fixed(c("ab_c", "d_ef_g", "h", ""), "_", n=2, tokens_only=TRUE, omit_empty=TRUE)
stri_split_fixed(c("ab_c", "d_ef_g", "h", ""), "_", n=3, tokens_only=TRUE, omit_empty=TRUE)

stri_list2matrix(stri_split_fixed(c("ab,c", "d,ef,g", ",h", ""), ",", omit_empty=TRUE))
stri_split_fixed(c("ab,c", "d,ef,g", ",h", ""), ",", omit_empty=FALSE, simplify=TRUE)
stri_split_fixed(c("ab,c", "d,ef,g", ",h", ""), ",", omit_empty=NA, simplify=TRUE)
stri_split_fixed(c("ab,c", "d,ef,g", ",h", ""), ",", omit_empty=TRUE, simplify=TRUE)
stri_split_fixed(c("ab,c", "d,ef,g", ",h", ""), ",", omit_empty=NA, simplify=NA)

stri_split_regex(c("ab,c", "d,ef , g", ", h", ""),
   "\\p{WHITE_SPACE}*,\\p{WHITE_SPACE}*", omit_empty=NA, simplify=TRUE)

stri_split_charclass("Lorem ipsum dolor sit amet", "\\p{WHITE_SPACE}")
stri_split_charclass(" Lorem ipsum dolor", "\\p{WHITE_SPACE}", n=3,
   omit_empty=c(FALSE, TRUE))

stri_split_regex("Lorem ipsum dolor sit amet",
   "\\p{Z}+") # see also stri_split_charclass