stri_opts_collator {stringi} | R Documentation
A convenience function to tune the ICU Collator's behavior, e.g. in stri_compare, stri_order, stri_unique, stri_duplicated, as well as stri_detect_coll and other stringi-search-coll functions.
stri_opts_collator(locale = NULL, strength = 3L, alternate_shifted = FALSE, french = FALSE, uppercase_first = NA, case_level = FALSE, normalization = FALSE, numeric = FALSE, ...)
locale | single string,
strength | single integer in {1,2,3,4}, which defines collation strength;
alternate_shifted | single logical value;
french | single logical value; used in Canadian French;
uppercase_first | single logical value;
case_level | single logical value; controls whether an extra case level (positioned before the third level) is generated or not
normalization | single logical value; if
numeric | single logical value; when turned on, this attribute generates a collation key for the numeric value of substrings of digits; this is a way to get '100' to sort AFTER '2'
... | any other arguments to this function are purposely ignored
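As a rough illustration of the numeric argument, the sketch below sorts a few strings with and without numeric collation. The input vector is made up for this example, and the options are passed through the opts_collator argument of stri_sort.

```r
library(stringi)

# Hypothetical input: strings that share a prefix and end in a number
x <- c("number100", "number2", "number10")

# Default collation compares the digits character by character
plain <- stri_sort(x)

# numeric = TRUE compares runs of digits by their numeric value
natural <- stri_sort(x, opts_collator = stri_opts_collator(numeric = TRUE))

print(plain)
print(natural)
```

With numeric collation turned on, "number2" is placed before "number10" and "number100".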
ICU's collator performs a locale-aware, natural-language-like string comparison. This is a more reliable way of establishing relationships between strings than that provided by base R, and one that is more sophisticated and appropriate than ordinary byte comparison.
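To make the locale-aware behavior concrete, here is a minimal sketch comparing the same pair of strings under two locales. It assumes the ICU build in use ships collation data for Polish (pl_PL) and Slovak (sk_SK); the example words are chosen only for illustration.

```r
library(stringi)

# In Polish, "c" sorts before "h", so "chladny" precedes "hladny"
res_pl <- stri_cmp("hladny", "chladny",
                   opts_collator = stri_opts_collator(locale = "pl_PL"))

# In Slovak, "ch" is treated as a single letter that sorts after "h"
res_sk <- stri_cmp("hladny", "chladny",
                   opts_collator = stri_opts_collator(locale = "sk_SK"))

print(c(polish = res_pl, slovak = res_sk))
```

The two results should differ in sign, which a plain byte comparison cannot reproduce.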
A note on collation strength: generally, strength set to 4 is the least permissive. Set to 2 to ignore case differences. Set to 1 to also ignore diacritical differences.
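The effect of the strength levels described above can be sketched as follows. The test strings are arbitrary; a result of 0 from stri_cmp means the two strings are considered equivalent under the given options.

```r
library(stringi)

# strength = 3 (the default): case differences are significant
case_s3 <- stri_cmp("hladny", "HLADNY",
                    opts_collator = stri_opts_collator(strength = 3L))

# strength = 2: case differences are ignored
case_s2 <- stri_cmp("hladny", "HLADNY",
                    opts_collator = stri_opts_collator(strength = 2L))

# strength = 2: diacritical differences still matter
accent_s2 <- stri_cmp("hladny", "hladn\u00FD",
                      opts_collator = stri_opts_collator(strength = 2L))

# strength = 1: diacritical differences are ignored as well
accent_s1 <- stri_cmp("hladny", "hladn\u00FD",
                      opts_collator = stri_opts_collator(strength = 1L))

print(c(case_s3 = case_s3, case_s2 = case_s2,
        accent_s2 = accent_s2, accent_s1 = accent_s1))
```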
The strings are Unicode-normalized before the comparison.
Returns a named list object; missing settings are left with default values.
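Because the return value is an ordinary list, one possible pattern is to build the options once and reuse them across several calls. The sketch below assumes nothing beyond the arguments listed above; the item names are invented for the example.

```r
library(stringi)

# Build the settings once: ignore case, compare digit runs numerically
opts <- stri_opts_collator(strength = 2L, numeric = TRUE)

# The result is a plain named list that can be inspected
str(opts)

# Reuse the same settings in different collation-aware functions
items <- c("Item10", "item9", "ITEM2")
sorted <- stri_sort(items, opts_collator = opts)
cmp <- stri_cmp("Item10", "item9", opts_collator = opts)

print(sorted)
print(cmp)
```

Keeping the settings in one object helps ensure that sorting, comparison, and deduplication all follow the same rules.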
Collation – ICU User Guide, http://userguide.icu-project.org/collation
ICU Collation Service Architecture – ICU User Guide, http://userguide.icu-project.org/collation/architecture
icu::Collator Class Reference – ICU4C API Documentation, http://www.icu-project.org/apiref/icu4c/classicu_1_1Collator.html
Other locale_sensitive: %s<%, stri_compare, stri_count_boundaries, stri_duplicated, stri_enc_detect2, stri_extract_all_boundaries, stri_locate_all_boundaries, stri_order, stri_split_boundaries, stri_trans_tolower, stri_unique, stri_wrap, stringi-locale, stringi-search-boundaries, stringi-search-coll
Other search_coll: stringi-search-coll, stringi-search
stri_cmp("number100", "number2")
stri_cmp("number100", "number2", opts_collator=stri_opts_collator(numeric=TRUE))
stri_cmp("number100", "number2", numeric=TRUE) # equivalent
stri_cmp("above mentioned", "above-mentioned")
stri_cmp("above mentioned", "above-mentioned", alternate_shifted=TRUE)