stri_duplicated {stringi}    R Documentation
stri_duplicated() determines which strings in a character vector are duplicates of other elements.
stri_duplicated_any() determines whether there are any duplicated strings in a character vector.
stri_duplicated(str, fromLast = FALSE, ..., opts_collator = NULL)
stri_duplicated_any(str, fromLast = FALSE, ..., opts_collator = NULL)
str           | a character vector
fromLast      | a single logical value indicating whether duplication should be considered from the reverse side
...           | additional settings for opts_collator
opts_collator | a named list with ICU Collator's options as generated with stri_opts_collator; NULL for the default collation options
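The collator options decide what counts as a duplicate. The sketch below compares ICU's usual strength levels (1 = base letters only, 2 = base letters and accents, 3 = base letters, accents and case). The "en" locale is fixed explicitly as an assumption so the result does not depend on the session's default locale; the vector x is purely illustrative.

```r
library("stringi")

# Lower case, upper case, and "ábc" with an acute accent
x <- c("abc", "ABC", "\u00e1bc")

# Strength 3 (the default): case and accents both matter, so nothing is a duplicate
dup3 <- stri_duplicated(x, opts_collator = stri_opts_collator(locale = "en", strength = 3))

# Strength 2: case differences are ignored, accent differences still count
dup2 <- stri_duplicated(x, opts_collator = stri_opts_collator(locale = "en", strength = 2))

# Strength 1: both case and accent differences are ignored
dup1 <- stri_duplicated(x, opts_collator = stri_opts_collator(locale = "en", strength = 1))

print(rbind(dup3, dup2, dup1))
```

As in the Examples section, a single setting such as strength = 1 can also be passed directly through `...` instead of building the list with stri_opts_collator().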
Missing values are regarded as equal.

Unlike duplicated and anyDuplicated, these functions test for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such operations are locale-dependent. Hence, stri_duplicated and stri_duplicated_any are significantly slower (but much better suited for natural language processing) than their base R counterparts.
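To see the difference from bytewise comparison, the sketch below uses the precomposed letter "ą" (U+0105) and its decomposed form, "a" followed by a combining ogonek (U+0328). Base R compares the two encodings and finds them different, while stri_duplicated() treats them as canonically equivalent. Normalizing to NFC with stri_trans_nfc() first lets base duplicated() reach the same verdict for this particular pair, although normalization alone does not reproduce collator-based matching in general. The "en" locale is again an assumption made for reproducibility.

```r
library("stringi")

precomposed <- "\u0105"    # "ą" as a single code point
decomposed  <- "a\u0328"   # "a" followed by a combining ogonek
x <- c(precomposed, decomposed)

# Base R: compares the raw encodings
bytewise <- duplicated(x)

# stringi: compares strings via the ICU collator
canonical <- stri_duplicated(x, opts_collator = stri_opts_collator(locale = "en"))

# Normalize to NFC first, then compare bytewise
nfc_based <- duplicated(stri_trans_nfc(x))

print(rbind(bytewise, canonical, nfc_based))
```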
See also stri_unique for extracting unique elements.
stri_duplicated() returns a logical vector of the same length as str. Each of its elements indicates whether a canonically equivalent string was already found in str (scanning from the end if fromLast = TRUE).
stri_duplicated_any() returns a single non-negative integer. A value of 0 indicates that all the elements in str are unique. Otherwise, it gives the index of the first non-unique element.
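The two return values are closely related. The sketch below uses a hypothetical helper, first_duplicate_index() (not part of stringi), which takes the position of the first TRUE returned by stri_duplicated(), or 0 if there is none. It is meant only to illustrate the documented meaning of stri_duplicated_any() for the default fromLast = FALSE.

```r
library("stringi")

# Hypothetical helper, not part of stringi: index of the first element that
# stri_duplicated() flags as a duplicate, or 0L if all elements are unique.
# Only covers the default scanning direction (fromLast = FALSE).
first_duplicate_index <- function(str) {
  dups <- which(stri_duplicated(str))
  if (length(dups) > 0) dups[1] else 0L
}

x <- c("a", "b", "a", NA, "a", NA)
first_duplicate_index(x)            # 3: the second "a"
stri_duplicated_any(x)              # expected to agree with the helper
first_duplicate_index(c("a", "b"))  # 0: all elements are unique
```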
Collation - ICU User Guide, http://userguide.icu-project.org/collation
Other locale_sensitive: %s<%, stri_compare, stri_count_boundaries, stri_enc_detect2, stri_extract_all_boundaries, stri_locate_all_boundaries, stri_opts_collator, stri_order, stri_split_boundaries, stri_trans_tolower, stri_unique, stri_wrap, stringi-locale, stringi-search-boundaries, stringi-search-coll
library("stringi")

# In the following examples, there are 3 duplicated values:
# "a" twice and NA once
stri_duplicated(c("a", "b", "a", NA, "a", NA))
stri_duplicated(c("a", "b", "a", NA, "a", NA), fromLast = TRUE)
stri_duplicated_any(c("a", "b", "a", NA, "a", NA))

# compare the results:
stri_duplicated(c("\u0105", stri_trans_nfkd("\u0105")))
duplicated(c("\u0105", stri_trans_nfkd("\u0105")))

stri_duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"), strength = 1)
duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"))