stri_encode {stringi} | R Documentation |
These functions convert a character vector between encodings.
stri_encode(str, from = NULL, to = NULL, to_raw = FALSE)
stri_conv(str, from = NULL, to = NULL, to_raw = FALSE)
str | a character vector, a raw vector, or a list of raw vectors
from | input encoding
to | target encoding
to_raw | a single logical value; indicates whether a list of raw vectors should be returned rather than a character vector
stri_conv is an alias for stri_encode.
These two functions aim to replace R's iconv. They are not only faster, but also work in the same manner on all platforms.
Please refer to stri_enc_list for the list of supported encodings and to stringi-encoding for a general discussion.
If str is a character vector and from is either missing, "", or NULL, then the declared encodings are used (see stri_enc_mark); in such a case, bytes-declared strings are disallowed. Otherwise, the internal encoding declarations are ignored and a converter selected with from is used.
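The two ways of selecting the input converter can be compared in a minimal sketch, assuming the stringi package is installed. The variable names are illustrative only.

```r
library(stringi)

# Build the bytes of "cafe" with e-acute as encoded in ISO-8859-1 (0xe9),
# then declare the string's encoding so that R knows how to read it.
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(x) <- "latin1"

# from omitted: the declared mark (latin1) determines the input encoding
y1 <- stri_encode(x, to = "UTF-8")

# from given: the declared mark is ignored and the named converter is used
y2 <- stri_encode(x, from = "ISO-8859-1", to = "UTF-8")

identical(y1, y2)   # both routes should yield the same UTF-8 string
charToRaw(y1)       # e-acute is now the two-byte sequence c3 a9
```

Passing from explicitly is mainly useful when the declared mark is missing or known to be wrong.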
On the other hand, if str is a raw vector or a list of raw vectors, the input encoding is assumed to be the current default encoding as given by stri_enc_get.
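As a rough illustration of raw input, the sketch below passes a raw vector and a list of raw vectors without specifying from. It uses ASCII-only bytes on the assumption that they decode identically under any common default encoding.

```r
library(stringi)

stri_enc_get()                      # the default encoding applied to raw input

bytes <- charToRaw("hello")         # ASCII-only bytes
one <- stri_encode(bytes, to = "UTF-8")

# A list of raw vectors is converted element by element.
many <- stri_encode(list(charToRaw("a"), charToRaw("bc")), to = "UTF-8")

one
many
```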
For to_raw=FALSE, the output strings always have their encodings marked according to the target converter used (as specified by to) and the current default Encoding (ASCII, latin1, UTF-8, native, or bytes in all other cases).
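The marks on the output can be inspected with base R's Encoding(). The following is a small sketch with an illustrative input string; the exact mark reported for a given target may depend on the platform's default encoding, so no expected marks are shown.

```r
library(stringi)

x <- "caf\u00e9"                          # UTF-8 marked input (illustrative)

u <- stri_encode(x, to = "UTF-8")
l <- stri_encode(x, to = "ISO-8859-1")

Encoding(u)      # mark attached to the UTF-8 result
Encoding(l)      # mark attached to the ISO-8859-1 result
charToRaw(l)     # underlying bytes: e-acute is the single byte e9
```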
Note that problems may occur if to indicates, e.g., UTF-16 or UTF-32, as the output strings may have embedded NULs. In such cases, use to_raw=TRUE and consider specifying a byte order mark (BOM) for portability reasons (e.g., set UTF-16 or UTF-32, which automatically adds BOMs).
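The UTF-16 advice above can be sketched as follows. The byte order produced by the plain UTF-16 converter is assumed to be platform-dependent, so only the little-endian bytes are spelled out.

```r
library(stringi)

x <- "abc"

# to_raw = TRUE returns a list with one raw vector per input string,
# which avoids embedded NULs ending up in an R character string.
with_bom <- stri_encode(x, to = "UTF-16", to_raw = TRUE)[[1]]
no_bom   <- stri_encode(x, to = "UTF-16LE", to_raw = TRUE)[[1]]

no_bom                              # 61 00 62 00 63 00
with_bom                            # same text, preceded by a 2-byte BOM
length(with_bom) - length(no_bom)   # size of the BOM in bytes
```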
Note that stri_encode(as.raw(data), "encodingname") is a wise substitute for rawToChar.
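To illustrate why this can be preferable to rawToChar, the sketch below decodes ISO-8859-2 bytes. The byte values and variable names are chosen for this example only.

```r
library(stringi)

# The Polish word "zazolc" with diacritics, encoded in ISO-8859-2.
data <- c(0x7a, 0x61, 0xbf, 0xf3, 0xb3, 0xe6)

# rawToChar() copies the bytes verbatim and leaves the result unmarked,
# so R cannot tell which encoding the bytes are in.
s_raw <- rawToChar(as.raw(data))
Encoding(s_raw)

# stri_encode() decodes the bytes with the named converter; here the
# result is requested in UTF-8 explicitly.
s <- stri_encode(as.raw(data), "ISO-8859-2", "UTF-8")
s
Encoding(s)
```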
In the current version of stringi, if an incorrect code point is found on input, it is replaced by the default (for that target encoding) substitute character and a warning is generated.
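The substitution behaviour can be observed with a deliberately malformed input. This is a sketch only: it assumes that a byte which is invalid in the source encoding triggers the warning described above, and it does not rely on which substitute character is emitted.

```r
library(stringi)

# 0xff can never occur in well-formed UTF-8.
bad <- as.raw(c(0x61, 0xff, 0x62))

warned <- FALSE
res <- withCallingHandlers(
  stri_encode(bad, from = "UTF-8", to = "UTF-8"),
  warning = function(w) {
    warned <<- TRUE                  # record that a warning was raised
    invokeRestart("muffleWarning")   # continue and keep the result
  }
)

warned           # whether the conversion emitted a warning
charToRaw(res)   # bytes of the result, including the substitute character
```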
If to_raw is FALSE, then a character vector with encoded strings (and sensible encoding marks) is returned. Otherwise, a list of raw vectors is produced.
Conversion – ICU User Guide, http://userguide.icu-project.org/conversion
Converters – ICU User Guide, http://userguide.icu-project.org/conversion/converters (technical details)
Other encoding_conversion: stri_enc_fromutf32, stri_enc_toascii, stri_enc_tonative, stri_enc_toutf32, stri_enc_toutf8, stringi-encoding