| utf8_nchar {cli} | R Documentation |
By default, utf8_nchar() counts Unicode grapheme clusters instead of code points.
utf8_nchar(x, type = c("chars", "bytes", "width", "graphemes", "codepoints"))
| x | Character vector; it is converted to UTF-8. |
| type | Whether to count graphemes (characters), code points, or bytes, or to calculate the display width of the string. |
Numeric vector, the length of each string in x, measured in the unit selected by type.
Other UTF-8 string manipulation:
utf8_graphemes(),
utf8_substr()
# Grapheme example, emoji with combining characters. This is a single
# grapheme, consisting of five Unicode code points:
# * `\U0001f477` is the construction worker emoji
# * `\U0001f3fb` is an emoji modifier that changes the skin color
# * `\u200d` is the zero width joiner
# * `\u2640` is the female sign
# * `\ufe0f` is variation selector 16, requesting an emoji style glyph
emo <- "\U0001f477\U0001f3fb\u200d\u2640\ufe0f"
cat(emo)

utf8_nchar(emo, "chars") # = graphemes
utf8_nchar(emo, "bytes")
utf8_nchar(emo, "width")
utf8_nchar(emo, "codepoints")

# For comparison, the output for width depends on the R version used:
nchar(emo, "chars")
nchar(emo, "bytes")
nchar(emo, "width")
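
The function can also be applied to a vector with more than one element. The following is a minimal sketch, assuming the cli package is installed and attached. The input strings are illustrative and not part of the original examples: a plain ASCII string, and the letter "e" followed by the combining acute accent U+0301.

```r
library(cli)

# Illustrative input (an assumption, not from the original page):
# "abc" is plain ASCII; "e\u0301" is "e" plus U+0301 COMBINING ACUTE ACCENT.
x <- c("abc", "e\u0301")

# One result per element of x, in the unit selected by `type`.
graphemes  <- utf8_nchar(x, "graphemes")
codepoints <- utf8_nchar(x, "codepoints")
bytes      <- utf8_nchar(x, "bytes")

print(graphemes)
print(codepoints)
print(bytes)
```

The accented string forms a single grapheme cluster, although it is made of two code points and occupies three bytes in UTF-8, so the three counts differ for the second element and agree for the first.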