| rm_non_ascii {qdapRegex} | R Documentation |
Remove, replace, or extract non-ASCII substrings from a string. This is the template used by
other qdapRegex rm_XXX functions.
rm_non_ascii(text.var, trim = !extract, clean = TRUE,
pattern = "@rm_non_ascii", replacement = "", extract = FALSE,
dictionary = getOption("regex.library"), ascii.out = TRUE, ...)
ex_non_ascii(text.var, trim = !extract, clean = TRUE,
pattern = "@rm_non_ascii", replacement = "", extract = TRUE,
dictionary = getOption("regex.library"), ascii.out = TRUE, ...)
| text.var | The text variable. |
| trim | logical. If |
| clean | trim logical. If |
| pattern | A character string containing a regular expression (or character string for |
| replacement | Replacement for matched |
| extract | logical. If |
| dictionary | A dictionary of canned regular expressions to search within if |
| ascii.out | logical. If |
| ... | ignored. |
Returns a character string with all non-ASCII characters removed.
iconv is used within rm_non_ascii, and iconv's behavior may not be
consistent across operating systems.
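To make the iconv dependency concrete, here is a minimal base R sketch of how iconv treats bytes that have no ASCII equivalent. It is an illustration only and is not taken from the rm_non_ascii source; the variable names (has_non_ascii, stripped, flagged) are assumptions made for this example. Because iconv's results can differ by platform, check the output on your own system.

```r
# Illustrative input: one plain ASCII string and one with a Latin-1 byte (\xf8).
x <- c("Hello World", "Ekstr\xf8m")
Encoding(x) <- "latin1"

# With no substitute given, an element that cannot be converted to ASCII
# is returned as NA, which offers a simple way to flag it.
has_non_ascii <- is.na(iconv(x, from = "latin1", to = "ASCII"))

# sub = "" drops each byte that cannot be converted.
stripped <- iconv(x, from = "latin1", to = "ASCII", sub = "")

# sub = "byte" replaces each such byte with its hex code in angle brackets.
flagged <- iconv(x, from = "latin1", to = "ASCII", sub = "byte")

print(has_non_ascii)
print(stripped)
print(flagged)
```

Comparing stripped and flagged on each target operating system is a quick way to spot platform differences before relying on rm_non_ascii in a pipeline.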
Stack Overflow's MrFlick, hwnd, and Tyler Rinker <tyler.rinker@gmail.com>.
Other rm_ functions: rm_abbreviation,
rm_between, rm_bracket,
rm_caps_phrase, rm_caps,
rm_citation_tex, rm_citation,
rm_city_state_zip,
rm_city_state, rm_date,
rm_default, rm_dollar,
rm_email, rm_emoticon,
rm_endmark, rm_hash,
rm_nchar_words, rm_non_words,
rm_number, rm_percent,
rm_phone, rm_postal_code,
rm_repeated_characters,
rm_repeated_phrases,
rm_repeated_words, rm_tag,
rm_time, rm_title_name,
rm_url, rm_white,
rm_zip
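library(qdapRegex)

## Latin-1 encoded strings, three of which contain non-ASCII characters
x <- c("Hello World", "Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x) <- "latin1"
x

## Remove non-ASCII substrings, or replace them with a flag
rm_non_ascii(x)
rm_non_ascii(x, replacement="<<FLAG>>")

## Extract the non-ASCII substrings instead of removing them
ex_non_ascii(x)
ex_non_ascii(x, ascii.out=FALSE)

## Simple regex alternative: match anything outside the range space to ~
rm_default(x, pattern="[^ -~]")
ex_default(x, pattern="[^ -~]")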