| group_str {sjmisc} | R Documentation |
This function groups elements of a string vector (character or string variable) according to the elements' distance ('similarity'). The more similar two string elements are, the more likely they are to be combined into a group.
group_str(strings, maxdist = 2, method = "lv", strict = FALSE, trim.whitespace = TRUE, remove.empty = TRUE, showProgressBar = FALSE)
strings |
Character vector with string elements. |
maxdist |
Maximum distance allowed between two string elements for them to be treated as similar or equal. |
method |
Method for distance calculation. The default is "lv". |
strict |
Logical; if |
trim.whitespace |
Logical; if |
remove.empty |
Logical; if |
showProgressBar |
Logical; if |
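To get a feel for what a given maxdist means, it can help to look at pairwise string distances directly. The sketch below uses base R's utils::adist(), which computes a generalized Levenshtein distance. It assumes that method = "lv" refers to Levenshtein distance; it is an illustration only and not the code group_str runs internally.

```r
# Base R only; utils::adist() computes (generalized) Levenshtein distance.
# Assumption: this is comparable to what method = "lv" measures.
oldstring <- c("Hello", "Helo", "Hole", "Apple",
               "Ape", "New", "Old", "System", "Systemic")

# pairwise distance matrix, labelled for readability
d <- utils::adist(oldstring)
dimnames(d) <- list(oldstring, oldstring)

# logical matrix: which pairs fall within a maximum distance of 2?
within_maxdist <- d <= 2

# distances from "Hello" to two candidates, and whether they qualify
d["Hello", c("Helo", "Hole")]
within_maxdist["Hello", c("Helo", "Hole")]
```

Raising the threshold (for example d <= 3) makes more pairs qualify, which is in line with the "larger groups" example in 'Examples'.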
A character vector where similar string elements (values) are recoded
into a new, single value. The return value has the same length as
strings, i.e. grouped elements appear multiple times, so
the count for each grouped string is still available (see 'Examples').
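The shape of the return value can be illustrated with a small base R sketch. The helper naive_group() below is hypothetical and is not part of sjmisc; it uses a simple greedy rule on utils::adist() distances, and both the grouping rule and the label format (members joined by ", ") are assumptions made for illustration. The results of group_str may differ.

```r
# Hypothetical helper, NOT sjmisc's implementation: naive greedy grouping.
naive_group <- function(strings, maxdist = 2) {
  d <- utils::adist(strings)       # pairwise Levenshtein distances
  idx <- seq_along(strings)
  group <- idx                     # every element starts as its own group
  for (i in idx) {
    if (group[i] != i) next        # already absorbed into an earlier group
    # absorb all still-ungrouped elements within maxdist of element i
    close <- which(d[i, ] <= maxdist & group == idx)
    group[close] <- i
  }
  # one label per group: its unique members joined by ", " (arbitrary choice)
  labels <- vapply(split(strings, group),
                   function(x) paste(unique(x), collapse = ", "),
                   character(1))
  # map labels back so the result has one entry per input element
  unname(labels[as.character(group)])
}

oldstring <- c("Hello", "Helo", "Hole", "Apple",
               "Ape", "New", "Old", "System", "Systemic")
grouped <- naive_group(oldstring)

# same length as the input, so counts per group remain available
length(grouped) == length(oldstring)
table(grouped)
```

Because every input element keeps its position in the output, table() on the result gives the number of original strings that fell into each group.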
# load the package that provides group_str() and frq()
library(sjmisc)
oldstring <- c("Hello", "Helo", "Hole", "Apple",
"Ape", "New", "Old", "System", "Systemic")
newstring <- group_str(oldstring)
# see result
newstring
# count for each group
table(newstring)
# print frequency tables to compare original and grouped strings
frq(oldstring)
frq(newstring)
# larger groups
newstring <- group_str(oldstring, maxdist = 3)
frq(oldstring)
frq(newstring)
# be more strict with matching pairs
newstring <- group_str(oldstring, maxdist = 3, strict = TRUE)
frq(oldstring)
frq(newstring)