| cSplit_f {splitstackshape} | R Documentation |
A variation of the concat.split family of functions designed for
large rectangular datasets. This function makes use of fread from the "data.table" package for very speedy splitting of concatenated columns of data.
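The idea of using fread as a splitter can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the function's actual source: it assumes the "data.table" package is installed, and the toy vector `x` and the temporary file `tmp` are made up for the example.

```r
library(data.table)

# Hypothetical toy column of concatenated values
x <- c("1-2", "3-4", "5-6")

# Write the values to a temporary file, one row per line
tmp <- tempfile(fileext = ".txt")
writeLines(x, tmp)

# Let fread parse the file, treating the delimiter as the field separator
parts <- fread(file = tmp, sep = "-", header = FALSE)
unlink(tmp)

parts
```

Each field ends up in its own column of the resulting data.table, which is what makes this approach fast on large, regular inputs.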
cSplit_f(indt, splitCols, sep, drop = TRUE, dotsub = "|", stripWhite = FALSE)
indt |
The input |
splitCols |
The columns that need to be split up. |
sep |
The character or characters that serve as delimiters within the columns that need to be split up. If different columns use different delimiters, enter the delimiters as a character vector. |
drop |
Logical. Should the original columns be dropped? Defaults to TRUE. |
dotsub |
The character that should be substituted as a delimiter if sep = ".". |
stripWhite |
Logical. Should whitespace be stripped before writing to the temporary file? Defaults to FALSE. |
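As a small sketch of how these arguments fit together, the call below splits two columns that use different delimiters and keeps the originals. It assumes the "splitstackshape" package is installed; the data frame `toy` and its column names are hypothetical and chosen only for illustration.

```r
library(splitstackshape)

# Hypothetical toy data: two concatenated columns with different delimiters
toy <- data.frame(id = 1:3,
                  a = c("1-2", "3-4", "5-6"),
                  b = c("x_y", "p_q", "m_n"),
                  stringsAsFactors = FALSE)

# One delimiter per entry in `splitCols`, given in the same order;
# drop = FALSE keeps the original `a` and `b` columns in the result
out <- cSplit_f(toy, splitCols = c("a", "b"), sep = c("-", "_"), drop = FALSE)
out
```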
The general concat.split functions (cSplit in particular) can handle
"unbalanced" datasets, for example, where the number of fields in a given
column differs from row to row. This function cannot: because of the
nature of fread from the "data.table" package, it does not support
unbalanced data.
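One way to check up front whether a column is balanced is to count the fields in each value. The helper below is a hypothetical sketch in base R and is not part of "splitstackshape"; the name `is_balanced` and the sample vectors are assumptions made for this example.

```r
# Hypothetical helper: TRUE when every value in `x` splits into the same
# number of fields on the literal delimiter `sep`.
# Note: strsplit() drops a trailing empty field, so "a_b_" counts as 2 fields.
is_balanced <- function(x, sep) {
  n_fields <- lengths(strsplit(as.character(x), sep, fixed = TRUE))
  length(unique(n_fields)) <= 1L
}

balanced   <- c("a_b_c", "d_e_f")
unbalanced <- c("a_b_c", "d_e")

is_balanced(balanced, "_")
is_balanced(unbalanced, "_")
```

If a check like this fails for a column, cSplit is likely the safer choice for that column.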
A data.table.
Ananda Mahto. Thanks also to Arun Srinivasan for helping to refine this function.
http://stackoverflow.com/a/19231054/1270695
## Sample data. Change `n` to larger values to test on larger data
set.seed(1)
n <- 10
mydf <- data.frame(id = sequence(n))
mydf <- within(mydf, {
  v3 <- do.call(paste, c(data.frame(matrix(sample(
    letters, n*4, TRUE), ncol = 4)), sep = "_"))
  v2 <- do.call(paste, c(data.frame(matrix(sample(
    LETTERS, n*3, TRUE), ncol = 3)), sep = "."))
  v1 <- do.call(paste, c(data.frame(matrix(sample(
    10, n*2, TRUE), ncol = 2)), sep = "-"))
})
mydf
cSplit_f(mydf, splitCols = c("v1", "v2", "v3"), sep = c("-", ".", "_"))