splitstackshape-package {splitstackshape}R Documentation

splitstackshape

Description

Stack and Reshape Datasets After Splitting Concatenated Values

Details

Package: splitstackshape
Type: Package
Version: 1.4.2
Date: 2014-10-23
License: GPL-3

Online data collection tools like Google Forms often export multiple-response questions with data concatenated in cells. The concat.split family of functions splits such data into separate cells. The package also includes functions to stack groups of columns and to reshape wide data, even when the data are "unbalanced"—something which reshape does not handle, and which melt and dcast from reshape2 do not easily handle.

Author(s)

Ananda Mahto

Maintainer: Ananda Mahto <ananda@mahto.info>

Examples

## concat.split
head(cSplit(concat.test, "Likes", drop = TRUE))

## Reshape
set.seed(1)
mydf <- data.frame(id_1 = 1:6, id_2 = c("A", "B"),
                   varA.1 = sample(letters, 6),
                   varA.2 = sample(letters, 6),
                   varA.3 = sample(letters, 6),
                   varB.2 = sample(10, 6),
                   varB.3 = sample(10, 6),
                   varC.3 = rnorm(6))
mydf
Reshape(mydf, id.vars = c("id_1", "id_2"),
        var.stubs = c("varA", "varB", "varC"))

## Stacked
Stacked(data = mydf, id.vars = c("id_1", "id_2"),
        var.stubs = c("varA", "varB", "varC"),
        sep = ".")
## Not run: 
## Processing times
set.seed(1)
Nrow <- 1000000
Ncol <- 10
mybigdf <- cbind(id = 1:Nrow, as.data.frame(matrix(rnorm(Nrow*Ncol),
                                                   nrow=Nrow)))
head(mybigdf)
dim(mybigdf)
tail(mybigdf)
A <- names(mybigdf)
names(mybigdf) <- c("id", paste("varA", 1:3, sep = "_"),
                    paste("varB", 1:4, sep = "_"),
                    paste("varC", 1:3, sep = "_"))
system.time({
   O1 <- Reshape(mybigdf, id.vars = "id",
   var.stubs = c("varA", "varB", "varC"), sep = "_")
   O1 <- O1[order(O1$id, O1$time), ]
})
system.time({
   O2 <- merged.stack(mybigdf, id.vars="id",
   var.stubs=c("varA", "varB", "varC"), sep = "_")
})
system.time({
   O3 <- Stacked(mybigdf, id.vars="id",
   var.stubs=c("varA", "varB", "varC"), sep = "_")
})
DT <- data.table(mybigdf)
system.time({
   O4 <- merged.stack(DT, id.vars="id",
   var.stubs=c("varA", "varB", "varC"), sep = "_")
})

## End(Not run)



[Package splitstackshape version 1.4.2 Index]