| cells {validate} | R Documentation |
Cell counts and differences for a series of datasets
cells(..., .list = NULL, compare = c("to_first", "sequential"))
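| Argument | Description |
|---|---|
| `...` | The datasets to compare, as data frames separated by commas. As in the examples, they can be passed as named arguments. |
| `.list` | A list of data frames to compare. |
| `compare` | How to compare the datasets: `"to_first"` (the default) compares each dataset to the first one, and `"sequential"` compares each dataset to the one before it. |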
An object of class `cellComparison`, which is an array with a few extra attributes. For each dataset it counts the total number of cells, the number of missing values, the number of altered values, and how these counts changed relative to the reference dataset selected with `compare`.
When comparing the contents of two datasets, the total number of cells in the current dataset can be partitioned as in the following figure.
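Written as identities, the partition can be sketched as follows. The category names are our own summary of the comparison (each cell of the current dataset is classified by whether it is missing there and in the reference, and whether its value changed), so treat them as assumptions rather than the figure's exact labels.

$$
\begin{aligned}
N_{\text{total}} &= N_{\text{available}} + N_{\text{missing}},\\
N_{\text{available}} &= N_{\text{still available}} + N_{\text{imputed}},\\
N_{\text{still available}} &= N_{\text{unadapted}} + N_{\text{adapted}},\\
N_{\text{missing}} &= N_{\text{still missing}} + N_{\text{removed}}.
\end{aligned}
$$

Here "available" and "missing" refer to the current dataset. Still available cells are non-missing in both datasets and split into unadapted (same value) and adapted (different value); imputed cells are missing only in the reference, and removed cells are missing only in the current dataset.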
This function computes the partition for two or more datasets, comparing each dataset either to the first one (`compare = "to_first"`, the default) or to the one immediately before it (`compare = "sequential"`).
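The choice made by `compare` amounts to picking a reference dataset for each dataset in the series. The helper below, `reference_index()`, is hypothetical and not part of validate; it also assumes that under `"sequential"` the first dataset is compared to itself, since it has no predecessor.

```r
# Hypothetical helper (not part of validate): for a series of n datasets,
# return the index of the dataset each one is compared against.
reference_index <- function(n, compare = c("to_first", "sequential")) {
  compare <- match.arg(compare)
  if (compare == "to_first") {
    rep(1L, n)                # every dataset is compared to the first one
  } else {
    c(1L, seq_len(n - 1L))    # dataset i is compared to i - 1; the first to itself (assumption)
  }
}

reference_index(3)                 # 1 1 1
reference_index(3, "sequential")   # 1 1 2
```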
This function assumes that the datasets have the same dimensions and that their rows and columns are in the same order.
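To make the partition concrete, here is a minimal base-R sketch for a single pair of datasets. `cell_partition()` is a hypothetical helper, not validate's implementation, and its category names need not match the row names of a `cellComparison` object. Like `cells()`, it assumes equal dimensions and identically ordered rows and columns; it additionally assumes that matching columns have the same type.

```r
# Hypothetical helper (not part of validate): partition the cells of `cur`
# relative to the reference `ref`.
cell_partition <- function(ref, cur) {
  # Same assumption as cells(): equal dimensions, same row and column order.
  stopifnot(identical(dim(ref), dim(cur)))
  ref_na <- is.na(ref)  # is.na() on a data frame returns a logical matrix
  cur_na <- is.na(cur)
  both_available <- !ref_na & !cur_na
  # Cells available in both sets whose value differs, compared per column so
  # each column keeps its own type. FALSE & NA is FALSE, so NAs drop out.
  adapted <- sum(vapply(seq_along(cur), function(j) {
    sum(both_available[, j] & ref[[j]] != cur[[j]])
  }, numeric(1)))
  c(
    total           = length(cur_na),
    available       = sum(!cur_na),
    still_available = sum(both_available),
    unadapted       = sum(both_available) - adapted,
    adapted         = adapted,
    imputed         = sum(ref_na & !cur_na),
    missing         = sum(cur_na),
    still_missing   = sum(ref_na & cur_na),
    removed         = sum(!ref_na & cur_na)
  )
}

# Small made-up example: one value imputed, one changed, one removed.
ref <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA), stringsAsFactors = FALSE)
cur <- data.frame(x = c(1, 2, 4), y = c("a", NA, NA), stringsAsFactors = FALSE)
p <- cell_partition(ref, cur)
p
```

Counting column by column keeps numbers compared as numbers and strings as strings, instead of coercing the whole data frame to a single matrix type, which could change how unchanged values are formatted.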
The figure is reproduced from M.P.J. van der Loo and E. De Jonge (2018), Statistical Data Cleaning with Applications in R, John Wiley & Sons.
Other comparing: `as.data.frame,cellComparison-method`, `as.data.frame,validatorComparison-method`, `barplot,cellComparison-method`, `barplot,validatorComparison-method`, `compare()`, `match_cells()`, `plot,cellComparison-method`, `plot,validatorComparison-method`.
```r
library(validate)
data(retailers)

# start with raw data
step0 <- retailers

# impute turnovers
step1 <- step0
step1$turnover[is.na(step1$turnover)] <- mean(step1$turnover, na.rm = TRUE)

# flip sign of negative revenues
step2 <- step1
step2$other.rev <- abs(step2$other.rev)

# create an overview of differences, comparing to the previous step
cells(raw = step0, imputed = step1, flipped = step2, compare = "sequential")

# create an overview of differences compared to raw data
out <- cells(raw = step0, imputed = step1, flipped = step2)
out

# graphical overview of the changes
plot(out)
barplot(out)

# transform data to data.frame (easy for use with ggplot)
as.data.frame(out)
```