unique.matrix {uniqueAtomMat} | R Documentation |
These S3 methods are alternative (typically much faster) implementations of counterparts in the base package for atomic matrices.
unique.matrix returns a matrix with duplicated rows (or columns) removed.
duplicated.matrix returns a logical vector indicating which rows (or columns) are duplicated.
anyDuplicated.matrix returns an integer indicating the index of the first duplicate row (or column) if any, and 0L otherwise.
## S3 method for class 'matrix'
unique(x, incomparables = FALSE, MARGIN = 1, fromLast = FALSE, signif = Inf, ...)

## S3 method for class 'matrix'
duplicated(x, incomparables = FALSE, MARGIN = 1, fromLast = FALSE, signif = Inf, ...)

## S3 method for class 'matrix'
anyDuplicated(x, incomparables = FALSE, MARGIN = 1, fromLast = FALSE, signif = Inf, ...)
x |
an atomic matrix of mode |
incomparables |
a vector of values that cannot be compared, as in |
fromLast |
a logical scalar indicating if duplication should be considered
from the last, as in |
... |
arguments for particular methods. |
MARGIN |
a numeric scalar, the matrix margin to be held fixed, as in |
signif |
a numerical scalar only applicable to numeric or complex |
These S3 methods are alternative implementations of their counterparts in the base package for atomic matrices (i.e., double, integer, logical, character, complex and raw), based directly on the C++98 Standard Template Library (STL) std::set or the C++11 STL std::unordered_set. The implementation treats the whole row (or column) vector as the key, without the intermediate steps of converting the mode to character or collapsing the vector into a scalar, as done in base. On systems where `R CMD config CXX1X` is empty, the C++98 STL std::set is used; it is typically implemented as a self-balancing tree (usually a red-black tree) and takes O(n log n) time to find all duplicates, where n = dim(x)[MARGIN]. On systems where `R CMD config CXX1X` is non-empty, the C++11 STL std::unordered_set is used, with average O(n) and worst-case O(n^2) performance.
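The difference in keying described above can be sketched in plain base R. This is only an illustration under stated assumptions: `collapse_key_dup` and `row_key_dup` are hypothetical helper names, the paste-based key merely approximates the character-collapsing step attributed to base, and neither function reflects the package's C++ code.

```r
# Hypothetical helpers, for illustration only (not part of uniqueAtomMat or base).

# Collapse each row into one character scalar, then look for repeated strings.
collapse_key_dup <- function(x) {
  keys <- apply(x, 1L, function(r) paste(r, collapse = "\r"))
  duplicated(keys)
}

# Keep each row as a vector and use the whole vector as the key.
row_key_dup <- function(x) {
  keys <- lapply(seq_len(nrow(x)), function(i) x[i, ])
  duplicated(keys)
}

x <- rbind(c(1L, 2L), c(3L, 4L), c(1L, 2L))
res_collapse <- collapse_key_dup(x)
res_rowkey <- row_key_dup(x)
print(res_collapse)
print(res_rowkey)
```

Both helpers flag the third row as a duplicate of the first; the second variant does so without ever converting the integers to character.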
Missing values are regarded as equal, but NaN is not equal to NA_real_.
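A small example may make the NA/NaN rule concrete. It uses the base methods only (the package is not loaded here), on the assumption, taken from the comparison with base in this page, that both implementations agree on such input.

```r
# Two rows containing NA_real_ and one row containing NaN in the same position.
x <- rbind(c(1, NA), c(1, NA), c(1, NaN))

# On a plain vector: the second NA duplicates the first, NaN is distinct from NA.
dup_vec <- duplicated(c(NA_real_, NA_real_, NaN))

# On matrix rows: row 2 duplicates row 1, row 3 (with NaN) is not a duplicate.
dup_rows <- duplicated(x)

print(dup_vec)
print(dup_rows)
```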
Further, in contrast to the base counterparts, characters are compared directly based on their internal representations, i.e., encoding issues are not considered. Complex values are compared by their real and imaginary parts separately.
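The following base R snippet shows the kind of input where encoding matters: the same character stored in two encodings compares equal under `==` yet has different bytes. Whether the package then treats such strings as distinct is an inference from the sentence above, not something this page states explicitly.

```r
# The same letter in two encodings.
s_utf8 <- "\u00e9"
s_latin1 <- iconv(s_utf8, from = "UTF-8", to = "latin1")

# Base string comparison translates encodings before comparing.
same_value <- s_utf8 == s_latin1

# The underlying bytes differ (two bytes in UTF-8, one in latin1).
same_bytes <- identical(charToRaw(s_utf8), charToRaw(s_latin1))

print(c(same_value = same_value, same_bytes = same_bytes))
```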
unique.matrix returns a matrix with duplicated rows (if MARGIN=1) or columns (if MARGIN=2) removed.
duplicated.matrix returns a logical vector indicating which rows (if MARGIN=1) or columns (if MARGIN=2) are duplicated.
anyDuplicated.matrix returns an integer indicating the index of the first (if fromLast=FALSE) or last (if fromLast=TRUE) duplicate row (if MARGIN=1) or column (if MARGIN=2) if any, and 0L otherwise.
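The return values described above can be checked on a tiny matrix. The sketch below calls the base methods, which share these arguments apart from signif; with the package attached, the same calls would presumably dispatch to its methods and give the same results.

```r
# Rows 3 and 4 repeat rows 1 and 2.
x <- rbind(c(1, 2), c(3, 4), c(1, 2), c(3, 4))

u <- unique(x)                               # 2 x 2 matrix: rows 1 and 2
d <- duplicated(x)                           # one logical per row
d_last <- duplicated(x, fromLast = TRUE)     # scanning from the last row
a <- anyDuplicated(x)                        # index found scanning forwards
a_last <- anyDuplicated(x, fromLast = TRUE)  # index found scanning backwards
d_col <- duplicated(t(x), MARGIN = 2)        # same question asked of columns

print(u)
print(d)
print(c(a, a_last))
```

Scanning forwards, the first duplicate found is row 3; scanning backwards, it is row 2.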
In contrast to the base counterparts, characters are compared directly based on their internal representations without considering encoding issues; for numeric and complex matrices, the default signif is Inf, i.e., floating point values are compared directly without rounding; and long vectors are not supported yet.
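To show why rounding before comparison can matter, here is a base R sketch that rounds explicitly with `base::signif` before calling `duplicated`. It assumes that a finite signif argument in this package has a comparable effect, which the truncated argument description on this page does not confirm.

```r
# Two values that print alike but differ in their last bits.
a <- 0.1 + 0.2
b <- 0.3
x <- rbind(c(a, 1), c(b, 1))

exact_equal <- a == b                     # direct floating point comparison
dup_rounded <- duplicated(signif(x, 6))   # compare after rounding to 6 significant digits

print(exact_equal)
print(dup_rounded)
```

After rounding to 6 significant digits the two rows coincide, so the second is reported as a duplicate.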
base::duplicated, base::unique, signif, grpDuplicated
## prepare test data:
set.seed(9992722L, kind="Mersenne-Twister")
x.double=model.matrix(~gl(5,8))[sample(40), ]

## typical uses
unique(x.double)
unique(x.double, fromLast=TRUE)
unique(t(x.double), MARGIN=2)
unique(t(x.double), MARGIN=2, fromLast=TRUE)
anyDuplicated(x.double)
anyDuplicated(x.double, fromLast = TRUE)

## additional atomic test data
x.integer=as.integer(x.double); attributes(x.integer)=attributes(x.double)
x.factor=as.factor(x.integer); dim(x.factor)=dim(x.integer); dimnames(x.factor)=dimnames(x.integer)
x.logical=as.logical(x.double); attributes(x.logical)=attributes(x.double)
x.character=as.character(x.double); attributes(x.character)=attributes(x.double)
x.complex=as.complex(x.double); attributes(x.complex)=attributes(x.double)
x.raw=as.raw(x.double); attributes(x.raw)=attributes(x.double)

## compare results with base:
stopifnot(identical(base::duplicated.matrix(x.double),
                    uniqueAtomMat::duplicated.matrix(x.double)
))
stopifnot(identical(base::duplicated.matrix(x.integer, fromLast=TRUE),
                    uniqueAtomMat::duplicated.matrix(x.integer, fromLast=TRUE)
))
stopifnot(identical(base::duplicated.matrix(t(x.logical), MARGIN=2L),
                    uniqueAtomMat::duplicated.matrix(t(x.logical), MARGIN=2L)
))
stopifnot(identical(base::duplicated.matrix(t(x.character), MARGIN=2L, fromLast=TRUE),
                    uniqueAtomMat::duplicated.matrix(t(x.character), MARGIN=2L, fromLast=TRUE)
))
stopifnot(identical(base::unique.matrix(x.complex),
                    uniqueAtomMat::unique.matrix(x.complex)
))
stopifnot(identical(base::unique.matrix(x.raw),
                    uniqueAtomMat::unique.matrix(x.raw)
))
stopifnot(identical(base::unique.matrix(x.factor),
                    uniqueAtomMat::unique.matrix(x.factor)
))
stopifnot(identical(base::duplicated.matrix(x.double, MARGIN=0),
                    uniqueAtomMat::duplicated.matrix(x.double, MARGIN=0)
))
stopifnot(identical(base::anyDuplicated.matrix(x.integer, MARGIN=0),
                    uniqueAtomMat::anyDuplicated.matrix(x.integer, MARGIN=0)
))

## benchmarking
if (require(microbenchmark)){
    print(microbenchmark(base::duplicated.matrix(x.double)))
    print(microbenchmark(uniqueAtomMat::duplicated.matrix(x.double)))
    print(microbenchmark(base::duplicated.matrix(x.character)))
    print(microbenchmark(uniqueAtomMat::duplicated.matrix(x.character)))
}else{
    print(system.time(replicate(5e3L, base::duplicated.matrix(x.double))))
    print(system.time(replicate(5e3L, uniqueAtomMat::duplicated.matrix(x.double))))
    print(system.time(replicate(5e3L, base::duplicated.matrix(x.character))))
    print(system.time(replicate(5e3L, uniqueAtomMat::duplicated.matrix(x.character))))
}