| bigmemory-package {bigmemory} | R Documentation |
Create, store, access, and manipulate massive matrices. Matrices are, by
default, allocated to shared memory and may use memory-mapped files.
Packages biganalytics, synchronicity, bigalgebra, and
bigtabulate provide advanced functionality. Access to and
manipulation of a big.matrix object is exposed by an S4
class whose interface is similar to that of a matrix. Use of
these packages in parallel environments can provide substantial speed and
memory efficiencies. bigmemory also provides a C++
framework for the development of new tools that can work both with
big.matrix and native matrix objects.
Index of functions/methods (grouped in a friendly way):
big.matrix, filebacked.big.matrix, as.big.matrix
is.big.matrix, is.separated, is.filebacked
describe, attach.big.matrix, attach.resource
sub.big.matrix, is.sub.big.matrix
dim, dimnames, nrow, ncol, print, head, tail, typeof, length
read.big.matrix, write.big.matrix
mwhich
morder, mpermute
deepcopy
flush
Multi-gigabyte data sets challenge and frustrate R users, even on well-equipped hardware. Use of C/C++ can provide efficiencies, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's rich statistical programming environment. The package bigmemory and sister packages biganalytics, synchronicity, bigtabulate, and bigalgebra bridge this gap, implementing massive matrices and supporting their manipulation and exploration. The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the data set. The data structures may also be file-backed, allowing users to easily manage and analyze data sets larger than available RAM and share them across nodes of a cluster. These features of the Bigmemory Project open the door for powerful and memory-efficient parallel analyses and data mining of massive data sets.
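The sharing of a single copy of the data can be sketched within one R session using describe and attach.big.matrix. This is a minimal illustration, assuming shared memory is available on the platform; in practice the descriptor would be passed to a separate R process, which would then call attach.big.matrix itself. The variable names are chosen only for this example.

```r
library(bigmemory)

# A small matrix allocated to shared memory (the default).
x <- big.matrix(3, 2, type = "integer", init = 0)

# The descriptor carries the information needed to locate the same data.
desc <- describe(x)

# Attaching gives a second handle to the same underlying data, not a copy.
y <- attach.big.matrix(desc)
y[1, 2] <- 99L

x[1, 2]  # reflects the change made through y
```

Because both handles refer to the same memory, no second copy of the matrix is created when it is attached.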
This project (bigmemory and its sister packages) is still actively developed, although the design and current features can be viewed as "stable." Please feel free to email us with any questions: bigmemoryauthors@gmail.com.
Various options are available.
options(bigmemory.typecast.warning) can be set to avoid annoying
warnings that might occur if, for example, you assign objects (typically
type double) to char, short, or integer big.matrix objects.
options(bigmemory.print.warning) protects against extracting and
printing a massive matrix (which would involve the creation of a second
massive copy of the matrix). options(bigmemory.allow.dimnames) by
default prevents the setting of dimnames attributes, because they
aren't allocated to shared memory and changes will not be visible across
processes. options(bigmemory.default.type) is "double" by
default (a change in default behavior as of 4.1.1) but may be changed by the
user.
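As a brief sketch of how these options might be set in a session (the matrix and column names are illustrative only):

```r
library(bigmemory)

# Silence the warning raised when doubles are assigned to a narrower type.
options(bigmemory.typecast.warning = FALSE)

# Allow dimnames to be changed after creation; such changes are not
# visible to other processes attached to the same matrix.
options(bigmemory.allow.dimnames = TRUE)

x <- big.matrix(2, 2, type = "short", init = 0)
x[, 1] <- c(1, 2)            # doubles stored as short
colnames(x) <- c("a", "b")   # permitted because of the option above
x[, "a"]
```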
Versions >=4.0 represent a major redesign, with the mutexes (locking)
abstracted to package synchronicity, the exploratory data analysis
functionality relocated to package biganalytics, and new linear
algebra support available in package bigalgebra. Package
bigtabulate extends the bigmemory package with table-, tapply-,
and split-like behavior. The functions may also be used with regular
matrices for speed and memory-efficiency gains. Package bigmemory
itself is now minimalist, providing only the core functionality. As an
example, the apply() method appears in biganalytics, supporting
exploration and analysis, while mwhich, morder
and mpermute appear in bigmemory as fundamental tools
for data manipulation.
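The following sketch shows how mwhich, morder and mpermute might be used together on a tiny matrix. The data are made up for illustration; note that mpermute reorders the rows of the big.matrix in place.

```r
library(bigmemory)

x <- as.big.matrix(matrix(c(3L, 1L, 2L, 30L, 10L, 20L), nrow = 3, ncol = 2))

# Row indices where column 1 is greater than or equal to 2,
# found without extracting the column into an R vector first.
hits <- mwhich(x, 1, list(2), list("ge"), "AND")

# Ordering permutation for the rows, based on column 1.
ord <- morder(x, 1)

# Apply the permutation to the rows of x in place.
mpermute(x, order = ord)
x[, ]
```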
Note that you can't simply use a big.matrix with many (most) existing
functions (e.g. lm, kmeans). One nice exception
is split, because this function only accesses subsets of the
matrix.
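One possible workaround, sketched here as an assumption rather than a recommendation from the package authors, is to extract a manageable subset with the usual bracket indexing, which returns an ordinary R matrix, and pass that to the existing function. The column names and data below are invented for the example.

```r
library(bigmemory)

set.seed(1)
x <- big.matrix(20, 2, type = "double",
                dimnames = list(NULL, c("y", "z")))
x[, 1] <- rnorm(20)
x[, 2] <- rnorm(20)

# Extracting rows yields a regular matrix held in ordinary R memory.
sub <- x[1:10, ]

# The regular matrix can then be used with functions such as lm.
fit <- lm(y ~ z, data = as.data.frame(sub))
coef(fit)
```

This only helps when the extracted subset fits comfortably in RAM; it does not make lm itself operate on the full big.matrix.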
Michael J. Kane, John W. Emerson, Peter Haverty, and Charles Determan Jr.
Maintainers: Michael J. Kane <bigmemoryauthors@gmail.com>
For example, big.matrix, mwhich,
read.big.matrix
# Our examples are all trivial in size, rather than burning huge amounts
# of memory.
x <- big.matrix(5, 2, type="integer", init=0,
                dimnames=list(NULL, c("alpha", "beta")))
x
x[1:2,]
x[,1] <- 1:5
x[,"alpha"]
colnames(x)
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- NULL
x[,]