| simil {proxyC} | R Documentation |
Fast similarity/distance computation function for large sparse matrices. You
can floor small similarity values to save computation time and storage
space, using either an arbitrary threshold (min_simil) or a rank (rank).
Please increase the number of threads for better performance using
setThreadOptions.
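The idea of flooring by a threshold or by a rank can be sketched in base R on a small dense matrix. This is an illustration only, not the package's implementation: the helper names `floor_by_threshold` and `floor_by_rank` are hypothetical, the toy values are made up, and ranking per column is an assumption.

```r
# Toy dense similarity matrix (hypothetical values, for illustration only).
s <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.4,
              0.1, 0.4, 1.0),
            nrow = 3, byrow = TRUE)

# Floor by threshold: values below min_simil are replaced with zero.
floor_by_threshold <- function(s, min_simil) {
  s[s < min_simil] <- 0
  s
}

# Floor by rank: keep the n largest values in each column, zero the rest.
# Assumption: ranking is done per column; ties are broken by position.
floor_by_rank <- function(s, n) {
  apply(s, 2, function(col) {
    keep <- order(col, decreasing = TRUE)[seq_len(min(n, length(col)))]
    out <- numeric(length(col))
    out[keep] <- col[keep]
    out
  })
}

s_thr <- floor_by_threshold(s, min_simil = 0.5)
s_rnk <- floor_by_rank(s, n = 2)
```

Both results contain more zeros than `s`, which is what makes a sparse representation cheaper to store. With the package itself, the corresponding arguments are `min_simil` and `rank` in `simil()`.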
simil(
x,
y = NULL,
margin = 1,
method = c("cosine", "correlation", "jaccard", "ejaccard", "dice", "edice", "hamman",
"simple matching", "faith"),
min_simil = NULL,
rank = NULL,
drop0 = FALSE,
diag = FALSE,
digits = 14
)
dist(
x,
y = NULL,
margin = 1,
method = c("euclidean", "chisquared", "kullback", "manhattan", "maximum", "canberra",
"minkowski", "hamming"),
p = 2,
smooth = 0,
drop0 = FALSE,
diag = FALSE,
digits = 14
)
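To make the role of `margin` concrete, here is a minimal base R reference for cosine similarity on a small dense matrix. It is a sketch for checking intuition, not the package's code; `cosine_ref` is a hypothetical helper, and it does not handle all-zero rows or columns.

```r
# Reference cosine similarity for a small dense matrix (base R only).
cosine_ref <- function(x, margin = 1) {
  # Work on rows; transpose first when columns are to be compared.
  if (margin == 2) x <- t(x)
  norms <- sqrt(rowSums(x^2))
  # Dot products between all pairs of rows, scaled by both norms.
  tcrossprod(x) / outer(norms, norms)
}

x <- rbind(c(1, 2, 0),
           c(2, 4, 1))

sim_rows <- cosine_ref(x, margin = 1)  # 2 x 2: rows compared with rows
sim_cols <- cosine_ref(x, margin = 2)  # 3 x 3: columns compared with columns
```

With `margin = 1` the result has one row and column per row of `x`; with `margin = 2` it has one per column of `x`.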
x |
Matrix object |
y |
if a matrix or Matrix object is provided, proximity
between documents or features in x and y is computed. |
margin |
integer indicating the margin of the similarity/distance computation: 1 indicates rows and 2 indicates columns. |
method |
method to compute similarity or distance |
min_simil |
the minimum similarity value to be recorded. |
rank |
an integer value specifying that only the top-n highest similarity values are recorded. |
drop0 |
if |
diag |
if |
digits |
determines rounding of small values towards zero. Used primarily to correct rounding errors in C++. See zapsmall. |
p |
exponent (power) of the Minkowski distance; only used when 'method' is "minkowski". |
smooth |
adds a fixed value to all the cells to avoid division by zero. Only used when 'method' is "chisquared" or "kullback". |
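The effect of `p` on the Minkowski distance can be checked with a small base R reference. This is a sketch of the textbook formula, assuming that `p` plays the usual role of the exponent; `minkowski_ref` is a hypothetical helper, not part of the package.

```r
# Textbook Minkowski distance between two numeric vectors.
minkowski_ref <- function(a, b, p = 2) {
  sum(abs(a - b)^p)^(1 / p)
}

a <- c(1, 4, 0)
b <- c(3, 1, 0)

d1 <- minkowski_ref(a, b, p = 1)  # same as Manhattan distance
d2 <- minkowski_ref(a, b, p = 2)  # same as Euclidean distance
d3 <- minkowski_ref(a, b, p = 3)

# Cross-check against base R's stats::dist for the same pair of vectors.
d3_stats <- as.numeric(stats::dist(rbind(a, b), method = "minkowski", p = 3))
```

Setting `p = 1` or `p = 2` reproduces the Manhattan and Euclidean distances, which is why the default of `p = 2` is a natural choice.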
zapsmall
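For reference, this is how base R's `zapsmall` treats floating-point residue at `digits = 14`, the default used above. The vector is a made-up example; how the package applies the rounding internally is not shown here.

```r
# A value that should be exactly zero but carries rounding noise.
v <- c(1, 0.5, 1e-16)

# zapsmall rounds relative to the largest absolute value in the vector,
# so the tiny third element is replaced by zero.
zapped <- zapsmall(v, digits = 14)
```

Zeroing such residue matters for sparse output, since a value like `1e-16` would otherwise be stored as a non-zero cell.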
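library(proxyC)
# cosine similarity between the rows of a random sparse matrix
mt <- Matrix::rsparsematrix(100, 100, 0.01)
simil(mt, method = "cosine")[1:5, 1:5]
# Euclidean distance between the rows of a random sparse matrix
mt <- Matrix::rsparsematrix(100, 100, 0.01)
dist(mt, method = "euclidean")[1:5, 1:5]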