| simil {proxyC} | R Documentation |
Fast similarity/distance computation function for large sparse matrices. You
can floor small similarity values to save computation time and storage
space, using either an arbitrary threshold (min_simil) or a rank (rank).
For better performance, increase the number of threads using
setThreadOptions.
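As a minimal sketch of the thread setting mentioned above: this assumes that setThreadOptions refers to the function of that name in the RcppParallel package, and that RcppParallel is installed. The choice of leaving one core free is only an illustration, not a recommendation from this page.

```r
library(RcppParallel)

# Assumption: use all but one of the threads RcppParallel detects,
# and never fewer than one.
n_threads <- max(1L, defaultNumThreads() - 1L)

# Applies to subsequent RcppParallel-based computations in this session.
setThreadOptions(numThreads = n_threads)
```

Call this once before running simil() or dist() on a large matrix.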
simil(x, y = NULL, margin = 1, method = c("cosine", "correlation",
"jaccard", "ejaccard", "dice", "edice", "hamman", "simple matching",
"faith"), min_simil = NULL, rank = NULL, drop0 = FALSE,
digits = 14)
dist(x, y = NULL, margin = 1, method = c("euclidean", "chisquared",
"hamming", "kullback", "manhattan", "maximum", "canberra", "minkowski"),
p = 2, drop0 = FALSE, digits = 14)
x |
Matrix object |
y |
if a matrix or Matrix object is provided, proximity
between documents or features in x and y is computed. |
margin |
integer indicating the margin of similarity/distance computation: 1 indicates rows and 2 indicates columns. |
method |
method to compute similarity or distance |
min_simil |
the minimum similarity value to be recorded. |
rank |
an integer value specifying the top-n highest similarity values to be recorded. |
drop0 |
if |
digits |
determines rounding of small values towards zero. Used primarily to correct rounding errors in C++. See zapsmall. |
p |
weight for minkowski distance |
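The arguments above can be combined as in the following sketch. It assumes the Matrix and proxyC packages are installed; the seed, the density of 0.05, the threshold 0.5, the rank 3, and p = 3 are arbitrary values chosen for illustration. dist is called as proxyC::dist to make explicit that it is not stats::dist.

```r
library(Matrix)
library(proxyC)

set.seed(1)  # reproducible random sparse matrix
mt <- rsparsematrix(100, 100, 0.05)

# Record only similarities of at least 0.5 between rows (margin = 1 is the default)
s_min <- simil(mt, method = "cosine", min_simil = 0.5)

# Record only the top 3 similarity values
s_top <- simil(mt, method = "cosine", rank = 3)

# Compare columns instead of rows
s_col <- simil(mt, method = "cosine", margin = 2)

# Minkowski distance with p = 3 between the rows of x and the rows of y
d_mink <- proxyC::dist(mt, mt[1:10, ], method = "minkowski", p = 3)
dim(d_mink)
```

Thresholding with min_simil or rank keeps the result sparse, which is where the savings in storage come from on large inputs.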
zapsmall
mt <- Matrix::rsparsematrix(100, 100, 0.01)
simil(mt, method = "cosine")[1:5, 1:5]

mt <- Matrix::rsparsematrix(100, 100, 0.01)
dist(mt, method = "euclidean")[1:5, 1:5]