simil {proxyC}    R Documentation

Compute similarity/distance between rows or columns of large matrices

Description

Fast similarity/distance computation function for large sparse matrices. You can floor small similarity values to zero to save computation time and storage space, using either an arbitrary threshold (min_simil) or a rank (rank). For better performance, increase the number of threads with setThreadOptions.
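As a rough illustration of the thresholding described above, the sketch below compares how many values are stored with and without min_simil and rank. It assumes that setThreadOptions refers to RcppParallel::setThreadOptions and that RcppParallel is installed; the matrix size, density, thread count, and cut-offs are arbitrary choices for illustration.

```r
library(proxyC)

# Assumption: threading is controlled through RcppParallel, if it is available
if (requireNamespace("RcppParallel", quietly = TRUE)) {
  RcppParallel::setThreadOptions(numThreads = 2)
}

set.seed(1)
mt <- Matrix::rsparsematrix(100, 100, 0.05)

# all pairwise cosine similarities between rows
full <- simil(mt, method = "cosine")

# keep only similarities of at least 0.2 (illustrative threshold)
thresholded <- simil(mt, method = "cosine", min_simil = 0.2)

# keep only the 3 highest similarity values (illustrative rank)
top3 <- simil(mt, method = "cosine", rank = 3)

# number of non-zero values stored in each result
c(full = Matrix::nnzero(full),
  thresholded = Matrix::nnzero(thresholded),
  top3 = Matrix::nnzero(top3))
```

Both restricted results should hold no more non-zero values than the full one, which is where the storage saving comes from.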

Usage

simil(x, y = NULL, margin = 1, method = c("cosine", "correlation",
  "jaccard", "ejaccard", "dice", "edice", "hamman", "simple matching",
  "faith"), min_simil = NULL, rank = NULL, drop0 = FALSE,
  digits = 14)

dist(x, y = NULL, margin = 1, method = c("euclidean", "chisquared",
  "hamming", "kullback", "manhattan", "maximum", "canberra", "minkowski"),
  p = 2, drop0 = FALSE, digits = 14)

Arguments

x

Matrix object

y

if a matrix or Matrix object is provided, proximity between the rows or columns (documents or features) of x and those of y is computed.

margin

integer indicating the margin of similarity/distance computation: 1 indicates rows and 2 indicates columns.

method

method to compute similarity or distance

min_simil

the minimum similarity value to be recorded.

rank

an integer value specifying the top-n highest similarity values to be recorded.

drop0

if TRUE, zero values are removed regardless of min_simil or rank.

digits

determines rounding of small values towards zero. Used primarily to correct floating-point rounding errors in the C++ code. See zapsmall.

p

weight (the exponent p) for the Minkowski distance
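The interplay of x, y and margin is easiest to see from the dimensions of the result. The following is a minimal sketch with arbitrary matrix sizes, assuming that with margin = 1 the two matrices must share the same number of columns.

```r
library(proxyC)

set.seed(1)
x <- Matrix::rsparsematrix(10, 50, 0.2)  # 10 rows, 50 columns
y <- Matrix::rsparsematrix(4, 50, 0.2)   # 4 rows, same 50 columns

# margin = 1: each row of x is compared with each row of y
s_xy <- simil(x, y, margin = 1, method = "cosine")
dim(s_xy)

# without y, rows (margin = 1) or columns (margin = 2) of x are compared
# with each other, so the result is square
s_rows <- simil(x, margin = 1, method = "cosine")
s_cols <- simil(x, margin = 2, method = "cosine")
dim(s_rows)
dim(s_cols)
```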

See Also

zapsmall
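To show what the digits argument is modelled on, here is a small base R sketch of zapsmall itself. It does not call proxyC, and how the package applies digits internally is not shown in this page; the input values are chosen only for illustration.

```r
# sqrt(2)^2 - 2 should be exactly 0 but carries a tiny floating-point error
err <- sqrt(2)^2 - 2
err != 0

# zapsmall rounds relative to the largest absolute value in the vector,
# so the tiny error next to 1 is rounded to zero
z <- zapsmall(c(1, err), digits = 14)
z
```

Note that zapsmall works relative to the largest value in its input, so a tiny value on its own is left unchanged.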

Examples

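# cosine similarity between rows of a random sparse matrix
mt <- Matrix::rsparsematrix(100, 100, 0.01)
simil(mt, method = "cosine")[1:5, 1:5]

# Euclidean distance between rows of a random sparse matrix
mt <- Matrix::rsparsematrix(100, 100, 0.01)
dist(mt, method = "euclidean")[1:5, 1:5]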
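A further sketch, not part of the original examples, illustrates the p argument: the Minkowski distance with p = 2 is mathematically the Euclidean distance, so the two methods should agree up to floating-point error. The function is called as proxyC::dist to avoid confusion with stats::dist; the matrix size and density are arbitrary.

```r
set.seed(1)
mt <- Matrix::rsparsematrix(10, 20, 0.3)

# Minkowski distance with exponent p = 2
d_mink <- proxyC::dist(mt, method = "minkowski", p = 2)

# Euclidean distance on the same rows
d_eucl <- proxyC::dist(mt, method = "euclidean")

# largest absolute difference between the two results
max_diff <- max(abs(as.matrix(d_mink) - as.matrix(d_eucl)))
max_diff
```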

[Package proxyC version 0.1.5 Index]