simil {proxyC}    R Documentation

Compute similarity/distance between rows or columns of large matrices

Description

Fast similarity/distance computation function for large sparse matrices. You can floor small similarity values to zero to save computation time and storage space, using either an arbitrary threshold (min_simil) or a rank (rank). For better performance, increase the number of threads using setThreadOptions.
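As a rough illustration of the thresholding described above, the sketch below compares how many non-zero cells are stored with no threshold, with min_simil, and with rank. The matrix size, density, seed, and cutoff values are arbitrary choices made for this example, not recommendations.

```r
library(proxyC)

set.seed(1)
# Illustrative data (an assumption): 50 x 20 random sparse matrix, ~30% non-zero
mt <- Matrix::rsparsematrix(50, 20, 0.3)

# Column-wise cosine similarity: unthresholded, cut by value, and cut by rank
full   <- simil(mt, margin = 2, method = "cosine")
thresh <- simil(mt, margin = 2, method = "cosine", min_simil = 0.2)
top3   <- simil(mt, margin = 2, method = "cosine", rank = 3)

# Number of non-zero cells kept by each variant
nnz <- c(full = Matrix::nnzero(full),
         min_simil = Matrix::nnzero(thresh),
         rank = Matrix::nnzero(top3))
print(nnz)
```

On real data the savings depend on how many pairs fall below the chosen cutoff, so it is worth checking the counts before settling on a value.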

Usage

simil(
  x,
  y = NULL,
  margin = 1,
  method = c("cosine", "correlation", "jaccard", "ejaccard", "dice", "edice", "hamman",
    "simple matching", "faith"),
  min_simil = NULL,
  rank = NULL,
  drop0 = FALSE,
  diag = FALSE,
  digits = 14
)

dist(
  x,
  y = NULL,
  margin = 1,
  method = c("euclidean", "chisquared", "kullback", "manhattan", "maximum", "canberra",
    "minkowski", "hamming"),
  p = 2,
  smooth = 0,
  drop0 = FALSE,
  diag = FALSE,
  digits = 14
)

Arguments

x

Matrix object

y

if a matrix or Matrix object is provided, proximity is computed between the rows or columns of x and those of y.

margin

integer indicating the margin along which similarity/distance is computed: 1 for rows, 2 for columns.

method

method to compute similarity or distance

min_simil

the minimum similarity value to be recorded.

rank

an integer value n; only the top-n highest similarity values are recorded.

drop0

if TRUE, zero values are removed regardless of min_simil or rank.

diag

if TRUE, only compute diagonal elements of the similarity/distance matrix; useful when comparing corresponding rows or columns of 'x' and 'y'.

digits

determines rounding of small values towards zero. Used primarily to correct rounding errors in C++. See zapsmall.

p

exponent (order) of the Minkowski distance. Only used when 'method' is "minkowski".

smooth

adds a fixed value to all the cells to avoid division by zero. Only used when 'method' is "chisquared" or "kullback".
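A minimal sketch of how margin, y, and diag affect the shape of the result, assuming random inputs whose sizes and seed are chosen only for illustration:

```r
library(proxyC)

set.seed(2)
# Illustrative inputs (assumptions): x and y share the same 20 columns
x <- Matrix::rsparsematrix(10, 20, 0.5)
y <- Matrix::rsparsematrix(10, 20, 0.5)

d_rows <- dim(simil(x, margin = 1))            # rows of x against each other
d_cols <- dim(simil(x, margin = 2))            # columns of x against each other
d_xy   <- dim(simil(x, y[1:6, ], margin = 1))  # rows of x against 6 rows of y
print(rbind(d_rows, d_cols, d_xy))

# diag = TRUE computes only the pairs (row i of x, row i of y)
full   <- simil(x, y, margin = 1)
paired <- simil(x, y, margin = 1, diag = TRUE)
print(all.equal(Matrix::diag(paired), Matrix::diag(full)))
```

When only corresponding rows or columns need comparing, diag = TRUE avoids computing the off-diagonal pairs at all.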

See Also

zapsmall
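For reference, a small base R illustration of zapsmall, the function the digits argument points to. How proxyC applies digits internally is not shown here, and the input vector is made up for the example.

```r
# 0.1 + 0.2 - 0.3 is not exactly zero in floating-point arithmetic
x <- c(1, 0.1 + 0.2 - 0.3)
print(x[2] == 0)

# Rounding at 14 digits removes the tiny residual
zapped <- zapsmall(x, digits = 14)
print(zapped[2] == 0)
```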

Examples

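library(proxyC)

# 100 x 100 random sparse matrix with about 1% non-zero cells
mt <- Matrix::rsparsematrix(100, 100, 0.01)
# cosine similarity between rows
simil(mt, method = "cosine")[1:5, 1:5]
# Euclidean distance between rows
dist(mt, method = "euclidean")[1:5, 1:5]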
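A further sketch for the p argument, relying on the standard definition of the Minkowski distance, under which p = 2 gives the Euclidean distance and p = 1 the Manhattan distance. The data and seed are arbitrary, and proxyC::dist is called with its namespace so that it is not confused with stats::dist.

```r
set.seed(3)
# Illustrative data (an assumption): 10 x 20 random sparse matrix
mt <- Matrix::rsparsematrix(10, 20, 0.5)

eucl  <- proxyC::dist(mt, method = "euclidean")
mink2 <- proxyC::dist(mt, method = "minkowski", p = 2)
manh  <- proxyC::dist(mt, method = "manhattan")
mink1 <- proxyC::dist(mt, method = "minkowski", p = 1)

# Each Minkowski result is compared with its special case
print(all.equal(as.matrix(mink2), as.matrix(eucl)))
print(all.equal(as.matrix(mink1), as.matrix(manh)))

# Hand computation of the Euclidean distance between rows 1 and 2
by_hand <- sqrt(sum((mt[1, ] - mt[2, ])^2))
print(c(by_hand = by_hand, proxyC = eucl[1, 2]))
```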

[Package proxyC version 0.2.0 Index]