| RelaxedWordMoversDistance {text2vec} | R Documentation |
Relaxed word mover's distance (RWMD) measures the distance between two documents by estimating how hard it is to transform the words of the first document into the words of the second, and vice versa. For more detail see the original article: http://mkusner.github.io/publications/WMD.pdf.
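The idea can be sketched in a few lines of base R. This is a minimal illustration of the relaxed bound as described in the linked paper, not the text2vec implementation: the toy embeddings, the helper `cosine_dist()` and the function `rwmd_sketch()` are all hypothetical names and values introduced here. Each word in one document is matched to its closest word in the other document, the matches are weighted by normalized word counts, and the larger of the two directional costs is returned.

```r
# Toy word embeddings: rows are words, columns are vector components.
# The values are made up for illustration only.
wv <- rbind(
  king  = c(1.0, 0.0),
  queen = c(0.9, 0.1),
  apple = c(0.0, 1.0),
  pear  = c(0.1, 0.9)
)

# Cosine distance between every row of a and every row of b (hypothetical helper).
cosine_dist <- function(a, b) {
  a <- a / sqrt(rowSums(a^2))   # scale each row to unit length
  b <- b / sqrt(rowSums(b^2))
  1 - a %*% t(b)
}

# Relaxed WMD between two documents given as named word-count vectors.
rwmd_sketch <- function(doc1, doc2, wv) {
  w1 <- doc1 / sum(doc1)        # normalized word weights of document 1
  w2 <- doc2 / sum(doc2)        # normalized word weights of document 2
  cost <- cosine_dist(wv[names(doc1), , drop = FALSE],
                      wv[names(doc2), , drop = FALSE])
  d12 <- sum(w1 * apply(cost, 1, min))  # move doc1 words to nearest doc2 words
  d21 <- sum(w2 * apply(cost, 2, min))  # move doc2 words to nearest doc1 words
  max(d12, d21)                         # keep the tighter of the two bounds
}

doc_a <- c(king = 2, apple = 1)
doc_b <- c(queen = 1, pear = 1)
doc_c <- c(apple = 1, pear = 3)

rwmd_ab <- rwmd_sketch(doc_a, doc_b, wv)  # similar vocabulary, small distance
rwmd_ac <- rwmd_sketch(doc_a, doc_c, wv)  # "king" has no close match, larger distance
rwmd_aa <- rwmd_sketch(doc_a, doc_a, wv)  # a document compared with itself
```

In this toy setup `doc_a` ends up closer to `doc_b` than to `doc_c`, because every word in `doc_a` has a near neighbour in `doc_b`.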
RelaxedWordMoversDistance RWMD
R6Class object.
progressbar logical = TRUE, whether to display a progress bar
For usage details see Methods, Arguments and Examples sections.
rwmd = RelaxedWordMoversDistance$new(wv, method = c("cosine", "euclidean"), normalize = TRUE, progressbar = interactive())
rwmd$dist2(x, y)
rwmd$pdist2(x, y)
$new(wv, method = c("cosine", "euclidean")) Constructor for the RWMD model. For a description of the arguments see the Arguments section.
$dist2(x, y) Computes the distance between each row of sparse matrix
x and each row of sparse matrix y.
$pdist2(x, y) Computes the "parallel" distance between rows of
sparse matrix x and the corresponding rows of sparse matrix y.
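The difference between the two methods is mostly one of output shape. The sketch below uses base R only, with plain Euclidean distance on small dense matrices as a stand-in for RWMD; `row_dist()`, `all_pairs_dist()` and `parallel_dist()` are hypothetical helpers written for this illustration and are not part of text2vec.

```r
x <- rbind(c(0, 0), c(3, 4), c(1, 1))
y <- rbind(c(0, 0), c(0, 0), c(1, 1))

# Stand-in distance between two rows (Euclidean, for illustration only).
row_dist <- function(a, b) sqrt(sum((a - b)^2))

# "dist2"-style result: every row of x against every row of y.
all_pairs_dist <- function(x, y) {
  outer(seq_len(nrow(x)), seq_len(nrow(y)),
        Vectorize(function(i, j) row_dist(x[i, ], y[j, ])))
}

# "pdist2"-style result: row i of x against row i of y only.
parallel_dist <- function(x, y) {
  stopifnot(nrow(x) == nrow(y))   # rows must line up one to one
  vapply(seq_len(nrow(x)), function(i) row_dist(x[i, ], y[i, ]), numeric(1))
}

d_all <- all_pairs_dist(x, y)  # nrow(x) by nrow(y) matrix
d_par <- parallel_dist(x, y)   # vector of length nrow(x)
```

Under these assumptions the parallel result equals the diagonal of the all-pairs matrix, which is why the parallel form requires x and y to have the same number of rows.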
rwmd RWMD object
x sparse document term matrix
y = NULL sparse document term matrix.
If y = NULL (as by default), we will assume y = x
wv word vectors. Numeric matrix which contains word embeddings. Rows correspond to words, columns to the components of the corresponding vectors. Rows should be named with the words.
method name of the distance used to measure similarity between two word vectors.
The original paper uses "euclidean",
however we use "cosine" by default (it has worked better in our experience).
With "cosine", distance = 1 - cosine_angle_between_wv
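To make the two options concrete, here is a small base R sketch of both word-level distances. The vectors and the helper names `cosine_distance()` and `euclidean_distance()` are assumptions made for this illustration; they are not text2vec functions.

```r
u <- c(1, 2, 3)
v <- 2 * u        # same direction as u, twice the length
w <- c(3, 2, 1)   # different direction

# 1 minus the cosine of the angle between a and b.
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Straight-line distance between a and b.
euclidean_distance <- function(a, b) sqrt(sum((a - b)^2))

cos_uv <- cosine_distance(u, v)     # effectively zero: direction is identical
euc_uv <- euclidean_distance(u, v)  # positive: the lengths differ
cos_uw <- cosine_distance(u, w)     # positive: the directions differ
```

The cosine variant ignores vector length and compares direction only, whereas the Euclidean variant also reacts to differences in magnitude.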
## Not run:
library(text2vec)
data("movie_review")
tokens = word_tokenizer(tolower(movie_review$review))
v = create_vocabulary(itoken(tokens))
v = prune_vocabulary(v, term_count_min = 5, doc_proportion_max = 0.5)
it = itoken(tokens)
vectorizer = vocab_vectorizer(v)
dtm = create_dtm(it, vectorizer)
tcm = create_tcm(it, vectorizer, skip_grams_window = 5)
glove_model = GloVe$new(word_vectors_size = 50, vocabulary = v, x_max = 10)
wv = glove_model$fit_transform(tcm, n_iter = 10)
# sum main and context vectors, as proposed in the GloVe paper
wv = wv + t(glove_model$components)
rwmd_model = RWMD$new(wv)
rwmd_dist = dist2(dtm[1:100, ], dtm[1:10, ], method = rwmd_model, norm = 'none')
head(rwmd_dist)
## End(Not run)