hash_lemmas {lexicon}    R Documentation

Lemmatization List

Description

A dataset based on Mechura's (2016) English lemmatization list. It is useful for join-style lemma replacement, in which inflected token forms are mapped to their root lemmas through a table lookup. Although this is not a true morphological analysis, this style of lemma replacement is fast and typically still robust.
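
As a rough illustration of join-style lemma replacement, the sketch below looks each token up in a two-column table and substitutes the lemma when a match exists. It uses a small stand-in table so that it runs without the package installed. The column names token and lemma are an assumption about the layout of hash_lemmas, and replace_lemmas is a hypothetical helper, not a function provided by lexicon.

```r
# Toy stand-in for the lookup table. The column names `token` and `lemma`
# are an assumption about how hash_lemmas is laid out.
lemma_table <- data.frame(
  token = c("running", "ran", "mice", "better"),
  lemma = c("run", "run", "mouse", "good"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each token with its lemma when the table has
# an entry for the lowercased token, and leave the token unchanged otherwise.
replace_lemmas <- function(tokens, table) {
  idx <- match(tolower(tokens), table$token)  # NA where no entry exists
  ifelse(is.na(idx), tokens, table$lemma[idx])
}

tokens <- c("The", "mice", "ran", "home")
lemmatized <- replace_lemmas(tokens, lemma_table)
print(lemmatized)
```

With the package installed, the same helper could presumably be applied to the real table by passing hash_lemmas (after data(hash_lemmas)) in place of lemma_table, provided the column names match the assumption above. Tokens without an entry pass through unchanged, which is why this approach is a lookup and not a morphological analysis.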

Usage

data(hash_lemmas)

Format

A data frame with 41,532 rows and 2 variables.

Details

References

Mechura, M. B. (2016). Lemmatization list: English (en) [Data file]. Retrieved from http://www.lexiconista.com


[Package lexicon version 0.7.4 Index]