| phonetics {RecordLinkage} | R Documentation |
Interface to phonetic coding functions.
pho_h(str)
soundex(str)
| str | A character vector or matrix. Factors are converted to character. |
Each function translates its argument to a phonetic code. pho_h,
by Jörg Michael (see references), is intended for the German language
and normalizes umlauts and accented characters.
soundex is a widely used algorithm for English names. This implementation
can only handle common characters. Both algorithms strip
non-alphabetical characters, except that pho_h leaves digits
unchanged.
The C code for soundex was taken from PostgreSQL 8.3.6.
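The stripping behaviour described above can be checked with a short sketch. It assumes the RecordLinkage package is installed; the input names are made up for illustration, and the codes are printed rather than stated here because they depend on the implementation.

```r
library(RecordLinkage)

# Hypothetical inputs: the same surname with and without punctuation.
plain      <- "OBrien"
punctuated <- "O'Brien"

# soundex strips non-alphabetical characters, so both spellings
# are expected to receive the same code.
sx_plain      <- soundex(plain)
sx_punctuated <- soundex(punctuated)
print(c(sx_plain, sx_punctuated))

# pho_h is aimed at German names and normalizes umlauts. According to
# the description it also strips punctuation but leaves digits unchanged.
ph_codes <- pho_h(c("Müller", "Mueller", "Meier-Schmidt", "Haus 12"))
print(ph_codes)
```

Comparing the printed pho_h codes for "Müller" and "Mueller" is a quick way to see the umlaut normalization at work on your own data.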
A character vector or matrix with the same size and dimensions as str,
containing its phonetic encoding.
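As a rough illustration of the return value, the sketch below encodes a small matrix and a factor and inspects the shape of the result. It assumes the RecordLinkage package is installed; the names and the variable name_matrix are hypothetical.

```r
library(RecordLinkage)

# Hypothetical 2 x 2 matrix of first and last names.
name_matrix <- matrix(
  c("Robert", "Rupert", "Smith", "Smyth"),
  nrow = 2,
  dimnames = list(NULL, c("first", "last"))
)

sx_matrix <- soundex(name_matrix)
ph_matrix <- pho_h(name_matrix)

# Both results should have the same dimensions as the input matrix.
print(dim(sx_matrix))
print(dim(ph_matrix))

# A factor is converted to character before encoding,
# so the result has one code per element.
sx_factor <- soundex(factor(c("Robert", "Rupert")))
print(sx_factor)
```

Because the shape is preserved, the encoded matrix can be used column by column in place of the original name columns, for example when blocking on a phonetic code.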
Andreas Borg (R interface only)
Jörg Michael, Doppelgänger gesucht – Ein Programm für kontextsensitive phonetische Textumwandlung, in: c't 1999, No. 25, pp. 252–261. The source code is published (under the GPL) at http://www.heise.de/ct/ftp/99/25/252/.
jarowinkler and levenshteinSim
for string comparison.