phonetics {RecordLinkage}    R Documentation

Phonetic Code

Description

Interface to phonetic coding functions.

Usage

pho_h(str)
soundex(str)

Arguments

str

A character vector or matrix. Factors are converted to character.

Details

Translates each element of its argument to a phonetic code. pho_h, by Jörg Michael (see References), is intended for the German language and normalizes umlauts and accented characters. soundex is a widely used algorithm for English names. This implementation can only handle common characters. Both algorithms strip non-alphabetic characters, except that pho_h leaves digits unchanged.

The C code for soundex was taken from PostgreSQL 8.3.6.
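
To make the idea of a phonetic code concrete, the following base R sketch implements a simplified Soundex. It is an illustration only and is not the package's C implementation: the function name soundex_sketch and its len parameter are hypothetical, it drops matrix dimensions, and its results may differ from soundex in edge cases such as strings containing non-alphabetic characters.

```r
# Illustrative sketch only: a simplified Soundex in base R.
# soundex_sketch and len are hypothetical names, not part of RecordLinkage.
soundex_sketch <- function(x, len = 4L) {
  # Digit assigned to each letter A..Z; "0" marks letters that are not coded
  # (vowels plus H, W and Y).
  digits <- strsplit("01230120022455012623010202", "")[[1]]
  names(digits) <- LETTERS

  encode_one <- function(s) {
    if (is.na(s)) return(NA_character_)
    # Fold to upper case and keep only the letters A..Z.
    chars <- strsplit(toupper(s), "")[[1]]
    chars <- chars[chars %in% LETTERS]
    if (length(chars) == 0L) return("")
    codes <- unname(digits[chars])
    # The first letter is kept as is.
    out <- chars[1]
    for (i in seq_along(chars)[-1]) {
      # Emit a digit if the letter is coded and its digit differs from
      # the digit of the preceding letter.
      if (codes[i] != "0" && codes[i] != codes[i - 1L]) {
        out <- c(out, codes[i])
      }
    }
    # Truncate or zero-pad to the fixed code length.
    out <- c(out, rep("0", len))[seq_len(len)]
    paste(out, collapse = "")
  }

  vapply(as.character(x), encode_one, character(1), USE.NAMES = FALSE)
}

soundex_sketch(c("Robert", "Rupert", "Meyer", "Maier"))
```

In this sketch, names that sound alike but are spelled differently, such as "Meyer" and "Maier", receive the same code, which is what makes phonetic codes useful as blocking or comparison keys in record linkage.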

Value

A character vector or matrix with the same size and dimensions as str, containing its phonetic encoding.

Author(s)

Andreas Borg (R interface only)

References

Jörg Michael, Doppelgänger gesucht – Ein Programm für kontextsensitive phonetische Textumwandlung, in: c't 1999, No. 25, pp. 252–261. The source code is published (under the GPL) at http://www.heise.de/ct/ftp/99/25/252/.

See Also

jarowinkler and levenshteinSim for string comparison.
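
Examples

The following is a minimal usage sketch, assuming the RecordLinkage package is installed. The sample names are made up for illustration, and no output is shown because the exact codes depend on the implementation.

```r
library(RecordLinkage)

# English-oriented coding of a character vector (sample names are invented)
soundex(c("Robert", "Rupert", "Meyer", "Maier"))

# German-oriented coding; pho_h is described as normalizing umlauts
pho_h(c("Müller", "Mueller", "Jörg", "Joerg"))

# Matrix input: the result is documented to keep the dimensions of str
name_mat <- matrix(c("Anna", "Hanna", "Schmidt", "Schmitt"), nrow = 2)
codes <- soundex(name_mat)
dim(codes)

# Factors are documented to be converted to character before coding
pho_h(factor(c("Meier", "Mayer")))
```

Because soundex targets English names and is documented to handle only common characters, input containing umlauts or accented characters is probably better passed to pho_h.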


[Package RecordLinkage version 0.4-11.2 Index]