Kanji-to-Hiragana conversion based on a length-constrained n-gram analysis

Abstract:

A common problem in speech processing is the conversion of the written form of a language to a set of phonetic symbols representing the pronunciation. In this paper, we focus on an aspect of this problem specific to the Japanese language. Written Japanese consists of a mixture of three types of symbols: Kanji, Hiragana, and Katakana. We describe an algorithm for converting conventional Japanese orthography to a Hiragana-like symbol set that closely approximates the most common pronunciation of the text. The algorithm is based on two hypotheses: (1) the correct reading of a Kanji character can be determined by examining a small number of adjacent characters and (2) the number of such combinations required in a dictionary is manageable. The algorithm described here converts the input text by selecting the most probable sequence of orthographic units (n-grams) that can be concatenated to form the input text. In closed-set testing, the n-gram algorithm was shown to provide better performance than several public-domain algorithms, achieving a sentence error rate of 3% on a wide range of text material. Though the focus of this paper is written Japanese, the pattern-matching algorithm described here has applications to similar problems in other languages.
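
The abstract's central idea, selecting the most probable sequence of dictionary units that concatenate to the input, can be illustrated with a small dynamic-programming sketch. This is not the paper's implementation: the dictionary `TOY_DICT`, its probabilities, the function name `convert`, the `max_len` cap on unit length, and the assumption that units are scored independently are all hypothetical choices made for illustration.

```python
import math

# Hypothetical toy dictionary: orthographic unit -> (Hiragana reading, probability).
# Entries and probabilities are invented for illustration, not taken from the paper.
TOY_DICT = {
    "東京": ("とうきょう", 0.9),
    "京都": ("きょうと", 0.9),
    "東": ("ひがし", 0.5),
    "京": ("きょう", 0.4),
    "都": ("と", 0.6),
    "に": ("に", 1.0),
    "行": ("い", 0.7),
    "く": ("く", 1.0),
}


def convert(text, dictionary, max_len=4):
    """Return (reading, log_prob) for the most probable segmentation of `text`
    into dictionary units of at most `max_len` characters, or None if the text
    cannot be covered by the dictionary.

    Assumption: units are scored independently, so the score of a segmentation
    is the sum of the log-probabilities of its units.
    """
    n = len(text)
    # best[i] holds (log-probability, reading) of the best cover of text[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is None:
                continue  # text[:start] cannot be covered
            entry = dictionary.get(text[start:end])
            if entry is None:
                continue  # this unit is not in the dictionary
            reading, prob = entry
            score = best[start][0] + math.log(prob)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + reading)
    if best[n] is None:
        return None
    return best[n][1], best[n][0]


result = convert("東京都に行く", TOY_DICT)
print(result[0] if result else "no segmentation found")
```

With this toy dictionary, the segmentation 東京 + 都 + に + 行 + く scores higher than 東 + 京都 + に + 行 + く, so the longer unit 東京 determines the reading of 東. That mirrors hypothesis (1) in the abstract: a few adjacent characters are enough to disambiguate a Kanji reading.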

Published in: IEEE Transactions on Speech and Audio Processing ( Volume: 7, Issue: 6, November 1999)

Page(s): 685 - 696

Date of Publication: 06 August 2002

DOI: 10.1109/89.799694
