Using Pronunciation-Based Morphological Subword Units to Improve OOV Handling in Keyword Search | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Out-of-vocabulary (OOV) keywords present a challenge for keyword search (KWS) systems, especially in the low-resource setting. Previous research has centered on approaches that use a variety of subword units to recover OOV words. This paper systematically investigates morphology-based subword modeling approaches on seven low-resource languages. We show that using morphological subword units (morphs) in speech recognition decoding is substantially better than expanding word-decoded lattices into subword units, including phones, syllables, and morphs. As alternatives to grapheme-based morphs, we apply unsupervised morphology learning to sequences of phonemes, graphones, and syllables. Using one of these phone-based morphs is almost always better than using the grapheme-based morphs, but the particular choice varies with the language. By combining the different methods, a substantial gain is obtained over the best single case for all languages, especially for OOV performance.
Page(s): 79 - 92
Date of Publication: 30 October 2015
