Subword scheme for keyword search


Abstract:

Keyword search (KWS) is an important application of spoken language technology. Large Vocabulary Continuous Speech Recognition (LVCSR) plays an important role in KWS systems. However, for a language with a large vocabulary and a relatively small text corpus, the vocabulary size grows very quickly as the amount of text increases, as we observed in Tamil. This makes it difficult to train a reliable language model, which may undermine KWS performance. Subword units have been successfully employed in KWS systems to handle the out-of-vocabulary (OOV) problem. Inspired by this, we propose a novel subword scheme, designed from the perspective of pronunciation, to alleviate the large vocabulary problem. We find that the subword-based system outperforms our best word-based system on Tamil conversational telephone speech. A system combination experiment shows that, when added to the best word-based system, a single subword-based system contributes more complementary information than the other three word-based systems together.
Date of Conference: 07-10 December 2014
Date Added to IEEE Xplore: 02 April 2015
Electronic ISBN: 978-1-4799-7129-9
Conference Location: South Lake Tahoe, NV, USA