Unsupervised clustering of syllables for language identification | IEEE Conference Publication | IEEE Xplore

Unsupervised clustering of syllables for language identification


Abstract:

Automatic language recognition makes extensive use of phonotactics for identifying a language. The accuracy of phonotactic information depends upon the amount of data available for training. State-of-the-art approaches capture the phonotactics in terms of cross-lingual GMM tokens. The accuracy of such tokenisers crucially depends upon the availability of specific corpora. In this paper, we suggest an alternative to GMM tokens, namely, syllable-based tokens. Syllables implicitly capture the phonotactics across phonemes in a language. Unsupervised syllable tokenisation for language identification requires (a) segmentation of speech into syllable-like units, and (b) unsupervised modeling of the syllable tokens by Hidden Markov Models. The first issue is addressed by segmenting the waveform into syllable-like units using a well-established group-delay-based segmentation algorithm. To address the second issue, two different solutions are proposed, namely, (i) a top-down clustering approach, which is robust and does not require significant parameter tuning, and (ii) a universal syllable approach, in which syllable models for every language are obtained from adapted universal syllable models. Experimental results on the OGI 1992 multilingual corpus and the NIST 2003 LRE corpus show that the proposed approaches do not require significant tuning of parameters and that their performance is comparable to that of a well-tuned baseline syllable tokenisation system.
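The abstract does not spell out the top-down clustering procedure, so the following is only a minimal sketch of the general idea of top-down (divisive) clustering, under loud assumptions: each syllable-like segment is taken to be already summarised as a fixed-length feature vector, and clusters are split with plain 2-means on squared Euclidean distance. The paper itself models syllable tokens with Hidden Markov Models, which this sketch does not attempt. All names here (`top_down_cluster`, `two_means`, `min_size`, `max_clusters`) are hypothetical and chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def scatter(points: List[Vector]) -> float:
    """Total squared distance of the points to their centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points: List[Vector], rng: random.Random,
              iters: int = 20) -> Tuple[List[Vector], List[Vector]]:
    """Split one cluster into two with a fixed number of 2-means passes."""
    centres = rng.sample(points, 2)
    left: List[Vector] = []
    right: List[Vector] = []
    for _ in range(iters):
        left, right = [], []
        for p in points:
            if sq_dist(p, centres[0]) > sq_dist(p, centres[1]):
                right.append(p)
            else:
                left.append(p)
        if not left or not right:
            break  # degenerate split; the caller decides what to do
        centres = [centroid(left), centroid(right)]
    return left, right


def top_down_cluster(points: List[Vector], max_clusters: int,
                     min_size: int = 2, seed: int = 0) -> List[List[Vector]]:
    """Start from one cluster and repeatedly split the most scattered one.

    Assumption (not from the paper): stop when max_clusters is reached or
    when a split would leave a cluster with fewer than min_size members.
    """
    rng = random.Random(seed)
    clusters: List[List[Vector]] = [list(points)]
    while len(clusters) < max_clusters:
        candidates = [i for i, c in enumerate(clusters)
                      if len(c) >= 2 * min_size]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: scatter(clusters[i]))
        left, right = two_means(clusters[worst], rng)
        if len(left) < min_size or len(right) < min_size:
            break
        clusters[worst:worst + 1] = [left, right]
    return clusters
```

A call such as `top_down_cluster(vectors, max_clusters=2)` returns a list of clusters, each a list of the original vectors. In this sketch the only knobs are the stopping criteria `max_clusters` and `min_size`.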
Date of Conference: 27-31 August 2012
Date Added to IEEE Xplore: 18 October 2012
Print ISBN: 978-1-4673-1068-0

Conference Location: Bucharest, Romania
