Automatic language identification using support vector machines and phonetic N-gram

2 Author(s)
Yan Deng ; Dept. of Electron. Eng., Tsinghua Univ., Beijing ; Jia Liu

In this paper, we describe two approaches to language identification (LID) using support vector machines (SVMs) and phonetic n-grams. The first uses the language model scores of phone sequences for SVM training. The second uses the n-gram probabilities of those phones to train SVM models. For the second approach, we propose a new, effective normalization method. In experiments on the 30 s test for 5 languages, the new normalization method yields a relative reduction of 15.8% in equal error rate (EER) compared with the traditional one. It also allows the system using the second approach to reach an EER of 2.4%, a relative reduction of about 35.5% compared with the first approach. Details of the implementation and experimental results are presented in this paper.
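
As a rough illustration of the second approach, the sketch below turns a decoded phone sequence into a fixed-length vector of n-gram relative frequencies, which is the kind of input an SVM could be trained on. It is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `ngram_probability_vector`, the toy phone inventory, and the use of plain relative frequencies are all hypothetical, and the normalization method proposed in the paper is not described in the abstract and is not reproduced here.

```python
from collections import Counter
from itertools import product


def ngram_probability_vector(phones, inventory, n=2):
    """Map a phone sequence to a fixed-length vector of n-gram relative frequencies.

    Assumes every phone in `phones` belongs to `inventory` (hypothetical setup).
    """
    # Enumerate every possible n-gram over the inventory in a fixed, sorted order
    # so that each vector position always refers to the same n-gram.
    vocab = list(product(sorted(inventory), repeat=n))
    # Count the n-grams that actually occur in this utterance.
    counts = Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than n: no n-grams observed, return an all-zero vector.
        return [0.0] * len(vocab)
    # Unnormalized relative frequencies; the paper's normalization is NOT applied.
    return [counts[g] / total for g in vocab]


# Toy example with a made-up three-phone inventory.
inventory = {"a", "b", "k"}
utterance = ["k", "a", "b", "a", "k", "a"]
features = ngram_probability_vector(utterance, inventory, n=2)
```

With a three-phone inventory and bigrams, each utterance maps to a 9-dimensional vector regardless of its length, which is what lets utterances of different durations share one SVM input space. A real phone recognizer's inventory would be much larger, so the vector would be correspondingly longer and sparser.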

Published in:

Audio, Language and Image Processing, 2008. ICALIP 2008. International Conference on

Date of Conference:

7-9 July 2008