By Topic

Dialect Classification via Text-Independent Training and Testing for Arabic, Spanish, and Chinese

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Yun Lei ; Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA ; Hansen, J.H.L.

Automatic dialect classification has emerged as an important area in the speech research field. Effective dialect classification is useful in developing robust speech systems, such as speech recognition and speaker identification. In this paper, two novel algorithms are proposed to improve dialect classification for text-independent spontaneous speech in Arabic and Spanish languages, along with probe results for Chinese. The problem considers the case where no transcripts but dialect labels are available for training and test data, and speakers are speaking spontaneously, which is defined as text-independent dialect classification. The Gaussian mixture model (GMM) is used as the baseline system for text-independent dialect classification. The major motivation is to suppress confused/distractive regions from the dialect language space and emphasize discriminative/sensitive information of the available dialects. In the training phase, a symmetric version of the Kullback-Leibler divergence is used to find the most discriminative GMM mixtures (KLD-GMM), where the confused acoustic GMM region is suppressed. For testing, the more discriminative frames are detected and used via the location of where the frames are in the GMM mixture feature space, which is termed frame selection decoding (FSD-GMM). The first KLD-GMM and second FSD-GMM techniques, are shown to improve dialect classification performance for three-way dialect tasks. The two algorithms and their combination are evaluated on dialects of Arabic and Spanish corpora. Measurable improvement is achieved in both two cases, over a generalized maximum-likelihood estimation GMM baseline (MLE-GMM).

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:19 ,  Issue: 1 )
Biometrics Compendium, IEEE