The prevailing approach to language recognition is parallel phoneme recognition followed by vector space modeling (PPRVSM), which uses a vector space model to describe the co-occurrence information of phones. Because the super-vectors are composed of phonetic N-grams, the number of N-grams grows exponentially with the order N, yielding very high-dimensional vectors and severe data sparseness. In this paper, we propose a feature selection algorithm to address this problem, which uses a maximum relevance criterion based on mutual information to select the most discriminative N-grams for identifying languages. The effectiveness of the technique is demonstrated on the NIST 2005 language recognition 30-second task, on which we achieve an equal error rate (EER) of 4.81%.
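The maximum-relevance selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each utterance is reduced to its set of phonetic N-grams paired with a language label, scores each N-gram by the mutual information between its presence and the language label, and keeps the top-k highest-scoring N-grams. The function and variable names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def select_ngrams_by_mi(docs, k):
    """Select the k N-grams most relevant to the language label.

    docs: list of (set_of_ngrams, language_label) pairs, one per utterance.
    Scores each N-gram g by I(X_g; Y), where X_g is the binary
    presence of g in an utterance and Y is the language label --
    the maximum relevance criterion.
    """
    n = len(docs)
    label_count = Counter(lab for _, lab in docs)
    # For each N-gram, count the utterances of each language containing it.
    present = defaultdict(Counter)
    for grams, lab in docs:
        for g in grams:
            present[g][lab] += 1

    scores = {}
    for g, per_lab in present.items():
        total_present = sum(per_lab.values())
        mi = 0.0
        for lab, n_lab in label_count.items():
            # x = 1: utterances of this language containing g;
            # x = 0: utterances of this language without g.
            for x, joint in ((1, per_lab.get(lab, 0)),
                             (0, n_lab - per_lab.get(lab, 0))):
                if joint == 0:
                    continue  # 0 * log(...) contributes nothing
                p_xy = joint / n
                p_x = (total_present if x else n - total_present) / n
                p_y = n_lab / n
                mi += p_xy * math.log(p_xy / (p_x * p_y))
        scores[g] = mi

    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, an N-gram that occurs only in utterances of one language receives a high score, while one that occurs uniformly across all languages scores near zero and is pruned, shrinking the super-vector dimension.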