Abstract:
Automatic spoken language identification (LID) is the process of automatically identifying the languages spoken in audio files. Purely acoustic approaches have shown great potential in LID, and as they have grown in popularity, phonetic information has been largely overlooked. In this paper, we present a genetic-based fusion approach that operates on the score probabilities of two phonetic LID systems. Each system is an SVM classifier whose feature vectors are perplexities obtained from the phone language models of a phone recognizer. Two phone recognizers are used: a universal phone recognizer, which decodes a speech file into a sequence of IPA symbols, and a Farsi phone recognizer trained on the FARSDAT database. Our database contains 27 languages and we have 2 individual phonetic LID systems, so the genetic-based fusion approach yields 54 weights. The first 27 weights correspond to the system using the universal phone recognizer, and the remaining 27 correspond to the system using the Farsi phone recognizer. We then use these weights to combine the results of the two individual phonetic LID systems. Experiments on 27 languages from the NIST-LRE09 corpus show that the proposed fusion approach can greatly increase the classification accuracy of the target languages. To prevent speaker-related bias, all files from a given speaker are placed in only one set (train, development, or test).
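The weight layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the weights enter the combination or how the genetic search is configured, so the weighted per-language sum, the accuracy-based fitness, and every genetic-algorithm setting below (population size, tournament selection, uniform crossover, Gaussian mutation, elitism) are assumptions. All function and parameter names are hypothetical.

```python
import random


def fuse(weights, scores_a, scores_b):
    """Combine two per-language score vectors with per-language weights.

    weights[:n] apply to system A (e.g. universal phone recognizer) and
    weights[n:] to system B (e.g. Farsi phone recognizer). The weighted
    sum is an assumed fusion rule, not one stated in the abstract.
    """
    n = len(scores_a)
    return [weights[k] * scores_a[k] + weights[n + k] * scores_b[k]
            for k in range(n)]


def predict(weights, scores_a, scores_b):
    """Return the index of the language with the highest fused score."""
    fused = fuse(weights, scores_a, scores_b)
    return max(range(len(fused)), key=fused.__getitem__)


def accuracy(weights, data):
    """data is a list of (scores_a, scores_b, true_label) tuples."""
    return sum(predict(weights, a, b) == y for a, b, y in data) / len(data)


def evolve_weights(dev_data, n_langs, pop_size=30, generations=40,
                   mutation_rate=0.1, seed=0):
    """Search for 2 * n_langs fusion weights with a simple genetic algorithm."""
    rng = random.Random(seed)
    n_weights = 2 * n_langs

    def fitness(w):
        return accuracy(w, dev_data)

    population = [[rng.random() for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best individuals over
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Uniform crossover: each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Gaussian mutation, clipped so weights stay in [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

With `n_langs=27`, `evolve_weights` returns a list of 54 weights, matching the count given in the abstract. In this sketch the weights are fitted on a held-out development set and would then be applied unchanged to the test set; whether the authors fit them this way is not stated.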
Date of Conference: 28-29 October 2021
Date Added to IEEE Xplore: 28 February 2022