This paper presents a method for extracting distinctive phonetic features (DPFs) for automatic speech recognition (ASR). The method comprises three stages: i) an acoustic feature extractor, ii) a multilayer neural network (MLN), and iii) a hidden Markov model (HMM) based classifier. In the first stage, acoustic features called local features (LFs) are extracted from the input speech. In the second stage, the MLN generates a 45-dimensional DPF vector from the 75-dimensional LFs. Finally, this 45-dimensional DPF vector is fed into an HMM-based classifier to obtain phoneme strings. Experiments on the Japanese Newspaper Article Sentences (JNAS) corpus show that the proposed DPF extractor provides a higher phoneme correct rate and accuracy, with fewer mixture components in the HMMs, than a method based on mel frequency cepstral coefficients (MFCCs). Moreover, the proposed method obtains a higher correct rate for each phonetic feature.
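The data flow of the second stage can be illustrated with a minimal sketch, which is not the authors' implementation. It only shows how a 75-dimensional LF frame could be mapped to a 45-dimensional DPF vector by a small feed-forward network. The hidden-layer size, the sigmoid activations, the random untrained weights, and all function names are assumptions made for illustration; the abstract does not specify them. The LF extractor and the HMM classifier are omitted.

```python
import math
import random

LF_DIM = 75      # local-feature dimension stated in the abstract
DPF_DIM = 45     # DPF dimension stated in the abstract
HIDDEN_DIM = 64  # hypothetical: the abstract gives no hidden-layer size


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (untrained placeholder)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, layer):
    """Apply one fully connected layer followed by a sigmoid (assumed activation)."""
    weights, biases = layer
    return [
        sigmoid(sum(w * v for w, v in zip(row, vec)) + b)
        for row, b in zip(weights, biases)
    ]


def extract_dpf(lf_frames, hidden, output):
    """Map each 75-dim LF frame to a 45-dim DPF vector with values in (0, 1)."""
    dpf_frames = []
    for frame in lf_frames:
        if len(frame) != LF_DIM:
            raise ValueError(f"expected {LF_DIM}-dim LF frame, got {len(frame)}")
        dpf_frames.append(dense(dense(frame, hidden), output))
    return dpf_frames


rng = random.Random(0)
hidden = init_layer(LF_DIM, HIDDEN_DIM, rng)
output = init_layer(HIDDEN_DIM, DPF_DIM, rng)

# Ten synthetic frames stand in for LFs extracted from real speech.
lf_frames = [[rng.gauss(0.0, 1.0) for _ in range(LF_DIM)] for _ in range(10)]
dpf_frames = extract_dpf(lf_frames, hidden, output)
print(len(dpf_frames), len(dpf_frames[0]))  # prints "10 45"
```

In a full system, the resulting per-frame DPF vectors would replace MFCCs as the observation sequence passed to the HMM-based classifier.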
Date of Conference: 15-17 Dec. 2010