Skip to Main Content
In this paper, a novel approach is proposed for secondary feature extraction based on clusters tracking in spectro-temporal domain. Because of high dimensionality of the spectro-temporal features space, this domain is unsuitable for practical speech recognition systems. In order to reduce the dimensions of the feature space, weighted K-means (WKM) clustering technique is applied to spectro-temporal domain. The elements of mean vectors and covariance matrices of clusters are considered as the feature vector of each frame. However the cluster locations change gradually over the time. The main approach is based on the idea that the variations in clusters locations should be temporally tracked frame by frame and the parameters of these variations are considered in the extraction of secondary feature vectors of each speech frame. Several models are used to register the clusters in the new coming frame. In addition, a new architecture is proposed to classify the speech frames by a combining classifier using both tracked and non-tracked secondary features. The assessments were conducted for the proposed feature vectors on classification of several subsets of TIMIT database phonemes. Using tracked secondary feature vectors, the result was improved to 77.4% on voiced plosives classification which was relatively 1.8% higher than the results of non-tracked secondary feature vectors. The results on other subsets showed good improvement in classification rate too.