Skip to Main Content
The future commercialization of speaker- and speech-recognition technology is impeded by the large degradation in system performance due to environmental differences between training and testing conditions. This is known as the "mismatched condition." Studies have shown [l] that most contemporary systems achieve good recognition performance if the conditions during training are similar to those during operation (matched conditions). Frequently, mismatched conditions axe present in which the performance is dramatically degraded as compared to the ideal matched conditions. A common example of this mismatch is when training is done on clean speech and testing is performed on noise- or channel-corrupted speech. Robust speech techniques  attempt to maintain the performance of a speech processing system under such diverse conditions of operation. This article presents an overview of current speaker-recognition systems and the problems encountered in operation, and it focuses on the front-end feature extraction process of robust speech techniques as a method of improvement. Linear predictive (LP) analysis, the first step of feature extraction, is discussed, and various robust cepstral features derived from LP coefficients are described. Also described is the afJine transform, which is a feature transformation approach that integrates mismatch to simultaneously combat both channel and noise distortion.