Skip to Main Content
In this paper, we propose a speaker-independent isolated ward recognition system whose performance is comparable to that of a conventional isolated word recognizer, but whose computation is greatly reduced. The structure of the proposed recognizer consists of a word-based vector quantization (VQ) preprocessor, followed by a conventional DTW postprocessor. The purpose of the preprocessor is essentially to eliminate from further consideration all words in the vocabulary which are unlikely recognition candidates. In some cases, the preprocessor will be able to eliminate all word candidates except one; for such cases, there is no further processing required for word recognition. In all other cases (i.e., when more than one word candidate is passed on), a dynamic time warping (DTW) processor is used to re-solve finer acoustical distinctions among the remaining word candidates. The performance of this type of recognizer (i.e., using a word-based preprocessor and a standard DTW comparison to make finer distinctions) is affected by a number of factors involved with the details of exactly how the system is implemented-e.g., the distortion measure used in the preprocessor and in the DTW comparison, the size of the VQ codebook for each vocabulary word, the decision thresholds of the preprocessor, etc. Several of these factors were studied experimentally using testing databases consisting of isolated digits and words from a vocabulary of 129 airline terms. The results show that the proposed preprocessor has the capability of reducing computation for recognition by up to an order of magnitude, while maintaining the same performance as that obtained using a DTW comparison without the pre-processor. A somewhat smaller reduction in memory over the straight DTW implementation is also obtained in the proposed approach.