This paper considers the problem of automatically detecting and locating key words in a stream of continuous speech. The system described here is a template-matching procedure that uses a set of linear prediction coefficients as its basic waveform features. The similarity measure between a segment of the template and a segment of the incoming speech stream is taken to be a ratio of minimum prediction residuals. This similarity measure is used in conjunction with a dynamic-programming time-warp algorithm developed by Bridle and a novel method for using multiple templates. When the templates and the incoming speech were spoken by the same person in a quiet room, the system achieved an accuracy in excess of 99 percent. Further experiments explore cross-speaker word spotting and the effects of noise on system performance. Their results suggest that the technique described in this paper could well form the basis for a practical system.
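To make the pipeline concrete, here is a minimal sketch in Python (standard library only). It assumes the residual-ratio measure takes the familiar Itakura log-likelihood-ratio form: the residual energy left when a test frame is passed through the reference frame's LPC inverse filter, divided by the test frame's own minimum residual. This ratio is never below 1. The time warp is a generic open-begin, open-end dynamic-programming recursion, not a reproduction of Bridle's published algorithm, and the multiple-template method is omitted. The helper names (`frame_features`, `residual_ratio_distance`, `spot`, `ar_frame`), the LPC order of 10, the 160-sample frames, and the synthetic all-pole "speech" are all hypothetical choices made for illustration.

```python
import math
import random


def autocorrelation(frame, order):
    """Short-time autocorrelation r[0..order] of one frame (windowing omitted for brevity)."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k)) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err): inverse-filter coefficients with a[0] = 1, and the minimum
    prediction residual energy err = a^T R a, where R[i][j] = r[|i - j|].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def frame_features(frame, order=10):
    """Hypothetical per-frame feature: (autocorrelation, LPC inverse filter, min residual)."""
    r = autocorrelation(frame, order)
    a, err = levinson(r, order)
    return r, a, err


def residual_ratio_distance(test, ref):
    """log(residual of test frame through ref's inverse filter / test's own minimum residual)."""
    r_t, _, err_t = test
    _, a_ref, _ = ref
    p = len(a_ref) - 1
    resid = sum(a_ref[i] * a_ref[j] * r_t[abs(i - j)]
                for i in range(p + 1) for j in range(p + 1))
    # The ratio is >= 1 in exact arithmetic; clamp to absorb rounding error.
    return math.log(max(resid / err_t, 1.0))


def spot(template, stream):
    """Open-begin/open-end time warp: best (mean distance, end frame) of template in stream.

    D[i][j] = d(stream[j], template[i]) + min(D[i-1][j], D[i-1][j-1], D[i-1][j-2]).
    Each template frame contributes exactly one local cost, so D is divided by len(template).
    """
    inf = float("inf")
    # Free start: the template's first frame may align with any stream frame.
    prev = [residual_ratio_distance(s, template[0]) for s in stream]
    for i in range(1, len(template)):
        prev = [residual_ratio_distance(stream[j], template[i])
                + min(prev[j],
                      prev[j - 1] if j >= 1 else inf,
                      prev[j - 2] if j >= 2 else inf)
                for j in range(len(stream))]
    end = min(range(len(stream)), key=lambda j: prev[j])  # free end
    return prev[end] / len(template), end


def ar_frame(rng, coeffs, n=160):
    """Hypothetical test signal: white noise through the all-pole filter 1 / A(z)."""
    x = [0.0] * len(coeffs)
    for _ in range(n):
        x.append(rng.gauss(0.0, 1.0) - sum(c * x[-1 - k] for k, c in enumerate(coeffs)))
    return x[len(coeffs):]


if __name__ == "__main__":
    rng = random.Random(0)
    keyword = [frame_features(ar_frame(rng, [-1.3 + 0.1 * i, 0.8])) for i in range(6)]
    pre = [frame_features(ar_frame(rng, [])) for _ in range(5)]
    post = [frame_features(ar_frame(rng, [])) for _ in range(5)]
    score, end = spot(keyword, pre + keyword + post)
    print(f"mean distance={score:.3g}, end frame={end}")  # keyword occupies frames 5..10
```

Dividing by the template length is valid here only because each template frame is charged exactly once along any path. A real detector would compare the normalized score against a tuned threshold rather than simply taking the minimum.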