In most template-matching methods of automatic word recognition, putatively corresponding frames of the template and the unknown speech are found by allowing time alignment such that a least cumulative spectral distance is obtained. The resultant time warping allows the best match to the spectrum of each frame, but in doing so it can destroy temporal relations among frames. Therefore, a technique was developed to take advantage of characteristic temporal relations among the acoustic segments of a test word. An algorithm using jumps in energy and spectral tilt was used to divide the word into acoustic segments, and upper and lower bounds on ratios of unwarped durations were set from known occurrences of the word in a development database. These segmentation procedures and ratio criteria were then applied to the best-scoring stretches of speech from a different set of talkers, as found by an automatic speech recognition system that relies on a spectrally based time warp. None of the 12 occurrences of the test word was rejected, while 19 of the 22 non-occurrences were rejected.
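The duration-ratio criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the segmentation by energy and spectral-tilt jumps is not shown, the function names (`duration_ratios`, `learn_bounds`, `accept`) and the `margin` parameter are hypothetical, the use of all pairwise ratios is an assumption (the abstract does not say which ratios were used), and the segment durations are invented values in milliseconds. The sketch also assumes every known occurrence yields the same number of segments.

```python
from itertools import combinations


def duration_ratios(durations):
    """Return the ratio d[i] / d[j] for every segment pair i < j.

    `durations` are unwarped segment durations (assumed positive).
    """
    return {(i, j): durations[i] / durations[j]
            for i, j in combinations(range(len(durations)), 2)}


def learn_bounds(training_durations, margin=0.0):
    """Set lower and upper bounds on each ratio from known occurrences.

    `margin` (hypothetical) widens the observed range by a fraction.
    """
    bounds = {}
    for durations in training_durations:
        for pair, ratio in duration_ratios(durations).items():
            low, high = bounds.get(pair, (ratio, ratio))
            bounds[pair] = (min(low, ratio), max(high, ratio))
    return {pair: (low * (1 - margin), high * (1 + margin))
            for pair, (low, high) in bounds.items()}


def accept(durations, bounds):
    """Accept a candidate only if every duration ratio lies within bounds."""
    ratios = duration_ratios(durations)
    if ratios.keys() != bounds.keys():
        return False  # segment count differs from the known occurrences
    return all(bounds[pair][0] <= ratio <= bounds[pair][1]
               for pair, ratio in ratios.items())


# Invented segment durations (ms) for three known occurrences of a word.
development = [[80, 120, 60], [90, 150, 60], [70, 100, 50]]
bounds = learn_bounds(development)

plausible = accept([85, 130, 60], bounds)    # ratios fall inside the bounds
implausible = accept([60, 60, 120], bounds)  # temporal pattern is distorted
```

In this sketch a candidate found by the spectrally based time warp would be screened by `accept` on its unwarped segment durations, so a stretch of speech that matches spectrally but has an atypical temporal pattern can still be rejected.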