We present the results of combining several feature sets to recognize word utterances extracted from a continuous stream of speech. Three feature sets were used to train and test a set of keywords from the CallHome telephone speech database: spectral energy in Bark bands, mel frequency cepstral coefficients (MFCCs), and parameters from an AM-FM model. We used dynamic time warping (DTW) to compare the feature set of an unknown word utterance pairwise against that of each reference utterance. On a subset of the database, this gave a false negative rate of 4 out of 12 (33%) and a false positive rate of 5 out of 132 (about 3.8%). Long, multisyllabic words were spotted correctly, whereas two short words in the word list contributed to the errors.
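The pairwise DTW comparison can be illustrated with a minimal sketch. The abstract does not specify several parts of the method, so the following are assumptions: how the three feature sets are combined into one vector per frame (here they are simply taken as one list of floats per frame), the Euclidean local distance, the symmetric step pattern, the length normalization, and the nearest-reference decision rule. The helper names `frame_distance`, `dtw_distance`, and `best_match` are hypothetical and do not describe the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: each frame is one feature vector (e.g. Bark energies,
# MFCCs and AM-FM parameters concatenated); an utterance is a list of frames.
Frame = Sequence[float]
Utterance = Sequence[Frame]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Utterance, ref: Utterance) -> float:
    """Classic DTW with steps (i-1, j), (i, j-1), (i-1, j-1).

    Returns the accumulated alignment cost divided by len(query) + len(ref),
    so utterances of different lengths yield comparable scores.
    """
    n, m = len(query), len(ref)
    inf = float("inf")
    # D[i][j] = minimal cost of aligning query[:i] with ref[:j]
    D: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(query[i - 1], ref[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


def best_match(query: Utterance,
               references: Dict[str, Utterance]) -> Tuple[str, float]:
    """Compare the unknown utterance with every reference utterance and
    return the label and DTW distance of the closest one (hypothetical rule)."""
    scores = {label: dtw_distance(query, ref) for label, ref in references.items()}
    label = min(scores, key=scores.get)
    return label, scores[label]


if __name__ == "__main__":
    # Toy 2-D "features"; the query is a time-stretched copy of "hello".
    refs = {"hello": [[0, 0], [1, 1], [2, 2]], "goodbye": [[5, 5], [5, 5]]}
    query = [[0, 0], [0, 0], [1, 1], [2, 2]]
    print(best_match(query, refs))  # prints ('hello', 0.0)
```

DTW absorbs the repeated first frame of the query at no cost, so the stretched utterance still aligns perfectly with its reference. In a real keyword spotter, the closest distance would typically also be compared against a threshold to decide whether to accept the match. That threshold governs the trade-off between false negatives and false positives reported above, and its value here would be an assumption.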