A method of phoneme recognition of connected speech is described. Input to the system is assumed to consist of the 24 continuant phonemes in connected English speech. The system first categorizes each successive 20-ms segment of the input speech utterance as either voiced fricative, voiced nonfricative, unvoiced fricative, or no-speech, utilizing a measure of the relative energy balance between low and high frequencies. Next, each 20-ms segment is recognized from a distribution of axis-crossing intervals of speech prefiltered to emphasize each formant frequency range. Segmentation is performed from the results of the recognition of each 20-ms segment and from changes in categorization. Finally, the recognition results for the 20-ms segments between each pair of segmentation boundaries are combined, and the phonemic sound occurring most frequently is printed out. The system has been trained for a single male speaker. Preliminary results for this speaker and for four 3- to 4-s sentences indicate a correct categorization decision for about 97 percent of the input 20-ms segments, a correct recognition for about 78 percent of the input 20-ms segments, and an overall correct phoneme recognition for about 87 percent of the input phonemes.
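
Two of the steps described above, measuring axis-crossing intervals and combining per-segment results between segmentation boundaries, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the definition of an axis crossing as a sign change between adjacent samples, and the representation of boundaries as frame indices are all assumptions made for illustration. Prefiltering, categorization, and the classifier itself are omitted.

```python
from collections import Counter


def zero_crossing_intervals(samples):
    """Return distances (in samples) between successive sign changes.

    Assumption: an axis crossing is a change of sign between two
    adjacent samples; zero is treated as non-negative.
    """
    crossings = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    ]
    return [b - a for a, b in zip(crossings, crossings[1:])]


def majority_label(frame_labels):
    """Return the label occurring most often among the given frames."""
    return Counter(frame_labels).most_common(1)[0][0]


def combine(frame_labels, boundaries):
    """Pick the most frequent label between each pair of boundaries.

    Assumption: `boundaries` is a sorted list of frame indices that
    starts at 0 and ends at len(frame_labels).
    """
    return [
        majority_label(frame_labels[start:end])
        for start, end in zip(boundaries, boundaries[1:])
        if end > start
    ]
```

For example, `combine(["s", "s", "z", "s", "a", "a", "a"], [0, 4, 7])` votes over the frames of each of the two spans separately, so a single misrecognized frame (`"z"`) does not change the phoneme reported for its span. This is consistent with the abstract's report of a higher accuracy per phoneme than per 20-ms segment.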