A speaker-independent isolated-word speech recognition system is discussed in which an unknown utterance is described by a set of speech feature measurements and then compared with a reference set of the same measurements obtained during a training procedure with a population of speakers. To reduce the number of word confusions significantly, a segmentation procedure is used. As a result, the whole vocabulary is divided into a number of subgroups, each characterized by a certain phonetic structure that is represented by a sequence of four types of segments. Each of these segments quite reliably reflects certain speaker-independent events in the speech signal. Thus, a subgroup may contain the reference sets for several confusable words; conversely, each word is represented by its reference sets in a number of subgroups. The system was evaluated using a 20-word vocabulary (including ten digits). A mean recognition accuracy of about 95 percent was obtained. The tradeoff between quite precise segmentation, which leads to a very bulky system if it is to be speaker-independent, and rough segmentation, which in some cases does not reduce the number of confusable words very much, is also discussed.
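As an illustration only, the two-stage scheme described above (segment sequence selects a subgroup, then the utterance is compared with the reference sets in that subgroup) might be sketched as follows. The segment labels, the feature vectors, the Euclidean distance, and all function names are hypothetical; the abstract does not specify the features, the four segment types, or the comparison rule.

```python
import math
from collections import defaultdict


def build_subgroups(training_examples):
    """Group reference feature sets by the segment sequence observed in training.

    training_examples: iterable of (word, segment_sequence, feature_vector).
    A word may appear under several segment sequences, and one sequence
    may hold references for several confusable words.
    """
    subgroups = defaultdict(list)
    for word, segments, features in training_examples:
        subgroups[tuple(segments)].append((word, list(features)))
    return dict(subgroups)


def recognize(subgroups, segments, features):
    """Return the word of the closest reference in the matching subgroup.

    Returns None when no subgroup has the observed segment sequence.
    Euclidean distance is an assumption made for this sketch.
    """
    candidates = subgroups.get(tuple(segments))
    if not candidates:
        return None
    best_word, _ = min(candidates, key=lambda ref: math.dist(ref[1], features))
    return best_word


# Hypothetical training data: labels "A".."D" stand in for the four segment types.
demo_refs = build_subgroups([
    ("six", ("A", "B", "A"), [1.0, 0.0]),
    ("six", ("A", "B"), [1.0, 0.2]),      # same word, second subgroup
    ("seven", ("A", "B", "A"), [0.0, 1.0]),
])

demo_result = recognize(demo_refs, ("A", "B", "A"), [0.9, 0.1])
```

In this sketch, a finer segmentation would produce more distinct segment sequences and therefore more subgroups to store, while a coarser one would leave more confusable words in each subgroup, which mirrors the tradeoff mentioned above.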