This paper describes experiments in which word recognition is based on comparing the projections of input words onto an orthogonal basis with those of a stored library of words. An initial orthogonal basis is determined from the generalized spectrum of short time segments selected from a vocabulary of ten words. The initial basis is then optimized by minimizing the complementary error energy. Projecting a spoken word onto the optimum orthogonal basis generates a sequence of numbers that represents the word. The spoken word is identified by correlating the absolute values of this sequence with those of a stored library of words. Correct recognition rates range from 71.6 to 96.6 percent for the two speakers. Techniques are developed to improve the recognition scores and to reduce the long computer processing time and large storage requirements. First, a master template is formed for each word by averaging six templates of that word. For one speaker, the correct recognition rate increases to 100 percent when incoming words are compared against the master templates. For the second speaker, recognition rates improve significantly, ranging from 93 to 98 percent, when the master templates are used. To further improve the recognition process, the feasibility of grouping words into several classes is demonstrated. The classes are based on the locations of the formant regions and the duration of each spoken word.
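The projection, correlation, and master-template steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation. The orthonormal basis is taken as given: deriving it from the generalized spectrum and minimizing the complementary error energy is not shown. Each word is assumed to be already segmented into the same number of equal-length frames, so no time alignment is performed. "Correlating" is interpreted here as the normalized correlation of absolute-value sequences. Templates are assumed to be stored as absolute values, and a master template is assumed to be their element-wise mean. The function names `project`, `make_template`, `correlation`, `master_template`, and `recognize` are hypothetical.

```python
import math


def project(frames, basis):
    """Project each short-time frame onto every basis vector.

    Hypothetical assumption: the rows of `basis` are orthonormal and have
    the same length as each frame. Returns one flat sequence: all
    coefficients of frame 0, then all coefficients of frame 1, and so on.
    """
    return [sum(f * b for f, b in zip(frame, vec)) for frame in frames for vec in basis]


def make_template(frames, basis):
    """Store a word as the absolute values of its projection coefficients."""
    return [abs(c) for c in project(frames, basis)]


def correlation(seq, template):
    """Normalized correlation between |seq| and |template|.

    Assumption: both sequences have equal length (no time alignment).
    Returns a value in [0, 1]; 0.0 if either sequence is all zeros.
    """
    a = [abs(x) for x in seq]
    b = [abs(y) for y in template]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def master_template(templates):
    """Element-wise average of several templates of the same word."""
    n = len(templates)
    return [sum(values) / n for values in zip(*templates)]


def recognize(seq, library):
    """Return the library word whose template best correlates with `seq`."""
    return max(library, key=lambda word: correlation(seq, library[word]))
```

In use, the six `make_template` outputs for a given word would be passed to `master_template`, and the averaged result stored in `library` under that word. Incoming words would then be projected with `project` and matched with `recognize`.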