
Word recognition by means of orthogonal functions


M. Clark; Department of Defense, Washington, D.C.

This paper describes experiments in which words are recognized by comparing the projections of an input word onto an orthogonal basis with the corresponding projections of a stored library of words. An initial orthogonal basis is derived from the generalized spectrum of short time segments selected from a ten-word vocabulary. This basis is then optimized by minimizing the complementary error energy. Projecting a spoken word onto the optimized basis yields a sequence of numbers that represents the word. The word is identified by correlating the absolute values of this sequence with those of the stored library words. Correct recognition ranges from 71.6 to 96.6 percent for two speakers. Techniques are developed to improve these recognition scores and to reduce the lengthy computer processing time and large storage requirement. First, a master template is made for each word by averaging six templates of that word. When incoming words are compared against the master templates, correct recognition rises to 100 percent for one speaker. For the second speaker, recognition rates improve significantly, to between 93 and 98 percent. To further improve the recognition process, the feasibility of grouping words into several classes is demonstrated. The classification is based on the locations of formant regions and the duration of each spoken word.
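The matching pipeline in the abstract (project onto an orthogonal basis, take absolute values, correlate against stored templates, average several templates into a master template) can be illustrated with a small sketch. This is not the paper's method. The assumptions are as follows:

- Plain Gram-Schmidt orthonormalization of a few example vectors stands in for the generalized-spectrum basis and its energy-minimizing optimization.
- Every word is assumed to be cut into the same number of fixed-length segments.
- Pearson correlation is assumed as the "correlation" measure.

All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def gram_schmidt(vectors: Sequence[Sequence[float]], tol: float = 1e-12) -> List[Vector]:
    """Orthonormalize the input vectors, dropping any that are (nearly) linearly dependent.

    Stand-in only: the paper derives its basis from the generalized spectrum of
    speech segments and then optimizes it, which is not reproduced here.
    """
    basis: List[Vector] = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))      # component of w along b
            w = [x - c * y for x, y in zip(w, b)]     # remove that component
        norm = math.sqrt(sum(x * x for x in w))
        if norm > tol:
            basis.append([x / norm for x in w])
    return basis


def project_word(frames: Sequence[Sequence[float]], basis: Sequence[Vector]) -> Vector:
    """Represent a word as |<frame, b_k>| for every segment and basis vector, in order.

    Assumes every word has the same number of equal-length segments.
    """
    return [abs(sum(x * y for x, y in zip(f, b))) for f in frames for b in basis]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (assumed similarity measure)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def master_template(templates: Sequence[Sequence[float]]) -> Vector:
    """Element-wise average of several templates of the same word."""
    return [sum(col) / len(templates) for col in zip(*templates)]


def recognize(word: Sequence[float], library: Dict[str, Vector]) -> str:
    """Return the library entry whose template correlates best with the input word."""
    return max(library, key=lambda name: correlation(word, library[name]))


if __name__ == "__main__":
    basis = gram_schmidt([[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]])
    yes = master_template([
        project_word([[2, 0, 0, 0], [0, 2, 0, 0]], basis),
        project_word([[1.8, 0.1, 0, 0], [0, 2.2, 0, 0]], basis),
    ])
    no = master_template([
        project_word([[0, 0, 1, 1], [0, 0, 1, 1]], basis),
        project_word([[0, 0, 1.2, 0.8], [0, 0, 0.9, 1.1]], basis),
    ])
    library = {"yes": yes, "no": no}
    print(recognize(project_word([[2.1, 0, 0, 0], [0, 1.9, 0, 0]], basis), library))
```

Taking absolute values before correlating follows the abstract. Averaging takes into master templates also shrinks the library from one entry per recording to one entry per word, consistent with the abstract's aim of reducing storage.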

Published in: IEEE Transactions on Audio and Electroacoustics, vol. 18, no. 3