Reading machine: From text to speech

Author: Lee, F., Massachusetts Institute of Technology, Cambridge, Mass.

A machine with an unrestricted vocabulary that can convert printed text into connected speech in real time would be extremely useful to blind people. The main problems in implementing such a machine are 1) character recognition, 2) conversion of the symbolic form of written language into a symbolic form of spoken language, and 3) synthesis of connected speech from the symbolic description. Character recognition must be highly accurate, although high speed is not necessary. Spoken language may be represented symbolically by strings of segmental phonemes, together with additional specifications at the phrase and sentence (suprasegmental) levels. The segmental phonemes characterize the basic speech sound elements, and the suprasegmental specifications characterize intonation, stress, and pauses. For a restricted vocabulary, a spelling-to-pronunciation dictionary that lists the pronunciation of each word alongside its spelling can be used to obtain the segmental phonemes; for an unrestricted vocabulary in a language like English, however, a scheme that employs a dictionary of word elements (prefixes, suffixes, and roots), together with a set of rules for word formation, is necessary and more economical. Since suprasegmental specifications depend upon sentence structure, sentence analysis (parsing) must be performed to identify essential groups. A speech synthesizer may be based on the terminal transfer characteristic of the human vocal tract as a whole, or on the transfer characteristics of a cascade of many acoustic-tube sections of variable cross-sectional area that simulate the vocal tract. Speech synthesis-by-rule is the generation, according to a set of predetermined rules, of the variable parameters of a speech synthesizer as functions of time from an input of segmental and suprasegmental specifications.
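
The dictionary-of-word-elements idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The tiny lexicon, the ARPAbet-style phoneme symbols, the function names, and the single word-formation rule (the suffix "s" is voiceless after a voiceless consonant) are all assumptions made for illustration, and spelling changes at morph boundaries (such as "make" + "ing") are not handled.

```python
from typing import Optional

# Hypothetical toy lexicon: morph spelling -> phoneme symbols (ARPAbet-style).
PREFIXES = {"un": ["AH", "N"], "re": ["R", "IY"]}
ROOTS = {
    "read": ["R", "IY", "D"],
    "lock": ["L", "AA", "K"],
    "print": ["P", "R", "IH", "N", "T"],
}
SUFFIXES = {"ing": ["IH", "NG"], "er": ["ER"], "s": ["Z"]}

# Assumed set of voiceless consonants used by the one formation rule below.
VOICELESS = {"P", "T", "K", "F", "TH"}


def decompose(word: str) -> Optional[list[tuple[str, str]]]:
    """Split a word into an optional prefix, a root, and an optional suffix."""
    word = word.lower()
    # Try "no prefix" first, then known prefixes, longest first.
    for prefix in [""] + sorted(PREFIXES, key=len, reverse=True):
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        # Try "no suffix" first, then known suffixes, longest first.
        for suffix in [""] + sorted(SUFFIXES, key=len, reverse=True):
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[: len(rest) - len(suffix)]
            if root in ROOTS:
                parts = []
                if prefix:
                    parts.append(("prefix", prefix))
                parts.append(("root", root))
                if suffix:
                    parts.append(("suffix", suffix))
                return parts
    return None  # no decomposition found in this toy lexicon


def to_phonemes(word: str) -> Optional[list[str]]:
    """Concatenate morph pronunciations, applying one word-formation rule."""
    parts = decompose(word)
    if parts is None:
        return None
    phonemes: list[str] = []
    for kind, morph in parts:
        if kind == "prefix":
            phonemes += PREFIXES[morph]
        elif kind == "root":
            phonemes += ROOTS[morph]
        elif morph == "s" and phonemes[-1] in VOICELESS:
            # Formation rule: "s" is pronounced S after a voiceless consonant.
            phonemes.append("S")
        else:
            phonemes += SUFFIXES[morph]
    return phonemes


print(to_phonemes("rereading"))
print(to_phonemes("unlocks"))
```

With three roots, two prefixes, and three suffixes, the sketch covers words such as "rereading", "unlocks", and "printer" without listing each one, which is the economy the abstract attributes to a morph dictionary over a whole-word dictionary.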

Published in: IEEE Transactions on Audio and Electroacoustics (Volume 17, Issue 4)