A machine with an unrestricted vocabulary that can convert printed text into connected speech in real time would be extremely useful to blind people. The main problems in implementing such a machine are 1) character recognition, 2) conversion of the symbolic form of written language into a symbolic form of spoken language, and 3) synthesis of connected speech from the symbolic description. The character recognition must be highly accurate, although high speed is not necessary. Spoken language may be represented symbolically by strings of segmental phonemes, together with additional specifications at the phrase and sentence, or suprasegmental, levels. The segmental phonemes characterize the basic speech sound elements, and the suprasegmental specifications characterize intonation, stress, and pauses. For a restricted vocabulary, a dictionary that gives the pronunciation of each word alongside its spelling can be used to obtain the segmental phonemes; for an unrestricted vocabulary in a language like English, however, a scheme that employs a dictionary of word elements (prefixes, suffixes, and roots), together with a set of rules for word formation, is necessary and more economical. Since suprasegmental specifications depend upon sentence structure, sentence analysis (parsing) must be performed to identify the essential groups. A speech synthesizer may be based on the terminal transfer characteristic of the human vocal tract as a whole, or on the transfer characteristics of a cascade of many acoustic-tube sections of variable cross-sectional area that simulate the vocal tract. Speech synthesis-by-rule is the generation, according to a set of predetermined rules, of the variable parameters of a speech synthesizer as functions of time from an input of segmental and suprasegmental specifications.
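The dictionary-of-word-elements scheme described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three tiny tables, the ARPAbet-style phoneme symbols, and the function names `decompose` and `to_phonemes` are hypothetical and do not come from the original work. The only word-formation rule it applies is "optional prefix + root + optional suffix"; a practical system would also need rules for spelling changes at element boundaries (for example, the dropped "e" in "making"), which are omitted here.

```python
# Toy lexicon of word elements (hypothetical entries, ARPAbet-style symbols).
PREFIXES = {"un": "AH N", "re": "R IY"}
ROOTS = {"read": "R IY D", "do": "D UW", "kind": "K AY N D"}
SUFFIXES = {"ing": "IH NG", "er": "ER", "ness": "N AH S"}


def decompose(word):
    """Split a word into (prefix, root, suffix); prefix and suffix may be ''.

    Returns None when no combination of known elements spells the word.
    The empty prefix and empty suffix are tried first, so a bare root such
    as 'read' is not mis-split into 're' + 'ad'.
    """
    word = word.lower()
    for prefix in [""] + sorted(PREFIXES, key=len, reverse=True):
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + sorted(SUFFIXES, key=len, reverse=True):
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[:len(rest) - len(suffix)] if suffix else rest
            if root in ROOTS:
                return (prefix, root, suffix)
    return None


def to_phonemes(word):
    """Concatenate the phoneme strings of a word's elements, or return None."""
    parts = decompose(word)
    if parts is None:
        return None
    prefix, root, suffix = parts
    pieces = [PREFIXES.get(prefix), ROOTS[root], SUFFIXES.get(suffix)]
    return " ".join(p for p in pieces if p)


if __name__ == "__main__":
    for w in ("unkindness", "reading", "redo", "xyz"):
        print(w, "->", to_phonemes(w))
```

With only eight stored elements, the sketch covers words such as "unkindness", "reading", and "redo" that a whole-word dictionary would have to list individually, which is the economy the scheme aims for. A word with no analysis returns `None`, which is where a fallback such as letter-to-sound rules would be needed.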