Skip to Main Content
As the use of voice response systems employing synthetic speech becomes more widespread in consumer products, industrial and military applications, and aids for the handicapped, it will be necessary to develop reliable methods of comparing different synthesis systems and of assessing how human observers perceive and respond to the speech generated by these systems. The selection of a specific voice response system for a particular application depends on a wide variety of factors only one of which is the inherent intelligibility of the speech generated by the synthesis routines. In this paper, we describe the results of several studies that applied measures of phoneme intelligibility, word recognition, and comprehension to assess the perception of synthetic speech. Several techniques were used to compare performance of different synthesis systems with natural speech and to learn more about how humans perceive synthetic speech generated by rule. Our findings suggest that the perception of synthetic speech depends on an interaction of several factors including the acoustic-phonetic properties of the speech signal, the requirements of the perceptual task, and the previous experience of the listener. Differences in perception between natural speech and high-quality synthetic speech appear to be related to the redundancy of the acoustic-phonetic information encoded in the speech signal.