Assembling a speech database that is both manageably small and sufficiently diverse can be a useful step in the development of speaker-independent speech recognition systems. Yet there have been no data on what kind of speaker sample is needed to ensure that the group's speech includes particular phonetic or linguistic traits. The data gathered in this study suggest that some common and important dialect features will not be found even among a large number of speakers if sampling is conducted at a single location. To compile a large pool of prospective speakers, 152 people were recorded speaking extemporaneously for about one to two minutes each; the three authors then rated the recordings on fifteen characteristics that form three classes: voice quality, manner of speaking, and dialect. Although a wide variety of voice characteristics and manners of speaking were evident among the 152 speakers, the dialect features covered a limited range. We discuss the possible causes of this distribution of characteristics in the sample and some of its implications for collecting adequate databases for speech recognition research.
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '85 (Volume: 10)
Date of Conference: Apr 1985