Assembling a speech database that is both manageably small and sufficiently diverse can be a useful step in developing speaker-independent speech recognition systems. Yet no data have been available on what kind of speaker sample might be required to ensure that the speakers in a group collectively exhibit particular phonetic or linguistic traits. The data gathered in this study suggest that some common and important dialect features will not appear even among a large number of speakers if sampling is conducted at a single location. To compile a large pool of prospective speakers, 152 people were recorded speaking extemporaneously for one to two minutes each. The three authors then rated the recordings on fifteen characteristics grouped into three classes: voice quality, manner of speaking, and dialect. Although the 152 speakers showed a wide variety of voice characteristics and manners of speaking, their dialect features covered only a limited range. We discuss possible causes of this distribution of characteristics in the sample and some of its implications for collecting adequate databases for speech recognition research.