As part of our goal to design large-vocabulary, phonetically-based isolated word recognition systems, we investigated the statistical properties and constraints of the phonemic structures of English words. Our database consisted of five lexicons varying in size from 1250 to 20,000 words. The lexicons included, in addition to a phonemic transcription for each word, the word's frequency of occurrence as determined from the Brown Corpus. We studied the distributions of the phonemes, both individually and by class, within the lexicon and within the corpus. Distributions of consonant clusters were also obtained. Finally, the distribution of words in terms of patterns derived from broad categorization of the phonemes was investigated. This paper summarizes the results of these studies and discusses implications for phonetically-based isolated word recognition strategies.
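To make the two kinds of measurement concrete, the following is a minimal sketch, not the authors' procedure. It computes phoneme distributions "within the lexicon" (each word counted once) and "within the corpus" (each word weighted by its frequency of occurrence), and groups words by the pattern obtained from a broad categorization of their phonemes. The toy lexicon, its frequency counts, the broad classes, and all function names are assumptions chosen for illustration; the paper's actual category set and data are not given in this abstract.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: word -> (phonemic transcription, corpus frequency).
# Transcriptions and counts are illustrative only, not data from the paper.
LEXICON = {
    "pat": (["p", "ae", "t"], 10),
    "bat": (["b", "ae", "t"], 20),
    "sat": (["s", "ae", "t"], 30),
    "man": (["m", "ae", "n"], 40),
}

# Assumed broad phonetic classes; the paper's actual categories may differ.
BROAD_CLASS = {
    "p": "STOP", "b": "STOP", "t": "STOP",
    "s": "FRIC",
    "m": "NASAL", "n": "NASAL",
    "ae": "VOWEL",
}


def phoneme_distribution(lexicon, weighted):
    """Return the relative frequency of each phoneme.

    weighted=False counts every word once (distribution within the lexicon);
    weighted=True weights each word by its frequency (distribution within
    the corpus).
    """
    counts = Counter()
    for phonemes, freq in lexicon.values():
        weight = freq if weighted else 1
        for ph in phonemes:
            counts[ph] += weight
    total = sum(counts.values())
    return {ph: c / total for ph, c in counts.items()}


def broad_pattern(phonemes):
    """Map a phonemic transcription to its sequence of broad classes."""
    return tuple(BROAD_CLASS[ph] for ph in phonemes)


def group_by_pattern(lexicon):
    """Group words that share the same broad-class pattern."""
    groups = defaultdict(list)
    for word, (phonemes, _freq) in lexicon.items():
        groups[broad_pattern(phonemes)].append(word)
    return dict(groups)


if __name__ == "__main__":
    print(phoneme_distribution(LEXICON, weighted=False))
    print(phoneme_distribution(LEXICON, weighted=True))
    for pattern, words in group_by_pattern(LEXICON).items():
        print(pattern, words)
```

In this toy example, "pat" and "bat" share one broad-class pattern, so a recognizer that identified only broad classes could not tell them apart, while "sat" and "man" each have a pattern to themselves. The size of such groups in a real lexicon is the kind of quantity the pattern distribution describes.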