In natural speech, some phoneme transitions correspond to abrupt changes in the acoustic signal. Others are less clear-cut because the acoustic transition from one phoneme to the next is gradual. In this paper, we determine the naturally occurring groups of phonemes (regardless of conventional phonetic categories) that behave similarly in this respect. These data-driven groupings could be used in the design of decision trees for context-dependent phoneme clustering, as used in large-vocabulary speech recognition and alignment systems, or in the design of speech databases for speech synthesis systems. We use 128 different Hidden Markov Model phoneme alignment systems and a large corpus of British English speech to assess the consistency with which different phoneme transitions can be identified. The phoneme transitions are grouped automatically so as to minimize the statistical differences in behavior between members of each group. In this way we derive two sets of phonemic classes, one for the first phoneme of each phoneme-to-phoneme transition and another for the second. The grouping of the phonemes confirms that broad phonetic classes are a significant indicator of the accuracy with which boundaries can be identified, but there are a number of exceptions, as well as some apparent subdivisions and mergers of accepted phonetic classes. The automatic grouping of the second phonemes results in two singletons, /Z/ and /N/ (in SAMPA notation). Finally, statistics are presented that characterize the precision with which transitions between these automatic classes can be identified. These statistics could provide weightings for different transitions, giving a more realistic assessment when the relative accuracies of different alignment systems are evaluated.
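The idea of grouping phonemes so that members of a group behave alike can be illustrated with a minimal sketch. It is not the procedure used in the paper, whose clustering criterion is not stated here. It assumes a simple greedy agglomerative merge that repeatedly joins the two groups whose mean boundary deviations are closest. The function name `group_phonemes`, the choice of the mean as the statistic, and the toy deviation values (in milliseconds) are all hypothetical and chosen only to make the example run.

```python
from itertools import combinations
from statistics import mean


def group_phonemes(deviations, n_groups):
    """Greedily merge phonemes into n_groups groups.

    deviations maps a phoneme (SAMPA symbol) to a list of boundary
    deviations. At each step the two groups with the closest mean
    deviation are merged. This is an assumed criterion, not the
    one used in the paper.
    """
    # Each group is a pair: (member phonemes, pooled deviation samples).
    groups = [([p], list(v)) for p, v in deviations.items()]
    while len(groups) > n_groups:
        # Find the pair of groups whose mean deviations differ least.
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: abs(mean(groups[ij[0]][1]) - mean(groups[ij[1]][1])),
        )
        merged = (groups[i][0] + groups[j][0], groups[i][1] + groups[j][1])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return [sorted(members) for members, _ in groups]


# Invented toy data (not results from the paper): boundary deviations in ms.
toy = {
    "p": [4, 5, 6],
    "t": [5, 6, 5],
    "m": [12, 14, 13],
    "n": [13, 12, 15],
    "Z": [30, 35, 28],
}
groups = group_phonemes(toy, 3)
print(sorted(groups))
```

With data of this shape, phonemes whose boundaries are located with similar consistency end up in the same group, and a phoneme with markedly different behavior remains a singleton. A real study would need a proper statistical distance between distributions rather than a difference of means.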