Methods for performing voiced/unvoiced/mixed excitation classification of speech are explored. The decision-making process is viewed as a pattern recognition problem. Three aspects of the task are examined: classifier type, decision structure, and feature selection. A variety of different approaches are compared. A classifier is obtained which, in limited tests, achieves 95 percent classification accuracy on speaker-dependent tests (with 82.7 percent correct identification of mixed excitation frames), and 94 percent accuracy on speaker-independent tests (with 77.6 percent correct identification of mixed excitation frames). The classifier uses a binary decision tree structure, in which a speech segment is first classified as predominantly voiced or predominantly unvoiced, and then tested to determine whether the excitation for the segment is mixed. Each decision is made using a Bayes classifier. The feature selection procedure identified a set of 14 features for making the voiced/unvoiced/mixed excitation classification.
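The two-stage decision structure described above can be sketched in code. The following is a minimal, hypothetical Python illustration, not the paper's implementation: it uses a simple Gaussian (quadratic) Bayes classifier at each node of the binary tree, first separating predominantly voiced from predominantly unvoiced frames and then testing for mixed excitation. The feature vectors, class labels, and Gaussian class-conditional assumption are illustrative stand-ins for the paper's 14 selected features and trained classifiers.

```python
import numpy as np

class GaussianBayes:
    """Minimal Bayes classifier assuming a multivariate Gaussian
    class-conditional density per class; picks the class with the
    highest log posterior."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            # Small ridge on the covariance keeps it invertible.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            prior = len(Xc) / len(X)
            self.params_[c] = (mu, np.linalg.inv(cov),
                               np.log(np.linalg.det(cov)), np.log(prior))
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            mu, icov, logdet, logprior = self.params_[c]
            d = X - mu
            # Log posterior up to a constant:
            # -0.5 * (d' Sigma^-1 d + log|Sigma|) + log prior
            ll = -0.5 * (np.einsum('ij,jk,ik->i', d, icov, d)
                         + logdet) + logprior
            scores.append(ll)
        return self.classes_[np.argmax(np.stack(scores), axis=0)]

def classify_frame(features, stage1, stage2_voiced, stage2_unvoiced):
    """Binary decision tree: decide predominantly voiced vs. unvoiced
    first, then test the winning branch for mixed excitation."""
    x = np.asarray(features).reshape(1, -1)
    if stage1.predict(x)[0] == 'voiced':
        branch = stage2_voiced
        pure_label = 'voiced'
    else:
        branch = stage2_unvoiced
        pure_label = 'unvoiced'
    return 'mixed' if branch.predict(x)[0] == 'mixed' else pure_label
```

In this sketch each node is trained independently on its own two-class problem, mirroring the structure in which the harder voiced/unvoiced distinction is made before the mixed-excitation test; the classifier names and the `classify_frame` helper are assumptions for illustration only.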