The importance of emotion information in human speech has grown in recent years with the increasing use of natural user interfaces in embedded systems. Speech-based human-machine communication offers a high degree of usability, and it need not be limited to speech-to-text and text-to-speech capabilities. This research considers emotion recognition in uttered speech, with the aim of integrating a speech recognizer/synthesizer with the capacity to recognize and synthesize emotion. This paper describes a complete framework for recognizing and synthesizing emotional speech based on smart logic (fuzzy logic and artificial neural networks). Time-domain signal-processing algorithms have been applied to reduce computational complexity at the feature-extraction level. A fuzzy-logic engine was modeled to infer the emotional content of uttered speech, and an artificial neural network was modeled to synthesize emotive speech. Both were designed for integration into an embedded handheld device that implements a speech-based natural user interface (NUI).
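The pipeline described above, time-domain features feeding a fuzzy-logic inference engine, can be illustrated with a minimal sketch. The specific features (short-time energy, zero-crossing rate), membership functions, and rules below are assumptions chosen for illustration, not the paper's actual design; time-domain features like these are attractive on embedded hardware precisely because they avoid the FFTs that frequency-domain features require.

```python
# Illustrative sketch only: the feature set, membership breakpoints, and
# rule base here are assumptions, not the framework described in the paper.

def short_time_energy(frame):
    """Average squared amplitude of one speech frame (time domain)."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def up(x, a, b):
    """Rising-shoulder membership: 0 below a, 1 above b, linear between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def down(x, a, b):
    """Falling-shoulder membership (complement of `up`)."""
    return 1.0 - up(x, a, b)

def fuzzy_arousal(energy, zcr):
    """Tiny two-rule Sugeno-style inference mapping features to arousal in [0, 1].

    Rule 1: IF energy is high AND zcr is high THEN arousal is high (1.0)
    Rule 2: IF energy is low               THEN arousal is low  (0.0)
    """
    w_high = min(up(energy, 0.1, 0.5), up(zcr, 0.2, 0.6))  # AND = min
    w_low = down(energy, 0.05, 0.3)
    if w_high + w_low == 0:
        return 0.5  # no rule fires: report neutral arousal
    # Weighted average of the rule consequents (defuzzification)
    return (w_high * 1.0 + w_low * 0.0) / (w_high + w_low)

# A loud, rapidly oscillating frame should score as high-arousal speech,
# while a quiet, flat frame should score as low-arousal.
loud = [0.9 if i % 2 == 0 else -0.9 for i in range(128)]
quiet = [0.01] * 128
print(fuzzy_arousal(short_time_energy(loud), zero_crossing_rate(loud)))
print(fuzzy_arousal(short_time_energy(quiet), zero_crossing_rate(quiet)))
```

In a full system, several such fuzzy outputs (e.g. arousal and valence axes) would be combined to label discrete emotions, and the rule base would be tuned against labeled emotional-speech data rather than hand-picked breakpoints.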