Abstract:
Recognition of speaker emotion during interaction in spoken dialog systems can enhance the user experience and provide system operators with information valuable to ongoing assessment of interaction system performance and utility. Utterances in such interactions are very short, and we assume the speaker's emotion is constant throughout a given utterance. This paper investigates combinations of a GMM-based low-level feature extractor with a neural network serving as a high-level feature extractor. The advantage of this system architecture is that it combines fast-developing neural network-based solutions with the classic statistical approaches applied to emotion recognition. Experiments on a Mandarin data set compare different solutions under identical or closely matched conditions.
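To make the described architecture concrete, the following is a minimal Python sketch of one plausible reading of the pipeline: a GMM fitted on pooled frame-level acoustic features acts as the low-level extractor (each short utterance is mapped to a fixed-length vector of averaged component posteriors), and a small neural network then serves as the high-level feature extractor and classifier. The feature type, dimensionalities, number of mixture components, and the scikit-learn MLPClassifier used here are assumptions for illustration, not the authors' exact configuration.

# Illustrative sketch only; shapes, MFCC-style inputs, and classifier choice are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

def gmm_lowlevel_features(utterance_frames, gmm):
    """Map a variable-length utterance (frames x dims) to a fixed-length
    vector of GMM component posteriors averaged over frames."""
    posteriors = gmm.predict_proba(utterance_frames)   # (n_frames, n_components)
    return posteriors.mean(axis=0)                      # (n_components,)

# Toy data standing in for frame-level acoustic features (e.g. 13-dim MFCCs).
rng = np.random.default_rng(0)
train_utts = [rng.normal(size=(rng.integers(50, 150), 13)) for _ in range(40)]
train_labels = rng.integers(0, 4, size=40)              # e.g. 4 emotion classes

# 1) Fit the GMM-based low-level feature extractor on all pooled frames.
gmm = GaussianMixture(n_components=16, covariance_type='diag', random_state=0)
gmm.fit(np.vstack(train_utts))

# 2) Each short utterance (assumed to carry a single emotion) becomes one fixed vector.
X = np.array([gmm_lowlevel_features(u, gmm) for u in train_utts])

# 3) Neural network as the high-level feature extractor / classifier.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, train_labels)
print(clf.predict(X[:5]))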
Published in: 2017 Information Theory and Applications Workshop (ITA)
Date of Conference: 12-17 February 2017
Date Added to IEEE Xplore: 31 August 2017