Abstract:
This paper describes the voice command system that forms part of the multimodal user interface for a residential application project demonstrated at CES 2012. The application is a 3D TV panel that can be controlled through face recognition, gesture, and speech. The speech interface is invoked with an activation keyword and terminated in the same way with a de-activation keyword. Speaker recognition is performed on the activation keyword so that the available voice commands can be personalized for the particular user, who in this scenario is a member of the household. A separate setting allows a guest user to have basic interaction with the system. A template matching scheme based on dynamic time warping is employed for its simplicity and robustness to noise. Each template is a cluster of Gaussian mixture models (GMMs), each representing a sub-word unit. A state model for voice interaction is presented to allow efficient operation of this interface.
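As a rough illustration of template matching with dynamic time warping, the sketch below aligns a query sequence of feature vectors against stored keyword templates and picks the closest one. It is a simplified stand-in, not the paper's method: it compares raw feature frames with Euclidean distance, whereas the paper describes templates built from clusters of GMMs. The function names `dtw_distance` and `best_match`, and the toy one-dimensional features, are assumptions made for this example.

```python
import math


def dtw_distance(query, template):
    """Accumulated DTW alignment cost between two sequences of feature vectors.

    Assumption: frames are compared with Euclidean distance. A GMM-based
    template would instead score each frame against a sub-word model.
    """
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = best cost of aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], template[j - 1])
            # Allow a frame to be stretched, compressed, or matched one-to-one.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(query, templates):
    """Return the name of the template with the lowest DTW cost to the query."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))


# Hypothetical toy templates: one-dimensional "features" per frame.
templates = {
    "on": [[0.0], [1.0], [2.0]],
    "off": [[5.0], [5.0], [4.0]],
}
# The query is a time-stretched version of the "on" template.
query = [[0.0], [0.0], [1.0], [2.0]]
result = best_match(query, templates)
```

Because the warping path may repeat a template frame, the stretched query aligns with the "on" template at no extra cost, which is the property that makes this kind of matching tolerant of variation in speaking rate.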
Date of Conference: 12-14 January 2012
Date Added to IEEE Xplore: 16 February 2012