I. Introduction
Speech recognition research focuses mainly on the human voice. The dramatic increase in direct person-to-person communication since the 1970s, driven by international trade and human migration, motivated basic research in spoken language technology. From the end of the 1990s into the 2000s, information technologies such as the Internet, broadband networks, and the spread of powerful personal computers further increased our ability to access documents written in foreign languages and to hold face-to-face conversations with people who have different mother tongues. To enable smooth verbal communication between speakers of different languages, their utterances must be translated based on an understanding of the speakers' intentions, their cultural backgrounds, and the context of the dialog. Our ultimate goal is to develop a speech recognition control system that handles all of these aspects. To this end, an audio-visual speech recognition control system [1]–[5] will be developed using Mel Frequency Cepstral Coefficients (MFCC) and an image-based method in order to achieve speaker independence.