In this paper, a context awareness system based on acoustic and visual signals is proposed for mobile applications. In particular, a multimodal system is designed that can sense and determine, in real time, user contextual information, such as where the user is or what the user is doing, by processing acoustic and visual signals from suitable sensors available in a mobile device. A variety of contextual information, such as babble sound in a cafeteria or the user's movement, can be recognized by the proposed acoustic and visual feature extraction and classification methods. We first describe the overall structure of the proposed system and then present the algorithm for each module that performs detection or classification of the various contextual scenarios. Representative experiments demonstrate the superiority of the proposed system, while an actual implementation of the proposed scheme on a mobile device such as a smartphone confirms its effectiveness and practical feasibility.