Given the overflow of multimedia information around us and the urgent need to identify data accurately, an audio and visual identification system with a high accuracy rate is developed. Feature extraction and classification are performed separately on the audio and visual signals. Depending on the temporal correlation of the feature vectors of objects and speakers, the indexes of all objects included in an audio/visual sequence are listed in time order. When the audio/visual features are integrated, every object or character in the key frames has a set of feature vectors, and the user can select and search for specific characters with the corresponding audio and visual features across the entire index set. By integrating the audio and visual identification results in time order, the proposed identification system increases accuracy in our experiments by about 4% and 6% compared with using the audio features alone and the visual features alone, respectively.
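The idea of integrating separately obtained audio and visual identification results in time order can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the fusion rule, so the weighted score sum below, the function names `fuse_by_time` and `search`, the parameter `w_audio`, and the per-segment score dictionaries are all assumptions made purely for illustration.

```python
from collections import defaultdict


def fuse_by_time(audio, visual, w_audio=0.5):
    """Fuse per-segment audio and visual classifier scores (hypothetical rule).

    audio, visual: lists of (time_index, {label: score}) produced by two
    separate classifiers. Returns a time-ordered index of
    (time_index, best_label) pairs.
    """
    merged = defaultdict(lambda: defaultdict(float))
    for t, scores in audio:
        for label, score in scores.items():
            merged[t][label] += w_audio * score
    for t, scores in visual:
        for label, score in scores.items():
            merged[t][label] += (1.0 - w_audio) * score
    # List the winning label of every segment in time order.
    return [(t, max(merged[t], key=merged[t].get)) for t in sorted(merged)]


def search(index, label):
    """Return the time indexes at which a given character was identified."""
    return [t for t, found in index if found == label]


# Toy scores (made up): at t=1 the audio classifier alone would pick "B",
# but the time-aligned visual evidence favours "A".
audio_scores = [(0, {"A": 0.6, "B": 0.4}), (1, {"A": 0.45, "B": 0.55})]
visual_scores = [(0, {"A": 0.7, "B": 0.3}), (1, {"A": 0.8, "B": 0.2})]

index = fuse_by_time(audio_scores, visual_scores)
print(index)               # [(0, 'A'), (1, 'A')]
print(search(index, "A"))  # [0, 1]
```

In this toy case the fused index corrects the audio-only decision at the second segment, which is the kind of gain that time-aligned integration is meant to provide; the actual features, classifiers, and combination rule used in the paper may differ.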