Skip to Main Content
Conventional approaches to video annotation predominantly focus on supervised identification of a limited set of concepts, while unsupervised annotation with infinite vocabulary remains unexplored. This work aims to exploit the overlap in content of news video to automatically annotate by mining similar videos that reinforce, filter, and improve the original annotations. The algorithm employs a two-step process of search followed by mining. Given a query video consisting of visual content and speech-recognized transcripts, similar videos are first ranked in a multimodal search. Then, the transcripts associated with these similar videos are mined to extract keywords for the query. We conducted extensive experiments over the TRECVID 2005 corpus and showed the superiority of the proposed approach to using only the mining process on the original video for annotation. This work represents the first attempt at unsupervised automatic video annotation leveraging overlapping video content.