In this paper we propose a system that annotates a user-generated video based on its associated location metadata by exploiting user-tagged image databases. An example of such a database is a photo-sharing Web site such as Flickr, where users upload their images and annotate them with various tags. The goal is to find the tags that have a high probability of being relevant to the video without performing any complex object or action recognition on the video sequence. A video is first segmented into camera views, and a set of keyframes is selected to represent the video. We describe the concept of the camera view as the basic element of user-generated videos, which has special properties that suit the video annotation application. The keyframes are used to retrieve the most relevant images in the database. A "tag processing" step is then used to tag the video.
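As an illustration only, the final step of the pipeline could be sketched as a similarity-weighted vote over the tags of the retrieved images. The abstract does not specify how tag processing works, so the function name `rank_tags`, the per-image similarity weights, and the normalisation below are all assumptions made for this sketch. The retrieval step (matching keyframes against geotagged images) is assumed to have already produced the input list.

```python
from collections import Counter


def rank_tags(retrieved, top_k=3):
    """Rank candidate tags for a video (hypothetical helper, not from the paper).

    retrieved: list of (similarity, tags) pairs, one per database image
               matched to a keyframe; similarity is a non-negative weight.
    Returns up to top_k (tag, score) pairs, where score is the share of the
    total similarity weight carried by images that have the tag.
    """
    scores = Counter()
    total_weight = 0.0
    for similarity, tags in retrieved:
        total_weight += similarity
        # Count each tag once per image, even if the uploader repeated it.
        for tag in set(tags):
            scores[tag] += similarity
    if total_weight == 0:
        return []
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(tag, score / total_weight) for tag, score in ranked[:top_k]]


# Toy input: three retrieved images with assumed similarity scores.
example = [
    (0.9, ["bridge", "london"]),
    (0.6, ["bridge", "night"]),
    (0.5, ["london", "bus"]),
]
top_tags = rank_tags(example, top_k=2)
```

A weighted vote of this kind gives more influence to images that match the keyframes closely, which is one simple way to approximate tag relevance without any object or action recognition on the video itself.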
Published in:
Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on
Date of Conference: June 28 2009-July 3 2009