We present a system that summarizes the textual, audio, and video information of music videos in a format tuned to the preferences of a focus group of 20 users. First, we analyzed user needs for the content and layout of the music summaries. Then, we designed algorithms that segment individual song videos from full music video programs by noting changes in color palette, transcript, and audio classification. We summarize each song with automatically selected high-level information such as title, artist, duration, title frame, and text, as well as audio and visual segments of the chorus. Our system automatically determines chorus locations, with high recall and precision, from the placement of repeated words and phrases in the text of the song's lyrics. Our Bayesian belief network then selects other significant video and audio content from the multiple media. Overall, we are able to compress content by a factor of 10. A second user study identified the principal variations between users in the content they want in the summary and in the platforms they want to use for viewing.
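The abstract does not say how repeated words and phrases are turned into chorus locations, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the lyrics are already available as a list of text lines, and it marks runs of consecutive lines that each recur elsewhere in the song as chorus candidates. The function names and the `min_repeats` and `min_block` parameters are hypothetical, introduced here for illustration.

```python
import re
from collections import defaultdict


def normalize(line):
    """Lowercase a lyric line and drop punctuation so repeats compare equal."""
    return " ".join(re.findall(r"[a-z0-9']+", line.lower()))


def find_chorus_candidates(lyric_lines, min_repeats=2, min_block=2):
    """Return (start, end) index ranges, end exclusive, of chorus candidates.

    A candidate is a run of at least `min_block` consecutive lines, each of
    which occurs at least `min_repeats` times in the whole lyric.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Record every position at which each normalized line occurs.
    positions = defaultdict(list)
    keys = [normalize(line) for line in lyric_lines]
    for index, key in enumerate(keys):
        if key:
            positions[key].append(index)

    # Flag lines that recur often enough to count as "repeated".
    repeated = [bool(key) and len(positions[key]) >= min_repeats for key in keys]

    # Group consecutive repeated lines into blocks.
    blocks = []
    start = None
    for index, flag in enumerate(repeated + [False]):  # sentinel closes last run
        if flag and start is None:
            start = index
        elif not flag and start is not None:
            if index - start >= min_block:
                blocks.append((start, index))
            start = None
    return blocks


if __name__ == "__main__":
    lyrics = [
        "Walking down the road",
        "Sing it loud",
        "Sing it proud",
        "Another verse line here",
        "Sing it loud!",
        "Sing it proud",
    ]
    print(find_chorus_candidates(lyrics))
```

In a full system, the line indices would still need to be aligned with transcript timestamps to recover the audio and video segments of the chorus; that alignment step is not shown here.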
Published in:
Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on (Volume: 3)
Date of Conference: 27-30 June 2004