Skip to Main Content
This paper proposes an effective keyword extraction method for the Web videos by analyzing the structure of the Web pages. The proposed scheme calculates the relative importance (or weights) of the text blocks to a video by analyzing the distances of the text blocks to the video. This distance, called the layout distance, indicates a degree of relevance of text block to video, and could be estimated by analyzing the layout structure of Web pages. Since the Web pages with several videos such as Web pages posting UCC videos have a special layout structure, this layout analysis helps to precisely estimate the relevance of text block to the video. This weight of text block is used to compute the final weights of keywords extracted from that text block by analyzing their HTML tags and other well-known techniques such as TF/IDF. Some experiments with 1,087 Web pages that have total 2,462 videos show that the precision of the proposed extraction scheme is 17% higher than ImageRover.