IEEE Transactions on Multimedia

Issue 2 • February 2009

  • Table of contents

    Page(s): C1
    PDF (46 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
    PDF (35 KB)
    Freely Available from IEEE
  • Integration of Context and Content for Multimedia Management: An Introduction to the Special Issue

    Page(s): 193 - 195
    PDF (631 KB)
    Freely Available from IEEE
  • Real-Time Near-Duplicate Elimination for Web Video Search With Content and Context

    Page(s): 196 - 207
    PDF (1301 KB) | HTML

    With the exponential growth of social media, there exist huge numbers of near-duplicate web videos, ranging from simple format changes to complex mixtures of different editing effects. In addition to the abundant video content, the social Web provides rich sets of context information associated with web videos, such as thumbnail images, time durations, and so on. At the same time, the popularity of Web 2.0 demands timely responses to user queries. To balance speed and accuracy, in this paper we combine the contextual information from time duration, number of views, and thumbnail images with the content analysis derived from color and local points to achieve real-time near-duplicate elimination. The results of 24 popular queries retrieved from YouTube show that the proposed approach, integrating content and context, achieves real-time novelty re-ranking of web videos with extremely high efficiency, where the majority of duplicates are rapidly detected and removed from the top rankings. The proposed approach runs up to 164 times faster than an effective hierarchical method from prior work, with only a slight loss of performance.
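
    A minimal sketch of the two-stage idea above, assuming hypothetical video records with duration, view count, and thumbnail-histogram fields: cheap contextual signatures prune candidate pairs so that the expensive content-based check runs only on survivors. The thresholds and the greedy scan are illustrative, not the paper's actual design.

```python
def hist_distance(h1, h2):
    """L1 distance between two equal-length normalized color histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def is_context_duplicate(v1, v2, dur_tol=2.0, hist_tol=0.25):
    """Cheap contextual test: near-equal duration and similar thumbnails."""
    if abs(v1["duration"] - v2["duration"]) > dur_tol:
        return False
    return hist_distance(v1["thumb_hist"], v2["thumb_hist"]) < hist_tol

def novelty_rerank(videos, content_duplicate):
    """Scan videos in ranked order and keep one only if no already-kept
    video duplicates it; the expensive `content_duplicate(a, b)` check
    runs only on pairs that pass the cheap contextual filter."""
    kept = []
    for v in videos:
        if not any(is_context_duplicate(v, k) and content_duplicate(v, k)
                   for k in kept):
            kept.append(v)
    return kept
```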

  • Image Annotation Within the Context of Personal Photo Collections Using Hierarchical Event and Scene Models

    Page(s): 208 - 219
    PDF (2135 KB) | HTML

    Most image annotation systems consider a single photo at a time and label photos individually. In this work, we focus on collections of personal photos and exploit the contextual information naturally implied by the associated GPS and time metadata. First, we employ a constrained clustering method to partition a photo collection into event-based subcollections, considering that the GPS records may be partly missing (a practical issue). We then use conditional random field (CRF) models to exploit the correlation between photos based on 1) time-location constraints and 2) the relationship between collection-level annotation (i.e., events) and image-level annotation (i.e., scenes). With the introduction of such a multilevel annotation hierarchy, our system addresses the problem of annotating consumer photo collections that requires a more hierarchical description of the customers' activities than do the simpler image annotation tasks. The efficacy of the proposed system is validated by extensive evaluation using a sizable geotagged personal photo collection database, which consists of over 100 photo collections and is manually labeled for 12 events and 12 scenes to create ground truth.
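
    The event-based partitioning step lends itself to a small sketch. Below is one way to segment a time-sorted photo stream into events under time and (optional) GPS constraints; the thresholds and field names are assumptions for illustration, and the paper's constrained clustering is more sophisticated than this greedy scan.

```python
from datetime import timedelta
from math import radians, sin, cos, asin, sqrt

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def segment_events(photos, max_gap_minutes=120, max_km=50.0):
    """Greedy event segmentation of a non-empty, time-sorted photo list.
    A new event starts on a large time gap or, when both photos carry
    GPS, on a large spatial jump; photos with missing GPS ('gps' is
    None) are constrained by time alone."""
    events, current = [], [photos[0]]
    for prev, cur in zip(photos, photos[1:]):
        gap = cur["time"] - prev["time"] > timedelta(minutes=max_gap_minutes)
        jump = (prev["gps"] is not None and cur["gps"] is not None
                and haversine_km(prev["gps"], cur["gps"]) > max_km)
        if gap or jump:
            events.append(current)
            current = []
        current.append(cur)
    events.append(current)
    return events
```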

  • Context-Aware Person Identification in Personal Photo Collections

    Page(s): 220 - 228
    PDF (916 KB) | HTML

    Identifying the people in photos is an important need for users of photo management systems. We present MediAssist, one such system, which facilitates browsing, searching, and semi-automatic annotation of personal photos using analysis of both image content and the context in which the photo was captured. This semi-automatic annotation includes annotation of the identity of people in photos. In this paper, we focus on such person annotation and propose person identification techniques based on a combination of context and content. We propose language modelling and nearest neighbor approaches to context-based person identification, in addition to novel face color and image color content-based features (used alongside face recognition and body patch features). We conduct a comprehensive empirical study of these techniques using the real private photo collections of a number of users, and show that combining context- and content-based analysis improves performance over content or context alone.
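
    The combination of context- and content-based identification can be pictured as score-level fusion. The sketch below assumes each cue produces a per-identity score in [0, 1] and fuses them linearly; the weight alpha and the dictionary shapes are illustrative assumptions, not MediAssist's actual fusion rule.

```python
def fuse_identity_scores(context_scores, content_scores, alpha=0.5):
    """Linear score-level fusion. Both inputs map identity -> score in
    [0, 1]; an identity missing from one source contributes 0 there.
    Returns the best identity and the full fused score table."""
    ids = set(context_scores) | set(content_scores)
    fused = {i: alpha * context_scores.get(i, 0.0)
                + (1 - alpha) * content_scores.get(i, 0.0)
             for i in ids}
    return max(fused, key=fused.get), fused

# Example: context strongly favors alice, content slightly favors bob.
who, table = fuse_identity_scores({"alice": 0.9, "bob": 0.3},
                                  {"alice": 0.5, "bob": 0.6})
```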

  • Using Visual Context and Region Semantics for High-Level Concept Detection

    Page(s): 229 - 243
    PDF (1196 KB) | HTML

    In this paper, we investigate the detection of high-level concepts in multimedia content through an integrated approach of visual thesaurus analysis and visual context. In the former, detection is based on model vectors that represent image composition in terms of region types, obtained through clustering over a large data set. The latter deals with two aspects, namely high-level concepts and region types of the thesaurus, employing a model of a priori specified semantic relations among concepts and automatically extracted topological relations among region types; it thus combines both conceptual and topological context. A set of algorithms is presented that modify either the confidence values of detected concepts or the model vectors on which detection is based. Visual context exploitation is evaluated on TRECVID and Corel data sets and compared to a number of related visual thesaurus approaches.
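
    A model vector of the kind described, an image represented by the distribution of its regions over clustered region types, can be sketched as follows. The nearest-center assignment and normalized histogram are a plain illustration under assumed array shapes; the paper's thesaurus construction details are not reproduced here.

```python
import numpy as np

def model_vector(region_features, region_type_centers):
    """Build a model vector: assign each image region to its nearest
    region-type center (an (k, d) array learned beforehand by
    clustering a large region corpus) and return the normalized
    histogram of region types."""
    hist = np.zeros(len(region_type_centers))
    for f in region_features:              # one d-dim feature per region
        nearest = np.argmin(np.linalg.norm(region_type_centers - f, axis=1))
        hist[nearest] += 1
    return hist / max(hist.sum(), 1.0)
```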

  • Content-Based Attention Ranking Using Visual and Contextual Attention Model for Baseball Videos

    Page(s): 244 - 255
    PDF (1313 KB) | HTML

    The attention analysis of multimedia data is challenging since different models have to be constructed according to different attention characteristics. This paper analyzes how viewers become excited about the video content they watch and proposes a content-driven attention ranking strategy that enables client users to iteratively browse a video according to their preferences. The proposed attention rank (AR) algorithm, which extends the Google PageRank algorithm for sorting websites by importance, can effectively measure the user interest (UI) level of each video frame. The degree of attention is derived by integrating the object-based visual attention model (VAM) with the contextual attention model (CAM), which not only takes advantage of human perceptual characteristics more reliably, but also effectively identifies which video content may attract users' attention. User feedback is utilized in a re-ranking procedure to further improve retrieval accuracy. The proposed algorithm is evaluated specifically on broadcast baseball videos.
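
    Since AR is described as an extension of PageRank, a standard power-iteration PageRank over a frame-similarity graph conveys the core mechanics. The similarity matrix, damping factor, and convergence test below are the usual textbook choices, not the paper's specific AR formulation.

```python
import numpy as np

def attention_rank(similarity, damping=0.85, iters=100, tol=1e-9):
    """Power-iteration PageRank over an (n, n) nonnegative
    frame-similarity matrix: frames linked to by many high-scoring
    frames receive high attention scores."""
    n = similarity.shape[0]
    row_sums = similarity.sum(axis=1, keepdims=True)
    # Normalize rows into transition probabilities; dangling rows
    # (all zeros) fall back to the uniform distribution.
    trans = np.where(row_sums > 0,
                     similarity / np.maximum(row_sums, 1e-12), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1 - damping) / n + damping * trans.T @ rank
        if np.abs(new - rank).sum() < tol:
            break
        rank = new
    return rank
```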

  • RoleNet: Movie Analysis from the Perspective of Social Networks

    Page(s): 256 - 271
    PDF (1657 KB) | HTML

    Borrowing ideas from social network analysis, we propose a novel way to analyze movie videos from the perspective of social relationships rather than audiovisual features. To appropriately describe the relationships among roles in movies, we devise a method to quantify relations and construct roles' social networks, called RoleNet. Based on RoleNet, we are able to perform semantic analysis that goes beyond conventional feature-based approaches. In this work, social relations between roles serve as the context information of video scenes, and leading roles and the corresponding communities can be automatically determined. The results of community identification provide new alternatives for media management and browsing. Moreover, by describing video scenes with role context, a social-relation-based story segmentation method is developed that paves a new way for this widely studied topic. Experimental results show the effectiveness of leading role determination and community identification. We also demonstrate that the social-based story segmentation approach works much better than the conventional tempo-based method. Finally, we discuss how the proposed ideas provide insights into context-based video analysis.
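
    The RoleNet construction, roles as nodes and scene co-occurrence counts as edge weights, admits a compact sketch. Ranking leading roles by weighted degree is one natural reading of leading role determination; the paper's actual criteria may differ.

```python
from collections import defaultdict
from itertools import combinations

def build_rolenet(scenes):
    """Build a RoleNet-style social network: `scenes` is a list of sets
    of role names per scene, and the weight of edge (a, b) counts the
    scenes in which both roles appear."""
    weights = defaultdict(int)
    for roles in scenes:
        for a, b in combinations(sorted(roles), 2):
            weights[(a, b)] += 1
    return weights

def leading_roles(weights, top=3):
    """Rank roles by weighted degree (sum of incident edge weights)."""
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return sorted(degree, key=degree.get, reverse=True)[:top]
```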

  • Effective Annotation and Search for Video Blogs with Integration of Context and Content Analysis

    Page(s): 272 - 285
    PDF (1037 KB) | HTML

    In recent years, weblogs (or blogs) have gained great popularity worldwide, among which video blogs (or vlogs) play an increasingly important role. However, research on vlog analysis is still at an early stage, and how to manage vlogs effectively so that they are more easily accessible is a challenging problem. In this paper, we propose a novel vlog management model comprising automatic vlog annotation and user-oriented vlog search. For vlog annotation, we extract informative keywords from both the target vlog itself and relevant external resources; besides semantic annotation, we perform sentiment analysis on comments to obtain an overall evaluation. For vlog search, we present saliency-based matching to simulate human perception of similarity, and organize the results by personalized ranking and category-based clustering. An evaluation criterion is also proposed for vlog annotation, which assigns a score to an annotation according to its accuracy and completeness in representing the vlog's semantics. Experimental results demonstrate the effectiveness of the proposed management model for vlogs.
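
    The proposed evaluation criterion scores an annotation by accuracy and completeness; the standard way to combine those two notions over keyword sets is an F-measure, sketched below. This is an assumption about the criterion's general form, not the paper's exact definition.

```python
def annotation_score(predicted, ground_truth, beta=1.0):
    """F-measure-style score for a keyword annotation: precision
    captures accuracy, recall captures completeness. NOTE: an assumed
    stand-in for the paper's criterion, not its actual formula."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    if not predicted or not ground_truth:
        return 0.0
    hits = len(predicted & ground_truth)
    precision = hits / len(predicted)
    recall = hits / len(ground_truth)
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```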

  • Scale-Invariant Visual Language Modeling for Object Categorization

    Page(s): 286 - 294
    PDF (1412 KB) | HTML

    In recent years, "bag-of-words" models, which treat an image as a collection of unordered visual words, have been widely applied in the multimedia and computer vision fields. However, because they ignore the spatial structure among visual words, they cannot discriminate between objects with similar word frequencies but different spatial word distributions. In this paper, we propose a visual language modeling method (VLM), which incorporates the spatial context of local appearance features into a statistical language model. To represent the object categories, models with different orders of statistical dependency are exploited. In addition, a multilayer extension makes the VLM more resistant to scale variations of objects. The model is effective and applicable to large-scale image categorization. We train scale-invariant visual language models on images grouped by Flickr tags and use these models for object categorization. Experimental results show that they achieve better performance than single-layer visual language models and "bag-of-words" models. They also achieve performance comparable to 2-D MHMM and SVM-based methods at much lower computational cost.
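
    The core of a visual language model is an n-gram over spatially adjacent visual words. The sketch below trains a bigram model over horizontal neighbors in a visual-word grid with add-one smoothing; categorization would pick the category model with the highest likelihood. The grid representation is an assumption, and the multilayer scale-invariant extension is omitted.

```python
from collections import Counter
from math import log

def train_bigram_vlm(grids):
    """Count horizontal visual-word bigrams over 2-D word grids
    (one grid of integer word ids per training image of a category)."""
    pairs, lefts = Counter(), Counter()
    for grid in grids:
        for row in grid:
            for left, right in zip(row, row[1:]):
                pairs[(left, right)] += 1
                lefts[left] += 1
    return pairs, lefts

def log_likelihood(grid, pairs, lefts, vocab_size):
    """Score a test grid under one category model with add-one
    smoothing; the predicted category maximizes this score."""
    ll = 0.0
    for row in grid:
        for left, right in zip(row, row[1:]):
            ll += log((pairs[(left, right)] + 1) / (lefts[left] + vocab_size))
    return ll
```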

  • A Novel Video Summarization Based on Mining the Story-Structure and Semantic Relations Among Concept Entities

    Page(s): 295 - 312
    PDF (4673 KB) | HTML

    Video summarization techniques have been proposed for years to offer people a comprehensive understanding of the whole story in a video. Roughly speaking, existing approaches can be classified into two types: static storyboards and dynamic skimming. However, although these traditional methods give users brief summaries, they do not provide a concept-organized and systematic view. In this paper, we present a structural video content browsing system and a novel summarization method that utilize four kinds of entities, who, what, where, and when, to establish the framework of the video contents. With the assistance of this indexed information, the structure of the story can be built up according to the characters, the things, the places, and the time. Therefore, users can not only browse the video efficiently but also focus on what interests them via the browsing interface. To construct the underlying system, we employ the maximum entropy criterion to integrate visual and text features extracted from video frames and speech transcripts, generating high-level concept entities. A novel concept expansion method is introduced to explore the associations among these entities. After constructing the relational graph, we exploit a graph entropy model to detect meaningful shots and relations, which serve as indices for users. The results demonstrate that our system achieves better performance and information coverage.
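
    One simple instantiation of entropy-based significance on a relational graph: score each node by how much its removal changes the entropy of the degree distribution. This is an illustrative stand-in for the paper's graph entropy model, whose exact definition is not given in the abstract.

```python
from math import log

def degree_entropy(edges, nodes):
    """Entropy of the normalized degree distribution of a graph given
    as a node list and a list of (a, b) edge pairs."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    total = sum(degree.values()) or 1
    probs = [d / total for d in degree.values() if d > 0]
    return -sum(p * log(p) for p in probs)

def rank_by_entropy_change(edges, nodes):
    """Rank nodes by how much deleting each one (with its incident
    edges) shifts the graph's degree entropy."""
    base = degree_entropy(edges, nodes)
    scores = {}
    for n in nodes:
        kept = [(a, b) for a, b in edges if n not in (a, b)]
        rest = [m for m in nodes if m != n]
        scores[n] = abs(base - degree_entropy(kept, rest))
    return sorted(scores, key=scores.get, reverse=True)
```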

  • QUC-Tree: Integrating Query Context Information for Efficient Music Retrieval

    Page(s): 313 - 323
    PDF (492 KB) | HTML

    In this paper, we introduce a novel indexing scheme, the query context tree (QUC-tree), to facilitate efficient query-sensitive music search under different query contexts. Distinguished from previous approaches, the QUC-tree is a balanced multiway tree structure in which each level represents the data space at a different dimensionality. Before the tree structure is constructed, principal component analysis (PCA) is applied to analyze the data and transform the raw composite features into a new feature space sorted by the importance of the acoustic features. The PCA-transformed data and the reduced dimensionality in the upper levels alleviate the curse of dimensionality. To more accurately mimic human perception, an extension called the QUC+-tree is proposed, which further applies multivariate regression and an EM-based algorithm to estimate the weight of each individual feature. Comprehensive experiments evaluate the proposed structures against state-of-the-art techniques on different datasets. The experimental results demonstrate the superiority of our technique.
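
    The PCA preprocessing step can be sketched directly: project the raw features onto principal axes ordered by explained variance, so that upper tree levels can index only the few most informative dimensions and deeper levels refine. The eigendecomposition below is the textbook construction; the QUC-tree's node layout itself is not reproduced.

```python
import numpy as np

def pca_transform(features):
    """Center the raw feature vectors and project them onto principal
    axes sorted by decreasing explained variance, so column 0 is the
    most informative dimension for coarse (upper-level) indexing."""
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(vals)[::-1]        # largest variance first
    return X @ vecs[:, order]
```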

  • WMA: A Marking-Based Synchronized Multimedia Tutoring System for English Composition Studies

    Page(s): 324 - 332
    PDF (1108 KB) | HTML

    This paper presents the Web-based Multimedia Annotation (WMA) system for learning writing skills in English as a foreign language. The whole correcting process, including the instructor's voice and navigation events (i.e., tele-pointer (cursor) movement, highlights, pen strokes, markings, and annotations), can be captured through our system for later access. We address the issue of exploiting the correlations among the involved media to enable adaptable, synchronized presentation across the temporal, spatial, and content domains. The proposed computed synchronization techniques include a speech-event binding process in the temporal domain, tele-pointer movement interpolation and adaptable handwriting presentation in the spatial domain, and visualized annotation erasing in the content domain. The experimental results show that the speech-event binding process finds 74% of speech access entries for accessible visualized annotations. The acceptance rate of human perception of tele-pointer movement is higher than 85% if the time interval is selected carefully. The accuracy of visualized annotation erasing for content removal is about 71%. Our user study shows that students can devote their efforts to writing practice because they better understand their own mistakes as corrected by the instructors through this multimedia presentation.
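
    The temporal-domain speech-event binding can be pictured as attaching each captured annotation event to the speech segment that covers its timestamp, falling back to the nearest segment otherwise. The data shapes below are assumptions for illustration; the paper's binding process is more involved.

```python
import bisect

def bind_events_to_speech(events, speech_segments):
    """Bind annotation events (dicts with a timestamp 't' in seconds)
    to speech segments, given as a non-empty list of (start, end)
    tuples sorted by start time. Returns (event, segment_index) pairs."""
    starts = [s for s, _ in speech_segments]
    bound = []
    for ev in events:
        i = bisect.bisect_right(starts, ev["t"]) - 1
        if i >= 0 and speech_segments[i][0] <= ev["t"] <= speech_segments[i][1]:
            bound.append((ev, i))    # event falls inside segment i
        else:
            # Otherwise bind to the segment with the closest start time.
            j = min(range(len(starts)), key=lambda k: abs(starts[k] - ev["t"]))
            bound.append((ev, j))
    return bound
```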

  • Contextual Mixture Tracking

    Page(s): 333 - 341
    PDF (540 KB) | HTML

    Multiple object tracking (MOT) poses three challenges to conventional, well-studied single object tracking (SOT) algorithms: 1) multiple targets make the configuration space exponential in the number of targets; 2) multiple motion conditions due to targets entering, exiting, and intersecting degrade the precision of the prediction process; and 3) visual ambiguities among nearby targets make trackers error-prone. In this paper, we address the MOT problem by embedding contextual proposal distributions and contextual observation models into a mixture tracker implemented in a particle filter framework. The proposal distributions are adaptively selected according to the targets' motion conditions, which are determined from context information, and multiple features are combined according to their discriminative power between ambiguity-prone objects. The introduction of the contextual proposal distribution and observation model helps overcome the inability of the conventional mixture tracker to handle object occlusions, while retaining its flexibility and high efficiency. Experiments show significant improvement over other methods in tracking scenarios with a variable number of objects.
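
    A single predict-weigh-resample particle filter step with a context-selected proposal conveys the mechanism: the motion condition chooses how widely particles diffuse. The condition labels, noise scales, and 2-D state are illustrative assumptions, not the paper's models.

```python
import random

def propose(particle, condition):
    """Context-selected proposal: the motion condition picks the
    diffusion scale. Labels and sigmas are hypothetical examples."""
    sigma = {"isolated": 2.0, "near_other": 5.0, "entering": 10.0}[condition]
    x, y = particle
    return (x + random.gauss(0, sigma), y + random.gauss(0, sigma))

def particle_filter_step(particles, condition, likelihood):
    """One step for a single target's particle set: predict with the
    contextual proposal, weigh each proposal with the image
    `likelihood`, then resample to keep the particle count fixed."""
    proposed = [propose(p, condition) for p in particles]
    weights = [likelihood(p) for p in proposed]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    return random.choices(proposed, weights=weights, k=len(particles))
```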

  • IEEE Transactions on Multimedia EDICS

    Page(s): 342
    PDF (16 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia Information for authors

    Page(s): 343 - 344
    PDF (46 KB)
    Freely Available from IEEE
  • IEEE Transactions on Multimedia society information

    Page(s): C3
    PDF (22 KB)
    Freely Available from IEEE

Aims & Scope

The scope of this periodical covers the various aspects of research in multimedia technology and the applications of multimedia.


Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo