
IEEE Transactions on Multimedia

Issue 7 • Nov. 2010

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Multimedia publication information

    Page(s): C2
    Freely Available from IEEE
  • Digital Cinema Watermarking for Estimating the Position of the Pirate

    Page(s): 605 - 621

    Many illegal copies of digital video productions for cinema release can be found on the Internet before their official release. During the illegal copying of cinema footage, composite geometric distortions commonly occur due to the angle of the camcorder relative to the screen. We propose a novel spread-spectrum video watermarking method that satisfies the requirements for protecting digital cinema. It enables the detector not only to extract the embedded message but also to estimate the position from which the camcorder recording was made. The proposed position estimation model (PEM) can identify the pirate's seat in a theater with a mean absolute error (MAE) of (33.84, 9.53, 50.38) cm. Experimental results using various types of films show that the presented method provides a mathematical model for detecting and investigating the position of the pirate.
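The spread-spectrum principle behind such watermarking can be illustrated with a minimal sketch (not the paper's actual embedder; the key, strength, and signal model below are hypothetical): a keyed pseudo-random ±1 sequence is added to the host signal, and the detector correlates the received signal against the same keyed sequence.

```python
import numpy as np

def embed(signal, key, strength=0.1):
    """Additively embed a keyed pseudo-random +/-1 spreading sequence."""
    rng = np.random.default_rng(key)
    chips = rng.choice([-1.0, 1.0], size=signal.shape)
    return signal + strength * chips

def detect(received, key):
    """Correlate against the keyed sequence; a high score suggests the mark is present."""
    rng = np.random.default_rng(key)
    chips = rng.choice([-1.0, 1.0], size=received.shape)
    return float(received @ chips) / len(received)

rng = np.random.default_rng(0)
host = rng.normal(size=10000)          # stand-in for luminance samples of a frame
marked = embed(host, key=42, strength=0.1)
score_marked = detect(marked, key=42)  # close to the embedding strength 0.1
score_clean = detect(host, key=42)     # close to 0 for unmarked content
```

The detector needs only the key, not the original host signal, which is what makes correlation-based blind detection practical.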

  • A Robust Block-Based Image/Video Registration Approach for Mobile Imaging Devices

    Page(s): 622 - 635

    Digital video stabilization enables the acquisition of video sequences free of disturbing jerkiness by compensating for unwanted camera movements. In this paper, we propose a novel fast image registration algorithm based on block matching. Unreliable motion vectors (i.e., those not related to jitter movements) are properly filtered out using ad hoc rules that take into account local similarity, local “activity,” and matching effectiveness. Moreover, a temporal analysis of the relative error computed at each frame is performed. The reliable information is then used to retrieve inter-frame transformation parameters. Experiments on real cases confirm the effectiveness of the proposed approach even in critical conditions.
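The block-matching step at the core of such registration can be sketched as an exhaustive sum-of-absolute-differences (SAD) search (a generic illustration, not the paper's filtered variant):

```python
import numpy as np

def best_match(block, search, step=1):
    """Exhaustive block matching: return the (dy, dx) placement of `block`
    inside `search` that minimizes the sum of absolute differences."""
    bh, bw = block.shape
    best, best_off = np.inf, (0, 0)
    for dy in range(0, search.shape[0] - bh + 1, step):
        for dx in range(0, search.shape[1] - bw + 1, step):
            sad = np.abs(search[dy:dy+bh, dx:dx+bw] - block).sum()
            if sad < best:
                best, best_off = sad, (dy, dx)
    return best_off

rng = np.random.default_rng(1)
frame = rng.random((32, 32))
block = frame[10:18, 12:20]        # an 8x8 block taken at offset (10, 12)
offset = best_match(block, frame)  # recovers (10, 12) in this noiseless case
```

A stabilizer would run this per block, then filter the resulting motion-vector field (as the paper describes) before estimating the global inter-frame transform.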

  • A Comparison of Perceptually-Based Metrics for Objective Evaluation of Geometry Processing

    Page(s): 636 - 649

    Recent advances in 3-D graphics technologies have led to an increasing use of processing techniques on 3-D meshes, such as filtering, compression, watermarking, simplification, and deformation. Since these processes may modify the visual appearance of 3-D objects, several metrics have been introduced to properly drive or evaluate them, from classic geometric ones such as the Hausdorff distance to more complex perceptually-based measures. This paper presents a survey of existing perceptually-based metrics for visual impairment of 3-D objects and provides an extensive comparison between them. In particular, different scenarios corresponding to different perceptual and cognitive mechanisms are analyzed. The objective is twofold: 1) characterizing the behavior of existing measures to help perception researchers design new 3-D metrics, and 2) comparing them to help computer graphics researchers choose the most accurate tool for the design and evaluation of their mesh processing algorithms.
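As a point of reference for the classic geometric baseline the survey mentions, the symmetric Hausdorff distance between two sampled surfaces (here, toy 2-D point sets) can be computed as:

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets (n x d arrays):
    the largest distance from any point in one set to the nearest point
    in the other."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 1.0]])
dist = hausdorff(A, B)  # the worst-matched points are (1,0) and (1,1), distance 1.0
```

The survey's point is precisely that this purely geometric quantity can disagree with perceived visual difference, which motivates the perceptual metrics it compares.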

  • Adaptation of Multimedia Presentations for Different Display Sizes in the Presence of Preferences and Temporal Constraints

    Page(s): 650 - 664

    Multimedia content adaptation has become important due to the proliferation of devices with differing resources, such as display sizes, memory, and computation capabilities. Existing studies perform content adaptation on web pages and other media files that share the same start times and durations. In this paper, we present a content adaptation method for multimedia presentations composed of media files with different start times and durations. We perform adaptation based on preferences and temporal constraints specified by authors and generate an order of importance among media files. Our method can automatically generate layouts by computing the locations, start times, and durations of the media files. We compare three solutions for generating layouts: 1) exhaustive search, 2) dynamic programming, and 3) a greedy algorithm. We analyze the presentations by varying screen resolutions, media files, preferences, and temporal constraints. Our analysis shows that screen utilization is 92%, 85%, and 80% for the three methods, respectively, and that the time to generate layouts for a presentation with 100 media files is 1200, 17, and 10 s, respectively.
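A greedy layout strategy of the kind compared in the paper can be sketched as follows (the item names, importance scores, and area budget are hypothetical; the paper's actual method also places items spatially and temporally):

```python
def greedy_layout(items, screen_area):
    """Greedily admit media files in descending importance until the
    screen-area budget is exhausted.

    items: list of (name, importance, area) tuples -- hypothetical inputs
    standing in for the preference/constraint-derived importance order.
    Returns the selection and the resulting screen utilization.
    """
    placed, used = [], 0
    for name, importance, area in sorted(items, key=lambda t: -t[1]):
        if used + area <= screen_area:
            placed.append(name)
            used += area
    return placed, used / screen_area

items = [("video", 0.9, 60), ("caption", 0.7, 10), ("ad", 0.2, 50), ("logo", 0.5, 15)]
placed, utilization = greedy_layout(items, screen_area=100)  # drops "ad", 85% utilization
```

This illustrates the trade-off in the paper's numbers: the greedy pass is fast but can leave area unused that exhaustive search or dynamic programming would fill.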

  • Real-Time Visual Concept Classification

    Page(s): 665 - 681

    As datasets grow increasingly large in content-based image and video retrieval, the computational efficiency of concept classification becomes important. This paper reviews techniques to accelerate concept classification and shows the trade-off between computational efficiency and accuracy. As a basis, we use the Bag-of-Words algorithm, which led to the best performance scores in the 2008 TRECVID and PASCAL benchmarks. We divide the evaluation into three steps: 1) descriptor extraction, where we evaluate SIFT, SURF, DAISY, and Semantic Textons; 2) visual word assignment, where we compare a k-means visual vocabulary with a Random Forest and evaluate subsampling, dimension reduction with PCA, and division strategies of the Spatial Pyramid; and 3) classification, where we evaluate the χ2, RBF, and Fast Histogram Intersection kernels for the SVM. Beyond the evaluation, we accelerate the calculation of densely sampled SIFT and SURF, accelerate nearest neighbor assignment, and improve the accuracy of the Histogram Intersection kernel. We conclude by discussing whether further acceleration of the Bag-of-Words pipeline is possible. Our results yield a 7-fold speed increase without accuracy loss, and a 70-fold speed increase with 3% accuracy loss. The latter system performs classification in real time, which opens up new applications for automatic concept classification. For example, it permits five standard desktop PCs to automatically tag, for 20 classes, all images currently uploaded to Flickr.
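Two of the SVM kernels central to the classification step can be sketched directly (a generic formulation; the gamma value and toy histograms are illustrative):

```python
import numpy as np

def hist_intersection(h1, h2):
    """Histogram intersection kernel: sum of element-wise minima of two
    (normalized) Bag-of-Words histograms."""
    return np.minimum(h1, h2).sum()

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponentiated chi-squared kernel, commonly paired with
    Bag-of-Words histograms in SVM classification."""
    eps = 1e-10  # guards against division by zero in empty bins
    d = ((h1 - h2) ** 2 / (h1 + h2 + eps)).sum()
    return np.exp(-gamma * d)

h1 = np.array([0.5, 0.3, 0.2])
h2 = np.array([0.4, 0.4, 0.2])
k_hi = hist_intersection(h1, h2)   # min(0.5,0.4)+min(0.3,0.4)+min(0.2,0.2) = 0.9
k_chi = chi2_kernel(h1, h2)        # 1.0 for identical histograms, smaller otherwise
```

Both kernels reward overlapping histogram mass; the paper's acceleration work targets exactly such per-pair computations, since they dominate SVM evaluation cost at scale.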

  • A Natural Visible and Infrared Facial Expression Database for Expression Recognition and Emotion Inference

    Page(s): 682 - 691

    To date, most facial expression analysis has been based on visible and posed expression databases. Visible images, however, are easily affected by illumination variations, while posed expressions differ in appearance and timing from natural ones. In this paper, we propose and establish a natural visible and infrared facial expression database, which contains both spontaneous and posed expressions of more than 100 subjects, recorded simultaneously by a visible and an infrared thermal camera, with illumination provided from three different directions. The posed database includes apex expressional images with and without glasses. As an elementary assessment of the usability of our spontaneous database for expression recognition and emotion inference, we conduct visible facial expression recognition using four typical methods: the eigenface approach [principal component analysis (PCA)], the fisherface approach [PCA + linear discriminant analysis (LDA)], the Active Appearance Model (AAM), and the AAM-based approach + LDA. We also use PCA and PCA+LDA to recognize expressions from infrared thermal images. In addition, we analyze the relationship between facial temperature and emotion through statistical analysis. Our database is available for research purposes.
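The eigenface baseline (PCA on flattened face images) used in the assessment can be sketched as follows (toy random data; a real pipeline would use aligned, cropped face images):

```python
import numpy as np

def eigenfaces(X, k):
    """PCA on a stack of flattened face images X (n_samples x n_pixels).

    Returns the mean face and the top-k principal components
    ("eigenfaces"), computed via SVD of the centered data."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, strongest first
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(x, mean, components):
    """Expression features: coordinates of a face in eigenface space."""
    return components @ (x - mean)

rng = np.random.default_rng(0)
faces = rng.random((20, 64))       # 20 toy "images" of 64 pixels each
mean, comps = eigenfaces(faces, k=5)
feats = project(faces[0], mean, comps)  # a 5-D feature vector for a classifier
```

The fisherface variant in the paper then applies LDA to such PCA features to maximize between-expression separation.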

  • 3-D Model Search and Retrieval From Range Images Using Salient Features

    Page(s): 692 - 704

    This paper presents a novel framework for partial matching and retrieval of 3-D models based on a query-by-range-image approach. Initially, salient features are extracted for both the query range image and the 3-D target model. The concept behind the proposed algorithm is that, for a 3-D object and a corresponding query range image, there should be a virtual camera with such intrinsic and extrinsic parameters that would generate an optimum range image, in terms of minimizing an error function that takes into account the salient features of the objects, when compared to other parameter sets or other target 3-D models. In the context of the developed framework, a novel method is also proposed to hierarchically search in the parameter space for the optimum solution. Experimental results illustrate the efficiency of the proposed approach even in the presence of noise or occlusion.
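The hierarchical search over camera parameters can be illustrated with a generic coarse-to-fine sketch over a single parameter (the error function here is a toy stand-in for the paper's render-and-compare error):

```python
import numpy as np

def coarse_to_fine(error, lo, hi, levels=4, samples=9):
    """Hierarchically refine a 1-D parameter search: sample the interval,
    keep the best point, and narrow the interval around it at each level."""
    best = None
    for _ in range(levels):
        xs = np.linspace(lo, hi, samples)
        best = min(xs, key=error)
        width = (hi - lo) / samples
        lo, hi = best - width, best + width
    return best

# Toy error surface with its minimum at 0.7, standing in for the error
# between the query range image and a rendered candidate view
err = lambda x: (x - 0.7) ** 2
x_star = coarse_to_fine(err, 0.0, 1.0)  # converges close to 0.7
```

The appeal of this scheme is cost: each level evaluates only a handful of candidate renders instead of densely sampling the full camera-parameter space.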

  • Mining Compositional Features From GPS and Visual Cues for Event Recognition in Photo Collections

    Page(s): 705 - 716

    As digital cameras with Global Positioning System (GPS) capability become available and people geotag their photos by other means, it is of great interest to annotate semantic events (e.g., hiking, skiing, party) characterized by a collection of geotagged photos with timestamps and GPS information recorded at capture time. We address this emerging event classification problem by mining informative features derived from image content and from the spatio-temporal traces of GPS coordinates that characterize the underlying movement patterns of various event types, both computed over the entire collection rather than over individual photos. Considering that events are better described by the co-occurrence of objects and scenes, we bundle primitive features, such as color and texture histograms or GPS features, to form discriminative compositional features. A data mining method is proposed to efficiently discover discriminative compositional features with small classification errors. A theoretical analysis is also presented to guide the selection of the data mining parameters. After compositional feature mining, we apply multiclass AdaBoost to further integrate the mined compositional features. Finally, the GPS and visual modalities are united through confidence-based fusion. On a dataset of more than 3000 geotagged images, experimental results show the synergy of all components of the proposed approach to event classification.
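The final confidence-based fusion of the two modalities can be sketched as a weighted late fusion (the scores, confidence values, and class names below are hypothetical, not from the paper):

```python
def fuse(scores_visual, scores_gps, conf_visual, conf_gps):
    """Confidence-weighted late fusion of per-class scores from two
    modalities; weights are normalized so they sum to one."""
    total = conf_visual + conf_gps
    w_v, w_g = conf_visual / total, conf_gps / total
    return {c: w_v * scores_visual[c] + w_g * scores_gps[c] for c in scores_visual}

# Hypothetical per-class scores from each modality's classifier
visual = {"hiking": 0.6, "skiing": 0.3, "party": 0.1}
gps    = {"hiking": 0.8, "skiing": 0.1, "party": 0.1}
# Here the GPS classifier is assumed twice as confident as the visual one
fused = fuse(visual, gps, conf_visual=0.5, conf_gps=1.0)
label = max(fused, key=fused.get)  # "hiking"
```

The benefit of weighting by confidence is that a modality that is unreliable for a given collection (e.g., noisy GPS traces indoors) contributes less to the final decision.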

  • Multi-View Video Summarization

    Page(s): 717 - 729

    Previous video summarization studies have focused on monocular videos, and their results do not transfer well to multi-view videos, due to problems such as redundancy across views. In this paper, we present a method for summarizing multi-view videos. We construct a spatio-temporal shot graph and formulate the summarization problem as a graph labeling task. The spatio-temporal shot graph is derived from a hypergraph, which encodes in its hyperedges the correlations, with different attributes, among multi-view video shots. We then partition the shot graph and identify clusters of event-centered shots with similar content via random walks. The summarization result is generated by solving a multi-objective optimization problem based on shot importance, evaluated using a Gaussian entropy fusion scheme. Different summarization objectives, such as minimum summary length and maximum information coverage, can be accomplished within the framework, and multi-level summarization can be achieved by configuring the optimization parameters. We also propose the multi-view storyboard and the event board for presenting multi-view summaries. The storyboard naturally reflects correlations among multi-view summarized shots that describe the same important event. The event board serially assembles event-centered multi-view shots in temporal order. A single video summary, which facilitates quick browsing of the summarized multi-view video, can easily be generated from the event board representation.
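The random-walk intuition, that well-connected shots in the similarity graph are visited often and are therefore more representative, can be sketched with a restart-style power iteration on a toy shot graph (a generic random-walk sketch, not the paper's exact clustering procedure):

```python
import numpy as np

def random_walk_scores(W, restart=0.15, iters=100):
    """Stationary visit probabilities of a random walk with uniform restart
    on a shot-similarity graph W (symmetric, non-negative weights).
    Frequently visited shots are the better-connected, more central ones."""
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    n = W.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = restart / n + (1 - restart) * pi @ P
    return pi

# Toy graph: shots 0-2 are strongly inter-connected; shot 3 is an outlier
W = np.array([[0, 5, 5, 1],
              [5, 0, 5, 1],
              [5, 5, 0, 1],
              [1, 1, 1, 0]], dtype=float)
scores = random_walk_scores(W)  # shot 3 receives the lowest visit probability
```

In a summarizer, such scores help separate redundant event-centered clusters from weakly related shots before the importance-based selection step.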

  • Authentication of Scalable Video Streams With Low Communication Overhead

    Page(s): 730 - 742

    The wide prevalence of multimedia systems in recent years makes the security of multimedia communications an important and critical issue. We study the problem of securing the delivery of scalable video streams so that receivers can verify the authenticity of the video content. Our focus is on recent scalable video coding (SVC) techniques, such as H.264/SVC, which provide three scalability types at the same time: temporal, spatial, and visual quality. This three-dimensional scalability offers great flexibility, enabling video streams to be customized for a wide range of heterogeneous receivers and network conditions. This flexibility, however, is not supported by current stream authentication schemes in the literature. We propose an efficient and secure authentication scheme that accounts for the full scalability of video streams and enables verification of all possible substreams that can be extracted from the original stream. In addition, we propose an algorithm for minimizing the amount of authentication information that needs to be attached to streams. The proposed scheme supports end-to-end authentication, in which third-party entities involved in the content delivery process, such as stream adaptation proxies and caches, do not have to understand the authentication mechanism. Our simulation study with real video traces shows that the proposed scheme is robust against packet losses, incurs low computational cost for receivers, has short delay, and adds low communication overhead. Finally, we implement the proposed scheme as an open-source library called svcAuth, which can be used as a transparent add-on by any multimedia streaming application.
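The core idea of verifying any extracted substream can be sketched with per-layer hashes plus one signed digest over the hash list (a simplified stand-in, not the paper's scheme: real SVC streams scale along three axes, and the plain digest below stands in for a public-key signature):

```python
import hashlib

def layer_digests(layers):
    """Hash each quality layer separately, so any layer-prefix substream
    can still be verified after adaptation drops the upper layers."""
    return [hashlib.sha256(data).hexdigest() for data in layers]

def sign_stream(layers):
    """Sender side: one digest over all layer hashes. A real scheme
    would sign this value with the sender's private key."""
    return hashlib.sha256("".join(layer_digests(layers)).encode()).hexdigest()

def verify_substream(substream, all_digests, signature):
    """Receiver side: check received layers against the attached digest
    list, then check the digest list against the signature."""
    ok_layers = all(hashlib.sha256(d).hexdigest() == h
                    for d, h in zip(substream, all_digests))
    ok_sig = hashlib.sha256("".join(all_digests).encode()).hexdigest() == signature
    return ok_layers and ok_sig

layers = [b"base-layer", b"enhance-1", b"enhance-2"]
digests = layer_digests(layers)
sig = sign_stream(layers)
ok = verify_substream(layers[:2], digests, sig)          # a lower-quality substream verifies
bad = verify_substream([b"tampered"], digests, sig)      # tampering is detected
```

Note how an adaptation proxy in this sketch needs no keys at all: it simply drops upper layers and forwards the digest list unchanged, matching the end-to-end property the paper emphasizes.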

  • SPANC: Optimizing Scheduling Delay for Peer-to-Peer Live Streaming

    Page(s): 743 - 753

    In peer-to-peer (P2P) live streaming over an unstructured mesh, packet scheduling is an important factor in overall playback delay. In this paper, we propose a scheduling algorithm to minimize scheduling delay. To achieve low delay, our scheduling is predominantly push in nature, and the schedule needs to change only upon a significant change in network state (due, for example, to bandwidth changes or parent churn). Our scheme, termed SPANC (Substream Pushing and Network Coding), pushes video packets in substreams and recovers packet losses using network coding. Given the heterogeneous contents, delays, and bandwidths of a peer's parents, we formulate the substream assignment (SA) problem, which assigns substreams to parents with minimum delay. The SA problem can be solved optimally in polynomial time by transforming it into a max-weighted bipartite matching problem. We then formulate the fast recovery with network coding (FRNC) problem, which assigns network-coded packets to each parent to achieve minimum recovery delay. The FRNC problem can also be solved exactly in polynomial time with dynamic programming. Simulation results show that SPANC achieves substantially lower delay at little cost in bandwidth, compared with recent approaches based on pull, network coding, and hybrid pull-push.
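The substream assignment problem can be illustrated with a tiny brute-force version that minimizes the worst per-substream delay (the delay matrix is hypothetical; the paper instead solves the problem in polynomial time via max-weighted bipartite matching rather than enumeration):

```python
from itertools import permutations

def assign_substreams(delay):
    """Assign substream s to parent perm[s], minimizing the maximum
    per-substream delay over all one-to-one assignments (brute force).

    delay[p][s]: delay if parent p serves substream s."""
    n = len(delay)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = max(delay[p][s] for s, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost

# Hypothetical delays (ms) of 3 parents for 3 substreams
delay = [[10, 40, 30],
         [20, 10, 50],
         [30, 20, 10]]
perm, cost = assign_substreams(delay)  # the diagonal assignment, worst delay 10 ms
```

Brute force is O(n!), which is exactly why a matching formulation matters once peers have many parents and substreams.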

  • A Lightweight SCTP for Partially Reliable Overlay Video Multicast Service for Mobile Terminals

    Page(s): 754 - 766

    In this article, a video multicast protocol for multi-homed mobile terminals is proposed as a lightweight alternative to the stream control transmission protocol (SCTP) for partially reliable multicast services. It works with an overlay peer-to-peer video multicast facility in the application layer. For a multi-homed mobile terminal, an error burst may occur while a handover is in progress during the primary path switching procedure. The key concern addressed by this protocol is the ability to predict packet loss and to retransmit the lost packets as soon as a mobile terminal completes its primary path switching procedure. This property controls the delay sensitivity of transmissions. Conversely, the protocol can tolerate partial loss in video transmission as long as the loss is limited to a relatively short error burst. In addition, it significantly reduces message overhead and provides a scalable communication mechanism for multicast applications. The performance improvement of the proposed protocol comes from 1) the estimation of the temporal velocity of mobile terminals, with lost-packet prediction during a long error burst, and 2) the requirement that each mobile terminal indicate which packets can be safely discarded from its agent.

  • IEEE Transactions on Multimedia EDICS

    Page(s): 767
    Freely Available from IEEE
  • IEEE Transactions on Multimedia Information for authors

    Page(s): 768 - 769
    Freely Available from IEEE
  • Special issue on integrated circuit and system security

    Page(s): 770
    Freely Available from IEEE
  • Special issue on Soft detection for Wireless Transmission

    Page(s): 771
    Freely Available from IEEE
  • Call for papers-Special Issue on Robust Measures and tests using Sparse Data for Detection and Elimination

    Page(s): 772
    Freely Available from IEEE
  • IEEE Transactions on Multimedia society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

The scope of the Periodical covers the various aspects of research in multimedia technology and applications of multimedia.


Meet Our Editors

Editor-in-Chief
Chang Wen Chen
State University of New York at Buffalo